Building a simple search engine with Elasticsearch and Python

I went through the references below first, then moved on to learning ES.

Web information mining and topic feature research:
https://blog.csdn.net/zhangfei2018/article/details/8739668
https://zhuanlan.zhihu.com/p/20787014
https://www.afenxi.com/47020.html
http://blog.chenpeng.info/html/1930
https://www.itcodemonkey.com/article/14419.html The 10 most popular web mining tools
https://www.afenxi.com/14962.html Summary and analysis of web data mining with Python crawlers

Implementing a search engine:
https://blog.csdn.net/luanpeng825485697/article/details/78997189 a good explanation
http://python.jobbole.com/88921/ filters + tokenization
https://blog.csdn.net/ryinlovec/article/details/53547233 crawling + parsing
https://www.cnblogs.com/lucky-pin/p/7117182.html PyLucene
http://ijianbian.com/home/post/detail?id=6202571 Using haystack in Python to add full-text search to Django (a good site)
https://github.com/lixinsu/simple-search-engine code for a simple search engine implemented in Python
https://learnku.com/laravel/t/1649/phpmysql-do-front-desk-python-do-reptiles-write-a-skydrive-search-engine-to-share-my-code-to-write-reptiles proxy code for crawling web pages, for reference
https://liam.page/2017/04/04/Python-100-lines-of-PageRank/ the PageRank algorithm in 100 lines of code
http://www.chickendinner.online/douban-house-source-searcher/ an entry-level housing-listing search engine in Python
http://www.omegaxyz.com/2018/01/09/python_search_engine/
https://yuannow.com/2017/10/09/implement-search-auto-complete-by-python-hadoop/ search autocomplete on Hadoop with Python
https://write-bug.com/article/2513.html a Python-based image and audio search engine
https://cblog.xyz/article/195 installing and using PyLucene
https://www.open-open.com/lib/view/open1393378372193.html a PyLucene search engine
https://blog.csdn.net/chuter/article/details/1672364?utm_source=blogxgwz3 using PyLucene
https://faldict.github.io/faldict/lucene/ a Lucene-based search engine
http://www.voidcn.com/article/p-hjpmifud-ss.html
https://www.jianshu.com/p/268a2a55d700
https://cloud.tencent.com/developer/article/1356584

https://yq.aliyun.com/articles/661985
https://github.com/lqkweb/sqlflow

http://www.zhuanzhi.ai/document/24b37e0f190697f3b0e89cec9af0f8b8 33 open-source crawler tools for scraping data
https://blog.51cto.com/johnny84/1172177 13 excellent open-source search engines
https://juejin.im/post/5bad93efe51d450e9d64aede Elasticsearch and how to use it from Python
http://bk.poph163.com/2018/07/04/elk%E7%94%B1%E6%B5%85%E5%85%A5%E6%B7%B1/ Elasticsearch from the basics to advanced topics (highly recommended)

https://www.yanxurui.cc/posts/project/2017-07-17-elasticsearch-10million-documents/ searching 10 million records with Elasticsearch
http://www.zhongruitech.com/66659901.html Scrapy + Django
https://www.ctolib.com/mbinary-dbworld-search.html a very simple search engine implemented with Python and Django
https://www.imooc.com/article/45267
Pretty good: https://www.fengiling.com/blog/view/?id=970825 implementing a search engine with Django/Scrapy/ElasticSearch

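I. Installing and configuring Elasticsearch

Reference: https://juejin.im/post/5b2777416fb9a00e8626e238 (if problems come up, the explanations in that article resolve them)

Elasticsearch cannot be started as the root user, so add a dedicated user and group, give them the install directory, and start Elasticsearch as that user: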

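root@kali:~/temp/elasticsearch-5.5.1# groupadd elasticsearch
root@kali:~/temp/elasticsearch-5.5.1# useradd elasticsearch -g elasticsearch -p elasticsearch
root@kali:~/temp/elasticsearch-5.5.1# chown -R elasticsearch:elasticsearch /root/temp/elasticsearch-5.5.1

Note: useradd -p expects an already-encrypted password, so the literal string above does not give the account a usable login password; run passwd elasticsearch if one is needed.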

Adjust the heap size in ./config/jvm.options by setting -Xms2g and -Xmx2g (initial and maximum heap; keep the two values equal).
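A quick way to confirm the heap flags after editing is to read them back from the file. The sketch below is only an illustration: the helper name heap_settings and the sample text are made up, and it handles only the plain -Xms/-Xmx lines that jvm.options uses.

```python
import re


def heap_settings(jvm_options_text):
    """Return the -Xms / -Xmx values found in the text of a jvm.options file."""
    settings = {}
    for line in jvm_options_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = re.fullmatch(r"-(Xms|Xmx)(\d+[kKmMgG]?)", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


# Hypothetical file content; in practice read ./config/jvm.options instead.
sample = """\
# heap size
-Xms2g
-Xmx2g
-XX:+UseConcMarkSweepGC
"""

heap = heap_settings(sample)
print(heap)
print("initial and maximum heap match:", heap.get("Xms") == heap.get("Xmx"))
```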

II. Basic usage

References:

Install the Chinese word-segmentation (IK) plugin: ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.5.1/elasticsearch-analysis-ik-5.5.1.zip
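Once the IK plugin is installed, its analyzers can be referenced from an index mapping. A minimal sketch, assuming ES 5.x (where a mapping still carries a type name) and the analyzer names ik_max_word and ik_smart that the IK plugin provides; the helper name ik_index_body, the type name article and the field names are hypothetical:

```python
import json


def ik_index_body(doc_type, text_fields):
    """Build an index-creation body whose text fields use the IK analyzers."""
    properties = {
        name: {
            "type": "text",
            "analyzer": "ik_max_word",      # fine-grained segmentation at index time
            "search_analyzer": "ik_smart",  # coarser segmentation at query time
        }
        for name in text_fields
    }
    # ES 5.x layout: mappings -> <type name> -> properties
    return {"mappings": {doc_type: {"properties": properties}}}


body = ik_index_body("article", ["title", "content"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body when creating the index (PUT /<index name>). On newer Elasticsearch versions without mapping types the type level would have to be dropped.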

Other plugins are installed the same way.

Basic operations (create, read, update, delete): https://www.ruanyifeng.com/blog/2017/08/elasticsearch.html

Concepts and how to apply them: http://bk.poph163.com/2018/07/04/elk%E7%94%B1%E6%B5%85%E5%85%A5%E6%B7%B1/

Notes on the configuration file: https://www.jianshu.com/p/f283d876b1cb

CRUD with Elasticsearch + Python: https://juejin.im/post/5bad93efe51d450e9d64aede
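For orientation, this is roughly what CRUD through the official elasticsearch Python client looks like. It is a sketch, not the code from the linked article: it assumes the 5.x client (which still takes doc_type and body keyword arguments), and the index name, type name and function name are hypothetical.

```python
INDEX = "articles"    # hypothetical index name
DOC_TYPE = "article"  # ES 5.x still requires a mapping type


def crud_roundtrip(es, doc_id, doc):
    """Create, read, update, search for and delete one document.

    `es` is expected to be an elasticsearch.Elasticsearch instance
    (keyword arguments as in the 5.x client are assumed).
    """
    # Create (or overwrite) the document; refresh=True makes it searchable at once.
    es.index(index=INDEX, doc_type=DOC_TYPE, id=doc_id, body=doc, refresh=True)

    # Read it back; the stored JSON lives under "_source".
    fetched = es.get(index=INDEX, doc_type=DOC_TYPE, id=doc_id)["_source"]

    # Partial update: only the fields listed under "doc" are changed.
    es.update(index=INDEX, doc_type=DOC_TYPE, id=doc_id,
              body={"doc": {"views": fetched.get("views", 0) + 1}})

    # Full-text search on the title field.
    query = {"query": {"match": {"title": doc["title"]}}}
    hits = es.search(index=INDEX, body=query)["hits"]["hits"]

    # Delete the document again.
    es.delete(index=INDEX, doc_type=DOC_TYPE, id=doc_id)
    return fetched, hits
```

With a node running locally, it could be called as crud_roundtrip(Elasticsearch(["http://localhost:9200"]), 1, {"title": "hello", "views": 0}) after from elasticsearch import Elasticsearch. Because the update call does not force a refresh, the search hits may still show the document as it was before the update.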

III. Advanced usage

1. Kibana + Elasticsearch for visualization (not tried in practice yet)

2.

Someone else's finished project: https://blog.smilehacker.net/2017/12/28/搜索引擎的搭建/

Source code: https://github.com/smile0304/Article_Search

If this source code fails with ImportError: cannot import name 'InnerObjectWrapper' from 'elasticsearch_dsl', downgrading the package fixes it: pip install elasticsearch-dsl==5.1
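Before or after downgrading, it can help to check which elasticsearch-dsl version is installed and whether the name is importable. A small sketch using only the standard library (Python 3.8+ assumed); the function name dsl_status is made up:

```python
from importlib import metadata


def dsl_status():
    """Report the installed elasticsearch-dsl version and whether
    InnerObjectWrapper can be imported from it."""
    try:
        version = metadata.version("elasticsearch-dsl")
    except metadata.PackageNotFoundError:
        version = None  # the package is not installed at all

    try:
        from elasticsearch_dsl import InnerObjectWrapper  # noqa: F401
        has_wrapper = True
    except ImportError:
        has_wrapper = False
    return version, has_wrapper


version, has_wrapper = dsl_status()
print("elasticsearch-dsl version:", version)
print("InnerObjectWrapper importable:", has_wrapper)
```

If the second line prints False, the installed release is one that no longer exports the name, and the pinned install above is the workaround.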

3.

Reference: https://github.com/lqkweb/sqlflow?spm=a2c4e.11153940.blogcont661985.19.6fd23247wLxGNj

4.

Reference: https://github.com/lijingpeng/search_system_example

2019.5.27