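Suggest As You Type means auto-completing or correcting the query while the user is still typing it into the search box.
References:
Elasticsearch Suggesters Explained https://elasticsearch.cn/article/142
Adding Chinese Word Segmentation to Elasticsearch http://keenwon.com/1404.html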
Suggesters API
- Term Suggester
- Phrase Suggester
- Completion Suggester
- Context Suggester
Term Suggester
Create the index
PUT /blogs/
{
  "mappings": {
    "tech": {
      "properties": {
        "body": {
          "type": "text"
        }
      }
    }
  }
}
Import data with the bulk API
POST _bulk/?refresh=true
{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "Lucene is cool"}
{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "Elasticsearch rocks"}
{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "Elastic is the company behind ELK stack"}
{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "elk rocks"}
{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "elasticsearch is rock solid"}
Note that the request body must end with a newline: after the last operation, add one more line break.
https://stackoverflow.com/questions/35840740/bulk-indexing-using-elastic-search
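Each operation consists of two lines, and every line must end with a newline. The first line gives the action and its metadata; the second line is the document source (some actions, such as delete, have no second line).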
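The two-line rule above can be sketched in a few lines of Python. This is only an illustration of the body format, not a client: the helper name `build_bulk_body` is made up for this example, and it only covers the `index` action used in these notes.

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Serialize docs into the newline-delimited body used by the _bulk endpoint.

    Hypothetical helper for illustration; only the "index" action is handled.
    """
    lines = []
    for doc in docs:
        # Line 1: the action and its metadata.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Line 2: the document source.
        lines.append(json.dumps(doc))
    # Every line, including the last one, must be terminated by a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body("blogs", "tech", [
    {"body": "Lucene is cool"},
    {"body": "elk rocks"},
])
print(body, end="")
```

Two documents produce four lines, and the string ends with `\n`, which is the detail that is easy to lose when pasting a request by hand.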
Search with the suggester
POST /blogs/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}
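A suggest request is just a special kind of search. The "text" inside the DSL is the text supplied by the API caller, usually whatever the user typed into the UI. Here "lucne" is a deliberate misspelling that simulates a user typo. "term" marks this as a term suggester. "field" names the field the suggester runs against, and "suggest_mode" is optional; the "missing" used in the example is in fact the default.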
Response
{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "my-suggestion": [
      {
        "text": "lucne",
        "offset": 0,
        "length": 5,
        "options": [
          {
            "text": "lucene",
            "score": 0.8,
            "freq": 2
          }
        ]
      },
      {
        "text": "rock",
        "offset": 6,
        "length": 4,
        "options": [
          {
            "text": "rocks",
            "score": 0.75,
            "freq": 2
          }
        ]
      }
    ]
  }
}
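Within my-suggestion, the text of each entry is a token from the user input, and the text of each item under options is a suggested replacement for that token.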
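One way to use these fields is to rebuild a corrected query from the `offset`, `length`, and `options` of each entry. The sketch below is an assumption about how a caller might do that, not something Elasticsearch provides; `apply_term_suggestions` is a made-up name, and the `response` dict is a trimmed copy of the response shown in these notes.

```python
def apply_term_suggestions(text, entries):
    """Replace each token that has suggestions with its highest-scoring option."""
    corrected = text
    # Work from right to left so earlier offsets stay valid after a replacement.
    for entry in sorted(entries, key=lambda e: e["offset"], reverse=True):
        if not entry["options"]:
            continue  # nothing suggested for this token, keep it as typed
        best = max(entry["options"], key=lambda o: o["score"])
        start = entry["offset"]
        end = start + entry["length"]
        corrected = corrected[:start] + best["text"] + corrected[end:]
    return corrected

# Trimmed copy of the term suggester response.
response = {
    "suggest": {
        "my-suggestion": [
            {"text": "lucne", "offset": 0, "length": 5,
             "options": [{"text": "lucene", "score": 0.8, "freq": 2}]},
            {"text": "rock", "offset": 6, "length": 4,
             "options": [{"text": "rocks", "score": 0.75, "freq": 2}]},
        ]
    }
}

corrected = apply_term_suggestions("lucne rock", response["suggest"]["my-suggestion"])
print(corrected)  # lucene rocks
```

Blindly taking the top option is a design choice made for brevity; a real UI would more likely show the options and let the user pick.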
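suggest_mode also accepts popular and always. missing only suggests for input terms that are not in the index, popular only suggests terms that occur in more documents than the original term, and always suggests any matching term.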
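The difference between the modes is easier to see on a toy model. The function below is not how Elasticsearch implements them; it is a minimal sketch that applies the three rules to hypothetical document frequencies, with `filter_by_suggest_mode` and all the numbers invented for the example. As far as I understand, the real suggester works from per-shard statistics, so its output on a multi-shard index can differ from this single-table model.

```python
def filter_by_suggest_mode(mode, term_doc_freq, candidates):
    """Apply a suggest_mode rule to candidate terms.

    term_doc_freq: documents containing the term as typed (hypothetical value).
    candidates: list of (suggested_term, doc_freq) pairs (hypothetical values).
    """
    if mode == "missing":
        # Suggest only when the typed term does not occur in the index at all.
        return list(candidates) if term_doc_freq == 0 else []
    if mode == "popular":
        # Suggest only terms that occur in more documents than the typed term.
        return [c for c in candidates if c[1] > term_doc_freq]
    if mode == "always":
        # Suggest every candidate regardless of frequency.
        return list(candidates)
    raise ValueError(f"unsupported suggest_mode: {mode!r}")

# Assume "rock" occurs in 1 document and the candidate "rocks" in 2.
for mode in ("missing", "popular", "always"):
    print(mode, filter_by_suggest_mode(mode, 1, [("rocks", 2)]))
```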
True to its name, the term suggester only offers suggestions for individual analyzed terms and does not consider the relationships between terms.
Phrase Suggester
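The phrase suggester builds on the term suggester and also weighs the relationships between terms, such as whether they appear together in the indexed text, how close they are to each other, and how frequent they are.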
POST /blogs/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock",
      "phrase": {
        "field": "body",
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
Response:
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "my-suggestion": [
      {
        "text": "lucne and elasticsear rock",
        "offset": 0,
        "length": 26,
        "options": [
          {
            "text": "lucne and elasticsearch rocks",
            "highlighted": "lucne and <em>elasticsearch rocks</em>",
            "score": 0.12709484
          },
          {
            "text": "lucne and elasticsearch rock",
            "highlighted": "lucne and <em>elasticsearch</em> rock",
            "score": 0.10422645
          },
          {
            "text": "lucne and elasticsear rocks",
            "highlighted": "lucne and elasticsear <em>rocks</em>",
            "score": 0.10036137
          }
        ]
      }
    ]
  }
}
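A caller typically wants a single "did you mean" line out of a response like this. The sketch below picks the highest-scoring option and shows that `highlighted` is just `text` with the configured tags wrapped around the changed terms. The helper names are invented for this example, and `entries` is a trimmed copy of the response from these notes.

```python
def best_phrase_option(entries):
    """Return the highest-scoring option across all entries, or None if there is none."""
    options = [opt for entry in entries for opt in entry["options"]]
    if not options:
        return None
    return max(options, key=lambda opt: opt["score"])

def strip_tags(highlighted, pre_tag="<em>", post_tag="</em>"):
    """Remove the highlight tags that were requested in the phrase suggester."""
    return highlighted.replace(pre_tag, "").replace(post_tag, "")

# Trimmed copy of the phrase suggester response.
entries = [
    {
        "text": "lucne and elasticsear rock",
        "offset": 0,
        "length": 26,
        "options": [
            {"text": "lucne and elasticsearch rocks",
             "highlighted": "lucne and <em>elasticsearch rocks</em>",
             "score": 0.12709484},
            {"text": "lucne and elasticsearch rock",
             "highlighted": "lucne and <em>elasticsearch</em> rock",
             "score": 0.10422645},
            {"text": "lucne and elasticsear rocks",
             "highlighted": "lucne and elasticsear <em>rocks</em>",
             "score": 0.10036137},
        ],
    }
]

best = best_phrase_option(entries)
print("Did you mean:", best["highlighted"])
```

In this response none of the options corrects "lucne", which the sketch passes through unchanged rather than trying to fix.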
Completion Suggester
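The Completion Suggester targets auto completion. In this scenario a query is sent to the backend for every character the user types, so when the user types quickly the backend has to respond very fast. For that reason it uses a different data structure from the two suggesters above: instead of going through the inverted index, the analyzed data is encoded into an FST (finite state transducer) that is stored alongside the index. For an open index, ES loads the whole FST into memory, which makes prefix lookups extremely fast. The FST only supports prefix lookups, though, and that is the main limitation of the Completion Suggester.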
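To make the prefix-only behaviour concrete, here is a toy in-memory prefix lookup. It is deliberately not an FST: it uses a sorted list with binary search as a simpler stand-in that has the same "prefix only" property. The class name `PrefixIndex` is made up, and lowercasing the keys is an assumption standing in for the analysis step.

```python
import bisect

class PrefixIndex:
    """Toy prefix lookup over a sorted list (a stand-in, not an FST)."""

    def __init__(self, phrases):
        # Sort by lowercased key so all entries sharing a prefix are adjacent.
        self._entries = sorted((p.lower(), p) for p in phrases)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix, size=5):
        key = prefix.lower()
        # Binary search for the first key that is >= the prefix.
        start = bisect.bisect_left(self._keys, key)
        results = []
        for candidate, original in self._entries[start:]:
            if not candidate.startswith(key) or len(results) >= size:
                break
            results.append(original)
        return results

index = PrefixIndex([
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "the elk stack rocks",
    "elasticsearch is rock solid",
])

print(index.complete("elastic i"))
print(index.complete("elk"))  # empty: "elk" is not at the start of any phrase
```

The second lookup returns nothing even though "elk" appears in two phrases, which is the limitation described above.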
Create the index
PUT /blogs_completion/
{
  "mappings": {
    "tech": {
      "properties": {
        "body": {
          "type": "completion"
        }
      }
    }
  }
}
Import data
POST _bulk/?refresh=true
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Lucene is cool"}
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elasticsearch rocks"}
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elastic is the company behind ELK stack"}
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "the elk stack rocks"}
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "elasticsearch is rock solid"}
The same notes on the bulk request format apply here.
Search
POST blogs_completion/_search?pretty
{
  "size": 0,
  "suggest": {
    "blog-suggest": {
      "prefix": "elastic i",
      "completion": {
        "field": "body"
      }
    }
  }
}
Result
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "blog-suggest": [
      {
        "text": "elastic i",
        "offset": 0,
        "length": 9,
        "options": [
          {
            "text": "Elastic is the company behind ELK stack",
            "_index": "blogs_completion",
            "_type": "tech",
            "_id": "AfuEGWIBTnJNJ3DYfBHg",
            "_score": 1,
            "_source": {
              "body": "Elastic is the company behind ELK stack"
            }
          }
        ]
      }
    ]
  }
}
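For an autocomplete dropdown, usually only the option texts are needed. A minimal sketch of pulling them out of a response shaped like the one above follows; `completion_texts` is a hypothetical helper, and the `response` dict is a trimmed copy of the result with most metadata fields left out.

```python
def completion_texts(response, suggestion_name):
    """Collect the option texts for one named suggestion, dropping duplicates."""
    texts = []
    for entry in response["suggest"][suggestion_name]:
        for option in entry["options"]:
            if option["text"] not in texts:
                texts.append(option["text"])
    return texts

# Trimmed copy of the completion suggester result.
response = {
    "suggest": {
        "blog-suggest": [
            {
                "text": "elastic i",
                "offset": 0,
                "length": 9,
                "options": [
                    {
                        "text": "Elastic is the company behind ELK stack",
                        "_index": "blogs_completion",
                        "_score": 1,
                    }
                ],
            }
        ]
    }
}

texts = completion_texts(response, "blog-suggest")
print(texts)
```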