[Repost] Elasticsearch search completion

Suggest As You Type means auto-completing or correcting the user's query while they are still typing it.

References:

Elasticsearch Suggesters explained: https://elasticsearch.cn/article/142

Adding Chinese word segmentation to Elasticsearch: http://keenwon.com/1404.html

Suggesters API

  1. Term Suggester
  2. Phrase Suggester
  3. Completion Suggester
  4. Context Suggester

Term Suggester

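Create the index (the requests in this post use Elasticsearch 6.x syntax, where the mapping type tech is still allowed; mapping types were deprecated in 7.x and removed in 8.0)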

PUT /blogs/
{
  "mappings": {
    "tech": {
      "properties": {
        "body": {
          "type": "text"
        }
      }
    }
  }
}

Import data with the bulk API

POST _bulk/?refresh=true
{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "Lucene is cool"}

{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "Elasticsearch builds on top of lucene"}

{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "Elasticsearch rocks"}

{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "Elastic is the company behind ELK stack"}

{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{ "body": "elk rocks"}

{ "index" : { "_index" : "blogs", "_type" : "tech" } }
{  "body": "elasticsearch is rock solid"}

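Note: the bulk request body is newline-delimited JSON. The blank line after each operation above is not what matters; what the bulk API requires is that every line, including the last one, ends with a newline character (the second bulk request further down omits the blank lines).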

https://stackoverflow.com/questions/35840740/bulk-indexing-using-elastic-search

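Each operation consists of two lines, and each line must end with a newline. The first line specifies the action and its metadata; the second line holds the document source.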

http://blog.csdn.net/napoay/article/details/51907709
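If the body is generated by a program rather than typed into the console, the two-line-per-operation rule is easy to get right by building it line by line. The following is a minimal Python sketch, assuming the six sample documents above and ES 6.x (hence the _type field); build_bulk_body is a hypothetical helper name. The resulting string could then be POSTed to _bulk with any HTTP client using the Content-Type application/x-ndjson.

```python
import json

# The six sample documents from the bulk request above.
DOCS = [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "elk rocks",
    "elasticsearch is rock solid",
]


def build_bulk_body(index, doc_type, docs):
    """Hypothetical helper: return an NDJSON bulk body, two lines per document."""
    lines = []
    for body in docs:
        # Line 1: the action ("index") and its metadata (ES 6.x still has _type).
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Line 2: the document source itself.
        lines.append(json.dumps({"body": body}))
    # Every line, including the last one, must end with "\n".
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("blogs", "tech", DOCS), end="")
```

json.dumps never emits a raw newline, so each action or document is guaranteed to stay on a single line.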

Search with the suggester

POST /blogs/_search
{ 
  "suggest": {
    "my-suggestion": {
      "text": "lucne rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}

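suggest is a special type of search. Inside the DSL, "text" is the text supplied by the API caller, usually what the user typed into the search box. Here lucne is a deliberate misspelling that simulates a user typo. "term" means this is a term suggester, and "field" names the field the suggester works on. "suggest_mode" is optional; the "missing" used in the example is actually the default.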

https://elasticsearch.cn/article/142

Response

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": 0,
        "hits": []
    },
    "suggest": {
        "my-suggestion": [
            {
                "text": "lucne",
                "offset": 0,
                "length": 5,
                "options": [
                    {
                        "text": "lucene",
                        "score": 0.8,
                        "freq": 2
                    }
                ]
            },
            {
                "text": "rock",
                "offset": 6,
                "length": 4,
                "options": [
                    {
                        "text": "rocks",
                        "score": 0.75,
                        "freq": 2
                    }
                ]
            }
        ]
    }
}

In my-suggestion, each entry's text is a term from the user input, and the text inside options is the suggested replacement.

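Besides missing, suggest_mode also accepts popular and always. missing (the default) only suggests for input terms that are not in the index, popular only suggests terms that occur in more documents than the input term, and always suggests matching terms regardless.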

As its name suggests, the term suggester only makes suggestions for individual analyzed terms and does not consider the relationships between terms.
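To make the per-term behaviour and the three suggest_mode values concrete, here is a toy term suggester in Python. It is a sketch under our own assumptions, not the real implementation: ES works on Lucene's per-shard term dictionary and has many more options. max_edits and prefix_length mirror real term suggester options (defaults 2 and 1), but the scoring formula 1 - edits / min(len(input), len(candidate)) is an assumption that merely reproduces the 0.8 (lucne to lucene) and 0.75 (rock to rocks) in the response above. Note that in this single-dictionary sketch "rock" exists (document 6), so missing mode returns nothing for it; the real response still suggests "rocks", probably because term statistics are gathered per shard and the blogs index has 5 shards.

```python
import re
from collections import Counter

# The six sample documents indexed into "blogs" above.
DOCS = [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "elk rocks",
    "elasticsearch is rock solid",
]


def doc_freqs(docs):
    """Document frequency of each lowercased word (roughly the standard analyzer)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(re.findall(r"\w+", doc.lower())))
    return counts


def levenshtein(a, b):
    """Plain Levenshtein edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def suggest_term(term, doc_freq, mode="missing", max_edits=2, prefix_length=1):
    """Toy term suggester: index terms within max_edits of a single input term."""
    term = term.lower()
    own_freq = doc_freq.get(term, 0)
    if mode == "missing" and own_freq > 0:
        return []  # the term already exists in the index: no suggestions
    options = []
    for cand, freq in doc_freq.items():
        if cand == term or cand[:prefix_length] != term[:prefix_length]:
            continue  # skip the term itself and candidates with a different prefix
        if mode == "popular" and freq <= own_freq:
            continue  # "popular" keeps only candidates more frequent than the input
        dist = levenshtein(term, cand)
        if dist > max_edits:
            continue
        # Assumed scoring: 1 - edits / length of the shorter word.
        score = 1 - dist / min(len(term), len(cand))
        options.append({"text": cand, "score": score, "freq": freq})
    options.sort(key=lambda o: (-o["score"], -o["freq"], o["text"]))
    return options
```

For example, suggest_term("lucne", doc_freqs(DOCS)) returns a single option, lucene, scored 1 - 1/5 = 0.8 with freq 2, while suggest_term("rock", ...) needs mode="popular" or mode="always" to produce rocks.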

Phrase suggester

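The phrase suggester builds on the term suggester and also takes the relationships between terms into account, such as whether they occur together in the indexed text, how close they are, and their frequencies.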

POST /blogs/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock",
      "phrase": {
        "field": "body",
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

Response:

{
    "took": 11,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": 0,
        "hits": []
    },
    "suggest": {
        "my-suggestion": [
            {
                "text": "lucne and elasticsear rock",
                "offset": 0,
                "length": 26,
                "options": [
                    {
                        "text": "lucne and elasticsearch rocks",
                        "highlighted": "lucne and <em>elasticsearch rocks</em>",
                        "score": 0.12709484
                    },
                    {
                        "text": "lucne and elasticsearch rock",
                        "highlighted": "lucne and <em>elasticsearch</em> rock",
                        "score": 0.10422645
                    },
                    {
                        "text": "lucne and elasticsear rocks",
                        "highlighted": "lucne and elasticsear <em>rocks</em>",
                        "score": 0.10036137
                    }
                ]
            }
        ]
    }
}
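The phrase suggester first generates candidates for each term (via direct generators, much like the term suggester) and then scores whole candidate phrases with an n-gram language model (Stupid Backoff smoothing by default). The Python sketch below is a deliberately simplified stand-in, not ES's algorithm: it replaces the language model with a count of adjacent word pairs (bigrams) that occur in the six sample documents, breaking ties by total edit distance. phrase_suggest is a hypothetical name. On the query above it reproduces the top option, "lucne and elasticsearch rocks", because "elasticsearch rocks" is a real bigram in the corpus, but the order of the remaining options differs from the ES response.

```python
import itertools
import re

# The six sample documents indexed into "blogs" above.
DOCS = [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "elk rocks",
    "elasticsearch is rock solid",
]


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def levenshtein(a, b):
    """Plain Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def phrase_suggest(text, docs, max_edits=2, size=3):
    """Toy phrase suggester: per-term candidates, rescored by corpus bigrams."""
    vocab, bigrams = set(), set()
    for doc in docs:
        tokens = tokenize(doc)
        vocab.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs in the corpus

    # Candidate generation: keep each input term, plus index terms within max_edits.
    per_term = []
    for tok in tokenize(text):
        cands = {tok: 0}
        for word in vocab:
            dist = levenshtein(tok, word)
            if dist <= max_edits:
                cands[word] = dist
        per_term.append(sorted(cands.items()))

    # Rescoring: more known bigrams first, then fewer total edits, then alphabetical.
    scored = []
    for combo in itertools.product(*per_term):
        words = [w for w, _ in combo]
        edits = sum(d for _, d in combo)
        hits = sum(pair in bigrams for pair in zip(words, words[1:]))
        scored.append((-hits, edits, " ".join(words)))
    scored.sort()
    return [phrase for _, _, phrase in scored[:size]]
```

The point of the rescoring step is the one made above: "elasticsear rock" on its own has two plausible single-term fixes, and the co-occurrence of "elasticsearch" and "rocks" in the indexed text is what lifts that combination to the top.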

Completion Suggester

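The completion suggester targets the "auto completion" scenario. Here a query is sent to the backend every time the user types a character, so when users type quickly the backend must respond very fast. For this reason it uses a different data structure from the previous two suggesters: instead of an inverted index, the analyzed data is encoded into an FST (finite state transducer) and stored alongside the index. For an open index, ES loads the entire FST into memory, which makes prefix lookups extremely fast. However, an FST only supports prefix lookups, and this is the main limitation of the completion suggester.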

https://elasticsearch.cn/article/142
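The following Python sketch illustrates the prefix-only lookup property with a different, simpler data structure: a sorted list searched with the standard bisect module, not an FST. The normalize function roughly imitates the simple analyzer (the default analyzer of a completion field): lowercase and split on non-letters. PrefixIndex is a hypothetical name, and results come back in alphabetical order, whereas ES ranks completion options by weight. With the same six inputs that are indexed into blogs_completion below, the prefix "elastic i" matches only "Elastic is the company behind ELK stack", while "elk" matches nothing because the only document containing it starts with "the".

```python
import bisect
import re

# The six inputs indexed into "blogs_completion" below.
DOCS = [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "the elk stack rocks",
    "elasticsearch is rock solid",
]


def normalize(text):
    # Rough imitation of the "simple" analyzer: lowercase, split on non-letters.
    return " ".join(re.findall(r"[a-z]+", text.lower()))


class PrefixIndex:
    def __init__(self, inputs):
        # Sort (normalized key, original input) pairs by key so that all keys
        # sharing a prefix form one contiguous run.
        self._entries = sorted((normalize(s), s) for s in inputs)
        self._keys = [key for key, _ in self._entries]

    def lookup(self, prefix, size=5):
        p = normalize(prefix)
        # Binary search for the first key >= prefix, then scan while it still matches.
        start = bisect.bisect_left(self._keys, p)
        out = []
        for key, original in self._entries[start:]:
            if not key.startswith(p) or len(out) >= size:
                break
            out.append(original)
        return out
```

Like the FST, the sorted list makes each keystroke a cheap lookup that touches only the matching range, and like the FST it cannot find a word in the middle of an input.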

Create the index

PUT /blogs_completion/
{
  "mappings": {
    "tech": {
      "properties": {
        "body": {
          "type": "completion"
        }
      }
    }
  }
}

Import the data

POST _bulk/?refresh=true
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Lucene is cool"}
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elasticsearch rocks"}
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "Elastic is the company behind ELK stack"}
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "the elk stack rocks"}
{ "index" : { "_index" : "blogs_completion", "_type" : "tech" } }
{ "body": "elasticsearch is rock solid"}

The same bulk-format notes apply here.

Query

POST blogs_completion/_search?pretty
{ "size": 0,
  "suggest": {
    "blog-suggest": {
      "prefix": "elastic i",
      "completion": {
        "field": "body"
      }
    }
  }
}

Response

{
    "took": 5,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 0,
        "max_score": 0,
        "hits": []
    },
    "suggest": {
        "blog-suggest": [
            {
                "text": "elastic i",
                "offset": 0,
                "length": 9,
                "options": [
                    {
                        "text": "Elastic is the company behind ELK stack",
                        "_index": "blogs_completion",
                        "_type": "tech",
                        "_id": "AfuEGWIBTnJNJ3DYfBHg",
                        "_score": 1,
                        "_source": {
                            "body": "Elastic is the company behind ELK stack"
                        }
                    }
                ]
            }
        ]
    }
}
