Elasticsearch中的常用查询语句示例

哎呀是太阳嘛

2019-12-25

分布式存储

Elasticsearch

前记

由于公司内部的研发团队越来越多的接触到复杂的查询需求，也越来越多的依赖大数据部门提供的es搜索引擎提供查询服务

特此整理一些es常用的查询语句用于培训，目的在于帮助其他不熟悉es的同学快速熟悉es的dsl语句的编写

数据初始化和准备工作

es和kibana安装

可以从 https://github.com/leriou/docker-env/tree/master/elasticsearch

直接使用docker编排文件构建基于docker的es本地服务

docker-compose up -d 启动服务,启动成功访问kibana命令控制台

es: elasticsearch 实例，主要存储和搜索引擎
kibana： elasticsearch的一个web层GUI客户端，可以方便的查询es里面的数据，这些年用了一大堆各种各样的第三方GUI，用来用去还是kibana最方便

准备测试数据

先准备一部分数据用于演示

创建测试用的索引put test_idx

创建测试文档

put test_idx/_doc/1
{
	"name":"111",
	"tags":["a","c","d"]
}

以下是用于示范的数据文档

[
      {
        "_index": "test_idx",
        "_type": "_doc",
        "_id": "8",
        "_score": 1,
        "_source": {
          "name": "a888",
          "tags": [
            "ab"
          ],
          "age": 7,
          "content": "明天"
        }
      },
      {
        "_index": "test_idx",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "222",
          "tags": [
            "a"
          ],
          "age": 18
        }
      },
      {
        "_index": "test_idx",
        "_type": "_doc",
        "_id": "4",
        "_score": 1,
        "_source": {
          "name": "444",
          "tags": [
            "d",
            "i"
          ],
          "age": 0,
          "content": "明天北京的天气很好，是个大晴天 "
        }
      },
      {
        "_index": "test_idx",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "111",
          "tags": [
            "a",
            "c",
            "d"
          ]
        }
      },
      {
        "_index": "test_idx",
        "_type": "_doc",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "333",
          "tags": [
            "e",
            "f"
          ],
          "age": 18,
          "content": "明天上海的天气不好，有小雨"
        }
      }
    ]

简单查询示范

n对n查询

# 1对1匹配查询：适用于 查询条件为1个值，被查询对象字段也为1个值的情况
# where name = 111
GET test_idx/_search
{
  "query": {
    "term": {
      "name": {
        "value": 111
      }
    }
  }
}

# 1对多查询：适用于 查询条件为一个，查询值为[]的情况
# where find_in_set(tags, "a")
GET test_idx/_search
{
  "query": {
    "terms": {
      "tags.keyword": ["a"]
    }
  }
}

# 多对1查询：适用于查询条件为[]，被查询字段为1个值的情况
# where age in (0,18)
GET test_idx/_search
{
  "query": {
    "terms": {
      "age": [
        0,
        18
      ]
    }
  }
}

# 多对多查询：适用于查询条件为 [], 查询值也为[] 的情况
GET test_idx/_search
{
  "query": {
    "terms": {
      "tags.keyword": ["a","e"]
    }
  }
}

万能匹配-match

match查询可以用于多种查询用途，常见的全文检索，关键词匹配等都使用该方法，是es中最常用的查询

# where tags.contains("a")
GET test_idx/_search
{
  "query": {
    "match": {
      "tags": "a"
    }
  }
}
# where age = 18 
GET test_idx/_search
{
  "query": {
    "match": {
      "age": 18
    }
  }
}
# where content like "%天气%"
GET test_idx/_search
{
  "query": {
    "match": {
      "content": "天气"
    }
  }
}

其他常用查询（range, exist, a and b等）

# 范围查询  where a between 10 and 20
GET test_idx/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 10,
        "lte": 20
      }
    }
  }
}
# 字段是否存在 where content not null
GET test_idx/_search
{
  "query": {
    "exists":{
      "field":"content"
    }
  }
}
# 多条件查询 where (age between 10 and 20) and content like "上海"
GET test_idx/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "age": {
              "gte": 10,
              "lte": 20
            }
          }
        },
        {
          "match": {
            "content": "上海"
          }
        }
      ]
    }
  }
}

查询原理解析

es文档字段的存储逻辑

es中的字段看起来有多种数据结构，实际抽象出来只有一种数据结构就是 k-v

类似
`tags:["a","b","c"]`
的数组数据结构在es中实际上是
{
	tags.a : a 
	tags.b : b
	tags.c : c 
}
这样的分解成多个字段进行存储的

`match`为什么可以做到万能查询

是因为match在查询时候会对查询条件进行分词

# 不分词的查询查不到 tags:["a","b"]的值， 只能查询到 tags:["ab"]的值
GET test_idx/_search
{
  "query": {
    "match": {
      "tags.keyword": "ab"
    }
  }
}

ps： es自带了一部分内容格式转换规则，类似 type = “VIDEO” 这种字段如果要使用term查询的话需要用 term:{type.keyword:”VIDEO”}, 因为大写的字段值会被默认分词，如果是type =”video”这种小写就可以用 term:{type:”video”}来进行匹配

分值相关查询

有时候我们希望按照某种特殊的顺序对es的文档进行排序，这个时候往往需要自定义文档查询得分

filter过滤

# filter使用布隆过滤器进行过滤所以没有分值，性能较好
GET test_idx/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gte": 10
          }
        }
      }
    }
  }
}

constant_score和boost提权

constant_score用于指定查询命中的单位得分值，每个查询价值一个分值单位

boost 用于提升权重

# 该查询命中则价值 1.2分 且忽略tf/idf得分
GET test_idx/_search
{
    "query": {
        "constant_score" : {
          "filter": {
            "terms": {
              "tags": [
                "a",
                "d"
              ]
            }
          },
          "boost": 1.2
        }
    }
}

# boost进行提权
GET test_idx/_search
{
  "query": {
    "terms": {
      "tags": ["a"],
      "boost":3
    }
  }
}

function_score

function_score自定义分值，可以根据文档内容进行得分定制

# 该查询根据文档的age字段 * 5 作为最终分值
GET test_idx/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {
        }
      },
      "script_score" : {
        "script" : {
          "source": "5*doc['age'].value"
        }
      }
    }
  }
}

查询示例：根据用户关注的标签匹配数量计算得分

# 假设用户关注了["a","b"] 标签，根据用户的关注标签匹配数量进行分数计算，a标签价值1.3分，b标签1.1
GET test_idx/_search
{
  "query": {
    "bool": {
      "should": [
        {
         "constant_score": {
           "filter": {
             "terms": {
               "tags": [
                 "a"
               ]
             }
           },
           "boost": 1.3
         }
        },
        {
          "constant_score": {
            "filter": {
              "terms": {
                "tags": [
                  "d"
                ]
              }
            },
            "boost": 1.1
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

聚合查询

terms求count值

# 等价于 group by tags
GET test_idx/_search
{
  "size": 0,
  "aggs": {
    "t": {
      "terms": {
        "field": "tags.keyword",
        "size": 10
      }
    }
  }
}

avg求平均值

GET test_idx/_search
{
  "size": 0,
  "aggs": {
    "t": {
      "avg": {
        "field": "age"
      }
    }
  }
}

其他查询和dsl

原子更新文档

# 由于es本身不支持源字形的更新文档，我们需要借助内置脚本的帮助来操作
POST test_idx/_update/1
{
 "script" : {
    "source": "ctx._source.age += params.count",
      "lang": "painless",
      "params" : {
          "count" : 4
      }
    }
}

explain

explain 用于查看查询的执行过程和各部分的具体得分，一般用于排查问题

GET test_idx/_search
{
  "explain": true, 
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "age": {
              "gte": 10,
              "lte": 20
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "content": "上海"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

#explain 结果示例
{
        "_explanation": {
          "value": 1.5753641,
          "description": "sum of:",
          "details": [
            {
              "value": 1,
              "description": "age:[10 TO 20]",
              "details": []
            },
            {
              "value": 0.5753642,
              "description": "sum of:",
              "details": [
                {
                  "value": 0.2876821,
                  "description": "weight(content:上 in 0) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 0.2876821,
                      "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                      "details": [
                        {
                          "value": 0.2876821,
                          "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                          "details": [
                            {
                              "value": 1,
                              "description": "docFreq",
                              "details": []
                            },
                            {
                              "value": 1,
                              "description": "docCount",
                              "details": []
                            }
                          ]
                        },
                        {
                          "value": 1,
                          "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                          "details": [
                            {
                              "value": 1,
                              "description": "termFreq=1.0",
                              "details": []
                            },
                            {
                              "value": 1.2,
                              "description": "parameter k1",
                              "details": []
                            },
                            {
                              "value": 0.75,
                              "description": "parameter b",
                              "details": []
                            },
                            {
                              "value": 12,
                              "description": "avgFieldLength",
                              "details": []
                            },
                            {
                              "value": 12,
                              "description": "fieldLength",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                },
                {
                  "value": 0.2876821,
                  "description": "weight(content:海 in 0) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 0.2876821,
                      "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                      "details": [
                        {
                          "value": 0.2876821,
                          "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                          "details": [
                            {
                              "value": 1,
                              "description": "docFreq",
                              "details": []
                            },
                            {
                              "value": 1,
                              "description": "docCount",
                              "details": []
                            }
                          ]
                        },
                        {
                          "value": 1,
                          "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                          "details": [
                            {
                              "value": 1,
                              "description": "termFreq=1.0",
                              "details": []
                            },
                            {
                              "value": 1.2,
                              "description": "parameter k1",
                              "details": []
                            },
                            {
                              "value": 0.75,
                              "description": "parameter b",
                              "details": []
                            },
                            {
                              "value": 12,
                              "description": "avgFieldLength",
                              "details": []
                            },
                            {
                              "value": 12,
                              "description": "fieldLength",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
}

参考资料

https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

前记