Elasticsearch中的常用查询语句示例

前记

由于公司内部的研发团队越来越多的接触到复杂的查询需求,也越来越多的依赖大数据部门提供的es搜索引擎提供查询服务

特此整理一些es常用的查询语句用于培训,目的在于帮助其他不熟悉es的同学快速熟悉es的dsl语句的编写

数据初始化和准备工作

es和kibana安装

可以从 https://github.com/leriou/docker-env/tree/master/elasticsearch

直接使用docker编排文件构建基于docker的es本地服务

docker-compose up -d 启动服务,启动成功访问kibana命令控制台

es: elasticsearch 实例,主要存储和搜索引擎
kibana: elasticsearch的一个web层GUI客户端,可以方便的查询es里面的数据,这些年用了一大堆各种各样的第三方GUI,用来用去还是kibana最方便

准备测试数据

先准备一部分数据用于演示

创建测试用的索引put test_idx

创建测试文档

1
2
3
4
5
put test_idx/_doc/1
{
"name":"111",
"tags":["a","c","d"]
}

以下是用于示范的数据文档

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
[
{
"_index": "test_idx",
"_type": "_doc",
"_id": "8",
"_score": 1,
"_source": {
"name": "a888",
"tags": [
"ab"
],
"age": 7,
"content": "明天"
}
},
{
"_index": "test_idx",
"_type": "_doc",
"_id": "2",
"_score": 1,
"_source": {
"name": "222",
"tags": [
"a"
],
"age": 18
}
},
{
"_index": "test_idx",
"_type": "_doc",
"_id": "4",
"_score": 1,
"_source": {
"name": "444",
"tags": [
"d",
"i"
],
"age": 0,
"content": "明天北京的天气很好,是个大晴天 "
}
},
{
"_index": "test_idx",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"name": "111",
"tags": [
"a",
"c",
"d"
]
}
},
{
"_index": "test_idx",
"_type": "_doc",
"_id": "3",
"_score": 1,
"_source": {
"name": "333",
"tags": [
"e",
"f"
],
"age": 18,
"content": "明天上海的天气不好,有小雨"
}
}
]

简单查询示范

n对n查询

1
2
3
4
5
6
7
8
9
10
11
12
# 1对1匹配查询:适用于 查询条件为1个值,被查询对象字段也为1个值的情况
# where name = 111
GET test_idx/_search
{
"query": {
"term": {
"name": {
"value": 111
}
}
}
}
1
2
3
4
5
6
7
8
9
10
# 1对多查询:适用于 查询条件为一个,查询值为[]的情况
# where find_in_set(tags, "a")
GET test_idx/_search
{
"query": {
"terms": {
"tags.keyword": ["a"]
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
# 多对1查询:适用于查询条件为[],被查询字段为1个值的情况
# where age in (0,18)
GET test_idx/_search
{
"query": {
"terms": {
"age": [
0,
18
]
}
}
}
1
2
3
4
5
6
7
8
9
# 多对多查询:适用于查询条件为 [], 查询值也为[] 的情况
GET test_idx/_search
{
"query": {
"terms": {
"tags.keyword": ["a","e"]
}
}
}

万能匹配-match

match查询可以用于多种查询用途,常见的全文检索,关键词匹配等都使用该方法,是es中最常用的查询

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# where tags.contains("a")
GET test_idx/_search
{
"query": {
"match": {
"tags": "a"
}
}
}
# where age = 18
GET test_idx/_search
{
"query": {
"match": {
"age": 18
}
}
}
# where content like "%天气%"
GET test_idx/_search
{
"query": {
"match": {
"content": "天气"
}
}
}

其他常用查询(range, exist, a and b等)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# 范围查询  where a between 10 and 20
GET test_idx/_search
{
"query": {
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
}
# 字段是否存在 where content not null
GET test_idx/_search
{
"query": {
"exists":{
"field":"content"
}
}
}
# 多条件查询 where (age between 10 and 20) and content like "上海"
GET test_idx/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
},
{
"match": {
"content": "上海"
}
}
]
}
}
}

查询原理解析

es文档字段的存储逻辑

es中的字段看起来有多种数据结构,实际抽象出来只有一种数据结构就是 k-v

1
2
3
4
5
6
7
8
9
类似
`tags:["a","b","c"]`
的数组数据结构在es中实际上是
{
tags.a : a
tags.b : b
tags.c : c
}
这样的分解成多个字段进行存储的

match为什么可以做到万能查询

是因为match在查询时候会对查询条件进行分词

1
2
3
4
5
6
7
8
9
# 不分词的查询查不到 tags:["a","b"]的值, 只能查询到 tags:["ab"]的值
GET test_idx/_search
{
"query": {
"match": {
"tags.keyword": "ab"
}
}
}

ps: es自带了一部分内容格式转换规则,类似 type = “VIDEO” 这种字段如果要使用term查询的话需要用 term:{type.keyword:”VIDEO”}, 因为大写的字段值会被默认分词, 如果是type =”video”这种小写 就可以用 term:{type:”video”}来进行匹配

分值相关查询

有时候我们希望按照某种特殊的顺序对es的文档进行排序,这个时候往往需要自定义文档查询得分

filter过滤

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# filter使用布隆过滤器进行过滤所以没有分值,性能较好
GET test_idx/_search
{
"query": {
"bool": {
"filter": {
"range": {
"age": {
"gte": 10
}
}
}
}
}
}

constant_score和boost提权

constant_score用于指定查询命中的单位得分值,每个查询价值一个分值单位

boost 用于提升权重

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# 该查询命中则价值 1.2分 且忽略tf/idf得分
GET test_idx/_search
{
"query": {
"constant_score" : {
"filter": {
"terms": {
"tags": [
"a",
"d"
]
}
},
"boost": 1.2
}
}
}

# boost进行提权
GET test_idx/_search
{
"query": {
"terms": {
"tags": ["a"],
"boost":3
}
}
}

function_score

function_score自定义分值,可以根据文档内容进行得分定制

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# 该查询根据文档的age字段 * 5 作为最终分值
GET test_idx/_search
{
"query": {
"function_score": {
"query": {
"match_all": {
}
},
"script_score" : {
"script" : {
"source": "5*doc['age'].value"
}
}
}
}
}

查询示例:根据用户关注的标签匹配数量计算得分

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# 假设用户关注了["a","b"] 标签,根据用户的关注标签匹配数量进行分数计算,a标签价值1.3分,b标签1.1
GET test_idx/_search
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"terms": {
"tags": [
"a"
]
}
},
"boost": 1.3
}
},
{
"constant_score": {
"filter": {
"terms": {
"tags": [
"d"
]
}
},
"boost": 1.1
}
}
],
"minimum_should_match": 1
}
}
}

聚合查询

terms求count值

1
2
3
4
5
6
7
8
9
10
11
12
13
# 等价于 group by tags
GET test_idx/_search
{
"size": 0,
"aggs": {
"t": {
"terms": {
"field": "tags.keyword",
"size": 10
}
}
}
}

avg求平均值

1
2
3
4
5
6
7
8
9
10
11
GET test_idx/_search
{
"size": 0,
"aggs": {
"t": {
"avg": {
"field": "age"
}
}
}
}

其他查询和dsl

原子更新文档

1
2
3
4
5
6
7
8
9
10
11
# 由于es本身不支持源字形的更新文档,我们需要借助内置脚本的帮助来操作
POST test_idx/_update/1
{
"script" : {
"source": "ctx._source.age += params.count",
"lang": "painless",
"params" : {
"count" : 4
}
}
}

explain

explain 用于查看查询的执行过程和各部分的具体得分,一般用于排查问题

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
GET test_idx/_search
{
"explain": true,
"query": {
"bool": {
"must": [
{
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
],
"should": [
{
"match": {
"content": "上海"
}
}
],
"minimum_should_match": 1
}
}
}

#explain 结果示例
{
"_explanation": {
"value": 1.5753641,
"description": "sum of:",
"details": [
{
"value": 1,
"description": "age:[10 TO 20]",
"details": []
},
{
"value": 0.5753642,
"description": "sum of:",
"details": [
{
"value": 0.2876821,
"description": "weight(content:上 in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.2876821,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.2876821,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1,
"description": "docFreq",
"details": []
},
{
"value": 1,
"description": "docCount",
"details": []
}
]
},
{
"value": 1,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 12,
"description": "avgFieldLength",
"details": []
},
{
"value": 12,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
},
{
"value": 0.2876821,
"description": "weight(content:海 in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 0.2876821,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 0.2876821,
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1,
"description": "docFreq",
"details": []
},
{
"value": 1,
"description": "docCount",
"details": []
}
]
},
{
"value": 1,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 12,
"description": "avgFieldLength",
"details": []
},
{
"value": 12,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
]
}
]
}
}



参考资料

https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html