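Operations Manual
Test Environment
- Primary cluster: http://10.0.1.2:9200, username: elastic, password: ***, 9 nodes, hardware: 12C 64GB (31GB JVM)
- Backup cluster: http://10.0.1.15:9200, username: elastic, password: ***, 9 nodes, hardware: 12C 64GB (31GB JVM)
- Gateway server 1 (public IP: 120.92.43.31, private IP: 192.168.0.24), hardware: 40C 256GB, 3.7T NVMe SSD
- Load-test server 1 (private IP: 10.0.0.117), hardware: 24C 48GB
- Load-test server 2 (private IP: 10.0.0.69), hardware: 24C 48GB
Test Overview
This test verifies that gateway-based indexing acceleration is practical to operate, and estimates the hardware needed to reach different performance levels, as a reference for sizing production deployments.
Scenario
The gateway regroups requests by target node, which separates fast requests from slow ones and raises the overall write throughput of the cluster.
Data
We use Nginx data generated automatically by Loadgen as the example, and compare the speed of writing to Elasticsearch directly against writing through the gateway. The steps are described in order below. Sample document: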
{
  "_index": "test-10",
  "_type": "_doc",
  "_id": "cak5emoke01flcq9q760",
  "_source": {
    "batch_number": "2328917",
    "id": "cak5emoke01flcq9r19g",
    "ip": "192.168.0.1",
    "message": "175.10.75.216 - webmaster [29/Jul/2020:17:01:26 +0800] \"GET /rest/system/status HTTP/1.1\" 200 1838 \"http://dl-console.elasticsearch.cn/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36\"",
    "now_local": "2022-06-14 17:39:39.420724895 +0800 CST",
    "now_unix": "1655199579",
    "random_no": "13",
    "routing_no": "cak5emoke01flcq9pvu0"
  }
}
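Data Architecture
INFINI Gateway (极限网关) can compute locally where each indexed document will be stored in the backend Elasticsearch cluster, so it can route requests precisely. A single bulk request may contain data destined for several backend nodes. The bulk_reshuffle filter breaks up an ordinary bulk request, splits it by target node or shard, and reassembles the pieces. This saves the Elasticsearch node that receives the request from having to redistribute it, which reduces traffic and load between Elasticsearch nodes. It also keeps any single node from becoming a hot spot and keeps the data nodes evenly loaded, which raises the cluster's overall indexing throughput.
We test and compare two scenarios: 3 shards and 30 shards per index.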
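The regrouping that bulk_reshuffle performs can be pictured with a small Python sketch. This is an illustration only, not the gateway's implementation: the function names `shard_for` and `reshuffle` are hypothetical, and CRC32 stands in for the Murmur3-based routing hash that Elasticsearch actually uses, so the shard numbers it prints will not match a real cluster.

```python
import json
import zlib
from collections import defaultdict


def shard_for(doc_id: str, num_shards: int) -> int:
    # Assumption: CRC32 is a stand-in hash so the sketch runs with the
    # standard library. Elasticsearch itself hashes the routing value
    # with Murmur3, so real shard assignments will differ.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def reshuffle(bulk_body: str, num_shards: int) -> dict:
    """Split one NDJSON bulk body into one bulk body per (index, shard)."""
    lines = [line for line in bulk_body.split("\n") if line.strip()]
    groups = defaultdict(list)
    # Only actions followed by a source line (create/index) are handled here.
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        action = json.loads(action_line)
        meta = next(iter(action.values()))
        key = (meta["_index"], shard_for(meta["_id"], num_shards))
        groups[key].extend([action_line, source_line])
    return {key: "\n".join(pairs) + "\n" for key, pairs in groups.items()}


if __name__ == "__main__":
    body = "".join(
        json.dumps({"create": {"_index": "test-10", "_id": f"doc-{i}"}}) + "\n"
        + json.dumps({"id": f"doc-{i}"}) + "\n"
        for i in range(6)
    )
    for (index, shard), sub_body in sorted(reshuffle(body, 3).items()):
        print(index, shard, sub_body.count("\n") // 2, "docs")
```

Each resulting sub-body contains documents for a single shard, so it could be sent straight to the node that holds that shard instead of being redistributed by whichever node received the original request.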
Test Preparation
Deploy the gateway
- System tuning
- Download the program
[root@iZbp1gxkifg8uetb33pvcoZ ~]# mkdir /opt/gateway
[root@iZbp1gxkifg8uetb33pvcoZ ~]# cd /opt/gateway/
[root@iZbp1gxkifg8uetb33pvcoZ gateway]# tar vxzf gateway-1.6.0_SNAPSHOT-649-linux-amd64.tar.gz
gateway-linux-amd64
gateway.yml
sample-configs/
sample-configs/elasticsearch-with-ldap.yml
sample-configs/indices-replace.yml
sample-configs/record_and_play.yml
sample-configs/cross-cluster-search.yml
sample-configs/kibana-proxy.yml
sample-configs/elasticsearch-proxy.yml
sample-configs/v8-bulk-indexing-compatibility.yml
sample-configs/use_old_style_search_response.yml
sample-configs/context-update.yml
sample-configs/elasticsearch-route-by-index.yml
sample-configs/hello_world.yml
sample-configs/entry-with-tls.yml
sample-configs/javascript.yml
sample-configs/log4j-request-filter.yml
sample-configs/request-filter.yml
sample-configs/condition.yml
sample-configs/cross-cluster-replication.yml
sample-configs/secured-elasticsearch-proxy.yml
sample-configs/fast-bulk-indexing.yml
sample-configs/es_migration.yml
sample-configs/index-docs-diff.yml
sample-configs/rate-limiter.yml
sample-configs/async-bulk-indexing.yml
sample-configs/elasticssearch-request-logging.yml
sample-configs/router_rules.yml
sample-configs/auth.yml
sample-configs/index-backup.yml
- Modify the configuration
Copy the sample configuration that ships with the gateway and adjust it to match your actual clusters:
[root@iZbp1gxkifg8uetb33pvcoZ gateway]# cp sample-configs/async-bulk-indexing.yml gateway.yml
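Update the cluster registration details:
Adjust the port the gateway listens on and whether TLS is enabled as needed (if application clients access ES over http://, set entry.tls.enabled to false):
Different clusters can use different configurations that listen on different ports, so that each workload has its own access point.
- Start the gateway
Start the gateway with the configuration you just created: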
[root@iZbp1gxkifg8uetb33pvcoZ gateway]# ./gateway-linux-amd64 -config gateway.yml
___ _ _____ __ __ __ _
/ _ \ /_\ /__ \/__\/ / /\ \ \/_\ /\_/\
/ /_\///_\\ / /\/_\ \ \/ \/ //_\\\_ _/
/ /_\\/ _ \/ / //__ \ /\ / _ \/ \
\____/\_/ \_/\/ \__/ \/ \/\_/ \_/\_/
[GATEWAY] A light-weight, powerful and high-performance elasticsearch gateway.
[GATEWAY] 1.6.0_SNAPSHOT, 2022-05-18 11:09:54, 2023-12-31 10:10:10, 73408e82a0f96352075f4c7d2974fd274eeafe11
[05-19 13:35:43] [INF] [app.go:174] initializing gateway.
[05-19 13:35:43] [INF] [app.go:175] using config: /opt/gateway/gateway.yml.
[05-19 13:35:43] [INF] [instance.go:72] workspace: /opt/gateway/data1/gateway/nodes/ca2tc22j7ad0gneois80
[05-19 13:35:43] [INF] [app.go:283] gateway is up and running now.
[05-19 13:35:50] [INF] [actions.go:358] elasticsearch [primary] is available
[05-19 13:35:50] [INF] [api.go:262] api listen at: http://0.0.0.0:2900
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [primary] hosts: [] => [192.168.0.19:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [backup] hosts: [] => [xxxxxxxx-backup:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [primary] hosts: [] => [192.168.0.19:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [backup] hosts: [] => [xxxxxxxx-primary:9200]
[05-19 13:35:50] [INF] [reverseproxy.go:261] elasticsearch [primary] hosts: [] => [192.168.0.19:9200]
[05-19 13:35:50] [INF] [entry.go:322] entry [my_es_entry/] listen at: https://0.0.0.0:8000
[05-19 13:35:50] [INF] [module.go:116] all modules are started
- Start the service
To install the gateway as a system service, run:
[root@iZbp1gxkifg8uetb33pvcpZ console]# ./gateway-linux-amd64 -service install
Success
[root@iZbp1gxkifg8uetb33pvcpZ console]# ./gateway-linux-amd64 -service start
Success
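Deploy the management console
Console is used for management so that you can switch quickly between multiple clusters.
- Download and install
Extracting the provided package completes the installation: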
[root@iZbp1gxkifg8uetb33pvcpZ console]# tar vxzf console-0.3.0_SNAPSHOT-596-linux-amd64.tar.gz
console-linux-amd64
console.yml
- Modify the configuration
Use http://10.0.1.2:9200 as Console's system cluster, which stores monitoring metrics and metadata. Modify the configuration as follows:
[root@iZbp1gxkifg8uetb33pvcpZ console]# cat console.yml
elasticsearch:
- name: default
  enabled: true
  monitored: false
  endpoint: http://10.0.1.2:9200
  basic_auth:
    username: elastic
    password: xxxxx
  discovery:
    enabled: false
...
- Start the service
[root@iZbp1gxkifg8uetb33pvcpZ console]# ./console-linux-amd64 -service install
Success
[root@iZbp1gxkifg8uetb33pvcpZ console]# ./console-linux-amd64 -service start
Success
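- Open the console
Open port 9000 on this host to reach the Console UI: http://10.0.128.58:9000/#/cluster/overview
Open the [System][Cluster] menu and register the Elasticsearch clusters and the gateway address you want to manage, so that you can manage them quickly:
- Register the gateway
Open the GATEWAY registration page and enter the gateway's API address to manage it:
Test the gateway
To verify that the gateway works properly, we run a quick check through Console.
First, create an index through the gateway endpoint and write one document:
Then check the data in the primary cluster:
Next, check the data in the backup cluster:
Both clusters return the same data, which shows that the gateway is configured correctly. Verification is complete.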
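The comparison in this check can also be expressed in a few lines of Python. The sketch below assumes you have already fetched the same document from both clusters with the Elasticsearch GET document API and parsed the responses into dictionaries; the function name `same_document` and the sample responses are hypothetical.

```python
def same_document(primary_resp: dict, backup_resp: dict) -> bool:
    """Return True when both GET responses found the document
    and their _source bodies are identical."""
    if not (primary_resp.get("found") and backup_resp.get("found")):
        return False
    return primary_resp["_source"] == backup_resp["_source"]


if __name__ == "__main__":
    # Hypothetical responses, shaped like GET <index>/_doc/<id> results.
    primary = {"_index": "test", "_id": "1", "found": True,
               "_source": {"message": "hello"}}
    backup = {"_index": "test", "_id": "1", "found": True,
              "_source": {"message": "hello"}}
    print(same_document(primary, backup))
```

Only `found` and `_source` are compared, because metadata such as `_seq_no` or `_primary_term` can legitimately differ between two independent clusters.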
Install Loadgen
The load-test machines need the same tuning; see the gateway tuning notes.
- On the load-test machine, download and install Loadgen:
[root@vm10-0-0-69 opt]# tar vxzf loadgen-1.4.0_SNAPSHOT-50-linux-amd64.tar.gz
- Download an Nginx log sample and save it as nginx.log:
[root@vm10-0-0-69 opt]# head nginx.log
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET / HTTP/1.1" 200 8676 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /vendor/bootstrap/css/bootstrap.css HTTP/1.1" 200 17235 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /vendor/daterangepicker/daterangepicker.css HTTP/1.1" 200 1700 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /vendor/fork-awesome/css/v5-compat.css HTTP/1.1" 200 2091 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /assets/font/raleway.css HTTP/1.1" 200 145 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /vendor/fork-awesome/css/fork-awesome.css HTTP/1.1" 200 8401 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /assets/css/overrides.css HTTP/1.1" 200 2524 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /assets/css/theme.css HTTP/1.1" 200 306 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /vendor/fancytree/css/ui.fancytree.css HTTP/1.1" 200 3456 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
175.10.75.216 - - [28/Jul/2020:21:20:26 +0800] "GET /syncthing/development/logbar.js HTTP/1.1" 200 486 "http://dl-console.elasticsearch.cn/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
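- Modify the Loadgen configuration file
Edit the variables: point message at the Nginx log you just prepared, and update the ES address and credentials. Loadgen then builds write requests at random. The full configuration: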
[root@vm10-0-0-117 opt]# cat loadgen.yml
variables:
  - name: ip
    type: file
    path: dict/ip.txt
  - name: message
    type: file
    path: nginx.log
  - name: user
    type: file
    path: dict/user.txt
  - name: id
    type: sequence
  - name: uuid
    type: uuid
  - name: now_local
    type: now_local
  - name: now_utc
    type: now_utc
  - name: now_unix
    type: now_unix
  - name: suffix
    type: range
    from: 10
    to: 13
requests:
  - request:
      method: POST
      runtime_variables:
        batch_no: id
      runtime_body_line_variables:
        routing_no: uuid
      basic_auth:
        username: elastic
        password: xxxx
      url: http://10.0.128.58:8000/_bulk
      body_repeat_times: 5000
      body: "{ \"create\" : { \"_index\" : \"test-$[[suffix]]\",\"_type\":\"_doc\", \"_id\" : \"$[[uuid]]\" } }\n{ \"id\" : \"$[[uuid]]\",\"routing_no\" : \"$[[routing_no]]\",\"batch_number\" : \"$[[batch_no]]\", \"message\" : \"$[[message]]\", \"random_no\" : \"$[[suffix]]\",\"ip\" : \"$[[ip]]\",\"now_local\" : \"$[[now_local]]\",\"now_unix\" : \"$[[now_unix]]\" }\n"
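To make the role of `body` and `body_repeat_times` concrete, here is a rough Python sketch of how a templated two-line bulk entry could be expanded into one large bulk body. It is inferred from the configuration and the sample document, not taken from Loadgen's source: the helper names `render` and `build_bulk_body` are hypothetical, only two of the variables are modelled, and a simple cycle stands in for Loadgen's `range` variable.

```python
import itertools
import re
import uuid

# Matches placeholders written as $[[name]] in the body template.
PLACEHOLDER = re.compile(r"\$\[\[(\w+)\]\]")


def render(template: str, generators: dict) -> str:
    # Every placeholder occurrence calls its generator again, so two
    # $[[uuid]] placeholders on one line produce two different values.
    return PLACEHOLDER.sub(lambda m: str(generators[m.group(1)]()), template)


def build_bulk_body(template: str, generators: dict, repeat: int) -> str:
    # One rendered copy of the template per repetition, concatenated.
    return "".join(render(template, generators) for _ in range(repeat))


if __name__ == "__main__":
    # Assumption: cycle through 10..13 instead of picking values at random.
    suffixes = itertools.cycle(range(10, 14))
    generators = {
        "uuid": lambda: uuid.uuid4().hex,
        "suffix": lambda: next(suffixes),
    }
    template = (
        '{ "create" : { "_index" : "test-$[[suffix]]", "_id" : "$[[uuid]]" } }\n'
        '{ "id" : "$[[uuid]]", "random_no" : "$[[suffix]]" }\n'
    )
    print(build_bulk_body(template, generators, repeat=3), end="")
```

With `body_repeat_times: 5000`, a single HTTP request would carry 5000 such action/source pairs, which is why each request in this test is a large bulk payload.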
- Start Loadgen and run the test
Specify the run duration with -d and the concurrency with -c, and enable request compression:
[root@vm10-0-0-117 opt]# ./loadgen-linux-amd64 -d 60000 -c 200 --compress
__ ___ _ ___ ___ __ __
/ / /___\/_\ / \/ _ \ /__\/\ \ \
/ / // ///_\\ / /\ / /_\//_\ / \/ /
/ /__/ \_// _ \/ /_// /_\\//__/ /\ /
\____|___/\_/ \_/___,'\____/\__/\_\ \/
[LOADGEN] A http load generator and testing suit.
[LOADGEN] 1.4.0_SNAPSHOT, 2022-06-01 09:58:17, 2023-12-31 10:10:10, b6a73e2434ac931d1d43bce78c0f7622a1d08b2e
[06-14 18:47:29] [INF] [app.go:174] initializing loadgen.
[06-14 18:47:29] [INF] [app.go:175] using config: /opt/loadgen.yml.
[06-14 18:47:29] [INF] [module.go:116] all modules are started
[06-14 18:47:30] [INF] [instance.go:72] workspace: /opt/data/loadgen/nodes/cajfdg0ke012ka748j30
[06-14 18:47:30] [INF] [app.go:283] loadgen is up and running now.
[06-14 18:47:30] [INF] [loader.go:320] warmup started
[06-14 18:47:30] [INF] [loader.go:329] [POST] http://10.0.128.58:8000/_bulk -{"took":115,"errors":false,"items":[{"create":{"_index":"test-11","_type":"_doc","_id":"cak6eggke0184a2dcc70","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":39707421,"_primary_term":1,"status":201}},{"create":{"_i
[06-14 18:47:30] [INF] [loader.go:330] status: 200,<nil>,{"took":115,"errors":false,"items":[{"create":{"_index":"test-11","_type":"_doc","_id":"cak6eggke0184a2dcc70","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":39707421,"_primary_term":1,"status":201}},{"create":{"_i
[06-14 18:47:30] [INF] [loader.go:338] warmup finished
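Repeat the same installation on the other load-test machine; the steps are not repeated here.
Test Method
Prepare the template
Create a default index template tuned for write performance: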
PUT _template/test
{
  "index_patterns": [
    "test*"
  ],
  "settings": {
    "index.translog.durability": "async",
    "refresh_interval": "-1",
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
    "dynamic_templates": [
      {
        "strings": {
          "mapping": {
            "ignore_above": 256,
            "type": "keyword"
          },
          "match_mapping_type": "string"
        }
      }
    ]
  }
}
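The template trades durability and search freshness for write speed: an async translog is fsynced in the background rather than on every request, a refresh_interval of -1 disables periodic refresh, and zero replicas avoids writing every document twice. Since the test rounds in this document differ only in the shard count, the template can be generated from one parameter. The Python sketch below only builds and prints the request body; the function name `build_template` is hypothetical and nothing is sent to a cluster.

```python
import json


def build_template(number_of_shards: int) -> dict:
    """Build the body for PUT _template/test with the given shard count."""
    return {
        "index_patterns": ["test*"],
        "settings": {
            "index.translog.durability": "async",  # fsync in the background
            "refresh_interval": "-1",              # no periodic refresh
            "number_of_shards": number_of_shards,
            "number_of_replicas": 0,               # no replica writes
        },
        "mappings": {
            "dynamic_templates": [
                {
                    "strings": {
                        "mapping": {"ignore_above": 256, "type": "keyword"},
                        "match_mapping_type": "string",
                    }
                }
            ]
        },
    }


if __name__ == "__main__":
    for shards in (3, 30):
        print(f"# {shards} shards per index")
        print(json.dumps(build_template(shards), indent=2))
```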
Start the load test
Run the load-test tool on each load-test machine:
[root@vm10-0-0-117 opt]# ./loadgen-linux-amd64 -d 60000 -c 200 --compress
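Observe throughput
Open Console to view cluster throughput. Open the monitoring menu and use the drop-down at the top to switch between clusters. The throughput of the primary cluster looks like this:
Limit CPU
To measure gateway performance with different amounts of CPU, we use taskset to pin the gateway process to specific CPUs: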
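As an illustration of the pinning step, the Python sketch below builds a `taskset -c <cpu-list> <command>` invocation for each core count used in the test. The exact CPUs the original test pinned are not stated; starting at CPU 0 and using a contiguous range is an assumption, and the helper name `taskset_command` is hypothetical. The sketch only prints the commands and does not run them.

```python
import shlex


def taskset_command(cores: int, program: list) -> list:
    """Return an argv list that runs `program` pinned to the first `cores` CPUs."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Assumption: pin to a contiguous range starting at CPU 0.
    cpu_list = "0" if cores == 1 else f"0-{cores - 1}"
    return ["taskset", "-c", cpu_list, *program]


if __name__ == "__main__":
    gateway = ["./gateway-linux-amd64", "-config", "gateway.yml"]
    for cores in (1, 2, 4, 6, 8, 12):
        print(shlex.join(taskset_command(cores, gateway)))
```

On a multi-socket machine you may prefer to choose CPUs from a single NUMA node rather than simply the lowest-numbered ones.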
Test Run
Gateway configuration:
Direct write to ES
Loadgen configuration:
[root@vm10-0-0-69 opt]# cat loadgen2.yml
statsd:
  enabled: false
  host: 192.168.3.98
  port: 8125
  namespace: loadgen.
variables:
  - name: ip
    type: file
    path: dict/ip.txt
  - name: message
    type: file
    path: nginx.log
  - name: user
    type: file
    path: dict/user.txt
  - name: id
    type: sequence
  - name: uuid
    type: uuid
  - name: now_local
    type: now_local
  - name: now_utc
    type: now_utc
  - name: now_unix
    type: now_unix
  - name: suffix
    type: range
    from: 10
    to: 13
requests:
  - request:
      method: POST
      runtime_variables:
        batch_no: id
      runtime_body_line_variables:
        routing_no: uuid
      basic_auth:
        username: elastic
        password: ####
      #url: http://localhost:8000/_search?q=$[[id]]
      url: http://10.0.1.2:9200/_bulk
      body_repeat_times: 10000
      body: "{ \"create\" : { \"_index\" : \"test-$[[suffix]]\",\"_type\":\"_doc\", \"_id\" : \"$[[uuid]]\" } }\n{ \"id\" : \"$[[uuid]]\",\"routing_no\" : \"$[[routing_no]]\",\"message\" : \"$[[message]]\",\"batch_number\" : \"$[[batch_no]]\", \"random_no\" : \"$[[suffix]]\",\"ip\" : \"$[[ip]]\",\"now_local\" : \"$[[now_local]]\",\"now_unix\" : \"$[[now_unix]]\" }\n"
Configuration of the second Loadgen:
[root@vm10-0-0-117 opt]# cat loadgen2.yml
statsd:
  enabled: false
  host: 192.168.3.98
  port: 8125
  namespace: loadgen.
variables:
  - name: ip
    type: file
    path: dict/ip.txt
  - name: message
    type: file
    path: nginx.log
  - name: user
    type: file
    path: dict/user.txt
  - name: id
    type: sequence
  - name: uuid
    type: uuid
  - name: now_local
    type: now_local
  - name: now_utc
    type: now_utc
  - name: now_unix
    type: now_unix
  - name: suffix
    type: range
    from: 10
    to: 13
requests:
  - request:
      method: POST
      runtime_variables:
        batch_no: id
      runtime_body_line_variables:
        routing_no: uuid
      basic_auth:
        username: elastic
        password: ####
      url: http://10.0.1.2:9200/_bulk
      body_repeat_times: 5000
      body: "{ \"create\" : { \"_index\" : \"test-$[[suffix]]\",\"_type\":\"_doc\", \"_id\" : \"$[[uuid]]\" } }\n{ \"id\" : \"$[[uuid]]\",\"routing_no\" : \"$[[routing_no]]\",\"batch_number\" : \"$[[batch_no]]\", \"message\" : \"$[[message]]\", \"random_no\" : \"$[[suffix]]\",\"ip\" : \"$[[ip]]\",\"now_local\" : \"$[[now_local]]\",\"now_unix\" : \"$[[now_unix]]\" }\n"
Start the load test on each machine:
[root@vm10-0-0-69 opt]# ./loadgen-linux-amd64 -c 100 -d 66000 -config loadgen2.yml
[root@vm10-0-0-117 opt]# ./loadgen-linux-amd64 -c 100 -d 66000 -config loadgen2.yml
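Writing directly to ES, throughput is stable at ~600k eps, with 3 shards per index.
Gateway 1C
Through the gateway, first test the default index with 3 shards:
Gateway 2C
Gateway 4C
Gateway 6C
Gateway 8C
Set the Loadgen concurrency to 200 on both machines: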
[root@vm10-0-0-117 opt]# ./loadgen-linux-amd64 -c 200 -d 66000 -config loadgen1.yml
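Throughput does not improve, and the gateway cannot saturate its CPUs.
Direct write to ES - 30 shards
Delete all test indices and change the template so that the default is 30 shards: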
DELETE test-10
DELETE test-11
DELETE test-12
DELETE test-13
DELETE test-14
DELETE test-15
PUT _template/test
{
  "index_patterns": [
    "test*"
  ],
  "settings": {
    "index.translog.durability": "async",
    "refresh_interval": "-1",
    "number_of_shards": 30,
    "number_of_replicas": 0
  },
  "mappings": {
    "dynamic_templates": [
      {
        "strings": {
          "mapping": {
            "ignore_above": 256,
            "type": "keyword"
          },
          "match_mapping_type": "string"
        }
      }
    ]
  }
}
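Continue the load test:
With 30 shards, writing directly to ES is stable at ~750k eps.
Gateway 1C - 30 shards
Gateway 2C - 30 shards
Gateway 4C - 30 shards
Gateway 6C - 30 shards
Gateway 8C - 30 shards
Network traffic and write volume are both fairly high.
Enable compression:
Change the configuration so that messages are compressed on disk:
Note: enabling traffic or disk compression adds overhead, so throughput drops somewhat.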
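The trade-off in this note can be seen on a small scale with the Python sketch below, which compresses a synthetic bulk payload and reports the size reduction together with the CPU time spent. The document does not say which compression algorithm the gateway uses; gzip is chosen here purely as an illustration, the helper name `compression_stats` is hypothetical, and the payload is made up from the sample log line.

```python
import gzip
import time


def compression_stats(payload: bytes, level: int = 6) -> dict:
    """Compress `payload` with gzip and report size and time spent."""
    start = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    elapsed = time.perf_counter() - start
    return {
        "raw_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(payload),
        "seconds": elapsed,
    }


if __name__ == "__main__":
    # Synthetic bulk body: the same action/source pair repeated 5000 times.
    pair = (
        '{"create":{"_index":"test-10","_type":"_doc"}}\n'
        '{"message":"175.10.75.216 - webmaster GET /rest/system/status HTTP/1.1 200 1838"}\n'
    )
    stats = compression_stats((pair * 5000).encode("utf-8"))
    for name, value in stats.items():
        print(name, value)
```

Repetitive log data compresses well, so compression saves network and disk bandwidth, but the time reported under `seconds` is CPU work that would otherwise be available for forwarding requests.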
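Gateway 12C - 30 shards
With compression removed again and the CPU allocation raised to 12C, throughput does not change; the limit has been reached.
Shard level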
Test Results
3 shards × 4 indices; writing directly to ES reaches ~600k eps.
| Gateway CPU cores | Throughput (events per second) | Notes |
| ----------------- | ------------------------------- | -------------------------- |
| Gateway 1C | ~180k | |
| Gateway 2C | ~350k | |
| Gateway 4C | ~650k | |
| Gateway 6C | ~770k | |
| Gateway 8C | ~930k | Backend ES is close to saturation |
30 shards × 4 indices; writing directly to ES reaches ~750k eps.
| Gateway CPU cores | Throughput (events per second) | Notes |
| ----------------- | ------------------------------- | -------------------------- |
| Gateway 1C | ~200k | |
| Gateway 2C | ~400k | |
| Gateway 4C | ~760k | |
| Gateway 6C | ~1000k | Backend ES is close to saturation |
| Gateway 8C | ~930k | Backend ES is close to saturation |
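Summary
The gateway delivers strong performance and is simple to use. In these tests, writing through the gateway improved throughput by 30% to 50%.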
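The relative gains can be recomputed from the two result tables with a few lines of Python. The figures below are the approximate values reported in the tables (each is marked with ~ in the source), so the computed percentages are equally approximate; the helper names `gain_over_direct` and `break_even_cores` are hypothetical.

```python
# Approximate throughput in events per second, copied from the result tables.
DIRECT_EPS = {3: 600_000, 30: 750_000}
GATEWAY_EPS = {
    3: {1: 180_000, 2: 350_000, 4: 650_000, 6: 770_000, 8: 930_000},
    30: {1: 200_000, 2: 400_000, 4: 760_000, 6: 1_000_000, 8: 930_000},
}


def gain_over_direct(shards: int, cores: int) -> float:
    """Relative throughput change of the gateway versus direct writes."""
    return GATEWAY_EPS[shards][cores] / DIRECT_EPS[shards] - 1.0


def break_even_cores(shards: int):
    """Smallest tested core count at which the gateway matches direct writes."""
    for cores in sorted(GATEWAY_EPS[shards]):
        if GATEWAY_EPS[shards][cores] >= DIRECT_EPS[shards]:
            return cores
    return None


if __name__ == "__main__":
    for shards in (3, 30):
        best = max(GATEWAY_EPS[shards], key=GATEWAY_EPS[shards].get)
        print(
            f"{shards} shards: break-even at {break_even_cores(shards)}C, "
            f"best at {best}C with {gain_over_direct(shards, best):+.0%}"
        )
```

Read this way, the tables suggest that the gateway needs about 4 cores before it overtakes direct writes in either scenario, which is useful when sizing a production deployment.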