Elasticsearch Index Management with Rack Attribute
We have two kinds of queries:
- Normal queries, served 24 hours a day
- Batch queries from a Spark job, running about 3 hours a day

I had to handle these two kinds of queries separately. So I added a rack name to my nodes by putting this option in each ASG's user-data:
rack #1 elasticsearch --node.rack DEFAULT
rack #2 elasticsearch --node.rack FRUIT
rack #3 elasticsearch --node.rack ANIMAL
Every night, before the batch job starts, I increase the ASG's size and move the indices to the new rack (the new ASG):
PUT /apple/_settings
{
"index.routing.allocation.include.rack": "FRUIT"
}
After the batch job finishes, I move all indices back to the default rack:
PUT /_all/_settings
{
"index.routing.allocation.include.rack": "default"
}
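This nightly rack switch can also be scripted. A minimal sketch in Python (matching the client used at the end of this post); the helper only builds the request, and the actual call through the `elasticsearch` client is left in comments because it needs a live cluster:

```python
def rack_settings_request(index, rack):
    """Build the method, path, and body for pinning an index's shards to a rack."""
    return (
        "PUT",
        "/{}/_settings".format(index),
        {"index.routing.allocation.include.rack": rack},
    )

# Before the batch job, pin the index to the batch rack:
method, path, body = rack_settings_request("apple", "FRUIT")
# es.indices.put_settings(index="apple", body=body)  # needs a live cluster

# After the batch job, send everything back to the default rack:
method, path, body = rack_settings_request("_all", "DEFAULT")
# es.indices.put_settings(index="_all", body=body)
```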
If the indices move slowly, check Elasticsearch's default settings first:
GET /_cluster/settings?include_defaults=true
And change the settings.
PUT /_cluster/settings
{"transient" :
{
"cluster.routing.allocation.cluster_concurrent_rebalance" : 80, // cluster can move 80 shards at once. "cluster.routing.allocation.node_concurrent_recoveries" : 4, // one node can move 4 shards at once. "cluster.routing.allocation.balance.threshold" : 1.0 // 1 node can have 1shards of 1 index. ex) 5.0 : 1 node can have 5 shards of 1 index
}
}
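The same transient settings can be assembled programmatically. A small sketch; the parameter names mirror the settings above, and the default argument values are the ones from this post, not Elasticsearch's own defaults:

```python
def rebalance_settings(concurrent_rebalance=80, node_recoveries=4, threshold=1.0):
    """Transient cluster settings that speed up shard movement.

    concurrent_rebalance: shards the whole cluster may rebalance at once.
    node_recoveries: shards a single node may recover at once.
    threshold: balance threshold, e.g. 5.0 lets one node hold 5 shards of one index.
    """
    return {
        "transient": {
            "cluster.routing.allocation.cluster_concurrent_rebalance": concurrent_rebalance,
            "cluster.routing.allocation.node_concurrent_recoveries": node_recoveries,
            "cluster.routing.allocation.balance.threshold": threshold,
        }
    }

# es.cluster.put_settings(body=rebalance_settings())  # needs a live cluster
```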
Sometimes I need to restore indices from S3 with the prefix 'backup_':
POST /_snapshot/s3_repository/SNAPSHOT_NAME/_restore
{
"indices": "tiger-1,lion-2",
"index_settings": {
"index.number_of_replicas": "2",
"index.refresh_interval": "-1",
"routing.allocation.include.rack": "ANIMAL"
},
"include_aliases": "false",
"ignore_unavailable": "false",
"include_global_state": "false",
"rename_pattern": "(.+)",
"rename_replacement": "backup_$1"
}
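The `rename_pattern`/`rename_replacement` pair is a regex rewrite of the restored index names. You can preview the result with Python's `re` module (note that Elasticsearch writes the capture group as `$1`, while Python's `re.sub` uses `\1`):

```python
import re

def preview_rename(indices, pattern="(.+)", replacement=r"backup_\1"):
    """Preview restored index names; Elasticsearch spells the replacement backup_$1."""
    return [re.sub(pattern, replacement, name) for name in indices]

print(preview_rename(["tiger-1", "lion-2"]))
# ['backup_tiger-1', 'backup_lion-2']
```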
Sometimes, I need to exclude some nodes from shard allocation.
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.exclude._ip":
"xxx.xxx.xxx.10,xxx.xxx.xxx.11,xxx.xxx.xxx.12"
}
}
After moving an index to another rack, I increase the replica count.
PUT /myindex/_settings
{
"index" : {
"number_of_replicas" : 9
}
}
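One reason to raise replicas this high (my assumption, not stated above) is to put a full copy of the index on every node in the rack: with N nodes, that means N - 1 replicas, so 9 replicas would correspond to a 10-node rack. A hypothetical helper:

```python
def replicas_for_full_coverage(node_count):
    """Replica count so every node in the rack holds a copy (primary + replicas = nodes)."""
    if node_count < 1:
        raise ValueError("need at least one node")
    return node_count - 1

# Hypothetical 10-node rack -> 9 replicas, as in the example above
body = {"index": {"number_of_replicas": replicas_for_full_coverage(10)}}
# es.indices.put_settings(index="myindex", body=body)  # needs a live cluster
```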
One last tip: sometimes I get timeouts because of the default master timeout setting. Please increase the master timeout.

from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200', timeout=120)
result = es.snapshot.get(repository='s3_repository', snapshot='_all', master_timeout='120s')