Elasticsearch Index Management with Rack Attribute

Daehwan Bae
2 min read · Feb 1, 2021

We have two kinds of queries:
- Normal queries (24 hours a day)
- Batch queries from a Spark job (3 hours)

I needed a natural way to handle both kinds of queries.

I added a rack name to each node by setting this option in the ASG's user data:

rack #1 elasticsearch --node.rack DEFAULT
rack #2 elasticsearch --node.rack FRUIT
rack #3 elasticsearch --node.rack ANIMAL

Every night, before the batch job starts, I increase the ASG's size and move the indices to the new rack (the new ASG):

PUT /apple/_settings
{
"index.routing.allocation.include.rack": "FRUIT"
}

After the batch job finishes, I move all indices back to the default rack:

PUT /_all/_settings
{
"index.routing.allocation.include.rack": "DEFAULT"
}
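
The nightly switch above could also be scripted. The following is a minimal sketch using only the Python standard library, not a definitive implementation: the helper names `rack_request` and `send` are hypothetical, and the cluster address `http://localhost:9200` is an assumption.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of the cluster


def rack_request(index, rack):
    """Build the (path, body) pair that pins an index to one rack."""
    path = "/{}/_settings".format(index)
    body = {"index.routing.allocation.include.rack": rack}
    return path, body


def send(path, body, base_url=ES_URL):
    """PUT a JSON body to Elasticsearch and return the decoded response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Before the batch job: move the "apple" index to the FRUIT rack.
before_batch = rack_request("apple", "FRUIT")
# After the batch job: move every index back to the DEFAULT rack
# (the rack name given to the nodes with --node.rack).
after_batch = rack_request("_all", "DEFAULT")

# send(*before_batch)  # uncomment to apply against a live cluster
```

Keeping the request building separate from `send` makes it easy to print and review the requests before running them against a real cluster.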

If the indices move slowly, check Elasticsearch's default settings:

GET /_cluster/settings?include_defaults=true

Then change the settings:

PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.cluster_concurrent_rebalance": 80, // the cluster can rebalance up to 80 shards at once
"cluster.routing.allocation.node_concurrent_recoveries": 4, // each node can run 4 shard recoveries at once
"cluster.routing.allocation.balance.threshold": 1.0 // how much imbalance is tolerated before shards are rebalanced; a larger value (e.g. 5.0) makes rebalancing less aggressive
}
}
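
Inline `//` comments like the ones in the request above are not valid JSON, so a script or client that expects strict JSON has to drop them. A minimal Python sketch that builds the same body as a plain dict and keeps the explanations as Python comments instead (the function name `rebalance_settings` is hypothetical):

```python
import json


def rebalance_settings(cluster_rebalance=80, node_recoveries=4, threshold=1.0):
    """Return the transient cluster settings body as a plain dict."""
    return {
        "transient": {
            # shards the whole cluster may rebalance at the same time
            "cluster.routing.allocation.cluster_concurrent_rebalance": cluster_rebalance,
            # shard recoveries a single node may run at the same time
            "cluster.routing.allocation.node_concurrent_recoveries": node_recoveries,
            # larger values make the balancer less eager to move shards
            "cluster.routing.allocation.balance.threshold": threshold,
        }
    }


payload = json.dumps(rebalance_settings(), indent=2)
print(payload)  # body for PUT /_cluster/settings
```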

Sometimes I need to restore indices from S3 under new names with the prefix ‘backup_’:

POST /_snapshot/s3_repository/SNAPSHOT_NAME/_restore
{
"indices": "tiger-1,lion-2",
"index_settings": {
"index.number_of_replicas": "2",
"index.refresh_interval": "-1",
"index.routing.allocation.include.rack": "ANIMAL"
},
"include_aliases": "false",
"ignore_unavailable": "false",
"include_global_state": "false",
"rename_pattern": "(.+)",
"rename_replacement": "backup_$1"
}
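
To see what the rename does: the restore treats `rename_pattern` as a regular expression applied to each index name and substitutes `rename_replacement`, where `$1` refers to the first capture group. The sketch below reproduces the same substitution in Python, which writes the group reference as `\1` instead of `$1`. The helper `restored_name` is hypothetical and only mimics the renaming; it does not call Elasticsearch.

```python
import re


def restored_name(index, pattern=r"(.+)", replacement=r"backup_\1"):
    """Mimic the restore rename: Java's "$1" is written "\\1" in Python."""
    return re.sub(pattern, replacement, index)


for name in "tiger-1,lion-2".split(","):
    print(name, "->", restored_name(name))
```

With the pattern `(.+)` the whole index name is captured, so `tiger-1` is restored as `backup_tiger-1` and `lion-2` as `backup_lion-2`.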

Sometimes I need to exclude some nodes from shard allocation:

PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.exclude._ip":
"xxx.xxx.xxx.10,xxx.xxx.xxx.11,xxx.xxx.xxx.12"
}
}
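
When the list of nodes changes often, the comma-separated value can be generated instead of typed by hand. A minimal sketch, assuming the IPs are already known; the helper name `exclude_ips_body` is hypothetical and the addresses are made-up examples from the documentation range 192.0.2.0/24:

```python
import json


def exclude_ips_body(ips):
    """Build the transient setting that excludes the given node IPs."""
    return {
        "transient": {
            "cluster.routing.allocation.exclude._ip": ",".join(ips)
        }
    }


# Hypothetical addresses, for illustration only.
nodes_to_drain = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
print(json.dumps(exclude_ips_body(nodes_to_drain), indent=2))
```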

After moving an index to another rack, I increase its number of replicas:

PUT /myindex/_settings
{
"index" : {
"number_of_replicas" : 9
}
}
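
A quick sanity check before raising the replica count: each replica is a full copy of every primary shard, and Elasticsearch does not place two copies of the same shard on one node, so the target rack needs at least replicas + 1 nodes for every copy to be assigned. A small sketch of that arithmetic; the function names are hypothetical and the primary shard count of 5 is an assumed example, not a value from this setup:

```python
def shard_copies(primaries, replicas):
    """Total shards (primaries plus replicas) the index will hold."""
    return primaries * (1 + replicas)


def min_nodes(replicas):
    """Fewest nodes that can hold every copy of a shard on a separate node."""
    return replicas + 1


primaries = 5   # assumption: example primary shard count
replicas = 9    # value used in the request above
print(shard_copies(primaries, replicas), "shards on at least", min_nodes(replicas), "nodes")
```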

One last tip: sometimes a request times out because of the default master timeout setting. In that case, increase the master timeout:

from elasticsearch import Elasticsearch

es = Elasticsearch('http://localhost:9200', timeout=120)
result = es.snapshot.get(repository='s3_repository', snapshot='_all', master_timeout='120s')
