This short article shows how easy it is to do a live data migration from one server to another.

Recently I had to set up a simple search index in Elasticsearch (ES) which suddenly went to production, and lots of fancy applications started to rely on it. After ordering and setting up a new and more powerful server, all the data had to be copied over somehow.

Two approaches were considered:

  1. snapshot/restore
  2. live data migration using the cluster reroute API

The second one was chosen as the way to go, since it can be done online while the applications are still using the index. The first approach would require shutting down all the apps and making sure that nothing populates the index while snapshotting/restoring.
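
Just for comparison, the rejected snapshot/restore route would have looked roughly like this (a sketch only: the repository name, the shared location /mnt/es_backup and the snapshot name are made up, and the location has to be accessible from both sides):

node01$ curl -XPUT  'localhost:9200/_snapshot/migration' -d '{ "type": "fs", "settings": { "location": "/mnt/es_backup" }}'
node01$ curl -XPUT  'localhost:9200/_snapshot/migration/snapshot_1?wait_for_completion=true'
node02$ curl -XPOST 'localhost:9200/_snapshot/migration/snapshot_1/_restore'

It works, but it only yields a consistent copy if nothing writes to the index in the meantime - which is exactly the downtime we wanted to avoid.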

Let’s assume we have a node with an index which we would like to migrate over to another node:

node01$ curl 'localhost:9200/_cat/nodes?v'
host                    ip          heap.percent ram.percent load node.role master name
node01.example.com     192.0.2.10    18          32          0.00 d         *      Anything

node01$ curl 'localhost:9200/_cat/shards?v'
index    shard prirep state    docs   store   ip           node          
articles 0     p      STARTED 1003522 703.6mb 192.0.2.10 Anything

On the newly prepared and deployed ES server we should tweak elasticsearch.yml by adding discovery.zen.ping.unicast.hosts: [ "node01.example.com", "node02.example.com" ], since we always read the documentation and have disabled multicast in production.
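
For clarity, the relevant piece of elasticsearch.yml on node02 would look roughly like this (a sketch: the cluster.name value is made up and only has to match whatever node01 already uses; the multicast line just reflects the assumption that multicast is already disabled):

cluster.name: production
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "node01.example.com", "node02.example.com" ]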

After starting the node we should see something like this:

node02$ curl 'localhost:9200/_cat/nodes?v'
host                    ip          heap.percent ram.percent load node.role master name
node02.example.com     192.0.2.11    32          80          0.00 d         m      Monster
node01.example.com     192.0.2.10    29          32          0.00 d         *      Anything
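
To double-check that both nodes actually joined the same cluster, the health endpoint can be queried (number_of_nodes should now report 2):

node02$ curl 'localhost:9200/_cluster/health?pretty'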

Now we have an ES cluster with 2 nodes and can do the following:

  • disable shard allocation - this will prevent ES from reallocating or rebalancing shards on its own while we move them manually
node02$ curl -XPUT  'localhost:9200/_cluster/settings' -d '{ "transient": { "cluster.routing.allocation.enable": "none"}}'
  • move the shard to the new server
node02$ curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands": [{"move": {"index": "articles", "from_node":"node01.example.com", "to_node":"node02.example.com", "shard": 0}}]}'
  • make sure that the shard has been moved
node02$ curl 'localhost:9200/_cat/shards?v'
  index    shard prirep state    docs   store   ip           node
  articles 0     p      STARTED 1003522 703.6mb 192.0.2.11   Monster
  • shut down the first node (see the example commands after this list)
  • re-enable shard allocation
node02$ curl -XPUT  'localhost:9200/_cluster/settings' -d '{ "transient": { "cluster.routing.allocation.enable": "all"}}'
  • now we can actually revert elasticsearch.yml to the previous state (without discovery.zen.ping.unicast.hosts) and restart ES (also shown below)
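
For completeness, the shutdown of the old node and the final restart of the new one might look like this (a sketch: the service name and the /etc/elasticsearch/elasticsearch.yml path are assumptions based on a typical package install, so adjust them to your init system and layout):

node01$ sudo service elasticsearch stop
node02$ sudo vi /etc/elasticsearch/elasticsearch.yml    # drop the discovery.zen.ping.unicast.hosts line again
node02$ sudo service elasticsearch restart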

The applications will stay functional the whole time; at the very end one might need to change their connection settings when the first node goes down, unless a VIP (virtual IP) is in use that could simply be moved to the new node instead.