Reindex elasticsearch with Logstash

Wednesday, May 20, 2015 | 5 minute read

David Pilato
Reindex elasticsearch with Logstash

Sometimes, you would like to reindex your data to change your mapping or to change your index settings or to move from one server to another or to one cluster to another (think about multiple data centers for example).

For the later you can use Snapshot and Restore feature but if you need to change any index settings, you need something else.

With Logstash 1.5.0 , you can now do it super easily using elasticsearch input and elasticsearch output .

Let’s do it!

The old cluster

Let’s say you have already elasticsearch 1.5.2 up and running on localhost:9200 with cluster name old.

bin/elasticsearch --cluster.name=old

The cluster has an existing index named person. It has 5 shards and 1 million documents.

sense

sense

The new cluster

Let’s start a new cluster. It will run on localhost:9201 with cluster name new:

bin/elasticsearch --cluster.name=new

It’s empty:

curl -XGET "http://localhost:9201/person"
{
   "error": "IndexMissingException[[person] missing]",
   "status": 404
}

Installing logstash

So let’s download and install logstash 1.5.0:

wget http://download.elastic.co/logstash/logstash/logstash-1.5.0.tar.gz
tar xzf logstash-1.5.0.tar.gz
cd logstash-1.5.0

Now we can define our logstash config file logstash.conf:

input {
  # We read from the "old" cluster
  elasticsearch {
    hosts => [ "localhost" ]
    port => "9200"
    index => "person"
    size => 500
    scroll => "5m"
    docinfo => true
  }
}

output {
  # We write to the "new" cluster
  elasticsearch {
    host => "localhost"
    port => "9201"
    protocol => "http"
    index => "%{[@metadata][_index]}"
    index_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
  # We print dots to see it in action
  stdout {
    codec => "dots"
  }
}

Run and fix

Run it:

bin/logstash -f logstash.conf

Check and fix documents

What does it produce?

curl -XGET "http://localhost:9200/person/person/AU1wqyQWZJKU8OibfxgH"
{
   "_index": "person",
   "_type": "person",
   "_id": "AU1wqyQWZJKU8OibfxgH",
   "_version": 1,
   "found": true,
   "_source": {
      "name": "Tali Elyne",
      "dateOfBirth": "1955-05-03",
      "gender": "female",
      "children": 2,
      "marketing": {
         "cars": null,
         "shoes": null,
         "toys": null,
         "fashion": null,
         "music": null,
         "garden": null,
         "electronic": null,
         "hifi": null,
         "food": 846
      },
      "address": {
         "country": "Germany",
         "zipcode": "0099",
         "city": "Bonn",
         "countrycode": "DE",
         "location": [
            7.075943707068682,
            50.72883500730124
         ]
      }
   }
}

Let’s compare that with the other cluster:

curl -XGET "http://localhost:9201/person/person/AU1wqyQWZJKU8OibfxgH"
{
   "_index": "person",
   "_type": "person",
   "_id": "AU1wqyQWZJKU8OibfxgH",
   "_version": 1,
   "found": true,
   "_source": {
      "name": "Tali Elyne",
      "dateOfBirth": "1955-05-03",
      "gender": "female",
      "children": 2,
      "marketing": {
         "cars": null,
         "shoes": null,
         "toys": null,
         "fashion": null,
         "music": null,
         "garden": null,
         "electronic": null,
         "hifi": null,
         "food": 846
      },
      "address": {
         "country": "Germany",
         "zipcode": "0099",
         "city": "Bonn",
         "countrycode": "DE",
         "location": [
            7.075943707068682,
            50.72883500730124
         ]
      },
      "@version": "1",
      "@timestamp": "2015-05-20T09:53:44.089Z"
   }
}

Logstash has just added @version and @timestamp fields. We might want to filter them out using Mutate filter plugin and its remove_field :

filter {
  mutate {
    remove_field => [ "@timestamp", "@version" ]
  }
}

Check and fix mapping

Actually, logstash reads here the _source field from existing documents and reinject them directly into a new cluster. But logstash does not take care at all about mappings.

If we compare the old mapping and the new one, here is what we get:

curl -XGET "http://localhost:9200/person/person/_mapping"
{
   "person": {
      "mappings": {
         "person": {
            "properties": {
               "address": {
                  "properties": {
                     "city": {
                        "type": "string",
                        "index": "not_analyzed"
                     },
                     "country": {
                        "type": "string",
                        "index": "not_analyzed"
                     },
                     "countrycode": {
                        "type": "string",
                        "index": "not_analyzed"
                     },
                     "location": {
                        "type": "geo_point"
                     },
                     "zipcode": {
                        "type": "string"
                     }
                  }
               },
               "children": {
                  "type": "long"
               },
               "dateOfBirth": {
                  "type": "date",
                  "format": "dateOptionalTime"
               },
               "gender": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "marketing": {
                  "properties": {
                     "cars": {
                        "type": "long"
                     },
                     "electronic": {
                        "type": "long"
                     },
                     "fashion": {
                        "type": "long"
                     },
                     "food": {
                        "type": "long"
                     },
                     "garden": {
                        "type": "long"
                     },
                     "hifi": {
                        "type": "long"
                     },
                     "music": {
                        "type": "long"
                     },
                     "shoes": {
                        "type": "long"
                     },
                     "toys": {
                        "type": "long"
                     }
                  }
               },
               "name": {
                  "type": "string"
               }
            }
         }
      }
   }
}
curl -XGET "http://localhost:9201/person/person/_mapping"
{
   "person": {
      "mappings": {
         "person": {
            "properties": {
               "address": {
                  "properties": {
                     "city": {
                        "type": "string"
                     },
                     "country": {
                        "type": "string"
                     },
                     "countrycode": {
                        "type": "string"
                     },
                     "location": {
                        "type": "double"
                     },
                     "zipcode": {
                        "type": "string"
                     }
                  }
               },
               "children": {
                  "type": "long"
               },
               "dateOfBirth": {
                  "type": "date",
                  "format": "dateOptionalTime"
               },
               "gender": {
                  "type": "string"
               },
               "marketing": {
                  "properties": {
                     "cars": {
                        "type": "long"
                     },
                     "electronic": {
                        "type": "long"
                     },
                     "fashion": {
                        "type": "long"
                     },
                     "food": {
                        "type": "long"
                     },
                     "garden": {
                        "type": "long"
                     },
                     "hifi": {
                        "type": "long"
                     },
                     "music": {
                        "type": "long"
                     },
                     "shoes": {
                        "type": "long"
                     },
                     "toys": {
                        "type": "long"
                     }
                  }
               },
               "name": {
                  "type": "string"
               }
            }
         }
      }
   }
}

We can see some obvious differences!

 "location": {
    "type": "geo_point"
 }
 "location": {
    "type": "double"
 }

So we can fix that by creating first the mapping we want to apply before indexing any data. At this point you can obviously change the original mapping to whatever you want. For example, change an analyzer. You can also define new settings for your index. By default, elasticsearch creates 5 shards and 1 replica for each shard. But once again, you can control that:

curl -XDELETE "http://localhost:9201/person"
curl -XPUT "http://localhost:9201/person" -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
curl -XPUT "http://localhost:9201/person/person/_mapping" -d'
{
  "person": {
    "properties": {
      "address": {
        "properties": {
          "city": {
            "type": "string",
            "index": "not_analyzed"
          },
          "country": {
            "type": "string",
            "index": "not_analyzed"
          },
          "countrycode": {
            "type": "string",
            "index": "not_analyzed"
          },
          "location": {
            "type": "geo_point"
          },
          "zipcode": {
            "type": "string"
          }
        }
      },
      "children": {
        "type": "long"
      },
      "dateOfBirth": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "gender": {
        "type": "string",
        "index": "not_analyzed"
      },
      "marketing": {
        "properties": {
          "cars": {
            "type": "long"
          },
          "electronic": {
            "type": "long"
          },
          "fashion": {
            "type": "long"
          },
          "food": {
            "type": "long"
          },
          "garden": {
            "type": "long"
          },
          "hifi": {
            "type": "long"
          },
          "music": {
            "type": "long"
          },
          "shoes": {
            "type": "long"
          },
          "toys": {
            "type": "long"
          }
        }
      },
      "name": {
        "type": "string"
      }
    }
  }
}'

And now, you can reindex your data again!

bin/logstash -f logstash.conf
sense

sense

Changing index or type names

Note that you can of course reindex in another index name if you want to, change the type name, even the id! 😀

  elasticsearch {
    host => "localhost"
    port => "9201"
    protocol => "http"
    index => "europe_people"
    index_type => "someone"
    document_id => "%{[@metadata][_id]}"
  }

© 2010 - 2025 David Pilato

🌱 Generated from 🇫🇷 with ❤️ on Sat Jan 11, 2025 at 08:22:25 UTC
Powered by Hugo with theme Dream.

Who am I?

Developer | Evangelist at elastic and creator of the Elastic French User Group . Frequent speaker about all things Elastic, in conferences, for User Groups and in companies with BBL talks . In my free time, I enjoy coding and DeeJaying , just for fun. Living with my family in Cergy, France.

Details

I discovered Elasticsearch project in 2011. After contributed to the project and created open source plugins for it, David joined elastic the company in 2013 where he is Developer and Evangelist. He also created and still actively managing the French spoken language User Group. At elastic, he mainly worked on Elasticsearch source code, specifically on open-source plugins. In his free time, he likes talking about elasticsearch in conferences or in companies (Brown Bag Lunches AKA BBLs ). He is also author of FSCrawler project which helps to index your pdf, open office, whatever documents in elasticsearch using Apache Tika behind the scene.

Visited countries

You can see here the countries I have visited so far. Most of them are for business purpose but who said you can not do both: business and leisure?

38 countries visited

Social Links