This blog post is part of a series of 3: Importing Bano dataset with Logstash Using Logstash to lookup for addresses in Bano index Using Logstash to enrich an existing dataset with Bano In the previous post, we described how we can transform a postal address to a normalized one with also the geo location point or transform a geo location point to a postal address. Let’s say we have an existing dataset we want to enrich.
This blog post is part of a series of 3: Importing Bano dataset with Logstash Using Logstash to lookup for addresses in Bano index Using Logstash to enrich an existing dataset with Bano In the previous post, we described how we indexed data coming from the BANO project so we now have indices containing all the french postal addresses. Let’s see what we can do now with this dataset. Searching for addresses Good. Can we use a search engine to search?
This blog post is part of a series of 3: Importing Bano dataset with Logstash Using Logstash to lookup for addresses in Bano index Using Logstash to enrich an existing dataset with Bano I’m not really sure why, but I love the postal address use case. Often in my career I had to deal with that information. Very often the information is not well formatted so it’s hard to find the information you need when you have as an input a not so nice dataset.
I just discovered a nice video which explains the Zipf’s law. I’m wondering if I can index the french lexique from Université de Savoie and find some funny things based on that… Download french words wget http://www.lexique.org/listes/liste_mots.txt head -20 liste_mots.txt What do we have? It’s a CSV file (tabulation as separator): 1_graph 8_frantfreqparm 0 279.84 1 612.10 2 1043.90 3 839.32 4 832.23 5 913.87 6 603.42 7 600.61 8 908.03 9 1427.
I gave a BBL talk recently and while chatting with attendees, one of them told me a simple use case he covered with elasticsearch: indexing metadata files on a NAS with a simple ls -lR like command. His need is to be able to search on a NAS for files when a user wants to restore a deleted file. As you can imagine a search engine is super helpful when you have hundreds of millions files!
Some months ago, I published a recipe on how to index Twitter with Logstash and Elasticsearch. I have the same need today as I want to monitor Twitter when we run the elastic FR meetup (join us by the way if you are in France!). Well, this recipe can be really simplified and actually I don’t want to waste my time anymore on building and managing elasticsearch and Kibana clusters anymore. Let’s use a Found by elastic cluster instead.
Recently, I got a database MySQL dump and I was thinking of importing it into elasticsearch. The first idea which pops up was: install MySQL import the database read the database with Logstash and import into elasticsearch drop the database uninstall MySQL Well. I found that some of the steps are really not needed. I can actually use ELK stack and create a simple recipe which can be used to import SQL dump scripts without needing to actually load the data to a database and then read it again from the database.
I’m often running some demos during conferences where we have a booth. As many others, I’m using Twitter feed as my datasource. I have been using Twitter river plugin for many years but, you know, rivers have been deprecated. Logstash 1.5.0 provides a safer and more flexible way to deal with tweets with its twitter input. Let’s do it! Let’s assume that you have already elasticsearch 1.5.2, Logstash 1.5.0 and Kibana 4.0.2 running on your laptop or on a cloud instance.
Sometimes, you would like to reindex your data to change your mapping or to change your index settings or to move from one server to another or to one cluster to another (think about multiple data centers for example). For the later you can use Snapshot and Restore feature but if you need to change any index settings, you need something else. With Logstash 1.5.0, you can now do it super easily using elasticsearch input and elasticsearch output.
I gave recently a talk at Devoxx France 2015 with Colin Surprenant and I’d like to share here some of the examples we used for the talk. The talk was about “what my data look like?”. We said that our manager was asking us to answer some questions: who are our customers? how do they use our services? what do they think about us on Twitter? Our CRM database So we have a PostgreSQL database containing our data.
Recently I saw a tweet where Capitaine Train team started to open data they have collected and enriched or corrected. Ouvrez, ouvrez, les données structurées. Capitaine Train libère les gares : https://t.co/y6DjWsbALF #opendata — Trainline France (@trainline_fr) April 23, 2015 I decided to play a bit with ELK stack and create a simple recipe which can be used with any other CSV like data. Prerequisites You will need: Logstash: I’m using 1.5.0-rc3. Elasticsearch: I’m using 1.