David Pilato
Developer | Evangelist at Elastic
20+ years of experience, mostly in Java. Living in Cergy, France.

Self Introduction

Developer | Evangelist at elastic and creator of the Elastic French User Group. Frequent speaker about all things Elastic at conferences, for User Groups, and in companies through BBL talks. In my free time, I enjoy coding and DeeJaying, just for fun. Living with my family in Cergy, France.


David discovered the Elasticsearch project in 2011. After contributing to the project and creating open source plugins for it, he joined elastic, the company, in 2013, where he is a Developer and Evangelist. He also created, and still actively manages, the French-speaking User Group. At elastic, he has mainly worked on the Elasticsearch source code, specifically on open-source plugins. In his free time, he likes talking about elasticsearch at conferences or in companies (Brown Bag Lunches, AKA BBLs). He is also the author of the FSCrawler project, which helps you index your PDF, Open Office, and other documents in elasticsearch, using Apache Tika behind the scenes.

Visited countries

You can see here the countries I have visited so far. Most of those trips were for business purposes, but who said you cannot do both: business and leisure?

38 countries visited

I have been missing you! Indeed, last year I was not able to publish my anniversary blog post, as I have done every year since I joined Elastic 9 years ago. That was actually for a technical reason. I was using an old, unmaintained blogging platform, and it took me a looooong time before I was able to invest the time to switch everything to Hugo. So here we go! This year celebrates my 9-year anniversary at elastic, but also a new blogging system.
What a ride! From 10 employees to around 2000 now. As I imagined 8 years ago, I still think that Elasticsearch (the product) and elastic (the company) are successful. Becoming a public company did not change my daily activities much. I'm still on the road meeting and building the community, specifically in France, and making sure people share the same love that we have internally for the products we are building. This year, I'd like to focus this anniversary blog post on a few items:
When I joined Elastic (formerly Elasticsearch), it was a startup with 10 employees plus the founders. As one of those first employees, I was invited (with #elkie and my wife) to the NYSE event where Elastic was listed under the ESTC symbol. Some of us were there (Rashid, Karel, myself, Igor, Costin, Luca, Clinton). Yeah. You are probably not used to seeing us wearing suits! :) If you want to read my story again, it's there:
This blog post is part of a series of 3: Importing the Bano dataset with Logstash; Using Logstash to look up addresses in the Bano index; Using Logstash to enrich an existing dataset with Bano. In the previous post, we described how to transform a postal address into a normalized one, along with its geo location point, or transform a geo location point into a postal address. Now let's say we have an existing dataset we want to enrich.
This blog post is part of a series of 3: Importing the Bano dataset with Logstash; Using Logstash to look up addresses in the Bano index; Using Logstash to enrich an existing dataset with Bano. In the previous post, we described how we indexed the data coming from the BANO project, so we now have indices containing all the French postal addresses. Let's see what we can do with this dataset. Searching for addresses. Good. Can we use a search engine to search?
This blog post is part of a series of 3: Importing the Bano dataset with Logstash; Using Logstash to look up addresses in the Bano index; Using Logstash to enrich an existing dataset with Bano. I'm not really sure why, but I love the postal address use case. I have often had to deal with that information in my career. Very often the information is not well formatted, so it's hard to find what you need when your input is a not-so-clean dataset.
What a milestone! Can you imagine how much the company has changed in the last 5 years? From 10 employees when I joined to more than 700 now! If you want to read my story again, it's there:
2013: Once upon a time
2014: Once upon a time: a year later
2015: Once upon a time: Make your dreams come true
2016: 3 years! Time flies!
2017: 4 years at elastic!
Before talking about what the last 5 years brought me, let's modify a bit the script I wrote last year.
This post is starting to become a long series 😊 Yeah! That's amazing! I just spent 4 years working at elastic and I'm starting my happy 5th year! If you want to read my story again, it's there:
2013: Once upon a time
2014: Once upon a time: a year later
2015: Once upon a time: Make your dreams come true
2016: 3 years! Time flies!
This year, I will celebrate by writing a new tutorial.

In a recent post, we saw how to create real integration tests. Those tests launch a real elasticsearch cluster, run some tests you write with JUnit or your favorite test framework, then stop the cluster. But sometimes you may want to add existing plugins to your integration test cluster. For example, you might want to use X-Pack to bring fantastic features such as Security, Alerting, Monitoring, Graph and Reporting. Let's see how you can do that with Maven and Ant again.

This blog post is part of a series which will teach you: How to write a plugin for elasticsearch 5.0 using Maven. How to add a new REST endpoint plugin to elasticsearch 5.0. How to use Transport Action classes (what you are reading now). How I wrote the ingest-bano plugin, which will hopefully be released soonish. In this plugin, new REST endpoints have been added. In the previous article, we discovered how to add a REST plugin.
This blog post is part of a series which will teach you: How to write a plugin for elasticsearch 5.0 using Maven. How to add a new REST endpoint plugin to elasticsearch 5.0 (what you are reading now). How I wrote the ingest-bano plugin, which will hopefully be released soonish. In this plugin, new REST endpoints have been added. Imagine that you wish to add a new REST endpoint so you can send requests like:
Integration tests
How do you run them? Often, you are tempted to run the services you want to test from JUnit, for example. In elasticsearch, you can extend the ESIntegTestCase class, which will start a cluster with a given number of nodes.

public class BanoPluginIntegrationTest extends ESIntegTestCase {
    public void testPluginIsLoaded() throws Exception {
        // Your code here
    }
}

But to be honest, the test you are running does not guarantee that you will have the same result in production.
This blog post is part of a series which will teach you: How to write a plugin for elasticsearch 5.0 using Maven. How to write an ingest plugin for elasticsearch 5.0 (what you are reading now). How I wrote the ingest-bano plugin, which will hopefully be released soonish. Today, we will focus on writing an Ingest plugin for elasticsearch. Hey! Wait! You wrote Ingest? What is that? Ingest is a new feature coming in elasticsearch 5.
Elasticsearch 5.0 switched to Gradle in October 2015. You can obviously write a plugin using Gradle if you wish, and you could benefit from all the goodies the elasticsearch team wrote when it comes to integration tests and so on. My colleague Alexander Reelsen, aka Spinscale on Twitter, wrote a super nice template if you wish to create an Ingest plugin for 5.0. Hey! Wait! You wrote Ingest? What is that? Ingest is a new feature coming in elasticsearch 5.
Sounds like cool music, right? At least it is one of my favorite tracks. Maybe some of you already know that I enjoy doing some DeeJaying for my friends. But today, I want to talk about another kind of beats. Elastic beats! Elastic Beats Actually my favorite funky music track is one from George Duke: Reach out! But that is another story…
Beats So what are beats?
3 years! Can you imagine that? Already 3 years spent working at elastic? Time flies! 2015 was an uncommon year for me. Not because Marty McFly and Doc Emmett Brown finally arrived…
Not because Han Solo, Leia and friends were finally back again…
But for technical and also personal reasons. On the personal side, I had to deal with two major issues which caused some slowdown in my professional activities.
I just discovered a nice video which explains Zipf's law. I'm wondering if I can index the French lexicon from Université de Savoie and find some funny things based on that…
Download the French words:

wget http://www.lexique.org/listes/liste_mots.txt
head -20 liste_mots.txt

What do we have? It's a CSV file (tabulation as separator):

1_graph  8_frantfreqparm
0        279.84
1        612.10
2        1043.90
3        839.32
4        832.23
5        913.87
6        603.42
7        600.61
8        908.03
9        1427.
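Before indexing anything, the Zipf prediction itself is easy to check: for a rank-frequency table like the one above, rank × frequency should stay roughly constant. Here is a minimal, stdlib-only Java sketch of that check (the class name and the synthetic sample data are mine, not from the original post):

```java
import java.util.*;
import java.util.stream.*;

// Sketch: parse "word<TAB>frequency" lines (the shape of liste_mots.txt)
// and compute rank * frequency, which Zipf's law predicts to be near-constant.
public class ZipfSketch {

    // Parse tab-separated "word<TAB>frequency" lines into an ordered map.
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> freqs = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            freqs.put(parts[0], Double.parseDouble(parts[1]));
        }
        return freqs;
    }

    // Return rank * frequency for each word, highest frequency first.
    static List<Double> rankTimesFrequency(Map<String, Double> freqs) {
        List<Double> sorted = freqs.values().stream()
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
        List<Double> products = new ArrayList<>();
        for (int rank = 1; rank <= sorted.size(); rank++) {
            products.add(rank * sorted.get(rank - 1));
        }
        return products;
    }

    public static void main(String[] args) {
        // Synthetic data following an ideal Zipf distribution (1000 / rank).
        List<String> lines = List.of("le\t1000", "de\t500", "un\t333.33", "et\t250");
        System.out.println(rankTimesFrequency(parse(lines)));
    }
}
```

On real data the products will only be approximately equal, which is exactly the kind of funny thing worth eyeballing once the lexicon is indexed.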
I gave a BBL talk recently and, while chatting with attendees, one of them told me about a simple use case he covered with elasticsearch: indexing metadata files on a NAS with a simple ls -lR-like command. He needs to be able to search the NAS for files when a user wants to restore a deleted file. As you can imagine, a search engine is super helpful when you have hundreds of millions of files!
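The crawling half of that use case, collecting one metadata "document" per file, needs nothing beyond the JDK. This is a rough sketch of the idea, not the attendee's actual setup; the class name and field names are mine:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

// Sketch: walk a directory tree the way an `ls -lR`-style crawler would,
// producing one metadata map per file, ready to be sent to elasticsearch.
public class FileMetadataCrawler {

    static List<Map<String, Object>> crawl(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> {
                        Map<String, Object> doc = new HashMap<>();
                        doc.put("path", p.toString());
                        doc.put("name", p.getFileName().toString());
                        try {
                            doc.put("size", Files.size(p));
                            doc.put("modified", Files.getLastModifiedTime(p).toMillis());
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                        return doc;
                    })
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory instead of a real NAS mount.
        Path tmp = Files.createTempDirectory("nas");
        Files.writeString(tmp.resolve("report.pdf"), "fake content");
        System.out.println(crawl(tmp).size() + " file(s) found");
    }
}
```

Each map could then be serialized to JSON and pushed through the bulk API; the deleted-file search is just a query on these documents.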
Some months ago, I published a recipe on how to index Twitter with Logstash and Elasticsearch. I have the same need today, as I want to monitor Twitter when we run the elastic FR meetup (join us, by the way, if you are in France!). Well, this recipe can be really simplified, and actually I don't want to waste my time building and managing elasticsearch and Kibana clusters anymore. Let's use a Found by elastic cluster instead.
This article is based on the Recommender System with Mahout and Elasticsearch tutorial created by MapR. It now uses the 20M MovieLens dataset, which contains 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users, and was released in April 2015. The format has changed a bit with this recent version, so I needed to adapt the existing scripts to the new format. Prerequisites: download the 20M MovieLens dataset. Unzip it.
Recently, I got a MySQL database dump and I was thinking of importing it into elasticsearch. The first idea which popped up was:
install MySQL
import the database
read the database with Logstash and import into elasticsearch
drop the database
uninstall MySQL
Well, I found that some of those steps are really not needed. I can actually use the ELK stack and create a simple recipe which can be used to import SQL dump scripts without needing to actually load the data into a database and then read it again from the database.
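The recipe in the post does this with a Logstash pipeline, but the core idea, extracting row values straight out of the dump's INSERT statements without ever loading a database, can be sketched in a few lines of Java. This is a deliberately naive illustration (it does not handle commas or parentheses inside quoted values); the class name and sample statement are mine:

```java
import java.util.*;
import java.util.regex.*;

// Sketch: pull value tuples out of a MySQL dump's INSERT statements
// with a regular expression, skipping the load-into-MySQL step entirely.
public class SqlDumpParser {

    // Matches one parenthesized value tuple, e.g. (1,'David','Cergy').
    private static final Pattern ROW = Pattern.compile("\\(([^)]*)\\)");

    // Extract every value tuple from an INSERT statement.
    static List<List<String>> rows(String insertStatement) {
        List<List<String>> rows = new ArrayList<>();
        Matcher m = ROW.matcher(insertStatement);
        while (m.find()) {
            List<String> values = new ArrayList<>();
            for (String raw : m.group(1).split(",")) {
                // Trim whitespace and strip surrounding single quotes.
                values.add(raw.trim().replaceAll("^'|'$", ""));
            }
            rows.add(values);
        }
        return rows;
    }

    public static void main(String[] args) {
        String insert = "INSERT INTO person VALUES (1,'David','Cergy'),(2,'Malloum','Paris');";
        System.out.println(rows(insert));
    }
}
```

Each extracted tuple can then be turned into a JSON document and bulk-indexed, which is exactly what the Logstash grok/ruby filters do in the real recipe.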
I'm often running some demos during conferences where we have a booth. Like many others, I'm using a Twitter feed as my datasource. I had been using the Twitter river plugin for many years but, you know, rivers have been deprecated. Logstash 1.5.0 provides a safer and more flexible way to deal with tweets with its twitter input. Let's do it! Let's assume that you already have elasticsearch 1.5.2, Logstash 1.5.0 and Kibana 4.0.2 running on your laptop or on a cloud instance.
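The full recipe is in the post; as a rough sketch, a minimal pipeline built around the twitter input could look like the following. The credential values are placeholders (you create them on the Twitter developer site), the index name is an example, and some option names (for example host vs hosts on the elasticsearch output) differ between Logstash versions, so check the docs for yours:

```conf
input {
  twitter {
    # Hypothetical credentials: replace with your own application keys
    consumer_key       => "..."
    consumer_secret    => "..."
    oauth_token        => "..."
    oauth_token_secret => "..."
    keywords           => ["elasticsearch", "elastic"]
  }
}
output {
  elasticsearch {
    host  => "localhost"
    index => "twitter"
  }
}
```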
Sometimes, you would like to reindex your data to change your mapping, to change your index settings, or to move from one server or one cluster to another (think about multiple data centers, for example). For the latter, you can use the Snapshot and Restore feature, but if you need to change any index settings, you need something else. With Logstash 1.5.0, you can now do it super easily using the elasticsearch input and the elasticsearch output.
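As a sketch of that approach (not the exact configuration from the post), a reindexing pipeline simply chains the elasticsearch input and output. The index names here are examples, and option names may differ between Logstash versions:

```conf
input {
  # Read every document from the source index
  elasticsearch {
    host  => "localhost"
    index => "old-index"
  }
}
output {
  # Write the documents into the target index,
  # which can live on a different cluster and carry new settings/mappings
  elasticsearch {
    host  => "localhost"
    index => "new-index"
  }
}
```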
Using a Found by elastic cluster helps a lot in getting a ready-to-use, managed elasticsearch cluster. I started my own cluster yesterday to power the brownbaglunch.fr website (work in progress) and it was ready to use after a few clicks! It's a kind of magic! But I ran into an issue when you secure it and use the elasticsearch JavaScript client. Creating your cluster: Found Console. Adding ACLs: by default, your cluster is open, but you can fix that by opening the "Access Control" menu.
I recently gave a talk at Voxxed Istanbul 2015 and I'd like to share the story of this talk here. The talk was about adding a real search engine to your legacy application. Here, "legacy" means an application which is still using SQL statements to execute search requests. Our current CRM application can display our customers. Each person is represented as a Person bean and has some properties like name, dateOfBirth, children, country, city, plus some metrics related to the number of clicks each person did on the car or food buttons in our mobile application (their centers of interest, that is).
I recently gave a talk at Devoxx France 2015 with Colin Surprenant and I'd like to share here some of the examples we used in the talk. The talk was about "what does my data look like?". We said that our manager was asking us to answer some questions: Who are our customers? How do they use our services? What do they think about us on Twitter? Our CRM database: we have a PostgreSQL database containing our data.
Recently I saw a tweet where the Capitaine Train team started to open the data they have collected and enriched or corrected. "Ouvrez, ouvrez, les données structurées. Capitaine Train libère les gares" ("Open up, open up, the structured data. Capitaine Train is freeing the train stations"): https://t.co/y6DjWsbALF #opendata — Trainline France (@trainline_fr), April 23, 2015. I decided to play a bit with the ELK stack and create a simple recipe which can be used with any other CSV-like data. Prerequisites. You will need: Logstash: I'm using 1.5.0-rc3. Elasticsearch: I'm using 1.
I was trying to use Hibernate 4.3.8.Final with Log4J2 and I spent some hours finding out why Hibernate was not using Log4J2, even though it was declared in my pom.xml file. Actually, I had hit issue JBLOGGING-107. The workaround is simply to add a more recent jboss-logging dependency than the one shipped by default with Hibernate 4.3.8.Final:

<dependency>
  <groupId>org.jboss.logging</groupId>
  <artifactId>jboss-logging</artifactId>
  <version>3.2.1.Final</version>
</dependency>
Oh wait! Already 2 years spent working for Elasticsearch? Time flies! After the first year, I wrote that I had done 58 talks in 4 countries and 37 towns, traveling about 18,000 kilometers. I was pretty sure that things would continue to grow. This year, I spoke 78 times! Around 2 talks per week! I traveled around 48,000 kilometers: 8,000 km more than the earth's circumference! I still can't believe it…
12 countries. And no need to say that I love giving talks and sharing my enthusiasm about Elasticsearch!
I joined Elasticsearch Inc one year ago. Those were pretty exciting days! But now…
It's more than that! Really! You could think that after one year my motivation would start to decrease. I have the totally opposite feeling. I am still excited by my job, by the company and by the project, but most of all by the amazing team I'm lucky to work with! Everyone is different, and each of us adds different value to Elasticsearch. Personally, I learn a lot from my co-workers.
Once upon a time…
In fact, 2 years ago, I was looking for a way to make Hibernate Search distributed over multiple nodes. My first idea was to store the indexes in a single database shared by my nodes. Yes, it's a stupid idea in terms of performance, but I wanted to try to build it. Digging through source code, I came across the JdbcDirectory class from the Compass project. And I saw something on the Compass front page talking about the future of Compass and Elasticsearch.
Il Ă©tait une fois
 En fait, il y a 2 ans, je cherchais un moyen pour distribuer Hibernate search sur plusieurs noeuds. Ma premiĂšre idĂ©e Ă©tait de stocker les index dans une base de donnĂ©es partagĂ©e par les diffĂ©rents noeuds. Oui ! Il s’agit d’une idĂ©e stupide en terme de performances, mais j’avais envie d’essayer et de construire ce modĂšle. AprĂšs avoir cherchĂ© du code source, je suis finalement tombĂ© sur la classe JdbcDirectory du projet Compass.
With Malloum, we have just published our first joint open-source project: Scrut My Docs! Technical overview. Our goals:
Provide a turnkey web application to index documents from your local drives.
Provide the Elasticsearch community with a base template to develop your own webapp for simple ("Google-like") search.
Help Elasticsearch Java beginners with concrete Java examples.
The technologies used: Elasticsearch!
Et voilĂ , la premiĂšre release de la factory spring vient d’ĂȘtre faite. Vous pouvez donc maintenant l’utiliser dans vos projets Maven : <dependency> <groupId>fr.pilato.spring</groupId> <artifactId>spring-elasticsearch</artifactId> <version>0.0.1</version> </dependency> Le code source est disponible sur github.
Natively, Elasticsearch exposes all of its services without any authentication, so a command like curl -XDELETE http://localhost:9200/myindex can cause a lot of unwanted damage. Moreover, if you develop a jQuery application with direct access from the client machine to your Elasticsearch cluster, the risk that a user plays around with your cluster is high! But don't panic…
Sonian Inc. has open sourced its Jetty plugin for Elasticsearch, to our great delight 😉
The need: there is a feature in Hibernate that I really like: the automatic update of the database schema based on the entities it handles. My need is to do almost the same thing with Elasticsearch. That is, I want to be able to apply a mapping for a given type every time I start my project (in this case, a webapp). Building on the project developed by Erez Mazor, I developed a Spring factory meant to start Elasticsearch clients (or even nodes).
Hello! It was with some emotion and pride that I learned last Saturday that my talk about Elasticsearch had been selected for Devoxx France. Devoxx France is a conference for Developers, held in Paris from April 18 to 20, 2012. Being part of it, among such incredible talent, is a true honor. I am all the more delighted as I will get to talk about the subject I have been passionate about for a year now: Elasticsearch. Originally, Shay Banon himself was supposed to come and talk to us about analyzing data with Elasticsearch facets, but unfortunately he will not be able to attend.
There are two ways to access elasticsearch from Java:
Register a client node in the elasticsearch cluster.
Use a "simple" client.
Client node in an elasticsearch cluster: the idea of this approach is to build an elasticsearch node which starts with the same characteristics as an indexing and search node, except that we tell it that it will not host any data. For that, we use the following property: node.data=false. It indicates that the node we are starting will not host data.
Elasticsearch has a notion of river which, as its name suggests, lets data flow from a source into elasticsearch. As the data arrives, the river carries it along and sends it to be indexed in elasticsearch. Four rivers are available out of the box:
CouchDB, which indexes everything new in a CouchDB database. See also this article about it.
RabbitMQ, which fetches documents from an asynchronous processing queue (JMS-like).
Twitter, which indexes your twitter message stream, for example.
Wikipedia, which indexes everything new in the encyclopedia as it gets published.
First steps: I started by tinkering a bit with the CouchDB river to add a few features my colleagues needed:
The adventures with Elasticsearch continue. How many times have I said recently that this project is absolutely brilliant and that, in my opinion, it will be one of the major projects of the coming years…
Who doesn't need a search engine? Who has never had the hassle of building one themselves, or of using building blocks that can help, at the price of a more or less significant implementation complexity? I think we have all been there!
AprĂšs avoir testĂ© Elasticsearch, me voici parti pour regarder ce monde Ă©trange qu’on appelle le NoSQL
 A dire vrai, j’ai entendu ce mot il y a quelques annĂ©es, sans jamais vraiment m’y interesser
 AprĂšs tout, une base de donnĂ©es non SQL, ça n’est tout simplement pas possible !!! Puis, Ă  force de cotoyer le monde d’Elasticsearch et les technos JSon et REST, je me lance. Pour des raisons trĂšs pratiques, je choisis CouchDB de Apache. D’une part, il est directement intĂ©grable avec Elasticsearch, et Ă  la lecture rapide de sa documentation, il semble rĂ©pondre Ă  un des besoins auquel une Ă©quipe de mon pĂŽle de dĂ©veloppement est confrontĂ©e.
Elasticsearch, a project that became mature within a few months…
One to watch very closely! While looking for a piece of code to make the Hibernate Search layer easily distributable over a cluster of JBoss machines, I came across the Elasticsearch project. At first, a bit puzzled… Then, I dove in…
I downloaded the project. I unzipped it. I launched it…
A miracle. Within a few seconds, I had a simple tool, in a Cloud, allowing me to index any type of document, to retrieve it, and to search (in the google sense of the word) on any field…
And all of this, whatever the technology used (Java, C#, .
Here is the follow-up to the article about installing a forge. In the end, the time it took to get a machine running Redhat 5 gave the FusionForge team enough time to publish a final release of version 5.0. So here we are, starting this installation, which I propose to describe here. Note that, for the moment, the forge is not fully operational. Some changes to the configuration will have to be made, and I hope to keep this article up to date to describe them.
Jetty peut ĂȘtre trĂšs utile aux projets Maven, notamment dans la phase de tests d’intĂ©gration. Il faut souvent dĂ©ployer l’application sur un serveur type JBoss puis lancer les tests. Avec Jetty, on dispose alors d’un conteneur lĂ©ger qui permet de disposer des fonctionnalitĂ©s essentielles d’un conteneur (webapp, datasource, 
). ProblĂšme : avec la version 7 de Jetty, il faut gĂ©rer l’authentification. Sinon, on obtient une erreur du type : java.lang.IllegalStateException: No LoginService for org.eclipse.jetty.security.authentication.BasicAuthenticator@4095c5ec in ConstraintSecurityHandler@28f52a14@ J’ai trouvĂ© la solution Ă  ce problĂšme sur le blog de Max Berger.
When you launch a WebApp with the Jetty plugin under Maven 2 from a Windows PC, you get an error tracked as JIRA #JETTY-1063:

java.net.URISyntaxException: Illegal character in path at index 18: file:/C:/Documents and Settings/USER/.m2/repository/org/mortbay/jetty/jetty-maven-plugin/

This problem is only fixed in Maven 3. For those who want to stay on Maven 2 (Maven 3 is still in alpha), you have to change the location of the repository to avoid the issue caused by the SPACE character in the path C:\Documents and settings\USER\.
When you use Hibernate to delegate persistence management, you face the classic problem of the LazyInitializationException. Indeed, in a fairly classic design, imagine the following case: Model (or DAO) layer. A POJO class containing an attribute x and a collection cols:

@Entity
@Inheritance(strategy=InheritanceType.SINGLE_TABLE)
public class Dossier {
    @Id @GeneratedValue
    private Long id;
    private String x;
    @OneToMany(cascade=CascadeType.ALL)
    private Collection cols;
    // Getters and setters
}

DAO class: see the blog about using Java 5 generics to avoid having to write the same CRUD methods over and over.
Here is a tip which lets analysts and designers keep using their usual documentation software (oOo or Word), while automatically publishing, along with the site generation, a PDF document readable by everyone.
Description of the setup of the GForge forge for the needs of my IT center. For the internal needs of the French customs, I proposed setting up a forge in order to consolidate our development and project management resources. To stay consistent with other choices made by the administration (the Adullact project), I chose the GFORGE forge. I will describe here the installation process I am going to follow, in order to share this information with other people who might be interested in this approach.
I have just discovered Google App Engine for Java. I will try to complete this article as I make progress with it…
Stay tuned…