Indexer ses documents bureautique avec la suite Elastic et FSCrawler

Nov. 2022

David Pilato

Slides

Abstract

Vous avez sous la main des tonnes de documents Open Office, Microsoft Office, PDF voire des images… Et vous aimeriez être capable de chercher dans leurs meta-données et dans le contenu lui-même.

Comment faire ? Surtout depuis l’annonce de la fin de Google Search Appliance.

Dans cette session, David expliquera comment Apache Tika peut fournir ce service et comment combiner cette fantastique librairie avec elasticsearch :

Resources

The following resources were mentioned during the presentation or are useful additional information.

© 2010 - 2026 David Pilato

🔍 Search is powered by QueryBox. Just hit CTRL+K or CMD+K to start searching.

⚙️ Generated from 🇫🇷 with ❤️ on Wed Jan 28, 2026 at 08:39:25 UTC

🌱 Powered by Hugo with theme Dream and some custom templates.

Details

I discovered Elasticsearch project in 2011. After contributed to the project and created open source plugins for it, David joined elastic the company in 2013 where he is Developer and Evangelist. He also created and still actively managing the French spoken language User Group. At elastic, he mainly worked on Elasticsearch source code, specifically on open-source plugins. In his free time, he likes talking about elasticsearch in conferences or in companies (Brown Bag Lunches AKA BBLs ). He is also author of FSCrawler project which helps to index your pdf, open office, whatever documents in elasticsearch using Apache Tika behind the scene.

Who am I?

Developer | Evangelist at elastic and creator of the Elastic French User Group . Frequent speaker about all things Elastic, in conferences, for User Groups and in companies with BBL talks . In my free time, I enjoy coding and deejaying as DJ Elky , just for fun. Living with my children in Cergy, France.

Social Links