Indexing your office documents with Elastic and FSCrawler

Played 15 times
Videos 7
First Mar 2021
Last Apr 2023
Title

Indexing your office documents with Elastic and FSCrawler

Abstract

You have plenty of Open Office, Microsoft Office, PDF, and image documents, and you want to search both their metadata and their content. How can you do that?

In this talk, David will explain how [Apache Tika](https://tika.apache.org/) can be used for that and how to combine this fantastic library with the Elastic Stack:

* Elasticsearch [ingest-attachment plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment.html)
* [FSCrawler](https://github.com/dadoonet/fscrawler)
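
As a rough illustration of the first option, here is a minimal sketch of the two JSON request bodies that the ingest attachment processor works with. It only builds the payloads and does not talk to a cluster. The field name `data` is an illustrative choice, not something taken from the talk; the helper names `attachment_pipeline` and `document_body` are hypothetical.

```python
import base64
import json


def attachment_pipeline(field: str = "data") -> dict:
    """Body for creating an ingest pipeline that uses the attachment processor.

    `field` is the (assumed) name of the document field holding the
    base64-encoded binary file.
    """
    return {
        "description": "Extract text and metadata from binary documents",
        "processors": [{"attachment": {"field": field}}],
    }


def document_body(raw: bytes, field: str = "data") -> dict:
    """Body for indexing one file; the processor expects base64 content."""
    return {field: base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    # Print both payloads so they can be inspected or pasted into a console.
    print(json.dumps(attachment_pipeline(), indent=2))
    print(json.dumps(document_body(b"Hello FSCrawler")))
```

Under these assumptions, the first body would be sent with `PUT _ingest/pipeline/attachment` and the second indexed with `?pipeline=attachment`; by default the extracted text and metadata end up under an `attachment` field. FSCrawler takes a different route: it crawls a directory and runs Tika itself before sending documents to Elasticsearch.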

Title

Indexing your office documents with the Elastic Stack and FSCrawler

Abstract

You have tons of Open Office, Microsoft Office, and PDF documents on hand, and even images, and you would like to be able to search their metadata and the content itself.

How can you do that? Especially since the announcement of the end of the Google Search Appliance.

In this session, David will explain how [Apache Tika](https://tika.apache.org/) can provide this service and how to combine this fantastic library with Elasticsearch:

* Elasticsearch [ingest-attachment plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment.html)
* [FSCrawler](https://github.com/dadoonet/fscrawler)

Resources

Useful resources related to this talk.

© 2010 - 2026 David Pilato

Details

David discovered the Elasticsearch project in 2011. After contributing to the project and creating open source plugins for it, he joined Elastic, the company, in 2013, where he is a Developer and Evangelist. He also created and still actively manages the French-speaking User Group. At Elastic, he has mainly worked on the Elasticsearch source code, specifically on open source plugins. In his free time, he likes talking about Elasticsearch at conferences or in companies (Brown Bag Lunches, AKA BBLs). He is also the author of the FSCrawler project, which helps index your PDF, Open Office, and other documents in Elasticsearch using Apache Tika behind the scenes.

Who am I?

Developer | Evangelist at Elastic and creator of the Elastic French User Group. Frequent speaker about all things Elastic, in conferences, for User Groups and in companies with BBL talks. In my free time, I enjoy coding and deejaying as DJ Elky, just for fun. Living with my children in Cergy, France.
