Exploring the Capitaine Train dataset
Recently I saw a tweet announcing that the Capitaine Train team had started to open up the data they have collected, enriched, and corrected.
Open up, open up, the structured data. Capitaine Train frees the train stations: https://t.co/y6DjWsbALF #opendata
— Trainline France (@trainline_fr) April 23, 2015
I decided to play a bit with the ELK stack and build a simple recipe that can be reused with any other CSV-like data.
Prerequisites
You will need:
- Logstash: I’m using 1.5.0-rc3.
- Elasticsearch: I’m using 1.5.2.
- Kibana: I’m using 4.0.2, Mac version.
Download the dataset
wget https://raw.githubusercontent.com/capitainetrain/stations/master/stations.csv
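You can quickly check how big the dataset is:
wc -l stations.csv
At the time of writing, that’s the header line plus 21461 stations.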
What does it look like?
head -3 stations.csv
id;name;slug;uic;uic8_sncf;longitude;latitude;parent_station_id;is_city;country;is_main_station;time_zone;is_suggestable;sncf_id;sncf_is_enabled;idtgv_id;idtgv_is_enabled;db_id;db_is_enabled;idbus_id;idbus_is_enabled;ouigo_id;ouigo_is_enabled;trenitalia_id;trenitalia_is_enabled;ntv_id;ntv_is_enabled;info:fr;info:en;info:de;info:it;same_as
1;Château-Arnoux—St-Auban;chateau-arnoux-st-auban;;;6.0016250000;44.0817900000;;t;FR;f;Europe/Paris;f;FRAAA;t;;f;;f;;f;;f;;f;;f;;;;;
2;Château-Arnoux—St-Auban;chateau-arnoux-st-auban;8775123;87751230;5.997342;44.061499;1;f;FR;t;Europe/Paris;t;FRCAA;t;;f;8700156;f;;f;;f;;f;;f;;;;;
So it’s a CSV file containing some information that might be worth exploring:
- name: obviously, the name of the train station
- longitude and latitude: the location
- country: the ISO country code (2 letters)
- xxx_is_enabled: "t" if the xxx offer exists in this train station (one-letter boolean value)
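Before firing up the ELK stack, plain shell tools already give a feel for the data. For example, this counts the rows per country (the 10th column), skipping the header line:
tail -n +2 stations.csv | cut -d';' -f10 | sort | uniq -c | sort -rn | head
This is a preview of the per-country pie chart we will build in Kibana later.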
Processing with logstash
Let’s start with a blank logstash configuration file, station.conf, which reads standard input and prints each event to standard output using the rubydebug codec:
input {
stdin {}
}
filter {
}
output {
stdout { codec => rubydebug }
}
Launch it a first time and make sure logstash is working fine:
head -2 stations.csv | bin/logstash -f station.conf
You should see something like:
Logstash startup completed
{
"message" => "id;name;slug;uic;uic8_sncf;longitude;latitude;parent_station_id;is_city;country;is_main_station;time_zone;is_suggestable;sncf_id;sncf_is_enabled;idtgv_id;idtgv_is_enabled;db_id;db_is_enabled;idbus_id;idbus_is_enabled;ouigo_id;ouigo_is_enabled;trenitalia_id;trenitalia_is_enabled;ntv_id;ntv_is_enabled;info:fr;info:en;info:de;info:it;same_as",
"@version" => "1",
"@timestamp" => "2015-04-27T11:31:14.329Z",
"host" => "MacBook-Air-de-David-2.local"
}
{
"message" => "1;Château-Arnoux—St-Auban;chateau-arnoux-st-auban;;;6.0016250000;44.0817900000;;t;FR;f;Europe/Paris;f;FRAAA;t;;f;;f;;f;;f;;f;;f;;;;;",
"@version" => "1",
"@timestamp" => "2015-04-27T11:31:14.330Z",
"host" => "MacBook-Air-de-David-2.local"
}
Logstash shutdown completed
CSV parsing
We have a CSV file, so we should use the CSV filter plugin here. We define the separator as ";" instead of the default ",", and we use the columns parameter to name the fields we want to generate instead of the default column_1, column_2, …:
csv {
separator => ";"
columns => [
"id","name","slug","uic","uic8_sncf","longitude","latitude","parent_station_id","is_city","country",
"is_main_station","time_zone","is_suggestable","sncf_id","sncf_is_enabled","idtgv_id","idtgv_is_enabled",
"db_id","db_is_enabled","idbus_id","idbus_is_enabled","ouigo_id","ouigo_is_enabled",
"trenitalia_id","trenitalia_is_enabled","ntv_id","ntv_is_enabled","info_fr",
"info_en","info_de","info_it","same_as"
]
}
When running it again, it now generates more structured data:
Logstash startup completed
{
"message" => [
[0] "id;name;slug;uic;uic8_sncf;longitude;latitude;parent_station_id;is_city;country;is_main_station;time_zone;is_suggestable;sncf_id;sncf_is_enabled;idtgv_id;idtgv_is_enabled;db_id;db_is_enabled;idbus_id;idbus_is_enabled;ouigo_id;ouigo_is_enabled;trenitalia_id;trenitalia_is_enabled;ntv_id;ntv_is_enabled;info:fr;info:en;info:de;info:it;same_as"
],
"@version" => "1",
"@timestamp" => "2015-04-27T11:40:57.936Z",
"host" => "MacBook-Air-de-David-2.local",
"id" => "id",
"name" => "name",
"slug" => "slug",
"uic" => "uic",
"uic8_sncf" => "uic8_sncf",
"longitude" => "longitude",
"latitude" => "latitude",
"parent_station_id" => "parent_station_id",
"is_city" => "is_city",
"country" => "country",
"is_main_station" => "is_main_station",
"time_zone" => "time_zone",
"is_suggestable" => "is_suggestable",
"sncf_id" => "sncf_id",
"sncf_is_enabled" => "sncf_is_enabled",
"idtgv_id" => "idtgv_id",
"idtgv_is_enabled" => "idtgv_is_enabled",
"db_id" => "db_id",
"db_is_enabled" => "db_is_enabled",
"idbus_id" => "idbus_id",
"idbus_is_enabled" => "idbus_is_enabled",
"ouigo_id" => "ouigo_id",
"ouigo_is_enabled" => "ouigo_is_enabled",
"trenitalia_id" => "trenitalia_id",
"trenitalia_is_enabled" => "trenitalia_is_enabled",
"ntv_id" => "ntv_id",
"ntv_is_enabled" => "ntv_is_enabled",
"info_fr" => "info:fr",
"info_en" => "info:en",
"info_de" => "info:de",
"info_it" => "info:it",
"same_as" => "same_as"
}
{
"message" => [
[0] "1;Château-Arnoux—St-Auban;chateau-arnoux-st-auban;;;6.0016250000;44.0817900000;;t;FR;f;Europe/Paris;f;FRAAA;t;;f;;f;;f;;f;;f;;f;;;;;"
],
"@version" => "1",
"@timestamp" => "2015-04-27T11:40:57.938Z",
"host" => "MacBook-Air-de-David-2.local",
"id" => "1",
"name" => "Château-Arnoux—St-Auban",
"slug" => "chateau-arnoux-st-auban",
"uic" => nil,
"uic8_sncf" => nil,
"longitude" => "6.0016250000",
"latitude" => "44.0817900000",
"parent_station_id" => nil,
"is_city" => "t",
"country" => "FR",
"is_main_station" => "f",
"time_zone" => "Europe/Paris",
"is_suggestable" => "f",
"sncf_id" => "FRAAA",
"sncf_is_enabled" => "t",
"idtgv_id" => nil,
"idtgv_is_enabled" => "f", "db_id" => nil,
"db_is_enabled" => "f",
"idbus_id" => nil,
"idbus_is_enabled" => "f",
"ouigo_id" => nil,
"ouigo_is_enabled" => "f",
"trenitalia_id" => nil,
"trenitalia_is_enabled" => "f",
"ntv_id" => nil,
"ntv_is_enabled" => "f",
"info_fr" => nil,
"info_en" => nil,
"info_de" => nil,
"info_it" => nil,
"same_as" => nil
}
Skip header
We can see that we are still parsing the header line, so it would be sent to elasticsearch, which is obviously something we don’t want. We can use Logstash conditionals for that, together with the drop filter plugin:
if [id] == "id" {
drop { }
} else {
# continue processing data
}
Remove duplicated fields
We can see that we have some duplicated content, and some fields are not really needed for our use case:
"message" => [
[0] "id;name;slug;uic;uic8_sncf;longitude;latitude;parent_station_id;is_city;country;is_main_station;time_zone;is_suggestable;sncf_id;sncf_is_enabled;idtgv_id;idtgv_is_enabled;db_id;db_is_enabled;idbus_id;idbus_is_enabled;ouigo_id;ouigo_is_enabled;trenitalia_id;trenitalia_is_enabled;ntv_id;ntv_is_enabled;info:fr;info:en;info:de;info:it;same_as"
],
"@version" => "1",
"@timestamp" => "2015-04-27T11:40:57.936Z",
"host" => "MacBook-Air-de-David-2.local"
So we can remove message, host, @version and @timestamp using the mutate filter plugin and its remove_field option. You can put that in the else part:
mutate {
remove_field => [ "message", "host", "@timestamp", "@version" ]
}
Convert numbers to numbers
We have seen that the CSV plugin generated the latitude and longitude fields as strings:
"longitude" => "6.0016250000",
"latitude" => "44.0817900000",
Using the mutate filter convert option, we can convert our fields to float:
mutate {
convert => { "longitude" => "float" }
convert => { "latitude" => "float" }
}
This will now output:
"longitude" => 6.001625,
"latitude" => 44.08179,
Build a location data structure
Elasticsearch uses a specific data structure called geo_point to deal with geographic location coordinates.
So a point should look like:
"location": {
"lat": x.xxx,
"lon": y.yyy
}
It means that we need to change our latitude and longitude fields into inner fields of a new location field. We can use the mutate rename option for this:
mutate {
rename => {
"longitude" => "[location][lon]"
"latitude" => "[location][lat]"
}
}
This now produces:
"location" => {
"lon" => 6.001625,
"lat" => 44.08179
}
Convert booleans to booleans^H^H^H^H^H^H^H^H strings
We have a lot of fields which look like booleans:
"is_city" => "t",
"is_main_station" => "f",
"is_suggestable" => "f",
"sncf_is_enabled" => "t",
"idtgv_is_enabled" => "f",
Sadly, the mutate convert option does not support boolean yet. It should be merged really soon with #22.
For now, we are going to rely on elasticsearch boolean parsing, which detects a false value when it is provided as "false", "off" or "no".
To do that, we can use the mutate gsub option:
mutate {
gsub => [
"is_city", "t", "true",
"is_city", "f", "false",
"is_main_station", "t", "true",
"is_main_station", "f", "false",
"is_suggestable", "t", "true",
"is_suggestable", "f", "false",
"sncf_is_enabled", "t", "true",
"sncf_is_enabled", "f", "false",
"idtgv_is_enabled", "t", "true",
"idtgv_is_enabled", "f", "false",
"db_is_enabled", "t", "true",
"db_is_enabled", "f", "false",
"idbus_is_enabled", "t", "true",
"idbus_is_enabled", "f", "false",
"ouigo_is_enabled", "t", "true",
"ouigo_is_enabled", "f", "false",
"trenitalia_is_enabled", "t", "true",
"trenitalia_is_enabled", "f", "false",
"ntv_is_enabled", "t", "true",
"ntv_is_enabled", "f", "false"
]
}
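As a side note, if this gsub list feels too verbose, a ruby filter could do the same conversion in a loop. This is just a sketch, assuming the Logstash 1.x event API (event['field']) and the same field list as above:
ruby {
  code => "
    ['is_city', 'is_main_station', 'is_suggestable',
     'sncf_is_enabled', 'idtgv_is_enabled', 'db_is_enabled',
     'idbus_is_enabled', 'ouigo_is_enabled',
     'trenitalia_is_enabled', 'ntv_is_enabled'].each do |f|
      # rewrite the one-letter flags to values elasticsearch parses as booleans
      event[f] = (event[f] == 't') ? 'true' : 'false' unless event[f].nil?
    end
  "
}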
It will now transform our booleans to:
"is_city" => "true",
"is_main_station" => "false",
"is_suggestable" => "false",
"sncf_is_enabled" => "true",
"idtgv_is_enabled" => "false",
Indexing into elasticsearch
Now we have our data fully processed:
{
"id" => "1",
"name" => "Château-Arnoux—St-Auban",
"slug" => "chateau-arnoux-st-auban",
"uic" => nil,
"uic8_sncf" => nil,
"parent_station_id" => nil,
"is_city" => "true",
"country" => "FR",
"is_main_station" => "false",
"time_zone" => "Europe/Paris",
"is_suggestable" => "false",
"sncf_id" => "FRAAA",
"sncf_is_enabled" => "true",
"idtgv_id" => nil,
"idtgv_is_enabled" => "false",
"db_id" => nil,
"db_is_enabled" => "false",
"idbus_id" => nil,
"idbus_is_enabled" => "false",
"ouigo_id" => nil,
"ouigo_is_enabled" => "false",
"trenitalia_id" => nil,
"trenitalia_is_enabled" => "false",
"ntv_id" => nil,
"ntv_is_enabled" => "false",
"info_fr" => nil,
"info_en" => nil,
"info_de" => nil,
"info_it" => nil,
"same_as" => nil,
"location" => {
"lon" => 6.001625,
"lat" => 44.08179
}
}
Let’s say that you already have elasticsearch up and running:
bin/elasticsearch
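A quick curl on the default port tells you the node is alive:
curl "http://localhost:9200/?pretty"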
Defining our mapping
We need to create the right mapping to define our fields. As we have a lot of fields named *_is_enabled or is_*, we can use an index template to map them automatically as booleans whenever the field name matches *is_*.
Also, it’s definitely better to use not_analyzed strings for aggregations, so for each string field we define a raw sub-field which is not analyzed at index time.
This leads us to the following index template, stations_template.json:
{
"template": "stations",
"order": 1,
"settings": {
"number_of_shards": 1
},
"mappings": {
"station": {
"dynamic_templates": [
{
"string_fields": {
"mapping": {
"index": "analyzed",
"omit_norms": true,
"type": "string",
"fields": {
"raw": {
"index": "not_analyzed",
"ignore_above": 256,
"type": "string"
}
}
},
"match_mapping_type": "string",
"match": "*"
}
}, {
"boolean_fields": {
"mapping": {
"type": "boolean"
},
"match": "*is_*"
}
}
],
"_all": {
"enabled": false
},
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
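Logstash will upload this template for us when it starts (see the template parameter in the output below), but you can also register it manually with the index template API to make sure the JSON is valid, and read it back:
curl -XPUT "http://localhost:9200/_template/stations" -d @stations_template.json
curl "http://localhost:9200/_template/stations?pretty"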
Connect logstash and elasticsearch
We are obviously going to use the elasticsearch output plugin to send our data to the elasticsearch cluster. We will:
- use the http protocol
- index our data in the stations index with the station type
- extract the elasticsearch _id field from the id event field
- use our stations template defined in the local stations_template.json file
Let’s add this as an output
in our logstash configuration file:
elasticsearch {
protocol => "http"
host => "localhost"
index => "stations"
index_type => "station"
document_id => "%{id}"
template => "stations_template.json"
template_name => "stations"
}
Test again as usual and you should be able to see your document in elasticsearch:
curl "http://localhost:9200/stations/station/_search?pretty"
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "sncf",
"_type": "gare",
"_id": "1",
"_score": 1,
"_source": {
"id": "1",
"name": "Château-Arnoux—St-Auban",
"slug": "chateau-arnoux-st-auban",
"uic": null,
"uic8_sncf": null,
"parent_station_id": null,
"is_city": "true",
"country": "FR",
"is_main_station": "false",
"time_zone": "Europe/Paris",
"is_suggestable": "false",
"sncf_id": "FRAAA",
"sncf_is_enabled": "true",
"idtgv_id": null,
"idtgv_is_enabled": "false",
"db_id": null,
"db_is_enabled": "false",
"idbus_id": null,
"idbus_is_enabled": "false",
"ouigo_id": null,
"ouigo_is_enabled": "false",
"trenitalia_id": null,
"trenitalia_is_enabled": "false",
"ntv_id": null,
"ntv_is_enabled": "false",
"info_fr": null,
"info_en": null,
"info_de": null,
"info_it": null,
"same_as": null,
"location": {
"lon": 6.001625,
"lat": 44.08179
}
}
}
]
}
}
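You can also check that the template was applied as expected, i.e. that location is mapped as a geo_point and the *is_* fields as booleans:
curl "http://localhost:9200/stations/_mapping?pretty"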
Inject them all
It’s now time to inject all the data we have.
Instead of debugging each line, we can switch the stdout codec from rubydebug to dots.
Here is our final logstash configuration file:
input {
stdin {}
}
filter {
csv {
separator => ";"
columns => [
"id","name","slug","uic","uic8_sncf","longitude","latitude","parent_station_id","is_city","country",
"is_main_station","time_zone","is_suggestable","sncf_id","sncf_is_enabled","idtgv_id","idtgv_is_enabled",
"db_id","db_is_enabled","idbus_id","idbus_is_enabled","ouigo_id","ouigo_is_enabled",
"trenitalia_id","trenitalia_is_enabled","ntv_id","ntv_is_enabled","info_fr",
"info_en","info_de","info_it","same_as"
]
}
if [id] == "id" {
drop { }
} else {
mutate {
remove_field => [ "message", "host", "@timestamp", "@version" ]
}
mutate {
convert => { "longitude" => "float" }
convert => { "latitude" => "float" }
}
mutate {
rename => {
"longitude" => "[location][lon]"
"latitude" => "[location][lat]"
}
}
mutate {
gsub => [
"is_city", "t", "true",
"is_city", "f", "false",
"is_main_station", "t", "true",
"is_main_station", "f", "false",
"is_suggestable", "t", "true",
"is_suggestable", "f", "false",
"sncf_is_enabled", "t", "true",
"sncf_is_enabled", "f", "false",
"idtgv_is_enabled", "t", "true",
"idtgv_is_enabled", "f", "false",
"db_is_enabled", "t", "true",
"db_is_enabled", "f", "false",
"idbus_is_enabled", "t", "true",
"idbus_is_enabled", "f", "false",
"ouigo_is_enabled", "t", "true",
"ouigo_is_enabled", "f", "false",
"trenitalia_is_enabled", "t", "true",
"trenitalia_is_enabled", "f", "false",
"ntv_is_enabled", "t", "true",
"ntv_is_enabled", "f", "false"
]
}
}
}
output {
stdout { codec => dots }
elasticsearch {
protocol => "http"
host => "localhost"
index => "stations"
index_type => "station"
document_id => "%{id}"
template => "stations_template.json"
template_name => "stations"
}
}
Let’s use it:
cat stations.csv | bin/logstash -f station.conf
At the end, check that everything has been correctly indexed:
curl "http://localhost:9200/sncf/gare/_search?pretty&size=0"
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 21461,
"max_score": 0,
"hits": []
}
}
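Now that everything is indexed, the geo_point mapping can already be put to work. As a sketch using the elasticsearch 1.x filtered query syntax, this should list the stations within 10km of the center of Paris (coordinates rounded):
curl "http://localhost:9200/stations/station/_search?pretty" -d '{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "10km",
          "location": { "lat": 48.8567, "lon": 2.3508 }
        }
      }
    }
  }
}'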
Kibana FTW
Start your Kibana instance and open it in your browser (it listens on port 5601 by default).
Settings
Configure your stations index:
Discover your data
If not already set, select your stations index in Discover:
Then choose the fields you want to display:
And save your search:
Visualize your data
Create a new visualization and choose Pie chart:
Select your saved search:
Split the slices using the country.raw field, so you display the number of train stations broken down per country code.
Save your visualization and create a new Pie chart on the idtgv_is_enabled field:
Save your visualization and create a new Tile map:
Select a Geohash aggregation on the location field and increase the Precision to 4:
Don’t forget to save it!
You can build as many visualizations as you need.
Put them all on the same page
Open Dashboard and add all your visualizations:
And your saved search:
Arrange your dashboard and play with it!
For example, if you click on FR in the country pie and then select true in the IDTGV pie, you will display all the IDTGV-enabled stations located in France.
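The same question can be asked directly to elasticsearch. Here is a sketch of the equivalent filtered query, assuming the mapping above (country.raw is not_analyzed, idtgv_is_enabled is a boolean):
curl "http://localhost:9200/stations/station/_search?pretty" -d '{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            { "term": { "country.raw": "FR" } },
            { "term": { "idtgv_is_enabled": true } }
          ]
        }
      }
    }
  }
}'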