PyData London 2014

In this talk I will describe a system that we've built for hierarchical text classification. I will describe the logical setup of the steps involved: data processing, feature selection, training, validation and labelling. To make this all work in practice, we've mapped the setup onto a Hadoop cluster. I'll discuss some of the pros and cons that we've run into when working with Python and Hadoop. Finally, I'll discuss how we use crowdsourcing to continuously improve the quality of our hierarchical classifier.