Scraping the IMDb for Use in Tableau

In the last quarter of 2016 the German marketing team came up with a great way to follow the immense success of last year’s Tableau Stadium Tour: the Tableau Cinema Tour! After visiting ten cities all over Germany, Austria, and Switzerland, we are now considering rolling it out all over Europe. Stay tuned for that! Since we often got requests for the data used in the main demo, I decided to produce this write-up of how to extract the data from the Internet Movie Database (IMDb). Unfortunately, for copyright reasons we cannot simply provide you with the ready-made data. That said, with this walk-through everybody should be able to get the data!


How to Analyse and Make Sense of Humongous Datasets

This was the title of an invited talk I gave at MongoDB’s first public event in Germany on September 26th. MongoDB is awesome in that it is able to handle large amounts of both structured (read: relational sources) and unstructured (read: NoSQL) data. Also, the ability to integrate data from a number of disparate sources and the fast response times make it a good fit to be used together with Tableau for any kind of ad-hoc analysis task. To show these capabilities, and also to have some fun, I decided to spice up the introduction to Tableau I gave there with a little live demo of how this looks in real life. When it came to selecting which data to use, I decided to go with movie data, a logical choice since we have the Tableau Cinema Tour coming up soon (see below). Also, one of our founding fathers here at Tableau is Prof. Pat Hanrahan, who received his first Academy Award (of three!) for the development of the RenderMan® software, which made movies like Toy Story possible in the first place.