How to Analyse and Make Sense of Humongous Datasets

This was the title of an invited talk I gave at MongoDB’s first public event in Germany on September 26th. MongoDB is awesome in that it is able to handle large amounts of both structured (read: relational sources) and unstructured (read: NoSQL) data. Also, its ability to integrate data from a number of disparate sources and its fast response times make it a good fit for use together with Tableau in any kind of ad-hoc analysis task. To show these capabilities, and also to have some fun, I decided to spice up the introduction to Tableau I gave there with a little live demo of how this looks in real life. When it came to selecting what data to use, I decided to go with movie data, a logical choice since we have the Tableau Cinema Tour coming up soon (see below). Also, one of our founding fathers here at Tableau is Prof. Pat Hanrahan, who received his first Academy Award (of three!) for the development of the RenderMan® software, which made movies like Toy Story possible in the first place.

Why Visual Data Analysis is Great

Wow, another year has passed and so much has happened in the meantime!

During my job at the Institute for Transport Research at the German Aerospace Center (DLR) in Berlin I not only worked on the theoretical underpinnings, development and implementation of micro-scale traffic models, but was obviously also involved in publicizing the results of those models and of other research work. I did this mostly with R, Shiny, PostgreSQL/PostGIS, QGIS and the occasional line of Python code sprinkled in between. They’re all great. I love them with all my heart and enjoy every second I’m working with one of them. But I found it increasingly hard to visualize data easily and quickly while still getting a pretty result. Sure, R and ggplot allow for camera-ready plots, and Shiny and Leaflet make it increasingly easy to put together interactive plots and maps. But sometimes fiddling with their settings and writing the necessary code is just not a practical way to get to the point quickly. Also, during the fascinating stage of exploratory data analysis (kind of the first date with your new data in the data analysis process…) I felt I was focusing too much on the code and other technical aspects, which distracted me from what I had set out to do: exploring my data to get a better understanding of it. Going back to the dating analogy, it’s like over-thinking what to order and what small-talk topic to bring up next, and thereby losing the interest of your possible future partner instead of focusing exclusively on him or her. Not a recipe for success…