Connecting the Dots – Visualizing Paths in Tableau

I had been planning to write this post for a long time. Not only have I been asked many times how to do this in my daily consulting work, but especially during and after my hands-on training “Stretching the Boundaries with Advanced Mapping” at our Tableau Conference On Tour 2017 in Berlin earlier this year. The question is pretty simple: How can I draw paths in Tableau? Oftentimes these are some kind of movement data, e.g. refugees or flight connections. The way to do this in Tableau is actually very easy – and some of the recently introduced features made it even easier – but it’s imperative to understand how Tableau draws lines and how the data therefore needs to be structured.

The OpenFlights.org route network visualized in Tableau

The OpenFlights.org route network visualized in Tableau

Continue reading →

Using Coordinate Data in Degrees (DMS) Format in Tableau

Have you ever received a spatial data set that you wanted to visualize in Tableau, only to find out the coordinates looked like this: 50°07'01.9"N 8°40'20.8"E If so, or if you’re just generally interested in geographic data and Tableau, this post is for you.

Continue reading →

Linear regression line in Tableau calculated by external R script

Run code from an external .R file in Tableau

Tableau introduced the R integration in version 8.1 back in 2013. That’s awesome because it opens up to Tableau the whole range of analytical functionality R offers. Most of the time the R code being triggered from within Tableau is rather short, such as a regression, a call to a clustering algorithm or correlation measures. But what happens when the code you want to run out of Tableau is getting longer and more complicated? Are you still bound to the “Calculated Field” dialog window in Tableau? It’s nice but it’s tiny and has no syntax coloring or code completion for our precious R code.

Run R code inline in a Calculated Field in Tableau

Run R code inline in a Calculated Field in Tableau

Continue reading →

The Power of Subqueries in PostgreSQL

I’m pretty sure everything I’m writing here is true for most other RDBMS, too, but since I’m currently using PostgreSQL I had a chance to test it and show some hard figures here.

The problem to solve is actually a common one and rather easy to solve conceptually: Take attributes from table1 and store them in corresponding rows in table2, using a common id to join them. The straight-forward (and almost direct) translation into SQL is therefore:

UPDATE schema.table1 a 
SET attribute = b.attribute 
FROM schema.table2 b 
WHERE a.id = b.id;

There’s nothing wrong with that statement and it’s going to do exactly what’s intended. Only, it’s not very clever, and hence not very performant. This obviously only matters if your tables are on the bigger end. In my case table1 (the one to update) has 576,806 rows while table2 (the one providing the attribute data) has a whopping 848,664,485 rows. Also I should mention that table2 contains multiple rows for each corresponding row in table1. In that concrete case table2 contains data about point locations (latitude, longitude, timestamp) of people whose attributes (age, gender, etc…) are stored in table1. And there is this one attribute which is wrongly and inefficiently stored with each point location, while it is only dependent on the person and should hence be stored there.

Continue reading →

Running Gephi 0.8.2-beta on MacOS X 10.8.5 Mountain Lion

For my research I’m recently doing a lot of graph and network analysis. So far the tools in ESRI ArcGIS have been sufficient for what I was trying to achieve. I used their Network Analyst Extension and also the Urban Network Analysis toolbox by the City Form Lab at the MIT/SUTD for a more scientific application: the calculation of centrality measures. While these warrant some more in-depth articles in themselves, here I’d like to put a more technical focus on a really annoying problem when using Gephi on a more recent MacOS X system.

Continue reading →