In 2013 Dr. James Cheshire from the Centre for Advanced Spatial Analysis at University College London created a data visualization that was critically acclaimed back then and saw something of a renaissance a few weeks ago, when a modified version by Henrik Lindberg made its way onto the Reddit front page. I had been mesmerized by the viz from the beginning, so when it reappeared in my blog reader I decided I had to try reproducing it in Tableau.
Imagine you have some kind of system that produces reports on your data – for this example I randomly decided to use bookings for events – and these reports are published on a regular schedule. Now you want to see two things in your report:
- The current status of participants per event – both for past events (i.e. the actual number of participants) and for future events (i.e. the current number of people registered).
- An overview of how the number of people registered changed over time.
Also, your source system is publishing these data as `.csv` files. How can this be done?
Well, very easily using the new wildcard union feature introduced in Tableau Desktop 10.1! Read on to see how this can be done.
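For readers who want to see what a wildcard union actually does, here is a rough sketch of the same idea in plain Python with pandas. The file names and columns are invented for illustration; the point is just the pattern of matching files by wildcard, stacking them, and keeping the source file name (which in the scenario above carries the report date):

```python
import glob
import os
import tempfile

import pandas as pd

# Create two example report files, as a source system might publish them.
# Names and columns are made up for this sketch.
tmp = tempfile.mkdtemp()
pd.DataFrame({"event": ["A", "B"], "registered": [10, 20]}).to_csv(
    os.path.join(tmp, "report_2016-12-01.csv"), index=False)
pd.DataFrame({"event": ["A", "B"], "registered": [12, 25]}).to_csv(
    os.path.join(tmp, "report_2016-12-08.csv"), index=False)

# The "wildcard union": read every file matching the pattern and stack them.
frames = []
for path in sorted(glob.glob(os.path.join(tmp, "report_*.csv"))):
    df = pd.read_csv(path)
    # Tableau also records which file each row came from; that file name is
    # what lets you reconstruct how registrations changed over time.
    df["source_file"] = os.path.basename(path)
    frames.append(df)

union = pd.concat(frames, ignore_index=True)
print(union)
```

The resulting table has one row per event per report, so the latest report gives the current status and the full stack gives the history.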
Makeover Monday is a great project started by Tableau's very own Andy Cotgreave (@acotgreave) and Andy Kriebel (@VizWiz) from Tableau partner The Information Lab. They started it off last year, won an award in the meantime, and Andy Kriebel and Eva Murray (@TriMyData, Tableau Evangelist at EXASOL) decided to continue running the project in 2017 as well. It's great because it makes people look at data visualizations more critically, try out what works and doesn't work in data viz themselves, and get discussions started. All that in a very productive and friendly way. Also, people practice working with Tableau. So it's a win-win-win situation!
How does it work? Andy and Eva publish a visualization with an interesting story (and the accompanying data set) every week on Sunday. There are no hard rules per se, but you shouldn't add any ancillary data and should work just with the original data set, you shouldn't invest hours but instead try to limit yourself to roughly one hour of work, and you should publish the fruits of your work on Twitter using the hashtag #makeovermonday. You can read more about the details on the project website.
I always wanted to participate in Makeover Monday myself but that never materialized. Tonight I felt like finally doing it and actually spent about half an hour with my very first submission. I’m not too proud of it but think it’s OK-ish…
This week’s topic was the massive differences between the income of men and women doing the exact same jobs. This is a global issue but the data was from Australia. I tried out a few things but knew from the beginning I wanted to focus on the actual pay gap – a figure that wasn’t in the original data set. Luckily calculations like this are very easily done in Tableau. As for the color scheme I used the main color used by the source website and some neutral grey. It’s not much, but sometimes less is more, right? In the end the viz doesn’t seem to provide detailed data about different occupations but instead only shows the major negative bias in terms of income differences between men and women. But there’s some interactivity! Just hover over the chart to see the actual detailed data and income values for all the occupations.
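The pay gap figure mentioned above is just a derived ratio, which is why it is so easy to add as a calculated field. A minimal sketch with made-up numbers (the actual Australian figures are not reproduced here):

```python
# Illustrative values only, not the actual data from the Makeover Monday set.
male_income = 1500.0    # average weekly income, men
female_income = 1200.0  # average weekly income, women

# In Tableau this would be a calculated field along the lines of
#   ([Male] - [Female]) / [Male]
pay_gap = (male_income - female_income) / male_income
print(f"Pay gap: {pay_gap:.1%}")  # → Pay gap: 20.0%
```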
Tableau Desktop has a great (undocumented) feature that allows you to automatically take screenshots of your worksheets whenever they are rendered. It's no rocket science, but you should be careful when activating this mode (performance decrease, anyone?!) and understand fully what you're doing when following the instructions here. And don't forget to reset to normal mode when you're done! This mode is especially interesting when it documents the process of finding the right way to tell the stories hidden in data sets like the ones used for Makeover Monday. Here's mine for this week's exercise:
Overall I'm OK with the result, I'm happy with the dataviz exercise, and I'm convinced this will not be my last Makeover Monday submission!
With my background in spatial terrorism analysis I'm always very interested in the statistical analysis of crime data. Scott Stoltzman over at stoltzmaniac.com discovered a great data set by the City and County of Denver. It has data about all the criminal offenses in the City and County of Denver for the previous five calendar years plus the current year to date, with plenty of attributes, timestamps and even geographic locations. Scott wrote a series of blog posts (starting here, then here, and here) showing some initial exploratory data analysis (EDA) in R and also some in-depth looks into a few topics that sparked his interest along the way. That's exactly what Tableau wants to enable people to do, so with this first episode of "Let me tableau this for you" I want to show how easy it is to get to the same interesting insights Scott outlined in his write-up, only without all the coding. I'm not sure how long it took him to get from finding the data to generating all the plots in the article, but my guess would be it took longer than this ~20 minute screencast. Enjoy the video below, and please let us know in the comments section if you have anything to add. Please refer to the original post if you want to know more about the idea behind Let me tableau this for you (LMTTFY).
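To give a rough idea of what such first exploratory steps look like in code, here is a small pandas sketch. The column names mirror the Denver open-data set, but the rows are invented and this is not Scott's R code, just an analogous illustration:

```python
import pandas as pd

# Stand-in for the Denver crime CSV; column names are assumptions based on
# the open-data set, the rows are invented.
crime = pd.DataFrame({
    "OFFENSE_CATEGORY_ID": ["theft", "theft", "burglary", "assault", "theft"],
    "REPORTED_DATE": pd.to_datetime([
        "2016-01-05", "2016-01-06", "2016-02-01",
        "2016-02-03", "2016-03-01"]),
})

# Two typical first EDA questions: how many offenses per category,
# and how do counts develop per month?
per_category = crime["OFFENSE_CATEGORY_ID"].value_counts()
per_month = crime.groupby(crime["REPORTED_DATE"].dt.to_period("M")).size()
print(per_category)
print(per_month)
```

In Tableau the same two views are a couple of drag-and-drop operations, which is the point the screencast tries to make.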
Today I stumbled across something I wouldn’t exactly consider a bug, but at least some rather unintuitive and annoying behavior in QGIS when performing table joins.
I did something very mundane: joining a Postgres table of spatial data to another Postgres table of attribute data. The normal way to do this (for me) is as follows:
- Open the spatial table using `Layer > Add Layer > Add PostGIS Layers...`
- Open the attribute table the same way (1 & 2 can be loaded in one go)
- Join the tables in the spatial table's `Layer Properties > Joins` tab
For that last step I decided to join the two tables (`plr` is the spatial table here, while `mss` has the attributes) using the field `plr_id`, which exists in both tables and only once on each side (hence a plain vanilla 1:1 join).
That works perfectly fine, except that somehow the order of the joined fields appears to get messed up:
Some research revealed that this seems to be a problem caused by identical field names in the two joined tables other than the join field itself. In my case the aforementioned `plr_id` was used to join the two tables, but in addition both tables also had a field `gid`, as can be seen in the following screenshot on the left:
Removing this field `gid` from the attribute table `mss` was no problem, since the 1:1 relation to the spatial data uses the key `plr_id` anyways. As can be seen in the screenshot above on the right, the new table `mss2` is identical to `mss`, only without the field `gid`. And lo and behold – joining this attribute table to the spatial table `plr` in QGIS works flawlessly now:
This problem had already been identified in QGIS 2.0 in late 2013, and has been marked as fixed in the meantime. Removing fields with identical names in the two tables is one – admittedly rather radical – way to circumvent the issue. Another, more intuitive way would be to choose a meaningful table prefix in the `Add vector join` dialog which can be seen in the first image above. As you can see I checked the `Custom field name prefix` checkbox but left the field empty. I prefer this, since it keeps my field names nice and tidy, but in cases where homonymous fields exist in the two tables you will run into trouble – hence entering a prefix here would be a nice and easy fix for this issue.
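The collision is easy to reproduce outside QGIS. Here is a pandas sketch of the same situation: both tables carry a `gid` column besides the join key `plr_id` (the table and field names follow the post, the values are invented), and a suffix plays the role that the custom field name prefix plays in QGIS:

```python
import pandas as pd

# Minimal reproduction of the homonymous-field problem: both tables have a
# "gid" column in addition to the join key "plr_id". Values are made up.
plr = pd.DataFrame({"gid": [1, 2], "plr_id": [101, 102],
                    "geom": ["g1", "g2"]})
mss = pd.DataFrame({"gid": [7, 8], "plr_id": [101, 102],
                    "value": [3.5, 4.2]})

# pandas refuses to emit two identical column names and disambiguates with
# suffixes instead – the analogue of entering a custom field name prefix.
joined = plr.merge(mss, on="plr_id", suffixes=("", "_mss"))
print(joined.columns.tolist())  # the attribute table's gid becomes "gid_mss"
```

Any tool performing this join has to disambiguate the duplicate names somehow; QGIS 2.8 simply did it badly when the prefix field was left empty.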
Everything described above was performed on QGIS 2.8.1-Wien (64bit) on a Windows 7 machine and PostgreSQL 9.1.16 on a 64bit Ubuntu 4.6.3 server.