Tableau Kino Tour: Budget vs. Box Office

Behind the Scenes of the Tableau Kino Tour – Part 1: The IMDb

In the last quarter of 2016 we launched the Tableau Kino Tour 2016/17 as a worthy successor to the Tableau Stadion Tour 2015/16. After four very successful stops so far (Frankfurt, Hamburg, Berlin, and Düsseldorf), Vienna, Munich, Zurich, and Stuttgart still await us in the second leg of the tour in 2017. Since the question about the data behind the Kino Tour kept coming up, and also because generating this data was actually a very interesting and fun project, I have summarized the individual steps below, from downloading the source data to the finished product. For copyright reasons we unfortunately cannot simply offer the finished data for download, but with the step-by-step instructions it should not be too difficult to recreate everything at home.


My first Makeover Monday submission

Makeover Monday is a great project started by Tableau's very own Andy Cotgreave (@acotgreave) and Andy Kriebel (@VizWiz) from Tableau partner The Information Lab. They started it off last year, won an award in the meantime, and Andy Kriebel and Eva Murray (@TriMyData, Tableau Evangelist at EXASOL) decided to continue running the project in 2017 as well. It's great because it makes people look at data visualizations more critically, try out for themselves what works and doesn't work in data viz, and get discussions started. All that in a very productive and friendly way. Also, people practice working with Tableau. So it's a win-win-win situation!

How does it work? Andy and Eva publish a visualization with an interesting story (and the accompanying data set) every week on Sunday. There are no hard rules per se, but you shouldn't add any ancillary data and should work just with the original data set, you shouldn't invest hours but instead try to limit yourself to roughly one hour of work, and you should publish the fruits of your work on Twitter using the hashtag #makeovermonday. You can read more about the details on the project website.

I had always wanted to participate in Makeover Monday myself, but it never materialized. Tonight I felt like finally doing it and actually spent about half an hour on my very first submission. I'm not too proud of it but think it's OK-ish…

This week's topic was the massive differences between the income of men and women doing the exact same jobs. This is a global issue, but the data was from Australia. I tried out a few things but knew from the beginning I wanted to focus on the actual pay gap – a figure that wasn't in the original data set. Luckily calculations like this are very easy to do in Tableau. As for the color scheme, I picked up the main color of the source website and added some neutral grey. It's not much, but sometimes less is more, right? At first glance the viz doesn't seem to provide detailed data about the different occupations but only shows the major negative bias in income between men and women. But there's some interactivity! Just hover over the chart to see the detailed data and income values for all the occupations.
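For anyone who wants to reproduce the derived figure outside of Tableau, here is a minimal pandas sketch of the same calculation; the column names and figures are made-up placeholders, not the original data set's schema. In Tableau itself this is just a calculated field along the lines of ([Male Earnings] - [Female Earnings]) / [Male Earnings], with those field names likewise hypothetical:

```python
import pandas as pd

# A minimal sketch of the pay gap calculation. Column names and
# values are made up; the Makeover Monday data set may differ.
df = pd.DataFrame({
    "occupation": ["Accountant", "Nurse", "Teacher"],
    "male_weekly_earnings": [1500.0, 1300.0, 1400.0],
    "female_weekly_earnings": [1300.0, 1200.0, 1250.0],
})

# Pay gap as a share of male earnings, analogous to a Tableau
# calculated field: ([Male] - [Female]) / [Male]
df["pay_gap"] = (
    df["male_weekly_earnings"] - df["female_weekly_earnings"]
) / df["male_weekly_earnings"]

print(df[["occupation", "pay_gap"]])
```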

Tableau Desktop has a great (undocumented) feature that allows you to automatically take screenshots of your worksheets whenever they are rendered. It's no rocket science, but you should be careful when activating this mode (performance decrease, anyone?!) and fully understand what you're doing when following the instructions here. And don't forget to reset to normal mode when you're done! This mode is especially interesting for documenting the process of finding the right way to tell the stories hidden in data sets like the ones used for Makeover Monday. Here's mine for this week's exercise:

Advanced Logging for Makeover Monday 2017/01

Overall I'm OK with the result, I'm happy with the dataviz exercise, and I'm convinced this will not be my last Makeover Monday submission!

LMTTFY #1: Crime in Denver

With my background in spatial terrorism analysis I'm always very interested in the statistical analysis of crime data. Scott Stoltzman over at stoltzmaniac.com discovered a great data set published by the City and County of Denver. It contains data about all criminal offenses in the City and County of Denver for the previous five calendar years plus the current year to date, with plenty of attributes, timestamps and even geographic locations. Scott wrote a series of blog posts (starting here, then here, and here) showing some initial exploratory data analysis (EDA) in R and also some in-depth looks into a few topics that sparked his interest along the way. That's exactly what Tableau wants to enable people to do, so with this first episode of "Let me tableau this for you" I want to show how easy it is to get to the same interesting insights Scott outlined in his write-up, only without all the coding. I'm not sure how long it took him to get from finding the data to generating all the plots in the article, but my guess would be it took longer than this ~20 minute screencast. Enjoy the video below, and please let us know in the comments section if you have anything to add. Please refer to the original post if you want to know more about the idea behind Let me tableau this for you (LMTTFY).
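For readers who want to follow along in code rather than in Tableau, here is a minimal pandas sketch of the kind of first-pass exploration shown in the screencast. The file name and column names are assumptions based on the published Denver open data set and may differ from what you download:

```python
import pandas as pd

# First-pass exploration sketch; file and column names are assumptions
# based on the published Denver open data set and may differ.
crime = pd.read_csv("crime.csv", parse_dates=["FIRST_OCCURRENCE_DATE"])

# Offense counts by category -- the typical first bar chart.
print(crime["OFFENSE_CATEGORY_ID"].value_counts())

# Offenses per month, to eyeball trends and seasonality across the years.
monthly = crime.set_index("FIRST_OCCURRENCE_DATE").resample("M").size()
print(monthly.tail(12))
```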

https://www.youtube.com/watch?v=OoF5Is-_XNQ

Annoying join behavior in QGIS

Today I stumbled across something I wouldn’t exactly consider a bug, but at least some rather unintuitive and annoying behavior in QGIS when performing table joins.

I did something very mundane: joining a Postgres table of spatial data to another Postgres table of attribute data. The normal way to do this (for me) is as follows:

  1. Open the spatial table using Layer > Add Layer > Add PostGIS Layers...
  2. Open the attribute table the same way (1 & 2 can be loaded in one go)
  3. Join the tables in the spatial table’s Properties dialog.

For that last step I decided to join the two tables (plr is the spatial table here, while mss has the attributes) using the field plr_id, which exists in both tables and only once on each side (hence a plain vanilla 1:1 join).
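The same join can also be set up programmatically. Here is a minimal PyQGIS sketch against the QGIS 2.x API, assuming plr and mss have already been loaded as layers; the variable names are placeholders:

```python
# Minimal PyQGIS sketch (QGIS 2.x API), assuming plr and mss are
# already loaded as map layers; variable names are placeholders.
from qgis.core import QgsVectorJoinInfo

join_info = QgsVectorJoinInfo()
join_info.joinLayerId = mss.id()      # layer providing the attributes
join_info.joinFieldName = "plr_id"    # key field in mss
join_info.targetFieldName = "plr_id"  # key field in plr
join_info.memoryCache = True          # cache joined attributes in memory

plr.addJoin(join_info)                # same as the Properties dialog join
```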

Add vector join dialog window in QGIS 2.8

That works perfectly fine, except that somehow the order of the joined fields appears to get messed up:

QGIS attribute table with erroneously shifted field contents

Some research revealed that this seems to be a problem caused by identical field names in the two joined tables other than the join field itself. In my case the aforementioned plr_id was used to join the two tables, but in addition both tables also had a field gid, as can be seen in the following screenshot on the left:

Table design in pgAdmin: original table including field gid on the left, fixed table without the (unnecessary) field gid on the right

Removing this field gid from the attribute table mss was no problem, since the 1:1 relation to the spatial data uses the key plr_id anyway. As can be seen in the screenshot above on the right, the new table mss2 is identical to mss, only without the field gid. And lo and behold, joining this attribute table to the spatial table plr in QGIS now works flawlessly:

QGIS attribute table with correct field contents
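For completeness, here is a sketch of that fix on the Postgres side. I did it in pgAdmin, but the equivalent SQL, wrapped in psycopg2 with placeholder connection parameters, would look roughly like this (note that CREATE TABLE AS copies the data but not constraints or indexes):

```python
import psycopg2

# Equivalent of the pgAdmin fix: copy mss to mss2, then drop the
# conflicting gid column. Connection parameters are placeholders.
conn = psycopg2.connect("dbname=gis user=postgres host=localhost")
with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE mss2 AS SELECT * FROM mss;")
    cur.execute("ALTER TABLE mss2 DROP COLUMN gid;")
conn.close()
```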

This problem had already been identified in QGIS 2.0 in late 2013 and has been marked as fixed in the meantime. Removing fields with identical names from the two tables is one, admittedly rather radical, way to circumvent the issue. Another, more intuitive way is to choose a meaningful table prefix in the Add vector join dialog shown in the first image above. As you can see, I checked the Custom field name prefix checkbox but left the field empty. I prefer this since it keeps my field names nice and tidy, but in cases where homonymous fields exist in the two tables you will run into trouble; entering a prefix here is a nice and easy fix for this issue.
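In the PyQGIS sketch from above, that workaround is a single extra line before the join is added (again assuming the QGIS 2.x API):

```python
# An explicit prefix avoids the field name collision entirely;
# joined fields then show up as mss_gid, mss_..., and so on.
join_info.prefix = "mss_"
```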

Everything described above was performed on QGIS 2.8.1-Wien (64bit) on a Windows 7 machine and PostgreSQL 9.1.16 on a 64bit Ubuntu 4.6.3 server (PostGIS 1.5.3).

Upcoming Event: 2015 Annual Meeting of the Association of American Geographers (AAG)

While there's still some time until the 2015 AAG Annual Meeting kicks off in Chicago next spring, the deadline for submitting papers is almost here: November 20th, 2014!

As for me, I will present an algorithm I developed as part of my PhD thesis and in the course of my related research into people's movements in urban areas:

Konstantin Greger, University of Tsukuba
A Spatio-Temporal Betweenness Centrality Measure for the Micro-Scale Estimation of Pedestrian Traffic

The spatio-temporal mobile population estimation approach I introduce here can be used to calculate an index for the pedestrian traffic volume on street segments divided into deliberately chosen time steps. This is especially useful in the spatial context of highly urbanized areas, as it provides the populations in public space as a complementary element to building populations.

This was achieved by employing a graph theory methodology, namely betweenness centrality, and extending it with the temporal dimension. This new model was then applied using a number of datasets that provide information about building populations and train station passenger transfers, disaggregated both spatially and by time.
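To make the underlying building block concrete, here is a toy networkx sketch of static edge betweenness centrality on a made-up street graph. This is explicitly not the thesis algorithm, just the classic measure it extends; all node names and weights are invented:

```python
import networkx as nx

# Toy sketch of the underlying building block -- not the thesis
# algorithm. Nodes are intersections, edges are street segments.
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 100), ("B", "C", 80),
    ("A", "C", 150), ("C", "D", 60),
])  # made-up segment lengths in meters

# Classic (static) edge betweenness centrality: how often a segment
# lies on shortest paths between all node pairs.
print(nx.edge_betweenness_centrality(G, weight="weight"))

# The spatio-temporal extension would recompute this per time step,
# with origin/destination demand driven by time-varying building and
# station populations rather than treating all node pairs equally.
```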

The introduction of the temporal dimension to the estimation of populations in public space allows for a micro-scale analysis of the actual population figures according to the underlying human activities. I believe that this is the most interesting characteristic of the proposed estimation methodology, since for the first time it allows for a reliable estimation of mobile populations even for large study areas with justifiable requirements in terms of both necessary input data and computational expense.

The output of the spatio-temporal model can be used to visualize the number of pedestrians on the streets of a chosen study area. While the data do not represent absolute numbers of pedestrians, they do reflect the traffic volume and allow for a comparison of crowdedness, which can be used for further quantitative analyses, such as population density calculations for certain points in time.

This year I made an effort not to be placed into some random session, as happened to me in both 2012 and 2014 (in 2013 I went all the way and organized my very own session). Therefore I browsed the (admittedly a wee bit confusing) "abstract and session submission console" on the AAG conference website. There I came across an effort by Prof. Diansheng Guo at the University of South Carolina, who proposed a session (or a series thereof?) labeled "Spatial Data Mining and Big Data Analytics". I was more than happy to receive almost instantaneous feedback from Prof. Guo, and a positive one at that!

When I first wrote this I didn't have any details about the "where and when" of said session(s) and my presentation, but promised to update this article once the information became available. The details are:

Paper Session: Spatial Data Mining and Big Data Analytics (2)
Tuesday, 4/21/2015 10:00 AM – 11:40 AM
304 Classroom, University of Chicago Gleacher Center, 3rd Floor

In the meantime, here's the general conference information:

2015 AAG Annual Meeting
April 21 – 25, 2015
Hyatt Regency Chicago
http://www.aag.org/cs/annualmeeting

I’m already looking forward to my fourth AAG, and I would be happy to see you there!