Why Visual Data Analysis Is Great

Wow, another year has passed and so much has happened in the meantime!

During my job at the Institute for Transport Research at the German Aerospace Center (DLR) in Berlin, I not only worked on the theoretical underpinnings, development, and implementation of micro-scale traffic models, but was obviously also involved in publicizing the results of those models and of other research work. I did this mostly with R, Shiny, PostgreSQL/PostGIS, QGIS, and the occasional line of Python code sprinkled in between. They’re all great. I love them with all my heart and enjoy every second I’m working with one of them. But I found it increasingly hard to visualize data easily and quickly while still making it look pretty. Sure, R and ggplot allow for camera-ready plots, and Shiny and Leaflet make it increasingly easy to put together interactive plots and maps. But sometimes fiddling with their settings and writing the necessary code is just not practical when you want to get to the point quickly.

Also, during the fascinating stage of exploratory data analysis (kind of the first date with your new data in the data analysis process…), I felt I was focusing too much on the code and other technical aspects, which distracted me from what I was originally doing: exploring my data to get a better understanding of it. Going back to the dating analogy, it’s like over-thinking what to order and which small-talk topic to bring up next, and thereby losing the interest of your possible future partner instead of being focused exclusively on him/her. Not a recipe for success…

Setting up QGIS 2.8 on MacOS X 10.10 Yosemite

The wait is over: the new QGIS 2.8 “Wien” has finally been released for MacOS as well! Following the (kind of) tradition of my articles showing how to install QGIS 2.6, 2.4, and 2.0 on MacOS, I sat down to write a brief walkthrough for the latest version as well.

Setting up QGIS 2.6 on MacOS X 10.10 Yosemite

It’s been awfully quiet here on the blog recently. This is due to some major changes in my life, including the successful end of my PhD program, a successful job hunt, a move from Japan to Germany, and an interesting yet challenging start in my new job at a major German research institute.

But the recent release of MacOS 10.10 “Yosemite” together with the even more recent release of the new QGIS 2.6 “Brighton” was a brilliant opportunity to not only bring back some life here, but also to continue my mini-series of articles about installing and running QGIS and other rather scientific software packages on the latest versions of MacOS (see here, here, and here for example).

So I sat down on my freshly delivered sofa between unpacked boxes to try my luck. To make a long story short, in my case the installation ran smoothly and was done in about half an hour – downloading the necessary disk images took most of the time. But before updating my QGIS 2.4 to the new version 2.6, I first checked whether 2.4 still ran on my freshly upgraded MacOS Yosemite. And there was a small surprise waiting for me, as MacOS asked me to update my Java SE 6 runtime!

Updated Java SE 6 runtime necessary

Luckily, this was no big deal, since the error message provided a link to the download page at Apple.

Apple provides the Java update

Easy installation of the Java update

After running this update, QGIS 2.4 worked just as before.

For QGIS itself, I once again chose the packages provided by William Kyngesburye a.k.a. KyngChaos – not only have I never had any problems with these, but to the best of my knowledge they are the only pre-compiled QGIS packages available for MacOS… The installation process follows the steps known from earlier releases:

1. GDAL
First is the new GDAL 1.11. The installation is as easy as downloading the DMG and installing GDAL from the PKG therein. Please ignore the NumPy package also contained in the GDAL disk image, since it’s an outdated version. Oh yeah, and then there’s this thing that’s still annoying me:

That's still annoying

Gatekeeper refuses to open applications and packages from “unidentified developers” (that is, developers who can’t afford a certificate from Apple) when you double-click them. Hence you need to right-click the package and select Open.

2. matplotlib and NumPy
Before we can install matplotlib, we need to install NumPy. On the KyngChaos website you can find the most recent version, 1.8.0-1. As the website states, NumPy is “included on the GDAL Framework disk image, though it may not be up to date”. And indeed, the GDAL image mentioned above includes NumPy 1.6.2-1 from mid-2012…
Now that that’s out of the way, we can install matplotlib 1.3.1-2.

3. QGIS
And finally, QGIS 2.6.0-1 itself. As in the other cases, we open the DMG file and install from the PKG file therein. That’s it!
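Since the GDAL image ships an outdated NumPy, it can be worth checking which NumPy and matplotlib versions Python actually picks up after these steps. The following is only a minimal sketch: the helper names are hypothetical, and the minimum versions are simply the KyngChaos builds mentioned above (NumPy 1.8.0, matplotlib 1.3.1), not official QGIS requirements.

```python
import importlib


def parse_version(text):
    """Turn '1.8.0' or '1.6.2-1' into a comparable tuple of ints such as (1, 8, 0)."""
    core = text.split("-")[0]  # drop a package revision suffix like '-1'
    numbers = []
    for piece in core.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev'
            digits += char
        numbers.append(int(digits) if digits else 0)
    return tuple(numbers)


def is_at_least(installed, minimum):
    """Compare two version strings, padding with zeros so '1.8' equals '1.8.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_module(name, minimum):
    """Return (installed version or None, whether it is at least `minimum`)."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None, False
    installed = getattr(module, "__version__", "0")
    return installed, is_at_least(installed, minimum)


if __name__ == "__main__":
    # Minimums are the KyngChaos versions named in this post, used here as an assumption.
    for name, minimum in [("numpy", "1.8.0"), ("matplotlib", "1.3.1")]:
        installed, ok = check_module(name, minimum)
        print("%-10s installed: %-10s ok: %s" % (name, installed, ok))
```

The script avoids Python-3-only syntax, so it should run with whichever Python interpreter QGIS uses on your machine; that is the one worth checking.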

QGIS 2.6 "Brighton" splash screen

QGIS 2.6 "Brighton" UI on Yosemite

Now that everything was installed, it was time to fire it up for the first time. And lo and behold, it works! Just like that. You can’t ask for more. Now it’s time to discover all the great new features QGIS 2.6 brings!

The Short Way to QGIS 2.4 Chugiak on MacOS

Today I finally found some breathing room in my projects to dare to update my MacBook Pro (running MacOS X 10.9.4 Mavericks) from QGIS 2.0 (Dufour) to the recently released version 2.4 (Chugiak). Well, to be honest, I also realized that I should update in order to use filter expressions via the QgsFeatureRequest.setFilterExpression() method (introduced in version 2.2).
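For readers who haven’t used it yet, here is a minimal sketch of how such a filter might look in PyQGIS. Only QgsFeatureRequest, its setFilterExpression() method and layer.getFeatures() come from the QGIS API; the helper functions, field names and values are hypothetical. The quoting helpers follow QGIS expression syntax (field names in double quotes, string literals in single quotes) and are plain Python, so they can be tested outside QGIS; inside QGIS, QgsExpression.quotedColumnRef() and QgsExpression.quotedString() provide equivalent quoting.

```python
def quote_field(name):
    """Quote a field name for a QGIS expression: "name", with embedded " doubled."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Render numbers as-is and anything else as a single-quoted string literal."""
    if isinstance(value, (int, float)):
        return str(value)
    # escape backslashes and double any single quotes inside the string
    return "'" + str(value).replace("\\", "\\\\").replace("'", "''") + "'"


def build_filter_expression(field, operator, value):
    """Build an expression such as "lanes" >= 2 for setFilterExpression()."""
    return "%s %s %s" % (quote_field(field), operator, quote_literal(value))


def filtered_features(layer, expression):
    """Iterate over the features of a vector layer that match a QGIS expression."""
    from qgis.core import QgsFeatureRequest  # only importable inside QGIS / PyQGIS
    request = QgsFeatureRequest()
    request.setFilterExpression(expression)
    return layer.getFeatures(request)


# Hypothetical usage in the QGIS Python console (assumes a layer with a "lanes" field):
# for feature in filtered_features(iface.activeLayer(),
#                                  build_filter_expression("lanes", ">=", 2)):
#     print(feature.id())
```

Building the expression string separately keeps the quoting rules in one place, so field names or values containing quotes don’t silently break the filter.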

The first step was to download the installer images for QGIS 2.4 and GDAL 1.11 from KyngChaos.

The GDAL disk image contains not only the complete GDAL framework (including the GEOS, PROJ, SQLite and UnixImageIO frameworks), but also NumPy. Since version 1.6.2-1 dates from the end of August 2012, I decided to skip installing it: I had installed my NumPy later than that, so it should already be up to date. The GDAL installation worked without a problem.

The next step was the installation of QGIS 2.4 itself. The readme files recommend deleting any existing QGIS.app from the Applications folder, so that’s what I did. The installer then confronted me with this error message: "QGIS requires the Matplotlib python module (kyngchaos build)."

matplotlib python module required

Luckily, matplotlib 1.3.1-2 from early 2014 can also be found on the KyngChaos website, so I installed it from the disk image (root authorization necessary) and went back to the QGIS installer. When the installer presented me with the readme file once again, I realized that I had apparently skimmed over the hint that not only NumPy but also the matplotlib Python module was required – classic user error on my end!

The QGIS installer also requires root authorization, takes a few minutes, and uses about half a gigabyte of hard disk space. After the small hiccups earlier, it finished without a problem, and I was presented first with the beautiful new splash screen and then with the GUI itself. Side note: I love the fact that QGIS remembered all my settings regarding toolbars, window locations, etc.!

QGIS 2.4 Chugiak splash screen

QGIS 2.4 GUI fresh after installation

Huge thanks and props go to the team behind QGIS – I can’t wait to find reasons to try out all the new features. For a quick overview, I can recommend Nyall Dawson’s blog, whose most recent articles provide both an overview of and some details about what’s new in QGIS 2.4.

Batch-Loading and Merging Shape Files Into PostGIS

Today I was faced with the task of loading a massive number of shape files into my PostGIS database. The data in question is the Advanced Digital Road Map Database (ADF) (拡張版全国デジタル道路地図データベース) by Sumitomo Electric System Solutions Co., Ltd. (住友電工システムソリューション株式会社). It contains very detailed information (spatial and attribute data) about the road network of all of Japan and is therefore quite large.

Therefore, it was split into a plethora of files following the naming schema mmmmmm_ttt.shp, where mmmmmm represents a six-digit mesh code and ttt a 2- to 3-digit thematic code. The mesh code results from the data being split spatially into small, rectangular chunks. It follows a simple logic, whereby bigger mesh units (represented by the first four digits) are further subdivided into smaller units (represented by the last two digits). It didn’t take long to figure out this naming schema and select the files I needed for my analysis.
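To illustrate the naming schema, here is a small sketch that parses file names of the form mmmmmm_ttt.shp and groups them by thematic code. The regular expression, the function names and the example file names are my own illustrative assumptions, not part of the ADF distribution.

```python
import re
from collections import defaultdict

# mmmmmm_ttt.shp: six-digit mesh code, underscore, 2- to 3-digit thematic code
SHAPE_NAME = re.compile(r"^(?P<mesh>\d{6})_(?P<theme>\d{2,3})\.shp$", re.IGNORECASE)


def parse_shape_name(filename):
    """Split 'mmmmmm_ttt.shp' into its parts, or return None if it doesn't match."""
    match = SHAPE_NAME.match(filename)
    if match is None:
        return None
    mesh = match.group("mesh")
    return {
        "mesh": mesh,
        "primary_mesh": mesh[:4],    # bigger mesh unit (first four digits)
        "secondary_mesh": mesh[4:],  # subdivision within it (last two digits)
        "theme": match.group("theme"),
    }


def group_by_theme(filenames, wanted_themes=None):
    """Group matching shape files by thematic code, optionally keeping only some themes."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        parts = parse_shape_name(name)
        if parts is None:
            continue  # skip .dbf/.shx/.prj companions and anything off-schema
        if wanted_themes is not None and parts["theme"] not in wanted_themes:
            continue
        groups[parts["theme"]].append(name)
    return dict(groups)


if __name__ == "__main__":
    # made-up file names, just to show the grouping
    files = ["533945_01.shp", "533945_01.dbf", "533946_102.shp", "533946_01.shp"]
    print(group_by_theme(files))
```

In practice the file list would come from something like os.listdir() on the data directory, and wanted_themes would hold the thematic codes needed for the analysis.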

Basically, I wanted to merge the shape files into PostGIS tables divided by topic (road nodes, road links, additional attribute information, etc.). So I had to find a way to batch import the shape files into PostGIS and merge them at the same time. Yet, since the node IDs were only unique within each mesh unit (i.e. shape file), I also had to find a way to incorporate the mesh codes themselves into the data, so I could later create my own ID schema for the nodes based on the mesh code and the original node ID (e.g. mmmmmmnnnnn, where mmmmmm represents the six-digit mesh code and nnnnn the original 5-digit node ID).
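One way to approach both problems is sketched below: per theme, shp2pgsql’s -p (prepare) mode creates the table from the first file, every file is then loaded with -a (append), and a mesh_code column is filled right after each append. This is only an assumed approach for illustration, not necessarily the method described in the full article; the mesh_code column, table name, database name and SRID are placeholders, and make_node_id() builds the mmmmmmnnnnn scheme described above as a string.

```python
import os
import shlex


def make_node_id(mesh_code, node_id):
    """Build the mmmmmmnnnnn ID: six-digit mesh code plus zero-padded 5-digit node ID."""
    if len(mesh_code) != 6 or not mesh_code.isdigit():
        raise ValueError("mesh code must be six digits: %r" % (mesh_code,))
    if not 0 <= node_id <= 99999:
        raise ValueError("node ID must fit into five digits: %r" % (node_id,))
    return "%s%05d" % (mesh_code, node_id)


def import_commands(table, shape_files, srid, database):
    """Return shell commands that create `table` once, then append and tag every file.

    Assumes all files of one theme share the same attribute schema (required by -a)
    and that `table` and `database` are plain identifiers needing no quoting.
    """
    if not shape_files:
        return []
    psql = "psql -d %s" % database
    commands = [
        # -p: only emit the table creation SQL, based on the first file's schema
        "shp2pgsql -p -s %d %s %s | %s" % (srid, shlex.quote(shape_files[0]), table, psql),
        # hypothetical extra column recording which mesh each row came from
        "%s -c %s" % (psql, shlex.quote("ALTER TABLE %s ADD COLUMN mesh_code char(6);" % table)),
    ]
    for path in shape_files:
        mesh = os.path.basename(path)[:6]
        if len(mesh) != 6 or not mesh.isdigit():
            raise ValueError("file name does not start with a mesh code: %r" % (path,))
        # -a: append this file's rows to the existing table
        commands.append("shp2pgsql -a -s %d %s %s | %s" % (srid, shlex.quote(path), table, psql))
        # rows appended just now are the only ones still lacking a mesh code
        update = "UPDATE %s SET mesh_code = '%s' WHERE mesh_code IS NULL;" % (table, mesh)
        commands.append("%s -c %s" % (psql, shlex.quote(update)))
    return commands


if __name__ == "__main__":
    files = ["adf/533945_01.shp", "adf/533946_01.shp"]  # made-up example files
    for command in import_commands("adf_nodes", files, 4612, "roads"):  # placeholder SRID
        print(command)
    print(make_node_id("533945", 123))
```

Because the commands run strictly one after another, the UPDATE after each append only touches the rows that file contributed. The SRID has to match the data’s actual coordinate system (see its .prj file).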