The Short Way to QGIS 2.4 Chugiak on MacOS

Today I finally found some breathing room in my projects to update my MacBook Pro (running MacOS X 10.9.4 Mavericks) from QGIS 2.0 (Dufour) to the recently released version 2.4 (Chugiak). To be honest, I also realized that I needed to update in order to use the QgsFeatureRequest.setFilterExpression() method, which allows filtering features with expressions (introduced in version 2.2).

The first step was to download the installer images for QGIS 2.4 and GDAL 1.11 from KyngChaos.

The GDAL disk image contains not only the complete GDAL framework (including the GEOS, PROJ, SQLite and UnixImageIO frameworks), but also NumPy. The bundled NumPy version 1.6.2-1 dates from the end of August 2012, so I decided to skip installing it: I had installed my own NumPy later than that and should therefore already be up to date. The GDAL installation worked without a problem.

The next step was the installation of QGIS 2.4 itself. The readme files recommend deleting any existing QGIS.app from the Applications folder, so that’s what I did. The installer then confronted me with this error message: "QGIS requires the Matplotlib python module (kyngchaos build)."

matplotlib python module required

Luckily, matplotlib 1.3.1-2 from early 2014 can also be found on the KyngChaos website, so I installed it from the disk image (root authorization necessary) and went back to the QGIS installer. When the installer presented me with the readme file once again, I realized that I had apparently overlooked the hint that not only NumPy but also the matplotlib Python module was required. Classic user error on my end!

The QGIS installer also requires root authorization, takes a few minutes and needs about half a gigabyte of hard disk space. After the small hiccups earlier it finished without a problem, and I was presented first with the beautiful new splash screen and then with the GUI itself. Side note: I love the fact that QGIS remembered all my settings regarding toolbars, window locations etc.!

QGIS 2.4 Chugiak splash screen

QGIS 2.4 GUI fresh after installation

Huge thanks and props have to go to the team behind QGIS. I can’t wait to look for reasons to try out all the new features. For a quick overview I can recommend Nyall Dawson’s blog, whose most recent articles provide both an overview of and some details about what’s new in QGIS 2.4.

Batch-Loading and Merging Shape Files Into PostGIS

Today I was faced with the task of loading a massive number of shape files into my PostGIS database. The data in question is the Advanced Digital Road Map Database (ADF) (拡張版全国デジタル道路地図データベース) by Sumitomo Electric System Solutions Co., Ltd. (住友電工システムソリューション株式会社). It contains very detailed information (both spatial and attribute data) about the road network of all of Japan and is therefore quite large.

Therefore, it was split into a plethora of files using the following naming schema: mmmmmm_ttt.shp, where mmmmmm represents a six-digit mesh code and ttt represents a 2- to 3-digit thematic code. The mesh code is a result of the data being split spatially into small, rectangular chunks. It follows a simple logic, whereby bigger mesh units (represented by the first four digits) are further subdivided into smaller units (represented by the last two digits). It took only a small amount of time to figure out this naming schema and filter the files that would be necessary for my analysis.
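As a rough illustration of the naming schema described above, the file names can be taken apart with a regular expression. This is only a sketch: the helper names (parse_shape_name, filter_by_theme) and the sample file names and theme codes are made up for the example and are not taken from the ADF documentation.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# mmmmmm_ttt.shp: six-digit mesh code (4 + 2 digits), underscore,
# 2- to 3-digit thematic code, ".shp" extension.
SHAPE_NAME = re.compile(r"^([0-9]{4})([0-9]{2})_([0-9]{2,3})\.shp$", re.IGNORECASE)


class ShapeName(NamedTuple):
    mesh_code: str    # full six-digit mesh code
    parent_mesh: str  # first four digits: the bigger mesh unit
    sub_mesh: str     # last two digits: the subdivision within that unit
    theme: str        # 2- to 3-digit thematic code


def parse_shape_name(filename: str) -> Optional[ShapeName]:
    """Split a file name into its parts, or return None if it does not match."""
    match = SHAPE_NAME.match(filename)
    if match is None:
        return None
    parent, sub, theme = match.groups()
    return ShapeName(parent + sub, parent, sub, theme)


def filter_by_theme(filenames: Iterable[str], wanted: Iterable[str]) -> List[str]:
    """Keep only the shape files whose thematic code is in `wanted`."""
    wanted_set = set(wanted)
    kept = []
    for name in filenames:
        parsed = parse_shape_name(name)
        if parsed is not None and parsed.theme in wanted_set:
            kept.append(name)
    return kept


# Hypothetical file names and theme codes, for illustration only.
files = ["533945_11.shp", "533945_120.shp", "533946_11.shp", "readme.txt"]
print(filter_by_theme(files, ["11"]))
```

Keeping the parent and sub mesh as separate fields makes it easy to group files by the bigger mesh unit later on, should that be needed.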

Basically I wanted to merge the shape files into PostGIS tables divided by their topic (i.e. road nodes, road links, additional attribute information, etc.). So I had to find a way to batch import the shape files into PostGIS and merge them at the same time. Yet, since the node IDs were only unique within each mesh unit (i.e. shape file), I also had to find a way to incorporate the mesh codes themselves into the data, so I could later on create my own ID schema for the nodes, based on the mesh code and the original node ID (e.g. mmmmmmnnnnn, where mmmmmm represents a six-digit mesh code and nnnnn represents the original 5-digit node ID).
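The ID schema mentioned above (mmmmmmnnnnn) boils down to simple arithmetic: shift the mesh code five decimal places to the left and add the node ID. A minimal sketch in Python, where the function name make_global_node_id and the sample values are assumptions for illustration:

```python
import re


def make_global_node_id(mesh_code: str, node_id: int) -> int:
    """Combine a six-digit mesh code and a node ID of up to five digits
    into one number of the form mmmmmmnnnnn."""
    if re.fullmatch(r"[0-9]{6}", mesh_code) is None:
        raise ValueError("mesh code must consist of exactly six digits")
    if not 0 <= node_id <= 99999:
        raise ValueError("node ID must fit into five digits")
    # Multiplying by 100000 reserves five decimal places for the node ID,
    # so shorter node IDs are implicitly zero-padded.
    return int(mesh_code) * 100000 + node_id


# Hypothetical mesh code and node ID, for illustration only.
print(make_global_node_id("533945", 42))
```

Note that the resulting values have up to eleven digits and therefore exceed the range of a 32-bit integer, so a 64-bit column type (such as bigint in PostgreSQL) would presumably be needed to store them.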

Working With Non-Unicode Data in Python

Being a researcher in Japan means I often have to work with Japanese data. While generally data is data is data, there are some peculiarities I came across that seem to be related to the fact that those data are about and produced in Japan.

Firstly, there is the way they are delivered. I’m not so much talking about deliveries on “hard media” such as CD-ROMs and DVDs being snail-mailed, even though this seems to be the major way of obtaining data to this day. Luckily I’m embedded in an ecosystem of research institutions and university laboratories that engage in joint research projects and thereby share the necessary datasets online using portal websites. I’d especially like to mention the JoRAS portal of the Center for Spatial Information Science (CSIS) at the University of Tokyo (東京大学) here, since their stock is quite extensive and they are always open to collaboration inquiries.

Secondly, there is the fact that, not very surprisingly, Japanese datasets often contain Japanese data. By this I’m not referring to the fact that the data deals with information about Japan, but to the fact that it makes use of Japanese script. This introduces some technical difficulties, which I would like to elucidate in this article.
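To give a first impression of the kind of difficulty meant here: text in a legacy Japanese encoding has to be decoded explicitly before Python can treat it as Unicode. The following is a minimal sketch, assuming the data arrives as raw bytes in either UTF-8 or one of the common legacy encodings (cp932, i.e. Microsoft's Shift_JIS variant, or EUC-JP). Which encoding a given dataset actually uses is an assumption here, and the function name decode_with_fallback is made up for the example.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    raw: bytes,
    encodings: Sequence[str] = ("utf-8", "cp932", "euc_jp"),
) -> Tuple[str, str]:
    """Try each candidate encoding in order and return the decoded text
    together with the name of the first encoding that worked."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Simulate a non-Unicode field by encoding a known string as cp932.
raw_field = "東京大学".encode("cp932")
text, used = decode_with_fallback(raw_field)
print(text, used)
```

The order of the candidates matters: UTF-8 is tried first because it is strict and rarely accepts bytes from other encodings by accident, whereas a legacy encoding may happily decode foreign bytes into the wrong characters. Whenever the encoding of a dataset is documented, passing it explicitly is the safer choice.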
