I have always been a great fan and avid user of databases. They’re just so versatile, efficient, easy to use, … I found this to be true for all kinds of data, small and large, high-dimensional and low-dimensional, spatial, temporal, you name it. It was only very recently that my data seemed to have outgrown my PostgreSQL database. Not so much in size, but rather in performance.
Whenever possible I recently try to get all my GIS work done in QGIS. Most of the time this is no problem at all. Sometimes it makes things even easier, such as when you’re trying to work with your geospatial data in a PostgreSQL/PostGIS database (good luck trying that in ArcGIS!). But sometimes you come across a task that is just so exotic that nobody has ever come across it. Or at least nobody wrote about coming across it…
[UPDATE: Find informations about installing QGIS 2.4 in this newer article.]
After a hardware failure on my MacBook Pro’s hard disk in the end of last year (replaced for free within half a day at the Apple Store, thanks to my Apple Care Protection Plan) and an extended christmas and new year holiday I’m currently being struck down by a nasty cold and hence decided I have some spare time on my hands to give the latest Mac OS X version 10.9 “Mavericks” a try. It had been released in October, but I had been too busy to play around with it, so far. Obviously I’m not going to screw up my main production machine, the iMac, but instead designated my old, trusted MacBook Pro to be the guinea-pig.
Today I was faced with the task of having to load a massive amount of shape files into my PostGIS database. The data in question is the Advanced Digital Road Map Database (ADF) (拡張版全国デジタル道路地図データベース) by Sumitomo Electric System Solutions Co., Ltd. (住友電工システムソリューション株式会社). It contains very detailed information (spatial and attributive) about the road network of all Japan and is thereby quite heavy.
Therefore, it was split into a plethora of files using the following naming schema: mmmmmm_ttt.shp, where mmmmmm represents a six-digit mesh code and ttt represents a 2- to 3-digit thematic code. The mesh code is a result of the data being split spatially into small, rectangular chunks. It follows a simple logic, whereby bigger mesh units (represented by the first four digits) are further subdivided into smaller units (represented by the last two digits). It took only a small amount of time to figure out this naming schema and filter the files that would be necessary for my analysis.
Basically I wanted to merge the shape files into PostGIS tables divided by their topic (i.e. road nodes, road links, additional attribute information, etc.). So I had to find a way to batch import the shape files into PostGIS and merge them at the same time. Yet, since the node IDs were only unique within each mesh unit (i.e. shape file), I also had to find a way to incorporate the mesh codes themselves into the data, so I could later on create my own ID schema for the nodes, based on the mesh code and the original node ID (e.g. mmmmmmnnnnn, where mmmmmm represents a six-digit mesh code and nnnnn represents the original 5-digit node ID).