Whenever possible I recently try to get all my GIS work done in QGIS. Most of the time this is no problem at all. Sometimes it makes things even easier, such as when you’re trying to work with your geospatial data in a PostgreSQL/PostGIS database (good luck trying that in ArcGIS!). But sometimes you come across a task that is just so exotic that nobody has ever come across it. Or at least nobody wrote about coming across it…
I’m pretty sure everything I’m writing here is true for most other RDBMS, too, but since I’m currently using PostgreSQL I had a chance to test it and show some hard figures here.
The problem to solve is actually a common one and rather easy to solve conceptually: Take attributes from
table1 and store them in corresponding rows in
table2, using a common
id to join them. The straight-forward (and almost direct) translation into SQL is therefore:
UPDATE schema.table1 a SET attribute = b.attribute FROM schema.table2 b WHERE a.id = b.id;
There’s nothing wrong with that statement and it’s going to do exactly what’s intended. Only, it’s not very clever, and hence not very performant. This obviously only matters if your tables are on the bigger end. In my case
table1 (the one to update) has 576,806 rows while
table2 (the one providing the attribute data) has a whopping 848,664,485 rows. Also I should mention that
table2 contains multiple rows for each corresponding row in
table1. In that concrete case
table2 contains data about point locations (latitude, longitude, timestamp) of people whose attributes (age, gender, etc…) are stored in
table1. And there is this one attribute which is wrongly and inefficiently stored with each point location, while it is only dependent on the person and should hence be stored there.