Data model to combine the fact table(s) with the sessions table

Tableau Row-Level Security Using Initial SQL with PostgreSQL

Imagine you have Tableau dashboards that require the use of row-level security on data coming from PostgreSQL, but the connection to that database is established through the same technical user for everyone. How can you achieve that? Is it even possible? Short answer: it’s possible – and it’s not even that complicated, provided you’re not afraid of some PL/pgSQL scripting, PostgreSQL session IDs, and clicking on that mysterious “Initial SQL” button in Tableau Desktop. For full disclosure I’d like to mention that I found the original idea for this approach on my colleague Bryant Howell’s excellent blog, where he outlines the process for Microsoft SQL Server. That said, the approach I’m showing here is not a mere translation from SQL Server to PostgreSQL, but I also added a few more features.

Initial setup

It all starts with creating an additional table on your PostgreSQL database, which we will use as the central repository for user sessions. To do that, we are using a nifty function called pg_backend_id() in the PostgreSQL language standard to read the process ID of the server process attached to the current session of this user. In more understandable terms, every time a user connects to a PostgreSQL database, they are assigned a process ID which is valid for this session. We will use this process ID as an identifier that, together with the Tableau Server username, will provide the row-level access to our data. More on that in the next section.

The SQL to create this session table is trivial:

DROP TABLE IF EXISTS superstore.tableau_sessions;

CREATE TABLE superstore.tableau_sessions
 (
     current_pid integer,
     tableau_server_user character varying COLLATE pg_catalog."default"
 )
 WITH (
     OIDS = FALSE
 )
 TABLESPACE pg_default;

 ALTER TABLE superstore.tableau_sessions
     OWNER to postgres; 

We will return to this table later when we talk about security topics.

Initial SQL

In Tableau, a new session on PostgreSQL is started when a user opens a connection, which happens both by establishing a brand new connection (in Tableau Desktop: File > New, or click on the Tableau sparkle, or Data > New Data Source, or Ctrl + D, or click on the New Data Source icon in the toolbar; as a Creator on Tableau Server: click on Create > Workbook > Connect to Data; in Tableau Prep Builder: click on Connect to Data), but also by connecting to an existing connection inside a workbook or as part of a published data source. We will use this fact to establish unique sessions as a combination of Tableau Server user name and PostgreSQL process ID.

To do this, we use the mystical Initial SQL option, that few of us ever touch. It allows us to send some, well, initial SQL query to the data source at connection time (see above). In our case, we’re querying the aforementioned process ID of the newly opened session, check if the current Tableau Server user already has a session entry in this session table, and either insert a new record if they didn’t exist, or update their record with the fresh PostgreSQL process ID:

DO $$
DECLARE selected_pids integer;

BEGIN
EXECUTE 'SELECT current_pid FROM superstore.tableau_sessions WHERE tableau_server_user = $1' USING [TableauServerUser];
GET DIAGNOSTICS selected_pids = ROW_COUNT;
IF selected_pids = 0
THEN
INSERT INTO superstore.tableau_sessions (tableau_server_user, current_pid) VALUES ([TableauServerUser], pg_backend_pid());
ELSE
UPDATE superstore.tableau_sessions SET current_pid = pg_backend_pid() WHERE tableau_server_user = [TableauServerUser];
END IF;
END $$;

If you’re unfamiliar with the syntax, this is PL/pgSQL, a nifty way to run scripts (“procedural language”) as part of your regular SQL queries to PostgreSQL databases. Read all about it in the official documentation. Also, note how we’re providing the currently logged-on Tableau Server username into the query by inserting the built-in parameter [TableauServerUser].

If we check on the database, this works nicely:

postgres=# SELECT * FROM superstore.tableau_sessions;
 current_pid | tableau_server_user
-------------+---------------------
        3024 | anna
(1 row)

You can also make your current process ID visible inside of Tableau Desktop by using a simple query as Custom SQL on this data source:

SELECT pg_backend_pid();

Now that we have a connection established between the current Tableau Server username and the associated PostgreSQL session, we can use this information to filter the data from our fact table(s) and implement the actual row-level security.

Initial results

In order for the row-level security to work, the fact table needs to be equipped with row-level information on who can see each row of data. This can be either already part of the table (i.e. there is a column with this information), or it can be enriched with a simple join. The latter is the scenario you are familiar with if you have ever seen a row-level security demo on Tableau using the Superstore demo dataset. For this example, I uploaded the default Tableau Superstore Orders and People tables to my PostgreSQL database, so to get to a user-filter enabled version I need to join them based on the [Region] fields in both tables. So that’s the left part of the data model you see below. The right part is of way more interest here, as this is where we use the information of the current user session from the table we just populated with our fact table. To do this, I’m using a short Custom SQL query:

SELECT "tableau_server_user", "current_pid" 
FROM "superstore"."tableau_sessions" 
WHERE "current_pid" = pg_backend_pid()
Data model combining the fact table(s) with the sessions table

The join then needs to be established between the field containing the username in your fact table (in my case it’s [Username] in the People table) and the [Tableau Server User] field from our sessions table. Make sure that both joins are INNER JOINS, so as to not have artifacts slip though, where either the region or the username are not set and result in NULLs! Also, please enable the “Assume Referential Integrity” option for this data model, to minimize the potential performance implications our dreaded Custom SQL might have:

Assume referential integrity for our data model

Publish this as a data source to your Tableau Server and you should see it works beautifully!

Initial concerns

Unfortunately, there are a number of security-related issues here.

The first one can be exploited very easily: just modify the Tableau Server username associated with a certain PostgreSQL process ID in our sessions table to hijack their session and see their data. Try it out yourself! Login to Tableau Server as user A, open a new workbook based on your data source on Tableau Server, drag the [Username] field to the view, and see A’s username. Then update the sessions table by running

UPDATE superstore.tableau_sessions 
SET tableau_server_username = 'B' 
WHERE current_pid = <user_As_current_pid>;

and refresh your view on the server. Shocker, you’re still logged on as user A but are seeing B’s username!

Luckily, it’s very easy to prevent this way of hijacking attack. Before publishing the data model as a data source to Tableau Server, we add a data source filter:

[Tableau Server User] = USERNAME()

Because we’re using the USERNAME() function here as reference value, you can’t just use the regular filter dialog, but have to enter this formula on the Condition tab of the filter dialog:

The Condition tab of a Tableau filter dialog

At least this makes this attack vector moot, but it’s still worrying that people can see and modify information in our sessions table. So as a next mandatory step we have to make sure to grant read and write permissions to this table only to the technical user we used to connect Tableau to the PostgreSQL database in the first place. In my case that’s a PostgreSQL role called postgres, and if you scroll all the way up to the section where we created the sessions table, you will see that this role postgres is also the owner of this table. Unless the default privileges on that database have been modified by the ALTER DEFAULT PRIVILEGES command, only the owner of a table should have any privileges on a table, so we’re good. But please make sure you read and understood the information in the official PostgreSQL documentation, and/or speak to your database admin. Also, run a quick check on your table:

postgres=# \dp superstore.tableau_sessions;
                                    Access privileges
   Schema   |       Name       | Type  | Access privileges | Column privileges | Policies
------------+------------------+-------+-------------------+-------------------+----------
 superstore | tableau_sessions | table |                   |                   |
(1 row)

You can go even one step further, if it bothers you that the usernames are stored in this sessions table in clear text. To do so, we employ the md5() function in the PostgreSQL vocabulary. That means that we will store MD5-hashed usernames in that table. Of course, as a result, any reference to the username anywhere in the process now needs to account for that. So here we go:

  • Luckily no changes necessary in the initial CREATE TABLE statement, as we setup the relevant column tableau_server_user as dynamic character varying.
  • The Initial SQL needs to be amended with calls to the md5() function in three places (the SELECT, the INSERT, and the UPDATE statements):
DO $$
DECLARE selected_pids integer; 

BEGIN
     EXECUTE 'SELECT current_pid FROM superstore.tableau_sessions WHERE tableau_server_user = md5($1)' USING [TableauServerUser];
     GET DIAGNOSTICS selected_pids = ROW_COUNT;
     IF selected_pids = 0 
     THEN 
         INSERT INTO superstore.tableau_sessions (tableau_server_user, current_pid) VALUES (md5([TableauServerUser]), pg_backend_pid());
     ELSE
         UPDATE superstore.tableau_sessions SET current_pid = pg_backend_pid() WHERE tableau_server_user = md5([TableauServerUser]);
     END IF;
 END $$;
  • The Custom SQL can remain unchanged.
  • But the join between our fact table and the Custom SQL on the sessions table needs to account for the change. Here, we can make clever use of the Tableau RAWSQL_*() functions and a Join Calculation to combine the hashed username from the sessions table with the “live hashed” version of the clear text username in the fact table:
A calculated join to MD5-hash the usernames on-the-fly.
  • That “live hashing” also needs to be applied to the user filter we set above, so the formula in the filter dialog looks like this:
[Tableau Server User] = RAWSQL_STR("md5(%1)", USERNAME())

The only issue with this hashing is, that unfortunately we can’t hide the column with the hashed usernames from this data source, at it is required for the join. So in theory, a malicious agent could look over your shoulder (or sniff the traffic to your client), take note of your hashed username, and – provided that they also gained access to the sessions table on PostgreSQL – hijack your session by altering the hashed username of the respective session with their own hashed username. Highly unlikely, but I want you to be aware of this possible attack vector.

In the following days I will also record a short video showing the process, so stay tuned.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.