Imagine you have Tableau dashboards that require row-level security on data coming from PostgreSQL, but the connection to that database is established through the same technical user for everyone. How can you achieve that? Is it even possible? Short answer: it’s possible – and it’s not even that complicated, provided you’re not afraid of some PL/pgSQL scripting, PostgreSQL session IDs, and clicking on that mysterious “Initial SQL” button in Tableau Desktop. In the interest of full disclosure, I’d like to mention that I found the original idea for this approach on my colleague Bryant Howell’s excellent blog, where he outlines the process for Microsoft SQL Server. That said, the approach I’m showing here is not a mere translation from SQL Server to PostgreSQL; I also added a few more features.
When trying to fit a machine learning model on a very wide data set, i.e. a data set with a large number of variables or features, it is advisable for a number of reasons to try to reduce the number of features:
- The models become easier to understand and their output easier to interpret, which ultimately leads to results that can be trusted more readily than those of a complex black-box model.
- Excluding strongly correlated features can prevent model bias, since a group of correlated variables could otherwise gain greater influence on the model than they actually have.
- Similarly, it can help avoid the curse of dimensionality, which is a particular concern with very sparse data.
- Finally, the performance of the model can be optimized, as training times are shorter and the models are less computationally intensive.
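To make the point about strongly correlated features concrete, here is a minimal sketch of a correlation filter. It is not taken from the scripts discussed in this article (those are written in R); Python is used purely for illustration, and the function names, the toy data, and the 0.9 threshold are all assumptions chosen for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is below threshold.

    `columns` maps feature name -> list of values. Insertion order decides
    which member of a correlated pair survives (the earlier one is kept).
    The threshold of 0.9 is an arbitrary assumption, not a recommendation.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: height_in is just height_cm expressed in inches.
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_size": [38, 44, 37, 45, 40],
}
selected = drop_correlated(data)
print(selected)
```

In this toy data set the redundant `height_in` column is dropped because it is almost perfectly correlated with `height_cm`, while the weakly correlated `shoe_size` is kept. A greedy filter like this is only one of several possible strategies; which feature of a correlated pair to keep is a judgment call that depends on the data.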
While developing my talk “Machine Learning, Explainable AI, and Tableau”, which I presented together with Richard Tibbets at Tableau Conference in November 2019 in Las Vegas, I wrote a number of R scripts to perform feature selection and its preliminary tasks in Tableau. Because I received so many questions about those scripts after the presentation, I decided to put together this article explaining precisely what I did there, in an attempt to make the “Tableau Feature Importance Toolbox” – as I’m calling the collection of scripts – available to the interested public. At a later point I will also summarize the contents of our talk in an article here on the blog, but for now you can find details about the scripts below, as well as the actual code files in my GitHub repository.