Tableau Feature Importance Toolbox as Tableau Prep flow

Feature Selection in Tableau

When trying to fit a machine learning model on a very wide data set, i.e. a data set with a large number of variables or features, it is advisable for a number of reasons to try to reduce the number of features:

  • The models become easier to understand and their output easier to interpret, which ultimately leads to results that can be trusted more readily than those of a complex black-box model.
  • Excluding strongly correlated features can prevent model bias, since the shared effect of several correlated variables could otherwise gain greater influence on the model than it actually has.
  • Similarly, it can help to avoid the curse of dimensionality, in the case of very sparse data.
  • Ultimately, the performance of the model can be optimized, as training times are shorter and the models are less computationally intensive.
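To make the point about strongly correlated features concrete, here is a minimal sketch in Python of a greedy correlation filter. It is not taken from the R scripts discussed in this article; the function names, the sample columns, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship to anything
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| against every already kept feature is <= threshold.

    `columns` maps feature name -> list of values (hypothetical input format).
    The threshold of 0.9 is an illustrative assumption, not a recommendation.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up sample data: height_in is just height_cm in different units.
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [34, 21, 45, 28, 39],
}
selected = drop_correlated(data)
print(selected)
```

The filter is order dependent: of two strongly correlated features, whichever comes first is kept. A real workflow would usually decide which one to keep based on domain knowledge or on the feature's relationship with the target.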

While developing my talk “Machine Learning, Explainable AI, and Tableau”, which I presented together with Richard Tibbetts at Tableau Conference in November 2019 in Las Vegas, I wrote a number of R scripts to perform feature selection and its preliminary tasks in Tableau. Because I received so many questions about those scripts after the presentation, I decided to put together this article explaining precisely what I did there, in an attempt to make the “Tableau Feature Importance Toolbox”, as I’m calling the collection of scripts, available to the interested public. At a later point I will also summarize the contents of our talk in an article here on the blog, but for now you can find details about the scripts below, as well as the actual code files in my GitHub repository.

Continue reading →
A histogram with an overlaid bell curve

Jingle Bells – Adding a Normal Distribution to a Histogram in Tableau

It’s the holiday season, so why not amp up your vizzes’ holiday spirit by adding some bell curves to your histograms? I also recently came across this request in a customer meeting and discovered in the process how easy it is to do. The most difficult part is wrapping your head around what a normal distribution is (please resort to Wikipedia for that), how it’s calculated (I literally stole the equation from Wikipedia) and how to translate that into a Calculated Field in Tableau. The rest is a simple dual-axis chart, a parameter and some rather basic Tableau techniques that need your attention.
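For readers who would like to see the arithmetic outside Tableau first, here is a minimal sketch in Python of the normal density, f(x) = 1 / (σ√(2π)) · exp(−(x − μ)² / (2σ²)), evaluated once per histogram bin. Multiplying the density by the number of records and the bin width, so that the curve lines up with a histogram of counts, is an assumption of this sketch, as are the function names and the sample values; the Calculated Field in the full post may be set up differently.

```python
import math
import statistics


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std dev sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def bell_curve_counts(values, bin_width):
    """Return (bin_start, observed_count, expected_count) for each non-empty bin.

    expected_count scales the density by n * bin_width so it is comparable
    to the bar heights of a count histogram (an assumption of this sketch).
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    n = len(values)

    # Assign each value to the bin whose lower edge is a multiple of bin_width.
    bins = {}
    for v in values:
        start = math.floor(v / bin_width) * bin_width
        bins[start] = bins.get(start, 0) + 1

    rows = []
    for start in sorted(bins):
        center = start + bin_width / 2
        expected = n * bin_width * normal_pdf(center, mu, sigma)
        rows.append((start, bins[start], expected))
    return rows


# Made-up sample values for illustration only.
rows = bell_curve_counts([1, 2, 2, 3, 3, 3, 4, 4, 5], bin_width=1)
for start, observed, expected in rows:
    print(start, observed, round(expected, 2))
```

In a dual-axis chart, the observed counts would be the bars and the expected counts the line; the bin width plays the role of a user-controlled parameter.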

Continue reading →

Embedding R Plots in Tableau Dashboards

“R you nuts?” is what my colleague asked me when I once proposed this little hack. He’s not completely wrong, we’ll get to that later…

The task I was presented with was to embed the graphical output from an R package in a Tableau dashboard. Of course it’s possible to run R code from within Tableau Calculated Fields; you can read more about it in official Tableau resources here, here, and here and also here on my blog. But part of the game is that only one vector of data is returned from the R session via Rserve into a Table Calculation in Tableau. So what about some of the complex graphics R can produce? Sure, you can try to rebuild those natively in Tableau based on the data returned from the code. But what if a) you’re too lazy to do that (and it’s all just about rapidly prototyping something anyway), or b) the visualization is just too complex (think 3D brain models)?

Continue reading →