Linear regression line in Tableau calculated by external R script

Run code from an external .R file in Tableau

Tableau introduced the R integration in version 8.1 back in 2013. That’s awesome because it opens up to Tableau the whole range of analytical functionality R offers. Most of the time the R code being triggered from within Tableau is rather short, such as a regression, a call to a clustering algorithm or correlation measures. But what happens when the code you want to run out of Tableau is getting longer and more complicated? Are you still bound to the “Calculated Field” dialog window in Tableau? It’s nice but it’s tiny and has no syntax coloring or code completion for our precious R code.

Run R code inline in a Calculated Field in Tableau

Run R code inline in a Calculated Field in Tableau

But things will get arduous when trying to write longer code. So why not write the code externally in your favorite R IDE (something I recommend doing anyways – even with shorter scripts!), save it to a .R file and reference that code from Tableau’s calculated field? Sounds like a great idea, and it’s totally possible. Let’s try it out, shall we?

The simple and straightforward attempt

First attempt: Let’s just outsource the exact code we used in the calculated field to the .R file and run source() within the Tableau calculated field. Here’s the code which I then saved into a file called Rcode.R in my Windows machine’s Downloads folder:

model <- lm(.arg1 ~ .arg2)
print(summary(model))
predict(model, interval = 'prediction')[, 1]

And here’s the accompanying formula for Tableau – please note that the path within the setwd() function needs to be modified according to where you put your R code file:

SCRIPT_REAL("
    setwd('C:/Users/kgreger/Downloads')
    source('Rcode.R')
", 
SUM([Profit]), 
SUM([Sales]))

Did it work?

Tableau error message: The result returned by the SCRIPT function is of an unexpected type.

Tableau error message: The result returned by the SCRIPT function is of an unexpected type.

Nope, that’s not it.

The slightly more involved attempt

Next let’s try to call the external R code file and pass the input data from Tableau as command line parameters. This is the slighly modified R code – note that the first command line parameter commandArgs[1] is the name of the script itself, hence we use arguments 2 and 3 here:

model <- lm(commandArgs[2] ~ commandArgs[3])
print(summary(model))
predict(model, interval = 'prediction')[, 1]

And the accompanying formula for Tableau:

SCRIPT_REAL("
    setwd('C:/Users/kgreger/Downloads')
    system(paste('Rcode.R', .arg1, .arg2))
", 
SUM([Profit]), 
SUM([Sales]))

Did it work?

Tableau error message: R's system() expects character string as first argument.

Tableau error message: R’s system() expects character string as first argument.

Still not quite what we want. The problem here is that paste()ing together an atomic string ("Rcode.R") and two vectors with length greater than 1 results in a non-atomic vector – which system() can’t handle. You can check that yourself by running this R code snippet:

str(paste("text", c("a", "b")))

A possible solution would be to translate the two input vectors into strings such as "c(10, 20, 30)", pass these to the R code file and re-translate them into proper vectors. Sounds like a lot of work, both in terms of implementing it (which is why I didn’t even try that) and also in terms of what Tableau, RServe and R would have to deal with. Just imagine how long these two vectors would be with the 5,009 elements as shown in the workbook that accompanies this blog post… (Right click > Save target as...)

The not so straight-forward but pretty slick (and working!) attempt

OK, so let’s think about what’s actually happening here to see if we can find a better solution for this. What we want to do is run code that’s written in an external file (our Rcode.R), but use data from Tableau, which only lives in the Tableau fields or the meta-fields .argn. If you think about it, that sounds like function call. And that’s actually the solution to our problem!

So within the R file we wrap our code in a user defined function, like this – I decided to give it a prefix with my initials and the name of the main function it’s using, here lm, hence kg.lm:

kg.lm <- function(y, x) {
  model <- lm(y ~ x)
  print(summary(model))
  predict(model, interval = 'prediction')[, 1]
}

This is really neat, now we just have to modify the call from the calculated field in Tableau to use our newly created function:

SCRIPT_REAL("
    setwd('C:/Users/kgreger/Downloads')
    source('Rcode.R')
    kg.lm(y = .arg1, x = .arg2)
", 
SUM([Profit]), 
SUM([Sales]))

Did it work this time?

Linear regression line in Tableau calculated by external R script

Linear regression line in Tableau calculated by external R script

Now ain’t that pretty? It totally worked! And neither the R code nor the syntax in Tableau’s calculated field became significantly longer or more complex. There’s a few other nice aspects about externalizing your R code this way (this starts to sound like one of those home shopping shows on TV…):

  • You can re-use your R code in multiple calculated fields within one workbook or even across multiple workbooks and still have only one place to manage/edit/update the code. Imagine you developed a nice machine learning model with a few hundred lines of code and suddenly discover a bug – you wouldn’t want to have to browse through all your workbooks using this model to correct the bug there, would you?
  • You can put many of these user defined functions in one single R file, source() this in your calculated fields and use just the functions you need.
  • The previous fact is especially interesting regarding the fact that you can preload external R resources when firing up your RServe session. This needs to be done in a file called Rserv.cfg, which needs to be placed in the RServe folder: DRIVE:\INSTALLATION_PATH\R\R-VERSION_NUMBER\library\Rserve\libs\x64 (or \i386 if you’re on a 32-bit machine) – the parts in italics need to be set accordingly. There’s a lot of documentation about what can go in the Rserv.cfg file, but for our purposes here it is sufficient to put the following line for each external R code file you’d like to preload:
    source C:/MyFolder/sample.R
    

And there you go: Running code from an external .R file in Tableau’s calculated fields. Enjoy and let me know if you run into problems or know an even easier way of doing this. If you want to skip all the hard work and just look at the ready-made thing, feel free to download the workbook that accompanies this blog post… (Right click > Save target as...)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*