Date Aggregations and Linear Modeling With R In #Tableau

Introduction

I promised to not write any more blog posts regarding mathematical modeling in Tableau, but I just could not help myself when I uncovered the Tableau/R behavior that I describe in this post.  I researched many documents to see if anyone has written about this issue, but I couldn’t find any information. If you use R-scripts in Tableau to do linear modeling like I do, this insight will help you save some time. I’ll keep this short, I promise.

Update May 2015

It took a while for me to revisit this issue, but now I have resolved the problem. The solution is shown as the last section of this post. Also, here is a link to a nice two-part series for using R to interact with your trend line in Tableau.


If you like this article and would like to see more of what I write, please subscribe to my blog by taking 5 seconds to enter your email address below. It is free and it motivates me to continue writing, so thanks!

Enter your email address to follow this blog and receive notifications of new posts by email.


The Issue Explained

I use R-scripts to do correlations and linear modeling in R within Tableau so that I can write the linear model equations to the title block, rather than having to cut and paste model results from the “Describe Trend Model” window of Tableau. Figure 1 shows an example where I am modeling a quantitative variable against a date field. This example is for a one-year time aggregation, Figure 2 is for a month aggregation, and Figure 3 is for a day aggregation. In these figures, you can see the slopes and intercepts returned by R in the title block as well as the trend model slopes and intercepts as described by the Tableau trend model. Figure 4 is a summary of the slopes and intercepts from Tableau and R.

What you see in Figure 4 is that you only get agreement between Tableau and R if the time aggregation is at the day level.  In fact, R returns positive slopes rather than negative slopes for the month and year aggregations, which is clearly not correct. I even tried scaling the dates (from months and years back to days) within the function calls to R and I tried many other tricks, but at this time I conclude that date aggregations at the day level are required for the “lm” package in R, as well as the “cor” function call.  Clearly with the slopes returned from R having the wrong signs, no amount of proportional scaling is going to change the result. Since I’m not an R expert, I suppose that there are some other ways to get Tableau and R to work together for time aggregations other than at the day level, but I don’t have time to uncover those methods right now.  Until someone is able to straighten me out or until I solve this mystery, I’ll stick with day aggregations for modeling cases of a variable vs time.

Figure 1 - Year Aggregation Results

Figure 1 – Year Aggregation Results

Figure 2 - Month Aggregation Results

Figure 2 – Month Aggregation Results

Figure 3 - Day Aggregation Results

Figure 3 – Day Aggregation Results

Figure 4 - Tableau Vs R, slopes, intercepts, and correlations

Figure 4 – Tableau Vs R, slopes, intercepts, and correlations

Figures 5 through 8 show the R-scripts used for this work.  To plot these variables in the Tableau title block as shown in Figures 1 through 3, all you have to do is set these variables on the level of detail shelf.  Once you place them on the detail shelf, you can create dynamic labels like those shown in the examples above.

Figure 5 - R Function call to return the slope

Figure 5 – R Function call to return the slope

Figure 6 - R function call to return the intercept

Figure 6 – R function call to return the intercept

Figure 7 - R function call to return the correlation coefficient

Figure 7 – R function call to return the correlation coefficient

Figure 8 - Calculated field for R-squared

Figure 8 – Calculated field for R-squared

 Solution to this Problem (May 2015)

During a series of working sessions with Joe Mako, I mentioned this problem and we decided to probe a little deeper into this behavior. The solution to the problem involved changing the formulas used in the R-scripts. For monthly aggregations, the slope formula is changed as shown in Figure 5a, the intercept in Figure 6a, and the correlation coefficient in Figure 7a. For monthly aggregations, the date field had to be truncated to the level of month for the Tableau solutions to match the r-script solutions. You also need to remember to set your compute using to Date.

Slope_month

Figure 5a – The r-script to calculate slope for monthly time aggregations.

Intercept_month

Figure 6a – The r-script used to calculate the intercept for monthly time aggregations.

Corrleations_month

Figure 7a – The r-script used to calculate the correlation coefficient for monthly time aggregations.

 Previous Work on Modeling and Tableau

You can find all my previous posts regarding modeling in Tableau by clicking on the Figure 9 picture below, which will take you to my Tableau public workbook that contains links to my blog post material.

Figure 9 - My blog posts for mathematical modeling in Tableau

Figure 9 – My blog posts for mathematical modeling in Tableau

5 thoughts on “Date Aggregations and Linear Modeling With R In #Tableau

    • Hi Russell,

      Thanks for the comment. It is very good for me to hear that there is at least one other person that sees the significance of posts like these! Like so many of my other posts, this is just basic stuff that seems like it should be easy, but somehow it doesn’t work as expected. I hope someone comes along and builds upon this topic (and others) to further shed insight into using R within Tableau. One of the leaders on this topic is Bora Beran, who is now working for Tableau and writes a blog at: http://boraberan.wordpress.com/. His stuff is really good and application oriented. However, there are still major gaps in documentation on how to do things. For example, try to find a document that tells you that R “lm” package returns the intercept as coefficient[1] the slope as coefficient[2], or even which variables (x or y) should be .arg1 and .arg2. I’ve tried to find basic information like that and it takes way too long to get anywhere. I really appreciate your comment and I appreciate your work very much!

      Thanks,

      Ken

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s