Preamble and Motivation For This Work
I have a somewhat modest goal that will require a lot of computations to complete. I invite you to come along for the ride because it is going to be interesting.
I want to crunch gigabytes of daily temperature data from every monitoring station around the world to determine the rate of temperature change over time for each month of the year, over the past 57 years, for every monitoring station that has complete records. Once I have all those trends calculated, I want to compute the total temperature change at each location and build interactive dashboards to check some ideas I have about the state of global warming in 2017.
This isn’t really a modest goal; it is an ambitious one. Trillions of calculations will have to be completed, and the software will be put to a serious test. Not only am I going to do this as part of my Tableau vs Power BI series, I’m also going to add Alteryx to the competition. If you like big data, climate change, scientific computing, trend modeling, and data visualization, get ready to take a ride along with me.
This is the first of three technique-based articles that I will use to demonstrate how this type of work can get done in Tableau, Alteryx and Power BI. Advantages and disadvantages of each software platform will be determined and documented.
In each of these first three articles, I use a small test case: identify the monitoring station in Alaska that has experienced the most warming over the past 57 years in the month of April. I begin in this article with Tableau, and then I will repeat the test in Alteryx and Power BI. Once these technique articles are completed, the full worldwide competition will begin and will form article 7 of my Tableau vs Power BI series.
Having been a student of Tableau for nearly a decade now, I have observed that Tableau has a very nice feature that is highly under-utilized: mathematical modeling. Specifically, I am talking about using Tableau trend models to perform time-series or trend analysis for a large number of items stored in big data sets. Once the trend models are complete, the total change over the time period of interest is a simple calculation: the slope of the model multiplied by the elapsed time (delta time * slope).
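For readers who want to see the delta time * slope arithmetic outside of Tableau, here is a minimal sketch in Python. It is not what Tableau runs internally; it assumes a straight-line (linear) trend fitted by ordinary least squares with NumPy, and the temperature series is made up purely for illustration. The function name total_change is my own invention.

```python
import numpy as np

def total_change(years, temps):
    """Fit a straight line by least squares; return (slope, slope * delta time)."""
    years = np.asarray(years, dtype=float)
    temps = np.asarray(temps, dtype=float)
    # Measure time from the first year so the fit is well conditioned
    elapsed = years - years.min()
    slope, _intercept = np.polyfit(elapsed, temps, 1)
    delta_time = elapsed.max()  # length of the period in years
    return slope, slope * delta_time

# Made-up series (an assumption, not real station data):
# a perfectly linear rise of 0.1 degrees per year from 1960 through 2016
years = np.arange(1960, 2017)
temps = 30.0 + 0.1 * (years - 1960)

slope, change = total_change(years, temps)
print(f"slope = {slope:.3f} degrees per year, total change = {change:.2f} degrees")
```

With real data the readings scatter around the line, so the total change is an estimate from the fitted trend rather than the difference between the first and last readings.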
In this article, I use daily temperature data from monitoring stations in Alaska to demonstrate some highly effective techniques for quickly performing trend analysis on a large number of items in a data file. In this case, the items are climate monitoring stations and the data is daily maximum temperatures.
There have been many occasions on which I have used trend models in Tableau. In fact, several years ago I did a close examination of the available trend models and wrote a series of at least 10 articles. You can click here to review my Tableau mathematical modeling series in preparation for what I am going to show in this article. Within the global climate quantified series, I have used this technique to uncover some really interesting results (which has led to the current work).
The Goal of This Work
The goal of this work is to rapidly determine which temperature monitoring station in Alaska has shown the largest increase in maximum daily temperature between 1960 and 2016. This question arose recently when I wrote about buckling roads and melting permafrost in Bethel, Alaska. The data I use for this example is daily data for the month of April, which contains a maximum of 1,708 daily readings (1960-2017) per monitoring station.
Trend Modeling In Tableau
The video shown below describes how trend modeling can be easily completed in Tableau. I am not going to write out the instructions because reading them would take you far longer than watching a short video. Throughout the climate series, there are a number of other videos like this one that demonstrate how you can perform this type of work.
There really was no need for me to benchmark how long this took me to do. I know that it could be done in a matter of minutes because I have done things like this many times before. When we get to the full competition, I’ll try to rank the software performance so that we can learn which package completes the job most effectively and efficiently.
Figure 1 contains the Tableau trend results for the five monitoring stations in Alaska that have shown the most warming in April over 57 years, when all the daily maximum temperature (T max) readings are used to compute the trend. Station USW00026510, located in the interior of Alaska, has shown the most warming over this period: about 10 degrees in April since the beginning of the 1960s.
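The same kind of ranking can be sketched outside of Tableau. The Python example below is only an illustration of the idea, not the workflow from the video: it assumes a table with hypothetical columns named station, date, and tmax, fits a straight line per station by least squares, and sorts the stations by slope times elapsed time. The station IDs and temperatures are synthetic and invented for the example.

```python
import numpy as np
import pandas as pd

def rank_stations_by_warming(df, top_n=5):
    """Fit a linear trend per station and rank by total fitted change."""
    rows = []
    for station, grp in df.groupby("station"):
        # Time in fractional years since the station's first reading
        t = ((grp["date"] - grp["date"].min()).dt.days / 365.25).to_numpy()
        slope, _intercept = np.polyfit(t, grp["tmax"].to_numpy(), 1)
        rows.append({
            "station": station,
            "slope_per_year": slope,
            "total_change": slope * t.max(),  # delta time * slope
        })
    result = pd.DataFrame(rows).sort_values("total_change", ascending=False)
    return result.head(top_n).reset_index(drop=True)

# Synthetic April readings, 1960 through 2016, for three invented stations
dates = pd.to_datetime(
    [f"{y}-04-{d:02d}" for y in range(1960, 2017) for d in range(1, 31)]
)
t_years = np.asarray((dates - dates[0]).days) / 365.25
rng = np.random.default_rng(0)  # fixed seed so the example is repeatable

frames = []
for name, rate in {"FAKE_A": 0.05, "FAKE_B": 0.15, "FAKE_C": -0.02}.items():
    tmax = 40.0 + rate * t_years + rng.normal(0.0, 2.0, len(dates))
    frames.append(pd.DataFrame({"station": name, "date": dates, "tmax": tmax}))
df = pd.concat(frames, ignore_index=True)

ranking = rank_stations_by_warming(df, top_n=2)
print(ranking)
```

Each station is fitted independently, so the loop scales to as many stations as the file contains; the only per-station inputs are the reading dates and the maximum temperatures.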
One of the primary benefits of doing this work has not yet been explained or explicitly demonstrated. That benefit will become clear after the first three articles are written and published. You are going to have to wait for Part 7 of the Tableau vs Power BI series to find out!
Next, Alteryx will be used to do this same test (click here for the article). After that, Power BI will step up to the plate.