This is one of those blog posts that is more of a novel than a short story, more of a marathon than a sprint. In this series of articles, I build a detailed case study that uses the combined power of Alteryx and Tableau to investigate worldwide climate data.
The fact that I chose climate data for this case study isn’t that important. You can substitute any data source into this framework and still have the same kinds of challenges to overcome as I show in this series. So even if you are not interested in climate data, there is a lot to learn by reading, studying and working your way through these articles.
The real reason for these articles is to show how a modern-day analytics project can be completed from beginning to end using these fabulous software tools. I want to document how Extract-Transform-Load (ETL) processes happen in Alteryx to produce data that can be so effectively visualized in Tableau.
Learning About Alteryx
When you begin using Alteryx, there is a lot to learn about the individual functions available to you in the software. You begin learning how each function works and what it accomplishes. That is only one part of the learning process, however. It is akin to taking baby steps when you first learn to walk.
The next stage of learning involves piecing together a series of functions to accomplish a particular goal, much like a baby has to learn to take multiple steps to begin walking. You start combining these individual operations to form a more complete workflow.
In a sense, you become a computer programmer using a series of high-level functions that gobble up data and then give it back to you the way you want it to be. You have to learn how to make data flow from its beginning to its ending point, as you modify its shape or structure, or combine it with other pieces of data.
The problem is that, in the real world, there aren't many complete examples available that show you how to do these things. Much of what you learn is accomplished through trial and error because there is no road map to follow. Part of the learning process is visualizing the steps needed to get from point A to point B to point C.
Furthermore, there are even fewer examples that show you how to interface the Alteryx-created data for effective visualization and quantitative analysis in Tableau. That is one of the reasons I took the time to conceptualize, create and document this work.
My Motivation and My Goal
The scarcity of easily accessible and complete Alteryx+Tableau examples is one reason I decided to write this series of articles. I want to document, in great detail, the steps necessary to complete a big, real-world problem that has enough complexity to warrant its use as a case study.
I want to build a case study that explains what it really means to perform an ETL-based analytics project, one that features significant data preparation work combined with significant quantitative and visual analysis. I want to write about each of the ETL steps in this type of project because this technique is becoming more important for data scientists to understand and complete. Furthermore, I want to show how these data manipulation steps can lead to great quantitative analysis and visualization in Tableau.
In the future, I hope that these articles will help me and others learn how to get things done by combining Alteryx with Tableau, starting with unprocessed data files and ending with the final graphics. As always, there can be multiple ways to accomplish certain tasks. However, the real value of this work lies not in the details, but rather in the total scope of what it accomplishes.
There are two caveats I need to add before launching into this project. First, due to the complexity of this case study (or project), some parts of these articles will likely be updated over time (even after initial publication) as I continue to add salient technical details where needed. Second, I don't know how many articles there will be in this series because there is a lot of information for me to cover.
This work is the culmination of a series of articles I wrote in 2014 that describe how to accomplish certain fundamental things in Alteryx, like how to read data from a flat file.
Figure 1 shows the Alteryx-based articles I have written, and you can access them by clicking here. From this Tableau workbook, you can launch any of the articles directly. All of these articles are stored on this blog. Notice in Figure 1 that I used a keyword search of "Alteryx" to generate the listing shown in the dashboard. Many of those articles represent what I call foundational building-block techniques. These techniques are used throughout this case study.
The reason I am showing these articles is that I will be referring to them in this work. By pointing you to those articles when the time is right, you will be able to learn the details of those particular techniques without my having to bog down this case study. Additionally, you are free to start reading those articles while I prepare the rest of this case study. That is why I wrote them first, followed by this article. Apparently there is a method to my madness, after all.
The Climate Data Example
I have chosen to use climate data as the basis of this case study. An enormous amount of time and effort has been spent in assembling this data from monitoring stations around the world. The project that maintains this data is ongoing, well documented, and is world-class. It gives us a lot of information to use in a case study of this magnitude.
This data source has been carefully chosen for this case study because it offers several things:
- Free data (that is updated monthly);
- Interesting data (worldwide climate data spanning hundreds of years);
- Big Data (billions of records and growing fast);
- Data that requires ETL operations;
- Important data (we all live on planet earth, don’t we?).
If you are able to take a project like this from the starting point of hitting this website to the ending point of visualizing data in Tableau, then you can do much of what you need to do on a typical real-world ETL-based analytics project.
Disclaimer: This project isn’t easy. It is ambitious. This project will likely take you many hours of work to follow and reproduce. However, if you are motivated to learn and work, you will be able to complete the job and produce your own results. By the time this series of articles is completed, I will have provided to you all the instructions and guidance needed to do the job.
Upcoming in Part 2
The project begins by grabbing and unzipping the data. You need to make sure you have at least 30 gigabytes (but preferably 50 GB) of storage available to you for this project. Hey, welcome to the real world. If you don't have this much space available, you can still process a smaller subset of the data in about 2-3 GB, so don't despair.
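Before Part 2 arrives, here is a minimal Python sketch of that first step: checking that you have enough free disk space and decompressing a downloaded archive. The function names are my own, and the actual download URL depends on the data source introduced later in the series, so only the space check and the unzip step are shown.

```python
import gzip
import shutil
from pathlib import Path

def ensure_free_space(path, required_gb=30):
    """Warn if the drive holding `path` has less than the recommended free space."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < required_gb:
        print(f"Warning: only {free_gb:.1f} GB free; {required_gb} GB recommended.")
    return free_gb

def extract_gz(src, dest_dir):
    """Decompress a .gz archive into dest_dir, returning the output file path."""
    dest = Path(dest_dir) / Path(src).stem  # "data.csv.gz" -> "data.csv"
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream the bytes; no full read into memory
    return dest
```

Streaming with `copyfileobj` matters here: with files this size, reading the whole archive into memory is exactly the kind of mistake the disk-space warning is trying to help you avoid.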
If you like this article and would like to see more of what I write, please subscribe to my blog by taking 5 seconds to enter your email address below. It is free and it motivates me to continue writing, so thanks!