I create a lot of scatter plots in high-volume data situations. In this article, I show how beneficial it can be to use a new Tableau 10 feature called “highlighting” when you have a lot of data to view in scatter plot form.
The purpose of this work was simple. I wanted to better understand a phenomenon related to where customers live and where they chose to buy a product. I wanted to quickly visualize how far customers had to drive to pick up the product they chose to buy. In other words, I wanted to know how far the consumers were willing to go out of their way to get what they wanted.
For this work, I processed over 2 million transactions from all 50 U.S. states, with a maximum drive time of 6 hours (360 minutes) used as a limiting factor. Given this much time, a person could drive approximately 400 miles via the U.S. interstate system to get their purchase.
It took over 250 hours of computational time for Alteryx to perform the required data operations and to compute the drive times and distances between customers and products they purchased. I also computed the straight-line distance between the locations and the average drive speed obtained for each transaction.
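For readers who want to reproduce the two derived quantities outside Alteryx, here is a minimal sketch in Python. It assumes the straight-line distance is a great-circle (haversine) distance between latitude/longitude pairs, which the article does not confirm, and the function names are hypothetical. It is not the Alteryx workflow used for this project.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def straight_line_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def average_speed_mph(drive_miles, drive_minutes):
    """Average drive speed in miles per hour; None when the time is not positive."""
    if drive_minutes <= 0:
        return None
    return drive_miles / (drive_minutes / 60.0)
```

As a sanity check on the 6-hour limit, average_speed_mph(400, 360) works out to roughly 67 mph, which is consistent with interstate driving.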
I used the difference between the straight-line and drive distances to highlight some interesting situations where customers really had to go out of their way to pick up the product they acquired. Lastly, there are a lot of other analytics developed from this data, but a discussion of those concepts is not included in this article.
Using State-Level Highlighting With a Lot of Data
With over 2 million data points in a scatter plot, I was interested in using Tableau 10 highlighting to quickly identify anomalous data by state. By anomalous data, I mean points with a large difference between the straight-line distance and the drive distance. I was also interested in viewing the scatter of data points within a state to identify states where customers really had to work (i.e., drive out of their way) to get the products they wanted.
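The same idea can be sketched outside Tableau as a simple filter. The example below is an illustration only: the record layout, the field names, and the 50-mile threshold are all assumptions of mine, not values used in the workbook.

```python
from collections import defaultdict


def detour_miles(drive_miles, straight_miles):
    """Extra miles driven beyond the straight-line distance."""
    return drive_miles - straight_miles


def flag_anomalies(records, threshold_miles=50.0):
    """Group records whose detour meets the (assumed) threshold by state.

    Each record is a dict with hypothetical keys:
    'state', 'drive_miles', 'straight_miles'.
    """
    by_state = defaultdict(list)
    for rec in records:
        extra = detour_miles(rec["drive_miles"], rec["straight_miles"])
        if extra >= threshold_miles:
            by_state[rec["state"]].append(rec)
    return dict(by_state)


# Made-up rows for illustration only; not values from the article's data.
sample = [
    {"state": "MT", "drive_miles": 180.0, "straight_miles": 110.0},
    {"state": "NJ", "drive_miles": 12.0, "straight_miles": 9.5},
]
flagged = flag_anomalies(sample)
```

With these made-up rows, only the first record exceeds the threshold. In Tableau, the highlighter does this kind of isolation visually, without any threshold having to be chosen up front.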
To achieve my goal, I simply turned on highlighting by clicking the “Show Highlighter” field for a state as shown in Figure 1. That was tough – thanks Tableau!
Next, I added a worksheet action (Figure 2) to allow me to visualize the drive path from point A to B for the selected point of interest.
The hyperlink <Google Map> that was used in the worksheet action was created in Alteryx as shown in Figure 3. This same type of link could just as easily have been created in Tableau using a calculated field and a few well-chosen functions.
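To give a feel for what such a link contains, here is a minimal sketch in Python that builds a driving-directions URL from two coordinate pairs. It uses the documented Google Maps URLs directions format (api=1, origin, destination, travelmode). The exact link built in the Alteryx workflow is not reproduced in this article, so treat the function name, the parameter names, and the format choice as assumptions.

```python
from urllib.parse import urlencode


def google_directions_url(origin_lat, origin_lon, dest_lat, dest_lon):
    """Build a Google Maps driving-directions link between two lat/lon points."""
    params = {
        "api": "1",
        "origin": f"{origin_lat},{origin_lon}",
        "destination": f"{dest_lat},{dest_lon}",
        "travelmode": "driving",
    }
    # urlencode percent-encodes the comma between latitude and longitude.
    return "https://www.google.com/maps/dir/?" + urlencode(params)
```

A Tableau calculated field would do the same job with string concatenation over the latitude and longitude fields, and the worksheet action would then open the resulting URL.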
In the movie shown below, I allowed the hyperlink to launch a new browser window to show the Google map. I also have a version where the map is displayed in a Tableau dashboard within a web object.
This link opens a drive route map as shown in Figure 4. This Google map drive path is independent of the one calculated by Alteryx. Alteryx uses the TomTom mapping system to compute the drive path, but every time I have compared results, they have been remarkably similar. In this example, the Google drive distance was 313 miles and Alteryx reported 313.5 miles. A random sample of other cases agreed just as closely.
By visualizing the scatter of any individual state (Figure 5) with respect to all the other states, I can make an assessment of how rural a state is and how hard it is for the people to travel from home to the product purchase locations. Highlighting makes this comparison so easy to see.
To hear me talk about the intriguing nature of this work and to see some interesting, randomly selected results, you can watch the video below. This is one of the most interesting and fun Tableau workbooks I have ever created.
After making the video, I realized that the relatively few cases where the straight-line distance is greater than the drive distance occur at rural sites where TomTom cannot perfectly identify the road network. I realized this once I started viewing a few of these in Google Maps, all of which showed unknown road networks near point A or B.
This is an example of a project that uses the power of Alteryx to do a zillion calculations and then sends the output to Tableau to very rapidly analyze the results.
Although I haven’t gone into detail about why I wanted to do this work, the conceptual framework shown demonstrates how incredibly powerful these two tools can be when used in combination. Finally, there were other uses of highlighting (different dimensions) for this particular application, but I chose not to show them to keep this article short and sweet.
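Since a drive can never be shorter than the straight line between its endpoints, these cases can be flagged automatically as a data-quality check. The sketch below is a hypothetical helper of my own, not part of the workbook, and the small tolerance for rounding is an assumed value.

```python
def is_suspect_route(drive_miles, straight_miles, tolerance_miles=0.1):
    """True when the straight-line distance exceeds the drive distance.

    Such a record is physically impossible, so it points to a routing or
    geocoding problem rather than a real trip. The tolerance is an
    assumption that absorbs rounding differences.
    """
    return straight_miles - drive_miles > tolerance_miles
```

Records flagged this way are good candidates for a manual look in Google Maps before they are trusted in any analysis.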