If you are an experienced Alteryx user that has the privilege to use the “complete package of Alteryx”, which includes mapping, demographic and business data, you may already have techniques to handle the issues that I discuss in this blog post series. This series will be focused on using Tableau to help with Alteryx usage, especially for new Alteryx users. New Alteryx users (myself included) might find some benefit to the concepts outlined in these posts. Experienced Alteryx users might scoff at my inexperience with the tool. However, even with this risk, I plan on writing how I overcame some new-Alteryx user limitations by using Tableau Desktop during my first month of usage of Alteryx. I can see that there is a natural fit between Tableau and Alteryx and I hope to explain how I use these tools in complementary ways to accomplish some really amazing work. Much of what I can present in this series is conceptual, however, because I cannot show client-specific data.
If you like this article and would like to see more of what I write, please subscribe to my blog by taking 5 seconds to enter your email address below. It is free and it motivates me to continue writing, so thanks!
How Can Tableau Supercharge Alteryx?
When you begin using Alteryx, the program and its workflow based-structure can seem overwhelming. There are so many functions to choose from and there are many ways to accomplish a particular task. You have to decide how to establish the workflows that you want to use in a way that make sense to you. You have to learn about how the Alteryx functions work and how to connect them to extract information from the databases that you now have access to. These databases are extensive and they change over time through an excellent quarterly update system provided by Alteryx. For the United States in the first quarter of 2014, the Experian supplied databases (principally demographic data) have over 35,000 fields and the Dunn and Bradstreet database has over 15,000 business-related fields. The sheer number of databases and data fields in the complete version of Alteryx can take some time to review and comprehend.
Since some fields can be extracted from more than one database, it is important for the new user to understand the vintage and content of these databases. Not only that, if you are searching for a particular type of data such as population statistics for the Latino ethnic group, you have to know a-priori which part of the Experian databases you want to extract your data from. Even more specific, if you are searching for information on the Puerto Rican populations in an area, you would have to know to have Alteryx search for these variables in four databases including US Census 2000, 2010, Current year 2013 and Current year 2014, as shown in Figure 1.
This means you have to understand the database hierarchy, including categories, subcategories and database vintage to find the information you are looking for. You need to decide if you to want to use current-year estimates or if you would want to use data from the 2010 census, for example. The problem that a new user faces is being able to quickly find the fields that they are interested in and then figuring out how to extract this data for usage on your projects. If you make a mistake and tell Alteryx to extract all variables for your locations, chances are good that your workflow will either die or take so long to accomplish the task that you will be working for another company before the job is complete. As shown in Figure 2, I was searching for a particular topic and I chose two primary databases as shown, with only a limited subset of categories chosen for each. The number in the lower right corner (1903) tells you how many fields will be sent to your output file, so pay attention to this number. Many of my extracts were in the thousands of fields and the more you have, the longer the job takes to complete.
The potential power of Alteryx is explosive, but its ultimate utility to a project will largely be dependent upon the skills of the person creating the workflows and handling the data input and output tasks. For users like me that happen to have years of experience with GIS projects in scientific disciplines, using Alteryx is very natural and intuitive. I understand the concepts and workflows because I’ve done them before, although they were done with different software. However, there still is a learning curve for any new user because you have to learn the details of the Alteryx system. You have to understand how geo-referenced data elements can be tied together using the Alteryx functions. Once you begin understanding these concepts, you can quickly create data sets to use in business intelligence projects by extracting data from the Experian and Dunn and Bradstreet databases. To complete your analysis, Alteryx allows you to extract these data sets to multitude of output files including *.csv files or Tableau *.tde files, if you so choose. For me, I ended up using both types of files as described in this blog post series.
If you are trying to extract data from these databases for specific purposes at specific locations, for example, the data that Alteryx sends out for you to analyze has the field names encoded as alpha-numeric characters as shown in the Fieldname column of Figure 3. There are no descriptions of what these codes mean in the output file, although the definitions exist in the Alteryx-provided Excel file shown in Figure 3.
To understand what you have received from Alteryx, you have to do one of two things. First, you have to develop a more advanced workflow that is capable of looking up the descriptions of these alpha-numeric codes from the Excel workbook shown in Figure 3. This workbook has a large number of worksheets for each of the databases that you can access in Alteryx, and on each of these worksheets is information that describes the alpha-numeric fields and what these fields represent (see worksheet tabs at the bottom of Figure 3). If you are a new user of Alteryx like me, developing a workflow like this might be challenging and time consuming. Secondly, you can develop an efficient method to post-process your exported file to look up the descriptions of the alpha-numeric field names you have received from your workflow. This post-processing step can be labor intensive, so I added some Tableau magic to my work process to make my life easier. This is the first method I developed to supercharge my Alteryx experience, and this is the topic of this blog post.
How Did Tableau Help Me Understand My Alteryx Exports?
Why would anyone want to use Tableau to help understand the descriptions and values of the alpha-numeric fields received from an Alteryx export? This seems completely counter-intuitive and isn’t a normal usage of Tableau. If you have been a reader of this blog, you will already understand that Tableau is my data engine, my data librarian, and my data artist all rolled up into one. I commonly do things that go beyond the normal use cases of Tableau. This is one of those examples, and here is how I did it.
First off, you might ask what is the big deal about looking up the descriptions of a few alpha numeric codes? Anyone can do vlookups, or perform index/match operations in Excel. Well, if you had to do what I recently did, you would be spending a long time looking up the codes. I extracted thousands of variables across many of the databases accessed by Alteryx at hundreds of locations to perform statistical analyses for a project. Would you really want to customize your look-ups across all those Excel worksheets to accomplish this task ? Trust me when I tell you that it wouldn’t be a wise investment of your time because you would have to figure out which sheets you would have to reference to find the field descriptions for all of the thousands of alpha-numeric variables. This task would bog you down to the point of extreme inefficiency so much that you probably could not get your job done in the assigned amount of time.
To solve the problem, I constructed a file to send to Tableau that contained the entire content of the Experian databases (as shown in the worksheet tabs at the bottom of Figure 3). I did this so that Tableau could be my librarian by making the linkage between the “Fieldname” field that you get in your Alteryx export and the “Description” field that are shown in Figure 3. This linkage allow me to quickly understand the content of the exported files I received from Alteryx. By doing a Tableau data join (using the “Fieldname” variable) between my Tableau file that contains the Experian codes and descriptions with my Alteryx exported file, I was able to immediately identify each of the fields that I had in my exported file. This made it easy for me to determine which parameters were most significant in my analysis, without having to look-up the field descriptions for all of the alpha-numeric codes. The amount of time this saved me was enormous (days of work) and it only took me about 1 hour to to create the file that contained the content of the Experian database.
There was one interesting aspect to this work that I should mention. This work was done in Tableau 8.2 with the new database driver that replaced the Jet driver. When I first tried to make the connection between the “Fieldname” variable in my Tableau file that contained the Experian definitions and the Alteryx export file, I did so by loading each file as a data source and I tried using the linking field approach to make the connection. This approach only picked up about 1/2 of the field definitions for the alpha numeric codes. There were about 1/2 of the codes that were reported by Tableau as Null fields. When I did a data join between the two files, however, the definitions for all the alpha numeric fields were found. This approach worked perfectly, so now I know that I just have to do a data join to gain access to the field definitions. At this time, I have no explanation to offer as to why the linking field approach did not work as expected although the data join did work perfectly.
Finally, there is one additional usage of this Tableau file that contains the content of the Experian databases. I use it all the time to find out if certain information is stored in the Experian databases. For example, I searched the Tableau file to find information on the term “graduate”, as shown in Figure 4, With this information, I am able to quickly customize my Alteryx workflows and create the proper configuration settings for the data I want to extract. I can search for anything I want and Tableau gives me the direct insights I need to know which Experian databases (see the Vintage field in Figure 4) I need to include in my workflows. This also saves me a huge amount of time.
Upcoming in Part 2
In Part 2 of this blog post series, I’ll be discussing the Dunn and Bradstreet database and how I have discovered some interesting aspects of this information.
Part 2 has now been published, so click here to read it.