I previously wrote about solving seemingly impossible problems with Alteryx. While writing the workflows for the challenging problem I described in that article, I discovered a few interesting aspects of how Alteryx handles leading zero fields and performs date conversions. Since these topics took me a bit of testing to understand, I thought that these insights would be interesting to document and share. Of course there might be other techniques to handle the situations that I describe, but for me, the techniques I describe work reliably and there isn’t a lot of documentation on the web for these topics.
If you like this article and would like to see more of what I write, please subscribe to my blog by taking 5 seconds to enter your email address below. It is free and it motivates me to continue writing, so thanks!
The topic to be addressed in this blog post relates to very specific issues related to the handling of leading zeros fields (LZF). In an upcoming blog post, I’ll described the lessons I learned in handling date conversions. Understanding how Alteryx initializes and transforms leading zero data fields throughout a workflow is very important to be able to get correct results if data joins or other potentially complex operations are being completed. Additionally, you need to understand how Alteryx handles these items if you are exporting data directly to Tableau through a Tableau Data Extract.
Handling Fields With Leading Zeros
Fields that have leading zeros (i.e., 00012398745) can be tricky to properly control in your workflow because Alteryx can interpret these fields in different ways, depending upon where the data originates. This means that the way Alteryx views LZF fields depends upon the file type that contains the original data. If you read a LZF from a flat file, for example, you can specify at the time of data reading that you want these fields to treated as strings. All LZFs can be specified as strings in the XML file that is used to load the flat file, and the syntax for this is shown in Figure 1. By specifiying a LZF as a string, you will ensure that Alteryx will not drop the leading zeros from your data fields.
In contrast to reading a LZF field from a flat file, consider reading an LZF from an MS Access database. Figure 2 shows a portion of a database that has a LZF that is exactly 13 characters wide (the PCN field). In Access, these fields are properly formatted with 13 characters shown in Figure 2. When this database is read by Alteryx, however, the PCN field will automatically be mapped as a double precision number. There is no current routine in Alteryx that allows you to change the default data type for any field when you are connecting to a data source. In Tableau version 8.2 or later, it is possible to simply change the default data type before you complete your live connection to the database or before you create an extract. It would be nice if Alteryx would adopt a similar strategy at load time or provide a routine to configure the incoming fields.
Since the PCN is considered a double precision number, the leading zeros will be lost when you write this field out to a file. The leading zeros are considered inconsequential since the field is treated internally by Alteryx as a double precision number. However, if you are doing a data join between two data sources that use a leading zero field such as the PCN shown in the examples above, you will quickly realize that it is not possible for Alteryx to complete this join. Alteryx will report that a join cannot occur between a string and a floating point number. To solve the problem, you have to re-pad the leading zeros onto any LZF in any data sources like what was shown in Figure 2. This manipulation can be completed in using an Alteryx formula tool with the command shown in Figure 3. This solves the problem of LZFs that originate in sources like MS Access.
In general, when you are loading various data from a variety of data sources into a complex workflow, you need to pay attention to how LZF are characterized at load time by Alteryx. If your data resides in Excel, you might want to read this post I wrote to make sure you are handling your LZF’s properly. You might also want to consider reading this post about working with alpha-numeric fields.
In general, you must make sure that the data types match for proper pairing of data sets to occur in Alteryx. If you are doing data joins between big databases using LZF, you might get some matches for certain records that don’t have leading zeros. You could be fooled into thinking that everything is working correctly only to find later on that a lot of records were dropped because you didn’t pay attention to the details related to leading zero fields.