I originally wrote this article 1.5 years ago, in March, 2015. I believe it has sat languishing in my draft blog box, like a forlorn Grossery Ganger that has long been forgotten! I believe that it is now time to get on with it.
Tonight I decided to make an instructional video to get this concept completed. The article is less than I wanted it to be, but yet somehow may be more than I hoped it would be. I think that the instructional video will help people in the future, so I decided it was time to unleash this beast as I continue to work through my blog backlog.
Was this a simple case of procrastination on my part? Or did I stop writing the article so that I could learn the macro techniques better than when I started writing the article? Only I know that answer, but I encourage you to watch Adam Grant in this TED talk (Figure 0) for additional enlightenment on the subject of original thinkers and procrastination.
When you are a serious mountain climber, you eventually try to reach the highest peaks. As you ascend towards the heavens, you try to gobble up air that just isn’t there. You may eventually need supplemental oxygen to make it to the top.
That might be true for most people, unless you are the amazing, superhuman Wim Hof. Wim climbed nearly to the top of Mt Everest without oxygen, while wearing only shorts! Go ahead and watch the documentary and read the article if you don’t believe me. If I can find the time to write another article, you will soon be reading about how I practiced a method similar to the Wim Hof method starting in 1979 or so, which was at least 10 years before he discovered the methods he now practices.
Now getting back to the story, when you work with Alteryx, your skills can improve dramatically over time. You might find yourself becoming more and more capable of producing workflows that are efficient, fast, and that can generate a lot of useful information.
Eventually, you might find yourself producing workflows that cannot be executed because too much information is being produced within the workflow or the workflow is taking too much memory or time to complete. These are the times when you need to reach for the supplemental oxygen. In this case, the oxygen comes to you in the form of Alteryx macros.
I am still learning a lot about Alteryx on a daily basis, that is for sure. I am just a guy that wants to understand how to better use one of the most fantastic software packages ever created.
By the time this article is complete, I will have authored several batch macros because I am going to produce examples that help me (and hopefully you) to better understand Alteryx macros.
For these reasons, I might not produce perfect macro solutions to the problems I am attacking. However, I promise you that my approaches will be well-documented and will work. The macros will be for real-world problems that were developed by me as I strive to accomplish previously impossible tasks in Alteryx.
A long time ago, I read something that has driven me as a teacher and as a scientist. The great physicist Richard Feynman once said that if he couldn’t explain a topic to a kindergarten-age child, then he didn’t really understand the topic.
At the time of the initial writing of this article, I do not completely understand Alteryx macros. By the time this article is finished, I will have a better understanding of them because I will teach myself new concepts. If I do the job the right way, you will have a better understanding of them, too.
What Are Alteryx Macros?
Alteryx macros can be written in various forms including (1) standard macros, (2) batch macros, and (3) iterative macros. These techniques appear to me to generally increase in complexity from 1 to 2 to 3. Each type of macro allows a process (workflow) to be repeated until all records or groups of records (or a specified number of them) pass through the macro.
A simple macro is a tool that allows you to introduce a user-entered data value, for example, into a workflow so that it can be included in calculations. You might also ask the user a question that impacts how you are going to execute your workflow. For example, you might have a workflow that calculates monthly margin for your company, but that margin depends upon a multiplier that gets calculated outside of your workflow. The standard macro can be used to ask the user for the value of the multiplier at the time the monthly margins are created.
A batch macro is a tool that allows you to send information into a workflow to be able to run the workflow a set number of times. By controlling what you send to the workflow, you can control how much content is being produced, how much memory is being consumed, and how long each workflow will take. Batch macros allow you great flexibility in executing your workflows.
An iterative macro is a tool that allows you to run a workflow any number of times until a specified criteria is met. You can place a limit on the number of iterations that can occur in the macro. The writing of an iterative macro is more advanced than either the simple or batch macros and requires more thought on how to design the workflow. This is my impression based on the examples I have studied.
The Focus of This Article
In this initial article, I focus on the second type of Alteryx macros – batch macros. The reason for this is that I recently had to develop a couple of batch macros to get some computationally-intensive work completed. Eventually, I will produce examples of each type of macro to demonstrate their flexibility and how they can be effectively used for many Alteryx workflows.
The reason I want to write this article this is simple. As I continue to improve my Alteryx skills, I am finding that I need to use macros. Currently, the existing documentation and examples I have found have not been sufficient for me to learn to be a competent macro builder.
For some of my more advanced workflows, I have to have some fine-scale control over the workflows to avoid excessive run times and memory consumption. When you work with massive data sets, you may have to be smart with how you design and execute the workflows.
I have only needed to use a batch macro once in the time that I have been using Alteryx. I wrote an article that described the usage of this macro.
In that example, I used a simple batch macro to send a list of climate monitoring stations into a moderately complex workflow to process time-series climate data. I needed that macro to avoid producing too many output files at too many climate monitoring stations.
I simply wanted to process data from a select list of monitoring stations I had chosen and that required the use of a batch macro. So if you find yourself in a situation where you have a lot of input files to send to a workflow and you only want to send a subset of them, read that article to find out how to create that type of batch macro.
Developing that macro took me way too long, and even now when I revisit it, I still have to really think about how I created it to understand how it works.
Conceptually, it is very simple. I send a list of files into a workflow to process the data at these monitoring stations. The list of files is used in the batch macro to process the data one file at a time. This is the control that I desired for this workflow.
The review of this work taught me one thing. Alteryx batch macros may not be intuitively obvious when you first start using them. My inability to immediately comprehend what my own macro was doing (after a few months of elapsed time) surprised me a bit since I have literally created hundreds of macros for all kinds of computer applications for over 30 years.
To improve my comprehension and command of Alteryx macros, I decided to conduct the research, testing and development of additional examples to help others use these techniques.
Existing Macro Documentation
When I researched how to use Alteryx macros, I primarily studied three training videos that were available as on-line resources. I would strongly recommend viewing these three macros to anyone that wants to learn how to write Alteryx macros. The hyperlinks to these training videos are shown below:
Each example gives a complete description of the sequences needed to complete the macros. One of the problems I had with these videos, however, is that the problems to be solved are not properly introduced.
The videos jump into developing the workflows and macros without properly preparing the user by giving them a problem definition. Additionally, the examples are not necessarily real-world based. The examples are more ad-hoc in nature and are intended only for demonstrating the macro techniques. For these reasons, I had to watch these videos repeatedly to understand the full complexity of each type of macro.
Additionally, Alteryx provides several macro examples in sample files that are provided with the software. As shown in Figure 1, there are many samples to study. I didn’t realize that these examples existed until after I developed the examples shown later in this article. Now I have more studying to do to learn additional techniques!
My Example Batch Macros
I recently developed a workflow that had to extract demographic data from a series of over 4200 locations across the US. The locations were stored in an Excel data source that formed the beginning of the workflow.
If I tried to run the workflow in its entirety, the workflow eventually ground to a halt because all of the system resources were consumed (RAM and hard drive space) such that the process could not be completed. To solve this problem, I thought of a few ways that I could complete the work.
Method 0 – The Brute Force Method (Not Recommended, but It Can Work!)
Although I am ashamed to admit it, I went old-school and went toe-to-toe with Alteryx in a weekend-long battle of wills. I reverted back to my roots and used the brute-force method to complete the job while whittling away the time down in Austin, TX.
The brute force method is not elegant and it is not macro-based. I did this for three reasons: (1) I didn’t know how to write a macro to do this work; (2) it was a Friday night and I was all alone with nothing better to do all weekend, and (3) I really wanted to get these results for a big demonstration project. Sometimes you have got to get down and get dirty, and that is what I decided to do that weekend. I don’t regret it at all!
I started the workflow and used the record limiting feature of Alteryx (Figure 2) to control how many locations were processed at a time. I found out that through trial and error, that if I chose about 200 locations, the workflow took about 1.5 hours to complete.
With 4200 locations to process, this meant that I had to do this 21 times!. After each run of the workflow, I had to delete the top 200 rows of the Excel database to get to the next set of 200 records to process. It took me over two days of nearly around-the-clock attention to complete this work. I wouldn’t recommend this to a friend, however.
Moving Beyond the Brute Force Method into Macro Land
After completing the work, I thought about how I should have set-up a macro to complete the work. There were two ways I could visualize for doing this type of job.
First, I wanted to be able to run the workflow (n) number of times based on a field in the database. In this case, I wanted to be able to process locations by state name, which was a field in the database.
Second, I wanted to be able to control the group size (or record count) that would be pushed through the batch macro. If I chose, I could process, 1, 5, 10 or 100 records at a time. I wanted to have a macro in which the size could be controlled so that the workflow run-times were held at a manageable level.
You might wonder how this is different than the brute-force method. Well, the macro can execute the entire job n-records at a time. In the brute force method above, I had to do Excel record deletions and start the workflow 21 times. With this proposed batch approach, I only had to run the workflow once, with the batch processing doing the job 200 records at a time for 21 times!
To accomplish the second objective, I summoned some Alteryx help, which came back to me in the form of an awesome Alteryx sales engineer named Anthony Johnson. In a relatively short working session, we were able to build the group size based batch macro that did exactly what I wanted to do. Sometimes there is no substitute for experience, and Anthony had exactly the right experience for the job.
Method 1 – Create a batch Macro to Process a List of States
Today, I decided to complete this article (thank God!). Rather than writing more verbage (thank God again!), I chose to demonstrate how I converted a standard workflow into a batch macro. The video shown below discusses all aspects of the standard workflow and how it gets converted to a batch macro.
I also show how to build the driver program, including how to insert the macro. I think this video will be well-worth watching for anyone wanting to learn how to make an Alteryx batch macro. The twelve minutes it takes you to watch it will save you hours in the long-run.
Method 2 – A Batch Macro To Process a Set Group Size
Anthony and I built a batch macro to execute my workflow N-times, in which (N) represents the number of locations to be sent through the workflow. Before I show the macro for this case, let me just explain something this is important to know.
To build batch macros, you start by converting the standard workflow into a batch macro. Then you install some controls in your batch macro to enable the batch behavior. Once the controls are in place, you design what I call a driver workflow to send the information into the batch macro. These steps were exhibited in the video in the previous section.
Figure 3 shows the batch macro we developed for this case of running a macro N-times, based on a group or block size. The group size should be chosen to maintain control on memory usage and/or execution time.
Figures 4 through 7 show some of the details installed in the workflow. The tool configurations were developed to run the workflow in particular group sizes that are defined as shown in Figure 8.
Figure 8 shows the driver program that was constructed to drive this program. With over 4,200 locations used and thousands of demographic variables being extracted using each drive time polygon, the batch macro approach was perfect for executing this huge volume of work. The results of this work were mind-blowing, by the way. Long live Alteryx, the king of data prep, processing, and full-spectrum analysis.
According to Ned Harding, the chief mastermind of Alteryx, the dynamic input tool is a better technique for doing work like this. He once explained to me why using the dynamic input tool is superior, but that was more than a year ago and I cannot remember what he said!
As I explained to Ned at the time, I could not find any examples of how to use the dynamic input tool. Since that time, however, there are now a few examples of dynamic input tool usage thanks to superstars like Chris Love and a few others. I’ll have to indulge in this greatness at a later date.
However, for an old-school guy like me, this batch macro technique makes sense and is easy to implement once you understand the fundamentals. One day I’ll write the second article on iterative macros, especially since I spent about 2 decades studying the topics of convergence, iterations, and numerical methods.