I have scraped a wide variety data from websites over the years. Many times I have had to pre-process the data before using it in Tableau.
Data Pre-Processing Methods
One of the common pre-processing steps is to extract URL’s (hyperlinks) that are embedded within the HTML codes that I capture when scraping the websites. I previously discussed how I do that in the article titled: Techniques for Manipulating Data Formats Before Usage in #Tableau.
In an upcoming post, I’ll finally be writing part 2 of that post by revisiting and expanding those techniques to include data manipulations performed in Alteryx (which is much easier and much more fun!).
Another form of pre-processing I commonly have to do is extracting hyperlinks from Excel cells. This is the topic of this blog post. I always need to extract the hyperlinks to be able to use them in action commands within Tableau dashboards.
I’ll attempt to keep this post short, but a certain degree of expertise has to be explained. I will try to remain focused on how I extract the hyperlinks from Excel cells.
There is all kinds of information on how to use the hyperlinks as an action within a Tableau dashboard, so I will not be discussing this topic in this post. You can go to my Tableau Public site (Figure 1) to find my searchable Tableau knowledge base tool (Figure 2 – which will be getting updated soon!) to learn all kinds of neat dashboard tricks.
Figure 3 gives basic information on how URL’s can be used as actions in dashboards. The source of this information is here.
The Hyperlink Extraction Method
Let’s start with an assumption that you have a series of Excel cells like those shown in column E in Figure 4. These cells have text in them (blog post titles) but also have embedded hyperlinks (currently invisible in Column E in this view).
The text and the hyperlinks can be exactly the same thing, but in general they are not. In this example, there are a series of blog post titles shown in the cells (column E) and each one of them has their own currently invisible hyperlink (shown in column C after being processed). Figure 5 shows the hyperlink in a pop-up box that Excel displays when I hover over cell E474 in Excel.
The easiest way to extract the hyperlinks from all the cells in column E is to write a macro and trigger it with an Excel formula that can be easily copied down column C.
Since I have written thousands of formulas and macros and written extensive VBA (visual basic for applications) programs within Excel, a lot of this is second nature to me. If you haven’t ever tried to do something like this, my explanation that follows might not be sufficient for you. If you need any additional help, please feel free to ask me via the comment section.
Figure 6 shows the formula that I placed in cell C474 that makes the function call [=GetURL(E474)] that actually extracts and returns the hyperlink.
Writing the Macro Code
Before you can write the formula and copy it down column C, you have to write the macro code for GetURL. This means that you will need to activate the Developer ribbon, know how to work in the VBA environment, create a module, save your file as a macro-enabled file (*.xlsm) and then actually write the macro code. This is what I mean when I say that my explanation might be insufficient for some readers!
Figure 7 shows what this code looks like within the VBA interface of Excel. The code is very simple but powerful. The GetURL function receives a range variable (i.e., a cell address) and returns the hyperlink in that range. I normally would provide a download of an example like this one, but I cannot put a link to a *.xlsm file within the WordPress content management system. If anyone wants this, just send me an email at email@example.com and ask me for the Excel file.
I have used this type of technique for countless operations within Excel when preparing data for usage in Tableau. If people find this concept useful, I might unleash some of my more powerful techniques on those willing to partake in the exercises.
Although many Tableau users might scoff at the notion of using Excel to house their data, it still is a formidable tool in the data analysis world. I have written extensive analysis programs within Excel. My largest code has over 30,000 lines of computational VBA code, with hundreds of functions like the one I’ve shown here as well as many other types. There is a LOT you can do within Excel to implement computational methods, to perform statistical analysis and to complete data processing/reformatting operations. Thanks for reading…
Many Months After Publication
Sometimes you have to go the other way. What I mean by this is that you have to make string fields into working hyperlinks in Excel. This process isn’t obvious, so I created a video to show how it is done. Here is that technique:
If you like this article and would like to see more of what I write, please subscribe to my blog by taking 5 seconds to enter your email address below. It is free and it motivates me to continue writing, so thanks!