Earlier this year I realized that one of the most powerful features of Tableau is its ability to act as a visual interface to any type of database that I can dream of.
I use Tableau to visualize every data set that I work with both professionally and personally. I have used Tableau for real-time score reporting of AAU basketball tournaments, for organizing and managing youth rec sports leagues, for handling personal information, etc.
There is really no limit to what Tableau can do for me and our clients. Not too many software packages can say that, and that is why I have stated that Tableau is the most revolutionary computer program (a serious paradigm buster) that I have used in the past 30 years of being involved in computational sciences.
One of the reasons that I have this insight, I suppose, is that I spent years writing graphical processing engines to do the things that Tableau does so well. If you have ever tried to write routines to handle date operations, including leap years, day-of-week summaries, etc., you know what I mean. When you see the concept professionally executed the way Tableau does it, it is easy to recognize brilliance in its clearest form.
If you like this article and would like to see more of what I write, please subscribe to my blog by taking 5 seconds to enter your email address below. It is free and it motivates me to continue writing, so thanks!
The Origins of the #Tableau Database
There are a lot of great ideas being generated by people all over the globe with respect to Tableau. Many people write about their work on blogs and on Twitter. People are using various hashtags to represent events or ideas and Twitter is a great way to spread the message about their work.
About six months ago, I learned of a technique to capture Twitter data based on any given search term including hashtags and have the information automatically updated and stored in a database.
Termed the Twitter Archiving Google Spreadsheet (TAGS), this work was completed by Martin Hawksey, who is unbelievably brilliant in his work and is openly willing to share his techniques. You can click here to get instructions on how to do this, or click here to learn more about Martin and TAGS version 5.1.
By setting up a TAGS spreadsheet with #Tableau as the search term, I have been able to capture Tableau-related Twitter activity over the past six months. Every couple of weeks I update my version of the database. I take what Martin’s code produces and I process the information a little further to produce some additional fields that I want to show in the Tableau dashboards.
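That append-and-derive step can be sketched in a few lines of Python. This is a minimal sketch only: the column names (`id_str`, `text`, `created_at`) are assumptions based on a typical TAGS export, and the derived fields are illustrative examples, not the exact fields used in my dashboards.

```python
import re
from datetime import datetime

# Column names below (id_str, text, created_at) are assumed from a
# typical TAGS export; your sheet's headers may differ.

def derive_fields(row):
    """Add illustrative extra columns for use in Tableau dashboards."""
    ts = datetime.strptime(row["created_at"], "%a %b %d %H:%M:%S +0000 %Y")
    row["tweet_date"] = ts.strftime("%Y-%m-%d")
    row["day_of_week"] = ts.strftime("%A")
    row["hashtags"] = " ".join(re.findall(r"#\w+", row["text"]))
    return row

def append_new(archive, new_rows):
    """Merge a fresh export into the archive, skipping tweets already seen."""
    for row in new_rows:
        if row["id_str"] not in archive:
            archive[row["id_str"]] = derive_fields(row)
    return archive

archive = {}
export = [{"id_str": "1", "text": "Loving #Tableau dashboards",
           "created_at": "Mon Jun 15 12:00:00 +0000 2015"}]
append_new(archive, export)
print(archive["1"]["hashtags"])  # -> #Tableau
```

Keying the archive on the tweet id is what lets the database grow safely each time a new export overlaps the old one.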
There is a fair amount of work completed in these steps and the best way for me to demonstrate that is with a video. Although a little bit long, Video 1 shows how I gather the data, append it to the existing database, perform some operations on the data and then update the Tableau database. If you like to learn about detailed data manipulation and processing tricks and techniques, this video is probably for you. Otherwise, skip it to save your sanity.
The processes I developed that are shown in the video took a little while to figure out, and sometimes I needed some help from my buddies at Greenview Data, so there are some valuable nuggets of information shown. As an additional salute to pure genius and unbelievable software capability, I hereby recognize Ted Green of Greenview Data for his programmer’s editor Vedit.
I have been using this program for well over 20 years, and it has always been ahead of its time. It is pure brilliance in every sense of the word. Vedit is used in this video to process the data fields using regular expressions. Show me another tool that can edit files of over 100 gigabytes in size with the speed that Vedit does and I'll be sure to have a look at it.
These data processing steps are necessary because, although the database produced by Martin's TAGS program is structured, some of the fields are inconsistent or do not contain the information in the format I need for the Tableau dashboards I created.
The tweet field, in particular, is a nasty conglomeration of text, codes, hashtags, multiple languages, and characters, along with spurious line feeds and other mysterious things that make processing that field problematic. The reason this field is so unwieldy is that there are many different platforms (devices, operating systems, apps, etc.) that people use to send data to Twitter.
In the video, a real-time example of a previously unknown problem occurs as I process the tweet text field to pull out the hashtags from each line. Although I had never seen this problem in dozens of uses of this code, I was able to isolate the issue to the one particular tweet I show in the video.
I solved the problem simply by replacing the spaces with spaces. Go figure?
Whoever wrote that tweet, on whatever device they used, had an unusual character entered into the field that looked like a space but really wasn’t. That is part of the problem of dealing with “structured” but “not really structured” data coming from social media sites like Twitter.
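I did the actual fix in Vedit with regular expressions, but the same idea can be sketched in Python: normalize every Unicode whitespace character (including the non-breaking space, `\u00a0`, that looks like a space but isn't) to a plain ASCII space before pulling out the hashtags. The function names here are my own for illustration.

```python
import re

def normalize_spaces(text):
    """Collapse non-breaking spaces, stray line feeds, and other Unicode
    whitespace into single plain ASCII spaces."""
    # In Python 3, \s matches Unicode whitespace such as \u00a0 (NBSP)
    return re.sub(r"\s+", " ", text).strip()

def extract_hashtags(text):
    """Pull the hashtags out of a tweet after cleaning its whitespace."""
    return re.findall(r"#\w+", normalize_spaces(text))

tweet = "Great\u00a0session on #Tableau\nmaps #dataviz"
print(extract_hashtags(tweet))  # -> ['#Tableau', '#dataviz']
```

Had the cleanup step been in place from the start, that one troublesome tweet would never have broken the hashtag extraction.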
With each task comes new challenges and as time ticks on, I’m inventing new approaches to getting the job done the most effective way that I can. I like to say that I’m expanding my toolbox for handling social media data.
One of the beautiful things about Martin's approach is that the #Tableau database is built over time without any effort on my part, other than the initial setup. The Twitter API only allows access to the past 7 days of data, due to the enormity of information being gathered, I suppose.
It would be impossible for someone to reproduce, at any one time, the data that we have gathered over the past six months. By accumulating social media data over time, you can get a feeling for the growth of the next hot item, you can view Twitter trends like the number of followers for any individual over time, and it is easy to query the work of any individual using the Tableau dashboards I created.
This approach really is a research tool for me in my quest to learn as much as I can about Tableau. There are a lot of brilliant minds working independently on various topics and much of that work is being captured in this database thanks to all the people tweeting about Tableau.
It is fantastic that so many people are collectively using their talents to build an exceptional database without even knowing it! Thanks very much, Martin, you are awesome, and thanks to all the people who are creating great Tableau-based content. All of us are building upon one another to create some great tools for the future.
The #Tableau Twitter Database
Yesterday was a good example of why I maintain this tool. I had recently read on Twitter about someone writing a blog post on how to draw a bell-shaped, normal distribution curve on top of a histogram. Since this is something that I do routinely and was specifically working on yesterday, I wanted to read about how that was done.
Without knowing specifically who wrote the post, I had to search for the information on Twitter. Although in this case I probably could have found the original link to the information using a Twitter search because it was only a few days old, I chose to use my database instead.
In Video 2, I recorded my actions while using this database and went on to show a couple of additional features available in the other dashboards. So the next time you are trying to find something you remember seeing on Twitter, use this tool to locate the tweet, because it can save you some time and lead you to associated information quite easily.
Update Three Years Later (6/15/16)!
A lot has changed since I first wrote this article. I now would do this work differently using Alteryx to gather the data. I doubt that I would use a Tableau web connector due to the difficulties in the Tweet field that I described above. The Alteryx solution would completely solve these problems.
Today there was an interesting article about Twitter, arguing that Twitter should be selling its data and that Twitter might just be the most valuable data source in human history. I happen to agree with the author, and I realized this over three years ago!