Using Tableau to Visualize Hashtag Spelling Variations for #BigData and #Tableau in Twitter Posts

I was curious to see how valuable hashtags can be for some basic research. To answer the question, I ran a small study. Over 12 days (7/20 to 7/31/13), I captured all the Twitter posts that had either #BigData or #Tableau in the post. How I captured the Twitter data using the Twitter API is another interesting story but is beyond the scope of this article, although you can see a short video of a part of the process below.

Once I had the data, I processed all the hashtags that were included in these 2,450 posts. I sorted the data set, organized it by category and sent it to Tableau. The organizational step simply put all the like terms such as BigData into one bucket (e.g., #BigData, #bigdata, #Bigdata…) so that I could do some counting and visualization. A simple dashboard of the results is shown below.
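The bucketing step described above can be sketched in a few lines of Python. This is not the code used for the study, just a minimal illustration under some assumptions: the posts are plain strings, like terms are matched by ignoring letter case only, and the names `bucket_hashtags` and `sample_posts` are made up for the example. Variations that differ by more than case (extra characters, misspellings) would need additional rules.

```python
import re
from collections import Counter, defaultdict

# A hashtag is taken to be '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")

def bucket_hashtags(posts):
    """Group every hashtag spelling under a lowercase key and count each spelling."""
    buckets = defaultdict(Counter)
    for text in posts:
        for tag in HASHTAG_RE.findall(text):
            buckets[tag.lower()][tag] += 1
    return buckets

# Hypothetical sample posts, not data from the study.
sample_posts = [
    "Learning #BigData with #Tableau",
    "#bigdata is everywhere",
    "New #Bigdata dashboard in #tableau",
]

buckets = bucket_hashtags(sample_posts)
for key, spellings in sorted(buckets.items()):
    # Print the bucket, its total count and the spellings that fell into it.
    print(key, sum(spellings.values()), dict(spellings))
```

The number of distinct spellings in a bucket is `len(buckets["bigdata"])`, which is the kind of count a tool like Tableau can then chart.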


This short study taught me a few things about hashtags. First, I was surprised to see how many spelling variations there can be for key terms used as hashtags. For the term #BigData, there were 21 variations captured in the Twitter feeds. Many people apparently don’t really understand what a hashtag is supposed to represent (and they probably don’t know about …), so they just invent their own hashtag at the time they are writing the post!

Second, the study allowed me to look at the amount of peripheral hashtag noise that is being generated for these two key terms. There are over 900 other hashtags used in the posts, which are principally related to BigData (1,847 posts = 75%) and Tableau (698 posts = 28%).

Lastly, I was very surprised to see that Google, one of the leaders and innovators in BigData, was mentioned in only 4 hashtags, with two of these being related to Google Glass. I guess I expected to see more hashtags referencing company names and/or emerging technologies, especially for companies that are promoting their BigData technologies. Maybe the 140-character limit has something to do with this. I was also surprised to see that only 9 posts combined the hashtags #BigData and #Tableau (0.4% of activity, although this count required exact hashtag spelling), especially considering that Tableau was a 2013 CODiE best big data solution finalist.

Click here for a link to the Tableau Public Workbook for this analysis. For more information on how to use hashtags properly, have a look at this article.
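The exact-spelling caveat on the combined #BigData and #Tableau count matters, because a case-insensitive match can return a different number. The sketch below is an illustration only, not the workbook's logic: `count_posts_with_both`, `share` and `sample_posts` are hypothetical names, and the posts are assumed to be plain strings. The percentages at the end simply recompute the shares from the counts quoted above against the 2,450 captured posts.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")

def count_posts_with_both(posts, tag_a, tag_b, exact=True):
    """Count posts containing both hashtags, with or without case sensitivity."""
    def norm(tag):
        return tag if exact else tag.lower()

    a, b = norm(tag_a), norm(tag_b)
    count = 0
    for text in posts:
        tags = {norm(t) for t in HASHTAG_RE.findall(text)}
        if a in tags and b in tags:
            count += 1
    return count

def share(part, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)

# Hypothetical sample posts, not data from the study.
sample_posts = [
    "#BigData meets #Tableau",
    "#bigdata meets #tableau",
    "Only #BigData here",
]

exact_count = count_posts_with_both(sample_posts, "BigData", "Tableau")
loose_count = count_posts_with_both(sample_posts, "BigData", "Tableau", exact=False)
print(exact_count, loose_count)

# Shares recomputed from the counts reported in the article.
print(share(1847, 2450), share(698, 2450), share(9, 2450))
```

On the sample posts the exact match finds one post and the case-insensitive match finds two, which shows why the 9-post figure is best read as a lower bound.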

In the second part of my analysis, I’m going to investigate who wrote the posts and who is referenced in the posts using the @content available in these posts.  I want to see who is doing the bulk of the publishing on these topics and what they are writing about in particular.  More on this later.
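As a rough preview of what that second part might involve, the sketch below tallies who wrote the posts and who is referenced in them. Everything here is an assumption for illustration: each post is represented as a dictionary with hypothetical `author` and `text` keys, a reference is taken to be `@` followed by up to 15 word characters, and names are compared in lowercase.

```python
import re
from collections import Counter

# '@' must not follow a word character, so e-mail addresses are skipped.
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")

def tally_authors_and_mentions(posts):
    """Return two Counters: posts per author and references per @name."""
    authors = Counter()
    mentions = Counter()
    for post in posts:
        authors[post["author"].lower()] += 1
        for name in MENTION_RE.findall(post["text"]):
            mentions[name.lower()] += 1
    return authors, mentions

# Hypothetical sample posts, not data from the study.
sample_posts = [
    {"author": "alice", "text": "Great #BigData talk by @bob"},
    {"author": "alice", "text": "@Bob and @carol on #Tableau"},
    {"author": "dave", "text": "mail me at dave@example.com"},
]

authors, mentions = tally_authors_and_mentions(sample_posts)
print(authors.most_common())
print(mentions.most_common())
```

Sorting either counter with `most_common()` gives a quick view of who is doing the bulk of the publishing and who is referenced most often.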

