I have used many hundreds of comma-delimited (*.csv) files with Tableau over the past few years. I like to use csv files because Tableau reads them really fast and I can easily build gigantic files. Over the years, I have been using business-related files that have continued to increase in size from thousands, to millions, to billions of lines. I’ve built files that have over 1 Billion lines and have used them in Tableau without any trouble, but I haven’t had to do what I did today. Today I was throwing a moderately big 66 Million line csv file at Tableau when I encountered an issue I had not faced before, so I thought that I’d document this issue and provide a solution to this problem.
If you like this article and would like to see more of what I write, please subscribe to my blog by taking 5 seconds to enter your email address below. It is free and it motivates me to continue writing, so thanks!
The Problem Encountered
In this application, Tableau was unable to refresh the data extract when I made a simple change in the *.csv file. I received the error message shown in Figure 1.
Why did this happen? The file contained over 66 million rows with 5 columns of data (>330 M pieces of data), but the file size was beginning to get a little big at 4.4 Gigabtyes. Tableau reported that the query was too complex to process. The Tableau data extract refresh was necessary because I had to change one field in the csv file from text to numeric. This simple change caused Tableau to derail, or more accurately, to not be able to update the data extract. The problem was that I had already invested a bunch of time into the Tableau workbook before realizing that I needed to change that text field to numeric. So once I saw that Tableau could not refresh the data extract, I felt like I may have lost a bunch of time. So instead of starting over, I tried a simple solution to see if it would work.
Rather than fighting with Tableau in my current workbook, I simply saved and closed this workbook, hoping to reuse it later. I opened a new workbook and connected to the revised csv file. Tableau read this file fine, re-created the data extract (the same file name), and then I closed the new (temporary) workbook. I reopened the original workbook and was pleased to find that the worksheets updated perfectly using the new data extract. All the work I had completed in the original workbook was still in-tact and I was able to use the revised field in the manner that I wanted to. It seemed like magic at the time. Sometimes simple solutions are awesome.
Some of you might be wondering why I didn’t just allow Tableau to convert the data from text to numeric to avoid having to refresh the extract. Well, I thought of that too, and I tried it. Although this technique might work fine for small applications, having to do data conversions on 66M rows of data probably isn’t the wisest thing to do. Performance issues might begin to plague you. Also, my buddy and I are inventing a framework for an advanced statistical modeling application for medical data that has to be used over and over, so we didn’t want to have to remember to tell Tableau to convert this field on all upcoming data sets. To make the correction, all I had to do was remove the quotes from the field in the csv file and this was done in under 1 minute using my favorite editor, Vedit. This is another example of why it is so important to get your data formats correct before you begin working with the data in Tableau.
Lastly, occasionally you might experience an error in Tableau that is similar to the problem described above. This problem is simply that Tableau reports that it cannot refresh a data extract that you are using. This happened to my buddy yesterday, so he called me over to find out why this was happening. A common reason for this is that you have two separate Tableau workbooks open at the same time that both refer to the same Tableau data extract. You might not realize that this is the case as you might have several workbooks open, so that when you change your source data and tell Tableau to refresh the extract, it will not do it. To solve the problem, you have to close one of the workbooks so that the data extract is only linked to one open workbook. This is just one of the behaviors I’ve stumbled into through the years of using Tableau, so I thought it was worth a mention.