A Quick Big Data Excel Tip For #Tableau Users

Introduction

What I am going to describe has happened to me many times and has caused me a whole bunch of grief. Most recently, the problem was so severe that I knew I had to find a solution. What you will see is a simple solution from:

20140322_194200

The Big Data Excel Problem

Let’s say that you are given an Excel file from your buddy at work or from a client and they want you to visualize the contents with Tableau. The problem is, the person that created the file failed to realize that when they gave you the file, they were also about to directly deliver some pain to you.

The reason you feel the pain is that the creator of the Excel file jammed the workbooks full of formulas such as countifs, vlookups, string concatenations, etc. On top of that, they gave you hundreds of thousands of rows and many columns of data in the file. In my worst-case scenario, I was given a file with a few hundred columns of data stored in about 1M rows. That file took over 20 hours to open!  Can you believe that? It is true and that was on a very fast, very capable computer.

If you have never experienced this, let me tell you what it is like. The first time you go to open the file in Excel to see what they gave you, your computer bogs down and you see Excel reporting that all 4 or 8, or 40 cores are running and the progress is very slow-going (40%…45%… and minutes or hours are passing). As you sit there thinking about what you are going to do after work, you stop and you begin plotting your revenge on the author of that spreadsheet. You wonder, how am I ever going to get this job done?  You think to yourself that this situation is NOT

20140322_204706.

Don’t Make This Mistake!

Your instinct is to wait for the file to complete the loading process. Once complete, you decide to grab all the data at once and perform a paste value operation. You think you are going to solve the problem by removing the formulas with the paste value operation. I am here to tell you that your instinct is wrong.

When you paste values, you will be waiting as long as it took to load the file in the first place because Excel is going to recalculate everything before pasting the values. You will have inadvertently doubled your pain.

Do This Instead!

The best solution to the problem is to do this. Once the file is loaded, immediately save the file as a *.csv file. if you want to do more work on the data, simply reload the *.csv into Excel. When you reload the data (which is a quick process), the formulas will be gone and the values will be the same was when the person originally gave them to you.

If you do not need to do any more work on the data, just load the csv file into Tableau. The csv files load much faster into Tableau anyway.

20140322_204621

By doing this sequence, you will save yourself a lot of time and grief. Trust me, I’ve had this happen more times than I care to remember.

 

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s