Test 5 of Tableau Vs Power BI: Pure Computational Speed

Introduction

This is test 5 in a series of real-world examples comparing the capabilities of Tableau with those of Power BI (PBI). Once again, I selected a completely random example for this test, with no preconceived notion of how each tool would perform. The story of how I arrived at the mathematical approach used in this test is funny (and true!) and is explained in the first video shown below.

This test represents a challenge in which PBI claims to have an advantage over Tableau. I remember reading a couple of months ago that Microsoft claimed the PBI data engine was 10 to 100 times faster than the Tableau Data Engine. This statement intrigued me, so I thought I would try to verify it myself.

The question I want to answer with this testing is this: How long does it take each tool to compute a series of numbers? In other words, I want a direct comparison of the computational speed of the two data engines so that I can find out which one is faster.

My Background in Benchmarking

I’ve got a long history (30 years) of benchmarking computer programs. Typically, the benchmarks I completed were of two types:

  1. Given a few different compilers, take a section of code (Fortran, C, Pascal, Basic, etc.), compile it, run it and compare the time required to complete the computational test. These benchmarks helped us determine which were the best compilers for performing serious number crunching, allocating memory, and other nerdy topics. Those results were always conclusive because we controlled the parameters of the test (e.g., how many iterations were allowed, the convergence criteria used, etc.).
  2. Given a single compiler, compare the computational speed of algorithm 1 vs 2 vs 3, for doing the same tasks. Determine things like big O for the algorithms. This type of test might compare different sorting algorithms or iteration methods, for example. Once again, the benchmark results were repeatable and definitive.
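
For readers who want to try the second style of benchmark themselves, here is a minimal sketch in Python. It is not the code I used in any of my earlier benchmarks; the function names (best_time, squares_multiply, squares_pow) are made up for illustration. It simply times two ways of doing the same task and keeps the best of several runs, which is one common way to make results more repeatable.

```python
import time


def best_time(func, *args, repeats=5):
    """Return the fastest wall-clock time (in seconds) over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def squares_multiply(count):
    """Algorithm 1 (hypothetical): square each number by multiplication."""
    return [n * n for n in range(1, count + 1)]


def squares_pow(count):
    """Algorithm 2 (hypothetical): square each number with the built-in pow."""
    return [pow(n, 2) for n in range(1, count + 1)]


if __name__ == "__main__":
    count = 200_000  # kept small so the sketch runs in a moment
    for candidate in (squares_multiply, squares_pow):
        seconds = best_time(candidate, count)
        print(f"{candidate.__name__}: {seconds:.4f} s")
```

The actual timings depend on the machine, so none are shown here. The point is only that the task, the input size, and the number of repeats are all under the tester's control.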

The Initial Test Case – Computing Squares With A Finite-Series

In the first video shown below, I discuss how I came up with the idea for this test and I show some initial testing of Tableau. The square of any number can be computed with an iterative method. This method is a finite series for the computation of squares, and it uses recursion to solve the problem. I programmed this approach in Tableau to see how Tableau would perform under a heavy computational load.


This approach isn’t efficient, but I didn’t want efficiency – I needed to make the data engines do some work. By feeding a list of 10 million numbers into Tableau and asking it to compute the square of each number, I made it do some work by performing at least 30M operations. The iterations were possible using the calculated field that I show in the video (i.e., using the previous_value(0) function).
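
To make the idea concrete outside of Tableau, here is a small Python sketch of the same kind of recursion. The exact calculated field is shown in the video, so treat the formula below as my assumption of one form that fits the description: each square is built from the previous one with two additions and one subtraction, square(n) = square(n - 1) + n + n - 1. The function name iterative_squares is made up for this example.

```python
def iterative_squares(count):
    """Return the squares of 1..count using only the previous result.

    Assumed recurrence (two additions, one subtraction per row):
        square(n) = square(n - 1) + n + n - 1
    """
    squares = []
    previous = 0  # starting value, playing the role of the 0 in previous_value(0)
    for n in range(1, count + 1):
        previous = previous + n + n - 1
        squares.append(previous)
    return squares


print(iterative_squares(5))  # [1, 4, 9, 16, 25]
```

With three operations per row, a list of 10M numbers works out to the 30M operations mentioned above.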

This form of recursion is very useful in Tableau and gives Tableau the ability to calculate a lot of the readily available quick calculations like running totals. I need you to remember the importance of this feature for the next few minutes, at least until you get down to the PBI explanation!
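
As a rough illustration of why previous-value recursion and running totals are so closely related, here is a Python sketch (not Tableau code, and the sample values are invented). Each output row is the previous output row plus the current value, which is the same pattern as the squares calculation.

```python
from itertools import accumulate


def running_total(values):
    """Build a running total where each row depends on the previous row."""
    totals = []
    previous = 0  # starting value before the first row
    for value in values:
        previous = previous + value
        totals.append(previous)
    return totals


sales = [10, 20, 5, 15]  # made-up sample data
print(running_total(sales))  # [10, 30, 35, 50]

# The standard library's accumulate gives the same answer.
assert running_total(sales) == list(accumulate(sales))
```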

The Tableau Results


The time required for Tableau to compute 10M squares was less than 35 seconds. When Tableau draws the resulting curve of 10M squares, it draws all 10M marks. In this case, the graphical rendering takes longer to complete than the computation of the squares.

The Alteryx Results

Just for the fun of it, I added Alteryx to the test, which really isn’t fair because it is such a highly optimized program. The best time for Alteryx to compute 10M squares directly was 2.4 seconds and iteratively it was 3.7 seconds. Yikes, that is fast. Now you know why I wrote this article.


The Power BI Results

For me to do this test in Power BI, I had to do some DAX research. I had to find the equivalent of Tableau’s previous_value() function to be able to do the iterations. Essentially, I needed to iterate on a new calculated field (called a measure in Power BI lingo). Even with their row-based iterators (SUMX, etc.), there is no way to access previous values of a measure.

In other words, I needed to find a way to do multi-row operations like I used in Alteryx. Well, unfortunately, there are no such functions in PBI (at least that I can determine). Therefore, I was not able to make a direct comparison of Tableau to PBI for this numerical test.

I am now wondering if the lack of this feature is one reason why Microsoft has not been able to deliver on its promise to deliver a whole series of standard-issue quick calculations, like the ones that are available in Tableau.

A Modified Test

To get an idea of the computational speed comparison, however, I used a different method to compute the squares. I let the measure compute the square directly as Number * Number. This single operation is fewer than the three operations used in Tableau (2 additions and 1 subtraction), so in theory it should take less time.
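
The difference in workload between the two formulations can be sketched in Python as follows. This is an illustration only: the function names are made up, and the iterative form assumes the recurrence square(n) = square(n - 1) + n + n - 1, which matches the two additions and one subtraction described above.

```python
OPS_PER_ROW_DIRECT = 1     # one multiplication
OPS_PER_ROW_ITERATIVE = 3  # two additions and one subtraction


def squares_direct(count):
    """Direct formulation: Number * Number for each row."""
    return [n * n for n in range(1, count + 1)]


def squares_iterative(count):
    """Iterative formulation (assumed): previous square + n + n - 1."""
    squares, previous = [], 0
    for n in range(1, count + 1):
        previous = previous + n + n - 1
        squares.append(previous)
    return squares


# Both formulations give the same answers on a small sample.
assert squares_direct(1000) == squares_iterative(1000)

rows = 10_000_000
print(f"direct:    {rows * OPS_PER_ROW_DIRECT:,} operations")     # 10,000,000
print(f"iterative: {rows * OPS_PER_ROW_ITERATIVE:,} operations")  # 30,000,000
```

Counting operations this way ignores everything else an engine does (reading rows, storing results), so it is only a rough guide to the expected workload.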

For this formulation, PBI completed the 10M squares in 28 seconds. Tableau also completed the 10M squares in 28 seconds. The programs tied in this case.


The Results

The research and testing I conducted tried to determine which program is faster with respect to pure computational speed. In the first test I designed, I was not able to apply the same mathematical formulation to both Tableau and PBI. In the second test, I applied a less strenuous direct calculation formulation and both programs completed the task in 28 seconds, for a virtual tie. All results are shown in Figure 1 below.

Figure 1 – Results of computing 10M squares, iteratively and directly. Power BI was not able to iteratively compute the squares due to the lack of a previous_value function.

Since I was testing programs that perform multiple functions (i.e., computations and graphical rendering) and that differ in their capabilities, it is not so easy to state conclusively which data engine is faster. What I can say is that both data engines are very fast, very robust, and will make you happy with what they can do. At this point, I’d say that they are tied in computational speed, although Tableau does have some additional computational flexibility.

Questions for Microsoft

Although this research and testing may not be totally definitive, I now feel like I need to ask Microsoft these three strategic questions:

  1. How did you determine that the PBI engine is faster than the Tableau data engine?
  2. What testing were you able to do that was able to isolate the pure computational engine from the rendering engine?
  3. Can you publish the results of your testing so that I can see what you did?

Future Work?

Sometimes when I do this type of research, I will think of new approaches that can be used to answer the question. Now that I know more about what DAX can and cannot do, there might be other tests I can design to force the programs to work very hard in a computational sense. It is clear that longer computational times will be needed to determine which engine is faster.

With that being said, maybe I’ll revisit this topic later. For now, however, all I can say is that PBI is fast, Tableau is fast, and Alteryx is the most robust and fastest data engine of all of them.

Some Funny Things Can Happen When You Do This Type of Work

Thanks for reading.

5 thoughts on “Test 5 of Tableau Vs Power BI: Pure Computational Speed”

  1. Hi Ken,

    Nice post!

    I have a question…did you look more closely at calculation time vs. rendering time? Since Tableau is rendering all the marks and PowerBI is sampling I’d think that would be a variable that would need to be isolated. I know you can pull it out of Performance Recordings in Tableau, no clue on PowerBI.

    Jonathan

    • Hi JD,

      The test that I did attempted to capture only the computational time required to calculate the squares. You can see that in the videos. The time required for rendering was not included because 10M points were being drawn vs some unknown number of points being rendered. The past test (#4) included the rendering component, and I had people write to me to say: “Hey, see if you can just compare the computational engine rather than the ingestion+computation+rendering time”. So that is what I did.

      As you well know, it probably isn’t a perfect comparison between products because rendering and computation probably have some code intermixed. I did the best I could by having labels update in Tableau when the computations appear to be complete, so that is why I have written the three questions to Microsoft. How was it possible for them to isolate the computational engine of Tableau if they don’t have the source code? I would like to know how they were able to derive their performance stats and make the claim of a 10x to 100x faster data engine.

      I was going to dive deep into the Tableau performance logs to uncover more insights, but quite frankly, I spent too much time on PBI for this simple test case. Getting things done in PBI is just not as easy as it is in Tableau. I know this because every time I revisit PBI, I have to relearn how things are done. This is an indicator of software that is not optimally designed.

      In this article, I showed a faster data engine when I demonstrated Alteryx. Alteryx completed the direct and iterative calculations in 2.4 and 3.7 seconds, respectively, which was faster than both Tableau and PBI. However, Alteryx is only running optimized computational code with no concerns over rendering. The squares were created but didn’t have to be used or moved anywhere in those cases.

      These tests are intellectually interesting and informative. People can argue about this or that, but when the job has to get done, I’m going to Tableau. As I told my team today: “Power BI isn’t bad, it just is not Tableau”. I like some things about Power BI, but there are no compelling reasons for me to use it instead of Tableau. It just does not have the flexibility I need to do the work I do.

      Power BI is immature compared to Tableau, it is restrictive in many ways, and it doesn’t have the same degree of computational flexibility that is offered by Tableau. Doing testing like this reinforces these findings, without me having to be punitive or judgmental. I just let the results speak for themselves. In every test conducted so far, Power BI has had some form of deficiency that has not allowed me to do what I wanted to do. In the future these weaknesses will be removed, but for now they are real and are an impediment to me using the tool for production work.

      Ken

      • Hi Ken,
        Can you share the DAX you used for the calculation?

        The tabular engine in PowerBI is very fast for scanning data – 5 billion rows in a second. So operations like SUM, Count, Distinct Count etc. with boolean filtering will be extremely fast.
        Operations that require iterating over every row in a large table e.g. getting the previous rows data in a very large fact table are slower.

        For most use cases e.g. Give me the distinct count of customers who bought this product, the Tabular engine in Power BI will be very fast.

        I’d like to see some comparisons on performance for use cases such as counts and distinct counts. I’d also like to see performance comparison involving many-to-many scenarios such as “Show me the sales for each store group where a store can belong to many groups – and the totals should be correct”. This is a common scenario where I work and we use the Tabular engine over hundreds of millions of rows of data with a many-to-many bridging table containing hundreds of thousands of rows of data – with query response times of a second.
