The Prep Off…
Two major contenders in the ring tonight: new kid on the block, Tableau Prep and reigning champion here at the Information Lab, Alteryx Designer.
How do these two softwares compare when cleaning two Excel sheets (within the same workbook)? The chosen data features the table seen in Figure 1. The data includes merged cells, totals, unnecessary content, dates in a row and times in a column, plus its in German! Download the Excel file (2018-09-18 Hamburg) and follow along with me.
Connecting to the Excel files in Prep is really easy. It’s also easy to connect to both sheets with a wildcard union. Points to Prep. Figure 2 below is how the data looks in Prep without having switched Data Interpreter on. As you can see, the dates are in the first row and the second rows have some field names. Not ideal. I tried processing the data from here and it was really tricky getting the dates into their own column. I gave up on this route.
Luckily, Prep has a Data Interpreter which does a lot of the leg work. Figure 3 is how the data looks with it switched on. You can see that the date and potential field names have been concatenated into the headers and the field values contain only the correct field values (for the most part). Much better starting point.
Noticing that the data has the same three fields for each date (Anzahl Bons, Umsatz pro Bon, and Umsatz pro Stunde) as shown in Figure 4, I decided to split the data into three streams by adding branches and removing unwanted fields and then process each stream separately.
To get the days of the week and extra information out of the headers, leaving only date, I manually clicked into the header and edited. There are alternative ways to do this.
Since Tableau likes long thin data, I want a column for dates instead of dates as headers. So, the need to pivot all my fields aside from time (F1 in Figure 5)
After pivoting and some cleaning, I have three streams of data, each in the structure shown in Figure 6. I have time in all three streams (F1 in Figure 6), a field for the measure values and a date field. The next step is to join these three streams back together.
The final result of my joining (and cleaning unnecessary fields is shown in Figure 7. Not too shabby.
After cleaning the data in Prep, I attempted to clean the data in Alteryx using the same method. Right off the bat, it is not as easy to join multiple sheets from the same Excel file in Alteryx. As there were only two sheets in this Excel workbook, I created two Input Data tools and joined them. If there were more sheets, a Dynamic Input may be necessary.
Also, as Alteryx does not have a Data Interpreter, I worked with the data as seen in Figure 2 earlier. Figure 8 shows how it looks in Alteryx.
After skipping the first row and using the Dynamic Rename tool to get the dates in the field headers, I am at the same point as I was after using the Data Interpreter in Prep. As before, I now split my data into three streams using the Select tool. The next step is to pivot my data using the Transpose tool. Each stream is now in the structure shown in Figure 9.
Finally, I can join my streams back together and end up with at the same point as I did in Prep. My data looks pretty good (Figure 10)!
So which was better?
Both Tableau Prep and Alteryx were able to clean the data and get into the same format for Tableau. Figure 11 below compares the full workflows in both Tableau Prep and Alteryx. From Figure 11, it looks like more tools are used in Alteryx but it should be noted that multiple actions were carried out within most of the Cleans in the Prep workflow.
Tableau Prep’s Data Interpreter is for sure, amazing. It really helped me work out the logic of how to process the data. Alteryx, though missing this feature, has the Dynamic Rename tool which was really useful!
Prep took 1.0 second to run and output while Alteryx took 1.6 seconds. Not a huge difference, plus with large datasets, Alteryx would definitely be quicker.
Overall, it’s a draw!
Both Prep and Alteryx have their pros and cons. I will say that it took me a while to work out the logic in Prep. Putting my workflow together in Alteryx was much quicker as I was applying a logic I had already come up with. I think if I were to start off in Alteryx instead of Prep, I would have come up with a method much quicker. But this could just be because I am more familiar with Alteryx.
Take your pick of which ever software you’d like. They both work!
Happy data prepping!
This post was also posted on thedataschool