Since my last blog post, I have used Tableau Prep 2018.2 to clean five different datasets, so I think it’s a good time to discuss the good, the bad and the ugly of Prep…
As with Tableau Desktop, Prep is pretty. Compared to Alteryx, it looks modern and clean, and it’s just aesthetically pleasing overall. The user interface is friendly and intuitive. Who doesn’t love a good ol’ drag and drop? I mean, we love Tableau Desktop, right?
Unlike Alteryx, Prep lets you actually interact with your data as you would in Desktop. I personally love being able to do this.
Tableau Prep has some great built-in features for data cleansing. It’s easy as pie to split fields as you would in Alteryx with Text to Columns. It’s easy to remove whitespace and change the case of your fields (note: you can only change the case of the whole string; there’s no title case option).
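If you do need title case, one workaround is to handle it outside Prep. Here’s a minimal Python sketch of the same three cleaning steps (split, trim whitespace, change case) with title case added. The function names `split_field` and `tidy_title_case` are made up for illustration and have nothing to do with Prep itself.

```python
import string


def split_field(value: str, separator: str) -> list[str]:
    """Split one field into parts and trim the whitespace around each part."""
    return [part.strip() for part in value.split(separator)]


def tidy_title_case(value: str) -> str:
    """Trim whitespace, collapse repeated spaces and title-case every word.

    string.capwords is used instead of str.title() because str.title()
    also capitalises the letter after an apostrophe ("farfetch'D").
    """
    return string.capwords(value)


print(split_field("Growlithe / Fire", "/"))    # ['Growlithe', 'Fire']
print(tidy_title_case("  mr.   MIME "))        # Mr. Mime
```

`string.capwords` lowercases the rest of each word, so it will flatten acronyms like ‘UK’ to ‘Uk’; that’s worth checking before running it over a whole column.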
Another super useful function is the Pronunciation Group and Replace. Take a look at Figure 1: you can see that ‘Growlith’ should be spelt ‘Growlithe’. You could click on ‘Growlith’ and type the ‘e’ manually or… you could use Pronunciation, as seen in Figure 1. Doing this groups the two terms together under ‘Growlithe’ (denoted by the paperclip icon).
Amazing, right? Yes, but there are limitations. This segues nicely into…
When prepping my ‘Happiness in Words’ viz (link here), I needed to group lots of terms that were the same but incorrectly spelled (e.g. happiness and happyness). When I used the Pronunciation tool, it grouped together terms that were definitely not the same word (Figure 2). I actually didn’t notice this until I went into Tableau Desktop, played with the data and was surprised that there was not a single mention of God. As you can see, Prep has grouped ‘God’ with ‘got’, as well as 121 other words like good, caught, could, kitty and err… Kuwait. Not a perfect algorithm.
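To see how sound-alike grouping can go wrong, here’s a small sketch of classic Soundex in Python. To be clear, this is not Prep’s algorithm; Soundex is a much simpler phonetic scheme, used here only to illustrate the general idea of reducing each word to a pronunciation key and grouping words that share a key. The word list is taken from the examples above.

```python
from collections import defaultdict

# Standard Soundex consonant classes; vowels, H, W and Y get no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Return the four-character Soundex key for a word ('' if no letters)."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two equal codes
        code = CODES.get(ch, "")
        if code and code != previous:
            key += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (key + "000")[:4]


groups = defaultdict(list)
for term in ["God", "got", "good", "caught", "kitty", "Kuwait"]:
    groups[soundex(term)].append(term)

for key, terms in groups.items():
    print(key, terms)
```

Even this toy version puts ‘God’, ‘got’ and ‘good’ under the same key, because the vowels are thrown away and D and T share a code. Any phonetic key has to discard detail somewhere, so it’s always worth eyeballing the groups before accepting them.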
Prep also has a Spelling Group and Replace function, but I gave up on it after it had been running for 20 minutes.
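For a rough idea of what spelling-based grouping involves, here’s a sketch of Levenshtein edit distance in Python. This is an assumption-laden illustration, not Prep’s implementation: I’m guessing that spelling grouping compares values by something like edit distance, and the function name `edit_distance` is my own.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes or substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    previous = list(range(len(b) + 1))  # distances from '' to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ''
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


print(edit_distance("happiness", "happyness"))  # 1
print(edit_distance("Growlith", "Growlithe"))   # 1
```

If every distinct value is compared against every other one, the number of comparisons grows with the square of the number of values, which is one plausible (but unconfirmed) reason a spelling pass could run for a very long time on a field with lots of distinct terms.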
Another shortfall I ran into was when removing punctuation. The ‘Remove Punctuation’ function successfully removes characters like ‘ . , ? – but does not remove characters like = + ^ | (Figure 3).
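One possible explanation, and this is only a guess, is that = + ^ | are classed as symbols rather than punctuation in Unicode, so a function that strictly removes ‘punctuation’ would leave them behind. The Python sketch below shows the difference using the standard `unicodedata` module; the two function names are hypothetical and this says nothing about how Prep is actually built.

```python
import unicodedata


def remove_punctuation_only(text: str) -> str:
    """Drop characters whose Unicode category starts with 'P' (punctuation)."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def remove_punctuation_and_symbols(text: str) -> str:
    """Drop punctuation ('P*') and symbols ('S*', e.g. = + ^ | and currency signs)."""
    return "".join(ch for ch in text if unicodedata.category(ch)[0] not in ("P", "S"))


sample = "a=b+c^d|e, ok?"
print(remove_punctuation_only(sample))         # a=b+c^d|e ok
print(remove_punctuation_and_symbols(sample))  # abcde ok
```

The second function also strips currency signs such as £ and $, so it’s only a good fit if you really do want nothing but letters, digits and spaces left.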
The biggest gripe I have with Tableau Prep is that it’s slow and unreliable. During class, it was fairly speedy (almost on par with Alteryx) but since then, it has been so slow! I prepped a 20+ million row dataset on UK house prices (viz here) and it was insanely slow. And as I was opening up Prep today to prepare the screenshots for this blog, it crashed with this error message (Figure 4).
The unreliability of Prep means I won’t currently choose it over Alteryx. However, I can see future versions working out the kinks and potentially introducing more amazing features like Pronunciation (with a better algorithm!). Maybe then it’ll surpass Alteryx in data prepping capability and ease of use. We’ve got to wait and see! For now, it’s not perfect but it’s pretty decent.
That’s all folks,
This post was also posted on thedataschool