Tableau Prep: The Good, the Bad, and the Ugly

 

Since my last blog post, I have used Tableau Prep 2018.2 to clean five different datasets, so I think it’s a good time to discuss the good, the bad and the ugly of Prep…

 

The Good

As with Tableau Desktop, Prep is pretty. Compared to Alteryx, it looks modern and clean, and is just aesthetically pleasing overall. The user interface is friendly and intuitive. Who doesn’t love a good ol’ drag and drop? I mean, we love Tableau Desktop, right?

Unlike Alteryx, Prep lets you actually interact with your data as you would in Desktop. I personally love being able to do this.

Tableau Prep has some great built-in features for data cleansing. It’s easy as pie to split fields, as you would in Alteryx with Text to Columns. It’s also easy to remove whitespace and change the case of your fields (note: you can only convert the whole string to upper or lower case; there is no title case option).

Another super useful function is the Pronunciation Group and Replace. Take a look at Figure 1: you can see that ‘Growlith’ should be spelt ‘Growlithe’. You could click on ‘Growlith’ and type the ‘e’ manually or… you could use Pronunciation, as seen in Figure 1. Doing this groups the two terms together under ‘Growlithe’ (denoted by the paperclip icon).

Figure 1. Demonstration of Pronunciation Group and Replace

 

Amazing, right? Yes, but there are limitations. This segues nicely into…

 

The Bad

When prepping my ‘Happiness in Words’ viz (link here), I needed to group lots of terms that were the same word but spelled incorrectly (e.g. happiness and happyness). When I used the Pronunciation tool, it grouped together terms that were definitely not the same word (Figure 2). I didn’t actually notice this until I went into Tableau Desktop, played with the data and was surprised that there was not a single mention of God. As you can see, Prep has grouped ‘God’ with ‘got’, as well as 121 other words like good, caught, could, kitty and err… Kuwait. Not a perfect algorithm.

Prep also has a Spelling Group and Replace function, but I gave up on it after it had been running for 20 minutes.

Figure 2. Flaw of Prep’s Pronunciation algorithm
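To see how a sound-based grouping can lump ‘God’ in with ‘got’, here is a minimal Python sketch of a toy phonetic key. It is not Prep’s algorithm, which I have no visibility into; the `SOUND_CLASSES` table, the `phonetic_key` and `group_by_sound` names and the rule of treating ‘gh’ as silent are all assumptions made up for illustration. The idea is simply that once vowels are dropped and similar-sounding consonants share a code, very different words end up with the same key.

```python
from collections import defaultdict

# Hypothetical sound classes: consonants that sound alike share one code.
SOUND_CLASSES = {
    "c": "K", "g": "K", "k": "K", "q": "K",
    "d": "T", "t": "T",
    "b": "P", "p": "P",
    "f": "F", "v": "F",
    "s": "S", "z": "S",
}
# Letters this toy key ignores entirely (vowels and vowel-like letters).
IGNORED = set("aeiouwhy")


def phonetic_key(word):
    """Reduce a word to a crude consonant skeleton (toy example only)."""
    letters = word.lower().replace("gh", "")  # assumption: 'gh' is silent
    key = []
    previous = None
    for ch in letters:
        if ch in IGNORED:
            previous = None  # a vowel separates consonant sounds
            continue
        code = SOUND_CLASSES.get(ch, ch.upper())
        if code != previous:  # collapse doubled consonants such as 'tt'
            key.append(code)
        previous = code
    return "".join(key)


def group_by_sound(words):
    """Group words that share the same toy phonetic key."""
    groups = defaultdict(list)
    for word in words:
        groups[phonetic_key(word)].append(word)
    return dict(groups)


words = ["God", "got", "good", "caught", "kitty", "Kuwait",
         "happiness", "happyness"]
for key, members in group_by_sound(words).items():
    print(key, members)
```

With this toy key, ‘happiness’ and ‘happyness’ share one key, which is the behaviour I wanted, but God, got, good, caught, kitty and Kuwait all collapse onto a single key as well. A real phonetic algorithm is far more sophisticated (it also deals with silent letters, which is presumably how ‘could’ joined the group), but the trade-off is the same: the shorter the word, the less there is to tell it apart from its neighbours.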

 

Another shortfall I ran into was when removing punctuation. The ‘Remove Punctuation’ function successfully removes characters like ‘ . , ? – but does not remove characters like = + ^ | (Figure 3).

Figure 3. Before and after cleaning to remove punctuation
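One possible explanation, and this is only my guess since I don’t know how Prep implements ‘Remove Punctuation’, is that characters like = + ^ | are classified by Unicode as symbols rather than punctuation. The Python sketch below uses the standard `unicodedata` module to show the difference; the function names and the sample string are made up for illustration.

```python
import unicodedata


def strip_punctuation(text):
    """Drop only characters whose Unicode category is punctuation (P*)."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )


def strip_punctuation_and_symbols(text):
    """Drop characters in both punctuation (P*) and symbol (S*) categories."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] not in ("P", "S")
    )


sample = "a=b+c^d|e, f. g? h–i"
print(strip_punctuation(sample))              # = + ^ | are left behind
print(strip_punctuation_and_symbols(sample))  # those are removed too
```

If you hit the same problem, a workaround along these lines (in Python, or with a calculated field that replaces the leftover characters) would catch what the built-in option misses.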

 

The Ugly

The biggest gripe I have with Tableau Prep is that it’s slow and unreliable. During class, it was fairly speedy (almost on par with Alteryx) but since then, it has been so slow! I prepped a 20+ million row dataset on UK house prices (viz here) and it was insanely slow. As I was opening up Prep today to prepare the screenshots for this blog, it crashed with this error message (Figure 4).

Figure 4. Tableau Prep error

 

The unreliability of Prep means I won’t currently choose it over Alteryx. However, I can see future versions working out the kinks and potentially introducing more amazing features like Pronunciation (with a better algorithm!). Maybe then it’ll surpass Alteryx in data prepping capability and ease of use. We’ve got to wait and see! For now, it’s not perfect but it’s pretty decent.

 

That’s all folks,

 

Louise

 

Check out my blog feedmedata and follow me on @FeedMeData_

This post was also posted on thedataschool

4 thoughts on “Tableau Prep: The Good, the Bad, and the Ugly”

  1. Zahra Badaroudine 6th Nov 2018 / 9:10 pm

    I *love* Prep, but… I agree that Prep needs a lot more fine-tuning before I consider it “production-ready”. It has improved a lot with each iteration. 2018.3 freezes a lot less and it’s got the data roles improvement (it wouldn’t help with got/God, though).

    It’s still slow (not as bad as it was) and demands more resources than the typical business laptop has in order to work efficiently. I play a lot with the data sample size depending on what I want to do. Change field names? Since it rescans the data each time, I go with 10 lines of data. Create a calculated field for which I already know the formula? 10 lines of data again. A join on unfamiliar tables/sources? Default or all data. At worst, I’ll display only the columns I’m looking at and check those, or output only those to see how my join performs.

    As I said, I love it, but it definitely has some shortcomings.

    • feedmedata 6th Nov 2018 / 9:46 pm

      Thanks for leaving a comment, Zahra! I haven’t used the earlier versions of Prep, but it’s good to hear that it’s developing quickly; after all, version 1 was released not too long ago. I can’t wait to see how much more clever Prep will become in the near future. I definitely agree that for now, it needs more tweaking to become “ready”.

      • Sidhesh mangle 30th Nov 2018 / 4:45 pm

        Yes, I agree. Prep is good with Excel and flat files but not stable for enterprise DBs. I connected to Hadoop using the Impala connector, pulled a few raw tables and did just a selection, and it froze. I tried 3-4 times and reduced the steps, but still no luck; it had the same issue. When I checked the task bar, Java was using almost 6.5 GB of my 8 GB of memory. I’m not sure whether it needs more memory or needs some intelligence or optimization on the data profile queries, etc.

      • feedmedata 3rd Dec 2018 / 8:56 pm

        Thanks for your comment, Sidhesh! Yes, Tableau Prep isn’t great for cleaning large datasets as it works by bringing all your data into memory. This may be why you encountered problems. Alteryx is much better for databases.
