In the past few releases, we have seen the hard work behind the new Data Prep (recipes) emerge. While Data Prep aims to make it easier for users to get started with transforming data and creating datasets for insight, there might be a few question marks if you are used to navigating the dataflow. Hence I will in this blog try to give some tips for those dataflow builders keen to get started with data prep.
What is Data Prep
Before getting to the deep dive of what I really want to talk about – tips to master the new Data Prep – let’s just quickly talk about what Data Prep actually is with a little history lesson.
When Tableau CRM started we first movers just had a JSON file that defined how to join and transform our datasets to work with. But to make it easier to work with the Dataflow the Dataflow UI was introduced. Once you have mastered the dataflow it’s easy enough to work with, but new users find it overwhelming.
Somewhere down the line recipes was introduced to do some of the same transformations as you could do in the Dataflow. However, the UI was easier to navigate for newcomers but transformation wise Dataflows were still more powerful.
The recipe is what the product team has continued developing on, but of course taken even further than before, which is what we now refer to as Data Prep.
The important thing to know is that existing dataflows can currently not be transferred to Data Prep automatically, it will be a manual task, where old recipes will work automatically in the new Data Prep UI. As Data Prep is different from Dataflows and is thought to be easier to navigate (read more about that here) you will not see a direct one-to-one mapping of nodes, transformations, navigation, etc. Hence this blog is aiming to reintroduce Dataflow builders to how to work with their data.
Data is key for both Dataflow and Data Prep. But there are a few things that prove to be helpful to know when creating your datasets.
Pre Data Prep building
When you want to build a dataset in Data Prep you first need to make sure the data you want to use is already available in the connector, which is a change when it comes to the local connector. Hence you need to go to the connector and bring in the objects and fields that you want to use.
Add a new object to the local connector:
Add fields to an existing object:
Data selection in Data Prep
Bringing data into Data Prep is done by adding “input nodes”. These can be added in two ways. First by clicking the “Select Data” button in the middle of the canvas if it’s empty or top left corner.
Secondly by clicking the plus icon next to any node and select join or append in the drop down.
When you are selecting the data there are a few things to be aware of:
- You can filter on the data source (dataset and connected data)
- Notice the location which will tell you the data source – if you are doing a sfdcDigest, digest or edgemart
- You need to check the check box next to the data to actually add it. Notice you can bring in multiple data sources at once.
- Once you have selected a data source a field panel will show up letting you select the fields you want to bring in.
Being fluent in Dataflow I am sure you have your nodes down and you know exactly which one to choose. However in Data Prep, there is not a one-to-one mapping, so let’s look at how you can match functionalities in terms of nodes.
When creating formulas in Data Prep you want to select the “Transform” node, which I consider to be the Swiss Army knife of data transformations. This node is selected for a range of different options that in Dataflow is split into different nodes (see the table in the previous section for a mapping of nodes).
When it comes to computeExpressions we are used to doing it all with SAQL. Before you go down that route in Data Prep check out the common options like bucketing or attributes to make it more simple for yourself. However, do you need to do something more complex you can of course create custom formulas via 𝑓𝑥 button.
One lesson I quickly learned is the syntax is a bit different from what we are used to in computeExpressions and computeRelative as we are using EA-SQL and not SAQL, so do expect to spend a little extra time on this. I’ve personally relied heavily on the documentation for details on the syntax and practical examples. But I do want to share a few generic things I wish I was aware of when I created my first custom formula (let me know if you have other tips):
- API names should not be wrapped in a single quote.
- Use a single equal sign to do an exact match.
- Double and single quotes have the same behavior and are used to identify text values.
- If you select functions from the overview you do need to remove the help text.
- You can have multiple lines without getting errors.
An example with a case statement in Dataflow:
case when 'StageName' == "Closed Lost" then "Lost" else "Not Lost" end
An example with a case statement in Data Prep:
case when StageName = "Closed Lost" then "Lost" else "Not Lost" end
In Data Prep we have the option of doing proper joins (left, right, inner and outer) to add columns from a different data source in addition to lookups which we know from the augment node, needless to say, that this, of course, gives us more options when bringing data together. While I won’t go into details of what the different types of joins do in this blog (instead check this article out) I will say do consider the output when using joins and if that is the desired output. Especially when you are doing an outer join the number of output rows can increase massively.
Another way of joining data is by adding more rows using the append node. The main difference between Dataflow appends and Data Prep Appends is that in Data Prep you can map column headers, hence the name of the column is irrelevant.
Having the concepts of the nodes established let’s cover how the UI works and some things to be aware of.
Creating a Join or Append
The most obvious choice to create a join is to simply click on the plus icon next to a node and select “Join” or “Append” in the drop-down. This action will open up the data selection view and add the selected source to the join. Once you have selected the data to use you will have to make sure for joins to define the keys or for append map the column headers.
For joins specifically, you can create composite keys by adding another key pair. You can also in the column section select which fields to bring forward from both the left and the right side.
The crucial thing to note when you are joining two data sources by clicking the plus sign is that it will add the selected data source as a new input node. If you already had that source on the canvas it will now appear twice. So let’s have a look at how to join two existing data sources.
Drag and drop to join data
If you already selected the data sources you need and want to avoid the same object appearing twice you can simply join two sources by dragging and dropping. Next to the node you want to join click and drag the plus sign on top of the node you want to join with and then select if you want to perform a “join” or an “append”.
Left and right side behavior
One thing that did make me scratch my head when I created my first join was the question “how do I control what is the left and right side?”.
If you are joining by clicking plus followed by selecting a data source the existing source (next to the plus) will be your left side and the source you are selecting in the dialog box will be the right side.
If you are joining two existing sources by drag and drop the node you are dragging will be the right side and the node you are dropping it onto is the left side.
Currently there is no way to edit the left and right side in the node properties.
Creating datasets is rarely a linear process so let’s have a look at some tips when adjusting and deleting elements in Data Prep.
Adjusting field selections
Let’s say you have created a data source but you want to change the field selections, maybe you want to add more fields or maybe you want to drop some. All you have to do is to select the input node, click the edit link next to the object name followed by selecting or deselecting fields.
Note that if an input node is used in different branches you can also drop fields in the join node.
Deleting a node is also possible. Click the node you want to delete so you see the three dots appear and click to delete the node. Hereafter you may have a broken link which you can repair by dragging and dropping the plus sign onto a new node and select the “Connect Nodes” option.
Now let’s have a look at some of the features that go beyond a comparison of Dataflow and Data Prep, but that I see as a welcome addition when working with data orchestration.
Data Prep has new transformations in the Transform node that leverages machine learning such as sentiment analysis and predict missing values. As the topic has been covered in previous blogs, I won’t go into details here, but I recommend checking them out if you want more information:
- New Data Prep platform made easier with native machine learning,
- Smart ETL with Data Prep: Detect Sentiment in Three Steps.
More smart transformations are on their way for example clustering which is covered in this blog.
One of my favorite things about Data Prep is the ability to see a preview of your data. Just click a node to see a preview of the data at the selected stage in the flow. This also applies within a Transform node; each transformation will generate a new field and you will not see the output of the last transformation while previewing the first transformation.
Just like with operational reports the data you see in the preview is just a sample, hence you cannot count on the values you see. This is especially important to note when you are bucketing values. When you are picking values it will only show values that are available in the sample data. However, you can always type and add any value to be bucketed under a new name.
Renaming nodes & descriptions
As you know with Dataflows over time they become complex and multiple people are often involved with building out the flow of data – the same will be the case for Data Prep. To make it easier to manage a great addition is to change the default naming of nodes and add descriptions to document the function of any given node. Simply click the node, then the three dots, and select “Edit Name and Description”. You will see a little speech bubble for any nodes that have a description.
I think many will love that in Data Prep you can rearrange the layout of the nodes by dragging nodes on the canvas. It’s simple but very effective when you are working with complex flows. Oh yes, the layout is saved and used next time you open up your recipe.
One of the things that I often do in my Dataflows is leverage previous nodes in new branches and ultimately create an additional dataset. You can achieve the same behavior in Data Prep by creating a branch.
To create a branch hover the mouse over the line right next to a node and you’ll see the branch option (two arrows instead of a plus). If you move the mouse too far to the right you will get the plus instead which will just add the new node on the line.
The JSON looks different
I don’t know if it’s just me but when I first looked at the new Data Prep I was curious if I could modify the JSON as I can do with Dataflow and Dashboards. It’s probably a legacy need of mine, regardless you can in fact download and upload the instructions in JSON. So should you want a backup just hit the download button in the top left corner.
One thing to be aware of is that the JSON is slightly different from what you know in a Dataflow. The main difference I found is that there is a UI section that controls the layout of all the nodes. But you’ll also see that the transformations have changes, which I think is only natural given the complexity of the node. However, hopefully you don’t need to make any changes to the JSON.
If you have any additional tips for the savvy Dataflow user then feel free to drop a comment below – sharing is caring.
Data Prep is not complete… you will see the team will continue work on Data Prep providing enhancements and additional features. So if you got feedback use the “Got Feedback” button next to the JSON upload/download options, I know the product team is checking all your comments. It might be nice to know that they are working on making it possible to convert a Dataflow to Data Prep for a future release (safe harbor).