Flag Null Values in your Datasets

It would be great if your data in Salesforce was complete; reporting would be so much easier, and the insights so much better. Unfortunately, complete data is a bit like a unicorn: nice to imagine, but not realistic. So what happens when you have null values in Einstein Analytics? If you group by a dimension containing null values, the records with nulls are omitted from the result, which means your reporting is not accurate. Using the dataflow editor you can modify your datasets to define a value for nulls; it's easy to do, and it allows you to group by those records. There are two ways of getting to the end result: one done purely in the visual editor and one that requires you to modify the JSON. So let's have a look at how it is done.

Visual Editor

The visual editor approach takes a few steps, but it's handy if you are already working with computeExpression nodes anyway, and personally I like not having to modify the JSON. The steps involve a computeExpression node as well as a sliceDataset node.

Compute Expression

I'm sure you have plenty of datasets to play with, but if you don't, you can create one. For this example, all that matters is that you have a field in the dataset that contains null values, and that shouldn't be difficult to find. I know I have two null values for my lead source, so I will use that field; note that its API name is LeadSource.

Jump to your data manager and find the dataflow that contains the dataset you want to modify.

Click on it to go into the dataflow editor. Your dataflow may well contain multiple datasets, so find the one you want to modify. You can type the name of your dataset in the search box, and it should bring up the "register" node for that dataset; the sfdcRegister node is what actually creates the dataset in Einstein Analytics. Before making any changes, you may want to download the dataflow JSON so that if something goes wrong you can revert by uploading it again. Just press d on your keyboard and the file will download.

The first thing we have to do is create a new field that takes null values into account, which we will do with a computeExpression node. In the menu bar, find the ƒx symbol and click on it.

Next, we are prompted to give the new node a name. As a rule of thumb, call it something meaningful that describes what the node does. In my case I will call mine "LeadSource_NullValues". Continue when done.

Next, we need to define the source of our computeExpression, in other words which node we want to base our new logic on. I know that the node I want is augment_Opportunity_User_0; it's the final node before the sfdcRegister, where the dataset is created. You can use the search box to quickly find the right source node.

We now need to create the new field, so click the "+ Add Field" button. Give your new field a name and a label. The name follows the same rules as a Salesforce API name, so avoid spaces. Make sure the field's type is Text.

We now have to define our logic, the SAQL expression, using a case statement. I'll be using the following. You can, of course, replace "-" with "N/A", "null", or something completely different; just remember this is what the end user will see as the Lead Source when the opportunity doesn't have one defined.

case
when LeadSource is null then "-"
else LeadSource
end

Click "Save". You can add more fields to the node, but I only need the one, so click "Save" again. You can now see that your new node is attached to the augment_Opportunity_User_0 node, but its output doesn't go anywhere yet.
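For reference, the node the visual editor generates should look roughly like the sketch below in the dataflow JSON. Note that the field name "LeadSource_Clean" and its label are placeholders of mine; use whatever name and label you entered in the editor.

```json
"LeadSource_NullValues": {
  "action": "computeExpression",
  "parameters": {
    "source": "augment_Opportunity_User_0",
    "mergeWithSource": true,
    "computedFields": [
      {
        "name": "LeadSource_Clean",
        "label": "Lead Source Clean",
        "type": "Text",
        "saqlExpression": "case when LeadSource is null then \"-\" else LeadSource end"
      }
    ]
  }
}
```

The mergeWithSource parameter set to true keeps all the original fields alongside the computed one, which is why we end up with two Lead Source fields and need the sliceDataset step next.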

Slice Dataset

The reality is that we now have two Lead Source fields in our dataset, but we only need the new one, so we will drop the original Lead Source and exclude it from the dataset. To remove it, we need to add another node to the flow, this time a sliceDataset. So find it in the menu bar and click on it.

First, we need to name the node, I’ll be calling mine “Remove_LeadSource”.

Just as with the computeExpression, we need to pick a source; this time we want the node we just created, in my case "LeadSource_NullValues". Keep the default mode of "drop", since we want to drop the LeadSource field.

When you click on Fields you can search for the value or you can just scroll down to find the fields you want to drop from the dataset. Once you have found the field(s) you want to drop click on the checkmark so it turns green. When you are done click “Save”.
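If you later download the dataflow JSON, the resulting sliceDataset node should look something like this sketch (the node names match the ones used in this example; yours may differ):

```json
"Remove_LeadSource": {
  "action": "sliceDataset",
  "parameters": {
    "source": "LeadSource_NullValues",
    "mode": "drop",
    "fields": ["LeadSource"]
  }
}
```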

We now have the two new nodes attached to each other but the output still doesn’t go anywhere.

The Finishing Touches

To make sure our changes end up in the dataset, we need to connect the "Remove_LeadSource" node to the "register_Oppty_0" node. So click and hold the arrow to the right of "Remove_LeadSource" and drag it to "register_Oppty_0".
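Connecting the nodes simply updates the source parameter of the register node. In the JSON it should end up looking roughly like this sketch; the alias and name values here are placeholders, and yours will reflect your own dataset:

```json
"register_Oppty_0": {
  "action": "sfdcRegister",
  "parameters": {
    "source": "Remove_LeadSource",
    "alias": "Oppty",
    "name": "Oppty"
  }
}
```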

Now you can click on the blue “Update Dataflow” button in the top right corner. You will get a warning that you cannot revert once you update, click continue.

Jump back to the data manager and start your dataflow.

Once the dataflow has run, you can jump back to Analytics Studio and test your new field. Also notice that the original LeadSource field is no longer included in your dataset.

Modify JSON with defaultValue

The second way to handle null values is by adding a field attribute directly in the dataflow JSON, as this is not possible to do in the UI. First of all, what is the dataflow JSON? It's basically all the instructions for the dataflow in one file; it includes the digest, register, and other nodes for all the datasets in the dataflow. Naturally, that file can become quite large and is harder to read than the visual editor. We already fixed the Lead Source, so let's have a look at the NextStep field, which I know does not contain any values.

The first thing we want to do is download the dataflow JSON, so from the dataflow editor click the download button in the top right corner or simply hit d on the keyboard.

Once you have downloaded that file, open it in Sublime Text, Brackets, Notepad, or any text editor. If you know your editor supports the JSON format, feel free to use that. If you are unsure, you can always copy the contents of the JSON file into an online editor like JSON Editor Online. Depending on how many nodes you have in your dataflow, this file can be huge. Who am I kidding, it's always huge!

Anyway, we want to look for the sfdcDigest node that contains the field we want to add a default value to. Looking at the dataflow, I know I am looking for "sfdcDigest_Opportunity_0". The easiest way to find this node is with a search (Cmd+F or Ctrl+F). The node should look something like the one below. Remember, I have chosen to extract certain fields; you have most likely chosen other fields in your dataset, but the concept is the same.

"sfdcDigest_Opportunity_0": {
  "action": "sfdcDigest",
  "parameters": {
    "fields": [
      { "name": "Amount" },
      { "name": "CloseDate" },
      { "name": "Name" },
      { "name": "Type" },
      { "name": "StageName" },
      { "name": "IsWon" },
      { "name": "NextStep" },
      { "name": "IsClosed" },
      { "name": "LeadSource" },
      { "name": "OwnerId" }
    ],
    "object": "Opportunity"
  }
},

You now want to find the field you want to add a default value to, which in my case is NextStep. So next to "NextStep" you will add the following:

,
"defaultValue": "\"N/A\""

Your node should now look something like this:

"sfdcDigest_Opportunity_0": {
  "action": "sfdcDigest",
  "parameters": {
    "fields": [
      { "name": "Amount" },
      { "name": "CloseDate" },
      { "name": "Name" },
      { "name": "Type" },
      { "name": "StageName" },
      { "name": "IsWon" },
      {
        "name": "NextStep",
        "defaultValue": "\"N/A\""
      },
      { "name": "IsClosed" },
      { "name": "LeadSource" },
      { "name": "OwnerId" }
    ],
    "object": "Opportunity"
  }
},

The final step is to save your file; be careful not to overwrite the existing file. In case something isn't quite right in your changes, you want to be able to revert to the version from before.

Now upload your file by clicking the upload button in the top right corner or hit u on the keyboard.

You will now get a warning telling you that once you upload the new dataflow you cannot go back; that's why we downloaded the original dataflow. Now choose the file and save your changes.

Before we can see the actual data, we need to run the dataflow, just as we did for the Lead Source field. Once your dataflow has run, go to Analytics Studio to see your changes.


1 thought on “Flag Null Values in your Datasets”

J. on March 12, 2018:

The defaultValue can now be defined from the compute expression 🙂
