Einstein Discovery allows the business scientist to explore patterns, trends and correlations in business data using Stories. The Story answers various questions, depending on the data it was trained on. Examples include Opportunity win-rate analysis, propensity-to-buy (PTB), and Case average handling time (AHT) or customer satisfaction (CSAT) in customer service.
One particularly useful component that results automatically from the Story is the model. The model is what’s actually deployed to make predictions about the future. At the record level, the model outputs contain the prediction, its top predictors (leading causes) and its top improvements (recommendations).
Different options exist for operationalizing these model outputs. The best-known one is the no-code display in a Lightning Page, with or without storing the information in an Einstein Predictions Field, shown here. Another possibility is to use predictions in process automation to supercharge automations like Next Best Action, validation rules and workflow. It is even possible to go entirely programmatic and get predictions through Apex, or to bring predictions outside Salesforce using the Prediction Service API.
And now, coming to the purpose of this post, with the Spring ’21 release, we are adding one more easy yet very powerful way to get predictions from the model: using a Predict node in Data Recipes! As we will see, this capability greatly facilitates some steps around model creation and deployment and opens up a new set of visual deployments of the model outcomes.
Use the Predict Node to enrich a dataset with model outcomes
Recently, the data prep experience was completely revamped with the new Recipes. These recipes now also contain a special node, called ‘Discovery Predict’. When you run a recipe with a Discovery Predict node, Einstein calculates and saves model outcomes on a row-by-row basis in the dataset. You can even store top predictors and improvements. For example, a Recipe can save predicted win rates in an Opportunity dataset, including top factors, and top recommendations to improve win rates on a row level. The dataset resulting from the recipe can then be used for direct-to-dashboard visualization.
All you need to get this to work is a deployed Einstein Discovery model in the org, and a dataset that you want to enrich with the model outcomes. A complete recipe that includes a Discovery Predict node may then look like this:
As you can see, the dataset that the recipe enriches with model outcomes doesn’t need to contain field names that are identical to the Story columns. A field mapping allows you to map Story columns to dataset fields.
Another option that is in your control is the inclusion of Top Factors and Top Improvements. Do you want these to be stored in the dataset? If so, how many? Finally, please note that this Predict node also supports Prediction sources with multiple model segments.
The Predict node is available for all model types: GLM as well as tree-based learning algorithms such as GBM, XGBoost and Random Forest. Generating predictions for a tree-based model takes a bit more time than for a GLM model. Please note that in the Spring ’21 release, the option to write top improvements to the dataset is only available for GLM models. Watch this space for updates on when top improvements can also be written for tree-based methods!
Intelligent Applications of the Predict node in Recipes
The implications of this Predict node for intelligent applications are much larger than they may appear at first sight. Let’s explore a few examples. First, we look at what this means for the business user and what new sets of insights can be leveraged. Second, we look at how this helps in creating good Stories and checking model quality. Finally, we see how this makes certain types of model deployments easier.
Benefit for the business user and executives: Explore predictions, predictors, and improvements at an aggregate level
Because top positives and top negatives are stored in the dataset, you can understand them at an aggregate level. Take the Opportunity win-rate example: the Recipe enriches every Opportunity row with its corresponding prediction, top predictors and top improvements. Consequently, a dashboard built on that dataset allows you to interactively explore what the most frequent improvements are for an Account. Or alternatively, understand what the most impactful win reasons in a specific territory are. Or ask yourself: which Products have the highest expected win rate, and what is most significantly driving that win rate?
It’s important to differentiate this analysis from the insights that you can take from the Story. Remember, these are model outcomes that the recipe enriches your Opportunity dataset with, and this dataset represents your current pipeline. That means that in the dashboard, you analyse the model outcomes for today’s business, as opposed to what happened in the training dataset. Therefore, you are even closer to the ball, and able to immediately take the right corrective action at an aggregate level. Do you observe that a specific sales team is seeing low predicted win rates on a certain product line? Get a dedicated enablement campaign going for that team at once, directly impacting the closing likelihood of your current pipeline!
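To make the aggregation idea concrete, here is a minimal Python sketch of what such a dashboard query does conceptually: counting the most frequent top improvement per Account. The rows and column names (account, predicted_win_rate, top_improvement) are hypothetical and only stand in for a scored Opportunity dataset; they are not the names the Predict node writes.

```python
from collections import Counter, defaultdict

# Hypothetical rows shaped like a scored Opportunity dataset.
# Column names and values are illustrative assumptions.
rows = [
    {"account": "Acme", "predicted_win_rate": 0.62, "top_improvement": "Add executive sponsor"},
    {"account": "Acme", "predicted_win_rate": 0.41, "top_improvement": "Shorten discount approval"},
    {"account": "Acme", "predicted_win_rate": 0.38, "top_improvement": "Add executive sponsor"},
    {"account": "Globex", "predicted_win_rate": 0.77, "top_improvement": "Shorten discount approval"},
]


def most_frequent_improvements(rows, top_n=1):
    """Count how often each improvement is recommended, grouped by account."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["account"]][row["top_improvement"]] += 1
    # Keep only the top_n most frequent improvements per account.
    return {account: counter.most_common(top_n) for account, counter in counts.items()}


summary = most_frequent_improvements(rows)
for account, improvements in summary.items():
    print(account, improvements)
```

In practice the grouping would be done by the dashboard itself; the sketch only shows that row-level model outcomes are what make this kind of grouping possible.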
See this dashboard on customer Churn in a Telecommunications scenario for another illustration (image above). The churn risk is calculated at the customer level, but thanks to the dashboard that allows for an aggregation of these model outcomes, we can explore this for different pockets of our business at an aggregate level, and surface top predictors for low or high churn not for a single customer, but for that slice of the data.
Creating better Stories and checking model accuracy
Einstein Discovery models are fully transparent, also when it comes to model quality. The model metrics that are shown are based on what’s called cross-validation. That means that randomly selected subsets from the training data are kept aside as validation data, and therefore not used during the model training. These validation sets are then used to calculate the model quality, by comparing the observed outcomes with the predicted outcomes. What matters here is the word randomly: these subsets are composed of randomly selected rows, and that’s how it should be. That’s how you know the model will do well across the data.
However, there will often be justified reasons to want to understand model quality on a specific subset of data. Suppose that you have just created a customer churn model looking at customer churn over the past 3 years. Even if the model metrics show you that it is a high-quality model, you may still wonder: how much is this model in line with the recent state of our business? Stated differently: had the model predicted customer churn for just the past quarter, would it have flagged the customers who actually churned? The answer to this question is easily found with the Predict node in Recipes. Just create a Recipe to predict churn on your customer dataset, filter it down to the last quarter and compare the predictions to the actual observed churn. Did the model find the right customers in that last quarter?
A variation on this theme is to first create a specific manual validation set and train the model on the other data. Suppose for example that you have introduced a new product line, there aren’t a lot of closed Opportunities with this product yet, and the ones that do exist vary too much in nature to include in the model training. To understand how well an Opportunity win-rate model will work with this new product, you can create a model trained on Opportunities without this product, and then use the Predict node to predict outcomes for the Opportunities that contain the new product. Comparing that to the real win rate of those Opportunities will give you a good sense of the accuracy of the model for this new product line!
Finally, it can also be used to ‘dark launch’ multiple models with different setups, and build a dashboard comparing the accuracy of all deployed models. To do this, deploy multiple versions of a model, and create a recipe that writes the outcomes of each of them to the dataset. As time passes, you can easily compare the different predictions to the observed outcome in the same dataset, and see which model is the most accurate one!
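For readers new to the idea, the following Python sketch illustrates random validation splits in general. It is a generic k-fold illustration under my own assumptions (4 folds, a fixed seed, a hypothetical k_fold_indices helper), not a description of how Einstein Discovery implements its cross-validation internally.

```python
import random


def k_fold_indices(n_rows, k=4, seed=0):
    """Yield (training, validation) row-index lists for k random folds."""
    indices = list(range(n_rows))
    # Shuffle so that every fold is a random sample of the rows.
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Every row outside the current fold is used for training.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(10, k=4))
for training, validation in splits:
    print(len(training), "training rows,", len(validation), "validation rows")
```

Each row lands in exactly one validation fold, so every row is scored once by a model that never saw it during training.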
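The comparison described above can be sketched in a few lines of Python. The rows, the column names (period_end, churn_probability, churned) and the 0.5 threshold are all hypothetical; in Tableau CRM the same filter and comparison would be built in the recipe or dashboard rather than in code.

```python
from datetime import date

# Hypothetical scored rows: predicted churn probability plus the observed outcome.
rows = [
    {"customer": "C1", "period_end": date(2020, 12, 31), "churn_probability": 0.81, "churned": True},
    {"customer": "C2", "period_end": date(2020, 12, 31), "churn_probability": 0.12, "churned": False},
    {"customer": "C3", "period_end": date(2020, 12, 31), "churn_probability": 0.64, "churned": False},
    {"customer": "C4", "period_end": date(2020, 12, 31), "churn_probability": 0.35, "churned": True},
    {"customer": "C5", "period_end": date(2020, 9, 30), "churn_probability": 0.90, "churned": True},
]


def backtest(rows, start, end, threshold=0.5):
    """Compare thresholded predictions with observed churn for rows in [start, end]."""
    tp = fp = fn = tn = 0
    for row in rows:
        if not (start <= row["period_end"] <= end):
            continue  # outside the period under inspection
        predicted = row["churn_probability"] >= threshold
        actual = row["churned"]
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "rows": total,
        "accuracy": (tp + tn) / total if total else None,
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
    }


# Inspect only the last quarter of 2020.
result = backtest(rows, date(2020, 10, 1), date(2020, 12, 31))
print(result)
```

Recall answers the question in the text most directly: of the customers who actually churned in that quarter, what share did the model flag?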
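As an illustration of such a comparison, the sketch below scores two hypothetical models on the same rows using the Brier score (the mean squared difference between predicted probability and observed outcome; lower is better). The column names model_a and model_b and the choice of metric are my assumptions; any accuracy measure that suits your use case would work.

```python
# Hypothetical dataset rows: the observed outcome plus one prediction column per model.
rows = [
    {"won": True, "model_a": 0.8, "model_b": 0.6},
    {"won": False, "model_a": 0.3, "model_b": 0.4},
    {"won": True, "model_a": 0.7, "model_b": 0.9},
    {"won": False, "model_a": 0.1, "model_b": 0.5},
]


def brier_score(rows, prediction_column, outcome_column="won"):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    errors = [
        (row[prediction_column] - (1.0 if row[outcome_column] else 0.0)) ** 2
        for row in rows
    ]
    return sum(errors) / len(errors)


scores = {name: brier_score(rows, name) for name in ("model_a", "model_b")}
best = min(scores, key=scores.get)  # lowest Brier score wins
print(scores, "best:", best)
```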
Deploy model outcomes easily and visually with dataset scoring
Finally, let’s explore how the life of the model administrator or model manager becomes so much easier using this new capability. Deploying model outcomes back to Salesforce becomes even easier than it already was.
To begin with, this allows simple in-dashboard visualization of the predictions. Would you like to show the churn risk as a gauge? The Opportunity win likelihood as a traffic light? A propensity-to-buy score expressed as a rating of one to five stars? Or actually plot the expected lifetime revenue on a scatter plot where we see this customer compared to peer-group customers? Such compelling visualizations require the model outcomes to be in the dataset, and sometimes require a small post-processing step (such as the conversion of a propensity-to-buy score into one to five stars).
Because the model outcomes are now persisted in the dataset, all these transformations and visualizations are easy to create. Just complete the recipe with the required data transformations, build the dashboard on the resulting dataset and embed it in the corresponding Lightning pages.
Another innovation that was brought to the new recipes is the Salesforce Output Connector, which allows writing back data from the dataset into a Salesforce object. This highly anticipated capability allows the user to prepare and transform data in Tableau CRM, and then push it back into Salesforce core. Of course, this is especially powerful in combination with the Predict node. To get an idea of the output connector, see the following recipe:
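The star-rating post-processing step mentioned above is simple enough to sketch. The Python below only illustrates the logic, assuming a propensity between 0 and 1 and five equal-width bands; in a recipe you would express the same bucketing with a transformation, and the to_stars name and band boundaries are hypothetical.

```python
def to_stars(propensity):
    """Map a propensity in [0, 1] to 1..5 stars using five equal-width bands."""
    if not 0.0 <= propensity <= 1.0:
        raise ValueError("propensity must be between 0 and 1")
    # [0, 0.2) -> 1 star, [0.2, 0.4) -> 2 stars, ..., [0.8, 1.0] -> 5 stars.
    return min(5, int(propensity * 5) + 1)


for score in (0.0, 0.5, 0.95, 1.0):
    print(score, "*" * to_stars(score))
```

Equal-width bands are only one option; bands based on quantiles of your own score distribution may give a more even spread of ratings.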
In an Output node, select to write to the Output Connection (1). Select the Salesforce Output connection to push data to (2) and the object to write to (3). Select whether you want the push to UPDATE, INSERT, or UPSERT data (4). Map recipe columns to their equivalent external object columns (5).
The combination of the Salesforce Output Connector and a Recipe Predict node then becomes a powerful way to push model outcomes into Salesforce, alongside the existing option of writing Discovery predictions back into custom fields using the managed package. The refresh frequency of these model outcomes on the Salesforce records can then easily be controlled using the Recipe scheduler.
Final thoughts to consider
The Predict node in Recipes may appear to be a somewhat technical feature, but its implications for Intelligent Applications are large. This capability allows you to aggregate model outcomes (predictions, top factors and top improvements) easily in a dashboard and understand what is currently driving your business. The applications of that concept are countless and will redefine how business users and executives alike work with model outcomes at scale.
It also provides many benefits for the Story author and model administrator, in the way it allows for further inspection of model accuracy, transformation of the prediction before actually showing it to the user, and of course management of the predictions flowing back to Salesforce. This has to be one of my favorite Einstein Discovery features in the Spring ’21 release.
Now, how will you use this Predict node in Recipes to make your applications more intelligent? Let me know in the comments!