Build an AI Model With Clicks In Data Cloud

Bobby Brill · February 19, 2024

The all-new Einstein Studio tab in Data Cloud is now GA. It allows users with access to Data Cloud to build, connect, and manage their predictive and generative models, which can be seamlessly integrated into any Salesforce Customer 360 application for intelligent decision making.

Previously, Einstein Studio allowed users to connect to models built externally by data scientists in Amazon SageMaker or Google Vertex AI. Now we're empowering the data specialist or analyst, who is already very comfortable with their Data Cloud data, to build predictive models with clicks, not code.

In this blog, I'll walk you step by step through building a predictive model in Data Cloud, using the example of predicting the likelihood of customer churn. We'll cover the following steps:

  1. Prepare the data – use Data Cloud features to prepare the model training data.
  2. Train the model – walk through the detailed steps to build and train the model.
  3. Inspect the model – review the model metrics and iterate.
  4. Operationalize the activated model – see all the ways models can be used to build intelligent Customer 360 applications.

Step 1: Preparing the Data

The majority of the work required to build a predictive model actually lies in preparing the data. This is why data specialists and analysts should feel empowered to build models: they truly know the data. In its first release, the Einstein Studio model builder can train a model based on a single Data Model Object (or DMO) in Data Cloud. While we aim to make this easier in the future, there are already many tools in Data Cloud that let users prepare and model data into a single, denormalized DMO. Here's an example of how Batch Data Transforms can create a DMO for training a model.

To train a model, the historical outcome must be a column in the data – this is also known as creating a “labeled dataset”. So in the example of customer churn, I have a text column called “Churn” that shows “true” for customers that have churned and “false” for customers that still do business with my company.

I can also use Batch Data Transforms to calculate things like total charges over time and total monthly charges based on the subscription details DMO.
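If it helps to think of this step in code, here's a rough pandas sketch of the same idea. The file, table, and column names are made up for illustration; inside Data Cloud this is all done with clicks in the Batch Data Transform.

```python
import pandas as pd

# Hypothetical inputs: one row per customer (including the "Churn" outcome),
# and many subscription rows per customer.
customer_df = pd.read_csv("customers.csv")
subscription_df = pd.read_csv("subscriptions.csv")

# Aggregate subscription details up to one row per customer, mirroring the
# aggregation node in the Batch Data Transform.
charges = (
    subscription_df.groupby("customer_id")
    .agg(total_charges=("charge_amount", "sum"),
         monthly_charges=("monthly_charge", "mean"))
    .reset_index()
)

# Join back to the customer table to form a single, denormalized training
# table with the historical outcome ("Churn") as the labeled column.
training_df = customer_df.merge(charges, on="customer_id", how="left")
```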

This output DMO could look something like this:
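Since every org's data is different, here are a few made-up rows just to illustrate the shape of a denormalized training table:

Customer Id | Tenure (months) | Monthly Charges | Total Charges | Churn
C-0001      | 34              | 56.95           | 1,936.30      | false
C-0002      | 3               | 70.70           | 212.10        | true
C-0003      | 18              | 42.30           | 761.40        | false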

Data preparation can take many iterations. Thankfully Data Cloud provides the tools to make this easy.

Step 2: Train the Model

1 – Navigate to the Einstein Studio Tab and click the New button:

From here you can either create a model from scratch or connect to a model that lives in an external AI platform like Amazon SageMaker, Google Vertex AI, or Databricks. Since we're training a model with clicks, click "Create a model from scratch".

2 – Here is where you need to choose the DMO that was created from the Batch Data Transform in Step 1. That DMO lives in a specific Data Space, so you will need to choose the Data Space first. Then you can select the DMO you want to train the model with.

You may see many DMOs in this list. Most DMOs will probably not work with Einstein Studio (yet) since they are normalized and related to many other DMOs. You will see error messages (hopefully they are helpful) if you choose a DMO that we can’t use to train a model.

3 – Once you choose the DMO, you have the option to filter it further. Yes, you could have done this in the Batch Data Transform, but filtering is often iterated on when training a model, so we made it part of the model training process. In the example of customer churn, including all customers where churn = false might skew the model too much, since I might have a lot of new customers. I might want to consider only customers that have done business with my company for at least a year, so I can use this filter step to include only customers whose tenure is greater than 12 months.

4 – Now I can set the goal of my model. Here is where I choose the column that represents the outcome, also known as the labeled column or the column I want to predict. The Model Builder only supports binary classification and regression models (for now), so I can choose any numerical field in the DMO or any text field that has exactly 2 unique values. Numerical fields containing only 1s and 0s will also show up and can be used to train a binary classification model. In this case, my outcome column is the "Churn" field described above.
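For intuition, here's a small pandas sketch (reusing the hypothetical training_df from earlier) of which columns would qualify as an outcome under these rules:

```python
import pandas as pd

def eligible_outcome_columns(df: pd.DataFrame) -> list[str]:
    """Columns that could serve as the model's outcome under the rules above."""
    eligible = []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Any numeric field works (regression, or 1/0 binary classification).
            eligible.append(col)
        elif df[col].nunique(dropna=True) == 2:
            # Text fields qualify only with exactly two unique values.
            eligible.append(col)
    return eligible

print(eligible_outcome_columns(training_df))  # e.g. [..., "Churn"]
```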

Here I can see the distribution of values. I have almost 2K examples (rows of data) of customers who have churned and 5.2K examples of customers who have not. There must be at least 25 rows for each value and at least 400 rows in total to train a model – we can't promise that will yield a good model, but Einstein always tries his best.

Next, I need to tell my model which value I want to predict. When the model runs, it will give me a score between 0 and 100, where a higher number is closer to the value I'm trying to predict – think of this as a likelihood score. Since I want to predict churn, I choose the value true. Then I can tell Einstein whether true is good or bad, which helps determine the directionality of the insights. Since churn is bad, I'll choose "minimize".
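In other words (a purely conceptual sketch, with a made-up probability):

```python
# The model's raw output is a probability of the chosen value ("true" here);
# the 0-100 likelihood score is just that probability on a percentage scale.
probability_of_true = 0.83            # hypothetical output for one customer
likelihood_score = round(probability_of_true * 100)
print(likelihood_score)               # 83: this customer is likely to churn
```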

5 – Next I can choose the fields I want to include in my model. In the Spring '24 version of this feature, I can choose up to 50 input variables, but there isn't a good way to see what's in them (yet). For now, Tableau, CRM Analytics, or any other visualization tool can be used to get a better sense of the shape and contents of each input variable. The idea is that if you know the data model well, you should start with a hypothesis about which fields best describe a customer and might be driving churn, and use those as inputs.

6 – Select the algorithm and train the model. There are 3 different algorithms that can be used to train a model. If you're not sure which to use, just stick with the default. XGBoost is really good for a classification problem like this, so I'm going to choose it. Last, I can name my model and begin training.
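To give a sense of what happens under the hood when you click train, here's a generic XGBoost sketch. Einstein Studio takes care of all of this for you; the column names are the hypothetical ones from the earlier sketches.

```python
import pandas as pd
from xgboost import XGBClassifier

# Features: everything except the label and the identifier; one-hot encode
# text columns so XGBoost can consume them.
X = pd.get_dummies(training_df.drop(columns=["Churn", "customer_id"]))
y = (training_df["Churn"] == "true").astype(int)   # 1 = churned, 0 = retained

model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X, y)

# The probability behind the 0-100 likelihood score described earlier.
churn_probability = model.predict_proba(X)[:, 1]
```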

Step 3: Inspect the Model

When the model is complete (this one took about 5 minutes), I can inspect the metrics and see how well the model will be able to predict. In the Spring '24 version of Model Builder, we use 4-fold cross-validation to determine how well the model can predict the outcome of unseen data. However, the model can also be activated and applied to a separate set of historical data, and these metrics can be calculated manually (see Step 4, where you can use the model inside a Batch Data Transform).
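For the curious, here's what 4-fold cross-validation looks like in scikit-learn. This is a generic sketch of the evaluation strategy, not Einstein Studio's internal code; X and y come from the training sketch above.

```python
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Train on 3 folds, evaluate on the held-out 4th, and rotate 4 times.
scores = cross_val_score(XGBClassifier(eval_metric="logloss"), X, y,
                         cv=4, scoring="roc_auc")
print(scores)          # one score per held-out fold
print(scores.mean())   # the kind of aggregate metric shown on the metrics page
```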

The model outputs a score regardless of the threshold, so how you interpret the score is up to you and your business requirements. The threshold can be adjusted to see how the model performs.

If I raise the threshold, the model becomes more precise when it predicts "Churn" = true, but it flags fewer examples. It may make sense to get more rows of data (i.e., more examples of customers that churn).
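Here's a rough sketch of that trade-off, reusing the hypothetical probabilities and labels from the training sketch:

```python
from sklearn.metrics import precision_score, recall_score

# Sweep the decision threshold to see precision rise as coverage falls.
for threshold in (0.3, 0.5, 0.7):
    predicted = (churn_probability >= threshold).astype(int)
    print(f"threshold={threshold:.1f}  "
          f"precision={precision_score(y, predicted):.2f}  "  # more precise as threshold rises
          f"recall={recall_score(y, predicted):.2f}")          # but fewer churners caught
```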

If I click Settings, I can make changes to the filter criteria, the fields I used to train the model, and the model algorithm, which will create a new version of the model so I can see whether it performs better or worse. Additionally, I can update the DMO and create a new version of the model based on the new data.

Similar to the data preparation step, model building can be iterative as well. A new version of the model can be created easily or I can go back to prior versions by clicking on the version picker.

I’m happy with the results after version 2, so I’ll activate the model. This can be done either from the Training Metrics page or the Model Details page seen below:

Step 4: Operationalize the Model

The whole reason this model was built in the first place was to help the business better understand which customers have the highest likelihood of churn so it can take steps to mitigate churn and retain that business. Having the model integrated at the point of decision making is crucial to the success of these models, and that's why we've made it easy to put these models into just about any Salesforce workflow.

Write Predictions into DMOs via Inference Builder

Inference Builder or Prediction Jobs can be found in the Usage tab within the Model Details.

Every prediction job outputs predictions to a new DMO that gets automatically created and connected to an existing DMO, which you must select when creating the job. In this case, I'm going to connect my model to a DMO that has all the customers I want to get a churn score for (called Customers to Score):

Next, I must tell the model where to get the data to make the prediction. I can either get data from the base object Customers to Score or choose any DMO that has a relationship to the main object by clicking Add an object. Note that both Monthly Charges and Total Charges were aggregated from the subscription DMO – I would need to make sure I have access to these calculations when I’m mapping the data to get new predictions.

Once I've mapped all my fields, I can choose whether I want Streaming or Batch predictions.

Batch is a great option if I just want one-off predictions that I can use for marketing segmentation or analyze further in an analytics tool like Tableau or CRM Analytics, or if I want to test the model on data that wasn't used to train it (note that this would require me to calculate accuracy manually, which can be done in an analytics tool).

Streaming is a great option to automate actions based on predictions. Every time a new record shows up in my source DMO or a record changes, it will be rescored. Then in Data Cloud, I can create Data Actions or Data Cloud Triggered Flows to automate business rules based on the results of predictions.

I can control which field updates generate new scores to avoid unnecessary calls to make new predictions.

When I go to save the Prediction Job I can give it a name, which will correspond to the name of the DMO that is getting automatically created for the Predictions. In this case, I’m going to call the job StreamingChurnPredictions.

The newly created Prediction Job starts in an inactive state and must be activated before it can be used.

Once the prediction job is activated, it can be run to generate predictions for all existing records. Yes, even streaming jobs can be run in batch to make sure all records have a prediction before update events are fired.

To see a history of the job status click on view last run:

Creating and activating the prediction job automatically created a new DMO of type ML Prediction, with a relationship back to the base DMO (Customers to Score) we used when setting up the job. You can find the DMO by going to the Data Model tab (make sure to use the "All" list view).

Use AI Models within Batch Data Transforms

An alternate way to write predictions to Data Cloud objects is through Batch Data Transforms. Insert an AI model as a node in a Batch Data Transform to use its results downstream within the transformation. This is a great way to apply predictions to a separate set of data from the one used in model training. Using certain transformations, you can compare the actual outcome to the predicted outcome to calculate accuracy, and then visualize it all in Tableau or CRM Analytics.
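For example, here's a rough sketch of that manual accuracy check, assuming a hypothetical export of the transform's output:

```python
import pandas as pd

# Hypothetical export: one row per customer, with both the model's
# prediction and the actual historical outcome.
holdout_df = pd.read_csv("scored_holdout.csv")

# Fraction of rows where the predicted outcome matches the actual outcome.
accuracy = (holdout_df["predicted_churn"] == holdout_df["actual_churn"]).mean()
print(f"holdout accuracy: {accuracy:.2%}")
```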

Real-time Predictions

While a Streaming Predict job is a great way to kick off Data Cloud Triggered Flows, you may want to use the model in other flow types, such as Record-Triggered or Screen Flows. All flow types have access to the Data Cloud Action.

Simply create any flow type and add an action. Select the Data Cloud category and you will see all active Einstein Studio models in the drop-down list:

In order to use the model within the flow, you must map data into each model input. This data can come from anywhere in the flow, like a previous step where you ask the user to enter a value. There is no requirement that the data come from Data Cloud:

Now you can use the model any way you want. For example, every time something changes back in Salesforce, I may want to assess the likelihood that the related account will churn and, based on a decision node, send an email if the churn prediction is too high.
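As a purely conceptual sketch of that decision logic (every name here is a hypothetical stub; the real thing is built declaratively in Flow Builder, not in code):

```python
CHURN_ALERT_THRESHOLD = 70  # hypothetical cutoff on the 0-100 likelihood score

def get_churn_score(account_id: str) -> int:
    # Stand-in for the Data Cloud action that invokes the Einstein Studio model.
    return 83

def send_churn_alert(account_id: str, score: int) -> None:
    # Stand-in for the flow's email action.
    print(f"Account {account_id} has churn score {score}; alerting the owner.")

def on_account_change(account_id: str) -> None:
    score = get_churn_score(account_id)
    if score > CHURN_ALERT_THRESHOLD:   # the flow's decision node
        send_churn_alert(account_id, score)

on_account_change("001-example")
```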

Final Remarks

The new Einstein Studio Model Builder gives customers the opportunity to build models with clicks on top of Data Cloud data, and it uses the power of the Salesforce platform to operationalize those models within any Salesforce application to improve decision making.
