New enhancements in Einstein Studio to accelerate model building

This article rounds up the latest enhancements in Einstein Studio’s Model Builder, designed to simplify key stages of the machine learning lifecycle, from training data preparation and feature engineering, to model retraining and agentic flow integration.

Note: Read the previous blog on Using Model Predictions in Data Cloud One (DC1) Orgs

🔥 Explore what’s new:

  1. Simplified Model Training prep – Train using multiple DMOs
  2. Simplified Model Training prep – Train using CIs
  3. Simplified Feature engineering – 5 New feature transforms in Model Builder
    1. Text Clustering
    2. Numeric imputation / Handling missing numbers
    3. Date transformations
    4. Number Bucketing
    5. Data distribution charts
  4. Streamlined Model maintenance – Schedule automatic model retraining
  5. Agentic Flow Integration – Predictive Agent Actions to power intelligent agents

Let’s get started with the set of features.

1. Simplified Model Training prep – Train using multiple DMOs

Until now, you could train a model on only a single DMO, i.e., a denormalized DMO containing all of the profile and engagement data that you want to use as input variables. To prepare that data, you had to rely on Data transforms or SQL.

Now you can train models using multiple DMOs. Start with a base DMO, such as your profile data or an object like Leads, and add variables from any related DMOs (those with a 1:1 relationship with the base DMO). This lets you keep the profile DMO as is and build engagement or other activity data, with its more complex transformation logic, in separate DMOs.

Sample use cases:

  • Lead scoring / Propensity to buy model: Base DMO = Lead; join with Engagement data grouped at a lead level
  • Service case escalation: Base DMO = Case; join with case engagement data to predict escalations.
  • Cross-sell models: Base DMO = Account; join with Opportunity history to understand buying patterns.
  • Retention models: Base DMO = Subscription; join with Support Ticket logs to capture service experience.
  • Sales rep productivity: Base DMO = Opportunity; join with Task or Event data for outreach tracking.
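To make the 1:1 requirement concrete, here is a minimal Python sketch of the lead scoring case: engagement rows are first aggregated to one row per lead, then joined onto the base rows. This only illustrates the data shape, not how Model Builder performs the join. The field names (`lead_id`, `industry`, `type`) and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical base DMO rows: exactly one row per lead.
leads = [
    {"lead_id": "L1", "industry": "Retail"},
    {"lead_id": "L2", "industry": "Finance"},
    {"lead_id": "L3", "industry": "Retail"},
]

# Hypothetical engagement rows: zero or more rows per lead.
engagements = [
    {"lead_id": "L1", "type": "email_open"},
    {"lead_id": "L1", "type": "web_visit"},
    {"lead_id": "L2", "type": "email_open"},
]


def aggregate_per_lead(rows):
    """Collapse engagement rows into one summary per lead (1:1 with the base)."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        counts[row["lead_id"]][row["type"]] += 1
    return {lead_id: dict(by_type) for lead_id, by_type in counts.items()}


def build_training_rows(base_rows, summary, engagement_types):
    """Left-join the per-lead summary onto the base rows."""
    joined = []
    for base in base_rows:
        per_lead = summary.get(base["lead_id"], {})
        # Leads with no engagement still get a row, with zero counts.
        features = {f"{t}_count": per_lead.get(t, 0) for t in engagement_types}
        joined.append({**base, **features})
    return joined


summary = aggregate_per_lead(engagements)
training_rows = build_training_rows(leads, summary, ["email_open", "web_visit"])
for row in training_rows:
    print(row)
```

The base rows are never duplicated or dropped: every lead appears exactly once in the result, which is what a 1:1 relationship with the base DMO guarantees.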

2. Simplified Model Training prep – Train using CIs

Just as related DMOs can supply input data, related Calculated Insights (CIs / CIOs) are now another source of input variables. Any metrics and analytical insights already built as CIs for dashboard consumption can be used here.

Sample use cases:

  • Customer lifetime value (CLTV): The most frequently cited use case for CIs is the lifetime value computed for every customer. This can now be pulled directly into model training against the Customer object.
  • Churn prediction: Pull in the CI for “Avg Days Since Last Engagement” as an input variable.
  • Customer tier modeling: Use the CIs for “Customer Tenure” and “Average Monthly Spend”.
  • Lead prioritization: Include a CI like “Propensity to Close” based on past lead conversions.

3. Simplified Feature engineering – 5 New feature transforms in Model Builder

Data Cloud offers tooling for transforming data, but in on-demand prediction scenarios you don’t always have time to preprocess your data before making a prediction. That is why we focused on simplifying feature transformations during model build.

At training time, Model Builder will take care of processing the data before feeding it into the training pipeline. At prediction time, just pass in the raw data and the model will take care of the transformation before making the prediction.

Let’s look at the new feature transformations:

3.1. Text Clustering

Free-form or long-text fields like notes, descriptions, emails, or chat data were previously hard to use as model inputs. A description field in which every value is unique gives the model no patterns to learn from, so it is not usable in its raw form.

Text Clustering as a feature transform solves this by:

  • first analyzing the raw data in such long text fields
  • uncovering the most frequent words and pairs of words
  • assigning a cluster to each row of text.

At prediction time, the model will

  • process this long text field
  • assign it to one of the 10 clusters
  • then treat it like any other text field with a limited set of values.
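The fit-then-assign flow above can be sketched with a toy heuristic in Python. This is not Model Builder's clustering algorithm, which the article does not specify. As an assumption for illustration, each cluster is labeled by the most frequent word or word pair among the rows not yet covered, and a new text is assigned to the first cluster label it contains. The stopword list, function names, and sample tickets are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "my", "is", "to", "and", "for", "of", "on", "i", "was"}


def terms(text):
    """Return the words and adjacent word pairs in a text, minus stopwords."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def fit_clusters(texts, n_clusters):
    """Greedily pick the most frequent term among rows not yet covered."""
    remaining = [row for row in (set(terms(t)) for t in texts) if row]
    clusters = []
    while remaining and len(clusters) < n_clusters:
        doc_freq = Counter(term for row in remaining for term in row)
        # Highest row count wins; ties break alphabetically for determinism.
        best = min(doc_freq, key=lambda term: (-doc_freq[term], term))
        clusters.append(best)
        remaining = [row for row in remaining if best not in row]
    return clusters


def assign_cluster(text, clusters):
    """Map a raw text to the first cluster label it contains, else 'other'."""
    present = set(terms(text))
    for label in clusters:
        if label in present:
            return label
    return "other"


tickets = [
    "Refund for my billing error",
    "Billing error on invoice",
    "Package shipping delayed",
    "Shipping delayed again",
    "App crashes on login",
    "Login page crashes",
]
clusters = fit_clusters(tickets, n_clusters=3)
print(clusters)
print(assign_cluster("I was billed twice, billing issue", clusters))
```

The point of the sketch is the contract rather than the heuristic: the clusters are learned once at training time, and at prediction time a raw text is reduced to one of a small, fixed set of labels.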

When to use: Your data contains free-text fields (emails, notes, chats, ticket descriptions) and you want to uncover themes without manual tagging.

Sample use cases:

  • Customer support triage: Cluster support ticket descriptions into intent categories (e.g., billing, technical, shipping).
  • Marketing campaign feedback: Cluster free-text responses from surveys to derive themes.
  • HR ticketing: Cluster employee-submitted IT/help desk issues to route faster.

Note: The image in the Settings panel is indicative of the raw data distribution, not of the clustered data.

3.2. Numeric Imputation / Handling missing numbers

Numeric fields with missing values distort the patterns in the data, and a field with a high percentage of missing values can become unusable. To solve this, data scientists usually fill in the missing numbers with substitute values, a technique known as numeric imputation.
Model Builder provides a built-in transform that lets users define the strategy for handling such null values.

While there are several imputation techniques, Model Builder’s Handle Missing Numbers transform lets you fill missing values with the mean or median of the field, computed within groups defined by another field. This is known as class-based mean imputation, where the mean is calculated within each class. (See Wikipedia reference).

Let’s illustrate this with an example.

  • Assume the data has missing values in Age (the imputation variable).
  • Gender is used as the class variable to fill them.
  • The mean age is calculated separately for each value of Gender.
  • The Male mean age is applied to the missing value in row ID C, and the Female mean age is applied to the missing value in row ID F.
  • Notice how the imputed values stay within the range of their own group, unlike with a single global mean.
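The same walkthrough can be expressed as a minimal Python sketch of class-based mean imputation. The ages are made-up sample values, and the function name and the global-mean fallback for classes with no observed values are assumptions for illustration, not documented Model Builder behavior.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample data: rows C and F are missing Age.
rows = [
    {"id": "A", "gender": "Male", "age": 30},
    {"id": "B", "gender": "Male", "age": 40},
    {"id": "C", "gender": "Male", "age": None},
    {"id": "D", "gender": "Female", "age": 50},
    {"id": "E", "gender": "Female", "age": 60},
    {"id": "F", "gender": "Female", "age": None},
]


def impute_by_class(rows, target, class_field):
    """Fill missing `target` values with the mean of `target` within each class."""
    observed_by_class = defaultdict(list)
    for row in rows:
        if row[target] is not None:
            observed_by_class[row[class_field]].append(row[target])

    class_means = {cls: mean(vals) for cls, vals in observed_by_class.items()}
    observed = [v for vals in observed_by_class.values() for v in vals]
    # Assumed fallback: use the global mean if a class has no observed values.
    global_mean = mean(observed) if observed else None

    imputed = []
    for row in rows:
        filled = dict(row)
        if filled[target] is None:
            filled[target] = class_means.get(filled[class_field], global_mean)
        imputed.append(filled)
    return imputed


imputed = impute_by_class(rows, target="age", class_field="gender")
for row in imputed:
    print(row["id"], row["gender"], row["age"])
```

With these sample values, row C receives the Male mean (35) and row F the Female mean (55), whereas a single global mean would have assigned 45 to both.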

How is class-based mean imputation better than a single global mean?

  • Class-based means preserve differences between groups.
  • Keeps relationships between variables more intact than global mean substitution.
  • Features retain meaningful variation instead of collapsing toward a single constant.

When to use: Your model has an important numeric field with many missing values and you can logically infer them from related group averages.

How to decide the Class variable?

  1. Choose a variable strongly correlated with the field that has missing values. If you’re filling “Income,” consider grouping by “Job role” or “Education level.”
  2. Pick a variable with enough data per group (avoid tiny groups with unstable averages).
  3. Use business knowledge to link variables:
    1. Prices may vary by Product category
    2. Time spent may vary by User segment
    3. House size may vary by location
  4. Avoid variables that could cause data leakage (e.g., using future information not available at the time of prediction).

Sample use cases:

  • Loan applications: Impute missing annual income using median income by geography.
  • Healthcare predictions: Fill missing BMI using the mean BMI by age group or gender.
  • E-commerce segmentation: Handle missing “Number of purchases” using mean per customer group.

3.3. Date transformations

Raw dates are rarely usable as is, since individual timestamps don’t expose patterns a model can learn from. Now users can derive features like day of week or month of year from a date field and use this transformed data as inputs to the model. This helps surface patterns such as certain engagements or transactions occurring on specific days or months.
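As a rough illustration of what such a transform produces, the Python sketch below derives day of week, month, and quarter from a date. The function name and output keys are hypothetical, and the quarter is a calendar quarter; a fiscal calendar would need its own mapping.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def date_features(d):
    """Turn a raw date into categorical features a model can learn from."""
    return {
        "day_of_week": DAY_NAMES[d.weekday()],  # weekday(): Monday is 0
        "month": MONTH_NAMES[d.month - 1],
        "quarter": (d.month - 1) // 3 + 1,      # calendar quarter, 1 to 4
    }


print(date_features(date(2025, 1, 6)))
print(date_features(date(2024, 11, 29)))
```

Thousands of distinct dates collapse into 7 day values, 12 month values, and 4 quarter values, which gives the model repeated categories to find patterns in.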

Sample use cases:

  • Abandoned cart predictions: Extract day-of-week from cart creation date to detect shopping behavior patterns.
  • Subscription churn: Use month or season to spot churn spikes (e.g., Q1 post-holiday).
  • Sales forecasting: Capture quarter or fiscal period from transaction date for temporal trends.

3.4. Number Bucketing

All number fields are bucketed using a strategy that tries to distribute values evenly across buckets. While Einstein Studio Model Builder has always bucketed numbers behind the scenes, this feature lets customers change the number of buckets from the default of 10 to a maximum of 100, which may increase the accuracy of certain models. This is most useful when the range of values is large and there is a need for micro-segmentation.
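One common way to distribute values evenly is equal-frequency (quantile) bucketing, sketched below in Python. Whether Model Builder uses exactly this strategy is an assumption; the function names and the sample spend values are hypothetical.

```python
from bisect import bisect_right


def fit_bucket_edges(values, n_buckets):
    """Pick cut points so each bucket holds roughly the same number of values."""
    ordered = sorted(values)
    if not ordered:
        return []
    edges = []
    for i in range(1, n_buckets):
        edge = ordered[i * len(ordered) // n_buckets]
        # Skip repeated cut points so heavily duplicated values share a bucket.
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges


def bucket_of(value, edges):
    """Return the 0-based bucket index for a value given the fitted edges."""
    return bisect_right(edges, value)


# Made-up, heavily skewed spend values.
spend = [5, 8, 12, 15, 20, 40, 90, 250, 1200, 9000]
edges = fit_bucket_edges(spend, n_buckets=5)
buckets = [bucket_of(v, edges) for v in spend]
print(edges)
print(buckets)
```

Because the cut points follow the data rather than splitting the range into equal widths, the long tail does not leave most buckets empty. Raising the bucket count simply places more cut points, which is where finer micro-segments come from.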

Sample use cases:

  • Spending segments: Group customers into 30-40 spend buckets.
  • Revenue/Sales Price Bucketing: In retail, bucket thousands of continuous price points (ranging from tens to millions) into 50+ buckets to uncover micro-segments across products.
  • Long-Tail User Engagement: For metrics with wide, skewed distributions, 10 buckets give broad insights, while 50+ buckets reveal detailed patterns in the long tail.

3.5. Data distribution charts

The Settings panel also highlights min and max values for numbers and dates. For text fields, it shows the distribution of values by frequency in a bar chart. Use these to quickly identify data skew or outliers in numeric fields.

[Figures: text field distribution, number field distribution, date field distribution]

4. Streamlined Model maintenance – Schedule automatic model retraining

With this new feature, you can now schedule when models should be retrained. How often to retrain a model depends on how frequently new training data is generated.

Sample use cases:

  • Daily order quantity prediction models: Adjust predictions based on daily purchase behavior, inventory levels, and promo impact.
  • Weekly sales model updates: Re-rank leads every Monday based on the newest interactions (opens, replies, visits) and updated behavior.
  • Monthly churn prediction models: Auto-retrain at the beginning of every month using the latest subscription cancellations.

5. Agentic Flow Integration – Predictive Agent Actions to power intelligent agents

Predictive Agent Action is a new reference action type that allows agents to perform on-demand predictions. On-demand predictions let users working with agents run what-if scenarios and generate new, actionable predictions powered by deterministic ML models.

Sample use cases:

  • Sales agents: Predict deal win probability on-the-fly based on current deal context.
  • Support agents: Trigger next-best-action recommendation while resolving a case.
  • Marketing agents: Simulate a campaign’s impact by tweaking customer attributes and re-predicting.

Conclusion

In this article, we explored the latest features in Einstein Studio designed to simplify and accelerate model building. These enhancements aim to support faster development, easier model maintenance, and Predictive AI-powered workflows that drive real business impact.

Special thanks to Bobby Brill for his review.
