This blog uses Channel Optimization as a use case to explain and contrast the two approaches to solving a classification problem: a multiclass model and multiple binary models.
Channel optimization, a very popular marketing use case, is essentially choosing the best channel to contact each customer from options like Email, Phone, Social Media, SMS and more.

Note: Do read other marketing use cases in this blog: From Clicks to Conversions: Marketing Use Cases Powered by Model Builder
In Einstein Studio, there are two ways to solve Channel Optimization, or any similar multi-category classification problem where you are trying to decide among multiple options:
- Option 1: Multiclass model: This helps you pick just one winning channel (or category) from several.
- Option 2: Multiple binary models: This helps you score each channel (category) separately and add your own logic on top of it.
The main reason these are two different options in Einstein Studio is that binary models primarily solve for scoring and prioritization via business rules, acting as a decisioning engine, whereas multiclass models are designed to directly produce a single optimized choice where all outcomes compete within one unified prediction.

Note: Multiclass is currently in beta and can be enabled via Feature Manager.
At the end of this article, you’ll understand the pros and cons of each approach and, more importantly, know when to use each one.
Why the Modeling Choice Matters
At first glance, Channel Optimization seems simple: predict the best channel and move on. In practice, your modeling choice determines:
- Whether every customer must be contacted
- Whether “do not engage” is allowed
- How easily you can apply cost, consent, or fatigue rules (communication capping) when deciding the best channel
- How flexible the system is when channels evolve
The key question is not just “Which channel works?” It is also “How do I want to activate this prediction?”
Let’s break this down clearly.
Multiclass vs. Multiple Binary Models: Aren’t they the same?

Approach 1: One Multiclass model to predict “The best channel”
A multiclass model outputs exactly one channel per customer by comparing all channels and selecting the one with the highest score. Example:
- Customer A → Email
- Customer B → SMS
- Customer C → Push
The question being answered is simple: “Which channel is the best way to reach this customer?”
1. How to prepare training DMO from historical data?
Let’s look at how to create the training data for each campaign (granularity: Campaign Id).
- For each campaign, create engagement features (metrics) for each channel, e.g. clicks in the past 30 days, as indicated in the table below
- Add a rule to tag the channel with the highest engagement across the 4 channel columns as “Engaged channel”
- Consider campaign 301: Email drove the highest engagement, hence “Engaged channel” = Email, making it the positive class
- As you can see, SMS had reasonable engagement, but it is treated as non-engaged for this record
- Use “Engaged channel” as the Goal variable during training.
Here is a sample training dataset showing data at customer + campaign level.

You can use the sample data from files attached for your own model.
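The labeling rule described above can be sketched in plain Python. This is a minimal illustration rather than Einstein Studio functionality: the column names (`email_clicks_30d` and so on), the four channels, the sample numbers, and the tie-breaking order are all assumptions made for the example.

```python
# Hypothetical engagement columns, one per channel (names are assumptions).
ENGAGEMENT_COLUMNS = {
    "Email": "email_clicks_30d",
    "SMS": "sms_clicks_30d",
    "Push": "push_clicks_30d",
    "WhatsApp": "whatsapp_clicks_30d",
}


def engaged_channel(row):
    """Return the channel with the highest engagement, or None if all are zero.

    Ties are broken by the order of ENGAGEMENT_COLUMNS (an arbitrary choice).
    """
    best_channel, best_value = None, 0
    for channel, column in ENGAGEMENT_COLUMNS.items():
        value = row.get(column, 0)
        if value > best_value:
            best_channel, best_value = channel, value
    return best_channel


# Illustrative rows at customer + campaign level (made-up numbers).
rows = [
    {"customer_id": "C1", "campaign_id": 301, "email_clicks_30d": 5,
     "sms_clicks_30d": 3, "push_clicks_30d": 0, "whatsapp_clicks_30d": 1},
    {"customer_id": "C2", "campaign_id": 301, "email_clicks_30d": 0,
     "sms_clicks_30d": 0, "push_clicks_30d": 0, "whatsapp_clicks_30d": 0},
]

for row in rows:
    row["engaged_channel"] = engaged_channel(row)
```

In this sketch, rows where no channel shows any engagement come back as `None`. Whether to drop such rows or handle them differently is a modeling choice that depends on your data.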
Some considerations while preparing the data:
- Each of these metrics should be created for every channel (as columns) to avoid biasing the model towards a particular channel. If there were more Email-related features than features for other channels (e.g. 7 email features such as email_sends_30d, email_opens_30d, etc.), the model would bias its outcomes towards Email.
- All channels should have had an almost equal number of outreach attempts historically (in ML parlance, a balanced dataset). Scenarios that break this include:
  - Some channels were historically prioritized (non-randomized outreach), e.g. VIP customers were reached only via calls and no other options were tried, so the data is biased
  - SMS was never used much, so SMS shows up as zero across the training data simply because it was never attempted, yet the model interprets that zero as if SMS has low impact
  - To mitigate this, apply sampling or filtering to reach a good mix of engagements across channels
- When calculating engagement metrics, use only customer activity from the 30 days before the campaign was sent (from campaign date minus 30 days up to the campaign date). That way, the model learns from information that was actually available at that time. If we use information from after the campaign, like whether the email was opened, we are training the model with future information. In real life, we would not know that yet, so the model would look accurate in testing but give poor results on live data in production.
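The last consideration, keeping features strictly within the pre-campaign window, can be sketched as follows. The event structure (`channel`, `type`, `date`) and the dates are hypothetical; the only thing taken from the text is the 30-day lookback ending at the campaign date.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 30  # window length taken from the text above


def clicks_before_campaign(events, campaign_date, channel):
    """Count clicks on `channel` in the 30 days strictly before the campaign.

    Events on or after campaign_date are excluded so that no post-campaign
    (future) information leaks into the training features.
    """
    window_start = campaign_date - timedelta(days=LOOKBACK_DAYS)
    return sum(
        1
        for event in events
        if event["channel"] == channel
        and event["type"] == "click"
        and window_start <= event["date"] < campaign_date
    )


campaign_date = date(2024, 6, 1)
events = [
    {"channel": "Email", "type": "click", "date": date(2024, 5, 20)},  # inside window
    {"channel": "Email", "type": "click", "date": date(2024, 4, 15)},  # too old
    {"channel": "Email", "type": "click", "date": date(2024, 6, 2)},   # after send: leakage
    {"channel": "SMS", "type": "click", "date": date(2024, 5, 25)},    # other channel
]

email_clicks_30d = clicks_before_campaign(events, campaign_date, "Email")
```

Only the first event is counted for Email here. The click after the send date is exactly the kind of future information the model would not have in production.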
2. What does this mean for explainability?
Because one model serves all channels:
- Feature importance / Insights are more generalized
- You learn what drives overall channel choice, not what uniquely drives each channel
3. What does this mean for predictions / outcomes?
- Inference always produces one winning channel 🏅
- Scores are comparative. A high score indicates that the channel performs better than the others
- There is no concept of “do not engage”. The model will choose a channel, even when confidence is low.
Note: You can work around this by applying a threshold-based filter to the model’s predictions.
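A threshold-based filter of this kind might look like the sketch below. The field names (`predicted_channel`, `score`) and the 0.5 cut-off are assumptions for illustration; the actual output columns of your model and a suitable threshold will differ.

```python
MIN_CONFIDENCE = 0.5  # assumed cut-off; tune it on your own validation data


def apply_confidence_filter(prediction, min_confidence=MIN_CONFIDENCE):
    """Keep the predicted channel only when its score clears the threshold."""
    if prediction["score"] >= min_confidence:
        return prediction["predicted_channel"]
    return "Do not engage"


# Made-up multiclass predictions (one winning channel and its score).
predictions = [
    {"customer_id": "A", "predicted_channel": "Email", "score": 0.82},
    {"customer_id": "B", "predicted_channel": "SMS", "score": 0.34},
]

final_channels = {p["customer_id"]: apply_confidence_filter(p) for p in predictions}
```

Customer B's low-confidence SMS prediction is replaced with a "Do not engage" outcome, which the multiclass model cannot produce on its own.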
4. Implications for Model Management
- Operationally simple with one model to train, deploy, and monitor
- But less flexible because changes to the channel mix often require retraining or even rebuilding a new model.
5. Implications for segmentation and activation or downstream usage
- Deterministic routing into a single channel segment
- No additional decision logic required
- Every customer is guaranteed to be routed to a channel
Approach 2: Multiple channel-specific binary models: Scoring each channel independently
How to decompose a multiclass problem into multiple binary models with Einstein Studio:
Einstein Studio, combined with Data 360 (aka Data Cloud), makes it straightforward to model each channel independently, using channel-specific labels and a feature set that combines generic, channel-agnostic features with channel-specific ones.
This approach is commonly known as One-vs-Rest (OvR) or One-vs-All (OvA).
In an OvR-based model setup:
- You train n binary models for n channels
- For each model:
  - Label the target channel as Positive (Selected / Engaged)
  - Label all other outcomes as Negative (Not selected / Not engaged)
  - Each model predicts the likelihood that its channel is suitable for a given customer
- As expected, at inference time, every customer receives a score per channel
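The labeling step of an OvR setup can be sketched as a small function that turns one multiclass label into one binary label per channel. The channel list is an assumption for illustration.

```python
CHANNELS = ["Email", "SMS", "Push", "WhatsApp"]  # assumed channel list


def one_vs_rest_labels(engaged_channel, channels=CHANNELS):
    """Map one multiclass label to one binary label per channel.

    The target channel gets 1 (Positive); every other channel gets 0 (Negative).
    """
    return {channel: int(channel == engaged_channel) for channel in channels}


labels = one_vs_rest_labels("SMS")
```

Each key of the resulting dictionary would serve as the goal column for that channel's binary model.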
The question being answered is: “How suitable is each channel for this customer?”
Example predictions from across models would be:
- Customer 1: Email (0.78), Push (0.65), SMS (0.22)
- Customer 2: SMS (0.81), WhatsApp (0.74), Email (0.30)
- Customer 3: Push (0.40), Email (0.38), SMS (0.35)
A winner 🏅 can still be selected, but only after evaluating all scores.
1. How to prepare training DMO from historical data?
Let’s now create the training data for each of the channel-specific models (Campaign Id granularity), with the Email model as an example.
- For each channel-specific binary model, we now have the flexibility to create different features, and a different number of features, too.
  - For example: email_sent_date, emails_sent_last_30d, email_opens_last_30d, email_clicks_last_30d, etc.
- The Goal column will be “Email_Engaged_in_30_days(Y/N)”
- What counts as engaged is flexible. You can use opens, clicks, or even conversions.
  - For an opens-based model, mark it as 1 when email_opens_last_30d > 0, else 0.
- Use this Goal column to train the model.
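As a minimal sketch of the goal column, the function below derives the Email label under either an opens-based or a clicks-based definition. The column names come from the feature list above; the function name and the `definition` parameter are hypothetical.

```python
def email_engaged(row, definition="opens"):
    """Return 1 if the customer engaged with email under the chosen definition."""
    column = {
        "opens": "email_opens_last_30d",
        "clicks": "email_clicks_last_30d",
    }[definition]
    return 1 if row[column] > 0 else 0


# Made-up row: the customer opened emails but did not click.
row = {"emails_sent_last_30d": 4, "email_opens_last_30d": 2, "email_clicks_last_30d": 0}

opens_label = email_engaged(row, "opens")
clicks_label = email_engaged(row, "clicks")
```

The same customer is a positive example for an opens-based model and a negative one for a clicks-based model, so the engagement definition should be fixed before training.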
Here is a sample training dataset showing data at customer + campaign level for email and SMS models. You can prep the training data for other models similarly.

2. What does this mean for explainability?
Because models are channel-specific with different sets of features:
- Feature importance is also channel-specific. You learn what drives each channel.
- This allows more targeted insights and avoids diluting the signal of individual features
3. What does this mean for predictions / outcomes?
- Inference produces one score per channel.
- Each score represents an independent probability of engagement on that channel
- Decisioning using individual scores:
  - Score-based winner selection
    🏅 Select the channel with the highest score across all models, i.e. max(scores)
  - Threshold-based engagement
    Apply thresholds so a channel is chosen only when its score reaches a threshold (e.g. score ≥ 0.5)
    ⛔ Suppress engagement entirely when confidence is low
  - Policy-aware routing for final channel
    Layer in business rules, with weights, before making the final selection:
    - channel cost per campaign
    - prior engagement history
    - consent or fatigue controls
This setup separates prediction from decisioning, giving you channel-level intelligence first, and business-controlled routing on top of that data.
The policy-aware routing option is the most suitable for real-life use, because it incorporates real-world constraints, which means 🥈 second-best fallback options can still be made the winner.
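The three decisioning styles can be combined in one routing function, sketched below. Everything outside the scores themselves is an assumption: the cost weights are invented, subtracting cost from the score is just one simple way to weight it, and consent and fatigue are modeled as plain sets of channel names.

```python
THRESHOLD = 0.5  # minimum score for a channel to be eligible

# Hypothetical relative cost per contact; a higher cost lowers the adjusted score.
COST_WEIGHT = {"Email": 0.00, "Push": 0.02, "SMS": 0.10, "WhatsApp": 0.15}


def route(scores, consented, fatigued):
    """Pick a final channel from per-channel scores, or None to suppress.

    scores: channel -> score from that channel's binary model
    consented: channels the customer has opted in to
    fatigued: channels that have hit their communication cap
    """
    eligible = {
        channel: score - COST_WEIGHT.get(channel, 0.0)
        for channel, score in scores.items()
        if score >= THRESHOLD and channel in consented and channel not in fatigued
    }
    if not eligible:
        return None  # do not engage
    return max(eligible, key=eligible.get)


all_channels = {"Email", "Push", "SMS", "WhatsApp"}

# Customer 1 from the example predictions, with Email capped by fatigue rules.
customer_1 = route({"Email": 0.78, "Push": 0.65, "SMS": 0.22}, all_channels, fatigued={"Email"})
# Customer 3: every score is below the threshold.
customer_3 = route({"Push": 0.40, "Email": 0.38, "SMS": 0.35}, all_channels, fatigued=set())
```

With Email unavailable, Customer 1 falls back to Push, the second-best channel, while Customer 3 is suppressed because no channel clears the threshold.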
4. Implications for model management
- Operationally heavier, with one model per channel
- But performance, drift, and thresholds can be managed independently
- Highly flexible: new channels can be added as new models, with zero impact on existing models
5. Implications for segmentation and activation or downstream usage
- Customers can qualify for multiple channel segments
- Activation can layer business rules such as:
  - Cost constraints
  - Consent requirements
  - Confidence thresholds
  - Fatigue controls to suppress engagement if need be
- Suitable for controlling costs by suppressing outreach to low-engagement segments
Bonus: How to use these predictions in segmentation and beyond?
Irrespective of which approach was used to arrive at the “best channel” prediction (a multiclass model, or multiple binary models whose scores were reduced to a max score via transforms), marketers can use the result in Segmentation and Flows to reach customers:
Approach 1: Create a segment with predicted Best Channels and use it in Segment-triggered flows:
- Segment on the DMO output to pull the entire population with its predicted channel and score.

- Optionally, add a filter such as score > 0.5 so that only predictions with higher scores are brought into the segment
- Use this segment to create a Segment-triggered flow to route customers based on their predicted “Best Channel”. Here is an oversimplified Segment-triggered flow for understanding the possibilities.
Note: Like batch transforms, Flows are flexible enough to incorporate weights on cost and consent, converting a purely model-driven workflow into a hybrid workflow that is driven by predictions but uses additional rules to decide the final channel (e.g. a default outcome or overriding routes).
Approach 2: Create channel-specific segments and use them in Flows:
- You can also create 4 segments, one per channel. Below is an image of the Email-specific segment
- After this, you can create 4 separate flows, one per segment, to activate accordingly.
These are some options to take the scores from models to segmentation and beyond. The possibilities to customize are endless.
Cheatsheet on When to use Multiclass vs multiple binary models
Now let’s round up this blog with a simple cheatsheet on when to use each approach:

Conclusion
In this article, we explored the difference between multiclass and multiple binary models for handling multi-category problems.
To summarize, multiclass models always select one winning channel 🏅, whereas multiple binary models generate channel-level scores, allowing you to apply business rules before making a final decision 🛠️
The right choice depends not just on model accuracy, but on how you want to use the prediction, whether you need simplicity and guaranteed routing, or flexibility and policy-driven control.
Special thanks to:
- Bobby Brill for his review
- George Zhang and Pratyush Kumar for their ML inputs to fine tune this blog
- Kiranmai Bhamidi for her inputs on Segmentation and Flows
For more resources on Model Builder:
- Blogs:
- How to articles:
- Introducing Model Ops in Einstein Studio – Part 1: Ensure Post-Deployment Reliability with Model Monitoring
- Using Model Predictions in Data Cloud One (DC1) Orgs
- Build an AI Model with Clicks in Data Cloud
- Build Forecasting models using Einstein Studio’s Model Builder
- From Data to Decisions: Integrating Predictive AI for Key Business Workflows
- Use case series:
- Salesforce Help articles:
- YouTube: Model Builder: Ask Me Anything with Salesforce Developers
- Podcast: What are the key features of Salesforce’s Model Builder?

