New Data Prep platform made easier with native machine learning


If you have followed along with this blog series, you will hopefully have noticed that what is coming with the new data platform looks great. The UI helps users quickly load, transform, and join data, and do whatever else they need to shape it into a dataset; for more details, check out Tim Bezold’s blog, which takes a closer look at the features. But what makes the new data platform truly unique and powerful goes beyond the UI: it is built natively on the Salesforce platform and provides out-of-the-box machine learning capabilities that don’t demand technical or statistical skills to leverage. In this blog, we will go down the rabbit hole and try to uncover why this new data platform is so great.

Data crunching with machine learning

One of the best things about the Einstein Analytics data platform is that it is part of Salesforce: you can extract, transform, and load your data without having to leave Salesforce, with no need to log in elsewhere or worry about data being transferred outside of Salesforce. The data platform, like Einstein Analytics itself, is embedded in the Salesforce data center but has its own dedicated part that is optimized for data processing at scale. This is not really new; it has been the case from the beginning. However, in creating the new data platform the product team has not only built a new UI. They have also been working under the hood and put the data processing on steroids, so it performs better and can do more.

As customers see the flexibility of Einstein Analytics over standard reports and dashboards, the use cases get bigger and grander. Together with the increased data storage in Einstein Analytics, the result is that more and more data has to be processed, with more data crunching and data wrangling. In other words, demands are high, and the new data platform supports them. Large data volumes can be processed and crunched much faster than with traditional approaches, which means you can create datasets with billions of rows of data. Pretty amazing.

Instead of starting from scratch, the new data platform uses an established, highly tested data processing framework, Apache Spark, which ensures a robust foundation to build on. In fact, a library of components already exists that gives a jump start in creating new transformations in the tool, including machine learning (ML) capabilities, to the benefit of all Einstein Analytics users. And ML transformations are key in the new data platform.

With ML you can take your data to a higher level, as it allows you to look at your existing data, extract patterns, interpret them, and enrich your dataset. An example in the Summer 20 release is sentiment analysis: take unstructured data, like comment fields in your survey data, and extract user sentiment by applying the sentiment analysis transformation. By following very simple steps, and in a very short time, you can get even more insight into all the data you collect.

Without this out-of-the-box transformation, the process becomes far more complex. Before you could create your dataset, you would have to:

  1. Export your data from Salesforce
  2. Import your data in your ML tool
  3. Build your model
  4. Train your model
  5. Export results
  6. Import results in Salesforce/Einstein Analytics

On top of that, most of us are not data scientists, nor do we have a team of data scientists at our command to give us more insight into our data. So leveraging ML becomes nearly impossible, or at the very least time-consuming and manual. These ML transformations in Einstein Analytics are truly a blessing for any Salesforce admin with limited or no knowledge of building models, as the hard and time-consuming work has already been done for you. There is no need to worry about how they work; just apply them to your data.

Machine learning in the data platform

As mentioned above, with the new data platform, using ML to cleanse or enrich your data as part of data prep is an actual possibility at your fingertips. But what can you do? There is a lot on the roadmap, but in the Summer 20 release you can leverage sentiment analysis or missing value prediction. Let’s take a closer look at how these work.

Detect sentiment – if you have long free-form text fields in your data, like a case comment or a survey response, and you want to understand the overall input without spending days reading through millions of rows of data, detect sentiment is perfect. After picking the free-form dimension, use the transformation to extract sentiments in the form of “Positive”, “Neutral” and “Negative”. This Natural Language Processing transformation works by breaking down the long text field into word vectors and applying a feedforward neural network with a pre-trained model to classify each text into the sentiment with the highest likelihood of being accurate.
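
To make the idea of word vectors feeding a feedforward network a little more concrete, here is a minimal Python sketch. It is not the model Einstein Analytics ships: the two-dimensional word vectors, the weights, and the function name `detect_sentiment` are all made up for illustration, whereas a real pre-trained model learns far larger vectors and weights from training data.

```python
import math
import re

LABELS = ["Positive", "Neutral", "Negative"]

# Hypothetical 2-dimensional word vectors: [positive signal, negative signal].
# A real pre-trained model uses learned vectors with many more dimensions.
WORD_VECTORS = {
    "love": [1.0, 0.0],
    "great": [1.0, 0.0],
    "slow": [0.0, 1.0],
    "broken": [0.0, 1.0],
}

# Hand-picked weights standing in for a pre-trained network (assumption).
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [[4.0, -4.0], [0.0, 0.0], [-4.0, 4.0]]
B_OUT = [0.0, 0.5, 0.0]


def dense(weights, bias, x):
    """One fully connected layer: weights * x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def detect_sentiment(text):
    # Break the text into words and look up a vector for each one.
    tokens = re.findall(r"[a-z']+", text.lower())
    vectors = [WORD_VECTORS.get(t, [0.0, 0.0]) for t in tokens]
    # Average the word vectors into one fixed-size input.
    n = max(len(vectors), 1)
    avg = [sum(v[i] for v in vectors) / n for i in range(2)]
    # Feed forward: hidden layer with ReLU, then output layer with softmax.
    hidden = [max(0.0, h) for h in dense(W_HIDDEN, B_HIDDEN, avg)]
    probs = softmax(dense(W_OUT, B_OUT, hidden))
    # Pick the label with the highest likelihood.
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


for comment in ["I love the great support",
                "The app is slow and broken",
                "The order arrived on Tuesday"]:
    print(comment, "->", detect_sentiment(comment)[0])
```

The point of the sketch is the shape of the pipeline (text to vectors, vectors through a network, highest-probability label out), not the numbers, which are placeholders.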

Predict missing values – if your dataset has a column with a lot of missing values, you can leverage the predict missing values transformation. When selecting this option, you can pick up to 3 dimensions that you think are likely predictors. For instance, if you want to predict the state, it might well be beneficial to look at the city and county columns. The underlying model will take city and county, as well as the known state values, as input and use them to determine which state should be applied in a new state column that will be generated. This feature works because we spin up a model, train it on your data in the data pipeline, and apply it right back to the same data, all as part of the ETL process.
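
The train-then-apply idea behind predict missing values can be sketched in a few lines of Python. This post does not say which model the platform uses, so the sketch below swaps in the simplest possible stand-in: a majority vote over rows that share the same predictor values. The function name `predict_missing` and the `_predicted` column suffix are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def predict_missing(rows, target, predictors):
    """Fill missing `target` values from rows with the same predictor values.

    A simple majority-vote stand-in, not the platform's actual model.
    """
    if len(predictors) > 3:
        raise ValueError("pick at most 3 predictor dimensions")

    # "Train": for each predictor combination, count the known target values.
    counts = defaultdict(Counter)
    for row in rows:
        if row.get(target):
            key = tuple(row.get(p) for p in predictors)
            counts[key][row[target]] += 1

    # "Apply": write predictions to a new column, leaving the original intact.
    result = []
    for row in rows:
        key = tuple(row.get(p) for p in predictors)
        if row.get(target):
            predicted = row[target]  # keep known values as they are
        elif key in counts:
            predicted = counts[key].most_common(1)[0][0]
        else:
            predicted = None  # nothing to learn from for this combination
        result.append({**row, target + "_predicted": predicted})
    return result


rows = [
    {"city": "Austin", "county": "Travis", "state": "TX"},
    {"city": "Austin", "county": "Travis", "state": "TX"},
    {"city": "Portland", "county": "Multnomah", "state": "OR"},
    {"city": "Austin", "county": "Travis", "state": None},
    {"city": "Portland", "county": "Multnomah", "state": ""},
    {"city": "Boise", "county": "Ada", "state": None},
]

filled = predict_missing(rows, "state", ["city", "county"])
for row in filled:
    print(row["city"], row["county"], row["state_predicted"])
```

Note that the same rows are used both to learn the mapping and to receive the predictions, which mirrors the "train on your data, apply right back to the same data" flow described above.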

The machine learning journey continues

Though the concept of ML itself is not new, surfacing ML models directly in the Data Prep platform, where any user can easily use them to enrich or cleanse data, is truly a game-changer in making data more usable for reporting and analytics. To top it all off, with the planned output connector any Salesforce admin can leverage Einstein Analytics Data Prep and push this data not only to datasets but also to other data storage, as Tim Bezold mentioned in his feature deep-dive blog.

The ML transformations mentioned above are only the start. Of course, the product team has more ML transformations up their sleeves that they are working on and aim to bring to the Einstein Analytics data platform (forward-looking statement).

One ML transformation that the product team is keen to bring your way is clustering: looking at a column and grouping the data into one or more groups based on similarities identified by the model. The model will thus detect patterns you may not be aware of and surface them, so you can leverage them in your analysis.
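
As a rough illustration of what clustering a single column means, here is a small Python sketch using k-means on one numeric column. The planned transformation is not described in detail in this post, so k-means is only a common stand-in, and the function name `kmeans_1d` and the sample deal sizes are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters by repeatedly moving each center
    to the mean of the values closest to it."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to its group's mean; keep it if the group is empty.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers

    labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
    return centers, labels


# Hypothetical column of deal sizes with three natural groups.
deal_sizes = [5, 7, 6, 50, 52, 49, 200, 210]
centers, labels = kmeans_1d(deal_sizes, k=3)
print(labels)
```

Each value ends up with a group label that could be stored as a new column, which is the kind of enrichment the clustering transformation is meant to provide.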

Another example would be to deepen our work in Natural Language Processing, such as expanding the supported language set for Sentiment Analysis and adding aspect mining (identifying key phrases in a long text, and the relevant sentiment for each key phrase).

Of course, there is more in the works, including continuous improvements to the existing ML functionality to bring more control to end users and higher accuracy to the models used. So remember to keep an eye out for future release notes to see how you can make your data smarter without any complicated processes, and comment below to let us know what you’d want to see in our roadmap!

Forward-looking statement

This content contains forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make.

Any unreleased services or features referenced in this document or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.

2 thoughts on “New Data Prep platform made easier with native machine learning”

  • 1
    Anurag Joshi on May 18, 2020

    Hi Rikki

    Can we do sentiment analysis of emojis and smileys?

  • 2
    Mike Lowe on May 19, 2020

    It uses the standard community sentiment model from the Einstein Platform API.

    So it should have emojis in its training data, as these are common in the source data types it pulls from.
