Smart ETL with Data Prep: Detect Sentiment Got Better in Spring ‘22

In Spring ‘22, Detect Sentiment in Data Prep Recipes just got much more powerful with the introduction of Numerical Sentiments and Sentiment Scores!

What is Sentiment Analysis in Data Prep?

Data Prep Recipes run on our next-generation data platform, which includes machine-learning powered Smart Transformations out of the box: Detect Sentiment, Predict Missing Value, Clustering, and Time Series Forecasting. “Detect Sentiment” uses Natural Language Processing (NLP) and Machine Learning to extract sentiments from your long text fields using our pre-trained sentiment model. To use it, you simply apply the “Detect Sentiment” transform to the desired fields in your recipe definition; after the recipe job runs, the extracted sentiment columns are added to your target dataset.

Detect Sentiment in Spring ‘22

To access Detect Sentiment, you simply need to:

  • Add a Transform node to your data in your Data Prep Recipe
  • Select the text field you want to apply Detect Sentiment to
  • Click on the “Detect Sentiment” icon in the toolbar

The Detect Sentiment panel now includes a “More Options” section, and that’s where you can override the default sentiment format from Dimension to Measure. You can also choose to generate the sentiment scores (more on this later).

As always, the sentiment values shown in the Data Prep Builder are for preview purposes only; to get the actual sentiment values, you need to run the recipe.

About Sentiment Scores

In Spring ‘22, we updated our sentiment model behind the scenes to classify text into categories of 1, 2, 3, 4, or 5, along with the probability (likelihood) of each category.

Previously, when we classified the text into Positive, Neutral, and Negative, the sentiment model also provided the relevant probabilities for each category, and we returned the category with the highest probability.
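As a minimal sketch of that earlier behavior, the snippet below picks the category with the highest probability. The probabilities and the function name `top_category` are made up for illustration; they are not actual model output or part of Data Prep.

```python
# Hypothetical probabilities for one piece of text (not real model output).
probabilities = {"Positive": 0.72, "Neutral": 0.20, "Negative": 0.08}


def top_category(probs):
    """Return the category with the highest probability."""
    return max(probs, key=probs.get)


label = top_category(probabilities)
print(label)  # Positive
```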

In the new sentiment model, the text is classified into categories from 1 to 5, and the numerical sentiment value is calculated as a weighted sum of the categories and their corresponding probabilities (these probabilities are called “Sentiment Scores”).

For example:
1 * 0.00 + 2 * 0.10 + 3 * 0.18 + 4 * 0.29 + 5 * 0.43 = 4.05

You will observe that the sentiment scores (or the probabilities for each classification) add up to 1.00 (or 100%), and that is absolutely intentional and something you can rely on.
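The arithmetic above can be reproduced with a short sketch. It uses the probabilities from the example; the function name `numerical_sentiment` and the dictionary layout are assumptions for illustration, not the way Data Prep stores or computes these values internally.

```python
import math

# Sentiment scores (probabilities) for categories 1..5, from the example above.
sentiment_scores = {1: 0.00, 2: 0.10, 3: 0.18, 4: 0.29, 5: 0.43}


def numerical_sentiment(scores):
    """Weighted sum of each category and its probability."""
    # The scores are expected to add up to 1.00 (allowing for rounding).
    if not math.isclose(sum(scores.values()), 1.0, abs_tol=1e-6):
        raise ValueError("sentiment scores should add up to 1.00")
    return sum(category * probability for category, probability in scores.items())


value = numerical_sentiment(sentiment_scores)
print(round(value, 2))  # 4.05
```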

If you prefer the Dimensional sentiment value of Positive/Neutral/Negative, this is how those classes are mapped to the numbers:

  • Negative is mapped to the range between 1.00 and less than 2.50
  • Neutral is mapped to the range between 2.50 and less than 3.50
  • Positive is mapped to the range between 3.50 and 5.00
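If you want to apply this mapping yourself, for example outside the recipe, it could look roughly like the sketch below. The function name `sentiment_label` is hypothetical, and the sketch assumes a value of exactly 3.50 counts as Positive.

```python
def sentiment_label(value):
    """Map a numerical sentiment value (1.00 to 5.00) to a category name."""
    if not 1.0 <= value <= 5.0:
        raise ValueError("numerical sentiment must be between 1.00 and 5.00")
    if value < 2.5:
        return "Negative"
    if value < 3.5:
        return "Neutral"
    # Assumption: the boundary value 3.50 is treated as Positive.
    return "Positive"


for v in (1.8, 3.2, 4.05):
    print(v, sentiment_label(v))
```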

And because the sentiment scores for each category are available to you, if you want a different way to calculate sentiments (instead of our weighted sum approach), such as applying different weights to the different classes, you absolutely can!
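Here is one way such a custom calculation might look. The weights below are invented for illustration: they center the scale on zero so that negative categories pull the result down. The function name `custom_sentiment` is also hypothetical.

```python
# Sentiment scores for categories 1..5, reusing the earlier example values.
sentiment_scores = {1: 0.00, 2: 0.10, 3: 0.18, 4: 0.29, 5: 0.43}

# Hypothetical custom weights: one weight per category.
custom_weights = {1: -2.0, 2: -1.0, 3: 0.0, 4: 1.0, 5: 2.0}


def custom_sentiment(scores, weights):
    """Weighted sum of the sentiment scores using caller-supplied weights."""
    return sum(weights[category] * probability for category, probability in scores.items())


result = custom_sentiment(sentiment_scores, custom_weights)
print(round(result, 2))  # 1.05
```

Passing the weights 1 through 5 reproduces the default weighted sum.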

In addition, with numerical sentiment values, you can cast the numbers into different ranges (e.g. between -1 and 1), you can compare relative sentiment averages (e.g. although 3.2 is neutral, it is also more positive than a 2.6), and really, your imagination is the limit here.
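As a rough sketch of both ideas, the snippet below linearly rescales a 1-to-5 value into the range -1 to 1 and compares the averages of two groups. The function name `rescale` and the sample values are assumptions for illustration only.

```python
from statistics import mean


def rescale(value):
    """Linearly map a sentiment value from the range 1..5 to -1..1."""
    return (value - 3.0) / 2.0


print(rescale(4.05))  # roughly 0.525

# Hypothetical numerical sentiments for two groups of records.
group_a = [3.2, 3.4, 3.0]
group_b = [2.6, 2.8, 2.7]

# Both averages fall in the Neutral range, but group A is relatively more positive.
print(mean(group_a) > mean(group_b))  # True
```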

Get Cracking!

I’d love to hear from you about the data you want to run sentiment analysis on, the business challenge you’re trying to solve, and what we should do next to make it work for you! Give the enhanced Detect Sentiment a shot and share your use cases and feedback with me directly on the Trailblazer Community, on Idea Exchange, or join us on Slack at DataTribe! I want to put this power at your fingertips, and I need your help to make it better.
