If you’re late to the party, Data Prep recipes have arrived as the next-generation data platform. With recipes you can quickly blend, enrich, and export your data with an easy-to-use visual editor and real-time data preview so you can see the impact of your transformations on the data as you build.
Another key benefit of Data Prep recipes? All of the machine learning-powered Smart Transformations you get out of the box: Detect Sentiment, Predict Missing Value, Clustering, and Time Series Forecasting.
In this article, we will dive into the Detect Sentiment transformation, which evaluates and categorizes your text fields based on the words they contain. We’ll talk about what Detect Sentiment is, how it works, how we built it, how you can best use it, and what we’re doing next.
Detect Sentiment Transformation at a Glance
You apply Detect Sentiment in Data Prep recipes on a dimension field as a Transform step.
When the recipe runs, the Detect Sentiment transform step generates a new column that contains the assessed sentiment as one of three values: “Negative”, “Neutral”, or “Positive.” Since the generated sentiment is a column, you can visualize it just like you would any other column.
Running Detect Sentiment in a Data Prep recipe has multiple advantages:
- You do not need to fund a data science team, curate labeled data, train your own model, or mitigate any ethical bias that might be inherent in your training data
- You do not need to fund a dev team to integrate your data with an API-based sentiment service
- You can run Detect Sentiment on different data by simply applying it to the column of your choice
- You can further enrich the sentiment data using the rich capabilities in Data Prep Recipes within the same job
- Your data never leaves Salesforce, and Detect Sentiment complies with Data Residency requirements
- Detect Sentiment supports texts of up to 32,000 characters and datasets of up to 2 billion rows, and you can run it as frequently as you need; there is no system limit on the number of texts you can process
Wait, How Does Sentiment Analysis Actually Work?
Sentiment analysis is generally implemented under two approaches: a machine learning-based approach, and a lexicon-based approach.
The machine learning (ML)-based approach works by training a sentiment model using sentiment-labeled training data, then applying the sentiment model on the input text (often referred to as a “document”) to classify the sentiment of the text.
An example of sentiment-labeled training data may be product reviews from online retailers, which usually consist of a numerical rating and a text description. With enough data, we can define 5-star ratings as “Positive”, 4-star ratings as “Neutral”, and 1-3 star ratings as “Negative”, then train the sentiment model based on those definitions.
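As a minimal sketch of that labeling step, the rating thresholds above can be written as a small function. The function name, the sample reviews, and the `training_data` structure are hypothetical and only illustrate how rated reviews could become labeled examples; they are not the data or code behind Detect Sentiment.

```python
def label_from_stars(stars: int) -> str:
    """Map a 1-5 star rating to a sentiment label using the thresholds above."""
    if not 1 <= stars <= 5:
        raise ValueError(f"expected a rating from 1 to 5, got {stars}")
    if stars == 5:
        return "Positive"
    if stars == 4:
        return "Neutral"
    return "Negative"  # 1-3 stars


# Hypothetical reviews: (star rating, review text)
reviews = [
    (5, "Absolutely loved it"),
    (4, "Does the job"),
    (2, "Stopped working after a week"),
]

# Each training example pairs the review text with the label derived from its rating.
training_data = [(text, label_from_stars(stars)) for stars, text in reviews]
```

A model would then be trained on pairs like these, with the text as input and the derived label as the target.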
Since the ML-based approach uses training datasets consisting of (a) real-world language usage such as grammar and verbiage, and (b) clear sentiment labeling for each text, this approach generally provides higher accuracy.
The lexicon-based approach works by comparing an input text with a pre-defined list of words (the “lexicon”) with associated sentiment scores (positive, zero, and negative numbers) to aggregate total sentiment scores for the document.
For example, suppose our lexicon defines “Excellent” with a score of 75, “Fantastic” with a score of 80, “Terrible” with a score of -50, and “Worse” with a score of -80. Summing the scores of the matched words, our analysis of phrases would be:
“The food was excellent and the atmosphere was fantastic” = 75 + 80 = 155, a positive sentiment score
“The food was fantastic but service was terrible” = 80 - 50 = 30, a less positive sentiment score
“The food was terrible and the experience was even worse” = -50 - 80 = -130, a negative sentiment score
Negation (phrases like “Not Terrible”) is typically handled by detecting the negation and switching the term’s polarity from positive to negative or vice versa. Returning to our example, “Not terrible” would flip the -50 for “Terrible” and be scored 50.
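The lexicon-based approach is simple enough to sketch in a few lines. The example below uses the four scores from the example lexicon above and assumes that scores are summed, that words outside the lexicon score 0, and that a negation word flips only the term immediately after it. The `NEGATIONS` set and the function name are illustrative choices, not part of any particular library.

```python
import re

# Scores from the example lexicon above; words not listed score 0.
LEXICON = {"excellent": 75, "fantastic": 80, "terrible": -50, "worse": -80}

# Assumed negation cues; a real lexicon-based system would use a fuller list.
NEGATIONS = {"not", "never", "no"}


def score_text(text: str) -> int:
    """Sum lexicon scores, flipping the sign of a term that directly follows a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, token in enumerate(tokens):
        score = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            score = -score  # switch polarity
        total += score
    return total


print(score_text("The food was excellent and the atmosphere was fantastic"))  # 155
print(score_text("The food was fantastic but service was terrible"))          # 30
print(score_text("The food was terrible and the experience was even worse"))  # -130
print(score_text("Not terrible"))                                             # 50
```

Looking only one word back is the simplest possible negation rule; it would miss phrases such as “not at all terrible”, which is one reason lexicon-based systems tend to need hand-tuned rules.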
Great, but Which Approach Does Detect Sentiment Use?
Well, the Detect Sentiment Data Prep recipe transformation uses the ML-based approach: we curated labeled sentiment data and used it to train our sentiment model, a feed-forward neural network known as a multilayer perceptron. To mitigate the social bias inherent in large training datasets, we also added synthetic labeled data to neutralize bias for identity terms. You can find out more in our model card under Ethical Considerations.
The Detect Sentiment transformation can be applied to datasets with up to 2 billion rows, and additional transformations such as data aggregations can be completed as part of the same recipe. It supports English, complies with Data Residency requirements, and handles texts of up to 32,000 characters, with no volume limitation. And it is all included as part of your Analytics license, so no additional licenses are needed!
Apply it to customer interviews! Apply it to case descriptions, case comments, and customer feedback! You can apply it to any unstructured text data, and then perform custom calculations and aggregations to identify hot spots or trends, such as sentiment around a specific product or service, or changes in sentiment over time. You can create a dataset and visualize it in dashboards. You can write the generated sentiment back to Salesforce using our Salesforce Output Connector as an output node in the same recipe! The sky is the limit and simplicity is the theme.
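To make the idea of a “hot spot” concrete, here is a rough sketch of the kind of aggregation you might build on top of the generated sentiment column. In a recipe you would do this with the recipe's own aggregation steps; the plain Python below only illustrates the calculation, and the rows, product names, and function name are all made up.

```python
from collections import Counter, defaultdict

# Hypothetical rows as they might look after the transform step:
# (product, month, sentiment)
rows = [
    ("Router", "2023-01", "Negative"),
    ("Router", "2023-01", "Negative"),
    ("Router", "2023-02", "Positive"),
    ("Modem", "2023-01", "Positive"),
    ("Modem", "2023-02", "Neutral"),
    ("Modem", "2023-02", "Positive"),
]

# Count sentiment labels per product.
counts = defaultdict(Counter)
for product, _month, sentiment in rows:
    counts[product][sentiment] += 1


def negative_share(product: str) -> float:
    """Fraction of a product's rows that were labeled Negative."""
    product_counts = counts[product]
    return product_counts["Negative"] / sum(product_counts.values())


# The product with the highest share of negative rows is the "hot spot".
hot_spot = max(counts, key=negative_share)
print(hot_spot, round(negative_share(hot_spot), 2))  # Router 0.67
```

Grouping by month instead of product would give the same share as a trend over time.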
Roadmap for Detect Sentiment
We hear loud and clear that you need support for more languages. Given the multi-language nature of Salesforce data, it is clear that you need a sentiment analysis algorithm/model that can handle multiple languages within the same dataset, so this is top of mind for our team.
In addition, explainability for sentiment detection has always been an important but challenging aspect. Because we use a neural network to detect sentiment, the model is essentially a black box. We are exploring ways to deliver better explainability by providing options to generate numerical values and confidence levels for sentiments so that you can perform custom calculations based on your business needs.