SentimentAnalysis

flowtask.components.SentimentAnalysis

ModelPrediction

ModelPrediction(sentiment_model='tabularisai/robust-sentiment-analysis', emotions_model='bhadresh-savani/distilbert-base-uncased-emotion', classification='sentiment-analysis', levels=5, max_length=512, use_bertweet=False, use_bert=False, use_roberta=False)

Overview

Performs sentiment analysis and emotion detection on text using Hugging Face Transformers.

This class utilizes pre-trained models for sentiment analysis and emotion detection.
It supports different model architectures like BERT, BERTweet, and RoBERTa.
The class handles text chunking for inputs exceeding the maximum token length
and provides detailed sentiment and emotion scores along with predicted labels.
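
The chunking behaviour mentioned above can be pictured with a minimal sketch. It is not the class's actual implementation: `chunk_tokens` is a hypothetical helper, and whitespace-separated words stand in for real tokenizer output.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_length: int = 512) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most max_length tokens."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return [tokens[i:i + max_length] for i in range(0, len(tokens), max_length)]


# Assumption: whitespace "tokens" replace the model tokenizer for illustration.
tokens = "the service was slow but the food was excellent".split()
chunks = chunk_tokens(tokens, max_length=4)
```

Each chunk could then be scored separately and the per-chunk results combined, which is one plausible way to keep every input within the `max_length` limit.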

Attributes:

sentiment_model (str): Name of the sentiment analysis model to use from Hugging Face.
emotions_model (str): Name of the emotion detection model to use from Hugging Face.
classification (str): Type of classification pipeline to use (e.g., 'sentiment-analysis').
levels (int): Number of sentiment levels for sentiment analysis (2, 3, or 5).
max_length (int): Maximum token length for input texts. Defaults to 512.
use_bertweet (bool): If True, uses BERTweet model for sentiment analysis. Defaults to False.
use_bert (bool): If True, uses BERT model for sentiment analysis. Defaults to False.
use_roberta (bool): If True, uses RoBERTa model for sentiment analysis. Defaults to False.

Returns:

DataFrame: A DataFrame with sentiment and emotion analysis results. Includes columns for sentiment scores, sentiment labels, emotion scores, and emotion labels.

Raises:

ComponentError: If there is an issue during text processing or data handling.

Example
SentimentAnalysis:
  text_column: text
  sentiment_model: tabularisai/robust-sentiment-analysis
  sentiment_levels: 5
  emotions_model: bhadresh-savani/distilbert-base-uncased-emotion

On initialization, the class sets up the sentiment analysis and emotion detection models and tokenizers based on the provided configuration.

aggregate_sentiments

aggregate_sentiments(sentiments, levels)

Aggregates sentiment predictions from multiple texts to produce a single overall sentiment.

Calculates the average sentiment score across a list of sentiment predictions and determines the overall predicted sentiment based on these averages.

Parameters:

sentiments (list, required): A list of dictionaries, each containing sentiment prediction results.
levels (int, required): The number of sentiment levels used in the analysis, determining the sentiment map.

Returns:

str: The aggregated predicted sentiment label (e.g., 'Positive', 'Negative', 'Neutral').
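
The averaging step can be sketched as follows. This is only an illustration under stated assumptions: each prediction is taken to carry a 'score' list with one value per sentiment class, and the label maps in `SENTIMENT_MAPS` are hypothetical, so the library's actual sentiment map may use different names or ordering.

```python
from typing import Dict, List

# Hypothetical label maps, ordered from most negative to most positive.
SENTIMENT_MAPS: Dict[int, List[str]] = {
    2: ["Negative", "Positive"],
    3: ["Negative", "Neutral", "Positive"],
    5: ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"],
}


def aggregate_sketch(sentiments: List[dict], levels: int) -> str:
    """Average per-class scores across predictions and return the top label."""
    if not sentiments:
        raise ValueError("at least one prediction is required")
    labels = SENTIMENT_MAPS[levels]
    totals = [0.0] * levels
    for prediction in sentiments:
        for index, value in enumerate(prediction["score"]):
            totals[index] += value
    averages = [total / len(sentiments) for total in totals]
    # The class with the highest average score is the overall sentiment.
    best = max(range(levels), key=lambda i: averages[i])
    return labels[best]


predictions = [
    {"score": [0.1, 0.2, 0.7]},
    {"score": [0.2, 0.5, 0.3]},
]
overall = aggregate_sketch(predictions, levels=3)
```

Here the second text leans neutral on its own, but the averaged scores still favour the positive class, which shows why averaging can give a different answer than a majority vote over labels.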

predict_emotion

predict_emotion(text)

Predicts the emotion of the input text.

Handles text chunking for long texts to ensure they fit within the model's token limit. Returns a dictionary containing emotion predictions.

Parameters:

text (str, required): The input text to predict emotion for.

Returns:

dict: A dictionary containing emotion predictions, for example {'emotions': [{'label': 'joy', 'score': 0.99}]}. Returns an empty dictionary if the input text is empty.
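
A rough sketch of this contract, assuming a stub in place of the real model: `predict_emotion_sketch`, `fake_classifier`, and the word-based `max_words` limit are all hypothetical, and keeping the highest-scoring chunk is just one possible way to combine chunk results, not necessarily what the library does.

```python
from typing import Any, Callable, Dict

Prediction = Dict[str, Any]


def predict_emotion_sketch(
    text: str,
    classify: Callable[[str], Prediction],
    max_words: int = 512,
) -> dict:
    """Classify each chunk of a long text and keep the most confident result."""
    if not text or not text.strip():
        return {}  # empty input yields an empty dictionary
    words = text.split()
    chunks = [
        " ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)
    ]
    predictions = [classify(chunk) for chunk in chunks]
    best = max(predictions, key=lambda prediction: prediction["score"])
    return {"emotions": [best]}


def fake_classifier(chunk: str) -> Prediction:
    """Stub classifier used instead of a real emotion model."""
    if "great" in chunk:
        return {"label": "joy", "score": 0.9}
    return {"label": "sadness", "score": 0.6}


result = predict_emotion_sketch(
    "bad day then great news", fake_classifier, max_words=3
)
```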

predict_sentiment

predict_sentiment(text)

Predicts the sentiment of the input text.

Utilizes the sentiment analysis pipeline to classify the text and returns sentiment scores and the predicted sentiment label. Handles text chunking for texts exceeding the maximum token length.

Parameters:

text (str, required): The text to analyze for sentiment.

Returns:

dict: A dictionary containing sentiment analysis results. Includes 'score' (list of sentiment scores) and 'predicted_sentiment' (string label). Returns None if the input text is empty.
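
Because an empty input yields None rather than a dictionary, callers may want to guard before reading the label. The helper below is a small sketch of that guard; `sentiment_label` and the "Unknown" default are hypothetical and not part of the library.

```python
from typing import Optional


def sentiment_label(result: Optional[dict], default: str = "Unknown") -> str:
    """Return the predicted label, or a default when no prediction was made."""
    if not result:
        return default
    return result.get("predicted_sentiment", default)


# Shape taken from the description above; the values are made up.
example = {"score": [0.05, 0.15, 0.8], "predicted_sentiment": "Positive"}
label = sentiment_label(example)
missing = sentiment_label(None)
```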

split_into_sentences

split_into_sentences(text)

Splits a text into sentences using NLTK's sentence tokenizer.

Leverages nltk.tokenize.sent_tokenize for robust sentence splitting, handling various sentence terminators and abbreviations.

Parameters:

text (str, required): The input text to be split into sentences.

Returns:

list: A list of strings, where each string is a sentence from the input text.
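
To illustrate the input and output shape without requiring NLTK and its tokenizer data, here is a deliberately naive regex stand-in. It is not the method's implementation: `split_sentences_naive` is hypothetical and, unlike `nltk.tokenize.sent_tokenize`, it does not handle abbreviations such as "Dr." or "e.g.".

```python
import re
from typing import List


def split_sentences_naive(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string produced when the input is empty.
    return [part for part in parts if part]


sentences = split_sentences_naive(
    "The food was great. Service was slow! Would I return?"
)
```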

SentimentAnalysis

SentimentAnalysis(loop=None, job=None, stat=None, **kwargs)

Bases: FlowComponent

Applies sentiment analysis and emotion detection to a DataFrame of text data.

This component processes a DataFrame, applying Hugging Face Transformer models to analyze the sentiment and emotions expressed in a specified text column. It leverages the ModelPrediction class to perform the actual predictions and integrates these results back into the DataFrame.

Properties

text_column (str): The name of the DataFrame column containing the text to analyze. Defaults to 'text'.
sentiment_model (str): Model name for sentiment analysis. Defaults to 'tabularisai/robust-sentiment-analysis'.
emotions_model (str): Model name for emotion detection. Defaults to 'cardiffnlp/twitter-roberta-base-emotion'.
pipeline_classification (str): Classification type for the pipeline. Defaults to 'sentiment-analysis'.
with_average (bool): Whether sentiment should be averaged across rows (if applicable). Defaults to True.
sentiment_levels (int): Number of sentiment levels (2, 3, or 5). Defaults to 5.
use_bert (bool): Whether to use a BERT model for sentiment analysis. Defaults to False.
use_roberta (bool): Whether to use a RoBERTa model for sentiment analysis. Defaults to False.
use_bertweet (bool): Whether to use a BERTweet model for sentiment analysis. Defaults to False.

Returns:

DataFrame: The input DataFrame augmented with new columns for sentiment scores, predicted sentiment, emotion scores, and predicted emotion. Specifically, it adds the 'sentiment_scores', 'sentiment_score', 'emotions_score', 'predicted_emotion', and 'predicted_sentiment' columns.

Raises:

ComponentError: If input data is not a Pandas DataFrame or if the text column is not found.

run async

run()

Executes the sentiment analysis and emotion detection process on the input DataFrame.

Uses a single shared predictor instance to process data in larger batches. After processing, it concatenates the results and extracts relevant prediction scores and labels.
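
The shared-predictor pattern can be sketched in plain Python. Everything here is an assumption made for illustration: `StubPredictor` stands in for ModelPrediction, `batched` and `run_sketch` are hypothetical names, the batch size is arbitrary, and the sketch is synchronous although the real method is async.

```python
from typing import Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class StubPredictor:
    """Stand-in for the real predictor; counts how many times it is built."""

    instances = 0

    def __init__(self) -> None:
        StubPredictor.instances += 1

    def predict(self, text: str) -> dict:
        label = "Positive" if "good" in text else "Negative"
        return {"text": text, "predicted_sentiment": label}


def run_sketch(texts: List[str], batch_size: int = 2) -> List[dict]:
    predictor = StubPredictor()  # created once and shared by every batch
    results: List[dict] = []
    for batch in batched(texts, batch_size):
        # Concatenate per-batch results into one list, preserving row order.
        results.extend(predictor.predict(text) for text in batch)
    return results


rows = run_sketch(["good food", "bad service", "good value"])
```

Building the predictor once matters because loading transformer models is expensive; only the cheap per-batch prediction calls are repeated.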

Returns:

pd.DataFrame: The DataFrame with added sentiment and emotion analysis results.