What is a deep-learning word embedding?
2 Wooclap questions
This is a nice book for both young and old. It gives beautiful life lessons in a fun way. Definitely worth the money!
+ Educational
+ Fun
+ Price
Nice story for older children.
+ Funny
- Readability
Sentiment =
Feelings, Attitudes, Emotions, Opinions
A thought, view, or attitude, especially one based mainly on emotion instead of reason
Subjective impressions, not facts
Use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from unstructured text
Other terms
Can be applied in every topic & domain (non-exhaustive list):
Regular opinions: sentiment/opinion expressions on some target entities
Direct opinions: sentiment expressed directly on an entity or its aspects, e.g. "This drug works well."
Indirect opinions: sentiment expressed indirectly, through its effects, e.g. "After taking the drug, my pain has gone."
Comparative opinions: comparison of more than one entity, e.g. "Drug X works better than drug Y."
An opinion is a quintuple
(entity, aspect, sentiment, holder, time)
where
entity: target entity (or object).
aspect: aspect (or feature) of the entity.
sentiment: +, -, or neu, a rating, or an emotion.
holder: opinion holder.
time: time when the opinion was expressed.
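The quintuple can be sketched as a small data structure. The field values below are illustrative assumptions, filled in from the book review at the start of the lecture:

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    """An opinion as an (entity, aspect, sentiment, holder, time) quintuple."""
    entity: str     # target entity (or object)
    aspect: str     # aspect (or feature) of the entity
    sentiment: str  # "+", "-", "neu", a rating, or an emotion
    holder: str     # opinion holder
    time: str       # time when the opinion was expressed

# Hypothetical example: the "+ Price" judgment from the book review above
op = Opinion(entity="book", aspect="price", sentiment="+",
             holder="reviewer_1", time="2023-05-01")
print(op.sentiment)  # "+"
```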
Simplest task:
More complex:
Advanced:
Hard to do with bag of words
Must consider other features due to…
Detect sentiment in two independent dimensions:
Example: “He is brilliant but boring”
2,300 words, >70 classes
Affective Processes
Cognitive Processes
Pronouns, Negation (no, never), Quantifiers (few, many)
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool designed specifically for social media text. It contains a pre-built lexicon of words associated with sentiment scores ranging from -4 to +4.
Five generalizable heuristics based on grammatical and syntactical cues:
All WordNet synsets automatically annotated for degrees of positivity, negativity, and objectivity
[estimable(J,3)] “may be computed or estimated”
\[ \mathrm{Pos} = 0 \quad \mathrm{Neg} = 0 \quad \mathrm{Obj} = 1 \]
[estimable(J,1)] “deserving of respect or high regard”
\[ \mathrm{Pos} = 0.75 \quad \mathrm{Neg} = 0 \quad \mathrm{Obj} = 0.25 \]
Positive phrases co-occur more with “excellent”
Negative phrases co-occur more with “poor”
But how to measure co-occurrence? Turney used pointwise mutual information (PMI), estimated from search-engine hit counts:
\[ \mathrm{PMI}(w_1, w_2) = \log_2 \frac{P(w_1, w_2)}{P(w_1)\,P(w_2)} \]
\[ P(w) \approx \mathrm{hits}(w)/N \qquad P(w_1, w_2) \approx \mathrm{hits}(w_1\ \mathrm{NEAR}\ w_2)/N^2 \]
\[ \mathrm{Polarity}(\mathrm{phrase}) = \mathrm{PMI}(\mathrm{phrase}, \text{“excellent”}) - \mathrm{PMI}(\mathrm{phrase}, \text{“poor”}) \]
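Putting the estimates together, here is a self-contained sketch of the hit-count arithmetic. The counts are made-up illustrative numbers, not from any real search engine:

```python
import math

N = 1_000_000  # hypothetical total number of indexed documents

def pmi(hits_w1, hits_w2, hits_near, n=N):
    """PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ),
    with P(w) ~ hits(w)/n and P(w1, w2) ~ hits(w1 NEAR w2)/n^2."""
    p1 = hits_w1 / n
    p2 = hits_w2 / n
    p12 = hits_near / n ** 2
    return math.log2(p12 / (p1 * p2))

def polarity(hits_phrase, hits_exc, hits_poor, near_exc, near_poor):
    """Polarity(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    return (pmi(hits_phrase, hits_exc, near_exc)
            - pmi(hits_phrase, hits_poor, near_poor))

# A phrase that co-occurs 10x more often with "excellent" than with "poor"
print(polarity(hits_phrase=5000, hits_exc=20000, hits_poor=20000,
               near_exc=300, near_poor=30))  # positive
```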
Deal with HTML and XML markup
Twitter mark-up (names, hash tags)
Capitalization (preserve for words in all caps)
Phone numbers, dates
Emoticons
Useful code:
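For instance, a minimal regex tokenizer sketch that keeps Twitter mark-up and common emoticons intact. The pattern is illustrative and far from a complete Twitter tokenizer:

```python
import re

# Order of alternatives matters: emoticons and mark-up before plain words
TOKEN_RE = re.compile(r"""
    (?:[<>]?[:;=8][\-o\*']?[\)\]\(\[dDpP/\\|@3])  # emoticons like :-) ;P
  | (?:@\w+)                                      # @-mentions
  | (?:\#\w+)                                     # hashtags
  | (?:https?://\S+)                              # URLs
  | (?:[A-Za-z]+(?:'[A-Za-z]+)?)                  # words, contractions
  | (?:\d+)                                       # numbers
""", re.VERBOSE)

def tokenize(text):
    return TOKEN_RE.findall(text)

print(tokenize("@bob loved #nlp :-) see https://example.com"))
# ['@bob', 'loved', '#nlp', ':-)', 'see', 'https://example.com']
```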
The Porter stemmer identifies word suffixes and strips them off.
But:
objective (pos) and objection (neg) -> object
competence (pos) and compete (neg) -> compet
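These collisions can be reproduced with NLTK's Porter stemmer (assuming `nltk` is installed; no corpus downloads are needed for the stemmer itself):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Opposite-sentiment words collapse to the same stem
for word in ["objective", "objection", "competence", "compete"]:
    print(word, "->", stemmer.stem(word))
```

This is why sentiment pipelines often skip stemming, or stem only after sentiment-bearing words have been looked up in a lexicon.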
The problem has been studied by numerous researchers.
Key: feature engineering. A large set of features has been tried by researchers, e.g.:
Add NOT_ to every word between negation and following punctuation:
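A minimal sketch of this negation-marking trick on pre-tokenized text; the negation word list is a small illustrative subset:

```python
# Small illustrative set of negation cues
NEGATIONS = {"not", "no", "never", "didn't", "don't", "isn't", "can't"}
PUNCT = {".", ",", ";", "!", "?", ":"}

def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation word and
    the next punctuation mark."""
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCT:
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok.lower() in NEGATIONS:
                negating = True
    return out

print(mark_negation("didn't like this movie , but I".split()))
# ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']
```

The marked tokens (NOT_like, NOT_this, ...) then act as distinct features for a bag-of-words classifier, so negated and non-negated uses of a word are counted separately.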
Continuous word representations model the syntactic context of words but ignore the sentiment of text
"Good" vs. "bad": despite opposite sentiment, they will be represented as neighboring word vectors
Solution: Learn sentiment specific word embedding, which encodes sentiment information in the continuous representation of words
https://huggingface.co/blog/sentiment-analysis-python
Twitter-roberta-base-sentiment is a roBERTa model trained on ~58M tweets and fine-tuned for sentiment analysis (https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment)
SST-2 BERT: Fine-tuned on the Stanford Sentiment Treebank (SST-2) which consists of sentences from movie reviews. The model is well-suited for general sentiment analysis tasks. (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
Bert-base-multilingual-uncased-sentiment is a model fine-tuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian (https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment)
Distilbert-base-uncased-emotion is a model fine-tuned for detecting emotions in texts, including sadness, joy, love, anger, fear and surprise (https://huggingface.co/bhadresh-savani/distilbert-base-uncased-emotion)
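A minimal usage sketch with the Hugging Face `transformers` pipeline, using the SST-2 model listed above (assumes `transformers` is installed and the model weights can be downloaded):

```python
from transformers import pipeline

# Load the SST-2 fine-tuned DistilBERT model from the list above
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("This is a nice book for both young and old.")[0]
print(result["label"], result["score"])  # label is POSITIVE or NEGATIVE
```

Swapping the `model` argument for any of the other checkpoints above changes the label set (e.g. star ratings for the multilingual model, emotions for the emotion model) without changing the calling code.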
Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems (Kiritchenko & Mohammad, *SEM 2018)
Are systems that detect sentiment biased?
Hypothesis: a system should rate the intensity of the emotion equally for two sentences that differ only in the gender/race of the person mentioned
> 75% of systems mark one gender/race with higher intensity scores than the other
Bias is more widely prevalent for race than for gender
impact on downstream applications?
What about biases in LLMs?
From an Aurélien Géron Colab notebook