This is a nice book for both young and old. It gives beautiful life lessons in a fun way. Definitely worth the money!
+ Educational
+ Funny
+ Price
Nice story for older children.
+ Funny
- Readability
Sentiment
Sentiment =
Feelings, Attitudes, Emotions, Opinions
A thought, view, or attitude, especially one based mainly on emotion instead of reason
Subjective impressions, not facts
Webster’s dictionary
Scherer typology of affective states
- Emotion: brief organically synchronized … evaluation of a major event
- angry, sad, joyful, fearful, ashamed, proud, elated
- Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
- cheerful, gloomy, irritable, listless, depressed, buoyant
- Interpersonal stances: affective stance toward another person in a specific interaction
- friendly, flirtatious, distant, cold, warm, supportive, contemptuous
- Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
- liking, loving, hating, valuing, desiring
- Personality traits: stable personality dispositions and typical behavior tendencies
- nervous, anxious, reckless, morose, hostile, jealous
Sentiment analysis
Use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from unstructured text
Other terms
- Opinion mining
- Sentiment mining
- Subjectivity analysis
Sentiment analysis
can be applied in every topic & domain
Book: is this review positive or negative?
Humanities: sentiment analysis for German historic plays.
Products: what do people think about the new iPhone?
Blog: how are people thinking about immigrants?
Politics: who is going to win the election?
Twitter: what is the trend today?
Movie: is this review positive or negative (IMDB, Netflix)?
Marketing: how is consumer confidence? Consumer attitudes?
Healthcare: are patients happy with the hospital environment?
Opinion types
Regular opinions: Sentiment/opinion expressions on some target entities
Direct opinions:
- “The touch screen is really cool.”
Indirect opinions:
- “After taking the drug, my pain has gone.”
Comparative opinions: Comparison of more than one entity.
- E.g., “iPhone is better than Blackberry.”
Practical definition
An opinion is a quintuple
( entity, aspect, sentiment, holder, time)
where
entity: target entity (or object).
aspect: aspect (or feature) of the entity.
sentiment: +, -, or neu, a rating, or an emotion.
holder: opinion holder.
time: time when the opinion was expressed.
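As a rough illustration, the quintuple can be stored as one row per opinion in a data frame. The sketch below uses base R only; the holders, dates, and sentiment values are made up for the example, and the column names simply mirror the definition above.

```r
# Each row is one opinion quintuple (entity, aspect, sentiment, holder, time).
# All values below are invented for illustration.
opinions <- data.frame(
  entity    = c("iPhone", "iPhone", "iPhone"),
  aspect    = c("touch screen", "voice quality", "size"),
  sentiment = c("+", "+", "-"),
  holder    = c("reviewer_1", "reviewer_1", "reviewer_2"),
  time      = as.Date(c("2024-01-05", "2024-01-05", "2024-01-09")),
  stringsAsFactors = FALSE
)

# Count positive and negative opinions per aspect.
summary_table <- table(opinions$aspect, opinions$sentiment)
print(summary_table)
```

Storing opinions this way makes it straightforward to aggregate them per aspect, which is the basis of an aspect-level opinion summary.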
Sentiment analysis
Simplest task:
- Is the attitude of this text positive or negative?
More complex:
- Rank the attitude of this text from 1 to 5
Advanced:
- Detect the target, source, or complex opinion types
- Implicit opinions or aspects
Simple task: Opinion summary
Aspect: Touch screen
Positive: 212
The touch screen was really cool.
The touch screen was so easy to use and can do amazing things.
…
Negative: 6
The screen is easily scratched.
I have a lot of difficulty in removing finger marks from the touch screen.
…
…
Aspect: Size
…
Problem
Which features to use?
- Words (unigrams)
- Phrases/n-grams
- Sentences
How to interpret features for sentiment detection?
- Bag-of-words
- Annotated lexicons (WordNet, SentiWordNet)
- Syntactic patterns
- Paragraph structure
- Word embedding
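To make the first feature types concrete, the following base R sketch extracts unigrams and bigrams from one sentence and builds a bag-of-words count. The helper names `tokenize` and `ngrams` are hypothetical, and the tokenizer is deliberately simplistic (lowercase, split on anything that is not a letter or apostrophe).

```r
# Lowercase a text and split it into word tokens (a deliberately simple tokenizer).
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens[tokens != ""]
}

# Build n-grams by pasting together n consecutive tokens.
ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

tokens   <- tokenize("The touch screen was really cool.")
unigrams <- tokens
bigrams  <- ngrams(tokens, 2)

# Bag-of-words: word order is discarded, only counts are kept.
bag_of_words <- table(unigrams)
print(bigrams)
```

Bigrams such as "really cool" keep some local context that single words lose, at the cost of a much larger feature space.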
Challenges
Harder than topical classification, for which bag-of-words features perform well
Must consider other features due to…
- Subtlety of sentiment expression
- irony
- expression of sentiment using neutral words
- Domain/context dependence
- words/phrases can mean different things in different contexts and domains
- Effect of syntax on semantics
Approaches for sentiment analysis
- Lexicon-based (dictionary-based) methods
Using sentiment words and phrases: good, wonderful, awesome, troublesome, cost an arm and a leg
Not completely unsupervised!
- Supervised learning methods: to classify reviews into positive and negative.
- Naïve Bayes
- Maximum Entropy
- Support Vector Machine
- Deep learning
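The lexicon-based approach listed above can be sketched in a few lines of base R. The lexicon here is a tiny hand-made stand-in (real lexicons hold thousands of entries), and the functions `lexicon_score` and `classify` are hypothetical names. Scoring by summing word polarities is only one simple option.

```r
# A tiny hand-made lexicon, for illustration only.
lexicon <- c(good = 1, wonderful = 1, awesome = 1, nice = 1,
             troublesome = -1, bad = -1, poor = -1)

# Score a text as the sum of the polarities of the lexicon words it contains.
lexicon_score <- function(text, lexicon) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  hits <- lexicon[tokens[tokens %in% names(lexicon)]]
  sum(hits)
}

# Map the numeric score to a class label.
classify <- function(score) {
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

review <- "Nice story, wonderful characters, but a troublesome ending."
score  <- lexicon_score(review, lexicon)
label  <- classify(score)
print(c(score = score, label = label))
```

No labelled training data is needed, but someone still had to build the lexicon, which is why such methods are not completely unsupervised.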
Lexicon-based Methods
LIWC (Linguistic Inquiry and Word Count)
Home page: http://liwc.wpengine.com/
2300 words, >70 classes
Affective Processes
- negative emotion (bad, weird, hate, problem, tough)
- positive emotion (love, nice, sweet)
Cognitive Processes
- Tentative (maybe, perhaps, guess), Inhibition (block, constraint)
Pronouns, Negation (no, never), Quantifiers (few, many)
Bing Liu opinion lexicon
6786 words
- 2006 positive
- 4783 negative
SentiWordNet
All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
[estimable(J,3)] “may be computed or estimated”
\[\mathrm{Pos}\ 0 \quad \mathrm{Neg}\ 0 \quad \mathrm{Obj}\ 1\]
[estimable(J,1)] “deserving of respect or high regard”
\[\mathrm{Pos}\ 0.75 \quad \mathrm{Neg}\ 0 \quad \mathrm{Obj}\ 0.25\]
Turney algorithm
- Extract a phrasal lexicon from reviews
- Learn polarity of each phrase
- Rate a review by the average polarity of its phrases
Extract two-word phrases with adjectives
How to measure polarity of a phrase?
Positive phrases co-occur more with “excellent”
Negative phrases co-occur more with “poor”
But how to measure co-occurrence?
Pointwise Mutual Information
- PMI between two words:
- How much more do two words co-occur than if they were independent?
\[\mathrm{PMI}(word_1, word_2) = \log_2 \frac{P(word_1, word_2)}{P(word_1)\,P(word_2)}\]
How to estimate PMI
- P(word) estimated by hits(word)/N
- P(word1, word2) estimated by hits(word1 NEAR word2)/N^2
Substituting these estimates, the factors of N cancel:
\[\mathrm{PMI}(word_1, word_2) = \log_2 \frac{hits(word_1 \: \mathrm{NEAR} \: word_2)}{hits(word_1)\,hits(word_2)}\]
Does phrase appear more with “poor” or “excellent”?
\[\mathrm{Polarity}(phrase) = \mathrm{PMI}(phrase, \text{``excellent''}) - \mathrm{PMI}(phrase, \text{``poor''})\]
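The polarity computation can be tried out with a small base R sketch. The hit counts below are hypothetical and chosen only to make the arithmetic easy to follow; the function names `pmi` and `polarity` are illustrative.

```r
# PMI estimated from hit counts, following the hits-based formula (N cancels).
pmi <- function(hits_near, hits_w1, hits_w2) {
  log2(hits_near / (hits_w1 * hits_w2))
}

# Polarity(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").
polarity <- function(near_excellent, near_poor, hits_phrase,
                     hits_excellent, hits_poor) {
  pmi(near_excellent, hits_phrase, hits_excellent) -
    pmi(near_poor, hits_phrase, hits_poor)
}

# Hypothetical hit counts for one phrase.
p <- polarity(near_excellent = 80, near_poor = 10, hits_phrase = 1000,
              hits_excellent = 4000, hits_poor = 2000)
print(p)  # log2((80 / 4000) / (10 / 2000)) = log2(4) = 2
```

Note that hits(phrase) cancels in the difference, so the polarity depends only on how often the phrase occurs near each seed word relative to that seed word's overall frequency. A phrase with zero co-occurrences would give an infinite value, so in practice some smoothing is needed.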
Phrases from a thumbs-up (positive) review
Phrase | POS tags | Polarity |
---|---|---|
online service | JJ NN | 2.8 |
online experience | JJ NN | 2.3 |
direct deposit | JJ NN | 1.3 |
local branch | JJ NN | 0.42 |
… | | |
low fees | JJ NNS | 0.33 |
true service | JJ NN | -0.73 |
other bank | JJ NN | -0.85 |
inconveniently located | RB VBN | -1.5 |
Average | | 0.32 |
Phrases from a thumbs-down (negative) review
Phrase | POS tags | Polarity |
---|---|---|
direct deposits | JJ NNS | 5.8 |
online web | JJ NN | 1.9 |
very handy | RB JJ | 1.4 |
… | | |
virtual monopoly | JJ NN | -2 |
lesser evil | RBR JJ | -2.3 |
other problems | JJ NNS | -2.8 |
low funds | JJ NNS | -6.8 |
unethical practices | JJ NNS | -8.5 |
Average | | -1.2 |
Results of Turney algorithm
- 410 reviews from Epinions
- 170 (41%) negative
- 240 (59%) positive
- Majority class baseline: 59%
- Turney algorithm: 74%
- Phrases rather than words
- Learns domain-specific information
Using WordNet to learn polarity
- WordNet: online thesaurus
- Create positive (“good”) and negative seed-words (“terrible”)
- Find Synonyms and Antonyms
- Positive Set: Add synonyms of positive words (“well”) and antonyms of negative words
- Negative Set: Add synonyms of negative words (“awful”) and antonyms of positive words (“evil”)
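One expansion step of this procedure can be sketched in base R. To keep the example self-contained, a toy thesaurus stands in for WordNet: the entries in `synonyms_of` and `antonyms_of` are illustrative assumptions, not actual WordNet output, and `lookup` and `expand` are hypothetical helper names.

```r
# Toy thesaurus standing in for WordNet (illustrative entries only).
synonyms_of <- list(good = c("well", "fine"), terrible = c("awful", "dreadful"))
antonyms_of <- list(good = c("evil", "bad"), terrible = c("wonderful"))

# Collect all thesaurus entries for the given words.
lookup <- function(words, thesaurus) {
  unique(unlist(thesaurus[intersect(words, names(thesaurus))], use.names = FALSE))
}

# One expansion step: synonyms keep the polarity, antonyms flip it.
expand <- function(pos_seeds, neg_seeds) {
  list(
    positive = unique(c(pos_seeds, lookup(pos_seeds, synonyms_of),
                        lookup(neg_seeds, antonyms_of))),
    negative = unique(c(neg_seeds, lookup(neg_seeds, synonyms_of),
                        lookup(pos_seeds, antonyms_of)))
  )
}

sets <- expand(pos_seeds = "good", neg_seeds = "terrible")
print(sets)
```

Repeating the step with the enlarged sets as new seeds grows the lexicon further, although errors can accumulate as words drift away from the original seeds.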
In R
Many packages and lexicons:
library(wordnet)
# download the WordNet dictionary from
# https://wordnet.princeton.edu/download/current-version
setDict("C:/Program Files (x86)/WordNet/2.1/dict")
Sys.setenv(WNHOME = "C:/Program Files (x86)/WordNet/2.1")
synonyms("fill", "VERB")
##  [1] "fill"      "fill up"   "fulfil"    "fulfill"   "make full" "meet"
##  [7] "occupy"    "replete"   "sate"      "satiate"   "satisfy"   "take"
In R
library(tidytext)
afinn_sentiments <- get_sentiments("afinn")
afinn_sentiments
## # A tibble: 2,477 x 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # i 2,467 more rows
In R
nrc_sentiments <- get_sentiments("nrc")
nrc_sentiments
## # A tibble: 13,901 x 2
##    word        sentiment
##    <chr>       <chr>
##  1 abacus      trust
##  2 abandon     fear
##  3 abandon     negative
##  4 abandon     sadness
##  5 abandoned   anger
##  6 abandoned   fear
##  7 abandoned   negative
##  8 abandoned   sadness
##  9 abandonment anger
## 10 abandonment fear
## # i 13,891 more rows
In R
loughran_sentiments <- get_sentiments("loughran")
loughran_sentiments
## # A tibble: 4,150 x 2
##    word         sentiment
##    <chr>        <chr>
##  1 abandon      negative
##  2 abandoned    negative
##  3 abandoning   negative
##  4 abandonment  negative
##  5 abandonments negative
##  6 abandons     negative
##  7 abdicated    negative
##  8 abdicates    negative
##  9 abdicating   negative
## 10 abdication   negative
## # i 4,140 more rows
In R
bing_sentiments <- get_sentiments("bing")
bing_sentiments
## # A tibble: 6,786 x 2
##    word        sentiment
##    <chr>       <chr>
##  1 2-faces     negative
##  2 abnormal    negative
##  3 abolish     negative
##  4 abominable  negative
##  5 abominably  negative
##  6 abominate   negative
##  7 abomination negative
##  8 abort       negative
##  9 aborted     negative
## 10 aborts      negative
## # i 6,776 more rows
Learning lexicons in summary
- Advantages:
- Can be domain-specific
- Can be more robust (more words)
- Intuition
- Start with a seed set of words (‘good’, ‘poor’)
- Find other words that have similar polarity:
- Using “and” and “but”
- Using words that occur nearby in the same document
- Using WordNet synonyms and antonyms
Supervised Methods
Document sentiment classification
- Classify a document (e.g., a review) based on the overall sentiment of the opinion holder
- Classes: Positive, negative (possibly neutral)
- An example review:
- “I bought an iPhone a few days ago. It is such a nice phone, although a little large. The touch screen is cool. The voice quality is great too. I simply love it!”
- Classification: positive or negative?
- It is basically a text classification problem
Sentence sentiment analysis
- Classify the sentiment expressed in a sentence
- Classes: positive, negative, neutral
- Neutral means no sentiment expressed
- “I believe he went home yesterday.”
- “I bought an iPhone yesterday.”
- But bear in mind
- Explicit opinion: “I like this car.”
- Fact-implied opinion: “I bought this car yesterday and it broke today.”
- Mixed opinion: “Apple is doing well in this poor economy”
Features for supervised learning
The problem has been studied by numerous researchers.
Key: feature engineering. A large set of features has been tried by researchers. E.g.,
- Term frequency and different IR weighting schemes
- Part of speech (POS) tags
- Opinion words and phrases
- Negations
- Syntactic dependency
Sentiment classification in movie reviews
- Polarity detection:
- Is an IMDB movie review positive or negative?
- Data: Polarity Data 2.0:
Basic steps
- Pre-processing and tokenization
- Feature representation (DTM)
- Feature selection
- Classification
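The first three steps can be sketched in base R on two toy documents. This is a minimal illustration under simple assumptions: tokenization just lowercases and splits on non-letters, and "feature selection" is a crude document-frequency filter. The classification step would then take the resulting matrix as input to any classifier.

```r
docs <- c(d1 = "I simply love it!",
          d2 = "I did not love it, I hate it.")

# Step 1: pre-processing and tokenization (lowercase, split on non-letters).
tokens <- lapply(docs, function(d) {
  tok <- strsplit(tolower(d), "[^a-z]+")[[1]]
  tok[tok != ""]
})

# Step 2: document-term matrix (DTM), one row per document, one column per term.
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens,
                function(tok) as.integer(table(factor(tok, levels = vocab))),
                integer(length(vocab))))
colnames(dtm) <- vocab

# Step 3: crude feature selection, keeping terms that occur in at least two documents.
doc_freq <- colSums(dtm > 0)
dtm_selected <- dtm[, doc_freq >= 2, drop = FALSE]
print(dtm_selected)
```

In practice a package such as tidytext or tm would build the DTM, but the structure is the same: rows are documents, columns are terms, cells are counts or weights.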
Sentiment tokenization issues
Deal with HTML and XML markup
Twitter mark-up (names, hash tags)
Capitalization (preserve for words in all caps)
Phone numbers, dates
Emoticons
Useful code:
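A minimal sketch of a sentiment-aware tokenizer in base R, covering only some of the issues above: emoticons, Twitter user names, hash tags, and words in all caps. The regular expression is a simplified assumption that recognises only a few common Western emoticons, and `sentiment_tokenize` is a hypothetical function name. HTML markup, phone numbers, and dates are not handled.

```r
# Pattern alternatives, tried in order: emoticons, @names, #hashtags, words.
token_pattern <- paste(
  "[:;=][-']?[)(DPp]",          # emoticons such as :) ;-) :D =(
  "@\\w+",                      # Twitter user names
  "#\\w+",                      # hash tags
  "[A-Za-z]+(?:'[A-Za-z]+)?",   # words, with an optional apostrophe part
  sep = "|"
)

sentiment_tokenize <- function(text) {
  tokens <- regmatches(text, gregexpr(token_pattern, text, perl = TRUE))[[1]]
  # Preserve words written entirely in capitals (they may signal emphasis);
  # lowercase all other plain words. Single letters such as "I" are lowercased.
  is_word  <- grepl("^[A-Za-z']+$", tokens)
  all_caps <- is_word & nchar(tokens) > 1 & tokens == toupper(tokens)
  tokens[is_word & !all_caps] <- tolower(tokens[is_word & !all_caps])
  tokens
}

tokens <- sentiment_tokenize("@anna I LOVED the new phone :-) #happy")
print(tokens)
```

Keeping ":-)" and "LOVED" as distinct tokens lets a classifier use them as sentiment cues, which a generic tokenizer that strips punctuation and lowercases everything would lose.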
Extracting features for sentiment classification
How to handle negation
- I didn’t like this movie
vs.
- I really like this movie
Which words to use?
Only adjectives
All words
- All words turns out to work better, at least on this data
Negation
Add NOT_ to every word between negation and following punctuation:
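The rule can be sketched in base R as follows. The function assumes the text is already tokenized with punctuation as separate tokens and with straight apostrophes; the list of negation words is a small illustrative assumption, and `mark_negation` is a hypothetical name.

```r
mark_negation <- function(tokens) {
  negators    <- c("not", "no", "never")
  punctuation <- c(".", ",", ";", ":", "!", "?")
  in_scope <- FALSE
  out <- character(length(tokens))
  for (i in seq_along(tokens)) {
    tok <- tokens[i]
    if (tok %in% punctuation) {
      in_scope <- FALSE          # punctuation closes the negation scope
      out[i] <- tok
    } else if (tolower(tok) %in% negators || grepl("n't$", tolower(tok))) {
      in_scope <- TRUE           # the negation word itself is left unmarked
      out[i] <- tok
    } else if (in_scope) {
      out[i] <- paste0("NOT_", tok)
    } else {
      out[i] <- tok
    }
  }
  out
}

tokens <- c("I", "didn't", "like", "this", "movie", ",",
            "but", "I", "loved", "the", "music")
marked <- mark_negation(tokens)
print(marked)
```

After marking, "like" and "NOT_like" are different features, so a bag-of-words classifier can learn opposite weights for them.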
Cross-Validation
Break up data into 10 folds
- (Equal positive and negative inside each fold?)
For each fold
Choose the fold as a temporary test set
Train on 9 folds, compute performance on the test fold
Report average performance of the 10 runs
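The procedure can be sketched in base R. The labels below are placeholders for a real data set, and the "classifier" is a trivial majority-class predictor used only so the loop runs end to end; any real model would replace it. Fold assignment is stratified so that each fold contains equal numbers of positive and negative documents.

```r
set.seed(1)  # make the random fold assignment reproducible

# Toy labels: 50 positive and 50 negative documents (placeholders for real data).
labels <- rep(c("pos", "neg"), each = 50)
k <- 10

# Stratified assignment: within each class, shuffle the fold ids 1..k,
# so every fold gets the same number of positive and negative documents.
fold <- integer(length(labels))
for (cls in unique(labels)) {
  idx <- which(labels == cls)
  fold[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
}

# Placeholder classifier: predicts the majority class of the training folds.
accuracy <- numeric(k)
for (f in seq_len(k)) {
  test_idx  <- which(fold == f)            # the held-out fold
  train_idx <- which(fold != f)            # the other 9 folds
  majority  <- names(which.max(table(labels[train_idx])))
  accuracy[f] <- mean(labels[test_idx] == majority)
}

mean_accuracy <- mean(accuracy)
print(mean_accuracy)
```

With balanced classes the majority-class predictor scores 50%, which is the baseline any real classifier should beat.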
Supervised sentiment analysis
Using all words works well for some tasks
Finding subsets of words may help in other tasks
- Hand-built polarity lexicons
- Use seeds and semi-supervised learning to induce lexicons
Negation is important
Multiclass and Multilabel Classification
Classification
Multi-class classification
Sentiment: positive, negative, neutral
Emotion: angry, sad, joyful, fearful, ashamed, proud, elated
Disease: healthy, cold, flu
Weather: sunny, cloudy, rain, snow
One-vs-all (one-vs-rest)
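A minimal base R sketch of one-vs-rest on made-up data: each document is described by two hypothetical features (counts of positive and negative lexicon words), and one binary scorer is fitted per class. A least-squares linear model is used here purely as a stand-in for whatever binary classifier is preferred; `predict_ovr` is a hypothetical helper name.

```r
# Toy training data (invented numbers) with three sentiment classes.
train <- data.frame(
  pos_count = c(4, 5, 4, 0, 1, 0, 0, 1, 0),
  neg_count = c(0, 1, 0, 4, 5, 4, 0, 0, 1),
  label     = rep(c("positive", "negative", "neutral"), each = 3),
  stringsAsFactors = FALSE
)
classes <- unique(train$label)

# One-vs-rest: fit one binary scorer per class (this class = 1, all others = 0).
fits <- lapply(classes, function(k) {
  d <- train
  d$y <- as.numeric(d$label == k)
  lm(y ~ pos_count + neg_count, data = d)
})

# Score a document with every binary model and pick the class with the highest score.
predict_ovr <- function(newdata) {
  scores <- sapply(fits, function(fit) predict(fit, newdata = newdata))
  scores <- matrix(scores, nrow = nrow(newdata))
  classes[apply(scores, 1, which.max)]
}

pred    <- predict_ovr(train)
new_doc <- predict_ovr(data.frame(pos_count = 3, neg_count = 0))
print(new_doc)
```

The same wrapper works for any number of classes, since it only requires that each binary model returns a comparable score.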
Summary
- Sentiment analysis
- Lexicon-based methods
- Learning-based methods
- Multiclass classification
- Multi-label classification
Practical 4
Other Challenges in SA
Explicit and implicit aspects
(Hu and Liu, 2004)
Explicit aspects: Aspects explicitly mentioned as nouns or noun phrases in a sentence
- “The picture quality of this phone is great.”
Implicit aspects: Aspects not explicitly mentioned in a sentence but are implied
- “This car is so expensive.”
- “This phone will not easily fit in a pocket.”
- “Included 16MB is stingy.”
Implicit aspects
Bagheri et al. 2013
An implicit aspect should satisfy the following conditions:
The related aspect word does not occur in the review sentence explicitly.
The aspect can be discovered by its surrounding words (e.g. opinion words) in the review sentence.
Some interesting sentences
Trying out Chrome because Firefox keeps crashing.
Firefox - negative; no opinion about Chrome.
We need to segment the sentence into clauses to decide that “crashing” only applies to Firefox(?).
But how about these
I changed to Audi because BMW is so expensive.
I did not buy BMW because of the high price.
I am so happy that my iPhone is nothing like my old ugly phone.
Some interesting sentences (contd)
Conditional sentences are hard to deal with (Narayanan et al. 2009)
If I can find a good camera, I will buy it.
But conditional sentences can have opinions
- If you are looking for a good phone, buy Nokia
Questions are also hard to handle
Are there any great perks for employees?
Any idea how to fix this lousy Sony camera?
Some interesting sentences (contd)
Sarcastic sentences
- What a great car, it stopped working in the second day.
Sarcastic sentences are common in political blogs, comments and discussions.
- They make political opinions difficult to handle