In this practical, we will apply both dictionary-based and deep learning-based sentiment analysis approaches to the IMDB sentiment classification task.
We are going to use the following libraries. Take care to have them installed!
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
#!pip install -q vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn import metrics
Here we are going to classify movie reviews as positive or negative using the text of the review. We will use the IMDB dataset that contains the text of 50,000 movie reviews from the Internet Movie Database (IMDb). These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and test sets are balanced, meaning they contain an equal number of positive and negative reviews.
1. The IMDB dataset is available on TensorFlow datasets. Use the following code to download the IMDB dataset.
# Split the training set into 60% and 40% to end up with 15,000 examples
# for training, 10,000 examples for validation and 25,000 examples for testing.
train_data, validation_data, test_data = tfds.load(
name="imdb_reviews",
split=('train[:60%]', 'train[60%:]', 'test'),
as_supervised=True)
2. Use the following code to explore the data and print the first 4 examples.
train_examples_batch, train_labels_batch = next(iter(train_data.batch(4)))
train_examples_batch
<tf.Tensor: shape=(4,), dtype=string, numpy= array([b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.", b'I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot development was constant. Constantly slow and boring. Things seemed to happen, but with no explanation of what was causing them or why. I admit, I may have missed part of the film, but i watched the majority of it and everything just seemed to happen of its own accord without any real concern for anything else. I cant recommend this film at all.', b'Mann photographs the Alberta Rocky Mountains in a superb fashion, and Jimmy Stewart and Walter Brennan give enjoyable performances as they always seem to do. <br /><br />But come on Hollywood - a Mountie telling the people of Dawson City, Yukon to elect themselves a marshal (yes a marshal!) and to enforce the law themselves, then gunfighters battling it out on the streets for control of the town? <br /><br />Nothing even remotely resembling that happened on the Canadian side of the border during the Klondike gold rush. Mr. 
Mann and company appear to have mistaken Dawson City for Deadwood, the Canadian North for the American Wild West.<br /><br />Canadian viewers be prepared for a Reefer Madness type of enjoyable howl with this ludicrous plot, or, to shake your head in disgust.', b'This is the kind of film for a snowy Sunday afternoon when the rest of the world can go ahead with its own business as you descend into a big arm-chair and mellow for a couple of hours. Wonderful performances from Cher and Nicolas Cage (as always) gently row the plot along. There are no rapids to cross, no dangerous waters, just a warm and witty paddle through New York life at its best. A family film in every sense and one that deserves the praise it received.'], dtype=object)>
train_labels_batch
<tf.Tensor: shape=(4,), dtype=int64, numpy=array([0, 0, 0, 1])>
The label is an integer value of either 0 or 1, where 0 is a negative review, and 1 is a positive review.
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and it also works well on texts from other domains.
VADER incorporates a "gold-standard" sentiment lexicon that has been empirically validated by multiple independent human judges and is especially attuned to microblog-like contexts.
It has some advantages:
However, there are some disadvantages:
3. Create a VADER analyzer using the SentimentIntensityAnalyzer class, and look at the polarity scores of some example sentences.
analyzer = SentimentIntensityAnalyzer()
print(analyzer.polarity_scores("you cannot be negative"))
{'neg': 0.0, 'neu': 0.5, 'pos': 0.5, 'compound': 0.4585}
The output is 50% positive and 50% neutral. The compound score is 0.4585.
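To see where a compound score such as 0.4585 can come from, the following sketch reproduces the normalization step that, as far as we know, the vaderSentiment reference implementation applies to the summed word valences: x / sqrt(x^2 + alpha), with alpha = 15. The summed valence of 1.998 used here is an assumption, back-computed from the output above rather than read from the lexicon.

```python
import math

def normalize(valence_sum, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)

# Assumed summed valence for "you cannot be negative" (illustrative value,
# chosen so that the result is close to the compound score printed above).
assumed_sum = 1.998
compound = normalize(assumed_sum)
print(round(compound, 4))
```

Because of this squashing, the compound score always lies between -1 and 1, which is why a fixed cut-off can be used to turn it into a class label.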
4. Calculate the compound sentiment scores of the first 1,000 training examples. Convert the final scores to 0 (negative) and 1 (positive).
train_examples_batch, train_labels_batch = next(iter(train_data.batch(1000)))
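# Start with every review labelled negative (0); positives are set to 1 below.
score = [0] * 1000
texts = train_examples_batch.numpy()  # convert the tensor to a NumPy array once
for i in range(1000):
    text = texts[i].decode("utf-8")
    sent = analyzer.polarity_scores(text)['compound']
    # A compound score above 0 counts as a positive review.
    if sent > 0:
        score[i] = 1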
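The thresholding step in the loop above can also be written as a small helper. This is a minimal sketch: `compound_to_label` and its `threshold` parameter are hypothetical names, not part of vaderSentiment, and the example scores are made up for illustration. Because the IMDB labels are binary, there is no neutral class here, so every score at or below the threshold is treated as negative.

```python
def compound_to_label(compound, threshold=0.0):
    """Map a compound score in [-1, 1] to 1 (positive) or 0 (negative)."""
    return 1 if compound > threshold else 0

# Made-up compound scores, only to show the mapping.
example_scores = [0.4585, -0.8, 0.0, 0.02]
labels = [compound_to_label(s) for s in example_scores]
stricter = [compound_to_label(s, threshold=0.05) for s in example_scores]
print(labels)
print(stricter)
```

Raising the threshold makes the classifier more reluctant to predict "positive", which may be worth trying if the analyzer turns out to label too many reviews as positive.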
5. Evaluate the performance of the predicted sentiment scores using the classification_report function. How do you interpret your results?
print(metrics.classification_report(train_labels_batch, score, target_names=['negative', 'positive']))
              precision    recall  f1-score   support

    negative       0.78      0.53      0.63       490
    positive       0.66      0.85      0.74       510

    accuracy                           0.70      1000
   macro avg       0.72      0.69      0.69      1000
weighted avg       0.71      0.70      0.69      1000
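To help read the report, here is a minimal sketch of how precision, recall and F1 are computed for one class. The helper `prf` and the toy labels are hypothetical and chosen only for illustration; they are not the predictions from the run above.

```python
def prf(y_true, y_pred, target):
    """Return (precision, recall, f1) for the class `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy data: 4 negative (0) and 4 positive (1) reviews.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

neg = prf(y_true, y_pred, target=0)
pos = prf(y_true, y_pred, target=1)
print("negative:", neg)
print("positive:", pos)
```

In the report above, the recall of 0.53 for the negative class means that roughly half of the truly negative reviews were predicted as positive, while most positive reviews were recovered (recall 0.85). This suggests that, on this sample, the dictionary-based scores lean towards the positive class.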
In this part of the practical, we are going to use pre-trained word embedding models from TensorFlow Hub (https://tfhub.dev/) to do sentiment classification on movie reviews. TensorFlow Hub is a repository of trained machine learning models.
6. Use a pre-trained model from TensorFlow Hub called "google/nnlm-en-dim50/2" to create a Keras embedding layer that embeds the sentences, and try it out on a couple of input examples.
# Token based text embedding trained on English Google News 7B corpus.
embedding = "https://tfhub.dev/google/nnlm-en-dim50/2"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
dtype=tf.string, trainable=True)
hub_layer(train_examples_batch[:3])
<tf.Tensor: shape=(3, 50), dtype=float32, numpy= array([[ 0.5423195 , -0.0119017 , 0.06337538, 0.06862972, -0.16776837, -0.10581174, 0.16865303, -0.04998824, -0.31148055, 0.07910346, 0.15442263, 0.01488662, 0.03930153, 0.19772711, -0.12215476, -0.04120981, -0.2704109 , -0.21922152, 0.26517662, -0.80739075, 0.25833532, -0.3100421 , 0.28683215, 0.1943387 , -0.29036492, 0.03862849, -0.7844411 , -0.0479324 , 0.4110299 , -0.36388892, -0.58034706, 0.30269456, 0.3630897 , -0.15227164, -0.44391504, 0.19462997, 0.19528408, 0.05666234, 0.2890704 , -0.28468323, -0.00531206, 0.0571938 , -0.3201318 , -0.04418665, -0.08550783, -0.55847436, -0.23336391, -0.20782952, -0.03543064, -0.17533456], [ 0.56338924, -0.12339553, -0.10862679, 0.7753425 , -0.07667089, -0.15752277, 0.01872335, -0.08169781, -0.3521876 , 0.4637341 , -0.08492756, 0.07166859, -0.00670817, 0.12686075, -0.19326553, -0.52626437, -0.3295823 , 0.14394785, 0.09043556, -0.5417555 , 0.02468163, -0.15456742, 0.68333143, 0.09068331, -0.45327246, 0.23180096, -0.8615696 , 0.34480393, 0.12838456, -0.58759046, -0.4071231 , 0.23061076, 0.48426893, -0.27128142, -0.5380916 , 0.47016326, 0.22572741, -0.00830663, 0.2846242 , -0.304985 , 0.04400365, 0.25025874, 0.14867121, 0.40717036, -0.15422426, -0.06878027, -0.40825695, -0.3149215 , 0.09283665, -0.20183425], [ 0.7456154 , 0.21256861, 0.14400336, 0.5233862 , 0.11032254, 0.00902788, -0.3667802 , -0.08938274, -0.24165542, 0.33384594, -0.11194605, -0.01460047, -0.0071645 , 0.19562712, 0.00685216, -0.24886718, -0.42796347, 0.18620004, -0.05241098, -0.66462487, 0.13449019, -0.22205497, 0.08633006, 0.43685386, 0.2972681 , 0.36140734, -0.7196889 , 0.05291241, -0.14316116, -0.1573394 , -0.15056328, -0.05988009, -0.08178931, -0.15569411, -0.09303783, -0.18971172, 0.07620788, -0.02541647, -0.27134508, -0.3392682 , -0.10296468, -0.27275252, -0.34078008, 0.20083304, -0.26644835, 0.00655449, -0.05141488, -0.04261917, -0.45413622, 0.20023568]], dtype=float32)>
Here you see that no matter the length of the input text, the output shape of the embeddings is (num_examples, embedding_dimension).
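One way to see why the output shape does not depend on the text length is to pool per-word vectors into a single vector. The sketch below is a toy illustration, not the NNLM model itself: the 3-dimensional `toy_vectors` table, the whitespace tokenizer and the mean pooling are all assumptions made for this example.

```python
def embed(sentence, table, dim):
    """Average per-token vectors; tokens missing from the table count as zeros."""
    tokens = sentence.lower().split()
    total = [0.0] * dim
    for tok in tokens:
        vec = table.get(tok, [0.0] * dim)
        total = [a + b for a, b in zip(total, vec)]
    n = max(len(tokens), 1)  # avoid division by zero for empty input
    return [x / n for x in total]

# Hypothetical 3-dimensional word vectors.
toy_vectors = {
    "great": [1.0, 0.0, 0.5],
    "boring": [-1.0, 0.2, 0.0],
    "movie": [0.0, 1.0, 0.0],
}

short_vec = embed("great movie", toy_vectors, dim=3)
long_vec = embed("boring boring movie with a great cast", toy_vectors, dim=3)
print(len(short_vec), len(long_vec))
```

Both sentences end up as vectors of the same length, so a batch of texts can be stacked into a (num_examples, embedding_dimension) array regardless of how long each text is.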
7. Build a deep learning model using the embedding layer and one hidden layer.
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
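# Sigmoid output so that the 'binary_crossentropy' loss receives probabilities.
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))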
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 keras_layer (KerasLayer)    (None, 50)                48190600
 dense (Dense)               (None, 16)                816
 dense_1 (Dense)             (None, 1)                 17
=================================================================
Total params: 48,191,433
Trainable params: 48,191,433
Non-trainable params: 0
_________________________________________________________________
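The parameter counts in the summary can be checked by hand. A dense layer with n inputs and m units has n * m weights plus m biases. The helper name `dense_params` is ours. The summary does not state the vocabulary size, so the row count below is only what the embedding parameter count implies when divided by the embedding dimension.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

embedding_params = 48190600      # taken from the summary above
hidden = dense_params(50, 16)    # 50-dim embedding -> 16 hidden units
output = dense_params(16, 1)     # 16 hidden units -> 1 output unit
total = embedding_params + hidden + output

# Implied number of 50-dimensional rows in the embedding table.
implied_rows = embedding_params // 50
print(hidden, output, total, implied_rows)
```

Almost all parameters sit in the embedding layer, and because it is loaded with trainable=True, they are all updated during training.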
8. Compile and train the model for 10 epochs in batches of 512 samples.
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
history = model.fit(train_data.shuffle(10000).batch(512),
epochs=10,
validation_data=validation_data.batch(512),
verbose=1)
Epoch 1/10
30/30 [==============================] - 44s 1s/step - loss: 0.8458 - accuracy: 0.5589 - val_loss: 0.6438 - val_accuracy: 0.6516
Epoch 2/10
30/30 [==============================] - 43s 1s/step - loss: 0.5294 - accuracy: 0.7449 - val_loss: 0.5426 - val_accuracy: 0.7504
Epoch 3/10
30/30 [==============================] - 42s 1s/step - loss: 0.3954 - accuracy: 0.8300 - val_loss: 0.4921 - val_accuracy: 0.8039
Epoch 4/10
30/30 [==============================] - 42s 1s/step - loss: 0.3034 - accuracy: 0.8844 - val_loss: 0.4856 - val_accuracy: 0.8276
Epoch 5/10
30/30 [==============================] - 42s 1s/step - loss: 0.2345 - accuracy: 0.9137 - val_loss: 0.4677 - val_accuracy: 0.8436
Epoch 6/10
30/30 [==============================] - 42s 1s/step - loss: 0.1836 - accuracy: 0.9366 - val_loss: 0.4757 - val_accuracy: 0.8530
Epoch 7/10
30/30 [==============================] - 56s 2s/step - loss: 0.1437 - accuracy: 0.9558 - val_loss: 0.4887 - val_accuracy: 0.8583
Epoch 8/10
30/30 [==============================] - 43s 1s/step - loss: 0.1130 - accuracy: 0.9669 - val_loss: 0.5123 - val_accuracy: 0.8618
Epoch 9/10
30/30 [==============================] - 43s 1s/step - loss: 0.0870 - accuracy: 0.9775 - val_loss: 0.5251 - val_accuracy: 0.8660
Epoch 10/10
30/30 [==============================] - 43s 1s/step - loss: 0.0658 - accuracy: 0.9835 - val_loss: 0.5554 - val_accuracy: 0.8660
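The log shows the training loss falling throughout, while the validation loss stops improving part-way through training, which is a common sign of overfitting. As a rough sketch, the epoch with the lowest validation loss can be picked out as follows. The list is copied from the log above, and the variable names are ours.

```python
# Validation loss per epoch, copied from the training log above.
val_loss = [0.6438, 0.5426, 0.4921, 0.4856, 0.4677,
            0.4757, 0.4887, 0.5123, 0.5251, 0.5554]

# Epochs are numbered from 1, list indices from 0.
best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
best_epoch = best_index + 1
print(best_epoch, val_loss[best_index])
```

Note that the validation accuracy still creeps upwards after this point, so the two criteria do not agree on the best epoch. In practice, a callback such as tf.keras.callbacks.EarlyStopping, monitoring val_loss, could be used to stop training automatically instead of always running all 10 epochs.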
9. Evaluate the model on the test set.
results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))
49/49 - 10s - loss: 0.5737 - accuracy: 0.8468 - 10s/epoch - 197ms/step
loss: 0.574
accuracy: 0.847
This fairly simple approach achieves an accuracy of about 85%.
10. For your next experiment, load a more complex pre-trained word embedding for the embedding layer. Train and evaluate your model.
embedding = "https://tfhub.dev/google/nnlm-en-dim128-with-normalization/2"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
dtype=tf.string, trainable=True)
# hub_layer(train_examples_batch[:3])
Here we tried google/nnlm-en-dim128-with-normalization/2, which is trained with the same NNLM (Neural Network Language Model) architecture on the same data as google/nnlm-en-dim50/2, but with a larger embedding dimension. Higher-dimensional embeddings can improve performance on your task, but the model may take longer to train. This new model also applies additional text normalization, such as removing punctuation, which can help if the text in your task contains additional characters or punctuation. You can try more pre-trained embeddings from TensorFlow Hub, for example BERT, but remember that these are huge models that need a lot of training time.
In Practical 10, you will fine-tune and fit BERT!
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
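# Sigmoid output so that the 'binary_crossentropy' loss receives probabilities.
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))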
model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 keras_layer_1 (KerasLayer)  (None, 128)               124642688
 dense_2 (Dense)             (None, 16)                2064
 dense_3 (Dense)             (None, 1)                 17
=================================================================
Total params: 124,644,769
Trainable params: 124,644,769
Non-trainable params: 0
_________________________________________________________________
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
history = model.fit(train_data.shuffle(10000).batch(512),
epochs=10,
validation_data=validation_data.batch(512),
verbose=1)
Epoch 1/10
30/30 [==============================] - 102s 3s/step - loss: 1.5566 - accuracy: 0.5389 - val_loss: 0.7083 - val_accuracy: 0.5945
Epoch 2/10
30/30 [==============================] - 101s 3s/step - loss: 0.5828 - accuracy: 0.6949 - val_loss: 0.5918 - val_accuracy: 0.7227
Epoch 3/10
30/30 [==============================] - 113s 4s/step - loss: 0.3984 - accuracy: 0.8423 - val_loss: 0.4974 - val_accuracy: 0.8013
Epoch 4/10
30/30 [==============================] - 103s 3s/step - loss: 0.2586 - accuracy: 0.9127 - val_loss: 0.4536 - val_accuracy: 0.8386
Epoch 5/10
30/30 [==============================] - 101s 3s/step - loss: 0.1699 - accuracy: 0.9471 - val_loss: 0.4851 - val_accuracy: 0.8606
Epoch 6/10
30/30 [==============================] - 103s 3s/step - loss: 0.1106 - accuracy: 0.9691 - val_loss: 0.4975 - val_accuracy: 0.8720
Epoch 7/10
30/30 [==============================] - 104s 3s/step - loss: 0.0751 - accuracy: 0.9830 - val_loss: 0.5257 - val_accuracy: 0.8744
Epoch 8/10
30/30 [==============================] - 113s 4s/step - loss: 0.0538 - accuracy: 0.9910 - val_loss: 0.5552 - val_accuracy: 0.8751
Epoch 9/10
30/30 [==============================] - 103s 3s/step - loss: 0.0408 - accuracy: 0.9945 - val_loss: 0.5888 - val_accuracy: 0.8751
Epoch 10/10
30/30 [==============================] - 101s 3s/step - loss: 0.0325 - accuracy: 0.9967 - val_loss: 0.6171 - val_accuracy: 0.8740
results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))
49/49 - 21s - loss: 0.7258 - accuracy: 0.8456 - 21s/epoch - 430ms/step
loss: 0.726
accuracy: 0.846