In this practical, we are going to work with BERT! More specifically, we are going to perform sentiment analysis of movie reviews using a transformer model, have a look under its hood, and try to explain the model predictions using SHAP.
Our BERT-variant of choice is DistilBERT, a light-weight transformer whose performance is comparable to Google's BERT base model. From the authors:
[W]e leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster.
In Part 1, we will use an off-the-shelf sentiment analysis pipeline from the Hugging Face transformers module to classify two movie reviews.
In Part 2, we will disassemble the sentiment analysis pipeline by performing the same analysis as in Part 1 step by step.
In Part 3, we will open the black box and explore which tokens were most important for DistilBERT's sentiment classification. We do this using Shapley Additive Explanations (SHAP).
In Part 4, we will fine-tune DistilBERT on the IMDB movie review dataset.
Fine-tuning a transformer model is quite resource-intensive! Switch your runtime type to GPU T4 under Runtime > Change runtime type.
Running this practical requires a more recent version of the accelerate package than the one installed by default in Google Colab. Run the code below to upgrade accelerate.
!pip install -q -U accelerate # update accelerate
Now restart your runtime under Runtime > Restart runtime (or by pressing Ctrl + M followed by .) and click Yes in the pop-up message.
All set? 🙂
Since sentiment analysis is a popular application, there are off-the-shelf pipelines which we can use to quickly classify documents by sentiment. One such pipeline is part of the Hugging Face transformers module.
🤗 Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs, carbon footprint, and save you the time and resources required to train a model from scratch. [ ... ] The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the task summary for examples of use. [ ... ] The `pipeline()` is the most powerful object encapsulating all other pipelines.
We install the transformers module, from which we import pipeline.
!pip install -q transformers
!pip install -q Xformers
from transformers import pipeline
# for reproducibility
from transformers import set_seed
import random
import numpy as np
seed = 137
set_seed(seed)
random.seed(seed)
np.random.seed(seed)
Pre-trained BERT models are available for many different natural language processing tasks based on the General Language Understanding Evaluation (GLUE) benchmark resources.
To showcase how to use the sentiment analysis pipeline, we will compare two relatively complex IMDB reviews of Mark Mylod's movie The Menu (2022). Load the following two reviews:
review1 = "The Menu isn't the first to satirise the rich and their incompetence and isn't saying anything new \
but that definitely doesn't prevent it from being a great satire that pokes fun at everything it can in ways that \
are often consistently funny, playful and extremely stylish. Ralph Fiennes gives a terrific performance full of awkward \
unease that only enhances his commanding screen presence. Anya Taylor-Joy is a perfect audience surrogate amongst a sea \
of deliberately unlikeable characters of which the best is Nicholas Hoult whose almost too good at making his character \
hilariously pathetic. Mark Mylod's direction is excellent, the film has more than enough visual style to match the \
pretentiousness of its characters and is really good at building tension. The music by Colin Stetson is fantastic, \
striking a unusual balance between beautiful and unnerving."
review2 = "This looked like an interesting film based on the trailer and the first half of it was just that. \
The tension and suspense was building nicely. There were little dribs and drabs and hints of what might be coming \
without being too obvious. The acting from everyone in the film was good. Even supporting characters with only a few \
lines. Were well realized I remember thinking that I couldn't wait to see where it was all going. Sadly it didn't \
really go anywhere. It all unwound in the second half. The acting was still on but the writing failed. That's the most \
i can say without giving up any spoilers. And that was extra disappointing because the first half was so good. This \
Menu did not deliver the meal as advertised."
You can skim the reviews.
print(review1)
The Menu isn't the first to satirise the rich and their incompetence and isn't saying anything new but that definitely doesn't prevent it from being a great satire that pokes fun at everything it can in ways that are often consistently funny, playful and extremely stylish. Ralph Fiennes gives a terrific performance full of awkward unease that only enhances his commanding screen presence. Anya Taylor-Joy is a perfect audience surrogate amongst a sea of deliberately unlikeable characters of which the best is Nicholas Hoult whose almost too good at making his character hilariously pathetic. Mark Mylod's direction is excellent, the film has more than enough visual style to match the pretentiousness of its characters and is really good at building tension. The music by Colin Stetson is fantastic, striking a unusual balance between beautiful and unnerving.
print(review2)
This looked like an interesting film based on the trailer and the first half of it was just that. The tension and suspense was building nicely. There were little dribs and drabs and hints of what might be coming without being too obvious. The acting from everyone in the film was good. Even supporting characters with only a few lines. Were well realized I remember thinking that I couldn't wait to see where it was all going. Sadly it didn't really go anywhere. It all unwound in the second half. The acting was still on but the writing failed. That's the most i can say without giving up any spoilers. And that was extra disappointing because the first half was so good. This Menu did not deliver the meal as advertised.
What is your guess of the sentiment of these two reviews? On a scale of 1-10, what rating do you think the respective authors gave the movie?
1. Set up and fit a sentiment analysis pipeline to predict the sentiment of the two reviews. Define the model as 'distilbert-base-uncased-finetuned-sst-2-english'.
Our BERT model will be DistilBERT base uncased. Uncased means that the model disregards casing (upper or lower case). In particular, we use a DistilBERT version which has been fine-tuned for binary sentiment classification using the Stanford Sentiment Treebank (SST-2; Pang and Lee, 2005) corpus.
sentiment_pipeline = pipeline("sentiment-analysis", model = 'distilbert-base-uncased-finetuned-sst-2-english')
sentiment_pipeline(review1) # predict sentiment
[{'label': 'POSITIVE', 'score': 0.9983012080192566}]
sentiment_pipeline(review2) # predict sentiment
[{'label': 'NEGATIVE', 'score': 0.9622442722320557}]
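The pipeline also accepts a list of texts, so you can classify both reviews in a single call and get back one dictionary per input. A minimal sketch:

results = sentiment_pipeline([review1, review2]) # classify both reviews at once
for name, result in zip(["review1", "review2"], results):
    print(name, "->", result['label'], round(result['score'], 4))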
Now we are going to show you how to build your own sentiment analysis pipeline from scratch. In practice, you can use the already existing one as we just did above, but it might be helpful to understand the steps associated with setting up a transformer-based pipeline for other applications you might work on.
We perform the same sentiment analysis on the same two reviews - this time step-by-step.
2. Define the tokenizer and model. For the tokenizer, use the pretrained DistilBERT tokenizer, and for the model use distilbert-base-uncased-finetuned-sst-2-english.
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
3. Tokenize the review1 and review2 objects. Pad and truncate the sequences, and return PyTorch (pt) tensors. Save the output object as encoding.
encoding = tokenizer([review1, review2], padding = True, truncation = True, return_tensors = 'pt') # tokenize the reviews
BERT and several other transformer models use tokenizers based on WordPiece, a subword tokenization algorithm. The main advantage of a subword tokenizer is that it interpolates between word-based and character-based tokenization. Common words get a slot in the vocabulary, but the tokenizer can fall back to word pieces and individual characters for unknown words.
Since batched inputs (our reviews) are of different lengths, they cannot be converted to fixed-size tensors to be fed to the model.
There are two main strategies for solving this problem -- padding and truncation.
In order to create rectangular tensors from batches of varying lengths, padding adds a special padding token to ensure shorter sequences will have the same length as either the longest sequence in a batch or the maximum length accepted by the model. Truncation works in the other direction by truncating long sequences.
`padding = True`: pad to the longest sequence in the batch (no padding is applied if you only provide a single sequence). `truncation = True`: truncate to a maximum length specified by the max_length argument or the maximum length accepted by the model if no max_length is provided (max_length=None).
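To see subword tokenization and padding in miniature, you can run a small experiment on toy sentences (the exact word pieces depend on the tokenizer's vocabulary):

print(tokenizer.tokenize("unbelievably overhyped")) # uncommon words typically get split into word pieces
toy = tokenizer(["a short review", "a somewhat longer review than the first one"], padding = True, return_tensors = 'pt')
print(toy['input_ids'].shape) # both rows are padded to the same length
print(toy['input_ids'][0]) # the shorter sentence ends in 0s ([PAD] tokens)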
4. Inspect the encoding object by printing the first review's input ids.
print(encoding['input_ids'][0]) # first review's input_ids
tensor([ 101, 1996, 12183, 3475, 1005, 1056, 1996, 2034, 2000, 2938, 15735, 3366, 1996, 4138, 1998, 2037, 4297, 25377, 12870, 5897, 1998, 3475, 1005, 1056, 3038, 2505, 2047, 2021, 2008, 5791, 2987, 1005, 1056, 4652, 2009, 2013, 2108, 1037, 2307, 18312, 2008, 26202, 2015, 4569, 2012, 2673, 2009, 2064, 1999, 3971, 2008, 2024, 2411, 10862, 6057, 1010, 18378, 1998, 5186, 2358, 8516, 4509, 1012, 6798, 10882, 24336, 2015, 3957, 1037, 27547, 2836, 2440, 1997, 9596, 9816, 11022, 2008, 2069, 11598, 2015, 2010, 7991, 3898, 3739, 1012, 21728, 4202, 1011, 6569, 2003, 1037, 3819, 4378, 7505, 21799, 5921, 1037, 2712, 11253, 9969, 4406, 3085, 3494, 1997, 2029, 1996, 2190, 2003, 6141, 7570, 11314, 3005, 2471, 2205, 2204, 2012, 2437, 2010, 2839, 26415, 9488, 27191, 17203, 1012, 2928, 2026, 4135, 2094, 1005, 1055, 3257, 2003, 6581, 1010, 1996, 2143, 2038, 2062, 2084, 2438, 5107, 2806, 2000, 2674, 1996, 3653, 6528, 20771, 2791, 1997, 2049, 3494, 1998, 2003, 2428, 2204, 2012, 2311, 6980, 1012, 1996, 2189, 2011, 6972, 26261, 25656, 2003, 10392, 1010, 8478, 1037, 5866, 5703, 2090, 3376, 1998, 4895, 3678, 6455, 1012, 102])
We see that BERT assigns a unique id to each token (input_ids).
5. Convert the first review's input ids to tokens using convert_ids_to_tokens to see how the text got tokenized.
print(tokenizer.convert_ids_to_tokens(encoding['input_ids'][0])) # first review's tokens
['[CLS]', 'the', 'menu', 'isn', "'", 't', 'the', 'first', 'to', 'sat', '##iri', '##se', 'the', 'rich', 'and', 'their', 'inc', '##omp', '##ete', '##nce', 'and', 'isn', "'", 't', 'saying', 'anything', 'new', 'but', 'that', 'definitely', 'doesn', "'", 't', 'prevent', 'it', 'from', 'being', 'a', 'great', 'satire', 'that', 'poke', '##s', 'fun', 'at', 'everything', 'it', 'can', 'in', 'ways', 'that', 'are', 'often', 'consistently', 'funny', ',', 'playful', 'and', 'extremely', 'st', '##yl', '##ish', '.', 'ralph', 'fi', '##enne', '##s', 'gives', 'a', 'terrific', 'performance', 'full', 'of', 'awkward', '##une', '##ase', 'that', 'only', 'enhance', '##s', 'his', 'commanding', 'screen', 'presence', '.', 'anya', 'taylor', '-', 'joy', 'is', 'a', 'perfect', 'audience', 'sur', '##rogate', 'amongst', 'a', 'sea', '##of', 'deliberately', 'unlike', '##able', 'characters', 'of', 'which', 'the', 'best', 'is', 'nicholas', 'ho', '##ult', 'whose', 'almost', 'too', 'good', 'at', 'making', 'his', 'character', '##hila', '##rio', '##usly', 'pathetic', '.', 'mark', 'my', '##lo', '##d', "'", 's', 'direction', 'is', 'excellent', ',', 'the', 'film', 'has', 'more', 'than', 'enough', 'visual', 'style', 'to', 'match', 'the', 'pre', '##ten', '##tious', '##ness', 'of', 'its', 'characters', 'and', 'is', 'really', 'good', 'at', 'building', 'tension', '.', 'the', 'music', 'by', 'colin', 'ste', '##tson', 'is', 'fantastic', ',', 'striking', 'a', 'unusual', 'balance', 'between', 'beautiful', 'and', 'un', '##ner', '##ving', '.', '[SEP]']
Note that BERT-based models also operate with special tokens:
Token | Token ID | Meaning |
---|---|---|
[CLS] | 101 | Beginning of input |
[SEP] | 102 | End of input or sentence |
[MASK] | 103 | Masked tokens the model should predict |
[PAD] | 0 | Padding |
[UNK] | 100 | Unknown token not in training data |
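You do not have to memorise these ids: the tokenizer stores the special tokens and their ids as attributes, for example:

print(tokenizer.cls_token, tokenizer.cls_token_id) # [CLS] 101
print(tokenizer.sep_token, tokenizer.sep_token_id) # [SEP] 102
print(tokenizer.mask_token, tokenizer.mask_token_id) # [MASK] 103
print(tokenizer.pad_token, tokenizer.pad_token_id) # [PAD] 0
print(tokenizer.unk_token, tokenizer.unk_token_id) # [UNK] 100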
6. Predict the sentiment of the two reviews. In order to do this, import torch and define the output object using the model, input ids and attention mask.
Now we are ready to do some sentiment prediction. We import torch and call our model, feeding it the input_ids and the attention_mask. The attention mask is a binary tensor indicating the position of the padded indices so that the model does not attend to them.
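You can inspect the mask directly; since the second review is the shorter of the two, its attention mask ends in zeros wherever padding was added:

print(encoding['attention_mask'][1]) # 1 = real token, 0 = padding the model should ignore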
# prediction of sentiment
import torch
output = model(input_ids = encoding['input_ids'], attention_mask = encoding['attention_mask'])
print("Predicted logits:\n\n", output['logits']) # logits
Predicted logits: tensor([[-3.1107, 3.2654], [ 1.8161, -1.4221]], grad_fn=<AddmmBackward0>)
print("Predicted probabilities:\n\n", torch.nn.functional.softmax(output['logits'], dim=-1)) # from logits to probabilities
Predicted probabilities: tensor([[0.0017, 0.9983], [0.9622, 0.0378]], grad_fn=<SoftmaxBackward0>)
prediction = torch.argmax(output['logits'], 1) # from logits to binary class
print("Predicted classes:\n", prediction)
Predicted classes: tensor([1, 0])
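To turn these class indices back into labels, you can use the id2label mapping stored in the model's configuration:

print([model.config.id2label[idx.item()] for idx in prediction]) # ['POSITIVE', 'NEGATIVE']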
How do the output sentiments and probabilities compare to the off-the-shelf sentiment classification pipeline we used in Part 1?
Now that we have classified our two reviews, we might want to explain DistilBERT's predictions using Shapley Additive Explanations (SHAP).
7. Install the shap module, import shap.Explainer and feed it the sentiment_pipeline model. Pass the two movie reviews as input to the explainer.
Note: computing the Shapley values for DistilBERT on our two reviews should take about 5 minutes, but SHAP can be very computationally intensive in real-life applications.
!pip install -q shap
import shap
explainer = shap.Explainer(sentiment_pipeline)
shap_values = explainer([review1, review2])
Partition explainer: 3it [05:11, 155.63s/it]
8. A nice thing about the shap module is that it comes with a built-in visualizer. Use shap.plots.text to visualize the SHAP values for the first and the second movie review.
shap.plots.text(shap_values[0]) # first review
shap.plots.text(shap_values[1]) # second review
Features highlighted in red are increasing the predicted probability, while features highlighted in blue are lowering the predicted probability.
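Besides the plots, you can inspect the numbers directly: each Explanation stores the tokens in .data and their attributions in .values, with one column per output class. A sketch of how to list the most influential tokens for the first review, assuming the second column corresponds to the POSITIVE label:

import numpy as np
tokens = np.array(shap_values[0].data) # tokens of the first review
contributions = shap_values[0].values[:, 1] # SHAP values for the POSITIVE class (assumed to be column 1)
for i in np.argsort(-np.abs(contributions))[:10]: # ten largest absolute contributions
    print(tokens[i], round(float(contributions[i]), 4))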
Now let's turn to the IMDb movie review dataset. Since the DistilBERT model we have been using was fine-tuned on the Stanford Sentiment Treebank (SST-2) dataset, we will further fine-tune it on IMDb movie reviews. In practice, this might not be necessary for this particular application, but it is good to see how it can be done.
!pip install -q datasets
!pip install -q transformers
!pip install -q evaluate
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
import evaluate
import numpy as np
9. Load the IMDb dataset and sample 10% of the train and test splits.
imdb = load_dataset("imdb") # load the IMDb dataset from the Hugging Face Hub
del imdb['unsupervised'] # keep only the labelled train and test splits
Downloading and preparing dataset imdb/plain_text to /root/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0...
Dataset imdb downloaded and prepared to /root/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0. Subsequent calls will reuse this data.
imdb["test"][0] # examine the first instance in test
{'text': 'I love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood. I tried to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original). Silly prosthetics, cheap cardboard sets, stilted dialogues, CG that doesn\'t match the background, and painfully one-dimensional characters cannot be overcome with a \'sci-fi\' setting. (I\'m sure there are those of you out there who think Babylon 5 is good sci-fi TV. It\'s not. It\'s clichéd and uninspiring.) While US viewers might like emotion and character development, sci-fi is a genre that does not take itself seriously (cf. Star Trek). It may treat important issues, yet not as a serious philosophy. It\'s really difficult to care about the characters here as they are not simply foolish, just missing a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of Earth KNOW it\'s rubbish as they have to always say "Gene Roddenberry\'s Earth..." otherwise people would not continue watching. Roddenberry\'s ashes must be turning in their orbit as this dull, cheap, poorly edited (watching it without advert breaks really brings this home) trudging Trabant of a show lumbers into space. Spoiler. So, kill off a main character. And then bring him back as another actor. Jeeez! Dallas all over again.', 'label': 0}
imdb.shape # inspect dimensions of the full data
{'train': (25000, 2), 'test': (25000, 2)}
Because fine-tuning on the entire IMDb dataset would be too resource-intensive to run in this practical, we will work with a randomly sampled 10% of the original train and test dataset size.
imdb_sample = imdb # note: this is a reference to the same DatasetDict, not a copy
imdb_sample['train'] = imdb['train'].shuffle(seed=42).select(range(int(0.1*len(imdb['train'])))) # 10% of train
imdb_sample['test'] = imdb['test'].shuffle(seed=42).select(range(int(0.1*len(imdb['test'])))) # 10% of test
imdb_sample.shape
{'train': (2500, 2), 'test': (2500, 2)}
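Because we shuffle before sampling, the 10% subsets should remain roughly balanced between negative (0) and positive (1) reviews; a quick sanity check:

from collections import Counter
print(Counter(imdb_sample['train']['label'])) # label counts in the sampled train split
print(Counter(imdb_sample['test']['label'])) # label counts in the sampled test split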
10. Create a preprocessing function to tokenize text and truncate sequences so they are no longer than DistilBERT's maximum input length. To apply the preprocessing function over the entire dataset, use the Datasets map function. You can speed up map by setting batched=True to process multiple elements of the dataset at once.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
def preprocess_function(examples):
return tokenizer(examples["text"], truncation=True)
tokenized_imdb = imdb_sample.map(preprocess_function, batched=True)
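After mapping, each example keeps its original text and label and gains the tokenizer output as new columns; you can verify this as follows:

print(tokenized_imdb['train'].column_names) # ['text', 'label', 'input_ids', 'attention_mask']
print(len(tokenized_imdb['train'][0]['input_ids'])) # number of tokens in the first training review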
11. Load the accuracy metric from the evaluate library to evaluate model performance. Define a function that takes the output predictions and true labels from the model, converts the predictions into class indices, and then calculates and returns the accuracy score.
accuracy = evaluate.load("accuracy")
def compute_metrics(eval_pred):
predictions, labels = eval_pred
predictions = np.argmax(predictions, axis=1)
return accuracy.compute(predictions=predictions, references=labels)
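To see what compute_metrics will return, you can call the accuracy metric directly on a toy example where two of three predictions match the references:

print(accuracy.compute(predictions=[0, 1, 1], references=[0, 1, 0])) # {'accuracy': 0.666...}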
12. Load the DistilBERT model we used earlier with AutoModelForSequenceClassification and fine-tune it on the IMDb dataset using the Trainer class. You can use the following training arguments:
training_args = TrainingArguments(
output_dir="tuned_model",
learning_rate=2e-5,
per_device_train_batch_size=16,
per_device_eval_batch_size=16,
num_train_epochs=2,
weight_decay=0.01,
evaluation_strategy="epoch",
logging_steps = 100,
save_strategy="epoch",
load_best_model_at_end=True,
push_to_hub=False)
We create two dictionaries, id2label and label2id, mapping class indices to label names and back.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
Fine-tuning the model should take around 5 minutes.
from transformers import set_seed
set_seed(137)
model = AutoModelForSequenceClassification.from_pretrained(
"distilbert-base-uncased-finetuned-sst-2-english", num_labels=2, id2label=id2label, label2id=label2id)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_imdb["train"],
eval_dataset=tokenized_imdb["test"],
tokenizer=tokenizer,
compute_metrics=compute_metrics)
trainer.train()
trainer.save_model()
This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
Epoch | Training Loss | Validation Loss | Accuracy |
---|---|---|---|
1 | 0.323600 | 0.233133 | 0.912000 |
2 | 0.145300 | 0.294243 | 0.908400 |
How do the results compare to the IMDb sentiment classification we performed using different neural network architectures?
13. Load the model you just fine-tuned into the pipeline and classify a sentence of choice.
classifier = pipeline("sentiment-analysis", model="tuned_model")
classifier("The movie was an experience.")
[{'label': 'POSITIVE', 'score': 0.9790797233581543}]
Let's compare the output to our initial model.
sentiment_pipeline = pipeline("sentiment-analysis", model = 'distilbert-base-uncased-finetuned-sst-2-english')
sentiment_pipeline("The movie was an experience.")
[{'label': 'POSITIVE', 'score': 0.9962561130523682}]
We see that the prediction of our fine-tuned model differs slightly from that of the original SST-2-trained model. Which one do you agree with?
Pre-trained transformer models have been made available for many different tasks and by many different people. It is important to be aware that there may be bias and other limitations in the models that could affect your results.
DistilBERT is known to produce biased predictions that target underrepresented populations. For instance, for sentences like This film was filmed in COUNTRY, DistilBERT for binary classification will give radically different probabilities for the positive label depending on the country (0.89 if the country is France, but 0.08 if the country is Afghanistan) when nothing in the input indicates such a strong semantic shift.
See:
sentiment_pipeline("French movie")
[{'label': 'POSITIVE', 'score': 0.9987333416938782}]
sentiment_pipeline("Iraqi movie")
[{'label': 'NEGATIVE', 'score': 0.6413735747337341}]
classifier("French movie")
[{'label': 'POSITIVE', 'score': 0.9880437254905701}]
classifier("Iraqi movie")
[{'label': 'POSITIVE', 'score': 0.6644718050956726}]
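To probe this bias yourself, you can loop over otherwise identical inputs that differ only in nationality and compare the two pipelines; a small sketch (exact scores will vary):

for text in ["French movie", "Iraqi movie", "Afghan movie", "American movie"]:
    print(text, sentiment_pipeline(text), classifier(text)) # original vs. fine-tuned model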
When in doubt, fine-tune and use feature importance measures!
Many code and quote blocks are adapted from the HuggingFace Documentation website. The website contains a lot of additional information and is a great resource for learners.