Open In Colab

Lab 2: Transformers for Text Analysis¶

logo

Transformers workshop¶

In this practical, we are going to work with transformers! Similar to the first practical, we will be going to use the Drug Review Dataset from drugs.com which is publicly availabe at the UCI Machine Learning repository.

More specifically, we are going to perform sentiment classification of drug reviews using a transformer model, have a look under its hood, try to explain the model predictions using Shapley Additive Explanations (SHAP), a method of explainable AI, and fine-tune the model.

Our BERT-variant of choice of transformers is DistilBERT, a light-weight transformer whose performance is comparable to Google's BERT base model.

In [1]:
from sklearn.metrics import f1_score, accuracy_score, confusion_matrix
import transformers
import pandas as pd
import numpy as np
import random
C:\Python\Python311\Lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
In [2]:
# set the seeds so we might be able to get the same results!
seed = 137
random.seed(seed)
np.random.seed(seed)
transformers.set_seed(seed)
WARNING:tensorflow:From C:\Python\Python311\Lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

Overview¶

  1. We will use an off-the-shelf sentiment analysis pipeline from the Hugging Face transformers module to classify two reviews.
  2. We will dissasemble the sentiment analysis pipeline by performing the same analysis as in Part 1 step-by-step.
  3. We will open the black box and explore which tokens were most important for DistilBERT's sentiment classification. We do this using Shapley Additive Explanations (SHAP).
  4. We will fine-tune DistilBERT on a sample of the Drug Review Dataset.

Load Data¶

Let's first load the data and run the preprocessing code from the first lab. Below more information on the dataset:

  • The Drug Review Dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating reflecting the overall patient satisfaction.
  • The data was obtained by crawling online pharmaceutical review sites.
  • The dataset is of shape (161297, 7) i.e. It has 7 features including the review and 161297 Data Points or entries.

The features are 'drugName' which is the name of the drug, 'condition' which is the condition the patient is suffering from, 'review' is the patients review, 'rating' is the 10-star patient rating for the drug, 'date' is the date of the entry and the 'usefulcount' is the number of users who found the review usefule.

In [3]:
# load the data
df_train = pd.read_csv("data/drugsComTrain_raw.tsv",sep='\t')
df_test = pd.read_csv("data/drugsComTest_raw.tsv",sep='\t')

# Prepare the label for training
df_train['label'] = 'neutral'
df_train.loc[df_train['rating'] >= 6, 'label'] = 'positive'
df_train.loc[df_train['rating'] < 6, 'label'] = 'negative'

# Prepare the label for test
df_test['label'] = 'neutral'
df_test.loc[df_test['rating'] >= 6, 'label'] = 'positive'
df_test.loc[df_test['rating'] < 6, 'label'] = 'negative'

Off-the-shelf Sentiment Analysis Pipeline¶

Since sentiment analysis is a popular application, there are off-the-shelf pipelines which we can use to quickly classify documents by sentiment. One such pipeline is part of the Hugging Face transformers module.

In this module, there are various pre-trained BERT models available for many different NLP tasks based on the General Language Understanding Evaluation (GLUE) benchmark resources.

To showcase how to use the sentiment analysis pipeline, we will compare two random drug reviews from our dataset:

In [4]:
review1 = df_train.loc[1, 'review']
review2 = df_train.loc[7456, 'review']

You can skim the reviews.

In [5]:
print(review1)
"My son is halfway through his fourth week of Intuniv. We became concerned when he began this last week, when he started taking the highest dose he will be on. For two days, he could hardly get out of bed, was very cranky, and slept for nearly 8 hours on a drive home from school vacation (very unusual for him.) I called his doctor on Monday morning and she said to stick it out a few days. See how he did at school, and with getting up in the morning. The last two days have been problem free. He is MUCH more agreeable than ever. He is less emotional (a good thing), less cranky. He is remembering all the things he should. Overall his behavior is better. 
We have tried many different medications and so far this is the most effective."
In [6]:
print(review2)
"I received the Implanon after giving birth to my first son. I thought this would be great because I forget to take the pill. Well it is great. I am not pregnant. However, I am having the WORST side effect of them all. I AM LOOSING MY HAIR. I have got to get it out. I walked in thinking that this would not happen to me, side effects never effect me. Good luck to the rest of you."

What is your guess of the sentiment of these reviews? On the scale 1-10, what rating do you think the respective authors gave the drug?

In [7]:
print("The rating for review1 is", df_train.loc[1, 'rating'])
print("The label for review1 is", df_train.loc[1, 'label'])
print("The rating for review2 is", df_train.loc[7456, 'rating'])
print("The label for review2 is", df_train.loc[7456, 'label'])
The rating for review1 is 8.0
The label for review1 is positive
The rating for review2 is 3.0
The label for review2 is negative

1. Set up and fit a sentiment analysis pipeline to predict the sentiment of the two reviews. Define the model as 'distilbert-base-uncased-finetuned-sst-2-english'

Our BERT model will be Distilbert base uncased. Uncased means that the model disregards casing (upper or lower case). In particular, we use a distilbert version which has been fine-tuned for binary sentiment classification using the Stanford Sentiment Treebank (SST-2; Pang and Lee, 2005) corpus.

In [8]:
sentiment_pipeline = transformers.pipeline("sentiment-analysis", model = 'distilbert-base-uncased-finetuned-sst-2-english')
In [9]:
sentiment_pipeline(review1) # predict sentiment
Out[9]:
[{'label': 'POSITIVE', 'score': 0.9062138795852661}]
In [10]:
sentiment_pipeline(review2) # predict sentiment
Out[10]:
[{'label': 'NEGATIVE', 'score': 0.9842180013656616}]

For each review, we see the output label and the associated probability.

Do you agree with model predictions?

Here is the ground truth:

Review 1 is a positive review with a rating of 8/10.

Review 2 is a negative review with a rating of 3/10.

Does this match your human prediction?

Now we are going to show you how to build your own sentiment analysis pipeline from scratch. In practice, you can use the already existing one as we just have above, but it might be helpful to understand the steps associated with setting up a transformer-based pipeline for other applications you might work on.

Sentiment Analysis Pipeline - Deconstructed¶

We perform the same sentiment analysis on the same two reviews - this time step-by-step.

2. Define the tokenizer and the model. For the tokenizer, use the pretrained DistilBERT tokenizer and for the model use distilbert-base-uncased-finetuned-sst-2-english.

In [11]:
tokenizer = transformers.DistilBertTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = transformers.DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

3. Tokenize the review1 and review2 objects. Pad and truncate the sequences, and return PyTorch (pt) tensors. Save the output object as encoding.

In [12]:
encoding = tokenizer([review1, review2], padding = True, truncation = True, return_tensors = 'pt') # tokenize the reviews

BERT and several other transformer models use tokenizers based on WordPiece, a subword tokenization algorithm. The main advantage of a subword tokenizer is that it interpolates between word-based and character-based tokenization. Common words get a slot in the vocabulary, but the tokenizer can fall back to word pieces and individual characters for unknown words.

Since batched inputs (our reviews) are of different lengths, they cannot be converted to fixed-size tensors to befed to the model.

There are two main strategies for solving this problem -- padding and truncation.

In order to create rectangular tensors from batches of varying lengths, padding adds a special padding token to ensure shorter sequences will have the same length as either the longest sequence in a batch or the maximum length accepted by the model. Truncation works in the other direction by truncating long sequences.

padding = True: pad to the longest sequence in the batch (no padding is applied if you only provide a single sequence).

truncation = True: truncate to a maximum length specified by the max_length argument or the maximum length accepted by the model if no max_length is provided (max_length=None).

4. Inspect the encoding object by prining the first reveiw's input ids.

In [13]:
print(encoding['input_ids'][0]) # first review's input_ids
tensor([  101,  1000,  2026,  2365,  2003,  8576,  2083,  2010,  2959,  2733,
         1997, 20014, 19496,  2615,  1012,  2057,  2150,  4986,  2043,  2002,
         2211,  2023,  2197,  2733,  1010,  2043,  2002,  2318,  2635,  1996,
         3284, 13004,  2002,  2097,  2022,  2006,  1012,  2005,  2048,  2420,
         1010,  2002,  2071,  6684,  2131,  2041,  1997,  2793,  1010,  2001,
         2200, 27987,  2100,  1010,  1998,  7771,  2005,  3053,  1022,  2847,
         2006,  1037,  3298,  2188,  2013,  2082, 10885,  1006,  2200,  5866,
         2005,  2032,  1012,  1007,  1045,  2170,  2010,  3460,  2006,  6928,
         2851,  1998,  2016,  2056,  2000,  6293,  2009,  2041,  1037,  2261,
         2420,  1012,  2156,  2129,  2002,  2106,  2012,  2082,  1010,  1998,
         2007,  2893,  2039,  1999,  1996,  2851,  1012,  1996,  2197,  2048,
         2420,  2031,  2042,  3291,  2489,  1012,  2002,  2003,  2172,  2062,
         5993,  3085,  2084,  2412,  1012,  2002,  2003,  2625,  6832,  1006,
         1037,  2204,  2518,  1007,  1010,  2625, 27987,  2100,  1012,  2002,
         2003, 10397,  2035,  1996,  2477,  2002,  2323,  1012,  3452,  2010,
         5248,  2003,  2488,  1012,  2057,  2031,  2699,  2116,  2367, 20992,
         1998,  2061,  2521,  2023,  2003,  1996,  2087,  4621,  1012,  1000,
          102])

We see that BERT assigns a unique id to each token (input_ids).

5. Convert first review's input ids to tokens using convert_ids_to_tokens to see how the text got tokenized.

In [14]:
print(tokenizer.convert_ids_to_tokens(encoding['input_ids'][0])) # first review's tokens
['[CLS]', '"', 'my', 'son', 'is', 'halfway', 'through', 'his', 'fourth', 'week', 'of', 'int', '##uni', '##v', '.', 'we', 'became', 'concerned', 'when', 'he', 'began', 'this', 'last', 'week', ',', 'when', 'he', 'started', 'taking', 'the', 'highest', 'dose', 'he', 'will', 'be', 'on', '.', 'for', 'two', 'days', ',', 'he', 'could', 'hardly', 'get', 'out', 'of', 'bed', ',', 'was', 'very', 'crank', '##y', ',', 'and', 'slept', 'for', 'nearly', '8', 'hours', 'on', 'a', 'drive', 'home', 'from', 'school', 'vacation', '(', 'very', 'unusual', 'for', 'him', '.', ')', 'i', 'called', 'his', 'doctor', 'on', 'monday', 'morning', 'and', 'she', 'said', 'to', 'stick', 'it', 'out', 'a', 'few', 'days', '.', 'see', 'how', 'he', 'did', 'at', 'school', ',', 'and', 'with', 'getting', 'up', 'in', 'the', 'morning', '.', 'the', 'last', 'two', 'days', 'have', 'been', 'problem', 'free', '.', 'he', 'is', 'much', 'more', 'agree', '##able', 'than', 'ever', '.', 'he', 'is', 'less', 'emotional', '(', 'a', 'good', 'thing', ')', ',', 'less', 'crank', '##y', '.', 'he', 'is', 'remembering', 'all', 'the', 'things', 'he', 'should', '.', 'overall', 'his', 'behavior', 'is', 'better', '.', 'we', 'have', 'tried', 'many', 'different', 'medications', 'and', 'so', 'far', 'this', 'is', 'the', 'most', 'effective', '.', '"', '[SEP]']

Note that BERT-based models also operate with special tokens:


Token Token ID Meaning
[CLS] 101 Beginning of input
[SEP] 102 End of input or sentence
[MASK] 103 Masked tokens the model should predict
[PAD] 0 Padding
[UNK] 100 Unknown token not in training data

6. Predict the sentiment of the two reviews. In order to do this, import torch, and define the output object using the model, input ids and attention mask.

Now we are ready to do some sentiment prediction. We import torch, define our model by feeding it input_ids and attention_mask. The attention mask is a binary tensor indicating the position of the padded indices so that the model does not attend to them.

In [15]:
# prediction of sentiment
import torch
output = model(input_ids = encoding['input_ids'], attention_mask = encoding['attention_mask'])
In [16]:
print("Predicted logits:\n\n", output['logits']) # logits
Predicted logits:

 tensor([[-1.0547,  1.2136],
        [ 2.2059, -1.9270]], grad_fn=<AddmmBackward0>)
In [17]:
print("Predicted probabilities:\n\n", torch.nn.functional.softmax(output['logits'], dim=-1)) # from logits to probabilities
Predicted probabilities:

 tensor([[0.0938, 0.9062],
        [0.9842, 0.0158]], grad_fn=<SoftmaxBackward0>)
In [18]:
prediction = torch.argmax(output['logits'], 1) # from logits to a binary class
print("Predicted classes:\n", prediction)
Predicted classes:
 tensor([1, 0])

How do the output sentiments and probabilities compare to the off-the-shelf sentiment classification pipeline we used in the previous section?

Feature Importance with SHAP¶

Now that we have classified our two reviews, we might want to explain DistilBERT's predictions using Shapley Additive Values (SHAP).

7. Install the shap module, import shap.Explainer and feed it the sentiment_pipeline model. Pass the two drug reviews as input for the explainer.

Note. The computation of Shapley values for DistilBERT explaining our two reviews should take couple of minutes, but can be very computationally intensive in most real life applications.

In [19]:
# !pip install -q shap
import shap
explainer = shap.Explainer(sentiment_pipeline)
shap_values = explainer([review1, review2])
PartitionExplainer explainer:  50%|███████████████████████████████                               | 1/2 [00:00<?, ?it/s]
  0%|                                                                                          | 0/498 [00:00<?, ?it/s]
 17%|█████████████▊                                                                  | 86/498 [00:00<00:02, 205.16it/s]
 22%|█████████████████▋                                                              | 110/498 [00:02<00:08, 43.87it/s]
 24%|███████████████████▌                                                            | 122/498 [00:02<00:11, 32.89it/s]
 26%|████████████████████▌                                                           | 128/498 [00:03<00:12, 29.01it/s]
 27%|█████████████████████▌                                                          | 134/498 [00:03<00:14, 25.50it/s]
 28%|██████████████████████▍                                                         | 140/498 [00:04<00:15, 22.71it/s]
 29%|███████████████████████▍                                                        | 146/498 [00:04<00:17, 20.60it/s]
 31%|████████████████████████▍                                                       | 152/498 [00:04<00:18, 18.83it/s]
 32%|█████████████████████████▍                                                      | 158/498 [00:05<00:19, 17.70it/s]
 33%|██████████████████████████▎                                                     | 164/498 [00:05<00:20, 16.62it/s]
 34%|███████████████████████████▎                                                    | 170/498 [00:06<00:20, 16.00it/s]
 35%|████████████████████████████▎                                                   | 176/498 [00:06<00:20, 15.62it/s]
 37%|█████████████████████████████▏                                                  | 182/498 [00:07<00:20, 15.18it/s]
 38%|██████████████████████████████▏                                                 | 188/498 [00:07<00:20, 14.91it/s]
 39%|███████████████████████████████▏                                                | 194/498 [00:07<00:20, 14.96it/s]
 40%|████████████████████████████████▏                                               | 200/498 [00:08<00:20, 14.76it/s]
 41%|█████████████████████████████████                                               | 206/498 [00:08<00:19, 14.81it/s]
 43%|██████████████████████████████████                                              | 212/498 [00:09<00:19, 14.57it/s]
 44%|███████████████████████████████████                                             | 218/498 [00:09<00:19, 14.38it/s]
 45%|███████████████████████████████████▉                                            | 224/498 [00:09<00:19, 14.35it/s]
 46%|████████████████████████████████████▉                                           | 230/498 [00:10<00:18, 14.33it/s]
 47%|█████████████████████████████████████▉                                          | 236/498 [00:10<00:18, 14.37it/s]
 49%|██████████████████████████████████████▉                                         | 242/498 [00:11<00:17, 14.53it/s]
 50%|███████████████████████████████████████▊                                        | 248/498 [00:11<00:17, 14.29it/s]
 51%|████████████████████████████████████████▊                                       | 254/498 [00:12<00:16, 14.39it/s]
 52%|█████████████████████████████████████████▊                                      | 260/498 [00:12<00:16, 14.38it/s]
 53%|██████████████████████████████████████████▋                                     | 266/498 [00:12<00:16, 14.38it/s]
 55%|███████████████████████████████████████████▋                                    | 272/498 [00:13<00:15, 14.38it/s]
 56%|████████████████████████████████████████████▋                                   | 278/498 [00:13<00:15, 14.39it/s]
 57%|█████████████████████████████████████████████▌                                  | 284/498 [00:14<00:14, 14.40it/s]
 58%|██████████████████████████████████████████████▌                                 | 290/498 [00:14<00:14, 14.45it/s]
 59%|███████████████████████████████████████████████▌                                | 296/498 [00:14<00:14, 14.39it/s]
 61%|████████████████████████████████████████████████▌                               | 302/498 [00:15<00:13, 14.56it/s]
 62%|█████████████████████████████████████████████████▍                              | 308/498 [00:15<00:13, 14.44it/s]
 63%|██████████████████████████████████████████████████▍                             | 314/498 [00:16<00:12, 14.33it/s]
 64%|███████████████████████████████████████████████████▍                            | 320/498 [00:16<00:12, 14.44it/s]
 65%|████████████████████████████████████████████████████▎                           | 326/498 [00:17<00:11, 14.51it/s]
 67%|█████████████████████████████████████████████████████▎                          | 332/498 [00:17<00:11, 14.45it/s]
 68%|██████████████████████████████████████████████████████▎                         | 338/498 [00:17<00:11, 14.46it/s]
 69%|███████████████████████████████████████████████████████▎                        | 344/498 [00:18<00:10, 14.40it/s]
 70%|████████████████████████████████████████████████████████▏                       | 350/498 [00:18<00:10, 14.41it/s]
 71%|█████████████████████████████████████████████████████████▏                      | 356/498 [00:19<00:09, 14.30it/s]
 73%|██████████████████████████████████████████████████████████▏                     | 362/498 [00:19<00:09, 14.35it/s]
 74%|███████████████████████████████████████████████████████████                     | 368/498 [00:19<00:08, 14.52it/s]
 75%|████████████████████████████████████████████████████████████                    | 374/498 [00:20<00:08, 14.03it/s]
 76%|█████████████████████████████████████████████████████████████                   | 380/498 [00:20<00:08, 14.21it/s]
 78%|██████████████████████████████████████████████████████████████                  | 386/498 [00:21<00:07, 14.27it/s]
 79%|██████████████████████████████████████████████████████████████▉                 | 392/498 [00:21<00:07, 14.29it/s]
 80%|███████████████████████████████████████████████████████████████▉                | 398/498 [00:22<00:07, 14.20it/s]
 81%|████████████████████████████████████████████████████████████████▉               | 404/498 [00:22<00:06, 14.23it/s]
 82%|█████████████████████████████████████████████████████████████████▊              | 410/498 [00:22<00:06, 14.29it/s]
 84%|██████████████████████████████████████████████████████████████████▊             | 416/498 [00:23<00:05, 14.16it/s]
 85%|███████████████████████████████████████████████████████████████████▊            | 422/498 [00:23<00:05, 14.23it/s]
 86%|████████████████████████████████████████████████████████████████████▊           | 428/498 [00:24<00:04, 14.27it/s]
 87%|█████████████████████████████████████████████████████████████████████▋          | 434/498 [00:24<00:04, 14.19it/s]
 88%|██████████████████████████████████████████████████████████████████████▋         | 440/498 [00:25<00:04, 14.22it/s]
 90%|███████████████████████████████████████████████████████████████████████▋        | 446/498 [00:25<00:03, 14.09it/s]
 91%|████████████████████████████████████████████████████████████████████████▌       | 452/498 [00:25<00:03, 14.05it/s]
 92%|█████████████████████████████████████████████████████████████████████████▌      | 458/498 [00:26<00:02, 14.13it/s]
 93%|██████████████████████████████████████████████████████████████████████████▌     | 464/498 [00:26<00:02, 14.15it/s]
 94%|███████████████████████████████████████████████████████████████████████████▌    | 470/498 [00:27<00:01, 14.11it/s]
 96%|████████████████████████████████████████████████████████████████████████████▍   | 476/498 [00:27<00:01, 13.95it/s]
 97%|█████████████████████████████████████████████████████████████████████████████▍  | 482/498 [00:27<00:01, 14.14it/s]
 98%|██████████████████████████████████████████████████████████████████████████████▍ | 488/498 [00:28<00:00, 14.06it/s]
 99%|███████████████████████████████████████████████████████████████████████████████▎| 494/498 [00:28<00:00, 14.18it/s]
500it [00:29, 14.23it/s]                                                                                               
504it [00:29, 14.09it/s]
PartitionExplainer explainer: 3it [01:40, 50.47s/it]                                                                   

8. A nice thing about the shap module is that it comes with a built-in visualizer. Use shap.plots.text to visualize the shap values for the first and the second drug review.

In [20]:
shap.plots.text(shap_values[0]) # first review
outputs
NEGATIVE
POSITIVE


0.40.1-0.2-0.50.710.5828070.582807base value00fNEGATIVE(inputs)0.124 He is less emotional 0.105 (a good thing) 0.019 was very cranky 0.019 when he started taking the highest dose he 0.017 she said to stick it out a few days. 0.017 I called his doctor on Monday morning and 0.017 The last two days have been problem free. 0.016 "My son is halfway through his fourth week of 0.016 Intuniv. 0.015 For two days, 0.014 hours on a drive home from school vacation 0.013 See how he did at school, 0.013 We became concerned when 0.012 , 0.011 and slept for nearly 8 0.009 he began this last week, 0.009 (very unusual for him.) 0.009 We have tried many different medications and 0.008 will be on. 0.007 , 0.007 he could hardly get out of bed, 0.005 " 0.004 . 0.003 He is remembering all the things he should. Overall his behavior is better. 0.003 and with getting up in the morning. -0.436 the most effective -0.183 agreeable -0.146 this is -0.111 than ever -0.054 . -0.051 is -0.04 more -0.027 MUCH -0.015 He -0.01 less cranky. -0.003 so far
inputs
0.016 / 11
"My son is halfway through his fourth week of
0.016 / 4
Intuniv.
0.013 / 4
We became concerned when
0.009 / 6
he began this last week,
0.019 / 8
when he started taking the highest dose he
0.008 / 4
will be on.
0.015 / 4
For two days,
0.007 / 8
he could hardly get out of bed,
0.019 / 4
was very cranky
0.007
,
0.011 / 5
and slept for nearly 8
0.014 / 8
hours on a drive home from school vacation
0.009 / 7
(very unusual for him.)
0.017 / 8
I called his doctor on Monday morning and
0.017 / 10
she said to stick it out a few days.
0.013 / 7
See how he did at school,
0.003 / 8
and with getting up in the morning.
0.017 / 9
The last two days have been problem free.
-0.015
He
-0.051
is
-0.027
MUCH
-0.04
more
-0.183 / 2
agreeable
-0.111 / 2
than ever
0.004
.
0.124 / 4
He is less emotional
0.105 / 5
(a good thing)
0.012
,
-0.01 / 4
less cranky.
0.003 / 15
He is remembering all the things he should. Overall his behavior is better.
0.009 / 7
We have tried many different medications and
-0.003 / 2
so far
-0.146 / 2
this is
-0.436 / 3
the most effective
-0.054
.
0.005 / 2
"
0.30-0.30.60.90.5828070.582807base value00fNEGATIVE(inputs)0.124 He is less emotional 0.105 (a good thing) 0.019 was very cranky 0.019 when he started taking the highest dose he 0.017 she said to stick it out a few days. 0.017 I called his doctor on Monday morning and 0.017 The last two days have been problem free. 0.016 "My son is halfway through his fourth week of 0.016 Intuniv. 0.015 For two days, 0.014 hours on a drive home from school vacation 0.013 See how he did at school, 0.013 We became concerned when 0.012 , 0.011 and slept for nearly 8 0.009 he began this last week, 0.009 (very unusual for him.) 0.009 We have tried many different medications and 0.008 will be on. 0.007 , 0.007 he could hardly get out of bed, 0.005 " 0.004 . 0.003 He is remembering all the things he should. Overall his behavior is better. 0.003 and with getting up in the morning. -0.436 the most effective -0.183 agreeable -0.146 this is -0.111 than ever -0.054 . -0.051 is -0.04 more -0.027 MUCH -0.015 He -0.01 less cranky. -0.003 so far
inputs
0.016 / 11
"My son is halfway through his fourth week of
0.016 / 4
Intuniv.
0.013 / 4
We became concerned when
0.009 / 6
he began this last week,
0.019 / 8
when he started taking the highest dose he
0.008 / 4
will be on.
0.015 / 4
For two days,
0.007 / 8
he could hardly get out of bed,
0.019 / 4
was very cranky
0.007
,
0.011 / 5
and slept for nearly 8
0.014 / 8
hours on a drive home from school vacation
0.009 / 7
(very unusual for him.)
0.017 / 8
I called his doctor on Monday morning and
0.017 / 10
she said to stick it out a few days.
0.013 / 7
See how he did at school,
0.003 / 8
and with getting up in the morning.
0.017 / 9
The last two days have been problem free.
-0.015
He
-0.051
is
-0.027
MUCH
-0.04
more
-0.183 / 2
agreeable
-0.111 / 2
than ever
0.004
.
0.124 / 4
He is less emotional
0.105 / 5
(a good thing)
0.012
,
-0.01 / 4
less cranky.
0.003 / 15
He is remembering all the things he should. Overall his behavior is better.
0.009 / 7
We have tried many different medications and
-0.003 / 2
so far
-0.146 / 2
this is
-0.436 / 3
the most effective
-0.054
.
0.005 / 2
"
0.40.1-0.2-0.50.7100base value0.9062140.906214fPOSITIVE(inputs)0.363 the most effective 0.215 agreeable 0.188 this is 0.137 than ever 0.065 is 0.059 . 0.048 more 0.041 MUCH 0.038 so far 0.037 He 0.017 " 0.012 See how he did at school, 0.011 We have tried many different medications and 0.009 and with getting up in the morning. 0.008 0.001 . -0.14 He is less emotional -0.084 (a good thing) -0.028 Overall his behavior is better. -0.016 He is remembering all the things he should. -0.012 less cranky. -0.012 The last two days -0.011 I called his doctor on Monday morning and she said to stick it out a few days. -0.01 and slept for nearly 8 hours on a drive home from school vacation (very unusual for him.) -0.01 have been problem free. -0.009 , -0.005 "My son is halfway through his fourth week of Intuniv. -0.003 We became concerned when he began this last week, when he started taking the highest dose he will be on. -0.002 For two days, he could hardly get out of bed, was very cranky,
inputs
-0.005 / 15
"My son is halfway through his fourth week of Intuniv.
-0.003 / 22
We became concerned when he began this last week, when he started taking the highest dose he will be on.
-0.002 / 17
For two days, he could hardly get out of bed, was very cranky,
-0.01 / 20
and slept for nearly 8 hours on a drive home from school vacation (very unusual for him.)
-0.011 / 18
I called his doctor on Monday morning and she said to stick it out a few days.
0.012 / 7
See how he did at school,
0.009 / 8
and with getting up in the morning.
-0.012 / 4
The last two days
-0.01 / 5
have been problem free.
0.037
He
0.065
is
0.041
MUCH
0.048
more
0.215 / 2
agreeable
0.137 / 2
than ever
0.001
.
-0.14 / 4
He is less emotional
-0.084 / 5
(a good thing)
-0.009
,
-0.012 / 4
less cranky.
-0.016 / 9
He is remembering all the things he should.
-0.028 / 6
Overall his behavior is better.
0.011 / 7
We have tried many different medications and
0.038 / 2
so far
0.188 / 2
this is
0.363 / 3
the most effective
0.059
.
0.017
"
0.008
0.50.2-0.10.81.100base value0.9062140.906214fPOSITIVE(inputs)0.363 the most effective 0.215 agreeable 0.188 this is 0.137 than ever 0.065 is 0.059 . 0.048 more 0.041 MUCH 0.038 so far 0.037 He 0.017 " 0.012 See how he did at school, 0.011 We have tried many different medications and 0.009 and with getting up in the morning. 0.008 0.001 . -0.14 He is less emotional -0.084 (a good thing) -0.028 Overall his behavior is better. -0.016 He is remembering all the things he should. -0.012 less cranky. -0.012 The last two days -0.011 I called his doctor on Monday morning and she said to stick it out a few days. -0.01 and slept for nearly 8 hours on a drive home from school vacation (very unusual for him.) -0.01 have been problem free. -0.009 , -0.005 "My son is halfway through his fourth week of Intuniv. -0.003 We became concerned when he began this last week, when he started taking the highest dose he will be on. -0.002 For two days, he could hardly get out of bed, was very cranky,
inputs
-0.005 / 15
"My son is halfway through his fourth week of Intuniv.
-0.003 / 22
We became concerned when he began this last week, when he started taking the highest dose he will be on.
-0.002 / 17
For two days, he could hardly get out of bed, was very cranky,
-0.01 / 20
and slept for nearly 8 hours on a drive home from school vacation (very unusual for him.)
-0.011 / 18
I called his doctor on Monday morning and she said to stick it out a few days.
0.012 / 7
See how he did at school,
0.009 / 8
and with getting up in the morning.
-0.012 / 4
The last two days
-0.01 / 5
have been problem free.
0.037
He
0.065
is
0.041
MUCH
0.048
more
0.215 / 2
agreeable
0.137 / 2
than ever
0.001
.
-0.14 / 4
He is less emotional
-0.084 / 5
(a good thing)
-0.009
,
-0.012 / 4
less cranky.
-0.016 / 9
He is remembering all the things he should.
-0.028 / 6
Overall his behavior is better.
0.011 / 7
We have tried many different medications and
0.038 / 2
so far
0.188 / 2
this is
0.363 / 3
the most effective
0.059
.
0.017
"
0.008
In [21]:
shap.plots.text(shap_values[1]) # second review
outputs
NEGATIVE
POSITIVE


00000000000000000000.5213740.521374base value0.9842180.984218fNEGATIVE(inputs)0.162 never effect 0.122 LO 0.101 OSING 0.092 WORST side effect of 0.08 Implan 0.078 . 0.059 I 0.059 AM 0.055 . 0.043 HAIR 0.039 get it out. 0.038 got to 0.034 MY 0.034 take the 0.033 on 0.033 pill 0.033 would not 0.03 after giving birth to 0.028 side effects 0.028 them all. 0.028 I walked in thinking 0.027 I 0.025 me 0.025 However 0.023 I received the 0.021 that this 0.019 " 0.014 have 0.009 I am having the 0.009 forget to 0.006 . 0.005 , 0.005 I 0.0 happen to me, -0.172 Good -0.11 is great -0.1 luck -0.081 . -0.077 not pregnant -0.073 to the -0.044 Well it -0.044 . -0.035 . -0.03 I am -0.027 my first -0.027 great -0.022 rest of -0.022 you -0.013 this -0.013 would -0.011 thought -0.009 son -0.005 I -0.005 because -0.005 . -0.004 " -0.004 be
inputs
0.019 / 2
"
0.023 / 3
I received the
0.08 / 2
Implan
0.033
on
0.03 / 4
after giving birth to
-0.027 / 2
my first
-0.009
son
-0.035
.
0.005
I
-0.011
thought
-0.013
this
-0.013
would
-0.004
be
-0.027
great
-0.005
because
-0.005
I
0.009 / 2
forget to
0.034 / 2
take the
0.033
pill
0.055
.
-0.044 / 2
Well it
-0.11 / 2
is great
-0.081
.
-0.03 / 2
I am
-0.077 / 2
not pregnant
0.006
.
0.025
However
0.005
,
0.009 / 4
I am having the
0.092 / 4
WORST side effect of
0.028 / 3
them all.
0.059
I
0.059
AM
0.122
LO
0.101
OSING
0.034
MY
0.043
HAIR
0.078
.
0.027
I
0.014
have
0.038 / 2
got to
0.039 / 4
get it out.
0.028 / 4
I walked in thinking
0.021 / 2
that this
0.033 / 2
would not
0.0 / 4
happen to me,
0.028 / 2
side effects
0.162 / 2
never effect
0.025
me
-0.005
.
-0.172
Good
-0.1
luck
-0.073 / 2
to the
-0.022 / 2
rest of
-0.022
you
-0.044
.
-0.004 / 2
"
0.80.40-0.41.21.60.5213740.521374base value0.9842180.984218fNEGATIVE(inputs)0.162 never effect 0.122 LO 0.101 OSING 0.092 WORST side effect of 0.08 Implan 0.078 . 0.059 I 0.059 AM 0.055 . 0.043 HAIR 0.039 get it out. 0.038 got to 0.034 MY 0.034 take the 0.033 on 0.033 pill 0.033 would not 0.03 after giving birth to 0.028 side effects 0.028 them all. 0.028 I walked in thinking 0.027 I 0.025 me 0.025 However 0.023 I received the 0.021 that this 0.019 " 0.014 have 0.009 I am having the 0.009 forget to 0.006 . 0.005 , 0.005 I 0.0 happen to me, -0.172 Good -0.11 is great -0.1 luck -0.081 . -0.077 not pregnant -0.073 to the -0.044 Well it -0.044 . -0.035 . -0.03 I am -0.027 my first -0.027 great -0.022 rest of -0.022 you -0.013 this -0.013 would -0.011 thought -0.009 son -0.005 I -0.005 because -0.005 . -0.004 " -0.004 be
inputs
0.019 / 2
"
0.023 / 3
I received the
0.08 / 2
Implan
0.033
on
0.03 / 4
after giving birth to
-0.027 / 2
my first
-0.009
son
-0.035
.
0.005
I
-0.011
thought
-0.013
this
-0.013
would
-0.004
be
-0.027
great
-0.005
because
-0.005
I
0.009 / 2
forget to
0.034 / 2
take the
0.033
pill
0.055
.
-0.044 / 2
Well it
-0.11 / 2
is great
-0.081
.
-0.03 / 2
I am
-0.077 / 2
not pregnant
0.006
.
0.025
However
0.005
,
0.009 / 4
I am having the
0.092 / 4
WORST side effect of
0.028 / 3
them all.
0.059
I
0.059
AM
0.122
LO
0.101
OSING
0.034
MY
0.043
HAIR
0.078
.
0.027
I
0.014
have
0.038 / 2
got to
0.039 / 4
get it out.
0.028 / 4
I walked in thinking
0.021 / 2
that this
0.033 / 2
would not
0.0 / 4
happen to me,
0.028 / 2
side effects
0.162 / 2
never effect
0.025
me
-0.005
.
-0.172
Good
-0.1
luck
-0.073 / 2
to the
-0.022 / 2
rest of
-0.022
you
-0.044
.
-0.004 / 2
"
000000000000000000000base value00fPOSITIVE(inputs)0.198 Good 0.109 luck 0.106 is great 0.078 . 0.075 not pregnant 0.049 to the 0.043 . 0.042 . 0.039 Well it 0.038 I am 0.037 rest of 0.037 you 0.03 my first 0.023 great 0.018 thought 0.014 son 0.012 this 0.012 would 0.011 because 0.01 I 0.004 " 0.002 side effects 0.001 I -0.153 never effect -0.1 LO -0.071 Implan -0.071 OSING -0.067 WORST side effect of -0.047 However -0.043 . -0.035 . -0.03 AM -0.03 I -0.03 would not -0.029 I walked in thinking -0.028 got to -0.026 get it out. -0.026 pill -0.022 MY -0.022 HAIR -0.02 take the -0.019 I -0.019 that this -0.018 them all. -0.017 , -0.016 I am having the -0.014 on -0.008 have -0.007 me -0.006 forget to -0.005 . -0.004 . -0.003 happen to me, -0.002 be -0.002 after giving birth to
inputs
0.0
0.0
"
0.0
I
0.0
received
0.0
the
-0.071 / 2
Implan
-0.014
on
-0.002 / 4
after giving birth to
0.03 / 2
my first
0.014
son
0.043
.
0.001
I
0.018
thought
0.012
this
0.012
would
-0.002
be
0.023
great
0.011
because
0.01
I
-0.006 / 2
forget to
-0.02 / 2
take the
-0.026
pill
-0.035
.
0.039 / 2
Well it
0.106 / 2
is great
0.078
.
0.038 / 2
I am
0.075 / 2
not pregnant
-0.005
.
-0.047
However
-0.017
,
-0.016 / 4
I am having the
-0.067 / 4
WORST side effect of
-0.018 / 3
them all.
-0.03
I
-0.03
AM
-0.1
LO
-0.071
OSING
-0.022
MY
-0.022
HAIR
-0.043
.
-0.019
I
-0.008
have
-0.028 / 2
got to
-0.026 / 4
get it out.
-0.029 / 4
I walked in thinking
-0.019 / 2
that this
-0.03 / 2
would not
-0.003 / 4
happen to me,
0.002 / 2
side effects
-0.153 / 2
never effect
-0.007
me
-0.004
.
0.198
Good
0.109
luck
0.049 / 2
to the
0.037 / 2
rest of
0.037
you
0.042
.
0.004 / 2
"
0-0.3-0.6-0.90.30.60.900base value00fPOSITIVE(inputs)0.198 Good 0.109 luck 0.106 is great 0.078 . 0.075 not pregnant 0.049 to the 0.043 . 0.042 . 0.039 Well it 0.038 I am 0.037 rest of 0.037 you 0.03 my first 0.023 great 0.018 thought 0.014 son 0.012 this 0.012 would 0.011 because 0.01 I 0.004 " 0.002 side effects 0.001 I -0.153 never effect -0.1 LO -0.071 Implan -0.071 OSING -0.067 WORST side effect of -0.047 However -0.043 . -0.035 . -0.03 AM -0.03 I -0.03 would not -0.029 I walked in thinking -0.028 got to -0.026 get it out. -0.026 pill -0.022 MY -0.022 HAIR -0.02 take the -0.019 I -0.019 that this -0.018 them all. -0.017 , -0.016 I am having the -0.014 on -0.008 have -0.007 me -0.006 forget to -0.005 . -0.004 . -0.003 happen to me, -0.002 be -0.002 after giving birth to
inputs
0.0
0.0
"
0.0
I
0.0
received
0.0
the
-0.071 / 2
Implan
-0.014
on
-0.002 / 4
after giving birth to
0.03 / 2
my first
0.014
son
0.043
.
0.001
I
0.018
thought
0.012
this
0.012
would
-0.002
be
0.023
great
0.011
because
0.01
I
-0.006 / 2
forget to
-0.02 / 2
take the
-0.026
pill
-0.035
.
0.039 / 2
Well it
0.106 / 2
is great
0.078
.
0.038 / 2
I am
0.075 / 2
not pregnant
-0.005
.
-0.047
However
-0.017
,
-0.016 / 4
I am having the
-0.067 / 4
WORST side effect of
-0.018 / 3
them all.
-0.03
I
-0.03
AM
-0.1
LO
-0.071
OSING
-0.022
MY
-0.022
HAIR
-0.043
.
-0.019
I
-0.008
have
-0.028 / 2
got to
-0.026 / 4
get it out.
-0.029 / 4
I walked in thinking
-0.019 / 2
that this
-0.03 / 2
would not
-0.003 / 4
happen to me,
0.002 / 2
side effects
-0.153 / 2
never effect
-0.007
me
-0.004
.
0.198
Good
0.109
luck
0.049 / 2
to the
0.037 / 2
rest of
0.037
you
0.042
.
0.004 / 2
"

Features highlighted in red are increasing the predicted probability, while features highlighted in blue are lowering the predicted probability.

Fine-tuning BERT using the Drug Review Dataset¶

Now let's fine-tune a sentiment analysis of the Drug Review dataset using the off-the-shelf sentiment analysis pipeline.

Since the DistilBERT we are using was trained on the Stanford Sentiment Treebank dataset, we also fine-tune our model for the Drug Review dataset. In practice, this might not be neccesary for this particular application, but might be good to see how it can be done.

In [22]:
from datasets import load_dataset

from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

import evaluate
import numpy as np

9. Using the codes below, inspect the dataset and sample up to 10% of the train and test, because fine-tuning the entire dataset would be too resource-intensive to run in this practical.

In [23]:
df_train.head()
Out[23]:
Unnamed: 0 drugName condition review rating date usefulCount label
0 206461 Valsartan Left Ventricular Dysfunction "It has no side effect, I take it in combinati... 9.0 May 20, 2012 27 positive
1 95260 Guanfacine ADHD "My son is halfway through his fourth week of ... 8.0 April 27, 2010 192 positive
2 92703 Lybrel Birth Control "I used to take another oral contraceptive, wh... 5.0 December 14, 2009 17 negative
3 138000 Ortho Evra Birth Control "This is my first time using any form of birth... 8.0 November 3, 2015 10 positive
4 35696 Buprenorphine / naloxone Opiate Dependence "Suboxone has completely turned my life around... 9.0 November 27, 2016 37 positive
In [24]:
df_train['label'].hist()
Out[24]:
<Axes: >
No description has been provided for this image
In [25]:
df_train = df_train[['label','review']]
df_test = df_test[['label','review']]
In [26]:
# target_map = { 'positive': 1, 'negative': 0, 'neutral': 2}
target_map = { 'positive': 1, 'negative': 0}
df_train['target'] = df_train['label'].map(target_map)
df_test['target'] = df_test['label'].map(target_map)
In [27]:
# Save data to new csv file. Because transformers required special format of dataset to perform operations on it, 
# which we will give using load_dataset class. 
df1 = df_train[['review','target']]
df1.columns = ['sentence','label']
df1.to_csv('train.csv', index = False)
In [28]:
df1 = df_test[['review','target']]
df1.columns = ['sentence','label']
df1.to_csv('test.csv', index = False)
In [29]:
raw_dataset = load_dataset('csv', 
              data_files = { 'train': 'train.csv',
                             'test': 'test.csv'})
Downloading data files: 100%|████████████████████████████████████████████████████████████████████| 2/2 [00:00<?, ?it/s]
Extracting data files: 100%|████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 149.91it/s]
Generating train split: 161297 examples [00:01, 158894.96 examples/s]
Generating test split: 53766 examples [00:00, 154972.56 examples/s]
In [30]:
raw_dataset
Out[30]:
DatasetDict({
    train: Dataset({
        features: ['sentence', 'label'],
        num_rows: 161297
    })
    test: Dataset({
        features: ['sentence', 'label'],
        num_rows: 53766
    })
})
In [31]:
# Because fine-tuning on the dataset would be too resource-intensive to run in this practical, 
# we will work with a randomly sampled 2% of the original train and test dataset size.
drug_data = raw_dataset
drug_data['train'] = drug_data['train'].shuffle(seed=42).select(range(int(0.02*len(raw_dataset['train']))))
drug_data['test'] = drug_data['test'].shuffle(seed=42).select(range(int(0.02*len(raw_dataset['test']))))
drug_data
Out[31]:
DatasetDict({
    train: Dataset({
        features: ['sentence', 'label'],
        num_rows: 3225
    })
    test: Dataset({
        features: ['sentence', 'label'],
        num_rows: 1075
    })
})

10. Create a preprocessing function to tokenize text and truncate sequences to be no longer than DistilBERT’s maximum input length. To apply the preprocessing function over the entire dataset, use Datasets map function. You can speed up map by setting batched=True to process multiple elements of the dataset at once.

In [32]:
# Import AutoTokenizer and create tokenizer object
from transformers import AutoTokenizer
#checkpoint = 'bert-base-cased'
checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'
tokernizer = AutoTokenizer.from_pretrained(checkpoint)
In [33]:
def tokenize_fn(batch):
  return tokernizer(batch['sentence'], truncation = True)
In [34]:
tokenized_data = drug_data.map(tokenize_fn, batched=True)
Map: 100%|████████████████████████████████████████████████████████████████| 3225/3225 [00:01<00:00, 2150.29 examples/s]
Map: 100%|████████████████████████████████████████████████████████████████| 1075/1075 [00:00<00:00, 2480.38 examples/s]

12. Load the DistilBERT model we had used earlier using AutoModelForSequenceClassification and look at the summary of the model.

In [35]:
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels = 2)

Let's install torchinfo which is a Python library for getting information about PyTorch models and tensors. It provides a convenient way to inspect the architecture of a PyTorch model, including the shape and size of the tensors that are passed between the layers, as well as the number of parameters and memory usage of each layer.

In [36]:
from torchinfo import summary
summary(model)
Out[36]:
================================================================================
Layer (type:depth-idx)                                  Param #
================================================================================
DistilBertForSequenceClassification                     --
├─DistilBertModel: 1-1                                  --
│    └─Embeddings: 2-1                                  --
│    │    └─Embedding: 3-1                              23,440,896
│    │    └─Embedding: 3-2                              393,216
│    │    └─LayerNorm: 3-3                              1,536
│    │    └─Dropout: 3-4                                --
│    └─Transformer: 2-2                                 --
│    │    └─ModuleList: 3-5                             42,527,232
├─Linear: 1-2                                           590,592
├─Linear: 1-3                                           1,538
├─Dropout: 1-4                                          --
================================================================================
Total params: 66,955,010
Trainable params: 66,955,010
Non-trainable params: 0
================================================================================

13. Use the code below ot setup the training arguments and define the metrics.

In [37]:
training_args = TrainingArguments(output_dir='training_dir',
                                  evaluation_strategy='epoch',
                                  save_strategy='epoch',
                                  num_train_epochs=2,
                                  per_device_train_batch_size=10,
                                  per_device_eval_batch_size=10)
In [38]:
def compute_metrics(logits_and_labels):
  logits, labels = logits_and_labels
  predictions = np.argmax(logits, axis=-1)
  acc = np.mean(predictions == labels)
  f1 = f1_score(labels, predictions, average = 'micro')
  return {'accuracy': acc, 'f1_score': f1}

14. Fine-tune model using the Trainer function. You can use the following training arguments.

In [39]:
trainer = Trainer(model,
                  training_args,
                  train_dataset = tokenized_data["train"],
                  eval_dataset = tokenized_data["test"],
                  tokenizer=tokernizer,
                  compute_metrics=compute_metrics)
In [40]:
trainer.train()
You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[646/646 10:10, Epoch 2/2]
Epoch Training Loss Validation Loss Accuracy F1 Score
1 No log 0.342572 0.849302 0.849302
2 0.356900 0.561416 0.852093 0.852093

Out[40]:
TrainOutput(global_step=646, training_loss=0.3219169345064429, metrics={'train_runtime': 613.5648, 'train_samples_per_second': 10.512, 'train_steps_per_second': 1.053, 'total_flos': 337819032738540.0, 'train_loss': 0.3219169345064429, 'epoch': 2.0})
In [41]:
# the code here can be uncommented to clear the memory if you are using your machine for the practical
# import torch
# torch.cuda.empty_cache()

# import gc
# del df1, df_train, df_test
# gc.collect()
# torch.cuda.memory_summary(device=None, abbreviated=False)

15. Load the model you just fine-tuned into a pipeline and classify a sentence of choice.

In [42]:
from transformers import pipeline
fine_tuned_model = pipeline('text-classification',
                       model = 'training_dir/checkpoint-82')
In [43]:
fine_tuned_model(review1)
Out[43]:
[{'label': 'POSITIVE', 'score': 0.9956554174423218}]
In [44]:
fine_tuned_model(review2)
Out[44]:
[{'label': 'POSITIVE', 'score': 0.9846466183662415}]
In [45]:
fine_tuned_model("It was awful.")
Out[45]:
[{'label': 'NEGATIVE', 'score': 0.9894874095916748}]
In [46]:
fine_tuned_model("The drug has many painful side effects.")
Out[46]:
[{'label': 'NEGATIVE', 'score': 0.9243993759155273}]
In [47]:
predictions = fine_tuned_model(raw_dataset['test']['sentence'])
In [49]:
# predictions

Further reading¶

  • How to fine-tune a model: https://huggingface.co/docs/transformers/training
  • Fine-Tuning Transformers with custom dataset: https://medium.com/@lokaregns/fine-tuning-transformers-with-custom-dataset-classification-task-f261579ae068