Lab 1: Recurrent, Convolutional and Bidirectional Neural Networks¶

logo

Transformers workshop¶

In this practical, we will apply RNN and CNN deep learning architectures on text sequence data. We will be going to use the Drug Review Dataset from drugs.com which is publicly availabe at the UCI Machine Learning repository. Below more information on the dataset:

  • The Drug Review Dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating reflecting the overall patient satisfaction.
  • The data was obtained by crawling online pharmaceutical review sites.
  • The dataset is of shape (161297, 7) i.e. It has 7 features including the review and 161297 Data Points or entries.
  • The features are 'drugName' which is the name of the drug, 'condition' which is the condition the patient is suffering from, 'review' is the patients review, 'rating' is the 10-star patient rating for the drug, 'date' is the date of the entry and the 'usefulcount' is the number of users who found the review usefule.

Let's first load the following libraries. Take care to have them installed!

In [1]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.preprocessing import LabelEncoder

from keras.utils import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras import layers, utils

from scikeras.wrappers import KerasClassifier
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import tensorflow as tf
import seaborn as sns
import pandas as pd
import numpy as np
import random
WARNING:tensorflow:From C:\Python\Python311\Lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

In [2]:
# set the seeds so we might be able to get the same results!
seed = 137
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)

Data exploration and visualization¶

1. Load the train and test sets from the data folder.

In [3]:
df_train = pd.read_csv("data/drugsComTrain_raw.tsv",sep='\t')
df_train.head()
Out[3]:
Unnamed: 0 drugName condition review rating date usefulCount
0 206461 Valsartan Left Ventricular Dysfunction "It has no side effect, I take it in combinati... 9.0 May 20, 2012 27
1 95260 Guanfacine ADHD "My son is halfway through his fourth week of ... 8.0 April 27, 2010 192
2 92703 Lybrel Birth Control "I used to take another oral contraceptive, wh... 5.0 December 14, 2009 17
3 138000 Ortho Evra Birth Control "This is my first time using any form of birth... 8.0 November 3, 2015 10
4 35696 Buprenorphine / naloxone Opiate Dependence "Suboxone has completely turned my life around... 9.0 November 27, 2016 37
In [4]:
df_test = pd.read_csv("data/drugsComTest_raw.tsv",sep='\t')
df_test.head()
Out[4]:
Unnamed: 0 drugName condition review rating date usefulCount
0 163740 Mirtazapine Depression "I've tried a few antidepressants over th... 10.0 February 28, 2012 22
1 206473 Mesalamine Crohn's Disease, Maintenance "My son has Crohn's disease and has done ... 8.0 May 17, 2009 17
2 159672 Bactrim Urinary Tract Infection "Quick reduction of symptoms" 9.0 September 29, 2017 3
3 39293 Contrave Weight Loss "Contrave combines drugs that were used for al... 9.0 March 5, 2017 35
4 97768 Cyclafem 1 / 35 Birth Control "I have been on this birth control for one cyc... 9.0 October 22, 2015 4

2. Check the number of reviews in train and test sets and the number of drug names.

In [5]:
df_train.shape
Out[5]:
(161297, 7)
In [6]:
df_test.shape
Out[6]:
(53766, 7)
In [7]:
len(df_train['drugName'].unique().tolist())
Out[7]:
3436
In [8]:
len(df_test['drugName'].unique().tolist())
Out[8]:
2637

3. Let's explore the data by the following wordcloud and barchart plots.

In [9]:
# let's see the words cloud for the reviews 
# most popular drugs
wordcloud = WordCloud(max_font_size = 25, max_words = 50, background_color = "white").generate(str(df_train['drugName']))
plt.figure()
plt.imshow(wordcloud, interpolation = "bilinear")
plt.axis("off")
plt.show()
No description has been provided for this image
In [10]:
# This barplot shows the top 20 drugs with the 10/10 rating
# Setting the Parameter
sns.set(font_scale = 1.2, style = 'darkgrid')
plt.rcParams['figure.figsize'] = [15, 8]

rating = dict(df_train.loc[df_train.rating == 10, "drugName"].value_counts())
drugname = list(rating.keys())
drug_rating = list(rating.values())

sns_rating = sns.barplot(x = drugname[0:20], y = drug_rating[0:20])

sns_rating.set_title('Top 20 drugs with 10/10 rating')
sns_rating.set_ylabel("Number of Ratings")
sns_rating.set_xlabel("Drug Names")
plt.setp(sns_rating.get_xticklabels(), rotation=90);
No description has been provided for this image

4. Convert the rating into three labels: positive, negative and neutral for the sentiment classification task.

In [11]:
df_train['label'] = 'neutral'
df_train.loc[df_train['rating'] >= 6, 'label'] = 'positive'
df_train.loc[df_train['rating'] <= 4, 'label'] = 'negative'
df_train.head()
Out[11]:
Unnamed: 0 drugName condition review rating date usefulCount label
0 206461 Valsartan Left Ventricular Dysfunction "It has no side effect, I take it in combinati... 9.0 May 20, 2012 27 positive
1 95260 Guanfacine ADHD "My son is halfway through his fourth week of ... 8.0 April 27, 2010 192 positive
2 92703 Lybrel Birth Control "I used to take another oral contraceptive, wh... 5.0 December 14, 2009 17 neutral
3 138000 Ortho Evra Birth Control "This is my first time using any form of birth... 8.0 November 3, 2015 10 positive
4 35696 Buprenorphine / naloxone Opiate Dependence "Suboxone has completely turned my life around... 9.0 November 27, 2016 37 positive
In [12]:
df_test['label'] = 'neutral'
df_test.loc[df_test['rating'] >= 6, 'label'] = 'positive'
df_test.loc[df_test['rating'] <= 4, 'label'] = 'negative'
df_test.head()
Out[12]:
Unnamed: 0 drugName condition review rating date usefulCount label
0 163740 Mirtazapine Depression "I&#039;ve tried a few antidepressants over th... 10.0 February 28, 2012 22 positive
1 206473 Mesalamine Crohn's Disease, Maintenance "My son has Crohn&#039;s disease and has done ... 8.0 May 17, 2009 17 positive
2 159672 Bactrim Urinary Tract Infection "Quick reduction of symptoms" 9.0 September 29, 2017 3 positive
3 39293 Contrave Weight Loss "Contrave combines drugs that were used for al... 9.0 March 5, 2017 35 positive
4 97768 Cyclafem 1 / 35 Birth Control "I have been on this birth control for one cyc... 9.0 October 22, 2015 4 positive

5. More preprocessing.

First run the following lines to make your data ready for the models (we used this code in the previous practical):

In [13]:
# tokenizer from keras
tokenizer = Tokenizer(num_words = 20000)
tokenizer.fit_on_texts(df_train.review.values)
X_train = tokenizer.texts_to_sequences(df_train.review.values)
X_test  = tokenizer.texts_to_sequences(df_test.review.values)
vocab_size = len(tokenizer.word_index) + 1  # Adding 1 because of reserved 0 index for sequence padding

# pad sequence
maxlen  = 100
X_train = pad_sequences(X_train, padding = 'post', maxlen = maxlen)
X_test  = pad_sequences(X_test,  padding = 'post', maxlen = maxlen)

# One-hot encoding the labels
lb = LabelEncoder()
y = lb.fit_transform(df_train.label.values)
y_train = utils.to_categorical(y)
y = lb.transform(df_test.label.values)
y_test = utils.to_categorical(y)

Let's use the code below which is a function for plotting the history of training of our neural network models:

In [14]:
plt.style.use('ggplot')
def plot_history(history, val=0):
    acc = history.history['accuracy']
    if val == 1:
        val_acc = history.history['val_accuracy'] # we can add a validation set in our fit function with nn
    loss = history.history['loss']
    if val == 1:
        val_loss = history.history['val_loss']
    x = range(1, len(acc) + 1)

    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(x, acc, 'b', label='Training accuracy')
    if val == 1:
        plt.plot(x, val_acc, 'r', label='Validation accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.title('Accuracy')
    plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(x, loss, 'b', label='Training loss')
    if val == 1:
        plt.plot(x, val_loss, 'r', label='Validation loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.title('Loss')
    plt.legend()

Recurrent neural networks¶

A recurrent neural network (RNN) is a natural generalization of feed-forward neural networks to sequence data such as text. In contrast to a feed-forward neural network, however, it accepts a new input at every time step (layer). Long-short term memory (LSTM) networks are a variant of RNNs. The LSTM introduces mechanisms to decide what should be remembered and what should be forgotten in learning from text documents.

6. Build a neural network model with an LSTM layer of 100 units. As before, the first layer should be an embedding layer, then the LSTM layer, a Dense layer, and the output Dense layer for the 5 news categories. Compile the model and print its summary.

In [15]:
embedding_dim = 100
model_rnn = Sequential()
model_rnn.add(layers.Embedding(vocab_size, embedding_dim, input_length = maxlen))
model_rnn.add(layers.LSTM(100, dropout = 0.2, recurrent_dropout = 0.2))
model_rnn.add(layers.Dense(10, activation = 'relu'))
model_rnn.add(layers.Dense(3, activation = 'softmax'))
model_rnn.compile(optimizer = 'adam',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
model_rnn.summary()
WARNING:tensorflow:From C:\Python\Python311\Lib\site-packages\keras\src\backend.py:873: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From C:\Python\Python311\Lib\site-packages\keras\src\optimizers\__init__.py:309: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (None, 100, 100)          5143000   
                                                                 
 lstm (LSTM)                 (None, 100)               80400     
                                                                 
 dense (Dense)               (None, 10)                1010      
                                                                 
 dense_1 (Dense)             (None, 3)                 33        
                                                                 
=================================================================
Total params: 5224443 (19.93 MB)
Trainable params: 5224443 (19.93 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

The first layer is the Embedded layer that uses 100 length vectors to represent each word. The next layer is the LSTM layer with 100 memory units (smart neurons!). Finally, because this is a classification problem we use a Dense output layer with 5 neurons and a softmax activation function to make 0 or 1 predictions for the five classes.

7. Fit the model in 5 epochs.

In [16]:
history_rnn = model_rnn.fit(X_train, y_train,
                        epochs = 5,
                        verbose = True,
                        validation_split = 0.1,
                        batch_size = 64)
Epoch 1/5
WARNING:tensorflow:From C:\Python\Python311\Lib\site-packages\keras\src\utils\tf_utils.py:492: The name tf.ragged.RaggedTensorValue is deprecated. Please use tf.compat.v1.ragged.RaggedTensorValue instead.

WARNING:tensorflow:From C:\Python\Python311\Lib\site-packages\keras\src\engine\base_layer_utils.py:384: The name tf.executing_eagerly_outside_functions is deprecated. Please use tf.compat.v1.executing_eagerly_outside_functions instead.

2269/2269 [==============================] - 322s 141ms/step - loss: 0.6573 - accuracy: 0.7354 - val_loss: 0.6459 - val_accuracy: 0.7620
Epoch 2/5
2269/2269 [==============================] - 318s 140ms/step - loss: 0.5227 - accuracy: 0.8063 - val_loss: 0.4845 - val_accuracy: 0.8244
Epoch 3/5
2269/2269 [==============================] - 340s 150ms/step - loss: 0.4252 - accuracy: 0.8478 - val_loss: 0.4525 - val_accuracy: 0.8359
Epoch 4/5
2269/2269 [==============================] - 351s 155ms/step - loss: 0.3698 - accuracy: 0.8691 - val_loss: 0.4266 - val_accuracy: 0.8474
Epoch 5/5
2269/2269 [==============================] - 348s 153ms/step - loss: 0.3216 - accuracy: 0.8869 - val_loss: 0.4210 - val_accuracy: 0.8520

8. Evaluate the accuracy of your model on the test set and plot the history of fit.

In [17]:
loss, accuracy = model_rnn.evaluate(X_test, y_test, verbose = True)
print("Testing Accuracy:  {:.4f}".format(accuracy))
plot_history(history_rnn, val = 1)
1681/1681 [==============================] - 23s 13ms/step - loss: 0.4144 - accuracy: 0.8542
Testing Accuracy:  0.8542
No description has been provided for this image

Convolutional neural networks¶

Convolutional neural networks or also called convnets are one of the most exciting developments in machine learning in recent years. They have revolutionized image classification and computer vision by being able to extract features from images and using them in neural networks. The properties that made them useful in image processing makes them also handy for sequence processing. When you are working with sequential data, like text, you work with one dimensional convolutions, but the idea and the application stays the same.

9. Build a neural network model with an convolution layer (Conv1D) of 128 units and window size of 5. As before, the first layer should be an embedding layer, then the CNN layers, a Dense layer, and the output Dense layer for the 5 news categories. Do you also need a pooling layer? Compile the model and print its summary.

In [18]:
model_cnn = Sequential()
model_cnn.add(layers.Embedding(vocab_size, embedding_dim, input_length = maxlen))
model_cnn.add(layers.Conv1D(128, 5, activation = 'relu'))
model_cnn.add(layers.GlobalMaxPooling1D())
model_cnn.add(layers.Dense(10, activation = 'relu'))
model_cnn.add(layers.Dense(3, activation = 'softmax'))
model_cnn.compile(optimizer = 'adam',
                  loss = 'binary_crossentropy',
                  metrics = ['accuracy'])
model_cnn.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_1 (Embedding)     (None, 100, 100)          5143000   
                                                                 
 conv1d (Conv1D)             (None, 96, 128)           64128     
                                                                 
 global_max_pooling1d (Glob  (None, 128)               0         
 alMaxPooling1D)                                                 
                                                                 
 dense_2 (Dense)             (None, 10)                1290      
                                                                 
 dense_3 (Dense)             (None, 3)                 33        
                                                                 
=================================================================
Total params: 5208451 (19.87 MB)
Trainable params: 5208451 (19.87 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

10. Fit the model in 5 epochs, and evaluate the accuracy of the test data using your model. Plot the history of the fit.

In [19]:
history_cnn = model_cnn.fit(X_train, y_train,
                        epochs = 5,
                        verbose = True,
                        validation_split = 0.1,
                        batch_size = 64)
Epoch 1/5
2269/2269 [==============================] - 96s 42ms/step - loss: 0.3115 - accuracy: 0.8016 - val_loss: 0.2689 - val_accuracy: 0.8365
Epoch 2/5
2269/2269 [==============================] - 92s 40ms/step - loss: 0.2120 - accuracy: 0.8792 - val_loss: 0.2328 - val_accuracy: 0.8667
Epoch 3/5
2269/2269 [==============================] - 91s 40ms/step - loss: 0.1443 - accuracy: 0.9185 - val_loss: 0.2198 - val_accuracy: 0.8791
Epoch 4/5
2269/2269 [==============================] - 92s 40ms/step - loss: 0.0925 - accuracy: 0.9464 - val_loss: 0.2446 - val_accuracy: 0.8877
Epoch 5/5
2269/2269 [==============================] - 91s 40ms/step - loss: 0.0567 - accuracy: 0.9708 - val_loss: 0.2642 - val_accuracy: 0.8923
In [20]:
loss, accuracy = model_cnn.evaluate(X_test, y_test, verbose = True)
print("Testing Accuracy:  {:.4f}".format(accuracy))
plot_history(history_cnn, val = 1)
1681/1681 [==============================] - 5s 3ms/step - loss: 0.2556 - accuracy: 0.8944
Testing Accuracy:  0.8944
No description has been provided for this image

Bidirectional recurrent neural networks¶

Bidirectional recurrent neural networks are neural networks that process input data in both the forward and backward directions. This time we will create our arctitechture based on the GRU (Gated Recurrent Unit) cells. Introduced by Cho, et al. in 2014, GRU aims to solve the vanishing gradient problem which comes with a standard recurrent neural network. GRU's are a variation on the LSTM because both are designed similarly and, in some cases, produce equally excellent results. GRU's were designed to be simpler and faster than LSTM's and in most cases produce equally good results and thus there is no clear winner.

11. Repeat the analysis with a Bidirectional recurrent neural network using GRU.

In [21]:
model_brnn = Sequential()
model_brnn.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model_brnn.add(layers.Bidirectional(layers.GRU(300)))
model_brnn.add(layers.Dense(10, activation = 'relu'))
model_brnn.add(layers.Dense(3, activation = 'softmax'))
model_brnn.compile(optimizer = 'adam',
                   loss = 'categorical_crossentropy',
                   metrics = ['accuracy'])
model_brnn.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_2 (Embedding)     (None, 100, 100)          5143000   
                                                                 
 bidirectional (Bidirection  (None, 600)               723600    
 al)                                                             
                                                                 
 dense_4 (Dense)             (None, 10)                6010      
                                                                 
 dense_5 (Dense)             (None, 3)                 33        
                                                                 
=================================================================
Total params: 5872643 (22.40 MB)
Trainable params: 5872643 (22.40 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [22]:
history_brnn = model_brnn.fit(X_train, y_train,
                              epochs = 5,
                              verbose = True,
                              validation_split = 0.1,
                              batch_size = 64)
Epoch 1/5
2269/2269 [==============================] - 502s 220ms/step - loss: 0.5030 - accuracy: 0.8113 - val_loss: 0.4523 - val_accuracy: 0.8305
Epoch 2/5
2269/2269 [==============================] - 489s 216ms/step - loss: 0.3699 - accuracy: 0.8674 - val_loss: 0.4084 - val_accuracy: 0.8507
Epoch 3/5
2269/2269 [==============================] - 495s 218ms/step - loss: 0.2903 - accuracy: 0.8946 - val_loss: 0.3969 - val_accuracy: 0.8633
Epoch 4/5
2269/2269 [==============================] - 490s 216ms/step - loss: 0.2200 - accuracy: 0.9208 - val_loss: 0.3946 - val_accuracy: 0.8668
Epoch 5/5
2269/2269 [==============================] - 488s 215ms/step - loss: 0.1621 - accuracy: 0.9424 - val_loss: 0.4444 - val_accuracy: 0.8717
In [23]:
loss, accuracy = model_brnn.evaluate(X_test, y_test, verbose = True)
print("Testing Accuracy:  {:.4f}".format(accuracy))
plot_history(history_brnn, val = 1)
1681/1681 [==============================] - 68s 41ms/step - loss: 0.4249 - accuracy: 0.8758
Testing Accuracy:  0.8758
No description has been provided for this image

Hyperparameter optimization (optional)¶

One crucial steps of deep learning and working with neural networks is hyperparameter optimization. Hyperparameters are parameters that are chosen by the algorithm designer. Tuning them is very important! One popular method for hyperparameter optimization is grid search. What this method does is it takes lists of parameters and it runs the model with each parameter combination that it can find. It is the most thorough way but also the most computationally heavy way to do this. Another common way, random search, which you’ll see in action here, simply takes random combinations of parameters.

12. Write function for creating your cnn-based model which has the number of filters, kernel size, and embedding size as input arguments. Name your function create_model. For the rest follow the architecture of your previous cnn model.

In [24]:
def create_model(num_filters, kernel_size, embedding_dim):
    model = Sequential()
    model.add(layers.Embedding(vocab_size, embedding_dim, input_length = 100))
    model.add(layers.Conv1D(num_filters, kernel_size, activation = 'relu'))
    model.add(layers.GlobalMaxPooling1D())
    model.add(layers.Dense(10, activation = 'relu'))
    model.add(layers.Dense(3, activation = 'sigmoid'))
    model.compile(optimizer = 'adam',
                  loss = 'binary_crossentropy',
                  metrics = ['accuracy'])
    return model

13. Dictionary in Python is an ordered collection of data values. Use the dict structure to define your hyperparameters for the cnn model. You can include the number of filters and kernel size as examples.

In [25]:
param_grid = dict(num_filters = [32, 64],
                  kernel_size = [3, 5])

When constructing this structure you must provide a dictionary of hyperparameters to evaluate in the param_grid argument. This is a map of the model parameter name and an array of values to try.

14. Use the KerasClassifier from scikeras to create your model with the create_model function, 15 epochs, and batch_size of 64.

In [26]:
# Parameter grid for grid search
# Hyperparameters to be tuned need to be added as arguments to KerasClassifier from scikeras (https://adriangb.com/scikeras/stable/migration.html#default-arguments-in-build-fn-model)
model = KerasClassifier(model = create_model,
                        epochs = 5,
                        batch_size = 64,
                        num_filters = 32, # hyperparameter 1
                        kernel_size = 3, # hyperparameter 2
                        embedding_dim = 50, # hyperparameter 3
                        verbose = True)

15. Time to call the RandomizedSearchCV function. Use your model, your selected grid for hyperparameters, and 5-fold cross-validation.

In [27]:
grid = RandomizedSearchCV(estimator = model,
                          param_distributions = param_grid,
                          cv = 5,
                          n_jobs = -1,
                          verbose = 1,
                          n_iter = 2)

As you see in this function, there are more input arguments which you can set including n_jobs and n_iter. By default, the random search (or grid search) will only use one thread. By setting the n_jobs argument in the RandomizedSearchCV (or GridSearchCV) constructor to -1, the process will use all cores on your machine. It is also worth mentioning that by default, accuracy is the score that is optimized, but other scores can be specified in the score argument of the GridSearchCV constructor.

16. Fit your grid on X_train and y_train.

In [28]:
grid_result = grid.fit(X_train, y_train)
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Epoch 1/5
2521/2521 [==============================] - 49s 19ms/step - loss: 0.3137 - accuracy: 0.8123
Epoch 2/5
2521/2521 [==============================] - 48s 19ms/step - loss: 0.2236 - accuracy: 0.8721
Epoch 3/5
2521/2521 [==============================] - 48s 19ms/step - loss: 0.1717 - accuracy: 0.9023
Epoch 4/5
2521/2521 [==============================] - 50s 20ms/step - loss: 0.1302 - accuracy: 0.9228
Epoch 5/5
2521/2521 [==============================] - 47s 19ms/step - loss: 0.0984 - accuracy: 0.9414

17. Find the best scores and the best values for the hyperparameters.

In [29]:
print(grid_result.best_score_)
print(grid_result.best_params_)
0.8723844470072674
{'num_filters': 64, 'kernel_size': 3}

The best_score_ attribute provides access to the best score observed during the optimization procedure and the best_params_ attribute shows the combination of parameters that achieved the best results.

18. Evaluate the performance on the test set.

In [30]:
test_accuracy = grid.score(X_test, y_test)
test_accuracy
841/841 [==============================] - 2s 2ms/step
Out[30]:
0.8807610757727932

Now you can use the best hyperparameter values to build your final model.