Practical 7: RNN vs CNN¶

Ayoub Bagheri¶

logo

Applied Text Mining - Utrecht Summer School¶

In this practical, we will try RNN and CNN deep learning architectures. We will work with the famous 20 Newsgroups dataset from the sklearn library to apply deep learning models using the Keras package.

Today we will use the following libraries. Take care to have them installed!

In [18]:
#!pip install -q scikeras

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import fetch_20newsgroups
from sklearn.preprocessing import LabelEncoder

from keras.utils import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras import layers, utils

from scikeras.wrappers import KerasClassifier

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

First run the following lines to make your data ready for the models (we used this code in the previous practical):

In [19]:
# select couple of the categories in 20newsgroups
categories = ['rec.sport.hockey', 'talk.politics.mideast', 'soc.religion.christian', 'comp.graphics', 'sci.med']
# fetch the training set
twenty_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'),
                                  categories=categories, shuffle=True, random_state=321)
# fetch the test set
twenty_test = fetch_20newsgroups(subset='test', remove=('headers', 'footers', 'quotes'),
                                 categories=categories, shuffle=True, random_state=321)
# convert to a dataframe
df_train = pd.DataFrame(list(zip(twenty_train.data, twenty_train.target)), columns=['text', 'label'])
df_test = pd.DataFrame(list(zip(twenty_test.data, twenty_test.target)), columns=['text', 'label'])
# tokenizer from keras
tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(df_train.text.values)
X_train = tokenizer.texts_to_sequences(df_train.text.values)
X_test = tokenizer.texts_to_sequences(df_test.text.values)
vocab_size = len(tokenizer.word_index) + 1  # Adding 1 because of reserved 0 index for sequence padding
# pad sequence
maxlen = 100
X_train = pad_sequences(X_train, padding='post', maxlen=maxlen)
X_test = pad_sequences(X_test, padding='post', maxlen=maxlen)
# Encode the list of newsgroups into categorical integer values
lb = LabelEncoder()
y = lb.fit_transform(df_train.label.values)
y_train = utils.np_utils.to_categorical(y)
y = lb.transform(df_test.label.values)
y_test = utils.np_utils.to_categorical(y)

And here the code to a function for plotting the history of training for our neural network model:

In [20]:
plt.style.use('ggplot')
def plot_history(history, val=0):
    acc = history.history['accuracy']
    if val == 1:
        val_acc = history.history['val_accuracy'] # we can add a validation set in our fit function with nn
    loss = history.history['loss']
    if val == 1:
        val_loss = history.history['val_loss']
    x = range(1, len(acc) + 1)

    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(x, acc, 'b', label='Training accuracy')
    if val == 1:
        plt.plot(x, val_acc, 'r', label='Validation accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.title('Accuracy')
    plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(x, loss, 'b', label='Training loss')
    if val == 1:
        plt.plot(x, val_loss, 'r', label='Validation loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.title('Loss')
    plt.legend()

Let's get started!¶

Recurrent neural networks¶

A recurrent neural network (RNN) is a natural generalization of feed-forward neural networks to sequence data such as text. In contrast to a feed-forward neural network, however, it accepts a new input at every time step (layer). Long-short term memory (LSTM) networks are a variant of RNNs. The LSTM introduces mechanisms to decide what should be remembered and what should be forgotten in learning from text documents.

1. Build a neural network model with an LSTM layer of 100 units. As before, the first layer should be an embedding layer, then the LSTM layer, a Dense layer, and the output Dense layer for the 5 news categories. Compile the model and print its summary.

In [21]:
from numpy.random import seed
seed(1)
import tensorflow
tensorflow.random.set_seed(2)

embedding_dim = 100
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(5, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_5 (Embedding)     (None, 100, 100)          3811100   
                                                                 
 lstm_2 (LSTM)               (None, 100)               80400     
                                                                 
 dense_10 (Dense)            (None, 10)                1010      
                                                                 
 dense_11 (Dense)            (None, 5)                 55        
                                                                 
=================================================================
Total params: 3,892,565
Trainable params: 3,892,565
Non-trainable params: 0
_________________________________________________________________

The first layer is the Embedded layer that uses 100 length vectors to represent each word. The next layer is the LSTM layer with 100 memory units (smart neurons!). Finally, because this is a classification problem we use a Dense output layer with 5 neurons and a softmax activation function to make 0 or 1 predictions for the five classes.

2.Fit the model in 5 epochs.

In [22]:
import random
import numpy as np
import tensorflow as tf
seed = 137
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)


history = model.fit(X_train, y_train,
                    epochs=5,
                    verbose=False,
                    validation_data=(X_test, y_test),
                    batch_size=64)

3.Evaluate the accuracy of training and test data using your model and plot the history of fit.

In [23]:
loss, accuracy = model.evaluate(X_train, y_train, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy:  {:.4f}".format(accuracy))
plot_history(history, val=1)
Training Accuracy: 0.6491
Testing Accuracy:  0.5480

Below we trained the model for 20 epochs. Check the results:

In [7]:
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(5, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    epochs=20,
                    verbose=False,
                    validation_data=(X_test, y_test),
                    batch_size=64)
loss, accuracy = model.evaluate(X_train, y_train, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy:  {:.4f}".format(accuracy))
plot_history(history, val=1)
Training Accuracy: 0.8552
Testing Accuracy:  0.7319

Convolutional neural networks¶

Convolutional neural networks or also called convnets are one of the most exciting developments in machine learning in recent years. They have revolutionized image classification and computer vision by being able to extract features from images and using them in neural networks. The properties that made them useful in image processing makes them also handy for sequence processing. When you are working with sequential data, like text, you work with one dimensional convolutions, but the idea and the application stays the same.

4. Build a neural network model with an convolution layer (Conv1D) of 128 units and window size of 5. As before, the first layer should be an embedding layer, then the CNN layers, a Dense layer, and the output Dense layer for the 5 news categories. Do you also need a pooling layer? Compile the model and print its summary.

In [8]:
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.Conv1D(128, 5, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(5, activation='softmax'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_2 (Embedding)     (None, 100, 100)          3811100   
                                                                 
 conv1d (Conv1D)             (None, 96, 128)           64128     
                                                                 
 global_max_pooling1d (Globa  (None, 128)              0         
 lMaxPooling1D)                                                  
                                                                 
 dense_4 (Dense)             (None, 10)                1290      
                                                                 
 dense_5 (Dense)             (None, 5)                 55        
                                                                 
=================================================================
Total params: 3,876,573
Trainable params: 3,876,573
Non-trainable params: 0
_________________________________________________________________

5. Fit the model in 5 epochs, and evaluate the accuracy of the training and test data using your model. Plot the history of the fit.

In [9]:
history = model.fit(X_train, y_train,
                    epochs=5,
                    verbose=False,
                    validation_data=(X_test, y_test),
                    batch_size=64)
loss, accuracy = model.evaluate(X_train, y_train, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy:  {:.4f}".format(accuracy))
plot_history(history, val=1)
Training Accuracy: 0.9735
Testing Accuracy:  0.8325

Below we trained the model for 10 epochs. Check the results:

In [ ]:
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.Conv1D(128, 5, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(20, activation='relu'))
model.add(layers.Dense(5, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    epochs=10,
                    verbose=False,
                    validation_data=(X_test, y_test),
                    batch_size=64)
loss, accuracy = model.evaluate(X_train, y_train, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy:  {:.4f}".format(accuracy))
plot_history(history, val=1)

6. How do you compare the performance of the two models also with regard to 5 and 20 iterations?

Look at the plots and discuss your ideas!

Hyperparameter Optimization¶

One crucial steps of deep learning and working with neural networks is hyperparameter optimization. Hyperparameters are parameters that are chosen by the algorithm designer. Tuning them is very important! One popular method for hyperparameter optimization is grid search. What this method does is it takes lists of parameters and it runs the model with each parameter combination that it can find. It is the most thorough way but also the most computationally heavy way to do this. Another common way, random search, which you’ll see in action here, simply takes random combinations of parameters.

7. Write function for creating your cnn-based model which has the number of filters, kernel size, and embedding size as input arguments. Name your function create_model. For the rest follow the architecture of your previous cnn model.

In [ ]:
def create_model(num_filters, kernel_size, embedding_dim):
    model = Sequential()
    model.add(layers.Embedding(vocab_size, embedding_dim, input_length=100))
    model.add(layers.Conv1D(num_filters, kernel_size, activation='relu'))
    model.add(layers.GlobalMaxPooling1D())
    model.add(layers.Dense(10, activation='relu'))
    model.add(layers.Dense(5, activation='sigmoid'))
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

8. Dictionary in Python is an ordered collection of data values. Use the dict structure to define your hyperparameters for the cnn model. You can include the number of filters, kernel size, and embedding size.

In [12]:
param_grid = dict(num_filters=[32, 64, 128],
                  kernel_size=[3, 5, 7],
                  embedding_dim=[50, 100])

When constructing this structure you must provide a dictionary of hyperparameters to evaluate in the param_grid argument. This is a map of the model parameter name and an array of values to try.

9. Use the KerasClassifier from scikeras to create your model with the create_model function, 15 epochs, and batch_size of 64.

In [ ]:
# Parameter grid for grid search
# Hyperparameters to be tuned need to be added as arguments to KerasClassifier from scikeras (https://adriangb.com/scikeras/stable/migration.html#default-arguments-in-build-fn-model)
model = KerasClassifier(model=create_model,
                  epochs = 15,
                  batch_size=64,
                  num_filters = 32, # hyperparameter 1
                  kernel_size = 3, # hyperparameter 2
                  embedding_dim = 50, # hyperparameter 3
                  verbose=True)

10. Time to call the RandomizedSearchCV function. Use your model, your selected grid for hyperparameters, and 5-fold cross-validation.

In [14]:
grid = RandomizedSearchCV(estimator=model,
                          param_distributions=param_grid,
                          cv=5,
                          n_jobs=-1,
                          verbose=1,
                          n_iter=2)

As you see in this function, there are more input arguments which you can set including n_jobs and n_iter. By default, the random search (or grid search) will only use one thread. By setting the n_jobs argument in the RandomizedSearchCV (or GridSearchCV) constructor to -1, the process will use all cores on your machine. It is also worth mentioning that by default, accuracy is the score that is optimized, but other scores can be specified in the score argument of the GridSearchCV constructor.

11. Fit your grid on X_train and y_train.

In [15]:
grid_result = grid.fit(X_train, y_train)
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Epoch 1/15
46/46 [==============================] - 2s 31ms/step - loss: 0.6099 - accuracy: 0.2047
Epoch 2/15
46/46 [==============================] - 1s 32ms/step - loss: 0.5030 - accuracy: 0.2897
Epoch 3/15
46/46 [==============================] - 1s 30ms/step - loss: 0.4478 - accuracy: 0.4604
Epoch 4/15
46/46 [==============================] - 2s 41ms/step - loss: 0.3578 - accuracy: 0.7001
Epoch 5/15
46/46 [==============================] - 2s 33ms/step - loss: 0.2618 - accuracy: 0.8511
Epoch 6/15
46/46 [==============================] - 1s 32ms/step - loss: 0.1644 - accuracy: 0.9157
Epoch 7/15
46/46 [==============================] - 1s 31ms/step - loss: 0.0963 - accuracy: 0.9415
Epoch 8/15
46/46 [==============================] - 1s 32ms/step - loss: 0.0591 - accuracy: 0.9629
Epoch 9/15
46/46 [==============================] - 1s 31ms/step - loss: 0.0392 - accuracy: 0.9731
Epoch 10/15
46/46 [==============================] - 1s 31ms/step - loss: 0.0281 - accuracy: 0.9759
Epoch 11/15
46/46 [==============================] - 1s 32ms/step - loss: 0.0222 - accuracy: 0.9793
Epoch 12/15
46/46 [==============================] - 2s 43ms/step - loss: 0.0190 - accuracy: 0.9793
Epoch 13/15
46/46 [==============================] - 1s 32ms/step - loss: 0.0172 - accuracy: 0.9776
Epoch 14/15
46/46 [==============================] - 1s 31ms/step - loss: 0.0162 - accuracy: 0.9796
Epoch 15/15
46/46 [==============================] - 1s 31ms/step - loss: 0.0153 - accuracy: 0.9799

12. Find the best scores and the best values for the hyperparameters.

In [24]:
print(grid_result.best_score_)
print(grid_result.best_params_)
0.8510706489726619
{'num_filters': 128, 'kernel_size': 3, 'embedding_dim': 50}

The best_score_ attribute provides access to the best score observed during the optimization procedure and the best_params_ attribute shows the combination of parameters that achieved the best results.

13. Evaluate the performance on the test set.

In [25]:
test_accuracy = grid.score(X_test, y_test)
test_accuracy
31/31 [==============================] - 0s 6ms/step
Out[25]:
0.8166496424923391

Now you can use the best hyperparameter values to build your final model. Do that as your fun homework!