In this practical, we will try RNN and CNN deep learning architectures. We will work with the famous 20 Newsgroups dataset from the sklearn library to apply deep learning models using the Keras package.
Today we will use the following libraries. Take care to have them installed!
#!pip install -q scikeras
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import fetch_20newsgroups
from sklearn.preprocessing import LabelEncoder
from keras.utils import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras import layers, utils
from scikeras.wrappers import KerasClassifier
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
First run the following lines to make your data ready for the models (we used this code in the previous practical):
# select a few of the categories in 20newsgroups
categories = ['rec.sport.hockey', 'talk.politics.mideast', 'soc.religion.christian', 'comp.graphics', 'sci.med']
# fetch the training set
twenty_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'),
categories=categories, shuffle=True, random_state=321)
# fetch the test set
twenty_test = fetch_20newsgroups(subset='test', remove=('headers', 'footers', 'quotes'),
categories=categories, shuffle=True, random_state=321)
# convert to a dataframe
df_train = pd.DataFrame(list(zip(twenty_train.data, twenty_train.target)), columns=['text', 'label'])
df_test = pd.DataFrame(list(zip(twenty_test.data, twenty_test.target)), columns=['text', 'label'])
# tokenizer from keras
tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(df_train.text.values)
X_train = tokenizer.texts_to_sequences(df_train.text.values)
X_test = tokenizer.texts_to_sequences(df_test.text.values)
vocab_size = len(tokenizer.word_index) + 1 # Adding 1 because of reserved 0 index for sequence padding
# pad sequence
maxlen = 100
X_train = pad_sequences(X_train, padding='post', maxlen=maxlen)
X_test = pad_sequences(X_test, padding='post', maxlen=maxlen)
# Encode the list of newsgroups into categorical integer values
lb = LabelEncoder()
y = lb.fit_transform(df_train.label.values)
y_train = utils.to_categorical(y)
y = lb.transform(df_test.label.values)
y_test = utils.to_categorical(y)
And here is the code for a function that plots the training history of our neural network model:
plt.style.use('ggplot')
def plot_history(history, val=0):
    acc = history.history['accuracy']
    if val == 1:
        val_acc = history.history['val_accuracy']  # we can add a validation set in our fit() call
    loss = history.history['loss']
    if val == 1:
        val_loss = history.history['val_loss']
    x = range(1, len(acc) + 1)

    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(x, acc, 'b', label='Training accuracy')
    if val == 1:
        plt.plot(x, val_acc, 'r', label='Validation accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.title('Accuracy')
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(x, loss, 'b', label='Training loss')
    if val == 1:
        plt.plot(x, val_loss, 'r', label='Validation loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.title('Loss')
    plt.legend()
A recurrent neural network (RNN) is a natural generalization of feed-forward neural networks to sequence data such as text. In contrast to a feed-forward neural network, however, it accepts a new input at every time step (layer). Long short-term memory (LSTM) networks are a variant of RNNs. The LSTM introduces mechanisms that decide what should be remembered and what should be forgotten when learning from text documents.
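Before building the full model, here is a minimal, illustrative sketch (with made-up toy data, not the newsgroups) of how an LSTM layer consumes a sequence step by step and what it returns: by default only the final hidden state, or the hidden state at every time step when return_sequences=True.
# Toy illustration of LSTM input/output shapes (random data, for illustration only)
import numpy as np
from keras import layers
toy_batch = np.random.rand(2, 10, 8).astype("float32")          # 2 sequences, 10 time steps, 8 features per step
last_state = layers.LSTM(16)(toy_batch)                          # only the final hidden state
all_states = layers.LSTM(16, return_sequences=True)(toy_batch)   # hidden state at every time step
print(last_state.shape)   # (2, 16)
print(all_states.shape)   # (2, 10, 16)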
1. Build a neural network model with an LSTM layer of 100 units. As before, the first layer should be an embedding layer, then the LSTM layer, a Dense layer, and the output Dense layer for the 5 news categories. Compile the model and print its summary.
from numpy.random import seed
seed(1)
import tensorflow
tensorflow.random.set_seed(2)
embedding_dim = 100
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(5, activation='softmax'))
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.summary()
Model: "sequential_5" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= embedding_5 (Embedding) (None, 100, 100) 3811100 lstm_2 (LSTM) (None, 100) 80400 dense_10 (Dense) (None, 10) 1010 dense_11 (Dense) (None, 5) 55 ================================================================= Total params: 3,892,565 Trainable params: 3,892,565 Non-trainable params: 0 _________________________________________________________________
The first layer is the Embedding layer, which represents each word with a vector of length 100. The next layer is the LSTM layer with 100 memory units (smart neurons!). A small Dense layer follows, and because this is a classification problem we use a Dense output layer with 5 neurons and a softmax activation function, which outputs a probability for each of the five classes.
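As a sanity check, the parameter counts in the summary above can be reproduced by hand with the standard formulas (this is just arithmetic, not part of the practical):
# Where the parameter counts in model.summary() come from
units = 100                                                          # LSTM units used above
emb_params = vocab_size * embedding_dim                              # one vector per word (incl. the padding index): 3,811,100
lstm_params = 4 * (embedding_dim * units + units * units + units)    # 4 gates, each with input, recurrent and bias weights: 80,400
dense1_params = units * 10 + 10                                      # 1,010
dense2_params = 10 * 5 + 5                                           # 55
print(emb_params + lstm_params + dense1_params + dense2_params)      # 3,892,565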
2. Fit the model for 5 epochs.
import random
import numpy as np
import tensorflow as tf
seed = 137
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)
history = model.fit(X_train, y_train,
epochs=5,
verbose=False,
validation_data=(X_test, y_test),
batch_size=64)
3. Evaluate the accuracy on the training and test data using your model and plot the history of the fit.
loss, accuracy = model.evaluate(X_train, y_train, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy: {:.4f}".format(accuracy))
plot_history(history, val=1)
Training Accuracy: 0.6491
Testing Accuracy: 0.5480
Below we trained the model for 20 epochs. Check the results:
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(5, activation='softmax'))
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
history = model.fit(X_train, y_train,
epochs=20,
verbose=False,
validation_data=(X_test, y_test),
batch_size=64)
loss, accuracy = model.evaluate(X_train, y_train, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy: {:.4f}".format(accuracy))
plot_history(history, val=1)
Training Accuracy: 0.8552
Testing Accuracy: 0.7319
Convolutional neural networks, also called convnets, are one of the most exciting developments in machine learning in recent years. They have revolutionized image classification and computer vision by being able to extract features from images and use them in neural networks. The properties that make them useful in image processing also make them handy for sequence processing. When you work with sequential data, like text, you use one-dimensional convolutions, but the idea and the application stay the same.
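Here is a minimal, illustrative sketch (with made-up toy data) of what a one-dimensional convolution followed by global max pooling does to a batch of embedded documents; the shapes match the model you will build below.
# Toy illustration of Conv1D + GlobalMaxPooling1D shapes (random data, for illustration only)
import numpy as np
from keras import layers
toy_batch = np.random.rand(2, 100, 100).astype("float32")        # 2 documents, 100 time steps, 100-dim embeddings
conv_out = layers.Conv1D(128, 5, activation='relu')(toy_batch)    # slide 128 filters over windows of 5 words
pooled = layers.GlobalMaxPooling1D()(conv_out)                    # keep the strongest response of each filter
print(conv_out.shape)  # (2, 96, 128): 100 - 5 + 1 = 96 window positions
print(pooled.shape)    # (2, 128): one value per filter, independent of document length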
4. Build a neural network model with a convolution layer (Conv1D) of 128 filters and a window size of 5. As before, the first layer should be an embedding layer, followed by the CNN layer, a Dense layer, and the output Dense layer for the 5 news categories. Do you also need a pooling layer? Compile the model and print its summary.
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.Conv1D(128, 5, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(5, activation='softmax'))
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
model.summary()
Model: "sequential_2" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= embedding_2 (Embedding) (None, 100, 100) 3811100 conv1d (Conv1D) (None, 96, 128) 64128 global_max_pooling1d (Globa (None, 128) 0 lMaxPooling1D) dense_4 (Dense) (None, 10) 1290 dense_5 (Dense) (None, 5) 55 ================================================================= Total params: 3,876,573 Trainable params: 3,876,573 Non-trainable params: 0 _________________________________________________________________
5. Fit the model for 5 epochs, and evaluate the accuracy on the training and test data using your model. Plot the history of the fit.
history = model.fit(X_train, y_train,
epochs=5,
verbose=False,
validation_data=(X_test, y_test),
batch_size=64)
loss, accuracy = model.evaluate(X_train, y_train, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy: {:.4f}".format(accuracy))
plot_history(history, val=1)
Training Accuracy: 0.9735
Testing Accuracy: 0.8325
Below we trained the model for 10 epochs. Check the results:
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.Conv1D(128, 5, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(20, activation='relu'))
model.add(layers.Dense(5, activation='sigmoid'))
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
history = model.fit(X_train, y_train,
epochs=10,
verbose=False,
validation_data=(X_test, y_test),
batch_size=64)
loss, accuracy = model.evaluate(X_train, y_train, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy: {:.4f}".format(accuracy))
plot_history(history, val=1)
6. How does the performance of the two models compare, also with regard to training for fewer or more epochs (5 versus 20 for the LSTM, 5 versus 10 for the CNN)?
Look at the plots and discuss your ideas!
One crucial step in deep learning and working with neural networks is hyperparameter optimization. Hyperparameters are parameters that are chosen by the algorithm designer, and tuning them is very important! One popular method for hyperparameter optimization is grid search: it takes lists of parameter values and runs the model with every combination it can form. It is the most thorough way, but also the most computationally heavy. Another common approach, random search, which you will see in action here, simply evaluates random combinations of parameters.
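To make the difference concrete, here is a small sketch (plain Python over a made-up toy parameter space, independent of Keras) of which candidates grid search versus random search would try:
# Toy illustration of grid search vs. random search over a made-up parameter space
import itertools, random
space = {'num_filters': [32, 64, 128], 'kernel_size': [3, 5, 7]}
# grid search: every combination (3 x 3 = 9 candidates)
grid_candidates = [dict(zip(space, values)) for values in itertools.product(*space.values())]
# random search: a fixed number of randomly drawn combinations
random.seed(0)
random_candidates = [{k: random.choice(v) for k, v in space.items()} for _ in range(3)]
print(len(grid_candidates))   # 9
print(random_candidates)      # 3 randomly sampled combinations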
7. Write a function for creating your CNN-based model that takes the number of filters, the kernel size, and the embedding size as input arguments. Name your function create_model. For the rest, follow the architecture of your previous CNN model.
def create_model(num_filters, kernel_size, embedding_dim):
    model = Sequential()
    model.add(layers.Embedding(vocab_size, embedding_dim, input_length=100))
    model.add(layers.Conv1D(num_filters, kernel_size, activation='relu'))
    model.add(layers.GlobalMaxPooling1D())
    model.add(layers.Dense(10, activation='relu'))
    model.add(layers.Dense(5, activation='sigmoid'))
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
8. A dictionary in Python is a collection of key-value pairs (ordered by insertion since Python 3.7). Use the dict structure to define the hyperparameters for the CNN model. You can include the number of filters, the kernel size, and the embedding size.
param_grid = dict(num_filters=[32, 64, 128],
kernel_size=[3, 5, 7],
embedding_dim=[50, 100])
When constructing the search you must provide this dictionary of hyperparameters to evaluate: it is passed to the param_grid argument of GridSearchCV or the param_distributions argument of RandomizedSearchCV. It maps each model parameter name to a list of values to try.
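With the grid defined above, an exhaustive grid search would evaluate 3 × 3 × 2 = 18 combinations, while random search only samples n_iter of them. A quick check (just counting, nothing model-specific):
# How many candidate settings the grid above contains
from math import prod
n_combinations = prod(len(v) for v in param_grid.values())
print(n_combinations)  # 18 = 3 num_filters x 3 kernel_size x 2 embedding_dim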
9. Use the KerasClassifier from scikeras to create your model with the create_model function, 15 epochs, and a batch_size of 64.
# Hyperparameters to be tuned must be passed as arguments to KerasClassifier from scikeras
# (https://adriangb.com/scikeras/stable/migration.html#default-arguments-in-build-fn-model)
model = KerasClassifier(model=create_model,
                        epochs=15,
                        batch_size=64,
                        num_filters=32,     # hyperparameter 1
                        kernel_size=3,      # hyperparameter 2
                        embedding_dim=50,   # hyperparameter 3
                        verbose=True)
10. Time to call the RandomizedSearchCV function. Use your model, your selected grid of hyperparameters, and 5-fold cross-validation.
grid = RandomizedSearchCV(estimator=model,
param_distributions=param_grid,
cv=5,
n_jobs=-1,
verbose=1,
n_iter=2)
As you can see, this function has additional input arguments that you can set, including n_jobs and n_iter. By default, the random search (or grid search) uses only one thread; by setting the n_jobs argument of the RandomizedSearchCV (or GridSearchCV) constructor to -1, the process will use all cores on your machine. The n_iter argument sets how many random parameter combinations are sampled and evaluated.
It is also worth mentioning that, by default, accuracy is the score that is optimized, but other scores can be specified via the scoring argument of the RandomizedSearchCV (or GridSearchCV) constructor.
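For example, here is a sketch of a search that optimizes macro-averaged F1 instead of accuracy. The string 'f1_macro' is a standard scikit-learn scorer name; whether it combines sensibly with the one-hot encoded targets used in this practical is something you would need to verify.
# Illustrative variant: optimize macro-averaged F1 instead of accuracy
grid_f1 = RandomizedSearchCV(estimator=model,
                             param_distributions=param_grid,
                             scoring='f1_macro',  # any scikit-learn scorer name can be used here
                             cv=5,
                             n_jobs=-1,
                             verbose=1,
                             n_iter=2)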
11. Fit your grid on X_train and y_train.
grid_result = grid.fit(X_train, y_train)
Fitting 5 folds for each of 2 candidates, totalling 10 fits
Epoch 1/15
46/46 [==============================] - 2s 31ms/step - loss: 0.6099 - accuracy: 0.2047
Epoch 2/15
46/46 [==============================] - 1s 32ms/step - loss: 0.5030 - accuracy: 0.2897
Epoch 3/15
46/46 [==============================] - 1s 30ms/step - loss: 0.4478 - accuracy: 0.4604
Epoch 4/15
46/46 [==============================] - 2s 41ms/step - loss: 0.3578 - accuracy: 0.7001
Epoch 5/15
46/46 [==============================] - 2s 33ms/step - loss: 0.2618 - accuracy: 0.8511
Epoch 6/15
46/46 [==============================] - 1s 32ms/step - loss: 0.1644 - accuracy: 0.9157
Epoch 7/15
46/46 [==============================] - 1s 31ms/step - loss: 0.0963 - accuracy: 0.9415
Epoch 8/15
46/46 [==============================] - 1s 32ms/step - loss: 0.0591 - accuracy: 0.9629
Epoch 9/15
46/46 [==============================] - 1s 31ms/step - loss: 0.0392 - accuracy: 0.9731
Epoch 10/15
46/46 [==============================] - 1s 31ms/step - loss: 0.0281 - accuracy: 0.9759
Epoch 11/15
46/46 [==============================] - 1s 32ms/step - loss: 0.0222 - accuracy: 0.9793
Epoch 12/15
46/46 [==============================] - 2s 43ms/step - loss: 0.0190 - accuracy: 0.9793
Epoch 13/15
46/46 [==============================] - 1s 32ms/step - loss: 0.0172 - accuracy: 0.9776
Epoch 14/15
46/46 [==============================] - 1s 31ms/step - loss: 0.0162 - accuracy: 0.9796
Epoch 15/15
46/46 [==============================] - 1s 31ms/step - loss: 0.0153 - accuracy: 0.9799
12. Find the best scores and the best values for the hyperparameters.
print(grid_result.best_score_)
print(grid_result.best_params_)
0.8510706489726619
{'num_filters': 128, 'kernel_size': 3, 'embedding_dim': 50}
The best_score_ attribute provides access to the best score observed during the optimization procedure, and the best_params_ attribute shows the combination of parameters that achieved the best result.
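Besides the best score and parameters, the full record of the search is kept in the cv_results_ attribute; a short sketch for inspecting every candidate that was tried:
# Inspect all candidates tried by the random search
results = pd.DataFrame(grid_result.cv_results_)
print(results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']])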
13. Evaluate the performance on the test set.
test_accuracy = grid.score(X_test, y_test)
test_accuracy
31/31 [==============================] - 0s 6ms/step
0.8166496424923391
Now you can use the best hyperparameter values to build your final model. Do that as your fun homework!