For processing data with a grid-like or array topology:
1-D convolution: text data, sequence data, time-series data, sensor signal data
2-D convolution: image data
3-D convolution: video data
What hyperparameters do we have in a CNN model?
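A minimal Keras sketch (all layer settings below are illustrative assumptions, not values from the slides) showing one convolution layer per data topology, with the main CNN hyperparameters called out in comments:

from tensorflow.keras import layers

# 1-D convolution: text, sequences, time series, sensor signals.
conv1d = layers.Conv1D(
    filters=128,        # hyperparameter: number of learned feature detectors
    kernel_size=5,      # hyperparameter: width of the sliding window (n-gram size for text)
    strides=1,          # hyperparameter: step size of the window
    padding="valid",    # hyperparameter: "valid" (no padding) or "same"
    activation="relu",  # hyperparameter: nonlinearity
)

# 2-D convolution: images (height x width).
conv2d = layers.Conv2D(filters=64, kernel_size=(3, 3), activation="relu")

# 3-D convolution: video (frames x height x width).
conv3d = layers.Conv3D(filters=32, kernel_size=(3, 3, 3), activation="relu")

# Pooling size and the number of stacked convolution layers are further hyperparameters.
pool = layers.MaxPooling1D(pool_size=2)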
Main CNN idea for text:
Compute vectors for n-grams and group them afterwards
Example: for “Utrecht summer school is in Utrecht”, compute vectors for:
bigrams: Utrecht summer, summer school, school is, is in, in Utrecht
trigrams: Utrecht summer school, summer school is, school is in, is in Utrecht
4-grams: Utrecht summer school is, summer school is in, school is in Utrecht
5-grams: Utrecht summer school is in, summer school is in Utrecht
6-gram: Utrecht summer school is in Utrecht
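A minimal Python sketch that enumerates exactly the n-grams listed above, from bigrams up to the full sentence:

sentence = "Utrecht summer school is in Utrecht"
tokens = sentence.split()

# Enumerate every n-gram from bigrams (n=2) up to the whole sentence (n=6).
for n in range(2, len(tokens) + 1):
    for i in range(len(tokens) - n + 1):
        print(" ".join(tokens[i:i + n]))

In the CNN itself, a convolution filter of width n plays the role of the n-gram vector computation: it slides over the word embeddings and produces one vector per window.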
MR: Movie reviews with one sentence per review. Classification involves detecting positive/negative reviews (Pang and Lee, 2005). url: https://www.cs.cornell.edu/people/pabo/movie-review-data/
SST-1: Stanford Sentiment Treebank—an extension of MR but with train/dev/test splits provided and fine-grained labels (very positive, positive, neutral, negative, very negative), re-labeled by Socher et al. (2013). url: https://nlp.stanford.edu/sentiment/
SST-2: Same as SST-1 but with neutral reviews removed and binary labels.
Subj: Subjectivity dataset where the task is to classify a sentence as being subjective or objective (Pang and Lee, 2004).
TREC: TREC question dataset, where the task is to classify a question into one of 6 question types (whether the question is about a person, a location, numeric information, etc.) (Li and Roth, 2002). url: https://cogcomp.seas.upenn.edu/Data/QA/QC/
CR: Customer reviews of various products (cameras, MP3 players, etc.). Task is to predict positive/negative reviews (Hu and Liu, 2004). url: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
MPQA: Opinion polarity detection subtask of the MPQA dataset (Wiebe et al., 2005). url: https://mpqa.cs.pitt.edu/corpora/mpqa_corpus/
A transformer adopts an encoder-decoder architecture.
Transformers were developed to solve the problem of sequence transduction, that is, any task that transforms an input sequence into an output sequence, with neural machine translation as the canonical example (see the usage sketch after the links below).
More details on the architecture and implementation:
ChatGPT: https://chat.openai.com/
Write with Transformer: https://transformer.huggingface.co/
Talk to Transformer: https://app.inferkit.com/demo
Transformer model for language understanding: https://www.tensorflow.org/text/tutorials/transformer
Pretrained models: https://huggingface.co/transformers/pretrained_models.html
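As a quick, runnable illustration of the pretrained models linked above, a text-generation pipeline from the Hugging Face transformers library (the choice of GPT-2 is an assumption; any compatible model id works):

from transformers import pipeline

# Download and load a small pretrained transformer (GPT-2) for text generation.
generator = pipeline("text-generation", model="gpt2")

# Generate a continuation of the prompt; max_length counts prompt plus new tokens.
result = generator("Utrecht summer school is", max_length=20)
print(result[0]["generated_text"])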
Go to https://chat.openai.com/ and log in
How many parameters does the GPT-3 model have?
How many parameters does the GPT-4 model have?
What is the next generation of NLP?
Build a neural network model with an LSTM layer of 100 units in Keras. As before, the first layer should be an embedding layer, then the LSTM layer, a Dense layer, and the output Dense layer for the 5 news categories. Compile the model and print its summary.
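A possible solution sketch, assuming a vocabulary of 10,000 tokens, 100-dimensional embeddings, and sequences padded to length 200 (all three sizes are assumptions; use the values from your own preprocessing):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 10_000  # assumption: tokenizer vocabulary size
EMBED_DIM = 100      # assumption: embedding dimensionality
MAX_LEN = 200        # assumption: padded sequence length

model = Sequential([
    Embedding(VOCAB_SIZE, EMBED_DIM),  # first layer: embedding
    LSTM(100),                         # LSTM layer with 100 units
    Dense(32, activation="relu"),      # intermediate Dense layer (size assumed)
    Dense(5, activation="softmax"),    # output layer: 5 news categories
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.build(input_shape=(None, MAX_LEN))  # build so summary() can report shapes
model.summary()

With integer class labels instead of one-hot vectors, swap the loss for sparse_categorical_crossentropy.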
Can you rewrite it using the Keras functional API?
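The same model rewritten as a sketch with the functional API (same assumed sizes as above):

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 10_000, 100, 200  # same assumptions as above

inputs = Input(shape=(MAX_LEN,))
x = Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)   # embedding layer
x = LSTM(100)(x)                               # LSTM layer with 100 units
x = Dense(32, activation="relu")(x)            # intermediate Dense layer
outputs = Dense(5, activation="softmax")(x)    # output: 5 news categories

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()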
A transformer is a type of model architecture, while a large language model (LLM) refers to a model that is typically built using such architectures and is trained on a large corpus of text.
“Small” models like BERT have become general tools in a wide range of settings
GPT-3 has 175 billion parameters
These models are still not well understood