Practical 9: LLMs pre-training, prompting, & learning from human feedback¶
Dong Nguyen
Applied Text Mining - Utrecht Summer School
Settings¶
To run this notebook, use a GPU or TPU. In Google Colab, select the T4 GPU under 'Runtime' > 'Change runtime type'.
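If you want to check that a GPU is actually available before loading the model, here is a quick sketch (assuming PyTorch is installed, as it is on Colab):
import torch

# Check whether a CUDA GPU is visible; after switching to a T4 runtime
# this should print True and the device name.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))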
We're going to use the Hugging Face Transformers library, a very popular Python library and platform for working with language models. See https://huggingface.co/docs/transformers/en/index for more information.
import os
os.environ["TRANSFORMERS_VERBOSITY"] = "error"
!pip install transformers
!pip install datasets==2.15.0
Phi-3-mini-4k-instruct¶
The code below loads a pre-trained LLM (Phi-3-mini-4k-instruct; 3.8B parameters).
Take a look at https://huggingface.co/microsoft/Phi-3-mini-4k-instruct to read more about Phi-3-mini-4k-instruct.
Tip: Run the code below (which can take a few to ten minutes) and read the webpage while you wait.
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",    # store the model on GPU
    torch_dtype="auto",   # automatically determines the best data type
    trust_remote_code=False,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
from transformers import pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    max_new_tokens=500,
    do_sample=False
)
Let's prompt the model:
messages = [
    {"role": "user", "content": "Where is Utrecht?"}
]
output = generator(messages)
print(output)
Experiment with the following:

- return_full_text controls whether the input prompt is returned as well. Experiment with True and False.
- max_new_tokens sets the maximum number of tokens to generate. Experiment with different values.
- Different prompts. Experiment with both factual and more subjective questions.
- Deterministic generation (do_sample=False) versus non-deterministic generation (do_sample=True). When you sample, you can also set the temperature parameter; try out different values (see the sketch after this list).
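For example, here is a minimal sketch for the last point. The sampling_generator name and the temperature value are just for illustration; with sampling, the exact outputs will vary between runs.
# Sketch: a sampling-based generator to compare against the deterministic one above.
# Higher temperature generally means more diverse/random outputs.
sampling_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=100,
    do_sample=True,     # enable sampling
    temperature=1.0,    # try e.g. 0.2, 0.7, 1.0, 1.5
)
messages = [
    {"role": "user", "content": "Describe Utrecht in one sentence."}
]
# Run the same prompt twice; with sampling the two outputs will usually differ.
for _ in range(2):
    print(sampling_generator(messages)[0]['generated_text'])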
System message¶
With the system message we can set the overall behavior of the model.
messages = [
    {"role": "system", "content": "Respond as if you're a 15-year-old girl named Lisa, who loves thrillers."},
    {"role": "user", "content": "What is your favorite movie?"}
]
print(generator(messages)[0]['generated_text'])
messages = [
    {"role": "system", "content": "You're a 50-year-old man named Dave, who has a dry sense of humor and loves sci-fi movies."},
    {"role": "user", "content": "What is your favorite movie?"}
]
print(generator(messages)[0]['generated_text'])
messages = [
    {"role": "system", "content": "You are a high school teacher."},
    {"role": "user", "content": "Explain photosynthesis to a 13-year-old."}
]
print(generator(messages)[0]['generated_text'])
Experiment with the following:

- Different prompts and system messages, to simulate certain personas or to steer the behavior of the model (one more sketch is given below).
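One possible variation, as a sketch (this particular system message and question are not part of the practical, just an example of steering the output style rather than simulating a persona):
# Steer the style of the answer: force a very short response.
messages = [
    {"role": "system", "content": "You are a travel guide. Answer in exactly one short sentence."},
    {"role": "user", "content": "What should I visit in Utrecht?"}
]
print(generator(messages)[0]['generated_text'])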
Simulate a chat history¶
We can pass in a list of system, user, and assistant messages to simulate a longer conversation history.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who wrote 'Pride and Prejudice'?"},
    {"role": "assistant", "content": "Jane Austen wrote 'Pride and Prejudice'."},
    {"role": "user", "content": "What else did she write?"}
]
print(generator(messages)[0]['generated_text'])
Exercise: Experiment with a few more examples where context can make a difference (one sketch is given below).
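For instance, in the sketch below the final question only makes sense given the earlier turns; try asking it with and without the preceding messages.
# The last question is ambiguous on its own; the history resolves what "there" refers to.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I'm visiting Utrecht next weekend."},
    {"role": "assistant", "content": "Nice! Utrecht is a beautiful city with canals and the Dom Tower."},
    {"role": "user", "content": "Which museums can I visit there?"}
]
print(generator(messages)[0]['generated_text'])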
Classification¶
We're going to experiment with sentiment classification and load the SST-2 dataset, which contains sentences from movie reviews (0 = negative, 1 = positive).
from datasets import load_dataset
# Load a sentiment dataset, only the first 10 instances
dataset = load_dataset("glue", "sst2", split="validation[:10]")
# Pipeline for zero-shot prompting
classification_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=50,
    do_sample=False,
    return_full_text=False
)
Print the first two instances
dataset[:2]
# Format and run examples
for example in dataset:
    text = example["sentence"]
    prompt = f"""### Instruction:
Is the sentence below Positive or Negative? Only answer with Positive or Negative.
### Text:
"{text}"
### Sentiment:"""
    messages = [
        {"role": "user", "content": prompt}
    ]
    output = classification_generator(messages)[0]['generated_text']
    print(f"Text: {text}")
    print(f"Predicted Sentiment: {output}")
    print("---" * 10)
Exercise: Experiment with different prompts; for example, you can ask for an explanation.
Tokenizer¶
To get a sense of the tokenizer used, you can print the tokens
tokens = tokenizer("Where is Utrecht?")
print(tokens)
print(tokenizer.convert_ids_to_tokens(tokens['input_ids']))
Exercise: Experiment with uncommon words, misspelled words, dialect words, or words that don't exist (a small sketch follows the examples below).
For example:
- I like this so muhc vs. I like this so much
- This is so coooooool
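As a starting point, here is a small sketch that tokenizes the examples above so you can compare how they are split into tokens:
# Compare tokenization of a misspelling, the correct spelling, and an elongated word.
for sentence in ["I like this so muhc", "I like this so much", "This is so coooooool"]:
    ids = tokenizer(sentence)["input_ids"]
    print(sentence)
    print(tokenizer.convert_ids_to_tokens(ids))
    print()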
Print a subset of the tokens in the vocabulary
vocab = tokenizer.get_vocab()
# Sort the vocabulary by token ID to get the "first" tokens
sorted_vocab = sorted(vocab.items(), key=lambda item: item[1])
# Print some tokens
for token, token_id in sorted_vocab[1000:1050]:
    print(f"{token_id:>3}: {token}")
If you have the time: experiment with another model¶
You can experiment with the HuggingFaceTB/SmolLM3-3B model, which was released very recently: https://huggingface.co/HuggingFaceTB/SmolLM3-3B.
Note that extended thinking is enabled by default, which means the output includes a reasoning trace.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import gc
# If you run into an out-of-memory error, you can either restart the notebook
# and load only this model, or explicitly delete the previous model from memory:
# del model
# del tokenizer
# del generator
# gc.collect()
# torch.cuda.empty_cache()
# print(torch.cuda.memory_allocated())
# Load model and tokenizer
smol_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",
    device_map="cuda",    # store the model on GPU
    torch_dtype="auto",   # automatically determines the best data type
    trust_remote_code=False,
)
smol_tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
smol_generator = pipeline(
    "text-generation",
    model=smol_model,
    tokenizer=smol_tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=True,  # apply sampling
)
messages = [
    {"role": "user", "content": "How are you?"}
]
print(smol_generator(messages))
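To compare the output with and without the reasoning trace, the model card linked above describes a flag for disabling extended thinking; the sketch below assumes that putting /no_think in the system message does this (check the model card if it behaves differently for your version).
# Assumption based on the model card: a "/no_think" system message
# should disable extended thinking (no reasoning trace in the output).
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": "How are you?"}
]
print(smol_generator(messages)[0]['generated_text'])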