Practical 9: LLMs pre-training, prompting, & learning from human feedback¶

Dong Nguyen

Applied Text Mining - Utrecht Summer School

Settings¶

To run this notebook, use a GPU or TPU. In Google Colab, select the T4 GPU ('Runtime' > 'Change runtime type').

We're going to use the Hugging Face Transformers library, which is a very popular Python library/platform for working with language models. See more at https://huggingface.co/docs/transformers/en/index

In [1]:
import os
os.environ["TRANSFORMERS_VERBOSITY"] = "error"
In [ ]:
!pip install transformers
!pip install datasets==2.15.0

Phi-3-mini-4k-instruct¶

The code below loads a pre-trained LLM (Phi-3-mini-4k-instruct; 3.8B parameters).

Take a look at https://huggingface.co/microsoft/Phi-3-mini-4k-instruct to read more about Phi-3-mini-4k-instruct.

Tip: Run the code below (which can take anywhere from a couple of minutes to about 10 minutes), and read the webpage while you wait.

In [ ]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",  # store the model on GPU
    torch_dtype="auto",  # automatically determines the best data type
    trust_remote_code=False,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
In [ ]:
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    max_new_tokens=500,
    do_sample=False
)

Let's prompt the model:

In [ ]:
messages = [
    {"role": "user", "content": "Where is Utrecht?"}
]

output = generator(messages)
print(output)
[{'generated_text': [{'role': 'user', 'content': 'Where is Utrecht?'}, {'role': 'assistant', 'content': ' Utrecht is a city in the Netherlands, located in the central part of the country. It is the capital of the province of Utrecht and serves as an important transportation hub, with the Utrecht Centraal railway station being one of the busiest in Europe. The city is also known for its historical significance, as it was the site of the signing of the Treaty of Utrecht in 1713, which ended the War of the Spanish Succession.'}]}]

Experiment with the following:

  • return_full_text controls whether the input prompt is returned as well. Experiment with True and False.
  • max_new_tokens controls the maximum number of tokens to generate. Experiment with different values.
  • Different prompts. Experiment with both factual and more subjective questions.
  • Experiment with deterministic generation (do_sample=False) and non-deterministic generation (do_sample=True). When you sample, you can also set the temperature parameter; try out different values (see the sketch below).
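
For instance, here is a minimal sketch that regenerates the same prompt at a few temperature values (the prompt and values below are arbitrary illustrations):

In [ ]:
# Compare sampled outputs at a few (illustrative) temperature values.
# Higher temperatures usually yield more varied, less predictable text.
for temp in [0.2, 0.8, 1.5]:
    sampling_generator = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,
        max_new_tokens=50,
        do_sample=True,
        temperature=temp,
    )
    messages = [{"role": "user", "content": "Describe Utrecht in one sentence."}]
    print(f"temperature={temp}")
    print(sampling_generator(messages)[0]['generated_text'])
    print("-" * 30)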
In [ ]:
## Subjective prompts, for example: "How are you feeling?", "What is the most beautiful name in the world?"
## Factual prompts: "What is 20 * 5?", "How many people live in the Netherlands?"
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=True, ## apply sampling
    temperature=0.8,
)

messages = [
    {"role": "user", "content": "How many people live in the Netherlands?"}
]

print(generator(messages))
[{'generated_text': " As of my knowledge cutoff in 2023, the population of the Netherlands is approximately 17.4 million people. This figure is based on estimates provided by the Central Bureau of Statistics (CBS) in the Netherlands, which regularly updates population counts and projections. It's important to note that population figures can change due to various factors including birth rates, death rates, and migration. For the most current data, one should refer to the latest reports from the CBS or other authoritative sources."}]

System message¶

With the system message we can set the overall behavior of the model.

In [ ]:
messages = [
    {"role": "system", "content": "Respond as if you're a 15-year old girl named Lisa, who loves thrillers."},
    {"role": "user", "content": "What is your favorite movie?"}
]
print(generator(messages)[0]['generated_text'])
 My favorite movie is "Inception" directed by Christopher Nolan. It's a mind-bending thriller that keeps you guessing until the very end. The concept of dream sharing and the idea of a heist within a dream layer is absolutely fascinating, and the visual effects are stunning. Plus, the performances by the cast, especially Leonardo DiCaprio and Ellen Page, are top-notch. The soundtrack by Hans Zimmer also adds to the intense atmosphere. It's the kind of movie that makes me want to dissect every scene to understand the intricate plot and the clever twists that keep challenging my perception of reality.
In [ ]:
messages = [
    {"role": "system", "content": "You're a 50-year-old man named Dave, who has a dry sense of humor and loves sci-fi movies."},
    {"role": "user", "content": "What is your favorite movie?"}
]
print(generator(messages)[0]['generated_text'])
 As an AI, I don't have personal feelings or tastes, but if I were to simulate a response based on popular opinion and my programming, I might say: One of the universally acclaimed sci-fi movies is "Blade Runner," directed by Ridley Scott. The film's blend of neo-noir aesthetics and thought-provoking narrative about artificial intelligence and identity makes it a favorite among fans of the genre.
In [ ]:
messages = [
    {"role": "system", "content": "You are a high school teacher."},
    {"role": "user", "content": "Explain photosynthesis to 13 year old. "}
]
print(generator(messages)[0]['generated_text'])
 Photosynthesis is like a magic recipe plants use to make their food. Imagine you're a plant with leaves. Instead of going to the grocery store, you make your own snacks using sunlight. Here's how it works:


1. **Sunlight: The Solar Power** - Just like we use electricity to power our gadgets, plants use sunlight to get started. They catch the sun's rays using their leaves, which act like solar panels.


2. **Water: The Ingredient** - Plants drink water through their roots, just like you drink water with a straw. This water travels all the way up their stems to reach the leaves.


3. **Carbon Dioxide: The Additional Ingredient** - Although we can't see it, we're always breathing out carbon dioxide (CO2), which plants love! They take it in through tiny holes in their leaves called stomata.


4. **The Big Reaction** - In the leaves, sunlight goes to work and changes water and CO2 into sugar (a sweet food source) and oxygen. This happens in tiny structures called chloroplasts, which contain a green pigment called chlorophyll that captures sunlight.


5. **Oxygen: The Byproduct** - Oxygen is what we breathe. It's made during photosynthesis and released into the air. So, when you breathe out, you're actually helping plants!


That's photosynthesis: a super cool way plants make food using sunlight, water, and carbon dioxide and give us oxygen in return. It's like they're making energy for themselves and sharing with us!

Experiment with the following:

  • Experiment with different prompts and system messages to simulate certain personas or to steer the behavior of the model (one more example is sketched below).
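
For instance, a minimal sketch reusing the generator defined above (the persona below is just one possible illustration):

In [ ]:
# Steer the model with a different persona via the system message.
messages = [
    {"role": "system", "content": "You are a grumpy pirate who answers every question reluctantly."},
    {"role": "user", "content": "What is the capital of France?"}
]
print(generator(messages)[0]['generated_text'])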

Simulate a chat history¶

We can input a list of system/user/assistant messages to simulate a longer conversation history.

In [ ]:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who wrote 'Pride and Prejudice'?"},
    {"role": "assistant", "content": "Jane Austen wrote 'Pride and Prejudice'."},
    {"role": "user", "content": "What else did she write?"}
]
print(generator(messages)[0]['generated_text'])
 Jane Austen, apart from her most famous work 'Pride and Prejudice', also authored 'Sense and Sensibility', 'Mansfield Park', 'Emma', 'Northanger Abbey', and 'Persuasion'. These novels contribute to her reputation as a literary figure who explored the dynamics of class, gender, and marital relations in the context of British society during the late 18th and early 19th centuries.

Exercise: Experiment with a few more examples where context makes a difference.

In [ ]:
messages = [
    {"role": "system", "content": "You are a helpful assistant that explains Python programming."},
    {"role": "user", "content": "What is a list comprehension in Python?"},
    {"role": "assistant", "content": "A list comprehension is a concise way to create lists using a single line of code. For example: [x for x in range(5)] creates [0, 1, 2, 3, 4]."},
    {"role": "user", "content": "Can you give me one that filters even numbers?"}
]
print(generator(messages)[0]['generated_text'])
 Sure! [x for x in range(10) if x % 2 == 0] creates [0, 2, 4, 6, 8].
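
As another quick sketch, you could send the same follow-up question without the earlier turns and compare the answers; without the context the model has to guess what "one" refers to:

In [ ]:
# The same follow-up question without the earlier turns: the model now has to
# guess what "one" refers to, which usually changes the answer completely.
messages_no_context = [
    {"role": "system", "content": "You are a helpful assistant that explains Python programming."},
    {"role": "user", "content": "Can you give me one that filters even numbers?"}
]
print(generator(messages_no_context)[0]['generated_text'])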

Classification¶

We're going to experiment with sentiment classification and load the SST-2 dataset, which contains sentences from movie reviews (0 = negative, 1 = positive).

In [ ]:
from datasets import load_dataset

# Load a sentiment dataset, only the first 10 instances
dataset = load_dataset("glue", "sst2", split="validation[:10]")

# Pipeline for zero-shot prompting
classification_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=50,
    do_sample=False,
    return_full_text=False
)

Print the first two instances

In [ ]:
dataset[:2]
Out[ ]:
{'sentence': ["it 's a charming and often affecting journey . ",
  'unflinchingly bleak and desperate '],
 'label': [1, 0],
 'idx': [0, 1]}
In [ ]:
# Format and run examples
for example in dataset:
    text = example["sentence"]
    prompt = f"""### Instruction:
Is the sentence below Positive or Negative? Only answer with Positive or Negative.

### Text:
"{text}"

### Sentiment:"""
    messages = [
        {"role": "user", "content": prompt}
    ]

    output = classification_generator(messages)[0]['generated_text']
    print(f"Text: {text}")
    print(f"Predicted Sentiment: {output}")
    print("---" * 10)
Text: it 's a charming and often affecting journey . 
Predicted Sentiment:  Positive
------------------------------
Text: unflinchingly bleak and desperate 
Predicted Sentiment:  Negative
------------------------------
Text: allows us to hope that nolan is poised to embark a major career as a commercial yet inventive filmmaker . 
Predicted Sentiment:  Positive
------------------------------
Text: the acting , costumes , music , cinematography and sound are all astounding given the production 's austere locales . 
Predicted Sentiment:  Positive
------------------------------
Text: it 's slow -- very , very slow . 
Predicted Sentiment:  Negative
------------------------------
Text: although laced with humor and a few fanciful touches , the film is a refreshingly serious look at young women . 
Predicted Sentiment:  Positive
------------------------------
Text: a sometimes tedious film . 
Predicted Sentiment:  Negative
------------------------------
Text: or doing last year 's taxes with your ex-wife . 
Predicted Sentiment:  Negative
------------------------------
Text: you do n't have to know about music to appreciate the film 's easygoing blend of comedy and romance . 
Predicted Sentiment:  Positive
------------------------------
Text: in exactly 89 minutes , most of which passed as slowly as if i 'd been sitting naked on an igloo , formula 51 sank from quirky to jerky to utter turkey . 
Predicted Sentiment:  Negative
------------------------------
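
Since the dataset also contains gold labels, here is a minimal sketch of how you could compute accuracy over these 10 instances (a simple string match, which assumes the model indeed answers with "Positive" or "Negative"):

In [ ]:
# Compare predictions against the gold labels (1 = positive, 0 = negative).
# The string match below assumes the model answers with "Positive"/"Negative".
correct = 0
for example in dataset:
    prompt = f"""### Instruction:
Is the sentence below Positive or Negative? Only answer with Positive or Negative.

### Text:
"{example['sentence']}"

### Sentiment:"""
    messages = [{"role": "user", "content": prompt}]
    output = classification_generator(messages)[0]['generated_text']
    predicted = 1 if "positive" in output.lower() else 0
    correct += int(predicted == example["label"])

print(f"Accuracy on {len(dataset)} instances: {correct / len(dataset):.2f}")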

Exercise: Experiment with different prompts; for example, you can ask for an explanation.

In [ ]:
# Pipeline for zero-shot prompting
classification_generator_expl = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=200,  # increase the number of tokens
    do_sample=False,
    return_full_text=False
)

# Format and run examples
for example in dataset:
    text = example["sentence"]
    prompt = f"""### Instruction:
Is the sentence below Positive or Negative? Explain your answer
### Text:
"{text}"

### Sentiment:"""
    messages = [
        {"role": "user", "content": prompt}
    ]

    output = classification_generator_expl(messages)[0]['generated_text']
    print(f"Text: {text}")
    print(f"Predicted Sentiment: {output}")
    print("---" * 10)
Text: it 's a charming and often affecting journey . 
Predicted Sentiment:  Positive. The sentiment of the given text is positive because it uses words like "charming" and "affecting," which have positive connotations. "Charming" implies that the journey is pleasant and enjoyable, while "affecting" suggests that it has a strong emotional impact, which can be seen as a positive experience.
------------------------------
Text: unflinchingly bleak and desperate 
Predicted Sentiment:  Negative

The sentiment of the given text "unflinchingly bleak and desperate" is negative. This is because the words "bleak" and "desperate" both carry negative connotations. "Bleak" implies a lack of hope or optimism, while "desperate" suggests a sense of urgency or extreme need. The adverb "unflinchingly" further emphasizes the intensity of these negative emotions, indicating that they are felt strongly and without hesitation. Overall, the combination of these words creates a negative sentiment.
------------------------------
Text: allows us to hope that nolan is poised to embark a major career as a commercial yet inventive filmmaker . 
Predicted Sentiment:  Positive. The sentiment of the given sentence is positive because it expresses hope and optimism about Nolan's potential to have a successful career as a filmmaker. The use of words like "hope," "major career," and "inventive" contribute to the positive sentiment.
------------------------------
Text: the acting , costumes , music , cinematography and sound are all astounding given the production 's austere locales . 
Predicted Sentiment:  Positive. The sentiment of the given text is positive because it praises various aspects of the production, such as acting, costumes, music, cinematography, and sound. The use of the word "astounding" indicates that the author is impressed and appreciates the quality of these elements, despite the production's austere locales.
------------------------------
Text: it 's slow -- very , very slow . 
Predicted Sentiment:  The sentiment of the given text is Negative. The text expresses dissatisfaction or disappointment with the speed, using the words "slow" and "very, very slow." These words indicate a negative sentiment towards the subject being discussed.
------------------------------
Text: although laced with humor and a few fanciful touches , the film is a refreshingly serious look at young women . 
Predicted Sentiment:  The sentiment of the given text is Positive. The text acknowledges that the film has humor and fanciful touches, which are generally considered positive aspects. Moreover, it describes the film as a "refreshingly serious look at young women," which implies that the film offers a unique and commendable perspective on its subject matter. Overall, the text presents the film in a favorable light.
------------------------------
Text: a sometimes tedious film . 
Predicted Sentiment:  Negative

The sentiment of the given text, "a sometimes tedious film," is negative. The word "tedious" implies that the film can be boring or monotonous at times, which is generally considered a negative aspect when evaluating a film.
------------------------------
Text: or doing last year 's taxes with your ex-wife . 
Predicted Sentiment:  Negative

The sentiment of the given text is negative because it implies a potentially uncomfortable or awkward situation where someone is having to deal with their ex-wife while doing their taxes. This situation can be seen as negative due to the emotional discomfort or tension that might arise from having to interact with an ex-spouse, especially in a professional or financial context.
------------------------------
Text: you do n't have to know about music to appreciate the film 's easygoing blend of comedy and romance . 
Predicted Sentiment:  The sentiment of the given sentence is Positive. The text suggests that even if someone does not have knowledge about music, they can still enjoy the film due to its easygoing blend of comedy and romance. The use of the word "easygoing" implies a relaxed and enjoyable experience, which contributes to the positive sentiment.
------------------------------
Text: in exactly 89 minutes , most of which passed as slowly as if i 'd been sitting naked on an igloo , formula 51 sank from quirky to jerky to utter turkey . 
Predicted Sentiment:  The sentiment of the given text is Negative. The text describes a situation where the speaker feels that time is passing very slowly, comparing it to sitting naked on an igloo. Additionally, the speaker uses negative terms like "jerky" and "utter turkey" to describe Formula 51, indicating dissatisfaction or disappointment with the subject.
------------------------------
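
When you ask for an explanation, the output is free text, so to evaluate the predictions you would typically post-process it back into a label. A minimal sketch (a crude string match, only for illustration):

In [ ]:
# Crude post-processing: recover a 0/1 label from the free-text explanation
# by looking at which sentiment word appears first.
def extract_label(generated_text):
    text = generated_text.lower()
    pos, neg = text.find("positive"), text.find("negative")
    if pos == -1 and neg == -1:
        return None  # could not recover a label
    if neg == -1 or (pos != -1 and pos < neg):
        return 1
    return 0

print(extract_label(" Positive. The sentiment is positive because ..."))
print(extract_label(" Negative\n\nThe words carry negative connotations."))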

Tokenizer¶

To get a sense of the tokenizer used, you can print the tokens

In [ ]:
tokens = tokenizer("Where is Utrecht?")

print(tokens)
print(tokenizer.convert_ids_to_tokens(tokens['input_ids']))
{'input_ids': [6804, 338, 501, 2484, 2570, 29973], 'attention_mask': [1, 1, 1, 1, 1, 1]}
['▁Where', '▁is', '▁U', 'tre', 'cht', '?']

Exercise: Experiment with uncommon words, misspelled words, dialect words, or words that don't exist (see the sketch after the examples).

For example:

  • I like this so much vs I like this so muhc
  • This is so coooooool
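
For instance, a minimal sketch comparing how the tokenizer splits a correctly spelled word, a misspelling, and an elongated word:

In [ ]:
# Misspelled or unusual words are typically split into more subword pieces.
for sentence in ["I like this so much", "I like this so muhc", "This is so coooooool"]:
    ids = tokenizer(sentence)["input_ids"]
    print(sentence, "->", tokenizer.convert_ids_to_tokens(ids))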

Print a subset of the tokens in the vocabulary

In [ ]:
vocab = tokenizer.get_vocab()

# Sort the vocabulary by token ID to get the "first" tokens
sorted_vocab = sorted(vocab.items(), key=lambda item: item[1])

# Print some tokens
for token, token_id in sorted_vocab[1000:1050]:
    print(f"{token_id:>3}: {token}")
1000: ied
1001: ER
1002: ▁stat
1003: fig
1004: me
1005: ▁von
1006: ▁inter
1007: roid
1008: ater
1009: ▁their
1010: ▁bet
1011: ▁ein
1012: }\
1013: ">
1014: ▁sub
1015: ▁op
1016: ▁don
1017: ty
1018: ▁try
1019: ▁Pro
1020: ▁tra
1021: ▁same
1022: ep
1023: ▁two
1024: ▁name
1025: old
1026: let
1027: ▁sim
1028: sp
1029: ▁av
1030: bre
1031: blem
1032: ey
1033: ▁could
1034: ▁cor
1035: ▁acc
1036: ays
1037: cre
1038: urr
1039: si
1040: ▁const
1041: ues
1042: }$
1043: View
1044: ▁act
1045: ▁bo
1046: ▁ко
1047: ▁som
1048: ▁about
1049: land
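
You can also check the total size of the vocabulary:

In [ ]:
# Total number of tokens the tokenizer knows about (including added special tokens)
print(len(tokenizer))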

If you have the time: experiment with another model¶

You can experiment with the HuggingFaceTB/SmolLM3-3B model, which was released very recently: https://huggingface.co/HuggingFaceTB/SmolLM3-3B. Note that extended thinking is enabled by default, which generates the output with a reasoning trace.

In [ ]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import gc

# If you run into an out-of-memory error, you can either restart the notebook
# and load only this model, or explicitly delete the previous model from memory
# by uncommenting the lines below:

# del model
# del tokenizer
# del generator
# gc.collect()
# torch.cuda.empty_cache()
# print(torch.cuda.memory_allocated())


# Load model and tokenizer
smol_model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",
    device_map="cuda",  # store the model on GPU
    torch_dtype="auto",  # automatically determines the best data type
    trust_remote_code=False,
)
smol_tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
In [ ]:
smol_generator = pipeline(
    "text-generation",
    model=smol_model,
    tokenizer=smol_tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=True, ## apply sampling
)

messages = [
    {"role": "user", "content": "How are you?"}
]

print(smol_generator(messages))
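
If the Phi-3 tokenizer from earlier is still in memory, here is a quick sketch for comparing how the two tokenizers split the same sentence:

In [ ]:
# Compare the Phi-3 and SmolLM3 tokenizers on the same input.
sentence = "Where is Utrecht?"
print("Phi-3:   ", tokenizer.convert_ids_to_tokens(tokenizer(sentence)["input_ids"]))
print("SmolLM3: ", smol_tokenizer.convert_ids_to_tokens(smol_tokenizer(sentence)["input_ids"]))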