Lecture 24

Prompts and In-Context Learning

1. Prompting as the Interface to LLMs

Prompting is the main way we “program” modern large language models (LLMs).
Instead of changing the model weights, we:

  - keep the pre-trained model frozen,
  - write the task specification into the input text (the prompt), and
  - read the output from the model's continuation.

1.1 What is a Prompt?

A prompt is any text we feed into the model before it generates an output.
It can include:

  - natural-language instructions describing the task,
  - input–output examples (demonstrations), and
  - the actual input we want the model to process.

The model then continues the text by sampling or choosing the most likely next tokens.
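
For concreteness, here is a minimal sketch of this loop using the Hugging Face transformers library with GPT-2 as a stand-in model (the lecture does not prescribe a particular toolkit):

# A prompt is just text; a frozen causal LM continues it token by token.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The capital of France is"
out = generator(prompt, max_new_tokens=5, do_sample=False)

print(out[0]["generated_text"])  # the prompt plus the model's continuation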

1.2 Prompting vs. Traditional Supervised Learning

Traditional supervised learning:

  - collect a labeled dataset ((x_i, y_i)) for the task,
  - train or fine-tune the model's weights on that dataset,
  - end up with one specialized model per task.

Prompting with a frozen LLM:

  - keep the pre-trained weights fixed,
  - describe the task (and optionally show a few examples) in the prompt,
  - take the model's continuation as the output.

The key idea: a single pre-trained model can perform many different tasks just by changing the prompt.


2. Zero-Shot and Few-Shot Learning

2.1 Zero-Shot Learning

Zero-shot learning: the model is asked to perform a task without seeing any examples of that task in the prompt.

The prompt typically includes:

  - an instruction or description of the task, and
  - the test input itself,

but no worked examples of the task.

Example (question answering):

Answer the question using a short phrase.

Question: Where was Tom Brady born?
Answer:

The model uses its internal knowledge and language understanding to complete the answer, even though we never fine-tuned it specifically for this QA dataset.

A well-known illustration of zero-shot behavior comes from summarization with the prompt “TL;DR:”. Even without any fine-tuning or task-specific examples, GPT-2 can produce a reasonable one-sentence summary simply by appending “TL;DR:” to the end of an article. During pre-training, the model encountered this construction frequently on the web, where “TL;DR:” conventionally introduces a brief summary. As a result, a single well-chosen prompt is enough to activate this implicit knowledge and elicit summary-like behavior in a purely zero-shot setting.
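
A sketch of this trick in code, again assuming the Hugging Face transformers API and GPT-2:

# Zero-shot "TL;DR:" summarization: append the cue and let the model continue.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

article = "..."  # any article text goes here
prompt = article + "\nTL;DR:"

inputs = tok(prompt, return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=False,
    pad_token_id=tok.eos_token_id,  # GPT-2 has no pad token by default
)
# Decode only the tokens generated after the prompt.
summary = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary)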

2.2 Few-Shot Learning

Few-shot learning: we prepend a few input–output examples of the task directly into the prompt.

Example (sentiment classification):

Review: "This movie was amazing!"
Sentiment: positive

Review: "The plot was boring and predictable."
Sentiment: negative

Review: "Performances were mixed but I liked the atmosphere."
Sentiment:

The examples in the prompt implicitly define:

  - the task (map a review to a sentiment label),
  - the space of valid labels (positive / negative), and
  - the expected output format.

The model then “continues the pattern” for the new review.
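
One standard way to turn this into a classifier, sketched below, is to build the few-shot prompt programmatically and score each candidate label by its log-probability under the frozen model (the scoring recipe is an assumption; the lecture only shows the prompt):

# Few-shot sentiment classification by scoring candidate labels.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

demos = [
    ('"This movie was amazing!"', "positive"),
    ('"The plot was boring and predictable."', "negative"),
]
query = '"Performances were mixed but I liked the atmosphere."'

prompt = "".join(f"Review: {r}\nSentiment: {s}\n\n" for r, s in demos)
prompt += f"Review: {query}\nSentiment:"

def label_logprob(label):
    # Sum of log-probabilities of the label's tokens, given the prompt.
    ids = tok(prompt + " " + label, return_tensors="pt")["input_ids"]
    n_prompt = tok(prompt, return_tensors="pt")["input_ids"].shape[1]
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    targets = ids[0, 1:]
    return sum(logprobs[i, targets[i]].item() for i in range(n_prompt - 1, ids.shape[1] - 1))

print(max(["positive", "negative"], key=label_logprob))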

2.3 Prompts as Task Descriptions

From the model’s perspective, all of these are just token sequences.
However, the pattern of tokens in the prompt:

  - acts as an implicit specification of the task, and
  - steers the model's distribution over continuations toward outputs that fit the pattern.

As models scale, they increasingly succeed at zero- and few-shot tasks without any extra training, which is one of the surprising emergent abilities of LLMs.

Figure 1. GPT-2 performance on LAMBADA, CBT, and WikiText2 as model size increases (Radford et al., 2019).

2.4 Prompt design: good patterns and failure modes

In practice, prompt design matters a lot. Small changes in phrasing can lead to surprisingly large differences in performance.

Some helpful patterns:

  - state the task explicitly and describe the desired output format,
  - use consistent formatting and clear delimiters between examples, and
  - choose demonstrations that are representative of the inputs you expect.

Common failure modes:

  - ambiguous or underspecified instructions,
  - strong sensitivity to the order and wording of in-context examples,
  - the model continuing a surface pattern instead of solving the task, and
  - prompts that unintentionally bias the model toward particular answers.

Prompting is therefore both a powerful interface and a source of instability: good prompts can elicit impressive behavior, but brittle prompts can hide the model’s true capabilities or amplify biases.
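
As a small illustration of the "consistent formatting and delimiters" advice, one might factor prompt construction into a single template (field names like "Input:"/"Output:" are illustrative choices, not a canonical recipe):

# One template for all examples keeps formatting consistent across the prompt.
def build_prompt(instruction, demos, query):
    parts = [instruction, ""]
    for x, y in demos:
        parts += [f"Input: {x}", f"Output: {y}", ""]
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)

print(build_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("This movie was amazing!", "positive")],
    "The plot was boring and predictable.",
))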


3. In-Context Learning (ICL)

3.1 Definition

In-context learning is the phenomenon where a language model appears to “learn” a new mapping from examples provided in the context window, without explicitly updating its parameters.

We show the model a few examples ((x_i, y_i)) in the prompt.
Then we give it a new (x_{\text{test}}) and ask it to generate (y_{\text{test}}).

All adaptation happens implicitly through the forward pass on the prompt.

This is sometimes described as “learning via inference rather than via gradient updates.”

3.2 Example: Translation via ICL

Translate English to French:

sea otter        => loutre de mer
peppermint       => menthe poivrée
plush giraffe    => girafe peluche
cheese           =>

The model:

  - infers the task (English-to-French translation) from the pattern,
  - applies it to the new input, and
  - completes the line with "fromage".

No weights were updated; the examples were only in the prompt.
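
A sketch of running this prompt end to end (GPT-2 is only a stand-in here; Brown et al. demonstrated this behavior with the much larger GPT-3):

# In-context translation: the demonstrations exist only in the prompt.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Translate English to French:\n\n"
    "sea otter        => loutre de mer\n"
    "peppermint       => menthe poivrée\n"
    "plush giraffe    => girafe peluche\n"
    "cheese           =>"
)
out = generator(prompt, max_new_tokens=8, do_sample=False)
completion = out[0]["generated_text"][len(prompt):]
# Keep only the first line: the model tends to keep continuing the pattern.
print(completion.strip().splitlines()[0])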

3.3 Relationship to Few-Shot Learning

“Few-shot prompting” and “in-context learning” are closely related:

  - few-shot prompting is the practical recipe: put a handful of examples in the prompt;
  - in-context learning is the underlying phenomenon: the model adapts its behavior to those examples without any weight updates.

Empirically:

  - performance tends to improve as more examples are placed in the context, up to the limits of the context window, and
  - larger models benefit more from in-context examples than smaller ones (Brown et al., 2020).

Figure 2. Few-shot in-context learning on SuperGLUE: GPT-3 performance as the number of examples in context increases (Brown et al., 2020).

4. Chain-of-Thought (CoT) Prompting

4.1 Motivation

Standard prompts often ask the model to directly output an answer:

Q: 23 − 7 = ?
A:

This can work, but for more complex reasoning (multi-step arithmetic, logic puzzles, word problems), models may:

  - skip necessary intermediate steps,
  - latch onto superficial patterns in the question, and
  - confidently output wrong answers.

Chain-of-thought (CoT) prompting encourages the model to output the reasoning steps, not just the final answer.

4.2 Example of CoT

Instead of:

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he now have?
A:

We prompt the model to reason step-by-step:

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he now have?

A: Let's think step by step.
Roger has 5 tennis balls.
He buys 2 cans, each with 3 balls, so that is 2 * 3 = 6 balls.
In total, he has 5 + 6 = 11 balls.
The answer is 11.

When given more problems with this style of answer, the model tends to:

  - imitate the step-by-step format,
  - decompose each new problem into intermediate steps, and
  - reach the correct final answer more often than with direct answering.
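
A minimal sketch of wrapping questions this way and parsing the result (the trigger phrase is the zero-shot CoT prompt of Kojima et al., 2022; the "The answer is ..." parsing convention is an assumption that matches the example above):

# Chain-of-thought prompting: add a reasoning trigger, then parse the answer.
import re

def cot_prompt(question):
    # "Let's think step by step." cues the model to generate intermediate steps.
    return f"Q: {question}\n\nA: Let's think step by step.\n"

def extract_answer(generation):
    # Assumes the model ends with "The answer is <number>." as in the example.
    m = re.search(r"The answer is\s+(-?\d+)", generation)
    return m.group(1) if m else None

# Usage: generate a completion for cot_prompt(question) with any causal LM,
# then call extract_answer() on it; for the tennis-ball problem we expect "11".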

4.3 Why CoT Helps

Chain-of-thought prompting:

  - decomposes a hard problem into smaller, easier steps,
  - lets the model spend more computation (more generated tokens) on each problem, and
  - makes the reasoning visible, so errors can be inspected.

However:

  - the generated "reasoning" can be fluent yet wrong, and a correct final answer does not guarantee the stated steps were actually followed, and
  - CoT gains appear mainly in sufficiently large models.


5. Reasoning Models

The lecture also touches on reasoning models, which are architectures and training methods explicitly designed to improve reasoning ability.

5.1 Limitations of Vanilla LLMs

Base LLMs:

  - are trained purely to predict the next token, and
  - have no built-in mechanism for planning, search, or checking their own intermediate steps.

They can still show surprising reasoning behaviors, but:

  - those behaviors are brittle and sensitive to prompt wording, and
  - an error in an early step propagates unchecked through the rest of the generation.

This motivates specialized designs for structured reasoning.

5.2 Ideas Behind Reasoning Models

Reasoning models often introduce one or more of:

  - explicit intermediate reasoning structures (chains, trees, or graphs of thoughts),
  - search over multiple candidate reasoning paths instead of a single greedy generation,
  - verification or voting steps that check candidate answers, and
  - training signals that reward correct reasoning traces.

The general theme: augment LLMs so that reasoning paths become more reliable, interpretable, and verifiable compared to naive next-token prediction.
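
As one concrete instance of searching and voting over reasoning paths, here is a minimal self-consistency sketch (in the spirit of Wang et al., 2022; sample_fn is a hypothetical stand-in for any sampling-based chain-of-thought generator):

# Self-consistency: sample several chains of thought, majority-vote the answers.
import re
from collections import Counter

def final_answer(generation):
    # Assumes chains end with "The answer is <number>." (a convention).
    m = re.search(r"The answer is\s+(-?\d+)", generation)
    return m.group(1) if m else None

def self_consistent_answer(sample_fn, question, n=10):
    # sample_fn(question) should return one sampled chain-of-thought string.
    votes = [final_answer(sample_fn(question)) for _ in range(n)]
    votes = [v for v in votes if v is not None]
    return Counter(votes).most_common(1)[0][0] if votes else None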

Figure 3. Illustration of an explicit reasoning language model (RLM) with separate components for data, models, and operators.

6. Soft Prompting

6.1 From Hard Prompts to Soft Prompts

So far, we considered hard prompts:

  - sequences of actual discrete tokens,
  - written by hand in natural language, and
  - limited to what can be expressed as real text.

Soft prompting replaces or augments these with learned prompt vectors:

  - continuous embeddings that need not correspond to any real token, and
  - parameters optimized directly on task data while the model itself stays frozen.

6.2 How Soft Prompting Works (High Level)

Conceptually:

  1. Take the pre-trained model and freeze its weights.
  2. Introduce a new set of parameters: a sequence of (k) prompt embeddings
    (\{p_1, \dots, p_k\}).
  3. For each training example, feed the model the concatenation:
    ([p_1, \dots, p_k, x_1, x_2, \dots, x_T])
    where (x_i) are the embeddings of the input tokens.
  4. Train only the prompt embeddings to minimize a task loss
    (e.g., classification or generation objective).
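
A minimal PyTorch sketch of these four steps, assuming a GPT-2 backbone from Hugging Face (the lecture does not fix a particular model, and dimension names follow GPT-2's config):

# Soft prompting: train k prompt embeddings while the LM stays frozen.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False                             # step 1: freeze the backbone

k, d = 20, model.config.n_embd
soft_prompt = nn.Parameter(torch.randn(k, d) * 0.02)    # step 2: new parameters

def forward_with_soft_prompt(input_ids):
    # step 3: concatenate [p_1, ..., p_k, x_1, ..., x_T] at the embedding level
    tok_embeds = model.get_input_embeddings()(input_ids)      # (B, T, d)
    prompt = soft_prompt.unsqueeze(0).expand(input_ids.shape[0], -1, -1)
    inputs_embeds = torch.cat([prompt, tok_embeds], dim=1)    # (B, k+T, d)
    return model(inputs_embeds=inputs_embeds).logits

# step 4: optimize only the prompt embeddings against a task loss
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)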

Result:

  - a tiny set of task-specific parameters ((k \times d) values, where (d) is the embedding dimension), and
  - a single frozen backbone that can serve many tasks, each with its own soft prompt.

6.3 Benefits of Soft Prompting

Soft prompting is an example of parameter-efficient adaptation:

  - only the prompt embeddings are trained, a tiny fraction of the model's parameter count,
  - one frozen model can be shared across many tasks by swapping in the appropriate soft prompt, and
  - per-task storage and serving costs are small.

Trade-offs:

  - soft prompts are not human-readable, which makes them harder to inspect and debug,
  - training them requires gradient access to the model, unlike hard prompting, and
  - they can underperform full fine-tuning, especially with smaller models.

Figure 4. Soft prompting: learned prompt embeddings prepended as virtual tokens and optimized while keeping the decoder-only Transformer frozen.

7. Key Takeaways

  - Prompting is the interface to frozen LLMs: the task lives in the input text, not in the weights.
  - Zero-shot prompts describe the task; few-shot prompts add examples that define it implicitly.
  - In-context learning is adaptation through the forward pass alone, with no gradient updates.
  - Chain-of-thought prompting elicits intermediate steps and improves multi-step reasoning, though the stated steps are not guaranteed to be faithful.
  - Soft prompting trains continuous prompt embeddings while keeping the model frozen: a parameter-efficient alternative to fine-tuning.

Reading: Li & Liang (2021), “Prefix-Tuning: Optimizing Continuous Prompts for Generation”