Output generation


The standard GPT model doesn't just read; it writes. It does this one word at a time in an auto-regressive cycle.

  • Auto-Regressive: Fancy word for "feeds its own output back in as input".


The final vector produced by the stack isn't a word yet; it's a rich numerical representation. To turn it back into text, we perform two final mathematical steps:

  1. The Unembedding (Linear Layer): We multiply the vector by a massive matrix, with a row for each embedding dimension (12,288 in GPT-3) and a column for each vocabulary word (50,257). This produces a distinct score (known as a logit) for every single word in the vocabulary.
  2. Softmax Normalization: These raw scores are hard to interpret (some are negative, some are huge). The Softmax function squashes them so they are all positive and add up to exactly 100% (1.0). This highlights the most likely next word while pushing unlikely ones toward zero.

The result is a probability list (e.g., "cat: 60%", "dog: 30%"), from which we sample the next token.
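These two steps can be sketched in a few lines of NumPy. This is a toy illustration, not the real model: the sizes are tiny (GPT-3 uses an embedding size of 12,288 and a vocabulary of 50,257), and the vector, matrix, and five-word vocabulary are made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes and a made-up five-word vocabulary for illustration.
embed_dim, vocab_size = 8, 5
vocab = ["cat", "dog", "mat", "sat", "the"]

# 1. Unembedding: multiply the final vector by an
#    (embed_dim x vocab_size) matrix to get one raw score
#    (a logit) per vocabulary word.
final_vector = rng.normal(size=embed_dim)
unembed = rng.normal(size=(embed_dim, vocab_size))
logits = final_vector @ unembed

# 2. Softmax: exponentiate and normalize so the scores are
#    positive and sum to exactly 1.0.
probs = np.exp(logits - logits.max())  # subtract max for numerical stability
probs /= probs.sum()

# 3. Sample the next token from the resulting distribution.
next_token = rng.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```

Subtracting the maximum logit before exponentiating doesn't change the result (it cancels in the normalization) but keeps `np.exp` from overflowing on large scores.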

The Generation Loop

To write a sentence like "The cat sat on the mat", the model repeats these steps once for every new word, appending each prediction to the input before the next pass:

[Interactive diagram, step 1 of 5: the input context "The" flows through a Transformer Block ⚙️ (Attention + FFN), which outputs next-word probabilities: cat 65%, dog 25%, quick 10%.]
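The loop itself is simple enough to sketch directly. In this hedged example, `model` is a hypothetical stand-in for the whole Transformer stack: instead of real attention and feed-forward layers, it just favors the next word of the target sentence so the feedback cycle is easy to follow.

```python
import numpy as np

vocab = ["The", "cat", "sat", "on", "the", "mat"]

def model(tokens):
    """Stand-in for the Transformer stack. A real model would run
    attention + FFN layers; this toy version simply puts most of the
    probability mass on the next word of the target sentence."""
    probs = np.full(len(vocab), 0.02)
    probs[min(len(tokens), len(vocab) - 1)] = 1.0
    return probs / probs.sum()

tokens = ["The"]                              # initial input context
for _ in range(5):                            # one pass per generated token
    probs = model(tokens)                     # stack -> next-word probabilities
    next_tok = vocab[int(np.argmax(probs))]   # greedy: pick the top word
    tokens.append(next_tok)                   # feed the output back in as input
print(" ".join(tokens))
```

Here the sampling is greedy (`argmax`); real systems often sample from the distribution instead, which is why the same prompt can produce different continuations.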

Conclusion

And that's it! The Transformer takes text, turns it into numbers, mixes context using Attention, and calculates the most likely next word. It does this over and over again to write entire essays.
