Embeddings & Position

Input Embedding
You are here: The very first step. We take the raw tokens (words or word pieces) and convert them into vectors (lists of numbers) so the math can begin. We also add the Positional Encoding here.

Note: For this demo, we use lists of size 8. Real models like GPT-3 use lists of size 12,288!

Meaning & Position

The Transformer needs two things to understand a word:
  1. Meaning: What does "cat" mean? (Embedding)
  2. Position: Where is it in the sentence? (Positional Encoding)

Let's trace one word from our example — Token ID 3 at position #0 — to see its input representation:

1. Word Embedding

Lookup ID: 3
We use the Token ID (3) to find the corresponding row in the massive Learned Embedding Matrix.

E(x) = [0.9, 0.1, -0.5, 0.2, 0.8, -0.9, 0.1, 0.0]
(8 dimensions in this demo; a real model's vector would run all the way to 12,288)
+
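Concretely, the lookup is nothing more than picking a row out of a matrix by index. A minimal sketch in plain Python, using the demo's 8-dimensional values — row 3 matches the E(x) vector above, while the other rows are placeholder zeros we made up for illustration:

```python
# Toy learned embedding matrix: one row per token ID, 8 values per row.
# Row 3 matches the demo; rows 0-2 are placeholders, not real learned values.
embedding_matrix = [
    [0.0] * 8,                                    # token 0 (placeholder)
    [0.0] * 8,                                    # token 1 (placeholder)
    [0.0] * 8,                                    # token 2 (placeholder)
    [0.9, 0.1, -0.5, 0.2, 0.8, -0.9, 0.1, 0.0],  # token 3 — our example word
]

def embed(token_id):
    """Embedding lookup: just return the row for this token ID."""
    return embedding_matrix[token_id]

print(embed(3))  # → [0.9, 0.1, -0.5, 0.2, 0.8, -0.9, 0.1, 0.0]
```

In a real model this matrix is learned during training and is huge: vocabulary size × 12,288 for GPT-3.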

2. Positional Encoding

A vector computed from a fixed sine/cosine formula — no learning required — and unique to position #0.

PE(pos) = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
(again 8 dimensions here, 12,288 in a real model)
=
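The demo's PE values fall out of the sinusoidal scheme from the original Transformer: even dimensions get a sine, odd dimensions a cosine, at geometrically spaced frequencies. A sketch for d = 8 (we're assuming the demo follows this standard formula; at position 0 it reproduces the alternating 0/1 pattern above because sin(0) = 0 and cos(0) = 1):

```python
import math

def positional_encoding(pos, d_model=8):
    """Standard sinusoidal encoding: sin on even dims, cos on odd dims,
    with frequencies spaced geometrically from 1 down to 1/10000."""
    pe = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (10000 ** (i / d_model))
        pe.append(math.sin(pos * freq))
        pe.append(math.cos(pos * freq))
    return pe

print(positional_encoding(0))  # → [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Because the formula is fixed, the model can compute an encoding for any position, even ones longer than anything seen in training.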

3. Model Input

The element-wise sum of the two vectors: meaning combined with order.

Input = [0.9, 1.1, -0.5, 1.2, 0.8, 0.1, 0.1, 1.0]
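The final step is just element-wise addition. Reproducing the demo's numbers (rounding to one decimal place only to keep floating-point noise out of the printout):

```python
embedding = [0.9, 0.1, -0.5, 0.2, 0.8, -0.9, 0.1, 0.0]  # E(x), step 1
position  = [0.0, 1.0,  0.0, 1.0, 0.0,  1.0, 0.0, 1.0]  # PE(0), step 2

# Add the two vectors element by element to get the model input.
model_input = [round(e + p, 1) for e, p in zip(embedding, position)]
print(model_input)  # → [0.9, 1.1, -0.5, 1.2, 0.8, 0.1, 0.1, 1.0]
```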

Why do we add them?

By adding the positional encoding to the embedding, we slightly "color" the meaning of the word with its position. "Cat" at position 0 looks slightly different from "Cat" at position 5, allowing the model to distinguish them while keeping the core meaning intact.
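We can see this "coloring" directly by adding the same embedding to two different positional encodings. This sketch reuses the sinusoidal formula from step 2, so the position-5 values are an assumption of that formula rather than numbers from the demo:

```python
import math

def positional_encoding(pos, d_model=8):
    # Standard sinusoidal scheme: sin on even dims, cos on odd dims.
    pe = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (10000 ** (i / d_model))
        pe.append(math.sin(pos * freq))
        pe.append(math.cos(pos * freq))
    return pe

cat = [0.9, 0.1, -0.5, 0.2, 0.8, -0.9, 0.1, 0.0]  # E("cat") from the demo

cat_at_0 = [e + p for e, p in zip(cat, positional_encoding(0))]
cat_at_5 = [e + p for e, p in zip(cat, positional_encoding(5))]

# Same word, different positions → different input vectors, yet each
# stays within ±1 of the original embedding, so the meaning survives.
print(cat_at_0 != cat_at_5)  # → True
```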