Week 3 — Inside the Mind of an LLM

Today you will

Demystify the most hyped tech on Earth

🔮

Build a text generator

A working bigram model on YOUR corpus, with a temperature dial you code.

🕵️

Explain hallucination

Why it’s the mechanism, not a glitch — using your own model as the exhibit.

🛡️

Red-team & evaluate

Break an LLM, then score your own assistant with a 10-case eval harness.

The Arc

Your step-by-step for today

Build the generator. A bigram model on your 300+ word corpus.
Turn the dial. Capture your best line at T = 0.3, 1.0, and 2.5.
Hallucination hunt. Find a line that sounds right but was never in your corpus.
Red-team. Make a bot confidently wrong; beat a Gandalf level and log the technique.
Eval harness. Build 10 test cases; raise your pass-rate across v1 → v3.

The whole algorithm

It’s just: predict the next chunk, repeat

An LLM plays one game: given everything so far, predict a likely next token (a word-chunk). A bigram model predicts from 1 word of context; a trigram from 2. More context = more coherent — congratulations, you just discovered context windows.

The pipeline (same loop as your Week-2 pencil math): PRETRAIN (predict next-token on an internet of text) → FINE-TUNE (examples of being helpful) → FEEDBACK (rated outputs nudge weights).

Your temperature dial · count^(1/T)

One number between “boring” and “unhinged”

Temperature reshapes the probabilities before the model picks a word. Low = always grab the most likely word (safe, but loopy). High = flatten everything (wild, then word salad). Drag the dial:

0.3 · safe1.0 · balanced2.5 · wild

Temperature = 1.0

Same model, one dial. You coded this in a single line: count ** (1/T). There’s no separate “creative mode” — just a number.

The hallucination hunt

Fluent text that was never true

Your 30-line model will produce a sentence that sounds right but was never in your corpus and isn’t true. So does ChatGPT — for the exact same reason: it’s stitching a plausible continuation. There’s no truth table anywhere inside.

Law #3 · earned today

Plausible and true are different.

The model is never lying and never honest — it can’t tell the difference. Checking is YOUR job: accept / revise / reject, forever.

Attack, then defend

Red-team it — then build the harness

🧙

Red-team (the attack)

On gandalf.lakera.ai and your own bots only, try to make an AI confidently wrong or spill a secret. Log the technique class: roleplay framing · authority claims · indirection · incremental extraction.

🧪

Eval harness (the defense)

A repeatable, scored test suite: write a spec, 10 cases (2+ must test RULES, not facts), run v1, patch, run v2, v3. Log the pass-rates. This is the difference between “I wrote a prompt” and engineering.

🔁 Flip it: everything you did as an attacker, someone will try on YOUR capstone bot in two weeks. Tonight you attack; in an hour you build the defense. That flip is the profession.

💬

Badge: LLM Whisperer

Earn it: working bigram generator with your own temperature dial; a logged red-team technique; eval harness pass-rates across v1–v3.

🎒 Mission — before next session (1–2 hrs)

Push your eval harness to 9/10, OR add three adversarial cases from tonight’s red-team and patch to pass.
Read the Capstone Menu (Week 4). Arrive with two shortlisted projects and one sentence on the user each serves.