Week 3 ยท Session 3 ยท 2 hours

Inside the Mind of an LLM

You’ll build a working language model from scratch on text you choose โ€” with your own temperature dial. You’ll leave saying: “I built a language model from scratch.”

Begin โ†’
Today you will

Demystify the most hyped tech on Earth

๐Ÿ”ฎ

Build a text generator

A working bigram model on YOUR corpus, with a temperature dial you code.

๐Ÿ•ต๏ธ

Explain hallucination

Why it’s the mechanism, not a glitch โ€” using your own model as the exhibit.

๐Ÿ›ก๏ธ

Red-team & evaluate

Break an LLM, then score your own assistant with a 10-case eval harness.

The Arc

Your step-by-step for today

The whole algorithm

It’s just: predict the next chunk, repeat

An LLM plays one game: given everything so far, predict a likely next token (a word-chunk). A bigram model predicts from 1 word of context; a trigram from 2. More context = more coherent โ€” congratulations, you just discovered context windows.

The pipeline (same loop as your Week-2 pencil math): PRETRAIN (predict next-token on an internet of text) โ†’ FINE-TUNE (examples of being helpful) โ†’ FEEDBACK (rated outputs nudge weights).
Your temperature dial ยท count^(1/T)

One number between “boring” and “unhinged”

Temperature reshapes the probabilities before the model picks a word. Low = always grab the most likely word (safe, but loopy). High = flatten everything (wild, then word salad). Drag the dial:

0.3 ยท safe1.0 ยท balanced2.5 ยท wild
Temperature = 1.0
Same model, one dial. You coded this in a single line: count ** (1/T). There’s no separate “creative mode” โ€” just a number.
The hallucination hunt

Fluent text that was never true

Your 30-line model will produce a sentence that sounds right but was never in your corpus and isn’t true. So does ChatGPT โ€” for the exact same reason: it’s stitching a plausible continuation. There’s no truth table anywhere inside.

Law #3 ยท earned today
Plausible and true are different.

The model is never lying and never honest โ€” it can’t tell the difference. Checking is YOUR job: accept / revise / reject, forever.

Attack, then defend

Red-team it โ€” then build the harness

๐Ÿง™

Red-team (the attack)

On gandalf.lakera.ai and your own bots only, try to make an AI confidently wrong or spill a secret. Log the technique class: roleplay framing ยท authority claims ยท indirection ยท incremental extraction.

๐Ÿงช

Eval harness (the defense)

A repeatable, scored test suite: write a spec, 10 cases (2+ must test RULES, not facts), run v1, patch, run v2, v3. Log the pass-rates. This is the difference between “I wrote a prompt” and engineering.

๐Ÿ” Flip it: everything you did as an attacker, someone will try on YOUR capstone bot in two weeks. Tonight you attack; in an hour you build the defense. That flip is the profession.
๐Ÿ’ฌ

Badge: LLM Whisperer

Earn it: working bigram generator with your own temperature dial; a logged red-team technique; eval harness pass-rates across v1โ€“v3.

๐ŸŽ’ Mission โ€” before next session (1โ€“2 hrs)
  • Push your eval harness to 9/10, OR add three adversarial cases from tonight’s red-team and patch to pass.
  • Read the Capstone Menu (Week 4). Arrive with two shortlisted projects and one sentence on the user each serves.