Four multiplications and an average per step — that’s the entire engine of deep learning. You’ll leave saying: “I did the math the machine does — by hand.”
Do with a pencil what every GPU does billions of times a second.
The U-curve: why a model that aces practice can fail the real test.
Produce bias on purpose with a data filter, then catch it with a metric.
Our data: x = [1, 2, 3, 4], y = [2.1, 3.9, 6.2, 7.8] (so y ≈ 2x). Our model: prediction = w · x. We start at w = 0 with learning rate 0.05. Watch w climb toward 2 all by itself.
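To check your pencil work, here is a minimal sketch of three gradient-descent steps on exactly this data. It assumes the loss is mean squared error, mean((w·x − y)²), so the gradient is the average of 2·(w·x − y)·x over the four points. The text does not spell out the loss, so if the notebook drops the factor of 2 its numbers will differ. The names `gradient` and `history` are illustrative, not taken from the notebook.

```python
# Data, starting weight, and learning rate from the text.
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]
w = 0.0
learning_rate = 0.05


def gradient(w, xs, ys):
    """Slope of the loss with respect to w.

    Assumption: loss = mean((w*x - y)**2), so
    d(loss)/dw = mean(2 * (w*x - y) * x).
    """
    terms = [2 * (w * x - y) * x for x, y in zip(xs, ys)]  # one product per data point
    return sum(terms) / len(terms)                          # then the average


history = [w]  # w before any step, then after each step
for step in range(1, 4):
    g = gradient(w, xs, ys)
    w = w - learning_rate * g  # step downhill, against the gradient
    history.append(w)
    print(f"step {step}: gradient = {g:.3f}, w = {w:.3f}")
```

Under that assumption w moves from 0 to about 1.49, then about 1.87, then about 1.96, and it would settle near 1.99 if you kept going.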
The learning rate is your step size downhill. Tap each to see what it does to training.
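If you want to try the same comparison outside the page, the sketch below runs 20 steps at three learning rates on the four data points from this lesson. Only 0.05 comes from the text; 0.001 and 0.2 are illustrative choices for "too small" and "too large". As before, the mean-squared-error gradient is an assumption, and `train` is a hypothetical helper.

```python
# Same four data points as the hand-worked example.
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]


def train(learning_rate, steps=20, w=0.0):
    """Run plain gradient descent on prediction = w * x and return the final w.

    Assumption: the loss is mean((w*x - y)**2).
    """
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w


# 0.05 is the lesson's value; the other two are illustrative extremes.
results = {lr: train(lr) for lr in (0.001, 0.05, 0.2)}
for lr, final_w in results.items():
    print(f"learning rate {lr}: w after 20 steps = {final_w:.4g}")
```

With the tiny rate, w is still far below 2 after 20 steps. With 0.05 it has settled. With 0.2 each step overshoots by more than the last, so w swings ever further from the answer instead of approaching it.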
As you make a model more powerful, its error on the training data always falls. But its error on new data falls, hits a sweet spot, then rises: the model started memorizing noise instead of learning patterns. That’s overfitting.
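One way to see the curve yourself is to fit polynomials of increasing degree to a small noisy dataset and compare error on the training points with error on fresh points. The sketch below is an illustration under assumptions, not the notebook's code: the sine-plus-noise data, the sample sizes, and the degree range are all made up for the demo, and "more powerful" here just means a higher polynomial degree.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def make_data(n):
    """Hypothetical data: a smooth pattern (a sine wave) plus random noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y


x_train, y_train = make_data(15)   # small training set, easy to memorize
x_test, y_test = make_data(200)    # "new data" the model never saw

degrees = list(range(1, 10))
train_err, test_err = [], []
for d in degrees:
    coeffs = np.polyfit(x_train, y_train, d)  # least-squares fit of degree d
    train_err.append(float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)))
    test_err.append(float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)))
    print(f"degree {d}: train MSE = {train_err[-1]:.3f}, test MSE = {test_err[-1]:.3f}")
```

Reading the printout top to bottom, the training column should keep shrinking, while the test column typically bottoms out at a middle degree and then climbs again. That middle degree is the sweet spot.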
In the notebook you deliberately starve digit 8 down to just 5 training examples, then measure. Tap to reveal what the metrics say.
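The two moves in this experiment, the filter and the metric, can each be sketched in a few lines. The helper names `starve_class` and `recall_for` are hypothetical and the notebook may structure this differently. The labels and predictions at the bottom are hand-made toy values chosen only to show the arithmetic. They are not results from any model.

```python
import numpy as np


def starve_class(X, y, label, keep, seed=0):
    """Keep only `keep` randomly chosen examples of `label`; leave other classes alone."""
    rng = np.random.default_rng(seed)
    idx_label = np.flatnonzero(y == label)
    idx_other = np.flatnonzero(y != label)
    kept = rng.choice(idx_label, size=min(keep, idx_label.size), replace=False)
    idx = np.sort(np.concatenate([idx_other, kept]))
    return X[idx], y[idx]


def recall_for(y_true, y_pred, label):
    """Of the examples that truly are `label`, what fraction did the model catch?"""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mask = y_true == label
    if not mask.any():
        return float("nan")  # recall is undefined when the class never appears
    return float(np.mean(y_pred[mask] == label))


# The filter: a toy label set with 20 examples per digit, then digit 8 cut to 5.
y_all = np.repeat(np.arange(10), 20)
X_all = np.arange(y_all.size).reshape(-1, 1)  # placeholder features
X_starved, y_starved = starve_class(X_all, y_all, label=8, keep=5)
print("8s before:", int(np.sum(y_all == 8)), "after:", int(np.sum(y_starved == 8)))

# The metric: hand-made predictions where overall accuracy hides a weak class.
y_true = [8, 8, 8, 8, 3, 3, 0, 0]
y_pred = [8, 3, 3, 0, 3, 3, 0, 0]
accuracy = float(np.mean(np.array(y_true) == np.array(y_pred)))
recall_8 = recall_for(y_true, y_pred, 8)
print(f"accuracy = {accuracy:.3f}, recall for 8 = {recall_8:.2f}")
```

In the toy predictions, overall accuracy is 5 out of 8 while recall for digit 8 is only 1 out of 4. A single overall number can look acceptable while one group is being failed, which is why the metric has to be computed per class.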
You produced it on purpose with a data filter, and fixed it with data. Which is exactly why “we didn’t mean to” is not a defense at a company that never measured per-group.
These are real. For each, ask the Week-2 question: which metric, computed for which group, would have exposed this BEFORE launch? Tap to open.
Earn it: 3 gradient-descent steps by hand matching the notebook; U-curve explained; recall-for-8 measured before & after the fix.