Regularization and Generalization
1. Generalization Problem
A model should perform well on unseen data, not only on the training set.
2. Regularized Objectives
- Ridge: Loss + lambda ||w||_2^2
- Lasso: Loss + lambda ||w||_1
- Elastic net: a weighted combination of the L1 and L2 penalties
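The three penalty terms above can be computed directly; a minimal sketch with numpy, using an illustrative weight vector and a hypothetical mixing parameter alpha for the elastic net:

```python
import numpy as np

# Illustrative weight vector and regularization strength (assumed values).
w = np.array([0.5, -1.2, 0.0, 2.0])
lam = 0.1

l2_penalty = lam * np.sum(w ** 2)        # ridge: lambda * ||w||_2^2
l1_penalty = lam * np.sum(np.abs(w))     # lasso: lambda * ||w||_1

alpha = 0.5  # elastic-net mixing parameter between L1 and L2 (assumed)
elastic_penalty = lam * (alpha * np.sum(np.abs(w))
                         + (1 - alpha) * np.sum(w ** 2))
```

Either penalty is added to the data-fit loss before minimization; only the penalty term differs between the three objectives.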
3. Optimization and Geometry
The L2 penalty shrinks weights smoothly toward zero; the L1 penalty encourages sparsity because the corners of the L1 ball make solutions with exact zeros likely.
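One way to see this difference concretely is through the proximal operators of the two penalties: the L1 prox (soft-thresholding) sets small coordinates exactly to zero, while the L2 prox only rescales. A minimal numpy sketch:

```python
import numpy as np

def prox_l1(w, t):
    # Soft-thresholding: proximal operator of t * ||w||_1.
    # Coordinates with |w_i| <= t become exactly zero (sparsity).
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def prox_l2(w, t):
    # Proximal operator of t * ||w||_2^2: uniform multiplicative
    # shrinkage; no coordinate is ever set exactly to zero.
    return w / (1.0 + 2.0 * t)

w = np.array([0.05, -0.3, 1.5])
sparse = prox_l1(w, 0.1)   # first entry becomes exactly 0.0
smooth = prox_l2(w, 0.1)   # every entry shrunk, none zero
```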
4. Bias-Variance Tradeoff
Increasing regularization:
- reduces variance
- increases bias
The strength lambda must be tuned on a held-out validation set or via cross-validation.
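A minimal sketch of tuning lambda by k-fold cross-validation, using the closed-form ridge solution on synthetic data (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.5 * rng.normal(size=60)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam I)^{-1} X^T y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_error(X, y, lam, k=5):
    # Mean held-out MSE over k folds.
    n = len(y)
    errs = []
    for fold in np.array_split(np.arange(n), k):
        mask = np.ones(n, dtype=bool)
        mask[fold] = False
        w = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return float(np.mean(errs))

lambdas = [0.01, 0.1, 1.0, 10.0]
best_lam = min(lambdas, key=lambda lam: cv_error(X, y, lam))
```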
5. Early Stopping as Implicit Regularization
Halting iterative optimization before the model overfits acts as an implicit regularizer.
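A sketch of early stopping for gradient descent on least squares, monitoring validation error with a simple patience rule (the data, learning rate, and patience value are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 10))
w_true = np.zeros(10)
w_true[:2] = [2.0, -1.0]
y = X @ w_true + rng.normal(size=40)
X_tr, y_tr, X_va, y_va = X[:30], y[:30], X[30:], y[30:]

w = np.zeros(10)
lr = 0.01
best_w, best_err = w.copy(), np.inf
patience, bad_steps = 5, 0

for step in range(500):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad
    val_err = np.mean((X_va @ w - y_va) ** 2)
    if val_err < best_err:
        best_err, best_w, bad_steps = val_err, w.copy(), 0
    else:
        bad_steps += 1
        if bad_steps >= patience:  # validation error stopped improving
            break
```

The returned weights are those with the lowest validation error, not the final iterate, so training is cut off before the fit to training noise degrades generalization.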
6. Worked Example
Compare linear regression with and without a ridge penalty on collinear features; the regularized model stabilizes the coefficients.
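A minimal numpy sketch of this comparison, with two nearly collinear synthetic features (seed and noise levels are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)

# Ordinary least squares: coefficients along the near-null direction
# (x1 - x2) are poorly determined and can be large and opposite-signed.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge: the penalty suppresses the ill-conditioned direction,
# keeping both coefficients near the stable value of 1.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

Both models predict the sum x1 + x2 about equally well; they differ in how that sum is split across the two redundant coefficients.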
7. Practical Workflow
- Split train/validation/test
- Tune lambda on the validation set
- Monitor the train-vs-validation gap
- Report uncertainty and calibration
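The workflow above can be sketched end to end with numpy; the split proportions and lambda grid are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.3 * rng.normal(size=200)

# Split train/validation/test (here a 60/20/20 split, assumed).
idx = rng.permutation(200)
tr, va, te = idx[:120], idx[120:160], idx[160:]

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Tune lambda using the validation set only.
lambdas = [0.01, 0.1, 1.0, 10.0]
best_lam = min(lambdas,
               key=lambda lam: mse(ridge_fit(X[tr], y[tr], lam), X[va], y[va]))

w = ridge_fit(X[tr], y[tr], best_lam)
gap = mse(w, X[va], y[va]) - mse(w, X[tr], y[tr])  # train-vs-validation gap
test_mse = mse(w, X[te], y[te])  # touch the test set once, at the end
```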
8. Exercises
- Derive closed form of ridge solution.
- Explain why lasso can perform feature selection.
- Plot validation error versus lambda and select the optimal value.
- Compare early stopping vs explicit L2 on same dataset.