Numerical Optimization in Practice
1. Problem Setup
Minimize an objective f(x) over decision variables x, possibly subject to constraints.
2. First-Order Methods
Gradient descent update: x_{k+1}=x_k - alpha_k grad f(x_k).
Convergence depends on the step-size strategy and on smoothness assumptions such as Lipschitz-continuous gradients.
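The update above can be sketched on a toy quadratic. This is a minimal illustration, not a production solver; the matrix A, vector b, and step size alpha are illustrative choices, not from the text.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, with gradient A x - b.
A = np.array([[3.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])

def grad_f(x):
    return A @ x - b

x = np.zeros(2)
alpha = 0.25  # fixed step; must stay below 2 / lambda_max(A) = 2/3 here
for _ in range(200):
    x = x - alpha * grad_f(x)

# The minimizer solves A x = b, so x should approach [1/3, 1].
```

With a fixed step, the per-coordinate error contracts by |1 - alpha * lambda_i| each iteration, which is why the step size is tied to the largest eigenvalue.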
3. Second-Order Methods
Newton update: x_{k+1} = x_k - H^{-1}(x_k) grad f(x_k), where H is the Hessian.
Converges quadratically near a solution, but each iteration is expensive (forming and solving with the Hessian), and the method breaks down when the Hessian is singular or indefinite.
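A minimal one-dimensional sketch of the Newton update, using the illustrative function f(x) = exp(x) - 2x (not from the text), whose minimizer is x* = ln 2:

```python
import math

# f(x) = exp(x) - 2x:  f'(x) = exp(x) - 2,  f''(x) = exp(x).
x = 0.0
for _ in range(6):
    g = math.exp(x) - 2.0   # gradient
    h = math.exp(x)         # Hessian (a scalar in 1-D)
    x = x - g / h           # Newton step: x - H^{-1} g

# After a few iterations x is accurate to machine precision,
# illustrating the local quadratic convergence mentioned above.
```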
4. Line Search and Trust Regions
- Line search picks an acceptable step length along a chosen descent direction (e.g., one satisfying the Armijo condition).
- Trust region restricts the step to a neighborhood where the local model is trusted to be accurate.
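The line-search idea above can be sketched as backtracking under the Armijo condition. This is one common strategy, not the only one; the parameter names rho and c are conventional defaults assumed here.

```python
import numpy as np

def backtracking(f, grad, x, d, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until f(x + alpha d) <= f(x) + c * alpha * grad(x)^T d."""
    alpha = alpha0
    fx, gx = f(x), grad(x)
    while f(x + alpha * d) > fx + c * alpha * (gx @ d):
        alpha *= rho
    return alpha

# Usage on f(x) = ||x||^2 with the steepest-descent direction d = -grad f.
f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([2.0, -1.0])
d = -grad(x)
alpha = backtracking(f, grad, x, d)  # halved once, from 1.0 to 0.5
```

Because d is a descent direction (grad(x)^T d < 0), the loop is guaranteed to terminate for smooth f.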
5. Constrained Optimization
Common numerical techniques:
- projected gradient,
- penalty methods,
- barrier methods,
- augmented Lagrangian.
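Projected gradient, the first technique listed, is the easiest to sketch: take a gradient step, then project back onto the feasible set. The problem below (minimize ||x - c||^2 subject to x >= 0) is an illustrative choice where the projection is elementwise clipping.

```python
import numpy as np

c = np.array([1.0, -2.0])

def project(x):
    # Euclidean projection onto the nonnegative orthant x >= 0.
    return np.maximum(x, 0.0)

x = np.zeros(2)
alpha = 0.4
for _ in range(100):
    g = 2 * (x - c)              # gradient of ||x - c||^2
    x = project(x - alpha * g)   # gradient step, then project

# The solution clips the negative component of c: x* = [1, 0].
```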
6. Diagnostics and Failure Modes
Track:
- objective value,
- gradient norm,
- step norm,
- constraint violation.
Failure patterns:
- divergence (step too large),
- stagnation (ill-conditioning),
- oscillation (poor scaling),
- infeasible drift (constraint handling issue).
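One way to instrument a solver with the quantities listed above is to record them each iteration. This sketch runs gradient descent on a deliberately ill-conditioned quadratic; the names history, A, and b are illustrative.

```python
import numpy as np

A = np.diag([10.0, 1.0])   # condition number 10: stagnation-prone
b = np.array([1.0, 1.0])
x = np.zeros(2)
alpha = 0.09               # must stay below 2/10 for stability
history = []
for k in range(300):
    g = A @ x - b
    step = -alpha * g
    x = x + step
    history.append({
        "f": 0.5 * x @ A @ x - b @ x,    # objective value
        "grad_norm": np.linalg.norm(g),  # gradient norm
        "step_norm": np.linalg.norm(step),
    })

# A gradient norm that plateaus while f barely decreases suggests
# stagnation from ill-conditioning; a growing step norm suggests divergence.
```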
7. Worked Example Outline
L2-regularized logistic regression: min_w sum_i log(1 + exp(-y_i w^T x_i)) + lambda ||w||^2.
Use mini-batch gradients with a decaying learning rate and monitor validation loss.
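The worked example can be sketched as follows. The synthetic data, batch size, and decay schedule are illustrative assumptions, and validation-loss monitoring is omitted for brevity.

```python
import numpy as np

# Synthetic linearly separable data with label noise (illustrative).
rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = np.where(X @ w_true + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)

lam = 0.01
w = np.zeros(d)
batch = 32
for k in range(2000):
    idx = rng.integers(0, n, size=batch)
    Xb, yb = X[idx], y[idx]
    # Gradient of mean_i log(1 + exp(-y_i w^T x_i)) + lam ||w||^2:
    s = 1.0 / (1.0 + np.exp(yb * (Xb @ w)))  # sigmoid(-y_i w^T x_i)
    g = -(yb * s) @ Xb / batch + 2 * lam * w
    alpha = 1.0 / (1.0 + 0.01 * k)           # decaying learning rate
    w = w - alpha * g

# w should align in direction with w_true; the regularizer shrinks its scale.
```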
Exercises
- Implement gradient descent with backtracking line search.
- Compare convergence of GD vs Newton on quadratic objective.
- Add L2 regularization and inspect conditioning effect.
- Define robust stopping rules for noisy stochastic objective.
Summary
Numerical optimization is the bridge between mathematical objective definitions and trainable systems.