Least Squares and Conditioning

1. Least Squares Problem

Given an overdetermined system Ax ≈ b (more equations than unknowns), solve:

min_x ||Ax - b||_2^2.
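A minimal NumPy sketch of solving such a problem; the matrix and right-hand side here are illustrative data, not from the text:

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns (illustrative data).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.9, 5.1, 7.0])

# np.linalg.lstsq minimizes ||Ax - b||_2 using an SVD-based solver.
x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)  # minimizer of the squared residual
```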

2. Normal Equations

Optimality condition:

A^T A x = A^T b.

Geometrically: the residual r = b - Ax is orthogonal to the column space of A.
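The orthogonality condition can be checked numerically; this sketch solves the normal equations directly on a small well-conditioned example (data invented for illustration):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.9, 5.1, 7.0])

# Solve the normal equations A^T A x = A^T b directly.
x = np.linalg.solve(A.T @ A, A.T @ b)

# At the optimum the residual is orthogonal to every column of A.
r = b - A @ x
print(np.linalg.norm(A.T @ r))  # ~ machine precision, not exactly zero
```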

3. Theorem

If the columns of A are linearly independent (full column rank), the least-squares solution is unique.

Proof Sketch

A^T A is symmetric positive definite when A has full column rank, so the normal equations have a unique solution.
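Positive definiteness can be witnessed computationally: Cholesky factorization succeeds exactly for symmetric positive definite matrices. A small sketch with a random full-column-rank matrix:

```python
import numpy as np

# Random 10x3 matrix; with probability 1 it has full column rank.
A = np.random.default_rng(0).normal(size=(10, 3))
G = A.T @ A

# Cholesky would raise LinAlgError if G were not positive definite.
L = np.linalg.cholesky(G)

# Equivalently, all eigenvalues of G are strictly positive.
print(np.linalg.eigvalsh(G))
```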

4. Numerical Caution

The normal equations square the condition number: kappa(A^T A) = kappa(A)^2 for full column rank, which can degrade accuracy. Prefer QR- or SVD-based solvers in practice.
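The squaring effect is visible on an ill-conditioned Vandermonde matrix; this sketch compares the normal equations against a QR solve (the polynomial degree and grid are arbitrary choices):

```python
import numpy as np

t = np.linspace(0, 1, 50)
A = np.vander(t, 12, increasing=True)   # ill-conditioned Vandermonde
x_true = np.ones(12)
b = A @ x_true

# kappa(A^T A) = kappa(A)^2: forming A^T A squares the conditioning.
print(np.linalg.cond(A), np.linalg.cond(A.T @ A))

# Normal equations vs. QR factorization.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# The normal-equations error is typically orders of magnitude larger.
print(np.linalg.norm(x_ne - x_true), np.linalg.norm(x_qr - x_true))
```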

5. Conditioning and Sensitivity

A large condition number kappa(A) means small perturbations in the data can cause large changes in the fitted parameters.

Ridge regularization improves conditioning: solve (A^T A + lambda I)x = A^T b with lambda > 0.
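Adding lambda I lifts the smallest eigenvalue of A^T A, so the condition number strictly decreases. A sketch on the same kind of Vandermonde matrix (the lambda value is an arbitrary illustration):

```python
import numpy as np

t = np.linspace(0, 1, 50)
A = np.vander(t, 12, increasing=True)
b = np.sin(2 * np.pi * t)

G = A.T @ A
lam = 1e-3

# Shifting eigenvalues mu_i -> mu_i + lambda lowers mu_max/mu_min.
print(np.linalg.cond(G), np.linalg.cond(G + lam * np.eye(12)))

# Ridge solution via the regularized normal equations.
x_ridge = np.linalg.solve(G + lam * np.eye(12), A.T @ b)
```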

6. Worked Example

Linear regression with correlated features:

  • unregularized coefficients are unstable
  • ridge stabilizes estimates and prediction behavior
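A synthetic sketch of this effect, with two nearly collinear features and true coefficients (1, 1); the noise levels and lambda are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
f1 = rng.normal(size=n)
f2 = f1 + 1e-3 * rng.normal(size=n)     # nearly collinear with f1
A = np.column_stack([f1, f2])
y = f1 + f2 + 0.1 * rng.normal(size=n)  # true coefficients are (1, 1)

# Unregularized least squares: coefficients can be wildly split
# between the two collinear features.
x_ols = np.linalg.lstsq(A, y, rcond=None)[0]

# Ridge pulls both coefficients back toward the stable answer (1, 1).
lam = 1.0
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ y)
print(x_ols, x_ridge)
```

Note that OLS always achieves the smaller training residual; ridge trades a little fit for stability of the coefficients.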

7. Practical Checklist

  • check singular values / condition number
  • inspect residual norm and validation error
  • use robust solvers (lstsq, QR, SVD)
  • regularize if ill-conditioned
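The checklist above can be bundled into a small diagnostic helper; the function name and the condition-number threshold are hypothetical choices, not a standard API:

```python
import numpy as np

def diagnose(A, b, tol=1e8):
    """Solve least squares and report conditioning diagnostics.

    tol is a heuristic threshold above which the problem is
    flagged as ill-conditioned (assumption, tune per application).
    """
    x, _, rank, sv = np.linalg.lstsq(A, b, rcond=None)  # sv: descending
    kappa = sv[0] / sv[-1] if sv[-1] > 0 else np.inf
    resid = np.linalg.norm(A @ x - b)
    return x, kappa, resid, kappa > tol

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.9, 5.1, 7.0])
x, kappa, resid, ill = diagnose(A, b)
print(kappa, resid, ill)
```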

Exercises

  1. Solve least squares via normal equations and QR; compare numeric error.
  2. Construct near-collinear design matrix and observe instability.
  3. Show ridge reduces condition number numerically.
  4. Prove orthogonality of residual to column space at optimum.