Multivariable Calculus for ML
1. Partial Derivatives
For f(x1,...,xn), the partial derivative df/dxi is the derivative with respect to xi while all other variables are held fixed.
2. Gradient
nabla f = [df/dx1,...,df/dxn]^T.
Interpretation:
- points in the direction of steepest ascent
- magnitude gives the local rate of steepest increase
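A minimal numerical sketch (assuming NumPy; the helper name `numerical_gradient` is illustrative): each component of the gradient can be approximated by a central difference in that coordinate alone, which is a standard way to sanity-check an analytic gradient.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Approximate nabla f at x by central differences, one coordinate at a time."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h                      # perturb only coordinate i, others held fixed
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# The worked example from section 7: f(x,y) = x^2 + 3xy + y^2.
f = lambda v: v[0] ** 2 + 3 * v[0] * v[1] + v[1] ** 2
g = numerical_gradient(f, np.array([1.0, 2.0]))  # analytic: [2x+3y, 3x+2y] = [8, 7]
```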
3. Directional Derivative
Along unit vector u:
D_u f = nabla f · u.
The maximum value is ||nabla f||, attained when u is parallel to nabla f.
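A short sketch of the two facts above (assuming NumPy; the point p and direction u are arbitrary test values): the directional derivative is a dot product with the gradient, and it is bounded by the gradient norm.

```python
import numpy as np

# f(x,y) = x^2 + y^2 has analytic gradient [2x, 2y].
grad = lambda p: np.array([2 * p[0], 2 * p[1]])

p = np.array([1.0, 2.0])
u = np.array([3.0, 4.0])
u = u / np.linalg.norm(u)           # directional derivatives use a unit vector

D_u = grad(p) @ u                   # D_u f = nabla f . u
max_rate = np.linalg.norm(grad(p))  # attained when u points along the gradient
```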
4. Hessian Matrix
H_ij = d^2f/(dxi dxj).
Second-order structure:
- positive definite -> local minimum
- negative definite -> local maximum
- indefinite -> saddle point
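The classification above can be sketched by inspecting eigenvalue signs (a minimal sketch assuming NumPy and a symmetric Hessian; the function name `classify` and the tolerance are illustrative):

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a critical point from the eigenvalues of a symmetric Hessian."""
    w = np.linalg.eigvalsh(H)          # real eigenvalues, ascending order
    if np.all(w > tol):
        return "local minimum"         # positive definite
    if np.all(w < -tol):
        return "local maximum"         # negative definite
    if np.any(w > tol) and np.any(w < -tol):
        return "saddle"                # indefinite
    return "inconclusive"              # semidefinite: higher-order terms decide

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))  # local minimum
print(classify(np.array([[2.0, 3.0], [3.0, 2.0]])))  # saddle (eigenvalues 5, -1)
```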
5. Jacobian Matrix
For a vector function F: R^n -> R^m, the Jacobian is the m x n matrix J_ij = dF_i/dx_j of first partial derivatives of each output component with respect to each input.
Backpropagation in neural nets is repeated Jacobian-chain multiplication.
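A minimal sketch of that chain of Jacobians (assuming NumPy; F, g, and the test point are made-up examples): for a scalar-valued composition h(x) = g(F(x)), the gradient is the vector-Jacobian product nabla g(F(x))^T J_F(x), which is exactly what backpropagation computes layer by layer.

```python
import numpy as np

def F(x):                        # F: R^2 -> R^2, F(x) = [x0*x1, x0 + x1]
    return np.array([x[0] * x[1], x[0] + x[1]])

def J_F(x):                      # Jacobian of F: rows index outputs, columns inputs
    return np.array([[x[1], x[0]],
                     [1.0,  1.0]])

def grad_g(y):                   # g(y) = y0^2 + y1, so nabla g = [2*y0, 1]
    return np.array([2 * y[0], 1.0])

x = np.array([2.0, 3.0])
grad_h = grad_g(F(x)) @ J_F(x)   # vector-Jacobian product, output toward input
# Analytic check: h = (x0*x1)^2 + x0 + x1, so nabla h = [2*x0*x1^2 + 1, 2*x0^2*x1 + 1].
```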
6. Taylor Approximation (Multivariate)
f(x+Delta) ~= f(x) + nabla f(x)^T Delta + 1/2 Delta^T H(x) Delta.
This underlies Newton and quasi-Newton methods.
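A sketch of both claims on the section-7 function (assuming NumPy; the expansion point and step are arbitrary test values): for a quadratic the second-order Taylor model is exact, and the Newton step Delta = -H^{-1} nabla f jumps straight to the model's stationary point.

```python
import numpy as np

# f(x,y) = x^2 + 3xy + y^2 with its analytic gradient and (constant) Hessian.
f = lambda v: v[0] ** 2 + 3 * v[0] * v[1] + v[1] ** 2
grad = lambda v: np.array([2 * v[0] + 3 * v[1], 3 * v[0] + 2 * v[1]])
H = np.array([[2.0, 3.0], [3.0, 2.0]])

x0 = np.array([1.0, 1.0])
delta = np.array([0.1, -0.2])

# Second-order Taylor model: f(x0) + nabla f(x0)^T Delta + 1/2 Delta^T H Delta.
taylor = f(x0) + grad(x0) @ delta + 0.5 * delta @ H @ delta
exact = f(x0 + delta)            # equal for a quadratic: no higher-order terms

newton_step = -np.linalg.solve(H, grad(x0))  # lands on the critical point (origin)
```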
7. Worked Example
For f(x,y) = x^2 + 3xy + y^2:
- gradient: [2x+3y, 3x+2y]^T
- Hessian: [[2,3],[3,2]]
The Hessian eigenvalues are 5 and -1; since they have mixed signs, the only critical point (the origin) is a saddle.
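The worked example can be double-checked numerically (a sketch assuming NumPy; the helper name `numerical_hessian` and the test point are illustrative): a finite-difference Hessian should reproduce [[2,3],[3,2]] at any point, since the function is quadratic.

```python
import numpy as np

f = lambda v: v[0] ** 2 + 3 * v[0] * v[1] + v[1] ** 2

def numerical_hessian(f, x, h=1e-4):
    """H_ij = d^2 f / (dx_i dx_j) via central differences on each pair."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

H = numerical_hessian(f, np.array([1.0, 2.0]))
eigvals = np.linalg.eigvalsh(H)  # close to [-1, 5]: mixed signs, so a saddle
```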
Exercises
- Compute gradient/Hessian for logistic regression loss.
- Show the chain rule for the composition f(g(x)) in vector form.
- Classify critical points for a two-variable quadratic.
- Explain role of Hessian conditioning in optimization speed.