Multivariable Calculus for ML

1. Partial Derivatives

For f(x1,...,xn), the partial derivative with respect to xi is obtained by differentiating in xi while holding all other variables fixed.
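A minimal numeric sketch: a central-difference estimate of a partial derivative, using the function f(x,y) = x^2 + 3xy + y^2 from the worked example below. The helper name `partial` is a hypothetical choice for illustration.

```python
import numpy as np

def partial(f, x, i, h=1e-6):
    """Central-difference estimate of df/dx_i: perturb only coordinate i,
    holding the others fixed."""
    e = np.zeros_like(x, dtype=float)
    e[i] = h
    return (f(x + e) - f(x - e)) / (2 * h)

f = lambda x: x[0]**2 + 3*x[0]*x[1] + x[1]**2
x = np.array([1.0, 2.0])
dfdx = partial(f, x, 0)   # analytically 2x + 3y = 8 at (1, 2)
```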

2. Gradient

nabla f = [df/dx1,...,df/dxn]^T.

Interpretation:
- points in the direction of steepest ascent
- its magnitude gives the local steepness
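As a sketch, the full gradient can be assembled coordinate by coordinate from the same central differences; the helper name `grad` is a hypothetical choice, and f is again the worked example's function.

```python
import numpy as np

def grad(f, x, h=1e-6):
    """Numeric gradient: one central difference per coordinate."""
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

f = lambda x: x[0]**2 + 3*x[0]*x[1] + x[1]**2
g = grad(f, np.array([1.0, 2.0]))   # analytically [2x+3y, 3x+2y] = [8, 7]
```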

3. Directional Derivative

Along unit vector u:

D_u f = nabla f · u.

The maximum over all unit vectors u is ||nabla f||, attained when u points along the gradient.
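A small check of both claims, assuming the gradient [8, 7] of the worked example's function at (1, 2): the dot product gives the directional derivative, and the normalized gradient attains the maximum ||nabla f||.

```python
import numpy as np

g = np.array([8.0, 7.0])           # gradient of f(x,y)=x^2+3xy+y^2 at (1, 2)
u = np.array([1.0, 0.0])           # an arbitrary unit direction
D_u = g @ u                        # directional derivative along u

u_star = g / np.linalg.norm(g)     # unit vector along the gradient
D_max = g @ u_star                 # equals ||g||, the largest possible value
```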

4. Hessian Matrix

H_ij = d^2f/(dxi dxj).

Second-order structure:
- positive definite -> local minimum
- negative definite -> local maximum
- indefinite -> saddle point
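Definiteness can be read off the eigenvalues of the (symmetric) Hessian. A sketch using the Hessian from the worked example below:

```python
import numpy as np

H = np.array([[2.0, 3.0], [3.0, 2.0]])   # Hessian of f(x,y)=x^2+3xy+y^2
eig = np.linalg.eigvalsh(H)              # symmetric matrix -> real eigenvalues

if np.all(eig > 0):
    kind = "local minimum"      # positive definite
elif np.all(eig < 0):
    kind = "local maximum"      # negative definite
else:
    kind = "saddle"             # mixed signs -> indefinite
```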

5. Jacobian Matrix

For a vector-valued function F: R^n -> R^m, the Jacobian is the m x n matrix whose (i, j) entry is dF_i/dx_j, the first partial derivative of the i-th output with respect to the j-th input.

Backpropagation in neural nets is repeated Jacobian-chain multiplication.
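A minimal sketch of the chain rule as a Jacobian product, J_h(x) = J_f(g(x)) @ J_g(x) for h = f ∘ g; the toy functions g and f here are hypothetical illustrations, not from the text.

```python
import numpy as np

# Inner map g: R^2 -> R^2 and its 2x2 Jacobian.
def g(x):  return np.array([x[0] * x[1], x[0] + x[1]])
def Jg(x): return np.array([[x[1], x[0]], [1.0, 1.0]])

# Outer map f: R^2 -> R and its 1x2 (row) Jacobian.
def f(y):  return y[0]**2 + y[1]
def Jf(y): return np.array([[2 * y[0], 1.0]])

x = np.array([1.0, 2.0])
# Backprop multiplies Jacobians from the output back toward the input.
J_chain = Jf(g(x)) @ Jg(x)
```

Differentiating h(x) = (x0*x1)^2 + x0 + x1 directly gives [2*x0*x1^2 + 1, 2*x0^2*x1 + 1] = [9, 5] at (1, 2), matching the product.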

6. Taylor Approximation (Multivariate)

f(x+Delta) ~= f(x) + nabla f(x)^T Delta + 1/2 Delta^T H(x) Delta.

This underlies Newton and quasi-Newton methods.
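A sketch of a single Newton step, which minimizes the quadratic model by solving H d = -nabla f. Because the worked example's f is exactly quadratic, one step lands on its critical point (here a saddle at the origin, which is why practical methods check the Hessian's definiteness).

```python
import numpy as np

# Gradient and (constant) Hessian of f(x,y) = x^2 + 3xy + y^2.
f_grad = lambda x: np.array([2*x[0] + 3*x[1], 3*x[0] + 2*x[1]])
H = np.array([[2.0, 3.0], [3.0, 2.0]])

x = np.array([1.0, 2.0])
d = np.linalg.solve(H, -f_grad(x))   # Newton direction: solve H d = -grad
x_new = x + d                        # lands on the critical point of the model
```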

7. Worked Example

For f(x,y) = x^2 + 3xy + y^2:
- gradient: [2x + 3y, 3x + 2y]^T
- Hessian: [[2, 3], [3, 2]]
The Hessian eigenvalues are 5 and -1; the mixed signs mean the Hessian is indefinite, so the critical point at the origin is a saddle.
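The worked example can be checked symbolically; a sketch assuming SymPy is available:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + 3*x*y + y**2

grad = [sp.diff(f, v) for v in (x, y)]   # [2x + 3y, 3x + 2y]
H = sp.hessian(f, (x, y))                # Matrix([[2, 3], [3, 2]])
eigs = H.eigenvals()                     # {5: 1, -1: 1} -> indefinite -> saddle
```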

Exercises

  1. Compute gradient/Hessian for logistic regression loss.
  2. Show chain rule for composition f(g(x)) in vector form.
  3. Classify critical points for a two-variable quadratic.
  4. Explain role of Hessian conditioning in optimization speed.