Floating-Point Arithmetic, Error, and Stability
1. Machine Numbers
Real numbers are approximated by finite binary encodings. IEEE-754 double precision stores a sign bit, an 11-bit exponent, and a 52-bit mantissa (significand).
Not all decimals are exactly representable (0.1 is a classic example).
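A quick check in Python makes the representation gap visible (a minimal sketch; `Decimal` is used here only to display the value actually stored for 0.1):

```python
from decimal import Decimal

# 0.1 has no exact binary representation, so decimal arithmetic drifts:
a = 0.1 + 0.2
print(a)            # 0.30000000000000004, not 0.3
print(a == 0.3)     # False

# The double nearest to 0.1 is a nearby binary fraction:
print(Decimal(0.1))
```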
2. Rounding Model
A computed operation can be idealized as:
fl(x op y) = (x op y)(1 + delta) with |delta| <= u,
where u is the unit roundoff (machine precision).
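For double precision, the relevant constants can be inspected directly; a small sketch:

```python
import sys

# Machine epsilon for IEEE-754 doubles is 2**-52;
# the unit roundoff u in the model above is half of that.
eps = sys.float_info.epsilon
print(eps == 2.0 ** -52)   # True

# Adding u to 1 rounds back to 1, consistent with |delta| <= u:
u = eps / 2
print(1.0 + u == 1.0)      # True
```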
3. Error Types
- Absolute error: |x - xhat|
- Relative error: |x - xhat| / |x|
- Forward error: error in the final answer.
- Backward error: smallest perturbation to the input making the output exact.
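A minimal sketch computing absolute and relative error for a toy approximation (the "true" value is known here only for illustration):

```python
import math

x_exact = math.sqrt(2.0)   # treat as the true value
x_hat = 1.4142             # an approximation of it

abs_err = abs(x_exact - x_hat)
rel_err = abs_err / abs(x_exact)
print(abs_err, rel_err)    # both on the order of 1e-5
```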
4. Catastrophic Cancellation
Subtracting nearly equal numbers destroys significant digits.
Example: f(x) = sqrt(x+1) - sqrt(x) for large x. The direct formula is unstable; the rationalized form is better:
f(x)=1/(sqrt(x+1)+sqrt(x)).
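Another standard instance of cancellation, not covered in the notes above, is the quadratic formula when b^2 >> 4ac: the smaller root loses its digits in the textbook formula, but can be recovered from the product of the roots:

```python
import math

# Roots of x^2 - 1e8*x + 1 = 0. The small root suffers cancellation
# in the textbook formula; the identity x1 * x2 = c/a avoids it.
a, b, c = 1.0, -1e8, 1.0
disc = math.sqrt(b * b - 4.0 * a * c)

x1 = (-b + disc) / (2.0 * a)     # large root: computed accurately
x2_naive = (-b - disc) / (2.0 * a)   # small root: catastrophic cancellation
x2_stable = c / (a * x1)             # small root via product of roots

print(x2_naive, x2_stable)       # stable form is close to 1e-8
```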
5. Conditioning vs Stability
- Conditioning: property of problem sensitivity.
- Stability: property of algorithm.
A well-conditioned problem can still yield bad answers with an unstable algorithm.
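A classic illustration of this distinction (a sketch, using the population variance of a small illustrative data set): the one-pass variance formula mean(x^2) - mean(x)^2 is unstable on data with a large mean, even though the variance itself is a well-conditioned quantity here.

```python
# Data with small spread but a large mean; true variance is 1.25.
data = [1e8 + v for v in (1.0, 2.0, 3.0, 4.0)]
n = len(data)
mean = sum(data) / n

# Two-pass formula: subtract the mean first, then square. Stable.
var_two_pass = sum((v - mean) ** 2 for v in data) / n

# One-pass formula: subtracts two nearly equal numbers of size ~1e16.
var_one_pass = sum(v * v for v in data) / n - mean * mean

print(var_two_pass, var_one_pass)   # two-pass gives 1.25; one-pass does not
```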
6. Theorem (Error Amplification Intuition)
If condition number kappa is large, relative input error eps may produce output error roughly kappa*eps.
Proof idea: first-order perturbation analysis.
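The amplification can be observed numerically for f(x) = x - 1 near x = 1, whose condition number is kappa = |x / (x - 1)| (a sketch with illustrative values):

```python
# Perturb the input by a relative eps and compare the output's
# relative change against the kappa * eps prediction.
x = 1.000001
eps = 1e-10
x_pert = x * (1.0 + eps)

kappa = abs(x / (x - 1.0))          # about 1e6 here
out_rel_err = abs((x_pert - 1.0) - (x - 1.0)) / abs(x - 1.0)

print(kappa * eps, out_rel_err)     # same order of magnitude
```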
7. Practical Guidelines
- Avoid direct subtraction of nearly equal values.
- Normalize or scale data before solving.
- Use robust library routines (solve, lstsq, svd).
- Check residuals and condition numbers.
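The last two guidelines can be sketched together (assumes NumPy; the matrix here is random and purely illustrative):

```python
import numpy as np

# Solve A x = b with a library routine, then sanity-check the result
# using the residual and the condition number.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

x = np.linalg.solve(A, b)
residual = np.linalg.norm(A @ x - b)
kappa = np.linalg.cond(A)
print(residual, kappa)
# A small residual plus a modest kappa suggests a trustworthy x; a huge
# kappa means even a tiny residual can hide a large forward error.
```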
8. Worked Example (Python)
import math

x = 1e16
naive = math.sqrt(x + 1) - math.sqrt(x)            # cancellation: all digits lost
stable = 1.0 / (math.sqrt(x + 1) + math.sqrt(x))   # accurate: about 5e-9
print(naive, stable)
Exercises
- Show by computation that (1e16 + 1) - 1e16 loses precision.
- Compare the naive and stable forms of log(1+x) near x = 0.
- Explain when relative error is preferred over absolute error.
- Define backward stability in your own words.