Least Squares - Math 311 (Narcowich)

Least Squares Problems

Least squares problems and inner products. One of the standard problems in 3D geometry is to find the distance from a point to a plane. Many problems encountered in applications can be put into a similar "geometric" form, with the point replaced by a vector v and the plane by a subspace W of an inner product space V. The resulting problem is called the least-squares problem. The object is to find both the distance from v to W, which is the minimum of || v - w || over all vectors w in W, and a minimizer w₀ in W. The key to solving the problem is the following theorem.

Theorem. Let V be a vector space with an inner product < u, v >, and let W be a subspace of V. A vector w₀ in W minimizes the distance || v - w || if and only if w₀ satisfies the equation,
(∗) < v - w₀, w > = 0,
which holds for all w in W. In addition, w₀ is unique.
Proof. Let's first show that if w₀ in W minimizes || v - w ||, then it satisfies (∗), which is usually called the normal equations. Fix u in W with ||u|| = 1, and let t ∈ R. Define
p(t) := || v - w₀ + t u ||² = || v - w₀ ||² + 2t < v - w₀, u > + t² || u ||² = || v - w₀ ||² + 2t < v - w₀, u > + t² = t² + 2Bt + C,
where B = < v - w₀, u > and C = || v - w₀ ||².
Because w₀ minimizes || v - w ||² over all w in W, and v - w₀ + tu = v - (w₀ - tu) with w₀ - tu in W, the minimum of p(t) is at t = 0. This means that t = 0 is a critical point for p(t), so p'(0) = 0. Calculating p'(0) then gives us 2B = 2< v - w₀, u > = 0. Dividing by 2 then yields < v - w₀, u > = 0. Now, for any nonzero w ∈ W, we can let u = w/||w||. Multiplying the last equation by ||w|| then gives us < v - w₀, w > = 0, which also holds trivially for w = 0. Thus (∗) holds for all w ∈ W.
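Conversely, suppose w₀ ∈ W satisfies < v - w₀, w > = 0 for all w in W. Any w ≠ w₀ in W can be written as w = w₀ - tu, where t = || w₀ - w || and u = (w₀ - w)/t is a unit vector in W. Then || v - w ||² = p(t) = || v - w₀ ||² + t², because B = < v - w₀, u > = 0. Since p′(t) = 2t, the minimum of p occurs at t = 0, and so || v - w || ≥ || v - w₀ || for every w in W.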
To see that the minimizer is unique, suppose that there is a second minimizer, w₁ ≠ w₀. Both must satisfy the equation (∗), so
< v - w₀, w > = 0 and < v - w₁, w > = 0.
Subtract the two equations to get < w₀ - w₁, w > = 0, which holds for all w in W. Since both minimizers are in W, which is a subspace, their difference w₀ - w₁ is also in W. If in the previous equation we take w = w₀ - w₁, then || w₀ - w₁ ||² = 0. It follows that w₀ - w₁ = 0, and so w₁ = w₀, which is a contradiction. Thus there is only one minimizer.
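The converse direction and the uniqueness can also be read off from a single identity. As a sketch under the hypotheses of the theorem (real inner product, w₀ in W satisfying (∗)): for any w in W the vector w₀ - w is in W, so the cross term below vanishes.

```latex
\begin{aligned}
\|v - w\|^2 &= \|(v - w_0) + (w_0 - w)\|^2 \\
            &= \|v - w_0\|^2 + 2\langle v - w_0,\; w_0 - w\rangle + \|w_0 - w\|^2 \\
            &= \|v - w_0\|^2 + \|w_0 - w\|^2 \;\ge\; \|v - w_0\|^2 .
\end{aligned}
```

Equality holds only when || w₀ - w || = 0, that is, when w = w₀.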

The significance of this theorem is that it provides a way to actually calculate the minimizer. If W has an orthonormal basis E = {u₁,..., u_n}, then we recall that w₀ = c₁u₁ + ... + c_nu_n, where c_k = < w₀, u_k >, k = 1,..., n. By (∗), we have <v - w₀, u_k > = 0. Thus, <v, u_k > = <w₀, u_k >. From this it follows that c_k = <v, u_k >, and that the minimizer has the explicit form
w₀ = ∑_k <v, u_k > u_k.
The important feature of this formula is that the coefficients c_k = < w₀, u_k > can be calculated directly from v, as c_k = < v, u_k >, without knowing w₀ in advance. In the next sections, we will solve two least squares problems.
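The projection formula can be tried out on the 3D problem mentioned at the start. The following is a minimal Python sketch, not part of the original notes: the plane x + y + z = 0, its orthonormal basis u1, u2, the point v, and the helper names dot and project are all illustrative choices.

```python
import math

def dot(a, b):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(a, b))

def project(v, onb):
    """Return w0 = sum_k <v, u_k> u_k, where onb is an orthonormal basis of W."""
    w0 = [0.0] * len(v)
    for u in onb:
        c = dot(v, u)  # coefficient c_k = <v, u_k>
        w0 = [w + c * ui for w, ui in zip(w0, u)]
    return w0

# Hypothetical example: W is the plane x + y + z = 0 in R^3.
s2, s6 = math.sqrt(2.0), math.sqrt(6.0)
u1 = [1 / s2, -1 / s2, 0.0]
u2 = [1 / s6, 1 / s6, -2 / s6]
v = [1.0, 2.0, 3.0]

w0 = project(v, [u1, u2])                  # minimizer in W
residual = [a - b for a, b in zip(v, w0)]  # v - w0
distance = math.sqrt(dot(residual, residual))

print("w0 =", w0)
print("distance =", distance)
print("<v - w0, u1> =", dot(residual, u1))
print("<v - w0, u2> =", dot(residual, u2))
```

For this choice of v the minimizer should come out close to (-1, 0, 1), at distance 2·3^½ from v, and both printed inner products should be zero up to rounding, as (∗) requires.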

Least squares fitting of a function. We want to find the quadratic polynomial that gives the best least squares fit for the function f(x) = e^(2x) on the interval [-1,1]. In this case, the inner product and norm are
< f , g > = ∫₋₁¹ f(x)g(x)dx and ||f|| = (∫₋₁¹ f(x)²dx)^½.
Since we want to use quadratics, we will take W = P₃, the space of polynomials of degree at most 2. The basis that we will use is E = {p₀(x), p₁(x), p₂(x)}, where

p₀(x) = 2^−½, p₁(x) = (3/2)^½ x, and p₂(x) = (5/8)^½ (3x² − 1).

These polynomials are called normalized Legendre polynomials and they are orthonormal: <p_i, p_j> = δ_ij. In this basis, the quadratic polynomial that is the best least squares fit to e^(2x) has the form

p(x) = c₁p₀(x) + c₂p₁(x) + c₃p₂(x).

From our discussion above, we have

c₁ = < f, p₀> = ∫₋₁¹ e^(2x) p₀(x) dx = 8^−½ (e² − e⁻²)
c₂ = < f, p₁> = ∫₋₁¹ e^(2x) p₁(x) dx = (3/32)^½ (e² + 3e⁻²)
c₃ = < f, p₂> = ∫₋₁¹ e^(2x) p₂(x) dx = (5/128)^½ (e² − 13e⁻²)

The quadratic polynomial that is the best least squares fit to e^(2x) is
p(x) = (1/4)(e² − e⁻²) + (3/8)(e² + 3e⁻²)x + (5/32)(e² − 13e⁻²)(3x² − 1).
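The three coefficients can be checked numerically. The sketch below is an illustration rather than the method used in the notes: it approximates each inner product < f, p_k > with a composite Simpson's rule and compares the results with the closed forms above. The helper names simpson, basis, coeffs, closed, and fit, and the choice of 2000 subintervals, are assumptions made for the example.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def f(x):
    """The function being fitted, f(x) = e^(2x)."""
    return math.exp(2 * x)

# Normalized Legendre polynomials p0, p1, p2 from the text.
basis = [
    lambda x: 2 ** -0.5,
    lambda x: math.sqrt(3 / 2) * x,
    lambda x: math.sqrt(5 / 8) * (3 * x * x - 1),
]

# c_k = <f, p_k>, approximated numerically on [-1, 1].
coeffs = [simpson(lambda x, p=p: f(x) * p(x), -1.0, 1.0) for p in basis]

# Closed forms stated in the text, for comparison.
e2, em2 = math.exp(2), math.exp(-2)
closed = [
    (e2 - em2) / math.sqrt(8),
    math.sqrt(3 / 32) * (e2 + 3 * em2),
    math.sqrt(5 / 128) * (e2 - 13 * em2),
]

def fit(x):
    """Quadratic least squares fit p(x) = sum_k c_k p_k(x)."""
    return sum(c * p(x) for c, p in zip(coeffs, basis))

for k, (num, exact) in enumerate(zip(coeffs, closed)):
    print(f"c_{k + 1}: numeric = {num:.10f}, closed form = {exact:.10f}")
```

The numeric and closed-form values should agree to many digits. The function fit can then be evaluated on a grid of x values to reproduce the comparison between e^(2x) and its quadratic fit.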

Both the function and quadratic least squares fit are plotted below.

Least-squares data fitting. Problem: The table below contains data obtained by measuring the concentration of a drug in a person's blood. Find and sketch the straight line that best fits the data in the (discrete) least squares sense.

Log of Concentration

t        0      1      2      3      4
ln(C)   −0.1   −0.4   −0.8   −1.1   −1.5

Solution. We want to find coefficients a₁ and a₂ such that y = a₁ + a₂t is the best least-squares straight-line fit to the data. Writing y_j for the measured value of ln(C) at t = j, this means that we choose the two constants a₁ and a₂ to minimize the sum S = (y₀ − a₁ − a₂·0)² + (y₁ − a₁ − a₂·1)² + ... + (y₄ − a₁ − a₂·4)². If we let
v₁ = [1 1 1 1 1]^T, v₂ = [0 1 2 3 4]^T, and y_d = [-0.1 -0.4 -0.8 -1.1 -1.5]^T,
then we can rewrite the sum above in terms of the inner product and norm for R⁵:
S = || y_d − a₁v₁ − a₂v₂ ||².
Next, let W = span{v₁, v₂}. The minimization problem now can be put in the form discussed earlier:

Find w₀ in W such that || y_d − w₀ || = min_{w ∈ W} || y_d − w ||.

It is easy to show that if u₁ = 5^−½ [1 1 1 1 1]^T and u₂ = 10^−½ [-2 -1 0 1 2]^T, then E = {u₁, u₂} is an orthonormal basis for W. The idea here is to first find w₀ in terms of the E basis, and then change basis to F = {v₁, v₂}. The reason for doing this is that a₁ and a₂ are just the coordinates of w₀ relative to F. Relative to the E basis, w₀ = < y_d, u₁> u₁ + < y_d, u₂> u₂ = −(1.7441 u₁ + 1.1068 u₂). With a little work, we see that u₁ = 5^−½ v₁ and u₂ = 10^−½ (v₂ − 2v₁). Substituting these in the expression for w₀ yields
w₀ = −0.0800v₁ − 0.35v₂.
From this, we get a₁ = −0.08 and a₂ = −0.35. The line we want is y = −0.08 − 0.35t. The data and the line are plotted below.
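The whole computation can be reproduced in a few lines of Python. This is a sketch under stated assumptions, not the procedure the notes prescribe for finding E: it builds the orthonormal basis by classical Gram-Schmidt applied to v₁, v₂ (which yields the same u₁, u₂ as above), and the names dot, gram_schmidt, b1, and b2 are illustrative.

```python
import math

def dot(a, b):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (classical Gram-Schmidt)."""
    onb = []
    for v in vectors:
        w = list(v)
        for u in onb:
            c = dot(v, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        onb.append([wi / norm for wi in w])
    return onb

# Data from the table: t and ln(C).
t = [0.0, 1.0, 2.0, 3.0, 4.0]
y_d = [-0.1, -0.4, -0.8, -1.1, -1.5]
v1 = [1.0] * len(t)
v2 = t

# Orthonormal basis E = {u1, u2} for W = span{v1, v2}.
u1, u2 = gram_schmidt([v1, v2])

# Coordinates of w0 relative to E: b_k = <y_d, u_k> (b1, b2 are illustrative names).
b1, b2 = dot(y_d, u1), dot(y_d, u2)

# Change of basis to F = {v1, v2}, using u1 = v1/sqrt(5) and u2 = (v2 - 2 v1)/sqrt(10).
a2 = b2 / math.sqrt(10)
a1 = b1 / math.sqrt(5) - 2 * b2 / math.sqrt(10)

print(f"<y_d, u1> = {b1:.4f}, <y_d, u2> = {b2:.4f}")
print(f"y = {a1:.2f} + ({a2:.2f}) t")
```

Up to rounding, the printed inner products and line coefficients should match the values −1.7441, −1.1068, −0.08, and −0.35 found above.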