Theorem. Let V be a vector space with an inner product <u, v>, and let W be a subspace of V. A vector w_{0} in W minimizes the distance ||v - w|| over w in W if and only if w_{0} satisfies the equation

(∗) <v - w_{0}, w> = 0 for all w in W.

In addition, w_{0} is unique.

Proof. Let's first show that if w_{0} in W minimizes ||v - w||, then it satisfies equation (∗). Fix u in W with ||u|| = 1, and let t ∈ R. Define

p(t) := ||v - w_{0} + tu||^{2} = ||v - w_{0}||^{2} + 2t<v - w_{0}, u> + t^{2}||u||^{2} = ||v - w_{0}||^{2} + 2t<v - w_{0}, u> + t^{2} = t^{2} + 2Bt + C.

Note that v - w_{0} + tu = v - (w_{0} - tu), and w_{0} - tu is in W. Because w_{0} minimizes ||v - w||^{2} over all w in W, the minimum of p(t) occurs at t = 0. This means that t = 0 is a critical point of p(t), so p'(0) = 0. Calculating p'(0) gives 2B = 2<v - w_{0}, u> = 0. Dividing by 2 yields <v - w_{0}, u> = 0. Now, for any nonzero w ∈ W, we can let u = w/||w||. Multiplying the last equation by ||w|| gives <v - w_{0}, w> = 0; since this holds trivially for w = 0, it holds for all w ∈ W.

Conversely, suppose w_{0} ∈ W satisfies <v - w_{0}, w> = 0 for all w in W. Taking w = tu in this equation gives B = <v - w_{0}, u> = 0, so the polynomial above reduces to p(t) = ||v - w_{0}||^{2} + t^{2}. Then p'(t) = 2t, and the minimum occurs at t = 0, so w_{0} minimizes ||v - w||.

To see that the minimizer is unique, suppose that there is a second minimizer w_{1} ≠ w_{0}. Both must satisfy equation (∗), so

<v - w_{0}, w> = 0 and <v - w_{1}, w> = 0.

Subtract the two equations to get <w_{0} - w_{1}, w> = 0, which holds for all w in W. Since both minimizers are in W, which is a subspace, their difference w_{0} - w_{1} is also in W. Taking w = w_{0} - w_{1} in the previous equation gives ||w_{0} - w_{1}||^{2} = 0. It follows that w_{0} - w_{1} = 0, and so w_{1} = w_{0}, contradicting w_{1} ≠ w_{0}. Thus there is only one minimizer.
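In R^n with the dot product, the theorem says the minimizer w_{0} is the orthogonal projection of v onto W. A minimal numerical sketch (the vectors here are made up for illustration) checks condition (∗) for a plane in R^3:

```python
import numpy as np

# Illustrative check of (*) in R^3 with the standard dot product.
# W = span{u1, u2}; find the closest point w0 in W to v, then test
# <v - w0, w> = 0 for the spanning vectors (hence for all w in W).
u1 = np.array([1.0, 0.0, 1.0])
u2 = np.array([0.0, 1.0, 1.0])
v  = np.array([2.0, 3.0, 4.0])

A = np.column_stack([u1, u2])           # columns span W
coeffs, *_ = np.linalg.lstsq(A, v, rcond=None)
w0 = A @ coeffs                         # closest point in W to v

print(np.dot(v - w0, u1), np.dot(v - w0, u2))   # both essentially zero
```

Because (∗) holds for u1 and u2, it holds for every linear combination of them, which is exactly the "for all w in W" condition in the theorem.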

The important feature of this formula is that we can calculate the coefficients directly: if {p_{1}, ..., p_{n}} is an orthonormal basis for W, then the minimizer is w_{0} = c_{1}p_{1} + ... + c_{n}p_{n}, where c_{j} = <v, p_{j}>.

**Least squares fitting of a function**. We want to find the
quadratic polynomial that gives the best least squares fit for the
function f(x) = e^{2x} on the interval [-1,1]. In this case, the
inner product and norm are

< f, g > = ∫_{−1}^{1} f(x)g(x) dx and
||f|| = (∫_{−1}^{1} f(x)^{2} dx)^{1/2}.

Since we want to use quadratics, we will take W = P_{3}. The
basis that we will use is the orthonormal set E = {p_{0}(x), p_{1}(x),
p_{2}(x)}, where

p_{0}(x) = 2^{-1/2}, p_{1}(x) =
(3/2)^{1/2}x, and p_{2}(x)=
(5/8)^{1/2}(3x^{2}-1).
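These p_{j} are the normalized Legendre polynomials, and they are orthonormal with respect to the inner product above. As a quick sanity check (a sketch, not part of the notes; the grid size is chosen for illustration), one can approximate the inner products <p_{i}, p_{j}> by the trapezoid rule:

```python
import numpy as np

# Orthonormality check for the basis E by numerical quadrature.
x = np.linspace(-1.0, 1.0, 20001)
dx = x[1] - x[0]

def inner(f_vals, g_vals):
    # trapezoid-rule approximation of the integral of f*g over [-1, 1]
    y = f_vals * g_vals
    return (y.sum() - 0.5 * (y[0] + y[-1])) * dx

P = [np.full_like(x, 2**-0.5),          # p0
     (3/2)**0.5 * x,                    # p1
     (5/8)**0.5 * (3*x**2 - 1)]         # p2

G = np.array([[inner(pi, pj) for pj in P] for pi in P])
print(np.round(G, 6))   # approximately the 3x3 identity matrix
```

The Gram matrix G coming out as the identity is exactly the statement that E is orthonormal.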

We seek p(x) = c_{1}p_{0}(x) + c_{2}p_{1}(x) +
c_{3}p_{2}(x). From our discussion above, we have

c_{1} = < f, p_{0}> = ∫_{
−1}^{1} e^{2x}p_{0}(x)dx =
8^{-1/2}(e^{2} - e^{-2})

c_{2} = < f, p_{1}> = ∫_{
−1}^{1} e^{2x}p_{1}(x)dx =
(3/32)^{1/2}(e^{2} + 3e^{-2})

c_{3} = < f, p_{2}> = ∫_{
−1}^{1} e^{2x}p_{2}(x)dx =
(5/128)^{1/2}(e^{2} - 13e^{-2})

The quadratic polynomial that is the best least squares fit to
e^{2x} is

p(x) = (1/4)(e^{2} - e^{-2}) + (3/8)(e^{2} +
3e^{-2})x + (5/32)(e^{2} -
13e^{-2})(3x^{2}-1).
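The closed-form coefficients above can be checked against numerical quadrature. A sketch (grid size and tolerance are illustrative):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 20001)
dx = x[1] - x[0]

def inner(f_vals, g_vals):
    # trapezoid rule for the inner product <f, g> on [-1, 1]
    y = f_vals * g_vals
    return (y.sum() - 0.5 * (y[0] + y[-1])) * dx

f  = np.exp(2 * x)
p0 = np.full_like(x, 2**-0.5)
p1 = (3/2)**0.5 * x
p2 = (5/8)**0.5 * (3 * x**2 - 1)

e2, em2 = np.e**2, np.e**-2
c = [inner(f, p0), inner(f, p1), inner(f, p2)]
exact = [8**-0.5 * (e2 - em2),
         (3/32)**0.5 * (e2 + 3*em2),
         (5/128)**0.5 * (e2 - 13*em2)]
print(np.max(np.abs(np.array(c) - np.array(exact))))  # small quadrature error
```

The numerically computed c_{1}, c_{2}, c_{3} agree with the closed forms up to the quadrature error of the trapezoid rule.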

Both the function and quadratic least squares fit are plotted below.

**Least-squares data fitting**. Problem: The table below contains
data obtained by measuring the concentration of a drug in a person's
blood. Find and sketch the straight line that best fits the data in
the (discrete) least squares sense.

| t     | 0     | 1     | 2     | 3     | 4     |
|-------|-------|-------|-------|-------|-------|
| ln(C) | −0.1  | −0.4  | −0.8  | −1.1  | −1.5  |

**Solution**. We want to find coefficients a_{1} and
a_{2} such that y = a_{1} + a_{2}t is the best
least-squares straight-line fit to the data. This means that we choose
the two constants a_{1} and a_{2} so that we minimize
the sum S = (y_{0} − a_{1} −
a_{2}·0)^{2} + (y_{1} −
a_{1} − a_{2}·1)^{2} + ... +
(y_{4} − a_{1} −
a_{2}·4)^{2}. If we let

**v**_{1} = [1 1 1 1 1]^{T}, **v**_{2} = [0 1 2 3
4]^{T}, and **y**_{d} = [-0.1 -0.4 -0.8 -1.1
-1.5]^{T},

then we can rewrite the sum above in terms of the inner product and
norm for **R**^{5}:

S = || **y**_{d} −
a_{1}**v**_{1} −
a_{2}**v**_{2} ||^{2}.

Next, let W = span{**v**_{1}, **v**_{2}}. The
minimization problem now can be put in the form discussed earlier:

Find w_{0} in W such that ||**y**_{d} − w_{0}|| = min_{w ∈ W} ||**y**_{d} − w||.
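By the theorem, w_{0} is the orthogonal projection of **y**_{d} onto W, and applying (∗) with w = **v**_{1} and w = **v**_{2} yields two linear (normal) equations for a_{1} and a_{2}. A sketch that solves the same minimization with numpy's least-squares routine in place of hand elimination:

```python
import numpy as np

# Data from the table: t values and measured ln(C) values.
t  = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yd = np.array([-0.1, -0.4, -0.8, -1.1, -1.5])

A = np.column_stack([np.ones_like(t), t])   # columns are v1 and v2
a, *_ = np.linalg.lstsq(A, yd, rcond=None)  # minimizes ||yd - A a||^2
print(a)   # [a1, a2] for the best-fit line y = a1 + a2*t
```

For this data the fit works out to a_{1} = −0.08 and a_{2} = −0.35, i.e. the line y = −0.08 − 0.35t.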

From this, we get c