Proof: If x1 and x2 are any two eigenvectors of A corresponding to the (not
necessarily distinct) eigenvalues z1 and z2, then

< A x1, x2 > = < x1, A x2 >
< z1 x1, x2 > = < x1, z2 x2 >
z1 < x1, x2 > = c.c.(z2) < x1, x2 >     (c.c. = complex conjugate).
If we set x1 = x2, so that z1 = z2, then this formula implies that
z1 < x1, x1 > = c.c.(z1) < x1, x1 >. Since < x1, x1 > = || x1 ||^2, which is
nonzero because the eigenvector x1 is nonzero, we can divide by it to obtain
c.c.(z1) = z1. Now, a complex number is real if and only if it is equal to its
complex conjugate. It follows that z1 is real. Applying this to the formula we
derived above (so that c.c.(z2) = z2), we see that
z1 < x1, x2 > = z2 < x1, x2 >. This in turn gives us

(z1 - z2) < x1, x2 > = 0.

Thus, if the eigenvalues z1 and z2 are distinct, then dividing by their
difference yields < x1, x2 > = 0. That is, the corresponding eigenvectors are
orthogonal.
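These two facts are easy to check numerically. Here is a minimal sketch,
assuming Python with NumPy and a small selfadjoint matrix chosen purely for
illustration:

    import numpy as np

    # A small selfadjoint (Hermitian) matrix, chosen only for illustration.
    A = np.array([[2.0, 1.0 + 1.0j],
                  [1.0 - 1.0j, 3.0]])

    # eigh is designed for selfadjoint matrices; the eigenvalues come back real.
    eigenvalues, eigenvectors = np.linalg.eigh(A)
    print(eigenvalues)                      # real numbers

    # Eigenvectors belonging to distinct eigenvalues are orthogonal:
    x1, x2 = eigenvectors[:, 0], eigenvectors[:, 1]
    print(np.vdot(x1, x2))                  # approximately 0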
Proof: We take A to be an n×n selfadjoint matrix. Our proof
will be via induction. The statement is true for n = 1: a 1×1 matrix is a
scalar, so every nonzero vector is an eigenvector, and a single unit vector is
the required orthonormal basis. Suppose that n > 1. We must show that if every
(n-1)×(n-1) selfadjoint matrix has an orthonormal basis of
eigenvectors, then every n×n selfadjoint matrix does,
too. Start with the n×n selfadjoint matrix A. First of all,
every matrix has at least one eigenvalue and one corresponding
eigenvector, so A has an eigenvalue z1 and eigenvector
x1, which we may take as a unit vector. We can always find
vectors y2, ..., yn such that the set
{x1, y2, ..., yn}
is not only a basis, but, by application of the Gram-Schmidt process
if necessary, an orthonormal basis. Note that the yj's are
not necessarily eigenvectors. They are simply chosen so that the
set above is an orthonormal basis. Let's find the matrix for A
relative to this new basis:
B = T^H A T,    T = [ x1  y2  ...  yn ].

We leave it as an exercise to show that

B = [ z1  0 ]
    [ 0   C ]

where C is an (n-1)×(n-1) selfadjoint matrix. The induction hypothesis then
implies that we can find an orthonormal basis of eigenvectors {v1, ..., vn-1}
for C^(n-1) or R^(n-1). Let V be the (n-1)×(n-1) matrix given by
V = [ v1 ... vn-1 ], and note that, in block form,

U = [ 1  0 ]
    [ 0  V ]

We also note that V being unitary implies that U is a unitary matrix. Moreover,

U^H B U = [ z1  0        ]
          [ 0   V^H C V  ]

This is the diagonal matrix D = diag(z1, z2, ..., zn). That is,
D = U^H B U = U^H T^H A T U. Taking S = TU finishes the proof, because the
product of unitary matrices is unitary.
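Numerically, the orthonormal basis of eigenvectors promised by this theorem can
be computed directly. Here is a minimal sketch, assuming Python with NumPy; the
matrix is an arbitrary symmetric example, not one taken from the text:

    import numpy as np

    # A real symmetric (hence selfadjoint) matrix, chosen only for illustration.
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 2.0],
                  [0.0, 2.0, 1.0]])

    # eigh returns the eigenvalues and an orthonormal set of eigenvectors;
    # the columns of S form the orthonormal basis the theorem promises.
    eigenvalues, S = np.linalg.eigh(A)
    D = S.T @ A @ S                                # should equal diag(eigenvalues)
    print(np.allclose(D, np.diag(eigenvalues)))    # True
    print(np.allclose(S.T @ S, np.eye(3)))         # S is orthogonal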
[ 1  3 ]
[ 3  1 ]

is QA(x) = x1^2 + 6 x1 x2 + x2^2. We recall from analytic geometry that the
level curves x1^2 + 6 x1 x2 + x2^2 = constant are either ellipses or
hyperbolas, but with principal axes rotated or reflected relative to the
x1-x2 coordinate system. The typical problem is then to find these principal
axes and also the standard form of the conic. Similar problems arise in three
or more dimensions. We illustrate the technique for solving these using the
example QA(x) = x1^2 + 6 x1 x2 + x2^2.
First, let's find the eigenvalues and eigenvectors of A. These turn
out to be -2 and 4, with u-2 = [ 2^(-1/2)  -2^(-1/2) ]^T and
u4 = [ 2^(-1/2)  2^(-1/2) ]^T.
The eigenvectors are orthogonal, because they belong to distinct
eigenvalues. We have also normalized them to have length 1. Thus the
matrix S = [u-2 u4] is orthogonal and
satisfies S^T A S = D, where D = diag(-2, 4). We can rewrite
this as A = S D S^T. The quadratic form then becomes

QA(x) = x^T A x = x^T S D S^T x = (S^T x)^T D (S^T x).

If we let X = S^T x be new coordinates, which are the old ones
rotated by 45 degrees, then we see that

QA(x) = QD(X) = -2 X1^2 + 4 X2^2.
The principal axes, in terms of the old coordinates, are just
u-2 and u4, and the conics QA(x) = constant, for nonzero values of the
constant, are all hyperbolas.
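Let's verify this numerically. The following sketch, assuming Python with
NumPy, recovers the eigenvalues -2 and 4 and the principal axes, and checks
that the quadratic form is diagonal in the rotated coordinates:

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [3.0, 1.0]])

    # Columns of S are the (orthonormal) principal axes u-2 and u4.
    eigenvalues, S = np.linalg.eigh(A)
    print(eigenvalues)        # [-2.  4.]
    print(S)                  # columns proportional to (1, -1)/sqrt(2) and (1, 1)/sqrt(2)

    # In the rotated coordinates X = S^T x the quadratic form is diagonal:
    x = np.array([0.7, -1.3])                        # an arbitrary test point
    X = S.T @ x
    print(x @ A @ x, -2 * X[0]**2 + 4 * X[1]**2)     # the two values agree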
[ s1   0    0    ...  0    ...  0 ]
[ 0    s2   0    ...  0    ...  0 ]
[ ...                             ]
[ 0    0    0    ...  sr   ...  0 ]
[ 0    0    0    ...  0    ...  0 ]
[ ...                             ]
[ 0    0    0    ...  0    ...  0 ]
A^T A vk = zk vk
vk^T A^T A vk = zk vk^T vk
|| A vk ||^2 = zk || vk ||^2
|| A vk ||^2 = zk        (since || vk || = 1).
This calculation also shows that a vector v is in the null space of A
if and only if it is in the null space of A^T A, that is, an eigenvector of
A^T A corresponding to the eigenvalue 0 (or the zero vector). If r = rank(A),
then "rank + nullity = # of columns" tells us that nullity(A) = n - r. This
means that n - r of the orthonormal eigenvectors of A^T A correspond to the
eigenvalue 0, leaving r eigenvectors for the remaining, positive eigenvalues.
List these eigenvalues as z1 >= z2 >= ... >= zr > 0. Our input basis is
now chosen as {v1, ..., vr, vr+1,
..., vn }. The numbering is the same as that for the
eigenvalues. We now define the matrix V via
V = [ v1 ... vr vr+1
... vn ].
S = [ [z1^(1/2) u1]_U   [z2^(1/2) u2]_U   ...   [zr^(1/2) ur]_U   [0]_U   ...   [0]_U ]
  = [ z1^(1/2) [u1]_U   ...   zr^(1/2) [ur]_U   0   ...   0 ]
  = [ z1^(1/2) e1   ...   zr^(1/2) er   0   ...   0 ].
If we let sk = zk^(1/2) for k = 1, ..., r, we get the same S as the one given
in the statement of the theorem. These sk's are the singular values of A. The
matrix S is related to A via multiplication by change-of-basis matrices. The
matrix U changes from new output to old output bases, and V changes from new
input to old input bases. Since V^T = V^(-1), we have that V^T changes from old
input to new input bases. In the end, this gives us A = U S V^T.
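This construction translates directly into a computation. The sketch below,
assuming Python with NumPy and using the 3×2 matrix from the worked example
that follows, builds V from the eigenvectors of A^T A, sets sk = zk^(1/2),
forms uk = A vk / sk for the positive eigenvalues, and checks the
factorization:

    import numpy as np

    A = np.array([[2.0, -2.0],
                  [1.0,  1.0],
                  [-2.0, 2.0]])        # any m x n matrix would do
    m, n = A.shape

    # Orthonormal eigenvectors of A^T A, ordered so that z1 >= z2 >= ... >= 0.
    z, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(z)[::-1]
    z, V = z[order], V[:, order]

    r = np.sum(z > 1e-12)              # rank of A
    s = np.sqrt(z[:r])                 # singular values

    # New output basis: u_k = A v_k / s_k for k = 1, ..., r ("fill" vectors omitted).
    U_r = A @ V[:, :r] / s

    # S has s_1, ..., s_r on its diagonal and zeros elsewhere.
    S = np.zeros((m, n))
    S[:r, :r] = np.diag(s)

    # A equals U S V^T (only the first r columns of U matter here).
    print(np.allclose(U_r @ S[:r, :] @ V.T, A))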
[  2  -2 ]
[  1   1 ]
[ -2   2 ]

Here, A^T A =

[  9  -7 ]
[ -7   9 ]

The eigenvalues of this matrix are z1 = 16 and z2 = 2. The singular values are
s1 = 4 and s2 = 2^(1/2). We can immediately write out what S is. We have

S = [ 4  0       ]
    [ 0  2^(1/2) ]
    [ 0  0       ]

The eigenvector corresponding to 16 is v1 = 2^(-1/2) (1, -1)^T, and the one
corresponding to 2 is v2 = 2^(-1/2) (1, 1)^T. Hence, we see that

V = [  2^(-1/2)  2^(-1/2) ]
    [ -2^(-1/2)  2^(-1/2) ]

Next, we find the u's.
u1 = A v1 / z1^(1/2)
   = 2^(-1/2) (4, 0, -4)^T / 4
   = (2^(-1/2), 0, -2^(-1/2))^T.
A similar calculation gives us
u2 = (0, 1, 0)^T.
We now have to add to these a "fill" vector
u3 = (2^(-1/2), 0, 2^(-1/2))^T
to complete the new output basis. This finally yields U =
[  2^(-1/2)  0  2^(-1/2) ]
[  0         1  0        ]
[ -2^(-1/2)  0  2^(-1/2) ]
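As a quick numerical check of this example, assuming Python with NumPy, we can
confirm that U is orthogonal and that U S V^T reproduces A:

    import numpy as np

    A = np.array([[2.0, -2.0], [1.0, 1.0], [-2.0, 2.0]])
    r2 = 2 ** -0.5

    U = np.array([[ r2, 0.0, r2],
                  [0.0, 1.0, 0.0],
                  [-r2, 0.0, r2]])
    S = np.array([[4.0, 0.0],
                  [0.0, 2 ** 0.5],
                  [0.0, 0.0]])
    V = np.array([[ r2, r2],
                  [-r2, r2]])

    print(np.allclose(U.T @ U, np.eye(3)))   # U is orthogonal
    print(np.allclose(U @ S @ V.T, A))       # A = U S V^T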
[ 4^(-1)  0         0 ]
[ 0       2^(-1/2)  0 ]

Using this matrix we can then write out the solution to the original least
squares problem as a matrix product.
The pseudoinverse

The pseudoinverse of an m×n matrix A that has SVD A = U S V^T is defined to be
A+ = V S+ U^T, where S+ is the n×m matrix given by

S+ = [ s1^(-1)  0        0  ...  0        ...  0 ]
     [ 0        s2^(-1)  0  ...  0        ...  0 ]
     [ ...                                       ]
     [ 0        0        0  ...  sr^(-1)  ...  0 ]
     [ 0        0        0  ...  0        ...  0 ]
     [ ...                                       ]
     [ 0        0        0  ...  0        ...  0 ]
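Here is a minimal sketch of this definition, assuming Python with NumPy and
reusing the 3×2 matrix from the example above; np.linalg.pinv serves only as a
cross-check:

    import numpy as np

    A = np.array([[2.0, -2.0], [1.0, 1.0], [-2.0, 2.0]])

    # Full SVD: A = U S V^T, with the singular values returned as a vector.
    U, s, Vt = np.linalg.svd(A)
    m, n = A.shape

    # Build S+: an n x m matrix with 1/s_k on the diagonal (nonzero s_k only).
    S_plus = np.zeros((n, m))
    r = np.sum(s > 1e-12)
    S_plus[:r, :r] = np.diag(1.0 / s[:r])

    A_plus = Vt.T @ S_plus @ U.T                     # A+ = V S+ U^T
    print(np.allclose(A_plus, np.linalg.pinv(A)))    # agrees with NumPy's pseudoinverse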
B(x) = 0 if x < -1 or x > 1;
B(x) = 1 + x if -1 <= x < 0;
B(x) = 1 - x if 0 <= x <= 1.
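For computations, B can be coded straight from this definition. A minimal
Python/NumPy sketch (the vectorization is an implementation choice, not
something from the text):

    import numpy as np

    def B(x):
        """Piecewise linear "hat" function: 1 - |x| on [-1, 1] and 0 outside."""
        x = np.asarray(x, dtype=float)
        return np.where(np.abs(x) <= 1.0, 1.0 - np.abs(x), 0.0)

    # Sample values: B(-1) = 0, B(-0.5) = 0.5, B(0) = 1, B(0.5) = 0.5, B(1) = 0.
    print(B([-1.0, -0.5, 0.0, 0.5, 1.0]))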
We want to minimize || y - u ||. This is a least squares problem. The only new thing here is doing least squares with a non-orthogonal basis. We will now discuss such problems.
Ac=d, where Ajk= < wk, wj >, dj = < v , wj >.
The matrix A is called the Gram matrix for the basis of w's; it is always invertible.
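Before proving this, here is a small numerical illustration of the normal
equations, assuming Python with NumPy; the basis vectors and the dot product on
R^3 are a hypothetical example, not the spline setting discussed above:

    import numpy as np

    # A non-orthogonal basis of a 2-dimensional subspace of R^3 (illustrative choice).
    w = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0])]
    v = np.array([2.0, 1.0, -1.0])     # vector to be approximated

    n = len(w)
    A = np.array([[np.dot(w[k], w[j]) for k in range(n)] for j in range(n)])  # A_jk = <w_k, w_j>
    d = np.array([np.dot(v, w[j]) for j in range(n)])                         # d_j = <v, w_j>

    c = np.linalg.solve(A, d)          # Gram matrix is invertible: the basis is independent
    u = sum(ck * wk for ck, wk in zip(c, w))   # best approximation to v from span of the w's

    # The residual v - u is orthogonal to every basis vector:
    print([np.dot(v - u, wj) for wj in w])     # both approximately 0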
Proof: To simplify the calculation, we will assume the scalars are the
real numbers. Let x1, ..., xn-1 be scalars, and
consider the vector w in U given by
w = x1 w1 + ... + xn-1 wn-1.
Putting this form of w in the inner product < w, wj >
gives us the equation
< w, wj > = x1 < w1, wj > + ... + xn-1 < wn-1, wj > = [Ax]j,
which is the jth component of the column vector Ax. The
implication is that
< w, w > = < w, x1 w1 + ... + xn-1 wn-1 >
         = x1 < w, w1 > + ... + xn-1 < w, wn-1 >
         = x^T A x.
Now, A is singular (not invertible) if and only if there is a non-zero
vector x such that Ax = 0. If such a vector exists, then
0 = x^T A x = < w, w > = || w ||^2,
which implies that w = 0. Since the coefficients in x are not all 0,
the set of w's has to be linearly dependent. Conversely, if the w's
are linearly dependent, there is a non-zero vector x such that w = 0. This
and the equation < w, wj > = [Ax]j then imply that Ax
= 0. Hence, A is singular. This shows that A is singular if and only
if the w's are linearly dependent. Equivalently, A is invertible if
and only if the w's are linearly independent.
We collect what we have proved above in this result: the Gram matrix A, with
Ajk = < wk, wj >, is invertible if and only if the vectors w1, ..., wn-1 are
linearly independent.
Let us carry this procedure out. By examining the graphs of
w'j = nB'(nx-j), we see that Ajk = <
wk, wj > satisfies
Aj,j = 2n, j = 1 ... n-1
Aj,j-1 = - n, j = 2 ... n-1
Aj,j+1 = - n, j = 1 ... n-2
Aj,k = 0, all other possible k.
For example, if n=5, then A is
[ 10  -5   0   0 ]
[ -5  10  -5   0 ]
[  0  -5  10  -5 ]
[  0   0  -5  10 ]
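A short sketch, assuming Python with NumPy, that assembles this tridiagonal
Gram matrix for any n and reproduces the 4×4 matrix above for n = 5:

    import numpy as np

    def gram_matrix(n):
        """Tridiagonal Gram matrix: 2n on the diagonal, -n on the sub/superdiagonals."""
        size = n - 1                   # indices j = 1, ..., n-1
        A = 2 * n * np.eye(size)
        A += -n * (np.eye(size, k=1) + np.eye(size, k=-1))
        return A

    print(gram_matrix(5))
    # [[10. -5.  0.  0.]
    #  [-5. 10. -5.  0.]
    #  [ 0. -5. 10. -5.]
    #  [ 0.  0. -5. 10.]]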
[ z1  *  ]
[ 0   A1 ]

where A1 is (n-1)×(n-1).
[ z2  *  ]
[ 0   A2 ]

Next, let S2 =
[ 1  0  ]
[ 0  U2 ]

It is easy to see that S2 is unitary and that S2^H S1^H A S1 S2 =
[ z1  *   *  ]
[ 0   z2  *  ]
[ 0   0   A2 ].
[ z1  *   *   ...  *  ]
[ 0   z2  *   ...  *  ]
[ 0   0   z3  ...  *  ]
[ ...                 ]
[ 0   0   0   ...  zn ]

If we take S = S1 S2 ··· Sn-1 and note that the product of unitary matrices is
unitary, we have S^(-1) = S^H = Sn-1^H ··· S2^H S1^H. Hence, we arrive at
S^H A S = T, which was what we wanted to establish. All that remains is to show
that the diagonal entries of T are the eigenvalues of A repeated according to
algebraic multiplicity. To do this, we note that T and A are similar, so their
characteristic polynomials are the same. Thus, we have

det(x I - A) = det(x I - T) = (x - z1)(x - z2) ··· (x - zn),

since the determinant of the upper triangular matrix x I - T is the product of
its diagonal entries.
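In practice this triangular form can be computed with SciPy's Schur
decomposition. A minimal sketch, assuming Python with NumPy and SciPy and an
arbitrary illustrative matrix:

    import numpy as np
    from scipy.linalg import schur

    # An arbitrary (not selfadjoint) matrix, chosen only for illustration.
    A = np.array([[3.0, 1.0, 2.0],
                  [0.0, 2.0, -1.0],
                  [1.0, 0.0, 4.0]])

    # Complex Schur form: A = S T S^H with S unitary and T upper triangular.
    T, S = schur(A, output='complex')

    print(np.allclose(S @ T @ S.conj().T, A))       # reconstructs A
    print(np.allclose(np.tril(T, k=-1), 0))         # T is upper triangular
    print(np.sort(np.diag(T)), np.sort(np.linalg.eigvals(A)))   # same eigenvalues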
[ T1   0    0   ...  0  ]
[ 0    T2   0   ...  0  ]
[ 0    0    T3  ...  0  ]
[ ...                   ]
[ 0    0    0   ...  Tr ]

If there are r distinct eigenvalues of A, z1, z2, ..., zr, and if the algebraic
multiplicity of each is m1, ..., mr, then a block Tk is an mk×mk upper
triangular matrix having zk's on the diagonal; specifically,

Tk = [ zk  *   *   ...  *  ]
     [ 0   zk  *   ...  *  ]
     [ 0   0   zk  ...  *  ]
     [ ...                 ]
     [ 0   0   0   ...  zk ]
[ 3  1  0  0  0  0 ]
[ 0  3  1  0  0  0 ]
[ 0  0  3  1  0  0 ]
[ 0  0  0  0  3  1 ]
[ 0  0  0  0  0  3 ]
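To illustrate the block structure numerically, here is a small hypothetical T
(not the matrix above), assuming Python with NumPy; the eigenvalues are the
diagonal entries, repeated according to the block sizes:

    import numpy as np

    # A 5x5 block upper triangular matrix: a 3x3 block for z1 = 2 and a 2x2 block for z2 = 7.
    T = np.array([[2.0, 1.0,  4.0, 0.0, 0.0],
                  [0.0, 2.0, -3.0, 0.0, 0.0],
                  [0.0, 0.0,  2.0, 0.0, 0.0],
                  [0.0, 0.0,  0.0, 7.0, 5.0],
                  [0.0, 0.0,  0.0, 0.0, 7.0]])

    # Eigenvalues are the diagonal entries: 2 with multiplicity 3, 7 with multiplicity 2.
    print(np.sort(np.linalg.eigvals(T)))   # [2. 2. 2. 7. 7.]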
One may find a proof of this theorem in Linear Algebra, 2nd ed., by K. Hoffman and R. Kunze, Prentice-Hall, Upper Saddle River, NJ, 1971.