Notes for Math 423 - Spring 2002

Selfadjoint matrices

Adjoints   The adjoint of a linear transformation L : V -> V, where V is an inner product space, is the unique linear transformation L' that satisfies
< L'[u], v > = < u, L[v] >
for all u and v in V. For a real m×n matrix A, the adjoint is the transpose A^T. If the matrix is complex, then the adjoint of A is A^H, the conjugate transpose. A matrix is selfadjoint or Hermitian if it is equal to its own adjoint. Thus, if A is real and selfadjoint, then A = A^T; i.e., A is symmetric.

Properties of selfadjoint matrices

  1. Theorem: The eigenvalues of a selfadjoint matrix are real and the eigenvectors corresponding to distinct eigenvalues are orthogonal.

    Proof: If x1 and x2 are any two eigenvectors of A corresponding to the (not necessarily distinct) eigenvalues z1 and z2, then
    < A x1, x2 > = < x1, A x2>
    < z1 x1, x2 > = < x1, z2 x2>
    z1 < x1, x2 > = c.c.(z2) < x1, x2>  (c.c. = complex conjugate).
    If we set x1 = x2, so that z1 = z2, then this formula implies that z1 < x1, x1 > = c.c.(z1) < x1, x1 >. Since < x1, x1 > = || x1 ||^2 > 0 (x1 is an eigenvector, hence nonzero), we can divide by it to obtain c.c.(z1) = z1. Now, a complex number is real if and only if it is equal to its complex conjugate. It follows that z1 is real. Applying this to the formula we derived above, and using the fact that z2 is also real, we see that z1 < x1, x2 > = c.c.(z2) < x1, x2 > = z2 < x1, x2 >. This in turn gives us
    (z1 - z2) < x1, x2 > = 0. Thus, if the eigenvalues z1 and z2 are distinct, then dividing by their difference yields < x1, x2 > = 0. That is, the corresponding eigenvectors are orthogonal.

  2. Theorem: Every selfadjoint matrix has an orthonormal basis of eigenvectors, relative to which it is diagonal. Equivalently, there is a unitary (orthogonal, in the real case) matrix S such that S^H A S is a diagonal matrix. An n×n matrix S is said to be unitary if S^H = S^(-1); this amounts to saying that the columns of S form an orthonormal basis for C^n or R^n.

    Proof: We take A to be an n×n selfadjoint matrix. Our proof will be via induction on n. The statement is true for n = 1, because everything is a scalar in that case; we only need a single unit vector. Suppose that n > 1. We must show that if every (n-1)×(n-1) selfadjoint matrix has an orthonormal basis of eigenvectors, then every n×n selfadjoint matrix does, too. Start with the n×n selfadjoint matrix A. First of all, every matrix has at least one eigenvalue and one corresponding eigenvector (over C), so A has an eigenvalue z1 and eigenvector x1, which we may take to be a unit vector. We can always find vectors y2, ..., yn such that the set
    {x1, y2, ..., yn}
    is not only a basis, but, by application of the Gram-Schmidt process if necessary, an orthonormal basis. Note that the yj's are not necessarily eigenvectors. They are simply chosen so that the set above is an orthonormal basis. Let's find the matrix for A relative to this new basis:
    B = T^H A T,   T = [x1 y2 ... yn].
    We leave it as an exercise to show that B=

    z1 0
    0 C
    
    where C is an (n-1)×(n-1) selfadjoint matrix. The induction hypothesis then implies that we can find an orthonormal basis {v1, ..., v_(n-1)} of C^(n-1) or R^(n-1) consisting of eigenvectors of C. Let V be the (n-1)×(n-1) matrix given by V = [v1 ... v_(n-1)], and note that
    V^H C V = diag(z2, z3, ..., zn).
    Of course, since the vj's are orthonormal, the (n-1)×(n-1) matrix V is unitary. Next, let U =
    1 0 
    0 V
    
    We also note that V being unitary implies that U is a unitary matrix. Moreover, U^H B U =
    z1 0 
    0  V^H C V
    
    This is the diagonal matrix D = diag(z1, z2, ..., zn). That is, D = U^H B U = U^H T^H A T U = (TU)^H A (TU). Taking S = TU finishes the proof, because the product of unitary matrices is unitary. (A quick numerical check of both theorems appears below.)
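
Both facts are easy to check numerically. Here is a minimal NumPy sketch (the 4×4 complex matrix is a randomly generated example, not one from these notes): it verifies that the eigenvalues of a selfadjoint matrix are real up to rounding error, and that the matrix of orthonormal eigenvectors returned by eigh is unitary and diagonalizes A.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random complex selfadjoint (Hermitian) matrix: A = (C + C^H)/2.
C = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (C + C.conj().T) / 2

# Theorem 1: the eigenvalues of a selfadjoint matrix are real.
eigvals = np.linalg.eigvals(A)          # general eigenvalue routine, no symmetry assumed
print(np.max(np.abs(eigvals.imag)))     # ~1e-16: imaginary parts are rounding error

# Theorem 2: there is a unitary S with S^H A S diagonal.
z, S = np.linalg.eigh(A)                # eigh returns orthonormal eigenvectors
print(np.allclose(S.conj().T @ A @ S, np.diag(z)))   # True: S^H A S = diag(z1, ..., zn)
print(np.allclose(S.conj().T @ S, np.eye(4)))        # True: S is unitary
```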

Quadratic forms  One important application of these results is to finding the principal axes of a quadratic form. If A is a real, symmetric n×n matrix, then the quadratic form associated with A is the function Q_A(x) = x^T A x. For example, the quadratic form associated with the matrix
A =
1 3
3 1
is Q_A(x) = x1^2 + 6 x1 x2 + x2^2. We recall from analytic geometry that the level curves x1^2 + 6 x1 x2 + x2^2 = constant are either ellipses or hyperbolas, but with principal axes rotated or reflected relative to the x1-x2 coordinate system. The typical problem is then to find these principal axes and also the standard form of the conic. Similar problems arise in three or more dimensions. We illustrate the technique for solving these problems using the example Q_A(x) = x1^2 + 6 x1 x2 + x2^2.

First, let's find the eigenvalues and eigenvectors of A. These turn out to be -2 and 4, with
u_(-2) = [2^(-1/2)   -2^(-1/2)]^T and u_4 = [2^(-1/2)   2^(-1/2)]^T.
The eigenvectors are orthogonal, because they belong to distinct eigenvalues. We have also normalized them to have length 1. Thus the matrix S = [u_(-2)   u_4] is orthogonal and satisfies S^T A S = D, where D = diag(-2, 4). We can rewrite this as A = S D S^T. The quadratic form then becomes
Q_A(x) = x^T A x = x^T S D S^T x = (S^T x)^T D (S^T x).
If we let X = S^T x be new coordinates, which are the old ones rotated by 45 degrees, then we see that
Q_A(x) = Q_D(X) = -2 X1^2 + 4 X2^2.
The principal axes, in terms of the old coordinates, are just u_(-2) and u_4, and, because the two eigenvalues have opposite signs, the family of conics Q_A(x) = constant consists of hyperbolas.
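
The same computation can be checked numerically. In the minimal NumPy sketch below, np.linalg.eigh recovers the eigenvalues -2 and 4 and an orthogonal matrix of eigenvectors; its columns may differ from u_(-2) and u_4 by a sign, which does not change the principal axes.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [3.0, 1.0]])

# Eigenvalues in ascending order, eigenvectors as orthonormal columns.
z, S = np.linalg.eigh(A)
print(z)                                     # [-2.  4.]
print(S)                                     # columns: principal axes (up to sign)
print(np.allclose(S.T @ A @ S, np.diag(z)))  # True: S^T A S = diag(-2, 4)
```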

Singular value decomposition (SVD)

Theorem   Every (real) m×n matrix A can be written as a product A = U S V^T. Here, U and V are orthogonal matrices, with U being m×m and V being n×n. S is m×n, and has the form S =
s1  0  0 ... 0  ... 0
0  s2  0 ... 0  ... 0
... ... ... ... ... ... ...
0   0  0 ... sr ... 0
0   0  0 ... 0  ... 0
... ... ... ... ... ... ...
0   0  0 ... 0  ... 0
The nonzero diagonal entries satisfy s1 >= s2 >= ... >= sr > 0; that is, they are positive and ordered from greatest to least, and r is the rank of A. The sk's are called the singular values of A.

Applications   Gilbert Strang has called this theorem the "Fundamental Theorem of Linear Algebra." The SVD contains very explicit information about a matrix: its rank, orthonormal bases for its column space and null space, and, as we will see below, the solution of least squares problems.

Finding the SVD of a matrix   The matrix S above is just the matrix of A relative to new bases for both the input and output spaces.

  • The input space basis   The matrix A^T A is real and symmetric, so there is an orthonormal basis of eigenvectors relative to which it is diagonal. Let the eigenvalues of A^T A be z1, ..., zn, and the corresponding basis of eigenvectors be {v1, ..., vn}. The eigenvalues are nonnegative, as this calculation shows:

    A^T A vk = zk vk
    vk^T A^T A vk = zk vk^T vk
    || A vk ||^2 = zk || vk ||^2
    || A vk ||^2 = zk   (since || vk || = 1)

    This calculation also shows that an eigenvector vk is in the null space of A if and only if its eigenvalue is zk = 0. If r = rank(A), then "rank + nullity = # of columns" tells us that nullity(A) = n - r. This means that n - r of the eigenvalues are 0 and the remaining r eigenvalues are positive. List these as z1 >= z2 >= ... >= zr > 0. Our input basis is now chosen as {v1, ..., vr, vr+1, ..., vn}, where the numbering is the same as that for the eigenvalues, so the last n - r vectors span the null space of A. We now define the matrix V via
    V = [ v1 ... vr vr+1 ... vn ].

  • The output space basis   For k = 1, ..., r, let
    uk = A vk / || A vk ||
    uk = A vk / zk^(1/2).
    We can also write this as the following equation:
    A vk = zk^(1/2) uk.
    The uk's are orthonormal, for k = 1, ..., r. Again, we see this from these equations.
    uj^T uk = vj^T A^T A vk / (zj^(1/2) zk^(1/2))
    uj^T uk = zk vj^T vk / (zj^(1/2) zk^(1/2))
    The orthonormality of the v's implies that the right side above is 0 unless j = k, in which case it is 1. Thus, the u's are orthonormal. Fill this set out with m - r additional vectors to form an orthonormal basis for the output space, R^m. This gives us the basis {u1, ..., ur, ur+1, ..., um}. As before, we define the m×m orthogonal matrix
    U = [u1 ... ur ur+1 ... um].

  • The matrix of A relative to the new bases   We let S be the matrix of A relative to these new bases. We compute it via the formula for the matrix of a linear transformation:
    S = [ [A v1]_U [A v2]_U ... [A vr]_U [A vr+1]_U ... [A vn]_U ],
    where [w]_U denotes the coordinate column of a vector w relative to the basis of u's. The vk's with k = r+1, ..., n are all in the null space of A. Thus the last n - r of the vectors A vk are 0, and so are their corresponding coordinate columns. The remaining columns we get from the definition of the u's, A vk = zk^(1/2) uk, for k = 1, ..., r. The end result is this.

    S = [ [z1^(1/2) u1]_U [z2^(1/2) u2]_U ... [zr^(1/2) ur]_U [ 0 ]_U ... [ 0 ]_U ]

    S = [ z1^(1/2) [u1]_U ... zr^(1/2) [ur]_U   0 ... 0 ]

    S = [ z1^(1/2) e1 ... zr^(1/2) er 0 ... 0 ].

    If we let sk = zk^(1/2) for k = 1, ..., r, we get the same S as the one given in the statement of the theorem. These sk's are the singular values of A. The matrix S is related to A via multiplication by change-of-basis matrices. The matrix U changes from new output to old output bases, and V changes from new input to old input bases. Since V^T = V^(-1), we have that V^T changes from old input to new input bases. In the end, this gives us A = U S V^T. (A short numerical sketch of this construction follows.)
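
To tie the construction to a computation, here is a NumPy sketch that follows the recipe above: diagonalize A^T A, sort the eigenvalues in decreasing order, set sk = zk^(1/2) and uk = A vk / sk, and fill out the output basis. The helper name svd_via_eig is ours; in practice one would simply call np.linalg.svd, which is more robust for ill-conditioned or nearly rank-deficient matrices.

```python
import numpy as np

def svd_via_eig(A, tol=1e-12):
    """Build an SVD A = U S V^T following the construction in the notes (teaching sketch)."""
    m, n = A.shape
    z, V = np.linalg.eigh(A.T @ A)        # eigenvalues ascending, orthonormal columns
    order = np.argsort(z)[::-1]           # reorder so z1 >= z2 >= ... >= zn
    z, V = z[order], V[:, order]
    s = np.sqrt(np.clip(z, 0.0, None))    # singular values (clip tiny negative round-off)
    r = int(np.sum(s > tol))              # numerical rank

    S = np.zeros((m, n))
    S[:r, :r] = np.diag(s[:r])

    U = np.zeros((m, m))
    U[:, :r] = A @ V[:, :r] / s[:r]       # uk = A vk / sk, k = 1, ..., r
    # Fill out u_{r+1}, ..., u_m to an orthonormal basis of R^m.
    rand = np.random.default_rng(1).standard_normal((m, m - r))
    Q, _ = np.linalg.qr(np.hstack([U[:, :r], rand]))
    U[:, r:] = Q[:, r:]
    return U, S, V

A = np.array([[2.0, -2.0], [1.0, 1.0], [-2.0, 2.0]])
U, S, V = svd_via_eig(A)
print(np.allclose(U @ S @ V.T, A))        # True
print(np.round(S, 6))                     # 4 and 2^(1/2) on the diagonal
```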

Example Consider the matrix A =
 2  -2
 1   1
-2   2
Here, A^T A =
 9  -7
-7   9
The eigenvalues of this matrix are z1 = 16 and z2 = 2. The singular values are s1 = 4 and s2 = 2^(1/2). We can immediately write out what S is. We have S =
 4  0
 0  2^(1/2)
 0  0
The eigenvector corresponding to 16 is v1 = 2^(-1/2) (1, -1)^T, and the one corresponding to 2 is v2 = 2^(-1/2) (1, 1)^T. Hence, we see that V =
 2^(-1/2)   2^(-1/2)
-2^(-1/2)   2^(-1/2)
Next, we find the u's.

u1 = A v1 / z1^(1/2)
u1 = 2^(-1/2) (4, 0, -4)^T / 4
u1 = ( 2^(-1/2), 0, -2^(-1/2) )^T.

A similar calculation gives us
u2 = ( 0, 1, 0 )^T.

We now have to add to these a "fill" vector
u3 = ( 2^(-1/2), 0, 2^(-1/2) )^T
to complete the new output basis. This finally yields U =

 2^(-1/2)  0   2^(-1/2)
 0         1   0
-2^(-1/2)  0   2^(-1/2)
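
As a quick check, np.linalg.svd reproduces the singular values found above; its U and V may differ from ours by the signs of individual columns, which is the usual non-uniqueness in an SVD.

```python
import numpy as np

A = np.array([[2.0, -2.0], [1.0, 1.0], [-2.0, 2.0]])
U, s, Vt = np.linalg.svd(A)               # s holds the singular values, largest first
print(s)                                  # [4.         1.41421356]  = [4, 2^(1/2)]
print(np.allclose(U[:, :2] * s @ Vt, A))  # True: A = U S V^T
```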

A least squares example  We again consider the matrix A used above. This time we want to find an x in R^2 such that the distance
|| Ax - b ||,   b = (1, 1, 3)^T,
is a minimum. This is of course a least squares problem. Because orthogonal and unitary matrices preserve length, we have
|| Ax - b || = || U S V^T x - U U^T b || = || S V^T x - U^T b || = || Sz - c ||,   z = V^T x and c = U^T b = (-2^(1/2)  1  2^(3/2))^T.
Sz - c = (4 z1 + 2^(1/2)   2^(1/2) z2 - 1   -2^(3/2))^T.
The square of the length of this vector is
|| Sz - c ||^2 = (4 z1 + 2^(1/2))^2 + (2^(1/2) z2 - 1)^2 + 8.
This is minimized by taking z1 = -2^(-3/2) and z2 = 2^(-1/2). We can also write this in matrix form as z = S^+ c, where S^+ =
 4^(-1)  0         0
 0       2^(-1/2)  0
Using this matrix we can then write out the solution to the original least squares problem as a matrix product,
x = V S^+ U^T b.
The solution we obtain here is x = (1/4  3/4)^T. Now, the point is that nothing in this recipe depended on the particular A and b: for any least squares problem, x = V S^+ U^T b is a minimizer. This is the general result described next.
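
This arithmetic is easy to confirm numerically. The minimal NumPy sketch below computes x = V S^+ U^T b directly from the SVD and compares it with the least squares solver np.linalg.lstsq; both give x = (1/4, 3/4)^T.

```python
import numpy as np

A = np.array([[2.0, -2.0], [1.0, 1.0], [-2.0, 2.0]])
b = np.array([1.0, 1.0, 3.0])

U, s, Vt = np.linalg.svd(A)

# Build S^+ : transpose the shape of S and invert the nonzero singular values.
Splus = np.zeros((2, 3))
Splus[0, 0], Splus[1, 1] = 1.0 / s[0], 1.0 / s[1]

x_svd = Vt.T @ Splus @ U.T @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_svd)                        # [0.25 0.75]
print(np.allclose(x_svd, x_lstsq))  # True
```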

The pseudoinverse The pseudoinverse of an m×n matrix A that has SVD A = U S V^T is defined to be
A^+ = V S^+ U^T, where S^+ is the n×m matrix given by S^+ =
s1^(-1)  0       0 ... 0       ... 0
0        s2^(-1) 0 ... 0       ... 0
...      ...     ... ... ...   ... ...
0        0       0 ... sr^(-1) ... 0
0        0       0 ... 0       ... 0
...      ...     ... ... ...   ... ...
0        0       0 ... 0       ... 0
Thus S^+ is constructed from S by taking the transpose of S and replacing the nonzero diagonal elements by their reciprocals. We have the following result.

Theorem   Let A be an m×n matrix and let A^+ be its pseudoinverse. If b is an m×1 column vector, then x = A^+ b minimizes || Ax - b || and has the smallest norm || x || among all minimizers.
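
NumPy's np.linalg.pinv computes A^+ through the SVD. The small rank-deficient example below is ours, chosen to illustrate both parts of the theorem: A^+ b has the same residual as any other minimizer, and among the minimizers it has the smallest norm.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])          # rank 1, so the minimizer is not unique
b = np.array([1.0, 3.0])

x_pinv = np.linalg.pinv(A) @ b      # pseudoinverse solution
x_other = np.array([2.0, 0.0])      # another minimizer: any x with x1 + x2 = 2

print(x_pinv)                                                            # [1. 1.]
print(np.linalg.norm(A @ x_pinv - b), np.linalg.norm(A @ x_other - b))   # equal residuals
print(np.linalg.norm(x_pinv) < np.linalg.norm(x_other))                  # True: smallest norm
```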

Finite element method

An ODE problem We want to illustrate the finite element method with a simple example. Solve the ODE -y'' = f(x), y(0) = y(1) = 0. We take the space V to be all continuous functions g(x) defined on [0,1] having a piecewise continuous derivative g'(x) and satisfying g(0) = g(1) = 0. The inner product for this space is < f , g > = ∫_0^1 f'(x) g'(x) dx. The subspace U comprises all piecewise linear functions in V, with possible discontinuities in the derivative at the points 0, 1/n, 2/n, ..., (n-1)/n, 1; these points are called knots. A (non-orthogonal) basis for U consists of the linear B-splines ("hat" functions)
wj(x) = B(n x - j), j = 1, ..., n-1,
where B(x) is the piecewise linear function defined this way (a short code sketch of these hat functions follows the definition):

B(x) = 0 if x < -1 or x > 1;
B(x) = 1 + x if -1 <= x <= 0;
B(x) = 1 - x if 0 <= x <= 1.
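
Here is a minimal NumPy sketch (the function names B and w are just the notation above) that evaluates the hat function B and the basis functions wj; it checks that wj equals 1 at its own knot j/n and 0 at the other knots.

```python
import numpy as np

def B(x):
    """The piecewise linear 'hat' function: 1 - |x| on [-1, 1], 0 outside."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, 1.0 - np.abs(x))

def w(j, n, x):
    """Basis function wj(x) = B(n x - j) for j = 1, ..., n-1."""
    return B(n * np.asarray(x, dtype=float) - j)

n = 5
knots = np.arange(1, n) / n              # interior knots 1/n, ..., (n-1)/n
print(np.round(w(2, n, knots), 12))      # [0. 1. 0. 0.]: w2 is 1 at 2/5 and 0 at the other knots
```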

We want to minimize || y - u || over all u in U, where y is the solution of the ODE. This is a least squares problem. The only new thing here is doing least squares with a non-orthogonal basis. We will now discuss such problems.

Normal equations for non-orthogonal bases Let V be a vector space with an inner product < u, v >, and let U be a finite dimensional subspace of V. Given a vector v in V, the vector u* in U minimizes the distance || v - u || over all u in U if and only if u* satisfies
< v - u*, u > = 0
for all u in U. Equivalently, u* is the minimizer if and only if v - u* is orthogonal to U. Let U have a basis {w1, ... , wn-1}. We can express the minimizer u* as
u* = c1 w1 + ... + cn-1 wn-1.
The condition that v - u* is orthogonal to U implies that for all j = 1 to n-1, we have
< v - u*, wj > = 0.
Inserting the expression for u* in terms of the w's, we see that the coefficients cj satisfy the system of equations
c1 < w1, wj > + ... + cn-1 < wn-1, wj > = < v, wj >,   j = 1, ..., n-1.
In matrix form, the equations above become

A c = d,   where Ajk = < wk, wj >,   dj = < v , wj >.

The matrix A is called the Gram matrix for the basis of w's; since the w's form a basis, and hence are linearly independent, it is invertible, by the lemma below.

Lemma:  The Gram matrix A with entries Ajk= < wk, wj > is invertible if and only if the w's are linearly independent.

Proof: To simplify the calculation, we will assume the scalars are the real numbers. Let x1, ..., xn-1 be scalars, and consider the vector w in U given by
w = x1 w1 + ... + xn-1 wn-1.
Putting this form of w into the inner product < w, wj > gives us the equation
< w, wj > = x1 < w1, wj > + ... + xn-1 < wn-1, wj > = [Ax]j,
which is the jth component of the column vector Ax. The implication is that
< w, w >
= < w, x1 w1 + ... + xn-1 wn-1 >
= x1 < w, w1 > + ... + xn-1 < w, wn-1 >
= x^T A x.
Now, A is singular (not invertible) if and only if there is a non-zero vector x such that Ax = 0. If such a vector exists, then
0 = x^T A x = < w, w > = || w ||^2,
which implies that w = 0. Since the coefficients in x are not all 0, the set of w's has to be linearly dependent. Conversely, if the w's are linearly dependent, there is a non-zero vector x such that w = x1 w1 + ... + xn-1 wn-1 = 0. This and the equation < w, wj > = [Ax]j then imply that Ax = 0. Hence, A is singular. This shows that A is singular if and only if the w's are linearly dependent. Equivalently, A is invertible if and only if the w's are linearly independent.

We collect what we have proved above in this result:

Theorem:  Let U be a finite dimensional subspace of an inner product space V, let v be a vector in V, and let U have a (possibly non-orthogonal) basis {w1, ... , wn-1}. The minimizer u* of || v - u || over u in U is u* = c1 w1 + ... + cn-1 wn-1, where c = A^(-1) d. Here A is the Gram matrix for the w's and d is the data vector with components dj = < v , wj >.
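
In a concrete setting the theorem is easy to try out. The sketch below (assuming NumPy; the subspace is spanned by three randomly chosen, non-orthogonal columns of a matrix W, and the inner product is the ordinary dot product on R^6) builds the Gram matrix, solves Ac = d, and checks that the residual v - u* is orthogonal to every basis vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# A non-orthogonal basis of a subspace of R^6: the columns of W.
W = rng.standard_normal((6, 3))
v = rng.standard_normal(6)

# Gram matrix Ajk = < wk, wj > and data vector dj = < v, wj >.
A = W.T @ W
d = W.T @ v

c = np.linalg.solve(A, d)                 # normal equations A c = d
u_star = W @ c                            # the minimizer u* = c1 w1 + c2 w2 + c3 w3

# v - u* is orthogonal to every basis vector, as the theorem requires.
print(np.round(W.T @ (v - u_star), 12))   # ~[0. 0. 0.]
```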

The solution   Relative to {w1, ... ,wn-1}, the data vector and entries in A are given by
dj = < y, wj > = ∫_0^1 f(x) wj(x) dx (integrate by parts, using -y'' = f and wj(0) = wj(1) = 0)
Ajk = < wk, wj > = ∫_0^1 wk'(x) wj'(x) dx
From here on, one must compute d and A, and solve Ac = d for c. In this case, one can show that u*(x) is just the piecewise linear function that is 0 at x = 0, cj at x = j/n for j = 1, ..., n-1, and 0 at x = 1. One can thus plot u* by ``connecting the dots.''

Let us carry this procedure out. By examining the graphs of wj'(x) = n B'(nx - j), we see that Ajk = < wk, wj > satisfies
Aj,j = 2n, j = 1 ... n-1
Aj,j-1 = - n, j = 2 ... n-1
Aj,j+1 = - n, j = 1 ... n-2
Aj,k = 0, all other possible k.

For example, if n=5, then A is

10  -5   0   0
-5  10  -5   0
 0  -5  10  -5 
 0   0  -5  10
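
The sketch below (assuming NumPy; the choice f ≡ 1 is ours, made so that the data vector can be computed by hand: dj = ∫_0^1 wj dx = 1/n) assembles this tridiagonal stiffness matrix for general n, solves Ac = d, and compares the nodal values cj with the exact solution y(x) = x(1-x)/2 of -y'' = 1, y(0) = y(1) = 0.

```python
import numpy as np

n = 5

# Stiffness matrix: Aj,j = 2n on the diagonal, Aj,j±1 = -n next to it.
A = 2 * n * np.eye(n - 1) - n * (np.eye(n - 1, k=1) + np.eye(n - 1, k=-1))

# Data vector for f(x) = 1: dj = integral of the hat function wj, which is 1/n.
d = np.full(n - 1, 1.0 / n)

c = np.linalg.solve(A, d)               # coefficients = nodal values of u*

x_nodes = np.arange(1, n) / n
y_exact = x_nodes * (1 - x_nodes) / 2   # exact solution of -y'' = 1, y(0) = y(1) = 0
print(np.round(c, 6))                   # [0.08 0.12 0.12 0.08]
print(np.allclose(c, y_exact))          # True: the nodal values match the exact solution here
```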

Triangularization of a matrix

Theorem (Schur form of a matrix):  Given an n×n matrix A with complex entries, one can find a unitary matrix S such that S^H A S = T, where T is an upper triangular matrix whose diagonal comprises the eigenvalues of A, each repeated according to its algebraic multiplicity.

Proof: We will carry out the construction of T in several steps. (See section 8.2 in the text.)

Step 1  Let A be an n×n matrix, and let z1 be an eigenvalue of A, with v1 a corresponding eigenvector normalized to have length 1. We can find n-1 vectors u2, ..., un such that {v1, u2, ..., un} forms a basis for C^n. We can also assume that this basis is orthonormal - if not, apply Gram-Schmidt. Finally, set S1 = [v1 u2 ... un]. Since the columns of this matrix form an orthonormal basis, it is unitary; that is, S1^H = S1^(-1). Putting this together, we have S1^H A S1 =
z1 *
0  A1
where A1 is (n-1)×(n-1).
Step 2  Repeat this procedure with the matrix A1, which will have an eigenvalue z2. We can then construct an (n-1)×(n-1) unitary matrix U2 such that U2^H A1 U2 =
z2 *
0  A2
Next, let S2 =
1  0
0  U2
It is easy to see that S2 is unitary and that   S2^H S1^H A S1 S2 =
z1 *  *
0  z2 *
0  0  A2
Step 3 Continue repeating the procedure until you have found unitary matrices S1, S2, ..., Sn-1 such that
Sn-1^H ··· S2^H S1^H A S1 S2 ··· Sn-1 = T,
where T =
z1 *  *  ... *
0  z2 *  ... *
0  0  z3 ... *
...
0  0  0  ... zn
If we take S = S1 S2 ··· Sn-1 and note that the product of unitary matrices is unitary, we have S^(-1) = S^H = Sn-1^H ··· S2^H S1^H. Hence, we arrive at S^H A S = T, which was what we wanted to establish. All that remains is to show the diagonal entries of T are the eigenvalues of A repeated according to algebraic multiplicity. To do this, we note that T and A are similar, so their characteristic polynomials are the same. Thus, we have
f_A(z) = f_T(z) = det(T - z I) = (z1 - z) ··· (zn - z),
which establishes that the eigenvalues of A are the diagonal elements of T, and that each eigenvalue appears according to its algebraic multiplicity.
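
SciPy provides a Schur factorization routine. The sketch below (assuming SciPy is installed; the 3×3 matrix is an arbitrary example) requests the complex Schur form, so T is upper triangular with the eigenvalues of A on its diagonal, and Z plays the role of S above.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 2.0, 2.0],
              [0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])

T, Z = schur(A, output='complex')    # A = Z T Z^H with Z unitary, T upper triangular

print(np.allclose(Z @ T @ Z.conj().T, A))   # True
print(np.allclose(np.tril(T, k=-1), 0))     # True: T is upper triangular
print(np.round(np.diag(T), 4))              # eigenvalues of A on the diagonal of T
print(np.round(np.linalg.eigvals(A), 4))    # same values, possibly in a different order
```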

Block diagonal form  The Schur decomposition only shows that A is similar to an upper triangular matrix; the part of T above the diagonal may have no zero entries at all. However, using elementary matrix operations (see section 8.2 in the text), we can show that T is itself similar to a matrix T_block =
T1 0  0  ... 0 
0  T2 0  ... 0 
0  0  T3 ... 0 
...
0  0  0  ... Tr
If A has r distinct eigenvalues z1, z2, ..., zr, with algebraic multiplicities m1, ..., mr, then the block Tk is an mk×mk upper triangular matrix having zk's on the diagonal; specifically, Tk =
zk *  *  ... * 
0  zk *  ... * 
0  0  zk ... * 
...
0  0  0  ... zk
Jordan canonical form  Every matrix A has a basis relative to which each block Tk is itself block diagonal, with diagonal blocks that are Jordan blocks Jm(zk) whose sizes add up to mk. A Jordan block Jm(z) is an m×m matrix with z's down the diagonal, 1's down the superdiagonal, and 0's elsewhere. For example, the 6×6 Jordan block with eigenvalue 3 is J6(3) =
3 1 0 0 0 0
0 3 1 0 0 0
0 0 3 1 0 0
0 0 0 3 1 0
0 0 0 0 3 1
0 0 0 0 0 3
Theorem  Two matrices having the same Jordan canonical form, apart from ordering of the blocks along the diagonal, are similar.

One may find a proof of this theorem in Linear Algebra, 2nd ed., by K. Hoffman and R. Kunze, Prentice-Hall, Upper Saddle River, NJ, 1971.
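
For small matrices with exact (rational or symbolic) entries, SymPy can compute the Jordan canonical form. The sketch below (assuming SymPy is installed; the 2×2 matrix is our own example of a defective matrix) returns P and J with A = P J P^(-1).

```python
from sympy import Matrix

A = Matrix([[2, 1],
            [-1, 4]])            # eigenvalue 3 with algebraic multiplicity 2, defective

P, J = A.jordan_form()           # A = P * J * P**(-1)
print(J)                         # Matrix([[3, 1], [0, 3]]): a single 2x2 Jordan block
print(A == P * J * P.inv())      # True
```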