Notes for Math 423 - Spring 2002

Selfadjoint matrices

Adjoints   The adjoint of a linear transformation L : V -> V, where V is an inner product space, is the unique linear transformation L' that satisfies
< L'[u], v > = < u, L[v] >
for all u and v in V. For a real m×n matrix A, the adjoint is the transpose A^T. If the matrix is complex, then the adjoint of A is A^H, the conjugate transpose. A matrix is selfadjoint or Hermitian if it is equal to its own adjoint. Thus, if A is real and selfadjoint, then A = A^T; i.e., A is symmetric.

Properties of selfadjoint matrices

  1. Theorem: The eigenvalues of a selfadjoint matrix are real and the eigenvectors corresponding to distinct eigenvalues are orthogonal.

    Proof: If x1 and x2 are any two eigenvectors of A corresponding to the (not necessarily distinct) eigenvalues z1 and z2, then
    < A x1, x2 > = < x1, A x2>
    < z1 x1, x2 > = < x1, z2 x2>
    z1 < x1, x2 > = c.c.(z2) < x1, x2>  (c.c. = complex conjugate).
    If we set x1 = x2, so that z1 = z2, then this formula implies that z1 < x1, x1 > = c.c.(z1) < x1, x1 >. Since < x1, x1 > = || x1 ||^2 > 0 (x1 is an eigenvector, hence nonzero), we can divide by it to obtain c.c.(z1) = z1. Now, a complex number is real if and only if it is equal to its complex conjugate. It follows that z1 is real. Applying this to the formula we derived above, and using the fact that z2 is also real, we see that z1 < x1, x2 > = c.c.(z2) < x1, x2 > = z2 < x1, x2 >. This in turn gives us
    (z1 - z2) < x1, x2 > = 0. Thus, if the eigenvalues z1 and z2 are distinct, then dividing by their difference yields < x1, x2 > = 0. That is, the corresponding eigenvectors are orthogonal.

  2. Theorem: Every selfadjoint matrix has an orthonormal basis of eigenvectors, relative to which it is diagonal. Equivalently, there is a unitary (orthogonal, in the real case) matrix S such that S^H A S is a diagonal matrix. An n×n matrix S is said to be unitary if S^H = S^(-1); this amounts to saying that the columns of S form an orthonormal basis for C^n or R^n.

    Proof: We take A to be an n×n selfadjoint matrix. Our proof will be via induction on n. The statement is true for n = 1, because everything is a scalar in that case; we only need a single unit vector. Suppose that n > 1. We must show that if every (n-1)×(n-1) selfadjoint matrix has an orthonormal basis of eigenvectors, then every n×n selfadjoint matrix does, too. Start with the n×n selfadjoint matrix A. First of all, every matrix has at least one eigenvalue and one corresponding eigenvector (over C), so A has an eigenvalue z1 and eigenvector x1, which we may take to be a unit vector. We can always find vectors y2, ..., yn such that the set
    {x1, y2, ..., yn}
    is not only a basis, but, by application of the Gram-Schmidt process if necessary, an orthonormal basis. Note that the yj's are not necessarily eigenvectors. They are simply chosen so that the set above is an orthonormal basis. Let's find the matrix for A relative to this new basis:
    B = T^H A T,   T = [x1 y2 ... yn].
    We leave it as an exercise to show that B=

    z1 0
    0 C
    
    where C is an (n-1)×(n-1) selfadjoint matrix. The induction hypothesis then implies that we can find an orthonormal basis {v1, ..., v_(n-1)} of C^(n-1) or R^(n-1) consisting of eigenvectors of C. Let V be the (n-1)×(n-1) matrix given by V = [v1 ... v_(n-1)], and note that
    V^H C V = diag(z2, z3, ..., zn).
    Of course, since the vj's are orthonormal, the (n-1)×(n-1) matrix V is unitary. Next, let U =
    1 0 
    0 V
    
    We also note that V being unitary implies that U is a unitary matrix. Moreover, U^H B U =
    z1 0 
    0  V^H C V
    
    This is the diagonal matrix D = diag(z1, z2, ..., zn). That is, D = U^H B U = U^H T^H A T U = (TU)^H A (TU). Taking S = TU finishes the proof, because the product of unitary matrices is unitary. (A quick numerical check of both theorems appears below.)
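
Both facts are easy to check numerically. Here is a minimal NumPy sketch (the 4×4 complex matrix is a randomly generated example, not one from these notes): it verifies that the eigenvalues of a selfadjoint matrix are real up to rounding error, and that the matrix of orthonormal eigenvectors returned by eigh is unitary and diagonalizes A.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random complex selfadjoint (Hermitian) matrix: A = (C + C^H)/2.
C = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (C + C.conj().T) / 2

# Theorem 1: the eigenvalues of a selfadjoint matrix are real.
eigvals = np.linalg.eigvals(A)          # general eigenvalue routine, no symmetry assumed
print(np.max(np.abs(eigvals.imag)))     # ~1e-16: imaginary parts are rounding error

# Theorem 2: there is a unitary S with S^H A S diagonal.
z, S = np.linalg.eigh(A)                # eigh returns orthonormal eigenvectors
print(np.allclose(S.conj().T @ A @ S, np.diag(z)))   # True: S^H A S = diag(z1, ..., zn)
print(np.allclose(S.conj().T @ S, np.eye(4)))        # True: S is unitary
```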

Quadratic forms  One important application of these results is to finding the principal axes of a quadratic form. If A is a real, symmetric n×n matrix, then the quadratic form associated with A is the function Q_A(x) = x^T A x. For example, the quadratic form associated with the matrix
A =
1 3
3 1
is Q_A(x) = x1^2 + 6 x1 x2 + x2^2. We recall from analytic geometry that the level curves x1^2 + 6 x1 x2 + x2^2 = constant are either ellipses or hyperbolas, but with principal axes rotated or reflected relative to the x1-x2 coordinate system. The typical problem is then to find these principal axes and also the standard form of the conic. Similar problems arise in three or more dimensions. We illustrate the technique for solving these problems using the example Q_A(x) = x1^2 + 6 x1 x2 + x2^2.

First, let's find the eigenvalues and eigenvectors of A. These turn out to be -2 and 4, with
u_(-2) = [2^(-1/2)   -2^(-1/2)]^T and u_4 = [2^(-1/2)   2^(-1/2)]^T.
The eigenvectors are orthogonal, because they belong to distinct eigenvalues. We have also normalized them to have length 1. Thus the matrix S = [u_(-2)   u_4] is orthogonal and satisfies S^T A S = D, where D = diag(-2, 4). We can rewrite this as A = S D S^T. The quadratic form then becomes
Q_A(x) = x^T A x = x^T S D S^T x = (S^T x)^T D (S^T x).
If we let X = S^T x be new coordinates, which are the old ones rotated by 45 degrees, then we see that
Q_A(x) = Q_D(X) = -2 X1^2 + 4 X2^2.
The principal axes, in terms of the old coordinates, are just u_(-2) and u_4, and, because the two eigenvalues have opposite signs, the family of conics Q_A(x) = constant consists of hyperbolas.
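
The same computation can be checked numerically. In the minimal NumPy sketch below, np.linalg.eigh recovers the eigenvalues -2 and 4 and an orthogonal matrix of eigenvectors; its columns may differ from u_(-2) and u_4 by a sign, which does not change the principal axes.

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [3.0, 1.0]])

# Eigenvalues in ascending order, eigenvectors as orthonormal columns.
z, S = np.linalg.eigh(A)
print(z)                                     # [-2.  4.]
print(S)                                     # columns: principal axes (up to sign)
print(np.allclose(S.T @ A @ S, np.diag(z)))  # True: S^T A S = diag(-2, 4)
```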

Singular value decomposition (SVD)

Theorem   Every (real) m×n matrix A can be written as a product A = U S V^T. Here, U and V are orthogonal matrices, with U being m×m and V being n×n. S is m×n, and has the form S =
s1  0  0 ... 0  ... 0
0  s2  0 ... 0  ... 0
... ... ... ... ... ... ...
0   0  0 ... sr ... 0
0   0  0 ... 0  ... 0
... ... ... ... ... ... ...
0   0  0 ... 0  ... 0
The nonzero diagonal entries satisfy s1 >= s2 >= ... >= sr > 0; that is, they are positive and ordered from greatest to least, and r is the rank of A. The sk's are called the singular values of A.

Applications   Gilbert Strang has called this theorem the "Fundamental Theorem of Linear Algebra." The SVD contains very explicit information about a matrix: its rank, orthonormal bases for its column space and null space, and, as we will see below, the solution of least squares problems.

Finding the SVD of a matrix   The matrix S above is just the matrix of A relative to new bases for both the input and output spaces.

  • The input space basis   The matrix A^T A is real and symmetric, so there is an orthonormal basis of eigenvectors relative to which it is diagonal. Let the eigenvalues of A^T A be z1, ..., zn, and the corresponding basis of eigenvectors be {v1, ..., vn}. The eigenvalues are nonnegative, as this calculation shows:

    A^T A vk = zk vk
    vk^T A^T A vk = zk vk^T vk
    || A vk ||^2 = zk || vk ||^2
    || A vk ||^2 = zk   (since || vk || = 1)

    This calculation also shows that an eigenvector vk is in the null space of A if and only if its eigenvalue is zk = 0. If r = rank(A), then "rank + nullity = # of columns" tells us that nullity(A) = n - r. This means that n - r of the eigenvalues are 0 and the remaining r eigenvalues are positive. List these as z1 >= z2 >= ... >= zr > 0. Our input basis is now chosen as {v1, ..., vr, vr+1, ..., vn}, where the numbering is the same as that for the eigenvalues, so the last n - r vectors span the null space of A. We now define the matrix V via
    V = [ v1 ... vr vr+1 ... vn ].

  • The output space basis   For k = 1, ..., r, let
    uk = A vk / || A vk ||
    uk = A vk / zk^(1/2).
    We can also write this as the following equation:
    A vk = zk^(1/2) uk.
    The uk's are orthonormal, for k = 1, ..., r. Again, we see this from these equations.
    uj^T uk = vj^T A^T A vk / (zj^(1/2) zk^(1/2))
    uj^T uk = zk vj^T vk / (zj^(1/2) zk^(1/2))
    The orthonormality of the v's implies that the right side above is 0 unless j = k, in which case it is 1. Thus, the u's are orthonormal. Fill this set out with m - r additional vectors to form an orthonormal basis for the output space, R^m. This gives us the basis {u1, ..., ur, ur+1, ..., um}. As before, we define the m×m orthogonal matrix
    U = [u1 ... ur ur+1 ... um].

  • The matrix of A relative to the new bases   We let S be the matrix of A relative to these new bases. We compute it via the formula for the matrix of a linear transformation:
    S = [ [A v1]_U [A v2]_U ... [A vr]_U [A vr+1]_U ... [A vn]_U ],
    where [w]_U denotes the coordinate column of a vector w relative to the basis of u's. The vk's with k = r+1, ..., n are all in the null space of A. Thus the last n - r of the vectors A vk are 0, and so are their corresponding coordinate columns. The remaining columns we get from the definition of the u's, A vk = zk^(1/2) uk, for k = 1, ..., r. The end result is this.

    S = [ [z1^(1/2) u1]_U [z2^(1/2) u2]_U ... [zr^(1/2) ur]_U [ 0 ]_U ... [ 0 ]_U ]

    S = [ z1^(1/2) [u1]_U ... zr^(1/2) [ur]_U   0 ... 0 ]

    S = [ z1^(1/2) e1 ... zr^(1/2) er 0 ... 0 ].

    If we let sk = zk^(1/2) for k = 1, ..., r, we get the same S as the one given in the statement of the theorem. These sk's are the singular values of A. The matrix S is related to A via multiplication by change-of-basis matrices. The matrix U changes from new output to old output bases, and V changes from new input to old input bases. Since V^T = V^(-1), we have that V^T changes from old input to new input bases. In the end, this gives us A = U S V^T. (A short numerical sketch of this construction follows.)
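
To tie the construction to a computation, here is a NumPy sketch that follows the recipe above: diagonalize A^T A, sort the eigenvalues in decreasing order, set sk = zk^(1/2) and uk = A vk / sk, and fill out the output basis. The helper name svd_via_eig is ours; in practice one would simply call np.linalg.svd, which is more robust for ill-conditioned or nearly rank-deficient matrices.

```python
import numpy as np

def svd_via_eig(A, tol=1e-12):
    """Build an SVD A = U S V^T following the construction in the notes (teaching sketch)."""
    m, n = A.shape
    z, V = np.linalg.eigh(A.T @ A)        # eigenvalues ascending, orthonormal columns
    order = np.argsort(z)[::-1]           # reorder so z1 >= z2 >= ... >= zn
    z, V = z[order], V[:, order]
    s = np.sqrt(np.clip(z, 0.0, None))    # singular values (clip tiny negative round-off)
    r = int(np.sum(s > tol))              # numerical rank

    S = np.zeros((m, n))
    S[:r, :r] = np.diag(s[:r])

    U = np.zeros((m, m))
    U[:, :r] = A @ V[:, :r] / s[:r]       # uk = A vk / sk, k = 1, ..., r
    # Fill out u_{r+1}, ..., u_m to an orthonormal basis of R^m.
    rand = np.random.default_rng(1).standard_normal((m, m - r))
    Q, _ = np.linalg.qr(np.hstack([U[:, :r], rand]))
    U[:, r:] = Q[:, r:]
    return U, S, V

A = np.array([[2.0, -2.0], [1.0, 1.0], [-2.0, 2.0]])
U, S, V = svd_via_eig(A)
print(np.allclose(U @ S @ V.T, A))        # True
print(np.round(S, 6))                     # 4 and 2^(1/2) on the diagonal
```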

Example Consider the matrix A =
 2  -2
 1   1
-2   2
Here, A^T A =
 9  -7
-7   9
The eigenvalues of this matrix are z1 = 16 and z2 = 2. The singular values are s1 = 4 and s2 = 2^(1/2). We can immediately write out what S is. We have S =
 4  0
 0  2^(1/2)
 0  0
The eigenvector corresponding to 16 is v1 = 2^(-1/2) (1, -1)^T, and the one corresponding to 2 is v2 = 2^(-1/2) (1, 1)^T. Hence, we see that V =
 2^(-1/2)   2^(-1/2)
-2^(-1/2)   2^(-1/2)
Next, we find the u's.

u1 = A v1 / z1^(1/2)
u1 = 2^(-1/2) (4, 0, -4)^T / 4
u1 = ( 2^(-1/2), 0, -2^(-1/2) )^T.

A similar calculation gives us
u2 = ( 0, 1, 0 )^T.

We now have to add to these a "fill" vector
u3 = ( 2^(-1/2), 0, 2^(-1/2) )^T
to complete the new output basis. This finally yields U =

 2^(-1/2)  0   2^(-1/2)
 0         1   0
-2^(-1/2)  0   2^(-1/2)
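
As a quick check, np.linalg.svd reproduces the singular values found above; its U and V may differ from ours by the signs of individual columns, which is the usual non-uniqueness in an SVD.

```python
import numpy as np

A = np.array([[2.0, -2.0], [1.0, 1.0], [-2.0, 2.0]])
U, s, Vt = np.linalg.svd(A)               # s holds the singular values, largest first
print(s)                                  # [4.         1.41421356]  = [4, 2^(1/2)]
print(np.allclose(U[:, :2] * s @ Vt, A))  # True: A = U S V^T
```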

A least squares example  We again consider the matrix A used above. This time we want to find an x in R^2 such that the distance
|| Ax - b ||,   b = (1, 1, 3)^T,
is a minimum. This is of course a least squares problem. Because orthogonal and unitary matrices preserve length, we have
|| Ax - b || = || U S V^T x - U U^T b || = || S V^T x - U^T b || = || Sz - c ||,   z = V^T x and c = U^T b = (-2^(1/2)  1  2^(3/2))^T.
Sz - c = (4 z1 + 2^(1/2)   2^(1/2) z2 - 1   -2^(3/2))^T.
The square of the length of this vector is
|| Sz - c ||^2 = (4 z1 + 2^(1/2))^2 + (2^(1/2) z2 - 1)^2 + 8.
This is minimized by taking z1 = -2^(-3/2) and z2 = 2^(-1/2). We can also write this in matrix form as z = S^+ c, where S^+ =
 4^(-1)  0         0
 0       2^(-1/2)  0
Using this matrix we can then write out the solution to the original least squares problem as a matrix product,
x = V S^+ U^T b.
The solution we obtain here is x = (1/4  3/4)^T. Now, the point is that nothing in this recipe depended on the particular A and b: for any least squares problem, x = V S^+ U^T b is a minimizer. This is the general result described next.
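
This arithmetic is easy to confirm numerically. The minimal NumPy sketch below computes x = V S^+ U^T b directly from the SVD and compares it with the least squares solver np.linalg.lstsq; both give x = (1/4, 3/4)^T.

```python
import numpy as np

A = np.array([[2.0, -2.0], [1.0, 1.0], [-2.0, 2.0]])
b = np.array([1.0, 1.0, 3.0])

U, s, Vt = np.linalg.svd(A)

# Build S^+ : transpose the shape of S and invert the nonzero singular values.
Splus = np.zeros((2, 3))
Splus[0, 0], Splus[1, 1] = 1.0 / s[0], 1.0 / s[1]

x_svd = Vt.T @ Splus @ U.T @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_svd)                        # [0.25 0.75]
print(np.allclose(x_svd, x_lstsq))  # True
```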

The pseudoinverse The pseudoinverse of an m×n matrix A that has SVD A = U S V^T is defined to be
A^+ = V S^+ U^T, where S^+ is the n×m matrix given by S^+ =
s1^(-1)  0       0 ... 0       ... 0
0        s2^(-1) 0 ... 0       ... 0
...      ...     ... ... ...   ... ...
0        0       0 ... sr^(-1) ... 0
0        0       0 ... 0       ... 0
...      ...     ... ... ...   ... ...
0        0       0 ... 0       ... 0
Thus S^+ is constructed from S by taking the transpose of S and replacing the nonzero diagonal elements by their reciprocals. We have the following result.

Theorem   Let A be an m×n matrix and let A^+ be its pseudoinverse. If b is an m×1 column vector, then x = A^+ b minimizes || Ax - b || and has the smallest norm || x || among all minimizers.
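
NumPy's np.linalg.pinv computes A^+ through the SVD. The small rank-deficient example below is ours, chosen to illustrate both parts of the theorem: A^+ b has the same residual as any other minimizer, and among the minimizers it has the smallest norm.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])          # rank 1, so the minimizer is not unique
b = np.array([1.0, 3.0])

x_pinv = np.linalg.pinv(A) @ b      # pseudoinverse solution
x_other = np.array([2.0, 0.0])      # another minimizer: any x with x1 + x2 = 2

print(x_pinv)                                                            # [1. 1.]
print(np.linalg.norm(A @ x_pinv - b), np.linalg.norm(A @ x_other - b))   # equal residuals
print(np.linalg.norm(x_pinv) < np.linalg.norm(x_other))                  # True: smallest norm
```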

Finite element method

An ODE problem We want to illustrate the finite element method with a simple example. Solve the ODE -y'' = f(x), y(0) = y(1) = 0. We take the space V to be all continuous functions g(x) defined on [0,1] having a piecewise continuous derivative g'(x) and satisfying g(0) = g(1) = 0. The inner product for this space is < f , g > = ∫_0^1 f'(x) g'(x) dx. The subspace U comprises all piecewise linear functions in V, with possible discontinuities in the derivative at the points 0, 1/n, 2/n, ..., (n-1)/n, 1; these points are called knots. A (non-orthogonal) basis for U consists of the linear B-splines ("hat" functions)
wj(x) = B(n x - j), j = 1, ..., n-1,
where B(x) is the piecewise linear function defined this way (a short code sketch of these hat functions follows the definition):

B(x) = 0 if x < -1 or x > 1;
B(x) = 1 + x if -1 <= x <= 0;
B(x) = 1 - x if 0 <= x <= 1.
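
Here is a minimal NumPy sketch (the function names B and w are just the notation above) that evaluates the hat function B and the basis functions wj; it checks that wj equals 1 at its own knot j/n and 0 at the other knots.

```python
import numpy as np

def B(x):
    """The piecewise linear 'hat' function: 1 - |x| on [-1, 1], 0 outside."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, 1.0 - np.abs(x))

def w(j, n, x):
    """Basis function wj(x) = B(n x - j) for j = 1, ..., n-1."""
    return B(n * np.asarray(x, dtype=float) - j)

n = 5
knots = np.arange(1, n) / n              # interior knots 1/n, ..., (n-1)/n
print(np.round(w(2, n, knots), 12))      # [0. 1. 0. 0.]: w2 is 1 at 2/5 and 0 at the other knots
```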

We want to minimize || y - u || over all u in U, where y is the solution of the ODE. This is a least squares problem. The only new thing here is doing least squares with a non-orthogonal basis. We will now discuss such problems.

Normal equations for non-orthogonal bases Let V be a vector space with an inner product < u, v >, and let U be a finite dimensional subspace of V. Given a vector v in V, the vector u* in U minimizes the distance || v - u || over all u in U if and only if u* satisfies
< v - u*, u > = 0
for all u in U. Equivalently, u* is the minimizer if and only if v - u* is orthogonal to U. Let U have a basis {w1, ... , wn-1}. We can express the minimizer u* as
u* = c1 w1 + ... + cn-1 wn-1.
The condition that v - u* is orthogonal to U implies that for all j = 1 to n-1, we have
< v - u*, wj > = 0.
Inserting the expression for u* in terms of the w's, we see that the coefficients cj satisfy the system of equations
c1 < w1, wj > + ... + cn-1 < wn-1, wj > = < v, wj >,   j = 1, ..., n-1.
In matrix form, the equations above become

A c = d,   where Ajk = < wk, wj >,   dj = < v , wj >.

The matrix A is called the Gram matrix for the basis of w's; since the w's form a basis, and hence are linearly independent, it is invertible, by the lemma below.

Lemma:  The Gram matrix A with entries Ajk= < wk, wj > is invertible if and only if the w's are linearly independent.

Proof: To simplify the calculation, we will assume the scalars are the real numbers. Let x1, ..., xn-1 be scalars, and consider the vector w in U given by
w = x1 w1 + ... + xn-1 wn-1.
Putting this form of w into the inner product < w, wj > gives us the equation
< w, wj > = x1 < w1, wj > + ... + xn-1 < wn-1, wj > = [Ax]j,
which is the jth component of the column vector Ax. The implication is that
< w, w >
= < w, x1 w1 + ... + xn-1 wn-1 >
= x1 < w, w1 > + ... + xn-1 < w, wn-1 >
= x^T A x.
Now, A is singular (not invertible) if and only if there is a non-zero vector x such that Ax = 0. If such a vector exists, then
0 = x^T A x = < w, w > = || w ||^2,
which implies that w = 0. Since the coefficients in x are not all 0, the set of w's has to be linearly dependent. Conversely, if the w's are linearly dependent, there is a non-zero vector x such that w = x1 w1 + ... + xn-1 wn-1 = 0. This and the equation < w, wj > = [Ax]j then imply that Ax = 0. Hence, A is singular. This shows that A is singular if and only if the w's are linearly dependent. Equivalently, A is invertible if and only if the w's are linearly independent.

We collect what we have proved above in this result:

Theorem:  Let U be a finite dimensional subspace of an inner product space V, let v be a vector in V, and let U have a (possibly non-orthogonal) basis {w1, ... , wn-1}. The minimizer u* of || v - u || over u in U is u* = c1 w1 + ... + cn-1 wn-1, where c = A^(-1) d. Here A is the Gram matrix for the w's and d is the data vector with components dj = < v , wj >.
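
In a concrete setting the theorem is easy to try out. The sketch below (assuming NumPy; the subspace is spanned by three randomly chosen, non-orthogonal columns of a matrix W, and the inner product is the ordinary dot product on R^6) builds the Gram matrix, solves Ac = d, and checks that the residual v - u* is orthogonal to every basis vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# A non-orthogonal basis of a subspace of R^6: the columns of W.
W = rng.standard_normal((6, 3))
v = rng.standard_normal(6)

# Gram matrix Ajk = < wk, wj > and data vector dj = < v, wj >.
A = W.T @ W
d = W.T @ v

c = np.linalg.solve(A, d)                 # normal equations A c = d
u_star = W @ c                            # the minimizer u* = c1 w1 + c2 w2 + c3 w3

# v - u* is orthogonal to every basis vector, as the theorem requires.
print(np.round(W.T @ (v - u_star), 12))   # ~[0. 0. 0.]
```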

The solution   Relative to {w1, ... ,wn-1}, the data vector and entries in A are given by
dj = < y, wj > = ∫_0^1 f(x) wj(x) dx (integrate by parts, using -y'' = f and wj(0) = wj(1) = 0)
Ajk = < wk, wj > = ∫_0^1 wk'(x) wj'(x) dx
From here on, one must compute d and A, and solve Ac = d for c. In this case, one can show that u*(x) is just the piecewise linear function that is 0 at x = 0, cj at x = j/n for j = 1, ..., n-1, and 0 at x = 1. One can thus plot u* by ``connecting the dots.''

Let us carry this procedure out. By examining the graphs of wj'(x) = n B'(nx - j), we see that Ajk = < wk, wj > satisfies
Aj,j = 2n, j = 1 ... n-1
Aj,j-1 = - n, j = 2 ... n-1
Aj,j+1 = - n, j = 1 ... n-2
Aj,k = 0, all other possible k.

For example, if n=5, then A is

10  -5   0   0
-5  10  -5   0
 0  -5  10  -5 
 0   0  -5  10
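
The sketch below (assuming NumPy; the choice f ≡ 1 is ours, made so that the data vector can be computed by hand: dj = ∫_0^1 wj dx = 1/n) assembles this tridiagonal stiffness matrix for general n, solves Ac = d, and compares the nodal values cj with the exact solution y(x) = x(1-x)/2 of -y'' = 1, y(0) = y(1) = 0.

```python
import numpy as np

n = 5

# Stiffness matrix: Aj,j = 2n on the diagonal, Aj,j±1 = -n next to it.
A = 2 * n * np.eye(n - 1) - n * (np.eye(n - 1, k=1) + np.eye(n - 1, k=-1))

# Data vector for f(x) = 1: dj = integral of the hat function wj, which is 1/n.
d = np.full(n - 1, 1.0 / n)

c = np.linalg.solve(A, d)               # coefficients = nodal values of u*

x_nodes = np.arange(1, n) / n
y_exact = x_nodes * (1 - x_nodes) / 2   # exact solution of -y'' = 1, y(0) = y(1) = 0
print(np.round(c, 6))                   # [0.08 0.12 0.12 0.08]
print(np.allclose(c, y_exact))          # True: the nodal values match the exact solution here
```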

Triangularization of a matrix

Theorem (Schur form of a matrix):  Given an n×n matrix A with complex entries, one can find a unitary matrix S such that S^H A S = T, where T is an upper triangular matrix whose diagonal comprises the eigenvalues of A, each repeated according to its algebraic multiplicity.

Proof: We will carry out the construction of T in several steps. (See section 8.2 in the text.)

Step 1  Let A be an n×n matrix, and let z1 be an eigenvalue of A, with v1 a corresponding eigenvector normalized to have length 1. We can find n-1 vectors u2, ..., un such that {v1, u2, ..., un} forms a basis for C^n. We can also assume that this basis is orthonormal - if not, apply Gram-Schmidt. Finally, set S1 = [v1 u2 ... un]. Since the columns of this matrix form an orthonormal basis, it is unitary; that is, S1^H = S1^(-1). Putting this together, we have S1^H A S1 =
z1 *
0  A1
where A1 is (n-1)×(n-1).
Step 2  Repeat this procedure with the matrix A1, which will have an eigenvalue z2. We can then construct an (n-1)×(n-1) unitary matrix U2 such that U2^H A1 U2 =
z2 *
0  A2
Next, let S2 =
1  0
0  U2
It is easy to see that S2 is unitary and that   S2^H S1^H A S1 S2 =
z1 *  *
0  z2 *
0  0  A2
Step 3 Continue repeating the procedure until you have found unitary matrices S1, S2, ..., Sn-1 such that
Sn-1^H ··· S2^H S1^H A S1 S2 ··· Sn-1 = T,
where T =
z1 *  *  ... *
0  z2 *  ... *
0  0  z3 ... *
...
0  0  0  ... zn
If we take S = S1 S2 ··· Sn-1 and note that the product of unitary matrices is unitary, we have S^(-1) = S^H = Sn-1^H ··· S2^H S1^H. Hence, we arrive at S^H A S = T, which was what we wanted to establish. All that remains is to show the diagonal entries of T are the eigenvalues of A repeated according to algebraic multiplicity. To do this, we note that T and A are similar, so their characteristic polynomials are the same. Thus, we have
f_A(z) = f_T(z) = det(T - z I) = (z1 - z) ··· (zn - z),
which establishes that the eigenvalues of A are the diagonal elements of T, and that each eigenvalue appears according to its algebraic multiplicity.
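
SciPy provides a Schur factorization routine. The sketch below (assuming SciPy is installed; the 3×3 matrix is an arbitrary example) requests the complex Schur form, so T is upper triangular with the eigenvalues of A on its diagonal, and Z plays the role of S above.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 2.0, 2.0],
              [0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])

T, Z = schur(A, output='complex')    # A = Z T Z^H with Z unitary, T upper triangular

print(np.allclose(Z @ T @ Z.conj().T, A))   # True
print(np.allclose(np.tril(T, k=-1), 0))     # True: T is upper triangular
print(np.round(np.diag(T), 4))              # eigenvalues of A on the diagonal of T
print(np.round(np.linalg.eigvals(A), 4))    # same values, possibly in a different order
```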

Block diagonal form  The Schur decomposition only shows that A is similar to an upper triangular matrix; the part of T above the diagonal may have no zero entries at all. However, using elementary matrix operations (see section 8.2 in the text), we can show that T is itself similar to a matrix T_block =
T1 0  0  ... 0 
0  T2 0  ... 0 
0  0  T3 ... 0 
...
0  0  0  ... Tr
If A has r distinct eigenvalues z1, z2, ..., zr, with algebraic multiplicities m1, ..., mr, then the block Tk is an mk×mk upper triangular matrix having zk's on the diagonal; specifically, Tk =
zk *  *  ... * 
0  zk *  ... * 
0  0  zk ... * 
...
0  0  0  ... zk
Jordan canonical form  Every matrix A has a basis relative to which each block Tk is itself block diagonal, with diagonal blocks that are Jordan blocks Jm(zk) whose sizes add up to mk. A Jordan block Jm(z) is an m×m matrix with z's down the diagonal, 1's down the superdiagonal, and 0's elsewhere. For example, the 6×6 Jordan block with eigenvalue 3 is J6(3) =
3 1 0 0 0 0
0 3 1 0 0 0
0 0 3 1 0 0
0 0 0 3 1 0
0 0 0 0 3 1
0 0 0 0 0 3
Theorem  Two matrices having the same Jordan canonical form, apart from ordering of the blocks along the diagonal, are similar.

One may find a proof of this theorem in Linear Algebra, 2nd ed., by K. Hoffman and R. Kunze, Prentice-Hall, Upper Saddle River, NJ, 1971.
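
For small matrices with exact (rational or symbolic) entries, SymPy can compute the Jordan canonical form. The sketch below (assuming SymPy is installed; the 2×2 matrix is our own example of a defective matrix) returns P and J with A = P J P^(-1).

```python
from sympy import Matrix

A = Matrix([[2, 1],
            [-1, 4]])            # eigenvalue 3 with algebraic multiplicity 2, defective

P, J = A.jordan_form()           # A = P * J * P**(-1)
print(J)                         # Matrix([[3, 1], [0, 3]]): a single 2x2 Jordan block
print(A == P * J * P.inv())      # True
```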