Math 601 - Fall 2000

Summary

Dates
AUG 29 31              
SEP 5 7 12 14 19 21 26 28  
OCT 3 5 10 12 17 19 24 26 31
NOV 2 7 9 14 16 21 23 28 30
DEC 5                

29 August

Vector spaces
Definition - Vector space. A vector space is a set V together with two operations, + and × . If u, v are in V, then u + v is in V; if c is a scalar, then c×v is in V. The operations satisfy the following rules.

Addition   Scalar multiplication
u + (v + w) = (u + v) + w   a×(b×u) = (ab)×u
Identity: u + 0 = 0 + u = u   (a + b)×u = a×u + b×u
Inverse: u + (-u) = (-u) + u = 0   a×(u + v) = a×u + a×v
u + v = v + u   1×u = u

Subspaces
Definition - Subspace. A subset U of V is a subspace if, under + and × from V, U is a vector space in its own right.
Theorem. U is a subspace of V if and only if these hold:
  1. 0 is in U.
  2. U is closed under + .
  3. U is closed under × .
Example: U={(x1, x2, x3) such that x1 + x2 - x3 = 0} is a subspace of R3. On the other hand, W={(x1, x2, x3) such that x1 + x2 - x3 = 3} is not a subspace of R3, because 0 is not in W.

31 August

Span
Definition - Span. Let S={v1 ... vn} be a subset of a vector space V. The span of S is the set of linear combinations of vectors in S. That is,
U=span(S)={c1v1 + ... + cn vn},
where the c's are arbitrary scalars.

Proposition. The set span(S) is a subspace of V.

Dual space

Definition - Dual Space. The set V* of all linear functions f : V -> R (or C) is called the (algebraic) dual of V. Linear means that
f(c1v1 + c2v2) = c1f(v1) + c2f(v2)

Coordinates and Bases

Coordinates for a vector space. The correspondence
v <-> (c1, c2, ... , cn)
should be 1:1, onto, and preserve vector addition and scalar multiplication. If we let vk <-> ek, then the condition of being ``onto'' holds if and only if
V=span{v1, v2, ... , vn}.
The condition of being ``1:1'' holds if and only if {v1, v2, ... , vn} is linearly independent, which we now define.

Definition - Linear independence and linear dependence. We say that a set of vectors
S = {v1, v2, ... , vn}
is linearly independent if the equation
c1v1 + c2v2 + ... + cnvn = 0
has only c1 = c2 = ... = cn = 0 as a solution. If it has solutions different from this one, then the set S is said to be linearly dependent.

Bases

Definition - basis. A subset B = {v1 ... vn} of V is a basis for V if B spans V and is linearly independent. Equivalently, B is a basis if it is maximally linearly independent; that is, B is not a proper subset of some other linearly independent set. Unless we specifically state otherwise, we will assume that B is ordered.

Theorem. Every basis for V has the same number of vectors as every other basis. This common number is defined to be the dimension of V, dim(V).

Coordinates and bases. The properties defining a basis are exactly the ones needed to define ``good'' coordinates for a vector space. Given an ordered basis B = {v1 ... vn} and a vector v, we can uniquely write the vector as v = x1v1 +...+ xnvn, and thus represent it by the column vector [v]B = [x1, ..., xn]T

5 September

Change-of-bases
Let B = {v1 ... vn} and D = {w1 ... wn} be ordered bases for a vector space V. Suppose that we have these formulas for v's in terms of w's and vice versa:

vj=A1jw1 + A2jw2 + ... + Anjwn
wk=C1kv1 + C2kv2 + ... + Cnkvn
(Note that the sums are over the row index for each matrix A and C.) For any vector v with representations
v = b1v1 +...+ bnvn
v = d1w1 +...+ dnwn
and corresponding coordinate vectors
[v]B = [b1,..., bn]T
[v]D = [d1,..., dn]T
we have the change-of-basis formulas
[v]D = A[v]B and [v]B = C[v]D.
These imply that AC=CA=In×n.

Examples
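
As a concrete illustration, here is a small numerical sketch in Python/NumPy (a made-up example, not one of the examples worked in class) that checks the change-of-basis formulas and the relation AC = CA = I:

import numpy as np

# Made-up bases for R^2; each matrix holds its basis vectors as columns,
# written in standard coordinates.
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # columns are v1, v2
D = np.array([[1.0, 0.0],
              [1.0, 1.0]])        # columns are w1, w2

# Column j of A holds the coefficients of vj in terms of the w's, so D A = B.
A = np.linalg.solve(D, B)         # [v]_D = A [v]_B
C = np.linalg.solve(B, D)         # [v]_B = C [v]_D

v_B = np.array([2.0, -1.0])       # coordinates of some vector v relative to B
v_D = A @ v_B                     # the same vector's coordinates relative to D

# Both coordinate vectors describe the same vector, and A and C are inverses.
assert np.allclose(B @ v_B, D @ v_D)
assert np.allclose(A @ C, np.eye(2))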

7 September

Inner product
Definition - Inner product. Let V be a vector space. We say that a mapping < , > : V×V --> R is an inner product for V if these hold:
  1. positivity - <v ,v > >= 0, with <v ,v > = 0 implying that v=0.
  2. symmetry - <u ,v > = <v ,u >
  3. homogeneity - < cu ,v > = c<u ,v >
  4. additivity - < u+v ,w > = <u ,w > + < v, w >

Schwarz's inequality: |<u ,v >| <= ||u|| ||v||, where ||u|| := (<u ,u >)½ is called the norm or length of a vector u.

Triangle inequality: ||u + v|| <= ||u|| + ||v||

12 September

Function spaces - see Function spaces (PS) (PDF)
  • Reviewed inner products; discussed complex inner products
  • Orthogonal sets and orthonormal sets of vectors
  • Separated variables to solve heat equation
    • Discussed solution in terms of inner products; viewed {ei n theta } as an orthonormal set
    • Discussed Cauchy sequences in terms of convergence.
  • Completeness in inner product spaces; all Cauchy sequences converge
  • Hilbert space - a complete inner product space

14 September

Gram-Schmidt Process

Algorithm Let V be a vector space with an inner product < u, v >, and let {v1 ... vn} be a linearly independent set. We want to find an orthonormal set with the same span.

  1. u1 := v1 (|| v1 ||)-1 equivalently v1 = r11u1, r11 = || v1 ||

  2. u2 := (v2 - < v2, u1 > u1) r22-1 equivalently v2 = r12u1 + r22u2,
    where r12 = < v2, u1 > and r22 = || v2 - < v2, u1 > u1 ||

  3. uk := (vk - < vk, u1 > u1 - ... - < vk, uk-1 > uk-1 ) rkk-1 equivalently vk = r1ku1 + ... + rkkuk,
    where rjk = < vk, uj > and rkk = || vk - < vk, u1 > u1 - ... - < vk, uk-1 > uk-1 ||.

  4. Repeat the step above for k = 3, 4, ... n. The result is an orthonormal (o.n.) basis that replaces the vj's.

QR-Algorithm Write all the vj's and uj's in terms of a common basis, E = {z1 ... zn}. Define the column vectors:
ak = [vk]E and qk = [uk]E
Taking coordinates relative to E in the equation vk = r1ku1 + ... + rkkuk gives the equation
ak = r1kq1 + ... + rkkqk .
Letting A = [ a1 .. an ], Q = [ q1 .. qn ], and R =
r11 r12  r13  ... r1n
0   r22  r23  ... r2n
0    0   r33  ... r3n
...
0    0   0   ... rnn
we have A = QR. If the basis E is orthonormal (so that taking coordinates preserves the inner product), then the columns of Q are orthonormal. This is the QR factorization. It also works if we start with an arbitrary matrix and apply Gram-Schmidt to the column space, with the inner product < u, v > = vTu.
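
Here is a minimal sketch of the Gram-Schmidt/QR computation in Python/NumPy, using the inner product < u, v > = vTu on the columns of a matrix (the test matrix is an arbitrary illustration, not one from the lecture):

import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt on the columns of A.  Returns Q with orthonormal
    columns and upper-triangular R with A = QR.  Assumes the columns of A are
    linearly independent."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        w = A[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]      # r_jk = <v_k, u_j>
            w -= R[j, k] * Q[:, j]           # subtract the projection onto u_j
        R[k, k] = np.linalg.norm(w)          # r_kk = length of what remains
        Q[:, k] = w / R[k, k]
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = gram_schmidt_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))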

Least-Squares Problem We have a vector v in an inner product space V, and a subspace U of V. The least-squares problem is to find both the minimum of || v - u ||, where u is any vector in U, and the minimizer u*.

19 September

Least Squares

Normal equations Let V be a vector space with an inner product < u, v >, and let U be a finite dimensional subspace of V. A vector u* in U minimizes || v - u || over all u in U if and only if u* satisfies the normal equations,
< v-u*, u > = 0,
which hold for all u in U. (We say that v - u* is orthogonal to U.)

The minimizer Let U have a basis {w1, ... , wn}. Let u* = c1w1+ ... + cn wn. The normal equations imply that the coefficients cj satisfy the matrix equation

Ac=d,   where Ajk= < wk, wj >,   dj = < v , wj >.

The matrix A is called the Gram matrix for the basis of w's; it is always invertible. (See problem 1, Assignment 4.)

Orthonormal case If the basis {u1, ... , un} (u's replace w's) is orthonormal, then A = I and c = d; that is, the minimizer has the form
u* = < v , u1 > u1+ ... + < v , un > un
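
The normal equations are easy to set up numerically. Below is a small sketch in Python/NumPy, assuming the Euclidean inner product and a made-up basis; the helper name project_onto_span is ours, chosen just for this illustration:

import numpy as np

def project_onto_span(v, W):
    """Minimize ||v - u|| over u in the span of the columns of W, using the
    normal equations: A c = d with A_jk = <w_k, w_j> (the Gram matrix) and
    d_j = <v, w_j>.  Returns the minimizer u* and its coefficients c."""
    A = W.T @ W                  # Gram matrix of the basis vectors
    d = W.T @ v                  # d_j = <v, w_j>
    c = np.linalg.solve(A, d)    # the Gram matrix of a basis is invertible
    return W @ c, c

# Illustration: project a vector in R^3 onto a 2-dimensional subspace.
W = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])   # columns are w1, w2
v = np.array([1.0, 2.0, 3.0])
u_star, c = project_onto_span(v, W)

# The normal equations say the residual v - u* is orthogonal to every w_j.
assert np.allclose(W.T @ (v - u_star), 0.0)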

Finite element method We want to illustrate the finite element method with a simple example. Solve the ODE - y'' = f(x), y(0) = y(1) = 0. We take the space V to be all continuous functions g(x) defined on [0,1] having a piecewise continuous derivative g'(x) and satisfying g(0) = g(1) = 0. The inner product for this space is < f , g > = S01 f'(x)g'(x) dx. The subspace U comprises all piecewise linear functions in V, with possible discontinuities in the derivatives at 0, 1/n, 2/n, ... (n-1)/n, 1. These are linear B-splines; the possible discontinuities are called knots. A (non-orthogonal) basis is
wj(x) = B(n x - j), j = 1, ..., n-1,
where B(x) is the piecewise linear function defined this way:

B(x) = 0 if x < -1 or x > 1;
B(x) = 1 + x if -1 <= x < 0;
B(x) = 1 - x if 0 < x <= 1.

We want to minimize || y - u ||. Relative to {w1, ... , wn-1}, the normal equations are Ac = d, where
dj = < y, wj > = S01 f(x) wj(x) dx (integrate by parts)
Ajk = < wk, wj >
From here on, one must compute d and A, and solve Ac = d for c. In this case, one can show that u*(x) is just the piecewise linear function that is 0 at x=0, cj at x= j/n, j=1, .. , n-1, and 0 at x = 1. One can thus plot u* by ``connecting the dots.'' A specific example will be included in Assignment 4.
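
Here is a rough numerical sketch of this computation in Python/NumPy, assuming the right-hand side f(x) = 1 (our choice, not the assigned example). For the hat-function basis the Gram matrix can be computed by hand, Ajj = 2n and Aj,j+1 = Aj+1,j = -n, and the dj are approximated by a simple midpoint quadrature:

import numpy as np

def fem_linear_splines(f, n):
    """Finite element sketch for -y'' = f, y(0) = y(1) = 0, using the linear
    B-splines w_j(x) = B(n x - j), j = 1, ..., n-1.  With the inner product
    <f, g> = integral of f'g', the Gram matrix is tridiagonal."""
    h = 1.0 / n
    A = 2.0 * n * np.eye(n - 1) - n * (np.eye(n - 1, k=1) + np.eye(n - 1, k=-1))
    d = np.zeros(n - 1)
    for j in range(1, n):
        # d_j = integral of f(x) w_j(x) dx; midpoint rule on the two
        # subintervals where w_j is nonzero (a rough quadrature).
        for (a, b) in [((j - 1) * h, j * h), (j * h, (j + 1) * h)]:
            xm = 0.5 * (a + b)
            wj = max(0.0, 1.0 - abs(n * xm - j))   # hat function B(n x - j)
            d[j - 1] += f(xm) * wj * (b - a)
    c = np.linalg.solve(A, d)
    return c            # c_j is the approximate solution's value at x = j/n

# For f(x) = 1 the exact solution is y(x) = x(1 - x)/2; the nodal values match.
c = fem_linear_splines(lambda x: 1.0, n=10)
x = np.arange(1, 10) / 10.0
print(np.max(np.abs(c - x * (1 - x) / 2)))         # essentially zero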

Infinite sets of orthonormal vectors See §4 in my notes on Function spaces (PS) (PDF).


21 September

Norms

Definition A norm on a vector space V is a function || || : V -> [0,infinity) that satisfies these properties:
  1. Positivity: || u || >= 0, and || u || = 0 implies u = 0.
  2. Positive homogeneity: || cu || = |c| || u ||
  3. Triangle inequality: || u + v || <= || u || + || v ||

Norms from an inner product || u || = (< u , u >)1/2. A norm comes from an inner product if and only if it satisfies the parallelogram law (the inner product can then be recovered from the norm via the polarization identity). For real vector spaces, the parallelogram law is
|| u + v ||2 + || u - v ||2 = 2(|| u ||2 + || v ||2)

Examples

  1. V = C[a,b], || f || = maxa <= x <= b |f(x)|

  2. V = Rn, || x ||p = (|x1|p + ... + |xn|p)1/p, 1 <= p < infinity

  3. V = Rn, || x ||infinity = maxk = 1 ... n |xk|, p = infinity

Linear transformations

Definition A mapping L:V -> W, where V and W are vector spaces, is said to be a linear transformation if it satisfies these properties.
  1. Homogeneity L[cu] = cL[u]
  2. Additivity L[u+v] = L[u] + L[v]

Matrix associated with L If V and W are finite dimensional, and if
B = {v1, ... , vn} and D = {w1, ... , wm}
are bases for V and W, respectively, then the matrix for L is an m×n matrix M with columns
Mk = [ L[vk] ]D.
For example, if L=Rt, which rotates a 2D-vector through an angle t, then, using the basis e1 = i, e2 = j, the columns of Mt are
[ Rt[e1] ] = [ cos(t), sin(t)]T
[ Rt[e2] ] = [ -sin(t), cos(t)]T
so the matrix that represents the rotation is Mt =
cos(t)  -sin(t)
sin(t)   cos(t)
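
The recipe Mk = [ L[vk] ]D is easy to carry out numerically. Here is a small sketch in Python/NumPy (the helper name matrix_of is ours) that rebuilds the rotation matrix above directly from that recipe:

import numpy as np

def matrix_of(L, basis_B, basis_D):
    """Matrix of a linear map L relative to bases B (input) and D (output):
    column k is [L(v_k)]_D.  Each basis is given as a matrix whose columns
    are the basis vectors in standard coordinates."""
    cols = [np.linalg.solve(basis_D, L(vk)) for vk in basis_B.T]
    return np.column_stack(cols)

t = 0.3
def rotate(x):          # the rotation R_t of the plane through the angle t
    return np.array([np.cos(t) * x[0] - np.sin(t) * x[1],
                     np.sin(t) * x[0] + np.cos(t) * x[1]])

E = np.eye(2)           # the standard basis {e1, e2}
print(matrix_of(rotate, E, E))     # [[cos t, -sin t], [sin t, cos t]]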

26 September

Properties and combinations of linear transformations

Action on linear combinations
L[ c1v1 + c2v2 + ... + cnvn ] = c1L[v1] + c2L[v2] + ... + cnL[vn]

Derivation of the matrix associated with L
Let v = c1v1 + c2v2 + ... + cnvn, so [v]B = [c1, ... , cn]T. From above, we see that

[ L[v] ]D = c1[ L[v1] ]D + c2[ L[v2] ]D + ... + cn[ L[vn] ]D = ML[v]B
where
ML = [ [ L[v1] ]D, ... , [ L[vn] ]D ]

Sums K+L is defined by (K+L)[v] = K[v] + L[v].

Scalar multiples cL is defined by (cL)[v] = c(L[v]).

Products If K : V -> U and L : U -> W are linear, then we define LK via LK[v] = L[K[v]]. (This is composition of functions). The transformation defined this way, LK, is linear, and maps V -> W. Note: LK is not in general equal to KL.

Inverses Let L : V -> V be linear. As a function, if L is both one-to-one and onto, then it has an inverse K : V -> V. One can show that K is linear, and LK = KL = I, the identity transformation. We write K = L-1.

Associated matrices

MK + L = MK + ML

McL = c ML

MLK = ML MK

ML-1 = (ML)-1

Polynomials in L : V -> V We define powers of L in the usual way: L2 = LL, L3 = LLL, and so on. A polynomial in L is then the transformation
p(L) = a0I + a1L + ... + amLm
Later on we will encounter the Cayley-Hamilton theorem, which says that if V has dimension n, then there is a degree n (or less) polynomial p for which p(L) is the 0 transformation.
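
A quick numerical illustration of the Cayley-Hamilton theorem (Python/NumPy, with an arbitrary 2×2 matrix):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # an arbitrary matrix, n = 2
n = A.shape[0]

# np.poly(A) gives the coefficients of det(tI - A), highest power first; this
# differs from pA(l) = det(A - lI) only by the factor (-1)^n, so both
# polynomials vanish when A is substituted for the variable.
coeffs = np.poly(A)
p_of_A = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
print(np.max(np.abs(p_of_A)))       # ~ 0, up to round-off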


28 September

Change-of-basis

Change-of-basis and linear transformations Let L : V -> V be linear, and let V have bases
B = {v1, ..., vn} and D = {w1, ..., wn}.
If the matrix of L relative to B is ML, and that relative to D is NL, then
NL = SB -> D ML (SB -> D)-1,
where SB -> D changes B coordinates to D coordinates.

Invariant subspaces

Definition Let L : V -> V be linear, and let U be a subspace of V. We say that U is invariant under L if u being in U implies that L[u] is in U.

Eigenvalue problems We say that a scalar l is an eigenvalue of L : V -> V if there exists a vector v in V, with v not equal to 0, such that L[v] = l v. We let El be the span of all eigenvectors associated with l, and we call El the eigenspace of l. El is an invariant subspace for L.

Characteristic polynomial Let A be an n× n matrix, and define pA(l) = det(A - l I). pA is a polynomial of degree n. The scalar l is an eigenvalue of A if and only if it is a root of pA. This follows from two observations: (1) l is an eigenvalue of A if and only if Ax = l x for some x not equal to 0; this in turn is equivalent to (A - l I)x = 0, so A - l I is singular. (2) A - l I is singular if and only if det(A - l I) = 0.

  • For A, the problem of finding eigenvalues and eigenvectors decouples. One can first find the roots of pA(l), and then solve a linear system to get the eigenvectors.
  • If the scalars are the complex numbers, then the Fundamental Theorem of Algebra implies that pA has at least one root. Consequently, A has at least one eigenvalue.
  • To find the eigenvalues and eigenvectors of a linear transformation L, choose a basis for V and find the matrix of L relative to this basis, ML. The eigenvalues of L are precisely those of ML; the eigenvectors of L have coordinates corresponding to the eigenvectors of ML.
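
A short sketch of the two-step procedure in Python/NumPy (the 2×2 matrix is an arbitrary illustration): find the roots of pA, then check the eigenvector equation.

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Step 1: roots of the characteristic polynomial.  np.poly(A) returns the
# coefficients of det(tI - A), which has the same roots as pA.
eigenvalues = np.roots(np.poly(A))
print(np.sort(eigenvalues.real))          # 2 and 5 for this matrix

# Step 2: for each eigenvalue l, the eigenvectors span the null space of
# A - l I.  numpy's eig performs both steps at once; we use it as a check.
lam, V = np.linalg.eig(A)
for l, v in zip(lam, V.T):
    assert np.allclose(A @ v, l * v)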

3 October

Determinants - A Quick Tour

Permutations Permutations of the integers 1 through n are either even or odd. A permutation is even if it can be achieved by an even number of interchanges (transpositions), and odd if it takes an odd number of them. There are n! permutations. We define the function sgn(p) to be +1 if p is an even permutation, and -1 if p is odd.

Definition of a determinant If A is an n×n matrix, then we define det(A) via
det(A) = SUMp sgn(p) ai1,1 ai2,2 ... ain,n,   p = (i1, i2, ..., in)
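
For illustration, the permutation-sum definition can be coded directly (Python/NumPy; the cost is n!, so this is useful only for very small matrices, and the 3×3 test matrix is just an example):

import numpy as np
from itertools import permutations

def sign(p):
    """Sign of a permutation p of 0, ..., n-1, found by counting inversions."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det_by_definition(A):
    """det(A) = SUM over permutations p of sgn(p) a_{i1,1} a_{i2,2} ... a_{in,n}."""
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[p[k], k] for k in range(n)])
               for p in permutations(range(n)))

A = np.array([[ 2.0, -1.0, 4.0],
              [ 1.0,  1.0, 1.0],
              [-1.0,  3.0, 2.0]])
print(det_by_definition(A), np.linalg.det(A))    # both give 17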

Basic properties of determinants These properties follow immediately from the definition. On the other hand, they characterize the determinant: det is the only function of A that satisfies all three.

  1. Multilinearity. Let A =[a1,a2,...,an]. Then det([a1,a2,...,an]) is a linear function of column ak, if all other columns are held fixed.
  2. Alternating function. Interchanging two columns changes the sign of the determinant.
  3. det(I) = 1, I = identity matrix.

Determinants and matrices
  1. If two columns of A are equal, then det(A)=0.
  2. If a column of A has all 0's, then det(A)=0.
  3. Product rule
    If A and B are n×n matrices, then det(AB)=det(A)det(B).
  4. A is singular if and only if det(A) = 0.
  5. det(A-1) = (det(A))-1

Cramer's rule If A is invertible, and y = Ax, then
x1 = det([y,a2,...,an])/det(A).
x2 = det([a1,y,a3,..., an])/det(A).

...

xn = det([a1,a2,..., an-1, y])/det(A).

Cramer's rule for the inverse of A The cofactor of the (j,k)-entry in A is
Cj,k = det([a1,..,ak-1,ej, ak+1,...,an]).
The adjugate or classical adjoint of A is adj(A) = CT, and
A-1 = adj(A)/det(A)
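
A small sketch of Cramer's rule in code (Python/NumPy; fine for tiny illustrative systems, far too expensive for serious computation):

import numpy as np

def cramer_solve(A, y):
    """Solve A x = y by Cramer's rule: x_k = det(A with column k replaced
    by y) / det(A)."""
    n = A.shape[0]
    det_A = np.linalg.det(A)
    x = np.empty(n)
    for k in range(n):
        Ak = A.copy()
        Ak[:, k] = y               # replace column k by the right-hand side
        x[k] = np.linalg.det(Ak) / det_A
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
y = np.array([3.0, 5.0])
assert np.allclose(cramer_solve(A, y), np.linalg.solve(A, y))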

Characteristic polynomial
Structure pA(t) = (-1)n tn + (-1)n-1 tn-1 trace(A) + ... + det(A)
Basis independence If B = S-1AS, then pB(t) = pA(t).
Cayley-Hamilton Theorem pA(A) = 0.

5 October

Diagonalization

Definition A linear transformation L : V -> V, where dim(V) = n, is diagonalizable if there is a basis for V relative to which the matrix for L is diagonal.

Matrix Example   The matrix A =
2 -3 1
1 -2 1
1 -3 2
has eigenvalues 0, 1 (repeated twice), and corresponding eigenvectors
{(1,1,1)T}, and {(-1,0,1)T,(3,1,0)T}.
If we let S be the matrix with these eigenvectors as columns - that is, S =
1 -1  3
1  0  1
1  1  0
then S-1 A S =
0  0  0
0  1  0
0  0  1
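
A quick numerical check of this example (Python/NumPy):

import numpy as np

A = np.array([[2.0, -3.0, 1.0],
              [1.0, -2.0, 1.0],
              [1.0, -3.0, 2.0]])
S = np.array([[1.0, -1.0, 3.0],     # columns are the eigenvectors above
              [1.0,  0.0, 1.0],
              [1.0,  1.0, 0.0]])
D = np.linalg.solve(S, A @ S)       # S^{-1} A S
print(np.round(D, 10))              # diag(0, 1, 1)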

Lemma   Let L : V -> V have r distinct eigenvalues, and let v1, v2, ..., vr be eigenvectors corresponding to each of the eigenvalues. Then the set of eigenvectors {v1, v2, ..., vr} is linearly independent.

Theorem   Let V be finite dimensional, with n = dim(V). If L : V -> V has n distinct eigenvalues, then L is diagonalizable.

Transformation Example   Let L : P2 -> P2 be given by L[p] = ((1-x2)p')'. L has eigenvalues 0, -2, and -6. Since these are distinct we know that L is diagonalizable. The eigenvectors for L corresponding to 0, -2, and -6 are 1, x, and 3x2-1. These form a basis D = {1, x, 3x2-1} for P2. The matrix for L relative to this basis is the diagonal matrix ML =
0   0   0
0  -2   0
0   0  -6

Non-diagonalizable transformation   Not all linear transformations are diagonalizable. Consider the shear transformation T : R2 -> R2 defined by
T[x] =x + 2x2e1.
The matrix for T is MT =
1 2
0 1
This is not diagonalizable. We can see this directly by noting that it has only 1 as an eigenvalue, and for T to be diagonalizable MT would have to be similar to the identity, I. One can also see this by noting that, up to multiples, the only eigenvector is e1, so there is no basis of eigenvectors for R2.

10 October

Triangular forms of matrices

Basic upper triangular form   Every matrix A has a basis relative to which it is upper triangular. See Assignment 6, Problem 3.

Block upper triangular form   Every matrix A has a basis relative to which it is in block triangular form. This means that we can find an invertible matrix S such that S-1AS =
T1,1 0 0 ... 0
0 T2,2 0 ... 0
... ... ... ... ...
0 0 0 ...Tr,r
Each diagonal block Tk,k is upper triangular, with the diagonal entries all being an eigenvalue zk repeated as many times as it is a root of the characteristic polynomial. For example, if z7 is repeated four times, then T7,7 =
z7 * * *
0 z7 * *
0 0 z7 *
0 0 0 z7

Jordan normal form   Every matrix A has a basis relative to which the blocks Tk,k are Jordan blocks, Jm(z). This is an m×m matrix with z's down the diagonal, 1's down the superdiagonal, and 0's elsewhere. For example, if m = 6 and z = 3, then J6(3) =
3 1 0 0 0 0
0 3 1 0 0 0
0 0 3 1 0 0
0 0 0 3 1 0
0 0 0 0 3 1
0 0 0 0 0 3
Two matrices having the same Jordan normal form, apart from ordering of the blocks along the diagonal, are similar. An m×m matrix A is similar to Jm(z) if and only if there is a basis {f1, ..., fm} satisfying

Af1 = zf1
Af2 = zf2 + f1
...

Afm = zfm + fm-1

Example   Consider the matrix A =
2  1 -1
0  2  3
0  0  2
We begin with the eigenvector, f1 = (1,0,0)T. Solving (A - 2I)f2 = f1 gives f2 = (0,1,0)T. Finally, solving (A - 2I)f3 = f2 gives f3 = (0,1/3,1/3)T. Thus, S-1AS = J3(2), where S = [f1, f2, f3].
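
A quick numerical check of the Jordan chain just computed (Python/NumPy):

import numpy as np

A = np.array([[2.0, 1.0, -1.0],
              [0.0, 2.0,  3.0],
              [0.0, 0.0,  2.0]])
f1 = np.array([1.0, 0.0, 0.0])
f2 = np.array([0.0, 1.0, 0.0])
f3 = np.array([0.0, 1.0 / 3.0, 1.0 / 3.0])
S = np.column_stack([f1, f2, f3])
print(np.round(np.linalg.solve(S, A @ S), 10))   # [[2,1,0],[0,2,1],[0,0,2]]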

Selfadjoint matrices

Adjoints   The adjoint of a linear transformation L : V -> V, where V is an inner product space, is the unique linear transformation L' that satisfies
< L'[u],v > = < u, L[v] > .
For a real m×n matrix A, the adjoint is the transpose AT. If the matrix is complex, then the adjoint of A is AH, the conjugate transpose. A matrix is selfadjoint or Hermitian if it is equal to its own adjoint. Thus, if A is real, A = AT; i.e., A is symmetric.

Properties of selfadjoint matrices
  1. The eigenvalues of a selfadjoint matrix are real.
  2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  3. Every selfadjoint matrix is diagonalizable.
  4. Every selfadjoint matrix has an orthonormal basis relative to which it is diagonal. Equivalently, there is an orthogonal matrix S such that STA S is a diagonal matrix. An n×n matrix S is said to be orthogonal if ST = S-1; this amounts to saying that the columns form an orthonormal basis for Rn.

12 October

Singular value decomposition (SVD)

Theorem   Every (real) m×n matrix A can be written as a product A = USVT. Here, U and V are orthogonal matrices, with U being m×m and V being n×n. S is m×n, and has the form S =
s1 0 0 ... 0 ... 0
0 s2 0 ... 0 ... 0
... ... ... ... ... ... ...
0 0 0 ...sr ... 0
0 0 0 ... 0 ... 0
... ... ... ... ... ... ...
0 0 0 ... 0 ... 0
The diagonal entries s1 >= s2 >= ... >= sr are positive and ordered from greatest to least; the remaining diagonal entries are 0. The number r is the rank of A. The sk's are called the singular values of A.

Applications   Gilbert Strang has called this theorem the "Fundamental Theorem of Linear Algebra." The SVD contains very explicit information concerning everything one would want to know about a matrix.

  • Condition number   The condition number of a square, invertible matrix A is defined by
    cond(A) = s1/sn.
    It measures how many significant digits are preserved when one tries to solve Ax = b. For example, if b is known to 6 digits and cond(A) = 103, then x is known to 6 -3 = 3 digits.

  • Numerical rank   The rank of A is r, the number of (positive) singular values. The numerical rank of A is
    rank(A,t) = # of singular values greater than a tolerance t.
    Again, this is useful in working with problems having finite precision.

  • Least squares   The solution to finding a minimum of || Ax - b || is easily done with the SVD. First we rewrite the problem using the SVD for A.
    || Ax - b || = || USVTx - UUTb || = || U(SVTx - UTb) ||
    = || Sz - c ||,   z = VTx,   c = UTb   (since U is orthogonal, || Uy || = || y ||)

    Then we note that
    || Sz - c ||2 = SUMk = 1...r (skzk - ck )2 + SUMk = r+1...m ck2.

    Choosing zk = ck/sk for k = 1 ... r and zk = 0 for k = r+1 ... n not only solves the problem, but also gives the solution x = Vz with smallest length || x || = || z ||.
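
Here is a sketch of this recipe in Python/NumPy (the function name svd_least_squares and the small test problem are ours; numpy's lstsq is used only as a cross-check):

import numpy as np

def svd_least_squares(A, b, tol=1e-12):
    """Minimum-norm least-squares solution of A x ~ b via the SVD:
    z_k = c_k / s_k for the singular values above tol, z_k = 0 otherwise,
    and then x = V z."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    c = U.T @ b
    r = np.sum(s > tol)                 # numerical rank
    z = np.zeros(A.shape[1])
    z[:r] = c[:r] / s[:r]
    return Vt.T @ z

# A rank-deficient illustration: both solutions should be [1, 1].
A = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
b = np.array([1.0, 3.0, 1.0])
print(svd_least_squares(A, b), np.linalg.lstsq(A, b, rcond=None)[0])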

Finding the SVD of a matrix   The matrix S above is just the matrix of A relative to new bases for both the input and output spaces.

  • The input space basis   The matrix ATA is real and symmetric, so there is an orthonormal basis of eigenvectors relative to which it is diagonal. Let the eigenvalues of ATA be z1, ..., zn, and the corresponding basis of eigenvectors be {v1, ..., vn}. The eigenvalues are nonnegative, as this calculation shows:

    ATA vk = zk vk
    vkT AT A vk = zk vkT vk
    || A vk ||2 = zk || vk ||2
    || A vk ||2 = zk   (|| vk || =1 )

    This calculation also shows that a vector v is in the null space of A if and only if it is an eigenvector corresponding to the eigenvalue 0. If r = rank(A), then "rank + nullity = # of columns" tells us that nullity(A) = n - r. This means that there are r eigenvectors corresponding to the positive eigenvalues. List these as z1 >= z2 >= ... >= zr > 0. Our input basis is now chosen as {v1, ..., vr, vr+1, ..., vn }. The numbering is the same as that for the eigenvalues. We now define the matrix V via
    V = [ v1 ... vr vr+1 ... vn ].

  • The output space basis   For k = 1, ..., r, let
    uk = A vk / || A vk ||
    uk = A vk / zk1/2
    We can also write this as the following equation:
    A vk = zk1/2 uk .
    The uk's are orthonormal, for k = 1, ..., r. Again, we see this from these equations.
    ujT uk = vjT AT A vk / zj1/2 zk1/2
    ujT uk = zk vjT vk / zj1/2 zk1/2
    The orthonormality of the v's implies that the right side above is 0 unless j = k, in which case it is 1. Thus, the u's are orthonormal. Fill this set out with m - r vectors to form an orthonormal basis for the output space, Rm. This gives us the basis {u1, .., ur, ur+1, .., um}. As before, we define the m×m orthogonal matrix
    U = [u1, .., ur, ur+1, .., um].

  • The matrix of A relative to the new bases   We let S be MA. We compute it via the formula for the matrix of a linear transformation.
    S = [ [Av1]U [Av2]U ... [Avr]U [Avr+1]U ... [Avn]U ]
    The v's with k = r+1,...,n, are all in the null space of A. Thus the last n - r vectors are 0, and so are their corresponding columns relative to the basis of u's. The other vectors we get from the definition of the u's for k = 1, ..., r. The end result is this.

    S = [ [z11/2 u1]U [z21/2 u2]U ... [zr1/2 ur]U [ 0 ]U ... [ 0 ]U ]

    S = [ z11/2 [u1]U ... zr1/2 [ur]U   0 ... 0 ]

    S = [ z11/2 e1 ... zr1/2 er 0 ... 0 ].

    If we let sk = zk1/2 for k = 1, ..., r, we get the same S as the one given in the statement of the theorem. These sk's are the singular values of A. The matrix S is related to A via multiplication by change-of-basis matrices. The matrix U changes from new output to old output bases, and V changes from new input to old input bases. Since VT = V-1, we have that VT changes from old input to new input bases. In the end, this gives us A =USVT

Example Consider the matrix A =
 2  -2
 1   1
-2   2
Here, ATA =
 9  -7
-7   9
The eigenvalues of this matrix are z1 = 16 and z2 = 2. The singular values are s1 = 4 and s2 = 21/2. We can immediately write out what S is. We have S =
 4 0
 0 21/2
 0 0
The eigenvector corresponding to 16 is v1 = 2-1/2(1,-1)T, and the one corresponding to 2 is v2 = 2-1/2(1,1)T. Hence, we see that V =
 2-1/2   2-1/2
-2-1/2   2-1/2
Next, we find the u's.

u1 = A v1 / z11/2
u1 = 2-1/2 (4, 0, -4)T/4
u1 =( 2-1/2 , 0, - 2-1/2 )T.

A similar calculation gives us
u2 =( 0, 1, 0 )T.

We now have to add to these a "fill" vector
u3 =( 2-1/2 , 0, 2-1/2 )T
to complete the new output basis. This finally yields U =

 2-1/2 0  2-1/2
 0    1  0
-2-1/2 0  2-1/2
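
One can check this example against a library SVD (Python/NumPy). Note that U and V are only determined up to sign choices in their columns, so numpy's columns may differ from those above by factors of -1:

import numpy as np

A = np.array([[ 2.0, -2.0],
              [ 1.0,  1.0],
              [-2.0,  2.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(s)                                 # [4, sqrt(2)]
S = np.zeros_like(A)
S[:2, :2] = np.diag(s)                   # the 3x2 matrix S from the example
assert np.allclose(U @ S @ Vt, A)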

17 October

LU Decomposition

Example Let A be an n×n matrix. We want to decompose A into the product of a lower triangular matrix L, which has ones for diagonal elements, and an upper triangular matrix U. Consider the matrix A =
 2 -1 4
  1  1 1
-1  3 2
The 2 in the first row and first column is the pivot. We perform two row operations to zero out the rest of the entries in the first column.
R2 = R2 -(1/2)R1
R3 = R3 -(-1/2)R1
These row operations put the matrix into the form below.
2 -1 4
0 3/2 -1
0 5/2 4
We need one last row operation,
R3 = R3 -(2/3)(5/2)R2
     = R3 -(5/3)R2,
to finish finding the matrix U; carrying it out, we get U =
2 -1 4
0 3/2 -1
0 0 17/3
The matrix L is lower triangular and its diagonal elements are 1's. We only need to know the entries Lk,j with k > j. These turn out to be the coefficients involved in the row operations:
Rk = Rk - Lk,jRj,   k > j.
For our example, L2,1 = 1/2; L3,1 = -1/2; and L3,2 = 5/3. Hence, L =
  1  0 0
 1/2  1 0
-1/2 5/3 1
One may verify that A = LU. Note that we can read off the determinant of A. The product rule states that det(A) = det(L)det(U) = (1)(2)(3/2)(17/3) = 17. In general, the determinant of A = LU is simply the product of the pivots, which are the diagonal entries in U.
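
A minimal sketch of this bookkeeping in code (Python/NumPy, with no pivoting, so it assumes the pivots are nonzero, as in this example):

import numpy as np

def lu_no_pivoting(A):
    """Doolittle LU factorization without pivoting: returns unit lower
    triangular L and upper triangular U with A = L U.  L[k, j] records the
    multiplier used in the row operation R_k = R_k - L[k, j] R_j."""
    n = A.shape[0]
    U = A.astype(float)
    L = np.eye(n)
    for j in range(n - 1):                 # eliminate below the pivot U[j, j]
        for k in range(j + 1, n):
            L[k, j] = U[k, j] / U[j, j]
            U[k, :] -= L[k, j] * U[j, :]
    return L, U

A = np.array([[ 2.0, -1.0, 4.0],
              [ 1.0,  1.0, 1.0],
              [-1.0,  3.0, 2.0]])
L, U = lu_no_pivoting(A)
assert np.allclose(L @ U, A)
print(np.prod(np.diag(U)))                 # det(A) = product of the pivots = 17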

Review for the Midterm Exam

General information The exam will have five to seven questions, some with multiple parts. The test covers everything we've discussed, except the LU decomposition. You will be asked to state basic definitions, to do simple derivations or proofs (some choice will be given here), and to do problems similar to ones given for homework.

Subspaces Be able to determine whether or not a nonempty subset of a vector space is a subspace.

Change of basis Be able to find the change-of-basis matrix, and be able to change coordinates in vectors and matrices.

Inner products and norms Be able to show Schwarz's inequality and the triangle inequality. Know the terms below and be able to carry out the procedures listed.
  • Orthogonal and orthonormal sets; orthonormal bases; orthogonal matrices
  • Gram-Schmidt procedure
  • QR factorization of a matrix
  • Least squares. Be able to derive the normal equations. Be able to solve the various LS problems that we've discussed.
  • p-norms

Linear transformations Be able to find the matrix of a linear transformation, and be able to find this matrix relative to different bases.

Eigenvalue problems. Be able to find the eigenvalues and eigenvectors for a matrix (or linear transformation).
  • Characteristic polynomial.
  • Cayley-Hamilton Theorem.
  • Diagonalization. Be able to diagonalize a matrix or to explain why it cannot be done.
  • Adjoint of a linear transformation. Diagonalization of selfadjoint matrices by orthogonal matrices. Be able to show that the eigenvalues of a selfadjoint matrix are real, and that eigenvectors corresponding to distinct eigenvalues are orthogonal.
  • Triangular forms. Upper triangular, block upper triangular, and Jordan normal form. Know the algorithm for triangularizing a matrix.

Singular Value Decomposition. Know what the SVD is, what singular values are, what information the SVD contains, and be able to find it for a simple example.

19 October

Midterm Test

24 October

Multivariate Calculus

Open sets Our functions will be defined on sets that have this simple property: every point in the set can be placed at the center of an n-dimensional ball that is contained entirely in the set. We need this property to discuss limits, derivatives, etc.

Limits We say that limQ -> P f(Q) = L if for every epsilon > 0 there is a delta > 0 such that
|| f(Q) - L || < epsilon   whenever   0 < || P - Q || < delta. In class we briefly mentioned the reason for using this definition rather than a ``dynamical'' one involving ``P approaching Q''. See The Historical Development of the Calculus, C. H. Edwards, Jr., Springer-Verlag, New York, 1979.

Continuity A function f is continuous at P if limQ -> P f(Q) exists and equals f(P).

Partial derivatives The partial derivative of f : Rn -> Rm with respect to xk is
Dkf(x) = limh -> 0 ( f(x1,...,xk+h,...,xn) - f(x1,...,xk,...,xn))/h

Jacobian derivative If f : Rn -> Rm, so that f(x) = (f1(x), ... , fm(x) )T, then we define the Jacobian derivative of f to be the matrix f´(x) with entries 
f´(x)j,k := Dkfj(x)

Linear approximation If f : Rn -> Rm, then the linear approximation to f is given by the first two terms on the right below.
f(x+h) = f(x) + f´(x)h + ||h||eps(h), where eps(h) -> 0 as ||h|| -> 0

Chain rule If g : Rn -> Rp and f : Rp -> Rm, then we let h(x) = f(g(x)). The chain rule states that 
h´(x) = f´(g(x))g´(x)  
Note that the Jacobian derivatives involved are matrices. Order matters! Do not reverse the order in which they are multiplied.

Inverse function theorem Let g : Rn -> Rn. Thus, g takes n variables to n variables, say u = g(w). When can we solve for w in terms of u? That is, when can we find a function h for which w = h(u)? We have to be careful here. We are only looking for an h that works in an open set about u. With that in mind, plus an additional caveat that we are assuming that the functions involved are continuously differentiable, we have that such an h exists when the Jacobian derivative g´(w) is invertible. Moreover, we can calculate h´. It is just given by this:
h´(u) = g´(w)-1, where u = g(w) and w = h(u).
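
A numerical illustration in Python/NumPy (the polar-coordinate map is our own example, not one from class): compare a finite-difference Jacobian of the inverse map h with the inverse of the Jacobian of g.

import numpy as np

def g(w):                                   # u = g(w): polar to cartesian
    r, theta = w
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def h(u):                                   # w = h(u): the local inverse
    x, y = u
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

def jacobian(f, x, eps=1e-6):
    """Forward-difference approximation to the Jacobian derivative f'(x)."""
    f0 = f(x)
    J = np.zeros((len(f0), len(x)))
    for k in range(len(x)):
        xk = x.copy()
        xk[k] += eps
        J[:, k] = (f(xk) - f0) / eps
    return J

w = np.array([2.0, 0.7])
u = g(w)
print(jacobian(h, u))                       # h'(u)
print(np.linalg.inv(jacobian(g, w)))        # g'(w)^{-1}; agrees to several digits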

26 October

Tensors

Invariance Physical laws should be formulated in ways that are independent of any coordinates used; that is, the laws should be stated in such a way that they are invariant under transformation of coordinates. For scalar quantities this has a simple meaning. Suppose we know the temperature on a surface, T(P), in u-coordinates, so that T=f(u). Then in another coordinate system, u = g(w), the temperature is given by T=f(g(w)). The law of transformation is simply composition of functions (substitution).

Vectors Here is an example of a vector quantity. Suppose that a surface is specified by giving its position x in R3 in terms of parameters u1 and u2; that is, x = x(u1, u2). If a body travels on a curve passing through x at a time t, the velocity vector is
v = a1 f1 + a2 f2
where
aj = Dtuj and fj = Dujx = partial of x along uj, j=1,2.
If we now change parametrization from u1 and u2 to w1 and w2, where
u = H(w) and w = K(u)
are the functions relating the two sets of parameters, then we can represent the same vector v as
v = b1 g1 + b2 g2
where
bj = Dtwj and gj = Dwjx = partial of x along wj, j=1,2.
We want to see how various quantities transform under this transformation. Using the chain rule, one can show that
Dwjx = Du1x Dwju1 + Du2x Dwju2.
Equivalently,
gj = H´1,jf1 + H´2,j f2,
where H´ is the Jacobian matrix of H. Quantities that transform with the same coefficients as the ones above are called covariant. To see how the components change, we first note that we are dealing with a change-of-basis problem. Let
F={f1, f2} and G={g1, g2}.
Then, the two coordinate vectors for v are
[v]F = [a1 a2]T and [v]G = [b1 b2]T,
and they transform as follows:
[v]G = (H´)-1[v]F = K´[v]F.
Quantities with components that transform this way are called contravariant.

Dual spaces Let V be an n-dimensional, real vector space with basis F = {f1,..., fn}. Recall that the dual space of V is the set of all linear functionals v* : V -> R. Corresponding to the basis F for V, we have a dual basis F* = {f*1, ..., f*n} that is defined by the property f*j[fk] = deltaj,k.

31 October

Tensors (continued)

Transformation laws & index notation Recall that we have two types of transformation laws: covariant and contravariant. Objects that transform covariantly have subscripts. Basis vectors for V transform covariantly, so they are denoted by subscripts, {f1,..., fn}. Dual basis vectors transform contravariantly. Instead of using {f*1,...,f*n} for the dual basis, we will use superscripts {f1,...,f n}. The components for a vector transform contravariantly; thus we use superscripts when working with them. This means that we write a vector v as
v = a1 f1 +... + an fn.
The components of a dual vector transform covariantly, and so we use subscripts for them.
v* = a1 f1 +... + an f n.
We summarize the transformation laws below.

Space Basis Components
V covariant contravariant
V* contravariant covariant


2 November

Examples of Tensors

Stress tensor  By considering the forces on a small tetrahedron, we showed that fn, the contact force per unit area with normal direction n, was a linear function of the components of n. The linear transformation that takes n to fn can be represented by a matrix t jk relative to any given basis E = {e1, e2, e3} for V. Here, we think of the row index as j and the column index as k. The placement of the indices shows how the matrix transforms under a change of coordinates. The quantity t jk is an order 2 mixed type tensor, the stress tensor.

Strain (deformation) tensor  Suppose that a body is deformed so that a point A with coordinates (x1, x2, x3) is moved to a point A´ with coordinates (y1, y2, y3), where
y1 = x1 + u1(x), y2 = x2 + u2(x), y3 = x3 + u3(x).
Consider a point B near to A and having coordinates (x1+ dx1, x2+dx2, x3+dx3). Under the deformation B is moved to B´, which, to first order, has coordinates (y1+dy1, y2+dy2, y3+dy3), where
dyj = dxj+(Dx1uj)dx1 + (Dx2uj)dx2 + (Dx3uj)dx3
The strain tensor measures how the deformation changes the distance between the nearby points A and B. If we are working in cartesian coordinates, then the square of the distance from A´ to B´ is (dy1)2 + (dy2)2 + (dy3)2, that from A to B is (dx1)2 + (dx2)2 + (dx3)2, and the difference between the two is

(dy1)2 + (dy2)2 + (dy3)2 - (dx1)2 -(dx2)2 - (dx3)2 = SUMj,k 2sj,kdxjdxk

where sj,k = ½(Dxjuk + Dxkuj + SUMi DxjuiDxkui) is the strain tensor. (In the linear theory of elasticity, the products are neglected.) The strain tensor is also second order, but it is purely covariant.

Hooke's law In linear elasticity, stress and strain are related via a version of Hooke's law:
t jk = SUMlm c jk,l,m sl,m.
The tensor c jk,l,m is of mixed type and has order 4.

Metric tensor  We had to use cartesian coordinates to find the strain tensor, because we did not have available the distance, or arclength, between nearby points in general coordinates. If we change from cartesian coordinates x to another set w, then the square of the distance between x and x + dx is (ds)2 = (dx1)2 + (dx2)2 + (dx3)2 in cartesian coordinates. Relative to w it becomes

(ds)2 = SUMj,k gj,k dwjdwk.

The tensor gj,k is called the metric tensor. It is purely covariant, and has order 2.


7 November

Change of variables in multiple integrals

Metric tensors and Jacobian determinants  Suppose that we have written the arclength (ds)2 in u-coordinates, so that

(ds)2 = SUMj,k (gu)j,k dujduk.

In matrix notation, we can write this as (ds)2 = duTgu du. If we now go from u-coordinates to w-coordinates, du = J dw, where J = u´(w) is the Jacobian of the transformation u = u(w). Since the arclength is an invariant, we have

(ds)2 = dwTJTgu J dw  =   dwTgw dw.
Consequently, we have that gw = JTgu J. This also can be derived via the covariance of the metric tensor. The matrices involved are all square and n×n. Taking the determinants for both sides, and then taking square roots yields

det(gw)1/2 = |det(J)| det(gu)1/2,   J = u´(w).

The quantity det(J) is called the Jacobian determinant, and is often written as

∂(u1, ..., un)/∂(w1, ..., wn).

Jacobi's Theorem   Consider a change of coordinates x = x(u), where the regions Rx and Ru correspond to each other under it. Then,

∫Rx f(x) dx1...dxn = ∫Ru f(x(u)) |det(x´(u))| du1...dun.
Invariant volume   We want to look at the volume element det(gu )1/2du1...dun when we make the change of coordinates u = u(w). First, we have

det(gu)1/2 du1...dun = det(gu)1/2 |det(u´(w))| dw1...dwn.

Recall that we also have det(gw)1/2 = |det(J)| det(gu)1/2,   J = u´(w). Substituting this into the last equation then gives us

det(gu )1/2 du1...dun = det(gw )1/2 dw1...dwn,

which shows that the combination det(gu )1/2 du1...dun is invariant under coordinate transformations. This is often called the invariant volume element.
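
A small numerical check of the rule gw = JTgu J in Python/NumPy (the cartesian-to-polar example is ours): with gu = I it should reproduce the familiar polar metric diag(1, r2), and det(gw)1/2 should equal |det(J)|.

import numpy as np

r, theta = 2.0, 0.7                  # an arbitrary point, away from the origin
# J = du/dw for u = (x, y) = (r cos(theta), r sin(theta)), w = (r, theta).
J = np.array([[np.cos(theta), -r * np.sin(theta)],
              [np.sin(theta),  r * np.cos(theta)]])
g_u = np.eye(2)                      # the metric in cartesian coordinates
g_w = J.T @ g_u @ J
print(np.round(g_w, 12))             # [[1, 0], [0, r^2]]
print(np.sqrt(np.linalg.det(g_w)), abs(np.linalg.det(J)))   # both equal r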


9 November

Surface integrals

Flux integrals  Consider the steady state velocity field V(x) of a fluid. We want to calculate the amount of fluid crossing a surface parametrized by x = x(u1, u2). Let f1 and f2 be partials of x with respect to the parameters u1 and u2. We consider an element of surface area, shown below as the base of the parallelepiped. Our first step is to calculate the fluid crossing this surface element. In time t to t+dt, the volume of fluid crossing the base of the parallelepiped equals the volume of the parallelepiped, (Vdt)·f1×f2 du1du2.

[Figure: parallelepiped with edges V dt, f1 du1, and f2 du2; its base is the surface element.]

The mass of the fluid crossing the base in time t to t+dt is then density×volume, or
µ(Vdt)·f1×f2 du1du2,   where µ is the density.
Thus the mass per unit time crossing the base is F·N du1du2, where F = µV, and N = f1×f2 is the standard normal. Recall that the area of the surface element is dS = |N|du1du2. Consequently the mass per unit time crossing the base is F·n dS, where n is the unit normal. Integrating over the whole surface then yields

∫∫S F·n dS.

This surface integral is called the flux of the vector field F.


14 November

Three Theorems from Vector Analysis

Curves and Green's Theorem  To state Green's theorem, we need to discuss simple, closed curves. These are closed curves, like circles, but they do not intersect themselves. Rectangles, triangles, circles, and ellipses are simple closed curves; figure eights are not. A simple closed curve divides the plane into two nonoverlapping regions, one interior and the other exterior, and it forms the boundary of both regions. We will consider simple closed curves that are piecewise smooth, which just means that we are allowing a finite number of corners. We also say that a simple closed curve is positively oriented if it is traversed in the counterclockwise direction. Here is the statement of Green's Theorem:
Green's Theorem Let C be a piecewise smooth simple closed curve that is the boundary of its interior region R. If F(x,y) = A(x,y)i + B(x,y)j is a vector-valued function that is continuously differentiable on and in C, then

∫C A dx + B dy = ∫∫R ( DxB - DyA ) dx dy.
The curl and Stokes' Theorem  Let S be a surface in 3D bounded by a simple closed curve C. We will not be absolutely precise here. One should think of S as a butterfly net, with C as its rim. Such a surface is orientable, and we always have a consistent piecewise continuous unit normal n defined on S. We say that C is positively oriented if, in traversing C with the surface on our left, we are standing in the direction of n.

To state this theorem, we also need to define the curl of a vector field
F(x)=A(x,y,z)i + B(x,y,z)j +C(x,y,z)k.
We will assume that F has continuous partial derivatives. The curl is then defined by

curl F = ( DyC - DzB ) i + ( DzA - DxC ) j + ( DxB - DyA ) k.
There is an important connection between the Jacobian derivative of a vector field and the curl of that vector field. Namely, the antisymmetric part of the Jacobian derivative has components of the curl for entries:

F´ - (F´)T =
0                DyA - DxB        DzA - DxC
DxB - DyA        0                DzB - DyC
DxC - DzA        DyC - DzB        0
This is important because it gives us the following formula. For any two vectors b and c in R3,
cT( F´ - (F´)T )b = curl F · b × c.
This is useful in proving Stokes' theorem, which we now state.

Stokes' Theorem  Let S be an orientable surface bounded by a simple closed positively oriented curve C. If F is a continuously differentiable vector-valued function defined in a region containing S, then

∫C F · dr = ∫∫S ( curl F ) · n dS.
The divergence of a vector field and the Divergence Theorem  The divergence of a vector field F(x)=A(x,y,z)i + B(x,y,z)j +C(x,y,z)k is defined by

div F = DxA + DyB + DzC.
Like the curl, the divergence of F is connected to the Jacobian derivative F´. Namely, div F = trace(F´), which is the sum of the diagonal entries in F´. We can now state the divergence theorem.

Divergence Theorem  Let V be a region in 3D bounded by a closed, piecewise smooth, orientable surface S; let the outward-drawn normal be n. Then,

∫∫∫V div F dV = ∫∫S F · n dS.

16 November

Three Theorems from Vector Analysis (cont.)

Verifications  We went over several examples verifying the Divergence Theorem and Stokes' Theorem.

Equation of continuity   Let S be a closed surface containing a fluid having velocity v(x,t) and density µ. Assume there are no sources or sinks in S. Let F = µv. The total amount of mass per unit time crossing S in the direction of the outward normal is just the flux:

∫∫S F · n dS.
Using the Divergence Theorem, we can write this in terms of a volume integral over the region enclosed by S:

∫∫S F · n dS = ∫∫∫ div(µv) dV.
Since there are no sources or sinks in S, any mass entering or leaving the volume enclosed by S must pass through S. Thus, the rate at which the mass of fluid inside of S changes in time is the negative of the flux:

d/dt ∫∫∫ µ dV = - ∫∫S F · n dS = - ∫∫∫ div(µv) dV.
Interchanging the time derivative and the triple integral and bringing everything under the same integral, we have that

∫∫∫ ( Dtµ + div(µv) ) dV = 0.
This equation holds for all regions in which the fluid is source free. It follows that the integrand must be 0; otherwise we could pick a small cube on which the integrand was positive (or negative), and the integral over that cube would then be positive (or negative). Hence, we arrive at the following equation, known as the equation of continuity:

Dtµ + div(µv) = 0.

21 November

PDEs of mathematical physics

Derivation of the Heat Equation  See § VI.2 in Zachmanoglou and Thoe (Z/T). The steady state version of this equation is Laplace's equation.

Separation of variables

Dirichlet problem for Laplace's equation in the disk  See § VII.2, VII.7 in Z/T.

28 November

Fourier series

Expansions in Fourier series  See § VII.8 in Z/T.
Solution to Dirichlet problem  See § VII.7 and VII.8 in Z/T.

Vibrating string with ends clamped

Solution via separation of variables See § VIII.8 in Z/T.
Sine series See § VII.8 and § VIII.8 in Z/T.

30 November

Vibrations in Finite Regions

Separation of variables See VIII.10 in Z/T.
Eigenfunction expansions See VIII.10, Theorem 10.1.
Vibrations in a circular membrane See VIII.10, Example 10.3.

5 December

D'Alembert's Solution to the Wave Equation

One dimensional case  Change variables in uxx- utt=0 from x and t to p=x+t and q=x-t. The result is upq=0. Integrating, we have u=F(p)+G(q), or u(x,t)=F(x+t)+G(x-t). See § VIII.2 for the rest of the solution.

Three dimensional radial case  See §VIII.1.

Divergence, Laplacian, and Curl in General Coordinates and Spherical Coordinates

Divergence

Laplacian

Curl

Review for the Final Exam

General information The exam will have six to eight questions, some with multiple parts. The test will cover the material discussed after the midterm exam; there will be no direct questions on material prior to the midterm. You will be asked to state basic definitions and theorems, to do simple derivations or verifications, and to do problems similar to ones given for homework. The specific topics covered are as follows:

Multivariate Calculus
  • Jacobian derivative
  • Chain rule
  • Inverse function theorem

Tensors

Change of variables in multiple integrals
  • Jacobi's Theorem.

Surface integrals  See my notes on surfaces.
  • Area element
  • Normals - unit and standard
  • Flux and density integrals
  • Parametrizations of simple surfaces

Green's Theorem, Divergence Theorem, Stokes' Theorem

Partial differential equations.
  • Initial value and boundary value problems. See Z/T VI.2.
  • Separation of variables.
    • Laplace's equation in the disk. See § VII.2, § VII.7.
    • Fourier series. See § VII.8.
    • Vibrating string. See § VIII.8.
    • Vibrating drumhead. See § VIII.10.
    • Eigenvalue problems and eigenfunction expansions. See § VIII.10
  • D'Alembert's solution to the one-dimensional wave equation.