Math 603-601 - Fall 2002

Summary

Dates
 SEP OCT NOV DEC 3 5 10 12 17 19 24 26 1 3 8 10 15 17 22 24 29 31 5 7 12 14 19 21 26 28 3 5 10

3 September

Vector spaces
Definition - Vector space. A vector space is a set V together with two operations, + and · . If u, v are in V, then u + v is in V; if c is a scalar, then c·v is in V. The operations satisfy the following rules.

 Addition Scalar multiplication u + (v + w) = (u + v) + w a·(b·u) = (ab)·u Identity: u + 0 = 0 + u = u (a + b)·u = a· u + b·u Inverse: u + (-u) = (-u) + u = 0 a·(u + v) = a·u + a·v u + v = v + u 1·u = u

Subspaces
Definition - Subspace. A nonempty subset U of V is a subspace if, under + and · from V, U is a vector space in its own right.
Theorem. U is a subspace of V if and only if these hold:
1. 0 is in U.
2. U is closed under + .
3. U is closed under · .
Example: U={(x1, x2, x3) | 2x1 + 3x2 - x3 = 0} is a subspace of R3. On the other hand, W={(x1, x2, x3) | 2x1 +3x2 - x3 = 1} is not a subspace of R3, because 0 is not in W.

Important vector spaces
Displacements in space, forces, velocities, accelerations, etc.
• + parallelogram law
• · usual scalar multiplication
Rn (real scalars) or Cn (complex scalars) - n×1 real or complex matrices (i.e., columns; can also work with rows).
• + component of sum is sum of components
• · multiply each component by the scalar
Pn = {a0 + a1 x + a2x2 + ... + anxn }, the polynomials of degree n or less.
• · multiply each term by the scalar
Spaces of functions f : X -> scalars, where X can be anything. Think of f(x) as the ``x component of f.''
• + is given by (f+g)(x) = f(x) + g(x) (``component of sum is sum of components'')
• · is given by (c·f)(x) = c(f(x)) (``multiply each component by the scalar'')
Various subspaces of the vector spaces of the type f : X -> scalars
• C[a,b], all functions continuous on the interval [a,b]. (Could be complex valued).
• C(k)[a,b], all continuous functions having derivatives continuous through order k.

5 September

Span
Definition - Linear combination. Let v1 ... vn be vectors in a vector space V. A vector of the form c1v1 + ... + cn vn is called a linear combination of the vj's.

Definition - Span. Let S={v1 ... vn} be a subset of a vector space V. The span of S is the set of linear combinations of vectors in S. That is,
U=span(S)={c1v1 + ... + cn vn},
where the cjs are arbitrary scalars.

Proposition. The set span(S) is a subspace of V.

Examples.
• The polynomials of degree n or less, Pn = span{1, x, x2, ..., xn }
• The set of 3D displacements = span{ i, j, k}
• The set of solutions to y'' + y = 0 is span{ sin(x), cos(x) }.
• U={(x1, x2, x3) | 2x1 + 3x2 - x3 = 0} = span{ (1,0,2), (0,1,3) }. In addition, we also have U = span{ (1,-1,-1), (1,2,7), ((2,-1,1) }.

Terminology If a vector space V = span(S), we say that V is spanned by S, or V is the span of S, or S spans V.

Linear independence and dependence
Definition - Linear independence and linear dependence. We say that a set of vectors
S = {v1, v2, ... , vn}
is linearly independent (LI) if the equation
c1v1 + c2v2 + ... + cnvn = 0
has only c1 = c2 = ... = cn = 0 as a solution. If it has solutions different from this one, then the set S is said to be linearly dependent (LD).

Examples.
• {1, x, x2, ..., xn } is linearly independent.
• {1 + x, 1 - x, 1} is linearly dependent.
• { i, j, k} is linearly independent.
• { (1,0,2), (0,1,3) } is linearly independent and { (1,-1,-1), (1,2,7), ((2,-1,1) } is linearly dependent.
• Any set of three vectors in R2 is linearly dependent.

Inheritance properties of LI and LD sets. Every subset of a linearly independent set is linearly independent. Every set that contains a linearly dependent set is linearly dependent.

Basis and dimension

Classification of vector spaces. If a vector space V has no limit to the size of its linearly independent sets, it is said to be infinite dimensional. Otherwise, V is said to be finite dimensional. 2D and 3D displacements are finite dimensional spaces. C[0,1] is infinite dimensional, because it contains {1, x, x2, ..., xk } for all k.

When V is finite dimensional, its LI sets cannot be arbitrarily large. Suppose that n is the maximum number of vectors an LI set in V can have. A linearly independent set S in V having n vectors in it will be called maximal.

Proposition. If a vector space V is finite dimensional, then any maximal linearly independent set of vectors
S = {v1, v2, ... , vn}
spans V.

Proof. Add any vector v in V to S to form the new set
{v, v1, v2, ... , vn}
This augmented set is LD, because it has n+1 > n vectors in it. Thus we can find coefficients c, c1, c2, ... , cn, at least one of which is not 0, such that
c v + c1v1 + ... + cn vn = 0.
There are two possibilities. Either c is 0 or c is not 0. In case c = 0, we have c1v1 + ... + cnvn = 0. The set S is LI, so this implies c1 = c2 = ... = cn = 0. But this means all of the coefficients vanish, contradicting the fact that one of them does not vanish. The only possibility left is that c is not 0. We can then divide by it in our previous equation and rewrite that equation as
v = (- c-1c1)v1 + ... + (- c-1cn)vn .
This shows that S spans V. Our proof is complete.

Definition - Basis. We say that a set of vectors
S = {v1, v2, ... , vn}
in a vector space V is a basis for V if S is both linearly independent and spans V.

Corollary. A linearly independent set is a basis for a finite dimensional vector space if and only if it is maximal.

Definition - Dimension. The dimension of a vector space V is the maximum number of vectors in an LI set. Equivalently, it is the number of vectors common to every basis for V.

10 September

Coordinates and bases

Coordinates for a vector space. Assigning coordinates amounts to providing a correspondence
v <-> (c1, c2, ... , cn)
that is 1:1, onto, and preserves vector addition and scalar multiplication. Consider a basis {v1, v2, ... , vn} for a vector space V. We also assume that the basis is ordered, in the sense that we keep track of which vector is first, or second, etc.

Theorem. Let B={v1, v2, ... , vn} be an ordered basis for a vector space V. Every vector v in V can be written in one and only one way as a linear combination of vectors in B. That is,
v = c1v1 +...+ cnvn
where the coefficients are unique.

Proof. Since B is a basis, it spans V. Consequently, we have scalars c1, ..., cn such that the representation above holds. We want to show that this representation is unique - i,e,, no other set of scalars can be used to represent v. Keeping this in mind, suppose that we also have the representation
v = d1v1 +...+ dnvn,
where the coefficients are allowed to be different from the ck's. Subtracting the two representations for v yields
0 = (c1 - d1) v1 +...+ (cn - dn) vn.
Now, B is a basis for V, and is therefore LI; the last equation implies that c1 - d1 = 0, c2 - d2 = 0, ..., cn - dn = 0. That is, the c's and d's are the same, and so the representation is unique.

This theorem gives us a way to assign coordinates to V, for the correspondence
v= c1v1 +...+ cnvn <-> (c1, c2, ... , cn)
it sets up is both 1:1 and onto. The condition of linear independence gives us that it is 1:1, and the condition of spanning gives us that it is onto. It is also easy to show that this correspondence preserves addition and scalar multiplication, which are the properties needed in defining "good" coordinates for a vector space.

Definition - Coordinate Vector. Given an ordered basis B = {v1 ... vn} and a vector v = c1v1 +...+ cnvn, we say that the column vector [v]B = [c1, ..., cn]T is the coordinate vector for v, and that c1, c2, ..., cn are the coordinates for v.

Definition - Isomorphism. Let U and V be vector spaces. A correspondence between U and V
u <-> v
that is 1:1, onto, and preserves vector addition and scalar multiplication is called an isomorphism between U and V, and the two spaces are said to be isomorphic.

The word isomorphism comes from two Greek words, "isos," which means "same", and "morphy," which means "form." As far as vector space operations go, two isomorphic vector spaces have the "same form" and behave the same way. Essentially, the spaces are the same thing, just with different labels. For example,a basis in one space corresponds to a basis in the other. Indeed, any property in one space that only involves vector addition and scalar multiplication will hold in the other. This makes the following theorem, which is a consequence of what we said above concerning coordinates, very important.

Theorem. Every n-dimensional vector space is isomorphic to Rn or Cn, depending on whether the set of scalars is R or C.

Examples. The space P2 of polynomials of degree 2 or less has B = { 1, x, x2 } as a basis, and so it is three dimensional (B has three vectors). We take the scalars to be real. Relative to this basis, we have
[p]B = [ a1 + a2x + a3x2 ]B = [a1 a2 a3]T.
Thus, for example, we have these:
[1-x+x2]B = [1 -1 1]T
[-3x+4-2x2]B = [4-3x-2x2]B = [4 -3 -2]T
[(2-x)2]B = [4-4x+x2]B = [4 -4 1]T
Changing the ordering of the basis vectors changes the ordering of the coordinates. Using the basis C = { x2 1, x }, we have
[-3x+4-2x2]C = [-2x2+4-3x]C = [-2 4 -3]T,
which is a reordering of the coordinate vector relative to B. Be aware that order matters!

This isn't the only isomorphism between P2 and R3. Recall that a quadratic polynomial is determined by its values at three distinct values of x; for instance, x=-1, 0, and 1. Also, we are free to assign whatever values we please at these points, and we can get a quadratic that passes through them. Thus, the correspondence
p <-> [p(-1) p(0) p(1)]T
between P2 and R3 is both 1:1 and onto. It is easy to show that it also preserves addition and scalar multiplication, so it is another isomorphism between P2 and R3. Let's use this to find a new basis for P2. (Remember, a basis in one isomorphic space corresponds to a basis in the other.) Since { [ 1 0 0 ]T, [ 0 1 0 ]T, [ 0 0 1 ]T } is a basis for R3, the set of polynomials
C = { p1(x) = -½x + ½x2, p2(x) = 1, p3(x) = ½x + ½x2 },
which satisfy
p1(-1) = 1, p1(0) = 0, p1(1) = 0,
p2(-1) = 0, p2(0) = 1, p2(1) = 0,
p3(-1) = 0, p3(0) = 0, p1(1) = 1,
is another basis for P2. This raises the question of how the coordinate vectors [p]C and [p]B are related.

12 September

Change-of-bases
Let B = {v1 ... vn} and D = {w1 ... wn} be ordered bases for a vector space V. Suppose that we have these formulas for v's in terms of w's and vice versa:

vj=A1jw1 + A2jw2 + ... + Anjwn
wk=C1kv1 + C2kv2 + ... + Cnkvn
(Note that the sums are over the row index for each matrix A and C.) For any vector v with representations
v = b1v1 +...+ bnvn
v = d1w1 +...+ dnwn
and corresponding coordinate vectors
[v]B = [b1,..., bn]T
[v]D = [d1,..., dn]T
we have the change-of-basis formulas
[v]D = A[v]B and [v]B = C[v]D.
These imply that AC=CA=In×n, so C=A-1 and A=C-1 .

For purposes of comparison, we want to write out the expressions for the coordinate changes. Writing the d's in terms of the b's, we have
dk = Ak1 b1 + ... + Akn bn.
Going the other way, we can write the b's in terms of the d's,
bk = Ck1 d1 + ... + Ckn dn.
We note that quantities transforming according to the formula for bases are called covariant, and quantities transforming like the coordinates are called contravariant.

Examples
• R2. Let B={v1=i, v2=j} and let D={w1=i - j, w2=2i + 3j}. Note that we have
w1=v1 - v2
w2=2v1 + 3v2
Hence, we have that C11 = 1, C21 = -1, C12 = 2, and C22 = 3. Consequently,
b1 = d1 + 2d2
b2 = -d1 + 3d2
We remark that the matrix C =
```1  2
-1 3
```
and the matrix A = C-1 =
```3/5  -2/5
1/5   1/5
```

• Pn. We will consider the n = 2 case. Consider the two ordered bases,
B = {1,x,x2} and D = {(3-x)2, x+2, x-1}. Because we have
w1 = (3-x)2 = 9 - 6x + x2 = 9v1 - 6v2 + v3
w2 = 2+x =2v1 + v2 + 0v3
w3 = (-1)+x = - v1 + v2 + 0v3,
The matrix that takes coordinates relative to D into ones relative to B is C =
```  9      2   -1
-6      1    1
1      0    0
```
The matrix that takes coordinates relative to B into ones relative to D is A =
```  0      0    1
1/3    1/3  -1
-1/3    2/3   7
```
Solving a System of ODEs
Consider the following system of ordinary differential equations, relative to x1-x2 coordinates.
dx1/dt = 3x1 + 2x2
dx2/dt = 2x1 + 3x2
We can turn this into a very simple decoupled system if we change from x1-x2 coordinates to the u1-u2 set defined via these equations:
x1 = u1 + u2
x2 = u1 - u2 .
Afrter some algebra, the system of ODEs becomes
du1/dt = 5u1
du2/dt = u2 ,
which can be easily solved.

17 September

Dual space
Definition - Dual Space. The set V* of all linear functions L:V - > R (or C) is called the (algebraic) dual of V. The term linear means that
L(au + bv) = aL(u) + bL(v)
holds for all scalars a,b and vectors u, v.

Terminology. Linear functions in the dual space are called linear functionals, to distinguish them from other types of linear functions. They are also called or 1-forms or co-vectors.

Proposition. V* is a subspace of all functions mapping V to the scalars.

Proof. We leave this as an exercise.

A simple physical example is the work W done by a force f applied at a point and producing a displacement s. Here, the work is given by W=L(s) = f·s. The point is that is if we fix the force, then the work is a linear function of the displacement. Note that forces and displacements have different units and are thus in different vector spaces, even though the spaces are isomorphic.

Another simple example that frequently comes up is multiplication of a column vector X by a row vector Y. The linear functional in this case is just L(X) = Y X. Our final example concerns C[0,1]. L[f] = 0S1 f(x)dx is a linear functional.

Dual Basis
Let V be an n-dimensional vector space. We want to construct a basis for V*. Let B = {v1 ... vn} be a basis for V. We may uniquely write any v in V as
v = x1 v1+ ...+ xn vn
Now, if L is a linear functional (i.e., it is in V*), we also have
L(v) = x1 L(v1)+ ...+ xnL(vn).
Thus knowing L(vj) for j=1 ... n completely specifies what L(v) is. Conversely, given scalars {y1, ..., yn}, one can show that
L(v) = x1y1 + ...+ xnyn,
where the xj 's are the components of relative to B, defines a linear functional. As before, L(vj) = yj. In summary, we have established this.

Theorem. Let V be a vector space with a basis B = {v1 ... vn}. If L is a linear functional in V*, then
L(v) = x1y1 + ...+ xnyn, where yj = L(vj)
Conversely, given scalars {y1, ..., yn}, the formula for L above defines a linear functional in V*, where again L(vj) = yj.

We can use the theorem we just obtained to define n linear functionals {v1 ... vn} via

To make this clearer, let's look at what v1 does to vectors. If we take a vector v = x1 v1+ ...+ xn vn, then
v1(v) = x1v1(v1) + x2v1(v2) + ... + xnv1(vn) = x1·1 + x2·0 + ... +xn·0 = x1.
A similar calculation shows that v2(v) = x2, v3(v) = x3, ..., vn(v) = xn. This means that we can write L(v) = x1y1 + ...+ xnyn as
L(v) = y1v1(v) + ... + ynvn(v)
= (y1v1 + ... +ynvn)(v)
Now the two sides are equal for all values of the argument, so they are the same function. That is, L = yjv1 +...+ ynvn. Hence, the set B* = {v1 ... vn} spans V*. The set is also linearly independent. If 0 = yjv1 +...+ ynvn, then 0=0(vj) = yj. Hence, the only yj's that give 0 are all 0. Summarizing, we have obtained this result.

Theorem. If V is an n-dimensional vector space, and if B = {v1 ... vn} is a basis for V, then the dual space V* is also n-dimensional and B* = {v1 ... vn} is a basis for V*.

Definition - Dual Basis. The basis B* is called the dual basis for B.

Inner Product
Definition - Inner product Let V be a real vector space. We say that a mapping < , > : V×V --> R is an inner product for V if these hold:
1. positivity - <v,v> > 0, with <v,v> = 0 implying that v=0.
2. symmetry - <u,v> = <v,u>
3. homogeneity - <cu,v > = c<u,v >
4. additivity - < u+v,w> = <u,w> + <v,w>

Definition - Norm The quantity ||v|| := (<v,v>)½ is called the norm or length of a vector v.

Schwarz's inequality: |<u,v>| <= ||u|| ||v||.

Schwarz's inequality shows that the quotient |<u,v>| ÷ ||u|| ||v|| is always between -1 and 1. Consequently, we may define an angle between vectors to be cos-1(<u,v>(||u|| ||v||)-1). The norm or length of a vector ||v|| satisfies three important properties.

1. positivity - ||v|| > 0, unless v = 0
2. positive homogeneity - ||cv|| = |c| ||v||, where c is any scalar.
3. The triangle inequality: ||u+v|| <= ||u|| + ||v||

19 September

Inner product spaces
Definition - Inner product space A vector space together with an inner product defined on it is called an inner product space.

Examples We verified in detail that the following are inner products on the spaces listed. In particular, we motivated the selecting the inner product on C[a,b] by working with the one for Rn, modifying it, and letting n tend to infinity.

• Rn, with <x,y> = yTx
• C[a,b], with < f,g > = aS b f(x)g(x)dx
Norm (length) and angle. Here is a simple problem: Let x=[1 -1 0 1]T and x=[1 1 1 1]T. In the inner product for R4, we want to find ||x||, ||y|| and the angle between x and y. Solution: ||x|| = (xTx)½ = 3½. ||y|| = (yTy)½ = 4½ = 2. Now, <x,y> = 1. Thus, the angle between them is cos-1(1/(2·3½)) = 1.28 radians.

Orthogonal vectors. From the definition of the angle between two vectors, we can rewrite an inner product in a form familiar from the definition of the "dot product" of vectors in 2D and 3D. If we let t be the angle between u and v, then
<u,v> = ||u|| ||v|| cos(t),
If neither of the vectors are 0, then t = π/2 if and only if the inner product on the left is 0. With this in mind, we say that two vectors u,v in an inner product space V are orthogonal or perpendicular whenever <u,v> = 0.

24 September

Orthogonal and orthonormal sets

Definition - Orthogonal set A finite or infinite set of nonzero vectors {v1, v2, v3, v4, ... } in an inner product space V is an orthogonal set if <vj,vk> = 0 whenever j is not equal to k.

Examples
• 3D vectors. {i, j, k}
• Column vectors. {[1 1 1]T, [1 0 -1]T,[1 -2 1]T}
• Polynomials. {1,x,3x2-1}, where
• Continuous, periodic functions. {1, sin(x), cos(x), sin(2x), cos(2x), sin(3x), cos(3x), ...}, where

To do the integrals involved, we used the following product-to-sum trigonometric identities:
cos(A)cos(B) = ½(cos(A+B) + cos(A-B))
sin(A)sin(B) = ½(cos(A-B) - cos(A+B))

Definition - Orthonormal set An orthogonal set in which all vectors have length 1 is called an orthonormal set. That is, <vj,vk> = 0 whenever j is not equal to k and <vj,vj> = 1 for all j.

One can always convert an orthogonal set into an orthonormal set. We simply divide each vector in the set by the norm (length) of its length. Here are results for the examples above.

• 3D vectors. {i, j, k} (No change; the original set is orthonormal.)
• Column vectors. {[3 3 3]T, [2 0 -2 ]T, [6 -2·6 6 ]T}
• Polynomials. {2, (3/2)½ x, (5/2)½ (½)(3x2 - 1)}
• Continuous, periodic functions.

Proposition. Every set of nonzero, orthogonal vectors is linearly independent. Consequently, every orthonormal set is linearly indepedent.

Othogonal and orthonormal bases

Proposition. Suppose that V has dim(V) = n. Every set of nonzero, orthogonal vectors is a basis for V if and only if it contains n vectors.

Corollary. Suppose that V has dim(V) = n. Every orthonormal set is a basis for V if and only if contains n vectors.

Coordinates. Orthogonal and orthonormal bases are very useful because one can easily find coordinates of a vector relative to them. To see how this works, let's start with an orthogonal basis for an n dimensional vector space V,
B = {v1, v2, v3, ..., vn}.
Any vector v in V can be uniquely written as
v = x1 v1 + x2 v2 + x3 v3 + ... + xn vn.
To find the xj 's, we form the inner products <v,vj>. For example,
<v,v1> = < x1 v1 + x2 v2 + x3 v3 + ... + xn vn,v1>
<v,v1> = x1 < v1, v1 > + x2 < v2, v1 > + x3 < v3, v1 > + ... + xn < vn, v1 >
<v,v1> = x1 || v1 ||2 + x2 ·0 + x3 ·0 + ... + xn ·0
<v,v1> = x1 || v1 ||2
Consequently, we have that x1 = <v,v1> || v1 ||-2. A similar calculation gives us that, for j = 1, ..., n,
xj = <v,vj> || vj ||-2.

Things are even simpler in an orthonormal basis. If we let B = {u1, u2, u3, ..., un} be orthonormal, then || uj || = 1, and
xj = <v,uj>.
This is familiar from 3D vectors, with {i, j, k} being the orthonormal basis.

26 September

Orthonormal bases and inner products

Proposition. Let B = {u1, u2, u3, ..., un} be an orthonormal basis for a real inner product space V. If v and w in V have coordinate vectors [v]B and [w]B relative to B, then
<v,w> = [w]BT [v]B

We now want to address what happens if we change from B to a new orthonormal basis, B' = {u'1, u'2, u'3, ..., u'n}.
u'j=A1ju1 + A2ju2 + ... + Anjun.
If the matrix A has j,k entry Akj, then the coordinate vectors transform according to the rule [v]B = A[v]B'. By our previous proposition, we thus have that
<v,w> = [w]BT [v]B = [w]B'T AT A[v]B'
On the other hand, the proposition applies directly to the basis B' itself. Hence, <v,w> = [w]B'T [v]B'. Combining these two equations then gives us
[w]B'T [v]B' = [w]B'T AT A[v]B',
which holds for any choice of vectors v and w.

We are interested in getting the components of ATA. To do this, choose w = u'j and v = u'k. The coordinate vectors for these are [w]B' = [u'j]B' = ej and [v]B' = [u'k]B' = ek. Inserting these in the equation above gives us
ejT ek = ejT ATA ek
This implies that the (j,k) entry in ATA is 1 if j=k and 0 if j is not equal to k. But these are exactly the entries in the n×n identity matrix I. Thus, we have shown that ATA = I.

Definition - Orthogonal matrix An n×n matrix A is said to be orthogonal if ATA = I.

Proposition. The following are equivalent.
1. ATA = I
2. AT = A-1
3. The columns of A form an orthonormal basis for Rn
4. The rows of A form orthonormal basis for Rn.
5. In Rn, the length of Ax is the same as x, and the angle between Ax and Ay is the same as that between x and y.

Euler angles

Rotations and reflections. A change from one orthonormal basis to another is accomplished by an orthogonal matrix. This change of basis leaves lengths and angles invariant, and in 3D represents a rotation or reflection. For a 3×3 real matrix, being orthogonal implies six equations for the nine entries in A. To parametrize the 3×3 orthogonal matrices should then require three variables. These variables are the Euler angles, and they come from three rotations. The angles are called the precession, nutation, and pure rotation. A diagram may be found in the book by Borisenko and Tarapov.

1 October

Rotation about the z-axis Here, the z-axis is fixed, and we want to rotate our axes counterclockwise though an angle t. Let B = {i, j, k}, and B' = {i', j', k'}. The relationship between these two bases is

i' = cos(t) i + sin(t) j
j' = -sin(t) i + cos(t) j
k' = k
The matrix A for which [v]B = A[v]B' is A = Rz(t) =

```cos(t)  -sin(t)  0
sin(t)   cos(t)  0
0        0       1
```
By simplying relabeling the axes, we can obtain formulae for rotations with the x or y axis fixed. The one for a counterclockwise rotation about the x-axis through an angle t is A = Rx(t) =
```1   0        0
0   cos(t)  -sin(t)
0   sin(t)   cos(t)
```
The rotation matrix for the Euler angles The matrix A that takes B' coordinates into B coordinates, that is [v]B = A[v]B', is just a product of three matrices:

A = Rz(precession) Rx(nutation) Rz(pure rotation)
This discussion is based on that given in the book by H. Goldstein: Classical Mechanics, Addison-Wesley, Reading, MA, 1965.

Gram-Schmidt Process

Algorithm Let V be a vector space with an inner product < u, v >, and let {v1 ... vn} be a linearly independent set. We want to find an orthonormal set with the same span.

1. u1 := v1 (|| v1 ||)-1 equivalently v1 = r11u1, r11 = || v1 ||

2. u2 := (v2 - < v2, u1 > u1) r22-1 equivalently v2 = r12u1 + r22u1,
where r12 = < v2, u1 > and r22 = || v2 - < v2, u1 > u1 ||

3. uk := (vk - < vk, u1 > u1 - ... - < vk, uk-1 > uk-1 ) rkk-1 equivalently vk = r1ku1 + ... + rkkuk,
where rjk = < vk, uj > and rkk = || vk - < vk, u1 > u1 - ... - < vk, uk-1 > uk-1 ||.

4. Repeat the step above for k = 3, 4, ... n. The result is an orthonormal (o.n.) basis that replaces the vj's.

Example The Gram-Schmidt process takes {1,x,x2} into {2, (3/2)½ x, (5/8)½ (3x2 - 1)}, provided the inner product on P2 is

QR-Factorization Suppose m>=n. Let A be an m×n matrix with linearly independent columns {v1, ..., vn}; that is,
A = [v1 ... vn].
Apply the Gram-Schmidt process to the vj's. The result is that for k = 1, ..., n, we have
vk = r1ku1 + ... + rkkuk,
where the uj's are also column vectors. Put the uj's in an m×n matrix
Q = [u1 ... un].
Letting R =
```r11 r12  r13  ... r1n
0   r22  r23  ... r2n
0    0   r33  ... r3n
...
0    0   0   ... rnn
```
we can write the equations that give vk's as linear combinations of uj's in matrix form, A = QR. This is the QR factorization. Whem m = n, the matrix Q is orthogonal. If m > n, then Q satisfies QTQ = In×n. However, QQT will not be the m×m identity Im×m

Example Find the QR factorization of the matrix A =
``` 1  1
-2  0
2  1
```
We start with v1 = [1 -2 2]T v2 = [1 0 1]T. Following the Gram-Schmidt procedure, we get
u1 = (1/3)[1 -2 2]T, or v1 = 3u1,
so r11 = 3. Next, we will find u2. To do this, we first compute
v2 - (u1Tv2)u1 = [1 0 1]T - (3/3)*(1/3)*[1 -2 2]T = [2 2 1]T/3
Clearly, the length of [2 2 1]T/3 is r22 = 1, and so u2 = v2 - (u1Tv2)u1 = v2 - u1. Hence, v2 = u1 + u2. Thus, r12 = 1, and R =
``` 3  1
0  1
```
and the matrix Q =
``` 1/3   2/3
-2/3   2/3
2/3   1/3
```

3 October

Least squares approximation

Discrete least squares problems The simplest example of this is fitting a straight line to data. For instance, suppose we have measured the log of the concentration of some chemical and listed the data in the table below.

 t ln(C) 0 1 2 3 4 -0.1 -0.4 -0.8 -1.1 -1.5

We know that the law of decay tells us that ln(C) = -r*t + ln(C0), so the the data should lie on a straight line. Of course, they don't; experimental errors offset the points. The question is, what are values for r and ln(C0) that will fit a straight line to the data? A more general problem is this. At times t1, t2, ..., tn, we have measurements y1, y2, ..., yn. Find a line y = a*t + b that fits the data.

One good way to solve this problem is the method of least squares. Let
E2 = (|y1 - a*t1 -b|2 + |y2 - a*t2 -b|2 + ... + |yn - a*tn -b|2)/n
The quantity E is the root mean square of all of the errors yj - a*tj -b at each time tj. The idea is to choose a and b so as to minimize E. That is, we will try to find a and b that give the least value for the sum of the squares of the errors. We can put this in the form of an inner product. Define the following column vectors in Rn.
v = [y1 y2 ... yn]T
v1 = [t1 t2 ... tn]T
v2 = [1 1 ... 1]T
Notice that the jth component of the vector v - av1 - bv2 is just the difference yj - a*tj -b. Form this it follows that
E2 = ||v - av1 - bv2||2/n
Now, n is simply the number of data points, so it is fixed. Consequently, minimizing E then is equivalent to minimizing the the distance from v to the space spanned by v1 and v2. Put a little differently, minimizing E is equivalent to finding the v* in span{v1, v2} that comes closest to v or best approximates v.

Continuous least squares problems There is a continuous version of the discrete problem described above. In many applications, we are given a complicated function and we want to approximate it with sums of simpler functions. A familiar example is the Taylor series for ex, where we are approximating ex by a sum of powers of x. Another example is using sines and cosines to approximate a signal in order to find its frequency content. This is one of the applications of Fourier series.

Suppose that we are given a continuous function f(x) on the interval [0,1] that has an upward trend or bias to it. One way to measure this is to fit a straight line to the function f. The difference here is that we know f at every x in [0,1]. The discrete square error E2 goes over to an integral,
E2 = 0S 1 (f(x) - ax -b)2dx .
If we use the inner product < f,g > = 0S 1 f(x)g(x)dx, then E2 = || f(x) - ax -b||2, and the problem again goes over to finding the best approximation to f from the span{1,x}, relative to the norm from our inner product < f,g >.

One can carry this further. If f(x) has not only an upward trend, but is also concave up, then it makes sense to fit a quadratic to it. The problem described above would change to finding the quadratic polynomial f*(x) in span{1,x,x2} that minimizes || f(x) - a0 - a1x - a2x2 ||2.

Least squares problems and inner products All of the problems that we have described above have been put in terms of inner products. Here is the general form of these problems. Suppose that we have an inner product space V, a vector v in V, and a subspace U of V. The least-squares problem is to find both the minimum of || v - u ||, where u is any vector in U, as well as any minimizer v* in U.

Normal equations

Theorem. Let V be a vector space with an inner product < u, v >, and let U be a subspace of V. A vector v* in U minimizes the distance || v - u || if and only if v* satisfies the normal equations,
< v - v*, u > = 0,
which hold for all u in U. That is, v - v* is orthogonal to the whole space U. In addition, v* is unique.

Proof. Let's first show that if v* in U minimizes || v - u ||, then it satisfies the normal equations. The way we do this is similar to the way we proved Schwarz's inequality. Fix u in U and define
q(t) := || v - v* + t u) ||2 = || v - v* ||2 + 2t < v - v*, u > + t2 || u ||2
Because v* minimizes || v - u ||2 over all u in U, the minimum of q(t) is at t = 0. This means that t = 0 is a critical point for q(t), so q'(0) = 0. Calulating q'(0) then gives us 2< v - v*, u > = 0 for all u in U. Dividing by 2 yields the normal equations.

Conversely, if v* in U satisfies < v - v*, u > = 0, then we will show not only that v* is a minimizer, but also that it is the minimizer; that is, v* is unique. To do this, let u be any vector in U. Observe that we can write v - u = v - v* + v* - u = v - v* + u', where u' := v* - u is in U. Consequently, we also have
|| v - u ||2 = ||v - v* + u' ||2
|| v - u ||2 = ||v - v* ||2 + 2< v - v*, u' > + || u' ||2.
Since we are assuming that v - v* is orthogonal to every vector in U, it is orthogonal to u'; hence, < v - v*, u' > =0, and so have that
|| v - u ||2 = ||v - v* ||2 + || u' ||2.
It follows that || v - u || >= ||v - v* ||, so that v* is a minimizer. Now, if equality holds, that is, if || v - u || = ||v - v* ||, then we also have || u' || = 0. Consequently, u' = 0. But then, we have to have u = v*. So, the vector v* is unique.

8 October

Normal equations and bases.

Normal equations relative to a basis. The normal equations are geometric conditions that can be used to directly find the minimizer. When the subspace U has dimension n, they reduce to a set of n equations involving basis vectors.

Corollary If B = {w1 ... wn} is a basis for the subspace U, then the normal equations are equivalent to the set
< v - v*, wk > = 0, k = 1 ... n.

Proof. If the normal equations are satisfied, they hold for every vector in U, including the basis vectors. Thus, the equations above have to hold, too. On the other hand, suppose the equations above are satisfied. We can write any vector u in U as u = c1 w1+ ... + cnwn. It then follows from the equations above that
< v - v*, u > = c1 < v - v*,w1 > + ... + cn < v - v*,wn > = c1·0 + ... + cn·0 = 0,
and so the normal equations hold.

Finding the minimizer - orthonormal case. The normal equations are geometric conditions that can be used to directly find the minimizer v*. When an orthonormal basis is for U is known, the answer is simple, and can be explicitly written down. We just need to apply the last corollary. Take B = {u1 ... un} to be an orthonormal basis for the subspace U. The normal equations relative to this basis are < v - v*, uk > = 0 or, equivalently,
< v, uk > = <v*, uk >,
for k=1 .. n. By the formula the coordinates of a vector relative to an orthonormal basis, we see that <v*, uk > is just the kth coordinate of v*. It follows that the minimizer is given by
v* = < v , u1 > u1+ ... + < v , un > un
It's worth noting that this is the first time we have actually shown that there is a minimizer. Of course, by what we have said above, it's unique. We will use this fact later.

An Example. Suppose that f(x) = ex on [-1,1]. What straight line gives the best least squares fit? The first thing to do is identify the subspace. Here U =span{1,x}. We know that {2, (3/2)½ x} form an orthonormal basis for U relative to the inner product

Here, we interpret p and q as continuous functions, rather than polynomials. Doing a little calculus, we obtain
< ex, 2 > = 2(e - e-1) = 2½sinh(1)
< ex, (3/2)½ x > = 6½ e-1.
Applying the formula we derived, we get f*(x) = 2½sinh(1)·2 + 6½ e-1·(3/2)½ x = sinh(1) + 3e-1 x. The function and the line are plotted below.

Finding the minimizer - non-orthogonal case. Often, it is useful to use a non-orthogonal basis in solving a least squares problem. Let B = {w1 ... wn} be a basis for the subspace U. We showed that the normal equations, which determine the minimizer v*, have the form
< v - v*, wk > = 0, k = 1 ... n.
Since v* is in U, we can represent it in terms of this basis, v* = c1 w1+ ... + cnwn. The normal equations imply that the coefficients cj satisfy the matrix equation
Gc=d,   where Gjk= < wk, wj >,   dj = < v , wj >.
The matrix G is called the Gram matrix for the basis of w's; it is always invertible, because the normal equations always have a unique solution, as we saw above in connection with the case of an orthonormal basis.

Example We will set up the normal equations for fitting a straight line to data y1, y2, ..., yn taken at times t1 < t2 < ... < tn. The vector space V is Rn, with the usual inner product, < x , y > = yTx. The vector v = [y1, y2, ..., yn]T. The subspace U is the span of vectors
w1 = [t1 t2 ... tn]T
w2 = [1 1 ... 1]T.
These are linearly independent vectors because the tj's are all distinct. Consequently, we may take B= {w1, w2} as our basis. The Gram matrix in this case is has entries
G11 = w1T w1 = t12 + t22 + ... + tn2
G12 = w1T w2 = t1 + t2 + ... + tn
G21 = w2T w1 = t1 + t2 + ... + tn
G22 = w2T w2 = n
The vector d has entries
d1 = w1Tv = t1y1 + t2y2 + ... + tnyn
d2 = w2Tv = y1 + y2 + ... + yn.
The nice feature here is that the coefficient vector [c1 c2]T has the slope and intercept of the line for its entries.

The normal equations and the QR factorization Suppose that V is Rn and U has a basis B = {w1, ..., wm} of m column vectors, where m lt; n. Given v in V, We want to minimize ||v - u|| over all u in U. As we have seen, the unique minimizer
v* = c1 w1+ ... + cnwn.
Instead of employing the Gram matrix, as we did earlier, we use our "basic matrix trick," and write v* as a matrix product,
v* = Wc,
where W = [w1 ... wm] is an n×m matrix with linearly independent columns. Now, carry out a QR factorization and write W = QR, where R is an invertible m×m upper triangular matrix and where the columns of Q are form an orthonormal set {u1 = Qe1, ..., um = Qem} that is also a basis for U. The normal equations, relative to the basis comprising the columns of Q, become, for k = 1, ..., n,

< v - Wc, Qek > = 0
ekTQT(v - QRc) = 0 (since W =QR)
ek(QTv - QTQRc) = 0
ek(QTv - Rc) = 0 (since QTQ = I).
From this, it follows that all m components of the column vector QTv - Rc are 0; hence,
Rc = QTv.
These are yet another form of the normal equations. They are especially useful from a numerical point of view. The matrix R is upper triangular and Rc = QTv can be solved quickly. For a discussion, see: G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins Press, Baltimore, 1996.

Series of orthogonal functions and least squares

Least squares approximation Consider the following continuous least squares problem. Start with a function f defined on [-1,1], which we can think of as continuous, and suppose that we want to fit not only a straight line to f, but also a quadratic, cubic, quartic, and so on. In other words, we want to find the degree n polynomial that gives the best least squares fit to the function f over the interval [-1,1].

The orthonormal set of Legendre polynomials is formed by using the Gram-Schmidt process on {1, x, x2, x3, ...} relative to the inner product

We denote these polynomials by {p0, p1, p2, p3, ...}. Earlier, we had seen that
p0 = 2, p1 = (3/2)½ x, p2 = (5/8)½ (3x2 - 1).
We remark that there are similar formulas for all of the orthonormal Legendre polynomials.

For each n, we look at the subspace Un = span{p0, p1, p2, ..., pn}. Of course, Un = Pn, the polynomials of degree n or less. In our least squares minimization problem, we identify f with v and the uk's with the pk's. The minimizer for Un is
f*n = < f, p0 > p0+ ... + < f, pn > pn.

The minimizers f*n change with n in a very simple way. Namely, to go from n to n+1, we only need to add a term to the previous minimizer. If we formally let n tend to infinity, then we get the infinite series
< f, p0 > p0 + < f, p1 > p1 + < f, p2 > p2 + < f, p3 > p3 + ...
for which the minimizer f*n is the nth partial sum.

Let the minimum error over Un be En = || f - f*n ||. Because Un = Pn is contained in Un+1 = Pn+1, we must have En+1 <= En. That is, En decreases as n gets bigger. Does the En go to 0 as n -> infinity? If it does, we say that f*n converges in the mean. We also say that the series converges in the mean to f, and we also write
f = < f, p0 > p0 + < f, p1 > p1 + < f, p2 > p2 + < f, p3 > p3 + ... .

Theorem The series above converges in the mean to f if and only if
|| f ||2 = |< f, p0 >|2 + |< f, p1 >|2 + |< f, p2 >|2 + |< f, p3 >|2 + ...
This formula is called Parseval's equation, and first appeared in connection with Fourier series.

Orthonormal series What we just said applies to any infinite set of orthonormal functions, including the trigonometric functions

Both Fourier series and series of Legendre polynomials converge in the mean whenever the square of the appropriate norm, || f ||2, is finite.

Fourier series. The Fourier series for a 2 periodic function f is usually written as

where the coefficients are given by

As an example, we calculated the Fourier series for the 2 periodic extension of the function |x| defined on the interval [-,]. The resulting series was
½ - 4-1 (cos(x) + 3-2cos(3x) + 5-2cos(5x) + 7-2cos(7x) + ...)

10 October

Numerical Demonstrations

We used MATLAB to do several examples of finding the best continuous least-squares fit using Legendre polynomials. We also looked at the partial sums of the Fourier series for the 2 periodic extension of |x|.

Linear transformations

Definition A mapping L:V -> W, where V, W are vector spaces is said to be a linear transformation if it satisfies these properties.
1. Homogeneity L[cu] = cL[u]
2. Additivity L[u+v] = L[u] + L[v]

Simple properties
• L[ c1v1 + c2v2 + ... + cnvn ] = c1L[v1] + c2L[v2] + ... + cnL[vn]
• L[ 0V] = 0W

Theorem (Matrix associated with L) Let V and W be finite dimensional, and let B = {v1, ... , vn} and D = {w1, ... , wm} be bases for V and W, respectively. If L:V -> W is a linear transformation, then there is a unique m×n matrix A such that w = L[v] holds if and only if A[v]B = [w]D.

Proof Let v = c1v1 + c2v2 + ... + cnvn, so [v]B = [c1, ... , cn]T. From the first of the simple properties above, we see that
[ L[v] ]D = c1[ L[v1] ]D + c2[ L[v2] ]D + ... + cn[ L[vn] ]D
Now, by our "basic matrix trick," we can write this as a matrix product,
[ L[v] ]D = A[v]B, where
A = [ [ L[v1] ]D, ... , [ L[vn] ]D ]
is the m×n matrix we wanted. This matrix is unique because its columns are the unique coordinates for L[v2] ] relative to the basis D.

A Eample Consider the following problem. Let V = W = P2 have the common basis B = D = {1,x,x2}. Suppose that L:P2 -> P2 is a linear transformation given by
L[p] = x2p'' + (2x+1)p' + 3p,
Find the matrix A that represenets L. In addition, find the solution to L[p] = 18x2 - x + 2.

To find A, we first find the output of L applied to each basis vector; that is, L[1], L[x], and L[x2]. Doing this, we obtain L[1] = 3, L[x] = 1 + 5x, and L[x2] = 2x + 9x2. By the construction in the theorem, the kth column of A is the coordinate vector [ L[vk] ]D. Consequently, we have
[ L[1] ]D = [3 0 0]T
[ L[x] ]D = [1 5 0]T
[ L[x2] ]D = [0 2 9]T
We have now found that the matrix A =

```3 1 0
0 5 2
0 0 9```
To solve L[p] = 18x2 - x + 2, we go over to the matrix form of the equation,
A[p]B = [18x2 - x + 2]B = [2 -5 18]T.
Solving this equation, we get
[p]B = A-1[2 -1 18]T = [1 -1 2]T
Putting this back in terms of polynomials, we arrive at p(x) = 2 - x + x2

15 October

Subspaces associated with a linear transformation L : V -> W

• Domain. The vector space V is called the domain.
• Co-domain. The vector space W is called the co-domain. (This is sometimes called the range.)
• Null space. The set of all vectors null(L) := {v in V: L[v] = 0}; it can be shown to be a subspace of V.
• Image. The set of all vectors image(L) := {w in W: w = L[v] for some v in V}; it can be shown to be a subspace of W. (This set is also sometimes called the range.)

Examples

1. ODEs Consider the ODE y"+p(x)y'+q(x)y = g(x). We convert this to operator form L[y]=g, where L[y] = y"+p(x)y'+q(x)y. In this case, the domain V is C(2)(R), the space of twice continuously differentiable functions on R and the co-domain W is C(R), the space of continuous functions on R. The null space, null(L), is the set of solutions to the homogeneous equations. With a little work, one can show that image(L) is also C(R).

2. Matrices Consider the 2×4 matrix M =
```1 -1 2 3
1  1 1 4
```
We let L be the transformation L[v] = Mv, where v is in R4. For this problem, V = R4. The co-domain or set of outputs W is R2. Here, we also have image(L) = R2. The null space is the set of all v for which Mv = 0. Using row-reduction, one can show that
null(L) = span{[-3 1 2 0]T, [-7 -1 0 2]T}

Combinations of linear transformations

Sums K+L is defined by (K+L)[v] = K[v] + L[v].

Scalar multiples cL is defined by (cL)[v] = c(L[v]).

Products If K : V -> U and L : U -> W are linear, then we define LK via LK[v] = L[K[v]]. (This is composition of functions). The transformation defined this way, LK, is linear, and maps V -> W. Note: LK is not in general equal to KL, which may not even be defined.

Inverses Let L : V -> V be linear. As a function, if L is both one-to-one and onto, then it has an inverse K V -> V. One can show that K is linear, and LK = KL = I, the identity transformation. We write K = L-1.

Associated matrices Recall if B = {v1, ... , vn} and D = {w1, ... , wm} be bases for V and W, respectively, then the matrix associated with the linear transformation L : V -> W is
ML = [ [ L[v1] ]D, ... , [ L[vn] ]D ]
Since each of the combinations listed above is still a linear transformation, it will have a matrix associated with it. Here is how the various matrices are related.

MK + L = MK + ML

McL = c ML

MLK = ML MK

ML-1 = (ML)-1

Polynomials in L : V -> V We define powers of L in the usual way: L2 = LL, L3 = LLL, and so on. A polynomial in L is then the transformation
p(L) = a0I + a1L + ... + amLm
Later on we will encounter the Cayley-Hamilton theorem, which says that if V has dimension n, then there is a degree n (or less) polynomial p for which p(L) is the 0 transformation.

Change-of-basis

Change-of-basis and linear transformations Let L : V -> V be linear. Here L maps V into itself. We want to look at what happens when to the matrix of L when we make a change of basis in V. Let V have bases
B = {v1, ... , vn} and B' = {v'1, ... , v'n}
If the matrix of L relative to B is ML, and that relative to B' is M'L, then
M'L = SB -> B' ML (SB -> B')-1, where SB -> B' changes B coordinates to B' coordinates.

17 October

Eigenvalues, eigenvectors, and eigenspaces

Eigenvalue problems We say that a scalar µ is an eigenvalue of L : V -> V if there is exists a vector v in V, with v not equal to 0, such that L[v] = µ v. The vector v is called an eigenvector of µ. We let Eµ be set eigenvectors associated with µ, along with 0. Eµ is a subspace of V that is called the eigenspace of µ.

Trajectories of a velocity field Suppose that we are given a time-independent velocity field for a 2D fluid, V(x). Assume that V is a linear function of x, so that V(x) = L[x], where L is a linear transformation taking 2D to 2D. We want to find the trajectories of the fluid particles, dx/dt = V(x) = L[x], given that we know the initial position x(0). Let B = {i,j} be the usual basis for 2D. Relative to this basis L will have a matrix A. so our problem becomes [dx/dt]B = A[x]B. Since the basis B is time-independent, [dx/dt]B = d[x]B/dt. If we let X = X(t) = [x]B=[x1 x2]T, we arrive at the system
dX/dt = AX, where X(0)= [x(0)]B.
To make the situation more concrete, we will assume that A =
```1  4
1 -2```
When this system is written out for this choice of A, it looks like
dx1/dt = x1 + 4x2
dx2/dt = x1 - 2x2
The idea here is to switch to a basis where the system decouples. To accomplish this, we will use a basis of eigenvectors for A, {[4 1]T, [-1 1]T}. These correspond to the eigenvalues of A µ = 2 and µ = -3, respectively. (We will discuss how to get these later). Relative to our original 2D space, we write this basis as B' = {i+4j, -i+j}. The change of basis matrix S = SB' ->B is given by S =
```4 -1
1  1
```
Of course, SB ->B' = S-1. Relative to the new basis, the matrix of L, A' = M'L = S-1AS =
```2 0
0 -3
```
Letting Z = [x]B' = [z1 z2]T, we have the new system dZ/dt = A'Z, with Z(0) = [x(0)]B'. In the new coordinates the system decouples and becomes
dz1/dt = 2z1
dz2/dt = - 3z2
This decoupled system is easy to solve. The result is
z1(t) = e2tz1(0)
z2(t) = e-3tz2(0)
Transforming back to the original coordinates, we have that X = SD(t)S-1X(0), where D(t) =
```exp(2t)   0
0         exp(-3t)
```
This explicitly solves the problem. However, we still have to explain how to find the eigenvalues and eigenvectors of A.

Finding eigenvalues and eigenvectors Let A be an n× n matrix, and define pA(µ) = det(A - µ I). One can show that pA is a polynomial of degree n. The scalar µ is an eigenvalue of A if and only if it is a root of pA. This follows from two obeservations. First, µ is an eigenvalue of A if and only if for some x not equal to 0 Ax = µ x; this in turn is equivalent to (A - µ I)x = 0, so A - µ I is singular. Second, A - µ I is singular if and only if det(A - µ I) = 0.

• For A, the problem of finding eigenvalues and eigenvectors decouples. One can first find the roots of pA(µ), and then solve a linear system to get the eigenvectors.
• If the scalars are the complex numbers, then the Fundamental Theorem of Algebra implies that pA has at least one root. Consequently, A has at least one eigenvalue.
• To find the eigenvalues and eigenvectors of a linear transformation L, choose a basis for V and find the matrix of L relative to this basis, ML. The eigenvalues of L are precisely those of ML; the eigenvectors of L have coordinates corresponding to the eigenvectors of ML.
Returning to our earlier trajectory problem in which A =
```1  4
1 -2```
we see that pA(µ) = det(A - µ I) = µ2 + µ - 6 = (µ+3)( µ-2). Thus the eigenvalues are µ = 2 and µ = -3. Now, we can solve for the eigenvectors. When µ = 2, the corresponding eigenvector X satisfies (A - 2I)X = 0. In augmented form, this becomes
```-1   4  0
1  -4  0
```
which has X = x2·[4 1]T as a solution. Repeating the argument for µ = -3 results in X = x2·[-1 1]T. To get the basis we want, we choose x2 = 1 in both cases. Other nonzero values will work equally well.

22 October

Diagonalization

Diagonlizable linear transforms A linear transformation T : V -> V is diagonalizable if and only if there is a basis B for V comprising only eigenvectors of L. When the happens ML, the matrix of L relative to the basis B will be a diagonal matrix. Conversely, if there is a basis relative to which the matrix of L, ML, is diagonal, then that basis will be composed of eigenvectors.

A non-diagonalizable linear transformation The shear transformation L[x] = x + ax2i, where a > 0, has the matrix ML =
```1 a
0 1
```
This matrix is not diagonalizable. It's only eigenvalue is µ=1 and the only eigenvectors have the form c·[1 0]T, where c is a scalar. There is no second linearly independent eigenvector, and so there is no basis of eigenvectors.

Applications

Normal modes of a spring system We consider a horizontal spring system consisting of three identical springs (constant = k) and two identical masses (mass = m), all attached to wall mounts. The displacements from rest of the first and second masses are x1 and x2, respectively. Newton's laws for the system give these equations of motion
md2x1/dt2 = -2kx1 + kx2
md2x2/dt2 = kx1 - 2kx2
We can put these equations in matrix form. Let X = [x1 x2]T. The equations become the single matrix equation
md2X/dt2 = - kAX, where the matrix A =
``` 2  -1
-1   2
```
A normal mode of this system is a solution of the form X(t) = Xweiwt, where Xw is independent of t and not equal to 0. Plugging this solution back into the matrix equation yields, after cancelling eiwt,
AXw = (mw2/k) Xw
We see that Xw is an eigenvector of A corresponding to the eigenvalue µ = mw2/k. Thus we need to find the eigenvalues and eigenvectors of A. As usual, we obtain the characteristic polynomial:
pA(µ) = det(A - µI) = (2 - µ)2+1.
The roots are µ = 1 and µ = 3, and the corresponding eigenvectors are [1 1]T and [-1 1]T. In writing the normal modes, we use both e±iwt or, equivalently, cos(wt) and sin(wt).
[1 1]Texp(±i(k/m)½t)   frequency = (k/m)½
[-1 1]Texp(±i(3k/m)½t)   frequency = (3k/m)½
Physically, in the lower frequency mode both masses move in tandem together, -> -> or <- <-. In the higher frequency mode, they move exactly opposite each other, -> <- or <- ->.

A circuit We want to analyze the circuit shown below. In the circuit, E(t) is a voltage source, R1 and R2 are resistors, L is a coil, and C is a capacitor. The state variables for this system are I=IL, the current through L, and V = VC, the voltage drop across C.

Recall that we have these relations among currents and voltages in the circuit components.
VL = L dI/dt
IC = CdV/dt
VR1 = R1(I + IC) = R1(I + CdV/dt)
VR2 = R2I
Kirchoff's laws for this circuit are as follows. For the loop E-R1-C, we have
E = R1(I + CdV/dt) + V
For the loop R2-L-C. we have
V = R2I + L dI/dt
Rearranging these equations gives us
dI/dt = V/L - R2I/L
dV/dt = (E - V)/(CR1) - I/C
We want to put this in matrix form. Let X = [I V]T, F(t) = [0 E/(CR1)]T, and finally let A =
 - R2/L    1/L - 1/C   - 1/(CR1)
Doing so puts the system in the form dX/dt = F(t) + AX. Of course, we also need to know the initial conditions I(0) and V(0).

Special case - complex eigenvalues Assume that
E(t) = 0, L =1 henry, C = 1 farad, R1 = R2 = 1 ohm.
Then, we have
A =
```-1  1
-1 -1
```
The characteristic polynomial for this matrix is pA(µ) = (-1 -µ)2 + 1. The two eigenvalues are then
µ± = -1 ± i.
The augmented matrix representing the system (A -µ+I)X = 0 is
```-i   1  0
-1  -i  0
```
This is equivalent to the single equation -ix1 + x2 = 0. Up to nonzero multiples, the eigenvector for µ+ is [1 i]T. Either by repeating these steps with µ- or by taking complex conjugates, we have that the eigenvector for µ- is [1 -i]T. As in our previous examples, we form the change-of-basis matrix S = SB' -> B =
```1   1
i  -i
```
Setting Z = SX, the system becomes
dz1/dt = µ+z1 = (-1+i)z1
dz2/dt = µ-z2 = (-1-i)z2
Consequently, z1(t) = e(-1+i)t z1(0) and z2(t) = e(-1-i)t z2(0). Finally, X(t) = S-1D(t)SX(0), where D(t) =
 e(-1+i)t   0 0   e(-1-i)t
It's easy to show that if X(0) is a real vector, then, even though complex numbers are involved, X(t) will be real.

24 October

Similarity

Similar matrices   Recall that when we change coordinates, the two matrices representing a linear transformation L are related by M'L = S-1MLS. We say that an n×n matrix B is similar to an n×n matrix A if there exists an invertible matrix S such that B = S-1AS. Simple properties of similarity are listed below. We remark that these properties together make similarity an equivalence relation.

• B is similar to A if and only if A is similar to B. (Thus, we can simply say that A and B are similar matrices.)
• A is similar to A.
• If A is similar to B, and B is similar to C, then A is similar to C.

Diagonalizable matrix A linear transformation L was defined to be diagonalizable if there was a basis relative to which its matrix ML was diagonal. If we regard an n×n matrix A as a linear transformation, then the condition for it to be diagonalizable is that there is a matrix S such that S-1AS is a diagonal matrix. Equivalently, A is diagonalizable if and only if it is similar to a diagonal matrix.

Determinants - A Quick Tour

Permutations Permutations of the integers 1 through n are either even or odd. A permutation is even if it can be achieved by an even number of interchanges (transpositions), and odd if it takes an odd number of them. There are n! permutations. We define the function sgn(p) to be +1 if p is an even permutation, and -1 if p is odd.

Definition of a determinant If A is an n×n matrix, then we define det(A) via
det(A) = SUMp sgn(p) ai1,1ai2,2 ...ain,n ,   p = (i1,i2,...,in)

Basic properties of determinants These properties follow immediately from the definition. On the other hand, they characterize the determinant. Only det(A) satisfies them.

1. Multilinearity. Let A =[a1,a2,...,an]. Then det([a1,a2,...,an]) is a linear function of column ak, if all other columns are held fixed.
2. Alternating function. Interchanging two columns changes the sign of the determinant.
3. det(I) = 1, I = identity matrix.

Determinants and matrices
1. If two columns of A are equal, then det(A)=0.
2. If a column of A has all 0's, then det(A)=0.
3. Product rule
If A and B are n×n matrices, then det(AB)=det(A)det(B).
4. A is singular if and only if det(A) = 0.
5. det(A-1) = (det(A))-1
6. det(AT) = det(A)
7. det([a1,a2,...,ak-1, ak + caj, ak+1,...,an]) = det([a1,a2,...,an]) (j not equal to k).
8. det([a1,a2,...,cak, ...,an]) =c det([a1,a2,...,an])

Characteristic polynomial
Structure pA(µ) = (-1)n µn + (-1)n-1 µn-1 trace(A) + ... + det(A), where trace(A) = a11+a22+...+ann
Degree The degree of the characteristic polynomial for an n×n matrix is precisely n.
Similar matrices If B = S-1AS, then pB(t) = pA(t).

Necessary conditions for a matrix to be diagonalizable

Theorem. If an n×n matrix A has n distinct eigenvalues, then A is diagonalizable.

Proof. Let µ1, µ2,..., µn be the n distinct (possibly complex) eigenvalues of A; equivalently, these are the roots of pA. Let v1, v2,..., vn. Be n eigenvectors corresponding to these n eigenvalues. We will show that these eigenvecotrs form a linearly independent set of vectors. We start with the equation
c1v1 + c2v2 + ... + cnvn = 0
Consider the matrix P1 = (µ2I - A)( µ3I- A)...(µnI - A). Using the fact that the vk's are eigenvectors of A, we have that
P1vk = 0 if k=2,3,...,n
P1v1 = (µ2 - µ1)( µ3- µ1)...(µn - µ1)v1,
so applying P1 to both sides of the previous equation results is
c12 - µ1)( µ3- µ1)...(µn - µ1)v1 = 0
Since v1 is not 0, and since the eigenvalues are distinct, we have that c1=0. Repeating this procedure with the remaining coefficients implies that all of the c's are 0, and so the set is a basis because it is a maximal linearly independent set.

Adjoints   The adjoint of a linear transformation L : V -> V, where V is an inner product space, is the unique linear transformation L' that satisfies
< L'[u],v > = < u, L[v] > .
For a real m×n matrix A, the adjoint is the transpose AT. If the matrix is complex, then the adjoint of A is AH, the conjugate transpose. A matrix is selfadjoint or Hermitian if it is equal to its own adjoint. Thus, if A is real, A = AT; i.e., A is symmetric.

1. The eigenvalues of a selfadjoint matrix are real.
2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
3. Every selfadjoint matrix is diagonalizable.
4. Every selfadjoint matrix has an orthonormal basis relative to which it is diagonal. Equivalently, for a real n×n matrix, there is an orthogonal matrix S such that STA S is a diagonal matrix.

Q(X) = 5x2 + 5y2 + 5z2 + 2xy + 2xz + 2yz = XTAX, where X = [x y z]T and A =
```5 1 1
1 5 1
1 1 5
```
We wish to find coordinates relative to which the "cross terms" are removed. A set of axes that has this property is called principal. Since A is selfadjoint, it can be diagonalized by an orthogonal transformation S. That is, there is a diagonal matrix D such that D = STAS. Because S is orthogonal, we also have A = SDST. Now, let Y = [u v w]T = STX. We see that
XTAX = YTDY = µ1u2 + µ2v2 + µ3w2,
and the cross terms have been transformed away by rotating and (possibly) reflecting coordinates.

We now will finish the probelm by finding the eigenvalues and eigenvectos of A. The characteristic polynomial of A is pA(µ) = det(A - µI) =
 5-µ  1   1 1   5-µ  1 1    1   5-µ
We do not change the determinant by subtracting (5-µ)× column 2 from column 1. Thus, pA(µ) =
 0         1       1 1-(5-µ)2   5-µ  1 1-(5-µ)    1   5-µ
We can factor 1-(5-µ) = µ-4 out of the first column. The result, after simplifying, is that pA(µ) = (µ-4)×
 0        1   1 6+µ   5-µ  1 1        1   5-µ
We now subtract the last column from the second. We obtain pA(µ) = (µ-4)×
 0        0    1 6+µ   4-µ   1 1      µ-4   5-µ
Next, we factor µ-4 out of the second column to get pA(µ) = (µ-4)2×
 0    0    1 6+µ 1   1 1   -1   5-µ
Evaluating the last 3×3 determinant is easily done. The final result is
pA(µ) = (µ-4)2(7-µ)
Now we will find the eiegnvectors correspond to µ=4 and µ=7. For µ=4, the augmented matrix [A-4I|0] is
 1 1 1 0 1 1 1 0 1 1 1 0
This system is equivalent to the single equation x+y+z=0. There are two linearly independent solutions:
[-1 1 0]T and [-1 0 1]T.
Any linear combination of these is still an eigenvector. Using the Gram-Schmidt process, we can convert this to an orthonomal set,
2[-1 1 0]T and 6[-1 -1 2]T
The eigenvector for µ=7 is found by solving the system with augmented matrix [A-7I|0]
 -2 1 1 0 1 -2 1 0 1 1 -2 0
The resulting eigenvector is 3[1 1 1]T. Finally, S =
 -2-½ -6-½   3-½ 2-½  -6-½   3-½ 0     2·6-½   3-½
Relative to the new coordinates, the quadratic form becomes YTDY = 4u2+4v2+7w2.

29 October

Triangular forms of matrices

Motivation We return to the circuit we were working with earlier. (See the class notes for October 22.) Assume that the various components have these values:
E(t) = 0, L =1 henry, C = 1 farad, R1 = 1 ohm, R2 = 3 ohm.
The matrix equation for the circuit is dX/dt = AX + F(t), where X = [I V]T. With these values for the components, F(t) = 0, and
A =
```-3  1
-1 -1
```
The characteristic polynomial for this matrix is pA(µ) = µ2 + 4µ + 4 = (µ+2)2. Thus, there is only one eigenvalue, µ = -2. The augmented matrix representing the system (A -µI)X = 0 is
```-1   1  0
-1   1  0
```
This is equivalent to the single equation -x1 + x2 = 0. Hence, up to nonzero multiples, the eigenvector for µ = -2 is [1 1]T. There are no other linearly independent eigenvectors, and so A is not diagonalizable. What this means is that the system of equations doesn't completely decouple. It does partially decouple, though. Consider a new basis B' = {[1 1]T, [0 1]T}. As before, let S = SB'->B =
```1  0
1  1
```
Setting Z = SX, the system becomes dZ/dt = S-1ASZ. Doing a little matrix algebra, we see that S-1AS =
```-2  1
0 -2
```
This is called the Jordan normal (canonical) form of A. The new system is
dz1/dt = -2z1 + z2
dz2/dt = -2z2
Solving the second equation gives us z2(t) = e-2t z2(0). Substituting this into the first equation results in a nonhomogeneous, linear, first order differential equation,
dz1/dt = -2z1 + e-2t z2(0).
We can solve this using integrating factors. (See an ODE text for an explanation). The solution is
z1(t) = e-2t z1(0) + te-2t z2(0)
Finally, X(t) = S-1T(t)SX(0), where T(t) =
 e-2t  te-2t 0     e-2t

Block upper triangular form   Every matrix A has a basis relative to which it is in block triangular form. This means that we can find an invertible matrix S such that S-1AS =
 T1,1 0 0 ... 0 0 T2,2 0 ... 0 ... ... ... ... ... 0 0 0 ... Tr,r
Each diagonal block Tk,k is upper triangular, with the diagonal entries all being an eigenvalue µk repeated as many times as it is a root of the characteristic polynomial. For example, if µ7 is repeated four times, then T7,7 =
 µ7 * * * 0 µ7 * * 0 0 µ7 * 0 0 0 µ7

Jordan normal (canonical) form   Every matrix A has a basis relative to which the blocks Tk,k are Jordan blocks, Jm(µ). This is an m×m matrix with µ's down the diagonal, 1's down the superdiagonal, and 0's elsewhere. For example, if m = 6 and µ = 3, then J6(3) =
 3 1 0 0 0 0 0 3 1 0 0 0 0 0 3 1 0 0 0 0 0 0 3 1 0 0 0 0 0 3
Two matrices having the same Jordan normal form, apart from ordering of the blocks along the diagonal, are similar. An m×m matrix A is similar to Jm(µ) if and only there is a basis {f1, ..., fm} satisfying

Af1 = µf1
Af2 = µf2 + f1
...

Afm = µfm + fm-1

Example   Consider the matrix A =
```2  1 -1
0  2  3
0  0  2
```
We begin with the eigenvector, f1 = (1,0,0)T. Solving (A - 2I)f2 = f1 gives f2 = (0,1,0)T. Finally, solving (A - 2I)f3 = f2 gives f3 = (0,1/3,1/3)T. Thus, S-1AS = J3(2), where S = [f1, f2, f3]

Midterm review

31 October
Midterm

5 November
Tensors

Invariance Physical laws should be formulated in ways that are independent of any coordinates used; that is, the laws should be stated in such a way that they are invariant under transformation of coordinates. For scalar quantities this has a simple meaning. If we know the temperature on a surface T(P) in u-coordinates, so that T=f(x). Then in another coordinate system, x = g(x'), the temperature is given by T=f(g(x')). The law of transformation is simply composition of functions (substitution).

Vectors Vectors are displacements in ordinary three dimensional space, or physical quantities, like velocity, acceleration, force, and so on, that can be represented by such displacements. Invariance means describing these vectors in a way that is independent of the choice of coordinates. Indeed, tensor algebra can be viewed as linear algebra for such displacements.

Affine coordinates We will begin with an affine coordinate system in 3D space. This simply means that the axes we use for coordinates are three lines that intersect at the origin and that do not lie in the same plane. These are like the usual x-y-z axes, except that they do not need to be perpendicular. We label these axes as x1-x2-x3. A point in space P then has coordinates P(x1, x2, x3). If another point Q has coordinates
Q(x1+dx1, x2+dx2, x3+dx3),
then the displacement dx = PQ is described by the column vector [dx1 dx2 dx3]T relative to a basis of displacements corresponding to the x1-x2-x3 system. In particular, the three displacements fj corresponding to PQj, where the Qj's are given by
Q1(x1+1, x2, x3),
Q2(x1, x2+1, x3),
Q3(x1, x2, x3+1),
provides a basis for the 3D displacements, as long as we are using an affine system of coordinates. We let B = {f1,f2,f3} be the basis formed by these displacements.

Transformation laws

Change of coordinates We now consider a change to another affine coordinate system with axes x'1-x'2-x'3.
x'1 = J1,1x1 + J1,2x2 + J1,3x3 + b1
x'2 = J2,1x1 + J2,2x2 + J2,3x3 + b2
x'3 = J3,1x1 + J3,2x2 + J3,3x3 + b3.
For future reference, note that the 3×3 matrix J, which is called the Jacobian matrix of the coordinate transformation, has entries given by

Relative to the primed system, we have a basis B' = {f1',f2',f3'}. The new components for the displacement dx are related to the old via these equations:
dx'1 = J1,1dx1 + J1,2dx2 + J1,3dx3
dx'2 = J2,1dx1 + J2,2dx2 + J2,3dx3
dx'3 = J3,1dx1 + J3,2dx2 + J3,3dx3.
In matrix form, this becomes [dx]B' = J[dx]B. Earlier, we developed formulas for change of bases. The notation we are using here corresponds to the earlier notation in the following way:
B <-> B,   fk <-> vk
B' <-> D,   fk' <-> wk
[dx]B' = J[dx]B <-> [v]D = A[v]B; thus, J <-> A and J-1 <-> C = A-1
fk = J1,kf1' + J2,kf2' + J3,kf3' <-> vj=A1jw1 + A2jw2 + ... + Anjwn   (JT <-> AT)
fk' = J-11,kf1 + J-12,kf2 + J-13,kf3 <-> wk=C1kv1 + C2kv2 + ... + Cnkvn   ( (J-1)T <-> CT )

Dual space The dual space to the space of 3D displacements includes physical quantities such a work done by a force in displacing a mass dx. Recall that the dual space is composed of linear functions that take vectors to scalars. For the basis B = {f1,f2,f3}, the dual basis B* = {f1,f2,f3} satisfies fj(fj) = 1 and fk(fj) = 0 for j not equal to k. Let L be a linear functional on 3D displacements. Expand L using B* to get
L = y1f1 + y2f2 + y3f3,
and thus the coordinate vector for L relative to B* is
[L]B* = [y1 y2 y3]T.
Recall that if dx is a displacement (i.e., a vector), then
L(dx) = [L]B*T[dx]B
Now, L(dx) is independent of our choice of bases, so if we change everything to a new basis B', then we will have
L(dx) = [L]B'*T[dx]B'.
Since [dx]B' = J[dx]B, we see that
[L]B*T[dx]B = [L]B'*TJ[dx]B = (JT[L]B'*)T [dx]B,
which holds for all displacements dx. Hence, ([L]B* - JT[L]B'*)T[dx]B = 0 holds for all [dx]B. By choosing appropriate values for [dx]B, we obtain
[L]B* = JT[L]B'*.
Finally, we arrive at the transformation law for [L]B*:
[L]B'* = (JT)-1[L]B*.
This also provides us with the transformation law for the dual basis itself.
fj' = Jj,1f1 + Jj,2f2 + Jj,3f3, j=1,2,3.

Summary In the table below, we list the transformation laws for bases and components. In all cases, the coefficients refer to equating primed quantities to linear combinations of unprimed quantities -- e.g., [dx]B' = J[dx]B.
 Displacements Dual space Basis (JT)-1 J Components J (JT)-1
Looking at the table above, we see that that there are two types of transformation laws. Quantities that transform the same way as the basis vectors B are called covariant. The components of displacements transform in a way roughly inverse to the basis. For this reason, they are called contravariant. The vectors in the dual basis are contravariant, and the components of dual vectors transform covariantly. Covariant quantities are denoted by subscripts, and contravariant quantities by superscripts. Thus fj is covariant and fj is contravariant.

The metric tensor

Distance The distance between two points is space is the square root of
ds2 = dx·dx,
where the "dot" denotes the usual dot product in 3D. We want to put this in terms of components. It is in fact a little easier to look at the dot product of two displacements, dx·dy, where
dx = dx1f1 + dx2f2 + dx3f3 and dy = dy1f1 + dy2f2 + dy3f3
Taking the dot product of these vectors results in a quadratic form with nine terms,
dx· dy = f1·f1 dx1dy1 + f2·f2 dx2dy2 + f3·f3 dx3dy3 + f1·f2 dx1dy2 + f2·f1 dx2dy1 +
f1·f3 dx1dy3 + f3·f1 dx3dy1 + f2·f3 dx2dy3 + f3·f2 dx3dy2.
We want to put this in matrix form. Define the matrix g with entries
gj,k = fj·fk.
The result is that we can write dx·dy as
dx·dy = [dy]BTg[dx]B

Transformation laws The matrix g contains the components of the metric tensor relative to the basis B. Using the invariance of the dot product or the transformation laws for basis vectors,one can easily show that
g'=(JT)-1g J-1.
Equivalently, we can write out the transformation laws for the components:
g'j,k = SUMm,ngm,n (JT)-1m,j (JT)-1n,k

Simple properties of the metric tensor g We close by noting that the matrix g is actually a Gram matrix. As such is invertible and symmetric. In addition, it has the property that it is positive definite. All of its eigenvalues are positive.

7 November

Reciprocal bases and dual bases

Reciprocal basis Let B = {f1,f2,f3} be a basis for the displacements in 3D. We want to define a new basis for the displacements, one that behaves like a dual basis but that is still in 3D. We are looking for vectors {f1,f2,f3} with the property that
fj·fj = 1
and
fj·fk = 0, if j is not equal to k. Every basis can be written as linear combinations of vectors in B. That is, for any other basis, there will be coefficients aj,k such that for j=1,2,3,
fj = aj,1f1 + aj,2f2 + aj,3f3
In particular, we have
fj·fk = aj,1f1·fk + aj,2f2·fk + aj,3f3·fk
Now, recall that the metric tensor gj,k = fj·fk. Hence, we may rewrite the set of equations above as
fj·fk = aj,1g1,k + aj,2g2,k + a3,3g3,k
The sum above is just the j,k component of the matrix product ag. The conditions for a reciprocal basis will be satisfied if and only if [ag]jj = 1 and [ag]jk = 0 for j not equal to k. This means that ag = I, the identity matrix. Hence, the new basis will be reciprocal if and only if a = g-1. The matrix g-1 gives rise to a new tensor that is called the conjugate of the metric tensor g. The components of g-1 will be denoted by
gj,k .
The reciprocal basis it generates is then written as
fj = gj,1f1 + gj,2f2 + gj,3f3

Connection with dual space Let L be a linear functional on the space of displacements. Recall that we can write L in terms of the dual basis; its components are then Lk. The result of applying L to dx is
L(dx) = [L]B*T[dx]B = L1 dx1 + L2 dx2 + L3 dx3
This is precisely the same result we would get from L·dx if we regarded L as a displacement, with reciprocal basis representation
L = L1f1 + L2f2 + L3f3.
The point is that linear functionals in the dual space can thus be identified with displacements. In addition, the dual basis may be identified with the reciprocal basis. This also means that if the underlying basis B is changed to B', then the dual basis B* and the reciprocal basis Brecip transform in exactly the same way to B'* and B'recip.

Contravariant and covariant components of a vector

Two ways of representing vectors We can represent a displacement vector v in two ways. First, we can use the basis B = = {f1,f2,f3}. This gives us the following expression:
v = v1f1 + v2f2 + v3f3, and so [v]B = [v1 v2 v3]T
The second way is to use the reciprocal basis Brecip = {f1,f2,f3}, which results in
v = v1f1 + v2f2 + v3f3, and so [v]Brecip = [v1 v2 v3]T.
The relationship between the components is determined by the change-of-basis matrix that relates B and Brecip. As we have seen above, the reciprocal basis is expressed in terms of B using the entries of g-1. The formulas derived earlied for making a change of basis apply here, too. We identify B here with the B used earlier, and Brecip with the basis D. Thus we identitfy g-1 and the matrix CT used earlier, and we obtain [v]B = (g-1)T[v]Brecip. However, both g and g-1 are symmetric, so that in fact we have
[v]B = g-1[v]Brecip or, equivalently, [v]Brecip = g[v]B

Transformation laws If we introduce a coordinate transformation that introduces a new basis B', then we have seen that the metric tensor transforms according to the law g'=(JT)-1g J-1 and that [v]B' = J[v]B. It follows that
[v]B'recip = g'[v]B' = (JT)-1g J-1J[v]B = (JT)-1g[v]B = (JT)-1[v]Brecip.
Thus the components relative to the reciprocal basis transform covariantly.

Representations of a Vector v
Original Basis Reciprocal Basis
Representation   v = v1f1+ v2f2+ v3f3   v = v1f1+ v2f2+ v3f3
Components   vj = fj·v = SUMkgj,kvk   vj = fj·v = SUMkgj,kvk
Transformation matrix J (JT)-1
Transformation law Contravariant Covariant

The Inertia Tensor
See §2.4.3 in Borisenko and Tarapov.

12 November

Examples of Tensors

The stress tensor See §2.4.2 in Barisenko and Tarapov.

The deformation (strain) tensor See §2.4.4 in Barisenko and Tarapov.

Curvilinear coordinates

Generalized coordinates Examples: cylindrical coordinates and spherical coordinates. See §2.8 in Barisenko and Tarapov.

Coordinate surfaces Barisenko and Tarapov, §2.8.1.

Coordinate curves Barisenko and Tarapov, §2.8.2.

14 November

Bases and reciprocal bases in generalized coordinates

Basis vectors Let qj = qj(x1, x2, x3), j=1,2,3, be a set of generalized coordinates. The xj's are cartesian coordinates. There are three coordinate curves associated with the qj's. If x is the usual radius vector to a point in three dimensional space, then x = x(q1, q2, q3), and the three curves are
1. x = x(t, q2, q3)   (q1-curve)
2. x = x(q1, t, q3)   (q2-curve)
3. x = x(q1, q2, t)   (q3-curve)
Let's look at the q1-curve. The velocity vector tangent to this curve at any time t is dx/dt. This can be computed using the chain rule:

Set t = q1. This gives us our first basis vector, e1. The others are defined in the same way. That is, for j=1,2,3, we set

Together, these form a basis for the 3D displacements. The basis vectors, however, do depend on the point described by the coordinates q1, q2, q3. This means that a different basis is associated with each point in three dimensional space.

Reciprocal basis vectors The reciprocal basis for {e1, e2, e3} can be obtained via the cross product. For example,
e1 = W-1e2×e3,   W = e1·e2 ×e3.
The others are defined by cyclic permutation of the indices involved. (W stays the same, of course. See problem 4 in Assignment 4.)

There is another way to view the reciprocal basis vectors, a way that is similar to viewing basis vectors as tangent vectors to the coordinate curves. Let F(x1, x2, x3) = C be the level surfaces for for a function F. Recall that at a point P(x1, x2, x3), the vector ∇F is normal to the plane tangent to F=C at P. Applying this to the coordinate surface q3(x1, x2, x3) = c3 = constant, we see that ∇q3 is perpendicular to the tangent vectors to the q1 and q2 coordinate curves. Since these tangent vectors are precisely the two basis vectors e1 and e2, it follows that ∇q3 is parallel to e1×e2 and hence to e3. In fact, we will show that they are equal.

Proposition For j=1,2,3, ∇qj = ej

Proof: We will show the case j=3. The others are identical. The q3 coordinate curve through a point P is given by x = x(q1, q2, t). On this curve, we have
q3(x) = t.
If we differentiate both sides with respect to t and use the chain rule, we will get
∇q3·dx/dt = 1.
Since e3 = dx/dt in this case, we have that ∇q3·e3 = 1. As we have already mentioned, ∇q3·e1 = 0 and ∇q3·e2 = 0. By definition, ∇q3 = e3. This completes the proof.

The metric tensor and volume element We remark that the metric tensor is obtained from the dot product of two displacements dx·dy, relative to the basis B = {e1, e2, e3}. The result is dx·dy = [dy]BTg [dx]B, where the metric tensor g has components gj,k = ej·ek. From problem 5c, Assignment 9, we have that the volume element is given by
dV = G½dq1dq2dq3, where G = det(g).

Cylindrical coordinates In cylindrical coordinates, (r,θ,z), we have that
er = cos(θ)i + sin(θ)j
eθ = -r sin(θ)i + r cos(θ)j
ez = k
From the proposition we proved above, we can calculate the reciprocal basis vectors.
er = ∇r =(x/r)i + (y/r)j = cos(θ)i + sin(θ)j =er.
Similarly, we have
eθ = ∇θ = - (y/r2)i + (x/r2)j = r-2eθ
and
ez = ez = k.
The metric tensor is g =
 1 0 0 0 r2 0 0 0 1
This is frequently written as a quadratic form giving the square of the arc length element ds:
ds2 = dr2 + r22 + dz2.
Finally, since G = det(g) = r2, we see that the volume element is dV = rdrdθdz.

19 November

Summary of vectors and tensors in a fixed coordinate system

The basis B Points in 3D are described by coordinates (q1,q2,q3). The radius vector x = x(q1,q2,q3). The three vectors derived from the radius vector via ej = ∂x/∂qj form a basis B = {e1, e2, e3}. This basis may vary from point to point in space. Here we are concerned with only what happens at a single point.
Here is a 2D example. Suppose we have x = (q1 + q2)i + ( (q1)2 - q2)j. The basis for this case is
e1 = ∂x/∂q1 = i + 2q1j
e2 = ∂x/∂q2 = i - j

The metric tensor Recall that ds2 = [dx]BTg[dx]B = ∑gj,kdqjdqk, where gj,k = ej·ek. The inverse of this matrix, g-1, is also important. We will denote its enrties by gj,k. We point out that g is also the Gram matrix for the basis B.

In the 2D example, g =
 1+4(q1)2 1-2q1 1-2q1 2
and g-1 = (1+2q1)-2×
 2 2q1-1 2q1-1 1+4(q1)2

The reciprocal basis Br The reciprocal basis Br = {e1, e2, e3} may be constructed in three different ways.
1. Gradient ej = ∇ qj.
2. Cross product e1 = W-1e2×e3,   W = e1·e2 ×e3, where the others are defined by cyclic permutation of the indices involved.
3. Metric tensor We may directly use the definition ej·ek = δjk to find the reciprocal vectors in terms of the basis B. The result is simply that
ej = ∑ gj,mem.
This is easy to verify:
ej·ek = ∑ gj,mem·ek = ∑ gj,m gm,k = [g-1g] jk = δjk

Returning to our 2D example. Far and away the easiest method is the third one above. The reciprocal basis for this case is
e1 = 2(1+2q1)-2e1 + (2q1-1)(1+2q1)-2e2
e2 = (2q1-1)(1+2q1)-2e1 + (1+4(q1)2)(1 + 2q1)-2e2
These can also be written in terms of {i,j}.

Vectors & components We can represent any vector (displacement) v using B-coordinates or Br coordinates. That is,
v = v1e1 + v2e2 + v3e3  ( [v]B = [v1 v2 v3]T, contravariant)
or
v = v1e1 + v2e2 + v3e3  ( [v]Br = [v1 v2 v3]T, covariant)
The two column vectors are related via the equations
[v]Br = g[v]B and [v]B = g-1[v]Br
Tensor notation directly writes out the matrix products:
vj = ∑ gj,k vk and vj = ∑ gj,k vk

Back to the 2D example. Using the g and g-1 from before, we can write covariant components in terms of contravariant ones,
v1 = (1+4(q1)2)v1 + (1-2q1)v2
v2 = (1-2q1)v1 + 2v2 ,
or the contravariant components in terms of the covariant components,
v1 = 2(1+2q1)-2v1 + (1+2q1)-2(2q1-1)v2
v2 = (1+2q1)-2(1+4(q1)2 )v1 + (1+2q1)-2(2q1-1)v2

Tensors & components We regard tensors as linear transformations on the spaces associated with 3D displacements. Because most problems involve only linear transformations that take vectors to vectors, we will concentrate on these. To that end, suppose that T is a linear transformation that takes 3D displacements to 3D displacements; that is,
T(v) = w
Recall that we can represent T by a matrix, given bases for the inputs and the outputs. We list these below.

Input basis    Output basis    Matrix  Column k    (j,k)-entry  Tensor type
B B M [T(ek)]B Tj k Mixed
B Br N [T(ek)]Br Tj kCovariant
Br Br P [T(ek)]Br Tj kMixed
Br B Q [T(ek)]B Tj kContravariant

The names under the heading matrix are arbitrary labels. They are used only here and nowhere else. Using the change of basis formulas from the previous section, we can write all of the matrices in terms of M, g, and g-1.
N = gM and Tj k = ∑gj mTm k
P = gMg-1 and Tj k = ∑gj m gk n Tm n
Q = Mg-1 and Tj k = ∑gk m Tj m
The point is that once one set of components is determined, so are all of the rest.

Higher order tensors The order or rank of a tensor is the number of indices reuired to specify its components. The tensors described above are all order 2 tensors. Vectors are order 1 tensors, and scalars are order 0 tensors. An order 4 tensor that arises in elasticity theory relates the stress and strain (deformation) tensors via a generalized Hooke's law. (See pg. 209, J. L. Synge and A. Schild, Tensor Calculus, Dover, New York, 1978.) The purely contravariant form of such a tensor would have four superscript indices,
Tijkl.
To change to a mixed form where the second index is covariant but the others are contravariant, one need only multiply by gm k and sum over k. This results in lowering the third index.
Tijml = ∑k gm k Tijkl
To move to a completely covariant form, one uses the same operation on all indices,
Tmnpq = ∑ijkl gm i gn j gp k gq l Tijkl
The process may be reversed. So if we start with the third index being covariant and the remaining contravariant, we can raise the index as follows:
Tijkl = ∑m gk mTijml.

Summary of vectors and tensors under coordinate transformations

The bases B′ and B′r The setting here is that we start with underlying coordinates (q1, q2, q3) and make a change to the set (q'1, q'2, q'3). The key in all of this is to first find out how the coordinate vectors for B and B' are related. This is derived from the Jacobian matrix of the transformation of coordinates. Namely, we have
dq′ j = ∑(∂q′ j/∂qk) dqk.
In terms of matrices, this equation means that displacements transform this way:
[dx]B′ = ∂q′/∂q [dx]B.
Note that by reversing the roles of q and q′, we also have
[dx]B = ∂q/∂q′ [dx]B′.
Of course, the matrices ∂q′/∂q and ∂q/∂q′ are actually inverses of each other. We have derived the relationships among various components of vectors earlier. In tensor notation, these are
v′j = ∑∂q k/∂q′ j vk (equivalently, [v]B′r = (∂q/∂q′)T [v]Br
ej = ∑∂q k/∂q′ j ek
e j = ∑∂q′ j/∂qk ek

Tensors of order 2 We can now look at how oreder 2 tensors transform. In particular, let us see what happens to the matrix representing T relative to the input and output bases B. This was the matrix M with mixed components
Tj k .
Changing to the B′ system, we have from our standard change-of-basis methods
M' = ∂q′/∂q M (∂q′/∂q )-1 = ∂q′/∂q M ∂q/∂q′
It then follows that
T′ j k = ∑(∂q′ j/∂qm) (∂q n/∂q′ k) Tm n
One also has that these hold:
N′ = ∂q/∂q′T N ∂q/∂q′ and T′j k = ∑(∂q m/∂q′ j)(∂q n/∂q′ k) Tm n
P′ = ∂q/∂q′T P ∂q′/∂qT and T′j k = ∑(∂q m/∂q′ j) (∂q′ k/∂qn) Tm n
Q′ = ∂q′/∂q Q ∂q′/∂qT and T′ j k = ∑(∂q′ j/∂qm)(∂q′ k/∂qn) Tm n

Higher order tensors Consider a rank 4 tensor Tijml. Using this tensor, we will illustrate how the components change if we change coordinates to q′ j. The result is
T′ ijkl = ∑ (∂q′ i/∂qm) (∂q′ j/∂qn) (∂q p/∂q′ k) (∂q′ l/∂qr) T mnpr
Other cases are treated analogously.

The differential of a scalar quantity Temperature is a good example of a scalar quantity. To avoid notational problems, we will use τ to designate it. We certainly know that the temperature can vary from point to point in a region. We want to measure how much it changes if we move from a given point x to a nearby point x+dx. Recall from several variable calculus that
τ(x+dx) = τ(x) + ∇τ(x)·dx + o(|dx|)
where o(|dx|) represents terms that vanish faster than |dx|. The linear term
dτ = ∇τ·dx
is called the differential of τ at x, and ∇&tau = ∇&tau(x) is the gradient of τ.

The gradient in generalized coordinates Let us introduce generalized coordinates, x = x(q1,q2,q3). Thus we may regard τ as a function of the q's. Again appealing to several variable calculus, we have
dτ = ∑j (∂τ/∂qj) dqj
= ∑jk (∂τ/∂qj) dqk δjk
= ∑jk (∂τ/∂qj) dqk ej·ek
= (∑j(∂τ/∂qj)ej) · (∑kekdqk)
= (∑j(∂τ/∂qj)ej) ·dx
From this, we see that the gradient has the following expression in generalized coordinates:
∇τ = ∑j(∂τ/∂qj)ej.
For example, in cylindrical coordinates, we would have
∇τ = ∂τ/∂r er + ∂τ/∂θ eθ + ∂τ/∂z ez.

21 November

Line integrals

Curves and Green's Theorem  To state Green's theorem, we need to discuss simple, closed curves. These are closed curves, like circles, but the do not intersect themselves. Rectangles, triangles, circles, and ellipses are simple closed curves; figure eights are not. Simple closed curves divide the plane into two nonoverlapping regions, one interior and the other exterior. It forms the boundary of both regions. We will consider simple closed curves that are piecewise smooth, which just means that we are allowing a finite number of corners. We also say that a simple closed curve is positively oriented if it is travered in the counterclockwise direction. Here is the statement of Green's Theorem:
Green's Theorem Let C be a piecewise smooth simple closed curve that is the boundary of its interior region R. If F(x,y) = A(x,y)i + B(x,y)j is a vector-valued function that is continuously differentiable on and in C, then
Surface integrals

Surfaces  See my notes, Surfaces. In addition to discussing ways of representing surfaces, we discussed computing surface area elements, normals to surfaces, and related topics.

Flux integrals  Consider the steady state velocity field V(x) of a fluid. We want to calculate the amount of fluid crossing a surface parametrized by x = x(u1, u2). Let f1 and f2 be partials of x with respect to the parameters u1and u2. We consider an element of surface area, shown below as the base of the parallelepiped. Our first step is to calculate the fluid crossing this surface element. In time t to t+dt, the volume of fluid crossing the base of the parallelepiped equals its volume, (Vdt)·f1×f2 du1du2.

Vdt
f1du1                         f2du2

The mass of the fluid crossing the base in time t to dt is then density×volume, or
Vdt)·f1×f2 du1du2
Thus the mass per unit time crossing the base is F·N du1du2, where F = µV, and N = f1×f2 is the standard normal. Recall that the area of the surface element is dS = |N|du1du2. Consequently the mass per unit time crossing the base is F·n dS, where n is the unit normal. Integrating over the whole surface then yields

This surface integral is called the the flux of the vector field F.

26 November

Relating line integrals to surface integrals: Stokes's Theorem

The curl and Stokes' Theorem  Let S be a surface in 3D bounded by a simple closed curve C. We will not be absolutely precise here. One should think of S as a butterfly net, with C as its rim. Such a surface is orientable, and we always have a consistent piecewise continuous unit normal n defined on S. We say that C is positively oriented if in traversing C with the surface on our left, we are standing in the direction of n.

To state this theorem, we also need to define the curl of a vector field
F(x)=A(x,y,z)i + B(x,y,z)j +C(x,y,z)k.
We will assume that F has continuous partial derivatives. The curl is then defined by

There is a useful physical interpretation for the curl. Suppose that a fluid is rotating about a fixed axis with angular velocity ω. Define ω to be the vector with magnitude ω and with direction along the axis of rotation. The velocity of an element of the fluid located at the position with radius vector x is v(x) = ω×x. With a little work, one can show that ω = ½∇×v. Thus one half of the curl of the velocity vector v is the vector ω mentioned above.

Stokes' Theorem  Let S be an orientable surface bounded by a simple closed positively oriented curve C. If F is a continuously differentiable vector-valued function defined in a region containing S, then

Example  Verify Stokes's Theorem for the vector field F(x) = 2yi + 3xj - z2k over the surface s, where S is the upper half of a sphere x2+y2+z2 = 9 and C is its boundary in the xy-plane, the circle x2+y2 = 9. C is traversed counterclockwise.

We will first compute the line integral over C. In the xy-plane, C is parameterized via
x(t) = 3 cos(t) i + 3 sin(t) j,   0 ≤ t ≤ 2π,
and so we have:

dx = (- 3 sin(t) i + 3 cos(t) j)dt
F(x(t)) = 2·3 sin(t)i + 32 cos(t)j - 02k = 2·3 sin(t)i + 32 cos(t)j
F·dx = (- 18 sin2(t) + 27 cos2(t))dt
CF·dx = ∫0(- 18 sin2(t) + 27 cos2(t))dt = 9π
We now turn to finding the surface integral. ∫∫S ∇×F·n dσ. The normal compatible with the orientation of C is n = x/|x| = x/3. Thus, on the surface of the hemisphere S, we have
n = x(θ,φ)/3 = (3 sin(θ)cos(φ) i + 3 sin(θ)sin(φ) j + 3 cos(θ) k)/3,
and hence,
n = sin(θ)cos(φ) i + sin(θ)cos(φ) j + cos(θ) k)
where 0 ≤ θ ≤ ½π and 0 ≤ φ ≤ 2π. Also, the area element is dσ = 32sin(θ)dφdθ. Moreover, it is easy to show that &nabla×F = k. We are now ready to do the surface integral involved:
∫∫S ∇×F·n dσ = ∫∫S k·n
∫∫S ∇×F·n dσ = ∫00½π cos(θ) 32 sin(θ)dφdθ
∫∫S ∇×F·n dσ = 9π
Since both terms in Stokes's Theorem have the same value, we have verified the theorem in this case.

Relating surface integrals to volume integrals: the Divergence Theorem

The divergence of a vector field and the Divergence Theorem  The divergence of a vector field F(x)=A(x,y,z)i + B(x,y,z)j +C(x,y,z)k is defined by

Like the curl, the divergence of F has a physical interpretation in terms of fluids. This will be made clearer later. Here is the statement of the Divergence Theorem.

Divergence Theorem  Let V be region in 3D bounded by a closed, piecewise smooth, orientable surface S; let the outward-drawn normal be n. Then,

Example  Verify the Divergence Theorem for the surface integral ∫∫S F·ndσ, where F = 3xi+yj+2zk and S is the surface of the closed cylinder (including caps) x2+y2 = 16, 0 ≤ z ≤ 5. The normal is outward drawn.

To do this we must compute both integrals in the Divergence Theorem. We will first do the volume integral. It is easy to check that ∇·F = 3+1+2=6. Hence, we have that
∫∫∫V·FdV = ∫∫∫V6dV = 6π·42·5 = 480π

The surface integral must be broken into three parts: one for the top cap, a second for the curved sides, and a third for the bottom cap.
∫∫S F·ndσ = ∫∫top F·ndσ + ∫∫sides F·ndσ + ∫∫bottom F·n
The outward normals for the top and bottom caps are k and −k, respectively. For the top (z = 5), we are integrating F(x,y,5)·k = 2·5 = 10, and for the bottom (z = 0), F(x,y,0)·(−k) = −2·0 = 0. Hence, we have
∫∫top F·ndσ = ∫∫top 10dσ = 10π42 = 160π
∫∫bottom F·ndσ = ∫∫bottom 0dσ = 0
The integral over the curved sides will require a little more effort. The outward normal (see my notes, Surfaces, pg. 5) and area element are, respectively,
n = cos(θ)i + sin(θ)j and dσ = 4dθdz.
In addition, on the curved sides
F(4cos(θ),4sin(θ),z) = 12cos(θ)i + sin(θ)j + 2zk, so F·n = 12cos2(θ) + 4sin2(θ).
The surface integral over the curved sides is then given by
∫∫sides F·ndσ = ∫050 (12cos2(θ) + 4sin2(θ))4dθdz = 5·4(12π+4π)= 320π.
Combining these three integrals, we obtain
∫∫S F·ndσ = 160π+320π + 0 = 480π,
which agrees with the result from the volume integral. Thus we have verified the Divergence Theorem in this case.

Equation of continuity for fluids  Suppose that in a region a fluid has a velocity field v(x,t) and density ρ(x,t), and that there are no sources or sinks in the region. Recall that last class we showed that the amount of fluid crossing a surface in the direction n per unit time is the flux,
Φ = ∫∫Sρv·ndσ.
If S is a closed surface forming the boundary of a volume V, then Φ is the negative of the total rate of change of mass in the fluid in V. Thus, we have the equation
∫∫Sρv·ndσ = − d/dt ∫∫∫VρdV = − ∫∫∫V∂ρ/∂t dV.
If we use the divergence theorem to replace the surface integral by a volume integral, we obtain
∫∫∫V·v)dV = − ∫∫∫V∂ρ/∂t dV,
and, consequently, that
∫∫∫V (∇·v)+∂ρ/∂t) dV = 0
holds for every choice of V within the region under consideration. If we take V to be a small sphere of radius ε and center x, then the limit as ε tends to 0 of
V-1∫∫∫V (∇·v)+∂ρ/∂t) dV
is ∇·v)+∂ρ/∂t. On the other hand, this limit is of course 0. Therefore,
·v)+∂ρ/∂t = 0.
This partial differential equation is called the equation of continuity.

3 December

Divergence, Laplacian, and Curl in General Coordinates and Spherical Coordinates

Divergence
Laplacian
Curl

Heat flow in a body

Derivation of the heat equation  See § VI.2 in Zachmanoglou and Thoe (Z/T). The steady state version of this equation is Laplace's equation. The heat equation comes from considering energy balance. To specify a temperature in a body, we must also take into account two other factors: the past history of the body and the interaction of the body with the environment. For materials without "memory," the past history is adequately described via specifying the temperature throughout the body at some initial time, say t = 0. The interaction of the the environment with the body is modeled through the use of boundary conditions.

Types of boundary conditions  There are three common types of boundary conditions.

1. Dirichlet boundary conditions. These specify the temperature on the surface of the body. For example, putting an object in ice keeps the temperature at its surface at 0 degrees C.

2. Neumann boundary conditions. These specify the flow of heat across the boundary using Fourier's law. That is, they specify n·∇u on the surface of the body. An insulated boundary would have no heat flow, and one would require n·∇u = 0.

3. Robin boundary conditions. These specify a linear combination of u and n·∇u on the surface. They come from Newton's law of cooling, for example.

5 December

Classification of partial differential equations

General second order linear PDEs  In addition to the heat equation and Laplace's equation, which we derived earlier, there is a third important type of PDE, the wave equation.
c-22u/∂t2 = ∇2u.
These three equations are special cases of three general types of second order linear PDEs. The most general form of a second order linear PDE is
∑ajk2u/∂xj∂xk + ∑bj∂u/∂xj + cu +d = 0.
The classification scheme is based on the signs of the eigenvalues of the symmetric matrix A with entries ajk. In the table below, we classify PDEs for three space variables (x= x1, y = x2, z = x3) and one time variable (t = x4).

Classification of PDEs

Type Eigenvalues of A Example Variables
Parabolic +++0 Heat equation 3 space, 1 time
Elliptic +++  Laplace's equation 3 space
Hyperbolic +++- Wave equation 3 space, 1 time

We remark that if the general PDE is multiplied by a minus sign, the patterns in the table will have "+" replaced by"−". In general, the solutions to the various types of equations behave like the corresponding example. For instance, hyperbolic equations have solutions that propagate in time, like those for the wave equation, while parabolic equations have solutions that behave like temperature in a heat flow problem. For further discussion, see section V.8 in Z/T.

Separation of variables

Laplace's equation  We want to solve for the steady state temperature u in a disk of radius r = a, given Dirichlet boundary conditions. (See also section V.7 in Z/T). We will use polar coordinates. The precise problem is this:
2u = r-1∂/∂r[r∂u/∂r] + r-22u/∂θ2 = 0
u(a,θ) = f(θ) = known temperature on boundary (Dirichlet boundary condition).
Because we are using polar coordinates, which are singular at r = 0 and have a discontinuity at θ = ±π, we have two additional "boundary conditions" -- namely that u(r,θ) is well behaved (bounded) as r approaches 0 and that u is 2π periodic in θ.

If we ignore the nonhomogeneous boundary condition, u(a,θ) = f(θ), then the set of solutions is a vector space. Our aim is to construct a basis for this space. Separation of variables is a method for finding a basis. Once we have accomplished this, we then find the linear combination that also satisfies the nonhomogeneous condition.

Separating variables  We begin by looking for special solutions to the homogeneous problem,
r-1∂/∂r[r∂u/∂r] + r-22u/∂θ2 = 0
u(r,θ) is bounded as r approaches 0
u(r,θ) is 2π periodic in θ.
The solutions that we want have the form u(r,θ) = R(r)Θ(θ). Plugging into the equation gives us
r-1[rR′]′Θ + r-2RΘ″ = 0
If we now multiply this equation by r2 and divide by RΘ, we arrive at this equation:
r[rR′]′/R + Θ″/Θ = 0
Since r[rR′]′/R is a function of r only, and since Θ″/Θ is a function of θ only, it follows that both are constant. If we let μ = r[rR′]′/R, then Θ″/Θ = -μ. With a little algebra, we obtain the separation equations,
r[rR′]′ - μR = 0 and Θ″ + μΘ = 0.

The eigenvalue problem  We now turn to the two remaining conditions. The first of these will be satisfied if R(r) is chosen so as to be continuous at r = 0. We will deal with it later. The condition that u(r,θ) be 2π periodic implies that Θ(θ) is 2π periodic. This imposes restrictions on the possible values of μ and gives us the following eigenvalue problem:
Find all possible values of μ for which the problem
Θ″ + μΘ = 0 and Θ(θ) = Θ(θ+2π)
has a nonzero solution Θ. These values of μ are called eigenvalues, while the corresponding solutions Θ are called eigenfunctions.
We can immediately eliminate μ < 0. The solutions to Θ″ − |μ|Θ = 0 are linear combinations of exp(±|μ|½θ), which always will blow up as θ approaches either +∞ or −∞ or both. They therefore cannot be periodic. For μ = 0, we do have a single periodic solution, namely Θ = 1. The second solution is Θ(θ) = θ, which is not periodic.

This leaves the case in which μ > 0. The differential equation Θ″ + μΘ = 0 has two solutions, sin(μ½θ) and cos(μ½θ). These solutions are periodic with fundamental period 2πμ−½. They will also have 2π as a period if and only if some integer multiple of 2πμ−½ is 2π. Thus, we μ > 0 is an eigenvalue if and only if there is an integer n > 0 such that 2πμ−½n = 2π. It follows that μ = n2 and that Θ(θ) is a linear combination of sin(nθ) and cos(nθ).

Solution to the Eigenvalue Problem

Eigenvalues μ   Eigenfunctions &Theta(θ)
02 1
12 cos(θ), sin(θ)
22 cos(2θ), sin(2θ)
n2 cos(nθ), sin(nθ)

Separation solutions  We still have to find the radial solutions corresponding the eigenvalues we found previously. For μ = 0, the radial equation is r[rR′]′ = 0. dividing by r, we get [rR′]′ = 0, so rR′ = C = constant, and R′ = C/r. Integrating this gives R(r) = Cln(r) + D, where D is another constant. The only solution that behaves nicely at r = 0 is R(r) = constant; that is, any multiple of R(r) = 1.

When μ = n2, n ≥ 1, R satisfies the equation r[rR′]′ - n2R = 0. Working out the derivatives, we see that this is the equation

r2R″ + rR′ - n2R = 0,
which is a Cauchy-Euler equation. The technique for solving it is to use assume a solution of the form R = rα and determine α. Carrying this out, we obtain
α(α−1)r2rα−2 + αrrα−1 − n2rα = 0
(α(α−1) + α − n2)rα = 0
2 − n2)rα = 0
Dividing the last equation by rα, we see that α2 − n2 = 0, and so α = ±n and the possible solutions are linear combinations of rn and r−n. Of these two, only rn is bounded as r approaches 0. Thus, only R(r) = rn can be used. The separation solutions that we have obtained are listed in the table below.

Separation Solutions

n   R(r) Θ(θ) u = Rθ
0 1 1 1
1 r cos(θ), sin(θ) r cos(θ), r sin(θ)
2 r2 cos(2θ), sin(2θ)   r2cos(2θ), r2sin(2θ)
n rn cos(nθ), sin(nθ)    rncos(nθ), rnsin(nθ)

Matching the nonhomogeneous conditions  We may think of the separation solutions as forming a basis for the solution space. The general solution is thus
u(r,θ) = A0 + ∑n≥1(Anrncos(nθ) + Bnrnsin(nθ)).
To match the boundary condition u(a,θ) = f(θ), we need to find coefficients such that
f(θ) = A0 + ∑n≥1(Anancos(nθ) + Bnansin(nθ))
holds. We have already seen that we can represent f this way via its Fourier series. Indeed, this type of problem was Fourier's motivation for introducing such series! All we need to do now is to identify the Fourier coefficients for f with the coefficients above: an = Anan and bn = Bnan. The final solution is then
u(r,θ) = a0 + ∑n≥1(r/a)n(ancos(nθ) + bnsin(nθ)),
where an and bn are the Fourier coefficients for f.

An example  Recall that we have calculated the Fourier series for the periodic function f(θ), which is defined by f(θ) = |θ| for −π ≤ θ ≤ π. The series we found was
f(θ) = ½π - (4/π)∑k≥1 (2k−1)−2cos((2k−1)θ)
By what we said above, the temperature u(r,θ) corresponding to this f is
u(r,θ) = ½π - (4/π)∑k≥1 (r/a)2k−1 (2k−1)−2 cos((2k−1)θ)

Separation of variables in a vibrating string problem  We also discussed separation of variables for the problem of a vibrating string with ends clamped. This is covered in section VIII.8 of Z/T.

10 December
Vibrations in Finite Regions

Separation of variables See VIII.10 in Z/T.
Eigenfunction expansions See VIII.10, Theorem 10.1.
Vibrations in a circular membrane See VIII.10, Example 10.3.