Math 603-601 - Fall 2002


SEP 3 5 10 12 17 19 24 26    
OCT 1 3 8 10 15 17 22 24 29 31
NOV 5 7 12 14 19 21 26 28
DEC 3 5 10            

3 September

Vector spaces
Definition - Vector space. A vector space is a set V together with two operations, + and · . If u, v are in V, then u + v is in V; if c is a scalar, then c·v is in V. The operations satisfy the following rules.

Addition   Scalar multiplication
u + (v + w) = (u + v) + w   a·(b·u) = (ab)·u
Identity: u + 0 = 0 + u = u   (a + b)·u = a· u + b·u
Inverse: u + (-u) = (-u) + u = 0   a·(u + v) = a·u + a·v
u + v = v + u   1·u = u

Definition - Subspace. A nonempty subset U of V is a subspace if, under + and · from V, U is a vector space in its own right.
Theorem. U is a subspace of V if and only if these hold:
  1. 0 is in U.
  2. U is closed under + .
  3. U is closed under · .
Example: U={(x1, x2, x3) | 2x1 + 3x2 - x3 = 0} is a subspace of R3. On the other hand, W={(x1, x2, x3) | 2x1 +3x2 - x3 = 1} is not a subspace of R3, because 0 is not in W.

Important vector spaces
Displacements in space, forces, velocities, accelerations, etc.
  • + parallelogram law
  • · usual scalar multiplication
Rn (real scalars) or Cn (complex scalars) - n×1 real or complex matrices (i.e., columns; can also work with rows).
  • + component of sum is sum of components
  • · multiply each component by the scalar
Pn = {a0 + a1 x + a2x2 + ... + anxn }, the polynomials of degree n or less.
  • + polynomial addition
  • · multiply each term by the scalar
Spaces of functions f : X -> scalars, where X can be anything. Think of f(x) as the ``x component of f.''
  • + is given by (f+g)(x) = f(x) + g(x) (``component of sum is sum of components'')
  • · is given by (c·f)(x) = c(f(x)) (``multiply each component by the scalar'')
Various subspaces of the vector spaces of the type f : X -> scalars
  • C[a,b], all functions continuous on the interval [a,b]. (Could be complex valued).
  • C(k)[a,b], all continuous functions having derivatives continuous through order k.

5 September

Definition - Linear combination. Let v1 ... vn be vectors in a vector space V. A vector of the form c1v1 + ... + cn vn is called a linear combination of the vj's.

Definition - Span. Let S={v1 ... vn} be a subset of a vector space V. The span of S is the set of linear combinations of vectors in S. That is,
U=span(S)={c1v1 + ... + cn vn},
where the cjs are arbitrary scalars.

Proposition. The set span(S) is a subspace of V.

  • The polynomials of degree n or less, Pn = span{1, x, x2, ..., xn }
  • The set of 3D displacements = span{ i, j, k}
  • The set of solutions to y'' + y = 0 is span{ sin(x), cos(x) }.
  • U={(x1, x2, x3) | 2x1 + 3x2 - x3 = 0} = span{ (1,0,2), (0,1,3) }. In addition, we also have U = span{ (1,-1,-1), (1,2,8), (2,-1,1) }.

Terminology If a vector space V = span(S), we say that V is spanned by S, or V is the span of S, or S spans V.

Linear independence and dependence
Definition - Linear independence and linear dependence. We say that a set of vectors
S = {v1, v2, ... , vn}
is linearly independent (LI) if the equation
c1v1 + c2v2 + ... + cnvn = 0
has only c1 = c2 = ... = cn = 0 as a solution. If it has solutions different from this one, then the set S is said to be linearly dependent (LD).

  • {1, x, x2, ..., xn } is linearly independent.
  • {1 + x, 1 - x, 1} is linearly dependent.
  • { i, j, k} is linearly independent.
  • { (1,0,2), (0,1,3) } is linearly independent and { (1,-1,-1), (1,2,8), (2,-1,1) } is linearly dependent.
  • Any set of three vectors in R2 is linearly dependent.
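For column vectors, linear independence can be checked by computing a matrix rank. A small sketch (assuming the numpy library) that tests two of the examples above:

```python
import numpy as np

# A set of column vectors is linearly independent exactly when the matrix
# having them as columns has rank equal to the number of vectors.
def is_linearly_independent(vectors):
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

# { (1,0,2), (0,1,3) } from the example above: independent.
print(is_linearly_independent([np.array([1, 0, 2]), np.array([0, 1, 3])]))   # True

# Any three vectors in R^2 are dependent, since the rank is at most 2.
print(is_linearly_independent([np.array([1, 0]), np.array([0, 1]), np.array([1, 1])]))  # False
```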

Inheritance properties of LI and LD sets. Every subset of a linearly independent set is linearly independent. Every set that contains a linearly dependent set is linearly dependent.

Basis and dimension

Classification of vector spaces. If a vector space V has no limit to the size of its linearly independent sets, it is said to be infinite dimensional. Otherwise, V is said to be finite dimensional. 2D and 3D displacements are finite dimensional spaces. C[0,1] is infinite dimensional, because it contains {1, x, x2, ..., xk } for all k.

When V is finite dimensional, its LI sets cannot be arbitrarily large. Suppose that n is the maximum number of vectors an LI set in V can have. A linearly independent set S in V having n vectors in it will be called maximal.

Proposition. If a vector space V is finite dimensional, then any maximal linearly independent set of vectors
S = {v1, v2, ... , vn}
spans V.

Proof. Add any vector v in V to S to form the new set
{v, v1, v2, ... , vn}
This augmented set is LD, because it has n+1 > n vectors in it. Thus we can find coefficients c, c1, c2, ... , cn, at least one of which is not 0, such that
c v + c1v1 + ... + cn vn = 0.
There are two possibilities. Either c is 0 or c is not 0. In case c = 0, we have c1v1 + ... + cnvn = 0. The set S is LI, so this implies c1 = c2 = ... = cn = 0. But this means all of the coefficients vanish, contradicting the fact that one of them does not vanish. The only possibility left is that c is not 0. We can then divide by it in our previous equation and rewrite that equation as
v = (- c-1c1)v1 + ... + (- c-1cn)vn .
This shows that S spans V. Our proof is complete.

Definition - Basis. We say that a set of vectors
S = {v1, v2, ... , vn}
in a vector space V is a basis for V if S is both linearly independent and spans V.

Corollary. A linearly independent set is a basis for a finite dimensional vector space if and only if it is maximal.

Definition - Dimension. The dimension of a vector space V is the maximum number of vectors in an LI set. Equivalently, it is the number of vectors in any basis for V; every basis has the same number of vectors.

10 September

Coordinates and bases

Coordinates for a vector space. Assigning coordinates amounts to providing a correspondence
v <-> (c1, c2, ... , cn)
that is 1:1, onto, and preserves vector addition and scalar multiplication. Consider a basis {v1, v2, ... , vn} for a vector space V. We also assume that the basis is ordered, in the sense that we keep track of which vector is first, or second, etc.

Theorem. Let B={v1, v2, ... , vn} be an ordered basis for a vector space V. Every vector v in V can be written in one and only one way as a linear combination of vectors in B. That is,
v = c1v1 +...+ cnvn
where the coefficients are unique.

Proof. Since B is a basis, it spans V. Consequently, we have scalars c1, ..., cn such that the representation above holds. We want to show that this representation is unique - i.e., no other set of scalars can be used to represent v. Keeping this in mind, suppose that we also have the representation
v = d1v1 +...+ dnvn,
where the coefficients are allowed to be different from the ck's. Subtracting the two representations for v yields
0 = (c1 - d1) v1 +...+ (cn - dn) vn.
Now, B is a basis for V, and is therefore LI; the last equation implies that c1 - d1 = 0, c2 - d2 = 0, ..., cn - dn = 0. That is, the c's and d's are the same, and so the representation is unique.

This theorem gives us a way to assign coordinates to V, because the correspondence
v= c1v1 +...+ cnvn <-> (c1, c2, ... , cn)
it sets up is both 1:1 and onto. The condition of linear independence gives us that it is 1:1, and the condition of spanning gives us that it is onto. It is also easy to show that this correspondence preserves addition and scalar multiplication, which are the properties needed in defining "good" coordinates for a vector space.

Definition - Coordinate Vector. Given an ordered basis B = {v1 ... vn} and a vector v = c1v1 +...+ cnvn, we say that the column vector [v]B = [c1, ..., cn]T is the coordinate vector for v, and that c1, c2, ..., cn are the coordinates for v.

Definition - Isomorphism. Let U and V be vector spaces. A correspondence between U and V
u <-> v
that is 1:1, onto, and preserves vector addition and scalar multiplication is called an isomorphism between U and V, and the two spaces are said to be isomorphic.

The word isomorphism comes from two Greek words, "isos," which means "same," and "morphe," which means "form." As far as vector space operations go, two isomorphic vector spaces have the "same form" and behave the same way. Essentially, the spaces are the same thing, just with different labels. For example, a basis in one space corresponds to a basis in the other. Indeed, any property in one space that only involves vector addition and scalar multiplication will hold in the other. This makes the following theorem, which is a consequence of what we said above concerning coordinates, very important.

Theorem. Every n-dimensional vector space is isomorphic to Rn or Cn, depending on whether the set of scalars is R or C.

Examples. The space P2 of polynomials of degree 2 or less has B = { 1, x, x2 } as a basis, and so it is three dimensional (B has three vectors). We take the scalars to be real. Relative to this basis, we have
[p]B = [ a1 + a2x + a3x2 ]B = [a1 a2 a3]T.
Thus, for example, we have these:
[1-x+x2]B = [1 -1 1]T
[-3x+4-2x2]B = [4-3x-2x2]B = [4 -3 -2]T
[(2-x)2]B = [4-4x+x2]B = [4 -4 1]T
Changing the ordering of the basis vectors changes the ordering of the coordinates. Using the basis C = { x2, 1, x }, we have
[-3x+4-2x2]C = [-2x2+4-3x]C = [-2 4 -3]T,
which is a reordering of the coordinate vector relative to B. Be aware that order matters!
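The coefficient-reading above can be mirrored numerically. A sketch assuming numpy, whose polynomial module happens to store coefficients in exactly the order of the basis B = {1, x, x2}:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Expand (2 - x)^2 and read off its coordinates relative to B = {1, x, x^2}.
p = P.polypow([2.0, -1.0], 2)       # coefficients of (2 - x)^2 in increasing powers
print(p)                            # [4., -4., 1.]  ->  [(2-x)^2]_B = [4 -4 1]^T

# Reordering the basis to C = {x^2, 1, x} permutes the same coordinates.
p_C = p[[2, 0, 1]]
print(p_C)                          # [1., 4., -4.]
```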

This isn't the only isomorphism between P2 and R3. Recall that a quadratic polynomial is determined by its values at three distinct values of x; for instance, x=-1, 0, and 1. Also, we are free to assign whatever values we please at these points, and we can get a quadratic that passes through them. Thus, the correspondence
p <-> [p(-1) p(0) p(1)]T
between P2 and R3 is both 1:1 and onto. It is easy to show that it also preserves addition and scalar multiplication, so it is another isomorphism between P2 and R3. Let's use this to find a new basis for P2. (Remember, a basis in one isomorphic space corresponds to a basis in the other.) Since { [ 1 0 0 ]T, [ 0 1 0 ]T, [ 0 0 1 ]T } is a basis for R3, the set of polynomials
C = { p1(x) = -½x + ½x2, p2(x) = 1 - x2, p3(x) = ½x + ½x2 },
which satisfy
p1(-1) = 1, p1(0) = 0, p1(1) = 0,
p2(-1) = 0, p2(0) = 1, p2(1) = 0,
p3(-1) = 0, p3(0) = 0, p3(1) = 1,
is another basis for P2. This raises the question of how the coordinate vectors [p]C and [p]B are related.
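The polynomials p1, p2, p3 can be recovered numerically from the interpolation conditions above by solving three small linear systems. A sketch assuming numpy:

```python
import numpy as np

# The isomorphism p <-> [p(-1) p(0) p(1)]^T sends the basis C we seek to the
# standard basis of R^3.  To recover each p_j in the monomial basis {1, x, x^2},
# solve V a = e_j, where row k of V evaluates a quadratic at the node x_k.
nodes = np.array([-1.0, 0.0, 1.0])
V = np.vander(nodes, 3, increasing=True)    # row k is [1, x_k, x_k^2]

for j in range(3):
    e = np.zeros(3)
    e[j] = 1.0
    a = np.linalg.solve(V, e)    # coefficients of p_{j+1} relative to {1, x, x^2}
    print(a)
```

The three printed coefficient vectors are the monomial coefficients of p1, p2, and p3 listed in the basis C above.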

12 September

Let B = {v1 ... vn} and D = {w1 ... wn} be ordered bases for a vector space V. Suppose that we have these formulas for v's in terms of w's and vice versa:

vj=A1jw1 + A2jw2 + ... + Anjwn
wk=C1kv1 + C2kv2 + ... + Cnkvn
(Note that the sums are over the row index for each matrix A and C.) For any vector v with representations
v = b1v1 +...+ bnvn
v = d1w1 +...+ dnwn
and corresponding coordinate vectors
[v]B = [b1,..., bn]T
[v]D = [d1,..., dn]T
we have the change-of-basis formulas
[v]D = A[v]B and [v]B = C[v]D.
These imply that AC=CA=In×n, so C=A-1 and A=C-1 .

For purposes of comparison, we want to write out the expressions for the coordinate changes. Writing the d's in terms of the b's, we have
dk = Ak1 b1 + ... + Akn bn.
Going the other way, we can write the b's in terms of the d's,
bk = Ck1 d1 + ... + Ckn dn.
We note that quantities transforming according to the formula for bases are called covariant, and quantities transforming like the coordinates are called contravariant.

Solving a System of ODEs
Consider the following system of ordinary differential equations, relative to x1-x2 coordinates.
dx1/dt = 3x1 + 2x2
dx2/dt = 2x1 + 3x2
We can turn this into a very simple decoupled system if we change from x1-x2 coordinates to the u1-u2 set defined via these equations:
x1 = u1 + u2
x2 = u1 - u2 .
After some algebra, the system of ODEs becomes
du1/dt = 5u1
du2/dt = u2 ,
which can be easily solved.
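The change of variables works because (1,1) and (1,-1) are eigenvectors of the coefficient matrix, and the decoupled rates 5 and 1 are its eigenvalues. A numerical check, assuming numpy:

```python
import numpy as np

# The substitution x = u1*(1,1) + u2*(1,-1) diagonalizes the coefficient matrix.
M = np.array([[3.0, 2.0],
              [2.0, 3.0]])
S = np.array([[1.0,  1.0],      # columns are the new basis vectors (1,1), (1,-1)
              [1.0, -1.0]])
D = np.linalg.inv(S) @ M @ S    # M expressed in u1-u2 coordinates
print(np.round(D, 10))          # diag(5, 1): du1/dt = 5u1, du2/dt = u2
```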

17 September

Dual space
Definition - Dual Space. The set V* of all linear functions L:V - > R (or C) is called the (algebraic) dual of V. The term linear means that
L(au + bv) = aL(u) + bL(v)
holds for all scalars a,b and vectors u, v.

Terminology. Linear functions in the dual space are called linear functionals, to distinguish them from other types of linear functions. They are also called 1-forms or covectors.

Proposition. V* is a subspace of all functions mapping V to the scalars.

Proof. We leave this as an exercise.

A simple physical example is the work W done by a force f applied at a point and producing a displacement s. Here, the work is given by W = L(s) = f·s. The point is that if we fix the force, then the work is a linear function of the displacement. Note that forces and displacements have different units and are thus in different vector spaces, even though the spaces are isomorphic.

Another simple example that frequently comes up is multiplication of a column vector X by a row vector Y. The linear functional in this case is just L(X) = Y X. Our final example concerns C[0,1]: L[f] = ∫01 f(x)dx is a linear functional.

Dual Basis
Let V be an n-dimensional vector space. We want to construct a basis for V*. Let B = {v1 ... vn} be a basis for V. We may uniquely write any v in V as
v = x1 v1+ ...+ xn vn
Now, if L is a linear functional (i.e., it is in V*), we also have
L(v) = x1 L(v1)+ ...+ xnL(vn).
Thus knowing L(vj) for j=1 ... n completely specifies what L(v) is. Conversely, given scalars {y1, ..., yn}, one can show that
L(v) = x1y1 + ...+ xnyn,
where the xj's are the components of v relative to B, defines a linear functional. As before, L(vj) = yj. In summary, we have established this.

Theorem. Let V be a vector space with a basis B = {v1 ... vn}. If L is a linear functional in V*, then
L(v) = x1y1 + ...+ xnyn, where yj = L(vj)
Conversely, given scalars {y1, ..., yn}, the formula for L above defines a linear functional in V*, where again L(vj) = yj.

We can use the theorem we just obtained to define n linear functionals {v1 ... vn} via

vj(vk) = 1 if j = k, and vj(vk) = 0 if j is not equal to k.

(In the notation of the theorem, vj is the functional obtained by choosing yk = 1 for k = j and yk = 0 otherwise.)
To make this clearer, let's look at what v1 does to vectors. If we take a vector v = x1 v1+ ...+ xn vn, then
v1(v) = x1v1(v1) + x2v1(v2) + ... + xnv1(vn) = x1·1 + x2·0 + ... +xn·0 = x1.
A similar calculation shows that v2(v) = x2, v3(v) = x3, ..., vn(v) = xn. This means that we can write L(v) = x1y1 + ...+ xnyn as
L(v) = y1v1(v) + ... + ynvn(v)
      = (y1v1 + ... +ynvn)(v)
Now the two sides are equal for all values of the argument, so they are the same function. That is, L = y1v1 +...+ ynvn. Hence, the set B* = {v1 ... vn} spans V*. The set is also linearly independent. If 0 = y1v1 +...+ ynvn, then 0 = 0(vj) = yj. Hence, the only yj's that give 0 are all 0. Summarizing, we have obtained this result.

Theorem. If V is an n-dimensional vector space, and if B = {v1 ... vn} is a basis for V, then the dual space V* is also n-dimensional and B* = {v1 ... vn} is a basis for V*.

Definition - Dual Basis. The basis B* is called the dual basis for B.
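In coordinates, a linear functional on Rn acts by multiplication by a row vector, and the dual basis to a basis of column vectors is given by the rows of the inverse of the matrix whose columns are the basis vectors. A small check assuming numpy; the 2×2 basis is a made-up example:

```python
import numpy as np

# Basis vectors v1 = (1,2), v2 = (0,1) as the columns of B.  The dual basis
# functionals are the rows of B^{-1}: row j applied to column k gives delta_jk.
B = np.array([[1.0, 0.0],
              [2.0, 1.0]])
dual = np.linalg.inv(B)          # row j is the functional v^j
print(np.round(dual @ B, 10))    # identity matrix: v^j(v_k) = delta_jk
```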

Inner Product
Definition - Inner product Let V be a real vector space. We say that a mapping < , > : V×V --> R is an inner product for V if these hold:
  1. positivity - <v,v> >= 0, with <v,v> = 0 implying that v=0.
  2. symmetry - <u,v> = <v,u>
  3. homogeneity - <cu,v > = c<u,v >
  4. additivity - < u+v,w> = <u,w> + <v,w>

Definition - Norm The quantity ||v|| := (<v,v>)½ is called the norm or length of a vector v.

Schwarz's inequality: |<u,v>| <= ||u|| ||v||.

Schwarz's inequality shows that the quotient <u,v> ÷ (||u|| ||v||) is always between -1 and 1. Consequently, we may define an angle between vectors to be cos-1(<u,v>(||u|| ||v||)-1). The norm or length of a vector ||v|| satisfies three important properties.

  1. positivity - ||v|| > 0, unless v = 0
  2. positive homogeneity - ||cv|| = |c| ||v||, where c is any scalar.
  3. The triangle inequality: ||u+v|| <= ||u|| + ||v||

19 September

Inner product spaces
Definition - Inner product space A vector space together with an inner product defined on it is called an inner product space.

Examples We verified in detail that the following are inner products on the spaces listed. In particular, we motivated the choice of the inner product on C[a,b] by working with the one for Rn, modifying it, and letting n tend to infinity.

  • Rn, with <x,y> = yTx
  • C[a,b], with < f,g > = ∫ab f(x)g(x)dx
Norm (length) and angle. Here is a simple problem: Let x=[1 -1 0 1]T and y=[1 1 1 1]T. With the usual inner product on R4, we want to find ||x||, ||y||, and the angle between x and y. Solution: ||x|| = (xTx)½ = 3½. ||y|| = (yTy)½ = 4½ = 2. Now, <x,y> = 1. Thus, the angle between them is cos-1(1/(2·3½)) = 1.28 radians.
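The computation above can be checked numerically; a sketch assuming numpy:

```python
import numpy as np

# Lengths and angle in R^4 with the standard inner product <x,y> = y^T x.
x = np.array([1.0, -1.0, 0.0, 1.0])
y = np.array([1.0, 1.0, 1.0, 1.0])
nx, ny = np.sqrt(x @ x), np.sqrt(y @ y)     # sqrt(3) and 2
angle = np.arccos((x @ y) / (nx * ny))      # about 1.28 radians
print(nx, ny, angle)
```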

Orthogonal vectors. From the definition of the angle between two vectors, we can rewrite an inner product in a form familiar from the definition of the "dot product" of vectors in 2D and 3D. If we let t be the angle between u and v, then
<u,v> = ||u|| ||v|| cos(t).
If neither of the vectors is 0, then t = π/2 if and only if the inner product on the left is 0. With this in mind, we say that two vectors u, v in an inner product space V are orthogonal or perpendicular whenever <u,v> = 0.

24 September

Orthogonal and orthonormal sets

Definition - Orthogonal set A finite or infinite set of nonzero vectors {v1, v2, v3, v4, ... } in an inner product space V is an orthogonal set if <vj,vk> = 0 whenever j is not equal to k.

  • 3D vectors. {i, j, k}
  • Column vectors. {[1 1 1]T, [1 0 -1]T,[1 -2 1]T}
  • Polynomials. {1, x, 3x2-1}, where < p,q > = ∫-11 p(x)q(x)dx
  • Continuous, periodic functions. {1, sin(x), cos(x), sin(2x), cos(2x), sin(3x), cos(3x), ...}, where < f,g > = ∫-ππ f(x)g(x)dx

    To do the integrals involved, we used the following product-to-sum trigonometric identities:
    cos(A)cos(B) = ½(cos(A+B) + cos(A-B))
    sin(A)sin(B) = ½(cos(A-B) - cos(A+B))

Definition - Orthonormal set An orthogonal set in which all vectors have length 1 is called an orthonormal set. That is, <vj,vk> = 0 whenever j is not equal to k and <vj,vj> = 1 for all j.

One can always convert an orthogonal set into an orthonormal set. We simply divide each vector in the set by its norm (length). Here are results for the examples above.

  • 3D vectors. {i, j, k} (No change; the original set is orthonormal.)
  • Column vectors. {[3-½ 3-½ 3-½]T, [2-½ 0 -2-½]T, [6-½ -2·6-½ 6-½]T}
  • Polynomials. {2-½, (3/2)½ x, (5/8)½(3x2 - 1)}
  • Continuous, periodic functions. {(2π)-½, π-½sin(x), π-½cos(x), π-½sin(2x), π-½cos(2x), ...}
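The normalization step for the column-vector example can be checked numerically: after dividing by the lengths, the matrix Q having the normalized vectors as columns satisfies QTQ = I. A sketch assuming numpy:

```python
import numpy as np

# Normalize the orthogonal set {[1 1 1]^T, [1 0 -1]^T, [1 -2 1]^T} and
# verify that the result is orthonormal.
vs = [np.array([1.0, 1.0, 1.0]),
      np.array([1.0, 0.0, -1.0]),
      np.array([1.0, -2.0, 1.0])]
Q = np.column_stack([v / np.linalg.norm(v) for v in vs])
print(np.round(Q.T @ Q, 10))        # 3x3 identity
```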

Proposition. Every set of nonzero, orthogonal vectors is linearly independent. Consequently, every orthonormal set is linearly independent.

Orthogonal and orthonormal bases

Proposition. Suppose that V has dim(V) = n. Every set of nonzero, orthogonal vectors is a basis for V if and only if it contains n vectors.

Corollary. Suppose that V has dim(V) = n. Every orthonormal set is a basis for V if and only if it contains n vectors.

Coordinates. Orthogonal and orthonormal bases are very useful because one can easily find coordinates of a vector relative to them. To see how this works, let's start with an orthogonal basis for an n dimensional vector space V,
B = {v1, v2, v3, ..., vn}.
Any vector v in V can be uniquely written as
v = x1 v1 + x2 v2 + x3 v3 + ... + xn vn.
To find the xj 's, we form the inner products <v,vj>. For example,
<v,v1> = < x1 v1 + x2 v2 + x3 v3 + ... + xn vn,v1>
<v,v1> = x1 < v1, v1 > + x2 < v2, v1 > + x3 < v3, v1 > + ... + xn < vn, v1 >
<v,v1> = x1 || v1 ||2 + x2 ·0 + x3 ·0 + ... + xn ·0
<v,v1> = x1 || v1 ||2
Consequently, we have that x1 = <v,v1> || v1 ||-2. A similar calculation gives us that, for j = 1, ..., n,
xj = <v,vj> || vj ||-2.

Things are even simpler in an orthonormal basis. If we let B = {u1, u2, u3, ..., un} be orthonormal, then || uj || = 1, and
xj = <v,uj>.
This is familiar from 3D vectors, with {i, j, k} being the orthonormal basis.
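The coordinate formula xj = <v,vj> || vj ||-2 can be tried on the orthogonal column vectors from the earlier example; the test vector v below is made up. A sketch assuming numpy:

```python
import numpy as np

# Coordinates relative to an orthogonal basis: x_j = <v, v_j> / ||v_j||^2.
basis = [np.array([1.0, 1.0, 1.0]),
         np.array([1.0, 0.0, -1.0]),
         np.array([1.0, -2.0, 1.0])]
v = np.array([4.0, 1.0, 1.0])
coords = [(v @ b) / (b @ b) for b in basis]
print(coords)

# Reassembling the linear combination recovers v.
print(sum(c * b for c, b in zip(coords, basis)))
```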

26 September

Orthonormal bases and inner products

Proposition. Let B = {u1, u2, u3, ..., un} be an orthonormal basis for a real inner product space V. If v and w in V have coordinate vectors [v]B and [w]B relative to B, then
<v,w> = [w]BT [v]B

We now want to address what happens if we change from B to a new orthonormal basis, B' = {u'1, u'2, u'3, ..., u'n}, where
u'j=A1ju1 + A2ju2 + ... + Anjun.
If A is the matrix whose (k,j) entry is Akj, then the coordinate vectors transform according to the rule [v]B = A[v]B'. By our previous proposition, we thus have that
<v,w> = [w]BT [v]B = [w]B'T AT A[v]B'
On the other hand, the proposition applies directly to the basis B' itself. Hence, <v,w> = [w]B'T [v]B'. Combining these two equations then gives us
[w]B'T [v]B' = [w]B'T AT A[v]B',
which holds for any choice of vectors v and w.

We are interested in getting the components of ATA. To do this, choose w = u'j and v = u'k. The coordinate vectors for these are [w]B' = [u'j]B' = ej and [v]B' = [u'k]B' = ek. Inserting these in the equation above gives us
ejT ek = ejT ATA ek
This implies that the (j,k) entry in ATA is 1 if j=k and 0 if j is not equal to k. But these are exactly the entries in the n×n identity matrix I. Thus, we have shown that ATA = I.

Orthogonal matrices

Definition - Orthogonal matrix An n×n matrix A is said to be orthogonal if ATA = I.

Proposition. The following are equivalent.
  1. ATA = I
  2. AT = A-1
  3. The columns of A form an orthonormal basis for Rn
  4. The rows of A form orthonormal basis for Rn.
  5. In Rn, the length of Ax is the same as x, and the angle between Ax and Ay is the same as that between x and y.
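A quick numerical check of properties 1, 2, and 5 on a sample 2×2 rotation matrix (the angle is chosen arbitrarily), assuming numpy:

```python
import numpy as np

t = 0.7                      # arbitrary sample angle
A = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

print(np.allclose(A.T @ A, np.eye(2)))        # property 1: A^T A = I
print(np.allclose(A.T, np.linalg.inv(A)))     # property 2: A^T = A^{-1}

x = np.array([3.0, -4.0])
print(np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x)))   # property 5: lengths preserved
```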

Euler angles

Rotations and reflections. A change from one orthonormal basis to another is accomplished by an orthogonal matrix. This change of basis leaves lengths and angles invariant, and in 3D represents a rotation or reflection. For a 3×3 real matrix, being orthogonal imposes six equations on the nine entries in A. Parametrizing the 3×3 orthogonal matrices should then require three variables. These variables are the Euler angles, and they come from three rotations. The angles are called the precession, nutation, and pure rotation. A diagram may be found in the book by Borisenko and Tarapov.

1 October

Rotations about a coordinate axis

Rotation about the z-axis Here, the z-axis is fixed, and we want to rotate our axes counterclockwise through an angle t. Let B = {i, j, k}, and B' = {i', j', k'}. The relationship between these two bases is

i' = cos(t) i + sin(t) j
j' = -sin(t) i + cos(t) j
k' = k
The matrix A for which [v]B = A[v]B' is A = Rz(t) =

cos(t)  -sin(t)  0
sin(t)   cos(t)  0
0        0       1
By simply relabeling the axes, we can obtain formulae for rotations with the x or y axis fixed. The one for a counterclockwise rotation about the x-axis through an angle t is A = Rx(t) =
1   0        0
0   cos(t)  -sin(t)
0   sin(t)   cos(t)
The rotation matrix for the Euler angles The matrix A that takes B' coordinates into B coordinates, that is [v]B = A[v]B', is just a product of three matrices:

A = Rz(precession) Rx(nutation) Rz(pure rotation)
This discussion is based on that given in the book by H. Goldstein: Classical Mechanics, Addison-Wesley, Reading, MA, 1965.
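The composition can be sketched numerically; the three angle values below are arbitrary sample inputs, and the last line checks that the product of orthogonal matrices is again orthogonal. A sketch assuming numpy:

```python
import numpy as np

def Rz(t):
    # counterclockwise rotation about the z-axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def Rx(t):
    # counterclockwise rotation about the x-axis
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

# Euler-angle rotation: precession, nutation, pure rotation (sample values).
A = Rz(0.3) @ Rx(0.5) @ Rz(0.2)
print(np.allclose(A.T @ A, np.eye(3)))    # True: A is orthogonal
```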

Gram-Schmidt Process

Algorithm Let V be a vector space with an inner product < u, v >, and let {v1 ... vn} be a linearly independent set. We want to find an orthonormal set with the same span.

  1. u1 := v1 (|| v1 ||)-1 equivalently v1 = r11u1, r11 = || v1 ||

  2. u2 := (v2 - < v2, u1 > u1) r22-1 equivalently v2 = r12u1 + r22u2,
    where r12 = < v2, u1 > and r22 = || v2 - < v2, u1 > u1 ||

  3. uk := (vk - < vk, u1 > u1 - ... - < vk, uk-1 > uk-1 ) rkk-1 equivalently vk = r1ku1 + ... + rkkuk,
    where rjk = < vk, uj > and rkk = || vk - < vk, u1 > u1 - ... - < vk, uk-1 > uk-1 ||.

  4. Repeat the step above for k = 3, 4, ... n. The result is an orthonormal (o.n.) basis that replaces the vj's.
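The steps above can be transcribed directly for column vectors with the standard inner product; below it is run on the vectors [1 -2 2]T and [1 0 1]T. A sketch assuming numpy:

```python
import numpy as np

# Classical Gram-Schmidt: at step k, subtract off <v_k, u_j> u_j for the
# earlier u_j's, then normalize.  Returns the orthonormal vectors u_j.
def gram_schmidt(vectors):
    us = []
    for v in vectors:
        w = v.astype(float)
        for u in us:
            w = w - (v @ u) * u      # remove the component along u
        us.append(w / np.linalg.norm(w))
    return us

us = gram_schmidt([np.array([1.0, -2.0, 2.0]), np.array([1.0, 0.0, 1.0])])
Q = np.column_stack(us)
print(np.round(Q.T @ Q, 10))         # 2x2 identity: the u_j are orthonormal
```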

Example The Gram-Schmidt process takes {1, x, x2} into {2-½, (3/2)½ x, (5/8)½(3x2 - 1)}, provided the inner product on P2 is < p,q > = ∫-11 p(x)q(x)dx.

QR-Factorization Suppose m>=n. Let A be an m×n matrix with linearly independent columns {v1, ..., vn}; that is,
A = [v1 ... vn].
Apply the Gram-Schmidt process to the vj's. The result is that for k = 1, ..., n, we have
vk = r1ku1 + ... + rkkuk,
where the uj's are also column vectors. Put the uj's in an m×n matrix
Q = [u1 ... un].
Letting R =
r11 r12  r13  ... r1n
0   r22  r23  ... r2n
0    0   r33  ... r3n
0    0   0   ... rnn
we can write the equations that give the vk's as linear combinations of the uj's in matrix form, A = QR. This is the QR factorization. When m = n, the matrix Q is orthogonal. If m > n, then Q satisfies QTQ = In×n. However, QQT will not be the m×m identity Im×m.

Example Find the QR factorization of the matrix A =
 1  1
-2  0
 2  1
We start with v1 = [1 -2 2]T, v2 = [1 0 1]T. Following the Gram-Schmidt procedure, we get
u1 = (1/3)[1 -2 2]T, or v1 = 3u1,
so r11 = 3. Next, we will find u2. To do this, we first compute
v2 - (u1Tv2)u1 = [1 0 1]T - (3/3)*(1/3)*[1 -2 2]T = [2 2 1]T/3
Clearly, the length of [2 2 1]T/3 is r22 = 1, and so u2 = v2 - (u1Tv2)u1 = v2 - u1. Hence, v2 = u1 + u2. Thus, r12 = 1, and R =
 3  1
 0  1
and the matrix Q =
 1/3   2/3
-2/3   2/3
 2/3   1/3
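The factorization can be verified by multiplying back, a sketch assuming numpy:

```python
import numpy as np

# Check the worked QR factorization: A = QR, with Q^T Q the 2x2 identity.
A = np.array([[ 1.0, 1.0],
              [-2.0, 0.0],
              [ 2.0, 1.0]])
Q = np.array([[ 1.0, 2.0],
              [-2.0, 2.0],
              [ 2.0, 1.0]]) / 3.0
R = np.array([[3.0, 1.0],
              [0.0, 1.0]])
print(np.allclose(Q @ R, A))              # True
print(np.allclose(Q.T @ Q, np.eye(2)))    # True; QQ^T is NOT the 3x3 identity
```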

3 October

Least squares approximation

Discrete least squares problems The simplest example of this is fitting a straight line to data. For instance, suppose we have measured the log of the concentration of some chemical and listed the data in the table below.

Log of Concentration
t       0     1     2     3     4
ln(C)  -0.1  -0.4  -0.8  -1.1  -1.5

We know that the law of decay tells us that ln(C) = -r*t + ln(C0), so the data should lie on a straight line. Of course, they don't; experimental errors offset the points. The question is, what values of r and ln(C0) best fit a straight line to the data? A more general problem is this. At times t1, t2, ..., tn, we have measurements y1, y2, ..., yn. Find a line y = a*t + b that fits the data.

One good way to solve this problem is the method of least squares. Let
E2 = (|y1 - a*t1 -b|2 + |y2 - a*t2 -b|2 + ... + |yn - a*tn -b|2)/n
The quantity E is the root mean square of all of the errors yj - a*tj -b at each time tj. The idea is to choose a and b so as to minimize E. That is, we will try to find a and b that give the least value for the sum of the squares of the errors. We can put this in the form of an inner product. Define the following column vectors in Rn.
v = [y1 y2 ... yn]T
v1 = [t1 t2 ... tn]T
v2 = [1 1 ... 1]T
Notice that the jth component of the vector v - av1 - bv2 is just the difference yj - a*tj - b. From this it follows that
E2 = ||v - av1 - bv2||2/n
Now, n is simply the number of data points, so it is fixed. Consequently, minimizing E is equivalent to minimizing the distance from v to the space spanned by v1 and v2. Put a little differently, minimizing E is equivalent to finding the v* in span{v1, v2} that comes closest to v or best approximates v.

Continuous least squares problems There is a continuous version of the discrete problem described above. In many applications, we are given a complicated function and we want to approximate it with sums of simpler functions. A familiar example is the Taylor series for ex, where we are approximating ex by a sum of powers of x. Another example is using sines and cosines to approximate a signal in order to find its frequency content. This is one of the applications of Fourier series.

Suppose that we are given a continuous function f(x) on the interval [0,1] that has an upward trend or bias to it. One way to measure this is to fit a straight line to the function f. The difference here is that we know f at every x in [0,1]. The discrete square error E2 goes over to an integral,
E2 = ∫01 (f(x) - ax - b)2dx.
If we use the inner product < f,g > = ∫01 f(x)g(x)dx, then E2 = || f(x) - ax - b ||2, and the problem again goes over to finding the best approximation to f from span{1,x}, relative to the norm from our inner product < f,g >.

One can carry this further. If f(x) has not only an upward trend, but is also concave up, then it makes sense to fit a quadratic to it. The problem described above would change to finding the quadratic polynomial f*(x) in span{1,x,x2} that minimizes || f(x) - a0 - a1x - a2x2 ||2.

Least squares problems and inner products All of the problems that we have described above have been put in terms of inner products. Here is the general form of these problems. Suppose that we have an inner product space V, a vector v in V, and a subspace U of V. The least-squares problem is to find both the minimum of || v - u ||, where u is any vector in U, as well as any minimizer v* in U.

Normal equations

Theorem. Let V be a vector space with an inner product < u, v >, and let U be a subspace of V. A vector v* in U minimizes the distance || v - u || if and only if v* satisfies the normal equations,
< v - v*, u > = 0,
which hold for all u in U. That is, v - v* is orthogonal to the whole space U. In addition, v* is unique.

Proof. Let's first show that if v* in U minimizes || v - u ||, then it satisfies the normal equations. The way we do this is similar to the way we proved Schwarz's inequality. Fix u in U and define
q(t) := || v - v* + t u ||2 = || v - v* ||2 + 2t < v - v*, u > + t2 || u ||2
Because v* minimizes || v - u ||2 over all u in U, the minimum of q(t) is at t = 0. This means that t = 0 is a critical point for q(t), so q'(0) = 0. Calculating q'(0) then gives us 2< v - v*, u > = 0 for all u in U. Dividing by 2 yields the normal equations.

Conversely, if v* in U satisfies < v - v*, u > = 0, then we will show not only that v* is a minimizer, but also that it is the minimizer; that is, v* is unique. To do this, let u be any vector in U. Observe that we can write v - u = v - v* + v* - u = v - v* + u', where u' := v* - u is in U. Consequently, we also have
|| v - u ||2 = ||v - v* + u' ||2
|| v - u ||2 = ||v - v* ||2 + 2< v - v*, u' > + || u' ||2.
Since we are assuming that v - v* is orthogonal to every vector in U, it is orthogonal to u'; hence, < v - v*, u' > = 0, and so we have that
|| v - u ||2 = ||v - v* ||2 + || u' ||2.
It follows that || v - u || >= ||v - v* ||, so that v* is a minimizer. Now, if equality holds, that is, if || v - u || = ||v - v* ||, then we also have || u' || = 0. Consequently, u' = 0. But then, we have to have u = v*. So, the vector v* is unique.

8 October

Normal equations and bases.

Normal equations relative to a basis. The normal equations are geometric conditions that can be used to directly find the minimizer. When the subspace U has dimension n, they reduce to a set of n equations involving basis vectors.

Corollary If B = {w1 ... wn} is a basis for the subspace U, then the normal equations are equivalent to the set
< v - v*, wk > = 0, k = 1 ... n.

Proof. If the normal equations are satisfied, they hold for every vector in U, including the basis vectors. Thus, the equations above have to hold, too. On the other hand, suppose the equations above are satisfied. We can write any vector u in U as u = c1 w1+ ... + cnwn. It then follows from the equations above that
< v - v*, u > = c1 < v - v*,w1 > + ... + cn < v - v*,wn > = c1·0 + ... + cn·0 = 0,
and so the normal equations hold.

Finding the minimizer - orthonormal case. The normal equations are geometric conditions that can be used to directly find the minimizer v*. When an orthonormal basis for U is known, the answer is simple and can be written down explicitly. We just need to apply the last corollary. Take B = {u1 ... un} to be an orthonormal basis for the subspace U. The normal equations relative to this basis are < v - v*, uk > = 0 or, equivalently,
< v, uk > = <v*, uk >,
for k=1 ... n. By the formula for the coordinates of a vector relative to an orthonormal basis, we see that <v*, uk > is just the kth coordinate of v*. It follows that the minimizer is given by
v* = < v , u1 > u1+ ... + < v , un > un
It's worth noting that this is the first time we have actually shown that there is a minimizer. Of course, by what we have said above, it's unique. We will use this fact later.

An Example. Suppose that f(x) = ex on [-1,1]. What straight line gives the best least squares fit? The first thing to do is identify the subspace. Here U = span{1,x}. We know that {2-½, (3/2)½ x} form an orthonormal basis for U relative to the inner product < p,q > = ∫-11 p(x)q(x)dx. Here, we interpret p and q as continuous functions, rather than polynomials. Doing a little calculus, we obtain
< ex, 2 > = 2(e - e-1) = 2½sinh(1)
< ex, (3/2)½ x > = 6½ e-1.
Applying the formula we derived, we get f*(x) = 2½sinh(1)·2 + 6½ e-1·(3/2)½ x = sinh(1) + 3e-1 x. The function and the line are plotted below.
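As a numerical cross-check (a Python/NumPy sketch; the class demonstrations used MATLAB, and the grid size here is just an illustrative choice), we can approximate the inner product by a trapezoid rule and recompute the projection coefficients:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 200001)
h = x[1] - x[0]

def inner(p, q):
    # trapezoid rule for <p, q> = integral of p(x) q(x) over [-1, 1]
    w = p * q
    return (w[:-1] + w[1:]).sum() * h / 2

f = np.exp(x)
u1 = np.full_like(x, 2.0 ** -0.5)   # normalized constant function
u2 = (1.5 ** 0.5) * x               # normalized multiple of x

c1, c2 = inner(f, u1), inner(f, u2)
fstar = c1 * u1 + c2 * u2           # best least squares line for e^x
```

To quadrature accuracy, c1 and c2 reproduce 2^(1/2)sinh(1) and 6^(1/2)e^(-1), and fstar matches sinh(1) + 3e^(-1)x.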

Finding the minimizer - non-orthogonal case. Often, it is useful to use a non-orthogonal basis in solving a least squares problem. Let B = {w1 ... wn} be a basis for the subspace U. We showed that the normal equations, which determine the minimizer v*, have the form
< v - v*, wk > = 0, k = 1 ... n.
Since v* is in U, we can represent it in terms of this basis, v* = c1 w1+ ... + cnwn. The normal equations imply that the coefficients cj satisfy the matrix equation
Gc=d,   where Gjk= < wk, wj >,   dj = < v , wj >.
The matrix G is called the Gram matrix for the basis of w's; it is always invertible, because the normal equations always have a unique solution, as we saw above in connection with the case of an orthonormal basis.

Example We will set up the normal equations for fitting a straight line to data y1, y2, ..., yn taken at times t1 < t2 < ... < tn. The vector space V is Rn, with the usual inner product, < x , y > = yTx. The vector v = [y1, y2, ..., yn]T. The subspace U is the span of vectors
w1 = [t1 t2 ... tn]T
w2 = [1 1 ... 1]T.
These are linearly independent vectors because the tj's are all distinct. Consequently, we may take B = {w1, w2} as our basis. The Gram matrix in this case has entries
G11 = w1T w1 = t12 + t22 + ... + tn2
G12 = w1T w2 = t1 + t2 + ... + tn
G21 = w2T w1 = t1 + t2 + ... + tn
G22 = w2T w2 = n
The vector d has entries
d1 = w1Tv = t1y1 + t2y2 + ... + tnyn
d2 = w2Tv = y1 + y2 + ... + yn.
The nice feature here is that the coefficient vector [c1 c2]T has the slope and intercept of the line for its entries.
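A quick Python/NumPy sketch of this computation (the data values below are made up for illustration, not from class):

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])     # sample times (illustrative)
y = np.array([1.1, 2.9, 5.2, 6.8])     # sample data (illustrative)

w1, w2 = t, np.ones_like(t)            # basis of U: times and all-ones
G = np.array([[w1 @ w1, w1 @ w2],
              [w2 @ w1, w2 @ w2]])     # Gram matrix, G_jk = <w_k, w_j>
d = np.array([w1 @ y, w2 @ y])
c = np.linalg.solve(G, d)              # c[0] = slope, c[1] = intercept

slope, intercept = np.polyfit(t, y, 1) # independent check
```

The Gram-matrix answer c agrees with the slope and intercept from a standard least-squares fit.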

The normal equations and the QR factorization Suppose that V is Rn and U has a basis B = {w1, ..., wm} of m column vectors, where m < n. Given v in V, we want to minimize ||v - u|| over all u in U. As we have seen, the unique minimizer is
v* = c1w1 + ... + cmwm.
Instead of employing the Gram matrix, as we did earlier, we use our "basic matrix trick," and write v* as a matrix product,
v* = Wc,
where W = [w1 ... wm] is an n×m matrix with linearly independent columns. Now, carry out a QR factorization and write W = QR, where R is an invertible m×m upper triangular matrix and where the columns of Q form an orthonormal set {u1 = Qe1, ..., um = Qem} that is also a basis for U. The normal equations, relative to the basis comprising the columns of Q, become, for k = 1, ..., m,

< v - Wc, Qek > = 0
ekTQT(v - QRc) = 0 (since W = QR)
ekT(QTv - QTQRc) = 0
ekT(QTv - Rc) = 0 (since QTQ = I).
From this, it follows that all m components of the column vector QTv - Rc are 0; hence,
Rc = QTv.
These are yet another form of the normal equations. They are especially useful from a numerical point of view. The matrix R is upper triangular and Rc = QTv can be solved quickly. For a discussion, see: G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins Press, Baltimore, 1996.
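The QR route can be run on the same small fitting problem in Python/NumPy (a sketch; `lstsq` is used only as an independent check):

```python
import numpy as np

W = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])             # columns: w1 = times, w2 = ones (illustrative)
v = np.array([1.1, 2.9, 5.2, 6.8])

Q, R = np.linalg.qr(W)                 # reduced QR: Q is 4x2, R is 2x2 upper triangular
c = np.linalg.solve(R, Q.T @ v)        # solve R c = Q^T v

c_ref, *_ = np.linalg.lstsq(W, v, rcond=None)
```

In production code the 2×2 triangular system would be solved by back substitution; a general solver is used here only for brevity.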

Series of orthogonal functions and least squares

Least squares approximation Consider the following continuous least squares problem. Start with a function f defined on [-1,1], which we can think of as continuous, and suppose that we want to fit not only a straight line to f, but also a quadratic, cubic, quartic, and so on. In other words, we want to find the degree n polynomial that gives the best least squares fit to the function f over the interval [-1,1].

The orthonormal set of Legendre polynomials is formed by using the Gram-Schmidt process on {1, x, x2, x3, ...} relative to the inner product
< p, q > = ∫[-1,1] p(x) q(x) dx.
We denote these polynomials by {p0, p1, p2, p3, ...}. Earlier, we had seen that
p0 = 2^(-1/2), p1 = (3/2)^(1/2) x, p2 = (5/8)^(1/2) (3x^2 - 1).
We remark that there are similar formulas for all of the orthonormal Legendre polynomials.
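As a sanity check (a Python/NumPy sketch, not part of the notes), we can verify numerically that these three polynomials are orthonormal for the inner product above:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 200001)
h = x[1] - x[0]

def inner(p, q):
    # trapezoid rule for the integral of p(x) q(x) over [-1, 1]
    w = p * q
    return (w[:-1] + w[1:]).sum() * h / 2

p0 = np.full_like(x, 2.0 ** -0.5)
p1 = (1.5 ** 0.5) * x
p2 = (5.0 / 8.0) ** 0.5 * (3 * x ** 2 - 1)

# Gram matrix of the three functions; it should be (numerically) the identity.
G = np.array([[inner(a, b) for b in (p0, p1, p2)] for a in (p0, p1, p2)])
```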

For each n, we look at the subspace Un = span{p0, p1, p2, ..., pn}. Of course, Un = Pn, the polynomials of degree n or less. In our least squares minimization problem, we identify f with v and the uk's with the pk's. The minimizer for Un is
f*n = < f, p0 > p0+ ... + < f, pn > pn.

The minimizers f*n change with n in a very simple way. Namely, to go from n to n+1, we only need to add a term to the previous minimizer. If we formally let n tend to infinity, then we get the infinite series
< f, p0 > p0 + < f, p1 > p1 + < f, p2 > p2 + < f, p3 > p3 + ...
for which the minimizer f*n is the nth partial sum.

Let the minimum error over Un be En = || f - f*n ||. Because Un = Pn is contained in Un+1 = Pn+1, we must have En+1 <= En. That is, En decreases as n gets bigger. Does En go to 0 as n -> infinity? If it does, we say that f*n converges to f in the mean. We also say that the series converges in the mean to f, and we also write
f = < f, p0 > p0 + < f, p1 > p1 + < f, p2 > p2 + < f, p3 > p3 + ... .

Theorem The series above converges in the mean to f if and only if
|| f ||2 = |< f, p0 >|2 + |< f, p1 >|2 + |< f, p2 >|2 + |< f, p3 >|2 + ...
This formula is called Parseval's equation, and first appeared in connection with Fourier series.

Orthonormal series What we just said applies to any infinite set of orthonormal functions, including the trigonometric functions
(2π)^(-1/2),   π^(-1/2)cos(nx),   π^(-1/2)sin(nx),   n = 1, 2, 3, ...,
which are orthonormal relative to the inner product < f, g > = ∫[-π,π] f(x) g(x) dx.
Both Fourier series and series of Legendre polynomials converge in the mean whenever the square of the appropriate norm, || f ||2, is finite.

Fourier series. The Fourier series for a 2π periodic function f is usually written as
f ~ a0/2 + SUMn=1..∞ ( an cos(nx) + bn sin(nx) ),
where the coefficients are given by
an = (1/π) ∫[-π,π] f(x)cos(nx) dx,   bn = (1/π) ∫[-π,π] f(x)sin(nx) dx.
As an example, we calculated the Fourier series for the 2π periodic extension of the function |x| defined on the interval [-π, π]. The resulting series was
π/2 - (4/π)(cos(x) + 3^(-2)cos(3x) + 5^(-2)cos(5x) + 7^(-2)cos(7x) + ...)
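These coefficients can be checked numerically: an = (1/π)∫[-π,π] |x|cos(nx) dx should be π for n = 0, -4/(πn^2) for odd n, and 0 for even n >= 2. A Python/NumPy sketch (trapezoid rule; grid size is an illustrative choice):

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200001)
h = x[1] - x[0]

def integ(g):
    # trapezoid rule over [-pi, pi]
    return (g[:-1] + g[1:]).sum() * h / 2

f = np.abs(x)
a = [integ(f * np.cos(n * x)) / np.pi for n in range(6)]
# a[0]/2 approximates pi/2; a[1], a[3], a[5] approximate -4/pi, -4/(9 pi), -4/(25 pi).
```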

10 October

Numerical Demonstrations

We used MATLAB to do several examples of finding the best continuous least-squares fit using Legendre polynomials. We also looked at the partial sums of the Fourier series for the 2π periodic extension of |x|.

Linear transformations

Definition A mapping L:V -> W, where V and W are vector spaces, is said to be a linear transformation if it satisfies these properties.
  1. Homogeneity L[cu] = cL[u]
  2. Additivity L[u+v] = L[u] + L[v]

Simple properties
  • L[ c1v1 + c2v2 + ... + cnvn ] = c1L[v1] + c2L[v2] + ... + cnL[vn]
  • L[ 0V] = 0W

Theorem (Matrix associated with L) Let V and W be finite dimensional, and let B = {v1, ... , vn} and D = {w1, ... , wm} be bases for V and W, respectively. If L:V -> W is a linear transformation, then there is a unique m×n matrix A such that w = L[v] holds if and only if A[v]B = [w]D.

Proof Let v = c1v1 + c2v2 + ... + cnvn, so [v]B = [c1, ... , cn]T. From the first of the simple properties above, we see that
[ L[v] ]D = c1[ L[v1] ]D + c2[ L[v2] ]D + ... + cn[ L[vn] ]D
Now, by our "basic matrix trick," we can write this as a matrix product,
[ L[v] ]D = A[v]B, where
A = [ [ L[v1] ]D, ... , [ L[vn] ]D ]
is the m×n matrix we wanted. This matrix is unique because its columns are the unique coordinate vectors of the L[vk] relative to the basis D.

An Example Consider the following problem. Let V = W = P2 have the common basis B = D = {1, x, x2}. Suppose that L:P2 -> P2 is the linear transformation given by
L[p] = x2p'' + (2x+1)p' + 3p.
Find the matrix A that represents L. In addition, find the solution to L[p] = 18x2 - x + 2.

To find A, we first find the output of L applied to each basis vector; that is, L[1], L[x], and L[x2]. Doing this, we obtain L[1] = 3, L[x] = 1 + 5x, and L[x2] = 2x + 9x2. By the construction in the theorem, the kth column of A is the coordinate vector [ L[vk] ]D. Consequently, we have
[ L[1] ]D = [3 0 0]T
[ L[x] ]D = [1 5 0]T
[ L[x2] ]D = [0 2 9]T
We have now found that the matrix A =

3 1 0
0 5 2
0 0 9
To solve L[p] = 18x2 - x + 2, we go over to the matrix form of the equation,
A[p]B = [18x2 - x + 2]B = [2 -1 18]T.
Solving this equation, we get
[p]B = A-1[2 -1 18]T = [1 -1 2]T
Putting this back in terms of polynomials, we arrive at p(x) = 1 - x + 2x2.
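The whole example in matrix form, as a Python/NumPy sketch:

```python
import numpy as np

# Columns of A are the coordinate vectors of L[1], L[x], L[x^2]
# relative to the basis {1, x, x^2}.
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 5.0, 2.0],
              [0.0, 0.0, 9.0]])
b = np.array([2.0, -1.0, 18.0])        # coordinates of 18x^2 - x + 2

c = np.linalg.solve(A, b)              # coordinates of p: expect [1, -1, 2]
# i.e., p(x) = 1 - x + 2x^2
```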

15 October

Subspaces associated with a linear transformation L : V -> W

The null space of L is null(L) = {v in V | L[v] = 0}; it is a subspace of the domain V. The image of L is image(L) = {L[v] | v in V}; it is a subspace of the co-domain W. Two examples:

  1. ODEs Consider the ODE y"+p(x)y'+q(x)y = g(x). We convert this to operator form L[y]=g, where L[y] = y"+p(x)y'+q(x)y. In this case, the domain V is C(2)(R), the space of twice continuously differentiable functions on R, and the co-domain W is C(R), the space of continuous functions on R. The null space, null(L), is the set of solutions to the homogeneous equation. With a little work, one can show that image(L) is all of C(R).

  2. Matrices Consider the 2×4 matrix M =
    1 -1 2 3
    1  1 1 4
    We let L be the transformation L[v] = Mv, where v is in R4. For this problem, V = R4. The co-domain or set of outputs W is R2. Here, we also have image(L) = R2. The null space is the set of all v for which Mv = 0. Using row-reduction, one can show that
    null(L) = span{[-3 1 2 0]T, [-7 -1 0 2]T}
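A quick check of the claim in the matrix example (Python/NumPy sketch): both spanning vectors should be mapped to zero by M, and they are clearly linearly independent.

```python
import numpy as np

M = np.array([[1.0, -1.0, 2.0, 3.0],
              [1.0,  1.0, 1.0, 4.0]])
n1 = np.array([-3.0,  1.0, 2.0, 0.0])
n2 = np.array([-7.0, -1.0, 0.0, 2.0])
# M @ n1 and M @ n2 should both be the zero vector in R^2.
```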

Combinations of linear transformations

Sums K+L is defined by (K+L)[v] = K[v] + L[v].

Scalar multiples cL is defined by (cL)[v] = c(L[v]).

Products If K : V -> U and L : U -> W are linear, then we define LK via LK[v] = L[K[v]]. (This is composition of functions). The transformation defined this way, LK, is linear, and maps V -> W. Note: LK is not in general equal to KL, which may not even be defined.

Inverses Let L : V -> V be linear. As a function, if L is both one-to-one and onto, then it has an inverse K : V -> V. One can show that K is linear, and LK = KL = I, the identity transformation. We write K = L-1.

Associated matrices Recall that if B = {v1, ... , vn} and D = {w1, ... , wm} are bases for V and W, respectively, then the matrix associated with the linear transformation L : V -> W is
ML = [ [ L[v1] ]D, ... , [ L[vn] ]D ]
Since each of the combinations listed above is still a linear transformation, it will have a matrix associated with it. Here is how the various matrices are related.

MK + L = MK + ML

McL = c ML

MLK = ML MK


ML-1 = (ML)-1

Polynomials in L : V -> V We define powers of L in the usual way: L2 = LL, L3 = LLL, and so on. A polynomial in L is then the transformation
p(L) = a0I + a1L + ... + amLm
Later on we will encounter the Cayley-Hamilton theorem, which says that if V has dimension n, then there is a degree n (or less) polynomial p for which p(L) is the 0 transformation.


Change-of-basis and linear transformations Let L : V -> V be linear. Here L maps V into itself. We want to look at what happens to the matrix of L when we make a change of basis in V. Let V have bases
B = {v1, ... , vn} and B' = {v'1, ... , v'n}
If the matrix of L relative to B is ML, and that relative to B' is M'L, then
M'L = SB -> B' ML (SB -> B')-1, where SB -> B' changes B coordinates to B' coordinates.

17 October

Eigenvalues, eigenvectors, and eigenspaces

Eigenvalue problems We say that a scalar µ is an eigenvalue of L : V -> V if there exists a vector v in V, with v not equal to 0, such that L[v] = µv. The vector v is called an eigenvector corresponding to µ. We let Eµ be the set of eigenvectors associated with µ, together with 0. Eµ is a subspace of V called the eigenspace of µ.

Trajectories of a velocity field Suppose that we are given a time-independent velocity field for a 2D fluid, V(x). Assume that V is a linear function of x, so that V(x) = L[x], where L is a linear transformation taking 2D to 2D. We want to find the trajectories of the fluid particles, dx/dt = V(x) = L[x], given that we know the initial position x(0). Let B = {i,j} be the usual basis for 2D. Relative to this basis, L will have a matrix A, so our problem becomes [dx/dt]B = A[x]B. Since the basis B is time-independent, [dx/dt]B = d[x]B/dt. If we let X = X(t) = [x]B = [x1 x2]T, we arrive at the system
dX/dt = AX, where X(0)= [x(0)]B.
To make the situation more concrete, we will assume that A =
1  4
1 -2
When this system is written out for this choice of A, it looks like
dx1/dt = x1 + 4x2
dx2/dt = x1 - 2x2
The idea here is to switch to a basis where the system decouples. To accomplish this, we will use a basis of eigenvectors for A, {[4 1]T, [-1 1]T}. These correspond to the eigenvalues µ = 2 and µ = -3 of A, respectively. (We will discuss how to get these later.) Relative to our original 2D space, we write this basis as B' = {4i+j, -i+j}. The change of basis matrix S = SB' ->B is given by S =
4 -1
1  1
Of course, SB ->B' = S-1. Relative to the new basis, the matrix of L, A' = M'L = S-1AS =
2 0 
0 -3 
Letting Z = [x]B' = [z1 z2]T, we have the new system dZ/dt = A'Z, with Z(0) = [x(0)]B'. In the new coordinates the system decouples and becomes
dz1/dt = 2z1
dz2/dt = - 3z2
This decoupled system is easy to solve. The result is
z1(t) = e2tz1(0)
z2(t) = e-3tz2(0)
Transforming back to the original coordinates, we have that X = SD(t)S-1X(0), where D(t) =
exp(2t)   0  
0         exp(-3t)
This explicitly solves the problem. However, we still have to explain how to find the eigenvalues and eigenvectors of A.
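The whole computation can be verified in a few lines (a Python/NumPy sketch; the test time t and initial state X0 below are arbitrary choices, not from the notes):

```python
import numpy as np

A = np.array([[1.0,  4.0],
              [1.0, -2.0]])
S = np.array([[4.0, -1.0],
              [1.0,  1.0]])            # columns: eigenvectors for mu = 2 and mu = -3

# Decoupling: S^{-1} A S should be diag(2, -3).
Lam = np.linalg.solve(S, A @ S)

def X(t, X0):
    # X(t) = S D(t) S^{-1} X(0) with D(t) = diag(e^{2t}, e^{-3t})
    D = np.diag([np.exp(2 * t), np.exp(-3 * t)])
    return S @ D @ np.linalg.solve(S, X0)
```

One can check that X(0, X0) returns X0 and that a difference quotient of X approximates A X, i.e., the formula really solves dX/dt = AX.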

Finding eigenvalues and eigenvectors Let A be an n×n matrix, and define pA(µ) = det(A - µI). One can show that pA is a polynomial of degree n. The scalar µ is an eigenvalue of A if and only if it is a root of pA. This follows from two observations. First, µ is an eigenvalue of A if and only if Ax = µx for some x not equal to 0; this in turn is equivalent to (A - µI)x = 0 having a nontrivial solution, so A - µI is singular. Second, A - µI is singular if and only if det(A - µI) = 0.

  • For A, the problem of finding eigenvalues and eigenvectors decouples. One can first find the roots of pA(µ), and then solve a linear system to get the eigenvectors.
  • If the scalars are the complex numbers, then the Fundamental Theorem of Algebra implies that pA has at least one root. Consequently, A has at least one eigenvalue.
  • To find the eigenvalues and eigenvectors of a linear transformation L, choose a basis for V and find the matrix of L relative to this basis, ML. The eigenvalues of L are precisely those of ML; the eigenvectors of L have coordinates corresponding to the eigenvectors of ML.
Returning to our earlier trajectory problem in which A =
1  4
1 -2
we see that pA(µ) = det(A - µ I) = µ2 + µ - 6 = (µ+3)( µ-2). Thus the eigenvalues are µ = 2 and µ = -3. Now, we can solve for the eigenvectors. When µ = 2, the corresponding eigenvector X satisfies (A - 2I)X = 0. In augmented form, this becomes
-1   4  0
 1  -4  0
which has X = x2·[4 1]T as a solution. Repeating the argument for µ = -3 results in X = x2·[-1 1]T. To get the basis we want, we choose x2 = 1 in both cases. Other nonzero values will work equally well.

22 October


Diagonalizable linear transformations A linear transformation L : V -> V is diagonalizable if and only if there is a basis B for V comprising only eigenvectors of L. When this happens, ML, the matrix of L relative to the basis B, will be a diagonal matrix. Conversely, if there is a basis relative to which the matrix of L, ML, is diagonal, then that basis will be composed of eigenvectors.

A non-diagonalizable linear transformation The shear transformation L[x] = x + ax2i (where x2 is the second component of x), with a > 0, has the matrix ML =
1 a
0 1
This matrix is not diagonalizable. Its only eigenvalue is µ = 1, and the only eigenvectors have the form c·[1 0]T, where c is a nonzero scalar. There is no second linearly independent eigenvector, and so there is no basis of eigenvectors.


Normal modes of a spring system We consider a horizontal spring system consisting of three identical springs (constant = k) and two identical masses (mass = m), all attached to wall mounts. The displacements from rest of the first and second masses are x1 and x2, respectively. Newton's laws for the system give these equations of motion
md2x1/dt2 = -2kx1 + kx2
md2x2/dt2 = kx1 - 2kx2
We can put these equations in matrix form. Let X = [x1 x2]T. The equations become the single matrix equation
md2X/dt2 = - kAX, where the matrix A =
 2  -1
-1   2
A normal mode of this system is a solution of the form X(t) = Xw e^(iwt), where Xw is independent of t and not equal to 0. Plugging this solution back into the matrix equation yields, after cancelling e^(iwt),
AXw = (mw2/k) Xw
We see that Xw is an eigenvector of A corresponding to the eigenvalue µ = mw2/k. Thus we need to find the eigenvalues and eigenvectors of A. As usual, we obtain the characteristic polynomial:
pA(µ) = det(A - µI) = (2 - µ)2 - 1.
The roots are µ = 1 and µ = 3, and the corresponding eigenvectors are [1 1]T and [-1 1]T. In writing the normal modes, we use both e±iwt or, equivalently, cos(wt) and sin(wt).
[1 1]Texp(±i(k/m)½t)   frequency = (k/m)½
[-1 1]Texp(±i(3k/m)½t)   frequency = (3k/m)½
Physically, in the lower frequency mode both masses move in tandem together, -> -> or <- <-. In the higher frequency mode, they move exactly opposite each other, -> <- or <- ->.
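A quick eigenvalue check for the spring matrix (Python/NumPy sketch):

```python
import numpy as np

A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])
mu, V = np.linalg.eigh(A)              # A is symmetric, so eigh applies
# mu should come back (in ascending order) as [1, 3]; the normal-mode
# frequencies are then sqrt(mu * k / m), and the columns of V are
# proportional to [1, 1]^T and [-1, 1]^T (up to sign).
```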

A circuit We want to analyze the circuit shown below. In the circuit, E(t) is a voltage source, R1 and R2 are resistors, L is a coil, and C is a capacitor. The state variables for this system are I=IL, the current through L, and V = VC, the voltage drop across C.

Recall that we have these relations among currents and voltages in the circuit components.
VL = L dI/dt
IC = CdV/dt
VR1 = R1(I + IC) = R1(I + CdV/dt)
VR2 = R2I
Kirchhoff's laws for this circuit are as follows. For the loop E-R1-C, we have
E = R1(I + CdV/dt) + V
For the loop R2-L-C, we have
V = R2I + L dI/dt
Rearranging these equations gives us
dI/dt = V/L - R2I/L
dV/dt = (E - V)/(CR1) - I/C
We want to put this in matrix form. Let X = [I V]T, F(t) = [0 E/(CR1)]T, and finally let A =
- R2/L    1/L
- 1/C   - 1/(CR1)
Doing so puts the system in the form dX/dt = F(t) + AX. Of course, we also need to know the initial conditions I(0) and V(0).

Special case - complex eigenvalues Assume that
E(t) = 0, L =1 henry, C = 1 farad, R1 = R2 = 1 ohm.
Then, we have
A =
-1  1
-1 -1
The characteristic polynomial for this matrix is pA(µ) = (-1 -µ)2 + 1. The two eigenvalues are then
µ± = -1 ± i.
The augmented matrix representing the system (A -µ+I)X = 0 is
-i   1  0
-1  -i  0
This is equivalent to the single equation -ix1 + x2 = 0. Up to nonzero multiples, the eigenvector for µ+ is [1 i]T. Either by repeating these steps with µ- or by taking complex conjugates, we have that the eigenvector for µ- is [1 -i]T. As in our previous examples, we form the change-of-basis matrix S = SB' -> B =
1   1
i  -i
Setting Z = S-1X, the system becomes
dz1/dt = µ+z1 = (-1+i)z1
dz2/dt = µ-z2 = (-1-i)z2
Consequently, z1(t) = e(-1+i)t z1(0) and z2(t) = e(-1-i)t z2(0). Finally, X(t) = SD(t)S-1X(0), where D(t) =
e(-1+i)t   0
0   e(-1-i)t
It's easy to show that if X(0) is a real vector, then, even though complex numbers are involved, X(t) will be real.
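This reality check can be done numerically (a Python/NumPy sketch; the time and initial vector are arbitrary test values):

```python
import numpy as np

A = np.array([[-1.0,  1.0],
              [-1.0, -1.0]])
S = np.array([[1.0, 1.0],
              [1j, -1j]])              # columns: eigenvectors for -1+i and -1-i

def X(t, X0):
    # X(t) = S D(t) S^{-1} X(0), complex arithmetic throughout
    D = np.diag([np.exp((-1 + 1j) * t), np.exp((-1 - 1j) * t)])
    return S @ D @ np.linalg.solve(S, X0)

# For a real X0, the imaginary part of X(t) should vanish up to roundoff.
```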

24 October


Similar matrices   Recall that when we change coordinates, the two matrices representing a linear transformation L are related by M'L = S-1MLS. We say that an n×n matrix B is similar to an n×n matrix A if there exists an invertible matrix S such that B = S-1AS. Simple properties of similarity are listed below. We remark that these properties together make similarity an equivalence relation.

  • B is similar to A if and only if A is similar to B. (Thus, we can simply say that A and B are similar matrices.)
  • A is similar to A.
  • If A is similar to B, and B is similar to C, then A is similar to C.

Diagonalizable matrix A linear transformation L was defined to be diagonalizable if there was a basis relative to which its matrix ML was diagonal. If we regard an n×n matrix A as a linear transformation, then the condition for it to be diagonalizable is that there is a matrix S such that S-1AS is a diagonal matrix. Equivalently, A is diagonalizable if and only if it is similar to a diagonal matrix.

Determinants - A Quick Tour

Permutations Permutations of the integers 1 through n are either even or odd. A permutation is even if it can be achieved by an even number of interchanges (transpositions), and odd if it takes an odd number of them. There are n! permutations. We define the function sgn(p) to be +1 if p is an even permutation, and -1 if p is odd.

Definition of a determinant If A is an n×n matrix, then we define det(A) via
det(A) = SUMp sgn(p) ai1,1ai2,2 ...ain,n ,   p = (i1,i2,...,in)

Basic properties of determinants These properties follow immediately from the definition. On the other hand, they characterize the determinant: det is the only function of the columns that satisfies all three.

  1. Multilinearity. Let A =[a1,a2,...,an]. Then det([a1,a2,...,an]) is a linear function of column ak, if all other columns are held fixed.
  2. Alternating function. Interchanging two columns changes the sign of the determinant.
  3. det(I) = 1, I = identity matrix.

Determinants and matrices
  1. If two columns of A are equal, then det(A)=0.
  2. If a column of A has all 0's, then det(A)=0.
  3. Product rule
    If A and B are n×n matrices, then det(AB)=det(A)det(B).
  4. A is singular if and only if det(A) = 0.
  5. det(A-1) = (det(A))-1
  6. det(AT) = det(A)
  7. det([a1,a2,...,ak-1, ak + caj, ak+1,...,an]) = det([a1,a2,...,an]) (j not equal to k).
  8. det([a1,a2,...,cak, ...,an]) =c det([a1,a2,...,an])
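The permutation-sum definition can be coded directly for small n (a Python sketch; with n! terms it is purely illustrative, and np.linalg.det is the practical tool):

```python
from itertools import permutations
import numpy as np

def perm_sign(p):
    # sign of a permutation via its inversion count: (-1)^(# out-of-order pairs)
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det_by_definition(A):
    # det(A) = SUM_p sgn(p) a_{i1,1} a_{i2,2} ... a_{in,n}
    n = len(A)
    return sum(perm_sign(p) * np.prod([A[p[k], k] for k in range(n)])
               for p in permutations(range(n)))

A = np.array([[5.0, 1.0, 1.0],
              [1.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])
# det_by_definition(A) should agree with np.linalg.det(A).
```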

Characteristic polynomial
Structure pA(µ) = (-1)n µn + (-1)n-1 µn-1 trace(A) + ... + det(A), where trace(A) = a11+a22+...+ann
Degree The degree of the characteristic polynomial for an n×n matrix is precisely n.
Similar matrices If B = S-1AS, then pB(t) = pA(t).

A sufficient condition for a matrix to be diagonalizable

Theorem. If an n×n matrix A has n distinct eigenvalues, then A is diagonalizable.

Proof. Let µ1, µ2,..., µn be the n distinct (possibly complex) eigenvalues of A; equivalently, these are the roots of pA. Let v1, v2,..., vn be n eigenvectors corresponding to these n eigenvalues. We will show that these eigenvectors form a linearly independent set of vectors. We start with the equation
c1v1 + c2v2 + ... + cnvn = 0
Consider the matrix P1 = (µ2I - A)( µ3I- A)...(µnI - A). Using the fact that the vk's are eigenvectors of A, we have that
P1vk = 0 if k=2,3,...,n
P1v1 = (µ2 - µ1)( µ3- µ1)...(µn - µ1)v1,
so applying P1 to both sides of the previous equation results in
c12 - µ1)( µ3- µ1)...(µn - µ1)v1 = 0
Since v1 is not 0, and since the eigenvalues are distinct, we have that c1 = 0. Repeating this procedure with the remaining coefficients shows that all of the c's are 0, so the eigenvectors are linearly independent. Since there are n of them in the n-dimensional space, they form a basis of eigenvectors, and A is diagonalizable.

Selfadjoint matrices

Adjoints   The adjoint of a linear transformation L : V -> V, where V is an inner product space, is the unique linear transformation L' that satisfies
< L'[u],v > = < u, L[v] > .
For a real m×n matrix A, the adjoint is the transpose AT. If the matrix is complex, then the adjoint of A is AH, the conjugate transpose. A matrix is selfadjoint or Hermitian if it is equal to its own adjoint. Thus, if A is real, A = AT; i.e., A is symmetric.

Properties of selfadjoint matrices
  1. The eigenvalues of a selfadjoint matrix are real.
  2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  3. Every selfadjoint matrix is diagonalizable.
  4. Every selfadjoint matrix has an orthonormal basis relative to which it is diagonal. Equivalently, for a real n×n matrix, there is an orthogonal matrix S such that STA S is a diagonal matrix.

Quadratic forms Consider the quadratic form
Q(X) = 5x2 + 5y2 + 5z2 + 2xy + 2xz + 2yz = XTAX, where X = [x y z]T and A =
5 1 1
1 5 1
1 1 5
We wish to find coordinates relative to which the "cross terms" are removed. A set of axes that has this property is called principal. Since A is selfadjoint, it can be diagonalized by an orthogonal transformation S. That is, there is a diagonal matrix D such that D = STAS. Because S is orthogonal, we also have A = SDST. Now, let Y = [u v w]T = STX. We see that
XTAX = YTDY = µ1u2 + µ2v2 + µ3w2,
and the cross terms have been transformed away by rotating and (possibly) reflecting coordinates.

We now will finish the problem by finding the eigenvalues and eigenvectors of A. The characteristic polynomial of A is pA(µ) = det(A - µI) =
5-µ  1   1
1   5-µ  1
1    1   5-µ
We do not change the determinant by subtracting (5-µ)× column 2 from column 1. Thus, pA(µ) =
0         1       1
1-(5-µ)2   5-µ  1
1-(5-µ)    1   5-µ
We can factor 1-(5-µ) = µ-4 out of the first column, using 1-(5-µ)2 = (1-(5-µ))(1+(5-µ)) = (µ-4)(6-µ). The result, after simplifying, is that pA(µ) = (µ-4)×
0     1    1
6-µ   5-µ  1
1     1    5-µ
We now subtract the last column from the second. We obtain pA(µ) = (µ-4)×
0     0    1
6-µ   4-µ  1
1     µ-4  5-µ
Next, we factor µ-4 out of the second column to get pA(µ) = (µ-4)2×
0     0    1
6-µ  -1    1
1     1    5-µ
Evaluating the last 3×3 determinant is easily done: expanding along the first row gives (6-µ)·1 - (-1)·1 = 7-µ. The final result is
pA(µ) = (µ-4)2(7-µ)
Now we will find the eigenvectors corresponding to µ=4 and µ=7. For µ=4, the augmented matrix [A-4I|0] is
1 1 1 0
1 1 1 0
1 1 1 0
This system is equivalent to the single equation x+y+z=0. There are two linearly independent solutions:
[-1 1 0]T and [-1 0 1]T.
Any linear combination of these is still an eigenvector. Using the Gram-Schmidt process, we can convert this to an orthonormal set,
2^(-1/2)[-1 1 0]T and 6^(-1/2)[-1 -1 2]T
The eigenvector for µ=7 is found by solving the system with augmented matrix [A-7I|0]
-2 1 1 0
1 -2 1 0
1 1 -2 0
The resulting eigenvector is 3^(-1/2)[1 1 1]T. Finally, S =
-2^(-1/2)   -6^(-1/2)    3^(-1/2)
 2^(-1/2)   -6^(-1/2)    3^(-1/2)
 0           2·6^(-1/2)  3^(-1/2)
Relative to the new coordinates, the quadratic form becomes YTDY = 4u2+4v2+7w2.
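Numerically, the same diagonalization takes one call (Python/NumPy sketch):

```python
import numpy as np

A = np.array([[5.0, 1.0, 1.0],
              [1.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])
mu, S = np.linalg.eigh(A)              # orthogonal S, eigenvalues in ascending order
D = S.T @ A @ S
# mu should be [4, 4, 7], and D should be diag(mu): Q becomes 4u^2 + 4v^2 + 7w^2.
```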

29 October

Triangular forms of matrices

Motivation We return to the circuit we were working with earlier. (See the class notes for October 22.) Assume that the various components have these values:
E(t) = 0, L =1 henry, C = 1 farad, R1 = 1 ohm, R2 = 3 ohm.
The matrix equation for the circuit is dX/dt = AX + F(t), where X = [I V]T. With these values for the components, F(t) = 0, and
A =
-3  1
-1 -1
The characteristic polynomial for this matrix is pA(µ) = µ2 + 4µ + 4 = (µ+2)2. Thus, there is only one eigenvalue, µ = -2. The augmented matrix representing the system (A -µI)X = 0 is
-1   1  0
-1   1  0
This is equivalent to the single equation -x1 + x2 = 0. Hence, up to nonzero multiples, the eigenvector for µ = -2 is [1 1]T. There are no other linearly independent eigenvectors, and so A is not diagonalizable. What this means is that the system of equations doesn't completely decouple. It does partially decouple, though. Consider a new basis B' = {[1 1]T, [0 1]T}. As before, let S = SB'->B =
1  0
1  1
Setting Z = S-1X, the system becomes dZ/dt = S-1ASZ. Doing a little matrix algebra, we see that S-1AS =
-2  1
 0 -2
This is called the Jordan normal (canonical) form of A. The new system is
dz1/dt = -2z1 + z2
dz2/dt = -2z2
Solving the second equation gives us z2(t) = e-2t z2(0). Substituting this into the first equation results in a nonhomogeneous, linear, first order differential equation,
dz1/dt = -2z1 + e-2t z2(0).
We can solve this using integrating factors. (See an ODE text for an explanation). The solution is
z1(t) = e-2t z1(0) + te-2t z2(0)
Finally, X(t) = ST(t)S-1X(0), where T(t) =
e-2t  te-2t
0     e-2t
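A Python/NumPy check of the Jordan computation (a sketch; the test time and initial state are arbitrary):

```python
import numpy as np

A = np.array([[-3.0,  1.0],
              [-1.0, -1.0]])
S = np.array([[1.0, 0.0],
              [1.0, 1.0]])             # columns: the eigenvector [1 1]^T and [0 1]^T
J = np.linalg.solve(S, A @ S)          # should be the Jordan form [[-2, 1], [0, -2]]

def X(t, X0):
    e = np.exp(-2 * t)
    T = np.array([[e, t * e],
                  [0.0, e]])
    return S @ T @ np.linalg.solve(S, X0)
```

A difference quotient of X should approximate A X, confirming that the formula solves the circuit equation dX/dt = AX.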

Block upper triangular form   For every matrix A there is a basis relative to which its matrix is in block triangular form; that is, we can find an invertible matrix S such that S-1AS =
T1,1 0 0 ... 0
0 T2,2 0 ... 0
... ... ... ... ...
0 0 0 ...Tr,r
Each diagonal block Tk,k is upper triangular, with the diagonal entries all being an eigenvalue µk repeated as many times as it is a root of the characteristic polynomial. For example, if µ7 is repeated four times, then T7,7 =
µ7 * * *
0 µ7 * *
0 0 µ7 *
0 0 0 µ7

Jordan normal (canonical) form   For every matrix A there is a basis relative to which the blocks Tk,k are Jordan blocks, Jm(µ). This is an m×m matrix with µ's down the diagonal, 1's down the superdiagonal, and 0's elsewhere. For example, if m = 6 and µ = 3, then J6(3) =
3 1 0 0 0 0
0 3 1 0 0 0
0 0 3 1 0 0
0 0 0 3 1 0
0 0 0 0 3 1
0 0 0 0 0 3
Two matrices having the same Jordan normal form, apart from the ordering of the blocks along the diagonal, are similar. An m×m matrix A is similar to Jm(µ) if and only if there is a basis {f1, ..., fm} satisfying

Af1 = µf1
Af2 = µf2 + f1
...
Afm = µfm + fm-1

Example   Consider the matrix A =
2  1 -1
0  2  3
0  0  2
We begin with the eigenvector, f1 = (1,0,0)T. Solving (A - 2I)f2 = f1 gives f2 = (0,1,0)T. Finally, solving (A - 2I)f3 = f2 gives f3 = (0,1/3,1/3)T. Thus, S-1AS = J3(2), where S = [f1, f2, f3].
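The chain conditions and the similarity S^{-1}AS = J3(2) are easy to confirm numerically (Python/NumPy sketch):

```python
import numpy as np

A = np.array([[2.0, 1.0, -1.0],
              [0.0, 2.0,  3.0],
              [0.0, 0.0,  2.0]])
f1 = np.array([1.0, 0.0, 0.0])
f2 = np.array([0.0, 1.0, 0.0])
f3 = np.array([0.0, 1.0 / 3.0, 1.0 / 3.0])

N = A - 2 * np.eye(3)                  # chain: N f1 = 0, N f2 = f1, N f3 = f2
S = np.column_stack([f1, f2, f3])
J = np.linalg.solve(S, A @ S)          # should be J_3(2)
```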

Midterm review

31 October

5 November

Invariance Physical laws should be formulated in ways that are independent of any coordinates used; that is, the laws should be stated in such a way that they are invariant under transformation of coordinates. For scalar quantities this has a simple meaning. Suppose that we know the temperature T(P) on a surface in x-coordinates, so that T = f(x). Then in another coordinate system, x = g(x'), the temperature is given by T = f(g(x')). The law of transformation is simply composition of functions (substitution).

Vectors Vectors are displacements in ordinary three dimensional space, or physical quantities, like velocity, acceleration, force, and so on, that can be represented by such displacements. Invariance means describing these vectors in a way that is independent of the choice of coordinates. Indeed, tensor algebra can be viewed as linear algebra for such displacements.

Affine coordinates We will begin with an affine coordinate system in 3D space. This simply means that the axes we use for coordinates are three lines that intersect at the origin and that do not lie in the same plane. These are like the usual x-y-z axes, except that they do not need to be perpendicular. We label these axes as x1-x2-x3. A point in space P then has coordinates P(x1, x2, x3). If another point Q has coordinates
Q(x1+dx1, x2+dx2, x3+dx3),
then the displacement dx = PQ is described by the column vector [dx1 dx2 dx3]T relative to a basis of displacements corresponding to the x1-x2-x3 system. In particular, the three displacements fj corresponding to PQj, where the Qj's are given by
Q1(x1+1, x2, x3),
Q2(x1, x2+1, x3),
Q3(x1, x2, x3+1),
provide a basis for the 3D displacements, as long as we are using an affine system of coordinates. We let B = {f1,f2,f3} be the basis formed by these displacements.

Transformation laws

Change of coordinates We now consider a change to another affine coordinate system with axes x'1-x'2-x'3. The new coordinates are related to the old ones by
x'1 = J1,1x1 + J1,2x2 + J1,3x3 + b1
x'2 = J2,1x1 + J2,2x2 + J2,3x3 + b2
x'3 = J3,1x1 + J3,2x2 + J3,3x3 + b3.
For future reference, note that the 3×3 matrix J, which is called the Jacobian matrix of the coordinate transformation, has entries given by
Jj,k = ∂x'j/∂xk,
which are constants for an affine change of coordinates.
Relative to the primed system, we have a basis B' = {f1',f2',f3'}. The new components for the displacement dx are related to the old via these equations:
dx'1 = J1,1dx1 + J1,2dx2 + J1,3dx3
dx'2 = J2,1dx1 + J2,2dx2 + J2,3dx3
dx'3 = J3,1dx1 + J3,2dx2 + J3,3dx3.
In matrix form, this becomes [dx]B' = J[dx]B. Earlier, we developed formulas for change of bases. The notation we are using here corresponds to the earlier notation in the following way:
B <-> B,   fk <-> vk
B' <-> D,   fk' <-> wk
[dx]B' = J[dx]B <-> [v]D = A[v]B; thus, J <-> A and J-1 <-> C = A-1
fk = J1,kf1' + J2,kf2' + J3,kf3' <-> vj=A1jw1 + A2jw2 + ... + Anjwn   (JT <-> AT)
fk' = (J-1)1,kf1 + (J-1)2,kf2 + (J-1)3,kf3 <-> wk=C1kv1 + C2kv2 + ... + Cnkvn   ( (J-1)T <-> CT )

Dual space The dual space to the space of 3D displacements includes physical quantities such as the work done by a force in displacing a mass through dx. Recall that the dual space is composed of linear functions that take vectors to scalars. For the basis B = {f1,f2,f3}, the dual basis B* = {f^1,f^2,f^3} satisfies f^j(fj) = 1 and f^k(fj) = 0 for j not equal to k. Let L be a linear functional on 3D displacements. Expand L using B* to get
L = y1f^1 + y2f^2 + y3f^3,
and thus the coordinate vector for L relative to B* is
[L]B* = [y1 y2 y3]T.
Recall that if dx is a displacement (i.e., a vector), then
L(dx) = [L]B*T[dx]B
Now, L(dx) is independent of our choice of bases, so if we change everything to a new basis B', then we will have
L(dx) = [L]B'*T[dx]B'.
Since [dx]B' = J[dx]B, we see that
[L]B*T[dx]B = [L]B'*TJ[dx]B = (JT[L]B'*)T [dx]B,
which holds for all displacements dx. Hence, ([L]B* - JT[L]B'*)T[dx]B = 0 holds for all [dx]B. By choosing appropriate values for [dx]B, we obtain
[L]B* = JT[L]B'*.
Finally, we arrive at the transformation law for [L]B*:
[L]B'* = (JT)-1[L]B*.
This also provides us with the transformation law for the dual basis itself.
fj' = Jj,1f1 + Jj,2f2 + Jj,3f3, j=1,2,3.
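These transformation laws are easy to test numerically. In the Python sketch below, the matrix J, its hand-computed inverse, and all component values are invented for illustration; the point is only that the scalar L(dx) is unchanged when displacement components transform by J and dual components by (JT)-1:

```python
# Invariance check: [dx]' = J [dx] and [L]' = (J^T)^{-1} [L] = (J^{-1})^T [L]
# leave the pairing L(dx) = [L]^T [dx] unchanged.

J    = [[2.0, 0.0, 0.0],
        [1.0, 1.0, 0.0],
        [0.0, 0.0, 3.0]]
Jinv = [[0.5,  0.0, 0.0],          # inverse of J, computed by hand
        [-0.5, 1.0, 0.0],
        [0.0,  0.0, 1.0 / 3.0]]

def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[k][i] for k in range(3)] for i in range(3)]

# sanity check that Jinv really inverts J
for i in range(3):
    for j in range(3):
        prod = sum(J[i][k] * Jinv[k][j] for k in range(3))
        assert abs(prod - (1.0 if i == j else 0.0)) < 1e-12

dx = [1.0, -2.0, 0.5]              # [dx]_B   (arbitrary components)
L  = [3.0, 0.25, -1.0]             # [L]_B*   (arbitrary components)

dx_p = matvec(J, dx)               # [dx]_B'  = J [dx]_B
L_p  = matvec(transpose(Jinv), L)  # [L]_B'*  = (J^T)^{-1} [L]_B*

before = sum(L[i] * dx[i] for i in range(3))
after  = sum(L_p[i] * dx_p[i] for i in range(3))
assert abs(before - after) < 1e-12
```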

Summary In the table below, we list the transformation laws for bases and components. In all cases, the coefficients refer to equating primed quantities to linear combinations of unprimed quantities -- e.g., [dx]B' = J[dx]B.
  Displacements Dual space
Basis (JT)-1 J
Components J (JT)-1
Looking at the table above, we see that there are two types of transformation laws. Quantities that transform in the same way as the basis vectors in B are called covariant. The components of displacements transform in a way roughly inverse to the basis; for this reason, they are called contravariant. The vectors in the dual basis are contravariant, and the components of dual vectors transform covariantly. Covariant quantities are denoted by subscripts, and contravariant quantities by superscripts. Thus the basis vector fj is covariant, while the dual basis vector fj is contravariant.

The metric tensor

Distance The distance between two points in space is the square root of
ds2 = dx·dx,
where the "dot" denotes the usual dot product in 3D. We want to put this in terms of components. It is in fact a little easier to look at the dot product of two displacements, dx·dy, where
dx = dx1f1 + dx2f2 + dx3f3 and dy = dy1f1 + dy2f2 + dy3f3
Taking the dot product of these vectors results in a quadratic form with nine terms,
dx· dy = f1·f1 dx1dy1 + f2·f2 dx2dy2 + f3·f3 dx3dy3 + f1·f2 dx1dy2 + f2·f1 dx2dy1 +
f1·f3 dx1dy3 + f3·f1 dx3dy1 + f2·f3 dx2dy3 + f3·f2 dx3dy2.
We want to put this in matrix form. Define the matrix g with entries
gj,k = fj·fk.
The result is that we can write dx·dy as
dx·dy = [dy]BTg[dx]B
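As a quick sanity check, the following Python sketch (with an invented skew basis for R3) builds the Gram matrix g and confirms that dx·dy computed directly agrees with [dy]BTg[dx]B:

```python
# Metric tensor as a Gram matrix: g[j][k] = fj . fk, and
# dx . dy = [dy]^T g [dx] for components taken relative to B.

f1, f2, f3 = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 2.0)
basis = [f1, f2, f3]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Gram matrix of the basis
g = [[dot(basis[j], basis[k]) for k in range(3)] for j in range(3)]

# components of dx and dy relative to B (chosen arbitrarily)
dx_B = [0.5, -1.0, 2.0]
dy_B = [1.0, 0.25, -0.5]

# assemble the actual vectors in Cartesian coordinates
dx = tuple(sum(dx_B[j] * basis[j][i] for j in range(3)) for i in range(3))
dy = tuple(sum(dy_B[j] * basis[j][i] for j in range(3)) for i in range(3))

# [dy]^T g [dx]
quad = sum(dy_B[j] * g[j][k] * dx_B[k] for j in range(3) for k in range(3))

assert abs(dot(dx, dy) - quad) < 1e-12
```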

Transformation laws The matrix g contains the components of the metric tensor relative to the basis B. Using the invariance of the dot product or the transformation laws for basis vectors, one can easily show that
g'=(JT)-1g J-1.
Equivalently, we can write out the transformation laws for the components:
g'j,k = SUMm,ngm,n (JT)-1m,j (JT)-1n,k

Simple properties of the metric tensor g We close by noting that the matrix g is actually a Gram matrix. As such, it is invertible and symmetric. In addition, it is positive definite: all of its eigenvalues are positive.

7 November

Reciprocal bases and dual bases

Reciprocal basis Let B = {f1,f2,f3} be a basis for the displacements in 3D. We want to define a new basis for the displacements, one that behaves like a dual basis but that is still in 3D. We are looking for vectors {f1,f2,f3} with the property that
fj·fj = 1
fj·fk = 0, if j is not equal to k. The vectors of any basis can be written as linear combinations of the vectors in B. That is, there will be coefficients aj,k such that for j=1,2,3,
fj = aj,1f1 + aj,2f2 + aj,3f3
In particular, we have
fj·fk = aj,1f1·fk + aj,2f2·fk + aj,3f3·fk
Now, recall that the metric tensor gj,k = fj·fk. Hence, we may rewrite the set of equations above as
fj·fk = aj,1g1,k + aj,2g2,k + aj,3g3,k
The sum above is just the j,k component of the matrix product ag. The conditions for a reciprocal basis will be satisfied if and only if [ag]jj = 1 and [ag]jk = 0 for j not equal to k. This means that ag = I, the identity matrix. Hence, the new basis will be reciprocal if and only if a = g-1. The matrix g-1 gives rise to a new tensor that is called the conjugate of the metric tensor g. The components of g-1 will be denoted by
gj,k .
The reciprocal basis it generates is then written as
fj = gj,1f1 + gj,2f2 + gj,3f3
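The construction can be checked numerically. In the Python sketch below (the basis vectors are invented), we form g, invert it by cofactors, build the reciprocal vectors from the rows of g-1, and verify fj·fk = δjk:

```python
# Reciprocal basis via the conjugate metric tensor: f^j = sum_k g^{j,k} f_k.

f = [(1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (0.0, 1.0, 1.0)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

g = [[dot(f[j], f[k]) for k in range(3)] for j in range(3)]

def inv3(A):
    # 3x3 inverse via the adjugate (cofactor) formula
    a, b, c = A[0]; d, e, h = A[1]; p, q, r = A[2]
    det = a*(e*r - h*q) - b*(d*r - h*p) + c*(d*q - e*p)
    adj = [[ (e*r - h*q), -(b*r - c*q),  (b*h - c*e)],
           [-(d*r - h*p),  (a*r - c*p), -(a*h - c*d)],
           [ (d*q - e*p), -(a*q - b*p),  (a*e - b*d)]]
    return [[adj[i][j] / det for j in range(3)] for i in range(3)]

ginv = inv3(g)

# reciprocal vectors f^j = sum_k ginv[j][k] f_k
frec = [tuple(sum(ginv[j][k] * f[k][i] for k in range(3)) for i in range(3))
        for j in range(3)]

# verify the defining property f^j . f_k = delta_{jk}
for j in range(3):
    for k in range(3):
        want = 1.0 if j == k else 0.0
        assert abs(dot(frec[j], f[k]) - want) < 1e-12
```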

Connection with dual space Let L be a linear functional on the space of displacements. Recall that we can write L in terms of the dual basis; its components are then Lk. The result of applying L to dx is
L(dx) = [L]B*T[dx]B = L1 dx1 + L2 dx2 + L3 dx3
This is precisely the same result we would get from L·dx if we regarded L as a displacement, with reciprocal basis representation
L = L1f1 + L2f2 + L3f3.
The point is that linear functionals in the dual space can thus be identified with displacements. In addition, the dual basis may be identified with the reciprocal basis. This also means that if the underlying basis B is changed to B', then the dual basis B* and the reciprocal basis Brecip transform in exactly the same way to B'* and B'recip.

Contravariant and covariant components of a vector

Two ways of representing vectors We can represent a displacement vector v in two ways. First, we can use the basis B = {f1,f2,f3}. This gives us the following expression:
v = v1f1 + v2f2 + v3f3, and so [v]B = [v1 v2 v3]T
The second way is to use the reciprocal basis Brecip = {f1,f2,f3}, which results in
v = v1f1 + v2f2 + v3f3, and so [v]Brecip = [v1 v2 v3]T.
The relationship between the components is determined by the change-of-basis matrix that relates B and Brecip. As we have seen above, the reciprocal basis is expressed in terms of B using the entries of g-1. The formulas derived earlier for making a change of basis apply here, too. We identify B here with the B used earlier, and Brecip with the basis D. Thus we identify g-1 with the matrix CT used earlier, and we obtain [v]B = (g-1)T[v]Brecip. However, both g and g-1 are symmetric, so that in fact we have
[v]B = g-1[v]Brecip or, equivalently, [v]Brecip = g[v]B

Transformation laws If we introduce a coordinate transformation that introduces a new basis B', then we have seen that the metric tensor transforms according to the law g'=(JT)-1g J-1 and that [v]B' = J[v]B. It follows that
[v]B'recip = g'[v]B' = (JT)-1g J-1J[v]B = (JT)-1g[v]B = (JT)-1[v]Brecip.
Thus the components relative to the reciprocal basis transform covariantly.

Representations of a Vector v
  Original Basis Reciprocal Basis
Representation   v = v1f1+ v2f2+ v3f3   v = v1f1+ v2f2+ v3f3
Components   v^j = f^j·v = SUMk g^{j,k} v_k   v_j = f_j·v = SUMk g_{j,k} v^k
Transformation matrix J (JT)-1
Transformation law Contravariant Covariant

The Inertia Tensor
See §2.4.3 in Borisenko and Tarapov.

12 November

Examples of Tensors

The stress tensor See §2.4.2 in Borisenko and Tarapov.

The deformation (strain) tensor See §2.4.4 in Borisenko and Tarapov.

Curvilinear coordinates

Generalized coordinates Examples: cylindrical coordinates and spherical coordinates. See §2.8 in Borisenko and Tarapov.

Coordinate surfaces Borisenko and Tarapov, §2.8.1.

Coordinate curves Borisenko and Tarapov, §2.8.2.

14 November

Bases and reciprocal bases in generalized coordinates

Basis vectors Let qj = qj(x1, x2, x3), j=1,2,3, be a set of generalized coordinates. The xj's are cartesian coordinates. There are three coordinate curves associated with the qj's. If x is the usual radius vector to a point in three dimensional space, then x = x(q1, q2, q3), and the three curves are
  1. x = x(t, q2, q3)   (q1-curve)
  2. x = x(q1, t, q3)   (q2-curve)
  3. x = x(q1, q2, t)   (q3-curve)
Let's look at the q1-curve. The velocity vector tangent to this curve at any time t is dx/dt. This can be computed using the chain rule:
dx/dt = ∂x/∂q1.
Set t = q1. This gives us our first basis vector, e1. The others are defined in the same way. That is, for j=1,2,3, we set
ej = ∂x/∂qj.
Together, these form a basis for the 3D displacements. The basis vectors, however, do depend on the point described by the coordinates q1, q2, q3. This means that a different basis is associated with each point in three dimensional space.

Reciprocal basis vectors The reciprocal basis for {e1, e2, e3} can be obtained via the cross product. For example,
e1 = W-1e2×e3,   W = e1·e2 ×e3.
The others are defined by cyclic permutation of the indices involved. (W stays the same, of course. See problem 4 in Assignment 4.)

There is another way to view the reciprocal basis vectors, a way that is similar to viewing basis vectors as tangent vectors to the coordinate curves. Let F(x1, x2, x3) = C be the level surfaces for a function F. Recall that at a point P(x1, x2, x3), the vector ∇F is normal to the plane tangent to F=C at P. Applying this to the coordinate surface q3(x1, x2, x3) = c3 = constant, we see that ∇q3 is perpendicular to the tangent vectors to the q1 and q2 coordinate curves. Since these tangent vectors are precisely the two basis vectors e1 and e2, it follows that ∇q3 is parallel to e1×e2 and hence to e3. In fact, we will show that they are equal.

Proposition For j=1,2,3, ∇qj = ej

Proof: We will show the case j=3. The others are identical. The q3 coordinate curve through a point P is given by x = x(q1, q2, t). On this curve, we have
q3(x) = t.
If we differentiate both sides with respect to t and use the chain rule, we will get
∇q3·dx/dt = 1.
Since e3 = dx/dt in this case, we have that ∇q3·e3 = 1. As we have already mentioned, ∇q3·e1 = 0 and ∇q3·e2 = 0. These are precisely the conditions defining e3, so ∇q3 = e3. This completes the proof.

The metric tensor and volume element We remark that the metric tensor is obtained from the dot product of two displacements dx·dy, relative to the basis B = {e1, e2, e3}. The result is dx·dy = [dy]BTg [dx]B, where the metric tensor g has components gj,k = ej·ek. From problem 5c, Assignment 9, we have that the volume element is given by
dV = G½dq1dq2dq3, where G = det(g).

Cylindrical coordinates In cylindrical coordinates, (r,θ,z), we have that
er = cos(θ)i + sin(θ)j
eθ = -r sin(θ)i + r cos(θ)j
ez = k
From the proposition we proved above, we can calculate the reciprocal basis vectors.
er = ∇r =(x/r)i + (y/r)j = cos(θ)i + sin(θ)j =er.
Similarly, we have
eθ = ∇θ = - (y/r2)i + (x/r2)j = r-2eθ
ez = ez = k.
The metric tensor is
g = [ 1  0  0 ]
    [ 0  r2  0 ]
    [ 0  0  1 ]
This is frequently written as a quadratic form giving the square of the arc length element ds:
ds2 = dr2 + r22 + dz2.
Finally, since G = det(g) = r2, we see that the volume element is dV = rdrdθdz.
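A short Python check (the sample point r = 2, θ = 0.6 is arbitrary) confirms that the cylindrical basis vectors above really do give g = diag(1, r2, 1) and G½ = r:

```python
import math

# Cylindrical metric check at an arbitrary point (r, theta).
r, th = 2.0, 0.6
e_r  = (math.cos(th), math.sin(th), 0.0)
e_th = (-r * math.sin(th), r * math.cos(th), 0.0)
e_z  = (0.0, 0.0, 1.0)
basis = [e_r, e_th, e_z]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Gram matrix g[j][k] = e_j . e_k
g = [[dot(basis[j], basis[k]) for k in range(3)] for j in range(3)]

expected = [[1.0, 0.0, 0.0], [0.0, r**2, 0.0], [0.0, 0.0, 1.0]]
for j in range(3):
    for k in range(3):
        assert abs(g[j][k] - expected[j][k]) < 1e-12

# G = det g = r^2 (g is diagonal here), so dV = r dr dtheta dz
G = g[0][0] * g[1][1] * g[2][2]
assert abs(math.sqrt(G) - r) < 1e-12
```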

19 November

Summary of vectors and tensors in a fixed coordinate system

The basis B Points in 3D are described by coordinates (q1,q2,q3). The radius vector x = x(q1,q2,q3). The three vectors derived from the radius vector via ej = ∂x/∂qj form a basis B = {e1, e2, e3}. This basis may vary from point to point in space. Here we are concerned with only what happens at a single point.
Here is a 2D example. Suppose we have x = (q1 + q2)i + ( (q1)2 - q2)j. The basis for this case is
e1 = ∂x/∂q1 = i + 2q1j
e2 = ∂x/∂q2 = i - j

The metric tensor Recall that ds2 = [dx]BTg[dx]B = ∑gj,kdqjdqk, where gj,k = ej·ek. The inverse of this matrix, g-1, is also important. We will denote its entries by gj,k. We point out that g is also the Gram matrix for the basis B.

In the 2D example,
g = [ 1+4(q1)2   1-2q1 ]
    [ 1-2q1     2     ]
and
g-1 = (1+2q1)-2 × [ 2       2q1-1     ]
                  [ 2q1-1   1+4(q1)2 ]

The reciprocal basis Br The reciprocal basis Br = {e1, e2, e3} may be constructed in three different ways.
  1. Gradient ej = ∇ qj.
  2. Cross product e1 = W-1e2×e3,   W = e1·e2 ×e3, where the others are defined by cyclic permutation of the indices involved.
  3. Metric tensor We may directly use the definition ej·ek = δjk to find the reciprocal vectors in terms of the basis B. The result is simply that
    ej = ∑ gj,mem.
    This is easy to verify:
    ej·ek = ∑ gj,mem·ek = ∑ gj,m gm,k = [g-1g] jk = δjk

Returning to our 2D example, far and away the easiest method is the third one above. The reciprocal basis for this case is
e1 = 2(1+2q1)-2e1 + (2q1-1)(1+2q1)-2e2
e2 = (2q1-1)(1+2q1)-2e1 + (1+4(q1)2)(1 + 2q1)-2e2
These can also be written in terms of {i,j}.

Vectors & components We can represent any vector (displacement) v using B-coordinates or Br coordinates. That is,
v = v1e1 + v2e2 + v3e3  ( [v]B = [v1 v2 v3]T, contravariant)
v = v1e1 + v2e2 + v3e3  ( [v]Br = [v1 v2 v3]T, covariant)
The two column vectors are related via the equations
[v]Br = g[v]B and [v]B = g-1[v]Br
Tensor notation directly writes out the matrix products:
v_j = ∑k g_{j,k} v^k and v^j = ∑k g^{j,k} v_k

Back to the 2D example. Using the g and g-1 from before, we can write covariant components in terms of contravariant ones,
v1 = (1+4(q1)2)v1 + (1-2q1)v2
v2 = (1-2q1)v1 + 2v2 ,
or the contravariant components in terms of the covariant components,
v1 = 2(1+2q1)-2v1 + (1+2q1)-2(2q1-1)v2
v2 = (1+2q1)-2(2q1-1)v1 + (1+2q1)-2(1+4(q1)2)v2
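A numeric round trip at an arbitrary sample point q1 = 0.7 confirms that lowering components with g and then raising them with g-1 recovers the original contravariant components:

```python
# Round trip with the 2D example metric: v_j = sum_k g_{jk} v^k, then
# v^j = sum_k g^{jk} v_k should return the original components.

q1 = 0.7
g = [[1 + 4 * q1**2, 1 - 2 * q1],
     [1 - 2 * q1, 2.0]]

det = (1 + 2 * q1) ** 2            # det g = (1 + 2 q1)^2, as in the notes
ginv = [[ g[1][1] / det, -g[0][1] / det],
        [-g[1][0] / det,  g[0][0] / det]]

v_up = [1.5, -0.25]                # contravariant components (arbitrary)
v_dn = [sum(g[j][k] * v_up[k] for k in range(2)) for j in range(2)]      # lower
v_back = [sum(ginv[j][k] * v_dn[k] for k in range(2)) for j in range(2)]  # raise

for j in range(2):
    assert abs(v_back[j] - v_up[j]) < 1e-12
```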

Tensors & components We regard tensors as linear transformations on the spaces associated with 3D displacements. Because most problems involve only linear transformations that take vectors to vectors, we will concentrate on these. To that end, suppose that T is a linear transformation that takes 3D displacements to 3D displacements; that is,
T(v) = w
Recall that we can represent T by a matrix, given bases for the inputs and the outputs. We list these below.

Input basis    Output basis    Matrix  Column k    (j,k)-entry  Tensor type 
B B M [T(e_k)]B T^j_k Mixed
B Br N [T(e_k)]Br T_{j k} Covariant
Br Br P [T(e^k)]Br T_j^k Mixed
Br B Q [T(e^k)]B T^{j k} Contravariant

The names under the heading matrix are arbitrary labels. They are used only here and nowhere else. Using the change of basis formulas from the previous section, we can write all of the matrices in terms of M, g, and g-1.
N = gM and T_{j k} = ∑m g_{j m} T^m_k
P = gMg-1 and T_j^k = ∑m,n g_{j m} g^{k n} T^m_n
Q = Mg-1 and T^{j k} = ∑m g^{k m} T^j_m
The point is that once one set of components is determined, so are all of the rest.
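Here is a minimal numeric illustration of that point (the 2×2 matrices g and M are invented): lowering an index via N = gM and raising it again via g-1N recovers the mixed components M:

```python
# Lower an index with g, raise it again with g^{-1}: M = g^{-1}(g M).

g    = [[2.0, 1.0], [1.0, 2.0]]
ginv = [[2/3, -1/3], [-1/3, 2/3]]   # inverse of g (det g = 3)

M = [[1.0, -2.0], [0.5, 4.0]]       # mixed components T^j_k (arbitrary)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

N = matmul(g, M)                    # covariant components T_{jk}
M_back = matmul(ginv, N)            # back to mixed components

for i in range(2):
    for j in range(2):
        assert abs(M_back[i][j] - M[i][j]) < 1e-12
```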

Higher order tensors The order or rank of a tensor is the number of indices required to specify its components. The tensors described above are all order 2 tensors. Vectors are order 1 tensors, and scalars are order 0 tensors. An order 4 tensor that arises in elasticity theory relates the stress and strain (deformation) tensors via a generalized Hooke's law. (See pg. 209, J. L. Synge and A. Schild, Tensor Calculus, Dover, New York, 1978.) The purely contravariant form of such a tensor would have four superscript indices, Tijkl.
To change to a mixed form where the third index is covariant but the others are contravariant, one need only multiply by gm k and sum over k. This results in lowering the third index.
Tijml = ∑k gm k Tijkl
To move to a completely covariant form, one uses the same operation on all indices,
Tmnpq = ∑ijkl gm i gn j gp k gq l Tijkl
The process may be reversed. So if we start with the third index being covariant and the remaining contravariant, we can raise the index as follows:
Tijkl = ∑m gk mTijml.

Summary of vectors and tensors under coordinate transformations

The bases B′ and B′r The setting here is that we start with underlying coordinates (q1, q2, q3) and make a change to the set (q'1, q'2, q'3). The key in all of this is to first find out how the coordinate vectors for B and B' are related. This is derived from the Jacobian matrix of the transformation of coordinates. Namely, we have
dq′ j = ∑(∂q′ j/∂qk) dqk.
In terms of matrices, this equation means that displacements transform this way:
[dx]B′ = ∂q′/∂q [dx]B.
Note that by reversing the roles of q and q′, we also have
[dx]B = ∂q/∂q′ [dx]B′.
Of course, the matrices ∂q′/∂q and ∂q/∂q′ are actually inverses of each other. We have derived the relationships among various components of vectors earlier. In tensor notation, these are
v′j = ∑∂q k/∂q′ j vk (equivalently, [v]B′r = (∂q/∂q′)T [v]Br)
ej = ∑∂q k/∂q′ j ek
e j = ∑∂q′ j/∂qk ek

Tensors of order 2 We can now look at how order 2 tensors transform. In particular, let us see what happens to the matrix representing T relative to the input and output bases B. This was the matrix M with mixed components
Tj k .
Changing to the B′ system, we have from our standard change-of-basis methods
M' = ∂q′/∂q M (∂q′/∂q )-1 = ∂q′/∂q M ∂q/∂q′
It then follows that
T′ j k = ∑(∂q′ j/∂qm) (∂q n/∂q′ k) Tm n
One also has that these hold:
N′ = (∂q/∂q′)T N ∂q/∂q′ and T′j k = ∑(∂q m/∂q′ j)(∂q n/∂q′ k) Tm n
P′ = (∂q/∂q′)T P (∂q′/∂q)T and T′j k = ∑(∂q m/∂q′ j) (∂q′ k/∂qn) Tm n
Q′ = ∂q′/∂q Q (∂q′/∂q)T and T′ j k = ∑(∂q′ j/∂qm)(∂q′ k/∂qn) Tm n

Higher order tensors Consider a rank 4 tensor Tijml. Using this tensor, we will illustrate how the components change if we change coordinates to q′ j. The result is
T′ ijkl = ∑ (∂q′ i/∂qm) (∂q′ j/∂qn) (∂q p/∂q′ k) (∂q′ l/∂qr) T mnpr
Other cases are treated analogously.

The gradient operator

The differential of a scalar quantity Temperature is a good example of a scalar quantity. To avoid notational problems, we will use τ to designate it. We certainly know that the temperature can vary from point to point in a region. We want to measure how much it changes if we move from a given point x to a nearby point x+dx. Recall from several variable calculus that
τ(x+dx) = τ(x) + ∇τ(x)·dx + o(|dx|)
where o(|dx|) represents terms that vanish faster than |dx|. The linear term
dτ = ∇τ·dx
is called the differential of τ at x, and ∇τ = ∇τ(x) is the gradient of τ.

The gradient in generalized coordinates Let us introduce generalized coordinates, x = x(q1,q2,q3). Thus we may regard τ as a function of the q's. Again appealing to several variable calculus, we have
dτ = ∑j (∂τ/∂qj) dqj
   = ∑jk (∂τ/∂qj) dqk δjk
   = ∑jk (∂τ/∂qj) dqk ej·ek
   = (∑j(∂τ/∂qj)ej) · (∑kekdqk)
   = (∑j(∂τ/∂qj)ej) ·dx
From this, we see that the gradient has the following expression in generalized coordinates:
∇τ = ∑j(∂τ/∂qj)ej.
For example, in cylindrical coordinates, we would have
∇τ = ∂τ/∂r er + ∂τ/∂θ eθ + ∂τ/∂z ez.
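The formula ∇τ = ∑(∂τ/∂qj)ej can be spot checked in cylindrical coordinates. In the Python sketch below, the test function τ = x2y and the sample point are arbitrary choices; the gradient assembled from the reciprocal basis agrees with the Cartesian gradient:

```python
import math

# Check grad tau = tau_r e^r + tau_theta e^theta + tau_z e^z against the
# Cartesian gradient for tau = x^2 y at r = 1.5, theta = 0.4, z = 0.
r, th = 1.5, 0.4
x, y = r * math.cos(th), r * math.sin(th)

# Cartesian gradient of tau = x^2 y
grad_cart = (2 * x * y, x**2, 0.0)

# partials of tau(r, theta) = r^3 cos^2(theta) sin(theta)
tau_r  = 3 * r**2 * math.cos(th)**2 * math.sin(th)
tau_th = r**3 * (math.cos(th)**3 - 2 * math.cos(th) * math.sin(th)**2)

# reciprocal basis vectors (from the notes: e^r = e_r, e^theta = r^-2 e_theta)
e_r_rec  = (math.cos(th), math.sin(th), 0.0)
e_th_rec = (-math.sin(th) / r, math.cos(th) / r, 0.0)

grad_cyl = tuple(tau_r * a + tau_th * b for a, b in zip(e_r_rec, e_th_rec))

for u, v in zip(grad_cart, grad_cyl):
    assert abs(u - v) < 1e-12
```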

21 November

Line integrals

Curves and Green's Theorem  To state Green's theorem, we need to discuss simple, closed curves. These are closed curves, like circles, that do not intersect themselves. Rectangles, triangles, circles, and ellipses are simple closed curves; figure eights are not. A simple closed curve divides the plane into two nonoverlapping regions, one interior and the other exterior, and it forms the boundary of both regions. We will consider simple closed curves that are piecewise smooth, which just means that we are allowing a finite number of corners. We also say that a simple closed curve is positively oriented if it is traversed in the counterclockwise direction. Here is the statement of Green's Theorem:
Green's Theorem Let C be a piecewise smooth, positively oriented, simple closed curve that is the boundary of its interior region R. If F(x,y) = A(x,y)i + B(x,y)j is a vector-valued function that is continuously differentiable on and in C, then
∮C F·dx = ∮C A dx + B dy = ∫∫R (∂B/∂x − ∂A/∂y) dxdy.
Surface integrals

Surfaces  See my notes, Surfaces. In addition to discussing ways of representing surfaces, we discussed computing surface area elements, normals to surfaces, and related topics.

Flux integrals  Consider the steady state velocity field V(x) of a fluid. We want to calculate the amount of fluid crossing a surface parametrized by x = x(u1, u2). Let f1 and f2 be the partials of x with respect to the parameters u1 and u2. We consider an element of surface area, the base of a parallelepiped with edges f1du1, f2du2, and Vdt. Our first step is to calculate the fluid crossing this surface element. In time t to t+dt, the volume of fluid crossing the base equals the volume of the parallelepiped, (Vdt)·f1×f2 du1du2.

(Figure: a parallelepiped whose base has edges f1du1 and f2du2, with slant edge Vdt.)

The mass of the fluid crossing the base in time t to t+dt is then density×volume, or
µ(Vdt)·f1×f2 du1du2,
where µ is the density. Thus the mass per unit time crossing the base is F·N du1du2, where F = µV, and N = f1×f2 is the standard normal. Recall that the area of the surface element is dS = |N|du1du2. Consequently the mass per unit time crossing the base is F·n dS, where n is the unit normal. Integrating over the whole surface then yields

Φ = ∫∫S F·n dS.
This surface integral is called the flux of the vector field F.

26 November

Relating line integrals to surface integrals: Stokes's Theorem

The curl and Stokes' Theorem  Let S be a surface in 3D bounded by a simple closed curve C. We will not be absolutely precise here. One should think of S as a butterfly net, with C as its rim. Such a surface is orientable, and we always have a consistent piecewise continuous unit normal n defined on S. We say that C is positively oriented if in traversing C with the surface on our left, we are standing in the direction of n.

To state this theorem, we also need to define the curl of a vector field
F(x)=A(x,y,z)i + B(x,y,z)j +C(x,y,z)k.
We will assume that F has continuous partial derivatives. The curl is then defined by
∇×F = (∂C/∂y − ∂B/∂z)i + (∂A/∂z − ∂C/∂x)j + (∂B/∂x − ∂A/∂y)k.
There is a useful physical interpretation for the curl. Suppose that a fluid is rotating about a fixed axis with angular velocity ω. Define ω to be the vector with magnitude ω and with direction along the axis of rotation. The velocity of an element of the fluid located at the position with radius vector x is v(x) = ω×x. With a little work, one can show that ω = ½∇×v. Thus one half of the curl of the velocity vector v is the vector ω mentioned above.

Stokes' Theorem  Let S be an orientable surface bounded by a simple closed positively oriented curve C. If F is a continuously differentiable vector-valued function defined in a region containing S, then
∮C F·dx = ∫∫S ∇×F·n dσ.
Example  Verify Stokes's Theorem for the vector field F(x) = 2yi + 3xj - z2k over the surface S, where S is the upper half of the sphere x2+y2+z2 = 9 and C is its boundary in the xy-plane, the circle x2+y2 = 9. C is traversed counterclockwise.

We will first compute the line integral over C. In the xy-plane, C is parameterized via
x(t) = 3 cos(t) i + 3 sin(t) j,   0 ≤ t ≤ 2π,
and so we have:

dx = (- 3 sin(t) i + 3 cos(t) j)dt
F(x(t)) = 2·3 sin(t)i + 32 cos(t)j - 02k = 2·3 sin(t)i + 32 cos(t)j
F·dx = (- 18 sin2(t) + 27 cos2(t))dt
CF·dx = ∫02π(- 18 sin2(t) + 27 cos2(t))dt = 9π
We now turn to finding the surface integral ∫∫S ∇×F·n dσ. The normal compatible with the orientation of C is n = x/|x| = x/3. Thus, on the surface of the hemisphere S, we have
n = x(θ,φ)/3 = (3 sin(θ)cos(φ) i + 3 sin(θ)sin(φ) j + 3 cos(θ) k)/3,
and hence,
n = sin(θ)cos(φ) i + sin(θ)sin(φ) j + cos(θ) k,
where 0 ≤ θ ≤ ½π and 0 ≤ φ ≤ 2π. Also, the area element is dσ = 32sin(θ)dφdθ. Moreover, it is easy to show that ∇×F = k. We are now ready to do the surface integral involved:
∫∫S ∇×F·n dσ = ∫∫S k·n dσ
∫∫S ∇×F·n dσ = ∫0½π02π cos(θ) 32 sin(θ)dφdθ
∫∫S ∇×F·n dσ = 9π
Since both terms in Stokes's Theorem have the same value, we have verified the theorem in this case.
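Both sides of the example can also be confirmed numerically with a midpoint rule (the step counts below are arbitrary):

```python
import math

# Numeric check of both sides of the Stokes example; both should be 9 pi.
N = 20000
h = 2 * math.pi / N

# line integral: integral of (-18 sin^2 t + 27 cos^2 t) dt over [0, 2pi]
line = sum((-18 * math.sin((i + 0.5) * h)**2
            + 27 * math.cos((i + 0.5) * h)**2) * h for i in range(N))

# surface integral: integral of 9 cos(theta) sin(theta) dtheta dphi,
# theta in [0, pi/2]; the phi integral contributes a factor of 2 pi
M = 20000
k = (math.pi / 2) / M
surf = sum(9 * math.cos((i + 0.5) * k) * math.sin((i + 0.5) * k) * k
           for i in range(M)) * 2 * math.pi

assert abs(line - 9 * math.pi) < 1e-5
assert abs(surf - 9 * math.pi) < 1e-5
```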

Relating surface integrals to volume integrals: the Divergence Theorem

The divergence of a vector field and the Divergence Theorem  The divergence of a vector field F(x)=A(x,y,z)i + B(x,y,z)j +C(x,y,z)k is defined by
∇·F = ∂A/∂x + ∂B/∂y + ∂C/∂z.
Like the curl, the divergence of F has a physical interpretation in terms of fluids. This will be made clearer later. Here is the statement of the Divergence Theorem.

Divergence Theorem  Let V be a region in 3D bounded by a closed, piecewise smooth, orientable surface S; let the outward-drawn normal be n. Then,
∫∫S F·n dσ = ∫∫∫V ∇·F dV.
Example  Verify the Divergence Theorem for the surface integral ∫∫S F·ndσ, where F = 3xi+yj+2zk and S is the surface of the closed cylinder (including caps) x2+y2 = 16, 0 ≤ z ≤ 5. The normal is outward drawn.

To do this we must compute both integrals in the Divergence Theorem. We will first do the volume integral. It is easy to check that ∇·F = 3+1+2=6. Hence, we have that
∫∫∫V ∇·F dV = ∫∫∫V 6 dV = 6·π·42·5 = 480π

The surface integral must be broken into three parts: one for the top cap, a second for the curved sides, and a third for the bottom cap.
∫∫S F·ndσ = ∫∫top F·ndσ + ∫∫sides F·ndσ + ∫∫bottom F·ndσ
The outward normals for the top and bottom caps are k and −k, respectively. For the top (z = 5), we are integrating F(x,y,5)·k = 2·5 = 10, and for the bottom (z = 0), F(x,y,0)·(−k) = −2·0 = 0. Hence, we have
∫∫top F·ndσ = ∫∫top 10dσ = 10π42 = 160π
∫∫bottom F·ndσ = ∫∫bottom 0dσ = 0
The integral over the curved sides will require a little more effort. The outward normal (see my notes, Surfaces, pg. 5) and area element are, respectively,
n = cos(θ)i + sin(θ)j and dσ = 4dθdz.
In addition, on the curved sides
F(4cos(θ),4sin(θ),z) = 12cos(θ)i + 4sin(θ)j + 2zk, so F·n = 12cos2(θ) + 4sin2(θ).
The surface integral over the curved sides is then given by
∫∫sides F·ndσ = ∫0502π (12cos2(θ) + 4sin2(θ))4dθdz = 5·4(12π+4π) = 320π.
Combining these three integrals, we obtain
∫∫S F·ndσ = 160π+320π + 0 = 480π,
which agrees with the result from the volume integral. Thus we have verified the Divergence Theorem in this case.
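The arithmetic in this example is easy to confirm numerically. In the Python sketch below only the θ-integral over the curved sides is done by quadrature (midpoint rule); the top, bottom, and volume terms are the closed-form values found above:

```python
import math

# Numeric check of the cylinder example: sides + top + bottom = 480 pi,
# matching the volume integral of div F = 6.
N = 20000
dtheta = 2 * math.pi / N
sides = 0.0
for i in range(N):
    th = (i + 0.5) * dtheta
    sides += (12 * math.cos(th)**2 + 4 * math.sin(th)**2) * 4 * dtheta
sides *= 5.0                       # the z-integral multiplies by the height

top = 10 * math.pi * 4**2          # 160 pi
bottom = 0.0
volume = 6 * math.pi * 4**2 * 5    # 480 pi

assert abs(sides - 320 * math.pi) < 1e-5
assert abs(top + sides + bottom - volume) < 1e-5
```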

Equation of continuity for fluids  Suppose that in a region a fluid has a velocity field v(x,t) and density ρ(x,t), and that there are no sources or sinks in the region. Recall that last class we showed that the amount of fluid crossing a surface in the direction n per unit time is the flux,
Φ = ∫∫Sρv·ndσ.
If S is a closed surface forming the boundary of a volume V, then Φ is the negative of the total rate of change of mass in the fluid in V. Thus, we have the equation
∫∫Sρv·ndσ = − d/dt ∫∫∫VρdV = − ∫∫∫V∂ρ/∂t dV.
If we use the divergence theorem to replace the surface integral by a volume integral, we obtain
∫∫∫V ∇·(ρv) dV = − ∫∫∫V∂ρ/∂t dV,
and, consequently, that
∫∫∫V (∇·(ρv)+∂ρ/∂t) dV = 0
holds for every choice of V within the region under consideration. If we take V to be a small sphere of radius ε and center x, then the limit as ε tends to 0 of
V-1∫∫∫V (∇·(ρv)+∂ρ/∂t) dV
is ∇·(ρv)+∂ρ/∂t. On the other hand, this limit is of course 0. Therefore,
∇·(ρv)+∂ρ/∂t = 0.
This partial differential equation is called the equation of continuity.

3 December

Divergence, Laplacian, and Curl in General Coordinates and Spherical Coordinates


Heat flow in a body

Derivation of the heat equation  See § VI.2 in Zachmanoglou and Thoe (Z/T). The steady state version of this equation is Laplace's equation. The heat equation comes from considering energy balance. To specify a temperature in a body, we must also take into account two other factors: the past history of the body and the interaction of the body with the environment. For materials without "memory," the past history is adequately described by specifying the temperature throughout the body at some initial time, say t = 0. The interaction of the environment with the body is modeled through the use of boundary conditions.

Types of boundary conditions  There are three common types of boundary conditions.

  1. Dirichlet boundary conditions. These specify the temperature on the surface of the body. For example, putting an object in ice keeps the temperature at its surface at 0 degrees C.

  2. Neumann boundary conditions. These specify the flow of heat across the boundary using Fourier's law. That is, they specify n·∇u on the surface of the body. An insulated boundary would have no heat flow, and one would require n·∇u = 0.

  3. Robin boundary conditions. These specify a linear combination of u and n·∇u on the surface. They come from Newton's law of cooling, for example.

5 December

Classification of partial differential equations

General second order linear PDEs  In addition to the heat equation and Laplace's equation, which we derived earlier, there is a third important type of PDE, the wave equation.
c-22u/∂t2 = ∇2u.
These three equations are special cases of three general types of second order linear PDEs. The most general form of a second order linear PDE is
∑ajk2u/∂xj∂xk + ∑bj∂u/∂xj + cu +d = 0.
The classification scheme is based on the signs of the eigenvalues of the symmetric matrix A with entries ajk. In the table below, we classify PDEs for three space variables (x= x1, y = x2, z = x3) and one time variable (t = x4).

Classification of PDEs

Type Eigenvalues of A Example Variables
Parabolic +++0 Heat equation 3 space, 1 time
Elliptic +++  Laplace's equation 3 space
Hyperbolic +++- Wave equation 3 space, 1 time

We remark that if the general PDE is multiplied by a minus sign, the patterns in the table will have "+" replaced by "−". In general, the solutions to the various types of equations behave like the corresponding example. For instance, hyperbolic equations have solutions that propagate in time, like those for the wave equation, while parabolic equations have solutions that behave like temperature in a heat flow problem. For further discussion, see section V.8 in Z/T.

Separation of variables

Laplace's equation  We want to solve for the steady state temperature u in a disk of radius r = a, given Dirichlet boundary conditions. (See also section V.7 in Z/T). We will use polar coordinates. The precise problem is this:
2u = r-1∂/∂r[r∂u/∂r] + r-22u/∂θ2 = 0
u(a,θ) = f(θ) = known temperature on boundary (Dirichlet boundary condition).
Because we are using polar coordinates, which are singular at r = 0 and have a discontinuity at θ = ±π, we have two additional "boundary conditions" -- namely that u(r,θ) is well behaved (bounded) as r approaches 0 and that u is 2π periodic in θ.

If we ignore the nonhomogeneous boundary condition, u(a,θ) = f(θ), then the set of solutions is a vector space. Our aim is to construct a basis for this space. Separation of variables is a method for finding a basis. Once we have accomplished this, we then find the linear combination that also satisfies the nonhomogeneous condition.

Separating variables  We begin by looking for special solutions to the homogeneous problem,
r-1∂/∂r[r∂u/∂r] + r-22u/∂θ2 = 0
u(r,θ) is bounded as r approaches 0
u(r,θ) is 2π periodic in θ.
The solutions that we want have the form u(r,θ) = R(r)Θ(θ). Plugging into the equation gives us
r-1[rR′]′Θ + r-2RΘ″ = 0
If we now multiply this equation by r2 and divide by RΘ, we arrive at this equation:
r[rR′]′/R + Θ″/Θ = 0
Since r[rR′]′/R is a function of r only, and since Θ″/Θ is a function of θ only, it follows that both are constant. If we let μ = r[rR′]′/R, then Θ″/Θ = -μ. With a little algebra, we obtain the separation equations,
r[rR′]′ - μR = 0 and Θ″ + μΘ = 0.

The eigenvalue problem  We now turn to the two remaining conditions. The first of these will be satisfied if R(r) is chosen so as to be continuous at r = 0. We will deal with it later. The condition that u(r,θ) be 2π periodic implies that Θ(θ) is 2π periodic. This imposes restrictions on the possible values of μ and gives us the following eigenvalue problem:
Find all possible values of μ for which the problem
Θ″ + μΘ = 0 and Θ(θ) = Θ(θ+2π)
has a nonzero solution Θ. These values of μ are called eigenvalues, while the corresponding solutions Θ are called eigenfunctions.
We can immediately eliminate μ < 0. The solutions to Θ″ − |μ|Θ = 0 are linear combinations of exp(±|μ|½θ), which always will blow up as θ approaches either +∞ or −∞ or both. They therefore cannot be periodic. For μ = 0, we do have a single periodic solution, namely Θ = 1. The second solution is Θ(θ) = θ, which is not periodic.

This leaves the case in which μ > 0. The differential equation Θ″ + μΘ = 0 has two solutions, sin(μ½θ) and cos(μ½θ). These solutions are periodic with fundamental period 2πμ−½. They will also have 2π as a period if and only if some integer multiple of 2πμ−½ is 2π. Thus, μ > 0 is an eigenvalue if and only if there is an integer n > 0 such that 2πμ−½n = 2π. It follows that μ = n2 and that Θ(θ) is a linear combination of sin(nθ) and cos(nθ).

Solution to the Eigenvalue Problem

Eigenvalues μ   Eigenfunctions Θ(θ)
02 1
12 cos(θ), sin(θ)
22 cos(2θ), sin(2θ)
n2 cos(nθ), sin(nθ)

Separation solutions  We still have to find the radial solutions corresponding to the eigenvalues we found previously. For μ = 0, the radial equation is r[rR′]′ = 0. Dividing by r, we get [rR′]′ = 0, so rR′ = C = constant, and R′ = C/r. Integrating this gives R(r) = Cln(r) + D, where D is another constant. The only solution that behaves nicely at r = 0 is R(r) = constant; that is, any multiple of R(r) = 1.

When μ = n2, n ≥ 1, R satisfies the equation r[rR′]′ - n2R = 0. Working out the derivatives, we see that this is the equation

r2R″ + rR′ - n2R = 0,
which is a Cauchy-Euler equation. The technique for solving it is to assume a solution of the form R = rα and determine α. Carrying this out, we obtain
α(α−1)r2rα−2 + αrrα−1 − n2rα = 0
(α(α−1) + α − n2)rα = 0
2 − n2)rα = 0
Dividing the last equation by rα, we see that α2 − n2 = 0, and so α = ±n and the possible solutions are linear combinations of rn and r−n. Of these two, only rn is bounded as r approaches 0. Thus, only R(r) = rn can be used. The separation solutions that we have obtained are listed in the table below.

Separation Solutions

n   R(r) Θ(θ) u = RΘ
0 1 1 1
1 r cos(θ), sin(θ) r cos(θ), r sin(θ)
2 r2 cos(2θ), sin(2θ)   r2cos(2θ), r2sin(2θ)
n rn cos(nθ), sin(nθ)    rncos(nθ), rnsin(nθ)

Matching the nonhomogeneous conditions  We may think of the separation solutions as forming a basis for the solution space. The general solution is thus
u(r,θ) = A0 + ∑n≥1(Anrncos(nθ) + Bnrnsin(nθ)).
To match the boundary condition u(a,θ) = f(θ), we need to find coefficients such that
f(θ) = A0 + ∑n≥1(Anancos(nθ) + Bnansin(nθ))
holds. We have already seen that we can represent f this way via its Fourier series. Indeed, this type of problem was Fourier's motivation for introducing such series! All we need to do now is to identify the Fourier coefficients for f with the coefficients above: a_n = A_n a^n and b_n = B_n a^n. The final solution is then
u(r,θ) = a0 + ∑n≥1(r/a)n(ancos(nθ) + bnsin(nθ)),
where an and bn are the Fourier coefficients for f.

An example  Recall that we have calculated the Fourier series for the periodic function f(θ), which is defined by f(θ) = |θ| for −π ≤ θ ≤ π. The series we found was
f(θ) = ½π - (4/π)∑k≥1 (2k−1)−2cos((2k−1)θ)
By what we said above, the temperature u(r,θ) corresponding to this f is
u(r,θ) = ½π - (4/π)∑k≥1 (r/a)2k−1 (2k−1)−2 cos((2k−1)θ)
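As a check on the boundary condition, a partial sum of this series at r = a should reproduce f(θ) = |θ| on [−π, π]. Here is a short Python sketch (the number of terms and the test points are arbitrary choices):

```python
import math

# Partial sum of the series solution for the disk with f(theta) = |theta|.
a = 1.0

def u(r, theta, terms=2000):
    s = math.pi / 2
    for k in range(1, terms + 1):
        n = 2 * k - 1
        s -= (4 / math.pi) * (r / a)**n * math.cos(n * theta) / n**2
    return s

# at r = a the series should match f(theta) = |theta| on [-pi, pi]
for theta in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(u(a, theta) - abs(theta)) < 1e-3
```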

Separation of variables in a vibrating string problem  We also discussed separation of variables for the problem of a vibrating string with ends clamped. This is covered in section VIII.8 of Z/T.

10 December
Vibrations in Finite Regions

Separation of variables See VIII.10 in Z/T.
Eigenfunction expansions See VIII.10, Theorem 10.1.
Vibrations in a circular membrane See VIII.10, Example 10.3.