# Notes for Analysis for Applications I

## Vector Spaces

Vector spaces developed gradually over a very long period of time, with roots in both pure mathematics and physics. Its axiomatic formulation is attributed to the mathematician Peano, in 1886. The word vector comes directly from Latin; it means "carrier." In astronomy in the 1600's, the line joining a planet to the sun was thought to be a a kind of string that "carried" the planet around in its orbit around the sun. Here is the familiar, modern definition:

Definition. A vector space is a set V together with two operations, + and × . If u, v are in V, then u + v is in V; if c is a scalar (a real or complex number for us), then c×v is in V. The operations satisfy the following rules.

 Addition Multiplication by a scalar u + (v + w) = (u + v) + w a(b×u) = (ab)×u Identity: u + 0 = 0 + u = u (a + b)×u = a× u + b×u Inverse: u + (-u) = (-u) + u = 0 a×(u + v) = a×u + a×v u + v = v + u 1×u = u
The mathematician Peter Lax calls the idea of a vector space "a bare-bones concept," and, when we look at the definition, it's easy to see why. All that's required is a set, scalars, two operations, and a few axioms involving them. It is striking that there are many objects that satisfy these axioms, and that the "bare-bones" vector-space notion can yield a great deal of information concerning these objects.

The vector spaces that we will deal with here include the familiar finite dimensional spaces $\mathbb R^n$, $\mathbb C^n$, the space of $k$ times continuously differentiable functions $C^k$, polynomials of degree $n$ or less $P_n$, various spaces of Lebesgue integrable functions $L^p$, sequence spaces $\ell^p$, and other spaces that we will introduce later. At this point, it's a good idea to review some standard definitions and theorems that come up in linear algebra.

Subspaces

Definition. A subset U of V is a subspace if, under + and × from V, U is a vector space in its own right.

Theorem. U is a subspace of V if and only if these hold:
1. 0 is in U.
2. U is closed under + .
3. U is closed under × .

Span

Definition. Let S={v1 ... vn} be a subset of a vector space V. The span of S is the set of linear combinations of vectors in S. That is, $\text{span}(S) := \{c_1v_1 + \cdots c_n v_n \}$, where the $c$'s are arbitrary scalars and the $v$'s are vectors from $S$.

Proposition. The set span(S) is a subspace of V.

Bases and dimension

Definition. We say that a set of vectors
S = {v1, v2, ... , vn}
is linearly independent if and only the equation
c1v1 + c2v2 + ... + cnvn = 0
has only c1 = c2 = ... = cn = 0 as a solution. If it has solutions different from this one, then the set S is said to be linearly dependent.

Definition. A subset B = {v1 ... vn} of V is a basis for V if B spans V and is linearly independent. Equivalently, B is a basis if it is maximally linearly independent; that is, B is not a proper subset of some other linearly independent set. Unless we specifically state otherwise, we will assume that B is ordered.

Theorem. Every basis for V has the same number of vectors as every other basis. This common number is defined to be the dimension of V, dim(V).

Two remarks. If a vector space has arbitrarily large sets of linearly independent vectors, it is infinite dimensional. The subspace containing only $0$ has no linearly independent vectors and is assigned $0$ as its dimension.

Linear transformations and isomorphisms
Definition. A mapping $L$ from a vector space $V$ to a vector space $W$ that preserves addition and scalar multiplication is called a linear transformation. That is $L:V\to W$ is linear if and only if it satisfies $L(c_1v_1+\cdots +c_nv_n) =c_1L(v_1)+\cdots +c_nL(v_n)$.

Definition A linear transformation that is one-to-one and onto (bijective) is said to be an isomorphism.

Proposition. Under appropriate conditions, linear combinations, compositions, and inverses of linear transformations are linear.

The word "appropriate" means that, for example, the composition $K\circ L$ has to be defined. Thus the image of $L$ must be contained in the domain of $K$. Similar conditions need to be placed on the other operations mentioned in the proposition.

Coordinate vectors

Coordinates are used to associate a point in some space with a set of real or complex numbers — think of polar or cartesian coordinates in 2D. The association is at least locally one-to-one and onto. For finite dimensional vector spaces, we not only want the association to be one-to-one and onto, but we also want it to preserve vector addition and multiplication by scalars. In other words, we want an isomorphism between a vector space $V$ and $\mathbb R^n$ or $\mathbb C^n$. The properties defining a basis are exactly the ones needed to define "good" coordinates for a finite dimensional vector space. We start by recalling the following important theorem:

Theorem. Given an ordered basis B = {v1 ... vn} and a vector v, we can uniquely write the vector as v = x1v1 +...+ xnvn, and thus represent it by the column vector [v]B = [x1, ..., xn]T.

As a consequence of this theorem, we can define a bijective map $\Phi$ from an $n$ dimensional vector $V$ to $n\times 1$ columns of scalars. All we need is an ordered basis for $V$. Then, using the previous result, we define the map $\Phi[v] := [v]_B = \left(\begin{array}{c} x_1 \\ \vdots \\ x_n\end{array}\right).$ The inverse is given by $\Phi^{-1}[x_1 \ \cdots \ x_n]^T = x_1v_1+\cdots +x_nv_n$. The map $\Phi$ is easily shown to be linear, so $\Phi$ is an isomorphism between $V$ and $n\times 1$ columns of scalars. We will call $\Phi$ a coordinate map. and the column vector $[v]_B$ a coordinate vector.

Change of coordinates

Often we want to change coordinates. For instance, we do this when we diagonalize a matrix. To see how to do this, we start with two ordered bases B = {v1 ... vn} and D = {w1 ... wn} for an n-dimensional vector space V. In addition, let $\Phi$ and $\Psi$ be coordinate maps for B and D, respectively. Finally, suppose that the coordinates of $v\in V$ relative to B and D are $\Phi(v) = [v]_B = \mathbf x \quad \text{and}\quad \Psi(v) = [v]_D = \mathbf y.$ From this, we see that $\Phi^{-1}(\mathbf x)=v$. Since $\Psi(v)=\mathbf y$, $\Psi(\Phi^{-1}(\mathbf x)) = \Psi(v) =\mathbf y$. Thus the map $\Psi\circ \Phi^{-1}$ changes $\mathbf x$-coordinates to $\mathbf y$ coordinates.

Of course, this doesn't give us an explicit formula for changing from one system to the other. To do that, we first let $\mathbf e_j= (0 \cdots 1 \cdots 0)^T$ be the column vector with 1 in the jth position and zeros elsewhere. Next, write the column vector as a linear combination of $\mathbf e_j$'s, $\mathbf x = \sum_j x_j \mathbf e_j$, and then apply $\Psi\circ \Phi^{-1}$ to get $\mathbf y = \sum_j x_j\Psi\circ \Phi^{-1}(\mathbf e_j)$. By the definition of $\Phi$, we see that $\Phi^{-1}(\mathbf e_j)=0\cdot v_1+\cdots +1\cdot v_j+\cdots 0\cdot v_n=v_j.\,$ Hence, $\mathbf y =\sum_j x_j\Psi(v_j)= \sum_j x_j[v_j]_D=S\mathbf x$. Here $S$ is called the transition matrix and its jth column is $[v_j]_D$, the D-coordinate vector of $v_j$; explicitly, $S= \big[ [v_1]_D\ [v_2]_D \cdots [v_n]_D\big] = \big[\text{D-coordinates of the B-basis}\big].$ What's left to do is to find the entries in $S$. With a little matrix manipulation, we have that $S\mathbf e_j=[v_j]_D=\sum_k S_{kj}\mathbf e_k$. Applying $\Psi^{-1}$ to this equation yields $v_j = \Psi^{-1}[v_j]_D = \Psi^{-1}\big(\sum_k S_{kj}\mathbf e_k\big)= \sum_k S_{kj}\underbrace{\Psi^{-1}(\mathbf e_k)}_{w_k} = \sum_k S_{kj}w_k.$ One final remark. If we take components in $\mathbf y = S\mathbf x$, we get $y_j=\sum_k S_{jk}x_k$. In the formula for $v_j$ above, the row and column indices of $S$ are reversed. This means that the matrix relating the two bases is the transpose of the transition matrix, which relates the two sets of coordinates.

Matrix representations of linear transformations

Let $L:V\to W$ be linear, and suppose that the dimension of $V$ is $n$ and that of $W$ is $m$. Furthermore, we will take B = {v1 ... vn} and D = {w1 ... wm} to be bases for $V$ and $W$, respectively, and we will let $\Phi$ and $\Psi$ be the associated coordinate maps. (Keep in mind that here the coordinate maps are for different vector spaces.) It follows that the linear map $\Psi\circ L \circ \Phi^{-1}$ takes $n\times 1$ columns into $m\times 1$ columns. If $w=L(v)$, $\mathbf x=\Phi(v)$, $\mathbf y = \Psi(w)$, then $\mathbf y = \Psi\circ L \circ \Phi^{-1} (\mathbf x)$. Next, note that $\mathbf x= \sum_k x_k\mathbf e_k$, so $\mathbf y = \Psi\circ L \circ \Phi^{-1} (\mathbf x) = \sum_k x_k \Psi\circ L \circ \Phi^{-1} (\mathbf e_k) = \sum_k x_k \Psi(L(v_k))= \sum_k [L(v_k)]_D x_k = A_L \mathbf x,$ where $A_L =\big[[L(v_1)]_D \ [L(v_2)]_D \cdots [L(v_n)]_D\big]=\big[\text{ D-coordinates of } L(\text{B-basis})\big]$. To summarize, we have constructed a unique matrix $A_L$ such that $w=L(v)$ if and only if $\mathbf y=A_L\mathbf x$. Thus $A_L$ is the matrix representation of $L$ relative to the bases involved.

Dual space

Definition. A linear functional is a linear transformation $\varphi:V\to$ scalars.

Definition. The set $V^\ast$ of all linear functionals is called the (algebraic) dual of V.

Proposition. $V^\ast$ is a vector space under the operations of addition of functions and multiplication of a function by a scalar.

## Inner Products and Norms

### Inner product spaces

In 2D or 3D, vectors have a natural geometry, independent of any coordinate system: every vector has a length; any two vectors have an angle between them. These quantities can be used to define the "dot" product of two vectors, $\mathbf x \cdot \mathbf y=|\mathbf x||\mathbf y| \cos(\theta(\mathbf x,\mathbf y))$. On the other hand, given the dot product, we can obtain the length of a vector $|\mathbf x|=\sqrt{\mathbf x \cdot \mathbf x}$, and also the angle between two vectors, $\theta(\mathbf x,\mathbf y) = \arccos\big(\frac{\mathbf x \cdot \mathbf y}{|\mathbf x| |\mathbf y|}\big)$. If we introduce the usual basis $\{\mathbf i,\mathbf j,\mathbf k\}$, where the vectors are perpendicular and have unit length, then every vector can be represented as $\mathbf x = x_1\mathbf i + x_2\mathbf j + x_3\mathbf k$. From this and the law of cosines we can derive the familiar algebraic formula for the dot product; namely, $\mathbf x \cdot \mathbf y = x_1y_1+x_2y_2+x_3y_3$. As we pointed out above, given this formula, we can find the geometic quantities of length and angle. The point is that we have connected the geometry of 3D with an algebraic formula.

Let's take a closer look at the 3D dot product. From the algebraic formula $\mathbf x \cdot \mathbf y = x_1y_1+x_2y_2+x_3y_3$, it is easy to show that the dot product has four properties: (1) Positivity: $\mathbf x \cdot \mathbf x \ge 0$, with 0 only occurring when $\mathbf x = \mathbf 0$. (2) Symmetry: $\mathbf x \cdot \mathbf y = \mathbf y \cdot \mathbf x$. (3) Homogeneity: $(c\mathbf x) \cdot \mathbf y = c(\mathbf x \cdot \mathbf y)$. (4) Additivity: $( \mathbf x+ \mathbf z) \cdot \mathbf y = \mathbf x\cdot \mathbf y + \mathbf z \cdot \mathbf y$. Even though we used coordinates to get these properties, they hold generally, independent of any coordinate system.

Surprisingly, these properties are sufficient to define "dot products" on other vector spaces, and obtain "geometries" for these spaces. Instead of "dot product", we will use the term "inner product." Let $V$ be a vector space and $\langle \cdot, \cdot\rangle$ a function from $V\times V$ into the real or complex numbers, depending on whether $V$ is real or complex — i.e., the scalars are real or complex.

Definition. We say that $\langle \cdot, \cdot\rangle$ is an inner product on $V$ if the following properties hold:

1. Positivity. $\langle v, v\rangle \ge 0$, with $0$ occurring only if $v = \mathbf 0$.
2. Symmetry or conjugate symmetry. $\langle u, v\rangle \langle v, u\rangle$, for the real case, and $\overline{\langle u, v\rangle} = \langle v, u\rangle$, for the complex case.
3. Homogeneity. $\langle cu, v\rangle = c\langle u, v\rangle$.
4. Additivity. $\langle u + w, v\rangle = \langle u, v\rangle+ \langle w, v\rangle$.
A vector space with an inner product is called an inner product space. We point out that in the complex case, conjugate symmetry and homogeneity imply that $\langle u, cv\rangle = \bar c\langle u, v\rangle$, and we say that the inner product is "linear on the left." In quantum physics, where complex inner products are an integral part of the theory, the inner products are "linear on the right." There are two very important inequalities associated with inner products: Schwarz's inequality and the triangle inequality.

Theorem (Schwarz's inequality). Suppose that a vector space $V$ is equipped with an inner product, $\langle u,v\rangle$. Let $\|u\|:=\sqrt{\langle u,u\rangle}$. Then, $\big|\langle u,v\rangle\big| \le \|u\|\|v\|.$ Proof. If either $u$ or $v$ is $\mathbf 0$, the inequality is trivially true. Thus, we may suppose neither $u$ nor $v$ is $\mathbf 0$. We will suppose that $V$ is complex. The real case is easier. Let $t,\alpha\in \mathbb R$. From the four properties of the inner product, we can show that $0\le \langle u+te^{-i\alpha} v, u+ te^{-i\alpha} v\rangle = \|u\|^2+te^{-i\alpha} \langle u,v\rangle + t\,\overline{e^{-i\alpha} \langle u,v\rangle} + t^2\|v\|^2.$ Using the polar form of a complex number, we can write $\langle u,v\rangle = \big|\langle u,v\rangle\big|e^{i\theta}$. Choose $\alpha = \theta$. Then the previous inequality becomes $p(t):=\|u\|^2+2t\big|\langle u,v\rangle\big| + t^2\|v\|^2\ge 0.$ Because $p(t)$ doesn't go negative, it either has two complex roots or a double real root. In either case, the discriminant for $p$ satisfies $4\big|\langle u,v\rangle\big|^2 - 4\|u\|^2\|v\|^2\le 0$. Schwarz's inequality follows immediately from this. $\square$

Corollary. Equality holds in Schwarz's inequality if and only if $\{u,v\}$ is linearly dependent.

Proof. Exercise.

Theorem (The triangle inequality). Suppose that a vector space $V$ is equipped with an inner product, $\langle u,v\rangle$. Let $\|u\|:=\sqrt{\langle u,u\rangle}$. Then, $\big| \|u\|-\|v\|\big| \le \|u+v\|\le \|u\|+\|v\|.$ Proof. Recall that in the proof of Schwarz's inequality, we used $\|u+te^{-i\alpha} v\|^2=\|u\|^2+te^{-i\alpha} \langle u,v\rangle + t\,\overline{e^{-i\alpha} \langle u,v\rangle} + t^2\|v\|^2. \|u\|^2+te^{-i\alpha} \langle u,v\rangle + t\,\overline{e^{-i\alpha} \langle u,v\rangle} + t^2\|v\|^2.$ If we set $t=1$, $\alpha =0$, use the identity $z+\bar z=2\text{Re}(z)$, the inequality $\text{Re}(z)\le |z|$ and Schwarz's inequality, we get $\|u+v\|^2 \le \|u\|^2 + 2\big|\langle u,v\rangle\big|+\|v\|^2\le \big( \|u\|+\|v\| \big)^2.$ Taking square roots gives the right side of the triangle inequality. Similarly, we have $\|u+v\|^2 \ge \|u\|^2 - 2\big|\langle u,v\rangle\big|+\|v\|^2\ge \big( \|u\|-\|v\| \big)^2.$ Again taking square roots yields the left side. $\square$

The triangle inequality is one of the essential properties of length. The sum of the lengths of two sides of a (nondegenaerate) triangle is greater than length of the third side. In a real vector space, we can also define the angle between two vectors. Suppose that neither $u$ nor $v$ is $\mathbf 0$. Schwarz's inequality implies that we have $-1 \le \frac{\langle u,v\rangle}{\|u\|\|v\|}\le 1.$ Thus we may define the angle between $u$ and $v$ to be $\theta(u,v) = \arccos\bigg(\frac{\langle u,v\rangle}{\|u\|\|v\|}\bigg).$

When $\langle u,v\rangle = 0$, the vectors are orthogonal (perpendicular) to each other. We will say this is true even if one or both or $u$ and $v$ are $\mathbf 0$. In the complex case, the concept of angle between two vectors is not so important, except when $\langle u,v\rangle = 0$. When this happens we will also say that $u$ and $v$ are orthogonal.

Standard inner products. Here is a list of a few standard inner product spaces.

1. $\mathbf R^n$ —   $\langle x,y\rangle = \sum_{j=1}^n x_jy_j = y^Tx$.
2. $\mathbf C^n$ —   $\langle x,y\rangle = \sum_{j=1}^n x_j\overline{y_j} = y^\ast x$.
3. Real $L^2[a,b]$ —   $\langle f,g\rangle = \int_a^b f(x)g(x)dx$
4. Complex $L^2[a,b]$ —   $\langle f,g\rangle = \int_a^b f(x)\overline{g(x)}dx$
5. $2\pi$-periodic $L^2$ functions —   $\langle f,g\rangle = \int_0^{2\pi} f(x)\overline{g(x)}dx$
6. Weighted $L^2$ inner products (real) —   $\langle f,g\rangle = \int_a^b f(x)g(x)w(x)dx$, where $w(x)>0$.
7. Real Sobolev space for differentiable functions —   $\langle f,g\rangle = \int_a^b \big(f(x)g(x)+ f'(x)g'(x)\big)dx$.
8. Let $A$ be an $n\times n$ matrix with real entries. Suppose that $A$ is selfadjoint and that $x^TAx>0$, unless $x=\mathbf 0$ —   $\langle x,y\rangle = y^TAx$.
9. Biinfinite complex sequences, $\ell^2$ —   $\langle x,y\rangle = \sum_{n=-\nfty}^\infty x_n \overline{y_n}$.
Example. Let $f,g$ be real values functions in $C[0,1]$ and define $\langle f,g\rangle = \int_0^1 f(x)g(x)dx$. Show that $\langle f,g\rangle$ is a real inner product for $C[0,1]$.

Solution. Symmetry, homogeneity and additivity are all simple consequences of the properties of the integral. Thus, we only need to show positivity. The definition of the Riemann integral implies that $\langle f,f\rangle = \int_0^1 f^2(x)dx\ge 0$, so what remains is showing that the only function $f\in C[0,1]$ for which $\langle f,f\rangle=0$ is $f\equiv 0$.

Suppose this is false. Then there is an $f\in C[0,1]$ such that $\int_0^1 f^2(x)dx=0$ and there is also an $x_0\in [0,1]$ for which $f(x_0) \ne 0$. Let $F=f^2$. Products of continuous functions are continuous, so $F\in C[0,1]$. Also, $F(x_0) = (f(x_0))^2>0$. Using a continuity argument, one can show that there is a closed interval $[a,b]\subseteq [0,1]$ that contains $x_0$ and on which $F(x)\ge \frac12 F(x_0)$. (Exercise.) Consequently, $\int_0^1 f^2(x)dx=\int_0^1 F(x)dx \ge \int_a^b F(x)dx \ge \frac12 F(x_0)(b-a)>0,$ which contradicts the assumption that $\int_0^1 f^2(x)dx=0$. Consequently, positivity holds. $\square$

Orthogonality

We will begin with a few definitions. In an inner product space $V$, we say that a (possibly infinite) set of vectors $S= \{v_1,\ldots,v_n, \ldots\}$ is orthogonal if and only if (i) none of the vectors are $\mathbf 0$ and (ii) $\langle v_j,v_k\rangle =0$ for all $j\ne k$. Part (i) excludes $\mathbf 0$ from the set. We do this to avoid having to write the phrase "orthogonal set of nonzero vectors." However, be aware that some authors do allow for including $\mathbf 0$. It is easy to see that an orthogonal set of vectors is linearly independent. We will frequently use normalized sets of orthogonal vectors. An orthogonal set is termed orthonormal (o.n.) if all of the vectors in it have unit length; that is, $\langle v_j,v_k\rangle =\delta_{j,k}$. We say that two subspaces of $V$, $U$ and $W$, are orthogonal if and only if all of the vectors in $U$ are orthogonal to all of the vectors in $W$. When this happens we write $U\perp W$. Finally, we define the orthogonal complement of $U$ in $V$ to be $U^\perp := \{v\in V\colon \langle v,u\rangle = 0 \ \forall\ u\in U\}$.

Minimization problems. A common way of fitting data, either discrete or continuous, is least-squares minimization. The familiar straight line fit to a set of data is a good example of this technique and we will discuss it briefly. Suppose that we have collected data $\{y_j\in \mathbb R,\ j=1,\ldots, n\}$ at times $\{t_1,\ldots,t_n\}$. To get a good straight line that $y(t)=a+bt$ that fits the data, we choose the intecept and slope to to minimize the sum of the squares of $y_j-y(t_j)= y_j - a -bt_j$. Specifically, we will minimize over all $a$, $b$ the quantity $D^2 = \sum_{j=1}^n(y_j- a -bt_j)^2$. We can put this problem in terms of $\mathbb R^n$. Let $\mathbf y= [y_1\ y_2 \ \cdots \ y_n]^T$, $\mathbf 1= [1\ 1\ \cdots \ 1]^T$, and $\mathbf = [t_1\ t_2 \ \cdots \ t_n]^T$. In this notation, we have $D^2 = \|\mathbf y - a\mathbf 1 - b \mathbf t\|^2$. Next, let $U = \text{span}\{\mathbf 1,\mathbf t \}=\{a\mathbf 1 + b\mathbf t,\ \forall \ a,b\in \mathbb R\}$. Using this notation, we can thus recast the problem in its final form: Find $p\in U$ such that $\min_{a,b} D=:\text{dist}(Y,U) = \|Y - p\|=\min_{u\in U}\|Y-u\|.$ As you have shown in exercises 1.3 and 1.4, this problem has a unique solution that can be found from the normal equations, which in matrix form are $\left(\begin{array}{c} a\\ b\end{array}\right) = \left(\begin{array}{cc} \mathbf 1^T\mathbf 1 & \mathbf 1^T \mathbf t\\ \mathbf t^T\mathbf 1 & \mathbf t^T \mathbf t \end{array}\right)^{-1}\mathbf y.$ Stated another way, the solution $\mathbf p = P\mathbf y$, where $P$ is the orhogonal projection of $\mathbf y$ onto $U$.

The general "least squares" minimization problem is this: Given an inner product space $V$, a vector $v\in V$, and a subspace $U\subset V$, find $p\in U$ such that $\|v - p\|=\min_{u\in U}\|v - u\|$. By the exercises 1.3 and 1.4, a solution $p$ exists for every $v\in V$ if and only if there is a vector $p\in U$ such that $v-p\in U^\perp$. When this happens $p$ is unique and $p=Pv$, where as before $P$ is the orthogonal projection of $v$ onto $u$. In particular, when $U$ is finite dimensional, this is always true. Furthermore, if $B=\{u_1,\ldots,u_n\}$ is an otrhonormal (o.n) basis for $U$, then by exercise 1.4(c), the formula for $Pv$ is especially simple: $Pv = \sum_{j=1}^n \langle v,u_j\rangle u_j.$ Gram-Schmidt process. Unlike the situation for fitting data, we don't need to invert the matrix $G$. This raises two questions: Does an o.n. basis exist for an inner product space and, if so, how can it be found?

We will deal with a finite dimensional space having a basis $B=\{v_1,\ldots,v_n\}$. Our aim will be to produce an orthogonal basis for the space. This can be converted to an o.n. basis by simply dividing each vector by its length. To begin, define the spaces $U_k=\text{span}\{v_1,\ldots,v_k\}$, $k=1,\ldots,n$. Let $w_1=v_1$. Next, let $w_2=v_2 - P_1v_2=v_2 - \frac{\langle v_2,w_1\rangle}{\|w_1\|^2}w_1$, where $P_1$ is the orthogonal projection onto $U_1$. An easy computaion shows that $w_2$ is orthogonal to $w_1$ and, consequently, $\{w_1,w_2\}$ is an orthogonal basis for $U_2$. Similarly, we let $w_3 = v_2 - P_2v_3 = v_3 - \frac{\langle v_3,w_1\rangle}{\|w_1\|^2}w_1 -\frac{\langle v_3,w_2\rangle}{\|w_2\|^2}w_2$. As before $w_3$ is orthogonal to $w_!,w_2$ and $\{w_1,w_2, w_3\}$ is an orthogonal basis for $U_3$. We can continue in this way. Let $w_k = v_k - P_{k-1}v_k$. It follows that $w_k$ is orthogonal to $U_{k-1}$, so $U_k$ has the orthogonal basis $\{w_1,\ldots, w_k\}$. Eventually, we obtain an orthogonal basis for $V$, $\{w_1,\ldots,w_n\}$.

QR-factorization. One important application of the Gram-Schmidt is the QR factorization of a matrix. Let $A$ be n×n matrix, which we will assume to be real and to have linearly independent columns $\{\mathbf v_1,\ldots, \mathbf v_n\}$. Then there exists a matrix $Q$ whose columns form an o.n. basis for $\mathbb R^n$ — i.e., an orthogonal matrix — and an upper triangular matrix $R$, with positive diagonal entries, such that $A=QR$.

To prove this, note that we are dealing with three different bases of $\mathbb R^n$: (1) the standard basis $E=\{\mathbf e_1,\ldots,\mathbf e_n\}$; (2) the basis of columns $F=\{\mathbf v_1,\ldots, \mathbf v_n\}$; and (3), the basis of orthonormal vectors $G=\{\mathbf q_1,\ldots,\mathbf q_n\}$ obtained via the Gram-Schmidt process. The matrix $A$ is actually the transition matrix from $F$ to $E$ coordinates, because it has the form $A=[\text{E-coordinates of the F-basis}]$. Let $Q:=[\mathbf q_1\ \cdots \ \mathbf q_n]$. As is the case with $A$, $Q$ is the transition matrix from $G$ to $E$ coordinates. Assuming $A=QR$, we would have $R=Q^{-1}A$. Since $A$ takes $F$ to $E$ and $Q^{-1}$ takes $E$ to $G$, the matrix $R$ is the transition matrix that takes $F$ to $G$. Thus $R=[\text{G-coordinates of the F-basis}]$.

We need the columns of $R$. Each will have the form $[\mathbf a_k]_G$. Finding these requires a careful look at the Gram-Schmidt process itself. The $\mathbf q_k$'s are obtained from the $\mathbf a_k$'s by first getting the orthogonal set $\mathbf w_k$, and then normalizing. Gramm-Schmidt gives us the $\mathbf w_k$'s via the formula $\mathbf w_k = \mathbf v_k - \sum_{j=1}^{k-1}\frac{\langle \mathbf v_k, \mathbf w_j\rangle}{\|\mathbf w_j\|^2}w_j$ This formula can be re-written to give us $\mathbf v_k$ in terms of the $\mathbf w_k$'s: $\mathbf v_k = \mathbf w_k + \sum_{j=1}^{k-1}\frac{\langle \mathbf v_k, \mathbf w_j\rangle}{\|\mathbf w_j\|^2}w_j$ We can now normalize the set using $\mathbf q_j = \mathbf w_j/\| \mathbf w_j\|$. Doing so yields $\mathbf v_k = \|\mathbf w_k\|\mathbf q_k + \sum_{j=1}^{k-1}\langle \mathbf v_k, \mathbf q_j\rangle\mathbf q_j.$ From this, it follows that the entries of the transition matrix are $R_{j,k}=\langle \mathbf v_k, \mathbf q_j \rangle$, if $j < k$; $\ R_{k,k}=\|w_k\|$ when $j=k$; and $R_{j,k} = 0$ for $j>k$. This matrix is upper triangular. Specifically, it has the form $R = \left( \begin{array}{cccc} R_{11} & R_{12} & \cdots & R_{1n} \\ 0 & R_{22} & \cdots & R_{2n}\\ \vdots & \vdots &\ddots & \vdots \\ 0 & 0 & \cdots &R_{nn} \end{array} \right)\ .$

### Normed linear spaces

Inner product spaces have a geometry that involves both angles and distance. In many cases, it's useful to simply deal with a geometry only involving distance. This leads us to define the concept of a norm.

Definition Let $V$ be a vector space. A mapping $\| \cdot \| : V \to [0.\infty)$ is said to be a norm on $V$ if it satisfies these properties:

1. Positivity. $\|v\|\ge 0$, with $0$ occurring only if $v = \mathbf 0$.
2. Positive homogeneity. $\| v\|= c\| v\|$.
3. Triangle inequality. $\|u+v\| \le \|u\|+ \|v\|$.
There are many examples of normed linear spaces. Below we list a few of these.
• Any inner product space — $\|v\|=\sqrt{\langle v,v\rangle }$.
• $\mathbb R^n$ or $\mathbb C^n$ — $\|x\|_p := \big(\sum_{j=1}^n |x_j|^P\big)^{1/p}$, where $1\le p <\infty$. For $p=\infty$, $\|x\|_\infty := \max_{1\lej \le n}|x_j|$.
• Continuous functions on an interval $[a,b]$ — $\|f\|_C = \max_{a\le x\le b}|f(x)|$.
• $k$-times continuously differentiable functions — $\|f\|_{C^k} = \sum_{j=0}^k \|f^{(j)}\|_C$. (There are other equivalent norms.)
• $L^p$ spaces on the interval $a\le x \le b$, for $1\le p <\infty$ — $\|f\|_{L^p}: = \big(\int_a^b |f(x|^pdx \big)^{1/p}$.   For $p=\infty$, $\|f\|_{L^\infty} := \text{essential-sup}_{a\le x \le b}|f(x)|$.  These spaces involve the theory of the Lebesgue integral and will be discussed later in the course.
Norms on dual spaces. For the moment, we will deal only with finite dimensional spaces. Infinite dimensional spaces are more delicate and will be treated later. Recall that the dual space of a vector space $V$ is the set of all linear functionals on $V$ and it is denoted by $V^\ast$. If $V$ has a norm $\| \cdot\|_V$, then one can show that the following quantity defines a norm for $V^*$. Let $\varphi\in V^*$. Then we define $\| \varphi\|_{V^*} = \sup_{\|v\|_V=1}\big|\varphi(v)\big|.$ Equivalently, we can define the dual-space norm as the smallest constant $K$ such that $\| \varphi\|_{V^*} \le K\|v\|_V$. Verifying that $\| \cdot \|_{V^*}$ is a norm is left as an exercise. We close this section by mentioning two facts: first, that ${\ell ^p}^* = \ell^q$, where $1/p+1/q = 1$, $1\le p\le \infty$; and second, that for all finite dimensional spaces and many (but not all!) infinite dimensional spaces, $V^{**}=V$.

## Spectral Theory for Linear Transformations

Spectral theory concerns eigenvalues, eigenvectors, and diagonal representations of linear transformations. We will begin our discussion with diagonalization. This topic is covered in my Notes on Diagonalization. The topic of diagonalization of self adjoint matrices is dealt with in another set of notes, Notes on Self Adjoint Linear Transformations.

Updated 9/16/13 (fjn).