Definition. A vector space is a set V together with two operations, + and × . If u, v are in V, then u + v is in V; if c is a scalar (a real or complex number for us), then c×v is in V. The operations satisfy the following rules.
Addition | Multiplication by a scalar | |
---|---|---|
u + (v + w) = (u + v) + w | a(b×u) = (ab)×u | |
Identity: u + 0 = 0 + u = u | (a + b)×u = a× u + b×u | |
Inverse: u + (-u) = (-u) + u = 0 | a×(u + v) = a×u + a×v | |
u + v = v + u | 1×u = u |
The vector spaces that we will deal with here include the familiar finite dimensional spaces $\mathbb R^n$, $\mathbb C^n$, the space of $k$ times continuously differentiable functions $C^k$, polynomials of degree $n$ or less $P_n$, various spaces of Lebesgue integrable functions $L^p$, sequence spaces $\ell^p$, and other spaces that we will introduce later. At this point, it's a good idea to review some standard definitions and theorems that come up in linear algebra.
Subspaces
Span
Bases and dimension
Two remarks. If a vector space has arbitrarily large sets of linearly independent vectors, it is infinite dimensional. The subspace containing only $0$ has no linearly independent vectors and is assigned $0$ as its dimension.
The word "appropriate" means that, for example, the composition $K\circ L$ has to be defined. Thus the image of $L$ must be contained in the domain of $K$. Similar conditions need to be placed on the other operations mentioned in the proposition.
Coordinate vectors
Change of coordinates
Of course, this doesn't give us an explicit formula for changing from one system to the other. To do that, we first let $\mathbf e_j= (0 \cdots 1 \cdots 0)^T$ be the column vector with 1 in the j^{th} position and zeros elsewhere. Next, write the column vector as a linear combination of $\mathbf e_j$'s, $\mathbf x = \sum_j x_j \mathbf e_j$, and then apply $\Psi\circ \Phi^{-1}$ to get $\mathbf y = \sum_j x_j\Psi\circ \Phi^{-1}(\mathbf e_j)$. By the definition of $\Phi$, we see that $\Phi^{-1}(\mathbf e_j)=0\cdot v_1+\cdots +1\cdot v_j+\cdots 0\cdot v_n=v_j.\,$ Hence, $\mathbf y =\sum_j x_j\Psi(v_j)= \sum_j x_j[v_j]_D=S\mathbf x$. Here $S$ is called the transition matrix and its j^{th} column is $[v_j]_D$, the D-coordinate vector of $v_j$; explicitly, \[ S= \big[ [v_1]_D\ [v_2]_D \cdots [v_n]_D\big] = \big[\text{D-coordinates of the B-basis}\big]. \] What's left to do is to find the entries in $S$. With a little matrix manipulation, we have that $S\mathbf e_j=[v_j]_D=\sum_k S_{kj}\mathbf e_k$. Applying $\Psi^{-1}$ to this equation yields \[ v_j = \Psi^{-1}[v_j]_D = \Psi^{-1}\big(\sum_k S_{kj}\mathbf e_k\big)= \sum_k S_{kj}\underbrace{\Psi^{-1}(\mathbf e_k)}_{w_k} = \sum_k S_{kj}w_k. \] One final remark. If we take components in $\mathbf y = S\mathbf x$, we get $ y_j=\sum_k S_{jk}x_k$. In the formula for $v_j$ above, the row and column indices of $S$ are reversed. This means that the matrix relating the two bases is the transpose of the transition matrix, which relates the two sets of coordinates.
Dual space
Let's take a closer look at the 3D dot product. From the algebraic formula $\mathbf x \cdot \mathbf y = x_1y_1+x_2y_2+x_3y_3$, it is easy to show that the dot product has four properties: (1) Positivity: $\mathbf x \cdot \mathbf x \ge 0$, with 0 only occurring when $\mathbf x = \mathbf 0 $. (2) Symmetry: $\mathbf x \cdot \mathbf y = \mathbf y \cdot \mathbf x$. (3) Homogeneity: $(c\mathbf x) \cdot \mathbf y = c(\mathbf x \cdot \mathbf y)$. (4) Additivity: $( \mathbf x+ \mathbf z) \cdot \mathbf y = \mathbf x\cdot \mathbf y + \mathbf z \cdot \mathbf y$. Even though we used coordinates to get these properties, they hold generally, independent of any coordinate system.
Surprisingly, these properties are sufficient to define "dot products" on other vector spaces, and obtain "geometries" for these spaces. Instead of "dot product", we will use the term "inner product." Let $V$ be a vector space and $\langle \cdot, \cdot\rangle$ a function from $V\times V$ into the real or complex numbers, depending on whether $V$ is real or complex — i.e., the scalars are real or complex.
Definition. We say that $\langle \cdot, \cdot\rangle$ is an inner product on $V$ if the following properties hold:
Theorem (Schwarz's inequality). Suppose that a vector space $V$ is equipped with an inner product, $\langle u,v\rangle$. Let $\|u\|:=\sqrt{\langle u,u\rangle}$. Then, \[ \big|\langle u,v\rangle\big| \le \|u\|\|v\|. \] Proof. If either $u$ or $v$ is $\mathbf 0$, the inequality is trivially true. Thus, we may suppose neither $u$ nor $v$ is $\mathbf 0$. We will suppose that $V$ is complex. The real case is easier. Let $t,\alpha\in \mathbb R$. From the four properties of the inner product, we can show that \[ 0\le \langle u+te^{-i\alpha} v, u+ te^{-i\alpha} v\rangle = \|u\|^2+te^{-i\alpha} \langle u,v\rangle + t\,\overline{e^{-i\alpha} \langle u,v\rangle} + t^2\|v\|^2. \] Using the polar form of a complex number, we can write $\langle u,v\rangle = \big|\langle u,v\rangle\big|e^{i\theta}$. Choose $\alpha = \theta$. Then the previous inequality becomes \[ p(t):=\|u\|^2+2t\big|\langle u,v\rangle\big| + t^2\|v\|^2\ge 0. \] Because $p(t)$ doesn't go negative, it either has two complex roots or a double real root. In either case, the discriminant for $p$ satisfies $4\big|\langle u,v\rangle\big|^2 - 4\|u\|^2\|v\|^2\le 0$. Schwarz's inequality follows immediately from this. $\square$
Corollary. Equality holds in Schwarz's inequality if and only if $\{u,v\}$ is linearly dependent.
Proof. Exercise.
Theorem (The triangle inequality). Suppose that a vector space $V$ is equipped with an inner product, $\langle u,v\rangle$. Let $\|u\|:=\sqrt{\langle u,u\rangle}$. Then, \[ \big| \|u\|-\|v\|\big| \le \|u+v\|\le \|u\|+\|v\|. \] Proof. Recall that in the proof of Schwarz's inequality, we used \[ \|u+te^{-i\alpha} v\|^2=\|u\|^2+te^{-i\alpha} \langle u,v\rangle + t\,\overline{e^{-i\alpha} \langle u,v\rangle} + t^2\|v\|^2. \|u\|^2+te^{-i\alpha} \langle u,v\rangle + t\,\overline{e^{-i\alpha} \langle u,v\rangle} + t^2\|v\|^2. \] If we set $t=1$, $\alpha =0$, use the identity $z+\bar z=2\text{Re}(z)$, the inequality $\text{Re}(z)\le |z|$ and Schwarz's inequality, we get \[ \|u+v\|^2 \le \|u\|^2 + 2\big|\langle u,v\rangle\big|+\|v\|^2\le \big( \|u\|+\|v\| \big)^2. \] Taking square roots gives the right side of the triangle inequality. Similarly, we have \[ \|u+v\|^2 \ge \|u\|^2 - 2\big|\langle u,v\rangle\big|+\|v\|^2\ge \big( \|u\|-\|v\| \big)^2. \] Again taking square roots yields the left side. $\square$
The triangle inequality is one of the essential properties of length. The sum of the lengths of two sides of a (nondegenaerate) triangle is greater than length of the third side. In a real vector space, we can also define the angle between two vectors. Suppose that neither $u$ nor $v$ is $\mathbf 0$. Schwarz's inequality implies that we have \[ -1 \le \frac{\langle u,v\rangle}{\|u\|\|v\|}\le 1. \] Thus we may define the angle between $u$ and $v$ to be \[ \theta(u,v) = \arccos\bigg(\frac{\langle u,v\rangle}{\|u\|\|v\|}\bigg). \]
When $\langle u,v\rangle = 0$, the vectors are orthogonal (perpendicular) to each other. We will say this is true even if one or both or $u$ and $v$ are $\mathbf 0$. In the complex case, the concept of angle between two vectors is not so important, except when $\langle u,v\rangle = 0$. When this happens we will also say that $u$ and $v$ are orthogonal.
Standard inner products. Here is a list of a few standard inner product spaces.
Solution. Symmetry, homogeneity and additivity are all simple consequences of the properties of the integral. Thus, we only need to show positivity. The definition of the Riemann integral implies that $\langle f,f\rangle = \int_0^1 f^2(x)dx\ge 0$, so what remains is showing that the only function $f\in C[0,1]$ for which $\langle f,f\rangle=0$ is $f\equiv 0$.
Suppose this is false. Then there is an $f\in C[0,1]$ such that $\int_0^1 f^2(x)dx=0$ and there is also an $x_0\in [0,1]$ for which $f(x_0) \ne 0$. Let $F=f^2$. Products of continuous functions are continuous, so $F\in C[0,1]$. Also, $F(x_0) = (f(x_0))^2>0$. Using a continuity argument, one can show that there is a closed interval $[a,b]\subseteq [0,1]$ that contains $x_0$ and on which $F(x)\ge \frac12 F(x_0)$. (Exercise.) Consequently, \[ \int_0^1 f^2(x)dx=\int_0^1 F(x)dx \ge \int_a^b F(x)dx \ge \frac12 F(x_0)(b-a)>0, \] which contradicts the assumption that $\int_0^1 f^2(x)dx=0$. Consequently, positivity holds. $\square$
Orthogonality
We will begin with a few definitions. In an inner product space $V$, we say that a (possibly infinite) set of vectors $S= \{v_1,\ldots,v_n, \ldots\}$ is orthogonal if and only if (i) none of the vectors are $\mathbf 0$ and (ii) $\langle v_j,v_k\rangle =0$ for all $j\ne k$. Part (i) excludes $\mathbf 0$ from the set. We do this to avoid having to write the phrase "orthogonal set of nonzero vectors." However, be aware that some authors do allow for including $\mathbf 0$. It is easy to see that an orthogonal set of vectors is linearly independent. We will frequently use normalized sets of orthogonal vectors. An orthogonal set is termed orthonormal (o.n.) if all of the vectors in it have unit length; that is, $\langle v_j,v_k\rangle =\delta_{j,k}$. We say that two subspaces of $V$, $U$ and $W$, are orthogonal if and only if all of the vectors in $U$ are orthogonal to all of the vectors in $W$. When this happens we write $U\perp W$. Finally, we define the orthogonal complement of $U$ in $V$ to be $U^\perp := \{v\in V\colon \langle v,u\rangle = 0 \ \forall\ u\in U\}$.
Minimization problems. A common way of fitting data, either discrete or continuous, is least-squares minimization. The familiar straight line fit to a set of data is a good example of this technique and we will discuss it briefly. Suppose that we have collected data $\{y_j\in \mathbb R,\ j=1,\ldots, n\}$ at times $\{t_1,\ldots,t_n\}$. To get a good straight line that $y(t)=a+bt$ that fits the data, we choose the intecept and slope to to minimize the sum of the squares of $y_j-y(t_j)= y_j - a -bt_j$. Specifically, we will minimize over all $a$, $b$ the quantity $D^2 = \sum_{j=1}^n(y_j- a -bt_j)^2$. We can put this problem in terms of $\mathbb R^n$. Let $\mathbf y= [y_1\ y_2 \ \cdots \ y_n]^T$, $\mathbf 1= [1\ 1\ \cdots \ 1]^T$, and $\mathbf = [t_1\ t_2 \ \cdots \ t_n]^T$. In this notation, we have $D^2 = \|\mathbf y - a\mathbf 1 - b \mathbf t\|^2$. Next, let $U = \text{span}\{\mathbf 1,\mathbf t \}=\{a\mathbf 1 + b\mathbf t,\ \forall \ a,b\in \mathbb R\}$. Using this notation, we can thus recast the problem in its final form: Find $p\in U$ such that \[ \min_{a,b} D=:\text{dist}(Y,U) = \|Y - p\|=\min_{u\in U}\|Y-u\|. \] As you have shown in exercises 1.3 and 1.4, this problem has a unique solution that can be found from the normal equations, which in matrix form are \[ \left(\begin{array}{c} a\\ b\end{array}\right) = \left(\begin{array}{cc} \mathbf 1^T\mathbf 1 & \mathbf 1^T \mathbf t\\ \mathbf t^T\mathbf 1 & \mathbf t^T \mathbf t \end{array}\right)^{-1}\mathbf y. \] Stated another way, the solution $\mathbf p = P\mathbf y$, where $P$ is the orhogonal projection of $\mathbf y$ onto $U$.
The general "least squares" minimization problem is this: Given an inner product space $V$, a vector $v\in V$, and a subspace $U\subset V$, find $p\in U$ such that $\|v - p\|=\min_{u\in U}\|v - u\|$. By the exercises 1.3 and 1.4, a solution $p$ exists for every $v\in V$ if and only if there is a vector $p\in U$ such that $v-p\in U^\perp$. When this happens $p$ is unique and $p=Pv$, where as before $P$ is the orthogonal projection of $v$ onto $u$. In particular, when $U$ is finite dimensional, this is always true. Furthermore, if $B=\{u_1,\ldots,u_n\}$ is an otrhonormal (o.n) basis for $U$, then by exercise 1.4(c), the formula for $Pv$ is especially simple: \[ Pv = \sum_{j=1}^n \langle v,u_j\rangle u_j. \] Gram-Schmidt process. Unlike the situation for fitting data, we don't need to invert the matrix $G$. This raises two questions: Does an o.n. basis exist for an inner product space and, if so, how can it be found?
We will deal with a finite dimensional space having a basis $B=\{v_1,\ldots,v_n\}$. Our aim will be to produce an orthogonal basis for the space. This can be converted to an o.n. basis by simply dividing each vector by its length. To begin, define the spaces $U_k=\text{span}\{v_1,\ldots,v_k\}$, $k=1,\ldots,n$. Let $w_1=v_1$. Next, let $w_2=v_2 - P_1v_2=v_2 - \frac{\langle v_2,w_1\rangle}{\|w_1\|^2}w_1$, where $P_1$ is the orthogonal projection onto $U_1$. An easy computaion shows that $w_2$ is orthogonal to $w_1$ and, consequently, $\{w_1,w_2\}$ is an orthogonal basis for $U_2$. Similarly, we let $w_3 = v_2 - P_2v_3 = v_3 - \frac{\langle v_3,w_1\rangle}{\|w_1\|^2}w_1 -\frac{\langle v_3,w_2\rangle}{\|w_2\|^2}w_2$. As before $w_3$ is orthogonal to $w_!,w_2$ and $\{w_1,w_2, w_3\}$ is an orthogonal basis for $U_3$. We can continue in this way. Let $w_k = v_k - P_{k-1}v_k$. It follows that $w_k$ is orthogonal to $U_{k-1}$, so $U_k$ has the orthogonal basis $\{w_1,\ldots, w_k\}$. Eventually, we obtain an orthogonal basis for $V$, $\{w_1,\ldots,w_n\}$.
QR-factorization. One important application of the Gram-Schmidt is the QR factorization of a matrix. Let $A$ be n×n matrix, which we will assume to be real and to have linearly independent columns $\{\mathbf v_1,\ldots, \mathbf v_n\}$. Then there exists a matrix $Q$ whose columns form an o.n. basis for $\mathbb R^n$ — i.e., an orthogonal matrix — and an upper triangular matrix $R$, with positive diagonal entries, such that $A=QR$.
To prove this, note that we are dealing with three different bases of $\mathbb R^n$: (1) the standard basis $E=\{\mathbf e_1,\ldots,\mathbf e_n\}$; (2) the basis of columns $F=\{\mathbf v_1,\ldots, \mathbf v_n\}$; and (3), the basis of orthonormal vectors $G=\{\mathbf q_1,\ldots,\mathbf q_n\}$ obtained via the Gram-Schmidt process. The matrix $A$ is actually the transition matrix from $F$ to $E$ coordinates, because it has the form $A=[\text{E-coordinates of the F-basis}]$. Let $Q:=[\mathbf q_1\ \cdots \ \mathbf q_n]$. As is the case with $A$, $Q$ is the transition matrix from $G$ to $E$ coordinates. Assuming $A=QR$, we would have $R=Q^{-1}A$. Since $A$ takes $F$ to $E$ and $Q^{-1}$ takes $E$ to $G$, the matrix $R$ is the transition matrix that takes $F$ to $G$. Thus $R=[\text{G-coordinates of the F-basis}]$.
We need the columns of $R$. Each will have the form $[\mathbf a_k]_G$. Finding these requires a careful look at the Gram-Schmidt process itself. The $\mathbf q_k$'s are obtained from the $\mathbf a_k$'s by first getting the orthogonal set $\mathbf w_k$, and then normalizing. Gramm-Schmidt gives us the $\mathbf w_k$'s via the formula \[ \mathbf w_k = \mathbf v_k - \sum_{j=1}^{k-1}\frac{\langle \mathbf v_k, \mathbf w_j\rangle}{\|\mathbf w_j\|^2}w_j \] This formula can be re-written to give us $\mathbf v_k$ in terms of the $\mathbf w_k$'s: \[ \mathbf v_k = \mathbf w_k + \sum_{j=1}^{k-1}\frac{\langle \mathbf v_k, \mathbf w_j\rangle}{\|\mathbf w_j\|^2}w_j \] We can now normalize the set using $\mathbf q_j = \mathbf w_j/\| \mathbf w_j\|$. Doing so yields \[ \mathbf v_k = \|\mathbf w_k\|\mathbf q_k + \sum_{j=1}^{k-1}\langle \mathbf v_k, \mathbf q_j\rangle\mathbf q_j. \] From this, it follows that the entries of the transition matrix are $R_{j,k}=\langle \mathbf v_k, \mathbf q_j \rangle $, if $j < k$; $\ R_{k,k}=\|w_k\|$ when $j=k$; and $ R_{j,k} = 0$ for $j>k$. This matrix is upper triangular. Specifically, it has the form \[ R = \left( \begin{array}{cccc} R_{11} & R_{12} & \cdots & R_{1n} \\ 0 & R_{22} & \cdots & R_{2n}\\ \vdots & \vdots &\ddots & \vdots \\ 0 & 0 & \cdots &R_{nn} \end{array} \right)\ . \]
Definition Let $V$ be a vector space. A mapping $ \| \cdot \| : V \to [0.\infty)$ is said to be a norm on $V$ if it satisfies these properties:
Updated 9/16/13 (fjn).