Let's take a closer look at the 3D dot product. From the algebraic formula $\mathbf x \cdot \mathbf y = x_1y_1+x_2y_2+x_3y_3$, it is easy to show that the dot product has four properties: (1) Positivity: $\mathbf x \cdot \mathbf x \ge 0$, with 0 only occurring when $\mathbf x = \mathbf 0 $. (2) Symmetry: $\mathbf x \cdot \mathbf y = \mathbf y \cdot \mathbf x$. (3) Homogeneity: $(c\mathbf x) \cdot \mathbf y = c(\mathbf x \cdot \mathbf y)$. (4) Additivity: $( \mathbf x+ \mathbf z) \cdot \mathbf y = \mathbf x\cdot \mathbf y + \mathbf z \cdot \mathbf y$. Even though we used coordinates to get these properties, they hold generally, independent of any coordinate system.

Surprisingly, these properties are sufficient to define "dot products" on other vector spaces, and obtain "geometries" for these spaces. Instead of "dot product", we will use the term "inner product." Let $V$ be a vector space and $\langle \cdot, \cdot\rangle$ a function from $V\times V$ into the real or complex numbers, depending on whether $V$ is real or complex — i.e., the scalars are real or complex.

**Definition**. We say that $\langle \cdot, \cdot\rangle$ is an
inner product on $V$ if the following properties hold:

- Positivity. $\langle v, v\rangle \ge 0$, with $0$ occurring only if $v = \mathbf 0$.
- Symmetry or conjugate symmetry. $\langle u, v\rangle= \langle v, u\rangle$, for the real case, and $\overline{\langle u, v\rangle} = \langle v, u\rangle$, for the complex case.
- Homogeneity. $\langle cu, v\rangle = c\langle u, v\rangle$.
- Additivity. $\langle u + w, v\rangle = \langle u, v\rangle+ \langle w, v\rangle$.
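
As a quick sanity check, here is a minimal numerical sketch (Python with NumPy; the random vectors, scalar, and tolerances are only illustrative) that verifies these four properties for the standard dot product on $\mathbb R^3$.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 3))   # three random vectors in R^3
c = rng.standard_normal()               # a random scalar

def dot(a, b):
    return float(np.dot(a, b))

# Positivity: <x, x> >= 0, with equality only for the zero vector.
assert dot(x, x) > 0 and dot(np.zeros(3), np.zeros(3)) == 0
# Symmetry: <x, y> = <y, x>.
assert np.isclose(dot(x, y), dot(y, x))
# Homogeneity: <c x, y> = c <x, y>.
assert np.isclose(dot(c * x, y), c * dot(x, y))
# Additivity: <x + z, y> = <x, y> + <z, y>.
assert np.isclose(dot(x + z, y), dot(x, y) + dot(z, y))
print("All four properties check out numerically.")
```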

**Theorem (Schwarz's inequality)**. Suppose that a vector space $V$
is equipped with an inner product, $\langle u,v\rangle$. Let
$\|u\|:=\sqrt{\langle u,u\rangle}$. Then,
\[
\big|\langle u,v\rangle\big| \le \|u\|\|v\|.
\]
**Proof**. If either $u$ or $v$ is $\mathbf 0$, the inequality is
trivially true. Thus, we may suppose neither $u$ nor $v$ is $\mathbf
0$. We will suppose that $V$ is complex. The real case is
easier. Let $t,\alpha\in \mathbb R$. From the
four properties of the inner product, we can show that
\[ 0\le \langle u+te^{i\alpha} v, u+ te^{i\alpha} v\rangle = \|u\|^2+
t\,\overline{e^{-i\alpha} \langle u,v\rangle} + te^{-i\alpha} \langle
u,v\rangle + t^2\|v\|^2. \]
Using the polar form of a complex number, we can write $\langle
u,v\rangle = \big|\langle u,v\rangle\big|e^{i\theta}$. Choose $\alpha =
\theta$. Then the previous inequality becomes
\[
p(t):=\|u\|^2+2t\big|\langle u,v\rangle\big| + t^2\|v\|^2\ge 0.
\]
Because $p(t)$ doesn't go negative, it either has two complex roots or
a double real root. In either case, the discriminant for $p$ satisfies
$4\big|\langle u,v\rangle\big|^2 - 4\|u\|^2\|v\|^2\le 0$. Schwarz's inequality
follows immediately from this. $\square$
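
Schwarz's inequality is easy to test numerically. The sketch below (Python/NumPy; the random samples and tolerances are illustrative choices) checks the inequality for the standard inner product on $\mathbb C^n$ and also confirms that the quadratic $p(t)$ from the proof stays nonnegative.

```python
import numpy as np

rng = np.random.default_rng(1)

def inner(u, v):
    # standard inner product on C^n: <u, v> = sum_j u_j * conj(v_j)
    return np.sum(u * np.conj(v))

def norm(u):
    return np.sqrt(inner(u, u).real)

for _ in range(1000):
    u = rng.standard_normal(6) + 1j * rng.standard_normal(6)
    v = rng.standard_normal(6) + 1j * rng.standard_normal(6)
    # Schwarz's inequality
    assert abs(inner(u, v)) <= norm(u) * norm(v) + 1e-12
    # the quadratic p(t) from the proof never goes negative
    for t in np.linspace(-5.0, 5.0, 21):
        p = norm(u)**2 + 2.0 * t * abs(inner(u, v)) + t**2 * norm(v)**2
        assert p >= -1e-9
print("Schwarz's inequality held on every sample.")
```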

**Corollary**. Equality holds in Schwarz's inequality if and only
if $\{u,v\}$ is linearly dependent.

**Proof**. Exercise.

**Theorem (The triangle inequality)**. Suppose that a vector space $V$
is equipped with an inner product, $\langle u,v\rangle$. Let
$\|u\|:=\sqrt{\langle u,u\rangle}$. Then,
\[
\big| \|u\|-\|v\|\big| \le \|u+v\|\le \|u\|+\|v\|.
\]
**Proof**. Recall that in the proof of Schwarz's inequality, we
used
\[
\|u+te^{i\alpha} v\|^2=\|u\|^2+te^{-i\alpha} \langle u,v\rangle +
t\,\overline{e^{-i\alpha} \langle u,v\rangle} + t^2\|v\|^2.
\]
If we set $t=1$, $\alpha =0$, use the identity $z+\bar
z=2\text{Re}(z)$, the inequality $\text{Re}(z)\le |z|$ and Schwarz's
inequality, we get
\[
\|u+v\|^2 \le \|u\|^2 + 2\big|\langle u,v\rangle\big|+\|v\|^2\le \big(
\|u\|+\|v\| \big)^2.
\]
Taking square roots gives the right side of the triangle
inequality. Similarly, we have
\[
\|u+v\|^2 \ge \|u\|^2 - 2\big|\langle u,v\rangle\big|+\|v\|^2\ge \big(
\|u\|-\|v\| \big)^2.
\]
Again taking square roots yields the left side. $\square$
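
The same kind of numerical spot-check works for the triangle inequality; again, the vectors and tolerances below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def inner(u, v):
    return np.sum(u * np.conj(v))

def norm(u):
    return np.sqrt(inner(u, u).real)

for _ in range(1000):
    u = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    v = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    lower, middle, upper = abs(norm(u) - norm(v)), norm(u + v), norm(u) + norm(v)
    assert lower <= middle + 1e-12 and middle <= upper + 1e-12
print("| ||u|| - ||v|| | <= ||u + v|| <= ||u|| + ||v|| held on every sample.")
```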

The triangle inequality is one of the essential properties of length. The sum of the lengths of two sides of a (nondegenerate) triangle is greater than the length of the third side. In a real vector space, we can also define the angle between two vectors. Suppose that neither $u$ nor $v$ is $\mathbf 0$. Schwarz's inequality implies that we have \[ -1 \le \frac{\langle u,v\rangle}{\|u\|\|v\|}\le 1. \] Thus we may define the angle between $u$ and $v$ to be \[ \theta(u,v) = \arccos\bigg(\frac{\langle u,v\rangle}{\|u\|\|v\|}\bigg). \]

When $\langle u,v\rangle = 0$, the vectors are orthogonal (perpendicular) to each other. We will say this is true even if one or both of $u$ and $v$ are $\mathbf 0$. In the complex case, the concept of angle between two vectors is not so important, except when $\langle u,v\rangle = 0$. When this happens we will also say that $u$ and $v$ are orthogonal.
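
As a small illustration (Python/NumPy; the particular vectors are made up for the example), the sketch below computes the angle between two vectors in $\mathbb R^3$ and exhibits an orthogonal pair.

```python
import numpy as np

def angle(u, v):
    # angle between nonzero vectors in a real inner product space (here R^n)
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards against round-off

u = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 1.0, 0.0])
w = np.array([0.0, 0.0, 2.0])

print(np.degrees(angle(u, v)))   # 45 degrees
print(np.dot(u, w))              # 0.0, so u and w are orthogonal
```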

**Standard inner products**. Here is a list of a few standard inner product
spaces.

- $\mathbf R^n$ — $\langle x,y\rangle = \sum_{j=1}^n x_jy_j = y^Tx$.
- $\mathbf C^n$ — $\langle x,y\rangle = \sum_{j=1}^n x_j\overline{y_j} = y^\ast x$.
- Real $L^2[a,b]$ — $\langle f,g\rangle = \int_a^b f(x)g(x)dx$
- Complex $L^2[a,b]$ — $\langle f,g\rangle = \int_a^b f(x)\overline{g(x)}dx$
- $2\pi$-periodic $L^2$ functions — $\langle f,g\rangle = \int_0^{2\pi} f(x)\overline{g(x)}dx$
- Weighted $L^2$ inner products (real) — $\langle f,g\rangle = \int_a^b f(x)g(x)w(x)dx$, where $w(x)>0$.
- Real Sobolev space for differentiable functions — $\langle f,g\rangle = \int_a^b \big(f(x)g(x)+ f'(x)g'(x)\big)dx$.
- Let $A$ be an $n\times n$ matrix with real entries. Suppose that $A$ is selfadjoint and that $x^TAx>0$, unless $x=\mathbf 0$ — $\langle x,y\rangle = y^TAx$.
- Bi-infinite complex sequences, $\ell^2$ — $\langle x,y\rangle = \sum_{n=-\infty}^\infty x_n \overline{y_n}$.
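
Several of the inner products in this list can be evaluated, or approximated, numerically. The sketch below (Python/NumPy) uses a simple Riemann-sum quadrature for the $L^2$ examples; the functions, weight, grid, and step size are illustrative choices, not part of the definitions.

```python
import numpy as np

# C^n inner product: <x, y> = sum_j x_j conj(y_j) = y* x
x = np.array([1 + 1j, 2.0, -1j])
y = np.array([0.5, 1 - 1j, 3.0])
print(np.vdot(y, x))        # np.vdot(a, b) = sum_j conj(a_j) b_j, so this is y* x

# Real L^2[0, 1] inner product, approximated by a Riemann sum on a fine grid
n = 10_000
t, dx = np.linspace(0.0, 1.0, n, endpoint=False), 1.0 / n
f, g = np.sin(np.pi * t), t**2
print(np.sum(f * g) * dx)   # approximates the integral of f(x) g(x) over [0, 1]

# Weighted L^2 inner product with weight w(x) = 1 + x > 0
w = 1.0 + t
print(np.sum(f * g * w) * dx)
```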

**Example**. Show that $\langle f,g\rangle = \int_0^1 f(x)g(x)\,dx$ defines an inner product on the space $C[0,1]$ of continuous real-valued functions on $[0,1]$.

**Solution**. Symmetry, homogeneity and additivity are all simple
consequences of the properties of the integral. Thus, we only need to
show positivity. The definition of the Riemann integral implies that
$\langle f,f\rangle = \int_0^1 f^2(x)dx\ge 0$, so what remains is
showing that the only function $f\in C[0,1]$ for which $\langle
f,f\rangle=0$ is $f\equiv 0$.

Suppose this is false. Then there is an $f\in C[0,1]$ such that $\int_0^1 f^2(x)dx=0$ and there is also an $x_0\in [0,1]$ for which $f(x_0) \ne 0$. Let $F=f^2$. Products of continuous functions are continuous, so $F\in C[0,1]$. Also, $F(x_0) = (f(x_0))^2>0$. Using a continuity argument, one can show that there is a closed interval $[a,b]\subseteq [0,1]$ that contains $x_0$ and on which $F(x)\ge \frac12 F(x_0)$. (Exercise.) Consequently, \[ \int_0^1 f^2(x)dx=\int_0^1 F(x)dx \ge \int_a^b F(x)dx \ge \frac12 F(x_0)(b-a)>0, \] which contradicts the assumption that $\int_0^1 f^2(x)dx=0$. Consequently, positivity holds. $\square$

**Orthogonality**

We will begin with a few definitions. In an inner product space $V$,
we say that a (possibly infinite) set of vectors $S= \{v_1,\ldots,v_n,
\ldots\}$ is orthogonal if and only if (i) none of the vectors are
$\mathbf 0$ and (ii) $\langle v_j,v_k\rangle =0$ for all $j\ne
k$. Part (i) excludes $\mathbf 0$ from the set. We do this to avoid
having to write the phrase "orthogonal set of nonzero vectors."
However, be aware that some authors do allow for including $\mathbf
0$. It is easy to see that an orthogonal set of vectors is linearly
independent. We will frequently use normalized sets of orthogonal
vectors. An orthogonal set is termed *orthonormal* (o.n.) if
all of the vectors in it have unit length; that is, $\langle
v_j,v_k\rangle =\delta_{j,k}$. We say that two subspaces of $V$, $U$
and $W$, are orthogonal if and only if all of the vectors in $U$ are
orthogonal to all of the vectors in $W$. When this happens we write
$U\perp W$. Finally, we define the orthogonal complement of $U$ in $V$
to be $U^\perp := \{v\in V\colon \langle v,u\rangle = 0 \ \forall\
u\in U\}$.
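
A small numerical sketch (Python/NumPy; the random matrix is only an illustration): it checks the condition $\langle q_j,q_k\rangle=\delta_{j,k}$ for an o.n. set and produces a basis for an orthogonal complement via the singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(3)

# Columns of Q form an o.n. set in R^5 spanning U = col(A).
A = rng.standard_normal((5, 3))
Q, _ = np.linalg.qr(A)

# <q_j, q_k> = delta_{jk} is the same as Q^T Q = I.
print(np.allclose(Q.T @ Q, np.eye(3)))

# A basis for the orthogonal complement U^perp: the last two left singular vectors of A.
U_full, s, Vt = np.linalg.svd(A)
Q_perp = U_full[:, 3:]
print(np.allclose(Q.T @ Q_perp, 0.0))   # everything in U is orthogonal to U^perp
```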

**Minimization problems**. A common way of fitting data, either
discrete or continuous, is *least-squares* minimization. The
familiar straight line fit to a set of data is a good example of
this technique and we will discuss it briefly. Suppose that we have
collected data $\{y_j\in \mathbb R,\ j=1,\ldots, n\}$ at times
$\{t_1,\ldots,t_n\}$. To get a good straight line $y(t)=a+bt$ that
fits the data, we choose the intercept and slope to minimize the
sum of the squares of $y_j-y(t_j)= y_j - a -bt_j$. Specifically, we
will minimize over all $a$, $b$ the quantity $D^2 =
\sum_{j=1}^n(y_j- a -bt_j)^2$. We can put this problem in terms of
$\mathbb R^n$. Let $\mathbf y= [y_1\ y_2 \ \cdots \ y_n]^T$,
$\mathbf 1= [1\ 1\ \cdots \ 1]^T$, and $\mathbf t = [t_1\ t_2 \ \cdots
\ t_n]^T$. In this notation, we have $D^2 = \|\mathbf y - a\mathbf 1
- b \mathbf t\|^2$. Next, let $U = \text{span}\{\mathbf 1,\mathbf t
\}=\{a\mathbf 1 + b\mathbf t,\ \forall \ a,b\in \mathbb R\}$. Using
this notation, we can thus recast the problem in its final form:
Find $\mathbf p\in U$ such that \[ \min_{a,b} D = \|\mathbf y -
\mathbf p\|=\min_{\mathbf u\in U}\|\mathbf{y}-\mathbf u\|. \] As
you have shown in exercises 1.3 and 1.4, this problem has a unique
solution that can be found from the normal equations, which in
matrix form are
\[
\left(\begin{array}{c} a\\ b\end{array}\right) =
\left(\begin{array}{cc} \mathbf 1^T\mathbf 1 & \mathbf 1^T \mathbf t\\
\mathbf t^T\mathbf 1 & \mathbf t^T \mathbf t
\end{array}\right)^{-1}
\left(\begin{array}{c} \mathbf 1^T\mathbf y\\ \mathbf t^T\mathbf y\end{array}\right).
\]
Stated another way, the solution is given by $\mathbf p = P\mathbf y$,
where $P$ is the orthogonal projection onto $U$.
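
Here is a minimal sketch of the straight-line fit (Python/NumPy; the synthetic data are invented for illustration). It solves the $2\times 2$ normal equations above and compares the result with a general-purpose least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(4)

# synthetic data: y is roughly 2 + 0.5 t plus a little noise (values chosen only for illustration)
t = np.linspace(0.0, 10.0, 20)
y = 2.0 + 0.5 * t + 0.1 * rng.standard_normal(t.size)
one = np.ones_like(t)

# normal equations: a 2x2 system for the intercept a and slope b
G = np.array([[one @ one, one @ t],
              [t @ one,   t @ t]])
rhs = np.array([one @ y, t @ y])
a, b = np.linalg.solve(G, rhs)
print(a, b)

# the same answer from a general-purpose least-squares routine
coef, *_ = np.linalg.lstsq(np.column_stack([one, t]), y, rcond=None)
print(coef)
```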

The general "least squares" minimization problem is this: Given an
inner product space $V$, a vector $v\in V$, and a subspace $U\subset
V$, find $p\in U$ such that $\|v - p\|=\min_{u\in U}\|v - u\|$. By the
exercises 1.3 and 1.4, a solution $p$ exists for every $v\in V$ if and
only if there is a vector $p\in U$ such that $v-p\in U^\perp$. When
this happens $p$ is unique and $p=Pv$, where as before $P$ is the
orthogonal projection onto $U$. In particular, when $U$ is
finite dimensional, this is always true. Furthermore, if
$B=\{u_1,\ldots,u_n\}$ is an orthonormal (o.n.) basis for $U$, then by
exercise 1.4(c), the formula for $Pv$ is especially simple:
\[
Pv = \sum_{j=1}^n
\langle v,u_j\rangle u_j.
\]
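
When an o.n. basis for $U$ is available, this projection formula is essentially one line of code. A minimal sketch (Python/NumPy; the basis and vector are illustrative):

```python
import numpy as np

def project(v, onb):
    # orthogonal projection onto span(onb), onb a list of orthonormal vectors:
    # Pv = sum_j <v, u_j> u_j
    return sum(np.dot(v, u) * u for u in onb)

u1 = np.array([1.0, 0.0, 0.0])   # o.n. basis for the xy-plane in R^3
u2 = np.array([0.0, 1.0, 0.0])
v = np.array([3.0, -2.0, 7.0])

p = project(v, [u1, u2])
print(p)                                     # [ 3. -2.  0.]
print(np.dot(v - p, u1), np.dot(v - p, u2))  # the residual v - p lies in U^perp
```
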
**Gram-Schmidt process**. Unlike the data-fitting situation above, when we work with an o.n. basis there is no Gram matrix $G$ to invert. This raises two questions: Does
an o.n. basis exist for an inner product space and, if so, how can it
be found?

We will deal with a finite dimensional space having a basis $B=\{v_1,\ldots,v_n\}$. Our aim will be to produce an orthogonal basis for the space. This can be converted to an o.n. basis by simply dividing each vector by its length. To begin, define the spaces $U_k=\text{span}\{v_1,\ldots,v_k\}$, $k=1,\ldots,n$. Let $w_1=v_1$. Next, let $w_2=v_2 - P_1v_2=v_2 - \frac{\langle v_2,w_1\rangle}{\|w_1\|^2}w_1$, where $P_1$ is the orthogonal projection onto $U_1$. An easy computation shows that $w_2$ is orthogonal to $w_1$ and, consequently, $\{w_1,w_2\}$ is an orthogonal basis for $U_2$. Similarly, we let $w_3 = v_3 - P_2v_3 = v_3 - \frac{\langle v_3,w_1\rangle}{\|w_1\|^2}w_1 -\frac{\langle v_3,w_2\rangle}{\|w_2\|^2}w_2$. As before, $w_3$ is orthogonal to $w_1,w_2$ and $\{w_1,w_2, w_3\}$ is an orthogonal basis for $U_3$. We can continue in this way. Let $w_k = v_k - P_{k-1}v_k$. It follows that $w_k$ is orthogonal to $U_{k-1}$, so $U_k$ has the orthogonal basis $\{w_1,\ldots, w_k\}$. Eventually, we obtain an orthogonal basis for $V$, $\{w_1,\ldots,w_n\}$.
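
A direct transcription of the procedure into code might look like the sketch below (Python/NumPy; the input vectors are illustrative). This is the classical form of Gram-Schmidt, which matches the formulas above; in floating-point arithmetic the modified variant is usually preferred for stability.

```python
import numpy as np

def gram_schmidt(vectors):
    # classical Gram-Schmidt: w_k = v_k - sum_{j<k} <v_k, w_j>/||w_j||^2 w_j
    ws = []
    for v in vectors:
        w = v.astype(float)
        for u in ws:
            w = w - (np.dot(v, u) / np.dot(u, u)) * u
        ws.append(w)
    return ws

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
ws = gram_schmidt(vs)

# the Gram matrix of the w_k's is diagonal, so the set is orthogonal
print(np.round([[np.dot(a, b) for b in ws] for a in ws], 12))
```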

**QR-factorization**.
One important application of the Gram-Schmidt process is the QR factorization of a matrix. Let $A$ be an $n\times n$ matrix, which we will assume to be real and to have linearly independent columns $\{\mathbf v_1,\ldots, \mathbf v_n\}$. Then there exists a matrix $Q$ whose columns form an o.n. basis for $\mathbb R^n$ — i.e., an orthogonal matrix — and an upper triangular matrix $R$, with positive diagonal entries, such that $A=QR$.

To prove this, note that we are dealing with three different bases of $\mathbb R^n$: (1) the standard basis $E=\{\mathbf e_1,\ldots,\mathbf e_n\}$; (2) the basis of columns $F=\{\mathbf v_1,\ldots, \mathbf v_n\}$; and (3), the basis of orthonormal vectors $G=\{\mathbf q_1,\ldots,\mathbf q_n\}$ obtained via the Gram-Schmidt process. The matrix $A$ is actually the transition matrix from $F$ to $E$ coordinates, because it has the form $A=[\text{E-coordinates of the F-basis}]$. Let $Q:=[\mathbf q_1\ \cdots \ \mathbf q_n]$. As is the case with $A$, $Q$ is the transition matrix from $G$ to $E$ coordinates. Assuming $A=QR$, we would have $R=Q^{-1}A$. Since $A$ takes $F$ to $E$ and $Q^{-1}$ takes $E$ to $G$, the matrix $R$ is the transition matrix that takes $F$ to $G$. Thus $R=[\text{G-coordinates of the F-basis}]$.

We need the columns of $R$. Each will have the form $[\mathbf v_k]_G$. Finding these requires a careful look at the Gram-Schmidt process itself. The $\mathbf q_k$'s are obtained from the $\mathbf v_k$'s by first getting the orthogonal set of $\mathbf w_k$'s, and then normalizing. Gram-Schmidt gives us the $\mathbf w_k$'s via the formula \[ \mathbf w_k = \mathbf v_k - \sum_{j=1}^{k-1}\frac{\langle \mathbf v_k, \mathbf w_j\rangle}{\|\mathbf w_j\|^2}\mathbf w_j. \] This formula can be re-written to give us $\mathbf v_k$ in terms of the $\mathbf w_j$'s: \[ \mathbf v_k = \mathbf w_k + \sum_{j=1}^{k-1}\frac{\langle \mathbf v_k, \mathbf w_j\rangle}{\|\mathbf w_j\|^2}\mathbf w_j. \] We can now normalize the set using $\mathbf q_j = \mathbf w_j/\| \mathbf w_j\|$. Doing so yields \[ \mathbf v_k = \|\mathbf w_k\|\mathbf q_k + \sum_{j=1}^{k-1}\langle \mathbf v_k, \mathbf q_j\rangle\mathbf q_j. \] From this, it follows that the entries of the transition matrix are $R_{j,k}=\langle \mathbf v_k, \mathbf q_j \rangle$, if $j < k$; $\ R_{k,k}=\|\mathbf w_k\|$, when $j=k$; and $ R_{j,k} = 0$, for $j>k$. This matrix is upper triangular. Specifically, it has the form \[ R = \left( \begin{array}{cccc} R_{11} & R_{12} & \cdots & R_{1n} \\ 0 & R_{22} & \cdots & R_{2n}\\ \vdots & \vdots &\ddots & \vdots \\ 0 & 0 & \cdots &R_{nn} \end{array} \right)\ . \]
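
Under the assumptions above (real $A$ with linearly independent columns), the relations $R_{j,k}=\langle \mathbf v_k,\mathbf q_j\rangle$ and $R_{k,k}=\|\mathbf w_k\|$ translate directly into a small QR routine. The sketch below (Python/NumPy; the test matrix is illustrative) builds $Q$ and $R$ this way and checks that $A=QR$ and $Q^TQ=I$. (NumPy's built-in `np.linalg.qr` returns the same factorization up to signs, since it does not insist on a positive diagonal for $R$.)

```python
import numpy as np

def qr_gram_schmidt(A):
    # QR from the Gram-Schmidt relations:
    #   R[j, k] = <v_k, q_j> for j < k,  R[k, k] = ||w_k||,  q_k = w_k / ||w_k||
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        w = A[:, k].astype(float)
        for j in range(k):
            R[j, k] = np.dot(A[:, k], Q[:, j])
            w = w - R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(w)
        Q[:, k] = w / R[k, k]
    return Q, R

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q, R = qr_gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))
```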

**Definition**. Let $V$ be a vector space. A mapping $\| \cdot \| : V \to [0,\infty)$ is said to be a norm on $V$ if it satisfies these properties:

- Positivity. $\|v\|\ge 0$, with $0$ occurring only if $v = \mathbf 0$.
- Positive homogeneity. $\|c v\|= |c|\,\| v\|$, for any scalar $c$.
- Triangle inequality. $\|u+v\| \le \|u\|+ \|v\|$.

**Standard norms**. Here is a list of a few standard norms; a short numerical sketch follows the list.

- Any inner product space — $\|v\|=\sqrt{\langle v,v\rangle }$.
- $\mathbb R^n$ or $\mathbb C^n$ — $\|x\|_p := \big(\sum_{j=1}^n |x_j|^p\big)^{1/p}$, where $1\le p <\infty$. For $p=\infty$, $\|x\|_\infty := \max_{1\le j \le n}|x_j|$.
- Continuous functions on an interval $[a,b]$ — $\|f\|_C = \max_{a\le x\le b}|f(x)|$.
- $k$-times continuously differentiable functions — $\|f\|_{C^k} = \sum_{j=0}^k \|f^{(j)}\|_C$. (There are other equivalent norms.)
- $L^p$ spaces on the interval $a\le x \le b$, for $1\le p <\infty$ — $\|f\|_{L^p} := \big(\int_a^b |f(x)|^p\,dx \big)^{1/p}$. For $p=\infty$, $\|f\|_{L^\infty} := \text{essential-sup}_{a\le x \le b}|f(x)|$. These spaces involve the theory of the Lebesgue integral and will be discussed later in the course.
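
Most of these norms are easy to compute or approximate. A minimal sketch (Python/NumPy; the vector, function, and grid are illustrative):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0, 1.0])

# p-norms on R^n for a few p, compared with NumPy's built-in norm
for p in (1, 2, 3):
    print(p, np.sum(np.abs(x)**p)**(1.0 / p), np.linalg.norm(x, ord=p))
print("inf", np.max(np.abs(x)), np.linalg.norm(x, ord=np.inf))

# the max norm on C[a, b], approximated by sampling f on a fine grid
t = np.linspace(0.0, 2.0 * np.pi, 10_001)
f = np.sin(t) + 0.5 * np.cos(3.0 * t)
print(np.max(np.abs(f)))   # approximates the C-norm of f on [0, 2*pi]
```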

Updated 9/5/15 (fjn).