Inner Products and Norms

Inner product spaces

In 2D or 3D, vectors have a natural geometry, independent of any coordinate system: every vector has a length; any two vectors have an angle between them. These quantities can be used to define the "dot" product of two vectors, $\mathbf x \cdot \mathbf y=|\mathbf x||\mathbf y| \cos(\theta(\mathbf x,\mathbf y))$. On the other hand, given the dot product, we can obtain the length of a vector, $|\mathbf x|=\sqrt{\mathbf x \cdot \mathbf x}$, and also the angle between two vectors, $\theta(\mathbf x,\mathbf y) = \arccos\big(\frac{\mathbf x \cdot \mathbf y}{|\mathbf x| |\mathbf y|}\big)$. If we introduce the usual basis $\{\mathbf i,\mathbf j,\mathbf k\}$, where the vectors are perpendicular and have unit length, then every vector can be represented as $\mathbf x = x_1\mathbf i + x_2\mathbf j + x_3\mathbf k$. From this and the law of cosines we can derive the familiar algebraic formula for the dot product; namely, $\mathbf x \cdot \mathbf y = x_1y_1+x_2y_2+x_3y_3$. As we pointed out above, given this formula, we can find the geometric quantities of length and angle. The point is that we have connected the geometry of 3D with an algebraic formula.
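To see these formulas in action, here is a minimal numerical sketch (using NumPy; the two vectors are arbitrary examples) that computes the dot product, the lengths, and the angle between two vectors in 3D:

```python
import numpy as np

# Two example vectors in 3D (arbitrary choices for illustration).
x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

dot = np.dot(x, y)                        # x . y = x1*y1 + x2*y2 + x3*y3
len_x = np.sqrt(np.dot(x, x))             # |x| = sqrt(x . x)
len_y = np.sqrt(np.dot(y, y))
theta = np.arccos(dot / (len_x * len_y))  # angle between x and y, in radians

print(dot, len_x, len_y, np.degrees(theta))
```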

Let's take a closer look at the 3D dot product. From the algebraic formula $\mathbf x \cdot \mathbf y = x_1y_1+x_2y_2+x_3y_3$, it is easy to show that the dot product has four properties: (1) Positivity: $\mathbf x \cdot \mathbf x \ge 0$, with 0 only occurring when $\mathbf x = \mathbf 0 $. (2) Symmetry: $\mathbf x \cdot \mathbf y = \mathbf y \cdot \mathbf x$. (3) Homogeneity: $(c\mathbf x) \cdot \mathbf y = c(\mathbf x \cdot \mathbf y)$. (4) Additivity: $( \mathbf x+ \mathbf z) \cdot \mathbf y = \mathbf x\cdot \mathbf y + \mathbf z \cdot \mathbf y$. Even though we used coordinates to get these properties, they hold generally, independent of any coordinate system.

Surprisingly, these properties are sufficient to define "dot products" on other vector spaces, and obtain "geometries" for these spaces. Instead of "dot product", we will use the term "inner product." Let $V$ be a vector space and $\langle \cdot, \cdot\rangle$ a function from $V\times V$ into the real or complex numbers, depending on whether $V$ is real or complex — i.e., the scalars are real or complex.

Definition. We say that $\langle \cdot, \cdot\rangle$ is an inner product on $V$ if the following properties hold:

  1. Positivity. $\langle v, v\rangle \ge 0$, with $0$ occurring only if $v = \mathbf 0$.
  2. Symmetry or conjugate symmetry. $\langle u, v\rangle= \langle v, u\rangle$, for the real case, and $\overline{\langle u, v\rangle} = \langle v, u\rangle$, for the complex case.
  3. Homogeneity. $\langle cu, v\rangle = c\langle u, v\rangle$.
  4. Additivity. $\langle u + w, v\rangle = \langle u, v\rangle+ \langle w, v\rangle$.
A vector space with an inner product is called an inner product space. We point out that in the complex case, conjugate symmetry and homogeneity imply that $\langle u, cv\rangle = \bar c\langle u, v\rangle$, and we say that the inner product is "linear on the left." In quantum physics, where complex inner products are an integral part of the theory, the inner products are "linear on the right." There are two very important inequalities associated with inner products: Schwarz's inequality and the triangle inequality.
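Before turning to these inequalities, here is a quick numerical check (a sketch, assuming the standard inner product on $\mathbb C^n$ from the list of examples further below) of conjugate symmetry, homogeneity on the left, and the rule $\langle u, cv\rangle = \bar c\langle u,v\rangle$:

```python
import numpy as np

def inner(x, y):
    # Standard inner product on C^n, linear on the left: <x, y> = sum_j x_j * conj(y_j).
    return np.sum(x * np.conj(y))

rng = np.random.default_rng(0)
u = rng.normal(size=4) + 1j * rng.normal(size=4)
v = rng.normal(size=4) + 1j * rng.normal(size=4)
c = 2.0 - 3.0j

print(np.isclose(np.conj(inner(u, v)), inner(v, u)))          # conjugate symmetry
print(np.isclose(inner(c * u, v), c * inner(u, v)))           # homogeneity (linear on the left)
print(np.isclose(inner(u, c * v), np.conj(c) * inner(u, v)))  # conjugate comes out of the right slot
```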

Theorem (Schwarz's inequality). Suppose that a vector space $V$ is equipped with an inner product, $\langle u,v\rangle$. Let $\|u\|:=\sqrt{\langle u,u\rangle}$. Then, \[ \big|\langle u,v\rangle\big| \le \|u\|\|v\|. \] Proof. If either $u$ or $v$ is $\mathbf 0$, the inequality is trivially true. Thus, we may suppose neither $u$ nor $v$ is $\mathbf 0$. We will suppose that $V$ is complex. The real case is easier. Let $t,\alpha\in \mathbb R$. From the four properties of the inner product, we can show that \[ 0\le \langle u+te^{i\alpha} v, u+ te^{i\alpha} v\rangle = \|u\|^2+ t\,\overline{e^{-i\alpha} \langle u,v\rangle} + te^{-i\alpha} \langle u,v\rangle + t^2\|v\|^2. \] Using the polar form of a complex number, we can write $\langle u,v\rangle = \big|\langle u,v\rangle\big|e^{i\theta}$. Choose $\alpha = \theta$. Then the previous inequality becomes \[ p(t):=\|u\|^2+2t\big|\langle u,v\rangle\big| + t^2\|v\|^2\ge 0. \] Because $p(t)$ doesn't go negative, it either has two complex roots or a double real root. In either case, the discriminant for $p$ satisfies $4\big|\langle u,v\rangle\big|^2 - 4\|u\|^2\|v\|^2\le 0$. Schwarz's inequality follows immediately from this. $\square$
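Schwarz's inequality is easy to test numerically. The sketch below (again assuming the standard inner product on $\mathbb C^n$) draws random complex vectors and checks that $|\langle u,v\rangle|\le\|u\|\|v\|$:

```python
import numpy as np

def inner(x, y):
    # Standard inner product on C^n: <x, y> = sum_j x_j * conj(y_j).
    return np.sum(x * np.conj(y))

rng = np.random.default_rng(1)
for _ in range(1000):
    u = rng.normal(size=5) + 1j * rng.normal(size=5)
    v = rng.normal(size=5) + 1j * rng.normal(size=5)
    lhs = abs(inner(u, v))
    rhs = np.sqrt(inner(u, u).real) * np.sqrt(inner(v, v).real)
    assert lhs <= rhs + 1e-12   # Schwarz's inequality, up to roundoff
```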

Corollary. Equality holds in Schwarz's inequality if and only if $\{u,v\}$ is linearly dependent.

Proof. Exercise.

Theorem (The triangle inequality). Suppose that a vector space $V$ is equipped with an inner product, $\langle u,v\rangle$. Let $\|u\|:=\sqrt{\langle u,u\rangle}$. Then, \[ \big| \|u\|-\|v\|\big| \le \|u+v\|\le \|u\|+\|v\|. \] Proof. Recall that in the proof of Schwarz's inequality, we used \[ \|u+te^{i\alpha} v\|^2=\|u\|^2+ t\,\overline{e^{-i\alpha} \langle u,v\rangle} + te^{-i\alpha} \langle u,v\rangle + t^2\|v\|^2. \] If we set $t=1$, $\alpha =0$, use the identity $z+\bar z=2\text{Re}(z)$, the inequality $\text{Re}(z)\le |z|$ and Schwarz's inequality, we get \[ \|u+v\|^2 \le \|u\|^2 + 2\big|\langle u,v\rangle\big|+\|v\|^2\le \big( \|u\|+\|v\| \big)^2. \] Taking square roots gives the right side of the triangle inequality. Similarly, using $\text{Re}(z)\ge -|z|$, we have \[ \|u+v\|^2 \ge \|u\|^2 - 2\big|\langle u,v\rangle\big|+\|v\|^2\ge \big( \|u\|-\|v\| \big)^2. \] Again taking square roots yields the left side. $\square$

The triangle inequality is one of the essential properties of length. The sum of the lengths of two sides of a (nondegenerate) triangle is greater than the length of the third side. In a real vector space, we can also define the angle between two vectors. Suppose that neither $u$ nor $v$ is $\mathbf 0$. Schwarz's inequality implies that \[ -1 \le \frac{\langle u,v\rangle}{\|u\|\|v\|}\le 1. \] Thus we may define the angle between $u$ and $v$ to be \[ \theta(u,v) = \arccos\bigg(\frac{\langle u,v\rangle}{\|u\|\|v\|}\bigg). \]

When $\langle u,v\rangle = 0$, the vectors are orthogonal (perpendicular) to each other. We will say this is true even if one or both of $u$ and $v$ are $\mathbf 0$. In the complex case, the concept of angle between two vectors is not so important, except when $\langle u,v\rangle = 0$. When this happens we will also say that $u$ and $v$ are orthogonal.

Standard inner products. Here is a list of a few standard inner product spaces.

  1. $\mathbb R^n$ —   $\langle x,y\rangle = \sum_{j=1}^n x_jy_j = y^Tx$.
  2. $\mathbb C^n$ —   $\langle x,y\rangle = \sum_{j=1}^n x_j\overline{y_j} = y^\ast x$.
  3. Real $L^2[a,b]$ —   $\langle f,g\rangle = \int_a^b f(x)g(x)dx$.
  4. Complex $L^2[a,b]$ —   $\langle f,g\rangle = \int_a^b f(x)\overline{g(x)}dx$.
  5. $2\pi$-periodic $L^2$ functions —   $\langle f,g\rangle = \int_0^{2\pi} f(x)\overline{g(x)}dx$.
  6. Weighted $L^2$ inner products (real) —   $\langle f,g\rangle = \int_a^b f(x)g(x)w(x)dx$, where $w(x)>0$.
  7. Real Sobolev space for differentiable functions —   $\langle f,g\rangle = \int_a^b \big(f(x)g(x)+ f'(x)g'(x)\big)dx$.
  8. Let $A$ be an $n\times n$ matrix with real entries. Suppose that $A$ is selfadjoint and that $x^TAx>0$, unless $x=\mathbf 0$ —   $\langle x,y\rangle = y^TAx$.
  9. Bi-infinite complex sequences, $\ell^2$ —   $\langle x,y\rangle = \sum_{n=-\infty}^\infty x_n \overline{y_n}$.
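To make a couple of these concrete, here is a small sketch (the functions, interval, and weight are arbitrary choices for illustration) that approximates the real $L^2[a,b]$ and weighted $L^2$ inner products with the trapezoid rule:

```python
import numpy as np

a, b = 0.0, 1.0
x = np.linspace(a, b, 2001)     # fine grid on [a, b]
f = np.sin(np.pi * x)           # example functions
g = x**2
w = 1.0 + x                     # example weight, w(x) > 0 on [a, b]

l2_inner = np.trapz(f * g, x)             # <f, g> = int_a^b f(x) g(x) dx
weighted_inner = np.trapz(f * g * w, x)   # <f, g> = int_a^b f(x) g(x) w(x) dx

print(l2_inner, weighted_inner)
```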
Example. Let $f,g$ be real-valued functions in $C[0,1]$ and define $\langle f,g\rangle = \int_0^1 f(x)g(x)dx$. Show that $\langle f,g\rangle$ is a real inner product for $C[0,1]$.

Solution. Symmetry, homogeneity and additivity are all simple consequences of the properties of the integral. Thus, we only need to show positivity. The definition of the Riemann integral implies that $\langle f,f\rangle = \int_0^1 f^2(x)dx\ge 0$, so what remains is showing that the only function $f\in C[0,1]$ for which $\langle f,f\rangle=0$ is $f\equiv 0$.

Suppose this is false. Then there is an $f\in C[0,1]$ such that $\int_0^1 f^2(x)dx=0$ and there is also an $x_0\in [0,1]$ for which $f(x_0) \ne 0$. Let $F=f^2$. Products of continuous functions are continuous, so $F\in C[0,1]$. Also, $F(x_0) = (f(x_0))^2>0$. Using a continuity argument, one can show that there is a closed interval $[a,b]\subseteq [0,1]$ that contains $x_0$ and on which $F(x)\ge \frac12 F(x_0)$. (Exercise.) Consequently, \[ \int_0^1 f^2(x)dx=\int_0^1 F(x)dx \ge \int_a^b F(x)dx \ge \frac12 F(x_0)(b-a)>0, \] which contradicts the assumption that $\int_0^1 f^2(x)dx=0$. Consequently, positivity holds. $\square$

Orthogonality

We will begin with a few definitions. In an inner product space $V$, we say that a (possibly infinite) set of vectors $S= \{v_1,\ldots,v_n, \ldots\}$ is orthogonal if and only if (i) none of the vectors are $\mathbf 0$ and (ii) $\langle v_j,v_k\rangle =0$ for all $j\ne k$. Part (i) excludes $\mathbf 0$ from the set. We do this to avoid having to write the phrase "orthogonal set of nonzero vectors." However, be aware that some authors do allow for including $\mathbf 0$. It is easy to see that an orthogonal set of vectors is linearly independent. We will frequently use normalized sets of orthogonal vectors. An orthogonal set is termed orthonormal (o.n.) if all of the vectors in it have unit length; that is, $\langle v_j,v_k\rangle =\delta_{j,k}$. We say that two subspaces of $V$, $U$ and $W$, are orthogonal if and only if all of the vectors in $U$ are orthogonal to all of the vectors in $W$. When this happens we write $U\perp W$. Finally, we define the orthogonal complement of $U$ in $V$ to be $U^\perp := \{v\in V\colon \langle v,u\rangle = 0 \ \forall\ u\in U\}$.
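For example, in the space of $2\pi$-periodic $L^2$ functions (item 5 in the list above), the set $\{1, \cos x, \sin x\}$ is orthogonal. A quick numerical check (a sketch using the trapezoid rule on a fine grid) computes all of the pairwise inner products:

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 4001)
funcs = [np.ones_like(x), np.cos(x), np.sin(x)]   # 1, cos x, sin x

# Matrix of inner products <f, g> = int_0^{2 pi} f(x) g(x) dx; off-diagonal entries ~ 0.
G = np.array([[np.trapz(f * g, x) for g in funcs] for f in funcs])
print(np.round(G, 6))
```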

Minimization problems. A common way of fitting data, either discrete or continuous, is least-squares minimization. The familiar straight line fit to a set of data is a good example of this technique and we will discuss it briefly. Suppose that we have collected data $\{y_j\in \mathbb R,\ j=1,\ldots, n\}$ at times $\{t_1,\ldots,t_n\}$. To get a good straight line $y(t)=a+bt$ that fits the data, we choose the intercept and slope to minimize the sum of the squares of $y_j-y(t_j)= y_j - a -bt_j$. Specifically, we will minimize over all $a$, $b$ the quantity $D^2 = \sum_{j=1}^n(y_j- a -bt_j)^2$. We can put this problem in terms of $\mathbb R^n$. Let $\mathbf y= [y_1\ y_2 \ \cdots \ y_n]^T$, $\mathbf 1= [1\ 1\ \cdots \ 1]^T$, and $\mathbf t = [t_1\ t_2 \ \cdots \ t_n]^T$. In this notation, we have $D^2 = \|\mathbf y - a\mathbf 1 - b \mathbf t\|^2$. Next, let $U = \text{span}\{\mathbf 1,\mathbf t \}=\{a\mathbf 1 + b\mathbf t\colon a,b\in \mathbb R\}$. We can thus recast the problem in its final form: Find $\mathbf p\in U$ such that \[ \min_{a,b} D = \|\mathbf y - \mathbf p\|=\min_{\mathbf u\in U}\|\mathbf{y}-\mathbf u\|. \] As you have shown in exercises 1.3 and 1.4, this problem has a unique solution that can be found from the normal equations, which in matrix form give \[ \left(\begin{array}{c} a\\ b\end{array}\right) = \left(\begin{array}{cc} \mathbf 1^T\mathbf 1 & \mathbf 1^T \mathbf t\\ \mathbf t^T\mathbf 1 & \mathbf t^T \mathbf t \end{array}\right)^{-1} \left(\begin{array}{c} \mathbf 1^T\mathbf y\\ \mathbf t^T\mathbf y\end{array}\right). \] Stated another way, the solution is given by $\mathbf p = P\mathbf y$, where $P$ is the orthogonal projection onto $U$.
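Here is a minimal sketch of the straight-line fit via the normal equations (the data below are made-up values for illustration). Note that the residual $\mathbf y-\mathbf p$ is orthogonal to both $\mathbf 1$ and $\mathbf t$, and hence to $U$:

```python
import numpy as np

# Made-up data: times t_j and observations y_j.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

ones = np.ones_like(t)
A = np.column_stack([ones, t])       # columns span U = span{1, t}

# Normal equations: (A^T A) [a, b]^T = A^T y.
a_coef, b_coef = np.linalg.solve(A.T @ A, A.T @ y)
p = a_coef * ones + b_coef * t       # p = P y, the projection of y onto U

print(a_coef, b_coef)
print(np.dot(y - p, ones), np.dot(y - p, t))   # both ~ 0: the residual is orthogonal to U
```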

The general "least squares" minimization problem is this: Given an inner product space $V$, a vector $v\in V$, and a subspace $U\subset V$, find $p\in U$ such that $\|v - p\|=\min_{u\in U}\|v - u\|$. By exercises 1.3 and 1.4, a solution $p$ exists for every $v\in V$ if and only if there is a vector $p\in U$ such that $v-p\in U^\perp$. When this happens, $p$ is unique and $p=Pv$, where as before $P$ is the orthogonal projection onto $U$. In particular, when $U$ is finite dimensional, this is always true. Furthermore, if $B=\{u_1,\ldots,u_n\}$ is an orthonormal (o.n.) basis for $U$, then by exercise 1.4(c), the formula for $Pv$ is especially simple: \[ Pv = \sum_{j=1}^n \langle v,u_j\rangle u_j. \]

Gram-Schmidt process. Unlike the situation for fitting data, with an o.n. basis we do not need to invert a Gram matrix to find the projection. This raises two questions: Does an o.n. basis exist for an inner product space and, if so, how can it be found?

We will deal with a finite dimensional space having a basis $B=\{v_1,\ldots,v_n\}$. Our aim will be to produce an orthogonal basis for the space. This can be converted to an o.n. basis by simply dividing each vector by its length. To begin, define the spaces $U_k=\text{span}\{v_1,\ldots,v_k\}$, $k=1,\ldots,n$. Let $w_1=v_1$. Next, let $w_2=v_2 - P_1v_2=v_2 - \frac{\langle v_2,w_1\rangle}{\|w_1\|^2}w_1$, where $P_1$ is the orthogonal projection onto $U_1$. An easy computation shows that $w_2$ is orthogonal to $w_1$ and, consequently, $\{w_1,w_2\}$ is an orthogonal basis for $U_2$. Similarly, we let $w_3 = v_3 - P_2v_3 = v_3 - \frac{\langle v_3,w_1\rangle}{\|w_1\|^2}w_1 -\frac{\langle v_3,w_2\rangle}{\|w_2\|^2}w_2$. As before, $w_3$ is orthogonal to $w_1,w_2$ and $\{w_1,w_2, w_3\}$ is an orthogonal basis for $U_3$. We can continue in this way. Let $w_k = v_k - P_{k-1}v_k$. It follows that $w_k$ is orthogonal to $U_{k-1}$, so $U_k$ has the orthogonal basis $\{w_1,\ldots, w_k\}$. Eventually, we obtain an orthogonal basis $\{w_1,\ldots,w_n\}$ for $V$.
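A direct transcription of the process into code might look like the following sketch (for $\mathbb R^n$ with the standard inner product, and assuming the input columns are linearly independent). After normalizing, the resulting o.n. basis can be used in the projection formula $Pv=\sum_j\langle v,u_j\rangle u_j$ from above:

```python
import numpy as np

def gram_schmidt(V):
    """Return orthogonal columns w_1,...,w_n spanning the same spaces U_k as the columns of V."""
    W = []
    for v in V.T:                        # loop over the columns v_1, ..., v_n
        w = v.astype(float)
        for u in W:                      # subtract P_{k-1} v_k, the projection onto U_{k-1}
            w = w - (np.dot(v, u) / np.dot(u, u)) * u
        W.append(w)
    return np.column_stack(W)

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
W = gram_schmidt(V)
Q = W / np.linalg.norm(W, axis=0)        # normalize each column to get an o.n. basis
print(np.round(W.T @ W, 10))             # diagonal: the w_k are mutually orthogonal

# Projection onto U_2 = span{v_1, v_2} via Pv = <v,q_1> q_1 + <v,q_2> q_2.
v = np.array([1.0, 2.0, 3.0])
Pv = sum(np.dot(v, q) * q for q in Q[:, :2].T)
print(np.dot(v - Pv, V[:, 0]), np.dot(v - Pv, V[:, 1]))   # both ~ 0: v - Pv is in U_2^perp
```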

QR-factorization. One important application of Gram-Schmidt is the QR factorization of a matrix. Let $A$ be an $n\times n$ matrix, which we will assume to be real and to have linearly independent columns $\{\mathbf v_1,\ldots, \mathbf v_n\}$. Then there exists a matrix $Q$ whose columns form an o.n. basis for $\mathbb R^n$ — i.e., an orthogonal matrix — and an upper triangular matrix $R$, with positive diagonal entries, such that $A=QR$.

To prove this, note that we are dealing with three different bases of $\mathbb R^n$: (1) the standard basis $E=\{\mathbf e_1,\ldots,\mathbf e_n\}$; (2) the basis of columns $F=\{\mathbf v_1,\ldots, \mathbf v_n\}$; and (3), the basis of orthonormal vectors $G=\{\mathbf q_1,\ldots,\mathbf q_n\}$ obtained via the Gram-Schmidt process. The matrix $A$ is actually the transition matrix from $F$ to $E$ coordinates, because it has the form $A=[\text{E-coordinates of the F-basis}]$. Let $Q:=[\mathbf q_1\ \cdots \ \mathbf q_n]$. As is the case with $A$, $Q$ is the transition matrix from $G$ to $E$ coordinates. Assuming $A=QR$, we would have $R=Q^{-1}A$. Since $A$ takes $F$ to $E$ and $Q^{-1}$ takes $E$ to $G$, the matrix $R$ is the transition matrix that takes $F$ to $G$. Thus $R=[\text{G-coordinates of the F-basis}]$.

We need the columns of $R$. Each will have the form $[\mathbf v_k]_G$. Finding these requires a careful look at the Gram-Schmidt process itself. The $\mathbf q_k$'s are obtained from the $\mathbf v_k$'s by first getting the orthogonal set of $\mathbf w_k$'s, and then normalizing. Gram-Schmidt gives us the $\mathbf w_k$'s via the formula \[ \mathbf w_k = \mathbf v_k - \sum_{j=1}^{k-1}\frac{\langle \mathbf v_k, \mathbf w_j\rangle}{\|\mathbf w_j\|^2}\mathbf w_j. \] This formula can be rewritten to give us $\mathbf v_k$ in terms of the $\mathbf w_k$'s: \[ \mathbf v_k = \mathbf w_k + \sum_{j=1}^{k-1}\frac{\langle \mathbf v_k, \mathbf w_j\rangle}{\|\mathbf w_j\|^2}\mathbf w_j. \] We can now normalize the set using $\mathbf q_j = \mathbf w_j/\| \mathbf w_j\|$. Doing so yields \[ \mathbf v_k = \|\mathbf w_k\|\mathbf q_k + \sum_{j=1}^{k-1}\langle \mathbf v_k, \mathbf q_j\rangle\mathbf q_j. \] From this, it follows that the entries of the transition matrix are $R_{j,k}=\langle \mathbf v_k, \mathbf q_j \rangle $, if $j < k$; $\ R_{k,k}=\|\mathbf w_k\|$ when $j=k$; and $ R_{j,k} = 0$ for $j>k$. This matrix is upper triangular. Specifically, it has the form \[ R = \left( \begin{array}{cccc} R_{11} & R_{12} & \cdots & R_{1n} \\ 0 & R_{22} & \cdots & R_{2n}\\ \vdots & \vdots &\ddots & \vdots \\ 0 & 0 & \cdots &R_{nn} \end{array} \right)\ . \]
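The construction above translates directly into code. The sketch below (classical Gram-Schmidt, assuming a real matrix $A$ with linearly independent columns) builds $Q$ and $R$ and checks that $A=QR$; in practice one would call a library routine such as numpy.linalg.qr, which uses the more numerically stable Householder approach.

```python
import numpy as np

def qr_gram_schmidt(A):
    """QR factorization of a real matrix A with linearly independent columns, via Gram-Schmidt."""
    n = A.shape[1]
    Q = np.zeros((A.shape[0], n))
    R = np.zeros((n, n))
    for k in range(n):
        w = A[:, k].astype(float)
        for j in range(k):
            R[j, k] = np.dot(A[:, k], Q[:, j])   # R_jk = <v_k, q_j> for j < k
            w = w - R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(w)              # R_kk = ||w_k|| > 0
        Q[:, k] = w / R[k, k]                    # q_k = w_k / ||w_k||
    return Q, R

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q, R = qr_gram_schmidt(A)
print(np.allclose(A, Q @ R), np.allclose(Q.T @ Q, np.eye(3)))   # True True
```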

Normed linear spaces

Inner product spaces have a geometry that involves both angles and distance. In many cases, it's useful to deal with a geometry involving only distance. This leads us to define the concept of a norm.

Definition. Let $V$ be a vector space. A mapping $ \| \cdot \| : V \to [0,\infty)$ is said to be a norm on $V$ if it satisfies these properties:

  1. Positivity. $\|v\|\ge 0$, with $0$ occurring only if $v = \mathbf 0$.
  2. Positive homogeneity. $\| cv\|= |c|\,\| v\|$.
  3. Triangle inequality. $\|u+v\| \le \|u\|+ \|v\|$.
There are many examples of normed linear spaces. For instance, on $\mathbb R^n$ one has the $\ell^p$ norms, $\|x\|_p=\big(\sum_{j=1}^n |x_j|^p\big)^{1/p}$ for $1\le p<\infty$, and the max norm, $\|x\|_\infty=\max_j|x_j|$; only the $p=2$ norm comes from an inner product.

Norms on dual spaces. For the moment, we will deal only with finite dimensional spaces. Infinite dimensional spaces are more delicate and will be treated later. Recall that the dual space of a vector space $V$ is the set of all linear functionals on $V$ and it is denoted by $V^\ast$. If $V$ has a norm $\| \cdot\|_V$, then one can show that the following quantity defines a norm for $V^*$. Let $\varphi\in V^*$. Then we define \[ \| \varphi\|_{V^*} = \sup_{\|v\|_V=1}\big|\varphi(v)\big|. \] Equivalently, $\| \varphi\|_{V^*}$ is the smallest constant $K$ such that $|\varphi(v)| \le K\|v\|_V$ for all $v\in V$. Verifying that $\| \cdot \|_{V^*}$ is a norm is left as an exercise. We close this section by mentioning two facts: first, that $(\ell ^p)^* = \ell^q$, where $1/p+1/q = 1$ and $1\le p< \infty$; and second, that for all finite dimensional spaces and many (but not all!) infinite dimensional spaces, $V^{**}=V$.
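As a concrete finite-dimensional illustration (a minimal sketch, not a proof): take $V=\mathbb R^3$ with the Euclidean norm and the functional $\varphi(x)=\langle x,a\rangle$ for a fixed vector $a$. Schwarz's inequality gives $|\varphi(x)|\le\|a\|_2\|x\|_2$, so $\|\varphi\|_{V^*}\le\|a\|_2$, and the bound is attained at $x=a/\|a\|_2$; hence $\|\varphi\|_{V^*}=\|a\|_2$, the $p=q=2$ case of the duality mentioned above.

```python
import numpy as np

a = np.array([2.0, -5.0, 1.0])            # defines phi(x) = <x, a> on R^3 (arbitrary example)

x_star = a / np.linalg.norm(a)            # unit vector where the supremum is attained
dual_norm_lower = abs(np.dot(x_star, a))  # |phi(x_star)| <= ||phi||_* by definition

# Schwarz gives the matching upper bound, so the dual norm equals ||a||_2.
print(dual_norm_lower, np.linalg.norm(a))   # both print sqrt(30) = 5.477...
```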


Updated 9/8/19 (fjn).