Statistics and Linear Algebra I: Covariance

Given real-valued random variables X_{1},\cdots,X_{n} with finite second moments on some probability space (\Omega,\mathcal{F},\mathbb{P}), we can form a random column vector \mathbf{X} := \begin{bmatrix} X_{1}\\ \vdots \\ X_{n}\end{bmatrix} and define an n \times n matrix \mathbf{\Lambda} by

\displaystyle\Lambda_{ij} := \mathbb{E}\left[\left(X_{i}-\mathbb{E}[X_{i}]\right)\left(X_{j}-\mathbb{E}[X_{j}]\right)\right], \indent 1 \leq i,j \leq n

\mathbf{\Lambda} is called the variance of the random vector \mathbf{X}, and we write \text{Var}(\mathbf{X}) := \mathbf{\Lambda}. Some authors alternatively call \mathbf{\Lambda} the covariance matrix of \mathbf{X} and write \text{Cov}(\mathbf{X})=\mathbf{\Lambda}. \text{Var}(\mathbf{X}) is evidently symmetric, and in fact, it is positive semidefinite. Indeed, writing \mu_{j} := \mathbb{E}[X_{j}], for any column vector \mathbf{x} = (x_{1},\cdots,x_{n})^{T} \in \mathbb{R}^{n},

\begin{array}{lcl} \displaystyle\langle{\text{Var}(\mathbf{X})\mathbf{x},\mathbf{x}}\rangle&=&\displaystyle\begin{bmatrix}\sum_{j=1}^{n}\mathbb{E}\left[(X_{1}-\mu_{1})(X_{j}-\mu_{j})\right]x_{j}\cdots\sum_{j=1}^{n}\mathbb{E}\left[(X_{n}-\mu_{n})(X_{j}-\mu_{j})\right]x_{j}\end{bmatrix}\begin{bmatrix}x_{1}\\ \vdots \\ x_{n}\end{bmatrix}\\&=&\displaystyle\mathbb{E}\left[\left(x_{1}(X_{1}-\mu_{1})+\cdots+x_{n}(X_{n}-\mu_{n})\right)\left(x_{1}(X_{1}-\mu_{1})+\cdots+x_{n}(X_{n}-\mu_{n})\right)\right]\\&=&\displaystyle\mathbb{E}\left[\left(\sum_{j=1}^{n}x_{j}(X_{j}-\mu_{j})\right)^{2}\right]\\&\geq&\displaystyle 0\end{array}
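As a quick numerical sanity check of the identity above, here is a minimal Python sketch using only the standard library. It treats a small made-up data set as a finite probability space with uniform weights, so that expectations are plain averages. The data, the seed, and the helper names `covariance_matrix` and `quadratic_form` are illustrative assumptions, not part of the argument.

```python
import random

def covariance_matrix(samples):
    """Mean vector and covariance matrix of the uniform distribution on `samples`.

    `samples` is a list of outcomes, each a list of n real numbers.
    """
    m, n = len(samples), len(samples[0])
    mu = [sum(s[i] for s in samples) / m for i in range(n)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / m
            for j in range(n)] for i in range(n)]
    return mu, cov

def quadratic_form(matrix, x):
    """Return <matrix x, x> for a square real matrix and a real vector x."""
    n = len(x)
    return sum(matrix[i][j] * x[j] * x[i] for i in range(n) for j in range(n))

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [[rng.gauss(0, 1), rng.uniform(-1, 1), rng.random()] for _ in range(200)]
mu, cov = covariance_matrix(samples)

x = [1.5, -2.0, 0.25]
lhs = quadratic_form(cov, x)
# Right-hand side: E[(sum_j x_j (X_j - mu_j))^2] under the empirical distribution.
rhs = sum(sum(xj * (sj - mj) for xj, sj, mj in zip(x, s, mu)) ** 2
          for s in samples) / len(samples)
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point rounding, and both are nonnegative because the right-hand side is an average of squares.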

We can ask whether every real symmetric, positive semidefinite n\times n matrix is the covariance matrix of random variables X_{1},\cdots,X_{n}. The answer is yes, but we will need a few results from linear algebra before we can give a full proof of this fact.

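Lemma 1. Suppose V and W are real or complex inner product spaces with orthonormal bases \left\{e_{1},\cdots,e_{n}\right\} and \left\{f_{1},\cdots,f_{m}\right\}, respectively. Then for any linear operator T: V \rightarrow W, the matrix of the adjoint T^{*}: W \rightarrow V with respect to \left\{f_{1},\cdots,f_{m}\right\} and \left\{e_{1},\cdots,e_{n}\right\} is the conjugate transpose of the matrix of T with respect to \left\{e_{1},\cdots,e_{n}\right\} and \left\{f_{1},\cdots,f_{m}\right\}.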


Proof. Let A=(a_{ij}) be the m\times n matrix of T with respect to the given bases, so that Te_{k}=a_{1k}f_{1}+\cdots+a_{mk}f_{m} for 1\leq k\leq n. For all 1\leq j \leq m, we can uniquely write

\displaystyle T^{*}f_{j}= b_{1j}e_{1}+\cdots+b_{nj}e_{n}

for scalars b_{1j},\cdots,b_{nj}\in\mathbb{F}, where \mathbb{F} denotes \mathbb{R} or \mathbb{C}. By definition of the Hilbert adjoint,

\begin{array}{lcl}\displaystyle\overline{a_{jk}}=\overline{\langle{Te_{k},f_{j}}\rangle}=\langle{f_{j},Te_{k}}\rangle=\langle{T^{*}f_{j},e_{k}}\rangle&=&\displaystyle\langle{b_{1j}e_{1}+\cdots+b_{nj}e_{n},e_{k}}\rangle\\&=&\displaystyle\sum_{i=1}^{n}b_{ij}\langle{e_{i},e_{k}}\rangle\\&=&\displaystyle b_{kj}\end{array}

where the last equality follows from orthonormality. Hence b_{kj}=\overline{a_{jk}} for all k and j, which says that the n\times m matrix (b_{ij}) of T^{*} is the conjugate transpose of A. \Box
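To see the lemma at work in coordinates, the following Python sketch (standard library only) checks the defining identity \langle{Av,w}\rangle=\langle{v,A^{*}w}\rangle for one concrete complex matrix, with A^{*} computed as the conjugate transpose. The matrix, the vectors, and the helper names are made up for illustration, and the inner product is taken to be linear in the first argument.

```python
def conjugate_transpose(a):
    """Conjugate transpose of an m x n matrix given as a list of rows."""
    rows, cols = len(a), len(a[0])
    return [[a[i][j].conjugate() for i in range(rows)] for j in range(cols)]

def apply(a, v):
    """Matrix-vector product."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def inner(u, w):
    """Standard inner product on C^k, linear in the first argument."""
    return sum(u_i * w_i.conjugate() for u_i, w_i in zip(u, w))

# A 2 x 3 complex matrix: an operator C^3 -> C^2 in the standard bases.
A = [[1 + 2j, 0, 3j],
     [2, 1 - 1j, -1]]
A_star = conjugate_transpose(A)

v = [1 - 1j, 2j, 3]      # vector in C^3
w = [0.5j, -2 + 1j]      # vector in C^2

left = inner(apply(A, v), w)        # <Av, w>
right = inner(v, apply(A_star, w))  # <v, A* w>
print(left, right)
```

The standard bases are orthonormal, so the lemma applies and the two printed values should coincide up to rounding.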

The preceding lemma allows us to reduce computation of adjoint operators to taking the conjugate transpose of matrices. Note that the statement of the lemma may fail if the bases are not orthonormal. Indeed, consider the linear operator T: \mathbb{R}^{2} \rightarrow\mathbb{R}^{2} defined by

\displaystyle T(1,0)=(1,1),\indent T(0,1)=(1,0)

The matrix of T with respect to the standard basis is \begin{bmatrix} 1 & 1\\ 1& 0\end{bmatrix}, which is symmetric, implying that T=T^{*}. But the matrix of T with respect to the (non-orthonormal) basis \left\{(1,1),(0,1)\right\} is \begin{bmatrix} 2& 1\\ -1&-1\end{bmatrix}, since T(1,1)=(2,1)=2(1,1)-(0,1) and T(0,1)=(1,0)=(1,1)-(0,1). This matrix is not symmetric, even though T is self-adjoint.
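The change of basis in this counterexample can be recomputed mechanically. The sketch below (standard library only, with hypothetical helper names `matmul` and `inverse_2x2`) forms P^{-1}AP, where the columns of P are the basis vectors (1,1) and (0,1) written in standard coordinates.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Exact inverse of an invertible 2 x 2 matrix, using rational arithmetic."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 1], [1, 0]]   # matrix of T in the standard basis (columns T(1,0), T(0,1))
P = [[1, 0], [1, 1]]   # columns are the new basis vectors (1,1) and (0,1)

# Matrix of T with respect to the basis {(1,1), (0,1)}.
B = matmul(inverse_2x2(P), matmul(A, P))
print([[int(entry) for entry in row] for row in B])
print(B[0][1] == B[1][0])  # symmetric only if the off-diagonal entries agree
```

The off-diagonal entries of the resulting matrix differ, so symmetry of the matrix is a property of the operator together with an orthonormal basis, not of the operator alone.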

To motivate the next lemma, suppose we have a real polynomial p(t) := t^{2}+\alpha t + \beta. The famous quadratic formula tells us that p has two non-real, complex conjugate roots if the discriminant \alpha^{2}-4\beta <0, so p has no real linear factor. If we know that the operator T can only have real eigenvalues, then we expect p(T) to be injective, hence invertible in finite dimensions. Since self-adjoint operators necessarily have real eigenvalues, we obtain our next result.

Lemma 2. Suppose a linear operator T is self-adjoint. If \alpha,\beta \in \mathbb{R} satisfy \alpha^{2}<4\beta, then the operator T^{2}+\alpha T+ \beta I is invertible.
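The lemma is stated here without proof. One standard argument, sketched under the assumption that T acts on a finite-dimensional real inner product space V (so that an injective operator is invertible), combines self-adjointness with the Cauchy-Schwarz inequality. For any nonzero v \in V:

```latex
\begin{array}{lcl}
\displaystyle\langle{(T^{2}+\alpha T+\beta I)v,v}\rangle
&=&\displaystyle\langle{Tv,Tv}\rangle+\alpha\langle{Tv,v}\rangle+\beta\langle{v,v}\rangle\\
&\geq&\displaystyle\|Tv\|^{2}-|\alpha|\,\|Tv\|\,\|v\|+\beta\|v\|^{2}\\
&=&\displaystyle\left(\|Tv\|-\frac{|\alpha|\,\|v\|}{2}\right)^{2}+\left(\beta-\frac{\alpha^{2}}{4}\right)\|v\|^{2}\\
&>&\displaystyle 0
\end{array}
```

The first line uses \langle{T^{2}v,v}\rangle=\langle{Tv,Tv}\rangle, which holds because T=T^{*}, and the strict inequality uses \alpha^{2}<4\beta together with v\neq 0. In particular (T^{2}+\alpha T+\beta I)v\neq 0, so the operator is injective.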

We will now use this lemma to show that self-adjointness is a sufficient, although obviously not necessary, condition for a real operator to have an eigenvalue.

Lemma 3. Every self-adjoint operator T: V \rightarrow V on a nonzero finite-dimensional real inner product space V has an eigenvalue.

Proof. Set n:=\dim(V) and fix a nonzero v \in V. The n+1 vectors v,Tv,\cdots,T^{n}v are linearly dependent. Hence, there exist real numbers a_{0},\cdots,a_{n}, not all zero, such that

\displaystyle a_{0}v+a_{1}Tv+\cdots+a_{n}T^{n}v=0


Define a real polynomial p by p(x):=a_{0}+a_{1}x+\cdots+a_{n}x^{n}. It is a consequence of the fundamental theorem of algebra and the quadratic formula that we can factor p as

\displaystyle p(x)=c\left(x^{2}+\alpha_{1}x+\beta_{1}\right)\cdots\left(x^{2}+\alpha_{M}x+\beta_{M}\right)\left(x-\lambda_{1}\right)\cdots\left(x-\lambda_{m}\right)

where c \in \mathbb{R}\setminus\left\{0\right\}, \alpha_{j},\beta_{j}\in\mathbb{R} with \alpha_{j}^{2} < 4\beta_{j}, \lambda_{j}\in\mathbb{R}, and m\geq 1. The last assertion is, perhaps, not so obvious. If m=0, then p would be a nonzero constant times a product of quadratic polynomials satisfying the coefficient condition of Lemma 2, so p(T) would be invertible, contradicting that p(T)v=0 with v\neq 0. Each operator T^{2}+\alpha_{j}T+\beta_{j}I is invertible by the preceding lemma, so setting

\displaystyle w:=c\left(T^{2}+\alpha_{1}T+\beta_{1}I\right)\cdots\left(T^{2}+\alpha_{M}T+\beta_{M}I\right)v\neq0

we obtain that (T-\lambda_{1}I)\cdots (T-\lambda_{m}I)w=0. Hence, T-\lambda_{j}I must not be injective for some j, which means that \lambda_{j} is an eigenvalue of T. \Box
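The proof can be traced by hand in the smallest interesting case. The Python sketch below (standard library only) takes one symmetric 2 \times 2 matrix and one vector, both made up for illustration, finds the linear dependence among v, Tv, T^{2}v, and checks that the roots of the resulting polynomial are eigenvalues. It assumes v and Tv are linearly independent, so the dependence can be normalized to have leading coefficient 1.

```python
import math

def apply(t, v):
    """Apply a 2 x 2 matrix (list of rows) to a vector."""
    return [t[0][0] * v[0] + t[0][1] * v[1], t[1][0] * v[0] + t[1][1] * v[1]]

T = [[2.0, 1.0], [1.0, 3.0]]  # symmetric, hence self-adjoint on R^2
v = [1.0, 0.0]                # chosen so that v and Tv are linearly independent

Tv = apply(T, v)
TTv = apply(T, Tv)

# Solve c0*v + c1*Tv = T^2 v by Cramer's rule, so that p(x) = x^2 - c1*x - c0
# satisfies p(T)v = 0.
det = v[0] * Tv[1] - v[1] * Tv[0]
c0 = (TTv[0] * Tv[1] - TTv[1] * Tv[0]) / det
c1 = (v[0] * TTv[1] - v[1] * TTv[0]) / det

# For this example the discriminant of p is positive, so p splits into real
# linear factors and no irreducible quadratic factor appears.
disc = c1 ** 2 + 4 * c0
roots = [(c1 - math.sqrt(disc)) / 2, (c1 + math.sqrt(disc)) / 2]

def det_shifted(t, lam):
    """det(T - lam*I); zero exactly when lam is an eigenvalue of T."""
    return (t[0][0] - lam) * (t[1][1] - lam) - t[0][1] * t[1][0]

print(c0, c1, roots, [det_shifted(T, lam) for lam in roots])
```

In this example p(x)=x^{2}-5x+5, and both roots make T-\lambda I singular. In general the proof only guarantees that at least one real root of p is an eigenvalue.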

We now have all the necessary lemmas to tackle the real spectral theorem.

Theorem 4. Suppose V is a (finite-dimensional) real inner-product space, and T: V \rightarrow V is a linear operator. Then T is self-adjoint if and only if V has an orthonormal basis consisting of eigenvectors of T.

Proof. Suppose V has an orthonormal basis \left\{e_{1},\cdots,e_{n}\right\} consisting of eigenvectors of T. Then the matrix of T with respect to this basis is diagonal, hence symmetric, which shows by Lemma 1 that T is self-adjoint.

Now suppose that T is self-adjoint. We prove that V has the desired orthonormal basis by induction on \dim(V). If \dim(V)=1, then we apply the preceding lemma and we’re done. Now suppose \dim(V)=n > 1, and we have proved the real spectral theorem for all real inner product spaces of dimension < n. By the preceding lemma, T has an eigenvalue \lambda with associated eigenvector u of norm 1. Let U denote the span of u, and consider its orthogonal complement U^{\perp}. I claim that U^{\perp} is T-invariant. Indeed, for any v \in U^{\perp},

\displaystyle\langle{Tv,u}\rangle=\langle{v,Tu}\rangle=\langle{v,\lambda u}\rangle=\lambda\langle{v,u}\rangle=0

Hence, we can define an operator S: U^{\perp}\rightarrow U^{\perp} by S:=T|_{U^{\perp}}. S is clearly self-adjoint and \dim(U^{\perp})=n-1, so by our induction hypothesis, U^{\perp} has an orthonormal basis \left\{e_{1},\cdots,e_{n-1}\right\} consisting of eigenvectors of S, hence of T. Adjoining e_{n}:=u to this set gives an orthonormal basis for V consisting of eigenvectors of T. \Box
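For a symmetric 2 \times 2 matrix the orthonormal eigenbasis promised by the theorem can be written down explicitly. The sketch below (standard library only) does this with a single plane rotation, often called a Jacobi rotation, rather than by following the inductive proof. The function names and the test matrix are illustrative assumptions.

```python
import math

def orthonormal_eigenbasis_2x2(a, b, c):
    """Eigenvalues and orthonormal eigenvectors of [[a, b], [b, c]].

    The angle theta is chosen so that rotating the coordinate axes by theta
    makes the off-diagonal entry of the matrix vanish.
    """
    theta = 0.5 * math.atan2(2 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    u1, u2 = (cs, sn), (-sn, cs)          # orthonormal by construction
    lam1 = a * cs * cs + 2 * b * cs * sn + c * sn * sn   # <T u1, u1>
    lam2 = a * sn * sn - 2 * b * cs * sn + c * cs * cs   # <T u2, u2>
    return (lam1, u1), (lam2, u2)

def residual(a, b, c, lam, u):
    """Largest entry of T u - lam u in absolute value."""
    tu = (a * u[0] + b * u[1], b * u[0] + c * u[1])
    return max(abs(tu[0] - lam * u[0]), abs(tu[1] - lam * u[1]))

(lam1, u1), (lam2, u2) = orthonormal_eigenbasis_2x2(2.0, 1.0, 3.0)
print(lam1, u1, residual(2.0, 1.0, 3.0, lam1, u1))
print(lam2, u2, residual(2.0, 1.0, 3.0, lam2, u2))
print(u1[0] * u2[0] + u1[1] * u2[1])  # inner product of the two eigenvectors
```

Both residuals and the final inner product should be zero up to rounding, which is exactly the statement that \{u_{1},u_{2}\} is an orthonormal basis of eigenvectors.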

Let T: V \rightarrow V be a self-adjoint linear operator on a finite-dimensional real inner product space V. Fix an orthonormal basis \left\{e_{1},\cdots,e_{n}\right\} for V. By the real spectral theorem, there exists an orthonormal basis of eigenvectors \left\{f_{1},\cdots,f_{n}\right\}. Let \varphi_{1}: \mathbb{R}^{n}\rightarrow V and \varphi_{2}: \mathbb{R}^{n}\rightarrow V be the coordinate isomorphisms mapping the standard basis vectors \epsilon_{j} to e_{j} and f_{j}, respectively. Set T_{1} := \varphi_{1}^{-1} \circ T\circ \varphi_{1} and T_{2} := \varphi_{2}^{-1}\circ T\circ \varphi_{2} to obtain operators \mathbb{R}^{n}\rightarrow\mathbb{R}^{n}. Then

\displaystyle T_{2}=\varphi_{2}^{-1}\circ T\circ\varphi_{2}=\varphi_{2}^{-1}\circ\varphi_{1}\circ(\varphi_{1}^{-1}\circ T\circ\varphi_{1})\circ\varphi_{1}^{-1}\circ\varphi_{2}=(\varphi_{2}^{-1}\circ \varphi_{1})\circ T_{1}\circ (\varphi_{1}^{-1}\circ\varphi_{2})

Set P:= \varphi_{1}^{-1}\circ\varphi_{2}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}. I claim that P is a unitary transformation. Indeed,

\begin{array}{lcl} \displaystyle\langle{P\epsilon_{i},P\epsilon_{j}}\rangle=\langle{\varphi_{1}^{-1}f_{i},\varphi_{1}^{-1}f_{j}}\rangle&=&\displaystyle\left\langle{\varphi_{1}^{-1}\left(\sum_{k=1}^{n}\langle{f_{i},e_{k}}\rangle e_{k}\right),\varphi_{1}^{-1}\left(\sum_{k=1}^{n}\langle{f_{j},e_{k}}\rangle e_{k}\right)}\right\rangle\\&=&\displaystyle\left\langle{\sum_{k=1}^{n}\langle{f_{i},e_{k}}\rangle\epsilon_{k},\sum_{k=1}^{n}\langle{f_{j},e_{k}}\rangle\epsilon_{k}}\right\rangle\\&=&\displaystyle\sum_{k=1}^{n}\langle{f_{i},e_{k}}\rangle\overline{\langle{f_{j},e_{k}}\rangle}\\&=&\displaystyle\sum_{k=1}^{n}\langle{f_{i},\langle{f_{j},e_{k}}\rangle e_{k}}\rangle\\&=&\displaystyle\left\langle{f_{i},\sum_{k=1}^{n}\langle{f_{j},e_{k}}\rangle e_{k}}\right\rangle\\&=&\displaystyle\langle{f_{i},f_{j}}\rangle\\&=&\displaystyle\langle{\epsilon_{i},\epsilon_{j}}\rangle\end{array}

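Thus P maps the standard basis to an orthonormal set, so P is unitary; in the real case this means P^{-1}=P^{T}. Moreover, T_{2}=P^{-1}\circ T_{1}\circ P is diagonal with respect to the standard basis, since each f_{j} is an eigenvector of T. Using this change-of-basis result, we see that for any symmetric real n \times n matrix \mathbf{A}, there exist an orthogonal matrix \mathbf{O} (i.e. one that defines a unitary operator on \mathbb{R}^{n}) and a diagonal matrix \mathbf{D} such that

\displaystyle \mathbf{A}=\mathbf{O}\mathbf{D}\mathbf{O}^{T}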


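Now suppose in addition that \mathbf{A} is positive semidefinite. Let \lambda_{1},\cdots,\lambda_{n} denote the diagonal entries of \mathbf{D}. These are the eigenvalues of \mathbf{A}, and each is nonnegative: if \mathbf{o}_{j} denotes the j-th column of \mathbf{O}, then \lambda_{j}=\langle{\mathbf{A}\mathbf{o}_{j},\mathbf{o}_{j}}\rangle\geq 0. Let X_{1},\cdots,X_{n} be independent mean-zero random variables on a probability space (\Omega,\mathcal{F},\mathbb{P}) with respective variances \lambda_{1},\cdots,\lambda_{n}; for instance, take X_{j}:=\sqrt{\lambda_{j}}Z_{j} with Z_{1},\cdots,Z_{n} i.i.d. standard normal. Define a random vector \mathbf{X} := (X_{1},\cdots,X_{n}), and set \mathbf{Y}=\mathbf{O}\mathbf{X}, where \mathbf{X} is regarded as a column vector. Then \text{Var}(\mathbf{X})=\mathbf{D} and \mathbb{E}[\mathbf{Y}]=\mathbf{0}, so

\displaystyle \text{Var}(\mathbf{Y})=\mathbb{E}\left[\mathbf{Y}\mathbf{Y}^{T}\right]=\mathbf{O}\,\mathbb{E}\left[\mathbf{X}\mathbf{X}^{T}\right]\mathbf{O}^{T}=\mathbf{O}\mathbf{D}\mathbf{O}^{T}=\mathbf{A}

Hence \mathbf{A} is the covariance matrix of the components Y_{1},\cdots,Y_{n} of \mathbf{Y}.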
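The construction can be tried out numerically in dimension 2. The following Python sketch (standard library only) diagonalizes one positive definite target matrix with a plane rotation, checks \mathbf{O}\mathbf{D}\mathbf{O}^{T} against the target exactly, and then compares against the empirical covariance of simulated draws of \mathbf{Y}=\mathbf{O}\mathbf{X}. The Gaussian choice for the X_{j}, the target matrix, the seed, the sample size, and all function names are assumptions made for illustration.

```python
import math
import random

def eigendecompose_2x2(a, b, c):
    """Return (O, [lam1, lam2]) with [[a, b], [b, c]] = O D O^T, O a rotation."""
    theta = 0.5 * math.atan2(2 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2 * b * cs * sn + c * sn * sn
    lam2 = a * sn * sn - 2 * b * cs * sn + c * cs * cs
    return [[cs, -sn], [sn, cs]], [lam1, lam2]

def sample_covariance(points):
    """Entries (var_1, cov_12, var_2) of the empirical covariance of 2-D points."""
    m = len(points)
    m1 = sum(p[0] for p in points) / m
    m2 = sum(p[1] for p in points) / m
    v1 = sum((p[0] - m1) ** 2 for p in points) / m
    v2 = sum((p[1] - m2) ** 2 for p in points) / m
    c12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / m
    return v1, c12, v2

a, b, c = 2.0, 1.0, 3.0  # target covariance matrix [[2, 1], [1, 3]]
O, lams = eigendecompose_2x2(a, b, c)

# Exact check: O D O^T reproduces the target matrix.
ODOt = [[sum(O[i][k] * lams[k] * O[j][k] for k in range(2)) for j in range(2)]
        for i in range(2)]

# Monte Carlo check: X_j independent N(0, lam_j), Y = O X.
rng = random.Random(1)
ys = []
for _ in range(100_000):
    x = [rng.gauss(0.0, math.sqrt(max(lam, 0.0))) for lam in lams]
    ys.append((O[0][0] * x[0] + O[0][1] * x[1], O[1][0] * x[0] + O[1][1] * x[1]))
print(ODOt, sample_covariance(ys))
```

The matrix product matches the target up to rounding, while the simulated covariance only approximates it, with an error that shrinks as the number of draws grows.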

