## Statistics and Linear Algebra I: Covariance

Given real-valued random variables $X_{1},\cdots,X_{n}$ on some probability space $(\Omega,\mathcal{F},\mathbb{P})$, we can form a random column vector $\mathbf{X} := \begin{bmatrix} X_{1}\\ \vdots \\ X_{n}\end{bmatrix}$ and define an $n \times n$ matrix by

$\displaystyle\Lambda_{ij} := \mathbb{E}\left[\left(X_{i}-\mathbb{E}[X_{i}]\right)\left(X_{j}-\mathbb{E}[X_{j}]\right)\right], \qquad 1 \leq i,j \leq n$

$\mathbf{\Lambda}$ is called the variance of the random vector $\mathbf{X}$, and we write $\mathbf{\Lambda}=\text{Var}(\mathbf{X})$. Some authors alternatively call $\mathbf{\Lambda}$ the covariance matrix of $\mathbf{X}$ and write $\text{Cov}(\mathbf{X})=\mathbf{\Lambda}$. $\text{Var}(\mathbf{X})$ is evidently symmetric, and in fact, it is positive semidefinite. Indeed, writing $\mu_{i}:=\mathbb{E}[X_{i}]$, for any column vector $\mathbf{x} \in \mathbb{R}^{n}$

$\begin{array}{lcl} \displaystyle\langle{\text{Var}(\mathbf{X})\mathbf{x},\mathbf{x}}\rangle&=&\displaystyle\begin{bmatrix}\sum_{j=1}^{n}\mathbb{E}\left[(X_{1}-\mu_{1})(X_{j}-\mu_{j})\right]x_{j}\cdots\sum_{j=1}^{n}\mathbb{E}\left[(X_{n}-\mu_{n})(X_{j}-\mu_{j})\right]x_{j}\end{bmatrix}\begin{bmatrix}x_{1}\\ \vdots \\ x_{n}\end{bmatrix}\\&=&\displaystyle\mathbb{E}\left[\left(x_{1}(X_{1}-\mu_{1})+\cdots+x_{n}(X_{n}-\mu_{n})\right)\left(x_{1}(X_{1}-\mu_{1})+\cdots+x_{n}(X_{n}-\mu_{n})\right)\right]\\&=&\displaystyle\mathbb{E}\left[\left(\sum_{j=1}^{n}x_{j}(X_{j}-\mu_{j})\right)^{2}\right]\\&\geq&\displaystyle 0\end{array}$
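These properties are easy to sanity-check numerically. Below is a minimal NumPy sketch (the mixing matrix is an arbitrary illustrative choice) that forms a sample covariance matrix and checks symmetry and positive semidefiniteness:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw samples of a 3-dimensional random vector with correlated coordinates
# (the mixing matrix below is an arbitrary illustrative choice).
Z = rng.standard_normal((10_000, 3))
X = Z @ np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, 0.3],
                  [0.0, 0.0, 1.0]])

# Sample covariance matrix: average outer product of the centered samples.
Xc = X - X.mean(axis=0)
Lam = (Xc.T @ Xc) / len(X)

# Var(X) is symmetric and positive semidefinite: all eigenvalues are
# nonnegative (up to floating-point rounding).
symmetric = np.allclose(Lam, Lam.T)
eigenvalues = np.linalg.eigvalsh(Lam)
```

NumPy's built-in `np.cov` computes essentially the same matrix, with a different normalization convention (division by $N-1$ by default).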

We can ask whether every real symmetric, positive semidefinite $n\times n$ matrix is the covariance matrix of random variables $X_{1},\cdots,X_{n}$. The answer is yes, but we will need a few results from linear algebra before we can give a full proof of this fact.

Lemma 1. Suppose $V$ and $W$ are real or complex inner product spaces with orthonormal bases $\left\{e_{1},\cdots,e_{n}\right\}$ and $\left\{f_{1},\cdots,f_{m}\right\}$, respectively. Then for any linear operator $T: V \rightarrow W$,

$\displaystyle\text{Mat}\left(T^{*},\left\{f_{1},\cdots,f_{m}\right\},\left\{e_{1},\cdots,e_{n}\right\}\right)=\text{Mat}\left(T,\left\{e_{1},\cdots,e_{n}\right\},\left\{f_{1},\cdots,f_{m}\right\}\right)^{*}$

Proof. Let $A=(a_{ij})$ be the $m\times n$ matrix of $T$ with respect to the given bases. For all $1\leq j \leq m$, we can uniquely write

$\displaystyle T^{*}f_{j}= b_{1j}e_{1}+\cdots+b_{nj}e_{n}$

for scalars $b_{1j},\cdots,b_{nj}\in\mathbb{F}$. By definition of the Hilbert adjoint,

$\begin{array}{lcl}\displaystyle\overline{a_{jk}}=\overline{\langle{Te_{k},f_{j}}\rangle}=\langle{f_{j},Te_{k}}\rangle=\langle{T^{*}f_{j},e_{k}}\rangle&=&\displaystyle\langle{b_{1j}e_{1}+\cdots+b_{nj}e_{n},e_{k}}\rangle\\&=&\displaystyle\sum_{i=1}^{n}b_{ij}\langle{e_{i},e_{k}}\rangle\\&=&\displaystyle b_{kj}\end{array},$

where the last equality follows from orthonormality. Hence $b_{kj}=\overline{a_{jk}}$ for all $j,k$; that is, the matrix of $T^{*}$ is the conjugate transpose of the matrix of $T$. $\Box$
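Lemma 1 can also be illustrated numerically: with the standard (orthonormal) bases of $\mathbb{C}^{3}$ and $\mathbb{C}^{2}$, the matrix of the adjoint is just the conjugate transpose. A small NumPy sketch with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Matrix of T with respect to the standard (orthonormal) bases of C^3 and C^2.
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))

# By Lemma 1, the matrix of the adjoint T* is the conjugate transpose A^*.
A_star = A.conj().T

# Verify the defining property <Tx, y> = <x, T*y> on random vectors,
# with the inner product <u, v> = sum_k u_k * conj(v_k).
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(2) + 1j * rng.standard_normal(2)
lhs = np.vdot(y, A @ x)        # <Tx, y>: np.vdot conjugates its first argument
rhs = np.vdot(A_star @ y, x)   # <x, T*y>
```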

The preceding lemma allows us to reduce computation of adjoint operators to taking the conjugate transpose of matrices. Note that the statement of the lemma may fail if the bases are not orthonormal. Indeed, consider the linear operator $T: \mathbb{R}^{2} \rightarrow\mathbb{R}^{2}$ defined by

$\displaystyle T(1,0)=(1,1),\qquad T(0,1)=(1,0)$

The matrix of $T$ with respect to the standard basis is $\begin{bmatrix} 1 & 1\\ 1 & 0\end{bmatrix}$, which is symmetric, implying that $T=T^{*}$. But the matrix of $T$ with respect to the basis $\left\{(1,1),(0,1)\right\}$ is $\begin{bmatrix} 2 & 1\\ -1 & -1\end{bmatrix}$, which is not symmetric.
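The computation behind this counterexample is a change of basis; the following NumPy sketch conjugates the standard matrix of $T$ by the matrix $B$ whose columns are the basis vectors $(1,1)$ and $(0,1)$:

```python
import numpy as np

# T in the standard basis: T(1,0) = (1,1), T(0,1) = (1,0).
A = np.array([[1.0, 1.0],
              [1.0, 0.0]])

# Change of basis to the non-orthonormal basis {(1,1), (0,1)}:
# the columns of B are the new basis vectors in standard coordinates.
B = np.array([[1.0, 0.0],
              [1.0, 1.0]])

# Matrix of T with respect to the new basis: B^{-1} A B.
M = np.linalg.inv(B) @ A @ B

# Even though T is self-adjoint, this matrix is not symmetric.
not_symmetric = not np.allclose(M, M.T)
```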

To motivate the next lemma, suppose we have a real polynomial $p(t) := t^{2}+\alpha t + \beta$. The famous quadratic formula tells us that $p$ has a pair of complex-conjugate (non-real) roots if and only if the discriminant $\alpha^{2}-4\beta <0$. In that case, if the operator $T$ can only have real eigenvalues, then $p(T)$ is injective, hence invertible. Since self-adjoint operators necessarily have real eigenvalues, we obtain our next result.

Lemma 2. Suppose a linear operator $T$ is self-adjoint. If $\alpha,\beta \in \mathbb{R}$ satisfy $\alpha^{2}<4\beta$, then the operator $T^{2}+\alpha T+ \beta I$ is invertible.

Proof. For $v \neq 0$, self-adjointness and the Cauchy-Schwarz inequality give

$\displaystyle\langle{(T^{2}+\alpha T+\beta I)v,v}\rangle=\left\|Tv\right\|^{2}+\alpha\langle{Tv,v}\rangle+\beta\left\|v\right\|^{2}\geq\left(\left\|Tv\right\|-\frac{|\alpha|}{2}\left\|v\right\|\right)^{2}+\left(\beta-\frac{\alpha^{2}}{4}\right)\left\|v\right\|^{2}>0,$

so $(T^{2}+\alpha T+\beta I)v\neq 0$. $\Box$
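A quick numerical illustration of Lemma 2 (a sketch with an arbitrary random symmetric matrix): since $T$ has only real eigenvalues $\lambda$, the eigenvalues of $T^{2}+\alpha T+\beta I$ are $\lambda^{2}+\alpha\lambda+\beta$, all strictly positive when $\alpha^{2}<4\beta$.

```python
import numpy as np

rng = np.random.default_rng(2)

# A random symmetric (hence self-adjoint) operator on R^4.
S = rng.standard_normal((4, 4))
T = (S + S.T) / 2

alpha, beta = 1.0, 5.0          # alpha^2 = 1 < 20 = 4 * beta
Q = T @ T + alpha * T + beta * np.eye(4)

# Q is symmetric, and its eigenvalues are lam^2 + alpha*lam + beta > 0
# for the (real) eigenvalues lam of T, so Q is invertible.
eigs_Q = np.linalg.eigvalsh(Q)
invertible = (eigs_Q > 0).all()
```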

We will now use this lemma to show that self-adjointness is a sufficient, although obviously not necessary, condition for a real operator to have an eigenvalue.

Lemma 3. Every self-adjoint operator $T: V \rightarrow V$ on a finite-dimensional real inner product space has an eigenvalue.

Proof. Set $n:=\dim(V)$. Then, for any $v \neq 0$, the set $\left\{v,Tv,\cdots,T^{n}v\right\}$ of $n+1$ vectors is linearly dependent. Hence, there exist real numbers $a_{0},\cdots,a_{n}$, not all zero, such that

$0=a_{0}v+a_{1}Tv+\cdots+a_{n}T^{n}v$

Define a real polynomial $p$ by $p(x):=a_{0}+a_{1}x+\cdots+a_{n}x^{n}$. It is a consequence of the fundamental theorem of algebra and the quadratic formula that we can factor $p$ as

$\displaystyle p(x)=c\left(x^{2}+\alpha_{1}x+\beta_{1}\right)\cdots\left(x^{2}+\alpha_{M}x+\beta_{M}\right)\left(x-\lambda_{1}\right)\cdots\left(x-\lambda_{m}\right)$

where $c \in \mathbb{R}\setminus\left\{0\right\}$, $\alpha_{j},\beta_{j}\in\mathbb{R}$ with $\alpha_{j}^{2} < 4\beta_{j}$, $\lambda_{j}\in\mathbb{R}$, and $m\geq 1$. The last assertion is, perhaps, not so obvious: if $p$ were a product of such quadratic factors alone, then $p(T)$ would be invertible by Lemma 2, contradicting $p(T)v=0$. Each operator $T^{2}+\alpha_{j}T+\beta_{j}I$ is invertible by the preceding lemma, so setting

$\displaystyle w:=c\left(T^{2}+\alpha_{1}T+\beta_{1}I\right)\cdots\left(T^{2}+\alpha_{M}T+\beta_{M}I\right)v\neq0$

we obtain that $(T-\lambda_{1}I)\cdots (T-\lambda_{m}I)w=0$. Hence, $T-\lambda_{j}I$ fails to be injective for some $j$; that is, $\lambda_{j}$ is an eigenvalue of $T$. $\Box$
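The real factorization of $p$ used above can be illustrated numerically: non-real roots of a real polynomial come in conjugate pairs, each pair contributing a quadratic factor with negative discriminant, while the real roots contribute the linear factors. A small sketch with the polynomial $p(x)=(x^{2}+1)(x-2)$:

```python
import numpy as np

# p(x) = (x^2 + 1)(x - 2) = x^3 - 2x^2 + x - 2
coeffs = [1.0, -2.0, 1.0, -2.0]
roots = np.roots(coeffs)

# Split the roots into real roots (linear factors) and non-real roots
# (which pair up into quadratic factors x^2 + alpha x + beta).
real_roots = roots[np.isclose(roots.imag, 0.0)].real
complex_roots = roots[~np.isclose(roots.imag, 0.0)]
```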

We now have all the necessary lemmas to tackle the real spectral theorem.

Theorem 4. Suppose $V$ is a (finite-dimensional) real inner-product space, and $T: V \rightarrow V$ is a linear operator. Then $T$ is self-adjoint if and only if $V$ has an orthonormal basis consisting of eigenvectors of $T$.

Proof. Suppose $V$ has an orthonormal basis $\left\{e_{1},\cdots,e_{n}\right\}$ consisting of eigenvectors of $T$. Then the matrix of $T$ with respect to this basis is diagonal, hence symmetric, which by Lemma 1 shows that $T$ is self-adjoint.

Now suppose that $T$ is self-adjoint. We prove that $V$ has the desired orthonormal basis by induction on $\dim(V)$. If $\dim(V)=1$, then we apply the preceding lemma and we’re done. Now suppose $\dim(V)=n > 1$, and that the result holds for all real inner product spaces of dimension less than $n$. By the preceding lemma, $T$ has an eigenvalue $\lambda$ with an associated eigenvector $u$ of norm $1$. Let $U$ denote the span of $u$, and consider its orthogonal complement $U^{\perp}$. I claim that $U^{\perp}$ is $T$-invariant. Indeed, for any $v \in U^{\perp}$,

$\displaystyle\langle{Tv,u}\rangle=\langle{v,Tu}\rangle=\langle{v,\lambda u}\rangle=\lambda\langle{v,u}\rangle=0$

Hence, we can define an operator $S: U^{\perp}\rightarrow U^{\perp}$ by $S:=T|_{U^{\perp}}$. $S$ is clearly self-adjoint, so by our induction hypothesis, $U^{\perp}$ has an orthonormal basis $\left\{e_{1},\cdots,e_{n-1}\right\}$ consisting of eigenvectors of $T$. Adjoining $e_{n}:=u$ to this set gives an orthonormal basis for $V$ consisting of eigenvectors of $T$. $\Box$
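Computationally, the real spectral theorem is what `np.linalg.eigh` realizes for real symmetric matrices: it returns the eigenvalues together with an orthonormal basis of eigenvectors. A small sketch with an arbitrary random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(3)

# A random self-adjoint operator on R^5 (real symmetric matrix).
S = rng.standard_normal((5, 5))
T = (S + S.T) / 2

# eigh returns eigenvalues and an orthonormal basis of eigenvectors
# (as the columns of E).
lam, E = np.linalg.eigh(T)

# The eigenvector basis is orthonormal ...
orthonormal = np.allclose(E.T @ E, np.eye(5))
# ... and each column really is an eigenvector: T e_j = lam_j e_j.
eigen_relation = np.allclose(T @ E, E * lam)
```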

Let $T: V \rightarrow V$ be a self-adjoint linear operator on a finite-dimensional real inner product space. Fix an orthonormal basis $\left\{e_{1},\cdots,e_{n}\right\}$ for $V$. By the real spectral theorem, there exists an orthonormal basis of eigenvectors $\left\{f_{1},\cdots,f_{n}\right\}$. Let $\varphi_{1}: \mathbb{R}^{n}\rightarrow V$ and $\varphi_{2}: \mathbb{R}^{n}\rightarrow V$ be the coordinate isomorphisms mapping the standard basis vectors $\epsilon_{j}$ to $e_{j}$ and $f_{j}$, respectively. Set $T_{1} := \varphi_{1}^{-1} \circ T\circ \varphi_{1}$ and $T_{2} := \varphi_{2}^{-1}\circ T\circ \varphi_{2}$ to obtain operators $\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$. Then

$\displaystyle T_{2}=\varphi_{2}^{-1}\circ T\circ\varphi_{2}=\varphi_{2}^{-1}\circ\varphi_{1}\circ(\varphi_{1}^{-1}\circ T\circ\varphi_{1})\circ\varphi_{1}^{-1}\circ\varphi_{2}=(\varphi_{2}^{-1}\circ \varphi_{1})\circ T_{1}\circ (\varphi_{1}^{-1}\circ\varphi_{2})$

Set $P:= \varphi_{1}^{-1}\circ\varphi_{2}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$. I claim that $P$ is a unitary transformation. Indeed,

$\begin{array}{lcl} \displaystyle\langle{P\epsilon_{i},P\epsilon_{j}}\rangle=\langle{\varphi_{1}^{-1}f_{i},\varphi_{1}^{-1}f_{j}}\rangle&=&\displaystyle\left\langle{\varphi_{1}^{-1}\left(\sum_{k=1}^{n}\langle{f_{i},e_{k}}\rangle e_{k}\right),\varphi_{1}^{-1}\left(\sum_{k=1}^{n}\langle{f_{j},e_{k}}\rangle e_{k}\right)}\right\rangle\\&=&\displaystyle\left\langle{\sum_{k=1}^{n}\langle{f_{i},e_{k}}\rangle\epsilon_{k},\sum_{k=1}^{n}\langle{f_{j},e_{k}}\rangle\epsilon_{k}}\right\rangle\\&=&\displaystyle\sum_{k=1}^{n}\langle{f_{i},e_{k}}\rangle\overline{\langle{f_{j},e_{k}}\rangle}\\&=&\displaystyle\sum_{k=1}^{n}\langle{f_{i},\langle{f_{j},e_{k}}\rangle e_{k}}\rangle\\&=&\displaystyle\left\langle{f_{i},\sum_{k=1}^{n}\langle{f_{j},e_{k}}\rangle e_{k}}\right\rangle\\&=&\displaystyle\langle{f_{i},f_{j}}\rangle\\&=&\displaystyle\langle{\epsilon_{i},\epsilon_{j}}\rangle\end{array}$

Using this change-of-basis result, we see that for any real symmetric $n \times n$ matrix $\mathbf{A}$, there exist an orthogonal matrix $\mathbf{O}$ (i.e. one defining a unitary operator, so that $\mathbf{O}^{T}\mathbf{O}=\mathbf{I}$) and a diagonal matrix $\mathbf{D}$ such that

$\displaystyle\mathbf{O}^{T}\mathbf{A}\mathbf{O}=\mathbf{D}$

Let $\lambda_{1},\cdots,\lambda_{n}$ denote the diagonal entries of $\mathbf{D}$; these are the eigenvalues of $\mathbf{A}$, so if $\mathbf{A}$ is additionally positive semidefinite, then $\lambda_{j}\geq 0$ for all $j$. Let $X_{1},\cdots,X_{n}$ be independent mean-zero random variables on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ with respective variances $\lambda_{1},\cdots,\lambda_{n}$. Define a random vector $\mathbf{X} := (X_{1},\cdots,X_{n})$, and set $\mathbf{Y}:=\mathbf{O}\mathbf{X}$, where $\mathbf{X}$ is regarded as a column vector. Then

$\displaystyle\text{Var}(\mathbf{Y})=\mathbb{E}\left[\mathbf{O}\mathbf{X}\left(\mathbf{O}\mathbf{X}\right)^{T}\right]=\mathbb{E}\left[\mathbf{O}\mathbf{X}\mathbf{X}^{T}\mathbf{O}^{T}\right]=\mathbf{O}\,\mathbb{E}\left[\mathbf{X}\mathbf{X}^{T}\right]\mathbf{O}^{T}=\mathbf{O}\mathbf{D}\mathbf{O}^{T}=\mathbf{A},$

where $\mathbb{E}[\mathbf{X}\mathbf{X}^{T}]=\mathbf{D}$ because the $X_{j}$ are independent with mean zero and variance $\lambda_{j}$ (and $\text{Var}(\mathbf{Y})=\mathbb{E}[\mathbf{Y}\mathbf{Y}^{T}]$ because $\mathbb{E}[\mathbf{Y}]=\mathbf{O}\,\mathbb{E}[\mathbf{X}]=\mathbf{0}$). This answers our earlier question: every real symmetric positive semidefinite matrix is the covariance matrix of some random vector.
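The whole construction can be carried out numerically. The sketch below (with an arbitrary positive semidefinite target $\mathbf{A}$, and Gaussian coordinates chosen purely for convenience) diagonalizes $\mathbf{A}$, samples the independent coordinates, and checks that the sample covariance of $\mathbf{Y}=\mathbf{O}\mathbf{X}$ approximates $\mathbf{A}$:

```python
import numpy as np

rng = np.random.default_rng(4)

# An arbitrary symmetric positive semidefinite target matrix A = M M^T.
M = rng.standard_normal((3, 3))
A = M @ M.T

# Diagonalize: columns of O are orthonormal eigenvectors, lam >= 0.
lam, O = np.linalg.eigh(A)
exact = np.allclose(O @ np.diag(lam) @ O.T, A)

# Independent mean-zero coordinates with variances lam_1, ..., lam_n
# (Gaussian here for convenience; any distribution with these variances
# would do). The clip guards against tiny negative eigenvalues from rounding.
n_samples = 200_000
X = rng.standard_normal((n_samples, 3)) * np.sqrt(np.clip(lam, 0.0, None))
Y = X @ O.T                      # each row is one sample of Y = O X

# The sample covariance of Y should approximate A.
sample_cov = (Y.T @ Y) / n_samples
```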