## Conditional Expectation as a Markov Operator

I assume that readers acquainted with probability theory are familiar with conditional expectation, but for everyone's benefit, we briefly review the definition.

Definition. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, let $\mathcal{G}\subset\mathcal{F}$ be a sub-$\sigma$-algebra, and let $X \in L^{1}(\Omega,\mathcal{F},\mathbb{P})$ be a random variable. A random variable $Y\in L^{1}(\Omega,\mathcal{G},\mathbb{P})$ such that

$\displaystyle\int_{A}Xd\mathbb{P}=\int_{A}Yd\mathbb{P},\quad\forall A\in\mathcal{G}$

is called the conditional expectation of $X$ with respect to $\mathcal{G}$ and denoted by $\mathbb{E}[X\mid\mathcal{G}]$.

If $\mathcal{G}$ is the sub-$\sigma$-algebra generated by a random variable $Z$, we often write $\mathbb{E}[X\mid Z]$ instead of $\mathbb{E}[X\mid\mathcal{G}]$.
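As a concrete sanity check, here is a small numerical sketch on a hypothetical finite probability space (all names and values below are illustrative, not from the text): when $\mathcal{G}=\sigma(Z)$, the conditional expectation is the $\mathbb{P}$-weighted average of $X$ over each atom $\{Z=z\}$, and the defining identity $\int_{A}X\,d\mathbb{P}=\int_{A}Y\,d\mathbb{P}$ need only be checked on the atoms, since they generate $\sigma(Z)$.

```python
import numpy as np

# Hypothetical finite probability space Omega = {0,...,5} with weights p.
p = np.array([0.1, 0.2, 0.15, 0.25, 0.2, 0.1])  # probabilities, sum to 1
X = np.array([3.0, -1.0, 2.0, 0.5, 4.0, 1.0])   # a random variable on Omega
Z = np.array([0, 0, 1, 1, 2, 2])                # Z generates the sub-sigma-algebra

# E[X | Z] is constant on each atom {Z = z}: the p-weighted average of X there.
Y = np.empty_like(X)
for z in np.unique(Z):
    atom = (Z == z)
    Y[atom] = (p[atom] * X[atom]).sum() / p[atom].sum()

# Defining property: the integrals of X and Y agree on every A in sigma(Z);
# it suffices to check the atoms, which generate sigma(Z).
for z in np.unique(Z):
    atom = (Z == z)
    assert np.isclose((p[atom] * X[atom]).sum(), (p[atom] * Y[atom]).sum())
```

In particular, taking $A=\Omega$ recovers the tower property $\mathbb{E}[\mathbb{E}[X\mid Z]]=\mathbb{E}[X]$.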

The definition of conditional expectation implicitly presumes that the random variable $Y$ is a.s. unique, a result we prove now. Suppose $Y,Y' \in L^{1}(\Omega,\mathcal{G},\mathbb{P})$ both satisfy the definition of conditional expectation. Then the sets $\left\{Y-Y'\geq 0\right\}$ and $\left\{Y-Y'<0\right\}$ belong to $\mathcal{G}$ and partition $\Omega$. Hence,

$\begin{array}{lcl}\displaystyle\int_{\Omega}\left|Y-Y'\right|d\mathbb{P}&=&\displaystyle\int_{\left\{Y-Y'\geq 0\right\}}\left(Y-Y'\right)d\mathbb{P}+\int_{\left\{Y-Y'<0\right\}}\left(Y'-Y\right)d\mathbb{P}\\[.9 em]&=&\displaystyle\int_{\left\{Y-Y'\geq 0\right\}}Xd\mathbb{P}-\int_{\left\{Y-Y'\geq 0\right\}}Xd\mathbb{P}+\int_{\left\{Y-Y'<0\right\}}Xd\mathbb{P}-\int_{\left\{Y-Y'<0\right\}}Xd\mathbb{P}\\[.9 em]&=&\displaystyle 0\end{array}$

which implies that $\left|Y-Y'\right|=0$ a.s., or equivalently, $Y=Y'$ a.s.

Conditional expectation would not be such a useful concept if it did not exist for a sufficiently large class of random variables. Fortunately, absolute integrability ensures the existence of the conditional expectation of a random variable. The existence proof I first learned as a student used the Lebesgue-Radon-Nikodym theorem. There's nothing wrong with this proof; any student of probability theory needs to understand absolute continuity and the Radon-Nikodym derivative. But the Radon-Nikodym route downplays the fact that conditional expectation is an operator $X\mapsto\mathbb{E}[X\mid\mathcal{G}]$ on $L^{1}(\Omega,\mathcal{F},\mathbb{P})$ such that when $X\in L^{2}(\Omega,\mathcal{F},\mathbb{P})$, $\mathbb{E}[X\mid\mathcal{G}]$ is the orthogonal projection of $X$ onto the subspace $L^{2}(\Omega,\mathcal{G},\mathbb{P})$. Since I have a fondness for functional analysis, Hilbert spaces in particular, I'm going to present an alternative proof from Bobrowski's Functional Analysis for Probability and Stochastic Processes.

Theorem. For any $X\in L^{1}(\Omega,\mathcal{F},\mathbb{P})$, $\mathbb{E}[X\mid\mathcal{G}]$ exists and is a.s. unique. Moreover, the map $P: X \mapsto \mathbb{E}[X\mid\mathcal{G}]$ is a Markov operator, and its restriction to the subspace $L^{2}(\Omega,\mathcal{F},\mathbb{P})$ is the orthogonal projection onto the subspace $L^{2}(\Omega,\mathcal{G},\mathbb{P})$.

Proof. First, suppose $X\in L^{2}(\Omega,\mathcal{F},\mathbb{P})$. Since the $L^{2}$-limit of $\mathcal{G}$-measurable functions is again $\mathcal{G}$-measurable, $L^{2}(\Omega,\mathcal{G},\mathbb{P})$ is a closed subspace of the Hilbert space $L^{2}(\Omega,\mathcal{F},\mathbb{P})$. We define $PX$ to be the orthogonal projection of $X$ onto $L^{2}(\Omega,\mathcal{G},\mathbb{P})$. Denote the $L^{2}$-inner product by $\langle{\cdot,\cdot}\rangle$. Since $\langle{X-PX,Z}\rangle=0$ for all $Z\in L^{2}(\Omega,\mathcal{G},\mathbb{P})$, taking $Z=\mathbf{1}_{A}$ for any $A\in\mathcal{G}$, we obtain

$\displaystyle 0=\int_{\Omega}\left(X-PX\right)\mathbf{1}_{A}d\mathbb{P}=\int_{A}Xd\mathbb{P}-\int_{A}PXd\mathbb{P}$

Note that $L^{2}(\Omega,\mathcal{F},\mathbb{P})$ is a dense subspace of $L^{1}(\Omega,\mathcal{F},\mathbb{P})$. Thus, if we can show that $PX \geq 0$ for any nonnegative $X \in L^{2}(\Omega,\mathcal{F},\mathbb{P})$, then by the Markov extension theorem $P$ has a unique extension to a Markov operator $P: L^{1}(\Omega,\mathcal{F},\mathbb{P})\rightarrow L^{1}(\Omega,\mathcal{G},\mathbb{P})$. Since, for each $A\in\mathcal{G}$, $P$ preserves the continuous linear functional $\int_{A}\cdot\,d\mathbb{P}$ on the dense subspace $L^{2}(\Omega,\mathcal{F},\mathbb{P})$, we then conclude that

$\displaystyle\int_{A}Xd\mathbb{P}=\int_{A}PXd\mathbb{P},\quad\forall A\in\mathcal{G}$

Suppose $X\in L^{2}(\Omega,\mathcal{F},\mathbb{P})$ is nonnegative and that $\mathbb{P}(PX < 0) > 0$. Since $\left\{PX<0\right\}=\bigcup_{n=1}^{\infty}\left\{PX<-\frac{1}{n}\right\}$, we have $\mathbb{P}(PX<-\frac{1}{n})>0$ for some $n$. Since $\left\{PX<-\frac{1}{n}\right\}\in\mathcal{G}$ and $X$ is nonnegative, we have that

$\displaystyle 0\leq\int_{\left\{PX< -\frac{1}{n}\right\}}Xd\mathbb{P}=\int_{\left\{PX< -\frac{1}{n}\right\}}PXd\mathbb{P}< -\dfrac{1}{n}\mathbb{P}\left(PX< -\frac{1}{n}\right)$

which is a contradiction. $\Box$
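Both halves of the theorem can be checked numerically on a hypothetical finite probability space (all names and values below are illustrative): the conditional expectation, computed as atom-wise weighted averages, coincides with the weighted least-squares projection of $X$ onto the span of the atom indicators, i.e. the $\mathcal{G}$-measurable functions; and the resulting operator is positive and integral-preserving, as a Markov operator must be.

```python
import numpy as np

# Hypothetical finite probability space with a sub-sigma-algebra generated by Z.
p = np.array([0.1, 0.2, 0.15, 0.25, 0.2, 0.1])  # probabilities, sum to 1
X = np.array([3.0, -1.0, 2.0, 0.5, 4.0, 1.0])   # a random variable
Z = np.array([0, 0, 1, 1, 2, 2])                # atoms of G are {Z = z}

def cond_exp(f):
    """E[f | G]: the p-weighted average of f over each atom of G."""
    out = np.empty_like(f)
    for z in np.unique(Z):
        atom = (Z == z)
        out[atom] = (p[atom] * f[atom]).sum() / p[atom].sum()
    return out

# G-measurable functions are spanned by the atom indicators; the orthogonal
# projection in L^2(P) is the weighted least-squares solution min ||X - Bc||.
B = np.stack([(Z == z).astype(float) for z in np.unique(Z)], axis=1)
W = np.diag(p)                                   # <f, g> = f^T W g in L^2(P)
c = np.linalg.solve(B.T @ W @ B, B.T @ W @ X)    # normal equations
proj = B @ c

assert np.allclose(proj, cond_exp(X))            # projection = cond. expectation
assert (cond_exp(np.abs(X)) >= 0).all()          # positivity: X >= 0 => PX >= 0
assert np.isclose((p * cond_exp(X)).sum(), (p * X).sum())  # preserves E[.]
```

The normal equations here are diagonal (the indicator columns are orthogonal in $L^{2}(\mathbb{P})$), which is exactly why the projection reduces to atom-wise averaging.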