Conditional Expectation as a Markov Operator

I assume that any reader acquainted with probability theory is familiar with conditional expectation, but for everyone’s benefit, we will briefly review the definition.

Definition. Let (\Omega,\mathcal{F},\mathbb{P}) be a probability space, \mathcal{G}\subset\mathcal{F} a sub-\sigma-algebra, and X \in L^{1}(\Omega,\mathcal{F},\mathbb{P}) a random variable. A random variable Y\in L^{1}(\Omega,\mathcal{G},\mathbb{P}) such that

\displaystyle\int_{A}Xd\mathbb{P}=\int_{A}Yd\mathbb{P},\quad\forall A\in\mathcal{G}

is called the conditional expectation of X with respect to \mathcal{G} and denoted by \mathbb{E}[X\mid\mathcal{G}].

If \mathcal{G} is the sub-\sigma-algebra generated by a random variable Z, we often write \mathbb{E}[X\mid Z] instead of \mathbb{E}[X\mid\mathcal{G}].
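
As a quick sanity check on the definition, suppose \mathcal{G} is generated by a finite partition A_{1},\dots,A_{n} of \Omega with \mathbb{P}(A_{i})>0 for each i (this example is not needed in what follows, but it is worth keeping in mind). A \mathcal{G}-measurable random variable is constant on each A_{i}, so integrating the defining property over each A_{i} forces

\displaystyle\mathbb{E}[X\mid\mathcal{G}]=\sum_{i=1}^{n}\left(\dfrac{1}{\mathbb{P}(A_{i})}\int_{A_{i}}Xd\mathbb{P}\right)\mathbf{1}_{A_{i}}

a.s., the familiar average of X over each cell of the partition.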

The definition of conditional expectation implicitly asserts that the random variable Y is a.s. unique, a result we prove now. Suppose Y,Y' \in L^{1}(\Omega,\mathcal{G},\mathbb{P}) both satisfy the definition of conditional expectation. Then the sets \left\{Y-Y'\geq 0\right\} and \left\{Y-Y'<0\right\} are in \mathcal{G}, as is

\displaystyle\left\{\left|Y-Y'\right|>0\right\}=\left\{Y-Y'>0\right\}\cup\left\{Y-Y'<0\right\}

Splitting \Omega into \left\{Y-Y'\geq 0\right\} and \left\{Y-Y'<0\right\} and applying the defining property of conditional expectation to Y and Y' on each of these sets, we obtain

\begin{array}{lcl}\displaystyle\int_{\Omega}\left|Y-Y'\right|d\mathbb{P}&=&\displaystyle\int_{\left\{Y-Y'\geq 0\right\}}\left(Y-Y'\right)d\mathbb{P}+\int_{\left\{Y-Y'<0\right\}}\left(Y'-Y\right)d\mathbb{P}\\[.9 em]&=&\displaystyle\int_{\left\{Y-Y'\geq 0\right\}}Xd\mathbb{P}-\int_{\left\{Y-Y'\geq 0\right\}}Xd\mathbb{P}+\int_{\left\{Y-Y'<0\right\}}Xd\mathbb{P}-\int_{\left\{Y-Y'<0\right\}}Xd\mathbb{P}\\[.9 em]&=&\displaystyle 0\end{array}

which implies that \left|Y-Y'\right|=0 a.s., or equivalently, Y=Y' a.s.

Conditional expectation would not be such a useful concept if it did not exist for a sufficiently large class of random variables. Fortunately, absolute integrability ensures the existence of the conditional expectation of a random variable. The existence proof I first learned as a student used the Lebesgue-Radon-Nikodym theorem. There’s nothing wrong with this proof; any student of probability theory needs to understand absolute continuity and the Radon-Nikodym derivative. But the Radon-Nikodym route downplays the fact that conditional expectation is an operator X\mapsto\mathbb{E}[X\mid\mathcal{G}] on L^{1}(\Omega,\mathcal{F},\mathbb{P}) such that when X\in L^{2}(\Omega,\mathcal{F},\mathbb{P}), \mathbb{E}[X\mid\mathcal{G}] is the orthogonal projection of X onto the subspace L^{2}(\Omega,\mathcal{G},\mathbb{P}). Since I have a fondness for functional analysis, and for Hilbert spaces in particular, I’m going to present an alternative proof from Bobrowski’s Functional Analysis for Probability and Stochastic Processes.

Theorem. For any X\in L^{1}(\Omega,\mathcal{F},\mathbb{P}), \mathbb{E}[X\mid\mathcal{G}] exists and is a.s. unique. Moreover, the map P: X \mapsto \mathbb{E}[X\mid\mathcal{G}] is a Markov operator (a positive linear operator on L^{1} that preserves the integral of nonnegative random variables), and its restriction to the subspace L^{2}(\Omega,\mathcal{F},\mathbb{P}) is the orthogonal projection onto the subspace L^{2}(\Omega,\mathcal{G},\mathbb{P}).

Proof. First, suppose X\in L^{2}(\Omega,\mathcal{F},\mathbb{P}). Since the L^{2}-limit of a sequence of \mathcal{G}-measurable functions is again \mathcal{G}-measurable, L^{2}(\Omega,\mathcal{G},\mathbb{P}) is a closed subspace of the Hilbert space L^{2}(\Omega,\mathcal{F},\mathbb{P}). We define PX to be the orthogonal projection of X onto L^{2}(\Omega,\mathcal{G},\mathbb{P}). Denote the L^{2}-inner product by \langle{\cdot,\cdot}\rangle. Since \langle{X-PX,Z}\rangle=0 for all Z\in L^{2}(\Omega,\mathcal{G},\mathbb{P}), taking Z=\mathbf{1}_{A} for any A\in\mathcal{G}, we obtain

\displaystyle 0=\int_{\Omega}\left(X-PX\right)\mathbf{1}_{A}d\mathbb{P}=\int_{A}Xd\mathbb{P}-\int_{A}PXd\mathbb{P}

Note that L^{2}(\Omega,\mathcal{F},\mathbb{P}) is a dense subspace of L^{1}(\Omega,\mathcal{F},\mathbb{P}). Thus, if we can show that PX\geq 0 whenever X\in L^{2}(\Omega,\mathcal{F},\mathbb{P}) is nonnegative, then by the Markov extension theorem P has a unique extension to a Markov operator P: L^{1}(\Omega,\mathcal{F},\mathbb{P})\rightarrow L^{1}(\Omega,\mathcal{G},\mathbb{P}). Since, for each A\in\mathcal{G}, the identity \int_{A}PXd\mathbb{P}=\int_{A}Xd\mathbb{P} holds on the dense subspace L^{2}(\Omega,\mathcal{F},\mathbb{P}) and both sides are continuous linear functionals of X on L^{1}(\Omega,\mathcal{F},\mathbb{P}), we then conclude that

\displaystyle\int_{A}Xd\mathbb{P}=\int_{A}PXd\mathbb{P},\quad\forall A\in\mathcal{G}
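
I will not restate the Markov extension theorem here, but it is worth sketching the estimate that presumably drives it: positivity makes P an L^{1}-contraction on the dense subspace. Indeed, if PX\geq 0 whenever X\geq 0, then for X\in L^{2}(\Omega,\mathcal{F},\mathbb{P}) linearity gives P\left|X\right|\pm PX=P\left(\left|X\right|\pm X\right)\geq 0, so \left|PX\right|\leq P\left|X\right| a.s., and applying the orthogonality identity \int_{A}Xd\mathbb{P}=\int_{A}PXd\mathbb{P} (valid for X\in L^{2}) to \left|X\right| with A=\Omega,

\displaystyle\int_{\Omega}\left|PX\right|d\mathbb{P}\leq\int_{\Omega}P\left|X\right|d\mathbb{P}=\int_{\Omega}\left|X\right|d\mathbb{P}

Thus \left\|PX\right\|_{1}\leq\left\|X\right\|_{1}, and P extends continuously to all of L^{1}(\Omega,\mathcal{F},\mathbb{P}) by density.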

Suppose X\in L^{2}(\Omega,\mathcal{F},\mathbb{P}) is nonnegative, and suppose for contradiction that \mathbb{P}(PX < 0) > 0. Since \left\{PX<0\right\}=\bigcup_{n=1}^{\infty}\left\{PX<-\frac{1}{n}\right\}, we have \mathbb{P}(PX<-\frac{1}{n})>0 for some n. Since \left\{PX<-\frac{1}{n}\right\}\in\mathcal{G} and X is nonnegative, we have that

\displaystyle 0\leq\int_{\left\{PX< -\frac{1}{n}\right\}}Xd\mathbb{P}=\int_{\left\{PX< -\frac{1}{n}\right\}}PXd\mathbb{P}< -\dfrac{1}{n}\mathbb{P}\left(PX< -\frac{1}{n}\right)<0

which is a contradiction. \Box
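
Returning to the finite-partition example following the definition, the projection description is easy to verify directly: for X\in L^{2}(\Omega,\mathcal{F},\mathbb{P}) and PX=\sum_{i=1}^{n}\left(\frac{1}{\mathbb{P}(A_{i})}\int_{A_{i}}Xd\mathbb{P}\right)\mathbf{1}_{A_{i}}, we have, for each j,

\displaystyle\langle{X-PX,\mathbf{1}_{A_{j}}}\rangle=\int_{A_{j}}Xd\mathbb{P}-\dfrac{\mathbb{P}(A_{j})}{\mathbb{P}(A_{j})}\int_{A_{j}}Xd\mathbb{P}=0

and since the indicators \mathbf{1}_{A_{1}},\dots,\mathbf{1}_{A_{n}} span L^{2}(\Omega,\mathcal{G},\mathbb{P}), X-PX is orthogonal to the whole subspace, so PX is indeed the orthogonal projection of X onto L^{2}(\Omega,\mathcal{G},\mathbb{P}).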
