Suppose we have a collection of probability measures $\{P_\theta : \theta \in \Theta\}$ indexed by some set $\Theta$. We call $\theta$ the parameter and $\Theta$ the parameter space. For example, we can take $\Theta = (0, \infty)$ and let $\{P_\lambda : \lambda \in \Theta\}$ be the collection of exponential distributions with parameter $\lambda$.

We first prove an elementary result for conditional expectation that is sometimes called the smoothing lemma.

Lemma 1. Let $X$ be an integrable random variable on a probability space $(\Omega, \mathcal{F}, P)$, and suppose that $\mathcal{G} \subseteq \mathcal{H}$ are sub-$\sigma$-algebras of $\mathcal{F}$. Then

$$\mathbb{E}[\mathbb{E}[X \mid \mathcal{H}] \mid \mathcal{G}] = \mathbb{E}[X \mid \mathcal{G}] \quad \text{a.s.}$$

*Proof.* For any $G \in \mathcal{G}$,

$$\int_G \mathbb{E}[\mathbb{E}[X \mid \mathcal{H}] \mid \mathcal{G}] \, dP = \int_G \mathbb{E}[X \mid \mathcal{H}] \, dP = \int_G X \, dP = \int_G \mathbb{E}[X \mid \mathcal{G}] \, dP,$$

since $G \in \mathcal{G} \subseteq \mathcal{H}$. As both $\mathbb{E}[\mathbb{E}[X \mid \mathcal{H}] \mid \mathcal{G}]$ and $\mathbb{E}[X \mid \mathcal{G}]$ are $\mathcal{G}$-measurable with the same integral over every $G \in \mathcal{G}$, they agree a.s. $\square$
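On a finite sample space, conditional expectation with respect to a partition-generated $\sigma$-algebra is just averaging over atoms, so the smoothing lemma can be checked by brute-force enumeration. The following sketch (a supplement; the dice setup and helper name `cond_exp` are illustrative choices, not from the original) uses three fair dice with $\mathcal{G} = \sigma(D_1)$ and $\mathcal{H} = \sigma(D_1, D_2)$:

```python
from itertools import product
from fractions import Fraction

# Toy sample space: three fair dice, all 216 outcomes equally likely.
omega = list(product(range(1, 7), repeat=3))

def cond_exp(X, key):
    """E[X | sigma(key)] as a map outcome -> value.

    The sub-sigma-algebra is generated by the partition of omega into
    level sets of `key`; on each atom the conditional expectation is
    the uniform average of X over that atom.
    """
    atoms = {}
    for w in omega:
        atoms.setdefault(key(w), []).append(w)
    return {w: Fraction(sum(X(v) for v in atoms[key(w)]), len(atoms[key(w)]))
            for w in omega}

X = lambda w: w[0] + w[1] + w[2]                         # total of the dice
inner = cond_exp(X, lambda w: (w[0], w[1]))              # E[X | H], H = sigma(D1, D2)
smoothed = cond_exp(lambda w: inner[w], lambda w: w[0])  # E[ E[X | H] | G ]
direct = cond_exp(X, lambda w: w[0])                     # E[X | G], G = sigma(D1)

assert smoothed == direct   # the smoothing lemma, verified outcome by outcome
```

Here `direct[w]` equals $D_1 + 7$ on each atom, and smoothing through the finer $\sigma$-algebra $\mathcal{H}$ gives the same answer.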

Using Lemma 1, we can obtain the conditional analogue of the computational formula for the variance of a random variable.

Lemma 2. If $X \in L^2(\Omega, \mathcal{F}, P)$ and $\mathcal{G} \subseteq \mathcal{F}$ is a sub-$\sigma$-algebra, then

$$\mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])^2 \mid \mathcal{G}] = \mathbb{E}[X^2 \mid \mathcal{G}] - \mathbb{E}[X \mid \mathcal{G}]^2.$$

*Proof.* Expanding the quadratic,

$$\mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])^2 \mid \mathcal{G}] = \mathbb{E}[X^2 \mid \mathcal{G}] - 2\,\mathbb{E}[X\,\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{G}] + \mathbb{E}[\mathbb{E}[X \mid \mathcal{G}]^2 \mid \mathcal{G}] = \mathbb{E}[X^2 \mid \mathcal{G}] - \mathbb{E}[X \mid \mathcal{G}]^2,$$

where we use that $\mathbb{E}[X \mid \mathcal{G}]$ is, by definition, $\mathcal{G}$-measurable, and hence can be pulled out of the conditional expectation. $\square$

We define the conditional variance of an $L^2$ random variable $X$ with respect to a sub-$\sigma$-algebra $\mathcal{G}$ by

$$\operatorname{Var}(X \mid \mathcal{G}) = \mathbb{E}[X^2 \mid \mathcal{G}] - \mathbb{E}[X \mid \mathcal{G}]^2.$$

Note that $X \in L^2$ implies that $\mathbb{E}[X \mid \mathcal{G}]^2 \le \mathbb{E}[X^2 \mid \mathcal{G}]$ a.s. by the conditional Jensen's inequality, so the conditional variance is well-defined and nonnegative.
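Taking expectations in this definition yields the law of total variance, which is the form in which conditional variance enters the Rao-Blackwell theorem; this derivation is a supplement to the original argument, using only the definition above and the smoothing lemma:

```latex
\mathbb{E}[\operatorname{Var}(X \mid \mathcal{G})]
  = \mathbb{E}[X^2] - \mathbb{E}\big[\mathbb{E}[X \mid \mathcal{G}]^2\big],
\qquad
\operatorname{Var}\big(\mathbb{E}[X \mid \mathcal{G}]\big)
  = \mathbb{E}\big[\mathbb{E}[X \mid \mathcal{G}]^2\big] - \mathbb{E}[X]^2,
```

where the second identity uses $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}]] = \mathbb{E}[X]$. Summing the two gives $\operatorname{Var}(X) = \mathbb{E}[\operatorname{Var}(X \mid \mathcal{G})] + \operatorname{Var}(\mathbb{E}[X \mid \mathcal{G}])$.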

Our last lemma before getting to the main result of this post, the Rao-Blackwell theorem, is a lower bound on how well an $L^2$ random variable $X$ can be approximated by a $\mathcal{G}$-measurable random variable $Y$.

Lemma 3. Let $X, Y$ be random variables with finite variance, let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$, and suppose that $Y$ is $\mathcal{G}$-measurable. Then

$$\mathbb{E}[(X - Y)^2] \ge \mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])^2],$$

where equality holds if and only if $Y = \mathbb{E}[X \mid \mathcal{G}]$ a.s.

*Proof.* We add and subtract $\mathbb{E}[X \mid \mathcal{G}]$ and expand the quadratic to obtain

$$\mathbb{E}[(X - Y)^2] = \mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])^2] + \mathbb{E}[(\mathbb{E}[X \mid \mathcal{G}] - Y)^2] \ge \mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])^2],$$

since the cross term vanishes by the smoothing lemma: conditioning on $\mathcal{G}$ and pulling out the $\mathcal{G}$-measurable factor,

$$\mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])(\mathbb{E}[X \mid \mathcal{G}] - Y)] = \mathbb{E}\big[(\mathbb{E}[X \mid \mathcal{G}] - Y)\,\mathbb{E}[X - \mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{G}]\big] = 0.$$

The equality condition is immediate, since the term $\mathbb{E}[(\mathbb{E}[X \mid \mathcal{G}] - Y)^2]$ vanishes if and only if $Y = \mathbb{E}[X \mid \mathcal{G}]$ a.s. $\square$

Lemma 3 is really the "best approximation" property of orthogonal projections in Hilbert space theory translated into the language of probability: $\mathbb{E}[X \mid \mathcal{G}]$ is the orthogonal projection of $X \in L^2(\Omega, \mathcal{F}, P)$ onto the closed subspace $L^2(\Omega, \mathcal{G}, P)$.

Recall that we say a random variable $\hat{\theta} = \hat{\theta}(X_1, \dots, X_n)$ is an unbiased estimator of a parameter $\theta$ if $\mathbb{E}_\theta[\hat{\theta}] = \theta$, where the subscript indicates that the expectation is taken with respect to $P_\theta$ for the "unknown parameter" $\theta$. If we have a sample $X_1, \dots, X_n$, a statistic $T = T(X_1, \dots, X_n)$ is said to be sufficient for $\theta$ if the conditional distribution of $(X_1, \dots, X_n)$ given $T$ is independent of the value of $\theta$. Intuitively, once we observe a random sample and compute the sufficient statistic $T$, the original data $X_1, \dots, X_n$ do not contain any additional information about the unknown parameter $\theta$.

An important result in statistical theory for determining whether a statistic is sufficient is the Fisher-Neyman factorization theorem, which we will not prove. A special case of the factorization theorem says that a statistic $T$ of a sample $X_1, \dots, X_n$ with parameter $\theta$ is sufficient if the joint density function $f_\theta$ can be factored as

$$f_\theta(x_1, \dots, x_n) = g_\theta(T(x_1, \dots, x_n))\, h(x_1, \dots, x_n),$$

where $g_\theta$ and $h$ are nonnegative Borel-measurable functions and only $g_\theta$ depends on $\theta$.
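To connect with the exponential distributions from the opening paragraph, here is a quick supplementary application of the factorization: for an i.i.d. sample from the exponential distribution with rate $\lambda$, the statistic $T(x) = \sum_{i=1}^n x_i$ is sufficient, since the joint density factors with $h \equiv 1$:

```latex
f_\lambda(x_1, \dots, x_n)
  = \prod_{i=1}^n \lambda e^{-\lambda x_i}
  = \underbrace{\lambda^n e^{-\lambda \sum_{i=1}^n x_i}}_{g_\lambda(T(x))}
    \cdot \underbrace{1}_{h(x)},
  \qquad x_1, \dots, x_n > 0.
```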

We use the factorization theorem to show that the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ of independent normal random variables $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ with unknown mean $\mu$ and known variance $\sigma^2$ is sufficient for $\mu$. Indeed, by independence, the joint density function of the $X_i$ is

$$f_\mu(x_1, \dots, x_n) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2\right).$$

Since the sum of $n$ i.i.d. $N(\mu, \sigma^2)$ random variables is distributed $N(n\mu, n\sigma^2)$, we have by the scaling properties of the normal distribution that $\bar{X} \sim N(\mu, \sigma^2/n)$. It follows that the density function of $\bar{X}$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$, is

$$g_\mu(\bar{x}) = \sqrt{\frac{n}{2\pi\sigma^2}} \exp\left(-\frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right).$$

Using the identity $\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$, we can factor the joint density as

$$f_\mu(x_1, \dots, x_n) = g_\mu(\bar{x}) \cdot \underbrace{\sqrt{\frac{2\pi\sigma^2}{n}}\,(2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \bar{x})^2\right)}_{h(x_1, \dots, x_n)}.$$

This last factor is evidently independent of $\mu$, which shows by the factorization theorem that $\bar{X}$ is sufficient.
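As a numerical sanity check (a supplement to the post; the helper names `joint_density` and `mean_density`, the sample size $n = 5$, and $\sigma = 1$ are illustrative choices), we can evaluate the ratio $f_\mu(x) / g_\mu(\bar{x})$ at one fixed sample for several values of $\mu$ and confirm that it does not change:

```python
import math
import random

def joint_density(xs, mu, sigma):
    """Joint density of i.i.d. N(mu, sigma^2) observations."""
    return math.prod(math.exp(-(xi - mu)**2 / (2 * sigma**2)) /
                     math.sqrt(2 * math.pi * sigma**2) for xi in xs)

def mean_density(xbar, mu, sigma, n):
    """Density of the sample mean, which is N(mu, sigma^2/n)."""
    return (math.sqrt(n / (2 * math.pi * sigma**2)) *
            math.exp(-n * (xbar - mu)**2 / (2 * sigma**2)))

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(5)]   # one fixed sample
xbar = sum(xs) / len(xs)

# h(x) = f_mu(x) / g_mu(xbar) should be the same for every mu.
ratios = [joint_density(xs, mu, 1.0) / mean_density(xbar, mu, 1.0, len(xs))
          for mu in (-2.0, 0.0, 3.0)]
assert max(ratios) / min(ratios) - 1 < 1e-9
```

The ratio agrees across all three values of $\mu$ up to floating-point error, as the factorization predicts.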

If we start with an unbiased estimator $\hat{\theta}$, a sufficient statistic $T$ allows us to obtain an estimator $\hat{\theta}^* = \mathbb{E}[\hat{\theta} \mid T]$, known as the Rao-Blackwell estimator, whose expected square loss is no larger than that of the original estimator $\hat{\theta}$. (Sufficiency guarantees that this conditional expectation does not depend on $\theta$, so $\hat{\theta}^*$ is a genuine statistic.) This result is the Rao-Blackwell theorem, which we state and prove now.

Theorem 4. (Rao-Blackwell) Suppose that $T$ is a sufficient statistic for $\theta$, and suppose that $\hat{\theta}$ is an unbiased estimator of $\theta$ such that

$$\operatorname{Var}_\theta(\hat{\theta}) < \infty \quad \text{for all } \theta \in \Theta.$$

Then $\hat{\theta}^* = \mathbb{E}[\hat{\theta} \mid T]$ is an unbiased estimator of $\theta$ and

$$\operatorname{Var}_\theta(\hat{\theta}^*) \le \operatorname{Var}_\theta(\hat{\theta}) \quad \text{for all } \theta \in \Theta.$$

*Proof.* That $\hat{\theta}^*$ is an unbiased estimator of $\theta$ is immediate from the tower property of conditional expectation:

$$\mathbb{E}_\theta[\hat{\theta}^*] = \mathbb{E}_\theta[\mathbb{E}[\hat{\theta} \mid T]] = \mathbb{E}_\theta[\hat{\theta}] = \theta.$$

The inequality in the statement of the theorem follows from an application of the preceding lemmas. Applying the decomposition from the proof of Lemma 3, with $X = \hat{\theta}$, $\mathcal{G} = \sigma(T)$, and the (constant, hence $\sigma(T)$-measurable) random variable $Y = \theta$, we have

$$\operatorname{Var}_\theta(\hat{\theta}) = \mathbb{E}_\theta[(\hat{\theta} - \theta)^2] = \mathbb{E}_\theta[(\hat{\theta} - \hat{\theta}^*)^2] + \mathbb{E}_\theta[(\hat{\theta}^* - \theta)^2] = \mathbb{E}_\theta[(\hat{\theta} - \hat{\theta}^*)^2] + \operatorname{Var}_\theta(\hat{\theta}^*).$$

Noting that both terms on the right are nonnegative, so that $\operatorname{Var}_\theta(\hat{\theta}^*) \le \operatorname{Var}_\theta(\hat{\theta})$, completes the proof. $\square$
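To see the theorem in action, here is a supplementary Monte Carlo sketch (the parameter values and trial counts are arbitrary choices): with $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, the single observation $\hat{\mu} = X_1$ is unbiased for $\mu$, and by exchangeability its Rao-Blackwellization with respect to the sufficient statistic $\bar{X}$ is $\mathbb{E}[X_1 \mid \bar{X}] = \bar{X}$:

```python
import random
import statistics

random.seed(1)
mu, sigma, n, trials = 2.0, 1.0, 10, 20000

naive, rb = [], []
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    naive.append(xs[0])        # crude unbiased estimator: first observation only
    rb.append(sum(xs) / n)     # E[X1 | Xbar] = Xbar, the Rao-Blackwell estimator

# Both estimators are unbiased, but Rao-Blackwellization cuts the
# variance from sigma^2 to sigma^2 / n.
assert abs(statistics.mean(naive) - mu) < 0.05
assert abs(statistics.mean(rb) - mu) < 0.05
assert statistics.variance(rb) < statistics.variance(naive)
```

Empirically the variance of the conditioned estimator is about $1/n$ that of the naive one, matching the decomposition in the proof.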

A useful consequence of the Rao-Blackwell theorem is that we can restrict our search for minimum-variance unbiased estimators (MVUEs) to estimators that are functions of a sufficient statistic.