We begin with an inequality for probability density functions, sometimes referred to as the **information inequality**. We use this inequality to prove that the Kullback-Leibler divergence, a "distance" between two probability measures, is always nonnegative.

**Proposition 1.**

$$\int f(x) \log f(x)\, dx \ge \int f(x) \log g(x)\, dx$$

for all nonnegative measurable functions $f$ and $g$ satisfying $\int f(x)\, dx = \int g(x)\, dx = 1$.

*Proof.* Let $X$ be a random variable with probability density function $f$. Consider the random variable $g(X)/f(X)$. We will show that

$$\mathbb{E}\left[\log \frac{g(X)}{f(X)}\right] \le 0,$$

which is equivalent to the claimed inequality. Since $\log$ is concave, Jensen's inequality gives

$$\mathbb{E}\left[\log \frac{g(X)}{f(X)}\right] \le \log \mathbb{E}\left[\frac{g(X)}{f(X)}\right].$$

Observing that

$$\mathbb{E}\left[\frac{g(X)}{f(X)}\right] = \int \frac{g(x)}{f(x)}\, f(x)\, dx = \int g(x)\, dx = 1,$$

so that the right-hand side above is $\log 1 = 0$, completes the proof. $\square$
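As a quick numerical sanity check (not part of the proof), we can estimate $\mathbb{E}[\log(g(X)/f(X))]$ by Monte Carlo and confirm it is nonpositive. The densities $f(x) = 2x$ and $g(x) = 3x^2$ on $[0, 1]$ below are illustrative choices, not from the text:

```python
import math
import random

# Monte Carlo estimate of E[log(g(X)/f(X))] with X ~ f, which the
# proof shows is <= 0.  The densities f(x) = 2x and g(x) = 3x^2 on
# [0, 1] are illustrative choices.

def f(x):
    return 2 * x

def g(x):
    return 3 * x ** 2

random.seed(0)
# Sample X ~ f by inverse transform: the cdf is F(x) = x^2, so X = sqrt(U).
samples = [math.sqrt(random.random()) for _ in range(100_000)]
estimate = sum(math.log(g(x) / f(x)) for x in samples) / len(samples)

print(estimate)  # nonpositive; the exact value here is log(3/2) - 1/2 < 0
```

The estimate equals $-D(f\,\|\,g)$ for this pair, so it should hover just below zero.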

The information inequality shows that the Kullback-Leibler divergence between the distributions of two continuous random variables with probability density functions $f$ and $g$, respectively, defined by

$$D(f \,\|\, g) = \int f(x) \log \frac{f(x)}{g(x)}\, dx,$$

is nonnegative. Note that the Kullback-Leibler divergence can be generalized to the case where $\mu$ and $\nu$ are probability measures on a measure space and $\mu$ is absolutely continuous with respect to $\nu$. In this case, we define

$$D(\mu \,\|\, \nu) = \int \log \frac{d\mu}{d\nu}\, d\mu,$$

where $\frac{d\mu}{d\nu}$ is the Radon-Nikodym derivative of $\mu$ with respect to $\nu$. To see that $D(\mu \,\|\, \nu)$ is still nonnegative, we apply Jensen's inequality to the convex function $\varphi(x) = x \log x$ to obtain

$$D(\mu \,\|\, \nu) = \int \varphi\!\left(\frac{d\mu}{d\nu}\right) d\nu \ge \varphi\!\left(\int \frac{d\mu}{d\nu}\, d\nu\right) = \varphi(1) = 0.$$
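In the simplest measure-theoretic setting, two probability measures on a finite set, the Radon-Nikodym derivative $\frac{d\mu}{d\nu}$ is just the ratio of point masses, so the general definition reduces to a finite sum. A minimal sketch (the pmfs `mu` and `nu` are made-up illustrations):

```python
import math

# For measures on a finite set, d(mu)/d(nu) is the ratio of point
# masses, and D(mu || nu) = sum_x mu(x) log(mu(x) / nu(x)).

def kl(mu, nu):
    """KL divergence between two pmfs (lists of probabilities); assumes
    mu is absolutely continuous w.r.t. nu (nu(x) > 0 wherever mu(x) > 0)."""
    return sum(p * math.log(p / q) for p, q in zip(mu, nu) if p > 0)

mu = [0.5, 0.3, 0.2]   # made-up pmfs for illustration
nu = [0.4, 0.4, 0.2]

print(kl(mu, nu))  # strictly positive, since mu != nu
print(kl(mu, mu))  # 0.0: a measure is at divergence 0 from itself
```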
Note that, although the KL-divergence can intuitively be thought of as a "distance" between two probability measures, it does not define a metric, since it need not be symmetric. For a simple counterexample, let $f(x) = e^{-x}$ and $g(x) = 2e^{-2x}$ on $(0, \infty)$ be the densities of the $\mathrm{Exp}(1)$ and $\mathrm{Exp}(2)$ distributions. Then

$$D(f \,\|\, g) = \int_0^\infty e^{-x}\,(x - \log 2)\, dx = 1 - \log 2,$$

while

$$D(g \,\|\, f) = \int_0^\infty 2e^{-2x}\,(\log 2 - x)\, dx = \log 2 - \frac{1}{2}.$$
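This asymmetry is easy to confirm numerically. The sketch below approximates both divergences by midpoint-rule quadrature for the $\mathrm{Exp}(1)$ and $\mathrm{Exp}(2)$ densities, for which the closed forms are $1 - \log 2 \approx 0.307$ and $\log 2 - \frac{1}{2} \approx 0.193$; truncating the integrals at $x = 50$ assumes the tails beyond that point are negligible:

```python
import math

# Midpoint-rule approximation of D(p || q) = int_0^upper p log(p/q),
# truncated at x = 50 (the exponential tails beyond that are negligible).

def f(x):
    return math.exp(-x)            # Exp(1) density

def g(x):
    return 2 * math.exp(-2 * x)    # Exp(2) density

def kl(p, q, upper=50.0, n=200_000):
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h          # midpoint of the i-th subinterval
        total += p(x) * math.log(p(x) / q(x)) * h
    return total

d_fg = kl(f, g)
d_gf = kl(g, f)
print(d_fg)  # ~ 1 - log 2 ≈ 0.307
print(d_gf)  # ~ log 2 - 1/2 ≈ 0.193
```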
We now apply the information inequality to prove some results concerning maximum entropy. Define the entropy of a probability density function $f$ to be

$$h(f) = -\int f(x) \log f(x)\, dx.$$

We also use the notation $h(X) = h(f)$ when $X$ is a random variable with density $f$.

We now prove that the uniform distribution on $[a, b]$ has maximum entropy among all probability density functions supported on $[a, b]$. For any pdf $f$ supported on $[a, b]$ and $g = \frac{1}{b-a}\,\mathbf{1}_{[a,b]}$, we have by the information inequality that

$$h(f) = -\int_a^b f(x) \log f(x)\, dx \le -\int_a^b f(x) \log g(x)\, dx = \log(b - a).$$

Taking $f = g$ completes the proof.
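A quadrature check of the bound with $[a, b] = [0, 1]$, so that $\log(b - a) = 0$: every density on $[0, 1]$ should have entropy at most $0$, with equality for the uniform density. The non-uniform densities below are arbitrary illustrative choices:

```python
import math

# Midpoint-rule approximation of h(f) = -int_0^1 f log f for densities
# supported on [0, 1]; the information inequality bounds this by log(1) = 0.

def entropy(pdf, n=100_000):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        fx = pdf(x)
        if fx > 0:
            total -= fx * math.log(fx) * h
    return total

h_uniform = entropy(lambda x: 1.0)           # uniform on [0, 1]
h_linear = entropy(lambda x: 2 * x)          # exact entropy: 1/2 - log 2
h_quadratic = entropy(lambda x: 3 * x ** 2)  # exact entropy: 2/3 - log 3

print(h_uniform)    # 0.0: the bound log(b - a) is attained
print(h_linear)     # negative
print(h_quadratic)  # negative
```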

We now prove that the normal distribution $N(\mu, \sigma^2)$ has maximum entropy among all probability density functions on $\mathbb{R}$ that have mean $\mu$ and variance $\sigma^2$. If $f$ is the density function of a random variable with mean $\mu$ and variance $\sigma^2$, then

$$h(f) \le \frac{1}{2} \log(2\pi e \sigma^2),$$

since if $g$ is the pdf of $N(\mu, \sigma^2)$, then

$$-\int f(x) \log g(x)\, dx = \int f(x) \left[\frac{1}{2}\log(2\pi\sigma^2) + \frac{(x - \mu)^2}{2\sigma^2}\right] dx = \frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2} = \frac{1}{2}\log(2\pi e \sigma^2),$$

and the information inequality gives $h(f) \le -\int f(x) \log g(x)\, dx$. Taking $f = g$ shows that this upper bound is attained.
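As a sanity check, we can compare the entropy of $N(0, 1)$, which should equal $\frac{1}{2}\log(2\pi e) \approx 1.419$, against another mean-$0$, variance-$1$ density; the Laplace density with scale $b = 1/\sqrt{2}$ is an illustrative choice, and truncating the integral to $[-30, 30]$ assumes the tails are negligible:

```python
import math

# Midpoint-rule approximation of h(f) = -int f log f on [-30, 30]
# (truncation assumes the tails beyond +/-30 are negligible).

def entropy(pdf, lo=-30.0, hi=30.0, n=200_000):
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        fx = pdf(x)
        if fx > 0:
            total -= fx * math.log(fx) * h
    return total

def gaussian(x):                    # N(0, 1) density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def laplace(x):                     # mean 0, variance 1 (Var = 2 b^2)
    b = 1 / math.sqrt(2)
    return math.exp(-abs(x) / b) / (2 * b)

bound = 0.5 * math.log(2 * math.pi * math.e)  # (1/2) log(2 pi e sigma^2), sigma = 1
h_gauss = entropy(gaussian)
h_laplace = entropy(laplace)

print(h_gauss)    # ~ 1.419: attains the bound
print(h_laplace)  # ~ 1.347: strictly below the bound
```

The exact Laplace entropy is $1 + \frac{1}{2}\log 2 \approx 1.347$, strictly below the Gaussian bound, as the proposition predicts.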



You’re using Jensen’s inequality the wrong way around! You’ll need to rethink your first proof…

Thank you for your comment. I take it that you are referring to the proof of Proposition 1. I reviewed my proof, and I’m not seeing the mistake. f(x) = ln(x) is a concave function (compute the second derivative), so Jensen’s inequality is reversed.