Lately, I’ve been on a bit of a convexity kick. It’s one of many subjects that I never studied in detail as a student, despite being acquainted with its importance. Not constrained by the curricula of offered courses, I have the freedom to indulge my mathematical curiosity. As one might sense from my recent posts, this curiosity is currently gorging itself on the basic theory of convex functions.

I also really like inequalities. So it’s not surprising that when you combine the property of convexity and the study of inequalities, I might want to write about Jensen’s inequality. Indeed, this is the case.

Jensen’s inequality (in its discrete form) says that given real numbers $x_1, \dots, x_n \in [a,b]$, a convex function $f: [a,b] \to \mathbb{R}$, and a convex combination $\sum_{k=1}^{n} \lambda_k x_k$ (i.e., $\lambda_k \geq 0$ and $\sum_{k=1}^{n} \lambda_k = 1$),

$$f\left(\sum_{k=1}^{n} \lambda_k x_k\right) \leq \sum_{k=1}^{n} \lambda_k f(x_k).$$

Note that if $f$ is concave, the inequality is simply reversed. Jensen’s inequality is well-known to beginning math students, especially if they’ve learned a bit of probability theory, but less well-known (my sample size is 1: myself) are upper bounds for the quantity

$$\sum_{k=1}^{n} \lambda_k f(x_k) - f\left(\sum_{k=1}^{n} \lambda_k x_k\right),$$

viewed as a function on the cube $[a,b]^n$. Today’s post is intended as a brief exposition of such bounds. We begin by stating the main theorem in the elementary theory of such bounds, then prove a weaker form of Lagrange’s mean value theorem valid for nondifferentiable functions, which we will need in the proof of the main theorem, and we’ll conclude by deriving some well-known inequalities, including Kantorovich’s.
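Before diving in, here is a quick numerical sanity check of the basic inequality (a minimal sketch; the helper `jensen_gap`, the choice of $\exp$, and the random points and weights are all arbitrary illustrative choices of mine):

```python
import math
import random

def jensen_gap(f, xs, lams):
    """The Jensen gap: sum_k lam_k f(x_k) - f(sum_k lam_k x_k)."""
    mean = sum(l * x for l, x in zip(lams, xs))
    return sum(l * f(x) for l, x in zip(lams, xs)) - f(mean)

random.seed(0)
xs = [random.uniform(0.0, 1.0) for _ in range(5)]
ws = [random.random() for _ in range(5)]
lams = [w / sum(ws) for w in ws]  # normalize weights to a convex combination

gap = jensen_gap(math.exp, xs, lams)
assert gap >= 0.0  # Jensen's inequality: the gap is nonnegative for convex f
```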

**Theorem 1.** Let $f: [a,b] \to \mathbb{R}$ be a convex function and let $[a_1,b_1], \dots, [a_n,b_n]$ be compact subintervals of $[a,b]$. Given $\lambda_1, \dots, \lambda_n > 0$ satisfying $\sum_{k=1}^{n} \lambda_k = 1$, the function $E: \prod_{k=1}^{n} [a_k,b_k] \to \mathbb{R}$ defined by

$$E(x_1, \dots, x_n) = \sum_{k=1}^{n} \lambda_k f(x_k) - f\left(\sum_{k=1}^{n} \lambda_k x_k\right)$$

attains its supremum at a vertex of $\prod_{k=1}^{n} [a_k,b_k]$.
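Before proving the theorem, a hedged numerical illustration (the convex function, the subintervals, and the weights below are arbitrary choices of mine): random points of the box never beat the best vertex.

```python
import itertools
import random

def E(f, xs, lams):
    """Jensen gap of f at the point xs with weights lams."""
    mean = sum(l * x for l, x in zip(lams, xs))
    return sum(l * f(x) for l, x in zip(lams, xs)) - f(mean)

f = lambda x: x * x                               # an arbitrary convex function
intervals = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]  # arbitrary compact subintervals
lams = [0.2, 0.3, 0.5]                            # a convex combination

# Maximum of E over the 2^3 vertices of the box.
vertex_max = max(E(f, v, lams) for v in itertools.product(*intervals))

random.seed(1)
for _ in range(10_000):
    xs = [random.uniform(a, b) for a, b in intervals]
    assert E(f, xs, lams) <= vertex_max + 1e-12  # no interior point exceeds a vertex
```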

The following lemma generalizes Lagrange’s mean value theorem for real-valued functions on an interval $I$ by only requiring the weaker hypothesis that

$$\underline{D}h(x) := \liminf_{t \to 0} \frac{h(x+t) - h(x)}{t}, \qquad \overline{D}h(x) := \limsup_{t \to 0} \frac{h(x+t) - h(x)}{t},$$

respectively called the lower and upper derivatives of $h$ at the point $x$, exist everywhere on $I$.

**Lemma 2.** Let $h: [a,b] \to \mathbb{R}$ be a continuous function. Then there exists a point $c \in (a,b)$ such that

$$\underline{D}h(c) \leq \frac{h(b) - h(a)}{b - a} \leq \overline{D}h(c).$$

*Proof.* Define a function $g: [a,b] \to \mathbb{R}$ by

$$g(x) = h(x) - \frac{h(b) - h(a)}{b - a}(x - a).$$

Observe that $g(a) = g(b) = h(a)$ and that $g$ is continuous, being a sum of continuous functions, so by Weierstrass’ theorem, $g$ attains its infimum and supremum on $[a,b]$. Suppose $g$ attains its supremum at some point $c \in (a,b)$. Then for $t > 0$ small,

$$\frac{g(c+t) - g(c)}{t} \leq 0 \leq \frac{g(c-t) - g(c)}{-t},$$

which implies that $\underline{D}g(c) \leq 0 \leq \overline{D}g(c)$. If $g$ attains its infimum at some $c \in (a,b)$, then reversing the preceding inequalities yields the same result. In either case, we have

$$\underline{D}h(c) = \underline{D}g(c) + \frac{h(b) - h(a)}{b - a} \leq \frac{h(b) - h(a)}{b - a}$$

and similarly, $\frac{h(b) - h(a)}{b - a} \leq \overline{D}h(c)$. If $g$ attains both its supremum and infimum at the endpoints, then since $g(a) = g(b)$, the function $g$ is necessarily constant, and the conclusion of the lemma is immediate. $\blacksquare$
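For a concrete instance of the lemma (the function and interval are purely illustrative), take $h(x) = |x|$ on $[-1,2]$: $h$ is not differentiable at $0$, but at $c = 0$ the lower and upper derivatives are the one-sided slopes $-1$ and $+1$, which straddle the mean slope $1/3$.

```python
# Illustrative check of Lemma 2 for the nondifferentiable h(x) = |x| on [-1, 2].
a, b = -1.0, 2.0
h = abs
slope = (h(b) - h(a)) / (b - a)  # mean slope (2 - 1)/3 = 1/3

# At c = 0, the lower/upper derivatives of |x| are the one-sided slopes -1 and +1.
lower_D, upper_D = -1.0, 1.0
assert lower_D <= slope <= upper_D  # the conclusion of the lemma at c = 0
```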

We now use the lemma to prove the theorem.

*Proof.* Fix compact subintervals $[a_1,b_1], \dots, [a_n,b_n] \subset [a,b]$. Since $f$ is convex, it is continuous; since $\prod_{k=1}^{n} [a_k,b_k]$ is compact, $E$ attains its supremum and infimum on $\prod_{k=1}^{n} [a_k,b_k]$. By induction, it suffices to show that

$$E(x_1, \dots, x_n) \leq \max\left\{E(x_1, \dots, a_k, \dots, x_n),\ E(x_1, \dots, b_k, \dots, x_n)\right\}$$

for all $(x_1, \dots, x_n) \in \prod_{j=1}^{n} [a_j,b_j]$ and $1 \leq k \leq n$.

Assume the contrary: that there exist $(x_1, \dots, x_n)$ and $k$ such that

$$E(x_1, \dots, x_n) > \max\left\{E(x_1, \dots, a_k, \dots, x_n),\ E(x_1, \dots, b_k, \dots, x_n)\right\}.$$

Note that this assumption forces $x_k \in (a_k, b_k)$. By symmetry, we may assume that $k = 1$. Consider the function $\varphi: [a_1,b_1] \to \mathbb{R}$,

$$\varphi(t) = E(t, x_2, \dots, x_n) = \lambda_1 f(t) + \sum_{j=2}^{n} \lambda_j f(x_j) - f\left(\lambda_1 t + \sum_{j=2}^{n} \lambda_j x_j\right).$$

By the lemma applied on $[a_1, x_1]$, there exists $c \in (a_1, x_1)$ such that

$$\frac{\varphi(x_1) - \varphi(a_1)}{x_1 - a_1} \leq \overline{D}\varphi(c).$$

Since $\varphi(x_1) > \varphi(a_1)$, it follows that $\overline{D}\varphi(c) > 0$ and therefore

$$f'_{+}(c) > f'_{+}\left(\lambda_1 c + \sum_{j=2}^{n} \lambda_j x_j\right) \quad \text{or} \quad f'_{-}(c) > f'_{-}\left(\lambda_1 c + \sum_{j=2}^{n} \lambda_j x_j\right),$$

since $\overline{D}\varphi(c)$ is $\lambda_1$ times the larger of these two differences. By the convexity of $f$, each one-sided derivative of $f$ is nondecreasing on $[a,b]$, and therefore $c > \lambda_1 c + \sum_{j=2}^{n} \lambda_j x_j$. Thus,

$$(1 - \lambda_1)c > \sum_{j=2}^{n} \lambda_j x_j.$$

We apply Lemma 2 on the interval $[x_1, b_1]$ and the same reasoning to obtain $d \in (x_1, b_1)$ such that $\underline{D}\varphi(d) < 0$, whence

$$(1 - \lambda_1)d < \sum_{j=2}^{n} \lambda_j x_j.$$

But this is a contradiction, since $c < x_1 < d$ and $1 - \lambda_1 > 0$ imply $(1 - \lambda_1)c < (1 - \lambda_1)d$. $\blacksquare$

If we impose the additional hypothesis that $f$ is differentiable on an open interval containing $[a,b]$, then we have the inequality

$$\sum_{k=1}^{n} \lambda_k f(x_k) - f\left(\sum_{k=1}^{n} \lambda_k x_k\right) \leq \frac{1}{4}(b - a)\left(f'(b) - f'(a)\right)$$

for all $x_1, \dots, x_n \in [a,b]$.
*Proof.* By Theorem 1 (applied with $[a_k,b_k] = [a,b]$ for each $k$),

$$E(x_1, \dots, x_n) = \sum_{k=1}^{n} \lambda_k f(x_k) - f\left(\sum_{k=1}^{n} \lambda_k x_k\right)$$

attains its maximum at a point $(y_1, \dots, y_n)$ with each $y_k \in \{a, b\}$. Let $S$ denote the subset of indices $k$ such that $y_k = a$, and set $s = \sum_{k \in S} \lambda_k$ and $\bar{x} = sa + (1-s)b$. Then

$$\max E = s f(a) + (1-s) f(b) - f(\bar{x}).$$

Since $f$ is convex and differentiable, we have

$$f(y) - f(x) \leq f'(y)(y - x) \quad \text{for all } x, y \in [a,b].$$

In particular,

$$f(a) - f(\bar{x}) \leq f'(a)(a - \bar{x}) = -(1-s)(b-a)f'(a), \qquad f(b) - f(\bar{x}) \leq f'(b)(b - \bar{x}) = s(b-a)f'(b).$$

Hence,

$$\max E = s\left[f(a) - f(\bar{x})\right] + (1-s)\left[f(b) - f(\bar{x})\right] \leq s(1-s)(b-a)\left(f'(b) - f'(a)\right).$$

The conclusion follows from noting that $s(1-s) \leq \frac{1}{4}$ for $s \in [0,1]$. $\blacksquare$
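A numerical sanity check of this bound (the choice $f = \exp$ on $[0,1]$ and the random sampling scheme are mine, purely for illustration):

```python
import math
import random

a, b = 0.0, 1.0
f = fprime = math.exp                              # f = exp is convex, with f' = exp
bound = 0.25 * (b - a) * (fprime(b) - fprime(a))   # (1/4)(b-a)(f'(b)-f'(a))

random.seed(2)
for _ in range(5_000):
    n = random.randint(2, 6)
    xs = [random.uniform(a, b) for _ in range(n)]
    ws = [random.random() for _ in range(n)]
    lams = [w / sum(ws) for w in ws]
    mean = sum(l * x for l, x in zip(lams, xs))
    gap = sum(l * f(x) for l, x in zip(lams, xs)) - f(mean)
    assert gap <= bound + 1e-12  # the Jensen gap never exceeds the bound
```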

Theorem 1 also yields the following simple corollary.

**Corollary 3.** If $f: [a,b] \to \mathbb{R}$ is a convex function, then

$$\frac{f(x) + f(y)}{2} - f\left(\frac{x+y}{2}\right) \leq \frac{f(a) + f(b)}{2} - f\left(\frac{a+b}{2}\right)$$

for all $x, y \in [a,b]$.

*Proof.* We apply Theorem 1 with $n = 2$, $\lambda_1 = \lambda_2 = \frac{1}{2}$, and $[a_1,b_1] = [a_2,b_2] = [a,b]$. The function

$$E(x, y) = \frac{f(x) + f(y)}{2} - f\left(\frac{x+y}{2}\right)$$

attains its supremum at a vertex of the cube $[a,b]^2$. Since $E \geq 0$ by Jensen’s inequality, we see that the conclusion of the corollary is immediate if the supremum is attained at $(a,b)$ or $(b,a)$ (note that $E(a,b) = E(b,a)$ by symmetry). If the supremum is attained at $(a,a)$ or $(b,b)$, then the conclusion follows from noting that $E(a,a) = E(b,b) = 0 \leq E(a,b)$. $\blacksquare$

We can use Theorem 1 to prove an upper bound on the error in Jensen’s inequality, which Niculescu and Persson attribute to L.G. Khanin.

**Proposition 4.** Let $p > 1$, $x_1, \dots, x_n \in [0, M]$, and $\lambda_1, \dots, \lambda_n > 0$ satisfying $\sum_{k=1}^{n} \lambda_k = 1$. Then

$$\sum_{k=1}^{n} \lambda_k x_k^p \leq \left(\sum_{k=1}^{n} \lambda_k x_k\right)^p + (p-1)p^{-p/(p-1)} M^p.$$

*Proof.* Fix $p > 1$ and define $E: [0,M]^n \to \mathbb{R}$ by

$$E(x_1, \dots, x_n) = \sum_{k=1}^{n} \lambda_k x_k^p - \left(\sum_{k=1}^{n} \lambda_k x_k\right)^p.$$

By Theorem 1, $E$ attains its supremum at a point $(y_1, \dots, y_n)$ with each $y_k \in \{0, M\}$. Let $S$ denote the subset of indices $k$ such that $y_k = M$, and set $s = \sum_{k \in S} \lambda_k$. Then

$$\sup E = s M^p - (sM)^p = (s - s^p) M^p.$$

We can use calculus to bound $s - s^p$. Indeed, define a function $\varphi: [0,1] \to \mathbb{R}$ by $\varphi(s) = s - s^p$. Then

$$\varphi'(s) = 1 - p s^{p-1},$$

so $\varphi'(s) > 0$ for $s < p^{-1/(p-1)}$, and $\varphi'(s) < 0$ for $s > p^{-1/(p-1)}$. Hence, $s_0 = p^{-1/(p-1)}$ maximizes $\varphi$, and we conclude that

$$\sup E \leq \varphi(s_0) M^p = \left(p^{-1/(p-1)} - p^{-p/(p-1)}\right) M^p = (p-1)p^{-p/(p-1)} M^p. \qquad \blacksquare$$
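Finally, a quick numerical check of Khanin’s bound (the exponent $p$, the cap $M$, and the sampling scheme below are arbitrary illustrative choices; note that for $p = 2$ the constant $(p-1)p^{-p/(p-1)}$ is just $\frac{1}{4}$, recovering the familiar variance bound):

```python
import random

p, M = 3.0, 2.0
const = (p - 1) * p ** (-p / (p - 1))   # Khanin's constant (p-1)p^{-p/(p-1)}

random.seed(4)
for _ in range(5_000):
    n = random.randint(2, 6)
    xs = [random.uniform(0.0, M) for _ in range(n)]
    ws = [random.random() for _ in range(n)]
    lams = [w / sum(ws) for w in ws]
    gap = sum(l * x ** p for l, x in zip(lams, xs)) \
          - sum(l * x for l, x in zip(lams, xs)) ** p
    assert gap <= const * M ** p + 1e-9  # Khanin's upper bound on the Jensen gap
```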