Lately, I’ve been on a bit of a convexity kick. It’s one of many subjects that I never studied in detail as a student, despite being acquainted with its importance. Not constrained by the curricula of offered courses, I have the freedom to indulge my mathematical curiosity. As one might sense from my recent posts, this curiosity is currently gorging itself on the basic theory of convex functions.
I also really like inequalities. So it’s not surprising that when you combine the property of convexity and the study of inequalities, I might want to write about Jensen’s inequality. Indeed, this is the case.
Jensen’s inequality (in its discrete form) says that given real numbers $x_1, \ldots, x_n$ in an interval $I$, a convex function $f \colon I \to \mathbb{R}$, and a convex combination $\lambda_1, \ldots, \lambda_n$ (i.e. $\lambda_i \geq 0$ and $\sum_{i=1}^n \lambda_i = 1$),
$$f\left(\sum_{i=1}^n \lambda_i x_i\right) \leq \sum_{i=1}^n \lambda_i f(x_i).$$
Note that if $f$ is concave, the inequality is just reversed. Jensen’s inequality is well-known to beginning math students, especially if they’ve learned a bit of probability theory, but less well-known (my sample size is 1, namely myself) are upper bounds for the quantity
$$\sum_{i=1}^n \lambda_i f(x_i) - f\left(\sum_{i=1}^n \lambda_i x_i\right),$$
viewed as a function on the cube $I^n$. Today’s post is intended as a brief exposition of such bounds. We begin by stating the main theorem in their elementary theory, then prove a weaker form of Lagrange’s mean value theorem for nondifferentiable functions (which we will need in the proof of that theorem), and conclude by deriving some well-known inequalities, including Kantorovich’s.
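To fix ideas before any theorems, here is a quick numerical look at this quantity; the choice $f(x) = e^x$ and the sample points and weights below are arbitrary, and the snippet is only meant to illustrate the gap being bounded.

```python
# The Jensen gap for f(x) = exp(x) at a sample convex combination;
# the points and weights are arbitrary illustrative choices.
import math

f = math.exp
x = [0.0, 0.5, 2.0]
lam = [0.2, 0.3, 0.5]  # nonnegative weights summing to 1

mean = sum(l * xi for l, xi in zip(lam, x))
gap = sum(l * f(xi) for l, xi in zip(lam, x)) - f(mean)

print(round(gap, 6))   # positive, as Jensen's inequality predicts
```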
Theorem 1. Let $f \colon I \to \mathbb{R}$ be a convex function and let $[a_1, b_1], \ldots, [a_n, b_n]$ be compact subintervals of $I$. Given $\lambda_1, \ldots, \lambda_n \geq 0$ satisfying $\sum_{i=1}^n \lambda_i = 1$, the function $E \colon \prod_{i=1}^n [a_i, b_i] \to \mathbb{R}$ defined by
$$E(x_1, \ldots, x_n) = \sum_{i=1}^n \lambda_i f(x_i) - f\left(\sum_{i=1}^n \lambda_i x_i\right)$$
attains its supremum at a vertex of $\prod_{i=1}^n [a_i, b_i]$.
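Before turning to the proof, here is a minimal numerical sanity check of the statement under the arbitrary choices $f(x) = e^x$, three intervals, and fixed weights: a brute-force search over a grid in the box never exceeds the maximum over the box’s vertices.

```python
# Sanity check of Theorem 1 with f(x) = exp(x): the supremum of
#   E(x_1, ..., x_n) = sum_i lam_i f(x_i) - f(sum_i lam_i x_i)
# over a box of intervals is attained at one of the box's vertices.
# The intervals and weights below are arbitrary choices.
import itertools
import math

f = math.exp
intervals = [(0.0, 1.0), (-1.0, 2.0), (0.5, 3.0)]   # the [a_i, b_i]
lam = [0.5, 0.3, 0.2]                               # weights summing to 1

def E(x):
    mean = sum(l * xi for l, xi in zip(lam, x))
    return sum(l * f(xi) for l, xi in zip(lam, x)) - f(mean)

# Maximum of E over a fine grid in the box versus over the 2^n vertices.
grids = [[a + (b - a) * k / 40 for k in range(41)] for a, b in intervals]
grid_max = max(E(x) for x in itertools.product(*grids))
vertex_max = max(E(x) for x in itertools.product(*intervals))

print(grid_max <= vertex_max + 1e-12)   # True: no grid point beats a vertex
```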
The following lemma generalizes Lagrange’s mean value theorem for real-valued functions on an interval $I$ by replacing the derivative with the quantities
$$\underline{D}h(x) := \liminf_{y \to x} \frac{h(y) - h(x)}{y - x} \qquad \text{and} \qquad \overline{D}h(x) := \limsup_{y \to x} \frac{h(y) - h(x)}{y - x},$$
respectively called the lower and upper derivatives of $h$ at the point $x$, which exist (as extended real numbers) everywhere on $I$.
Lemma 2. Let $h \colon [a, b] \to \mathbb{R}$ be a continuous function. Then there exists a point $c \in (a, b)$ such that
$$\underline{D}h(c) \leq \frac{h(b) - h(a)}{b - a} \leq \overline{D}h(c).$$
Proof. Define a function $H \colon [a, b] \to \mathbb{R}$ by
$$H(x) := h(x) - \frac{h(b) - h(a)}{b - a}(x - a).$$
$H$ is continuous, being a sum of continuous functions, so by Weierstrass’ theorem, $H$ attains its infimum and supremum on $[a, b]$. Suppose $H$ attains its supremum at some point $c \in (a, b)$. Then for $t > 0$ small,
$$\frac{H(c + t) - H(c)}{t} \leq 0 \leq \frac{H(c - t) - H(c)}{-t},$$
which implies that $\underline{D}H(c) \leq 0 \leq \overline{D}H(c)$. If $H$ attains its infimum at $c \in (a, b)$, then reversing the preceding inequalities yields the same result. In either case, we have
$$\underline{D}h(c) = \underline{D}H(c) + \frac{h(b) - h(a)}{b - a} \leq \frac{h(b) - h(a)}{b - a},$$
and similarly, $\overline{D}h(c) = \overline{D}H(c) + \frac{h(b) - h(a)}{b - a} \geq \frac{h(b) - h(a)}{b - a}$. If $H$ attains its supremum and infimum at the endpoints, then, since $H(a) = H(b) = h(a)$, $H$ is necessarily constant, and the conclusion of the lemma is immediate.
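As a concrete illustration of the lemma with a nondifferentiable function, take $h(x) = |x|$ on $[-1, 2]$: the secant slope is $\frac{h(2) - h(-1)}{3} = \frac{1}{3}$, while at $c = 0$ the lower and upper derivatives are $-1$ and $1$. The sketch below (the small step sizes are arbitrary) checks this numerically.

```python
# Lemma 2 for the nondifferentiable function h(x) = |x| on [-1, 2]:
# the secant slope lies between the lower and upper derivatives at c = 0.
h = abs
a, b, c = -1.0, 2.0, 0.0

secant = (h(b) - h(a)) / (b - a)                    # (2 - 1) / 3 = 1/3

steps = [-1e-3, -5e-4, 5e-4, 1e-3]                  # small two-sided increments
quotients = [(h(c + t) - h(c)) / t for t in steps]  # difference quotients at c
lower, upper = min(quotients), max(quotients)       # here exactly -1 and 1

print(lower <= secant <= upper)                     # True
```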
We now use the lemma to prove the theorem.
Proof. Fix compact subintervals $[a_1, b_1], \ldots, [a_n, b_n] \subset I$. Since $f$ is convex, it is continuous; since $\prod_{i=1}^n [a_i, b_i]$ is compact, $E$ attains its supremum and infimum on $\prod_{i=1}^n [a_i, b_i]$. By induction, it suffices to show that
$$E(x_1, \ldots, x_k, \ldots, x_n) \leq \max\left\{E(x_1, \ldots, a_k, \ldots, x_n),\ E(x_1, \ldots, b_k, \ldots, x_n)\right\}$$
for all $(x_1, \ldots, x_n) \in \prod_{i=1}^n [a_i, b_i]$ and all $1 \leq k \leq n$.
Assume the contrary: that there exist $(x_1, \ldots, x_n) \in \prod_{i=1}^n [a_i, b_i]$ and an index $k$ such that
$$E(x_1, \ldots, x_k, \ldots, x_n) > \max\left\{E(x_1, \ldots, a_k, \ldots, x_n),\ E(x_1, \ldots, b_k, \ldots, x_n)\right\}.$$
Note that this assumption forces $x_k \in (a_k, b_k)$ and $0 < \lambda_k < 1$. By symmetry, we may assume that $k = 1$. Consider the function $\varphi \colon [a_1, b_1] \to \mathbb{R}$ defined by
$$\varphi(t) := E(t, x_2, \ldots, x_n) = \lambda_1 f(t) + \sum_{i=2}^n \lambda_i f(x_i) - f\left(\lambda_1 t + c\right), \qquad c := \sum_{i=2}^n \lambda_i x_i.$$
By the lemma, there exists $\xi \in (a_1, x_1)$ such that
$$\overline{D}\varphi(\xi) \geq \frac{\varphi(x_1) - \varphi(a_1)}{x_1 - a_1}.$$
Since $\varphi(x_1) > \varphi(a_1)$, it follows that $\overline{D}\varphi(\xi) > 0$, and therefore one of the one-sided derivatives of $f$ at $\xi$ is strictly greater than the corresponding one-sided derivative of $f$ at $\lambda_1 \xi + c$. By the convexity of $f$, each one-sided derivative of $f$ is nondecreasing on $I$, with $f'_+(u) \leq f'_-(v)$ whenever $u < v$, and therefore $\xi \geq \lambda_1 \xi + c$. Thus,
$$(1 - \lambda_1)\,\xi \geq \sum_{i=2}^n \lambda_i x_i.$$
We apply Lemma 2 to $\varphi$ on the interval $[x_1, b_1]$ and the same reasoning to obtain $\eta \in (x_1, b_1)$ such that $\underline{D}\varphi(\eta) < 0$, whence
$$(1 - \lambda_1)\,\eta \leq \sum_{i=2}^n \lambda_i x_i.$$
But this is a contradiction, since $\lambda_1 < 1$ and $\xi < x_1 < \eta$.
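The mechanism of the proof is one-dimensional: with all but one coordinate frozen, the slice $t \mapsto \lambda_1 f(t) - f(\lambda_1 t + c) + \text{const}$ must peak at an endpoint. Here is a minimal sketch of that slice property; the choices $f(x) = x^2$, $\lambda_1 = 0.4$, $c = 0.7$, and $[a, b] = [-1, 2]$ are arbitrary.

```python
# The one-variable reduction in the proof: for convex f, the slice
#   phi(t) = lam * f(t) - f(lam * t + c)
# attains its maximum over [a, b] at an endpoint.
# f, lam, c, and [a, b] below are arbitrary illustrative choices.
lam, c = 0.4, 0.7
a, b = -1.0, 2.0

def f(x):
    return x * x

def phi(t):
    return lam * f(t) - f(lam * t + c)

grid = [a + (b - a) * k / 1000 for k in range(1001)]
grid_max = max(phi(t) for t in grid)
endpoint_max = max(phi(a), phi(b))

print(grid_max <= endpoint_max + 1e-12)   # True
```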
If we impose the additional hypothesis that $f$ is differentiable on an interval $[m, M] \subseteq I$ containing the points $x_1, \ldots, x_n$, then we have the inequality
$$\sum_{i=1}^n \lambda_i f(x_i) - f\left(\sum_{i=1}^n \lambda_i x_i\right) \leq \frac{1}{4}(M - m)\left(f'(M) - f'(m)\right).$$
Proof. By Theorem 1, applied with $[a_i, b_i] = [m, M]$ for every $i$, the function
$$E(x_1, \ldots, x_n) = \sum_{i=1}^n \lambda_i f(x_i) - f\left(\sum_{i=1}^n \lambda_i x_i\right)$$
attains its maximum at a point $(y_1, \ldots, y_n)$ with each $y_i \in \{m, M\}$. Let $S$ denote the subset of indices $i$ such that $y_i = M$, and set $s := \sum_{i \in S} \lambda_i$ and $\bar{y} := sM + (1 - s)m$. Then
$$E(y_1, \ldots, y_n) = s\left(f(M) - f(\bar{y})\right) + (1 - s)\left(f(m) - f(\bar{y})\right).$$
Since $f$ is convex and differentiable, we have
$$f(M) - f(\bar{y}) \leq f'(M)(M - \bar{y}) = (1 - s)(M - m) f'(M), \qquad f(m) - f(\bar{y}) \leq f'(m)(m - \bar{y}) = -s(M - m) f'(m),$$
and hence $E(y_1, \ldots, y_n) \leq s(1 - s)(M - m)\left(f'(M) - f'(m)\right)$. The conclusion follows from noting that $s(1 - s) \leq \frac{1}{4}$ for $s \in [0, 1]$.
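A quick numerical check of this bound, under the arbitrary choices $f(x) = e^x$ on $[m, M] = [0, 2]$ and randomly generated points and weights:

```python
# Numerical check of the bound
#   sum_i lam_i f(x_i) - f(sum_i lam_i x_i) <= (1/4)(M - m)(f'(M) - f'(m))
# for the differentiable convex choice f(x) = exp(x) on [m, M] = [0, 2].
# The random trial data are arbitrary.
import math
import random

random.seed(0)
f = fprime = math.exp
m, M = 0.0, 2.0
bound = 0.25 * (M - m) * (fprime(M) - fprime(m))

ok = True
for _ in range(1000):
    n = random.randint(2, 6)
    x = [random.uniform(m, M) for _ in range(n)]
    w = [random.random() for _ in range(n)]
    total = sum(w)
    lam = [wi / total for wi in w]
    mean = sum(l * xi for l, xi in zip(lam, x))
    gap = sum(l * f(xi) for l, xi in zip(lam, x)) - f(mean)
    ok = ok and gap <= bound + 1e-12

print(ok)   # True: every sampled gap stays below the bound
```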
Theorem 1 also yields the following simple corollary.
Corollary 3. If $f \colon [a, b] \to \mathbb{R}$ is a convex function, then
$$\sum_{i=1}^n \lambda_i f(x_i) - f\left(\sum_{i=1}^n \lambda_i x_i\right) \leq \max_{s \in [0, 1]}\left[s f(a) + (1 - s) f(b) - f\left(s a + (1 - s) b\right)\right]$$
for all $x_1, \ldots, x_n \in [a, b]$ and all convex combinations $\lambda_1, \ldots, \lambda_n$.
Proof. We apply Theorem 1 with $a_i = a$ and $b_i = b$ for every $i$ and the given weights $\lambda_1, \ldots, \lambda_n$. The function
$$E(x_1, \ldots, x_n) = \sum_{i=1}^n \lambda_i f(x_i) - f\left(\sum_{i=1}^n \lambda_i x_i\right)$$
attains its supremum at a vertex of the cube $[a, b]^n$. Since the right-hand side of the asserted inequality is nonnegative by Jensen’s inequality, we see that the conclusion of the corollary is immediate if the supremum is attained at $(a, \ldots, a)$ or $(b, \ldots, b)$, where $E$ vanishes. If the supremum is attained at any other vertex, then the conclusion follows from noting that the value of $E$ at that vertex equals $s f(a) + (1 - s) f(b) - f(s a + (1 - s) b)$ with $s = \sum_{i : x_i = a} \lambda_i \in [0, 1]$ by hypothesis.
We can use Theorem 1 to prove an upper bound on the error in Jensen’s inequality, which Niculescu and Persson attribute to L.G. Khanin.
Proposition 4. Let $p > 1$, $x_1, \ldots, x_n \in [0, M]$, and $\lambda_1, \ldots, \lambda_n \geq 0$ satisfying $\sum_{i=1}^n \lambda_i = 1$. Then
$$\sum_{i=1}^n \lambda_i x_i^p - \left(\sum_{i=1}^n \lambda_i x_i\right)^p \leq (p - 1)\, p^{-p/(p - 1)} M^p.$$
Proof. Fix $p > 1$ and $M > 0$, and define $E \colon [0, M]^n \to \mathbb{R}$ by
$$E(x_1, \ldots, x_n) := \sum_{i=1}^n \lambda_i x_i^p - \left(\sum_{i=1}^n \lambda_i x_i\right)^p.$$
By Theorem 1, applied to the convex function $f(x) = x^p$, $E$ attains its supremum at a point $(y_1, \ldots, y_n)$ with each $y_i \in \{0, M\}$. Let $S$ denote the subset of indices $i$ such that $y_i = M$, and set $s := \sum_{i \in S} \lambda_i$. Then
$$E(y_1, \ldots, y_n) = s M^p - (s M)^p = M^p\left(s - s^p\right).$$
We can use calculus to compute $\max_{0 \leq s \leq 1}(s - s^p)$. Indeed, define a function $g \colon [0, 1] \to \mathbb{R}$ by $g(s) := s - s^p$. Then
$$g'(s) = 1 - p s^{p - 1} > 0$$
for $0 \leq s < p^{-1/(p - 1)}$, and $g'(s) < 0$ for $p^{-1/(p - 1)} < s \leq 1$. Hence, $s = p^{-1/(p - 1)}$ maximizes $g$, and we conclude that
$$\sum_{i=1}^n \lambda_i x_i^p - \left(\sum_{i=1}^n \lambda_i x_i\right)^p \leq M^p\, g\left(p^{-1/(p - 1)}\right) = (p - 1)\, p^{-p/(p - 1)} M^p.$$
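Finally, a minimal numerical check of Proposition 4, under the arbitrary choices $p = 3$, $M = 2$, and randomly generated data:

```python
# Numerical check of Proposition 4 with p = 3 and M = 2:
#   sum_i lam_i x_i**p - (sum_i lam_i x_i)**p <= (p - 1) * p**(-p/(p - 1)) * M**p.
# The random trial data are arbitrary.
import random

random.seed(1)
p, M = 3.0, 2.0
bound = (p - 1) * p ** (-p / (p - 1)) * M ** p

worst = 0.0
for _ in range(1000):
    n = random.randint(2, 6)
    x = [random.uniform(0.0, M) for _ in range(n)]
    w = [random.random() for _ in range(n)]
    total = sum(w)
    lam = [wi / total for wi in w]
    mean = sum(l * xi for l, xi in zip(lam, x))
    gap = sum(l * xi ** p for l, xi in zip(lam, x)) - mean ** p
    worst = max(worst, gap)

print(worst <= bound)   # True: the worst sampled gap stays below the bound
```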