Inverse Jensen Inequality

Lately, I’ve been on a bit of a convexity kick. It’s one of many subjects that I never studied in detail as a student, despite being acquainted with its importance. Not constrained by the curricula of offered courses, I have the freedom to indulge my mathematical curiosity. As one might sense from my recent posts, this curiosity is currently gorging itself on the basic theory of convex functions.

I also really like inequalities. So it’s not surprising that, combining convexity with the study of inequalities, I might want to write about Jensen’s inequality. Indeed, this is the case.

Jensen’s inequality (in its discrete form) says that given real numbers a_{1},\cdots,a_{n}\in I, a convex function f: I\rightarrow\mathbb{R}, and weights \lambda_{1},\cdots,\lambda_{n}\in [0,1] with \sum_{k=1}^{n}\lambda_{k}=1,

\displaystyle f\left(\sum_{k=1}^{n}\lambda_{k}a_{k}\right)\leq\sum_{k=1}^{n}\lambda_{k}f(a_{k})
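
For a concrete instance, taking f(x)=x^{2} and \lambda_{k}=\frac{1}{n} recovers the familiar inequality between the square of the mean and the mean of the squares:

\displaystyle\left(\dfrac{1}{n}\sum_{k=1}^{n}a_{k}\right)^{2}\leq\dfrac{1}{n}\sum_{k=1}^{n}a_{k}^{2}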

Note that if f is concave, the inequality is just reversed. Jensen’s inequality is well known to beginning math students, especially if they’ve learned a bit of probability theory, but less well known (my sample size is 1, namely myself) are upper bounds for the quantity

\displaystyle\sum_{k=1}^{n}\lambda_{k}f(x_{k})-f\left(\sum_{k=1}^{n}\lambda_{k}x_{k}\right)\geq 0,\indent\forall(x_{1},\cdots,x_{n})\in I^{n}

viewed as a function on the cube I^{n}. Today’s post is intended as a brief exposition of such bounds. We begin by stating the main theorem in the elementary theory of such bounds, then prove a weaker form of Lagrange’s mean value theorem for nondifferentiable functions, which we will need in the proof of the preceding theorem, and we’ll conclude by deriving some consequences, including an error estimate that Niculescu and Persson attribute to Khanin.

Theorem 1. Let f:[a,b]\rightarrow\mathbb{R} be a convex function and let [m_{1},M_{1}],\cdots,[m_{n},M_{n}] be compact subintervals of [a,b]. Given \lambda_{1},\cdots,\lambda_{n}\in[0,1] satisfying \sum_{k=1}^{n}\lambda_{k}=1, the function E:[a,b]^{n}\rightarrow\mathbb{R} defined by

\displaystyle E(x_{1},\cdots,x_{n}):=\sum_{k=1}^{n}\lambda_{k}f(x_{k})-f\left(\sum_{k=1}^{n}\lambda_{k}x_{k}\right)

attains its supremum at a vertex of \Omega:=\prod_{k=1}^{n}[m_{k},M_{k}].

The following lemma generalizes Lagrange’s mean value theorem for real-valued functions on an interval I by replacing the derivative with the quantities

\displaystyle\underline{D}f(a)=\liminf_{x\rightarrow a}\dfrac{f(x)-f(a)}{x-a}\text{ and }\overline{D}f(a)=\limsup_{x\rightarrow a}\dfrac{f(x)-f(a)}{x-a},

respectively called the lower and upper derivatives of f at the point a, which exist (possibly as extended real numbers) at every point of I.

Lemma 2. Let h: [a,b]\rightarrow\mathbb{R} be a continuous function. Then there exists a point c\in (a,b) such that

\displaystyle\underline{D}h(c)\leq\dfrac{h(b)-h(a)}{b-a}\leq\overline{D}h(c)
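
To see why the conclusion is stated with lower and upper derivatives rather than with h'(c), consider the nondifferentiable example h(x)=|x| on [-1,2]: the difference quotient over the whole interval is \frac{h(2)-h(-1)}{2-(-1)}=\frac{1}{3}, a value that h' never takes, but at c=0 we have \underline{D}h(0)=-1\leq\frac{1}{3}\leq 1=\overline{D}h(0).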

Proof. Define a function H: [a,b]\rightarrow\mathbb{R} by

\displaystyle H(x):=h(x)-\dfrac{h(b)-h(a)}{b-a}(x-a),\indent\forall x\in[a,b]

Observe that

\displaystyle H(a)=h(a)=h(b)-\dfrac{h(b)-h(a)}{b-a}(b-a)=H(b)

H is continuous, being the difference of continuous functions, so by Weierstrass’ theorem, H attains its infimum and supremum on [a,b]. Suppose H attains its supremum at some point c\in (a,b). Then for \delta>0 small,

\displaystyle\dfrac{H(c+\delta)-H(c)}{\delta}\leq 0\text{ and }\dfrac{H(c-\delta)-H(c)}{-\delta}\geq 0,

which implies that \underline{D}H(c)\leq 0\leq\overline{D}H(c). If H attains its infimum at c \in(a,b), then reversing the preceding inequalities yields the same result. In either case, we have

\begin{array}{lcl} \displaystyle 0\geq\underline{D}H(c)=\liminf_{x\rightarrow c}\dfrac{H(x)-H(c)}{x-c}&=&\displaystyle\liminf_{x\rightarrow c}\dfrac{h(x)-\frac{h(b)-h(a)}{b-a}(x-a)-h(c)+\frac{h(b)-h(a)}{b-a}(c-a)}{x-c}\\&=&\displaystyle\liminf_{x\rightarrow c}\left[\dfrac{h(x)-h(c)}{x-c}-\dfrac{h(b)-h(a)}{b-a}\right]\\&=&\displaystyle\underline{D}h(c)-\dfrac{h(b)-h(a)}{b-a}\end{array}

so that \underline{D}h(c)\leq\dfrac{h(b)-h(a)}{b-a}; similarly, the inequality \overline{D}H(c)\geq 0 yields \overline{D}h(c)\geq\dfrac{h(b)-h(a)}{b-a}. If H attains its supremum and infimum at the endpoints, then H is necessarily constant, so \underline{D}H(c)=\overline{D}H(c)=0 for every c\in(a,b), and the same computation gives the conclusion of the lemma. \Box

We now use the lemma to prove the theorem.

Proof. Fix compact subintervals [m_{1},M_{1}],\cdots,[m_{n},M_{n}]. Since f is convex, it is continuous, and hence so is E; since \Omega:=\prod_{k=1}^{n}[m_{k},M_{k}] is compact, E attains its supremum on \Omega. Arguing by induction on the coordinates, it suffices to show that

\displaystyle E(x_{1},\cdots,x_{n})\leq \max\left\{E(x_{1},\cdots,m_{k},\cdots,x_{n}),E(x_{1},\cdots,M_{k},\cdots,x_{n})\right\}

for all (x_{1},\cdots,x_{n})\in\Omega and 1\leq k\leq n.

Assume the contrary: that there exists (x_{1},\cdots,x_{n})\in\Omega and k\in\left\{1,\cdots,n\right\} such that

\displaystyle E(x_{1},\cdots,x_{n})>\max\left\{E(x_{1},\cdots,m_{k},\cdots,x_{n}),E(x_{1},\cdots,M_{k},\cdots,x_{n})\right\}

Note that this assumption forces \lambda_{k}\in(0,1). By symmetry, we may assume that k=1. Consider the function

\displaystyle h:[m_{1},M_{1}]\rightarrow\mathbb{R},\indent h(x):=E(x,x_{2},\cdots,x_{n})\ \forall x\in [m_{1},M_{1}]

Note that the assumption also forces x_{1}\in(m_{1},M_{1}), for otherwise (x_{1},\cdots,x_{n}) would coincide with one of the two points on the right-hand side. Applying the lemma to h|_{[m_{1},x_{1}]}, we obtain \xi\in (m_{1},x_{1}) such that

\displaystyle\dfrac{h(x_{1})-h(m_{1})}{x_{1}-m_{1}}\leq \overline{D}h(\xi)\Longleftrightarrow h(x_{1})-h(m_{1})\leq\overline{D}h(\xi)(x_{1}-m_{1})

Since h(x_{1})>h(m_{1}), it follows that \overline{D}h(\xi)>0 and therefore

\begin{array}{lcl}\displaystyle 0&<&\displaystyle\limsup_{x\rightarrow \xi}\dfrac{h(x)-h(\xi)}{x-\xi}=\limsup_{x\rightarrow\xi}\dfrac{\lambda_{1}\left[f(x)-f(\xi)\right]-f\left(\lambda_{1}x+\sum_{k=2}^{n}\lambda_{k}x_{k}\right)+f\left(\lambda_{1}\xi+\sum_{k=2}^{n}\lambda_{k}x_{k}\right)}{x-\xi}\\&\Longrightarrow&\displaystyle\lambda_{1}\limsup_{x\rightarrow\xi}\dfrac{f(x)-f(\xi)}{x-\xi}>\limsup_{x\rightarrow\xi}\dfrac{f\left(\lambda_{1}x+\sum_{k=2}^{n}\lambda_{k}x_{k}\right)-f\left(\lambda_{1}\xi+\sum_{k=2}^{n}\lambda_{k}x_{k}\right)}{x-\xi}\\&\Longleftrightarrow&\displaystyle\overline{D}f(\xi)>\limsup_{x\rightarrow\xi}\dfrac{f\left(\lambda_{1}x+\sum_{k=2}^{n}\lambda_{k}x_{k}\right)-f\left(\lambda_{1}\xi+\sum_{k=2}^{n}\lambda_{k}x_{k}\right)}{\lambda_{1}x-\lambda_{1}\xi}\\&\Longleftrightarrow&\displaystyle\overline{D}f(\xi)>\overline{D}f\left(\lambda_{1}\xi+\lambda_{2}x_{2}+\cdots+\lambda_{n}x_{n}\right)\end{array}

By the convexity of f, \overline{D}f (and likewise \underline{D}f) is nondecreasing on (a,b), and therefore \xi>\lambda_{1}\xi+\lambda_{2}x_{2}+\cdots+\lambda_{n}x_{n}. Thus,

\displaystyle\xi=\sum_{k=1}^{n}\lambda_{k}\xi>\lambda_{1}\xi+\sum_{k=2}^{n}\lambda_{k}x_{k}\Longleftrightarrow\xi>\dfrac{\sum_{k=2}^{n}\lambda_{k}x_{k}}{\sum_{k=2}^{n}\lambda_{k}}

We apply Lemma 2 to h|_{[x_{1},M_{1}]} and the same reasoning as above (with limits inferior in place of limits superior) to obtain \eta\in (x_{1},M_{1}) such that \underline{D}h(\eta)<0 and

\displaystyle\underline{D}f(\eta)<\underline{D}f(\lambda_{1}\eta+\lambda_{2}x_{2}+\cdots+\lambda_{n}x_{n})\Longrightarrow \eta<\dfrac{\sum_{k=2}^{n}\lambda_{k}x_{k}}{\sum_{k=2}^{n}\lambda_{k}}

But this is a contradiction, since \eta>x_{1}>\xi. \Box
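
Though not part of the proof, Theorem 1 is easy to test numerically. The following sketch (with an arbitrarily chosen convex function, weights, and subintervals, none of which come from the original argument) compares E at randomly sampled points of \Omega against its values at the vertices.

```python
import itertools
import math
import random

# Arbitrary illustrative choices: a convex f, weights summing to 1, and subintervals [m_k, M_k].
f = math.exp
lambdas = [0.2, 0.3, 0.5]
intervals = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]

def E(xs):
    """Jensen gap E(x_1, ..., x_n) = sum_k lambda_k f(x_k) - f(sum_k lambda_k x_k)."""
    return sum(l * f(x) for l, x in zip(lambdas, xs)) - f(sum(l * x for l, x in zip(lambdas, xs)))

# Maximum of E over the 2^n vertices of Omega.
vertex_max = max(E(v) for v in itertools.product(*intervals))

# E at randomly sampled points of Omega never exceeds the vertex maximum.
for _ in range(100_000):
    xs = [random.uniform(m, M) for m, M in intervals]
    assert E(xs) <= vertex_max + 1e-12
print("maximum of E over the vertices:", vertex_max)
```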

If we impose the additional hypothesis that f is differentiable on [a,b], then for all x_{1},\cdots,x_{n}\in[a,b] and \lambda_{1},\cdots,\lambda_{n}\in[0,1] with \sum_{k=1}^{n}\lambda_{k}=1, we have the inequality

\displaystyle\sum_{k=1}^{n}\lambda_{k}f(x_{k})-f\left(\sum_{k=1}^{n}\lambda_{k}x_{k}\right)\leq\dfrac{(b-a)(f'(b)-f'(a))}{4}

Proof. By Theorem 1, applied with m_{k}=a and M_{k}=b for every k, the function

\displaystyle E(x_{1},\cdots,x_{n}):=\sum_{k=1}^{n}\lambda_{k}f(x_{k})-f\left(\sum_{k=1}^{n}\lambda_{k}x_{k}\right)

attains its maximum at a point (c_{1},\cdots,c_{n})\in\left\{a,b\right\}^{n}. Let I\subset\left\{1,\cdots,n\right\} denote the subset of indices k such that c_{k}=a. Then

\begin{array}{lcl}\displaystyle\sum_{k=1}^{n}\lambda_{k}f(x_{k})-f\left(\sum_{k=1}^{n}\lambda_{k}x_{k}\right)&\leq&\displaystyle\sum_{k\in I}\lambda_{k}f(a)+\sum_{k\in I^{c}}\lambda_{k}f(b)-f\left(\sum_{k\in I}\lambda_{k}a+\sum_{k\in I^{c}}\lambda_{k}b\right)\\[.9em]&=&\displaystyle f(a)\underbrace{\sum_{k\in I}\lambda_{k}}_{:=s}+f(b)\sum_{k\in I^{c}}\lambda_{k}-f\left(a\sum_{k\in I}\lambda_{k}+b\sum_{k\in I^{c}}\lambda_{k}\right)\\[.9em]&=&\displaystyle sf(a)+(1-s)f(b)-f(sa+(1-s)b)\end{array}

Since f is convex and differentiable, we have

\displaystyle f(x)\geq f(t)+(x-t)f'(t),\indent\forall x,t\in [a,b]

In particular,

\displaystyle f(sa+(1-s)b)\geq f(a)+(1-s)(b-a)f'(a),\indent f(sa+(1-s)b)\geq f(b)-s(b-a)f'(b)

Hence,

\begin{array}{lcl} \displaystyle sf(a)+(1-s)f(b)-f(sa+(1-s)b)&=&\displaystyle s\left[f(a)-f(sa+(1-s)b)\right]+(1-s)\left[f(b)-f(sa+(1-s)b)\right]\\&\leq&\displaystyle -s(1-s)(b-a)f'(a)+(1-s)s(b-a)f'(b)\\&=&\displaystyle s(1-s)(b-a)[f'(b)-f'(a)]\end{array}

The conclusion follows from noting that s(1-s)\leq \frac{1}{4} for s\in [0,1]. \Box
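
For example, with f(x)=x^{2} on [a,b] the bound specializes to

\displaystyle\sum_{k=1}^{n}\lambda_{k}x_{k}^{2}-\left(\sum_{k=1}^{n}\lambda_{k}x_{k}\right)^{2}\leq\dfrac{(b-a)\cdot 2(b-a)}{4}=\dfrac{(b-a)^{2}}{2}

The left-hand side is the variance of a random variable taking the value x_{k} with probability \lambda_{k}; the largest variance attainable by a random variable supported in [a,b] is \frac{(b-a)^{2}}{4} (put mass \frac{1}{2} at each endpoint), so the inequality holds, though it is not sharp for this particular f.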

Theorem 1 also yields the following simple corollary.

Corollary 3. If f: [a,b]\rightarrow\mathbb{R} is a convex function, then

\displaystyle\dfrac{f(a)+f(b)}{2}-f\left(\dfrac{a+b}{2}\right)\geq\dfrac{f(c)+f(d)}{2}-f\left(\dfrac{c+d}{2}\right)

for all a\leq c\leq d \leq b.

Proof. We apply Theorem 1 with m_{1}=m_{2}=a and M_{1}=M_{2}=b and \lambda_{1}=\lambda_{2}=\frac{1}{2}. The function

\displaystyle E(x_{1},x_{2}):=\dfrac{f(x_{1})+f(x_{2})}{2}-f\left(\dfrac{x_{1}+x_{2}}{2}\right)

attains its supremum at a vertex of the square [a,b]^{2}. At the vertices (a,a) and (b,b) we have E=0, while E(a,b)=E(b,a) is exactly the left-hand side of the corollary; since E\geq 0 by Jensen’s inequality, the supremum over [a,b]^{2} equals E(a,b). Because c,d\in[a,b] by hypothesis, the right-hand side is E(c,d)\leq E(a,b), which proves the claim. \Box
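
As a quick illustration, take f(x)=x^{2}. Both sides of the corollary can be computed explicitly:

\displaystyle\dfrac{a^{2}+b^{2}}{2}-\left(\dfrac{a+b}{2}\right)^{2}=\dfrac{(b-a)^{2}}{4}\geq\dfrac{(d-c)^{2}}{4}=\dfrac{c^{2}+d^{2}}{2}-\left(\dfrac{c+d}{2}\right)^{2},

since [c,d]\subset[a,b] forces d-c\leq b-a.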

We can use Theorem 1 to prove an upper bound on the error in Jensen’s inequality, which Niculescu and Persson attribute to L.G. Khanin.

Proposition 4. Let p>1, x_{1},\cdots,x_{n}\in[0,M], and \lambda_{1},\cdots,\lambda_{n}\in[0,1] satisfying \sum_{k=1}^{n}\lambda_{k}=1. Then

\displaystyle\sum_{k=1}^{n}\lambda_{k}x_{k}^{p}\leq\left(\sum_{k=1}^{n}\lambda_{k}x_{k}\right)^{p}+(p-1)p^{\frac{p}{1-p}}M^{p}

Proof. Fix p>1 and define E:[0,M]^{n}\rightarrow\mathbb{R} by

\displaystyle E(x_{1},\cdots,x_{n}):=\sum_{k=1}^{n}\lambda_{k}x_{k}^{p}-\left(\sum_{k=1}^{n}\lambda_{k}x_{k}\right)^{p},\indent\forall(x_{1},\cdots,x_{n})\in[0,M]^{n}

By Theorem 1, E attains its supremum at a point (c_{1},\cdots,c_{n})\in\left\{0,M\right\}^{n}. Let I\subset\left\{1,\cdots,n\right\} denote the subset of indices k such that c_{k}=M. Then

\begin{array}{lcl}\displaystyle E(x_{1},\cdots,x_{n})\leq E(c_{1},\cdots,c_{n})=\sum_{k\in I}\lambda_{k}c_{k}^{p}-\left(\sum_{k\in I}\lambda_{k}c_{k}\right)^{p}&=&\displaystyle\sum_{k\in I}\lambda_{k}M^{p}-\left(\sum_{k\in I}\lambda_{k}M\right)^{p}\\[.9em]&=&\displaystyle M^{p}\underbrace{\sum_{k\in I}\lambda_{k}}_{:=s}-M^{p}\left(\sum_{k\in I}\lambda_{k}\right)^{p}\\[.9em]&=&\displaystyle M^{p}(s-s^{p})\leq M^{p}\sup_{t\in [0,1]}(t-t^{p})\end{array}

We can use calculus to compute \sup_{t\in [0,1]}(t-t^{p}). Indeed, define a function g:[0,1]\rightarrow\mathbb{R} by g(t):=t-t^{p}. Then

\displaystyle g'(t)=1-pt^{p-1}=0\Longleftrightarrow t=p^{\frac{1}{1-p}},

g'(t)>0 for t\in [0,p^{\frac{1}{1-p}}), and g'(t)<0 for t\in (p^{\frac{1}{1-p}},1]. Hence, t=p^{\frac{1}{1-p}} maximizes g, and we conclude that

\displaystyle E(x_{1},\cdots,x_{n})\leq M^{p}\left(p^{\frac{1}{1-p}}-p^{\frac{p}{1-p}}\right)=M^{p}p^{\frac{p}{1-p}}\left(p-1\right)

\Box
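
As with Theorem 1, Proposition 4 lends itself to a quick numerical sanity check. The sketch below (a test harness of my own, not from Niculescu and Persson) samples random exponents, weights, and points in [0,M] and verifies that the inequality is never violated.

```python
import random

def khanin_gap(p, M, xs, lambdas):
    """Left-hand side minus right-hand side of Proposition 4; should be <= 0."""
    lhs = sum(l * x**p for l, x in zip(lambdas, xs))
    rhs = sum(l * x for l, x in zip(lambdas, xs)) ** p + (p - 1) * p ** (p / (1 - p)) * M**p
    return lhs - rhs

for _ in range(100_000):
    p = random.uniform(1.01, 5.0)
    M = random.uniform(0.1, 10.0)
    n = random.randint(1, 6)
    xs = [random.uniform(0.0, M) for _ in range(n)]
    weights = [random.random() for _ in range(n)]
    lambdas = [w / sum(weights) for w in weights]  # normalize to a convex combination
    assert khanin_gap(p, M, xs, lambdas) <= 1e-8
print("no violations found")
```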
