Convexity, Continuity, and Jensen’s Inequality

Convexity is an important notion in mathematics with many applications to real-world problems. In this post, I would like to prove a few basic results for convex functions on Euclidean space. Recall that a subset E \subset\mathbb{R}^{d} is said to be convex if, for all x,y \in E,

\displaystyle \lambda x + (1-\lambda)y \in E, \indent \forall \lambda \in [0,1]

Suppose f: E \rightarrow \mathbb{R} is a real-valued function defined on a convex subset E. We say that f is a convex function if, for all x,y \in E,

\displaystyle f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda)f(y),\indent \forall \lambda \in [0,1]

If the preceding inequality holds for the special case \lambda = \frac{1}{2}, then we say that f is midpoint convex. There are a number of interesting things we can say about convex functions. One is that every local minimum is a global minimum. Indeed, suppose x_{0} is a local minimum of a convex function f: E \rightarrow \mathbb{R}. Then there exists an open ball B(x_{0}; \delta) such that f(x_{0}) \leq f(y) for all \left|y-x_{0}\right| < \delta. Fix y \in E, and choose \lambda \in (0,1] small enough that x_{0} + \lambda(y-x_{0}) \in B(x_{0};\delta). By convexity,

\displaystyle f\left(x_{0} + \lambda(y-x_{0})\right) \leq (1-\lambda)f(x_{0}) + \lambda f(y) = f(x_{0}) + \lambda\left(f(y) - f(x_{0})\right)

and therefore, since x_{0} + \lambda(y-x_{0}) \in B(x_{0};\delta),

\displaystyle \lambda f(x_{0}) \leq \underbrace{\left[f(x_{0}) - f(x_{0}+\lambda(y-x_{0}))\right]}_{\leq 0} + \lambda f(y)

Dividing both sides by \lambda and using that the bracketed term is nonpositive (by local minimality of x_{0}), we conclude f(x_{0}) \leq f(y), which completes the proof. Furthermore, any increasing convex function of a convex function is also convex. Indeed, suppose \varphi: \mathbb{R} \rightarrow \mathbb{R} is increasing and convex, and f : E\rightarrow \mathbb{R} is convex. By convexity of f, for \lambda \in [0,1],

\displaystyle f(\lambda x + (1-\lambda)y)\leq\lambda f(x)+(1-\lambda)f(y)

and since \varphi is increasing and convex, we conclude that

\displaystyle \varphi\left(f(\lambda x + (1-\lambda)y)\right) \leq \varphi\left(\lambda f(x)+(1-\lambda)f(y)\right)\leq\lambda\varphi(f(x))+(1-\lambda)\varphi(f(y))
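As a quick numerical sanity check (an illustration, not a proof), the two inequalities above can be tested on random samples. The choices f(x) = x^{2} and \varphi(t) = e^{t}, and the sampling ranges, are my own illustrative picks; \varphi is increasing and convex, so \varphi \circ f should also satisfy the convexity inequality.

```python
import math
import random

# Illustrative choices (not from the post): f(x) = x^2 is convex,
# phi(t) = e^t is increasing and convex, so phi(f(x)) = e^{x^2} should
# also satisfy the convexity inequality on sampled points.
f = lambda x: x * x
phi = math.exp

random.seed(0)
for _ in range(10_000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    lam = random.random()
    z = lam * x + (1 - lam) * y
    # f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y), up to float slack
    assert f(z) <= lam * f(x) + (1 - lam) * f(y) + 1e-9
    # the same inequality for the composition phi o f (relative slack,
    # since e^{x^2} can be large)
    rhs = lam * phi(f(x)) + (1 - lam) * phi(f(y))
    assert phi(f(z)) <= rhs * (1 + 1e-9) + 1e-9
print("convexity inequalities hold on all sampled points")
```

A failed assertion here would indicate a counterexample; none occurs, as both inequalities hold exactly in real arithmetic.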

We now restrict ourselves to convex functions defined on a convex subset of the real line. Any convex function f defined on an interval (a,b) is continuous; this follows from the monotonicity of difference quotients established below, which shows that the slopes of chords of f are locally bounded, so that f is locally Lipschitz. Note first that the convexity inequality can be rewritten as

\displaystyle f(y + \lambda(x-y)) \leq \lambda f(x) + (1-\lambda)f(y) = f(y) + \lambda(f(x) - f(y))

If we have a convex function f: (a,b) \rightarrow \mathbb{R}, then we can define a function R: \left\{(x,y) : x,y \in (a,b), x \neq y\right\} \rightarrow \mathbb{R} by

\displaystyle R(x,y) :=\frac{f(x) - f(y)}{x-y}

Observe that R is symmetric in x and y. It turns out that f is convex if and only if R is monotonically nondecreasing in each argument when the other is fixed. Fix a < x_{1} < x_{2} < x_{3} < b. First, I claim that we can write x_{2} = tx_{1}+(1-t)x_{3}, where t \in (0,1). Indeed,

\displaystyle x_{2}-x_{3}=x_{1}t-x_{3}t \Longleftrightarrow 0<\frac{x_{3}-x_{2}}{x_{3}-x_{1}}=\frac{x_{2}-x_{3}}{x_{1}-x_{3}} = t<1

By convexity,

\displaystyle f(x_{2}) = f(tx_{1}+(1-t)x_{3})\leq tf(x_{1})+(1-t)f(x_{3})

Subtracting f(x_{3}) from both sides and dividing by x_{2}-x_{3}, which is negative (so the inequality reverses), we see that

\displaystyle \frac{f(x_{2})-f(x_{3})}{x_{2}-x_{3}} \geq\frac{f(x_{1})-f(x_{3})}{x_{1}-x_{3}}

Subtracting f(x_{1}) from both sides of the same convexity inequality and dividing by x_{2}-x_{1}, which is positive, we also obtain the inequality

\displaystyle \frac{f(x_{2}) -f(x_{1})}{x_{2}-x_{1}}\leq \frac{f(x_{1}) -f(x_{3})}{x_{1}-x_{3}} = \frac{f(x_{3})-f(x_{1})}{x_{3}-x_{1}}

Retracing our steps, we see that the preceding inequalities also give a sufficient condition for f to be convex, which completes the proof.
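The two slope inequalities just derived combine to give R(x_{1},x_{2}) \leq R(x_{1},x_{3}) \leq R(x_{2},x_{3}) for x_{1} < x_{2} < x_{3}, and this can be sanity-checked numerically (a check, not a proof). The choice f = \exp and the sampling range are my own illustrative assumptions.

```python
import math
import random

def R(f, x, y):
    # difference quotient R(x, y) = (f(x) - f(y)) / (x - y), x != y
    return (f(x) - f(y)) / (x - y)

random.seed(1)
f = math.exp  # an arbitrary smooth convex function for illustration
for _ in range(10_000):
    x1, x2, x3 = sorted(random.uniform(-3, 3) for _ in range(3))
    if x2 - x1 < 1e-4 or x3 - x2 < 1e-4:
        continue  # skip near-degenerate triples to avoid cancellation error
    # slope monotonicity: R(x1,x2) <= R(x1,x3) <= R(x2,x3)
    assert R(f, x1, x2) <= R(f, x1, x3) + 1e-9
    assert R(f, x1, x3) <= R(f, x2, x3) + 1e-9
print("slope monotonicity holds on all sampled triples")
```

The skipped near-degenerate triples only avoid floating-point cancellation; the inequalities themselves hold for every admissible triple.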

We now prove some more interesting results about convex functions related to continuity, midpoint convexity, and Lebesgue measurability.

Lemma 1. Let f: \mathbb{R}^{n} \rightarrow \mathbb{R} be midpoint convex; x_{1},\cdots, x_{n} \in \mathbb{R}^{n}; and r_{1},\cdots,r_{n} \in \mathbb{Q}^{+} with \sum_{i=1}^{n}r_{i} = 1. Then

\displaystyle f\left(\sum_{i=1}^{n}r_{i}x_{i}\right) \leq \sum_{i=1}^{n}r_{i}f(x_{i})

Proof. First, I claim that

\displaystyle f\left(\dfrac{x_{1} + \cdots + x_{n}}{n}\right) \leq \dfrac{f(x_{1}) + \cdots + f(x_{n})}{n}

If n = 2, then the assertion is immediate from the hypothesis of midpoint convexity. Suppose the claim holds for n = 2^{k}. Then by midpoint convexity,

\displaystyle \begin{array}{lcl}f\left(\dfrac{x_{1} + \cdots + x_{2^{k+1}}}{2^{k+1}}\right) &=& f\left(\dfrac{1}{2}\dfrac{x_{1} + \cdots + x_{2^{k}}}{2^{k}} + \dfrac{1}{2}\dfrac{x_{2^{k} + 1} + \cdots + x_{2^{k+1}}}{2^{k}}\right)\\ &\leq&\dfrac{1}{2}f\left(\dfrac{x_{1} + \cdots + x_{2^{k}}}{2^{k}}\right) + \dfrac{1}{2}f\left(\dfrac{x_{2^{k} + 1} + \cdots + x_{2^{k+1}}}{2^{k}}\right)\\&\leq& \dfrac{1}{2}\dfrac{f(x_{1}) + \cdots + f(x_{2^{k}})}{2^{k}} + \dfrac{1}{2}\dfrac{f(x_{2^{k}+1}) + \cdots + f(x_{2^{k+1}})}{2^{k}}\\&=& \dfrac{f(x_{1}) + \cdots + f(x_{2^{k+1}})}{2^{k+1}}\end{array}

where we apply the induction hypothesis (the claim for 2^{k} points) in the last inequality. For arbitrary n \in \mathbb{Z}^{\geq 1}, choose k \in \mathbb{Z}^{\geq 1} such that n \in (2^{k-1},2^{k}], so that 0 \leq 2^{k}-n < 2^{k}. Padding the list with 2^{k}-n copies of the average \frac{x_{1} + \cdots + x_{n}}{n} and applying the claim for 2^{k} points, we obtain

\displaystyle \begin{array}{lcl}f\left(\dfrac{x_{1} + \cdots + x_{n}}{n}\right) &= &f\left(\dfrac{1}{2^{k}}\left(x_{1} + \cdots + x_{n} + (2^{k}-n)\dfrac{x_{1} + \cdots + x_{n}}{n}\right)\right)\\&\leq& \dfrac{1}{2^{k}}\left({f(x_{1}) + \cdots + f(x_{n})} + (2^{k}-n)f\left(\dfrac{x_{1} + \cdots + x_{n}}{n}\right)\right)\\&\Rightarrow&f\left(\dfrac{x_{1} + \cdots + x_{n}}{n}\right) \leq \dfrac{2^{k}}{n}\dfrac{f(x_{1}) + \cdots + f(x_{n})}{2^{k}} = \dfrac{f(x_{1}) + \cdots + f(x_{n})}{n}\end{array}

Let d be the least common denominator of r_{1},\cdots,r_{n}, and set s_{i} := dr_{i} \in \mathbb{Z}^{\geq 1}, so that \sum_{i=1}^{n}s_{i} = d. We have that

\displaystyle \sum_{i=1}^{n}r_{i}x_{i} = \dfrac{1}{d}\sum_{i=1}^{n}dr_{i}x_{i} = \dfrac{1}{d}\sum_{i=1}^{n}s_{i}x_{i} = \dfrac{1}{d}\left(\left(\sum_{i=1}^{s_{1}}x_{1}\right) + \cdots + \left(\sum_{i=1}^{s_{n}}x_{n}\right)\right)


Hence, applying the claim to this list of d points,

\displaystyle \begin{array}{lcl}f\left(\sum_{i=1}^{n}r_{i}x_{i}\right) = f\left(\dfrac{\left(\left(\sum_{i=1}^{s_{1}}x_{1}\right) + \cdots + \left(\sum_{i=1}^{s_{n}}x_{n}\right)\right)}{d}\right) &\leq&\dfrac{\sum_{i=1}^{s_{1}}f(x_{1}) + \cdots + \sum_{i=1}^{s_{n}}f(x_{n})}{d}\\&=& \sum_{i=1}^{n}r_{i}f(x_{i})\end{array}

\Box
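The common-denominator device in this last step can be illustrated numerically (an illustration, not part of the proof): with rational weights r_{i} = s_{i}/d, the combination \sum r_{i}x_{i} is the plain average of a list in which x_{i} is repeated s_{i} times. The weights and points below are arbitrary sample data, computed exactly with fractions.Fraction.

```python
from fractions import Fraction

# Sample data (my own illustrative choices): weights summing to 1
r = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
x = [Fraction(4), Fraction(-2), Fraction(10)]
assert sum(r) == 1

d = 6                            # least common denominator of the r_i
s = [ri * d for ri in r]         # s_i = d * r_i, here [1, 2, 3]
# expand: repeat x_i exactly s_i times, giving a list of d points
expanded = [xi for xi, si in zip(x, s) for _ in range(int(si))]

assert len(expanded) == d
# the weighted combination equals the plain average of the expanded list
assert sum(ri * xi for ri, xi in zip(r, x)) == sum(expanded) / d
print("weighted combination equals the average of the expanded list")
```

Because the arithmetic is exact rational arithmetic, the equality is verified with no floating-point tolerance.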


Obviously every convex function f is also midpoint convex (just take \lambda = \frac{1}{2}), but if we impose the additional hypothesis that f is continuous, then midpoint convexity turns out to imply convexity. We note for the reader’s benefit that this is Exercise 24, Chapter 4 of Baby Rudin.

Lemma 2. If f is continuous and midpoint convex, then f is convex.

Proof. Let x_{1}, x_{2} \in \mathbb{R}^{n} and \lambda \in [0,1]; the cases \lambda \in \{0,1\} are trivial, so assume \lambda \in (0,1). Then there exists a sequence of positive rational numbers (r_{n})_{n=1}^{\infty} \subset (0,1) such that r_{n} \rightarrow \lambda as n \rightarrow \infty. Applying the preceding lemma to the weights r_{n}, 1-r_{n} and using the continuity of f, we have that

\displaystyle \begin{array}{lcl}f(\lambda{x_{1}} + (1-\lambda)x_{2}) = \lim_{n \rightarrow \infty} f\left(r_{n}x_{1} + (1-r_{n})x_{2}\right) &\leq& \lim_{n \rightarrow \infty}r_{n}f(x_{1}) + (1-r_{n})f(x_{2})\\ &=& \lambda{f}(x_{1}) + (1-\lambda)f(x_{2})\end{array}
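The approximation device in this proof can be watched in action numerically (a sketch, not a proof): approximate an irrational \lambda by rationals r_{n} and observe that both sides of the convexity inequality converge. The choices f = \exp, the endpoints, and \lambda = 1/\sqrt{2} are my own illustrative assumptions.

```python
import math
from fractions import Fraction

f = math.exp                     # an arbitrary continuous convex function
x1, x2 = 0.0, 2.0                # arbitrary endpoints
lam = 1 / math.sqrt(2)           # an irrational weight in (0, 1)

# rational approximations r_n -> lam with growing denominators
for denom in (10, 100, 1000, 10_000):
    r = float(Fraction(lam).limit_denominator(denom))
    lhs = f(r * x1 + (1 - r) * x2)
    rhs = r * f(x1) + (1 - r) * f(x2)
    assert lhs <= rhs + 1e-12    # the inequality holds at each rational weight
    print(f"r ~ {r:.6f}: f(combination) = {lhs:.6f} <= {rhs:.6f}")
```

As the denominators grow, both sides stabilize at the values for the irrational weight \lambda, which is exactly the limiting argument of the proof.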


Lemma 3. If f is midpoint convex and has a point of discontinuity x_{0}, then \limsup_{x \rightarrow x_{0}}f(x) = +\infty.

Proof. By hypothesis that f is discontinuous at x_{0}, there exists \epsilon > 0 such that \forall \delta > 0, there exists x \in B(x_{0};\delta) such that \left|f(x) - f(x_{0})\right| > \epsilon. Set x' := 2x_{0} - x. Then by midpoint convexity,

\displaystyle \dfrac{f(x) + f(x')}{2} \geq f\left(\dfrac{x + x'}{2}\right) = f\left(x_{0}\right) \Rightarrow \left[f(x) - f(x_{0})\right] + \left[f(x') - f(x_{0})\right] \geq 0

If f(x) - f(x_{0}) is not greater than \epsilon, then by the choice of x we must have f(x) - f(x_{0}) < -\epsilon, and the displayed inequality forces f(x') - f(x_{0}) \geq -\left[f(x) - f(x_{0})\right] > \epsilon; note that x' = 2x_{0} - x \in B(x_{0};\delta) as well. So in any neighborhood of x_{0} there exists a point x_{1} such that f(x_{1}) - f(x_{0}) > \epsilon. Set x_{2} := 2x_{1} - x_{0}. By midpoint convexity,

\displaystyle \begin{array}{lcl}& &\dfrac{f(x_{2}) + f(x_{0})}{2} \geq f\left(\dfrac{x_{2} + x_{0}}{2}\right) = f(x_{1}) \Rightarrow f(x_{2}) - f(x_{1}) \geq f(x_{1}) - f(x_{0})\\&\Rightarrow& f(x_{2}) - f(x_{0}) = \left[f(x_{2}) - f(x_{1})\right] + \left[f(x_{1}) - f(x_{0})\right] \geq 2\left[f(x_{1}) - f(x_{0})\right] > 2\epsilon \end{array}

Iterating this construction, x_{k+1} := 2x_{k} - x_{0} satisfies f(x_{k+1}) - f(x_{0}) > 2^{k}\epsilon; note, however, that \left|x_{k+1} - x_{0}\right| = 2\left|x_{k} - x_{0}\right|, so the iterates themselves move away from x_{0}. To conclude, fix \eta > 0 and M > 0, and choose n such that 2^{n-1}\epsilon > M. By the previous paragraph, we may pick x_{1} \in B(x_{0}; 2^{-(n-1)}\eta) with f(x_{1}) - f(x_{0}) > \epsilon; iterating n-1 times produces a point x_{n} with \left|x_{n} - x_{0}\right| = 2^{n-1}\left|x_{1} - x_{0}\right| < \eta and f(x_{n}) - f(x_{0}) > 2^{n-1}\epsilon > M. Hence \limsup_{x \rightarrow x_{0}}f(x) = +\infty. \Box

Sierpinski showed that if f is midpoint convex and is also Lebesgue measurable, then f is continuous. This result was not known to me until I stumbled across it in a textbook.

Lemma 4. (Sierpinski) If f is Lebesgue measurable and midpoint-convex, then f is continuous.

Proof. Suppose that there exists a point of discontinuity x_{0}. The previous lemma shows that there exists an interval (a,b) \ni x_{0} and a sequence (\xi_{n})_{n=1}^{\infty} \subset (a,b), \xi_{n} \rightarrow x_{0}, such that \forall n \in \mathbb{N}, f(\xi_{n}) \geq n. Set

\displaystyle A_{k} := \left\{x \in (2a-b,2b-a): f(x) \geq k\right\} \ \text{and} \ B_{k} := 2\xi_{k} - A_{k} = \left\{y = 2\xi_{k} - x: x \in A_{k}\right\}

(the enlarged interval (2a-b,2b-a) is chosen so that it contains the reflection 2\xi_{k} - x of every x \in (a,b), since \xi_{k} \in (a,b)).

Since f is measurable, A_{k} and B_{k} are measurable, and A_{k},B_{k} have the same Lebesgue measure, B_{k} being the reflection of A_{k} about \xi_{k}. I claim that (a,b) \subset A_{k} \cup B_{k}. Indeed, let x \in (a,b). If f(x) \geq k, then x \in A_{k}. Otherwise f(x) < k, and setting x' := 2\xi_{k} - x, we obtain by midpoint convexity that

\displaystyle \dfrac{f(x) + f(x')}{2} \geq f(\xi_{k}) \geq k \Rightarrow f(x) + f(x') \geq 2k

so f(x') \geq k \Rightarrow x' \in A_{k} \Rightarrow x \in B_{k}. We then have that

\displaystyle 2m(A_{k}) = m(A_{k}) + m(B_{k}) \geq m(A_{k} \cup B_{k}) \geq b-a > 0 \ \forall k \in \mathbb{N}

which is a contradiction since f is finite-valued (or finite-valued a.e.) and therefore m(A_{k}) \rightarrow 0 as k \rightarrow \infty. \Box

Combining the preceding lemmas, we conclude the following:

Proposition 5. (Blumberg-Sierpinski) Let f: \mathbb{R}^{n} \rightarrow \mathbb{R} be a Lebesgue measurable, midpoint-convex function. Then f is continuous and therefore, by Lemma 2, convex.

Lemma 6. Let (X,\mu) be a measure space. If f \in L^{p_{0}}(X,\mu) \cap L^{\infty}(X,\mu) for some p_{0} < \infty, then f \in L^{p}(X,\mu) \ \forall p \in [p_{0},\infty] and

\displaystyle \lim_{p \rightarrow \infty}\left\|f\right\|_{p} = \left\|f\right\|_{L^{\infty}}

Proof. The first assertion follows by interpolation: since \left|f\right| \leq \left\|f\right\|_{L^{\infty}} \mu-a.e., for p_{0} \leq p < \infty,

\displaystyle \int_{X}\left|f\right|^{p}d\mu = \int_{X}\left|f\right|^{p-p_{0}}\left|f\right|^{p_{0}}d\mu \leq \left\|f\right\|_{L^{\infty}}^{p-p_{0}}\left\|f\right\|_{L^{p_{0}}}^{p_{0}} < \infty

For the second assertion, let 0 < \epsilon < \left\|f\right\|_{L^{\infty}} be given (if \left\|f\right\|_{L^{\infty}} = 0, then f = 0 a.e. and the assertion is trivial). If f = \sum_{i=1}^{n}a_{i}\chi_{A_{i}} is a simple function (with \mu(A_{i}) < \infty, since f \in L^{p_{0}}), then

\displaystyle \limsup_{p \rightarrow \infty} \left\|f\right\|_{L^{p}} = \limsup_{p \rightarrow \infty} \left(\int_{X}\left|f\right|^{p}d\mu\right)^{\frac{1}{p}} \leq \limsup_{p \rightarrow \infty} \left(\sum_{i=1}^{n}\left\|f\right\|_{\infty}^{p}\mu(A_{i})\right)^{\frac{1}{p}} = \left\|f\right\|_{\infty}

For general f \in L^{p_{0}} \cap L^{\infty}, the interpolation bound \int_{X}\left|f\right|^{p}d\mu \leq \left\|f\right\|_{L^{\infty}}^{p-p_{0}}\left\|f\right\|_{L^{p_{0}}}^{p_{0}} gives \left\|f\right\|_{L^{p}} \leq \left\|f\right\|_{L^{\infty}}^{1-\frac{p_{0}}{p}}\left\|f\right\|_{L^{p_{0}}}^{\frac{p_{0}}{p}}, and letting p \rightarrow \infty, we again have that

\displaystyle \limsup_{p \rightarrow \infty}\left\|f\right\|_{L^{p}} \leq \left\|f\right\|_{\infty}

We now need to show that \liminf_{p \rightarrow \infty}\left\|f\right\|_{L^{p}} \geq \left\|f\right\|_{L^{\infty}}. Let E := \left\{x \in X: \left|f(x)\right| \geq \left\|f\right\|_{L^{\infty}} - \epsilon\right\}. Then E is measurable, and E has positive measure by definition of the essential supremum; moreover, \mu(E) < \infty, since \left|f\right| \geq \left\|f\right\|_{L^{\infty}} - \epsilon > 0 on E and f \in L^{p_{0}}. Indeed,

\displaystyle \left\|f\right\|_{L^{p}} \geq \left(\int_{X}\left|f\chi_{E}\right|^{p}d\mu\right)^{\frac{1}{p}} \geq \left(\left\|f\right\|_{L^{\infty}} - \epsilon\right)\mu(E)^{\frac{1}{p}}

Taking the \liminf,

\displaystyle \liminf_{p \rightarrow \infty} \left\|f\right\|_{L^{p}} \geq \left(\left\|f\right\|_{L^{\infty}} - \epsilon\right)\lim_{p \rightarrow \infty}\mu(E)^{\frac{1}{p}} = \left\|f\right\|_{L^{\infty}} - \epsilon

Since \epsilon > 0 was arbitrary, we obtain the desired inequality. \Box
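Lemma 6 can be checked on a concrete example (an illustration, not a proof). On the probability space ([0,1], Lebesgue) with f(x) = x, one computes by hand \left\|f\right\|_{L^{p}} = (1/(p+1))^{1/p}, which should increase to \left\|f\right\|_{L^{\infty}} = 1. The helper name and the choice of f are my own.

```python
def lp_norm_of_identity(p: float) -> float:
    # ||x||_{L^p([0,1])} = (integral_0^1 x^p dx)^{1/p} = (1/(p+1))^{1/p},
    # a closed form computed by hand for the sample function f(x) = x
    return (1.0 / (p + 1.0)) ** (1.0 / p)

norms = [lp_norm_of_identity(p) for p in (1, 10, 100, 1000)]
# on a probability space the L^p norms are nondecreasing in p,
# and they should approach ||f||_inf = 1
assert all(a < b for a, b in zip(norms, norms[1:]))
assert abs(norms[-1] - 1.0) < 0.01
print(norms)
```

The printed sequence climbs from 0.5 (the L^{1} norm) toward the essential supremum 1, as the lemma predicts.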

Lemma 7. Let (X,\mu) be a measure space such that \mu(X) = 1, and \varphi be a convex function on \mathbb{R}. If h \in L^{1}(X,\mu), then

\displaystyle \varphi\left(\int_{X}\left|h\right|d\mu\right) \leq \int_{X}\varphi(\left|h\right|)d\mu

Proof. Since \varphi is convex, it admits a supporting line at the point \left\|h\right\|_{L^{1}}: there exists \lambda_{0} \in \mathbb{R} such that

\displaystyle \varphi(\left|h(x)\right|) \geq \varphi(\left\|h\right\|_{L^{1}}) + \lambda_{0}(\left|h(x)\right| - \left\|h\right\|_{L^{1}})

for all x \in X. Integrating both sides and using the fact that \mu(X) = 1, we obtain

\displaystyle \int_{X}\varphi(\left|h\right|)d\mu \geq \int_{X}\varphi\left(\left\|h\right\|_{L^{1}}\right)d\mu + \lambda_{0}\int_{X}\left|h\right|d\mu - \lambda_{0}\int_{X}\left\|h\right\|_{L^{1}}d\mu = \varphi\left(\int_{X}\left|h\right|d\mu\right)


Note that if \varphi is concave, then the inequality is just reversed. We conclude this post with the well-known inequality attributed to Johan Jensen.
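On a finite probability space, Lemma 7 reduces to a weighted-sum inequality that is easy to test numerically (a sanity check, not a proof): \varphi\left(\sum \mu_{i}\left|h_{i}\right|\right) \leq \sum \mu_{i}\varphi(\left|h_{i}\right|) for convex \varphi. The convex choice \varphi(t) = t^{2} and the random data are my own illustrative assumptions.

```python
import random

random.seed(2)
phi = lambda t: t * t  # an arbitrary convex function for illustration

for _ in range(1_000):
    n = random.randint(2, 8)
    w = [random.random() for _ in range(n)]
    mu = [wi / sum(w) for wi in w]           # weights summing to 1: a
    h = [random.uniform(-10, 10) for _ in range(n)]  # probability measure
    # Jensen: phi(E|h|) <= E phi(|h|)
    lhs = phi(sum(m * abs(hi) for m, hi in zip(mu, h)))
    rhs = sum(m * phi(abs(hi)) for m, hi in zip(mu, h))
    assert lhs <= rhs + 1e-9
print("Jensen's inequality holds on all sampled discrete measures")
```

For \varphi(t) = t^{2} this is just the statement that the variance of \left|h\right| is nonnegative, which is a useful way to remember the direction of the inequality.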

Proposition 8. (Jensen’s Inequality) Suppose that \mu\left(X\right) = 1. Then, for all p \in (0,\infty),

\displaystyle \left\|f\right\|_{L^{p}} \geq \exp\left(\int_{X}\log\left|f(x)\right|d\mu\right)

If, in addition, f \in L^{p_{0}}(X,\mu) for some p_{0} > 0, then

\displaystyle \lim_{p \rightarrow 0}\left\|f\right\|_{L^{p}} = \exp\left(\int_{X}\log\left|f(x)\right|d\mu\right)

with the interpretation e^{-\infty} = 0.

Proof. Let p \in (0,\infty). If f \notin L^{p}(X,\mu), then the stated inequality is trivially true, so assume otherwise. By Jensen’s inequality applied to the concave function \log (the concave case of Lemma 7, with the inequality reversed),

\displaystyle p\int_{X}\log\left|f\right|d\mu = \int_{X}\log\left|f\right|^{p}d\mu \leq \log\left(\int_{X}\left|f\right|^{p}d\mu\right),

which implies that

\displaystyle \exp\left(p\int_{X}\log\left|f\right|d\mu\right) \leq \int_{X}\left|f\right|^{p}d\mu = \left\|f\right\|_{L^{p}}^{p}

and therefore

\displaystyle \exp\left(\int_{X}\log\left|f\right|d\mu\right) = \left(\exp\left(p\int_{X}\log\left|f\right|d\mu\right)\right)^{\frac{1}{p}} \leq \left\|f\right\|_{L^{p}}

This proves the stated inequality for each p. For the limiting assertion, note that \log\left\|f\right\|_{L^{p}} \leq \int_{X}\frac{\left|f\right|^{p}-1}{p}d\mu, since \log t \leq t-1 and \mu(X) = 1, and that \frac{t^{p}-1}{p} \downarrow \log t as p \downarrow 0 for each t \geq 0; combining this with the inequality just proved and monotone convergence yields \lim_{p \rightarrow 0}\left\|f\right\|_{L^{p}} = \exp\left(\int_{X}\log\left|f\right|d\mu\right). \Box
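The p \rightarrow 0 limit is easy to see numerically on a finite probability space (an illustration, not a proof): for the uniform measure on n points, \left\|f\right\|_{L^{p}} = \left(\frac{1}{n}\sum\left|f_{i}\right|^{p}\right)^{1/p} should tend to the geometric mean \exp\left(\frac{1}{n}\sum\log\left|f_{i}\right|\right). The sample values below are arbitrary.

```python
import math

vals = [0.5, 2.0, 3.0, 8.0]   # arbitrary positive sample values of |f|
n = len(vals)
# the right-hand side of the limit: the geometric mean of the values
geo_mean = math.exp(sum(math.log(v) for v in vals) / n)

for p in (1.0, 0.1, 0.01, 0.001):
    # ||f||_p on the uniform probability measure on n points
    lp = (sum(v ** p for v in vals) / n) ** (1.0 / p)
    print(f"p = {p}: ||f||_p = {lp:.6f}")

# at p = 0.001 the L^p quantity is already within 1e-2 of the geometric mean
assert abs((sum(v ** 0.001 for v in vals) / n) ** 1000 - geo_mean) < 1e-2
print(f"geometric mean = {geo_mean:.6f}")
```

The printed norms decrease toward the geometric mean as p shrinks, consistent with the monotonicity of L^{p} norms on a probability space.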


This entry was posted in math.CA.

6 Responses to Convexity, Continuity, and Jensen’s Inequality

  1. Pingback: Hermite-Hadamard Inequality | Math by Matt

  2. Pingback: Another Proof that Midpoint-Convex and Continuous Implies Convex | Math by Matt

  3. Dileep says:

    Really Nice, Helped a lot.

  4. Anonymous says:


    how do you know that $ x_n \rightarrow x_0 $ as $ n \rightarrow \infty $ in lemma 3?

  5. Pingback: Convexity, Continuity, and Jensen’s Inequality Comment | Math by Matt
