## Convexity, Continuity, and Jensen’s Inequality

Convexity is an important notion in mathematics with many applications to real-world problems. In this post, I would like to prove a few basic results for convex functions on Euclidean space. Recall that a subset $E \subset\mathbb{R}^{d}$ is said to be convex if, for all $x,y \in E$,

$\displaystyle \lambda x + (1-\lambda)y \in E, \indent \forall \lambda \in [0,1]$

Suppose $f: E \rightarrow \mathbb{R}$ is a real-valued function defined on a convex subset $E$. We say that $f$ is a convex function if, for all $x,y \in E$,

$\displaystyle f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda)f(y),\indent \forall \lambda \in [0,1]$
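Before moving on, the defining inequality can be sanity-checked numerically. The sketch below is purely illustrative: the choice $f(x) = x^{2}$ (convex), the sampling range, and the tolerance are all assumptions made for the demonstration.

```python
import random

# Sanity check of the convexity inequality
#     f(l*x + (1-l)*y) <= l*f(x) + (1-l)*f(y)
# for the (convex) choice f(x) = x**2 on random points and weights.
def f(x):
    return x * x

random.seed(0)
for _ in range(1000):
    x = random.uniform(-10, 10)
    y = random.uniform(-10, 10)
    lam = random.random()
    lhs = f(lam * x + (1 - lam) * y)
    rhs = lam * f(x) + (1 - lam) * f(y)
    assert lhs <= rhs + 1e-9, (x, y, lam)
print("convexity inequality holds on all sampled triples")
```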

If the preceding inequality holds for the special case $\lambda = \frac{1}{2}$, then we say that $f$ is midpoint convex. There are a number of interesting things we can say about convex functions. One is that every local minimum is a global minimum. Indeed, suppose $x_{0}$ is a local minimum of a convex function $f: E \rightarrow \mathbb{R}$. Then there exists an open ball $B(x_{0}; \delta)$ such that $f(x_{0}) \leq f(y)$ for all $y \in E$ with $\left|y-x_{0}\right| < \delta$. Let $y \in E$ be arbitrary. For $\lambda \in (0,1]$,

$\displaystyle f\left(x_{0} + \lambda(y-x_{0})\right) \leq (1-\lambda)f(x_{0}) + \lambda f(y) = f(x_{0}) + \lambda\left(f(y) - f(x_{0})\right)$

and therefore, since $x_{0} + \lambda(y-x_{0}) \in B(x_{0};\delta)$ for $\lambda$ sufficiently small,

$\displaystyle \lambda f(x_{0}) \leq \underbrace{\left[f(x_{0}) - f(x_{0}+\lambda(y-x_{0}))\right]}_{\leq 0} + \lambda f(y)$

Dividing both sides by $\lambda$ shows $f(x_{0}) \leq f(y)$, completing the proof. Furthermore, any increasing convex function of a convex function is also convex. Suppose $\varphi: \mathbb{R} \rightarrow \mathbb{R}$ is increasing and convex, and $f : E\rightarrow \mathbb{R}$ is convex. By the convexity of $f$, for $x,y \in E$ and $\lambda \in [0,1]$,

$\displaystyle f(\lambda x + (1-\lambda)y)\leq\lambda f(x)+(1-\lambda)f(y)$

and since $\varphi$ is increasing on $\mathbb{R}$, we conclude that

$\displaystyle \varphi\left(f(\lambda x + (1-\lambda)y)\right) \leq \varphi\left(\lambda f(x)+(1-\lambda)f(y)\right)\leq\lambda\varphi(f(x))+(1-\lambda)\varphi(f(y))$
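As a quick illustration of this composition rule, the sketch below checks the convexity inequality for $\varphi \circ f$ with the illustrative choices $\varphi = \exp$ (increasing and convex) and $f(x) = x^{2}$ (convex); the sampling scheme and tolerance are assumptions of the demonstration.

```python
import math
import random

# Check the convexity inequality for g = phi∘f, where phi = exp is
# increasing and convex, and f(x) = x**2 is convex.
def f(x):
    return x * x

def g(x):
    return math.exp(f(x))

random.seed(1)
for _ in range(1000):
    x = random.uniform(-3, 3)
    y = random.uniform(-3, 3)
    lam = random.random()
    assert g(lam * x + (1 - lam) * y) <= lam * g(x) + (1 - lam) * g(y) + 1e-9
print("phi∘f passes the convexity check on all sampled triples")
```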

We now restrict ourselves to convex functions defined on a convex subset of the real line. Any convex function $f$ defined on an open interval $(a,b)$ is continuous. To see why, first note that for $x,y \in (a,b)$ and $\lambda \in [0,1]$,

$\displaystyle f(y + \lambda(x-y)) \leq \lambda f(x) + (1-\lambda)f(y) = f(y) + \lambda(f(x) - f(y))$

so $f(y+\lambda(x-y)) - f(y) \leq \lambda\left(f(x)-f(y)\right)$, which tends to $0$ as $\lambda \downarrow 0$; together with the monotonicity of difference quotients established below, which bounds the slopes of chords near any point of $(a,b)$ from above and below, this shows that $f$ is locally Lipschitz and hence continuous.

If we have a convex function $f: (a,b) \rightarrow \mathbb{R}$, then we can define a function $R: \left\{(x,y) : x,y \in (a,b), x \neq y\right\} \rightarrow \mathbb{R}$ by

$\displaystyle R(x,y) :=\frac{f(x) - f(y)}{x-y}$

Observe that $R$ is symmetric in $x$ and $y$. It turns out that $f$ is convex if and only if $R$ is nondecreasing in each variable when the other is held fixed. Fix $a < x_{1} < x_{2} < x_{3} < b$. First, I claim that we can write $x_{2} = tx_{1}+x_{3}(1-t)$, where $t \in (0,1)$. Indeed,

$\displaystyle x_{2}-x_{3}=x_{1}t-x_{3}t \Longleftrightarrow 0<\frac{x_{3}-x_{2}}{x_{3}-x_{1}}=\frac{x_{2}-x_{3}}{x_{1}-x_{3}} = t<1$

By convexity, $f(x_{2}) = f(tx_{1}+(1-t)x_{3})\leq tf(x_{1})+(1-t)f(x_{3})$. Subtracting $f(x_{3})$ from both sides and dividing by $x_{2}-x_{3} < 0$, which reverses the inequality, we see that

$\displaystyle \frac{f(x_{2})-f(x_{3})}{x_{2}-x_{3}} \geq\frac{f(x_{1})-f(x_{3})}{x_{1}-x_{3}}$

Similarly, subtracting $f(x_{1})$ from both sides of the convexity inequality and dividing by $x_{2}-x_{1} > 0$, we also obtain the inequality

$\displaystyle \frac{f(x_{2}) -f(x_{1})}{x_{2}-x_{1}}\leq \frac{f(x_{1}) -f(x_{3})}{x_{1}-x_{3}} = \frac{f(x_{3})-f(x_{1})}{x_{3}-x_{1}}$

Retracing our steps, we see that the preceding inequalities also give a sufficient condition for $f$ to be convex, which completes the proof.
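The monotonicity of the difference quotient $R$ is easy to observe numerically. In the sketch below, the convex function $f(x) = x^{2}$ (for which $R(x,y) = x + y$) and the sample points are illustrative choices.

```python
# Difference quotient R(x, y) = (f(x) - f(y)) / (x - y) for f(x) = x**2;
# here R(x, y) = x + y, so monotonicity in each argument is transparent.
def f(x):
    return x * x

def R(x, y):
    return (f(x) - f(y)) / (x - y)

xs = [-2.0, -0.5, 0.1, 1.0, 3.0]  # increasing sample points
for i in range(len(xs) - 2):
    x1, x2, x3 = xs[i], xs[i + 1], xs[i + 2]
    # the two chord-slope inequalities derived above:
    assert R(x2, x3) >= R(x1, x3)   # R(., x3) is nondecreasing
    assert R(x1, x2) <= R(x1, x3)   # R(x1, .) is nondecreasing
print("chord slopes are monotone on the sample")
```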

We now prove some more interesting results about convex functions related to continuity, midpoint convexity, and Lebesgue measurability.

Lemma 1. Let $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ be midpoint convex; $x_{1},\cdots, x_{n} \in \mathbb{R}^{n}$; and $r_{1},\cdots,r_{n} \in \mathbb{Q}^{+}$ with $\sum_{i=1}^{n}r_{i} = 1$. Then

$\displaystyle f\left(\sum_{i=1}^{n}r_{i}x_{i}\right) \leq \sum_{i=1}^{n}r_{i}f(x_{i})$

Proof. First, I claim that

$\displaystyle f\left(\dfrac{x_{1} + \cdots + x_{n}}{n}\right) \leq \dfrac{f(x_{1}) + \cdots + f(x_{n})}{n}$

If $n = 2$, then the assertion is immediate from the hypothesis of midpoint convexity. Suppose the claim holds for $n = 2^{k}$. Then by midpoint convexity,

$\displaystyle \begin{array}{lcl}f\left(\dfrac{x_{1} + \cdots + x_{2^{k+1}}}{2^{k+1}}\right) &=& f\left(\dfrac{1}{2}\dfrac{x_{1} + \cdots + x_{2^{k}}}{2^{k}} + \dfrac{1}{2}\dfrac{x_{2^{k} + 1} + \cdots + x_{2^{k+1}}}{2^{k}}\right)\\ &\leq&\dfrac{1}{2}f\left(\dfrac{x_{1} + \cdots + x_{2^{k}}}{2^{k}}\right) + \dfrac{1}{2}f\left(\dfrac{x_{2^{k} + 1} + \cdots + x_{2^{k+1}}}{2^{k}}\right)\\&\leq& \dfrac{1}{2}\dfrac{f(x_{1}) + \cdots + f(x_{2^{k}})}{2^{k}} + \dfrac{1}{2}\dfrac{f(x_{2^{k}+1}) + \cdots + f(x_{2^{k+1}})}{2^{k}}\\&=& \dfrac{f(x_{1}) + \cdots + f(x_{2^{k+1}})}{2^{k+1}}\end{array}$

where we apply the induction hypothesis in the penultimate inequality. This proves the claim when $n$ is a power of $2$. For arbitrary $n \in \mathbb{Z}^{\geq 1}$, there exists $k \in \mathbb{Z}^{\geq 1}$ such that $n \leq 2^{k}$. Applying the claim for $2^{k}$ points to $x_{1},\cdots,x_{n}$ together with $2^{k}-n$ copies of the average $\frac{x_{1}+\cdots+x_{n}}{n}$, we obtain

$\displaystyle \begin{array}{lcl}f\left(\dfrac{x_{1} + \cdots + x_{n}}{n}\right) &= &f\left(\dfrac{1}{2^{k}}\left(x_{1} + \cdots + x_{n} + (2^{k}-n)\dfrac{x_{1} + \cdots + x_{n}}{n}\right)\right)\\&\leq& \dfrac{1}{2^{k}}\left({f(x_{1}) + \cdots + f(x_{n})} + (2^{k}-n)f\left(\dfrac{x_{1} + \cdots + x_{n}}{n}\right)\right)\\&\Rightarrow&f\left(\dfrac{x_{1} + \cdots + x_{n}}{n}\right) \leq \dfrac{2^{k}}{n}\dfrac{f(x_{1}) + \cdots + f(x_{n})}{2^{k}} = \dfrac{f(x_{1}) + \cdots + f(x_{n})}{n}\end{array}$

Let $d$ be the least common denominator of $r_{1},\cdots,r_{n}$, and set $s_{i} := dr_{i} \in \mathbb{Z}^{+}$, so that $\sum_{i=1}^{n}s_{i} = d$. We have that

$\displaystyle \sum_{i=1}^{n}r_{i}x_{i} = \dfrac{1}{d}\sum_{i=1}^{n}dr_{i}x_{i} = \dfrac{1}{d}\sum_{i=1}^{n}s_{i}x_{i} = \dfrac{1}{d}\left(\left(\sum_{i=1}^{s_{1}}x_{1}\right) + \cdots + \left(\sum_{i=1}^{s_{n}}x_{n}\right)\right)$

Hence,

$\displaystyle \begin{array}{lcl}f\left(\sum_{i=1}^{n}r_{i}x_{i}\right) = f\left(\dfrac{\left(\left(\sum_{i=1}^{s_{1}}x_{1}\right) + \cdots + \left(\sum_{i=1}^{s_{n}}x_{n}\right)\right)}{d}\right) &\leq&\dfrac{\sum_{i=1}^{s_{1}}f(x_{1}) + \cdots + \sum_{i=1}^{s_{n}}f(x_{n})}{d}\\&=& \sum_{i=1}^{n}r_{i}f(x_{i})\end{array}$

$\Box$
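Lemma 1 can be illustrated with exact rational arithmetic. In the sketch below, the weights $r_{i}$ and points $x_{i}$ are illustrative choices, and we use $f(x) = x^{2}$, which is convex and hence midpoint convex.

```python
from fractions import Fraction

# Rational Jensen (Lemma 1) checked exactly:
#     f(sum r_i x_i) <= sum r_i f(x_i)
# for f(x) = x**2 and rational weights summing to 1.
def f(x):
    return x * x

r = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
x = [Fraction(-1), Fraction(2), Fraction(5)]
assert sum(r) == 1

lhs = f(sum(ri * xi for ri, xi in zip(r, x)))
rhs = sum(ri * f(xi) for ri, xi in zip(r, x))
assert lhs <= rhs
print(lhs, "<=", rhs)  # prints: 1 <= 6
```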

Obviously every convex function $f$ is also midpoint convex (just take $\lambda = \frac{1}{2}$). The converse fails in general, but it does hold if we impose the additional hypothesis that $f$ is continuous. We note for the reader’s benefit that this is Exercise 24, Chapter 4 of Baby Rudin.

Lemma 2. If $f$ is continuous and midpoint convex, then $f$ is convex.

Proof. Let $x_{1}, x_{2} \in \mathbb{R}^{n}$ and $\lambda \in [0,1]$; the cases $\lambda \in \{0,1\}$ are trivial, so assume $\lambda \in (0,1)$. Then there exists a sequence of rational numbers $(r_{n})_{n=1}^{\infty} \subset (0,1)$ such that $r_{n} \rightarrow \lambda$ as $n \rightarrow \infty$. Applying the preceding lemma to the weights $r_{n}, 1-r_{n}$ and using the continuity of $f$, we have that

$\displaystyle \begin{array}{lcl}f(\lambda{x_{1}} + (1-\lambda)x_{2}) = \lim_{n \rightarrow \infty} f\left(r_{n}x_{1} + (1-r_{n})x_{2}\right) &\leq& \lim_{n \rightarrow \infty}\left[r_{n}f(x_{1}) + (1-r_{n})f(x_{2})\right]\\ &=& \lambda{f}(x_{1}) + (1-\lambda)f(x_{2})\end{array}$

$\Box$

Lemma 3. If $f$ is midpoint convex and has a point of discontinuity $x_{0}$, then $\limsup_{x \rightarrow x_{0}}f(x) = +\infty$.

Proof. Since $f$ is discontinuous at $x_{0}$, there exists $\epsilon > 0$ such that for every $\delta > 0$ there exists $x \in B(x_{0};\delta)$ with $\left|f(x) - f(x_{0})\right| > \epsilon$. Given such an $x$, set $x' := 2x_{0} - x \in B(x_{0};\delta)$. Then by midpoint convexity,

$\displaystyle \dfrac{f(x) + f(x')}{2} \geq f\left(\dfrac{x + x'}{2}\right) = f\left(x_{0}\right) \Rightarrow \left[f(x) - f(x_{0})\right] + \left[f(x') - f(x_{0})\right] \geq 0$

If $f(x) - f(x_{0}) \leq \epsilon$, then by the choice of $x$ we have $f(x) - f(x_{0}) < -\epsilon$, and therefore $f(x') - f(x_{0}) > \epsilon$. So in any neighborhood of $x_{0}$ there exists a point $x_{1}$ such that $f(x_{1}) - f(x_{0}) > \epsilon$. Set $x_{2} := 2x_{1} - x_{0}$. By midpoint convexity,

$\displaystyle \begin{array}{lcl}& &\dfrac{f(x_{2}) + f(x_{0})}{2} \geq f\left(\dfrac{x_{2} + x_{0}}{2}\right) = f(x_{1}) \Rightarrow f(x_{2}) - f(x_{1}) \geq f(x_{1}) - f(x_{0})\\&\Rightarrow& f(x_{2}) - f(x_{0}) = \left[f(x_{2}) - f(x_{1})\right] + \left[f(x_{1}) - f(x_{0})\right] \geq 2\left[f(x_{1}) - f(x_{0})\right] > 2\epsilon \end{array}$

Iterating, with $x_{k+1} := 2x_{k} - x_{0}$, we obtain $f(x_{k}) - f(x_{0}) > 2^{k-1}\epsilon$ by induction; but since $\left|x_{k+1} - x_{0}\right| = 2\left|x_{k} - x_{0}\right|$, the points $x_{k}$ need not approach $x_{0}$, so we must choose the starting point carefully. Fix $\delta > 0$ and $M > 0$, and choose $m$ with $2^{m-1}\epsilon > M$. By the argument above, we may choose $x_{1}$ with $\left|x_{1} - x_{0}\right| < \delta/2^{m-1}$ and $f(x_{1}) - f(x_{0}) > \epsilon$; then $\left|x_{m} - x_{0}\right| = 2^{m-1}\left|x_{1}-x_{0}\right| < \delta$ and $f(x_{m}) > f(x_{0}) + M$. Since $\delta$ and $M$ were arbitrary, $\limsup_{x \rightarrow x_{0}}f(x) = +\infty$. $\Box$

Sierpinski showed that if $f$ is midpoint convex and also Lebesgue measurable, then $f$ is continuous. This result was not known to me until I stumbled across it in a textbook.

Lemma 4. (Sierpinski) If $f$ is Lebesgue measurable and midpoint-convex, then $f$ is continuous.

Proof. Suppose that there exists a point of discontinuity $x_{0}$. The previous lemma shows that there exists an interval $(a,b) \ni x_{0}$ and a sequence $(\xi_{n})_{n=1}^{\infty} \subset (a,b), \xi_{n} \rightarrow x_{0}$, such that $\forall n \in \mathbb{N}, f(\xi_{n}) \geq n$. Set

$\displaystyle A_{k} := \left\{x \in (a,b): f(x) \geq k\right\} \ \text{and} \ B_{k} := 2\xi_{k} - A_{k} = \left\{y = 2\xi_{k} - x: x \in A_{k}\right\}$

Since $f$ is measurable, $A_{k}$ and $B_{k}$ are measurable, and since $B_{k}$ is the reflection of $A_{k}$ through the point $\xi_{k}$, they have the same Lebesgue measure. Shrinking the interval if necessary, we may assume $(a,b) = (x_{0}-r,x_{0}+r)$ for some $r > 0$; since $\xi_{k} \rightarrow x_{0}$, we have $\left|\xi_{k} - x_{0}\right| < r/4$ for all $k$ sufficiently large. I claim that, for such $k$, $(x_{0}-r/2,x_{0}+r/2) \subset A_{k} \cup B_{k}$. Indeed, let $x \in (x_{0}-r/2,x_{0}+r/2)$. If $f(x) \geq k$, then $x \in A_{k}$. Otherwise, $f(x) < k$; set $x' := 2\xi_{k} - x$, and note that $\left|x' - x_{0}\right| \leq 2\left|\xi_{k}-x_{0}\right| + \left|x - x_{0}\right| < r$, so $x' \in (a,b)$. By midpoint convexity,

$\displaystyle \dfrac{f(x) + f(x')}{2} \geq f\left(\dfrac{x + x'}{2}\right) = f(\xi_{k}) \geq k \Rightarrow f(x') \geq 2k - f(x) > k$

so $x' \in A_{k}$, and therefore $x = 2\xi_{k} - x' \in B_{k}$, which proves the claim. We then have that

$\displaystyle 2m(A_{k}) = m(A_{k}) + m(B_{k}) \geq m(A_{k} \cup B_{k}) \geq m\left((x_{0}-r/2,x_{0}+r/2)\right) = r > 0$

for all $k$ sufficiently large,

which is a contradiction: since $f$ is finite-valued, $\bigcap_{k=1}^{\infty}A_{k} = \emptyset$, and since $m\left((a,b)\right) < \infty$, continuity from above gives $m(A_{k}) \rightarrow 0$ as $k \rightarrow \infty$. $\Box$

Combining the preceding lemmas, we conclude the following:

Proposition 5. (Blumberg-Sierpinski) Let $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ be a measurable, midpoint-convex function. Then $f$ is continuous and, a fortiori, convex.

Lemma 6. Let $(X,\mu)$ be a measure space. If $f \in L^{p_{0}}(X,\mu) \cap L^{\infty}(X,\mu)$ for some $p_{0} < \infty$, then $f \in L^{p}(X,\mu) \ \forall p \in [p_{0},\infty]$ and

$\displaystyle \lim_{p \rightarrow \infty}\left\|f\right\|_{p} = \left\|f\right\|_{L^{\infty}}$

Proof. The first assertion follows from the interpolation inequality $\left\|f\right\|_{L^{p}} \leq \left\|f\right\|_{L^{p_{0}}}^{p_{0}/p}\left\|f\right\|_{L^{\infty}}^{1-p_{0}/p}$, valid for $p_{0} \leq p \leq \infty$, which is immediate from the pointwise bound $\left|f\right|^{p} \leq \left|f\right|^{p_{0}}\left\|f\right\|_{L^{\infty}}^{p-p_{0}}$ a.e. For the second assertion, we may assume $\left\|f\right\|_{L^{\infty}} > 0$, as otherwise $f = 0$ a.e. and the claim is trivial. Letting $p \rightarrow \infty$ in the interpolation inequality,

$\displaystyle \limsup_{p \rightarrow \infty}\left\|f\right\|_{L^{p}} \leq \lim_{p \rightarrow \infty}\left\|f\right\|_{L^{p_{0}}}^{\frac{p_{0}}{p}}\left\|f\right\|_{L^{\infty}}^{1-\frac{p_{0}}{p}} = \left\|f\right\|_{L^{\infty}}$

It remains to show that $\liminf_{p \rightarrow \infty}\left\|f\right\|_{L^{p}} \geq \left\|f\right\|_{L^{\infty}}$. Let $0 < \epsilon < \left\|f\right\|_{L^{\infty}}$ be given, and set $E := \left\{x \in X: \left|f(x)\right| \geq \left\|f\right\|_{L^{\infty}} - \epsilon\right\}$. Then $E$ is measurable; $\mu(E) > 0$ by the definition of the essential supremum; and $\mu(E) < \infty$ by Chebyshev's inequality, since $f \in L^{p_{0}}$. Indeed,

$\displaystyle \left\|f\right\|_{L^{p}} \geq \left(\int_{X}\left|f\chi_{E}\right|^{p}d\mu\right)^{\frac{1}{p}} \geq \left(\left\|f\right\|_{L^{\infty}} - \epsilon\right)\mu(E)^{\frac{1}{p}}$

Taking the $\liminf$,

$\displaystyle \liminf_{p \rightarrow \infty} \left\|f\right\|_{L^{p}} \geq \left(\left\|f\right\|_{L^{\infty}} - \epsilon\right)\lim_{p \rightarrow \infty}\mu(E)^{\frac{1}{p}} = \left\|f\right\|_{L^{\infty}} - \epsilon$

Since $\epsilon > 0$ was arbitrary, we obtain the desired inequality. $\Box$
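To see Lemma 6 in action, consider the simplest possible setting: a four-point probability space. The values and weights below are illustrative choices.

```python
# lim_{p -> infinity} ||f||_p = ||f||_infinity on the four-point probability
# space X = {0, 1, 2, 3} with mu({i}) = 1/4 and |f| taking the values below.
vals = [0.5, 1.0, 2.0, 3.0]
mu = [0.25, 0.25, 0.25, 0.25]
sup_norm = max(vals)  # ||f||_infinity

def lp_norm(p):
    return sum(m * v**p for m, v in zip(mu, vals)) ** (1.0 / p)

for p in [1, 2, 10, 100]:
    # since mu(X) = 1, ||f||_p <= ||f||_infinity for every p
    assert lp_norm(p) <= sup_norm
assert abs(lp_norm(100) - sup_norm) < 0.05  # already close at p = 100
print(lp_norm(1), lp_norm(100))
```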

Lemma 7. Let $(X,\mu)$ be a measure space such that $\mu(X) = 1$, and $\varphi$ be a convex function on $\mathbb{R}$. If $h \in L^{1}(X,\mu)$, then

$\displaystyle \varphi\left(\int_{X}\left|h\right|d\mu\right) \leq \int_{X}\varphi(\left|h\right|)d\mu$

Proof. Since $\varphi$ is convex, there exists $\lambda_{0} \in \mathbb{R}$ such that

$\displaystyle \varphi(\left|h(x)\right|) \geq \varphi(\left\|h\right\|_{L^{1}}) + \lambda_{0}(\left|h(x)\right| - \left\|h\right\|_{L^{1}})$

for all $x \in X$; here $\lambda_{0}$ is the slope of a supporting line of $\varphi$ at the point $\left\|h\right\|_{L^{1}}$, which exists by convexity. Integrating both sides and using the fact that $\mu(X) = 1$, we obtain

$\displaystyle \int_{X}\varphi(\left|h\right|)d\mu \geq \int_{X}\varphi\left(\left\|h\right\|_{L^{1}}\right)d\mu + \lambda_{0}\int_{X}\left|h\right|d\mu - \lambda_{0}\int_{X}\left\|h\right\|_{L^{1}}d\mu = \varphi\left(\int_{X}\left|h\right|d\mu\right)$

$\Box$
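Here is a minimal numerical sketch of Lemma 7, on a three-point probability space with the convex function $\varphi(t) = t^{2}$; the measure and the function $h$ are illustrative choices.

```python
# Lemma 7 on a three-point probability space with phi(t) = t**2:
#     phi( integral |h| dmu ) <= integral phi(|h|) dmu
mu = [0.2, 0.3, 0.5]
h = [1.0, -2.0, 4.0]

def phi(t):
    return t * t

int_abs_h = sum(m * abs(v) for m, v in zip(mu, h))      # = ||h||_{L^1}
int_phi = sum(m * phi(abs(v)) for m, v in zip(mu, h))   # integral of phi(|h|)
assert phi(int_abs_h) <= int_phi
print(phi(int_abs_h), "<=", int_phi)
```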

Note that if $\varphi$ is concave, then the inequality is reversed. We conclude this post with the well-known inequality attributed to Johan Jensen.

Proposition 8. (Jensen’s Inequality) Suppose that $\mu\left(X\right) = 1$. Then, for all $p \in (0,\infty)$,

$\displaystyle \left\|f\right\|_{L^{p}} \geq \exp\left(\int_{X}\log\left|f(x)\right|d\mu\right)$

If $\mu(X) = 1$ and $f \in L^{p_{0}}(X,\mu)$ for some $p_{0} > 0$, then

$\displaystyle \lim_{p \rightarrow 0}\left\|f\right\|_{L^{p}} = \exp\left(\int_{X}\log\left|f(x)\right|d\mu\right)$

with the interpretation $e^{-\infty} = 0$.

Proof. Let $p \in (0,\infty)$. If $f \notin L^{p}(X,\mu)$, then the stated inequality is trivially true, so assume otherwise. By Lemma 7 applied to the concave function $\log$ (which reverses the inequality),

$\displaystyle p\int_{X}\log\left|f\right|d\mu = \int_{X}\log\left|f\right|^{p}d\mu \leq \log\left(\int_{X}\left|f\right|^{p}d\mu\right),$

which implies that

$\displaystyle \exp\left(p\int_{X}\log\left|f\right|d\mu\right) \leq \int_{X}\left|f\right|^{p}d\mu = \left\|f\right\|_{L^{p}}^{p}$

and therefore

$\displaystyle \exp\left(\int_{X}\log\left|f\right|d\mu\right) = \left(\exp\left(p\int_{X}\log\left|f\right|d\mu\right)\right)^{\frac{1}{p}} \leq \left\|f\right\|_{L^{p}}$

$\Box$
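Jensen's inequality in the form of Proposition 8 can be illustrated on a finite probability space, where $\exp\left(\int_{X}\log\left|f\right|d\mu\right)$ is a weighted geometric mean. The weights and values below are illustrative, chosen so that the geometric mean is exactly $2$.

```python
import math

# ||f||_p >= exp( integral log|f| dmu ) on a three-point probability space,
# and ||f||_p decreases to the weighted geometric mean as p -> 0.
mu = [0.25, 0.25, 0.5]
vals = [1.0, 4.0, 2.0]

geo_mean = math.exp(sum(m * math.log(v) for m, v in zip(mu, vals)))
# here geo_mean = 1**(1/4) * 4**(1/4) * 2**(1/2) = 2 exactly

def lp(p):
    return sum(m * v**p for m, v in zip(mu, vals)) ** (1.0 / p)

for p in [2.0, 1.0, 0.5, 0.1, 0.01]:
    assert lp(p) >= geo_mean - 1e-9
assert abs(lp(0.01) - geo_mean) < 0.01   # lp(p) -> geo_mean as p -> 0
print(round(geo_mean, 4))  # prints: 2.0
```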

This entry was posted in math.CA.

### 6 Responses to Convexity, Continuity, and Jensen’s Inequality

1. Dileep says:

Really Nice, Helped a lot.

2. Anonymous says:

Hi,

how do you know that $x_n \rightarrow x_0$ as $n \rightarrow \infty$ in lemma 3?

• matthewhr says:

The proof appears to be incorrect. I intend to post the correction later today. Thank you for catching it!