(Azuma-)Hoeffding Inequality

In this post, we prove a concentration inequality for bounded independent random variables due to Hoeffding.

Lemma 1. Suppose $X$ is a real-valued random variable with $\mathbb{E}[X] = 0$ and $\left|X\right| \leq c$ almost surely, for some positive constant $c$. Then

$\displaystyle\mathbb{E}\left[e^{\xi X}\right]\leq\exp\left(\frac{\xi^{2}c^{2}}{2}\right) \text{ and } \displaystyle\mathbb{E}\left[e^{\xi\left|X\right|}\right]\leq 2\exp\left(\frac{\xi^{2}c^{2}}{2}\right), \indent \displaystyle\forall\xi\in\mathbb{R}$

Proof. We first show that, for $\xi \in \mathbb{R}$ fixed,

$\displaystyle e^{\xi x}\leq\dfrac{e^{\xi c}(c+x)}{2c}+\dfrac{e^{-\xi c}(c-x)}{2c},\indent\forall x\in [-c,c]$

Note that the function $f(x) := e^{\xi x}$ is convex on $[-c,c]$, since

$\displaystyle f'(x)=\xi e^{\xi x}\Longrightarrow f''(x)=\xi^{2}e^{\xi x}\geq 0, \displaystyle\indent\forall x\in[-c,c]$

Since $\frac{c+x}{2c}+\frac{c-x}{2c}=1$, we have by convexity that

$\displaystyle e^{\xi x}=\exp\left(\frac{\xi c(c+x)}{2c}-\frac{\xi c(c-x)}{2c}\right)\leq e^{\xi c}\frac{c+x}{2c}+e^{-\xi c}\frac{c-x}{2c}$

Denote the distribution function of $X$ by $F$. Then since $0=\mathbb{E}[X] =\int_{-c}^{c}xdF(x)$, we have the inequality

$\begin{array}{lcl}\displaystyle\mathbb{E}[e^{\xi X}]=\int_{-c}^{c}e^{\xi x}dF(x)\leq\int_{-c}^{c}\left[\dfrac{e^{\xi c}(c+x)}{2c}+\dfrac{e^{-\xi c}(c-x)}{2c}\right]dF(x)&=&\dfrac{e^{\xi c}+e^{-\xi c}}{2}\\&=&\dfrac{e^{\left|\xi\right|c}+e^{-\left|\xi\right|c}}{2}\end{array}$

Applying the Taylor expansion to the preceding expression, we see that

$\begin{array}{lcl}\displaystyle\frac{1}{2}\left[e^{\left|\xi\right|c}+e^{-\left|\xi\right|c}\right]=\frac{1}{2}\left[\sum_{k=0}^{\infty}\frac{(\left|\xi\right|c)^{k}}{k!}+\sum_{k=0}^{\infty}\frac{(-1)^{k}(\left|\xi\right|c)^{k}}{k!}\right]=\frac{1}{2}\sum_{k=0}^{\infty}2\frac{(\left|\xi\right|c)^{2k}}{(2k)!}&\leq&\displaystyle\sum_{k=0}^{\infty}\frac{(\left|\xi\right|c)^{2k}}{2^{k}k!}\\&=&\displaystyle\exp\left(\frac{\xi^{2}c^{2}}{2}\right)\end{array},$

which completes the proof. $\Box$

Note that if $X \in [a,b]$, almost surely, where $a,b$ are finite constants, then the preceding argument gives us the upper bound

$\displaystyle\mathbb{E}[e^{\xi X}]\leq\exp\left(\frac{\xi^{2}(b-a)^{2}}{8}\right)$

We can use Hoeffding’s inequality to obtain a special case of the Azuma-Hoeffding concentration inequality. Let $X_{1},\cdots,X_{n}$ be independent random variables where $X_{j}$ almost surely lies in the compact interval $[a_{j},b_{j}]$ for $1 \leq j \leq n$. Then by Markov’s inequality and independence,

$\begin{array}{lcl}\displaystyle\mathbb{P}\left(S_{n}-\mathbb{E}[S_{n}]\geq t\right)=\mathbb{P}\left(e^{\xi(S_{n}-\mathbb{E}[S_{n}])}\geq\displaystyle e^{\xi t}\right) &\leq&\displaystyle e^{-\xi t}\mathbb{E}\left[e^{\xi(S_{n}-\mathbb{E}[S_{n}])}\right]\\&=&\displaystyle e^{-\xi t}\prod_{j=1}^{n}\mathbb{E}\left[e^{\xi(X_{j}-\mathbb{E}[X_{j}])}\right]\end{array}$

for all $\xi > 0$. Applying Hoeffding’s inequality to the random variables $X_{1}-\mathbb{E}[X_{1}],\cdots,X_{n}-\mathbb{E}[X_{n}]$ to obtain

$\begin{array}{lcl}\displaystyle\mathbb{P}\left(S_{n}-\mathbb{E}[S_{n}]\geq t\right)&\leq&\displaystyle e^{-\xi t}\prod_{j=1}^{n}e^{\frac{\xi^{2}(b_{j}-a_{j})^{2}}{8}}\\&=&\displaystyle\exp\left(-\xi t+\xi^{2}\sum_{j=1}^{n}\frac{(b_{j}-a_{j})^{2}}{8}\right)\end{array}$

Set $\alpha := \sum_{j=1}^{n}\frac{(b_{j}-a_{j})^{2}}{8}$ and consider the function $f(\xi) = e^{-\xi t + \xi^{2}\alpha}$. Since the upper $f(\xi)$ is an upper bound for the LHS above for all $\xi \geq 0$, we now to turn to finding the minimizing value $\xi$. Completing the square, we can write

$\displaystyle\alpha \xi^{2}-\xi t = \alpha\left(\xi - \frac{t}{2\alpha}\right)^{2} - \frac{t^{2}}{4\alpha} \Longrightarrow f\left(\frac{t}{2\alpha}\right) \leq f(\xi), \indent\forall \xi \geq 0$

Hence,

$\begin{array}{lcl}\displaystyle\mathbb{P}\left(S_{n}-\mathbb{E}[S_{n}]\geq t\right)\leq\exp\left(-\dfrac{t^{2}}{4\sum_{j=1}^{n}\frac{(b_{j}-a_{j})^{2}}{8}}\right)&=&\exp\left(-\dfrac{t^{2}}{\sum_{j=1}^{n}\frac{(b_{j}-a_{j})^{2}}{2}}\right)\\&=&\displaystyle\exp\left(-\dfrac{2t^{2}}{\sum_{j=1}^{n}(b_{j}-a_{j})^{2}}\right)\end{array}$

Applying the same argument to $\mathbb{E}[S_{n}]-S_{n}$, we also have the upper bound

$\displaystyle\mathbb{P}\left(\mathbb{E}[S_{n}]-S_{n}\geq t\right) \leq \exp\left(-\dfrac{2t^{2}}{\sum_{j=1}^{n}(b_{j}-a_{j})^{2}}\right)$

Combining these two results, we obtain the following tail bound for $\left|S_{n}-\mathbb{E}[S_{n}]\right|$:

$\displaystyle\mathbb{P}\left(\left|S_{n}-\mathbb{E}[S_{n}]\right| \geq t\right) = \mathbb{P}\left(S_{n} -\mathbb{E}[S_{n}] \geq t\right) + \mathbb{P}\left(S_{n}-\mathbb{E}[S_{n}] \leq -t\right) \leq 2\exp\left(-\dfrac{2t^{2}}{\sum_{j=1}^{n}(b_{j}-a_{j})^{2}}\right)$

We summarize these results with the following theorem.

Theorem 2. (Hoeffding)

Suppose $X_{1},\cdots,X_{n}$ are independent random variables, where $X_{j} \in [a_{j},b_{j}] \ \forall 1 \leq j \leq n$, almost surely. If $S_{n} := \sum_{j=1}^{n}X_{j}$, then

$\displaystyle\mathbb{P}\left(S_{n}-\mathbb{E}[S_{n}]\geq t\right)\leq \exp\left(-\dfrac{2t^{2}}{\sum_{j=1}^{n}(b_{j}-a_{j})^{2}}\right)$

and

$\displaystyle\mathbb{P}\left(\left|S_{n}-\mathbb{E}[S_{n}]\right|\geq t\right)\leq 2\exp\left(-\dfrac{2t^{2}}{\sum_{j=1}^{n}(b_{j}-a_{j})^{2}}\right), \indent \forall t> 0$

We now use Hoeffding’s inequality to obtain a concentration inequality, apparently known as Khintchine’s inequality or proved in some form by Khintchine, for finite sums with random signs.

Theorem 3 (Khintchine’s Inequality). Suppose $X_{1},\cdots,X_{n}$ are i.i.d. random variables with $\mathbb{P}(X_{j} = 1)=\frac{1}{2}=\mathbb{P}(X_{j}=-1)$, and $a_{1},\cdots,a_{n}\in\mathbb{R}$ are arbitrary real numbers. If $Z := \sum_{j=1}^{n}a_{j}X_{j}$ and $\sigma_{n}^{2} := \sum_{j=1}^{n}a_{j}^{2}$, then

$\displaystyle\mathbb{P}\left(\left|Z\right|\geq\lambda\right)\leq 2\exp\left(-\frac{\lambda^{2}}{2\sigma_{n}^{2}}\right),\indent\forall\lambda >0,\forall n \geq 1$

Furthermore, for $p \in (1,\infty)$, there exist finite constants $A_{p}, B_{p} > 0$ independent of $a_{1},\cdots,a_{n}$ such that

$\displaystyle A_{p}\sigma_{n}\leq\left\|Z\right\|_{p}\leq B_{p}\sigma_{n}$

Proof. To obtain the tail bound, we apply Hoeffding’s inequality with $X_{j} \in [-\left|a_{j}\right|,\left|a_{j}\right|]$ to obtain

$\displaystyle\mathbb{P}\left(\left|Z\right| \geq \lambda\right) \leq 2\exp\left(-\dfrac{2\lambda^{2}}{\sum_{j=1}^{n}(2\left|a_{j}\right|)^{2}}\right) = 2\exp\left(-\dfrac{\lambda^{2}}{\sum_{j=1}^{n}2a_{j}^{2}}\right)=2\exp\left(-\dfrac{\lambda^{2}}{2\sigma_{n}^{2}}\right)$

We now turn to proving the second assertion. Using the “layer-cake” representation of the expected value, we have

$\displaystyle\left\|Z\right\|_{p}^{p} = p\int_{0}^{\infty}\lambda^{p-1}\mathbb{P}\left(\left|Z\right|>\lambda\right)d\lambda\leq p\int_{0}^{\infty}\lambda^{p-1}2\exp\left(-\frac{\lambda^{2}}{2\sigma_{n}^{2}}\right)d\lambda$

We make the change of variable $\tau = \frac{\lambda}{\sigma_{n}}$ to obtain

$\displaystyle\left\|Z\right\|_{p}^{p}\leq 2p\sigma_{n}\int_{0}^{\infty}(\tau \sigma_{n})^{p-1}\exp\left(-\frac{\tau^{2}}{2}\right)d\tau=\sigma_{n}^{p}\underbrace{2p\int_{0}^{\infty}\tau^{p-1}\exp\left(-\frac{\tau^{2}}{2}\right)d\tau}_{:=B_{p}^{p}}$

Taking the $p^{th}$ root of both sides completes the proof of the upper bound. For the lower bound, first note by Hölder’s inequality, $\left\|Z\right\|_{2} \leq \left\|Z\right\|_{p}\left\|Z\right\|_{q}$, where $\frac{1}{p}+\frac{1}{q} = 1$. Since the $X_{j}$ are independent, we have that $\left\|Z\right\|_{2} = \sigma_{n}$ and therefore

$\displaystyle\sigma_{n}B_{q}^{-1}\leq\sigma_{n}\left\|Z\right\|_{q}^{-1}=\left\|Z\right\|_{2}\left\|Z\right\|_{q}^{-1}\leq\left\|Z\right\|_{p}$

Taking $A_{p} := B_{q}^{-1}$ completes the proof. $\Box$

I leave it as an exercise to the reader to the prove the following generalization of Hoeffding’s inequality:

Theorem 4. Let $V, Z$ be random variables such that $\mathbb{E}[V\mid Z] =0$ a.e. and

$\displaystyle\mathbb{P}\left(f(Z)\leq V\leq f(Z)+c\right) = 1$

for some measurable function $f: \mathbb{R} \rightarrow \mathbb{R}$ and some constant $c > 0$. Then

$\displaystyle\mathbb{E}[e^{\xi V}\mid Z] \leq \exp\left(\dfrac{\xi^{2}c^{2}}{8}\right), \indent \forall \xi \in\mathbb{R}$