(Azuma-)Hoeffding Inequality

In this post, we prove a concentration inequality for bounded independent random variables due to Hoeffding.

Lemma 1. Suppose X is a real-valued random variable with \mathbb{E}[X] = 0 and \left|X\right| \leq c almost surely, for some positive constant c. Then

\displaystyle\mathbb{E}\left[e^{\xi X}\right]\leq\exp\left(\frac{\xi^{2}c^{2}}{2}\right) \text{ and } \displaystyle\mathbb{E}\left[e^{\xi\left|X\right|}\right]\leq 2\exp\left(\frac{\xi^{2}c^{2}}{2}\right), \indent \displaystyle\forall\xi\in\mathbb{R}

Proof. We first show that, for \xi \in \mathbb{R} fixed,

\displaystyle e^{\xi x}\leq\dfrac{e^{\xi c}(c+x)}{2c}+\dfrac{e^{-\xi c}(c-x)}{2c},\indent\forall x\in [-c,c]

Note that the function f(x) := e^{\xi x} is convex on [-c,c], since

\displaystyle f'(x)=\xi e^{\xi x}\Longrightarrow f''(x)=\xi^{2}e^{\xi x}\geq 0, \displaystyle\indent\forall x\in[-c,c]

Since \frac{c+x}{2c}+\frac{c-x}{2c}=1, we have by convexity that

\displaystyle e^{\xi x}=\exp\left(\frac{\xi c(c+x)}{2c}-\frac{\xi c(c-x)}{2c}\right)\leq e^{\xi c}\frac{c+x}{2c}+e^{-\xi c}\frac{c-x}{2c}

Denote the distribution function of X by F. Then since 0=\mathbb{E}[X] =\int_{-c}^{c}xdF(x), we have the inequality

\begin{array}{lcl}\displaystyle\mathbb{E}[e^{\xi X}]=\int_{-c}^{c}e^{\xi x}dF(x)\leq\int_{-c}^{c}\left[\dfrac{e^{\xi c}(c+x)}{2c}+\dfrac{e^{-\xi c}(c-x)}{2c}\right]dF(x)&=&\dfrac{e^{\xi c}+e^{-\xi c}}{2}\\&=&\dfrac{e^{\left|\xi\right|c}+e^{-\left|\xi\right|c}}{2}\end{array}

Applying the Taylor expansion to the preceding expression, we see that

\begin{array}{lcl}\displaystyle\frac{1}{2}\left[e^{\left|\xi\right|c}+e^{-\left|\xi\right|c}\right]=\frac{1}{2}\left[\sum_{k=0}^{\infty}\frac{(\left|\xi\right|c)^{k}}{k!}+\sum_{k=0}^{\infty}\frac{(-1)^{k}(\left|\xi\right|c)^{k}}{k!}\right]=\frac{1}{2}\sum_{k=0}^{\infty}2\frac{(\left|\xi\right|c)^{2k}}{(2k)!}&\leq&\displaystyle\sum_{k=0}^{\infty}\frac{(\left|\xi\right|c)^{2k}}{2^{k}k!}\\&=&\displaystyle\exp\left(\frac{\xi^{2}c^{2}}{2}\right)\end{array},

which completes the proof. \Box

Note that if X \in [a,b], almost surely, where a,b are finite constants, then the preceding argument gives us the upper bound

\displaystyle\mathbb{E}[e^{\xi X}]\leq\exp\left(\frac{\xi^{2}(b-a)^{2}}{8}\right)

We can use Hoeffding’s inequality to obtain a special case of the Azuma-Hoeffding concentration inequality. Let X_{1},\cdots,X_{n} be independent random variables where X_{j} almost surely lies in the compact interval [a_{j},b_{j}] for 1 \leq j \leq n. Then by Markov’s inequality and independence,

\begin{array}{lcl}\displaystyle\mathbb{P}\left(S_{n}-\mathbb{E}[S_{n}]\geq t\right)=\mathbb{P}\left(e^{\xi(S_{n}-\mathbb{E}[S_{n}])}\geq\displaystyle e^{\xi t}\right) &\leq&\displaystyle e^{-\xi t}\mathbb{E}\left[e^{\xi(S_{n}-\mathbb{E}[S_{n}])}\right]\\&=&\displaystyle e^{-\xi t}\prod_{j=1}^{n}\mathbb{E}\left[e^{\xi(X_{j}-\mathbb{E}[X_{j}])}\right]\end{array}

for all \xi > 0. Applying Hoeffding’s inequality to the random variables X_{1}-\mathbb{E}[X_{1}],\cdots,X_{n}-\mathbb{E}[X_{n}] to obtain

\begin{array}{lcl}\displaystyle\mathbb{P}\left(S_{n}-\mathbb{E}[S_{n}]\geq t\right)&\leq&\displaystyle e^{-\xi t}\prod_{j=1}^{n}e^{\frac{\xi^{2}(b_{j}-a_{j})^{2}}{8}}\\&=&\displaystyle\exp\left(-\xi t+\xi^{2}\sum_{j=1}^{n}\frac{(b_{j}-a_{j})^{2}}{8}\right)\end{array}

Set \alpha := \sum_{j=1}^{n}\frac{(b_{j}-a_{j})^{2}}{8} and consider the function f(\xi) = e^{-\xi t + \xi^{2}\alpha}. Since the upper f(\xi) is an upper bound for the LHS above for all \xi \geq 0, we now to turn to finding the minimizing value \xi. Completing the square, we can write

\displaystyle\alpha \xi^{2}-\xi t = \alpha\left(\xi - \frac{t}{2\alpha}\right)^{2} - \frac{t^{2}}{4\alpha} \Longrightarrow f\left(\frac{t}{2\alpha}\right) \leq f(\xi), \indent\forall \xi \geq 0

Hence,

\begin{array}{lcl}\displaystyle\mathbb{P}\left(S_{n}-\mathbb{E}[S_{n}]\geq t\right)\leq\exp\left(-\dfrac{t^{2}}{4\sum_{j=1}^{n}\frac{(b_{j}-a_{j})^{2}}{8}}\right)&=&\exp\left(-\dfrac{t^{2}}{\sum_{j=1}^{n}\frac{(b_{j}-a_{j})^{2}}{2}}\right)\\&=&\displaystyle\exp\left(-\dfrac{2t^{2}}{\sum_{j=1}^{n}(b_{j}-a_{j})^{2}}\right)\end{array}

Applying the same argument to \mathbb{E}[S_{n}]-S_{n}, we also have the upper bound

\displaystyle\mathbb{P}\left(\mathbb{E}[S_{n}]-S_{n}\geq t\right) \leq \exp\left(-\dfrac{2t^{2}}{\sum_{j=1}^{n}(b_{j}-a_{j})^{2}}\right)

Combining these two results, we obtain the following tail bound for \left|S_{n}-\mathbb{E}[S_{n}]\right|:

\displaystyle\mathbb{P}\left(\left|S_{n}-\mathbb{E}[S_{n}]\right| \geq t\right) = \mathbb{P}\left(S_{n} -\mathbb{E}[S_{n}] \geq t\right) + \mathbb{P}\left(S_{n}-\mathbb{E}[S_{n}] \leq -t\right) \leq 2\exp\left(-\dfrac{2t^{2}}{\sum_{j=1}^{n}(b_{j}-a_{j})^{2}}\right)

We summarize these results with the following theorem.

Theorem 2. (Hoeffding)

Suppose X_{1},\cdots,X_{n} are independent random variables, where X_{j} \in [a_{j},b_{j}] \ \forall 1 \leq j \leq n, almost surely. If S_{n} := \sum_{j=1}^{n}X_{j}, then

\displaystyle\mathbb{P}\left(S_{n}-\mathbb{E}[S_{n}]\geq t\right)\leq \exp\left(-\dfrac{2t^{2}}{\sum_{j=1}^{n}(b_{j}-a_{j})^{2}}\right)

and

\displaystyle\mathbb{P}\left(\left|S_{n}-\mathbb{E}[S_{n}]\right|\geq t\right)\leq 2\exp\left(-\dfrac{2t^{2}}{\sum_{j=1}^{n}(b_{j}-a_{j})^{2}}\right), \indent \forall t> 0

We now use Hoeffding’s inequality to obtain a concentration inequality, apparently known as Khintchine’s inequality or proved in some form by Khintchine, for finite sums with random signs.

Theorem 3 (Khintchine’s Inequality). Suppose X_{1},\cdots,X_{n} are i.i.d. random variables with \mathbb{P}(X_{j} = 1)=\frac{1}{2}=\mathbb{P}(X_{j}=-1), and a_{1},\cdots,a_{n}\in\mathbb{R} are arbitrary real numbers. If Z := \sum_{j=1}^{n}a_{j}X_{j} and \sigma_{n}^{2} := \sum_{j=1}^{n}a_{j}^{2}, then

\displaystyle\mathbb{P}\left(\left|Z\right|\geq\lambda\right)\leq 2\exp\left(-\frac{\lambda^{2}}{2\sigma_{n}^{2}}\right),\indent\forall\lambda >0,\forall n \geq 1

Furthermore, for p \in (1,\infty), there exist finite constants A_{p}, B_{p} > 0 independent of a_{1},\cdots,a_{n} such that

\displaystyle A_{p}\sigma_{n}\leq\left\|Z\right\|_{p}\leq B_{p}\sigma_{n}

Proof. To obtain the tail bound, we apply Hoeffding’s inequality with X_{j} \in [-\left|a_{j}\right|,\left|a_{j}\right|] to obtain

\displaystyle\mathbb{P}\left(\left|Z\right| \geq \lambda\right) \leq 2\exp\left(-\dfrac{2\lambda^{2}}{\sum_{j=1}^{n}(2\left|a_{j}\right|)^{2}}\right) = 2\exp\left(-\dfrac{\lambda^{2}}{\sum_{j=1}^{n}2a_{j}^{2}}\right)=2\exp\left(-\dfrac{\lambda^{2}}{2\sigma_{n}^{2}}\right)

We now turn to proving the second assertion. Using the “layer-cake” representation of the expected value, we have

\displaystyle\left\|Z\right\|_{p}^{p} = p\int_{0}^{\infty}\lambda^{p-1}\mathbb{P}\left(\left|Z\right|>\lambda\right)d\lambda\leq p\int_{0}^{\infty}\lambda^{p-1}2\exp\left(-\frac{\lambda^{2}}{2\sigma_{n}^{2}}\right)d\lambda

We make the change of variable \tau = \frac{\lambda}{\sigma_{n}} to obtain

\displaystyle\left\|Z\right\|_{p}^{p}\leq 2p\sigma_{n}\int_{0}^{\infty}(\tau \sigma_{n})^{p-1}\exp\left(-\frac{\tau^{2}}{2}\right)d\tau=\sigma_{n}^{p}\underbrace{2p\int_{0}^{\infty}\tau^{p-1}\exp\left(-\frac{\tau^{2}}{2}\right)d\tau}_{:=B_{p}^{p}}

Taking the p^{th} root of both sides completes the proof of the upper bound. For the lower bound, first note by Hölder’s inequality, \left\|Z\right\|_{2} \leq \left\|Z\right\|_{p}\left\|Z\right\|_{q}, where \frac{1}{p}+\frac{1}{q} = 1. Since the X_{j} are independent, we have that \left\|Z\right\|_{2} = \sigma_{n} and therefore

\displaystyle\sigma_{n}B_{q}^{-1}\leq\sigma_{n}\left\|Z\right\|_{q}^{-1}=\left\|Z\right\|_{2}\left\|Z\right\|_{q}^{-1}\leq\left\|Z\right\|_{p}

Taking A_{p} := B_{q}^{-1} completes the proof. \Box

I leave it as an exercise to the reader to the prove the following generalization of Hoeffding’s inequality:

Theorem 4. Let V, Z be random variables such that \mathbb{E}[V\mid Z] =0 a.e. and

\displaystyle\mathbb{P}\left(f(Z)\leq V\leq f(Z)+c\right) = 1

for some measurable function f: \mathbb{R} \rightarrow \mathbb{R} and some constant c > 0. Then

\displaystyle\mathbb{E}[e^{\xi V}\mid Z] \leq \exp\left(\dfrac{\xi^{2}c^{2}}{8}\right), \indent \forall \xi \in\mathbb{R}

Advertisements
This entry was posted in math.PR and tagged . Bookmark the permalink.

One Response to (Azuma-)Hoeffding Inequality

  1. Pingback: Azuma-Hoeffding Inequality | Math by Matt

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s