## Fun with Generating Functions: Runs in Coin Flips

Well over a year ago, I uploaded some notes on using martingales to compute the expected waiting time for a pattern to appear in a sequence of independent, identically distributed trials. For example, if a monkey randomly hits a key on a typewriter every second, where all keys are equally probable, how long until the monkey types the string “abracadabra.” In today’s post, I want to consider the special case of tossing a coin until we achieve a run of $r$ consecutive heads (or tails, by symmetry). We’re going to attack this problem combinatorally, using generating functions to solve a recurrence relation.

The problem inspiring this post is Theorem 1.19 in Anirban Dasgupta’s text Probability and Statistics for Machine Learning. Many thanks to the author.

Suppose a coin which comes up heads with probability $p$ is tossed until $r$ consecutive heads have been returned. Furthermore, suppose that each toss is independent of the other tosses. Denote the number of tosses by $N=N_{r}$. Then

$\displaystyle\mathbf{E}[N]=\sum_{k=1}^{r}p^{-k}\text{ and }\text{Var}(N)=\dfrac{1-p^{1+2r}-qp^{r}(1+2r)}{q^{2}p^{2r}}$

We introduce the notation $p_{k}=\mathbf{P}(N=k)$, for $k=0,1,\ldots$. It is trivial to say that we must have at least $r$ flips to obtain $r$ heads; therefore, $p_{0}=\cdots=p_{r-1}=0$. If the first $r$ coin flips are heads, then by our hypothesis that the coin flips are independent, we have that $p_{r}=p^{r}$

For general $k>r$, we see that $N=k$ implies that the last $r$ tosses are heads and the $(k-r)^{th}$ toss is tails. Furthermore, there is no run of heads of length $r$ in the first $k-1$ tosses. Denote the outcomes of the $k$ tosses by the random variables $X_{1},\ldots,X_{k}$. We attack this problem by iteratively conditioning on the first $r$ tosses $X_{1},\ldots,X_{r}$.

Observe that if the outcome of the $j^{th}$ toss is tails and there is no run of $r$ heads in $X_{1},\ldots,X_{j-1}$, then we start over in terms of waiting for a run of $r$ heads. So, for $1\leq j\leq k-r$,

$\displaystyle p_{k}=\mathbf{P}\left(N=k\mid X_{j}=T\right)\mathbf{P}\left(X_{j}=T\right)=qp_{k-j}$

where $q=1-p$ is the probability of tails. Using the law of total probabiltiy, we see that

$\begin{array}{lcl}\displaystyle p_{k}&=&\displaystyle\mathbf{P}\left(N=k\mid X_{1}=H\right)\mathbf{P}\left(X_{1}=H\right)+\mathbf{P}\left(N=k\mid X_{1}=T\right)\mathbf{P}\left(X_{1}=T\right)\\[2 em]&=&\displaystyle qp_{k-1}+p\mathbf{P}\left(N=k\mid X_{1}=H\right)\end{array}$

To simplify the second term, we condition on the outcome of the second flip:

$\begin{array}{lcl}\displaystyle p_{k}&=&\displaystyle qp_{k-1}+p\left[\mathbf{P}\left(N=k\mid X_{1}=H,X_{2}=H\right)\mathbf{P}(X_{2}=H)+\mathbf{P}\left(N=k\mid X_{1}=H,X_{2}=T\right)\mathbf{P}(X_{2}=T)\right]\\[2 em]&=&\displaystyle qp_{k-1}+pqp_{k-2}+p\mathbf{P}\left(N=k\mid X_{1}=H,X_{2}=H\right)\end{array}$

We only need to condition on the first $r$ flips because if $X_{1}=\cdots=X_{r}=H$, then

$\displaystyle\mathbf{P}(N=k \mid X_{1}=\cdots=X_{r}=H)=\begin{cases} 1 & {k=r}\\ 0 & {k>r}\end{cases}$

We conclude that

$\displaystyle p_{k}=qp_{k-1}+pqp_{k-2}+\cdots+p^{r-1}qp_{k-r}$

Our problem is now how to solve this recurrence relation. Well, we can solve it using generating functions. Since $\sum p_{k}\leq1$, the power series $G(z)=\sum p_{k}z^{k}$ converges on the closed unit disk. We can write

$\begin{array}{lcl}\displaystyle G(z)=p_{r}z^{r}+\sum_{k=r+1}^{\infty}p_{k}z^{k}&=&\displaystyle p_{r}z^{r}+\sum_{k=r+1}^{\infty}\left[qp_{k-1}+pqp_{k-2}+\cdots+p^{r-1}qp_{k-r}\right]z^{k}\\[2 em]&=&\displaystyle p_{r}z^{r}+qzG(z)+pqz^{2}G(z)+\cdots+p^{r-1}qz^{r}G(z)\end{array}$

Rearranging, we can write $G(z)$ as

$\displaystyle G(z)\left[1-(qz+pqz^{2}+\cdots+p^{r-1}qz^{r})\right]=p_{r}z^{r}\Longrightarrow G(z)=\dfrac{p^{r}z^{r}}{1-qz(1+pz+\cdots+p^{r-1}z^{r-1})}$

We can simplify the expression for $G(z)$ by summing the geometric series $1+pz+\cdots+(pz)^{r-1}$ to obtain

$\displaystyle G(z)=\dfrac{p^{r}z^{r}}{1-qz\frac{1-(pz)^{r}}{1-pz}}=\dfrac{p^{r}z^{r}(1-pz)}{1-pz-qz+qz(pz)^{r}}=\dfrac{p^{r}z^{r}(1-pz)}{1-z+qp^{r}z^{r+1}}$,

where we use the fact that $p+q=1$.

Observe that $G(z)$ is a meromorphic function. Its only pole (i.e., where the denominator is zero) occurs where

$\displaystyle1+qp^{r}z^{r}=0\Longleftrightarrow z=\dfrac{(-1)^{r}}{q^{\frac{1}{r}}p}$

Since $q^{\frac{1}{r}}p<1$, we see that $G$ is holomorphic on the open disk $D(0;1+\varepsilon)$, for $\varepsilon>0$ sufficiently small. Hence, complex analysis tells us that the $n^{th}$ derivative of $G$ on the disk $D(0;1+\varepsilon)$ is given by differentiating the power series term-by-term $n$ times.

$\displaystyle G'(z)=\dfrac{p^{r}rz^{r-1}-(r+1)p^{r+1}z^{r}}{1-z+qp^{r}z^{r+1}}-\dfrac{p^{r}z^{r}(1-pz)\left(-1+(r+1)qp^{r}z^{r}\right)}{(1-z+qp^{r}z^{r+1})^{2}}$

Evaluating at $z=1$, we obtain that

$\begin{array}{lcl}\displaystyle G'(1)=\dfrac{p^{r}r-(r+1)p^{r+1}}{qp^{r}}-\dfrac{p^{r}(1-p)(-1+(r+1)qp^{r})}{q^{2}p^{2r}}&=&\displaystyle\dfrac{r-(r+1)p}{q}-\dfrac{(1-p)(-1+(r+1)qp^{r})}{q^{2}p^{r}}\\[1 em]&=&\displaystyle\dfrac{r-(r+1)p}{q}+\dfrac{1}{qp^{r}}-(r+1)\\[2 em]&=&\displaystyle\dfrac{p^{-r}-1}{1-p}\\[2 em]&=&\displaystyle\sum_{k=1}^{r}p^{-k}\end{array}$

Hence,

$\displaystyle\mathbf{E}[N]=\sum_{k=r}^{\infty}kp_{k}=G'(1)=\sum_{k=1}^{r}p^{-k}$

To compute the variance of $N$, we note that

$\displaystyle G''(1)=\sum_{k=r}^{\infty}k(k-1)p_{k}=\mathbf{E}\left[N^{2}-N\right]$

$\begin{array}{lcl}\displaystyle G''(z)&=&\displaystyle\dfrac{\left[p^{r}rz^{r-1}-(r+1)p^{r+1}z^{r}\right]\left(-1+(r+1)qp^{r}z^{r}\right)}{\left(1-z+qp^{r}z^{r+1}\right)^{2}}-\dfrac{p^{r}r(r-1)z^{r-2}-(r+1)rp^{r+1}z^{r-1}}{1-z+qp^{r}z^{r+1}}\\[2 em]&+&\displaystyle\dfrac{2p^{r}z^{r}(1-pz)\left(-1+(r+1)qp^{r}z^{r}\right)^{2}}{(1-z+qp^{r}z^{r+1})^{3}}-\dfrac{(r+1)p^{r+1}z^{r}-rp^{r}z^{r-1}+2r(r+1)qp^{2r}z^{2r-1}-(2r+1)(r+1)p^{2r+1}qz^{2r}}{\left(1-z+qp^{r}z^{r+1}\right)^{2}}\end{array}$

Evaluating at $z=1$, we see that

$\begin{array}{lcl}\displaystyle G''(1)&=&\displaystyle-\dfrac{\left(p^{r}r-(r+1)p^{r+1}\right)\left(-1+(r+1)qp^{r}\right)}{q^{2}p^{2r}}+\dfrac{r(r-1)p^{r}-(r+1)rp^{r+1}}{qp^{r}}\\[2 em]&+&\displaystyle\dfrac{2p^{r}(1-p)\left(-1+(r+1)qp^{r}\right)^{2}}{q^{3}p^{3r}}-\dfrac{(r+1)p^{r+1}-rp^{r}+2r(r+1)qp^{2r}-(2r+1)(r+1)p^{2r+1}q}{q^{2}p^{2r}}\\[2 em]&=&\displaystyle-\dfrac{\left(r-p(r+1)\right)\left(-1+(r+1)qp^{r}\right)}{q^{2}p^{r}}-r\dfrac{p(r+1)-(r-1)}{q}\\[2 em]&+&\displaystyle2\dfrac{(-1+(r+1)qp^{r})^{2}}{q^{2}p^{2r}}-\dfrac{p(r+1)-r+2r(r+1)qp^{r}-(2r+1)(r+1)qp^{r+1}}{q^{2}p^{r}}\\[2 em]&=&\displaystyle2\dfrac{r-(r+1)p}{q^{2}p^{r}}-\dfrac{(r-p(r+1))(r+1)}{q}+r(r+1)-\dfrac{2r}{q}+2\dfrac{(-1+(r+1)qp^{r})^{2}}{q^{2}p^{2r}}-\dfrac{2r(r+1)-(2r+1)(r+1)p}{q}\\[2 em]&=&\displaystyle2\dfrac{r-(r+1)p}{q^{2}p^{r}}+\dfrac{(r+1)((r+1)p-r)}{q}-2r(r+1)+\dfrac{(r+1)p-2r}{q}+2\dfrac{(-1+(r+1)qp^{r})^{2}}{q^{2}p^{2r}}\\[2 em]&=&\displaystyle2\dfrac{(-1+(r+1)qp^{r})^{2}}{q^{2}p^{2r}}+2\dfrac{r-(r+1)p}{q^{2}p^{r}}-2r(r+1)+2\dfrac{(r+1)p-r}{q}\\[2 em]&=&\displaystyle\dfrac{2-4(r+1)qp^{r}+2(r+1)q^{2}p^{2r}+2rp^{r}-2(r+1)p^{r+1}+2(r+1)qp^{2r+1}-2rqp^{2r}}{q^{2}p^{2r}}\end{array}$

Hence,

$\begin{array}{lcl}\displaystyle\text{Var}(N)&=&G''(1)+\mathbf{E}[N]-\mathbf{E}^{2}[N]\\[2 em]&=&\displaystyle G''(1)+\dfrac{qp^{r}-qp^{2r}-1+2p^{r}-p^{2r}}{q^{2}p^{2r}}\\[2 em]&=&\displaystyle\dfrac{1-(4r+3)qp^{r}+2(r+1)p^{r}-2(r+1)p^{r+1}+2(r+1)q^{2}p^{2r}+2(r+1)qp^{2r+1}-(2r+1)qp^{2r}}{q^{2}p^{2r}-p^{2r}}\\[2 em]&=&\displaystyle\dfrac{1-(2r+1)qp^{r}+2(r+1)qp^{2r}-(2r+1)qp^{2r}-p^{2r}}{q^{2}p^{2r}}\\[2 em]&=&\displaystyle\dfrac{1+(q-1)p^{2r}-(2r+1)qp^{r}}{q^{2}p^{2r}}\\[2 em]&=&\displaystyle\dfrac{1-p^{2r+1}-(2r+1)qp^{r}}{q^{2}p^{2r}}\end{array}$