Well over a year ago, I uploaded some notes on using martingales to compute the expected waiting time for a pattern to appear in a sequence of independent, identically distributed trials. For example, if a monkey randomly hits a key on a typewriter every second, where all keys are equally probable, how long until the monkey types the string “abracadabra.” In today’s post, I want to consider the special case of tossing a coin until we achieve a run of consecutive heads (or tails, by symmetry). We’re going to attack this problem combinatorally, using generating functions to solve a recurrence relation.

The problem inspiring this post is Theorem 1.19 in Anirban Dasgupta’s text *Probability and Statistics for Machine Learning*. Many thanks to the author.

Suppose a coin which comes up heads with probability is tossed until consecutive heads have been returned. Furthermore, suppose that each toss is independent of the other tosses. Denote the number of tosses by . Then

We introduce the notation , for . It is trivial to say that we must have at least flips to obtain heads; therefore, . If the first coin flips are heads, then by our hypothesis that the coin flips are independent, we have that .

For general , we see that implies that the last tosses are heads and the toss is tails. Furthermore, there is no run of heads of length in the first tosses. Denote the outcomes of the tosses by the random variables . We attack this problem by iteratively conditioning on the first tosses .

Observe that if the outcome of the toss is tails and there is no run of heads in , then we start over in terms of waiting for a run of heads. So, for ,

where is the probability of tails. Using the law of total probabiltiy, we see that

To simplify the second term, we condition on the outcome of the second flip:

We only need to condition on the first flips because if , then

We conclude that

Our problem is now how to solve this recurrence relation. Well, we can solve it using generating functions. Since , the power series converges on the closed unit disk. We can write

Rearranging, we can write as

We can simplify the expression for by summing the geometric series to obtain

,

where we use the fact that .

Observe that is a meromorphic function. Its only pole (i.e., where the denominator is zero) occurs where

Since , we see that is holomorphic on the open disk , for sufficiently small. Hence, complex analysis tells us that the derivative of on the disk is given by differentiating the power series term-by-term times.

Evaluating at , we obtain that

Hence,

To compute the variance of , we note that

Evaluating at , we see that

Hence,