Wald Interval Coverage Probability

For work, I’ve needed to use confidence intervals for the sample proportion of a binomially distributed random variable, in particular for “small samples.” In the interest of concreteness, I am going to motivate this problem with an example.

Suppose we have a weighted coin which, when flipped, comes up heads with probability p and tails with probability q=1-p. We don’t know the actual value of p, only that it lies between 0 and 1. To estimate p, we start flipping our coin. If the outcome of the i^{th} flip X_{i} is heads, we record a 1; if tails, a 0. After n flips, we observe the number of heads, which we can write as

\displaystyle X=X_{1}+\cdots+X_{n}

Obviously, the value of X is an integer between 0 and n, including the endpoints. There is no reason to suspect that the outcome of one flip affects the outcome of another flip; in other words, the trials are independent. So the probability of any specific sequence of heads and tails depends only on the number of heads k. For instance, if n = 10 and k=4, then

\displaystyle\mathbf{P}(HHHHTTTTTT)=\mathbf{P}(HHTTHTTTTH)=p^{4}(1-p)^{6}

There are {10\choose 4}=\frac{10!}{6!4!} sequences with four heads, hence the probability that four of the ten flips are heads is

\displaystyle\mathbf{P}\left(X=4\right)={10\choose4}p^{4}(1-p)^{6}

In general, the probability that k of the n flips are heads is

\displaystyle\mathbf{P}(X=k)={n\choose k}p^{k}(1-p)^{n-k}

By now, you might have surmised that X is a binomially distributed random variable with parameters n and p. More compactly, X\sim\text{Bin}(n,p).
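As a quick sanity check of this pmf, here is a Python sketch (the values n = 10 and p = 1/2 are illustrative choices, not from the example above):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), straight from the formula above."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# With a fair coin (p = 1/2), every sequence of 10 flips has probability
# 2^-10, and C(10, 4) = 210 of those sequences contain exactly 4 heads.
print(binom_pmf(4, 10, 0.5))   # 210/1024 ≈ 0.2051
# The pmf sums to 1 over the support {0, 1, ..., n}.
print(sum(binom_pmf(k, 10, 0.5) for k in range(11)))
```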

Returning to our original problem, we estimate the value of p by computing the sample proportion of heads \hat{p}=\frac{X}{n}. The sample proportion is itself a random variable. It takes values in the set \left\{0,\frac{1}{n},\cdots,\frac{n-1}{n},1\right\} and has probability mass function

\displaystyle\mathbf{P}\left(\hat{p}=\frac{k}{n}\right)={n\choose k}p^{k}(1-p)^{n-k},\quad\forall k=0,\ldots,n

Using the mean and variance of the binomial distribution and the linearity of expectation, we see that the mean and variance of \hat{p} are respectively given by

\displaystyle\mathbb{E}[\hat{p}]=\dfrac{1}{n}\mathbb{E}[X]=p\text{ and }\displaystyle\text{var}(\hat{p})=\frac{1}{n^{2}}\text{var}(X)=\dfrac{p(1-p)}{n}
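These two identities can be verified numerically by summing over the pmf of \hat{p}; below is a Python sketch (the values n = 20, p = 0.3 are my own illustrative choices):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.3  # illustrative values
support = [k / n for k in range(n + 1)]            # values of p_hat
weights = [binom_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance of p_hat computed directly from its pmf.
mean = sum(x * w for x, w in zip(support, weights))
var = sum((x - mean) ** 2 * w for x, w in zip(support, weights))

print(mean)  # equals p = 0.3
print(var)   # equals p(1-p)/n = 0.0105
```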

The sample proportion \hat{p} provides us with a point estimate of the coin’s parameter p, but a point estimate alone is not terribly helpful if it is inaccurate. Hence, we seek a measure of the error of the estimator \hat{p}. One technique is to construct a random interval, based on the observed data, that contains the true value p with high probability. We call such an interval a confidence interval and refer to the probability that the interval contains the true value of the parameter as the coverage probability. Typically, we specify a nominal significance level \alpha\in(0,1) and then say that the confidence of the interval is (1-\alpha)\times 100\%. A common value is 95%.

Most introductory statistics textbooks (or at least all the ones I have seen) take the approach that, by the central limit theorem, \hat{p} is approximately normally distributed with mean p and variance \frac{p(1-p)}{n}, where the approximation improves as n\rightarrow\infty. Not knowing p, we also do not know the standard deviation of \hat{p}. Instead, we estimate the standard deviation by substituting \hat{p} for p and obtain an approximate pivotal quantity

\displaystyle\dfrac{\hat{p}-p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}\overset{\text{approx.}}{\sim}N(0,1)

If our significance level is \alpha, then the idea is to use the approximation

\begin{array}{lcl}\displaystyle\mathbf{P}\left(z_{\frac{\alpha}{2}}<\dfrac{\hat{p}-p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}<z_{1-\frac{\alpha}{2}}\right)&=&\displaystyle\mathbf{P}\left(p\in\left(\hat{p}+z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\hat{p}+z_{1-\frac{\alpha}{2}}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)\right)\\&\approx&\displaystyle 1-\alpha,\end{array}

where z_{\frac{\alpha}{2}} and z_{1-\frac{\alpha}{2}} are the values of the inverse standard normal cumulative distribution function \Phi^{-1} evaluated at \frac{\alpha}{2} and 1-\frac{\alpha}{2}, respectively. Since z_{\frac{\alpha}{2}}=-z_{1-\frac{\alpha}{2}} by the symmetry of the standard normal distribution, we are led to the Wald interval
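The symmetry claim is easy to check numerically; in this Python sketch, `statistics.NormalDist` plays the role of \Phi and \Phi^{-1}:

```python
from statistics import NormalDist

alpha = 0.05
z_lo = NormalDist().inv_cdf(alpha / 2)      # z_{alpha/2}
z_hi = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}

# The two quantiles are negatives of each other; for alpha = 0.05
# the familiar critical value is about 1.96.
print(z_lo, z_hi)
```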

\displaystyle\left(\hat{p}-z_{1-\frac{\alpha}{2}}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}},\hat{p}+z_{1-\frac{\alpha}{2}}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\right).
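Putting the pieces together, here is a Python sketch of the Wald interval (`wald_interval` is a hypothetical helper, and the data, 40 heads in 100 flips at 95% confidence, are made up for illustration):

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(x, n, alpha=0.05):
    """Wald confidence interval for p given x successes in n trials."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    se = sqrt(p_hat * (1 - p_hat) / n)       # estimated standard error
    return p_hat - z * se, p_hat + z * se

# Hypothetical data: 40 heads in 100 flips, alpha = 0.05.
lo, hi = wald_interval(40, 100)
print(round(lo, 3), round(hi, 3))  # 0.304 0.496
```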

One cannot stress enough the fact that the distribution used to compute the Wald interval relies on an approximation, the appropriateness of which depends on the sample size n. The expression “sufficiently large” is qualitative and vague and therefore useless. Quantitative conditions for the appropriateness of the Wald interval include

  1. np\geq5\text{ and }n(1-p)\geq5
  2. np\geq10\text{ and }n(1-p)\geq10
  3. n\geq50

All three require that n not be too small, but the first two also require that p not be too close to the endpoints 0 and 1 unless n is correspondingly large. Students seem to accept these prescriptions without question; I certainly did in my AP Statistics class in high school. After all, if a far more learned professor tells you the rule of thumb, what basis do you have to doubt him or her?

I wrote a little R script to calculate the coverage probability for the Wald interval for fixed sample size n\geq1 and varying population parameter p\in[0,1]. Below are some graphs to illustrate the poor performance, as measured in coverage probability, of the Wald interval for sample sizes considered sufficiently large by many statistics textbooks. 

[Plot: Wald interval coverage probability, n = 30]

[Plot: Wald interval coverage probability, n = 40]

[Plot: Wald interval coverage probability, n = 50]
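The R script that generated these plots appears below. As a cross-check, the same exact coverage sum can be sketched in Python for a single (n, p) pair (`wald_coverage` and the values n = 50, p = 0.05 are my illustrative choices):

```python
from math import comb, sqrt
from statistics import NormalDist

def wald_coverage(n, p, alpha=0.05):
    """Exact coverage probability of the Wald interval at true parameter p:
    sum the Bin(n, p) pmf over every outcome k whose interval contains p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    cov = 0.0
    for k in range(n + 1):
        p_hat = k / n
        se = sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z * se <= p <= p_hat + z * se:
            cov += comb(n, k) * p**k * (1 - p)**(n - k)
    return cov

# n = 50 satisfies rule of thumb 3, yet near the endpoint p = 0.05 the
# coverage falls well short of the nominal 95%, while near p = 0.5 it
# comes much closer.
print(wald_coverage(50, 0.05))
print(wald_coverage(50, 0.50))
```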

wald.coverage <- function(n, a, subtitle.str) {
  # Grid of candidate values of the true parameter p, and the critical
  # value z_{1 - a/2} of the standard normal distribution.
  p <- seq(0, 1, by = 1 / 20000)
  z <- qnorm(1 - a / 2)
  coverage <- numeric(length(p))
  for (i in seq_along(p)) {
    # Start from 1 and subtract the probability of each outcome k whose
    # Wald interval fails to contain p[i]; what remains is the exact
    # coverage probability at p[i].
    coverage[i] <- 1
    for (k in 0:n) {
      p_hat <- k / n
      se <- sqrt(p_hat * (1 - p_hat) / n)
      if (p[i] < p_hat - z * se || p[i] > p_hat + z * se) {
        coverage[i] <- coverage[i] - dbinom(k, n, p[i])
      }
    }
  }
  plot(p, coverage, type = "l", main = "Wald Interval Coverage Probability",
       xlab = "p (population parameter)", ylab = "coverage probability",
       ylim = c(0.8, 1), xaxp = c(0, 1, 20), yaxp = c(0.8, 1, 20))
  abline(h = 1 - a)  # nominal coverage level
  title(sub = subtitle.str)
  # Example call: wald.coverage(50, 0.05, "n = 50")
}


This entry was posted in math.PR, math.ST, Problem Solving.

3 Responses to Wald Interval Coverage Probability

  1. jayden says:

    I found your blog on google and also test some of your early posts. Your site is just wonderful.

  2. Stanley Kay says:

    I find this material a good treatment on binomial distribution topic. By the coverage probability plots, are you trying to show that the curve is far away from a normal/gaussian distribution? I am kind of lost on that point. Also kindly share the R code thanks.

    • Matt R. says:

      Hi Stanley,

      My apologies for getting back to you so late; I’ve been very busy. In the plots, I am showing that even for n large by “rule of thumb” standards, the Wald interval has poor coverage probability if the actual parameter p is close to 0 or 1.

      I’ve uploaded the R code to the site and added a link in the post.

      Best Regards,
      Matt
