Binomial proportion confidence interval: Difference between revisions

add equivalent formulas for normal approx. and Wilson score intervals obtained by multiplying the numerator and denominator by n, as these are often more computationally convenient; edit description to introduce n_S = np and n_F = n(1-p)
The approximation is usually justified by the [[central limit theorem]]. The formula is
 
: <math>\hat p \pm z \sqrt{\frac{1}{n}\hat p \left(1 - \hat p \right)}</math>
 
or, equivalently
 
: <math>\frac{1}{n} \left[ n_S \pm z \sqrt{\frac{1}{n} n_S n_F} \right]</math>
 
where <math>\hat p = n_S / n</math> is the proportion of successes in a [[Bernoulli trial]] process with <math>n</math> trials yielding <math>n_S</math> successes and <math>n_F = n - n_S</math> failures, and <math>z</math> is the <math>\scriptstyle 1 - \tfrac{1}{2}\alpha</math> [[quantile]] of a [[standard normal distribution]], corresponding to the target error rate <math>\alpha</math>. For example, for a 95% confidence level the error (<math>\alpha</math>) is 5%&nbsp;=&nbsp;0.05, so <math>\scriptstyle 1 - \tfrac{1}{2}\alpha</math>&nbsp;=&nbsp;0.975 and <math>z</math>&nbsp;=&nbsp;1.96.
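As an illustration, the two equivalent forms of the normal approximation interval can be sketched in Python. This is a minimal sketch rather than a reference implementation: the function names <code>wald_interval</code> and <code>wald_interval_from_proportion</code> are hypothetical, and <math>z</math> is taken from the standard library's <code>statistics.NormalDist</code>.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(n_s, n, alpha=0.05):
    """Normal approximation interval in count form (hypothetical helper).

    n_s is the number of successes, n the number of trials.
    """
    if n <= 0 or not 0 <= n_s <= n:
        raise ValueError("need n > 0 and 0 <= n_s <= n")
    n_f = n - n_s                            # number of failures
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1 - alpha/2 quantile
    half_width = z * sqrt(n_s * n_f / n)
    return (n_s - half_width) / n, (n_s + half_width) / n


def wald_interval_from_proportion(p_hat, n, alpha=0.05):
    """The same interval written in terms of the sample proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    lo, hi = wald_interval(40, 100)
    print(f"{lo:.3f} {hi:.3f}")  # prints "0.304 0.496"
```

With 40 successes in 100 trials both functions give approximately (0.304, 0.496). When <math>n_S</math> is 0 or <math>n</math>, the half-width is zero and the interval collapses to a single point.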
 
The [[central limit theorem]] applies poorly to this distribution when the sample size is less than 30 or when the proportion is close to 0 or 1. The normal approximation fails totally when the sample proportion is exactly zero or exactly one. A frequently cited rule of thumb is that the normal approximation is reasonable as long as <math>n_S</math>&nbsp;>&nbsp;5 and <math>n_F</math>&nbsp;>&nbsp;5. Even this is unreliable in many cases; see Brown et al. 2001.<ref name=Brown2001>
{{Cite journal
| last1 = Brown
: <math>\left\{ \theta \bigg| y \le \frac{\hat p - \theta}{\sqrt{\frac{1}{n}\hat p \left(1 - \hat p\right)}} \le z \right\}</math>
 
where <math>y</math> is the <math>\scriptstyle \tfrac{1}{2}\alpha</math> [[quantile]] of a [[standard normal distribution]].
 
Since the test in the middle of the inequality is a [[Wald test]], the normal approximation interval is sometimes called the [[Abraham Wald|Wald]] interval, but [[Pierre-Simon Laplace]] first described it in his 1812 book ''Théorie analytique des probabilités'' (page 283).
}
\right]
</math>
 
or, equivalently
 
:<math>
\frac{1}{n + z^2}
\left[
n_S + \frac{1}{2} z^2 \pm
z \sqrt{
\frac{1}{n} n_S n_F +
\frac{1}{4}z^2
}
\right]
</math>
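
The count form of the Wilson score interval translates directly into code. The following Python sketch is illustrative only: the function name <code>wilson_interval</code> is hypothetical, and <math>z</math> is computed with the standard library's <code>statistics.NormalDist</code>.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(n_s, n, alpha=0.05):
    """Wilson score interval in count form (hypothetical helper).

    n_s is the number of successes, n the number of trials.
    """
    if n <= 0 or not 0 <= n_s <= n:
        raise ValueError("need n > 0 and 0 <= n_s <= n")
    n_f = n - n_s                            # number of failures
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1 - alpha/2 quantile
    center = n_s + z * z / 2
    half_width = z * sqrt(n_s * n_f / n + z * z / 4)
    denom = n + z * z
    return (center - half_width) / denom, (center + half_width) / denom


if __name__ == "__main__":
    lo, hi = wilson_interval(40, 100)
    print(f"{lo:.3f} {hi:.3f}")  # prints "0.309 0.498"
```

With zero successes in 10 trials, this sketch returns an interval from 0 to about 0.278 rather than a single point.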
 
</math>
 
can be shown to be a weighted average of <math>\hat{p} = \scriptstyle \tfrac{n_S}{n}</math> and <math>\scriptstyle \tfrac{1}{2}</math>, with <math>\hat{p}</math> receiving greater weight as the sample size increases. For the 95% interval, the Wilson interval is nearly identical to the normal approximation interval using <math>\tilde p \,=\, \scriptstyle \tfrac{n_S + 2}{n + 4}</math> instead of <math>\hat{p}</math>.
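
The weighted-average claim can be checked numerically. The Python sketch below assumes that the expression referred to is the midpoint of the Wilson interval, <math>(n_S + \tfrac{1}{2}z^2)/(n + z^2)</math>, with weights <math>n/(n + z^2)</math> on <math>\hat{p}</math> and <math>z^2/(n + z^2)</math> on <math>\tfrac{1}{2}</math>. All function names are hypothetical.

```python
from statistics import NormalDist


def wilson_center(n_s, n, z):
    """Midpoint of the Wilson interval in count form (assumed expression)."""
    return (n_s + z * z / 2) / (n + z * z)


def weighted_average_center(n_s, n, z):
    """The same midpoint written as a weighted average of p_hat and 1/2."""
    weight = n / (n + z * z)  # weight on p_hat, tends to 1 as n grows
    return weight * (n_s / n) + (1 - weight) * 0.5


def plus_four_estimate(n_s, n):
    """Adjusted proportion (n_s + 2) / (n + 4), i.e. z*z rounded to 4."""
    return (n_s + 2) / (n + 4)


if __name__ == "__main__":
    z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
    print(wilson_center(40, 100, z))
    print(weighted_average_center(40, 100, z))
    print(plus_four_estimate(40, 100))
```

For 40 successes in 100 trials the first two values agree up to rounding error, and the adjusted proportion differs from them only in the fourth decimal place, because <math>z^2</math> is about 3.84 at the 95% level.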
 
===Wilson score interval with continuity correction===