Binomial proportion confidence interval

In statistics, a binomial proportion confidence interval is a confidence interval for a proportion in a statistical population. It uses the proportion estimated in a statistical sample and allows for sampling error. There are several formulas for a binomial confidence interval, but all of them rely on the assumption of a binomial distribution. A simple example of a binomial distribution is the number of heads observed when a coin is flipped ten times. In general, a binomial distribution applies when an experiment is repeated a fixed number of times, each trial of the experiment has two possible outcomes (labeled arbitrarily success and failure), the probability of success is the same for each trial, and the trials are statistically independent.

The normal approximation interval is the simplest of these formulas, and the one introduced in most basic statistics classes and textbooks. It is, however, based on an approximation that does not always work well. Several competing formulas perform better, especially when the sample size is small or the proportion is very close to zero or one. The choice of interval depends on how much weight is given to a simple, easy-to-explain formula versus better accuracy.

Normal approximation interval

The simplest and most commonly used formula for a binomial confidence interval relies on approximating the binomial distribution with a normal distribution. This approximation is justified by the central limit theorem. The formula is

: \hat p \pm z_{1-\alpha/2} \sqrt{\frac{\hat p \left( 1 - \hat p \right)}{n}}

where \hat p is the proportion estimated from the statistical sample, z_{1-\alpha/2} is the 1 − α/2 quantile of a standard normal distribution (about 1.96 for a 95% interval, where α = 0.05), and n is the sample size.

The central limit theorem applies well to a binomial distribution, even with a sample size less than 30, as long as the proportion is not too close to 0 or 1. For very extreme probabilities, though, a sample size of 30 or more may still be inadequate. The normal approximation fails totally when the sample proportion is exactly zero or exactly one. A frequently cited rule of thumb is that the normal approximation works well as long as np > 5 and n(1 − p) > 5; see however Brown et al. 2001.
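
The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `wald_interval` and its parameters are chosen here for the example, and the normal quantile is taken from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal approximation (Wald) interval for a binomial proportion.

    Hypothetical helper for illustration; returns (lower, upper).
    The bounds are not clipped to [0, 1].
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1 - alpha/2}
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


print(wald_interval(40, 100))  # 40 successes in 100 trials
print(wald_interval(0, 20))    # no successes: the interval collapses to a point
```

The second call shows the failure mentioned above: when the sample proportion is exactly zero (or one), the estimated standard error is zero and the interval has no width.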

An important theoretical derivation of this confidence interval involves the inversion of a hypothesis test. Under this formulation, the confidence interval represents those values of the population parameter that would have large p-values if they were tested as a hypothesized population proportion. The collection of values θ that such a test based on the normal approximation does not reject can be represented as

: \left\{ \theta \,\bigg|\, z_{\alpha/2} \le \frac{\hat p - \theta}{\sqrt{\hat p \left( 1 - \hat p \right) / n}} \le z_{1-\alpha/2} \right\}.

Since the test in the middle of the inequality is a Wald test, the normal approximation interval is sometimes called the Wald interval.

Wilson score interval

The Wilson interval is an improvement over the normal approximation interval, in that its actual coverage probability is closer to the nominal value. It was first developed by Wilson (1927).

: \frac{\hat p + \frac{1}{2n} z_{1-\alpha/2}^2 \pm z_{1-\alpha/2} \sqrt{\frac{\hat p \left( 1 - \hat p \right)}{n} + \frac{z_{1-\alpha/2}^2}{4n^2}}}{1 + \frac{1}{n} z_{1-\alpha/2}^2}

This interval has good properties even for a small number of trials and/or an extreme probability. The center of the Wilson interval

: \frac{\hat p + \frac{1}{2n} z_{1-\alpha/2}^2}{1 + \frac{1}{n} z_{1-\alpha/2}^2}

can be shown to be a weighted average of \hat p = X/n and 1/2, with \hat p receiving greater weight as the sample size increases. For the 95% interval, the Wilson interval is nearly identical to the normal approximation interval using \tilde p = (X + 2)/(n + 4) instead of \hat p.
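
The weighted-average form follows from multiplying the numerator and denominator of the center by n. In the line below, z abbreviates z_{1-\alpha/2}, and the weight w is a symbol introduced here only for illustration:

: \frac{\hat p + \frac{1}{2n} z^2}{1 + \frac{1}{n} z^2} = \frac{n \hat p + \frac{1}{2} z^2}{n + z^2} = w \, \hat p + (1 - w) \, \frac{1}{2}, \qquad w = \frac{n}{n + z^2}

The weight w tends to 1 as n grows, which is why \hat p dominates in large samples. For a 95% interval, z is about 1.96, so z^2 is about 3.84, or roughly 4; substituting \hat p = X/n then gives a center of approximately (X + 2)/(n + 4).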

The Wilson interval can be derived as

: \left\{ \theta \,\bigg|\, z_{\alpha/2} \le \frac{\hat p - \theta}{\sqrt{\theta \left( 1 - \theta \right) / n}} \le z_{1-\alpha/2} \right\}.

The test in the middle of the inequality is a score test, so the Wilson interval is sometimes called the Wilson score interval.
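
As a rough sketch, the closed-form Wilson interval can be computed as follows. The function name `wilson_interval` and its parameters are assumptions made for this example; only the Python standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion.

    Hypothetical helper for illustration; returns (lower, upper).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1 - alpha/2}
    z2 = z * z
    p_hat = successes / n
    denom = 1.0 + z2 / n
    # Center: weighted average of p_hat and 1/2.
    center = (p_hat + z2 / (2.0 * n)) / denom
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


print(wilson_interval(40, 100))
print(wilson_interval(0, 20))  # still has positive width with zero successes
```

Unlike the normal approximation interval, the result keeps a positive width when the sample proportion is exactly zero or one, because the z^2/(4n^2) term under the square root does not vanish.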

Clopper-Pearson interval

The Clopper-Pearson interval is an early and very common method for calculating exact binomial confidence intervals (Clopper and Pearson 1934). This method uses the cumulative probabilities of the binomial distribution. The Clopper-Pearson interval can be written as

: \left\{ \theta \,\Big|\, P \left[ \mathrm{Bin} \left( n; \theta \right) \le X \right] \ge \alpha/2 \right\} \bigcap \left\{ \theta \,\Big|\, P \left[ \mathrm{Bin} \left( n; \theta \right) \ge X \right] \ge \alpha/2 \right\}

where X is the number of successes observed in the sample and Bin(n; θ) is a binomial random variable with n trials and probability of success θ.

Because of a relationship between the cumulative binomial distribution and the beta distribution, the Clopper-Pearson interval is sometimes presented in an alternate format that uses percentiles from the beta distribution. The beta distribution is, in turn, related to the F-distribution so a third formulation of the Clopper-Pearson interval uses F percentiles.

The Clopper-Pearson interval is an exact interval since it is based directly on the binomial distribution rather than any approximation to the binomial distribution. This interval, however, can be conservative because of the discrete nature of the binomial distribution.
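
The set definition above can be turned into code directly by solving the two tail conditions for θ. The sketch below does this by bisection on the cumulative binomial probabilities, not with the beta or F percentiles mentioned earlier; that choice, and the names `binom_cdf`, `_solve_increasing` and `clopper_pearson_interval`, are assumptions made for this illustration.

```python
from math import comb


def binom_cdf(k, n, theta):
    """P[Bin(n; theta) <= k], summed directly from the probability mass function."""
    return sum(comb(n, i) * theta**i * (1.0 - theta) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the root of an increasing function g with g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson_interval(x, n, confidence=0.95):
    """Clopper-Pearson interval for x successes in n trials; returns (lower, upper)."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    alpha = 1.0 - confidence
    # Lower limit: theta at which P[Bin(n; theta) >= x] equals alpha/2.
    if x == 0:
        lower = 0.0
    else:
        lower = _solve_increasing(lambda t: (1.0 - binom_cdf(x - 1, n, t)) - alpha / 2.0)
    # Upper limit: theta at which P[Bin(n; theta) <= x] equals alpha/2.
    if x == n:
        upper = 1.0
    else:
        upper = _solve_increasing(lambda t: alpha / 2.0 - binom_cdf(x, n, t))
    return lower, upper


print(clopper_pearson_interval(40, 100))
print(clopper_pearson_interval(0, 20))
```

Summing the probability mass function term by term is adequate for small n; for large samples the beta-percentile formulation is likely to be faster and numerically safer.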

Comparison of different intervals

There are several research papers that compare these and other confidence intervals for the binomial proportion. A good starting point is Agresti and Coull (1998) or Ross (2003), both of which point out that exact methods such as the Clopper-Pearson interval may not work as well as certain approximations. The Clopper-Pearson interval is nevertheless still used in many studies.
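
One way such comparisons are made is by computing the actual coverage probability of an interval for a fixed n and true proportion p: the total binomial probability of all outcomes X whose interval contains p. The following sketch does this for the normal approximation and Wilson intervals at the 95% level. The helper names `wald`, `wilson` and `coverage`, and the example values n = 20 and p = 0.05, are chosen here for illustration and are not taken from the cited papers.

```python
from math import comb, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # z_{1 - alpha/2} for alpha = 0.05


def wald(x, n, z=Z95):
    """Normal approximation interval for x successes in n trials."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson(x, n, z=Z95):
    """Wilson score interval for x successes in n trials."""
    p_hat = x / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


def coverage(interval, n, p):
    """Probability, under Bin(n; p), that the interval contains the true p."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n)
        if lower <= p <= upper:
            total += comb(n, x) * p**x * (1.0 - p) ** (n - x)
    return total


n, p = 20, 0.05
print("Wald coverage:  ", coverage(wald, n, p))
print("Wilson coverage:", coverage(wilson, n, p))
```

For this small sample and extreme proportion, the normal approximation interval covers the true value far less often than the nominal 95%, while the Wilson interval comes much closer.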

Web-based calculators

There are numerous web sites that will calculate a binomial proportion confidence interval.

* [http://www.dimensionresearch.com/resources/calculators/conf_prop.html Dimension Research. Confidence Interval for Proportions Calculator] uses the normal approximation.
* [http://faculty.vassar.edu/lowry/prop1.html VassarStats. Confidence Interval of a Proportion] uses the Wilson score interval method.
* [http://www.causascientia.org/math_stat/ProportionCI.html causaScientia. Exact Confidence Interval for a Proportion] uses a Bayesian interval with an uninformative prior distribution.
* [http://www.measuringusability.com/wald.htm Measuring Usability: Confidence Interval for a Completion Rate] computes the Wald, adjusted Wald (Agresti-Coull), exact and score confidence intervals together.

References

* Agresti, A., and Coull, B. Approximate is better than 'exact' for interval estimation of binomial proportions. "The American Statistician" 52: 119-126, 1998.
* Brown, L. D., Cai, T. T., and DasGupta, A. Interval Estimation for a Binomial Proportion. "Statistical Science" 16(2): 101-117, 2001.
* Clopper, C. J., and Pearson, E. S. The use of confidence or fiducial limits illustrated in the case of the binomial. "Biometrika" 26: 404-413, 1934.
* Ross, T. D. Accurate confidence intervals for binomial proportion and Poisson rate estimation. "Computers in Biology and Medicine" 33: 509-531, 2003.
* Wilson, E. B. Probable inference, the law of succession, and statistical inference. "Journal of the American Statistical Association" 22: 209-212, 1927.


Wikimedia Foundation. 2010.
