Coverage probability: Difference between revisions

{{Use dmy dates|date=December 2013}}
In statistics, the '''coverage probability''' of a [[confidence interval]] is the proportion of the time that the interval contains the true value of interest.<ref>Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP. ISBN 0-19-920613-9</ref> For example, suppose our interest is in the [[expected value|mean]] number of months that people with a particular type of [[cancer]] remain in remission following successful treatment with [[chemotherapy]]. The confidence interval aims to contain the unknown mean remission duration with a given probability. This probability is the "confidence level" or "confidence coefficient" of the constructed interval, which is effectively the "nominal coverage probability" of the procedure for constructing confidence intervals. The nominal coverage probability is often set at 0.95. In this example, the ''coverage probability'' is the actual probability that the interval contains the true mean remission duration.
If all assumptions used in deriving a confidence interval are met, the nominal coverage probability will equal the coverage probability (termed "true" or "actual" coverage probability for emphasis). If any assumptions are not met, the actual coverage probability could be either less than or greater than the nominal coverage probability. When the actual coverage probability is greater than the nominal coverage probability, the interval is termed "conservative"; if it is less than the nominal coverage probability, the interval is termed "anti-conservative" or "permissive".
A discrepancy between the coverage probability and the nominal coverage probability frequently occurs when approximating a discrete distribution with a continuous one. The construction of [[Binomial proportion confidence interval|binomial confidence intervals]] is a classic example where coverage probabilities rarely equal nominal levels.<ref>{{cite journal | last = Agresti| first = Alan | coauthors = Coull, Brent | year = 1998 | title = Approximate Is Better than "Exact" for Interval Estimation of Binomial Proportions | journal = The American Statistician | volume = 52 | pages = 119–126 | jstor=2685469 | doi = 10.2307/2685469 | issue = 2}}</ref><ref>{{cite journal | last=Brown | first=Lawrence | coauthors=Cai, T. Tony; DasGupta, Anirban | title=Interval Estimation for a Binomial Proportion | journal=Statistical Science | year=2001 | volume=16 | issue=2 | pages=101–117 | url= | doi=10.1214/ss/1009213286}}</ref><ref>{{cite journal | last = Newcombe| first = Robert | year = 1998 | title = Two-sided confidence intervals for the single proportion: Comparison of seven methods. | journal = Statistics in Medicine | volume = 17 | pages = 857–872 | url= | doi = 10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E | pmid = 9595616 | issue = 8}}</ref> For the binomial case, several techniques for constructing intervals have been developed. The Wilson (or score) confidence interval is one well-known construction based on the normal approximation to the binomial distribution. Other constructions include the Wald, exact, Agresti–Coull, and likelihood intervals. While the Wilson interval may not be the most conservative choice, it produces average coverage probabilities that are close to nominal levels while still producing a comparatively narrow confidence interval.
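Because a binomial count takes only finitely many values, the coverage probability of a binomial interval at a given true proportion can be computed exactly by summing the probabilities of those outcomes whose interval contains that proportion. The following Python sketch illustrates this for the Wald and Wilson intervals. The function names (<code>wald_interval</code>, <code>wilson_interval</code>, <code>exact_coverage</code>) and the chosen values of ''n'' and ''p'' are illustrative assumptions and do not come from the cited sources.

```python
import math

# Standard normal quantile for a two-sided 95% interval (illustrative constant).
Z_95 = 1.959963984540054


def wald_interval(x, n, z=Z_95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(x, n, z=Z_95):
    """Wilson (score) interval for a binomial proportion."""
    p_hat = x / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def exact_coverage(interval, n, p, z=Z_95):
    """Sum the binomial probabilities of all outcomes x whose interval contains p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, z)
        if lo <= p <= hi:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


if __name__ == "__main__":
    n = 20  # hypothetical sample size
    for p in (0.05, 0.2, 0.5):
        print(p,
              round(exact_coverage(wald_interval, n, p), 4),
              round(exact_coverage(wilson_interval, n, p), 4))
```

Under these assumptions, the coverage of the Wald interval at ''n'' = 20 and ''p'' = 0.05 falls well below the nominal 0.95, while the Wilson interval stays much closer to it. Evaluating <code>exact_coverage</code> over a fine grid of ''p'' values shows how the coverage of each interval oscillates around the nominal level.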
The "probability" in ''coverage probability'' is interpreted with respect to a set of hypothetical repetitions of the entire data collection and analysis procedure. In these hypothetical repetitions, [[independence (probability theory)|independent]] data sets following the same [[probability distribution]] as the actual data are considered, and a confidence interval is computed from each of these data sets. The coverage probability is the fraction of these computed intervals that contain the true value of interest.
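The hypothetical repetitions can be imitated by simulation: generate many independent data sets from a known distribution, compute an interval from each, and record how often the interval contains the true value. The sketch below does this for the remission example, under the purely illustrative assumptions that remission durations are exponentially distributed with a true mean of 12 months and that the interval is the large-sample normal interval (sample mean ± ''z'' × standard error). The names <code>normal_interval</code> and <code>simulated_coverage</code> are hypothetical.

```python
import random
import statistics


def normal_interval(sample, level=0.95):
    """Large-sample normal interval: mean +/- z * s / sqrt(n)."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    mean = statistics.fmean(sample)
    half = z * statistics.stdev(sample) / len(sample) ** 0.5
    return mean - half, mean + half


def simulated_coverage(n, true_mean=12.0, reps=4000, level=0.95, seed=0):
    """Estimate coverage as the fraction of simulated intervals containing true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Assumed data model: exponential remission durations with mean true_mean.
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        lo, hi = normal_interval(sample, level)
        hits += lo <= true_mean <= hi
    return hits / reps


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, simulated_coverage(n))
```

With skewed data and a small sample, the normality assumption behind this interval is not met, so the estimated coverage tends to fall noticeably below the nominal 0.95 (an anti-conservative interval) and moves closer to it as ''n'' grows. The estimate is itself subject to simulation error, which shrinks as <code>reps</code> increases.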