Cumulative distribution function
The cumulative distribution function of a real-valued [[random variable]] <math>X</math> is the function given by<ref name=KunIlPark>{{cite book | author=Park, Kun Il| title=Fundamentals of Probability and Stochastic Processes with Applications to Communications| publisher=Springer | year=2018 | isbn=978-3-319-68074-3}}</ref>{{rp|p. 77}}
{{Equation box 1
|equation = <math>F_X(x) = \operatorname{P}(X \leq x)</math>}}
==Derived functions==
===Complementary cumulative distribution function (tail distribution)===<!-- This section is linked from [[Power law]], [[Stretched exponential function]] and [[Weibull distribution]] -->
Sometimes, it is useful to study the opposite question and ask how often the random variable is ''above'' a particular level. This is called the '''complementary cumulative distribution function''' ('''ccdf''') or simply the '''tail distribution''' or '''exceedance''', and is defined as
:<math>\bar F_X(x) = \operatorname{P}(X > x) = 1 - F_X(x).</math>
This has applications in [[statistics|statistical]] [[hypothesis test]]ing because, for example, the one-sided [[p-value]] is the probability of observing a test statistic ''at least'' as extreme as the one observed. Thus, provided that the [[test statistic]] ''T'' has a continuous distribution, the one-sided p-value is simply given by the ccdf: for an observed value <math>t</math> of the test statistic
:<math>p= \operatorname{P}(T \ge t) = \operatorname{P}(T > t) =1 - F_T(t).</math>
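As an illustration (assuming, purely for the sake of the example, that the test statistic is standard normal), the one-sided p-value can be computed from the ccdf with only the standard library, via the identity <math>1 - \Phi(t) = \tfrac{1}{2}\operatorname{erfc}(t/\sqrt{2})</math>:

```python
import math

def normal_ccdf(t: float) -> float:
    """Complementary CDF (tail distribution) of the standard normal:
    P(T > t) = 1 - F_T(t), written via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2))

# One-sided p-value for an observed test statistic t
t = 1.96
p_value = normal_ccdf(t)  # about 0.025
```

Because the normal distribution is continuous, <math>\operatorname{P}(T \ge t)</math> and <math>\operatorname{P}(T > t)</math> coincide, so the ccdf gives the p-value directly.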
:This follows from the finiteness of the expectation: for a non-negative random variable <math>X</math> with density <math>f_X</math> and any <math>c > 0</math>,
::<math>\operatorname{E}(X) = \int_0^\infty x f_X(x) \, dx \geq \int_0^c x f_X(x) \, dx + c \int_c^\infty f_X(x) \, dx.</math>
:Then, on recognizing <math>\bar F_X(c) = \int_c^\infty f_X(x) \, dx </math> and rearranging terms,
::<math>0 \leq c\bar F_X(c) \leq \operatorname{E}(X) - \int_0^c x f_X(x) \, dx \to 0 \text{ as } c \to \infty,</math>
:as claimed.
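As a concrete numerical check of this limit (taking, as an assumed example, the exponential distribution with rate 1, for which <math>\bar F_X(c) = e^{-c}</math> and <math>\operatorname{E}(X) = 1</math> is finite):

```python
import math

def tail(c: float) -> float:
    """Tail distribution of the rate-1 exponential: P(X > c) = exp(-c)."""
    return math.exp(-c)

# Since E(X) = 1 is finite, c * tail(c) must vanish as c grows
products = [c * tail(c) for c in (1.0, 5.0, 10.0, 50.0)]
```

The products decay toward zero, as the derivation above requires.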
| volume = 81 | issue = 8 | pages = 1179–1182
| year = 2011
}}</ref>) of the distribution or of the empirical results.
If the CDF ''F'' is strictly increasing and continuous then <math> F^{-1}( p ), p \in [0,1], </math> is the unique real number <math> x </math> such that <math> F(x) = p </math>. In such a case, this defines the '''inverse distribution function''' or [[quantile function]].
Some distributions do not have a unique inverse (for example in the case where <math>f_X(x)=0</math> for all <math>a<x<b</math>, causing <math>F_X</math> to be constant). This problem can be solved by defining, for <math> p \in [0,1] </math>, the '''generalized inverse distribution function''':
:<math>F^{-1}(p) = \inf \{x \in \mathbb{R}: F(x) \geq p \}.</math>
* Example 1: The median is <math>F^{-1}( 0.5 )</math>.
* Example 2: Put <math> \tau = F^{-1}( 0.95 ) </math>. Then we call <math> \tau </math> the 95th percentile.
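Both examples can be made concrete with the exponential distribution of rate <math>\lambda = 1</math> (an assumed example whose CDF is strictly increasing and continuous, so the inverse is unique and has the closed form <math>F^{-1}(p) = -\ln(1-p)</math>):

```python
import math

def exp_cdf(x: float) -> float:
    """CDF of the exponential distribution with rate 1."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def exp_quantile(p: float) -> float:
    """Inverse distribution function; well-defined here because the
    CDF is strictly increasing and continuous on [0, infinity)."""
    return -math.log(1.0 - p)

median = exp_quantile(0.5)  # ln 2: the point where the CDF reaches 0.5
tau = exp_quantile(0.95)    # the 95th percentile
```

By construction, <code>exp_cdf(exp_quantile(p))</code> returns <code>p</code>, which is exactly the defining property <math>F(F^{-1}(p)) = p</math> of the strictly increasing, continuous case.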
Some useful properties of the inverse cdf (which are also preserved in the definition of the generalized inverse distribution function) are that <math>F^{-1}</math> is nondecreasing, that <math>F^{-1}(F(x)) \leq x</math> and <math>F(F^{-1}(p)) \geq p</math>, and that <math>F^{-1}(p) \leq x</math> if and only if <math>p \leq F(x)</math>; in particular, if <math>Y</math> has a [[Uniform distribution (continuous)|uniform distribution]] on <math>[0, 1]</math>, then <math>F^{-1}(Y)</math> is distributed as <math>F</math>, which is the basis of [[inverse transform sampling]].
==Use in statistical analysis==
The concept of the cumulative distribution function makes an explicit appearance in statistical analysis in two (similar) ways. [[Cumulative frequency analysis]] is the analysis of the frequency of occurrence of values of a phenomenon less than a reference value. The [[empirical distribution function]] is a formal direct estimate of the cumulative distribution function for which simple statistical properties can be derived and which can form the basis of various [[statistical hypothesis test]]s. Such tests can assess whether there is evidence against a sample of data having arisen from a given distribution, or evidence against two samples of data having arisen from the same (unknown) population distribution.
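A minimal sketch of the empirical distribution function (assuming, for illustration, a sample drawn from the uniform distribution on <math>[0, 1]</math>, whose true CDF is <math>F(x) = x</math>):

```python
import random

def ecdf(sample, x):
    """Empirical distribution function: the fraction of observations <= x."""
    return sum(1 for s in sample if s <= x) / len(sample)

random.seed(0)
sample = [random.random() for _ in range(10_000)]

# The estimate should track the true CDF F(x) = x of Uniform(0, 1)
estimate = ecdf(sample, 0.5)
```

As the sample size grows, this step function converges to the underlying CDF, which is what the goodness-of-fit tests of the next section exploit.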
===Kolmogorov–Smirnov and Kuiper's tests===