Cumulative distribution function: Difference between revisions

Add more examples both in one and two variables case; add sentences to explain how to get PDF from CDF; add derived function like Z-table and empirical function.
If treating several random variables <math>X,Y,\ldots</math>, the corresponding letters are used as subscripts, while if treating only one, the subscript is usually omitted. It is conventional to use a capital <math>F</math> for a cumulative distribution function, in contrast to the lower-case <math>f</math> used for [[probability density function]]s and [[probability mass function]]s. This applies when discussing general distributions: some specific distributions have their own conventional notation, for example the [[normal distribution]].
 
The probability density function of a [[continuous random variable]] <math>X</math> can be determined from the cumulative distribution function by differentiating.<ref>{{Cite book|title=Applied Statistics and Probability for Engineers|last=|first=|publisher=|year=|isbn=1119456266|location=|pages=70}}</ref> Given <math>F(x)</math>,

:<math>f(x) = {dF(x) \over dx},</math>

as long as the derivative exists.
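
The differentiation step can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes an exponential CDF with rate 2 purely for illustration, and the helper names <code>exponential_cdf</code> and <code>pdf_from_cdf</code> are hypothetical. The derivative is approximated by a central finite difference and compared with the known density.

```python
import math

def exponential_cdf(x, rate=2.0):
    """CDF of an exponential distribution (illustrative choice; the rate is assumed)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def pdf_from_cdf(cdf, x, h=1e-5):
    """Approximate f(x) = dF/dx with a central finite difference of step h."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)

x = 0.7
approx = pdf_from_cdf(exponential_cdf, x)
exact = 2.0 * math.exp(-2.0 * x)  # known density: rate * exp(-rate * x)
print(approx, exact)
```

The approximation is only meaningful at points where the derivative exists; at a jump of the CDF the finite difference grows without bound as <code>h</code> shrinks.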
 
 
The CDF of a [[continuous random variable]] <math>X</math> can be expressed as the integral of its probability density function <math>f_X</math> as follows:<ref name="KunIlPark" />{{rp|p. 86}}
 
:<math>F_X(x) = \int_{-\infty}^x f_X(t)\,dt.</math>
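
As a rough numerical illustration of this integral, the sketch below integrates a hypothetical density <math>f(t) = 2t</math> on <math>[0,1]</math> with the trapezoidal rule. The density, the function names and the finite lower limit standing in for <math>-\infty</math> are all assumptions made for the example.

```python
def pdf(t):
    """Hypothetical density f(t) = 2t on [0, 1], zero elsewhere."""
    return 2.0 * t if 0.0 <= t <= 1.0 else 0.0

def cdf_from_pdf(pdf, x, lower=0.0, steps=10_000):
    """Approximate F(x) as the integral of pdf from `lower` to x (trapezoidal rule).

    `lower` stands in for minus infinity and must lie at or below the support.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h

print(cdf_from_pdf(pdf, 0.5))  # close to 0.5**2 = 0.25
```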
 
If the distribution of a random variable <math>X</math> has a discrete component at a value <math>b</math>, then
 
:<math>\operatorname{P}(X=b) = F_X(b) - \lim_{x \to b^{-}} F_X(x).</math>
 
 
for all real numbers <math>a</math> and <math>b</math>. The function <math>f_X</math> is equal to the [[derivative]] of <math>F_X</math> [[almost everywhere]], and it is called the [[probability density function]] of the distribution of <math>X</math>.
 
== Examples ==
As an example, suppose <math>X</math> is [[Uniform distribution (continuous)|uniformly distributed]] on the unit interval <math>[0,1]</math>.

Then the CDF of <math>X</math> is given by

: <math>F_X(x) = \begin{cases}
0 &:\ x < 0\\
x &:\ 0 \le x \le 1\\
1 &:\ x > 1
\end{cases}</math>
 
Suppose instead that <math>X</math> takes only the discrete values 0 and 1, with equal probability.
 
Then the CDF of <math>X</math> is given by
 
: <math>F_X(x) = \begin{cases}
0 &:\ x < 0\\
1/2 &:\ 0 \le x < 1\\
1 &:\ x \ge 1
\end{cases}</math>
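
The two CDFs above can be written directly as piecewise functions. The sketch below (function names are hypothetical) also estimates the probability mass at a point as the jump <math>F_X(b) - \lim_{x \to b^{-}} F_X(x)</math>, with the left limit approximated by a small offset; this is an illustration under that assumption rather than an exact limit.

```python
def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    if x < 0:
        return 0.0
    if x <= 1:
        return float(x)
    return 1.0

def two_point_cdf(x):
    """CDF of a variable taking the values 0 and 1 with probability 1/2 each."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0

def point_mass(cdf, b, eps=1e-9):
    """Approximate P(X = b) = F(b) - F(b-) using a small left offset eps."""
    return cdf(b) - cdf(b - eps)

print(point_mass(two_point_cdf, 0), point_mass(two_point_cdf, 1))
print(point_mass(uniform_cdf, 0.5))
```

The step CDF jumps by one half at each of its two values, while the continuous uniform CDF has essentially no jump anywhere.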
 
Suppose <math>X</math> is [[Exponential distribution|exponentially distributed]]. Then the CDF of <math>X</math> is given by

: <math>F_X(x;\lambda) = \begin{cases}
1-e^{-\lambda x} & x \ge 0, \\
0 & x < 0.
\end{cases}</math>
 
Here λ > 0 is the parameter of the distribution, often called the rate parameter.
 
Suppose <math>X</math> is [[Normal distribution|normally distributed]]. Then the CDF of <math>X</math> is given by

: <math>F(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^x \exp\left( -\frac{(t - \mu)^2}{2\sigma^2} \right)\, dt.</math>

Here the parameter <math>\mu</math> is the mean or expectation of the distribution, and <math>\sigma</math> is its standard deviation.
 
Suppose <math>X</math> is [[Binomial distribution|binomially distributed]]. Then the CDF of <math>X</math> is given by

: <math>F(k;n,p)=\Pr(X\leq k)=\sum _{i=0}^{\lfloor k\rfloor }{n \choose i}p^{i}(1-p)^{n-i}</math>

Here <math>X</math> counts the number of successes in a sequence of <math>n</math> independent experiments, each succeeding with probability <math>p</math>, and <math>\lfloor k\rfloor</math> is the "floor" under <math>k</math>, i.e. the [[greatest integer]] less than or equal to <math>k</math>.
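
The binomial sum can be evaluated term by term. The following sketch uses only the Python standard library (<code>math.comb</code> requires Python 3.8 or later); the function name <code>binomial_cdf</code> is an illustrative choice.

```python
import math

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing the mass function up to floor(k)."""
    upper = min(math.floor(k), n)  # terms beyond n do not exist
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(upper + 1))

print(binomial_cdf(2, 4, 0.5))    # (1 + 4 + 6) / 16 = 0.6875
print(binomial_cdf(2.7, 4, 0.5))  # same value, because floor(2.7) = 2
```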
==Derived functions==
===Complementary cumulative distribution function (tail distribution)===<!-- This section is linked from [[Power law]], [[Stretched exponential function]] and [[Weibull distribution]] -->
 
In [[survival analysis]], <math>\bar F_X(x)</math> is called the '''[[survival function]]''' and denoted <math> S(x) </math>, while the term ''reliability function'' is common in [[engineering]].
 
Z-table:
 
One of the most popular applications of the cumulative distribution function is the [[standard normal table]], also called the '''unit normal table''' or '''Z table''',<ref>{{Cite web|url=https://www.ztable.net/|title=Z Table|last=|first=|date=|website=Z Table|language=en-US|url-status=live|archive-url=|archive-date=|access-date=2019-12-11}}</ref> which lists values of the cumulative distribution function of the standard normal distribution. A Z-table gives not only the probability of falling below a value, which is the original application of the cumulative distribution function, but also the probability of falling above a value or between two values of the standard normal distribution, and, after standardization, of any normal distribution.
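
The three kinds of Z-table lookup can be reproduced with the error function from the standard library. This is a minimal sketch: the names <code>phi</code> and <code>normal_cdf</code> are hypothetical, and the cut-off value 1 is chosen only for illustration.

```python
import math

def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, obtained by standardizing to z = (x - mu) / sigma."""
    return phi((x - mu) / sigma)

below = phi(1.0)                # P(Z < 1)
above = 1.0 - phi(1.0)          # P(Z > 1)
between = phi(1.0) - phi(-1.0)  # P(-1 < Z < 1)
print(round(below, 4), round(above, 4), round(between, 4))
```

For a general normal distribution, <code>normal_cdf(110, 100, 10)</code> returns the same value as <code>phi(1.0)</code>, since 110 lies one standard deviation above the mean.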
 
 
;Properties
# If <math>Y</math> has a <math>U[0, 1]</math> distribution then <math>F^{-1}(Y)</math> is distributed as <math>F</math>. This is used in [[random number generation]] by the [[inverse transform sampling]] method.
# If <math>\{X_\alpha\}</math> is a collection of independent <math>F</math>-distributed random variables defined on the same sample space, then there exist random variables <math>Y_\alpha</math> such that <math>Y_\alpha</math> is distributed as <math>U[0,1]</math> and <math>F^{-1}(Y_\alpha) = X_\alpha</math> with probability 1 for all <math>\alpha</math>.
 
The inverse of the CDF can be used to translate results obtained for the uniform distribution to other distributions.
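
The first property can be sketched for the exponential distribution, whose CDF <math>F(x) = 1 - e^{-\lambda x}</math> has the closed-form inverse <math>F^{-1}(u) = -\ln(1-u)/\lambda</math>. The function names, the rate 2 and the fixed seed below are assumptions made for the illustration.

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """Inverse of F(x) = 1 - exp(-rate * x), valid for 0 <= u < 1."""
    return -math.log(1.0 - u) / rate

def sample_exponential(n, rate, seed=0):
    """Draw n exponential variates by applying the inverse CDF to uniform draws."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return [exponential_inverse_cdf(rng.random(), rate) for _ in range(n)]

draws = sample_exponential(100_000, rate=2.0)
print(sum(draws) / len(draws))  # should be near the mean 1 / rate = 0.5
```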
 
 
===Empirical distribution function===
The [[empirical distribution function]] is an estimate of the cumulative distribution function that generated the points in the sample. It converges with probability 1 to that underlying distribution. A number of results exist to quantify the rate of convergence of the empirical distribution function to the underlying cumulative distribution function.
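
A minimal sketch of the estimate, assuming the usual definition as the fraction of sample points less than or equal to <math>x</math>; the function name <code>empirical_cdf</code> and the sample values are hypothetical.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return a function x -> fraction of sample points less than or equal to x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_hat(x):
        # bisect_right counts how many sorted points are <= x
        return bisect_right(ordered, x) / n

    return F_hat

F_hat = empirical_cdf([3, 1, 2, 2])
print(F_hat(0), F_hat(2), F_hat(2.5), F_hat(3))
```

Sorting once and using binary search keeps each evaluation logarithmic in the sample size.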
 
==Multivariate case==
equal to <math>y</math>.
 
Example of joint cumulative distribution function:
 
For two continuous variables <math>X</math> and <math>Y</math>: <math>\Pr(a < X < b \text{ and } c < Y < d) = \int_{a}^{b} \int_{c}^{d} f(x,y) \, dy \, dx</math>;
 
For two discrete random variables, it is helpful to generate a table of probabilities and read off the cumulative probability for each potential range of <math>X</math> and <math>Y</math>, as in the following example:<ref>{{Cite web|url=https://math.info/Probability/Joint_CDF/|title=Joint Cumulative Density Function (CDF)|website=math.info|access-date=2019-12-11}}</ref>
 
Given the joint probability mass function in tabular form, determine the joint cumulative distribution function.
{| class="wikitable"
|
|Y=2
|Y=4
|Y=6
|Y=8
|-
|X=1
|0
|0.1
|0
|0.1
|-
|X=3
|0
|0
|0.2
|0
|-
|X=5
|0.3
|0
|0
|0.15
|-
|X=7
|0
|0
|0.15
|0
|}
Solution: using the given table of probabilities for each potential range of X and Y, the joint cumulative distribution function may be constructed in tabular form:
{| class="wikitable"
|
|Y<2
|2≤Y<4
|4≤Y<6
|6≤Y<8
|Y≥8
|-
|X<1
|0
|0
|0
|0
|0
|-
|1≤X<3
|0
|0
|0.1
|0.1
|0.2
|-
|3≤X<5
|0
|0
|0.1
|0.3
|0.4
|-
|5≤X<7
|0
|0.3
|0.4
|0.6
|0.85
|-
|X≥7
|0
|0.3
|0.4
|0.75
|1
|}
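
The cumulative table can be reproduced by summing every entry of the probability table at or below a given pair of values. The sketch below copies the probabilities from the example; the variable and function names are illustrative.

```python
# Joint probabilities from the example table: pmf[(x, y)] = P(X = x, Y = y).
x_values = [1, 3, 5, 7]
y_values = [2, 4, 6, 8]
rows = [
    [0.0, 0.1, 0.0, 0.1],   # X = 1
    [0.0, 0.0, 0.2, 0.0],   # X = 3
    [0.3, 0.0, 0.0, 0.15],  # X = 5
    [0.0, 0.0, 0.15, 0.0],  # X = 7
]
pmf = {(x, y): rows[i][j]
       for i, x in enumerate(x_values)
       for j, y in enumerate(y_values)}

def joint_cdf(x, y):
    """F(x, y) = P(X <= x, Y <= y), summing every table entry at or below (x, y)."""
    return sum(p for (xi, yi), p in pmf.items() if xi <= x and yi <= y)

# One printed row per X value, evaluated at Y = 2, 4, 6, 8.
for x in x_values:
    print([round(joint_cdf(x, y), 2) for y in y_values])
```

Each printed row corresponds to one non-zero row of the cumulative table, read from its second column onward.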
===Definition for more than two random variables===
For <math>N</math> random variables <math>X_1,\ldots,X_N</math>, the joint CDF <math>F_{X_1,\ldots,X_N}</math> is given by