Control variates: Difference between revisions

==Underlying principle==
Let the unknown [[Parameter#Statistics_and_econometrics|parameter]] of interest be <math>\mu</math>, and assume we have a [[statistic]] <math>m</math> such that the [[expected value]] of ''m'' is &mu;: <math>\mathbb{E}\left[m\right]=\mu</math>, i.e. ''m'' is an [[bias of an estimator|unbiased estimator]] for &mu;. Suppose we calculate another statistic <math>t</math> such that <math>\mathbb{E}\left[t\right]=\tau</math> is a known value. Then
:<math>m^\star = m + c\left(t-\tau\right) \, </math>
is also [[bias of an estimator|an unbiased estimator]] for <math>\mu</math> for any choice of the coefficient <math>c</math>.
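As a brief check, assuming ''c'' is a fixed constant (not estimated from the same sample), linearity of expectation gives
:<math>\mathbb{E}\left[m^\star\right] = \mathbb{E}\left[m\right] + c\left(\mathbb{E}\left[t\right]-\tau\right) = \mu + c\left(\tau-\tau\right) = \mu.</math>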
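The [[variance]] of the resulting estimator <math>m^{\star}</math> is
:<math>\textrm{Var}\left(m^{\star}\right)=\textrm{Var}\left(m\right) + c^2\,\textrm{Var}\left(t\right) + 2c\,\textrm{Cov}\left(m,t\right).</math>
Choosing the coefficient
:<math>c^\star = - \frac{\textrm{Cov}\left(m,t\right)}{\textrm{Var}\left(t\right)}</math>
minimizes this variance, and with this choice
:<math>\textrm{Var}\left(m^{\star}\right) = \textrm{Var}\left(m\right) - \frac{\left[\textrm{Cov}\left(m,t\right)\right]^2}{\textrm{Var}\left(t\right)} = \left(1-\rho_{m,t}^2\right)\textrm{Var}\left(m\right),</math>
where
:<math>\rho_{m,t}=\textrm{Corr}\left(m,t\right) \, </math>
is the [[correlation coefficient]] of ''m'' and ''t''. The greater the value of <math>\vert\rho_{m,t}\vert</math>, the greater the [[variance reduction]] achieved.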
In the case that <math>\textrm{Cov}\left(m,t\right)</math>, <math>\textrm{Var}\left(t\right)</math>, and/or <math>\rho_{m,t}\;</math> are unknown, they can be estimated across the Monte Carlo replicates. This is equivalent to solving a certain [[least squares]] system; therefore this technique is also known as '''regression sampling'''.
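The link to least squares can be illustrated with a minimal Python sketch, assuming NumPy is available. It takes the variance-minimizing coefficient to be <math>c = -\textrm{Cov}(m,t)/\textrm{Var}(t)</math>, estimates it from paired replicates, and compares it with the slope of an ordinary least-squares fit of ''m'' on ''t''. The function names and the synthetic replicates are hypothetical, chosen only for illustration.

```python
import numpy as np


def estimate_coefficient(m_samples, t_samples):
    """Estimate c = -Cov(m, t) / Var(t) from paired Monte Carlo replicates."""
    m = np.asarray(m_samples, dtype=float)
    t = np.asarray(t_samples, dtype=float)
    cov_mt = np.cov(m, t, ddof=1)[0, 1]  # off-diagonal entry is the sample covariance
    var_t = np.var(t, ddof=1)
    return -cov_mt / var_t


def least_squares_slope(m_samples, t_samples):
    """Slope b of the ordinary least-squares fit m ~ a + b * t."""
    m = np.asarray(m_samples, dtype=float)
    t = np.asarray(t_samples, dtype=float)
    design = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(design, m, rcond=None)
    return coef[1]


# Synthetic replicates (illustrative assumption): t has known mean 0,
# and m is negatively correlated with t.
rng = np.random.default_rng(0)
t_rep = rng.normal(size=1000)
m_rep = 2.0 - 0.8 * t_rep + rng.normal(scale=0.5, size=1000)

c_hat = estimate_coefficient(m_rep, t_rep)
slope = least_squares_slope(m_rep, t_rep)
print(c_hat, -slope)  # the two values agree up to floating-point error
```

In this sketch the estimated coefficient is the negative of the regression slope, which is one way to read the name "regression sampling".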
==Example==
We would like to estimate
:<math>I = \int_0^1 \frac{1}{1+x} \, \mathrm{d}x.</math>
Using [[Monte Carlo integration]], this integral can be seen as the expected value of <math>f(U)</math>, where
:<math>f(x) = \frac{1}{1+x}</math>
and ''U'' follows a [[uniform distribution (continuous)|uniform distribution]]&nbsp;[0,&nbsp;1].
Using a sample of size ''n'', denote the points in the sample as <math>u_1, \ldots, u_n</math>. Then the estimate is given by
:<math>I \approx \frac{1}{n} \sum_i f(u_i). </math>
Now we introduce <math>g(x) = 1+x</math> as a control variate with a known expected value <math>\mathbb{E}\left[g\left(U\right)\right]=\int_0^1 (1+x) \, \mathrm{d}x=\frac{3}{2} </math> and combine the two into a new estimate
:<math>I \approx \frac{1}{n} \sum_i f(u_i)+c\left(\frac{1}{n}\sum_i g(u_i) -3/2\right). </math>
Using <math>n=1500</math> realizations and an estimated optimal coefficient <math> c^\star \approx 0.4773 </math> we obtain the following results
The variance was significantly reduced after using the control variates technique. (The exact result is <math>I=\ln 2 \approx 0.69314718</math>.)
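The example can be reproduced approximately with the following Python sketch, assuming NumPy is available. The function name <code>control_variate_estimate</code>, the seed, and the choice to estimate the coefficient from the same sample as the integral are illustrative assumptions, not part of the original example, so the numbers it prints will differ somewhat from those quoted above.

```python
import numpy as np


def f(x):
    """Integrand 1 / (1 + x)."""
    return 1.0 / (1.0 + x)


def g(x):
    """Control variate 1 + x, whose mean under U(0, 1) is 3/2."""
    return 1.0 + x


def control_variate_estimate(n, rng):
    """Return plain and control-variate estimates of the integral of f on [0, 1]."""
    u = rng.uniform(0.0, 1.0, size=n)
    fu, gu = f(u), g(u)
    # Estimated coefficient c = -Cov(f(U), g(U)) / Var(g(U)).
    c = -np.cov(fu, gu, ddof=1)[0, 1] / np.var(gu, ddof=1)
    adjusted = fu + c * (gu - 1.5)  # per-point terms of the combined estimator
    return fu.mean(), adjusted.mean(), c, fu.var(ddof=1), adjusted.var(ddof=1)


rng = np.random.default_rng(12345)  # arbitrary seed, for reproducibility only
plain, controlled, c_hat, var_plain, var_controlled = control_variate_estimate(1500, rng)

print("plain estimate:      ", plain)
print("controlled estimate: ", controlled)
print("estimated c:         ", c_hat)
print("variance ratio:      ", var_controlled / var_plain)
```

Because the coefficient is estimated from the same draws, the combined estimate is no longer exactly unbiased; estimating ''c'' from a separate pilot sample is one way to avoid this.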
==See also==