# Control variates

The control variates method is a variance reduction technique used in Monte Carlo methods. It exploits information about the errors in estimates of known quantities to reduce the error of an estimate of an unknown quantity.[1][2][3]

## Underlying principle

Let the unknown parameter of interest be ${\displaystyle \mu }$ , and assume we have a statistic ${\displaystyle m}$  such that the expected value of m is μ: ${\displaystyle \mathbb {E} \left[m\right]=\mu }$ , i.e. m is an unbiased estimator for μ. Suppose we calculate another statistic ${\displaystyle t}$  such that ${\displaystyle \mathbb {E} \left[t\right]=\tau }$  is a known value. Then

${\displaystyle m^{\star }=m+c\left(t-\tau \right)\,}$

is also an unbiased estimator for ${\displaystyle \mu }$  for any choice of the coefficient ${\displaystyle c}$ . The variance of the resulting estimator ${\displaystyle m^{\star }}$  is

${\displaystyle {\textrm {Var}}\left(m^{\star }\right)={\textrm {Var}}\left(m\right)+c^{2}\,{\textrm {Var}}\left(t\right)+2c\,{\textrm {Cov}}\left(m,t\right).}$

It can be shown that choosing the optimal coefficient

${\displaystyle c^{\star }=-{\frac {{\textrm {Cov}}\left(m,t\right)}{{\textrm {Var}}\left(t\right)}}}$

minimizes the variance of ${\displaystyle m^{\star }}$ , and that with this choice,

${\displaystyle {\begin{aligned}{\textrm {Var}}\left(m^{\star }\right)&={\textrm {Var}}\left(m\right)-{\frac {\left[{\textrm {Cov}}\left(m,t\right)\right]^{2}}{{\textrm {Var}}\left(t\right)}}\\&=\left(1-\rho _{m,t}^{2}\right){\textrm {Var}}\left(m\right)\end{aligned}}}$

where

${\displaystyle \rho _{m,t}={\textrm {Corr}}\left(m,t\right)\,}$

is the correlation coefficient of ${\displaystyle m}$  and ${\displaystyle t}$ . The greater the value of ${\displaystyle \vert \rho _{m,t}\vert }$ , the greater the variance reduction achieved.

In the case that ${\displaystyle {\textrm {Cov}}\left(m,t\right)}$ , ${\displaystyle {\textrm {Var}}\left(t\right)}$ , and/or ${\displaystyle \rho _{m,t}\;}$  are unknown, they can be estimated across the Monte Carlo replicates. This is equivalent to solving a certain least squares system; therefore this technique is also known as regression sampling.

When the expectation of the control variable, ${\displaystyle \mathbb {E} \left[t\right]=\tau }$ , is not known analytically, it is still possible to increase the precision in estimating ${\displaystyle \mu }$  (for a given fixed simulation budget), provided that two conditions are met: 1) evaluating ${\displaystyle t}$  is significantly cheaper than computing ${\displaystyle m}$ ; 2) the magnitude of the correlation coefficient ${\displaystyle |\rho _{m,t}|}$  is close to unity.[3]
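The estimation of the optimal coefficient from the Monte Carlo replicates can be sketched as follows. The function name and NumPy-based implementation below are illustrative, not part of any standard library: `c_star` is computed from the sample covariance matrix, exactly as in the formula for ${\displaystyle c^{\star }}$ above.

```python
import numpy as np

def control_variate_estimate(m_samples, t_samples, tau):
    """Combine replicates of m with a control variate t of known mean tau.

    Returns the control variate estimate m* = mean(m) + c*(mean(t) - tau),
    with c* = -Cov(m, t) / Var(t) estimated from the same replicates.
    """
    m_samples = np.asarray(m_samples, dtype=float)
    t_samples = np.asarray(t_samples, dtype=float)
    # 2x2 sample covariance matrix of (m, t); entry [0, 1] is Cov(m, t)
    cov = np.cov(m_samples, t_samples, ddof=1)
    c_star = -cov[0, 1] / cov[1, 1]
    return m_samples.mean() + c_star * (t_samples.mean() - tau)
```

Because ${\displaystyle c^{\star }}$ is itself estimated from the data, the resulting estimator is only asymptotically unbiased, but the bias is typically negligible next to the variance reduction.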

## Example

We would like to estimate

${\displaystyle I=\int _{0}^{1}{\frac {1}{1+x}}\,\mathrm {d} x}$

using Monte Carlo integration. This integral is the expected value of ${\displaystyle f(U)}$ , where

${\displaystyle f(U)={\frac {1}{1+U}}}$

and U follows the uniform distribution on [0, 1]. Using a sample of size n, denote the points in the sample as ${\displaystyle u_{1},\cdots ,u_{n}}$ . Then the estimate is given by

${\displaystyle I\approx {\frac {1}{n}}\sum _{i}f(u_{i}).}$

Now we introduce ${\displaystyle g(U)=1+U}$  as a control variate with a known expected value ${\displaystyle \mathbb {E} \left[g\left(U\right)\right]=\int _{0}^{1}(1+x)\,\mathrm {d} x={\tfrac {3}{2}}}$  and combine the two into a new estimate

${\displaystyle I\approx {\frac {1}{n}}\sum _{i}f(u_{i})+c\left({\frac {1}{n}}\sum _{i}g(u_{i})-3/2\right).}$

Using ${\displaystyle n=1500}$  realizations and an estimated optimal coefficient ${\displaystyle c^{\star }\approx 0.4773}$ , we obtain the following results:

|                    | Estimate | Variance |
|--------------------|----------|----------|
| Classical estimate | 0.69475  | 0.01947  |
| Control variates   | 0.69295  | 0.00060  |

The variance was significantly reduced after using the control variates technique. (The exact result is ${\displaystyle I=\ln 2\approx 0.69314718}$ .)
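A minimal NumPy sketch of this example follows. The seed and variable names are arbitrary choices, so a run will not reproduce the table's digits exactly, but the estimated coefficient lands near 0.4773 and the variance reduction is of the same order.

```python
import numpy as np

# Monte Carlo estimate of I = int_0^1 1/(1+x) dx = ln 2,
# with and without the control variate g(U) = 1 + U, E[g(U)] = 3/2.
rng = np.random.default_rng(seed=42)  # arbitrary seed
n = 1500
u = rng.random(n)

f = 1.0 / (1.0 + u)   # integrand samples
g = 1.0 + u           # control variate samples
tau = 1.5             # known expectation of g(U)

classical = f.mean()

# Estimated optimal coefficient c* = -Cov(f, g) / Var(g)
cov_fg = np.cov(f, g, ddof=1)
c_star = -cov_fg[0, 1] / cov_fg[1, 1]

cv_estimate = f.mean() + c_star * (g.mean() - tau)

# Per-sample variances of the two estimators' summands
var_classical = f.var(ddof=1)
var_cv = (f + c_star * (g - tau)).var(ddof=1)
```

Since ${\displaystyle f(U)}$ is decreasing in U while ${\displaystyle g(U)}$ is increasing, the two are strongly negatively correlated, which is what makes the variance reduction so large here.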