# Atkinson–Stiglitz theorem

The Atkinson–Stiglitz theorem is a theorem of public economics which states "that, where the utility function is separable between labor and all commodities, no indirect taxes need be employed" if the government can use non-linear income taxation. It was developed in a seminal 1976 article by Anthony Atkinson and Joseph Stiglitz. The theorem is generally considered to be one of the most important theoretical results in public economics, and it spawned a broad literature delimiting the conditions under which it holds; for example, Saez (2002) showed that it does not hold if households have heterogeneous rather than homogeneous preferences. In practice the theorem has often been invoked in the debate on optimal capital income taxation. Because capital income taxation can be interpreted as taxing future consumption more heavily than present consumption, the theorem implies that governments should abstain from capital income taxation if non-linear income taxation is an option: capital income taxation would not improve equity compared with the non-linear income tax, while additionally distorting savings.

## Optimal taxation

The budget constraint of an individual whose wage is $w$ is given by

$\sum _{j}q_{j}x_{j}=\sum _{j}(x_{j}+t_{j}(x_{j}))=wL-T(wL)\;,$

where $q_{j}$ and $x_{j}$ are the price and the purchase of the $j$-th commodity, respectively, $t_{j}(x_{j})$ is the tax on commodity $j$, $L$ is labour supply, and $T(wL)$ is the income tax.

Utility maximisation yields the first-order condition:

$U_{j}={\frac {(1+t'_{j})(-U_{L})}{w(1-T')}}\;(j=1,2,...,N).$
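This condition can be recovered from the consumer's Lagrangian. The following is a sketch under the assumption that $U$ depends on $(x_{1},\dots ,x_{N},L)$; the multiplier label $\alpha$ is chosen here for illustration and is intended to coincide with the $\alpha$ that appears later in this section.

```latex
\begin{aligned}
\mathcal{L} &= U(x_1,\dots,x_N,L)
  + \alpha\Bigl[\,wL - T(wL) - \sum_j \bigl(x_j + t_j(x_j)\bigr)\Bigr],\\[4pt]
\frac{\partial \mathcal{L}}{\partial x_j} &= U_j - \alpha\,(1+t'_j) = 0,\\[4pt]
\frac{\partial \mathcal{L}}{\partial L} &= U_L + \alpha\,w\,(1-T') = 0
  \;\Longrightarrow\; \alpha = \frac{-U_L}{w\,(1-T')},\\[4pt]
U_j &= (1+t'_j)\,\alpha = \frac{(1+t'_j)(-U_L)}{w\,(1-T')}.
\end{aligned}
```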

The government maximises the social welfare function subject to the revenue constraint

$\int _{0}^{\infty }\left[wL-\sum _{j}x_{j}-{\overline {R}}\right]dF=0\;.$

Then we use a density function $f$  to express the Hamiltonian:

$H=\left[G(U)+\lambda \left\lbrace wL-\sum _{j}x_{j}-{\overline {R}}\right\rbrace \right]f-\mu \theta U_{L}\;.$

Varying $x_{j}$ while holding $U$ fixed, so that $x_{1}$ adjusts, gives the condition for a maximum:

$-\lambda \left[\left({\frac {\partial x_{1}}{\partial x_{j}}}\right)_{U}+1\right]-{\frac {\mu \theta }{f}}\left[{\frac {\partial ^{2}U}{\partial x_{1}\partial L}}\left({\frac {\partial x_{1}}{\partial x_{j}}}\right)_{U}+{\frac {\partial ^{2}U}{\partial x_{j}\partial L}}\right]=0\;.$

Then the following relation holds:

$\left({\frac {\partial x_{1}}{\partial x_{j}}}\right)_{U}=-{\frac {U_{j}}{U_{1}}}=-{\frac {1+t'_{j}}{1+t'_{1}}}\;.$

Substituting this relation into the above condition yields:

$\lambda \left[{\frac {1+t'_{j}}{1+t'_{1}}}-1\right]={\frac {\mu \theta U_{j}}{f}}\left[{\frac {\partial ^{2}U}{\partial L\partial x_{j}}}\cdot {\frac {1}{U_{j}}}-{\frac {\partial ^{2}U}{\partial L\partial x_{1}}}\cdot {\frac {1}{U_{1}}}\right]={\frac {\mu \theta U_{j}}{f}}{\frac {\partial }{\partial L}}\left(\ln {U_{j}}-\ln {U_{1}}\right)\;,$

and we obtain

$\lambda \left[{\frac {1+t'_{j}}{1+t'_{1}}}-1\right]={\frac {\mu \theta U_{j}}{f}}{\frac {\partial }{\partial L}}\left(\ln {\frac {U_{j}}{U_{1}}}\right)\;.$
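The substitution step can be written out explicitly. In this sketch $U_{jL}$ is shorthand (introduced here, not in the original) for $\partial ^{2}U/\partial L\partial x_{j}$.

```latex
\begin{aligned}
-\lambda\left[1-\frac{1+t'_j}{1+t'_1}\right]
  &= \frac{\mu\theta}{f}\left[U_{jL}-U_{1L}\,\frac{U_j}{U_1}\right]
  && \text{(insert } (\partial x_1/\partial x_j)_U=-U_j/U_1\text{)}\\[4pt]
\lambda\left[\frac{1+t'_j}{1+t'_1}-1\right]
  &= \frac{\mu\theta\,U_j}{f}\left[\frac{U_{jL}}{U_j}-\frac{U_{1L}}{U_1}\right]
  && \text{(factor out } U_j\text{)}\\[4pt]
  &= \frac{\mu\theta\,U_j}{f}\,
     \frac{\partial}{\partial L}\bigl(\ln U_j-\ln U_1\bigr)
  && \text{(since } \partial \ln U_j/\partial L = U_{jL}/U_j\text{)}
\end{aligned}
```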

There is no loss of generality in setting $t'_{1}$ to zero, so we put $t'_{1}=0$. Since $U_{j}=(1+t'_{j})\alpha$, where $\alpha =-U_{L}/(w(1-T'))$ by the first-order condition, we have

${\frac {t'_{j}}{1+t'_{j}}}={\frac {\mu \theta \alpha }{\lambda f}}{\frac {\partial }{\partial L}}\left(\ln {\frac {U_{j}}{U_{1}}}\right)\;.$
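The algebra behind this last expression is short; a sketch of the two steps, using only the normalisation $t'_{1}=0$ and $U_{j}=(1+t'_{j})\alpha$ stated above:

```latex
\begin{aligned}
\lambda\,t'_j
  &= \frac{\mu\theta\,U_j}{f}\,
     \frac{\partial}{\partial L}\left(\ln\frac{U_j}{U_1}\right)
  && (t'_1=0)\\[4pt]
\lambda\,t'_j
  &= \frac{\mu\theta\,(1+t'_j)\,\alpha}{f}\,
     \frac{\partial}{\partial L}\left(\ln\frac{U_j}{U_1}\right)
  && (U_j=(1+t'_j)\alpha)\\[4pt]
\frac{t'_j}{1+t'_j}
  &= \frac{\mu\theta\,\alpha}{\lambda f}\,
     \frac{\partial}{\partial L}\left(\ln\frac{U_j}{U_1}\right)
  && \text{(divide by } \lambda(1+t'_j)\text{)}
\end{aligned}
```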

Thus no indirect taxation need be employed, i.e. $t'_{j}=0$, provided that the utility function is weakly separable between labour and all consumption goods: in that case $U_{j}/U_{1}$ does not depend on $L$ and the right-hand side vanishes.
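A worked case may help to show the role of separability. The aggregator $\phi$ and the specific non-separable utility below are hypothetical forms chosen for illustration; they are not taken from the original article.

```latex
% Weakly separable case: U(x,L) = u(\phi(x_1,\dots,x_N), L)
\begin{aligned}
U_j &= u_\phi\,\phi_j, \qquad
\frac{U_j}{U_1} = \frac{\phi_j(x)}{\phi_1(x)}, \qquad
\frac{\partial}{\partial L}\ln\frac{U_j}{U_1} = 0
\;\Longrightarrow\; t'_j = 0 .
\end{aligned}

% A non-separable counter-example (two goods, x_2 substitutes for leisure):
% U = \ln x_1 + \ln(x_2 + 1 - L)
\begin{aligned}
U_1 &= \frac{1}{x_1}, \qquad
U_2 = \frac{1}{x_2+1-L}, \qquad
\frac{\partial}{\partial L}\ln\frac{U_2}{U_1}
  = \frac{1}{x_2+1-L} \neq 0 ,
\end{aligned}
% so the formula generally gives t'_2 \neq 0 whenever \mu\theta\alpha/(\lambda f) \neq 0.
```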

## Other approach

Joseph Stiglitz explains why indirect taxation is unnecessary by viewing the Atkinson–Stiglitz theorem from a different perspective.

### Basic concepts

Suppose there are two categories of individuals and that those in category 2 are the more able. Let $V_{k}(C,Y)$ denote the utility of category $k$ from consumption $C$ and pre-tax income $Y$. For Pareto efficient taxation, at which the government aims, we impose two conditions. The first condition is that the utility of category 1 is at least a given level:

${\overline {U}}_{1}\leq V_{1}(C_{1},Y_{1})\quad .$

The second condition is that government revenue $R$ is at least the revenue requirement ${\overline {R}}$:

$R=-(C_{1}-Y_{1})N_{1}-(C_{2}-Y_{2})N_{2}\;,$
${\overline {R}}\leq R\;,$

where $N_{1}$ and $N_{2}$ indicate the number of individuals of each type. Under these conditions, the government maximises the utility $V_{2}(C_{2},Y_{2})$ of category 2. The Lagrange function for this problem is

${\mathcal {L}}=V_{2}(C_{2},Y_{2})+\mu V_{1}(C_{1},Y_{1})+\lambda _{2}(V_{2}(C_{2},Y_{2})-V_{2}(C_{1},Y_{1}))+\lambda _{1}(V_{1}(C_{1},Y_{1})-V_{1}(C_{2},Y_{2}))+\gamma \left(-(C_{1}-Y_{1})N_{1}-(C_{2}-Y_{2})N_{2}-{\overline {R}}\right)\;,$

where the terms in $\lambda _{1}$ and $\lambda _{2}$ ensure that the self-selection constraints are satisfied. The first-order conditions are:

$\mu {\frac {\partial V_{1}}{\partial C_{1}}}-\lambda _{2}{\frac {\partial V_{2}}{\partial C_{1}}}+\lambda _{1}{\frac {\partial V_{1}}{\partial C_{1}}}-\gamma N_{1}=0\;,$
$\mu {\frac {\partial V_{1}}{\partial Y_{1}}}-\lambda _{2}{\frac {\partial V_{2}}{\partial Y_{1}}}+\lambda _{1}{\frac {\partial V_{1}}{\partial Y_{1}}}+\gamma N_{1}=0\;,$
${\frac {\partial V_{2}}{\partial C_{2}}}+\lambda _{2}{\frac {\partial V_{2}}{\partial C_{2}}}-\lambda _{1}{\frac {\partial V_{1}}{\partial C_{2}}}-\gamma N_{2}=0\;,$
${\frac {\partial V_{2}}{\partial Y_{2}}}+\lambda _{2}{\frac {\partial V_{2}}{\partial Y_{2}}}-\lambda _{1}{\frac {\partial V_{1}}{\partial Y_{2}}}+\gamma N_{2}=0\;.$

For the case where $\lambda _{1}=0$ and $\lambda _{2}=0$, we have

${\frac {\partial V_{i}/\partial Y_{i}}{\partial V_{i}/\partial C_{i}}}+1=0\;,$

for $i=1,2$, and therefore the government can achieve lump-sum taxation. For the case where $\lambda _{1}=0$ and $\lambda _{2}>0$, we have

${\frac {\partial V_{2}/\partial Y_{2}}{\partial V_{2}/\partial C_{2}}}+1=0\;,$

and we find that the marginal tax rate for category 2 is zero. For category 1, we have

${\frac {\partial V_{1}/\partial Y_{1}}{\partial V_{1}/\partial C_{1}}}=-{\frac {1-\lambda _{2}(\partial V_{2}/\partial Y_{1})/N_{1}\gamma }{1+\lambda _{2}(\partial V_{2}/\partial C_{1})/N_{1}\gamma }}\;.$

If we put $\delta _{i}={\frac {\partial V_{i}/\partial Y_{1}}{\partial V_{i}/\partial C_{1}}}$ for $i=1,2$, then the marginal tax rate for category 1 is $\delta _{1}+1$.
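Two steps are implicit here: why $1+\delta$ is a marginal tax rate, and how the ratio for category 1 follows from the first-order conditions. The sketch below assumes the individual faces an income tax schedule $T(Y)$ with $C=Y-T(Y)$ (notation introduced for illustration); derivatives of $V_{2}$ with respect to $C_{1}$ and $Y_{1}$ are evaluated at the bundle $(C_{1},Y_{1})$.

```latex
% Individual optimum under a tax schedule T(Y), with C = Y - T(Y):
\begin{aligned}
\frac{d}{dY}\,V_i\bigl(Y-T(Y),\,Y\bigr)
  = \frac{\partial V_i}{\partial C}\,(1-T') + \frac{\partial V_i}{\partial Y} = 0
\;\Longrightarrow\;
T' = 1 + \frac{\partial V_i/\partial Y}{\partial V_i/\partial C}.
\end{aligned}

% Category 1 with \lambda_1 = 0: rearrange the first two first-order conditions
\begin{aligned}
\mu\,\frac{\partial V_1}{\partial C_1}
  &= \gamma N_1 + \lambda_2\,\frac{\partial V_2}{\partial C_1},
\qquad
\mu\,\frac{\partial V_1}{\partial Y_1}
  = -\gamma N_1 + \lambda_2\,\frac{\partial V_2}{\partial Y_1},\\[4pt]
\frac{\partial V_1/\partial Y_1}{\partial V_1/\partial C_1}
  &= \frac{-\gamma N_1 + \lambda_2\,\partial V_2/\partial Y_1}
          {\gamma N_1 + \lambda_2\,\partial V_2/\partial C_1}
   = -\,\frac{1-\lambda_2(\partial V_2/\partial Y_1)/N_1\gamma}
             {1+\lambda_2(\partial V_2/\partial C_1)/N_1\gamma}.
\end{aligned}
```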

Also, we have the following expression:

$\delta _{1}=-\left({\frac {1-\nu \delta _{2}}{1+\nu }}\right)\;,$

where we denote $\nu$  by

$\nu ={\frac {\lambda _{2}(\partial V_{2}/\partial C_{1})}{N_{1}\gamma }}\;.$

By assumption, $\delta _{1}<\delta _{2}$, and it then follows directly that $-1<\delta _{1}<\delta _{2}$. Accordingly, the marginal tax rate for category 1 is positive.
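The inequality can be checked in two lines. This sketch assumes $\gamma >0$ and $\partial V_{2}/\partial C_{1}>0$, so that $\nu >0$ when $\lambda _{2}>0$.

```latex
\begin{aligned}
\delta_1\,(1+\nu) &= -1 + \nu\,\delta_2
  && \text{(multiply the expression for } \delta_1 \text{ by } 1+\nu\text{)}\\[4pt]
1+\delta_1 &= \nu\,(\delta_2-\delta_1) \;>\; 0
  && \text{(since } \nu>0 \text{ and } \delta_1<\delta_2\text{)}
\end{aligned}
```

Hence $\delta _{1}>-1$, and the marginal tax rate $1+\delta _{1}$ is positive.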

For the case where $\lambda _{1}>0$ and $\lambda _{2}=0$, the marginal tax rate for category 2 is negative. The lump-sum tax imposed on an individual of category 1 would become larger than that for category 2, if the lump-sum tax were feasible.
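The negative sign can be derived in the same way as in the previous case. In the sketch below, $\hat {\delta }_{i}$ and $\hat {\nu }$ are hypothetical symbols introduced here: they mirror $\delta _{i}$ and $\nu$ but are evaluated at the bundle $(C_{2},Y_{2})$. It is assumed that $\hat {\delta }_{1}<\hat {\delta }_{2}$ also holds at that bundle and that $\gamma >0$.

```latex
% Definitions (assumed notation), all derivatives evaluated at (C_2, Y_2):
\begin{aligned}
\hat\delta_i = \frac{\partial V_i/\partial Y_2}{\partial V_i/\partial C_2},
\qquad
\hat\nu = \frac{\lambda_1\,(\partial V_1/\partial C_2)}{N_2\,\gamma} > 0 .
\end{aligned}

% First-order conditions for C_2 and Y_2 with \lambda_2 = 0:
\begin{aligned}
\frac{\partial V_2}{\partial C_2} &= \gamma N_2\,(1+\hat\nu),
\qquad
\frac{\partial V_2}{\partial Y_2} = -\gamma N_2\,(1-\hat\nu\,\hat\delta_1),\\[4pt]
\hat\delta_2\,(1+\hat\nu) &= -1+\hat\nu\,\hat\delta_1
\;\Longrightarrow\;
1+\hat\delta_2 = \hat\nu\,(\hat\delta_1-\hat\delta_2) \;<\; 0 .
\end{aligned}
```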

### Various commodities

Now we consider the case where the income level and the purchases of several commodities are observable. Each individual's consumption is expressed in vector form as

${\textbf {C}}_{1}=\sum _{j}C_{1j}{\textbf {e}}_{j}$
${\textbf {C}}_{2}=\sum _{j}C_{2j}{\textbf {e}}_{j}\;.$

In this case, the government's budget constraint is

${\overline {R}}\leq \sum _{k=1}^{2}(Y_{k}N_{k})-N_{1}\sum _{j}C_{1j}-N_{2}\sum _{j}C_{2j}\;.$

The first-order conditions then become

$\mu {\frac {\partial V_{1}}{\partial C_{1j}}}-\lambda _{2}{\frac {\partial V_{2}}{\partial C_{1j}}}+\lambda _{1}{\frac {\partial V_{1}}{\partial C_{1j}}}-\gamma N_{1}=0\;,$
$\mu {\frac {\partial V_{1}}{\partial Y_{1}}}-\lambda _{2}{\frac {\partial V_{2}}{\partial Y_{1}}}+\lambda _{1}{\frac {\partial V_{1}}{\partial Y_{1}}}+\gamma N_{1}=0\;,$
${\frac {\partial V_{2}}{\partial C_{2j}}}+\lambda _{2}{\frac {\partial V_{2}}{\partial C_{2j}}}-\lambda _{1}{\frac {\partial V_{1}}{\partial C_{2j}}}-\gamma N_{2}=0\;,$
${\frac {\partial V_{2}}{\partial Y_{2}}}+\lambda _{2}{\frac {\partial V_{2}}{\partial Y_{2}}}-\lambda _{1}{\frac {\partial V_{1}}{\partial Y_{2}}}+\gamma N_{2}=0\;.$

Here we restrict ourselves to the case where $\lambda _{1}=0$ and $\lambda _{2}>0$. It follows that

${\frac {\frac {\partial V_{2}}{\partial C_{2j}}}{\frac {\partial V_{2}}{\partial C_{2n}}}}=1\;,\quad {\frac {\frac {\partial V_{2}}{\partial C_{2j}}}{\frac {\partial V_{2}}{\partial Y_{2}}}}=-1\;.$

Suppose all individuals have the same indifference curves in the C-L plane. Separability between leisure and consumption gives ${\frac {\partial ^{2}U_{k}}{\partial C_{kj}\partial L_{k}}}=0\;,$ which yields

${\frac {\partial V_{1}}{\partial C_{1j}}}={\frac {\partial V_{2}}{\partial C_{1j}}}\;.$

As a result, we obtain

${\frac {\frac {\partial V_{1}}{\partial C_{1j}}}{\frac {\partial V_{1}}{\partial C_{1n}}}}=1\;.$

Thus we find that it is unnecessary to impose taxes on commodities.
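The argument for category 1 can be sketched as follows. The additive form $U_{k}=u(\mathbf {C} )+v(L)$ with $L=Y/w_{k}$, and the ability parameter $w_{k}$, are assumptions introduced here to make the separability condition concrete.

```latex
% Indirect utility under additive separability (assumed form):
\begin{aligned}
V_k(\mathbf{C},Y) = u(\mathbf{C}) + v\!\left(\frac{Y}{w_k}\right)
\;\Longrightarrow\;
\frac{\partial V_k}{\partial C_{1j}} = u_j(\mathbf{C}_1)
\quad \text{for } k=1,2 .
\end{aligned}

% First-order condition for C_{1j} with \lambda_1 = 0:
\begin{aligned}
\mu\,\frac{\partial V_1}{\partial C_{1j}}
  - \lambda_2\,\frac{\partial V_2}{\partial C_{1j}} = \gamma N_1
\;\Longrightarrow\;
(\mu-\lambda_2)\,u_j(\mathbf{C}_1) = \gamma N_1
\quad \text{for every } j,
\end{aligned}

% so u_j(\mathbf{C}_1) = u_n(\mathbf{C}_1) for all j, n, i.e. the marginal rate of
% substitution between any two commodities equals one for category 1 as well.
```

Because the budget constraint above values each commodity at a unit price, a marginal rate of substitution equal to one means that no commodity tax wedge is needed.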

### Conditions for randomization

We now consider the case where high-ability individuals, who would normally reveal their ability by earning more, pretend to be less able. It could be argued that the government should then randomize the taxes imposed on the low-ability individuals in order to make screening more effective. Under certain conditions the taxes can be randomized without harming the low-ability individuals, and we discuss those conditions here. An individual who chooses to reveal his ability faces the tax schedule associated with $\lbrace C_{2}^{*},Y_{2}^{*}\rbrace$. An individual who chooses to hide his ability faces one of two tax schedules: $\lbrace C_{1}^{*},Y_{1}^{*}\rbrace$ and $\lbrace C_{1}^{**},Y_{1}^{**}\rbrace$. The randomization is done so that the risk in the former case differs from that in the latter.

To avoid harming the low-ability group, mean consumption must be shifted upwards at each $Y$. As consumption is maximized, a higher ${\overline {C}}_{1}$ is set for a higher ${\overline {Y}}_{1}$. The relations between these variables are

$C_{1}^{*}={\overline {C}}_{1}+h\;,\quad Y_{1}^{*}={\overline {Y}}_{1}+\lambda h$
$C_{1}^{**}={\overline {C}}_{1}-h\;,\quad Y_{1}^{**}={\overline {Y}}_{1}-\lambda h\;.$

The utility of a category 2 individual in the two outcomes is $V_{2}(C_{1}^{*},Y_{1}^{*})$ and $V_{2}(C_{1}^{**},Y_{1}^{**})$, and the condition for the optimum is

$V_{2C^{*}}(d{\overline {C}}_{1}+dh)+V_{2Y^{*}}(d{\overline {Y}}_{1}+\lambda dh)+V_{2C^{**}}(d{\overline {C}}_{1}-dh)+V_{2Y^{**}}(d{\overline {Y}}_{1}-\lambda dh)=0\;,$

and likewise

$V_{1C^{*}}(d{\overline {C}}_{1}+dh)+V_{1Y^{*}}(d{\overline {Y}}_{1}+\lambda dh)+V_{1C^{**}}(d{\overline {C}}_{1}-dh)+V_{1Y^{**}}(d{\overline {Y}}_{1}-\lambda dh)=0\;.$

And accordingly we have

${\begin{bmatrix}SV_{2C}&SV_{2Y}\\SV_{1C}&SV_{1Y}\end{bmatrix}}{\begin{bmatrix}d{\overline {C}}\\d{\overline {Y}}\end{bmatrix}}=-{\begin{bmatrix}DV_{2C}+\lambda DV_{2Y}\\DV_{1C}+\lambda DV_{1Y}\end{bmatrix}}dh\;,$

where $SV_{kC}=V_{kC^{*}}+V_{kC^{**}}$ and $SV_{kY}=V_{kY^{*}}+V_{kY^{**}}$ for $k=1,2$. Similarly $DV_{kC}=V_{kC^{*}}-V_{kC^{**}}$ and $DV_{kY}=V_{kY^{*}}-V_{kY^{**}}$.
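The matrix form comes from collecting the coefficients of $d{\overline {C}}_{1}$, $d{\overline {Y}}_{1}$ and $dh$ in each optimum condition. A sketch for generic $k$:

```latex
\begin{aligned}
0 &= V_{kC^{*}}(d\overline{C}_1+dh) + V_{kY^{*}}(d\overline{Y}_1+\lambda\,dh)
   + V_{kC^{**}}(d\overline{C}_1-dh) + V_{kY^{**}}(d\overline{Y}_1-\lambda\,dh)\\[4pt]
  &= \underbrace{(V_{kC^{*}}+V_{kC^{**}})}_{SV_{kC}}\,d\overline{C}_1
   + \underbrace{(V_{kY^{*}}+V_{kY^{**}})}_{SV_{kY}}\,d\overline{Y}_1
   + \bigl[\underbrace{(V_{kC^{*}}-V_{kC^{**}})}_{DV_{kC}}
   + \lambda\,\underbrace{(V_{kY^{*}}-V_{kY^{**}})}_{DV_{kY}}\bigr]\,dh ,\\[4pt]
SV_{kC}\,d\overline{C}_1 + SV_{kY}\,d\overline{Y}_1
  &= -\bigl(DV_{kC}+\lambda\,DV_{kY}\bigr)\,dh .
\end{aligned}
```

Taking $k=2$ for the first row and $k=1$ for the second row gives the system above.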

Then we have

$\lim _{h\rightarrow 0}{\frac {d({\overline {Y}}-{\overline {C}})}{dh}}={\frac {F_{1}-F_{2}}{(-2)(MRS_{1}-MRS_{2})}}\;,$

where $MRS_{k}=-({\frac {\partial V_{k}}{\partial C_{1}}})^{-1}{\frac {\partial V_{k}}{\partial Y_{1}}}$. Here $F_{1}$ and $F_{2}$ are defined by $F_{1}=({\frac {\partial V_{2}}{\partial C_{1}}})^{-1}M_{2}(1-MRS_{1})$ and $F_{2}=({\frac {\partial V_{1}}{\partial C_{1}}})^{-1}M_{1}(1-MRS_{2})$, with $M_{k}=DV_{kC}+\lambda DV_{kY}$. However, the first derivative of ${\overline {Y}}-{\overline {C}}$ with respect to $h$ is zero at $h=0$ (because $M_{k}=0$ there), so we need to calculate its second derivative:
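The limit expression can be reproduced by solving the $2\times 2$ system with Cramer's rule and keeping leading-order terms as $h\rightarrow 0$. In this sketch $\Delta$ (the determinant) and the shorthand $V_{kC}=\partial V_{k}/\partial C_{1}$, $V_{kY}=\partial V_{k}/\partial Y_{1}$ are notation introduced here.

```latex
\begin{aligned}
\Delta &= SV_{2C}\,SV_{1Y} - SV_{2Y}\,SV_{1C},\\[4pt]
\frac{d\overline{C}}{dh} &= \frac{-M_2\,SV_{1Y} + M_1\,SV_{2Y}}{\Delta},
\qquad
\frac{d\overline{Y}}{dh} = \frac{-M_1\,SV_{2C} + M_2\,SV_{1C}}{\Delta},\\[4pt]
\frac{d(\overline{Y}-\overline{C})}{dh}
  &= \frac{M_2\,(SV_{1C}+SV_{1Y}) - M_1\,(SV_{2C}+SV_{2Y})}{\Delta}.
\end{aligned}

% As h -> 0:  SV_{kC} -> 2 V_{kC},  SV_{kY} -> 2 V_{kY} = -2\,MRS_k\,V_{kC}
\begin{aligned}
SV_{kC}+SV_{kY} &\to 2\,V_{kC}\,(1-MRS_k),
\qquad
\Delta \to -4\,V_{1C}V_{2C}\,(MRS_1-MRS_2),\\[4pt]
\frac{d(\overline{Y}-\overline{C})}{dh}
  &\to \frac{V_{2C}^{-1}M_2\,(1-MRS_1) - V_{1C}^{-1}M_1\,(1-MRS_2)}
            {(-2)\,(MRS_1-MRS_2)}
   = \frac{F_1-F_2}{(-2)\,(MRS_1-MRS_2)} .
\end{aligned}
```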

${\frac {d^{2}({\overline {Y}}-{\overline {C}})}{dh^{2}}}=H_{1}+H_{2}\;,$

where $H_{1}={\frac {d(F_{1}-F_{2})}{dh}}{\frac {1}{-2(MRS_{1}-MRS_{2})}}$ and $H_{2}=(-1){\frac {d({\overline {Y}}-{\overline {C}})}{dh}}{\frac {d\ln {(-2)(MRS_{1}-MRS_{2})}}{dh}}$. Since the first derivative of ${\overline {Y}}-{\overline {C}}$ vanishes at $h=0$, so does $H_{2}$. Then we have

${\frac {d^{2}({\overline {Y}}-{\overline {C}})}{dh^{2}}}={\frac {I_{1}+I_{2}}{(-1)(MRS_{1}-MRS_{2})}}\;,$

where

$I_{1}=(V_{2CC}+2\lambda V_{2CY}+\lambda ^{2}V_{2YY})({\frac {\partial V_{2}}{\partial C_{1}}})^{-1}(1-MRS_{1})\;,$
$I_{2}=(-1)(V_{1CC}+2\lambda V_{1CY}+\lambda ^{2}V_{1YY})({\frac {\partial V_{1}}{\partial C_{1}}})^{-1}(1-MRS_{2})\;.$
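The quadratic forms in $I_{1}$ and $I_{2}$ arise from differentiating $M_{k}$ at $h=0$. The sketch below uses the shorthand $A_{k}=V_{kCC}+2\lambda V_{kCY}+\lambda ^{2}V_{kYY}$, a label introduced here, and evaluates all derivatives at $({\overline {C}}_{1},{\overline {Y}}_{1})$.

```latex
\begin{aligned}
M_k(h) &= V_{kC}(\overline{C}_1+h,\overline{Y}_1+\lambda h)
        - V_{kC}(\overline{C}_1-h,\overline{Y}_1-\lambda h)\\
       &\quad + \lambda\bigl[V_{kY}(\overline{C}_1+h,\overline{Y}_1+\lambda h)
        - V_{kY}(\overline{C}_1-h,\overline{Y}_1-\lambda h)\bigr],\\[4pt]
\left.\frac{dM_k}{dh}\right|_{h=0}
  &= 2\,(V_{kCC}+\lambda V_{kCY}) + 2\lambda\,(V_{kCY}+\lambda V_{kYY})
   = 2\,A_k ,\\[4pt]
\left.\frac{d(F_1-F_2)}{dh}\right|_{h=0}
  &= 2\Bigl[A_2\,\Bigl(\tfrac{\partial V_2}{\partial C_1}\Bigr)^{-1}(1-MRS_1)
      - A_1\,\Bigl(\tfrac{\partial V_1}{\partial C_1}\Bigr)^{-1}(1-MRS_2)\Bigr]
   = 2\,(I_1+I_2) .
\end{aligned}
```

Dividing by $-2(MRS_{1}-MRS_{2})$ gives the expression for the second derivative. Terms in which $M_{k}$ itself appears drop out because $M_{k}=0$ at $h=0$.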

Since $MRS_{2}<MRS_{1}$, we obtain the condition under which randomization is desirable:

$(V_{2CC}+2\lambda V_{2CY}+\lambda ^{2}V_{2YY})(V_{1C_{1}}+V_{1Y_{1}})-(V_{1CC}+2\lambda V_{1CY}+\lambda ^{2}V_{1YY})(V_{2C_{1}}+V_{2Y_{1}})<0\;.$
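This inequality follows from the sign of the second derivative. The sketch below reads "desirable" as meaning that ${\overline {Y}}-{\overline {C}}$ increases with $h$ to second order, which is an interpretation rather than a statement from the source; $A_{k}=V_{kCC}+2\lambda V_{kCY}+\lambda ^{2}V_{kYY}$ is shorthand introduced here, and $V_{kC_{1}}>0$ is assumed.

```latex
\begin{aligned}
\frac{d^{2}(\overline{Y}-\overline{C})}{dh^{2}}
  = \frac{I_1+I_2}{(-1)(MRS_1-MRS_2)} > 0
\;&\Longleftrightarrow\; I_1+I_2 < 0
  && (MRS_2<MRS_1),\\[4pt]
1-MRS_k &= \frac{V_{kC_1}+V_{kY_1}}{V_{kC_1}},\\[4pt]
V_{1C_1}V_{2C_1}\,(I_1+I_2)
  &= A_2\,(V_{1C_1}+V_{1Y_1}) - A_1\,(V_{2C_1}+V_{2Y_1}) \;<\; 0 .
\end{aligned}
```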