Decomposition of GINI

Draft
Comments appreciated
Decomposition of Inequality Based on Incomplete Information
A contributed paper to the IARIW 24th General Conference
Lillehammer, Norway, August 18-24, 1996
Yuri Dikhanov
Statistical Advisory Services
International Economics Department, IECDD
The World Bank
1818 H Street, N.W.
Room N2-038
Washington, D.C. 20433 U.S.A.
phone: (202)458-2667
fax: (202)522-3669
e-mail: ydikhanov@worldbank.org
Abstract
In this paper, the author examines five measures of inequality: the Gini coefficient, two entropy
(Theil) indexes, normalized variance and decile ratio. It is shown how to decompose these
indexes into intra-group and between-group inequalities. These indexes are used to study
inequalities in the former Soviet republics in 1990. This study is based on incomplete
information on income intervals (only income boundaries and population shares have been
used). The robustness of the approximating procedure (piecewise polynomial interpolation of the
cumulative distribution function) is discussed. Two alternative representations of the Gini
coefficient are discussed as well.
The views presented in this paper are the author’s and do not
necessarily represent those of the World Bank or its Board of Directors.
I. Introduction
Y. Dikhanov, Decomposition of Inequality Based on Incomplete Information
Analysis of income or wealth distribution often includes decomposing inequality for total
population into between-group and within-group inequalities. Not all inequality measures
are decomposable, and not all of the decomposable ones are decomposable in the same
way. Theoretically, the second Theil measure (T2) has probably the best properties. The
Gini index, however, is the most widely cited measure. In this paper we made an attempt
to decompose the Gini index in a meaningful way (see Section IV).
The Gini index, along with the two Theil measures, normalized variance and decile
coefficient, was then used to analyze income inequalities in the former Soviet Union and
its Republics in 1990 (see Section II and Annex). We found that the share of inter-group
inequality in total inequality was in the range of 7.7-15.8 percent, depending on the
index. As inputs into this exercise, we used official data on intervals: seven interval
boundaries and population shares within these boundaries for each of the former Soviet
Republics.
To process these discrete data we used interpolation with a polynomial of order four on
each interval. These polynomials are chosen to be twice continuously differentiable at all
points of the distribution, which allows differential and integral operations with the
distribution function and its derivatives in explicit form. Section III discusses the
robustness of this procedure using two numerical examples: a “bad” one, a hypothetical
mixture of two normal distributions with different means and variances, presented as five
income intervals (quintiles); and a “good” one, a log-normal distribution, presented as ten
intervals. As expected, in the “good” case the precision of the approximation is one to
two orders of magnitude better than in the “bad” case (errors of 0.004-0.39 percent,
depending on the parameter, versus 0.2-1.3 percent).
Section V discusses two alternative graphical and analytical representations of the Gini
coefficients that are based on the original distribution function rather than on the Lorenz
curve.
II. Decomposition of income inequalities in the former Soviet Union.
There were two major reasons we used the former Soviet Union data from 1990: first, the
data were available (there were not many countries where regional inequality data were
collected on a regular and comparable basis); and, second, since 1990 the former Soviet
Republics have become independent countries, and as economies in transition, they
attract the special attention of academics and policy makers.
Original information included boundaries and population shares for seven intervals (see
Table 1 below). To process this information we used a version of our Gini ToolPak.
Table 1. Original data on income distribution shares in the former Soviet Union for 1990
(percent of population in each income interval; interval boundaries in rubles)

Republic         <75   75-100  100-150  150-200  200-250  250-300   >300
USSR             7.8    10.6     28.0     23.9     14.9      8.0     6.8
Russia           3.2     8.2     27.2     26.0     17.3      9.6     8.5
Ukraine          2.7     8.6     31.2     28.0     16.2      7.9     5.4
Belarus          1.5     5.9     27.0     28.9     19.1     10.0     7.6
Estonia          0.6     2.7     15.4     23.6     21.7     16.2    19.8
Latvia           0.9     3.8     19.5     26.1     21.3     13.9    14.5
Lithuania        1.2     4.5     20.9     25.8     20.5     13.3    13.8
Moldova          6.1    12.5     32.9     24.5     13.0      6.4     4.6
Armenia          5.4    11.3     31.6     24.6     14.3      7.1     5.7
Azerbaijan      29.7    19.7     26.8     13.0      6.0      2.7     2.1
Georgia          6.5    11.2     28.7     23.1     14.5      8.2     7.8
Kazakhstan      10.0    14.4     31.1     21.5     11.9      6.0     5.1
Uzbekistan      34.1    23.0     26.8     10.1      3.7      1.4     0.9
Kyrgyzstan      24.8    21.7     30.8     13.7      5.5      2.1     1.4
Tajikistan      45.1    22.7     21.6      6.8      2.4      0.9     0.5
Turkmenistan    26.9    22.3     29.6     12.7      5.1      2.0     1.4
The overall results can be assessed from Figure A-2 of the Annex, which presents
normalized values of the various inequality measures (inequality indexes normalized by their
standard deviations). As we can see, the lowest inequality was observed in Belarus and
Ukraine, followed by Estonia, Latvia and Lithuania. That the Baltics had higher
inequality than Belarus and Ukraine has to be attributed to the fact that, although
minimum wages were the same in all of these former republics, the means were higher in
the Baltics. Russia had higher income inequality than these economies, which is to be
expected given its size. A factor that additionally increased the inequality for Russia was
the relatively high prices (and hence wages) in Siberia. The highest inequality was
registered for Azerbaijan and the Central Asian states (Uzbekistan, Kyrgyzstan,
Turkmenistan and Tajikistan). The results for Azerbaijan are not obvious given the much
lower numbers for neighboring Armenia and Georgia.
Another piece of information that Figure A-2 provides relates to the correlation between the
indexes. We can see that, in general, all the indexes for this set of countries produce
highly consistent results. Table A-3 of the Annex provides the correlation coefficients. As we
can see, one of the highest values of the correlation coefficient is observed for the
Theil1-Theil2 pair: r²=0.9964. By absolute value, the difference between them is around 2
percent, which can be seen as a measure of the deviation of the actual distribution from
the log-normal one (as we know, under the assumption of log-normal distribution, the
two Theil indexes coincide). As we can see, for some economies the deviation between
the two Theil indexes is insignificant (0.1-0.2 percent), though a part of that can be
attributed to the fact that the approximation errors go in opposite directions. The two
Theil indexes and the Gini coefficient are correlated even more tightly: r²=0.9979-0.9987.
A very high correlation was also registered for the Theil1-Decile ratio pair: r²=0.9980.
Tight correlation is also observed for the Theil2-Decile ratio pair: r²=0.9932. The
lowest value of the correlation coefficient is registered for the Variance-Decile ratio pair:
r²=0.9908. We have to say, though, that this value is still very high. The overall
conclusion is that all these inequality measures produce coherent results.
Table A-1 of the Annex provides the results of the actual estimations. Shares of inter-group
variance presented in the table are of special significance for this paper. The two Theil
indexes and variance display similar results in the 12.9-15.8 percent range. The share of
inter-group variances for the Gini coefficient, on the other hand, is only 7.7 percent,
which is roughly half of those for other measures. One has to bear in mind, however, that
the ways these indexes decompose are different, and, thus, are not directly comparable.
The two Theil indexes, for example, produce identical results only under the assumption
of “log-normality” of the distribution. However, shares of inter-group variances will still
be different because they are aggregated with different weights (income and population
shares, respectively). The inter-group results produced using the second Theil index
(0.0170) can be compared to those estimated by H. Theil (1989, Development of
International Inequality, Journal of Econometrics, Vol. 42, No. 1, North-Holland). For
1985, he found the inequality between the OECD countries (without Australia) to be
0.0859; for tropical America, 0.0580; for tropical Asia, 0.2003; and for tropical Africa,
0.1871. Figure A-5 of the Annex provides a graphical representation of the Theil index for
the combined distribution versus the between-group Theil index.
Figure A-3 of the Annex presents density functions of income distribution in the former
republics. It is interesting to note that the Estonian distribution has slight irregularities in
the upper part of the distribution. This might indicate urban/rural or Tallinn/rest of the
country income differentials [1]. More likely, a factor that might have contributed to that
situation was the advance of reforms in Estonia: in 1990 this country had the highest
share of non-agricultural private sector in the former Soviet Union, which provided much
higher salaries than the state sector.
[1] Tallinn had 35 percent of the Estonian population and 45 percent of the income.
Figure A-4 of the Annex is a histogram on a logarithmic scale. It shows shares of
population within proportional boundaries (the next boundary is in proportion to the
previous one). It has to be noted that in this case the highest point would not be the mode
as in a distribution density function, but the mean. Using this type of histogram requires,
however, some compliance with the assumption of “log-normalness” of the distribution.
Table A-2 of the Annex presents income shares by decile. That Azerbaijan had the highest
inequality and Belarus and Ukraine had the lowest can be directly inferred from this table.
III. Robustness of the computational procedure
For this exercise the Gini ToolPak was used. In this section we will briefly explore the
issues of robustness of the procedure. We will use two numerical examples: a “good”
case (ten income intervals for log-normal distribution); and a “bad” case (five intervals,
i.e., quintile data; for a mixture of two normal distributions with different means and
variances).
The essence of the procedure (polynomial interpolation) is the following:
Let’s assume that we are given only a set $\{F(Y_i)\}$ of M elements, which describes the values
that the cumulative distribution function takes at the points $Y_i$. We need to approximate all other
points of the distribution, i.e., to estimate $F(y)$ for $y \in [0, +\infty)$. Within each interval
$[Y_i, Y_{i+1}]$, we will interpolate the distribution function by a polynomial of order 4 in the
form:
$$F_{i,i+1}(y) = \sum_{n=0}^{3} \alpha_n^{i} \left( \frac{y - Y_i}{Y_{i+1} - Y_i} \right)^{n}$$
At the boundaries the polynomials are exact, and are not interpolations, i.e.,
$$F_{i,i+1}(Y_i) = F_{i-1,i}(Y_i) = F(Y_i).$$
These polynomials are chosen to be twice continuously differentiable across the
boundaries. This is a very important property, because it allows differential and integral
operations with F and its derivatives in explicit form. For example, the mean of the
distribution would be calculated as follows:
$$\mu = \int y\,dF = \sum_{i=0}^{M} \sum_{n=1}^{3} \alpha_n^{i}\, \frac{n Y_{i+1} + Y_i}{n+1},$$
where M is the number of intervals. Other characteristics of the distribution function can be
derived in a similar way.
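As an illustration, the mean formula can be evaluated directly once the polynomial coefficients are known. The following is a minimal Python sketch, not the Gini ToolPak itself: the function name, the data layout (`coeffs[i][n]` holding the coefficient of the n-th power on interval i) and the uniform test distribution are assumptions made for the example.

```python
def mean_from_coefficients(boundaries, coeffs):
    """Mean of a piecewise-polynomial CDF.

    boundaries: Y_0 < Y_1 < ... (one more entry than there are intervals).
    coeffs[i][n]: coefficient of ((y - Y_i) / (Y_{i+1} - Y_i)) ** n on
                  interval i, for n = 0..3 (hypothetical layout).
    """
    total = 0.0
    for i, alpha in enumerate(coeffs):
        y_lo, y_hi = boundaries[i], boundaries[i + 1]
        # The n = 0 term is constant on the interval and contributes nothing to dF.
        for n in range(1, 4):
            total += alpha[n] * (n * y_hi + y_lo) / (n + 1)
    return total


# Hypothetical check: a uniform distribution on [0, 100] split into four intervals,
# so that F is linear on each interval and rises by 0.25 across it.
boundaries = [0.0, 25.0, 50.0, 75.0, 100.0]
coeffs = [[0.25 * i, 0.25, 0.0, 0.0] for i in range(4)]
uniform_mean = mean_from_coefficients(boundaries, coeffs)
print(uniform_mean)
```

For the uniform case each interval contributes its probability mass times its midpoint, so the result should equal the midpoint of the whole range.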
Errors of estimation in polynomial interpolation
Using logic similar to that behind the remainder term of the Taylor formula in Lagrange
form, we arrive at the following expression for estimation errors [2]:
$$\left| F_{i,i+1}(y) - F(y) \right| \le \frac{1}{4!} \left( \frac{Y_{i+1} - Y_i}{2} \right)^{4} \left| F^{(4)}(\xi) \right|,$$
where $\xi = \arg\max_{y \in [Y_i, Y_{i+1}]} \left| F^{(4)}(y) \right|$.
In the case of the standard normal distribution the above boils down to:
$$\left| F_{i,i+1}(y) - F(y) \right| \le \frac{1}{384} (Y_{i+1} - Y_i)^{4} \left| F^{(1)}(\xi)\,(3\xi - \xi^{3}) \right|$$
[2] Dividing the interval $[Y_i, Y_{i+1}]$ in half simply reflects the fact that, because at the ends of the
interval the polynomial becomes exact again, maximum errors are attained around the middle of the
interval. The coefficient 1/384 [$1/(2^4 \times 4!)$] is the absolute theoretical minimum for the errors. The
minimum is attained when the polynomial coefficients for the interval are determined (almost)
independently of other intervals. In other cases, the inequality is somewhat different, although the order
of magnitude for errors remains the same.
Or, in the case when the interval boundaries are separated by $\sigma/2$, we obtain that the biggest errors
will be in the interval $[0.5\sigma, \sigma]$ (that can be seen from the first order condition for
$F^{(1)}(\xi)(3\xi - \xi^{3})$, which gives $\xi^2 = 3 - \sqrt{6}$), and the errors in this interval are expressed as follows:
$$\left| F_{i,i+1}(y) - F(y) \right| \le \frac{1}{384 \cdot 16} \cdot \frac{\sqrt{6}\,\sqrt{3 - \sqrt{6}}}{\sqrt{2\pi}}\; e^{-(3 - \sqrt{6})/2} \approx 0.01\%$$
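The order of magnitude of this bound can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original procedure: it takes the standard normal distribution, an interval of width 0.5 running from 0.5 to 1, and locates the maximum of the fourth derivative by a simple grid search; the function names are hypothetical.

```python
import math


def normal_density(x):
    """Standard normal density, i.e. the first derivative of the CDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def fourth_derivative(x):
    """Fourth derivative of the standard normal CDF: F'(x) * (3x - x**3)."""
    return normal_density(x) * (3.0 * x - x ** 3)


def error_bound(lo, hi, steps=100_000):
    """(1/384) * (hi - lo)**4 * max |F''''| over [lo, hi], found by grid search."""
    worst = max(abs(fourth_derivative(lo + (hi - lo) * k / steps))
                for k in range(steps + 1))
    return (hi - lo) ** 4 / 384.0 * worst


bound = error_bound(0.5, 1.0)
print(f"{100 * bound:.4f} percent")
```

The value obtained this way is slightly below 0.01 percent, in line with the rounded figure quoted in the text.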
A. “Good” case
As a “good” case, we used ten income intervals for the log-normal distribution
LN(5,0.25).
The results are presented in the table below. Graphical results are presented in Figure 1.
As can be seen from the graph, the actual distribution cannot be readily distinguished
from the simulation. The largest difference is for the mode, which is notoriously difficult
to get.
                         Actual values   Simulation   Difference
Mean Income                  153.12        153.09       -0.02%
Gini-coefficient             0.14032       0.14023      -0.06%
Median Income                148.41        148.41        0.00%
Mode Income                  139.42        139.97        0.39%
Variance                     38.887        38.923        0.09%
Income less than mean        0.5497        0.5494       -0.06%
Theil index                  0.03125       0.03123      -0.07%
Theil index 2                0.03125       0.03126       0.03%

Figure 1. Deviation of simulation from actual distribution: a "good" case
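The "actual values" in the table can be cross-checked against closed-form properties of the log-normal distribution. The sketch below is only an illustration: it assumes that LN(5, 0.25) means that ln y has mean 5 and standard deviation 0.25 (the reading that is consistent with the reported mean, median and Theil values), and the function and key names are hypothetical.

```python
import math


def lognormal_summary(theta, sigma):
    """Closed-form characteristics of a log-normal income distribution,
    where ln y is normal with mean theta and standard deviation sigma."""

    def std_normal_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return {
        "mean": math.exp(theta + sigma ** 2 / 2.0),
        "median": math.exp(theta),
        "mode": math.exp(theta - sigma ** 2),
        "gini": 2.0 * std_normal_cdf(sigma / math.sqrt(2.0)) - 1.0,
        "theil": sigma ** 2 / 2.0,                    # T1 = T2 under log-normality
        "share_below_mean": std_normal_cdf(sigma / 2.0),
    }


summary = lognormal_summary(5.0, 0.25)
for name, value in summary.items():
    print(f"{name}: {value:.5f}")
```

Under this reading the computed mean, median, mode, Gini coefficient, Theil index and share of income below the mean agree with the "Actual values" column to the number of digits shown there.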
B. “Bad” case
As a “bad” case, we used five income intervals for the mixture of two normal
distributions N(40,10) and N(60,5). The results are presented in the tables below.
Graphical results are presented in Figure 2. As can be seen from the graph, the actual
distribution is visually readily distinguishable from the simulation. The largest difference
is again for the mode.
Inputs into the procedure

Interval boundaries        Quintiles of population
< 37.4696                  Quintile I
37.4696 to 48.10972        Quintile II
48.10972 to 56.60144       Quintile III
56.60144 to 61.47081       Quintile IV
> 61.47081                 Quintile V

Results of the simulation

                         Actual values   Simulation   Difference
Mean Income                  50.00          49.67        -0.7%
Median Income                53.33          53.23        -0.2%
Mode Income                  59.64          58.87        -1.3%
Income less than mean        43.20%         42.82%       -0.9%

Figure 2. Deviation of simulation from actual distribution: "bad" case (vertical axis: distribution density)
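The inputs of the "bad" case can be reproduced approximately from the mixture itself. The sketch below is an illustration under two assumptions that are not stated explicitly in the text: the two normal components have equal weights, and the second parameter of N(., .) is a standard deviation. The function names are hypothetical, and the quantiles are found by plain bisection.

```python
import math


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_cdf(y):
    """Equal-weight mixture of N(40, 10) and N(60, 5) (weights are an assumption)."""
    return 0.5 * normal_cdf(y, 40.0, 10.0) + 0.5 * normal_cdf(y, 60.0, 5.0)


def quantile(p, lo=-100.0, hi=200.0, tol=1e-10):
    """Invert the mixture CDF by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


quintile_boundaries = [quantile(p) for p in (0.2, 0.4, 0.6, 0.8)]
median = quantile(0.5)
share_below_mean = mixture_cdf(50.0)   # the mixture mean is (40 + 60) / 2 = 50
print(quintile_boundaries, median, share_below_mean)
```

With these assumptions the computed boundaries fall within about 0.01 of the interval boundaries listed above, and the median and the share of population below the mean match the "Actual values" column after rounding.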
IV. Decomposition of inequality measures
IV.1. Decomposition of GINI - coefficient
Let’s consider a distribution F defined by its cumulative distribution function $F(y)$. The
respective distribution density function is $F'$. The mean of that distribution is defined as
$\mu = \int y\,dF(y)$ using Lebesgue-Stieltjes integrals. (Hereinafter a plain integral sign
denotes integration from 0 to $+\infty$.) Then the essence of the Gini coefficient can be
seen from the graph of the Lorenz curve (see Figure 3).
Figure 3. Lorenz curve (horizontal axis: $F(y)$; vertical axis: $\frac{1}{\mu}\int_0^y y\,dF(y)$)
The Gini coefficient is defined as twice the area between the 45° line and the Lorenz
curve. Or
$$G = 1 - \frac{2}{\mu} \int \left( \int_0^F y\,d\xi \right) dF,$$
or,
$$G = \frac{2}{\mu} \int F\, d\!\left( \int_0^F y\,d\xi \right) - 1 = \frac{2}{\mu} \int F\,y\,dF - 1$$
Let’s consider two distributions $F_1$ and $F_2$, where the distributions are defined by their
respective cumulative distribution functions $F_i(y)$. The respective distribution density
functions are $F_i'$. Means are defined as $\mu_i = \int y\,dF_i$. Thus, we can define the Gini
coefficients $G_i$ for the respective functions as follows:
$$G_1 = \frac{2}{\mu_1} \int y \left( F_1 - \frac{1}{2} \right) F_1'\,dy = \frac{2}{\mu_1} \int F_1\,y\,F_1'\,dy - 1$$
$$G_2 = \frac{2}{\mu_2} \int y \left( F_2 - \frac{1}{2} \right) F_2'\,dy = \frac{2}{\mu_2} \int F_2\,y\,F_2'\,dy - 1 \qquad (1)$$
Then, for the combined distribution we can write:
$$G = \frac{2}{p_1\mu_1 + p_2\mu_2} \int y \left( p_1 F_1 + p_2 F_2 - \frac{1}{2} \right) (p_1 F_1' + p_2 F_2')\,dy \qquad (2)$$
where:
$\pi_i = \dfrac{p_i \mu_i}{p_1\mu_1 + p_2\mu_2}$ - income share of the i-th distribution
$p_i$ - population share
$\mu = p_1\mu_1 + p_2\mu_2$ - mean income for the combined distribution
Or, after some simple operations we will receive:
$$G = \pi_1 G_1 + \pi_2 G_2 + \frac{2 p_1 p_2}{\mu} \int y\,(F_1 - F_2)(F_2' - F_1')\,dy$$
Expression (3) is obtained as follows:
$$G = \frac{2}{p_1\mu_1 + p_2\mu_2} \int y\,(p_1 F_1 + p_2 F_2)(p_1 F_1' + p_2 F_2')\,dy - 1 =$$
$$= \frac{2}{\mu} \int y\,[\,p_1^2 F_1 F_1' + p_2^2 F_2 F_2' + p_1 p_2 (F_1 F_2' + F_2 F_1')\,]\,dy - 1 =$$
$$= \frac{2}{\mu} \int y\,[\,p_1^2 F_1 F_1' + p_2^2 F_2 F_2' + p_1 p_2 (F_2'(F_2 + F_1 - F_2) + F_1'(F_1 + F_2 - F_1))\,]\,dy - 1 =$$
$$= \frac{2}{\mu} \int y\,[\,(p_1^2 + p_1 p_2) F_1 F_1' + (p_2^2 + p_1 p_2) F_2 F_2' + p_1 p_2 (F_2' - F_1')(F_1 - F_2)\,]\,dy - 1 =$$
$$= \frac{2}{\mu} \int y\,[\,p_1 F_1 F_1' + p_2 F_2 F_2' + p_1 p_2 (F_2' - F_1')(F_1 - F_2)\,]\,dy - 1$$
and, because $\dfrac{p_i \mu_i}{\mu} = \pi_i$,
$$G = \pi_1 G_1 + \pi_2 G_2 + \frac{2 p_1 p_2}{\mu} \int y\,(F_2' - F_1')(F_1 - F_2)\,dy \qquad (3)$$
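Expression (3) can be verified numerically. The following Python sketch is an illustration only: the two groups (incomes uniform on [0, 100] and on [0, 200]), the population share and the midpoint-rule quadrature are assumptions chosen for the example, and the function name is hypothetical.

```python
def two_group_gini_check(p1=0.4, a1=100.0, a2=200.0, cells=20_000):
    """Hypothetical groups: incomes uniform on [0, a1] and on [0, a2], a1 <= a2.

    Returns (G computed directly, G from expression (3), between-group term).
    """
    p2 = 1.0 - p1
    h = a2 / cells
    ys = [(k + 0.5) * h for k in range(cells)]       # midpoints of the grid cells

    def cdf(y, a):
        return min(y / a, 1.0)

    def pdf(y, a):
        return 1.0 / a if y < a else 0.0

    def integral(g):
        return sum(g(y) for y in ys) * h             # midpoint rule on [0, a2]

    mu1 = integral(lambda y: y * pdf(y, a1))
    mu2 = integral(lambda y: y * pdf(y, a2))
    mu = p1 * mu1 + p2 * mu2
    g1 = 2.0 / mu1 * integral(lambda y: y * cdf(y, a1) * pdf(y, a1)) - 1.0
    g2 = 2.0 / mu2 * integral(lambda y: y * cdf(y, a2) * pdf(y, a2)) - 1.0
    direct = 2.0 / mu * integral(
        lambda y: y * (p1 * cdf(y, a1) + p2 * cdf(y, a2))
                    * (p1 * pdf(y, a1) + p2 * pdf(y, a2))) - 1.0
    between = 2.0 * p1 * p2 / mu * integral(
        lambda y: y * (pdf(y, a2) - pdf(y, a1)) * (cdf(y, a1) - cdf(y, a2)))
    pi1, pi2 = p1 * mu1 / mu, p2 * mu2 / mu          # income shares
    return direct, pi1 * g1 + pi2 * g2 + between, between


direct, decomposed, between = two_group_gini_check()
print(direct, decomposed, between)
```

The two values of G agree up to rounding error, and the between-group term is positive, as one would expect when the groups have different means.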
It is easy to see how the above expression can be expanded for a multi-component case:
$$G = \frac{2}{\sum_i p_i \mu_i} \int y \left( \sum_i p_i F_i \right) \left( \sum_i p_i F_i' \right) dy - 1 =$$
$$= \frac{1}{\sum_i p_i \mu_i} \int y \left\{ 2 \sum_i \left[ F_i F_i' \left( p_i^2 + \sum_{j \ne i} p_i p_j \right) \right] - \sum_{i,j} p_i p_j (F_i' - F_j')(F_i - F_j) \right\} dy - 1 =$$
$$= \frac{1}{\sum_i p_i \mu_i} \int y \left\{ 2 \sum_i F_i F_i' p_i - \sum_{i,j} p_i p_j (F_i' - F_j')(F_i - F_j) \right\} dy - 1$$
The above expression can be rewritten as follows:
$$G = \sum_i \pi_i G_i - \sum_{i,j} \frac{p_i p_j}{\mu} \int y\,(F_i' - F_j')(F_i - F_j)\,dy \qquad (4)$$
It is also easy to see how the Gini coefficient can be expressed through the covariance:
$$G_i = \frac{2}{\mu_i}\, COV(y, F_i)$$
and the combined Gini coefficient can be written as:
$$G = \sum_i \pi_i \frac{2}{\mu_i}\, COV(y, F_i) - \sum_{i,j} \frac{p_i p_j}{\mu}\, COV(y, F_i - F_j) \qquad (5)$$
Or,
$$G = \frac{1}{\mu} \left\{ 2 \sum_i p_i\, COV(y, F_i) - \sum_{i,j} p_i p_j\, COV(y, F_i - F_j) \right\}$$
Here $COV(y, F_i - F_j)$ denotes the integral $\int y\,(F_i' - F_j')(F_i - F_j)\,dy$ from expression (4).
The first component stands for intra-group covariances, whereas the second stands for
inter-group covariance.
As we can see from expression (3), the Gini coefficient for the combined distribution
consists of two parts: intra-group and inter-group variances. Similar to the Theil
coefficient T1, the individual Gini coefficients are added up with income weights.
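The covariance form of the Gini coefficient is easy to try on a small sample. The sketch below is an illustration rather than the procedure used in this paper: it takes F at each sorted observation to be the mid-rank (i + 0.5)/n, which is an assumed discrete convention, and compares the result with the familiar mean-absolute-difference definition. Both function names and the sample incomes are hypothetical.

```python
def gini_from_covariance(incomes):
    """Gini coefficient as 2 * COV(y, F) / mean, with F taken as the mid-rank
    (i + 0.5) / n of the i-th sorted observation (assumed convention)."""
    ys = sorted(incomes)
    n = len(ys)
    mean = sum(ys) / n
    ranks = [(i + 0.5) / n for i in range(n)]
    cov = sum((y - mean) * (f - 0.5) for y, f in zip(ys, ranks)) / n
    return 2.0 * cov / mean


def gini_from_mean_difference(incomes):
    """Reference value: mean absolute difference divided by twice the mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    total = sum(abs(a - b) for a in incomes for b in incomes)
    return total / (2.0 * n * n * mean)


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(gini_from_covariance(sample), gini_from_mean_difference(sample))
```

With the mid-rank convention the two functions return the same value for any sample, which makes the covariance form a convenient check on other implementations.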
IV.2. Decomposition of entropy (Theil) indexes
In his book, H. Theil (1967, Economics and Information Theory, North-Holland,
Amsterdam), introduced, for income inequality measurement, the entropy measure used
in thermodynamics and information theory. He suggested using the entropy index in two
forms: as income-weighted and population-weighted entropy indexes. In this paper we
will call them T1 and T2 respectively.
These indexes can be represented as follows:
$$T1 = \sum_i \frac{Y_i}{Y} \log\!\left( \frac{Y_i}{Y} \Big/ \frac{N_i}{N} \right)$$
$$T2 = \sum_i \frac{N_i}{N} \log\!\left( \frac{N_i}{N} \Big/ \frac{Y_i}{Y} \right)$$
where,
$Y_i$ is income of group i;
$N_i$ is number of people in group i
Or, using Lebesgue-Stieltjes integrals:
$$T1 = \int \frac{y}{\mu} \log\!\left( \frac{y}{\mu} \right) dF(y)$$
$$T2 = \int \log\!\left( \frac{\mu}{y} \right) dF(y)$$
As can be shown, these indexes are easily decomposable in the multi-group case. For the
Theil index T1 we have:
$$T1_i = \sum_j \frac{Y_{ij}}{Y_i} \log\!\left( \frac{Y_{ij}}{Y_i} \Big/ \frac{N_{ij}}{N_i} \right)$$
where:
$Y_{ij}$ is income of sub-group j of group i;
$N_{ij}$ is number of people in sub-group j of group i;
$Y_i$ is income of group i;
$N_i$ is number of people in group i
Or, using Lebesgue-Stieltjes integrals:
$$T1_i = \int \frac{y}{\mu_i} \log\!\left( \frac{y}{\mu_i} \right) dF_i(y)$$
The Theil index T1 decomposes into:
$$T1 = \sum_i \pi_i\, T1_i + \sum_i \pi_i \log\!\left( \frac{\mu_i}{\mu} \right) = \sum_i \pi_i\, T1_i + \sum_i \pi_i \log\!\left( \frac{\pi_i}{p_i} \right)$$
T2 decomposes in a similar way with the population weights p.
As has been shown by F. Bourguignon (1979, Decomposable Income Inequality
Measures, Econometrica, Vol. 47, No. 4) and A.F. Shorrocks (1980, Inequality
Measures, Econometrica, Vol. 48, No. 3), the Theil indexes are the only income-weighted
and population-weighted indexes, respectively, that can be decomposed in that way, i.e.,
into a weighted sum of individual Theil indexes and the Theil index constructed of individual
distributions as if they were elements of the combined distribution. In this sense, the
decomposition of the Theil indexes is different from that of the Gini.
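The T1 decomposition can be illustrated on individual-level data. The following minimal Python sketch uses made-up incomes for two hypothetical groups; the function names are assumptions, and the within-group and between-group terms follow the income-weighted form given above.

```python
import math


def theil_t1(incomes):
    """Income-weighted Theil index T1 for a list of individual incomes."""
    mean = sum(incomes) / len(incomes)
    return sum((y / mean) * math.log(y / mean) for y in incomes) / len(incomes)


def theil_t1_decomposed(groups):
    """Within-group and between-group components of T1 for a list of groups."""
    everyone = [y for group in groups for y in group]
    total_income = sum(everyone)
    within = between = 0.0
    for group in groups:
        income_share = sum(group) / total_income          # pi_i
        population_share = len(group) / len(everyone)     # p_i
        within += income_share * theil_t1(group)
        between += income_share * math.log(income_share / population_share)
    return within, between


groups = [[10.0, 20.0, 30.0], [40.0, 80.0, 120.0, 160.0]]   # made-up incomes
within, between = theil_t1_decomposed(groups)
overall = theil_t1([y for group in groups for y in group])
print(overall, within + between)
```

The sum of the two components reproduces the index computed on the pooled incomes, which is the additive property the Gini decomposition lacks.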
IV.3. Decomposition of normalized variance
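Normalized variance can be seen as a simple way of describing income inequalities.
$$s^2\!\left( \frac{y}{\mu} \right) = \sum_{i,j} p_i p_j\, COV\!\left( \frac{y_i}{\mu}, \frac{y_j}{\mu} \right) = \sum_{i,j} \pi_i \pi_j\, COV\!\left( \frac{y_i}{\mu_i}, \frac{y_j}{\mu_j} \right)$$
Or,
$$s^2\!\left( \frac{y}{\mu} \right) = \sum_k \pi_k^2\, s^2\!\left( \frac{y_k}{\mu_k} \right) + \sum_{i \ne j} \pi_i \pi_j\, COV\!\left( \frac{y_i}{\mu_i}, \frac{y_j}{\mu_j} \right)$$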
IV.4. Decomposition of decile ratio
The decile ratio is a simple and transparent inequality measure; however, it cannot be
meaningfully decomposed.
IV.5. Lorenz curve
The Lorenz function L is the function of income shares on population shares. The Lorenz
curve associated with this function is plotted in Figure 3. The Lorenz curve plays an
enormous role in income distribution analysis. Some important relationships between the
Lorenz curve and the cumulative distribution function, as well as a graphical
representation of the Theil index, are shown below.
Figure 4. First derivative of the Lorenz curve (horizontal axis: F from 0 to 1; vertical axis: $L'(F) = y/\mu$)
Figure 4 shows the first derivative of the Lorenz curve, $L'(F)$. It can be easily seen that
$L'(F)$ is essentially the normalized income $y/\mu$, and, thus, is the inverse (normalized) of
the cumulative distribution function. The graph is also related to the Theil (T2) index.
The logarithm of this graph is a graphical representation of the index (because the index
can be presented as $T2 = \int \log(\mu / y)\,dF(y)$).
Figure 5. Graphical representation of the Theil index (T2) (horizontal axis: F from 0 to 1; vertical axis: $\log(L'(F)) = \log(y/\mu)$)
The second derivative of the Lorenz curve is also an important characteristic of a
distribution: $L''(F) = y'_F / \mu = 1 / (\mu F'(y))$. It is essentially the expression for the inverse
function of a distribution density function $F'(y)$.
Figure 6. Second derivative of the Lorenz curve (horizontal axis: F from 0 to 1; vertical axis: $L''(F) = y'_F / \mu$)
IV.6. Some properties of log-normal distribution
Log-normal distribution plays an important role in inequality measurements. It is thought
that real distributions of wealth and income at least partially can be approximated by it.
An extensive treatment of the log-normal distribution is contained in J. Aitchison and
J.Brown (1957, The Lognormal Distribution, Cambridge University Press). Here we
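mention just a few relevant properties. Let $\theta$ and $\sigma^2$ denote the mean and variance of $\ln y$. Then:
$$F'(y) = \frac{1}{y \sigma \sqrt{2\pi}}\; e^{-\frac{(\ln y - \theta)^2}{2\sigma^2}}$$
$$\mu = e^{\theta + \sigma^2/2}$$
$$\text{Median} = e^{\theta}$$
$$\text{Mode} = e^{\theta - \sigma^2}$$
$$S^2 = e^{2\theta}\, e^{\sigma^2} (e^{\sigma^2} - 1)$$
$$s^2 = \frac{S^2}{\mu^2} = e^{\sigma^2} - 1$$
$$E(z^m) = e^{m^2/2}, \quad \text{where } \ln z = \frac{\ln y - \theta}{\sigma}$$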
A convenient feature of the log-normal distribution is the simplicity of the Gini and Theil
indexes. With $\theta$ and $\sigma^2$ denoting the mean and variance of $\ln y$, and $\mu = e^{\theta + \sigma^2/2}$ the mean income:
$$T1 = \int (\ln y - \ln \mu)\,dL$$
$$T1 = \int (\ln y - \ln \mu)\, \frac{y}{\mu}\, \frac{1}{y \sigma \sqrt{2\pi}}\, e^{-\frac{(\ln y - \theta)^2}{2\sigma^2}}\,dy =$$
$$= \frac{1}{\mu} \int \ln y\, \frac{y}{\sigma \sqrt{2\pi}}\, e^{-\frac{(\ln y - \theta)^2}{2\sigma^2}}\,d\ln y - (\theta + \sigma^2/2) =$$
$$= \frac{1}{\sigma \sqrt{2\pi}\, e^{\theta + \sigma^2/2}} \int t\, e^{t}\, e^{-\frac{(t - \theta)^2}{2\sigma^2}}\,dt - (\theta + \sigma^2/2) =$$
$$= \frac{1}{\sigma \sqrt{2\pi}} \int t\, e^{-\frac{(t - (\theta + \sigma^2))^2}{2\sigma^2}}\,dt - (\theta + \sigma^2/2) =$$
$$= \theta + \sigma^2 - (\theta + \sigma^2/2) = \sigma^2/2$$
And, in the case of the second Theil index, we can obtain the following expression:
$$T2 = \int (\ln \mu - \ln y)\,dF$$
$$T2 = -\frac{1}{\sigma \sqrt{2\pi}} \int \ln y\; e^{-\frac{(\ln y - \theta)^2}{2\sigma^2}}\,d\ln y + (\theta + \sigma^2/2) = -\theta + (\theta + \sigma^2/2) = \sigma^2/2$$
We can use the test of T1=T2 to examine how closely a given distribution approaches a
log-normal one.
The relationship of the Theil measures to normalized variance can be expressed as
follows:
$$T1 = T2 = \frac{\sigma^2}{2} = \frac{1}{2} \ln(1 + s^2)$$
In the log-normal case, we can also think of the Theil indexes as the (logarithmic) difference
between the mean and median:
$$T1 = T2 = \frac{\sigma^2}{2} = \log\!\left( \frac{\mu}{\text{Median}} \right)$$
And, finally, as can be easily seen, the Gini coefficient for the log-normal distribution can
be written as follows:
$$G = 2\Phi(\sigma/\sqrt{2}) - 1 = 2F(e^{\theta + \sigma^2/\sqrt{2}}) - 1$$
where $\Phi(\cdot)$ is the standard normal cumulative distribution.
V. Two alternative representations of the Gini coefficient
Apart from the traditional visualization of the Gini index using the Lorenz curve, it is
possible to represent the Gini using a simple graph of the distribution function. Two
such representations are discussed below.
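1. Let’s start from the following expression for the Gini coefficient:
$$G = \frac{2}{\mu} \int y \left( F - \frac{1}{2} \right) dF \qquad (6)$$
Or, as it is easy to see, expression (6) can be written as:
$$G = \frac{2}{\mu} \int (y - \mu)\,F\,dF \qquad (7)$$
Expressions (6) and (7) are equivalent to:
$$G = \frac{2}{\mu}\, Cov(y, F) \qquad (8)$$
We can rewrite Expression (8) using the slope coefficient as follows:
$$G = \frac{2}{\mu} \int (y - \mu) \left( F - \frac{1}{2} \right) dF = \frac{2}{\mu} \cdot \frac{\int (y - \mu)(F - \frac{1}{2})\,dF}{\int (F - \frac{1}{2})^2\,dF} \cdot \int \left( F - \frac{1}{2} \right)^2 dF = \frac{2}{\mu} \cdot \frac{1}{12}\, Slope(F, y) \qquad (9)$$
because $\sigma_F^2 = \int \left( F - \frac{1}{2} \right)^2 dF = \left[ \frac{F^3}{3} - \frac{F^2}{2} + \frac{F}{4} \right]_0^1 = \frac{1}{12}$,
where Slope = slope coefficient [3].
Or, finally,
$$G = \frac{1}{6}\, Slope(F, \tilde{y}) \qquad (10)$$
where $\tilde{y} = y / \mu$.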
Expression (10) can be obtained from Expression (8) in a different way as well. Let’s
start from rewriting Expression (8) using the correlation coefficient [4]:
$$G = \frac{2}{\mu}\, Cov(y, F) = \frac{2}{\mu}\, \rho(y, F)\, \sigma_y \sigma_F = \frac{1}{\sqrt{3}}\, \frac{\sigma_y}{\mu}\, \rho(y, F) \qquad (11)$$
because $\sigma_F^2 = \frac{1}{12}$ [see Expression (9)].
Now, using $\rho(y, F) = Slope(F, y)\, \dfrac{\sigma_F}{\sigma_y}$, we obtain Expression (10) again:
$$G = \frac{1}{\sqrt{3}}\, \frac{\sigma_y}{\mu}\, Slope(F, y)\, \frac{\sigma_F}{\sigma_y} = \frac{1}{6}\, Slope(F, \tilde{y}), \quad \tilde{y} = y/\mu$$
[3] $Slope(x, y) = \dfrac{\sum_i \omega_i (x_i - E(x))(y_i - E(y))}{\sum_i \omega_i (x_i - E(x))^2}$, where $\omega_i$ are weights, or, in the continuous case,
$Slope(x, y) = \dfrac{\int (x - E(x))(y - E(y))\,dF}{\int (x - E(x))^2\,dF}$.
Figure 7. Graphical representation of the Gini coefficient as one sixth of the slope coefficient
between income y and distribution function F (axes: F and normalized income $\tilde{y}$; $Slope(F, \tilde{y}) = 6 \times Gini$).
2. The next representation of the Gini coefficient can be obtained using Expression (7):
$$G = \frac{2}{\mu} \int (y - \mu)\,F\,dF = \frac{2}{\mu} \int y\,F\,dF - 1 = \int \tilde{y}\,dF^2 - 1 = \frac{\mu(F^2)}{\mu} - 1 \qquad (12)$$
with $\tilde{y} = y / \mu$.
[4] The discrete case of using correlation coefficients in expressing the Gini coefficient [Expression (11)]
was shown in Milanovic (1996).
where $\mu(F^2) = \int y\,dF^2$, or the mean for the square of distribution F.
Or, equivalently (with $\tilde{y} = y/\mu$):
$$G = \int \tilde{y}\,dF^2 - \int \tilde{y}\,dF = \int F\,d\tilde{y} - \int F^2\,d\tilde{y} \qquad (13)$$
It is easy to see that distribution $F^2$ has all the properties of a regular distribution. $F^2$ is a
monotonic transformation of F, and, hence, is itself a monotonically increasing
function bounded by [0,1].
Expression (12) essentially says that the Gini coefficient is equal to the difference
between the mean for the square of the distribution, $\mu(F^2)$, and the regular mean $\mu$, taken
relative to $\mu$. The expression is presented in Figure 8 in graphical form. In the case when income
is normalized by the mean, the Gini coefficient is equal to the area between the distribution
function F and the squared distribution function $F^2$.
Figure 8. Graphical representation of the Gini coefficient as the area between the distribution
function F and the squared distribution function F2 (horizontal axis: normalized income; the shaded area equals the Gini coefficient).
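Both representations can be tried on a distribution whose Gini coefficient is known in closed form. The Python sketch below is an illustration under stated assumptions: it uses an exponential income distribution (for which the Gini coefficient is 1/2), truncates the integrals at fifty times the mean, and applies a midpoint rule; the function name and parameters are hypothetical.

```python
import math


def gini_two_ways(mean=100.0, upper=5000.0, cells=200_000):
    """Hypothetical example: exponential incomes with the given mean.

    Returns (Slope(F, y/mean) / 6, area between F and F**2 over normalized income).
    """
    h = upper / cells
    cov = var_f = area = 0.0
    for k in range(cells):
        y = (k + 0.5) * h                      # midpoint of the k-th grid cell
        cdf = 1.0 - math.exp(-y / mean)
        density = math.exp(-y / mean) / mean
        cov += (y / mean - 1.0) * (cdf - 0.5) * density * h
        var_f += (cdf - 0.5) ** 2 * density * h
        area += (cdf - cdf ** 2) * h / mean    # integral of (F - F^2) d(y/mean)
    return cov / var_f / 6.0, area


slope_based, area_based = gini_two_ways()
print(slope_based, area_based)
```

Both numbers come out close to 0.5, and the computed variance of F is close to 1/12, the constant that produces the factor of one sixth.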
References
Aitchison J. and J. Brown, 1957, The Lognormal Distribution, Cambridge
University Press, Cambridge.
Bourguignon F., 1979, Decomposable Income Inequality Measures,
Econometrica, Vol. 47, No. 4.
Shorrocks A.F., 1980, Inequality Measures, Econometrica, Vol. 48, No. 3.
Theil H., 1967, Economics and Information Theory, North-Holland,
Amsterdam.
Theil H., 1989, Development of International Inequality, Journal of
Econometrics, Vol. 42, No. 1, North-Holland, Amsterdam.
ANNEX
Figure A-1. Inequality in the former Soviet Union, 1990 (various indexes by absolute value)
(series: Gini-coefficient, Variance, Theil index, Theil 2 index, Decile ratio; logarithmic vertical axis from 0.01 to 10; horizontal axis: USSR and the fifteen republics)
Figure A-2. Correlation between various inequality measures in the former Soviet Union, 1990
(inequality indexes normalized by standard deviation)
(series: Gini-coefficient, Variance, Theil index, Theil 2 index, Decile ratio; vertical axis from -1.5 to 2.5; horizontal axis: USSR and the fifteen republics)
Table A-1. Inequality indexes for the former Soviet Union, 1990

Characteristics    Gini-coefficient   Variance   Theil index   Theil 2 index   Decile ratio   Mean income
USSR                    0.2599         0.4760       0.1109         0.1144          5.65          170.6
Russia                  0.2407         0.4414       0.0946         0.0946          4.64          186.0
Ukraine                 0.2155         0.4003       0.0747         0.0744          3.88          173.7
Belarus                 0.2150         0.3970       0.0748         0.0749          3.89          188.5
Estonia                 0.2294         0.4209       0.0856         0.0858          4.37          234.7
Latvia                  0.2313         0.4303       0.0871         0.0888          4.39          217.3
Lithuania               0.2272         0.4229       0.0839         0.0831          4.24          210.4
Moldova                 0.2393         0.4458       0.0935         0.0928          4.58          161.4
Armenia                 0.2431         0.4524       0.0989         0.0959          4.85          167.0
Azerbaijan              0.3017         0.5780       0.1525         0.1489          7.12          116.8
Georgia                 0.2583         0.4794       0.1128         0.1072          5.42          172.2
Kazakhstan              0.2646         0.4948       0.1194         0.1134          5.79          155.0
Uzbekistan              0.2777         0.5323       0.1298         0.1266          6.27          103.3
Kyrgyzstan              0.2725         0.5153       0.1268         0.1212          6.23          116.1
Tajikistan              0.2753         0.5372       0.1260         0.1260          6.05           91.1
Turkmenistan            0.2768         0.5265       0.1302         0.1254          6.35          113.4
Share of intergroup
variance                 7.7%          12.9%        14.3%          15.8%           N/A
Table A-2. Income shares by decile, former Soviet Union, 1990

Deciles        Decile1  Decile2  Decile3  Decile4  Decile5  Decile6  Decile7  Decile8  Decile9  Decile10
USSR            3.58%    5.44%    6.59%    7.62%    8.65%    9.74%   10.98%   12.50%   14.64%   20.25%
Russia          4.27%    5.79%    6.81%    7.74%    8.67%    9.68%   10.82%   12.22%   14.19%   19.83%
Ukraine         4.71%    6.16%    7.13%    7.99%    8.86%    9.80%   10.86%   12.17%   14.04%   18.29%
Belarus         4.76%    6.20%    7.15%    8.01%    8.87%    9.78%   10.81%   12.07%   13.84%   18.51%
Estonia         4.48%    5.99%    6.99%    7.88%    8.77%    9.72%   10.77%   12.01%   13.84%   19.55%
Latvia          4.51%    5.99%    6.97%    7.86%    8.73%    9.66%   10.71%   11.99%   13.77%   19.82%
Lithuania       4.48%    5.97%    6.98%    7.89%    8.80%    9.77%   10.87%   12.20%   14.04%   19.00%
Moldova         4.28%    5.81%    6.82%    7.75%    8.67%    9.66%   10.81%   12.25%   14.35%   19.60%
Armenia         4.04%    5.80%    6.81%    7.73%    8.67%    9.68%   10.87%   12.34%   14.43%   19.63%
Azerbaijan      3.20%    4.91%    5.96%    6.96%    8.05%    9.28%   10.74%   12.62%   15.45%   22.82%
Georgia         3.70%    5.50%    6.56%    7.56%    8.57%    9.68%   10.96%   12.57%   14.86%   20.05%
Kazakhstan      3.54%    5.46%    6.51%    7.50%    8.52%    9.63%   10.92%   12.55%   14.90%   20.48%
Uzbekistan      3.45%    5.30%    6.38%    7.35%    8.35%    9.44%   10.74%   12.43%   14.95%   21.62%
Kyrgyzstan      3.38%    5.31%    6.47%    7.47%    8.46%    9.56%   10.85%   12.49%   14.93%   21.06%
Tajikistan      3.62%    5.46%    6.47%    7.34%    8.23%    9.26%   10.56%   12.28%   14.85%   21.92%
Turkmenistan    3.37%    5.27%    6.42%    7.41%    8.39%    9.48%   10.78%   12.46%   14.99%   21.43%
Table A-3. Correlation coefficients between various inequality measures

                    Gini-coefficient   Variance   Theil index   Theil 2 index   Decile ratio
Gini-coefficient          1            0.99512      0.99873        0.99791        0.99517
Variance                                  1         0.993          0.9969         0.9908
Theil index                                            1           0.9964         0.998
Theil 2 index                                                         1           0.9932
Decile ratio                                                                         1
Figure A-3. Income distribution density, former Soviet Union, 1990
(one curve for the USSR and for each republic; horizontal axis: rubles, 0 to 600)
Figure A-4. Histogram of income distribution, former Soviet Union, 1990
(one series for the USSR and for each republic; vertical axis: share of population, 0.0% to 10.0%; horizontal axis: proportional income boundaries from 30 to 472)
Figure A-5. Graphical representation of the Theil index for combined distribution and between-group Theil index, former Soviet Union, 1990.