Draft. Comments appreciated.

Decomposition of Inequality Based on Incomplete Information

A contributed paper to the IARIW 24th General Conference, Lillehammer, Norway, August 18-24, 1996

Yuri Dikhanov
Statistical Advisory Services
International Economics Department, IECDD
The World Bank
1818 H Street, N.W., Room N2-038
Washington, D.C. 20433, U.S.A.
Phone: (202) 458-2667; fax: (202) 522-3669
E-mail: ydikhanov@worldbank.org

Abstract

In this paper, the author examines five measures of inequality: the Gini coefficient, two entropy (Theil) indexes, the normalized variance and the decile ratio. It is shown how to decompose these indexes into intra-group and between-group inequalities. The indexes are used to study inequalities in the former Soviet republics in 1990. The study is based on incomplete information on income intervals: only interval boundaries and population shares are used. The robustness of the approximating procedure (piecewise polynomial interpolation of the cumulative distribution function) is discussed, as are two alternative representations of the Gini coefficient.

The views presented in this paper are the author's and do not necessarily represent those of the World Bank or its Board of Directors.

I. Introduction

Analysis of income or wealth distribution often includes decomposing the inequality of the total population into between-group and within-group inequalities. Not all inequality measures are decomposable, and not all of the decomposable ones decompose in the same way. Theoretically, the second Theil measure (T2) probably has the best properties. The Gini index, however, is the most widely cited measure. In this paper we attempt to decompose the Gini index in a meaningful way (see Section IV).
The Gini index, along with the two Theil measures, the normalized variance and the decile ratio, was then used to analyze income inequalities in the former Soviet Union and its republics in 1990 (see Section II and the Annex). We found that the share of inter-group inequality in total inequality was in the range of 7.7-15.8 percent, depending on the index. As inputs into this exercise, we used official interval data for each of the former Soviet republics: the boundaries of seven income intervals and the population shares within these intervals. To process these discrete data we interpolated the distribution on each interval with a polynomial of order four (a cubic, with four coefficients). The polynomials are chosen to be twice continuously differentiable at all points of the distribution, which allows differential and integral operations with the distribution function and its derivatives in explicit form. Section III discusses the robustness of this procedure using two numerical examples: a "bad" one, a hypothetical mixture of two normal distributions with different means and variances, presented as five income intervals (quintiles); and a "good" one, a log-normal distribution, presented as ten intervals. As expected, the precision of the approximation in the "good" case is one to two orders of magnitude better than in the "bad" case (0.004-0.39 percent, depending on the parameter, versus 0.2-1.3 percent). Section V discusses two alternative graphical and analytical representations of the Gini coefficient that are based on the original distribution function rather than on the Lorenz curve.

II. Decomposition of income inequalities in the former Soviet Union
There were two major reasons for using data on the former Soviet Union from 1990. First, the data were available: there were not many countries where regional inequality data were collected on a regular and comparable basis. Second, since 1990 the former Soviet republics have become independent countries, and, as economies in transition, they attract the special attention of academics and policy makers. The original information included boundaries and population shares for seven intervals (see Table 1 below). To process this information we used a version of our Gini ToolPak.

Table 1. Original data on income distribution shares in the former Soviet Union for 1990 (percent of population in each income interval; interval boundaries in rubles)

Republic        <75   75-100  100-150  150-200  200-250  250-300   >300
USSR            7.8    10.6     28.0     23.9     14.9      8.0     6.8
Russia          3.2     8.2     27.2     26.0     17.3      9.6     8.5
Ukraine         2.7     8.6     31.2     28.0     16.2      7.9     5.4
Belarus         1.5     5.9     27.0     28.9     19.1     10.0     7.6
Estonia         0.6     2.7     15.4     23.6     21.7     16.2    19.8
Latvia          0.9     3.8     19.5     26.1     21.3     13.9    14.5
Lithuania       1.2     4.5     20.9     25.8     20.5     13.3    13.8
Moldova         6.1    12.5     32.9     24.5     13.0      6.4     4.6
Armenia         5.4    11.3     31.6     24.6     14.3      7.1     5.7
Azerbaijan     29.7    19.7     26.8     13.0      6.0      2.7     2.1
Georgia         6.5    11.2     28.7     23.1     14.5      8.2     7.8
Kazakhstan     10.0    14.4     31.1     21.5     11.9      6.0     5.1
Uzbekistan     34.1    23.0     26.8     10.1      3.7      1.4     0.9
Kyrgyzstan     24.8    21.7     30.8     13.7      5.5      2.1     1.4
Tajikistan     45.1    22.7     21.6      6.8      2.4      0.9     0.5
Turkmenistan   26.9    22.3     29.6     12.7      5.1      2.0     1.4

The overall results can be assessed from Figure A-2 of the Annex, which presents normalized values of the various inequality measures (inequality indexes normalized by their standard deviations). The lowest inequality was observed in Belarus and Ukraine, followed by Estonia, Latvia and Lithuania. That the Baltics had higher inequality than Belarus and Ukraine can be attributed to the fact that, although minimum wages were the same in all of these former republics, mean incomes were higher in the Baltics.
Russia had higher income inequality than these economies, which is to be expected given its size. A factor that further increased inequality in Russia was the relatively high prices (and hence wages) in Siberia. The highest inequality was registered for Azerbaijan and the Central Asian states (Uzbekistan, Kyrgyzstan, Turkmenistan and Tajikistan). The result for Azerbaijan is surprising given the much lower numbers for neighboring Armenia and Georgia.

Figure A-2 also shows how closely the indexes are correlated: for this set of countries, all the indexes produce highly consistent results. Table A-3 of the Annex provides the correlation coefficients. One of the highest values is observed for the Theil 1 - Theil 2 pair: r² = 0.9964. In absolute value, the two indexes differ by around 2 percent, which can be seen as a measure of the deviation of the actual distribution from the log-normal one (under the assumption of a log-normal distribution, the two Theil indexes coincide). For some economies the deviation between the two Theil indexes is insignificant, 0.1-0.2 percent, though part of that can be attributed to approximation errors that go in opposite directions. The two Theil indexes and the Gini coefficient are correlated even more tightly: r² = 0.9979-0.9987. A very high correlation was also registered for the Theil 1 - decile ratio pair (r² = 0.9980), and a tight one for the Theil 2 - decile ratio pair (r² = 0.9932). The lowest correlation coefficient is registered for the variance - decile ratio pair, r² = 0.9908, which is still very high. The overall conclusion is that all these inequality measures produce coherent results.
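The decile ratio is used throughout without an explicit definition. Comparing Tables A-1 and A-2 of the Annex suggests that it is the income share of the top decile divided by that of the bottom decile. The short check below simply re-enters the published shares; the dictionary `DECILE_SHARES` and the helper `decile_ratio` are illustrative names, not part of the Gini ToolPak.

```python
# (decile 1 share %, decile 10 share %, decile ratio reported in Table A-1),
# copied from Tables A-2 and A-1 of the Annex
DECILE_SHARES = {
    "USSR": (3.58, 20.25, 5.65),
    "Russia": (4.27, 19.83, 4.64),
    "Ukraine": (4.71, 18.29, 3.88),
    "Belarus": (4.76, 18.51, 3.89),
    "Estonia": (4.48, 19.55, 4.37),
    "Latvia": (4.51, 19.82, 4.39),
    "Lithuania": (4.48, 19.00, 4.24),
    "Moldova": (4.28, 19.60, 4.58),
    "Armenia": (4.04, 19.63, 4.85),
    "Azerbaijan": (3.20, 22.82, 7.12),
    "Georgia": (3.70, 20.05, 5.42),
    "Kazakhstan": (3.54, 20.48, 5.79),
    "Uzbekistan": (3.45, 21.62, 6.27),
    "Kyrgyzstan": (3.38, 21.06, 6.23),
    "Tajikistan": (3.62, 21.92, 6.05),
    "Turkmenistan": (3.37, 21.43, 6.35),
}


def decile_ratio(bottom_share, top_share):
    """Income share of the richest decile divided by that of the poorest decile."""
    return top_share / bottom_share


for name, (d1, d10, reported) in DECILE_SHARES.items():
    print(f"{name:<13} computed {decile_ratio(d1, d10):5.2f}   Table A-1 {reported:5.2f}")
```

For all sixteen rows, the computed ratios agree with the reported ones to within about 0.01. The remaining differences are consistent with rounding of the two-decimal shares.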
Table A-1 of the Annex provides the results of the actual estimations. The shares of inter-group variance presented in the table are of special significance for this paper. The two Theil indexes and the variance display similar results, in the 12.9-15.8 percent range. The share of inter-group variance for the Gini coefficient, on the other hand, is only 7.7 percent, roughly half of those for the other measures. One has to bear in mind, however, that these indexes decompose in different ways and, thus, the shares are not directly comparable. The two Theil indexes, for example, produce identical results only under the assumption of log-normality of the distribution. Even then, their shares of inter-group variance will differ, because the components are aggregated with different weights (income and population shares, respectively). The inter-group result produced using the second Theil index (0.0170) can be compared with those estimated by H. Theil (1989, Development of International Inequality, Journal of Econometrics, Vol. 42, No. 1, North-Holland). For 1985, he found the inequality between the OECD countries (without Australia) to be 0.0859; for tropical America, 0.0580; for tropical Asia, 0.2003; and for tropical Africa, 0.1871. Figure A-5 of the Annex provides a graphical representation of the Theil index for the combined distribution versus the between-group Theil index.

Figure A-3 of the Annex presents the density functions of income distribution in the former republics. Interestingly, the Estonian distribution has slight irregularities in its upper part. This might indicate urban/rural or Tallinn/rest-of-the-country income differentials¹.
More likely, a factor that contributed to this situation was the advance of reforms in Estonia. In 1990 this country had the highest share of non-agricultural private sector in the former Soviet Union, and that sector paid much higher salaries than the state sector.

Figure A-4 of the Annex is a histogram on a logarithmic scale. It shows the shares of population within proportional boundaries, with each boundary proportional to the previous one. Note that in this case the highest point is not the mode, as in a distribution density function, but the mean (of log income; for a log-normal distribution this corresponds to the median income). Using this type of histogram, however, requires the distribution to be at least approximately log-normal.

Table A-2 of the Annex presents income shares by decile. That Azerbaijan had the highest inequality, and Belarus and Ukraine the lowest, can be inferred directly from this table.

¹ Tallinn had 35 percent of the Estonian population and 45 percent of the income.

III. Robustness of the computational procedure

For this exercise the Gini ToolPak was used. In this section we briefly explore the robustness of the procedure. We use two numerical examples: a "good" case (ten income intervals for a log-normal distribution) and a "bad" case (five intervals, i.e., quintile data, for a mixture of two normal distributions with different means and variances).

The essence of the procedure (polynomial interpolation) is the following. Assume that we are given only a set {F(Y_i)} of M values that the cumulative distribution function takes at the points Y_i. We need to approximate all other points of the distribution, i.e., to estimate F(y) for y ∈ [0, +∞).
Within each interval [Y_i, Y_{i+1}], we interpolate the distribution function by a polynomial of order 4 (i.e., a cubic) in the form:

$$F_{i,i+1}(y) = \sum_{n=0}^{3} \alpha_{in} \left( \frac{y - Y_i}{Y_{i+1} - Y_i} \right)^{n}$$

At the boundaries the polynomials are exact rather than interpolations: F_{i,i+1}(Y_i) = F_{i−1,i}(Y_i) = F(Y_i). The polynomials are chosen to be twice continuously differentiable across the boundaries. This is a very important property, because it allows differential and integral operations with F and its derivatives in explicit form. For example, the mean of the distribution is calculated as follows:

$$\mu = \int y\, dF = \sum_{i=0}^{M} \sum_{n=1}^{3} \alpha_{in}\, \frac{n Y_{i+1} + Y_i}{n+1},$$

where M is the number of intervals. Other characteristics of the distribution function can be derived in a similar way.

Errors of estimation in polynomial interpolation

Using logic similar to that behind the remainder term of Taylor's formula in Lagrange form, we arrive at the following expression for the estimation errors²:

$$\left| F_{i,i+1}(y) - F(y) \right| \le \frac{1}{4!} \left( \frac{Y_{i+1} - Y_i}{2} \right)^{4} \left| F^{(4)}(\xi) \right|, \qquad \xi = \arg\max_{y \in [Y_i, Y_{i+1}]} \left| F^{(4)}(y) \right|$$

In the case of the (standard) normal distribution, the above boils down to:

$$\left| F_{i,i+1}(y) - F(y) \right| \le \frac{1}{384}\, (Y_{i+1} - Y_i)^{4} \left| F^{(1)}(\xi)\,(3\xi - \xi^{3}) \right|$$

The coefficient 1/384 [= 1/(2^4 · 4!)] is the absolute theoretical minimum for the errors. The minimum is attained when the polynomial coefficients for the interval are determined (almost) independently of the other intervals. In other cases the bound is somewhat different, although the order of magnitude of the errors remains the same.

² Dividing the interval [Y_i, Y_{i+1}] in half simply reflects the fact that, because the polynomial becomes exact again at the ends of the interval, maximum errors are attained around the middle of the interval.
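A minimal sketch of the interpolation step and of the mean formula follows. It assumes a natural cubic spline as the twice-continuously-differentiable interpolant, with zero second derivative at both ends. It also assumes a bounded support [Y_0, Y_M] with F(Y_0) = 0 and F(Y_M) = 1. The boundary conditions actually used in the Gini ToolPak are not stated in the text, and the function names are hypothetical. The coefficients α_in are returned in the same local form as the formula above, and `spline_mean` evaluates the mean interval by interval.

```python
from bisect import bisect_right


def natural_cubic_spline(knots, values):
    """Fit a C2 natural cubic spline through (knots[i], values[i]).

    Returns alpha[i] = [a0, a1, a2, a3] so that on [knots[i], knots[i+1]]
    F(y) = sum_n alpha[i][n] * t**n, with t = (y - knots[i]) / (knots[i+1] - knots[i]).
    """
    n = len(knots) - 1                                   # number of intervals
    h = [knots[i + 1] - knots[i] for i in range(n)]
    m = [0.0] * (n + 1)                                  # second derivatives, m[0] = m[n] = 0
    if n > 1:
        # tridiagonal system for m[1..n-1], solved with the Thomas algorithm
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((values[i + 1] - values[i]) / h[i] - (values[i] - values[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):                        # forward elimination
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        x = [0.0] * (n - 1)
        x[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):                   # back substitution
            x[k] = (rhs[k] - sup[k] * x[k + 1]) / diag[k]
        m[1:n] = x
    alpha = []
    for i in range(n):
        slope = (values[i + 1] - values[i]) / h[i] - h[i] * (2.0 * m[i] + m[i + 1]) / 6.0
        alpha.append([values[i], slope * h[i], m[i] * h[i] ** 2 / 2.0,
                      (m[i + 1] - m[i]) * h[i] ** 2 / 6.0])
    return alpha


def spline_cdf(y, knots, alpha):
    """Evaluate the interpolated CDF at y, for knots[0] <= y <= knots[-1]."""
    i = min(max(bisect_right(knots, y) - 1, 0), len(alpha) - 1)
    t = (y - knots[i]) / (knots[i + 1] - knots[i])
    return sum(a * t ** k for k, a in enumerate(alpha[i]))


def spline_mean(knots, alpha):
    """Mean = sum_i sum_{n=1..3} alpha[i][n] * (n * Y_{i+1} + Y_i) / (n + 1)."""
    return sum(alpha[i][n] * (n * knots[i + 1] + knots[i]) / (n + 1)
               for i in range(len(alpha)) for n in (1, 2, 3))


knots = [0.0, 75.0, 100.0, 150.0, 200.0, 250.0, 300.0, 500.0]     # 0 and 500 are assumed outer bounds
cum_shares = [0.0, 0.078, 0.184, 0.464, 0.703, 0.852, 0.932, 1.0]  # USSR row of Table 1, cumulated
alpha = natural_cubic_spline(knots, cum_shares)
print(f"interpolated mean: {spline_mean(knots, alpha):.1f}")
```

The cumulative shares are those of the USSR row of Table 1. The outer bounds of 0 and 500 for the open-ended intervals are assumptions for illustration only, so the printed mean should not be read as a replication of Table A-1. A plain natural spline also does not enforce monotonicity of F, which a production procedure would need to check.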
For instance, when the intervals are separated by σ/2, the biggest errors are in the interval [0.5σ, σ]. This can be seen from the first-order condition for F^{(1)}(ξ)(3ξ − ξ³), whose maximum in absolute value is at ξ = √(3 − √6) ≈ 0.74. The errors in this interval are bounded as follows:

$$\left| F_{i,i+1}(y) - F(y) \right| \le \frac{1}{384}\cdot\frac{1}{16}\cdot\frac{1}{\sqrt{2\pi}}\,\sqrt{6\,(3-\sqrt{6})}\; e^{-(3-\sqrt{6})/2} \approx 0.01\%$$

A. "Good" case

As a "good" case, we used ten income intervals for the log-normal distribution LN(5, 0.25). The results are presented in the table below, and graphically in Figure 1. As can be seen from the graph, the actual distribution cannot be readily distinguished from the simulation. The largest difference is for the mode, which is notoriously difficult to estimate.

                          Actual values   Simulation   Difference
Mean income                   153.12        153.09       -0.02%
Gini coefficient              0.14032       0.14023      -0.06%
Median income                 148.41        148.41        0.00%
Mode income                   139.42        139.97        0.39%
Variance                      38.887        38.923        0.09%
Income less than mean         0.5497        0.5494       -0.06%
Theil index                   0.03125       0.03123      -0.07%
Theil index 2                 0.03125       0.03126       0.03%

[Figure 1. Deviation of simulation from actual distribution: a "good" case]

B. "Bad" case

As a "bad" case, we used five income intervals for the mixture of two normal distributions N(40, 10) and N(60, 5). The results are presented in the tables below, and graphically in Figure 2. As can be seen from the graph, the actual distribution is readily distinguishable from the simulation. The largest difference is again for the mode.
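The inputs and "actual values" of the "bad" case can be regenerated with a few lines of code. This rests on our reading, which the text does not state explicitly: an equal-weight mixture in which N(40, 10) and N(60, 5) give the mean and standard deviation of each component. Under that reading the mixture mean is exactly 50, as in the results table. The sketch inverts the mixture CDF by bisection to obtain the quintile boundaries, and locates the mode with a fine grid search. The function names are illustrative.

```python
import math

# Assumed reading of the "bad" case: weights 0.5/0.5, components given as N(mean, standard deviation)
COMPONENTS = [(0.5, 40.0, 10.0), (0.5, 60.0, 5.0)]   # (weight, mean, standard deviation)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mix_cdf(y):
    return sum(w * norm_cdf((y - m) / s) for w, m, s in COMPONENTS)


def mix_pdf(y):
    return sum(w * norm_pdf((y - m) / s) / s for w, m, s in COMPONENTS)


def quantile(q, lo=-100.0, hi=200.0, tol=1e-10):
    """Invert the strictly increasing mixture CDF by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mix_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


quintile_bounds = [quantile(q) for q in (0.2, 0.4, 0.6, 0.8)]
mean = sum(w * m for w, m, _ in COMPONENTS)
median = quantile(0.5)
mode = max((40.0 + 0.001 * k for k in range(30001)), key=mix_pdf)   # grid search on [40, 70]
share_below_mean = mix_cdf(mean)

print([round(b, 4) for b in quintile_bounds], mean, round(median, 2), round(mode, 2),
      round(share_below_mean, 4))
```

Under this reading, the computed quintile boundaries, median, mode and share of the population below the mean agree with the values reported for the "bad" case to within about 0.01.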
Inputs into the procedure:

Interval boundaries        Quintiles of population
< 37.4696                  Quintile I
37.4696 to 48.10972        Quintile II
48.10972 to 56.60144       Quintile III
56.60144 to 61.47081       Quintile IV
> 61.47081                 Quintile V

Results of the simulation:

                          Actual values   Simulation   Difference
Mean income                   50.00          49.67         -0.7%
Median income                 53.33          53.23         -0.2%
Mode income                   59.64          58.87         -1.3%
Income less than mean         43.20%         42.82%        -0.9%

[Figure 2. Deviation of simulation from actual distribution: "bad" case (distribution density)]

IV. Decomposition of inequality measures

IV.1. Decomposition of the Gini coefficient

Consider a distribution F defined by its cumulative distribution function F(y). The respective distribution density function is F′. The mean of the distribution is defined as μ = ∫ y dF(y), using Lebesgue-Stieltjes integrals. (Hereinafter a plain integral sign denotes integration from 0 to +∞.) The essence of the Gini coefficient can then be seen from the graph of the Lorenz curve (see Figure 3).

[Figure 3. Lorenz curve: cumulative income share (1/μ)∫_0^y x dF(x) plotted against F(y)]

The Gini coefficient is defined as twice the area between the 45° line and the Lorenz curve:

$$G = 1 - \frac{2}{\mu}\int_0^1 \left( \int_0^{y} x\, dF(x) \right) dF(y),$$

or, integrating by parts,

$$G = \frac{2}{\mu}\int F\, d\!\left( \int_0^{y} x\, dF(x) \right) - 1 = \frac{2}{\mu}\int F\, y\, dF - 1.$$

Now consider two distributions F_1 and F_2, defined by their respective cumulative distribution functions F_i(y), with distribution density functions F_i′ and means μ_i = ∫ y dF_i. We can then define the Gini coefficients of the two distributions as follows:
$$G_i = \frac{2}{\mu_i}\int y\left(F_i - \tfrac{1}{2}\right) F_i'\, dy = \frac{2}{\mu_i}\int F_i\, y\, F_i'\, dy - 1, \qquad i = 1, 2 \tag{1}$$

Then, for the combined distribution we can write:

$$G = \frac{2}{\mu}\int y\,(p_1F_1 + p_2F_2)(p_1F_1' + p_2F_2')\, dy - 1 \tag{2}$$

where:
θ_i = p_iμ_i/μ is the income share of distribution i;
p_i is the population share of distribution i (p_1 + p_2 = 1);
μ = p_1μ_1 + p_2μ_2 is the mean income of the combined distribution.

After some simple operations we obtain:

$$G = \theta_1 G_1 + \theta_2 G_2 + \frac{2p_1p_2}{\mu}\int y\,(F_2 - F_1)(F_1' - F_2')\, dy$$

Expression (3) is obtained as follows:

$$\begin{aligned}
G &= \frac{2}{\mu}\int y\,(p_1F_1 + p_2F_2)(p_1F_1' + p_2F_2')\,dy - 1 \\
&= \frac{2}{\mu}\int y\,\big[p_1^2F_1F_1' + p_2^2F_2F_2' + p_1p_2(F_1F_2' + F_2F_1')\big]\,dy - 1 \\
&= \frac{2}{\mu}\int y\,\big[p_1^2F_1F_1' + p_2^2F_2F_2' + p_1p_2\big(F_2(F_2' + F_1' - F_2') + F_1(F_1' + F_2' - F_1')\big)\big]\,dy - 1 \\
&= \frac{2}{\mu}\int y\,\big[(p_1^2 + p_1p_2)F_1F_1' + (p_2^2 + p_1p_2)F_2F_2' + p_1p_2(F_2 - F_1)(F_1' - F_2')\big]\,dy - 1 \\
&= \frac{2}{\mu}\int y\,\big[p_1F_1F_1' + p_2F_2F_2' + p_1p_2(F_2 - F_1)(F_1' - F_2')\big]\,dy - 1,
\end{aligned}$$

and, because p_iμ_i/μ = θ_i (so that (2/μ)∫ y p_iF_iF_i′ dy = θ_i(G_i + 1)) and θ_1 + θ_2 = 1,

$$G = \theta_1 G_1 + \theta_2 G_2 + \frac{2p_1p_2}{\mu}\int y\,(F_2 - F_1)(F_1' - F_2')\, dy \tag{3}$$

It is easy to see how the above expression can be extended to the multi-component case:

$$\begin{aligned}
G &= \frac{2}{\sum_i p_i\mu_i}\int y\,\Big(\sum_i p_iF_i\Big)\Big(\sum_i p_iF_i'\Big)\,dy - 1 \\
&= \frac{1}{\sum_i p_i\mu_i}\int y\,\Big\{2\sum_i \Big[F_iF_i'\Big(p_i^2 + \sum_{j\ne i} p_ip_j\Big)\Big] - \sum_{i,j} p_ip_j(F_i - F_j)(F_i' - F_j')\Big\}\,dy - 1 \\
&= \frac{1}{\sum_i p_i\mu_i}\int y\,\Big\{2\sum_i p_iF_iF_i' - \sum_{i,j} p_ip_j(F_i - F_j)(F_i' - F_j')\Big\}\,dy - 1
\end{aligned}$$

The above expression can be rewritten as follows:

$$G = \sum_i \theta_i G_i - \sum_{i,j} \frac{p_ip_j}{\mu}\int y\,(F_i - F_j)(F_i' - F_j')\,dy \tag{4}$$

The Gini coefficient can also be expressed through the covariance:

$$G_i = \frac{2}{\mu_i}\,\mathrm{COV}(y, F_i),$$

and the combined Gini coefficient can be written as:

$$G = \sum_i \theta_i\,\frac{2}{\mu_i}\,\mathrm{COV}(y, F_i) - \sum_{i,j}\frac{p_ip_j}{\mu}\,\mathrm{COV}(y, F_i - F_j) \tag{5}$$

or

$$G = \frac{1}{\mu}\Big\{2\sum_i p_i\,\mathrm{COV}(y, F_i) - \sum_{i,j} p_ip_j\,\mathrm{COV}(y, F_i - F_j)\Big\},$$

where COV(y, F_i − F_j) stands for ∫ y (F_i − F_j)(F_i′ − F_j′) dy. The first component stands for the intra-group covariances, whereas the second stands for the inter-group covariance. As we can see from expression (3), the Gini coefficient for the combined distribution consists of two parts: intra-group and inter-group components.
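Expressions (1)-(4) can be checked numerically. The sketch below is a minimal illustration, not the Gini ToolPak procedure. It assumes two hypothetical log-normal groups with arbitrary parameters, and evaluates every integral with the trapezoid rule on a grid uniform in ln y. It then compares the Gini coefficient of the pooled distribution with the within-group term Σθ_iG_i plus the between-group term of expression (4). The decomposition is an algebraic identity in the integrands, so the two sides should agree to rounding error. The group Gini coefficients can also be compared with the log-normal closed form 2Φ(σ/√2) − 1 = erf(σ/2) of Section IV.6.

```python
import math


def ln_cdf(y, m, s):
    """CDF of a log-normal distribution with ln y ~ N(m, s^2)."""
    return 0.5 * (1.0 + math.erf((math.log(y) - m) / (s * math.sqrt(2.0))))


def ln_pdf(y, m, s):
    """Density F'(y) of the same log-normal distribution."""
    z = (math.log(y) - m) / s
    return math.exp(-0.5 * z * z) / (y * s * math.sqrt(2.0 * math.pi))


def integrate(g, ys, ts):
    """Trapezoid rule for the integral of g(y) dy on a grid uniform in t = ln y (dy = y dt)."""
    w = [gv * y for gv, y in zip(g, ys)]
    return sum(0.5 * (w[k] + w[k + 1]) * (ts[k + 1] - ts[k]) for k in range(len(ts) - 1))


def gini_decomposition(groups, n=20001):
    """groups: list of (p_i, m_i, s_i); the population shares p_i must sum to 1.

    Returns (G, within, between, G_i): pooled Gini, sum_i theta_i G_i,
    the between-group term of (4), and the group Gini coefficients.
    """
    lo = min(m - 10.0 * s for _, m, s in groups)
    hi = max(m + 10.0 * s for _, m, s in groups)
    ts = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    ys = [math.exp(t) for t in ts]
    r = range(len(groups))
    p = [g[0] for g in groups]
    F = [[ln_cdf(y, m, s) for y in ys] for _, m, s in groups]
    f = [[ln_pdf(y, m, s) for y in ys] for _, m, s in groups]
    mu_i = [integrate([y * fv for y, fv in zip(ys, f[i])], ys, ts) for i in r]
    mu = sum(p[i] * mu_i[i] for i in r)
    theta = [p[i] * mu_i[i] / mu for i in r]                       # income shares
    # expression (1): G_i = (2/mu_i) * integral of y F_i F_i' dy - 1
    G_i = [2.0 / mu_i[i] * integrate([y * a * b for y, a, b in zip(ys, F[i], f[i])], ys, ts) - 1.0
           for i in r]
    within = sum(theta[i] * G_i[i] for i in r)
    # between-group term of expression (4), over ordered pairs (i, j) with i != j
    between = -sum(
        p[i] * p[j] / mu * integrate(
            [y * (F[i][k] - F[j][k]) * (f[i][k] - f[j][k]) for k, y in enumerate(ys)], ys, ts)
        for i in r for j in r if i != j)
    # Gini coefficient of the pooled distribution sum_i p_i F_i, computed directly
    H = [sum(p[i] * F[i][k] for i in r) for k in range(n)]
    h = [sum(p[i] * f[i][k] for i in r) for k in range(n)]
    G = 2.0 / mu * integrate([y * a * b for y, a, b in zip(ys, H, h)], ys, ts) - 1.0
    return G, within, between, G_i


G, within, between, G_i = gini_decomposition([(0.6, 5.0, 0.4), (0.4, 5.5, 0.5)])
print(f"pooled G = {G:.5f}, within = {within:.5f}, between = {between:.5f}")
```

The between-group term is non-negative. Integrating by parts, each pair contributes p_ip_j/(2μ) times ∫(F_i − F_j)² dy.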
As with the Theil coefficient T1, the individual Gini coefficients are added up with income weights.

IV.2. Decomposition of entropy (Theil) indexes

In his book, H. Theil (1967, Economics and Information Theory, North-Holland, Amsterdam) introduced into income inequality measurement the entropy measure used in thermodynamics and information theory. He suggested using the entropy index in two forms: as income-weighted and population-weighted entropy indexes. In this paper we call them T1 and T2, respectively. These indexes can be represented as follows:

$$T1 = \sum_i \frac{Y_i}{Y}\log\!\left(\frac{Y_i/Y}{N_i/N}\right), \qquad T2 = \sum_i \frac{N_i}{N}\log\!\left(\frac{N_i/N}{Y_i/Y}\right),$$

where Y_i is the income of group i, N_i is the number of people in group i, and Y and N are the corresponding totals. Or, using Lebesgue-Stieltjes integrals:

$$T1 = \int \frac{y}{\mu}\log\!\left(\frac{y}{\mu}\right) dF(y), \qquad T2 = \int \log\!\left(\frac{\mu}{y}\right) dF(y)$$

As can be shown, these indexes are easily decomposable in the multi-group case. For the Theil index T1 of group i we have:

$$T1_i = \sum_j \frac{Y_{ij}}{Y_i}\log\!\left(\frac{Y_{ij}/Y_i}{N_{ij}/N_i}\right),$$

where Y_ij is the income of sub-group j of group i; N_ij is the number of people in sub-group j of group i; Y_i is the income of group i; and N_i is the number of people in group i. Or, using Lebesgue-Stieltjes integrals:

$$T1_i = \int \frac{y}{\mu_i}\log\!\left(\frac{y}{\mu_i}\right) dF_i(y)$$

The Theil index T1 decomposes into:

$$T1 = \sum_i \theta_i\, T1_i + \sum_i \theta_i \log\!\left(\frac{\mu_i}{\mu}\right) = \sum_i \theta_i\, T1_i + \sum_i \theta_i \log\!\left(\frac{\theta_i}{p_i}\right)$$

T2 decomposes in a similar way with the population weights p_i:

$$T2 = \sum_i p_i\, T2_i + \sum_i p_i \log\!\left(\frac{p_i}{\theta_i}\right)$$

As has been shown by F. Bourguignon (1979, Decomposable Income Inequality Measures, Econometrica, Vol. 47, No. 4) and A.F. Shorrocks (1980, Inequality Measures, Econometrica, Vol. 48, No. 3), the Theil indexes are the only income-weighted and population-weighted indexes, respectively, that can be decomposed in this way. That is, they decompose into a weighted sum of the individual Theil indexes plus the Theil index constructed from the individual distributions as if they were elements of the combined distribution.
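For grouped individual incomes, both decompositions can be verified directly. The sketch below treats each income as one person and computes T1 and T2 for the pooled data from the discrete formulas. It compares them with the within-group terms Σθ_iT1_i and Σp_iT2_i, plus the between-group terms Σθ_i log(θ_i/p_i) and Σp_i log(p_i/θ_i). The incomes are made-up numbers for illustration, and the function names are hypothetical.

```python
import math


def theil_t1(incomes):
    """T1 = sum_k (y_k/Y) log((y_k/Y) / (1/N)): income-weighted entropy index."""
    n, total = len(incomes), sum(incomes)
    return sum((y / total) * math.log(n * y / total) for y in incomes)


def theil_t2(incomes):
    """T2 = sum_k (1/N) log((1/N) / (y_k/Y)): population-weighted entropy index."""
    n, total = len(incomes), sum(incomes)
    return sum(math.log(total / (n * y)) for y in incomes) / n


def theil_decomposition(groups):
    """Split T1 and T2 of the pooled incomes into within- and between-group parts."""
    pooled = [y for g in groups for y in g]
    n, total = len(pooled), sum(pooled)
    p = [len(g) / n for g in groups]            # population shares p_i
    theta = [sum(g) / total for g in groups]    # income shares theta_i
    return {
        "T1": theil_t1(pooled),
        "T1_within": sum(t * theil_t1(g) for t, g in zip(theta, groups)),
        "T1_between": sum(t * math.log(t / q) for t, q in zip(theta, p)),
        "T2": theil_t2(pooled),
        "T2_within": sum(q * theil_t2(g) for q, g in zip(p, groups)),
        "T2_between": sum(q * math.log(q / t) for t, q in zip(theta, p)),
    }


d = theil_decomposition([[100.0, 120.0, 150.0, 200.0], [60.0, 80.0, 90.0]])
print({k: round(v, 5) for k, v in d.items()})
```

In both cases the pooled index equals the within-group term plus the between-group term. The between-group terms are the Theil indexes of a distribution in which every member of group i receives the group mean μ_i.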
In this sense, the decomposition of the Theil indexes differs from that of the Gini coefficient.

IV.3. Decomposition of normalized variance

Normalized variance can be seen as a simple way of describing income inequalities. It decomposes as follows:

$$s^2\!\left(\frac{y}{\mu}\right) = \sum_{i,j} p_ip_j\,\mathrm{COV}\!\left(\frac{y_i}{\mu}, \frac{y_j}{\mu}\right) = \sum_{i,j}\theta_i\theta_j\,\mathrm{COV}\!\left(\frac{y_i}{\mu_i}, \frac{y_j}{\mu_j}\right)$$

Or,

$$s^2\!\left(\frac{y}{\mu}\right) = \sum_k \theta_k^2\, s^2\!\left(\frac{y_k}{\mu_k}\right) + \sum_{i\ne j}\theta_i\theta_j\,\mathrm{COV}\!\left(\frac{y_i}{\mu_i}, \frac{y_j}{\mu_j}\right)$$

IV.4. Decomposition of the decile ratio

The decile ratio is a simple and transparent inequality measure; however, it cannot be meaningfully decomposed.

IV.5. Lorenz curve

The Lorenz function L gives the cumulative income share as a function of the cumulative population share. The Lorenz curve associated with this function is plotted in Figure 3. The Lorenz curve plays a central role in income distribution analysis. Some important relationships between the Lorenz curve and the cumulative distribution function, as well as a graphical representation of the Theil index, are shown below.

[Figure 4. First derivative of the Lorenz curve, L′(F) = y/μ, plotted against F ∈ [0, 1]]

Figure 4 shows the first derivative of the Lorenz curve, L′(F). It is easy to see that L′(F) is simply the normalized income y/μ, and thus the (normalized) inverse of the cumulative distribution function. The graph is also related to the Theil index T2. The logarithm of this graph is a graphical representation of the index, because the index can be written as T2 = ∫ log(μ/y) dF(y) = −∫_0^1 log L′(F) dF.

[Figure 5. Graphical representation of the Theil index (T2): log L′(F) = log(y/μ) plotted against F ∈ [0, 1]]

The second derivative of the Lorenz curve is also an important characteristic of a distribution: L″(F) = y′_F/μ = 1/(μF′(y)). It is essentially the reciprocal of the distribution density function F′(y), scaled by 1/μ.

[Figure 6. Second derivative of the Lorenz curve, L″(F), plotted against F ∈ [0, 1]]

IV.6. Some properties of the log-normal distribution

The log-normal distribution plays an important role in inequality measurement.
It is thought that real distributions of wealth and income can be approximated by it, at least partially. An extensive treatment of the log-normal distribution is contained in J. Aitchison and J. Brown (1957, The Lognormal Distribution, Cambridge University Press). Here we mention just a few relevant properties. Let ln y ~ N(m, σ²), and let μ denote mean income, as before. Then:

$$F'(y) = \frac{1}{\sigma y\sqrt{2\pi}}\, e^{-(\ln y - m)^2/(2\sigma^2)}$$

$$\mu = e^{m + \sigma^2/2}, \qquad \text{Median} = e^{m}, \qquad \text{Mode} = e^{m - \sigma^2}$$

$$S^2 = e^{2m}e^{\sigma^2}\left(e^{\sigma^2} - 1\right), \qquad s^2 = \frac{S^2}{\mu^2} = e^{\sigma^2} - 1$$

$$E(z^k) = e^{k m_z + k^2\sigma_z^2/2}, \quad \text{where } \ln z \sim N(m_z, \sigma_z^2) \ (\text{e.g., } z = y \text{ or } z = 1/y)$$

A convenient feature of the log-normal distribution is the simplicity of its Gini and Theil indexes. With t = ln y and dL = (y/μ)dF:

$$\begin{aligned}
T1 &= \int (\ln y - \ln\mu)\, dL = \frac{1}{\mu}\int (\ln y - \ln\mu)\, y\, dF(y) \\
&= \frac{1}{\sigma\sqrt{2\pi}}\int \Big(t - m - \frac{\sigma^2}{2}\Big)\, e^{\,t - m - \sigma^2/2}\, e^{-(t - m)^2/(2\sigma^2)}\, dt \\
&= \frac{1}{\sigma\sqrt{2\pi}}\int \Big(t - m - \frac{\sigma^2}{2}\Big)\, e^{-(t - (m + \sigma^2))^2/(2\sigma^2)}\, dt \\
&= (m + \sigma^2) - \Big(m + \frac{\sigma^2}{2}\Big) = \frac{\sigma^2}{2}
\end{aligned}$$

For the second Theil index, we obtain:

$$T2 = \int (\ln\mu - \ln y)\, dF = \frac{1}{\sigma\sqrt{2\pi}}\int \Big(m + \frac{\sigma^2}{2} - t\Big)\, e^{-(t - m)^2/(2\sigma^2)}\, dt = \frac{\sigma^2}{2}$$

We can use the test T1 = T2 to examine how closely a given distribution approaches a log-normal one. The relationship of the Theil measures to the normalized variance can be expressed as follows:

$$T1 = T2 = \frac{\sigma^2}{2} = \frac{1}{2}\ln(1 + s^2)$$

In the log-normal case, we can also think of the Theil indexes as the difference between the logarithms of the mean and the median:

$$T1 = T2 = \frac{\sigma^2}{2} = \log\!\left(\frac{\mu}{\text{Median}}\right)$$

Finally, as can be easily seen, the Gini coefficient of the log-normal distribution can be written as follows:

$$G = 2\Phi\!\left(\frac{\sigma}{\sqrt{2}}\right) - 1 = 2\Lambda\!\left(e^{\sigma/\sqrt{2}}\right) - 1,$$

where Φ(·) is the standard normal cumulative distribution and Λ(·) is the cumulative distribution of the standard log-normal LN(0, 1).

V. Two alternative representations of the Gini coefficient

Apart from the traditional visualization of the Gini index using the Lorenz curve, it is possible to represent the Gini coefficient using a simple graph of the distribution function. Two such representations are discussed below.

1.
Let us start from the following expression for the Gini coefficient:

$$G = \frac{2}{\mu}\int y\left(F - \frac{1}{2}\right) dF \tag{6}$$

As is easy to see, expression (6) can be written as:

$$G = \frac{2}{\mu}\int (y - \mu)\, F\, dF \tag{7}$$

Expressions (6) and (7) are equivalent to:

$$G = \frac{2}{\mu}\,\mathrm{Cov}(y, F) \tag{8}$$

We can rewrite Expression (8) using the slope coefficient³ as follows:

$$G = \frac{2}{\mu}\int (y - \mu)\left(F - \frac{1}{2}\right) dF = \frac{2}{\mu}\,\frac{\int (y - \mu)(F - \frac{1}{2})\, dF}{\int (F - \frac{1}{2})^2\, dF}\int \left(F - \frac{1}{2}\right)^2 dF = \frac{2}{\mu}\cdot\frac{1}{12}\,\mathrm{Slope}(F, y),$$

because

$$\int_0^1 \left(F - \frac{1}{2}\right)^2 dF = \left[\frac{F^3}{3} - \frac{F^2}{2} + \frac{F}{4}\right]_0^1 = \frac{1}{12}. \tag{9}$$

Or, finally,

$$G = \frac{1}{6}\,\mathrm{Slope}(F, \tilde y), \quad \text{where } \tilde y = \frac{y}{\mu} \tag{10}$$

Expression (10) can also be obtained from Expression (8) in a different way. Let us start by rewriting Expression (8) using the correlation coefficient⁴:

$$G = \frac{2}{\mu}\,\mathrm{Cov}(y, F) = \frac{2}{\mu}\,\rho(y, F)\,\sigma_y\,\sigma_F = \frac{1}{\sqrt{3}}\,\frac{\sigma_y}{\mu}\,\rho(y, F), \tag{11}$$

because σ_F² = 1/12 [see Expression (9)]. Now, using ρ(y, F) = Slope(F, y) σ_F/σ_y, we obtain Expression (10) again:

$$G = \frac{1}{\sqrt{3}}\,\frac{\sigma_y}{\mu}\,\mathrm{Slope}(F, y)\,\frac{\sigma_F}{\sigma_y} = \frac{1}{\sqrt{3}}\cdot\frac{1}{\sqrt{12}}\,\frac{\mathrm{Slope}(F, y)}{\mu} = \frac{1}{6}\,\mathrm{Slope}(F, \tilde y)$$

[Figure 7. Graphical representation of the Gini coefficient as one sixth of the slope coefficient between normalized income ỹ and the distribution function F: the regression line of ỹ on F passes through (1/2, 1) and has Slope(F, ỹ) = 6 × Gini]

³ Slope(x, y) = Σ_i ω_i (x_i − E(x))(y_i − E(y)) / Σ_i ω_i (x_i − E(x))², where ω_i are weights, or, in the continuous case, Slope(x, y) = ∫(x − E(x))(y − E(y)) dF / ∫(x − E(x))² dF.

⁴ The discrete case of using correlation coefficients to express the Gini coefficient [Expression (11)] was shown in Milanovic (1996).

2. The next representation of the Gini coefficient can be obtained using Expression (7):

$$G = \frac{2}{\mu}\int (y - \mu)\, F\, dF = \frac{2}{\mu}\int y\, F\, dF - 1 = \frac{1}{\mu}\int y\, dF^2 - 1 = \frac{\mu(F^2)}{\mu} - 1, \tag{12}$$

where μ(F²) = ∫ y dF², i.e., the mean of the squared distribution F². Or, equivalently, with ỹ = y/μ:

$$G = \int \tilde y\, dF^2 - \int \tilde y\, dF = \int F\, d\tilde y - \int F^2\, d\tilde y \tag{13}$$

It is easy to see that F² has all the properties of a regular distribution function: it is a monotonic transformation of F and, hence, is itself a monotonically increasing function bounded by [0, 1].
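Expressions (8), (10) and (13) can be compared numerically on a test distribution. The sketch below assumes the log-normal LN(5, 0.25) of the "good" case in Section III, i.e., ln y ~ N(5, 0.25²). It evaluates the integrals with the trapezoid rule on a grid uniform in ln y; the variable names are illustrative. All three values should agree with each other and with the closed form 2Φ(σ/√2) − 1 = erf(σ/2), which is about 0.14032, the "actual" Gini value reported for that case.

```python
import math

M, S = 5.0, 0.25        # assumed test case: ln y ~ N(5, 0.25^2), the "good" case of Section III
N = 20001               # grid points, uniform in t = ln y over +/- 10 standard deviations

ts = [M - 10.0 * S + 20.0 * S * k / (N - 1) for k in range(N)]
ys = [math.exp(t) for t in ts]
Fs = [0.5 * (1.0 + math.erf((t - M) / (S * math.sqrt(2.0)))) for t in ts]          # F(y)
fs = [math.exp(-0.5 * ((t - M) / S) ** 2) / (y * S * math.sqrt(2.0 * math.pi))      # F'(y)
      for t, y in zip(ts, ys)]


def integral_dy(g):
    """Trapezoid rule for the integral of g(y) dy, using dy = y dt."""
    w = [gv * y for gv, y in zip(g, ys)]
    return sum(0.5 * (w[k] + w[k + 1]) * (ts[k + 1] - ts[k]) for k in range(N - 1))


def expect(g):
    """E[g] = integral of g dF = integral of g(y) F'(y) dy."""
    return integral_dy([gv * fv for gv, fv in zip(g, fs)])


mu = expect(ys)
cov_yF = expect([y * F for y, F in zip(ys, Fs)]) - 0.5 * mu      # Cov(y, F), using E[F] = 1/2
var_F = expect([(F - 0.5) ** 2 for F in Fs])                      # about 1/12, as in (9)

gini_cov = 2.0 * cov_yF / mu                                      # expression (8)
gini_slope = (cov_yF / mu) / var_F / 6.0                          # expression (10): Slope(F, y/mu) / 6
gini_area = integral_dy([(F - F * F) / mu for F in Fs])           # expression (13): area between F and F^2
gini_closed = math.erf(S / 2.0)                                   # 2 Phi(S / sqrt 2) - 1, Section IV.6

print(f"{gini_cov:.5f} {gini_slope:.5f} {gini_area:.5f} {gini_closed:.5f}")
```

Because the three expressions are exact identities, any disagreement beyond quadrature error would point to a mistake in one of them. That makes this a useful check when the formulas are transcribed into other software.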
Expression (12) essentially says that the Gini coefficient equals the difference between the mean of the squared distribution F² and the regular mean, normalized by the regular mean. The expression is presented in graphical form in Figure 8. When income is normalized by the mean, the Gini coefficient is equal to the area between the distribution function F and the squared distribution function F².

[Figure 8. Graphical representation of the Gini coefficient as the area between the distribution function F and the squared distribution function F², plotted against normalized income ỹ = y/μ]

References

Aitchison, J. and J. Brown, 1957, The Lognormal Distribution, Cambridge University Press, Cambridge.
Bourguignon, F., 1979, Decomposable Income Inequality Measures, Econometrica, Vol. 47, No. 4.
Shorrocks, A.F., 1980, Inequality Measures, Econometrica, Vol. 48, No. 3.
Theil, H., 1967, Economics and Information Theory, North-Holland, Amsterdam.
Theil, H., 1989, Development of International Inequality, Journal of Econometrics, Vol. 42, No. 1, North-Holland, Amsterdam.

ANNEX

[Figure A-1. Inequality in the former Soviet Union, 1990 (various indexes by absolute value, logarithmic scale): Gini coefficient, variance, Theil index, Theil 2 index and decile ratio for the USSR and each republic]

[Figure A-2. Correlation between various inequality measures in the former Soviet Union, 1990 (inequality indexes normalized by standard deviation): Gini coefficient, variance, Theil index, Theil 2 index and decile ratio for the USSR and each republic]
Table A-1. Inequality indexes for the former Soviet Union, 1990

Characteristics  Gini     Variance  Theil    Theil 2  Decile  Mean
                 coeff.             index    index    ratio   income
USSR             0.2599   0.4760    0.1109   0.1144   5.65    170.6
Russia           0.2407   0.4414    0.0946   0.0946   4.64    186.0
Ukraine          0.2155   0.4003    0.0747   0.0744   3.88    173.7
Belarus          0.2150   0.3970    0.0748   0.0749   3.89    188.5
Estonia          0.2294   0.4209    0.0856   0.0858   4.37    234.7
Latvia           0.2313   0.4303    0.0871   0.0888   4.39    217.3
Lithuania        0.2272   0.4229    0.0839   0.0831   4.24    210.4
Moldova          0.2393   0.4458    0.0935   0.0928   4.58    161.4
Armenia          0.2431   0.4524    0.0989   0.0959   4.85    167.0
Azerbaijan       0.3017   0.5780    0.1525   0.1489   7.12    116.8
Georgia          0.2583   0.4794    0.1128   0.1072   5.42    172.2
Kazakhstan       0.2646   0.4948    0.1194   0.1134   5.79    155.0
Uzbekistan       0.2777   0.5323    0.1298   0.1266   6.27    103.3
Kyrgyzstan       0.2725   0.5153    0.1268   0.1212   6.23    116.1
Tajikistan       0.2753   0.5372    0.1260   0.1260   6.05     91.1
Turkmenistan     0.2768   0.5265    0.1302   0.1254   6.35    113.4
Share of inter-group variance:
                 7.7%     12.9%     14.3%    15.8%    N/A

Table A-2. Income shares by decile, former Soviet Union, 1990 (percent)

                 D1     D2     D3     D4     D5     D6      D7      D8      D9      D10
USSR             3.58   5.44   6.59   7.62   8.65   9.74   10.98   12.50   14.64   20.25
Russia           4.27   5.79   6.81   7.74   8.67   9.68   10.82   12.22   14.19   19.83
Ukraine          4.71   6.16   7.13   7.99   8.86   9.80   10.86   12.17   14.04   18.29
Belarus          4.76   6.20   7.15   8.01   8.87   9.78   10.81   12.07   13.84   18.51
Estonia          4.48   5.99   6.99   7.88   8.77   9.72   10.77   12.01   13.84   19.55
Latvia           4.51   5.99   6.97   7.86   8.73   9.66   10.71   11.99   13.77   19.82
Lithuania        4.48   5.97   6.98   7.89   8.80   9.77   10.87   12.20   14.04   19.00
Moldova          4.28   5.81   6.82   7.75   8.67   9.66   10.81   12.25   14.35   19.60
Armenia          4.04   5.80   6.81   7.73   8.67   9.68   10.87   12.34   14.43   19.63
Azerbaijan       3.20   4.91   5.96   6.96   8.05   9.28   10.74   12.62   15.45   22.82
Georgia          3.70   5.50   6.56   7.56   8.57   9.68   10.96   12.57   14.86   20.05
Kazakhstan       3.54   5.46   6.51   7.50   8.52   9.63   10.92   12.55   14.90   20.48
Uzbekistan       3.45   5.30   6.38   7.35   8.35   9.44   10.74   12.43   14.95   21.62
Kyrgyzstan       3.38   5.31   6.47   7.47   8.46   9.56   10.85   12.49   14.93   21.06
Tajikistan       3.62   5.46   6.47   7.34   8.23   9.26   10.56   12.28   14.85   21.92
Turkmenistan     3.37   5.27   6.42   7.41   8.39   9.48   10.78   12.46   14.99   21.43

Table A-3. Correlation coefficients between various inequality measures

                  Gini     Variance  Theil    Theil 2  Decile
                  coeff.             index    index    ratio
Gini coefficient  1
Variance          0.99512  1
Theil index       0.99873  0.993     1
Theil 2 index     0.99791  0.9969    0.9964   1
Decile ratio      0.99517  0.9908    0.998    0.9932   1

[Figure A-3. Income distribution density, former Soviet Union, 1990 (horizontal axis: rubles; one curve for the USSR and each republic)]
[Figure A-4. Histogram of income distribution, former Soviet Union, 1990 (population shares in logarithmically spaced income bins from 30 to 472; one series for the USSR and each republic)]

[Figure A-5. Graphical representation of the Theil index for the combined distribution and the between-group Theil index, former Soviet Union, 1990]