PQLI Dataset Codebook Erlend Garåsen

PQLI Dataset
Codebook
Version 1.0, February 2006
Erlend Garåsen
Department of Sociology and Political Science
Norwegian University of Science and Technology
Table of Contents
1. Introduction
1.1 Files
1.2 Format
2. Methodology
References
Appendix
1. Introduction
This codebook describes the PQLI (Physical Quality of Life Index) dataset, which covers 139
countries with more than one million inhabitants for the years 1975 to 2000. It was originally
constructed for my Master’s Thesis in Political Science: Democracy and Development: A
Comparative Analysis in Time and Space. If you use this dataset, please cite it as follows:
Garåsen, E. (2006). Democracy and Development: A Comparative Analysis in Time and Space.
Master’s Thesis, Department of Sociology and Political Science, Norwegian University of
Science and Technology.
1.1 Files
• codebook.pdf – this document; describes the methodology, content and format of the different datasets
• Pqli_75_00.dta – PQLI data for 139 countries in Stata format
• Pqli_75_00.sav – PQLI data for 139 countries in SPSS format
• Pqli_75_00.txt – PQLI data for 139 countries as a tab-delimited text file
• Pqli_75_00.xls – PQLI data for 139 countries in Excel format
1.2 Format
The dataset contains three fields: YEAR, PQLI and COW. Years range from 1975 to 2000,
subject to data availability for each country. Countries are coded using the Correlates of
War (COW) country codes. See the Appendix for a complete list of the COW codes and the
corresponding countries.
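As an illustration of the format described above, the following is a minimal sketch of how the tab-delimited file could be read. It assumes that the file has a header row naming the three fields YEAR, PQLI and COW and that missing PQLI values appear as empty cells; neither detail is stated in this codebook. The function name `read_pqli` and the sample rows are hypothetical.

```python
import csv
import io
from typing import List


def read_pqli(handle) -> List[dict]:
    """Parse tab-delimited rows with YEAR, PQLI and COW columns.

    Assumption: a header row is present and an empty PQLI cell means
    "missing". Adjust if the real file is laid out differently.
    """
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        pqli_text = (rec["PQLI"] or "").strip()
        rows.append({
            "cow": int(rec["COW"]),
            "year": int(rec["YEAR"]),
            "pqli": float(pqli_text) if pqli_text else None,
        })
    return rows


# Invented sample rows, used only to show the call; not real PQLI values.
sample = io.StringIO("COW\tYEAR\tPQLI\n700\t1975\t12.5\n700\t1976\t\n")
records = read_pqli(sample)
```

For the real file, `open("Pqli_75_00.txt", newline="")` would replace the in-memory sample.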
2. Methodology
In constructing a PQLI time-series dataset, I tried, as far as possible, to obtain complete
data for all countries, with no missing values for any year included. I also relied on a
single main source and used additional sources only where missing data were a serious
problem. Data for life expectancy, infant mortality and literacy are mainly collected from
the World Bank’s World Development Indicators. Some adult literacy data that were missing
from the World Bank are taken from UNESCO’s Statistical Yearbook, various years; UNDP’s
Human Development Report, various years; UNICEF’s The State of the World’s Children,
various years; and Kurian’s (1979) The Book of World Rankings. Missing data are mainly a
problem for very poor countries (especially in early years), for countries with fewer than
one million inhabitants, and for closed societies such as North Korea. Since missing data
were a particular problem for the years prior to 1975, only the years from 1975 to 2000 are
included, giving 26 years in the time-series dataset. Also, only countries with more than
one million inhabitants in all time periods were included. The resulting dataset covers 139
countries and contains 3,540 observations.
There was a problem with the life expectancy data collected from the World Bank.
Morris (1979) uses life expectancy at age 1, whereas the World Bank publishes data of life
expectancy at birth, so life expectancy at age 1 had to be constructed from the data for life
expectancy at birth and infant mortality using the following formula:
LE(1) = [LE(0) − (IM × AVG)] ÷ (1 − IM)        (1)
where LE(1) is life expectancy at age 1, LE(0) is life expectancy at birth, IM is the infant
mortality rate per thousand live births, entered in the formula as a proportion (for example
0.065 for 65 per thousand), and AVG is the average time lived, in years, by infants who die in
their first year of life [1]. LE(1) gives more weight to the mortality rate of infants under
one year of age relative to the mortality rates of other age groups.
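Formula (1) can be sketched in a few lines of code. The sketch below assumes that IM is entered as a proportion of live births, as in the Albania example in footnote 1, and that AVG is supplied by the caller rather than derived. The function name and the life expectancy of 70 years are hypothetical and serve only to show the arithmetic.

```python
def life_expectancy_at_age_one(le_birth: float, im: float, avg: float) -> float:
    """Apply formula (1): LE(1) = [LE(0) - (IM * AVG)] / (1 - IM).

    le_birth: life expectancy at birth, LE(0), in years
    im:       infant mortality as a proportion (65 per thousand -> 0.065)
    avg:      average years lived by infants who die in their first year
    """
    if not 0.0 <= im < 1.0:
        raise ValueError("im must be a proportion in [0, 1)")
    return (le_birth - im * avg) / (1.0 - im)


# Hypothetical input: LE(0) = 70 years, with the IM and AVG values that
# footnote 1 reports for Albania in 1970.
example = life_expectancy_at_age_one(70.0, 0.065, 0.35)
```

With these inputs LE(1) comes out at roughly 74.8 years, higher than LE(0), because the deaths of infants in their first year are removed from the average.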
[1] This formula is the same as the one used by Van der Lijn (1995). AVG needs further
explanation since it is a complicated measure. The AVG value is 0.25 for countries with an
infant mortality rate of 100 per thousand live births or more, 0.5 for countries with an
infant mortality rate of 10 per thousand live births or less, and between 0.25 and 0.5 for
countries between these rates. For example, Albania had an AVG value of 0.35 in 1970 with an
infant mortality rate of 0.065 (65 per thousand), which rose to 0.45 in 2000 with an infant
mortality rate of 0.020 (20 per thousand).

Morris defines literacy as the share of the population aged fifteen and over who are able
to read and write. This definition may not be suitable for all countries, since some may have
defined literacy differently. The World Bank publishes data for illiteracy, the percentage of
people aged fifteen and above who are illiterate, so the data for adult literacy have been
obtained by subtracting the illiteracy rate from 100 per cent. There are large gaps in the
literacy data because such data are not collected annually; the rate is assumed not to change
significantly from one year to the next. This is also why UNESCO collects such data at
five-year intervals. Where such gaps existed, linear interpolation was used to obtain data
between known literacy values [2]. If the earliest known value was for, say, 1980, all years
prior to 1980 were coded as missing. Likewise, linear interpolation was not used where I did
not have a value for the latest years; for example, the years 1999 and 2000 are missing for
Somalia. There is one exception to this rule: for Guinea, linear interpolation was used to
obtain data from 1975 to 1990 by going back to the literacy rate for 1965. Estimating missing
data over such a long time span may result in imprecise rates, and it was done only in a few
cases. Literacy rates for North Korea should also be read with care, since linear
interpolation was used to estimate values between 1977 and 2000. Another problem was
estimating missing data for the many OECD countries for which UNESCO has no recorded data. A
method similar to the one used by UNDP when constructing the HDI was applied, in which all
OECD countries were given a literacy rate of 99 per cent. Where data were actually available
for such countries, these values were used, except for countries with a literacy rate above
99 per cent, such as Tajikistan for the years 1998 to 2000. Since it is assumed that a
literacy rate of 100 per cent cannot be obtained, 99 per cent is the highest available rate.
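The literacy rules in the paragraph above can be summarised as a small function. This is only a sketch of the stated rules, not the procedure actually used to build the dataset; the function name, the `is_oecd` flag and the use of `None` for missing values are assumptions made for illustration.

```python
from typing import Optional

LITERACY_CAP = 99.0  # highest rate allowed, since 100 per cent is assumed unattainable


def literacy_rate(illiteracy_pct: Optional[float], is_oecd: bool) -> Optional[float]:
    """Derive adult literacy (per cent) from an illiteracy rate (per cent).

    - recorded value: literacy = 100 - illiteracy, capped at 99
    - no recorded value, OECD country: assigned 99
    - no recorded value, other country: left missing (None)
    """
    if illiteracy_pct is not None:
        return min(100.0 - illiteracy_pct, LITERACY_CAP)
    if is_oecd:
        return LITERACY_CAP
    return None
```

Gaps between recorded values for non-OECD countries are handled separately by linear interpolation, which this function does not attempt.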
For the infant mortality rate, a few cases had missing values. The longest run of missing
values was four years, so linear interpolation was used to estimate the values for these
cases. In addition, four values for the Central African Republic, for the years 1983 to 1986,
were deleted and estimated by linear interpolation instead, since the recorded values were
zero, probably due to errors in the World Bank data.
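The linear interpolation used for the literacy and infant mortality series, including the rule that leading and trailing gaps stay missing, can be sketched as follows. The function name and the use of `None` for missing values are assumptions. The step is computed as the later known value minus the earlier one, divided by the number of missing values plus one, so that falling series (such as infant mortality) are handled as well as rising ones.

```python
from typing import List, Optional


def interpolate_gaps(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill runs of None that lie between two known values.

    Values before the first or after the last known observation are
    left as None, i.e. coded as missing rather than extrapolated.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        n_missing = right - left - 1
        if n_missing == 0:
            continue
        # Constant change per step between the two known values.
        step = (series[right] - series[left]) / (n_missing + 1)
        for k in range(1, n_missing + 1):
            filled[left + k] = series[left] + k * step
    return filled


# Two missing values between 3 and 4: step = (4 - 3) / (2 + 1).
worked = interpolate_gaps([3.0, None, None, 4.0])
# Leading and trailing gaps are not filled.
edges = interpolate_gaps([None, 3.0, None, 5.0, None])
```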
[2] Linear interpolation is a method for estimating missing data. To be able to estimate such
data, one must assume that a phenomenon changes over time. The rate of change should, ideally,
be constant, and one needs two time points in order to obtain the missing values. For the
literacy rates, it was assumed that the rate of change was constant across the missing
values, and the following formula was applied to obtain the slope of the line:
y = (xH − xL) / (n + 1), where y is the slope, xH is the higher and xL the lower of the two
known values on either side of the gap, and n is the number of missing values between xH and
xL. For example, to find the two values between 3 and 4, one first finds the slope, or the
change between successive values: y = (4 − 3) / (2 + 1) = 0.33. The next value after 3 is
then 3 + 0.33 = 3.33, and the one after that is 3.67.

These three indicators were then converted to indices ranging from 0 to 100, where 0
represents the worst and 100 the best performance. The index for life expectancy at age 1
was calculated using the highest and lowest recorded values in my dataset, 81 and 40
respectively [3]. The index for infant mortality was calculated in a similar way, using the
highest and lowest recorded values in my dataset, 263 and 2.9 respectively [4]. The literacy
indicator was not rescaled. The three indices were then combined into a composite indicator
by averaging them, giving each equal weight, to obtain the PQLI values. Note that PQLI values
have been calculated for some countries for years prior to their independence, such as
Estonia from 1975 to 1991. These data exist because the sources record values for these
countries in those years; no attempt was made to estimate them. As these values contribute to
a more balanced time-series dataset with fewer missing values, they are kept.
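The index construction described above can be sketched as follows. The bounds 81, 40, 263 and 2.9 are taken from the text. The scaling factor, (best − worst) / 100, is an assumption modelled on the Morris formulas quoted in the footnotes; the codebook says only that the infant mortality index was calculated "similarly", so the exact factors used for the dataset may differ. All function names are hypothetical.

```python
LE1_BEST, LE1_WORST = 81.0, 40.0   # life expectancy at age 1, years
IM_WORST, IM_BEST = 263.0, 2.9     # infant mortality per thousand live births


def life_expectancy_index(le1: float) -> float:
    """Rescale LE(1) so that 40 years maps to 0 and 81 years to 100."""
    return (le1 - LE1_WORST) / ((LE1_BEST - LE1_WORST) / 100.0)


def infant_mortality_index(im_per_thousand: float) -> float:
    """Rescale infant mortality so that 263 maps to 0 and 2.9 to 100."""
    return (IM_WORST - im_per_thousand) / ((IM_WORST - IM_BEST) / 100.0)


def pqli(le1: float, im_per_thousand: float, literacy_pct: float) -> float:
    """Equal-weight average of the three indices; literacy is not rescaled."""
    return (
        life_expectancy_index(le1)
        + infant_mortality_index(im_per_thousand)
        + literacy_pct
    ) / 3.0


best_case = pqli(81.0, 2.9, 99.0)   # best recorded values and the literacy cap
worst_case = pqli(40.0, 263.0, 0.0)
```

Because literacy is capped at 99 per cent, the highest PQLI attainable under these assumptions is slightly below 100.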
[3] This differs somewhat from Morris’ (1979) methodology. Morris assumes that large
improvements in life expectancy will only occur if there is a breakthrough in the study of
geriatrics, and he uses 77 years, two years above the current best, as the upper limit and 38
years as the lower limit (Vietnam in 1950). I do not go beyond my time-series dataset to find
lower or higher values. Japan has the highest recorded value for life expectancy at age 1 in
2000, and Rwanda has the lowest value in 1992. The formula applied by Morris for the 0 to 100
index is:
(Life expectancy at age one − 38) / 0.39
where 38 is the worst recorded life expectancy value and 0.39 is a factor calculated by
subtracting the lowest recorded value from the highest and dividing by 100, so a change in
life expectancy of 0.39 years results in a one-point change in the index (ibid., pp. 45f).
[4] Since the highest value for the infant mortality rate represents the worst performance
and the lowest value the best, the formula is somewhat different:
(229 − infant mortality rate per thousand) / 2.22
where 229 is the worst recorded value Morris uses and 2.22 is a factor calculated by
subtracting 7 (the best recorded value Morris uses minus one) from the worst value and
dividing by 100 (ibid., pp. 43ff). I use 263 and 2.9 as the worst and best values (Cambodia
in 1977 and Singapore in 2000, respectively).
References
Garåsen, E. (2006). Democracy and Development: A Comparative Analysis in Time and Space. Master’s Thesis, Department of Sociology and Political Science, Norwegian University of Science and Technology.
Kurian, G.T. (1979). The Book of World Rankings. London: The Macmillan Press Ltd.
Morris, M.D. (1979). Measuring the Condition of the World’s Poor: The Physical Quality of Life Index. New York: Pergamon Press Inc.
UNESCO (1980). Statistical Yearbook 1980. London: UNESCO.
UNESCO (1984). Statistical Yearbook 1984. Paris: The Unesco Press.
UNESCO (1991). Statistical Yearbook 1991. Paris: The Unesco Press.
UNESCO (1993). Statistical Yearbook 1990/91. New York: Department of Economic and Social Information and Policy Analysis, Statistical Division.
UNESCO (1994). Statistical Yearbook 1994. Paris: The Unesco Press.
UNESCO (1995). Statistical Yearbook 1995. Paris: UNESCO Publishing & Bernan Press.
UNDP (1999). Human Development Report 1999. New York: Oxford University Press.
UNDP (2000). Human Development Report 2000. New York: Oxford University Press.
UNDP (2002). Human Development Report 2002. New York: Oxford University Press.
UNICEF (1996). The State of the World’s Children 1996. New York: UNICEF.
UNICEF (2000). The State of the World’s Children 2000. New York: UNICEF.
UNICEF (2001). The State of the World’s Children 2001. New York: UNICEF.
UNICEF (2005). The State of the World’s Children 2005. New York: UNICEF.
Van der Lijn, N. (1995). Measuring well-being with social indicators, HDI, PQLI, and BWI for 133 countries for 1975, 1980, 1985, 1988, and 1992. Tilburg University, Faculty of Economics and Business Administration Research Memorandum, No. 704.
World Bank (2002). World Development Indicators. New York: United Nations.
Appendix
COW   Country
700   Afghanistan
339   Albania
615   Algeria
540   Angola
160   Argentina
371   Armenia
900   Australia
305   Austria
373   Azerbaijan
771   Bangladesh
370   Belarus
211   Belgium
434   Benin
145   Bolivia
346   Bosnia
571   Botswana
140   Brazil
355   Bulgaria
439   Burkina Faso
516   Burundi
811   Cambodia
471   Cameroon
20    Canada
482   Central African Republic
483   Chad
155   Chile
710   China
100   Colombia
484   Congo, Republic of the
490   Congo, Democratic Republic of the
94    Costa Rica
437   Cote d’Ivoire
344   Croatia
40    Cuba
316   Czech Republic
390   Denmark
42    Dominican Republic
130   Ecuador
651   Egypt
92    El Salvador
531   Eritrea
366   Estonia
530   Ethiopia
375   Finland
220   France
372   Georgia
255   Germany
452   Ghana
350   Greece
90    Guatemala
438   Guinea
41    Haiti
91    Honduras
310   Hungary
750   India
850   Indonesia
630   Iran
645   Iraq
205   Ireland
666   Israel
325   Italy
51    Jamaica
740   Japan
663   Jordan
705   Kazakhstan
501   Kenya
731   Korea, North
732   Korea, South
703   Kyrgyzstan
812   Laos
367   Latvia
660   Lebanon
570   Lesotho
450   Liberia
620   Libya
368   Lithuania
343   Macedonia
580   Madagascar
553   Malawi
820   Malaysia
432   Mali
435   Mauritania
70    Mexico
359   Moldova
712   Mongolia
600   Morocco
541   Mozambique
775   Myanmar
790   Nepal
210   Netherlands
920   New Zealand
93    Nicaragua
436   Niger
475   Nigeria
385   Norway
770   Pakistan
95    Panama
910   Papua New Guinea
150   Paraguay
135   Peru
840   Philippines
290   Poland
235   Portugal
360   Romania
365   Russia
517   Rwanda
670   Saudi Arabia
433   Senegal
451   Sierra Leone
830   Singapore
317   Slovakia
349   Slovenia
520   Somalia
560   South Africa
230   Spain
780   Sri Lanka
625   Sudan
380   Sweden
225   Switzerland
652   Syria
702   Tajikistan
510   Tanzania
800   Thailand
461   Togo
616   Tunisia
640   Turkey
701   Turkmenistan
500   Uganda
369   Ukraine
200   United Kingdom
2     United States
165   Uruguay
704   Uzbekistan
101   Venezuela
816   Vietnam, North
818   Vietnam
678   Yemen, North
679   Yemen
345   Yugoslavia
551   Zambia
552   Zimbabwe

Comments: four entries in the Comments column survive only as the truncated word "United" (one in the Germany to Pakistan block, three in the Panama to Zimbabwe block).