PQLI Dataset Codebook

Version 1.0, February 2006

Erlend Garåsen
Department of Sociology and Political Science
Norwegian University of Science and Technology

Table of Contents

1. Introduction
1.1 Files
1.2 Format
2. Methodology
References
Appendix

1. Introduction

This codebook describes the PQLI (Physical Quality of Life Index) dataset, which covers 139 countries with more than one million inhabitants for the years 1975 to 2000. It was originally constructed for my Master’s Thesis in Political Science: Democracy and Development: A Comparative Analysis in Time and Space. If you use this dataset, please cite it as follows:

Garåsen, E. (2006). Democracy and Development: A Comparative Analysis in Time and Space. Master’s Thesis, Department of Sociology and Political Science, Norwegian University of Science and Technology.

1.1 Files

• codebook.pdf – this document, which describes the methodology, content and format of the datasets
• Pqli_75_00.dta – PQLI data for 139 countries in Stata format
• Pqli_75_00.sav – PQLI data for 139 countries in SPSS format
• Pqli_75_00.txt – PQLI data for 139 countries as a tab-delimited text file
• Pqli_75_00.xls – PQLI data for 139 countries in Excel format

1.2 Format

The dataset contains three fields: YEAR, PQLI and COW. Years range from 1975 to 2000, subject to data being available for each country. Countries are coded using the Correlates of War (COW) format. See the Appendix for a complete list of the COW codes and the countries they map to.

2. Methodology

In constructing a PQLI dataset for time-series analysis, I tried, as far as possible, to obtain a complete dataset with no missing data for any country in any year. I also relied on a single source wherever possible, and used additional sources only where missing data were a serious problem. Data for life expectancy, infant mortality and literacy are mainly collected from the World Bank Development Indicators. Some data for adult literacy that were missing from the World Bank are taken from UNESCO’s Statistical Yearbook, various years; UNDP’s Human Development Report, various years; UNICEF’s The State of the World’s Children, various years; and Kurian’s (1979) The Book of World Rankings.

Missing data are mainly a problem for very poor countries (especially in the early years), for countries with fewer than one million inhabitants, and for closed societies such as North Korea. Since missing data were a particular problem for the years prior to 1975, only the years from 1975 to 2000 were included, giving 26 years in the time-series dataset. In addition, only countries with more than one million inhabitants in all time periods were included. The result is a dataset of 139 countries and 3,540 observations.

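The panel dimensions above imply that the dataset is not fully balanced: 139 countries over 26 years would give 3,614 country-years, against the 3,540 observations reported, a shortfall of 74. The sketch below shows one way the coverage per country might be checked in the tab-delimited file. It assumes a header row naming the three fields YEAR, PQLI and COW and that a missing PQLI appears as an empty cell; the codebook states neither, and the sample rows and PQLI values are invented for illustration.

```python
import csv
import io

FIRST_YEAR, LAST_YEAR = 1975, 2000
N_YEARS = LAST_YEAR - FIRST_YEAR + 1  # 26 years in the panel


def coverage_by_country(handle):
    """Count observed (non-empty) PQLI years per COW code.

    ASSUMPTION: the file has a header row with the field names
    YEAR, PQLI and COW, and missing PQLI values are empty cells.
    """
    counts = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["PQLI"].strip() == "":
            continue  # skip country-years without a PQLI value
        year = int(row["YEAR"])
        if FIRST_YEAR <= year <= LAST_YEAR:
            cow = int(row["COW"])
            counts[cow] = counts.get(cow, 0) + 1
    return counts


# Hypothetical sample: COW 339 is Albania and 520 is Somalia (see Appendix);
# the PQLI values themselves are made up.
sample = io.StringIO(
    "YEAR\tPQLI\tCOW\n"
    "1975\t70.1\t339\n"
    "1976\t70.9\t339\n"
    "1975\t\t520\n"
    "1976\t20.3\t520\n"
)
counts = coverage_by_country(sample)
missing_years = {cow: N_YEARS - n for cow, n in counts.items()}

# Arithmetic from the codebook: balanced panel size versus reported size.
balanced = 139 * N_YEARS
shortfall = balanced - 3540
print(counts, missing_years, balanced, shortfall)
```

To run this against the real file, the `sample` object would be replaced by an opened `Pqli_75_00.txt`, after checking that the header and missing-value conventions match the assumptions above.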
There was a problem with the life expectancy data collected from the World Bank. Morris (1979) uses life expectancy at age 1, whereas the World Bank publishes life expectancy at birth. Life expectancy at age 1 therefore had to be constructed from life expectancy at birth and infant mortality, using the following formula:

$$\mathrm{LE}(1) = \frac{\mathrm{LE}(0) - (\mathrm{IM} \times \mathrm{AVG})}{1 - \mathrm{IM}} \tag{1}$$

where LE(1) is life expectancy at age 1, LE(0) is life expectancy at birth, IM is the infant mortality rate expressed as a proportion (deaths per thousand live births divided by 1,000) and AVG is the average time, in years, lived by infants who die in their first year of life.[1] LE(1) gives more weight to the mortality rate of infants under one year of age relative to the mortality rates of other age groups.

[1] This formula is the same as the one used by Van der Lijn (1995). AVG needs further explanation since it is a complicated measure. The AVG value is 0.25 for countries with an infant mortality rate of 100 per thousand live births or more, 0.5 for countries with an infant mortality rate of 10 per thousand live births or less, and between 0.25 and 0.5 for countries between these rates. For example, Albania had an AVG value of 0.35 in 1970, with an infant mortality rate of 0.065 (65 per thousand), and an AVG value of 0.45 in 2000, with an infant mortality rate of 0.020 (20 per thousand).

Morris defines literacy as the share of the population aged fifteen and over who are able to read and write. This definition may not suit every country, since some may have defined literacy differently. The World Bank publishes data for illiteracy, the percentage of people aged fifteen and above who are illiterate, so adult literacy was obtained by subtracting the illiteracy rate from 100 per cent. There are large gaps in the literacy data because such data are not collected annually; the rate is assumed not to change significantly from one year to the next. This is also why UNESCO collects such data at five-year intervals.

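Equation (1) is a rearrangement of LE(0) = IM × AVG + (1 − IM) × LE(1), that is, life expectancy at birth as a weighted average over infants who die in their first year and those who survive it. A minimal sketch of the calculation follows. The codebook gives AVG only at the two end points, so the linear rule in `avg_survival` is an assumption of this sketch, not the documented method: it gives about 0.35 for an infant mortality rate of 0.065, in line with footnote 1, but about 0.47 rather than the reported 0.45 for 0.020. Where the actual AVG value is known, it should be passed in explicitly. The function names and the example life expectancy of 70 years are hypothetical.

```python
def avg_survival(im):
    """AVG: average years lived by infants who die before age 1.

    im is the infant mortality rate as a proportion (0.065 = 65 per thousand).
    The end points (0.25 at im >= 0.100, 0.5 at im <= 0.010) come from the
    codebook; the linear interpolation in between is an ASSUMPTION.
    """
    if im >= 0.100:
        return 0.25
    if im <= 0.010:
        return 0.5
    return 0.5 - 0.25 * (im - 0.010) / (0.100 - 0.010)


def life_expectancy_at_one(le0, im, avg=None):
    """Equation (1): LE(1) = (LE(0) - IM * AVG) / (1 - IM)."""
    if avg is None:
        avg = avg_survival(im)  # fall back on the assumed linear rule
    return (le0 - im * avg) / (1.0 - im)


# Hypothetical input: LE(0) = 70 years, IM = 0.065, AVG = 0.35 as in footnote 1.
le1 = life_expectancy_at_one(70.0, 0.065, avg=0.35)
print(round(le1, 2))
```

With these inputs LE(1) comes out at about 74.84 years, higher than LE(0) because the infant deaths have been removed from the average.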
Where gaps in the literacy series existed, linear interpolation was used to obtain data between known literacy values.[2] If the earliest known value was for, say, 1980, all years prior to 1980 were coded as missing. Likewise, linear interpolation was not used where I did not have a value for the latest years; for example, the years 1999 and 2000 are missing for Somalia. There is one exception to this rule: for Guinea, linear interpolation was used to obtain data from 1975 to 1990 by going back to the literacy rate for 1965. Estimating missing data over such a long time span may give imprecise rates, and it was done only in a few cases. Literacy rates for North Korea should also be read with care, since linear interpolation was used to estimate values between 1977 and 2000.

Another problem was estimating missing data for the many OECD countries for which UNESCO has no recorded data. Following a method similar to the one UNDP uses when constructing the HDI, all OECD countries were given a literacy rate of 99 per cent. Where data were actually available for such countries, those values were used, except where the recorded literacy rate was above 99 per cent, as for Tajikistan in the years 1998 to 2000. Since it is assumed that a literacy rate of 100 per cent cannot be attained, 99 per cent is the highest rate in the dataset.

For the infant mortality rate, a few cases had missing values. The longest gap between known values was four years, so linear interpolation was used to estimate the values for these cases. In addition, the four values for the Central African Republic for the years 1983 to 1986 were deleted and replaced by linear interpolation, since they were recorded as zero, probably because of errors in the World Bank data.

These three indicators were then converted to indices ranging from 0 to 100, where 0 represents the worst and 100 the best performance.
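The conversion to a 0 to 100 scale and the equal-weight average can be sketched as follows, using the bounds reported in this codebook: 81 and 40 years for life expectancy at age 1, and 263 and 2.9 per thousand for infant mortality. The sketch assumes plain linear rescaling between the worst and best recorded values; whether the original calculation adjusted the best infant mortality value, as Morris did, is not stated. The function names and the example inputs are hypothetical.

```python
# Recorded extremes in the 1975-2000 data, as reported in this codebook.
LE1_WORST, LE1_BEST = 40.0, 81.0   # life expectancy at age 1, in years
IMR_WORST, IMR_BEST = 263.0, 2.9   # infant deaths per thousand live births


def rescale(value, worst, best):
    """Map value linearly so that worst -> 0 and best -> 100.

    Works whether 'best' is the larger value (life expectancy)
    or the smaller one (infant mortality).
    """
    return 100.0 * (value - worst) / (best - worst)


def pqli(le1, imr, literacy_pct):
    """Equal-weight average of the three component indices.

    ASSUMPTION: simple min-max rescaling for the first two components;
    literacy (in per cent) is used as it stands, since it was not rescaled.
    """
    le_index = rescale(le1, LE1_WORST, LE1_BEST)
    im_index = rescale(imr, IMR_WORST, IMR_BEST)
    return (le_index + im_index + literacy_pct) / 3.0


# Hypothetical country-year: LE(1) = 70 years, 50 infant deaths per thousand,
# 85 per cent adult literacy.
print(round(pqli(70.0, 50.0, 85.0), 2))
```

Because the infant mortality bounds are passed with the larger value as the worst case, the same `rescale` function handles both the "higher is better" and the "lower is better" indicator.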
The index for life expectancy at age 1 was calculated using the highest and lowest recorded values in my dataset, 81 and 40 respectively.[3] The index for infant mortality was calculated in much the same way, using the highest and lowest recorded values in my dataset, 263 and 2.9 respectively.[4] The literacy indicator was not rescaled. The three indices were then combined into a composite indicator by averaging them with equal weights, which gives the PQLI values.

Note that PQLI values have been calculated for some countries for years prior to their independence, such as Estonia from 1975 to 1991. These data exist because the sources record values for these countries; no attempt was made to estimate them. They are kept because they contribute to a more balanced time-series dataset with fewer missing values.

[2] Linear interpolation is a method for estimating missing data. It assumes that a phenomenon changes over time at a rate that is, ideally, constant, and it requires two known time points in order to obtain the missing values. For the literacy rates, it was assumed that the rate of change was constant across the missing values, and the following formula was applied to obtain the slope of the line: $y = (x_H - x_L) / (n + 1)$, where $y$ is the slope, $x_H$ is the higher and $x_L$ the lower of the two known values bounding the gap, and $n$ is the number of missing values between them. For example, to find the two values between 3 and 4, one first finds the slope, or the change between successive values: $y = (4 - 3) / (2 + 1) = 0.33$. The first missing value is then 3 + 0.33 = 3.33, and the second is 3.33 + 0.33 = 3.67 (rounded).

[3] This differs somewhat from Morris’ (1979) methodology.
Morris assumes that large improvements in life expectancy will only occur if there is a breakthrough in the study of geriatrics, and he uses 77 years, two years above the then-current best, as the upper limit and 38 years as the lower limit (Vietnam in 1950). I do not go beyond my time-series dataset to find lower or higher values. Japan has the highest recorded value for life expectancy at age 1, in 2000, and Rwanda has the lowest, in 1992. The formula applied by Morris for the 0 to 100 index is:

$$\frac{\text{life expectancy at age one} - 38}{0.39}$$

where 38 is the worst recorded life expectancy value and 0.39 is a factor calculated by subtracting the lowest recorded value from the highest and dividing by 100, so a change in life expectancy of 0.39 years results in a one-point change in the index (ibid., pp. 45f).

[4] Since the highest value for the infant mortality rate represents the worst performance and the lowest value the best, the formula is somewhat different:

$$\frac{229 - \text{infant mortality rate per thousand}}{2.22}$$

where 229 is the worst recorded value Morris uses and 2.22 is a factor calculated by subtracting 7 (the best recorded value Morris uses minus one) from the worst value and dividing by 100 (ibid., pp. 43ff). I use 263 and 2.9 as the worst and best values, from Cambodia in 1977 and Singapore in 2000 respectively.

References

Garåsen, E. (2006). Democracy and Development: A Comparative Analysis in Time and Space. Master’s Thesis, Department of Sociology and Political Science, Norwegian University of Science and Technology.
Kurian, G.T. (1979). The Book of World Rankings. London: The Macmillan Press Ltd.
Morris, D.M. (1979). Measuring the Conditions of the World’s Poor: The Physical Quality of Life Index. New York: Pergamon Press Inc.
UNESCO (1980). Statistical Yearbook 1980. London: UNESCO
UNESCO (1984). Statistical Yearbook 1984. Paris: The Unesco Press
UNESCO (1991). Statistical Yearbook 1991. Paris: The Unesco Press
UNESCO (1993). Statistical Yearbook 1990/91.
New York: Department of Economic and Social Information and Policy Analysis, Statistical Division
UNESCO (1994). Statistical Yearbook 1994. Paris: The Unesco Press
UNESCO (1995). Statistical Yearbook 1995. Paris: UNESCO Publishing & Bernan Press
UNDP (1999). Human Development Report 1999. New York: Oxford University Press
UNDP (2000). Human Development Report 2000. New York: Oxford University Press
UNDP (2002). Human Development Report 2002. New York: Oxford University Press
UNICEF (1996). The State of the World’s Children 1996. New York: UNICEF
UNICEF (2000). The State of the World’s Children 2000. New York: UNICEF
UNICEF (2001). The State of the World’s Children 2001. New York: UNICEF
UNICEF (2005). The State of the World’s Children 2005. New York: UNICEF
Van der Lijn, N. (1995). Measuring well-being with social indicators, HDI, PQLI, and BWI for 133 countries for 1975, 1980, 1985, 1988, and 1992. Tilburg University, Faculty of Economics and Business Administration Research Memorandum, No. 704
World Bank (2002). World Development Indicators.
New York: United Nations

Appendix

COW   Country
700   Afghanistan
339   Albania
615   Algeria
540   Angola
160   Argentina
371   Armenia
900   Australia
305   Austria
373   Azerbaijan
771   Bangladesh
370   Belarus
211   Belgium
434   Benin
145   Bolivia
346   Bosnia
571   Botswana
140   Brazil
355   Bulgaria
439   Burkina Faso
516   Burundi
811   Cambodia
471   Cameroon
20    Canada
482   Central African Republic
483   Chad
155   Chile
710   China
100   Colombia
484   Congo, Republic of the
490   Congo, Democratic Republic of the
94    Costa Rica
437   Cote d’Ivoire
344   Croatia
40    Cuba
316   Czech Republic
390   Denmark
42    Dominican Republic
130   Ecuador
651   Egypt
92    El Salvador
531   Eritrea
366   Estonia
530   Ethiopia
375   Finland
220   France
372   Georgia
255   Germany
452   Ghana
350   Greece
90    Guatemala
438   Guinea
41    Haiti
91    Honduras
310   Hungary
750   India
850   Indonesia
630   Iran
645   Iraq
205   Ireland
666   Israel
325   Italy
51    Jamaica
740   Japan
663   Jordan
705   Kazakhstan
501   Kenya
731   Korea, North
732   Korea, South
703   Kyrgyzstan
812   Laos
367   Latvia
660   Lebanon
570   Lesotho
450   Liberia
620   Libya
368   Lithuania
343   Macedonia
580   Madagascar
553   Malawi
820   Malaysia
432   Mali
435   Mauritania
70    Mexico
359   Moldova
712   Mongolia
600   Morocco
541   Mozambique
775   Myanmar
790   Nepal
210   Netherlands
920   New Zealand
93    Nicaragua
436   Niger
475   Nigeria
385   Norway
770   Pakistan
95    Panama
910   Papua New Guinea
150   Paraguay
135   Peru
840   Philippines
290   Poland
235   Portugal
360   Romania
365   Russia
517   Rwanda
670   Saudi Arabia
433   Senegal
451   Sierra Leone
830   Singapore
317   Slovakia
349   Slovenia
520   Somalia
560   South Africa
230   Spain
780   Sri Lanka
625   Sudan
380   Sweden
225   Switzerland
652   Syria
702   Tajikistan
510   Tanzania
800   Thailand
461   Togo
616   Tunisia
640   Turkey
701   Turkmenistan
500   Uganda
369   Ukraine
200   United Kingdom
2     United States
165   Uruguay
704   Uzbekistan
101   Venezuela
816   Vietnam, North
818   Vietnam
678   Yemen, North
679   Yemen
345   Yugoslavia
551   Zambia
552   Zimbabwe

Comments column: only the truncated fragment "United" survives, once among the rows from Germany to Pakistan and three times among the rows from Panama to Zimbabwe; the rows it belongs to cannot be determined.