Design and Use of the IPUMSI Database

Design and Use of the
IPUMS-International Data Series
http://international.ipums.org
Matt Sobek
Minnesota Population Center
sobek@pop.umn.edu
IPUMS-International
Overview
Processing
Dissemination system
Strengths and limitations
Users
What is IPUMS-International?
Census data – 1960 to present
Samples – 1 to 10%, nationally representative
Microdata – individual-level
Integrated – consistent codes across time and place
Downloadable – anonymized
Extract system – select variables – pooled data
Map of IPUMS Partners
Dark green = disseminating data
Light green = partners, not yet disseminating
83 countries
Current Countries in IPUMS
Africa: Egypt, Ghana, Guinea, Kenya, Rwanda, South Africa, Uganda
Asia: Armenia, Cambodia, China, India, Iraq, Israel, Jordan, Kyrgyz Rep., Malaysia, Mongolia, Palestine, Philippines, Vietnam
Americas: Argentina, Bolivia, Brazil, Canada, Chile, Colombia, Costa Rica, Ecuador, Mexico, Panama, United States, Venezuela
Europe: Austria, Belarus, France, Greece, Hungary, Italy, Netherlands, Portugal, Romania, Slovenia, Spain, United Kingdom
44 countries, 130 samples, 279 million persons
Countries in IPUMS Archive
Bangladesh
Botswana
Cuba
Czech Republic
Dominican Rep.
El Salvador
Ethiopia
Fiji
Germany
Guatemala
Haiti
Honduras
Indonesia
Liberia
Madagascar
Malawi
Mali
Mauritius
Nepal
Nicaragua
Pakistan
Paraguay
Peru
Puerto Rico
Senegal
Saint Lucia
Sierra Leone
Sudan
Switzerland
Tanzania
Thailand
Turkmenistan
Uruguay
Zambia
IPUMS Microdata
[Figure: microdata records with callouts for relation to head, marital status, literacy, and occupation]
Availability of Selected Person Variables
(Number of samples)
Relationship to head: 130
Religion: 54
Age: 130
Language: 33
Sex: 130
Ethnicity: 41
Marital status: 129
Age at first marriage: 16
Race: 20
School attendance: 105
Children ever born: 91
Literacy: 91
Children surviving: 59
Mother's mortality status: 16
Educational attainment: 119
Years of schooling: 72
Country of birth: 81
Employment status: 119
Place of birth: 90
Citizenship: 67
Class of worker: 120
Occupation: 116
Year of immigration: 22
Industry: 116
Migration, international: 53
Hours worked weekly: 38
Total income: 24
Earned income: 26
Migration, internal: 101
Disability: 32
Availability of Selected Household Variables
(Number of samples)
Urban-rural status: 89
Geography, 1st level: 120
Geography, 2nd level: 86
Electricity: 81
Water: 95
Sewage: 76
Home ownership: 107
Toilet: 86
Number of rooms: 102
Cooking fuel: 39
Floor material: 46
Telephone: 57
Wall material: 40
Television: 45
Roof material: 27
Computer: 16
Living area: 20
Automobiles: 42
536 Integrated variables
10,600 Unharmonized variables
User Access
Application
• Scholarly and educational purposes
• Key: the data must not be redistributed
Once approved, access to all data
Free
Making the IPUMS
Pre-processing
• Language translation
• Reformatting
• Error correction
• Sampling
• Confidentiality
Integration
• Metadata
• Data harmonization
• Constructed variables
Dissemination
Census Questionnaire (Mexico 2000)
Water access
Text of Census Questionnaire (Mexico 2000)
5. Number of Rooms
How many rooms are used for sleeping without counting hallways?
_____ Write the number
Without counting the hallways or bathrooms, how many total rooms are in this dwelling? Count the kitchen.
_____ Write the number
6. Access to water
Read all of the options until you get an affirmative answer.
Circle only one answer
1 Running water inside the dwelling
2 Running water outside the dwelling but on the land
3 Running water from a public faucet or hydrant
4 Running water that is carried from another dwelling
5 Tanked in by truck
6 Water from a well, river, lake, stream or other
Answers 3, 4, 5, 6 continue with number 8
7. Water supply
How many days of the week is water available?
Circle only one answer
1 Daily
2 Every third day
3 Twice a week
4 Once a week
5 Occasionally
XML-Tagged Census Questionnaire (Mexico 2000)
Water access
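The tagged form itself is not reproduced in this text. As a rough sketch of what tagging the water access question could look like, the snippet below wraps the question text from the Mexico 2000 form in XML and reads it back with Python's standard library. The element and attribute names (`question`, `title`, `instruction`, `option`, `code`, `var`) are hypothetical and are not the actual IPUMS markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag and attribute names are illustrative only.
# The question text and answer codes come from the Mexico 2000 form.
TAGGED = """
<question number="6" var="water_access">
  <title>Access to water</title>
  <instruction>Read all of the options until you get an affirmative answer.</instruction>
  <option code="1">Running water inside the dwelling</option>
  <option code="2">Running water outside the dwelling but on the land</option>
  <option code="3">Running water from a public faucet or hydrant</option>
  <option code="4">Running water that is carried from another dwelling</option>
  <option code="5">Tanked in by truck</option>
  <option code="6">Water from a well, river, lake, stream or other</option>
</question>
"""

root = ET.fromstring(TAGGED)

# Build a code -> label lookup from the tagged answer options.
codes = {int(opt.get("code")): opt.text for opt in root.iter("option")}

print(root.findtext("title"))
for code, label in sorted(codes.items()):
    print(code, label)
```

Once the text is tagged this way, the same lookup can serve both the documentation pages and the value labels for the variable.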
Data Integration – Marital Status
MARST
Marital Status
China
1982
code
label
CN82A403
100
SINGLE/NEVER MARRIED
200
MARRIED/IN UNION
210
Married (not specified)
Colombia
1973
CO73A411
Kenya
1989
Mexico
1970
KN89A413
MX70A402
US90A425
1=never married 4=single
1=single
9=single
6=never married
2=married
3=monogamous
2=married
1=married
211
Civil
3=only civil
212
Religious
4=only religious
213
Civil and religious
2=civil and religious
214
Polygamous
220
300
310
3=polygamous
Consensual union
1=free union
SEPARATED/DIVORCED
Legally separated
322
De facto separated
5=free union
3=sep. or divorced
Separated
321
U.S.A.
1990
6=separated
8=separated
3=separated
5=divorced
7=divorced
4=divorced
5=widowed
330
Divorced
4=divorced
400
WIDOWED
3=widowed
5=widowed
4=widowed
6=widowed
999
UNKNOWN/MISSING
0=missing
6=unknown
B=blank
1=unknown
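The translation table can be read as a lookup from each sample's source code to a harmonized MARST code. A minimal sketch follows, assuming a plain dictionary per sample. The harmonized codes and labels are taken from the slide; the two samples (`sample_a`, `sample_b`) and their source codings are invented for illustration and do not correspond to any real census.

```python
# Harmonized MARST codes and labels as shown on the slide (subset).
MARST_LABELS = {
    100: "SINGLE/NEVER MARRIED",
    210: "Married (not specified)",
    220: "Consensual union",
    330: "Divorced",
    400: "WIDOWED",
    999: "UNKNOWN/MISSING",
}

# Hypothetical translation tables: source code -> harmonized code.
TRANSLATION = {
    "sample_a": {1: 100, 2: 210, 3: 400, 9: 999},
    "sample_b": {1: 210, 2: 220, 3: 330, 4: 400, 5: 100},
}


def harmonize(sample, source_code):
    """Return the harmonized MARST code, or 999 if the source code is unmapped."""
    return TRANSLATION[sample].get(source_code, 999)


# (sample, source code) pairs; the last one has no mapping and falls back to 999.
records = [("sample_a", 1), ("sample_b", 5), ("sample_b", 2), ("sample_a", 7)]
harmonized = [harmonize(sample, code) for sample, code in records]

for (sample, code), marst in zip(records, harmonized):
    print(sample, code, "->", marst, MARST_LABELS[marst])
```

The point of the design is that the same harmonized code means the same thing in every sample, so pooled records can be tabulated without sample-specific recodes.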
Family Interrelationship Variables
(Simple household)
Pernum  Relate  Age  Sex     Marst    Chborn  Spouse’s Location
1       head    46   male    married  n/a     2
2       spouse  44   female  married  3       1
3       aunt    77   female  widow    7       0
4       child   15   female  single   0       0
5       child   13   female  single   n/a     0
6       child   11   male    single   n/a     0
Pernum  Relate  Age  Sex     Marst    Chborn  Mother’s Location  Father’s Location
1       head    46   male    married  n/a     0                  0
2       spouse  44   female  married  3       0                  0
3       aunt    77   female  widow    7       0                  0
4       child   15   female  single   0       2                  1
5       child   13   female  single   n/a     2                  1
6       child   11   male    single   n/a     2                  1
IPUMS “Pointer” Variables
(Complex household)
Pernum  Relationship  Age  Sex     Marst      Chborn  Spouse’s Location  Mother’s Location  Father’s Location
1       head          53   female  separated  6       0                  0                  0
2       child         28   male    single     n/a     0                  1                  0
3       child         22   male    single     n/a     0                  1                  0
4       child         21   male    single     n/a     0                  1                  0
5       child         25   female  married    2       6                  1                  0
6       child-in-law  28   male    married    n/a     5                  0                  0
7       grandchild    3    male    single     n/a     0                  5                  6
8       grandchild    1    male    single     n/a     0                  5                  6
9       non-relative  32   female  separated  2       0                  0                  0
10      non-relative  10   male    single     n/a     0                  9                  0
11      non-relative  5    female  single     n/a     0                  9                  0
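Each pointer holds the person number of the spouse, mother, or father within the same household, with 0 meaning that person is not present. The sketch below uses the complex household from the table to show how pointers let one person's characteristics be attached to another. The field names (`sploc`, `momloc`, `poploc`) are shorthand chosen for this sketch, and the helper `attach` is hypothetical.

```python
# Complex household from the table: pernum -> selected fields.
# sploc / momloc / poploc give the pernum of spouse, mother, father (0 = none).
HOUSEHOLD = {
    1: {"relate": "head", "age": 53, "sploc": 0, "momloc": 0, "poploc": 0},
    2: {"relate": "child", "age": 28, "sploc": 0, "momloc": 1, "poploc": 0},
    3: {"relate": "child", "age": 22, "sploc": 0, "momloc": 1, "poploc": 0},
    4: {"relate": "child", "age": 21, "sploc": 0, "momloc": 1, "poploc": 0},
    5: {"relate": "child", "age": 25, "sploc": 6, "momloc": 1, "poploc": 0},
    6: {"relate": "child-in-law", "age": 28, "sploc": 5, "momloc": 0, "poploc": 0},
    7: {"relate": "grandchild", "age": 3, "sploc": 0, "momloc": 5, "poploc": 6},
    8: {"relate": "grandchild", "age": 1, "sploc": 0, "momloc": 5, "poploc": 6},
    9: {"relate": "non-relative", "age": 32, "sploc": 0, "momloc": 0, "poploc": 0},
    10: {"relate": "non-relative", "age": 10, "sploc": 0, "momloc": 9, "poploc": 0},
    11: {"relate": "non-relative", "age": 5, "sploc": 0, "momloc": 9, "poploc": 0},
}


def attach(household, pointer, field):
    """For each person, look up `field` on the person the pointer refers to."""
    result = {}
    for pernum, person in household.items():
        target = person[pointer]
        result[pernum] = household[target][field] if target else None
    return result


spouse_age = attach(HOUSEHOLD, "sploc", "age")
mother_age = attach(HOUSEHOLD, "momloc", "age")

# Count co-resident own children by scanning who points at each person.
own_children = {
    pernum: sum(
        1 for other in HOUSEHOLD.values()
        if other["momloc"] == pernum or other["poploc"] == pernum
    )
    for pernum in HOUSEHOLD
}

print(spouse_age[5], mother_age[7], own_children[9])
```

The relationship-to-head variable alone could not link persons 10 and 11 to person 9, since all three are non-relatives of the head; the mother pointer makes that link explicit.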
Family Interrelationship Pointers
13 censuses include data on location of parent or spouse
Agree Disagree
Under age 18
Spouse  99.5  0.5
Mother  98.7  1.3
Father  99.4  0.6
Mother  97.5  2.5
Father  98.7  1.3
IPUMS Home Page
Variables Page
Sample Filtering
Variables Page
Unharmonized Variables
Variable Description (Marital status)
Comparability Discussion (Marital status)
Enumeration Text (Marital status)
Enumeration Text (Marital status, Cambodia)
Variable Codes (Marital status)
IPUMS Home Page
Extract Step 1 – Login
Extract Step 2 – Select Samples
Extract Step 3 – Select Variables
Extract Step 4 – Variable Options
Extract Step 4 – Select Cases
Extract Step 4 – Attach Characteristics (age of spouse, employment status of father, occupation of father)
Extract Step 5 – Customize Sample Sizes
Extract Step 6 – Submit
Download or Revise Extract
Key Strengths of the Census Samples
• Large
Enable study of relatively small populations
• Internationally comparable
Pool data across countries – integrated variables
• Temporal depth
Provide historical perspective
Key Strengths of the Census Samples
• Microdata
All of a person’s characteristics – multivariate analysis
• Hierarchical
Characteristics of everyone a person resided with
Cohabitation and family interrelationships
Limitations Due to Confidentiality
• Samples
Too small to answer some questions
• Geography
20,000 population or larger
• Sensitive variables, very small categories
Other Issues and Limitations
• Cross-sectional data
Not longitudinal
• User burden
Information overload; culturally specific knowledge
Variable labels are insufficient
IPUMS Users
2200 registered users
Academic field (%)
47 Economics
21 Demography
10 Sociology
22 Other
54% Graduate students
Samples Extracted
67% multiple samples
45% multiple countries
17% 5 or more countries
Decade of Extracted Sample
Decade  Percent
1960s   11
1970s   14
1980s   16
1990s   30
2000s   29
Most Frequently Extracted Countries
1. Mexico
2. Brazil
3. United States
4. Colombia
5. France
6. Chile
7. Ecuador
8. Vietnam
9. Kenya
10. Argentina
Most Frequently Extracted Variables
Relation to head
Age
Sex
Marital status
Educational attainment
Years of schooling
School attendance
Literacy
Employment status
Class of worker
Occupation recode
Industry recode
Occupation
Industry
Urban-rural status
Country of birth
Nativity status
Migration status, 5 years
Children ever born
Children surviving
Religion
Ownership of dwelling
Water
Electricity
Sewage
Number of rooms
Toilet
Earned income
Total income
Spouse’s location in household
Median Age by Country
Italy: 42
Greece: 39
Austria: 38
Hungary: 38
Portugal: 38
Canada: 37
France: 37
Netherlands: 37
Slovenia: 37
Spain: 37
United Kingdom: 37
Belarus: 36
United States: 36
Romania: 35
Armenia: 31
Chile: 29
Argentina: 27
Israel: 27
Brazil: 25
China: 25
Colombia: 25
Costa Rica: 24
Mexico: 24
Panama: 24
South Africa: 24
Ecuador: 23
Malaysia: 23
Venezuela: 23
Vietnam: 23
India: 22
Kyrgyz Republic: 22
Mongolia: 21
Philippines: 21
Bolivia: 20
Egypt: 20
Jordan: 20
Ghana: 19
Cambodia: 17
Guinea: 17
Iraq: 17
Kenya: 17
Palestine: 17
Rwanda: 17
Uganda: 15
(Calculated from the most recent sample from each country.)
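The slide does not say how the medians were computed. One plausible approach for sample data is a weighted median, where each person counts in proportion to a person weight. The sketch below assumes such a weight exists; the function name and the example ages and weights are invented for illustration.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("no observations")


# Invented example: six sampled persons with unequal weights.
ages = [2, 15, 15, 30, 47, 70]
weights = [10, 10, 20, 10, 10, 10]

median_age = weighted_median(ages, weights)
print(median_age)
```

With equal weights this reduces to the ordinary (lower) median, so the same function works for self-weighting samples.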
Population Pyramids
[Figure: population pyramids for Palestine, Egypt, and Iraq; horizontal axis from 10 to 0 to 10]
Population Pyramids
[Figure: population pyramids labeled Young (Uganda 2002), Medium (Philippines 2000), and Old (USA 2005); horizontal axis from 10 to 0 to 10]
Population Pyramids
[Figure: population pyramids for Belarus, Cambodia, and China, labeled 1998, 1998, and 1990; horizontal axis from 10 to 0 to 10]
Population Pyramids
[Figure: population pyramids for Mexico in 1960, 1990, and 2005; horizontal axis from 10 to 0 to 10]
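A population pyramid is a tabulation of persons by age group and sex. The sketch below assumes the horizontal axis is percent of total population, which the slides do not state, and uses five-year age groups with an open-ended top group. The function name, its parameters, and the example persons are invented for illustration.

```python
from collections import Counter


def pyramid(persons, width=5, top=80):
    """Percent of total population in each (age-group lower bound, sex) cell."""
    counts = Counter()
    for age, sex in persons:
        # Floor the age to its group; ages at or above `top` share one open group.
        lower = min(age // width * width, top)
        counts[(lower, sex)] += 1
    total = sum(counts.values())
    return {cell: 100.0 * n / total for cell, n in counts.items()}


# Invented example: five persons as (age, sex).
persons = [(3, "male"), (4, "female"), (12, "female"), (33, "male"), (91, "female")]

shares = pyramid(persons)
for (lower, sex), pct in sorted(shares.items()):
    print(lower, sex, pct)
```

Plotting male shares to the left of zero and female shares to the right gives the familiar shape; a real tabulation would also apply person weights.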
Married Female Labor Force Participation in Latin America
(age 18 to 65)
[Chart: percent in labor force (0 to 50) by year, 1960 to 2005; series: Brazil, Colombia, Venezuela, Chile, Mexico, Costa Rica, Ecuador]
Married Female Labor Force Participation:
Latin America and U.S. (age 18 to 65)
[Chart: percent in labor force (0 to 70) by year, 1920 to 2010; series: United States, Latin America]
Married Female Labor Force Participation:
Latin America and U.S. (age 18 to 65)
[Chart: percent in labor force (0 to 70) by year, 1920 to 2010; series: United States, Brazil, Colombia, Venezuela, Ecuador, Chile, Costa Rica, Mexico; annotation: "Compare Latin America to U.S. 40 years earlier"]
Married Female Labor Force Participation:
Mexican-born Women, 1970-2000
[Chart: percent in labor force (0 to 70) by year, 1970 to 2000; series: Mexican-born women in United States, women in Mexico]
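Each point on the married female labor force participation charts is a rate computed within a fixed universe. A minimal sketch of that calculation follows, assuming the universe is married women age 18 to 65 (as the earlier chart titles state) and that each record carries a person weight. The field names and example records are invented for illustration.

```python
def married_female_lfp(records):
    """Weighted percent in the labor force among married women age 18 to 65."""
    universe = [
        r for r in records
        if r["sex"] == "female" and r["marst"] == "married" and 18 <= r["age"] <= 65
    ]
    total = sum(r["weight"] for r in universe)
    if total == 0:
        return None  # nobody in the universe, so the rate is undefined
    active = sum(r["weight"] for r in universe if r["in_labor_force"])
    return 100.0 * active / total


# Invented records; only the first two fall inside the universe.
records = [
    {"sex": "female", "marst": "married", "age": 30, "in_labor_force": True, "weight": 10},
    {"sex": "female", "marst": "married", "age": 40, "in_labor_force": False, "weight": 30},
    {"sex": "female", "marst": "single", "age": 25, "in_labor_force": True, "weight": 10},
    {"sex": "male", "marst": "married", "age": 35, "in_labor_force": True, "weight": 10},
    {"sex": "female", "marst": "married", "age": 70, "in_labor_force": True, "weight": 10},
]

rate = married_female_lfp(records)
print(rate)
```

Because the harmonized variables share codes across samples, the same function can be applied unchanged to each country and census year in a pooled extract.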
Working-Age Population in the Labor Force, by Sex
[Chart: percent of working-age population (0 to 100) for males and females, persons age 16 to 65, by sample]
Samples: United States 1960, 1970, 1980, 1990, 2000; France 1962, 1968, 1975, 1982, 1990; South Africa 1996, 2001; Kenya 1989, 1999; Vietnam 1989, 1999; China 1982; Venezuela 1971, 1981, 1990; Mexico 1970, 1990, 2000; Ecuador 1962, 1974, 1982, 1990, 2001; Costa Rica 1963, 1973, 1984, 2000; Colombia 1964, 1973, 1985, 1993; Chile 1960, 1970, 1982, 1992, 2002; Brazil 1960, 1970, 1980, 1991, 2000
Population Residing with an Elderly Person
[Chart: percent of total population (0 to 30) by sample year for Brazil, Colombia, Mexico, Kenya, S Africa, China, Vietnam, France, United States; series: elderly persons (age 65+), non-elderly residing with an elderly person]
Percent of elders in elder-head intergenerational families
[Chart: percent (0 to 50) by year, 1970 to 2000; series: Argentina, Brazil, Chile, Colombia, Costa Rica, Ecuador, Kenya, Mexico, Philippines, Romania, Rwanda, Vietnam, South Africa, Uganda, Venezuela]
Percent of elders in younger-head families
[Chart: percent (0 to 50) by year, 1970 to 2000; series: Argentina, Brazil, Chile, Colombia, Costa Rica, Ecuador, Kenya, Mexico, Philippines, Romania, Rwanda, Vietnam, South Africa, Uganda, Venezuela]
Trends in Intergenerational Families

Intergenerational families headed by the older
generation are becoming more common in most
countries, with exceptions mainly in Africa.

Intergenerational families headed by the younger
generation—the configuration that suggests old-age
support—are much rarer, and they are on the decline in
most countries.
Persons with Completed Secondary Education:
National Populations Versus Migrants to the United States
[Chart: percent (0 to 100) for Brazil, Chile, Costa Rica, Ecuador, Mexico, Vietnam, Kenya, South Africa; series: in home country, ca. 2000; migrants to U.S. 1995-2000]