IPUMS & AICMD Add Value
to African Census Microdata
Robert McCaa and Patricia Kelly-Hall
ASSD VII, January, 2012
Cape Town, South Africa
***
ipums.org/international
ecastats.uneca.org/aicmd
rmccaa@umn.edu
for additional details, please see:
www.hist.umn.edu/~rmccaa/ipums-africa
IPUMS-International: Free, Worldwide Microdata Access Now for Censuses of 62 Countries
“Dissemination [means] opening up the value inherent in our data”
--Walter Radermacher (President, Eurostat) and Pieter Everaers (Director, Eurostat)
IPUMS+AICMD open up the value inherent in microdata for censuses throughout Africa.
The purpose of this talk: “value added by 3rd parties”
1. Encourage National Statistical Offices to entrust census microdata samples to the IPUMS-International project
  » 1960-2000 samples
  » 2010 round samples
2. Describe some of the value that IPUMS-International adds to integrated microdata and metadata.
  » Free access to the microdata for bona fide researchers
  » Extensive analysis of data quality before the samples are released
  » Integrated metadata (compare questions in 1, 2, … many censuses)
  » Integrated, pooled microdata (multiple censuses, countries)
3. Encourage usage of integrated samples by African researchers
  » Usage is relatively low, but increasing quickly as more samples become available
Central Statistics Office-Ireland
Deirdre Cullen, Senior Statistician, testimonial (not in the paper):
Advantages of IPUMS for
Ireland
• Bonus for CSO: as a result of this project, our
historic data sets are now in a much more
usable format
• IPUMS allows – mix of Census years available
in 1 file
• Comparability with other countries
• Ease of access for users
• Positive publicity for Census in Ireland
Outline
Introduction
  » When NSOs disseminate microdata, the task is costly, risky and often unsatisfactory
  » IPUMS+AICMD partnership offers solution for African countries
  » Invitation to participate, entrust microdata for 2010 and earlier censuses without undue delay
IPUMS+AICMD adds value to population microdata:
1. Statistical confidentiality and security – disclosure controls, restricted access
2. Integration – census microdata and metadata
3. Dissemination – custom tailored extracts: country(ies), census(es), populations, variables, sample density, metadata
4. Ethics – statistical transparency, academic freedom, responsible use, sharing of results.
Reflections
Why Statistical Offices entrust Responsibility of
Disseminating Census Microdata to IPUMS-International
» NSO Dissemination is costly, risky and often unsatisfactory
  » Costly: scarce human resources to prepare sample, assure statistical confidentiality, and manage access for relatively few users (however important they may be!)
  » Risky: little experience in anonymizing and managing access to microdata, yet great responsibility
    » US Census Bureau anonymization protocol egregiously corrupted ages for elderly in ACS microdata; it took 5 years to discover the error!
  » Unsatisfactory: excessive anonymization, slow to provide access. Troublesome for NSO statisticians who do not wish to risk their job to some academic.
» Most deny access to all but the most persistent, influential would-be users. Complaints (of a large European NSO):
  » “I haven't used the [microdata]; the bureaucracy was just too slow to get much use out of it.”
  » “[Access] is unbelievably bureaucratic and difficult – this discourages people from using it. It took me 6 months to get the data.”
IPUMS-International assumes responsibilities and risks for integrating & disseminating microdata and metadata
» Uniform Memorandum of Understanding with each NSO:
  » Founding partners (2001): Kenya, South Africa, Ghana, Egypt, France, Spain, China, Vietnam, Colombia, Mexico, USA … now almost 100 countries
  » Specific conditions of access: ownership of data (NSO), use, access, restrictions, confidentiality, security, publication, violations, sharing, jurisdiction, and precedence.
» Almost 100 countries entrust census microdata to IPUMS-I.
» 6 most populous countries NOT entrusting census microdata to IPUMS: India, *Nigeria, Russian Federation, Japan, Algeria, *Korea (RO; may join at the UNSC in New York)
  » * = negotiating
  » No data: Congo (DR), Myanmar, Afghanistan, Uzbekistan, Somalia
90+ National Statistics Offices have endorsed the IPUMS-International Memorandum of Understanding
IPUMS-International results posted at
http://bibliography.ipums.org
IPUMS Milestones
» 1995: IPUMS-USA first release of integrated microdata
» 1999: IPUMS-International funded by NSF & NIH
» 2002: 1st International launch: 7 countries, 25 samples
» 2007 launch (56th ISI): 32 countries, 89 samples
» 2009 launch (57th ISI): 44 countries, 130 samples
  » ~279 million person records
  » ~3,000 registered users
» 2011 launch (58th ISI): 62 countries, 185 samples
  » 397 million person records
  » 5,000 registered users
» 2013 (ISI Hong Kong!): ~70 countries, ~225 samples
  » ~500 million person records
  » ~7,000 registered users
Open Invitation to Cooperate, Entrust and Access Microdata
[Figure: cartogram of IPUMS+AICMD partners weighted by population. Dark green = integrated and disseminating, 2002-2011. Legend: disseminating; integrating; none entrusted; none inventoried.]
The IPUMS-International team
(includes National Science Foundation Board)
Steven Ruggles, inventor of IPUMS,
Professor of History, and Director
of the Minnesota Population Center
(Not present: some computer gurus, researchers, research assistants, civil
service employees, and others who were not at the NSF Board meeting)
I. Statistical Confidentiality and Security
1. Statistical Confidentiality and Microdata Security
2. Statistical disclosure control protections
3. Restricted access
See pp. 3-5:
2012: “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, Cape Town, South Africa, January 2012.
1. Statistical Confidentiality and security. Trusted researcher receives customized extracts
[Diagram: data flow]
NSI 1 … NSI 62+: each NSI entrusts census metadata and anonymized microdata to MPC
MPC: integrates metadata and confidentializes microdata samples
IPUMS-International: manages access and entrusts researchers with custom-tailored <ddi>, SAS, STATA, and SPSS metadata and microdata extracts for any combination of countries, censuses, sub-populations, and variables
Trusted researcher … Trusted researcher
Dennis Trewin on-site evaluation.
former: Australian Statistician, chair: Conference of European
Statisticians Task Force on Microdata and Confidentiality
» “...the best practice for an international repository of
microdata”
» “The security of IPUMS is first class…the standard of the best
national statistical offices”
» “...a valuable and trustworthy microdata service. It meets the
fundamental principles of good practice with respect to
confidentiality and microdata.”
» “in full compliance with the principles and recommendations of
the CES [Conference of European Statisticians]”
2. Statistical Disclosure controls
1. Microdata are anonymized by suppressing any names, addresses, or precise geographic identifiers.
2. Sample is drawn so that researchers have access to only a minor fraction of the complete dataset.
3. Disclosure protections are imposed on the sample, variable-by-variable and code-by-code.
4. A small fraction of households is swapped across geographic boundaries.
• See case of Switzerland with 5% household samples for four censuses.
• Suppression thresholds are set by each NSO.
• Great satisfaction from NSOs and researchers
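The four disclosure-control steps above can be illustrated with a minimal Python sketch. This is not the IPUMS procedure itself. The field names (`name`, `address`, `region`), the default fractions, and both function names are assumptions made only for illustration, and in practice the suppression threshold would be set by each NSO.

```python
import random
from collections import Counter

def confidentialize(households, sample_fraction=0.05, swap_fraction=0.01,
                    drop_fields=("name", "address"), seed=0):
    """Hypothetical sketch: anonymize, sample, and swap household records (dicts)."""
    rng = random.Random(seed)
    # Step 1: suppress direct identifiers (copies are made; the input is not modified).
    anonymized = [{k: v for k, v in hh.items() if k not in drop_fields}
                  for hh in households]
    # Step 2: draw a household sample so only a minor fraction is released.
    n = max(1, round(len(anonymized) * sample_fraction))
    sample = rng.sample(anonymized, n)
    # Step 4: swap the (assumed) "region" code between randomly chosen pairs of households.
    n_pairs = int(len(sample) * swap_fraction / 2)
    idx = rng.sample(range(len(sample)), 2 * n_pairs)
    for a, b in zip(idx[::2], idx[1::2]):
        sample[a]["region"], sample[b]["region"] = sample[b]["region"], sample[a]["region"]
    return sample

def suppress_rare_codes(records, field, threshold, other="OTHER"):
    """Step 3, for one variable: recode any code with fewer than `threshold` cases."""
    counts = Counter(r[field] for r in records)
    for r in records:
        if counts[r[field]] < threshold:
            r[field] = other
    return records
```

Swapping is shown on a single geographic code for simplicity; a real protocol would choose which households to swap, and between which areas, according to rules not described in these slides.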
3. Restricted access: Thwarting intruders by legal and administrative procedures
» Usage is restricted to bona-fide researchers who agree to stringent conditions of use to protect statistical confidentiality
  » 1,100-word application form (shorter than the 5,300-word Facebook policy)
  » Agree to 8 specific conditions of use
  » Supply extensive personal and institution details
  » Identify your employer’s Office for Protection of Human Subjects, IRB, etc.
  » Describe research detailing need for access
» Rogue intruders face legal and institutional sanctions
  » University attorney’s office is obligated to initiate sanctions against both individual and the institution (similar to NIH probationary status)
Despite the “P” (Public) in IPUMS, access to the microdata is restricted.
Restricted Access: User Registration and Login
Links to Partner Statistical Agency Websites
Application form for IPUMS-I requesting information on institutional affiliation
Conditions of use: must agree to each one--no exceptions
√ Data must not be redistributed without authorization.
All data extracted from the IPUMS-International database are intended solely for the use of the licensee. Under IPUMS-International agreements with collaborating agencies, redistribution of the data to third parties is prohibited. Each member of a research team using the data must apply for access and be licensed individually.
√ The microdata are intended only for scholarly research and educational purposes.
These microdata are provided for the exclusive purposes of teaching and scholarly research, and may not be used for any other purposes without explicit written approval from the relevant official statistical authority.
√ Commercial use and redistribution of the microdata is strictly prohibited.
Users are prohibited from using microdata acquired from the Integrated Public Use Microdata Series International or other authorized distributors in the pursuit of any commercial or income-generating venture either privately, or otherwise.
√ Use of the microdata must follow strict rules of confidentiality.
Users will maintain the confidentiality of persons and households. Any attempt to ascertain the identity of persons or households from the microdata is prohibited. Alleging that a person or household has been identified in these data is also prohibited. Statistical results that might reveal the identity of persons or entities may not be reported or published in any form.
√ The microdata must always be safely secured.
Users will implement security measures to prevent unauthorized access to microdata acquired from Integrated Public Use Microdata Series International, its partners or authorized distributors. Upon the completion of this research, data may be retained only if they can be safely secured. If security cannot be guaranteed, the microdata must be destroyed.
√ Scholarly publications are permitted, and must be cited appropriately.
The publishing of research results based on IPUMS-International microdata is permitted in communications such as scholarly papers, journals and the like. The authors of these communications are required to cite Integrated Public Use Microdata Series-International and the relevant official statistical authority as the source of the microdata, and to indicate that the results and views expressed are those of the author. Users are requested to provide the IPUMS-International staff with a full citation for any publications resulting from their work with these data.
√ Any violation of this license agreement will result in disciplinary action, including possible loss of employment.
Violation of this agreement will lead to revocation of this license, recall of all microdata acquired, a motion of censure to the relevant professional organization(s) and civil prosecution under national or international statutes, at the discretion of the Regents of the University of Minnesota and the official statistical agencies. Sanctions likewise may be taken against the institution with which the violator is affiliated.
√ User agrees to notify ipums@pop.umn.edu regarding errors in the data.
II. Integration
4. Comprehensive Source Metadata
5. Integrated, DDI Compatible Metadata
6. Integrated Microdata
7. IPUMS-I Value-Added Variables
8. Integrated Boundary Files
See pp. 6-8:
2012: “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, Cape Town, South Africa, January 2012.
4. Comprehensive Source Documents (forms, instruction manuals) for integrated censuses
Links to Official Statistical Agency Partners
Bibliography: view cites, link to publications
5. DDI Compatible Metadata (we share!)
http://microdata.worldbank.org:
Mapped in DDI; compatible with IHSN Microdata toolkit
copies entered into the NADA catalog and archive
User Registration, conditions of use license
6. Integrated Metadata (Browse and Select Data)
Download Data Extract (and <ddi> codebook)
Source documents (forms, instruction manuals)
Link to Official Statistical Agency home pages
Bibliography: view cites, link to publications
Integrated metadata: open access, dynamically
constructed. Example: Marital Status
Page is constructed dynamically
Integrated IPUMS-I Metadata: Codes and Frequencies
Detailed, Case-Count View
2 rules:
1. Retain details
2. Harmonize everything
Page is constructed dynamically
Displays currently selected samples
Integrated IPUMS-I Metadata: Enumeration text
View text in English for any combination of countries and censuses.
2 documents: first, the form; then, the enumeration instructions (scroll down for more)
Page is constructed dynamically
Displays currently selected samples
7. Integrated Microdata (Table 2)
32 most popular integrated variables in IPUMS-International (85,505 Sample Extracts)

Rank  Label                      Extracts  Mnemonic  Comment
 1    Educational attainment     19,307    EDATTAN
 2    Age (single years to 85+)  19,009    AGE       Grouped age n=3,838
 3    Employment status          18,490    EMPSTAT
 4    Marital status             18,214    MARST
 5    Person weight              17,511    WTPER     Technical variable
 6    Relationship to head       15,783    RELATE
 7    Sex                        14,595    SEX
 8    Class of work              12,583    CLASSWK
 9    Ownership of dwelling       8,050    OWNRSHP
10    Occupation ISCO recode      8,004    OCCISCO
11    School attendance           7,919    SCHOOL
12    Years of schooling          7,576    YRSCHL
13    Literate                    7,290    LIT
14    Urban/rural                 7,098    URBAN
15    Industry-general code       7,044    INDGEN
16    Household weight            6,656    WTHH      Technical variable
Table 2 (continued). 32 most popular integrated variables in IPUMS-International (85,505 Sample Extracts)

Rank  Label                            Extracts  Mnemonic
17    Children ever born               6,363     CHBORN
18    Nativity (native/foreign born)   6,332     NATIVTY
19    Occupation                       6,246     OCC
20    Country of birth                 6,153     BPLCTRY
21    Religion                         6,075     RELIG
22    Industry                         5,670     IND
23    Location of spouse in household  5,007     SPLOC
24    Rule for locating spouse         4,171     SPRULE
25    Location of mother in hh         4,153     MOMLOC
26    Number of children surviving     4,074     CHSURV
27    Place of residence 5 years ago   4,064     MGRATE5
28    Location of father in household  3,983     POPLOC
29    Total household income           3,965     INCTOT
30    Earned income                    3,655     INCEARN
31    Number of rooms                  3,465     ROOMS
32    Consensual union                 3,443     CONSENS

Comment column (row alignment lost in extraction), in original order: IPUMS unique; IPUMS unique; IPUMS unique; IPUMS unique; Household variable; IPUMS unique
Appendix D. 42 (of 60) Integrated Household Variables:
Availability for 13 African Countries (25 Censuses)
Appendix E. 88 (of 108) Integrated Person Variables:
Availability for 13 African Countries (25 Censuses)
8. GIS Boundary files (and other Data Files)
Source documents (forms, instruction manuals)
Link to Official Statistical Agency home pages
Bibliography: view cites, link to publications
III. Dissemination
9. Trans-border Access
10. Custom-Tailored Extracts
11. Usage
12. 2010 Round Census Microdata
See pp. 9-10:
2012: “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, Cape Town, South Africa, January 2012.
9. Transborder access.
IPUMS-I Extracts by researcher’s place of identity

Place of Identity  Extracts (N)  Samples Extracted (mean)  Institutions (N)
United States      14,669        3.43                      295
France                973        2.95                       39
Spain                 972        8.34                       23
United Kingdom        961        2.74                       41
Canada                671        2.35                       35
Colombia              627        2.04                       16
Brazil                598        2.60                       22
Mexico                507        3.33                       28
Singapore             494        1.49                        4
Germany               420        3.83                       31
Austria               403        4.77                        8
Italy                 377        3.03                       27
Chile                 318        6.33                        6
Argentina             310        3.79                       18
Switzerland           283        3.92                       10
Belgium               250        2.85                        3
Australia             229        2.17                       12
Netherlands           192        7.58                        8
China                 184        2.32                       25
Japan                 170        1.68                       19
Top 20 institutions using IPUMS-I (Appendix 4)

Rank  Institution                                   N
 1    University of Michigan                        742
 2    Columbia University                           701
 3    Universitat de Barcelona, Spain               615
 4    Harvard University                            589
 5    Inter-American Development Bank               499
 6    Arizona State University                      495
 7    National University of Singapore, Singapore   467
 8    World Bank                                    408
 9    University of California - Berkeley           362
10    Universidade Federal de Minas Gerais, Brazil  314
11    University of Chicago                         285
12    Universidad del Valle, Colombia               270
13    Institute for Health Metrics & Evaluation     260
14    Princeton University                          237
15    University of Wisconsin - Madison             234
16    Brown University                              229
17    University of Vienna, Austria                 229
18    University of Pittsburgh                      227
19    University of Delaware                        213
20    El Colegio de México, México                  214
Dissemination of microdata and metadata extracts
» The massive scale of IPUMS requires users to be selective:
  » Select country (or countries)
  » Select samples (census years)
  » Select variables (e.g., age, sex, educational attainment, etc.)
  » Select sub-populations (e.g., nurses)
  » Select sample density
» Once an extract request is submitted, the IPUMS extract engine:
  » Constructs the microdata extract
  » Constructs the metadata
  » Emails the researcher to retrieve the extract
  » password protected, transmission is encrypted 128 bit SSL
» The researcher downloads the extract, un-zips and analyzes
Extract system validated as usage has soared
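The selections a user makes, and what an extract engine does with them, can be sketched in a few lines of Python. This only illustrates the idea and is not the IPUMS extract engine. `ExtractRequest`, `build_extract`, the record layout, and the example records are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ExtractRequest:
    samples: list                 # (country, census year) pairs
    variables: list               # variable names to include
    select_cases: dict = field(default_factory=dict)  # variable -> set of allowed values
    density: float = 1.0          # fraction of matching records to keep

def build_extract(records, request):
    """Hypothetical sketch: filter records, subsample, and build simple metadata."""
    rows = [r for r in records
            if (r["country"], r["year"]) in request.samples
            and all(r[v] in allowed for v, allowed in request.select_cases.items())]
    # Systematic subsample: keep every k-th matching record for the requested density.
    step = max(1, round(1 / request.density))
    rows = rows[::step]
    data = [{v: r[v] for v in request.variables} for r in rows]
    metadata = {"samples": request.samples,
                "variables": request.variables,
                "n_records": len(data)}
    return data, metadata

# Made-up records, for illustration only.
records = [
    {"country": "Kenya", "year": 1999, "age": 30, "sex": "F", "occ": "nurse"},
    {"country": "Kenya", "year": 1999, "age": 41, "sex": "M", "occ": "farmer"},
    {"country": "Ghana", "year": 2000, "age": 25, "sex": "F", "occ": "nurse"},
    {"country": "Egypt", "year": 1996, "age": 52, "sex": "M", "occ": "nurse"},
]
request = ExtractRequest(samples=[("Kenya", 1999), ("Ghana", 2000)],
                         variables=["age", "sex"],
                         select_cases={"occ": {"nurse"}})
data, metadata = build_extract(records, request)
```

Building the metadata alongside the data mirrors the slide's point that each extract comes with its own documentation.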
10. Custom tailored extracts.
www.ipums.org/international:
a. Login with password
b-1. Study documentation
b-2. Create extract
c. Receive email; logon with p/word
d-1. Download extract (SSL encrypted)
d-2. UnZip data
e. Analyze using own software
Use the extract system to “Select Cases”.
Example: Disability
Second: Click the box to include the variable
Third: Click “select cases” box
Fourth: Scroll down, select “disabled”, then “Continue to next step”
Click here, to select every person in households containing an individual with employment disability
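The household-level option described above (keep every person in any household that contains a matching individual) can be sketched in Python. The field names `hh_id` and `disab` are hypothetical placeholders rather than IPUMS variable names, and the function illustrates the logic only.

```python
def select_households_with(persons, predicate, hh_key="hh_id"):
    """Return every person living in a household where at least one person matches."""
    matching_households = {p[hh_key] for p in persons if predicate(p)}
    return [p for p in persons if p[hh_key] in matching_households]

# Made-up person records, for illustration only.
persons = [
    {"hh_id": 1, "person": 1, "disab": "disabled"},
    {"hh_id": 1, "person": 2, "disab": "not disabled"},
    {"hh_id": 2, "person": 1, "disab": "not disabled"},
]
selected = select_households_with(persons, lambda p: p["disab"] == "disabled")
```

Both members of household 1 are kept, while household 2 is dropped because none of its members match.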
2010 round censuses. Minimum Standards for Samples Entrusted to IPUMS for dissemination
1. Household samples
2. High precision: 5% minimum, 10% preferred
3. Broad set of variables: omit only those required for statistical confidentiality (low-level geography, low frequency attributes)
4. Detailed codes
  » Age: single year to 85
  » Occupation, industry: 3 digit ISCO, ISIC
  » Country of birth: detail individual countries consistent with statistical confidentiality
» Thanks to INSEE France for sample of recensement rénové, 2004-2008: 20 million person records launched in IPUMS-I
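Two of these standards are easy to express as checks. The sketch below is only an illustration: the function names are hypothetical, and it assumes that "single year to 85" means ages above 85 are grouped into the top code, in line with the "single years to 85+" label used for the AGE variable.

```python
def topcode_age(age, top=85):
    """Keep single years of age up to `top`; group older ages into the top code."""
    return min(age, top)

def sample_density_ok(sample_households, total_households, minimum=0.05):
    """Check that a household sample meets the minimum density (5% by default)."""
    return sample_households / total_households >= minimum
```

For example, `topcode_age(97)` gives 85, and a sample of 100,000 households out of 1,000,000 (10%) passes the density check.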
IV. Ethics
13. Statistical Transparency
14. Academic Freedom
15. Reduce Research Fraud and Exaggeration of Results
16. Share Research Results
See p. 11:
2012: “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, Cape Town, South Africa, January 2012.
“IPUMS-I is an excellent resource for teaching…”
-- Dr. David Lam, president Population Association of America
1. Free, easy access to data for many countries and censuses
2. Large sample sizes:
• Make it possible to include many different variables in a regression… multi-level model…
• Produce separate estimates for population sub-groups
• Easy to extract samples with a target sample size (e.g., 50 MB)
• Easy to revise an extract for a larger size or to include more countries, censuses, variables or sub-populations
3. Students show a great deal of creativity in using IPUMS-I
4. Skills acquired have an immediate pay-off when applying for jobs (e.g., World Bank), graduate school, etc.
Africa Mirror Site: http://ecastats.uneca.org/aicmd/
IPUMS-International: Free, Worldwide Microdata Access Now for Censuses of 62 Countries--80 by 2015
“Dissemination [means] opening up the value inherent in our data”
--Walter Radermacher (President, Eurostat) and Pieter Everaers (Director, Eurostat)
Robert McCaa, Steven Ruggles, Matt Sobek and Wendy L. Thomas
Session STS065 The Future of Microdata Access
58th International Statistical Institute, Dublin, Ireland, 26 August, 2011
IPUMS opens up the value inherent in census microdata:
for the 2000, 1990 and earlier rounds (where microdata exist)
And for many countries for the 2010 round
Thank you
To discuss cooperation, please speak with
Dr. Patricia Kelly-Hall or email:
rmccaa@umn.edu
To use integrated census microdata,
See: ipums.org/international or
ecastats.uneca.org/aicmd