Olivier Dupriez

World Bank, Development Data Group
Manager, International Household Survey Network (IHSN)
Addis Ababa, September 23, 2011

Two main components
◦ Metadata Editor: specialized software for
documenting any kind of microdata (surveys,
censuses, administrative records)
◦ NAtional Data Archive (NADA): an open source
application for cataloguing and dissemination
◦ (CD-Builder for dissemination)
◦ Compliant with the DDI/DCMI (XML) standards
(Data Documentation Initiative and Dublin Core)





XML metadata standards
Standard checklists of what you need to know
about a study and its dataset (DDI), and about the
related resources (DCMI)
DDI developed by academic data centers
Now used in most countries in the world, and by
various software applications (e.g. DevInfo, CSPro)
Two versions of DDI:
◦ Version 2.n (DDI codebook), used by the Toolkit
◦ Version 3.n (DDI life cycle)
“The National Statistics Office (NSO) of Popstan conducted the
Multiple Indicator Cluster Survey (MICS) with the financial
support of UNICEF. 5,000 households, representing the overall
population of the country, were randomly selected to participate
in the survey, following a two-stage stratified sampling
methodology. 4,900 of these households provided information.”
In XML this could look like this:
<titl>Multiple Indicator Cluster Survey 2005</titl>
<altTitl>MICS 2005</altTitl>
<AuthEnty>National Statistics Office (NSO)</AuthEnty>
<fundAg abbr="UNICEF">United Nations Children's Fund</fundAg>
<nation>Popstan</nation>
<geogCover>National</geogCover>
<sampProc>5,000 households, stratified two stages</sampProc>
<respRate>98 percent</respRate>
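A minimal Python sketch of how such a fragment could be read by a program, and of how the 98 percent response rate follows from the household counts in the description (4,900 of 5,000). The `<study>` wrapper and the helper names `read_fields` and `response_rate` are assumptions made for this illustration; a real DDI Codebook file nests these elements more deeply.

```python
import xml.etree.ElementTree as ET

# Hypothetical <study> wrapper: the slide shows a flat fragment, so a single
# root element is added here only to make the text parseable as XML.
FRAGMENT = """
<study>
  <titl>Multiple Indicator Cluster Survey 2005</titl>
  <altTitl>MICS 2005</altTitl>
  <AuthEnty>National Statistics Office (NSO)</AuthEnty>
  <fundAg abbr="UNICEF">United Nations Children's Fund</fundAg>
  <nation>Popstan</nation>
  <geogCover>National</geogCover>
  <sampProc>5,000 households, stratified two stages</sampProc>
  <respRate>98 percent</respRate>
</study>
"""


def read_fields(xml_text):
    """Return {tag: text} for each direct child of the root element."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def response_rate(responding, selected):
    """Response rate in percent, computed from household counts."""
    return 100.0 * responding / selected


fields = read_fields(FRAGMENT)
rate = response_rate(4900, 5000)  # counts taken from the study description

print(fields["titl"])
print(f"{rate:.0f} percent")
```

Because every item sits in a named element, a program can pick out the title or the funding agency without having to interpret free text.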

Can be transformed into many kinds of outputs:
◦ HTML
◦ PDF
◦ Databases
◦ Others
Plain text files → not specific to any operating
system or application (“durable” metadata)
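As an illustration of one such transformation, the sketch below turns a small XML metadata fragment into an HTML table using only the Python standard library. The `<study>` root element, the `LABELS` mapping and the function name `metadata_to_html` are assumptions for this example; it is not how the Toolkit itself produces its outputs.

```python
import html
import xml.etree.ElementTree as ET

# Illustrative display labels; they are not defined by the DDI standard.
LABELS = {"titl": "Title", "nation": "Country", "respRate": "Response rate"}


def metadata_to_html(xml_text):
    """Render each child element of the root as one row of an HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for child in root:
        label = LABELS.get(child.tag, child.tag)  # fall back to the tag name
        value = (child.text or "").strip()
        rows.append(
            f"<tr><th>{html.escape(label)}</th><td>{html.escape(value)}</td></tr>"
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"


sample = "<study><titl>MICS 2005</titl><nation>Popstan</nation></study>"
page = metadata_to_html(sample)
print(page)
```

The same parsed elements could just as well be written to a database or a PDF generator; only the rendering step changes, the metadata file does not.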

Metadata Editor
◦ By Nesstar Ltd (“Nesstar Publisher”) with IHSN
support
◦ Now freeware
◦ Development benefited from many users’ feedback
◦ Available at www.ihsn.org/toolkit

NADA, (CD-Builder)
◦ By the World Bank / IHSN
◦ Available at www.ihsn.org/nada
The Metadata Editor is a tool for
preparing and packaging your data and
metadata, not a tool for dissemination!
This DDI (+DCMI) file is ready to be “transformed”,
e.g. by being published in a NADA catalog.
▪ Replicability, transparency
▪ Visibility
▪ Credibility
▪ Institutional memory
▪ Knowledge generation (if microdata are
disseminated) → increases and demonstrates the
value of data → more funding
▪ Satisfy a legal requirement in some countries
▪ Participate in the Open Data / Data Liberation
movement
◦ Reports, tables (PDF): Web development tool
◦ On-line tabulation (and analysis) tool: REDATAM, SuperStar, Nesstar, Tableau, etc.
◦ Indicators: CensusInfo, DevInfo, etc.
◦ Microdata (n% sample): IHSN Metadata Editor and NADA
◦ Metadata: IHSN Metadata Editor and NADA
◦ Microdata, full, raw and edited versions: IHSN Metadata Editor

Guidelines for documenting a dataset using
the IHSN Toolkit
http://www.ihsn.org/home/index.php?q=tools/documentation

Formulating an access policy and procedures
http://www.ihsn.org/home/index.php?q=focus/dissemination-microdata-files-principles-procedures-and-practices

Long term preservation of data and metadata
◦ Based on OAIS “standard”
◦ Complex; useful as a “technical audit manual”
http://www.ihsn.org/home/index.php?q=tools/preservation


Country experience: Statistics Canada’s Data
Liberation Initiative (forthcoming)
Other IHSN manuals (being drafted):
◦ Producing public use census sample files
◦ Anonymizing microdata

Countries
◦ Comply with the DDI standard
◦ Produce a sample dataset (n%) for public (free)
dissemination of microdata
◦ Publish a formal microdata management and
dissemination policy
◦ Assess your preservation policy/procedures
◦ Preserve all versions of your census data
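The n% sample dataset recommended above can be pictured with a short Python sketch. It draws a simple random sample without replacement from a list of records; the function name `draw_sample`, the `hh_id` field and the 10 percent rate are assumptions for this illustration. It does not anonymize anything, and an actual public use file may call for a different sampling design.

```python
import random


def draw_sample(records, percent, seed=0):
    """Simple random sample of about `percent` % of records, without replacement.

    A fixed seed makes the draw reproducible, so the same public file can be
    regenerated later from the archived full dataset.
    """
    size = round(len(records) * percent / 100)
    rng = random.Random(seed)
    return rng.sample(records, size)


# Hypothetical census file with 1,000 household records.
households = [{"hh_id": i} for i in range(1, 1001)]
public_file = draw_sample(households, percent=10)
print(len(public_file))
```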

International agencies
◦ Develop a central census catalog (UNSD?)
◦ Develop anonymization guidelines
◦ Support the establishment of data archives

Accelerated Data Program (PARIS21/WB)
◦ Training and technical support for data archiving
◦ Contacts:
▪ Olivier Dupriez at the World Bank
(odupriez@worldbank.org)
▪ Francois Fonteneau at PARIS21
(francois.fonteneau@oecd.org)