Census Data Archiving Seminar 20 – 23 September, 2011.

Ethiopia
The Gambia’s Experience
Session 10: Country case studies on census data archiving
The Gambia Bureau of Statistics (GBOS)
What is a Population Census?
• A Population Census is the official
enumeration of persons in a country at a
specific time.
• This enumeration also implies the collection,
compilation, evaluation, analysis and
publication of demographic, social and
economic statistics relating to the population.
Objectives of the census
• The objectives of the Census are to count all the people
in the country and to provide the Government with their
number in each Local Government Area and District, by
age, sex and several other characteristics.
• These figures are required for various aspects of
economic and development planning. The ultimate aim
of such planning is to provide a better way of life for the
people of the Gambia, and to conquer what have been
called the Five Giants: Disease, Ignorance, Squalor,
Idleness and Want.
• Census Data Archiving:
• Provides a central access point for the statistical
information of a country. Demand for census microdata
from developing countries is increasing, particularly
for objective monitoring and evaluation of poverty
reduction policies and for tracking progress on the MDGs.
With it comes the need to archive census data to allow
easy access. The archiving tools store, organise and
present data about the indicators and variables collected
during enumeration.
Geographic requirements
• The geographic area of the census consists of:
• Level 1 – Country: The Gambia
• Level 2 – Regions: LGAs
• Level 3 – Provinces: Districts
• Level 4 – Villages: Settlements
Databases
• Censuses:
• 1963: Not available
• 1973: Hard copies
• 1983: Archived on tape
• 1993: Archived on Toolkit and other formats
• 2003: Archived on Toolkit and other formats
Getting Started
• Census archiving consists of the following
elements:
– Databases
– Questionnaires
– Manuals
– Programs
– Sample frames
– Urban / Rural definition
– All other technical documents involved
in the census process
Enumeration & Receiving of Questionnaires from the Field
• Train the enumerators and supervisors
• Deploy the selected enumerators and supervisors to the
field and run the pilot census: enumeration, coding, data
entry and table production. Correct the errors found.
• Start the actual census enumeration
• Collect all completed EAs from the DCOs
• Scrutinise the filled questionnaires at random
• Count the number of EAs for all LGAs from the returning DCOs
• Collect the quick count form (GPC1), which summarises
the enumerated population at settlement level.
Provisional census results were compiled. When the
preliminary figures were reviewed and accepted, they
were released for publication.
• Archive the census questionnaires by LGA on the shelves or
racks in the archiving room.
• Place a clearly labelled inventory form on each EA
• Enter the date and sign whenever the EA folder is moved
from the rack, so as to keep track of the history of the EA's
movement from the archive room.
• If an EA folder is returned to the archive room, the return
should be dated and signed (indicating return to rack).
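The inventory form described above can be mirrored electronically. The following is a minimal sketch, assuming one record per EA folder and one entry per movement; the class names, the staff identifiers and the dates are hypothetical and are not part of the GBOS procedure.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Movement:
    """One line on the inventory form: who took the folder and when."""
    taken_by: str
    date_out: date
    date_in: Optional[date] = None  # None while the folder is away from the rack


@dataclass
class EAFolder:
    ea_code: str
    lga: str
    history: List[Movement] = field(default_factory=list)

    def is_on_rack(self) -> bool:
        # On the rack if it never moved, or if the last movement was closed.
        return not self.history or self.history[-1].date_in is not None

    def check_out(self, taken_by: str, when: date) -> None:
        if not self.is_on_rack():
            raise ValueError(f"EA {self.ea_code} is already out of the archive room")
        self.history.append(Movement(taken_by, when))

    def check_in(self, when: date) -> None:
        if self.is_on_rack():
            raise ValueError(f"EA {self.ea_code} is already on the rack")
        self.history[-1].date_in = when


# Hypothetical example: a coder takes the folder and returns it four days later.
folder = EAFolder(ea_code="10001", lga="Banjul")
folder.check_out("coder-07", date(2003, 5, 2))
folder.check_in(date(2003, 5, 6))
```

Refusing a second check-out while the folder is away is the electronic equivalent of an unsigned return on the paper form: the gap in the history is caught at once.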
Coding & Data Entry Processing
• Receive EAs from the Questionnaire Administrators (QA)
in the archiving room; enter the date received and
sign
• Coders are assigned ID numbers to keep track of coder
errors during EA coding
• After coding is finished, return the folder to the archive
room, dated and signed on the inventory form, to keep
the history and movement of the folder.
• Data entry clerks receive EAs (folders) from the QA,
dated and signed, to start data entry keying. After an EA is
completed, return it to the storage room, dated and
signed, to keep track of the EA movement.
• The process continues until all the EAs are coded and
data entry is completed.
Data Cleaning & Tabulation
• Clean the data files EA by EA until they are clean
• Concatenate the data again
• Draw your table formats and test that all formats are
consistent with the table headings
• Tabulation & Analysis
• Produce the tables
• Hand over the tables to the statisticians to check
consistency. Correct errors until complete.
Report writing
• Topics will be divided among report writers
• Each report writer will focus on his/her area
• Census reports are as follows:
– Spatial distribution
– Fertility
– Mortality
– Migration
– Elderly
– Disability
– Housing Characteristics
– Building
– Directory of Settlement
– Economic Characteristics
– Settlement Profile
– Census Atlas
– Education
– Gender
– Methodology
– etc.
Archiving
Documents Preparation
• Convert all documents to PDF format
• Divide them into 5 groups: Data folder, Docs, Misc,
Programs, and Technical Documents
Folders
• Data Folder: Census data in SPSS, CSPro or other formats.
• Programs: Edits, Sums, Ap, Tab and Cn.
• Docs: Fmts, Dcf, Manuals, Questionnaires, Batch ID, etc.
• Technical Docs: Programs, Sample frame, etc.
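The five-group layout above could be created with a short script before the files are copied in. This is only a sketch: the folder names follow the slide, while the function name and the use of a temporary directory for the demonstration are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import List

# The five groups named in the slide.
ARCHIVE_GROUPS = ["Data", "Docs", "Misc", "Programs", "Technical Documents"]


def create_archive_skeleton(root: Path) -> List[Path]:
    """Create one sub-folder per archive group under root and return their paths."""
    created = []
    for name in ARCHIVE_GROUPS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(folder)
    return created


# Demonstration in a throw-away directory; a real archive would use a fixed root.
with tempfile.TemporaryDirectory() as tmp:
    for path in create_archive_skeleton(Path(tmp) / "Gam_census_2003"):
        print(path.name)
```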
Microdata Management Toolkit
• Start by clicking the Metadata Editor
• Click on the census name, e.g. Gam_census_household
• The study content will appear, i.e. Document Description,
Study Description, Datasets and External Resources
Document Description
• Study Title: POPULATION AND HOUSING CENSUS 2003
• Metadata Producer: GAMBIA BUREAU OF STATISTICS; LOLLEY KAH JALLOW; SAI JASSEH
• Date of Production: 2007-02-27
• DDI Document Version: VERSION 1, 2003
• DDI Document ID Number: DDI-GMB-2003-001
• Under the document description you have the metadata preparation
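For readers who want to see what the Metadata Editor stores, the document description fields can be written out as a small DDI Codebook style XML fragment. The sketch below is simplified and makes assumptions: it uses element names from the DDI Codebook vocabulary (codeBook, docDscr, citation, titlStmt, prodStmt, verStmt) but omits namespaces and is not validated against the schema, and the function name is hypothetical. It is not the Toolkit's own export.

```python
import xml.etree.ElementTree as ET


def build_doc_description(title, id_number, producers, production_date, version):
    """Return a simplified codeBook element holding the document description."""
    code_book = ET.Element("codeBook")
    doc_dscr = ET.SubElement(code_book, "docDscr")
    citation = ET.SubElement(doc_dscr, "citation")

    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title
    ET.SubElement(titl_stmt, "IDNo").text = id_number

    prod_stmt = ET.SubElement(citation, "prodStmt")
    for name in producers:
        ET.SubElement(prod_stmt, "producer").text = name
    ET.SubElement(prod_stmt, "prodDate", date=production_date).text = production_date

    ver_stmt = ET.SubElement(citation, "verStmt")
    ET.SubElement(ver_stmt, "version").text = version
    return code_book


# Values taken from the slide above.
doc = build_doc_description(
    title="POPULATION AND HOUSING CENSUS 2003",
    id_number="DDI-GMB-2003-001",
    producers=["GAMBIA BUREAU OF STATISTICS", "LOLLEY KAH JALLOW", "SAI JASSEH"],
    production_date="2007-02-27",
    version="VERSION 1, 2003",
)
print(ET.tostring(doc, encoding="unicode"))
```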
Study Description:
• Identification, Version, Overview, Scope, Coverage,
Producers and Sponsors, Data Collection, Data Processing,
Data Access, Disclaimer and Copyright, and Contacts
• A description is filled in for each of the above.
Identification
• Title: POPULATION AND HOUSING CENSUS 2003
• Subtitle: POP CEN 2003
• Abbreviation: POPCEN2003
• Study Type: Population and Housing Census [hh/popcen]
• Series Information: 1.1 Background of the Census
A population census is defined as the total process of collecting, compiling, evaluating, analysing and publishing or
otherwise disseminating demographic, economic and social data pertaining, at a specified time, to all persons in a
country or in a well-delimited part of the country. A housing census is the total process of collecting, compiling,
evaluating, analysing and publishing or otherwise disseminating statistical data pertaining, at a specified time, to
all living quarters and occupants thereof in a country or in a well-delimited part of a country. The 2003 Population
and Housing Census of The Gambia was in accordance with these definitions. Further, it contained all the essential
features of a census, namely individual enumeration, universality within the country and simultaneity.
Version
• Description: Version 1: Final datafile and census tables; the final reports of the
14 analysts are not included (not ready).
• Production Date: 2003-04-15
• Notes: VERSION 1, FINAL DATAFILE AND CENSUS TABLES. BUT THE FINAL 14 REPORTS
NOT INCLUDED. (NOT READY)
Overview:
• Abstract: POPULATION AND HOUSING CENSUS 2003. CENSUSES ARE CONDUCTED EVERY TEN YEARS TO
COUNT THE ACTUAL POPULATION OF THE COUNTRY. THE COUNT IS LATER USED FOR PLANNING PURPOSES
AND DECISION MAKING.
THE CENSUS WAS CONDUCTED ON 15TH APRIL 2003. THIS IS THE FIFTH CENSUS CONDUCTED IN THE
GAMBIA, FUNDED BY GAMBIA GOVERNMENT THROUGH WORLD BANK. OTHER DONOR AGENCIES ALSO
PROVIDED EQUIPMENT, E.G. UNFPA.
• Kind of Data: Census/enumeration data [cen]
• Unit of Analysis: HOUSEHOLDS, INDIVIDUALS
Scope
• Description of Scope: HOUSEHOLDS, INDIVIDUALS, BUILDINGS
Coverage
• Geographic Coverage: WHOLE COUNTRY
• Universe: COVERS EVERYBODY
Producers & Sponsors
• Primary Investigator: GAMBIA BUREAU OF STATISTICS
• Other Producers: GAMBIA GOVERNMENT, WORLD BANK
• Funding: GAMBIA GOVERNMENT, WORLD BANK
• Other Acknowledgments: ALIEU S. M. NDOW, LOLLEY KAH JALLOW, GAMBIA GOVERNMENT
Data Collection
• Dates of Collection: 2003-04-15
• Time Periods: 2003-04-15
• Mode of Data Collection: Face-to-face [f2f]
• Notes on Data Collection: A FEW RESPONDENTS REFUSED TO BE INTERVIEWED. INTERVIEWING LASTED THREE WEEKS, WITH A FEW CALL-BACKS.
• Questionnaires: FORM A - NORMAL HOUSEHOLD, FORM B - INSTITUTIONAL POPULATION, FORM C - BUILDING AND COMPOUND PARTICULARS
AND FORM G - GRADUATE CARD.
• Supervision: SUPERVISION FOLLOWED THE CENSUS METHODOLOGY.
GPC 6 WAS A FORM FILLED BY SUPERVISORS FOR SPOT CHECKS IN THE COMPOUNDS, TO CHECK THE QUALITY OF
ENUMERATION.
Data Processing
• Data Editing
• DATA CLEANING STARTED DURING DATA ENTRY
(100 PERCENT VERIFICATION): STRUCTURE
CHECKS, RANGE CHECKS AND VALIDATION OF
VARIABLES, LOGIC CONTROL AND CONCOR EDIT
CHECKS.
Access
• Access Authority: GAMBIA BUREAU OF STATISTICS
• Confidentiality: SEEK APPROVAL FROM GOVERNMENT AND GBOS BEFORE DISSEMINATING THE DATA.
• Access Conditions: THIS DATA SHOULD NOT BE GIVEN FOR COMMERCIAL PURPOSES, ONLY FOR RESEARCH PURPOSES.
THE DATA SHALL NOT BE USED FOR ANY OTHER PURPOSE THAN THE ABOVE SPECIFIED REQUEST.
ALL COST INVOLVED IN MAKING THE DATA AVAILABLE SHALL BE THE RESPONSIBILITY OF THE REQUESTER.
• Citation Requirement: POPULATION AND HOUSING CENSUS, 2003. GAMBIA BUREAU OF STATISTICS
Disclaimer and copyright:
• Disclaimer
• THE GAMBIA BUREAU OF STATISTICS (GBOS) PROVIDES THIS DATA TO
EXTERNAL USERS WITHOUT ANY RESPONSIBILITY IMPLIED. GBOS ACCEPTS
NO RESPONSIBILITY FOR THE RESULTS FROM THE USE OF THIS DATA.
• Copyright
• USERS HAVE NO RIGHT TO COPY THE CD WITHOUT THE CONSENT OF
GBOS.
Contacts
• Contact Persons
• ALIEU S.M. NDOW, SG
• LOLLEY KAH JALLOW, P.I.A
Dataset
• 2003 Household
• Variable Count
• 38
• Case Count
• 157,114
Dataset cont.
• 2003 Individual
• Variable Count
• 99
• Case Count
• 1,360,681
Dataset cont.
• 2003 Building
• Variable Count
• 501
• Case Count
• 112,200
File Description.
• Contents: HOUSEHOLD DATA FILE 2003 CENSUS
• Producer: GBOS
• Version: VERSION 1 (FINAL)
• Processing Checks:
– STRUCTURE CHECKS DURING AND AFTER DATA ENTRY
– RANGE CHECKS DURING DATA ENTRY
– CONSISTENCY CHECKS
– VALIDATION CHECK ON VARIABLES
• Missing Data: USING CODE 9 OR 99 OR 999 FOR BLANK
• Notes: DATA PROCESSING WAS DONE WITH 100% VERIFICATION.
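The range checks and the missing-data convention above (9, 99 or 999 for a blank, depending on field width) can be sketched as follows. The variable names, field widths and valid ranges are hypothetical examples, not the actual 2003 census edit specifications, and this is not the CSPro or CONCOR logic itself.

```python
# Field width (in digits) -> code that stands for a blank, as stated in the slide.
MISSING_CODES = {1: 9, 2: 99, 3: 999}

# Hypothetical edit rules: variable name -> (lowest valid, highest valid, field width).
RULES = {
    "tenure": (1, 5, 1),
    "hh_size": (1, 98, 2),
}


def is_missing(value: int, width: int) -> bool:
    """True if the value is the blank code for a field of this width."""
    return value == MISSING_CODES[width]


def range_check(record: dict, rules: dict) -> list:
    """Return the names of variables whose values fall outside the valid range."""
    errors = []
    for name, (low, high, width) in rules.items():
        value = record[name]
        if is_missing(value, width):
            continue  # a blank is recorded as missing, not as a range error
        if not low <= value <= high:
            errors.append(name)
    return errors


print(range_check({"tenure": 7, "hh_size": 99}, RULES))  # ['tenure']
```

Treating the blank code separately from out-of-range values keeps genuine keying errors apart from information that was simply not reported.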
Variables (e.g.)
• Lga: banjul
• District: banjul_south
• Vil: 1
• Ur/ru: urban
• Ea: 10001
• Comp: 001
• Hh: 01
etc. for all three questionnaires
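The identification variables above form the geographic key of each household record. The sketch below shows one way to hold them; it assumes, for illustration only, that EA, compound and household number together identify a household, which the slides do not state. The class name and the key format are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HouseholdId:
    """Identification variables of one household record."""
    lga: str
    district: str
    vil: int
    ur_ru: str
    ea: str    # kept as text so that codes are stored exactly as keyed
    comp: str  # "001" would lose its leading zeros if stored as a number
    hh: str

    def key(self) -> str:
        # Assumed composite key: EA + compound + household number.
        return f"{self.ea}-{self.comp}-{self.hh}"


# Values from the example on the slide.
example = HouseholdId(
    lga="banjul", district="banjul_south", vil=1, ur_ru="urban",
    ea="10001", comp="001", hh="01",
)
print(example.key())  # 10001-001-01
```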
External Resources
• Reports
• Questionnaires
• Technical Documents
• Programs
Reports
Resource
DOCS\2003 census topics.pdf
DOCS\COMPOUND IN SETTLEMENT.pdf
DOCS\Enumerator_manual.pdf
DOCS\OCCUPATION CODE.pdf
DOCS\INDUSTRY CODE.pdf
DOCS\DIRECTORYS04.pdf
Questionnaire
Resource
DOCS\CENSUS FORM A.pdf
Technical Documents
• Resource
• DOCS\TECHNICAL DOCUMENT\BUILDING.FMT
Programs
• Resource
• PROGRAMS\BUILD EDIT CN.pdf
Advantages of Archiving
• Leads in developing statistical data and information of high
quality, and advances their effective use in both public and
private policy decision making
• A powerful tool that facilitates the release of census
microdata to the user community and allows the institution to
exercise control over its data.
• Data access portal: provides a clear way to implement data
dissemination policies while giving full control to the publishing
institution. It does so by providing information on data access
policies and access to procedural forms and licences for:
• Public use files
• Licensed files
• Data enclave admission
Our Mission
The Gambia National Data Archive has been established to:
• Promote best practice and international standards for the documentation of
microdata amongst data producers in the country
• Provide equitable access to microdata in the interest of all citizens, by protecting
confidentiality and following international recommendations and good practice
• Promote the effective use of existing census data for statistical and research
purposes, thereby encouraging a diverse range of analytical work through
secondary research
• Ensure the long-term preservation of microdata and the related metadata, and
their continued viability and usability in the future
The Gambia National Data Archive pursues these objectives within the framework
of the Statistical Act and the United Nations Fundamental Principles of Official Statistics.
Where microdata cannot be released due to confidentiality or other reasons, the
National Data Archive provides the public with detailed metadata and other
publicly available materials.
Our activities
• Activities: Include the acquisition, documentation, anonymization,
dissemination, and preservation of microdata and related metadata.
• Acquisition: The archive was primarily established to archive census microdata
produced by GBOS and other official data producers. It also serves as a repository
for non-official datasets. Data producers interested in depositing data in the data
archive are invited to contact us.
• Documentation: Data documentation serves several important functions.
It helps data producers build institutional memory, and helps researchers to:
– Find the data they are interested in
– Locate the datasets and variables that meet their research requirements
– Understand what the data are measuring and how the data have been created,
and assess their quality
– Understand the survey design and the methods used when collecting and
processing the data, thereby reducing the risk that data will be misunderstood or
misused.
• The Gambia National Data Archive adopted the Data Documentation Initiative
(DDI) and the Dublin Core (DCMI) international metadata standards.
Cont.
• Anonymization: The Gambia Bureau of Statistics has legal and ethical obligations to
protect the confidentiality of census respondents. The Gambia National Data Archive aims to
protect the confidentiality of the data by:
– Restricting access to data that present a potential disclosure risk to scrutinized users only,
under formal conditions.
– Anonymizing data when necessary, by altering or suppressing variables which could potentially
identify a physical or legal individual. This may make the data less useful for analysts. The
National Data Archive seeks to minimise the information loss while ensuring an acceptable
level of disclosure risk. Principles and methods applied for measuring the risk and for
anonymizing data are those provided or recommended by the International Household
Survey Network.
• Dissemination: Data dissemination increases the quality, use and potential impact of data,
by:
1. Making it possible for analytical work to be replicated, a critical step to good science;
2. Creating the potential to use old data to test new ideas;
3. Reducing the costs of data collection and the burden on respondents, by avoiding the
need for researchers to undertake their own surveys;
4. Demonstrating transparency and credibility in data production, which are at the heart
of good governance; and
5. Improving the relevance and quality of data by incorporating users' feedback in future
data collection.
Cont.
• Making microdata available also has downsides. It exposes data producers to
criticism, it increases the risk of breaches of confidentiality, and it can result in conflicting
outputs being generated. Having faith in the ethical conduct of data users and in their
willingness to contribute to the quality and usefulness of the data, The Gambia National Data
Archive considers that the benefits outweigh the disadvantages. We insist, however, that
access to microdata must not be seen as a right. Access will only be permitted to bona fide
users, and for statistical and research purposes only.
• Preservation: Micro-datasets can be damaged or lost because of human error, because of
technical problems, or because of disasters such as fire or flood. New technologies can also
render old data unreadable, because of either hardware or software advances. The Gambia
National Data Archive is implementing standard procedures for ensuring the physical security
and long-term usability of its resources, together with associated backup arrangements for
minimizing the impact of adverse events.
• Policies and procedures: Micro-datasets are categorised into three groups, according to the
sensitivity of their content and their inherent disclosure risks:
– Public use files: made available online to all interested users, for research
– Licensed files: involve a signed agreement between GNDA and external trusted
users, permitting them to access semi-anonymized datafiles.
– Files accessible on-site (data enclave): sensitive data; access is only provided on
site in our data enclave under strict conditions, and only for research purposes.
– GNDA: scrutinises the generated outputs in a full disclosure review before they are
released.
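The three access categories above amount to a simple decision rule, sketched below. The enum, the function name and its parameters are hypothetical and only restate the conditions listed on the slide; the actual GNDA procedures (forms, licences and the disclosure review of outputs) involve more than this.

```python
from enum import Enum


class AccessTier(Enum):
    PUBLIC_USE = "public use file"
    LICENSED = "licensed file"
    ENCLAVE = "data enclave"


def may_access(tier: AccessTier, research_purpose: bool,
               signed_agreement: bool = False, on_site: bool = False) -> bool:
    """Return True if the stated conditions for the tier are met."""
    if not research_purpose:
        return False  # access is for statistical and research purposes only
    if tier is AccessTier.LICENSED:
        return signed_agreement  # requires a signed agreement with GNDA
    if tier is AccessTier.ENCLAVE:
        return on_site  # sensitive data can be used on site only
    return True  # public use files are open to all interested researchers


print(may_access(AccessTier.LICENSED, research_purpose=True))  # False
```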
Way forward:
• The seminar is timely, as it will help the NSOs to work on one platform and to
identify the core challenges in census data archiving
• It will help in formulating and implementing an effective archiving plan suited to the
needs and requirements of NSOs
• The seminar will help in identifying good practices and lessons learned in census data
archiving.
• Statisticians should give census documents to the archivists on time so that archiving
can start on time. (laugh)
• DevInfo / GamInfo can also be used for archiving and disseminating census
indicators
• Special thanks to my fellow participants for working hard and sharing their countries'
experience of census data archiving.
• Thanks to UNSD and UNECA for their idea of organising this timely seminar
• Bravo
End of Presentation By:
Mrs. Lolley Kah Jallow
Principal Information Analyst
Gambia Bureau Of Statistics.
(GBOS)
Thank You.