Census Data Archiving Seminar, 20th – 23rd September 2011, Ethiopia
Session 10 – Country case studies on census data archiving
The Gambia’s Experience
The Gambia Bureau of Statistics (GBOS)

What is a Population Census?
• A Population Census is the official enumeration of persons in a country at a specific time.
• This enumeration also implies the collection, compilation, evaluation, analysis and publication of demographic, social and economic statistics relating to the population.

Objectives of the census
• The objectives of the Census are to count all the people in the country and to provide the Government with their number in each Local Government Area (LGA) and District, by age, sex and several other characteristics.
• These figures are required for various aspects of economic and development planning. The ultimate aim of such planning is to provide a better way of life for the people of The Gambia, and to conquer what have been called the Five Giants: Disease, Ignorance, Squalor, Idleness and Want.

Census Data Archiving
• Provides a central access point for the statistical information of a country.
• Demand for census microdata from developing countries is increasing, particularly with the emphasis on objective monitoring and evaluation of poverty reduction policies and on tracking progress on the MDGs. With it comes the need to archive census data to allow easy access.
• The tool stores, organises and presents data about the indicators and variables collected during enumeration.

Geographic requirements
• The geographic area of the census consists of:
– Level 1, Country: The Gambia
– Level 2, Regions: LGAs
– Level 3, Provinces: Districts
– Level 4, Villages: Settlements
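
As an illustration of how the four geographic levels relate to the census objective of reporting counts by LGA and District, the sketch below sums settlement-level counts up through each higher level. It is only a minimal example: the function name `roll_up`, the place names and the counts are invented placeholders, not GBOS data or GBOS software.

```python
from collections import defaultdict

# Order of the geographic levels, from Level 1 down to Level 4.
LEVELS = ("country", "lga", "district", "settlement")


def roll_up(settlement_counts):
    """Sum settlement-level counts up to every higher geographic level.

    Each key is a (country, lga, district, settlement) tuple; the result
    maps every prefix of that tuple to the total count beneath it.
    """
    totals = defaultdict(int)
    for path, count in settlement_counts.items():
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += count
    return dict(totals)


# Placeholder names and numbers, for illustration only.
sample = {
    ("The Gambia", "LGA A", "District 1", "Settlement x"): 120,
    ("The Gambia", "LGA A", "District 1", "Settlement y"): 80,
    ("The Gambia", "LGA A", "District 2", "Settlement z"): 50,
    ("The Gambia", "LGA B", "District 3", "Settlement w"): 30,
}
totals = roll_up(sample)
```

Looking up `totals[("The Gambia", "LGA A")]` gives the LGA total, and `totals[("The Gambia",)]` gives the national total, so one pass over the settlement figures serves every level.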

Databases
• Censuses:
– 1963: Not available
– 1973: Hard copies
– 1983: Archived on tape
– 1993: Archived on the Toolkit and in other formats
– 2003: Archived on the Toolkit and in other formats

Getting Started
• Census archiving consists of the following elements:
– Databases
– Questionnaires
– Manuals
– Programs
– Sample frames
– Urban / Rural definition
– All other technical documents involved in the census process

The Enumeration & Receiving of Questionnaires from the Field
• Train the enumerators and supervisors.
• Deploy the selected enumerators and supervisors to the field and start the pilot census: enumeration, coding, data entry and production of tables. Correct the errors found.
• Start the actual census enumeration.
• Collect all completed EAs from the DCOs.
• Scrutinise the filled questionnaires at random.
• Count the number of EAs for all LGAs from the returning DCOs.
• Collect the quick count form (GPC1) as a summary of the enumerated population at settlement level. Provisional census results were compiled. When the preliminary figures had been reviewed and accepted, they were released for publication.
• Archive the census questionnaires by LGA on the shelves or racks in the archiving room.
• Place a clearly labelled inventory form on each EA.
• Enter the date and sign whenever the EA folder is moved from the rack, so as to keep track of the history of the EA's movement from the archive room.
• If the EA folder is returned to the archive room, the form should be dated and signed (indicating return to the rack).

Coding & Data Entry Processing
• Receive EAs from the Questionnaire Administrators (QA) in the archiving room; enter the date received and sign.
• Coders are assigned ID numbers to keep track of coders' errors during EA coding.
• After coding is finished, return the folder to the archive room, dated and signed on the inventory form, to keep the history and movement of the folder.
• Data entry clerks receive EAs (folders) from the QA, dated and signed, to start data entry keying.
• Return the EA to the storage room, dated and signed, after the EA is completed, to keep track of the EA's movement.
• The process continues until all the EAs are coded and data entry is completed.

Data Cleaning & Tabulation
• Clean the data files, EA by EA, until no errors remain.
• Concatenate the data again.
• Draw up the table formats and test that all formats are consistent with the table headings.

Tabulation & Analysis
• Produce the tables.
• Hand over the tables to the Statisticians to check for consistency. Correct errors until complete.

Report writing
• Topics will be divided among the report writers.
• Each report writer will focus on his/her area.
• Census reports are as follows:
– Spatial distribution
– Fertility
– Mortality
– Migration
– Elderly
– Disability
– Housing Characteristics
– Building
– Directory of Settlement
– Economic Characteristics
– Settlement Profile
– Census Atlas
– Education
– Gender
– Methodology
– etc.

Archiving Documents Preparation
• Convert all documents to PDF format.
• Divide them into 5 groups: Data, Docs, Misc, Programs and Technical Documents folders.
• Data folder: census data in SPSS, CSPro or other formats.
• Programs: Edits, Sums, Ap, Tab and Cn.
• Docs: Fmts, Dcf, Manuals, Questionnaires, Batch ID, etc.
• Technical Docs: Programs, Sample frame, etc.

Microdata Management Toolkit
• Start by clicking the Metadata Editor.
• Click on the census name, e.g. Gam_census_household.
• The study content will appear, i.e.
• Document Description, Study Description, Datasets and External Resources

Document Description
• Study Title
– POPULATION AND HOUSING CENSUS 2003
• Metadata Producer
– GAMBIA BUREAU OF STATISTICS
– LOLLEY KAH JALLOW
– SAI JASSEH
• Date of Production
– 2007-02-27
• DDI Document Version
– VERSION 1, 2003
• DDI Document ID Number
– DDI-GMB-2003-001
• The metadata preparation details are recorded under the Document Description.

Study Description
• Identification, Version, Overview, Scope, Coverage, Producers and Sponsors, Data Collection, Data Processing, Data Access, Disclaimer and Copyright, and Contacts
• A description is filled in for each of the above.

Identification
• Title
– POPULATION AND HOUSING CENSUS 2003
• Subtitle
– POP CEN 2003
• Abbreviation
– POPCEN2003
• Study Type
– Population and Housing Census [hh/popcen]
• Series Information
– 1.1 Background
– A population census is defined as the total process of collecting, compiling, evaluating, analysing and publishing or otherwise disseminating demographic, economic and social data pertaining, at a specified time, to all persons in a country or in a well-delimited part of the country. A housing census is the total process of collecting, compiling, evaluating, analysing and publishing or otherwise disseminating statistical data pertaining, at a specified time, to all living quarters and occupants thereof in a country or in a well-delimited part of a country. The 2003 Population and Housing Census of The Gambia was in accordance with these definitions. Further, it contained all the essential features of a census, namely individual enumeration, universality within the country and simultaneity.

Version
• Description
– Version 1: Final data file and census tables; the final reports of the 14 analysts are not included (not ready).
• Production Date
– 2003-04-15
• Notes
– VERSION 1, FINAL DATAFILE AND CENSUS TABLES. BUT THE FINAL 14 REPORTS NOT INCLUDED.
(NOT READY)

Overview
• Abstract
– POPULATION AND HOUSING CENSUS 2003. CENSUSES ARE CONDUCTED EVERY TEN YEARS TO COUNT THE ACTUAL POPULATION OF THE COUNTRY. THE RESULTS ARE LATER USED FOR PLANNING PURPOSES AND DECISION MAKING.
– THE CENSUS WAS CONDUCTED ON 15TH APRIL 2003. THIS IS THE FIFTH CENSUS CONDUCTED IN THE GAMBIA, FUNDED BY THE GAMBIA GOVERNMENT THROUGH THE WORLD BANK. OTHER DONOR AGENCIES, E.G. UNFPA, ALSO PROVIDED EQUIPMENT.
• Kind of Data
– Census/enumeration data [cen]
• Unit of Analysis
– HOUSEHOLDS
– INDIVIDUALS

Scope
• Description of Scope
– HOUSEHOLDS
– INDIVIDUALS
– BUILDINGS

Coverage
• Geographic Coverage
– WHOLE COUNTRY
• Universe
– COVERS EVERYBODY

Producers & Sponsors
• Primary Investigator
– GAMBIA BUREAU OF STATISTICS
• Other Producers
– GAMBIA GOVERNMENT
– WORLD BANK
• Funding
– GAMBIA GOVERNMENT
– WORLD BANK
• Other Acknowledgments
– ALIEU S. M. NDOW
– LOLLEY KAH JALLOW
– GAMBIA GOVERNMENT

Data Collection
• Dates of Collection
– 2003-04-15
• Time Periods
– 2003-04-15
• Mode of Data Collection
– Face-to-face [f2f]
• Notes on Data Collection
– FEW RESPONDENTS REFUSED TO BE INTERVIEWED. INTERVIEWING LASTED THREE WEEKS, AND WE HAD FEW CALL-BACKS.
• Questionnaires
– FORM A: NORMAL HOUSEHOLD; FORM B: INSTITUTIONAL POPULATION; FORM C: BUILDING AND COMPOUND PARTICULARS; FORM G: GRADUATE CARD.
• Supervision
– ENUMERATORS WERE SUPERVISED BASED ON THE CENSUS METHODOLOGY. GPC 6 WAS A FORM FILLED BY SUPERVISORS FOR SPOT CHECKS IN THE COMPOUNDS, TO CHECK THE QUALITY OF ENUMERATION.

Data Processing
• Data Editing
– DATA CLEANING STARTED DURING DATA ENTRY (100 PERCENT VERIFICATION): STRUCTURE CHECKS, RANGE CHECKS AND VALIDATION OF VARIABLES, TOGETHER WITH LOGIC CONTROL AND CONCOR EDIT CHECKS.

Access
• Access Authority
– GAMBIA BUREAU OF STATISTICS
• Confidentiality
– APPROVAL MUST BE SOUGHT FROM GOVERNMENT AND GBOS BEFORE DISSEMINATING THE DATA.
• Access Conditions
– THIS DATA SHOULD NOT BE GIVEN OUT FOR COMMERCIAL PURPOSES, ONLY FOR RESEARCH PURPOSES.
• Citation Requirement
– POPULATION AND HOUSING CENSUS, 2003. GAMBIA BUREAU OF STATISTICS
– THE DATA SHALL NOT BE USED FOR ANY PURPOSE OTHER THAN THAT SPECIFIED IN THE REQUEST. ALL COSTS INVOLVED IN MAKING THE DATA AVAILABLE SHALL BE THE RESPONSIBILITY OF THE REQUESTER.

Disclaimer and copyright
• Disclaimer
– THE GAMBIA BUREAU OF STATISTICS (GBOS) PROVIDES THIS DATA TO EXTERNAL USERS WITHOUT ANY RESPONSIBILITY IMPLIED. GBOS ACCEPTS NO RESPONSIBILITY FOR THE RESULTS FROM THE USE OF THIS DATA.
• Copyright
– USERS HAVE NO RIGHT TO COPY THE CD WITHOUT THE CONSENT OF GBOS.

Contacts
• Contact Persons
– ALIEU S.M. NDOW, SG
– LOLLEY KAH JALLOW, P.I.A

Dataset
• 2003 Household
– Variable Count: 38
– Case Count: 157,114

Dataset cont.
• 2003 Individual
– Variable Count: 99
– Case Count: 1,360,681

Dataset cont.
• 2003 Building
– Variable Count: 501
– Case Count: 112,200

File Description
• Contents
– HOUSEHOLD DATA FILE 2003 CENSUS
• Producer
– GBOS
• Version
– VERSION 1 (FINAL)
• Processing Checks
– STRUCTURE CHECKS DURING AND AFTER DATA ENTRY
– RANGE CHECKS DURING DATA ENTRY
– CONSISTENCY CHECKS
– VALIDATION CHECK ON VARIABLES
• Missing Data
– USING CODE 9 OR 99 OR 999 FOR BLANK
• Notes
– DATA PROCESSING WAS DONE WITH 100% VERIFICATION.

Datafile
• Contents
– HOUSEHOLD DATA FILE 2003 CENSUS
• Producer
– GBOS
• Version
– VERSION 1 (FINAL)
• Processing Checks
– STRUCTURE CHECKS DURING AND AFTER DATA ENTRY
– RANGE CHECKS DURING DATA ENTRY
– CONSISTENCY CHECKS
– VALIDATION CHECK ON VARIABLES
• Missing Data
– USING CODE 9 OR 99 OR 999 FOR BLANK
• Notes
– DATA PROCESSING WAS DONE WITH 100% VERIFICATION.
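
The structure and range checks listed above, together with the 9 / 99 / 999 blank codes, could be sketched roughly as follows. This is an assumption-laden illustration, not the CSPro or CONCOR edit programs GBOS actually ran: the variable names `sex` and `age`, their valid ranges and the function `check_record` are all hypothetical.

```python
# Code used for a blank value, keyed by field width (from the slide: 9, 99 or 999).
MISSING_CODES = {1: 9, 2: 99, 3: 999}

# Hypothetical rules: variable name -> (lowest valid, highest valid, field width).
RULES = {
    "sex": (1, 2, 1),
    "age": (0, 98, 2),
}


def check_record(record, rules):
    """Return a list of (variable, problem) pairs for one data record."""
    problems = []
    for name, (low, high, width) in rules.items():
        # Structure check: every expected variable must be present.
        if name not in record:
            problems.append((name, "structure: variable absent"))
            continue
        value = record[name]
        # A blank coded as 9 / 99 / 999 is accepted as missing, not as an error.
        if value == MISSING_CODES[width]:
            continue
        # Range check: the value must fall inside the valid interval.
        if not low <= value <= high:
            problems.append((name, f"range: {value} not in {low}..{high}"))
    return problems
```

For example, `check_record({"sex": 3, "age": 34}, RULES)` flags `sex` as out of range, while a record holding only the blank codes passes. Keeping the rules in a table rather than in the code makes it easier to reuse one checker for all three questionnaires.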

Variables (e.g.)
• Lga: banjul
• District: banjul_south
• Vil: 1
• Ur/ru: urban
• Ea: 10001
• Comp: 001
• Hh: 01
• etc., for all three questionnaires

External Resources
• Reports
• Questionnaires
• Technical Documents
• Programs

Reports
• Resource
– DOCS\2003 census topics.pdf
– DOCS\COMPOUND IN SETTLEMENT.pdf
– DOCS\Enumerator_manual.pdf
– DOCS\OCCUPATION CODE.pdf
– DOCS\INDUSTRY CODE.pdf
– DOCS\DIRECTORYS04.pdf

Questionnaire
• Resource
– DOCS\CENSUS FORM A.pdf

Technical Documents
• Resource
– DOCS\TECHNICAL DOCUMENT\BUILDING.FMT

Programs
• Resource
– PROGRAMS\BUILD EDIT CN.pdf

Advantage of Archiving
• To lead in developing statistical data and information of high quality, and to advance their effective use in both public and private policy decision making.
• A powerful tool that facilitates the release of census microdata to the user community and allows the institution to exercise control over its data.
• Data access portal: provides a clear way to implement data dissemination policies while leaving full control with the publishing institution. It does so by providing information on data access policies and access to procedural forms and licences for:
– Public use files
– Licensed files
– Data enclave admission

Our Mission
• The Gambia National Data Archive has been established to:
– Promote best practice and international standards for the documentation of microdata amongst data producers in the country.
– Provide equitable access to microdata in the interest of all citizens, by protecting confidentiality and following international recommendations and good practice.
– Promote the effective use of existing census data for statistical and research purposes, thereby encouraging a diverse range of analytical work through secondary research.
– Ensure the long-term preservation of microdata and the related metadata, and their continued viability and usability in the future.
• The Gambia National Data Archive pursues these objectives within the framework of the Statistical Act and the United Nations Fundamental Principles of Statistics.
• Where microdata cannot be released due to confidentiality or other reasons, the National Data Archive provides the public with detailed metadata and other publicly available materials.

Our activities
• Activities: these include the acquisition, documentation, anonymization, dissemination and preservation of microdata and related metadata.
• Acquisition: the archive was primarily established to archive census microdata produced by GBOS and other official data producers. It also serves as a repository for non-official datasets. Data producers interested in depositing data in the data archive are invited to contact us.
• Documentation: data documentation serves several important functions. It helps data producers build institutional memory, and helps researchers to:
– Find the data they are interested in.
– Locate the datasets and variables that meet their research requirements.
– Understand what the data are measuring and how the data have been created, and assess their quality.
– Understand the survey design and the methods used when collecting and processing the data, thereby reducing the risk that data will be misunderstood or misused.
• The Gambia National Data Archive has adopted the Data Documentation Initiative (DDI) and the Dublin Core (DCMI) international metadata standards.

Our activities (cont.)
• Anonymization: the Gambia Bureau of Statistics is charged with legal and ethical obligations to protect the confidentiality of census respondents. The Gambia National Data Archive aims to protect the confidentiality of the data by:
– Restricting access to data that present a potential disclosure risk to vetted users only, under formal conditions.
– Anonymizing data when necessary, by altering or suppressing variables which could potentially identify a physical or legal individual. This may make the data less useful for analysts.
The National Data Archive seeks to minimise the information loss while ensuring an acceptable level of disclosure risk. The principles and methods applied for measuring the risk and for anonymizing data are those provided or recommended by the International Household Survey Network.
• Dissemination: data dissemination increases the quality, use and potential impact of data, by:
1. Making it possible for analytical work to be replicated, a critical step in good science;
2. Creating the potential to use old data to test new ideas;
3. Reducing the costs of data collection and the burden on respondents, by avoiding the need for researchers to undertake their own surveys;
4. Demonstrating transparency and credibility in data production, which are at the heart of good governance; and
5. Improving the relevance and quality of data by incorporating users' feedback in future data collection.

Our activities (cont.)
• Making microdata available also has downsides. It exposes data producers to criticism, it increases the risk of a breach of confidentiality, and it can result in conflicting outputs being generated. Having faith in the ethical conduct of data users and in their willingness to contribute to the quality and usefulness of the data, The Gambia National Data Archive considers that the benefits outweigh the disadvantages. We insist, however, that access to microdata must not be seen as a right. Access will only be permitted to bona fide users, and for statistical and research purposes only.
• Preservation: micro-datasets can be damaged or lost because of human error, technical problems, or disasters such as fire or flood. New technologies can also render old data unreadable, because of either hardware or software advances. The Gambia National Data Archive is implementing standard procedures for ensuring the physical security and long-term usability of its resources, together with associated backup arrangements for minimizing the impact of adverse events.
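
One widely used way to notice the kind of damage described under Preservation is a fixity check: record a checksum for every archived file, then recompute and compare later or after restoring a backup. The sketch below uses only the Python standard library and is an assumption about how such a check might look, not the procedure the Gambia National Data Archive follows; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files from the manifest that are now missing or altered."""
    current = build_manifest(root)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

A manifest built when the archive folders are first prepared can be stored alongside each backup copy; running `changed_files` against a copy then shows which files, if any, no longer match.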
• Policies and procedures: micro-datasets are categorised into three groups, according to the sensitivity of their content and their inherent disclosure risks:
– Public use files: made available online to all interested users, for research.
– Licensed files: a signed agreement between GNDA and trusted external users permits them to access semi-anonymized data files.
– Files accessible on-site (data enclave): for sensitive data, access is only provided on site in our data enclave under strict conditions, and only for research purposes. GNDA scrutinises the generated outputs in a full disclosure review before they are released.

Way forward
• The seminar is timely, as it will help the NSOs to work on one platform and to identify the core challenges in census data archiving.
• It will help in formulating and implementing an effective archiving plan suited to the needs and requirements of NSOs.
• The seminar will help in identifying good practices and lessons learned in census data archiving.
• Statisticians should hand over census documents to the archivers on time, so that archiving can start on time. (laugh)
• DevInfo / GamInfo can also be used for archiving and disseminating census indicators.
• Special thanks to my fellow participants for working hard and sharing their countries’ experience of census data archiving.
• Thanks to UNSD and UNECA for their idea of organising this timely seminar.
• Bravo

End of Presentation
By: Mrs. Lolley Kah Jallow, Principal Information Analyst, Gambia Bureau of Statistics (GBOS)
Thank You.