DATA STORAGE, MAINTENANCE AND SECURITY STRATEGY: GHANA’S EXPERIENCE

Presentation at the United Nations Regional Seminar on Census Data Archiving for Africa, Addis Ababa, Ethiopia, 20-23 September 2011

By KB Danso-Manu, Data Processing Manager, Ghana Statistical Service

OUTLINE

- Ghana’s 2010 Census
- Archiving Strategy & Anonymization
- Data Storage
- Data Security and Backup
- Data Access Policy

THE 2010 CENSUS OF GHANA

- The 2010 Population and Housing Census was conducted with 26th September 2010 as the reference point (Census Night).
- Enumeration continued until December 2010 (mop-up).
- Data processing activities for the Summary Sheets started in January 2011.
- Provisional figures were released in February 2011, giving the population of Ghana as 24.2 million (24,223,431). Males form 48.7% of the population and females 51.3%.

THE 2010 CENSUS OF GHANA (CONT’D)

- Form preparation of the census questionnaires started in March 2011.
- Production scanning of the census questionnaires started in July 2011.
- ICR/OCR scanning technology is being used to capture the census forms, with TeleForm as the form processing software.
- The Meridio Record Management System is being used to store the images of the census forms.
- The final results of the Ghana 2010 Census are expected to be released by the end of June 2012.

ARCHIVING STRATEGY AND MANAGEMENT

- The GSS has set up the National Data Archive, which has adopted the Data Documentation Initiative (DDI) and the Dublin Core (DCMI) international metadata standards since 2008.
- Census microdata is archived by:
  - Adopting the International Household Survey Network (IHSN)’s standard procedures and recommendations for data archiving.
  - Anonymizing the data by altering or suppressing variables which could potentially identify a respondent or establishment.
- Some challenges faced are:
  - Unavailability of census-related documents (questionnaires, manuals, codebook, etc.) at a centralized location.
  - Lack of consistent or harmonized definitions, categorizations and classifications of variables among different censuses/surveys.

HOW DO WE ANONYMIZE DATA?

HIERARCHICAL CENSUS DATASET

RecType  Region  District  Locality  House  Hhold  PID  Var-1  Var-n
H        01      01        001       0001   01          2      4
P        01      01        001       0001   01     01   1      1
P        01      01        001       0001   01     02   2      1
P        01      01        001       0001   01     03   1      1
H        01      01        001       0002   01
P        01      01        001       0002   01     01   1
H        01      01        001       0002   02
H        01      01        001       0003   02
P        01      01        001       0003   02     01   2
P        01      01        001       0003   02     02   2

TYPICAL ANONYMIZED DATASET

RecType  Region  District  Var-j  Var-k  Var-1  Var-n
H        01      01        3      00     2      4
P        01      01        3      01     1      1
P        01      01        3      02     2      1
P        01      01        3      03     1      1
H        01      01        5      00
P        01      01        5      01
H        01      01        7      00
H        01      01        9      00
P        01      01        9      01     2
P        01      01        9      02     2      1

Var-j: serially numbers the households in the district (using the required number of digits).

OUR STORAGE TECHNOLOGY MAPPING

- A storage system that is aware of the virtualization environment.
- A storage system that is application aware and can determine how data is being accessed.
- Storage data protection, de-duplication and archiving, looking at the quality of the duplicated data by understanding the data.
- Hardware support: we have three years of hardware replacement support from our suppliers.

GSS DATACENTER SOLUTION ARCHITECTURE

GSS SERVER AND STORAGE INFRASTRUCTURE

- A two-node Hyper-V cluster with eight virtual servers for image capturing and archiving applications.
- All physical servers in the cluster can access the EMC storage system in the datacenter.
- Two virtual networks in our datacenter: an iSCSI SAN network dedicated to storage and server data traffic, and the GSS LAN for user data processing traffic.
- The total storage capacity of the primary storage system is 24 TB raw.
- 16 TB is usable after redundancy and fault-tolerant configuration of the total 24 TB raw storage capacity.
- 4 TB is deployed for application data storage for each virtual machine.
- 12 TB is deployed on the SAN and mapped so that it appears as local drives on the virtual content server, to store captured images and process data.
- 4 TB is being used for instant backup and recovery using snapshot technologies.
- A 2 TB external drive is used for offsite storage of system state and configurations.

DATA SECURITY

- The basic security issues include physical security (e.g. stolen laptops), internal security (e.g. file backups), external security (e.g. Internet security), and integrity (e.g. audit trails).
- Should smart phones be banned from data centers?
- To address some of these security issues, NSOs may adopt record-keeping mechanisms, passwords, encryption, off-site storage, firewalls, authorization and authentication, or some other method to block unauthorized users from gaining access.

BACKUP AND DISASTER RECOVERY (DR)

- Daily backups of our virtual machines and an encrypted copy of the process export are kept on an external drive held at the datacenter of the Ministry of Finance.
- The eGovernment Network Infrastructure will be used as our backbone network to link to remote branches.
- In addition, plans are being made to use the DR site of the eGovernment Datacenter in Kumasi, a city about 400km from the national capital.

DATA ACCESS POLICY

- The Ghana Statistical Service, as a public institution, has the obligation to promote data dissemination to facilitate governance and national development.
- There are three levels of access to archived census or survey microdata:
  i. Public use files - free from the internet
  ii. Licensed datasets - signed agreement
  iii. Datasets only accessible on location
- Information products:
  - Online products are free (www.statsghana.gov.gh).
  - Published reports (hard copy) are available at the front desk/Information section at a nominal price.
  - Customized tables attract a token fee.
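
Public use files depend on the anonymization step shown under HOW DO WE ANONYMIZE DATA?, where Locality, House and Hhold are replaced by a serial household number within the district (Var-j) and the person number is carried as Var-k. The following is a minimal sketch of that idea, not the GSS procedure itself. The function name `anonymize`, the dictionary record layout, the 4-digit width, and numbering households 1, 2, 3, ... in file order are all assumptions; the slide's own example uses the values 3, 5, 7, 9, so the actual numbering rule is not specified in the source.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def anonymize(records: List[Record], width: int = 4) -> List[Record]:
    """Drop Locality/House/Hhold and add a per-district household serial (Var-j).

    Assumption: serials are assigned in order of first appearance, starting at 1.
    """
    serials: Dict[Tuple[str, ...], int] = {}    # household key -> serial number
    counters: Dict[Tuple[str, str], int] = {}   # (Region, District) -> households seen
    out: List[Record] = []
    for rec in records:
        district = (rec["Region"], rec["District"])
        hh_key = district + (rec["Locality"], rec["House"], rec["Hhold"])
        if hh_key not in serials:
            counters[district] = counters.get(district, 0) + 1
            serials[hh_key] = counters[district]
        anon: Record = {
            "RecType": rec["RecType"],
            "Region": rec["Region"],
            "District": rec["District"],
            "Var-j": str(serials[hh_key]).zfill(width),
            # Household (H) records have no PID; they get "00" as on the slide.
            "Var-k": rec.get("PID") or "00",
        }
        # Carry over the non-identifying variables (Var-1 ... Var-n) unchanged.
        for name, value in rec.items():
            if name.startswith("Var-"):
                anon[name] = value
        out.append(anon)
    return out


# Hypothetical input modelled on the first rows of the hierarchical dataset.
records = [
    {"RecType": "H", "Region": "01", "District": "01", "Locality": "001",
     "House": "0001", "Hhold": "01", "PID": "", "Var-1": "2", "Var-n": "4"},
    {"RecType": "P", "Region": "01", "District": "01", "Locality": "001",
     "House": "0001", "Hhold": "01", "PID": "01", "Var-1": "1", "Var-n": "1"},
    {"RecType": "H", "Region": "01", "District": "01", "Locality": "001",
     "House": "0002", "Hhold": "01", "PID": ""},
    {"RecType": "H", "Region": "01", "District": "01", "Locality": "001",
     "House": "0002", "Hhold": "02", "PID": ""},
    {"RecType": "P", "Region": "01", "District": "01", "Locality": "001",
     "House": "0002", "Hhold": "02", "PID": "01", "Var-1": "2"},
]
anonymized = anonymize(records)
```

Because serials assigned in file order can still reflect geographic ordering, a production procedure would likely also shuffle or otherwise decouple the numbering from the enumeration sequence; that choice is outside what the slides describe.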
DATA ACCESS POLICY (CONT’D)

For raw data from surveys/census:

- Make a formal request;
- Fill in the agreement form (www.statsghana.gov.gh);
- Pay the processing fee;
- The dataset is picked up or delivered by mail/post.

Only 1% of census data is given out. 100% of survey data may be requested by researchers.

REFERENCES

1. www.statsghana.gov.gh\nada
2. Yaw Antwi-Adjei (Info Builders (Ghana) Limited), www.infobuildersgh.com
3. Frenck Gyamfi, CSS & Partners, www.css-partners.com
4. Virtual Statistical System, www.virtualstatisticalsystem.org

Thank you for your attention
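
The 1% release rule for census data stated in the data access policy can be illustrated with a simple household-level sample. This is a sketch only: the source does not say how the 1% is selected, so the sampling unit (households), the simple random design, the fixed seed, and the function name `draw_sample` are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def draw_sample(household_ids: Sequence[int],
                fraction: float = 0.01,
                seed: int = 2010) -> List[int]:
    """Return a simple random sample of household IDs without replacement.

    Assumption: whole households are sampled so that every person record
    of a selected household can be released together.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not household_ids:
        return []
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    n = max(1, round(len(household_ids) * fraction))
    return sorted(rng.sample(list(household_ids), n))


# Hypothetical frame of 10,000 household serial numbers.
frame = list(range(1, 10001))
sample = draw_sample(frame)
```

Sampling households rather than individual persons keeps the hierarchical structure of the dataset intact in the released file.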