DATA STORAGE, MAINTENANCE AND SECURITY
STRATEGY: GHANA’S EXPERIENCE
Presentation at the United Nations Regional Seminar
on Census Data Archiving for Africa,
Addis Ababa, Ethiopia
20-23 September 2011
By
KB Danso-Manu
Data Processing Manager
Ghana Statistical Service
OUTLINE
- Ghana’s 2010 Census
- Archiving Strategy & Anonymization
- Data Storage
- Data Security and Backup
- Data Access Policy

THE 2010 CENSUS OF GHANA
- The 2010 Population and Housing Census was conducted with 26 September 2010 as the reference point (Census Night).
- Enumeration continued until December 2010 (mop-up).
- Data processing activities for the Summary Sheets started in January 2011.
- Provisional figures were released in February 2011, giving the population of Ghana as 24.2 million (24,223,431).
- Males form 48.7% of the population and females 51.3%.
THE 2010 CENSUS OF GHANA (CONT’D)
- Form preparation of the census questionnaires started in March 2011.
- Production scanning of the census questionnaires started in July 2011.
- ICR/OCR scanning technology is being used to capture the census forms, with TeleForm as the form-processing software.
- The Meridio Record Management System is being used to store the images of the census forms.
- The final results of the Ghana 2010 Census are expected to be released by the end of June 2012.
ARCHIVING STRATEGY AND MANAGEMENT
- The GSS has set up the National Data Archive, which since 2008 has adopted the Data Documentation Initiative (DDI) and Dublin Core (DCMI) international metadata standards.
- Census micro-data is archived by:
  - Adopting the International Household Survey Network (IHSN) standard procedures and recommendations for data archiving.
  - Anonymizing the data by altering or suppressing variables that could potentially identify a respondent or establishment.
- Some challenges faced are:
  - Census-related documents (questionnaires, manuals, codebooks, etc.) are not available at a centralized location.
  - Definitions, categorizations and classifications of variables are not consistent or harmonized across different censuses and surveys.
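The anonymization bullet names two operations: suppressing a variable and altering it. The slide does not list which variables GSS treats this way, so the sketch below is only an illustration of the two operations; the identifier names (`name`, `phone`, `gps`) and the age top-code of 95 are hypothetical.

```python
from typing import Any

# Hypothetical direct identifiers; the real list is not given on the slide.
DIRECT_IDENTIFIERS = {"name", "phone", "gps"}
AGE_TOP_CODE = 95  # hypothetical threshold: higher ages are reported as 95


def anonymize_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a person record with identifying detail removed."""
    out: dict[str, Any] = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # suppression: drop the variable entirely
        if field == "age" and value is not None and value > AGE_TOP_CODE:
            value = AGE_TOP_CODE  # alteration: top-code rare extreme ages
        out[field] = value
    return out
```

For example, `anonymize_record({"name": "A", "age": 101, "sex": 2})` returns `{"age": 95, "sex": 2}`.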
HOW DO WE ANONYMIZE DATA?
HIERARCHICAL CENSUS DATASET
RecType  Region  District  Locality  House  Hhold  PID  Var-1  Var-n
H        01      01        001       0001   01          2      4
P        01      01        001       0001   01     01   1      1
P        01      01        001       0001   01     02   2      1
P        01      01        001       0001   01     03   1      1
H        01      01        001       0002   01
P        01      01        001       0002   01     01   1
H        01      01        001       0002   02
H        01      01        001       0003   02
P        01      01        001       0003   02     01   2
P        01      01        001       0003   02     02   2
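In this hierarchical layout, each household (H) record is followed by its person (P) records, and a household is identified by Region, District, Locality, House and Hhold. A minimal sketch of reading such a file back into households, assuming the records are already split into fields (the slide does not show the physical file format, e.g. fixed-width or delimited) and using the slide's sample rows:

```python
from dataclasses import dataclass, field

# The slide's sample rows: RecType, Region, District, Locality, House, Hhold,
# then PID (P records only), then Var-1 .. Var-n.
SAMPLE = [
    ["H", "01", "01", "001", "0001", "01", "2", "4"],
    ["P", "01", "01", "001", "0001", "01", "01", "1", "1"],
    ["P", "01", "01", "001", "0001", "01", "02", "2", "1"],
    ["P", "01", "01", "001", "0001", "01", "03", "1", "1"],
    ["H", "01", "01", "001", "0002", "01"],
    ["P", "01", "01", "001", "0002", "01", "01", "1"],
    ["H", "01", "01", "001", "0002", "02"],
    ["H", "01", "01", "001", "0003", "02"],
    ["P", "01", "01", "001", "0003", "02", "01", "2"],
    ["P", "01", "01", "001", "0003", "02", "02", "2"],
]


@dataclass
class Household:
    key: tuple[str, ...]  # (Region, District, Locality, House, Hhold)
    variables: list[str]  # household-level Var-1 .. Var-n
    persons: list[list[str]] = field(default_factory=list)  # [PID, Var-1, ...]


def group_households(records: list[list[str]]) -> list[Household]:
    """Attach each P (person) record to the H (household) record before it."""
    households: list[Household] = []
    for rec in records:
        rec_type, ids = rec[0], tuple(rec[1:6])
        if rec_type == "H":
            households.append(Household(key=ids, variables=rec[6:]))
        elif rec_type == "P":
            if not households or households[-1].key != ids:
                raise ValueError(f"person record without matching household: {rec}")
            households[-1].persons.append(rec[6:])
        else:
            raise ValueError(f"unknown record type: {rec_type}")
    return households
```

On the sample, `group_households(SAMPLE)` yields four households holding 3, 1, 0 and 2 person records respectively.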
TYPICAL ANONYMIZED DATASET
RecType  Region  District  Var-j  Var-k  Var-1  Var-n
H        01      01        3      00     2      4
P        01      01        3      01     1      1
P        01      01        3      02     2      1
P        01      01        3      03     1      1
H        01      01        5      00
P        01      01        5      01     1
H        01      01        7      00
H        01      01        9      00
P        01      01        9      01     2
P        01      01        9      02     2
Var-j: serial number of the household within the district (using the required number of digits).
Var-k: person number within the household (00 on the household record).
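Comparing the two tables: Locality, House and Hhold are dropped, each household gets a serial number Var-j within its district, and the person number becomes Var-k (00 on the household record). A minimal sketch, assuming records arrive in hierarchical order and are already split into fields. The slide's Var-j values run 3, 5, 7, 9 rather than 1 to 4 and the rule behind that is not stated, so `start` and `step` are hypothetical parameters: `start=3, step=2` reproduces the slide's example, while the defaults give 1, 2, 3, and so on.

```python
from collections import Counter
from typing import Optional

# The slide's sample rows: RecType, Region, District, Locality, House, Hhold,
# then PID (P records only), then Var-1 .. Var-n.
SAMPLE = [
    ["H", "01", "01", "001", "0001", "01", "2", "4"],
    ["P", "01", "01", "001", "0001", "01", "01", "1", "1"],
    ["P", "01", "01", "001", "0001", "01", "02", "2", "1"],
    ["P", "01", "01", "001", "0001", "01", "03", "1", "1"],
    ["H", "01", "01", "001", "0002", "01"],
    ["P", "01", "01", "001", "0002", "01", "01", "1"],
    ["H", "01", "01", "001", "0002", "02"],
    ["H", "01", "01", "001", "0003", "02"],
    ["P", "01", "01", "001", "0003", "02", "01", "2"],
    ["P", "01", "01", "001", "0003", "02", "02", "2"],
]


def anonymize_ids(records: list[list[str]], start: int = 1, step: int = 1) -> list[list[str]]:
    """Replace Locality, House, Hhold and PID with Var-j and Var-k.

    `start` and `step` are hypothetical: the slide does not state the numbering rule.
    """
    # First pass: count households per district to size the Var-j field
    # (zero-padded to the digits the largest serial in that district needs).
    n_households = Counter((r[1], r[2]) for r in records if r[0] == "H")
    width = {d: len(str(start + (n - 1) * step)) for d, n in n_households.items()}

    out: list[list[str]] = []
    next_serial: dict[tuple[str, str], int] = {}
    var_j: Optional[str] = None
    for rec in records:
        rec_type, region, district = rec[0], rec[1], rec[2]
        dist = (region, district)
        if rec_type == "H":
            serial = next_serial.get(dist, start)
            next_serial[dist] = serial + step
            var_j = str(serial).zfill(width[dist])
            # Drop Locality, House, Hhold (rec[3:6]); keep household variables.
            out.append([rec_type, region, district, var_j, "00", *rec[6:]])
        elif rec_type == "P":
            if var_j is None:
                raise ValueError("person record before any household record")
            # Drop Locality, House, Hhold; the PID (rec[6]) becomes Var-k.
            out.append([rec_type, region, district, var_j, rec[6], *rec[7:]])
        else:
            raise ValueError(f"unknown record type: {rec_type}")
    return out
```

Zero-padding to the width of the largest serial in each district is one reading of "using the required number of digits"; with 12 households in a district the serials become 01 to 12.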
OUR STORAGE TECHNOLOGY MAPPING
- A storage system that is aware of the virtualization environment.
- A storage system that is application aware and can determine how data is being accessed.
- Storage data protection, de-duplication and archiving, assessing the quality of duplicated data through an understanding of the data itself.
- Hardware support: we have three years of hardware replacement support from our suppliers.
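De-duplication here is a feature of the storage system itself, and the slide does not say how the vendor implements it. As a generic illustration only (not the EMC implementation), fixed-size block de-duplication stores each distinct block once, keyed by its SHA-256 digest, together with an ordered list of digests from which the original data can be rebuilt:

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096) -> tuple[dict[str, bytes], list[str]]:
    """Split data into fixed-size chunks and store each distinct chunk once.

    Returns the chunk store (digest -> chunk) and the ordered list of digests
    (the "recipe") needed to rebuild the original data.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a repeated chunk is stored only once
        recipe.append(digest)
    return store, recipe


def rebuild(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble the original bytes from the chunk store and recipe."""
    return b"".join(store[d] for d in recipe)
```

Three identical 4 KiB blocks followed by one different block produce a store of two chunks and a recipe of four digests. Production systems often use variable-size (content-defined) chunking instead, so that an insertion does not shift every later block.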

GSS DATACENTER SOLUTION ARCHITECTURE
GSS SERVER AND STORAGE INFRASTRUCTURE
- A two-node Hyper-V cluster runs eight virtual servers for the image-capturing and archiving applications.
  - All physical servers in the cluster can access the EMC storage system in the datacenter.
- Two virtual networks run in our datacenter: an iSCSI SAN network dedicated to storage and server data traffic, and the GSS LAN for users' data-processing traffic.
- The total storage capacity of the primary storage system is 24 TB raw.
- 16 TB of the 24 TB raw capacity is usable after the redundancy and fault-tolerance configuration.
- 4 TB is deployed for application data storage for the virtual machines.
- 12 TB is deployed on the SAN and mapped so that it appears as local drives on the virtual content server, to store captured images and processed data.
- 4 TB is being used for instant backup recovery using snapshot technologies.
- A 2 TB external drive is used for off-site storage of the system state and configurations.
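The slide gives 16 TB usable out of 24 TB raw, i.e. two thirds, but does not say which protection scheme produces that ratio. As an assumption only, two common layouts that yield exactly two thirds are RAID 5 in groups of three disks and RAID 6 in groups of six; the sketch below does that arithmetic, ignoring hot spares and vendor formatting overhead.

```python
def usable_tb(raw_tb: float, level: str, group_size: int) -> float:
    """Usable capacity for common RAID layouts (hot spares and overhead ignored)."""
    if level == "raid10":
        if group_size % 2:
            raise ValueError("RAID 10 needs an even number of disks per group")
        return raw_tb / 2  # every block is mirrored once
    if level == "raid5":
        parity = 1  # one disk's worth of parity per group
    elif level == "raid6":
        parity = 2  # two disks' worth of parity per group
    else:
        raise ValueError(f"unsupported level: {level}")
    if group_size <= parity:
        raise ValueError("group too small for this RAID level")
    return raw_tb * (group_size - parity) / group_size
```

`usable_tb(24, "raid5", 3)` and `usable_tb(24, "raid6", 6)` both return 16.0, whereas mirroring (`"raid10"`) would leave only 12.0.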
DATA SECURITY
- The basic security issues include physical security (e.g. stolen laptops), internal security (e.g. file backups), external security (e.g. Internet security) and integrity (e.g. audit trails).
- Should smartphones be banned from data centers?
- To address some of these security issues, NSOs may adopt:
  - record-keeping mechanisms;
  - passwords, encryption, off-site storage, firewalls, authorization and authentication;
  - other methods that block unauthorized users from gaining access.
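Of the measures listed, password authentication is the easiest to illustrate. The slide does not say how GSS stores credentials; a minimal sketch, assuming passwords are kept as salted PBKDF2-HMAC-SHA256 hashes using only Python's standard library (the iteration count is an illustrative choice):

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 600_000  # illustrative work factor; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Derive a salted hash; store (salt, digest), never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The per-user salt prevents identical passwords from producing identical hashes, and `hmac.compare_digest` avoids leaking information through comparison timing.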

BACKUP AND DISASTER RECOVERY (DR)
- Daily backups of our virtual machines and an encrypted copy of the process export are kept on an external drive stored at the datacenter of the Ministry of Finance.
- The eGovernment Network Infrastructure will be used as our backbone network to link to remote branches.
- In addition, plans are being made to use the DR site of the eGovernment Datacenter in Kumasi, a city about 250 km by road from Accra, the national capital.
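The slide does not say how the off-site copy is checked. One simple safeguard, assumed here rather than taken from the source, is to confirm after each daily run that the encrypted export on the external drive is byte-identical to the one just written, by comparing SHA-256 digests:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large exports need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(original: Path, copy: Path) -> bool:
    """True if the off-site copy is byte-identical to the original file."""
    return sha256_of(original) == sha256_of(copy)
```

Recording the digest alongside each backup also lets a later restore be checked before the data is trusted.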
DATA ACCESS POLICY
- The Ghana Statistical Service, as a public institution, has an obligation to promote data dissemination to facilitate governance and national development.
- There are three levels of access to archived census or survey micro-data:
  i.   Public use files: free from the internet
  ii.  Licensed datasets: require a signed agreement
  iii. Datasets accessible only on location
- Information products online are free at www.statsghana.gov.gh.
- Published reports (hard copy) are available at the front desk/Information section at a nominal price.
- Customized tables attract a token fee.
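The three access tiers can be modeled as a small rule table. A sketch, assuming a hypothetical `may_release` check whose only inputs are whether an agreement has been signed and whether the user is on the premises; the real process also involves a formal request and a processing fee, which are not modeled here.

```python
from enum import Enum


class AccessLevel(Enum):
    PUBLIC_USE = "public use file"  # free from the internet
    LICENSED = "licensed dataset"   # requires a signed agreement
    ON_SITE = "on-site only"        # accessible only on location


def may_release(level: AccessLevel, signed_agreement: bool, on_site: bool) -> bool:
    """Hypothetical check of whether a micro-data request can be fulfilled."""
    if level is AccessLevel.PUBLIC_USE:
        return True
    if level is AccessLevel.LICENSED:
        return signed_agreement
    return on_site  # ON_SITE: only when the user works on the premises
```

Keeping the tiers in an enum makes the rule explicit in one place, so a change of policy for a tier touches a single branch.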
DATA ACCESS POLICY (CONT’D)
- For raw data from surveys/census:
  - Make a formal request;
  - Fill in the agreement form (www.statsghana.gov.gh);
  - Pay the processing fee;
  - The dataset is picked up or delivered by mail/post.
- Only 1% of census data is given out.
- Researchers may request up to 100% of survey data.
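The slide states only that 1% of census data is released, not how that 1% is drawn. Assuming it is a simple random sample of whole households, so that person records stay linked to their household record in the hierarchical layout, a sketch (the `seed` parameter is a hypothetical addition that makes a release reproducible):

```python
import random
from typing import Optional


def sample_households(
    records: list[list[str]], fraction: float = 0.01, seed: Optional[int] = None
) -> list[list[str]]:
    """Simple random sample of whole households, each H record kept with its P records."""
    # Group the hierarchical file into households: each H record starts a new one.
    households: list[list[list[str]]] = []
    for rec in records:
        if rec[0] == "H":
            households.append([rec])
        elif households:
            households[-1].append(rec)
        else:
            raise ValueError("person record before any household record")
    k = round(fraction * len(households))
    # Sample household indices, then sort them to preserve the original file order.
    chosen = sorted(random.Random(seed).sample(range(len(households)), k))
    return [rec for i in chosen for rec in households[i]]
```

Sampling households rather than individual persons keeps household composition intact, which matters for any analysis that links persons to their household.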
Thank you for your attention