Developing Geographical Information Systems In A Cohort Study
Andy Boyd
ALSPAC, Social Medicine
University of Bristol
Geographical Data Matching - the ALSPAC resource
• Overview of our data, the issues involved and our plan for the
future
• Time for questions
• Time for discussion on how other studies have developed
their GIS data resource
2
Defining GIS
GIS combine mapping and a record of
location with database technology. This
can be used in the storage, analysis,
management or presentation of data.
E. W. Gilbert’s 1958 version of John Snow’s 1855 Soho cholera outbreak map
3
Scope of this presentation
• Not about GIS tools
• Not about GIS analysis or techniques
• It is about the capture and storage of
data in an accessible manner to allow
future GIS analysis
• Uses ALSPAC as an example
4
The ALSPAC GIS dataset
• Geographic identifiers collected directly from the cohort
• Data collected via external data sources
• Geographical data linkage
• Precision of geographic variables – accuracy
• Precision of geographic variables – ethics
• Providing the data as an integral part of the resource
• Current data availability
5
ALSPAC administered data collection
Residential Address (~50000 address points)
• updated from cohort (self reported)
• team who tracks lost cases
• email
• second contacts
• database searches (osis, electoral roll)
School the young person attends / wishes to attend
• via questionnaire (ALSPAC questionnaires/assessments
administered in schools, primary to secondary transition questionnaire)
• clinic attendance interview
• collected from the school
6
Linkage to external data sources
Validation / Cleaning
• Validation and cleaning of self reported data using data collected
via record linkage (NSTS – NHS Tracing, NPD – National Pupil
DB, Royal Mail/OS products)
Missing Data
• Enhancing the resource through record
linkage
Data collection via geographical identifiers
• Accessing existing data organised
around geographical IDs (census
data, neighbourhood data)
• Primary data collection (distance to
overhead power lines, air quality,
commuting, school selection)
7
Data Collection through Record Linkage
• Office for National Statistics (ONS) Tracing
• Health Authority
• Embarkation
• NSTS (NHS Strategic Tracing Service)
• Address registered with GP
• National Pupil Database (DCSF, DIUS*, UCAS*)
• School Address
• Pupil Residential Address
• DWP*
• Home Office*
* Linkage currently being investigated
8
G.I.S – ALSPAC Resource
• ~50,000 ALSPAC residential address points, associated with a
date range which can then be linked to ALSPAC data collection
• Schools attendance data from NPD ~17000
• Schools attendance data from ALSPAC collection ~ 10000
The geographic relation between household
income and polluting factories – FoE 1999
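Because each address point carries a date range, the address that applied at a given data collection can be looked up by date. The following is a minimal sketch of that idea, not the ALSPAC implementation: the record layout, the field names and the postcodes (deliberately fictitious) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class AddressSpell:
    """One period of residence for one case (hypothetical layout)."""
    case_id: str
    postcode: str
    start: date
    end: Optional[date]  # None means the case still lives here


def address_on(spells: List[AddressSpell], case_id: str, when: date) -> Optional[AddressSpell]:
    """Return the spell covering `when` for `case_id`, or None if there is no record."""
    for spell in spells:
        covers = spell.start <= when and (spell.end is None or when < spell.end)
        if spell.case_id == case_id and covers:
            return spell
    return None


# Fictitious address history for one case.
spells = [
    AddressSpell("A001", "ZZ1 1AA", date(1991, 4, 1), date(1996, 9, 1)),
    AddressSpell("A001", "ZZ2 2BB", date(1996, 9, 1), None),
]

# Which address applied when a questionnaire was completed on 15 June 1995?
hit = address_on(spells, "A001", date(1995, 6, 15))
print(hit.postcode if hit else "no address on record")
```

The end date is treated as exclusive so that a move on a given day belongs to the new address only.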
9
G.I.S Precision
• Spatial data held at many geographic levels
• Geographies range in scale from 0.1 metres to regional/national
data
• Tied together via address, postcode or grid reference as central
ID
• Key resources include:
– NSPD (National Statistics Postcode Directory, formerly the All
Fields Postcode Directory) - geo linking database
– Deprivation & Socio Economic indices (IMD, Townsend,
Acorn)
– Census data
10
G.I.S – How we link cases to data
• Master file of Postcodes (NSPD)
• Postcodes linked to grid reference
• Grid references of various scales
• PCs/GridRef mapped to:
– Electoral geographies
– Census geographies
• Ethics:
– We don’t generally identify
residence at PC or equivalent level
Ordnance Survey – The National Grid
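The linkage chain on this slide (postcode, then grid reference, then a higher geography) can be sketched as a simple lookup. This is an illustrative sketch only: the directory extract, its column names and the area codes are invented, and a real postcode directory holds many more fields.

```python
def normalise_postcode(raw: str) -> str:
    """Upper-case and remove all whitespace so 'zz1 1aa' and 'ZZ11AA' match."""
    return "".join(raw.split()).upper()


# Hypothetical extract of a postcode directory: one row per postcode,
# giving the centroid grid reference and the areas the postcode falls in.
DIRECTORY = {
    "ZZ11AA": {"easting": 358000, "northing": 173000, "ward": "W01", "census_area": "C001"},
    "ZZ22BB": {"easting": 361500, "northing": 176200, "ward": "W02", "census_area": "C014"},
}


def link_case(case_id: str, raw_postcode: str, level: str) -> dict:
    """Attach the area code at `level` to a case; the postcode itself is not returned."""
    row = DIRECTORY.get(normalise_postcode(raw_postcode))
    return {"case_id": case_id, level: row[level] if row else None}


print(link_case("A001", "zz1 1aa", "ward"))
print(link_case("A002", "ZZ9 9ZZ", "census_area"))  # unmatched postcode gives None
```

Returning only the requested area code, and never the postcode, mirrors the ethics point above: residence is not generally identified at postcode level.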
11
G.I.S – How we link geographies
Current Situation
• Use Postcode / postcode centroid grid reference as our highest
precision variable
• Link geographies using NSPD/AFPD appropriate to the measure
required
Proposed Method
• Use Unique Property Reference Number (UPRN) / property centroid
grid reference as highest precision variable
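To see what is gained by moving from a postcode centroid to a property centroid, the offset between the two grid references can be measured directly. The sketch below assumes both points are given as eastings and northings in metres on the same grid; the coordinates are made up for illustration.

```python
import math


def offset_metres(property_ref: tuple, postcode_centroid: tuple) -> float:
    """Straight-line distance in metres between two (easting, northing) points."""
    d_east = property_ref[0] - postcode_centroid[0]
    d_north = property_ref[1] - postcode_centroid[1]
    return math.hypot(d_east, d_north)


# Fictitious example: a property 300 m east and 400 m north of its postcode centroid.
property_ref = (358300, 173400)
postcode_centroid = (358000, 173000)
print(offset_metres(property_ref, postcode_centroid))
```

Any distance-based exposure (for example, distance to a road or power line) calculated from the postcode centroid carries an error of up to this offset.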
12
G.I.S Problems
• Shifting geographies across time points
• Royal Mail change postcode areas (and therefore postcode
centroids)
• Postcodes are ‘recycled’
• Postcode not precise enough in some cases
• Postcode boundaries do not align with other geographic
boundaries
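One way to cope with recycled postcodes and moving centroids is to keep every version of a postcode with the dates it was in use, and to look up the version that was live on the date of interest. This is a sketch under assumptions: the tuple layout, the dates and the coordinates are invented, and the introduction and termination fields are stand-ins for whatever the directory in use provides.

```python
from datetime import date
from typing import Optional, Tuple

# (postcode, introduced, terminated, easting, northing); terminated None = still live.
# The same postcode appears twice because it was withdrawn and later reused elsewhere.
VERSIONS = [
    ("ZZ11AA", date(1980, 1, 1), date(1998, 6, 1), 358000, 173000),
    ("ZZ11AA", date(2003, 2, 1), None, 361500, 176200),
]


def centroid_at(postcode: str, when: date) -> Optional[Tuple[int, int]]:
    """Return the centroid of the postcode version live on `when`, or None."""
    for code, introduced, terminated, easting, northing in VERSIONS:
        live = introduced <= when and (terminated is None or when < terminated)
        if code == postcode and live:
            return (easting, northing)
    return None


print(centroid_at("ZZ11AA", date(1995, 1, 1)))  # first use of the postcode
print(centroid_at("ZZ11AA", date(2000, 1, 1)))  # postcode not in use on this date
print(centroid_at("ZZ11AA", date(2010, 1, 1)))  # recycled postcode, new location
```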
13
Accuracy issues with analysis at postcode level
(Figure panels: address level; postcode level)
16
Linkage problems with the cohort data
• Missing data
– Especially problematic for the cases who
didn’t enrol in the original recruitment
– Gaps in the address data
– Recorded move date is often the date we were
informed, not the actual move date
• However…
– ONS matched 99.7% of mothers, so we have
their old & new NHS numbers and cleaned
data (original recruitment cases only)
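Gaps in the address data can be flagged automatically before any cleaning or linkage is attempted. The following is a minimal sketch that assumes each case's history is a list of (start, end) date pairs with None for a current address; it reports gaps only and does not handle overlapping spells.

```python
from datetime import date
from typing import List, Optional, Tuple

Spell = Tuple[date, Optional[date]]


def find_gaps(spells: List[Spell]) -> List[Tuple[date, date]]:
    """Return (gap_start, gap_end) for each period not covered by any spell."""
    ordered = sorted(spells, key=lambda spell: spell[0])
    gaps = []
    # Compare each spell's end date with the start date of the spell that follows it.
    for (_, end), (next_start, _) in zip(ordered, ordered[1:]):
        if end is not None and end < next_start:
            gaps.append((end, next_start))
    return gaps


# Fictitious history with a nine-month period where no address is recorded.
history = [
    (date(1991, 4, 1), date(1996, 9, 1)),
    (date(1997, 6, 1), None),
]
print(find_gaps(history))
```

Flagged periods are candidates for filling from linked sources such as the address registered with the GP.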
17
GIS Data Availability
• Collected as administrative resource
• Not yet cleaned, documented and
presented to usual ALSPAC standards
• Initiatives under way to validate and fill
gaps in record
• Schools GIS data in the main not
processed
• Aim to build into standard ALSPAC
resource
18
GIS Ethics
• Postcode level or greater accuracy treated as
a personal identifier
• Research proposals to use these data need
ALSPAC Law & Ethics Approval
• Broader geographical data can be released in
normal manner
• A two-stage process is used to collect and
process precise data
• Data collected via linkage not available for all
cases due to ethical decisions
19
GIS Data Access
Step 1 – Postcodes (or full address) provided to
researcher with unique collection ID with no
other data attached
Step 2 – Researcher attaches their data and
returns file to ALSPAC
Step 3 – ID converted to the appropriate
collaborator ID, postcode data removed
Step 4 – Requested ALSPAC data added to the
file and data sent to the researcher
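The four steps above can be sketched in code to show where each identifier appears and disappears. This is an illustrative sketch rather than the ALSPAC procedure: the function names, ID formats and data values are all invented, and the only point being made is that postcodes and study data never sit in the same file.

```python
import secrets


def step1_export(cases: dict):
    """Step 1: issue postcodes under one-off collection IDs; keep the key in-house."""
    key, rows = {}, []
    for internal_id, postcode in cases.items():
        collection_id = secrets.token_hex(8)  # random, carries no meaning
        key[collection_id] = internal_id
        rows.append({"collection_id": collection_id, "postcode": postcode})
    return rows, key


def steps3_4_return(returned_rows, key, collaborator_ids, study_data):
    """Steps 3 and 4: swap IDs, drop postcodes, then attach the requested study data."""
    released = []
    for row in returned_rows:
        internal_id = key[row["collection_id"]]
        clean = {k: v for k, v in row.items() if k not in ("collection_id", "postcode")}
        clean["collaborator_id"] = collaborator_ids[internal_id]
        clean.update(study_data[internal_id])
        released.append(clean)
    return released


# Step 1: file sent out contains only a collection ID and a (fictitious) postcode.
rows, key = step1_export({"case-1": "ZZ1 1AA"})

# Step 2: the researcher attaches their own derived variable and returns the file.
rows[0]["exposure"] = 3.2

# Steps 3 and 4: the released file has no postcode and no collection ID.
final = steps3_4_return(rows, key, {"case-1": "collab-9"}, {"case-1": {"birthweight": 3400}})
print(final)
```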
20
Andy Boyd
A.W.Boyd@Bristol.ac.uk