Data Quality and Ensuring Usability
…of routinely collected PC data
Presented to
Integrating Clinical and Genetic Datasets: Nirvana or Pandora’s Box
Presented by
Simon de Lusignan
slusigna@sgul.ac.uk
9th May 2006
About me
• GP in Guildford
• 11,500 patient practice
• 6.5 whole-time equivalent GPs
• Computerised since 1988
• Senior Lecturer, St George’s
• Primary Care Informatics (PCI) research group
  - Using routinely collected data for quality improvement + research
  - Electronic libraries
  - Computer in the consultation
  - Telemonitoring
• Chair, PCI Working Group of the European Federation for Medical Informatics (EFMI)
• Developing a BSc in BMI
Overview
• Introduction
• Benefits from linking clinical + genetic data
• Growing volumes of accessible primary care data, increasingly used for quality improvement + research
• Objective
• Is it possible to define the features of a routinely collected dataset that allow it to be integrated with genetic data?
• Method
• Literature review + 10 years of experiential learning working with data
• Features of “quality” data:
1. What is data quality?
2. Unique identifiers + denominators
3. What needs to be defined about data processing + storage
• Discussion
Introduction
• “GIVEN”: benefits from linking clinical and genetic data
• Routinely collected clinical data is used increasingly for:
1. Quality improvement
2. Clinical audit
3. Health service planning
4. Research
References:
1. de Lusignan S, van Weel C. The use of routinely collected computer data for research in primary care: opportunities and challenges. Fam Pract. 2006 Apr;23(2):253-63.
2. de Lusignan S, Hague N, van Vlymen J, Kumarapeli P. Routinely collected general practice data are complex but with systematic processing can be used for quality improvement and research. Accepted for publication in Informatics in Primary Care.
Objective
• To define the features of clinical data that make them fit for integration with genetic data
Features of “quality” data
• Defining data quality
• Unique identifiers
• Defined process of data extraction + storage
Defining data quality
Evolving definitions:
• Completeness + accuracy (Pringle et al. BJGP 1995)
• Currency (Williams, Methods 2003)
• Sensitivity + positive predictive value (Thiru et al., BMJ 2003)
• Data Quality Probe (Brown + Warmington IPC 2003)
• “Fit for purpose” (PCI WG EFMI, 2005)
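To make two of these definitions concrete: completeness can be read as the share of records in which a field is recorded, and sensitivity and positive predictive value (PPV) follow from comparing coded diagnoses against a reference standard, patient by patient. The sketch below is a minimal illustration that assumes such a reference standard (for example a manual notes review) is available; the function names and the data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def completeness(values: Sequence[Optional[object]]) -> float:
    """Share of records in which a field is recorded (None = missing)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def sensitivity_ppv(coded: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Compare coded diagnoses with a reference standard, patient by patient.

    sensitivity = TP / (TP + FN): share of true cases that carry the code
    PPV         = TP / (TP + FP): share of coded patients who are true cases
    """
    if len(coded) != len(truth):
        raise ValueError("coded and truth must cover the same patients")
    tp = sum(c and t for c, t in zip(coded, truth))
    fp = sum(c and not t for c, t in zip(coded, truth))
    fn = sum(t and not c for c, t in zip(coded, truth))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv


# Usage with made-up data: systolic BP values, and five patients' COPD status.
print(completeness([120, None, 135, 140]))  # 0.75
print(sensitivity_ppv([True, True, True, False, False],
                      [True, True, True, True, False]))  # (0.75, 1.0)
```

Sensitivity answers "how many true cases did we miss?", while PPV answers "how many coded cases are wrong?"; a dataset can score well on one and poorly on the other.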
Unique IDs
• Linkage of data
• Interoperability of systems
• Follow-up / traceability of individuals
• Population denominator + “ghosts” (patients still registered who have left the practice)….
• England + Wales
  - NHS number
• Scotland
  - CHI number
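Linkage on NHS numbers is only as good as the numbers themselves. As a minimal illustration of one pre-linkage check, the sketch below validates the Modulus 11 check digit that forms the tenth digit of an NHS number. The function name is illustrative. A passing check shows only that the number is well formed, not that it was recorded against the right patient.

```python
def is_valid_nhs_number(nhs_number: str) -> bool:
    """Check the Modulus 11 check digit of a 10-digit NHS number.

    Spaces are ignored, so "943 476 5919" and "9434765919" are equivalent.
    """
    digits = nhs_number.replace(" ", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    # Multiply the first nine digits by weights 10, 9, ..., 2 and sum.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0          # a remainder of 0 gives check digit 0
    if check == 10:
        return False       # a check digit of 10 is never valid
    return check == int(digits[9])


print(is_valid_nhs_number("943 476 5919"))  # True
print(is_valid_nhs_number("943 476 5918"))  # False
```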
Our system
• “MIQUEST” unique ID for one practice, compounded with:
  - the study number
  - a unique ID for the practice
• Converted to a case-insensitive ASCII format
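A compound identifier of this kind might be assembled along the following lines. This is a sketch under stated assumptions, not the group's actual format. It assumes the three parts are joined with a separator, that “case-insensitive” means folding everything to upper case, and that accented or other non-ASCII characters are reduced to plain ASCII. The separator, field order and function name are hypothetical.

```python
import unicodedata


def compound_id(study_number: str, practice_id: str, miquest_id: str,
                sep: str = "_") -> str:
    """Build a study-wide patient key (hypothetical format).

    A MIQUEST patient ID is only unique within one practice, so it is
    combined with the study number and a unique practice ID.
    """
    parts = []
    for part in (study_number, practice_id, miquest_id):
        # Decompose accented characters, then drop anything outside ASCII.
        ascii_part = (unicodedata.normalize("NFKD", part.strip())
                      .encode("ascii", "ignore")
                      .decode("ascii"))
        # Fold case so "ab12" and "AB12" give the same key.
        ascii_part = ascii_part.upper()
        if not ascii_part or sep in ascii_part:
            # An empty part, or one containing the separator, would make keys ambiguous.
            raise ValueError(f"cannot use {part!r} in a compound ID")
        parts.append(ascii_part)
    return sep.join(parts)


print(compound_id("S01", "p123", "ab12"))  # S01_P123_AB12
```

Rejecting parts that contain the separator keeps the mapping reversible: each key can be split back into exactly one study, practice and patient.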
Processing data
(1) Appreciation of data entry issues + the contemporary perspective of system users;
(2) Defined stages of data processing + the applications used at each stage, + quality controls;
(3) Archived coding systems and the look-up tables used to infer meaning or rubrics;
(4) The queries used to extract the data;
(5) A metadata system to ensure traceability of each cell of data;
(6) The ethical constraints that apply to the dataset.
(1) Data entry issues + contemporary perspective of users
• COPD and bronchitis codes are easily confused
• Half of a practice’s asthmatics recoded from a diagnosis code to a “history of” code
Ref:
Faulconer ER, de Lusignan S. An eight-step method for assessing diagnostic data quality: COPD as an exemplar.
Inform Prim Care. 2004;12(4):243-54.
(2) Defined stages of data processing
We have defined eight discrete steps in data processing:
(1) Design of queries + piloting,
(2) Data entry (already dealt with),
(3) Extraction,
(4) Migration (unique IDs essential),
(5) Integration,
(6) Cleaning,
(7) Processing, and
(8) Analysis.
Ref:
van Vlymen J, de Lusignan S, Hague N, Chan T, Dzregah B. Ensuring the Quality of Aggregated General Practice Data: Lessons from the Primary Care Data Quality Programme (PCDQ). Stud Health Technol Inform. 2005;116:1010-5.
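One way to keep these stages explicit is to run them as an ordered list of named steps, each followed by a quality-control check that halts processing if it fails. The sketch below is a hypothetical scaffold, not the PCDQ tooling. Apart from one example cleaning step (dropping duplicate IDs) and one example check (every record has an ID), the stage functions are placeholders.

```python
from typing import Callable, List, Tuple

Records = List[dict]
Stage = Tuple[str, Callable[[Records], Records], Callable[[Records], bool]]

# The eight steps, in order.
STAGES = ["query design + piloting", "data entry", "extraction", "migration",
          "integration", "cleaning", "processing", "analysis"]


def run_pipeline(records: Records, stages: List[Stage]) -> Tuple[Records, List[str]]:
    """Apply each stage in order, running its quality-control check afterwards.

    Stops with an error at the first failed check; otherwise returns the
    processed records and the names of the stages completed.
    """
    log: List[str] = []
    for name, step, check in stages:
        records = step(records)
        if not check(records):
            raise RuntimeError(f"quality control failed after stage: {name}")
        log.append(name)
    return records, log


def unchanged(rs: Records) -> Records:
    # Placeholder for stages whose real work happens elsewhere.
    return rs


def drop_duplicate_ids(rs: Records) -> Records:
    # Example cleaning step: keep the first record for each unique ID.
    seen, out = set(), []
    for r in rs:
        if r["id"] not in seen:
            seen.add(r["id"])
            out.append(r)
    return out


def every_record_has_id(rs: Records) -> bool:
    # Example check: migration and linkage depend on unique IDs being present.
    return all(r.get("id") for r in rs)


stages = [(name, drop_duplicate_ids if name == "cleaning" else unchanged,
           every_record_has_id) for name in STAGES]

records, log = run_pipeline([{"id": "A"}, {"id": "A"}, {"id": "B"}], stages)
print(len(records), len(log))  # 2 8
```

Logging the completed stage names gives a simple audit trail of how far a dataset got and where it failed.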
(3) Archive coding systems….
• Coding systems are constantly evolving
• In general, coding systems are becoming larger + more complex
• You can go from many to few, but not from few to many…
• We archive:
  - The clinical codes look-up engine used, e.g. NHS Triset Browser
  - Each relevant version, e.g. 4- and 5-byte Read Codes; Drug Dictionary; proprietary codes
Example of “look-up engine”
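The point that you can go from many to few, but not from few to many, can be shown with a versioned look-up table. Detailed codes collapse cleanly onto a broader concept. The reverse mapping, however, only yields a set of candidates, so an old extract can be reinterpreted only if the detailed codes and the table version in use at the time were archived. All codes and the version label below are invented for illustration and are not real Read codes.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Archived look-up tables, one per coding-system version. The codes and
# version label are invented for illustration, not real Read codes.
ARCHIVE: Dict[str, Dict[str, str]] = {
    "v1": {"demo-01": "COPD", "demo-02": "COPD",
           "demo-03": "COPD", "demo-04": "asthma"},
}


def to_coarse(codes: List[str], version: str) -> List[str]:
    """Many to few: map detailed codes to broader concepts."""
    table = ARCHIVE[version]
    return [table[code] for code in codes]


def candidates(concept: str, version: str) -> Set[str]:
    """Few to many: a broad concept only narrows the code to a set of candidates."""
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for code, broad in ARCHIVE[version].items():
        reverse[broad].add(code)
    return set(reverse.get(concept, set()))


print(to_coarse(["demo-01", "demo-03"], "v1"))  # ['COPD', 'COPD']
print(sorted(candidates("COPD", "v1")))  # ['demo-01', 'demo-02', 'demo-03']
```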
(4) The query library
• Re-issued by date
• Query set for each clinical programme
  - e.g. C1, C2, C3 – Cardiac programme
• Query set for each extraction type
  - e.g. E4, E5, G4, G5 (E for EMIS, G for Generic)
• Defined look-up tables + rubrics for queries
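A query library organised this way can be treated as a table keyed by clinical programme, extraction type and issue date. This lets the exact query set behind any extract be recovered later. The sketch assumes that keying and adds a field for ethical constraints indexed against each set. The field names, dates, file names, constraint text and the rule "latest issue on or before a given date" are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class QuerySet:
    programme: str        # clinical programme, e.g. "C2" (cardiac)
    extraction: str       # extraction type, e.g. "E5" (E for EMIS, G for Generic)
    issued: date          # query sets are re-issued by date
    lookup_table: str     # look-up table + rubrics used (hypothetical file name)
    ethics: Tuple[str, ...] = ()  # ethical constraints indexed against this set


def find_query_set(library: List[QuerySet], programme: str,
                   extraction: str, on: date) -> Optional[QuerySet]:
    """Return the most recent issue of a query set on or before a given date."""
    issues = [q for q in library
              if q.programme == programme and q.extraction == extraction
              and q.issued <= on]
    return max(issues, key=lambda q: q.issued, default=None)


# Illustrative library entries; dates, file names and constraint text are made up.
library = [
    QuerySet("C2", "E5", date(2004, 1, 1), "lookup_2004.csv"),
    QuerySet("C2", "E5", date(2005, 6, 1), "lookup_2005.csv",
             ("example constraint: aggregate analysis only",)),
    QuerySet("C2", "G4", date(2005, 6, 1), "lookup_2005_generic.csv"),
]

print(find_query_set(library, "C2", "E5", date(2005, 1, 1)).lookup_table)  # lookup_2004.csv
```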
The query library…
The “C2” queries
The “C2” EMIS 5-Byte set
(5) Metadata system
• Follows data from query set to analysis
• Preserves original data
• Derived variables clearly identified
• Associated dates + numerics labelled
  - Rules for units used
• Look-up table used to define variable names
Ref:
van Vlymen J, de Lusignan S. A system of metadata to control the process of query, aggregating, cleaning and analysing large datasets of primary care data. Inform Prim Care. 2005;13(4):281-91.
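The metadata rules above could be captured as one record per variable, so that any value in an analysis file can be traced back to its source. The sketch below assumes that shape. Original variables are kept untouched, derived variables are flagged together with the variables they were computed from, and numerics carry a unit. The field names and example variables are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VariableMeta:
    name: str                    # variable name, defined via a look-up table
    query_set: str               # originating query set, e.g. "C2"
    derived: bool = False        # derived variables are flagged, originals kept
    sources: List[str] = field(default_factory=list)  # inputs of a derived variable
    unit: Optional[str] = None   # unit rule for a numeric value, if any


def lineage(name: str, catalogue: Dict[str, VariableMeta]) -> List[str]:
    """List the original (extracted) variables a variable depends on.

    Assumes derivations never form a cycle.
    """
    meta = catalogue[name]
    if not meta.derived:
        return [name]
    result: List[str] = []
    for source in meta.sources:
        for original in lineage(source, catalogue):
            if original not in result:
                result.append(original)
    return result


# Illustrative catalogue; names and units are examples only.
catalogue = {
    "sbp": VariableMeta("sbp", "C2", unit="mmHg"),
    "dbp": VariableMeta("dbp", "C2", unit="mmHg"),
    "bp_controlled": VariableMeta("bp_controlled", "C2", derived=True,
                                  sources=["sbp", "dbp"]),
}

print(lineage("bp_controlled", catalogue))  # ['sbp', 'dbp']
```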
Source data – metadata structure
[Figure: structure of a variable name (example from the “C2” query set), annotated with its parts: originating query set bigram, query file, Read code / core clinical concept (CCC), repeat index, type bigram]
BIGRAM   MEANING
DI       Diagnosis
RX       Drugs / Prescription
OC       Occupation
HO       History / Symptoms
OE       Examination / Signs
Linking elements:
• Query library
• Query & core clinical concept
• Read code
• Core clinical concept (CCC)
• Automation
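Names with this structure can be generated and decoded mechanically. The sketch below assumes the five parts (originating query set bigram, query file, Read code / CCC, repeat index, type bigram) appear in that order, separated by underscores, and that no part itself contains an underscore. It uses the type bigrams DI, RX, OC, HO and OE with their listed meanings. The separator, field order, example values and function names are assumptions, not the documented naming format.

```python
from typing import NamedTuple

# Type bigrams and their meanings, as listed on the slide.
TYPE_BIGRAMS = {
    "DI": "Diagnosis",
    "RX": "Drugs / Prescription",
    "OC": "Occupation",
    "HO": "History / Symptoms",
    "OE": "Examination / Signs",
}


class VarName(NamedTuple):
    query_set: str    # originating query set bigram, e.g. "C2"
    query_file: str   # query file identifier
    code: str         # Read code or core clinical concept (CCC)
    repeat: int       # repeat index for repeated entries
    type_bigram: str  # one of TYPE_BIGRAMS


def build(v: VarName) -> str:
    """Compose a variable name, assuming "_" separates the fields."""
    if v.type_bigram not in TYPE_BIGRAMS:
        raise ValueError(f"unknown type bigram: {v.type_bigram}")
    return "_".join([v.query_set, v.query_file, v.code, str(v.repeat), v.type_bigram])


def parse(name: str) -> VarName:
    """Split a variable name back into its fields (raises ValueError if malformed)."""
    fields = name.split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {name}")
    query_set, query_file, code, repeat, type_bigram = fields
    if type_bigram not in TYPE_BIGRAMS:
        raise ValueError(f"unknown type bigram: {type_bigram}")
    return VarName(query_set, query_file, code, int(repeat), type_bigram)


name = build(VarName("C2", "QF01", "CCC01", 1, "DI"))  # hypothetical field values
print(name, TYPE_BIGRAMS[parse(name).type_bigram])  # C2_QF01_CCC01_1_DI Diagnosis
```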
(6) Ethics
• The ethical constraints on any dataset are indexed in the query library
Summary
• Data quality is best defined in terms of “fitness for purpose”: fit for what purpose, and when?
• Transparent methods of data processing allow results to be audited
• Understanding data entry issues / context is essential
• Metadata can help control processing
• Careful curation of data may allow its use beyond the timescale of the original study
Thanks for listening
Simon de Lusignan
Tel: 020 8725 5661
Fax: 020 8767 7697
Email: slusigna@sgul.ac.uk
Web: www.gpinformatics.org
     www.sgul.ac.uk/informatics/