C. difficile

THE FLOW OF DATA
September 27, 2012
The Flow of Data
•Data sources
•Data streams
•Databases
•Data repositories
•Data warehouses
Data Source
An entity that collects the data:
• Health care setting – hospital, clinic
• Diagnostic facilities – labs, mobile unit
• Research laboratories
• Schools
• Work places
• Government agencies
• Surveillance system
Data Stream
A constant flow of a specific type of data
• Death reports
• Laboratory diagnostic data
• Insurance claims
• Pharmaceutical sales
• Website searches
• Infection reports
• Surveillance data
Database
An organized collection of data
• Allows maintenance of complex information
• Organized in a way relevant to its purpose
• Allows quick selection of desired data – searchable
Data Repository
A location to safely store and compile data from
similar sources
Data Warehouse
A database of compiled data, used for analysis,
storage and reporting
Often has the purpose of enabling decision making
Databases and Health
• Person (or animal or population) – Place – Time
• Concept from descriptive epidemiology
• Characterizes health events
• Helps understand why events happen
• Who is at risk? Where? When?
• Allows formation of hypotheses for research
• Databases can capture what is happening to either an
individual or a population in a certain place at a certain
time
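As a rough illustration of the person-place-time idea, the sketch below builds a tiny in-memory database with Python's sqlite3 module and asks who, where and when. The table name case_report, its columns and the five rows are invented for illustration; they are not taken from any real data source.

```python
import sqlite3

# Hypothetical table: one row per health event, recording
# person (age group), place (state) and time (year).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE case_report ("
    " id INTEGER PRIMARY KEY,"
    " age_group TEXT NOT NULL,"   # person
    " state TEXT NOT NULL,"       # place
    " year INTEGER NOT NULL)"     # time
)

# Made-up example rows.
rows = [
    ("65-74", "MA", 2003),
    ("75-84", "MA", 2003),
    ("75-84", "MA", 2004),
    ("85+", "OH", 2004),
    ("75-84", "OH", 2004),
]
conn.executemany(
    "INSERT INTO case_report (age_group, state, year) VALUES (?, ?, ?)", rows
)

def count_cases(connection, by):
    """Count cases grouped by one descriptive dimension."""
    if by not in {"age_group", "state", "year"}:
        raise ValueError(f"unknown dimension: {by}")
    query = f"SELECT {by}, COUNT(*) FROM case_report GROUP BY {by} ORDER BY {by}"
    return connection.execute(query).fetchall()

who = count_cases(conn, "age_group")    # Who is at risk?
where = count_cases(conn, "state")      # Where?
when = count_cases(conn, "year")        # When?
print(who, where, when)
```

The same table can describe an individual (one row) or a population (grouped counts), which is the point made in the last bullet above.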
Example: C. difficile infection in the elderly
Centers for Medicare and Medicaid Services
Database contained data for over 1 million
C. difficile cases from 1991-2004
Objectives and Hypotheses:
1. Does the age-related rate acceleration of C.
difficile in the elderly vary geographically?
H1: Varies geographically, similarly to the rate
2. Does livestock density influence age-related rate
acceleration?
H2: Increases with increasing livestock density
Geographic Distribution
US C. difficile age-related rate acceleration
[Map legend: C. difficile rate increase per year of age, in classes of 2.1%, 3.7 to 5%, 5.1 to 6.5%, and 6.6 to 7.9%]
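One way to read "rate increase per year of age" is as the slope of a log-linear fit of the infection rate against age. The slides do not say how the acceleration was estimated, so the method below is an assumption, and the ages and rates are invented so that the answer is known in advance. A minimal sketch:

```python
import math

def percent_increase_per_year(ages, rates):
    """Least-squares slope of log(rate) on age, expressed as a percent per year.

    Assumes rates grow roughly exponentially with age; this is an
    illustrative choice, not the method used in the study.
    """
    n = len(ages)
    logs = [math.log(r) for r in rates]
    mean_age = sum(ages) / n
    mean_log = sum(logs) / n
    sxy = sum((a - mean_age) * (l - mean_log) for a, l in zip(ages, logs))
    sxx = sum((a - mean_age) ** 2 for a in ages)
    slope = sxy / sxx
    return (math.exp(slope) - 1) * 100

# Hypothetical rates per 100,000 that grow exactly 5% per year of age from 65.
ages = list(range(65, 91))
rates = [100 * 1.05 ** (age - 65) for age in ages]

acceleration = percent_increase_per_year(ages, rates)
print(f"{acceleration:.1f}% per year of age")
```

Running the same function on each state's age-specific rates would give one value per state, which is the kind of quantity a map legend in percent per year of age could display.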
Accumulated Data Over Time
US C. difficile rate 1991-2004
C. difficile rate acceleration and livestock density
by state
Human population – place – time
200 Countries, 200 years, 4 minutes
Considerations for Data Use
• Timeliness
• When was the data collected? Recent enough?
• Accessibility
• Who has access to the data? How to gain access?
• Comparability
• Are the data in the database comparable for use
together?
• Data coming from different sources!
• Compatibility
• Are the data in the database compatible? With data from
other sources? With the research question?
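A small sketch of two of these checks, timeliness and comparability, follows. The source names, the rates, the analysis year and the five-year cutoff are all hypothetical; the point is only that rates reported on different bases need a common denominator before they are used together.

```python
def is_timely(collected_year, analysis_year, max_age_years=5):
    """True if the data were collected within max_age_years of the analysis."""
    return 0 <= analysis_year - collected_year <= max_age_years

def to_per_100k(value, per):
    """Rescale a rate expressed 'per `per` people' to per 100,000 people."""
    return value * 100_000 / per

# Hypothetical sources reporting on different bases and in different years.
sources = [
    {"name": "source_a", "rate": 1.2, "per": 1_000, "year": 2004},
    {"name": "source_b", "rate": 95.0, "per": 100_000, "year": 1991},
]

comparable = {s["name"]: to_per_100k(s["rate"], s["per"]) for s in sources}
timely = [s["name"] for s in sources if is_timely(s["year"], analysis_year=2005)]

print(comparable)
print(timely)
```

Whether five years is "recent enough" depends on the research question; the cutoff here is only a placeholder.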
Primary vs Secondary Data
Primary data
• Data that was collected for the analysis being performed
• Examples:
• Use of laboratory data collected by a hospital to provide care
for an individual
• Treatment trial
• Laboratory experiment
Primary vs Secondary Data
Secondary data
• Data collected for another purpose and now being used
for a different analysis
• Examples:
• Re-use of data for any purpose
• Systematic review
• Use hospital records for a retrospective study
Uncertainty in the Primary Data
Consider in secondary use of the data!
Accuracy
• Degree to which a measurement reflects the true
value (data predicts the true population mean)
Precision
• Degree to which repeated measurements obtain
the same results (data is repeatable)
Bias
• Lacking neutrality or having a one-sided view
Accuracy vs Precision
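The difference can be made concrete with repeated measurements of a known value. In the sketch below, accuracy is summarized as the mean error (average minus true value) and precision as the sample standard deviation; these are common summaries but only one possible choice, and the measurement values are made up.

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Return (mean_error, spread) for repeated measurements.

    mean_error near 0 -> accurate; small spread -> precise.
    """
    mean_error = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return mean_error, spread

true_value = 10.0

# Hypothetical instruments measuring the same true value.
accurate_but_imprecise = [8.0, 12.0, 9.0, 11.0]
precise_but_inaccurate = [12.0, 12.1, 11.9, 12.0]

error_a, spread_a = accuracy_and_precision(accurate_but_imprecise, true_value)
error_p, spread_p = accuracy_and_precision(precise_but_inaccurate, true_value)

print(f"accurate but imprecise: mean error {error_a:+.2f}, spread {spread_a:.2f}")
print(f"precise but inaccurate: mean error {error_p:+.2f}, spread {spread_p:.2f}")
```

A consistent non-zero mean error, as in the second instrument, is the kind of systematic problem that carries over unnoticed into secondary use of the data.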
Quality of Primary Data
• Cannot assume primary data is high quality
• In addition to being accurate and precise, also
consider:
• Relevance – is the data useful to your research question?
• Timeliness – is the data available when needed?
• Completeness – is there missing data?
Improving Data Quality
• Correcting (after entry) – time consuming, possibly
expensive
• Avoiding quality issues:
• Avoid missing data
• Avoid entry errors (typos, etc.)
• Enter data into a database for use quickly
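Checking records at the moment of entry is usually cheaper than correcting them later. The sketch below flags missing fields and simple entry errors; the field names (age, state, collected_on) and the plausible age range are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical required fields for one record.
REQUIRED_FIELDS = ("age", "state", "collected_on")

def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []

    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")

    # Entry errors: age must be a plausible whole number.
    age = record.get("age")
    if age not in (None, "") and not (isinstance(age, int) and 0 <= age <= 120):
        problems.append("age out of range")

    # Entry errors: date must parse as YYYY-MM-DD.
    collected = record.get("collected_on")
    if collected not in (None, ""):
        try:
            date.fromisoformat(collected)
        except (TypeError, ValueError):
            problems.append("collected_on is not an ISO date")

    return problems

good = {"age": 82, "state": "MA", "collected_on": "2004-03-15"}
bad = {"age": 820, "state": "", "collected_on": "2004-15-03"}

print(validate_record(good))
print(validate_record(bad))
```

Rejecting or flagging a record while the person entering it can still check the source is the "avoiding" strategy from the list above, as opposed to correcting after entry.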
Secondary Use of Data
Why do it?
• New research question
• Analysis
• Public health investigation
• Marketing
• Population-level monitoring of health
• Retrospective analysis
• Cost saving
• Proof of concept
Secondary Use of Data
Conservation Medicine Applications
• Not possible to measure individual-level exposures in
people or animals
• Ethics
• Cost
• Not possible
• An exposure often shared by many in a population
• Exposure may be limited to a specific population
• Limited-scale effects may be hard to study without
population-level data
Ethical Considerations in Secondary Data Use
• For humans – data derived from patients
• Individual rights? Restrict use after anonymization?
• Domestic animals – pets, livestock
• Owner or farmer rights?
• Wildlife and ecosystem
• Public?
• Who owns data?
• Who has the right to access it?
• For what purpose can it be used?
• Data use and sharing agreements
• Public policy issues
Data Confidentiality
Example: MDPH Confidentiality Agreement
Field Trips! Thursday Oct 4th
Primary Data Sources
Visit to Angell Animal Medical Center
Time and location: Angell Animal Medical Center, 350 South
Huntington Avenue, Jamaica Plain, MA 02130
1pm-2pm
Visit to the State Lab Institute Epidemiology Unit
Meet with Johanna Vostok, Lynda Glenn and Gillian Haney, Room 123,
MDPH State Laboratory Institute, 305 South Street, Jamaica Plain,
MA 02130
2:45-4pm
Assignment for Oct 3rd
5-10 minute group presentation
• Progress report on systematic review:
• Research question
• Literature review strategy (keywords, databases, etc.)
• Retrieved article coding form
• Selection criteria
• Problems encountered
• Solutions?
• Collaboration needed?
Systematic Review Project
• Paper due October 18th
• Presentation on October 24th, 9-12, at TIE
• Paper format – like a journal article:
• Title
• Abstract
• Introduction/Background
• Methods
• Results
• Discussion/Conclusions
• References