THE FLOW OF DATA
September 27, 2012

The Flow of Data
• Data sources
• Data streams
• Databases
• Data repositories
• Data warehouses

Data Source
An entity that collects data:
• Health care settings – hospitals, clinics
• Diagnostic facilities – laboratories, mobile units
• Research laboratories
• Schools
• Workplaces
• Government agencies
• Surveillance systems

Data Stream
A continuous flow of a specific type of data:
• Death reports
• Laboratory diagnostic data
• Insurance claims
• Pharmaceutical sales
• Website searches
• Infection reports
• Surveillance data

Database
An organized collection of data that:
• Allows maintenance of complex information
• Is organized in a way relevant to its purpose
• Allows quick selection of desired data (searchable)

Data Repository
A location to safely store and compile data from similar sources

Data Warehouse
A database for the analysis of compiled data, used for storage and reporting; often intended to support decision making

Databases and Health
• Person (or animal or population) – Place – Time
  – Concept from descriptive epidemiology
  – Characterizes health events
  – Helps explain why events happen
  – Who is at risk? Where? When?
  – Allows formation of hypotheses for research
• Databases can capture what is happening to an individual or a population in a certain place at a certain time

Example: C. difficile Infection in the Elderly
The Centers for Medicare and Medicaid Services database contained data on over 1 million C. difficile cases from 1991-2004.
Objectives and hypotheses:
1. Does the age-related rate acceleration of C. difficile in the elderly vary geographically?
   H1: It varies geographically in a pattern similar to the rate
2. Does livestock density influence the age-related rate acceleration?
   H2: It increases with increasing livestock density

Geographic Distribution
[Map: US C. difficile age-related rate acceleration; legend shows C. difficile rate increase per year of age: 2.1%, 3.7 to 5%, 5.1 to 6.5%, 6.6 to 7.9%]

Accumulated Data Over Time
[Figure: US C. difficile rate, 1991-2004]

C. difficile Rate Acceleration and Livestock Density by State
[Figure]

Human Population – Place – Time
[Video: 200 Countries, 200 Years, 4 Minutes]

Considerations for Data Use
• Timeliness
  – When were the data collected? Are they recent enough?
• Accessibility
  – Who has access to the data? How can access be obtained?
• Comparability
  – Are the data in the database comparable for use together?
  – The data may come from different sources!
• Compatibility
  – Are the data in the database compatible with each other? With data from other sources? With the research question?

Primary vs Secondary Data
Primary data
• Data collected for the analysis being performed
• Examples:
  – Use of laboratory data collected by a hospital to provide care for an individual
  – Treatment trial
  – Laboratory experiment
Secondary data
• Data collected for another purpose and now used for a different analysis
• Examples:
  – Re-use of data for any purpose
  – Systematic review
  – Use of hospital records for a retrospective study

Uncertainty in the Primary Data
Consider these in any secondary use of the data!
Accuracy
• Degree to which a measurement reflects the true value (how well the data estimate the true population mean)
Precision
• Degree to which repeated measurements obtain the same results (the data are repeatable)
Bias
• Lacking neutrality or having a one-sided view (systematic error)

Accuracy vs Precision
[Figure]

Quality of Primary Data
• Do not assume primary data are high quality
• In addition to accuracy and precision, also consider:
  – Relevance – are the data useful for your research question?
  – Timeliness – are the data available when needed?
  – Completeness – are there missing data?

Improving Data Quality
• Correcting (after entry) – time consuming, possibly expensive
• Avoiding quality issues:
  – Avoid missing data
  – Avoid entry errors (typos, etc.)
  – Enter data into a database promptly so they are available for use

Secondary Use of Data
Why do it?
• New research questions
• Analysis
• Public health investigation
• Marketing
• Population-level monitoring of health
• Retrospective analysis
• Cost saving
• Proof of concept

Secondary Use of Data: Conservation Medicine Applications
• Individual-level exposures in people or animals often cannot be measured, because of:
  – Ethics
  – Cost
  – Feasibility
• An exposure is often shared by many in a population
• An exposure may be limited to a specific population
• Effects of limited scale may be hard to study without population-level data

Ethical Considerations in Secondary Data Use
• Humans – data derived from patients
  – Individual rights? Restrict use after anonymization?
• Domestic animals – pets, livestock
  – Owner or farmer rights?
• Wildlife and ecosystems
  – Public?
• Who owns the data?
• Who has the right to access them?
• For what purposes can they be used?
• Data use and sharing agreements
• Public policy issues

Data Confidentiality
Example: MDPH Confidentiality Agreement

Field Trips! Thursday, Oct 4th
Primary Data Sources
Visit to Angell Animal Medical Center
• Time and location: 1pm-2pm, Angell Animal Medical Center, 350 South Huntington Avenue, Jamaica Plain, MA 02130
Visit to the State Lab Institute Epidemiology Unit
• Meet with Johanna Vostok, Lynda Glenn and Gillian Haney
• Time and location: 2:45-4pm, Room 123, MDPH State Laboratory Institute, 305 South Street, Jamaica Plain, MA 02130

Assignment for Oct 3rd
5-10 minute group presentation
• Progress report on systematic review:
  – Research question
  – Literature review strategy (keywords, databases, etc.)
  – Retrieved article coding form
  – Selection criteria
  – Problems encountered
  – Solutions?
  – Collaboration needed?

Systematic Review Project
• Paper due October 18th
• Presentation on October 24th, 9-12 at TIE
• Paper format – like a journal article:
  – Title
  – Abstract
  – Introduction/Background
  – Methods
  – Results
  – Discussion/Conclusions
  – References
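The accuracy, precision, and completeness criteria from the "Uncertainty in the Primary Data" and "Quality of Primary Data" slides can be checked numerically before primary data are reused. Below is a minimal Python sketch under stated assumptions; it is not a method from the lecture. The function name summarize_quality is hypothetical, and so is recording missing entries as None. Accuracy is summarized as bias (mean minus a known true value) and precision as the sample standard deviation, which is one common choice among several. The measurement values are invented for illustration.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def summarize_quality(measurements: Sequence[Optional[float]], true_value: float) -> dict:
    """Summarize completeness, accuracy, and precision of repeated measurements.

    Illustrative assumptions (not from the lecture):
    - missing entries are recorded as None
    - accuracy is summarized as bias = mean(observed) - true_value
    - precision is summarized as the sample standard deviation
    """
    if not measurements:
        raise ValueError("no measurements supplied")
    observed = [m for m in measurements if m is not None]
    if len(observed) < 2:
        # stdev() needs at least two values to measure repeatability
        raise ValueError("need at least two non-missing measurements")
    completeness = len(observed) / len(measurements)  # share of non-missing entries
    bias = mean(observed) - true_value  # closer to 0 = more accurate
    spread = stdev(observed)  # smaller = more precise (more repeatable)
    return {"completeness": completeness, "bias": bias, "sd": spread}


if __name__ == "__main__":
    # Hypothetical repeated measurements of a quantity whose true value is 100
    lab_a = [101.0, 99.0, 100.0, 100.0, None]  # one missing entry
    lab_b = [110.0, 110.5, 109.5, 110.0, 110.0]
    print("Lab A:", summarize_quality(lab_a, true_value=100.0))
    print("Lab B:", summarize_quality(lab_b, true_value=100.0))
```

In this hypothetical example, Lab A is accurate (bias near 0) but incomplete and less precise. Lab B is complete and more precise but inaccurate, with a systematic offset of +10. This mirrors the slides' point that precise data are not necessarily accurate.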