PH_DPH_02_01_Epidemiology and Routine Data

Introduction to Epidemiology

Epidemiology is one of the founding disciplines of public health. It involves the study of health states in natural populations. In many scientific fields, experiments can be carried out in controlled laboratory conditions where all factors except the one under study are held constant. In epidemiology, this is often not possible, as many of the influences on health, such as behaviour and social and economic conditions, cannot be experimentally controlled. Instead, the population serves as a “living laboratory”, and controlling for confounders requires additional techniques.

A number of situations are amenable to epidemiological analysis, such as the investigation of an outbreak of disease, clusters of a rare illness, differences in disease occurrence, and more.

Formally defined, epidemiology is the study of the distribution of health states, and their determinants, in the population. In practice this involves careful description and comparison of groups, investigation, interpretation and understanding of data, and awareness of the sources of bias in information.

Routinely Available Data Sources

In public health, epidemiological studies are generally undertaken to address specific questions about health and disease, populations, and the factors that influence them. Generally, the data will be gathered specifically to fulfil the study’s particular purpose. However, studies can also draw on “routinely collected data”, which is analysed to assess a population’s health status and investigate the pattern of disease occurrence. This is typically done for observational or descriptive purposes.

There are many sources of routinely available data that can be used for epidemiological purposes. Some of it can be used for analytical research as well. Some data is collected on a continuous, systematic basis, while some is collected intermittently.
Routine data may cover the entire population, part of it, or some sample thereof. If it is collected intermittently, this can be done with repeated cross-sectional surveys, or by following the same individuals over time. Data can be collected to reflect the health experience of individuals over time, or it may be based on episodes of disease.

The term “routine” refers to data gathered as part of some ongoing data collection system, as opposed to a predefined study with a specific purpose. Such data may be collected for legal or administrative purposes, for instance.

Routine data is usually easy to collect, at low cost, and is often stable over long periods of time. The downside is that it is often not collected specifically to answer the question under study. The data may not always be fully representative, and some more marginal populations may not be well represented in the data. Some data may not be well defined or readily usable for specific queries, and accuracy is not always guaranteed. There may also be changes in how the data is collected or coded over time (e.g. changes to Medicare numbers).

The amount of routinely collected data is growing, as is the capability of technology to record and aggregate it. The term “big data” refers to the capability to know more about individuals and the groups they fall into, in order to yield meaningful information.

Traditionally, routine data can be thought of in four categories: demographic data, mortality data, morbidity data, and health facility usage data. The widening of the concepts of health, of the range of influences on health and disease, and of the data sources available means that these classifications are no longer representative of all available routine data. In the modern era, the expectation is that routine data sources will enable description of the population based on the numbers and characteristics of people being born, living, dying, and becoming unwell.
Census Data

The main source of demographic data in the UK, as in many countries, is the national census. The census was developed on the basis that knowledge of a country and its people must inform legislation and diplomacy. Before each census, most countries undertake public consultation on its methodology and content. Censuses have become more comprehensive and detailed over time.

All people alive on the night of the census must be counted by law, traditionally on the basis of whichever establishment or household they spent that night in. A household is defined as at least one person, or a group, living at the same address with common housekeeping (e.g. sharing at least one meal or living space).

There is usually a central bureau or statistical office that conducts the census. Data should be collected under conditions of strict confidentiality. Generally, complicated questions should not be asked in the census. Completeness of the data requires reliable inputs, and the response rate is essential for the reliability of a census. Men aged 25-29 have the lowest completion rate in the UK.

The census provides detailed information on the inhabitants of a country, provided it is well conducted. High response rates and coverage can provide comprehensive information. Some minority subsets of the population may be under-represented, although analytical techniques can be used to adjust for this.

Census data can be applied at the local level to allocate resources. Much of the data can also be found online in aggregate reports.

Various factors can interfere with a census. For instance, in areas with highly migrant populations, the results can be influenced by whether the data are recorded for the normal residents or for the migrant residents present at the time of the census.

Civil Registration and Vital Statistics

In many countries, it is compulsory for births and deaths to be legally registered.
These registrations are an important data source for planning, policymaking and describing the health of the population. Birth data is usually provided to the registration system by the family or health provider. Registration of death occurs via death certification. There can be a time lag between death certification and death registration.

Death certificates allow the doctor to certify the diseases or conditions that led directly to the death, as well as related conditions that were not immediately fatal. The quality of mortality statistics depends on the accuracy of this certification. Note that the comorbidities contributing to a death are not always well known to the certifier, especially as many death certificates are completed by junior staff. Post-mortem examination is the gold standard for determining the cause of death, but it is performed increasingly infrequently.

Data on Disease Occurrence and Disability

Mortality data is of course important for public health purposes, but it does not capture the full extent of disease burden, as not all diseases are fatal, or universally or consistently so. As such, other sources of data on disease morbidity and mortality should be used. For some diseases, mortality is a reasonable proxy for the disease burden, but for many it is not.

Data on disease and disability can be derived from a number of sources. These fall into four main categories:
o Primary care records
o Hospital databases
o Disease or case registries (including notifiable diseases)
o Household and population surveys

No single source of data can give a complete picture of the disease burden within the population. Note that many data sources involve, explicitly or implicitly, collection of information from patients who have made contact with health services. This does not necessarily reflect the total burden on the community.

Many countries have traditionally used hospital inpatient data as an indicator of disease in the larger population.
This only works accurately if the disease mandates hospital admission (e.g. hip fracture).
o For many conditions, such as arthritis, respiratory conditions, etc, there can be a significant impact on the population and community without a great deal of inpatient hospital care.

Some countries use hospital episodes as the metric of hospital care. An episode refers to a period of treatment under the care of a hospital consultant.
o Hospital episodes are generally coded under administrative, demographic, diagnostic and procedural codes.
o Other data can include the facility, address, birth date, postcode, admission details, discharge details, specialty of admission, and others.
o Maternity admissions will often detail information about the mother and the baby.

In principle, primary care data could cover a greater variety of conditions. In practice this is more complex, as data coding in primary care differs for a number of reasons:
o More patients in primary care have complex undifferentiated symptoms.
o Diagnostic investigations are not as widely used in primary care.

Some practices contribute data to monitoring programs such as the BEACH study. The use of electronic records in other settings has also permitted wider data extraction.

Mandatory reporting of certain diseases, including notifiable communicable diseases and cancers, has enabled the construction of disease registers, which often provide an accurate source of data on disease frequency, and in-depth information on individual cases. Many jurisdictions, including Australian states, operate cancer registries. These collect information on type, site, histology, and other important details. Ideally the data is collected prior to cure or death, although these outcomes should also be recorded. Registries of these diseases also allow survival statistics to be calculated.
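As a rough illustration of how a registry supports survival statistics, the sketch below computes an observed 5-year survival proportion from a handful of made-up registry records. The records, the `observed_survival` function and the cutoff date are all hypothetical, and the sketch assumes that anyone without a recorded death was still alive at the cutoff. Real registries typically use more careful methods that handle incomplete follow-up.

```python
from datetime import date

# Hypothetical registry records: (diagnosis date, death date or None if no death recorded)
cases = [
    (date(2010, 3, 1), date(2012, 6, 15)),
    (date(2011, 7, 20), None),
    (date(2012, 1, 5), date(2018, 2, 1)),
    (date(2013, 9, 30), None),
    (date(2019, 5, 10), None),  # diagnosed too recently to assess 5-year survival
]

def observed_survival(cases, years, cutoff):
    """Proportion of cases alive `years` after diagnosis.

    Only cases diagnosed at least `years` before `cutoff` are counted,
    so that every counted case has had the full follow-up period.
    Assumes no diagnosis falls on 29 February (date.replace would fail).
    """
    survived = 0
    eligible = 0
    for diagnosed, died in cases:
        target = diagnosed.replace(year=diagnosed.year + years)
        if target > cutoff:
            continue  # follow-up period not yet complete
        eligible += 1
        if died is None or died > target:
            survived += 1
    return survived / eligible if eligible else None

rate = observed_survival(cases, years=5, cutoff=date(2020, 12, 31))
print(f"Observed 5-year survival: {rate:.0%}")
```

Restricting the denominator to cases with complete follow-up is the simplest way to avoid counting recent diagnoses as survivors by default.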
A good disease register has the following characteristics:
o There exist clear criteria for the disease, which determine who is included in the register.
o Individuals who meet the disease criteria are registered and recorded.
o The register is longitudinal, meaning that the information about each individual is updated in a defined, systematic manner.
o It is based on a geographically defined population.
o It is “assiduously curated” over many years.
o There are “high standards of information governance”.

Government-run surveys provide further information about health and disease within the population, and can record data on the prevalence of certain long-standing conditions.

Data on Health-related Factors and Behaviours

Information about health-related behaviours and risk factors is often not well recorded in routine data systems. Research studies and population surveys are therefore important sources of data on health-related behaviours. Some factors are reasonably well recorded in routine data systems, such as height and weight, blood pressure, serum cholesterol and vaccination status.

The amounts of alcohol and cigarettes sold within many countries are logged, even if this is done for other purposes such as taxation. This can still provide useful information on health-related risk factors. Other factors may be collected in surveys such as the ABS National Health Survey.

Data on the Social Determinants of Health

This data is even less well recorded than risk factors (e.g. occupation, income or education level are usually not recorded in standard systems). The census remains an important source of information regarding these determinants.

Data linkage techniques can be used to combine findings from different sources. For example, data from the census can be linked to data from GP practices to identify correlations between deprivation in an area and the incidence of disease in that same area.
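The area-level linkage described above can be sketched as follows. This is a minimal illustration using invented figures: the area codes, the `deprivation_index` field and the `link_areas` function are all hypothetical, and real linkage would work with official area identifiers and far larger datasets.

```python
# Hypothetical census data keyed by area code
census = {
    "A01": {"population": 12000, "deprivation_index": 0.72},
    "A02": {"population": 9000, "deprivation_index": 0.41},
    "A03": {"population": 15000, "deprivation_index": 0.15},
    "A04": {"population": 5000, "deprivation_index": 0.60},  # no GP data available
}

# Hypothetical count of new cases of a disease recorded by GP practices, by area
gp_new_cases = {"A01": 96, "A02": 45, "A03": 30}

def link_areas(census, gp_new_cases):
    """Join the two sources on area code and compute incidence per 1,000 people.

    Areas present in only one source are dropped, since they cannot be linked.
    """
    linked = []
    for area, info in census.items():
        if area not in gp_new_cases:
            continue
        incidence = gp_new_cases[area] * 1000 / info["population"]
        linked.append((area, info["deprivation_index"], incidence))
    return linked

linked = link_areas(census, gp_new_cases)
for area, deprivation, incidence in linked:
    print(f"{area}: deprivation {deprivation:.2f}, incidence {incidence:.1f} per 1,000")
```

In this made-up data, incidence rises with the deprivation index. With real data, such a pattern would be a correlation between areas and would not by itself show that deprivation causes disease in individuals.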
Data on the Performance of Health Systems

Health services generate and collect a huge amount of data on a day-to-day basis. This is often used to evaluate their performance. The challenge is to use this data sensibly and appropriately: the wrong data, or the right data taken out of context or used inappropriately, can give an inaccurate impression of health service performance.

For instance, raw mortality rates are often recorded by hospitals. These need to be converted into standardised mortality rates, which adjust for differences such as the age profile of the patients treated, to better reflect the true performance of each hospital.

Disease-specific incidences and health service performance can be regularly reviewed within the hospital system. The relatively disparate nature of private providers for primary care in Australia can make this more challenging, although Medicare data can also be audited.

Disease Nomenclature and Classifications

Accurate reporting on the cause of death or disease requires standardised names and terms. Strictly speaking, a “nomenclature” refers to a list of terms, while if it is divided into topics, areas and categories, it is a “classification”. Ideally, such a system should be internationally adopted and applied using accepted rules and conventions.

The WHO maintains the International Statistical Classification of Diseases and Related Health Problems (ICD). It was originally created to classify cause of death information from death certification, but is now also used widely by systems that monitor disease and disability. When a new revision is released, each country must then adopt it. Some adopt it faster than others. There also exist some more specialised systems, such as the International Classification of Diseases for Oncology.

For other terminology, such as symptoms, signs, diagnoses, treatments and procedures, there exist other systems such as the SNOMED nomenclature and the “Read Codes”.
These terminology systems are intended to allow doctors to use their own terminology, which can then be converted into standard terminology by software or coding staff. Systems such as diagnosis-related groups (DRGs) aggregate data into larger categories for assessment, resource allocation and planning.

Surveillance Data

Surveillance is a system that uses descriptive epidemiology to maintain an overview of a population’s health status. It may be used to monitor diseases, syndromes or other variables or events of interest. Surveillance involves the continuous analysis, assessment and feedback of systematically acquired data.

Surveillance systems are of most value if they are comprehensive and supply data on events soon after they occur. This is particularly important for communicable diseases. Other fields are increasingly using surveillance, such as pharmacosurveillance (pharmacovigilance) of adverse events after new drugs are developed.

Indicators

Routine data can be used to develop “indicators”, which can highlight an aspect of the health of the population or the performance of the health system. These indicators can be drawn from any source of routine data, but should be purposefully sourced. Each indicator usually has a specific goal, and most are used on a regular basis. The health department will often create a framework to group indicators into different categories for monitoring.

Access and Transparency

With so much electronic data now being generated, society has increasingly had to grapple with the question of who should be able to access it, under what circumstances, and for what purposes.

In many countries, there is a drive towards increasing transparency. Its goals include saving money, improving public confidence in state processes and enabling greater public participation in decision-making. Comparing data between different health services, providers, devices, etc, could in principle enable more informed decisions about use, funding and other matters.
There is an argument for making routine data available for research purposes, so that academic and research staff can use it to find new insights and improve patient care. Linkage of records from different sources can be particularly useful for answering important health-related questions. This requires the records in each dataset to carry identifying information so that they can be matched. Often, though, the researchers themselves need not see the identifying information, provided they can be confident that the data has been properly linked.

The balance between confidentiality and transparency is a difficult one, particularly with medical information. Linked data on individuals, including their age, location, medical history and other factors, can be very useful, but the more detailed the data, the more easily an individual may be identified. There can be some public distrust of data transparency, especially if confidential information is accidentally leaked.

Data can be:
o Anonymous (usually aggregated data where large numbers of individuals are combined so no individual can be identified).
o Pseudonymous (individual patient data can be separated out, but identifying information like name or date of birth is removed).
o Individually identifiable.
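One possible way to produce pseudonymous data that can still be linked is sketched below: each record's identifying fields are replaced by a keyed hash, so the same person receives the same pseudonym in every dataset while the researcher never sees the name or date of birth. This is an illustrative assumption rather than a description of how any particular linkage unit works. The secret key, field names and records are hypothetical, and real systems must also cope with misspellings and changed details, which exact hashing does not.

```python
import hashlib
import hmac

# Hypothetical secret key, assumed to be held only by the trusted linkage unit
SECRET_KEY = b"held-by-linkage-unit-only"

def pseudonym(name, dob, key=SECRET_KEY):
    """Derive a stable pseudonym from identifying fields using HMAC-SHA256."""
    identifier = f"{name.strip().lower()}|{dob}".encode("utf-8")
    return hmac.new(key, identifier, hashlib.sha256).hexdigest()

def pseudonymise(records, key=SECRET_KEY):
    """Return copies of the records with name and dob replaced by a pseudonym."""
    result = []
    for record in records:
        cleaned = {k: v for k, v in record.items() if k not in ("name", "dob")}
        cleaned["pid"] = pseudonym(record["name"], record["dob"], key)
        result.append(cleaned)
    return result

# Hypothetical records for the same person held by two different sources
hospital = [{"name": "Jane Citizen", "dob": "1980-02-14", "diagnosis": "hip fracture"}]
gp = [{"name": "jane citizen ", "dob": "1980-02-14", "smoker": True}]

hospital_p = pseudonymise(hospital)
gp_p = pseudonymise(gp)

# The researcher can link on pid without seeing who the person is
print(hospital_p[0]["pid"] == gp_p[0]["pid"])
```

A keyed hash is used rather than a plain hash because names and dates of birth are easy to guess. Without the secret key, an outsider cannot recompute pseudonyms from a list of known individuals.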