Study design
Raj Bhopal, Bruce and John Usher Professor of Public Health, Public Health Sciences Section, Division of Community Health Sciences, University of Edinburgh, Edinburgh EH8 9AG
Raj.Bhopal@ed.ac.uk

Educational objectives
Understanding disease causation and measuring the burden of disease are the two key purposes underlying epidemiological studies.
Epidemiological studies are unified by their common purposes, by their use of the survey method and by their dependence on the concept of a defined population.
All study designs potentially contribute to questions of cause and effect, health policy and planning, and clinical practice.
A clinical case-series is a coherent set of cases compiled by one or a few clinicians.
A population case-series study, consisting of a set of cases in a defined population and time, lays the foundation for describing disease by place, time and characteristics of the population.
If cases are compared with non-cases from the same population, the design is that of a case-control study, which generates and tests causal hypotheses through the analysis of associations.

Study design: introduction
There are five basic designs based on individual data.
There are modifications of these study designs.
Discussions tend to consider each design as being distinct, but the ideas which underlie study design are interrelated.

Commonalities in study design
The common goal of epidemiological studies is understanding the frequency, pattern and causes of disease in populations.
All rely on the survey method.
All are rooted in the concept of population: the underlying, or base, population is the starting point.
Understanding these commonalities helps define the common ground, and relative unity, of epidemiological study design.
Five basic epidemiological designs for studies based on individuals; and a mode of analysis
Case-series (clinical and population)
Cross-sectional
Case-control
Cohort (prospective and retrospective)
Trial
Most textbooks refer to an ecological study design, but such studies are usually a mode of analysis, applied in large-scale research.

Classifications of study design: five dichotomies
Descriptive and analytic studies.
Retrospective and prospective studies.
Observational and experimental studies.
Presence or absence of disease at the beginning of a study.
Studies which incorporate a specific comparison group and those which do not.

Case-series: clinical and population based
A clinical case-series is usually a coherent and consecutive set of cases of a disease (or similar problem) which derive from either the practice of one or more health care professionals or a defined health care setting, e.g. a hospital or family practice.
A case-series is, effectively, a register of cases.
Cases are analysed together to learn about the disease.
Clinical case-series are of value in epidemiology for studying symptoms and signs and for creating case definitions.

Case series
When a clinical case-series is complete for a defined geographical area for which the population is known, it is, effectively, a population based case-series consisting of a population register of cases.
Epidemiologically, the most important case-series are registers of serious diseases or deaths, and of health service utilisation, e.g. hospital admissions.
These registers are usually compiled for administrative and legal reasons.

Thought exercise: Case-series, natural history and spectrum
How does the case-series (clinical and population) contribute to our understanding of the natural history and spectrum of disease?
Case series: natural history and spectrum
By delving into the past circumstances of the patients in a case-series, including examination of past medical records, and by continuing to observe them to death (and necropsy as appropriate), clinicians can build up a picture of the natural history of a disease.
A population case-series is a systematic extension of the clinical series which includes additional cases, e.g. those dying without being seen by the clinicians.
It adds breadth to the understanding of the spectrum and natural history of disease.

Case series: population
Full epidemiological use of case-series data needs information on the population to permit calculation of rates.
This is key to understanding the distribution of disease in populations and to the study of variations over time, between places and by population characteristics.
Case-series can provide the key to sound case-control and cohort studies and trials.
The design of a case-series is conceptually simple: the investigator defines a disease or health problem to be studied and sets up a system for capturing data on the health status and related factors in consecutive cases.
Many countries have no valid case-series even for mortality.

Case series: requirements for interpretation
To make sense of case-series data the key requirements are:
The diagnosis or, for mortality, the cause of death.
The date when the disease or death occurred (time).
The place where the person lived, worked etc. (place).
The characteristics of the person (person).
The opportunity to collect additional data from medical records (possibly by electronic data linkage) or from the person directly.
The size and characteristics of the population at risk.

Case series: additional data
Case-series data can be linked to other health data either in the past or the future, e.g. mortality data can be linked to hospital admissions (including those at birth and in childhood), cancer registrations and other records to obtain information on exposures and disease.
Cases may also be contacted for additional information.
This type of action may turn a case-series design into a cohort design.

Case series: analysis
Case-series data are analysed using rates.
There are three circumstances, in particular, where rates are not used:
Spatial clustering.
When the population is stable.
When there is no suitable denominator (use proportional ratios).

Case series: strengths
Population case-series permit two arguably unique forms of epidemiological analysis and insight.
They paint a truly national and even international population perspective on disease.
The disease patterns can be related to aspects of society or the environment that affect the population but have no sensible measure at the individual level, e.g. ozone concentration at ground level and the thickness of the ozone layer in the earth's atmosphere.

[Figure 9.1: (a) Clinical case series. Identify cases seen by one or more clinicians; the case series is unlikely to be a complete set of cases; assess characteristics; there is no accurately defined boundary so rates cannot be calculated. (b) Population based case series. Only cases within a defined boundary are included; visitors and cases outside the boundary are excluded; there are extra deaths compared to the clinical case series, symbolising cases not seen at the clinical facility, e.g. street deaths.]

[Figure 9.2: Natural history of CHD cases. Time labels: Past (healthy), Now (diseased), Future (dead).]

Making use of indicators with no valid individual measures: exercise
How might epidemiology study the potential role in disease causation of factors which vary little between individuals within a region or nation, e.g. fluoride content of the water, the hardness or softness of water supplies or annual exposure to sunshine?
Ecological studies, design and analyses
Ecology is the study of living organisms in relation to their environment.
How, then, must we conceptualise the ecological study?
There are variables which are truly not based on individual data and that are useful in epidemiology, e.g. gross national product, air quality measures, lead in water, the weather, expenditure on roads, the type of political structure and the density of population.
Such variables can be studied on their own with descriptions of time trends, variation between places, and differences by the characteristics of the populations in these places.

Ecological studies/ecological analysis
There are studies where exposure data relating to a place (say hardness of water, which could be collected on individuals) are correlated with health data collected on individuals but summarised by place (say CHD rates).
Are these ecological studies? Boundaries are blurred.
Conceptually, the ecological component in this kind of study is an issue of data analysis and not study design.
Cross-sectional, case-control and cohort studies and trials (and not just population case-series) could also be analysed in relation to such "ecological" variables and such units of analysis.
Most ecological analyses are based on population case-series.
Ecological analyses are subject to the ecological fallacy.

Ecological fallacy: example
Imagine a study of the rate of coronary heart disease in the capital cities of the world relating the rate to average income.
Across the cities studied, coronary heart disease is higher in the richer cities than in the poorer ones.
We might predict from such a finding that being rich increases your risk of heart disease.
In the industrialised world the opposite is the case - within cities such as London, Washington and Stockholm, poor people have higher CHD rates than rich ones.
The ecological fallacy is usually interpreted as a major weakness of ecological analyses.
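The capital-cities example can be sketched with numbers. The figures below are invented purely for illustration (three imaginary cities, with the share of rich residents standing in for average income); they do not come from any study. The sketch shows a city-level CHD rate that rises with wealth even though, within every city, poor residents have the higher rate.

```python
# Hypothetical data, invented for illustration: (CHD cases, population)
# for poor and rich residents of three imaginary cities.
CITIES = {
    "City A": {"poor": (32, 8000), "rich": (4, 2000)},
    "City B": {"poor": (40, 5000), "rich": (25, 5000)},
    "City C": {"poor": (24, 2000), "rich": (64, 8000)},
}


def rate_per_1000(cases, population):
    """Cases per 1000 people."""
    return 1000 * cases / population


def summarise(groups):
    """Return the city-level (ecological) view and the within-city view."""
    total_cases = sum(cases for cases, _ in groups.values())
    total_pop = sum(pop for _, pop in groups.values())
    return {
        # Assumption: share of rich residents is a stand-in for average income.
        "share_rich": groups["rich"][1] / total_pop,
        "overall_rate": rate_per_1000(total_cases, total_pop),
        "poor_rate": rate_per_1000(*groups["poor"]),
        "rich_rate": rate_per_1000(*groups["rich"]),
    }


summaries = {name: summarise(groups) for name, groups in CITIES.items()}

for name, summary in summaries.items():
    print(name, summary)
```

Comparing only `overall_rate` across cities suggests that wealth raises risk; comparing `poor_rate` with `rich_rate` inside each city shows the reverse for individuals.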
Ecological analyses, however, inform us about forces which act on whole populations.

Exercise: applying individual data to populations
Reflect on whether observations on individuals are always applicable to populations.
Can you think of an example of when this is so and when it is not?
Why do you think this happens?

Atomistic fallacy
Studies of individuals are prone to the opposite of the ecological fallacy, the so-called atomistic fallacy.
This is to wrongly assume, from observations on the causes of disease in individuals, that the same forces apply to whole populations.
For example, at an individual level a high income or a marker of material success such as employment, car access etc. is associated with a lower rate of suicide.
This does not mean that populations or societies which are rich have a lower rate of suicide or better mental health.
The opposite seems to be true.

Case series: final comments
The viewpoint that case-series studies (whether based on individuals or aggregate data) are descriptive, observational and epidemiologically weak is inappropriate.
They offer some unique opportunities and perspectives on the pattern and causes of disease in populations.

Cross-sectional study
A cross-section is the shape that results from cutting a slice from an object.
A cross-sectional study exposes and studies disease and risk factor patterns in a representative part of the population, in a narrowly defined time period.
Primarily, this study provides information on the prevalence of disease and risk factors.
It can also seek associations, generate and test hypotheses and, by repetition, be used to measure change.
The ideal cross-sectional study is of a geographically defined, representative sample of the population studied within a slice of time and space.

Cross sectional study
The sampling frame usually conforms to the snapshot analogy.
The measurements are made over a relatively short period of time such as a year or two.
The design is excellent for measuring the population burden of disease.
The cross-sectional study includes people representing virtually all stages of health and disease, and so a wide spectrum of disease.
It gives indirect insights on the natural history.
People with severe disease, however, may be institutionalised.
Survivor bias may also occur.

[Figure 9.3: Cross-sectional study of CHD. Time axis: past, now, future.]

[Figure 9.4: Explore natural history. Time labels: Past, Now.]

Case-control study
The case-control study is a comparative study where people with the disease (or problem) of interest are compared with a reference population.
The comparison, control or reference group supplies information about the expected risk factor profile in the population from which the case group is drawn.
Cases can be obtained from a number of sources - from a clinical case-series, a population register of cases, from the new cases identified in a cohort study, and from those identified in a cross-sectional survey.
The ideal set of cases would be all the new (incident) ones in the population under study.

Case control studies
Control subjects should be chosen with no selection in relation to their pattern of exposure to the postulated causes, but should otherwise be similar to the cases.
In some studies, controls are recruited to match each case, e.g. if a woman of 53 years was recruited as a case, the investigator would seek a control of similar age (e.g. 57 would be fine, but not 72).
The concept is clear: to find differences in exposure to the hypothesised causes in the past lives of cases as compared to controls.
The data are summarised first as differences in prevalence of exposure, and then as the odds ratio.

Case control studies
A classic study by Herbst et al on the occurrence of the extremely rare disease adenocarcinoma of the vagina in girls and young women illustrates the issues.
There was an association between the disease and use of diethylstilbestrol by the mothers of cases in the first trimester of pregnancy: the mothers of seven of the eight cases had been treated with the drug, compared to none of the mothers of the 32 controls.
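The two summary steps named above (prevalence of exposure, then the odds ratio) can be sketched using the counts quoted from the Herbst study. Because no control was exposed, the plain cross-product odds ratio involves a division by zero; the sketch below assumes the common 0.5 continuity correction (Haldane-Anscombe), which is a choice made here for illustration and not necessarily the analysis used by the original authors. The function names are hypothetical.

```python
def exposure_prevalence(exposed, total):
    """Proportion of a group (cases or controls) that was exposed."""
    return exposed / total


def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls, correction=0.5):
    """Cross-product odds ratio for a 2x2 case-control table.

    Assumption: if any cell is zero, `correction` is added to every cell
    (Haldane-Anscombe) so that the ratio is defined.
    """
    cells = (exposed_cases, unexposed_cases,
             exposed_controls, unexposed_controls)
    if 0 in cells:
        cells = tuple(cell + correction for cell in cells)
    a, b, c, d = cells
    return (a * d) / (b * c)


# Counts quoted in the text: 7 of 8 cases exposed, 0 of 32 controls exposed.
print(exposure_prevalence(7, 8))   # prevalence of exposure among cases
print(exposure_prevalence(0, 32))  # prevalence of exposure among controls
print(odds_ratio(7, 1, 0, 32))     # corrected odds ratio
```

With so few cases the corrected estimate is very imprecise; what it conveys is the size of the contrast between cases and controls, not an exact figure.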
[Figure 9.5: Case-control study. Cases (CHD) and controls (C) are identified now and their exposure in the past is sought. Time axis: past, now, future.]

[Figure 9.6: Seek differences in exposure and other aspects of past natural history. Time labels: Past, Now. Disease is known, exposure unknown.]

Cohort studies
People, particularly clinicians, speak of their cohort, simply meaning a group, irrespective of the study design.
The word comes from the Latin word cohors meaning an enclosure, company or crowd.
In Roman times a cohort was a body of 300-600 infantry.
In epidemiological terms the cohort is a group of people with something in common - usually an exposure or involvement in a defined population group.
A cohort study involves tracking the study population over a period of time.
The essential idea is to relate one or more characteristics, exercise for example, to future outcomes, e.g. incidence of coronary heart disease.

Cohort studies
Cohort studies measure disease incidence rates.
Cohort studies usually test the hypothesis that disease incidence differs in people with different characteristics (exposures) at baseline.
They begin by establishing baseline data, often from a cross-sectional study.
The cohort can be followed up directly, or the baseline data can be linked to health records.
The ratio of the incidence rates in the exposed and non-exposed groups derived from the cohort study is the relative risk.
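The relative risk defined above can be sketched as a short calculation. The counts and person-years below are invented for illustration, and the function names are hypothetical; person-years are assumed as the denominator for the incidence rates.

```python
def incidence_rate(new_cases, person_years):
    """New cases per person-year of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases / person_years


def relative_risk(exposed_cases, exposed_person_years,
                  unexposed_cases, unexposed_person_years):
    """Ratio of the incidence rate in the exposed to that in the non-exposed."""
    rate_exposed = incidence_rate(exposed_cases, exposed_person_years)
    rate_unexposed = incidence_rate(unexposed_cases, unexposed_person_years)
    if rate_unexposed == 0:
        raise ValueError("relative risk is undefined when the "
                         "non-exposed incidence rate is zero")
    return rate_exposed / rate_unexposed


# Hypothetical cohort: 30 new CHD cases over 2000 person-years in the exposed
# group, 20 new cases over 4000 person-years in the non-exposed group.
rr = relative_risk(30, 2000, 20, 4000)
print(round(rr, 2))
```

A value above 1 indicates higher incidence in the exposed group, a value below 1 indicates lower incidence, and a value of 1 indicates no difference.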
[Figure 9.7: Cohort study. At Time 0 (now) people are classified as exposed (E) or not exposed (NE); at Time 1 (future) some have developed CHD. Time axis: now, future.]

[Figure 9.8: Explores natural history including disease outcomes. Time labels: Past, Now, Future. Exposure is known, outcome will be explored.]

[Figure 9.9: Retrospective cohort study. Define the cohort: at Time 0 (past) people are classified as exposed (E) or not exposed (NE); at Time 1 (now) some have developed CHD. Time axis: past, now.]

[Figure 9.10: The retrospective cohort study. Explores natural history including disease outcomes. Time labels: Past, Now, Future. Exposure status is known for the past, outcomes are explored in the present.]

Clinical trials
Trials are studies where an intervention designed to improve health has been applied to a population.
Trials are experiments.
A trial has the same design as a cohort study with one vital difference: the exposure status of the study population has been deliberately changed by the investigator.
We observe how this change in exposure alters the incidence of disease or other features of the natural history.

Clinical trials
Define a study population suitable for answering the question.
Divide the study population into two or more groups.
The control group may be offered the best known alternative.
In the ideal trial, the study and control populations are similar in characteristics impacting on disease outcomes.
To achieve this similarity, individuals in the study are assigned randomly to the groups.
This is a randomised, controlled trial.
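Random assignment to groups can be sketched as follows. This is a minimal illustration of simple randomisation into two equal-sized groups, assuming a list of participant identifiers; real trials often use more elaborate schemes (for example blocked or stratified allocation), and the function name and seed are hypothetical.

```python
import random


def randomise(participant_ids, seed=None):
    """Randomly split participants into intervention and control groups.

    Assumption: simple randomisation with a 1:1 split; `seed` is only there
    to make the illustration reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Hypothetical study population of 20 participants.
groups = randomise(range(1, 21), seed=42)
print(len(groups["intervention"]), len(groups["control"]))
```

Because chance, not the investigator or the participant, decides who receives the intervention, characteristics that affect disease outcomes tend to be balanced between the groups.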
Note: the "best known alternative" is sometimes an intervention which is "psychologically" of similar impact to the study intervention.

[Figure 9.11: Trial. At Time 0 (now) the study population is randomly allocated to an intervention group (I) or a control group (C); the intervention follows at Time 1 (shortly); CHD outcomes are observed in both groups at Time 2 (future). Time axis: now, future.]

[Figure 9.12: Intervention group: explores progression / prognosis. Control group: explores natural history if placebo controlled, or prognosis if best alternate treatment. Time labels: Now, Future.]

Size of the study
Sample size will be dictated by the research questions and stated study hypotheses.
Hypotheses need to be specified in a way that can be quantified.
The precision of the answer required needs to be stated.
The size of the minimum difference that it is important to detect should be stated.
Keep low the chances of the two types of statistical error: type 1 and type 2.

Data and analysis and interpretation
Interpret data properly, particularly taking into account error, bias and frameworks for analysis of associations.
Make appropriate choices in data analysis.
Examine numbers of cases and percentages, and age and sex specific prevalence or incidence data.
Choices of summary measures need to be made.
Judgements of cause and effect will often be required.

Design and theory
Epidemiological designs are based on the theories discussed in earlier chapters, particularly that differential exposure to the causes of disease leads to differential population patterns of disease.
The cohort study tests this theory directly.
The trial tests it indirectly.
The case-series, case-control and cross-sectional designs test the theory indirectly and retrospectively.

Exercise: Strengths and weaknesses of the study designs
Based on the principles of study design and your knowledge of the purposes of epidemiology, consider the relative strengths and weaknesses of case-series, cross-sectional, case-control and cohort studies, and trials.
Put these in a table.
You may find the following key words and phrases helpful in your reflection: ease, timing, maintenance and continuity, costs, ethics, data utilisation, main contributions, observer and selection bias, analytic outputs.

Some of the strengths and weaknesses of each study design
Themes: ease; timing; maintenance and continuity; costs; ethics; data utilisation; main contribution; observer bias; selection bias; analytic output.

Overlap in the conceptual basis of the case-series, cross-sectional, case-control, cohort and trial designs
The cross-sectional study can be repeated.
If the same sample is studied for a second time, i.e. it is followed up, the original cross-sectional study becomes a cohort study.
If, during a cohort study, possibly in a subgroup, the investigator imposes an intervention, a trial begins.
A cohort study also gives birth to case-control studies, using incident cases (nested case-control study).
Cases in a case-series, particularly a population based one, may be the starting point of a case-control study or a trial.
Not every epidemiological study fits neatly into one of the basic five designs.

Summary
Studies have a common goal: to understand the frequency and causes of disease.
Seeking causes starts by describing associations between exposures (causes) and outcomes (disease).
Studies share the survey method and a basis in defined populations.
A case-series is a coherent set of cases of a disease (or similar problem).
If cases are compared with a reference group, we have a case-control study.
In a population studied at a specific time and place (a cross-section) the primary output is prevalence data, though associations between risk factors and disease can be generated.

Summary
If the population in a cross-sectional survey is followed up to measure health outcomes, the study design is a cohort study.
If the population of such a study is, at baseline, divided into two groups, and the investigators impose a health intervention upon one of the groups, the design is that of a trial.
Studies based on aggregated data are commonly referred to as ecological studies.
Mostly, ecological studies are a mode of analysis, rather than a design.
Interpretation and application of data are easier when the relationship between the population observed and the target population is understood.