Basic measures and tools of descriptive epidemiology
Piyawat Saipan (DVM., M.Sc., Ph.D)
Department of Veterinary Public Health, KhonKaen U.

Topics
• Importance of describing an event
• Continuous VS categorical data
• Ratios, proportions, rates; Incidence VS prevalence
• Other measures: AR, CFR, etc.
• Basic terminology: accuracy, precision, bias

Epidemiologic Research Assumes
• Disease occurrence is not random
• Systematic investigation of different populations can identify causal and preventive factors
• Making comparisons is the cornerstone of systematic investigations

Definition of Epidemiology
The study of the distribution and determinants of disease frequency in human or animal populations and the application of this study to control health problems.

Key Words in Definition
• Disease frequency - count cases, need system, records
• Disease distribution - who, when, where
• Frequency, distribution, other factors generate hypotheses about determinants
• A determinant is a characteristic that influences whether or not disease occurs

Natural Progression in Epidemiologic Reasoning
1st – Suspicion that a factor influences disease occurrence. Arises from clinical practice, lab research, examining disease patterns by person, place and time, prior epidemiologic studies.
2nd – Formulation of a specific hypothesis.
3rd – Conduct epidemiologic study to determine the relationship between the exposure and the disease. Need to consider chance, bias, confounding when interpreting the study results.
4th – Judge whether association may be causal. Need to consider other research, strength of association, time directionality.

Importance of describing an event
Descriptive epidemiology involves observing and recording diseases and possible causal factors.
Descriptive epidemiology counts the frequency of cases and describes distribution patterns of disease among different groups in the population for further analysis (who, what, when, where).
It is necessary to know which types of data are suitable for a particular investigation.

Epidemiologic measures: Overview
• Types of data
• Measures of occurrence or health
• Measures of association
• Measures of attribution

Types of Data
• Continuous data
• Categorical data
Variable: any observable event that can vary. Variables may be either continuous or discrete.
Raw data: the initial measurements that form the basis of analysis.

Categorical data
Qualitative data describe the category to which an animal belongs. They are measured on a nominal or ordinal scale.
• Nominal data: sex (male, female), categories of animals (pig, dog, cattle)
• Ordinal data: a series of clinical signs (mild, moderate, serious, death)

Continuous data
Quantitative (numerical) data consisting of numerical values on a well defined scale:
• A discrete (discontinuous) scale: data can take only particular integer values, typically counts, e.g. litter size, parity
• A continuous scale: all values are theoretically possible, e.g. height, weight, concentration of a chemical in blood, etc.

Some descriptive statistics
• Tables to exhibit features of the data
• Diagrams to illustrate patterns:
  Categorical data: bar chart, pie chart
  Numerical data: dot diagram, histogram, stem and leaf diagram, etc.
• Numerical measures:
  Measures of central tendency or location
  Measures of dispersion

Measures of central tendency
Arithmetic mean
Example: the distribution of body weight of 11 dogs: 10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 20 kg.
Arithmetic mean = 164 ÷ 11 = 14.91 kg.

Cont.
Geometric mean: the antilog of the average of the logarithmic values, which equals the nth root of the product of the n values.
Example: four HI titers are given as the following dilutions: 4, 8, 16, and 32.
GM = (4 × 8 × 16 × 32)^(1/4) = 11.3 (the arithmetic mean of the same titers would be 15).

Cont.
Median is the middle value when the observations are arranged in order of magnitude. With an even number of observations it is the mean of the two middle values.
Mode is the value that occurs most frequently; it is used to highlight a common data point.

Measures of dispersion (spread)
Range is defined as the difference between the largest and smallest observations.
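As a quick check of the worked examples above, the following Python sketch recomputes the measures of central tendency and the range with the standard library only. The variable names are illustrative and are not part of the original slides.

```python
import math
import statistics

# Body weights (kg) of the 11 dogs and the four HI titers from the examples
weights_kg = [10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 20]
hi_titers = [4, 8, 16, 32]

# Arithmetic mean: sum of the observations divided by their number
mean_weight = statistics.mean(weights_kg)

# Median: middle value of the ordered observations
median_weight = statistics.median(weights_kg)

# Mode: most frequent value
mode_weight = statistics.mode(weights_kg)

# Range: largest minus smallest observation
range_weight = max(weights_kg) - min(weights_kg)

# Geometric mean: antilog of the mean of the log-transformed values
mean_log = sum(math.log(t) for t in hi_titers) / len(hi_titers)
gm_titer = math.exp(mean_log)

print(f"mean = {mean_weight:.2f} kg, median = {median_weight} kg, "
      f"mode = {mode_weight} kg, range = {range_weight} kg")
print(f"geometric mean titer = {gm_titer:.1f}")
```

Taking logs before averaging is what makes the geometric mean suitable for titers, which increase by doubling dilutions rather than by equal steps.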
The interquartile range is the range of values which encloses the central 50% of the observations.
Variance is the average of the squared deviations of each observation from the mean (for sample data, the sum of squared deviations is divided by n − 1).
Standard deviation is the square root of the variance and may be regarded as a kind of average of the deviations of the observations from the arithmetic mean.

Interval estimation

Measures of Health (Occurrence)
• Ratio, proportion, and rate
• Prevalence
• Incidence: incidence risk and incidence rate
• Other measures of health: attack rate, secondary attack rate, mortality, fatality, proportional mortality
• Adjusted measures of health

Ratio
A ratio defines the relative size of two quantities, expressed by dividing one (numerator) by the other (denominator).
Example: we have a herd of 100 cattle and 58 are found to be diseased. Ratio?
The ratio of diseased to non-diseased animals in this herd is 58:42, or about 1.4 to 1.

Proportion
A proportion is a fraction in which the numerator is included in the denominator.
Example: we have a herd of 100 cattle and 58 are found to be diseased. Proportion?
The proportion of diseased animals in this herd is 58 ÷ 100 = 0.58 = 58%.

Rate
A rate is derived from three pieces of information:
(1) a numerator: the number of individuals diseased or dead,
(2) a denominator: the total number of animals (or animal time) in the study group and/or period, and
(3) a specified time period.
Example: we might say that the rate of disease in our herd over a 12-month period was 58 cases per 100 cattle.

Epidemiologic Measures
The term morbidity is used to refer to the extent of disease or disease frequency within a defined population.
Two important measures of morbidity are prevalence and incidence.

Prevalence
Prevalence is the proportion of a population that has a specific disease or attribute at a specified point in time.

Prevalence (cont.)
Two types of prevalence:
(1) point prevalence: the number of disease cases in a population at a single point in time, divided by the size of the population at that time.
(2) period prevalence: the number of cases present at the beginning of the study period plus the number of new cases that occurred during the remainder of the study period, divided by the size of the population.

Example
Of the 216 dogs examined in Kingston, 192 had evidence of tooth decay. Of the 184 dogs examined in Newburgh, 116 had evidence of tooth decay.
Assuming complete survey coverage, there were 192 prevalent cases of tooth decay among dogs in Kingston at the time of the study. Prevalence?
The prevalence of tooth decay was 192 ÷ 216 = 89% in Kingston and 116 ÷ 184 = 63% in Newburgh.

Incidence
Incidence describes the number of new cases that arise in a population over a specified period of time.
There are two ways to express incidence: incidence risk and incidence rate.

Incidence risk
Incidence risk (also called cumulative incidence) is the proportion of initially susceptible individuals in a population who become new cases during a defined time period.
For example: Last year a herd of 121 cattle were tested for tuberculosis using the tuberculin test and all tested negative. This year the same 121 cattle were tested and 25 tested positive. Incidence risk?
The incidence risk would then be 25 ÷ 121 = 21 cases per 100 cattle for the 12-month period.
We can also say that the risk of cattle becoming positive to the tuberculin test for the 12-month period was 21%. This is an expression of average risk applied to an individual (but estimated from the population).

Epidemiologic Measures
Occurrence of disease can be measured in a static way or in a dynamic way.
• Static way: Did the cow have mastitis or not at the moment of measurement?
• Dynamic way: Did the cow get mastitis during the study period?
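The prevalence and incidence risk examples above are both simple proportions, which the following Python sketch makes explicit. The helper name `proportion` is hypothetical and chosen only for illustration.

```python
def proportion(cases: int, population: int) -> float:
    """Return cases / population, where the cases are part of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= cases <= population:
        raise ValueError("cases must lie between 0 and population")
    return cases / population

# Prevalence of tooth decay: existing cases at the time of the survey
prev_kingston = proportion(192, 216)
prev_newburgh = proportion(116, 184)

# Incidence risk of tuberculin conversion: new cases over 12 months
# among the 121 cattle that were all negative at the start
risk_tb = proportion(25, 121)

print(f"Kingston prevalence: {prev_kingston:.0%}")
print(f"Newburgh prevalence: {prev_newburgh:.0%}")
print(f"12-month incidence risk: {risk_tb:.0%}")
```

The arithmetic is identical in both cases. What differs is the numerator: existing cases for prevalence, new cases among initially susceptible animals for incidence risk.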
With respect to disease: prevalence is the static way and incidence is the dynamic way.

Population at Risk
A population may be closed or open. An open population gains individuals (births, purchases) and loses them (sales, deaths).
When the population is open, incidence risk cannot be measured directly. The denominator is adjusted:
Denominator = population size at the mid-point of the study period
For example: Last year a herd of 121 cattle were tested for tuberculosis using the tuberculin test and all tested negative. This year the herd had been reduced to 101 cattle, of which 25 tested positive. Incidence risk?
The mid-point population is (121 + 101) ÷ 2 = 111, so the incidence risk would then be 25 ÷ 111 = 23 cases per 100 cattle for the 12-month period.

Incidence rate
Incidence rate (incidence density) is the number of new cases of disease that occur per unit of individual time at risk, during a defined time period.

Table 1

Note that incidence rate:
• Accounts for individuals that enter and leave the population throughout the period of study.
• Can account for multiple disease events in the same individuals (Table 1).
For example: On the basis of the data presented in Table 1, the incidence rate of clinical mastitis for the 12-month period is 5 cases per 825 cow-days at risk. Incidence rate?
Incidence rate = 5 ÷ 825 × 365 = 2.2 cases of clinical mastitis per cow-year at risk.

Summary: Comparison of prevalence, incidence risk, and incidence rate

The relationship between P and I
Providing incidence rate is constant, incidence risk can be estimated as follows:
• Closed population: incidence risk = incidence rate × length of study period
• Open population: incidence risk = 1 − exp(−incidence rate × length of study period)
Providing incidence rate is constant, prevalence can be estimated as follows:
P = (incidence rate × duration of disease) ÷ (incidence rate × duration of disease + 1)
For example: The incidence rate of disease is estimated to be 0.006 cases per cow-day at risk. The mean duration of disease is 7 days.
The estimated prevalence of disease is:
P = (0.006 × 7) ÷ (0.006 × 7 + 1) = 0.042 ÷ 1.042 = 0.040
The estimated prevalence is 4.0 cases per 100 cows.

Other measures of health
• Attack rate
• Secondary attack rate
• Fatality rate
• Mortality

Attack rate
It is defined as the number of cases divided by the number of individuals exposed.
Attack rates are usually used in outbreak situations where the period of risk is limited and all cases arising from exposure are likely to occur within the risk period.

Secondary attack rate
It is used to describe infectiousness.
Secondary attack rate: the number of cases at the end of the study period, less the number of initial (primary) cases, divided by the size of the population that was initially at risk.

Mortality
Mortality risk or rate is an example of incidence where death is the outcome of interest.
The denominator includes both prevalent cases of the disease and individuals who are at risk of developing the disease.

Proportional mortality
It is simply the proportion of all deaths that are due to a particular cause for a specified population and time period.

Adjusted measures of health
Adjusted rates are used when we want to compare the level of disease in different populations.
In veterinary medicine, age, breed, and population type are commonly used adjustment variables.
Two methods for adjusting disease measures: direct and indirect adjustment.
For example: If we have two colonies of mice and observe them for one day, we might find the mortality rate in the first colony is 10 per 1,000 and the mortality rate in the second colony is 20 per 1,000. We might initially think that this difference is due to a difference in management, but it might also transpire that the first colony is comprised mainly of young mice and the second colony mainly of older mice. The two colonies might be exactly the same in terms of standards of care and housing quality, with the difference in mortality due solely to a difference in the age composition of the two populations.
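Returning to the incidence relationships described earlier in this section, the sketch below recomputes them in Python: the mid-point denominator for an open population, the conversion of a rate to cow-years, and the estimates of risk and prevalence from a constant incidence rate. The function names are hypothetical, and the formulas are only those stated in the slides, so the constant-rate assumption applies.

```python
import math

def midpoint_incidence_risk(new_cases: int, start_size: int, end_size: int) -> float:
    """Incidence risk for an open population, using the mid-point population."""
    return new_cases / ((start_size + end_size) / 2)

def risk_from_rate(rate: float, period: float) -> float:
    """Incidence risk from a constant incidence rate: 1 - exp(-rate * period)."""
    return 1 - math.exp(-rate * period)

def prevalence_from_rate(rate: float, duration: float) -> float:
    """Prevalence from a constant incidence rate and mean disease duration."""
    return (rate * duration) / (rate * duration + 1)

# Tuberculin example with the herd reduced from 121 to 101 cattle
risk_open = midpoint_incidence_risk(25, 121, 101)   # 25 / 111

# Mastitis example: 5 cases per 825 cow-days, expressed per cow-year
rate_per_cow_day = 5 / 825
rate_per_cow_year = rate_per_cow_day * 365

# Prevalence example: 0.006 cases per cow-day, mean duration 7 days
prevalence = prevalence_from_rate(0.006, 7)          # 0.042 / 1.042

# Risk over one week at the same rate (illustrative period, not from the slides)
weekly_risk = risk_from_rate(0.006, 7)

print(f"open-population risk: {risk_open:.3f}")
print(f"rate per cow-year:    {rate_per_cow_year:.2f}")
print(f"estimated prevalence: {prevalence:.4f}")
print(f"7-day incidence risk: {weekly_risk:.4f}")
```

For small values of rate × period the exponential form and the simple product give nearly the same risk, which is why the two formulas are often used interchangeably for short study periods.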
Direct adjustment
With direct adjustment the observed stratum-specific rates are known and an estimated (standard) population distribution is used as the basis for adjustment.
Directly adjusted rate = Σ (OBS Ri × STD Pi) ÷ Σ STD Pi
Where:
STD Pi: the size of the standard population in the ith stratum
OBS Ri: the observed rate in the ith stratum

Stratum-specific rates
• Stratum-specific rates are recommended for comparing defined subgroups between or within populations when rates are strongly stratum-dependent.
• Stratum-specific rates are recommended when specific causal or protective factors or the prevalence of risk exposures are different for different levels of strata.

For example:
Table 1. Seroprevalence of leptospirosis in urban dogs, stratified by city.
City    Positive   Sample   Seroprevalence
E       61         260      23%
G       69         251      27%
Total   130        511      25%

(cont.)
Table 2. Seroprevalence of leptospirosis in urban dogs, stratified by city and sex
Table 3. Directly adjusted seroprevalence of leptospirosis in urban dogs, stratified by city
The difference between the cities is due to the different sex structure of the 2 populations.

Indirect adjustment
Indirect adjustment provides an estimate of the expected number of cases, given the stratum-specific population size.
Expected number of cases = Σ (STD Ri × OBS Pi)
It is usual to divide the observed number of disease cases by the expected number to yield a standardised morbidity/mortality ratio (SMR):
SMR = observed number of cases ÷ expected number of cases
Where:
STD Ri: the standard rate in the ith stratum of the population
OBS Pi: the observed population size in the ith stratum
For example: We know that the prevalence of a given disease throughout a country is 0.01%. If we are presented with a region with 20,000 animals, the expected number of cases of disease in this region will be 0.01% × 20,000 = 2. If the actual number of cases of disease in this region is 5, then the standardised mortality (morbidity) ratio is 5 ÷ 2 = 2.5. That is, there were 2.5 times as many cases of disease in this region as we were expecting.
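The indirect adjustment example can be sketched in Python as follows. The function names are hypothetical. The example uses a single stratum because the slides give only a country-wide prevalence, but the same functions accept one standard rate and one population size per stratum.

```python
def expected_cases(standard_rates, observed_populations):
    """Expected cases: sum over strata of standard rate x observed population size."""
    if len(standard_rates) != len(observed_populations):
        raise ValueError("one standard rate is needed per stratum")
    return sum(r * n for r, n in zip(standard_rates, observed_populations))

def standardised_ratio(observed_cases, standard_rates, observed_populations):
    """Standardised morbidity/mortality ratio: observed / expected cases."""
    expected = expected_cases(standard_rates, observed_populations)
    if expected <= 0:
        raise ValueError("expected number of cases must be positive")
    return observed_cases / expected

# Country-wide prevalence of 0.01% applied to a region of 20,000 animals
standard_rates = [0.0001]
observed_populations = [20_000]

expected = expected_cases(standard_rates, observed_populations)
smr = standardised_ratio(5, standard_rates, observed_populations)

print(f"expected cases: {expected:.1f}")
print(f"SMR: {smr:.1f}")
```

An SMR above 1 means more cases were observed than the standard rates predict; a value below 1 means fewer.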
Measures of association
Risk is the probability that an event will happen.
A characteristic or factor that influences whether or not an event occurs is called a risk factor.
Associations between putative risk factors (exposures) and an outcome (a disease) can be investigated using analytical observational studies.

Measures of association
When both exposure and outcome are binary variables (yes or no), the results can be presented as a 2 × 2 table.

               Exposed   Non-exposed   Total
Diseased       a         c             a+c
Non-diseased   b         d             b+d
Total          a+b       c+d           a+b+c+d

Incidence risk in the exposed population: a ÷ (a + b)
Incidence risk in the non-exposed population: c ÷ (c + d)

Cont.
Incidence risk in the total population: (a + c) ÷ (a + b + c + d)

Measures of association
Case-control study

          Exposed   Non-exposed   Total
Case      a         b             a+b
Control   c         d             c+d

Using the cell labels of this case-control table:
Odds of disease in the exposed population: a ÷ c
Odds of disease in the non-exposed population: b ÷ d

Cont.
Three main categories:
(1) measures of strength
(2) measures of effect
(3) measures of total effect

Measures of strength
• Risk ratio
• Incidence rate ratio
• Odds ratio

Risk ratio (RR)
Defined as the ratio of the risk of disease (i.e. incidence risk) in the exposed group to the risk of disease in the unexposed group.
If RR = 1: the risks of disease in the exposed and non-exposed groups are equal.
If RR < 1: exposure reduces the risk of disease and exposure is said to be protective.
If RR > 1: exposure increases the risk of disease.
RR ranges between 0 and infinity.
It cannot be estimated in a case-control study.

Incidence rate ratio (IRR)
This is the ratio of the incidence rate in the exposed group to that in the non-exposed group.
The term relative risk is used as a synonym for both risk ratio and incidence rate ratio.
IRR is interpreted in the same way as risk ratio.

Odds ratio (OR)
OR is the ratio of the odds of disease in the exposed group to the odds of disease in the non-exposed group. In both tables above it equals the cross-product ad ÷ bc.

Odds ratio (cont.)
When the number of cases is low relative to the number of non-cases (i.e. the disease is rare), OR is approximately equal to RR.
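To illustrate why the odds ratio approximates the risk ratio only when disease is rare, here is a minimal Python sketch. It uses the cell labels of the first 2 × 2 table above (a and b are the diseased and non-diseased exposed animals, c and d the diseased and non-diseased non-exposed animals). The counts are hypothetical and were chosen only to make the contrast visible.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """RR = [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a / b) / (c / d), i.e. the cross-product ad / bc."""
    return (a * d) / (b * c)

# Hypothetical rare disease: 1% risk in exposed, 0.5% in non-exposed
rr_rare = risk_ratio(10, 990, 5, 995)
or_rare = odds_ratio(10, 990, 5, 995)

# Hypothetical common disease: 40% risk in exposed, 20% in non-exposed
rr_common = risk_ratio(400, 600, 200, 800)
or_common = odds_ratio(400, 600, 200, 800)

print(f"rare disease:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"common disease: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```

Both scenarios have the same risk ratio, but the odds ratio moves away from it as the disease becomes more common, because the non-diseased counts b and d no longer approximate the group totals.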
Measures of effect in the exposed population
• Attributable risk (rate)
• Attributable fraction

Attributable risk (AR)
AR is defined as the increase or decrease in the risk (or rate) of disease in the exposed group that is attributable to exposure.
AR = incidence risk in the exposed group − incidence risk in the non-exposed group

Attributable fraction (AF)
Attributable fraction (the attributable proportion in exposed subjects) is the proportion of disease in the exposed group that is due to exposure.
AF = AR ÷ incidence risk in the exposed group

For example
In vaccine trials (in foxes), vaccine efficacy is defined as the proportion of disease prevented by the vaccine in vaccinated individuals, which is the attributable fraction. The following results were obtained:

           Vaccination −   Vaccination +   Total
Rabies +   18              12              30
Rabies −   30              46              76
Total      48              58              106

The odds of rabies in the unvaccinated group were 2.3 times the odds of rabies in the vaccinated group (OR = (18 × 46) ÷ (12 × 30) = 2.3).
About fifty-seven percent of rabies cases in unvaccinated foxes were due to not being vaccinated (AF estimated from the odds ratio: (OR − 1) ÷ OR = 1.3 ÷ 2.3 = 0.57).

Measures of effect in the total population
Population attributable risk or rate (PAR) is the increase or decrease in risk (or rate) of disease in the population that is attributable to exposure.
PAR = incidence risk in the total population − incidence risk in the non-exposed group

Measures of effect in the total population
Population attributable fraction (PAF) is the proportion of disease in the population that is due to the exposure.
PAF = PAR ÷ incidence risk in the total population

For example: A study investigating the relationship between DCF and FUS was conducted. The following results were obtained:

        DCF +   DCF −   Total
FUS +   13      5       18
FUS −   2163    3349    5512
Total   2176    3354    5530

Questions
• The incidence risk in the exposed group?
• The incidence risk in the non-exposed group?
• RR?
• AR?
• AF?
• PAR?
• PAF?
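The questions above can be worked through with a short Python sketch. The function name and the dictionary keys are hypothetical. The cell labels follow the 2 × 2 layout used for cohort data: a and b are the diseased and non-diseased exposed animals, c and d the diseased and non-diseased non-exposed animals.

```python
def attributable_measures(a: int, b: int, c: int, d: int) -> dict:
    """Risk-based measures of association and effect from a 2 x 2 table."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_total = (a + c) / (a + b + c + d)

    ar = risk_exposed - risk_unexposed     # attributable risk
    par = risk_total - risk_unexposed      # population attributable risk
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "rr": risk_exposed / risk_unexposed,
        "ar": ar,
        "af": ar / risk_exposed,           # attributable fraction
        "par": par,
        "paf": par / risk_total,           # population attributable fraction
    }

# DCF+ group: 13 FUS cases, 2163 non-cases; DCF- group: 5 cases, 3349 non-cases
results = attributable_measures(a=13, b=2163, c=5, d=3349)

for name, value in results.items():
    print(f"{name:>15}: {value:.4f}")
```

Multiplying the risks, AR and PAR by 1000 expresses them as cases per 1000 animals.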
Answers:
• The incidence risk of FUS in the DCF+ group was 5.97 cases per 1000.
• The incidence risk of FUS in the DCF− group was 1.49 cases per 1000.
• The incidence risk of FUS in the DCF exposed group was 4.01 times the incidence risk of FUS in the DCF non-exposed group (RR = 4.01).

Answers:
• The incidence risk of FUS in the DCF+ group that may be attributed to DCF is 4.5 per 1000 (AR = 0.0045).
• In the DCF+ group, 75% of FUS is attributable to DCF (AF = 0.75).
• The incidence risk of FUS in the population that may be attributed to DCF is 1.8 per 1000. That is, we would expect the risk of FUS to decrease by 1.8 cases per 1000 if DCF were not fed (PAR = 0.0018).
• Fifty-four percent of FUS cases in the population are attributable to DCF (PAF = 0.54).

Other basic terminology
Accuracy: the accuracy of a test relates to its ability to give a true measure of the substance being measured.
To be accurate, a test need not always be close to the true value, but if repeat tests are run, the average of the results should be close to the true value.

Cont.
Precision: the precision of a test relates to how consistent the results of the test are.
If a test always gives the same value for a sample (regardless of whether or not it is the correct value), it is said to be precise.
Variability among test results might be due to variability among results obtained from running the same sample within the same laboratory (repeatability) or variability between laboratories (reproducibility).

Cont.
Bias is caused by systematic error, a systematic error being one that is inherent to the technique being used and that results in a predictable and repeatable error for each observation.

Cont.
The most common types of bias are:
• Bias due to confounding
• Measurement bias
• Interview bias
• Selection bias

Thank You