Epidemiologic Measures of Event Frequency

A. Preliminary Considerations

1. Mathematical Aspects of Epidemiologic Measures:
   a) COUNTING (accurately, precisely, and reliably)
   b) NUMERATOR (those with an existing or new condition, i.e. “cases”)
   c) DENOMINATOR (population in which the existing or new condition is counted)
   d) ESTIMATION (all measures made in samples are used to estimate some “true” characteristic of the population)

2. Mathematical Types of Measures
   a) Ratio
      a) Obtained by dividing one quantity by another without implying any relationship between the numerator and denominator, i.e. x/y, e.g. (# apples)/(# oranges).
      b) Range of values: (-∞) to (+∞)
      c) Epidemiology Examples: Risk Ratio, Rate Ratio, Odds Ratio; in fact, all epidemiology measures that involve a numerator and a denominator are, by definition, ratios (including prevalence and incidence).
   b) Proportion
      a) A specific kind of ratio in which elements included in the numerator must also be included in the denominator, i.e. a/(a+b), e.g. (# apples)/[(# apples) + (# oranges)].
      b) All probabilities (including “chance” and “likelihood”) are of this form.
      c) Range of values: (0.00) to (1.00).
      d) Epidemiology Examples: Prevalence, Incidence Proportion, all expressions of Risk.
   c) Odds
      a) Another specific ratio used to express the frequency (“likelihood” in a generic sense) of an event, related to probability but with distinct mathematical properties.
      b) Defined as the probability of an event occurring divided by the probability of the event not occurring, i.e. [a/(a+b)] / [b/(a+b)] = a/b
      c) Range of values: (0.00) to (+∞)
      d) Epidemiology Examples: Prevalence Odds, Incidence Odds, and used in the construction of the ever-present Odds Ratio.
   d) Rate
      a) Strictly defined as a velocity or speed:
         - A ratio with a distinct numerator-denominator relationship: those in the numerator also contribute to the denominator, but they do so in terms of time (or person-time).
         - Thus, all true epidemiologic “rates” have time as intrinsic to the denominator and are expressed with units of person-time in the denominator, i.e. a/(total “person-time” for a+b).
         - Range of values: (0.00) to (+∞)
         - Epidemiology Examples: Incidence Rate, Mortality Rates
      b) “Rate” is used in a variety of ways by epidemiologists and others, not all of which are strictly correct.
         - For example, to speak of a “prevalence rate” is incorrect, since prevalence is a proportion, not a rate.
         - For our purposes, use the term “rate” only for measures in which person-time is the unit of measurement in the denominator (either implied or stated). This means, for example, that 10/10,000/year is not a technically correct rate expression (i.e. it is not equivalent to 10/10,000 person-years), although this form is commonly referred to as a “rate”. [a]

3. Mathematical Form of Epidemiology Measures
   a) Epidemiology measures (except simple counts) should be expressed with numerators and denominators that promote ease of understanding and comparison:
      a) Numerators should generally have at least one whole-number digit and no more than one decimal place.
      b) Denominators should be converted to an appropriate power of 10 (e.g. 100 or %; 1,000; 10,000; 100,000).
      c) Example: Six events in 6,000 people simplifies, mathematically, to 0.001, but would be more appropriately expressed in epidemiology as 1/1,000 (or, for example, 100/100,000 if being compared to another measure with 100,000 as the denominator).
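The proportion, odds, and power-of-10 conventions above can be sketched in a few lines of Python. This is only an illustration: the function names (`proportion`, `odds`, `per_power_of_ten`) are hypothetical and not part of the handout, and picking the smallest power of 10 that gives a numerator of at least 1 is one reasonable reading of the guidance, not a fixed rule.

```python
def proportion(a: int, b: int) -> float:
    """a/(a+b): the numerator is also counted in the denominator."""
    return a / (a + b)


def odds(a: int, b: int) -> float:
    """a/b: those with the event divided by those without it."""
    return a / b


def per_power_of_ten(events: int, population: int) -> tuple[float, int]:
    """Scale events/population so the numerator is at least 1.

    Returns (numerator rounded to one decimal place, power-of-10 denominator).
    """
    if events <= 0 or population <= 0:
        raise ValueError("events and population must be positive")
    denominator = 1
    # Grow the denominator by factors of 10 until the numerator reaches 1.
    while events * denominator < population:
        denominator *= 10
    return round(events * denominator / population, 1), denominator


# Six events in 6,000 people, the example given above.
numerator, denominator = per_power_of_ten(6, 6000)
print(f"{numerator:g}/{denominator:,}")  # prints 1/1,000
```

To compare against a measure already expressed per 100,000, one would fix the denominator at 100,000 instead of letting the function choose it.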
4. Time
   a) Time is a crucial component of all epidemiology measures of event frequency and of their interpretation; time should always be specified explicitly:
      a) Point in time: point prevalence
      b) Time period: chronologic time over which subjects are observed (incidence proportion or period prevalence)
      c) Person-time observed: the total amount of time each subject is observed at risk, summed over all subjects (expressed in person-years, person-months, person-days, etc., where time is intrinsic to the measure): incidence rate

B. Some Introductory Definitions:

1. Burden:
   a) The amount/frequency of an event or disease in a population, generally referring to existing events/disease in a defined population at a point in time. [b]
   b) Estimated by Prevalence measures.

2. Risk:
   a) Most simply defined as the probability (likelihood/chance) of developing an outcome of interest (event or disease) over a specified amount of time.
   b) Risk is always expressed as the proportion of new events occurring over a specified time period in a defined population at risk for acquiring the event.
   c) Risk is estimated by Incidence, calculated as either a Rate or a Proportion (see below).

Footnotes:
[a] (On the expression 10/10,000/year in the discussion of “rate” under A.2.) Note that this statement (events per population per time) is a “rate” as defined in the textbook and in Last’s “Dictionary of Epidemiology”. We will, however, consider a measure a rate only if it has person-time at risk in the denominator (representing the “population at risk”).
[b] One could also consider, although less commonly, “burden” in terms of new events/disease. In this sense burden would refer to incidence, not prevalence. For example, the occurrence of five new diabetics each year (as number of cases or cases per population) represents the “burden” of new diabetics who will need educational and support services for their new diagnosis.

3. Population
   a) Definition: A collection of individuals (usually people) sharing a specified characteristic or set of characteristics (usually includes specification of time and geography)
      a) Example 1: All people living in the Portland metropolitan area on Jan. 1, 2006
      b) Example 2: All men ages 30-39 in the Portland metropolitan area on Jan. 1, 2006 who were followed for a year
      c) Example 3: All entering HIP students during the years 2005-2010 followed through the five years after graduation.
   b) Open Population: a defined population that may gain membership through birth or immigration or may lose membership through emigration or death from causes other than the one under study.
      a) Example 1: Those living in Multnomah County during a study of cancer incidence over a five-year period.
      b) Example 2: The Nurses’ Health Study, a cohort study across a long enough time period that some are lost to follow-up or die.
   c) Closed Population: a population that neither adds nor loses membership over the course of time
      a) Example: Outbreak investigation for a small gathering where complete follow-up of attendees is feasible.
      b) Example: Clinical studies using small cohorts followed over a short time that allows for complete follow-up (although these are uncommon, and too much loss to follow-up will convert these into “open” populations)
   d) Population At-Risk: a specified population of individuals capable of acquiring the condition or event of interest. This term does not refer to those with one or more risk factors, i.e. “at-risk” does not mean “high risk”.
      a) Relevant for all cohort and follow-up studies (including randomized trials).
      b) Conceptualized in terms of either number of people or amount of time
      c) Example: For uterine cancer the population at risk would include all women (no men) with a uterus (excludes those with a hysterectomy) who do not already have uterine cancer.
      d) The actual population studied may be further restricted to a specific group of interest within the broader “at risk” population, e.g. for uterine cancer the study population might be restricted to only adult women in a specified age range.
4. Sample (estimation)
   a) Because we can never adequately observe all (or all possible) members of a population, we select a sample (a smaller group) of individuals from the population to “represent” the population and its characteristics and experiences.
   b) Representative Sample: Investigators have a number of ways to obtain samples that adequately “represent” the population (e.g. random samples); the important points are:
      a) Random samples are ideal for the statistics but often impracticable.
      b) It is critical to have a sampling method that maximizes the ability to obtain a representative sample.
      c) How samples are selected, recruited to participate, and retained provides clues to whether the sample represents the stated population of interest (or, conversely, what population it represents).

C. Measures of Frequency

1. Counts
   a) Definition: the number of affected individuals who either have (existing, i.e. prevalent) or acquire (new, i.e. incident) the condition/event (i.e. includes numerator events only).
   b) Uses:
      a) Resource allocation: defining the number of existing or new cases that will need services (e.g. 100 people who need services will need those services whether they come from a small population or a large one).
      b) Identifying trends (i.e. comparisons across time) when we can legitimately assume that the underlying population changes little.
   c) Issues:
      a) Critical pieces of information:
         - Adequate definition of “case” (condition or event of interest)
         - Denominator: making explicit the nature and size of the source population within which the existing or new events are to be found
         - Time: the time at or during which the conditions/events are counted
      b) Count data are difficult to use when making certain comparisons across different groups/populations or when evaluating frequency in a sample that we want to generalize to the population.

2.
PREVALENCE:
   a) Point Prevalence: the proportion of a population with a particular existing condition (prevalent cases) at a specific point in time.
      a) “Point in time”
         - Usually refers to a general or specific temporal point (e.g. a short survey period such as December 2004, or a specified date such as December 31, 2004)
         - May also refer to a “point” in the life cycle (e.g. birth, entry into graduate school, retirement)
      b) Calculation: (# Existing cases at a point in time) / (total specified population at that point)
      c) Interpretation: The amount (“status” or “burden”) of existing condition in the population at a given point in time.
      d) Example: A study in metropolitan Atlanta in 1996 identified 577 children (ages 3-10) with autism in a population of 169,710 white children, yielding a prevalence of 3.4/1,000.
   b) Period Prevalence: the proportion of a population with a particular existing condition at any time during a specified time period.
      a) This mixes prevalent (existing) and incident (new) cases:
         - New cases that develop during the period become “existing cases” and are added to the cases present at the beginning of the period.
         - Any cases “existing” at any time during the observation period are included, even if the condition resolves during that period.
      b) Uses:
         - For ambiguously defined conditions or those with ambiguous onset, this may allow the capture of cases that exist but haven’t quite met the threshold of definition (e.g. mental health conditions).
         - For acute short-duration conditions, where point prevalence would be low and would not capture the extent of occurrence; most of the cases are likely to be “new” and thus come close to estimating incidence.
      c) Calculation: (# Existing cases at any time during a time period) / (total specified population)
      d) Interpretation: The amount (“status” or “burden”) of existing condition in the population at any point within a specified time period.
         Life-time prevalence is a special application of period prevalence.
      e) Example: In a sample of U.S. adults (ages 18-44 years), 7.7% reported having had a serious mental health disorder at some point during the prior 12 months.
   c) Prevalence Odds: the odds of occurrence of an existing condition in a population at a specified point in time.
      a) An alternate way to assess/measure the frequency of existing events in a population (prevalence)
      b) Calculation: (# with Existing Condition) / (# without Existing Condition)
      c) Interpretation: The odds of having a condition at a point in time in a specific population.
      d) Uses: For constructing Odds Ratios in cross-sectional and case-control studies
      e) Example: In a group of 250 subjects, 125 were exposed (had the event, here the exposure, of interest); the odds of exposure are 1:1 or 1.0 (125/125).
   d) Issues for Prevalence measures:
      a) Need a clear definition (and a means of operationalizing that definition) of who to include in the numerator (“case” definition) and of the relevant denominator (the population within which “cases” exist).
      b) Define and state the specific time (point or period) involved.
      c) Prevalence is determined by factors that affect:
         - How fast new disease is added to the population (i.e. incidence)
         - How long the disease “exists” in the population (i.e. average duration until resolution, cure, or death)
      d) Prevalence is not a measure of “risk” (probability) of getting the disease (although it is related to risk as noted above).

3. INCIDENCE:
   a) General Definition: the occurrence of new events or cases that develop in a population at risk during a specified time interval.
         Incidence = (# New Events observed over time) / (Population at Risk observed over time)
   b) Population at Risk refers to the population composed of individuals who are capable of becoming new cases (i.e. free of the condition under consideration and can get it).
      This definition can be thought of in one of two ways:
      - Person-based definition (at-risk persons observed): the population of all individuals capable of acquiring the event or condition at the beginning of an observation period. This only works when all individuals can be observed throughout the specified time interval and for the same amount of time (i.e. complete follow-up in a closed population).
      - Time-based definition (at-risk person-time observed): the sum total of every individual’s observed time-at-risk, i.e. the event-free time during which each individual is observed. Individuals start contributing person-time when they enter the population or observed sample, and they stop contributing person-time when they leave the at-risk population (by loss to follow-up, death, or developing the event of interest). [c]
   c) Example: 100 initially at-risk individuals, all followed for 5 years, with one new event at the end of each year (5 total):
      - Person-based population at risk: 100 people followed for 5 years
      - Time-based population at risk: (95 followed for 5 yrs) + (1 for 1 yr) + (1 for 2 yrs) + (1 for 3 yrs) + (1 for 4 yrs) + (1 for 5 yrs) = (95*5) + 1 + 2 + 3 + 4 + 5 = 490 person-years, which translates approximately to 490 people followed, on average, for 1 year
      - NOTE: time is essential for both formulations of the population at risk, but it is ‘outside’ the formulation as people and ‘inside’ the formulation as time (or, more correctly, as person-time).
   Two forms of Incidence: Because of these two different formulations of the population at risk (the denominator for incidence), we have two different ways of constructing incidence measures:
      - Incidence Proportion (IP): population at risk is the total number of at-risk people observed over the observation period.
      - Incidence Rate (IR): population at risk is the total amount of at-risk time observed over the observation period.
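The person-time bookkeeping in the 100-person example can be checked with a short Python sketch. It uses the direct method (summing each individual's event-free time); the variable and function names are illustrative assumptions, not part of the handout.

```python
def person_time(times_at_risk):
    """Sum each individual's event-free observed time (same unit for everyone)."""
    return sum(times_at_risk)


# 100 people followed for 5 years, one new event at the end of each year.
# The 5 cases contribute 1, 2, 3, 4 and 5 years; the other 95 contribute 5 years each.
event_years = [1, 2, 3, 4, 5]
times_at_risk = event_years + [5] * 95

n_at_risk = len(times_at_risk)         # person-based denominator: 100 people
total_py = person_time(times_at_risk)  # time-based denominator: 490 person-years
n_events = len(event_years)

incidence_proportion = n_events / n_at_risk  # risk over the whole 5-year period
incidence_rate = n_events / total_py         # events per person-year

print(f"IP = {100 * incidence_proportion:.0f}/100 over 5 years")  # IP = 5/100 over 5 years
print(f"IR = {100 * incidence_rate:.2f}/100 person-years")        # IR = 1.02/100 person-years
```

The two measures share a numerator and differ only in the denominator: people for the IP, person-time for the IR.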
      Example (from above):
      - IP = (5 new events)/(100 initially at risk) over 5 years = 5/100 (or 5%) over 5 years (or, on average, 1/100 (or 1%) over 1 year).
      - IR = (5 new events)/(490 person-years) = 1.02/100 person-years (which may be interpreted as 1.02/100 persons followed on average for 1 year)

      [c] Three methods for estimating Person-Time (see supplementary handout): (1) Add together the specific contribution of at-risk person-time for each individual observed (requires that the specific onset of follow-up and the time of event occurrence be available); (2) Credit each subject getting the event, lost to follow-up, or entering the observed group midstream as contributing, on average, half of the observation period (requires an assumption that these events occur equally throughout the observation period); (3) Multiply the midpoint population by the duration of observation (again, assumes the net change in the population size occurs equally throughout the period of observation).

   d) Incidence Proportion (or Cumulative Incidence): the proportion of individuals in an at-risk population who develop a condition or event over a specific period of time.
      - This measure of incidence can only be calculated when all individuals have complete follow-up throughout the observation period or are followed for the same amount of time, i.e. only with closed populations.
      - Interpretation: Represents an estimate of the “risk” of developing a disease or condition within a specified time in a specific population at risk. Expressed as the proportion of an at-risk population experiencing an event or acquiring a condition over a specified time period. You can also think of this as the accumulated (thus “cumulative”) effect of the incidence rate operating on the at-risk population over the specified time period (see below).
      - Mathematics:
         - Numerator: number of new events during a specified time period
         - Denominator: number of subjects followed throughout the specified period.
         - Calculation: (# new events in a specified period) / (population at risk at onset of that period) (expressed as # events/10^x over a year)
      - Issues for Incidence Proportion:
         - The relevant time period must always be specified (e.g. “over x years”), just as you would in any Risk statement.
         - This measure is intuitively appealing but does not account for the fact that those who become cases during the time period continue to be followed even though they are no longer “at risk”.
         - Assumptions:
            - The entire population is “at risk” at the beginning, i.e. it includes no one who has the condition (prevalent cases) or who can’t get it.
            - Ascertainment and follow-up are complete, with no additions to or subtractions from the population (i.e. it can only be calculated in a “closed” population).
            - All subjects are followed for an equal amount of time (even if not concurrently).
      - Examples:
         - Six hundred people were recruited in January 2002, all of whom were followed until December 2003 and 12 of whom acquired the event. IP = 12/600 over 2 years = 2/100 (2%) over 2 years or, on average, 1/100 (1%) over 1 year.
         - Eight hundred newborns were recruited from 2000 through 2005. Twenty-four developed a condition within the first year of life. IP = 24/800 over 1 year = 3/100 over 1 year. [Note: Although the study period is 5 years, that was the recruitment period for the study subjects; each study subject was followed for only one year after birth (at least for purposes of this study).]
   e) Incidence Rate: the instantaneous rate or speed at which new cases are developing across a specified amount of observed at-risk time (i.e. in an at-risk population). It is also known as “Incidence Density”, “Force of Mortality”, or “Force of Morbidity”.
      - The denominator of this measure of incidence includes only the amount of time during the observation period when observed individuals are at risk. Thus it can be calculated even when one does not have complete or equal follow-up times, i.e. it can be calculated for both closed and open populations.
      - Mathematics:
         - Numerator: number of new events in a specified period of time
         - Denominator: total person-time in the at-risk subjects observed at any time during the specified period
         - Calculation: (# new cases in a specified period) / (total event-free person-time observed) (expressed as # cases/10^x person-years)
      - Person-Time [d]: the sum of every individual’s observed time-at-risk, i.e. the disease- or outcome-free time during which each subject is observed over the course of the study period.
         - This makes time intrinsic to the denominator in a way very different from the expression of time in an incidence proportion.
         - Although time is most often measured in years (thus person-years), it may be measured in other units (e.g. person-days, person-weeks, person-months) as appropriate for the event or condition under consideration.
         - Incidence Rate (person-time) allows us to account for people/subjects who move in and out of an at-risk observation group (i.e. an open population) or in and out of exposure categories (e.g. smokers who become non-smokers and vice versa).
         [d] Person-time is the sum of every individual’s observed time-at-risk (the disease-free time during which each was observed). This can be estimated directly by adding every individual’s time of disease-free observation, or it can be estimated indirectly in two ways: 1) by using the mid-period population multiplied by the length of the period (this assumes that on average people come and go equally through the period); and 2) by assuming that those entering or leaving the population (by getting the condition, dying, or moving away) do so on average half-way through the period of observation. Person-time is usually expressed in person-years, but it could be person-hours, person-days, or person-weeks, depending on the characteristics of the condition under study.
      - Interpretation of Incidence Rate:
         - Research and practice: IR is almost always used as an estimate of “Risk” in open populations, which is important because epidemiologists usually work with open populations (or samples).
            - Works best when incidence is relatively low and the observation time is short (which is the case for most events/conditions)
            - Offers the only way to estimate risk in an open population
         - Theoretical: IR also estimates the concept of average instantaneous speed or rate which, acting at all times on an at-risk population over a specified time period, produces an accumulation of new cases in a population at risk.
            - Can be thought of as providing a constant pressure (thus “force of…”) on the population to produce events
            - This particular use of IR is almost entirely theoretical (to help understand the dynamic nature of new event development) and is only relevant when calculated in a closed population
      - Issues:
         - Assumes a reasonable estimate of person-time can be developed
         - Assumes the distribution of individual person-time is not important (e.g. that 100 persons with 1 year of risk exposure each are equivalent to 5 individuals with 20 years each)
      - Special case of recurring events: Person-time allows us to formulate an appropriate at-risk denominator when events can recur, i.e. when individuals do not leave the at-risk pool when they acquire the event/condition (or only leave temporarily).
         - Example: Frequency of abuse events in a sample of women followed over 2 years.
           Risk can be measured as an IP for first events, or as an IR where all events, including recurrent ones, are included in the numerator and subjects remain at risk throughout the observation period.
   f) Incidence Odds: the odds of occurrence of a new condition in a population at risk in a specified period of time.
      - An alternate way to assess/measure the frequency of occurrence of new events in a population at risk over a specified time (incidence)
      - Calculation: (# Developing the New Condition over time) / (# Not Developing the New Condition over time)
      - Interpretation: The odds of developing a condition in a population at risk over a specified period of time
      - Uses: For constructing Odds Ratios in cohort studies (although this is a less important measure than prevalence odds, since in cohort studies we can calculate incidence rates or proportions directly).
      - Example: In a sample of 140 subjects followed for 2 years, 20 developed the outcome: the odds of the new event over two years are 20/120 = 1:6 (or 1/6 = 0.17).
   g) Critical Issues related to all Incidence measures:
      - Always need a clear indication of the relevant time period of observation and how it will be incorporated into the measure (extrinsically in the IP, intrinsically in the IR).
      - Need a clear definition and means of identifying the numerator, as new “cases” of a specified event, disease or condition.
      - Need a clear definition and means of identifying the denominator, as specifically relating to the population at risk for the event, disease or condition.

4. Relationship among Prevalence and Incidence measures
   a) Prevalence depends upon both Incidence (the rate at which disease or events occur in the population) and Average Duration of disease/events:
         Prevalence ≈ (Incidence Rate) * (Average Duration) [e]
   b) This approximation works well only when the disease prevalence is low (<10%), and it assumes that the population dynamics are in a “steady state”, i.e. that the incidence rate and disease duration are constant.
   c) Incidence Rate acts on a population at risk over a period of Time to produce an accumulation of cases in that population at risk. That accumulation of cases over time is expressed in the Incidence Proportion:
         Incidence Proportion ≈ (Incidence Rate) * (Time)
      This approximation works well only when the underlying incidence is low (<10% Incidence Proportion) and when the observation time period is short relative to the duration of the condition. [Note: Both of these assumptions hold in most epidemiologic and clinical studies, except when dealing with high-incidence situations (e.g. epidemics) or prolonged periods of time (e.g. incidence over 10-20 years or longer).]

   [e] This most clearly illustrates conceptually the importance of incidence and average duration for Prevalence. It is a simplification of the fuller expression IR = (Prev) / [(1-Prev)*(Dur)]; when prevalence is low, (1-Prev) approaches 1, thus simplifying the expression.

D. Miscellaneous Epidemiologic Measures:

1. Mortality
   a) Measures of mortality are expressed as rates.
      - The average person-years of observation in the denominator is estimated by the mid-year population multiplied by the duration of observation (most often 1 year).
      - Mortality Rates are often expressed as proportions, restating the Rate as a Risk. If you do this, call them Risks, not Rates, and explicitly state that you are using the Rate to estimate the Risk.
   b) Crude Mortality
      - Definition: the rate of dying from all causes in a total population.
      - Calculation: [(Total # Deaths All Causes) / (Total Mid-Year Population x 1 yr)] * 10^x P-Y [f] (expressed as Deaths/10^x Person-Years)
   c) Cause-Specific Mortality
      - Definition: the rate of dying from a specified cause in the total population.
      - Calculation: [(Total # Deaths from the specified cause) / (Total Mid-Year Population x 1 yr)] * 10^x P-Y [g]
   d) Category-Specific Mortality
      - Definition: the rate of dying from all causes within a specified group (e.g. females, those in a specific age group, etc.).
      - Calculation: [(Total # Deaths in the specified group) / (Group Mid-Year Population x 1 yr)] * 10^x P-Y [g]

2. Case-Fatality (a form of incidence proportion, not a rate)
   a) Definition: the proportion of those with a particular disease/condition who die from that disease/condition in a specified time period
   b) Calculation: [(# Deaths from a specified disease) / (Total # with the specified disease)] * 10^x

3. Attack Proportion (a form of incidence proportion, not a rate)
   a) Definition: the proportion of those in a given exposure category who develop the disease/condition of interest.
   b) Usually used in the context of outbreaks/epidemics, so the time of observation is often left out of attack proportion expressions.
   c) Calculation: [(# Developing a disease/condition) / (Total # in an exposure category)] * 10^x

[f] Usually in these measures, Deaths are counted for one year and the Total Person-Years is estimated as the (Mid-year Population * 1 year), since mortalities are usually calculated for a one-year observation period.
[g] The multiplier (10^x) may be any power of 10 that produces easily interpretable results (i.e. an integer numerator).
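The mortality, case-fatality, and attack-proportion calculations in section D all follow the same pattern, which a minimal Python sketch can make concrete. The function names and every count below are hypothetical, chosen only to exercise the formulas; person-years are estimated as mid-year population times duration, as the handout describes.

```python
def mortality_rate(deaths: int, midyear_population: int,
                   years: float = 1.0, per: int = 100_000) -> float:
    """Deaths per `per` person-years.

    Person-years are estimated as mid-year population x duration of observation.
    """
    person_years = midyear_population * years
    return deaths * per / person_years


def proportion_per(numerator: int, denominator: int, per: int = 100) -> float:
    """Case-fatality or attack proportion, scaled by a power of 10 (`per`)."""
    return numerator * per / denominator


# Hypothetical counts (not from the handout).
crude = mortality_rate(deaths=850, midyear_population=100_000)
cause_specific = mortality_rate(deaths=200, midyear_population=100_000)
case_fatality = proportion_per(numerator=30, denominator=200)  # deaths among those with the disease
attack = proportion_per(numerator=45, denominator=150)         # cases among those exposed

print(f"Crude mortality: {crude:g}/100,000 person-years")
print(f"Cause-specific mortality: {cause_specific:g}/100,000 person-years")
print(f"Case-fatality: {case_fatality:g}/100")
print(f"Attack proportion: {attack:g}/100")
```

Category-specific mortality uses the same `mortality_rate` call with the deaths and mid-year population of the group of interest. The two proportion measures have no person-time in the denominator, which is why the handout does not call them rates.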