Principles of Epidemiology for Public Health (EPID600)
Study designs: Cross-sectional studies, ecologic studies (and confidence intervals)

Victor J. Schoenbach, PhD
Department of Epidemiology
Gillings School of Global Public Health
University of North Carolina at Chapel Hill
www.unc.edu/epid600/
2/22/2011 Cross-sectional studies

Signs from around the world
In a Copenhagen airline ticket office: “We take your bags and send them in all directions.”

Signs from around the world
In a Norwegian cocktail lounge: “Ladies are requested not to have children in the bar.”

Signs from around the world
Rome laundry: “Ladies, leave your clothes here and spend the afternoon having a good time.”

Faster keyboarding - 1
I cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonmneal pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy. It dn'seot mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm.
• Gary C. Ramseyer's First Internet Gallery of Statistics Jokes http://davidmlane.com/hyperstat/humorf.html (#162)

Faster keyboarding - 2
Most of my friends could read this with understanding and rather quickly I might add. Then I had them read a statistical bit of literature:
• Miittluvraae asilyans sattes an idtenossiy ctuoonr epilsle is the itternoiecsno of a panle pleralal to the xlyapne and the sruacfe of a btiiarave nmarol dbttiisruein.
Gary C. Ramseyer's First Internet Gallery of Statistics Jokes http://davidmlane.com/hyperstat/humorf.html (#162)

Today – outline
• Cross-sectional studies (and sampling)
• Ecologic studies
• Confidence intervals

Cross-sectional studies
• Cross-sectional studies include surveys
• People are studied at a “point” in time, without follow-up.
• Can combine a cross-sectional study with follow-up to create a cohort study.
• Can conduct repeated cross-sectional studies to measure change in a population.

Cross-sectional studies
• Number of uninsured Americans rises to 50.7 million. (USA Today, 9/17/2010; data from Census Bureau)
• In 2007-2008, almost one in five children older than 5 years was obese. (Health, United States, 2010; data from the National Health and Nutrition Examination Survey)
• 35% (~7.4 million) of births to U.S. women during the preceding 5 years were mistimed or unwanted (2002 National Survey of Family Growth, Series 23, No. 25, Table 21)
[Source: www.cdc.gov/nchs/]

Cross-sectional studies
• Incidence information is not available from a typical cross-sectional study
• Sometimes can reconstruct incidence from historical information
• Example: the incidence proportion of quitting smoking, called the “quit ratio” (ex-smokers / ever-smokers), is calculated from survey data.

Measure prevalence at “point” in time
• “Snapshot” of a population, a “still life”
• Can measure attitudes, beliefs, behaviors, personal or family history, genetic factors, existing or past health conditions, or anything else that does not require follow-up to assess.
• The source of most of what we know about the population

Population census
• A cross-sectional study of an entire population
• Provides the denominator data for many purposes (e.g., estimation of rates, assessing generalizability, projecting from smaller studies)
• A huge effort – people can be difficult to find and to count; may not want to provide data
• Some countries maintain accurate and current registries of the entire country

National surveys conducted by NCHS
• National Health Interview Survey (NHIS) – household interviews
• National Health and Nutrition Examination Survey (NHANES) – interviews and physical examinations
• National Survey of Family Growth (NSFG) – household interviews
• National Health Care Survey (NHCS) – medical records

National surveys
• Designed to be representative of the entire country
• Modes: household interview, telephone, mail
• Employ complex sampling designs to optimize efficiency (tradeoff between information and cost)
• Logistically challenging (answering machines, cellphones, . . .)
See presentation by Dr. Anjani Chandra at www.minority.unc.edu/institute/2003/materials/slides/Chandra-20030522.ppt

Example: National Health Interview Survey
• Conducted every year in U.S. by National Center for Health Statistics (CDC)
• “Stratified, multistaged, household survey that covers the civilian noninstitutionalized population of the United States”
• Redesigned every decade to use new census

“multistaged”
• Improves logistical feasibility and reduces costs (though reduces precision)
1. Divide population into primary sampling units (PSU’s)
   PSU = primary sampling unit: metropolitan statistical area, county, group of adjacent counties
2. Select sample of census block groups (SSU’s) within each selected PSU
3. Map each selected census block group or examine building permits
4. Select one cluster of 4-8 housing units dispersed evenly throughout the block
NCHS draws a new representative sample for each week’s interviews

“stratified”
• US divided into 1,900 PSU’s
• Largest 52 PSU’s are “self-representing”
• Rest of PSU’s divided into 73 categories (“strata”), based on socioeconomic and demographic variables
• Sampling takes place separately within each category (“stratum”)

Sample size and Precision
Sample size   Lower 95%   Point estimate   Upper 95%   Width
100           0.17        0.25             0.33        0.16
400           0.21        0.25             0.29        0.08
900           0.22        0.25             0.28        0.06
1600          0.23        0.25             0.27        0.04
Working values: p = 0.25; p(1 – p) = 0.188; square root of p(1 – p) = 0.43301

Weighted sampling
Age group    Hypothetical pop (1,000's)   Unweighted sample   Weighted sample
20-39 yrs    18,000                       900                 400
40-59 yrs    18,000                       900                 400
60-69 yrs    8,000                        400                 400
Total        44,000                       2,200               1,200

“stratified”
• Also place census blocks into categories and sample within each
• Oversample some strata

“Defined population”
• Studies, especially cross-sectional studies, are easiest to interpret when they are based in a population that has some existence apart from the study itself (“defined population”)
1. Political subdivision (city, county, state)
2. Institutional (HMO, employer, profession)
• Probability sampling enables statistical generalizability to the defined population

Surveys of sentinel populations
• HIV seroprevalence survey in three county STD clinics in central NC in 1988
• 3,000 anonymous, unlinked, leftover sera
• Anonymous questionnaire for demographics and risk factors
[Schoenbach VJ, Landis SE, Weber DJ, Mittal M, Koch GG, Levine PH. HIV seroprevalence in sexually transmitted disease clients in a low-prevalence southern state.
Ann Epidemiol 1993;3:281-288]

HIV seroprevalence
Group               % HIV+
Homosexual men      46
Bisexual men        25
Heterosexual men    1.6
Women               0.6
Total               2.5
[Schoenbach VJ, Landis SE, Weber DJ, Mittal M, Koch GG, Levine PH. Ann Epidemiol 1993;3:281-288]

Seroprevalence (% HIV+) by risk factors
Values listed in the order Gay / Hetero / Women
• Syphilis (history/current): 53 / 9.0 / 3
• Gonorrhea (history): 37 / 2.6 / 1
• Anal intercourse: 41 / 1.7 / 2
• Paid for sex: 5.2
[Schoenbach VJ, Landis SE, Weber DJ, Mittal M, Koch GG, Levine PH. Ann Epidemiol 1993;3:281-288]

Interpretation
• Measures prevalence – if incidence is our real interest, prevalence is often not a good surrogate measure
• Studies only “survivors” and “stayers”
• May be difficult to determine whether a “cause” came before an “effect” (exception: genetic factors)

Other points
• Can choose by exposure or overall
• Can choose by disease – may not be distinguishable from a case-control study with prevalent cases

Outline
• Cross-sectional studies (and sampling)
• Ecologic studies
• Confidence intervals

“Ecologic” studies
• Most study designs – cross-sectional, case-control, cohort, intervention trials – can be carried out with individuals or with groups
• Group-level studies which use routinely collected data are easier and less costly
• Group-level studies that involve interventions may not be easier or less costly

Types of group-level variables
• Summary of individual-level variable (e.g., median household income, % with high school diploma)
• Property of the aggregate (e.g., neighborhood grocery stores, seat belt
legislation, “community competence”)

Interpretation
• Link between summary exposure variable and individual-level outcome must be inferred
• Inference from group to individual is not always sound

Example: Male Circumcision and HIV
(Slope indicates strength of relationship; r indicates linearity)
Source: Bongaarts J, et al. The relationship between male circumcision and HIV infection in African populations. AIDS 1989; 3(6): 373-7.

Outline
• Cross-sectional studies (and sampling)
• Ecologic studies
• Confidence intervals

Confidence intervals
• Provide a plausible range for the quantity being estimated
• Width indicates the precision of an estimate for a given level of “confidence”
• Confidence intervals quantify only random error from sampling variation, not systematic error from nonresponse, study design, etc.

Confidence level vs. precision
• The more vague my estimate, the more confident I can be that it includes the population parameter: “I am 100% confident that the prevalence of HIV is between 0 and 100%”.
• The more specific my estimate, the lower my confidence: “I am 0% confident that the prevalence of HIV is 5.23%”

Confidence intervals – interpretation
• Simple interpretations are typically not precise
• Precise interpretations are typically not simple

Simple but imprecise
• “There is 95% confidence that the interval contains the true value”
  – True, but begs the question – how to define “confidence”

Simple but imprecise
• “There is a 95% probability that the interval contains the true value”
  – Not quite correct: probability (as conventionally defined) applies to a process, not to a single instance

Probability applies to a process: example
A 95% confidence interval can be viewed as a measurement or estimation process that will be correct (the interval includes the true value of the parameter) 95% of the time and incorrect 5% of the time. Let us make up another estimation process that will be correct (about) 95% of the time.

Why probability applies to a process
• Estimate your gender by flipping a coin 5 times: if the result is 5 heads, estimate your gender to be its opposite; otherwise estimate your gender to be what you think it is now.
• Probability that estimate will be correct is (1 – Probability of 5 heads) = 1 – 1/32 = 0.97 = 97%
• Probability that estimate will be incorrect is 3%

Why probability applies to a process
So we now have a measurement process that will be correct 97% of the time. We will use it to measure your gender. Flip the coin 5 times, and suppose you get 5 heads – Is there a 97% probability that you are of the opposite sex?

Precise but not simple
A 95% confidence interval is:
1. obtained by using a procedure that will include the population parameter being estimated 95% of the time
2. the set of all population values which are “likely” to yield a sample like the one we obtained

Suppose that this line represents the value of the parameter we are trying to estimate
[Figure: a line marked “True value”]

Possible estimates of that parameter in N identical studies (shows sampling variation)
[Figure: distribution of study estimates around the true value]

One possible “true” value and how it would manifest, on average, in N identical studies
[Figure: distribution of estimates around the true value, with 95% of the distribution marked]

Estimate from one study of a given size
[Figure: the estimate, with the true value shown as “?”]

A possible “true” value with < 2.5% chance of being observed at or beyond the estimate
[Figure: distribution around the candidate value, with the estimate and 95% of the distribution marked]

A possible true value with > 2.5% probability of being observed at or beyond the estimate
[Figure: distribution around the candidate value, with the estimate and 95% of the distribution marked]

A possible true value with > 2.5% probability of being observed at or beyond the estimate
[Figure: distribution around the candidate value, with the estimate and 95% of the distribution marked]

A possible true value with < 2.5% probability of being observed at or beyond the estimate
[Figure: distribution around the candidate value, with the estimate and 95% of the distribution marked]

What the confidence interval represents
[Figure, two slides: distributions for a range of candidate values, with the 95% confidence interval marked]

One possible “true” value and how it would manifest, on average, in N identical studies
[Figure: distribution of estimates around the true value, spanning 1.96 x s.e. on either side]

Confidence intervals – another take

One possible population
Another possible population
A 3rd possible population
A 4th possible population
A 5th possible population
A 6th possible population
[Figures: each slide shows a different possible population drawn with “O” symbols]
etc.
There are 1.6 x 10^60 possible populations (from no cases to all cases)

Suppose this is the population (prevalence = 15%)
[Figure: the population, with cases marked “O”]

Take a sample (n=10)
[Figure: the same population, with the sample drawn from it]

The sample
[Figure: the sample, containing 2 cases]

Make point estimate of prevalence
[Figure: the sample, containing 2 cases]

Interval estimate
• What are all the possible populations that would be expected to yield this prevalence in a sample of size 10?

This one is not possible
[Figure: a population with 1 case]

Possible, but VERY UNLIKELY
[Figure: a population with 2 cases]

Not quite 2.5% probability (2.1%, in fact)
[Figure: a population with 5 cases]

Yields just about 2.5% (3%, actually) probability of selecting 2 (or more) cases in 10
[Figure: a population with 6 cases]

One possible “true” value and how it would manifest, on average, in N identical studies
[Figure: distribution of estimates around the true value, with 95% of the distribution marked]

Just above 2.5% (actually 2.6%) probability of selecting 2 (or fewer) cases in 10
[Figure: a population with many cases]

Just below 2.5% (actually 2.4%) probability of selecting 2 (or fewer) cases in 10
[Figure: a population with many cases]

Interval estimate for 2/10
• Lower bound: 2.5% (5 cases)
• Upper bound: 55% (110 cases)
Meaning: Our sample of 10 with 2 cases provides evidence to exclude, at conventional error tolerance, populations with fewer than 5 cases or more than 110 cases. Populations with 5-110 cases cannot be excluded as likely sources for this sample.

Interval estimate for 2/10
• Actual population prevalence was 15%, which in fact is between 2.5% and 55%.
• 2.5% to 55% is a very wide interval, i.e., a very imprecise estimate
• To make it more precise, we need a larger sample

Signs from around the world – Germany
A sign posted in Germany's Black Forest: “It is strictly forbidden on our black forest camping site that people of different sex, for instance, men and women, live together in one tent unless they are married with each other for that purpose.”

Signs from around the world – Finland
On the faucet in a Finnish washroom: “To stop the drip, turn cock to right.”
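
Worked example (a sketch, not part of the original slides): the numbers in “Sample size and Precision” and in “Interval estimate for 2/10” can be checked with a few lines of Python. Two assumptions are made here and are not stated on the slides. First, the precision table is assumed to use the normal-approximation interval, estimate ± 1.96 x s.e., with s.e. = sqrt(p(1 – p)/n). Second, the population is assumed to contain 200 people, which is implied by “2.5% (5 cases)” and “55% (110 cases)”, and the sample of 10 is assumed to be drawn without replacement (hypergeometric probabilities). The function names are illustrative only.

```python
from math import comb, sqrt


def wald_ci(p, n, z=1.96):
    """Normal-approximation interval: p +/- z * sqrt(p(1-p)/n)."""
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def hypergeom_pmf(k, pop_size, pop_cases, n):
    """P(exactly k cases in a sample of n drawn without replacement)."""
    return (comb(pop_cases, k) * comb(pop_size - pop_cases, n - k)
            / comb(pop_size, n))


def prob_at_least(k, pop_size, pop_cases, n):
    """P(k or more cases in the sample)."""
    top = min(pop_cases, n)
    return sum(hypergeom_pmf(j, pop_size, pop_cases, n)
               for j in range(k, top + 1))


def prob_at_most(k, pop_size, pop_cases, n):
    """P(k or fewer cases in the sample)."""
    return sum(hypergeom_pmf(j, pop_size, pop_cases, n)
               for j in range(0, k + 1))


if __name__ == "__main__":
    # Precision table: point estimate 0.25 at increasing sample sizes.
    for n in (100, 400, 900, 1600):
        lower, upper = wald_ci(0.25, n)
        print(f"n={n:5d}  {lower:.2f}  0.25  {upper:.2f}")

    # Interval estimate for 2 cases in a sample of 10.
    # ASSUMPTION: population of 200 (inferred from 5 cases = 2.5%).
    POP = 200
    for cases in (5, 6):  # the slides report 2.1% and about 3%
        p = prob_at_least(2, POP, cases, 10)
        print(f"{cases} cases: P(2 or more in 10) = {p:.3f}")
    p = prob_at_most(2, POP, 110, 10)  # the slides report 2.4%
    print(f"110 cases: P(2 or fewer in 10) = {p:.3f}")
```

Quadrupling the sample size halves the width of the interval, which is why the table moves from 100 to 400 to 1600. The tail probabilities show why 5 and 110 cases mark the ends of the interval: populations much beyond those counts would produce a sample with 2 cases in 10 less than about 2.5% of the time in that direction.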