Study Designs in GWAS
Jess Paulus, ScD
January 30, 2013

Today's topics
- Case-control studies
  - Population based
  - Hospital based
  - Nested studies
- Selection bias
- Introduction to population stratification

Genetic Association Study Design
- Case-Control:
  - Dichotomous endpoints (Diabetes: yes versus no)
  - Continuous or quantitative traits (HbA1c)
- Family Studies
[Figure: Association Study and Family Study positioned by sample size (high to low), heritability (low to high), and genetic complexity (high to low)]

Hierarchy of Study Designs
- Systematic Reviews & Meta Analysis (MR. HAPPY)
- Randomized Controlled Trials
- Cohort studies
- Case-control studies
- Cross-sectional studies
- Ecologic studies
- Case reports (MR. WORRY)

Cohort Study: Selection into study on basis of exposure status
[Diagram: EXPOSURE (PRESENT / ABSENT) -> OUTCOME (? / ?). Exposure is the basis on which groups are selected at beginning of study.]

Cohort studies in genetic epidemiology
- Allows study of multiple disease endpoints: extends efficiency of effort to genotype
- Selection bias is generally limited

Cohort study limitations for genetic epidemiology
- Loss-to-follow-up bias
- Need for repeated questionnaire assessments for most up-to-date covariate information
- Very costly and logistically challenging to genotype entire cohort and survey for disease endpoints
  - For this reason, genetic epidemiologic studies of full cohorts are rare

Case-Control: Selection based on disease status
[Diagram: disease status (Case / Control) -> Exposure? Disease status is the basis on which groups are selected at beginning of study.]

Case-control designs for genetic exposures
- Appropriate for rare diseases, like cancer
- Can be retrospective or prospective (nested case-control design)
- Efficient sampling of an underlying cohort

Control selection
- The biggest threat to most case-control studies
- Controls must be drawn from the source population that gave rise to the cases
- The ideal controls should:
  - Represent the exposure distribution in the source population that gave rise to the cases
  - Be those who, had they developed the case disease, would have been included in your study as a case
- Failure to select appropriate controls generates selection bias
  - Selection of participants based on joint probability of exposure and outcome

Population case-control study
- Cases arise from a given population, and controls are randomly sampled from that population (assuming population is enumerated)
- Example: cases from CT state tumor registry, controls drawn from state census tract listings
- Reduces potential for selection bias since source of controls is well-defined

Limitations of the population-based case-control study for genetic epidemiology
- Lower participation rates than hospital-based studies, especially given need for biological samples
- Implementation of specimen collection and processing protocols can be challenging outside a clinical setting
- If there is interest in following participants for survival outcomes, tracing can be difficult

Hospital-based case-control study
- Appropriate for genetic epidemiology studies:
  - Hospital setting facilitates subject enrollment and biological specimen collection and analysis
  - Recruitment by medical staff can aid enrollment
  - Smaller geographic area to cover than a population-based study: reduces processing/shipping time
  - Aids in collection of specimens in a timely fashion after disease diagnosis, limiting possibility for reverse causation
- When cases are hospital-recruited, source population is the catchment population of the clinic
  - The collection of all the people who would have been notified as a case, had they developed disease

Hospital-based case-control study limitations
- Retrospective nature opens door to:
  - Recall bias
  - Reverse causation
  - Selection bias
- Selection bias in particular is a risk because it is difficult to identify the source population that gave rise to the cases
  - Ideal control: Who would have presented as a case to Hospital X had they in fact become ill?
  - Attempt to identify catchment population can be challenging
- Sometimes, a control disease (sick controls) is chosen to limit potential for selection bias and differential recall of past exposure
  - Control illness must not be associated with the gene of interest

Nested case-control study
- A type of population-based control sampling
- Any case-control study can be conceived as resting within a cohort of exposed and unexposed
- When the cohort is very well defined this is called a nested case-control study
- Sampling from within the cohort (rather than doing full cohort analysis) is usually motivated by efficiency concerns
- Important applications for genetic epidemiology where it would be too costly to genotype the full cohort

Nested case-control study design advantages
- Limited potential for selection bias because full cohort is enumerated and controls can be randomly sampled from the roster
- Often prospective: limits potential for gene/biomarker to be affected by disease process

Cohort sources of nested case-control studies
- EPIC cohort: http://epic.iarc.fr/
- Nurses Health Study: http://www.channing.harvard.edu/nhs/
- NCI Breast and Prostate Cancer Cohort Consortium (BPC3): http://epi.grants.cancer.gov/BPC3/
- Multiethnic Cohort (MEC) study: http://www.uscnorris.com/mecgenetics/
- Alpha-Tocopherol, Beta-Carotene Cancer Prevention cohort: http://atbcstudy.cancer.gov/study_details.html
- Framingham Heart Study: www.framinghamheartstudy.org

Analysis of case-control GWA studies
- Univariate analysis: Pearson χ² or Fisher exact test, Armitage trend test
- Multivariate analysis: Logistic regression (if unmatched) or conditional logistic regression (if matched)
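
Worked sketch: univariate tests for one SNP

The two χ²-based univariate tests named above can be sketched for a single SNP as follows. This is a minimal illustration, not the analysis pipeline of any particular study. The function names, the genotype ordering (AA, Aa, aa), the additive weights (0, 1, 2), and the counts are all assumptions made for the example. It uses only the Python standard library, relying on the closed-form χ² tail probabilities for 1 and 2 degrees of freedom. The Fisher exact test and the multivariate logistic regression step are not shown; they would normally be run in a statistics package.

```python
import math


def pearson_chi2(cases, controls):
    """Pearson chi-square test on a 2 x 3 genotype table (2 degrees of freedom).

    cases, controls: genotype counts in the same order, e.g. (AA, Aa, aa).
    Assumes every genotype class has at least one observation.
    """
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    chi2 = 0.0
    for a, b in zip(cases, controls):
        col = a + b
        if col == 0:
            raise ValueError("empty genotype class; the 2-df test does not apply")
        for observed, row_total in ((a, n_cases), (b, n_controls)):
            expected = row_total * col / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.exp(-chi2 / 2)  # chi-square upper tail for df = 2
    return chi2, p_value


def armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (1 degree of freedom).

    weights: scores per genotype class; (0, 1, 2) is an assumed additive coding.
    """
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    cols = [a + b for a, b in zip(cases, controls)]
    sum_wr = sum(w * a for w, a in zip(weights, cases))
    sum_wc = sum(w * c for w, c in zip(weights, cols))
    sum_w2c = sum(w * w * c for w, c in zip(weights, cols))
    t = n * sum_wr - n_cases * sum_wc
    denom = n_cases * n_controls * (n * sum_w2c - sum_wc ** 2)
    if denom == 0:
        raise ValueError("test undefined: no variation in genotype or disease status")
    chi2 = n * t ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail for df = 1
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, Aa, aa), for illustration only.
    cases = [20, 30, 50]
    controls = [50, 30, 20]
    print("Pearson chi-square:", pearson_chi2(cases, controls))
    print("Armitage trend:", armitage_trend(cases, controls))
```

The trend test spends a single degree of freedom on a dose-response pattern across genotype classes, while the Pearson test makes no assumption about the ordering of genotype effects.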