PHENOTYPE DEFINITION AND ANALYTIC STRATEGIES FOR LATE-ONSET AD

Deborah Blacker, MD, ScD
Director, Gerontology Research Unit, Mass General Hospital
Associate Professor of Psychiatry, Harvard Medical School
Associate Professor in Epidemiology, Harvard School of Public Health

OUTLINE: Sequence of gene discovery
• Take stock
• Gather families
• Genotyping and analysis
• Confirmation and understanding
• Potential clinical utility

SEQUENCE OF GENE DISCOVERY: Step 1: Taking stock
• Clinical features (diagnostic reliability and validity, subtypes, boundary issues)
• Descriptive epidemiology (incidence, prevalence, onset distribution, survival)
• Risk and protective factors
• Genetic epidemiology (familial aggregation, heritability)
• Biology
• Known genes and previous reports

Clinical features
• Late-life dementia
• Insidious onset, prodromal cases designated Mild Cognitive Impairment
• Definitive diagnosis by autopsy: about 90% accurate in academic centers
• Diagnosis probably optimal early in the course, when symptoms are definitive but relatively mild
• Diagnosis excellent in academic centers, highly variable in the community

Descriptive epidemiology
• Common disease: prevalence at least 5-10% for >65-year-olds, at least 25% for >85-year-olds
• Incidence and prevalence rise steeply with age
• Age of onset varies widely
• Accounts for about 70% of dementia

Putative environmental risk and protective factors
• Increased risk:
  - Longevity
  - Female gender
  - Atherosclerosis
  - Diabetes
  - Hypertension
  - Elevated homocysteine
  - Elevated cholesterol
  - Head trauma
• Decreased risk:
  - Education
  - Estrogen
  - NSAIDs
  - Anti-oxidants
  - Statins
  - Exercise

Genetic epidemiology
• 2-3 fold increased risk in 1st degree relatives (recurrence risk ratio = 2-3)
• Age of onset correlated in families
• Mostly complex inheritance, but rare families autosomal dominant
• MZ > DZ twin concordance, MZ < 1, MZ age of onset may differ
• Family and twin findings hold in early- and late-onset cases

Biology
• Extensive knowledge of AD biology facilitates selection of candidate genes
• Options
  include genes for homologs, substrates, and ligands related to:
  - Known genes (e.g., presenilins, APP)
  - Neuropathologic lesions (e.g., amyloid and tau metabolism)
  - Broader theories of pathophysiology (e.g., cholesterol metabolism, inflammation, clotting cascade)

[Figure: Aβ metabolism diagram. Labels: Production (APP*, PSEN1*, PSEN2*, BACE), Aggregation, Clearance, Endocytosis, LRP, α2M, apoE, Lysosomal Degradation, Degradation, Free Protease (IDE, Plasmin), Plasma Membrane]

Known genes and previous reports
• Early onset gene mutations (APP, PSEN1, PSEN2) generally fully penetrant, rare but important for understanding pathophysiology
• APOE a susceptibility gene with a complex but substantial impact on risk and age of onset
• Many other reported genetic associations, most inconsistently replicated, if at all

Many AD genes remain to be discovered
• Family history is a risk factor even at advanced ages, and after controlling for APOE
• APOE-4 effect attenuated at the ages when AD is most common
• Segregation analysis predicts 3-4 additional AD genes for late onset AD (age of onset)
• Segregation analysis predicts 4-7 additional AD genes (Daw et al, 2000)

Taking stock: Constraints and opportunities
• High prevalence
  - Easy to ascertain large samples
  - High likelihood of within-family heterogeneity
• Late-onset
  - Complex survival issues and competing risks
  - Parental genotypes rarely available
• Diagnostic ambiguities
  - Need to minimize error, follow for autopsies
  - Option for age of onset and other quantitative phenotypes

Constraints and opportunities (continued)
• Complex inheritance
  - Reality far more complex than available models
  - Linkage peaks extremely broad
• Known gene with substantial but variable effect
  - Stratification vs.
    controlling for APOE
• Accumulating knowledge of pathophysiology
  - Extensive source of candidate genes
  - Low prior probability for any one candidate

SEQUENCE OF GENE DISCOVERY: Step 2: Gather families
• Sample size
• Ascertainment criteria
• Diagnostic methods
• Additional phenotypic information allowing for alternate phenotype definition
• Databasing and cell-banking

Sample size
• “More is better”: more families means more power
• “Better is better”: power also increases with:
  - More individuals per family
  - Greater accuracy of diagnosis
  - More homogeneous samples
• Series of trade-offs to obtain optimal families in large enough numbers
• Multi-site and pooled samples common

Ascertainment criteria
• Implementation of trade-offs between optimal family structure and ease of ascertainment
• Issues to consider:
  - Number of affecteds per family
  - Number of unaffecteds per family
  - Number of additional family members
  - Diagnostic threshold (AD vs. MCI)
  - Age of onset of affecteds
  - Age of unaffecteds
  - Complete information available
• LOAD study: 2 affecteds with onset >60, one additional family member (affected with onset >50 or unaffected age >60)

Diagnostic methods: Affecteds
• Accurate phenotyping is critical for success, as errors can cause substantial reduction in power
• To increase diagnostic accuracy:
  - Raise diagnostic threshold (AD vs. MCI)
  - Operationalize diagnostic criteria
  - Raise level of certainty (Definite > Probable > Possible)
  - Require in-person evaluation
  - Facilitate autopsy confirmation

In person vs. remote evaluation
• Based on autopsies to date from MGH site of NIMH Genetics Initiative
• Subjects evaluated at MGH or an affiliated center vs.
  those evaluated by medical records + telephone interview

             Probable AD              Possible AD
             Correct/total   PV+      Correct/total   PV+
  In person  22/24           91.7%    --              --
  Remote     61/68           89.7%    10/15           66.7%

Assessment methods: Unaffecteds
• For late-onset AD, unaffected status inherently provisional
• To improve accuracy of current designation:
  - Use formal assessments of cognitive and functional status
  - Include follow-up: older unaffecteds have passed through a greater fraction of the age of risk

Additional phenotypic information
• Allows alternate phenotype definition
• Difficult to anticipate what will be useful later on; trade-off between completeness and efficiency
• Quantitative phenotypes particularly appealing given potential for increased power and control of multiple other genetic and non-genetic risk factors
• Subtyping offers chance to analyze a more homogeneous sample

Proposed alternate phenotypes
• Quantitative traits:
  - Age at onset
  - Memory function
  - Plasma A-beta levels
• Subtypes:
  - Onset age
  - Psychotic features
  - Parkinsonian features

Databasing and cell-banking
• Database requirements: Accuracy, accessibility, security, ease of use, back-ups, documentation, updates
• Cell banking requirements: Sample safety and security, ease and reliability of distribution, back-ups, stability

SEQUENCE OF GENE DISCOVERY: Step 3: Genotyping and analysis
• Overall strategy
• Overarching considerations
• Linkage analysis
• Association analysis

Overall strategy: Mendelian disorders
• Genome screen identifies a chromosomal region of interest
• Fine mapping with more closely spaced markers, often with enlarged or extended sample, narrows this region
• Positional cloning used to identify disease genes within the narrowed region
• Highly effective: virtually all Mendelian disorders have been mapped

Overall strategy: complex diseases
• Linkage peaks broad, often span much of a major chromosome
• Tend to remain broad even with increases in sample size or addition of more markers
• Require more complex strategies, typically linkage and then
  association analysis
• Association analysis used to test specific candidate genes
• Growing interest in association for narrowing the linked region after a linkage screen, or for initial screen itself

Positional Candidate Approach
[Flowchart, boxes in slide order:]
  Genome Screen
  Genetic Linkage Analysis
  Chromosomal Regions of Interest
  Gene and EST Database Searches
  Candidate Gene Assessment
  Family-based Association Tests
  Putative Disease-Associated Gene
  Demonstration of Functional Effects
  Confirmation in Population Sample
  Established Disease-Associated Gene

Overarching considerations: Trait
• Trait analyzed in most studies is AD, but there is growing interest in alternative phenotypes
• Quantitative trait analysis offers greater power in theory, and more flexible analytic methods, but need to address scaling issues
• Subtyping may offer greater homogeneity, but difficult to address variability within families

Overarching considerations: Other
• How to handle unaffecteds
  - Many approaches use only genotypic information from unaffecteds
  - Including phenotypic information optimal if age effects can be modeled accurately
• How to handle APOE and other modifiers of risk (age, gender, education)
  - Stratification
  - Covariate-based methods

Linkage studies
• Based on alleles traveling with the disease within families
• Aimed at identifying a narrow region in which a disease gene may reside
• Extremely successful for Mendelian disorders (and Mendelian subforms of complex disorders)
• Able to identify broad regions where complex disease genes may reside, but rarely able to narrow these regions enough to find genes
• Typically use highly polymorphic markers with modest spacing for a screen, and tighter spacing for follow-up

Basis for linkage analysis
• Chromosomal segments cross over and recombine during meiosis
• Genes infrequently separated across generations are said to be genetically linked
• Tighter linkage suggests greater proximity
• Trait that travels with a genetic marker suggests a disease gene in the region

Types of linkage analysis
• Extended families vs. sibpairs
  - Depends on sample available, analytic tools to be used
• Parametric vs. non-parametric (the extent to which a genetic model for the trait is used)
  - Parametric more powerful if the model is correct, which is unlikely for complex diseases
• Single point vs. multipoint
  - Multipoint more powerful in theory, but computationally intensive, and susceptible to genotyping and phenotyping errors

Results of an AD Genome Screen (Blacker et al, Hum Molec Genet 2003)
[Figure: genome screen linkage results, one panel each for chromosomes 1-3, 4-7, 8-11, 12-15, 16-19, and 20-22, X]

Association studies
• Based on alleles traveling with the disease across families
• Two main uses:
  - To narrow the linked region
  - To test specific candidate genes (based on position and/or biology)
• Can be done in family, case-control, or population-based samples
• Generally uses single nucleotide polymorphisms (SNPs), often within candidate genes

Basis for association studies
• Association can occur for two reasons:
  - Causal association: an allele is associated with disease because it increases risk
  - Linkage disequilibrium (LD): an allele is associated with disease because it is so close to a risk allele, or occurred so recently, that recombination hasn't separated them
• Unlike linkage, LD depends on distance and history, so not a monotonic function of distance
• Associated allele may vary across populations

Haplotypes and multiple SNPs
• Because association can be “patchy,” need to test multiple SNPs to fully evaluate a given candidate gene
• Haplotype analysis, which incorporates multiple closely spaced SNPs, can increase ability to detect association
• Limited programs available at present, but this is an area of intensive methods development
• Development of the “hapmap” will also facilitate these analyses

Types of association analysis: Family samples
• Not susceptible to bias due to admixture and population stratification
• In theory less powerful because there is less variability within families
• Limited options to
  test for association when parents not available, as is the case for AD
• Effect size estimates for candidate genes available, with or without controlling for covariates, using conditional logistic regression
• Conditional ORs cannot be pooled across studies

Types of association analysis: Unrelated individuals
• Sampling: cases and controls, or population based sample
• More powerful under certain assumptions
• Bias less of a concern if population based
• Chi square test provides a simple test of genetic association
• For candidate genes, OR provides effect size estimates, with or without controlling for covariates
• Crude ORs can be pooled across studies

Meta analysis: LRP1*
* Alzheimer Research Forum Gene Database: www.alzgene.org

SEQUENCE OF GENE DISCOVERY: Step 4: Confirmation and understanding
• Replicate in an additional sample, ideally differently ascertained
• Assess impact in clinical and general population samples
• Assess functional effects (critical but very difficult to demonstrate)

Prevalence of AD by age, sex, and APOE genotype
[Figure: prevalence (0 to 0.45) against age (65 to 104), with separate curves for females (F) and males (M) by APOE genotype: ε4/ε4, ε4/εx, εx/εx]

apoE and A-β accumulation (Puglielli et al, 2003)

SEQUENCE OF GENE DISCOVERY: Step 5: Clinical utility
• Improved understanding of pathophysiology
• Progress in genetics and epidemiology using knowledge of known genes
• Potential for rational drug development
• Potential for pharmacogenomic effects, targeted treatments and preventive strategies
• Potential role of genes in early detection and early intervention
• Potential role in predictive risk assessment and prophylaxis

Caveats
• AD is a test case for the genetics of complex diseases
• Current expectations for genetic progress may be overly optimistic, especially for rational drug design
• Pre-symptomatic knowledge, whether definitive or probabilistic, may cause more harm than good
• Limited knowledge of genetics in the general population and among physicians compromises ability to use and
  understand genetically based strategies for treatment and prevention

ACKNOWLEDGEMENTS

Gerontology Research Unit, MGH
  M.S. Albert, PhD
  T.J. Moscarillo, BA
  Lynelle Cortellini, BA

Alzheimer’s Disease Research Center, MGH
  J. Growdon, MD
  L. Yap, PhD
  C. Crosby, BA

Genetics and Aging Research Unit, MGH
  R.E. Tanzi, PhD
  M.J. Kim, PhD
  M. Parkinson, BS
  L. Bertram, MD
  R. Menon, BS
  A.J. Sampson, BA
  M. Hiltunen, PhD
  K. Mullin, BS
  M. Hsiao, BS

Depts. of Epidemiology and Biostatistics, HSPH
  N.M. Laird, PhD
  C. Lange, PhD
  M.B. McQueen, MS
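
Worked check: PV+ in the "In person vs. remote evaluation" table

The PV+ column in that table is the share of clinical diagnoses confirmed at autopsy (correct/total). The short Python sketch below recomputes it from the counts reported on the slide. The names `AUTOPSY_COUNTS`, `positive_predictive_value`, and `ppv_table` are illustrative choices, not part of the original analysis.

```python
# Counts copied from the "In person vs. remote evaluation" table:
# (evaluation mode, clinical diagnosis) -> (confirmed at autopsy, total autopsied)
# The dictionary layout and all names here are illustrative assumptions.
AUTOPSY_COUNTS = {
    ("In person", "Probable AD"): (22, 24),
    ("Remote", "Probable AD"): (61, 68),
    ("Remote", "Possible AD"): (10, 15),
}


def positive_predictive_value(correct: int, total: int) -> float:
    """Return PV+ as a percentage: confirmed diagnoses / all diagnoses."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return 100.0 * correct / total


def ppv_table(counts):
    """Map each (mode, diagnosis) key to its PV+ rounded to one decimal."""
    return {
        key: round(positive_predictive_value(correct, total), 1)
        for key, (correct, total) in counts.items()
    }


if __name__ == "__main__":
    for (mode, diagnosis), ppv in ppv_table(AUTOPSY_COUNTS).items():
        print(f"{mode:<10} {diagnosis:<12} PV+ = {ppv:.1f}%")
```

The rounded values agree with the percentages on the slide (91.7%, 89.7%, 66.7%). No in-person Possible AD cases are listed in the table, so that cell is left out rather than filled in.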