16th Annual Meeting of the Organization for Human Brain Mapping
Catalonia Palace of Congresses, Barcelona, Spain, June 6-10, 2010

Structure and Analysis of Genetic Variation

• What we will look into:
– Genes, promoters, microRNAs
– SNPs, CNVs, microsatellites
– Methylation
– Cis- and trans-acting effects & epistasis
– From transcriptome to genes
– From genes to pathways

First, let's see how our idea of what a genetic disorder looks like has changed…

The “old” paradigm of genetics… Cystic fibrosis: a single gene causing the disease.
Key concepts in genetic epidemiology of complex traits

…and how we now think genetics works: multiple gene variants interacting with each other, and with multiple environmental factors.

Variability in humans
“Variability is the law of life, and as no two faces are the same, so no two bodies are alike, and no two individuals react alike, and behave alike under the abnormal conditions which we know as disease.” Sir William Osler (1849-1919)

What is a gene? A real example, the DRD4
[Figure: DRD4 gene from 5’ to 3’ (exons I-IV), showing regulatory, coding and non-coding regions and known variants: 120 bp duplication (PstI), C-11T (SmaI), 12 bp repeat (+61 to +85), (G)n repeat, 48 bp VNTR, Gly11Arg (G+31C), G+492T (high freq.), Val194Gly (T+581G) (low freq.), C+870A (high freq.); single nucleotide polymorphisms are marked]

Classes of human genetic variants (Frazer et al., 2009, Nature Rev Genet 10:241-51)

Human genome and human variability
Our DNA is made up of ~3 billion nucleotides (per copy of the genome), which are like “letters”, e.g., ACGGCATTGC…. Read three letters at a time, they form “words” (codons), and each word codes for an amino acid; a sequence of these words forms a “sentence”, a gene, which codes for a particular protein made by a cell.
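The letters-words-sentence analogy maps directly onto code. A minimal Python sketch, assuming the standard genetic code and reading from the first base of the string (introns, splicing and other reading frames are ignored); `translate` is a hypothetical helper, not a library function.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna: str) -> str:
    """Translate DNA codon by codon from the first base; '*' marks a stop codon.

    Trailing bases that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)
    )


# ACG -> Thr (T), GCA -> Ala (A), TTG -> Leu (L); the final C is left over.
print(translate("ACGGCATTGC"))  # TAL
```

A single-letter change inside a codon (a SNP) can change the amino acid it encodes, which is how some of the variants discussed next alter proteins.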
Each of us differs from another person at ~4 million of these “letters”.

Single nucleotide polymorphism, or “SNP” [Figure: Person A vs. Person B]

Key words in genetics
[Figure: single nucleotide polymorphisms (SNPs) at different loci (A-H) on the two copies of chromosome 1, with alleles C A A C A T G C on one copy and T C A T C T T on the other, illustrating the terms allele, genotype and haplotype]

Genomic sequences in the general population:
… A C T T T G A …
… A T T T T G A …
The position where the two sequences differ (C/T) is a SNP (single nucleotide polymorphism).
…today we know of almost 15-18 million common SNPs (and many more less common ones).

A new technology: DNA microarrays
They allow us to detect these SNPs…
[Figure: genotype matrix of single nucleotide polymorphisms, ~1,000s of individuals (columns) × ~1M+ SNPs (rows)]
AA AG AA GG AG AA …
AC AC AC AA AC CC …
CT CC CC CT CC CC …
GG GG GG AG GG GG …
TT AT AA AT AA AT …
CT TT CT TT CT CC …
…

Population-based designs: what does this mean in practice?
Population-based: cases and unrelated population controls from the same study base.
Affected individuals (cases) vs. unaffected individuals (controls), allele counts (A, a) at successive SNPs:

            A     a
Controls   656   879
Cases      525   471     p-value = 0.3

            A     a
Controls   606   929
Cases      555   441     p-value = 0.1

            A     a
Controls   856   679
Cases      325   671     p-value = 0.01  SIGNIFICANT!

…and so on!

Whole-genome association scan
[Figure: genome-wide association scan with signals at a known gene and at a new gene]

How many genes?
• In complex traits, several genes act together, and we must understand “how” if we want to understand the biology of disease: modeling gene × gene interactions, i.e., the epistasis effect.
[Figure: interaction between Gene A and Gene B] (PNAS 105:12387-92, 2008)

GWAS… here we are…
Outcome of a genome-wide association study
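Each case-control comparison in the population-based design boils down to a test on a 2 × 2 table of allele counts (cases vs. controls, allele A vs. allele a). The Python sketch below shows one common choice, Pearson's 1-df allelic chi-square test without continuity correction; it assumes alleles are independent (Hardy-Weinberg equilibrium), it is not necessarily the test used on the slide (whose counts and p-values are schematic), and the function and argument names are hypothetical. It also computes a Bonferroni-style genome-wide threshold: with ~1 million SNPs tested, 0.05 / 10^6 = 5 × 10^-8, whereas a nominal p < 0.01 would be expected by chance at roughly 10,000 SNPs even with no true association.

```python
import math

# Bonferroni-style threshold, assuming ~1 million independent tests.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000  # 5 x 10^-8


def allelic_chi2_test(case_A: int, case_a: int,
                      ctrl_A: int, ctrl_a: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Rows are cases and controls; columns are the two alleles A and a.
    Returns (chi-square statistic, p-value).
    """
    n = case_A + case_a + ctrl_A + ctrl_a
    row_case, row_ctrl = case_A + case_a, ctrl_A + ctrl_a
    col_A, col_a = case_A + ctrl_A, case_a + ctrl_a
    if 0 in (row_case, row_ctrl, col_A, col_a):
        raise ValueError("table has an empty row or column")
    det = case_A * ctrl_a - case_a * ctrl_A
    chi2 = n * det * det / (row_case * row_ctrl * col_A * col_a)
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for one SNP.
chi2, p = allelic_chi2_test(case_A=60, case_a=40, ctrl_A=50, ctrl_a=50)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

In a real scan this function would run once per SNP on the array; other tests (genotypic, trend, or logistic regression with covariates) change the statistic but not the multiple-testing logic.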
Copy number variants (CNVs) are stretches of DNA larger than 1 kb that display copy number differences in comparison with a reference genome.

Types of genomic structural changes affecting segments of DNA, leading to deletions, duplications, inversions and CNV changes (biallelic, multiallelic and complex)

The current map of human structural variation is far from complete…
http://humanparalogy.gs.washington.edu/structuralvariation/
http://projects.tcag.ca/variation/
(Am J Hum Genet 2009;84:148-61)
[Figure: ~16% of the genome; low-frequency vs. high-frequency variants]

…converging evidence on loci: chr. 1q21.1, chr. 15q11.2, chr. 15q13.3
…converging evidence on genes… genomic burden of rare variants (Science 320:539-543, 2008; Nat Rev Genet 40:881-885, 2008)

Genetics of complex disorders: what has been achieved so far
“Traditional” genetic methods (e.g., association), applied from a genomic perspective, point to two extreme alternatives:
• GWAS
• CNVs/CNPs
(common variant/common disease, CVCD, vs. rare variant/common disease, RVCD)

Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs)… Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes. (Nature Genetics 40:1168-74, 2008)

Looking at individual structural variants with sequencing technologies (PLoS Biol. 2007 4;5:e254)
4.1 million DNA variants (~12.3 Mb), of which 1,288,319 (~30%) were novel!
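The Nature Genetics excerpt states that most common diallelic CNPs are in strong linkage disequilibrium (LD) with SNPs, which is why SNPs can serve as proxies for them. LD between two biallelic markers is often summarized as r², with D = p11 - p1·q1 and r² = D² / (p1(1 - p1)·q1(1 - q1)), where p1 and q1 are the allele-1 frequencies at the two markers and p11 is the frequency of the haplotype carrying allele 1 at both. A minimal Python sketch, assuming phased haplotypes and a hypothetical encoding of the CNP as a biallelic marker (0 = reference copy number, 1 = deletion); the data are invented for illustration.

```python
def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """r^2 between two biallelic markers, computed from phased haplotypes.

    Each haplotype is a pair (allele at marker 1, allele at marker 2), coded 0/1.
    """
    n = len(haplotypes)
    p1 = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at marker 1
    q1 = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at marker 2
    p11 = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    if denom == 0:
        raise ValueError("a marker is monomorphic; r^2 is undefined")
    d = p11 - p1 * q1  # disequilibrium coefficient D
    return d * d / denom


# Hypothetical haplotypes: marker 1 = SNP, marker 2 = CNP (1 = deletion allele).
haps = [(1, 1)] * 18 + [(1, 0)] * 2 + [(0, 0)] * 80
print(f"r^2 = {ld_r2(haps):.2f}")
```

A high r² means that genotyping the SNP captures most of the information about the CNP, which is what makes SNP arrays useful for common CNPs; rare CNVs on specific haplotypes give lower r² with any single SNP.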
We have uncovered only the tip of the iceberg…
…each genomic region contributes a modest effect, and collectively all associated regions for a given trait explain only a small fraction (5-10%) of the observed phenotypic variation attributed to genetic elements…

Where is the missing heritability?
• Incomplete marker coverage
• Allelic heterogeneity at a given locus
• Contribution of rare variants, including structural variants (CNVs and smaller)
• Epistatic interactions
• G×E interactions
• Epigenetic modifications
• Overestimation of heritability

Once a gene has been mapped: understanding its function…
Where is the culprit? Functional variants and affected genes (Ioannidis et al., 2009, Nature Rev Genet 10:318-29)

…Our results suggest that there are at least 35% more functional promoters in the human genome than previously annotated.

RNA interference
RNA interference (RNAi) is an evolutionarily conserved mechanism that uses short antisense RNAs, generated by ‘dicing’ dsRNA precursors, to target the corresponding mRNAs for cleavage. However, recent developments have revealed that RNAi-related processes are also extensively involved in regulation at the genome level. dsRNA and proteins of the RNAi machinery can direct epigenetic alterations to homologous DNA sequences to induce transcriptional gene silencing or, in extreme cases, DNA elimination. Furthermore, in some organisms RNAi silences unpaired DNA regions during meiosis. These mechanisms facilitate the directed silencing of specific genomic regions.
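The targeting step in RNAi relies on base pairing: a short antisense guide RNA recognizes the stretch of mRNA that is complementary to it. A minimal Python sketch of that matching logic, assuming perfect Watson-Crick complementarity; real siRNA and miRNA targeting tolerates mismatches and G·U pairs (miRNAs often pair mainly through a short seed region), and the sequences and function names here are hypothetical.

```python
# Watson-Crick pairs for RNA (A-U, G-C); G-U wobble pairs are deliberately ignored.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, both written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def find_target_sites(mrna: str, guide: str) -> list[int]:
    """Return 0-based positions where `mrna` is perfectly complementary to `guide`."""
    site = reverse_complement(guide)  # mRNA sequence that base-pairs with the guide
    mrna = mrna.upper()
    return [
        i for i in range(len(mrna) - len(site) + 1)
        if mrna[i:i + len(site)] == site
    ]


# Hypothetical 21-nt mRNA fragment and a toy 8-nt antisense guide.
mrna = "GGAUCCGAUUACGGAUCCAAA"
guide = "CCGUAAUC"
print(find_target_sites(mrna, guide))  # [6]
```

Because the guide is antisense, the search looks for its reverse complement in the mRNA rather than for the guide itself.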
[Figure: gene ON vs. OFF; sequence ATTCGGTCTTACCGATATTCGG] (from S. Beck, 2008)

Integrated genomic approach
[Figure: phenotype; MVPs, BisSeq, DMR map; genome, HapMap tag SNPs, WGA map, DeepSeq, SNPs; converging on a candidate ‘(dys)functional gene’] (from S. Beck, 2008)

From genes to pathways: toward a systemic understanding of disease
Human GWAS legacy data: SNP → disease phenotype
In genome-wide association studies (GWAS), our goal is to find out the relationship between a single nucleotide polymorphism (SNP, as a proxy for a gene) and the disease phenotype of interest.

Human GWAS legacy data: SNP → gene → protein → neuronal function → neural circuitry → disease phenotype

Human GWAS legacy data, animal models and bioinformatics: SNP → gene → protein → neuronal function → neural circuitry → disease phenotype
Systems biology addresses the links between SNPs and human phenotypes originally identified by GWAS…

Human GWAS legacy data, animal models and bioinformatics: SNP → genes, genome, non-coding RNA → protein → neuronal function → neural circuitry → disease phenotype
…including information related to the WHOLE genomic complexity.

From single genes to networks
[Figures: genes associated with asthma; leukemia disease network]

Inferred networks based on mouse PFC expression data
Gene interaction network inferred from prefrontal cortex (PFC) gene expression in 42 different inbred mouse strains. Schizophrenia candidate genes from GWAS are shown in yellow. Some connections are unexpected: DACT3 (circled) encodes a regulator of Wnt signaling, which has been linked to schizophrenia (refs. 41-43).

AHI1 exon expression in brain, LCLs and immortalized cell lines (187 subjects): a TE-derived TSS effect?

Sequencing the genomes of patients with schizophrenia

BioDataInsight: the mining space
Paris HVP - Andrea Calabria - andrea.calabria@unimi.it
• Biological knowledge and annotations, i.e., gene annotations, pathways, proteins, SNPs, etc.
• Phenotypes, i.e., clinical data, patient records, etc.
• Experimental data, i.e., genotyping, sequencing, etc.
• Results of analyses, i.e., TE analyses

[Figure: BioDataInsight architecture: end users; graphical engine (graphical data representation); application engine (query builder); database engine; data integration & data warehouse over annotation and knowledge data, experimental data and analyses’ results]
[Figure: annotation data, extended analysis data, experimental data]
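The integration layer in this architecture (annotation and knowledge data, experimental data and analyses’ results joined in a data warehouse and queried through a query builder) can be illustrated with an in-memory SQLite database from Python's standard library. This is a minimal sketch under a hypothetical schema, not the actual BioDataInsight design: table names, columns, gene names and values are all invented for illustration.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a toy warehouse: one annotation table and one analysis-results table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE snp_annotation (      -- annotation and knowledge data
            snp_id  TEXT PRIMARY KEY,
            gene    TEXT,
            pathway TEXT
        );
        CREATE TABLE association_result (  -- analyses' results
            snp_id    TEXT REFERENCES snp_annotation(snp_id),
            phenotype TEXT,
            p_value   REAL
        );
    """)
    conn.executemany("INSERT INTO snp_annotation VALUES (?, ?, ?)", [
        ("snp_1", "GENE_A", "pathway_X"),
        ("snp_2", "GENE_B", "pathway_X"),
        ("snp_3", "GENE_C", "pathway_Y"),
    ])
    conn.executemany("INSERT INTO association_result VALUES (?, ?, ?)", [
        ("snp_1", "disease_1", 3e-9),
        ("snp_2", "disease_1", 0.02),
        ("snp_3", "disease_1", 4e-8),
    ])
    return conn


def genes_below_threshold(conn: sqlite3.Connection, phenotype: str,
                          threshold: float) -> list[tuple[str, str, float]]:
    """Join results with annotations: genes whose SNP passes a p-value threshold."""
    rows = conn.execute(
        """
        SELECT a.gene, a.pathway, r.p_value
        FROM association_result AS r
        JOIN snp_annotation AS a ON a.snp_id = r.snp_id
        WHERE r.phenotype = ? AND r.p_value < ?
        ORDER BY r.p_value
        """,
        (phenotype, threshold),
    )
    return rows.fetchall()


conn = build_demo_warehouse()
print(genes_below_threshold(conn, "disease_1", 5e-8))
```

Keeping annotations and results in separate tables and joining them at query time mirrors the separation between knowledge data and analyses’ results; a graphical front end would then only need to build parameterized queries like this one.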