DUF1220 Domains & the Search for the Genes that Made Us Human
James M. Sikela, Ph.D.
Human Medical Genetics, Neuroscience, & Comparative Genomics Programs, Department of Biochemistry & Molecular Genetics, University of Colorado School of Medicine
Genomics Course, February 28, 2012

Key Points
• First gene-based and first genome-wide study of lineage-specific gene duplication and loss in human and primate evolution
• Dramatic human-specific increase in copy number of DUF1220 protein domains
• DUF1220 copy number linked to evolution of brain size
• Selection of evolutionarily adaptive genome sequences may be driving disease, e.g. 1q21.1

Primate Evolution
[Tree figure with a time axis in millions of years ago (MYA). Tip labels: Human, Gorilla, Orangutan, Gibbons, Old World Monkeys (e.g. baboon, rhesus, etc.), New World Monkeys (e.g. squirrel monkey, spider monkey).]
Approximate divergence times (MYA):
• B/C = ~2
• C/H = ~5
• HC/G = ~8
• HCG/O = ~13
• HCG/O/Gib = ~20
• Hom/OWM = ~25
• HomOWM/NW = ~40

[Image labels: Chimpanzee, Gorilla, Bonobo, Orangutan]
More Primates! ---- something has changed!

Human Characteristics
• Body shape and thorax
• Cranial properties (brain case and face)
• Small canine teeth
• Skull balanced upright on vertebral column
• Reduced hair cover
• Enhanced sweating
• Dimensions of the pelvis
• Elongated thumb and shortened fingers
• Relative limb length
• Neocortex expansion
• Enhanced language & cognition
• Advanced tool making
modified from S.
Carroll, Nature, 2005

Reports of “human-specific” genes
• FOXP2 – Mutated in family with language disability
• ASPM/MCPH – Mutated in individuals with microcephaly
• HAR1F – Gene sequence highly changed in humans
• DUF1220 protein domains – Highly increased in copy number in humans; expressed in important brain regions

Molecular Mechanisms Underlying Genome Evolution
• Single nucleotide substitutions
  – change gene expression & structure
• Genome rearrangements
• Gene duplication
  – copy number change: gene dosage
  – redundancy as a facilitator of innovation

Gene Duplication & Evolutionary Change
• “There is now ample evidence that gene duplication is the most important mechanism for generating new genes and new biochemical processes that have facilitated the evolution of complex organisms from primitive ones.” - W. H. Li in Molecular Evolution, 1997
• “Exceptional duplicated regions underlie exceptional biology” - Evan Eichler, Genome Research 11:653-656, 2001

Interhominoid cDNA Array-Based Comparative Genomic Hybridization (aCGH)
Fig 1. Measuring genomic DNA copy number alteration using cDNA microarrays (array CGH). Fluorescence ratios are depicted in a pseudocolor scale, such that red indicates increased, and green decreased, gene copy number in the test (right) compared to reference sample (left).

Experimental Design
• Carry out pairwise cDNA aCGH comparisons between human and other hominoid species
• Use a >39,000 cDNA microarray representing >29,000 human genes
• Hybridize human genomic DNA (reference sequence: cy3/green) and other hominoid genomic DNAs (test sequence: cy5/red) simultaneously to the microarray
• Visualize aCGH signals “gene-by-gene” along each chromosome across five species: human (n=5), bonobo (n=3), chimpanzee (n=4), gorilla (n=3) and orangutan (n=3)

Whole Genome Caryoscope Image of Interhominoid aCGH Data
Human & Great Ape Genes Showing Lineage-Specific Copy Number Gain/Loss
Fortna, et al, PLoS Biol.
2004

Summary of Human/Primate ArrayCGH Results
• First genome-wide and first gene-based aCGH comparison of human and nonhuman primate gene copy number variation (Fortna, et al 2004)
• 1,004 (4,159) genes identified that showed lineage-specific changes in copy number
• Time machine of evolutionary copy number change
• Gene candidates to underlie lineage-specific traits
• Genes identified represent most of major lineage-specific gene duplications and losses over the last 60 million years of human and primate evolution (Dumas, et al 2007)

Human & Great Ape Genes Showing Lineage-Specific Copy Number Gain/Loss
Fortna, et al, PLoS Biol. 2004
[Figure: whole-genome Caryoscope view of the interhominoid aCGH data, one panel per chromosome (1-22, X, Y), with positions in Mb, cytoband labels, and selected gene names marked. Species tracks: Human (Homo sapiens), Bonobo (Pan paniscus), Chimpanzee (Pan troglodytes), Gorilla (Gorilla gorilla), Orangutan (Pongo pygmaeus). Color scale, test/reference ratio: ≤0.5 through 1 to ≥2.]

“This (Fortna, et al, 2004) is the first time that copy number changes among apes have been assayed for the vast majority of human genes, and we can expect that the biological consequences of the 140 human-specific copy number changes identified in this study will be heavily investigated over the coming years.”
---M. Hurles, PLoS Biol. 2004

DUF1220 Repeat Unit
Popesco, et al, Science 2006

InterPro-predicted DUF1220-containing proteins (NBPF family*)
*Vandepoele, et al, Mol. Biol.
& Evol, 2005

Copy Number of DUF1220 (Q8IX62/17-33) Sequences in Primate Species
[Bar chart. Y-axis: Q-PCR predicted copy number (0 to 70). Species shown: Baboon, Macaque, Gibbon, Orangutan, Gorilla, Chimp, Bonobo, Human.]

Summary of aCGH, Q-PCR and BLAT results:
• DUF1220 domains are highly amplified in human, reduced in great apes, further reduced in Old & New World monkeys, single or low copy in non-primate mammals, and absent in non-mammals

DUF1220 copy number in Animal Genomes

Genome               PDE4DIP DUF1220   Total DUF1220   NBPF genes
Euarchontoglires
  Human                     2               268             21
  Chimp                     3               125             15
  Gorilla                   3                99             15
  Orangutan                 4                92             11
  Macaque                   1                35             10
  Marmoset                  1                30             10
  Rabbit                    1                 8              3
  Mouse                     1                 1              0
  Rat                       1                 1              0
  Guinea Pig                1                 1              0
Laurasiatheria
  Cow                       1                 6              2
  Pig                       1                 3              1
  Horse                     1                 8              3
  Dog                       1                 3              1
  Panda                     1                 2              1
Afrotheria
  Elephant                  1                 1              1
Metatheria
  Opossum                   1                 1              0
Prototheria
  Platypus                  1                 1              0
Other Vertebrates
  Chicken                   0                 0              0
  Lizard                    0                 0              0
  Frog                      0                 0              0
  Zebrafish                 0                 0              0

A total of 40 genomes were searched, but only the 22 with 4X coverage or higher are displayed.

DUF1220 Copy Number Statistics in hg19 build

DUF1220 Copies Total in Human Genome                          272
Total amplified HLS DUF1220 Triplets                          129
Total DUF1220 in Last Common Ancestor of Homo/Pan             102
Total of Newly Added Copies in Human Lineage                  167
Total Copies Added via Domain Amplification                   146
Total Copies Added via Gene Duplication                        21
Average Number Added to Human Lineage every million years      28

This table shows the unprecedented DUF1220 copy number increase in the human lineage. The primary mechanism for this expansion was domain amplification via hyper-amplification of the HLS DUF1220 triplet.
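The figures in the hg19 statistics table can be cross-checked with a little arithmetic. The sketch below is only an illustration: the variable names are made up for this example, and every number is copied from the table rather than recomputed from genome data. It checks that the two mechanisms add up to the reported total, works out the share contributed by domain amplification, and back-calculates the time span implied by the reported rate of 28 copies per million years.

```python
# Figures copied from the "DUF1220 Copy Number Statistics in hg19 build" table.
# All variable names are hypothetical and chosen only for this illustration.
total_human = 272
last_common_ancestor = 102
newly_added = 167
via_domain_amplification = 146
via_gene_duplication = 21
reported_rate_per_myr = 28

# The two mechanisms listed in the table account for every newly added copy.
assert via_domain_amplification + via_gene_duplication == newly_added

# Fraction of the new copies attributed to domain amplification.
share_domain_amplification = via_domain_amplification / newly_added

# Time span implied by the reported average rate (million years).
implied_span_myr = newly_added / reported_rate_per_myr

# Simple difference between the human total and the ancestral count.
# This gives 170, three more than the 167 reported; the table does not
# say how the difference arises, so it is only shown here, not resolved.
simple_difference = total_human - last_common_ancestor

print(f"Share from domain amplification: {share_domain_amplification:.1%}")
print(f"Implied time span: {implied_span_myr:.1f} million years")
print(f"Human total minus ancestral count: {simple_difference}")
```

Under these numbers, domain amplification accounts for the large majority of the new copies, and the reported rate corresponds to a Homo/Pan split of roughly six million years.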
Sequences encoding DUF1220 domains
• Show a major copy number burst in primates
• Are increasingly amplified generally as a function of a species' evolutionary proximity to humans, where the greatest number of copies (270) is found
• Show signs of positive selection
• Are highly expressed in brain regions associated with higher cognitive function
• In brain show neuron-specific expression preferentially in cell bodies and dendrites
Popesco, et al, Science 2006

1q21.1 Deletions* Linked to Microcephaly
1q21.1 Duplications* Linked to Macrocephaly
• Recurrent Reciprocal 1q21.1 Deletions and Duplications Associated with Microcephaly or Macrocephaly and Developmental and Behavioral Abnormalities. Brunetti-Pierri, et al, Nature Genetics 2008
• Recurrent Rearrangements of Chromosome 1q21.1 and Variable Pediatric Phenotypes. Mefford, et al, N. Engl. J. Med. 2008
*Implies human brain size directly related to the dosage of one or more genes in these 1q21.1 CNVs
We note that these CNVs encompass or are immediately flanked by DUF1220 sequences (Dumas & Sikela, Cold Spring Harbor Symposium Quant. Biol., 2009)

DUF1220/NBPF Sequences & Recurrent Disease-associated 1q21.1 CNVs

Association (p<0.0001) of human head circumference (FOC Z-score) & DUF1220 copy number
[Scatter plot: Head Circumference (FOC Z-Score) vs. DUF1220 Copy Number. Y-axis: FOC Z-score (-6 to 6). X-axis: Q-PCR-predicted DUF1220 copy number (20 to 80). Series: Class I Deletion, Class II Deletion, Duplication.]

Copy number of genes in the 1q21.1-q21.2 region versus brain size
• 46 1q21.1 genes compared along with brain size across 5 primate species
• DUF1220 shows the most dramatic human-specific copy number increase.
• The evolutionary increase in DUF1220 copy number parallels the increase in brain size.
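The parallel described in the last bullet can be put on a rough numerical footing. The sketch below is only an illustration, not the analysis behind the slides: it takes the DUF1220 copy numbers and brain sizes (g) for the five species listed in the deck's 1q21.1 comparison table (human, chimp, orangutan, macaque, marmoset) and computes a plain Pearson correlation. The function and variable names are made up for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Values as listed in the deck's 1q21.1 gene copy number vs. brain size table.
species = ["Human", "Chimp", "Orangutan", "Macaque", "Marmoset"]
duf1220_copies = [272, 125, 92, 35, 30]
brain_size_g = [1350, 380, 390, 88, 7]

r = pearson_r(duf1220_copies, brain_size_g)
print(f"Pearson r across {len(species)} species: {r:.2f}")
```

With only five data points, and species that share evolutionary history rather than being independent samples, a coefficient like this is descriptive at best. It says nothing about mechanism, in line with the deck's own reminder that correlation is not causation.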
Brain size and gene copy number by species

Genes, in column order: DUF1220, PPIAL4, LOC728855, FAM72D, SRGAP, PDE4DIP, SEC22B, NOTCH2NL, HFE2, TXNIP, POLR3, ANKRD34, ANKRD35, LIX1L, RBM8A, GNRHR2, PEX11B, ITGA10, NUDT17, RNF115, CD160, PDZK1, GPR89, PRKAB2, PDIA3P, FMO5, CHD1L, BCL9, ACP6, GJA5, GJA8, LOC645166, FCGR1, SV2A, BOLA1, MTMR11, OTUD7B, SF3B4, VPS45, PLEKHO1, ANP32E, PRPF3, C1orf54, MRPS21, CA14, C1orf51, APH1A

Species      Brain Size (g)   Copy # (in the gene order above)
Human        1350             272 5 5 2 1 3 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Chimp         380             125 1 2 0 0 3 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Orangutan     390             92 1 2 0 0 4 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Macaque        88             35 0 2 0 0 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Marmoset        7             30 0 1 0 0 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1

DUF1220 Copy Number Versus Brain Size
* Neandertal DUF1220 copy number is an estimate based on sequence read depth from the Neandertal genome (Green et al 2010).
- but correlation is not causation

Factors that must be reconciled with model linking 1q21.1 instability, evolutionary adaptation & recurrent disease
• Evolutionarily rapid DUF1220 copy number increase
  – Estimate, on average, 28 more DUF1220 domains added to human genome every 1 million years since Homo/Pan split
• Underlying mechanism must account for continued, recurrent DUF1220 increases
• Underlying mechanism must account for excess of 1q21.1 disease-associated CNVs containing dosage-sensitive genes

Proposed Mechanism Linking DUF1220, Brain Evolution and Disease
[Diagram with the following labels: Increase in DUF1220 Copy Number; Evolutionary Advantage (Increase in Brain Size?); Increased 1q21.1 Instability; 1q21.1 duplications; Macrocephaly; Autism*; 1q21.1 deletions; Microcephaly; Schizophrenia*]
*Diseases proposed as “Diametric Opposites” (including brain size), Crespi, Stead & Elliot, PNAS, 2009

DUF1220 Model*
DUF1220 model proposes that:
1) DUF1220 copy number is directly involved in influencing human brain size, and
2) the evolutionary advantage of rapidly increasing DUF1220 copy number in the human lineage has resulted in favoring retention of the high genomic instability of the 1q21.1 region which, in turn, has precipitated a spectrum of recurrent human brain and developmental disorders
*Dumas & Sikela, Cold Spring Harbor Symposium Quant. Biol., 2009

Concluding Thoughts
• DUF1220 domains show the largest HLS protein coding copy number increase in the genome
  – But no one gene made us human
  – DUF1220 genotyping challenges
• We know more about our genome than ever
  – But there are vast areas of our genome about which we know virtually nothing
  – No mammalian genome has been completely sequenced

Acknowledgements

Sikela Lab
• Laura Dumas
• Majesta O’Bleness
• Maggie Popesco
• Erik MacLaren
• Andy Fortna
• Jan Hopkins
• Jonathon Keeney
• Jack Davis
• Jay Jackson
• Megan Sikela
• Michael Cox
• Kriste Marshall
• Matt Brenton
• Sonya Burgers
• Raquel Hink
• Erin Dorning
• Park McNair

Collaborators
• Stanford
  – Jon Pollack
  – Young Kim
• Univ. of Kansas
  – Gerald Wyckoff
• Univ. of Utah
  – Lynn Jorde
• Baylor College
  – Pawel Stankiewicz
  – Sau Wai Cheng
• UCSOM
  – Epidemiology: Tasha Fingerlin
  – Preventive Medicine & Biometrics: Anis Karimpour-Fard
  – Neuroscience Program: Rock Levinson, John Caldwell

A Walk Through Our Genome
-- All regions of the genome are not created equal