Human Evolutionary Genomics: Lessons from DUF1220 Protein Domains, Cognitive Disease and Human Brain Evolution

James M. Sikela, Ph.D.
Department of Biochemistry & Molecular Genetics
Human Medical Genetics and Neuroscience Programs
University of Colorado School of Medicine

Advanced Genome Analysis Course
University of Colorado School of Medicine
March 5, 2015

Primate Evolution
[Figure: primate phylogeny with branch points at ~2, 5, 8, 13, 20 and 25 MYA; tips: Human, Chimpanzee, Bonobo, Gorilla, Orangutan, Gibbons, Old World Monkeys (e.g. baboon, rhesus, etc.), New World Monkeys (e.g. squirrel monkey, spider monkey)]
Approximate divergence times (MYA):
• B/C = ~2
• C/H = ~5
• HC/G = ~8
• HCG/O = ~13
• HCG/O/Gib = ~20
• Hom/OWM = ~25
• HomOWM/NW = ~40
(B = bonobo, C = chimpanzee, H = human, G = gorilla, O = orangutan, Gib = gibbons, Hom = hominoids, OWM = Old World monkeys, NW = New World monkeys)

More Primates! Some things have changed!
[Images: Chimpanzee, Gorilla, Bonobo, Orangutan]

Human Characteristics
• Body shape and thorax
• Cranial properties (brain case and face)
• Small canine teeth
• Skull balanced upright on vertebral column
• Reduced hair cover
• Enhanced sweating
• Dimensions of the pelvis
• Elongated thumb and shortened fingers
• Relative limb length
• Neocortex expansion
• Enhanced language & cognition
• Advanced tool making
modified from S. Carroll, Nature, 2005

Reports of “human-specific” genes
• FOXP2
– Mutated in family with language disability
• ASPM/MCPH
– Mutated in individuals with microcephaly
• HAR1F
– Gene sequence highly changed in humans
• SRGAP2 (neuronal migration?)
– Partial human-specific gene duplication
• DUF1220 protein domains
– Highly increased in copy number in humans; expressed in important brain regions

HAR1F Gene
[Figure]
Marques-Bonet, et al, Ann Rev Genomics 2009

Molecular mechanisms driving genome evolution
• Single nucleotide substitutions
– change gene expression
– change gene structure
• Genome rearrangement
• Gene/segmental duplication
– copy number change
– value of redundancy

Gene Duplication & Evolutionary Change
• “There is now ample evidence that gene duplication is the most important mechanism for generating new genes and new biochemical processes that have facilitated the evolution of complex organisms from primitive ones.” - W. H. Li in Molecular Evolution, 1997
• “Exceptional duplicated regions underlie exceptional biology” - Evan Eichler, Genome Research, 2001

Interhominoid cDNA Array-Based Comparative Genomic Hybridization (arrayCGH)
Fig 1. Measuring genomic DNA copy number alteration using cDNA microarrays (array CGH). Fluorescence ratios are depicted in a pseudocolor scale, such that red indicates increased, and green decreased, gene copy number in the test (right) compared to reference sample (left).

Human & Great Ape Genes Showing Lineage-Specific Copy Number Gain/Loss
Fortna, et al, PLoS Biol. 2004

BAC-FISH with clone containing SLC35F5 gene
[Figure: panels for Human, Bonobo, Gorilla, Orang, Chimp]

PLA2G4B/SPTBN5 gene copy number increases in African great apes
[Figure: lanes H, B, C, G, O; clones IMAGE:814107, IMAGE:261219, IMAGE:665496]

[Figure: genome-wide array CGH map across human chromosomes 1-22, X and Y (positions in Mb, with cytogenetic bands), marking genes with lineage-specific copy number changes in Human (Homo sapiens), Bonobo (Pan paniscus), Chimpanzee (Pan troglodytes), Gorilla (Gorilla gorilla) and Orangutan (Pongo pygmaeus). Test/Reference ratio color scale: ≤ 0.5, 1, ≥ 2.]

Human lineage-specific amplification of AQP7
[Figure: array CGH across human chromosome 9 for Human, Bonobo, Chimpanzee, Gorilla, Orangutan, Gibbon, Macaque, Baboon, Marmoset and Lemur, with AQP7 marked. Test/Reference ratio color scale: < 0.4, 1, > 2.5.]
[Figure: aCGH log2 fluorescent ratio compared with quantitative real-time PCR copy number for AQP7 across the same species; aCGH vs. Q-PCR r2 = 0.9532.]

[Figure labels: SMA (Chr5q13), Williams-Beuren (Chr7q11.2), Prader-Willi (Chr15q11.1), DiGeorge (Chr22q11)]
BLAT-Predicted Intronless vs. Intron-Containing HLS Gene Copies in Human, Chimp, and Macaque Genomes
[Figure: bar chart; x-axis: IMAGE Clone; y-axis: Number of BLAT Hits (0-50); series: Human intron-containing, Chimp intron-containing, Macaque intron-containing, Intronless]

DUF1220 Repeat Unit
Popesco, et al, Science 2006

Synonymous and Nonsynonymous Differences Between Aligned Sequences
Sequence 1:  Thr ACT   Phe TTT
Sequence 2:  Thr ACC   Val GTT
Ks = Average number of synonymous changes per synonymous site
Ka = Average number of nonsynonymous changes per nonsynonymous site

Nonsynonymous and Synonymous Sites in Codons
Thr ACT: position 1 = N, position 2 = N, position 3 = S
Phe TTT: position 1 = N, position 2 = N, position 3 = 1/3 S, 2/3 N

What will be the Ka/Ks values for most proteins?

Ka/Ks Distribution
[Figure: histogram; x-axis: Ka/Ks value (0.00 to 2.00); y-axis: Number of genes per bin (0 to 1600)]
Intra-primate comparison mean: 0.91
Rodent-primate comparison mean: 0.61

Genome        PDE4DIP   Total DUF1220   NBPF Genes
Human         2         272             23
Chimp         3         125             19
Gorilla       3         99              15
Orangutan     4         92              11
Gibbon        3         53              10
Macaque       1         35              10
Marmoset      1         31              11
Mouse Lemur   1         2               1
Bushbaby      1         3               2
Tarsier       1         1               0
Rabbit        1         8               3
Pika          1         1               0
Mouse         1         1               0
Rat           1         1               0
Guinea Pig    1         1               1
Squirrel      1         1               1
Tree Shrew    1         4               3
Cow           1         7               3
Dolphin       1         4               1
Pig           1         3               1
Horse         1         8               3
Dog           1         3               1
Panda         1         2               1
Cat           1         3               2
Megabat       1         1               0
Microbat      1         1               0
Hedgehog      1         1               0
Shrew         1         1               0

• DUF1220 shows the greatest human-specific copy number expansion of any protein-coding sequence in the human genome
• Shows signs of positive selection
• Human increase is primarily due to domain amplification (rather than gene duplication)
O’Bleness et al. Evolutionary History and Genome Organization of DUF1220 Protein Domains. G3 (Bethesda). Sept (2012).

A Chronology of DUF1220 Domain Evolution
* Branch points in millions of years.
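The Ka and Ks quantities from the "Synonymous and Nonsynonymous Differences" and "Nonsynonymous and Synonymous Sites in Codons" slides can be illustrated with a short Python sketch. It is a simplified, Nei-Gojobori-style count under stated assumptions: the standard genetic code, codons that differ at more than one position are skipped, changes to stop codons count as nonsynonymous, and no correction for multiple hits is applied. The function names (`synonymous_sites`, `count_differences`, `ka_ks`) are hypothetical and chosen for illustration; this is not the method used in the cited studies.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Synonymous sites in one codon: each of the 9 possible single-base
    changes contributes 1/3 of a site if it leaves the amino acid unchanged."""
    aa = CODON_TABLE[codon]
    sites = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == aa:
                sites += Fraction(1, 3)
    return sites


def split_codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def count_differences(seq1, seq2):
    """Count (synonymous, nonsynonymous) differences between aligned coding
    sequences. Assumption: codon pairs differing at 2 or 3 positions are skipped."""
    syn = nonsyn = 0
    for c1, c2 in zip(split_codons(seq1), split_codons(seq2)):
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) as uncorrected proportions of changed sites."""
    s_sites = (sum(map(synonymous_sites, split_codons(seq1)))
               + sum(map(synonymous_sites, split_codons(seq2)))) / 2
    n_sites = len(seq1) - s_sites
    syn, nonsyn = count_differences(seq1, seq2)
    return Fraction(nonsyn) / n_sites, Fraction(syn) / s_sites


# The two-codon alignment from the slide: Thr-Phe versus Thr-Val.
ka, ks = ka_ks("ACTTTT", "ACCGTT")
print(synonymous_sites("ACT"), synonymous_sites("TTT"))  # 1 1/3
print(ka, ks, ka / ks)                                    # 3/13 3/5 5/13
```

The per-codon site counts reproduce the slide: ACT has one fully synonymous third position, while only one third of the third position of TTT is synonymous. A Ka/Ks ratio below 1 on a real alignment would be read as purifying selection and a ratio above 1 as a possible sign of positive selection; two codons are far too few for any such inference.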
O’Bleness, et al, G3: Genes, Genomes, Genetics, 2012

Consensus Tree of Evolutionary Relationships of 429 Primate DUF1220 Sequences
[Figure]

DUF1220 Duplication and Protein Domain Classifications
Ancestral DUF1220 found in human PDE4DIP
NBPF-type DUF1220 domains: CON1, CON2, HLS1, HLS2, HLS3, CON3
Clades CON1-3 are DUF1220 sequences conserved among primates
Clades HLS1-3 refer to a three-DUF1220 domain unit (the DUF1220 triplet) that has expanded only in the human lineage
Example, NBPF12: CON1 CON2 [HLS1 HLS2 HLS3] [HLS1 HLS2 HLS3] CON3 (two DUF1220 triplets)

DUF1220/NBPF Genome Organization in Chimp & Human
[Figure: panels for Chimpanzee and Human]
O’Bleness, et al, G3: Genes, Genomes, Genetics, 2012

[Figure: Western blot, panels A and B; size markers 50, 37.5 and 25; labels: 36kDa, GAPDH]
Western analysis of Normal Adult Human Brain regions with DUF1220 antibody: Total protein lysates (50 µg) from normal adult human brain regions (male and female; ages ranging from 22-82 yrs) were electrophoresed on 4-20% denaturing SDS-PAGE gels and blotted with: A) DUF1220 affinity purified antibody; B) GAPDH.
Popesco, et al, Science 2006

DUF1220 Protein Expression in Adult Human Brain
[Figure: panels A-F; labels: ml, P, den, igl]
DUF1220 antibody staining in the human cerebellum (77yr old white female). A) DUF1220 affinity purified antibody; B) Double labeling with DUF1220 affinity purified antibody and Neurofilament 160kDa; C) Same as B at higher magnification; D) Double labeling with DUF1220 affinity purified antibody and GFAP; E) DUF1220 preimmune and GFAP; F) DUF1220 adsorption control. Blue labeling represents DAPI for nuclear staining.
Popesco et al Science 2006

(30yr old female) Hippocampus, CA regions: DUF1220 Affinity purified + GFAP + DAPI
[Figure panels: GFAP; DUF1220 Affinity Purified Antibody; DAPI]

(30yr old female) Cortical regions, Hippocampus: DUF1220 Affinity purified + GFAP + DAPI
[Figure panels: GFAP; DUF1220 Affinity Purified Antibody; DAPI]

Noteworthy DUF1220 Copy Number Totals
                                                               DUF1220 Copies
Total in Human Genome                                          272
Total in Chimp Genome (CLS)                                    125 (23)
Total in Last Common Ancestor of Homo/Pan                      102
Total of Newly Added Copies in Human Lineage                   167
Total Human-Specific Copies Added via Domain Amplification     146
Total Human-Specific Copies Added via Gene Duplication         21
Avg. Number Added to Human Lineage Every Million Years         28
O’Bleness, et al, G3: Genes, Genomes, Genetics, 2012

Sequences Encoding DUF1220 Domains
• Show the largest human lineage-specific increase in copy number of any protein coding region in the genome (160 HLS; >270 total in haploid genome)
• Show signs of positive selection especially in primates
• In brain, are expressed only in neurons
• Are highly amplified in human, reduced in great apes, further reduced in monkeys, single-or-low copy in prosimians and non-primate mammals, and absent in non-mammals
• Have increased in human primarily by domain hyperamplification involving DUF1220 triplet

Key Human-Specific Evolutionary Features of 1q21.1 Region
[Figure]
O’Bleness, et al, Nat Rev Genet, 2012

1q21.1 Deletions linked to Microcephaly*
1q21.1 Duplications linked to Macrocephaly*
• Recurrent Reciprocal 1q21.1 Deletions and Duplications Associated with Microcephaly or Macrocephaly and Developmental and Behavioral Disorders. Brunetti-Pierri, et al, Nature Genetics 2008
• Recurrent Rearrangements of Chromosome 1q21.1 and Variable Pediatric Phenotypes. Mefford, et al, N. Engl. J. Med. 2008
• *Implies the copy number (dosage) of one or more genes in this region is influencing brain size in a dose-dependent manner
• These CNVs encompass or are immediately flanked by DUF1220 sequences (Dumas & Sikela, Cold Spring Harbor Symposium Quant. Biol., 2009)

DUF1220/NBPF Sequences & Recurrent Disease-associated 1q21.1 CNVs
[Figure]

Human Evolutionary Genomics: Relevant Reviews
Sikela, J.M. (2006). The Jewels of Our Genome: The Search for the Genomic Changes Underlying the Evolutionarily Unique Capacities of the Human Brain. PLoS Genet. 2, e80.
O’Bleness, M.S., Searles, V., Varki, A., Gagneux, P., and Sikela, J.M. (2012). Evolution of genetic and genomic features unique to the human lineage. Nat. Rev. Genet. 13, 853-866.
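
As a closing aside, the figures on the "Noteworthy DUF1220 Copy Number Totals" slide can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers transcribed from that slide; the dictionary keys and the `summarize` helper are hypothetical names, and the implied time span assumes a constant rate of addition, which the slide does not state.

```python
# Values transcribed from the "Noteworthy DUF1220 Copy Number Totals" slide.
totals = {
    "human": 272,
    "chimp": 125,
    "added_in_human_lineage": 167,
    "added_by_domain_amplification": 146,
    "added_by_gene_duplication": 21,
    "added_per_million_years": 28,
}


def summarize(t):
    """Derive simple consistency checks and ratios from the slide totals."""
    added = t["added_in_human_lineage"]
    return {
        # Do the two reported mechanisms account for all newly added copies?
        "components_match": (t["added_by_domain_amplification"]
                             + t["added_by_gene_duplication"]) == added,
        # Share of new human copies attributed to domain amplification.
        "domain_amplification_fraction": t["added_by_domain_amplification"] / added,
        # Assumption: constant rate, so span = copies added / copies per MY.
        "implied_span_my": added / t["added_per_million_years"],
        "human_to_chimp_ratio": t["human"] / t["chimp"],
    }


summary = summarize(totals)
for name, value in summary.items():
    print(name, value)
```

Under these assumptions the two mechanisms sum to the reported 167 new copies, domain amplification accounts for roughly 87% of them, and the average of 28 copies per million years corresponds to a span of about 6 million years.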