Human Genetics
Weibin Shi, Michele Sale

Contact Information
Shi: ws4v@virginia.edu; 243-9420
Sale: ms5fe@Virginia.EDU; 982-0368

Recommended textbooks
• Medical Genetics. Jorde, Carey, Bamshad & White. Mosby, ISBN-13: 978-0-323-04035-8
• Human Molecular Genetics. Strachan T, Read A. Garland Science, ISBN-10: 0815341822

Overview of course content
1: Organization of the human genome
2: Genetic variation
3: Patterns of inheritance
4: Population genetics
5: Linkage disequilibrium
6: Genetic epidemiology
7: Applied research in human genetics

Organization of the human genome
• Draft human genome sequence published February 2001
• Genes are found in the nucleus and mitochondria
• Nuclear genome is packaged with proteins to form chromatin

Human chromosomes
• 23 pairs, 46 chromosomes
• 22 pairs: autosomes
• 1 pair: sex chromosomes
• 46,XY: normal male
• 46,XX: normal female

A little more basic terminology
Human genome = nuclear genome + mitochondrial genome
• Nuclear genome: 24 distinct chromosomes (22 autosomal + X + Y), 3,200 Mbp, 25,000 genes
• Mitochondrial genome: 16,569 bp, 37 genes

Human Mitochondrial Genome
• Small (16.5 kb) circular DNA
• rRNA, tRNA and protein-encoding genes (37)
• 1 gene/0.45 kb
• Very few repeats
• No introns
• 93% coding
• Genes are transcribed as multigenic (polycistronic) transcripts
• Maternal inheritance

What are the mitochondrial genes?
• 24 of 37 genes are RNA coding
  • 22 tRNA
  • 2 ribosomal RNA (12S, 16S)
• 13 of 37 genes are protein coding: some subunits of respiratory complexes and oxidative phosphorylation enzymes

Limited autonomy of mitochondrial genome

Complex                    mt-encoded    Nuclear-encoded
NADH dehydrogenase         7 subunits    35 subunits
Cytochrome b-c1 complex    1 subunit     10 subunits
Cytochrome c oxidase       3 subunits    10 subunits
ATP synthase complex       2 subunits    14 subunits

Two overlapping genes encoded by the same strand of mtDNA (unique example)
• Two independent ATG start codons lie in different reading frames relative to each other
• The second stop codon is derived from TA + A (from the poly-A tail)

Mitochondrial codon table

Human Nuclear Genome
• 3,200 Mb
• 23 (XX) or 24 (XY) linear chromosomes
• 25,000 genes
• 1 gene/120 kb
• Introns in most genes
• 1.5% of DNA is coding
• Genes are transcribed individually
• Repetitive DNA sequences (45%)
• Inherited from both parents
• Gene-rich regions are separated by gene deserts
• Chr. 19 has the highest gene density
• Chr. 13 and Y show the lowest gene density

Human genome base content
• 41% GC on average
• 38% GC for Chr. 4 and Chr. 13
• 49% GC for Chr. 19
• Regions with wide swings in GC content (e.g. from 33.1% to 59.3%)
• Gene density correlates with higher GC content

CpG dinucleotide depletion
• Expected frequency is 4.2%
• Observed frequency is five times lower

Location of CpG islands in the gene
• CpG islands lie in the regulatory areas of human genes

Human nuclear genome
• Gene density varies widely
• On average, 9 exons per gene
• 363 exons in the titin gene
• Certain genes are intronless
• Largest intron is 800 kb (WWOX gene)
• Smallest introns: 10 bp
• Average 5’ UTR: 0.2-0.3 kb
• Average 3’ UTR: 0.77 kb
• Largest protein: titin, 38,138 aa

Gene density varies substantially between chromosomal regions

Genes vary in size and exon content

Intronless genes
• Interferon genes
• Histone genes
• Many ribonuclease genes
• Heat shock protein genes
• Many G-protein coupled receptors
• Various neurotransmitter receptors and hormone receptors

Genes within genes

Classical gene families: members exhibit a high degree of sequence similarity
• CS = chorionic somatomammotropin: four placenta-specific genes, primates only
• Serum albumin, alpha-albumin, vitamin D-binding protein

Gene families: gene products bearing short conserved amino acid motifs
• DEAD box proteins are involved in mRNA splicing and translation initiation; DEAD box = Asp-Glu-Ala-Asp
• WD proteins take part in a variety of regulatory functions; GH (Gly-His) should lie 23-41 aa from WD (Trp-Asp)

Gene superfamily: proteins that are functionally related in a general sense, but show only weak homology

Functionally similar genes are occasionally clustered, but usually dispersed throughout the genome

Non-coding RNA genes
• Code for functional RNA
• ncRNAs represent 98% of all transcripts in a mammalian cell
• ncRNA can be structural, catalytic, or regulatory

How many genes in the nuclear genome?
• ~3,000 RNA genes in the nuclear genome
• RNA genes (~10% of the human gene count) have not been taken into account in gene counts

Non-coding RNA
• tRNA (transfer RNA): involved in translation
• rRNA (ribosomal RNA): structural component of the ribosome, where translation takes place
• snoRNA (small nucleolar RNA): functional/catalytic in rRNA maturation
• Antisense RNA: gene regulation/silencing

microRNA
• A new class of non-coding RNA gene
• Products are 19-25 nt RNAs
• Precursors are 70-100 nt
• Block translation or result in degradation of target mRNA

Tandem repeats and interspersed repeats

Satellite DNA is repetitive DNA that can be separated by centrifugation
• Equilibrium density gradient centrifugation
• Sheared DNA in a cesium chloride gradient

Satellite DNA
• Alpha-satellite (centromere DNA)
• Microsatellite
• Minisatellite

Microsatellite
• Di-, tri-, and tetra-nucleotide repeats

TGCCACACACACACACACAGC
TGCCACACACACA------GC

TGCTCATCATCATCAGC
TGCTCATCA------GC

TGCTCAGTCAGTCAGTCAGGC
TGCTCAGTCAG--------GC

• ~10% of the nuclear genome

Minisatellites
• 6-64 bp repeating pattern
• Repeat: AGGAATTTTT

  1 tgattggtct ctctgccacc gggagatttc cttatttgga ggtgatggag gatttcagga
 61 attttttagg aattttttta atggattacg ggattttagg gttctaggat tttaggatta
121 tggtatttta ggatttactt gattttggga ttttaggatt gagggatttt agggtttcag
181 gatttcggga tttcaggatt ttaagttttc ttgattttat gattttaaga ttttaggatt
241 tacttgattt tgggatttta ggattacggg attttagggt ttcaggattt cgggatttca
301 ggattttaag ttttcttgat tttatgattt taagatttta ggatttactt gattttggga
361 ttttaggatt acgggatttt agggtgctca ctatttatag aactttcatg gtttaacata
421 ctgaatataa atgctctgct gctctcgctg atgtcattgt tctcataata cgttcctttg

α-Satellite repeat
• 171 bp sequence repeat

Interspersed repetitive DNA
• SINE (short interspersed nuclear elements):
  • Alu, ~0.3 kb, ~10.7% of human DNA (~1,200,000 copies)
  • MIR, ~0.13 kb, ~3% of human DNA (~500,000 copies)
• LINE (long interspersed nuclear elements): ~0.8 kb, ~21% of human DNA (~1,000,000 copies)

Chromosomal location of repeats

Pseudogenes
Non-functional copy of a gene

Non-processed pseudogene
• Nonfunctional copies of the genomic DNA sequence of a gene
• Contain exons, introns, and flanking sequences

Processed pseudogene
• Nonfunctional copies of the exonic sequences of a gene
• Reverse-transcribed from an RNA transcript
• No 5’ promoter
• No introns
• Often includes a poly-A tail

Both include events that make the gene non-functional
• Frameshift
• Stop codons

As many as 20-30% of all genomic sequence predictions could be pseudogenes

We assume pseudogenes have no function, but we really don’t know!

Human Genome Organization

HUMAN GENOME
  Nuclear genome: 3,200 Mb, 25,000 genes
    Genes and gene-related sequences (unique or moderately repetitive)
      Coding DNA
      Noncoding DNA
        Pseudogenes
        Gene fragments
        Introns, untranslated sequences, etc.
    Extragenic DNA
      Unique or low copy number
      Moderate to highly repetitive
        Tandemly repeated
        Interspersed repeats
  Mitochondrial genome: 16.5 kb, 37 genes
    Two rRNA genes
    22 tRNA genes
    13 polypeptide-encoding genes
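
The gene densities and the expected CpG frequency quoted in these slides follow from simple arithmetic on the genome sizes, gene counts, and GC content. The Python sketch below reproduces that arithmetic as a rough check. The function names `kb_per_gene` and `expected_cpg_frequency` are illustrative and not part of the course material, and the CpG estimate assumes that C and G each make up half of the GC fraction and that neighbouring bases are independent.

```python
def kb_per_gene(genome_size_bp: int, gene_count: int) -> float:
    """Average stretch of genome per gene, in kilobases."""
    return genome_size_bp / gene_count / 1000


def expected_cpg_frequency(gc_fraction: float) -> float:
    """Expected CpG dinucleotide frequency under base independence.

    Assumption: C and G each account for half of the GC fraction.
    """
    c = g = gc_fraction / 2
    return c * g


# Figures quoted in the slides
mito_kb = kb_per_gene(16_569, 37)                 # about 0.45 kb per gene
nuclear_kb = kb_per_gene(3_200_000_000, 25_000)   # 128 kb per gene
cpg_expected = expected_cpg_frequency(0.41)       # about 4.2%
cpg_observed = cpg_expected / 5                   # "five times lower"

print(f"Mitochondrial genome: 1 gene per {mito_kb:.2f} kb")
print(f"Nuclear genome:       1 gene per {nuclear_kb:.0f} kb")
print(f"Expected CpG frequency: {cpg_expected:.1%}")
print(f"Observed CpG frequency (approx.): {cpg_observed:.2%}")
```

With the round numbers used here, the nuclear figure comes out at 128 kb per gene, which the slides quote as roughly 1 gene/120 kb. On either figure, the mitochondrial genome is nearly 300 times more gene-dense than the nuclear genome.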