Organisation of the human genome
Nuclear genome (3.2 Gbp)
• 24 types of chromosomes (e.g. Y: 51 Mb; chr1: 279 Mb)
Mitochondrial genome

Composition of the nuclear genome
• Exons: 1.5%
• Introns ("junk")
• Intergenic regions ("junk")
The genome is empty?

Estimated number of genes
Saccharomyces cerevisiae (baker's yeast): 6,034
Drosophila melanogaster (fruit fly): 13,061
Caenorhabditis elegans (roundworm): 19,099
Arabidopsis thaliana (mustard plant): 25,000

Increasing biological complexity requires genomic changes that increase the informational capacity of the system...
...but the number of genes in the genomes sequenced so far does not match what was expected (apparently).

Number of genes in sequenced genomes
Amphimedon queenslandica: 18693
Trichoplax adhaerens: 11514
Bos taurus: >22790
Nematostella vectensis: 18000
Nasonia vitripennis: 17279
Homo sapiens: 21527
Mus musculus: 22083
Danio rerio: 21413
Drosophila melanogaster: 13781
Ciona intestinalis: 16000
Takifugu rubripes: 18500
Caenorhabditis elegans: 20224
Strongylocentrotus purpuratus: 23300
Anolis carolinensis: 17000
Xenopus tropicalis: 18000
Gallus gallus: <17000
Arabidopsis thaliana: 26000
Gorilla gorilla: 21000
Oryza sativa: 50000
Pan troglodytes: 21000
Populus trichocarpa: 45550
Glycine max: 75778

Why doesn't (coding) gene number matter?
• More sophisticated regulation of expression?
• Proteome vastly larger than genome?
  – Alternative splicing
  – RNA editing
• Post-translational modifications
• Cellular location
…but remember, there are other genes

Genes in the genome:
• Protein-coding genes (mRNA): around 20500 (as of 10/2012)
• Non-coding RNAs (non-polypeptide-coding: RNA-encoding)
  – Ribosomal RNA (rRNA)
  – Transfer RNA (tRNA)
  – Small nuclear RNA (snRNA)
  – Small nucleolar RNA (snoRNA)
  – microRNA (miRNA)
  – Other non-coding RNAs (Xist, 7SK, etc.)
• Pseudogenes

Statistics about the current Gencode freeze (version 13)
*The statistics derive from the GTF files, which include only the main chromosomes of the human reference genome.
Version 13 (March 2012 freeze, GRCh37)
General stats
Total No of Genes: 55123
Protein-coding genes: 20670
Long non-coding RNA genes: 12393
Small non-coding RNA genes: 9173
Pseudogenes: 13123
Total No of Transcripts: 182967
Protein-coding transcripts: 77901
Long non-coding RNA loci transcripts: 19835
Total No of distinct translations: 78119
Genes that have more than one distinct translation: 14235

Protein-coding genes (mRNA)
[Figure: human coding genes and their homology to genes from other organisms]

Noncoding regions in coding genes
• Regulatory regions
  – RNA polymerase binding site
  – Transcription factor binding sites
  – Polyadenylation [poly(A)] sites
  – Enhancers
• 5'- and 3'-UTRs

DNA as a series of 'docking' sites
It is the location of these docking sites relative to one another that permits genes to be transcribed, spliced, and translated properly and in specific spatial and temporal patterns.

…some more statistics
• Gene density: 1 per 100 kb (varies widely)
• On average 9 exons per gene
• 363 exons in the titin gene
• Many genes are intronless
• Largest intron: 800 kb (WWOX gene)
• Smallest introns: 10 bp
• Average 5' UTR: 0.2-0.3 kb
• Average 3' UTR: 0.77 kb, but underestimated…
• Largest protein: titin (38,138 aa)
• Largest gene: dystrophin

Human genes vary enormously in size and exon content

An example of a complex human gene locus: INK4a-ARF
From: Prof. Gordon Peters' website

Genes within genes
Intron 26 of the neurofibromatosis type 1 gene (NF1) encodes:
• OMGP (oligodendrocyte myelin glycoprotein)
• EVI2A and EVI2B (homologues of ecotropic viral integration sites in mouse)

Why doesn't gene number matter?
• More sophisticated regulation of expression
• Proteome vastly larger than genome
  – Alternative splicing
  – RNA editing…
• Post-translational modifications
• Co-option
• GRN connectivity

Dynamic networks

Why doesn't gene number matter?
• More sophisticated regulation of expression
• Proteome vastly larger than genome
  – Alternative splicing
  – RNA editing…
• Post-translational modifications
• Co-option
• GRN connectivity

Table 1. Levels of regulation (loci of control constraints) above the genome.
Levels and transitions: dynamic regulatory system
1. Genome to transcriptome: Epigenetic regulation of gene expression (5). Includes pathways that detect energy levels (redox levels) and repress DNA transcription when cellular NADH levels are increased.
2. Transcriptome to proteome: Regulatory constraints include post-translational modification of proteins.
3. Proteome to dynamic system: Metabolic networks of glycolysis and mitochondrial oxidation-reduction are the dynamic systems presently best understood in terms of both mechanism of formation and operating principles. They display control distributed over all enzymes of a network, and their phenotype includes cellular redox potential.
4. Dynamic systems to phenotype: Control of a global phenotype such as disease may be localized to a single regulatory system (such as metabolic, hormone signaling, etc.) or be distributed over many systems and levels.

Gene Expression
• The products of genes may be RNA or protein
• RNA and protein synthesis occur in many steps
• These steps are regulated and controlled

[Figure: UCSC]

Location of CpG islands in the gene
CpG islands do NOT have a deficit of CpG dinucleotides

How epigenetics works
[Figure: a gene with a CpG island in its promoter region; legend: CpG, methylated CpG]
• Unmethylated CpGs relax chromatin: the gene is transcribed into RNA and translated into proteins
• Methylated CpGs constrain chromatin

Chromatin Modification
• Chromatin remodeling: SWI/SNF
• Transcription factor modification: acetylation, phosphorylation
• DNA methylation: CpG dinucleotides, MeCP2
• Histone substitution: H2A.Z, H2A.X, H3.3
• Histone modification: acetylation, ubiquitination, sumoylation, methylation, phosphorylation

Eukaryotic transcription regulation
Modular construction and combinatorial control
• The regulatory sequence (cis element) on DNA consists of multiple motifs, each specific for a transcription factor.
• Multiple transcription factors can bind simultaneously to the regulatory sequences and act together on the transcription of the gene.
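The idea of combinatorial control can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the three motifs are textbook core consensus elements (TATA box, CCAAT box, GC box), matching is by exact string on one strand (real binding sites are degenerate), the promoter fragment is invented, and the "all required motifs present" rule is a deliberate simplification of how factors act together. The helper names `find_sites` and `is_active` are hypothetical.

```python
import re

# Textbook core consensus motifs. Real binding sites are degenerate, so
# exact string matching on one strand is a deliberate simplification.
MOTIFS = {
    "TATA box": "TATAAA",
    "CCAAT box": "CCAAT",
    "GC box": "GGGCGG",
}


def find_sites(seq, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} for exact matches."""
    seq = seq.upper()
    sites = {}
    for name, motif in motifs.items():
        # A lookahead pattern reports overlapping occurrences as well.
        sites[name] = [m.start() for m in re.finditer(f"(?={motif})", seq)]
    return sites


def is_active(sites, required):
    """Toy AND rule: 'on' only if every required motif has at least one site."""
    return all(sites.get(name) for name in required)


# Hypothetical promoter fragment, invented for illustration only.
promoter = "GGGCGGAACCAATCGTTATAAAGGC"
sites = find_sites(promoter)
print(sites)
print(is_active(sites, ["TATA box", "GC box"]))

# Destroying one motif (a point mutation in the GC box) switches the toy gene off.
mutant = find_sites(promoter.replace("GGGCGG", "GGTCGG"))
print(is_active(mutant, ["TATA box", "GC box"]))
```

The modular layout is the point of the sketch: adding or removing a motif in the dictionary, or changing the list of required motifs, changes the regulatory outcome without touching the gene itself.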
Regulated Transcription
[Figure: regulated transcription of gene X. Labels: co-activator protein, general transcription factors, TBP, transcriptional activators binding to the promoter region, TATA (-35)]

Activators stimulate the highly cooperative assembly of initiation complexes
• Binding sites for activators that control transcription of the mouse TTR gene (Figure 10-60)
• Model for cooperative assembly of an activated transcription-initiation complex in the TTR promoter (Figure 10-61)
(TTR = transthyretin)

Distant Cis-Acting Elements
• Locus Control Region: regulatory site required for optimal expression of an adjacent group of genes
• Insulator Element: prevents activation/repression from extending to an adjacent regulatory sequence

Alternative promoters
Sex-specific regulation of the DNMT1 (methyltransferase) gene: oocyte, somatic, or spermatocyte promoters

Posttranscriptional control
• Regulation of RNA processing
• Regulation of mRNA degradation
• Regulation of translation

mRNA: many places for variation, modification, regulation
• transcription
  – initiation
  – elongation
  – termination
• 5' capping
• 3' polyA addition
  – alternative sites
• splicing
  – alternative exons
  – self-splicing, spliceosome-mediated
• editing
  – changing bases and codons
• nuclear export
  – mature mRNA only
• stability
  – nonsense-mediated decay
  – degradation signals
  – sequestration
• localization
  – in cytoplasmic compartments
  – access to translation machinery
• antisense/RNA interference
  – inhibit translation

The PolyA Site (PAS)
[Figure: 3' exon with stop codon and 3' UTR. Labels: polyA signal (AATAAA), ~17 nt, PAS, poly(A) tail]

Alternative polyadenylation sites
Alternative PAS & Post-transcriptional (de)regulation
[Figure: coding sequence followed by a 3' UTR containing several AUUAAA signals and possible regulatory elements (stability, translation, transport)]
Use of an abnormal polyA site is associated with various diseases:
• α/β-thalassemia (globin)
• Mantle cell lymphoma (cyclin D1, CCND1)
• Teratocarcinoma (PDGF)
• Hypertension (Ca2+ ATPase)

Consensus nucleotides at intron/exon junctions

Alternative splicing is a mechanism for generating functional diversity
Alternative processing example

RNA editing
RNA editing is a rare form of post-transcriptional processing whereby base-specific changes are enzymatically introduced at the RNA level.
Types of RNA editing in humans:
(i) C → U: carried out in humans by a specific cytidine deaminase, e.g. expression of the human apolipoprotein B gene in the intestine involves tissue-specific RNA editing.
(ii) A → I: the amino group at carbon 6 of adenine is replaced by a carbonyl group. I then acts as a G. Occurs in some ligand-gated ion channels.
(iii) U → C: in mRNA of the WT1 Wilms' tumor gene.
(iv) U → A: in alpha-galactosidase mRNA.
[Figure: Apo B-100 and Apo B-48]

Gene Expression
• The products of genes may be RNA or protein
• RNA and protein synthesis occur in many steps
• These steps are frequently regulated

3. Protein Phosphorylation
Post-translational modifications that alter the activity of the p53 protein. Enzymes that have been shown to modify specific amino acid residues of p53 are shown. Enzymes that inhibit the covalent modifications are indicated in red. P, phosphorylation; R, ribosylation; Ac, acetylation.

…increasing the informational capability of the genome, but there are other genes….
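As a worked illustration of the C → U editing described above for apolipoprotein B, the short Python sketch below shows how a single-base change at the RNA level can turn a glutamine codon (CAA) into a stop codon (UAA) and so yield a shorter protein from the same gene, which is how the intestine produces Apo B-48 instead of Apo B-100. The mRNA used here is an invented toy sequence, not the real APOB transcript, and the helper names `translate` and `edit_c_to_u` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: translation ends here
            break
        protein.append(aa)
    return "".join(protein)


def edit_c_to_u(mrna, pos):
    """Return a copy of the mRNA with the C at 0-based position pos deaminated to U."""
    if mrna[pos] != "C":
        raise ValueError("C-to-U editing requires a C at the target position")
    return mrna[:pos] + "U" + mrna[pos + 1:]


# Toy mRNA (hypothetical): AUG GCU CAA GAU AUU UAA
unedited = "AUGGCUCAAGAUAUUUAA"
edited = edit_c_to_u(unedited, 6)  # CAA (Gln) becomes UAA (stop)

print(translate(unedited))  # full-length toy protein
print(translate(edited))    # truncated toy protein
```

The same genomic sequence therefore gives two proteins of different length depending on whether the transcript is edited, one of the ways the proteome becomes larger than the gene count suggests.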