BB30055: genes & genomes
MV Hejmadi 2004-05

Post-genomics

Limitations of the HGP: Genomics alone cannot predict which proteins the genes encode, what those proteins do, or how they interact at the cellular level. The post-genomic challenge therefore includes transcriptome and proteome analysis: the identification and quantitation of proteins, their cellular localisation, and their modifications, interactions, activities and, finally, function.

The information from any genome sequencing database facilitates studies on the following:

(A) Identifying genes from the sequence: 2 general approaches to gene hunting

Ab initio method (computational): This method scans DNA sequences computationally (bioinformatics) for special features associated with genes, including exons and other sequence signals such as splice sites. Several programs, such as GENSCAN and GENEBUILDER, are used for automated annotation of genes. These programs employ a range of strategies, including
   o Scanning for ORFs (open reading frames), marked by initiation or termination codons
   o Codon bias found in specific species
   o Exon-intron boundaries
   o Upstream control sequences, e.g. conserved motifs in transcription factor binding regions
   o CpG islands
   o Homology searches

Experimental method: Experimental evaluation uses transcribed RNA to locate exons and entire genes within a DNA fragment. Methods include
   a) Hybridisation approaches: Northern blots, cDNA capture / cDNA selection, zoo blots
   b) Transcript mapping: RT-PCR, RACE etc.

(B) Gene expression profiling (determining gene function): Uses either or both of the following approaches

COMPUTATIONAL APPROACH: Homology searches for either orthologous genes (homologues in different organisms with a common ancestor) or paralogous genes (genes in the same organism, e.g.
multigene families)

EXPERIMENTAL APPROACH: Functional analysis of known genes using methods such as
   a) gene inactivation (knockouts, RNAi, site-directed mutagenesis, transposon tagging, genetic footprinting etc.)
   b) gene overexpression (transgenics, reporter genes, knock-ins etc.)

(C) Genome activity studies: Functional gene expression on its own is not enough. It needs to be complemented by transcriptome and proteome analyses in order to understand how the cell operates.

The transcriptome (global mRNA profiling)

The transcriptome can be defined as the complete collection of transcribed elements of the genome. In addition to mRNAs, it includes non-coding RNAs, which serve structural and regulatory purposes. Alterations in the structure or expression level of any one of these RNAs or their proteins can contribute to disease. An understanding of the transcriptome provides clues on
   o Regions of transcription
   o Transcription factor binding sites
   o Sites of chromatin modification
   o Sites of DNA methylation
   o Chromosomal origins of replication
(Transcriptome maps for chromosomes 21 and 22 were published in Science (2002) May 3; 296: 916-919.)

Transcriptome studies can be done by either
   o SAGE (serial analysis of gene expression)
   o Microarrays
(The human transcriptome map is available at http://bioinfo.amc.uva.nl/HTM/)

The proteome

Proteome projects worldwide are co-ordinated by HUPO (Human Proteome Organisation) and involve protein biochemistry on an unprecedented, high-throughput scale. However, the problems associated with proteomics include limited and variable sample material, sample degradation, protein abundance, post-translational modifications, huge tissue, developmental and temporal specificity, as well as disease and drug influences.

The main areas of proteomics research are

1) Mass spectrometry-based proteomics: This approach involves protein separation by 2-D gel electrophoresis followed by MS of the protein spots, and is based on de novo analysis of proteins from cells and tissues. MS-based proteomics was made possible by the discovery of protein ionisation techniques. It can be used for protein identification and quantification, profiling, and the study of protein interactions and modifications.

Principle of MS: Any mass spectrometer consists of an ion source, a mass analyser that measures the mass-to-charge ratio (m/z) of the ionised analytes, and a detector that registers the number of ions at each m/z value. Electrospray ionisation (ESI) and matrix-assisted laser desorption/ionisation (MALDI) are the 2 techniques commonly used to volatilise/ionise the proteins/peptides for MS analysis. MALDI-MS is used for simple peptide mixtures whereas ESI-MS is used for complex samples.

2) Array-based proteomics: Based on the cloning and amplification of identified ORFs into homologous systems (ideally used for bacterial and yeast proteins) or sometimes heterologous systems (insect cells, which give post-translational modifications similar to those of mammalian cells). A fusion tag (a short peptide or protein domain that is linked to each protein member, e.g. GST) is incorporated into the plasmid construct. These constructs can then be used to analyse
   a. Protein expression and purification
   b. Protein activity: Analysis can be done using either biochemical genomics or functional protein microarrays.
   c. Protein interactions: Analysis can be done using methods such as two-hybrid analysis (yeast two-hybrid), FRET (fluorescence resonance energy transfer), phage display etc.
   d. Protein localisation: Immunolocalisation of epitope-tagged products helps in understanding protein function in complex cellular networks, e.g. the use of GFP or luciferase tags.
3) Structural proteomics and imaging techniques
4) Informatics
5) Clinical proteomics

Comparative genomics

Comparing the genomes of various organisms (e.g. mouse) can help in studying human disease genes or in mapping other genes.

References for post-genomics (A) and (B)
1) Chapter 7 from Genomes 2 by TA Brown OR Chapter 19 from Human Molecular Genetics 3 by Strachan & Read
2) Science (2001) Vol 291 No 5507 pp 1257-60

References for (C) proteomics:
1) Nature (13 March 2003). Proteomics insight articles from Vol. 422, No. 6928, pp 191-197.
2) Genomes 2 by TA Brown, pp 208-213

Optional Reading
1) Boheler KR and Stern MD. Trends in Biotechnology (Feb 2003) Vol 21(2) pp 55-57. The new role of SAGE in gene discovery
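
Worked sketch for (A), ORF scanning

The ORF-scanning strategy listed under the ab initio method in (A) can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions and is not how GENSCAN or GENEBUILDER work: it assumes the standard start codon ATG and stop codons TAA, TAG and TGA, ignores introns, and reports every start-to-stop stretch in all six reading frames. The names find_orfs, reverse_complement and min_codons are hypothetical and chosen for this example only.

```python
# Minimal six-frame ORF scan (illustrative sketch only).
# Assumes standard start (ATG) and stop (TAA, TAG, TGA) codons and no introns.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Yield (strand, start, end, orf) for each ATG...stop stretch.

    start and end are 0-based, end-exclusive positions on the scanned
    strand (for the '-' strand they refer to the reverse complement).
    min_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, start, i + 3, s[start:i + 3]
                    start = None  # look for the next ORF in this frame
```

For example, list(find_orfs("CCATGAAATTTGGGTAACC")) returns a single forward-strand ORF, ATGAAATTTGGGTAA, spanning positions 2 to 17. Real gene finders combine such a scan with the other signals listed in (A), such as codon bias, exon-intron boundaries and CpG islands, because in eukaryotic genomes introns interrupt the reading frame.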