Introduction
• Day 1: Introduction
• Day 2: Sequence Analysis
• Day 3: Databases
• Day 3: Dynamic Programming
mario@sanbi.ac.za

Goals of Bioinformatics
• Understand living cells and how they function on a molecular level
• Done by analysing molecular sequence and structural data
• Rationale is the “central dogma” of biology

Genomic Data (2009)
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome

Genomic Data (2010)
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome

Bioinformatics Limitations
• Relying completely on the information is dangerous if the information is inaccurate
• Quality of bioinformatics predictions depends on the quality of the data and the sophistication of the algorithms
• Bioinformatics and experimental biology are complementary: bioinformatics results need to be consistent with experimental biology

Bioinformatics Limitations
• Data (e.g. sequence, expression) may contain errors
• Downstream interpretation of sequence data will be wrong if the sequences or their annotation are wrong
• Many algorithms lack the capability and sophistication to truly reflect reality
• Outcome of computation also depends on available computing power

Definitions
• Sequence alignment
• Dynamic Programming
• Global/Local Alignment
• Sequence Identity
• Phylogenetics
• Paralog/homolog
• Proteomics
• Genomics
• Transcriptomics
• Annotation
• BLAST
• Sequence assembly
• Contig

‘Omics’
• Genomics
• Proteomics
• Transcriptomics
• Phylogenomics
• etc.

Genomics
• Structural
• Functional

Structural Genomics
• Deals with genome structures
• Focus on study of
    • Genome mapping
    • Genome sequencing and assembly
    • Genome annotation
    • Genome comparison

Structural Genomics: Genome mapping
• Identify relative locations of genes, mutations or traits

Structural Genomics: Genome mapping
Increasing resolution: Cytological Map → Genetic Map → Physical Map → DNA Sequence (agctggatttgcgcgcaa)
Image adapted from “Essential Bioinformatics” by Jin Xiong
For more info: Look at Chap 5 in “Genomes”, T.A.
Brown (572.86 MAL) in the UWC Library or the online version at http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.chapter.6196

Structural Genomics: Genome sequencing
• Shotgun sequencing
    • Genome is fragmented and cloned
    • Both ends of the cloned DNA are sequenced at random
    • The high number of random sequences statistically ensures that the whole genome is covered
    • Software is used to assemble the random fragments into a single, contiguous genome

Structural Genomics: Shotgun sequencing
http://www.scq.ubc.ca/wp-content/uploads/2006/08/shotgun1.gif

Structural Genomics: Genome sequencing
• Hierarchical sequencing
    • 100-300 kb genomic fragments are cloned into a BAC
    • Using a physical map, the order and locations of BAC clones on the chromosome can be determined
    • Successive sequencing of adjacent BAC clones results in coverage of the complete genome

Structural Genomics: Hierarchical sequencing
http://www.scq.ubc.ca/wp-content/uploads/2006/08/topdownseq.gif

Structural Genomics: Shotgun vs Hierarchical
Figure panels: Shotgun, Hierarchical

Structural Genomics: Genome assembly
• Sequence fragments are stitched together through the overlapping sequences between fragments
Example fragments: CCAATAA, CACCATT, TATAAT, AATTGGCA, TTGAATA

Structural Genomics: Genome annotation
• Happens before submission to database
• Gene prediction: GenScan, FgenesH
• Verify predictions
    • BLAST search against sequence database
    • Compare to experimentally determined cDNA and EST sequences: GeneWise, Spidey, SIM4, EST2Genome
    • Manual checking by human curators
• Functional assignment
    • BLAST homology searching against protein database
    • Search protein motif and domain databases: Pfam and Interpro

Structural Genomics: Genome annotation
http://hinvlite.sanbi.ac.za

Structural Genomics: Genome comparison
• Comparison of
    • Gene number
    • Gene location
    • Gene content
• Reveals extent of conservation between genomes
• Reveals core set of genes crucial for survival: the “Minimal Genome”

Structural Genomics: Genome comparison
http://www.sanger.ac.uk/Software/ACT/

Functional Genomics
• Focus on gene
function on a genome level, using high-throughput methods
• Conducted using
    • Sequence-based methods
    • Microarray-based methods

Functional Genomics: Sequence-based
• Expressed Sequence Tag (EST)
    • Provides a rough estimate of actively expressed genes under specific physiological conditions
• Serial Analysis of Gene Expression (SAGE)
    • Provides quantitative analysis of mRNA expression
    • Occurrence and quantity of a specific fragment indicate the level of gene expression

Functional Genomics: ESTs
• Selected mRNA sequences are reverse transcribed into cDNA clones
• cDNA clones are then sequenced
• Obtained from 5’ or 3’ end
• Typically 500 bp long
http://www.ncbi.nlm.nih.gov/About/primer/est.html

Functional Genomics: ESTs
• EST limitations
    • Often low quality
    • Contamination (vector)
    • Chimera
    • Represent partial genes
• Despite this, ESTs are still widely used (www.ncbi.nlm.nih.gov/dbEST)

Functional Genomics
• EST gene index construction
    • Organise and consolidate ESTs so that the data can be used to extract full-length cDNAs
    • Remove contaminants
    • Mask repeats
    • Cluster sequences
    • Within a cluster, assemble overlapping ESTs into contigs/consensus sequences
• Annotation: similar to the process for a genome
Examples: Unigene, StackPack, TGI

Functional Genomics: SAGE
• Short DNA fragment (15-20 bp) is cut from a cDNA and used as a unique marker for that transcript
• Fragments are concatenated, cloned and sequenced
http://www.sagenet.org/protocol/MANUAL1e.pdf

Functional Genomics: Microarrays
• Immobilised probes (oligonucleotides or cDNA) are ‘spotted’ on a chip
• Probes are representative of a complete genome
• Fluorescent cDNA from the organism is allowed to hybridise with the probes
• Intensity of fluorescence per spot reflects the amount of mRNA present

Proteomics: Technology
• 2D-PAGE gel: separates proteins based on charge and mass
    • Melanie, CAROL, Comp2Dgel, SWISS-2DPAGE
• Mass Spectrometry (MS): peptide is fragmented, aspirated and the mass-to-charge ratio is determined
• Database searching: using the peptide fingerprint
obtained from MS, a database can be searched
    • ExPASy: AAcompIdent, TagIdent, PeptIdent, CombSearch
    • ProFound, Mascot

Proteomics: Technology
• Differential In-gel Electrophoresis (DIGE)
    • Proteins from experimental and control samples are labeled with different colored dyes
    • Differentially expressed proteins can be coseparated and visualised on the same gel

Proteomics: Technology
• Protein Microarrays
    • Chip contains immobilised proteome
    • Used to study protein function
    • Assay
        • Protein-protein interactions
        • Protein-DNA/RNA interactions
        • Protein-ligand interactions
        • Enzyme activity

Proteomics: Post-translational Modifications
• For activity, many proteins have to be covalently modified before or after the folding process
• Proteolytic cleavage, formation of disulfide bonds, addition of phosphoryl, methyl, acetyl groups, etc.
• Modifications impact protein function
• Bioinformatics can predict sites for modification
    • AutoMotif, Cysteine, FindMod and GlyMod (available from ExPASy), RESID

Proteomics: Protein Sorting
• Sub-cellular localisation is integral to protein function
• Many proteins are only active after being transported to specific compartments
• Identifying protein localisation is important in functional annotation
    • SignalP, TargetP, PSORT

Proteomics: Protein-protein Interactions
• Experimental determination
• Prediction based on
    • Domain fusion
    • Gene neighbours
    • Sequence homology
    • Phylogenetic information
    • Hybrid methods
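
Worked example: Genome assembly
Returning to the genome assembly slide, the idea of stitching fragments together through their overlapping ends can be sketched in Python as a greedy merge. This is only an illustration under strong assumptions: reads are error-free, all come from the same strand, no read is contained inside another, and there are no repeats. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, and the three reads are made up from the example sequence on the genome mapping slide. Real assemblers are far more involved.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Assumption: error-free, same-strand reads. Stops when no pair
    overlaps by at least `min_len` bases, so it may return several contigs.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge
        # Join the two fragments, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Hypothetical reads cut from the slide's example sequence.
    reads = ["ttgcgcgcaa", "agctggatt", "ggatttgcg"]
    print(greedy_assemble(reads))  # ['agctggatttgcgcgcaa']
```

The greedy choice (always merge the largest overlap first) keeps the sketch short, but it can mis-assemble repetitive sequence. That is one reason the hierarchical approach uses a physical map to order the clones first.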