Bioinformatics Lecture 2

Tools to analyze protein characteristics 3-D fold model Identification of conserved regions -Family member -Multiple alignments Protein sequence Protein sorting and sub-cellular localization Some Signal sequence (tags) Evolutionary relationship (Phylogeny) Protein modifications Anchoring into the membrane nascent proteins contain a specific signal, or targeting sequence that directs them to the correct organelle. (ER, mitochondrial, chloroplast, lysosome, vacuoles, Golgi, or cytosol) Questions Can we train the computers: To detect signal sequences and predict protein destination? To identify conserved domains (or a pattern) in proteins? To predict the membrane-anchoring type of a protein?  (Transmembrane domain, GPI anchor…) To predict the 3D structure of a protein? Learning algorithms are good for solving problems in pattern recognition because they can be trained on a sample data set. Classes of learning algorithms: -Artificial neural networks (ANNs) -Hidden Markov Models (HMM) Artificial neural networks (ANN) Machine learning algorithms that mimic the brain. Real brains, however, are orders of magnitude more complex than any ANN. ANN is composed of a large number of highly interconnected processing elements (neurons) working simultaneously to solve specific problems. like people, learn by example. ANNs cannot be programmed to perform a specific task. ANNs, The first artificial neuron was developed in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pits. Hidden Markov Models (HMM) Used to answer questions like: What is the probability of obtaining a particular outcome? What is the best model from many combinations?  HMM is a probabilistic process over a set of states, in which the states are “hidden”. It is only the outcome that visible to the observer. Hence, the name Hidden Markov Model. HMM has many uses in genomics: Gene prediction (GENSCAN) SignalP Finding periodic patterns  The ExPASy (Expert Protein Analysis System) Expasy server (http://au.expasy.org) is dedicated to the analysis of protein sequences and structures. Sequence analysis tools include: DNA -> Protein [Translate] Pattern and profile searches Post-translational modification and topology prediction Primary structure analysis Structure prediction (2D and 3D) Alignment   PredictProtein: A service for sequence analysis, and structure prediction http://www.predictprotein.org/newwebsite/submit.html  TMpred:  TMHMM: Predicts transmembrane helices in proteins (CBS; Denmark) http://www.ch.embnet.org/software/TMPRED_form.html http://www.cbs.dtu.dk/services/TMHMM-2.0/  big-PI : Predicts GPI-anchor site:http://mendel.imp.univie.ac.at/sat/gpi/gpi_server.html  DGPI: Predicts GPI-anchor site: http://129.194.185.165/dgpi/index_en.html  SignalP: Predicts signal peptide: http://www.cbs.dtu.dk/services/SignalP/  PSORT: Predicts sub-cellular localization:  TargetP: Predicts sub-cellular localization: http://www.cbs.dtu.dk/services/TargetP/  NetNGlyc: Predicts N-glycosylation sites:http://www.cbs.dtu.dk/services/NetNGlyc/  PTS1: Predicts peroxisomal targeting sequences http://www.psort.org/ http://mendel.imp.univie.ac.at/mendeljsp/sat/pts1/PTS1predictor.jsp  MITOPROT: Predicts of mitochondrial targeting sequences http://ihg.gsf.de/ihg/mitoprot.html Hydrophobicity: http://www.vivo.colostate.edu/molkit/hydropathy/index.html http://www.cbs.dtu.dk/services/: prediction server NetNGlyc: Predicts N-glycosylation sites: http://www.cbs.dtu.dk/services/NetNGlyc/ NetPhos: Predicts phosphorylation of residues: http://www.cbs.dtu.dk/services/NetPhos/ NetPhosK: Predicts recognition sites for specific kinases: http://www.cbs.dtu.dk/services/NetPhosK/ NetAcet: N-terminal acetylation in eukaryotic proteins: http://www.cbs.dtu.dk/services/NetAcet/ NetCGlyc: C-mannosylation sites in mammalian proteins Multiple alignment Used to do phylogenetic analysis: Same protein from different species Evolutionary relationship: history  Used to find conserved regions Local multiple alignment reveals conserved regions Conserved regions usually are key functional regions These regions are prime targets for drug developments Protein domains are often conserved across many species  Algorithm  for search of conserved regions: Block maker: http://blocks.fhcrc.org/blocks/make_blocks.html Multiple alignment tools Free programs: Phylip and PAUP: http://evolution.genetics.washington.edu/phylip.html Phyml: http://atgc.lirmm.fr/phyml/  The most used websites : http://align.genome.jp/ http://prodes.toulouse.inra.fr/multalin/multalin.html http://www.ch.embnet.org/index.html (T-COFFEE and ClustalW)  ClustalW:  Standard popular software  It aligns 2 and keep on adding a new sequence to the alignment  Problem: It is simply a heuristics. Motif  discovery: use your own motif to search databases: PatternFind: http://myhits.isb-sib.ch/cgi-bin/pattern_search http://meme.nbcr.net/meme4_6_0/intro.html Phylogenetic analysis Phylogenetic Describe Major trees evolutionary relationships between sequences modes that drive the evolution: Point mutations modify existing sequences Duplications (re-use existing sequence) Rearrangement Two most common methods Maximum parsimony Maximum likelihood The most useful software: http://www.megasoftware.net/mega4/m_con_select.html Definitions Homologous:Have Orthologous: Paralogous: a common ancestor. Homology cannot be measured. The same gene in different species . It is the result of speciation (common ancestral) Related genes (already diverged) in the same species. It is the result of genomic rearrangements or duplication Determining protein Structure-Function  Direct measurement of structure X-ray crystallography NMR spectroscopy    Site-directed mutagenesis Computer modeling Prediction of structure Comparative protein-structure modeling  Comparative protein-structure modeling  Goal:Construct 3-D model of a protein of unknown structure (target), based on similarity of sequence to proteins of known structure (templates)  Procedure: Template selection Template–target alignment Model building Model evaluation  Blue: predicted model by PROSPECT Red: NMR structure The Protein 3-D Database The Protein DataBase (PDB) contains 3-D structural data for proteins Founded in 1971 with a dozen structures As of June 2004, there were 25,760 structures in the database. All structures are reviewed for accuracy and data uniformity. 80% come from X-ray crystallography 16% come from NMR 2% come from theoretical modeling  Structural data from the PDB can be freely accessed at http://www.rcsb.org/pdb/ High-throughput methods Most used websites for 3-D structure prediction Protein Homology/analogY Recognition Engine (Phyre) at http://www.sbg.bio.ic.ac.uk/phyre/html/index.html PredictProtein at http://www.predictprotein.org/newwebsite/submit.html UCLA Fold Recognition at http://www.doe-mbi.ucla.edu/Services/FOLD/ Commercial bioinformatics softwares CLC Genomics Workbench Genomics: 454, Illumina Genome Analyzer and SOLiD sequencing data; De novo assembly of genomes of any size; Advanced visualization, scrolling, and zooming tools; SNP detection using advanced quality filtering; Transcriptomics: RNA-seq including paired data and transcript-level expression; Small RNA analysis; Expression profiling by tags; Epigenetics: Chromatin immunoprecipitation sequencing (ChIP-seq) analysis; Peak finding and peak refinement; Graph and table of background distribution; false discovery rate; Peak table and annotations; VectorNTI: Sequence analysis and illustration; restriction mapping; recombinant molecule design and cloning; in silico gel electrophoresis; synthetic biology workflows AlignX: BioAnnotator: ContigExpress: GenomBench The bioinformatics not covered in this class Comparative genomics and Genome browser: http://genome.lbl.gov/vista/index.shtml http://www.sanger.ac.uk/resources/software/artemis/ Genome annotation: http://linux1.softberry.com/berry.phtml http:// rast.nmpdr.org/ Metagenomics: http://metagenomics.anl.gov/ System biology tools.

Bioinformatics Lecture 2

Related documents

Products

Support

Bioinformatics Lecture 2

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib