Welcome! Mass Spectrometry meets ChemInformatics
Tobias Kind and Julie Leary, UC Davis
Course 3: Mass spectral and molecular database search
Class website: CHE 241 - Spring 2008 - CRN 16583
Slides: http://fiehnlab.ucdavis.edu/staff/kind/Teaching/
The PPT is hyperlinked; please change to Slide Show Mode.

Molecules and mass spectra
• Close relationship between molecular structure and mass spectra
• Molecular structure is reflected in mass spectral features (peaks, peak heights and peak combinations)
• Mass spectra reflect a state of gas phase ion physics and chemistry (rearrangements, fragmentations, bond cleavages)
Electron impact (70 eV) mass spectra; Source: NIST05

Molecules and mass spectra
Similar structures may or may not have similar mass spectra
[Figure: EI mass spectra, m/z 40-320, of the two compounds below]
• Silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy]ethyl]-, [S-(R*,R*)]
• N-Methylphenylethanolamine, bis(trimethylsilyl)-
Electron impact (70 eV) mass spectra; Source: NIST05; Created using structure similarity search in NIST MS Search program

Molecules and mass spectra
Similar mass spectra may or may not have similar structures
[Figure: EI mass spectra, m/z 10-210, of the two compounds below]
• 1-Tetradecene
• Cyclotetradecane
Electron impact (70 eV) mass spectra; Source: NIST05; Created using spectral similarity search in NIST MS Search program

Mass spectral databases I

Name            Spectra count   Type
NIST05          200,000         electron impact spectra (EI 70 eV)
Wiley 8         400,000         electron impact spectra (EI 70 eV)
Palisade 600K   600,000         electron impact spectra (EI 70 eV)
NIST MS/MS      5,200           MS/MS (ESI, +/-, 30-100 V CID)
MassFrontier    7,000           MSn, ESI (Spectral Tree Library)

• Data quality is important
• Annotation with CAS number, structure and formula
• A link to the literature or publication is useful
• Currently no large ESI, APPI or APCI libraries are available (free or commercial)

Mass spectral databases II
Smaller specialized libraries
• Pfleger Maurer Weber (Drugs): MS+RI, 70 eV
• MassFinder (Volatiles): MS+RI, 70 eV
• RIZA DB (Toxicants): MS+RI, 70 eV
• Golm DB (primary Metabolites): MS+RI, 70 eV
• Fiehnlib (primary Metabolites): MS+RI, 70 eV
• MassBank (Metabolites): ESI, MSn, accurate masses
• AAFS (Drugs, Forensic, Toxicology): MS+RI, 70 eV
• ChemicalSoft (Drugs): MS/MS, MSE
[Figure: EI mass spectrum of Mirex, m/z 230-450; library record: (riza_web) |RI|2583|KEY|1596|CAS|2385-85-5|FRML|Empty|CMPD|Mirex|]
• For electron impact (EI) spectra, the same GC column (DB-5, RTX-5, DB-1, OV-1) and temperature program must be used for matching retention indices
• For ESI and APPI spectra (LC-MS), the same mass spectrometer design and setup (triple-quad, ion-trap, TOF, Q-TOF) and collision energy should be used

Mass spectral search algorithms
• PBM - Probability Based Matching (McLafferty & Stauffer), since 1976
• Dot Product (Finnigan/INCOS), since 1978
• Weighted Dot Product (Stein), since 1993
• Mass Spectral Tree Search (Mistrik), since the 21st century
Weighted Dot Product [equation shown on slide]; source: Stein S.E.,
see notes
• Au and Ar: abundances of peaks in the user and reference mass spectra
• m: m/z values
• w: weighting term

NIST MS mass spectral search
• The NIST MS Search program is the “gold standard” for EI spectral search
• Used for all types of unit-resolution spectra: MS/MS, APCI, ESI-MS spectra

NIST MS Search program 2.0
Search everything:
A) Library Search: Reverse, Normal, Similarity, Neutral Loss
B) Structure Similarity Search: find molecules similar to [structure shown on slide]
C) Formula Search: find C11H13N3O3S
D) Constrained peak search: find peaks with m/z 122 and 188 and 266
E) Name search: find Stuntman (maleic hydrazide)
Search Connections:
• Import/Export molecular structures (msp, hpj, sdf)
• Interpret Structures (MSInterpreter.exe)
• Find substructures (expert algorithm)
• Import spectra from other programs (AMDIS, Chemstation, ChromaTOF)
[Download]: the search program is freely available (the NIST05 MS Library is licensed, ~ $1200)

Mass Spectral Trees in Mass Frontier
MassFrontier searches MSn and CID mass spectra
Source: MassFrontier Helpfile

Mass Frontier MS search
[Figure: MS Tree and Hitlist]

Mass spectral search
• Library search is always the first step during the identification process.
• Usually library search is not enough to assign unique isomer structures.
• Mass spectra must be clean and background free before search. For LC-MS and GC-MS this requires peak picking and deconvolution.
Additional orthogonal information has to be used:
• restriction of compound space to certain species or material
• use of isotope pattern information
• use of retention index if derived from GC-MS data
• use of retention vs. logP or logD correlations in case of LC-MS
• additional fragmentation at different voltages (MSE)
Only certain mass spectra (peptides, lipids, carbohydrates) can be predicted (calculated) in silico; this is not the rule for other molecules.

MALDI MS based proteomics
Source: Clinical Science, www.clinsci.org; Clin. Sci.
(2005) 108, 369-383

LC-MS based proteomics approach
Source: Paul Rudnick / NIST

Proteomics data analysis (pipeline)
General approaches
A) database search (Sequest, Mascot, OMSSA)
B) de-novo sequencing (Peaks, Lutefisk, Pepnovo)
C) hybrid methods (GutenTag, Popitam, Inspect)
Picture Source: Paul Rudnick / NIST

OMSSA - Open mass spectrometry search algorithm
• submit spectra to MS/MS search
• in-silico digestion of proteins
• matching of experimental vs. calculated MSn
• hit score computation
• inspection and review of results
Download OMSSA
Source: OMSSA (NCBI)

Mass spectral search of peptides (new)
See also ProMEX (MPIMP Golm)
Source: Paul Rudnick / NIST

Conversion of mass spectral libraries
• Usually a hassle.
• Always keep a copy of your libraries in a non-proprietary format.
• Request export functions or converters from your mass spectrometer vendor.
XCalibur LibraryManager.exe (Thermo Electron Fisher), to NIST and vice versa:
• Finnigan MAT ICIS/GCQ/ITS 40 (*.lib, *.lbr)
• AutoMass (*.spr, *.prs, *.nam, *.hdr, *.fsf, *.cfs)
• MassLab (*.idb)
NIST LIB2NIST.exe [LINK]:
• Spectral files *.msd, *.hpj, *.sdf
• HP LIB (*.LIB), NIST LIB, JCAMP-DX (*.jdx, *.hpj)

How to search molecules
• Exact search
• Substructure search
• Similarity search
• Ligand search
• R-group/Markush search

NIST MS DB has structure similarity search
Good for comparing mass spectra of similar compounds (may have similar mass spectra)

Searching Molecules on PubChem
18 million compound DB (++)
Go to PubChem Structure Search

CAS SciFinder
• 33 million molecules and 60 million peptides/proteins
• largest reaction DB (14 million reactions) and literature DB
• substructure and similarity search of structures
• a must for chemists and biochemists/biologists
• no bulk download, no good import/export, no link-outs
Download SciFinder

Structure search in SciFinder
Retrieved 4000 papers (refine search to MS and MALDI only)

How scientists publish mass spectra (*)
Today: Scientist A runs MS -> publication on paper (PDF) as bitmap graphic -> OCR -> DB curation -> DB creation -> sell DB -> Scientist B needs DB
Better: Scientist A -> central and open repository (electronic publishing in XML; computerized; free or paid curation) -> Scientist B
OCR - optical character recognition
DB - database
(*) - and structures and other spectral data

Open data repository for mass spectra
• Submit spectra before publication (ticket system)
• No loss of information (high resolution spectra)
• No truncated data (report five peaks only)
• No hamburger to cow algorithm needed (OCR)
• Fast and instant use with no restrictions
• New synergism for data interpretation
• Can still cost money (curation)
• Works in genomic sciences (GenBank)
• Commercial use may be possible
… check out the BlueObelisk

The Last Page - What is important to remember
• There are different search types for mass spectral data: similarity search, reverse search, neutral loss search, MS/MS search
• There are large libraries for electron impact spectra (EI) from GC-MS
• There are no large open/commercial libraries for spectra from LC-MS
• For creation of mass spectral libraries a holistic approach is important
• Mass spectral trees can give further information (MSE or MSn)
• There are different types of structure search: exact search, similarity search, substructure search
• Before you start a research project, create target lists of possible candidates
• Collect mass spectra or structures in libraries with references

Reading list (20 min)
• Chemical derivatization and mass spectral libraries in metabolic profiling by GC/MS and LC/MS/MS
• The critical evaluation of a comprehensive mass spectral library
• Development and validation of a spectral library searching method for peptide identification from MS/MS

Additional reading list for very diligent and interested pupils (30 min) (*)
• An MS/MS Library on an Ion-Trap Instrument for Efficient Dereplication of Natural Products.
  Different Fragmentation Patterns for [M + H]+ and [M + Na]+ Ions
• The History of the NIST/EPA/NIH Mass Spectral Database
• (WO2006040622) DETERMINATION OF MOLECULAR STRUCTURES USING TANDEM MASS SPECTROMETRY [Link] [PDF]
(*) Edison: “Two per cent is genius and 98 per cent is hard work” “Bah. Genius is not inspired. Inspiration is perspiration” [SOURCE]

Tasks (7 min): Should be solved and may be graded
1) Go to PubChem [LINK] or Chemspider [LINK] and perform the 3 different structure searches using benzene; report the number of results (use the sketch function to draw benzene: a six-membered ring with three alternating double bonds)
2) Download NIST MS Search [LINK] and perform the 3 different mass spectral searches on cocaine (download the JCAMP-DX file from NIST [link])
3) Use Instant-JChem [LINK] from the last course session and create a local demo database with PubChem data. Perform 3 different structure searches with benzene by double-clicking on the structure search field. Report the number of results.
Additional task for proteomics candidates:
4) Download the NIST peptide search [LINK] and perform a search on the given examples

Link List
http://www.google.com/search?hl=en&q=rearrangements%2C+fragmentations%2C+bond+cleavage&btnG=Search
http://www.massbank.jp/ (High-resolution mass spectral database)
http://www.google.com/search?hl=en&q=mistrik+highchem&btnG=Search
http://www.google.com/search?hl=en&q=stein+se+peptide+search&btnG=Search
http://fields.scripps.edu/sequest/
http://books.google.com/books?lr=&as_brr=0&q=EDISON+Genius+++inspiration+++perspiration+++date%3A1800-1898&btnG=Search+Books
http://allured.stores.yahoo.net/idofesoilbyg.html (fragrances, terpenoid mass spectra SE-52 column + RIs)
http://kanaya.naist.jp/DrDMASS/DrDMASSInstruction.pdf
http://www.google.com/search?q=mass+spectral+libraries+NIST05&hl=en&start=10&sa=N
http://books.google.com/books?id=7IUVi06u0TQC&pg=PA114&lpg=PA114&dq=cid+mass+spectra
http://www.google.com/search?hl=en&q=cid+mass+spectra+library+pbm+dot+product&btnG=Google+Search
http://www.google.com/search?hl=en&q=%22similarity+search%22+Substructure+search%22+%22exact+search%22&btnG=Search
http://mmass.biographics.cz/
http://pubchem.ncbi.nlm.nih.gov/omssa/browser_help.htm#RunOMSSASearchLocalDialog
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1906842
http://www.google.com/search?hl=en&q=proteomics+sequest+mascot++mudpit+OMSSA&btnG=Search
http://www.google.com/search?hl=en&q=de+novo+sequencing+peaks+sequit+lutefisk&btnG=Search
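
Worked example: weighted dot product search
The weighted dot product named on the "Mass spectral search algorithms" slide compares the peak abundances of a user spectrum and a reference spectrum after applying a weighting term. The sketch below is a minimal illustration of that idea, not the NIST implementation. It assumes a normalized (cosine) form and a weight of abundance^a * (m/z)^b; the function name, the exponents a = 0.5 and b = 1.0, and the two toy spectra are all hypothetical choices made for this example. Published variants differ in the exponents and in whether the ratio is squared.

```python
import math


def weighted_dot_product(user, reference, intensity_power=0.5, mz_power=1.0):
    """Cosine-style similarity of two unit-resolution spectra.

    user, reference: dict mapping integer m/z -> abundance (non-negative).
    intensity_power, mz_power: illustrative weighting exponents (assumed
    values, not taken from the slides or from NIST MS Search).
    Returns a score between 0.0 (no shared peaks) and 1.0 (same pattern).
    """

    def weight(mz, abundance):
        # Weighting term w: damp intense peaks, emphasise high-mass peaks.
        return (abundance ** intensity_power) * (mz ** mz_power)

    w_user = {mz: weight(mz, a) for mz, a in user.items()}
    w_ref = {mz: weight(mz, a) for mz, a in reference.items()}

    # Only peaks present in both spectra contribute to the numerator.
    shared = w_user.keys() & w_ref.keys()
    numerator = sum(w_user[mz] * w_ref[mz] for mz in shared)

    # Normalise by the lengths of both weighted peak vectors.
    norm = math.sqrt(
        sum(v * v for v in w_user.values()) * sum(v * v for v in w_ref.values())
    )
    return numerator / norm if norm else 0.0


# Toy spectra (hypothetical values, not taken from any library).
query = {43: 100.0, 55: 80.0, 69: 60.0, 83: 40.0, 97: 30.0}
library_entry = {43: 95.0, 55: 85.0, 69: 55.0, 83: 45.0, 111: 10.0}

score = weighted_dot_product(query, library_entry)
print(f"match score: {score:.3f}")
```

A library search would compute this score for the query against every reference spectrum and sort the hits by score. Because the score depends only on the peak pattern, two isomers with near-identical spectra will both score highly, which is why the slides stress orthogonal information such as retention index.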