Data Analysis Pipeline from heterogeneous MS

BIOM 209/CHEM 210/PHARM 209
Interrogating Gene, Protein and Lipid Databases:
A Bioinformatics Perspective
Dr. Eoin Fahy, University of California San Diego
Professor Edward A. Dennis
Department of Chemistry and Biochemistry
Department of Pharmacology, School of Medicine
University of California, San Diego
Copyright/attribution notice: You are free to copy, distribute, adapt and transmit this tutorial
or individual slides (without alteration) for academic, non-profit and non-commercial
purposes. Attribution: Edward A. Dennis (2010) “LIPID MAPS Lipid Metabolomics Tutorial”
www.lipidmaps.org
E.A. DENNIS 2016 ©
Definition of a lipid*
Lipids may be broadly defined as hydrophobic or amphiphilic small molecules that originate entirely or in part from two distinct types of biochemical subunits or "building blocks": ketoacyl and isoprene groups. Using this approach, lipids may be divided into eight categories: fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, saccharolipids and polyketides (derived from condensation of ketoacyl subunits); and sterol lipids and prenol lipids (derived from condensation of isoprene subunits).
* Fahy, E. et al., Journal of Lipid Research, Vol. 46, 839-862, May 2005
Fundamental biosynthetic units of lipids
Lipid classification: biosynthetic routes
LIPID MAPS Classification System
Categories and Examples
Category (Abbreviation): Example
Fatty acyls (FA): Dodecanoic acid
Glycerolipids (GL): 1-hexadecanoyl-2-(9Z-octadecenoyl)-sn-glycerol
Glycerophospholipids (GP): 1-hexadecanoyl-2-(9Z-octadecenoyl)-sn-glycero-3-phosphocholine
Sphingolipids (SP): N-(tetradecanoyl)-sphing-4-enine
Sterol lipids (ST): Cholest-5-en-3β-ol
Prenol lipids (PR): 2E,6E-farnesol
Saccharolipids (SL): UDP-3-O-(3R-hydroxy-tetradecanoyl)-αD-N-acetylglucosamine
Polyketides (PK): Aflatoxin B1
J. Lipid Res. Classification publications
Journal of Lipid Research, Vol. 46, 839-862, May 2005
Journal of Lipid Research, 50th anniversary edition, May 2009
LIPID MAPS Lipid classification system
Category (Abbrev): Example
Fatty Acyls (FA): Arachidonic acid
Glycerolipids (GL): 1-hexadecanoyl-sn-glycerol
Glycerophospholipids (GP): 1-hexadecanoyl-2-(9Z-octadecenoyl)-sn-glycero-3-phosphocholine
Sphingolipids (SP): Sphingosine
Sterol Lipids (ST): Cholesterol
Prenol Lipids (PR): Retinol
Saccharolipids (SL): Kdo2-lipid A
Polyketides (PK): Epothilone D
Name: PGE2
LM_ID: LMFA03010003
LM_ID description:
Database: LM (LIPID MAPS)
Category: FA (Fatty Acyls)
Main Class: 03 (Eicosanoids)
Sub Class: 01 (Prostaglandins)
Unique identifier within a sub class: 0003
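The LM_ID breakdown above is regular enough to parse mechanically. The Python sketch below assumes the 12-character layout shown for PGE2 (LM, two-letter category, two-digit main class, two-digit sub class, four-character identifier); `parse_lm_id` and its output keys are illustrative names, not part of any LIPID MAPS library.

```python
import re

# Category codes from the LIPID MAPS classification system.
CATEGORIES = {
    "FA": "Fatty Acyls",
    "GL": "Glycerolipids",
    "GP": "Glycerophospholipids",
    "SP": "Sphingolipids",
    "ST": "Sterol Lipids",
    "PR": "Prenol Lipids",
    "SL": "Saccharolipids",
    "PK": "Polyketides",
}

# Assumed 12-character layout, following the PGE2 example:
# "LM" + category (2 letters) + main class (2 digits)
# + sub class (2 digits) + unique identifier (4 characters).
LM_ID_PATTERN = re.compile(r"^LM([A-Z]{2})(\d{2})(\d{2})([0-9A-Z]{4})$")


def parse_lm_id(lm_id: str) -> dict:
    """Split an LM_ID into the fields listed in the LM_ID description."""
    match = LM_ID_PATTERN.match(lm_id)
    if match is None or match.group(1) not in CATEGORIES:
        raise ValueError(f"not a recognised LM_ID: {lm_id!r}")
    category, main_class, sub_class, unique_id = match.groups()
    return {
        "database": "LM",
        "category": f"{category} ({CATEGORIES[category]})",
        "main_class": main_class,
        "sub_class": sub_class,
        "unique_id": unique_id,
    }


print(parse_lm_id("LMFA03010003"))
# {'database': 'LM', 'category': 'FA (Fatty Acyls)', 'main_class': '03', 'sub_class': '01', 'unique_id': '0003'}
```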
LIPID MAPS: Recommendations for drawing structures
Consistent structure representation across classes
Fatty Acyls (FA)
Glycerolipids (GL)
Glycerophospholipids (GP)
Sphingolipids (SP)
Sterol Lipids (ST)
Prenol Lipids (PR)
Structural comparison of SM and PC
Online lipid structure-drawing tools
http://www.lipidmaps.org/tools/index.html
Online drawing tools for various lipid categories (FA, GL, GP, SP, ST)
Structures may be saved as Molfiles
LIPID MAPS Lipidomics gateway
http://www.lipidmaps.org
[Figure: number of lipids in LMSD by year, 2004-2015]
[Figure: lipids per category in LMSD (FA, GL, GP, SP, ST, PR, SL, PK); total: 40,360]
Populating LIPID MAPS structure database
Sources feeding the LIPID MAPS structure database:
Structures from core labs and partners
New structures identified by LIPID MAPS experiments
Computationally generated structures
Public databases, websites, publications
Search LMSD by browsing classification hierarchy
Search LMSD by structure, text, mass, formula, ontology
Search LMSD with ontology terms
e.g. find all lipids with 20 carbons, 3 double bonds, at least 3 hydroxyl groups and 1 epoxy group
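An ontology query like this one amounts to filtering structures on per-molecule feature counts. In this minimal sketch the `LipidRecord` fields and the toy records are hypothetical stand-ins, not the LMSD schema: carbons, double bonds and epoxides must match exactly, while hydroxyls act as a lower bound ("at least 3").

```python
from dataclasses import dataclass


@dataclass
class LipidRecord:
    """Hypothetical per-structure ontology counts (not the LMSD schema)."""
    lm_id: str
    carbons: int
    double_bonds: int
    hydroxyls: int
    epoxides: int


def ontology_search(records, carbons, double_bonds, min_hydroxyls, epoxides):
    """Exact match on carbons, double bonds and epoxides; lower bound on hydroxyls."""
    return [
        r.lm_id
        for r in records
        if r.carbons == carbons
        and r.double_bonds == double_bonds
        and r.hydroxyls >= min_hydroxyls
        and r.epoxides == epoxides
    ]


# Toy records with made-up IDs, for illustration only.
records = [
    LipidRecord("toy-1", 20, 3, 3, 1),
    LipidRecord("toy-2", 20, 3, 2, 1),  # too few hydroxyls
    LipidRecord("toy-3", 20, 4, 4, 1),  # wrong number of double bonds
    LipidRecord("toy-4", 20, 3, 4, 1),
]
print(ontology_search(records, carbons=20, double_bonds=3, min_hydroxyls=3, epoxides=1))
# ['toy-1', 'toy-4']
```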
LMSD Detail view for a lipid structure
Structure
LM_ID
Names, synonyms
m/z calculation tool
Lipid classification
Database cross-references
InChIKey identifier
MS/MS spectrum
Physicochemical properties
Other structure formats
Alternative lipid subclasses/functionality
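The m/z calculation tool in the detail view rests on simple arithmetic: sum the monoisotopic atomic masses for the formula, then add the mass shift of the chosen ion. The sketch below is not the LMSD tool's code; it handles only simple formulas built from the listed elements, and the adduct offsets are commonly used values that already account for the electron mass.

```python
import re

# Monoisotopic masses of the most abundant isotopes (u).
ATOMIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}

# Commonly used m/z offsets for singly charged ions (electron mass included).
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M+NH4]+": 18.033823,
    "[M-H]-": -1.007276,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum atomic masses for a simple formula such as 'C12H24O2'."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    if "".join(el + n for el, n in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    # KeyError is raised for elements missing from ATOMIC_MASS.
    return sum(ATOMIC_MASS[el] * int(n or 1) for el, n in tokens)


def mz(formula: str, adduct: str) -> float:
    return monoisotopic_mass(formula) + ADDUCT_SHIFT[adduct]


mass = monoisotopic_mass("C12H24O2")      # dodecanoic acid
print(f"{mass:.4f}")                      # 200.1776
print(f"{mz('C12H24O2', '[M-H]-'):.4f}")  # 199.1704
```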
Take advantage of the built-in ontology feature for all lipid structures in LMSD
Use InChIKey to find structures differing only in stereochemistry, double-bond geometry or isotopic labeling
Use InChIKey (full or partial) to perform a Google structure search
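Both InChIKey tips follow from how a standard InChIKey is built: the first 14-letter block hashes the molecular skeleton (connectivity), while stereochemistry, including double-bond geometry, and isotopic layers are hashed into the second block. Comparing first blocks therefore groups such variants, and the first block alone works as a partial search term. In this sketch `group_by_skeleton` is a hypothetical helper, and the second key is constructed for illustration only by reusing the PGE2 skeleton block.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: 14-letter connectivity block, 10-letter block
# (stereochemistry/isotopes, flag, version), 1-letter protonation flag.
INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def skeleton_block(key: str) -> str:
    """Return the first (connectivity) block of a standard InChIKey."""
    match = INCHIKEY.match(key)
    if match is None:
        raise ValueError(f"not a standard InChIKey: {key!r}")
    return match.group(1)


def group_by_skeleton(keys):
    """Group keys that share a skeleton and differ only in later blocks."""
    groups = defaultdict(list)
    for key in keys:
        groups[skeleton_block(key)].append(key)
    return dict(groups)


keys = [
    "XEYBRNLFEZDVAW-ARSRFYASSA-N",  # PGE2
    "XEYBRNLFEZDVAW-UHFFFAOYSA-N",  # same skeleton block, illustrative only
]
print(group_by_skeleton(keys))
```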
[Figure: example results from European Bioinformatics Inst., LIPID MAPS and PubChem]
Querying the Lipidomics Gateway website as well as LIPID MAPS databases via “Quick search”
Multi-purpose
Small “footprint”
High visibility (on home page)
Search the Lipidomics Gateway HTML pages by keyword, or the databases by lipid class, common name, systematic name or synonym, mass, formula, InChIKey, LIPID MAPS ID, gene or protein term.
Quick search query types (query type: example)
LIPID MAPS LM_ID: LMFA03010003
Lipid classification term: “Choline”, “prostaglandin”, “diterpene”
Lipid common/systematic name or synonym: “Linoleic”, “HETE”, “PAF”, “PGE”, “5Z,8Z,14Z-eicosatrienoic”, “PC(16:0/18:1(9Z))”, “MGDG”, “docosa”, “phytosphingosine”
Lipid molecular formula: C12H24O2
InChIKey: XEYBRNLFEZDVAW-ARSRFYASSA-N
Lipid standard (name or LM_ID): sterol
Gene/protein name/synonym: FABP
Keywords on Lipidomics Gateway website pages (personnel, publications, news, updates, etc.): “Atherosclerosis”, “Dennis”, “homeostasis”
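A single search box has to decide how to treat each input. One way is to test the input against the syntax of the structured identifiers in this table (LM_ID, InChIKey, formula, mass) and fall back to text search for everything else. This is a hypothetical routing sketch, not the actual Quick search logic; the regular expressions are assumptions based on the examples listed.

```python
import re

# Hypothetical routing rules, checked in order; not the Quick search implementation.
PATTERNS = [
    ("lm_id", re.compile(r"^LM[A-Z]{2}\d{4}[0-9A-Z]{4}$")),
    ("inchikey", re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")),
    # Formula: starts with C, element symbols with optional counts, at least one digit.
    ("formula", re.compile(r"^(?=.*\d)C\d*(?:[A-Z][a-z]?\d*)*$")),
    ("mass", re.compile(r"^\d+(?:\.\d+)?$")),
]


def classify_query(query: str) -> str:
    q = query.strip()
    for kind, pattern in PATTERNS:
        if pattern.match(q):
            return kind
    # Names, classification terms, gene/protein terms and website keywords
    # cannot be told apart by syntax alone, so they fall through to text search.
    return "text"


for example in ["LMFA03010003", "XEYBRNLFEZDVAW-ARSRFYASSA-N", "C12H24O2", "FABP", "PC(16:0/18:1(9Z))"]:
    print(example, "->", classify_query(example))
```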
Lipid Proteome Database (LMPD)
Species: genes / proteins
Human (Homo sapiens): 1116 / 2273
Mouse (Mus musculus): 1082 / 1504
Rat (Rattus norvegicus): 1258 / 1315
Rhesus monkey (Macaca mulatta): 891 / 1634
Yeast (Saccharomyces cerevisiae S288C): 720 / 720
E. coli (Escherichia coli K12): 245 / 245
C. elegans (Caenorhabditis elegans): 595 / 868
Drosophila (Drosophila melanogaster): 404 / 1064
Arabidopsis (Arabidopsis thaliana): 1829 / 2447
Zebrafish (Danio rerio): 638 / 647
LMPD: data collection strategy
Lipid-related keywords in gene names, metabolic pathways and ontology terms
Manual curation
-> Entrez Gene ID list
-> Python program (queries NCBI Entrez and UniProt)
-> Gene, mRNA, protein data, PTM variants, motifs, homologs, cross-references, related proteins, ontologies, annotations, etc.
-> LMPD database
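The first stage of this strategy, selecting candidate genes by lipid-related keywords and merging in manually curated IDs, can be sketched as below. The keyword list, record fields and IDs are hypothetical; the retrieval of the selected genes from NCBI Entrez and UniProt by the Python program is not reproduced here.

```python
# Hypothetical keyword list; the actual LMPD curation terms are not given here.
LIPID_KEYWORDS = ("lipid", "fatty acid", "phospholipase", "sterol", "sphingo", "prostaglandin")


def select_lipid_genes(genes, curated_ids=()):
    """Return gene IDs whose name, pathways or ontology terms mention a keyword,
    plus any IDs added by manual curation."""
    selected = set(curated_ids)
    for gene in genes:
        text = " ".join([gene["name"], *gene["pathways"], *gene["go_terms"]]).lower()
        if any(keyword in text for keyword in LIPID_KEYWORDS):
            selected.add(gene["gene_id"])
    return sorted(selected)


# Toy records with placeholder IDs (not real Entrez Gene IDs).
genes = [
    {"gene_id": "toy-1", "name": "phospholipase A2 group IVA", "pathways": [], "go_terms": []},
    {"gene_id": "toy-2", "name": "actin beta", "pathways": [], "go_terms": ["cytoskeleton organization"]},
    {"gene_id": "toy-3", "name": "ABC transporter", "pathways": ["Cholesterol metabolism"], "go_terms": []},
]
print(select_lipid_genes(genes, curated_ids=["toy-9"]))
# ['toy-1', 'toy-3', 'toy-9']
```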
LMPD organization:
Gene -> mRNA -> (apo)protein -> mature protein
Entrez Gene ID (DNA/genomic links)
RefSeq mRNA IDs (both coding and UTR variants)
RefSeq protein IDs and sequences (unique isoforms)
Post-translationally modified variants (e.g. apo- and mature forms, leader sequences, etc.)
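The gene -> mRNA -> (apo)protein -> mature protein organization maps naturally onto nested records. In this sketch the class names, fields and placeholder IDs are hypothetical; it shows how transcripts that encode the same sequence (for example UTR variants) collapse to one unique isoform, and how a processed form can be stored as a residue range of its precursor.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProteinForm:
    """A post-translationally processed variant, e.g. an apo- or mature form."""
    label: str
    start: int  # 1-based, inclusive residue range within the precursor
    end: int


@dataclass
class Protein:
    refseq_protein_id: str
    sequence: str
    forms: list[ProteinForm] = field(default_factory=list)

    def form_sequence(self, form: ProteinForm) -> str:
        return self.sequence[form.start - 1 : form.end]


@dataclass
class Transcript:
    refseq_mrna_id: str
    protein: Optional[Protein] = None  # None for non-coding transcripts


@dataclass
class Gene:
    entrez_gene_id: str
    transcripts: list[Transcript] = field(default_factory=list)

    def unique_isoforms(self) -> list[Protein]:
        """Collapse transcripts (e.g. UTR variants) that encode the same sequence."""
        by_sequence: dict[str, Protein] = {}
        for transcript in self.transcripts:
            if transcript.protein is not None:
                by_sequence.setdefault(transcript.protein.sequence, transcript.protein)
        return list(by_sequence.values())


# Placeholder identifiers and a toy sequence, for illustration only.
precursor = Protein("protein-1", "MKTAYIAKQR", [ProteinForm("mature", 3, 10)])
gene = Gene("gene-1", [Transcript("mrna-1", precursor), Transcript("mrna-2", precursor)])
print(len(gene.unique_isoforms()))                  # 1
print(precursor.form_sequence(precursor.forms[0]))  # TAYIAKQR
```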
LMPD query page
LMPD overview page: listing of annotations and isoforms
LMPD gene orthologs, alignments, links
LMPD UniProt, domain/motif, related protein annotations
LMPD gene ontology/pathway annotations
LIPID MAPS REST interface
Different input contexts: compounds, genes, proteins
Output formats: JSON, text, molfile, image
[Figure: example JSON and molfile output]
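Because each REST request is a URL, a small helper can assemble requests for the compound, gene and protein contexts. This sketch assumes the path layout `https://www.lipidmaps.org/rest/{context}/{input_item}/{input_value}/{output_item}/{output_format}`; check the current LIPID MAPS REST documentation before relying on it. The helper names are hypothetical, and `fetch_json` is defined but not called, so the example runs offline.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed URL layout; verify against the current LIPID MAPS REST documentation.
BASE_URL = "https://www.lipidmaps.org/rest"
CONTEXTS = {"compound", "gene", "protein"}


def rest_url(context, input_item, input_value, output_item="all", output_format="json"):
    if context not in CONTEXTS:
        raise ValueError(f"unknown context: {context!r}")
    # Percent-encode the value so names like 'PC(16:0/18:1(9Z))' stay in one path segment.
    segments = [context, input_item, quote(input_value, safe=""), output_item, output_format]
    return "/".join([BASE_URL, *segments])


def fetch_json(url, timeout=30):
    """Network call; not executed in the example below."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(rest_url("compound", "lm_id", "LMFA03010003"))
# https://www.lipidmaps.org/rest/compound/lm_id/LMFA03010003/all/json
```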
LIPID MAPS lipidomic pathways
Cholesterol Biosynthesis
TLR4 signaling pathway
Overview of Quantitative Lipid Analysis by Mass Spectrometry
as performed by the LIPID MAPS consortium on bone marrow-derived macrophages (BMDM)
www.lipidmaps.org
LIPID MAPS is funded by a Glue Grant from NIGMS (www.nigms.nih.gov)
Sample workflow: extract bone marrow cells; transfer to plates; repeat 3x (replicates); perform timecourse experiment on plated cells; aliquot samples for shipping to core research labs
Fatty acids: methanolic HCl/isooctane extraction; GC/MS analysis; deuterated standards
Eicosanoids: separate media from cells; SPE extraction of media; LC/MS analysis (reverse phase) on ESI-QTRAP via MRM methods; deuterated standards
Glycerophospholipids and cardiolipins: methanolic HCl/CHCl3 and methanol/CHCl3 extraction; LC/MS analysis (normal phase) by ESI-QTRAP 2-stage quantitation and ESI-QSTAR-XL using MS/MS methods; odd-chain standards
Glycerolipids and cholesteryl esters: EtOAc/isooctane extraction; DFPI derivatization of DAGs; LC/MS analysis (normal phase) on ESI-QTRAP in [M+NH4]+ and [M+NH4]+/neutral loss detection modes; deuterated standards
Sterols and sphingolipids: saponification; methanol/CHCl3 + methanolic KOH extraction; methanol/CHCl3 extraction
Sterols: combination of GC/MS, LC/MS (reverse phase) on ESI-QTRAP and APCI-MS analysis; deuterated standards
Sphingolipids: combination of LC-C18, LC-Si and LC-NH2 separation; ESI-QTRAP and API-3000 triple quad detection with MRM methods; C12 analog standards
Prenols: methanol/CHCl3 extraction; SPE extraction; LC/MS analysis (reverse phase) on QSTAR-XL via MRM methods; nor-dolichol/CoQ6 standards
BIOINFORMATICS
Data consolidation, normalization, statistical analysis and databasing
Presentation in tabular and graphical formats
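Quantitation against the deuterated, odd-chain, C12 analog and other internal standards listed above often reduces to a peak-area ratio scaled by the amount of standard spiked in. The sketch below shows that single-point calculation plus a replicate summary; it assumes equal response for analyte and standard unless a `response_factor` is supplied, and the peak areas are toy numbers, not LIPID MAPS data.

```python
from statistics import mean, stdev


def quantify(analyte_area, standard_area, standard_pmol, response_factor=1.0):
    """Single-point internal-standard quantitation.

    response_factor is the analyte/standard response ratio per mole; 1.0 assumes
    the internal standard ionizes like the analyte.
    """
    return (analyte_area / standard_area) * standard_pmol / response_factor


# Toy peak areas (analyte, standard) for three replicates, each spiked with 10 pmol standard.
replicates = [(5000.0, 10000.0), (5600.0, 10000.0), (5300.0, 10000.0)]
amounts = [quantify(a, s, standard_pmol=10.0) for a, s in replicates]
print(f"{mean(amounts):.2f} +/- {stdev(amounts):.2f} pmol")  # 5.30 +/- 0.30 pmol
```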
For details of extraction, purification and quantitation by MS, see:
Quehenberger O et al. Lipidomics reveals a remarkable diversity of lipids in human plasma. J Lipid Res 51, 3299-3305 (2010).
Dennis EA et al. A mouse macrophage lipidome. J Biol Chem 285, 39976-39985 (2010).
Brown HA (ed.). Methods Enzymol. 2007; Vol. 432 (multiple chapters).
LIPID MAPS Bioinformatics Core, UCSD, 9500 Gilman Dr, La Jolla, CA 92093; Department of Bioengineering, UCSD, 9500 Gilman Dr, La Jolla, CA 92093
Data presentation formats
Graphical
Tabular
Heatmap: lipids
Integrated pathway/heatmap: genes
Dennis et al. (2010) J. Biol. Chem. 285, 39976-85
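Timecourse heatmaps of lipids or genes are often built from log-ratios against a baseline. This sketch assumes the first time point is the baseline and uses log2 fold changes with a crude text rendering; the baseline choice, the 0.5 log2-unit threshold and the data are illustrative assumptions, not the published normalization.

```python
import math


def log2_fold_changes(timecourse):
    """Map lipid -> list of log2(amount / amount at first time point)."""
    result = {}
    for lipid, amounts in timecourse.items():
        baseline = amounts[0]
        if any(a <= 0 for a in amounts):
            raise ValueError(f"{lipid}: log ratios need positive amounts")
        result[lipid] = [math.log2(a / baseline) for a in amounts]
    return result


def text_heatmap(matrix):
    """Crude text rendering: '+' up, '-' down, '.' within 0.5 log2 units of baseline."""
    rows = []
    for lipid, values in matrix.items():
        cells = "".join("+" if v > 0.5 else "-" if v < -0.5 else "." for v in values)
        rows.append(f"{lipid:<10} {cells}")
    return "\n".join(rows)


# Toy amounts (pmol) at three time points, for illustration only.
data = {"PGE2": [1.0, 2.0, 8.0], "PC(34:1)": [10.0, 10.0, 5.0]}
fc = log2_fold_changes(data)
print(fc)  # {'PGE2': [0.0, 1.0, 3.0], 'PC(34:1)': [0.0, 0.0, -1.0]}
print(text_heatmap(fc))
```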
E. Fahy 2010 ©
Online lipid structure-drawing tools
http://www.lipidmaps.org/tools/index.html
Online drawing tools for various lipid categories (FA, GL, GP, SP, ST)
Structures viewable in Marvin, Jmol and ChemDraw format; may be saved as Molfiles
Online generation of glycan structures in full chair conformation
http://www.lipidmaps.org/tools/index.html
Sugars: Glc, Gal, GlcNAc, GalNAc, Xyl, Fuc, Man, NeuAc, NeuGc, KDN
Anomeric carbon: α or β linkages may be specified
Mass spectrometry prediction tools
Using virtual databases of structures based on commonly occurring core structures and chains
Using known lipids in the LIPID MAPS structure database (LMSD)
Creation of a virtual lipid database
Choice of range of acyl/alkyl chains
These are used to create “bulk” species, e.g. PC(38:4), PE(O-36:0), Cer(d32:1), HexCer(d40:2), TG(54:2), DG(32:0), FA(20:3(OH)), CE(18:1)
Conservative approach: stereochemistry, sn (glycerol) position, double bond/functional group regiochemistry and double bond geometry are not defined.
Links to: On-demand expansion of all possible chain combinations (within defined limits)
Links to: Matches of bulk species to discrete structures in LMSD database (examples)
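Collapsing discrete chains into a bulk species amounts to summing carbons and double bonds while keeping any ether (O-) or sphingoid base (d, t) prefix. The minimal sketch below reproduces four of the bulk examples listed; `bulk_abbreviation` is a hypothetical name, it ignores modifications such as (OH), and Cer(d18:1/14:0) corresponds to the N-(tetradecanoyl)-sphing-4-enine example in the category table.

```python
import re

# Chain notation handled by this sketch: optional ether (O-) or sphingoid
# base (d, t) prefix, then carbons:double bonds. Modifications are not handled.
CHAIN = re.compile(r"^(O-|d|t)?(\d+):(\d+)$")


def bulk_abbreviation(lipid_class, chains):
    """Collapse discrete chains into a 'bulk' species, e.g. PC(16:0/22:4) -> PC(38:4)."""
    prefixes, carbons, double_bonds = set(), 0, 0
    for chain in chains:
        match = CHAIN.match(chain)
        if match is None:
            raise ValueError(f"unsupported chain: {chain!r}")
        prefix, c, db = match.groups()
        if prefix:
            prefixes.add(prefix)
        carbons += int(c)
        double_bonds += int(db)
    if len(prefixes) > 1:
        raise ValueError(f"mixed chain prefixes: {sorted(prefixes)}")
    prefix = prefixes.pop() if prefixes else ""
    return f"{lipid_class}({prefix}{carbons}:{double_bonds})"


print(bulk_abbreviation("PC", ["16:0", "22:4"]))          # PC(38:4)
print(bulk_abbreviation("PE", ["O-16:0", "20:0"]))        # PE(O-36:0)
print(bulk_abbreviation("Cer", ["d18:1", "14:0"]))        # Cer(d32:1)
print(bulk_abbreviation("TG", ["18:1", "18:1", "18:0"]))  # TG(54:2)
```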
Enumeration of “bulk” lipid species from selected lists of acyl/alkyl chains
Suite of combinatorial expansion tools: Glycerolipids, Acyl CoAs, Phospholipids, Acyl carnitines, Cardiolipins, Chol. esters, Sphingolipids, Wax esters, Fatty acids
Output: database of lipid “bulk” species, exact masses, formulae, annotations
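Enumerating bulk species from a chain list is a combinations-with-replacement problem: each pick of chains maps to one bulk species, and several picks can collapse onto the same one. This sketch handles plain acyl chains only and omits the exact masses and formulae the tools also compute; `enumerate_bulk_species` and the chain list are illustrative.

```python
from itertools import combinations_with_replacement


def parse_chain(chain):
    carbons, double_bonds = chain.split(":")
    return int(carbons), int(double_bonds)


def enumerate_bulk_species(lipid_class, chains, n_chains):
    """Map each distinct bulk species to the chain combinations that produce it."""
    species = {}
    for combo in combinations_with_replacement(chains, n_chains):
        parsed = [parse_chain(c) for c in combo]
        key = (sum(c for c, _ in parsed), sum(db for _, db in parsed))
        species.setdefault(key, []).append("_".join(combo))
    # Sort by total carbons, then total double bonds.
    return {f"{lipid_class}({c}:{db})": combos for (c, db), combos in sorted(species.items())}


bulk = enumerate_bulk_species("DG", ["16:0", "18:1", "18:2", "20:4"], n_chains=2)
print(list(bulk))
# ['DG(32:0)', 'DG(34:1)', 'DG(34:2)', 'DG(36:2)', 'DG(36:3)', 'DG(36:4)', 'DG(38:5)', 'DG(38:6)', 'DG(40:8)']
print(bulk["DG(36:4)"])  # ['16:0_20:4', '18:2_18:2']
```

Ten chain combinations collapse onto nine bulk species here, because 16:0_20:4 and 18:2_18:2 share the same totals.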
Virtual database of bulk lipids: number of entries per class
Fatty acids: 13590
Acyl CoAs: 78
Acyl carnitines: 78
Wax esters: 403
Chol. esters: 78
Monoradylglycerols: 84
Diradylglycerols: 615
Triradylglycerols: 1844
Monogalactosyl DGs: 553
Digalactosyl DGs: 553
Sulfoquinovosyl DGs: 553
PA: 696
PC: 696
PE: 696
PG: 696
PI: 696
PIP: 696
PS: 696
Cardiolipins: 375
Ceramides: 258
Ceramide phosphates: 258
PE-Ceramides: 230
PI-Ceramides: 230
Mannosyl-IP-ceramides: 258
Mannosyl-di-IP-ceramides: 258
Hexosyl ceramides: 258
Lactosyl ceramides: 258
Sphingomyelins: 258
Sulfatides: 258
Precursor ion search interface to virtual database
Input: Either copy/paste a list of precursor ions or upload a peaklist file
Input parameters: Mass tolerance, ion type, all chains or even chains, sort results
Optionally restrict search to one or multiple lipid species
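At its core, a precursor ion search compares each observed m/z with the theoretical ion m/z of every candidate bulk species and keeps those within the mass tolerance. In this sketch a three-entry table stands in for the virtual database, with neutral monoisotopic masses computed from the formulas in the comments; the function name, the ppm tolerance convention and the `classes` filter argument are assumptions.

```python
# Neutral monoisotopic masses (u) computed from the formulas shown; a real
# virtual database holds thousands of bulk species.
BULK_SPECIES = [
    ("PC", "PC(34:1)", 759.5778),  # C42H82NO8P
    ("PC", "PC(36:4)", 781.5622),  # C44H80NO8P
    ("PE", "PE(38:4)", 767.5465),  # C43H78NO8P
]
ION_SHIFT = {"[M+H]+": 1.007276, "[M+Na]+": 22.989218, "[M-H]-": -1.007276}


def precursor_search(peaks, tolerance_ppm=5.0, ion="[M+H]+", classes=None):
    """Return {observed m/z: [(bulk species, theoretical m/z, error ppm), ...]}."""
    results = {}
    for observed in peaks:
        hits = []
        for lipid_class, name, neutral_mass in BULK_SPECIES:
            if classes is not None and lipid_class not in classes:
                continue  # optional restriction to selected lipid classes
            theoretical = neutral_mass + ION_SHIFT[ion]
            error_ppm = (observed - theoretical) / theoretical * 1e6
            if abs(error_ppm) <= tolerance_ppm:
                hits.append((name, round(theoretical, 4), round(error_ppm, 1)))
        results[observed] = sorted(hits, key=lambda hit: abs(hit[2]))
    return results


print(precursor_search([760.5851, 768.5538, 700.0]))
```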
Results page for precursor ion search
Output: view in online format (below) or as tab-delimited text file
Output features: Sub-table for each input ion.
Links: On-demand expansion of all possible chain combinations (abbreviation)
Links: Matches of bulk species to discrete structures in LMSD database (examples)
Expansion of species level to display all possible chain combinations within defined chain and chain/double-bond ratio limits
Links to examples of discrete structures in the LMSD database with the identical bulk structure
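Expanding a bulk species back to candidate chains means splitting its total carbons and double bonds between the chains within set limits. This sketch covers two-chain classes only; the carbon range, the even-chain option and especially the double-bond limit in `max_double_bonds` are hypothetical stand-ins for the tool's defined limits. Because sn positions are undefined at the bulk level, each unordered pair is reported once, joined with "_".

```python
import re

BULK = re.compile(r"^([A-Za-z]+)\((\d+):(\d+)\)$")


def max_double_bonds(carbons):
    """Hypothetical chain/double-bond limit: at most (carbons - 10) // 2 double bonds."""
    return max(0, (carbons - 10) // 2)


def expand_bulk(bulk, min_carbons=14, max_carbons=24, even_only=True):
    """List two-chain combinations consistent with a bulk species such as PC(38:4)."""
    match = BULK.match(bulk)
    if match is None:
        raise ValueError(f"unsupported bulk species: {bulk!r}")
    lipid_class, total_c, total_db = match.group(1), int(match.group(2)), int(match.group(3))
    combos = []
    for c1 in range(min_carbons, max_carbons + 1):
        c2 = total_c - c1
        if c2 < c1 or c2 > max_carbons:
            continue  # c2 < c1 would repeat a pair already listed
        if even_only and (c1 % 2 or c2 % 2):
            continue
        for d1 in range(max_double_bonds(c1) + 1):
            d2 = total_db - d1
            if d2 < 0 or d2 > max_double_bonds(c2) or (c1 == c2 and d2 < d1):
                continue
            combos.append(f"{lipid_class}({c1}:{d1}_{c2}:{d2})")
    return combos


expanded = expand_bulk("PC(38:4)")
print(len(expanded))  # 12
print(expanded[:3])   # ['PC(14:0_24:4)', 'PC(14:1_24:3)', 'PC(14:2_24:2)']
```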
*This feature was implemented by computing the “bulk” abbreviation (where possible) for every structure in the LMSD database
Educating the public about lipids
Educating the public about lipids: LIPID MAPS tutorials
http://www.lipidmaps.org