PPT - Bioinformatics Research Group at SRI International

Dealing With the Unknown
Metabolomics & Metabolite Atlases
Ben Bowen
Pathway Tools Workshop 2010
Acknowledgements
Trent Northen
Richard Baran
Wolfgang Reindl
Do Yup Lee
Jane Tanamachi
Jill Banfield
Curt Fisher
Paul Wilmes
US Department of Energy
BER Genome Sciences
Program
LC-MS/MS Workflow
metabolite
solvent extraction
Sample independent:
suitable for
unsequenced
organisms and
communities
HPLC
(C18;
hilic)
C18NEG/255.22807/3.39329/Hexadecanoic acid;
C18NEG/255.22862/4.89002/Hexadecanoic acid;
C18NEG/248.8424/1.47135/2,4-Dibromophenol;
C18NEG/112.98576/27.34079/Acetylenedicarboxylate;
C18NEG/270.82471/1.34821/
C18NEG/168.88735/1.29241/
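The feature identifiers above appear to follow a method/m/z/retention time/name pattern, with an empty name for unannotated features. A minimal sketch of parsing them, assuming that field order; the `Feature` type and `parse_feature` function are hypothetical names, not part of the original workflow:

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    method: str           # e.g. "C18NEG": column plus ionization polarity (assumed meaning)
    mz: float             # measured m/z
    rt: float             # retention time, in the units used on the slide
    name: Optional[str]   # None when the feature has no annotation


def parse_feature(text: str) -> Feature:
    """Split 'METHOD/mz/rt/name;' into typed fields."""
    method, mz, rt, name = text.strip().rstrip(";").split("/", 3)
    return Feature(method, float(mz), float(rt), name.strip() or None)
```

For example, `parse_feature("C18NEG/270.82471/1.34821/")` yields a feature whose `name` is `None`, which makes unannotated features easy to select.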
Metabolite ‘features’
&
Quantification
AGILENT 6520 QTOF
MS/MS
How a data point becomes a compound
From
Feature to
Formula
Photo: John Waterbury, Woods Hole
Oceanographic Institute (DOE)
Annotation
of
Metabolite
Atlases
From
Formula to
Compound
•Selection of features
•Pure Spectra
•Isotopic pattern fitting
•Stable Isotope Labeling
•Exact Match to MS/MS Spectra
•Partial Match to MS/MS Spectra
•Exchangeable hydrogen
•Retention time
•Authentic standards
•Other (NMR & Synthesis)
•Define feature in database
•Sample Metadata
•Extraction methods
•LC/MS methods
•mz@rt annotations
Systems biology depends on accurate
models
Analysis of MetaCyc shows that many unique formulas appear in only a few reactions or pathways
Pathway Specific Markers
Or
Sparsity of Knowledge
•Models provide a framework to prove or disprove observations.
•Highlight gaps in annotations when new compounds are
discovered
Using inexact mass for formula ID
C & N Isotopic Labels
Isotopic Pattern Fitting
Reduce Degeneracy About m/z value
Mass and Degeneracy are Correlated
Heuristically Filtered
Brute Force Method
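To illustrate why an inexact mass leaves several candidate formulas (degeneracy), here is a minimal brute-force sketch restricted to C, H, N and O. The element limits, the ppm tolerance, and the single valence-style heuristic (H <= 2C + N + 2) are assumptions for illustration; the slides do not state which heuristics were actually used.

```python
# Monoisotopic masses in Da
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def candidate_formulas(neutral_mass, ppm=5.0, max_c=30, max_n=6, max_o=12,
                       heuristic=False):
    """Enumerate CHNO formulas whose monoisotopic mass lies within `ppm`."""
    tol = neutral_mass * ppm * 1e-6
    hits = []
    for c in range(max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
                # Only the nearest hydrogen count can fall inside a ppm-level window.
                h = round((neutral_mass - rest) / MONO["H"])
                if h < 0:
                    continue
                if heuristic and h > 2 * c + n + 2:
                    continue  # more hydrogens than the valences allow
                mass = rest + h * MONO["H"]
                if abs(mass - neutral_mass) <= tol:
                    hits.append((f"C{c}H{h}N{n}O{o}", mass))
    return hits
```

Calling `candidate_formulas(204.08988)` includes C11H12N2O2 among its hits; widening `ppm` or raising the target mass increases the number of candidates, which is the degeneracy the slide refers to.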
Large-scale formula determination using
stable isotopic labeling
PROBLEM: Many metabolites are difficult to ID given the low coverage of authentic standards
Approach: Stable isotope labeling (SIL) for direct empirical formula determination
Conditions: Control, Na15NO3, NaH13CO3
Baran et al., Untargeted metabolite profiling of Synechococcus sp. PCC 7002 reveals a large fraction of unexpected metabolites (Analytical Chemistry 2010)
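With uniform labeling, the number of carbon and nitrogen atoms in a feature can be read from the m/z shift between the control and the labeled culture. A minimal sketch, assuming complete incorporation and a known charge; the function name and the example m/z values are illustrative, not taken from the paper:

```python
DELTA_13C = 1.0033548  # mass difference 13C - 12C, Da
DELTA_15N = 0.9970349  # mass difference 15N - 14N, Da


def atom_count(mz_unlabeled, mz_labeled, delta, charge=1):
    """Number of labeled atoms implied by the m/z shift of a fully labeled ion."""
    return round((mz_labeled - mz_unlabeled) * abs(charge) / delta)


# Illustrative values for a singly charged ion at m/z 205.0972
n_carbon = atom_count(205.0972, 216.1341, DELTA_13C)
n_nitrogen = atom_count(205.0972, 207.0913, DELTA_15N)
```

Fixing the C and N counts this way removes most candidate formulas for a given mass, which is why SIL is combined with the formula search.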
Less Degeneracy Isn’t Better
We Prefer to Work With Unique Chemical Formulae
Heuristically Filtered Only
Unfiltered + SIL
Heuristically Filtered + SIL
Noise & Isotopic Patterns
Initial focus is on Synechococcus sp., a simple yet important model system
Widely distributed and globally important in carbon cycling
Simple system for method development
1. Photosynthetic bacteria
2. Small genome (3299 ORFs)
3. Relatively fast growing and easy to grow
4. No metabolite background (salt media)
5. Adaptable: 0-2 M salt, T up to 45 °C
Benefits of Using SIL
• Are the signals being
measured biological?
• What type of ion is the
signal?
• Has this signal been seen
before?
• What compound(s) is it?
• What else in the sample
behaves like that
compound?
Global
Profiling
SIL
Standards
Stable isotope labeling
Control
[15N]NaNO3 (15N)
[13C]NaHCO3 (13C)
Non-biological features dominate
•Manually curated
•Computationally Identified
•Sets are constructed by grouping features by retention time
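Grouping features by retention time can be sketched as a single pass over features sorted by retention time, starting a new set whenever the gap to the previous feature exceeds a tolerance. The tolerance value and the (m/z, retention time) tuple layout are assumptions for illustration:

```python
def group_by_retention_time(features, rt_tol=0.1):
    """Group (mz, rt) features whose retention times chain within rt_tol."""
    groups = []
    for feat in sorted(features, key=lambda f: f[1]):
        # Compare against the last feature already placed in the current group.
        if groups and feat[1] - groups[-1][-1][1] <= rt_tol:
            groups[-1].append(feat)
        else:
            groups.append([feat])
    return groups
```

Because neighbours are chained, a long run of closely spaced features ends up in one set; a fixed-width window would be an alternative design if that is undesirable.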
Results
• ~100 distinct metabolites detected
• 82 assigned chemical formulas
  - 74 unique
  - 45 outside of Syn7002Cyc
  - 24 outside of MetaCyc or KEGG
• 54 identified or putatively identified metabolites
  - Using authentic standards or MS/MS
Most dominant biological features
Peak height columns: cell extract (+), (-); media extract (+), (-)
Formula match columns: 7002, MetaCyc, KEGG

Formula       Metabolite                       Peak heights      7002  MetaCyc  KEGG
C9H18O8       (Glucosylglycerol)               452242, 658300    1     2        2
C5H9NO4       Glutamate                        228714, 44229     3     9        10
C25H40N2O18   (Hexos(amine)-based oligomer)    184691, 90745     0     0        0
C25H40N2O18   (Hexos(amine)-based oligomer)    174581, 152126    0     0        0
C9H16O9       (Glucosylglycerate)              39066, 163000     0     2        1
C12H22O11     (2Hexoses-H2O)                   19819, 83700      2     26       29
C9H15N3O2     (NNN-trimethylhistidine)         69974, 2444       0     1        1
Putative hexose(amine)-based trisaccharide:
Excreted metabolites
Peak height columns: cell extract (+), (-); media extract (+), (-)
Formula match columns: 7002, MetaCyc, KEGG

Formula       Metabolite         Peak heights                 7002  MetaCyc  KEGG
C9H11NO2      Phenylalanine      12860, 8878, 24417, 8259     1     4        4
C3H7NO2       (Alanine)          3987, 7325, 2479, 1500       4     7        8
C6H13NO2      Isoleucine         1200, 1301, 4427, 1532       2     8        11
C6H13NO2      Leucine            2089, 1992, 4093, 1707       2     8        11
C11H12N2O2    Tryptophan         1778, 2264, 929              1     2        7
C5H11NO2S     Methionine         950                          1     5        4
C5H11NO2      Valine             600                          1     8        10
C10H14N2O6    Methyluridine      570, 220                     0     0        2
C11H15N5O5    Methylguanosine    350, 140                     0     3        1
C11H15N5O4    Methyladenosine    310                          0     1        2
Histidine-betaine derivatives
[Figure: structures of histidine-betaine derivatives]
Previously attributed only to non-yeast fungi and Actinomycetales bacteria
Culture purity validated by PCR of ribosomal RNA markers and sequencing
N2-acetyllysine
Lysine biosynthesis VI (Syn7002Cyc)
Lysine biosynthesis V (Syn7002Cyc)
Analyze selected features by MS/MS
Target features at specific
m/z & r.t.
MS/MS structural confirmation
• Commercial Standards
• Metlin
• Massbank
• Collaborating to expand the number of authentic standards (Siuzdak, Mukhopadhyay) and make these publicly available.
De novo MS/MS analysis
5-methyluridine
Proton Painting
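$\mathrm{C}_i\mathrm{H}_j\mathrm{O}_k\mathrm{N}_x\mathrm{P}_y\mathrm{S}_z \rightarrow \mathrm{C}_i\,(\mathrm{H}^{\mathrm{N}}_{j_1}\mathrm{H}^{\mathrm{EX}}_{j_2})\,\mathrm{O}_k\mathrm{N}_x\mathrm{P}_y\mathrm{S}_z, \qquad j = j_1 + j_2$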
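The split j = j1 + j2 can be estimated from a deuterium exchange experiment: each exchanged hydrogen shifts the mass by the 2H - 1H difference. The sketch below reads j2 as the exchangeable hydrogens, which is an assumption about the notation, and it ignores any contribution from the charge carrier; the function name and example values are illustrative.

```python
DELTA_2H = 1.00627675  # mass difference 2H - 1H, Da


def paint_protons(total_h, mz_unlabeled, mz_deuterated, charge=1):
    """Return (j1, j2): non-exchanged and exchanged hydrogen counts."""
    j2 = round((mz_deuterated - mz_unlabeled) * abs(charge) / DELTA_2H)
    if not 0 <= j2 <= total_h:
        raise ValueError("mass shift is inconsistent with the hydrogen count")
    return total_h - j2, j2


# Illustrative: a formula with 8 hydrogens whose ion shifts by four deuterium exchanges
j1, j2 = paint_protons(8, 133.0608, 133.0608 + 4 * DELTA_2H)
```

Two isomers with the same formula but different numbers of O-H and N-H groups would give different (j1, j2) pairs, which is the extra chemical property this approach adds beyond m/z.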
Chemical properties in addition to m/z
decyldimethylammoniopropane sulfonate
Glycylglycine
Lipids from microbial communities
• Unlabeled
• 15N labeled
• 2H labeled (exchangeable)
• Sample independent
Resolve Isomers of lysolipids
Pure-Spectra Includes Ca2+ & Fe2+
Adducts
Absolute abundance of L-PE features is much
higher in a “friable” sample.
AB Muck
DS2
AB Muck
Friable
Relative abundance of various PEs changes
with development stage.
Moving from features to formulas to
metabolites is challenging
m/z 205.097
Chemical
formula
determination
Time (sec)
C11H12N2O2
Structural
analysis
After 12
Observations
Retention Time Correlation
Store retention time correlations
SIL Automatic Annotation
Test the fit for all possible
formulas for common
ionization mechanisms
Label Purity
and Percent
Incorporation
are Parameters
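When labeling is incomplete, the labeled peak spreads over several isotopologues. A minimal sketch of the expected pattern, assuming each atom independently carries the heavy isotope with one effective probability (how label purity and percent incorporation combine into that probability is not stated in the slides, so the single parameter here is a simplification):

```python
from math import comb


def labeled_isotopologues(n_atoms, incorporation):
    """Binomial probabilities of 0..n_atoms heavy atoms in a molecule."""
    p = incorporation
    return [comb(n_atoms, k) * p**k * (1 - p) ** (n_atoms - k)
            for k in range(n_atoms + 1)]


# Illustrative: 11 carbons at an assumed 98% effective 13C probability
pattern = labeled_isotopologues(11, 0.98)
```

Fitting an observed pattern against this model for each candidate formula is one way to test "the fit for all possible formulas", with the probability treated as a free parameter.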
Correlation and mass defect analysis
[Figure: autocorrelation G versus m/z lag between 28 and 28.06, with C2H4 peaks marked; Kendrick Mass Defect versus Nominal Mass, shown for nominal mass 200-800 and zoomed to 650-800]
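For reference, a Kendrick mass defect can be computed by rescaling masses so that a chosen repeat unit has an integer mass; members of a homologous series then share the same defect. The sketch assumes CH2 as the base unit and takes the defect as Kendrick mass minus its nearest integer. Both choices are assumptions, the sign convention being picked only because the extracted axis values are negative.

```python
CH2_EXACT = 14.01565006  # monoisotopic mass of CH2, Da


def kendrick_mass_defect(mass, base_exact=CH2_EXACT, base_nominal=14):
    """Kendrick mass defect of `mass` relative to the chosen repeat unit."""
    kendrick_mass = mass * base_nominal / base_exact
    return kendrick_mass - round(kendrick_mass)


# Two members of a CH2 series share the same defect (illustrative masses)
kmd_a = kendrick_mass_defect(700.5)
kmd_b = kendrick_mass_defect(700.5 + CH2_EXACT)
```

Passing the exact and nominal masses of C2H4 (28.0313 and 28) instead would align series that differ by two-carbon units.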
Modular Metabolome
Autocorrelation Spectra of
unprocessed data
H2O
Find the dominant mass differences in data
Estimate the likelihood of all possible
chemical differences
How can you know that this is CH2?
[Figure: Correlation G versus m/z lag, 13.99 to 14.06]
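Finding dominant mass differences can be sketched as an intensity-weighted histogram of pairwise m/z differences, followed by a lookup of the nearest known chemical difference. This is an illustration rather than the method used in the talk; the bin width, the lag cutoff, and the two-entry candidate table are assumptions.

```python
from collections import Counter

# Illustrative candidates that share nominal mass 14
CANDIDATES = {"CH2": 14.01565, "N": 14.00307}


def mass_difference_spectrum(peaks, bin_width=0.001, max_lag=50.0):
    """Sum intensity products of all peak pairs, binned by m/z difference."""
    g = Counter()
    peaks = sorted(peaks)  # (mz, intensity), ascending m/z
    for i, (mz_i, int_i) in enumerate(peaks):
        for mz_j, int_j in peaks[i + 1:]:
            lag = mz_j - mz_i
            if lag > max_lag:
                break  # later peaks are even further away
            g[round(lag / bin_width)] += int_i * int_j
    return {k * bin_width: v for k, v in g.items()}


def closest_difference(lag, candidates=CANDIDATES):
    """Name of the candidate whose exact mass is nearest to the lag."""
    return min(candidates, key=lambda name: abs(candidates[name] - lag))
```

CH2 and N differ by about 0.0126 Da, so a lag axis resolved to 0.001 Da is enough to tell them apart, which is one way to answer the question on the slide.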
What can be resolved
[Figure: G versus m/z lag around 0 (axis scaled by 10^-3) and around 1 (0.98 to 1.05); mass of an electron shown for scale]
Time and Mass Correlation
C2H4: Positive Time Correlation
Neutron: Zero Time Correlation
H2O: Mixture of Zero Time and Negative Time Correlation
Relate back to features
[Figure: Correlation G versus m/z lag, 16.94 to 17.12]
Microbial Metabolite Atlases
From Features to Pure Spectra
Within one experiment: 1000s of features from 100s of metabolites
[Figure: intensity versus retention time (sec); m/z versus retention time (sec)]
The End