Metabolomics

EMBO Practical Course
on Metabolomics Bioinformatics for Life Scientists
“Dissecting an untargeted metabolomic workflow”
Oscar Yanes, PhD
Untargeted metabolomics workflow
Sample preparation
Experimental design
Sample analysis by MS and NMR
Pre-processing and data analysis
Metabolite identification
Experimental validation
Hypothesis
Ultimate goal of metabolomics
List of metabolites differentially regulated
Biomarker discovery
Pathway analysis
Model construction
Scientific literature
Disease vs. control
Mechanism
Validation
Hypothesis
THE IMPORTANCE OF EXPERIMENTAL DESIGN
I want to do metabolomics
ME
COLLABORATOR
I have many samples at -80°C. Could you do metabolomics and find out something?
BASIC DIAGRAM OF A MASS SPECTROMETER
Gas-phase: Gas chromatography
Liquid-phase: Liquid chromatography, capillary electrophoresis
Solid-phase: Surface-based
Electron ionization (EI)
Chemical ionization (CI)
Atmospheric pressure chemical ionization (APCI)
Electrospray ionization (ESI)
Laser desorption ionization (LDI)
Watch out for serum/plasma samples from biobanks!
[Figure: four panels (Lactate, Glucose, Pyruvic Acid, Choline) plotting Area/Area (IS) against Time (h) at 0, 4, 12 and 24 h]
Requisite for untargeted metabolomics
Maximize ionization efficiency over the whole mass range (e.g., m/z 80-1500)
Number of features: coverage of the metabolome
Intensity of the features: accurate quantification and identification of metabolites
How do we increase the number of features and their intensity?
[Figure: plot with intensity, mass and time axes]
Feature: molecular entity with a unique m/z and retention time value
Sample preparation:
- Extraction method
Chromatography:
- Stationary phase
- Mobile phase
Ion Funnel Technology
etc.
Extraction method
Hot EtOH/Amm. Acetate
Cold Acetone/MeOH
Only 45% of the metabolites are detected with Acetone/MeOH
[Figure label: MS/MS threshold]
Yanes O., et al. Anal. Chem. 2011; 83(6):2152-61
Liquid Chromatography: mobile-phase
Ammonium Fluoride
Ammonium acetate
Formic acid
Yanes O et al. Anal. Chem. 2011; 83(6):2152-61
Chromatography: stationary phase
HILIC
RP C18/C8
Effect of pH; ammonium salts; ion pairs (e.g. TBA)
LC flow rate and pressure: UPLC vs. HPLC vs. nanoLC (vs. GC!)
[Figure: HPLC and UPLC chromatograms; x-axis in minutes]
PRACTICAL ASPECTS
1. Number of scans/second
Implications in LC/MS and GC/MS:
Quantification
Maximum intensity or integrated area
2. Instrument resolution
Implications:
Detector saturation
Quantification
3. Sample amount injected
Implications:
Detector saturation
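A minimal sketch of point 1, assuming an idealized Gaussian chromatographic peak (the apex, width and height below are hypothetical values, not taken from the slides). It samples the same peak at three scan rates and compares the maximum intensity and the trapezoid-integrated area with the analytical area, to illustrate why too few scans across a peak can bias both readouts.

```python
import math

# Hypothetical peak parameters (illustration only, not from the slides).
APEX_S, SIGMA_S, HEIGHT = 30.0, 2.0, 1000.0

def gaussian_peak(t):
    """Intensity of an idealized Gaussian peak at time t (seconds)."""
    return HEIGHT * math.exp(-0.5 * ((t - APEX_S) / SIGMA_S) ** 2)

def acquire(scans_per_second, start=19.3, stop=42.0):
    """Sample the peak at a fixed scan rate between start and stop."""
    step = 1.0 / scans_per_second
    n_scans = int((stop - start) / step) + 1
    times = [start + i * step for i in range(n_scans)]
    return times, [gaussian_peak(t) for t in times]

def max_and_area(scans_per_second):
    """Return (maximum sampled intensity, trapezoid-integrated area)."""
    times, intensities = acquire(scans_per_second)
    area = sum(
        0.5 * (intensities[i] + intensities[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )
    return max(intensities), area

# Analytical area under a Gaussian: height * sigma * sqrt(2 * pi).
TRUE_AREA = HEIGHT * SIGMA_S * math.sqrt(2.0 * math.pi)

for rate in (0.2, 1.0, 5.0):
    apex, area = max_and_area(rate)
    print(f"{rate} scans/s: max={apex:.1f}, area={area:.1f}, true area={TRUE_AREA:.1f}")
```

In this toy setting the slowest scan rate misses the apex and misestimates the area, while the faster rates recover both closely.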
RAW METABOLOMICS DATA
FROM RAW DATA TO METABOLITE IDs (LC/MS and GC/MS)
1. RAW DATA CONVERSION
2. PRE-PROCESSING
3. STATISTICAL ANALYSIS
4. METABOLITE IDENTIFICATIONS
5. PATHWAY ANALYSIS
LC-MS WORKFLOW
LC-MS RAW DATA
PROTEOWIZARD
mzData
PREPROCESSING
mZRT Features Table (features mZRT1, mZRT2, mZRT3, ...; M1, M2, ...; intensities I M1, I M2, ...)
STATISTICAL ANALYSIS
IDENTIFICATION
Feature: individual ion with a unique mass-to-charge ratio and a unique retention time
LC-MS WORKFLOW
RAW LC-MS DATA TO mZXML: PROTEOWIZARD
[Nature Biotechnology, 30 (918–920) (2012)]
VENDOR          FORMATS                     CONVERTER
Agilent         MassHunter.d                ProteoWizard
Bruker          Compass.d, YEP, BAF, FID    ProteoWizard
Thermo Fisher   RAW                         ProteoWizard
Waters          MassLynx.raw                ProteoWizard
AB Sciex        WIFF                        ProteoWizard
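The conversion step can be scripted around ProteoWizard's `msconvert` command-line tool. The sketch below only builds the argument list; the file and folder names are hypothetical, and it assumes `msconvert` is installed and on the PATH if the command is actually run.

```python
def build_msconvert_command(raw_file, out_dir, fmt="mzXML"):
    """Build an msconvert call that writes an open-format file into out_dir.

    Only the --mzXML / --mzML output switches and the -o output folder
    option are used here; other options are left at their defaults.
    """
    if fmt not in ("mzXML", "mzML"):
        raise ValueError("fmt must be 'mzXML' or 'mzML'")
    return ["msconvert", str(raw_file), f"--{fmt}", "-o", str(out_dir)]

# Hypothetical vendor file and output folder.
command = build_msconvert_command("sample01.raw", "converted")
print(" ".join(command))

# To run the conversion (requires ProteoWizard to be installed):
# import subprocess
# subprocess.run(command, check=True)
```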
LC-MS WORK-FLOW
XCMS PRE-PROCESSING
• http://metlin.scripps.edu/download/
• Free & Open Source
• Based on R
• On-line version
• Suitable for:
  - GC-MS
  - LC-MS
Analytical Chemistry, 78(3), 779–787, 2006
Analytical Chemistry, 84(11), 5035-5039, 2012
LC-MS WORKFLOW
XCMS PRE-PROCESSING
1. FEATURE DETECTION
[BMC Bioinformatics, 2008 9:504]
1. Dense regions in m/z space
2. Gaussian peak shape in chromatogram
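The two criteria above can be illustrated with a toy sketch. This is not the XCMS algorithm: the first step groups centroids that stay within a ppm window over consecutive scans (a "dense region" in m/z), and the second step replaces the Gaussian peak-shape test with a much cruder single-apex check. All data, tolerances and function names are hypothetical.

```python
def find_rois(scans, ppm=25.0, min_scans=4):
    """Group centroids into regions of interest (ROIs).

    scans: list of (rt, [(mz, intensity), ...]) ordered by retention time.
    A ROI is extended only if a centroid within the ppm window is present
    in the very next scan; otherwise it is closed.
    """
    open_rois, closed = [], []
    for rt, centroids in scans:
        still_open, used = [], set()
        for roi in open_rois:
            mean_mz = sum(point[1] for point in roi) / len(roi)
            tol = mean_mz * ppm * 1e-6
            match = next(
                (k for k, (mz, _) in enumerate(centroids)
                 if k not in used and abs(mz - mean_mz) <= tol),
                None,
            )
            if match is None:
                closed.append(roi)
            else:
                used.add(match)
                mz, intensity = centroids[match]
                roi.append((rt, mz, intensity))
                still_open.append(roi)
        # Unmatched centroids start new candidate ROIs.
        for k, (mz, intensity) in enumerate(centroids):
            if k not in used:
                still_open.append([(rt, mz, intensity)])
        open_rois = still_open
    closed.extend(open_rois)
    return [roi for roi in closed if len(roi) >= min_scans]

def has_single_apex(roi):
    """Crude stand-in for a peak-shape test: strictly up, then strictly down."""
    values = [point[2] for point in roi]
    apex = values.index(max(values))
    rising = all(values[i] < values[i + 1] for i in range(apex))
    falling = all(values[i] > values[i + 1] for i in range(apex, len(values) - 1))
    return 0 < apex < len(values) - 1 and rising and falling

def detect_features(scans, ppm=25.0, min_scans=4):
    """Return one (m/z, RT, intensity) record per ROI that passes the shape check."""
    features = []
    for roi in find_rois(scans, ppm, min_scans):
        if has_single_apex(roi):
            rt, mz, intensity = max(roi, key=lambda point: point[2])
            features.append({"mz": round(mz, 4), "rt": rt, "intensity": intensity})
    return features

# Hypothetical data: one chromatographic peak plus two isolated noise centroids.
scans = []
for i, intensity in enumerate([10, 40, 90, 120, 80, 30, 8]):
    centroids = [(180.0634 + 0.0002 * ((i % 3) - 1), float(intensity))]
    if i in (1, 4):
        centroids.append((250.1 + i, 15.0))
    scans.append((60.0 + i, centroids))

features = detect_features(scans)
print(features)
```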
LC-MS WORK-FLOW
XCMS PRE-PROCESSING
2. RETENTION TIME CORRECTION
LC-MS WORKFLOW
• 10^3-10^4 mZRT features: IDENTIFICATION NOT FEASIBLE!
• features redundancy:
-adducts: [M+H]+, [M+Na]+, [M+NH4]+, [M+H-H2O]+…
-isotopes: [M+1], [M+2], [M+3]
• Many mZRT features are noisy in nature and irrelevant to our phenomena
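To make the redundancy concrete, the sketch below lists the m/z values that a single metabolite could produce as the adducts and isotope peaks named above. The mass shifts are standard monoisotopic values; the isotope spacing uses the 13C-12C mass difference as an approximation for every isotope peak, and glucose is only an illustrative choice.

```python
# Mass shifts (Da) added to the neutral monoisotopic mass for each positive ion.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M+NH4]+": 18.033823,
    "[M+H-H2O]+": 1.007276 - 18.010565,
}
C13_SPACING = 1.003355  # approximation: every isotope peak treated as a 13C step

def expected_ions(neutral_mass, n_isotopes=2):
    """Return {adduct: [m/z of M, M+1, ..., M+n_isotopes]} for singly charged ions."""
    ions = {}
    for name, shift in ADDUCT_SHIFTS.items():
        mono = neutral_mass + shift
        ions[name] = [round(mono + k * C13_SPACING, 4) for k in range(n_isotopes + 1)]
    return ions

# Glucose (C6H12O6), neutral monoisotopic mass 180.06339 Da.
ions = expected_ions(180.06339)
for adduct, mzs in ions.items():
    print(adduct, mzs)
print("mZRT features from one metabolite:", sum(len(v) for v in ions.values()))
```

Under these assumptions one compound accounts for twelve candidate features, which is one reason the raw feature count overstates the number of metabolites.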
STATISTICAL ANALYSIS
FEATURES RANKING
Those features varying according to our phenomena
are retained for further identification experiments
LC-MS WORK-FLOW
FEATURES RANKING CRITERIA
(I) ANALYTICAL VARIABILITY
- RANDOMIZE
- USE QCs TO CHECK ANALYTICAL VARIATION
WORKLIST
\[ CV^{T}_{mZRT(j)} = \frac{S^{T}_{mZRT(j)}}{\bar{X}^{T}_{mZRT(j)}} \times 100 \]
\[ CV^{QC}_{mZRT(j)} = \frac{S^{QC}_{mZRT(j)}}{\bar{X}^{QC}_{mZRT(j)}} \times 100 \]
USEFUL PLOTS IN EXPLORATORY
DATA ANALYSIS
RETINAS
Hypoxia (N=12) vs Normoxia (N=13)
#mZRT=7654
NEURONAL CELL CULTURES
KO (N=15) vs WT (N=11)
#mZRT=6831
LC-MS WORK-FLOW
FEATURES RANKING CRITERIA
(IV) HYPOTHESIS TESTING+FDR
=0.05 (235 features significantly varied by chance, 26% out of 900)
FDR=0.0074 (20 features varied by chance, 5% out of 404)
#features=4704
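The first figure above follows from simple arithmetic: testing 4704 features at a 0.05 threshold is expected to flag about 235 by chance alone. The sketch below reproduces that number and then applies the Benjamini-Hochberg step-up procedure to a short list of made-up p-values. The slides do not say which FDR method was used, so Benjamini-Hochberg is an assumption chosen for illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Number of tests expected to pass the threshold by chance alone."""
    return n_tests * alpha

def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the test is significant at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * q:
            cutoff_rank = rank
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant

print(round(expected_false_positives(4704, 0.05)))  # features expected by chance

# Hypothetical p-values for ten features.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
uncorrected = sum(p < 0.05 for p in p_values)
corrected = sum(benjamini_hochberg(p_values, q=0.05))
print(uncorrected, "features pass p < 0.05;", corrected, "pass after FDR control")
```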
LC-MS WORKFLOW
10M data points
# mZRT=51908
(i) analytical variability
# mZRT=38377
(ii) features intensity
# mZRT=4704
(iii) hypothesis testing + fold change
# mZRT=250
Annotation
Database look-up
Identification experiments
10-50 differential metabolites
Workflow for Metabolite Identification
Step 1: Select interesting features
Step 2: Search databases for accurate mass
Step 3: Filter “putative” identification list
Step 4: Compare RT and MS/MS of standards
Step 2: Search databases for accurate mass
Each feature returns many hits.
HMDB
Metlin
Common adducts: Na+, NH4+, K+, Cl-, and H2O loss
Adducts increase the number of hits returned!
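A minimal sketch of an accurate-mass look-up, assuming a tiny in-memory table in place of HMDB or Metlin (the compounds and the 10 ppm window are illustrative choices; masses are monoisotopic values computed from the molecular formulas). It shows how allowing more adducts in the search returns more candidate hits for the same observed m/z.

```python
# Positive-mode adduct shifts (Da) subtracted from the observed m/z.
ADDUCTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M+NH4]+": 18.033823,
    "[M+K]+": 38.963158,
}

# Toy stand-in for a metabolite database: name -> neutral monoisotopic mass.
TOY_DATABASE = {
    "glutamine": 146.06914,
    "lysine": 146.10553,
    "glucose": 180.06339,
    "fructose": 180.06339,
    "citrate": 192.02700,
}

def search(observed_mz, adducts, ppm=10.0):
    """Return sorted (compound, adduct) pairs whose mass matches within ppm."""
    hits = []
    for adduct in adducts:
        neutral = observed_mz - ADDUCTS[adduct]
        for name, mass in TOY_DATABASE.items():
            if abs(mass - neutral) / mass * 1e6 <= ppm:
                hits.append((name, adduct))
    return sorted(hits)

mz = 203.0526
print(search(mz, ["[M+H]+"]))   # protonated ion only
print(search(mz, list(ADDUCTS)))  # all adducts considered
```

Even in this toy table the sodium adduct cannot distinguish glucose from fructose, which is why the hits remain putative.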
Step 3: Filter “putative” identification list
Eliminate:
• drugs?
• intensity in the mass spectrum
• adducts?
• matches with obviously inconsistent retention times
Example: a feature with m/z 733.56 is unlikely to be a phospholipid
if it has a 1-min RT with reverse-phase chromatography.
Look for hits that implicate the same pathway and give those features priority.
Standards can be expensive; your intuition will save you money and time!
What experimental data should be required to
constitute a metabolite identification?
• Accurate mass?
• Retention time?
• MS/MS data?
Unlike proteomics, no journals have requirements or guidelines for publication of metabolite identifications.
accurate mass
“The identification of certain metabolites as their exact masses in their given biological context was strategic in the context of searching for biomarkers for CD.”
accurate mass and retention time
“…this method enables untargeted profiling of metabolites using accurate mass-retention time (AMRT) identifiers.”
accurate mass, retention time, and MS/MS
“Metabolites were putatively identified on the basis of accurate mass and retention time, and confirmed by comparing MS/MS data of unknowns to model compounds.”
Accurate mass identifications are putative
All structures have a neutral mass of 146.0691
Mass error (even if small) and adducts add more possibilities!
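As a worked illustration, the sketch below computes a monoisotopic mass from a molecular formula. It assumes C5H10N2O3 as one formula consistent with 146.0691 (the structures shown on the slide are not recoverable from the text); glutamine and the dipeptide alanylglycine both have this formula, so accurate mass alone cannot separate them. The 250 ppm window at the end is deliberately wide to show how mass error adds further candidates.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C5H10N2O3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def ppm_difference(mass_a, mass_b):
    """Relative mass difference in parts per million."""
    return abs(mass_a - mass_b) / mass_b * 1e6

isomer_mass = monoisotopic_mass("C5H10N2O3")  # glutamine, alanylglycine, ...
lysine_mass = monoisotopic_mass("C6H14N2O2")
print(round(isomer_mass, 4))
print(round(ppm_difference(lysine_mass, isomer_mass)), "ppm between lysine and the C5H10N2O3 isomers")
```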
Accurate mass and retention time: many structural isomers have the same retention time
Citrate and isocitrate have the same retention time but different MS/MS patterns.
Step 4: Compare RT and MS/MS of standards
[Figure: Q-TOF MS/MS spectra of the standard (7α-hydroxy-cholesterol) and of the biological sample, with peak label 367.33; x-axis: Mass-to-Charge (m/z), 60 to 420]
Retention time will be available from the profiling experiment;
however, to obtain MS/MS data for the feature of interest in
the research sample, another experiment is typically required.
Note: MS/MS only needs to be performed on one research sample. Pick a sample from the group in which the feature is up-regulated!
[Figure label: Do not pick this group]
What if the feature of interest is not in the database?
(or the model compound is not commercially available)
FT-ICR MS can be used to limit chemical formulas
MS/MS can reveal structural insight
(MS/MS library, bioinformatic approaches)
NMR can provide structural details
When a chemist is your best friend…
• Thermophile organism adapted to live at high temperatures.
• Organisms challenged with cold temperature (72 °C) and compared to high-temperature (95 °C) controls.
Feature up-regulated at cold temperature
Identification???
[Figure: MS/MS spectra of the natural product and of N1-Acetylthermospermine, with fragments marked *]
Intensity of the m/z 112 fragment is significantly different. NOT A MATCH!
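One common way to put a number on such a comparison is a cosine (normalized dot-product) score between the two MS/MS spectra. The sketch below is an illustration under that assumption, not the procedure used for the slide; apart from the m/z 112 fragment mentioned above, every m/z value and intensity is made up.

```python
import math

def cosine_score(spectrum_a, spectrum_b, tol=0.01):
    """Cosine similarity of two spectra given as lists of (m/z, intensity).

    Fragments are paired greedily: each peak in spectrum_a takes the
    closest unused peak in spectrum_b within tol Da.
    """
    used, dot = set(), 0.0
    for mz_a, int_a in spectrum_a:
        best = None
        for j, (mz_b, _) in enumerate(spectrum_b):
            if j in used or abs(mz_a - mz_b) > tol:
                continue
            if best is None or abs(mz_a - mz_b) < abs(mz_a - spectrum_b[best][0]):
                best = j
        if best is not None:
            used.add(best)
            dot += int_a * spectrum_b[best][1]
    norm_a = math.sqrt(sum(i * i for _, i in spectrum_a))
    norm_b = math.sqrt(sum(i * i for _, i in spectrum_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical spectra: same fragments, but the m/z 112 intensity differs strongly.
natural_product = [(58.07, 20.0), (84.08, 35.0), (112.11, 100.0), (129.14, 40.0)]
candidate = [(58.07, 25.0), (84.08, 30.0), (112.11, 15.0), (129.14, 100.0)]

print(round(cosine_score(natural_product, natural_product), 3))
print(round(cosine_score(natural_product, candidate), 3))
```

A score near 1 indicates matching fragment patterns; the mismatch in a single dominant fragment is enough to pull the score well below that.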
Chemical synthesis of the hypothesized structure is required
The synthesized metabolite produces MS/MS data comparable to the natural product from Pyrococcus furiosus.
Natural product
N4-(N-Acetylaminopropyl)spermidine
N1-Acetylthermospermine
Validate your metabolites!!
Targeted metabolomics
LC and GC-Triple quadrupole MS
Molecular biology techniques
Immunohistochemistry
Reverse Transcription-PCR
Gene expression array
Cell cultures
Animal experimentation
…
Thank you
email: oscar.yanes@urv.cat
web: www.yaneslab.com
Twitter: @yaneslab