Metabolomics

PCB 5530
Tom Niehaus
Fall 2014
Learning Outcomes
Day 1
• Lecture
- Learn the basics of metabolomics
- Understand the limitations of metabolomics
- Things to consider when using metabolomics for your own research
Day 2
• Finish lecture
• Activity 1: Identifying an unknown peak
• Activity 2: Analyzing a metabolomics dataset
Definitions and Background
Metabolome = the total metabolite pool
• All low molecular weight (<2000 Da) organic molecules in a sample such as a leaf, fruit, seedling, etc.
Sugars
Nucleosides
Organic acids
Ketones
Aldehydes
Amines
Amino acids
Small peptides
Lipids
Steroids
Terpenes
Alkaloids
Drugs (xenobiotics)
Definitions and Background
Metabolomics = high-throughput analysis of metabolites
Metabolomics is the simultaneous measurement of the levels of a large
number of cellular metabolites (typically several hundred). Many of these
are not identified (i.e. are just peaks in a profile).
• Not hypothesis-driven
• A snapshot of metabolite levels at the time of sampling
Definitions and Background
Scope
• Metabolomics: measure many compounds
• Metabolic profiling: measure a set of related compounds
• Targeted analysis: measure a specific compound
Moving down the list, scope narrows and accuracy increases.
Definitions and Background
History and Development
• Metabolic profiling is not new. Profiling for clinical detection of human disease using
blood and urine samples has been carried out for centuries.
This urine wheel was published in 1506 by Ullrich Pinder in his book Epiphanie Medicorum. The wheel describes the possible colors, smells, and tastes of urine and uses them to diagnose disease.
Nicholson, J. K. & Lindon, J. C. Nature 455, 1054–1056 (2008).
Definitions and Background
History and Development
• Advanced chromatographic separation techniques were developed in the late 1960s.
• Linus Pauling published “Quantitative Analysis of Urine Vapor and Breath by Gas-Liquid Partition Chromatography” in 1971.
• Chuck Sweeley at MSU helped pioneer metabolic profiling using gas chromatography/mass spectrometry (GC-MS).
• Plant metabolic biochemists (e.g. Lothar Willmitzer) were also among the early leaders in the field.
• Metabolomics is expanding to catch up with other multiparallel analytical
techniques (transcriptomics, proteomics) but remains far less developed and less
accessible.
Definitions and Background
Plant Metabolome Size
• An estimated 90,000–200,000 compounds occur across all plant species.
• Each individual plant species contains about 5,000–30,000 compounds
(e.g. ~5,000 in Arabidopsis).
The plant metabolome is much larger than that of yeast, where there are far
fewer metabolites than genes or proteins (<600 metabolites vs. 6000 genes).
The size of the plant metabolome reflects the vast array of plant secondary
compounds. This makes metabolic profiling in plants much harder than in other
organisms.
Definitions and Background
The Power of Metabolomics
Silent Knockout Mutations.
~90% of Arabidopsis knockout mutations are silent – i.e. have no visible phenotype
and so provide no clues to gene function. (The search for some sort of visible
phenotype therefore often becomes desperate.) The situation in yeast is similar –
up to 85% of yeast genes are not needed for survival.
When a knockout mutant shows little or no change in growth rate (visible phenotype),
metabolite pool sizes have shifted to compensate for the effect of the mutation,
leaving metabolic fluxes unchanged. Thus, intuitively, mutations that are silent when
scored for metabolic fluxes or growth rate (growth rate is the sum of all metabolic
fluxes) should have obvious effects on metabolite levels. Metabolic control analysis
(MCA) provides a firm theoretical basis for this.
Definitions and Background
The Power of Metabolomics
Example.
• In the Chloroplast 2010 project
(phenotype analysis of knockouts of
Arabidopsis genes encoding predicted
chloroplast proteins):
• Various knockouts showed essentially normal growth and color but highly abnormal free amino acid profiles, e.g. At1g50770 (‘Aminotransferase-like’)
Definitions and Background
Limitations of metabolomics
• High biological variance in metabolite levels (i.e., high variation between
genetically identical plants grown in the same conditions)
• Unlike nucleic acids and proteins, metabolites have a vast range of chemical
structures and properties, and their molecular weights span two orders of magnitude
(20–2000 Da). Therefore no single extraction or analysis method works for all
metabolites, whereas DNA sequencing, microarrays, and MS analysis of proteins are all
general methods.
• Metabolite concentrations can vary dramatically, from mM to pM.
• Some metabolites are labile and won’t survive extraction and analysis
• Issues with chromatography, detection, and data analysis
Metabolomics
Steps in metabolomics
sample preparation
sample extraction
chromatography
detection
data analysis
Sample Preparation
Growth/Sample Size
• Grow organisms (e.g. plants or bacteria) under identical conditions
• Randomize the treatment groups
(make sure the effects you measure are due to the variable being tested)
• The number of replicates depends on what you want to find:
- Large differences = small replication needed
- Small differences = large replication needed
• In general, six replicates per treatment are needed (due to high biological variability)
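The replication guideline above can be made concrete with a standard sample-size approximation for comparing two group means. This is a minimal sketch, not the course's method: it assumes a two-sided, two-sample comparison, normally distributed metabolite levels with the same biological standard deviation in both groups, and the normal approximation (which slightly underestimates n for very small groups). The function name `replicates_per_group` and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def replicates_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * (z_(1-alpha/2) + z_power)^2 * (sd / delta)^2
    delta: smallest difference in mean metabolite level worth detecting
    sd: biological standard deviation (assumed equal in both groups)
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)

# Hypothetical: biological SD of 30% of the mean; differences as fractions of the mean
for delta in (1.0, 0.5, 0.3):
    print(f"difference {delta:.0%}: {replicates_per_group(delta, sd=0.3)} replicates per group")
```

Because n scales with (SD / difference)^2, halving the difference you want to detect roughly quadruples the replicates needed, which is the "small differences = large replication" rule in numbers.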
Sample Preparation
Sample collection
• Uniform sample sizes (e.g. hole punches in leaves)
• Be consistent
- similar tissue
- time of day
• Quickly freeze samples in liquid nitrogen and store them at -80°C
• Fast-harvesting method for bacteria (~30 sec)
Sample Extraction
Choosing an extraction method
• No universal extraction method exists
• Some solvents may degrade certain compounds
• It's good to have some idea of which metabolites you want to extract
Sample Extraction
Sample extraction
• The method should be consistent and reproducible (e.g. grinding samples with a SPEX SamplePrep Grinder)
• Further workup may be required (e.g. solid phase extraction)
Chromatography
Introduction
• Invented in 1900 by Mikhail Tsvet (used to separate plant pigments)
• There are several types of chromatography, but all consist of a
stationary phase and a mobile phase. Compounds are separated
based on differential partitioning between the two phases.
• Types include:
- TLC (thin-layer chromatography)
- GC (gas chromatography)
- LC (liquid chromatography)
GC and LC are routinely used in metabolomics
Chromatography
Gas Chromatography
• GC = ‘good chromatography’
• optimized over several decades
• ~5 columns routinely used
(5% diphenyl/95% methyl siloxane)
• high reproducibility
• identification based on retention time (RT)
Limitations:
- high temperatures can destroy labile compounds
- polar compounds cannot ‘fly’ on GC columns and must first
be derivatized
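Because raw retention times drift between runs and instruments, GC libraries usually match on a retention index that normalizes RT against a ladder of reference compounds. The sketch below computes the classic linear (van den Dool and Kratz) index against an n-alkane ladder. The alkane ladder, the marker times, and the function name are illustrative assumptions; libraries such as FiehnLib use their own marker sets and index scale.

```python
from bisect import bisect_right

def linear_retention_index(rt, alkanes):
    """Linear (van den Dool & Kratz) retention index for temperature-programmed GC.

    rt: retention time of the analyte (same units as the marker times)
    alkanes: list of (carbon_number, retention_time) pairs, sorted by retention time
    """
    times = [t for _, t in alkanes]
    i = bisect_right(times, rt) - 1  # index of the last marker eluting at or before rt
    if i < 0 or i >= len(alkanes) - 1:
        raise ValueError("analyte elutes outside the alkane ladder")
    (n, t_n), (n_next, t_next) = alkanes[i], alkanes[i + 1]
    # Interpolate linearly between the bracketing alkanes (handles non-consecutive ladders)
    return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))

# Hypothetical alkane ladder (carbon number, RT in minutes) and analyte RT
ladder = [(10, 5.00), (12, 7.40), (14, 9.60)]
print(f"RI = {linear_retention_index(6.20, ladder):.1f}")
```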
Chromatography
Sample derivatization
[Figure: two-step derivatization (Step 1: methoximation; Step 2: silylation) with EI mass spectra (abundance vs. m/z) of the major (RI 948753) and minor (RI 950990) estrone derivative peaks. The Z/E isomers have the same mass spectrum but differ by 2 seconds in retention time.]
Gas chromatography requires volatile compounds (two-step derivatization in the vial):
1) Methoximation of aldehyde and keto groups (primarily to lock reducing sugars in their open-chain form)
2) Silylation of polar hydroxy, thiol, carboxy, and amino groups with the silylation agent MSTFA
• A single compound with multiple active groups can give multiple peaks (1TMS, 2TMS, 3TMS)
• GC-MS can distinguish between stereoisomers
Kind, T., Wohlgemuth, G., Lee, D. Y., Lu, Y., Palazoglu, M., Shahbaz, S. & Fiehn, O. FiehnLib: mass spectral and retention index libraries for metabolomics based on quadrupole and time-of-flight gas chromatography/mass spectrometry. Anal. Chem. 81, 10038–10048 (2009). doi: 10.1021/ac9019522
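Each derivatization step shifts the molecular mass by a fixed amount, which is why a partially silylated compound shows up as several peaks with different masses. A minimal sketch, assuming monoisotopic element masses and neutral molecules: methoximation adds CH3N per carbonyl, and each TMS group adds C3H8Si (an active hydrogen replaced by Si(CH3)3). Glucose (one aldehyde and five hydroxyl groups in its open-chain form) is used as a hypothetical example; all names are illustrative.

```python
# Monoisotopic masses of the most abundant isotopes (Da)
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "Si": 27.97692653}

def mono_mass(formula):
    """Monoisotopic mass from an element-count dict, e.g. {"C": 6, "H": 12, "O": 6}."""
    return sum(MONO[el] * n for el, n in formula.items())

# Methoximation: C=O + CH3ONH2 -> C=N-OCH3 + H2O, a net gain of CH3N
MEOX_SHIFT = mono_mass({"C": 1, "H": 3, "N": 1})
# Silylation: each active H (-OH, -SH, -COOH, -NH) becomes -Si(CH3)3, a net gain of C3H8Si
TMS_SHIFT = mono_mass({"C": 3, "H": 8, "Si": 1})

def derivative_mass(parent, n_meox, n_tms):
    """Neutral monoisotopic mass of a MeOX/TMS derivative of a parent compound."""
    return parent + n_meox * MEOX_SHIFT + n_tms * TMS_SHIFT

glucose = mono_mass({"C": 6, "H": 12, "O": 6})
for n_tms in range(1, 6):  # partial to complete silylation of the five hydroxyl groups
    print(f"glucose 1MeOX {n_tms}TMS: {derivative_mass(glucose, 1, n_tms):.4f} Da")
```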
Chromatography
Liquid Chromatography
• LC = ‘lousy chromatography’
• relatively new, with many recent advances
• thousands of columns available
- normal phase
- ion exchange
- reverse phase
- HILIC
• virtually unlimited solvent systems possible
• low reproducibility
Advantages:
- compound can be collected after separation
- derivatization not necessary
- a separation protocol can be optimized for nearly any compound
Detection
Mass Spectrometry
• Mass spectrometry measures the mass-to-charge ratio (m/z) of ions
• All mass spectrometers perform three main tasks:
1) Ionize molecules
2) Use electric and magnetic fields to accelerate ions and manipulate their flight
3) Detect ions (convert them to an electronic signal)
Detection
Mass Spectrometry
Example mass spectrum:
[Figure: relative abundance vs. m/z]
Detection
Mass Spectrometry
[Figure: GC-MS chromatogram (normalized intensity vs. time, 0–22 min) with a peak selector, and the electron ionization (EI) mass spectrum (normalized intensity vs. m/z) of the selected peak]
Mass Spectrometry
Ionization: chemical vs. electron
[Figure: TOF MS CI+ spectrum (relative abundance vs. m/z, 394–406) compared with electron ionization (+). The chemical ionization (+) spectrum shows [M+H]+ at m/z 397.1690 together with [M+28]+ and [M+40]+ adduct ions; for [M+H]+ the mass accuracy is 5 ppm, the isotopic abundance error 5%, and A+1, A+2, A+3 are 37.90%, 17.84%, 5.03%.]
• [M+H]+ is very abundant in chemical ionization (CI)
• Different ionization gases can be used, such as NH3, methane, or butane
Example: the adduct ions [M+C2H5]+ and [M+C3H5]+, about 28 and 40 Da above [M+H]+, are
used to verify the [M+H]+ ion
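The adduct check above amounts to looking for partner peaks at fixed offsets from the candidate [M+H]+ ion: [M+C2H5]+ sits C2H4 (about 28.0313 Da) higher and [M+C3H5]+ sits C3H4 (about 40.0313 Da) higher. A minimal sketch, assuming singly charged ions, a centroided peak list, and a 10 ppm tolerance; the peak list and function names are hypothetical.

```python
H = 1.00782503  # monoisotopic mass of 1H (Da)
C = 12.0        # 12C (Da)

# Offsets of methane-CI adduct ions relative to [M+H]+ (Da)
CI_PARTNERS = {
    "[M+C2H5]+": 2 * C + 4 * H,  # C2H5 - H = C2H4
    "[M+C3H5]+": 3 * C + 4 * H,  # C3H5 - H = C3H4
}

def ppm_error(measured, expected):
    return (measured - expected) / expected * 1e6

def confirm_mh(mh, peaks, tol_ppm=10.0):
    """For a candidate [M+H]+ m/z, report which CI adduct partner ions occur in `peaks`."""
    return {
        label: any(abs(ppm_error(p, mh + delta)) <= tol_ppm for p in peaks)
        for label, delta in CI_PARTNERS.items()
    }

# Hypothetical centroided peak list around a candidate [M+H]+ at m/z 397.1690
peaks = [397.1690, 398.1729, 425.2005, 437.2002]
print(confirm_mh(397.1690, peaks))
```

Finding both partners at the expected offsets makes it much less likely that the candidate is a fragment rather than [M+H]+.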
Adduct formation – expect the unexpected
Adduct ion | Percent [%] | Adduct ion | Percent [%] | Adduct ion | Percent [%] | Adduct ion | Percent [%] | Adduct ion | Percent [%]
[M+H]+ | 62.55381 | [M+H-C3H8O]+ | 0.02667 | [M-CCl3]+ | 0.00381 | [M(37Cl)]+. | 0.00190 | [M-2H+Na]- | 0.00127
[M+2H]2+ | 11.44459 | [M-H-H2O-CO2]- | 0.02667 | [M-H-CO2]- | 0.00381 | [M-CH3]+ | 0.00190 | [M-H+Co]+ | 0.00127
[M+H-H2O]+ | 8.77598 | [M-H-H2O-HCO2H]- | 0.02667 | [M+H-C5H7PO6]+ | 0.00381 | [M+H-C4H11N]+ | 0.00190 | [M+H-(CH3)2NH-C3H6]+ | 0.00127
[M-H]- | 6.25214 | [M+H-3H2O]+ | 0.02540 | [M+H-HCl]+ | 0.00381 | [M+H-NO2-CHO]+ | 0.00190 | [M+H-C10H6(OH)N]+ | 0.00127
[M+Na]+ | 5.51055 | [M+H-CHN]+ | 0.02540 | [M+H-C12H12N2O3]+ | 0.00381 | [M-H-HF]- | 0.00190 | [M-H+Ni]+ | 0.00127
[M+H-NH3]+ | 1.19494 | [M+K-3H]2- | 0.01905 | [M+H-CH3CO2H]+ | 0.00381 | [M(37Cl)+H]+ | 0.00190 | [M-H-H2O-C4H7CO2H]- | 0.00127
[M+NH4]+ | 0.73715 | [M+H-(CH3)2NH]+ | 0.01524 | [M+H-CH3]+. | 0.00381 | [M-H-C6H10O5]- | 0.00190 | [M+H-OH]+ | 0.00127
[M-H-H2O]- | 0.34604 | [M+H-CHNO]+ | 0.01333 | [M+H-H2]+ | 0.00381 | [M+H-H2O-C6H13N]+ | 0.00190 | [M(81Br)+H]+... | 0.00127
[M-H+2Na]+ | 0.32953 | [M+H-C2H6O]+ | 0.01333 | [M+H-C3H8NO6P]+ | 0.00317 | [M+H-H2O-H3PO4]+ | 0.00190 | [M-H-CH2O-CH2NH]- | 0.00127
[M-H+H2O]- | 0.24508 | [M+H-CH4O]+ | 0.01270 | [M+H-C5H14NO4P]+ | 0.00317 | [M+H-C5H7PO6-NH3]+ | 0.00190 | [M+H-CO-CONH]+ | 0.00127
[M+NH4-H2O]+ | 0.22984 | [M+H-C7H13NO3]+ | 0.01143 | [M+Li-(CH3)3N]+ | 0.00317 | [M-H-C5H7PO6]- | 0.00190 | [M-H-CONH]- | 0.00127
[M+H+H2O]+ | 0.19429 | [M+Na-2H]- | 0.00952 | [M+Li-C5H14NO4P]+ | 0.00317 | [M+H-H2S]+ | 0.00190 | [M+H-C3H4O2]+ | 0.00127
[M+H+Na]2+ | 0.18286 | [M-H-CH2O]- | 0.00952 | [M+Cl]- | 0.00317 | [M+H-H2O-C8H8]+ | 0.00190 | [M+H-C3H6O4]+ | 0.00127
[M+H+K]2+ | 0.17524 | [M+H-C11H12N2O3]+ | 0.00952 | [M(35Cl)-H]- | 0.00317 | [M+H-H2O-NH3-C8H8]+ | 0.00190 | [M+Na-H2S]+ | 0.00127
[M-2H]2- | 0.13968 | [M+H-C13H16N3O4]+ | 0.00952 | [M(37Cl)-H]- | 0.00317 | [M+H-H2O-NH3-C8H8-CO]+ | 0.00190 | [M-H+2Na-H2S]+ | 0.00127
[M+2Na]2+ | 0.13778 | [M+H-C17H25N3O4]+ | 0.00952 | [M-H-C5H7O6P]- | 0.00317 | [M+H-H2O-NH3]+ | 0.00190 | [M-C5H5Cl]+ | 0.00127
[M+2H-NH3]2+ | 0.13714 | [M+CH3CO2]- | 0.00889 | [M+H-C3H7O5P]+ | 0.00317 | [M+H-C3H6]+ | 0.00190 | [M+H-N2]+ | 0.00127
[M+K]+ | 0.13651 | [M-H2O+Na]+ | 0.00825 | [M-H-C6H6N8O]- | 0.00317 | [M+HCO2-320]- | 0.00190 | [M+H-H2O-CO]+ | 0.00127
[M+H-2H2O]+ | 0.11810 | [M-H+NH3]- | 0.00762 | [M(81Br)+H]+ | 0.00317 | [M+H-C3H7N]+ | 0.00190 | [M-H-H3PO4]- | 0.00127
[M+3H]3+ | 0.06667 | [M+H-C9H9NO]+ | 0.00762 | [M-C4H9]+ | 0.00317 | [M-H-H2]- | 0.00190 | [M+H+CH3CN]+ | 0.00127
[M+2H-H2O]2+ | 0.06476 | [M+H-C15H21N2O3]+ | 0.00762 | [M-2H+3Li]+ | 0.00254 | [M-H-C16H30O-H2O]- | 0.00190 | [M+H-C4H6]+ | 0.00127
[M]+. | 0.05905 | [M-2H+3Na]+ | 0.00698 | [M-H-HCl]- | 0.00254 | [M-H-CH4O]- | 0.00190 | [M+H-CH3OH]+ | 0.00127
[M+2Na-H]+ | 0.05143 | [M+HCO2]- | 0.00635 | [M+2Li-H]+ | 0.00254 | [M+H-C10H8FN3]+ | 0.00127 | [M+H-HCCl3]+ | 0.00127
[M-H+2K]+ | 0.05079 | [M+H-NO2]+ | 0.00571 | [M+H-C8H10O2]+ | 0.00254 | [M+Li-C3H5NO2]+ | 0.00127 | [M+H-C2H3N3]+ | 0.00127
[M+H-CO]+ | 0.04635 | [M+H-C6H13NO2]+ | 0.00571 | [M+H-C2Cl4]+ | 0.00254 | [M+Li-H3PO4]+ | 0.00127 | [M+H-C3H6O2]+ | 0.00127
[M+H-CO2]+ | 0.04318 | [M-H-C3H5NO2]- | 0.00508 | [M-H-C7H5NO]- | 0.00254 | [M-2H+3Li-C15H31CO2H]+ | 0.00127 | [M+H-CH2Cl2O]+ | 0.00127
[M+H-CH2O2]+ | 0.03810 | [M(81Br)-H]- | 0.00508 | [M+H-C5H11N]+ | 0.00254 | [M-2H+3Na-C3H5NO2]+ | 0.00127 | [M(356)+H-HCl]+ | 0.00127
[M-H-NH3]- | 0.03746 | [M+H-HCO2H]+ | 0.00508 | [M+Ba-H]+ | 0.00254 | [M-2H+Na+Co]+ | 0.00127 | [M-C4H4O4S]+ | 0.00127
[M.Cl]- | 0.03556 | [M-2H+Li]- | 0.00444 | [M+H-C14H25NO3]+ | 0.00254 | [M-2H+Li-C3H5NO2]- | 0.00127 | [M+H-C8H14O3]+ | 0.00127
[M+Li]+ | 0.03111 | [M+H-CH4]+ | 0.00444 | [M+H-C6H5NO2S]+ | 0.00254 | [M-2H+Li-C16H30O]- | 0.00127 | [M+H-C2H4]+ | 0.00127
…around 290 different adducts
Statistics: adducts in the NIST12 MS/MS DB (80,000 spectra)
Most common adducts for LC-MS: [M+H]+, [M+Na]+, [M+NH4]+, [M+acetate]-
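Converting a neutral monoisotopic mass into the m/z expected for an adduct ion is simple arithmetic: add the mass of the adduct atoms, account for the electrons lost or gained, and divide by the charge. A minimal sketch covering the common LC-MS adducts listed above plus [M-H]-; the dictionary of adduct masses is an illustrative assumption built from standard monoisotopic masses, not a copy of the class adduct spreadsheet.

```python
ELECTRON = 0.00054858  # electron mass (Da)

# Atoms added to (or removed from) M, in Da, and the ion charge; illustrative subset
ADDUCTS = {
    "[M+H]+": (1.00782503, 1),          # + H
    "[M+Na]+": (22.98976928, 1),        # + Na
    "[M+NH4]+": (18.03437413, 1),       # + NH4
    "[M-H]-": (-1.00782503, -1),        # - H
    "[M+CH3CO2]-": (59.01330433, -1),   # + acetate (C2H3O2)
}

def adduct_mz(neutral_mass, adduct):
    """m/z of an adduct ion from the neutral monoisotopic mass M."""
    added, z = ADDUCTS[adduct]
    # Cations have lost z electrons; anions have gained |z| electrons.
    return (neutral_mass + added - z * ELECTRON) / abs(z)

sucrose = 342.116211  # monoisotopic mass of C12H22O11 (Da)
for name in ADDUCTS:
    print(f"{name}: {adduct_mz(sucrose, name):.6f}")
```

For sucrose this gives [M+H]+ = 343.123487, the value used in Activity 1.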
Mass Spectrometry
Mass Spectrometers
• There are several types of mass spectrometers:
- TOF (time of flight)
- Q, QQQ (single and triple quadrupole)
- Ion Trap
- Orbitrap
- FTICR (Fourier transform ion cyclotron resonance)
[Figure: quadrupole and TOF mass analyzers]
Mass Spectrometry
Definitions and concepts
• isomer- compounds with the same chemical formula
e.g. propanol and isopropanol (C3H8O)
C8H10N2O has 100,082,479 isomers
• isobar- compounds with the same nominal mass but different exact masses
e.g. CO (27.9949) and C2H4 (28.0313)
• isotopes- atoms of the same element with different numbers of neutrons in their nuclei
e.g. 12C vs 13C
Mass Spectrometry
Definitions and concepts
• Resolution (resolving power)
RP(FWHM) = measured mass / peak width at 50% peak intensity
• Accuracy
Difference between the true mass and the measured mass
• Mass range
Range of ions that can be detected (typically 50-1000 m/z)
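The definitions above reduce to a few ratios. A minimal sketch, assuming that two peaks count as separated when they are about one FWHM apart (a common rule of thumb); the function names and the "measured" value are hypothetical, while the CO and C2H4 masses come from the isobar example above.

```python
def resolving_power(mz, fwhm):
    """RP(FWHM) = measured m/z divided by the peak width at 50% of peak height."""
    return mz / fwhm

def mass_error_ppm(measured, true):
    """Mass accuracy as the relative difference between measured and true mass, in ppm."""
    return (measured - true) / true * 1e6

def required_rp(m1, m2):
    """Rough RP needed to separate two ions whose peaks are one FWHM apart."""
    return ((m1 + m2) / 2) / abs(m1 - m2)

# Isobars from the definitions: CO (27.9949) and C2H4 (28.0313)
print(f"RP needed for CO vs C2H4: {required_rp(27.9949, 28.0313):.0f}")
# Hypothetical measurement: 397.1710 observed for an ion whose true m/z is 397.1690
print(f"mass error: {mass_error_ppm(397.1710, 397.1690):.1f} ppm")
```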
Mass Spectrometry
Why is resolution important?
• High resolution is needed to
determine the accurate mass
• High resolution is also needed
to determine accurate isotopic
patterns
• Note:
-monoisotopic vs. average mass
-accurate mass can distinguish
isobars, not isomers
Mass Spectrometry
Definitions and concepts
• Dynamic range- the concentration range over which a linear response is obtained;
determines the capability of an instrument to do quantitative analysis
• Sensitivity- the lowest amount an instrument can detect
• Matrix effects- signal is suppressed by a complex sample matrix or other unknown processes
• Speed- the number of spectra or scans that can be acquired in one second
1 scan/sec = very slow
500 scans/sec = very fast
Mass Spectrometry
Why is high speed important?
In order to deconvolute (separate/clean) overlapping peaks, enough mass spectra have to be acquired
to perform the mathematical calculations. With only one spectrum per second this is impossible.
That requires:
a) fast scanning detectors like time-of-flight (TOF)
b) fast data acquisition hardware/software (DAC/ADC)
The LECO TOF can acquire up to 500 mass spectra per second.
For GC-MS, 20 spectra/sec is sufficient; for comprehensive two-dimensional GC (GCxGC), up to 200 spectra/sec are needed.
Source: LECO ChromaTOF Helpfile
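The scan-rate requirement follows from how many spectra must fall across one chromatographic peak. A minimal sketch, assuming (hypothetically) that about 10 spectra per peak are needed for deconvolution; the peak widths are chosen for illustration only, and real requirements depend on the deconvolution software and the separation.

```python
from math import ceil

def min_scan_rate(peak_width_ms, spectra_per_peak=10):
    """Minimum spectra per second so that `spectra_per_peak` spectra span one peak.

    peak_width_ms: chromatographic peak width in milliseconds (assumed value)
    spectra_per_peak: spectra needed across a peak for deconvolution (assumed value)
    """
    return ceil(spectra_per_peak * 1000 / peak_width_ms)

# Illustrative widths: ~500 ms for a narrow 1D GC peak, ~50 ms for a GCxGC peak
for label, width_ms in (("GC", 500), ("GCxGC", 50)):
    print(label, min_scan_rate(width_ms), "spectra/s")
```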
Mass Spectrometry
Properties of various mass spectrometers
Property | TOF | Quad | Ion Trap | Orbitrap | FT-ICR
Resolving power | very good | fair | fair | very good | excellent
Dynamic range | very good | excellent | fair | fair | fair
Sensitivity | excellent | excellent | excellent | excellent | excellent
Speed | excellent | good | excellent | good | fair
Cost | 150-300K | 100K | 100K | 500K | 1M
Maintenance | average | average | average | average | very high
Data Analysis
Goals
• Huge data files
• Identify all peaks
In practice this is very difficult if not impossible
• quantification or semi-quantification of compounds
Often involves comparing fold changes in samples or groups of samples
e.g. wild-type vs knockout plant
Various statistical and multivariate methods are used to look for differences between treatment groups
e.g. PCA, MCA, ANOVA
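A per-metabolite comparison between two groups usually starts with a fold change and a two-sample test. A minimal sketch with hypothetical peak areas for one metabolite (six replicates per group): it reports the fold change, its log2, and Welch's t statistic (a two-sample t statistic that does not assume equal variances). Turning t into a p-value needs the t distribution, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`, which is outside the standard library.

```python
from math import log2, sqrt
from statistics import mean, variance

def fold_change(treated, control):
    """Ratio of group means (treated / control)."""
    return mean(treated) / mean(control)

def welch_t(a, b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # sample variance / n
    return (mean(a) - mean(b)) / sqrt(va + vb)

# Hypothetical peak areas for one metabolite, six replicates per group
wild_type = [100, 110, 95, 105, 98, 102]
knockout = [210, 190, 205, 220, 198, 215]

fc = fold_change(knockout, wild_type)
print(f"fold change {fc:.2f}, log2 {log2(fc):.2f}, t = {welch_t(knockout, wild_type):.1f}")
```

Log2 fold changes are convenient because increases and decreases of the same size are symmetric around zero.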
Data Analysis
Identifying peaks
• MS libraries (e.g. the NIST library) can identify peaks, mostly in GC/MS, especially when combined with retention time (RT) information (GC/MS only)
Data Analysis
Activity 1: Identifying peaks
• Can you find sucrose in a MS dataset?
Example: sucrose (C12H22O11)
Data Analysis
Activity 1: Identifying peaks
• Accurate mass can help determine the chemical formula:
Example: sucrose (C12H22O11)
-Determine monoisotopic mass at http://www.chemspider.com/
(342.116211 Da)
-Determine M+H from MS adduct excel sheet (class website)
(343.123487 Da)
Let's say you find that mass in the dataset, but is it really sucrose?
-Download Molecular weight calculator at
http://www.alchemistmatt.com/mwtwin.html
-Open formula finder under tools
-enter molecular weight target: 342.116211
-How many candidate formulas (isobars) are within 2 ppm? Within 0.1 ppm?
-enter 342.116211 at chemspider, how many isomers?
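The formula-finder step in this activity can be mimicked with a brute-force search over CHNO formulas. A minimal sketch, assuming only C, H, N, and O, a neutral even-electron molecule, a simple hydrogen-count and parity filter, and arbitrary element limits; it is not the algorithm used by the Molecular Weight Calculator, so candidate counts may differ from that tool. Function names are hypothetical.

```python
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def formula_str(c, h, n, o):
    """Hill-order string (C, H, then N, O), omitting absent elements and counts of 1."""
    return "".join(f"{el}{k if k > 1 else ''}" for el, k in (("C", c), ("H", h), ("N", n), ("O", o)) if k)

def find_formulas(target, tol_ppm, max_c=30, max_n=6, max_o=20):
    """Brute-force CHNO formulas whose monoisotopic mass lies within tol_ppm of target."""
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = target - c * MONO["C"] - n * MONO["N"] - o * MONO["O"]
                h = round(rest / MONO["H"])  # at ppm tolerances only the nearest H count can match
                # Require 0 <= H <= 2C + N + 2 with matching parity, i.e. a non-negative
                # integer number of rings plus double bonds for a neutral molecule.
                if h < 0 or h > 2 * c + n + 2 or (2 * c + n + 2 - h) % 2:
                    continue
                mass = c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]
                if abs(mass - target) / target * 1e6 <= tol_ppm:
                    hits.append((formula_str(c, h, n, o), round(mass, 6)))
    return hits

for tol in (2.0, 0.1):
    print(f"{tol} ppm: {len(find_formulas(342.116211, tol))} candidate formulas")
```

Tightening the tolerance shrinks the candidate list, but no mass tolerance can separate isomers such as sucrose and maltose, which share the formula C12H22O11.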
Data Analysis
Example output of a metabolomics experiment
• Open GC-TOF-MS dataset from class website:
-How many compounds are identified? How many significant fold changes?
-Pathway analysis at http://www.metaboanalyst.ca/MetaboAnalyst/
-enter compound names or KEGG IDs for significant -fold changes
-choose organism ‘E. coli’ and submit
- Which pathways are affected in this dataset?
• Open HILIC-TOF-MS dataset from class website:
-How many compounds are identified? How many significant fold changes?
-How many unidentified peaks?
-Can you identify an unknown peak with a significant fold change?