METABOLOMICS DEFINITIONS & BACKGROUND 

METABOLOMICS

File name Metabolic Profiling and Metabolomics 2011

DEFINITIONS & BACKGROUND

Metabolome

= the total metabolite pool (by analogy to genome, transcriptome, proteome).

Terminology summary – Fig. 1

The term 'metabolome' refers to the entire complement of all the low molecular weight metabolites in a sample such as a leaf, fruit, or tuber. Low molecular weight metabolites are small organic molecules that include sugars, amino acids, organic acids, sugar phosphates, cofactors, secondary products, etc.

Metabolomics

(more-or-less synonymous with metabolic profiling, metabolic phenotyping)

= high-throughput analysis of metabolites.

Metabolomics is the simultaneous ('multiparallel') measurement of the levels of a large number of cellular metabolites (typically several hundred). Many of these are not identified (i.e. are just peaks in a profile).

Metabolomics analysis is like a snapshot, showing which compounds are present and at what relative levels at a specific time point.

More generally, metabolomics refers to a holistic analytical approach to metabolism that is not guided by specific hypotheses. Instead, metabolomics sets out to determine how (in principle, all) metabolite levels respond to genetic or environmental changes and, from the data, to generate new hypotheses.

Fluxomics –

Branch of metabolomics that measures the turnover of metabolites in pathways using labeled isotopes such as ¹³C. It is just beginning; instead of being a snapshot of metabolism, it is a movie.

History & Development

Metabolic profiling is not new. Profiling for clinical detection of human disease using blood and urine samples has been carried out for >30 years. Chuck Sweeley at MSU pioneered this, using gas chromatography/mass spectrometry (GC-MS). See:

Gates SC, Sweeley CC (1978) Quantitative metabolic profiling based on gas chromatography. Clin Chem 24:1663-73.

Quantitative metabolic profiles of volatilizable components of human biological fluids, e.g. urinary organic acids, were established using GC/MS. Data were processed by computer and statistical methods for analyzing metabolic profiles were developed.

[Note that all the elements of metabolic profiling are here.]

Plant metabolic biochemists (e.g. Lothar Willmitzer) were among other early leaders in the field.

Metabolomics is expanding to catch up with other multiparallel analytical techniques (transcriptomics, proteomics) but remains far less developed and less accessible.

Plant Metabolome Size

For all plant species together, this is estimated to be 90,000-200,000 compounds. There are far fewer in any one species, e.g. ~5,000 in Arabidopsis.

The plant metabolome is much larger than that of yeast, where there are far fewer metabolites than genes or proteins (<600 metabolites vs. 6000 genes). The size of the plant metabolome reflects the vast array of plant secondary compounds. This makes metabolic profiling in plants much harder than in other organisms.

Metabolomics Compared to Genomics, Transcriptomics & Proteomics

Differences between metabolomics and the other multiparallel approaches:

(a) Conceptual: 1 Gene → 1 mRNA → 1 Protein → Many Metabolites

(and conversely: Many Proteins → 1 Metabolite)

There is no direct relationship between metabolite and gene in the way there is between genes and mRNAs and proteins. A single gene does not specify the level of a single metabolite, i.e. its pool size (although it may determine whether the metabolite is present or absent).

Rather, as metabolic control analysis (MCA) teaches, the level of a metabolite is determined by the activities of all the enzymes of all the pathways that involve that metabolite, and by effectors that act on these enzymes. In practice, therefore, metabolite levels change according to developmental, physiological, and pathological states.

Biological variance in metabolite levels (i.e., the variation between genetically identical plants grown in the same conditions) is accordingly large – about 10× the analytical variability – and limits the resolution of metabolomics.

(b) Chemical: Unlike nucleic acids and proteins, metabolites have a vast range of chemical structures and properties. Their molecular weights span two orders of magnitude (30–3000 Da).

Therefore no single extraction or analysis method works for all metabolites. (By contrast, DNA sequencing, microarrays, and MS analysis of proteins are all general methods.)

(c) Dynamic: Many metabolite levels change with half-times of minutes or seconds – far faster than those of nucleic acids or proteins. Thus valuable information is lost if sampling times are too far apart. Also, drastic artifactual changes can occur in the short interval between harvest and extraction; this adds to biological variance.

Power of Metabolomics

– Metabolomics analysis can powerfully complement transcriptomics and proteomics. Metabolomes are a step nearer actual function.

Transcriptomes or proteomes are very inadequate monitors of cell function because there is no simple relationship between mRNA or protein levels and metabolism.

Thus changes in mRNA level or protein level in mutants or transgenics are usually not closely linked to changes in metabolic function or phenotype as a whole.

Part of the reason for this is the non-linear relation between mRNA and protein levels (Fig. 2) and the typically hyperbolic relation between enzyme level and in vivo flux rate (see MCA class). Another cause is the high level of functional redundancy in plant metabolism – i.e. parallel or alternative pathways for the same process.

Silent Knockout Mutations.

~90% of Arabidopsis knockout mutations are silent – i.e. have no visible phenotype and so provide no clues to gene function. (The search for some sort of visible phenotype therefore often becomes desperate.) The situation in yeast is similar – up to 85% of yeast genes are not needed for survival.

When there is little or no change in growth rate (visible phenotype) of a knockout mutant, the pool sizes of metabolites have altered so as to compensate for the effect of the mutation, leaving metabolic fluxes unchanged. Thus – intuitively – mutations that are silent when scored for metabolic fluxes or growth rate (growth rate is the sum of all metabolic fluxes) should have obvious effects on metabolite levels. There is a firm theoretical basis for this in MCA.

Example.

In the Chloroplast 2010 project (phenotype analysis of knockouts of Arabidopsis genes encoding predicted chloroplast proteins):

Fig. 3 – Various knockouts showed essentially normal growth and color but highly abnormal free amino acid profiles, e.g. At1g50770 (‘Aminotransferase-like’)

METABOLIC PROFILING METHODS

Sample Preparation

Metabolites are typically extracted in aqueous or methanolic media, then fractionated into lipophilic and polar phases that are then analyzed separately. Further fractionation of each phase may follow to split metabolites into classes prior to analysis.

No single extraction procedure works for all metabolites because conditions that stabilize one type of compound will destroy other types or interfere with their analysis. Therefore the extraction protocol has to be tailored to the metabolites to be profiled.

In practice, these considerations mean that metabolic profiling is often confined to fairly stable compounds that can be extracted together. These include major primary metabolites (sugars, sugar phosphates, amino acids, and organic acids) and certain secondary metabolites (e.g., phenylpropanoids, alkaloids).

The most comprehensive profiling can cover several hundred such compounds, many of which are unidentified. Many crucial metabolites, particularly minor or unstable ones, are currently being missed in metabolomics analyses.

Main Analytical Techniques

• Gas Chromatography/Mass-Spectrometry (GC/MS)

In GC/MS, the sample is first derivatized to increase metabolite stability and volatility. The derivatized mix is then fractionated by a gas chromatograph that is coupled to a mass spectrometer.

The mass spectrometer scans the peaks emerging from the GC column at frequent intervals (~1 sec) and so acquires the mass spectrum of each peak, from which peaks can be identified and quantified. Mass spectrometry ‘weighs’ ionized individual molecules and their fragments.

Molecules are identified from their fragmentation pattern and ‘weights’ (mass/charge ratios – m/z values), with the help of mass spectra libraries, and can be quantified from peak size.
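
Library matching of this kind is commonly scored with a similarity measure between the observed spectrum and each library spectrum. The sketch below uses cosine similarity on spectra stored as m/z-to-intensity maps. The choice of cosine similarity, the function names, and the toy spectra are illustrative assumptions, not the method of any particular library or instrument software.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)

def best_library_match(query, library):
    """Return (name, score) of the library spectrum most similar to the query."""
    scores = {name: cosine_similarity(query, spec) for name, spec in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical nominal-mass spectra (values invented for illustration only).
LIBRARY = {
    "compound_A": {73: 100.0, 147: 40.0, 217: 25.0},
    "compound_B": {73: 100.0, 116: 80.0, 190: 10.0},
}
QUERY = {73: 95.0, 147: 42.0, 217: 20.0, 300: 2.0}

name, score = best_library_match(QUERY, LIBRARY)
print(name, round(score, 3))
```

In practice a retention-time check would be combined with the spectral score, since the text notes that both are stored in the libraries.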

Overlapping peaks can be deconvoluted because the spectra of their constituents are distinct (Fig. 4).

Unfortunately, knowing only the exact masses of molecules and their fragments is not enough to identify them. A huge number of chemical structures can have the same exact mass. This is why libraries of retention times and mass spectra, determined for standard compounds, are critical.
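
A small worked example of why exact mass alone is ambiguous: isomers share a molecular formula and therefore a monoisotopic mass. The sketch below computes that mass from a formula; the three compounds chosen (all C6H12O6) and the helper name are illustrative choices, not taken from the handout.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

def monoisotopic_mass(formula):
    """Monoisotopic mass of a formula given as an {element: count} dict."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())

# Glucose, fructose and myo-inositol are distinct structures with one formula.
C6H12O6 = {"C": 6, "H": 12, "O": 6}
isomers = ["glucose", "fructose", "myo-inositol"]
masses = {name: monoisotopic_mass(C6H12O6) for name in isomers}

for name, mass in masses.items():
    print(f"{name}: {mass:.4f} Da")
```

All three print the same mass, so only retention time and fragmentation pattern can tell them apart.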

The major challenge for metabolomics is identification of unknown peaks. Basically, standards are essential to the process. If there is no standard, a compound cannot be identified with certainty. Thus, the more novel the compound, the less powerful metabolomics becomes.

Mass spectrometry (MS) metabolomic datasets provide relative quantification of cellular metabolites (i.e. fold changes in levels between different samples). Absolute quantification (i.e. moles per weight of tissue) is possible with MS methods but requires an authentic standard for each metabolite to be quantified.
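
The difference between the two kinds of quantification can be sketched in a few lines. Relative quantification needs only the ratio of peak areas; absolute quantification needs a calibration curve built from known amounts of the authentic standard. The linear calibration, the function names, and all numbers below are assumptions made for illustration.

```python
def fold_change(peak_area_sample, peak_area_control):
    """Relative quantification: ratio of peak areas for the same metabolite."""
    return peak_area_sample / peak_area_control

def fit_line(xs, ys):
    """Least-squares slope and intercept of a straight-line calibration curve."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def absolute_amount(peak_area, slope, intercept):
    """Invert the calibration curve: amount = (area - intercept) / slope."""
    return (peak_area - intercept) / slope

# Hypothetical calibration: known amounts (nmol) of the standard vs. peak area.
std_nmol = [0.0, 1.0, 2.0, 4.0]
std_area = [0.0, 500.0, 1000.0, 2000.0]
slope, intercept = fit_line(std_nmol, std_area)

relative = fold_change(1500.0, 500.0)                 # mutant vs. wild type
amount = absolute_amount(1500.0, slope, intercept)    # nmol in the sample
print(relative, amount)
```

Without the standard, only `fold_change` is available, which is why most profiling data are reported as relative levels.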

Animated explanation of GC/MS: http://www.shsu.edu/~chm_tgc/sounds/flashfiles/GC-MS.swf

Tutorial on MS: http://www.asms.org/whatisms/page_index.html

• Liquid Chromatography/Mass-Spectrometry (LC/MS)

In LC/MS (also termed high performance liquid chromatography, HPLC/MS) the samples are not derivatized before analysis and an HPLC instrument is used for separation. LC/MS is more suitable than GC/MS for labile compounds, for those that are hard to derivatize, or hard to render volatile. LC/MS is less developed than GC/MS. A closely related method is capillary electrophoresis (CE)/MS.

Fig. 5 – Profiling example: Metabolites related to plant isoprenoid biosynthesis. The total ion chromatogram (TIC) is the total output of the ion detector; the extracted ion chromatograms (EICs) are the outputs for particular ions characteristic of isoprenoid synthesis intermediates.

• Nuclear Magnetic Resonance (NMR) Spectroscopy

Advantages of NMR over MS:

- NMR does not destroy the sample

- NMR can detect and quantify metabolites because the signal intensity is determined only by the molar concentration

- NMR can provide comprehensive structural information, including stereochemistry

Many atoms have nuclei that are NMR active, but most NMR data are collected for ¹H and ¹³C since these are present in all organic molecules.

The main weakness of NMR is low sensitivity relative to MS. It is therefore less suited for analysis of trace compounds. As the natural abundance of ¹³C is only 1.1%, ¹³C-NMR is less sensitive than ¹H-NMR. Recent developments have considerably increased sensitivity, making it less of a problem.

NMR uses radio-frequency (RF) radiation and magnetic fields. RF radiation is used to stimulate nuclei present within molecules. The information obtained is displayed as a spectrum. The horizontal axis is the chemical shift (delta, in units of ppm), which is a measure of the position at which RF absorption occurs relative to an internal standard (tetramethylsilane, TMS). The vertical axis is the intensity of the absorption. As with other spectral techniques, compounds have characteristic spectra. More than 100 metabolites occur in plants at levels high enough for analysis by NMR, so NMR spectra of mixtures contain many peaks.
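
As a worked illustration of the ppm scale described above: the chemical shift is the frequency offset of a signal from the reference, divided by the spectrometer operating frequency. Because the offset is in Hz and the operating frequency in MHz, the ratio is already in parts per million. The 500 MHz instrument and the 1800 Hz offset are hypothetical values.

```python
def chemical_shift_ppm(freq_signal_hz, freq_reference_hz, spectrometer_mhz):
    """Chemical shift (ppm) of a signal relative to the reference (e.g. TMS).

    Hz divided by MHz gives parts per million directly.
    """
    return (freq_signal_hz - freq_reference_hz) / spectrometer_mhz

# Hypothetical signal 1800 Hz downfield of TMS on a 500 MHz spectrometer.
shift = chemical_shift_ppm(1800.0, 0.0, 500.0)
print(f"{shift:.2f} ppm")
```

Expressing positions in ppm rather than Hz makes spectra comparable between instruments of different field strength.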

Fig. 6 – Profiling example: ¹H-NMR spectra of extracts of leaves of various Verbascum species (medicinal plants)

Signal overlap is a problem in the complex spectra of plant extracts. Signal overlap hampers metabolite identification and quantification. Better signal resolution can be obtained using various types of 2D NMR spectroscopy. These approaches cut signal overlap by spreading the resonances in a second dimension.

Example: Heteronuclear single quantum coherence (HSQC) spectroscopy. The 2D spectrum has one axis for ¹H and the other for a heteronucleus (an atomic nucleus other than a proton), usually ¹³C or ¹⁵N. The spectrum contains a peak for each unique proton attached to the heteronucleus being considered.

Fig. 7 – HSQC used to select for protons directly bonded to ¹³C. (a) 1D ¹H NMR spectrum of an equimolar mixture of the 26 small-molecule standards. (b) 2D ¹H–¹³C HSQC NMR spectra of the same synthetic mixture overlaid onto a spectrum of aqueous whole-plant extract from Arabidopsis. Note the greatly improved resolution.

NMR tutorial: http://www.cis.rit.edu/htbooks/nmr/

Data Analysis

The avalanche of metabolome data is very difficult to analyze. There are also challenges in archiving such data; a standard framework for this is in place.

The problems in extracting meaning from large data sets are similar for all forms of profiling.

The goal is to recognize patterns for further exploration.

Various data mining tools are used for this. These statistical tools reduce data complexity by focusing on the information content of a given data set, i.e. they try to ‘tame’ the wild profusion of profiling data. Unlike many other statistical procedures, these methods are mostly applied when there are no a priori hypotheses.

Data mining tools include cluster analysis (CA) and principal components analysis (PCA). The metabolite data can be known or unidentified peaks.

CA and PCA can establish ‘guilt by association’ – they can point to where in metabolism mutations act from the similarity of their metabolite profiles to those of known mutations.

External factors (e.g. toxins, herbicides, environmental insults) can be studied in an analogous way.

Thus, in principle, the function of an unknown gene can be determined by comparing the metabolic profile of a mutant in that gene with a library of such profiles generated by deleting individual genes of known function.
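
The 'guilt by association' comparison can be sketched as a nearest-profile search: score the unknown mutant's profile against each profile in the library and report the best match. Pearson correlation is used here as the similarity score, which is an assumption; the mutant names and log2 fold changes are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        return 0.0  # a flat profile has no defined correlation; treat as none
    return sum(a * b for a, b in zip(dx, dy)) / denom

def closest_known_mutant(unknown, library):
    """Name of the library profile best correlated with the unknown profile."""
    return max(library, key=lambda name: pearson(unknown, library[name]))

# Hypothetical log2 fold changes (mutant vs. wild type) for five metabolites.
LIBRARY = {
    "known_mutant_1": [2.0, 1.5, 0.1, -0.2, 0.0],
    "known_mutant_2": [0.1, -0.1, -1.8, 2.2, 0.3],
}
UNKNOWN = [1.8, 1.2, 0.0, -0.1, 0.2]

closest = closest_known_mutant(UNKNOWN, LIBRARY)
print(closest)
```

A match of this kind only suggests where in metabolism the unknown gene acts; it is a hypothesis to test, not a functional assignment.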

Caution: This approach may not be so useful for dissecting metabolic responses to normal environmental variations (e.g. in nutrient level, soil aeration, salinity, water supply). There is good reason from MCA theory and from observation to expect such variations to cause relatively little change in metabolite levels. This is because all enzymes in affected pathways tend to be up- or down-regulated together (Fell, 2005).

Two key drawbacks of clustering and other current data mining methods are:

- Typically, they detect only simple, one-to-one linear relationships. They do not detect non-linear or multi-input relationships, which are common in biology.

- They do not assign confidence levels, so it is not clear which clusters are trustworthy when the input data are not well separated.

Cluster Analysis (CA)

CA is a set of statistical methods that group similar data together. The group (‘cluster’) members have certain properties in common and the resultant classification can yield new insights. The classification reduces the dimensionality of a data set. Data are presented in dendrograms that emphasize natural groupings.
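
A minimal sketch of how such grouping works, assuming agglomerative clustering with Euclidean distance and average linkage (the handout does not say which variant was used for its figures). The sample names and metabolite levels are invented. A dendrogram records the full sequence of merges; this sketch simply stops when a chosen number of clusters remains.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two metabolite profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(profiles[i], profiles[j]) for i in cluster_a for j in cluster_b]
    return sum(dists) / len(dists)

def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [(name,) for name in profiles]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], profiles))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return clusters

# Hypothetical relative levels of three metabolites in four samples.
PROFILES = {
    "wild_type_1": [1.0, 1.0, 1.0],
    "wild_type_2": [1.1, 0.9, 1.0],
    "line_SP_1": [3.0, 0.2, 1.1],
    "line_SP_2": [3.2, 0.3, 0.9],
}

groups = sorted(sorted(g) for g in agglomerate(PROFILES, 2))
print(groups)
```

Note that the result carries no confidence level, which is one of the drawbacks listed earlier.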

Profiling example: Fig. 8 – Dendrogram of the metabolic profiles of transgenic potato tubers and tubers incubated in a range of glucose concentrations (0 to 500 mM). Note that:

- The glucose-fed samples form a cluster that is nearer the cluster of wild-type samples than any of the transgenics.

- Independent transgenic lines carrying the same transgene (e.g., the four ‘SP’ lines) tend to cluster together (the principle of ‘guilt by association’).

Principal Component Analysis (PCA)

PCA uses all the metabolite data from a sample to compute an individual metabolic profile that is then compared to all the other profiles. In essence, PCA takes the resulting cloud of data points and rotates it such that the maximum variability is visible – i.e. the extraction of principal components amounts to a variance maximizing rotation of the original variable space. PCA finds the vectors (‘principal components’) that give the best overall sample separation.

The data can be represented as two- or three-dimensional plots in which the axes (principal components or vectors) are those that include as much as possible of the total information derived from metabolic variances.
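
The variance-maximizing rotation can be illustrated with a small sketch that finds only the first principal component, using power iteration on the covariance matrix. This is a teaching simplification, not how statistical packages usually compute PCA, and the data matrix (rows are samples, columns are metabolites) is invented.

```python
import math

def center(rows):
    """Subtract each column (metabolite) mean from the data."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered_rows):
    """Sample covariance matrix of centered data (columns = metabolites)."""
    n = len(centered_rows)
    cols = list(zip(*centered_rows))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols] for ci in cols]

def first_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of cov, found by power iteration."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        new = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    eigenvalue = sum(v * sum(c * w for c, w in zip(row, vec)) for v, row in zip(vec, cov))
    return vec, eigenvalue

# Hypothetical levels of three metabolites in four samples.
DATA = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
]

centered = center(DATA)
cov = covariance(centered)
pc1, variance_pc1 = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance_pc1 / total_variance          # fraction of variance on PC1
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
print(round(explained, 3))
```

The `explained` fraction is the quantity quoted when a plot is said to account for a given percentage of the total variance; `scores` are the sample coordinates along that axis.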

Profiling example: Fig. 9 – Clusters found after PCA analysis of the same data set for potato tubers as above. Note that:

- The two components chosen account together for 69% of the total metabolic variance, i.e. only about 1/3 (31%) of the original variation has been lost during data reduction.

- As before, the glucose-fed samples form a cluster that is nearer the cluster of wild-type samples than any of the transgenics.

- Again, independent transgenic lines carrying the same transgene (e.g., the four ‘SP’ lines) tend to cluster together.

Simple Correlations

Computer-generated pairwise plots of every metabolite in the data set against every other metabolite can be informative. But when hundreds of metabolites are analyzed the potential number of such plots is very large – many thousands – and most of them will show no relationship.
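
To put a number on this: n metabolites give n(n-1)/2 distinct pairs. The sketch below counts the pairs and then screens them by Pearson correlation so that only strongly related pairs need to be plotted. The 0.9 threshold, the function names, and the metabolite levels are illustrative assumptions. Because Pearson correlation measures linear association, a screen like this can miss non-linear relationships.

```python
import math
from itertools import combinations

def n_pairwise_plots(n_metabolites):
    """Number of distinct metabolite pairs: n * (n - 1) / 2."""
    return math.comb(n_metabolites, 2)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    return 0.0 if denom == 0 else sum(a * b for a, b in zip(dx, dy)) / denom

def strong_pairs(levels, threshold=0.9):
    """Metabolite pairs whose |r| across samples meets the threshold."""
    return [(a, b) for a, b in combinations(sorted(levels), 2)
            if abs(pearson(levels[a], levels[b])) >= threshold]

# Hypothetical levels of three metabolites across five samples.
LEVELS = {
    "glucose-6-phosphate": [1.0, 2.0, 3.0, 4.0, 5.0],
    "fructose-6-phosphate": [0.3, 0.6, 0.9, 1.2, 1.5],
    "metabolite_X": [2.0, 0.5, 3.0, 0.4, 2.1],
}

print(n_pairwise_plots(300))
print(strong_pairs(LEVELS))
```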

Profiling examples: Fig. 10 – correlations between pairs of metabolites among transgenic potato tubers. Note:

- The linear correlation (Frame A) between glucose-6-phosphate and fructose-6-phosphate levels. These metabolites are interconvertible by phosphoglucose isomerase, which catalyzes a near-equilibrium reaction. A linear relation is thus predicted.

- The non-linear correlation between methionine and lysine levels (Frame C), in which lysine accumulates continuously but methionine reaches a plateau. This is expected because methionine synthesis is under tighter feedback and feedforward control than lysine synthesis.

Metabolomics Resources

http://fiehnlab.ucdavis.edu/

Oliver Fiehn’s group at UC Davis. Includes databases.

http://www.noble.org/plantbio/MS/metabolomics.html

Lloyd Sumner’s group at the Noble Foundation. Useful short summary of analytical approaches and bioinformatics involved in metabolomics.

http://dbkgroup.org/default.htm

Douglas Kell’s group at University of Manchester – a gateway site with explanations of metabolic profiling technologies and links to other useful sites.

Useful values (for interpreting metabolite concentration data):

In typical plant tissues, dry weight is ~10% of fresh weight (so that there is ~ 0.9 ml of water per gram fresh weight)

In very rough terms, the cytoplasmic volume is 10% of the total tissue water volume. (‘Cytoplasm’ includes mitochondria, plastids, peroxisomes, nucleus, and cytosol.) The vacuolar volume is 70% of total water, and extracellular water is 20%. The extracellular water compartment is also termed the apoplast; the cytoplasmic + vacuole (i.e. intracellular) water compartment is also termed the symplast.

Plant leaves typically have a protein content of ~20% of dry weight. N content × 6.25 = protein content (i.e. protein is ~16% N). The free amino acid content of plant tissues is usually only a few percent of the protein-bound amino acid content.

The osmotic potential of a typical plant cell is ~ -10 bars. A 1 molar solution of a sugar or other non-dissociating solute has an osmotic potential of ~ -25 bars; that of a 1 molar solution of a salt such as NaCl is ~ -45 bars. Thus the intracellular accumulation of high concentrations of small molecules or salts has osmotic implications.
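
The values above can be combined into a rough calculator: convert a metabolite content per gram fresh weight into a concentration, then estimate its osmotic contribution with the van 't Hoff relation (osmotic potential = -φ·i·C·R·T). The function names, the 10 µmol per gram example, the 25 °C temperature, and the osmotic coefficient of 0.9 used for NaCl are assumptions for illustration; an ideal solution (φ = 1) would give about -50 bars for 1 M NaCl rather than the ~ -45 bars quoted.

```python
WATER_ML_PER_G_FW = 0.9          # ~0.9 ml water per gram fresh weight
COMPARTMENT_FRACTION = {"cytoplasm": 0.10, "vacuole": 0.70, "apoplast": 0.20}
R_L_BAR_PER_MOL_K = 0.083145     # gas constant in L·bar/(mol·K)

def concentration_mM(content_umol_per_g_fw, compartment=None):
    """Concentration (mM) if the metabolite is spread over all tissue water,
    or confined entirely to one named compartment."""
    volume_ml = WATER_ML_PER_G_FW
    if compartment is not None:
        volume_ml *= COMPARTMENT_FRACTION[compartment]
    return content_umol_per_g_fw / volume_ml   # umol per ml equals mM

def osmotic_potential_bar(molar, particles=1, osmotic_coefficient=1.0, temp_c=25.0):
    """van 't Hoff estimate of osmotic potential in bars (negative by convention)."""
    return (-osmotic_coefficient * particles * molar
            * R_L_BAR_PER_MOL_K * (temp_c + 273.15))

sugar_1M = osmotic_potential_bar(1.0)
nacl_1M = osmotic_potential_bar(1.0, particles=2, osmotic_coefficient=0.9)  # assumed coefficient

# Hypothetical metabolite at 10 umol per g fresh weight.
whole_tissue = concentration_mM(10.0)
in_cytoplasm = concentration_mM(10.0, "cytoplasm")
psi_cytoplasm = osmotic_potential_bar(in_cytoplasm / 1000.0)   # mM -> M

print(round(sugar_1M, 1), round(nacl_1M, 1))
print(round(whole_tissue, 1), round(in_cytoplasm, 1), round(psi_cytoplasm, 2))
```

The same content gives a tenfold higher concentration if the metabolite is confined to the cytoplasm, which is why compartmentation matters when judging osmotic effects.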
