Know your data

RTI International
Designing a high quality metabolomics
experiment
Grier P Page Ph.D.
Senior Statistical Geneticist
RTI International
Atlanta Office
gpage@rti.org
770-407-4907
RTI International is a trade name of Research Triangle Institute.
www.rti.org
Metabolomics is Powerful and Central
Designing a good study
Errors Errors Everywhere
[Figure: UMSA analysis. Labels: Day 1, Day 2; Insulin Resistant, Insulin Sensitive]
Primary consideration of good experimental design
Understand the strengths and weaknesses of each step of the experiment.
Take these strengths and weaknesses into account in your design.
From Drug Discov Today. 2005 Sep 1;10(17):1175-82.
State the Question and Articulate the Goals
The Myth That Metabolomics Does Not Need a Hypothesis
There always needs to be a biological question in the experiment. If there is not even a question, don’t bother.
The question can be nebulous: What happens to the metabolome of this tissue when I apply Drug A?
The purpose of the question is to drive the experimental design.
Make sure the samples answer the question: cause vs. effect.
Design Issues
Known sources of non-biological error (not exhaustive) that must be addressed:
– Technician / post-doc
– Reagent lot
– Temperature
– Protocol
– Date
– Location
– Cage / field positions
Experimental Design
Biological replication is essential.


Two types of replication
– Biological replication – samples from different
individuals are analyzed
– Technical replication – same sample measured
repeatedly
Technical replicates allow only the effects of
measurement variability to be estimated and reduced,
whereas biological replicates allow this to be done for
both measurement variability and biological differences
between cases. Almost all experiments that use
statistical inference require biological replication.
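The difference between the two kinds of replication can be illustrated with the standard variance decomposition for a nested design. The sketch below is only an illustration: the function name and the variance components (`bio_var` between individuals, `tech_var` between repeated measurements of one sample) are hypothetical and do not come from the slides.

```python
def variance_of_group_mean(bio_var, tech_var, n_bio, n_tech):
    """Variance of a group mean with n_bio individuals, each measured n_tech times.

    Assumes independent biological and measurement errors (nested design):
    Var(mean) = bio_var / n_bio + tech_var / (n_bio * n_tech).
    """
    if n_bio < 1 or n_tech < 1:
        raise ValueError("need at least one biological and one technical replicate")
    return bio_var / n_bio + tech_var / (n_bio * n_tech)


# Hypothetical variance components, chosen only for illustration.
baseline = variance_of_group_mean(1.0, 1.0, n_bio=4, n_tech=1)
more_technical = variance_of_group_mean(1.0, 1.0, n_bio=4, n_tech=100)
more_biological = variance_of_group_mean(1.0, 1.0, n_bio=8, n_tech=1)
print(baseline, more_technical, more_biological)
```

Under these assumed values, adding technical replicates shrinks only the measurement term, so the variance never drops below `bio_var / n_bio`, whereas doubling the number of individuals halves both terms.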
How many replicates?
Controlled experiments (cell lines, mice, rats): 8-12 per group.
Human studies, discovery: 20+ per group.
Predictive models: 100+ per group; separate model-building and validation sets are needed.
The more the better, always.
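For a single comparison, a rough per-group sample size can be obtained from the textbook normal approximation for a two-sample test. This is a minimal sketch, not the basis of the rules of thumb above: the function name, the default `alpha` and `power`, and the example effect sizes are assumptions, and a study testing many metabolites would need a stricter per-test threshold.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (difference in means / SD).
    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / effect_size) ** 2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical standardized effect sizes, for illustration only.
for d in (0.5, 1.0, 1.5):
    print(d, n_per_group(d))
```

Smaller effects or stricter thresholds push the required number up quickly, which is consistent with the advice that more replicates are always better.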
Experimental Conduct
All experiments are subject to non-biological variability that can confound any study.
Control Everything!
Know what you are doing
Practice!
Practice!
What if you can’t control everything or make all things uniform?
Randomize
Orthogonalize
What are Orthogonalization and Randomization?
Orthogonalization: spreading the biological sources of error evenly across the non-biological sources of error.
– Maximally powerful for known sources of error.
Randomization: spreading the biological sources of error at random across the non-biological sources of error.
– Useful for controlling for unknown sources of error.
Examples of Orthogonalization and Randomization
The experiment
Sample #   Treatment   Variety
1          1           1
2          1           2
3          1           1
4          1           2
5          2           1
6          2           2
7          2           1
8          2           2
Orthogonalize / Randomize (columns: Order, Sample). Values listed: 1-8; 1-8; 7, 6, 4, 1, 2, 8, 5, 3; 1, 2, 5, 6, 8, 7, 4, 3
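Both schemes can be sketched in a few lines for the eight-sample experiment above (two treatments crossed with two varieties). This is an illustration under assumptions, not the procedure used for the slide: the function names, the seeds, and the choice to balance run order within blocks of four runs are all hypothetical.

```python
import random

# The eight samples from the example: (sample number, treatment, variety).
SAMPLES = [(1, 1, 1), (2, 1, 2), (3, 1, 1), (4, 1, 2),
           (5, 2, 1), (6, 2, 2), (7, 2, 1), (8, 2, 2)]


def randomized_order(samples, seed):
    """Randomization: a plain random permutation of the samples."""
    order = list(samples)
    random.Random(seed).shuffle(order)
    return order


def orthogonalized_order(samples, seed):
    """Orthogonalization (assumed form): each block of consecutive runs
    contains every treatment x variety combination exactly once."""
    rng = random.Random(seed)
    by_combo = {}
    for sample in samples:
        by_combo.setdefault((sample[1], sample[2]), []).append(sample)
    for members in by_combo.values():
        rng.shuffle(members)  # which replicate goes into which block
    n_blocks = min(len(members) for members in by_combo.values())
    order = []
    for block_index in range(n_blocks):
        block = [members[block_index] for members in by_combo.values()]
        rng.shuffle(block)  # run order within the block
        order.extend(block)
    return order


print([s[0] for s in randomized_order(SAMPLES, seed=1)])
print([s[0] for s in orthogonalized_order(SAMPLES, seed=1)])
```

The blocked order protects against a known nuisance factor that drifts with run order, while the plain shuffle guards against sources of error nobody thought to list.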
Statistical analyses have assumptions too
Statistical analyses
Supervised analyses – linear models etc.
– Assume IID (independent and identically distributed)
– Normality
– Sometimes can rely on the central limit theorem
– ‘Weird’ variances
Using fold change alone as a statistic is not valid.
– ‘Shrinkage’ and/or use of Bayes can be a good thing.
False-discovery rate is a good alternative to conventional multiple-testing approaches.
Pathway testing is desirable.
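One widely used false-discovery-rate procedure is the Benjamini-Hochberg step-up adjustment. The slides do not say which FDR method is intended, so the following is a minimal sketch of that one procedure; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four metabolites.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Features whose adjusted value falls below the chosen FDR level (for example 0.05) are reported, which is usually less conservative than a Bonferroni-style correction when many metabolites are tested.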
Classification
Supervised classification
– Supervised-classification procedures require independent cross-validation.
– See MAQC-II recommendations: Nat Biotechnol. 2010 August; 28(8): 827–838. doi:10.1038/nbt.1665.
– Wholly separate model-building and validation stages. This can be a three-stage process with multiple models tested.
Unsupervised classification
– Unsupervised classification should be validated using resampling-based procedures.
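The key point of cross-validation is that every modelling step, including feature selection, sees only the training part of each split. A minimal k-fold sketch follows; `kfold_indices`, `cross_validate`, and the `fit` and `predict` callables are hypothetical names, and this internal cross-validation does not replace the wholly separate validation set described above.

```python
import random


def kfold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Cross-validated accuracy; fit() only ever sees training samples."""
    correct = 0
    for train, test in kfold_indices(len(y), k, seed):
        # Any feature selection or tuning belongs inside fit(), so that
        # the held-out samples never influence the model.
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(y)


# Toy usage with a hypothetical majority-class "model".
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


print(cross_validate(list(range(10)), [0] * 10, fit_majority, predict_majority))
```

Selecting features on the full data set before splitting leaks information into the test folds and inflates the apparent accuracy.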
Unsupervised classification - continued
Unsupervised analysis methods
– Cluster analysis
– Principal components
– Separability analysis
All have assumptions and input parameters, and changing them results in very different answers.
Sample size estimation for metabolomics
studies
There is strength in numbers: power and sample size
Unsupervised analyses
– Principal components, clustering, heat maps, and variants
– These are actually data transformations or data displays rather than hypothesis tests, so it is unclear whether sample size estimation is appropriate or even possible.
– Stability of clustering may be appropriate to think about. Garge et al. 2005 suggested 50+ samples for any stability.
Sample size in supervised experiments
Supervised analyses
– Linear models and variants
– Methods are still evolving, but we suggest that the approach we developed for microarrays may be appropriate for metabolomics (being evaluated).
Metabolomics does not reveal everything and
different technologies show different things

Technology and detection evolve over time.
Technologies are not in perfect agreement
The human urine metabolome
Sample, Image and Data Quality Checking
Metabolite quality


Still an evolving field.
RTI is one of the Metabolomics Reference Standards Synthesis Centers.
Know your data - What should it look like
These are OK
These are not OK
One bad sample can contaminate an
experiment
Histogram of p-values
Potentially Bad Data
Histogram of p-values with bad
data removed
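A p-value histogram of this kind is easy to produce as a routine check. When most features are null, p-values are roughly uniform, so the usual healthy shape is flat with a possible spike near zero; other shapes are a reason to look for bad samples or violated assumptions. The sketch below is illustrative only, and the function names and bin count are assumptions.

```python
def pvalue_histogram(pvalues, n_bins=10):
    """Count p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        # p == 1.0 is placed in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


def print_histogram(counts):
    """Print one text bar per bin."""
    n_bins = len(counts)
    for i, count in enumerate(counts):
        low, high = i / n_bins, (i + 1) / n_bins
        print(f"{low:.1f}-{high:.1f} {'#' * count}")


# Hypothetical p-values, for illustration only.
print_histogram(pvalue_histogram([0.001, 0.004, 0.02, 0.3, 0.45, 0.62, 0.8, 0.97]))
```

Comparing the histogram before and after dropping a suspect sample shows whether that sample was distorting the whole analysis.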
Quality of Database, Bioinformatics and
Interpretative tools
Understand what databases include, don’t include, and assume
Just because a database says something does not mean it is right. Read the evidence.
Databases are biased.
Databases are incomplete.
Databases have lots of data.
Understand data before you use it.
Databases are useful!
Issues in the Annotation of Genes, proteins,
metabolites
Annotation is inconsistent across sources
Gene Symbol    p-value     fc 50/21
Aco2           0.746656    0.955755
Pdk2           0.967577    1.005459
Pdk2           0.823635    1.02781
Pdha2          0.368075    1.403263
Idh1           0.710704    0.994378
Acly           0.367315    0.982691
Aco2           1.22E-06    0.561041
Fh1            6.76E-06    0.690515
Atp5g3         1.53E-06    0.754735
Suclg1         8.87E-07    0.694384
Mdh1           5.92E-09    0.519311
Mor1           4.24E-07    0.617645
Idh1           2.36E-06    0.677013
Idh3g          2.19E-06    0.709971
Dlst           2.49E-07    0.688339
Sdhd           5.13E-07    0.583485
Sdhc           1.82E-06    0.64108
RGD:735073     2.13E-07    0.570307
Cs             1.56E-07    0.560436
RGD:621624     1E-06       0.486736
Idh3B          2.57E-07    0.694389
Mdh1           1.08E-05    0.496911
Pc             1.91E-05    0.468765
RGD:708561     0.004002    0.76777
RGD:708561     0.03978     0.686511
Dlat           4.76E-06    0.435534
Sdhd           1.3E-06     0.64335
Sdha           7.85E-06    0.730667
Idh3a          0.000449    0.690147
Pdk4           0.044616    1.700116
Cs             1.36E-06    0.592128
Acly           0.000227    0.554459
Annotation columns: Gene Ontology Biological Process, Gene Ontology Cellular Component, Pathway.
Annotation values listed: 6085 // acetyl-CoA biosynthesis; 6086 // acetyl-CoA biosynthesis from pyruvate; 6094 // gluconeogenesis; 6096 // glycolysis; 6099 // tricarboxylic acid cycle; 6121 // mitochondrial electron transport, succinate to ubiquinone; 9352 // dihydrolipoyl dehydrogenase complex; 5622 // intracellular; 5739 // mitochondrion; 5749 // respiratory chain complex II (sensu Eukaryota); 5829 // cytosol; 5913 // cell-cell adherens junction; Krebs-TCA_Cycle // Ge; Fatty_Acid_Synthesis. Some entries are blank (----) or carry evidence notes (inferred from electronic annotation; inferred from sequence or structu).
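Repeated gene symbols with different results are easy to miss by eye, so a quick programmatic check can help. The sketch below uses a few rows from the listing above, pairing each gene symbol with the p-value at the same position, which is an assumption about how the flattened table lines up. The function name and the 0.05 threshold are likewise hypothetical.

```python
from collections import defaultdict

# (gene symbol, p-value) pairs, assuming symbols and p-values align by position.
ROWS = [
    ("Aco2", 0.746656), ("Aco2", 1.22e-06),
    ("Idh1", 0.710704), ("Idh1", 2.36e-06),
    ("Acly", 0.367315), ("Acly", 0.000227),
    ("Mdh1", 5.92e-09), ("Mdh1", 1.08e-05),
]


def conflicting_symbols(rows, alpha=0.05):
    """Return symbols whose rows disagree on significance at level alpha."""
    calls = defaultdict(set)
    for symbol, p_value in rows:
        calls[symbol].add(p_value < alpha)
    return sorted(symbol for symbol, seen in calls.items() if len(seen) > 1)


print(conflicting_symbols(ROWS))
```

A symbol flagged this way is worth tracing back to the underlying identifiers and annotation sources before it is used in a pathway analysis.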
Issues with pathway data
TCA cycle from Ingenuity
TCA cycle from GenMAPP
TCA cycle from Ingenuity
Share Your Data
Use shared data!
Metabolomics WorkBench

http://www.metabolomicsworkbench.org/
MetaboLights
Overshare your data and show your work
Practice compendium research to allow others to replicate your work.
Many high-profile omic studies are not even technically reproducible.
Use metabolomics databases
Limited in the literature so far. Some work on tissue and species metabolomes.
Summary
Design your experiment well.
Conduct your experiment well.
Control for non-biological sources of error.
Know what is good and bad quality data at each stage, including metabolite, image, data, and annotation.
If you are aware of these issues and control for them, highly powerful and reproducible metabolite experimentation is possible.
Otherwise you get garbage.
Share your data and use shared data.
References
The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray based predictive models. Nat Biotechnol. 2010 Aug;28(8):827–838.
Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006 Jan;7(1):55-65.
Baggerly K. "Disclose all data in publications." Nature. 2010 Sep 23;467(7314):401. PMID: 20864982.
Repeatability of published microarray gene expression analyses. Nat Genet. 2009 Feb;41(2):149-55.
A design and statistical perspective on microarray gene expression studies in nutrition: the need for playful creativity and scientific hard-mindedness. Nutrition. 2003 Nov-Dec;19(11-12):997-1000.
39 Steps. From Drug Discov Today. 2005 Sep 1;10(17):1175-82.
If time allows
RTI Regional Comprehensive
Metabolomics Resource Core
(RTI RCMRC)
Susan Sumner, PhD
Director RTI RCMRC
Discovery Sciences
Proteomics and Metabolomics Programs
RTI International
Contact Information for the RTI RCMRC
Susan C.J. Sumner, PhD
Director RTI RCMRC
Senior Scientist nanoSafety
RTI International
Discovery Sciences
3040 Cornwallis Drive
Research Triangle Park
North Carolina 27709
ssumner@rti.org
919-541-7479 (office)
919-622-4456 (cell)
Jason P. Burgess, PhD
Program Coordinator,
RTI RCMRC
Associate Director,
Discovery Sciences
RTI International
3040 Cornwallis Drive
Research Triangle Park
North Carolina 27709
jpb@rti.org
919-541-6700 (office)
MS and NMR Instruments at RTI and DHMRI
Mass Spectrometers (38): LC-MS, GC-MS, GC x GC-TOF-MS, ICP-MS, MALDI ToF/ToF
NMR (6)
Counts listed for the RTI and DHMRI columns: 13, 4, 1, 6, 2, 6, 3, 1, 1, 1, 2, 4
Some RTI Metabolomics Applications and Pilots
Experience with adolescent and adult human subject research, animal model and cell-based research, e.g.,
– Apoptosis - cells
– Drug-induced liver injury - animal models
– In utero exposure to chemicals and fetal imprinting - animal models
– Dietary exposure and imprinting - animal models
– NAFLD - pediatric obesity; microbiome
– Weight loss - pediatric obesity
– Preterm delivery - human subjects
– Response to vaccine - human subjects
– Nicotine withdrawal - human subjects
– Colon cancer - human subjects
Pilot and Feasibility Studies
The aim of the pilot and feasibility program is to foster collaborations and promote the use of metabolomics.
Studies will be selected through an application process.
– The application involves an abstract, a description of the samples available (matrix type, volume, type and duration of storage, sample processing, freeze-thaw cycles, etc.), a description of phenotypes, and a plan for subsequent grant/contract submissions for metabolomics analysis beyond the initial pilot study.
Applications may also include technology development.
Applicants must agree to deposit data in DRCC, coauthor publications, and submit joint grant/contract proposals.
Deadlines are being defined.