RTI International

Designing a high quality metabolomics experiment

Grier P Page, Ph.D.
Senior Statistical Geneticist
RTI International, Atlanta Office
gpage@rti.org
770-407-4907
RTI International is a trade name of Research Triangle Institute. www.rti.org

Metabolomics is Powerful and Central

Designing a good study

Errors Errors Everywhere

[Figure: UMSA analysis; panels: Day 1, Day 2, Insulin Resistant, Insulin Sensitive]

Primary consideration of good experimental design
Understand the strengths and weaknesses of each step of the experiment. Take these strengths and weaknesses into account in your design.
[Figure source: Drug Discov Today. 2005 Sep 1;10(17):1175-82.]

State the Question and Articulate the Goals

The Myth That Metabolomics Does Not Need a Hypothesis
There always needs to be a biological question behind the experiment. If there is not even a question, don't bother.
The question can be nebulous, e.g.: What happens to the metabolome of this tissue when I apply Drug A?
The purpose of the question is to drive the experimental design.
Make sure the samples can answer the question: cause vs. effect.

Design Issues
Known sources of non-biological error (not exhaustive) that must be addressed:
– Technician / post-doc
– Reagent lot
– Temperature
– Protocol
– Date
– Location
– Cage / field positions

Experimental Design
Biological replication is essential.
Two types of replication:
– Biological replication: samples from different individuals are analyzed.
– Technical replication: the same sample is measured repeatedly.
Technical replicates allow only the effects of measurement variability to be estimated and reduced, whereas biological replicates allow this to be done for both measurement variability and biological differences between cases. Almost all experiments that use statistical inference require biological replication.

How many replicates?
– Controlled experiments (cell lines, mice, rats): 8-12 per group.
– Human studies, discovery: 20+ per group.
– Predictive models: 100+ per group, with separate model-building and validation sets.
The more the better, always.

Experimental Conduct
All experiments are subject to non-biological variability that can confound any study.

Control Everything!
Know what you are doing. Practice! Practice!

What if you can't control or make all things uniform?
– Randomize
– Orthogonalize

What are Orthogonalization and Randomization?
– Orthogonalization: spreading the biological sources of variation (e.g., treatment groups) evenly across the non-biological sources of error. Maximally powerful for known sources of error.
– Randomization: spreading the biological sources of variation at random across the non-biological sources of error. Useful for controlling for unknown sources of error.

Examples of Orthogonalization and Randomization
                                 Orthogonalize      Randomize
Sample #   Treatment   Variety   Order   Sample     Order   Sample
1          1           1         1       1          1       7
2          1           2         2       2          2       6
3          1           1         3       5          3       4
4          1           2         4       6          4       1
5          2           1         5       8          5       2
6          2           2         6       7          6       8
7          2           1         7       4          7       5
8          2           2         8       3          8       3

Statistical analyses have assumptions too
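The orthogonalized and randomized run orders in the example can be generated programmatically. The sketch below is a minimal illustration, assuming the 2 treatment x 2 variety layout of the example with two samples per cell; the function names (`orthogonalized_order`, `randomized_order`, `mean_run_position`) are hypothetical, and the ABBA-style counterbalancing is one simple way to orthogonalize, not necessarily how the slide's order was built (it does reproduce the order 1, 2, 5, 6, 8, 7, 4, 3). Mean run position per group is a quick check that run order, a proxy for instrument drift, is not confounded with treatment.

```python
import random

# Hypothetical layout matching the example: samples 1-4 get treatment 1,
# samples 5-8 get treatment 2, and variety alternates 1, 2 within each.
SAMPLES = [
    {"id": i + 1, "treatment": 1 if i < 4 else 2, "variety": 1 if i % 2 == 0 else 2}
    for i in range(8)
]


def orthogonalized_order(samples):
    """ABBA-style counterbalancing: each treatment x variety cell runs once in
    each half of the batch, and the cell sequence is reversed in the second
    half so early and late run positions are shared evenly by all groups.
    Assumes exactly two samples per cell."""
    cells = {}
    for s in samples:
        cells.setdefault((s["treatment"], s["variety"]), []).append(s["id"])
    keys = sorted(cells)  # (1, 1), (1, 2), (2, 1), (2, 2)
    first_half = [cells[k][0] for k in keys]
    second_half = [cells[k][1] for k in reversed(keys)]
    return first_half + second_half


def randomized_order(samples, seed):
    """Shuffle the run order with a recorded seed so the layout is reproducible."""
    ids = [s["id"] for s in samples]
    random.Random(seed).shuffle(ids)
    return ids


def mean_run_position(order, samples, factor):
    """Average 1-based run position for each level of `factor`."""
    position = {sample_id: p for p, sample_id in enumerate(order, start=1)}
    by_level = {}
    for s in samples:
        by_level.setdefault(s[factor], []).append(position[s["id"]])
    return {level: sum(ps) / len(ps) for level, ps in sorted(by_level.items())}


if __name__ == "__main__":
    sequential = [s["id"] for s in SAMPLES]
    ortho = orthogonalized_order(SAMPLES)
    print("sequential:", sequential, mean_run_position(sequential, SAMPLES, "treatment"))
    print("orthogonalized:", ortho, mean_run_position(ortho, SAMPLES, "treatment"))
    print("randomized:", randomized_order(SAMPLES, seed=2024))
```

Running the samples in numeric order gives treatment 1 a mean run position of 2.5 versus 6.5 for treatment 2, so any drift over the run is confounded with treatment; the orthogonalized order gives 4.5 for both treatments and both varieties.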
Statistical analyses
Supervised analyses: linear models, etc.
– Assume IID (independently and identically distributed) errors
– Normality (can sometimes rely on the central limit theorem)
– 'Weird' variances
Using fold change alone as a statistic is not valid.
– 'Shrinkage' and/or Bayesian methods can be a good thing.
– False-discovery rate is a good alternative to conventional multiple-testing approaches.
– Pathway testing is desirable.

Classification
Supervised classification
– Supervised-classification procedures require independent cross-validation.
– See the MAQC-II recommendations (Nat Biotechnol. 2010 Aug;28(8):827-38. doi:10.1038/nbt.1665).
– Wholly separate model-building and validation stages.
– Can be three-stage, with multiple models tested.
Unsupervised classification
– Unsupervised classification should be validated using resampling-based procedures.

Unsupervised classification (continued)
Unsupervised analysis methods:
– Cluster analysis
– Principal components
– Separability analysis
All have assumptions and input parameters, and changing them can produce very different answers.

Sample size estimation for metabolomics studies

There is strength in numbers: power and sample size
Unsupervised analyses (principal components, clustering, heat maps and variants)
– These are really data transformations or data displays rather than hypothesis tests, so it is unclear whether sample size estimation is appropriate or even possible.
– Stability of clustering may be worth considering; Garge et al. (2005) suggested 50+ samples for any stability.
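The recommendation to control the false-discovery rate can be made concrete with the Benjamini-Hochberg step-up procedure, the most widely used FDR method. The sketch below is a minimal, dependency-free version assuming independent (or positively dependent) tests; `benjamini_hochberg` and `discoveries` are hypothetical helper names, and in practice an established implementation such as R's `p.adjust(method = "BH")` or `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` would be preferable.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Sort p-values ascending, scale the one at rank k by m / k, then take the
    running minimum from the largest rank down so adjusted values stay
    monotone and never exceed 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def discoveries(pvalues, fdr=0.05):
    """Indices of tests declared significant at the given FDR level."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= fdr]


if __name__ == "__main__":
    p = [0.001, 0.02, 0.03, 0.5]
    print(benjamini_hochberg(p))
    print(discoveries(p, fdr=0.05))
```

With these four p-values the adjusted values are 0.004, 0.04, 0.04 and 0.5 (up to floating-point rounding), so the first three metabolites are declared discoveries at a 5% FDR.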
Sample size in supervised experiments
Supervised analyses (linear models and variants)
– Methods are still evolving, but we suggest the approach we developed for microarrays may be appropriate for metabolomics (being evaluated).

Metabolomics does not reveal everything, and different technologies show different things

Technology and detection evolve over time.

Technologies are not in perfect agreement.

The human urine metabolome

Sample, Image and Data Quality Checking

Metabolite quality
Still an evolving field. RTI is one of the Metabolomics Reference Standards Synthesis Centers.

Know your data: what should it look like?
[Figure: example data panels labeled "These are OK" and "These are not OK"]

One bad sample can contaminate an experiment
[Figure: "Histogram of p-values" with potentially bad data; "Histogram of p-values with bad data removed"]

Quality of Database, Bioinformatics and Interpretative Tools

Understand what databases include, what they don't include, and their assumptions
– Just because a database says something does not mean it is right. Read the evidence.
– Databases are biased.
– Databases are incomplete.
– Databases have lots of data.
– Understand data before you use it.
– Databases are useful!

Issues in the Annotation of Genes, Proteins, Metabolites
Annotation is inconsistent across sources

Gene symbol   p-value    Fold change (50/21)
Aco2          0.746656   0.955755
Pdk2          0.967577   1.005459
Pdk2          0.823635   1.02781
Pdha2         0.368075   1.403263
Idh1          0.710704   0.994378
Acly          0.367315   0.982691
Aco2          1.22E-06   0.561041
Fh1           6.76E-06   0.690515
Atp5g3        1.53E-06   0.754735
Suclg1        8.87E-07   0.694384
Mdh1          5.92E-09   0.519311
Mor1          4.24E-07   0.617645
Idh1          2.36E-06   0.677013
Idh3g         2.19E-06   0.709971
Dlst          2.49E-07   0.688339
Sdhd          5.13E-07   0.583485
Sdhc          1.82E-06   0.64108
RGD:735073    2.13E-07   0.570307
Cs            1.56E-07   0.560436
RGD:621624    1E-06      0.486736
Idh3B         2.57E-07   0.694389
Mdh1          1.08E-05   0.496911
Pc            1.91E-05   0.468765
RGD:708561    0.004002   0.76777
RGD:708561    0.03978    0.686511
Dlat          4.76E-06   0.435534
Sdhd          1.3E-06    0.64335
Sdha          7.85E-06   0.730667
Idh3a         0.000449   0.690147
Pdk4          0.044616   1.700116
Cs            1.36E-06   0.592128
Acly          0.000227   0.554459

The Gene Ontology Biological Process, Gene Ontology Cellular Component, and Pathway columns are empty for many of these genes; the entries that are present include "Krebs-TCA_Cycle", "Fatty_Acid_Synthesis", "6099 // tricarboxylic acid cycle", "6086 // acetyl-CoA biosynthesis from pyruvate", "6096 // glycolysis", "6094 // gluconeogenesis", "5739 // mitochondrion", "5829 // cytosol", "5622 // intracellular", and "5913 // cell-cell adherens junction".

Issues with pathway data
[Figure: TCA cycle from Ingenuity; TCA cycle from GenMAPP]

Share Your Data
Use shared data!

Metabolomics Workbench: http://www.metabolomicsworkbench.org/

MetaboLights

Overshare your data and show your work
– Practice compendium research, to allow others to replicate your work.
– Many high-profile omic studies are not even technically reproducible.

Use metabolomics databases
Their use is limited in the literature so far. There is some work on tissue and species metabolomes.
Summary
– Design your experiment well.
– Conduct your experiment well.
– Control for non-biological sources of error.
– Know what good and bad quality data look like at each stage, including metabolite, image, data, and annotation.
– If you are aware of these issues and control for them, highly powerful and reproducible metabolomics experimentation is possible. Otherwise you get garbage.
– Share your data and use shared data.

References
– The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol. 2010 Aug;28(8):827-38.
– Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006 Jan;7(1):55-65.
– Baggerly K. Disclose all data in publications. Nature. 2010 Sep 23;467(7314):401. PMID: 20864982.
– Repeatability of published microarray gene expression analyses. Nat Genet. 2009 Feb;41(2):149-55.
– A design and statistical perspective on microarray gene expression studies in nutrition: the need for playful creativity and scientific hard-mindedness. Nutrition. 2003 Nov-Dec;19(11-12):997-1000.
– 39 Steps. Drug Discov Today. 2005 Sep 1;10(17):1175-82.

If time allows

RTI Regional Comprehensive Metabolomics Resource Core (RTI RCMRC)
Susan Sumner, PhD, Director, RTI RCMRC
Discovery Sciences, Proteomics and Metabolomics Programs, RTI International

Contact Information for the RTI RCMRC
Susan C.J. Sumner, PhD
Director, RTI RCMRC
Senior Scientist, nanoSafety
RTI International, Discovery Sciences
3040 Cornwallis Drive, Research Triangle Park, North Carolina 27709
ssumner@rti.org
919-541-7479 (office)
919-622-4456 (cell)

Jason P. Burgess, PhD
Program Coordinator, RTI RCMRC
Associate Director, Discovery Sciences
RTI International
3040 Cornwallis Drive, Research Triangle Park, North Carolina 27709
jpb@rti.org
919-541-6700 (office)

MS and NMR Instruments at RTI and DHMRI
Instrument         RTI   DHMRI
LC-MS              13    4
GC-MS              1     6
GC x GC-TOF-MS     2     6
ICP-MS             3     1
MALDI ToF/ToF      1     1
NMR                2     4
Totals: 38 mass spectrometers, 6 NMR.

Some RTI Metabolomics Applications and Pilots
Experience with adolescent and adult human subject research, animal model research, and cell-based research, e.g.:
– Apoptosis: cells
– Drug-induced liver injury: animal models
– In utero exposure to chemicals and fetal imprinting: animal models
– Dietary exposure and imprinting: animal models
– NAFLD: pediatric obesity; microbiome
– Weight loss: pediatric obesity
– Preterm delivery: human subjects
– Response to vaccine: human subjects
– Nicotine withdrawal: human subjects
– Colon cancer: human subjects

Pilot and Feasibility Studies
The aim of the pilot and feasibility program is to foster collaborations and promote the use of metabolomics. Studies will be selected through an application process.
– The application involves an abstract; a description of the samples available (matrix type, volume, type and duration of storage, sample processing, freeze-thaws, etc.); a description of phenotypes; and a plan for subsequent grant/contract submissions for metabolomics analysis beyond the initial pilot study.
– Applications may also include technology development.
– Applicants must agree to deposit data in the DRCC, coauthor publications, and submit joint grant/contract proposals.
– Deadlines are being defined.