ArrayTrack: Data management, analysis and interpretation tool for DNA microarray and beyond

ArrayTrack: A brief history of the 5-year development cycle
• AT version 1 (2001)
  – Filter array; data management tool
• AT version 2 (2002): in-house microarray core facility
  – Customized two-color arrays; data management, analysis and interpretation
  – Opened to the public (late 2003)
• AT version 3.1 (2004): VGDS
  – Affymetrix; analysis capability enhanced
• AT version 3.2 (2005): MAQC
  – Tested on 7 commercial platforms (Affy, Agilent one- and two-color arrays, ABI, CodeLink, Illumina …)
  – Integrated with other software (IPA, MetaCore, DrugMatrix, CEBS, SAS/JMP …)
• AT version 4 (2006 – present)
  – CDISC/SEND standard
  – VGDS → VXDS

ArrayTrack: Client-Server Architecture
[Diagram labels]
• CLIENT / SERVER
• Analysis Tools
• Study data (clinical and non-clinical data): CDISC/SEND
• Microarray, Proteomics, Metabolomics: MIAME
• Public data (gene annotation, pathways …): NCBI, KEGG, GO …

ArrayTrack: An Integrated Solution
[Diagram: ArrayTrack at the center, integrating clinical and nonclinical data, microarray data, chemical data, proteomics data, metabolomics data and public data]

ArrayTrack Website
http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/

ArrayTrack: MicroarrayDB-LIB-TOOL
An integrated environment for microarray data management, analysis and interpretation
• Microarray DB: data uploading
• TOOL: exploring, gene selection
• LIB: interpretation
  – Pathways
  – GO

ArrayTrack for Microarray Data Management and Analysis
[Diagram labels: Hypothesis, Exp Design, Microarray Exp]
ArrayTrack components:
• Data management: Microarray DB
• Data analysis: GeneTools
• Data interpretation: GeneLib

MicroarrayDB: Storing data associated with a microarray exp
Microarray database:
• Handles both one- and two-channel data, including Affy data
• Only the CEL file is required for Affy data
• Supports toxicogenomics research by storing tox parameters, e.g., dose schedule and treatment, sacrifice time
• MIAME supportive, to capture the key data of a microarray experiment
• Will be MAGE-ML compliant to ensure interchangeability between ArrayTrack and other public databases

LIB Component: Containing functional information for microarray data interpretation
Functional data:
• Individual gene analysis
• Pathway-based analysis
• Gene Ontology-based analysis
• Linking expression data to the traditional toxicological data
[Diagram labels: Microarray DB, LIB, Mirrored Databases, Public Databases, Human Genome Project]

TOOL Component: Containing functionality for microarray data analysis
Analysis tools:
• Four normalization methods
  – Mean/median scaling for Affy data
  – LOWESS for 2-color arrays
• Gene selection methods
  – T-test, permutation t-test, …
  – Filtering using fold changes, intensity, flag information …
  – Volcano plot, p-value plot …
• Data exploring (e.g., HCA, PCA)
• Many visualization tools (e.g., flexible scatter plot, bar chart viewer, …)

Supporting Eight Platforms
• Affy, Agilent, ABI, Combimatrix, Eppendorf, GE Healthcare, Illumina and customized arrays
• Affy data
  – Probe data (.cel file)
  – Probe-set data
• Individual hyb import
• Batch import

[Workflow diagram: Importing data, Normalization, Gene Selection, Data exploring, Interpretation; TOOL, Microarray DB, LIB]
• Importing data: data uploading and QC
• Normalization: four normalization methods, including LOWESS
• Data exploring: PCA, scatter plot, 2-way HCA, expression pattern using the bar chart plot
• Gene selection: significant genes can be identified based on a cut-off of p-value (with or without Bonferroni correction), fold-change, intensity or combinations thereof
  – Volcano Plot (considering both p and fold-change)
  – P-Value Plot (considering false positives/negatives)
• Interpretation: Gene Ontology analysis, individual gene analysis, pathway analysis

Data Interpretation: GO-based analysis using GOFFA
• GOFFA: Gene Ontology For Functional Analysis
• It is developed based on the Gene Ontology (GO) database
• Important for grouping genes into functional classes
• GO: three ontologies
  – Molecular function: activities performed by individual gene products at the molecular level, such as catalytic activity, transporter activity, binding
  – Biological process: broad biological goals accomplished by ordered assemblies of molecular functions, such as cell growth, signal transduction, metabolism
  – Cellular component: the place in the cell where a gene product is found, such as nucleus, ribosome, proteasome
[Diagram labels: Array domain (TOOL, Microarray DB); Study domain (TOOL, Study DB); LIB]

Data Interpretation
• GOFFA: Gene Ontology-based tool
• Pathway-based tools:
  – Ingenuity Pathways Analysis
  – KEGG
  – PathArt

Ingenuity Pathways Analysis (IPA)
• Interrogate genes or proteins on an “omics” scale
• Conduct statistical analysis
• Elucidate functional pathways
  – KEGG and PathArt provide canonical pathways
  – IPA provides both canonical and de novo pathways
• Understand markers of efficacy and safety

Review Tool for Pharmacogenomics Data Submission: ArrayTrack
ArrayTrack components:
• Microarray DB (data repository): receive the data; support future regulatory policy
• Tool (analysis): analyze the data
• Lib (interpretation): verify the biological interpretation

Future Direction: Toxicoinformatics Integrated System (TIS)
• Tools: GeneTools, ProteinTools, PathwayTools
• Databases: Microarray DB, Proteomics DB, Metabonomics DB
• Libraries: GeneLib, ProteinLib, PathwayLib, ToxicantLib

ArrayTrack: Summary
• An integrated solution for microarray data management, analysis and interpretation
• Review tool for FDA pharmacogenomics data submission
  – A training course is provided to FDA reviewers every two months
  – At present, ~40 reviewers have been trained
• Freely available to the public (http://edkb.fda.gov/webstart/arraytrack)
• Users at big pharma, academic and government institutions; U.S., Europe & Asia

ArrayTrack Tutorial
Topics / Contents
1. (Basic) Comparing two groups (e.g., treated vs control groups)
   • Statistical methods (t-test, permutation t-test, ANOVA) for group comparison
   • Differentially Expressed Genes (DEGs) identification
   • Biological interpretation (individual gene analysis) using LIB
   • Pathway analysis (KEGG, PathArt, IPA, MetaCore, Key Molnet)
   • Gene Ontology analysis using GOFFA
2. Comparing multiple groups (e.g., multiple doses, time points)
3. VennDiagram
   • Determine the common genes/pathways/functions shared by two or three gene lists (extended to cross-experiment and cross-platform comparison and systems biology)
   • Apply VennDiagram to external files
4. Data exploring tools
   • Principal Component Analysis (PCA)
   • Hierarchical Cluster Analysis (HCA)
   • Apply HCA and PCA to external files
   • Extensive features in HCA
5. Assessing gene expression profiles using BarChart
   • Access BarChart from the TOOL box
   • Access BarChart from the t-test result table
   • Access BarChart from ChipLib and other Libs
   • How to use BarChart for cross-experiment comparison
   • Assign group by color
6. GeneList: an important concept in ArrayTrack
   • Create a gene list through data filtering and statistical analysis
   • Import/export a gene list
   • Conduct normalization filtered by a gene list
   • Conduct statistical analysis (t-test/ANOVA, PCA, HCA and others) based on a gene list
   • Export a dataset by specifying the gene list (extended for cross-platform and cross-experiment comparison)
7. Normalization methods
   • For Affymetrix platform: MAS5, RMA, DChip, Plier, Plier+16
   • For other platforms: 7 methods (e.g., LOWESS)
8. How to create your own workspace
   • Copy/paste/duplicate an experiment
9. Import/Export
   • Manual import and batch import
   • Options for data exporting
   • Export a selected dataset by specifying a sub-list of genes
   • Export multiple experiments and/or platforms using selected gene ID types (e.g., RefSeq)
10. Other useful functions
   • Correlation matrix
   • IDConverter: converts one gene ID to another (e.g., from AffyID to AgilentID, GenBank# or LocusLinkID, or vice versa)
   • ScatterPlot: pair-wise plot
   • JoinTable: combines two tables
   • SplitTable: if a table contains data for multiple hybridizations in columns with genes in rows, the function splits the table into individual tables, each with data for a single hybridization
   • GetUniqueID: if a table contains duplicated IDs, the function picks out the unique IDs
11. Basic scripting for querying (raw and normalized) data and tables
   • Query data from the database tree (how to use *), e.g., *EST, EST*, *EST*=EST
   • Query data in tables, e.g., contain, like (%) and inlist
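
The wildcard and table queries in topic 11 can be illustrated with a minimal Python sketch. It is not ArrayTrack's scripting syntax or implementation. It assumes that * in a tree query and % in a like query each stand for any run of characters, as in shell globs and SQL LIKE, and that matching is case-sensitive. The function names (wildcard_match, select, contain, like, inlist) and the sample names are hypothetical.

```python
import re


def wildcard_match(value, pattern, wildcard="*"):
    """Return True if value matches pattern in full.

    Assumption: the wildcard stands for any run of characters
    (including none); every other character is taken literally.
    """
    literal_parts = pattern.split(wildcard)
    regex = ".*".join(re.escape(part) for part in literal_parts)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def select(names, pattern):
    """Keep the names that match a *-style pattern, preserving order."""
    return [name for name in names if wildcard_match(name, pattern)]


def contain(value, fragment):
    """Table query 'contain': the fragment occurs anywhere in the value."""
    return fragment in value


def like(value, pattern):
    """Table query 'like': same matching as above, with % as the wildcard."""
    return wildcard_match(value, pattern, wildcard="%")


def inlist(value, allowed):
    """Table query 'inlist': the value equals one of the listed items."""
    return value in set(allowed)


# Hypothetical experiment names, for illustration only.
names = ["EST", "EST_liver", "rat_EST", "rat_EST_24h", "kidney"]

print(select(names, "*EST"))   # names ending in EST
print(select(names, "EST*"))   # names starting with EST
print(select(names, "*EST*"))  # names containing EST
```

Under these assumptions, *EST selects names that end in EST, EST* selects names that start with EST, and *EST* selects names that contain EST anywhere.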