Introduction slides on ArrayTrack

ArrayTrack
A data management, analysis and interpretation tool for DNA microarrays and beyond
ArrayTrack – A brief history of the past five years
Development Cycle
• AT version 1 (2001)
– Filter arrays; data management tool
• AT version 2 (2002): in-house microarray core facility
– Customized two-color arrays; data management, analysis and interpretation
– Opened to the public (late 2003)
• AT version 3.1 (2004): VGDS
– Affymetrix support; enhanced analysis capability
• AT version 3.2 (2005): MAQC
– Tested on 7 commercial platforms (Affy, Agilent one- and two-color arrays, ABI, CodeLink, Illumina …)
– Integrated with other software (IPA, MetaCore, DrugMatrix, CEBS, SAS/JMP …)
• AT version 4 (2006 – present)
– CDISC/SEND standard
– VGDS → VXDS
ArrayTrack: Client-Server Architecture
[Figure: the CLIENT hosts the analysis tools; the SERVER stores study data (clinical and non-clinical data, CDISC/SEND), microarray, proteomics and metabolomics data (MIAME), and public data (gene annotation, pathways …) from NCBI, KEGG, GO …]
ArrayTrack: An Integrated Solution
[Figure: ArrayTrack at the center, integrating clinical and nonclinical data, microarray data, chemical data, proteomics data, metabolomics data and public data]
ArrayTrack Website
http://www.fda.gov/nctr/science/centers/toxicoinformatics/ArrayTrack/
ArrayTrack: MicroarrayDB-LIB-TOOL
- An integrated environment for microarray data management, analysis
and interpretation
[Figure: data uploading into MicroarrayDB; exploring and gene selection in TOOL; interpretation (pathways, GO) in LIB]
ArrayTrack for Microarray Data Management and Analysis
[Figure: hypothesis → experimental design → microarray experiment → ArrayTrack components: data management (MicroarrayDB), data analysis (GeneTools), data interpretation (GeneLib)]
MicroarrayDB
– Storing data associated with a microarray experiment
Microarray database:
• Handles both one- and two-channel data, including Affy data
• Only the CEL file is required for Affy data
• Supports toxicogenomics research by storing toxicological parameters, e.g., dose schedule, treatment and sacrifice time
• MIAME-supportive, capturing the key data of a microarray experiment
• Will be MAGE-ML compliant to ensure interchangeability between ArrayTrack and other public databases
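The record below shows what storing toxicology parameters next to the array data can look like. It is a minimal sketch of one hybridization record. The class name `Hybridization`, its field names and the units are hypothetical illustrations, not ArrayTrack's actual MicroarrayDB schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Hybridization:
    """One hybridization plus its toxicology context (hypothetical schema)."""

    name: str
    platform: str                             # e.g. "Affy", "Agilent", "Illumina"
    channels: int                             # 1 = one-channel, 2 = two-channel array
    data_file: str                            # for Affy data, the CEL file
    treatment: Optional[str] = None           # compound or vehicle
    dose_mg_per_kg: Optional[float] = None    # unit is an assumption
    dose_schedule: Optional[str] = None       # e.g. "single dose", "daily x 5"
    sacrifice_time_h: Optional[float] = None  # hours after last dose (assumed unit)
    miame: Dict[str, str] = field(default_factory=dict)  # free-form MIAME-style notes

    def __post_init__(self) -> None:
        # Only the one- and two-channel cases described above are accepted.
        if self.channels not in (1, 2):
            raise ValueError(f"channels must be 1 or 2, got {self.channels}")


hyb = Hybridization(
    name="liver_high_24h_rep1",
    platform="Affy",
    channels=1,
    data_file="liver_high_24h_rep1.CEL",
    treatment="compound X",
    dose_mg_per_kg=100.0,
    dose_schedule="single dose",
    sacrifice_time_h=24.0,
    miame={"organism": "Rattus norvegicus", "tissue": "liver"},
)
print(hyb.platform, hyb.treatment, hyb.sacrifice_time_h)
```

Keeping dose and sacrifice time as typed fields, rather than free text, is what later allows expression data to be grouped or filtered by those parameters.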
LIB Component
– Containing functional information for microarray data interpretation
Functional data:
• Individual gene analysis
• Pathway-based analysis
• Gene Ontology-based analysis
• Linking expression data to traditional toxicological data
[Figure: LIB draws on mirrored public databases (e.g., Human Genome Project resources) and links them to MicroarrayDB]
TOOL Component
- Containing functionality for microarray data analysis
Analysis tools:
• Four normalization methods
– Mean/median scaling for Affy data
– LOWESS for two-color arrays
• Gene selection methods
– T-test, permutation t-test, …
– Filtering using fold changes, intensity, flag information, …
– Volcano plot, p-value plot, …
• Data exploring (e.g., HCA, PCA)
• Many visualization tools (e.g., flexible scatter plot, bar chart viewer, …)
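Mean/median scaling, listed above for Affy data, multiplies every intensity on an array by one factor so that all arrays share the same mean or median. The sketch below is in plain Python and assumes each array is a list of positive intensities. `scale_arrays` and its default target are hypothetical, and the target value ArrayTrack uses is not stated in these slides.

```python
from statistics import mean, median


def scale_arrays(arrays, target=None, statistic=median):
    """Global scaling: rescale each array so its mean or median equals `target`.

    arrays:    list of per-hybridization intensity lists (positive values)
    target:    common value to scale to; defaults to the median of the
               per-array statistics
    statistic: `median` (default) or `mean`
    """
    centers = [statistic(a) for a in arrays]
    if target is None:
        target = median(centers)
    # One multiplicative factor per array, applied to every intensity on it.
    return [[x * target / c for x in a] for a, c in zip(arrays, centers)]


chip_a = [100.0, 200.0, 400.0]
chip_b = [50.0, 100.0, 200.0]
print(scale_arrays([chip_a, chip_b], target=150.0))
# [[75.0, 150.0, 300.0], [75.0, 150.0, 300.0]]
```

Passing `statistic=mean` gives mean scaling instead. Because only one factor is applied per array, the relative differences between genes on the same array are preserved.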
Supporting Eight Platforms
Individual hybridization import
• Affy, Agilent, ABI, CombiMatrix, Eppendorf, GE Healthcare, Illumina and customized arrays
• Affy data
– Probe data (.cel file)
– Probe-set data
[Workflow figure: importing data (individual or batch import) → normalization → gene selection → data exploring → interpretation, across the TOOL, MicroarrayDB and LIB components]
[Workflow figure: data uploading and QC → normalization (four normalization methods, including LOWESS) → gene selection → data exploring (PCA, 2-way HCA, scatter plot; expression patterns in the bar chart plot) → interpretation]
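LOWESS for two-color arrays is commonly applied on the MA scale. M = log2(R/G) is regressed on A = (log2 R + log2 G)/2, and the fitted curve is subtracted to remove intensity-dependent dye bias. The sketch below assumes that setup. It runs a single pass of tricube-weighted local linear regression, without the robustness iterations of full LOWESS. `lowess_fit`, `lowess_normalize` and `frac` are hypothetical names, and ArrayTrack's implementation may differ in span and details.

```python
import math


def lowess_fit(x, y, frac=0.5):
    """Single-pass LOWESS: tricube-weighted local linear fit at every x."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbourhood size
    fitted = []
    for xi in x:
        # Window half-width = distance to the k-th nearest point (including xi).
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12
        w = [(1 - min(1.0, abs(xj - xi) / h) ** 3) ** 3 for xj in x]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def lowess_normalize(red, green, frac=0.5):
    """Return log-ratios M with the intensity-dependent LOWESS trend removed."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


a_vals = [6, 7, 8, 9, 10, 11, 12]
m_vals = [0.5 + 0.1 * a for a in a_vals]  # simulated linear dye bias
red = [2 ** (a + m / 2) for a, m in zip(a_vals, m_vals)]
green = [2 ** (a - m / 2) for a, m in zip(a_vals, m_vals)]
corrected = lowess_normalize(red, green)
print(max(abs(v) for v in corrected))  # near zero once the trend is removed
```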
Significant genes can be identified based on:
• Cut-off of p-value (with or without Bonferroni correction), fold change, intensity or combinations thereof
• Volcano plot (considering both p-value and fold change)
• P-value plot (considering false positives/negatives)
Interpretation: Gene Ontology analysis, individual gene analysis, pathway analysis
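The cut-off criteria above can be combined in a few lines. This sketch assumes each gene comes with a raw p-value and a treated/control fold change. The Bonferroni correction multiplies each p-value by the number of genes tested, capped at 1. The volcano plot places each gene at (log2 fold change, -log10 p). The function names, gene names and default cut-offs (p ≤ 0.05, at least 2-fold) are illustrative assumptions.

```python
import math


def bonferroni(pvalues):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def select_genes(genes, alpha=0.05, min_fold=2.0, correct=True):
    """genes: dict name -> (p_value, fold_change), fold_change = treated/control."""
    names = list(genes)
    pvals = [genes[g][0] for g in names]
    if correct:
        pvals = bonferroni(pvals)
    selected = []
    for name, p in zip(names, pvals):
        fc = genes[name][1]
        # Keep genes passing both the p-value and the |log2 fold change| cut-off.
        if p <= alpha and abs(math.log2(fc)) >= math.log2(min_fold):
            selected.append(name)
    return selected


def volcano_points(genes):
    """Volcano-plot coordinates: (log2 fold change, -log10 p-value) per gene."""
    return {g: (math.log2(fc), -math.log10(p)) for g, (p, fc) in genes.items()}


genes = {
    "geneA": (0.0001, 4.0),
    "geneB": (0.0001, 1.2),
    "geneC": (0.03, 0.25),
    "geneD": (0.5, 3.0),
}
print(select_genes(genes))                 # ['geneA']
print(select_genes(genes, correct=False))  # ['geneA', 'geneC']
```

With raw p-values, geneC passes (p = 0.03, 4-fold down). After correction for four tests its p-value becomes 0.12, so it is dropped. This is the trade-off the p-value plot is meant to expose.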
Data Interpretation
- GO-based analysis using GOFFA
• GOFFA – Gene Ontology For Functional Analysis
• Developed based on the Gene Ontology (GO) database
• Useful for grouping genes into functional classes
• GO – Three ontologies
– Molecular function: activities performed by individual gene
products at the molecular level, such as catalytic activity,
transporter activity, binding
– Biological process: broad biological goals accomplished by
ordered assemblies of molecular functions, such as cell growth,
signal transduction, metabolism
– Cellular component: the place in the cell where a gene product is
found, such as nucleus, ribosome, proteasome
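The slides do not state which statistic GOFFA uses. One common way to ask whether a GO term is over-represented among significant genes is a one-sided hypergeometric test. The sketch below shows that technique, which is not necessarily GOFFA's, with hypothetical counts. It needs Python 3.8+ for `math.comb`.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes on the array
    K: genes on the array annotated to the GO term
    n: significant genes
    k: significant genes annotated to the GO term
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of seeing k or more annotated genes by chance.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical example: 20 of 200 significant genes carry a term that
# annotates 500 of 10,000 genes on the array (about 10 expected by chance).
p = hypergeom_enrichment_p(N=10000, K=500, n=200, k=20)
print(f"{p:.2e}")
```

Because many GO terms are tested at once, the resulting p-values would normally be corrected for multiple testing in the same way as the gene-level p-values.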
[Figure: array domain (TOOL, MicroarrayDB) and study domain (TOOL, Study DB), both connected to LIB]
Data Interpretation
GOFFA: Gene Ontology-based tool
Pathway-based tools:
• Ingenuity Pathways Analysis
• KEGG
• PathArt
[Workflow figure: importing data → normalization → gene selection → data exploring → interpretation (gene annotation)]
Ingenuity Pathways Analysis (IPA)
[Figure: IPA workflow: interrogate genes or proteins on an “omics” scale → conduct statistical analysis → elucidate functional pathways → understand markers of efficacy and safety]
• KEGG and PathArt provide canonical pathways
• IPA provides both canonical and de novo pathways
Review Tool for Pharmacogenomics Data Submission: ArrayTrack
[Figure: ArrayTrack components mapped to review tasks: MicroarrayDB (data repository) to receive the data and support future regulatory policy; TOOL (analysis) to analyze the data; LIB (interpretation) to verify the biological interpretation]
Future Direction - Toxicoinformatics Integrated System (TIS)
[Figure: TIS components: GeneTools, ProteinTools and PathwayTools; MicroarrayDB, ProteomicsDB and MetabonomicsDB; GeneLib, ProteinLib, ToxicantLib and PathwayLib]
ArrayTrack – Summary
• An integrated solution for microarray data management,
analysis and interpretation
• Review tool for FDA pharmacogenomics data submission
– A training course is provided to FDA reviewers every two months
– At present, ~40 reviewers have been trained
• Freely available to the public (http://edkb.fda.gov/webstart/arraytrack)
• Users at big pharma, academic and government institutions in the U.S., Europe and Asia
ArrayTrack Tutorial
Topics
1. (Basic) Comparing two groups (e.g., treated vs. control groups)
– Statistical methods (t-test, permutation t-test, ANOVA) for group comparison
– Differentially expressed gene (DEG) identification
– Biological interpretation (individual gene analysis) using LIB
– Pathway analysis (KEGG, PathArt, IPA, MetaCore, KeyMolnet)
– Gene Ontology analysis using GOFFA
2. Comparing multiple groups (e.g., multiple doses, time points)
3. VennDiagram
– Determine the common genes/pathways/functions shared by two or three gene lists (extended to cross-experiment and cross-platform comparison and systems biology)
– Apply VennDiagram to external files
4. Data exploring tools
– Principal Component Analysis (PCA)
– Hierarchical Cluster Analysis (HCA)
– Apply HCA and PCA to external files
– Extensive features in HCA
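Topic 1 lists the t-test and the permutation t-test for comparing two groups. The sketch below shows both for a single gene. It assumes Welch's (unequal-variance) t statistic and a two-sided p-value from random shuffles of the group labels. ArrayTrack's exact variant and number of permutations are not given here, and the expression values are illustrative.

```python
import math
import random
from statistics import mean, variance


def welch_t(x, y):
    """Welch's two-sample t statistic (unequal variances).

    Assumes each group has at least two values that are not all identical.
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def permutation_t_test(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the Welch t statistic."""
    rng = random.Random(seed)  # seeded for reproducible results
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        if abs(welch_t(pooled[:n_x], pooled[n_x:])) >= observed:
            extreme += 1
    # +1 in numerator and denominator counts the observed labelling itself.
    return (extreme + 1) / (n_perm + 1)


treated = [8.1, 8.4, 8.9, 8.7, 9.0]  # illustrative log2 intensities for one gene
control = [6.9, 7.2, 7.0, 7.4, 7.1]
print(round(welch_t(treated, control), 2))
print(permutation_t_test(treated, control, n_perm=2000))
```

The permutation version makes no normality assumption. With only five samples per group, however, it can never produce a p-value below about 2/252, the fraction of labellings that are as extreme as a perfect split.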
5. Assessing gene expression profiles using BarChart
– Access BarChart from the TOOL box
– Access BarChart from the t-test result table
– Access BarChart from ChipLib and other Libs
– How to use BarChart for cross-experiment comparison
– Assign groups by color
6. GeneList: an important concept in ArrayTrack
– Create a gene list through data filtering and statistical analysis
– Import/export a gene list
– Conduct normalization filtered by a gene list
– Conduct statistical analysis (t-test/ANOVA, PCA, HCA and others) based on a gene list
– Export a dataset by specifying the gene list (extended for cross-platform and cross-experiment comparison)
7. Normalization methods
– For the Affymetrix platform: MAS5, RMA, dChip, PLIER, PLIER+16
– For other platforms: 7 methods (e.g., LOWESS)
8. How to create your own workspace
– Copy/paste/duplicate an experiment
9. Import/export
– Manual import and batch import
– Options for data exporting
– Export a selected dataset by specifying a sub-list of genes
– Export multiple experiments and/or platforms using selected gene ID types (e.g., RefSeq)
10. Other useful functions
– Correlation matrix
– IDConverter: converts one gene ID type to another (e.g., from AffyID to AgilentID, GenBank accession or LocusLinkID, or vice versa)
– ScatterPlot: pair-wise plot
– JoinTable: combines two tables
– SplitTable: if a table holds data from multiple hybridizations in columns with genes in rows, splits it into individual tables, each with a single hybridization's data
– GetUniqueID: if a table contains duplicated IDs, picks out the unique IDs
11. Basic scripting for querying (raw and normalized) data and tables
– Query data from the database tree (how to use *), e.g., *EST, EST*, *EST*=EST
– Query data in tables, e.g., contain, like (%) and inlist
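Topic 11 shows wildcard patterns such as *EST, EST* and *EST* for querying the database tree, and like (%) for tables. This sketch assumes `*` (or `%`) matches any run of characters and translates a pattern into an anchored regular expression. The node names and the `query` helper are hypothetical, and ArrayTrack's own query syntax may differ, for example in case sensitivity.

```python
import re


def wildcard_to_regex(pattern, wildcard="*"):
    """Compile a pattern in which `wildcard` matches any run of characters."""
    # Escape the literal pieces so characters like '.' or '(' are not special.
    parts = (re.escape(p) for p in pattern.split(wildcard))
    return re.compile(".*".join(parts) + r"\Z", re.DOTALL)


def query(names, pattern, wildcard="*"):
    """Return the names that match the whole pattern."""
    rx = wildcard_to_regex(pattern, wildcard)
    return [n for n in names if rx.match(n)]


names = ["EST_liver_day1", "liver_EST", "kidney_EST_day3", "Control_day1"]
print(query(names, "*EST"))                # ['liver_EST']
print(query(names, "EST*"))                # ['EST_liver_day1']
print(query(names, "*EST*"))               # ['EST_liver_day1', 'liver_EST', 'kidney_EST_day3']
print(query(names, "%day1", wildcard="%")) # ['EST_liver_day1', 'Control_day1']
```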