miame

Introduction and Applications
of Microarray Databases
Chen-hsiung Chan
Department of Computer Science and Information
Engineering
National Taiwan University
MIAME (Minimum Information
About a Microarray Experiment)
 MIAME describes the Minimum
Information About a Microarray
Experiment that is needed to enable
the interpretation of the results of the
experiment unambiguously and
potentially to reproduce the
experiment. [Brazma et al., Nature
Genetics]
MIAME
 raw data (CEL or GPR files)
 final processed (normalized) data
 essential sample annotation including
experimental factors and their values
 experimental design including sample
data relationships
 sufficient annotation of the array
 essential laboratory and data
processing protocols
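The checklist above can be sketched as a small completeness check. This is only an illustration: the key names and the `missing_miame_elements` helper are hypothetical and are not part of MIAME or of any database's submission tool.

```python
# The six MIAME elements from the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
MIAME_ELEMENTS = {
    "raw_data": "raw data (CEL or GPR files)",
    "processed_data": "final processed (normalized) data",
    "sample_annotation": "essential sample annotation",
    "experimental_design": "experimental design",
    "array_annotation": "sufficient annotation of the array",
    "protocols": "essential laboratory and data processing protocols",
}


def missing_miame_elements(submission):
    """Return the keys of MIAME elements that are absent or empty."""
    return [key for key in MIAME_ELEMENTS if not submission.get(key)]


# A made-up submission: two elements are missing and one is empty.
submission = {
    "raw_data": ["sample1.CEL", "sample2.CEL"],
    "processed_data": "normalized.txt",
    "sample_annotation": {"tissue": "lung"},
    "protocols": "",
}

for key in missing_miame_elements(submission):
    print("missing:", MIAME_ELEMENTS[key])
```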
Databases using MIAME
 ArrayExpress at EBI
 GEO at NCBI
 CIBEX at DDBJ
ArrayExpress
http://www.ebi.ac.uk/microarray-as/aer/
 Stores transcriptomics and related
data
 Data warehouse stores gene indexed
expression profiles
 In accordance with MGED
recommendations: MIAME
ArrayExpress statistics
 Experiment repository: 2,914
experiments (each with at least 6
microarrays) and growing
 Expression profiles: including 267
experiments, 121,891 genes
 Data warehouse updated every day
Searching ArrayExpress
 Keywords: breast cancer, cell cycle,
etc.
 Accession numbers: E-XXXX-d, e.g.
E-AFFY-1281, E-TIGR-372, etc.
 Secondary accession numbers: GEO
accession, e.g. GSE5389.
 Species names: mainly Latin names
(e.g. Homo sapiens); common names
may be used as well (e.g. human).
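The query types above can be told apart by pattern. A minimal sketch, assuming the E-XXXX-d form means "E-", four letters, "-", then digits, and that a GEO series accession is "GSE" followed by digits. The `classify_query` function is hypothetical and is not part of ArrayExpress.

```python
import re

# Assumed patterns, inferred from the examples on the slide.
ARRAYEXPRESS_ACCESSION = re.compile(r"E-[A-Z]{4}-\d+")
GEO_SERIES_ACCESSION = re.compile(r"GSE\d+")


def classify_query(query):
    """Guess which kind of ArrayExpress search term a string is."""
    term = query.strip().upper()
    if ARRAYEXPRESS_ACCESSION.fullmatch(term):
        return "accession"
    if GEO_SERIES_ACCESSION.fullmatch(term):
        return "secondary accession"
    # Keywords and species names cannot be separated by pattern alone.
    return "keyword or species name"


for query in ["E-AFFY-1281", "GSE5389", "breast cancer", "Homo sapiens"]:
    print(query, "->", classify_query(query))
```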
ArrayExpress interface
ArrayExpress Search/Browse Result
Keyword: lung cancer
ArrayExpress Search/Browse Result
Detailed view
Expression Profile results
 Thumbnail view
 BigPlot view
 Gene ranking (experiments with the
strongest differential expression are
ranked first)
 Similarity search: search for genes with
similar expression levels
Take a break…
Gene Expression Omnibus (GEO)
http://www.ncbi.nlm.nih.gov/geo/
 Gene expression/molecular
abundance repository
 MIAME compliant
 Supports browsing, query and
retrieval
GEO record types
 Platform
 Sample
 Series
 DataSet
 Profile
GEO Platform
 Platform record defines the list of elements
that may be detected and quantified in that
experiment (e.g., cDNAs, oligonucleotide
probesets)
 Each Platform record is assigned a unique
and stable GEO accession number (GPLxxx)
 A Platform may reference many Samples
that have been submitted by multiple
submitters
GEO Sample
 Sample record describes the conditions
under which an individual Sample was
handled, the manipulations it underwent,
and the abundance measurement of each
element derived from it
 Each Sample record is assigned a unique
and stable GEO accession number (GSMxxx)
 A Sample entity must reference only one
Platform and may be included in multiple
Series
GEO Series
 A Series record links together a group of
related Samples and provides a focal point
and description of the whole study
 Series records may also contain tables
describing extracted data, summary
conclusions, or analyses
 Each Series record is assigned a unique and
stable GEO accession number (GSExxx)
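The Platform, Sample and Series records and their accession prefixes can be sketched as follows. This is an illustration only: the classes and the `record_type` helper are hypothetical, and the accession numbers GPL1, GSM1, GSE1 and GSE2 are placeholders, not references to real records.

```python
from dataclasses import dataclass, field

# Accession prefixes named on the slides above.
PREFIXES = {"GPL": "Platform", "GSM": "Sample", "GSE": "Series"}


def record_type(accession):
    """Map a GEO accession such as 'GSE5389' to its record type."""
    prefix, digits = accession[:3].upper(), accession[3:]
    if prefix in PREFIXES and digits.isdigit():
        return PREFIXES[prefix]
    raise ValueError(f"not a Platform/Sample/Series accession: {accession!r}")


@dataclass
class Sample:
    accession: str
    platform: str  # a Sample references exactly one Platform


@dataclass
class Series:
    accession: str
    samples: list = field(default_factory=list)  # related Samples


# One Sample may be included in multiple Series.
sample = Sample("GSM1", platform="GPL1")
series_a = Series("GSE1", [sample])
series_b = Series("GSE2", [sample])

print(record_type("GSE5389"))        # Series
print(record_type(sample.platform))  # Platform
```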
GEO DataSet
 Assembled in NCBI
 Samples are all equivalently
measured and normalized
 Can be viewed and analyzed with
NCBI’s advanced data display and
analysis tool
GEO Profile
 Profile consists of the expression
measurements for an individual gene
across all Samples in a DataSet
 Profiles can be searched using Entrez
GEO Profiles
 Similar to Expression Profile in
ArrayExpress
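A Profile can be pictured as one row of a DataSet matrix. The sketch below uses a made-up DataSet. The gene names, sample names and values are placeholders, and the `profile` helper is hypothetical, not a GEO or Entrez function.

```python
# A made-up DataSet: one list of values per gene, one column per Sample.
dataset = {
    "samples": ["sample_1", "sample_2", "sample_3"],
    "values": {
        "geneA": [1.0, 2.0, 3.0],
        "geneB": [5.0, 5.0, 4.0],
    },
}


def profile(dataset, gene):
    """Return the measurements of one gene across all Samples."""
    return dict(zip(dataset["samples"], dataset["values"][gene]))


print(profile(dataset, "geneA"))
```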
SOFT (Simple Omnibus Format in
Text)
 Text based
 Line based
 Easily parsed with text processing
languages, including Perl, Python,
Ruby, PHP, etc.
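Because SOFT is line based, a parser can dispatch on the first character of each line. A minimal sketch, assuming the usual SOFT prefixes: "^" starts a new entity, "!" is an attribute, "#" describes a table column, and any other line is a tab-delimited table row. The `parse_soft` function and the example record are hypothetical, and the identifiers in the example are placeholders.

```python
def parse_soft(text):
    """Parse SOFT text into a list of entity dictionaries."""
    entities = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("^"):
            # Entity line, e.g. "^SAMPLE = accession".
            kind, _, name = line[1:].partition("=")
            current = {"type": kind.strip(), "name": name.strip(),
                       "attributes": {}, "columns": {}, "rows": []}
            entities.append(current)
        elif current is None:
            continue  # ignore anything before the first entity
        elif line.startswith("!"):
            key, _, value = line[1:].partition("=")
            key = key.strip()
            if key.lower().endswith(("_table_begin", "_table_end")):
                continue  # table markers carry no value
            current["attributes"].setdefault(key, []).append(value.strip())
        elif line.startswith("#"):
            key, _, value = line[1:].partition("=")
            current["columns"][key.strip()] = value.strip()
        else:
            # Table row; the first row is the column header.
            current["rows"].append(line.split("\t"))
    return entities


example_soft = "\n".join([
    "^SAMPLE = GSM_example",
    "!Sample_title = example sample",
    "#ID_REF = probe identifier",
    "#VALUE = normalized signal",
    "!sample_table_begin",
    "ID_REF\tVALUE",
    "probe_1\t10.5",
    "!sample_table_end",
])

for entity in parse_soft(example_soft):
    print(entity["type"], entity["name"], len(entity["rows"]))
```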
Take a break…
Network Biology Visualization
and Analysis
Cytoscape
 Open source network visualization
and analysis software
 ‘Core’ features include network layout
and query, and integration of
visualizations with state data
 Can be extended by plugins
Cytoscape developers
 University of California at San Diego (Trey
Ideker)
 Institute for Systems Biology (Leroy Hood)
 Memorial Sloan-Kettering Cancer Center
(Chris Sander)
 Institut Pasteur (Benno Schwikowski)
 Agilent Technologies (Annette Adler)
 University of California at San Francisco
(Bruce Conklin)
Cytoscape
 A Java application
 Requires Java 5 or 6 (JDK5/6 or
JRE5/6)
Simple Interaction Format (SIF)
 Each line denotes one interaction
InteractorA xx InteractorB
 ‘xx’ are interaction types:
 pp: protein-protein interaction
 pd: protein-DNA interaction
(transcription factor/regulation)
 pr (protein-reaction), rc (reaction-compound),
cr (compound-reaction), gl (genetic-lethal),
pm (protein-metabolite), mp (metabolite-protein)
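A SIF file can be read into a list of edges with a few lines of code. A minimal sketch, assuming fields are separated by tabs or spaces and that a line may list several targets after the interaction type. The `parse_sif` function and the gene names are hypothetical, and this is not Cytoscape's own reader.

```python
def parse_sif(text):
    """Return (source, interaction type, target) tuples from SIF text."""
    edges = []
    for line in text.splitlines():
        # Use tabs when present so names may contain spaces.
        fields = line.split("\t") if "\t" in line else line.split()
        fields = [f.strip() for f in fields if f.strip()]
        if len(fields) < 3:
            continue  # blank line or a node without interactions
        source, interaction = fields[0], fields[1]
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges


example_sif = "\n".join([
    "geneA pp geneB",
    "geneA pd geneC geneD",
    "geneE",
])

for edge in parse_sif(example_sif):
    print(edge)
```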
Other interaction formats
supported
 GML
 XGMML
 SBML
 BioPAX
 PSI-MI
 Tab-delimited text table and Excel
Cytoscape Demonstration
Applications of Gene Expression
 Gene selection (differentially
expressed genes)
 State annotation in networks
(expression level)
 Gene regulatory network
identification
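Gene selection can be sketched with a simple log2 fold-change threshold. This is one common heuristic and not a statistical test, and the slides do not say which method is intended. The function names, the threshold and the expression values below are all made up for illustration.

```python
import math


def log2_fold_change(control, treated):
    """log2 of the ratio of mean treated to mean control expression."""
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2(mean_treated / mean_control)


def select_genes(expression, threshold=1.0):
    """Keep genes whose absolute log2 fold change reaches the threshold."""
    selected = {}
    for gene, (control, treated) in expression.items():
        change = log2_fold_change(control, treated)
        if abs(change) >= threshold:
            selected[gene] = change
    return selected


# Made-up positive expression values: (control replicates, treated replicates).
expression = {
    "geneA": ([100.0, 100.0], [400.0, 400.0]),
    "geneB": ([200.0, 200.0], [220.0, 180.0]),
    "geneC": ([80.0, 80.0], [20.0, 20.0]),
}

print(select_genes(expression))
```

The resulting values could also serve as the expression-level state attached to network nodes.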