Bioinformatics tools and methods

used at the Bioinformatics and Expression Analysis core facility at NOVUM
Tools
  GeneChip Operating Software (GCOS)
  R and Bioconductor
  ArrayAssist
  PathwayAssist
  Tools from the core facility
Analysis methods
  Quality Control
  Sample Similarities
  Gene Selection
  Annotations
  Visualizations
This paper covers tools and methods for expression analysis. Mapping and
resequencing are not covered here.
Tools
The main tools we use for microarray analysis are listed below. For further details,
please contact helpdesk-staff@espresso.biosci.ki.se
GeneChip Operating Software (GCOS)
GCOS is the primary analysis tool for microarray data. It controls the fluidics stations and the
scanner, and performs the initial analysis. At the core facility, GCOS runs on a local Windows
machine with a Microsoft Data Engine.
DataMiningTool (DMT) is an integrated part of GCOS, and a tool for downstream analysis of
expression data.
GeneChip DNA Analysis Software (GDAS)
is also integrated with GCOS, and is a tool
for further data analysis, allele calling, and
report generation for mapping arrays.
As an option, GCOS can be upgraded to the
GCOS Server client-server configuration for
large-scale data management, enhanced
security, and network-based database access.
GCOS data sheet
DMT data sheet
GDAS data sheet
R and Bioconductor
R is a language and environment for statistical computing and graphics, and is available as
Free Software under the terms of the Free Software Foundation's GNU General Public License
in source code form. It compiles and runs on a wide variety of UNIX platforms and similar
systems (including FreeBSD and Linux), Windows and MacOS.
Bioconductor is an open source and open development software project to provide tools for the analysis and comprehension of genomic data (bioinformatics).
Several public developmental packages contain methods designed for expression and mapping analysis, and we also run
scripts developed within the core facility.
We can provide help with installation and startup of R and
Bioconductor packages.
Bioconductor webpage
ArrayAssist
ArrayAssist from Stratagene Software Solutions is a tool for processing and visualization of
expression data. It has strong support for the Affymetrix platform, with easy import of CEL and
CHP files from GCOS and comprehensive NetAffx gene annotation support.
Several statistical tools can be used together with different
clustering methods. Probe-level analysis algorithms include
PLIER, RMA, GC-RMA and MAS5.
The full version of ArrayAssist is commercial software, and
can be downloaded as a 20-day trial version. A compressed
version of the software is available free of charge.
Stratagene webpage
Hardware requirement
PathwayAssist
PathwayAssist™ pathway analysis software helps you interpret your experimental results in
the context of pathways, gene regulation networks and protein interaction maps.
Using curated and automatically created databases, PathwayAssist identifies relationships among genes, small molecules,
cell objects and processes, builds networks and creates
publication-ready pathway diagrams.
PathwayAssist supports Affymetrix gene expression data.
PathwayAssist webpage
Tools from the core facility
For further info, please contact David Brodin
| Name | Description | Author |
| --- | --- | --- |
| qc.display | Displays several types of QC information for a single chip | Mark Reimers |
| t.quantile.plot | Two-sample and paired comparison functions | Mark Reimers |
| perm | Permutation to generate FDR for microarray data | Alexander Ploner |
| EisenPlot | Intensity plot for microarray data from hierarchical clustering | Alexander Ploner |
| VolcanoPlot | Volcano plot, following Wolfinger et al., J. Comp. Biol., 8(6), 2001, pp. 625-637 | Alexander Ploner |
| cSpatialDiff2 | Spatial smoothing of expression values. Based on cSpatialDiff of library(maCGB), but more flexible | Alexander Ploner |
| AnovaScript_v2.R | Computes F-statistics for a factorial ANOVA model and plots them | Alexander Ploner |
| Annotation Wrapper | Reads Affymetrix microarray annotation files and adds annotations to expression data in a useful format, including hyperlinks to public domains, pathway information, ontologies etc. | David Brodin |
| Affy Display | Tool for displaying Affymetrix expression data and annotations. Methods for selection and visualization, including a browser for GO categories. | David Brodin |
| Haploblocks | Software package for the visualization and analysis of haplotype block structure in the human genome. | Marco Zucchelli |
| SEQCHECK | Macro for analyzing output data from Sequenom machines. It reads in a worksheet containing SNP alleles and returns summary tables, Hardy-Weinberg equilibrium, allele frequencies and success ratios for all the SNPs that have been genotyped. | Marco Zucchelli |

The tools fall into three groups: R scripts, Java applications, and SNP tools.
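As an illustration of the kind of per-SNP summary that SEQCHECK is described as reporting, the following Python sketch computes an allele frequency and a chi-square test for Hardy-Weinberg equilibrium from genotype counts. It is not the SEQCHECK macro: the function name `hardy_weinberg`, the use of a one-degree-of-freedom chi-square test, and the genotype counts are all assumptions made for this example.

```python
import math

def hardy_weinberg(n_aa, n_ab, n_bb):
    """Return (frequency of allele A, chi-square statistic, p-value with 1 df).

    Hypothetical helper for illustration; assumes a biallelic SNP and
    that both alleles are observed at least once.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value

# Made-up genotype counts (AA, AB, BB) for one SNP.
freq_a, chi2, p_value = hardy_weinberg(25, 50, 25)
print(f"freq(A)={freq_a:.2f} chi2={chi2:.2f} p={p_value:.3f}")
```

A small p-value suggests a departure from Hardy-Weinberg proportions, which in genotyping data is often treated as a hint of assay problems.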
Analysis methods
The core facility provides basic and extended bioinformatics services. Basic services
include quality control and absolute and comparison data from GCOS, and are
included in the overall price.
Extended services are charged at an hourly fee after agreement with the
customer.
Probe Design Information
Quality Control
The core facility uses a set of controls to make sure the quality of the results is satisfactory. RNA
quality is tested with the Agilent Bioanalyzer, and microarray results are tested with
internal Affymetrix controls in GCOS, and with a standard set of R methods.
For each sample we run on a microarray, a report file is
generated in GCOS. This file contains quality-control measures such as
background, scaling factor, number of present calls and RNA degradation. Report files are provided to the client as part
of the results.
We visually inspect the scanned array image in GCOS or with
the help of R scripts. The image on the right shows different visualizations of probe intensity levels for an expression array
(script written by Mark Reimers, NCI).
We also use R scripts for visualizing RNA degradation and
data distribution (box plots, pairwise scatter plots,
histograms etc.).
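To make the data-distribution check concrete, here is a minimal Python sketch of the five-number summary that a box plot draws for each array. The facility's own scripts are written in R; this example only illustrates the calculation. The function name `boxplot_summary`, the sample names and the intensity values are assumptions made for the example.

```python
import math
import statistics

def boxplot_summary(intensities):
    """Five-number summary of log2 intensities for one array (hypothetical helper)."""
    logs = sorted(math.log2(x) for x in intensities if x > 0)
    q1, median, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    return {"min": logs[0], "q1": q1, "median": median, "q3": q3, "max": logs[-1]}

# Made-up probe intensities for two arrays.
arrays = {
    "sample_A": [64, 128, 256, 512, 1024],
    "sample_B": [32, 64, 128, 256, 512],
}
summaries = {name: boxplot_summary(values) for name, values in arrays.items()}

for name, s in summaries.items():
    print(name, s)
```

Arrays whose median or spread differs markedly from the rest of the project stand out in such a summary, which is the same signal a box plot gives visually.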
Sample Similarities
To investigate differences between groups of samples within a project, we use a set of
multidimensional scaling functions in R.
The MASS package provides several methods and metrics to
visualize differences in expression between samples. These
diagrams can be used for diagnostic purposes, since outliers
and clusters of samples will be clearly visible.
The figure on the right shows Euclidean distances between samples,
arranged in a hierarchical tree structure.
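The idea behind a tree of Euclidean distances can be sketched in a few lines of Python. This is not the facility's R code and it is not multidimensional scaling; it only shows how pairwise Euclidean distances between samples can be merged step by step into a hierarchy. The choice of average linkage, the function names and the expression values are assumptions made for the example.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(samples):
    """Agglomerative clustering; returns merges as (cluster1, cluster2, distance)."""
    clusters = [(name,) for name in samples]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average distance over all pairs of samples in the two clusters.
            d = sum(euclidean(samples[a], samples[b]) for a in c1 for b in c2)
            d /= len(c1) * len(c2)
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, _ = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append(best)
    return merges

# Made-up log2 expression values for three genes in four samples.
samples = {
    "ctrl_1": [7.0, 5.0, 9.0],
    "ctrl_2": [7.2, 5.1, 9.1],
    "treat_1": [9.0, 5.0, 6.0],
    "treat_2": [9.1, 4.8, 6.2],
}
merges = average_linkage(samples)
for c1, c2, d in merges:
    print(c1, "+", c2, f"at distance {d:.2f}")
```

In this toy data the replicates within each group merge first, and the two groups join only at a much larger distance. An outlier sample would instead join the tree late and on its own.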
Gene Selection
One common selection method is to do a pairwise comparison in GCOS and select genes
whose expression changes between samples. We can also provide other selection
methods, for example:
- Groupwise comparisons (t-tests, SAM etc.)
- Different clustering methods
- Members of a specific gene family
- Genes with specific function or protein localization (ontologies)
- Members of specific pathways
- Chromosome localization
- Correlation with sample features
The figure on the right shows results from a t-test in a t quantile plot.
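A groupwise comparison can be illustrated with a per-gene t-statistic. The Python sketch below uses the Welch (unequal variance) form, which is an assumption; the text above does not say which t-test variant is used, and the facility's own functions are written in R. The gene names and expression values are made up.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch t-statistic for two groups of log2 expression values.

    Assumes at least two values per group and non-zero variance in
    at least one group (otherwise the denominator is zero).
    """
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(group_a) + var_b / len(group_b))

# Made-up data: (control values, treated values) per gene.
expression = {
    "gene_1": ([8.1, 8.3, 7.9], [8.0, 8.2, 8.1]),
    "gene_2": ([6.0, 6.2, 5.8], [9.0, 9.1, 8.9]),
}

# Rank genes by the absolute size of the t-statistic, largest first.
ranked = sorted(expression, key=lambda g: abs(welch_t(*expression[g])), reverse=True)
print(ranked)
```

With thousands of genes tested at once, the raw statistics need a multiple-testing correction before genes are selected; a permutation-based FDR estimate is one option.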
Annotations
Affymetrix microarray features are supported by large amounts of annotation data available
via Affymetrix NetAffx Analysis Center, an online resource for Affymetrix users.
The information includes probe design, annotation method, public domain references, functional
annotations, sequence information and more.
NetAffx is available to customers (after online registration), but the core facility can also
annotate microarray data for the customer as an extended bioinformatics service.
NetAffx Analysis Center
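Annotating expression data amounts to joining two tables on the probe set identifier. The Python sketch below shows that join on a small inline CSV. The column names ("Probe Set ID", "Gene Symbol", "Chromosomal Location"), the probe set identifiers and all values are hypothetical placeholders chosen for the example, not the layout of any particular annotation file.

```python
import csv
import io

# Hypothetical annotation table; a real file would be read with open(...).
annotation_csv = io.StringIO(
    '"Probe Set ID","Gene Symbol","Chromosomal Location"\n'
    '"PS_0001_at","GENE_A","chr1p"\n'
    '"PS_0002_at","GENE_B","chr2q"\n'
)
annotations = {row["Probe Set ID"]: row for row in csv.DictReader(annotation_csv)}

# Hypothetical log2 expression values keyed by probe set identifier.
expression = {"PS_0001_at": 9.42, "PS_0002_at": 7.15, "PS_0003_at": 5.03}

annotated = []
for probe_id, value in expression.items():
    ann = annotations.get(probe_id, {})  # probe sets without annotation get "NA"
    annotated.append({
        "probe_set": probe_id,
        "log2_expression": value,
        "gene_symbol": ann.get("Gene Symbol", "NA"),
        "location": ann.get("Chromosomal Location", "NA"),
    })

for row in annotated:
    print(row)
```

Keeping unannotated probe sets in the output, marked "NA", avoids silently dropping expression values when the annotation file is incomplete.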
Visualizations
As an extended bioinformatics service, the core facility can assist customers in creating
visualizations of expression data.