Bioinformatic tools and methods used at the Bioinformatics and Expression analysis core facility at NOVUM

Tools
    GeneChip Operating Software (GCOS)
    R and Bioconductor
    ArrayAssist
    PathwayAssist
    Tools from the core facility
Analysis methods
    Quality Control
    Sample Similarities
    Gene Selection
    Annotations
    Visualizations

This paper covers tools and methods for expression analysis. Mapping and resequencing are not covered here.

Tools

The main tools we use for microarray analysis are listed below. For further details, please contact helpdesk-staff@espresso.biosci.ki.se

GeneChip Operating Software (GCOS)

GCOS is the primary analysis tool for microarray data. It handles the fluidics stations and the scanner, and performs the initial analysis.
At the core facility, GCOS runs on a local Windows machine with a Microsoft Data Engine. Data Mining Tool (DMT) is an integrated part of GCOS and is used for downstream analysis of expression data. GeneChip DNA Analysis Software (GDAS) is also integrated with GCOS and is used for further data analysis, allele calling, and report generation for mapping arrays. As an option, GCOS can be upgraded to the GCOS Server client-server configuration for large-scale data management, enhanced security, and network-based database access.

See also: GCOS data sheet, DMT data sheet, GDAS data sheet

R and Bioconductor

R is a language and environment for statistical computing and graphics. It is available in source code form as Free Software under the terms of the Free Software Foundation's GNU General Public License. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

Bioconductor is an open source and open development software project that provides tools for the analysis and comprehension of genomic data (bioinformatics). Several public development packages contain methods designed for expression and mapping analysis, and we also run scripts developed within the core facility. We can provide help with installation and startup of R and Bioconductor packages.

See also: Bioconductor webpage

ArrayAssist

ArrayAssist from Stratagene Software Solutions is a tool for processing and visualization of expression data. It has strong support for the Affymetrix platform, with easy import of cel and chip files from GCOS and comprehensive NetAffx gene annotation support. Several statistical tools can be used together with different clustering methods. Probe level analysis algorithms include PLIER, RMA, GC-RMA and MAS5. The full version of ArrayAssist is commercial software and can be downloaded as a 20-day trial version. A compressed version of the software is free of charge.
See also: Stratagene webpage, Hardware requirement

PathwayAssist

PathwayAssist™ pathway analysis software helps you interpret your experimental results in the context of pathways, gene regulation networks and protein interaction maps. Using curated and automatically created databases, PathwayAssist identifies relationships among genes, small molecules, cell objects and processes, builds networks, and creates pathway diagrams suitable for publication. PathwayAssist supports Affymetrix gene expression data.

See also: PathwayAssist webpage

Tools from the core facility

For further information, please contact David Brodin.

Name | Description | Author
-----|-------------|-------
qc.display | Display several types of QC information for a single chip | Mark Reimers
t.quantile.plot | Two Sample and Paired Comparison Functions | Mark Reimers
perm | Permutation to generate FDR for microarray data | Alexander Ploner
EisenPlot | Intensity plot for microarray data from hierarchical clustering | Alexander Ploner
VolcanoPlot | Volcano plot, following Wolfinger et al., J. Comp. Biol., 8(6) 2001, p. 625-637 | Alexander Ploner
cSpatialDiff2 | Spatial smoothing of expression values. Based on cSpatialDiff of library(maCGB), but more flexible | Alexander Ploner
AnovaScript_v2.R | Compute F-statistics for a factorial ANOVA model and plot them | Alexander Ploner
Annotation Wrapper | Reads Affymetrix microarray annotation files and adds annotations to expression data in a useful format, including hyperlinks to public domain resources, pathway information, ontologies, etc. | David Brodin
Affy Display | Tool for displaying Affymetrix expression data and annotations. Methods for selection and visualization, including a browser for GO categories | David Brodin
Haploblocks | Software package for the visualization and analysis of haplotype block structure in the human genome | Marco Zucchelli
SEQCHECK | Macro for analyzing output data from Sequenom machines. It reads a worksheet containing SNP alleles and returns summary tables, Hardy-Weinberg equilibrium, allele frequencies, and success ratios for all genotyped SNPs | Marco Zucchelli

Tool types: R scripts, Java applications, SNP tools

Analysis methods

The core facility provides basic and extended bioinformatics services. Basic services include quality control and absolute and comparison data from GCOS, and are included in the overall price. Extended services are charged at an hourly fee after agreement with the customer.

See also: Probe Design Information

Quality Control

The core facility uses a set of controls to make sure that the quality of the results is satisfactory. RNA quality is tested with the Agilent Bioanalyzer, and microarray results are tested with internal Affymetrix controls in GCOS and with a standard set of R methods.

For each sample we run on a microarray, a report file is generated in GCOS. This file contains quality metrics such as background, scaling factor, number of present calls, and RNA degradation. Report files are provided to the client as part of the results.

We visually inspect the scanned array image in GCOS or with the help of R scripts. The image to the right shows different visualizations of probe intensity levels for an expression array (script written by Mark Reimers, NCI). We also use R scripts for visualizing RNA degradation and data distribution (box plots, pairwise scatter plots, histograms, etc.).

Sample Similarities

To investigate differences between groups of samples within a project, we use a set of multidimensional scaling functions in R. The MASS package provides several methods and metrics to visualize differences in expression between samples. These diagrams can be used for diagnostic purposes, since outliers and clusters of samples will be clearly visible. The figure to the right shows Euclidean distances between samples, arranged in a hierarchical tree structure.
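
The idea behind a tree of Euclidean distances between samples can be sketched in a few lines. The example below is only an illustration in plain Python, not the R code used at the core facility. The sample names, the expression values and the choice of average linkage are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(samples):
    """Return {(name_i, name_j): distance} for every unordered sample pair."""
    names = sorted(samples)
    return {
        (p, q): euclidean(samples[p], samples[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


def average_linkage(samples):
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns the merges in the order they happen, each as
    (cluster_a, cluster_b, average distance between their members).
    """
    pair = distance_matrix(samples)

    def d(p, q):
        return pair[(p, q)] if (p, q) in pair else pair[(q, p)]

    clusters = [(name,) for name in sorted(samples)]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two clusters with the smallest average pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                dist = sum(d(p, q) for p in a for q in b) / (len(a) * len(b))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical log2 expression values for four genes in four samples.
samples = {
    "control_1": [7.0, 5.0, 9.0, 6.0],
    "control_2": [7.0, 5.0, 9.0, 7.0],
    "treated_1": [10.0, 5.0, 5.0, 6.0],
    "treated_2": [10.0, 5.0, 5.0, 8.0],
}

merges = average_linkage(samples)
for cluster_a, cluster_b, dist in merges:
    print(cluster_a, "+", cluster_b, "at distance", round(dist, 3))
```

With these made-up values the two control samples are joined first, then the two treated samples, and the two groups are joined last at a much larger distance. An outlier sample would show up as a branch that joins the tree late.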

Gene Selection

One common selection method is to do a pairwise comparison in GCOS and select genes whose expression changes between samples. We can also provide other selection methods, for example:

- Groupwise comparisons (t-tests, SAM, etc.)
- Different clustering methods
- Members of a specific gene family
- Genes with a specific function or protein localization (ontologies)
- Members of specific pathways
- Chromosome localization
- Correlation with sample features

The figure to the right shows results from a t-test in a t quantile plot.

Annotations

Affymetrix microarray features are supported by large amounts of annotation data available via the Affymetrix NetAffx Analysis Center, an online resource for Affymetrix users. The information includes probe design, annotation method, public domain references, functional annotations, sequence information and more. NetAffx is available to customers (after online registration), but the core facility can also annotate microarray data for the customer as an extended bioinformatics service.

See also: NetAffx Analysis Center

Visualizations

As an extended bioinformatics service, the core facility can assist customers in creating visualizations of expression data.
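
As a rough illustration of the groupwise comparisons listed under Gene Selection, the sketch below computes three per-gene quantities: a log2 fold change, Welch's t statistic, and an exact permutation p-value. Quantities of this kind are what a volcano plot typically displays. This is a plain Python example with made-up gene names and values. It is not the core facility's R implementation (for example the perm or VolcanoPlot scripts), and the choice of the mean difference as permutation statistic is an assumption.

```python
import math
from itertools import combinations
from statistics import mean, variance


def welch_t(treated, control):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(treated) - mean(control)) / se


def permutation_p_value(treated, control):
    """Exact two-sided permutation p-value for the difference in means.

    Every way of splitting the pooled values into two groups of the
    original sizes is enumerated, so this is only feasible for small groups.
    """
    pooled = list(treated) + list(control)
    indices = range(len(pooled))
    observed = abs(mean(treated) - mean(control))
    hits = total = 0
    for chosen in combinations(indices, len(treated)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance so that floating-point ties count as ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical log2 expression values, three arrays per group.
genes = {
    "gene_up": {"control": [5.0, 5.2, 5.1], "treated": [7.0, 7.3, 7.1]},
    "gene_flat": {"control": [6.0, 6.4, 6.2], "treated": [6.1, 6.3, 6.2]},
}

results = {}
for name, groups in genes.items():
    treated, control = groups["treated"], groups["control"]
    results[name] = {
        # On log2 data the fold change is a difference of means.
        "log2_fold_change": mean(treated) - mean(control),
        "t": welch_t(treated, control),
        "p_perm": permutation_p_value(treated, control),
    }

for name, row in results.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
```

With three arrays per group there are only 20 possible splits, so the smallest two-sided permutation p-value is 2/20 = 0.1. This shows why small experiments cannot reach conventional significance thresholds by permutation alone.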