Developing GUI Microarray Analysis Tools
Keith Satterley
Bioinformatics, WEHI, Nov. 15 2005

Overview
1. R, Environment, tools & resources
2. Graphical tools
3. LimmaGUI and AffylmGUI
4. Example Analysis
5. Resources available
6. Future Developments

The Walter and Eliza Hall Institute of Medical Research

The R Project for Statistical Computing
• R is a free language and environment for statistical computing and graphics, released under the GNU General Public License.
• It compiles and runs on a wide variety of platforms, including Unix variants, Windows and MacOS.
• S was developed by John Chambers and colleagues at Bell Labs. R can be considered a different implementation of S.
• R was initially written by Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland.
• Since mid-1997 a large group of individuals has contributed to R by sending code and bug reports.
• The R URL is http://www.r-project.org/

The R Project for Statistical Computing (cont.)
• R has an effective data handling and storage facility.
• It has a suite of operators for calculations on arrays, in particular matrices.
• It provides a vast number of useful statistical tools, many of which have been painstakingly tested.
• It produces publication-quality graphics in a variety of formats, including JPEG, PostScript, EPS, PDF and BMP.
• It has a well-developed, simple and effective programming language.

The R Project for Statistical Computing (cont.)
• R allows users to add functionality by defining new functions.
• C, C++ and Fortran code can be linked and called at run time.
• R can be extended (easily) via packages.
• There are about eight packages supplied with the R distribution, and many more are available through the CRAN family of Internet sites.

Resources for R
• Frequently Asked Questions: http://www.ci.tuwien.ac.at/%7Ehornik/R/R-FAQ.html
• Archives: CRAN (see next slide).
• Mailing lists:
– r-help@lists.r-project.org
– r-devel@lists.r-project.org
– r-sig-mac@stat.math.ethz.ch
• Bug-tracking system: http://bugs.r-project.org/

Resources for R (cont.)
• CRAN = Comprehensive R Archive Network.
• CRAN is a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R.

• Australia
– http://cran.au.r-project.org/ PlanetMirror, Brisbane
– http://cran.ms.unimelb.edu.au/ University of Melbourne
• Austria
– http://cran.at.r-project.org/ Technische Universitaet Wien
• Brasil
– http://cran.br.r-project.org/ Universidade Federal do Parana
– http://www.insecta.ufv.br/CRAN/ Federal University of Vicosa
– http://cran.fiocruz.br/ Oswaldo Cruz Foundation, Rio de Jane
– http://lmq.esalq.usp.br/CRAN/ University of Sao Paulo, Piracicaba
– http://www.vps.fmvz.usp.br/CRAN/ University of Sao Paulo, Sao Paulo
• Canada
– http://cran.stat.sfu.ca/ Simon Fraser University, Burnaby
– http://probability.ca/cran/ University of Toronto
• China
– http://www.lmbe.seu.edu.cn/CRAN/ Southeast University, Nanjing
• Denmark
– http://cran.dk.r-project.org/ dotsrc.org, Aalborg
• France
– http://cran.fr.r-project.org/ CICT, Toulouse
– http://cran.univ-lyon1.fr/ Dept. of Biometry & Evol. Biology, University of Lyon
– http://mirror.internet.tp/cran/ Boese Internet, Paris
• Germany
– http://cran.r-mirror.de/ Stefan Drees, Berlin
– http://pangora.org/cran/ Pangora GmbH, Hamburg
– http://cran.miscellaneousmirror.org/ Miscellaneousdata.de, Koeln
– http://umfragen.sowi.uni-mainz.de/CRAN/ University of Mainz
– http://cran.mirrorplus.org/ mirrorplus.org, Muenchen
• Hungary
– http://cran.hu.r-project.org/ Semmelweis University
• Italy
– http://cran.arsmachinandi.it/ Ars Machinandi, Arezzo
– http://microarrays.unife.it/CRAN/ Universita di Ferrara
– http://rm.mirror.garr.it/mirrors/CRAN/ Garr Mirror, Milano
– http://dssm.unipa.it/C Universita degli Studi di Palermo
• Israel
– http://cran.active.co.il/ Activetech Ltd, Tel-Aviv
• Japan
– ftp://ftp.u-aizu.ac.jp/pub/lang/R/CRAN University of Aizu
– http://cran.md.tsukuba.ac.jp/ University of Tsukuba
• Korea
– http://bibscvs.snu.ac.kr/R/ Seoul National University
• Netherlands
– http://cran.nedmirror.nl/ Nedmirror, Amsterdam
• Poland
– http://novum.am.lublin.pl/CRAN/ Skubiszewski Medical University, Lublin
– http://r.meteo.uni.wroc.pl/ University of Wroclaw
• Portugal
– http://cran.pt.r-project.org/ Universidade do Porto
• Slovenia
– http://www.fastmirrors.org/cran/ Fastmirrors.org, Besnica
– http://www.wsection.com/cran/ Wsection.com, Ljubljana
• South Africa
– http://cbio.uct.ac.za/CRAN/ University of Cape Town
– http://cran.za.r-project.org/ Rhodes University
• Spain
– http://cran.es.r-project.org/ Spanish National Research Network, Madrid
• Switzerland
– http://cran.ch.r-project.org/ ETH Zuerich
– http://www.imsv.unibe.ch/cran/ Universitaet Bern
– http://cran.prokmu.com/ Prokmu Hosting, Bern
• Turkey
– http://godel.cs.bilgi.edu.tr/mirror/cran/ Istanbul Bilgi University
• Taiwan
– http://cran.cs.pu.edu.tw/ Providence University, Taichung
– http://cran.csie.ntu.edu.tw/ National Taiwan University, Taipei
• UK
– http://cran.uk.r-project.org/ University of Bristol
– http://www.sourcekeg.co.uk/cran/ Sourcekeg, London
• USA
– http://cran.cnr.Berkeley.edu University of California, Berkeley, CA
– http://cran.stat.ucla.edu/ University of California, Los Angeles, CA
– http://cran.ssds.ucdavis.edu/ University of California
– http://rh-mirror.linux.iastate.edu/CRAN/ Iowa State University, Ames, IA
– http://www.biometrics.mtu.edu/CRAN/ Michigan Technological University, Houghton, MI
– http://cran.wustl.edu/ Wa University, St. Louis, MO
– http://www.ibiblio.org/pub/languages/R/CRAN/ University of North Carolina, Chapel Hill, NC
– http://cran.us.r-project.org/ Pair Networks, Pittsburgh, PA
– http://lib.stat.cmu.edu/R/CRAN/ Statlib, Carnegie Mellon University, Pittsburgh, PA
– http://cran.hostingzero.com/ Hosting Zero, Dallas, TX
– http://cran.fhcrc.org/ Fred Hutchinson Cancer Research Center, Seattle, WA

CRAN Mirrors – 475 packages

Resources for R
• Features of R:
– Graphical abilities.
– Package system.
– Objects in R.

Graphical Capabilities in R
• On Unix (incl. Mac OS X), X11 is used.
• On MS Windows, the MS Windows windowing system is used.
• This is not a GUI, but a graphics device for plotting and drawing.
• There are high-level, low-level and interactive plotting commands.
• plot(x) is a high-level command.
– If x is a time series, this produces a time-series plot.
– If x is a numeric vector, it produces a plot of the values in the vector against their index in the vector.
– If x is a complex vector, it produces a plot of imaginary versus real parts of the vector elements.

Graphical Capabilities in R (cont.)
• Low-level plotting commands can be used to add extra information (such as points, lines or text) to the current plot.
• abline(a, b): adds a line of slope b and intercept a to the current plot.
• title(main, sub): adds a title main to the top of the current plot.

An R Command Line Example
• library(limma)
• setwd("C:/aaa-R/swirl/")
• getwd()
• list.files()
• targets <- readTargets("SwirlTargetsFile.txt")
• targets
• RG <- read.maimages(targets$FileName, source="spot")
• RG
• par(fg="yellow", bg="green")
• plot(RG$R, lwd=3)
• abline(2000, 1, lwd=5, col="black")

R Graphics

R Graphics (cont.)
[Figure: histogram "PM Intensity distribution for PreS2"; x-axis: log2(PM Intensity), y-axis: Frequency]

Bioconductor Graphics

R Packages
• Packages provide a mechanism for loading code and attached documentation.
• Packaging automatically checks and creates various documentation files from one source.
• It creates distributable Windows binary (.zip), Mac binary (.tgz) or source (.tar.gz) files.
• Packages can specify dependent or suggested packages.

R Packages (cont.)
• install.packages() can install a package and all its dependencies (and their dependencies…), either the essential ones and/or the suggested ones (which may be needed for examples etc.).

Objects in R
• The entities R operates on are technically known as objects.
• The class of an object determines how it will be treated by what are known as generic functions.
• For example, print, plot or summary will react according to the sort of object they are called to work on.

Bioconductor
• URL: http://www.bioconductor.org/
• Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.
• The Bioconductor core team is based primarily at the Fred Hutchinson Cancer Research Center.
• It aims to promote high-quality documentation and reproducible research.
• It aims to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data.

Bioconductor (cont.)
• R and the R package system are the main vehicles for designing and releasing software.
• Bioconductor has a commitment to full open source discipline. All contributions are expected to exist under an open source license such as GPL2 or BSD.

Bioconductor (cont.)
• Features of the Bioconductor site:
– Packages: code
– Packages: metadata
– Version management system

Bioconductor Packages
• 140 code packages listed:

Package        Version  Description
aCGH           1.4.0    Classes and functions for Array Comparative Genomic Hybridization data
affxparser     1.2.0    Affymetrix File Parsing SDK
affy           1.8.1    Methods for Affymetrix Oligonucleotide Arrays
affycomp       1.6.0    Graphics Toolbox for Assessment of Affymetrix Expression Measures
affydata       1.6.0    Affymetrix Data for Demonstration Purpose
affylmGUI      1.4.0    GUI for affy analysis using limma package
affypdnn       1.4.0    Probe Dependent Nearest Neighbours (PDNN) for the affy package
affyPLM        1.6.0    Methods for fitting probe-level models
affyQCReport   1.8.0    QC Report Generation for affyBatch objects
altcdfenvs     1.4.0    alternative cdfenvs
...
limma          2.2.0    Linear Models for Microarray Data
limmaGUI       1.6.0    GUI for limma package
...
vsn            1.8.0    Variance stabilization and calibration for microarray data
webbioc        1.2.0    Bioconductor Web Interface
widgetInvoke   1.2.0    Evaluation widgets for functions
widgetTools    1.6.0    Creates an interactive tcltk widget
xcms           1.2.0    LC/MS and GC/MS Data Analysis

• PLUS 250 metadata packages, from:

Package         Version  Description
ag              1.10.0   Affymetrix Arabidopsis Genome Array Annotation Data (ag)
agahomology     1.10.0   A data package containing annotation data for agahomology

to:

Package         Version  Description
zebrafishcdf    1.10.0   zebrafishcdf
zebrafishprobe  1.10.0   Probe sequence data for microarrays of type zebrafish

Bioconductor – use the Subversion version mgt. system
• Subversion: http://svnbook.red-bean.com/en/1.1/svn-book.html
• Subversion is a free, open-source version control system (it replaces CVS).
• That is, Subversion manages files and directories over time.
• Subversion clients can access their repository across networks, which allows the version repository to be accessed by many users simultaneously.

Bioconductor – Version management system
• Subversion remembers every change ever written to it. A client can ask historical questions like, "What did this directory contain last Wednesday?" or "Who was the last person to change this file, and what changes did they make?"
• Subversion uses a Copy-Modify-Merge solution, rather than a Lock-Modify-Unlock procedure.

Graphical User Interfaces
• The items that make up a GUI (buttons, labels, menus and so on) are known as widgets.
• Tcl/Tk is a tool for creating and interacting with widgets.
• Tcl/Tk runs on Unix, Windows and Mac OS X.

Tcl/Tk
• Tcl/Tk needs to be installed on the computer as well as R.
• There are prewritten libraries of Tcl/Tk tools, e.g. TkTable.
• The R package tcltk needs to be installed in R.
• The tcltk R package is an interface between the R language and Tcl/Tk commands.

GUI Programs
• On Windows, Tcl/Tk talks to the MS Windows graphical window system.
• On Unix (and Mac), Tcl/Tk talks to the X Window System, hence X11 must be started first.
• 1. Run X11 on Unix and Mac.
• 2. Load the R package tcltk using:
– library(tcltk)
– library(affylmGUI), for example (in fact affylmGUI loads tcltk automatically)

R tcltk example
• This can be used to test whether tcltk (or Tcl/Tk) is working correctly:
• > library(tcltk)
• > tt <- tktoplevel()
• > lbl <- tklabel(tt, text="Hello, World!")
• > tkpack(lbl)
• > but <- tkbutton(tt, text="OK")
• > tkpack(but)

R tcltk testing tools
• To check the path that Tcl/Tk uses to find libraries:
– > tclvalue("auto_path")
– [1] "{C:\\R\\rw2020\\R-2.2.0/Tcl/lib/tcl8.4} C:/R/rw2020/R-2.2.0/Tcl/lib ./lib C:/R/rw2020/R-2.2.0/Tcl/lib/tk8.4 C:/R/rw2020/R-2.2.0/library/tcltk/exec"
• To add an extra path to search, use:
– > addTclPath("C:/bin")
– > tclvalue("auto_path")
– [1] "{C:\\R\\rw2020\\R-2.2.0/Tcl/lib/tcl8.4} C:/R/rw2020/R-2.2.0/Tcl/lib ./lib C:/R/rw2020/R-2.2.0/Tcl/lib/tk8.4 C:/R/rw2020/R-2.2.0/library/tcltk/exec C:/bin"
• For a list of package commands:
– > ls("package:tcltk")

Help Commands in R
• help(mean) #Help window on the mean function.
• ?mean #Same as help(mean).
• help.search("regression") #Help files with alias, concept or title matching 'regression', using fuzzy matching.
• help.start() #Browser into the R docs.
• The browser shows links to the R Language Definition, R Installation and Administration, package writing, package documentation, FAQs, etc.

Some Useful R Commands for the GUI user!
• getwd() #Get working directory.
• setwd() #Set working directory.
• list.files() #List files in working directory.
• ls() #List objects in workspace.
• rm(list=ls()) #Remove all objects (recommended at start of a session).
• savehistory(file="History.txt") #Save the command history to a file.
• source(file="C:/path/to/filename/file.R", echo=TRUE) #Reads commands from file.R and executes them.
• installed.packages() #Detailed info on all packages installed.
• summary(RG) #Displays basic data about object RG.
• library(limmaGUI) #Loads the limmaGUI package.

Cross Platform Issues
• Installation issues are varied.
• MS Windows: can be installed in C:\R by an ordinary user.
• Unix: can be installed by a user, but this leads to duplication if multiple users do so.
• Mac OS X: special procedures are necessary.

LimmaGUI
• limmaGUI is a Graphical User Interface (GUI) based on R-Tcl/Tk for the exploration and linear modelling of data from two-colour spotted microarray experiments, especially the assessment of differential expression in complex experiments.
• Swirl Example Analysis.
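
limmaGUI is a GUI for the limma package, so the linear modelling step it offers can also be tried from the command line. The following is only a rough sketch of that step using limma's lmFit, eBayes and topTable. It makes several assumptions that are not from the slides: the log-ratio matrix M is simulated rather than read from the swirl files, the dye-swap design vector and the five spiked-in genes are invented for illustration, and the code is skipped entirely if limma is not installed.

```r
# Sketch only: simulated log-ratios stand in for real two-colour data.
if (requireNamespace("limma", quietly = TRUE)) {
  set.seed(1)
  n_genes <- 200

  # Hypothetical layout: four arrays forming two dye-swap pairs.
  design <- c(-1, 1, -1, 1)

  # Simulated log-ratios (genes in rows, arrays in columns).
  M <- matrix(rnorm(n_genes * 4, sd = 0.3), nrow = n_genes)
  rownames(M) <- paste0("gene", seq_len(n_genes))

  # Spike a true log-ratio of 2 into the first five genes,
  # with the sign flipped on the dye-swapped arrays.
  M[1:5, ] <- M[1:5, ] + rep(2 * design, each = 5)

  fit <- limma::lmFit(M, design)          # gene-wise linear models
  fit <- limma::eBayes(fit)               # moderated t-statistics
  top <- limma::topTable(fit, number = 10)  # ten top-ranked genes
  print(top)
}
```

With real data, M would be replaced by a normalized object derived from the RG object read by read.maimages, and the design would come from the targets file.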

AffylmGUI
• AffylmGUI enables the user to perform quality assessment, low-level analysis and linear modelling of data from Affymetrix GeneChips®, with the ultimate goal of identifying differentially expressed genes.
• Estrogen Example Analysis.

WEHI Website Resources
• WEHI Bioinformatics home page: http://bioinf.wehi.edu.au/
• Microarray Data Analysis: http://bioinf.wehi.edu.au/marray/index.html
• LIMMA: Linear Models for Microarray Data: http://bioinf.wehi.edu.au/limma/index.html
• limmaGUI: http://bioinf.wehi.edu.au/limmaGUI/
• affylmGUI: http://bioinf.wehi.edu.au/affylmGUI/
• James Wettenhall's Bioinformatics Home Page: http://bioinf.wehi.edu.au/folders/james/
• R-Tcl/Tk Examples, Worked Examples for limma/affylmGUI at http://bioinf.wehi.edu.au/limmaGUI/R/library/limmaGUI/doc/DocIndex.html

Future Directions for AffylmGUI
• Additional plots to aid in quality assessment of a set of chips, including RNA degradation plots.
• Calculation and display of QC parameters recommended by Affymetrix (Affymetrix, 2004), such as percent present, ratios of 3'/5' expression for hybridization controls and the like.
• Fitting of mixed linear models where there is technical replication.
• Support for other single-channel platforms.

Future Directions for LimmaGUI
• Additional plots to aid in quality assessment of a set of chips.
• Fitting of mixed linear models where there is technical replication.
• Fitting of mixed linear models where there is biological replication.
• ?

Acknowledgments
• James Wettenhall
• Gordon Smyth
• Ken Simpson
• Terry Speed
• Bioinformatics: many seminars on microarrays!