A Gene Group Database for Systematic Analysis of Microarray Data
Madelaine Marchin and Chris Seidel
Stowers Institute for Medical Research, Kansas City, Missouri, USA

Introduction
Often when analysing microarray data, it is easy to focus only on the genes that show the most difference between two samples. This artificially crops the center out of the data, in effect ignoring small changes that could offer valuable insight.

[Figure: Microarray Data, Data Analysis, Up Genes, Down Genes.]

Instead, gene group analysis (also known as gene set enrichment or iterative group analysis) takes predefined groups of genes and checks their overall values in the microarray data. If a group's member genes show a change in expression, that group becomes very interesting.

Groups of genes could include:
• Gene Ontology groups
• Chromosomal locations
• Pathways
• Groups from experimental results
• Genes targeted by a specific microRNA

If you take every group of genes that you and other people can think of and examine the behavior of those groups in your data, the result is an easy-to-analyze list of the groups changing in your experiment, which can offer surprising insights.

Gene Group Database
Gene group analysis does not work without groups of genes. We built a web database (http://research.stowers-institute.org/microarray/gene_groups) for people to use with group analysis. Our database imposes no limitation on groups, is generally extensible, encourages sharing, and serves multiple group analysis platforms.

We do not limit the type of group users can upload, and we encourage users to upload any groups they have developed. Groups might show enrichment in someone else's data set, which could be meaningful to both parties. We originally populated the database with groups from Gene Ontology and L2L (1,2).

Here is a screenshot of the download page.
[Figure: screenshot of the download page, with the tag cloud and search options labelled.]

Users can upload and download groups without registering. Users who would like to keep track of private gene groups may register with a password. Groups can be downloaded in a chosen file format (currently three formats are offered, corresponding to iGA, Catmap, and GSEA). A help page explains how to use iGA, Catmap, and the Gene Group Database to analyze data.

Other databases
Other gene group databases have been created. GSEA has MSigDB (Molecular Signatures Database), which contains groups of genes, including groups based on chromosomal position, signaling pathways, and experimental gene expression results from literature. MSigDB does not currently allow users to upload new groups or redistribute the database.

L2L is another gene list database. All lists represent experimental gene expression results. Users can contribute their own experimental results. This database has been included in our Gene Group Database.

Group Analysis
[Figure: expression of gene groups A and B in wt and mut samples (axis ticks 0 to 5), with density plots of distances for random groups of 6 (N = 500, bandwidth = 0.7609) and GO groups of 6, where ER lumen is labelled. Gene labels include YKL067W, YMR271C, YGR061C, YGL234W, YML106W, YMR120C, YJL130C, YIL116W, RPI1, SNO1, SNZ1, SNZ2, SNZ3, THI2, THI20, THI21, THI22, THI4, THI5, and THI80.]
Two groups of genes are shown above, as measured in a time course of yeast induced for expression of either a wt protein or a misfolded mutant protein, which causes ER stress. Genes in group A are roughly the same under the two conditions, whereas genes in group B show differential expression in the wt cells relative to the mutant cells.

Behind the scenes
The database back-end is a MySQL database with 7 tables to hold information about groups, users, tags, and genes. The website is written in HTML, PHP, and JavaScript.

[Figure: entity-relationship diagram for the Gene Group Database. Each shape is a table in the database, and terms in red are keys.]

Visualization
[Figure: results panel; axis label: Enrichment Factor.]

Summary
- We created a public web database to store gene groups for microarray data analysis.
- We used our gene groups to run group analysis on a data set.

[Figure: Gene Ontology groups from the database, labelled by term name (for example, DNA replication initiation, ribosome biogenesis, and thiamin biosynthesis).]
Shown here are the results of running iGA with groups from our database on data from Elena Hidalgo (S. pombe, untreated vs. osmotic shock). Shown are the groups that changed most from untreated to osmotic shock.
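One simple way to rank groups by how much they change between two conditions is to compute a distance per group and compare it with the distances obtained from random groups of the same size. The sketch below is not iGA itself and is not the analysis behind the figure. It assumes one expression value per gene per condition, uses Euclidean distance as an illustrative choice, and runs on made-up data; the function names are hypothetical.

```python
import math
import random
import statistics


def group_distance(expr_a, expr_b, genes):
    """Euclidean distance between two conditions over a group's genes."""
    return math.sqrt(sum((expr_a[g] - expr_b[g]) ** 2 for g in genes))


def random_group_distances(expr_a, expr_b, size, n_groups, rng):
    """Distances for groups of randomly chosen genes (the background)."""
    all_genes = sorted(expr_a)
    return [
        group_distance(expr_a, expr_b, rng.sample(all_genes, size))
        for _ in range(n_groups)
    ]


def z_score(value, background):
    """How many standard deviations a value lies from the background mean."""
    return (value - statistics.mean(background)) / statistics.stdev(background)


# Made-up data: 200 genes that barely differ between the two conditions,
# except for one group of 6 genes that is shifted in the second condition.
rng = random.Random(0)
genes = [f"gene{i}" for i in range(200)]
untreated = {g: rng.gauss(0.0, 1.0) for g in genes}
treated = {g: untreated[g] + rng.gauss(0.0, 0.1) for g in genes}
responsive_group = genes[:6]
for g in responsive_group:
    treated[g] = untreated[g] + 3.0

background = random_group_distances(untreated, treated, size=6, n_groups=500, rng=rng)
score = z_score(group_distance(untreated, treated, responsive_group), background)
print(f"z-score of the responsive group: {score:.1f}")
```

A group whose distance lies far from the mean of the random-group distribution is a candidate for closer inspection. A real analysis would also need to handle groups of different sizes and multiple testing.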
Groups of genes can be created for many different reasons and are limited only by the imagination.

You can search to find all groups containing genes of interest. When users upload groups, they are asked to "tag" them with category terms, which creates user-generated categories. On the download page, a tag cloud shows the tags that have the most member groups and allows for easy navigation.
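The gene search and the tag cloud can both be expressed as joins over link tables. The following is a minimal sketch under stated assumptions, not the site's actual code: it uses SQLite in place of MySQL, keeps only a subset of columns, renames the group table to `gene_group` to avoid the SQL keyword, and uses hypothetical group names, tags, and memberships.

```python
import sqlite3

# Assumed, simplified schema: entity tables plus many-to-many link tables.
SCHEMA = """
CREATE TABLE gene_group (id INTEGER PRIMARY KEY, group_name TEXT);
CREATE TABLE gene (id INTEGER PRIMARY KEY, gene_name TEXT UNIQUE);
CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
CREATE TABLE group_gene (group_id INTEGER, gene_id INTEGER);
CREATE TABLE group_tag (group_id INTEGER, tag_id INTEGER);
"""


def add_group(conn, name, genes, tags):
    """Store a group and link it to its genes and tags."""
    group_id = conn.execute(
        "INSERT INTO gene_group (group_name) VALUES (?)", (name,)
    ).lastrowid
    for gene_name in genes:
        conn.execute("INSERT OR IGNORE INTO gene (gene_name) VALUES (?)", (gene_name,))
        gene_id = conn.execute(
            "SELECT id FROM gene WHERE gene_name = ?", (gene_name,)
        ).fetchone()[0]
        conn.execute("INSERT INTO group_gene VALUES (?, ?)", (group_id, gene_id))
    for tag_name in tags:
        conn.execute("INSERT OR IGNORE INTO tag (tag) VALUES (?)", (tag_name,))
        tag_id = conn.execute(
            "SELECT id FROM tag WHERE tag = ?", (tag_name,)
        ).fetchone()[0]
        conn.execute("INSERT INTO group_tag VALUES (?, ?)", (group_id, tag_id))
    return group_id


def groups_containing(conn, gene_names):
    """Names of all groups that contain at least one of the given genes."""
    placeholders = ",".join("?" for _ in gene_names)
    rows = conn.execute(
        f"""SELECT DISTINCT gg.group_name
            FROM gene_group gg
            JOIN group_gene lg ON lg.group_id = gg.id
            JOIN gene g ON g.id = lg.gene_id
            WHERE g.gene_name IN ({placeholders})
            ORDER BY gg.group_name""",
        list(gene_names),
    )
    return [row[0] for row in rows]


def tag_cloud(conn):
    """(tag, number of member groups) pairs, most used tags first."""
    return conn.execute(
        """SELECT t.tag, COUNT(*) AS n
           FROM tag t JOIN group_tag gt ON gt.tag_id = t.id
           GROUP BY t.tag
           ORDER BY n DESC, t.tag"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical groups; the memberships are illustrative, not curated.
add_group(conn, "group_one", ["THI4", "THI5"], ["GO", "yeast"])
add_group(conn, "group_two", ["THI4", "SNZ1"], ["yeast"])

print(groups_containing(conn, ["THI4"]))
print(tag_cloud(conn))
```

Keeping genes and tags in their own tables, joined to groups through link tables, lets one gene or tag belong to many groups without duplicating its name.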
[Figure: untreated / osmotic shock groups (Groups 1 to 5), labelled by Gene Ontology term.]
[Figure: density plot (N = 111, bandwidth = 0.8769).]

For any given group of genes, a distance based on expression values can be calculated between two conditions, such as wt and mutant. For the wt and mutant example, distances were calculated for groups containing randomly chosen genes. Groups with random members give a distribution of distances. However, when distances are calculated for groups of genes based on Gene Ontology terms, some groups fall far from the mean. In this case, groups relevant to ER stress distinguish themselves. Alternative methods of group analysis exist, but all require a source of groups.

Tables in the entity-relationship diagram (1 and m mark the two ends of each one-to-many relationship):

Table        Columns
user         id, email, password
group        id, description, group_name, date, creator, visibility, downloads
tag          id, tag
gene         id, gene_name
group_user   group_id, user_id
group_tag    group_id, tag_id
group_gene   group_id, gene_id

Contact Information
Chris Seidel, Managing Director of Microarray, Stowers Institute
cws@stowers-institute.org
http://research.stowers-institute.org/microarray/gene_groups

Acknowledgements
Thanks to Elena Hidalgo for the S. pombe untreated vs. osmotic shock data and Antony Cooper for the ER stress data.

References
Breitling, R., Amtmann, A., & Herzyk, P. (2004). BMC Bioinformatics, 5.
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Nature Genetics, 25(1), 25-29.
Breslin, T., Edén, P., & Krogh, M. (2004). BMC Bioinformatics, 5.
Newman, J. C., & Weiner, A. M. (2005). Genome Biology, 6.
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., et al. (2005).