Bioinformatics for genomics
Kickoff Bioinformatics Expertise Center
10 November 2009
Judith Boer
Dept. of Human Genetics

Omics views of genomes
• Genetic variation
  • SNPs, loss-of-heterozygosity
  • Copy number variants
• Epigenetic variation
  • DNA methylation
  • Chromatin
• Expression variation
  • RNA expression
  • Gene structure

Who are we?
• Johan den Dunnen: always new machines
• Peter-Bram 't Hoen: always new applications
• Barend Mons: biosemantics
• Peter Taschner: databases and annotation
• Matt Hestand: next generation sequencing analysis
• Judith Boer: microarray and integrated analysis

Bioinformatics at Human Genetics
Personal view: "Life science researchers should be able to analyze and interpret their own genomics data"
• Tools
• Courses
• Research - example
• Expertise

Bioinformatics Tools – Commercial
• Rosetta Resolver
  • database, analysis, visualization
• Spotfire
  • analysis and visualization
• Ingenuity Pathway Analysis
  • literature and high-throughput database mining
• Dedicated platform tools
  • Agilent, Illumina, Affymetrix: image analysis, per array
• www.lgtc.nl

Bioinformatics Tools – Open source
• Programming languages
  • R, Perl, Bash scripting (Linux), MySQL, Apache, PHP, Python, Java, …
• Software, e.g. Bioconductor, BioPerl, Ensembl Perl API, Bowtie, BWA, Velvet, Varscan, Rmap, …
  • alignment, analysis of next-generation sequencing and microarray data
• Web browsers, e.g. UCSC, Ensembl
  • visualize data in relation to genome features
• Gene Ontology, e.g. DAVID
  • functional annotation and enrichment

Bioinformatics Research & Tools
• BioSemantics
• Databases and annotation
• Next generation sequencing analysis
• Microarray and integrated analysis
• www.humgen.nl/bioinformatics.html
• www.lgtc.nl

Bioinformatics Tools – BioSemantics
• Anni 2.1 (Jelier / Mons / 't Hoen)
  • associations between gene list and other genes, diseases, processes based on literature mining
• Nermal (van Haagen / Mons / 't Hoen)
  • which proteins associate with / bind to my protein?

Bioinformatics Tools – NGS analysis
• GAPSS (Hestand / van Galen / 't Hoen)
  • modular pipeline for next-generation sequencing data analysis
• CORE_TF (Hestand / 't Hoen)
  • Conserved and Over-REpresented Transcription Factor binding sites

Bioinformatics Tools – Databases
• LOVD (Fokkema / Taschner) www.lovd.nl
  • locus-specific DNA variation database
• Mutalyzer (Fokkema / Taschner)
  • sequence variant nomenclature check

Bioinformatics Tools – Microarray analysis
• Microarray Retriever (Brandt / 't Hoen)
  • search and retrieve data from public array repositories
• R packages (Menezes / van Iterson / Boer)
  • SIM: Statistical Integration of Microarrays
  • SSPA: Sample Size and Power Analysis for microarray data

Bioinformatics Courses
• Analysis of microarray gene expression data (MGC)
  • Judith Boer, Peter van der Spek (ErasmusMC)
  • 10th edition June 2010 (yearly, 30-40 participants PhD/PD)
• Next-generation sequencing data analysis (MGC)
  • Johan den Dunnen, Judith Boer, Matt Hestand
  • 3rd edition planned February 2010
• www.medgencentre.nl
• others: MolMed Research School ErasmusMC, NBIC

Bioinformatics Research – Example
• Next generation sequencing data analysis
  • ChIP-seq
  • gene structure (DeepCAGE)
  • expression analysis (DeepSAGE)
  • miRNA expression and identification
  • (targeted) re-sequencing
  • mutation (SNP) detection
  • copy number variation detection
  • de novo assembly

Bioinformatics Expertise
• Experimental design of microarray and NGS studies
• Choice of analysis software
• Use of analysis software
• Setting up a locus-specific database
• Some cases: help with data analysis

Acknowledgements
• BioSemantics: Herman van Haagen, Bharat Singh, Peter-Bram 't Hoen, Marco Roos, Barend Mons
• Databases and annotation: Ivo Fokkema, Jacopo Celli, Gerard Schaafsma, Jeroen Laros, Peter Taschner
• NGS analysis: Michiel van Galen, Jaap van der Heijden, Yuching Lai, Henk Buermans, Matthew Hestand
• Microarray analysis: Maarten van Iterson, Judith Boer
• Michel Villerius, Johan den Dunnen, Gertjan van Ommen