Bioinformatics Support in Trinity College
Karsten Hokamp, Darren Fitzpatrick, Fiona Roche
TBSI, 30/06/2015

Position established in 2005 … to provide computational support for SFI-Biotech funded projects in Trinity College

Bioinformatics Support
Pharmacology, Microbiology, Biochemistry, Immunology, Psychology, Medicine, Genetics, Zoology, Botany

Already at MIT, Stanford, Harvard, Yale, Oxford, Cambridge, UCL, …

Next-generation Sequencing
[Figure: sequencing cost data]

Major research areas:
• Sequence analysis
• Gene expression
• Gene regulation
• Computational evolutionary biology
• Network and systems biology
• Genome annotation
• Mutations in cancer
• Population genomics
• Literature mining
• Structural bioinformatics
• High-throughput image analysis

Activities - NGS
RNA-seq, ChIP-seq, assembly, SNP detection
@HWI-ST1363:132:C201RACXX:3:1101:18624:2742 1:N:0:AGTTCC
GTTTAACTTGAGTGCAAGAGGGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATAT
+
<@<ADDDB?FH<<EGD??:FFHE:C):11811?:DHIII<??F>GED7@GI@H@AB55:5?EBED

Example: Genome assembly, SNP detection

Example: RNA-seq, 20 conditions
• Web browser for raw data
• Interactive online heat maps for expression and fold-change visualisations

Example: ChIP-seq
Analyse in-house and public data sets

Activities - Data Visualisation
Web tools, Bioconductor, Cytoscape, Circos

Activities - Web Servers
• ArrayPipe: microarray data analysis
• CAPS: coevolution of amino acids using protein sequences
• PFPE: Phylogenetic Footprinting Ensemble
• PubCrawler: literature alerting system
• SalCom: Salmonella Typhimurium Gene Expression Compendium

More examples
• Search bacterial genomes for homologues of a set of genes
• Are binding sites of TFs X and Y close to antiviral genes?
• List all proteins with a 3D structure that lack amino acids X, Y, Z
• Find specific motifs in a set of proteins

Introducing: Dr Fiona Roche
Background in Infectious Diseases
• 1997 – 2000: PhD Microbiology, TCD – Staphylococcal adherence to host tissue (Prof. Tim Foster)
• 2000 – 2002: TCD, Postdoc – Functional genomics of the staphylococcal genome
• 2002 – 2006: SFU Canada, Postdoc – Development of a bioinformatics platform for a pathogenomics project (Prof. Fiona Brinkman)
• 2007 – 2015: HPSC, Data Manager (Surveillance) – Development of national surveillance systems to monitor healthcare-associated infections

Bioinformatics for Biologists
Galaxy: a web-based genome analysis platform designed for experimental biologists
http://galaxyproject.org/

Analysis Categories
• Statistics
• Visualisation
• Sequence Analysis
• NGS data analysis
• Proteomics
• Metagenomics
• Computational Chemistry

Workflows
Example ChIP-seq workflow:
• Data input: FASTQ
• Quality control: FastQC
• Mapping to genome: Bowtie
• Peak calling: MACS

Galaxy Workshops @TCD
• Local instance at TCD
• Introductory session
• Galaxy Workshop

Darren J. Fitzpatrick, Ph.D.
Education
• B.A. (Mod) Genetics, Trinity College Dublin
• M.Res Computational Biology, University of York – “In silico approach to the prediction of Antibody Thermostability”
• Ph.D. Statistical Genetics, University College Dublin – “The effect of combinations of genetic variants”
Research Experience
• GWAS: PLINK, eQTLs, epistasis (BMC Genomics, 2015), gene expression, synergy, genotype imputation, population stratification, recombination
• Population Genetics: genetic ancestry, ancestry prediction, visualisation of population data (AncestryMapper – PLoS One, 2012)
• Data Integration: integrating novel and public data – SNPs, CNVs, epigenetic modifications, DNA-protein interactions from ENCODE, NIH Roadmap, chromosome conformation capture experiments, etc.
• Pharmacology: combinatorial modelling of the effects of multiple drugs on platelet activation (PLoS Computational Biology, 2015)
• Structural Biology: antibodies, tessellation, prediction of biophysical properties, machine learning
Interests:
• Statistical & Population Genetics, Complex Traits, Epigenetics, Next-Gen Technologies, Molecular Evolution
The Skills:
• Managing ‘Big Data’: data organising, cleaning, formatting and general ‘wrangling’ (Python, MySQL, shell scripting)
• Analysis (BIG & small data): hypothesis tests, statistical modelling, supervised/unsupervised learning, data visualisation (R, Bioconductor, ggplot2, Cytoscape)
Current Projects:
• PhD leftovers: GPCR gene expression in Alzheimer’s disease, recombination in autism
• ChIP-chip analysis of yeast epigenetic modifications
• Developing a tool to visualise comparisons among many publicly available ChIP-seq data sets
Teaching/Workshops
• Teaching experience: R & statistics for 1st-year PhD students, Introduction to Bioinformatics for undergraduates
Planned Workshops
• Python programming and R statistical computing workshops for PhD students, postdocs and PIs in August/September
– How to work computationally with your data (Bye bye Excel!)
– How to explore, analyse and visualise your data
Office: Westland Row, Smurfit Institute of Genetics
Contact: fitzpadj@tcd.ie

Activities - Training
Proposed introductory workshops:
• Programming with Python
• Statistics with R
• Bioinformatics with Galaxy
TBSI Office

Thanks!
Questions / suggestions?
Fiona Roche, fmroche@tcd.ie
Karsten Hokamp, kahokamp@tcd.ie
bioinf.gen.tcd.ie
Darren Fitzpatrick, fitzpadj@tcd.ie
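The NGS activities slide shows a single raw FASTQ read. The proposed Python workshop would cover tasks of this kind, and the minimal sketch below parses that record and decodes its quality string.

It assumes Phred+33 (Illumina 1.8+) quality encoding. That assumption fits this read because its quality string contains ')' (ASCII 41), which would give a negative score under the older Phred+64 scheme. The header field names follow the CASAVA 1.8 naming convention and are also an assumption. `parse_fastq_record` and `parse_illumina_header` are hypothetical helpers, not part of any library.

```python
"""Minimal sketch: parse one FASTQ record and decode its base qualities.

Assumptions (not stated on the slide): Phred+33 quality encoding and a
CASAVA 1.8-style read header. Helper names are hypothetical.
"""
from typing import Dict, List, NamedTuple


class FastqRecord(NamedTuple):
    header: str           # header line without the leading '@'
    sequence: str         # base calls
    qualities: List[int]  # one Phred score per base


def parse_fastq_record(lines: List[str]) -> FastqRecord:
    """Parse the four lines of a single FASTQ record (hypothetical helper)."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Phred+33 (assumed): score = ASCII code of the character minus 33.
    qualities = [ord(ch) - 33 for ch in quality]
    return FastqRecord(header[1:], sequence, qualities)


def parse_illumina_header(header: str) -> Dict[str, str]:
    """Split a CASAVA 1.8-style header into named fields (assumed layout)."""
    read_id, description = header.split(" ", 1)
    names = ["instrument", "run", "flowcell", "lane", "tile", "x", "y"]
    fields = dict(zip(names, read_id.split(":")))
    read, filtered, control, index = description.split(":")
    fields.update(read=read, filtered=filtered, control=control, index=index)
    return fields


# The record shown on the NGS slide.
RECORD = [
    "@HWI-ST1363:132:C201RACXX:3:1101:18624:2742 1:N:0:AGTTCC",
    "GTTTAACTTGAGTGCAAGAGGGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATAT",
    "+",
    "<@<ADDDB?FH<<EGD??:FFHE:C):11811?:DHIII<??F>GED7@GI@H@AB55:5?EBED",
]

if __name__ == "__main__":
    rec = parse_fastq_record(RECORD)
    info = parse_illumina_header(rec.header)
    # Read length, lowest and highest base quality: prints "65 8 40"
    print(len(rec.sequence), min(rec.qualities), max(rec.qualities))
    # Flowcell, lane and sample index: prints "C201RACXX 3 AGTTCC"
    print(info["flowcell"], info["lane"], info["index"])
```

This sketch only handles one four-line record held in memory. For whole files, an established parser such as Biopython's `SeqIO.parse(handle, "fastq")` would be the safer choice.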