Bioinformatics Support
in Trinity College
Karsten Hokamp
Darren Fitzpatrick
Fiona Roche
TBSI, 30/06/2015
Bioinformatics Support
in Trinity College
Position established in 2005 to provide computational support
for SFI-Biotech funded projects in Trinity College
Bioinformatics Support
Pharmacology
Microbiology
Biochemistry
Immunology
Psychology
Medicine
Genetics
Zoology
Botany
Already at MIT, Stanford, Harvard, Yale, Oxford, Cambridge, UCL, …
[Figure: Next-generation Sequencing; labels: cost, data]
Major research areas:
• Sequence analysis
• Gene expression
• Gene regulation
• Computational evolutionary biology
• Network and systems biology
• Genome annotation
• Mutations in cancer
• Population genomics
• Literature mining
• Structural bioinformatics
• High-throughput image analysis
[Figure: Bioinformatics; label: proficiency]
Activities - NGS
RNA-seq, ChIP-seq, assembly, SNP detection
@HWI-ST1363:132:C201RACXX:3:1101:18624:2742 1:N:0:AGTTCC
GTTTAACTTGAGTGCAAGAGGGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATAT
+
<@<ADDDB?FH<<EGD??:FFHE:C):11811?:DHIII<??F>GED7@GI@H@AB55:5?EBED
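The four lines above form one FASTQ record: a header starting with `@`, the read sequence, a `+` separator, and one quality character per base. As a minimal sketch, the record could be parsed as follows. The Phred+33 quality offset is an assumption (it is the usual encoding for this style of Illumina header), and the names `FastqRecord` and `parse_fastq` are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class FastqRecord:
    read_id: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(lines: Iterable[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, four lines per record.

    offset=33 assumes Phred+33 quality encoding.
    """
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        sequence = next(it, None)
        separator = next(it, None)
        quality = next(it, None)
        if quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# The record shown on the slide.
example = [
    "@HWI-ST1363:132:C201RACXX:3:1101:18624:2742 1:N:0:AGTTCC",
    "GTTTAACTTGAGTGCAAGAGGGGAGAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGATATAT",
    "+",
    "<@<ADDDB?FH<<EGD??:FFHE:C):11811?:DHIII<??F>GED7@GI@H@AB55:5?EBED",
]
record = next(parse_fastq(example))
print(len(record.sequence), min(record.qualities), max(record.qualities))
```

Under the Phred+33 assumption, a score Q corresponds to an estimated base-call error probability of 10^(-Q/10).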
Example: Genome assembly, SNP detection
Example: RNA-seq, 20 conditions
Web-browser for raw data
Example: RNA-seq, 20 conditions
Interactive online heat maps
for expression and fold-change visualisations
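A fold-change heat map is typically drawn from log2 ratios of each condition against a reference. The sketch below shows one common way to compute such values. The gene names, condition names, expression values and the pseudocount of 1 are all hypothetical and are not taken from the project described here.

```python
import math
from typing import Dict


def log2_fold_changes(
    expression: Dict[str, Dict[str, float]],
    reference: str,
    pseudocount: float = 1.0,
) -> Dict[str, Dict[str, float]]:
    """Return log2(condition / reference) per gene for every non-reference condition.

    A pseudocount is added to both values so that zero expression
    does not cause a division by zero.
    """
    result: Dict[str, Dict[str, float]] = {}
    for gene, by_condition in expression.items():
        ref_value = by_condition[reference] + pseudocount
        result[gene] = {
            condition: math.log2((value + pseudocount) / ref_value)
            for condition, value in by_condition.items()
            if condition != reference
        }
    return result


# Hypothetical expression values for two genes under three conditions.
expression = {
    "geneA": {"control": 1.0, "heat": 7.0, "acid": 3.0},
    "geneB": {"control": 15.0, "heat": 3.0, "acid": 15.0},
}
fold = log2_fold_changes(expression, reference="control")
for gene, changes in fold.items():
    print(gene, changes)
```

Positive values indicate higher expression than the reference, negative values lower, and zero no change; each value would map to one cell of the heat map.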
Example: ChIP-seq
Analyse in-house and public data sets
Activities - Data Visualisation
web-tools, Bioconductor, Cytoscape, Circos
Activities - Web Servers
ArrayPipe: microarray data analysis
CAPS: coevolution of amino acids using protein sequences
PFPE: Phylogenetic FootPrinting Ensemble
PubCrawler: Literature alerting system
SalCom: Salmonella Typhimurium Gene Expression Compendium
More examples
Search bacterial genomes for homologues of a set of genes
Are binding sites of TFs X and Y close to antiviral genes?
List all proteins with 3D structure that lack amino acids X,Y,Z
Find specific motifs in a set of proteins
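Two of the questions above, finding a motif in a set of proteins and listing proteins that lack certain amino acids, reduce to simple string operations once the sequences are loaded. The following is a minimal sketch under stated assumptions: the protein names and sequences are hypothetical, and the N-glycosylation pattern N-{P}-[ST]-{P} is used only as a familiar example motif, written here as a regular expression.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical protein sequences, for illustration only.
proteins: Dict[str, str] = {
    "protA": "MKNGSAANPTQ",
    "protB": "MNNTSKL",
}

# Example motif: N, any residue except P, S or T, any residue except P.
# The lookahead allows overlapping matches to be reported.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence: str, pattern: "re.Pattern[str]" = MOTIF) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched text) for every motif occurrence."""
    return [(match.start() + 1, match.group(1)) for match in pattern.finditer(sequence)]


def lacks_residues(sequence: str, residues: str) -> bool:
    """Return True if none of the given amino acids occur in the sequence."""
    return set(sequence).isdisjoint(residues)


hits = {name: find_motif(seq) for name, seq in proteins.items()}
without_cw = [name for name, seq in proteins.items() if lacks_residues(seq, "CW")]
print(hits)
print(without_cw)
```

For real data sets the sequences would come from FASTA files or a structure database rather than a hard-coded dictionary.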
Introducing: Dr Fiona Roche
Background in Infectious Diseases
• 1997 – 2000: PhD Microbiology, TCD
– Staphylococcal adherence to host tissue (Prof. Tim Foster)
• 2000 – 2002: TCD, Postdoc
– Functional genomics of Staphylococcal genome
• 2002 – 2006: SFU Canada, Postdoc
– Development of a bioinformatics platform for pathogenomics project (Prof. Fiona Brinkman)
• 2007 – 2015: HPSC, Data Manager
– Development of national surveillance systems to monitor healthcare-associated infections
Bioinformatics for Biologists
A web-based genome analysis platform designed for experimental biologists
http://galaxyproject.org/
Analysis Categories
• Statistics
• Visualisation
• Sequence Analysis
• NGS data analysis
• Proteomics
• Metagenomics
• Computational Chemistry
Workflows
Example ChIP-seq workflow
• Data input: fastq
• Quality control: FastQC
• Mapping to genome: Bowtie
• Peak calling: MACS
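Outside Galaxy, the same four steps could be chained from a script. The sketch below only assembles the command lines and prints them by default. It assumes that `fastqc`, `bowtie` (version 1) and `macs2` are installed and on the PATH; the file names, the index name and the genome-size shortcut are hypothetical, and the options shown are typical usage that should be checked against each tool's own help output.

```python
import subprocess
from pathlib import Path
from typing import List


def chipseq_commands(
    fastq: str, bowtie_index: str, out_dir: str, genome_size: str = "hs"
) -> List[List[str]]:
    """Build the command lines for QC, mapping and peak calling (nothing is run here)."""
    out = Path(out_dir)
    sample = Path(fastq).stem          # "reads.fastq" gives "reads"
    sam = out / (sample + ".sam")
    return [
        # Quality control: write the FastQC report into out_dir.
        ["fastqc", "-o", str(out), fastq],
        # Mapping: Bowtie 1 with SAM output (-S).
        ["bowtie", "-S", bowtie_index, fastq, str(sam)],
        # Peak calling: MACS2 on the SAM alignments (assumes MACS2 rather than MACS 1.x).
        ["macs2", "callpeak", "-t", str(sam), "-f", "SAM",
         "-g", genome_size, "-n", sample, "--outdir", str(out)],
    ]


def run_workflow(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # The output directory must already exist before a real run.
            subprocess.run(command, check=True)


commands = chipseq_commands("reads.fastq", "genome_index", "results")
run_workflow(commands)  # dry run: prints the three commands
```

Keeping command construction separate from execution makes the pipeline easy to inspect before anything is run, which mirrors how a Galaxy workflow can be reviewed before it is launched.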
Galaxy Workshops @TCD
• Local instance at TCD
• Introductory session
• Galaxy Workshop
Darren J. Fitzpatrick, Ph.D
Education
B.A. (Mod) Genetics
Trinity College Dublin
M.Res Computational Biology
University of York
“In silico approach to the prediction of Antibody Thermostability”
Ph.D Statistical Genetics
University College Dublin
“The effect of combinations of genetic variants”
Research Experiences
GWAS: PLINK, eQTLs, epistasis (BMC Genomics, 2015), gene expression, synergy, genotype imputation, population stratification, recombination
Population Genetics: genetic ancestry, ancestry prediction, visualisation of population data (AncestryMapper – PLoS One, 2012)
Data Integration: integrating novel and public data – SNPs, CNVs, epigenetic modifications, DNA-protein interactions from ENCODE, NIH Roadmap, Chromosome Capture Experiments, etc.
Pharmacology: combinatorial modeling of the effects of multiple drugs on platelet activation (PLoS Computational Biology, 2015)
Structural Biology: antibodies, tessellation, prediction of biophysical properties, machine learning
Interests:
• Statistical & Population Genetics, Complex Traits, Epigenetics, Next-Gen Technologies, Molecular Evolution
The Skills:
• Managing ‘Big Data’: data organising, cleaning, formatting and general ‘wrangling’ – (Python, MySQL, Shell Scripting)
• Analysis (BIG & small data): hypothesis tests, statistical modeling, supervised/unsupervised learning, data visualisation – (R, Bioconductor, ggplot2, Cytoscape)
Current Projects:
• PhD Leftovers: GPCR gene expression in Alzheimer’s, Recombination in Autism
• ChIP-chip analysis of yeast epigenetic modifications
• Developing tool to visualise comparisons amongst many publicly available ChIP-Seq data sets
Teaching/Workshops
• Teaching experience: R & statistics to 1st year PhD students, Introduction to Bioinformatics to undergraduates
Planned Workshops
• Python programming and R statistical computing workshops for PhD students, Post-Docs and PIs in August/September
– How to work computationally with your data (Bye bye Excel!).
– How to explore, analyse and visualise your data.
Office: Westland Row, Smurfit Institute of Genetics
Contact: fitzpadj@tcd.ie
Activities - Training
Proposed introductory Workshops:
• Programming with Python
• Statistics with R
• Bioinformatics with Galaxy
TBSI Office
Thanks!
Questions / suggestions?
Fiona Roche
fmroche@tcd.ie
Karsten Hokamp
kahokamp@tcd.ie
bioinf.gen.tcd.ie
Darren Fitzpatrick
fitzpadj@tcd.ie