Data mining and data annotation in genomics and proteomics
Bonnie Webber
School of Informatics
University of Edinburgh

Informatics (5*A)
• Language Technology
• Learning from Data
• Database Systems / XML Technology

Highlight: Theory and practice of annotation in scientific databases
• How to characterise annotations?
• How to describe their attachment to data?
• How to pass annotations through queries?
• How to make programs “annotation conscious”?
• (Buneman, Koch, Bickmore, …)

Example
Annotation: “Serves fine French Cuisine in elegant setting. Jackets required. Extensive wine list!”
Annotation: “Yummy chicken curry!!”

NYRestaurants (Source Table)
Restaurant            Cost   Type       Zip
Peacock Alley         $$$    French     10022
Bull & Bear           $$$    Seafood    10022
Pacifica              $      Chinese    10013
Soho Kitchen & Bar    $      American   10022

All Restaurants (View 1)
Restaurant            Cost   Type
Peacock Alley         $$$    French
Bull & Bear           $$$    Seafood
Pacifica              $      Chinese
Soho Kitchen & Bar    $      American

Cheap Restaurants (View 2)
Restaurant            Cost   Type
Pacifica              $      Chinese
Soho Kitchen & Bar    $      American

Highlight: Large-scale sequence annotation
• Use phylogenomic methods to propagate gene and sequence annotation through large families of “neglected” organisms
• This extends the use of available functional annotation from well-annotated organisms to “neglected” ones
• (Blaxter, Parkinson, Williams)

Highlight: Probabilistic Modelling of Biological Systems and Sequences
• Use conditional random fields to model the promoter region of genes, to capture “long distance” dependencies (Osborne, Ghazal)
• Use genetic algorithms to explore a vast space of gene network topologies, to find ones consistent with expression data (Armstrong, Levine)
• Induce dynamic Bayesian models that can express non-linear temporal relations in gene networks (Armstrong, Barber)

Behavioural and genetic responses to gravity: Flies in Space
J. Douglas Armstrong

[Figure: Behavioural Assay]

Problem context: behavioural and genetic responses to gravity
• In flies, expression levels of 208 genes change in response to changes in gravity.
• About 70 mutant strains respond abnormally to gravity.
• The goal is to induce the relevant gene networks and understand how gravity affects them.
• It is inappropriate to assume these networks have a strictly linear response.

[Figure panels: Fly walking up tube; Flip tube; Ellipsoid body inactivated]

Highlight: Statistical Methods for Haplotype Reconstruction in Livestock Genetics
Michael T. Schouten
Dr. Chris Williams, Department of Informatics, University of Edinburgh
Professor Chris Haley, Department of Genetics, The Roslin Institute

Marker-Trait Association

SNP Haplotypes
Sequence        Haplotype
…ACGCTTGAA…     CA
…ACGCTTGTA…     CT
…ACGGTTGAA…     GA
…ACGGTTGTA…     GT

[Figure: Marker Sequencing]

Develop a Bayesian Model to Reconstruct Haplotypes for a Breeding System that
• Has Limited Pedigree Information
• Does Not Conform to Hardy-Weinberg Assumptions

Future
• Our MSc program in bioinformatics is growing.
• Successful MSc students are feeding into our PhD program.
• PhD students in bioinformatics are being funded under EPSRC quota studentships and targeted bioinformatics studentships from BBSRC and (we hope) the MRC.
• We hope to attract new staff and students to bioinformatics and retain those we have.
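
Supplementary sketch: passing annotations through queries

Returning to the restaurant example under the annotation highlight, the Python sketch below shows one simple way a query could carry cell-level annotations into a view: a note survives if both its row and its column survive the selection and projection. This is an illustrative rule assumed here, not necessarily the one studied by the group. The slides do not say which cells the two notes are attached to, so the attachments in `ANNOTATIONS` are hypothetical, as are the names `run_view`, `NY_RESTAURANTS` and `ANNOTATIONS`.

```python
# Source table from the example slide (NYRestaurants).
NY_RESTAURANTS = [
    {"Restaurant": "Peacock Alley", "Cost": "$$$", "Type": "French", "Zip": "10022"},
    {"Restaurant": "Bull & Bear", "Cost": "$$$", "Type": "Seafood", "Zip": "10022"},
    {"Restaurant": "Pacifica", "Cost": "$", "Type": "Chinese", "Zip": "10013"},
    {"Restaurant": "Soho Kitchen & Bar", "Cost": "$", "Type": "American", "Zip": "10022"},
]

# ASSUMPTION: the (restaurant, column) cells chosen below are hypothetical;
# the slides only give the note texts, not where they are attached.
ANNOTATIONS = {
    ("Peacock Alley", "Type"): [
        "Serves fine French Cuisine in elegant setting. "
        "Jackets required. Extensive wine list!"
    ],
    ("Pacifica", "Restaurant"): ["Yummy chicken curry!!"],
}


def run_view(rows, annotations, columns, keep=lambda row: True):
    """Select rows with `keep`, project onto `columns`, and carry along
    every annotation whose row and column both survive."""
    view_rows = []
    view_notes = {}
    for row in rows:
        if not keep(row):
            continue  # row filtered out: its annotations are dropped too
        view_rows.append({col: row[col] for col in columns})
        for col in columns:
            notes = annotations.get((row["Restaurant"], col))
            if notes:
                view_notes[(row["Restaurant"], col)] = list(notes)
    return view_rows, view_notes


# View 1: all restaurants, without the Zip column.
all_rows, all_notes = run_view(
    NY_RESTAURANTS, ANNOTATIONS, ["Restaurant", "Cost", "Type"]
)

# View 2: cheap restaurants only.
cheap_rows, cheap_notes = run_view(
    NY_RESTAURANTS,
    ANNOTATIONS,
    ["Restaurant", "Cost", "Type"],
    keep=lambda row: row["Cost"] == "$",
)

for cell, notes in cheap_notes.items():
    print(cell, notes)
```

Under these assumed attachments, both notes reach View 1, while only the note on Pacifica reaches View 2, because the Peacock Alley row is filtered out. A note attached to a Zip cell would reach neither view, which is one reason the question of how annotations attach to data matters.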