Data mining and data annotation in genomics and proteomics
Bonnie Webber

School of Informatics
University of Edinburgh
Informatics (5*A)
• Language Technology
• Learning from Data
• Database Systems / XML Technology
Highlight: Theory and practice of annotation in scientific databases
• How to characterise annotations?
• How to describe their attachment to data?
• How to pass annotations through queries?
• How to make programs “annotation conscious”?
• (Buneman, Koch, Bickmore, …)
Example
Serves fine French Cuisine in elegant setting. Jackets required.
Extensive wine list!
NYRestaurants (Source Table)
Restaurant          Cost  Type      Zip
Peacock Alley       $$$   French    10022
Bull & Bear         $$$   Seafood   10022
Pacifica            $     Chinese   10013
Soho Kitchen & Bar  $     American  10022
Yummy chicken curry!!
All Restaurants (View 1)
Restaurant          Cost  Type
Peacock Alley       $$$   French
Bull & Bear         $$$   Seafood
Pacifica            $     Chinese
Soho Kitchen & Bar  $     American

Cheap Restaurants (View 2)
Restaurant          Cost  Type
Pacifica            $     Chinese
Soho Kitchen & Bar  $     American
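The restaurant example can be sketched in code to show one way annotations might travel through queries. This is a minimal illustration, not the system the slide describes: the `Cell` class and the `project`, `select` and `notes_in` helpers are hypothetical names, and which cell each annotation is attached to is an assumption, since the slide text does not say.

```python
from dataclasses import dataclass, field

COLUMNS = ("Restaurant", "Cost", "Type", "Zip")


@dataclass
class Cell:
    """A table value together with the annotations attached to it."""
    value: str
    notes: list = field(default_factory=list)


def make_row(*values):
    """Build one row of the source table as {column name: Cell}."""
    return {name: Cell(value) for name, value in zip(COLUMNS, values)}


def project(rows, columns):
    """Keep only the given columns; each kept cell carries its notes along."""
    return [{c: Cell(row[c].value, list(row[c].notes)) for c in columns}
            for row in rows]


def select(rows, predicate):
    """Keep only rows matching the predicate; notes on kept cells survive."""
    return [{c: Cell(cell.value, list(cell.notes)) for c, cell in row.items()}
            for row in rows if predicate(row)]


def notes_in(rows):
    """Collect every annotation visible in a table or view."""
    return [note for row in rows for cell in row.values() for note in cell.notes]


ny_restaurants = [
    make_row("Peacock Alley", "$$$", "French", "10022"),
    make_row("Bull & Bear", "$$$", "Seafood", "10022"),
    make_row("Pacifica", "$", "Chinese", "10013"),
    make_row("Soho Kitchen & Bar", "$", "American", "10022"),
]

# ASSUMPTION: the slide does not state which cell each note belongs to.
ny_restaurants[0]["Restaurant"].notes.append(
    "Serves fine French Cuisine in elegant setting. Jackets required.")
ny_restaurants[1]["Restaurant"].notes.append("Extensive wine list!")
ny_restaurants[2]["Restaurant"].notes.append("Yummy chicken curry!!")

# View 1 drops the Zip column; View 2 keeps only the "$" rows of View 1.
all_restaurants = project(ny_restaurants, ("Restaurant", "Cost", "Type"))
cheap_restaurants = select(all_restaurants, lambda row: row["Cost"].value == "$")
```

Under these assumptions, View 1 still shows all three notes, while View 2 shows only the note attached to a row that survives the selection. Any note attached to a Zip cell would be lost in both views, which is the kind of behaviour an annotation-aware query language has to define explicitly.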
Highlight: large-scale sequence annotation
• To use phylogenomic methods to propagate gene and sequence annotation through large families of “neglected” organisms
• This extends the use of available functional annotation from well-annotated organisms to “neglected” ones
• (Blaxter, Parkinson, Williams)
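As a rough illustration of tree-guided annotation transfer, the sketch below gives each unannotated gene the annotation of an annotated gene in the smallest clade that contains both. It is a deliberately simplified stand-in, not the group's method: the nested-tuple tree, the gene names and the `propagate` function are all hypothetical, and real pipelines would weigh branch lengths, orthology and conflicting evidence.

```python
def propagate(tree, known):
    """Transfer annotations across a gene-family tree.

    tree  : nested tuples whose leaves are gene names (hypothetical format)
    known : {gene name: annotation} for well-annotated genes
    Returns {gene name: (annotation, donor gene)} for unannotated leaves.
    """
    inferred = {}

    def visit(node):
        # A leaf is just a gene name.
        if isinstance(node, str):
            return [node]
        # Visit children first so that smaller clades assign before larger ones.
        members = [leaf for child in node for leaf in visit(child)]
        donors = [m for m in members if m in known]
        if donors:
            for m in members:
                if m not in known and m not in inferred:
                    # Simplification: take the first donor found in the clade.
                    inferred[m] = (known[donors[0]], donors[0])
        return members

    visit(tree)
    return inferred


# Hypothetical gene family: two annotated genes, three "neglected" ones.
family = ((("model_gene_A", "neglected_1"), "neglected_2"),
          ("model_gene_B", "neglected_3"))
annotations = {"model_gene_A": "kinase", "model_gene_B": "transporter"}
result = propagate(family, annotations)
```

Keeping the donor gene alongside the transferred annotation records where each inferred label came from, so it can be revisited if the donor's annotation changes.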
Highlight: Probabilistic Modelling of Biological Systems and Sequences
• Use conditional random fields to model the promoter region of genes, to capture “long distance” dependencies (Osborne, Ghazal)
• Use genetic algorithms to explore a vast space of gene network topologies, to find ones consistent with expression data (Armstrong, Levine)
• Induce dynamic Bayesian models that can express non-linear temporal relations in gene networks (Armstrong, Barber)
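The genetic-algorithm idea can be sketched as a search over bit strings, where each bit says whether one directed edge is present in a candidate network. This is a toy under stated assumptions: the gene names, the `SUPPORTED` edge set and `toy_fitness` are placeholders for a real measure of consistency with expression data, and `evolve` is a generic textbook loop rather than the researchers' algorithm.

```python
import random

GENES = ("g1", "g2", "g3", "g4")  # hypothetical gene names
# Every possible directed edge between two different genes (12 in total).
EDGES = [(a, b) for a in GENES for b in GENES if a != b]
# PLACEHOLDER: edges we pretend the expression data supports.
SUPPORTED = {("g1", "g2"), ("g2", "g3"), ("g3", "g4")}


def toy_fitness(topology):
    """Count edge decisions that agree with the placeholder evidence."""
    return sum(bit == (edge in SUPPORTED) for bit, edge in zip(topology, EDGES))


def evolve(fitness, n_bits, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Generic genetic algorithm over fixed-length bit tuples."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_bits))
                  for _ in range(pop_size)]
    history = [max(fitness(t) for t in population)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0]]              # elitism: keep the current best
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        population = children
        history.append(max(fitness(t) for t in population))
    best = max(population, key=fitness)
    return best, history


best_topology, best_history = evolve(toy_fitness, len(EDGES))
```

Because the best candidate is always carried over, the best score never decreases from one generation to the next. The interesting work in a real application lies in the fitness function, which would score each topology against measured expression levels.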
Behavioural and genetic responses to gravity:
Flies in Space
J. Douglas Armstrong
[Figure: Behavioural Assay]
Problem context: behavioural and genetic responses to gravity
• In flies, expression levels of 208 genes change in response to changes in gravity.
• About 70 mutant strains respond abnormally to gravity.
• The goal is to induce the relevant gene networks and understand how gravity affects them.
• It is inappropriate to assume these networks have a strictly linear response.
[Figure labels: Fly walking up tube; Flip tube; Ellipsoid body inactivated]
Highlight: Statistical Methods for Haplotype Reconstruction in Livestock Genetics
Michael T. Schouten
Dr. Chris Williams
Department of Informatics
University of Edinburgh
Professor Chris Haley
Department of Genetics
The Roslin Institute
Marker-Trait Association
SNP            Haplotypes
…ACGCTTGAA…    CA
…ACGCTTGTA…    CT
…ACGGTTGAA…    GA
…ACGGTTGTA…    GT
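The link between the four sequences and their two-letter haplotypes can be made explicit: the haplotype is the string of bases at the positions where the sequences differ. A minimal sketch, assuming the sequences are already aligned and of equal length (the function name `snp_haplotypes` is illustrative only):

```python
def snp_haplotypes(sequences):
    """Return (SNP positions, haplotype per sequence) for aligned sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # A position is a SNP site if more than one base is observed there.
    snp_sites = [i for i in range(length)
                 if len({seq[i] for seq in sequences}) > 1]
    # Each haplotype is the sequence restricted to the SNP sites.
    haplotypes = ["".join(seq[i] for i in snp_sites) for seq in sequences]
    return snp_sites, haplotypes


# The four sequence fragments from the slide, without the ellipses.
fragments = ["ACGCTTGAA", "ACGCTTGTA", "ACGGTTGAA", "ACGGTTGTA"]
sites, haplotypes = snp_haplotypes(fragments)
# sites      -> [3, 7]  (0-based positions)
# haplotypes -> ['CA', 'CT', 'GA', 'GT']
```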
Marker Sequencing
[Figure: haplotypes TCGACGGCA + TCGGCGTCA, sequenced together as TCG[A/G]CG[G/T]CA]
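Sequencing a marker reads both chromosomes together, so two haplotypes such as TCGACGGCA and TCGGCGTCA appear as one sequence with two ambiguous sites. The sketch below shows why reconstruction is needed: it lists every pair of haplotypes consistent with such an unphased genotype. The function names and the list-of-strings genotype format are assumptions made for illustration.

```python
from itertools import product


def unphased_genotype(hap_a, hap_b):
    """Combine two haplotypes into per-site allele strings, e.g. 'AG' or 'T'."""
    return ["".join(sorted({a, b})) for a, b in zip(hap_a, hap_b)]


def compatible_pairs(genotype):
    """List every unordered haplotype pair that could produce the genotype."""
    het_sites = [i for i, alleles in enumerate(genotype) if len(alleles) == 2]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        # Start from the homozygous bases, then fill in heterozygous sites.
        first = [alleles[0] for alleles in genotype]
        second = [alleles[-1] for alleles in genotype]
        for site, pick in zip(het_sites, choice):
            first[site] = genotype[site][pick]
            second[site] = genotype[site][1 - pick]
        # Sort within the pair so (x, y) and (y, x) count once.
        pairs.add(tuple(sorted(("".join(first), "".join(second)))))
    return sorted(pairs)


genotype = unphased_genotype("TCGACGGCA", "TCGGCGTCA")
candidates = compatible_pairs(genotype)
# candidates -> [('TCGACGGCA', 'TCGGCGTCA'), ('TCGACGTCA', 'TCGGCGGCA')]
```

With k heterozygous sites there are 2^(k-1) candidate pairs, so the number of possibilities grows quickly with marker density. Choosing among them is what a statistical model, informed by pedigree or population data, is for.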
Develop a Bayesian Model to Reconstruct Haplotypes for a Breeding System that
• Has Limited Pedigree Information
• Does Not Conform to Hardy-Weinberg Assumptions
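For context, the Hardy-Weinberg assumption says that with allele frequencies p and q = 1 - p, genotypes occur in proportions p², 2pq and q². The sketch below computes those expected counts and a chi-square distance from observed counts. The genotype counts are invented purely for illustration, and the function names are hypothetical.

```python
def hardy_weinberg_expected(n_aa, n_ab, n_bb):
    """Expected genotype counts (AA, AB, BB) under Hardy-Weinberg proportions."""
    total = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * total)  # frequency of allele A
    q = 1 - p                            # frequency of allele B
    return (p * p * total, 2 * p * q * total, q * q * total)


def chi_square(observed, expected):
    """Pearson chi-square statistic between observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Invented counts: a population in HW proportions and one with few heterozygotes.
balanced = (25, 50, 25)
skewed = (40, 20, 40)
balanced_stat = chi_square(balanced, hardy_weinberg_expected(*balanced))
skewed_stat = chi_square(skewed, hardy_weinberg_expected(*skewed))
# balanced_stat -> 0.0, skewed_stat -> 36.0
```

A managed breeding system, with selected sires and non-random mating, can depart from these proportions much as the second example does, which is why a model that relies on them may be a poor fit.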
Future
• Our MSc program in bioinformatics is growing.
• Successful MSc students are feeding into our PhD program.
• PhD students in bioinformatics are being funded under EPSRC quota studentships and targeted bioinformatics studentships from BBSRC and (we hope) the MRC.
• We hope to attract new staff and students to bioinformatics and retain those we have.