Annual Report for 2004 - Imperial College Centre for Bioinformatics

Centre for Bioinformatics
Imperial College London
Second Report
31 May 2004
Centre Director: Prof Michael Sternberg
www.imperial.ac.uk/bioinformatics
bioinformatics@imperial.ac.uk
Support Service Head: Dr Sarah Butcher
www.codon.bioinformatics.ic.ac.uk
bsshelp@imperial.ac.uk
Centre for Bioinformatics - Imperial College London - Second Report - May 2004
Contents
Summary
1. Background and Objectives
1.1 Objectives of the Report
1.2 Background
1.3 Objectives of the Centre for Bioinformatics
1.4 Objectives of the Bioinformatics Support Service
2. Management
3. Biographies of the Team
4. Bioinformatics Research at Imperial
4.1 Affiliates of the Centre
4.2 Research
5. Teaching
5.1 MSc in Bioinformatics
5.2 Wellcome Trust 4 year PhD in Bioinformatics
5.3 Computational Bioinformatics
6. The Bioinformatics Support Service
6.1 Introduction
6.2 Management
6.3 Hardware, Software and Databases
6.4 Support & Training
6.4 Research Projects
6.5 Grants
6.6 Courses
6.7 Financial Arrangements
7. The London Bioinformatics Forum
7.1 Mission of the London Bioinformatics Forum
7.2 Objectives
7.3 Management
7.4 Activities
8. Seminar Programme
9. Achievements and Plans
9.1 Achievements
9.2 Plans
Appendix 1 - Selected Publications
Summary
The mission of the Centre for Bioinformatics is to promote and co-ordinate
world-class research and training in Bioinformatics within Imperial College
London and to provide state-of-the-art Bioinformatics support to members of
Imperial for their research. Our second report describes the activities of the
Centre from 1st February 2003 to 31st May 2004 and documents our
publications and grants for the calendar year 2003. Additional information can
be found at www.imperial.ac.uk/bioinformatics.
The main achievements of the Centre during this period are:
• The provision across the College of a Bioinformatics Support Service with 230 registered users as of May 2004.
• The success of the Bioinformatics Support Service in obtaining a £600K BBSRC grant to apply e-Science to the support of microarray analysis.
• The development of collaborative research projects between the Support Service and several research groups in the College.
• The expansion of the Centre with the addition of 11 new Affiliates.
• The publication by our Affiliates of more than 50 refereed papers in Bioinformatics during 2003.
• The award during 2003 of more than £2 million of grant support for research and training in Bioinformatics.
• The co-ordination of postgraduate teaching of Bioinformatics across the College.
• The running of a seminar series that attracts an audience from the College and other organisations in the London area.
• The establishment of the London Bioinformatics Forum, together with colleagues from other London groups.
1. Background and Objectives
1.1 Objectives of the Report
This is the second report of the Centre for Bioinformatics and covers the period 1st
February 2003 to 31st May 2004. The objectives of this report are:
• To document the status of the Centre as of May 2004, particularly describing the contribution from our Steering Committee, External Advisors and the Affiliates.
• To highlight the main developments over the period of this report.
• To describe the activities of the Bioinformatics Support Service in terms of facilities provided and its uptake by users.
• To report grants awarded and publications for the calendar year 2003.
1.2 Background
• Bioinformatics can be defined as the use of computational, mathematical and statistical methods to organise, analyse and interpret biological information, particularly at the molecular, genetic and genomic levels.
• Bioinformatics is central to the interpretation and exploitation of the wealth of biological data being generated in the post-genome era, with consequent major clinical and commercial benefits.
• It is vital that Imperial has world-class research in Bioinformatics together with state-of-the-art facilities for all users. Since Bioinformatics research is located in all four Faculties, a clearly identifiable focus is required, in particular to co-ordinate multi-disciplinary research. In parallel, it is essential to provide biologists and clinical researchers with state-of-the-art Bioinformatics resources to empower them to deliver world-class research.
• To address these issues, in 2001 the Deputy Rector together with the Faculties of Life Sciences and of Medicine established the Centre for Bioinformatics and the associated Bioinformatics Support Service.
• The Bioinformatics Support Service is located on the newly-refurbished third floor of the Biochemistry Building on the South Kensington Campus. This acts as the focus for the Centre, with its links to the different Departments and Campuses of the College.
1.3 Objectives of the Centre for Bioinformatics
The objectives of the Centre for Bioinformatics are to:
• Co-ordinate the strategic development of Bioinformatics at Imperial across the four Faculties and at the different Campuses.
• Develop new collaborative projects within and outside Imperial, particularly those that are multi-disciplinary.
• Contribute to a broad College-wide view of Bioinformatics, including the development of links with areas such as statistical genetics, chemometrics and image processing.
• Have a strategic role in the provision of teaching and training in Bioinformatics.
• Organise seminar programmes and symposia on Bioinformatics.
• Disseminate relevant software, databases and information to the UK and world scientific communities, both academic and industrial.
• Facilitate the provision of state-of-the-art Bioinformatics support to members of Imperial by directing the Bioinformatics Support Service.
1.4 Objectives of the Bioinformatics Support Service
The objectives of the Bioinformatics Support Service are to provide the following
services to all Imperial Campuses:
• In-house facilities for major Bioinformatics tasks, such as sequence database searching and microarray processing.
• Access to appropriate commercial software and data.
• Curated links to public domain sites providing additional services.
• Expertise and training courses on the use of the above facilities.
• Collaborative research on specific topics.
• Support for undergraduate and postgraduate teaching.
2. Management
• The Director of the Centre is Professor Michael Sternberg, Department of Biological Sciences.
• A Steering Committee, composed of senior academic staff at Imperial involved in Bioinformatics, manages the Centre. The Committee reports to the Principals of the Life Sciences and Medical Faculties via the Director.
Membership of the Steering Committee
Member | Faculty / Affiliation
Prof Michael Sternberg | Life Sciences / Biological Sciences (Chair)
Prof Timothy Aitman | Medicine / CSC
Prof David Balding | Medicine / Epidemiology and Public Health
Prof John Darlington | Engineering / Computer Science / LESC
Prof Paul Freemont | Life Sciences / Biological Sciences / CSB
Prof Philippe Froguel | Medicine
Prof Stephen Muggleton | Engineering / Computer Science
Prof James Scott | Medicine / GGRI
Prof Richard Templer | Physical Sciences / Chemistry
Dr Sarah Butcher (in attendance) | Head, Bioinformatics Support Service
CSC - MRC Clinical Sciences Centre; LESC - London e-Science Centre;
CSB - Centre for Structural Biology; GGRI – Genetics and Genomics Research Institute.
• There is also a panel of External Advisors drawn from leading scientists in academia and industry with a strong interest in Bioinformatics. Dr Philippe Sanseau has kindly agreed to replace our previous industrial representative, Prof Charlie Hodgman, who has now taken up a post with Nottingham University. We take this opportunity to thank Prof Charlie Hodgman for his work as an External Advisor.
External Advisors
Member | Affiliation
Prof Alan Bundy | Division of Informatics, Edinburgh University
Prof Lon Cardon | Wellcome Trust Centre for Human Genetics, Oxford
Prof Anna Dominiczak | Western Infirmary, Glasgow
Dr Philippe Sanseau | GlaxoSmithKline, Stevenage
Prof Janet Thornton, FRS | European Bioinformatics Institute, Hinxton
3. Biographies of the Team
Professor Michael Sternberg is the Director of the Centre. He holds the Chair of
Structural Bioinformatics in the Department of Biological Sciences. His first degree
was in Theoretical Physics (Cambridge) followed by an MSc in Computing at
Imperial. He moved into the Life Sciences via his D.Phil. research in Oxford on
protein modelling. Prior to joining Imperial in 2001, he held posts in the Department
of Crystallography, Birkbeck College and at Cancer Research UK.
Dr Sarah Butcher is the Head of the Bioinformatics Support Service (BSS). Her first
degree was in Applied Biology (Imperial) followed by a PhD in Cellular Immunology
from the National Institute for Medical Research (CNAA). She then worked as a
postdoctoral researcher in Virology for the NERC Centre for Virology and
Environmental Microbiology, Oxford. Subsequently Sarah joined Oxford University
Bioinformatics Centre – which she later managed for 3 years. Sarah took up her
post at Imperial in June 2002.
Dr James Abbott is the main software developer for the BSS. He obtained a BSc in
Biology with Biotechnology at the University of Luton, before undertaking a PhD in
plant biochemistry at the University of Dundee. Following this, James worked as a
bioinformatics specialist for Zeneca Agrochemicals (latterly Syngenta), contributing to
the expression bioinformatics project and running the SRS project.
Dr Gail Bartlett is the Computational Biologist for the BSS, responsible for user
support, tutorials and training. She obtained an undergraduate Masters degree in
Biochemistry from the University of Oxford and went on to study for a PhD in
Bioinformatics with Professor Janet Thornton, initially at University College London
and later at the European Bioinformatics Institute.
Mr Derek Huntley is a research assistant for the BSS, specialising in second-line
user support including development of custom Java programs and interfaces. He
obtained a first degree in Biology at Sussex University and went on to complete an
MSc in Computer Science at Birkbeck College, London. He worked in the
Department of Computing at Imperial College London for 4 years developing
genomic annotation software before joining the BSS. He has recently submitted his
PhD thesis.
Ms Ruth Walters is the Administrator of the Centre. She joined the Centre in
December 2001. Previously she obtained a degree in Philosophy (Southampton)
and then worked in educational administration.
Dr Suhail Islam assists the Centre part-time in the management of local computing.
He is a member of the Structural Bioinformatics Group, develops software and
manages the Linux farm and the molecular graphics system. Previously he held
similar posts at King's College, Birkbeck College and Cancer Research UK.
Additional support is obtained from members of the London e-Science Centre. The
BSS login server is housed within the Department of Computing and the BSS
receives Unix system support and additional advice from the team led by Professor
John Darlington and Dr Steven Newhouse.
4. Bioinformatics Research at Imperial
4.1 Affiliates of the Centre
The Centre coordinates Bioinformatics research at Imperial via a network of Affiliates
spanning all four Faculties and many Campuses. Affiliates are members of Imperial
directly involved in Bioinformatics who are either pursuing independent research or
are providing major support or development of Bioinformatics. In addition, several
heads of Imperial Centres are Affiliates, thereby representing the collective
Bioinformatics interests of a set of people. The Affiliates and their research interests
are given below.
During 2003, there were two major new appointments in Bioinformatics. Professor
Jaroslav Stark joined the Department of Mathematics and Dr Michael Stumpf joined
Biological Sciences with his research group being located within the Centre for
Bioinformatics. In addition to these two appointments, nine other academics at
Imperial became Affiliates of the Centre. The eleven new Affiliates are highlighted by
(*) below.
Computer Science, Mathematics and Statistics
Dr Mauricio Barahona (*) | Biomathematics and dynamical systems
Dr Simon Colton (*) | Machine learning and artificial intelligence
Prof John Darlington | High performance computing, e-science and the grid
Prof Yike Guo | Machine learning & data mining
Dr Martin Howard (*) | Biophysics and pattern formation
Prof Henrik Jensen | Evolution of interacting networks
Prof David Hand | Statistical and machine learning methods
Prof Stephen Muggleton | Machine learning and its application to bioinformatics
Prof Sylvia Richardson | Hierarchical Bayesian models, clustering microarray data
Prof Marek Sergot | Automated reasoning applied to problems in bioinformatics
Prof Jaroslav Stark (*) | Mathematical modelling of biological systems
Dr David Stephens | Bayesian probabilistic analysis of biological sequences
Prof Guang-Zhong Yang | Image processing applied to biomolecular modelling

DNA and Protein Sequence Analysis (including Phylogenetics)
Dr Austin Burt | Evolution of non-Mendelian genetic elements
Prof Charles Godfray, FRS | Population biology & phylogenetics
Dr Andy Purvis | Inferring evolutionary processes from phylogenetic patterns
Dr Mike Tristem | Retroviral and retroelement evolution
Dr Alfried Vogler | Comparative genomics and molecular systematics of insects

Genetics and Genomics
Prof David Balding | Disease gene mapping & population genetics
Dr Mark Field (*) | Molecular parasitology
Prof Philippe Froguel (*) | Genome annotation and SNP analysis
Prof Neil Ferguson, OBE (*) | Modelling of pathogen population dynamics and evolution
Prof James Scott, FRS | Genetics & genomics
Prof Brian Spratt, FRS | Characterisation of isolates of bacterial strains
Dr Michael Stumpf (*) | Population genetics and comparative genomics
Dr John Whittaker | Statistical methods to identify disease genes
Prof Douglas Young (*) | Infection & Immunity, pathogen genomics/proteomics

High-throughput 'Omics Methodologies
Dr Helen Causton | Gene expression analysis & data warehousing
Prof Anne Dell, FRS | Mass spectrometric sequencing of biopolymers
Dr David Perkins (*) | Proteomics and sequence analysis
Macromolecular Structures
Prof Paul Freemont | Structure and function of biological macromolecules
Prof Michael Sternberg | Structural bioinformatics (especially protein modelling)

Physical and Chemical Methods
Dr Ian Gould | Simulation methods for biological systems
Dr Henry Rzepa | Quantum chemical modelling, XML, semantic web

Support and Training
Dr Sarah Butcher (*) | Bioinformatics support
4.2 Research
Since the Centre started in 2001, Affiliates of the Centre have obtained over £8M of
grants to support Bioinformatics research within Imperial. Most of this support was
obtained over 2001-2 and this has financed extensive research by the Affiliates.
Appendix 1 lists the research publications for calendar year 2003. There are more
than 50 papers including four in Nature, Science and the Proceedings of the National
Academy of Sciences, USA.
During 2003, over £2M of grants were obtained by Affiliates of the Centre for
research and training in Bioinformatics at Imperial. In the list below, multi-disciplinary
grants are placed under the research area of the principal investigator. We have not
included grants to our Affiliates for research outside Bioinformatics. Sums quoted are
the support to Imperial with the total award in brackets afterwards. The main
investigators at Imperial are reported.
Computer Science, Mathematics and Statistics
APRIL II – Application of probabilistic inductive logic programming. EU. £300K (£1M).
2004-2006. Muggleton & Sternberg.
To develop a sound theoretical understanding of probabilistic logic learning that enables one
to develop effective probabilistic learning systems. To apply these methods to applications in
bioinformatics including protein folding, modelling metabolic pathways and genetics.
Computational tools for Bayesian bioinformatics. MRC Training fellowship. £135K.
2003-2006. Lunn, Best & Whittaker.
This grant is to develop a user-friendly specialist interface and computational algorithms
tailored specifically for Bayesian statistical modelling.
Adverse event data mining. EPSRC. CASE PhD Studentship with GSK. 2003-2006. Hand.
Drugs in the marketplace are subject to constant monitoring, so that possible side effects and
interactions with other drugs can be detected. Novel statistical tools are required to analyse
the large sparse datasets which are produced.
Genetics and Genomics
Bioinformatics for the analysis and exploitation of re-sequenced genomes. MRC/DTI
link. £400K (£1.5M). 2003-2006. Balding.
The goal of this research is to develop simulation models and statistical tools to investigate
optimal strategies for the use of whole genome re-sequencing data to investigate DNA
variants involved in disease causation. Joint with European Bioinformatics Institute, Wellcome
Trust Sanger Institute, and Solexa Ltd.
Molecular evolution of G-protein coupled receptors. Royal Society. £10K. 2003-2004.
Stumpf.
G-protein coupled receptors play a central role in cell-cell signalling. In this project we attempt
to understand the evolutionary history of these proteins and the amount of selection that has
operated on them in the human lineage.
Bayes network models of gene regulation. Royal Society. £20K. 2004-2006. Stumpf.
To understand the sequence of interactions in a gene regulation network, we are developing
and testing Bayesian network models of gene regulation. The framework supports both
simulation and inference, since the parameters of the network model can be estimated
from real data. With Professor C Wiuf.
Statistical modelling for the association of multiple SNP genotypes and phenotype.
MRC. PhD Studentship. Balding & Whittaker.
This project will exploit recent developments in spatial statistics to develop methods for the
analysis of data from genetic association studies where many SNPs have been genotyped.
High-throughput 'Omics Methodologies
Microarrays in clinical practice. Department of Health. £247K. 2004-2006. Causton,
Aitman, Navarange, Bloom & Stamp.
This project aims to extend the current Microarray Centre data warehouse to accommodate
clinical data. The routine clinical use of microarrays will bring a better understanding of
the relationship between genes and disease, provide tools for more accurate diagnosis
that allow treatment to be tailored to the individual, and assist in the development of new
and more effective therapies.
Macromolecular Structures
Modelling and prediction of docked protein-protein complexes. MRC. PhD Studentship.
2003-2006. Sternberg.
The aim is to enhance computational methods to predict the structure of a protein-protein
complex starting from the coordinates of the unbound components.
Prediction of protein specificity using machine learning. BBSRC. PhD Studentship.
2003-2006. Sternberg & Muggleton.
The aim is to develop a machine learning approach to predict protein function from structure.
Of particular importance is identifying the residues that confer specificity of function.
Teaching and Training Programmes
A four year PhD programme in bioinformatics. Wellcome Trust. £1.2M. 2003-2008.
Sternberg & Field.
This programme supports 5 students commencing in 2003 and 7 commencing in 2004 to
undertake a 4 year PhD in bioinformatics at Imperial. In the first year the students attend
the MSc in Bioinformatics. For the next three years, each student undertakes a PhD in any
of the Departments associated with the programme.
The statistical analysis of gene expression data. EPSRC. £23K. 2003. Richardson.
Funding to organise a workshop to promote good statistical practice, to initiate new
methodological research on ways to analyse this type of data and to foster the interface
between new technological developments and the biological and experimental context. With
P. Brown (Kent).
5. Teaching
5.1 MSc in Bioinformatics
Imperial established an MSc in Bioinformatics in the academic year 2001-2. The
course takes graduates from either the Life Sciences or the Numerical/Physical
Sciences and trains them in Bioinformatics. The first half of the year has formal
courses covering Computer Programming (C++, Java, Perl); Mathematics &
Statistics and Bioinformatics. The second part of the year is spent on research
projects. Staff from all four Faculties at Imperial contribute to the formal teaching
and offer research projects.
In 2003, we awarded 16 degrees including five with distinction. Several students are
progressing to PhD research and many others obtained positions in industry and
academia employing their Bioinformatics skills.
We have 13 students enrolled on the 2003-4 course. The MRC provided one funded
MSc place for 2003-4.
Recently, the BBSRC provided support for five places on the MSc for three annual
intakes (2004 to 2006). In addition, the MRC are funding two places for admission in
2004.
5.2 Wellcome Trust 4 year PhD in Bioinformatics
In January 2003, the Centre for Bioinformatics and the Department of Biological
Sciences were awarded support from the Wellcome Trust to establish a 4 year PhD
programme in Bioinformatics. We recruited five students who started in October
2003. In keeping with the aims of the programme, these students came from a broad
range of undergraduate disciplines – Biological Sciences, Computing, Mathematics
and Statistics. In the first year the students are following the MSc in Bioinformatics.
Towards the end of the first year, students will select PhD research topics offered by
the contributing Departments (Biological Sciences, Chemistry, Computing,
Mathematics, the MRC Clinical Sciences Centre and Primary Care Division of the
Medical School). The students will then join the department of their primary
supervisor. To foster inter-disciplinary training, there will be a second supervisor
from a complementary discipline. The cohort will maintain contact with each other
via common seminar programmes. We have recruited seven students to join the
programme in October 2004.
This programme provides an excellent opportunity for Imperial to attract the best
students to cross disciplines and train in Bioinformatics.
5.3 Computational Bioinformatics
In 2003, the Department of Computing introduced a module “Introduction to
Bioinformatics” that is an option in three courses: the third year undergraduate
degree in Computing, the MSc in Computing (Conversion Course), and the third year
undergraduate degree in Electrical Engineering. The course is run by Drs Yike Guo
and Simon Colton from the Department of Computing.
6. The Bioinformatics Support Service
6.1 Introduction
The mission of the Bioinformatics Support Service (BSS) is to deliver state-of-the-art
software and training to members of the College to assist their research and teaching
in core areas of Bioinformatics. The team consists of Dr Sarah Butcher (Head), Dr
James Abbott (Software Developer), Dr Gail Bartlett (User Support & Training) and
Mr Derek Huntley (Research Assistant). Ms Nadia Anwar left the service in Oct 2003
to take up a PhD position at the University of Glasgow. She was replaced by Dr Gail
Bartlett, who joined the BSS in January 2004.
The offices of the Service moved to custom SRIF-refurbished space on level three of
the Biochemistry building on the South Kensington Campus in November 2003. This
places the service next to the Bioinformatics research groups of Prof Sternberg and
Dr Stumpf. The Service has its own meeting room and can access two rooms with
PC clusters.
6.2 Management
The management arrangement is that the Head of the Support Service reports to the
Steering Committee of the Centre via the Director of the Centre. To assist the
Support Service in achieving its mission an Operations Committee has been
established. The initial membership of the Committee reflects the essential input
required in Computing, user support and Bioinformatics.
Membership of the Bioinformatics Support Service Operations Committee
Member | Affiliation
Professor Michael Sternberg | Director of Centre (Chair)
Dr Sarah Butcher | Head of Support Service
Dr Helen Causton | MRC Micro Array Centre, Clinical Sciences Centre
Dr Steven Newhouse | London e-Science Centre, Dept. of Computing
Mr Arthur Spirling | Information and Communication Technologies
6.3 Hardware, Software and Databases
The main BSS login server remains a Sun V880 (8x750 MHz processors, 32 GB
RAM, 430 GB disk). We are fortunate to benefit from considerable additional shared
compute resources within the London e-Science Centre (LeSC) through Professor
John Darlington. To date, these comprise a Sun 6800 (24x750 MHz processors, 32
GB RAM, 6TB disk, 24 TB tape system) funded through a JREI grant, currently used
as an SRS server and for selected compute-intensive jobs, and a 133-node dual-processor
Intel/Linux cluster (1 or 2 GB RAM per node). The latter is funded as part of a £2
million investment to support applied computational scientists within Imperial, primarily in Bioinformatics, High Energy Physics and Computational Engineering.
Job scheduling between the login server and additional shared resources has been
developed using Sun Grid Engine. Currently, selected large-scale analyses (e.g.
BLAST, Interproscan, HMMSearch with >300 input sequences) are targeted for
scheduling on the Linux cluster. Additional wrappers are under development to
extend the scope of shared resource use, with emphasis on ease of use and
transparency to users.
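As an illustration of the size-based routing described above, the following Python sketch decides whether an analysis runs directly on the login server or is wrapped in a Sun Grid Engine `qsub` submission for the Linux cluster. It is a hypothetical sketch, not the BSS wrappers themselves: only the 300-sequence threshold comes from the text, while the function names, the job name `bss_job` and the queue name `linux.q` are assumptions. The options `-cwd`, `-b y`, `-N` and `-q` are standard `qsub` options.

```python
import shlex

# Threshold stated in the text: large analyses with more than 300 input
# sequences are targeted for scheduling on the shared Linux cluster.
CLUSTER_THRESHOLD = 300


def count_fasta_records(fasta_text: str) -> int:
    """Count sequences in FASTA-formatted text by counting '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def build_submission(program_cmd: list, fasta_text: str,
                     queue: str = "linux.q") -> list:
    """Return the command line to execute for one analysis (hypothetical helper).

    Jobs above the threshold are wrapped in a Sun Grid Engine qsub call;
    smaller jobs are returned unchanged to run on the login server.
    The job name 'bss_job' and queue name 'linux.q' are assumed placeholders.
    """
    if count_fasta_records(fasta_text) > CLUSTER_THRESHOLD:
        # Standard qsub options: -cwd runs the job in the current directory,
        # -b y submits the command as a binary rather than a script,
        # -N sets the job name and -q selects the target queue.
        return ["qsub", "-cwd", "-b", "y", "-N", "bss_job", "-q", queue] + list(program_cmd)
    return list(program_cmd)


if __name__ == "__main__":
    # Legacy NCBI blastall invocation, used purely as an example payload.
    blast = ["blastall", "-p", "blastn", "-d", "embl", "-i", "query.fa"]
    small = ">s1\nACGT\n>s2\nGGCC\n"
    large = "".join(f">s{i}\nACGT\n" for i in range(301))
    print(shlex.join(build_submission(blast, small)))
    # blastall -p blastn -d embl -i query.fa
    print(shlex.join(build_submission(blast, large)))
    # qsub -cwd -b y -N bss_job -q linux.q blastall -p blastn -d embl -i query.fa
```

Keeping the decision in a wrapper, rather than asking users to call `qsub` themselves, reflects the stated emphasis on ease of use and transparency: the same command is issued either way, and only its destination changes.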
The BSS maintains a fully up-to-date comprehensive local set of public biological
databases employing cumulative updates – see Table 1. These are checked for
consistency offline and indexed for BLAST and SRS using a fully-automated system
designed and implemented by Dr Abbott. The Centre recently installed the
commercial BRENDA database of enzyme functional data, now available from the
web-site or directly via SQL queries. BRENDA illustrates how the Centre will install
and maintain additional databases as requested by users.
The Centre supports a wide range of Unix-based bioinformatics software (see Table
2). The majority are freely available packages but commercial packages are used
where they add significant functionality (e.g. SRS – Lion Biosciences). The Centre
recently acted as a beta test site for the SRS8 package.
A large number of packages have additional PISE-generated web interfaces and
have been integrated within SRS for ease of access. In addition, the service has
adapted and hosted new web-based siRNA design software from the Wistar
Institute, as well as building a custom graphical BLAST interface integrated with
SRS.
A ‘wish-list’ for commercial software of potential interest is also available from the
web-site. Users can request additions to the list and if sufficient interest is registered
from other users through the accompanying form, the software will be considered for
purchase and central installation.
Table 1 - Bioinformatics Databases Provided
DNA: EMBL, Genbank, dbEST, REFSEQN, Repbase
Miscellaneous: Enzyme, GO/GOA, Locuslink, OMIM, Rebase, Taxonomy
Sequence-Related: BRENDA, DSSP/FSSP/HSSP, INTERPRO (BLOCKS, PFAM, PRINTS, PROSITE, Prodom), PDB, Unigene, Uniref/Uniseq
Protein: Genpept, REFSEQP, UNIPROT (PIR, Swissprot, TrEMBL)
Table 2 - Major Bioinformatics Software
Multifunction Packages: Emboss (with JEMBOSS), NCBI Toolkit, HMMER, PHYLIP
Codon Use: codonw
Database Searching: ballast, blixem, NCBI blast2, blimps, dbwatcher, fasta, hmmer, Interproscan, MSPcrunch, ssaha, Washington University blast2
Database Text Searching: SRS, entrez
Genome Analysis/Annotation: apollo, artemis, act, firstef, genscan, glimmer, grailEXP, qrna, tricross, vista, wise2
Linkage: genehunter, morgan, QTLReaper, Simwalk2, transmit
Primer/siRNA Design: oligoArray, primer3, siRNA
Phylogenetic Analysis: bonsai, fastdnaml, nifas, njplot, orthostrapper, phylip, protml, rio, tree-puzzle
Protein Structure: aqua, domainer, procheck, rasmol, structer
Repeats: maskeraid, repeatmasker
RNA: qrna, snoscan, trnascan
SNP Discovery: polybayes, polyphred, refcomp, snp_pipe (Oxagen)
Sequence Assembly & Trace Data: phrap, phred, staden package
Sequence Comparisons: avid, clustalw, clustalx, dialign2, dotter, hmmerviewer, jalview, lalnview, seaview, sim4, t-coffee
Sequence Manipulation: phd2fasta, readseq
Figure 1 – Number of Users (registered users per month, November 2002 to May 2004; axis 0–250)
Figure 2 – Users by Faculty (Life Sciences, Medicine, Engineering, Physical Sciences)
Figure 3 – Users by Campus (South Kensington, Hammersmith, St Mary’s, Charing Cross, Others)
6.4 Support & Training
The Service has 230 registered users as of May 2004, and user numbers continue to
grow (see Fig 1). Users split almost 50/50 between the Faculties of Life Sciences and
Medicine, with a small number in the Faculties of Engineering and Physical Sciences
(see Fig 2). Many campuses are represented, with the largest numbers of users from
the South Kensington, Hammersmith and St Mary’s campuses (see Fig 3).
Users are supported via email, phone and one-to-one or group meetings and site
visits. An email queue tracking system facilitates automatic call logging and tracking.
One-to-one and group-based advice and consultation are available on many aspects
of analysis, with the emphasis on training users to perform their own analyses.
Bespoke scripts and interfaces are developed to assist users as necessary,
particularly with bulk processing tasks. Where appropriate, these may be adapted
and made available for more generalised use by other College researchers, or
published for outside dissemination.
The BSS also provides help for researchers writing grant proposals. This help can
range from advice on appropriate data analysis methods and references to include,
through to active participation as co-applicants, with provision of part- or full-time
posts for bioinformaticians to carry out specific analyses and/or develop new
databases, scripts and software. The resources needed for the Bioinformatics
analyses that a particular aim requires are often difficult to estimate. The BSS can
cost the necessary staff time and identify the other resources required (e.g.
additional disk storage). The Service also undertakes pilot work for grant proposals,
e.g. exploratory analyses that provide preliminary results to show proof-of-concept
and strengthen cases. The Service also produces statements outlining the expertise
of the BSS, which can be included in resource justifications to show that grants
using BSS resources have made provision for optimal data analysis.
6.4 Research Projects
The Service has already been engaged in a number of large projects with users
where significant new scripts, programs and/or user interfaces have been produced
for specified functions. These have enabled the researchers to process large and/or
problematic datasets, which would otherwise have proven difficult to handle with a
more piecemeal approach. A few of these are outlined below, and in several cases,
such work has led to substantial further collaborations:
•
The development of AriadneDB, a program for automated EST clustering and
filtering, with a Java interface for phylogeny.
•
The development of an automated system for investigating LINE repeat structure
and distribution within selected mouse chromosomes, including a Java interface
for viewing the results that can be zoomed from chromosome to sequence level.
•
The writing of scripts for bulk pattern matching within SwissProt and filtering of
results into manageable groupings based on SwissProt annotations and GO
terms.
•
The development of extensive scripts for reformatting multiple large EST datasets
and BLAST results from a non-standard format to ‘extended’ EMBL format. In
addition, custom-built SRS parsers were written so that the resulting data are
available as SRS indices for easy user interrogation via the command line and the SRS
web interface.
•
The re-annotation of sections of the Anopheles and Drosophila genomes, with
emphasis on putative alternative splice sites, and storage of the results in EMBL
format for viewing by multiple users in the Artemis graphical viewer.
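To illustrate the kind of bulk pattern matching described in the SwissProt project, here is a minimal sketch rather than the scripts the Service actually wrote. It translates a simple PROSITE-style pattern into a Python regular expression and groups matching entries by annotation label. It makes two assumptions. First, records are held in memory as hypothetical (accession, sequence, labels) tuples instead of being parsed from the SwissProt flat file. Second, the labels stand in for SwissProt keywords or GO terms. The names `prosite_to_regex` and `group_hits` are illustrative only.

```python
import re
from collections import defaultdict

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Handles single residues, x (any residue), [..] (allowed residues),
    {..} (forbidden residues), repeat counts such as (3) or (2,4), and
    < / > terminal anchors. Less common syntax (e.g. '>' inside brackets)
    is deliberately not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        # Single residues and [..] classes are already valid regex syntax.
        count = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core + count + suffix)
    return "".join(parts)

def group_hits(records, pattern):
    """Group accessions whose sequence matches `pattern` by annotation label.

    `records` is a hypothetical in-memory form of the data:
    (accession, sequence, labels) tuples.
    """
    regex = re.compile(prosite_to_regex(pattern))
    groups = defaultdict(list)
    for accession, sequence, labels in records:
        if regex.search(sequence):
            # Entries with no labels are collected under a placeholder group.
            for label in labels or ["(unannotated)"]:
                groups[label].append(accession)
    return dict(groups)

if __name__ == "__main__":
    toy_records = [
        ("P1", "MNGTAK", ["Glycoprotein"]),
        ("P2", "MNPTAK", ["Kinase"]),
        ("P3", "AANVSK", []),
    ]
    # N-{P}-[ST]-{P} is the commonly cited N-glycosylation site pattern.
    print(prosite_to_regex("N-{P}-[ST]-{P}"))  # N[^P][ST][^P]
    print(group_hits(toy_records, "N-{P}-[ST]-{P}"))
    # {'Glycoprotein': ['P1'], '(unannotated)': ['P3']}
```

Grouping by label rather than listing raw hits reflects the aim stated above of filtering results "into manageable groupings".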
The Centre’s web site (www.codon.bioinformatics.ic.ac.uk - internal access only) has
continued to develop. We now offer direct access to a large number of software
packages as well as help pages covering a wide range of practical information on
common tasks, locally held software, databases and good practice. We are also
developing a set of self-contained downloadable tutorial exercises e.g. ‘Introduction
to using Unix for Bioinformatics’.
6.5 Grants
Dr Butcher recently led a successful grant application to the BBSRC Bioinformatics
and E-Science Programme (Butcher, Sternberg, Newhouse, Darlington, Causton &
Aitman - A distributed system for E-support of microarray data analysis and
management). This 3-year grant (£600K) provides three new postdoc positions and
will enable the BSS, together with the MRC Microarray Centre at the Hammersmith
Hospital and the London e-Science Centre, to develop new methods for supporting
microarray data management and analysis within Imperial. This will underpin the
extension of the BSS towards actively supporting microarray analysis over the
coming year.
6.6 Courses
The Centre has started to run a programme of modular taught courses. A half-day
introductory course is now available and has already been delivered three times. This
course is free to registered users and is expected to run periodically on any campus
where suitable computer teaching facilities are present. It has also been
complemented by practical software demonstrations at Wye College and the
Kennedy Institute.
A number of other practical half-day courses are in preparation and will commence
later this year. Titles shortly to be released include: ‘Biological databases and getting
the most from database interrogation’, ‘Sequence alignments and database
searching’, ‘Multiple sequence alignments – methods and uses’. It is envisaged that
these courses will incur a small fee.
Dr Butcher has also given a number of lectures on the facilities of the BSS, and their
use within the College. These include lectures as part of core introductory courses for
postgraduates and for undergraduates.
6.7 Financial Arrangements
The Support Service was established with major financial support initially from the
Pro-Rectors Reserve and subsequently from the Faculties of Life Sciences and of
Medicine. Given the realities of university finances, the Service cannot continue
to rely on major funding from the Faculties. Research and teaching grants that have a
Bioinformatics component are required to include an access fee for use of the
Support Service. The access charge has initially been set at £1,500 per annum per
postdoc per new grant, and will include support of associated PhD students at no
extra charge. The College has facilitated administration of this access charge by
inclusion of a specific check-box on internal grant processing forms.
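The charging rule above reduces to simple arithmetic: total charge = £1,500 × number of postdocs × number of years, and associated PhD students add nothing. The following is a minimal sketch of that rule. The report does not say how part-years are treated, so the sketch assumes the charge accrues for each whole year of the grant. The function name `access_charge` is hypothetical.

```python
# Hypothetical helper illustrating the access-charge rule described above.
# Assumption: the charge accrues for each whole year of the grant.
ANNUAL_FEE_PER_POSTDOC_GBP = 1500

def access_charge(postdocs: int, years: int, phd_students: int = 0) -> int:
    """Return the total BSS access charge (in GBP) for one new grant."""
    if min(postdocs, years, phd_students) < 0:
        raise ValueError("counts must be non-negative")
    # Associated PhD students are supported at no extra charge,
    # so phd_students is accepted but does not change the total.
    return ANNUAL_FEE_PER_POSTDOC_GBP * postdocs * years

if __name__ == "__main__":
    # A 3-year grant funding two postdocs and one PhD student.
    print(access_charge(postdocs=2, years=3, phd_students=1))  # 9000
```

The PhD-student count is kept as an explicit argument so that the no-extra-charge rule is visible wherever the function is called.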
This access fee should be considered as funding a small fraction of skilled support
that maintains software and databases and additionally provides expert assistance
and advice. If individual groups were to undertake their own Bioinformatics support
in-house, this would be far more expensive in terms of staff time and the resultant
service would almost always be of far lower quality. Thus the access fee is exceptionally
good value for money in a research grant. In addition to a core level of support for
many researchers, certain projects will require extensive Bioinformatics support. If
the research group wishes the Service to provide such support, then the group would
need to finance the appropriate level of staff time and computational resources from
the Service.
We consider these mechanisms to be the most effective strategy to finance the
Service. The alternative of charging each user has been shown in many
organisations, both in the UK and abroad, to be exceptionally problematic to
administer.
A well-financed Bioinformatics Support Service will empower a large number of users
in the College to perform first-class Bioinformatics in their projects, substantially
enhancing the quality of their research.
7. The London Bioinformatics Forum
7.1 Mission of the London Bioinformatics Forum
During 2003, the Centre for Bioinformatics at Imperial, working with other
bioinformaticians in London, established the London Bioinformatics Forum (LBF). The
mission of the LBF is to promote discussion amongst London-based groups in all
areas of Bioinformatics including research, teaching, support and training with a view
to encouraging collaboration.
7.2 Objectives
The objectives of the London Bioinformatics Forum are:
•
To exchange information about research activities, teaching, support and training
in Bioinformatics.
•
To organise a Bioinformatics seminar series primarily with contributions from
London-based researchers.
•
To identify areas for inter-institutional collaborations in both Bioinformatics and
other disciplines.
•
To identify funding opportunities that can be pursued by members of the Forum
via their home institutions.
•
To highlight to the UK and international communities the strengths in
Bioinformatics in London by mechanisms such as a common web site and
scientific meetings.
•
To facilitate public engagement with science with respect to Bioinformatics.
7.3 Management
The chair of the LBF is Prof Michael Sternberg (Imperial) and the deputy chair is Prof
David Jones from UCL. The Steering Committee of 30 has representatives from 17
London organisations. Further details can be found at the web site
www.londonbioinformatics.org.
7.4 Activities
In November 2003, the LBF held its Opening at Imperial with speakers
from several London organisations (see Section 8). We intend that future events will
include an opportunity for graduate students to present their work. The web site
provides links to both the research activities and major training programmes within
London. The LBF also has mailing lists for news announcements and bioinformatics
discussion (lbf-announce@imperial.ac.uk and lbf-discuss@imperial.ac.uk),
maintained by staff of the Centre for Bioinformatics.
8. Seminar Programme
The Centre runs a seminar programme. Each seminar is followed by refreshments
and informal networking. The audience comes from many of the Departments
and Campuses at Imperial and from other London groups. During 2003 we also held
scientific presentations as part of the Inaugural Open Day of our Centre and for the
Opening of the London Bioinformatics Forum. The 2003 programme consisted of the
following seminars.
Monday 3 March
Dr Nigel Saunders
Pathology, University of Oxford
‘Functional genomics and bacterial pathogenesis’
Dr Adrian Cootes
Biological Sciences, Imperial College London
‘The automatic discovery of structural principles describing protein fold space’
Tuesday 18 March - Inaugural Open Day of the Centre for Bioinformatics
Professor Janet Thornton FRS
Director of the European Bioinformatics Institute, Hinxton
‘The evolution of protein function from a structural perspective’
Professor Lon Cardon
Wellcome Trust Centre for Human Genetics, Oxford
‘Use of the human haplotype map in complex disease association studies’
Professor Carole Goble
Dept of Computing, University of Manchester
‘Ontologies and BioGrid services: prospects and pitfalls’
Dr Peer Bork
EMBL, Heidelberg
‘Function, prediction and protein networks’
Monday 24 March
Professor Luis Montero
University of Havana
‘Modeling molecules and biomolecules: basic principles and drug engineering’
Dr Jordi Villa i Freixa
Structural bioinformatics laboratory (GRIB), IMIM/Universitat Pompeu Fabra
‘Seeking for realistic energy profiles in ion channels simulations’
Friday 11 April
Professor Nikolay Kolchanov
Institute of Cytology and Genetics, Novosibirsk, Russia
‘TRRD: Database of transcription regulatory regions - implications for analysis of
expression data’
Monday 28 April
Professor David Hand
Department of Mathematics, Imperial College London
‘Statistical pattern detection in genomics and proteomics’
Dr Hilary Booth
Centre for Bioinformation Science (CbiS), Australian National University, Australia
‘Normalization of sequence alignment scores’
Monday 2 June
Dr Helen Causton
Microarray Centre, Faculty of Medicine, Imperial College London
‘Low level analysis of Affymetrix gene expression data’
Professor Sylvia Richardson
Department of Epidemiology and Public Health, Imperial College London
‘Bayesian hierarchical models for gene expression data’
Monday 7 July
Professor Mark Sansom
Laboratory of Molecular Biophysics, University of Oxford
‘Membrane proteins: structural dynamics via simulations’
Monday 20 October
Dr Sarah Teichmann
Structural Studies Division, MRC Laboratory of Molecular Biology, Cambridge
‘Gene regulatory network growth by duplication’
Mr Jonathan Swire
Department of Biological Sciences, Imperial College London
‘Gradients in amino acid composition within the yeast genome as a response to
selection on cost’
Wednesday 26 November - Opening of the London Bioinformatics Forum
Professor David Jones
Department of Computer Science, University College London
‘Predicting old and new folds for genome sequences’
Professor Stephen Muggleton
Department of Computing, Imperial College London
‘Machine learning for bioinformatics’
Professor Richard Goldstein
NIMR, Mill Hill
‘Evolutionary studies of G-protein coupled receptors’
Dr Lorenz Wernisch
School of Crystallography, Birkbeck College
‘Graphical models for interpreting microarray experiments’
9. Achievements and Plans
9.1 Achievements
The major achievements of the Centre for Bioinformatics over the period of this
report (1st February 2003 - 31st May 2004) are:
•
The provision across the College of a Bioinformatics Support Service with 230
registered users as of May 2004.
•
The successful role of the Bioinformatics Support Service in obtaining a £600K
BBSRC grant for the application of E-science to provide support for microarray
analysis.
•
The development of collaborative research projects between the Support Service
and several research groups in the College.
•
The expansion of the Centre with the addition of 11 new Affiliates.
•
The publication by our Affiliates of more than 50 refereed papers in
Bioinformatics during 2003.
•
The award during 2003 of more than £2 million of grant support for research and
training in Bioinformatics.
•
The co-ordination of postgraduate teaching of Bioinformatics across the College.
•
The running of a seminar series that attracts an audience from the College and
other organisations in the London area.
•
The establishment with colleagues from other London groups of the London
Bioinformatics Forum.
9.2 Plans
Our key objectives for the next year are:
•
To continue with the development of the Bioinformatics Support Service in terms
of the number of users assisted, the range of software that is available, the
breadth of advice and training provided, and the number of collaborative research
projects undertaken.
•
To extend the Bioinformatics research community within Imperial.
•
To stimulate the establishment of new research projects in Bioinformatics within
the College, particularly those which are multi-disciplinary.
•
To facilitate the inclusion of Bioinformatics within undergraduate and
postgraduate courses in all Faculties.
•
To develop further links with colleagues outside Imperial both nationally and
internationally.
Appendix 1 - Selected Publications
Below we list a selection of publications in 2003 by the Affiliates of the Centre. We
focus on those with a Bioinformatics component and do not cite publications by our
Affiliates in areas outside Bioinformatics.
Computer Science, Mathematics and Statistics
Balding, D.J. (2003). Likelihood-based inference for genetic correlation coefficients.
Theoretical Population Biology, 63, 221-230.
Byng, M.C., Fisher, S.A., Lewis, C.M. & Whittaker, J.C. (2003). Variance components linkage
analysis for adjusted systolic blood pressure in the Framingham Heart Study. BMC Genetics
4(Suppl 1), S4.
Byng, M.C., Whittaker, J.C., Cuthbert, A.P., Mathew, C.G. & Lewis, C.M. (2003). SNP
subset selection for genetic association studies. Annals of Human Genetics, 67, 543-556.
Callard, R.E., Yates, A. & Stark, J. (2003). Fratricide: A Mechanism for T Memory Cell
Homeostasis. Trends in Immunology, 24, 370-375.
Chan, C.C.W., George, A.J.T. & Stark, J. (2003). T Cell Sensitivity and Specificity - Kinetic
Proofreading Revisited. Discrete and Continuous Dyn. Sys. B, 3, 343-360.
Clifford, R. & Sergot, M.J. (2003). Distributed and paged suffix trees for large genetic
databases. In ‘Proc. 14th Annual Symposium on Combinatorial Pattern Matching
(CPM'03), Morelia, Mexico, June 2003’, R. Baeza-Yates, E. Chávez & M. Crochemore
(eds), LNCS 2676, 70-82. Springer-Verlag.
Clifford, R. & Sergot, M.J. (2003). Distributed suffix trees and their application to large-scale
genomic analysis. In ‘Proc. International Conference on Computational Methods in Sciences
and Engineering (ICCMSE'03), Kastoria, Greece, September 2003'.
Colton, S. & Muggleton, S.H. (2003). ILP for mathematical discovery. In ‘Proceedings of
the 13th International Conference on Inductive Logic Programming’. 93-111. Springer-Verlag.
Denham, M.C. & Whittaker, J.C. (2003). A Bayesian approach to disease gene location using
allelic association. Biostatistics, 4, 399-409.
Excoffier, L., Laval, G. & Balding, D.J. (2003). Gametic phase estimation over large genomic
regions using an adaptive window approach. Human Genomics, 1, 7-19.
Balding, D.J., Bishop, M. & Cannings, C. (2003). Editors of ‘Handbook of Statistical
Genetics, 2nd edition’. Wiley.
Green, P.J., Hjort, N.L., Richardson, N. & Richardson, S. (2003). Editors of ‘Highly
Structured Stochastic Systems’. Oxford University Press.
Morris, A., Whittaker, J., Xu, C-F., Hosking, L. & Balding, D.J. (2003). Multipoint LD mapping
narrows location interval and identifies mutation heterogeneity. Proc. Natl. Acad. Sci. USA.
100, 13442–13446.
Muggleton, S.H., Tamaddoni-Nezhad, A. & Watanabe, H. (2003). Induction of enzyme
classes from biological databases. In ‘Proceedings of the 13th International Conference on
Inductive Logic Programming’, 269-280. Springer-Verlag.
Phillips, M.S., Lawrence, R., Sachidanandam, R., Morris, A.P., Balding, D.J., Cardon, L.R. &
29 authors, (2003). Chromosome-wide distribution of haplotype blocks and the role of
recombination hotspots. Nature Genetics, 33, 382-387.
Puech, A. & Muggleton, S.H. (2003). A comparison of stochastic logic programs and
Bayesian logic programs. In ‘IJCAI-03 Workshop on Learning Statistical Models from
Relational Data’. IJCAI.
Sibly, R.M., Meade, A., Boxall, N., Wilkinson, M., Corne, D.W. & Whittaker, J.C. (2003). The
structure of interrupted human AC microsatellites. Mol. Biol. Evol. 20, 453-459.
Stark, J., Brewer, D., Barenco, M., Tomescu, D., Callard, R. & Hubank, M. (2003).
Reconstructing Gene Networks: What Are the Limits? Biochemical Society Transactions, 31,
1519–1525.
Stark, J. & Hardy, K. (2003). Chaos: Useful at Last. Science, 301, 1192-1193.
Stark, J., Callard, R. & Hubank, M. (2003). From the Top Down: Towards a Predictive
Biology of Gene Networks. Trends in Biotechnology, 21, 290-293.
Sternberg, M.J.E. & Muggleton, S.H. (2003). Structure activity relationships (SAR) and
pharmacophore discovery using inductive logic programming (ILP). QSAR and Combinatorial
Science, 22, 527-532.
Whittaker, J.C., Harbord, R.M., Boxall, N., Mackay, I., Dawson, G. & Sibly, R.M. (2003).
Likelihood-based estimation of microsatellite mutation rates. Genetics, 164, 781-787.
Huntley, D., Hummerich, H., Smedley, D., Kittivoravitkul, S., McCarthy, M., Little, P.F.R. &
Sergot, M.J. (2003). GANESH: Software for customised annotation of genome regions.
Genome Research, 13, 2195-2202.
Whittaker, J.C., Gharani, N., Hindmarsh, P. & McCarthy, M.I. (2003). Estimation and testing
of parent of origin effects for quantitative traits. Am. J. Hum. Genet. 72, 1035-1039.
Wilson, J., Weale, M.E. & Balding, D.J. (2003). Inferences from DNA data: population
histories, evolutionary processes, and forensic match probabilities. Journal of the Royal
Statistical Society A. 166(2), 155-187.
DNA and Protein Sequence Analysis (including Phylogenetics)
Bininda-Emonds, O.R.P., Jones, K.E., Price, S.A., Grenyer, R., Cardillo, M., Habib, M.,
Purvis, A. & Gittleman, J.L. (2003). Supertrees are a necessary not-so-evil: a response to
Gatesy et al. Systematic Biology, 52, 724-729.
Gifford, R. & Tristem, M. (2003). The evolution, distribution and diversity of endogenous
retroviruses. Virus Genes, 26, 291-315.
Grenyer, R. & Purvis, A. (2003). A composite species-level phylogeny of the 'Insectivora'
(Mammalia, Order Lipotyphla Haeckel 1866). Journal of Zoology (London), 260, 245-257.
Isaac, N.J.B., Agapow, P.-M., Harvey, P.H. & Purvis, A. (2003). Phylogenetically nested
comparisons for testing correlates of species-richness: a simulation study of continuous
variables. Evolution, 57, 18-26.
Kambol, R., Kabat, P. & Tristem, M. (2003). Complete nucleotide sequence of an
endogenous retrovirus from the amphibian, Xenopus laevis. Virology, 311, 1-6.
Lynch, C. & Tristem, M. (2003). A co-opted gypsy-type LTR-retrotransposon is conserved in
the genomes of humans, sheep, mice and rats. Current Biology, 13, 1518-1523.
Mace, G.M., Gittleman, J.L. & Purvis, A. (2003). Preserving the Tree of Life. Science, 300,
1707-1709.
Genetics and Genomics
Capelli, C., Redhead, N., Abernethy, J.K., Gratrix, F., Wilson, J.F., Moen, T., Hervig, T.,
Richards, M., Stumpf, M.P.H., Underhill, P.A., Bradshaw, P., Shaha, A., Thomas, M.G.,
Bradman, N. & Goldstein, D.B. (2003). A Y-chromosome census of the British Isles. Current
Biology, 13, 979-984.
Ferguson N.M. & Donnelly C.A. (2003). Assessment of the risk posed by bovine spongiform
encephalopathy in cattle in Great Britain and the impact of potential changes to current
control measures. Proc. R. Soc. Lond. B. Biol. Sci. 270, 1579-1584.
Ferguson N.M., Keeling M.J., Edmunds W.J., Gani R., Grenfell B.T., Anderson R.M. & Leach
S. (2003). Planning for smallpox outbreaks. Nature, 425, 681-685.
Griffin, J.L., Bonney S.A., Mann, C., Hebbachi, A.M., Gibbons, G.F., Nicholson, J.K.,
Shoulders, C.C. & Scott, J. (2003). An Integrated Reverse Functional Genomic and
Metabolic Approach to Understanding Orotic Acid Induced Fatty Liver. Physiological
Genomics.
Hagenaars T.J., Donnelly C.A., Ferguson N.M. & Anderson R.M. (2003). Dynamics of a
scrapie outbreak in a flock of Romanov sheep: estimation of transmission parameters.
Epidemiol. Infect. 131, 1015-1022.
Jones, B., Jones, E.L., Bonney, S.A., Patel, H.N., Mensenkamp, A.R., Rudling, M., Myrdal,
U., Annesi, G., Naik, S., Meadows, N., Quattrone, A., Naoumova, R.P., Angelin, B.,
Infante, R., Levy, E., Roy, C.C., Freemont, P.S., Scott, J. & Shoulders, C.C. (2003). Lipid
Absorption Disorders of the Intestine Caused by Mutations of a Sar1 GTPase. Nature
Genetics, 34, 29-31.
Mead, S., Stumpf, M.P.H., Whitfield, J., Beck, J.A., Poulter, M., Campbell, T., Uphill, J.B.,
Goldstein, D.B., Alpers, M., Fisher, E.M. & Collinge, J. (2003). Balancing selection at the
prion protein gene consistent with prehistoric kuru-like epidemics. Science, 300, 640-643.
Naoumova, R.P., Bonney, S.A., Eichenbaum-Voline, S., Patel, H.N., Jones, B., Jones, Joanna
E.L., Amey, J., Colilla, S., Neuwirth, C.K.Y., Seed, M., Betteridge, D.J., Galton, D.J.,
Cox, N.J., Bell, G.I., Scott, J. & Shoulders, C.C. (2003). Confirmed Locus on Chromosome
11p and Candidate Loci on 6q and 8p for the Triglyceride and Cholesterol Traits of Combined
Hyperlipidemia. Arterioscler. Thromb. Vasc. Biol. 23, 2070-2077.
Redmond, S., Vadivelu, J. & Field, M.C. (2003). RNAit: an automated web-based tool for the
selection of RNAi targets in Trypanosoma brucei. Molecular and Biochemical Parasitology,
128, 115-118.
Riley S., Donnelly C.A. & Ferguson N.M. (2003). Robust parameter estimation techniques for
stochastic within-host macroparasite models. J. Theor. Biol. 225, 419-430.
Stumpf, M.P.H. & McVean, G.A.T. (2003). Estimating recombination rates from population-genetic data. Nat. Rev. Genet. 4, 959-968.
Stumpf, M.P.H. & Goldstein, D.B. (2003). Demography, recombination hotspot intensity, and
the block structure of linkage disequilibrium. Current Biology, 13, 1-8.
Wiuf, C., Laidlaw, Z. & Stumpf, M.P.H. (2003). Some notes on the combinatorial properties of
haplotype tagging. Math. Biosci. 185, 205-216.
High-throughput 'Omics Methodologies
Causton, H.C., Quackenbush, J. & Brazma, A. (2003). Microarray Gene Expression Data
Analysis - A Beginner's Guide. 1st edn. Blackwell Publishing.
Kemp, T.J., Causton, H.C. & Clerk, A. (2003). Changes in gene expression induced by H2O2
in cardiac myocytes. Biochem. Biophys. Res. Commun. 307, 416-421.
Shen, W.C., Bhaumik, S.R., Causton, H.C., Simon, I., Zhu, X., Jennings, E.G., Wang, T.H.,
Young, R.A. & Green, M.R. (2003). Systematic analysis of essential yeast TAFs in genome-wide transcription and preinitiation complex. EMBO J. 22, 3395-3402.
Macromolecular Structures
Cootes, A.P., Muggleton, S.H. & Sternberg, M.J.E. (2003). The automatic discovery of
structural principles describing protein fold space. J. Mol. Biol. 330, 839-850.
Janin, J., Henrick, K., Moult, J., Ten Eyck, L., Sternberg, M.J.E., Vajda, S., Vakser, I. & Wodak,
S. J. (2003). CAPRI: A Critical Assessment of Predicted Interactions. Proteins 52, 2-9.
Smith, G.R. & Sternberg, M.J.E. (2003). Evaluation of the 3D-Dock protein docking suite in
rounds 1 and 2 of the CAPRI blind trial. Proteins 52, 74-79.
Physical and Chemical Methods
Gkoutos, G.V., Rzepa, H.S. & Murray-Rust, P. (2003). Online Validation and Comparison of
Molfile and CML Molecular Atom-Connection Descriptors. Internet J. Chem. article 1.
Gkoutos, G. V., Rzepa, H. S., Clark, R. M., Adjei, O. & Johal, H. (2003). Chemical Machine
Vision: Automated extraction of chemical meta-data from raster images. J. Chem. Inf. Comp.
Sci., 43, issue 5.
Murray-Rust, P. & Rzepa, H. S. (2003). Chemical Markup, XML and the Worldwide Web.
Part 4. CML Schema. J. Chem. Inf. Comp. Sci. 43, issue 4.
Murray-Rust, P. & Rzepa, H. S. (2003). Towards the Chemical Semantic Web. An
introduction to RSS. Internet J. Chem. 6, article 4.
Murray-Rust, P. & Rzepa, H. S. (2003). XML for scientific publishing. OCLC Systems and
Services. 19, 162-169.
Murray-Rust, P. & Rzepa, H. S. (2003). In 'Handbook of Chemoinformatics. Part 2.
Advanced Topics, ed. J. Gasteiger & T. Engel', Vol 1.