ELIXIR-UK structures report

Structural bioinformatics training for ELIXIR-UK
Workshop report
Date: 17–18 February 2014
Venue: EMBL-European Bioinformatics Institute, Hinxton, UK
Background
ELIXIR (www.elixir-europe.org), Europe's infrastructure for biological data, was launched on
18 December 2013. It is a distributed infrastructure with a hub at the EMBL-European
Bioinformatics Institute and national nodes in its member states. The UK node will specialise
in training users and operators of ELIXIR's infrastructure. The BBSRC has awarded funds to
the ELIXIR-UK node to consolidate training materials in structural bioinformatics. Structural
bioinformatics is one of ten ‘sectors’ identified by the UK node (Figure 1) as being a
particular strength in the UK, and for which there are limited training resources available. By
consolidating training resources that already exist, mapping them to appropriate target
audiences, and providing access to them through a single portal, we will not only be able to
optimise use of these training resources by the research community, but also identify gaps in
training provision. Our long-term goal (beyond the scope of the initial project) is to find a
sustainable mechanism for filling these gaps, so that the research community makes the most
of Europe’s wealth of publicly available structural bioinformatics resources.
Figure 1: Target audiences identified by ELIXIR-UK, the ten sectors identified as needing consolidation of training
materials, and their sector leads. [needs editing to include CB as co-lead of structural bx].
Planning
Planning for the workshop began in summer 2013, when a small taskforce was convened to
discuss training needs in the sector. The taskforce comprised:
Cath Brooksbank, EMBL-EBI
Christine Orengo, UCL
Gary Battle, EMBL-EBI
Tom Blundell, U. Cambridge
Charlotte Deane, U. Oxford
Brian Marsden, U. Oxford
Sarah Morgan, EMBL-EBI
Richard Grandison, EMBL-EBI
The taskforce identified six target audiences needing varying levels of structural
bioinformatics expertise, and drafted a list of competencies required by each target audience
(see appendix). The taskforce also planned the workshop reported in this document.
Aims of the workshop:
- To reach agreement on the training needs of our target audiences, which span
clinical researchers, bench-based molecular life scientists, structural biologists,
structural bioinformaticians, computational chemists and medicinal chemists;
- To identify existing sources of training material, organise it into themes, identify gaps
and plan how we fill those gaps.
Before the workshop:
A survey was sent out to workshop invitees and other experts who covered the range of
target audiences that had been identified. The aim of the survey was to gather information
on people’s experiences of, and thoughts on, structural bioinformatics training. This included
what people thought were the most crucial structural bioinformatics competencies for each of
our target audiences. The survey was sent out to around 100 people and we have received
24 responses to date.
The survey can be found here.
Structure of the workshop
The workshop, lasting 1.5 days, was organised into four sessions, combining presentations,
themed breakout groups and discussions in plenary. Session 1 set the scene, ensuring that
all participants were familiar with ELIXIR’s purpose and training strategy. Session 2 explored
the needs of our target audiences, using outputs from our training needs survey. Session 3
considered approaches to training, with an emphasis on e-learning, and identified priority
areas for the consolidation and future development of training materials. Finally, session 4
focused on developing an action plan for the future, including our goals for the remainder of
this project and our longer-term plans for improving provision of structural bioinformatics
training.
A full list of delegates can be found in the appendix.
Presentations:
- ELIXIR, ELIXIR-UK and the aims of the training node (Cath Brooksbank, EMBL-EBI,
on behalf of Charlotte Deane)
- Introduction to the ELIXIR-UK structural bioinformatics project (Christine Orengo
[UCL] and Cath Brooksbank [EMBL-EBI])
- Structural biology distance learning at Birkbeck (Nick Keep, Birkbeck College)
- Industry needs (Friedrich Rippmann, Merck Serono)
- The SysMIC project (Adrian Shepherd, Birkbeck College)
The breakout sessions were set up as follows:
Breakout 1: Target communities
Chaired by Geoff Barton, Charlotte Deane and Cath Brooksbank
Purpose: To reach agreement on who our target audiences are and what their highest
priority training needs are in structural bioinformatics.
Delegates were divided into three groups and were assigned two target audiences to focus
on. These audiences were:
Structural biologists and bioinformaticians
Medicinal and computational chemists
Clinical researchers and molecular life scientists
Starting information: Each group was provided with a list of identified competencies for each
target audience and the survey results showing which competencies survey responders
considered the most important for each audience.
Each group was asked to address the following questions:
- Are these our target communities?
- For your given two communities, what are the most important competencies?
- What's the rationale for your ranking?
- What is the most important question your given communities need to answer using a
protein structure?
- Are there common needs across several communities?
Breakout 2: The gaps in structural bioinformatics training and how to fill them
Chaired by Mike Sternberg, John Overington and Adrian Shepherd
Purpose: To identify mechanisms for delivering training to the target audiences agreed upon
in breakout 1. Although not a specified goal, the discussants also began to think about
thematic areas into which indicative content might be organised in the future.
Delegates were again divided into three groups and were assigned two target audiences to
focus on. These audiences were grouped as in Breakout 1.
Starting information: Groups were provided with reports from breakout session 1 and
comments from the survey results on availability of training material.
Each group was asked to address the following questions:
- What is the appropriate engagement route? Provide rationale / justification
- What is / are the appropriate delivery mechanism(s)? Provide rationale / justification
Breakout 3: Opportunities going forward
Purpose: To agree on both our short-term goals (in terms of consolidating existing training
material) and our long-term goals.
Delegates were divided into two groups. Group 1 had a training focus and looked at the
short-term goals of the project. Group 2 had a strategy focus and looked at the long-term
goals of the project.
Starting information: Each group was provided with reports/comments from the previous
two breakout sessions.
Group 1 – short term goals (training)
Chaired by Sarah Morgan
Questions to address:


How can we build on the materials we have available?
Where and how do we focus effort?
Group 2 – long term goals (strategy)
Chaired by Christine Orengo
Question to address:

What should be the strategy for ensuring long term progress of training in structural
biology; gaining funding etc?
Common principles of training materials
A brief discussion session was held to identify common principles that ELIXIR-UK could
champion when working with providers of training materials.

Delegates were asked to provide three common principles that all training material
developed and/or collated as part of this project should share. These principles were
then subsequently organised into themes.
Outcomes
Breakout 1:
The most important competency, which was unanimously identified across all target
audiences, was being able to understand what can, and what cannot, be inferred from a
structure.
A more detailed overview from each of the three groups can be found in the appendix.
Breakout 2
We concluded that our six target audiences can be placed into two big silos, and that
categorising training materials into these two silos would be a good first step:
Silo 1 (generalists): need to be convinced what structure can do for them; need an
introduction to what can and cannot be inferred from a structure, and to learn the ‘language’
that will enable them to communicate effectively with silo 2. This silo contains clinical
researchers, molecular life scientists and medicinal chemists.
Silo 2 (specialists): have a firm understanding of structural bioinformatics principles but need
to be kept up to speed with the latest developments. This silo contains structural
bioinformaticians, computational chemists (often both specialties reside in one individual,
especially in industry) and structural biologists.
A more detailed overview from each of the three groups can be found in the appendix.
Breakout 3
Conclusions of this breakout were built into an action plan (see later). We also agreed on a
set of thematic areas and nominated a “Tsar” for each of these:
- Principles of structure (qualitative issues): Instruct / PDBe (Sameer Velankar)
- Structure prediction / annotation: Genome 3D (Mike Sternberg and David Jones)
- Protein–protein complexes (Franca Fraternali)
- Protein–ligand interactions (John Overington)
- Structure-based sequence analysis (Geoff Barton)
- Structure comparison: SCOP / CATH (Christine Orengo)
- Structure to function (Mark Wass)
- Web-based learning (Adrian Shepherd)
- Molecular dynamics (Phil Biggin?)
A more detailed overview from the two groups can be found in the appendix.
Common principles of training materials
The most important common principles of training materials centred around five generic
themes: audiences, topics, learning methods, delivery mechanisms and recognition.
A more detailed overview of the most common principles of training materials can be found
in the appendix.
Action plan
At the end of the workshop, the following steps were identified as the highest priorities to
progress this project.
1) Source web developer expertise asap (Charlotte Deane)
2) Tsars to identify other experts elsewhere in Europe in the different thematic areas.
These experts must have something to bring to the table.
3) Meeting of Tsars to finalise approach and populate website (Christine Orengo to
organise in London)
- Draft 2-page document outlining long-term plan
- Draft letter to experts
4) Gather volunteers for user-experience testing of the website design
Appendix
Breakout 1:
Medicinal chemists and computational chemists
Computational chemists
Medicinal chemists
Rate the competencies
1) Understand what can, and what
cannot, be inferred from a structure
2) Interpret 3D structural data both in
terms of the binding-site interactions
and the ligand conformation
3) Bring together ligand screening and
target-based screening - mapping
chemical libraries onto target-based
screening operations
Justification of ranking



They need to understand the structure
(traffic lights according to the B-factor), healthy scepticism – discover
more options of visualisation software
Then they can worry about
interpretation, understanding
These two points will trigger the
investigation of last three competencies
What is the most important question they
need to answer using a protein structure?

How can I use structural biology to
maximise the success rate of the series
of compounds I have
Rate the competencies
1) Understand what can, and what
cannot, be inferred from a structure
2) Perform detailed validation and
assessment of protein-ligand structures
3) Prepare protein model(s) for docking
Justification of ranking




They think they know the answer – so
make sure they do
Prepare structure – quality of structure,
pitfalls of homology modelling
Build a good model (black box tools are
dangerous)
Expertise for training does not lie with
us
What is the most important question they
need to answer using a protein structure?

How to provide scripts they can use
immediately?
(they are driven by “toys and tricks”)
i.e. they want a visualisation of a
docked molecule and that very quickly
Clinical researchers and molecular life scientists
Molecular life scientists
Clinical researchers
Rate the competencies
1 = Understand what can, and what
cannot, be inferred from a structure
1 = Visualise SNPs on structures and
understand the functional
consequences
2) Comparative analysis of structures
Justification of ranking



Need to be aware that not every clinical
researcher is going to have the same
priorities – clinical geneticist may have
different needs to, say, a cardiologist
Format/delivery considerations –
perhaps contributing appropriate
content to Wikipedia (or a medical
equivalent – is there an AMA wiki for
example?)
How do you get from OMIM to a
structure? PDBe to a
disease/phenotype?
Rate the competencies
1) Know where to get help – and how to
structure questions appropriately
2) Understand what can, and what
cannot, be inferred from a structure
3) Combine structural and other data
types to build understanding of
molecular function
4) Compare structures from different
species, using structural evolution as a
tool to understand the molecular basis
of processes such as disease or
development
Justification of ranking


What is the most important question they
need to answer using a protein structure?


Can you explain this to me in less than
30 seconds? (the answer to this
question always has to be yes)
Relevance of structure to one specific
disease?
Need for structural bioinformatics
varies enormously depending on area
of interest – in a single department it
can range from ‘don’t need to know
anything non-standard’ to ‘purifying
proteins to solve my own structures’
This community needs a ‘structural
biology advisor’ – something like
BioStars ‘stack exchange’ – but needs
some expert curation; top-ranked
answers can be wrong
What is the most important question they
need to answer using a protein structure?


How do I get from sequence to
structure to function/mechanism?
How far can I push it?
Structural biologists and bioinformaticians
Bioinformaticians
Structural biologists
Rate the competencies
1) Understand what can, and what
cannot, be inferred from a structure
2) Interpret 3D structural data both in
terms of the binding-site interactions
and the ligand conformation
3) Bring together ligand screening and
target-based screening - mapping
chemical libraries onto target-based
screening operations
Justification of ranking



They need to understand the structure
(traffic lights according to the B-factor), healthy scepticism – discover
more options of visualisation software
Then they can worry about
interpretation, understanding
These two points will trigger the
investigation of last three competencies
What is the most important question they
need to answer using a protein structure?
How can I use structural biology to maximise
the success rate of the series of compounds I
have
Rate the competencies
1) Understand what can, and what
cannot, be inferred from a structure
2) Perform detailed validation and
assessment of protein-ligand structures
3) Prepare protein model(s) for docking
Justification of ranking




They think they know the answer – so
make sure they do
Prepare structure – quality of structure,
pitfalls of homology modelling
Build a good model (black box tools are
dangerous)
Expertise for training does not lie with
us
What is the most important question they
need to answer using a protein structure?
How to provide scripts they can use
immediately?
(They are driven by “toys and tricks”) i.e. they
want a visualisation of a docked molecule and
that very quickly
Breakout 2:
Medicinal chemists and computational chemists
Computational chemists
Medicinal chemists
Ways to engage the community






Provide training that solves their
problems
What is available to solve it?
Gaps seem to be in software
Pitch in the right “language”
List advantages
Research-focused questions
Rationale / justification




More med chemists than comp
chemists
Less “structural” than comp chemists
Engagement must be research focused
Resources are available to train them
Appropriate delivery mechanisms




Face to face is easy (50 miles around
Cambridge)
Onsite training
Online format
Wiki?
Rationale / justification


Ways to engage the community



Do a lot of stuff that is considered to be
bioinformatics
They always did it
Identify missing link, i.e. “you could be
the interface between med chemist and
structural biologist”
Rationale / justification


Need to be aware of historical
development of job specs
They know software, but need
mechanisms to keep them at the
cutting edge
Appropriate delivery mechanisms




Get them off site!
Onsite training
Online format
Wiki?
Rationale / justification


Undisturbed training
Accreditation
Content (see JPO’s notes)
Who we deliver to
Structural biologists and bioinformaticians
Community:




Structural biologists are more computer literate than many other
communities
Bioinformatics includes medical informatics
Expansion in industry in bioinformatics
GWAS / disease community often does not understand structure
Generation of an online portal that would include:










Include a social media organisation possibly on existing resource such as
BioStar
Own Q & A page
Collect case studies
Links to existing material
Links from existing servers and resources
Courses, some of which are captured (video?) for training material
(Masterclass)
Webinars
Follow key people
EU links
Subdivide into focus groups, categorised by area of expertise rather than
problem bases
- Principles of structure
- Structure prediction and annotation
- Protein–protein complexes
- Protein–ligand interactions
- Sequence analysis and its structural implications
- Comparative structural analysis
Clinical researchers and molecular life scientists
Clinical researchers
General thoughts:



“Time poor”
Need for quick “simple” returns
Need to show relevance of structure
to their work – hence bioinformatics
in general
Ways to engage the community



Clinical conferences
Probably unlikely through web
contact
Must communicate with those in the
field who have a better knowledge of
their community – those that
interface across the bounds of
academia and clinical research (NHS
environment for example)
Rationale / justification



Face to face contact
Easier to relate to their
work/research – establish their
interests/needs
Not “easy” to engage them on the
web (via a dedicated website)
Engagement with “those who know
the area” would make contact far
easier(?) to establish
Appropriate delivery mechanisms

Videos (?) – short “commercials”
designed for specific topics “what
can the Romans do for you”!!
Rationale / justification


Many people watch YouTube
videos!! (quickly digestible material
for “time poor” people)
Note that many CRs are not aware of
the structures/tools available to them
Molecular life scientists
General thoughts:

“Easier” because they work with “strings of
letters” hence we can show them how
structure is relevant to this
Ways to engage the community



Fortunate in some way because there is a
background awareness of the importance of
structure – therefore we can harness that
awareness to encourage their interest
Conferences (again)
Need to show relevance to their
work/research
Rationale / justification


Bring home the understanding that changing
“these letters” within the sequence might have
dramatic consequences on their structure
Benefit from understanding structure
would/could make modifications more
significant/informative
Appropriate delivery mechanisms


Something which gives them both a theory
and a practice – a “tool” to make them more
aware of the significance of structure
Maybe web based
Maybe direct contact (meetings)
Videos (again)
Rationale / justification




Already aware – just need a “nudge” perhaps
YouTube “easy”
Need (perhaps more so) to enable their
knowledge of the bioinformatics “tools”
available to them (because of their
awareness)
However “some” MLS’s are not so aware of
the many tools (structures) that are available
to them
Breakout 3
Short-term opportunities
What is realistic to do by June (4 months)?

Single central hub – needs to be hosted on ELIXIR UK website
- Branding? (Make it known / useful within community)
- “Agony Aunt” approach – PI blog entries: LinkedIn, Facebook, Twitter
- Registry of material
- Feed into TESS

Start by targeting a general audience first
- Better to target one general audience well than several audiences not so well
- The idea would be to develop a more tailored portal / material for specific
audiences in the future

Community champions
- Structural and non-structural

Willing volunteers needed from the workshop
- Help generate material
- Contribute to the community and help answer questions in forums
- Allowing the project to use their existing training material

Need to be clear about commercial rights (of tools, etc.)

How do we credit good material?

Create a central resource around what is a structure and what can and can’t be
inferred from a structure
- Start basic and can get more complex as you progress through material
- A slide set could be useful covering this topic
- Also create a series of short videos showing an expert talk you through a
structure (with a few case study examples), highlighting the different parts of the
structure and showing the user which bits of the structure are relevant, etc.
- Different users require different mechanisms of delivery

5 main questions – each PI should submit the top 5 questions they are most
commonly asked around structural bioinformatics
- Top 5 questions together with detailed answers using links to appropriate training
material and resources/tools
- Questions not answered here could be answered in the forum

Statement of sustainability
- We need something that we can continually develop (evolve) and update
- Where is what we put together going to sit?
- FAQ-based approach
- What is a structure?
- What makes a good structure; what makes a bad structure?
- Point to training resources that answer these questions
- Pose questions that you’d be happy to answer – so each champion needs to
come up with a structure
- Keep things small and modular
Long-term strategy

A single structural bioinformatics network with thematic areas. Experts / champions
are in brackets:
- Principles of structure (qualitative issues): Instruct / PDBe (Sameer Velankar)
- Structure prediction / annotation: Genome 3D (Mike Sternberg and David Jones)
- Protein–protein complexes (Franca Fraternali)
- Protein–ligand interactions (John Overington)
- Structure-based sequence analysis (Geoff Barton)
- Structure comparison: SCOP / CATH (Christine Orengo)
- Structure to function (Mark Wass)
- Web-based learning (Adrian Shepherd)
- Molecular dynamics (Phil Biggin?)
Outcomes of common principles for training materials
Audiences







Communities of practice for different thematic areas
Organise the training around community specific research questions and needs
Construct carefully tailored surveys to each of the identified communities to
specifically target their needs and the gaps they identify
Seeing wood from trees – how to make sense of validation of data
Theme-based workshops
Understand the audience - different entry levels
- support from their organisation (academic or industry)
Accessibility - for academic and industry community
- clarity of course / entry to it
Topics




Decide about common scripting languages to be used to share
Not software training
Balanced on polemic issues (e.g. hydrophobic effect)
Conditioning structures to address problem – what to think about when docking
ligands to x-ray structures
Learning methods




Impact
Problem-based learning
EMTRAIN’s central portal of learning methods
Range of learning styles, blended learning
Delivery mechanisms








TED conferences and TED talks
Identify best practices for online training
Chat rooms – question and answer sessions
On demand
Modularity – re-versioning core material for different audiences
Central portal
Openly accessible / open access (with attribution)
If multi institution - open access/creative commons
Recognition




Accreditation - recognition of gained skills
- make ELIXIR the place to go
Course badging
Accuracy
Timeliness (up to date) – maintainable
To do list from long-term strategy breakout







Meeting with Champions /1 month @UCL – Christine to set up doodle poll
Matrix of people – who should contact who?
Feedback to rest of ELIXIR-UK on outcomes of the workshop as a whole
Champions to get people in their groups to assess existing training material
Community need – obvious and can illustrate this through case studies - RCG to
collect
Workshop – consultation with rest of Europe
Write a 2-pager for Alf Game for Jul 2014
o Building a resource for training
o Structural bioinformatics jump station – serve as a role model for other
sectors
o Prototype website to show them
o ‘Count what’s happening’ and brand it
Some use cases




Mark Wass – clinicians with SNPs -> structural data; Protein networks, PPIs
Charlotte – antibody-based therapeutics
Roman: Sanger – structure-based analysis of GWAS data
Friedrich’s Docking, comp chem – cover this in MTP
A list of workshop attendees:
Christine Orengo – UCL, Genome3D
Mark Wass – Kent University
Adrian Shepherd – Birkbeck College
Charlotte Deane – University of Oxford
John Mitchell – St Andrews
Mike Sternberg – Imperial College, Genome3D
Sameer Velankar – EMBL-EBI (PDBe)
David Jones – UCL, Genome3D
John Overington – EMBL-EBI
Geoff Barton - Dundee
Franca Fraternali – King’s College London
Cath Brooksbank – EMBL-EBI
Sarah Morgan – EMBL-EBI
Tim Levine - UCL
Alexandra Simperler – Imperial College
Brian Marsden - SGC
Richard Grandison – EMBL-EBI
Friedrich Rippmann – Merck Serono
Nick Keep – Birkbeck College
Aleksandra Pawlik - Software Sustainability Institute
Ian Sillitoe – UCL
Garrett Morris - Crysalin
Pam Thomas - GSK
Ardan Patwardhan - EMBL-EBI (PDBe)
Robert Janes – Queen Mary University, London