Structural bioinformatics training for ELIXIR-UK
Workshop report
Date: 17–18 February 2014
Venue: EMBL-European Bioinformatics Institute, Hinxton, UK

Background

ELIXIR (www.elixir-europe.org), Europe's infrastructure for biological data, was launched on 18 December 2013. It is a distributed infrastructure with a hub at the EMBL-European Bioinformatics Institute and national nodes in its member states. The UK node will specialise in training users and operators of ELIXIR's infrastructure.

The BBSRC has awarded funds to the ELIXIR-UK node to consolidate training materials in structural bioinformatics. Structural bioinformatics is one of ten ‘sectors’ identified by the UK node (Figure 1) as being a particular strength in the UK, and for which there are limited training resources available. By consolidating training resources that already exist, mapping them to appropriate target audiences, and providing access to them through a single portal, we will not only be able to optimise use of these training resources by the research community, but also identify gaps in training provision. Our long-term goal (beyond the scope of the initial project) is to find a sustainable mechanism for filling these gaps, so that the research community makes the most of Europe’s wealth of publicly available structural bioinformatics resources.

Figure 1: Target audiences identified by ELIXIR-UK, the ten sectors identified as needing consolidation of training materials, and their sector leads. [needs editing to include CB as co-lead of structural bx].

Planning

Planning for the workshop began in summer 2013, when a small taskforce was convened to discuss training needs in the sector. The taskforce comprised:
- Cath Brooksbank, EMBL-EBI
- Christine Orengo, UCL
- Gary Battle, EMBL-EBI
- Tom Blundell, U. Cambridge
- Charlotte Deane, U. Oxford
- Brian Marsden, U. Oxford
- Sarah Morgan, EMBL-EBI
- Richard Grandison, EMBL-EBI

The taskforce identified six target audiences needing varying levels of structural bioinformatics expertise, and drafted a list of competencies required by each target audience (see appendix). The taskforce also planned the workshop reported in this document.

Aims of the workshop:
- To reach agreement on the training needs of our target audiences, which span clinical researchers, bench-based molecular life scientists, structural biologists, structural bioinformaticians, computational chemists and medicinal chemists;
- To identify existing sources of training material, organise it into themes, identify gaps and plan how we fill those gaps.

Before the workshop: A survey was sent out to workshop invitees and other experts who covered the range of target audiences that had been identified. The aim of the survey was to gather information on people’s experiences of, and thoughts on, structural bioinformatics training. This included what people thought were the most crucial structural bioinformatics competencies for each of our target audiences. The survey was sent out to around 100 people and we have received 24 responses to date. The survey can be found here.

Structure of the workshop

The workshop, lasting 1.5 days, was organised into four sessions, combining presentations, themed breakout groups and discussions in plenary. Session 1 set the scene, ensuring that all participants were familiar with ELIXIR’s purpose and training strategy. Session 2 explored the needs of our target audiences, using outputs from our training needs survey. Session 3 considered approaches to training, with an emphasis on elearning, and identified priority areas for the consolidation and future development of training materials. Finally, session 4 focused on developing an action plan for the future, including our goals for the remainder of this project and our longer-term plans for improving provision of structural bioinformatics training.
A full list of delegates can be found in the appendix.

Presentations:
- ELIXIR, ELIXIR-UK and the aims of the training node (Cath Brooksbank, EMBL-EBI on behalf of Charlotte Deane)
- Introduction to the ELIXIR-UK structural bioinformatics project (Christine Orengo [UCL] and Cath Brooksbank [EMBL-EBI])
- Structural biology distance learning at Birkbeck (Nick Keep, Birkbeck College)
- Industry needs (Friedrich Rippmann, Merck Serono)
- The SysMIC project (Adrian Shepherd, Birkbeck College)

The breakout sessions were set up as follows:

Breakout 1: Target communities
Chaired by Geoff Barton, Charlotte Deane and Cath Brooksbank

Purpose: To reach agreement on who our target audiences are and what their highest priority training needs are in structural bioinformatics.

Delegates were divided into three groups and were assigned two target audiences to focus on. These audiences were:
- Structural biologists and bioinformaticians
- Medicinal and computational chemists
- Clinical researchers and molecular life scientists

Starting information: Each group was provided with a list of identified competencies for each target audience and the survey results showing which competencies survey responders considered the most important for each audience.

Each group was asked to address the following questions:
- Are these our target communities?
- For your given two communities, what are the most important competencies?
- What's the rationale for your ranking?
- What is the most important question your given communities need to answer using a protein structure?
- Are there common needs across several communities?

Breakout 2: The gaps in structural bioinformatics training and how to fill them
Chaired by Mike Sternberg, John Overington and Adrian Shepherd

Purpose: To identify mechanisms for delivering training to the target audiences agreed upon in breakout 1. Although not a specified goal, the discussants also began to think about thematic areas into which indicative content might be organised in the future.
Delegates were again divided into three groups and were assigned two target audiences to focus on. These audiences were grouped as in Breakout 1.

Starting information: Groups were provided with reports from breakout session 1 and comments from the survey results on availability of training material.

Each group was asked to address the following questions:
- What is the appropriate engagement route? Provide rationale / justification
- What is / are the appropriate delivery mechanism(s)? Provide rationale / justification

Breakout 3: Opportunities going forward

Purpose: To agree on both our short-term goals (in terms of consolidating existing training material) and our long-term goals.

Delegates were divided into two groups. Group 1 had a training focus and looked at the short-term goals of the project. Group 2 had a strategy focus and looked at the long-term goals of the project.

Starting information: Each group was provided with reports/comments from the previous two breakout sessions.

Group 1 – short term goals (training)
Chaired by Sarah Morgan
Questions to address:
- How can we build on the materials we have available?
- Where and how do we focus effort?

Group 2 – long term goals (strategy)
Chaired by Christine Orengo
Question to address:
- What should be the strategy for ensuring long term progress of training in structural biology; gaining funding etc?

Common principles of training materials

A brief discussion session was held to identify common principles that ELIXIR-UK could champion when working with providers of training materials. Delegates were asked to provide three common principles that all training material developed and/or collated as part of this project should share. These principles were then organised into themes.

Outcomes

Breakout 1

The most important competency, which was unanimously identified across all target audiences, was being able to understand what can, and what cannot, be inferred from a structure.
A more detailed overview from each of the three groups can be found in the appendix.

Breakout 2

We concluded that our six target audiences can be placed into two big silos, and that categorising training materials into these two silos would be a good first step:

Silo 1 (generalists): need to be convinced what structure can do for them; need an introduction to what can and cannot be inferred from a structure, and to learn the ‘language’ that will enable them to communicate effectively with silo 2. This silo contains clinical researchers, molecular life scientists and medicinal chemists.

Silo 2 (specialists): have a firm understanding of structural bioinformatics principles but need to be kept up to speed with the latest developments. This silo contains structural bioinformaticians, computational chemists (often both specialties reside in one individual, especially in industry) and structural biologists.

A more detailed overview from each of the three groups can be found in the appendix.

Breakout 3

Conclusions of this breakout were built into an action plan (see later). We also agreed on a set of thematic areas and nominated a “Tsar” for each of these:
- Principles of structure (qualitative issues): Instruct / PDBe (Sameer Velankar)
- Structure prediction / annotation: Genome 3D (Mike Sternberg and David Jones)
- Protein–protein complexes (Franca Fraternali)
- Protein–ligand interactions (John Overington)
- Structure-based sequence analysis (Geoff Barton)
- Structure comparison: SCOP / CATH (Christine Orengo)
- Structure to function (Mark Wass)
- Web-based learning (Adrian Shepherd)
- Molecular dynamics (Phil Biggin?)

A more detailed overview from the two groups can be found in the appendix.

Common principles of training materials

The most important common principles of training materials centred around five generic themes: audiences, topics, learning methods, delivery mechanisms and recognition.
A more detailed overview of the most common principles of training materials can be found in the appendix.

Action plan

At the end of the workshop, the following steps were identified as the highest priorities to progress this project.
1) Source web developer expertise asap (Charlotte Deane)
2) Tsars to identify other experts elsewhere in Europe in the different thematic areas. These experts must have something to bring to the table
3) Meeting of Tsars to finalise approach and populate website (Christine Orengo to organise in London)
- Draft 2 page document outlining long-term plan
- Draft letter to experts
4) Gather volunteers for user experience testing of the website design

Appendix

Breakout 1: Medicinal chemists and computational chemists

Computational chemists Medicinal chemists

Rate the competencies
1) Understand what can, and what cannot, be inferred from a structure
2) Interpret 3D structural data both in terms of the binding-site interactions and the ligand conformation
3) Bring together ligand screening and target-based screening - mapping chemical libraries onto target-based screening operations

Justification of ranking
- They need to understand the structure (traffic lights according to the B-factor), healthy scepticism – discover more options of visualisation software
- Then they can worry about interpretation, understanding
- These two points will trigger the investigation of last three competencies

What is the most important question they need to answer using a protein structure?
How can I use structural biology to maximise the success rate of the series of compounds I have?

Rate the competencies
1) Understand what can, and what cannot, be inferred from a structure
2) Perform detailed validation and assessment of protein-ligand structures
3) Prepare protein model(s) for docking

Justification of ranking
- They think they know the answer – so make sure they do
- Prepare structure – quality of structure, pitfalls of homology modelling
- Build a good model (black box tools are dangerous)
- Expertise for training does not lie with us

What is the most important question they need to answer using a protein structure?
How to provide scripts they can use immediately? (They are driven by “toys and tricks”), i.e. they want a visualisation of a docked molecule, and that very quickly

Clinical researchers and molecular life scientists

Molecular life scientists Clinical researchers

Rate the competencies
1 = Understand what can, and what cannot, be inferred from a structure
1 = Visualise SNPs on structures and understand the functional consequences
2) Comparative analysis of structures

Justification of ranking
- Need to be aware that not every clinical researcher is going to have the same priorities – a clinical geneticist may have different needs to, say, a cardiologist
- Format/delivery considerations – perhaps contributing appropriate content to Wikipedia (or a medical equivalent – is there an AMA wiki for example?)
- How do you get from OMIM to a structure? PDBe to a disease/phenotype?
Rate the competencies
1) Know where to get help – and how to structure questions appropriately
2) Understand what can, and what cannot, be inferred from a structure
3) Combine structural and other data types to build understanding of molecular function
4) Compare structures from different species, using structural evolution as a tool to understand the molecular basis of processes such as disease or development

Justification of ranking

What is the most important question they need to answer using a protein structure?
- Can you explain this to me in less than 30 seconds? (the answer to this question always has to be yes)
- Relevance of structure to one specific disease?
- Need for structural bioinformatics varies enormously depending on area of interest – in a single department it can range from ‘don’t need to know anything non-standard’ to ‘purifying proteins to solve my own structures’
- This community needs a ‘structural biology advisor’ – something like BioStars ‘stack exchange’ – but needs some expert curation; top-ranked answers can be wrong

What is the most important question they need to answer using a protein structure?
- How do I get from sequence to structure to function/mechanism?
- How far can I push it?
Structural biologists and bioinformaticians

Bioinformaticians Structural biologists

Rate the competencies
1) Understand what can, and what cannot, be inferred from a structure
2) Interpret 3D structural data both in terms of the binding-site interactions and the ligand conformation
3) Bring together ligand screening and target-based screening - mapping chemical libraries onto target-based screening operations

Justification of ranking
- They need to understand the structure (traffic lights according to the B-factor), healthy scepticism – discover more options of visualisation software
- Then they can worry about interpretation, understanding
- These two points will trigger the investigation of last three competencies

What is the most important question they need to answer using a protein structure?
How can I use structural biology to maximise the success rate of the series of compounds I have?

Rate the competencies
1) Understand what can, and what cannot, be inferred from a structure
2) Perform detailed validation and assessment of protein-ligand structures
3) Prepare protein model(s) for docking

Justification of ranking
- They think they know the answer – so make sure they do
- Prepare structure – quality of structure, pitfalls of homology modelling
- Build a good model (black box tools are dangerous)
- Expertise for training does not lie with us

What is the most important question they need to answer using a protein structure?
How to provide scripts they can use immediately? (They are driven by “toys and tricks”), i.e. they want a visualisation of a docked molecule, and that very quickly

Breakout 2: Medicinal chemists and computational chemists

Computational chemists Medicinal chemists

Ways to engage the community
- Provide training that solves their problems
- What is available to solve it?
- Gaps seem to be in software
- Pitch in the right “language”
- List advantages
- Research-focused questions

Rationale / justification
- More med chemists than comp chemists
- Less “structural” than comp chemists
- Engagement must be research focused
- Resources are available to train them

Appropriate delivery mechanisms
- Face to face is easy (50 miles around Cambridge)
- Onsite training
- Online format
- Wiki?

Rationale / justification

Ways to engage the community
- Do a lot of stuff that is considered to be bioinformatics
- They always did it
- Identify missing link, i.e. “you could be the interface between med chemist and structural biologist”

Rationale / justification
- Need to be aware of historical development of job specs
- They know software, but need mechanisms to keep them at the cutting edge

Appropriate delivery mechanisms
- Get them off site!
- Onsite training
- Online format
- Wiki?

Rationale / justification
- Undisturbed training
- Accreditation
- Content (see JPO’s notes)
- Who we deliver to

Structural biologists and bioinformaticians

Community:
- Structural biologists are more computer literate than many other communities
- Bioinformatics includes medical informatics
- Expansion in industry in bioinformatics
- GWAS / disease community often does not understand structure

Generation of an online portal that would include:
- A social media organisation, possibly on an existing resource such as BioStar
- Own Q & A page
- Collect case studies
- Links to existing material
- Links from existing servers and resources
- Courses, some of which are captured (video?) for training material (Masterclass)
- Webinars
- Follow key people
- EU links

Subdivide into focus groups, categorised by area of expertise rather than problem bases
- Principles of structure
- Structure prediction and annotation
- Protein–protein complexes
- Protein–ligand interactions
- Sequence analysis and its structural implications
- Comparative structural analysis

Clinical researchers and molecular life scientists

Clinical researchers

General thoughts:
- “Time poor”
- Need for quick “simple” returns
- Need to show relevance of structure to their work – hence bioinformatics in general

Ways to engage the community
- Clinical conferences
- Probably unlikely through web contact
- Must communicate with those in the field who have a better knowledge of their community – those that interface across the bounds of academia and clinical research (NHS environment for example)

Rationale / justification
- Face to face contact
- Easier to relate to their work/research – establish their interests/needs
- Not “easy” to engage them on the web (via a dedicated website)
- Engagement with “those who know the area” would make contact far easier(?) to establish

Appropriate delivery mechanisms
- Videos (?) – short “commercials” designed for specific topics: “what can the Romans do for you”!!

Rationale / justification
- Many people watch YouTube videos!!
(quickly digestible material for “time poor” people)
- Note that many CRs are not aware of the structures/tools available to them

Molecular life scientists

General thoughts:
- “Easier” because they work with “strings of letters”, hence we can show them how structure is relevant to this

Ways to engage the community
- Fortunate in some way because there is a background awareness of the importance of structure – therefore we can harness that awareness to encourage their interest
- Conferences (again)
- Need to show relevance to their work/research

Rationale / justification
- Bring home the understanding that changing “these letters” within the sequence might have dramatic consequences on their structure
- Benefit from understanding structure would/could make modifications more significant/informative

Appropriate delivery mechanisms
- Something which gives them both a theory and a practice – a “tool” to make them more aware of the significance of structure
- Maybe web based
- Maybe direct contact (meetings)
- Videos (again)

Rationale / justification
- Already aware – just need a “nudge” perhaps
- YouTube “easy”
- Need (perhaps more so) to enable their knowledge of the bioinformatics “tools” available to them (because of their awareness)
- However “some” MLSs are not so aware of the many tools (structures) that are available to them

Breakout 3

Short-term opportunities

What is realistic to do by June (4 months)?

Single central hub – needs to be hosted on ELIXIR UK website
- Branding? (Make it known / useful within community)
- “Agony Aunt” approach – PI blog entries: LinkedIn, Facebook, Twitter
- Registry of material
- Feed into TESS

Start by targeting a general audience first
- Better to target one general audience well than several audiences not so well
- The idea would be to develop a more tailored portal / material for specific audiences in the future

Community champions
- Structural and non-structural

Willing volunteers needed from the workshop
- Help generate material
- Contribute to community and help answer questions in forums
- Allow the project to use their existing training material

Need to be clear about commercial rights (of tools, etc.)

How do we credit good material?

Create a central resource around what is a structure and what can and can’t be inferred from a structure
- Start basic and can get more complex as you progress through material
- A slide set could be useful covering this topic
- Also create a series of short videos showing an expert talking you through a structure (with a few case study examples), highlighting the different parts of the structure and showing the user which bits of the structure are relevant, etc.
- Different users require different mechanisms of delivery

5 main questions – each PI should submit the top 5 questions they are most commonly asked around structural bioinformatics
- Top 5 questions together with detailed answers using links to appropriate training material and resources/tools
- Questions not answered here could be answered in the forum

Statement of sustainability
- We need something that we can continually develop (evolve) and update
- Where is what we put together going to sit?
- FAQ-based approach
- What is a structure?
- What makes a good structure; what makes a bad structure?
- Point to training resources that answer these questions
- Pose questions that you’d be happy to answer – so each champion needs to come up with a structure

Keep things small and modular

Long-term strategy

A single structural bioinformatics network with thematic areas. Experts / champions are in brackets:
- Principles of structure (qualitative issues): Instruct / PDBe (Sameer Velankar)
- Structure prediction / annotation: Genome 3D (Mike Sternberg and David Jones)
- Protein–protein complexes (Franca Fraternali)
- Protein–ligand interactions (John Overington)
- Structure-based sequence analysis (Geoff Barton)
- Structure comparison: SCOP / CATH (Christine Orengo)
- Structure to function (Mark Wass)
- Web-based learning (Adrian Shepherd)
- Molecular dynamics (Phil Biggin?)

Outcomes of common principles for training materials

Audiences
- Communities of practice for different thematic areas
- Organise the training around community specific research questions and needs
- Construct carefully tailored surveys to each of the identified communities to specifically target their needs and the gaps they identify
- Seeing wood from trees – how do we make sense of validation of data
- Theme-based workshops
- Understand the audience - different entry levels - support from their organisation (academic or industry)
- Accessibility - for academic and industry community - clarity of course / entry to it

Topics
- Decide about common scripting languages to be used to share
- Not software training
- Balanced on polemic issues (e.g. hydrophobic effect)
- Conditioning structures to address problem – what to think about when docking ligands to x-ray structures

Learning methods
- Impact
- Problem-based learning
- EMTRAIN’s central portal of learning methods
- Range of learning styles, blended learning

Delivery mechanisms
- TED conferences and TED talks
- Identify best practices for online training
- Chat rooms – question and answer sessions
- On demand
- Modularity – re-versioning core material for different audiences
- Central portal
- Openly accessible / open access (with attribution)
- If multi institution - open access/creative commons

Recognition
- Accreditation - recognition of gained skills - make ELIXIR the place to go
- Course badging
- Accuracy
- Timelines (up to date) – maintainable

To do list from long-term strategy breakout
- Meeting with Champions /1 month @UCL – Christine to set up doodle poll
- Matrix of people – who should contact who?
- Feedback to rest of ELIXIR-UK on outcomes of the workshop as a whole
- Champions to get people in their groups to assess existing training material
- Community need – obvious and can illustrate this through case studies - RCG to collect
- Workshop – consultation with rest of Europe
- Write a 2-pager for Alf Game for Jul 2014
  o Building a resource for training
  o Structural bioinformatics jump station – serve as a role model for other sectors
  o Prototype website to show them
  o ‘Count what’s happening’ and brand it

Some use cases
- Mark Wass – clinicians with SNPs -> structural data; Protein networks, PPIs
- Charlotte – antibody-based therapeutics
- Roman: Sanger – structure-based analysis of GWAS data
- Friedrich’s Docking, comp chem – cover this in MTP

A list of workshop attendees:
Christine Orengo – UCL, Genome3D
Mark Wass – Kent University
Adrian Shepherd – Birkbeck College
Charlotte Deane – University of Oxford
John Mitchell – St Andrews
Mike Sternberg – Imperial College, Genome3D
Sameer Velankar – EMBL-EBI (PDBe)
David Jones – UCL, Genome3D
John Overington – EMBL-EBI
Geoff Barton – Dundee
Franca Fraternali – King’s College London
Cath Brooksbank – EMBL-EBI
Sarah Morgan – EMBL-EBI
Tim Levine – UCL
Alexandra Simperler – Imperial College
Brian Marsden – SGC
Richard Grandison – EMBL-EBI
Friedrich Rippmann – Merck Serono
Nick Keep – Birkbeck College
Aleksandra Pawlik – Software Sustainability Institute
Ian Sillitoe – UCL
Garrett Morris – Crysalin
Pam Thomas – GSK
Ardan Pawhardan – EMBL-EBI (PDBe)
Robert Janes – Queen Mary University, London