ISBE working paper 3.0
ISBE: Infrastructure for Systems Biology Europe

Table of Contents
1 An Infrastructure for Systems Biology Europe
  1.1 Why is Systems Biology needed?
  1.2 What’s new in the proposal?
  1.3 What is the purpose of Systems Biology?
  1.4 What is the role of Systems Biology?
  1.5 How will Systems Biology help?
  1.6 How much will it cost?
2 A distributed research infrastructure, ISBE, is needed urgently for Biology
3 The dynamic infrastructure of European Systems Biology
4 Focus
5 The different branches of the ISBE: TAP, i.e. Tools, Activities & People
  5.1 Tools
    5.1.1 Connecting to model generation methodologies
    5.1.2 Connecting to experimental design methodologies
    5.1.3 Connecting to component-data generation methodologies
    5.1.4 Connecting to technology development
    5.1.5 Connecting to physiological expertise
    5.1.6 Connecting to data and software management facilities
    5.1.7 Connecting to data analysis and data management methodologies
  5.2 Activities
    5.2.1 Real-time, cross-laboratory data generation
    5.2.2 Real-time, cross-laboratory modelling
    5.2.3 Real-time, cross-laboratory model validation
    5.2.4 Real-time, cross-laboratory data integration & curation and model integration & curation
    5.2.5 Dynamic standardisation
  5.3 People
    5.3.1 Personnel Training and Education
    5.3.2 Methodology and iterative calibration on what is needed for Europe
    5.3.3 Meetings
    5.3.4 Work-workshops and jamborees
    5.3.5 Platforms linking with industry
    5.3.6 Embassies
6 Management
7 Timing

Executive Summary

This paper outlines the proposal for an Infrastructure for Systems Biology in Europe (ISBE), a unique type of distributed infrastructure designed to meet the needs of European Systems Biology, both in its development and in its applications. Systems Biology requires the simultaneous and highly interactive implementation of a large number of diverse activities. These range from mathematical and formal modelling, through biological and clinical experiments, to technology development. For each medical, biological or biotechnological problem to be addressed, the optimal combination of activities is different. The total array of activities is so large, and the needs so dynamic, that it cannot be undertaken at a single location while still delivering optimal quality and efficiency.

Focus of ISBE: Answering the question of how the interaction of biological components leads to the functioning of living organisms in a constantly changing environment.

Structure of ISBE: We are proposing a widely distributed, branched infrastructure comprising interconnected hubs, where each hub (an ISBE Institution/Centre) focuses on an area of Systems Biology, for example model organisms, model cell populations, disease, biotechnology, or ecology and green biology. In addition, the infrastructure will include technological expertise, for example in stochastic computation, algorithmic modelling, inverse modelling and high-throughput data generation. Infrastructure repositories will focus on data storage and handling, as well as standardisation.
It is envisaged that individual research laboratories will participate in the ISBE by contributing more focused expertise, while using the potential offered by the infrastructure for technology development etc.

Impact of ISBE: The aim of the ISBE will be to systematically understand complex biological processes and to develop technologies. This will transform basic knowledge of complex molecular systems into the new area of predictive, preventive and personalised medicine. In addition, systems biology approaches will provide new insights and assist the development of tools for the design of new biotechnological and environmental applications. The objective is to transform European Systems Biology into a world-wide activity by the use of web-based experimental facilities and live mathematical modelling. It is envisaged that this will make the science more productive, cost-efficient and useful for European Society. The creation of such an infrastructure will require detailed planning and management, and hence a preparatory phase.

1 An Infrastructure for Systems Biology Europe

1.1 Why is Systems Biology needed for modern life-science research?

The impact of European research in the life sciences for the benefit of society and industry has been significantly less than expected. To a large extent this is due to the complexity of living organisms, in terms of both the number of interacting components and the essential nonlinearities of the interactions. Systems Biology sits at the frontier of research in the life sciences. It is a highly interdisciplinary approach to the study of biological complexity in health and disease. Over the past 20 years it has become clear that the understanding of biological systems is central to life science. Single molecules may contribute to life, but are not, in themselves, alive.
Medicine today focuses on the systematic diagnosis and treatment of ‘multifactorial’ (and hence elusive) diseases, as opposed to the traditional, single-molecule, targeted approach. Hence, new Systems Biology infrastructures, capable of supporting technology transfer as well as translating scientific and medical discoveries into medicine and related areas, will improve both the basic understanding of life and Medicine. They will also enhance the economic potential of Europe. The Systems Biology enabled by the proposed ISBE (described in this paper) stands a real chance of significantly increasing the quality and effectiveness of the science in areas such as health, green and white biotechnology, bio-energy and ecology.

1.2 What’s new in the proposal?

The increasing pace of advances in molecular biology, biotechnology and medicine is driven by interdisciplinary research, which combines experimental molecular biology, clinical research, bioinformatics, computational biology, mathematics, computer science, physics, chemistry and biological engineering in an approach called Systems Biology. Systems Biology is a new way of doing science, demanding new intellectual and organisational structures to deliver its full potential. It requires the combined implementation of computational and experimental approaches (which is novel for most molecular biological sciences). Simultaneously, it addresses (a) the complexity of networks, which are, ultimately, as large as entire expressed genomes; and (b) the precise, experimentally determined interactive properties of the molecular constituents. This approach has, hitherto, only been applied in engineering and the physical sciences. In addition, Systems Biology is amenable to studying molecular interactions in more clinical and biotechnological settings. The task of Systems Biology is to integrate accurate experimental data with predictive models of life in health and disease.
The challenge for interdisciplinary research is to integrate the experimental, computational and theoretical sciences.

1.3 What is the purpose of Systems Biology?

Systems Biology integrates high-throughput technologies, model systems, molecular biology, biochemistry, engineering, information technologies, bioinformatics, clinical research and innovative engineering to understand how biological function emerges from interacting biological components. Such integration can only be achieved through a certain critical mass of experimentation, such as in genomics, and with the help of mathematical analyses, modelling, informatics and statistics. Biological networks, both intracellular and within and between whole cells, tissues and organisms, connect thousands of molecular and higher-order functions, such that the functioning of any part of the network depends on different, remote parts.

1.4 What is the role of Systems Biology?

Most biological processes governing health or disease (metabolic, sport, developmental, cancer, cardiovascular, neurodegenerative et cetera) involve complex interaction networks between hundreds of genes and proteins. Invariably, the complexity is enormous and every case is different. This necessitates the integration of experimental, quantitative data on a systems-wide level to obtain information about the state, dynamics and variability of living cells, organs, organisms and populations. The goal is to standardise these approaches and integrate them into predictive models. This also relates to the understanding and promotion of health and the retardation of ageing, as well as to diagnosis (e.g. through novel biomarking strategies) and therapies (e.g. using new network-targeting drugs). Whilst the latter is directly relevant to red biotechnology, the same issues apply to nutrition and white biotechnology (e.g. engineering fungi to produce more and better antibiotics), to green biotechnology (e.g.
engineering plants towards the production of food in more efficient and carbon dioxide neutral ways), as well as to the fields of bio-energy (e.g. better ways to produce bioethanol from plant waste) and ecology (e.g. reducing the escape of NO and N2O from waste treatment plants, or methane from cattle).

1.5 How will Systems Biology help?

Mathematical modelling and analysis can contribute, but both require meticulous and quantitative experimentation to provide precise and useful data. Because so many experimental conditions and approaches are possible, model-driven experimental design and control can make important experiments much more effective and meaningful. Ultimately, the ability to carry out experiments steered in real time by modelling, and vice versa, will greatly empower Systems Biology and achieve, in many areas of its application, far greater impact on health and the economy.

The capabilities needed to achieve the objectives outlined above are, in principle, available in Europe, but the infrastructure is fragmented. For any Systems Biology problem to be addressed, numerous state-of-the-art methodologies need to be implemented and integrated with new methodologies and standardisation procedures (which have to be developed). These methodologies are currently available, or under development, in individual centres or laboratories in various European countries. An important role of ISBE will be to identify, structure and support large-scale research projects on, for example, multifactorial diseases, bioenergy and biomanufacturing. Such projects are currently beyond the capability of single European institutions. In parallel, the ISBE will provide support to individual laboratories and to small and medium-sized consortia that have succeeded in national or European grant applications.

1.6 How much will it cost?
The creation of the new infrastructure will involve significant cost, both in installing connections between existing centres and in funding new and required activities in existing centres. However, such a development will ultimately result in considerable savings. Since the ISBE will connect all the expertise and facilities in Europe, there will be a large reduction in duplication and existing facilities will be used to their full capacity.

2 A distributed research infrastructure, ISBE, is needed urgently for Biology

Systems Biology requires the implementation of a large range of approaches, both computational and experimental. What is unique about modern Systems Biology is that all of the approaches are required simultaneously and interactively. The challenge is how to integrate the required facilities and expertise. An initial answer to this problem has been to create centres which have at least some of the requirements. Experience in the US and Europe has shown that the most innovative interdisciplinary science is best carried out in settings where: (a) experimental and computational/theoretical/engineering investigators interact on a daily basis; (b) they build and share open laboratories, resources and technology cores; and (c) the next generation of Systems Biology scientists is educated in the same location. However, the evidence is that in practice systems biologists in such centres collaborate more with groups in other centres than with groups in their own centre. This is because any specific Systems Biology problem requires many different areas of expertise, and often these cannot be found at the right level within a single institution. It is very difficult to predict which sub-disciplines will be needed two years from now. Hence, building topic-oriented institutes generally proves to be only a partial solution for most Systems Biology problems.
The aim of ISBE is therefore to develop a distributed, highly interconnected infrastructure using standardised methodologies that will support well-coordinated, multidisciplinary, integrated research in the fields of biomedicine and biotechnology. We are proposing an arrangement of European Centres that will operate like a systems infrastructure, with emergent properties and robustness that surpass the simple summation of the individual parts (i.e. a synergistic relationship). By its nature, the infrastructure will assimilate most of the existing Systems Biology activities and facilities, without altering their institutional structures. The infrastructure will also assimilate expertise from other fields and will facilitate pan-European training and education (including cross-disciplinary approaches) for current and future scientists. The proposed dynamic overall infrastructure, comprising sub-infrastructures, will function in such a way that all of the expertise required for any new Systems Biology challenge can be assembled dynamically.

3 The dynamic infrastructure of European Systems Biology

Modern Systems Biology projects are not confined to single institutions, each covering all of the research. Rather, such projects need to connect research groups in various institutions with specific expertise and/or facilities in order for them to work synergistically. Building on the experience gained by European research consortia in systems biology, the integrated European infrastructure will exploit and institutionalise existing synergies and create new opportunities for efficient research coordination and collaboration. The ISBE will make available cutting-edge technology in experimental and computational systems analysis to the wider community of European life scientists. Its backbone will be formed by ~50 centres that specialise in particular experimental and/or computational technologies.
These centres will collaborate with a wide range of scientists on specific projects, train researchers, and act as hubs for further technology development. A particular challenge in modern biology is the integration of approaches from the quantitative sciences of physics, mathematics and engineering. The ISBE centres will have a catalytic role in this process by providing a unique environment where scientists from all these disciplines meet, work together and educate each other.

The European aircraft industry offers an example of this way of working. The components of a plane are built in various factories in different, sometimes remote, locations and then brought together in other factories for assembly. The infrastructure required for the new Systems Biology (see the figure) is envisaged as being similar in certain aspects: (i) some of the connected centres (‘factories’) would be small and others larger; (ii) some connections would lead to the creation of research groups across centres; (iii) depending on the type of plane, different combinations of factories are involved, and similarly, different groups would be involved in problems associated with a given biological function or cell, tissue or organism type; (iv) some connections will be directed to repositories of data or models, rather than research facilities; (v) centres and repositories are likely to be hubs connected by ‘superhighways’ carrying data, information and other forms of scientific interaction. The Systems Biology infrastructure would host intensive and reciprocal interactions through its information highways, for example enabling a researcher in one centre to steer an experiment in another centre through a web-based computer interface (this would be in addition to classical cross-laboratory visits by the scientists involved in the project).

The ISBE infrastructure will put the jigsaw puzzle of European Systems Biology together. It will comprise three components: (i) institutions/centres, i.e.
research facilities at which research expertise and experimental and modelling facilities exist, (ii) repositories of data and models, and (iii) broadband connections between components (i) and (ii). The ISBE is envisaged as an infrastructure where combinations of institutions/centres will focus on different biological problems by carrying out discovery-directed and hypothesis-driven research. These consortia will focus on distinct conceptual aspects of biology, such as model organisms, model cell populations, diseases, biotechnology, ecology etc. In addition, the development and application of new technologies will be the main task of a number of the ISBE partners. The combined expertise and facilities of the ISBE will serve the European Research Area by functioning as the entity for addressing important scientific problems, by disseminating technologies and by providing open access to data and software. Although ISBE institutions/centres will have complementary activities, each will typically support the following: de novo data generation, data extraction from all pre-existing sources, data management and curation, data analysis, model extraction from literature, de novo model generation and validation, visualisation and modelling, dynamic interaction of models and data, model-driven experimental design, and training.

4 Focus

The issue is how narrowly to focus. Systems Biology is so pervasive that an unfocused infrastructure could involve such a broad range of science as to become unmanageable. Alternatively, to focus on a single disease or biotechnological challenge would lose the advantage of much of the expertise that is important for Systems Biology (e.g. quantitative metabolomics, stochastic modelling). A focus on ‘multifactorial disease’ would eliminate important applications in white and green biotechnology and ecology.
Therefore, it is proposed that the focus of ISBE should be the integration of components, from molecules to organisms, towards a complete understanding of those living organisms, including the human, within the context of dynamic interactions with the environment.

5 The different branches of the ISBE: TAP, i.e. Tools, Activities & People

This section describes the different categories of components of the European Infrastructure that are necessary to accelerate Systems Biology. This is done by naming an issue and then giving some examples that identify the required expertise and activities. It should be noted that this is only a tentative list (a more comprehensive list will be produced in the preparatory start-up phase of ISBE, and will remain dynamic thereafter).

5.1 Tools

5.1.1 Connecting to model generation methodologies

Text mining for experimental data to populate families of parameter values. High-capacity literature mining for existing models, both mathematical and conceptual. Link with the Bioinformatics EIS. Reverse and forward modelling of various aspects of Systems Biology (molecular level, cell level, intercellular level, organism level, multiscale in terms of space, multiscale in terms of time, multiscale in terms of chemistry, multiscale in terms of biological hierarchy [transcription, translation, metabolism, function]). Stability analysis, control analysis, regulation analysis, differential equation modelling, Bayesian modelling, Boolean modelling, cellular automata, differential equation solvers, modularisation, flux balance analysis, high-throughput parallel computing, stochastic modelling, etc.

5.1.2 Connecting to experimental design methodologies

Model and parameter identification, mapping of experimental possibilities onto models, statistics, model comparison, hypothesis formulation, test design, development of new tools, robot scientist implementation, search for the required experimental facilities in the ISBE network.
5.1.3 Connecting to component-data generation methodologies

The development of a major experimental infrastructure for component-data generation: high-throughput genomics and transcriptomics (DNA array systems, highly parallel DNA spectrometers and advanced multiplexing technologies). Equipment relevant to Advanced Mass Spectroscopy and multiplex systems (array based) for analytical and quantitative proteomics and the identification of protein networks and assemblages.

5.1.4 Connecting to technology development

Systems Biology research programmes have invariably been held back by limitations in methodologies. Examples include: low rates of DNA sequencing (initially), lack of reproducibility of cultivation and sampling, inability to measure the concentrations of proteins and metabolites accurately, inability to identify new metabolites rapidly in biological samples etc. New technologies are being developed. The ISBE will connect demand with technology development, creating both a demand pull and a technology push to drive European Systems Biology.

5.1.5 Connecting to physiological expertise

For Systems Biology, the key issue is not only data generation but also the performance of functional experiments involving the organism or cell type of choice, under standardised conditions relevant for model testing. This requires: (a) expertise in batch cultures, tissue culture, fed-batch, chemostat, turbidostat, auxostat, retentostat and in vivo cultures (e.g. tumour cells injected into mice), (b) rapid and reproducible sampling, and (c) the shipment of samples to the data collection centres (see above). These will require the functional integration of ISBE with Biobanking, Clinical Research and Translational Research centres across Europe. We envisage functional interaction with the BBMRI, ECRIN and EATRIS ESFRI projects, respectively.
5.1.6 Connecting to data and software management facilities

Methodologies and expertise for high-capacity computing infrastructure, for open-access storage and management of large data sets and databases (petabyte level). We envisage collaboration with the ELIXIR ESFRI project. Maintenance and distribution of core software (tools) in Systems Biology. Connection to large-scale computing facilities. Collaboration with the EU infrastructure DEISA, and future HPC ESFRI, should be considered.

5.1.7 Connecting to data analysis and data management methodologies

The provision of mainstream and specialised computers to allow high-performance computing in order to analyse and visualise biological data, to model protein and multi-protein assemblies (nano-machines) and to simulate pathways, networks, cells, tissues and organs. Furthermore, ISBE Institutes will develop and implement GRID computing systems for massively parallel processing, integration with modelling, and model-driven data management.

5.2 Activities

5.2.1 Real-time, cross-laboratory data generation

This component should enable cross-laboratory experimentation, by short research visits as well as by web-based cross-laboratory experiments. Equipment relevant to Advanced Mass Spectroscopy and multiplex systems (array based) for analytical and quantitative proteomics and the identification of protein networks and assemblages. Gas Chromatography (GC), Liquid Chromatography (LC), Capillary Electrophoresis (CE) as well as NMR and Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) for metabolomics and fluxomics. Advanced high-throughput imaging systems, including microscopy, flow cytometry and automated cell analysis, should also be included to yield essential functional data. The infrastructure should incorporate advanced robotic systems for remotely controlled experimental stations.
5.2.2 Real-time, cross-laboratory modelling

Conversely, this component should allow experimental systems biologists to involve modellers in their day-to-day experimental work, again by short research visits as well as through web-based cross-laboratory modelling facilities. Workflows must make it possible to link data resources (experimental measurements, models and ontologies) with modelling tools and simulation environments. Multi-scale, multi-approach models, such as detailed whole-cell models or large physiological reconstructions, should be developed in a modular fashion, taking advantage of each site's particular expertise. It must be possible to derive specific instances of generic models using remote data for parameterisation, paving the way to personalised modelling.

5.2.3 Real-time, cross-laboratory model validation

The provision of facilities for data and model import workflows, with subsequent model validation. In Systems Biology this is a major issue, as there are usually multiple ways of designing experiments to validate models (e.g. by parameter fitting). These multiple ways should be documented. Model validation in Systems Biology therefore becomes a determination of the extent to which a multitude of models for a certain system or issue are validated by the data sets that exist. (The data sets may themselves have different qualities.) Again, cross-laboratory activities are proposed. On a day-to-day basis, a model developer may call upon expert groups in model validation design, as well as the experimental group carrying out the validation, all in different European laboratories.

5.2.4 Real-time, cross-laboratory data integration & curation and model integration & curation

Many data sets and models will become of great importance to Systems Biology. Hence, they will need to be carefully curated and documented.
This activity will set up ‘Jamborees’ for such curation (for example, to reach consensus on genome-wide maps of yeast and human metabolism).

5.2.5 Dynamic standardisation

The standardisation of experimentation and modelling is crucial for the rapid development of Systems Biology. Standards are indispensable for data integration and have to follow the evolution of techniques and knowledge. Therefore, standards should be adaptable and incorporate mechanisms by which they can be changed. Standards (whether reporting guidelines, data formats or ontologies, but also standard analysis tools) should only be adopted if software allows easy implementation (e.g. libSBML, Cytoscape, Bioconductor). Standards need continuous maintenance and professional-level support.

5.3 People

5.3.1 Personnel Training and Education

Biology and Medicine have been transformed by the advent of Systems Biology. However, industry and academia have great difficulty in identifying, on the one hand, life scientists with capability in large-scale data collection, modelling and data analysis and, on the other, engineers and physical scientists with an appreciation for the detail and the importance of the complexity of biological function. Most of the ISBE institutions/centres will have scientists with diverse expertise, ranging from mathematicians, physicists, engineers and computer scientists to biologists and medical doctors. Each will be involved in training and educating scientists in Systems Biology by providing both the necessary scientific personnel and infrastructures. More European training centres dedicated to specific aspects of Systems Biology will be developed and connected by the ISBE. The training infrastructure will connect training facilities, such as universities, throughout Europe, linking those where training is more theoretical with those where training focuses on experiments.
Training will be at all levels, from undergraduate through MSc, PhD and postdoctoral training, to retraining of industrial and academic staff. Regular courses will take place at universities, but summer schools, rotation projects and distance e-learning will also be part of the overall package. These will facilitate discipline hopping, as well as cross-discipline sabbaticals. An important issue is that part of the training will have to be done in non-traditional ways, emphasising collaboration and the integration of modelling and experimentation. This will create a critical training mass in Systems Biology. The ISBE will solve the problem that present-day students face with conservative curricula in the traditional disciplines.

5.3.2 Methodology and iterative calibration on what is needed for Europe

Experience with the (otherwise highly successful) biological sciences has shown that they have not been optimally tuned to the needs of European Society, nor have they been targeted at addressing current biological problems, which require Systems approaches. With the advent of Systems Biology, the situation has dramatically improved. From now on it should be possible to prevent such a lack of information flow between science and society. The ISBE should, therefore, house units that analyse the scientific methodology used (philosophy of science) and address the information flow between European Systems Biology, policy makers, and the European public (Public Health Management, ethics, well-tuned information about progress, information back from public and politicians).

5.3.3 Meetings

ISBE will coordinate Systems Biology conferences targeting development and applications.

5.3.4 Work-workshops and jamborees

The ISBE will organise workshops during which experimentalists and modellers meet for a number of days, with the assignment of producing a model of a defined biological process.
A regular series of such work-workshops (or ‘jamborees’) will enable the updating of models with advancing knowledge.

5.3.5 Platforms linking with industry

The ISBE will install platforms where scientists from its constituent institutions and scientists from industry and public health institutes can meet on a regular basis to discuss topics of mutual interest. This should enable the optimal steering of the ISBE towards public and industrial utility. These activities will also connect to those involved in the Innovative Medicines Initiative (IMI).

5.3.6 Embassies

The ISBE will have ‘embassies’ to help connect it to the outside world, such as Systems Biology infrastructures outside Europe. The ISBE will actively stimulate the organisation of similar infrastructures in other parts of the world, in part through its Scientific Advisory Board. The ISBE will foster close links with the other life science ESFRIs (e.g. BBMRI, ELIXIR, INSTRUCT) and EU networks of excellence such as BioSim and ENFIN (and successors).

6 Management

The ISBE will be a new phenomenon for the Life Sciences because of its scale and complexity. It will require strong management, unprecedented in Biology, and perhaps only comparable to that of CERN. The aim is that the management structure will crystallise in the set-up period of ISBE. It is recommended that a strong, yet responsive, individual lead the first phase, assisted by a secretariat and a steering committee representing the components of the infrastructure, the funding bodies and the public. The Steering Committee will organise calls for, and subsequent evaluations of, proposals for additions to the infrastructure. A Scientific Advisory Board (e.g. one consisting of Lee Hood, Hiroaki Kitano, Bernhard Palsson, Douglas Lauffenburger and Sang Yup Lee, chaired by Sydney Brenner) will be asked to check on the excellence of ISBE. The ISBE will be managed at various levels.
These include the level of the individual institutions and the overarching level. A CERN model may be used in the beginning, but may need to evolve into a structure that is more suitable for Systems Biology. Early in the preparatory phase of ISBE, memoranda of understanding between the component institutions will be put in place. Standards and IP issues will then be settled.

7 Timing

It is proposed that ISBE begins as a preliminary infrastructure in a three-year preparatory phase, during which it will grow into a full-blown infrastructure.