Enabling_European

Enabling European-Wide Sharing of Data in the Life Sciences

Biological research is being transformed from a laborious and costly data-gathering discipline into a highly collaborative science driven by systematic and (relatively) inexpensive data acquisition followed by complex analysis. As in other data-intensive sciences such as high-energy physics, astronomy and oceanography, large datasets drive discoveries and form a bedrock of information on which life scientists plan, execute and understand future investigations.

Much of biology relies on good, accurate cataloguing of facts about biological systems, from the DNA sequence of genomes through the three-dimensional structures of proteins to the arrangement of molecules in molecular pathways. Since the mid-1970s systematic databases of known molecules have been developed, first for protein structure (PDB) and then, in the 1980s, for DNA sequence (EMBL, GenBank). These databases, and others like them, have provided the bedrock for many discoveries, both planned and serendipitous, over the decades.

The remarkable diversity addressed in life science encompasses 7 billion people worldwide, over 8 million eukaryotic species and at least 10 million bacterial species. Among eukaryotes, individuals themselves can be a complex assemblage of cells, tissues and commensal organisms. This complexity and diversity are altered continuously through the process of evolution, making data management a daunting undertaking. Living organisms respond to and interact with their environment, often through mechanisms that are only partly understood. This observational and experimental complexity makes metadata and provenance acquisition complex but critical. It also means that life science arguably provides the most complex and heterogeneous datasets that science can currently imagine.

Life science needs a new approach. The advent of high-throughput sequencing technologies has created a deluge of data.
Most life-science data archives double in size every 9-12 months, and some disciplines grow even faster: proteomics databases, for example, currently double in size every 4-5 months. With high-content biology and, in particular, sequence-based biological assays becoming routine at every major bio-research centre, and accessible to most of Europe’s life-science researchers, we need to connect data management, standards and services between all stakeholders, from local research institutes through to global core reference data archives.

Data-driven analysis and research rely on a large and growing number of reference data resources and biological knowledge-bases that serve all life-science disciplines, alongside focused resources that are small but critically important for a single community. In Europe alone there are over 1,800 bioinformatics resources (http://www.elixir-europe.org/documents/final-reportstrategy-data-resources). Data needs to be Findable, Accessible, Interoperable and Re-usable (FAIR) to generate value for a research community beyond the initial researcher’s laboratory. The importance of long-term stewardship is highlighted by the observation that the odds of retrieving the data from a scientific publication decline by 17% per year. Life-science data infrastructure must be able to cope with the aggregation, annotation and functional integration of data from thousands of laboratories across Europe, as well as the access demands of users worldwide (for example, the Human Protein Atlas received more than 750,000 visits during 2013).

ELIXIR, established in 2014 as a legal entity, brings together Europe’s major life-science data archives and, for the first time, connects these with national bioinformatics infrastructures. By coordinating local, national and international resources, the ELIXIR infrastructure will meet the data-related needs of Europe’s 500,000 life scientists.
This scalable infrastructure connects and sustains life science’s core data archives and provides standards, tools and training for data stewardship. ELIXIR is an Open Infrastructure: it does not “own” all data resources in Europe. ELIXIR provides a coordinated ELIXIR Interoperability Backbone that allows partners (e.g. other Research Infrastructures, national resources, institutional archives) to make use of existing resources and to connect and interoperate their own resources. Providing a sustainable infrastructure that manages data identifiers, secures data archiving and access, and ensures mappings between resources will enable long-term, cost-effective data management and drive “standards as the default” across the life sciences.

This talk will make the case, through examples of value and reuse, that ‘Open data’ needs to go beyond disclosure: to have an impact on future research projects, data needs to be managed. Findable, Accessible, Interoperable and Re-usable research data requires infrastructure and well-trained experts who support users.
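
The growth and decay figures quoted in this abstract (archives doubling every 9-12 months, proteomics databases every 4-5 months, and retrieval odds falling by 17% per year) can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes steady exponential growth and a constant yearly decline that compounds multiplicatively, neither of which the abstract states, and the function names are hypothetical.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Factor by which an archive grows over 12 months.

    Assumes steady exponential growth with the given doubling time.
    """
    return 2 ** (12 / doubling_months)


def odds_remaining(years: float, yearly_decline: float = 0.17) -> float:
    """Fraction of the original retrieval odds left after `years`.

    Assumes the 17% yearly decline compounds multiplicatively.
    """
    return (1 - yearly_decline) ** years


if __name__ == "__main__":
    for months in (12, 9, 5, 4):
        factor = annual_growth_factor(months)
        print(f"doubling every {months} months -> x{factor:.1f} per year")
    for years in (1, 5, 10):
        print(f"after {years} years -> {odds_remaining(years):.0%} of the original odds")
```

Under these assumptions, a 9-12 month doubling time corresponds to roughly 2.5-fold to 2-fold growth per year, a 4-5 month doubling time to roughly 8-fold to 5.3-fold growth per year, and ten years of a 17% yearly decline leaves about 16% of the original odds of retrieving the data.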
