Supplementary Table 1: Sharing Initiatives

Commercial
DNAnexus (http://sra.dnanexus.com/): Hosts the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) for raw sequence data from next-gen sequencing platforms. Built a user interface and mirrors 300-400 terabytes of SRA data (no medical data) on Google’s cloud. Offers proprietary cloud-based analysis and visualization tools that users can share.
Illumina (https://basespace.illumina.com/home): Data-sharing, analysis and storage for Illumina platform users. BaseSpace, a genomics data-sharing space on Amazon’s AWS cloud infrastructure, is in beta testing and requires user registration.
Life Technologies Ion Torrent Community (http://www.iontorrent.com/community/): User portal to share data, protocols and code. The Ion Torrent Community sharing portal requires user registration.
Complete Genomics (http://www.completegenomics.com/services/data-management-analysis/): Offers sequencing services, data management, analysis, and results sharing. Downstream analysis and data-sharing services are invading the market of software service providers.
Genedata (http://www.genedata.com/professional-service/data-analysis.html): Software has built-in sharing functionality for power users and workflow users. Positioned for pharma outsourcing and public-private projects, such as Europe’s InnoMed PredTox (http://www.imi.europa.eu/content/pilot-project-innomed).
GenomeQuest (http://www.genomequest.com/technology): Sharing functionality is built into analysis tools. As customers’ sharing behavior changes, there is less interest in data storage and more sharing of analysis results than of raw data.
ID Business Solutions (http://www.idbs.com/products-and-services/inforsense-suite/): Software and consulting firm with a platform for data analysis and data integration, InforSense Suite. The Lung Genomics Research Consortium expanded one of the suite’s components, ClinicalSense, for its data analysis and sharing portal.

Non-commercial alliances
Pistoia Alliance (http://www.pistoiaalliance.org/): Collaborative group of pharma and life sciences companies exploring precompetitive data-sharing. Launches data-sharing projects for next-gen sequence data and biomarker exchange standards. Runs competitions, for example the Sequence Squeeze Competition, which sought an algorithm to compress next-gen sequence data.
BioIT Alliance (http://bioitalliance.org/): Founded by Microsoft, now a non-profit organization. Seeks to create standards (data models and transmission standards) to enable data-sharing in translational medicine.

Non-profit initiatives
BioSharing (http://Biosharing.org): International network of organizations geared toward data-sharing and standardization in the life sciences. Developed a standard called ISA Commons to streamline data sharing.
crowdLabs (http://www.crowdlabs.org/): Repository for computational workflows (not only in the life sciences); offers access to high-performance computing. Uses VisTrails, an open-source workflow system.
Galaxy (http://galaxy.psu.edu/): Web- and cloud-based open-source sequence analysis tools. Galaxy Pages lets users see, re-use, and extend workflows (http://wiki.g2.bx.psu.edu/Learn/Galaxy%20Pages).
myExperiment Virtual Research Environment (http://www.myexperiment.org/): Collaboration between the universities of Southampton, Manchester and Oxford in the UK. Platform to share workflows; users can share workflows openly or keep them private.
National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov): Online resources with databases and analysis tools; a division of the National Library of Medicine at the National Institutes of Health. Runs the DNA sequence resource GenBank in collaboration with EMBL and the DNA Data Bank of Japan. Dozens of terabytes of data are downloaded from NCBI resources every day.
Sage Bionetworks (http://sagebase.org/): A non-profit focused on sharing science, founded by former Merck researchers Stephen Friend and Eric Schadt. Launches research collaborations, for example the public-private CommonMind Consortium to share neuropsychiatric disease data.
World Wide Web Consortium Semantic Web (http://www.w3.org/2001/sw/): Part of the international community organization World Wide Web Consortium (W3C). Has groups devoted to data-sharing in the life sciences, for example the Semantic Web Health Care and Life Sciences Interest Group.
Workflow 4 Ever (http://www.wf4everproject.org/web/guest/home): Web-based resource to preserve and share methods and workflows. Has partners in genomics and astronomy. Complementary to SHIWA (Sharing Interoperable Workflows for large-scale scientific simulations on distributed computing infrastructures).

Sharing networks and repositories
BioPortal (http://bioportal.bioontology.org/): Repository run by the National Center for Biomedical Ontology, part of the National Centers for Biomedical Computing. Its portal stores over 300 controlled vocabularies and ontologies in biomedicine. Users can download ontologies and submit their own to share with others.
Concept Web Alliance (http://www.nbic.nl/aboutnbic/affiliatedorganisations/cwa/introduction/): Group effort addressing semantic web applications, based at the Netherlands Bioinformatics Centre. Establishing a uniform, user-friendly online platform for text-mining from published texts, databases, and offline resources.
Cytoscape (http://www.cytoscape.org/): Open-source software to analyze and visualize biological networks. Developers are working on a database for sharing network models.
Datacite (http://datacite.org/): A non-profit, international organization of libraries. Offers a service for data publishers to mint Digital Object Identifiers (DOIs) for data-sharing; DOIs are also available for datasets. Datacite is compiling a list of research data repositories and working on ways to use DOIs to retrieve metadata.
Force11 (http://force11.org/): A group of editors, publishers, scientists, librarians, and research funders. Formed in 2011 to explore new ways to share, create, and communicate scholarly knowledge.
Genocoding Project (http://text.soe.ucsc.edu/): A data harvesting initiative based at the University of California, Santa Cruz and the University of Manchester. Its software tool scans journal papers for genomic identifiers and maps them to the human genome.
Nanopublications (http://www.nanopub.org): A venture seeking to use semantic tools to harvest assertions and to […] them with DOIs. Nanopublications are being tested in the Open Pharmacological Concepts Triple Store (Open PHACTS), a European public-private venture (http://www.openphacts.org).

EMBL: European Molecular Biology Laboratory.
Sources: Nature Biotechnology research, Frost & Sullivan, company data.