SiZhe Xiao, GigaScience 2013, Open Access, POSTER

GigaDB – revolutionizing data dissemination, organization and use

Xiao Si Zhe¹, Chris Hunter, Tam P. Sneddon, Scott C. Edmunds, Alexandra T. Basford, Peter Li, and Laurie Goodman

Abstract

GigaScience, the online open-access, open-data journal, has recently developed GigaDB, a new integrated database of ‘big-data’ studies from the life and biomedical sciences. The initial goals of GigaDB are to assign DOIs to datasets so that they can be tracked and cited, and to offer a user-friendly web interface that gives easy access to selected GigaDB datasets and files. We will be working with authors to make the raw data, computational tools and data processing pipelines described in GigaScience papers available and, where possible, executable on an informatics platform. We hope that by making both the data and the processes involved in their analysis freely accessible, this novel form of publication will help articles published in GigaScience to have a much higher impact in the scientific literature, and will maximize their reuse within the community.

GigaDB currently accepts submissions in Excel format. Example submission and template files can be found on the website (http://gigadb.org/). To date, GigaDB comprises more than 56 datasets and includes genomic, transcriptomic, epigenomic and metagenomic dataset types, and we accept many other dataset types, including proteomic and neuroimaging studies.

Future goals include integration with the BGI Cloud and with the Galaxy software tools, so that users can upload files directly to Galaxy for further analysis. We are also working with ISA-TAB and other scientific standards groups to support and extend the usability and interoperability model.
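Because each GigaDB dataset is assigned a DOI, it can be resolved and cited much like an article. The following is a minimal sketch of that idea, using a dataset DOI, title and year that appear on this poster. The `DatasetRecord` type, the `cite` helper and the citation layout are hypothetical illustrations, not GigaDB's official citation format, and the publisher string is an assumption.

```python
from dataclasses import dataclass

# Public DOI resolver; any DOI appended to this prefix redirects to its landing page.
DOI_RESOLVER = "https://doi.org/"


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal description of a citable dataset."""
    doi: str
    title: str
    year: int
    publisher: str = "GigaScience Database"  # assumption, not stated on the poster

    def url(self) -> str:
        """Return a resolvable link for the dataset DOI."""
        return DOI_RESOLVER + self.doi


def cite(record: DatasetRecord) -> str:
    """Build a simple citation string (the layout is illustrative only)."""
    return f"{record.title} ({record.year}). {record.publisher}. {record.url()}"


# Example dataset taken from the poster.
macaque = DatasetRecord(
    doi="10.5524/100003",
    title="Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis)",
    year=2011,
)
print(cite(macaque))
```

Keeping the DOI as the only identifier means the citation stays valid even if the dataset's web address changes, which is the tracking benefit the abstract refers to.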
Keywords: DOI, Galaxy, big-data, database, informatics platform, GigaScience

Background

GigaDB home page: www.gigadb.org
[Figure: Datasets public in GigaDB]

Growing replication gap:
• 10 of 18 microarray papers cannot be reproduced
• Ioannidis: “Most Published Research Findings Are False”
• >15-fold increase in retracted papers in the last decade
• Lack of incentives to make data/methods available
• Poor metadata quality and lack of interoperability

GigaSolution: deconstructing the paper

Combine and integrate (via citable DOIs):
• Open-access journal: www.gigasciencejournal.com
• Data Publishing Platform: gigadb.org
• Data Analysis Platform: galaxy.cbiit.cuhk.edu.hk
• Aspera data transfer: faster download speeds

Linking papers to data and analyses:
• Open-Paper
• Open-Data: 78GB CC0 data
• Open-Pipelines
• Open-Workflows
• Analyses: DOI:10.5524/100044
• Data sets: DOI:10.5524/100038

GigaDB Submission Workflow

1. Submitter logs in to the GigaDB website and uploads an Excel submission (Excel submission file).
2. The submission is checked:
   • Fail – submitter is provided with an error report.
   • Pass – dataset is uploaded to GigaDB.
3. Curator review.
4. DOI assigned.
5. Curator contacts submitter with the DOI citation and to arrange file transfer (and resolve any other questions/issues).
6. Submitter provides files by ftp or Aspera (Files).
7. GigaDB XML is generated and registered with DataCite (DataCite XML file).
8. Curator makes the dataset public; this can be set to a future date if required (Public GigaDB dataset).

Example dataset: DOI 10.5524/100003, Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis) (2011)

Acknowledgements

Thanks to: Laurie Goodman, Chris Hunter, Scott Edmunds, Tam Sneddon (GigaScience), Shaoguang Liang (BGI-SZ), Qiong Luo, Senghong Wang, Yan Zhou (HKUST), Rob Davidson and Mark Viant (Birmingham Uni), Marco Galardini (Unifi)

Correspondence: jesse@gigasciencejournal.com

1. BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong SAR, China.
2. BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China.
3. School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
4. CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
5. HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pok Fu Lam, Hong Kong.
6. Oxford e-Research Centre, University of Oxford, Oxford, UK.

© 2013 Edmunds et al. This is an Open Access poster distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. doi:10.6084/m9.figshare.xxxxx

Cite this poster as: GigaGalaxy: A GigaSolution for reproducible and sustainable genomic data publication and analysis. Scott C. Edmunds, Peter Li, Huayan Gao, Chris Hunter, Si Zhe Zhao, Ruibang Luo, Dennis Chan, Alex Wong, Zhang Yong, Tin-Lap Lee, ISA-TAB team. figshare http://dx.doi.org/10.6084/m9.figsharexxxx

Submit your next manuscript containing large-scale data and workflows to GigaScience and take full advantage of:
• No space constraints, and unlimited data and workflow hosting in GigaDB and GigaGalaxy
• Article processing charges for all submissions in 2013 covered by BGI
• Open access, open data and highly visible work freely available for distribution
• Inclusion in PubMed and Google Scholar
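
The GigaDB submission workflow shown on this poster can be read as a linear pipeline with a single branch where the Excel submission either fails or passes. The sketch below models only that ordering. The `Stage` names, the `NEXT` table and the `run` helper are hypothetical labels chosen for illustration; they do not describe GigaDB's actual software.

```python
from enum import Enum, auto
from typing import Dict, List


class Stage(Enum):
    """Hypothetical labels for the workflow steps listed on the poster."""
    EXCEL_UPLOADED = auto()       # submitter uploads the Excel submission
    ERROR_REPORTED = auto()       # fail: submitter is provided an error report
    UPLOADED_TO_GIGADB = auto()   # pass: dataset is uploaded to GigaDB
    CURATOR_REVIEW = auto()
    DOI_ASSIGNED = auto()
    FILES_TRANSFERRED = auto()    # files provided by ftp or Aspera
    DATACITE_REGISTERED = auto()  # XML generated and registered with DataCite
    PUBLIC = auto()               # curator makes the dataset public


# Each stage that has a single successor; terminal stages are absent.
NEXT: Dict[Stage, Stage] = {
    Stage.UPLOADED_TO_GIGADB: Stage.CURATOR_REVIEW,
    Stage.CURATOR_REVIEW: Stage.DOI_ASSIGNED,
    Stage.DOI_ASSIGNED: Stage.FILES_TRANSFERRED,
    Stage.FILES_TRANSFERRED: Stage.DATACITE_REGISTERED,
    Stage.DATACITE_REGISTERED: Stage.PUBLIC,
}


def run(submission_is_valid: bool) -> List[Stage]:
    """Return the sequence of stages a submission passes through."""
    branch = Stage.UPLOADED_TO_GIGADB if submission_is_valid else Stage.ERROR_REPORTED
    path = [Stage.EXCEL_UPLOADED, branch]
    # Follow the single-successor table until a terminal stage is reached.
    while path[-1] in NEXT:
        path.append(NEXT[path[-1]])
    return path


for stage in run(submission_is_valid=True):
    print(stage.name)
```

A failed submission stops at `ERROR_REPORTED`, reflecting that the submitter must correct and re-upload the Excel file before a curator becomes involved.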