GigaDB

SiZhe Xiao GigaScience 2013
Open Access
POSTER
GigaDB – revolutionizing data dissemination, organization
and use
Xiao Si Zhe1, Chris Hunter, Tam P. Sneddon, Scott C. Edmunds, Alexandra T. Basford, Peter Li, and Laurie Goodman.
Abstract
GigaScience, the online open-access, open-data journal, has recently developed GigaDB, a new integrated database of ‘big-data’ studies from the life and biomedical sciences. The initial goals of GigaDB are to assign DOIs to datasets so that they can be tracked and cited, and to offer a user-friendly web interface that gives easy access to selected GigaDB datasets and files.
We will be working with authors to make the raw data, computational tools and data processing pipelines described in GigaScience papers available and, where possible, executable on an informatics platform. We hope that by making both the data and the processes involved in their analysis freely accessible, this novel form of publication will help articles published in GigaScience to have a much higher impact in the scientific literature and will maximize their reuse within the community.
GigaDB currently accepts submissions in Excel format. Example submission and template files can be found on the website (http://gigadb.org/). To date, GigaDB comprises over 56 datasets covering genomic, transcriptomic, epigenomic and metagenomic dataset types, and we accept many other dataset types, including proteomic and neuroimaging studies.
Future goals include integration with the BGI Cloud and with the Galaxy software tools, so that users can upload files directly to Galaxy for further analysis. We are also working with ISA-Tab and other scientific standards groups to support and extend the usability and interoperability model.
Keywords: DOI, Galaxy, big-data, database, informatics platform, GigaScience
Background
Growing replication gap:
• 10/18 microarray papers cannot be reproduced
• Ioannidis: “Most Published Research Findings Are False”
• >15X increase in retracted papers in last decade
• Lack of incentives to make data/methods available
• Poor metadata quality and lack of interoperability
GigaDB
Home page: www.gigadb.org
[Figure: Datasets public in GigaDB]
GigaSolution: deconstructing the paper
Combine and integrate (via citable DOIs):
• Open-access journal: www.gigasciencejournal.com
• Data Publishing Platform: gigadb.org (Aspera data transfer, faster download speeds)
• Data Analysis Platform: galaxy.cbiit.cuhk.edu.hk
Linking papers to data and analyses
• Open-Paper
• Open-Data
• Open-Pipelines
• Open-Workflows
• 78GB CC0 data
• Analyses: DOI:10.5524/100044
• Data sets: DOI:10.5524/100038
GigaDB Submission Workflow
1. Submitter logs in to GigaDB website and uploads Excel submission file.
   • Fail – submitter is provided error report.
   • Pass – dataset is uploaded to GigaDB.
2. Curator review.
3. DOI assigned.
4. Curator contacts submitter with DOI citation and to arrange file transfer (and resolve any other questions/issues).
5. Submitter provides files by ftp or Aspera.
6. XML file is generated and registered with DataCite.
7. Curator makes dataset public (can be set as a future date if required).
8. Public GigaDB dataset.
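The submission workflow can be read as a small state machine. The sketch below is one possible reading of the flowchart, not GigaDB's actual implementation: the stage names, the ordering of the stages, and the resubmission step after an error report are all assumptions made for illustration.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stage names, one per box in the flowchart."""
    EXCEL_UPLOADED = auto()       # submitter uploads Excel submission
    ERROR_REPORTED = auto()       # fail: submitter is provided error report
    IN_GIGADB = auto()            # pass: dataset is uploaded to GigaDB
    CURATOR_REVIEW = auto()
    DOI_ASSIGNED = auto()
    SUBMITTER_CONTACTED = auto()  # DOI citation sent, file transfer arranged
    FILES_PROVIDED = auto()       # files sent by ftp or Aspera
    DATACITE_REGISTERED = auto()  # XML generated and registered with DataCite
    PUBLIC = auto()               # curator makes dataset public


# Allowed transitions between stages (assumed ordering).
NEXT = {
    Stage.EXCEL_UPLOADED: {Stage.ERROR_REPORTED, Stage.IN_GIGADB},
    # Assumption: after an error report the submitter uploads again.
    Stage.ERROR_REPORTED: {Stage.EXCEL_UPLOADED},
    Stage.IN_GIGADB: {Stage.CURATOR_REVIEW},
    Stage.CURATOR_REVIEW: {Stage.DOI_ASSIGNED},
    Stage.DOI_ASSIGNED: {Stage.SUBMITTER_CONTACTED},
    Stage.SUBMITTER_CONTACTED: {Stage.FILES_PROVIDED},
    Stage.FILES_PROVIDED: {Stage.DATACITE_REGISTERED},
    Stage.DATACITE_REGISTERED: {Stage.PUBLIC},
    Stage.PUBLIC: set(),
}


def advance(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in NEXT[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target


def follow(path):
    """Walk a list of stages in order and return the final stage."""
    stage = path[0]
    for target in path[1:]:
        stage = advance(stage, target)
    return stage


# A submission that passes on the first upload.
HAPPY_PATH = [
    Stage.EXCEL_UPLOADED,
    Stage.IN_GIGADB,
    Stage.CURATOR_REVIEW,
    Stage.DOI_ASSIGNED,
    Stage.SUBMITTER_CONTACTED,
    Stage.FILES_PROVIDED,
    Stage.DATACITE_REGISTERED,
    Stage.PUBLIC,
]
```

Calling `follow(HAPPY_PATH)` ends at `Stage.PUBLIC`, while attempting `advance(Stage.DOI_ASSIGNED, Stage.PUBLIC)` raises `ValueError`, reflecting that a dataset is not made public before its files are received and its metadata registered.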
Example public GigaDB dataset: DOI 10.5524/100003, Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis) (2011)
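Because each dataset is identified by a DOI, a citation written in any of the forms used on this poster can be turned into a resolvable link. The following is a minimal sketch: the helper names are hypothetical, the `10.5524` prefix is taken from the DOIs shown here, and the pattern check is a loose approximation rather than a full DOI syntax validator.

```python
import re

# Prefix shared by the GigaDB DOIs shown on this poster.
GIGADB_PREFIX = "10.5524"

# Leading text to strip: a resolver URL, or "doi"/"DOI" with optional colon.
_LEADING = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:?\s*)", re.IGNORECASE)


def normalize_doi(text):
    """Return the bare DOI, e.g. 'DOI:10.5524/100044' -> '10.5524/100044'."""
    doi = _LEADING.sub("", text.strip())
    # Loose shape check (assumption): '10.', a numeric prefix, '/', a suffix.
    if not re.fullmatch(r"10\.\d{4,9}/\S+", doi):
        raise ValueError(f"not a recognisable DOI: {text!r}")
    return doi


def doi_url(text):
    """Build a link through the public doi.org resolver."""
    return "https://doi.org/" + normalize_doi(text)


def is_gigadb_doi(text):
    """True if the DOI uses the GigaDB prefix seen on this poster."""
    return normalize_doi(text).startswith(GIGADB_PREFIX + "/")


print(doi_url("DOI 10.5524/100003"))  # https://doi.org/10.5524/100003
print(is_gigadb_doi("DOI:10.5524/100044"))  # True
```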
Acknowledgements
Thanks to:
Laurie Goodman, Chris Hunter, Scott Edmunds, Tam Sneddon (GigaScience),
Shaoguang Liang (BGI-SZ), Qiong Luo, Senghong Wang, Yan Zhou (HKUST), Rob
Davidson and Mark Viant (Birmingham Uni), Marco Galardini (Unifi)
Financial support from:
Correspondence: jesse@gigasciencejournal.com
1. BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong SAR, China.
2. BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China.
3. School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
4. CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
5. HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pok Fu Lam, Hong Kong.
6. Oxford e-Research Centre, University of Oxford, Oxford, UK.
© 2013 Edmunds et al. This is an Open Access poster distributed under the terms of the Creative Commons Attribution
License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
doi:10.6084/m9.figshare.xxxxx
Cite this poster as: GigaGalaxy: A GigaSolution for reproducible and sustainable genomic data
publication and analysis. Scott C. Edmunds, Peter Li, Huayan Gao, Chris Hunter, Si Zhe Zhao, Ruibang
Luo, Dennis Chan, Alex Wong, Zhang Yong, Tin-Lap Lee, ISA-TAB team. figshare
http://dx.doi.org/10.6084/m9.figsharexxxx
Submit your next manuscript containing large-scale data and workflows to GigaScience
and take full advantage of:
• No space constraints, and unlimited data and workflow hosting in GigaDB and
GigaGalaxy
• Article processing charges for all submissions in 2013 covered by BGI
• Open access, open data and highly visible work freely available for distribution
• Inclusion in PubMed and Google Scholar