Nancy Wilkins-Diehr
San Diego Supercomputer Center wilkinsn@sdsc.edu
That reproducibility is a problem has already been established, but
• Brian Granger (IPython developer) talk at UCSD, May 2014
– Computing (thus software) is one of the foundations of data science
– Important decisions are being made based on these data
• Political, financial, institutional, peer review system, social
– Several recent examples of errors in academic data
• “Growth in a Time of Debt”, Reinhart and Rogoff (2010); critique by Herndon, Ash, and Pollin (2013)
• “Capital in the 21st Century”, Piketty (2014)
• BICEP2 (2014)
Software designed for specific purposes
• May do what it does well, but if it is not designed to enforce reproducibility, users will find reproducibility nearly impossible to achieve
– Excel – almost impossible to design a reproducible experiment
– GitHub – almost impossible not to design a reproducible experiment
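As one illustration of software designed to enforce reproducibility, a tool can write a provenance record for every run. The sketch below is a minimal, hypothetical Python example: the function name `run_manifest`, the record fields, and the sample input are all assumptions made for illustration, not part of any tool or gateway named in these slides.

```python
import hashlib
import json
import platform


def run_manifest(input_bytes, params, code_version):
    """Build a provenance record for one run (hypothetical schema)."""
    record = {
        # Hash of the exact input data, so silent edits to the data are detectable.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "params": params,
        # Supplied by the caller, e.g. a version-control commit identifier.
        "code_version": code_version,
        "python_version": platform.python_version(),
    }
    # Serialize with sorted keys so identical runs always yield the same ID.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["run_id"] = hashlib.sha256(canonical).hexdigest()[:16]
    return record


if __name__ == "__main__":
    # Made-up sample input and parameters, for demonstration only.
    manifest = run_manifest(b"year,value\n2010,90\n", {"weighting": "equal"}, "abc1234")
    print(json.dumps(manifest, indent=2))
```

Storing such a record next to each result is one possible way to make the inputs, parameters, and code version recoverable later. It does not by itself guarantee that the run can be repeated.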
Many have used science gateways to address reproducibility
• IPython notebooks
– Perez, Fernando, and Brian E. Granger. "An Open Source Framework For Interactive, Collaborative And Reproducible Scientific Computing And Education." (2013).
• Galaxy
– Goecks, Jeremy, Anton Nekrutenko, and James Taylor. "Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences." Genome Biol 11.8 (2010): R86.
• VisTrails
– Freire, Juliana. "Making computations and publications reproducible with VisTrails." Computing in Science & Engineering 14.4 (2012): 18-25.
• nanoHUB
– Lundstrom, Mark, and Gerhard Klimeck. "The NCN: science, simulation, and cyber services." Emerging Technologies-Nanoelectronics, 2006 IEEE Conference on. IEEE, 2006.
Issues for discussion
• What issues do gateway designers need to consider for reproducibility so they can follow the GitHub model and not the Excel model?
• What happens when a software framework itself goes away? What needs to be considered?
• What does it mean to be reproducible for the long term? How long? How is this possible?