Nancy Wilkins-Diehr


Science Gateways and their role in Reproducibility

San Diego Supercomputer Center
wilkinsn@sdsc.edu

That reproducibility is a problem has already been established, but

• Brian Granger (IPython developer), talk at UCSD, May 2014

– Computing (thus software) is one of the foundations of data science

– Important decisions are being made based on these data

• Political, financial, institutional, peer review system, social

– Several recent examples of errors in academic data

• “Growth in a Time of Debt”, Reinhart and Rogoff (2010); critique by Herndon, Ash, and Pollin (2013)

• “Capital in the 21st Century”, Piketty (2014)

• BICEP2 (2014)

Software designed for specific purposes

• Software may do what it was built to do well, but if it is not designed to enforce reproducibility, it will be nearly impossible for a user to achieve reproducible results with it

– Excel – almost impossible to design a reproducible experiment

– GitHub – almost impossible not to design a reproducible experiment

Many have used science gateways to address reproducibility

• IPython notebooks

– Perez, Fernando, and Brian E. Granger. "An Open Source Framework for Interactive, Collaborative and Reproducible Scientific Computing and Education." (2013).

• Galaxy

– Goecks, Jeremy, Anton Nekrutenko, and James Taylor. "Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences." Genome Biol 11.8 (2010): R86.

• VisTrails

– Freire, Juliana. "Making computations and publications reproducible with VisTrails." Computing in Science & Engineering 14.4 (2012): 18-25.

• nanoHUB

– Lundstrom, Mark, and Gerhard Klimeck. "The NCN: science, simulation, and cyber services." Emerging Technologies-Nanoelectronics, 2006 IEEE Conference on. IEEE, 2006.

Issues for discussion

• What issues must gateway designers consider so that their gateways follow the GitHub model of reproducibility rather than the Excel model?

• What happens when the software framework itself goes away? What needs to be considered in that case?

• What does it mean to be reproducible for the long term? How long? How is this possible?
