Title: Reproducible Research with R, LaTeX, and Sweave
Abstract:
In this one-hour seminar, we will first introduce the concept and importance of reproducible
research. We will then cover how we can use R, LaTeX, and Sweave to automatically generate
statistical reports to ensure reproducible research. Each software component will be briefly
introduced: R, the free interactive programming language and environment used to perform the
desired statistical analysis (including the generation of graphics); LaTeX, the typesetting system
used to produce the written portion of the statistical report; and Sweave, the flexible framework
used to embed the R code into a LaTeX document, to run the R code, and to insert the
desired output into the generated statistical report. The steps to generate a reproducible
statistical report from scratch using the three software components will then be presented using a
detailed example. The ability to regenerate the report when the data or analysis changes and to
automatically update the output will also be demonstrated. In addition, the seminar will provide
useful tips and needed resources/references.
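As a rough illustration of the idea, a Sweave source file might look like the minimal sketch below. The file name report.Rnw, the chunk labels, and the toy data are assumptions made up for this illustration; they are not taken from the seminar's detailed example.

```latex
% report.Rnw (hypothetical file name): a LaTeX document with embedded R chunks
\documentclass{article}
\begin{document}

% An R code chunk: starts with <<label, options>>= and ends with @
<<summary, echo=TRUE>>=
x <- c(2, 4, 6, 8)   # toy data, invented for this sketch
mean(x)
@

% \Sexpr{} inserts the value of an R expression into the running text
The sample mean is \Sexpr{mean(x)}.

% fig=TRUE tells Sweave to include the plot produced by this chunk
<<histogram, fig=TRUE, echo=FALSE>>=
hist(x)
@

\end{document}
```

Running `R CMD Sweave report.Rnw` executes the R chunks and writes report.tex, which can then be typeset with `pdflatex report.tex`. If the data in the first chunk change, repeating these two steps regenerates the report with updated output.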
Topics:
1. Reproducible research
2. R
3. Sweave
4. LaTeX
Goals:
1. To understand the concept and the importance of reproducible research.
2. To understand the role of each software component in the automatic generation of statistical
reports.
3. To understand the steps needed to generate a reproducible statistical report from scratch.
4. To understand how to regenerate the report when the data or analysis changes.
5. To be aware of the resources available in order to learn more.
Intended Audience:
Anyone interested in learning the importance of reproducible research and how to automatically
generate statistical reports. No prior experience with R, Sweave, or LaTeX is necessary, just a
desire to learn.
Speaker Description:
Ms. Scott has been a member of the Department of Biostatistics at Vanderbilt University since
April 2004. Over this four-year period, Ms. Scott has provided statistical support for the
Department of Obstetrics & Gynecology and the Division of General Pediatrics. In her
collaborative role, Ms. Scott has provided statistical expertise in the design and implementation
of research projects and has participated in their publication. Ms. Scott’s primary role, though,
has been to implement the planned statistical analysis of these research projects and to prepare
formal written reports summarizing and interpreting the data analysis results. Ms. Scott has been
using R, LaTeX, and Sweave to generate these statistical reports and to ensure reproducible
research since her first month at Vanderbilt. Ms. Scott has also used these software components
to produce the tri-monthly blinded and unblinded reports for the Data Safety Monitoring
Committee of a large worldwide Phase III randomized clinical trial.
In addition to her collaborative role, Ms. Scott teaches extensively, both formally and informally.
Specifically, in response to the growing number of faculty and staff at Vanderbilt (both within
and outside of the Biostatistics department) who are learning and using R, Ms. Scott has been
offering a weekly R Clinic since March 2005. The goal of the clinic is to provide a place where
R users of all levels can find help and learn R together. The topics of LaTeX and Sweave are
also often covered during the clinic. Ms. Scott has also compiled a two-part set of lecture notes titled
“An Introduction to the Fundamentals & Functionality of the R Programming Language”, which
cover both an overview and the nuts and bolts of the language. She has presented various forms
of these lecture notes numerous times at Vanderbilt, as a one-day short course at the R Users
conference, and as a two-day short course offered through the American Statistical Association.
Ms. Scott also plans to compile a formal series of lecture notes regarding the topic of this
seminar.