Applied Epidemiology Using R
UC Berkeley School of Public Health
Syllabus for PH 251D, CC# 76238
Fall 2012, Mondays, 4pm-6pm, 214 Haviland Hall
Updated 2012-09-23

Faculty

Tomás J. Aragón, MD, DrPH
Principal Investigator, Cal PREPARE Systems Research
Center for Infectious Diseases & Emergency Readiness
UC Berkeley School of Public Health
Health Officer, City & County of San Francisco
San Francisco Department of Public Health
Email: tomas.aragon@sfdph.org
G-Phone: 415-78-SALUD (415-787-2583)

------

Michael C. Samuel, DrPH
Chief, Surveillance and Epidemiology Section
STD Control Branch, DCDC, CID
California Department of Public Health
Email: Michael.Samuel@cdph.ca.gov; Tel: 510-620-3198

Course description

This is an intensive one-semester introduction to the R programming language for applied epidemiology. R is a freely available, multi-platform (Linux, Mac OS, Windows, etc.), versatile, and powerful program for statistical computing and graphics (http://www.r-project.org). This course will focus on core basics of organizing, managing, and manipulating epidemiologic data; basic epidemiologic applications; introduction to R programming; and basic R graphics.

Target audience

This course is intended for epidemiologists, medical epidemiologists, data analysts, and demographers that want an introduction to the R language for epidemiologic applications.

Course prerequisites

Completion of one semester of epidemiology and one semester of bio/statistics.

ph251d_2012fall_syllabus.odt p. 1 of 4 Created in LibreOffice!
Course schedule

Date     | Wk | Topics | Lecturer
08/27/12 | 1  | Getting started with R | Tomás Aragón
09/03/12 | -- | Academic and Administrative Holiday | --
09/10/12 | 2  | Working with R data objects I (Vectors, Matrices, Arrays) – HW1 due | Tomás Aragón
09/17/12 | 3  | Working with R data objects II (Lists, Data Frames) | Tomás Aragón
09/24/12 | 4  | Managing epidemiologic data I (Entering, Editing, Transforming, Merging, Exporting) – HW2 due | Tomás Aragón
10/01/12 | 5  | Managing epidemiologic data II (Importing, Dates, Missing values, Regular expressions) | Tomás Aragón
10/08/12 | 6  | R programming I – HW3 due | Tomás Aragón
10/15/12 | 7  | Analyzing epidemiologic data (basic) | Tomás Aragón
10/22/12 | 8  | R programming II | Tomás Aragón
10/29/12 | 9  | Graphing epidemiologic data I | Michael Samuel
11/05/12 | 10 | Graphing epidemiologic data II (to be confirmed) | Michael Samuel
11/12/12 | -- | Academic and Administrative Holiday | --
11/19/12 | 11 | Analyzing epidemiologic data (regression) | Students
11/26/12 | 12 | Student Presentations | Students
12/03/12 | 13 | Student Presentations (last day of class) | Students

Course objectives

Upon completion of this course, participants will be able to:
• Use R as a scientific calculator and a functional spreadsheet;
• Enter, manage, and manipulate epidemiologic data in R;
• Conduct basic epidemiologic analyses in R;
• Graphically display epidemiologic data;
• Write basic R programs.

Course format

Lecture and computer demonstration. You are welcome to bring your laptop with R and RStudio pre-installed.

Course enrollment and fee

UC Berkeley students should register for Public Health 251D. Non-registered students who want to receive academic credit will need to register and pay the UC Extension fee (see http://extension.berkeley.edu/info/ConcurrentOverview.html). Auditors are welcome if there is seating space.

Course location and schedule

Schedule: Every Monday, 4:00 pm - 6:00 pm.
Location: 214 Haviland Hall

Course materials

• Applied Epidemiology Using R, by Tomás J. Aragón (required). Available at http://www.medepi.com. Updated chapters will appear weekly.
• Getting Started with RStudio, by John Verzani (highly recommended). Permalink: http://amzn.com/1449309038
• Epidemiology: An Introduction, by Kenneth J. Rothman (highly recommended). Permalink: http://amzn.com/0199754551
• An Introduction to R. Freely available at http://cran.r-project.org/doc/manuals/R-intro.pdf (also comes with the default installation).

Course web site

Documents will be posted at http://www.medepi.com; bSpace will be used for forums and secure documents.

Course requirements and evaluation

Grading and evaluation

For registered UC Berkeley/Extension students: Units: 2; Grading: Letter or S/U.
Grading will be based on weekly attendance (10%), homework (20%), and student project (70%). A grade of Satisfactory (S) or Passed (P) requires work at a minimum level of B.

Student homework

Homework will involve (a) completing problems at the end of each chapter, and (b) demonstrating the following skills:
1. Run the program file (filename1.r) using the 'source' command;
2. Demonstrate reading an ASCII data file (filename2.dat) to create a 'data frame';
3. Demonstrate simple data manipulation (e.g., variable transformation, recoding, etc.);
4. Demonstrate the use of calendar and Julian dates;
5. Conduct a simple analysis using existing functions (from R, colleagues, etc.);
6. Conduct a simple analysis demonstrating simple programming (e.g., a 'for' loop);
7. Conduct a simple analysis demonstrating an original function created by the student;
8. Create a simple graph with title, axis labels, and legend, and output to file;
9. Demonstrate the use of regular expressions;
10. Demonstrate the use of the 'sink' function to generate an output file.
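As a rough illustration of what several of the homework skills above might look like in practice, here is a minimal R sketch. The data values, the variable names (id, age, case, onset), and the function prop_cases are all made up for illustration, and the temporary files merely stand in for filename2.dat and for an output file of your choosing. It is one possible approach under those assumptions, not a model answer.

```r
# Hypothetical data: a tiny space-delimited ASCII file written to a temporary
# location so the sketch is self-contained (stands in for filename2.dat).
dat_file <- tempfile(fileext = ".dat")
writeLines(c("id age case onset",
             "1 34 1 2012-09-03",
             "2 51 0 2012-09-10",
             "3 27 1 2012-09-17"), dat_file)

# Skill 2: read the ASCII file into a data frame.
dat <- read.table(dat_file, header = TRUE, stringsAsFactors = FALSE)

# Skill 3: recode a variable (age in years -> age group).
dat$agegrp <- cut(dat$age, breaks = c(0, 30, 50, Inf),
                  labels = c("<=30", "31-50", ">50"))

# Skill 4: convert character dates to Date objects, then to day of year
# (one common meaning of "Julian date" in epidemiology).
dat$onset <- as.Date(dat$onset, format = "%Y-%m-%d")
dat$onset_doy <- as.numeric(format(dat$onset, "%j"))

# Skill 7: an original (hypothetical) function: proportion of cases.
prop_cases <- function(x) sum(x == 1) / length(x)

# Skill 6: a 'for' loop that counts cases one record at a time.
n_cases <- 0
for (i in seq_len(nrow(dat))) {
  if (dat$case[i] == 1) n_cases <- n_cases + 1
}

# Skill 9: a regular expression that flags onset dates in September.
sept_onset <- grepl("^[0-9]{4}-09-", format(dat$onset))

# Skill 10: divert printed output to a file with 'sink', then restore it.
out_file <- tempfile(fileext = ".txt")
sink(out_file)
print(table(dat$agegrp, dat$case))
cat("Proportion of cases:", prop_cases(dat$case), "\n")
sink()
```

Saving these commands in a program file and running it with source("filename1.r") would cover the first skill as well; the file name here is the placeholder used in the list above.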
Student project and presentation

• Prepare and submit a brief paper/article illustrating an epidemiologic method or topic using R. For examples, see short articles in the R Journal (http://journal.r-project.org).
• Students will present their papers during the last sessions of the course and submit them on the last day.

Course free and open source software (required)

• Install R on your computer (visit http://cran.cnr.berkeley.edu/)
• Install RStudio on your computer (visit http://www.rstudio.org)

Creating LaTeX documents (optional)

I use LaTeX to create high-quality typeset documents. You can create LaTeX documents online at http://www.scribtex.com/. RStudio can also be used to generate LaTeX documents, but LaTeX must be installed on your computer.

Additional readings & resources

• Official R manuals. (Available from http://cran.r-project.org/manuals.html)
• Contributed R tutorials. (Available from http://cran.r-project.org/other-docs.html)

Related fall courses

This course is not a statistics course, although we review some statistical methods. The following course is highly recommended as a complement to this one:
• PH 248 (3 units), Steve Selvin, Statistical/Computer Analysis Using R, Th 2pm-4pm, 330 Evans Hall; CCN 76202

R for epidemiologic computing

Where do I get R?
• Download from UC Berkeley at http://cran.cnr.berkeley.edu/
• R manuals available at http://cran.r-project.org/manuals.html
• R tutorials available at http://cran.r-project.org/other-docs.html

How can I get help with R?
• We will set up a bSpace forum for the course.
• Subscribe to R mailing lists at http://www.r-project.org/mail.html