Introduction to R
Lecture 1: Getting Started
Andrew Jaffe
8/30/10

Lecture 1
• Course overview
• What is R?
• Installing R
• Installing a text editor
• Interfacing text editor with R
• Writing scripts
• Using R as a calculator
• Assignment

About the Course
• Series of 7 seminars
• Covers the usage of R
  – Platform for beginning analyses
  – NOT covering statistics
  – Good programming etiquette
• Bring your laptop: there will be breaks to allow you to practice the code

About the Course
• This seminar is 1 unit, pass/fail
• To pass, attend 5 out of 7 seminars
• Very little outside work

About the Course
• Some learning objectives include:
  – Importing/exporting data
  – Data management
  – Performing calculations
  – Recoding variables
  – Producing graphics
  – Installing packages
  – Writing functions

About the Course
• Course communication via e-mail
• Lectures and code will be hosted on my webpage
  – http://www.biostat.jhsph.edu/~ajaffe/rseminar.html

About the Instructor
• 3rd year PhD student in the Genetic Epi program, concurrent MHS in Bioinformatics
• Learned R five years ago; have been using it regularly for the last two

What is R?
• R is a language and environment for statistical computing and graphics
• R is an open source implementation of the S language, which was developed at Bell Laboratories
• R is both open source and open development
• http://www.r-project.org/

What is R?
• Pros:
  – Free
  – Tons of packages, very flexible
  – Can work with multiple datasets at the same time
• Cons:
  – Much more “programming” oriented
  – Minimal interface
• These are my personal opinions

What is R?
• Often a good first step for data cleaning and manipulation
• Then, export data to STATA or SAS for Epi analyses

What is R?
[Screenshot: the R console and a script window]

Installing R
• http://cran.r-project.org/

Installing R - Windows
• Windows: click “base” and download

Installing R - Windows
• Click the link to the latest build

Installing R - Mac
• Mac: click the latest package’s .pkg file

Installing R
• Double click the downloaded file
• Hit ‘next’ a few times
• Use default settings
• Finish installing

Installing a Text Editor
• Windows: R’s built-in text editor is terrible
  – It’s essentially Windows Notepad
  – We will download a much better one
• Mac: R’s built-in text editor is sufficient
  – Color coding, signals parenthesis closing, etc.
  – I suggest using this until you think you need a better one

Installing a Text Editor
• I prefer Notepad++:
  – http://notepad-plus-plus.org/
  – Download the current version: http://download.tuxfamily.org/notepadplus/5.7/npp.5.7.Installer.exe
  – Install on your computer using defaults

Interfacing with R
• Scripts: documents that contain reproducible R code and functions that you can send to the console (and save)
  – Files are designated with the “.R” extension
  – You can “source” scripts (more later)
• Console: type commands directly into the console
  – Good for looking at your data, trying things, and plotting

Interfacing with R - Mac
• Mac: File → New Script
• This opens the default text editor
• To send a line of code to the R console, press Apple+Enter when the cursor is anywhere on that line
• Highlight chunks of code and press Apple+Enter to send

Interfacing with R - Windows
• Using the default text editor, pressing Ctrl+R sends lines to the console
• However, we want to use Notepad++
• We need to download one more thing…

Interfacing with R - Windows
• “NppToR”: Notepad++ to R
• http://sourceforge.net/projects/npptor/
• It must be running when R and Notepad++ are open
• When properly configured, press F8 to send lines of code, or highlighted chunks, to the console
• I will help configure this after class today

Interfacing with R - Windows
• More detailed instructions for installing NppToR
• http://sourceforge.net/apps/mediawiki/npptor/index.php?title=Installing
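
To make the idea of “sourcing” a script concrete, here is a minimal sketch that can be pasted into the console. It is not the course's own code: the temporary file and the names script_path, demo_x, and demo_msg are hypothetical and exist only for this illustration.

```r
# Write a tiny script to a temporary .R file
# (hypothetical contents, for illustration only)
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "# demo script: define two objects",
  "demo_x <- 2 + 3",
  "demo_msg <- \"hello from a sourced script\""
), con = script_path)

# source() runs every line of the file, as if each line
# had been sent to the console one after another
source(script_path)

# The objects defined in the script are now available
demo_x
demo_msg
```

Sending lines from the text editor and calling source() end up doing the same thing; source() is just convenient when you want to run a whole saved file at once.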

Writing Scripts
• The comment symbol is # (pound) in R
• Comment liberally - you should be able to understand a script after not seeing it for 6 months
• Lines of #’s are useful to separate sections
• Useful for designating headers

Writing Scripts
  #################
  # Title: Demo R Script
  # Author: Andrew Jaffe
  # Date: 7/30/10
  # Purpose: Demonstrate comments in R
  ###################

  # this is a comment, nothing to the right of it gets read
  # this # is still a comment - you can use as many #'s as you want

  # sometimes you have a really long comment, like explaining what you
  # are doing for a step in analysis. Take it to a second line

Writing Scripts
• Some common etiquette:
  – You can use spaces (more generally “white space”) within functions and commands liberally as well
  – Try to keep a reasonable number of characters per line; many commands can be broken into multiple lines
  – More to come later…

R as a Calculator
• The R console functions as a full calculator
• Try to play around with it:
  – +, -, *, / are add, subtract, multiply, and divide
  – ^ or ** is power
  – ( and ) work with order of operations
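
As a quick way to try the calculator operators, the lines below can be sent to the console one at a time. This is only an illustrative sketch; the numbers are arbitrary.

```r
# Basic arithmetic
2 + 3      # addition
7 - 4      # subtraction
6 * 5      # multiplication
9 / 2      # division: 4.5, the result is not truncated

# Powers: ^ and ** do the same thing
2 ^ 10     # 1024
2 ** 10    # 1024

# Parentheses change the order of operations
2 + 3 * 4      # 14: multiplication happens before addition
(2 + 3) * 4    # 20: the parentheses force the addition first

# ^ binds more tightly than a leading minus sign
-2 ^ 2         # -4, read as -(2 ^ 2)
(-2) ^ 2       # 4
```

When in doubt about the order of operations, adding parentheses costs nothing and makes the intent obvious to whoever reads the script later.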

Assignment
• The assignment operator: assigning a value to a name
• R accepts two operators: “<-” and “=”
  – e.g. x=8 (remember whitespace!: x = 8, x <- 8)
• Variable names are case-sensitive
  – e.g. X and x are different
• Set x = 8, and try using calculator functions on x

Assignment
• ‘Assignment’ literally puts whatever is on the right side of the operator into the variable on the left-hand side
  – Note that although you can name variables almost anything, you might run into issues if you give them the same names as default R functions
  – Notepad++ colors function names red/pink, so you can tell

Examples of assignment, introducing R data
• Enough to get R up and running if this is the only class you attend
• We will see them in much more detail over the next three sessions

Assignment
  status <- c("case", "case", "case", "control", "control", "control")
  status
  class(status)
  table(status)
  factor(status)
  # alternatively: status <- c(rep("case", 3), rep("control", 3))

Assignment
• web <- "http://www.biostat.jhsph.edu/~ajaffe/code/lec1_code.R"
  – class(web)
  – source(web)
• You also don’t have to save tables/data you find online to your disk (note read.table works for most things; the examples below aren’t tables, though)
  – scan(web, what = character(0), sep = "\n")
  – scan("http://www.google.com", what = character(0))

Assignment
  mat <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE) # this is sourced in
  class(mat)
  mat
  mat + mat
  mat * mat
  mat %*% mat

Assignment
• class(dat) # dat is also sourced in
• head(dat)
• table(dat$sex, dat$status)
• …To be continued…

Questions?
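
As a recap of the assignment slides, here is a minimal sketch to run line by line. It does not depend on the sourced course script. The names elementwise and matprod, and the deliberately bad variable name c, are hypothetical and used only for illustration.

```r
# Both operators assign; names are case-sensitive
x <- 8
X = 10          # X is a different object from x
x * 2
x + X

# Assignment overwrites whatever the name held before
x <- x + 1      # x is now 9

# Naming an object after a built-in function is allowed, but confusing
c <- "oops"     # hypothetical bad name, shown only as a warning
c(1, 2)         # R still finds the function c() when it is called
rm(c)           # remove our object again

# Elementwise versus matrix multiplication
mat <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE)
elementwise <- mat * mat     # multiplies matching entries
matprod     <- mat %*% mat   # true matrix product
elementwise
matprod
```

The last two results differ: * works entry by entry, while %*% follows the rules of linear algebra, so it is worth checking which one an analysis actually needs.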