Getting Started in R

This is a short tutorial to help you familiarize yourself with R syntax. More thorough references are on the course webpage.

1. Overview: A bit on the structure of R

   a.) R is an interpreted language rather than a compiled language, which means that R executes commands directly as you enter them. This makes R easy to use (especially for non-programmers like me) because the user does not need to build a complete program before running code.

   b.) R is object oriented. Named objects are stored in the active memory of R. Type ls() to see which objects you have stored.

   c.) An R function is an important type of object. One can perform actions on objects with functions and operators. Examples of operators include binary operators such as "+" and "-". For a list of other operators, see Paradis pg. 29.

   d.) The help() and help.search() functions allow you to access the R help files. For example, type help(runif) to learn about the function runif(), which generates uniformly distributed random variables.

2. Storing "data" in R: Several types of R objects are useful for storing data. The examples below show how to store data in vectors, matrices, data frames, and lists.

   a.) vectors

       v <- c(1,3,5,7,11)
       v    #v is a vector
       s1 <- seq(from=1, to=10, length.out=100); s2 <- seq(from=1, to=10, by=.1)
       s1; s2
       #s1 and s2 are vectors. s1 consists of 100 equally spaced numbers from 1 to 10.
       #s2 consists of the numbers 1, 1.1, 1.2, ..., 9.9, 10.
       l <- 1:30    #l is a vector consisting of the first 30 positive integers.

   b.) matrices and arrays

       #store the first 30 positive integers in a 6x5 matrix:
       m <- matrix(l,6,5,byrow=TRUE)    #fill the rows
       m2 <- matrix(l,6,5)    #fill the columns
       m; m2
       #store the first 30 positive integers in a 2x3x5 array
       #(the dimensions must be passed as a single vector)
       A <- array(l, dim=c(2,3,5))

   c.) data frames: Data frames are more flexible than matrices. The columns of a data frame can contain different types of entries.

       names <- c("John", "Ashley", "Mike")
       height <- c(72, 65, 69)
       heightdat <- data.frame(names, height)
       heightdat

   d.) lists: We can store different kinds of objects in a list.

       objectlist <- list(m, m2, heightdat)

       ##comments:
       ##Use a "#" symbol to write a comment in R.
       ##For assignment, "<-" is equivalent to "=". I have gotten into the habit of using "<-".
       ##You may find "=" easier.

3. Generate Random Numbers: There are several R functions for generating random numbers. Next week, we will learn how these functions work in more detail. For now, here are some easy ways to generate random variables from some of the distributions we have learned about in class.

   a.) Uniform:

       u <- runif(10,0,1)    #u is a vector of 10 iid standard uniform random variables
       u; mean(u)    #the sample mean of u
       x <- 2 + (5*u)    #what distribution do the elements of x have?
       mean(x)

   b.) Bernoulli:

       u <- runif(100)    #u is a vector of 100 iid standard uniform random variables
       b1 <- ifelse(u<.7,1,0); b1
       #ifelse is a handy function. Here, if an element of u is less than .7, then the
       #corresponding element of b1 is 1. Otherwise, the corresponding element of b1 is zero.
       #The elements of b1 are iid Bernoulli random variables. What is the parameter p?
       #compute the sample mean of b1
       mean(b1)

   c.) Normal:

       z <- rnorm(100, 0, 1)    #z is a vector with 100 standard normal random variables
       x <- 3 + 2*z    #what is the distribution of the elements of x?
       mean(x)    #sample mean of x
       sd(x)    #sample standard deviation of x
       #we have not talked about the sample standard deviation yet, but we will

   d.) Others: The functions rpois(), rbinom(), rgeom(), and rexp() allow us to generate Poisson, binomial, geometric, and exponential random variables, respectively. We will learn how to write our own functions to simulate from these distributions next week.

4. Syntax for Writing Functions

   a.) The syntax for writing functions is as follows:

       f <- function(x,y,z){
         ….
       }

   The name of the function is "f"; the arguments to the function are x, y, and z. The function performs the lines of code "…" specified between the braces. To call the function, type f(x,y,z).

   b.) Two examples of simple functions:

       sumfun <- function(x,y){x+y}; sumfun(2,3);
       maxoversum <- function(l){max(l)/sum(l)}; u <- runif(5,0,1); maxoversum(u)

Exercises:

1. Run the following lines of code, and then answer the questions below.

       U <- runif(10,0,1); b <- ifelse(U<.4,1,0);
       b2 <- sum(b)    #add up the elements of b
       X <- 12 + 3*U
       Z <- rnorm(50,0,1); Y <- 3*Z+25;

   What are the distributions of the elements of b, b2, X, and Y? State the names of the distributions and the values of the parameters.

2. Run the following code to create a 150x3 matrix whose columns contain realizations of normal random variables with three different means (1, 5, and 9) and a variance of 4. Then, compute the sample mean of the elements in each column. Turn in your three sample means. (Note: even if you are working with a partner, you should turn in different sample means because these numbers are generated randomly.)

       Z1 <- rnorm(150,0,1); X1 <- 1+2*Z1
       Z2 <- rnorm(150,0,1); X2 <- 5+2*Z2
       Z3 <- rnorm(150,0,1); X3 <- 9+2*Z3
       normals <- matrix(c(X1, X2, X3), 150, 3)
       apply(normals, 2, mean)
       #the "apply" function applies the specified function (in this case, the function
       #"mean") to every row or column of a matrix. To apply a function to the columns,
       #specify "2" as the second argument (as I did above). To apply a function to the rows,
       #specify "1" as the second argument.
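
To see what apply() does with each value of its second argument, it may help to try it on a small matrix whose entries are known in advance. The sketch below is only an illustration: the matrix "small" and the names "col_means" and "row_means" are made up for this example and are not part of the exercise.

```r
# A small 3x2 matrix with known entries (hypothetical example data).
# matrix() fills column by column, so column 1 is 1,2,3 and column 2 is 4,5,6.
small <- matrix(1:6, 3, 2)

col_means <- apply(small, 2, mean)  # second argument 2: one mean per column
row_means <- apply(small, 1, mean)  # second argument 1: one mean per row

col_means  # 2 5
row_means  # 2.5 3.5 4.5

# colMeans() and rowMeans() are base R shortcuts for the same computations.
all.equal(col_means, colMeans(small))
all.equal(row_means, rowMeans(small))
```

Because the entries are not random, the results can be checked by hand before moving on to the 150x3 matrix of simulated normals.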