Getting Started in R

This is a short tutorial to help you familiarize yourself with R syntax. More thorough references are on the course webpage.

1. Overview: A bit on the structure of R

a.) R is an interpreted language rather than a compiled language, which means that R executes commands directly as you enter them. This makes R easy to use (especially for non-programmers) because the user does not need to build a complete program before running code.

b.) R is object oriented. Named objects are stored in the active memory of R. Type “ls()” to see what objects you have stored.

c.) An R function is an important type of object. One can perform actions on objects with functions and operators. Examples of operators include binary operators such as “+”, “/”, etc. For a list of other operators see Paradis pg. 29.

d.) The “help()” and “help.search()” functions allow you to access R help files. For example, type “help(runif)” (or “?runif”, for short) to learn about the function “runif()”, which is an R function for generating uniformly distributed random variables.

2. Storing “data” in R: Several types of R objects are useful for storing “data” in R. The examples below show how to store data in vectors, matrices, data frames, and lists. Note the use of the following symbols:

Use a “#” symbol to insert a comment in R code. The characters following this symbol on the same line are not executed.
The “<-” is an assignment symbol. In most contexts, “=” is an equivalent symbol.

a.) vectors

    v <- c(1,3,5,7,11)
    v
    # v is a vector object; entering the name of an object causes its contents to be printed
    # the c() function is called the concatenate function; it is primarily used to create vector objects

    # Here is the seq() function:
    s1 <- seq(from=1, to=10, length.out=100); s2 <- seq(from=1, to=10, by=.1)
    s1; s2
    # s1 and s2 are vectors. s1 consists of 100 equally spaced numbers from 1 to 10
    # (spacing 9/99, about 0.091). s2 consists of the 91 numbers 1, 1.1, 1.2, ..., 9.9, 10.

    x <- 1:30 ## x is a vector consisting of the first 30 positive integers.

b.) matrices and arrays

    # store the first 30 positive integers in a 6x5 matrix:
    m <- matrix(x,6,5,byrow=TRUE) ## fill in row order
    m2 <- matrix(x,6,5) ## fill in column order
    m; m2

    # store the first 30 positive integers in a 2x3x5 array
    # (the dimensions must be passed as a single vector through the dim argument)
    A <- array(x, dim=c(2,3,5))
    A

c.) data frames: Data frames are more flexible objects than matrices. The columns of a data frame can contain different types of entries.

    names <- c("John", "Ashley", "Mike") ### vector of character strings
    height <- c(72, 65, 69) ### numeric vector
    heightdat <- data.frame(names, height)
    heightdat

d.) lists: We can combine different kinds of objects in a list.

    objectlist <- list(m, m2, heightdat)

3. Generate Random Numbers: There are several R functions for generating random numbers. Next week, we will learn how these functions work in more detail. For now, here are some easy ways to generate random variables from some of the distributions we have learned about in class.

a.) Uniform:

    u <- runif(10,0,1) # u is a vector of 10 iid standard uniform random variables
    u; mean(u) # the sample mean of u
    x <- 2 + (5*u) # what distribution do the elements of x have?
    mean(x)

b.) Bernoulli:

    u <- runif(100) # u is a vector of 100 iid standard uniform random variables
    b1 <- ifelse(u<.7,1,0); b1
    # ifelse is a handy function. Here, if an element of u is less than .7, then the
    # corresponding element of b1 is 1. Otherwise, the corresponding element of b1 is zero.
    # The elements of b1 are iid Bernoulli random variables. What is the parameter p?
    # compute the sum of b1
    sum(b1)

c.) Normal:

    z <- rnorm(100, 0, 1) # z is a vector of 100 standard normal random variables
    x <- 3 + 2*z # what is the distribution of the elements of x?
    mean(x) # sample mean of x
    sd(x) # sample standard deviation of x
    # we have not talked about the sample standard deviation yet, but we will

d.) Others: The functions rpois(), rbinom(), rgeom(), and rexp() allow us to generate Poisson, binomial, geometric, and exponential random variables.
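As a quick illustration of the four functions just named, here is a minimal sketch. The seed, the sample size n, and all parameter values are arbitrary choices made for this example and do not come from the course material.

```r
set.seed(1)  # arbitrary seed so the draws are reproducible

n <- 1000    # illustrative sample size (an assumption, not a course value)

pois  <- rpois(n, lambda = 3)              # Poisson with mean 3
binom <- rbinom(n, size = 10, prob = 0.3)  # binomial: 10 trials, success probability 0.3
geom  <- rgeom(n, prob = 0.2)              # geometric: failures before the first success
expo  <- rexp(n, rate = 2)                 # exponential with rate 2 (mean 1/2)

# sample means, to compare with the theoretical means 3, 3, 4, and 0.5
c(mean(pois), mean(binom), mean(geom), mean(expo))
```

Note that rgeom() counts the number of failures before the first success, so its values start at 0 and its mean is (1-p)/p rather than 1/p.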
We will learn how to write our own functions to simulate from these distributions next week.

4. Syntax for Writing Functions

(a.) The syntax for writing functions is as follows:

    fun <- function(x,y,z){
    ….
    }

The name of the function is “fun”; the arguments to the function are x, y, and z. The function performs the lines of code “…” specified between the braces. To call the function, type fun(x,y,z).

(b.) Some examples of simple functions:

    sumfun <- function(x,y){x+y}; sumfun(2,3)
    maxoversum <- function(l){max(l)/sum(l)}; u <- runif(5,0,1); maxoversum(u)
    logbb <- function(x,b) log(x)/log(b); logbb(8,2)
    fact = function(n) if (n<=1) 1 else n*fact(n-1) # computes n!
    fact(5)

Exercises:

1. Run the following lines of code, and then answer the questions below.

    U <- runif(10,0,1); b <- ifelse(U<.4,1,0); b2 <- sum(b) # add up the elements of b
    X <- 12 + 3*U
    Z <- rnorm(50,0,1); Y <- 3*Z+25

What are the distributions of b2 and of the elements of b, X, and Y? State the names of the distributions and the values of the parameters.

2. Run the following code to create a 150x3 matrix whose columns contain realizations of normal random variables with three different means (1, 5, and 9) and a variance of 4. Then, compute the sample mean of the elements in each column. Turn in your three sample means. (Note: even if you are working with a partner, you should turn in different sample means because these numbers are generated randomly.)

    Z1 <- rnorm(150,0,1); X1 <- 1+2*Z1
    Z2 <- rnorm(150,0,1); X2 <- 5+2*Z2
    Z3 <- rnorm(150,0,1); X3 <- 9+2*Z3
    normals <- matrix(c(X1, X2, X3), 150, 3)
    apply(normals, 2, mean)
    ## the “apply()” function applies the specified function or operator (in this case, the
    ## function “mean”) to every row or column of a matrix. To apply a function to the columns,
    ## specify “2” as the second argument (as I did above). To apply a function to the rows,
    ## specify “1” as the second argument.
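To see the difference between the row and column settings of apply(), it can help to try it on a matrix small enough to check by hand. The sketch below uses a hypothetical 2x3 matrix called small, which is not part of the exercise.

```r
# A small 2x3 matrix filled in column order: the columns are (1,2), (3,4), (5,6)
small <- matrix(1:6, 2, 3)

col_means <- apply(small, 2, mean)  # second argument 2: one mean per column
row_means <- apply(small, 1, mean)  # second argument 1: one mean per row

col_means  # 1.5 3.5 5.5
row_means  # 3 4
```

For the special case of means, the base R functions colMeans() and rowMeans() give the same results; apply() is the more general tool because it accepts any function.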