Getting Started in R


This is a short tutorial to help you familiarize yourself with the R syntax. More thorough references are on the course webpage.

1. Overview: A bit on the structure of R…

a.) R is an interpreted language rather than a compiled language, which means that R executes commands directly. This makes R easy to use because the user does not need to build a complete program before running code.

b.) R is object oriented. Named objects are stored in the active memory of R. Type "ls()" to see what objects you have stored. An R function is an important type of object.

c.) One can perform actions on objects with functions and operators. Examples of operators include binary operators such as "+", "-", etc. For a list of other operators see Paradis pg. 29.

d.) The "help()" and "help.search()" functions allow you to access R help files. For example, type "help(runif)" to learn about the function "runif()", which generates uniformly distributed random variables.

e.) Other issues: Comments are set off from code with the # sign; R ignores everything from # to the end of the line.
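The points above can be tried directly at the R prompt. The following is a minimal sketch; the object names a and total are made up for illustration and are not used elsewhere in this tutorial.

```r
# Everything after a "#" on a line is a comment and is ignored by R.
a <- 4              # create a named object
total <- a + 3      # "+" is a binary operator
total               # typing an object's name prints its value
ls()                # lists the names of the stored objects
is.function(runif)  # functions are objects too
# help(runif)       # uncomment to open the help page for runif()
```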

2. Storing "data" in R:

Several types of R objects are useful for storing "data" in R. The examples below show how to store data in vectors, matrices, data frames, and lists.

a.) vectors

v <- c(1,3,5,7,11)

v

#v is a vector

s1 <- seq(from=1, to=10, length.out=91); s2 <- seq(from=1, to=10, by=.1)

s1; s2

#s1 and s2 are vectors. They consist of the numbers 1, 1.1, 1.2, …, 9.9, 10.

l <- 1:30 ##l is a vector consisting of the first 30 positive integers.

b.) matrices and arrays

#store the first 30 positive integers in a 6x5 matrix:

m <- matrix(l, nrow = 6, ncol = 5, byrow = TRUE) ##fill the rows

m2 <- matrix(l, nrow = 6, ncol = 5) ##fill the columns

m; m2

#store the first 30 positive integers in a 2x3x5 array

A <- array(l, dim = c(2,3,5))
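To see the difference between filling by rows and filling by columns, it may help to look at a smaller case. The sketch below uses a hypothetical 2x3 matrix of the integers 1 to 6 and a 2x3x4 array; the names small_by_row, small_by_col, and A_small are illustrative only.

```r
# Fill a 2x3 matrix row by row and column by column
small_by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
small_by_col <- matrix(1:6, nrow = 2, ncol = 3)

small_by_row[1, ]   # first row is 1 2 3
small_by_col[1, ]   # first row is 1 3 5

# Arrays are filled like column-filled matrices: the first index varies fastest
A_small <- array(1:24, dim = c(2, 3, 4))
dim(A_small)        # 2 3 4
A_small[2, 1, 1]    # 2
A_small[1, 1, 2]    # 7: the second 2x3 slice starts after the first 6 entries
```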

c.) data frames: data frames are more flexible than matrices. The columns of a data frame can contain different types of entries.

names <- c("John", "Ashley", "Mike")

height <- c(72, 65, 69)

heightdat <- data.frame(names = names, height = height)

heightdat

d.) lists: We can store different kinds of objects in a list.

objectlist <- list(m, m2, heightdat)
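Once data are stored, the pieces can be inspected and pulled back out. The following sketch rebuilds small versions of the objects so that it runs on its own; the names people and stuff are hypothetical.

```r
people <- data.frame(names = c("John", "Ashley", "Mike"),
                     height = c(72, 65, 69))
sapply(people, class)   # each column has its own type
people$height           # "$" extracts one column as a vector
mean(people$height)     # average of the three heights

stuff <- list(matrix(1:4, nrow = 2), people)
length(stuff)           # 2: the list holds a matrix and a data frame
stuff[[2]]$height       # double brackets pull out a single list element
```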

##comments:

##For assignment, "<-" is equivalent to "=". I have gotten into the habit of using "<-". You may find "=" easier to type, but I do not recommend it: reserving "<-" for assignments and "=" for function arguments makes code easier to read.
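The distinction matters most inside a function call. In the minimal sketch below (the names w, w2, and n_draws are hypothetical), "=" inside the parentheses names an argument, whereas "<-" in the same position creates an object in the workspace as a side effect.

```r
w <- runif(n = 3)          # "=" names the argument n of runif()
w2 <- runif(n_draws <- 3)  # "<-" first creates the object n_draws, then passes its value
n_draws                    # 3
length(w); length(w2)      # both vectors have 3 elements
```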

3. Generate Random Numbers: There are several R functions for generating random numbers. For now, here are some easy ways to generate random variables from some of the distributions we have learned about in class.

a.) Uniform:

u <- runif(n = 10, min = 0, max = 1) #u is a vector of 10 iid standard uniform random variables

# if you keep the arguments in the order shown on the help page, they do not need to be named; however, I recommend naming them for greater clarity and to reduce the chance of error.

u; mean(u) #the sample mean of u

# ";" is equivalent to a line break.

x <- 2 + (5*u) #what distribution do the elements of x have?

mean(x)

b.) Bernoulli:

u <- runif(n = 100) #u is a vector of 100 iid standard uniform random variables: note that the arguments min and max take their default values (0 and 1) and may be omitted.

b1 <- ifelse(u<0.7,1,0); b1

#ifelse is a handy function. Here, if an element of u is less than .7, then the

#corresponding element of b1 is 1. Otherwise, the corresponding element of b1 is zero. The

#elements of b1 are iid Bernoulli random variables. What is the parameter p?

#compute the sample mean of b1

mean(b1)

# Note that a single Bernoulli draw can also be generated using rbinom(n = 1, size = 1, prob = 0.7); set n = 100 to get a vector like b1.

c.) Normal:

z <- rnorm(n = 100, mean = 0, sd = 1) #z is a vector with 100 standard normal random variables

x <- 3 + 2*z #what is the distribution of the elements of x?

mean(x) #sample mean of x

sd(x) #sample standard deviation of x

#we have not talked about the sample standard deviation yet, but we will in a few weeks.

d.) Others: The functions rpois(), rbinom(), rgeom(), and rexp() allow us to generate Poisson, binomial, geometric, and exponential random variables.
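As a sketch of how these generators are called, the lines below draw 1000 values from each and compare the sample mean with the theoretical mean. The parameter values and the seed are arbitrary choices for illustration. Note that rgeom() counts the number of failures before the first success, so its mean is (1 - p)/p.

```r
set.seed(1)                                      # arbitrary seed, only for reproducibility
bern <- rbinom(n = 1000, size = 1, prob = 0.7)   # Bernoulli(0.7): binomial with size 1
bin  <- rbinom(n = 1000, size = 10, prob = 0.3)  # binomial, mean 10 * 0.3 = 3
pois <- rpois(n = 1000, lambda = 4)              # Poisson, mean 4
geo  <- rgeom(n = 1000, prob = 0.25)             # geometric, mean (1 - 0.25)/0.25 = 3
expo <- rexp(n = 1000, rate = 2)                 # exponential, mean 1/2

# sample means; these should be close to 0.7, 3, 4, 3, and 0.5
c(mean(bern), mean(bin), mean(pois), mean(geo), mean(expo))
```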

4. Syntax for Writing Functions

(a.) The syntax for writing functions is as follows:

f <- function(x, y, z){

….

}

The name of the function is “f”; the arguments to the function are x,y,z.

The function performs the lines of code “…” specified between the braces. To call the function, type

f(x,y,z)

(b.) Two examples of simple functions:

sumfun <- function(x, y) {x+y}; sumfun(2,3)

# a less cryptic version:

sumfun <- function(x, y)

{

return(x + y)

}

maxoversum <- function(l){max(l)/sum(l)}; u <- runif(n = 5, min = 0, max = 1); maxoversum(u)

# other examples to simulate from distributions

## function to generate samples from the lognormal distribution

## with parameters mu and sigma

rlognormal <- function(size, mu = 0, sigma = 1) exp(rnorm(size, mean = mu, sd = sigma))

# an alternative way of writing the same function

rlognormal <- function(size, mu = 0, sigma = 1) #mu = 0, sigma = 1 are default values

{

y <- rnorm(size, mean = mu, sd = sigma)

# mean of the normal distribution is mu

# sd of the normal distribution is sigma

return(exp(y)) # exponentiate to get log-normal random variable

}
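One way to check a sampler like this is to take logs of the output, which should look like normal draws with mean mu and standard deviation sigma. A sketch, with the function repeated so that the block runs on its own; the seed and the parameter values mu = 1 and sigma = 0.5 are arbitrary choices for illustration.

```r
rlognormal <- function(size, mu = 0, sigma = 1) exp(rnorm(size, mean = mu, sd = sigma))

set.seed(2)                                     # arbitrary seed
draws <- rlognormal(5000, mu = 1, sigma = 0.5)
all(draws > 0)                                  # log-normal values are always positive
mean(log(draws))                                # should be close to mu = 1
sd(log(draws))                                  # should be close to sigma = 0.5
```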

# generate samples from the Rice distribution

rrice <- function(size, mu = 1, sigma = 1)

{ .....

}

Use the method of inversion to sample:

Consider the density: f(x) = 2*x^(-3) for x >1.

The cdf for this density is: F(x) = 1 - x^(-2), x > 1

Then if U ~ U(0, 1),

X = F^{-1}(U) is a realization from F(x).

Note that F^{-1}(U) = (1 - U)^{-1/2}. We will write a function in R to obtain n realizations from our target density.

rmydensity <- function(n)

{

u <- runif(n, min = 0, max = 1) #n realizations from U(0, 1)

return((1 - u)^(-0.5))

}

# try it out:

y <- rmydensity(1000)
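The inverse comes from solving u = 1 - x^(-2) for x, which gives x = (1 - u)^(-1/2). A simple check of the sampler, sketched below, compares the fraction of draws at or below a point with the cdf at that point. The checkpoint x = 2 and the seed are arbitrary choices; F(2) = 1 - 2^(-2) = 0.75, and the median of the distribution is sqrt(2) because F(sqrt(2)) = 0.5.

```r
rmydensity <- function(n) {
  u <- runif(n, min = 0, max = 1)  # n realizations from U(0, 1)
  (1 - u)^(-0.5)                   # apply the inverse cdf
}

set.seed(3)                        # arbitrary seed
y <- rmydensity(10000)
min(y) >= 1                        # the density is supported on x > 1
mean(y <= 2)                       # empirical cdf at 2, should be close to 0.75
median(y)                          # should be close to sqrt(2)
```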

Exercises:

1. Run the following lines of code, and then answer the questions below.

U <- runif(10,0,1); b <- ifelse(U<.4,1,0); b2 <- sum(b) #add up the elements of b

X <- 12 + 3*U

Z <- rnorm(50,0,1);

Y <- 3*Z+25;

What are the distributions of the elements of b, b2, X, and Y? State the names of the distributions and the values of parameters.

2. Run the following code to create a 150x3 matrix, whose columns contain realizations of normal random variables with three different means (1, 5 and 9) and a variance of 4. Then, compute the sample mean of the elements in each column. Turn in your three sample means. (Note: even if you are working with a partner, you should turn in different sample means because these numbers are generated randomly.)

Z1 <- rnorm(150, 0,1); X1 <- 1+2*Z1

Z2 <- rnorm(150, 0,1); X2 <- 5+2*Z2

Z3 <- rnorm(150,0,1); X3 <- 9+2*Z3

normals <- matrix(c(X1, X2, X3), 150, 3)

apply(normals, 2, mean)

#the "apply" function applies the specified function (in this case, the

##function "mean") to every row or column of a matrix. To apply a function to the columns,

##specify "2" as the second argument (as I did above). To apply a function to the rows, specify

##a "1" as the second argument.
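The row and column convention is easy to verify on a small matrix whose means can be computed by hand. A sketch with a hypothetical 2x3 matrix named small:

```r
small <- matrix(1:6, nrow = 2, ncol = 3)  # columns are (1,2), (3,4), (5,6)
apply(small, 2, mean)   # column means: 1.5 3.5 5.5
apply(small, 1, mean)   # row means: 3 4
colMeans(small)         # built-in shortcut that gives the same column means
```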
