Getting Started in R

This is a short tutorial to help you familiarize yourself with R syntax. More thorough
references are on the course webpage.
1. Overview: A bit on the structure of R…
a.) R is an interpreted language, rather than a compiled language, which means that R
executes commands directly. This makes R easy to use (especially for non-programmers)
because the user does not need to build a comprehensive program before running code.
b.) R is object oriented. Named objects are stored in the active memory of R. Type “ls()”
to see what objects you have stored.
c.) An R function is an important type of object. One can perform actions on objects
with functions and operators. Examples of operators include binary operators such as “+”, “/”,
etc. For a list of other operators, see Paradis, p. 29.
d.) The “help()” and “help.search()” functions allow you to access R help files. For
example, type “help(runif)” (or “?runif”, for short) to learn about the function, “runif()”, which
is an R function for generating uniformly distributed random variables.
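A short sketch you can paste into R to try points b.) and c.) above. The object names `a` and `greeting` are arbitrary examples. It also shows that a binary operator is itself a function: wrapped in backticks, it can be called like any other function or passed to another function.

```r
a <- 5
greeting <- "hello"
ls()                  # names of the objects in the workspace, e.g. "a" "greeting"
rm(greeting)          # remove the object named greeting
exists("greeting")    # FALSE: the object is gone
`+`(2, 3)             # the + operator called as a function: same as 2 + 3, gives 5
sapply(1:3, `*`, 2)   # pass the * operator to sapply(): multiplies 1, 2, 3 by 2
```

The output of `ls()` will also include any other objects you have already created in your session.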
2. Storing “data” in R:
Several types of R objects are useful for storing “data” in R. The examples below show how to
store data in vectors, matrices, data frames, and lists. Note the use of the following symbols:
Use a “#” symbol to insert a comment in R code. The characters following this symbol on the
same line are not executed.
The “<-” is an assignment symbol. At the top level, “=” can also be used for assignment.
a.) vectors
v <- c(1,3,5,7,11)
v
# v is a vector object; entering the name of an object causes the contents to be printed
# c() function is called the concatenate function; primarily used to create vector objects
# Here is the seq() function:
s1 <- seq(from=1, to=10, length.out=100); s2 <- seq(from=1, to=10, by=.1)
s1; s2
# s1 and s2 are vectors. s1 consists of 100 equally spaced numbers from 1 to 10;
# s2 consists of the numbers 1, 1.1, 1.2, …, 9.9, 10.
x <- 1:30 ## x is a vector consisting of the first 30 positive integers.
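A quick way to check what `seq()` and `:` produce is to ask R for lengths and individual elements. This sketch also shows logical indexing, which selects the elements of a vector that satisfy a condition (here, the even numbers, using the remainder operator `%%`).

```r
s1 <- seq(from = 1, to = 10, length.out = 100)  # 100 equally spaced values
s2 <- seq(from = 1, to = 10, by = 0.1)          # steps of 0.1
length(s1)          # 100
length(s2)          # 91
s2[1:3]             # first three elements: 1.0 1.1 1.2
x <- 1:30
x[x %% 2 == 0]      # keep only the even numbers: 2 4 6 ... 30
sum(x)              # 465
```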
b.) matrices and arrays
#store the first 30 positive integers in a 6x5 matrix:
m <- matrix(x,6,5,byrow=TRUE) ## fill in row order
m2 <- matrix(x,6,5) ## fill in column order
m; m2
#store the first 30 positive integers in a 2x3x5 array
A <- array(x, dim=c(2,3,5)) ## the dimensions are given as a single vector
A
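Elements of a matrix or array are picked out with square brackets, one index per dimension; leaving an index blank selects everything along that dimension. The sketch below rebuilds the objects from this section so it runs on its own. Comparing `m[2, 3]` with `m2[2, 3]` shows the effect of `byrow`.

```r
x  <- 1:30
m  <- matrix(x, nrow = 6, ncol = 5, byrow = TRUE)  # filled row by row
m2 <- matrix(x, nrow = 6, ncol = 5)                # filled column by column
A  <- array(x, dim = c(2, 3, 5))                   # a 2x3x5 array
dim(m)       # 6 5
m[2, 3]      # row 2, column 3 of the row-filled matrix: 8
m2[2, 3]     # row 2, column 3 of the column-filled matrix: 14
m[2, ]       # the whole second row: 6 7 8 9 10
A[1, 2, 5]   # one element of the array: 27
```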
c.) data frames: data frames are a more flexible object than matrices. The columns of a
data frame can contain different types of entries.
names <- c("John", "Ashley", "Mike") ### vector of character strings (use straight quotes in R code)
height <- c(72, 65, 69) ### numeric vector
heightdat <- data.frame(names, height)
heightdat
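A few common ways to inspect and use a data frame, sketched on the same small example. `str()` reports the type of each column, `$` extracts a column by name, and a logical condition inside `[ , ]` selects rows.

```r
names  <- c("John", "Ashley", "Mike")
height <- c(72, 65, 69)
heightdat <- data.frame(names, height)
str(heightdat)                       # column types (names is chr in R >= 4.0.0)
heightdat$height                     # extract one column: 72 65 69
mean(heightdat$height)               # 68.66667
heightdat[heightdat$height > 66, ]   # rows for John and Mike
```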
d.) lists: We can combine different kinds of objects in a list.
objectlist <- list(m, m2, heightdat)
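Components of a list are extracted with double square brackets. Giving the components names when the list is created also allows access with `$`. The names `rowfill`, `colfill`, and `heights` below are just illustrative choices.

```r
x  <- 1:30
m  <- matrix(x, 6, 5, byrow = TRUE)
m2 <- matrix(x, 6, 5)
heightdat <- data.frame(names = c("John", "Ashley", "Mike"), height = c(72, 65, 69))
objectlist <- list(m, m2, heightdat)
length(objectlist)         # 3 components
objectlist[[3]]            # the third component: the data frame itself
# A named version of the same list (names chosen for illustration)
namedlist <- list(rowfill = m, colfill = m2, heights = heightdat)
namedlist$heights$height   # 72 65 69
```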
3. Generate Random Numbers: There are several R functions for generating random numbers.
Next week, we will learn how these functions work in more detail. For now, here are some easy
ways to generate random variables from some of the distributions we have learned about in
class.
a.) Uniform:
u <- runif(10,0,1) #u is a vector of 10 iid standard uniform random variables
u; mean(u) #the sample mean of u
x <- 2 + (5*u) #what distribution do the elements of x have?
mean(x)
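Because these functions generate random numbers, your output will differ from run to run and from your classmates'. A minimal sketch of `set.seed()`, which fixes the starting point of R's random number generator so that draws can be reproduced (the seed value 2024 is arbitrary):

```r
set.seed(2024)            # fix the generator's starting point
u1 <- runif(10, 0, 1)
set.seed(2024)            # reset to the same starting point
u2 <- runif(10, 0, 1)
identical(u1, u2)         # TRUE: same seed, same draws
all(u1 >= 0 & u1 <= 1)    # TRUE: standard uniforms lie between 0 and 1
```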
b.) Bernoulli:
u <- runif(100) #u is a vector of 100 iid standard uniform random variables
b1 <- ifelse(u<.7,1,0) ; b1
# ifelse is a handy function. Here, if an element of u is less than .7, then the
# corresponding element of b1 is 1. Otherwise, the corresponding element of b1 is zero.
# The elements of b1 are iid Bernoulli random variables. What is the parameter p?
# compute the sum of b1
sum(b1)
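Since every element of `b1` is 0 or 1, its sum counts the ones and its mean is the proportion of ones. The sketch below shows both, along with `table()`, and a small example of `ifelse()` working element by element on a whole vector.

```r
u <- runif(100)
b1 <- ifelse(u < .7, 1, 0)
table(b1)                    # how many 0s and how many 1s
mean(b1)                     # proportion of 1s
sum(b1) / length(b1)         # the same proportion, computed by hand
ifelse(c(0.2, 0.9, 0.5) < .7, 1, 0)   # element by element: 1 0 1
```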
c.) Normal:
z <- rnorm(100, 0, 1) #z is a vector with 100 standard normal random variables
x <- 3 + 2*z #what is the distribution of the elements of x?
mean(x); #sample mean of x
sd(x); #sample standard deviation of x;
#we have not talked about the sample standard deviation yet, but we will
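One detail worth knowing: the third argument of `rnorm()` is the standard deviation, not the variance. A sketch with arbitrary illustrative values (mean 10, standard deviation 3) and a large sample, so the sample statistics land close to the true values:

```r
set.seed(1)
y <- rnorm(10000, mean = 10, sd = 3)  # sd = 3 means variance 3^2 = 9
mean(y)   # close to 10
sd(y)     # close to 3
var(y)    # close to 9; var() is the square of sd()
```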
d.) Others: The functions rpois(), rbinom(), rgeom(), and rexp() allow us to generate
Poisson, binomial, geometric, and exponential random variables, respectively. We will learn how
to write our own functions to simulate from these distributions next week.
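Each of these functions takes the number of draws first, followed by the distribution's parameters. The parameter values below are arbitrary illustrations. One caution: R's `rgeom()` counts the number of failures before the first success (so its values start at 0), which may differ from the convention used in class.

```r
set.seed(42)
p  <- rpois(5, lambda = 3)               # Poisson with mean 3
bn <- rbinom(5, size = 10, prob = 0.5)   # binomial: successes in 10 trials
g  <- rgeom(5, prob = 0.2)               # geometric: failures before the first success
e  <- rexp(5, rate = 2)                  # exponential with rate 2 (mean 1/2)
p; bn; g; e
```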
4. Syntax for Writing Functions
(a.) The syntax for writing functions is as follows:
fun <- function(x,y,z){
  ...
}
The name of the function is “fun”; the arguments to the function are x, y, z. The
function executes the lines of code (“...” above) between the braces. To call the function, type
fun(x,y,z)
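Two features of R functions that the template above does not show: an argument can have a default value, and the value of the last expression evaluated is returned automatically. A sketch using a hypothetical function `scaledsum`:

```r
# scaledsum is a hypothetical example function
scaledsum <- function(x, y, z = 1) {   # z defaults to 1 if not supplied
  total <- x + y
  total * z                            # last expression: this value is returned
}
scaledsum(2, 3)          # z takes its default: returns 5
scaledsum(2, 3, z = 10)  # returns 50
```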
(b.) Some examples of simple functions:
sumfun <- function(x,y){x+y}; sumfun(2,3);
maxoversum <- function(l){max(l)/sum(l)};
u <- runif(5,0,1); maxoversum(u)
logbb <- function(x,b) log(x)/log(b); logbb(8,2)
fact = function(n) if (n<=1) 1 else n*fact(n-1) # computes n!
fact(5)
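The recursive `fact()` can also be written with a loop, and R already has built-in versions of both `fact()` and `logbb()`. A sketch comparing them; `factiter` is a hypothetical name.

```r
# factiter: a loop-based version of fact() (hypothetical name)
factiter <- function(n) {
  result <- 1
  for (i in seq_len(n)) result <- result * i  # seq_len(0) is empty, so 0! = 1
  result
}
factiter(5)        # 120
factorial(5)       # built-in factorial: 120
log(8, base = 2)   # built-in logarithm with a base: 3, same as logbb(8,2)
```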
Exercises:
1. Run the following lines of code, and then answer the questions below.
U <- runif(10,0,1); b <- ifelse(U<.4,1,0);
b2 <- sum(b) #add up the elements of b
X <- 12 + 3*U
Z <- rnorm(50,0,1);
Y <- 3*Z+25;
What is the distribution of b2, and what are the distributions of the elements of b, X, and Y?
State the names of the distributions and the values of their parameters.
2. Run the following code to create a 150x3 matrix, whose columns contain realizations
of normal random variables with three different means (1, 5 and 9) and a variance of 4. Then,
compute the sample mean of the elements in each column. Turn in your three sample means.
(Note: even if you are working with a partner, you should turn in different sample means
because these numbers are generated randomly.)
Z1 <- rnorm(150, 0,1); X1 <- 1+2*Z1
Z2 <- rnorm(150, 0,1); X2 <- 5+2*Z2
Z3 <- rnorm(150,0,1); X3 <- 9+2*Z3
normals <- matrix(c(X1, X2, X3), 150, 3)
apply(normals, 2, mean)
## the “apply()” function applies the specified function or operator (in this case, the
## function “mean”) to every row or column of a matrix. To apply a function to the columns,
## specify “2” as the second argument (as I did above). To apply a function to the rows, specify
## “1” as the second argument.
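For comparison, here is a more compact sketch of the same idea. It builds a matrix with the same structure (columns with means 1, 5, and 9 and standard deviation 2, i.e. variance 4) using `sapply()`, checks `apply()` against the built-in `colMeans()`, and applies a function to the rows with a second argument of 1. The seed value 7 is arbitrary.

```r
set.seed(7)
# each call to the anonymous function returns one column of 150 draws
normals <- sapply(c(1, 5, 9), function(mu) rnorm(150, mean = mu, sd = 2))
dim(normals)                  # 150 3
apply(normals, 2, mean)       # column means
colMeans(normals)             # same result, built-in shortcut
head(apply(normals, 1, max))  # second argument 1: maximum of each row
```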