Introduction to R

Any entity that exists in R is called an object. Examples of objects include vectors, matrices, and functions. To create the vector x = (5, 3, 14), the following could be used:

> x = c(5,3,14)
> x
[1]  5  3 14

Here = is the assignment operator, which assigns the vector (5, 3, 14) to the object x. The c function “combines” similar objects into a vector.

Generating patterned data

Using the sequence operator : 

> y = 1:8
> y
[1] 1 2 3 4 5 6 7 8

Using the seq function:

> x = seq(from=-1,to=1,by=0.2)
> x
 [1] -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0

The argument by=0.2 specifies the increment of the sequence. Alternatively, one could give the seq function a different argument:

> seq(from=-1,to=1,len=10)
 [1] -1.0000000 -0.7777778 -0.5555556 -0.3333333 -0.1111111  0.1111111
 [7]  0.3333333  0.5555556  0.7777778  1.0000000

The argument len= (short for length.out=) specifies the length of the sequence.

To generate repeated values, use the rep function. The first argument specifies the number or object to be repeated, and the second argument determines the number of repetitions.

> rep(4,7)
[1] 4 4 4 4 4 4 4
> rep(1:5,3)
 [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> rep(1:5,c(2,1,3,2,2))
 [1] 1 1 2 3 3 3 4 4 5 5

In the last command, the vector c(2,1,3,2,2) in the second argument tells R to produce a vector with two 1's, one 2, three 3's, two 4's and two 5's. This vector must have the same length as the vector in the first argument.

To find the length of a vector, use the function length:

> x = 5:12
> x
[1]  5  6  7  8  9 10 11 12
> length(x)
[1] 8

Creating matrices

To create a matrix, use the matrix function. The first argument is a vector; additional arguments specify the number of rows or columns and other attributes.

> x = c(3,5,1,2,9,7)
> my.matrix = matrix(x,nrow=2)
> my.matrix
     [,1] [,2] [,3]
[1,]    3    1    9
[2,]    5    2    7
> my.matrix = matrix(x,ncol=2)
> my.matrix
     [,1] [,2]
[1,]    3    2
[2,]    5    9
[3,]    1    7

To transpose a matrix, use the t function:

> t(my.matrix)
     [,1] [,2] [,3]
[1,]    3    5    1
[2,]    2    9    7

Note that by default R fills the matrix with the vector given in the first argument column by column, from left to right. To fill the matrix row by row, from top to bottom, set the byrow= argument to TRUE, i.e., byrow=T.

> my.matrix = matrix(x,ncol=2,byrow=T)
> my.matrix
     [,1] [,2]
[1,]    3    5
[2,]    1    2
[3,]    9    7

Arithmetic operators

Operator   Function
+          addition
-          subtraction
*          multiplication
/          division
%*%        matrix multiplication
^          exponentiation
%%         modulus

Apart from %*%, these operators act on vectors element by element.

> x = 1:5
> x*3
[1]  3  6  9 12 15
> 4-x
[1]  3  2  1  0 -1
> x^2
[1]  1  4  9 16 25
> x%%3
[1] 1 2 0 1 2
> y = c(3,0,7,1,5)
> x-y
[1] -2  2 -4  3  0
> y/x
[1] 3.000000 0.000000 2.333333 0.250000 1.000000
> z = matrix(1,nrow=3,ncol=2)
> z
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    1    1
> u=c(4,7)
> z%*%u
     [,1]
[1,]   11
[2,]   11
[3,]   11

Mathematical functions

Function   Description
abs        absolute value
exp        exponential (e to a power)
gamma      gamma function
log        natural logarithm (base e)
log10      logarithm (base 10)
sqrt       square root
cos        cosine
sin        sine
tan        tangent
acos       arc cosine

> log(2)
[1] 0.6931472
> gamma(0.5)
[1] 1.772454
> sqrt(16)
[1] 4

A few mathematical functions for matrices

Function   Description
diag       create diagonal matrix or extract diagonal values
solve      solve system of linear equations; find inverse
t          transpose

> diag(c(2,3,1))
     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    3    0
[3,]    0    0    1
> z = matrix(log(1:9),ncol=3)
> z
          [,1]     [,2]     [,3]
[1,] 0.0000000 1.386294 1.945910
[2,] 0.6931472 1.609438 2.079442
[3,] 1.0986123 1.791759 2.197225
> diag(z)
[1] 0.000000 1.609438 2.197225
> solve(z)
           [,1]      [,2]      [,3]
[1,]  -5.973389  13.88403  -7.84961
[2,]  23.995963 -67.36519  42.50270
[3,] -16.581171  47.99193 -30.27953
> z%*%solve(z)
             [,1]          [,2]          [,3]
[1,] 1.000000e+00  1.414147e-14  2.435552e-15
[2,] 2.435552e-15  1.000000e+00 -3.632511e-15
[3,] 2.706169e-16 -2.720046e-15  1.000000e+00

Note that because numerical computations have finite accuracy, the matrix product ZZ^(-1) is not exactly equal to the identity matrix, but it is very close: the off-diagonal entries are of order 1e-14 or smaller.

Functions for simple statistics

Function   Description
mean       arithmetic mean
median     median
min        smallest value
max        largest value
quantile   quantiles
range      min and max of a vector
sample     random sample
sum        arithmetic sum
var        variance and covariance

> x = c(2,8,5)
> mean(x)
[1] 5
> var(x)
[1] 9
> quantile(x)
  0%  25%  50%  75% 100% 
 2.0  3.5  5.0  6.5  8.0 

Functions for Probability Distributions

The name of every function for probability distributions begins with one of the letters d, p, q, r, followed by the name of the distribution (which is abbreviated in R).

Density (d)
These functions evaluate the p.d.f. or p.m.f. f(x) of the specified distribution. The first argument is the value of x; the other arguments specify the parameters of the distribution.

Probability (p)
These functions evaluate the c.d.f. F(x) of the specified distribution. The first argument is the value of x; the other arguments specify the parameters of the distribution.

Quantile (q)
These functions provide the desired quantile (percentile/100) of the specified distribution. The first argument is a probability between 0 and 1; the other arguments specify the parameters of the distribution.

Random sample (r)
These functions generate a random sample from the specified distribution and return it as a vector. The first argument is the desired size of the random sample; the other arguments specify the parameters of the distribution.

Note: R uses slightly different forms of parameters or variables for some distributions. For example, see the negative binomial distribution below.
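
As a rough illustration of how the four prefixes relate to one another, the sketch below uses the binomial and normal families. The particular choices Bin(10, 0.3) and N(0, 1), the seed, and the object names pmf, cdf, probs and draws are assumptions made for this example only.

```r
# Illustration only: Bin(10, 0.3) and N(0, 1) are arbitrary choices.
n = 10
p = 0.3

# For a discrete distribution, the c.d.f. (p) is the running sum of the p.m.f. (d).
pmf = dbinom(0:n, n, p)
cdf = pbinom(0:n, n, p)
all.equal(cumsum(pmf), cdf)            # expected to be TRUE up to rounding

# For a continuous distribution, the quantile function (q) inverts the c.d.f. (p).
probs = c(0.1, 0.5, 0.9)
all.equal(pnorm(qnorm(probs)), probs)  # expected to be TRUE up to rounding

# The r functions draw random values; set.seed makes the draws repeatable.
set.seed(1)
draws = rbinom(1000, n, p)
mean(draws)                            # should be near n*p = 3
```

Comparing with all.equal rather than == allows for the small rounding errors that floating-point arithmetic introduces.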

To find P(X = 4), where X has a binomial distribution with n = 7 and p = 0.2, i.e., X ~ Bin(7, 0.2):

> dbinom(4,7,0.2)
[1] 0.028672

Note that the 2nd argument is the number of trials n, and the 3rd argument is the probability of success p.

Recall that for the negative binomial distribution, X = number of trials required for the rth success. However, R uses a different variable: Y = number of failures in the sequence of trials, where the last trial ends in the rth success. So to find the probability P(X = 6) for a negative binomial random variable X with parameters r = 4 successes and p = 0.6:

> dnbinom(2,4,0.6)
[1] 0.20736

Note that the 1st argument is the value of Y = X - r (here 6 - 4 = 2), followed by r and p.

The p.d.f. f(x) of the gamma distribution with shape parameter 0.5 and scale parameter 3, evaluated at x = 1, is

> dgamma(1,shape=0.5,scale=3)
[1] 0.2333993

The c.d.f. F(x) of the uniform distribution over the interval (2, 6), evaluated at x = 3, is

> punif(3,2,6)
[1] 0.25

The 0.6 quantile, i.e., the 60th percentile, of the exponential distribution with mean 3.0 is

> qexp(0.6,1/3)
[1] 2.748872

Note that the 2nd argument of the function is the rate, i.e., the reciprocal of the mean.

To generate a random sample of size n = 12 from the normal distribution with mean 20 and standard deviation 3:

> rnorm(12,20,3)
 [1] 20.39824 25.20233 18.39812 19.65888 26.37264 20.22977 19.52098
 [8] 21.34009 20.21618 12.73368 18.79826 19.99685

Because the sample is random, the values will differ each time the command is run.

The abbreviations of other special distributions are given in your textbook.

Functions for graphing

To plot a graph, the set of x values and the set of y values must both be specified as vectors. The vectors x and y must have the same length.

For example, to plot the graph of y = e^x on the interval (-2, 2), first create a vector of 50 x values between -2 and 2, inclusive; then create the vector y and use both vectors in the plot command:

> x = seq(from=-2,to=2,len=50)
> y = exp(x)
> plot(x,y)

The command plot(x,y) draws the points specified by the vectors x and y as separate dots; they are not connected by a curve. To get a smooth curve of y = e^x, add the argument type="l", where l stands for line.

> plot(x,y,type="l")

To add or overlay another (lined) graph on the plot, say the graph of y = x^2, use the lines function:

> u = x^2
> lines(x,u)
> lines(x,u,lty=2)

The argument lty specifies the line type. lty=1 gives a solid line (the default, and the same as for the plot function with type="l"); lty=2, lty=3, and so on give different types of dashed and dotted lines.

To overlay a set of points on the graph, use the points function:

> plot(x,y,type="l")
> u = x^2
> points(x,u)

To specify the range of the axes (rather than settling for the default values determined by R), the following can be used:

> plot(x,y,type="l",xlim=c(-3,3),ylim=c(-1,10))

The arguments xlim=c(-3,3) and ylim=c(-1,10) set the limits of the x-axis to -3 and 3, and the limits of the y-axis to -1 and 10.
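
The graphing commands above can be combined into one short script. The following is a minimal sketch, not part of the original handout; in particular, the legend call and its labels are additions assumed here for illustration.

```r
# Both curves are evaluated on the same grid of x values.
x = seq(from=-2, to=2, len=50)
y = exp(x)    # y = e^x
u = x^2       # y = x^2

# Draw e^x as a solid line with explicit axis limits.
plot(x, y, type="l", xlim=c(-3,3), ylim=c(-1,10))

# Overlay x^2 as a dashed line (lty=2) so the two curves can be told apart.
lines(x, u, lty=2)

# Assumed addition: a legend matching each line type to its curve.
legend("topleft", legend=c("exp(x)", "x^2"), lty=c(1,2))
```

The plot call must come first because lines and legend only add to an existing plot; they do not start a new one.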