Introduction to R

Any entity that exists in R is called an object. Some examples of an object are a vector, a matrix,
and a function.
To create a vector x = (5, 3, 14), the following could be used:
> x = c(5,3,14)
> x
[1] 5 3 14
Here = is the assignment operator, which assigns the vector (5, 3, 14) to the object x. The c
function “combines” similar objects into a vector.
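As a minimal sketch of the same idea, the commands below use the arrow operator <-, which R also accepts for assignment, and show c combining existing vectors with new values. The object names a, b and ab are illustrative only and do not come from the handout.

```r
# Illustrative object names (a, b, ab); not part of the original handout.
a <- c(5, 3, 14)     # `<-` assigns just as `=` does at the top level
b <- c(a, 7, 9)      # c() can combine an existing vector with new values
ab <- c(a, a)        # ... or a vector with itself

print(b)             # 5 3 14 7 9
print(length(ab))    # 6
```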
Generating patterned data
Using the sequence operator, written as a colon (:):
> y = 1:8
> y
[1] 1 2 3 4 5 6 7 8
Using the seq function:
> x = seq(from=-1,to=1,by=0.2)
> x
 [1] -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0
The argument by=0.2 in the function specifies the increment size of the sequence.
Alternatively, one could use a different argument in the seq function:
> seq(from=-1,to=1,len=10)
 [1] -1.0000000 -0.7777778 -0.5555556 -0.3333333 -0.1111111  0.1111111
 [7]  0.3333333  0.5555556  0.7777778  1.0000000
The argument len= (an abbreviation of the full argument name length.out) specifies the length of
the sequence.
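The two ways of calling seq are related: with n equally spaced points, the increment is (to - from)/(n - 1). The sketch below checks this for the first example, using the full argument name length.out; the object names are illustrative only.

```r
# Two descriptions of the same sequence from -1 to 1.
by_version  <- seq(from = -1, to = 1, by = 0.2)         # fixed increment
len_version <- seq(from = -1, to = 1, length.out = 11)  # fixed number of points

# With n points, the increment is (to - from) / (n - 1).
increment <- (1 - (-1)) / (11 - 1)

print(increment)                                    # 0.2
print(length(by_version))                           # 11
print(isTRUE(all.equal(by_version, len_version)))   # TRUE
```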
To generate repeated values, you may use the rep function. The first argument specifies the
number or object to be repeated and the second argument determines the number of repetitions.
> rep(4,7)
[1] 4 4 4 4 4 4 4
> rep(1:5,3)
[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
> rep(1:5,c(2,1,3,2,2))
[1] 1 1 2 3 3 3 4 4 5 5
In the previous command, the vector c(2,1,3,2,2) in the second argument tells R to produce
a vector with two 1’s, one 2, three 3’s, two 4’s and two 5’s. This vector must be of the same
length as the vector in the first argument.
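The second argument of rep can also be given by name. As a short sketch, times= repeats the whole vector (or each element a given number of times when it is a vector), while each= repeats every element the same number of times. The object names are illustrative only.

```r
v <- 1:3

r_times  <- rep(v, times = 2)           # whole vector repeated twice
r_each   <- rep(v, each = 2)            # every element repeated twice
r_counts <- rep(v, times = c(3, 1, 2))  # one repetition count per element

print(r_times)    # 1 2 3 1 2 3
print(r_each)     # 1 1 2 2 3 3
print(r_counts)   # 1 1 1 2 3 3
```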
To find the length of a vector, use the function length:
> x = 5:12
> x
[1]  5  6  7  8  9 10 11 12
> length(x)
[1] 8
Creating matrices
To create a matrix, use the matrix function. The first argument in the function is a vector, with
the additional arguments specifying the number of rows or columns, and other attributes.
> x = c(3,5,1,2,9,7)
> my.matrix = matrix(x,nrow=2)
> my.matrix
     [,1] [,2] [,3]
[1,]    3    1    9
[2,]    5    2    7
> my.matrix = matrix(x,ncol=2)
> my.matrix
     [,1] [,2]
[1,]    3    2
[2,]    5    9
[3,]    1    7
To transpose a matrix, use the t function:
> t(my.matrix)
     [,1] [,2] [,3]
[1,]    3    5    1
[2,]    2    9    7
Note that, by default, R fills a matrix from the vector given in the first argument column by
column, from left to right. To fill the matrix row by row, from top to bottom, set the byrow=
argument to TRUE, e.g., byrow=T (T is shorthand for TRUE).
> my.matrix = matrix(x,ncol=2,byrow=T)
> my.matrix
     [,1] [,2]
[1,]    3    5
[2,]    1    2
[3,]    9    7
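Filling row by row is equivalent to filling column by column and then transposing. The sketch below checks this with the vector used above; dim reports the number of rows and columns. The names by_row and by_column are illustrative only.

```r
x <- c(3, 5, 1, 2, 9, 7)

by_row    <- matrix(x, ncol = 2, byrow = TRUE)  # fill row by row (3 x 2)
by_column <- matrix(x, nrow = 2)                # default: fill column by column (2 x 3)

print(dim(by_row))                       # 3 2
print(identical(by_row, t(by_column)))   # TRUE
```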
Arithmetic operators
Operator    Function
+           addition
-           subtraction
*           multiplication
/           division
%*%         matrix multiplication
^           exponentiation
%%          modulus
> x = 1:5
> x*3
[1] 3 6 9 12 15
> 4-x
[1] 3 2 1 0 -1
> x^2
[1] 1 4 9 16 25
> x%%3
[1] 1 2 0 1 2
> y = c(3,0,7,1,5)
> x-y
[1] -2 2 -4 3 0
> y/x
[1] 3.000000 0.000000 2.333333 0.250000 1.000000
> z = matrix(1,nrow=3,ncol=2)
> z
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    1    1
> u=c(4,7)
> z%*%u
     [,1]
[1,]   11
[2,]   11
[3,]   11
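The operators * and %*% are easy to confuse when both operands are matrices: * multiplies matching entries, while %*% forms the row-by-column matrix product. A small sketch with an illustrative 2 x 2 matrix a (not from the handout):

```r
a <- matrix(1:4, nrow = 2)   # rows are (1, 3) and (2, 4)

elementwise <- a * a         # entry-by-entry product
matprod     <- a %*% a       # matrix product

print(elementwise)           # rows: (1, 9) and (4, 16)
print(matprod)               # rows: (7, 15) and (10, 22)
```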
Mathematical functions
Function    Description
abs         absolute value
exp         exponential (e to a power)
gamma       gamma function
log         natural logarithm (base e)
log10       logarithm (base 10)
sqrt        square root
cos         cosine
sin         sine
tan         tangent
acos        arc cosine
> log(2)
[1] 0.6931472
> gamma(0.5)
[1] 1.772454
> sqrt(16)
[1] 4
A few mathematical functions for matrices
Function    Description
diag        create diagonal matrix or extract diagonal values
solve       solve system of linear equations; find inverse
t           transpose
> diag(c(2,3,1))
     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    3    0
[3,]    0    0    1
> z = matrix(log(1:9),ncol=3)
> z
          [,1]     [,2]     [,3]
[1,] 0.0000000 1.386294 1.945910
[2,] 0.6931472 1.609438 2.079442
[3,] 1.0986123 1.791759 2.197225
> diag(z)
[1] 0.000000 1.609438 2.197225
> solve(z)
           [,1]      [,2]      [,3]
[1,]  -5.973389  13.88403  -7.84961
[2,]  23.995963 -67.36519  42.50270
[3,] -16.581171  47.99193 -30.27953
> z%*%solve(z)
             [,1]          [,2]          [,3]
[1,] 1.000000e+00  1.414147e-14  2.435552e-15
[2,] 2.435552e-15  1.000000e+00 -3.632511e-15
[3,] 2.706169e-16 -2.720046e-15  1.000000e+00
Note that because of the finite accuracy in the numerical computations, the matrix product
Z Z^{-1} is not exactly equal to the identity matrix, but it is very close.
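Because of this rounding error, it is usually safer to compare numerical results up to a small tolerance rather than exactly. The sketch below uses all.equal for that purpose, and also shows the other use of solve listed in the table, solving a linear system; the matrix a and vector b are made-up values for illustration.

```r
z <- matrix(log(1:9), ncol = 3)
product <- z %*% solve(z)

# Compare with the 3 x 3 identity matrix up to a small numerical tolerance.
print(isTRUE(all.equal(product, diag(3))))   # TRUE

# solve(a, b) solves the linear system a %*% s = b for s.
# Here: 2*s1 + s2 = 3 and s1 + 3*s2 = 5 (illustrative values).
a <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)
s <- solve(a, b)
print(s)                                     # 0.8 1.4
```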
Functions for simple statistics
Function    Description
mean        arithmetic mean
median      median
min         smallest value
max         largest value
quantile    quantiles
range       min and max of a vector
sample      random sample
sum         arithmetic sum
var         variance and covariance
> x = c(2,8,5)
> mean(x)
[1] 5
> var(x)
[1] 9
> quantile(x)
  0%  25%  50%  75% 100%
 2.0  3.5  5.0  6.5  8.0
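The value of var(x) above is the sample variance, which divides by n - 1 rather than n. A brief sketch that reproduces it by hand with the same data (manual_var is an illustrative name):

```r
x <- c(2, 8, 5)
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
manual_var <- sum((x - mean(x))^2) / (n - 1)

print(manual_var)   # 9
print(range(x))     # 2 8
print(sum(x))       # 15
```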
Functions for Probability Distributions
All the functions for probability distributions begin with one of the letters d, p, q, or r, followed
by the name of the distribution (which is abbreviated in R).
Density (d)
These functions evaluate the p.d.f. or p.m.f. f(x) of the specified distribution. The first argument
is the value of x; the other arguments specify the parameters of the distribution.
Probability (p)
These functions evaluate the c.d.f. F(x) of the specified distribution. The first argument is the
value of x; the other arguments specify the parameters of the distribution.
Quantile (q)
These functions provide the desired quantile (percentile/100) of the specified distribution. The
first argument is the value of probability between 0 and 1; the other arguments specify the
parameters of the distribution.
Random sample (r)
These functions generate a random sample, returned as a vector, from the specified distribution.
The first argument is the desired size of the random sample; the other arguments specify the
parameters of the distribution.
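The four families are connected, which the following sketch illustrates: for a discrete distribution the p function is a running sum of the d function, the q function inverts the p function, and set.seed makes the output of an r function reproducible. The distributions, the seed value and the object names are chosen only for illustration.

```r
# p as a sum of d: P(X <= 2) for X ~ Bin(7, 0.2), computed two ways.
cdf_direct <- pbinom(2, 7, 0.2)
cdf_summed <- sum(dbinom(0:2, 7, 0.2))
print(isTRUE(all.equal(cdf_direct, cdf_summed)))   # TRUE

# q inverts p: the quantile of pnorm(1.5) recovers 1.5 (standard normal).
round_trip <- qnorm(pnorm(1.5))
print(round_trip)                                  # 1.5

# r draws random values; resetting the seed reproduces the same draw.
set.seed(1)      # arbitrary seed
draw1 <- runif(3)
set.seed(1)
draw2 <- runif(3)
print(identical(draw1, draw2))                     # TRUE
```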
Note: R uses slightly different forms of parameters or variables for some distributions. For
example, see the negative binomial distribution below.
To find P(X = 4), where X has a binomial distribution with n = 7 and p = 0.2, i.e., X ~ Bin(7, 0.2):
> dbinom(4,7,0.2)
[1] 0.028672
Note that the 2nd argument is the number of trials n, and the third argument is the probability of
success p.
Recall that for the negative binomial distribution, X = number of trials required for the rth
success. However, R uses a different variable: Y = number of failures in the sequence of trials
where the last trial ends in the rth success. So to find the probability P(X = 6) for a negative
binomial random variable X with parameters r = 4 successes and p = 0.6:
> dnbinom(2,4,0.6)
[1] 0.20736
Note that the 1st argument is the value of Y = X - r, followed by r and p.
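To avoid converting from trials to failures by hand each time, the conversion can be wrapped in a small helper. The function dnbinom_trials below is a hypothetical convenience function, not part of R; it simply subtracts r before calling dnbinom. The second computation checks the result against the p.m.f. formula C(x-1, r-1) p^r (1-p)^(x-r).

```r
# Hypothetical helper (not an R built-in): P(X = x), where X is the number of
# trials needed for the r-th success and p is the success probability.
dnbinom_trials <- function(x, r, p) {
  dnbinom(x - r, size = r, prob = p)   # R counts failures, Y = X - r
}

via_helper  <- dnbinom_trials(6, 4, 0.6)
via_formula <- choose(6 - 1, 4 - 1) * 0.6^4 * (1 - 0.6)^(6 - 4)

print(via_helper)    # 0.20736
print(via_formula)   # 0.20736
```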
The p.d.f. f(x) of the gamma distribution with shape parameter 0.5 and scale parameter 3,
evaluated at x = 1, is
> dgamma(1,shape=0.5,scale=3)
[1] 0.2333993
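The gamma functions in R accept either a scale= or a rate= argument, where rate = 1/scale. As a sketch, the same density value can be obtained both ways and checked against the formula x^(k-1) e^(-x/s) / (Gamma(k) s^k), with shape k and scale s. The object names are illustrative only.

```r
shape_k <- 0.5
scale_s <- 3

d_scale <- dgamma(1, shape = shape_k, scale = scale_s)
d_rate  <- dgamma(1, shape = shape_k, rate = 1 / scale_s)   # rate = 1 / scale

# Density formula evaluated at x = 1.
d_manual <- 1^(shape_k - 1) * exp(-1 / scale_s) / (gamma(shape_k) * scale_s^shape_k)

print(d_scale)                                # 0.2333993
print(isTRUE(all.equal(d_scale, d_rate)))     # TRUE
print(isTRUE(all.equal(d_scale, d_manual)))   # TRUE
```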
The c.d.f. F(x) of the uniform distribution over the interval (2, 6), evaluated at x = 3, is
> punif(3,2,6)
[1] 0.25
The 0.6 quantile, i.e., the 60th percentile, of the exponential distribution with mean 3.0 is
> qexp(0.6,1/3)
[1] 2.748872
Note that the 2nd argument of the function is the rate, i.e., the reciprocal of the mean (here 1/3).
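The quantile can also be found in closed form by solving F(x) = 1 - exp(-x/m) = 0.6 for x, where m = 3 is the mean and 1/m is the rate that R expects. The sketch below compares this with qexp; the object names are illustrative only.

```r
m <- 3                                   # mean of the distribution; rate is 1 / m

q_builtin <- qexp(0.6, rate = 1 / m)
q_formula <- -m * log(1 - 0.6)           # solves 1 - exp(-x / m) = 0.6

print(q_builtin)                         # 2.748872
print(q_formula)                         # 2.748872
print(pexp(q_builtin, rate = 1 / m))     # 0.6
```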
To generate a random sample of size n = 12 from the normal distribution with mean μ = 20
and standard deviation σ = 3:
> rnorm(12,20,3)
[1] 20.39824 25.20233 18.39812 19.65888 26.37264 20.22977 19.52098
[8] 21.34009 20.21618 12.73368 18.79826 19.99685
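The values differ every time rnorm is run. As a rough check that the parameters are interpreted as mean and standard deviation, a large sample should have a sample mean near 20 and a sample standard deviation near 3. The sketch below assumes an arbitrary seed and sample size, and uses the base function sd, which the tables above do not list.

```r
set.seed(123)                          # arbitrary seed, only for reproducibility
s <- rnorm(10000, mean = 20, sd = 3)   # large sample, size chosen for illustration

print(length(s))                       # 10000
print(mean(s))                         # close to 20
print(sd(s))                           # close to 3
```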
The abbreviations of other special distributions are given in your textbook.
Functions for graphing
In order to plot a graph, the set of values of x and the set of values of y, both of which are in
vector form, must be specified. The lengths of the vector x and vector y must be the same.
For example, to plot the graph of y = e^x on the interval (-2, 2), first create a vector of 50 x values
between -2 and 2, inclusive; then create the vector y and use the vectors in the plot command:
> x = seq(from=-2,to=2,len=50)
> y = exp(x)
> plot(x,y)
The command plot(x,y) draws only the individual points specified by the vectors x and y; the
points are not connected by a curve. To get a smooth curve of y = e^x, add the argument
type="l", where l stands for line.
> plot(x,y,type="l")
To add or overlay another (lined) graph on the plot, say the graph of y = x^2, use the lines
function:
> u = x^2
> lines(x,u)
> lines(x,u,lty=2)
The argument lty specifies the line type; lty = 1 gives a solid line (the same as that of the
plot function with type="l"), while lty = 2, lty = 3, and so on give different types of
dotted and dashed lines.
To overlay a set of points on the graph, use the points function:
> plot(x,y,type="l")
> u = x^2
> points(x,u)
To specify the dimensions of the graph (rather than settling for the default values determined by
R), the following can be used:
> plot(x,y,type="l",xlim=c(-3,3),ylim=c(-1,10))
The arguments xlim=c(-3,3) and ylim=c(-1,10) set the limits of the x-axis to -3 and 3,
and the limits of the y-axis to -1 and 10.
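The graphing commands above can be collected into one short script. The sketch below also adds axis labels through the xlab and ylab arguments and a legend through the legend function; these additions are not covered in this handout and are included only as an illustration.

```r
x <- seq(from = -2, to = 2, length.out = 50)
y <- exp(x)   # first curve: y = e^x
u <- x^2      # second curve: y = x^2

# Solid line for exp(x), with explicit axis limits and labels.
plot(x, y, type = "l", xlim = c(-3, 3), ylim = c(-1, 10),
     xlab = "x", ylab = "y")

# Dashed line for x^2 overlaid on the same axes.
lines(x, u, lty = 2)

# Legend matching the two line types.
legend("topleft", legend = c("exp(x)", "x^2"), lty = c(1, 2))
```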