Stat 579: List Objects Ranjan Maitra

2220 Snedecor Hall
Department of Statistics
Iowa State University.
Phone: 515-294-7757
maitra@iastate.edu
Example: Eigenvalues of a matrix
mm <- matrix(rpois(9, lambda = 5), ncol = 3)  # rpois() requires lambda; 5 is an arbitrary choice
emm <- eigen(mm)
mode(emm)
The eigen() function returns a special kind of R object called
a list, the components of which are a vector of
eigenvalues and a matrix of eigenvectors.
Objects such as a vector, a matrix, or an array, are
examples of simple objects.
A list, on the other hand, is a compound object, the
components of which may consist of several simple objects
or other compound objects.
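As a rough illustration of this compound structure, the sketch below inspects the list returned by eigen(). The small symmetric matrix a is a made-up example and is not part of the original slides.

```r
# Hypothetical 2 x 2 symmetric matrix, chosen only for illustration
a <- matrix(c(2, 1, 1, 2), ncol = 2)
ea <- eigen(a)

mode(ea)               # "list"
names(ea)              # "values" "vectors"
is.vector(ea$values)   # the eigenvalues are stored as a numeric vector
is.matrix(ea$vectors)  # the eigenvectors are stored as a matrix
```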
Lists - continued
Components of lists, by default, are numbered in
sequence, or can be assigned names when they are
created.
Components may be referenced either by specifying a
number in double square brackets, as in listname[[1]], or,
more conveniently, by using an expression of the form:
listname$componentname
if the components were assigned names when the list was
created.
The above symbolism is a very useful convention as it
makes it easier for users to specify the required
component of a given list. Many standard R functions
return the results of computations as named components
of a list. These named components are described in the
function description.
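The two ways of referencing a component can be sketched as follows. The list lst and its component names are hypothetical and serve only to show the notation.

```r
# Hypothetical list with two named components
lst <- list(id = 1:3, label = "abc")

lst[[2]]        # second component, referenced by position
lst$label       # the same component, referenced by name
lst[["label"]]  # the same component, by name inside double brackets
lst[2]          # single brackets return a sub-list, not the component itself
```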
Lists – continued
Consider the lsfit() function used for performing least
squares fitting. It returns the parameter estimates and the
residuals in vector objects named coefficients and residuals.
To demonstrate, the model y = α + βx + ε is first fitted
using the R function lsfit(), with murder as the
y-variable and illit as the x-variable, both
variables taken from the state.x77 matrix:
> illit <- state.x77[,3]
> murder <- state.x77[,5]
> regout <- lsfit(illit, murder)
> regout$coef
> regout$residuals
These are two of the named components of the list
object the function lsfit() returns. Other components of
regout are listed in the description of lsfit(), and
could also be obtained using names(regout).
When components of a list such as the result of lsfit() are
accessed with the $ operator, their names may be abbreviated
to the minimum number of letters needed to identify them
uniquely (for example, coef for coefficients).
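The abbreviation rule can be checked directly. This is a minimal sketch that refits the same regression, selecting the columns of state.x77 by name rather than by number; note that double square brackets match names exactly unless told otherwise.

```r
illit  <- state.x77[, "Illiteracy"]
murder <- state.x77[, "Murder"]
regout <- lsfit(illit, murder)

names(regout)                    # full names of the components
regout$coef                      # $ partially matches "coefficients"
regout[["coef"]]                 # NULL: [[ ]] matches names exactly by default
regout[["coef", exact = FALSE]]  # partial matching requested explicitly
```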
Creating List Objects
The list() function is used to create a list. If names are
assigned to the components of a list, then they can be
accessed later using the $ notation illustrated above.
Double square brackets may be used to access
the components of a list, whether the components were
assigned names or not, as in this example:
> h <- c(15.1, 11.3, 7.0, 9.0)
> names(h) <- c("APE", "BOX", "CAT", "DOG")
> hm <- list(h, mm)
or, with named components,
> hm <- list(h = h, m = mm)
hm[[1]] accesses the first component in either case; hm$h
does so only when the components are named.
A very important use of lists is for the purpose of returning
results of computations carried out inside a
user-constructed function. As such results could consist of
different types of objects (vectors, matrices, and data
frames), a list is an ideal format to combine them all
together as a single object to be returned as the result of
evaluating a function.
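A minimal sketch of this use of lists is given below. The function summarise_vec() and the names of its components are hypothetical; it simply bundles results of different types into one returned object.

```r
# Hypothetical helper: returns several results of different types in one list
summarise_vec <- function(x) {
  list(n        = length(x),                           # a single number
       mean     = mean(x),                             # a single number
       extremes = range(x),                            # a vector of length 2
       table    = data.frame(x = x, dev = x - mean(x)) # a data frame
  )
}

res <- summarise_vec(c(15.1, 11.3, 7.0, 9.0))
res$n         # number of observations
res$mean      # sample mean
res$extremes  # minimum and maximum
```

The caller then picks out whichever components are needed with the $ notation.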
Mapping Lists and “Ragged” Arrays
Lists cannot be passed directly to R functions that expect
atomic arguments, such as the sqrt() function.
The sapply() function, another example of an R
mapping function, can be used for this purpose. In the
following example, the R function log() is applied to all
components of the list hm created in the earlier example.
Because the components of hm differ in length, the result is
a list with the same structure as the original list.
> mode(hm)
> hm1 <- sapply(hm, log)
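The failure of sqrt() on a list, and the mapping alternative, can be sketched as follows. The list lst is a made-up stand-in for hm with values chosen so that the square roots are easy to read; lapply() is shown alongside sapply() because it always returns a list.

```r
# Hypothetical list standing in for hm
lst <- list(h = c(1, 4, 9), m = matrix(c(16, 25, 36, 49), ncol = 2))

# sqrt() rejects a list; try() captures the error instead of stopping
res <- try(sqrt(lst), silent = TRUE)
inherits(res, "try-error")

# Map sqrt() over the components instead
out <- lapply(lst, sqrt)  # always returns a list
out$h                     # square roots of the first component
out$m                     # the matrix component keeps its dimensions
sapply(lst, sqrt)         # also a list here, as the components differ in length
```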
A “ragged” array refers to subsets of values of a vector that
correspond to the same levels of a factor (or several
factors). It is called “ragged” because the lengths of these
vector subsets may not be the same. An example is the
vector weight whose subsets are defined by the
corresponding levels of the factor feed:
> chickwts
  weight      feed
1    179 horsebean
2    160 horsebean
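One way to see the raggedness is to split weight by feed and count the values in each subset. This is only an illustrative sketch using split(), which the slides do not otherwise discuss.

```r
# Split the weights into one vector per level of feed
groups <- split(chickwts$weight, chickwts$feed)

class(groups)           # "list": one component per feed type
sapply(groups, length)  # the subsets have unequal lengths
```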
Ragged Arrays, Lists and tapply()
The tapply() function maps a function to a ragged array.
The function call is of the form
tapply(X, INDEX, FUN, ...)
wtm <- with(chickwts, tapply(X = weight, INDEX = feed, FUN = mean))
wtsd <- with(chickwts, tapply(X = weight, INDEX = feed, FUN = sd))
The first argument is an atomic R object, typically a vector;
the second argument, INDEX, is a factor or a list of factors,
each of the same length as the first argument; and the third
argument is the function to be applied, followed by values for
any other arguments it needs.
If the function returns a scalar, then tapply() returns an
array with one dimension per factor in INDEX, each dimension
having one entry per level of that factor. In the following
example, a built-in R function is applied to weight
using tapply():
> attach(chickwts)
> tapply(X = weight, INDEX = feed, FUN = sum)
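Two points from the description above can be sketched briefly: extra arguments after FUN are passed on to it, and a function that returns more than one value per group yields a list-mode result rather than a numeric array. The object names med and rng are hypothetical.

```r
# Extra arguments after FUN are passed on to it (here probs = 0.5)
med <- with(chickwts, tapply(weight, feed, quantile, probs = 0.5))
length(med)  # one entry per level of feed

# A non-scalar result per group gives an array of mode "list"
rng <- with(chickwts, tapply(weight, feed, range))
mode(rng)    # "list"
rng[[1]]     # minimum and maximum weight for the first feed level
```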
The tapply() function – More examples
Some more examples:
> wtsd <- tapply(X = weight, INDEX = feed, FUN = sd)
> n <- tapply(X = weight, INDEX = feed, FUN = length)
> Lower <- wtm - qt(0.975, df = n - 1) * wtsd / sqrt(n)
> Upper <- wtm + qt(0.975, df = n - 1) * wtsd / sqrt(n)
> climits <- rbind(lower = Lower, upper = Upper)
> climits
The tapply() function can be used even if the arrays are not
“ragged”, i.e., even if the lengths of the subset vectors are
the same. In the following example, we use the cabbages
object from the MASS package:
> help(cabbages,package="MASS")
> data(cabbages,package="MASS")
> with(cabbages, tapply(X = HeadWt, INDEX = list(Cult, Date), FUN = length))
> with(cabbages, tapply(X = HeadWt, INDEX = list(Cult, Date), FUN = mean))
Sweeping Out Arrays
Suppose we are required to subtract the column means of
a matrix from the elements in the corresponding columns.
This kind of operation is called sweeping out a matrix,
and in general applies to arrays of any dimension. The
form of the function as applied to a matrix is:
sweep(x, MARGIN, STATS, FUN = "-", ...)
where the arguments MARGIN and FUN are defined
as for the apply() function, except that the default value
of the FUN argument is the operator for
subtraction. The value of the STATS argument is the summary
statistic that is to be swept out.
Example: Sweeping Out Arrays
> m1 <- matrix(1:6, ncol = 2)
> sweep(m1, 2, colMeans(m1))
     [,1] [,2]
[1,]   -1   -1
[2,]    0    0
[3,]    1    1
> lenth <- sqrt(apply(m1^2, 1, sum))
> lenth
[1] 4.123106 5.385165 6.708204
> norm1 <- sweep(m1, 1, lenth, "/")
> apply(norm1^2, 1, sum)
[1] 1 1 1
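As a cross-check on the example, the sketch below compares column centering by sweep() with the built-in scale() function, and then sweeps with a different operator. The object names centered, same and scaled are hypothetical.

```r
m1 <- matrix(1:6, ncol = 2)

# Centering by sweep() agrees with scale(), apart from attributes
centered <- sweep(m1, 2, colMeans(m1))
same <- scale(m1, center = TRUE, scale = FALSE)
all.equal(centered, same, check.attributes = FALSE)

# Sweeping with a different function: divide each column by its maximum
scaled <- sweep(m1, 2, apply(m1, 2, max), FUN = "/")
apply(scaled, 2, max)  # each column now has maximum 1
```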