Stat 579: List Objects

Ranjan Maitra
2220 Snedecor Hall
Department of Statistics
Iowa State University.
Phone: 515-294-7757
maitra@iastate.edu

Example: Eigenvalues of a matrix

mm <- matrix(rpois(9, lambda = 3), ncol = 3)  # rpois() requires lambda; the value 3 is illustrative
emm <- eigen(mm)
mode(emm)

The eigen() function produces a special kind of R object called a list, the components of which are a vector of eigenvalues and a matrix of eigenvectors. Objects such as a vector, a matrix, or an array are examples of simple objects. A list, on the other hand, is a compound object, the components of which may be simple objects or other compound objects.

Lists - continued

Components of lists are numbered in sequence by default, and can also be assigned names when the list is created. A component may be referenced either by specifying its number in double square brackets, as in listname[[1]], or, more conveniently, by using an expression of the form

listname$componentname

if the components were assigned names when the list was created. This notation makes it easy to specify the required component of a given list. Many standard R functions return the results of computations as named components of a list. These named components are described in the help page of the function.

Lists - continued

Consider the lsfit() function used for performing least squares fitting. It returns the parameter estimates and the residuals in vector objects named coefficients (abbreviated to coef below) and residuals. To demonstrate, the model y = α + βx + ε is first fitted using lsfit(), with murder as the y-variable and illit as the x-variable, both taken from the state.x77 matrix:

> illit <- state.x77[, 3]
> murder <- state.x77[, 5]
> regout <- lsfit(illit, murder)
> regout$coef

The coefficients and the residuals are two of the named components of the list object that lsfit() returns. The other components of regout are listed in the description of lsfit(), and can also be obtained using names(regout).
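The access rules for list components can be tried directly on the result of lsfit(). The following is a minimal sketch that assumes only the built-in state.x77 matrix; the names by_dollar and by_bracket are illustrative and do not come from the slides.

```r
# Fit murder rate on illiteracy, as in the slide example
illit  <- state.x77[, "Illiteracy"]   # same column as state.x77[, 3]
murder <- state.x77[, "Murder"]       # same column as state.x77[, 5]
regout <- lsfit(illit, murder)

mode(regout)     # the result is a list
names(regout)    # names of all components of the list

# A named component can be extracted with $ or with double brackets
by_dollar  <- regout$coefficients
by_bracket <- regout[["coefficients"]]
identical(by_dollar, by_bracket)

# Single brackets return a sub-list rather than the component itself
mode(regout["coefficients"])     # a list of length one
mode(regout[["coefficients"]])   # the numeric vector inside it
```

The distinction between single and double brackets matters whenever the extracted component is passed on to a function that expects a vector or a matrix.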
For accessing components of a list such as the results from lsfit(), the component names may be abbreviated to the minimum number of letters needed to identify them uniquely.

Creating List Objects

The list() function is used to create a list. If names are assigned to the components of a list, then they can be accessed later using the $ notation illustrated above. Double square brackets may be used to access the components of a list whether or not the components were assigned names, as in this example:

> h <- c(15.1, 11.3, 7.0, 9.0)
> names(h) <- c("APE", "BOX", "CAT", "DOG")
> hm <- list(h, mm)          # unnamed components; mm is the matrix from the eigenvalue example
> hm <- list(h = h, m = mm)  # or, with named components
> hm[[1]]                    # accesses the first component, as does hm$h for the named version

A very important use of lists is to return the results of computations carried out inside a user-constructed function. As such results could consist of different types of objects (vectors, matrices, and data frames), a list is an ideal format for combining them into a single object to be returned as the result of evaluating a function.

Mapping Lists and “Ragged” Arrays

A list cannot be passed directly to a function that expects an atomic argument; for example, sqrt(hm) produces an error. A mapping function such as sapply() can be used for this purpose instead. In the following example, the R function log() is applied to each component of the list hm created in the earlier example. Here the result is a list with the same structure as the original list.

> mode(hm)
> hm1 <- sapply(hm, log)

A “ragged” array refers to subsets of values of a vector that correspond to the same levels of a factor (or of several factors). It is called “ragged” because the lengths of these vector subsets may not be the same. An example is the vector weight, whose subsets are defined by the corresponding levels of the factor feed:

> chickwts
  weight      feed
1    179 horsebean
2    160 horsebean
...

Ragged Arrays, Lists and tapply()

The tapply() function maps a function to a ragged array.
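To make the idea of a ragged array concrete, the subsets can be built explicitly and compared with what tapply() computes. This is a minimal sketch using the built-in chickwts data frame; split() is not used in the slides and is brought in here only for illustration, and the names groups, means_by_hand and means_tapply are hypothetical.

```r
# chickwts is a built-in data frame with columns weight and feed
groups <- split(chickwts$weight, chickwts$feed)   # list with one weight vector per feed level

mode(groups)             # the subsets are held in a list
sapply(groups, length)   # the subset lengths differ, hence "ragged"

# Applying mean() to each subset by hand ...
means_by_hand <- sapply(groups, mean)

# ... gives the same numbers as tapply() on the original vector and factor
means_tapply <- tapply(X = chickwts$weight, INDEX = chickwts$feed, FUN = mean)
all.equal(as.vector(means_by_hand), as.vector(means_tapply))
```

The as.vector() calls drop the names and dimensions so that only the numeric values are compared.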
The function call is of the form

tapply(X, INDEX, FUN, ...)

The first argument X is an atomic R object, typically a vector; the second argument INDEX is a factor or a list of factors, each of the same length as X; and the third argument FUN is the function to be applied, followed by values for any other arguments it needs. If the function returns a scalar, then tapply() returns an array with one dimension for each factor in INDEX, the extent of each dimension being the number of levels of that factor. In the following example, several built-in R functions are applied to weight using tapply():

> attach(chickwts)
> tapply(X = weight, INDEX = feed, FUN = sum)
> wtm <- tapply(X = weight, INDEX = feed, FUN = mean)
> wtsd <- tapply(X = weight, INDEX = feed, FUN = sd)

The tapply() function – More examples

Some more examples, computing 95% confidence limits for the mean weight under each feed:

> wtsd <- tapply(X = weight, INDEX = feed, FUN = sd)
> n <- tapply(X = weight, INDEX = feed, FUN = length)
> Lower <- wtm - qt(p = 0.975, df = n - 1) * wtsd / sqrt(n)
> Upper <- wtm + qt(p = 0.975, df = n - 1) * wtsd / sqrt(n)
> climits <- rbind(lower = Lower, upper = Upper)
> climits

tapply() can be used even if the arrays are not “ragged”, i.e., even if the lengths of the subset vectors are the same. In the following example, we use the cabbages object from the MASS package:

> help(cabbages, package = "MASS")
> data(cabbages, package = "MASS")
> with(cabbages, tapply(X = HeadWt, INDEX = list(Cult, Date), FUN = length))
> with(cabbages, tapply(X = HeadWt, INDEX = list(Cult, Date), FUN = mean))

Sweeping Out Arrays

Suppose we are required to subtract the column means of a matrix from the elements in the corresponding columns. This kind of operation is called sweeping out a matrix, and in general applies to arrays of any dimension. The form of the function as applied to a matrix is:

sweep(x, MARGIN, STATS, FUN = "-", ...)

where x is the matrix and the arguments MARGIN and FUN are defined as for the apply() function, except that the default value of FUN is the subtraction operator.
The value of the STATS argument (the third argument) is the summary statistic that is to be swept out.

Example: Sweeping Out Arrays

> m1 <- matrix(1:6, ncol = 2)
> sweep(m1, 2, colMeans(m1))
     [,1] [,2]
[1,]   -1   -1
[2,]    0    0
[3,]    1    1
> lenth <- sqrt(apply(m1^2, 1, sum))
> lenth
[1] 4.123106 5.385165 6.708204
> norm1 <- sweep(m1, 1, lenth, "/")
> apply(norm1^2, 1, sum)
[1] 1 1 1
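The two sweeps in the example can be checked against explicit arithmetic. The following is a minimal sketch under the same setup; rowSums() is swapped in for apply(m1^2, 1, sum) purely as a shorthand, and the names centered, by_hand and row_len are illustrative.

```r
m1 <- matrix(1:6, ncol = 2)

# Column centering: sweep() with its default subtraction ...
centered <- sweep(m1, 2, colMeans(m1))
# ... matches subtracting each column mean explicitly
by_hand <- cbind(m1[, 1] - mean(m1[, 1]), m1[, 2] - mean(m1[, 2]))
all.equal(centered, by_hand)
colMeans(centered)       # column means after centering

# Row normalization: divide each row by its Euclidean length
row_len <- sqrt(rowSums(m1^2))
norm1 <- sweep(m1, 1, row_len, "/")
rowSums(norm1^2)         # squared length of each normalized row
```

Using MARGIN = 2 pairs each element of STATS with a column, while MARGIN = 1 pairs it with a row, which is why the length of STATS must match the corresponding dimension of the matrix.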