Objects in S-Plus - Department of Mathematics and Statistics

S-PLUS
Course Notes for STAT 462/862
2000 Edition
for S-Plus 2000 for Windows
© 1996, 1997 Deanna Rothwell
Updated 2000 by Andrew Day
Table of Contents
Introduction to S-PLUS
Background
S-PLUS Capabilities
Available Documentation
Starting and Ending an S-PLUS Session
S-PLUS Language and Syntax
Getting Help
Objects in S-PLUS
Intro to Objects
Assigning Objects
Storing Objects
Listing or Removing Objects
Object Names
S-PLUS Namespace
Displaying Objects
Object Attributes
Data Values
Common Values
Special Values
Coercion
Notes on Logical Values
Notes on Missing Values
Saving S-PLUS Source Code and Output in External Files
S-PLUS Source Code
S-PLUS Output
Vectors
Properties
Attributes
Creating Vectors
Naming elements
Subsetting
Vector Arithmetic
Matrices
Properties
Attributes
Creating Matrices
Subsetting
Matrix Arithmetic
Matrix Manipulation
Row and Column Names
Lists
Properties
Attributes
Creating Lists
Naming Components
Subsetting
Attaching and Detaching Lists
Data Frames
Properties
Attributes
Creating a Data Frame
Subsetting
Attaching and Detaching Frames
Matrix Manipulation
Changing a Data Frame to a Matrix or Vice Versa
Factors
Properties:
Attributes
Creating a Factor from a Vector
Labeling Levels of a Factor
Categorizing Continuous Data
Some Useful Functions Utilizing Factors
Reading Data into S-PLUS
Reading From the Keyboard
Reading from ASCII Files
Functions in S-PLUS
Expressions
Groups
The IF/ELSE Conditional
The FOR Loop
The WHILE Loop
Functions
Creating, Updating, Editing Functions
Returning More than one Object
Arguments to Functions
The .FIRST Function
Probability and Random Numbers
Function Syntax
Selecting a Random Sample
Graphics in S-PLUS
Opening a Graphics Device
Simple Plots
Setting the Plot Shape
Creating Multiple Plots per Page
Adding Titles
Adding Axis Labels
Setting Axis Limits
Specifying Logarithmic Axes
Selecting Plot Types
Selecting Line Types
Selecting the Plotting Character
Adding Straight Lines to an Existing Plot
Adding Points/lines to an Existing Plot
Adding Text to a Plot
Adding Legends
Custom Graphics Parameters
Introduction to Statistics in S-PLUS
T-Tests
Other Hypotheses Tests
Summarizing Data in S-PLUS
Classical Linear Models
Updating Models
Options to lm()
Categorical Variables as Predictors
Interactions
Adding or Dropping Terms
Summaries of Fits
Designed Experiments and ANOVA
S-PLUS Intro
Introduction to S-PLUS
BACKGROUND
S-PLUS is an extension of “S”, which was developed at AT&T’s Bell Labs in the late 80’s. It is
currently produced by StatSci in Seattle, a division of MathSoft.
S-PLUS is a programming environment, like MATLAB, but for statistics. S-PLUS is an interpreted
(not compiled) language: you enter commands one at a time, and each is executed as soon as you
enter it.
S-Plus 2000 for Windows, released in the summer of 1999, adds a GUI (graphical user interface)
to the S-Plus environment. Many common S-Plus tasks can now be accomplished interactively
through functions accessed via the toolbar and menus. This point-and-click environment is not
available in UNIX S-Plus or in earlier versions of PC S-Plus. The command line, which is the only
interface in those other versions, is still available in S-Plus 2000.
S-PLUS CAPABILITIES
Data management
S-PLUS allows for easy storage, organization, retrieval, and manipulation of data. All data objects
are defined as constants, vectors, matrices, etc., so calculations and data manipulation are
quite simple.
Graphics
S-PLUS has many built-in functions for 2- and 3-dimensional plotting, interactive graphics, data
visualization, multivariate graphics, survival curves, and custom graphics. S-PLUS does graphics
very nicely and it is fairly easy to customize the graphs to your liking.
Statistics
S-PLUS has a rich set of built-in functions for hypothesis testing, ANOVA, design of experiments,
multiple regression, time series, survival analysis, quality control, generalized linear models,
generalized additive models, local regression, tree-based regression, discriminant analysis,
cluster analysis, principal component analysis, non-linear least squares, etc.
Extensibility
S-PLUS is a true programming language, so you can write your own functions to automate
calculations or analyses that the built-in functions or procedures don’t do. S-PLUS can also
interface to Fortran, C, SPSS, and other software, and it can directly import data in many
formats, including SAS, Excel, SPSS, Quattro Pro, Paradox, and Microsoft Access.
AVAILABLE DOCUMENTATION
There is a large body of references, both in print and online. For a list of available references
in print, see page 8 of the User’s Guide (accessible through the S-Plus Help menu).
S-Plus 2000: Programmer's Guide is available at the Queen's Bookstore. It is a recommended
text for this course.
STARTING AND ENDING AN S-PLUS SESSION
Start S-Plus by selecting it from the Programs list of the Windows Start menu. The data and
objects that you create will be stored in a specified directory on the D drive. If this directory does
not exist at startup, S-Plus will ask to create it.
To get to the command line (which is where we will spend most of our time), choose Commands
Window from the pull-down Window menu. The commands window may also be opened by
selecting the appropriate icon from the toolbar.
To exit S-PLUS, close the outer S-Plus window or type the following on the command line:
> q()
This is a function call, calling the function that quits S-PLUS. In S-PLUS all commands are
functions that take arguments, so you must always use parentheses when calling a function,
even if it takes no arguments. If you leave off the parentheses, S-PLUS does not call the function;
instead it displays the function's definition by listing its code.
S-PLUS LANGUAGE AND SYNTAX
Expressions
You use the S-PLUS commands window by typing expressions after the prompt. Expressions are
evaluated when you press the return key. Most expressions will be function calls. To type a series
of expressions on the same line, separate them with semicolons.
Spaces
S-PLUS ignores spaces except in the middle of numbers or names. However, you may want to
add spaces for aesthetics and for ease of debugging.
Case
S-PLUS is case-sensitive; this applies to S-PLUS objects, arguments, names, etc.
Line Continuation
If you enter an incomplete expression and press return, you will see a ‘+’ (continuation prompt) on
the next line, giving you the opportunity to finish the command. For example,
> 6*
+ 2
[1] 12
Canceling an Expression
Hit the Esc key to stop S-Plus from evaluating an expression.
GETTING HELP
There are numerous ways to get help in S-Plus 2000.
1. From the command line you can get help on a function or item by calling the help function with
the function name as its argument, or by prefacing the function name with a ?
e.g.
> help(rev)
> ?rev
both provide help for the rev function.
2. args(functionname), e.g. args(rev)
This will tell you what arguments a function takes.
3. From the pull-down Help menu you will find thorough indexed help for all functions and
language elements, as well as a complete set of online reference manuals, visual demos, tips,
and direct links to the S-Plus web home page and S-Plus technical support.
S-PLUS Objects
Objects in S-PLUS
INTRO TO OBJECTS
S-Plus can be thought of as an object-oriented language in that it consists of objects. An object
is a very general term that can be used to describe any component in a software system that can
be defined by its type or class along with a set of properties (or attributes) and methods (which
describe the actions or functionality of an object).
There are three basic types of objects in S-Plus 2000. Computational engine objects include
data sheets and data frames, functions, lists, matrices, and vectors. Interface objects are objects
such as menu items, toolbars, and dialogs. Finally, document objects include graph sheets,
reports, and scripts. Only Computational engine objects exist in S-Plus for UNIX and earlier
versions of Windows S-Plus. In this course, when we refer to objects we mean Computational
engine objects.
In S-Plus you will create and manipulate computational engine objects. Each of the objects you
create will have certain attributes, depending on its object type. The objects we will use include
vectors, matrices, lists, data frames, arrays, factors, and functions.
The simplest object is the vector. To create a vector you can use the combine function c() to
combine a set of numbers (e.g. 2,5,7,10), logical values (e.g. T,F,T,T), or character strings (e.g.
'Jane','Henry','Bridgett'). The combine function forms a vector out of its arguments:
e.g.
> c(1,2,3)
[1] 1 2 3
ASSIGNING OBJECTS
The assignment operator (<-) is used to name and store data. For example, to store the value 20
in the variable x type:
> x <- 20
The underscore (_) can also be used for assignment:
> x_20
but I suggest not using this method because a) it isn’t as obvious or as readable; and b) it looks
like a SAS variable name.
STORING OBJECTS
Objects you create are permanently stored in a designated directory called _Data. The location of
the _Data directory depends on your installation and can be modified.
S-PLUS searches for objects first in the designated _Data directory. If the object is not found,
S-PLUS then looks through the directories or objects in its search path. To list the path, type
> search()
To attach other directories to the search path, use the attach() function. To remove extra
directories from the search path, use the detach() function.
For the S-Plus 2000 installed on the first-floor computers of Jeffery Hall, the _Data directory resides
on the D drive of the machines. Anyone can modify or remove the contents of this directory, and
it is only accessible from the given machine. If you wish to save your data (including functions),
copy your _Data directory (or selected objects from it) to a floppy disk, or FTP it to your
UNIX account.
You may also save ASCII (text) versions of objects to files by using the write() or dump()
functions.
LISTING OR REMOVING OBJECTS
The functions objects() and ls() list the objects in your working directory:
> objects()
> ls()
To remove objects from the working directory, use the rm() function:
> rm(x,y,z)
You can also remove objects from Windows by going into the _Data directory and deleting the
object files. The _Data directory has files with the same names as S-PLUS objects, but the files
are readable only in S-PLUS.
OBJECT NAMES
Object names can be any combination of upper and lower case letters, numbers, and periods, but
they must start with a letter. The following are all legal names:
data
data.cancer
CancerData
cancer.data.version.1
S-PLUS NAMESPACE
When several objects have the same name, S-PLUS uses the first one it finds in the search path.
That is, if you give an object the same name as a pre-defined S-PLUS object, you won’t overwrite
the S-PLUS object, but you will prevent S-PLUS from finding the pre-defined object, since it will
find yours first in the working directory.
Therefore, you should avoid choosing names that are the same as pre-existing S-PLUS functions,
or you will be prevented from accessing those functions. Use the masked() function to list objects
which mask other objects.
Avoid using the single-character names C, D, c, q, s, and t, as they are already
defined as S-PLUS functions.
DISPLAYING OBJECTS
To look at the contents of any object, just type its name at the S-PLUS prompt:
> names
[1] "Matt" "Sam" "Lori"
OBJECT ATTRIBUTES
Every data object has a set of attributes, including:
class = what kind of object class, if any (factor, data frame, list)
length = # of components
mode = what kind of values ('logical', 'numeric', 'complex', 'character')
etc. (depending on the data structure)
Helpful functions are:
mode(<obj>)         returns the mode of an object
length(<obj>)       returns the length of an object
attributes(<obj>)   returns all other attributes of an object
data.class(<obj>)   returns the type of object
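As a quick illustration, the helper functions above can be applied to a small named vector. This is a sketch: the object name scores and its values are made up, and the comments show what these calls are expected to return.

```r
# A named numeric vector (hypothetical example data)
scores <- c(HW1 = 8, HW2 = 9, Final = 71)

mode(scores)        # "numeric"
length(scores)      # 3
attributes(scores)  # a list with one component, $names
data.class(scores)  # "numeric"
```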
Data Values
COMMON VALUES
logical
TRUE or T and FALSE or F represent binary data.
numeric
• ordinary decimal numbers (e.g. 27, -6.28, 81.02)
• S-PLUS expressions that generate real values (e.g. pi, exp(2), 2/3)
• scientific notation (e.g. 3.12e-4 represents 0.000312)
• the special value Inf to represent infinity
complex
Similar to numeric except for complex numbers (e.g. a+bi)
character
Any character string enclosed in single or double quotes. There are some
special characters:
\t    tab
\n    new line
\"    double quote
\'    apostrophe
\\    backslash
SPECIAL VALUES
NA
Represents 'Not Available'. It is the code for 'missing' for logical, numeric, and
complex data. A missing character value is represented by "".
NULL
Represents a 'non-value'. For example, if no values can be returned from a
function call, S-PLUS will return NULL.
COERCION
Many objects in S-PLUS can only contain one kind of value (e.g. vectors, matrices). If you try to
input different kinds of values into one of these objects, S-PLUS will 'coerce' them into a single
mode. In doing so, S-PLUS tries to retain as much information as possible.
When mixed values are entered into one of these objects, all the values are coerced to the mode
of the value with the most information. The values with the most information are characters,
followed by complex, numeric, then logical.
e.g.
> c('names',34,T)
[1] "names" "34" "TRUE"
> c(F,1,T,0)
[1] 0 1 1 0
S-PLUS converts TRUE and FALSE to 1 and 0 respectively when coerced to numeric.
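The coercion rules can be checked directly with mode(). A small sketch reusing the two examples above; the last line is an added illustration of the TRUE-to-1 conversion.

```r
# Mixing modes in c() forces every element to a single mode
a <- c('names', 34, T)   # character is the "richest" mode present
mode(a)                  # "character"

b <- c(F, 1, T, 0)       # logicals are promoted to numeric
mode(b)                  # "numeric"

# Because T/F become 1/0, sum() counts the TRUE values
sum(c(T, F, T, T))       # 3
```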
NOTES ON LOGICAL VALUES
Logical values in S-PLUS are written as T, F, TRUE, or FALSE. These are neither numbers nor
character strings, so you don't have to write them in quotes.
Comparisons between Objects
You can compare objects using the following operators:
<     less than
<=    less than or equal to
>     greater than
>=    greater than or equal to
==    equal to
!=    not equal to
The result of a comparison is either a logical or a vector of logicals.
> x<-1:3
> x>1
[1] F T T
> x[x>1]
[1] 2 3
Logical Operators
Logical operators operate on logical objects:
!          not
& or &&    and
| or ||    or
where !, &, and | return vectors when possible and && and || are 'control operators' and always
return a single logical value.
e.g.
> (c(3,5,2) > c(1,6,0)) & (1:3)>2
[1] F F T
Functions Acting on Logical Vectors
any(x)    returns T if any elements in x are T
all(x)    returns T if all elements in x are T
e.g.
> if( all( x>0 ) ) y <- sqrt(x)
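A short sketch contrasting the element-by-element operator & with any(), all(), and the control operator &&; the vector x here is made-up example data.

```r
x <- c(4, -1, 9)

any(x < 0)           # T: at least one element is negative
all(x > 0)           # F: not every element is positive

# & works element-by-element and returns a vector ...
(x > 0) & (x < 5)    # T F F

# ... while && combines single logical values, e.g. inside if()
if (length(x) > 0 && all(x > 0)) y <- sqrt(x) else y <- NA
y                    # NA, because x contains a negative value
```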
NOTES ON MISSING VALUES
S-PLUS uses NA to denote a 'missing' or 'not available' value. Like logical values, NA is neither a
number nor a character string. Arithmetic and comparisons involving NA yield NA.
e.g.
> x <- 1:3
> x == NA
[1] NA NA NA
The function is.na(x) works component-wise to yield a logical vector indicating which elements
of x are NA.
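Because x == NA cannot detect missing values, is.na() is the function to use. A minimal sketch with made-up data:

```r
x <- c(1, NA, 3)

x == NA              # NA NA NA: comparing with NA never gives T or F
is.na(x)             # F T F
sum(is.na(x))        # 1 missing value

# Drop the missing values before summarizing
mean(x[!is.na(x)])   # 2
```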
S-PLUS External Files
Saving S-PLUS Source Code and Output in External Files
S-PLUS SOURCE CODE
1. Creating Source Files
You will often want to save your code so that it can be referenced or re-used later. In fact, it is
good to save code frequently to avoid losing work.
There are many ways to save code in S-Plus 2000.
• You can create and edit the code in an external editor such as Notepad and save the contents
as a file.
• You can open the history log (choose the accordion-like icon from the toolbar) and save its
contents to a file.
• You can use the script editor and save the file.
• You can copy and paste the contents of the history log or script editor into an external editor
such as Notepad.
2. Reading Source Files
There are several ways to open and execute previously saved code.
• You can copy and paste code from a text editor into the script editor or command line.
• You can open the code file in the script editor. Choose Open from the pull-down File menu.
• You can use the source command:
> source('code.txt', auto.print=T)
reads and executes a file named code.txt from the S-Plus default working directory. The
auto.print=T option is required to echo the commands and output to the commands window.
> source('a:\\code.txt', auto.print=T)
runs code.txt from the a: drive. Note that the double backslash is required because a single
backslash is used to identify special characters.
When the source() function is used, S-PLUS reads in the file and executes the commands one at
a time, outputting the results to the S-PLUS window as usual. When done, the prompt returns. If
there are errors in the code, none of the assignments made in the source file are kept.
Hints
• build and test source files incrementally, re-editing the source file after finding errors
• use comments liberally, both to clarify code and to temporarily omit some code
COMMENTING CODE
Commenting means marking code so that S-PLUS does not read it in as executable
code.
Reasons for Commenting
• It’s useful for documenting what your commands are doing, so you have a record of what
you’re doing and why, and others can follow your code. You should get in the habit of
commenting your code.
• Commenting is also helpful for temporarily making some commands invisible to S-PLUS so
that it doesn't read them in. This is useful when debugging code and experimenting with it.
To comment in S-PLUS, use the number sign (#). S-PLUS ignores everything that comes after #
on a line.
e.g.
# the following code defines x
y <- 144
#x <- sqrt(y) / z    (commented out, so S-PLUS skips this line)
x <- sqrt(y)
print(x)    # display the value of x
[1] 12
S-PLUS OUTPUT
By default S-PLUS only writes output to the window that you're running S-PLUS in. Chances are
you'll want to save some or all of that output to an external file so you have a permanent copy.
You may save output by copying it directly from the commands window or script window to a text
editor and saving the file. Or you may use the sink() function:
> sink("a:\\output")
sends the output from that point on to the external file named "output" on your a: (floppy) drive.
If you want S-PLUS to write the commands as well as the output, use the command
options(echo=T) in S-PLUS before sinking the output.
To cancel, type
> sink()
To append to an existing file, type
> sink("a:\\output",append=T)
Vectors
A set of elements in a specified order
PROPERTIES
• class-less
• all elements must be of same mode
• not a special case of a matrix
ATTRIBUTES
length = # elements
mode = kind of values
names = value labels
CREATING VECTORS
You can create vectors in several ways. You may use the Fill Numeric Columns feature of the S-PLUS
2000 interface by selecting Fill from the Data pull-down menu. There are also several
functions that allow you to create vectors. The most common of these functions are:
Function    Description                     Examples
scan        read values (any mode)          scan(), scan("myfile")
c           combine values (any mode)       c(1,3,2,6), c("yes", "no")
rep         repeat values (any mode)        rep(c(1,2), 3)
:           numeric sequences               1:5, 1:-1
seq         numeric sequences               seq(-pi, pi, .5)
vector      initialize vectors              vector('complex', 5)
logical     initialize logical vectors      logical(3)
numeric     initialize numeric vectors      numeric(4)
complex     initialize complex vectors      complex(5)
character   initialize character vectors    character(6)
Get help on the above functions for more information.
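Typing a few of the calls from the table (plus one extra seq() call) at the prompt shows what each produces; the comments give the expected results.

```r
rep(c(1, 2), 3)     # 1 2 1 2 1 2
1:5                 # 1 2 3 4 5
1:-1                # 1 0 -1 (the sequence can count down)
seq(0, 1, 0.25)     # 0.00 0.25 0.50 0.75 1.00
numeric(4)          # 0 0 0 0
logical(3)          # F F F
character(2)        # "" ""
```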
NAMING ELEMENTS
The names() function assigns names to elements of a vector.
e.g.
names(marks) <- c("HW1","HW2","Mid","Final")
SUBSETTING
Suppose x is a vector of any mode, then:
1. x[4] returns the 4th element of x.
2. If v is a vector of positive integers, then x[v] returns the elements of x indicated by the
integers in v.
e.g.
> x <- c(4,34,7,2,8,3)
> v <- c(2,5)
> x[v]
[1] 34 8
3. If v is a vector of negative integers, then x[v] returns all the elements of x except those
indicated by the integers in v.
e.g.
> x <- c(4,34,7,2,8,3)
> v <- c(-2,-5)
> x[v]
[1] 4 7 2 3
4. If v is a vector of logicals, then x[v] returns the elements x[i] for which v[i] is T.
e.g.
> x <- c(4,34,7,2,8,3)
> v <- c(T,F,F,F,T,T)
> x[v]
[1] 4 8 3
5. If v is a vector of character strings and x is a named vector, then x[v] returns the
elements of x which have the indicated names.
e.g.
> x <- c(4,34,7,2,8,3)
> names(x) <- c('c1','c2','c3','c4','c5','c6')
> v <- c('c3','c6')
> x[v]
c3 c6
7 3
You can also change selected elements of a vector by using any of the above rules:
e.g.
> x <- c(4,3,2,6,7,8)
> x[ c(2,5) ] <- 0
> x
[1] 4 0 2 6 0 8
VECTOR ARITHMETIC
Arithmetic operations on vectors are performed element-by-element.
e.g.
> x <- c(1,2,3)
> y <- c(2,3,4)
> x * y
[1] 2 6 12
'Short' vectors are 'recycled'. Whatever is missing is supplied by recycling the vector as often as
needed. You may get a warning if the vectors’ lengths are not exact multiples of one another.
e.g.
> z <- c(8,9)
> x + z
[1] 9 11 11
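When the longer length is an exact multiple of the shorter one, recycling happens without a warning. A small sketch:

```r
x <- 1:6
x + c(10, 20)    # 11 22 13 24 15 26: c(10,20) is reused three times
x * 2            # 2 4 6 8 10 12: a single number is recycled to every element
```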
Operators
+    addition
-    subtraction
*    multiplication
/    division
^    exponentiation
To list the precedence of operators, type
> ?Syntax
Elementary Functions
The following familiar functions also work element-by-element:
log(), log10(), exp(), sin(), cos(), tan(), sqrt(), abs(),
ceiling(), floor(), trunc(), round(), signif(), etc.
Summary Functions
Some useful summary functions include:
max(x)          maximum element of x
min(x)          minimum element of x
range(x)        smallest and largest elements of x, i.e. c(min(x), max(x))
length(x)       number of elements of x
sum(x)          sum of the elements of x
prod(x)         product of the elements of x
mean(x)         mean of the elements of x
sort(x)         elements of x sorted in ascending order
rev(x)          elements of x in reverse order
sort.list(x)    returns a vector of integers containing indices of the
                elements of x in ascending order
> x<-c(4,3,8,1,9)
> sort.list(x)
[1] 4 2 1 3 5
Examples:
1. To compute $\left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$
use sqrt( sum( x^2 ))
2. To compute the sample variance of x1, x2, ... , xn , i.e. $\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
use sum( ( x-mean(x) ) ^2) / ( length(x)-1 )
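A quick numeric check of both formulas on made-up data. The built-in var() function is not covered in these notes; it is included on the assumption that, like the formula above, it divides by n-1.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

sqrt(sum(x^2))                                # sqrt(232), about 15.23
s2 <- sum((x - mean(x))^2) / (length(x) - 1)
s2                                            # 32/7, about 4.571429
var(x)                                        # should agree with s2
```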
Matrices
Two-dimensional array of elements of the same mode
PROPERTIES
• class-less
• all elements must be of the same mode
ATTRIBUTES
dim = dimensions (# rows, # columns)
length = number of values
mode = kind of values
dimnames = row and column names
CREATING MATRICES
To create a matrix, use the matrix() function.
By default, values are stored by column, and the number of columns is 1.
e.g.
> matrix(1:12, ncol=3)
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
> matrix(1:12, nrow=4)
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
(Produces the same thing)
To store values by row, use the optional argument byrow=T.
e.g.
> matrix(1:12, nrow=4, byrow=T)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
e.g. To create a 2 x 5 matrix of 0's:
> matrix(0, nrow=2, ncol=5)
e.g. To create a 1 column matrix:
> matrix(1:12)
SUBSETTING
To get specific elements of a matrix, use either
1. matrix[i,j] to specify the ith row and jth column.
2. matrix[i] to specify the ith element, counting down by column.
e.g.
> A <- matrix(1:12, nrow=4)
> A
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
> A[2,1]
[1] 2
> A[8]
[1] 8
> A[1:3, 2:3]
     [,1] [,2]
[1,]    5    9
[2,]    6   10
[3,]    7   11
Note: an empty subscript selects all rows (or all columns)
e.g.
> A[1:2,]
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
Note: subsetting can sometimes lead to a vector
e.g.
> A[,2]
[1] 5 6 7 8
To keep the result as a matrix, use the drop=F option:
> A[,2, drop=F]
MATRIX ARITHMETIC
Arithmetic on matrices is performed element-by-element. Therefore the matrices must be
conformable, i.e. have the same dimensions.
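Element-by-element multiplication (*) is easy to confuse with true matrix multiplication (%*%). A sketch with two small made-up 2 x 2 matrices, where B is the identity:

```r
A <- matrix(1:4, nrow=2)            # columns (1,2) and (3,4)
B <- matrix(c(1, 0, 0, 1), nrow=2)  # 2 x 2 identity matrix

A * B      # element-by-element: keeps only the diagonal of A
A %*% B    # matrix product: returns A unchanged
```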
MATRIX MANIPULATION
nrow(A)           number of rows
ncol(A)           number of columns
dim(A)            c(nrow(A),ncol(A))
t(A)              transpose
A%*%B             matrix multiplication
cbind(A,B,z)      binds the listed vectors/matrices by column
rbind(A,B,z)      binds the listed vectors/matrices by row
crossprod(A,B)    t(A)%*%B
crossprod(A)      crossprod(A,A)
diag(matrix)      main diagonal of indicated matrix
diag(vector)      diagonal matrix with the vector on the diagonal
diag(n)           n x n identity matrix
solve(A,b)        solution x to the equation Ax=b
solve(A)          matrix inverse
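A sketch exercising several of these functions on a small made-up system of equations; the solution x = (0.2, 0.6) can be checked by hand from 2x + y = 1 and x + 3y = 2.

```r
A <- matrix(c(2, 1, 1, 3), nrow=2)   # symmetric 2 x 2 matrix
b <- c(1, 2)

x <- solve(A, b)     # solves A %*% x = b; here x is 0.2 0.6
A %*% x              # reproduces b (as a 2 x 1 matrix)
solve(A) %*% A       # inverse times A: the identity, up to rounding
crossprod(A)         # same as t(A) %*% A
diag(A)              # main diagonal: 2 3
diag(2)              # 2 x 2 identity matrix
```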
ROW AND COLUMN NAMES
To assign names to the rows and columns, you need to create a list (we'll learn more about lists
later) with two components, the first for the row names and the second for the column names.
Each component is a vector of character values of the appropriate length:
> dimnames(M) <- list( c('row1','row2','row3'),
+ c('col1','col2','col3','col4') )
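The notes do not define M; as an assumption, take it to be a 3 x 4 matrix. Once the names are assigned, elements can be selected by name instead of by number:

```r
M <- matrix(1:12, nrow=3)    # 3 rows, 4 columns, filled by column
dimnames(M) <- list(c('row1','row2','row3'),
                    c('col1','col2','col3','col4'))
M['row2', 'col3']    # 8: rows and columns picked by name
M['row1', ]          # the whole first row, labelled col1 ... col4
```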
Lists
An ordered collection of objects ('components')
PROPERTIES
• Each list component can be any data object.
• Components can be of different modes and lengths. You can even have lists of lists.
• The most general and flexible data object in S-PLUS.
• Many S-PLUS functions return a list.
ATTRIBUTES
mode/class = "list"
length = # of top-level components
names = names of each top-level component
CREATING LISTS
To create a list, use the list() function.
> list(names, levels, groups)
creates a list with the objects names, levels, and groups
> list(index=1:8, bar=rnorm(100), mat=B)
creates a list and names its components
NAMING COMPONENTS
You can name components of a list directly using the above method or you can use the names()
function.
e.g.
> names(mylist) <- c("index","bar",'mat')
SUBSETTING
To specify list components you can use one of two methods:
1. Use the index number of the component enclosed in double brackets.
> data.list[[3]]
2. Specify the name of the list followed by a $ and the name of the component.
> data.list$name
Once you've specified the component, you can access parts of that component using the usual
single bracket method.
If you are specifying a component by its name, you don't need to type the full name of the
component, only enough of it for S-PLUS to distinguish it from the other components.
For example, in the list mylist above, you could specify the index component by:
> mylist$i
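A sketch of the list operations described above. The component values differ from the earlier example (fixed numbers replace rnorm(100) and B) so that the results are predictable.

```r
mylist <- list(index=1:8, bar=c(2.5, 3.1), mat=matrix(0, nrow=2, ncol=2))

mylist[[1]]          # the index component: 1 2 3 4 5 6 7 8
mylist$bar           # the bar component: 2.5 3.1
mylist$i             # partial name: "i" only matches "index"
mylist$mat[2, 1]     # one element of the matrix component: 0
length(mylist)       # 3 top-level components
names(mylist)        # "index" "bar" "mat"
```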
ATTACHING AND DETACHING LISTS
You can 'attach' a list to the search path so that you need only refer to the list's components by
name:
> attach(mylist)
You can also detach an attached list:
> detach("mylist")
Data Frames
A special list whose components are vectors of the same length
PROPERTIES
• vector components are bound column-wise into a matrix-like structure
• the preferred object for storing datasets
• each column acts as a variable
ATTRIBUTES
class = "data.frame"
names = column names (required)
row.names = row names (optional)
CREATING A DATA FRAME
To create a data frame, use the data.frame() function.
e.g.
> x <- 1:5
> myframe <- data.frame(x=x, x2=x*x, x3=x^3)
> myframe
  x x2  x3
1 1  1   1
2 2  4   8
3 3  9  27
4 4 16  64
5 5 25 125
SUBSETTING
A data frame can be treated as either a matrix or a list when extracting components.
> myframe[,2]     (picks out the 2nd column)
> myframe[1,]     (picks out the 1st row)
> myframe$x3      (picks out the 3rd column, named x3)
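A sketch of extracting parts of myframe, continuing the example above; the last two lines (a logical row selection and dim()) are additions that rely only on the vector and matrix rules already covered.

```r
x <- 1:5
myframe <- data.frame(x=x, x2=x*x, x3=x^3)

myframe[, 2]               # 1 4 9 16 25 (the x2 column, as a vector)
myframe$x3[3]              # 27: third value of the x3 column
myframe[myframe$x > 3, ]   # rows 4 and 5 only
dim(myframe)               # 5 3
```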
ATTACHING AND DETACHING FRAMES
You can attach and detach data frames as you would a list because all data frames are
essentially special cases of lists.
MATRIX MANIPULATION
Matrix operators (e.g. %*%, etc) unfortunately do not work on data frames until you coerce the
data frame into a matrix.
CHANGING A DATA FRAME TO A MATRIX OR VICE VERSA
To change a data frame to a matrix, use the data.matrix() function.
> mymatrix <- data.matrix(myframe)
To change a matrix to a data frame, use the as.data.frame() function.
> myframe <- as.data.frame(mymatrix)
Factors
A vector of categorical/discrete data
PROPERTIES:
• the set of allowed categorical values is finite
• values are called categories or levels
ATTRIBUTES
class = "factor"     (generic factor)
        "ordered"    (ordinal categorical data)
levels = possible levels of the factor
Examples of a generic factor:
experimental status = treatment/control
gender = male/female
Examples of an ordered factor:
educational status
income class
CREATING A FACTOR FROM A VECTOR
If gender is a vector of length 100 with a bunch of 'M's and 'F's, then the following creates a
gender factor:
e.g.
> gen.factor <- factor(gender)
> gen.factor
[1] F M M M F M F F ....
attr(,levels):
[1] "F" "M"
To create an ordered factor, use the ordered() function. Suppose educ is a vector of length
100 with various values of education level: "E", "H", "U", and "P". Then the following creates an
ordered education factor:
e.g.
> educ.ord.fac <- ordered(educ, levels = c('E','H','U','P'))
LABELING LEVELS OF A FACTOR
You can give the levels of a factor labels other than the default ones by using the
labels= option:
e.g.
> gen.factor <- factor(gender, labels = c("Female","Male"))
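A sketch of creating factors, using short made-up versions of the gender and educ vectors described above. levels() (not otherwise covered in these notes) lists the levels of a factor.

```r
gender <- c("M", "F", "F", "M", "F")     # hypothetical data
gen.factor <- factor(gender)
levels(gen.factor)                       # "F" "M" (sorted alphabetically)

gen.factor <- factor(gender, labels=c("Female", "Male"))
levels(gen.factor)                       # "Female" "Male"

educ <- c("U", "E", "P", "H", "U")       # hypothetical data
educ.ord.fac <- ordered(educ, levels=c('E','H','U','P'))
levels(educ.ord.fac)                     # "E" "H" "U" "P": the given order, not alphabetical
```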
CATEGORIZING CONTINUOUS DATA
Use the cut() function to produce categorical data from continuous numerical data.
e.g.
> income <- 0:40
> income.cat <- cut(income, breaks = c(0,10,30))
> income.cat
[1] NA 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
[18] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 NA NA NA
[35] NA NA NA NA NA NA NA
attr(, "levels"):
[1] " 0+ thru 10" "10+ thru 30"
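Values falling outside the limits receive NA. Each interval includes its upper limit but not its lower
limit (" 0+ thru 10"), so the value 0 also receives NA.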
Or alternatively,
> income.cat <- cut(income, breaks = 3)
> income.cat
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
[27] 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3
attr(, "levels"):
[1] " -0.4+ thru 13.2" " 13.2+ thru 26.8"
[3] " 26.8+ thru 40.4"
SOME USEFUL FUNCTIONS UTILIZING FACTORS
split()
Converts a vector into a list of components according to the values in another vector (usually a
factor). Both vectors must be of the same length.
e.g.
> income.gender <- split(income, gen.factor)
returns a list having two components "F" and "M", each containing incomes.
table()
Computes frequency tables from factors of equal lengths.
e.g.
> table(educ.ord.fac, gen.factor)
produces a two-way table classifying and counting people into the levels of these two factors.
tapply()
Partitions a data vector according to the levels of a factor and applies a function to each partition.
The vector and factor must be of the same length.
e.g.
> tapply(income, gen.factor, mean)
returns a vector with two elements, the mean income for females and the mean income for males.
apply()
Partitions a matrix into either rows or columns and applies a function to each partition.
e.g.
> colmeans <- apply(mymatrix, 2, mean)
returns column means.
e.g.
> rowmeans <- apply(mymatrix, 1, mean)
returns row means.
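A compact sketch of all four functions on made-up data (four incomes and a matching gender factor), with a small matrix for apply():

```r
income <- c(30, 45, 28, 52)                  # hypothetical data
gen.factor <- factor(c("F", "M", "F", "M"))

split(income, gen.factor)          # $F: 30 28   $M: 45 52
table(gen.factor)                  # F 2, M 2
tapply(income, gen.factor, mean)   # F 29, M 48.5

mymatrix <- matrix(1:6, nrow=2)    # rows (1,3,5) and (2,4,6)
apply(mymatrix, 2, mean)           # column means: 1.5 3.5 5.5
apply(mymatrix, 1, mean)           # row means: 3 4
```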
S-PLUS Reading Data
Reading Data into S-PLUS
READING FROM THE KEYBOARD
Vectors
You can read in data from the keyboard in your S-PLUS session by using the c() function:
> x <- c(3,6,5,8,3,9,4,5)
but this is not very convenient.
A better option, when you have several data points, is to use the scan() function:
e.g.
> x <- scan()
1: 3 6 5
4: 8
5: 3 9 4 5
9:
> x
[1] 3 6 5 8 3 9 4 5
By default, scan expects numeric values. To specify character values, use:
> x <- scan(what = "")
If character values have internal spaces then you need to use quotes around each value. For
example, "Joe Smith" reads as one value but Joe Smith reads as two. Optionally you can
use the sep= argument to indicate a special separator, like tabs or commas.
You may edit vectors by using the select data utility from the data menu. However, new data
objects created from this utility will be defined as data sheets rather than vectors.
Matrices
For reading in long matrices, you can call scan from inside the matrix function by using the
byrow=T option, so that the values fill the matrix row by row.
e.g.
> mymatrix <- matrix(scan(), byrow=T, ncol=3)
1: 80 75
3: 60 90 100
6: 50
7:
You may use the select data utility to edit, but not create, matrices.
READING FROM ASCII FILES
For a large dataset, you'll likely want to read in the data from an ASCII file. To do so, use the
file= option in the scan function.
Vectors
e.g.
> my.vector <- scan(file="datafile")
> my.char.vector <- scan(file="names", what="")
Note: S-PLUS looks for the file in the default data directory. To read a file from a different directory, give the full path name in quotes. Because the backslash is an escape character in S-PLUS strings, write each backslash in a Windows path as \\.
Matrices
e.g.
> my.matrix <- matrix(scan(file='results'), byrow=T, nrow=12)
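As a sketch, the file-reading calls above can be checked by first writing a small hypothetical ASCII file named datafile with cat(). The file is then read back twice, once as a vector and once as a matrix.

```r
# Write a hypothetical ASCII file "datafile" to the working directory:
# two lines of whitespace-separated numbers
cat("3 6 5 8\n3 9 4 5\n", file = "datafile")

# scan() reads every value into one numeric vector, ignoring line breaks
my.vector <- scan(file = "datafile")

# The same values arranged as a 2 x 4 matrix, filled row by row
my.matrix <- matrix(scan(file = "datafile"), byrow = T, ncol = 4)
```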
Data Frames
e.g.
> myframe <- read.table("infile")
> myframe <- read.table("infile", header=T)
The second example uses the first row in the file to assign column names. See the help for read.table() for more details on files that contain column names.
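Here is a minimal sketch of read.table() with header=T. It uses a hypothetical three-column file named infile, which is written first with cat().

```r
# Hypothetical file: a header row followed by two data rows
cat("name age score\nAnn 23 88\nBob 31 75\n", file = "infile")

# header=T takes the column names from the first row
myframe <- read.table("infile", header = T)
names(myframe)      # "name" "age" "score"
mean(myframe$age)   # 27
```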
S-PLUS Functions
Functions in S-PLUS
EXPRESSIONS
In S-PLUS, everything you type is an expression. Each expression has a value which can be
assigned to other S-PLUS objects.
e.g.
z <- 1:4 is an expression, but so is 1:4.
This means you can have multiple assignments:
e.g.
> y <- z <- 1:4
GROUPS
Expressions can be grouped by enclosing them in a matching pair of braces. Semicolons are needed between expressions on the same line.
e.g.
> {temp <- x; x <- y; y <- temp}
or
> { temp <- x
+ x <- y
+ y <- temp }
Both swap x with y.
The value of a group equals the value of the last expression in the group.
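Here is a small check of the last point, assuming x and y start out as 1 and 2. The group swaps them, and its value is the value of y <- temp. That value is saved here in a hypothetical object v.

```r
x <- 1
y <- 2
# The value of the whole group is the value of its last expression
v <- { temp <- x; x <- y; y <- temp }
x   # 2
y   # 1
v   # 1, the value assigned by y <- temp
```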
THE IF/ELSE CONDITIONAL
if (condition) expression1 [ else expression2 ]
where condition evaluates to a single logical value (TRUE or FALSE), the expressions are usually groups, and the else statement is optional.
The if statement is itself an expression whose value is either expression1 or expression2.
e.g.
> x <- 10
> z <- if( x>11 ) x else 0
> z
[1] 0
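When there are more than two cases, if/else expressions can be chained, with a group for each branch. A minimal sketch, using a hypothetical object result:

```r
x <- -3
# The value of the whole if/else expression is the value of
# whichever branch is evaluated
result <- if (x > 0) {
  "positive"
} else if (x < 0) {
  "negative"
} else {
  "zero"
}
result   # "negative"
```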
THE FOR LOOP
for (name in expression1) expression2
where name is the loop variable (e.g. i), which takes each of the values in expression1 in turn.
The for loop is an expression whose value is the value of the last expression executed.
e.g.
> f <- 1
> for( i in 1:8 ) { f <- f*i }
After the loop, f holds 8 factorial (8! = 40320).
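The loop result can be compared with the built-in prod() function, which multiplies all elements of a vector without an explicit loop:

```r
f <- 1
for (i in 1:8) { f <- f * i }
f           # 40320, i.e. 8!
prod(1:8)   # the same value, computed without a loop
```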
Vectorizing
e.g.
> x <- 1:8
> for (i in 1:8)
+ { x[i] <- 2*x[i]
+ x[i] <- x[i]+1 }
> x
[1]  3  5  7  9 11 13 15 17
Note however that
> x <- 2*x + 1
does exactly the same thing as the loop above, in a single step. Operating on whole vectors at once like this is called 'vectorizing' and is almost always preferable to looping.
Important:
S-PLUS is very slow at doing loops. You should avoid using loops whenever
possible.
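As a further sketch of vectorizing, a running total written as a loop can be replaced by the built-in cumsum() function. The objects x, running, total and running.vec are hypothetical.

```r
x <- c(3, 6, 5, 8)

# Loop version: build the running total one element at a time
running <- numeric(length(x))
total <- 0
for (i in 1:length(x)) {
  total <- total + x[i]
  running[i] <- total
}

# Vectorized version: one call does the same work
running.vec <- cumsum(x)   # 3 9 14 22
```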
THE WHILE LOOP
while (condition) expression
The while loop is an expression whose value is the value of the last evaluation of expression.
e.g.
> n <- 8
> f <- 1
> while(n>0){ f <- f*n; n <- n-1 }
This results in f = 8!.
Watch out for infinite loops: the body must eventually make the condition FALSE.
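One way to guarantee termination is to make each pass move toward a false condition. In this sketch, n is halved until it drops below 1 (the objects n and steps are hypothetical):

```r
n <- 100
steps <- 0
# The condition is tested before each pass; halving n ensures
# it eventually becomes FALSE, so the loop stops
while (n >= 1) {
  n <- n / 2
  steps <- steps + 1
}
steps   # 7: 100 must be halved 7 times to fall below 1
```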
FUNCTIONS
name <- function(arguments) expression
Arguments are objects passed into the function from outside; they are separated by commas. Every object used in a function should either be passed in as an argument or defined inside the function.
A function returns the value of its expression (i.e. the value of the last line of the expression). To return nothing visible, use invisible() as the last line of the expression.
e.g.
> fact <- function(n){
+ f <- 1
+ for(i in 1:n) f <- f*i
+ f
+ }
This is the factorial function. Its last line is f, so it returns the value of f.
Assignments within functions are temporary and are lost upon exiting the function.
e.g.
> fact(4)
[1] 24
e.g.
> fact()
Error in fact: Argument "n" is missing, with no default:
fact()
. . .
Dumped
e.g.
> fact
function(n)
{
f <- 1
for(i in 1:n)
f <- f * i
f
}
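The statement that assignments inside a function are temporary can be seen directly. In this sketch, an object f exists outside fact(), and calling fact() leaves it unchanged:

```r
f <- 100   # an object named f outside the function

fact <- function(n) {
  f <- 1                      # this f exists only while fact() runs
  for (i in 1:n) f <- f * i
  f                           # last line: the value returned
}

fact(5)   # 120
f         # still 100: the assignment inside fact() was temporary
```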
CREATING, UPDATING, EDITING FUNCTIONS
You can create, update or edit a function using the fix() function, which opens a text editor:
> fix(fname)
where fname is the name of the function.
If the function doesn't exist, fix() creates a blank template which you can edit.
If the function already exists, fix() displays the current definition for you to change.
If the edits you make have incorrect syntax, no changes are saved and S-PLUS gives you a warning. To resume editing, use
> fix()
This returns you to the last function you were 'fixing'.
To return to S-PLUS you must first quit the editor. You cannot use both at the same time.
RETURNING MORE THAN ONE OBJECT
To return more than one object from a function, put them all in a list and return the list.
e.g.
stats <- function(x)
{
n <- length(x)
m <- mean(x)
s <- sqrt(var(x))
list(n=n, mean=m, sd=s)
}
Then
> z <- stats(1:10)
> z
$n:
[1] 10
$mean:
[1] 5.5
$sd:
[1] 3.02765
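Components of the returned list can be extracted by name with $ or [[ ]]. This short sketch repeats the stats() definition so that it stands alone, and applies it to a different hypothetical vector:

```r
stats <- function(x)
{
  n <- length(x)
  m <- mean(x)
  s <- sqrt(var(x))
  list(n = n, mean = m, sd = s)   # several results returned as one list
}

z <- stats(c(2, 4, 6, 8))
z$n          # 4
z$mean       # 5
z[["sd"]]    # same as z$sd
```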
ARGUMENTS TO FUNCTIONS
Default Values
To set a default value for an argument, assign it in the function definition.
e.g.
fact <- function(n=1) { prod(1:n) }
This assigns the default value of 1 to the argument n.
Arguments that have default values can be omitted on function calls, however arguments without
default values must be supplied.
e.g.
> fact()
[1] 1
In Function Calls
Arguments may be specified either by position (e.g. fact(8)) or by name (e.g. fact(n=8)). Arguments supplied by position must be given in the same order as in the function definition; arguments supplied by name may be given in any order.
Note: argument names may be abbreviated as long as the abbreviation identifies a single argument.
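The matching rules can be seen with a hypothetical two-argument function raise(), which has a default for its second argument:

```r
raise <- function(base, exponent = 2) base^exponent

raise(3)                        # 9: exponent takes its default value
raise(2, 3)                     # 8: arguments matched by position
raise(exponent = 3, base = 2)   # 8: matched by name, in any order
raise(2, exp = 3)               # 8: "exp" uniquely abbreviates "exponent"
```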
Variable Number of Arguments
The special name ... is used when there can be an arbitrary number of arguments. It can be
used inside an argument list and inside the body of the function to stand for those arbitrary
arguments.
e.g.
> first<-function(...){ c(...)[1] }
This function combines all its arguments into one vector and returns the first element.
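Besides collecting arguments, ... is often used to pass extra arguments on to another function. In this sketch, the hypothetical function center() passes any extra arguments (here na.rm=T) through to mean():

```r
first <- function(...) { c(...)[1] }
first(7, c(2, 5), 9)   # 7

# Extra arguments given to center() are handed on to mean()
center <- function(x, ...) x - mean(x, ...)
center(c(1, 2, NA, 5), na.rm = T)   # the NA stays NA; others centred on 8/3
```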
THE .FIRST FUNCTION
The .First function is used to customize the S-PLUS session. Commands inside the .First
function are executed each time S-PLUS starts up.
e.g.
.First<-function()
{
options(prompt='Yes master?')
}
To run the .First function in the current session after you've edited it, type
> .First()
There is also a .Last function, which is run automatically at the end of each session.
S-PLUS Probability Functions
Probability and Random Numbers
FUNCTION SYNTAX
The probability and random number functions all begin with one of the following letters:
r   random number generator
p   cumulative distribution function
d   density function
q   quantiles
Possible Distributions (the prefix letter is followed by one of these names, e.g. rnorm or qt)
beta
binom
cauchy
chisq
exp
f
gamma
geom
hyper
lnorm
logis
nbinom
norm
pois
stab
t
unif
weibull
wilcox
Examples
1. How do I generate a vector of 5 normal random numbers with mean = 4 and standard deviation = 4?
> rnorm(5,mean=4,sd=4)
[1] 8.169303 1.253711 1.783170 1.878477 6.829833
(Your values will differ, since the numbers are random.)
2. What's the probability that a standard normal is greater than 1.96?
> 1-pnorm(1.96)
[1] 0.0249979
3. What is the result of the standard normal density function evaluated at 0?
> dnorm(0)
[1] 0.3989423
4. What is the quantile of the standard normal distribution for the probability 0.5?
> qnorm(.5)
[1] 0
5. What is the quantile of the standard normal distribution for the probability 0.95?
> qnorm(.95)
[1] 1.644854
6. What is the quantile for the t-distribution on 8 degrees of freedom for the probability 0.975?
> qt(.975,8)
[1] 2.306004
7. What are the quantiles of the uniform distribution going from 0 to 4 for probabilities of 0.25 and
0.75?
> qunif(c(.25,.75),0,4)
[1] 1 3
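The p and q functions are inverses of each other, and the same prefixes work for the other distributions. A few quick checks:

```r
q <- qnorm(0.95)        # 1.644854
pnorm(q)                # back to 0.95
2 * (1 - pnorm(1.96))   # two-sided tail area beyond +/-1.96, about 0.05
dbinom(2, 4, 0.5)       # P(X = 2) for X ~ binomial(4, 0.5): 6/16 = 0.375
```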
SELECTING A RANDOM SAMPLE
To randomly select items from a finite population use the sample() function.
See the S-PLUS help for sample() for the full set of options.
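Here is a minimal sketch of sample(). By default it samples without replacement; with replace=T it samples with replacement. The set.seed() call is optional and only makes the draw repeatable; the seed value 1 is arbitrary.

```r
set.seed(1)                           # optional: repeatable draws
s <- sample(1:10, 3)                  # 3 distinct values from 1,...,10
tosses <- sample(c("H", "T"), 5, replace = T)   # 5 coin tosses
```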
S-PLUS Graphics
Graphics in S-PLUS
OPENING A GRAPHICS DEVICE
When a graph is created in S-Plus 2000 a graphics window is opened if one is not already open.
Subsequent calls to plotting functions will produce graphics in this window.
To close the graphics window, either close it with the mouse or issue the command:
> dev.off()
SIMPLE PLOTS
Plotting Vectors
1. Index Plots
To plot the elements of a numeric vector x, type
> plot(x)
to plot x[i] against i.
2. Scatter Plots
To plot the vector y on the vertical and x on the horizontal, type
> plot(x,y)
where x and y are both numeric and of the same length.
Plotting a Matrix
Suppose M is a matrix with two numeric columns. Then
> plot(M)
plots the 1st column of M on the horizontal and the 2nd column of M on the vertical.
Plotting Complex Numbers
If z is a vector of complex values, then
> plot(z)
plots the real part of z on the horizontal and the imaginary part of z on the vertical.
Plotting Mathematical Functions
To plot a mathematical function, you need to create two vectors:
- one to hold a range of values over which you want to display the function
- another to hold the result of the function over that range.
e.g. To plot the sine function from 0 to 20
> x <- seq(0, 20, by=.1)
> y <- sin(x)
> plot(x, y)
Or similarly
> plot(x, sin(x))
Hint: Vary the number of points (e.g. the by= step in seq()) to obtain smoother or rougher plots.
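To see the effect of the hint, the sine curve can be drawn with a coarse and a fine grid of x values (the object names are hypothetical):

```r
x.coarse <- seq(0, 20, by = 2)    # 11 points: a jagged curve
x.fine <- seq(0, 20, by = 0.1)    # 201 points: a smooth curve
plot(x.coarse, sin(x.coarse))
plot(x.fine, sin(x.fine))
```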
SETTING THE PLOT SHAPE
The default shape of the plotting box is rectangular.
To make the shape of the plotting box square for subsequent plots, type
> par(pty='s')
To return to the default (rectangular) plotting shape, type
> par(pty='')
CREATING MULTIPLE PLOTS PER PAGE
The default number of plots displayed on the screen is one. To create a display with more than
one plot, use the mfrow or mfcol arguments to the par() function.
To create a 3x2 matrix of figures filled in by row, type
> par(mfrow=c(3,2))
To create a 3x2 matrix of figures filled in by column, type
> par(mfcol=c(3,2))
To start a new screen before a multiple plot is finished, just issue another par(mfrow=...) or
par(mfcol=...) command.
To return to the default one figure per page, type
> par(mfrow=c(1,1))
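The following sketch combines the settings above: it draws two plots side by side and then resets to one figure per page.

```r
x <- seq(0, 20, by = 0.1)
par(mfrow = c(1, 2))   # 1 row, 2 figures, filled by row
plot(x, sin(x))
plot(x, cos(x))
par(mfrow = c(1, 1))   # back to one figure per page
```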
ADDING TITLES
You can add either main titles or subtitles to a plot in either of two ways:
- directly in the plot() function, using the argument main or sub
- after the plot() function, using the function title() with the argument main or sub
e.g.
> plot(time, pct.fat, main='Percent Body Fat over Time',
+ sub='Patient #14')
e.g.
> plot(age, ps)
> title(main='Performance Status by Age', sub='Placebo Group')
Note: The main title appears above the plot and the subtitle appears below the plot.
ADDING AXIS LABELS
S-PLUS will automatically label your axes with the names of the variables you are plotting. These default labels are often terse, so you'll probably want to add your own labels for clarity.
To do so, use the xlab= and/or ylab= arguments
e.g.
> plot(wk, no.cig,xlab='Time (in weeks)',
+ ylab='Number of Cigarettes')
You can also use these options inside the title() function.
Hint: If you don't want any labels to appear, use xlab='' or ylab=''.
SETTING AXIS LIMITS
The limits are automatically set by the S-PLUS plotting functions, but you can override this and
choose your own. S-PLUS will round your specified limits to 'sensible' limits.
e.g.
> plot(time, pct.fat,xlim=c(0,100), ylim=c(0,1))
To maintain the same axis limits over future plots, type
> par(xaxs='d', yaxs='d')
to freeze the axis limits to those of the last plot. If you only want to control one axis, drop one of
the arguments as appropriate.
To return to the default, type
> par(xaxs='', yaxs='')
SPECIFYING LOGARITHMIC AXES
To put the log scale on the x-axis, type
> plot(time, pct.fat, log='x')
To do so for the y-axis, use log='y', and for both axes, use log='xy'.
SELECTING PLOT TYPES
Inside plot() or another graphics function, you can use the type= option to display your data in any of the following styles:
type='p'   points
type='l'   lines
type='b'   both lines and points (isolated points)
type='o'   both lines and points (overstruck points)
type='h'   high-density plot (vertical line for each data point)
type='s'   stairstep plot
type='n'   empty plot (axis and labels only)
SELECTING LINE TYPES
When your plot involves lines you can also select the type of line you want to display. By default
S-PLUS will plot a solid line. To change this, use the lty= option in the plotting function you are
using.
e.g.
> plot(time, rate, lty=2)
plots a dotted line.
The line types are numbered lty=0,...,lty=8: lty=0 is 'no line', lty=1 is the default solid line, and lty=2,...,lty=8 are variations on dotted lines.
SELECTING THE PLOTTING CHARACTER
The default plotting character is '*' or a dot, depending on the graphics device and plotting
function.
To select a different character, use the pch= option in the plotting function:
- by typing pch=n, where n is an integer from 0 to 18 (giving squares, circles, ...)
- by specifying the symbol in quotation marks (e.g. pch='#')
ADDING STRAIGHT LINES TO AN EXISTING PLOT
Sometimes it's useful to be able to display straight lines on your plot.
To overlay a line with a given slope and intercept, use
> abline(intercept, slope)
To add a least-squares regression line you can do the following
> plot(x,y)
> abline(lm(y~x), lty=2)
(The function lm() fits a linear model using the method of least squares).
ADDING POINTS/LINES TO AN EXISTING PLOT
Often you want to show several lines on a plot or add additional data to a plot.
The lines() function adds lines to the current plot.
The points() function adds points to the current plot.
Both functions work almost exactly like the plot() function. All the optional arguments above
(lty= , type= , pch= ) can also be used with these functions.
e.g.
> plot(time, group1, type='l', lty=2, xlim=c(0,100), ylim=c(1,20),
+ xlab='Days', ylab='Body Fat')
> lines(time, group2, lty=4)
> lines(time, group3, lty=5)
Note: If the data you add to a plot have a greater range than the limits of the existing plot, you will receive a warning message and the points outside the range will not be plotted. To avoid this problem, set appropriate axis limits in the first call to the plotting function.
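One convenient way to set the limits up front is range(), which returns the smallest and largest values across all its arguments. The times and group values below are hypothetical.

```r
# Hypothetical measurements for three groups at the same times
time <- c(0, 25, 50, 75, 100)
group1 <- c(12, 11, 10, 9, 8)
group2 <- c(15, 14, 14, 13, 12)
group3 <- c(5, 6, 7, 9, 18)

# Limits wide enough for every group, fixed in the first plot call
y.lim <- range(group1, group2, group3)   # 5 18
plot(time, group1, type = 'l', lty = 2, ylim = y.lim,
     xlab = 'Days', ylab = 'Body Fat')
lines(time, group2, lty = 4)
lines(time, group3, lty = 5)
```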
ADDING TEXT TO A PLOT
To add text to an existing plot, use the text() function
e.g.
> text(x=10, y=2,'Placebo Group')
uses x- and y-coordinates to place the text.
e.g.
> text(locator(1),'Experimental Group')
allows you to select the location of the text using the mouse.
Hint: The default positioning of the text is centered at the point you choose. You can change this
using the adj= option.
ADDING LEGENDS
If you're making graphs with several sets of data and line types or characters, you'll generally want
to provide a legend for clarity. The legend() function does this.
> plot(year,series1,pch='*')
> lines(year,series2,lty=2)
> lines(year,series3,type='o',lty=5,pch='+')
> legend(locator(1),c('Series 1','Series 2','Series 3'),
+ pch='* +',lty=c(0,2,5))
Note the deliberate space in the pch= option to indicate there is no plotting character for Series 2.
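When the mouse is not available, for instance when running commands from a file, the legend can be placed with explicit x and y coordinates instead of locator(1). A sketch with hypothetical data:

```r
year <- 1991:1995
series1 <- c(3, 4, 6, 5, 7)
series2 <- c(2, 3, 3, 4, 4)
plot(year, series1, pch = '*')
lines(year, series2, lty = 2)
# Top-left corner of the legend at x = 1991, y = 7; the space in
# pch means no plotting character for Series 2
legend(1991, 7, c('Series 1', 'Series 2'), pch = '* ', lty = c(0, 2))
```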
CUSTOM GRAPHICS PARAMETERS
To personalize your plots you have to change the graphics parameters.
Layout
- affects the entire page
- can be changed only using par()
- changes last until the next change or until the end of the session
- changes affect only the current device
High-Level
- used only in high-level graphics functions, never inside par()
- changes are only in effect for the function call
e.g. plot(x,y,log='xy')
General
- can be changed either inside a high-level graphics function or inside par()
- if set inside par(), changes stay in effect until the next change or the end of the session
e.g. plot(x,y,lty=4)
or par(lty=4)
Look up the par() function from the online help to get a complete list of graphical parameters.
Statistics in S-PLUS
Introduction to Statistics in S-PLUS
T-TESTS
One-Sample t-Test
Question: Given the data in the vector x, how do I test
H0: μ = 44
vs.
H1: μ ≠ 44?
Answer: t.test(x,mu=44)
- The default value for μ is 0.
- The default confidence level is 95%.
- The default action is a two-sided test.
Two-Sample t-Test
Question: Given data in the vectors x and y, how do I test
H0: μx = μy
vs.
H1: μx < μy?
Answer: t.test(x,y,alternative='less')
- The default assumes equal variances.
Paired t-Test
e.g.
t.test(x,y,paired=T)
Note: All these tests produce an object of the class 'htest' containing details of the test and its
conclusion.
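Because the result is an 'htest' object, its pieces can be extracted by name. Here is a sketch of the one-sample test on a hypothetical vector x with mean 44.5:

```r
x <- c(42, 45, 47, 44, 46, 43)   # hypothetical sample, mean 44.5
res <- t.test(x, mu = 44)
res              # prints the full test
res$statistic    # the t statistic
res$parameter    # degrees of freedom (5)
res$p.value      # two-sided p-value
res$conf.int     # 95% confidence interval for the mean
```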
OTHER HYPOTHESIS TESTS
The following functions also return objects of the class 'htest'.
var.test
binom.test
prop.test
wilcox.test
kruskal.test
friedman.test
cor.test
chisq.test
fisher.test
mcnemar.test
mantelhaen.test
SUMMARIZING DATA IN S-PLUS
General Idea: Before beginning modeling, you should examine the data.
plot()
summary()
These two functions are 'generic' functions in the sense that they can recognize the class of object that they're given and react accordingly.
summary(myframe)
- produces a printed summary of all variables
- produces mean, median, quartiles, extremes for numeric variables
- produces frequency tables (counts for each level) for factors
- lists counts of missing values
plot(myframe)
- summarizes the distribution of the variables
- shows a quantile plot for numeric variables
- shows a graph of counts for each level of factors
plot(formula, myframe)
- produces scatter plots of the variables specified in the formula
e.g. plot(z~x+y, myframe) produces scatter plots of z vs x and z vs y.
- if the left side of the formula is omitted, it produces distribution plots
e.g. plot(~x+y, myframe) produces two distribution plots
pairs(~x+y+z, myframe)
- produces a 'matrix' of scatter plots
scatter.smooth(myframe$z ~ myframe$x)
- plots a scatter plot and overlays a smooth curve using non-parametric regression.
CLASSICAL LINEAR MODELS
lm(formula, dataframe)
e.g. Suppose you have a dataframe called 'myframe' containing variables w,x,y,z. The formula
y~x+z+w
is interpreted by lm to mean 'y is modeled as a linear combination of x, z, and w plus an
intercept'. Mathematically this means
y = a + bx + cz + dw.
To fit this model in S-PLUS, type
lm(y~x+z+w,myframe)
Note: lm() uses least squares to fit models. It creates an 'lm' object which can be used by other
functions to analyze and modify the fit.
The call
fit<-lm(y~x+z+w,myframe)
doesn't produce any output. Instead the object fit holds all the essential information. It is a list
with a number of components, for example
fit$coefficients
fit$residuals
fit$fitted.values
You can extract information from fit using the functions
coef(fit)
resid(fit)
fitted(fit)
To print the lm object just type its name. This gives a brief summary of the fit. For a more
technical statistical description, use
summary(fit)
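Here is a small sketch of fitting and extracting. It uses a hypothetical data frame whose response is an exact linear function, y = 1 + 2x - z, so the estimated coefficients should recover 1, 2 and -1:

```r
myframe <- data.frame(x = 1:6, z = c(2, 1, 4, 3, 6, 5))
myframe$y <- 1 + 2 * myframe$x - myframe$z   # exact, no noise

fit <- lm(y ~ x + z, myframe)
coef(fit)     # intercept 1, x 2, z -1 (up to rounding error)
resid(fit)    # essentially zero, because the data fit exactly
fitted(fit)
summary(fit)
```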
More on the Formula
- The individual terms on the right-hand side can be numeric vectors (one coefficient), factors (one coefficient for each level beyond the first, when the model has an intercept), or numeric matrices (one coefficient for every column). Logical and character vectors are turned into factors.
- The response cannot be a factor.
- The special name "." can be used on the right-hand side to stand for all the variables in the data frame other than the response.
e.g. lm(y~., myframe)
- Operators (+ : * ^ / -) have special meaning on the right-hand side. To include terms that use these operators in their usual arithmetic sense, you have to protect them with the identity function I().
e.g. y~I(x+z) has a single predictor = x+z
- To omit the intercept, put a term "-1" in the model formula.
e.g. y~x+z-1
- If you omit the data argument from the call, S-PLUS will search its path for any variables that it needs. It is often useful to attach a data frame before fitting.
- You can save a formula as an object.
e.g. myform<-formula(y~x+z-1)
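The effect of ".", -1 and I() on the fitted model can be seen by counting coefficients. The data frame below is hypothetical.

```r
myframe <- data.frame(x = 1:6, z = c(2, 1, 4, 3, 6, 5),
                      y = c(3.1, 2.9, 5.2, 4.8, 7.1, 6.9))

length(coef(lm(y ~ x + z, myframe)))       # 3: intercept, x, z
length(coef(lm(y ~ x + z - 1, myframe)))   # 2: no intercept
length(coef(lm(y ~ I(x + z), myframe)))    # 2: intercept, one predictor x+z
length(coef(lm(y ~ ., myframe)))           # 3: "." stands for x and z

myform <- formula(y ~ x + z - 1)           # a formula saved as an object
length(coef(lm(myform, myframe)))          # 2
```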
UPDATING MODELS
e.g.
1. Fit an initial model:
fit1 <- lm( y ~ x + z, myframe)
2. Add the predictor w to the model:
fit3 <- update( fit1, . ~ . + w)
3. Get rid of the predictor w:
fit.old <- update( fit3, . ~ . - w)
4. Add w^2 to the model:
fit4 <- update( fit1, . ~ . + I(w^2))
5. Change the response to sqrt(y):
fit5 <- update( fit1, sqrt(.) ~ .)
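Here is a sketch of the update steps above on a hypothetical data frame, with simple checks on which terms each fit contains:

```r
myframe <- data.frame(x = 1:8,
                      z = c(2, 1, 4, 3, 6, 5, 8, 7),
                      w = c(1, 3, 2, 5, 4, 7, 6, 9),
                      y = c(2.1, 3.9, 5.2, 7.8, 9.1, 11.7, 13.2, 15.9))

fit1 <- lm(y ~ x + z, myframe)
fit3 <- update(fit1, . ~ . + w)        # adds w
fit.old <- update(fit3, . ~ . - w)     # removes w again
fit4 <- update(fit1, . ~ . + I(w^2))   # adds the square of w
fit5 <- update(fit1, sqrt(.) ~ .)      # response becomes sqrt(y)

names(coef(fit3))   # intercept, x, z, w
names(coef(fit4))   # intercept, x, z, I(w^2)
```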
OPTIONS TO lm()
1. subset = <index vector indicating rows of data frame>
This fits the model to only the indicated subset of the data frame.
2. weights = <vector of non-negative weights, same length as the #rows
in the data frame>
This performs weighted regression.
3. na.action=na.omit
Drops any rows of the data frame for which any of the variables included in the fit have missing values. (By default, lm() cannot handle missing values in the predictors or the response.)
CATEGORICAL VARIABLES AS PREDICTORS
Suppose I'm fitting the model
salary ~ age + gender
Then, if gender is a factor, lm() automatically fits dummy variables for its categories.
INTERACTIONS
a) gender:age    Indicates the interaction of gender and age.
b) gender*age    Equivalent to gender + age + gender:age.
c) (x+y+z)^2     Symbolizes all terms involving x, y, z of order 2 or less,
                 i.e. x + y + z + x:y + x:z + y:z
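To check how a formula is expanded, terms() can be asked for its term labels. This sketch needs no data, since only the formulas are examined:

```r
# "gender" "age" "gender:age"
attr(terms(y ~ gender * age), "term.labels")

# "x" "y" "z" "x:y" "x:z" "y:z"
attr(terms(~ (x + y + z)^2), "term.labels")
```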
ADDING OR DROPPING TERMS
Another way to update the lm object:
drop1(fit)
- Produces statistics obtained by dropping each term out of the model, one at a time.
add1(fit, c('v','log(v)'))
- Produces statistics obtained by adding the indicated terms one at a time.
SUMMARIES OF FITS
plot(fit)             Diagnostics of the fit
summary(fit)          Printed summary of the fit
qqnorm(resid(fit))    Quantile-quantile plot of residuals
DESIGNED EXPERIMENTS AND ANOVA
1. One-Factor Experiments
Given: A data frame with 2 components, a factor holding treatments and a numeric variable
holding the responses. The number of rows equals the number of experimental units.
plot.design(myframe)
- Plots mean response for each factor level and overall mean response
plot.design(myframe,fun=median)
- Plots median responses
plot.factor(myframe)
- Shows boxplots of response, one for each factor level
aovfit <- aov(response ~ treatment, myframe)
- Runs an analysis of variance
summary(aovfit)                        Displays ANOVA table
fitted(aovfit)                         Returns fitted values
resid(aovfit)                          Returns residuals
hist(resid(aovfit))                    Makes histogram of residuals
qqnorm(resid(aovfit))                  Makes Q-Q plot of residuals
plot(fitted(aovfit),resid(aovfit))     Plots residuals against fitted values
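Here is a sketch of the one-factor analysis on a hypothetical data frame with three treatments and three units each. In a one-factor model, the fitted values should equal the treatment means.

```r
myframe <- data.frame(
  treatment = factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C")),
  response = c(10, 12, 11, 14, 15, 16, 9, 8, 10))

aovfit <- aov(response ~ treatment, myframe)
summary(aovfit)      # ANOVA table
fitted(aovfit)       # each unit gets its treatment mean: 11, 15 or 9
tapply(myframe$response, myframe$treatment, mean)   # the same means
```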
2. Two-Factor Experiments
Given: A data frame with 3 components, a factor holding treatments, a factor holding block, and
a numeric variable holding the responses. The number of rows equals the number of
experimental units.
plot.design(myframe)
- Plots mean response for each level of each factor and overall mean response
plot.factor(myframe)
- Shows two sets of boxplots, one for each factor
interaction.plot(treatment,block,response)
- Plots the response against treatment for each level of block on the same graph.
aovfit <- aov(response ~ treatment + block, myframe)
- Runs an analysis of variance