STAT 534: Computational Statistics

STAT 534: Statistical Computing
Hari Narayanan
harin@uw.edu
Course objectives
• Write programs in R and C tailored to the specifics
of the statistical problems you want to solve
• Familiarize yourself with:
– optimization techniques
– Markov chain Monte Carlo (MCMC)
Logistics
• Class:
– Tuesdays and Thursdays 12:00pm – 1:20pm
• Office hours:
– Thursday 2:30pm – 4pm (Padelford B-301) or by appt
• Textbooks:
– Robert & Casella: Introducing Monte Carlo Methods with R
– Kernighan & Ritchie: The C Programming Language
• Evaluation:
– 4 assignments
– 2 quizzes
– Final project
Introduction to R
• R is a scripting language for statistical data
manipulation and analysis
• R is the successor of S and S-PLUS
• R is a standard tool among professional statisticians
• R is free and available on major platforms
(Windows, Unix, Mac)
• It is general-purpose and object-oriented
• It is an interpreted programming language
Getting R
• Main website: http://cran.r-project.org/
• ~25 standard packages come with a default
download, many more contributed packages
can be obtained from the main website
• Development environment/GUI:
– RStudio: http://www.rstudio.com/
First R interactive session
• Type interactive commands at the prompt
> 2+3
[1] 5
> 2==4
[1] FALSE
> 5/0
[1] Inf
> 0/0
[1] NaN
• Note that R is case sensitive
• Getting help
– help(FALSE)
– ?FALSE
• Ending session:
– > q()
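The special values Inf and NaN shown above can be tested for explicitly. A minimal sketch using base R predicates; the variable names x and y are chosen here for illustration only:

```r
x <- 5 / 0        # Inf
y <- 0 / 0        # NaN

is.finite(x)      # FALSE: Inf is not a finite number
is.infinite(x)    # TRUE
is.nan(y)         # TRUE
-5 / 0            # -Inf: the sign is preserved
```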
R workspaces
• R creates and manipulates objects: variables, arrays of numbers, lists
of character strings, functions, and structures built from these components:
> a=4
> b=5
> objects() # list all the objects in this workspace
[1] "a" "b"
> ls() # same as objects()
[1] "a" "b"
> rm(a) # remove an object from this workspace
> ls()
[1] "b"
• If you save the workspace when quitting, the objects of the current session are stored in .RData in the current
folder and the command history is stored in .Rhistory
– These are reloaded every time you start R from the same directory
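Objects can also be saved and restored by hand rather than relying on the automatic .RData file. The sketch below assumes a temporary file (created with tempfile(), so no real workspace file is touched) and uses the base functions save() and load():

```r
a <- 4
b <- 5

# Illustrative file location; a real session would typically use ".RData"
ws_file <- tempfile(fileext = ".RData")

save(a, b, file = ws_file)      # write the selected objects to disk
rm(a, b)                        # remove them from the workspace
exists("a", inherits = FALSE)   # FALSE: a is gone

loaded <- load(ws_file)         # restore; returns the names of the restored objects
loaded                          # "a" "b"
a + b                           # 9
```

save.image() does the same for the whole workspace at once.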
Assignment
• Multiple ways to assign values (primitive
values or the result of evaluating an
expression) to variables:
> a = 2 + 3
> a <- 2 + 3
> 2 + 3 -> a
> assign("a", 2+3)
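To check that the four forms above are interchangeable at the top level, the sketch below stores each result under a different name (a1 to a4 are names chosen here for illustration) and compares them:

```r
a1 = 2 + 3
a2 <- 2 + 3
2 + 3 -> a3
assign("a4", 2 + 3)      # the variable name is given as a string

c(a1, a2, a3, a4)        # 5 5 5 5

# Inside a function call, = names an argument instead of assigning:
mean(x = c(1, 2, 3))     # x is the argument name of mean() here
```

assign() is mostly useful when the variable name is only known at run time, for example when it is built with paste().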
Vectors
• Created using the c (concatenation) function:
> v = c(1,2)
> v
[1] 1 2
• A number by itself is considered a vector of length 1
• No nesting: vectors passed to c are flattened into a single vector
> u=c(-4, v, 4)
> u
[1] -4 1 2 4
• Missing values are written NA
> c(1, NA, 4)
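Missing values propagate through most computations, so they usually need to be detected or skipped. A short sketch with base R functions (the vector x is illustrative):

```r
x <- c(1, NA, 4)

is.na(x)                  # FALSE TRUE FALSE
sum(x)                    # NA: any computation involving NA gives NA
sum(x, na.rm = TRUE)      # 5: drop missing values first
mean(x, na.rm = TRUE)     # 2.5
x[!is.na(x)]              # 1 4: keep only the observed values
```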
Operations on vectors
• Regular arithmetic operations apply (+, -, *, /, ^). The shorter vector is
recycled to match the needed length:
> a=c(1,2) # recycled to 1 2 1 in the expression below
> b=c(1,2,3)
> r=3*a+b-1
Warning message:
In 3 * a + b :
longer object length is not a multiple of shorter object length
> r
[1] 3 7 5
• Other math functions can be applied element-wise: sqrt, log, ...
• Other functions: max, min, length, sum, prod, mean, var, sort
> sort(c(4,3,7))
[1] 3 4 7
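The recycling rule is silent when the longer length is a multiple of the shorter one, which is the usual case of combining a vector with a single number. A small sketch with illustrative values:

```r
a <- c(1, 2)
b <- c(10, 20, 30, 40)

a + b                 # 11 22 31 42: a is recycled to 1 2 1 2, no warning
2 * b                 # 20 40 60 80: the scalar 2 is a length-1 vector
sqrt(c(1, 4, 9))      # 1 2 3: applied element-wise

# Same computation as in the slide, with the recycling warning silenced
r <- suppressWarnings(3 * c(1, 2) + c(1, 2, 3) - 1)
r                     # 3 7 5
```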
Logical operations
• Operators <, <=, >, >=, ==, &, |, !
> a=c(2,4)
> r1=a>3
> r1
[1] FALSE TRUE
> r2=a>4
> r2
[1] FALSE FALSE
> r1 & r2
[1] FALSE FALSE
> r1 | r2
[1] FALSE TRUE
> ! r1
[1] TRUE FALSE
• Can be used in arithmetic operations: FALSE is coerced to 0, TRUE to 1
> r1 + 1
[1] 1 2
> c(2,3) & c(0,1) # numbers are coerced to logical: 0 is FALSE, nonzero is TRUE
[1] FALSE TRUE
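Because TRUE counts as 1 and FALSE as 0, logical vectors give a compact way to count or locate elements that satisfy a condition. A sketch with an illustrative vector:

```r
a <- c(2, 4, 7, 1)
big <- a > 3      # FALSE TRUE TRUE FALSE

sum(big)          # 2: number of elements greater than 3
mean(big)         # 0.5: proportion of elements greater than 3
any(big)          # TRUE: at least one element qualifies
all(big)          # FALSE: not every element qualifies
which(big)        # 2 3: positions of the TRUE values
```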
Generating vectors
• The : operator (binds more tightly than +, -, *, / in an expression)
> a=3
> 1:a
[1] 1 2 3
> 1:a+1
[1] 2 3 4
> 1:(a+1)
[1] 1 2 3 4
• seq function
> seq(from=2, to=4) # named arguments; same as seq(2,4)
[1] 2 3 4
> seq(to=4, from=2)
[1] 2 3 4
> seq(from=2, length=3)
[1] 2 3 4
• rep function
> a=c(1,2)
> rep(a, times=2)
[1] 1 2 1 2
> rep(a, each=2)
[1] 1 1 2 2
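The precedence of : is a common source of off-by-one errors in loops and index ranges. The sketch below contrasts the two readings and shows a few further base R variants (seq with by, seq_len, and rep with a vector of counts); the values are illustrative:

```r
n <- 3
1:n - 1                   # 0 1 2: parsed as (1:n) - 1
1:(n - 1)                 # 1 2

seq(0, 1, by = 0.25)      # 0.00 0.25 0.50 0.75 1.00

# 1:n counts down when n is 0; seq_len(n) gives an empty vector instead
n0 <- 0
1:n0                      # 1 0
seq_len(n0)               # integer(0)

rep(c(1, 2), times = c(3, 1))   # 1 1 1 2: a separate count per element
```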
Manipulating vector data
• Simple indexing:
> a=c(2,3,8)
> a[1]
[1] 2
> a[5]
[1] NA
> a[-1]
[1] 3 8
• More complex:
> a[1:2]
[1] 2 3
> a[a>2 & a<7]
[1] 3
> a[c(1,1)]
[1] 2 2
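The same indexing forms also work on the left-hand side of an assignment, which modifies the selected elements in place. A minimal sketch continuing with the vector from the slide:

```r
a <- c(2, 3, 8)

a[-c(1, 3)]        # 3: drop the first and third elements
a[which.max(a)]    # 8: element at the position of the maximum

a[2] <- 10         # replace one element: a is now 2 10 8
a[a > 5] <- 0      # replace by condition: a is now 2 0 0
a[5] <- 1          # assigning past the end extends the vector
a                  # 2 0 0 NA 1
```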
Matrices
• Associating a dimension vector with a vector allows it to be treated by R as an array/matrix:
> a=c(2,3,8)
> attributes(a)
NULL
> dim(a) = c(3,1)
> a
     [,1]
[1,]    2
[2,]    3
[3,]    8
> attributes(a)
$dim
[1] 3 1
> matrix(c(1,2,3,4,5,6), nrow=2)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> matrix(c(1,2,3,4,5,6), nrow=2, byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
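The two constructions above (setting dim, and calling matrix) produce the same column-by-column layout. The sketch below checks this and also relates byrow=TRUE to a transpose; the object names are chosen for illustration:

```r
v <- 1:6

m <- v
dim(m) <- c(2, 3)                       # fill column by column
all(m == matrix(v, nrow = 2))           # TRUE: same layout as matrix()

dim(m)                                  # 2 3
nrow(m)                                 # 2
ncol(m)                                 # 3

mb <- matrix(v, nrow = 2, byrow = TRUE) # fill row by row
all(mb == t(matrix(v, nrow = 3)))       # TRUE: same as transposing a 3x2 fill
```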
Operations on matrices
• Addition/subtraction/element-wise multiplication: +, -, *
• Matrix multiplication: %*%
• Transpose: function t(), e.g. t(matrix(c(1,2),nrow=1))
• diag function:
– If the argument is a number, we get an identity matrix of that size
> diag(2)
     [,1] [,2]
[1,]    1    0
[2,]    0    1
– If the argument is a vector, we get a diagonal matrix with the elements of the vector
> diag(c(1,2))
     [,1] [,2]
[1,]    1    0
[2,]    0    2
– If the argument is a matrix, we get the elements of its main diagonal
> m
     [,1] [,2]
[1,]    3    5
[2,]    4    6
> diag(m)
[1] 3 6
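The difference between * and %*% is easy to overlook because both accept two matrices of the same shape. A short sketch using the matrix m from the diag example (rebuilt here so that the block is self-contained):

```r
m <- matrix(c(3, 4, 5, 6), nrow = 2)   # columns are (3, 4) and (5, 6)

m * m          # element-wise: each entry is squared
m %*% m        # matrix product: rows of m times columns of m
m %*% diag(2)  # multiplying by the identity leaves m unchanged
t(m)           # transpose: rows become columns
```

Here m * m has entries 9, 25 in the first row and 16, 36 in the second, while m %*% m has 29, 45 and 36, 56.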
Indexing
• Similar to indexing vectors, except we have an indexing vector for every dimension:
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> m[2,2]
[1] 4
> m[c(2),c(2)] # indexing vectors
[1] 4
> m[c(1,2),c(2)] # first 2 rows and 2nd column
[1] 3 4
> m[c(1,2),c(2,3)]
     [,1] [,2]
[1,]    3    5
[2,]    4    6
> m[c(1,2),c(2,3)]=0
> m
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    2    0    0
> m[c(TRUE,FALSE),TRUE] # keep first row and all columns
[1] 1 0 0
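As the examples above show, selecting a single row or column returns a plain vector. The sketch below illustrates leaving an index empty to select a whole dimension, and the drop=FALSE argument of [ that keeps the result a matrix; the matrix is the same 2x3 example:

```r
m <- matrix(1:6, nrow = 2)

m[, 2]                       # 3 4: an empty index selects all rows
m[, 2, drop = FALSE]         # still a matrix, with 2 rows and 1 column
dim(m[, 2, drop = FALSE])    # 2 1

m[-1, ]                      # 2 4 6: drop the first row
m[m > 4]                     # 5 6: a logical matrix indexes element-wise
```

Keeping the matrix shape matters when the result is passed on to %*%, which depends on the dimensions of its operands.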