STAT 534: Statistical Computing
Hari Narayanan
harin@uw.edu

Course objectives
• Write programs in R and C tailored to the specifics of the statistical problems you want to solve
• Become familiar with:
  – optimization techniques
  – Markov chain Monte Carlo (MCMC)

Logistics
• Class:
  – Tuesdays and Thursdays, 12:00pm – 1:20pm
• Office hours:
  – Thursdays, 2:30pm – 4:00pm (Padelford B-301), or by appointment
• Textbooks:
  – Robert & Casella: Introducing Monte Carlo Methods with R
  – Kernighan & Ritchie: The C Programming Language
• Evaluation:
  – 4 assignments
  – 2 quizzes
  – Final project

Introduction to R
• R is a scripting language for statistical data manipulation and analysis
• R descends from the S language and its commercial implementation, S-PLUS
• R is a standard tool for professional statisticians
• R is free and available on all major platforms (Windows, Unix, Mac)
• It is a general-purpose, object-oriented language
• It is an interpreted language: commands are evaluated as they are entered, with no separate compilation step

Getting R
• Main website: http://cran.r-project.org/
• About 25 standard packages come with the default download; many more contributed packages can be obtained from the main website
• Development environment/GUI:
  – RStudio http://www.rstudio.com/

First R interactive session
• Type interactive commands at the prompt:
    > 2+3
    [1] 5
    > 2==4
    [1] FALSE
    > 5/0
    [1] Inf
    > 0/0
    [1] NaN
• Note that R is case sensitive
• Getting help:
  – help(FALSE)
  – ?FALSE
• Ending a session:
    > q()

R workspaces
• R creates and manipulates objects: variables, arrays of numbers, character strings, functions, and more general structures built from these components:
    > a=4
    > b=5
    > objects()   # list all the objects in this workspace
    [1] "a" "b"
    > ls()        # same as objects()
    [1] "a" "b"
    > rm(a)       # remove an object from this workspace
    > ls()
    [1] "b"
• When the workspace is saved (for example, by answering yes when q() asks whether to save the workspace image), the objects of the current session are stored in .RData in the current working directory, and the command history is stored in .Rhistory
  – These are reloaded every time you start R from the same directory

Assignment
• There are several ways to assign values (primitive values or the result of evaluating an expression) to variables:
    > a = 2 + 3
    > a <- 2 + 3
    > 2 + 3 -> a
    > assign("a", 2 + 3)

Vectors
• Created using the c (concatenate) function:
    > v = c(1,2)
    > v
    [1] 1 2
• A number by itself is considered a vector of length 1
• No nesting: c() flattens vector arguments into a single vector
    > u = c(-4, v, 4)
    > u
    [1] -4  1  2  4
• Missing values are written NA, e.g. c(1, NA, 4)

Operations on vectors
• The usual arithmetic operators (+, -, *, /, ^) apply element-wise. The shorter vector is recycled to match the length of the longer one:
    > a = c(1,2)     # recycled to 1 2 1 in the expression below
    > b = c(1,2,3)
    > r = 3*a + b - 1
    Warning message:
    In 3 * a + b :
      longer object length is not a multiple of shorter object length
    > r
    [1] 3 7 5
• Other math functions are also applied element-wise: sqrt, log, ...
• Other useful functions: max, min, length, sum, prod, mean, var, sort
    > sort(c(4,3,7))
    [1] 3 4 7

Logical operations
• Operators: <, <=, >, >=, ==, !=, &, |, !
    > a = c(2,4)
    > r1 = a > 3
    > r1
    [1] FALSE  TRUE
    > r2 = a > 4
    > r2
    [1] FALSE FALSE
    > r1 & r2
    [1] FALSE FALSE
    > r1 | r2
    [1] FALSE  TRUE
    > !r1
    [1]  TRUE FALSE
• Logical values can be used in arithmetic operations: FALSE is coerced to 0 and TRUE to 1. Conversely, numbers used in logical operations are coerced to logical: 0 becomes FALSE and any nonzero value becomes TRUE:
    > r1 + 1
    [1] 1 2
    > c(2,3) & c(0,1)
    [1] FALSE  TRUE

Generating vectors
• The : operator (it has higher precedence than + and - in an expression):
    > a = 3
    > 1:a
    [1] 1 2 3
    > 1:a+1
    [1] 2 3 4
    > 1:(a+1)
    [1] 1 2 3 4
• The seq function:
    > seq(from=2, to=4)   # named arguments; same as seq(2,4)
    [1] 2 3 4
    > seq(to=4, from=2)   # named arguments can be given in any order
    [1] 2 3 4
    > seq(from=2, length.out=3)
    [1] 2 3 4
• The rep function:
    > a = c(1,2)
    > rep(a, times=2)
    [1] 1 2 1 2
    > rep(a, each=2)
    [1] 1 1 2 2

Manipulating vector data
• Simple indexing (indices start at 1):
    > a = c(2,3,8)
    > a[1]
    [1] 2
    > a[5]       # an index past the end gives NA
    [1] NA
    > a[-1]      # a negative index excludes that element
    [1] 3 8
• More complex indexing, with index vectors or logical conditions:
    > a[1:2]
    [1] 2 3
    > a[a>2 & a<7]
    [1] 3
    > a[c(1,1)]
    [1] 2 2

Matrices
• Attaching a dimension vector to a vector lets R treat it as an array/matrix:
    > a = c(2,3,8)
    > attributes(a)
    NULL
    > dim(a) = c(3,1)
    > a
         [,1]
    [1,]    2
    [2,]    3
    [3,]    8
    > attributes(a)
    $dim
    [1] 3 1
• The matrix function fills a matrix column by column by default; byrow=TRUE fills it row by row:
    > matrix(c(1,2,3,4,5,6), nrow=2)
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    > matrix(c(1,2,3,4,5,6), nrow=2, byrow=TRUE)
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6

Operations on matrices
• Addition, subtraction, element-wise multiplication: +, -, *
• Matrix multiplication: %*%
• Transpose: function t(), e.g. t(matrix(c(1,2), nrow=1)) turns a 1 × 2 row matrix into a 2 × 1 column matrix
• The diag function:
  – If the argument is a single number n, we get the n × n identity matrix:
      > diag(2)
           [,1] [,2]
      [1,]    1    0
      [2,]    0    1
  – If the argument is a vector, we get a diagonal matrix with the elements of the vector on its diagonal:
      > diag(c(1,2))
           [,1] [,2]
      [1,]    1    0
      [2,]    0    2
  – If the argument is a matrix, we get the elements of its main diagonal:
      > m = matrix(c(3,4,5,6), nrow=2)
      > m
           [,1] [,2]
      [1,]    3    5
      [2,]    4    6
      > diag(m)
      [1] 3 6

Indexing
• Matrix indexing works like vector indexing, except that there is one index vector for each dimension:
    > m = matrix(c(1,2,3,4,5,6), nrow=2)
    > m
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    > m[2,2]
    [1] 4
    > m[c(2),c(2)]           # the same, written with index vectors
    [1] 4
    > m[c(1,2),c(2)]         # first 2 rows, 2nd column
    [1] 3 4
    > m[c(1,2),c(2,3)]
         [,1] [,2]
    [1,]    3    5
    [2,]    4    6
    > m[c(1,2),c(2,3)] = 0
    > m
         [,1] [,2] [,3]
    [1,]    1    0    0
    [2,]    2    0    0
    > m[c(TRUE,FALSE),TRUE]  # keep the first row and all columns
    [1] 1 0 0
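The recycling rule and the TRUE/FALSE to 1/0 coercion from the vector slides can be checked in a short script. This is a sketch, not course material. The names a_recycled, n_big and frac_big are illustrative choices. The script builds the recycled copy of a explicitly with rep(..., length.out = ...), which reproduces the slide's result 3 7 5 without the length-mismatch warning. It then uses logical vectors inside sum() and mean() to count elements and to compute a proportion.

```r
# Sketch of vector recycling and logical coercion (illustrative variable names).
a <- c(1, 2)
b <- c(1, 2, 3)

# Recycle `a` explicitly: rep() with length.out stops at length(b), giving 1 2 1.
a_recycled <- rep(a, length.out = length(b))

# Same arithmetic as the "Operations on vectors" slide, but with matching
# lengths, so R issues no "longer object length" warning.
r <- 3 * a_recycled + b - 1
print(r)   # prints [1] 3 7 5

# TRUE/FALSE are coerced to 1/0, so sum() counts matches
# and mean() gives the proportion of matches.
x <- c(2, 3, 8)
n_big <- sum(x > 2)      # number of elements greater than 2
frac_big <- mean(x > 2)  # fraction of elements greater than 2
```

Writing the recycling out with rep() makes the intent explicit. That is usually preferable to relying on R's silent or warned recycling when the lengths do not divide evenly.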
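The matrix operations listed under "Operations on matrices" can be compared side by side. This minimal sketch uses a hypothetical 2 × 2 matrix A, which does not appear on the slides; A, B and the result names are illustrative. It checks four things:
• that byrow=TRUE on square data gives the transpose
• how element-wise * differs from the matrix product %*%
• that multiplying by the identity from diag(2) leaves A unchanged
• that summing diag(A) gives the trace

```r
# Hypothetical 2 x 2 example; A, B and the result names are illustrative.
A <- matrix(c(1, 2, 3, 4), nrow = 2)                # column-wise: rows (1, 3) and (2, 4)
B <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)  # row-wise:    rows (1, 2) and (3, 4)

# For square data, filling by row gives the transpose of filling by column.
same_as_transpose <- isTRUE(all.equal(B, t(A)))

# `*` multiplies entry by entry; `%*%` is the matrix product.
elementwise <- A * A   # each entry squared
product <- A %*% A     # row-by-column sums of products

# Multiplying by the 2 x 2 identity matrix from diag(2) leaves A unchanged.
unchanged <- isTRUE(all.equal(A %*% diag(2), A))

# diag() applied to a matrix returns its main diagonal; summing it gives the trace.
trace_A <- sum(diag(A))

print(product)
```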
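The indexing slide shows, without naming it, that selecting a single column, as in m[c(1,2),c(2)], returns a plain vector rather than a matrix. The sketch below reuses the slide's matrix m; the other variable names are illustrative. It shows:
• the drop = FALSE argument, which keeps the matrix shape
• the empty-index shorthand m[1, ] for "all columns"
• selecting columns with a logical condition on a row

```r
# Same matrix as on the indexing slide; the other variable names are illustrative.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # rows (1, 3, 5) and (2, 4, 6)

# Picking one column drops the result to a plain vector...
col2 <- m[c(1, 2), c(2)]                     # 3 4, with no dim attribute
# ...while drop = FALSE keeps it as a 2 x 1 matrix.
col2_mat <- m[c(1, 2), c(2), drop = FALSE]

# An empty index selects everything along that dimension,
# so m[1, ] matches the slide's m[c(TRUE, FALSE), TRUE].
row1 <- m[1, ]

# Logical condition on the first row: keep the columns whose entry exceeds 2.
big_cols <- m[, m[1, ] > 2]

# Assigning through an index changes only the selected block (done on a copy here).
m2 <- m
m2[, 2:3] <- 0
```

drop = FALSE is worth using inside functions that must always receive a matrix. Without it, a selection that happens to contain one row or column silently becomes a vector.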