Computational Methods for Data Analysis – 2014/15
Lab 1: A Crash Intro to R

Following very closely the first three chapters of Baayen’s Analyzing Linguistic Data, available from Baayen’s pages at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/baayenCUPstats.pdf

Getting started with R
- starting R
- the R console
- New documents

Basics (Baayen 1.1)
- R as a calculator:
    > 2 + 3
- Assignment
    > X <- 2 + 3
- Comments
    # this is a comment

Data: Tables (Data Frames)
- library(languageR)
  head(verbs, n=10)
- accessing elements of data frames (Baayen 1.3)
  o cell
      verbs[1, 5]
  o column
      verbs[, 5]
      X <- verbs[, 5]
  o row
      verbs[1, ]
  o column by column name
      verbs$LengthOfTheme
- Changing a value
  o verbs[1,3] = "XXX"
- Creating contingency tables out of frames:
  o verbs.xtabs = xtabs( ~ RealizationOfRec + AnimacyOfRec, data = verbs)
- writing out
    write.table(verbs, file = "dative3.txt")
- reading in
    data = read.table("dative3.txt")
    data = read.csv()

Vectors
- creating a vector:
  o rs = c(638, 799, 390, 569, 567)
- using the vector to select items from a data frame:
  o verbs.rs = verbs[rs, ]
- creating a vector of integers in sequence:
  o 1:5
- sorting a vector
  o sort(rs)
- vectorization
  o v1 * v2
- What vectors are for
  o Basic statistics:
      w <- rbinom(500, 4, .3)
      mean(w)
      sd(w)

Other data types (R Cookbook, ch. 5)
- Factors
  o verbs.rs$AnimacyOfRec
- Scalars
  o Really just vectors with one element
  o Fundamentally vectorized
- Matrices
  o Vectors with dimension
      A <- 1:6
      dim(A)
      print(A)
      dim(A) <- c(2,3)
      print(A)
  o diagonal matrices:
      diag(3)
- Lists
  o Heterogeneous vectors
  o Indexed by index and by name (= like hash maps)
  o Data frames as lists

Data exploration and plotting (Baayen, ch. 2)
- library(MASS)
- mean(ratings$Length)
- median(ratings$Length)
- Histograms
    truehist(ratings$Length, xlab="word length", col="grey")
- Saving plots
- Plotting
    plot(ratings$Frequency, ratings$FamilySize)
- Boxplots
    boxplot(lexdec$RT)
- Mosaic plots (verbs.xtabs is the contingency table created with xtabs above)
    mosaicplot(verbs.xtabs, main="dative")

Control and Functions
- defining a function:
    foo = function (x) { …. }
- control structures

Warmup exercise
- open a new document
- define a function that returns an identity matrix
- use getwd(), setwd() to set the working directory appropriately
- use source() to load the document into the R console

PlotData exercise
- Download ex1data1.txt from the web pages
- Read the table into the variable data
    data <- read.csv('ex1data1.txt', header=FALSE)
- Set variable X to the first column of data, y to the second column
  o NB in R the first column has index 1
- In a new document, define a function that plots two vectors x and y, and source it
- Invoke your function on X and y
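
A note on vectorization (Vectors section): the handout writes v1 * v2 without defining the two vectors. The sketch below is one possible illustration. The values of v1 and v2 are made up for this example, while rs is the vector from the handout.

```r
# Illustrative vectors (values are assumptions, not from the handout)
v1 <- c(1, 2, 3)
v2 <- c(10, 20, 30)

# Arithmetic is applied element by element, with no explicit loop
elementwise <- v1 * v2      # 10 40 90

# A single number is recycled across the whole vector
shifted <- v1 + 1           # 2 3 4

# The vector rs from the handout, sorted and filtered
rs <- c(638, 799, 390, 569, 567)
rs_sorted <- sort(rs)       # 390 567 569 638 799
rs_large <- rs[rs > 600]    # logical indexing keeps 638 and 799
```

Comparisons such as rs > 600 are vectorized too: they return a logical vector that can be used directly as an index.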
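
A note on lists (Other data types section): the three list items are stated without code. The following is a minimal sketch, assuming a made-up list called item and a small made-up data frame df. Neither comes from languageR.

```r
# A list can hold elements of different types (a heterogeneous vector).
# The names and values are hypothetical, chosen only for illustration.
item <- list(word = "give", length = 4, animate = TRUE)

# Indexing by position and by name, much like a hash map
by_index <- item[[1]]            # "give"
by_name <- item[["length"]]      # 4
by_dollar <- item$animate        # TRUE

# Single brackets return a sub-list, not the element itself
sub <- item[1]

# A data frame is a list of equal-length columns
df <- data.frame(word = c("give", "send"), length = c(4, 4))
df_is_list <- is.list(df)        # TRUE
length_column <- df[["length"]]  # same column as df$length
```

This is why verbs$LengthOfTheme works on a data frame: $ is list indexing by name.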
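
A note on control structures (Control and Functions section): the handout lists them without examples. The sketch below shows if/else, for and while inside small functions. All three function names are hypothetical, and the examples are deliberately unrelated to the exercises.

```r
# Hypothetical helper: classify a number with if / else
describe_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

# Hypothetical helper: add up the integers 1..n with a for loop
sum_to <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i
  }
  total  # the last evaluated expression is the return value
}

# Hypothetical helper: count how many halvings bring x below 1
count_halvings <- function(x) {
  steps <- 0
  while (x >= 1) {
    x <- x / 2
    steps <- steps + 1
  }
  steps
}
```

For instance, describe_sign(-2) gives "negative", sum_to(5) gives 15, and count_halvings(8) gives 4. In everyday R code a vectorized call such as sum(1:n) would usually replace the explicit loop.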