Introduction to R Workshop in Methods and Indiana Statistical

Welcome to the R intro Workshop

Before we begin, please download the “SwissNotes.csv” and “cardiac.txt” files from the ISCC website, under the R workshop (more info).

www.iub.edu/~iscc

Introduction to R

Workshop in Methods from the Indiana Statistical Consulting Center

Thomas A. Jackson

February 15, 2013

Overview

The R Project for Statistical Computing http://cran.r-project.org

“R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.”

- Description from CRAN Website

Benefits

R …

• is free

• is interactive: we can type something in and work with it

▫ How we analyze data can be broken into small steps

• is interpreted: we give it commands and it translates them into mathematical procedures or data management steps

• can be run in batch mode: nice because the script documents every step of the analysis

• is a calculator: unlike other calculators, though, it lets you create variables and objects

Let’s Get R Started

• How to open R

→ Start Menu

→ Programs

→ Departmentally Supported

→ Stat/Math

→ R

Graphical User Interface (GUI)

Three Environments

• Command Window (aka Console)

• Script Window

• Plot Window

Command Window Basics

To quit: type q()

Save workspace image? Answering yes saves the objects in memory to the hard drive

Storing variable in memory

• <- , -> , or =

• a <- 5 stores the number 5 in the object “a”

• pi -> b stores the number π = 3.141593 in “b”

• x = 1 + 2 stores the result of the calculation (3) in “x”

• “=” requires left-hand assignment: the object name must be on the left

Try not to overwrite built-in names such as t, c, and pi!
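As a quick illustration of the three assignment forms above, here is a minimal sketch. The objects a, b, and x come from this slide; pi_backup is a hypothetical name added only to show what happens when a built-in name is overwritten.

```r
# Three ways to store a value in an object
a <- 5        # leftward arrow: store 5 in "a"
pi -> b       # rightward arrow: store the built-in constant pi in "b"
x = 1 + 2     # equals sign: store the result of 1 + 2 in "x"

print(a)
print(b)
print(x)

# Overwriting a built-in name hides the original until our copy is removed
pi_backup <- pi   # hypothetical name: keep the true value for comparison
pi <- 3           # our "pi" now masks the built-in constant
print(pi)
rm(pi)            # remove our copy; the built-in constant is visible again
print(pi)
```

Removing the masking object with rm() is usually enough to recover the built-in value, because the original still lives in the base package.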


Printing to output

• Calculations that are not stored print to output

> 3 + 5

[1] 8

• Type name to view stored object

> a

[1] 5

• Use print()

> print(a)

[1] 5

View objects in workspace

• objects() or ls()


Clearing the console (command window)

• Mac: Edit → Clear Console or Alt + Command + L

• Windows: Edit → Clear Console or Ctrl + L

Removing variables from memory

• rm() or remove()

> x <- 4

> rm(x)

• rm(list = ls()) removes all variables

Script Window Basics

Saving syntax (code)

• Mac: File → New

• Windows: File → New Script

Documenting code: # comments out everything after it on the same line

Running code from Script Window

• Mac: Apple + Enter

• Windows: F5 or Ctrl + r

Working Directory

Obtaining working directory

• getwd()

• Mac: Misc → Get Working Directory

• Windows: File → Change dir...

Changing working directory

• setwd()

• Mac: Misc → Change Working Directory

• Windows: File → Change dir...

Path Names

Specify with forward slashes or double backslashes

Enclose in single or double quotation marks

Examples

• setwd("C:/Program Files/R/R-2.6.1")

• setwd('C:\\Program Files\\R\\R-2.6.1')
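The working-directory commands can be tried safely with a short sketch like the one below. It assumes nothing about your own folders: it switches to tempdir(), the scratch folder R creates for each session, and then switches back. The names old_dir and p are hypothetical.

```r
old_dir <- getwd()        # remember the current working directory
setwd(tempdir())          # move to R's per-session scratch folder
print(getwd())            # confirm the change
setwd(old_dir)            # restore the original working directory

# file.path() joins path pieces with forward slashes on every platform
p <- file.path("C:", "Program Files", "R")
print(p)
```

Building paths with file.path() avoids having to remember whether a platform expects forward slashes or doubled backslashes.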

R Help

Helpful commands

• If you know the function name: help() or ?

> help(log)

> ?exp

• If you do not know the function name: help.search() or ??

> help.search("anova")

> ??regression

Documentation

Elements of a documentation file

• Function{Package}

• Description

• Usage: What your code should look like, “=” gives default

• Arguments: Inputs to the function

• Details

• Value: What the function will return

• See Also: Related functions

• Examples

Online Resources

• CRAN Website: http://cran.r-project.org/

• R Seek: http://www.rseek.org/

• Quick-R tutorial: http://www.statmethods.net/

• R Tutor: http://www.r-tutor.com/

• UCLA: http://www.ats.ucla.edu/stat/r/

• R listservs

• Google

Google tip: include “[R]” (instead of just “R”) with search topic to help filter out non-R websites

Additional Packages

Over 2,500 listed on the CRAN website!

• Use with caution

• Initial download of R: base, graphics, stats, utils

1) Installing a package:

• Mac: Packages & Data → Package Installer

Use Package Search to locate and press ‘Install Selected’

• Windows: Packages → Install Packages

Locate desired package and press ‘OK’

• install.packages("MASS")

2) Using an installed package:

You MUST call it into active memory with library()

> library(MASS)

Data Structures

R has several basic types (or “classes”) of data:

• Numeric - Numbers

• Character – Strings (letters, words, etc.)

• Logical – TRUE or FALSE

• Vector

• Matrix

• Array

• Data Frame

• List

NOTE: There are other classes, but these are most common. Understanding differences will save you some headache.

Data Structures

• Find class of data

• Unknown class: class()

• Check particular class: is.classname()

> a <- 5

> class(a)

[1] "numeric"

> is.character(a)

[1] FALSE

Change class: as.classname()

> as.character(a)

[1] "5"
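The class checks and conversions above can be combined in a small sketch. The object names a_chr and bad are hypothetical. Note that as.character() returns a new object and leaves the original unchanged.

```r
a <- 5
print(class(a))          # class of a number
print(is.numeric(a))     # check for one particular class
print(is.character(a))

a_chr <- as.character(a) # conversion returns a new object
print(class(a_chr))
print(class(a))          # "a" itself is still numeric

# Converting text that is not a number gives NA (R also issues a warning,
# which is silenced here so the example runs quietly)
bad <- suppressWarnings(as.numeric("five"))
print(is.na(bad))
```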

Vectors

Combine items into vector: c()

> c(1,2,3,4,5,6)

[1] 1 2 3 4 5 6

Repeat a number or sequence of numbers: rep()

> rep(1,5)

[1] 1 1 1 1 1

> rep (c(2,5,7), times = 3)

[1] 2 5 7 2 5 7 2 5 7

Vectors

Sequence generation: seq()

> seq(1,5)

[1] 1 2 3 4 5

> seq(1,5, by = .5)

[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

Try 1:10 or 10:1
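A few more variations on c(), rep(), and seq() are sketched below. The object names v1 to v4 are hypothetical; each and length.out are standard arguments of rep() and seq().

```r
v1 <- c(1, 2, 3)                   # combine values into a vector
v2 <- rep(c(2, 5, 7), each = 2)    # repeat each element twice: 2 2 5 5 7 7
v3 <- seq(0, 1, length.out = 5)    # five evenly spaced values from 0 to 1
v4 <- 10:1                         # the colon operator also counts down

print(v2)
print(v3)
print(v4)
print(length(v2))                  # number of elements in a vector
```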

Matrices

Create matrix: matrix()

• 6 x 1 matrix: matrix(1:6, ncol = 1)

• 2 x 3 matrix: matrix(1:6, nrow =2, ncol =3)

• 2 x 3 matrix filling across rows first: matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

Create matrix of more than two dimensions (array): array()
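To see the effect of byrow and the extra dimension of an array, a minimal sketch follows. The names m_col, m_row, and arr are hypothetical.

```r
m_col <- matrix(1:6, nrow = 2, ncol = 3)                 # default: fill down columns
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)   # fill across rows

print(m_col)
print(m_row)
print(dim(m_col))                 # rows, then columns

arr <- array(1:24, dim = c(2, 3, 4))   # four 2 x 3 "slices"
print(dim(arr))
print(arr[, , 1])                 # the first slice is itself a 2 x 3 matrix
```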

Lists

Create a list: list()

• Holds vectors, matrices, arrays, etc. of varying lengths

• Objects in the list can be named or unnamed

> list(matrix(0, 2, 2), y = rep(c("A", "B"), each = 2))

[[1]]
     [,1] [,2]
[1,]    0    0
[2,]    0    0

$y
[1] "A" "A" "B" "B"

Data Frame: specialized list that holds variables of same length
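Once a list exists, its pieces can be pulled out by position or by name. The sketch below reuses the list from this slide and stores it under the hypothetical name lst.

```r
lst <- list(matrix(0, 2, 2), y = rep(c("A", "B"), each = 2))

print(lst[[1]])     # double brackets: first element, by position
print(lst$y)        # dollar sign: element named "y"
print(lst[["y"]])   # double brackets also accept a name
print(length(lst))  # number of elements in the list
print(names(lst))   # unnamed elements show an empty name
```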

Data Frames

Create a data frame: data.frame()

• Like a matrix, holds specified number of rows and columns

> x <- 1:4

> y <- rep(c("A", "B"), each = 2)

> data.frame(x,y)
  x y
1 1 A
2 2 A
3 3 B
4 4 B

• Unnamed variables get assigned names

> data.frame(1:2, c("A", "B"))
  X1.2 c..A....B..
1    1           A
2    2           B
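The automatically generated names are awkward, so it usually helps to name the variables when the data frame is created. A minimal sketch, with hypothetical column names id and group:

```r
df <- data.frame(id = 1:2, group = c("A", "B"))  # name each variable up front

print(df)
print(names(df))   # the names we supplied
print(nrow(df))    # number of rows
print(ncol(df))    # number of columns
str(df)            # compact overview of each variable and its class
```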

Basic Operations

• Arithmetic: +, -, *, /

• Order of operations: ()

• Exponentiation: ^, exp()

• Other: log(), sqrt()

• Evaluate standard Normal density curve, at x = 3

> x <- 3

> 1/sqrt(2*pi)*exp(-(x^2)/2)

[1] 0.004431848

Vectorization

R is great at vectorizing operations

• Feed a matrix or vector into an expression

• Receive an object of similar dimension as output

For example, evaluate at x = 0,1,2,3

> x <- c(0,1,2,3)

> 1/sqrt(2*pi)*exp(-(x^2)/2)

[1] 0.398942280 0.241970725 0.053990967 0.004431848
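The hand-typed density formula can be checked against dnorm(), R's built-in Normal density function. In this sketch the helper name std_normal_density is hypothetical.

```r
# Hypothetical helper wrapping the formula from the slide
std_normal_density <- function(x) {
  1 / sqrt(2 * pi) * exp(-(x^2) / 2)
}

x <- c(0, 1, 2, 3)
manual  <- std_normal_density(x)   # vectorized: one result per element of x
builtin <- dnorm(x)                # built-in standard Normal density

print(manual)
print(all.equal(manual, builtin))  # the two approaches should agree
```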

Logical Operations

• Compare: ==, >, <, >=, <=, !=

> a <- c(1,1,2,4,3,1)

> a == 2

[1] FALSE FALSE  TRUE FALSE FALSE FALSE

• And: & or &&

• Or: | or ||

• Find location of TRUEs: which()

> which(a == 1)

[1] 1 2 6
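The difference between the single and double forms of "and" and "or" is easiest to see in a sketch. It reuses the vector a from this slide: & and | compare element by element, while && and || expect a single TRUE or FALSE on each side and are typically used inside if().

```r
a <- c(1, 1, 2, 4, 3, 1)

print(a > 1 & a < 4)      # element-wise "and"
print(a == 1 | a == 4)    # element-wise "or"

# && works on single values, e.g. a condition inside if()
print(length(a) > 0 && a[1] == 1)

print(which(a > 1 & a < 4))  # positions where both conditions hold
print(sum(a == 1))           # TRUE counts as 1, so this counts the matches
```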

Subsetting

> a <- 1:5

> b <- matrix(1:12,nrow = 3)

Use Square brackets []

• Pick range of elements: a[1:3]

• Pick particular elements: a[c(1,3,5)]

• Do not include elements: a[-c(1,4)]

Subsetting (cont.)

Use commas in more than one dimension (matrices & data frames)

• Pick particular elements: b[1:2,2:4]

• Give all rows and specified columns: b[,1:2]

• Give all columns and specified rows: b[1:2,]

• Note: b[2] coerces into a vector then gives specified element
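Subsetting also accepts logical conditions, which ties it to the comparison operators from the earlier slide. A minimal sketch using the same a and b:

```r
a <- 1:5
b <- matrix(1:12, nrow = 3)   # 3 x 4 matrix, filled down columns

print(a[a > 2])          # keep only elements that satisfy a condition
print(b[2, 3])           # single element: row 2, column 3
print(b[, 1:2])          # all rows, first two columns
print(b[2])              # one index treats the matrix as a vector
print(b[b[, 1] > 1, ])   # rows whose first-column value exceeds 1
```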

Reading External Data Files

SwissNotes.csv Data set

• Compiled by Bernard Flury

• Contains measurements on 200 Swiss bank notes

• 100 genuine and 100 counterfeit notes

Reading External Data Files (cont.)

Most general function: read.table()

read.table(file, header = FALSE, sep = "", …)

• Creates a data frame

• File name must be in quotes, single or double

• File name is case sensitive

• Include the file name extension, and the full path if the data file is not in the working directory

> read.table("C:/Users/jacksota/Desktop/SwissNotes.csv", header = TRUE, sep = ",")

Don’t know the file path or extension? Try: file.choose()

> read.table(file.choose(), header = TRUE, sep = ",")

• sep defines the separator, e.g. "," or "\t" or ""

• header indicates variable names should be read from first row
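To try read.table() without the workshop files, the sketch below writes a tiny comma-separated file to a temporary location and reads it back. The two data rows are made up for illustration and are not taken from SwissNotes.csv; tmp, d1, and d2 are hypothetical names.

```r
# Write a small made-up CSV file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("Length,Type",
             "214.8,Genuine",
             "215.1,Counterfeit"), tmp)

d1 <- read.table(tmp, header = TRUE, sep = ",")  # general-purpose reader
d2 <- read.csv(tmp)                              # shortcut for comma-delimited files

print(d1)
print(names(d1))   # variable names come from the first row
str(d2)

unlink(tmp)        # delete the temporary file
```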

Reading External Data Files

For comma delimited files: read.csv()

For tab delimited files: read.delim()

For Minitab, SPSS, SAS, STATA, etc. data:

foreign package

• Contains functions to read variety of file formats

• Functions operate like read.table()

• Contains functions for writing data into these file formats

Data Frame Hints

• Identify variable names in data frame: names()

> data1 <- read.table("SwissNotes.csv", sep = ",", header = TRUE)

> names(data1)

[1] "Length"           "LeftHeight"       "RightHeight"      "LowerInner.Frame"

[5] "UpperInner.Frame" "Diagonal"         "Type"

Assign name to data frame variables

> names(data1) <- c("Length", "LeftHeight", "RightHeight",
    "LowerInner.Frame", "UpperInner.Frame", "Diagonal", "Type")

Note: names are strings and MUST be contained in quotes

Data Frame Hints (cont.)

Create objects out of each data frame variable: attach()

In the Swiss Note data, to refer to Type as its own object

> attach(data1)

> Type

[1] Genuine Genuine Genuine ….

Data Frame Hints (cont.)

Remove attached objects from workspace: detach()

> detach(data1)

> Type

Error: object 'Type' not found

Note: Type is still part of original data frame, but is no longer a separate object.
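Variables can also be reached without attaching anything, using $ or with(). The sketch below uses a small made-up data frame named notes (hypothetical values, not the real Swiss bank note data).

```r
# Made-up stand-in for the bank note data
notes <- data.frame(
  Length = c(214.8, 215.1, 214.6),
  Type   = c("Genuine", "Genuine", "Counterfeit")
)

print(notes$Type)          # dollar sign: one variable from the data frame
print(notes[["Length"]])   # double brackets with the variable name

# with() evaluates an expression using the data frame's variables
m <- with(notes, mean(Length[Type == "Genuine"]))
print(m)
```

Because nothing is attached, there is no risk of an attached variable hiding another object with the same name.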

plot() function

plot() is the primary plotting function

Calling plot will open a new plotting window

Documentation: ?plot

For complete list of graphical parameters to manipulate: ?par

plot() function

Let’s visualize the SwissNotes.csv data.

After loading the data into R, attach the data frame using attach(data1).

Let’s try a scatter plot of LeftHeight by RightHeight.

>plot(LeftHeight, RightHeight)

plot() function

Change symbols: Option pch=

See ?par for details.

>plot(LeftHeight,RightHeight,pch=2)

plot() Function

Change symbol color: Option col=

Specify by number or by name: col=2 or col="red"

Hint: Type palette() to see colors associated with number

Type colors() to see all possible colors

> plot(LeftHeight, RightHeight, col="red")

What types of points can we get?

plot() Function

Change plot type: Option type =

"p" for points

"l" for lines

"b" for both

"c" for the lines part alone of "b"

"o" for both overplotted

"h" for histogram-like (or high-density) vertical lines

"s" for stair steps

"S" for other steps, see Details in ?plot

"n" for no plotting

plot() Function

Points with lines…works better on sorted list of points

> plot(LeftHeight, RightHeight, type="o")

Scatterplots for Multiple Groups

Use plot() with points() to plot different groups in same plot

Genuine notes vs. Counterfeit notes

> plot(LeftHeight[Type=="Genuine"], RightHeight[Type=="Genuine"], col="red")

> points(LeftHeight[Type=="Counterfeit"], RightHeight[Type=="Counterfeit"], col="blue")

Axis Labels and Plot Titles

The plot() command call has options to

• Specify x-axis label: xlab = "X Label"

• Specify y-axis label: ylab = "Y Label"

• Specify plot title: main = "Main Title"

• Specify subtitle: sub = "Subtitle"

Axis Labels and Plot Titles

> plot(LeftHeight[Type=="Genuine"], RightHeight[Type=="Genuine"], col="red",
    main="Plot of Bank Note Heights", sub="Measurements are in mm",
    xlab="Height of Left Side", ylab="Height of Right Side")

> points(LeftHeight[Type=="Counterfeit"], RightHeight[Type=="Counterfeit"], col="blue")

Legends

> legend("topleft", c("Genuine Notes", "Counterfeit Notes"), pch=c(21,21), col=c("red","blue"))

Adding Lines

To add straight lines to plot: abline()

abline() refers to standard equation for a line: y = bx + a

• Horizontal line: abline(h= )

• Vertical Line: abline(v= )

• Otherwise: abline(a= , b= ) or abline(coef=c(a,b))
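The plotting pieces from the last few slides (plot(), points(), legend(), abline()) can be put together in one sketch. The data here are simulated, not the real bank note measurements, and all object names are hypothetical. The intercept and slope come from lm(), a model-fitting function that this workshop does not otherwise cover, so treat that step as an assumption about one reasonable way to obtain a and b.

```r
set.seed(1)   # make the simulated data reproducible

# Simulated stand-ins for two height measurements and a grouping variable
left  <- rnorm(50, mean = 130, sd = 0.4)
right <- 0.8 * left + 26 + rnorm(50, sd = 0.2)
group <- rep(c("Genuine", "Counterfeit"), each = 25)

fit <- lm(right ~ left)   # least-squares line (assumption: not part of the slides)
ab  <- coef(fit)          # ab[1] is the intercept a, ab[2] is the slope b

# First group with plot(), second group added with points()
plot(left[group == "Genuine"], right[group == "Genuine"], col = "red",
     xlim = range(left), ylim = range(right),
     xlab = "Left (simulated)", ylab = "Right (simulated)",
     main = "Simulated heights with fitted line")
points(left[group == "Counterfeit"], right[group == "Counterfeit"], col = "blue")

abline(a = ab[1], b = ab[2])       # fitted line y = bx + a
abline(h = mean(right), lty = 2)   # dashed horizontal line at the mean of y
abline(v = mean(left), lty = 2)    # dashed vertical line at the mean of x

legend("topleft", c("Genuine", "Counterfeit"),
       pch = c(1, 1), col = c("red", "blue"))
```

Setting xlim and ylim from the full data keeps points from the second group from falling outside the plotting region.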

Adding Lines

> abline(coef=c(21.7104,0.8319))

Histograms

Histograms are another popular plotting option.

> hist(Length)

pairs() Function

Using the SwissNotes data (stored here in a data frame named swiss)

> pairs(swiss)

Boxplots

To create boxplots: boxplot()

Specify one or more variables to plot.

> boxplot(swiss$Length)

> boxplot(swiss[,2:3])

Boxplots

Use a formula specification for side-by-side boxplots.

Note: boxplot() has many options, e.g. notches. See

?boxplot.

> boxplot(Length~Type,notch=TRUE,data=swiss)

Mean or Average

• mean()

> mean(swiss[,"Length"])

> sapply(swiss[,1:6], mean)

• rowMeans()

> rowMeans(swiss[,1:6])

• colMeans()

> colMeans(swiss[,1:6])

Variability

• Variance: var()

> var(swiss[,"Length"])

> var(swiss[,1:6])

• Covariance: cov()

> cov(swiss[,1:6])

• Correlation: cor()

> cor(swiss[,1:6])

Five-number Summary

> summary(swiss[1:3])
     Length        LeftHeight     RightHeight
 Min.   :213.8   Min.   :129.0   Min.   :129.0
 1st Qu.:214.6   1st Qu.:129.9   1st Qu.:129.7
 Median :214.9   Median :130.2   Median :130.0
 Mean   :214.9   Mean   :130.1   Mean   :130.0
 3rd Qu.:215.1   3rd Qu.:130.4   3rd Qu.:130.2
 Max.   :216.3   Max.   :131.0   Max.   :131.1
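The summary functions from the last three slides can be tried on any data frame with numeric columns. The sketch below uses simulated data, not the real bank notes, with hypothetical names d and num. It keeps only the numeric columns before calling the functions, since they are not meant for text or factor columns. tapply() is an extra base R function, not shown on the slides, for computing a statistic within each group.

```r
set.seed(42)   # make the simulated data reproducible

d <- data.frame(
  Length   = rnorm(20, mean = 215, sd = 0.4),
  Diagonal = rnorm(20, mean = 140, sd = 1),
  Type     = rep(c("Genuine", "Counterfeit"), each = 10)
)

num <- d[, c("Length", "Diagonal")]   # numeric columns only

print(colMeans(num))   # mean of each column
print(var(num))        # variance-covariance matrix
print(cor(num))        # correlation matrix
print(summary(num))    # five-number summary plus the mean

print(tapply(d$Length, d$Type, mean))   # mean Length within each Type
```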

Creating Tables

table() produces crosstabs of factors or categorical variables

Using the cardiac data:

> table(cardiac[,7:9])

, , newMI = 0

      chestpain
gender   0   1
     F   6  10
     M   4   8

, , newMI = 1

      chestpain
gender   0   1
     F 100 222
     M  62 146

Univariate t-tests

t.test() produces 1- and 2-sample (paired or independent) t-tests.

• 1-sample t-test

> t.test(x, alternative="two.sided", mu=0, conf.level=0.95)

• 2 independent samples t-test

> t.test(x, y, alternative="two.sided", mu=0, paired=FALSE, var.equal=TRUE, conf.level=0.95)

• paired t-test

> t.test(x, y, alternative="two.sided", mu=0, paired=TRUE, conf.level=0.95)

2 Independent Samples t-test

x: diagonal measurements for Genuine bank notes

y: diagonal measurements for Counterfeit bank notes

> x = swiss[Type=="Genuine","Diagonal"]

> y = swiss[Type=="Counterfeit","Diagonal"]

> t.test(x, y, alternative="greater", mu=0, paired=FALSE, var.equal=TRUE)

2 Independent Samples t-test

> t.test(x, y, alternative="greater", mu=0, paired=FALSE, var.equal=TRUE)

        Two Sample t-test

data:  x and y
t = 28.9149, df = 198, p-value < 2.2e-16
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 1.948864      Inf
sample estimates:
mean of x mean of y
  141.517   139.450
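The printed output is only one view of the result: t.test() returns an object whose parts can be extracted by name. The sketch below runs the same kind of test on simulated data (not the real diagonal measurements) so it can be tried without the data file. The name result is hypothetical.

```r
set.seed(2013)   # make the simulated samples reproducible

# Simulated stand-ins for the two groups of diagonal measurements
x <- rnorm(100, mean = 141.5, sd = 0.5)
y <- rnorm(100, mean = 139.5, sd = 0.5)

result <- t.test(x, y, alternative = "greater", mu = 0,
                 paired = FALSE, var.equal = TRUE)

print(result$statistic)   # the t statistic
print(result$parameter)   # degrees of freedom
print(result$p.value)     # p-value
print(result$conf.int)    # one-sided interval: the upper limit is Inf
print(result$estimate)    # the two sample means
```

Storing the result makes it easy to reuse individual numbers, for example in a report or a later calculation.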

Generating Random Numbers

R contains functions for generating random numbers from many well-known distributions.

Random number from standard normal distribution:

> rnorm(1,mean=0,sd=1)

[1] 0.5308293

Vector of random numbers from uniform distribution:

> runif(3, min=0, max=1)

[1] 0.6578880 0.3261863 0.3093383

To reproduce results: set.seed()
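A minimal sketch of how set.seed() makes random draws repeatable. The seed value 123 is arbitrary, and the names first, second, and third are hypothetical.

```r
set.seed(123)        # fix the starting point of the random number generator
first <- rnorm(3)

set.seed(123)        # same seed again...
second <- rnorm(3)   # ...gives exactly the same draws

print(identical(first, second))

third <- rnorm(3)    # without resetting the seed, the stream simply continues
print(third)
```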

Function Basics

if() statement

> n = rnorm(1)

> if (n < 0){
    n = abs(n)
  }

if() statement with else

> n = rnorm(1)

> if (n < 0){
    n = abs(n)
  } else {
    n = 0
  }

Function Basics

for() loop

> temp = rep(0,10)

> for (i in 1:10){
    temp[i] = i+1
  }

> temp

[1] 2 3 4 5 6 7 8 9 10 11

Function Basics

while() loop

> n = 1

> while (n < 10){
    n = n+1
  }
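Many simple loops can be replaced by a vectorized expression, as the sketch below suggests by computing the for() loop result both ways. The names temp_vec and steps are hypothetical; steps is a counter added only to show how many times the while() loop runs.

```r
# Loop version from the slide: fill temp with i + 1
temp <- rep(0, 10)
for (i in 1:10) {
  temp[i] <- i + 1
}

# Vectorized version: same result without an explicit loop
temp_vec <- (1:10) + 1
print(all.equal(temp, temp_vec))

# while() loop that also counts how many times it ran
n <- 1
steps <- 0          # hypothetical counter, added for illustration
while (n < 10) {
  n <- n + 1
  steps <- steps + 1
}
print(c(n = n, steps = steps))
```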

Creating Functions

test.function = function(input arguments){
  commands to execute
}

Creating Functions

For example, let’s define a new function average to find the average of a set of numbers.

average = function(x){
  n = length(x)
  average = sum(x)/n
  print(average)
}
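A function's value is the result of its last expression, so a variant can hand the average back for later use instead of printing it. The sketch below is one possible variation, not the workshop's own code. The name average_value and the choice to return NA for an empty input are assumptions.

```r
# Hypothetical variant: returns the average rather than printing it
average_value <- function(x) {
  n <- length(x)
  if (n == 0) {
    return(NA_real_)   # assumption: an empty input has no average
  }
  sum(x) / n           # value of the last expression is returned
}

m <- average_value(c(2, 4, 6))   # store the result for later use
print(m)
print(isTRUE(all.equal(m, mean(c(2, 4, 6)))))   # agrees with built-in mean()
```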

Sourcing

After writing a function in a script file, bring it into working memory using source().

source("pathname/test.function.R")
