Common Command Statements

Operators
Equal: ==
Not equal: !=
Greater/Less than: > <
Greater/Less than or equal: >= <=
And: &
Or: |
Not: !

Basic Functions
Mean: mean(var)
Median: median(var)
Standard deviation: sd(var)
Minimum Value: min(var)
Maximum Value: max(var)

Variable Definition
Create a new vector/variable
Syntax: x <- vector
Ex: year <- c(2015, 2014, 2013, 2012)
Change a value, or add more values
Ex: year[5:7] <- c(2011, 2010, 2009)
Add value labels
Ex: gender <- c(1, 2)
> names(gender) <- c("female", "male")
Erase a variable
Syntax: var1 <- NULL
# Assigning NULL to a data frame column (dataFrame$var1 <- NULL) drops that column. To remove an object from the workspace, use rm(var1)

Creating factor variables
# These commands produce a nominal variable, eye color
Ex: eyeColor <- c("Brown", "Blue", "Green", "Gray")
> factor_eyeColor <- factor(eyeColor)
# These commands produce an ordinal variable, diploma
Ex: diploma <- c("None", "HS", "BA", "HS", "HS", "MS", "AAS", "PhD", "MS", "BA", "None")
> factor_diploma <- factor(diploma, ordered=TRUE, levels=c("None", "HS", "AAS", "BA", "MS", "PhD"))

Recoding variables
There are several ways to recode variables. Here is one that is useful for changing missing data coded 999 to R's marker for missing data, NA:
Syntax: newVar <- ifelse(var == oldValue, newValue, elseValue)
# When looking at earnings (from $0 to $100,000+), 999 (missing) will bias results if not classified as missing. In this case, all other values can remain the same.
Ex: earningsData$rEarnings <- ifelse(earningsData$earnings == 999, NA, earningsData$earnings)
Use the same strategy to dichotomize variables and add value labels simultaneously:
Ex: earningsData$rich <- ifelse(earningsData$earnings > 80000, "Rich", "n.Rich")
To simply tell R how to recode a variable and add labels, you can do this:
Ex: earningsData$incomeCat[earningsData$earnings < 20000] <- "Low"
> earningsData$incomeCat[earningsData$earnings >= 20000 & earningsData$earnings < 80000] <- "Middle"
> earningsData$incomeCat[earningsData$earnings >= 80000] <- "High"
# Since incomeCat is categorical, we should tell R to treat it as a factor for future analyses
> earningsData$incomeCat <- factor(earningsData$incomeCat)
Missing data from STATA uses ".", ".a", ".b" and so on, which R will sometimes recognize (most numerical and factor variables) and sometimes not (string variables). We can recode these "."s to NA like this:
Ex: stataData$string[which(stataData$string == ".")] <- NA

Getting Help
help(functionName): Brings up the help page
example(functionName): Gives an example of the usage

Running R Script
You can save a record of commands in a plain text file to be called upon and run in R. To run the script, type:
Syntax: source(file.choose()) # To browse for the file that you want to execute

Plotting
Use the naming command to add labels to graphs
Ex: earnings <- c(18, 35, 60)
> names(earnings) <- c("High School", "Bachelor's", "Ph.D.")
> barplot(earnings)
For graphs where you specify both x and y axes:
# educ must be a factor so that as.integer() returns the level codes 1, 2, 3 (on a plain character vector it returns NA)
Ex: educ <- factor(c("HS", "BA", "PhD"), levels=c("HS", "BA", "PhD"))
> loanDebt <- c(0, 30, 80)
> as.integer(educ)
> plot(loanDebt, earnings, pch=as.integer(educ))
> legend("topright", c("HS", "BA", "PhD"), pch=1:3)

Visualizing Matrices
Contour Plot:
Ex: contour(volcano)
Three-Dimensional Plot:
Ex: persp(volcano, expand=0.2)

Handling Data Frames
To tie our vectors into a more permanent structure (like an Excel file or STATA dataset), we create data frames. Keep in mind that the vectors included must all be the same length (e.g.
you cannot tie together one vector with 200 observations and one with 300 observations).
Ex: earningsData <- data.frame(earnings, loanDebt, educ)
To recall what is in one of these columns, simply call on the variable in the dataset:
Syntax: dataFrame$variable
Ex: earningsData$loanDebt
Sometimes we may want to look at only part of a dataset to get an idea of how things are coded. The command "head()" shows the first several observations in a data frame, while "tail()" shows the last observations.
Ex: head(earningsData)
Ex: tail(earningsData)
To tell R to report the total number of observations, the number of variables, a list of the variable names, and the data type of each variable (e.g. number, string), type "str()".
Ex: str(earningsData)
Select elements of a data frame using []. Use this format to see what is in the first row and 3rd column of the frame:
Ex: earningsData[1, 3]
To look at a full row (or column), omit the column (or row) number:
Ex: earningsData[1, ]
Ex: earningsData[, 3]
Alternatively, you can use the variable name to select a column (here, we want the first 5 observations for "educ"):
Ex: earningsData[1:5, "educ"]
Sort data using the function "order()". To sort observations by income (highest first, lowest last):
# This first command returns the row positions that put the observations in descending order of earnings
Ex: sortIncome <- order(earningsData$earnings, decreasing=TRUE)
# The next command can be used to check the ordering, and is not necessary to do every time
Ex: earningsData$earnings[sortIncome]
# This final command creates a new data frame whose rows are arranged according to "sortIncome"
Ex: decreasing_earningsData <- earningsData[sortIncome, ]
If we get tired of incessantly typing "dataFrame$" to refer to variables, we can use the attach() and detach() functions to add a single data frame to R's search path:
Ex: attach(earningsData)
# Watch out, as sometimes R will report that an object is "masked" by ".GlobalEnv".
# If so, the object appears in both the workspace and the data frame. To get rid of the copy of the object in the workspace, type rm(var)
When finished using an attached data frame, detach it using:
Ex: detach(earningsData)
To completely clear the workspace, type:
Syntax: rm(list=ls())

Reading, Merging, and Subsetting data
Read a comma-delimited file (CSV) into a data frame:
Syntax: read.csv("fileName1.csv")
Read a tab-delimited file into a data frame, where the first line of the file contains variable names:
Syntax: read.table("fileName2.txt", sep="\t", header=TRUE)
Read in a data file whose file path you do not know:
Syntax: Data <- read.table(file.choose(), header=TRUE)
Merge two data frames that share a single, common variable name:
Syntax: mergedData <- merge(x=dataFrame1, y=dataFrame2)
Create a subset of all data that meet a certain condition:
Syntax: subsetData <- subset(dataFrame, subset=(condition))
Ex: Rich_folks <- subset(earningsData, subset=(earnings > 80000))

Analyzing Data

Data Description
Frequencies are obtained using "summary()". Make sure factor variables are coded as such.
Ex: summary(factor_diploma)

Correlation and Linear Regression
Look to see if there appears to be a correlation between the independent and dependent variables:
Syntax: plot(indVar, depVar)
Compute the correlation and significance:
Syntax: cor.test(indVar, depVar)
Fit a linear model (Linear Regression)
Syntax: model1 <- lm(depVar ~ indVar)
Add linear prediction to scatterplot:
Syntax: abline(model1)
Removing missing data to perform an analysis (functions such as mean() return NA if any data is missing):
Option Syntax: , na.rm=TRUE
Ex: mean(earnings, na.rm=TRUE)

Saving the workspace, data frames, and exporting data
Save everything that is in your workspace using "save.image()", load it with "load()"
Ex: save.image('myWorkspace.RData')
Ex: load('myWorkspace.RData')
Save just some objects (e.g.
two data frames) and specify a file name for these using "save()"; load them back in with "load()":
Ex: save(earningsData, USArrests, file='earnArrests.rda')
Ex: load('earnArrests.rda')
Export a data frame in CSV format:
# Do not assign the result back to the data frame: write.csv() returns NULL, which would overwrite it
Ex: write.csv(earningsData, file="/Desktop/Earnings.csv")

A Couple Useful Packages
To install user-written packages, simply type:
> install.packages("packageName")
> library(packageName)
ggplot2: A powerful graphics package for R with great customizability and utility for many univariate and multivariate graph types.
foreign: Allows the user to load data frames from other statistical packages (e.g. SPSS, SAS, STATA) into R.
Ex: GSSjobs2006 <- read.dta("/Desktop/GSSjob2006Data.dta")
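
Putting it together
Several of the commands on this sheet can be combined into one short session. The sketch below is only an illustration under stated assumptions: the data values are invented, 999 is assumed to be the missing-data code for earnings, and the names meanEarnings, sortedData, and richFolks are hypothetical. It recodes missing values, computes a mean with na.rm=TRUE, sorts with order(), and takes a subset.

```r
# Hypothetical toy data (values invented for illustration; 999 marks missing earnings)
earningsData <- data.frame(
  earnings = c(18000, 95000, 999, 42000),
  loanDebt = c(0, 30000, 80000, 12000),
  educ     = factor(c("HS", "PhD", "BA", "BA"), levels = c("HS", "BA", "PhD"))
)

# Recode the 999 missing-data code to NA; all other values stay the same
earningsData$rEarnings <- ifelse(earningsData$earnings == 999, NA, earningsData$earnings)

# Mean earnings, skipping the missing value
meanEarnings <- mean(earningsData$rEarnings, na.rm = TRUE)

# Row positions from highest to lowest recoded earnings (order() places NA last by default)
sortIncome <- order(earningsData$rEarnings, decreasing = TRUE)
sortedData <- earningsData[sortIncome, ]

# Keep only observations earning more than 80,000
# (subset() drops rows where the condition evaluates to NA)
richFolks <- subset(earningsData, rEarnings > 80000)

print(meanEarnings)
print(sortedData)
print(richFolks)
```

Recoding before computing the mean matters here: if the 999 code were left in place, it would be averaged in as if it were a real earnings value.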