Common Command Statements
Operators
Equal: ==
Not equal: !=
Greater/Less than: > <
Greater/Less than or equal: >= <=
And: &
Or: |
Not: !
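These operators work element by element on vectors. A minimal sketch, using a made-up age vector that is not part of the other examples in this sheet:

```r
# Hypothetical data, for illustration only
age <- c(15, 22, 37, 64, 70)

age == 22              # TRUE only for the second element
age != 22              # the opposite of the line above
age >= 18 & age < 65   # TRUE where both conditions hold
age < 18 | age >= 65   # TRUE where either condition holds
!(age > 30)            # negates the comparison

# A logical vector can be used to select elements
workingAge <- age[age >= 18 & age < 65]
```

Here workingAge keeps only the values for which the combined condition is TRUE.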
Basic Functions
Means: mean(var)
Median: median(var)
Standard deviation: sd(var)
Minimum Value: min(var)
Maximum Value: max(var)
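As a quick illustration of the functions above, assuming a small invented vector of weekly hours worked:

```r
# Hypothetical data, for illustration only
hours <- c(40, 35, 50, 20, 45)

mean(hours)     # arithmetic mean: 38
median(hours)   # middle value once sorted: 40
sd(hours)       # sample standard deviation, about 11.51
min(hours)      # smallest value: 20
max(hours)      # largest value: 50
```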
Variable Definition
Create a new vector/variable
Syntax: x <- vector
Ex: year <- c(2015, 2014, 2013, 2012)
Change a value, or add more values
Ex: year[5:7] <- c(2011, 2010, 2009)
Add value labels
Ex: gender <- c(1, 2)
> names(gender) <- c("female", "male")
Erase a variable
Syntax: rm(var1)
Ex: rm(year)
# To drop a column from a data frame instead, assign NULL to it:
# dataFrame$var1 <- NULL
Creating factor variables
# These commands produce a nominal variable, eye color
Ex: eyeColor <- c("Brown", "Blue", "Green", "Gray")
> factor_eyeColor <- factor(eyeColor)
# These commands produce an ordinal variable, diploma
Ex: diploma <- c("None", "HS", "BA", "HS", "HS", "MS",
"AAS", "PhD", "MS", "BA", "None")
> factor_diploma <- factor(diploma, ordered=TRUE,
levels=c("None", "HS", "AAS", "BA", "MS", "PhD"))
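One reason to declare the level order is that comparisons then respect it. A short sketch, assuming a shortened version of the diploma vector:

```r
# Shortened, hypothetical version of the diploma data
diploma <- c("None", "HS", "BA", "HS", "MS", "PhD")
factor_diploma <- factor(diploma, ordered = TRUE,
                         levels = c("None", "HS", "AAS", "BA", "MS", "PhD"))

levels(factor_diploma)    # the declared order, lowest to highest
table(factor_diploma)     # counts per level, including 0 for "AAS"

# For an ordered factor, >= compares positions in the level order
atLeastBA <- sum(factor_diploma >= "BA")
```

With an unordered factor, the same comparison would return NA with a warning.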
Recoding variables
There are several ways to recode variables. One useful
approach changes missing data coded 999 to R's
specification for missing data, NA:
Syntax: newVar <- ifelse(var == oldValue, newValue, otherValue)
# When looking at earnings (from $0 to $100,000+), 999
# (missing) will bias results if not classified as missing.
# In this case, all other values can remain the same.
Ex: earningsData$rEarnings <- ifelse(earningsData$earnings
== 999, NA, earningsData$earnings)
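To see why this matters, here is a small sketch with invented earnings values, two of which carry the 999 missing code:

```r
# Hypothetical data: 999 marks a missing response
earningsData <- data.frame(earnings = c(25000, 999, 90000, 999, 40000))

earningsData$rEarnings <- ifelse(earningsData$earnings == 999,
                                 NA, earningsData$earnings)

sum(is.na(earningsData$rEarnings))           # number of values recoded to NA
mean(earningsData$earnings)                  # pulled down by the 999 codes
mean(earningsData$rEarnings, na.rm = TRUE)   # mean of the real values only
```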
Use the same strategy to dichotomize variables and add
value labels simultaneously:
Ex: earningsData$rich <- ifelse(earningsData$earnings
> 80000, "Rich", "n.Rich")
To recode a variable into labeled categories step by step,
you can do this:
Ex: earningsData$incomeCat[earningsData$earnings<20000] <- "Low"
> earningsData$incomeCat[earningsData$earnings>=20000 &
earningsData$earnings<80000] <- "Middle"
> earningsData$incomeCat[earningsData$earnings>=80000] <- "High"
# Since we created a categorical variable, we should tell R
# that it is a factor for future analyses
> earningsData$incomeCat <- factor(earningsData$incomeCat)
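As an alternative to assigning each category by hand, base R's cut() can bin a numeric variable and return a factor in one step. This is a different technique from the one above, sketched here with invented earnings values:

```r
# Hypothetical data, for illustration only
earningsData <- data.frame(earnings = c(5000, 20000, 79999, 80000))

# right = FALSE makes each interval closed on the left, so the bins are
# [-Inf, 20000), [20000, 80000) and [80000, Inf)
earningsData$incomeCat <- cut(earningsData$earnings,
                              breaks = c(-Inf, 20000, 80000, Inf),
                              labels = c("Low", "Middle", "High"),
                              right = FALSE)

table(earningsData$incomeCat)
```

The cut points match the step-by-step recode: below 20000 is Low, 20000 up to but not including 80000 is Middle, and 80000 or more is High.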
Missing data from STATA uses ".", ".a", ".b" and so on,
which R will sometimes recognize (most numerical and factor
variables) and sometimes not (string variables). We can
recode these "."s to NA like this:
stataData$string[which(stataData$string == ".")] <- NA
Getting Help
help(functionName): Brings up help page
example(functionName): Gives an example of the usage
Running R Script
You can save a record of commands in a plain text file to
be called upon and run in R. To run the script, type:
Syntax: source(file.choose())
# Opens a dialog to browse for the file you want to execute
Plotting
Use the names() command to add labels to graphs
Ex: earnings <- c(18, 35, 60)
names(earnings) <- c("High School", "Bachelor's", "PhD")
barplot(earnings)
For graphs where you specify both x and y axes:
Ex: educ <- factor(c("HS", "BA", "PhD"),
levels=c("HS", "BA", "PhD"))
loanDebt <- c(0, 30, 80)
# educ must be a factor so that as.integer() returns the
# codes 1, 2, 3; on a character vector it would return NA
plot(loanDebt, earnings, pch=as.integer(educ))
legend("topright", c("HS", "BA", "PhD"), pch=1:3)
Visualizing Matrices
Contour Plot:
Ex: contour(volcano)
Three-Dimensional Plot:
Ex: persp(volcano, expand=0.2)
Handling Data Frames
To tie our vectors into a more permanent structure (like an
Excel file or STATA dataset), we create data frames. Keep
in mind that the vectors included must all be the same
length (e.g. you cannot tie together one vector with 200
observations and one with 300 observations).
Ex: earningsData <- data.frame(earnings, loanDebt, educ)
To recall what is in one of these columns, simply call on
the variable in the dataset:
Syntax: DataFrame$variable
Ex: earningsData$loanDebt
Sometimes we may want to look at only part of a dataset to
get an idea of how things are coded. The command
“head()” shows the first several observations in a data
frame, while “tail()” shows the last observations.
Ex: head(earningsData)
Ex: tail(earningsData)
To tell R to report the total number of observations,
variables, a list of the variable names, and the data type
of each variable (e.g. number, string), type “str()”.
Ex: str(earningsData)
Select elements of a data frame using []. Use this format
to see what is in the first row and 4th column of the
frame:
Ex: earningsData[1, 4]
To look at a full row (or column), omit the column (or row)
number:
Ex: earningsData[1, ]
Ex: earningsData[, 4]
Alternatively, you can use the variable name to select a
column (here, we want the first 5 observations for “educ”):
Ex: earningsData[1:5, "educ"]
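The indexing forms above can be tried on a small data frame. A minimal sketch, assuming a three-row, three-column version of earningsData built from the plotting vectors:

```r
# Small, hypothetical version of the data frame used in this sheet
earningsData <- data.frame(earnings = c(18, 35, 60),
                           loanDebt = c(0, 30, 80),
                           educ     = c("HS", "BA", "PhD"),
                           stringsAsFactors = FALSE)

earningsData[1, 3]             # first row, third column
earningsData[2, ]              # the whole second row
earningsData[, "loanDebt"]     # a whole column, selected by name
earningsData[1:2, "educ"]      # first two observations of educ
dim(earningsData)              # number of rows and columns
```

Asking for a row or column number beyond dim() returns NA or an error, so it is worth checking the dimensions first.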
Sort data using the function “order()”. To sort
observations by income (highest first, lowest last):
# This first command returns the row positions of the
# observations in descending order of earnings
Ex: sortIncome <- order(earningsData$earnings,
decreasing=TRUE)
# The next command can be used to check the ordering, and
# is not necessary to do every time
Ex: earningsData$earnings[sortIncome]
# This final command creates a new data frame that is
# ordered according to the “sortIncome” row positions
Ex: decreasing_earningsData <- earningsData[sortIncome, ]
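Note that order() returns row positions rather than sorted values. A brief sketch with invented, unsorted data makes the difference visible:

```r
# Hypothetical, deliberately unsorted data
earningsData <- data.frame(earnings = c(18, 60, 35),
                           educ     = c("HS", "PhD", "BA"),
                           stringsAsFactors = FALSE)

# Positions of the rows from highest to lowest earnings
sortIncome <- order(earningsData$earnings, decreasing = TRUE)
sortIncome   # row 2 first, then row 3, then row 1

# Use those positions to reorder every column at once
decreasing_earningsData <- earningsData[sortIncome, ]
decreasing_earningsData
```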
If we get tired of incessantly typing “dataFrame$” to refer
to variables, we can use the attach() and detach()
functions to assign a single data frame to R’s search path:
Ex: attach(earningsData)
# Watch out as sometimes R will report that an object is
# “masked” by the “.GlobalEnv”. If so, this means the object
# appears in both the workspace and data frame. To get rid of
# the copy of the object in the workspace, type rm(var)
When finished using an attached data frame, detach it
using:
Ex: detach(earningsData)
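If masking becomes a nuisance, base R's with() is an alternative to attach() and detach(): it evaluates a single expression inside the data frame without changing the search path. A minimal sketch, assuming a small invented earningsData:

```r
# Hypothetical data, for illustration only
earningsData <- data.frame(earnings = c(18, 35, 60),
                           loanDebt = c(0, 30, 80))

# Column names can be used directly inside with()
avgEarnings <- with(earningsData, mean(earnings))
debtRatio   <- with(earningsData, loanDebt / earnings)
```

Nothing needs to be detached afterwards, because with() only affects the expression it wraps.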
To completely clear the workspace, type:
Syntax: rm(list=ls())
Reading, Merging, and Subsetting data
Read a comma-delimited file (CSV) into a data frame:
Syntax: read.csv("fileName1.csv")
Read a tab-delimited file into a data frame, where the
first line of the file contains variable names:
Syntax: read.table("fileName2.txt", sep="\t", header=TRUE)
Read in a data file whose file path you do not know:
Syntax: Data <- read.table(file.choose(), header=TRUE)
Merge two data frames that share a single, common variable
name:
Syntax: mergedData <- merge(x=dataFrame1, y=dataFrame2)
Create a subset of all data that meet a certain condition:
Syntax: subsetData <- subset(dataFrame, subset=(argument))
Ex: Rich_folks <- subset(earningsData, subset=
(earningsData$earnings > 80000))
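A small sketch of merging and subsetting together, assuming two invented data frames that share an id column:

```r
# Hypothetical data frames sharing the variable "id"
people    <- data.frame(id = c(1, 2, 3),
                        earnings = c(18000, 95000, 42000))
schooling <- data.frame(id = c(1, 2, 4),
                        educ = c("HS", "PhD", "BA"),
                        stringsAsFactors = FALSE)

# By default merge() keeps only rows whose id appears in both frames
mergedData <- merge(x = people, y = schooling)

# Inside subset(), columns can be named without the "dataFrame$" prefix
Rich_folks <- subset(mergedData, subset = earnings > 80000)
```

To keep unmatched rows as well, merge() accepts all=TRUE, which fills the missing columns with NA.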
Analyzing Data
Data Description
Frequencies are obtained using “summary()”. Make sure
factor variables are coded as such.
Ex: summary(factor_diploma)
Correlation and Linear Regression
Look to see if there appears to be a correlation between
the independent and dependent variables:
Syntax: plot(indVar, depVar)
Compute the correlation and significance:
Syntax: cor.test(indVar, depVar)
Fit a linear model (Linear Regression)
Syntax: model1 <- lm(depVar ~ indVar)
Add linear prediction to scatterplot:
Syntax: abline(model1)
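The four commands above are usually run in sequence. A minimal sketch, with made-up values for loanDebt (independent variable) and earnings (dependent variable):

```r
# Hypothetical data, for illustration only
loanDebt <- c(0, 10, 30, 50, 80)
earnings <- c(18, 25, 35, 48, 60)

plot(loanDebt, earnings)            # scatterplot: x first, then y
cor.test(loanDebt, earnings)        # correlation estimate and p-value

model1 <- lm(earnings ~ loanDebt)   # dependent ~ independent
abline(model1)                      # add the fitted line to the open plot
coef(model1)                        # intercept and slope
summary(model1)                     # full regression table
```

Note that plot() takes the independent variable first, while the lm() formula puts the dependent variable on the left of the tilde.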
Removing missing data to perform an analysis (functions such
as mean() return NA if any value is missing):
Option Syntax: , na.rm=TRUE
Ex: mean(earnings, na.rm=TRUE)
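A short sketch of the difference the option makes, using an invented vector with one missing value:

```r
# Hypothetical data with one missing value
earnings <- c(18, NA, 60)

mean(earnings)                   # NA, because one value is missing
mean(earnings, na.rm = TRUE)     # 39, computed from the two known values
sd(earnings, na.rm = TRUE)       # the same option works for sd(), min(), max()
sum(is.na(earnings))             # how many values are missing
```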
Saving the workspace, data frames, and exporting data
Save everything that is in your workspace using
“save.image()”, load it with “load()”
Ex: save.image('myWorkspace.RData')
Ex: load('myWorkspace.RData')
Save just some objects (e.g. two data frames) and specify a
file name for these using “save()”, then load them with “load()”:
Ex: save(earningsData, USArrests, file='earnArrests.rda')
Ex: load('earnArrests.rda')
Export a data frame in CSV format:
Ex: write.csv(earningsData,
file="/Desktop/Earnings.csv")
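To check an export, the file can be read straight back in. A minimal sketch, assuming an invented data frame and a temporary file from tempfile() standing in for a real path:

```r
# Hypothetical data, for illustration only
earningsData <- data.frame(earnings = c(18, 35, 60),
                           educ     = c("HS", "BA", "PhD"))

# tempfile() is used here as a stand-in for a real file path
csvFile <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(earningsData, file = csvFile, row.names = FALSE)

reloaded <- read.csv(csvFile)
str(reloaded)
```

The result of write.csv() is not assigned to anything: the function returns NULL, so assigning it back to earningsData would overwrite the data frame.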
A Couple Useful Packages
To install a user-written package and then load it, type:
> install.packages("packageName")
> library(packageName)
ggplot2: A powerful graphics package for R with great
customizability and utility for many univariate and
multivariate graph types.
foreign: Allows the user to load data files from other
statistical packages (e.g. SPSS, SAS, STATA) into R.
Ex: library(foreign)
> GSSjobs2006 <- read.dta("/Desktop/GSSjob2006Data.dta")