Controlling and Managing Your R Workspace

This guide provides an overview of the RStudio Workspace, including its basic logic and structure. The goal is to develop an understanding of how to best control and manage your Workspace so that you can avoid or quickly diagnose several common errors.

Basic Overview

The R Environment consists of all the files necessary for running the R program, as well as the data sets and other objects that you have created or loaded into your Workspace. These files can be broken down into three basic types:

1. The base packages that run all the standard analyses that we use in this course. These files are installed automatically when you first download and install the R program.
2. Additional packages that you can install on your own and that allow for more advanced statistical analysis or additional commands.
3. The datasets, functions and other objects you create or import.

RStudio Console

The four default windows in RStudio are:

1. R scripts (top left) – this is where you'll type the code you want to save. You can have multiple scripts open at the same time.
2. Environment/History windows (top right) – the Environment tab gives you the list of datasets and variables you've assigned, and the History tab gives you the list of commands you've recently executed.
3. Console (bottom left) – where commands are run and their output is printed.
4. Bottom right window – five tabs: Files (a browser for the files and folders on your computer, including your working directory), Plots (where your plots are displayed), Packages (shows the packages you've installed; the checked ones are loaded), Help (use it to clarify syntax and learn about the options and variations of a command), and Viewer (for viewing local web content).

Getting Started

Your working directory tells RStudio where the files you'll be accessing are stored.
To set your working directory, either open the 'Session' menu at the top of the screen and choose 'Set Working Directory', which opens a file browser where you can pick the folder, or use the code:

Windows: setwd("C:/MyDocuments/….")
Mac: setwd("/Users/Rachel/….")

The working directory is also where everything you save will go by default (e.g., plots or PDFs), unless you specify a different path in the filename.

Importing data

After you've set your working directory, you're ready to import your data. Multiple file types can be imported, but comma-separated value (.csv) files are the most common. When importing data, it's best to give it a (short) name. To assign a name to a dataset, a variable or any other object, use the <- symbol:

Name <- read.csv("File_name.csv")

The read.csv command tells R it's reading a .csv file. If you're importing another type of file you'll use a different read command (e.g., read.dcf or read.table).

Example: load InsectSprays, a data frame that comes with R, using:

data(InsectSprays)

Note – you only use data() with datasets that come with R. Everything else is read in using read.csv() or read.table().
* If using read.table(), be sure to specify your separator using sep = " " (for spaces), sep = "," (for commas), etc.
* Don't give objects the same name as a function (e.g., avoid data <- read.csv()).

If you decide you want to rename your dataset or make a copy, be sure the new name is on the left side of the arrow.

To have a look at the data, use View(InsectSprays) or just type the dataset name, InsectSprays. You can also view portions of the data using head(InsectSprays) or tail(InsectSprays). head() prints the first 6 rows by default, and tail() prints the last 6. You can also ask for a specific number of rows, e.g. head(InsectSprays, 10).

Syntax

Syntax is everything in R.
You may have your files loaded and your code written and ready to go, but a comma out of place can set you back an hour trying to get it to work. R is also case-sensitive, so 'Sample.csv' is not the same as 'sample.csv'. RStudio helps a lot with this. When you type a command it often automatically adds the closing bracket for your open bracket, and if a bracket is still left open, it will indent your next line. However, it only goes so far. If you're having trouble getting code to work, take the time to look over your syntax and make sure everything is right.

Remember this when you're creating column names in your csv: all spaces in column names will be converted to periods when imported. It's often best to look at the data set after you've imported it. You can do so by clicking on its name in the 'Environment' window (top right) or by using the code:

View(InsectSprays)

You write and save your code in your script window (top left). You can write notes to yourself using the # sign:

# note to self: R is awesome
# first 12 rows of InsectSprays
spray_rows <- InsectSprays[1:12, ]  # everything written after the # sign is a note and won't be run

You can also create section headings ("indexes") by using four # signs on each side, e.g.:

#### anovas section ####

There is a small drop-down menu in the bottom left corner of the script window that lets you jump easily from one section to another.

To run your code from the script window you can either click the Run button at the top right of the script window, or press Ctrl+Enter on Windows or Command+Enter on a Mac.

Editing Data

You can make minor edits using edit(InsectSprays), but try to avoid editing your original data frame. It's better to make changes in code, so that you have a record of everything you changed.

R most efficiently refers to data by position. By using square brackets you can designate a specific row and/or column (the default is all, so if you leave one blank, R uses all the rows or columns).
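As a minimal sketch of the "blank means all" rule, the lines below work on a copy of the built-in InsectSprays data, which has 72 rows and two columns (count and spray). The object names sprays, one_row, one_col and everything are only illustrative.

```r
# Work on a copy so the built-in data frame stays untouched.
sprays <- InsectSprays

# dim() reports the number of rows and columns, which tells you
# which positions are valid inside the square brackets.
dim(sprays)

one_row    <- sprays[5, ]   # column left blank: row 5, all columns
one_col    <- sprays[, 1]   # row left blank: all rows of column 1
everything <- sprays[, ]    # both left blank: the whole data frame

one_row
head(one_col)
```

Checking dim() first is a cheap way to avoid asking for a row or column that does not exist.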
In general:

InsectSprays[row, column]

For example, InsectSprays[1, 2] refers to the cell in the first row, second column of InsectSprays. InsectSprays[ , 2] refers to the entire second column (leaving the row blank includes all rows). InsectSprays has 72 rows and only two columns (count and spray), so the row and column numbers you use must fall within that range.
InsectSprays[3, 1] <- 8 will change the value in the third row of the first column to 8.
InsectSprays[ , 1] <- InsectSprays[ , 1] * 10 will multiply every cell in the first column by 10 and replace it with the result.

Assigning Variables

To assign variables, you use the same <- as when you gave your data frame a name:

variable1 <- sqrt(InsectSprays$column)

This creates a vector of the square root of whichever data column you select. You can now use this transformed data in a t-test, graph, etc. However, it is often better practice to create the new variable within your data frame, rather than in your workspace, which can get cluttered:

InsectSprays$new_column <- sqrt(InsectSprays$column)

Subsetting Data

To remove every object from your workspace: rm(list = ls())
To remove a single object you've created: rm(object_name)

If you want to subset a portion of your data, there are a few ways:

subset1 <- subset(InsectSprays, InsectSprays$column == "value", drop = TRUE)
    Keeps all rows that have the specified value in the column
subset2 <- InsectSprays[1:12, ]
    Rows 1 through 12 of your data frame (includes all columns)
subset3 <- InsectSprays[ , 1:2]
    Columns 1 through 2 of your data frame (includes all rows)

Example:
sprayA <- subset(InsectSprays, InsectSprays$spray == "A", drop = TRUE)

Common Errors

+ – if you're typing a line of code in the R console or trying to run something from your script, you may get a + sign instead of your desired output. This generally means you haven't closed a bracket or quotation mark, so R is waiting for the rest of the command. Press Esc to cancel, then go back and correct your code.

Object not found – this means you are telling R to use something (e.g. a column) that it can't find. For example:
What is the mean count in the InsectSprays data? Try:
1. mean(count)
   This fails with an "object 'count' not found" error, because R does not know which data frame count belongs to.
2. mean(InsectSprays$count)
   With this syntax we tell R where to find the count data.

Could not find function – this most often happens when you don't have the right package loaded. For example, to make a quick plot of the spray data (not using base graphics), try:
1. qplot(InsectSprays$spray, InsectSprays$count)
   This fails if ggplot2 has not been loaded.
2. install.packages("ggplot2")
   library(ggplot2)
   qplot(InsectSprays$spray, InsectSprays$count)

Basic Functions

Some basic functions you may need:

mean()                   mean
sd()                     standard deviation
log()                    log (natural log by default)
sqrt()                   square root
sd(x)/sqrt(length(x))    standard error

Try: mean(dataset$variable, na.rm = TRUE)
For our example: mean(InsectSprays$count, na.rm = TRUE)
na.rm = TRUE tells R to leave out missing values (NA). Without it, the mean will be NA if any values are missing.

Graphing:
plot(dataset$predictor, dataset$response)
or:
with(InsectSprays, plot(spray, count))
This second option tells R to look in InsectSprays for the variables used in the command.

Normality tests:
1. shapiro.test(InsectSprays$column)
2. qqnorm(InsectSprays$column)

T-test:
t.test(response ~ predictor, data = dataset)
The predictor must have exactly two groups.

Anova:
1. Create a linear model:
   lm1 <- lm(response ~ predictor, data = InsectSprays)
   summary(lm1)
2. Conduct anova:
   summary.aov(lm1)

Also good to know:
class(InsectSprays$column) tells you if it's numeric, character, factor, etc.
str(InsectSprays) tells you a lot about the data, including whether you have a data frame or a plain list. If it's a plain list, most of the commands in this guide won't work, so it's important to check before running any tests.
summary(InsectSprays$column) gives you the mean, quartiles, etc. for the column.
levels(InsectSprays$column) tells you all the levels of a factor.
names(InsectSprays) tells you all the column names.

Useful note: you can use Tab to autocomplete your command in both your script and the console.
If you want to run a t.test(), for example, type t and then hit Tab. RStudio will give you a list of possible commands that start with t (there will be a lot). If you type t. and then Tab you'll get a shorter list. This is a nice way to find the correct command without having to search the Help tab each time.

Loading Packages

You may end up needing to do analyses that are not included in the default R library. In order to install packages, you'll need to know the name of the package you're interested in (Google helps). RStudio makes it rather easy to load installed packages through the Packages tab in the bottom right window: just click on the check box next to the package you want and it will load.

It's important to note that using a package is a two-step process. First you install the package via:

install.packages("package_name")

You only need to install a package once, but each time you want to use the package, you have to load it, either by checking the box in your Packages tab or by using:

library(package_name)
or
require(package_name)

Projects

Creating projects is a good way to keep your analyses grouped together appropriately (by project, chapter, etc.). To create a project, go to the File menu (top left) and select 'New Project'. Projects allow you to switch between workspaces without having to re-load everything or clutter your environment with too many objects.

Accessing help within and outside of R

If you need help with a command, you can search the Help tab in the bottom right window, run help(command) in the console, put a question mark before it (?command), or run an example of the command via example(command).

The R Book – Second Edition, Michael Crawley (in PDF form you can easily Ctrl+F to search for what you're looking for)
Google! – R-project, R bloggers, Stack Overflow, CRAN-R project, Quick-R, rseek.org, Code School, other statistics departmental websites, etc.
Help tab within R – takes a while to learn how to read, but can be helpful, especially with syntax
CRAN Task Views – http://cran.r-project.org/web/views/

Practice:

1. Import mtcars
2. Look at data; change Mazda RX4 to 8 cylinder
3. Find min, max, and mean of mpg, hp, and qsec
4. Create subset of 8 cylinder cars; find min, max, and mean of mpg, hp, and qsec again
5. Check for normality of qsec and run a t test on the effect of cylinder size on qsec

Extra Practice – R Script Liv sent you
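
Before starting the practice, it may help to see the basic functions from this guide run together once. The following is only a rough sketch on the built-in InsectSprays data, not a solution to the exercises. The object names (counts, count_mean, count_sd, count_se, anova_table) are made up for illustration, and anova() is used here as one common way to get an ANOVA table from a linear model.

```r
# Pull the count column out of the built-in data frame.
counts <- InsectSprays$count

# Summary statistics; na.rm = TRUE skips any missing values.
count_mean <- mean(counts, na.rm = TRUE)
count_sd   <- sd(counts, na.rm = TRUE)

# Standard error of the mean: sd divided by the square root of
# the number of non-missing values.
count_se <- count_sd / sqrt(sum(!is.na(counts)))

# Rows for spray A only. Inside subset() a column can be
# referred to by name.
sprayA <- subset(InsectSprays, spray == "A")

# One-way ANOVA: fit a linear model of count by spray,
# then build the ANOVA table from it.
lm1 <- lm(count ~ spray, data = InsectSprays)
anova_table <- anova(lm1)

print(c(mean = count_mean, sd = count_sd, se = count_se))
print(anova_table)
```

The same pattern should carry over to other data frames by swapping in the data frame and column names.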