Lab 1 – Introduction to R

Date: August 23, 2011
Assignment and Report Due Date: August 30, 2011

Goal: The purpose of this lab is to get R running on your machines and to get you familiar with the basics of how to run R. You should also become familiar with how to submit assignments for this lab using Blackboard. There will be multiple sets of instructions for this lab based on the operating system you choose to use for this course.

1. Install R on your personal computers.

Follow the instructions in the handout “Installing R.”

2. Launch R.

WINDOWS AND MAC

Double-click the R icon to launch R.

Campus Computers

Log in to your math account:

On the blue welcome screen, make sure the Java desktop is selected. To do this, click and hold the options button, and select session -> Java Desktop System, Release #. Do not release the mouse button until you’ve highlighted the Java... option. Java Desktop System, Release # should now appear beneath the user name field. Note: The default session option is User’s Last Desktop, so you should only have to do this step once.

Enter your user name in the appropriate field, and press OK. The welcome box should now say Welcome username.

Enter your password. (The field will appear to remain empty.) Press OK.

Open a new terminal window by selecting Launch -> Applications -> Utilities -> Terminal.

Now launch R by simply typing:

> R

** You must press enter to execute a command in the terminal.

Note: Launching R on the campus computers does not open a new window, as it runs directly from the terminal. Your terminal window now becomes your R console, where you will be able to type R commands.

At this point everyone should have R running on their machines.

3. Getting familiar with basic R commands

R has all the functions of a basic calculator plus many more. Let’s try a few by typing some basic calculations into the R console. YOU MUST PRESS ENTER or RETURN AFTER EACH LINE TO EXECUTE IT!
> 2+3
> 2-3
> 2*3
> 2/3

(The > at the start of each line is the R prompt; type only what follows it.)

R can also store values inside of variables as shown below.

> a = 2

This command assigns the value 2 to the variable a. When assigning variables, R does not print out the assignment just made. To see what is stored in a variable, simply type the variable name.

> a

The output is 2, as that is what is stored in a. Now try calling a variable that we have not yet defined and see what happens.

> b

This gives an error, as we have not defined what value we want b to have.

> b = 3
> b

Now b has been assigned the value 3. R can also do calculations with variables. Try adding “a” and “b” together.

> a + b

It is important to realize that variables can be named almost anything you like, as long as the name starts with a letter and contains no spaces.

> cat = 7
> banana = -33
> cat
> banana

As you can see, R recognizes that the variable “cat” has been assigned a value of 7 and “banana” has been assigned a value of -33. What does cat + banana equal?

> cat + banana

Variable names can include numbers and some symbols, such as the underscore. Again, we can do calculations with any variables we have defined.

> w1 = 1
> w2 = -2.4
> w3 = 3.83
> math_is_fun = 1000000
> math_is_fun*cat+a-w1

What if we want to assign a variable to have the same value as something else? It’s simple as long as we remember the rules for defining variables we went over earlier. Let’s make a new variable called “LOL” and give it the same value as “b”.

> LOL = b

“LOL” now has the same value as “b”. We can also reassign the value of a variable at any time we’d like. Let’s change the values of math_is_fun and banana.

> math_is_fun = 1234567
> banana = -8

Now let’s try something a little harder.

> sqrt(w3/(math_is_fun^banana))

Try giving values to any new variables you can think of and do some simple calculations with them.

HELPFUL HINT: Instead of retyping things you have already typed, you can use the arrow keys to scroll through previous lines of code. Hit the up arrow and see what happens.

4. Functions and Plots

Recall that functions have an input variable (or independent variable, often x) and an output (often f(x), or sometimes y). Remember the slope-intercept form of a line, f(x) = mx + b, where m is the slope and b is the y-intercept. We are going to use R to plot the line f(x) = 2x + 1. Therefore we need to define the variables “m” and “b”. (This replaces the value 3 that we gave b earlier.)

> m = 2
> b = 1

Now we need to learn how to represent the independent variable “x” in R. We will do this by defining “x” as a vector of the values for which we would like to plot f(x). A vector is simply an ordered collection of numbers, e.g. 1, 2, 3, 4, 5. In R, to define x as the vector (1, 2, 3, 4, 5) we use the following code.

> x = c(1,2,3,4,5)

You can think of the command c() as “combining” the given values into a vector.

> x

HELPFUL HINT: To define a vector of consecutive integers like the one we just made with the c() command, you can simply type

> x = 1:5

We can now define another vector (set of numbers), f, which has the output values for each input value specified in “x”.

> f = m*x + b
> f

The values you see in the vector f are the outputs for the corresponding input values of x. Let’s test it to make sure. The first value in “x” is 1. Let’s see what the output value is for f(1). We do this by substituting 1 for x in our equation.

> m*1 + b

As you can see, this is the same as the first number in f!

To plot the points given by x and f (input, output) we use the plot command in R.

> plot(x,f)

This opens a graphics display and creates a scatter plot with a collection of points that represent each input with its output. As you can see, the input values are on the x-axis with the output values on the y-axis.

To fill in the rest of the line we must specify the type of plot we want, which in this case is "o". This connects each point on our graph to the next with a straight line and leaves the individual points plotted. It is called “overplotting”, hence the “o”. There are many other plot types which we may run into later in the semester.
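To see what the type argument changes, the sketch below draws the same five points with four of the plot types that R documents for plot(). It is only an illustration, not one of the lab steps; it re-creates m, b, x and f with the values used above so that it runs on its own.

```r
# Same line as in the lab: f(x) = 2x + 1 evaluated at x = 1, ..., 5.
m = 2
b = 1
x = 1:5
f = m*x + b

plot(x, f, type = "p")   # points only (the default)
plot(x, f, type = "l")   # lines only, no markers
plot(x, f, type = "b")   # both, with small gaps around each point
plot(x, f, type = "o")   # both, with the line drawn through the points
```

Each call replaces the previous figure, so run the lines one at a time and compare. The lab continues with the overplotted version: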
> plot(x,f,type = "o")

Now let’s plot a different part of the line by giving different input values for the function.

> x = c(-2,-1,0,1,2)
> f = m*x + b
> plot(x,f,type = "o")

You have now plotted a different part of the same line we plotted before. Now let’s define another function f2(x).

> f2 = (x)^2 - 3
> f2

What should this graph look like?

> plot(x,f2,type = "o")

Does this plot look like what you expected? Why or why not? This plot is not very smooth since we haven’t evaluated the function at very many points. We can smooth the plot out by evaluating at more points. To do this, let’s define a new input vector, “x2”. We will use the command seq() to create a new vector going from -2 to 2 in steps of .1.

> x2 = seq(from = -2,to = 2, by = .1)
> x2
> f2 = (x2)^2 - 3
> f2
> plot(x2,f2,type = "o")

With more points we can see that the graph is indeed a parabola as we expected.

Try defining a function f3 of your choice and plot it to see what it looks like. Try giving it different input vectors to plot different pieces of the graph.

5. More Fun with Plots

We will now learn how to customize data in figures, label figures, and add to plots that we have already created. As you may have noticed, each time we use plot() the figure in R is updated to our current plot and any previous figure we had open is erased. We will learn two other commands, points() and lines(), in order to add data to already existing plots.

The figure we have open now has a parabola plotted. Let’s now add some points to this already existing plot. First we will add a single point at (0,0).

> points(0,0)

Now let’s add the points we used to create a line as before with x and f.

> points(x,f)

Notice that only three of the five points show up on our figure. Why do you think that is? When adding data points to figures in R, the axis ranges of the figure are set by the plot() command that created it. Therefore, the parabola in our figure set the x-axis to range from -2 to 2 and the y-axis to range from -3 to 1.
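The count of three visible points can be checked directly. The sketch below is an illustration that assumes x and f hold the five points on the line f(x) = 2x + 1 defined earlier; the name inside is made up for this example. It tests which outputs fall within the y-axis range of -3 to 1 stated above, then redraws the parabola with the ylim argument of plot() so that the axis is wide enough for all five points.

```r
# The five points on the line, as defined earlier in the lab.
m = 2
b = 1
x = c(-2,-1,0,1,2)
f = m*x + b

# TRUE where the output lies inside the y-axis range set by the parabola.
inside = f >= -3 & f <= 1     # 'inside' is a hypothetical name
inside
sum(inside)                   # how many of the five points fit on the figure

# Redraw the parabola with a y-axis wide enough for every point on the line.
x2 = seq(from = -2, to = 2, by = .1)
f2 = (x2)^2 - 3
plot(x2, f2, type = "o", ylim = range(c(f2, f)))
points(x, f)
```

Setting ylim when the figure is first created is one way to leave room for data that will be added afterwards.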
Any points we try to add that are outside this region will not be plotted. This is why we don’t get all the points specified by x and f.

We can also add lines to the plot, again by specifying points that we would like connected by lines. For example, say we want to connect the following points: (-2,0) (-1,1) (0,-1) (1,0) (2,-2). We again need a vector to specify the x coordinates and another to specify the y coordinates. The following will use lines to connect the points above.

> lines(c(-2,-1,0,1,2),c(0,1,-1,0,-2))

Again, remember that we use c() to put our vectors together. What if we want a horizontal line on the x-axis?

> lines(c(-2,-1,0,1,2),c(0,0,0,0,0))

Now you see that we can add things to already existing plots using points() and lines(). We will next learn how to customize your plots. See what happens with the following code.

> plot(1:20,1:20,pch=1:20,cex=.5*(1:20),col=1:20)

As you can see, R has lots of choices for style of markers and colors. Any guesses as to what pch, cex, and col do?

pch – specifies the marker type. It can be given as a vector (like we just did) or you can just specify one marker type.

> plot(1:20,1:20,pch=14,cex=.5*(1:20),col=1:20)

cex – specifies the size of the marker. It can also be given as a vector or you can just specify one marker size.

> plot(1:20,1:20,pch=1:20,cex=3,col=1:20)

col – specifies the color of the markers. Again, it can be given as a vector or you can choose one color for all your points.

> plot(1:20,1:20,pch=1:20,cex=.5*(1:20),col=3)

HELPFUL HINT: R also lets you specify the color using color names as opposed to color codes. For example,

> plot(1:20,1:20,pch=10,cex=3,col="red")

These same customizations can be used with points().

> points(7:11,c(3,3,3,3,3),pch=8,cex=2,col="green")

Since lines() just connects the points you specify and places no markers at them, the marker customizations pch and cex have nothing to act on. Therefore, the only customization above that can be used with lines() is col.
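As a small illustration of col with lines(), the sketch below draws three connected paths in different colors on a fresh copy of the 1:20 figure. The vectors xs and ys are made-up names holding example coordinates. The last two calls also use lwd (line width) and lty (line type), which are standard R graphical parameters that this lab does not otherwise cover.

```r
# Start from a fresh figure so the added lines have something to sit on.
plot(1:20, 1:20, pch = 1:20, cex = 3, col = 1:20)

# Hypothetical example coordinates.
xs = c(2, 6, 10, 14, 18)
ys = c(5, 9, 4, 8, 6)

lines(xs, ys, col = "blue")                      # color by name
lines(xs, ys + 4, col = "red", lwd = 3)          # thicker line
lines(xs, ys + 8, col = "darkgreen", lty = 2)    # dashed line
```

The lab's own example uses a numeric color code instead of a color name: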
> lines(1:5,c(10,13,11,8,12),col = 7)

As you can see, 7 is the code for yellow.

Now that we know how to add to existing plots and customize our plots, we need to learn how to label them. We will first add a title to our figure.

> title(main="Lab 1 Figure 1", col.main = "blue")

The input col.main specifies the color of the title. The labels on the axes are set by the input you gave when you first used plot() for the current figure. Right now that is 1:20, 1:20. Let’s create a new plot in order to label the axes the way we want them.

> plot(1:20,1:20,pch=1:20,cex=3,col=1:20,ann=FALSE)

Now we have removed the labels on our axes and we are free to label them how we would like. We will use the title command to title our figure “Lab 1 Figure 2” and to label our axes “x-axis” and “y-axis”.

> title(main="Lab 1 Figure 2", xlab = "x-axis", ylab = "y-axis")

Now we have our figure labeled just the way we want it. There are many more customizations that you may discover throughout the semester.

6. Saving Figures

Once you have created figures in R, it is important that you save them. You will often be asked to submit figures for assignments. We will save figures as JPEGs, although R can save them in many different formats. We can save “Lab 1 Figure 2” to Lab1Figure2.jpg by using the following code.

> dev.copy(jpeg,"Lab1Figure2.jpg")
> dev.off()

This code saves a copy of your figure to Lab1Figure2.jpg in your current directory. If you are unsure what directory that is, you can find out by typing:

> getwd()

7. Submitting Assignments

Now that you know how to save figures, submitting assignments should be fairly simple. Open the word processor of your choice. For campus computers, select Launch -> Applications -> Office -> OpenOffice.org 3.2 Writer. As an example, insert “Lab1Figure2.jpg” into your document. Under this figure, summarize the things you learned about customizing plots. Once you are finished, save (or export) this file as a PDF called “LAB1.pdf”.
All assignments should be submitted as PDFs. Now that you have LAB1.pdf saved, we will go through the steps of submitting assignments via Blackboard.

1. Log in to CIS.
2. Under My Classes you should have a link to this lab. Click it to open Blackboard.
3. Once logged in to Blackboard, click on the “Assignments” tab in the left-hand menu.
4. Click on Lab1 – Practice Assignment.
5. Attach LAB1.pdf.
6. Submit.

This is the process you will go through to submit your assignments for the lab.

8. Help in R

When coding, it is often necessary to use R’s help feature to understand how to use commands or to know what arguments or options to include. You can easily get help with a specific command using the R console. All you need to type is a ? followed by a command name.

> ?plot

On a PC or Mac this opens the online help page for the command plot(). On the campus computers, the help page for plot is displayed directly in the terminal. To get out of the help information in the terminal, simply type q.

9. Quitting R

To quit R and close the workspace, simply type:

> q()

R will then ask you if you would like to save your workspace, and you have the options of (y)es, (n)o, or (c)ancel. Selecting yes will save the variables you have defined, so you can pick up right where you left off the next time you begin work in R.

You should now be somewhat familiar with the R environment and some of the basic commands you will be using throughout the semester. Be sure you understand how to create and use figures in R and how to submit assignments. If you have questions, please ask now since you have an assignment due next week.
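As a recap, the plotting, labeling, and saving steps from sections 4 through 6 can be collected into one short script. This is a sketch rather than the lab's required method: it opens the jpeg() device directly and draws into it, instead of copying an on-screen figure with dev.copy() as in section 6, and the file name Lab1Recap.jpg and the title text are made up for the example.

```r
# Input values and the parabola from section 4.
x2 = seq(from = -2, to = 2, by = .1)
f2 = (x2)^2 - 3

# jpeg() needs JPEG support in this build of R; skip saving if it is missing.
if (capabilities("jpeg")) {
  jpeg("Lab1Recap.jpg")                        # hypothetical file name
  plot(x2, f2, type = "o", col = "blue", ann = FALSE)
  points(0, 0, pch = 8, cex = 2, col = "red")  # one customized marker
  title(main = "Lab 1 Recap", xlab = "x-axis", ylab = "y-axis")
  dev.off()                                    # close the file so it is written
}

getwd()   # the directory where the file was saved
```

Because everything between jpeg() and dev.off() is drawn into the file, no figure window appears while this script runs.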