Practice Examples

Workshop Activity
Introduction to R: A workshop hosted by STAT CLUB
PART I:
Creating and modifying data
1. Create a vector of the integers from 1 to 50 and call it “x”.
2. Use the seq function to create a vector of even numbers from 2 to 100 and call it “y”.
3. Find the length of each vector using the length function.
4. Add the vectors “x” and “y” element-wise.
5. Combine “x” and “y” into a single vector called “z” using the c function.
6. Sort the “z” vector from lowest to highest.
7. Take the natural logarithm of all the values in “z” and store the result in a variable called “log.z”.
8. Find the mean of “log.z”.
9. Count the number of values in “log.z” that are greater than 4 using the command:
sum(log.z > 4)
10. Output the values in “log.z” that are greater than 4 using the command: log.z[log.z > 4]
11. Combine the vectors “x” and “y” column-wise using the cbind function and give the new
object the name “xy.mat”.
12. Use the class function to determine which class your new object belongs to.
13. Find the element in the 10th row, 2nd column.
14. Use the t function to find the transpose of “xy.mat” and call it “t.xy.mat”.
15. Use %*% to multiply “t.xy.mat” and “xy.mat” (in that order) and call the output
“xy.mat2”.
16. Use the dim function to find the dimensions of “xy.mat2”.
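17. Try to find the inverse of “xy.mat2” using the solve function. Note that y = 2x here, so
the columns of “xy.mat” are linearly dependent and “xy.mat2” is singular; expect solve
to report an error.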
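The Part I steps can be sketched as follows. This is one possible solution rather than an official answer key, and the name "inv" is an assumption introduced here. Because every element of y is twice the matching element of x, the 2 x 2 product matrix is singular, so the sketch wraps solve in try to show the error without stopping the script.

```r
# Part I sketch; variable names follow the exercise text.
x <- 1:50                         # integers 1 to 50
y <- seq(2, 100, by = 2)          # even numbers 2 to 100
length(x)                         # 50
length(y)                         # 50
x + y                             # element-wise sum
z <- sort(c(x, y))                # combine, then sort lowest to highest
log.z <- log(z)                   # natural logarithm
mean(log.z)
sum(log.z > 4)                    # TRUE counts as 1, so this counts values above 4
log.z[log.z > 4]                  # logical indexing keeps only those values

xy.mat <- cbind(x, y)             # 50 x 2 matrix with columns x and y
class(xy.mat)                     # a matrix (also reported as "array" in R >= 4.0)
xy.mat[10, 2]                     # row 10, column 2
t.xy.mat <- t(xy.mat)             # 2 x 50
xy.mat2 <- t.xy.mat %*% xy.mat    # 2 x 2
dim(xy.mat2)

# y = 2 * x, so xy.mat2 has determinant zero and cannot be inverted.
# "inv" is a hypothetical name; try() captures the error instead of halting.
inv <- try(solve(xy.mat2), silent = TRUE)
inherits(inv, "try-error")        # TRUE when solve failed
```

To see solve succeed, one option is to repeat steps 11 to 17 with a y that is not a multiple of x, for example y + 1.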
PART II:
Using data
1. Create or download a dataset in .csv format. Make sure that it has variable names in the
first row.
2. Load the dataset using the read.table function (with header = TRUE and sep = ",") or the
read.csv function. Print the dataset to make sure it was imported properly.
3. Load the “datasets” package using the library function.
4. There is a dataset in this package called “chickwts”. Use the class function to determine
which class the object belongs to.
5. Use the names function to find the variable names.
6. Attach the dataset to the workspace so that we can continue to work with it.
7. Suppose we are only interested in chickens that have eaten soybean or sunflower feeds.
Use the which function to print out the indices of the observations with feed
“sunflower” or “soybean”.
8. Create a new dataset called “chickwts2” that is a subset of the “chickwts” data but
includes only sunflower or soybean feeds. Use the command: chickwts2 =
subset(chickwts, feed == "sunflower" | feed == "soybean")
9. Attach this new dataset to the workspace so that we can continue to work with it.
10. Conduct a one-sample t-test to see if the mean chicken weight is greater than 280. If
needed, utilize the help menu by typing in ?t.test.
11. There is another dataset in this package called “cars”. Use the class function to determine
which class the object belongs to.
12. Use the names function to find the variable names.
13. Attach the dataset to the workspace so that we can continue to work with it.
14. In this dataset, speed is in miles per hour and distance (the “dist” variable) is in feet.
Suppose we want to use yards instead of feet. Convert the distance from feet to yards and
store it in a new variable called “dist.yards”.
15. Add this new variable to your cars dataset by using the command: cars = data.frame(cars,
dist.yards)
16. Create a scatterplot of Y = distance and X = speed using the plot function.
17. Fit a linear regression model of Y = stopping distance in yards versus X = speed in mph
using the lm function.
18. Note the equation of the line. Create a function that takes a value of x as input and
outputs the fitted value of y. Call the function “f”.
19. Add the fitted line in red to your plot using the command: lines(0:30,
f(0:30), col = "red")
20. One last dataset that we will look at is called “infert”. Use the class function to determine
which class the object belongs to.
21. Use the names function to find the variable names.
22. Attach the dataset to the workspace so that we can continue to work with it.
23. Create a two-way table of “education” and “induced”. Call the table “tab”.
24. Perform a chi-square test of independence for the “tab” data using the chisq.test function.
25. There is an “age” variable in the dataset. Recode this variable into age groups: 24 or
younger, 25 to 29, 30 to 35, and 36 or older. There are many ways to do this, and one
such way is to use a for-loop coupled with an if-then statement:
for (i in 1:length(age)) {
  if (age[i] < 25) {age[i] = "24 or younger"}
  else if (age[i] < 30) {age[i] = "25 to 29"}
  else if (age[i] < 36) {age[i] = "30 to 35"}
  else {age[i] = "36 or older"}
}
Try this for yourself and verify that it works and that you understand what the code is
doing. Notice that this version of the code converts the “age” variable from a vector of
numerical values to a vector of character strings.
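Several of the Part II steps can be sketched as follows. This is one possible approach, not an official answer key. It refers to columns with $ and the data argument instead of attach, which is a design choice made here to keep each line explicit. The names "idx", "cars2", "fit", "b", and "age.group" are assumptions introduced for illustration, the scatterplot is drawn in yards so that the fitted line matches its axis, and cut is shown as an alternative to the for-loop in step 25.

```r
library(datasets)   # normally attached already; calling it again is harmless

# Steps 7-10: select two feeds and test whether mean weight exceeds 280
idx <- which(chickwts$feed == "sunflower" | chickwts$feed == "soybean")
chickwts2 <- subset(chickwts, feed == "sunflower" | feed == "soybean")
t.test(chickwts2$weight, mu = 280, alternative = "greater")

# Steps 14-19: convert feet to yards, fit a line, and overlay it in red
cars2 <- cars                                  # hypothetical copy of cars
cars2$dist.yards <- cars2$dist / 3             # 3 feet per yard
fit <- lm(dist.yards ~ speed, data = cars2)
b <- unname(coef(fit))                         # b[1] intercept, b[2] slope
f <- function(x) b[1] + b[2] * x               # fitted value at speed x
plot(cars2$speed, cars2$dist.yards,
     xlab = "Speed (mph)", ylab = "Stopping distance (yards)")
lines(0:30, f(0:30), col = "red")

# Steps 23-24: two-way table and chi-square test of independence
# (R may warn that the approximation is poor when expected counts are small)
tab <- table(infert$education, infert$induced)
chisq.test(tab)

# Step 25 alternative: cut() bins the numeric ages without a loop.
# Intervals are closed on the right: (-Inf,24], (24,29], (29,35], (35,Inf)
age.group <- cut(infert$age,
                 breaks = c(-Inf, 24, 29, 35, Inf),
                 labels = c("24 or younger", "25 to 29",
                            "30 to 35", "36 or older"))
table(age.group)
```

Unlike the loop, cut leaves the original numeric column untouched and returns a factor, which tends to be convenient for later tables and models.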
PART III:
Advanced exercises for functions, if-then statements, and for-loops
1. Write a function that takes a data frame as an argument and returns the mean of each
numeric column in the data frame. Test it on the “iris” dataset, preloaded in R.
2. Modify your function so that it returns a list, the first element of which is the means of
the numeric variables, and the second of which is the counts of the levels of each
categorical variable.
3. Write a function that outputs the string “positive” if the input is positive, “negative” if the
input is negative, and “zero” if the input is 0. Test it on a few values.
4. Use a for-loop to get the class of each column in the “iris” dataset.
5. Use a for-loop to calculate the mean of each numeric column in the “iris” dataset.
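The Part III exercises can be approached along these lines. This is a minimal sketch, not an official answer key; the function names "col.summary" and "sign.text" and the variable "res" are assumptions introduced here, and the sketch treats factor columns as the categorical variables.

```r
# Exercises 1-2: means of numeric columns and level counts of factor columns
col.summary <- function(df) {
  is.num <- sapply(df, is.numeric)   # TRUE for numeric columns
  is.cat <- sapply(df, is.factor)    # TRUE for factor columns
  list(means  = colMeans(df[, is.num, drop = FALSE]),
       counts = lapply(df[, is.cat, drop = FALSE], table))
}
res <- col.summary(iris)
res$means
res$counts

# Exercise 3: describe the sign of a single number
sign.text <- function(x) {
  if (x > 0) "positive" else if (x < 0) "negative" else "zero"
}
sign.text(3)
sign.text(-2)
sign.text(0)

# Exercise 4: class of each column
for (name in names(iris)) {
  cat(name, ":", class(iris[[name]]), "\n")
}

# Exercise 5: mean of each numeric column, skipping the others
for (name in names(iris)) {
  if (is.numeric(iris[[name]])) {
    cat(name, "mean:", mean(iris[[name]]), "\n")
  }
}
```

The drop = FALSE argument keeps the selection a data frame even when only one column matches, so colMeans and lapply behave the same for any input.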