LINKÖPINGS UNIVERSITET
Department of Computer and Information Science
Division of Statistics
Mattias Villani
Oleg Sysoev
2012-08-24
Programming in R
732A44
Exam in Programming in R, 7.5 hp (ECTS credits)
Exam time: 14-18
Material: The books R Cookbook (Teetor) and/or The Art of R Programming (Matloff).
  The books should be free from notes and comments, but may contain underlining
  and markers for quickly finding chapters and sections.
  Paper print-outs of the above-mentioned books may be used.
  Slides from the last lecture on data mining will be distributed in the exam room.
Teachers: Mattias Villani (Questions 1-3 and 5; Mattias is available during the exam)
  Oleg Sysoev (Question 4; Oleg will be present around 3 PM for questions)
Grades: Maximum is 20 points.
  A = 19-20 points
  B = 17-18 points
  C = 12-16 points
  D = 10-11 points
  E = 8-9 points
  F = 0-7 points
Write your solutions in complete and readable code.
Solutions should be written in a file that can be directly run in R.
The name of the file should be Main.R
Comment directly in Main.R whenever something needs to be explained or discussed.
Graphs produced during the exam should NOT be submitted for grading;
it is enough to submit the code that produces them.
DO NOT FORGET TO SAVE YOUR RESULTS THROUGHOUT THE EXAM!!!
1. Sequences and Loops
(a) Use an R-command to create the vector fiveSteps containing the elements 5, 10, 15, 20
and 25. You are not allowed to simply type in the numbers by hand.
1p.
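One possible R sketch for question 1a, shown for illustration and not as the official solution: `seq()` with a step size generates the arithmetic sequence without typing the numbers by hand.

```r
# Arithmetic sequence from 5 to 25 in steps of 5
fiveSteps <- seq(from = 5, to = 25, by = 5)
print(fiveSteps)   # 5 10 15 20 25

# Equivalent vectorised alternative: multiply the integers 1..5 by 5
fiveStepsAlt <- 5 * (1:5)
```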
(b) Repeat question 1a, but this time using a for-loop where the loop variable runs from 1
to 5.
1.5p.
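A loop-based sketch for question 1b (one possible approach): the vector is pre-allocated with `numeric(5)` and each element is set to five times the loop index.

```r
# Pre-allocate a numeric vector of length 5, then fill it inside the loop
fiveSteps <- numeric(5)
for (i in 1:5) {
  fiveSteps[i] <- 5 * i   # the i-th element is 5 times the loop index
}
print(fiveSteps)   # 5 10 15 20 25
```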
(c) Use an R-command to create the vector tenNumbers containing 10 numbers between 0
and 3.14.
1.5p.
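A sketch for question 1c, assuming "10 numbers between 0 and 3.14" means 10 equally spaced values with both endpoints included. `length.out` fixes the number of elements and lets R compute the step size.

```r
# 10 equally spaced numbers from 0 to 3.14 (both endpoints included)
tenNumbers <- seq(from = 0, to = 3.14, length.out = 10)
print(tenNumbers)
```

If random values in that interval were intended instead, `runif(10, min = 0, max = 3.14)` would be the natural alternative.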
2. Functions
(a) Write a function rectArea that computes the area of a rectangle with base b and height
h (Area = b · h). Use the new function to compute the area of a rectangle with base 2
and height 3.
1.5p.
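A minimal sketch for question 2a: the function takes the base `b` and height `h` and returns their product.

```r
# Area of a rectangle with base b and height h
rectArea <- function(b, h) {
  area <- b * h
  return(area)
}

rectArea(2, 3)   # 6
```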
(b) Change the function in 2a so that it also prints ’Rectangle is a square’ if the rectangle
has equal base and height.
1.5p.
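A sketch for question 2b: an `if` test on `b == h` prints the message before the area is returned. `cat()` is used here to print plain text; `print()` would work too but also shows the quotes and the `[1]` prefix.

```r
rectArea <- function(b, h) {
  # Report squares before returning the area
  if (b == h) {
    cat("Rectangle is a square\n")
  }
  b * h
}

rectArea(2, 3)   # returns 6, no message
rectArea(4, 4)   # prints "Rectangle is a square", returns 16
```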
(c) Change the function so that it can be used also when the user only inputs the base of
the rectangle (that is, the height argument can be omitted). The function should in
this case return the area of a square, and print ’Rectangle is a square’ to the screen.
1p.
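One way to handle question 2c is a default argument `h = b`. R evaluates arguments lazily, so a default may refer to another argument; when `h` is omitted the rectangle is a square and the existing `if` test prints the message automatically.

```r
# If h is omitted it defaults to b, i.e. the rectangle is a square
rectArea <- function(b, h = b) {
  if (b == h) {
    cat("Rectangle is a square\n")
  }
  b * h
}

rectArea(2, 3)   # returns 6
rectArea(5)      # prints "Rectangle is a square", returns 25
```

An alternative design is `function(b, h)` combined with `if (missing(h)) h <- b` inside the body; both give the same behaviour here.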
3. Matrices and Data frames
(a) Create a matrix X with 3 rows and 2 columns from the vector x = (2, 4, 6, 1, 2, 5).
1p.
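A sketch for question 3a, assuming the default column-wise filling of `matrix()` is intended (column 1 = (2, 4, 6), column 2 = (1, 2, 5)). Add `byrow = TRUE` if the vector should instead be read row by row.

```r
x <- c(2, 4, 6, 1, 2, 5)
# matrix() fills column by column by default
X <- matrix(x, nrow = 3, ncol = 2)
print(X)
```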
(b) Compute $(X' * X)^{-1}$, where $X'$ is the transpose of $X$, $*$ denotes the matrix product and
$(X' * X)^{-1}$ denotes the matrix inverse of $X' * X$.
1.5p.
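A sketch for question 3b, using the column-wise $X$ from 3a: `t()` transposes, `%*%` is the matrix product, and `solve()` called with a single matrix argument returns its inverse. For this $X$, $X' * X$ has entries 56, 40, 40, 30 with determinant 80, so the inverse works out to 0.375, -0.5, -0.5, 0.7.

```r
x <- c(2, 4, 6, 1, 2, 5)
X <- matrix(x, nrow = 3, ncol = 2)

XtX <- t(X) %*% X        # X' * X, a 2 x 2 matrix (crossprod(X) gives the same)
XtXinv <- solve(XtX)     # matrix inverse of X' * X
print(XtXinv)
```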
(c) Make a data frame from the matrix X and assign the names ’XCol1’ and ’XCol2’ to the
two columns.
1.5p.
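A sketch for question 3c: `as.data.frame()` converts the matrix, and `names()` replaces the default column names `V1` and `V2`. The object name `XDF` is a hypothetical choice for illustration.

```r
x <- c(2, 4, 6, 1, 2, 5)
X <- matrix(x, nrow = 3, ncol = 2)

XDF <- as.data.frame(X)              # columns get default names V1, V2
names(XDF) <- c("XCol1", "XCol2")    # rename the two columns
print(XDF)
```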
4. [Oleg’s question] In this assignment you will work with the data set “Boston”, which is available
in the package “MASS”. Type the following commands in R:
library(MASS)
help(Boston)
and read the description of the columns in the data set.
(a) Create the data set Boston1 containing observations 1 to 150, and Boston2 containing
observations 151 to 300. Fit a generalized linear model to the data set Boston1
with “rad” as response and “crim”, “rm” and “age” as explanatory variables, and do
not specify the degrees of freedom in the model. Then use the fitted model to
predict the “rad” observations for Boston2 (round off the predicted values) and present
a matrix showing the number of misclassified observations for each level of “rad”.
2p.
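A hedged sketch for question 4a. It assumes that a plain `glm()` call with its default gaussian family is meant (so "rad" is treated as numeric), and reads the remark about degrees of freedom as "pass no extra tuning arguments"; both readings are assumptions. The rounded predictions are cross-tabulated against the observed levels, and a per-level count of mismatches is added. The names `fit`, `predRad`, `confMat` and `misclassPerLevel` are hypothetical.

```r
library(MASS)   # provides the Boston data set

Boston1 <- Boston[1:150, ]
Boston2 <- Boston[151:300, ]

# Generalized linear model with the default (gaussian) family
fit <- glm(rad ~ crim + rm + age, data = Boston1)

# Predict rad for Boston2 and round to the nearest integer
predRad <- round(predict(fit, newdata = Boston2))

# Rows: observed rad, columns: rounded predictions.
# Every cell whose row and column labels differ counts misclassified observations.
confMat <- table(observed = Boston2$rad, predicted = predRad)
print(confMat)

# Number of misclassified observations for each observed level of rad
misclassPerLevel <- tapply(Boston2$rad != predRad, Boston2$rad, sum)
print(misclassPerLevel)
```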
(b) Use the Boston data set and the variables “crim”, “rm” and “age” to perform K-means clustering
with K=2 and the following initial cluster centers: (0,0,0) and (100,100,100). What
are the final positions of the cluster centers?
2p.
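A sketch for question 4b using `kmeans()` from the stats package. Passing a matrix as `centers` (one row per cluster, one column per variable) makes `kmeans()` start from exactly these centers instead of random ones; the remaining arguments are left at their defaults, which is an assumption. The object names `dat`, `initCenters` and `km` are hypothetical.

```r
library(MASS)   # provides the Boston data set

# Keep only the three clustering variables
dat <- Boston[, c("crim", "rm", "age")]

# Initial centers: row 1 = (0, 0, 0), row 2 = (100, 100, 100)
initCenters <- matrix(c(0, 0, 0,
                        100, 100, 100),
                      nrow = 2, byrow = TRUE)

km <- kmeans(dat, centers = initCenters)
print(km$centers)   # final positions of the two cluster centers
```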
5. Indexing
(a) Create the vector x = (5, 2, 4, −3, 0, 1). Use an R-command to create the vector y containing only the positive elements of x.
1p.
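A sketch for question 5a using logical indexing: `x > 0` yields a logical vector, and `x[...]` keeps the elements where it is `TRUE`. Zero is not positive, so it is dropped.

```r
x <- c(5, 2, 4, -3, 0, 1)
# Keep the elements for which x > 0 is TRUE
y <- x[x > 0]
print(y)   # 5 2 4 1
```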
(b) Use an R-command to create the vector z containing all positive even elements of x.
[Hint: the command a%%2 returns zero only if a is an even number. The so-called
modulus operator %% also works element-wise on vectors.]
1.5p.
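A sketch for question 5b following the hint: two element-wise conditions are combined with `&`. Note that `0 %% 2` is also zero, so the `x > 0` condition is what excludes 0.

```r
x <- c(5, 2, 4, -3, 0, 1)
# Positive AND remainder 0 after division by 2
z <- x[x > 0 & x %% 2 == 0]
print(z)   # 2 4
```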
(c) Replace the second element of x with NaN (that is, the second element becomes a so-called
Not-a-Number) without rewriting the whole vector. Then use an R-command to create
a new vector q containing all proper numbers of x (that is, the elements that are not
NaN).
1.5p.
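A sketch for question 5c: assigning to `x[2]` changes only that element, and `is.nan()` flags the Not-a-Number entries so they can be excluded with `!`. (`is.na()` would also work here, since it is `TRUE` for `NaN` as well, but it additionally matches `NA`.)

```r
x <- c(5, 2, 4, -3, 0, 1)
x[2] <- NaN             # replace only the second element
q <- x[!is.nan(x)]      # keep the elements that are not NaN
print(q)   # 5 4 -3 0 1
```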