LINKÖPINGS UNIVERSITET 2012-03-30 Institutionen för datavetenskap Programming in R

Avdelningen för statistik
732A44
Mattias Villani
Oleg Sysoev
Exam in Programming in R, 7.5 hp
Exam time: 8-12
Material: The books R Cookbook (Teetor) and/or The Art of R Programming (Matloff).
The books should be free from notes and comments, but can contain
underlined lines and indicators for quickly finding chapters/sections etc.
Slides from the last lecture on data mining will be distributed in the exam room.
Teachers:
Mattias Villani (Questions 1-3 and 5. Mattias is available regularly during the exam)
Oleg Sysoev (Question 4. Oleg will be present around 10 AM for questions)
Grades:
Maximum is 20 points.
A=19-20 points
B=17-18 points
C=12-16 points
D=10-11 points
E=8-9 points
F=0-7 points
Write your solutions in complete and readable code.
Solutions should be written in a file that can be directly run in R.
The name of the file should be Main.R
Comment directly in Main.R whenever something needs to be explained or discussed.
Graphs produced during the exam should NOT be submitted for grading;
it is enough to submit the code that produces the graphs.
1. Data structures
(a) Create a list named staff containing the following three elements:
i. The vector names containing (Mike, Luke, Adrian, Sonja)
ii. The vector wage containing (24000, 17000, 31000, 36000)
iii. The vector manager with logical elements (TRUE or FALSE) with the information that Mike and Sonja are managers, but Luke and Adrian are not managers.
1.5p.
(b) Write code that selects the names of the managers with a wage greater than 30000.
1p.
(c) Write code that lowers the wage by 10% for all non-managers with a wage greater than 20000. Note that the list staff should be changed if at least one person fulfills these criteria.
1.5p.
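The kind of list structure and logical indexing that Question 1 asks about can be sketched as follows. This is only one possible approach, not an official solution; the helper names rich_managers and cut are hypothetical and do not come from the exam text.

```r
# Sketch for Question 1 (one possible approach, not an official solution).
staff <- list(
  names   = c("Mike", "Luke", "Adrian", "Sonja"),
  wage    = c(24000, 17000, 31000, 36000),
  manager = c(TRUE, FALSE, FALSE, TRUE)
)

# (b) Names of managers with a wage greater than 30000.
# rich_managers is a hypothetical helper name.
rich_managers <- staff$names[staff$manager & staff$wage > 30000]

# (c) Lower the wage by 10% for non-managers with a wage greater than 20000.
# cut is a hypothetical logical index marking the affected persons.
cut <- !staff$manager & staff$wage > 20000
staff$wage[cut] <- 0.9 * staff$wage[cut]
```

Because the assignment in (c) goes through the logical index, the list staff is left unchanged when nobody fulfills the criteria.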
2. Loops
(a) Write a for-loop that prints the numbers 3, 7, 12 and 14 to the screen.
1p.
(b) Write a for-loop that calculates the sum of the elements in the vector (3, 7, 12, 14).
1.5p.
(c) Write a while-loop that first generates a random number from a uniform distribution over the interval [0, 1], and then prints the sentence 'The random number is smaller than 0.9' on the screen for as long as the generated random number is smaller than 0.9. Note that you should generate a new random number in each iteration of the loop, and exit the loop the first time the random number is greater than 0.9.
1.5p.
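The stopping rule in 2c can be illustrated with a short sketch. It assumes the reading that the sentence is printed once for every draw below 0.9 and that the loop ends at the first draw that is not below 0.9; the variable name u is hypothetical.

```r
# Sketch for Question 2c (one possible reading, not an official solution).
u <- runif(1)                  # first draw from Uniform[0, 1]
while (u < 0.9) {
  print("The random number is smaller than 0.9")
  u <- runif(1)                # new draw in each iteration
}
# The loop has ended, so the last draw u is at least 0.9.
```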
3. Functions
(a) Write a function that takes any vector with data observations as input argument and then returns the so-called coefficient of variation. The coefficient of variation is defined as
Coefficient of variation = Standard deviation / Mean,
where both the mean and standard deviation are computed from the observation vector.
1.5p.
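The definition above translates almost directly into R. The following is a minimal sketch, not an official solution; the function name CoefOfVar is hypothetical, and sd() is assumed to be the intended standard deviation (it uses the n - 1 denominator).

```r
# Sketch for Question 3a (hypothetical function name CoefOfVar).
CoefOfVar <- function(x) {
  # Standard deviation divided by mean, both computed from x.
  sd(x) / mean(x)
}

# Small check on a vector whose standard deviation is 1 and mean is 2.
cv_example <- CoefOfVar(c(1, 2, 3))
```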
(b) Generate a random sample with 10 observations from an exponential distribution with mean 2. Use the function in 3a to calculate the coefficient of variation for the generated random sample. [Hint: ?rexp]
1p.
(c) Write code that uses the function from 3a to calculate the coefficient of variation in 100 different random samples of size 10 from the exponential distribution with mean 2. The code should thus create 100 random samples, where each sample contains 10 data observations. Compute the coefficient of variation for each of the 100 random samples. Use a histogram to illustrate the distribution of the coefficient of variation in the 100 random samples.
1.5p.
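The repeated-sampling idea in 3b and 3c can be sketched as below. This is one possible approach under stated assumptions, not an official solution: the names CoefOfVar, one_cv and cvs are hypothetical, and the function from 3a is redefined so that the block runs on its own. Note that rexp() is parameterised by the rate, which is 1/mean.

```r
# Sketch for Questions 3b and 3c (hypothetical names throughout).
CoefOfVar <- function(x) sd(x) / mean(x)

# 3b: one sample of 10 observations from an exponential distribution with mean 2.
one_cv <- CoefOfVar(rexp(10, rate = 1 / 2))

# 3c: repeat the experiment 100 times and collect the 100 coefficients.
cvs <- replicate(100, CoefOfVar(rexp(10, rate = 1 / 2)))

# Histogram of the 100 coefficients of variation.
hist(cvs, main = "Coefficient of variation, 100 samples of size 10",
     xlab = "Coefficient of variation")
```

A for-loop that fills a preallocated vector of length 100 would work equally well; replicate() is simply shorter.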
4. Data mining (Written by Oleg)
In this assignment, you shall work with the data set anorexia, which can be accessed from the library MASS. Type the following commands in R:
library(MASS)
help(anorexia)
anorexia1=anorexia
anorexia1$Postwt=scale(anorexia1$Postwt)
anorexia1$Prewt=scale(anorexia1$Prewt)
and see the description of the data file columns. Use anorexia1 further.
(a) Create a decision tree that has Prewt and Treat as predictors and Postwt as response (do not prune the tree). Find out the residual deviance and the residual mean deviance (report these numbers as a comment in your R code).
2p.
(b) Create dummy variables for Treat. Fit a neural network with Prewt and all dummy variables as predictors and Postwt as response; use 1 hidden neuron and 3 preliminary runs and hyperbolic tangent as an activation function in the hidden layer. Report the best residual error achieved (as a comment in your R code).
2p.
5. Sequences and function objects
(a) Plot the so-called sine function (sin) by first computing the function at 100 selected points between 0 and π (the number pi is built-in in R). The plotted function should be a solid red line. Label the axes properly.
2p.
(b) Write your own function PlotFunction that can plot an arbitrary function over an arbitrary interval of values in the function domain, using an arbitrary number of function evaluations. That is, the user of the function should be able to freely specify: i) which function to plot, ii) over which domain to plot it, iii) how many function values to use in plotting the function, without having to change anything in the code for PlotFunction. You can assume that the user-specified function is a vectorized function. Show how to call PlotFunction() to solve Question 5a.
2p.
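Passing a function as an argument, as 5b requires, can be sketched as follows. This is one possible design, not an official solution. The argument names f, from, to and n are hypothetical, and forwarding extra graphical parameters through ... is a design assumption rather than something the question asks for.

```r
# Sketch for Question 5b (hypothetical argument names f, from, to, n).
PlotFunction <- function(f, from, to, n = 100, ...) {
  x <- seq(from, to, length.out = n)  # n evaluation points on [from, to]
  y <- f(x)                           # f is assumed to be vectorized
  plot(x, y, type = "l", ...)         # extra arguments are passed on to plot()
  invisible(list(x = x, y = y))       # return the evaluated points invisibly
}

# Call that reproduces Question 5a: sine on [0, pi], solid red line, labelled axes.
res <- PlotFunction(sin, 0, pi, n = 100, col = "red", lty = 1,
                    xlab = "x", ylab = "sin(x)")
```

Any other vectorized function can be plotted in the same way, for example PlotFunction(exp, -1, 1, n = 50).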