LINKÖPINGS UNIVERSITET                          2012-03-30
Institutionen för datavetenskap                 Programming in R
Avdelningen för statistik                       732A44
Mattias Villani
Oleg Sysoev

Exam in Programming in R, 7.5 hp

Exam time: 8-12

Material: The books R Cookbook (Teetor) and/or The Art of R Programming (Matloff). The books should be free from notes and comments, but can contain underlined lines and indicators for quickly finding chapters/sections etc. Slides from the last lecture on data mining will be distributed in the exam room.

Teachers: Mattias Villani (Questions 1-3 and 5. Mattias is available regularly during the exam)
          Oleg Sysoev (Question 4. Oleg will be present around 10 AM for questions)

Grades: Maximum is 20 points.
        A=19-20 points
        B=17-18 points
        C=12-16 points
        D=10-11 points
        E=8-9 points
        F=0-7 points

Write your solutions in complete and readable code. Solutions should be written in a file that can be directly run in R. The name of the file should be Main.R. Comment directly in Main.R whenever something needs to be explained or discussed. Graphs produced during the exam should NOT be submitted for grading; it is enough to submit the code that produces the graphs.

1. Data structures

(a) Create a list named staff containing the following three elements:
    i. The vector names containing (Mike, Luke, Adrian, Sonja)
    ii. The vector wage containing (24000, 17000, 31000, 36000)
    iii. The vector manager with logical elements (TRUE or FALSE) with the information that Mike and Sonja are managers, but Luke and Adrian are not managers.
    1.5p.

(b) Write code that selects the names of the managers with a wage greater than 30000. 1p.

(c) Write code that lowers the wage by 10% for all non-managers with a wage greater than 20000. Note that the list staff should be changed if at least one person fulfills these criteria. 1.5p.

2. Loops

(a) Write a for-loop that prints the numbers 3, 7, 12 and 14 to the screen. 1p.

(b) Write a for-loop that calculates the sum of the elements in the vector (3, 7, 12, 14). 1.5p.
(c) Write a while-loop that first generates a random number from a uniform distribution over the interval [0, 1], and then prints the sentence 'The random number is smaller than 0.9' on the screen until the generated random number is greater than 0.9. Note that you should generate a new random number in each iteration of the loop, and exit the loop the first time the random number is greater than 0.9. 1.5p.

3. Functions

(a) Write a function that takes any vector with data observations as input argument and then returns the so-called coefficient of variation. The coefficient of variation is defined as

    \[ \text{Coefficient of variation} = \frac{\text{Standard deviation}}{\text{Mean}}, \]

    where both the mean and standard deviation are computed from the observation vector. 1.5p.

(b) Generate a random sample with 10 observations from an exponential distribution with mean 2. Use the function in 3a to calculate the coefficient of variation for the generated random sample. [Hint: ?rexp] 1p.

(c) Write code that uses the function from 3a to calculate the coefficient of variation in 100 different random samples of size 10 from the exponential distribution with mean 2. The code should thus create 100 random samples, where each sample contains 10 data observations. Compute the coefficient of variation for each of the 100 random samples. Use a histogram to illustrate the distribution of the coefficient of variation in the 100 random samples. 1.5p.

4. Data mining (Written by Oleg)

In this assignment, you shall work with the data set anorexia, which can be accessed from the library MASS. Type the following commands in R:

    library(MASS)
    help(anorexia)
    anorexia1=anorexia
    anorexia1$Postwt=scale(anorexia1$Postwt)
    anorexia1$Prewt=scale(anorexia1$Prewt)

and see the description of the data file columns. Use anorexia1 further.

(a) Create a decision tree that has Prewt and Treat as predictors and Postwt as response (do not prune the tree). Find out the residual deviance and the residual mean deviance (report these numbers as a comment in your R code). 2p.
(b) Create dummy variables for Treat. Fit a neural network with Prewt and all dummy variables as predictors and Postwt as response; use 1 hidden neuron and 3 preliminary runs and hyperbolic tangent as an activation function in the hidden layer. Report the best residual error achieved (as a comment in your R code). 2p.

5. Sequences and function objects

(a) Plot the so-called sine function (sin) between 0 and π by first computing the function in 100 selected points (the number pi is built-in in R). The plotted function should be a solid red line. Label the axes properly. 2p.

(b) Write your own function PlotFunction that can plot an arbitrary function over an arbitrary interval of values in the function domain, and using an arbitrary number of function evaluations. That is, the user of the function should be able to freely specify: i) which function to plot, ii) over which domain to plot it, iii) how many function values to use in plotting the function, without having to change anything in the code for PlotFunction. You can assume that the user-specified function is a vectorized function. Show how to call PlotFunction() to solve Question 5a. 2p.