Take Home Final Exam – Stat 506 Spring 2015
Due: 6 pm Thursday May 7 as a printed (from pdf) copy.
Total points: 80

1. You've spent a lot of time over the past 8-9 months learning how to carry out statistical analysis using statistical software. Two things Kezia Manlove brought up in her talk inspire this question:
   • We should go back and clean up old code so that we're proud of it.
   • We should reflect back on what we've learned.

   Your tasks:

   (a) Check out style guides for R and read about styles others are using for R coding.
       R Journal article on naming conventions: http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf
       Google suggestions: http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html
       Write yourself a style guide for R (and SAS if you wish) programming. It should cover at least the following points, all intended to make your code more readable.
       • How will you construct names? (Pick a style from one of my links, or elsewhere.)
       • When will you indent R code? Will it be a tab or some number of spaces?
       • Discuss use of white space. When will you separate lines?
       • Use equals for assignment?
       • Comments on when to comment? One hash or 2?
       • When lines are grouped together with curly braces, where will opening and closing braces appear?
       (10 pts)

   (b) Look back over the assignments you've completed in Stat 505 and 506 to see which ones were most challenging (at the time they were assigned) or seem likely to be most useful in your future work. Pick two of the homeworks and explain why you chose them. (4 pts)

   (c) For each, show
       • the old code.
       • the code redone using your style guide. Improve flow, make it more efficient, and add comments to explain the logic and how the code works. Make sure that variables and functions have meaningful names. Keep track of your improvements and explain them.
       (16 pts)

   (d) Reflect on what you have learned about coding this year.
       • What were the biggest challenges?
       • What resources were most useful when you needed help?
       • What advice do you have for a new student starting this fall?
       • What are your goals to learn next in the realm of stat computing?
       I'm looking for two (on average) specific observations in each area. (10 pts)

2. For this problem you will explore an R package which we have not used in class. This is a very practical task because there are thousands of packages, so you will certainly have to learn some of them on your own. Please keep track of the resources you use.

   Install the mi (multiple imputation) R package and read the vignette. Also look at the file mi.pdf on CRAN with vignette("mi_vignette").

   (a) Run the code in the vignette, also available as miCode.R in the Rcode folder, to see how it works. (Nothing to report on this part. Don't include the pictures or analysis, just see how it works.)

   (b) What assumptions about randomness are made to use mi? What distributional assumptions are made? (6 pts)

   (c) Load the CHAIN dataset in the same package and set up a missing_data.frame for these data. What warning do you get? Show the image plot and discuss why that warning appeared and how the plot illustrates the problem. How well does mi guess the variable types (read the help page)? Improve the types as they did with the nlsyV data. (4 pts)

   (d) The article they refer to is available at http://www.math.montana.edu/~jimrc/classes/stat506/notes/chain-HIV-study.pdf, but we cannot work with survival times as they did. Instead, use mi to fit a linear model to log_virus using all other variables as predictors. Explain the fitted model and how the predictors are related to log_virus. (6 pts)

   (e) Compare the mi averaged output to that from a plain lm fit, making sure that you use factors in the same way that mi does.
       • Do coefficient estimates and/or SE's change? (4 pts)
       • Give an opinion and a justification: Was it important to account for missing values with these data? (4 pts)
       • Is there an effect of treatment on log viral load? (6 pts)
       • Are you willing to say the effects are "causal"? To whom do they apply? (6 pts)

   (f) Which resources were most helpful in getting to know this package? (4 pts)

Write up this exam in "report" format, with
   • Part I Computer Coding
   • Part II Multiple Imputation
Ignore the numbering scheme I used above, but do address each point in your own organized way. As usual, I want a document which includes the code as an appendix. Turn in a printed paper copy.
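
For the "old code" versus "redone code" comparison in Part I, a before-and-after pair might look like the minimal sketch below. The data, the object names, and the particular conventions (lowerCamelCase names, two-space indents, `<-` for assignment) are made up for illustration; they are one possible style, not a required one.

```r
# Hypothetical "old" code, kept as a comment for comparison:
#   x=c(2.1,3.4,NA,5.0);f=function(d){m=mean(d,na.rm=T);s=sd(d,na.rm=T);c(m,s)};f(x)

# Restyled version. Conventions assumed here (one possible style guide):
#   - lowerCamelCase names that say what the object holds
#   - two-space indents, spaces around operators and after commas
#   - `<-` for assignment, TRUE/FALSE spelled out
#   - opening brace ends its line, closing brace sits on its own line

measurements <- c(2.1, 3.4, NA, 5.0)  # made-up values with one missing entry

summarizeSample <- function(values) {
  # Return the mean and standard deviation of `values`, ignoring NAs.
  c(mean = mean(values, na.rm = TRUE),
    sd   = sd(values, na.rm = TRUE))
}

sampleSummary <- summarizeSample(measurements)
print(sampleSummary)
```

The restyled version computes the same two numbers as the one-liner. What changes is that the result is labeled, the missing-value handling is visible, and each object's purpose can be read from its name.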