Take Home Final Exam – Stat 506 Spring 2015

Due: 6 pm Thursday May 7 as a printed (from pdf) copy.
Total points: 80
1. You’ve spent a lot of time over the past 8-9 months learning how to carry out statistical
analysis using statistical software. Two things Kezia Manlove brought up in her talk
inspire this question:
• We should go back and clean up old code so that we’re proud of it.
• We should reflect back on what we’ve learned.
Your tasks:
(a) Check out style guides for R and read about styles others are using for R coding.
R Journal article on naming conventions:
http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf
Google suggestions:
http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html
Write yourself a style guide for R (and SAS if you wish) programming that covers at
least the following points, each intended to make your code more readable.
• How will you construct names (pick a style from one of my links, or elsewhere)?
• When will you indent R code? Will it be a tab or some number of spaces?
• Discuss use of white space. When will you separate lines?
• Use equals for assignment?
• Comments on when to comment? One hash or 2?
• When lines are grouped together with curly braces, where will opening and
closing braces appear?
(10 pts)
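To make the questions above concrete, here is a minimal sketch of what one set of answers might look like in practice. Every convention in it (lowerCamelCase names, two-space indents, `<-` for assignment, two hashes for block comments and one for inline comments, opening brace at the end of the line) is an assumption chosen for illustration, not a required or recommended answer. The function and variable names are hypothetical.

```r
## Summarize a numeric vector, ignoring missing values.
## Conventions shown are illustrative assumptions only:
## lowerCamelCase names, two-space indents, `<-` for assignment,
## spaces around operators, opening brace at the end of the line.
summarizeScores <- function(scores) {
  observedScores <- scores[!is.na(scores)]  # drop missing values first

  ## Nothing left to summarize: return missing values for both statistics.
  if (length(observedScores) == 0) {
    return(c(mean = NA_real_, sd = NA_real_))
  }

  c(mean = mean(observedScores), sd = sd(observedScores))
}

examScores <- c(72, 85, NA, 90)  # made-up data
scoreSummary <- summarizeScores(examScores)
```

Your own guide may reasonably make different choices on every one of these points. What matters is that the choices are stated and then applied consistently.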
(b) Look back over the assignments you’ve completed in Stat 505 and 506 to see which
ones were most challenging (at the time they were assigned) or seem likely to be
most useful in your future work. Pick two of the homeworks and explain why you
chose them.
(4 pts)
(c) For each, show
• the old code.
• the code redone using your style guide.
Improve flow, make it more efficient, and add comments to explain the logic and
how the code works. Make sure that variables and functions have meaningful
names. Keep track of and explain your improvements.
(16 pts)
(d) Reflect on what you have learned about coding this year.
• What were the biggest challenges?
• What resources were most useful when you needed help?
• What advice do you have for a new student starting this fall?
• What are your goals to learn next in the realm of stat computing?
I’m looking for two (on average) specific observations in each area.
(10 pts)
2. For this problem you will explore an R package which we have not used in class. This
is a very practical task because there are thousands of packages, so you will certainly
have to learn some of them on your own. Please keep track of the resources you use.
Install the mi (multiple imputation) R package and read the vignette, which you can
open with vignette("mi_vignette"). Also look at the file mi.pdf on CRAN.
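The install-and-read step can be sketched as follows. This is a minimal sketch, assuming a working CRAN mirror and an interactive R session. `openMiVignette` is a hypothetical helper name introduced here for illustration; the vignette name is the one given in the assignment.

```r
## Install mi only if it is not already available, then attach it
## and open the vignette named in the assignment.
## `openMiVignette` is a hypothetical helper, not part of the mi package.
openMiVignette <- function() {
  if (!requireNamespace("mi", quietly = TRUE)) {
    install.packages("mi")  # needs network access to a CRAN mirror
  }
  library(mi)
  vignette("mi_vignette", package = "mi")
}
```

Calling `openMiVignette()` at the console should display the vignette. Checking with `requireNamespace` first avoids reinstalling the package every time the script is run.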
(a) Run the code in the vignette, also available as miCode.R in the Rcode folder, to
see how it works. (Nothing to report on this part. Don’t include the pictures or
analysis, just see how it works.)
(b) What assumptions about randomness are made to use mi? What distributional
assumptions are made?
(6 pts)
(c) Load the CHAIN dataset in the same package and set up a missing_data.frame
for these data. What warning do you get? Show the image plot and discuss why
that warning appeared and how the plot illustrates the problem. How well does mi
guess the variable types (read the help page)? Improve the types as they did with
the nlsyV data.
(4 pts)
(d) The article they refer to is available at
http://www.math.montana.edu/~jimrc/classes/stat506/notes/chain-HIV-study.pdf
but we cannot work with survival times as they did. Instead, use mi to fit a linear
model to log_virus using all other variables as predictors. Explain the fitted model
and how the predictors are related to log_virus.
(6 pts)
(e) Compare the mi averaged output to that from a plain lm fit, making sure that you
use factors in the same way that mi does.
• Do coefficient estimates and/or SEs change?
(4 pts)
• Give an opinion and a justification: Was it important to account for missing
values with these data?
(4 pts)
• Is there an effect of treatment on log viral load?
(6 pts)
• Are you willing to say the effects are “causal”? To whom do they apply? (6 pts)
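One point worth keeping in mind for the comparison in (e): under R's default settings, lm drops every row that has a missing value in any model variable, so a plain lm fit is a complete-case analysis. The sketch below shows this with made-up toy data. It does not use the CHAIN data, and all names in it are hypothetical.

```r
## Toy data, invented for illustration; this is not the CHAIN data.
toyData <- data.frame(
  response = c(2.1, 3.4, NA, 5.0, 6.2, 7.1),
  dose     = c(1, 2, 3, NA, 5, 6),
  group    = factor(c("a", "b", "a", "b", "a", "b"))
)

## na.omit is already R's default na.action; it is written out here
## only to make the row dropping explicit.
plainFit <- lm(response ~ dose + group, data = toyData, na.action = na.omit)

rowsTotal   <- nrow(toyData)   # rows in the data frame
rowsUsed    <- nobs(plainFit)  # rows lm actually used
rowsDropped <- rowsTotal - rowsUsed
```

Comparing `rowsUsed` with `rowsTotal` for your own fit tells you how much data the plain analysis discards, which is useful background for the question about whether accounting for missing values mattered.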
(f) Which resources were most helpful in getting to know this package?
(4 pts)
Write up this exam in “report” format, with
• Part I Computer Coding
• Part II Multiple Imputation
Ignore the numbering scheme I used above, but do address each point in your own organized
way. As usual, I want a document which includes the code as an appendix.
Turn in a printed paper copy.