student_coursework_booklet_stats2

Medical Statistics Course two Advanced concepts 1
Student workbook
This workbook should be completed during the course and submitted by the specified date.
The course material can be found at: www.robin-beaumont.co.uk/virtualclassroom/stats/course2.html
Student name:
Date submitted:
Other information that may be relevant (optional):
Important information




You should paste the relevant output from the various analyses in the spaces provided. If necessary, add extra pages to the handout, keeping them in the same format as the original pages.
You will not usually paste all the output. This is particularly true with SPSS, where the output tends to be voluminous, so select only the relevant parts.
An easy way to paste your output is to use the Windows Accessories -> Snipping Tool. There are also freeware alternatives.
In one or two places you need to carry out manual calculations. In this case you can either paste the
details/results from the R console window or just type them in.
Robin Beaumont robin@organplayers.co.uk
1. Survival analysis
1. Work through the exercise on page 7 concerning histiocytic lymphoma and copy the Kaplan-Meier plot produced by R below. The dataset is available from the course webpage in .sav format as survival1a.sav (you can use R Commander to import it if you want).
Mark the median survival time for each group on the plot below (hint: in R you can use the abline(v=??) or abline(h=??) functions).
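The median survival time is conventionally read off a Kaplan-Meier curve as the earliest time at which the estimated survival probability falls to 0.5 or below. To illustrate that rule outside R, here is a minimal Python sketch of the product-limit estimator. It is not the R survival package the chapter uses, and the times and event indicators are made-up toy values, not the survival1a.sav data.

```python
# Minimal Kaplan-Meier (product-limit) estimator plus the median survival rule.
# The toy data at the bottom are hypothetical, NOT the survival1a.sav data.

def kaplan_meier(times, events):
    """Return a list of (time, estimated survival) at each observed event time.

    times  -- follow-up times
    events -- 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # group every observation (deaths and censorings) sharing this time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # survival drops by the proportion dying among those at risk
            surv *= 1 - deaths / at_risk
            steps.append((t, surv))
        # deaths and censored cases both leave the risk set after time t
        at_risk -= removed
    return steps

def median_survival(steps):
    """Earliest event time at which estimated survival is 0.5 or below."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5, so the median is not estimable

times = [2, 3, 3, 5, 8, 9, 12, 15]
events = [1, 1, 0, 1, 1, 0, 1, 1]
steps = kaplan_meier(times, events)
print(steps)
print(median_survival(steps))
```

Graphically this is the same as drawing a horizontal line at a survival probability of 0.5 and dropping a vertical line from the point where each group's curve first reaches it.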
2. Consider the lung_cancer_prentice1973.sav dataset. This is the lung cancer dataset from Prentice (1973), which gives the survival times of 40 lung cancer patients treated with one of two different treatments. The dependent variable is survival time. Treatment has two levels: 1 = standard treatment and 2 = experimental treatment. There are also several covariates, including medical condition, patient's age, tumour type, and time between diagnosis and treatment.
Provide an appropriate analysis below, using either R or SPSS.
3. Print out (for your own use) and work through the Chen 2004 article. Specifically, replicate the analysis on pages 254-256. The appropriate breast cancer survival dataset is available from the course website as breast_cancer_survival.sav.
Paste the relevant parts of the output (not all of it) below.
4. Carry out a survival analysis (in either R or SPSS) on one of the exercise datasets provided at the end of the survival analysis chapter. Present the important output below, along with your conclusions.
2. Simple logistic regression
5. Complete the exercise on page 8 of the simple logistic regression chapter:
The odds ratio is . . . . . This implies that the estimated odds of successful breast feeding at discharge improve by about . . . .% for each additional year of the mother's age.
For a mother, a five-year difference in age results in a value of . . . ., which implies that a change of five years in age will more than . . . [select: stay the same | double | triple | quadruple] the odds of exclusive breast feeding.
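In logistic regression the odds ratio for a one-unit increase in a predictor is exp(b), where b is the fitted log-odds coefficient. The odds ratio for a k-unit difference is therefore exp(k·b), which is the one-unit odds ratio raised to the power k. The Python sketch below shows only the arithmetic. It uses a hypothetical coefficient (b = 0.15), not the value from the chapter's breast feeding data, so you still need to use your own output to fill in the blanks.

```python
import math

def odds_ratio(b, k=1.0):
    """Odds ratio for a k-unit increase in a predictor with log-odds coefficient b."""
    return math.exp(k * b)

def percent_change_in_odds(b, k=1.0):
    """Percentage change in the odds for a k-unit increase in the predictor."""
    return (odds_ratio(b, k) - 1) * 100

b = 0.15  # hypothetical coefficient, NOT the value from the chapter
print(odds_ratio(b))              # odds ratio per one-year increase
print(percent_change_in_odds(b))  # % change in odds per year
print(odds_ratio(b, 5))           # odds ratio for a five-year difference
print(odds_ratio(b) ** 5)         # identical: exp(5b) == exp(b) ** 5
```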
6. Replicate the logistic regression exercise on pages 11-12 and paste the relevant output below:
7. Also paste the logistic curve from the above analysis (details on pages 13-14 of the chapter).
8. Carry out the exercise on page 16 of the chapter, that is, carry out a separate chi-square analysis using the counts from the classification table. How do the results compare with the logistic regression analysis? Is this what you expected?
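The chi-square analysis treats the classification table as an ordinary 2 x 2 contingency table of observed outcome by predicted outcome. Below is a minimal Python sketch of the Pearson chi-square statistic. It omits the Yates continuity correction, which SPSS also reports, and the counts in the usage lines are hypothetical. For 1 degree of freedom the upper-tail p value can be computed from the complementary error function, because a chi-square variable with 1 df is the square of a standard normal variable.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2 x 2 table

        [[a, b],
         [c, d]]

    Returns (chi_square, p_value) on 1 degree of freedom.
    Assumes every row and column total is greater than zero."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # expected count under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # P(chi-square_1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# hypothetical classification table: rows = observed, columns = predicted
print(chi_square_2x2(20, 10, 10, 20))
```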
9. Replicate the R analysis on page 15 of the chapter and paste the output below.
10. In the above output, what do the following terms correspond to in the SPSS output?
Null deviance =
Residual deviance =
11. Consider one of the datasets provided on the course website specifically for logistic regression (andy_f_eel.sav, andy_f_burnout.sav or miles_shevlin_p152_data6_22.sav, or alternatively adapt miles_shevlin_p169_data7_1.sav) and carry out an appropriate analysis in either R or SPSS. Paste ONLY the relevant output below.
12. Comment on the findings above - what are your main conclusions?
3. Multiple regression
13. Carry out the multiple regression described on page 9 of my multiple regression chapter. The dataset is available from the website as campbell_mir_p12.sav.
14. Create the scatterplot matrix described on page 10 of my multiple regression chapter, using the same dataset (campbell_mir_p12.sav).
15. Repeat the last two analyses using R (see my simple regression chapter and page 16 of the multiple regression chapter to get you started).
16. Continuing with the campbell_mir_p12.sav dataset, replicate the analysis on page 19 (that is, two independent variables, one continuous and one dichotomous, with an interaction). Paste the relevant output below:
17. Repeat the above analysis using R Commander (see page 22). Paste the relevant output below.
18. Write one or two sentences about the following methods used in multiple regression:
Enter:
Stepwise:
Remove:
Backward:
Forward:
19. Discuss the first multiple choice question written by Michael Campbell (page 28 of my chapter).
20. Discuss the second multiple choice question written by Michael Campbell (page 29 of my chapter).
21. Consider the daniel_random_p514_ed5_multreg_chd_risk.sav file. It contains data on 1000 patients: oxygen consumption (the dependent variable), systolic BP, total cholesterol, HDL cholesterol and triglycerides. In essence, it relates a measure of cardiac health to four continuous independent(?) variables.
Take random samples of the following sizes. To select a random sample in SPSS, go to the data window and use the menu option Data -> Select Cases -> Random sample of cases, then click the Sample button, etc.
Size 40
Size 100
Size 500
Size 1000
Discuss your results, comparing adjusted multiple R, betas and p values, and anything else you think noteworthy. You might want to produce a table to help with the comparison.
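When comparing samples of different sizes, keep in mind that adjusted R² penalises R² for the number of predictors relative to the sample size: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), for n cases and p predictors. The Python sketch below holds a hypothetical R² of 0.30 fixed and applies the formula with p = 4 for the four sample sizes. This shows how much of the difference between samples comes from the adjustment alone. In your real samples, R² itself will also vary from sample to sample.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R squared for n cases and p predictors (requires n > p + 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2 = 0.30  # hypothetical R squared, held fixed for illustration only
for n in (40, 100, 500, 1000):
    print(n, round(adjusted_r2(r2, n, 4), 4))
```

With small samples the penalty is noticeable. As n grows, adjusted R² approaches the unadjusted value.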
4. Repeated measures 1
22. Using the t_test_paired_long_format.sav file provided on the course website, produce the xyplot shown on page 7 (instructions start on page 6) and paste it below.
23. Paste only the R result of the exercise at the bottom of page 7:
24. Pages 9-10 provide the equivalent independent t test analysis of the data using the Mixed Models dialog boxes in SPSS. Did you find this task useful? Were there specific aspects that were difficult to understand?
25. Page 11 describes how a random effect parameter differs from a fixed parameter. In your own words, explain below what you feel the difference is.
26. What is the relationship between variance and standard deviation? (A simple mathematical equation will do)
27. What is the relationship between covariance and correlation? (A simple mathematical equation will do)
28. Looking at the xyplot produced earlier from the t_test_paired_long_format.sav file, we note that the paired t test gives a non-significant result (p = 0.08). The table of the data in wide format (page 12) shows that few subjects change much between the pre and post periods.
Using the t_test_paired_long_format.sav file as the basis, edit the data so that the training has a much more dramatic effect and produces a statistically significant result. Paste the relevant output below.
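The paired t statistic depends only on the within-subject differences d = post - pre. It is t = mean(d) / (sd(d)/√n), with n - 1 degrees of freedom. To obtain a significant result, the edited post values therefore need differences that are large relative to their spread. The pre/post values in this Python sketch are hypothetical and are not taken from t_test_paired_long_format.sav.

```python
import math
import statistics

def paired_t(pre, post):
    """Return (t, df) for a paired t test on matched pre/post measurements."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

pre = [10, 12, 9, 11, 13]              # hypothetical pre-training scores
small_change = [11, 12, 10, 12, 13]    # little change: modest t
large_change = [15, 17, 14, 16, 19]    # consistent large change: large t
print(paired_t(pre, small_change))
print(paired_t(pre, large_change))
```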
29. The -2 log likelihood (-2LL) values, the degrees of freedom, the chi-square value and the associated p value are all used when evaluating two competing models. Try to explain the process in the form of a flow diagram below.
30. The p value associated with the chi-square distribution can be interpreted graphically. Explain this interpretation, and also why it is interpreted as a one-tailed value.
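The usual likelihood ratio comparison can be sketched numerically, assuming the two models are nested. The chi-square value is the difference in -2LL between the simpler and the more complex model, and its degrees of freedom equal the difference in the number of estimated parameters. The p value is the area under the chi-square curve to the right of that value. The Python sketch below computes that upper-tail area for integer degrees of freedom using a standard recurrence. The -2LL values in the usage line are hypothetical.

```python
import math

def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1.

    Uses Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2)."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q = math.erfc(math.sqrt(x / 2))  # df = 1
        k = 1
    else:
        q = math.exp(-x / 2)             # df = 2
        k = 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def likelihood_ratio_test(m2ll_simple, m2ll_complex, df_diff):
    """Chi-square = drop in -2LL from the simpler to the more complex nested model."""
    chi2 = m2ll_simple - m2ll_complex
    return chi2, chi2_upper_tail(chi2, df_diff)

# hypothetical -2LL values for two nested models differing by 2 parameters
print(likelihood_ratio_test(820.0, 810.0, 2))
```

Only the upper tail is used because a better-fitting complex model can only lower -2LL. Evidence against the simpler model therefore shows up as a large positive chi-square value.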
31. Model covariance matrices for repeated measures can take several forms, allowing us to investigate various aspects of the data. The table below lists some of the most common types. Please complete it.
name | Variance at each time point | Correlation (covariance) between measurement times
Scaled identity | |
Compound symmetry | constant |
Diagonal | different at each time |
Unstructured | |
AR(1) = Autoregressive | | Correlation gets less as time points get further apart (i.e. t1, t2 = ρ but t1, t3 = ρ²)
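The AR(1) pattern can be made concrete. With a single correlation parameter ρ, the correlation between measurements taken at time points i and j is ρ raised to the number of steps between them, ρ^|i - j|. The Python sketch below builds that correlation matrix for a hypothetical ρ = 0.6 and four time points.

```python
def ar1_correlation(n_times, rho):
    """n_times x n_times AR(1) correlation matrix: corr(t_i, t_j) = rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]

for row in ar1_correlation(4, 0.6):  # hypothetical rho = 0.6, four time points
    print(["%.3f" % r for r in row])
```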
32. Explain what the standard error means in the above set of results.
5. Repeated measures 2
33. In a random intercept model, what does the random parameter represent? If you find it easier, you can explain with the help of a diagram.
34. In a random slope model, what does the random parameter represent? If you find it easier, you can explain with the help of a diagram.
35. When both random slope and random intercept parameters are modelled, what additional level of complexity needs to be considered?
36. For repeated measures, which type of model is considered to be essential?
37. The exercise on pages 1-12 is problematic because it fails to achieve convergence (with -2LL = 817.8). In contrast, as I mention, Twisk did manage to obtain convergence, with a -2LL of 810. Rerun the analysis, changing the statistics and estimation options to those shown below.
Paste below the relevant output from the new analysis.
Have the covariance estimates changed much?
What do you make of the associated p values for each of them?
38. Paste below the SPSS syntax from the above exercise.
6. Repeated measures 3
39. While working through the exercises in the Repeated measures 3 chapter, paste below the syntax requested in the exercise at the bottom of page 13.
40. This chapter has looked at a complex scenario that required a rather tortuous analysis process. Draw a flow chart of what you did below, adding comments to help you understand it.
41. What were the most difficult aspects of this chapter?
42. What aspects do you feel were most clearly explained?
7. Principal Component Analysis (PCA) and Factor Analysis
43. Depending on your background, carry out an exploratory factor analysis (EFA) using either the med_factor.sav or the psy_factor.sav file.
Paste only the relevant output below and explain the main findings beneath it.
44. It is believed that the grnt_fem.sav dataset has a two-factor structure. Investigate this, discuss which of the variables load on each factor and suggest names for each of the factors.
45. Explain the concept of reification with regard to factor analysis.
46. In no more than a single page, discuss the statement "Factor analysis is where art meets science unfortunately to the detriment of both".
47. Go to http://www.ats.ucla.edu/stat/spss/output/factor1.htm, download the dataset and carry out the analysis described on that page. Paste the most relevant of your findings below.
48. Provide a summary of the above findings.
The end