Medical Statistics Course Two: Advanced Concepts 1
Student workbook

This workbook should be completed during the course and submitted by the specified date.
The course material can be found at: www.robin-beaumont.co.uk/virtualclassroom/stats/course2.html

Student name:
Date submitted:
Other information that may be relevant (optional):

Important information

Paste the relevant output from the various analyses in the spaces provided; if necessary, add pages to the handout, keeping them in the same format as the original. You will not usually paste all the output; select only the relevant parts. This is particularly true of SPSS, whose output tends to be voluminous. An easy way to paste your output is to use the Windows Accessories -> Snipping Tool; freeware alternatives are also available. In one or two places you need to carry out manual calculations. There you can either paste the details/results from the R console window or just type them in.

Robin Beaumont robin@organplayers.co.uk page 1 of 17

1. Survival analysis

1. Work through the exercise on page 7 concerning histiocytic lymphoma and copy below the Kaplan-Meier plot produced by R. The dataset is available in sav format from the course webpage as survival1a.sav (you can use R Commander to import it if you want). Mark on the plot below the median survival time for each group (hint: in R you can use the abline(v=??) or abline(h=??) function).

2. Consider the lung_cancer_prentice1973.sav dataset. This is the lung cancer dataset from Prentice (1973), which records the survival times of 40 lung cancer patients under two different treatments. The dependent variable is the survival time. The treatment has two levels: 1 = standard treatment and 2 = experimental treatment. There are also several covariates, including medical condition, patient's age, tumour type, and time between diagnosis and treatment. Provide an appropriate analysis below using either R or SPSS.

3. Print out (for your own use) and work through the Chen 2004 article. Specifically, replicate the analysis on pages 254-256; the breast cancer survival dataset is available from the course website as breast_cancer_survival.sav. Paste below the relevant parts of the output (not all of it).

4. Carry out a survival analysis (in either R or SPSS) on one of the exercise datasets provided at the end of the survival analysis chapter. Present the important output below along with your conclusions.

2. Simple logistic regression

5. Complete the exercise on page 8 of the simple logistic regression chapter:
The odds ratio is . . . . . This implies that the estimated odds of successful breast feeding at discharge improve by about . . . .% for each additional year of the mother's age. For a mother, a five-year difference in age results in a value of . . . ., which implies that a change of five years in age will more than . . . [select: stay the same | double | triple | quadruple] the odds of exclusive breast feeding.

6. Replicate the logistic regression exercise on pages 11-12 and paste the relevant output below:

7. Also paste the logistic curve from the above analysis (details on pages 13-14 of the chapter).

8. Carry out the exercise on page 16 of the chapter; that is, carry out a separate chi-square analysis using the counts from the classification table. How do the results compare with the logistic regression analysis? Is this what you expected?

9. Replicate the R analysis on page 15 of the chapter and paste the output below.

10. In the above output, what do the following terms equate to in SPSS output?
Null deviance =
Residual deviance =

11. Consider one of the datasets provided on the course website specifically for logistic regression (andy_f_eel.sav, andy_f_burnout.sav, or miles_shevlin_p152_data6_22.sav; alternatively, adapt miles_shevlin_p169_data7_1.sav) and carry out an appropriate analysis in either R or SPSS. Paste ONLY the relevant output below.

12. Comment on the findings above. What are your main conclusions?

3. Multiple regression

13. Carry out the multiple regression as described on page 9 of my multiple regression chapter. The dataset is available as campbell_mir_p12.sav from the website.

14. Create the scatterplot matrix as described on page 10 of my multiple regression chapter. The dataset is available as campbell_mir_p12.sav.

15. Repeat the last two analyses using R (see my simple regression chapter and page 16 of the multiple regression chapter to get you started).

16. Continuing with the campbell_mir_p12.sav dataset, replicate the analysis on page 19 (that is, two independent variables, one continuous and one dichotomous, with interaction). Paste the relevant output below:

17. Repeat the above analysis using R Commander (see page 22). Paste the relevant output below.

18. Write one or two sentences about each of the following methods used in multiple regression:
Enter:
Stepwise:
Remove:
Backward:
Forward:

19. Discuss the first multiple choice question written by Michael Campbell (page 28 of my chapter).

20. Discuss the second multiple choice question written by Michael Campbell (page 29 of my chapter).

21. Consider the daniel_random_p514_ed5_multreg_chd_risk.sav file. It contains data for 1000 patients: oxygen consumption (the dependent variable), systolic BP, total cholesterol, HDL cholesterol and triglycerides. In essence, it measures cardiac health against four continuous independent(?) variables. Take random samples of the following sizes (to select a random sample in SPSS, in the data window use the menu option Data -> Select Cases -> Random sample of cases, then click the button, etc.):
Size 40
Size 100
Size 500
Size 1000
Discuss your results, comparing adjusted multiple R, betas and p values, and anything else you think worthy of note. You might want to produce some type of table to help the comparison.

4. Repeated measures 1

22. Using the t_test_paired_long_format.sav file provided on the course website, produce the xyplot shown on page 7 (instructions start on page 6) and paste it below.

23. Paste only the R result of the exercise at the bottom of page 7:

24. Pages 9-10 provide the equivalent independent t test analysis of the data using the Mixed Models dialog boxes in SPSS. Did you find this task useful? Were there specific aspects that were difficult to understand?

25. Page 11 describes how a random effect parameter differs from a fixed parameter. In your own words, explain below what you feel the difference is.

26. What is the relationship between variance and standard deviation? (A simple mathematical equation will do.)

27. What is the relationship between covariance and correlation? (A simple mathematical equation will do.)

28. Considering the xyplot produced earlier from the t_test_paired_long_format.sav file, we note that the paired t test gives a non-significant result (p = 0.08). The table of the data in wide format (page 12) shows that few values change much between the pre and post periods. Edit the data, using the t_test_paired_long_format.sav file as the basis, so that the training produces a much more dramatic effect and hence a statistically significant result. Paste the relevant output below.

29. The -2 log likelihood values, the degrees of freedom, the chi-square value and the associated p value are all used when evaluating two competing models. Try to explain the process in the form of a flow diagram below.

30. The p value associated with the chi-square distribution can be interpreted graphically. Explain this interpretation, and also why it is interpreted as a one-tailed value.

31. Model covariance matrices for repeated measures can take several forms, allowing us to investigate various aspects of the data. The table below lists some of the most common types. Please complete it.

Name | Variance at each time point | Correlation (covariance) between measurement times
Scaled identity | |
Compound symmetry | constant |
Diagonal | different at each time |
Unstructured | |
AR(1) = Autoregressive | | Correlation gets less as time points get further apart (i.e. t1, t2 = ρ but t1, t3 = ρ²)

32. Explain what the standard error means in the above set of results.

5. Repeated measures 2

33. In a random intercept model, what does the random parameter represent? If you find it easier, you can explain with the help of a diagram.

34. In a random slope model, what does the random parameter represent? If you find it easier, you can explain with the help of a diagram.

35. When both random slope and random intercept parameters are modelled, what additional level of complexity needs to be considered?

36. Concerning repeated measures, which type of model is considered to be essential?

37. The exercise on pages 1-12 is problematic as it fails to achieve convergence (with a -2LL = 817.8). In contrast, I mention that Twist did manage to obtain convergence with a -2LL of 810. Rerun the analysis, changing the statistics and estimation options to those shown below. Paste below the relevant output from the new analysis. Have the covariance estimates changed much? What do you make of the associated p values for each of them?

38. Paste below the SPSS syntax from the above exercise.

6. Repeated measures 3

39. While working through the various exercises in the Repeated measures 3 chapter, please paste below the syntax requested in the exercise at the bottom of page 13.

40. This chapter has looked at a complex scenario which required a rather tortuous analysis process. Draw a flow chart of what you did below, adding comments to help you understand it.

41. What were the most difficult aspects of this chapter?

42. What aspects do you feel were most clearly explained?

7. Principal Component Analysis (PCA) and Factor Analysis

43. Depending upon your background, carry out an Exploratory Factor Analysis (EFA) using the med_factor.sav or psy_factor.sav file. Paste below only the relevant output and, beneath it, explain the main findings.

44. It is believed that the grnt_fem.sav dataset has a two-factor structure. Investigate this, discuss which of the variables load on each factor, and suggest names for each of the factors.

45. Explain the concept of reification with regard to factor analysis.

46. In no more than a single page, discuss the statement "Factor analysis is where art meets science unfortunately to the detriment of both".

47. Go to http://www.ats.ucla.edu/stat/spss/output/factor1.htm, download the dataset and carry out the analysis as described on the page. Please paste the most relevant of your findings below.

48. Provide a summary of the above findings.

The end