SAN DIEGO STATE UNIVERSITY
Graduate School of Public Health
Division of Epidemiology and Biostatistics

PH 628 Applications of Multivariate Statistics in Public Health
Fall 2013, Section 1

Day: Mon Wed
Time: 2:00pm – 3:15pm
Location: HH 210
Units: 3
Schedule No.: 22321

Instructor: John Alcaraz, Ph.D.
E-mail: jalcaraz@mail.sdsu.edu
Office location: Hardy Tower 231
Office hours: Mon Wed 11:30am – 1:30pm; Thursday 1:00pm – 3:00pm

Blackboard: During the semester, course-related materials such as announcements, lecture notes, and homework solutions will be posted on Blackboard. Please check regularly for new materials.

Required texts:
– Kleinbaum, Kupper, Muller, Nizam: Applied Regression Analysis and Multivariable Methods, 4th edition. [KKM]
– Afifi, May, Clark: Practical Multivariate Analysis, 5th edition. [AMC]
– Slymen: “PH 628: Applications of Multivariate Statistics in Public Health” (Customized Materials). [R]
– Slymen: “PH 628: Annotated SAS Output for Public Health 628” (Customized Materials).
Note: The annotated output should be brought to every class meeting.

Grading System*:
Exercises: 25%
Project**: 35%
Final Exam: 40%

93 – 100 = A
90 – 93 = A–
87 – 90 = B+
83 – 87 = B
80 – 83 = B–
77 – 80 = C+
73 – 77 = C
70 – 73 = C–
67 – 70 = D+
63 – 67 = D
60 – 63 = D–
0 – 60 = F

* All coursework will require using SAS on the PC.
** Paper describing an in-depth analysis you perform using one or more multivariate methods.

Dates for Coursework (subject to change):

                                      Date Assigned   Date Due
Exercise 1                            Sept 11         Sept 25
Exercise 2                            Sept 25         Oct 9
Exercise 3                            Oct 2           Oct 16
Exercise 4                            Oct 16          Oct 30
Exercise 5                            Oct 30          Nov 13
Exercise 6                            Nov 6           Nov 20
Project                               Sept 16         Nov 25
Final Exam (open-book, open-notes)    Nov 20          Dec 11

All submitted coursework must be printed, computer-written documents. Handwritten coursework is not acceptable. Do not submit any SAS output.

Prerequisites:
1) PH 627 or equivalent course work in multiple regression, analysis of variance and logistic regression.
2) Completion of the SAS computer class or equivalent knowledge of SAS.

Learning Objectives: In this course, students will learn the appropriate use of multivariate methods for the analysis of health-related data with multiple dependent and independent measures where multivariate assessment and/or variable reduction are the primary goals. Students will become familiar with computer procedures in SAS commonly used in multivariate analyses. Using SAS, students will be able to perform the following statistical procedures:

1. Linear regression diagnostics.
   Students will be able to check for violations of the assumptions of multiple linear regression, to identify influential data points, and to check for collinearity.
2. Logistic regression diagnostics.
   Students will be able to assess goodness-of-fit of logistic regression models, to identify influential data points, and to check for collinearity.
3. Principal components analysis.
   Students will be able to construct a set of components which summarize the interrelationships among a set of variables. Students will be able to assess whether these components may be used in place of the original variables in other analyses.
4. Factor analysis.
   By constructing a set of factors, students will be able to verify whether hypothesized or expected interrelationships appear among a set of variables.
5. Cluster analysis.
   Students will be able to group together subjects (e.g., patients) according to similar values on measured variables, where such groupings are not specified in advance. This is primarily an exploratory technique.
6. Discriminant analysis.
   Students will be able to construct a rule based on a set of variables which optimizes the classification of subjects among two or more specified groups (e.g., with disease, without disease). Students will be able to assess the utility of the rule for classification.
7. Polychotomous logistic regression.
   Students will be able to test the association between a set of independent variables and a categorical dependent variable that has more than two categories.
8. Ordinal regression.
   Students will be able to test the association between a set of independent variables and a categorical dependent variable that has more than two ordered categories.
9. Analysis of longitudinal data.
   Students will be able to analyze studies in which subjects are followed over time and repeated measurements of the outcome variables are taken on each subject.
10. Poisson regression.
   Students will be able to test the relationship between a set of independent variables and a dependent variable which counts the number of times a particular event occurs.
11. Additionally, students will learn to work independently or with minimal supervision to formulate and pursue an applied public health research question, and to communicate the results in writing.

Attendance: Although attending every class meeting is not required, it is strongly encouraged if you wish to get the most value out of this course. If because of severe circumstances (such as illness, injury, death in the family) you are absent on a day when an assignment is due, you must submit it to me via email no later than one week after the due date, along with documentation explaining your absence. Students for whom a due date falls on a date of planned absence (e.g., religious observance) must submit their assignment to me via email by 1:00pm on the due date or earlier. Email submissions must be in Word or PDF format.

Student Conduct and Grievances: SDSU is committed to maintaining a safe and healthy living and learning environment for students, faculty and staff. Section 41301, Standards for Student Conduct (at http://csrr.sdsu.edu/conduct1.html), and Sections 41302-41304 of the University Policies regarding student conduct should be reviewed.
If a student believes that a professor’s treatment is grossly unfair or that a professor’s behavior is clearly unprofessional, the student may bring the complaint to the proper university authorities and official reviewing bodies. See University Policies on Student Grievances.

Academic Ethics: SDSU has a strict code of ethical conduct which students are expected to follow. See http://csrr.sdsu.edu/conduct1.html for details. In particular, cheating on the exam will not be tolerated. You may not work together on the exam, may not copy answers from other students, and may not allow other students to copy your answers. Anyone caught cheating will face disciplinary action.

Nondiscrimination Policy: SDSU complies with the requirements of Title VI and Title VII of the Civil Rights Act of 1964, as well as other applicable federal and state laws prohibiting discrimination. No person shall, on the basis of race, color, or national origin be excluded from participation in, be denied the benefits of, or be otherwise subjected to discrimination in any program of the California State University. SDSU does not discriminate on the basis of sex, gender, or sexual orientation in the educational programs or activities it conducts. SDSU does not discriminate on the basis of disability in admission or access to, or treatment or employment in, its programs and activities. Students should direct inquiries concerning SDSU’s compliance with all relevant disability laws to the Director of Student Disability Services (SDS), Calpulli Center, Suite 3101, or call 619-594-6473 (TDD: 619-594-2929). More details on SDSU’s Nondiscrimination Policy can be found in the SDSU General Catalog, University Policies.

Students with Disabilities: Students with disabilities should discuss with me privately any specific accommodations for which they have received authorization. Authorization may be obtained by contacting Student Disability Services at 619-594-6473 (Calpulli Center, Suite 3101).
Please obtain authorization before making an appointment to see me. More information can be found at http://www.sa.sdsu.edu/sds/.

Course Outline for PH 628*

Topic [Related Book Chapters]

1. Review of multiple linear & logistic regressions
2. Regression diagnostics and goodness-of-fit in multiple linear and logistic regression [KKM 12 (3rd ed.) or KKM 14 (4th ed.)]
   a. Residual analysis
   b. Detecting outliers
   c. Detecting collinearity
   d. Goodness-of-fit statistics
3. Principal components analysis (PCA) [AMC 14, R]
   a. Basic properties & geometric interpretation
   b. Using PCA to detect outliers and collinearity
4. Factor analysis (briefly) [AMC 15]
5. Cluster analysis [AMC 16]
6. Discriminant analysis (DA) [AMC 11, R]
   a. Basic properties
   b. Two-group DA
   c. DA for more than 2 groups
   d. Estimation of error rates & posterior probabilities
   e. Relationship to multivariate ANOVA
7. Polychotomous logistic regression [R]
   a. Basic properties
   b. Estimation and hypothesis testing
   c. Modeling and examples
8. Ordinal regression [R]
   a. Basic properties
   b. Modeling and examples
9. Analysis of longitudinal data [R]
   a. Introduction and examples
   b. Mixed effects models
   c. Modeling the covariance structure
   d. Special issues: missing data, attrition
10. Poisson regression (briefly) [R]

* Lecture notes for all topics will be available on Blackboard.