SAN DIEGO STATE UNIVERSITY Graduate School of Public Health

Division of Epidemiology and Biostatistics
PH 628 Applications of Multivariate Statistics in Public Health
Fall 2013
Section: 1
Day: Mon Wed
Time: 2:00p – 3:15p
Location: HH 210
Schedule No.: 22321
Units: 3
Instructor: John Alcaraz, Ph.D.
E-mail: jalcaraz@mail.sdsu.edu
Office location: Hardy Tower 231
Office hours: Mon Wed 11:30am – 1:30pm; Thursday 1:00pm – 3:00pm
Blackboard:
During the semester, course-related materials such as announcements, lecture notes, and
homework solutions will be posted on Blackboard. Please check regularly for new materials.
Required texts:
– Kleinbaum, Kupper, Muller, Nizam: Applied Regression Analysis and Multivariable Methods,
4th edition. [KKM]
– Afifi, May, Clark: Practical Multivariate Analysis, 5th edition. [AMC]
– Slymen: “PH 628: Applications of Multivariate Statistics in Public Health” (Customized
Materials). [R]
– Slymen: “PH 628: Annotated SAS Output for Public Health 628” (Customized Materials).
Note: The annotated output should be brought to every class meeting.
Grading System*:
Exercises: 25%
Project**: 35%
Final Exam: 40%
93 – 100 = A
90 – 93 = A–
87 – 90 = B+
83 – 87 = B
80 – 83 = B–
77 – 80 = C+
73 – 77 = C
70 – 73 = C–
67 – 70 = D+
63 – 67 = D
60 – 63 = D–
0 – 60 = F
* All coursework will require using SAS on the PC.
** Paper describing an in-depth analysis you perform using one or more multivariate methods.
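As an illustration of how the weights above combine, the following is a minimal Python sketch, not an official grade calculator. It assumes each component is scored on a 0 to 100 scale and that a score falling exactly on a shared boundary (e.g., 93) receives the higher letter; the syllabus does not state either point, and the function names are hypothetical.

```python
# Weights from the Grading System, kept as integer percentages
# so the arithmetic stays exact.
WEIGHTS = {"exercises": 25, "project": 35, "final_exam": 40}

# Lower cutoffs for each letter grade, highest first.
# ASSUMPTION: a score exactly on a boundary gets the higher letter.
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def weighted_score(exercises, project, final_exam):
    """Combine component scores (each assumed to be 0-100) into a course score."""
    total = (WEIGHTS["exercises"] * exercises
             + WEIGHTS["project"] * project
             + WEIGHTS["final_exam"] * final_exam)
    return total / 100


def letter_grade(score):
    """Map a course score to a letter using the cutoffs above."""
    for lower, letter in CUTOFFS:
        if score >= lower:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical component scores, for illustration only.
    score = weighted_score(exercises=80, project=90, final_exam=95)
    print(score, letter_grade(score))
```

With the hypothetical scores shown, the exercises contribute 20 points, the project 31.5, and the final exam 38, for a course score of 89.5.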
Dates for Coursework (subject to change):
                                      Date Assigned   Date Due
Exercise 1                            Sept 11         Sept 25
Exercise 2                            Sept 25         Oct 9
Exercise 3                            Oct 2           Oct 16
Exercise 4                            Oct 16          Oct 30
Exercise 5                            Oct 30          Nov 13
Exercise 6                            Nov 6           Nov 20
Project                               Sept 16         Nov 25
Final Exam (open-book, open-notes)    Nov 20          Dec 11
All submitted coursework must be printed, computer-written documents. Handwritten
coursework is not acceptable. Do not submit any SAS output.
Prerequisites:
1) PH 627 or equivalent course work in multiple regression, analysis of variance and logistic
regression.
2) Completion of the SAS computer class or equivalent knowledge of SAS.
Learning Objectives:
In this course, students will learn the appropriate use of multivariate methods for the analysis of
health-related data with multiple dependent and independent measures where multivariate
assessment and/or variable reduction are the primary goals. Students will become familiar with
computer procedures in SAS commonly used in multivariate analyses. Using SAS, students will
be able to perform the following statistical procedures:
1. Linear regression diagnostics. Students will be able to check for violations of the assumptions
of multiple linear regression, to identify influential data points, and to check for collinearity.
2. Logistic regression diagnostics. Students will be able to assess goodness-of-fit of logistic
regression models, to identify influential data points, and to check for collinearity.
3. Principal components analysis. Students will be able to construct a set of components which
summarize the interrelationships among a set of variables. Students will be able to assess whether
these components may be used in place of the original variables in other analyses.
4. Factor analysis. By constructing a set of factors, students will be able to verify whether
hypothesized or expected interrelationships appear among a set of variables.
5. Cluster analysis. Students will be able to group together subjects (e.g., patients) according to
similar values on measured variables, where such groupings are not specified in advance. This is
primarily an exploratory technique.
6. Discriminant analysis. Students will be able to construct a rule based on a set of variables
which optimizes the classification of subjects among two or more specified groups (e.g., with
disease, without disease). Students will be able to assess the utility of the rule for classification.
7. Polychotomous logistic regression. Students will be able to test the association between a set of
independent variables and a categorical dependent variable that has more than two categories.
8. Ordinal regression. Students will be able to test the association between a set of independent
variables and a categorical dependent variable that has more than two ordered categories.
9. Analysis of longitudinal data. Students will be able to analyze studies in which subjects are
followed over time and repeated measurements of the outcome variables are taken on each
subject.
10. Poisson regression. Students will be able to test the relationship between a set of independent
variables and a dependent variable which counts the number of times a particular event occurs.
11. Additionally, students will learn to work independently or with minimal supervision to
formulate and pursue an applied public health research question, and to communicate the results
in writing.
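To make the collinearity check in objective 1 concrete, here is a minimal sketch in Python. All coursework in this class uses SAS, so this is only an illustration of the idea, not course code. It assumes a model with exactly two predictors, where the variance inflation factor (VIF) reduces to 1 / (1 - r^2), with r the Pearson correlation between the two predictors. The data and function names are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor linear model.

    With only two predictors, the R^2 from regressing one predictor on
    the other equals r^2, so VIF = 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    # Made-up predictor values, for illustration only.
    x1 = [1, 2, 3, 4, 5]
    x2 = [2, 1, 4, 3, 5]
    print(round(pearson_r(x1, x2), 3), round(vif_two_predictors(x1, x2), 3))
```

Larger VIF values indicate stronger collinearity. A commonly quoted rule of thumb treats values above about 10 as a warning sign, but that threshold is a convention rather than a formal test. With more than two predictors, each VIF comes from regressing that predictor on all of the others.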
Attendance: Although attending every class meeting is not required, it is strongly encouraged if
you wish to get the most value out of this course.
If, because of severe circumstances (such as illness, injury, or a death in the family), you are
absent on a day when an assignment is due, you must submit it to me via email no later than one
week after the due date, along with documentation explaining your absence. Students for whom a
due date falls on a date of planned absence (e.g., religious observance) must submit their
assignment to me via email by 1:00pm on the due date or earlier. Email submissions must be in
Word or PDF format.
Student Conduct and Grievances:
SDSU is committed to maintaining a safe and healthy living and learning environment for
students, faculty and staff. Section 41301, Standards for Student Conduct (at
http://csrr.sdsu.edu/conduct1.html ), and Sections 41302-41304 of the University Policies
regarding student conduct should be reviewed.
If a student believes that a professor’s treatment is grossly unfair or that a professor’s behavior is
clearly unprofessional, the student may bring the complaint to the proper university authorities
and official reviewing bodies. See University Policies on Student Grievances.
Academic Ethics:
SDSU has a strict code of ethical conduct which students are expected to follow. See
http://csrr.sdsu.edu/conduct1.html for details. In particular, cheating on the exam will not be
tolerated. You may not work together on the exam, may not copy answers from other students,
and may not allow other students to copy your answers. Anyone caught cheating will face
disciplinary action.
Nondiscrimination Policy:
SDSU complies with the requirements of Title VI and Title VII of the Civil Rights Act of 1964,
as well as other applicable federal and state laws prohibiting discrimination. No person shall, on
the basis of race, color, or national origin be excluded from participation in, be denied the
benefits of, or be otherwise subjected to discrimination in any program of the California State
University.
SDSU does not discriminate on the basis of sex, gender, or sexual orientation in the educational
programs or activities it conducts.
SDSU does not discriminate on the basis of disability in admission or access to, or treatment or
employment in, its programs and activities. Students should direct inquiries concerning SDSU’s
compliance with all relevant disability laws to the Director of Student Disability Services (SDS),
Calpulli Center, Suite 3101, or call 619-594-6473 (TDD: 619-594-2929).
More details on SDSU’s Nondiscrimination Policy can be found in the SDSU General Catalog,
University Policies.
Students with Disabilities:
Students with disabilities should discuss with me privately any specific accommodations for
which they have received authorization. Authorization may be obtained by contacting Student
Disability Services at 619-594-6473 (Calpulli Center, Suite 3101). Please obtain authorization
before making an appointment to see me. More information can be found at
http://www.sa.sdsu.edu/sds/.
Course Outline for PH 628 *
Topic [Related Book Chapters]
1. Review of multiple linear & logistic regressions
2. Regression diagnostics and goodness-of-fit in multiple linear and logistic regression
[KKM 12 (3rd ed.) or KKM 14 (4th ed.)]
   a. Residual analysis
   b. Detecting outliers
   c. Detecting collinearity
   d. Goodness-of-fit statistics
3. Principal components analysis (PCA) [AMC 14, R]
   a. Basic properties & geometric interpretation
   b. Using PCA to detect outliers and collinearity
4. Factor analysis (briefly) [AMC 15]
5. Cluster analysis [AMC 16]
6. Discriminant analysis (DA) [AMC 11, R]
   a. Basic properties
   b. Two-group DA
   c. DA for more than 2 groups
   d. Estimation of error rates & posterior probabilities
   e. Relationship to multivariate ANOVA
7. Polychotomous logistic regression [R]
   a. Basic properties
   b. Estimation and hypothesis testing
   c. Modeling and examples
8. Ordinal regression [R]
   a. Basic properties
   b. Modeling and examples
9. Analysis of longitudinal data [R]
   a. Introduction and examples
   b. Mixed effects models
   c. Modeling the covariance structure
   d. Special issues: missing data, attrition
10. Poisson regression (briefly) [R]
* Lecture notes for all topics will be available on Blackboard.