Syllabus 19.577 Spring 2012

Biostatistics for Health Data 19.577 - Spring 2012
Dr. Manuel Cifuentes
Wednesday, 6 to 9 pm
Credits: 3
Location: Kitson 208
Schedule: Wednesday, 6:00 to 8:50 p.m.
Office: Kitson Hall 200-Q. Office hours: Wednesdays, 4 to 6 p.m., and by request.
Telephones: 978-934-3271 (Cifuentes) – (603) 274-9419 (Cell phone for emergencies)
Email: Manuel_Cifuentes@uml.edu (please begin the subject line of any email about the course with
19.577 Spring 2012).
Course Description
This is a graduate-level course in basic statistical techniques to be used in, but not limited to,
health research. The purpose of this course is to train students in developing a consistent statistical
approach to solve quantitative research problems of mild complexity. This approach begins by
understanding the statistical logic (not necessarily the mathematics) underlying quantitative research
questions, continues with being skillful in using statistical software to perform the mathematical work,
and concludes with knowing how to interpret and explain the numerical results. Using type of variable
measurement as the main criterion, the following statistical models will be studied: linear regression,
ANOVA, Chi square, logistic regression, non-parametric methods, and general linear models. Emphasis
will be placed on interpretation of regression coefficients, odds ratios, and intercept. There will be an
introduction to effect modification and confounding using multivariate regression.
As with many other areas of knowledge, statistics is subject to the use-it-or-lose-it law. This law
has deep neurophysiological correlates that can be used as excuses to forget almost everything about
statistics. Therefore, rather than ask you to remember a formula or a computational process, the course
will ask you to organize yourself in a way that will allow you, at a later time, to relearn what you have
forgotten about statistics and to learn interesting new statistical tools.
Class organization
The class meets for three hours once a week. In case of severe weather or other disasters that
might result in cancellation of class, please check your email for a message from Manuel
Cifuentes and/or call him at the phone number above; a voicemail message will provide
information.
To achieve the course goals, we will use the book, SPSS, lectures, and discussion together.
Be prepared to show how much you do not know and to be fully respectful of others’ need and desire to
learn. Between classes you will read the book and use SPSS, so that you come fully prepared for
class. Plan on 2 to 3 hours of study for each hour of class, which means you may need as much as
9 hours per week working by yourself or with study partners before and/or after classes.
Our classes, except the first one, will be divided into two blocks. One week of readings and
exercises will be used for the first block of each class. The first block of every class will be group work.
The instructor will form different groups of students every class. Each group will review and
grade the other group members’ homework due that day. A perfect homework will have a score of 7. A
perfect homework handed in after 6:05 PM will have a score of 6. Any minor problem or mistake will deduct
one point; a major problem will deduct two points. There will be no half points. Participation and
questioning are encouraged. You are also encouraged to appeal for a better grade, after you have been
graded in class, using the best of your statistical knowledge. It is absolutely forbidden to talk about this
grading activity outside of class time. We are going to be as candid as we can with each other. Therefore,
privacy and confidentiality must be protected. We will respect each other so that we can learn from our mistakes.
During the second block, usually after a 10-minute break, the instructor will introduce the new topic and
perform a lecture/demonstration. Some concepts will be explained to prepare you for a better
understanding of the reading material. The general logic and the statistical assumptions of the statistical
model will be discussed.
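The in-class homework scoring rule described above can be written as a short calculation. The sketch below uses Python rather than SPSS and is only an illustration: the function name homework_score, its parameters, and the floor of zero are assumptions, not part of the syllabus.

```python
def homework_score(minor_problems: int, major_problems: int, late: bool = False) -> int:
    """Score one homework under the rule stated in this syllabus."""
    # A perfect homework scores 7; one handed in after 6:05 PM starts at 6.
    base = 6 if late else 7
    # Each minor problem costs one point, each major problem two (no half points).
    score = base - minor_problems - 2 * major_problems
    # Assumption: the score cannot go below zero (the syllabus does not say).
    return max(score, 0)


if __name__ == "__main__":
    print(homework_score(0, 0))             # on time, no problems
    print(homework_score(1, 1, late=True))  # late, one minor and one major problem
```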
Student Responsibilities
Attendance at all classes is expected. If you will be absent, please notify the instructors in
advance. Homework assignments are due PRINTED or HANDWRITTEN by 6:00 PM on the dates noted on
the class schedule. Readings are assigned and are strongly recommended. Students are expected to
learn the assigned textbook material, even if it is not explicitly presented in class.
“All students are advised that there is a University policy regarding dishonesty and cheating. It is
the students’ responsibility to familiarize themselves with this policy.”
Students should notify the instructors in advance about any potential conflicts between their
religious observance and course due dates/examinations. When a conflict occurs, the instructors and
student will work out, in advance, a reasonable alternative.
Instructor’s Responsibilities
In addition to organizing and presenting material, preparing and re-grading homework and
preparing and grading exams, I am available for help sessions with students. If it is not possible to come
during office hours, please call or email for an appointment. Email questions are encouraged, and I will
get back to you as soon as I can. If you will be making a special trip to campus to see me, please phone
ahead to reserve time. I find that it is often useful to respond to email course content questions by
sending the response to the entire class. This way you can all benefit from the inquiries of other class
members. Of course, if the question is a personal one, then I will respond only to the questioner.
Software and Textbooks
Required:
We have chosen a friendly statistical package and a friendly textbook as companions for this
semester. The software is SPSS, which is widely available on the UMass Lowell campus, and the book is
Discovering Statistics Using SPSS by Andy Field (SAGE, 3rd edition, hardcover, 822 pages).
Supplementary:
A Handbook of Statistical Analyses using SPSS by Sabine Landau and Brian S. Everitt (PDF
provided by the instructor)
Grading
Attendance, class participation: 25%
HW: 25%
Midterm: 25%
Final: 25%
Graduate students are required to maintain a B average at all times, and in addition cannot
receive more than two grades of B/C or C. Please pay attention to the drop dates, and talk to me before
these dates are reached if you have any doubts about how you are doing in the class. It may be useful
for you to know our own interpretations of letter grades, so that you know what to expect by way of
grading:
A: excellent. Student has mastered the material completely. There is essentially no improvement
possible.
A-: very good. Student has mastered the material to a high degree. Only minor mistakes were
made, or minor room for improvement is evident.
B+/B: good. Student has mastered the material to an acceptable degree. There is room for
improvement, but all the essential mastery has been demonstrated.
B-/C: poor. Student has mastered only some of the material, and there are serious gaps in
demonstrated mastery. This is not an acceptable level of performance for a continuing
graduate student.
F: unacceptable. Student has not shown even a minimum level of learning.
Course goal
Using SPSS, the student should be able to adequately check assumptions, perform and interpret
linear regression, logistic regression, ANOVA, Chi square, general linear, and non-parametric models.
The student will master the interpretation of intercepts, regression coefficients, and odds ratios and will
be able to control for confounding and effect modifiers.
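Since the interpretation of odds ratios is a central goal, a small worked calculation may help fix the idea. This is a minimal sketch in Python (the course itself uses SPSS); the function name odds_ratio and the counts are made up for illustration.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table.

                   outcome yes   outcome no
    exposed             a            b
    not exposed         c            d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple formula")
    # Odds of the outcome among the exposed, divided by the odds among the unexposed.
    return (a / b) / (c / d)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    or_value = odds_ratio(30, 70, 15, 85)
    print(round(or_value, 2))
    # In a logistic regression with one binary predictor, the regression
    # coefficient equals the natural log of this odds ratio.
    print(round(math.log(or_value), 3))
```

An odds ratio above 1 means the odds of the outcome are higher in the exposed group; an odds ratio of 1 means no association.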
Course Objectives
At the end of the course, students will be able to:
1. Understand how to classify variables according to their measurement level (continuous,
categorical)
2. Understand the concept of variable role: (predictor, independent variable, cause) and (outcome,
dependent variable, effect)
3. Perform basic data cleaning and data management using SPSS
4. Find the best statistical model to study association between two variables based on
measurement level and variable role (previous point)
5. Use SPSS to check the assumptions of, and run, correlations and simple linear regression
6. Use SPSS to check residuals in linear regression
7. Understand how to interpret correlation and linear regression results
8. Use SPSS to check the assumptions of, and run, one way ANOVA
9. Understand how to interpret one way ANOVA results
10. Use SPSS to check the assumptions of, and run, Chi Square
11. Understand how to interpret Chi Square results
12. Use SPSS to check the assumptions of, and run, logistic regression
13. Understand how to interpret logistic regression results
14. Use SPSS to check the assumptions of, and run, general linear models
15. Understand how to interpret general linear model results
16. Perform basic non-parametric tests using SPSS
17. Control for confounding and determine the presence of effect modification
18. Interpret intercepts
19. Interpret regression coefficients
20. Interpret odds ratios
21. Do not be afraid of statistics
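Objectives 1, 2, and 4 amount to a lookup: once the predictor and the outcome are classified by measurement level, a starting model follows. The sketch below, in Python, encodes a conventional textbook pairing. The pairing itself, the function name suggest_model, and the level labels are assumptions of this sketch, not rules quoted from the course.

```python
# Conventional textbook pairing of measurement levels with the models named
# in this syllabus. The pairing is an assumption of this sketch, not a rule
# quoted from the course.
MODEL_BY_LEVELS = {
    ("continuous", "continuous"): "correlation or simple linear regression",
    ("categorical", "continuous"): "one way ANOVA",
    ("categorical", "categorical"): "Chi square",
    ("continuous", "categorical"): "logistic regression (binary outcome)",
}


def suggest_model(predictor_level: str, outcome_level: str) -> str:
    """Return a starting-point model for one predictor and one outcome."""
    key = (predictor_level.lower(), outcome_level.lower())
    if key not in MODEL_BY_LEVELS:
        raise ValueError("levels must be 'continuous' or 'categorical'")
    return MODEL_BY_LEVELS[key]


if __name__ == "__main__":
    print(suggest_model("categorical", "continuous"))
    print(suggest_model("continuous", "categorical"))
```

Logistic regression also accepts categorical predictors, so the table is a first guess rather than a complete decision rule.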
19.577 Spring 2012 Class Schedule
Class number, date, content, and readings related to the class
Class 1, January 25*: Review of the basics of statistics. Readings: Chapters 1 and 2.
Class 2, February 1: Using SPSS. Readings: Chapters 3 and 4.
Class 3, February 8: Using SPSS. Readings: Chapter 5.
Class 4, February 15: Correlation. Readings: Chapter 6. Skip 6.5.5.
Class 5, February 22: Linear regression. Readings: Chapter 7, up to 7.6 complete.
Class 6, February 29*: Linear regression. Readings: Chapter 7, 7.7 to 7.10.
Class 7, March 7: ANOVA. Readings: Chapter 9 (recommended); Chapter 10 (required).
March 14: Spring break. No classes.
Class 8, March 21: Midterm exam.
Homework due by 6:05 PM on the day of the class
Class 1: no homework due.
0. Quizzes on page 29 and page 59-60
1. Data management A. Use database “cars.sav” and describe with charts the
following variables: mpg, weight, accel, year, origin, cylinder. Explain why you
selected each chart and interpret each chart.
2. Data management B. Use database “General Social Survey.sav” and check
normality (all and by sex) and homogeneity of variance (by sex) in the
following variables: age, educ, prestg80. Explore whether transformations
produce normality and/or homogeneity of variance. Describe and interpret
your findings.
3. Correlation. Use database “World95.sav” and compute the appropriate
correlation (check assumptions!!) among the following variables (all
combinations): populatn, density, babymort, lifeexpm, lifeexpf, religion, and
pop_incr. Interpret the results and their statistical significance.
4. Linear regression. Use database “Employee data.sav” and predict salary using
salbegin. Check assumptions and interpret the intercept, regression
coefficient, and model fit. Look for influential cases and outliers.
5. Linear regression. Use database “Employee data.sav” and predict salary using
prevexp. Add a second predictor salbegin, check assumptions again, look for
influential cases, outliers, and multicollinearity. Interpret the intercept,
regression coefficients, and model fit. Compare residuals, regression
coefficients, and model fit for both models. Would you prefer a model with
one or two predictors? Which predictor(s)? Why?
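For homework 3 to 5, it can help to see what the software computes before reading the SPSS output. The following is a minimal sketch in plain Python, not the SPSS procedure used in class; the function name simple_linear_regression and the data are made up, and the numbers are not taken from "Employee data.sav".

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x, plus Pearson r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx                    # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x  # predicted y when x equals 0
    r = sxy / sqrt(sxx * syy)            # Pearson correlation
    return intercept, slope, r


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    beginning_salary = [10, 12, 15, 18, 20]
    current_salary = [22, 25, 33, 37, 43]
    intercept, slope, r = simple_linear_regression(beginning_salary, current_salary)
    print(f"intercept={intercept:.2f} slope={slope:.2f} r={r:.3f} R^2={r * r:.3f}")
```

With a single predictor, the squared correlation equals the R squared reported as model fit.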
Class 9, March 28: Logistic regression I. Readings: Chapter 8, up to 8.5.
Class 10, April 4: Logistic regression II. Readings: Chapter 8, complete.
Class 11, April 11: Introduction to General Linear Models and multivariate analysis I. Readings: Chapter 7 (7.7, 7.8); Chapter 11.
Class 12, April 18: Introduction to General Linear Models and multivariate analysis II. Readings: TBA.
Class 13, April 25: Centering predictors; meaningful intercept. Readings: TBA.
Class 14, May 2: Review (questions and answers); non-parametric analysis and categorical data. Readings: Chapter 15; Chapter 18, up to 18.5.
Class 15, May 9: Final exam.
6. ANOVA. Use database “University of Florida graduate salaries.sav” and
compare the mean initial salary across genders, across colleges, and across
graduation dates. Check assumptions. Describe and interpret your results.
7. Logistic regression. Use database “University of Florida graduate salaries.sav”
and dichotomize salaries by the median. Predict above-median salary
using gender, college, and graduation date separately. Check assumptions.
Describe and interpret odds ratios.
8. Logistic regression. Use database “Employee data.sav” and dichotomize salary
by the highest tertile. Predict highest tertile salary with two separate (two
regressions with one predictor each) and two simultaneous (one regression
with two predictors) predictors: salbegin and prevexp. Check assumptions.
Describe, interpret, and compare odds ratios across models.
9. Running General Linear Models. Use database “Employee data.sav” and
predict salary using prevexp. Add a second predictor salbegin. Interpret the
intercept, regression coefficients, and model fit. Compare with the same
analysis using linear regression (homework 5). Add a third predictor (Job
category). Interpret the intercept, regression coefficients, and model fit. How
would you run the analysis with these three simultaneous predictors
(prevexp, salbegin, and jobcat) using multivariate least squares linear
regression?
10. Confounding and effect modification. Use database “World95.sav” and
determine whether religion is a confounder or an effect modifier or both of
the association between the predictor lit_fema and the outcome lifeexpf.
Describe and interpret the regression coefficients and the intercept.
11. Multivariate analysis and intercept interpretation. Use database
“World95.sav” and determine whether religion is a confounder or an effect
modifier or both of the association between the predictor lit_fema and the
outcome lifeexpf. Center the continuous predictor and run the analysis again.
Compare the models (describe and interpret the regression coefficients, the
intercept, and the model fit) with non-centered and centered continuous
predictor. Explain the differences, if there are any.
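The effect of centering asked about in homework 11 can be previewed with a one-predictor model. This is a minimal sketch in plain Python rather than SPSS; the helper names fit_line and center and the data are made up, and the numbers are not taken from "World95.sav".

```python
def fit_line(x, y):
    """Least-squares (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def center(values):
    """Subtract the mean so that 0 corresponds to the average value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


if __name__ == "__main__":
    # Made-up numbers standing in for a literacy percentage and a life expectancy.
    literacy = [30.0, 45.0, 60.0, 80.0, 95.0]
    life_expectancy = [52.0, 58.0, 66.0, 73.0, 79.0]

    raw_intercept, raw_slope = fit_line(literacy, life_expectancy)
    ctr_intercept, ctr_slope = fit_line(center(literacy), life_expectancy)

    # Same slope in both fits; only the intercept moves.
    print(f"raw:      intercept={raw_intercept:.2f} slope={raw_slope:.3f}")
    print(f"centered: intercept={ctr_intercept:.2f} slope={ctr_slope:.3f}")
    # After centering, the intercept is the predicted outcome at the mean
    # predictor value, which in a one-predictor model is the mean outcome.
    print(f"mean outcome: {sum(life_expectancy) / len(life_expectancy):.2f}")
```

In this simple case the slope and model fit do not change. Only the intercept does, moving from the predicted outcome at a predictor value of 0 to the predicted outcome at the average predictor value. Models with interaction terms behave differently and are left to the homework.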