Uploaded by fontanagb13

7 - Logistic Regression and Nonparametrics

STAT 4210
Worksheet 7: Logistic Regression and Nonparametrics
Section 13.6 Modeling a Categorical Response
The regression models we studied previously were designed for a quantitative response
variable 𝑦. When 𝑦 is categorical, a different modeling approach is needed, called logistic
regression.
Suppose we want to use 𝑥 = age to predict the probability of subscribing to a newspaper.
Linear regression model:
Linear Fit
Subscribe = -0.749531 + 0.0279225*Age
Logistic regression model:
A regression model for an S-shaped curve for the probability of success 𝜋 is
$$\pi = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$
Parameter Estimates
Term        Estimate   Std Error   ChiSq   Prob>ChiSq
Intercept   -5.888     2.369       6.17    0.0130*
Age          0.131     0.052       6.35    0.0117*
For log odds of 1/0
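To see how the two fitted models behave, the following minimal sketch plugs a few ages into each equation, using the estimates from the output above. The ages 20, 45, and 70 and the function names are illustrative choices, not part of the worksheet.

```python
import math

# Estimates copied from the worksheet's output.
LINEAR_INTERCEPT, LINEAR_SLOPE = -0.749531, 0.0279225
ALPHA, BETA = -5.888, 0.131

def linear_fit(age):
    """Fitted value from the straight-line model (not constrained to [0, 1])."""
    return LINEAR_INTERCEPT + LINEAR_SLOPE * age

def logistic_prob(age):
    """Estimated probability of subscribing from the logistic model."""
    log_odds = ALPHA + BETA * age
    return math.exp(log_odds) / (1 + math.exp(log_odds))

# Illustrative ages (hypothetical, chosen only to show the shape of each model).
for age in (20, 45, 70):
    print(age, round(linear_fit(age), 3), round(logistic_prob(age), 3))

# Each additional year of age multiplies the estimated odds by e^beta.
odds_ratio_per_year = math.exp(BETA)
print("odds multiplier per year:", round(odds_ratio_per_year, 3))
```

The straight line produces fitted values below 0 at age 20 and above 1 at age 70, which cannot be probabilities, while the logistic curve always stays between 0 and 1.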
Inference for logistic regression
𝐻0 : 𝛽 = 0
𝐻𝐴 : 𝛽 ≠ 0
$$\text{test stat} = \frac{\text{stat} - \text{parameter}}{\text{st dev of stat}}$$
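As a rough check, the test statistic can be computed for the Age row of the parameter estimates (estimate 0.131, standard error 0.052). This sketch assumes the ChiSq column is the square of that z statistic and that the p-value is a two-sided normal tail area; the function name `wald_test` is ours.

```python
import math

def wald_test(estimate, std_error, null_value=0.0):
    """Return (z, z squared, two-sided p-value) for H0: parameter = null_value."""
    z = (estimate - null_value) / std_error
    chi_sq = z ** 2
    # Two-sided standard normal tail area, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, chi_sq, p_value

z_age, chi_sq_age, p_age = wald_test(0.131, 0.052)
print(round(z_age, 3), round(chi_sq_age, 2), round(p_age, 4))
```

The squared statistic and p-value agree with the reported ChiSq of 6.35 and Prob>ChiSq of 0.0117 up to rounding of the inputs.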
It is important to check whether a logistic regression model fits the data, but the methods
for evaluating fit are beyond the scope of this class. You should be able to interpret the
results of a logistic regression, but you will need to seek outside information before you
produce your own analysis.
Multiple logistic regression
Burris, Wiley, Welner, and Murphy (2008) explored (among other things) the effect of
tracking on the likelihood of a student receiving an IB diploma.
The Logic of Statistical Inference
1. Statistic – Compute a statistic to summarize the observed sample data.
2. Sampling Distribution – Identify a “by chance alone” explanation for the data (the null
hypothesis). Choose a model to represent the values of the statistic that would occur by
chance if the null were true.
3. Strength of Evidence – Consider whether the observed value of the statistic would be
likely or unlikely to occur by chance if the null hypothesis were true. Quantify the
likelihood using a p-value.
Chapter 15: Nonparametric Statistics
Definition: Nonparametric statistical methods are inferential methods that do not
assume a particular form for the population distribution, such as the normal
distribution. They are especially useful in two cases:
•
•
Example: Comparing clinical therapies
A clinical psychologist wants to choose between two therapies for treating depression.
They select eight patients who are similar in their symptoms and overall health. Four of the
patients are randomly assigned to receive Therapy 1 and the others are assigned to receive
Therapy 2. After a month of treatment, the improvement in each patient is measured by the
change in a score for measuring severity of depression – the higher the change score, the
better.
              Improvement Scores   Ranks   Sum of Ranks
Treatment 1   25, 33, 40, 45
Treatment 2   10, 12, 28, 30
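The ranks and rank sums for the table can be worked out by pooling all eight scores and ranking them from smallest (rank 1) to largest. The sketch below is one way to do that; the helper `average_ranks` is a hypothetical name, and it averages the ranks of tied values even though this data set has no ties.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest, averaging ranks of ties."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1           # lowest position held by v
        count = ordered.count(v)               # number of tied copies of v
        rank_of[v] = first + (count - 1) / 2   # average of the tied positions
    return [rank_of[v] for v in values]

therapy1 = [25, 33, 40, 45]
therapy2 = [10, 12, 28, 30]

# Rank the pooled sample, then split the ranks back into the two groups.
ranks = average_ranks(therapy1 + therapy2)
ranks1, ranks2 = ranks[:len(therapy1)], ranks[len(therapy1):]
rank_sum1, rank_sum2 = sum(ranks1), sum(ranks2)

print("Therapy 1 ranks:", ranks1, "sum:", rank_sum1)
print("Therapy 2 ranks:", ranks2, "sum:", rank_sum2)
```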
The Wilcoxon rank sum test uses one of two test statistics:
Hypotheses:
𝐻0 : Identical population distributions for Treatments 1 and 2
𝐻𝐴 : Different population distributions for Treatments 1 and 2
Assumptions:
Test Statistic:
Do these two treatments have different effects on depression?
Simulation Results: Therapy 1
Y₀ = 24 (Original Estimate)
Empirical p-Values
Test        p-Value
Y ≥ |Y₀|    0.0584
Y ≤ Y₀      0.9696
Y ≥ Y₀      0.0584
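With only eight patients, the "by chance alone" distribution of the Therapy 1 rank sum can be listed exactly instead of simulated. The sketch below assumes that, under the null hypothesis, every choice of 4 of the 8 ranks is equally likely to belong to Therapy 1; the variable names are ours.

```python
from itertools import combinations

all_ranks = range(1, 9)   # ranks 1..8; the eight scores contain no ties
observed = 24             # rank sum for Therapy 1 (Y0 in the output)

# Every way to choose which 4 of the 8 ranks go to Therapy 1.
sums = [sum(c) for c in combinations(all_ranks, 4)]

# Proportion of assignments with a rank sum at least / at most the observed one.
p_upper = sum(s >= observed for s in sums) / len(sums)
p_lower = sum(s <= observed for s in sums) / len(sums)

print(len(sums), round(p_upper, 4), round(p_lower, 4))
```

The exact proportions are 4/70 and 68/70, close to the empirical p-values of 0.0584 and 0.9696 from the simulation.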
What are some advantages/disadvantages of using the Wilcoxon rank test in this scenario?
[Plot: Improvement By Therapy]
Using the Wilcoxon Rank Sum Test with Quantitative Responses
Example: Popcorn data
The yield for the popcorn data had some pretty funky sample distributions (below),
indicating that the assumption of normality in the population for small and large batches
was pretty suspect.
[Figure: Distributions of yield for batch=large and batch=small]
The violation of the normality and equal standard deviation assumptions required for both
the t and the F tests means that the p-values we got for those tests are suspect. Instead,
we can use the nonparametric Wilcoxon test.
Wilcoxon / Kruskal-Wallis Tests (Rank Sums)
Level   Count   Score Sum   Expected Score   Score Mean   (Mean-Mean0)/Std0
large   8       46.000      68.000            5.7500      -2.261
small   8       90.000      68.000           11.2500       2.261

2-Sample Test, Normal Approx
S    Z         Prob>|Z|
90   2.26128   0.0237*

1-Way Test, ChiSquare Approx
ChiSquare   DF   Prob>ChiSq
5.3540      1    0.0207*
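The normal approximation in this output can be roughly reconstructed from the counts and the score sum alone. The sketch below uses the standard no-ties formulas for the mean and standard deviation of a rank sum, plus a continuity correction of 0.5. Both the continuity correction and the absence of a tie adjustment are assumptions about what the software does, so the result only approximately matches the reported Z and p-value.

```python
import math

n_small, n_large = 8, 8
rank_sum_small = 90.0   # "Score Sum" for the small batches (S in the output)

n_total = n_small + n_large
# Mean and standard deviation of the rank sum if H0 is true (no-ties formulas).
expected = n_small * (n_total + 1) / 2
std_no_ties = math.sqrt(n_small * n_large * (n_total + 1) / 12)

# Continuity correction of 0.5 toward the expected value (assumed, not confirmed).
z = (rank_sum_small - expected - 0.5) / std_no_ties
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(expected, round(z, 3), round(p_two_sided, 4))
```

The expected score of 68 matches the output exactly. The z statistic and p-value land near the reported 2.26128 and 0.0237, with the small gap attributable to the tie adjustment this sketch leaves out.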
Section 15.2 Nonparametric Methods for Several Groups
The Wilcoxon rank sum test for comparing two groups extends to comparisons of mean
ranks for more than two groups, using the Kruskal-Wallis test.
The test statistic for the Kruskal-Wallis test is
$$H = \frac{12}{n(n+1)} \sum n_i \left(\bar{R}_i - \bar{R}\right)^2$$
The K-W test is a nonparametric alternative to one-way ANOVA, and does not assume
normal populations or equal population standard deviations for the test to be valid.
Example: Holding Time
An airline analyzed whether telephone callers to their reservations office would remain on
hold longer, on average, if they heard (a) an advertisement about the airline, (b) Muzak, or
(c) classical music. For 15 callers randomly assigned to these three conditions, the table
below shows the data.
Recorded message   Holding times (min)   Ranks                 Mean Rank
Muzak              0, 1, 3, 4, 6         1, 2.5, 5, 6, 8       4.5
Advertisement      1, 2, 5, 8, 11        2.5, 4, 7, 10.5, 13   7.4
Classical          7, 8, 9, 13, 15       9, 10.5, 12, 14, 15   12.1
Hypotheses:
𝐻0 : Identical population distributions of holding times for Muzak, Ad, and Classical
𝐻𝐴 :
Assumptions:
Test Statistic: Kruskal-Wallis statistic 𝐻 =
Wilcoxon / Kruskal-Wallis Tests (Rank Sums)
Level           Count   Score Sum   Expected Score   Score Mean   (Mean-Mean0)/Std0
Advertisement   5       37.000      40.000            7.4000      -0.307
Classical       5       60.500      40.000           12.1000       2.454
Muzak           5       22.500      40.000            4.5000      -2.086

1-Way Test, ChiSquare Approximation
ChiSquare   DF   Prob>ChiSq
7.3814      2    0.0250*
Small sample sizes. Refer to statistical tables for tests, rather than large-sample approximations.
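The Kruskal-Wallis statistic for the holding-time data can be worked through directly from the formula for H. The sketch below ranks the 15 pooled times (averaging tied ranks), computes H, and then applies the usual tie correction, which divides H by 1 − Σ(t³ − t)/(n³ − n), where t is the size of each group of tied values. That the software applies this same correction is an assumption, and the variable names are ours.

```python
import math

groups = {
    "Muzak": [0, 1, 3, 4, 6],
    "Advertisement": [1, 2, 5, 8, 11],
    "Classical": [7, 8, 9, 13, 15],
}

pooled = sorted(v for times in groups.values() for v in times)
n = len(pooled)

def avg_rank(value):
    """Rank of value in the pooled sample, averaging over tied positions."""
    first = pooled.index(value) + 1
    return first + (pooled.count(value) - 1) / 2

mean_rank = {name: sum(avg_rank(v) for v in times) / len(times)
             for name, times in groups.items()}
overall_mean_rank = (n + 1) / 2

# H from the formula: 12 / (n(n+1)) * sum of n_i * (mean rank_i - overall mean rank)^2
h = 12 / (n * (n + 1)) * sum(
    len(times) * (mean_rank[name] - overall_mean_rank) ** 2
    for name, times in groups.items()
)

# Tie correction: each set of t tied values contributes t^3 - t.
tie_term = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
h_tie_corrected = h / (1 - tie_term / (n ** 3 - n))

# Chi-square upper-tail area; exp(-x / 2) is exact only for df = 2 (three groups).
p_value = math.exp(-h_tie_corrected / 2)

print(mean_rank, round(h, 4), round(h_tie_corrected, 4), round(p_value, 4))
```

The mean ranks reproduce the table (4.5, 7.4, 12.1), and the tie-corrected statistic and p-value agree with the reported ChiSquare of 7.3814 and Prob>ChiSq of 0.0250.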
Strength of evidence:
Nonparametric Comparisons For Each Pair Using Wilcoxon Method
q*        Alpha
1.95996   0.05

Level       - Level         Score Mean Difference   Std Err Dif   Lower CL   Upper CL
Classical   Advertisement    3.00000                1.909043       -3.0000    13.0000
Muzak       Advertisement   -1.80000                1.909043      -10.0000     4.0000
Muzak       Classical       -4.80000                1.914854      -14.0000    -2.0000