Review for Final Examination
COMM 550X Final Examination, May 12, 11 am to 1 pm
Practice for the Mid-Term
• Multiple choice portion of the test: There will be 50 multiple choice questions chosen at random from this pool of possible test questions. Each item will be worth 1 point.
• SPSS DATA ANALYSIS: You will be tested in SPSS on bivariate correlation, multiple regression, and MANOVA/discriminant analysis. The questions will use the data sets statelevel.sav and NationsoftheWorldModified.sav. The questions will have point values as follows: bivariate correlation, 10 points; multiple regression, 18 points; MANOVA/discriminant analysis, 22 points.
Sample Test Question for Bivariate Correlation (8 points)
Using the NationsoftheWorldModified.sav data set, test the hypothesis that there is a significant positive association between a country’s civil liberties score and the annual number of peaceful political demonstrations in that country. Set your significance level (alpha) at .05. Report the obtained value of the test statistic, the N, the df and probability level, and whether or not you can reject the null hypothesis of no association between the two variables.
Testing the Hypothesis
• You have been asked to see if there is a significant association between two variables. For tests where both variables are interval level or better and no causal relationship between the two is implied, the appropriate test statistic to compute is the bivariate correlation. You are looking for a significant value of Pearson’s r, the correlation coefficient.
• In SPSS Data Editor, open the NationsoftheWorldModified.sav data file.
• Go to Analyze/Correlate/Bivariate and put the two variables, civil liberties score and number of peaceful political demonstrations, into the Variables window.
• Select a one-tailed test (you do this because you have made a prediction about the direction of the relationship, that it will be positive) and flag significant correlations.
• Under Correlation Coefficients, select Pearson and click OK.
• Compare your output to the next slide.
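If it helps to see what Analyze/Correlate/Bivariate computes, here is a minimal Python sketch using only the standard library. It is an illustration, not SPSS’s own code: the function name `pearson_r` and the toy lists are hypothetical (they are not taken from NationsoftheWorldModified.sav), and missing values are assumed to be stored as `None`. A case missing either variable is dropped, which is how the N for a two-variable correlation is determined.

```python
import math

def pearson_r(x, y):
    """Return (r, n) for two equal-length lists, dropping cases where either value is None."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 complete cases")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    # Cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical toy data; the third case is missing a civil liberties score and is dropped
civil = [1, 2, None, 3, 4, 5]
demos = [2, 4, 7, 5, 4, 5]
r, n = pearson_r(civil, demos)
print(round(r, 4), n)  # prints 0.7746 5
```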
SPSS Output for Bivariate Correlation
• You only get a small amount of output for bivariate correlation. Note the correlation coefficient (.077), the sample size (N = 112), and the significance level (.208). The df for Pearson’s r equals N - 2, so here df = 110.
• Before you did the test, you set your significance level (alpha) to .05, so p (the probability level) needed to be smaller than .05 for you to reject the null hypothesis. But your obtained value of Pearson’s r has a significance level of .208. Consequently, you cannot reject the null hypothesis, and you are not able to confirm your research hypothesis that there is a significant positive association between a country’s civil liberties score and the number of its peaceful political demonstrations.
Correlations
                                                                  Civil liberties   Number of peaceful
                                                                  score             political demonstrations
Civil liberties score                          Pearson Correlation  1                 .077
                                               Sig. (1-tailed)      .                 .208
                                               N                    112               112
Number of peaceful political demonstrations    Pearson Correlation  .077              1
                                               Sig. (1-tailed)      .208              .
                                               N                    112               112
(The .077 entry is Pearson’s r; the .208 entry is its significance level.)
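The df, the test statistic, and the one-tailed p can be reproduced from r and N alone. The sketch below (Python, standard library only; the helper names `t_upper_tail` and `one_tailed_test` are hypothetical) converts r to t with t = r * sqrt(df / (1 - r²)), where df = N - 2, and integrates Student’s t density numerically with Simpson’s rule to get the tail area in the predicted direction. Computing the tail in the predicted direction also builds in the sign check from the write-up note: an r with the wrong sign gives p above .5, so the null hypothesis is never rejected.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """Return P(T > t) for Student's t with df degrees of freedom (Simpson's rule)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # area under the density between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area

def one_tailed_test(r, n, alpha=0.05, predicted_sign=1):
    """Test Pearson's r with the tail chosen by the predicted direction of the association."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    p = t_upper_tail(t if predicted_sign > 0 else -t, df)
    return df, t, p, p < alpha

df, t, p, reject = one_tailed_test(0.077, 112)
print(df, round(t, 3), round(p, 3), reject)
```

With r = .077 and N = 112 this gives df = 110, t of about 0.81, and a one-tailed p of about .21; the small gap from the .208 that SPSS reports is consistent with r having been rounded to three decimals.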
Writing up your Result
“Bivariate correlation analysis was performed to test the hypothesis that a country’s civil liberties score was positively associated with its number of peaceful political demonstrations. The obtained value of Pearson’s r was .077 (N = 112, df = 110, p = .208, one-tailed test), which was not significant. Consequently, we cannot reject the null hypothesis that there is no association between a country’s civil liberties score and its number of peaceful political demonstrations, and our research hypothesis was not confirmed.” (Note: if the significance level had fallen below .05, you would have confirmed your research hypothesis only if the sign of the association between the two variables was positive, as predicted, that is, only if the obtained correlation coefficient was positive.)
Sample Test Question for Multiple Regression
You are asked to test the hypothesis that a country’s score on the civil liberties index is a function of a linear combination of three variables: (1) percentage of seats in the lower legislative house held by the largest party, (2) percentage of the work force who are women, and (3) percentage of the voting age population who voted in the last election. You believe that these variables are of importance in the order listed above. Further, you expect the sign of the first predictor, percentage of seats, to be negative, and the signs of the other two predictors to be positive. Test the hypothesis and then write an equation for predicting the score of a new case on the civil liberties index based on the three variables. Set your significance level (alpha) to .05. Report the test statistic, N, df, and obtained probability level, and all other statistics appropriate to determining whether or not you have used the procedure correctly, and state whether or not your data support rejecting the null hypothesis that civil liberties is unrelated to the three variables, and confirming your research hypothesis.
Testing the Hypothesis
To test this hypothesis, you need a procedure which looks at the relationship between a single variable measured at the interval level or better on the one hand and multiple interval-level-or-better predictors on the other. This is multiple regression. Since your theory has given you a reason to order the importance of your predictors ahead of time, you choose a hierarchical regression analysis, where you enter the variables into the regression equation in the order of their presumed importance.
SPSS Procedure for Multiple Regression
• Download the NationsoftheWorldModified.sav data file.
• Go to Analyze/Regression/Linear.
• Move civil liberties score into the Dependent box.
• Now we are going to enter variables one at a time, in the order predicted by our theory. Move the first variable to enter, percentage of seats in the lower legislative house held by the largest party, into the Independent box and click Next.
• Move the second variable to enter, percentage of the work force who are women, into the Independent box and click Next.
• Finally, move the third variable to enter, percentage of the voting age population who voted in the last election, into the Independent box. DON’T click Next again.
• Make sure the Enter option is selected under Method.
• Under Statistics, select Estimates, Confidence Intervals, Model Fit, R squared change, Descriptives, Part and Partial Correlation, and Collinearity Statistics, and click Continue.
• Under Options, check Include Constant in the Equation, click Continue, and then click OK. You are doing this so you will be able to write the equation for predicting new cases’ civil liberties scores from raw scores on the predictor variables.
• Compare your results to the next slides.
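The blocks created with Next amount to fitting a series of nested models and comparing each with the one before it. Here is a minimal Python sketch of that logic (standard library only, solving the normal equations by Gaussian elimination); it is not SPSS’s algorithm, and the toy data and the function names `solve`, `fit_ols`, and `hierarchical` are hypothetical. Each step adds one predictor, so the F change for step m has df1 = 1 and df2 = N - m - 1, and F change = (R² change / df1) / ((1 - R²) / df2).

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(predictors, y):
    """Least-squares fit with a constant; returns (coefficients, R squared)."""
    n = len(y)
    X = [[1.0] + [p[i] for p in predictors] for i in range(n)]  # constant column first
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    coefs = solve(xtx, xty)
    y_bar = sum(y) / n
    ss_res = sum((yi - sum(c * v for c, v in zip(coefs, row))) ** 2 for row, yi in zip(X, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot

def hierarchical(predictors, y):
    """Enter predictors one at a time, in the given order, and report change statistics."""
    n, prev_r2, rows = len(y), 0.0, []
    for m in range(1, len(predictors) + 1):
        coefs, r2 = fit_ols(predictors[:m], y)
        df1, df2 = 1, n - m - 1  # one variable added per step
        f_change = ((r2 - prev_r2) / df1) / ((1 - r2) / df2)
        rows.append((m, r2, r2 - prev_r2, f_change, df1, df2))
        prev_r2 = r2
    return rows

# Hypothetical toy data, with predictors listed in the theorized order of importance
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [5, 3, 6, 2, 7, 1, 8, 4]
y = [3, 4, 6, 5, 9, 8, 11, 12]
for model, r2, r2_change, f_change, df1, df2 in hierarchical([x1, x2, x3], y):
    print(model, round(r2, 3), round(r2_change, 3), round(f_change, 3), df1, df2)
```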
SPSS Output: The Variables and their Order of Entry
Look for this box to make sure you have done the hierarchical regression form of multiple regression and that your variables have been entered in the order predicted by your theory.
Variables Entered/Removedᵇ
Model  Variables Entered                                             Variables Removed  Method
1      Percent of seats in lower legis hse held by largest partyᵃ   .                  Enter
2      Percent of labor force who are womenᵃ                         .                  Enter
3      Percent of voting age pop who voted in last electionᵃ        .                  Enter
a. All requested variables entered.
b. Dependent Variable: Civil liberties score
The Regression Model Summary Table
Next, look for your model summary. Note that there are three models examined, and the notes a, b, and c tell which of your predictors are in each model. Model 1, with only the percent of seats in the lower legislative house variable entered, was significant (F = 52.544, p < .001). When the percentage of labor force who are women variable was added in Model 2, the increase in R square (the percent of variance accounted for) was significant (F change = 6.346, p = .014). Thus the two-variable model predicts civil liberties scores significantly better than the one-variable model. Model 3 did not change R square significantly (p = .471), that is, it did not improve prediction significantly, so you really don’t need the third predictor, percent of voting age population who voted in last election. You choose Model 2.
Model Summary
                                                       Change Statistics
       R      R       Adjusted  Std. Error of   R Square  F        df1  df2  Sig. F
Model         Square  R Square  the Estimate    Change    Change             Change
1      .625ᵃ  .391    .383      1.277           .391      52.544   1    82   .000
2      .659ᵇ  .435    .421      1.237           .044      6.346    1    81   .014
3      .662ᶜ  .438    .417      1.240           .004      .525     1    80   .471
a. Predictors: (Constant), Percent of seats in lower legis hse held by largest party
b. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women
c. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women, Percent of voting age pop who voted in last election
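As a check on the Model Summary, the F Change column can be recomputed from the sums of squares in the ANOVA table for the same three models. Because R² = SS regression / SS total, the formula (R² change / df1) / ((1 - R²) / df2) is equivalent to (change in SS regression / df1) / (SS residual / df2); working from the sums of squares avoids the rounding in the three-decimal R² values. This Python sketch is illustrative only, and the name `f_change` is hypothetical.

```python
# Regression and residual sums of squares for Models 1-3, copied from the ANOVA table
ss_reg = [85.620, 95.327, 96.135]
ss_res = [133.618, 123.911, 123.103]
df_res = [82, 81, 80]

def f_change(m):
    """F for the increase in R square when model m (0-based index) adds one predictor."""
    prev = ss_reg[m - 1] if m > 0 else 0.0  # Model 1 is compared with the constant-only model
    df1 = 1  # each step entered one variable
    return ((ss_reg[m] - prev) / df1) / (ss_res[m] / df_res[m])

print([round(f_change(m), 3) for m in range(3)])
```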
Regression Statistics: R and R Square
Note the statistics for the model you have chosen, Model 2. The multiple correlation R between civil liberties score and the two predictors is .659. The proportion of variance in civil liberties score accounted for by the combination of the two variables (R square) is .435, or 43.5%.
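The Model 2 statistics can be recomputed from its row in the ANOVA table. This Python sketch assumes N = 84, which follows from the total df of 83 in that table; it uses R² = SS regression / SS total, adjusted R² = 1 - (1 - R²)(N - 1)/(N - k - 1) with k = 2 predictors, and the standard error of the estimate as the square root of the residual mean square.

```python
import math

# Model 2 values from the ANOVA table
ss_reg, ss_res, ss_total = 95.327, 123.911, 219.238
n, k = 84, 2  # total df is 83, so N = 84 cases; k = predictors in Model 2

r_squared = ss_reg / ss_total                    # proportion of variance accounted for
r = math.sqrt(r_squared)                         # multiple correlation R
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
se_estimate = math.sqrt(ss_res / (n - k - 1))    # square root of the residual mean square

print(round(r, 3), round(r_squared, 3), round(adj_r_squared, 3), round(se_estimate, 3))
```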
Overall Significance of the Regression Equation
Look in the ANOVA table to get the overall F value for the model you have chosen: the F(2, 81) value for the two-variable combination of percent of seats held by largest party and percent of labor force who are women is 31.158, p < .001.
ANOVAᵈ
Model               Sum of Squares  df   Mean Square  F        Sig.
1     Regression    85.620          1    85.620       52.544   .000ᵃ
      Residual      133.618         82   1.629
      Total         219.238         83
2     Regression    95.327          2    47.664       31.158   .000ᵇ
      Residual      123.911         81   1.530
      Total         219.238         83
3     Regression    96.135          3    32.045       20.825   .000ᶜ
      Residual      123.103         80   1.539
      Total         219.238         83
a. Predictors: (Constant), Percent of seats in lower legis hse held by largest party
b. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women
c. Predictors: (Constant), Percent of seats in lower legis hse held by largest party, Percent of labor force who are women, Percent of voting age pop who voted in last election
d. Dependent Variable: Civil liberties score
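The overall F is the ratio of the two mean squares, each of which is a sum of squares divided by its df. A short Python check for Model 2, using the values in the ANOVA table:

```python
# Model 2 rows of the ANOVA table
ss_reg, df_reg = 95.327, 2
ss_res, df_res = 123.911, 81

ms_reg = ss_reg / df_reg  # mean square for regression
ms_res = ss_res / df_res  # mean square for residual (error)
f = ms_reg / ms_res       # overall F with (2, 81) degrees of freedom
print(round(ms_reg, 3), round(ms_res, 3), round(f, 3))
```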
Standardized and Unstandardized Coefficients; Multicollinearity
Continue to examine your output. Note the standardized and unstandardized coefficients for Model 2. You can use the unstandardized coefficients to write the regression equation Y = 6.194 - .039(percent of seats held by largest party) + .034(percent of labor force who are women). You can use the standardized coefficients to compare the relative contributions of percentage of seats and percentage of women (-.620 and .210, respectively), and note that both coefficients were significantly different from zero (p < .001 and p = .014). Note also that the sign of the standardized coefficient for percentage of seats was negative, as predicted by your theory, and that the sign of the other variable was positive, as predicted. You can also report your tolerance and VIF statistics, which suggest that multicollinearity was not a problem (for Model 2, tolerance and VIF both equal 1.000; VIF values approaching 10 would signal a problem).

Coefficientsᵃ
                  Unstandardized      Standardized                  95% Confidence      Correlations                Collinearity
                  Coefficients        Coefficients                  Interval for B                                  Statistics
Model             B       Std. Error  Beta          t        Sig.   Lower    Upper      Zero-order  Partial  Part   Tolerance  VIF
1  (Constant)     7.298   .358                      20.367   .000   6.585    8.011
   Seats          -.040   .005        -.625         -7.249   .000   -.051    -.029      -.625       -.625    -.625  1.000      1.000
2  (Constant)     6.194   .559                      11.078   .000   5.081    7.306
   Seats          -.039   .005        -.620         -7.425   .000   -.050    -.029      -.625       -.636    -.620  1.000      1.000
   Women          .034    .014        .210          2.519    .014   .007     .061       .224        .270     .210   1.000      1.000
3  (Constant)     5.776   .805                      7.179    .000   4.175    7.377
   Seats          -.038   .006        -.594         -6.497   .000   -.049    -.026      -.625       -.588    -.544  .840       1.190
   Women          .032    .014        .195          2.263    .026   .004     .060       .224        .245     .190   .942       1.062
   Voted          .006    .009        .068          .725     .471   -.011    .023       .347        .081     .061   .796       1.256
a. Dependent Variable: Civil liberties score
Seats = Percent of seats in lower legis hse held by largest party; Women = Percent of labor force who are women; Voted = Percent of voting age pop who voted in last election
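To predict a new case, plug its raw scores into the Model 2 equation. The Python sketch below uses hypothetical helper names and an invented example country (50% of seats held by the largest party, women 40% of the labor force); it also shows that VIF is simply the reciprocal of tolerance, using the Model 3 tolerances from the Coefficients table.

```python
# Unstandardized coefficients (B) for Model 2, from the Coefficients table
CONSTANT = 6.194
B_SEATS = -0.039  # percent of seats in lower house held by the largest party
B_WOMEN = 0.034   # percent of labor force who are women

def predict_civil_liberties(pct_seats, pct_women):
    """Predicted civil liberties score for a new case from its raw predictor scores."""
    return CONSTANT + B_SEATS * pct_seats + B_WOMEN * pct_women

def vif(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1.0 / tolerance

# Hypothetical new country: largest party holds 50% of seats, women are 40% of the labor force
print(round(predict_civil_liberties(50, 40), 3))   # 6.194 - 1.950 + 1.360
print(round(vif(0.840), 3), round(vif(0.942), 3))  # Model 3 tolerances for Seats and Women
```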
Writing up Your Multiple Regression Results
“To test the hypothesis that a country’s civil liberties score was significantly related to a linear combination of the percentage of seats in the lower legislative house held by the largest party, the percentage of women in the labor force, and the percentage of the voting age population who voted in the last election, a hierarchical multiple regression analysis was conducted. It was expected that the variable ‘percentage of seats held by the largest party’ would be negatively related to civil liberties score and the other two variables positively related. Results of the regression analysis indicated that a two-variable model which included percentage of seats in the lower legislative house held by the largest party and percentage of women in the labor force was significantly related to civil liberties scores (F(2, 81) = 31.158, p < .001). Addition of the third variable to the predictive model did not significantly increase the amount of variance accounted for in civil liberties score (F change = .525, p = .471). The two-variable combination accounted for approximately 43.5% of the variance in civil liberties score.
Writing up Your Multiple Regression Results, cont’d
The best fitting regression equation for predicting civil liberties score from the two variables was civil liberties score = 6.194 - .039(percent of seats held by largest party) + .034(percent of labor force who are women). Significant standardized coefficients (βs) were obtained for the two variables (-.620 for percent of seats held by the largest party and .210 for percentage of women in the labor force), indicating that countries with higher scores on civil liberties would be likely to have a smaller percentage of seats in the lower legislative house held by the largest party and a larger percentage of women in the labor force, as predicted. Tolerance and VIF for the two-variable model were both equal to 1.0, indicating that multicollinearity was not an issue. Thus we can say that partial support for the hypothesis was obtained.”
Sample Test Question for Discriminant Analysis
Now we are going to test the following hypothesis: Southern and non-Southern states differ significantly on a combination of two types of traffic fatality, restrained and unrestrained motor vehicle accidents, such that Southern states will have a significantly higher value on the combined indicators than non-Southern states.
Testing the Hypothesis
• Both discriminant analysis and MANOVA can be used in the case where you have two or more interval-level-or-better predictors (DVs in the usage of MANOVA) and a nominal level grouping variable (IV in the usage of MANOVA). In this case we have a nominal level grouping variable (Southern/non-Southern) and interval level (actually ratio level) DVs or discriminating variables (the traffic fatality variables).
• We are going to use discriminant analysis to do the MANOVA, which (1) gives the identical result in the case where there are only two groups (two levels of the grouping variable) and (2) lets us practice doing discriminant analysis and evaluating the efficacy of the discriminant function. We are going to be looking for a significant value of Wilks’ lambda as an indicator of significant differences and support for the hypothesis. It is also necessary for the signs of the discriminant function coefficients to be in the direction predicted for the two variables (a positive relationship with “southernness”).
SPSS Procedure for Discriminant Analysis
• Download the file statelevel.sav.
• In SPSS Data Editor, open the data file statelevel.sav.
• Go to Analyze/Classify/Discriminant.
• In the Group box, put South (dummy) and set the maximum and minimum values to 1 and 0, respectively.
• In the Independents box, put restrained motor vehicle deaths per 100k and unrestrained motor vehicle deaths per 100k.
• Make sure that the Enter Independents Together button is checked.
• Under Statistics, check Means, Univariate ANOVAs, Box’s M, and Unstandardized function coefficients, and click Continue.
• Under Classify, select Summary Table and Territorial Map, click Continue, and then click OK.
• Compare your output to the next few slides.
Examining Your SPSS Output: Group Means
First, look at the group means. Note that the means are in the expected direction, with levels of both vehicle death variables higher in the South than in the non-South. Univariate F tests show that the differences are significant for both of the variables. So you have significant differences in the expected direction on both of your variables considered separately.
Group Statistics
                                                                              Valid N (listwise)
South dummy  Variable                                  Mean      Std. Deviation  Unweighted  Weighted
Non-south    Restrained motor veh deaths per 100k      8.62437   2.785566        34          34.000
             Unrestrained motor veh deaths per 100k    7.54033   4.380484        34          34.000
South        Restrained motor veh deaths per 100k      10.65369  2.182391        16          16.000
             Unrestrained motor veh deaths per 100k    10.69006  4.332577        16          16.000
Total        Restrained motor veh deaths per 100k      9.27376   2.756466        50          50.000
             Unrestrained motor veh deaths per 100k    8.54824   4.568596        50          50.000

Tests of Equality of Group Means
Variable                                   Wilks' Lambda  F      df1  df2  Sig.
Restrained motor veh deaths per 100k       .880           6.567  1    48   .014
Unrestrained motor veh deaths per 100k     .894           5.664  1    48   .021
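With two groups, each univariate test is a one-way ANOVA, and the univariate Wilks’ lambda is the within-groups sum of squares divided by the total sum of squares. The Python sketch below recomputes both columns of the Tests of Equality of Group Means table from the group Ns, means, and standard deviations in the Group Statistics table; the function name `univariate_f` is hypothetical.

```python
def univariate_f(groups):
    """One-way ANOVA F and univariate Wilks' lambda from (n, mean, sd) for each group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)  # pooled within-group SS
    df1, df2 = len(groups) - 1, n_total - len(groups)
    f = (ss_between / df1) / (ss_within / df2)
    wilks = ss_within / (ss_within + ss_between)
    return f, df1, df2, wilks

# (N, mean, SD) for non-South and South, from the Group Statistics table
restrained = [(34, 8.62437, 2.785566), (16, 10.65369, 2.182391)]
unrestrained = [(34, 7.54033, 4.380484), (16, 10.69006, 4.332577)]
for name, groups in [("restrained", restrained), ("unrestrained", unrestrained)]:
    f, df1, df2, wilks = univariate_f(groups)
    print(name, round(f, 3), df1, df2, round(wilks, 3))
```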
Box’s M Test for Equality of Group Covariances, and Significance of Wilks’ Lambda Overall Test
Next, look at your Box’s M test for the equality of group covariances. Box’s M is not significant (p = .613), which means you have met one of the assumptions of MANOVA, that the group covariance matrices for the levels of the grouping variable are equal. Now look at the value of Wilks’ lambda and assess it for significance. Wilks’ lambda equals .783 and is significant by the chi-square test (chi-square = 11.478, df = 2, p = .003). If we interpret this significant value of Wilks’ lambda in a MANOVA-like way, we have confirmed the hypothesis that Southern and non-Southern states differ significantly on the combination of the two motor vehicle predictors. (If we were interpreting this in a discriminant analysis type of way, we would say that the combination of two types of traffic-related fatalities left .783 of the variance in Southern state-ness “unexplained”.) Wilks’ lambda is one of those measures you want to be close to zero, so this result is statistically significant, but not all that impressive.
Test Results
Box's M              1.912
F       Approx.      .602
        df1          3
        df2          19070.702
        Sig.         .613
Tests null hypothesis of equal population covariance matrices.

Wilks' Lambda
Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .783           11.478      2   .003
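The chi-square in the Wilks’ Lambda table is consistent with Bartlett’s approximation, chi-square = -(N - 1 - (p + g)/2) ln(lambda) with df = p(g - 1), where p is the number of discriminating variables and g the number of groups. A minimal Python sketch with N = 50, p = 2, and g = 2 (the function name is hypothetical); for df = 2 the chi-square upper-tail probability has the closed form exp(-x/2), so no table lookup is needed.

```python
import math

def bartlett_chi_square(wilks, n, p, g):
    """Bartlett's chi-square approximation for Wilks' lambda.

    n = number of cases, p = discriminating variables, g = groups.
    """
    chi2 = -(n - 1 - (p + g) / 2) * math.log(wilks)
    df = p * (g - 1)
    return chi2, df

chi2, df = bartlett_chi_square(0.783, n=50, p=2, g=2)
p_value = math.exp(-chi2 / 2)  # exact upper-tail probability for a chi-square with df = 2
print(round(chi2, 2), df, round(p_value, 4))
```

This gives a chi-square of about 11.50 and p of about .003; the small difference from 11.478 reflects lambda being rounded to .783 in the output.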
The Canonical Correlation
From your printout you will also want to report the canonical correlation between the combination of the two traffic fatality variables and South/non-South, which is .465. This represents the correlation of the grouping variable (South/non-South) with the new canonical variable formed by weighting the two original predictors (traffic fatalities belted and unbelted) by the weights from the discriminant function. You don’t usually report the equation for classifying new cases in the write-up when you are using MANOVA or discriminant analysis to test a hypothesis about group differences.
Eigenvalues
Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         .277ᵃ       100.0          100.0         .465
a. First 1 canonical discriminant functions were used in the analysis.

Canonical Discriminant Function Coefficients
                                          Function 1
Restrained motor veh deaths per 100k      .291
Unrestrained motor veh deaths per 100k    .163
(Constant)                                -4.093
Unstandardized coefficients
(You would use these weights to classify new cases as South or non-South.)
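With a single discriminant function, Wilks’ lambda, the eigenvalue, and the canonical correlation are tied together: lambda = 1 / (1 + eigenvalue), and the squared canonical correlation = eigenvalue / (1 + eigenvalue) = 1 - lambda. A short Python check using the eigenvalue of .277 from the Eigenvalues table:

```python
import math

eigenvalue = 0.277  # from the Eigenvalues table (one discriminant function)

wilks = 1 / (1 + eigenvalue)                             # Wilks' lambda for the single function
canonical_r = math.sqrt(eigenvalue / (1 + eigenvalue))  # canonical correlation
print(round(wilks, 3), round(canonical_r, 3), round(canonical_r ** 2, 3))
```

The squared canonical correlation (about .217) is the complement of lambda: it is the share of variance in the discriminant scores associated with South/non-South, while .783 is the share left unexplained.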
Discriminant Function Coefficients, Group Means on Functions
You would report the standardized discriminant function coefficients to show the relative contribution of each of the two predictors, which in this case are about equal (.760 and .713) and both positively associated with the discriminant function, as required for support of your hypothesis. Then you would report the group means (centroids) on the discriminant function, which show that the South is positively associated with it (its centroid is .751; i.e., being a Southern state goes with higher vehicle death rates) and the non-South is negatively associated with it (centroid -.354).
Standardized Canonical Discriminant Function Coefficients
                                          Function 1
Restrained motor veh deaths per 100k      .760
Unrestrained motor veh deaths per 100k    .713

Functions at Group Centroids
South dummy   Function 1
Non-south     -.354
South         .751
Unstandardized canonical discriminant functions evaluated at group means
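The unstandardized weights turn a state’s two death rates into a score on the discriminant function, and scoring each group’s means recovers that group’s centroid. The Python sketch below does that and, for a new case, picks the nearest centroid. The nearest-centroid rule is our simplifying assumption (it corresponds to equal prior probabilities and equal group covariances); it is not necessarily the exact rule SPSS used for the classification table, and the helper names and example death rates are hypothetical.

```python
# Unstandardized discriminant function coefficients from the output
CONSTANT = -4.093
W_RESTRAINED = 0.291
W_UNRESTRAINED = 0.163

# Group centroids (Functions at Group Centroids table)
CENTROIDS = {"Non-south": -0.354, "South": 0.751}

def discriminant_score(restrained, unrestrained):
    """Score on the canonical discriminant function from raw death rates per 100k."""
    return CONSTANT + W_RESTRAINED * restrained + W_UNRESTRAINED * unrestrained

def classify(restrained, unrestrained):
    """Assign the group whose centroid is nearest (assumes equal priors and covariances)."""
    d = discriminant_score(restrained, unrestrained)
    return min(CENTROIDS, key=lambda g: abs(d - CENTROIDS[g]))

# Scoring each group's means recovers (to rounding) that group's centroid
print(round(discriminant_score(8.62437, 7.54033), 3))    # non-South means
print(round(discriminant_score(10.65369, 10.69006), 3))  # South means
print(classify(12, 12), classify(6, 5))                  # two hypothetical new states
```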
Classification Results
Finally, you would report the re-classification results (that is, the results of using the discriminant function coefficients to create a new, canonical variable out of the old predictors and using this new variable to re-classify cases as South or non-South) and the most frequently occurring misclassifications; e.g., 78% of the cases were correctly re-classified based on the discriminant function. Proportionally, slightly more errors were made re-classifying the Southern cases (25.0%) than the non-Southern cases (20.6%).
Classification Resultsᵃ
                                 Predicted Group Membership
                 South dummy     Non-south   South    Total
Original  Count  Non-south       27          7        34
                 South           4           12       16
          %      Non-south       79.4        20.6     100.0
                 South           25.0        75.0     100.0
a. 78.0% of original grouped cases correctly classified.
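The percentages in the classification table are row percentages of the counts, and the overall hit rate is the sum of the diagonal (correct) cells over N. A small Python check:

```python
# Original-group counts from the Classification Results table: rows = actual, columns = predicted
counts = {
    "Non-south": {"Non-south": 27, "South": 7},
    "South": {"Non-south": 4, "South": 12},
}

total = sum(sum(row.values()) for row in counts.values())
correct = sum(counts[g][g] for g in counts)
print(f"overall: {100 * correct / total:.1f}% correctly classified")  # prints 78.0%
for group, row in counts.items():
    n = sum(row.values())
    print(f"{group}: {100 * row[group] / n:.1f}% correct, {100 * (n - row[group]) / n:.1f}% misclassified")
```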
Writing up your Discriminant Analysis Result
“A discriminant analysis was conducted to perform a multivariate analysis of variance test of the hypothesis that Southern states differ from non-Southern states on a linear combination of two types of traffic fatality, restrained motor vehicle accidents and unrestrained motor vehicle accidents, such that Southern states would have a significantly higher value on the combined indicators than non-Southern states. The obtained value of Wilks’ lambda, .783, was significant (chi-square = 11.478, df = 2, p = .003; Box’s M = 1.912, n.s.). The canonical correlation between the grouping variable and the new canonical variable composed of the two predictors was .465. Significant univariate differences of means between Southern and non-Southern states were also obtained for restrained motor vehicle accidents (F(1, 48) = 6.567, p = .014) and unrestrained motor vehicle accidents (F(1, 48) = 5.664, p = .021). Mean differences were in the expected direction: mean restrained motor vehicle deaths per 100k were 10.65 for Southern states and 8.62 for non-Southern states; mean unrestrained motor vehicle deaths per 100k were 10.69 for Southern states and 7.54 for non-Southern states.
Writing up Your Discriminant Analysis Result, cont’d
Table 1 presents the standardized discriminant function coefficients. Higher scores on the discriminant function corresponded to higher traffic fatality rates for both of the discriminating variables. Table 2 presents the group centroids on the discriminant function; the Southern states group had a high, positive centroid with respect to the function, corresponding to higher rates of traffic fatalities. Table 3 presents the results of the re-classification analysis, which shows that the discriminant function was successful in reclassifying 78% of the cases.”