Stat 203 – Assignment 4. Due Wednesday July 18, 2012, at 4:30 in the drop box.
This is the complete assignment. There are 8 questions. 4 from chapter 10, 4 from chapter 11.
Only questions 6,7,8 are for marks.
Name: ________________________ Student Number: _________________________
1. Consider the following scatterplot:
A) What is the correlation? JUSTIFY YOUR ANSWER
   i) 1.2       ii) -1.2    iii) 0.980
   iv) -0.980   v) 0.3      vi) -0.3
B) What proportion of the variance in Y is explained by X?
C) If X and Y were switched, what would the correlation become?
D) If Y were multiplied by 10, what would the correlation become?
E) If Y were multiplied by -10, what would the correlation become?
2. In this question we compute the correlation by hand. (See WK08_Extra for additional info)
Consider the dataset:
X     Y
3     0
5    -5
2     2
7    -3
-5    1
With sample means and standard deviations
$\bar{x} = 2.4$, $s_x = 4.56$, $\bar{y} = -1$, $s_y = 2.91$
A) Find the standardized values for x and y (that is, how many standard deviations above or below the mean each
value is) using the formulas $z_x = (x - \bar{x}) / s_x$ and $z_y = (y - \bar{y}) / s_y$.
B) Take each pair of standardized values and multiply them together.
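Table for A and B
X     Y     Standardized X   Standardized Y   Product
3     0     0.131            ____             ____
5    -5     ____             ____             ____
2     2     ____             ____             ____
7    -3     ____             ____             ____
-5    1     ____             ____             ____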
C) Take the sum of the products from part B and divide by N-1 (N is the number of X-Y pairs).
D) Enter the values for X and Y directly into SPSS and verify that your answer from C is the same as the
Pearson correlation coefficient.
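The by-hand steps A to C above can also be sketched in Python as a cross-check alongside SPSS. This is a minimal sketch, not part of the assignment: it assumes the X values are 3, 5, 2, 7, -5 (the values consistent with the stated mean of 2.4 and standard deviation of 4.56), and the helper names `mean` and `sample_sd` are illustrative.

```python
from math import sqrt

# Assumed data: X values chosen to match the stated mean 2.4 and SD 4.56.
x = [3, 5, 2, 7, -5]
y = [0, -5, 2, -3, 1]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by N - 1).
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = sample_sd(x), sample_sd(y)

# Step A: standardized values.
z_x = [(v - x_bar) / s_x for v in x]
z_y = [(v - y_bar) / s_y for v in y]

# Step B: product of each standardized pair.
products = [a * b for a, b in zip(z_x, z_y)]

# Step C: sum of the products divided by N - 1.
r = sum(products) / (len(x) - 1)

print("means and SDs:", x_bar, s_x, y_bar, s_y)
print("standardized x:", [round(z, 3) for z in z_x])
print("r =", round(r, 3))
```

The printed r can be compared with the Pearson coefficient that SPSS reports in part D.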
3. We have the spending habits of 100 families: specifically, their food spending per month and their clothing
spending per year. (A4spending.sav)
A) Find the Pearson correlation coefficient. Is it significant at the 0.01 level?
B) How much of the variation in clothing spending can be explained by food spending?
C) Verify the significance using a two-sided t-test.
D) Produce a scatterplot to better see the relation.
E) Is there any trend in the scatterplot that could be a problem?
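For part C, one common way to test a correlation is to convert r into a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2), and compare it with the two-sided critical value from a t table. The sketch below illustrates the arithmetic only: the value r = 0.5 is hypothetical and is not the answer for A4spending.sav, and the function name is illustrative.

```python
from math import sqrt

def t_statistic_from_r(r, n):
    """t statistic for testing H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r_example = 0.5   # hypothetical correlation, for illustration only
n = 100           # number of families in the data set

t = t_statistic_from_r(r_example, n)
print("t =", round(t, 3), "with", n - 2, "degrees of freedom")
```

With 98 degrees of freedom the two-sided critical value at the 0.01 level is about 2.63, so a |t| larger than that would be significant at that level.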
4. Consider the gross domestic product (GDP, a measure of the mean income of the citizens in a country), and life
expectancy (at birth) in the countries of the world (A4GDPLife.sav). The data are from the 2003 CIA Factbook, with
GDP measured in US dollars/year and life expectancy in American years.
A. Find the Pearson correlation coefficient. Is it significant at the 0.01 level?
B. How strong is this correlation? Does it fully describe the strength of the relationship between GDP and life
expectancy?
C. Produce a scatterplot and use it to back up your answer in B.
D. From your graph in C, another correlation measure is more appropriate than Pearson correlation. Name it
and explain why it’s more appropriate.
E. Include the results from the correlation measure you named in D. Is there a strong relationship between
GDP and life expectancy?
For the following questions, use the data set NHL2011-12.sav, which is the final standings of the NHL 2011-2012 regular
season. The assignment is to analyze in detail the factors behind season points.
W – The number of games this team won in the season.
L – The number of games this team lost in the first three periods.
OTL – The number of games this team lost in overtime or the shootout.
P – Season points, determines who goes to the playoffs (2 pts for a win, 1 pt for an overtime loss)
GF – Goals For. The number of goals that team scored.
GA – Goals Against. The number of goals that team had scored on them.
Conf – Conference. 0 for an Eastern team, 1 for a Western Team.
5. Consider the regression: Season Points (P) as a function of the number of goals (GF). (That means Season Points
is the dependent variable and Goals For is the independent variable)
a) What is the regression equation? (Include the SPSS coefficients table)
b) What does the slope mean in this context?
c) What does the intercept mean? Does this intercept mean anything in a real-world context? Why or why not?
d) From the regression table, is the slope significantly above zero at alpha = 0.05? (That is, would we reject the
null hypothesis that the slope is zero?)
e) From the regression table and part d), what two things can you tell about the correlation between Goals For and
Season Points? Verify your conclusion by finding the correlation (correlation table not needed, but include r =
_____ )
f) On average, how many season points would you expect a team that scored 250 goals to have?
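The mechanics behind question 5 (fit a line, read off slope and intercept, then predict at a given value) can be sketched with the usual least-squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The numbers below are hypothetical and are not taken from NHL2011-12.sav, so the coefficients here will not match the SPSS output for the real data.

```python
# Hypothetical (Goals For, Season Points) values, for illustration only.
goals_for = [200, 220, 240, 260]
season_points = [80, 90, 95, 105]

def mean(values):
    return sum(values) / len(values)

x_bar, y_bar = mean(goals_for), mean(season_points)

# Sums of cross-products and squares about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(goals_for, season_points))
s_xx = sum((x - x_bar) ** 2 for x in goals_for)

slope = s_xy / s_xx                  # change in points per extra goal
intercept = y_bar - slope * x_bar    # predicted points at zero goals

def predict(gf):
    """Predicted season points for a team with gf goals."""
    return intercept + slope * gf

print("P =", round(intercept, 3), "+", round(slope, 3), "* GF")
print("prediction at GF = 250:", round(predict(250), 1))
```

The same pattern applies to part f): substitute 250 into the regression equation obtained from the SPSS coefficients table.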
6. Consider the regression: Season Points (P) as a function of Wins (W).
a) Get the regression equation, include the coefficients table.
b) What do the slope and intercept mean in this context?
c) Which would you expect to have a stronger correlation with season points: Wins (W) or Goals For (GF)? Justify.
d) Verify the expectation by finding the correlation between Wins and Points, and comparing it to the correlation
in 5E.
e) On average, how many season points would you expect a team that won 45 games to have?
f) How many season points would you expect a team that won 200 games to have? Are there any problems with
this interpretation?
7. This question is about multiple regression.
a) How much variance in Season Points is explained by Wins? (There are two ways to do this: Get the correlation
coefficient and square it, or use the model summary table in the regression. I suggest the second way for this
question)
b) How much variance in Season Points is explained by Overtime Losses (OTL)?
c) Find the regression equation for Season Points as a function of Wins and Overtime Losses together. (To do this,
put Season Points in dependent, and BOTH Wins and Overtime Losses in independent.) Include the coefficient
table.
d) How much of the variation in Season Points is explained by Wins and Overtime losses?
e) Interpret each of the slopes. (Remember, this is a multiple regression)
f) Interpret the intercept. (-3.381E-15 means -0.000000000000003381; treat it as zero.)
g) Explain why your findings in D, E, and F aren’t surprising when you consider how Season Points are calculated.
The definition of season points is in the data set description above question 5.
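To see what a two-predictor regression does mechanically, the sketch below fits Season Points on Wins and Overtime Losses by solving the least-squares normal equations directly. The six teams are hypothetical (not rows from NHL2011-12.sav), their points are generated from the scoring rule in the data description (2 per win, 1 per overtime loss), and the solver `solve_linear_system` is an illustrative helper, not what SPSS uses internally.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

# Hypothetical (wins, overtime losses) pairs, for illustration only.
teams = [(51, 8), (48, 10), (42, 13), (37, 10), (35, 12), (29, 7)]
points = [2 * w + otl for w, otl in teams]  # scoring rule from the data description

# Design matrix rows: constant term, Wins, Overtime Losses.
rows = [[1.0, float(w), float(otl)] for w, otl in teams]

# Normal equations: (X'X) beta = X'y.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * p for r, p in zip(rows, points)) for i in range(3)]
intercept, slope_w, slope_otl = solve_linear_system(xtx, xty)

# R squared: 1 - (residual sum of squares) / (total sum of squares).
fitted = [intercept + slope_w * w + slope_otl * otl for w, otl in teams]
p_bar = sum(points) / len(points)
ss_res = sum((p - f) ** 2 for p, f in zip(points, fitted))
ss_tot = sum((p - p_bar) ** 2 for p in points)
r_squared = 1 - ss_res / ss_tot

print("intercept:", round(intercept, 6))
print("slope for Wins:", round(slope_w, 6))
print("slope for OTL:", round(slope_otl, 6))
print("R squared:", round(r_squared, 6))
```

Because floating-point arithmetic is inexact, an intercept that should be zero can print as a tiny number such as one of order 1E-15, which is the same effect as the value quoted in part f).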
8. This question is about dummy variables. Conference is a dummy variable because it’s a category that we can
use in a regression by treating it like a 0-1 variable.
a) Perform an independent samples t-test on Season Points using Confname as a grouping variable (group 1 = E,
group 2 = W). What is the mean difference in season points between the two conferences? What is the p-value
against the null that there is no difference?
b) Find the regression on Season Points as a function of Conference using Confdummy as an independent variable.
How many season points more or less does a team in the Western Conference earn? What is the p-value against
a difference of zero?
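The link between parts a) and b) can be sketched numerically: when the only predictor is a 0-1 dummy, the least-squares slope equals the difference between the two group means, and the intercept equals the mean of the group coded 0. The season points below are hypothetical, not values from NHL2011-12.sav.

```python
# Hypothetical season points, for illustration only.
east_points = [90, 100, 80]        # dummy = 0 (Eastern)
west_points = [95, 105, 85, 95]    # dummy = 1 (Western)

def mean(values):
    return sum(values) / len(values)

dummy = [0] * len(east_points) + [1] * len(west_points)
points = east_points + west_points

d_bar, p_bar = mean(dummy), mean(points)

# Simple regression of points on the dummy variable.
s_dp = sum((d - d_bar) * (p - p_bar) for d, p in zip(dummy, points))
s_dd = sum((d - d_bar) ** 2 for d in dummy)
slope = s_dp / s_dd
intercept = p_bar - slope * d_bar

# Difference in group means, Western minus Eastern.
mean_difference = mean(west_points) - mean(east_points)

print("slope:", round(slope, 3))
print("intercept:", round(intercept, 3))
print("West mean minus East mean:", round(mean_difference, 3))
```

Note that SPSS reports the t-test mean difference as group 1 minus group 2, so its sign may be the opposite of the regression slope. The p-value for the slope matches the t-test row that assumes equal variances.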