DS 303

Spring 2005
Exam # 3
Name: _________Key__________
1.
The information below represents the relationship between the selling price (Y, in
$1000) of a home, the square footage of the home (X1), and the number of bedrooms in
the home (X2). The data represent 65 homes sold in a particular area of a city and were
analyzed using simple linear regression for each independent variable. Use the
information to answer the following questions.
Summary measures (Square Footage model)
Multiple R          0.8148
R-Square            0.6640
Standard Error      8.5572

Regression coefficients
                    Coefficient   Std Err   t-value   p-value
Constant            52.157        7.4784    6.9744    0.0000
Square Footage      4.646         0.4164

Summary measures (Number of Bedrooms model)
Multiple R          0.6487
R-Square            0.4208
Standard Error      11.2344

Regression coefficients
                    Coefficient   Std Err   t-value   p-value
Constant            100.628       5.2324    19.2316   0.0000
Number of Bedrooms  11.035        1.6310    6.7660    0.0000

a)
Is there evidence of a linear relationship between the selling price and the square
footage of the homes? State the null, the alternative hypothesis, the test statistic,
the decision criteria at α = 5% and your decision.
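ANSWER:
Yes. H0: β1 = 0 (no linear relationship); H1: β1 ≠ 0. The test statistic is
t = 4.646 / 0.4164 ≈ 11.16 with 65 - 2 = 63 degrees of freedom. At α = 5% we reject H0
if |t| exceeds the critical value of about 2.00, so we reject H0 and conclude that
there is a linear relationship. The fitted line is Ŷ = 4.646 X1 + 52.157. This model
shows that homes in this area start at an average of $52,157 and the selling price
increases by approximately $4,646 for each one-unit increase in square footage (X1).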
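The slope test in part a) can be checked directly from the regression output. The sketch below is only an illustration: it assumes the usual test of H0: β1 = 0 with n - 2 degrees of freedom, and the critical value of about 2.00 is an assumption read from a standard t-table for 63 degrees of freedom rather than computed.

```python
# Slope t-test for the square-footage model, using figures from the output above.
n = 65                 # homes in the sample
slope = 4.646          # coefficient on square footage
slope_se = 0.4164      # standard error of that coefficient

df = n - 2             # simple regression estimates two parameters
t_stat = slope / slope_se

# Assumption: two-tailed 5% critical value for 63 df, taken from a t-table.
t_crit = 2.00

reject_h0 = abs(t_stat) > t_crit
print(f"df = {df}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```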
b)
Identify and interpret the coefficient of determination ( R2 ) and the standard error
of the estimate (Sy.x) for the model in the above question.
ANSWER:
R2 = 0.6640; this means that 66.4% of the variation in selling price can be
explained by this regression equation. se = 8.5572; this is the standard
deviation of the residuals, or about $8,557 since Y is in $1000.
c)
Is there evidence of a linear relationship between the selling price and number of
bedrooms of the homes? If so, interpret the least squares line and characterize the
relationship (i.e., positive, negative, strong, weak, etc.).
ANSWER:
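Yes; the slope's t-value of 6.7660 has a p-value of 0.0000. Ŷ = 11.035 X2 + 100.628.
This model shows that homes in this area start at an average of $100,628 and the
selling price increases by approximately $11,035 for each bedroom in the house.
The relationship is positive and moderately strong (Multiple R = 0.6487).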
d)
Identify and interpret the coefficient of determination ( R2 ) and the standard error
of the estimate ( se ) for the model in Question c.
ANSWER:
R2 = 0.4208; this means that 42.08% of the variation in selling price can be
explained by this regression equation. se = 11.2344; this is the standard
deviation of the residuals, or about $11,234 since Y is in $1000.
e)
For which of the two variables, the square footage or the number of bedrooms, is the
relationship with home selling price stronger? Justify your choice.
ANSWER:
Square footage seems to have a stronger relationship with the selling price. When
using square footage as the explanatory variable, the R2 value is higher (0.6640 >
0.4208) and the se value is lower (8.5572 < 11.2344). This indicates that the first
model (using square footage) is a better fitting model.
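The comparison in part e) can be reproduced from the two sets of summary measures. This is a minimal sketch, not part of the original key; the dictionary layout and variable names are made up for illustration, and the numbers are copied from the outputs above.

```python
# Summary measures copied from the two regression outputs above.
models = {
    "square footage": {"multiple_r": 0.8148, "r_square": 0.6640, "std_err": 8.5572},
    "number of bedrooms": {"multiple_r": 0.6487, "r_square": 0.4208, "std_err": 11.2344},
}

for name, m in models.items():
    # In simple regression, R-Square is the square of Multiple R.
    assert abs(m["multiple_r"] ** 2 - m["r_square"]) < 0.001
    # Y is in $1000, so multiply the standard error by 1000 to get dollars.
    print(f"{name}: R2 = {m['r_square']:.4f}, se = ${m['std_err'] * 1000:,.0f}")

# Pick the model with the higher R-Square (here it also has the lower standard error).
best = max(models, key=lambda k: models[k]["r_square"])
print("better fit:", best)
```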
2. The following time series plot shows the monthly data on new homes sales in the
United States.
[Figure: time series plot of monthly new home sales. Vertical axis: New Home sales, 0 to 80. Horizontal axis: Months, May-79 to Mar-86.]
To check the data for trend and seasonality, we also produced a correlogram for the new
homes sales.
[Figure: ACF FUNCTION FOR NEW HOME SALES. Correlogram for lags 1 to 12 showing the ACF with its Upper Limit and Lower Limit; vertical axis from -.4000 to 1.0000.]
Based upon examination of the time-series plot and correlogram of new home sales, are the
data seasonal? Is there an underlying trend? Explain.
ANSWER:
Both plots indicate the existence of a trend in the data: new home sales are gradually
increasing. The ACF value at time lag 12 is significant at the 5% level, indicating the
existence of a seasonal component. The time series plot shows that new home sales are
lowest in the 12th month. There is also a cyclical component in this time series, as
indicated by the irregular fluctuations around the underlying trend.
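A correlogram like the one discussed above can be sketched in a few lines. The example below is illustrative only: the `sales` series is hypothetical data (not the exam's data), and the ±2/√n limits are an assumed common approximation to the 5% bounds, which may differ from how the exam's software computed its Upper and Lower Limits.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a list of numbers."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

# Hypothetical monthly data: mild upward trend plus a 12-month seasonal swing.
sales = [40 + 0.1 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(84)]

r = acf(sales, 12)
limit = 2 / math.sqrt(len(sales))   # assumed approximate 5% limits
for lag, value in enumerate(r, start=1):
    flag = "significant" if abs(value) > limit else ""
    print(f"lag {lag:2d}: {value:6.3f} {flag}")
```

With seasonal monthly data, a spike at lag 12 that exceeds the limit is the pattern the answer above points to.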
1.
In choosing the “best-fitting” line through a set of points in linear regression, we
choose the one with the:
a. smallest sum of squared residuals**
b. largest sum of squared residuals
c. smallest number of outliers
d. largest number of points on the line
e. none of the above
2.
The regression line ŷ = -3 + 2.5x has been fitted to the data points (28,60),
(20,50), (10,18), and (25,55). The sum of the squared residuals will be:
a. 20.25
b. 16.00
c. 49.00
d. 94.25**
e. none of the above
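The marked answer can be verified by computing each residual (actual y minus fitted ŷ) and summing the squares. A minimal sketch, with the helper name `predict` chosen only for illustration:

```python
# Data points and fitted line from the question above.
points = [(28, 60), (20, 50), (10, 18), (25, 55)]

def predict(x):
    """Fitted value from the line y-hat = -3 + 2.5x."""
    return -3 + 2.5 * x

# Residual = actual y minus fitted y-hat.
residuals = [y - predict(x) for x, y in points]
ssr = sum(e ** 2 for e in residuals)

print(residuals)  # [-7.0, 3.0, -4.0, -4.5]
print(ssr)        # 94.25
```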
3.
If an estimated regression line has a y-intercept of –7.5 and a slope of 2.5, then
when x = 3, the actual value of y is:
a. 0**
b. 5
c. 10
d. –20
e. unknown
4.
In a test of the distribution of the anti-fungus activity of a chemical compound,
fungus is grown in petri dishes with different concentrations of the compound and
the diameter of the fungus colonies is measured after one day. There are 20
dishes, two at each of 10 concentrations. A plot of diameter against concentration
shows a straight-line pattern, with higher concentrations giving smaller diameters.
Least squares regression is used to analyze the data. What distribution is used in
the test of the hypothesis that concentration has no effect on diameter?
A) t-distribution with 9 degrees of freedom.
B) t-distribution with 8 degrees of freedom.
C) t-distribution with 19 degrees of freedom.
D) t-distribution with 18 degrees of freedom.**
E) None of the above.
5.
Stepwise regression is an approach to choosing the independent variables to be
included in a multiple regression equation.
A) True**
B) False
C) Not enough information
6.
A time series can consist of four different components: trend, seasonal, cyclical,
and random (or noise).
a. True**
b. False
7.
The Y-intercept of the simple regression model
A) rarely has a useful interpretation. **
B) almost always has a useful interpretation.
C) is always a positive number.
D) is always positive when the correlation between the dependent and independent
variable is positive.
E) All the above.
8.
The following regression equation was estimated: Y = -2.0 + 4.6X. This indicates that
A) there has been an error since "b" cannot be a negative number.
B) there is a negative relationship between the two variables.
C) Y equals 44 when X is 10. **
D) the correlation coefficient for Y and X will be negative.
E) None of the above.
9.
Visual inspection of the data will help the forecaster identify
A) trend.
B) seasonality.
C) linearity.
D) nonlinearity.
E) All the above. **
10.
A multiple regression model using 200 data points (with three independent variables) has
how many degrees of freedom for testing the statistical significance of individual slope
coefficients?
A) 199.
B) 198.
C) 197.
D) 196. **
11.
Which time-series component is said to fluctuate around the long-term trend and is fairly
irregular in appearance?
A) Trend.
B) Cyclical. **
C) Seasonal.
D) Irregular.
E) None of the above.
12.
The difference between seasonal and cyclical components is:
A) Duration.
B) Source.
C) Predictability.
D) Frequency.
E) All the above. **
13.
When a time series contains no trend, it is said to be
A) nonstationary.
B) seasonal.
C) nonseasonal.
D) stationary. **
E) filtered.
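The degrees-of-freedom answers marked in questions 4 and 10 both follow from the same rule for slope t-tests in least squares regression, n - k - 1, where k is the number of independent variables. A minimal sketch (the function name `slope_test_df` is hypothetical and used only for illustration):

```python
def slope_test_df(n_obs, n_predictors):
    """Degrees of freedom for t-tests on slope coefficients: n - k - 1."""
    return n_obs - n_predictors - 1

# Question 4: 20 petri dishes, one independent variable (concentration).
print(slope_test_df(20, 1))    # 18

# Question 10: 200 data points, three independent variables.
print(slope_test_df(200, 3))   # 196
```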