Practice Final

Practice Exam
Here are some practice problems that we (Paul Ferguson and I) have compiled. They are
very similar to those I will test you on. In fact, many of these problems are straight off of
my old exams. There are approximately two tests' worth of problems here, but I wanted
to give you plenty of practice. Problems are not worth equal points on the final exam.
Regression problems are typically longer, have more parts, and therefore receive more
points.
Good luck!
1) In an attempt to reduce the number of person-hours lost as a result of industrial
accidents, a large production plant installed new safety equipment. In a test of the
effectiveness of the equipment, a random sample of 50 departments was chosen. The
number of person-hours lost in the month prior to and the month after the installation
of the safety equipment was recorded. The percentage change was calculated for the
sample and resulted in a mean of -1.2% and a sample standard deviation of s = 5%.
a) What conclusion can you draw using a 10% significance level?
b) Calculate the p-value for the test in part (a).
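If you want to check your hand calculation for Problem 1, here is a minimal Python sketch of a one-sample, lower-tailed t-test. It assumes the hypotheses H0: μ ≥ 0 versus H1: μ < 0 on the mean percentage change (a negative change means fewer lost hours). The standard library has no t distribution, so `t_pdf` and `t_cdf` are hypothetical helpers that integrate the t density numerically with Simpson's rule.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def lower_tail_t_test(mean: float, sd: float, n: int, mu0: float = 0.0):
    """Return (t statistic, one-sided p-value) for H1: mu < mu0."""
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return t_stat, t_cdf(t_stat, n - 1)


# Problem 1: n = 50 departments, mean change -1.2%, s = 5%.
t_stat, p_value = lower_tail_t_test(-1.2, 5.0, 50)
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}")
print("reject H0 at 10%" if p_value < 0.10 else "fail to reject H0 at 10%")
```

With n = 50, a z-test using the normal distribution gives nearly the same answer.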
2) An automatic machine in a manufacturing process is operating properly if the lengths
of an important subcomponent are normally distributed, with a mean of 117 cm and a
standard deviation of 5.2 cm.
a) Find the probability that one randomly selected unit has a length greater than
120 cm.
b) Find the probability that, if four units are randomly selected, their mean length
exceeds 120 cm.
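A short Python sketch for checking Problem 2, using `statistics.NormalDist` (Python 3.8 and later). Part (b) uses the fact that the mean of n independent normal lengths is itself normal, with standard deviation σ/√n.

```python
from statistics import NormalDist

mu, sigma = 117.0, 5.2  # cm, when the machine is operating properly

# (a) One unit: P(X > 120).
p_single = 1 - NormalDist(mu, sigma).cdf(120)

# (b) Mean of n = 4 units: the sample mean has standard deviation sigma / sqrt(n).
n = 4
p_mean = 1 - NormalDist(mu, sigma / n ** 0.5).cdf(120)

print(f"P(X > 120)    = {p_single:.4f}")
print(f"P(Xbar > 120) = {p_mean:.4f}")
```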
3) A lawn and garden retailer operates 4 stores in the DFW metroplex. One of their
most popular items is a lawn tractor. Weekly customer demand is N(10,25) at each
store. Each store replenishes its stock to 15 lawn tractors at the start of each week.
Note: Assume weekly demands at each store are independent.
a) What is the probability of a stockout in a single store?
b) Suppose the 4 stores decide to pool their stock. Specifically, they decide to
pool their weekly allocations (4 × 15 = 60) in a centrally located warehouse
and draw from it as needed to satisfy their demand. How often will a store
experience a stockout now?
c) Comment on whether it is better for each store to have its own inventory or
whether a central distribution center is better.
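A minimal Python sketch for checking Problem 3. It assumes N(10,25) means a mean of 10 and a variance of 25 (standard deviation 5). Under pooling, a stockout occurs when the combined demand of the four independent stores exceeds 60; independent demands add, so the pooled demand has mean 4 × 10 and variance 4 × 25.

```python
from statistics import NormalDist

mean, var = 10.0, 25.0  # assumed N(mean, variance) weekly demand per store
stock = 15              # tractors per store at the start of each week
stores = 4

# (a) A single store stocks out when its weekly demand exceeds 15.
p_single = 1 - NormalDist(mean, var ** 0.5).cdf(stock)

# (b) Pooled stock of 60: total demand has mean 40 and variance 100.
pooled = NormalDist(stores * mean, (stores * var) ** 0.5)
p_pooled = 1 - pooled.cdf(stores * stock)

print(f"single store stockout: {p_single:.4f}")
print(f"pooled stockout:       {p_pooled:.4f}")
```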
4) Baseball fans marvel at the home runs Barry Bonds hits. To get an idea of how far
his home runs travel, a random sample of 6 home runs is used. The distances (in feet)
are given below:
{440, 460, 420, 470, 460, 450}
The descriptive statistics for this data set are:
Barry’s HR Distance
Mean                  450
Standard Error        7.302967
Median                455
Mode                  460
Standard Deviation    17.88854
Sample Variance       320
Kurtosis              0.585937
Skewness              -0.94334
Range                 50
Minimum               420
Maximum               470
Sum                   2700
Count                 6
a) Assuming distances are normally distributed, construct a 98% confidence
interval for the mean distance traveled.
b) A number of teams are interested in acquiring Barry as their ‘franchise
player’. In particular, the Baltimore Orioles are seriously interested in
pursuing him to bolster their forever sagging offense. The manager of the O’s
thinks that the stadium in Baltimore, Camden Yards, might hurt Barry’s home
run production. He estimates that Barry would need to average better than
430 ft per home run to maintain his offensive production in Camden Yards. Is
there sufficient evidence at the .05 significance level to put the manager’s mind at
ease?
c) Calculate the p-value for the test in part (b).
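A minimal Python sketch for checking Problem 4. It assumes the test in part (b) is right-tailed, H0: μ ≤ 430 versus H1: μ > 430. Because the standard library has no t distribution, `t_pdf`, `t_cdf`, and `t_ppf` are hypothetical helpers: the CDF integrates the density with Simpson's rule and the quantile inverts it by bisection.

```python
import math
from statistics import mean, stdev


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p: float, df: int) -> float:
    """Quantile of the t distribution, found by bisecting t_cdf on [-50, 50]."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


distances = [440, 460, 420, 470, 460, 450]  # feet
n = len(distances)
xbar = mean(distances)
se = stdev(distances) / math.sqrt(n)  # standard error of the mean

# (a) 98% confidence interval: xbar +/- t_{.01, n-1} * se
t_crit = t_ppf(0.99, n - 1)
ci = (xbar - t_crit * se, xbar + t_crit * se)

# (b) Right-tailed test of H0: mu <= 430 against H1: mu > 430
t_stat = (xbar - 430) / se
p_value = 1 - t_cdf(t_stat, n - 1)

print(f"98% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}")
```

The standard error printed here should match the 7.302967 in the descriptive statistics table.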
5) REI, an outdoor retailer, operates 125 stores in the western region of the US. One of
their most popular items is ‘3-season’ tents (regardless of brand, e.g., The North
Face, Sierra Designs, Mountain Hardwear, etc.). Monthly customer demand at each
store for ‘3-season’ tents follows a uniform distribution between 5 and 9 (i.e., U[5,9]
at each store; mean = 7 and variance = 1.33). Note: Assume monthly customer
demands at each store are independent.
a) Assume each store replenishes its stock to 8 units at the start of each month.
This stock must last the entire month. What is the probability of a stockout in
a single store?
b) Suppose REI decides to hold its inventory of ‘3-season’ tents in a centralized
warehouse facility in Denver, CO. Each store in the western region draws
stock from the warehouse to satisfy its demand. The retailer will replenish the
inventory of tents to a level of 900 ‘3-season’ tents at the start of each month
in the warehouse (7.2 tents per store). This supply, again, must last for the
entire month. What is the probability that these 900 ‘3-season’ tents will
satisfy the demand for all 125 stores?
c) Suppose REI decides to set a minimum service level of 95% using the
inventory from its centralized warehouse. In other words, the retailer wishes
to meet monthly demand at all of its western region stores 95% of the time.
How many tents need to be stored in the warehouse at the start of each month
to meet this minimum service level (and, when stating your answer, answer in
the context of actually ordering tents from vendors)?
d) Comment on whether it is better for each store to have its own inventory or
whether a central distribution center is better.
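A minimal Python sketch for checking Problem 5. It treats demand as a continuous U[5,9] variable, which matches the stated variance (9 − 5)²/12 ≈ 1.33, and it approximates the total demand of 125 independent stores with a normal distribution via the central limit theorem. Part (c) rounds up because tents are ordered in whole units.

```python
import math
from statistics import NormalDist

a, b = 5.0, 9.0           # U[5, 9] monthly demand per store (continuous)
stores = 125
mu = (a + b) / 2          # 7
var = (b - a) ** 2 / 12   # 4/3

# (a) A store stocked to 8 units stocks out when demand exceeds 8.
p_stockout_single = (b - 8) / (b - a)

# (b) Total demand over 125 stores: approximately normal (CLT).
total = NormalDist(stores * mu, math.sqrt(stores * var))
p_enough = total.cdf(900)

# (c) Smallest whole number of tents that covers demand 95% of the time.
needed = math.ceil(total.inv_cdf(0.95))

print(f"(a) P(stockout, one store)   = {p_stockout_single:.2f}")
print(f"(b) P(900 tents are enough)  = {p_enough:.4f}")
print(f"(c) tents for 95% service    = {needed}")
```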
6) (Yes, this is real data!) In the Excel worksheet “Problem 2 – Portfolio,” you are
presented with monthly return data for two investment funds: one Growth fund and
one Real Estate Investment Trust (REIT) fund. Please answer/address the following
questions:
a) Compute the sample means.
b) Compute the sample variances.
c) Compute the sample standard deviations.
d) Calculate the sample covariance for the funds.
e) Calculate the sample correlation for the funds.
f) Determine the allocations that minimize the variance of the portfolio (you
need only determine the allocation to the nearest tenth). Also, estimate the
expected return and the standard deviation of this portfolio. Hint: Using
Excel, first “guess” the fractions for each investment. Then make systematic
adjustments until the minimum variance portfolio is discovered (alternatively,
build a table and select the appropriate allocations). Finally, calculate the
expected return and standard deviation associated with this portfolio.
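The table-of-allocations approach in the hint can also be sketched in Python. This is a minimal version, assuming you have already read the two monthly-return columns from the worksheet into lists (the reading step is not shown, and the function names are hypothetical). It uses sample statistics with an n − 1 divisor, puts weight w in the Growth fund and 1 − w in the REIT fund, and evaluates the portfolio variance w²s₁² + (1 − w)²s₂² + 2w(1 − w)s₁₂ on a 0.1 grid.

```python
import math


def sample_stats(x, y):
    """Sample means, variances (n - 1 divisor), covariance, and correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    corr = cov / math.sqrt(vx * vy)
    return mx, my, vx, vy, cov, corr


def min_variance_grid(x, y, step=0.1):
    """Try weights w = 0, step, ..., 1 in x (1 - w in y); return (w, expected return, sd)."""
    mx, my, vx, vy, cov, _ = sample_stats(x, y)
    best = None
    for i in range(int(round(1 / step)) + 1):
        w = i * step
        var = w * w * vx + (1 - w) ** 2 * vy + 2 * w * (1 - w) * cov
        if best is None or var < best[2]:
            best = (w, w * mx + (1 - w) * my, var)
    w, ret, var = best
    return w, ret, math.sqrt(max(var, 0.0))  # guard against tiny negative round-off
```

With `growth` and `reit` holding the worksheet columns, `sample_stats(growth, reit)` covers parts (a) through (e) (take square roots of the variances for part (c)), and `min_variance_grid(growth, reit)` gives the allocation, expected return, and standard deviation for part (f).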
7) Profitable banks are ones that make good decisions on loan applications. Credit
scoring is a statistical technique that helps banks make that decision. However, many
branches overturn credit scoring recommendations, while still others do not use the
technique at all. In an attempt to determine the factors that affect loan decisions, a
statistician surveyed 100 banks and recorded the percentage of bad loans (any loan
that is not completely repaid), the average loan size, and whether a score card (a
credit scoring method) is used, and if so, whether the scorecard recommendations are
overturned more than 10% of the time. The worksheet “Problem 3 – Banks”
contains the data. Column 1 contains the percentage of bad loans, column 2
contains the average loan amount, column 3 contains the credit scoring code (1 = no
scorecard, 2 = scorecard overturned more than 10% of the time, and 3 = scorecard
overturned less than 10% of the time).
a) Perform a regression analysis, state the regression equation and assess the fit.
b) Interpret and test the coefficients. What does this tell you?
c) Predict with 95% confidence the percentage of bad loans for a bank whose
average loan is $10,000 and which does not use a scorecard.
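The credit scoring code (1, 2, 3) is a category label rather than a quantity, so it would typically enter the regression as two 0/1 dummy variables. The sketch below assumes code 1 (no scorecard) is the omitted baseline; `design_row` and `ols` are hypothetical helpers, and `ols` solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination using only the standard library.

```python
def design_row(avg_loan, code):
    """One row of X: intercept, average loan, D2 (code 2), D3 (code 3).

    Code 1 (no scorecard) is the baseline, so it gets D2 = D3 = 0.
    """
    return [1.0, float(avg_loan), float(code == 2), float(code == 3)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y].
    m = [r + [v] for r, v in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row_idx: abs(m[row_idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]
```

With `X = [design_row(loan, code) for loan, code in rows]` and `y` the percentage of bad loans, `ols(X, y)` returns the intercept, the loan-size slope, and the shifts for codes 2 and 3 relative to banks without a scorecard. For part (c), the point prediction is `b[0] + b[1] * x0`, with x0 being $10,000 in the worksheet's loan units; the 95% interval additionally needs the standard error of the forecast, which this sketch does not compute.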
8) Physicians have been recommending more exercise for their patients, particularly
those who are overweight. One benefit of regular exercise appears to be a reduction
in cholesterol, a substance associated with heart disease. To study the relationship
more carefully, a physician took a random sample of 50 patients who do not exercise.
She measured their cholesterol levels. She then started them on regular exercise
programs. After 4 months, she asked each patient how many minutes per week (on
average) he or she exercised and also measured their cholesterol levels. The
worksheet “Problem 4 – Cholesterol” contains the data. Column 1 contains weekly
exercise in minutes, column 2 contains cholesterol levels before the exercise program
and column 3 contains cholesterol levels after the exercise program.
a) Determine the regression equation that relates exercise time with cholesterol
reduction. Also, interpret the coefficient(s).
b) Discuss the fit of the model.
c) Can we conclude at the 5% significance level that the amount of exercise is
linearly related to cholesterol reduction?
d) Predict with 95% confidence the reduction in cholesterol level of an
individual who plans to exercise for 5 hours per week for a total of 4 months.
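A minimal Python sketch of the simple-regression calculations in Problem 8. It assumes the response is the reduction, defined as the before level minus the after level (so a positive value is a drop), and that part (d) uses x0 = 300 minutes (5 hours per week). `t_crit` is the two-sided 5% critical value with n − 2 degrees of freedom, which you would read from a t table; it is passed in rather than computed.

```python
import math


def simple_regression(x, y):
    """Least-squares fit y = b0 + b1 x; returns b0, b1, s_e, R^2, and the t statistic for b1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))         # standard error of estimate
    r2 = 1 - sse / syy                     # coefficient of determination
    t_b1 = b1 / (s_e / math.sqrt(sxx))     # test of H0: slope = 0
    return b0, b1, s_e, r2, t_b1


def prediction_interval(x, y, x0, t_crit):
    """Interval for one new y at x0: yhat +/- t_crit * s_e * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx)."""
    b0, b1, s_e, _, _ = simple_regression(x, y)
    n = len(x)
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x)
    yhat = b0 + b1 * x0
    half = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
    return yhat - half, yhat + half
```

With the worksheet columns loaded as `minutes`, `before`, and `after` (hypothetical names), `reduction = [b - a for b, a in zip(before, after)]` and `prediction_interval(minutes, reduction, 300, t_crit)` gives the interval for part (d).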
9) (Yes, this is real data!) Suppose we are interested in buying or selling products
through online auctions. What situations are good for buying? What situations are
good for selling?
To investigate this problem more rigorously, a researcher collected data on winning
bid prices for used computers purchased through online auctions. Over an
approximately three-month interval beginning in May 2002, 488 purchases of Dell’s
Latitude CPXH 500GT 500MHz 128MB laptop on eBay were recorded. Data
included (1) the winning bid for a particular auction, (2) the day of the week the
auction closed, (3) the number of bids in the auction, (4) the number of auctions that
closed that day for the same laptop, and (5) the rank of the auction within a day (the
order it closed among auctions for the same item). The worksheet “Problem 5 –
Ebay” contains the actual data.
The day of the week was coded with dummy variables. SUN = 1 if it was a Sunday
(0 otherwise), MON = 1 if it was a Monday (0 otherwise), etc. The Excel output for
the model is given below (NOTE: As additional practice, you should be able to
recreate this output!)
Regression Statistics
Multiple R            0.471383199
R Square              0.222202121
Adjusted R Square     0.207557391
Standard Error        31.55046086
Observations          488

ANOVA
              df    SS             MS             F          Significance F
Regression     9    135931.7025    15103.5225     15.17284   8.43639E-22
Residual     478    475816.2955    995.4315806
Total        487    611747.998

               Coefficients    Standard Error    t Stat          P-value     Lower 95%       Upper 95%
Intercept      558.693216      5.726602353       97.56102861     0           547.4407827     569.9456493
SUN            -4.294706396    5.260381458       -0.816424898    0.414664    -14.63104331    6.041630523
MON            9.906281109     5.670000132       1.747139485     0.081255    -1.234932159    21.04749438
TUES           17.39920411     5.252984387       3.312251251     0.000996    7.077401992     27.72100622
WED            15.38320751     5.471611839       2.811458115     0.005134    4.631815447     26.13459957
THUR           16.90919123     5.397031356       3.133054103     0.001836    6.304345388     27.51403708
FRI            10.42141417     5.090961399       2.047042465     0.0412      0.417977601     20.42485074
#Bids          1.52278322      0.291589103       5.222359836     2.64E-07    0.949827963     2.095738477
#AUCTIONS      -0.839917408    0.356722205       -2.354541982    0.018949    -1.54085534     -0.138979476
Rank-in-Day    -1.761991465    0.411566571       -4.281182168    2.25E-05    -2.570695316    -0.953287614

CORRELATION MATRIX
              SUN        MON        TUES       WED        THUR       FRI        #Bids      #AUCTIONS  Rank-in-Day
SUN           1         -0.148564  -0.168858  -0.160313  -0.166032  -0.185441   0.026809   0.005221   0.003083
MON          -0.148564   1         -0.147337  -0.139881  -0.144871  -0.161807   0.089396  -0.054888  -0.032406
TUES         -0.168858  -0.147337   1         -0.158989  -0.164661  -0.183910   0.048373   0.135914   0.080245
WED          -0.160313  -0.139881  -0.158989   1         -0.156328  -0.174603   0.018363  -0.111259  -0.065688
THUR         -0.166032  -0.144871  -0.164661  -0.156328   1         -0.180831  -0.009552  -0.138577  -0.081817
FRI          -0.185441  -0.161807  -0.183910  -0.174603  -0.180831   1         -0.015572  -0.048709  -0.028758
#Bids         0.026809   0.089396   0.048373   0.018363  -0.009552  -0.015572   1         -0.073642  -0.119202
#AUCTIONS     0.005221  -0.054888   0.135914  -0.111259  -0.138577  -0.048709  -0.073642   1          0.590409
Rank-in-Day   0.003083  -0.032406   0.080245  -0.065688  -0.081817  -0.028758  -0.119202   0.590409   1

INVERSE OF CORRELATION MATRIX
              SUN        MON        TUES       WED        THUR       FRI        #Bids      #AUCTIONS  Rank-in-Day
SUN           1.686544   0.768557   0.811538   0.812097   0.833053   0.884575  -0.135624   0.171282  -0.015765
MON           0.768557   1.601057   0.744962   0.754937   0.774085   0.817335  -0.180339   0.212787  -0.020963
TUES          0.811538   0.744962   1.662093   0.779789   0.796795   0.856284  -0.160760   0.048830  -0.018687
WED           0.812097   0.754937   0.779789   1.672419   0.826822   0.868784  -0.11775    0.27359   -0.013688
THUR          0.833053   0.774085   0.796795   0.826822   1.712525   0.893256  -0.094042   0.302189  -0.010932
FRI           0.884575   0.817335   0.856284   0.868784   0.893256   1.776267  -0.102422   0.230314  -0.011906
#Bids        -0.135624  -0.180339  -0.160760  -0.11775   -0.094042  -0.102422   1.040647  -0.013248   0.120969
#AUCTIONS     0.171282   0.212787   0.048830   0.27359    0.302189   0.230314  -0.013248   1.62273   -0.907885
Rank-in-Day  -0.015765  -0.020963  -0.018687  -0.013688  -0.010932  -0.011906   0.120969  -0.907885   1.549176
(a) Do the variables included in the model collectively explain a significant
amount of the variation in winning bids? Cite the appropriate test, your test
statistic, and your conclusion at the .05 significance level. What is your p-value?
(b) What is the price difference between a Dell laptop auctioned on Saturday and
one auctioned on Sunday (all other things held equal)? Is this difference
statistically significant? Cite your null and alternative hypothesis, the relevant
test statistic, and your conclusion at the .05 significance level. What is the p-value?
(c) Suppose you are interested in whether it is better to auction laptops on
weekdays or weekends (all other things being equal). What is your general
conclusion based on this model? What day would you auction your Dell laptop
on (ceteris paribus)? What day would you buy one on (ceteris paribus)?
(d) Is there a relationship between the winning bid price and the auction’s position
(rank-in-day) in this particular model? Cite an appropriate test, your test statistic,
and your conclusion (at the .05 significance level). What is the p-value for this test?
(e) Suppose you wanted to test whether selling laptops on a Tuesday is
significantly different (at level .05) than selling on a Monday (ceteris paribus).
Construct an appropriate model using the data in the attached Excel file. Write
down the formal test and your results/conclusions.
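Several quantities in the Problem 9 output can be recomputed by hand as a check. The sketch below copies numbers from the Excel output and recomputes the ANOVA mean squares, the overall F statistic for part (a), R², adjusted R², the standard error, and the t statistic of the SUN coefficient for part (b). Saturday has no dummy variable, so it is the baseline day and the SUN coefficient measures Sunday minus Saturday. The p-values themselves come from the F(9, 478) and t(478) distributions and are already printed in the output. For part (e), note that the output does not report the covariance between the TUES and MON coefficient estimates, which is why refitting with Monday as the baseline is the practical route.

```python
# Values copied from the Excel regression output for Problem 9.
ss_reg, ss_res = 135931.7025, 475816.2955
df_reg, df_res = 9, 478
n = 488

ms_reg = ss_reg / df_reg                    # mean square for regression
ms_res = ss_res / df_res                    # mean square error
f_stat = ms_reg / ms_res                    # part (a): overall F test
r2 = ss_reg / (ss_reg + ss_res)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - df_reg - 1)
std_err = ms_res ** 0.5                     # standard error of the regression

# Part (b): SUN coefficient = Sunday minus Saturday, other regressors held fixed.
b_sun, se_sun = -4.294706396, 5.260381458
t_sun = b_sun / se_sun

print(f"F = {f_stat:.4f}, R^2 = {r2:.6f}, adj R^2 = {adj_r2:.6f}, s = {std_err:.4f}")
print(f"t(SUN) = {t_sun:.4f}")
```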
10) (Yes, this is real data!) Relief pitchers are baseball’s equivalent of place kickers in
the NFL. You bring in some poor sap with the game on the line and he’s either a
forgotten hero or a memorable goat. Many great relief pitchers do not have much on
their record in the way of wins or losses since their role is to save games, i.e., protect
a lead in the late innings.
If you want to know what the “experts” think, CBS Sportsline.Com (September 24,
2002) posts ratings for the majority of MLB relief pitchers. Along with the pitchers’
ratings they post assorted “hard data” on performance. The site does not include any
information on how they arrive at their expert ratings. The worksheet
“Problem 6 – Relief Pitcher” contains the actual data.
(a) Are “wins” related to ratings? Build a simple linear regression model and
discuss your results.
(b) There appears to be a lot of residual “noise” in the data. Suppose we include
other variables to account for this noise. Are wins related to ratings in this new
model?