LAST NAME (Please Print): KEY

FIRST NAME (Please Print):

HONOR PLEDGE (Please Sign):

Statistics 111

Midterm 4

This is a closed book exam.

You may use your calculator and a single page of notes.

The room is crowded. Please be careful to look only at your own exam. Try to sit one seat apart; the proctors may ask you to randomize your seating a bit.

Report all numerical answers to at least two correct decimal places or (when appropriate) write them as a fraction.

All question parts count for 1 point.

1. Captain Hornblower is sailing from Bombay to London. He can either sail through the Suez canal or around the Cape of Good Hope. The Suez route risks capture by Somali pirates; this happens with probability 1/80 and will cost a random ransom that has a uniform distribution between $2 million and $4 million. If he travels around the Cape, the longer trip will cost him $30,000.

What is the expected loss from taking the Suez route?

Answer: $37,500

The mean ransom is the midpoint of the uniform range, $3 million, so the expected loss is (1/80) E[ransom] = $37,500.

What is his best choice?

Answer: Cape

The certain $30,000 cost of the Cape route is less than the $37,500 expected loss on the Suez route.
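The comparison between the two routes can be checked numerically. This is only a minimal sketch; the variable names are illustrative and not part of the exam.

```python
# Expected loss on the Suez route: P(capture) * E[ransom].
p_capture = 1 / 80
ransom_low, ransom_high = 2_000_000, 4_000_000

# The mean of a uniform distribution is the midpoint of its range.
expected_ransom = (ransom_low + ransom_high) / 2
expected_suez_loss = p_capture * expected_ransom

# The Cape route has a certain cost, so compare it with the expected loss.
cape_cost = 30_000
best_route = "Cape" if cape_cost < expected_suez_loss else "Suez"
print(expected_suez_loss, best_route)
```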

2. Suppose the duration of a summer romance has density \( f(x) = \theta x^{\theta - 1} \) for \( 0 < x < 1 \) and \( \theta > 0 \), and is 0 otherwise. (Here, time is scaled so that a full summer has duration 1.)

What is the survival function?

Answer: \( S(x) = 1 - x^{\theta} \)

The cdf is \( F(x) = x^{\theta} \), and \( S(x) = 1 - F(x) \).

What is the hazard function?

Answer: \( h(x) = \theta x^{\theta - 1} / (1 - x^{\theta}) \)

The hazard function is \( f(x) / S(x) \).

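What is the shape of the failure rate?

Answer: increasing

Take the derivative of the hazard function, using the quotient rule, to get

\[ h'(x) = \frac{\theta(\theta - 1) x^{\theta - 2} (1 - x^{\theta}) + \theta^{2} x^{2(\theta - 1)}}{(1 - x^{\theta})^{2}}. \]

You can rearrange the numerator as \( \theta x^{\theta - 2} (\theta - 1 + x^{\theta}) \) to show this is positive (it is for every \( \theta \ge 1 \)), or just plug in a few values.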
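The "plug in a few values" check can be sketched as follows. The value of theta and the grid of x values are assumptions chosen only for illustration; the exam does not fix them.

```python
def hazard(x, theta):
    """Hazard h(x) = f(x) / S(x) for the density theta * x**(theta - 1) on (0, 1)."""
    density = theta * x ** (theta - 1)
    survival = 1 - x ** theta
    return density / survival

theta = 2.0  # assumed value, for illustration only
grid = [i / 10 for i in range(1, 10)]  # x = 0.1, 0.2, ..., 0.9
values = [hazard(x, theta) for x in grid]

# The hazard is increasing on the grid if each value exceeds the previous one.
is_increasing = all(a < b for a, b in zip(values, values[1:]))
print(is_increasing)
```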

You observe three summer romances, which lasted 1/2, 3/4, and 2/3 of the summer. What is your maximum likelihood estimate for \( \theta \)?

Answer: 2.16

The likelihood is \( f(x_1, x_2, x_3; \theta) = \theta^{3} \prod_{i=1}^{3} x_i^{\theta - 1} \), so the log-likelihood is \( 3 \ln \theta + (\theta - 1) \sum_{i=1}^{3} \ln x_i \). Taking the derivative, setting it to 0, and solving shows \( \hat{\theta} = -3 / \sum_{i=1}^{3} \ln x_i \), or 2.1640 for these numbers.
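The closed-form estimate can be evaluated directly for the three observed durations. This is a minimal sketch with illustrative variable names.

```python
import math

# Observed durations, as fractions of a summer.
durations = [1 / 2, 3 / 4, 2 / 3]

# Setting the derivative of the log-likelihood to zero gives
# theta_hat = -n / sum(ln x_i).
sum_log = sum(math.log(x) for x in durations)
theta_hat = -len(durations) / sum_log
print(round(theta_hat, 4))
```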

3. You can invest \( d \) dollars in fixing up a house to sell. You believe that the sale price of the fixed-up house has an exponential distribution with parameter \( \lambda = (5d)^{-1} \), where \( d \) is the amount you invest.

What is your net expected profit from investing $10K?

Answer: $40K

The expected profit is your average sale price minus what you invest. For an exponential with \( \lambda = 1/(5d) \), the mean is \( 5d \), but you had to spend \( d \). So the profit is \( 4d \).

If your utility for money is linear and you have $30K in the bank, how much should you invest?

Answer: $30K

You should invest all you have, because the expected profit \( 4d \) increases with \( d \).
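A short sketch of the same reasoning, under the assumption that linear utility means maximizing expected net profit. The function name and the grid of candidate investments are hypothetical.

```python
def expected_net_profit(d):
    """Mean sale price minus the investment d, for an exponential with rate 1 / (5 d)."""
    mean_sale_price = 5 * d  # the mean of an exponential is 1 / rate
    return mean_sale_price - d

# Candidate investments up to the $30K available (grid chosen for illustration).
candidates = range(0, 30_001, 5_000)
best_investment = max(candidates, key=expected_net_profit)
print(expected_net_profit(10_000), best_investment)
```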

4. Describe how you would use 2-fold cross-validation to assess the predictive accuracy of a nonparametric regression. (Anthony: Please print neatly.) (6 points)

You randomly divide the training sample into two equal portions. (1 pt.) You fit the nonparametric regression to the first portion (1 pt.) and use it to predict the other portion. (1 pt.) You calculate the average squared error in those predictions. (1 pt.) Then you reverse the portions and repeat the process. (1 pt.) Finally, you average the two averages to estimate your error on a new observation. (1 pt.)
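The procedure can be sketched in code. The exam does not name a particular nonparametric regression, so this example assumes a k-nearest-neighbor average as a stand-in; the data are synthetic and invented for illustration, and all function names are hypothetical.

```python
import random

def knn_predict(train, x0, k=3):
    """Predict y at x0 as the mean response of the k nearest training points."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x0))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_squared_error(train, test, k=3):
    """Average squared prediction error on `test` for a fit based on `train`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in test) / len(test)

def two_fold_cv(data, k=3, seed=0):
    """Estimate the prediction error by 2-fold cross-validation."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)            # random split into two portions
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    error_1 = mean_squared_error(first, second, k)   # fit on first, predict second
    error_2 = mean_squared_error(second, first, k)   # reverse the portions
    return (error_1 + error_2) / 2                   # average the two averages

# Synthetic data (an assumption for illustration): y = x^2 plus Gaussian noise.
rng = random.Random(1)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.1)) for x in range(40)]
cv_error = two_fold_cv(data)
print(cv_error)
```

Fixing the seed makes the random split reproducible, so repeated calls return the same estimate.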

5. You want to compare salaries for new graduates among Harvard, Yale, Princeton, and Duke. To control for effects due to major, you pick one person from each school in each of five majors: Statistics, Economics, Computer Science, English, and Biology.

Source   df    SS     MS      F
School    3    40    13.33   0.44
Major     4   200    50      1.67
Error    12   360    30
Total    19   600

Complete the table above (10 points).

In words specific to the problem, what is the appropriate null hypothesis regarding school?

There are no differences in average starting salaries among these schools.

What is the critical value for a 0.05 level test on school?

Answer: 3.49

It is \( F_{3, 12, 0.05} = 3.49 \).

In words specific to the problem, what is the conclusion for a 0.05 level test regarding school?

We see no evidence that average starting salaries are different among these schools.

Was it useful to use blocks in this problem?

Answer: No, because the block effects, i.e., the majors, were also not significant.

Assume that one should not have used blocks. Write the one-way ANOVA table for the situation above without blocking (12 points).

Source   df    SS     MS      F
School    3    40    13.33   0.38
Error    16   560    35
Total    19   600
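The remaining entries of both tables follow from the sums of squares and degrees of freedom by subtraction and division. The sketch below starts from the School, Major, and Total values as they appear in the completed tables; which of them were printed on the original exam is not known, and the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the completed tables.
ss_school, df_school = 40, 3
ss_major, df_major = 200, 4
ss_total, df_total = 600, 19

# Randomized block design: schools are treatments, majors are blocks.
ss_error = ss_total - ss_school - ss_major
df_error = df_total - df_school - df_major
ms_error = ss_error / df_error
f_school = (ss_school / df_school) / ms_error
f_major = (ss_major / df_major) / ms_error

# One-way ANOVA: the block sum of squares and df are pooled into error.
ss_error_oneway = ss_total - ss_school
df_error_oneway = df_total - df_school
ms_error_oneway = ss_error_oneway / df_error_oneway
f_school_oneway = (ss_school / df_school) / ms_error_oneway

print(round(f_school, 2), round(f_major, 2), round(f_school_oneway, 2))
```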

6. Which model is better for the lifespan of a rabbit in the wild: (A) competing risks or (B) a Cox proportional hazards model?

Answer: A

Eventually, something will eat the rabbit.

7. You fit a Cox proportional hazards model to the lifespans of companies. The two covariates (measured in millions) are Liquidity, with coefficient 10, and CEO Pay, with coefficient -12. Suppose Exxon (in the numerator) has $8M and $7M for Liquidity and CEO Pay, respectively, while Walmart has $6M and $5M. What is the hazard ratio?

Answer: 0.02

\[ \frac{\exp(10 \cdot 8 - 12 \cdot 7)}{\exp(10 \cdot 6 - 12 \cdot 5)} = \exp(-4) = 0.0183. \]

For the Capital S, lower-case m version of the exam, the answer is \( \exp(32) \).

Which company will probably last longer?

Answer: Exxon

For the Capital S, lower-case m version, it is Walmart.
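The hazard ratio for the main version of the question can be reproduced as below. This is a minimal sketch; the dictionary keys and the helper function are hypothetical names, not part of the exam.

```python
import math

# Cox model coefficients and covariate values (in millions) from the question.
coefficients = {"liquidity": 10, "ceo_pay": -12}
exxon = {"liquidity": 8, "ceo_pay": 7}
walmart = {"liquidity": 6, "ceo_pay": 5}

def linear_predictor(covariates):
    """Sum of coefficient * covariate; the baseline hazard cancels in the ratio."""
    return sum(coefficients[name] * value for name, value in covariates.items())

# Exxon is in the numerator, so a ratio below 1 means Exxon has the lower hazard.
hazard_ratio = math.exp(linear_predictor(exxon) - linear_predictor(walmart))
print(round(hazard_ratio, 4))
```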

8. You observe the following IQs for random Duke students in three different majors.

Statistics   Economics   English
120          115         105
122          113         107
121          110         110
115

Suppose you do an analysis of variance and that the mean squared error is 5 and that the test is significant at the 0.05 level. Now you want to find which majors are significantly different.

What is your critical value?

Answer: 2.36 or 2.37

\( t_{10 - 3;\, 0.025} = 2.36 \) or \( 2.37 \).

What is the value of your test statistic for comparing Statistics and English majors?

Answer: 7.12

The test statistic is

\[ \frac{\bar{X}_{S} - \bar{X}_{En}}{\sqrt{MSE \, (1/n_{S} + 1/n_{En})}}. \]
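The test statistic can be evaluated as sketched below. The grouping of the IQ values is an assumption: it reads the data table column by column, giving Statistics four observations and English three, which reproduces the keyed answer of 7.12. The mean squared error of 5 is the value the question tells you to suppose.

```python
import math

# Assumed grouping of the observations (inferred from the table layout).
statistics = [120, 122, 121, 115]
english = [105, 107, 110]
mse = 5  # value given in the question, not computed from the data

mean_diff = sum(statistics) / len(statistics) - sum(english) / len(english)
standard_error = math.sqrt(mse * (1 / len(statistics) + 1 / len(english)))
t_statistic = mean_diff / standard_error
print(round(t_statistic, 2))
```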

9. In order to pass this test, you need to know probability and statistics and have a working calculator. The chance that you know probability is 0.8, the chance that you know statistics is 0.9, and you brought two calculators to class: one fails with probability 0.7 and the other fails with probability 0.6. What is the probability that you pass?

Answer: 0.42

Knowing probability, knowing statistics, and having a working calculator form a series system, and the two calculators form a parallel subsystem within it, so the probability is 0.8 * 0.9 * (1 - 0.7 * 0.6) = 0.4176.
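The series and parallel calculation can be written out step by step. Like the formula in the answer, this sketch assumes the components work or fail independently; the variable names are illustrative.

```python
# Series components: both kinds of knowledge are required.
p_know_probability = 0.8
p_know_statistics = 0.9

# Parallel subsystem: you need at least one working calculator,
# so the subsystem fails only if both calculators fail.
p_fail_calculator_1 = 0.7
p_fail_calculator_2 = 0.6
p_working_calculator = 1 - p_fail_calculator_1 * p_fail_calculator_2

p_pass = p_know_probability * p_know_statistics * p_working_calculator
print(round(p_pass, 4))
```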

10. List all, and only, the true statements. (10 pts.)

Answer: A, D, F, G, H, I

A. Multicollinearity occurs when two explanatory variables are strongly correlated.

B. Interactions in regression are handled by taking the product of explanatory variables.

C. Increasing failure rates describe human lifespans.

D. Including irrelevant explanatory variables reduces predictive accuracy.

E. Interpolation is less reliable than extrapolation.

F. A running line smoother is more accurate than bin smoothing.

G. In high dimensions, it becomes hard for statistical tests to select a good model.

H. People underestimate common risks.

I. Humans frame risk perception in terms of dread and controllability.

J. People tend to have linear utility for money.
