ST102/Elementary Statistical Theory

Examiners’ commentaries 2015
ST102 Elementary Statistical Theory
General remarks
Learning outcomes
By the end of this module you should:
• be able to summarise the ideas of randomness and variability, and the way in which these link to
probability theory to allow the systematic and logical collection of statistical techniques of great
practical importance in many applied areas
• be competent users of standard statistical operators and be familiar with a variety of well-known
distributions and their moments
• understand the fundamentals of statistical inference and be able to apply these principles to
choose an appropriate model and test in a number of different settings
• recognise that statistical techniques are based on assumptions and in the analysis of real
problems the plausibility of these assumptions must be thoroughly checked.
Examination structure
You have three hours to complete this paper, which is in two parts: Sections A and B. Answer both
questions from Section A and three questions from Section B. All questions carry equal weight of
20 marks each. Candidates who submit answers to more than three Section B questions will only
have their best three answers counted towards the final mark.
What are the Examiners looking for?
The Examiners are looking for you to demonstrate command of the course material. Although the
final solution is desirable, the Examiners are more interested in how you approach each solution;
as such, most marks are awarded for the ‘method steps’. They want to be sure that you:
• have covered the syllabus
• know the various definitions and concepts covered throughout the year and can apply them as
appropriate to examination questions
• understand and answer the questions set.
You are not expected to write long essays where explanations or descriptions are required, and
note-form answers are acceptable. However, clear and accurate language, both mathematical and
written, is expected and marked.
Key steps to improvement
The most important thing you can do is answer the question set! This may sound very simple, but
some candidates failed to do exactly that. Remember:
• Always show your working. The bulk of the marks are awarded for your approach, rather
than the final answer.
• Write legibly!
• Keep solutions to the same question in one place. Avoid scattering your solutions
randomly throughout the answer booklet — the Examiners will not appreciate having to spend a
lot of time searching for different elements of your solutions.
• Where appropriate, underline your final answer.
• Do not waste time calculating things which are not required by the Examiners!
Using the commentary
We hope that you find the commentary useful. For each question and subquestion, it gives:
• the answers, or keys to the answers, which the Examiners were looking for
• common mistakes, as identified by the Examiners.
Student performance by question

Question   Number of attempts   Mean score   Std. deviation
1          621                  9.93         3.71
2          613                  12.32        5.80
3          507                  10.15        6.53
4          553                  13.39        3.87
5          342                  7.75         3.85
6          476                  13.53        5.68
7          319                  11.00        5.87

[Figure: boxplots of student performance by question, Q1–Q7, marks on a 0–20 scale.]

Dr James Abdey, ST102 Lecturer, July 2015
Specific comments on questions
Section A
Question 1
(a) Feedback on this question:
Most candidates performed strongly on this part of the question. However, some had difficulty
forming the correct distributions of Y and D in (i.) and (ii.) below. In (iv.), a common error was
failing to determine the correct sample space and associated probabilities. Follow-through marks
were awarded in (v.) as appropriate.
Full solutions are:
Let Xi ∼ N(150, 100) be the amount of coffee in cup i, and let Z ∼ N(0, 1).

i. By independence of the Xi s, Y = Σ_{i=1}^5 Xi ∼ N(750, 500). Hence:

   P(Y > 700) = P(Z > (700 − 750)/√500) = P(Z > −2.24) = 0.98745.
(3 marks)
ii. Let D = Xi − Xj for i ≠ j, then D ∼ N(0, 200). Hence:

   P(−20 < D < 20) = P(−20/√200 < Z < 20/√200) = P(−1.41 < Z < 1.41) = 0.8414.
(3 marks)
iii. We have:

   P(Xi < 137) = P(Z < (137 − 150)/√100) = P(Z < −1.30) = 0.0968.
(1 mark)
iv. There are two possible outcomes: no income from cup i with probability 0.0968, and £1
income with probability 1 − 0.0968 = 0.9032. Therefore, the expected income from cup i is:
0 × 0.0968 + 1 × 0.9032 = £0.9032 ≈ £0.90.
(3 marks)
v. Let N denote the number of cups with coffee below 137 ml among the 5 cups purchased,
then N ∼ Bin(5, 0.0968). Hence:
   P(N ≥ 1) = 1 − P(N = 0) = 1 − (0.9032)^5 = 0.3989.
(3 marks)
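The probabilities above can be cross-checked numerically; a minimal sketch using scipy.stats (the small discrepancies from the printed answers come from the solutions rounding z-values to two decimal places before using the tables):

```python
# Cross-check of the Question 1(a) probabilities with scipy.stats.
from math import sqrt
from scipy.stats import norm, binom

p_i = 1 - norm.cdf(700, loc=750, scale=sqrt(500))            # (i.)   P(Y > 700)
p_ii = norm.cdf(20 / sqrt(200)) - norm.cdf(-20 / sqrt(200))  # (ii.)  P(-20 < D < 20)
p_iii = norm.cdf(137, loc=150, scale=sqrt(100))              # (iii.) P(Xi < 137)
income = 0 * p_iii + 1 * (1 - p_iii)                         # (iv.)  expected income
p_v = 1 - binom.pmf(0, 5, p_iii)                             # (v.)   P(N >= 1)

print(round(p_i, 4), round(p_ii, 4), round(p_iii, 4),
      round(income, 4), round(p_v, 4))
```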
(b) Feedback on this question:
This question was deliberately challenging and aimed to assess a candidate’s ability to reason
through an unseen problem. Indeed, the subject matter should be of great interest to all taking
the course, given the graduate selection theme! The probabilities for i = 1, 2 and n should have
proved straightforward for all. The general expression required a candidate to recognise that
classical probability is appropriate.
Full solutions are:
Let:

   πi = P(the best candidate is hired | the hiring occurs in the ith interview).

With n candidates, after i − 1 rejections there are n − (i − 1) = n − i + 1 remaining candidates.
Since candidates are selected for interview in a random order, for the best candidate to be hired
in the ith interview, s/he needs to be chosen from the remaining applicants, which occurs with
probability:

   πi = 1/(n − i + 1).

Hence:
• πi = 1/n for i = 1
• πi = 1/(n − 1) for i = 2
• πi = 1/(n − k + 1) for i = k
• πi = 1 for i = n.
(7 marks)
Question 2
(a) Feedback on this question:
Overall, candidates knew the concepts but algebraic mistakes were quite common. In (i.), errors
included:
• assuming X1 = 1, X2 = 2, . . . , Xn = n
• mixing up the product of the Xi s with the sum of the Xi s
• algebraic slips when solving for the estimator, resulting in an expression in terms of X̄.
Full solutions are:
i. Since the Xi s are independent and identically distributed, the likelihood function is:

   L(θ) = ∏_{i=1}^n (θ/(2√Xi)) e^{−θ√Xi} = (θ^n / (2^n ∏_{i=1}^n √Xi)) e^{−θ Σ_{i=1}^n √Xi}.

Hence the log-likelihood function is:

   l(θ) = ln L(θ) = n ln θ − θ Σ_{i=1}^n √Xi − ln(2^n ∏_{i=1}^n √Xi).

Differentiating with respect to θ, we obtain:

   dl(θ)/dθ = n/θ − Σ_{i=1}^n √Xi.

Equating to zero, we obtain the maximum likelihood estimator:

   θ̂ = n / Σ_{i=1}^n √Xi.
(8 marks)
ii. The maximum likelihood estimator of θ/2 is θ̂/2.
For the given data, the point estimate is:

   θ̂/2 = 4 / (2(√4.1 + √7.3 + √6.5 + √8.8)) = 0.1953.
(2 marks)
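The closed form θ̂ = n/Σ√Xi can be verified by maximising the log-likelihood numerically; a minimal sketch using the given data:

```python
# Maximise l(θ) = n ln θ − θ Σ√xi numerically and compare with the
# closed-form MLE derived above, θ̂ = n / Σ√xi.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([4.1, 7.3, 6.5, 8.8])
n, s = len(x), np.sqrt(x).sum()

theta_closed = n / s   # closed-form MLE

res = minimize_scalar(lambda t: -(n * np.log(t) - t * s),
                      bounds=(1e-6, 10.0), method="bounded")

print(round(theta_closed / 2, 4))   # point estimate of θ/2
```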
(b) Feedback on this question:
This question required material directly from the lectures. It was noticeable that many candidates
had memorised the bias, and some stated the result without proof.
Full solutions are:
i. Since σ² = µ₂ − µ₁² = E(X²) − (E(X))², then:

   σ̂² = M₂ − M₁² = (1/n) Σ_{i=1}^n Xi² − X̄² = (1/n) Σ_{i=1}^n (Xi − X̄)².
(4 marks)
ii. We have:

   E(σ̂²) = E(X²) − E(X̄²) = (σ² + µ²) − (σ²/n + µ²) = ((n − 1)/n) σ².

The bias is E(σ̂²) − σ² = −σ²/n.
(5 marks)
iii. The sample variance:

   S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)²

is a more frequently-used estimator for σ², due to it being an unbiased estimator, i.e.
E(S²) = σ².
(1 mark)
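The bias result can be illustrated by simulation; a minimal sketch (the sample size n = 5, σ² = 4 and number of replications are arbitrary choices for illustration):

```python
# Illustration of the bias: with divisor n the estimator of σ² has
# expectation (n−1)σ²/n, while S² (divisor n−1) is unbiased.
import numpy as np

rng = np.random.default_rng(42)
n, sigma2, reps = 5, 4.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
biased = samples.var(axis=1, ddof=0)     # divisor n: estimates (n−1)σ²/n = 3.2
unbiased = samples.var(axis=1, ddof=1)   # divisor n−1: estimates σ² = 4

print(round(biased.mean(), 2), round(unbiased.mean(), 2))
```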
Section B
Question 3
(a) Feedback on this question:
In (i.), some candidates did not know the definition of the cdf, for example using an incorrect
integration interval or applying a discrete format, and some calculations were incorrect. In (ii.),
some candidates did not know the definition of expectation, with around half making calculation
errors.
Full solutions are:
i. We have:

   F(x) = ∫_{−∞}^{x} f(t) dt = ∫_{k}^{x} (α k^α / t^{α+1}) dt = (−k^α) ∫_{k}^{x} (−α) t^{−α−1} dt
        = (−k^α) [t^{−α}]_{k}^{x} = (−k^α)(x^{−α} − k^{−α}) = 1 − k^α x^{−α} = 1 − (k/x)^α.

Therefore:

   F(x) = 0 for x < k, and F(x) = 1 − (k/x)^α for x ≥ k.
(5 marks)
ii. We have:

   E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_{k}^{∞} x · (α k^α / x^{α+1}) dx = ∫_{k}^{∞} α k^α / x^α dx
        = (αk/(α − 1)) ∫_{k}^{∞} ((α − 1) k^{α−1} / x^{(α−1)+1}) dx = αk/(α − 1)   (if α > 1)

since the final integrand is a density (with parameters k and α − 1), hence the integral equals 1.
(7 marks)
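Both results can be checked by numerical integration; a minimal sketch, where k = 2 and α = 3 are arbitrary illustrative values:

```python
# Numerical check for the density f(x) = αk^α/x^(α+1), x ≥ k:
# the cdf is 1 − (k/x)^α and, for α > 1, E(X) = αk/(α − 1).
import numpy as np
from scipy.integrate import quad

k, alpha = 2.0, 3.0
pdf = lambda t: alpha * k**alpha / t**(alpha + 1)

x = 5.0
F_num, _ = quad(pdf, k, x)                       # cdf by integration
assert abs(F_num - (1 - (k / x)**alpha)) < 1e-8  # = 1 − (2/5)³ = 0.936

EX_num, _ = quad(lambda t: t * pdf(t), k, np.inf)
assert abs(EX_num - alpha * k / (alpha - 1)) < 1e-6   # = 3·2/2 = 3
```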
(b) Feedback on this question:
The most frequent mistake made by candidates was being unable to split the integral into two
parts.
Full solutions are:
We have:

   M_X(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx = ∫_{−∞}^{∞} e^{tx} (1/2) e^{−|x|} dx
          = ∫_{−∞}^{0} (1/2) e^{(t+1)x} dx + ∫_{0}^{∞} (1/2) e^{(t−1)x} dx
          = [e^{(t+1)x}/(2(t + 1))]_{−∞}^{0} + [e^{(t−1)x}/(2(t − 1))]_{0}^{∞}
          = −2/(2(t + 1)(t − 1))
          = 1/(1 − t²)

provided |t| < 1.
(8 marks)
Question 4
(a) Feedback on this question:
Candidates performed very well on this question, although the correct variance in (i.) and
degrees of freedom in (ii.) proved problematic for some. The values of k in the remaining parts
were generally free of errors.
Full solutions are:
i. ai Zi ∼ N(0, ai²), hence:

   Σ_{i=1}^5 ai Zi ∼ N(0, Σ_{i=1}^5 ai²).

ii. Zi² ∼ χ²₁, hence:

   Σ_{i=1}^5 Zi² ∼ χ²₅.

iii. kZ1 + Z2 ∼ N(0, k² + 1), hence:

   P(kZ1 + Z2 ≤ 4) = P(Z ≤ 4/√(k² + 1)) = 0.8413.

From tables, P(Z ≤ 1) = 0.8413, hence:

   4/√(k² + 1) = 1  ⇔  k = √15.

iv. X = Z1² + Z2² ∼ χ²₂, hence:

   P(X ≤ 7.378) = 0.975 = k.

v. V = Z1² + Z2² + Z3² ∼ χ²₃ and W = Z4² + Z5² ∼ χ²₂. Therefore:

   P(V ≤ kW) = P((V/3)/(W/2) ≤ 2k/3) = 0.95.

From tables, P(F ≤ 19.2) = 0.95 where F ∼ F₃,₂, hence:

   2k/3 = 19.2  ⇔  k = 28.8.
(10 marks)
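The constants in (iii.)–(v.) can be cross-checked with scipy.stats (the table values used in the solutions are rounded to two or three significant figures, so the software answers differ slightly):

```python
# Cross-check of the part (a) constants with scipy.stats.
from math import sqrt
from scipy.stats import norm, chi2, f

# iii. With k = √15, 4/√(k² + 1) = 1, so the probability is Φ(1).
k3 = sqrt(15)
p3 = norm.cdf(4 / sqrt(k3**2 + 1))

# iv. k = P(X ≤ 7.378) for X ~ χ² with 2 degrees of freedom.
k4 = chi2.cdf(7.378, df=2)

# v. 2k/3 equals the upper 5% point of F(3, 2), so k = 1.5 × that point.
k5 = 1.5 * f.ppf(0.95, dfn=3, dfd=2)

print(round(p3, 4), round(k4, 4), round(k5, 2))
```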
(b) Feedback on this question:
This was a fairly easy question making use of known properties of a probability function.
Full solutions are:
i. Let the other value be θ, then:

   E(Y) = Σ_y y P(Y = y) = (θ × 0.7) + (1 × 0.2) + (2 × 0.1) = 0

hence θ = −4/7.

ii. Var(Y) = E(Y²) − (E(Y))² = E(Y²), since E(Y) = 0. So:

   Var(Y) = E(Y²) = Σ_y y² P(Y = y) = ((−4/7)² × 0.7) + (1² × 0.2) + (2² × 0.1) = 0.8286.
(4 marks)
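The arithmetic can be reproduced exactly with rational arithmetic; a minimal sketch:

```python
# Exact check of part (b): with the missing value θ = −4/7 the mean is 0
# and the variance is E(Y²) = 29/35 ≈ 0.8286.
from fractions import Fraction

theta = Fraction(-4, 7)
dist = [(theta, Fraction(7, 10)), (1, Fraction(2, 10)), (2, Fraction(1, 10))]

mean = sum(y * p for y, p in dist)
var = sum(y**2 * p for y, p in dist) - mean**2

print(mean, float(var))
```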
(c) Feedback on this question:
This proved the most challenging part of Question 4. The main difficulty seemed to be
understanding that a loss is simply a negative profit. The fact that the gambler bets on red 100
times means the sample size is sufficient to apply the central limit theorem.
Full solutions are:
We have:

   E(X) = 5 × 18/38 + (−5) × 20/38 = −10/38 = −0.2632

and:

   Var(X) = 25 × 18/38 + 25 × 20/38 − (−10/38)² = 24.9308.

We require:

   P(Y = Σ_{i=1}^{100} Xi > −50) ≈ P(Z > (−50 − 100 × (−0.2632))/√(100 × 24.9308))
                                 = P(Z > −0.47) = 0.6808.
(6 marks)
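A Monte Carlo simulation of the 100 bets gives a useful sanity check; a minimal sketch. Note the simulated probability lands somewhat below the two-decimal CLT value 0.6808, largely because the normal approximation above is applied without a continuity correction to what is really a discrete (binomial) total:

```python
# Monte Carlo check of the roulette calculation: each of 100 bets wins £5
# with probability 18/38 and loses £5 otherwise; estimate P(total > −50).
import numpy as np

rng = np.random.default_rng(2015)
reps, bets = 200_000, 100

wins = rng.binomial(bets, 18 / 38, size=reps)
profit = 5 * wins - 5 * (bets - wins)     # net profit over 100 bets

p_sim = (profit > -50).mean()
print(round(p_sim, 3))
```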
Question 5
Feedback on this question:
Candidates usually score highly on questions involving discrete bivariate distributions; however,
this year aggregate performance was below average. The original aspect of the question
concerned the use of a parameter θ in the table of joint probabilities, with the determination of
the parameter space required in (a). Of course, joint probabilities must be non-negative and
cannot exceed 1, so this needed to be recognised to determine the range of θ. Part (b) was
generally done well. Part (c) required some thought, but candidates needed to remember that an
estimator must be a statistic, i.e. a known function of the data – here Y and X for the two
estimators, respectively. Part (d) attempts were generally fine, while (e) required consideration of
the range of θ derived in (a).
Full solutions are:
(a) All values in the table should be in [0, 1] which is equivalent to θ ∈ [−0.1, 1/30].
(3 marks)
(b) For X:

   x            −1          0           1
   P(X = x)     0.3 + 3θ    0.4 − 6θ    0.3 + 3θ

We have:

   E(X) = −(0.3 + 3θ) + (0.3 + 3θ) = 0.

For Y:

   y            −1          0           1
   P(Y = y)     0.5 + 4θ    0.3 − 5θ    0.2 + θ

We have:

   E(Y) = −(0.5 + 4θ) + (0.2 + θ) = −0.3 − 3θ.
(4 marks)
(c) Since E(Y) = −0.3 − 3θ, an unbiased estimator of θ based on Y is:

   θ̂(Y) = −0.1 − Y/3.

Since E(X) = 0, we need to consider |X| in order to obtain an estimator as a function of X. An
unbiased estimator would be:

   θ̂(X) = −0.1 + |X|/6.
(5 marks)
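Unbiasedness of both estimators can be checked exactly over the marginal distributions; a minimal sketch, evaluating E[θ̂] for several θ in the parameter space [−0.1, 1/30]:

```python
# Exact check that both estimators are unbiased: for each θ,
# E[θ̂] computed over the marginal distribution equals θ.
support = (-1, 0, 1)

def pX(x, th):
    return {-1: 0.3 + 3*th, 0: 0.4 - 6*th, 1: 0.3 + 3*th}[x]

def pY(y, th):
    return {-1: 0.5 + 4*th, 0: 0.3 - 5*th, 1: 0.2 + th}[y]

for th in (-0.1, -0.05, 0.0, 1/30):
    EY_est = sum((-0.1 - y/3) * pY(y, th) for y in support)   # θ̂(Y)
    EX_est = sum((-0.1 + abs(x)/6) * pX(x, th) for x in support)  # θ̂(X)
    assert abs(EY_est - th) < 1e-9 and abs(EX_est - th) < 1e-9
```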
(d) Since E(X) = 0, it holds that:

   Cov(U, X) = E(UX) − E(U) E(X) = E(UX).

As X and Y take values in {−1, 0, 1}, the random variable UX takes values in {−1, 0, 1} as well.
We have:

   P(UX = 1) = P(X = 1, U = 1) + P(X = −1, U = −1) = P(X = 1, Y = 1) + P(X = −1) = 0.3 + 3θ

and:

   P(UX = −1) = P(X = 1, U = −1) = P(X = 1, Y = −1) = 0.3 + 3θ.

It follows that E(UX) = 0, hence Cov(U, X) = 0.
(4 marks)
(e) The fact that Cov(U, X) = 0 is only sufficient to show that U and X are uncorrelated, not that
they are independent.
U and X are independent only when θ = −0.1.
When θ = −0.1, we have that P(X = 0) = 1, from which it readily follows that X and U are
independent.
When θ ≠ −0.1, then P(U = 0, X = −1) = 0, however:

   P(U = 0) P(X = −1) = (0.3 − 6θ)(0.3 + 3θ) ≠ 0

since θ ≤ 1/30.
One could also argue that since U = min(X, Y), then U is a function of X so they cannot be
independent. This would be correct if we were sure that X cannot be a constant, as is the case
when θ ≠ −0.1.
(4 marks)
Question 6
(a) Feedback on this question:
Many candidates mixed up the F and t distributions, stating that the t distribution had a pair of
degrees of freedom.
Full solutions are:
X̄, Ȳ, S²_X and S²_Y are independent. X̄ ∼ N(µ_X, σ²_X/n), Ȳ ∼ N(µ_Y, σ²_Y/m),
(n − 1)S²_X/σ²_X ∼ χ²_{n−1} and (m − 1)S²_Y/σ²_Y ∼ χ²_{m−1}.

Let σ²_X = σ²_Y = σ². Hence X̄ − Ȳ ∼ N(µ_X − µ_Y, σ²(1/n + 1/m)) and:

   ((n − 1)S²_X + (m − 1)S²_Y)/σ² ∼ χ²_{n+m−2}.

Writing δ₀ for the hypothesised value of µ_X − µ_Y, we have:

   [X̄ − Ȳ − (µ_X − µ_Y)]/√(σ²(1/n + 1/m)) ÷ √[(((n − 1)S²_X + (m − 1)S²_Y)/σ²)/(n + m − 2)]
   = (X̄ − Ȳ − δ₀)/√(1/n + 1/m) · √((n + m − 2)/((n − 1)S²_X + (m − 1)S²_Y)) ∼ t_{n+m−2}.

The t_{n+m−2} distribution arises due to the division of a standard normal random variable by the
square root of an independent chi-squared random variable divided by its degrees of freedom.
(10 marks)
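The algebra can be checked against software; a minimal sketch, assuming δ₀ = 0 and arbitrary simulated data, which builds the statistic by hand and compares it with scipy's equal-variance two-sample t-test:

```python
# Pooled two-sample t statistic (δ0 = 0) versus scipy.stats.ttest_ind
# with equal_var=True, on arbitrary simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(5.0, 2.0, size=10)
y = rng.normal(5.0, 2.0, size=12)
n, m = len(x), len(y)

pooled = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
t_manual = (x.mean() - y.mean()) / np.sqrt(pooled * (1/n + 1/m))

t_scipy, p_scipy = stats.ttest_ind(x, y, equal_var=True)
assert abs(t_manual - t_scipy) < 1e-12
```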
(b) Feedback on this question:
This question was generally done well, although common errors included using the test statistic
in (a) and using the t table to look up critical values.
Full solutions are:
We test H₀: σ₁² = σ₂² vs. H₁: σ₁² ≠ σ₂². Under H₀, the test statistic is:

   T = S₁²/S₂² ∼ F_{n−1, m−1} = F_{7, 8}.

Critical values are F_{0.975, 7, 8} = 1/F_{0.025, 8, 7} = 1/4.90 = 0.20 and F_{0.025, 7, 8} = 4.53.
The test statistic value is:

   (21.2/7)/(29.8/8) = 0.8130

and since 0.20 < 0.8130 < 4.53 we do not reject H₀, which means there is no evidence of a
difference in the variances.
(7 marks)
(c) Feedback on this question:
This required a standard definition of the p-value, although some candidates thought the p-value
is used to calculate critical values.
Full solutions are:
A p-value is a measure of the discrepancy between the hypothesised (claimed) value for θ and the
observed/estimated value. It is the probability of observing the test statistic value, t, or more
extreme values under the null hypothesis. It is compared to a significance level in order to decide
whether or not to reject the null hypothesis.
(3 marks)
Question 7
(a) Feedback on this question:
Around a third of candidates who attempted this question got the wrong answer when taking the
first-order derivative.
Full solutions are:
We have:

   S = Σ_{i=1}^n (yi − βxi)²

and:

   dS/dβ = −2 Σ_{i=1}^n xi(yi − βxi).

Setting the first derivative to zero and solving, we obtain:

   β̂ = Σ_{i=1}^n xi yi / Σ_{i=1}^n xi².

Since:

   E(β̂) = E(Σ_{i=1}^n xi yi / Σ_{i=1}^n xi²) = E(Σ_{i=1}^n xi(βxi + εi) / Σ_{i=1}^n xi²)
        = β + Σ_{i=1}^n xi E(εi) / Σ_{i=1}^n xi² = β

then β̂ is an unbiased estimator for β.
(5 marks)
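The closed form β̂ = Σxiyi/Σxi² can be checked against numpy's least squares on a no-intercept design; a minimal sketch with arbitrary illustrative data (true β = 2.5 is an assumed value):

```python
# Regression through the origin: closed-form β̂ versus np.linalg.lstsq
# on a single-column (no-intercept) design matrix.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=30)
y = 2.5 * x + rng.normal(0, 1, size=30)   # assumed true β = 2.5

beta_closed = (x * y).sum() / (x ** 2).sum()
beta_lstsq = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]

assert abs(beta_closed - beta_lstsq) < 1e-10
print(round(beta_closed, 3))
```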
(b) Feedback on this question:
In (i.), many struggled to derive the proof which, as can be seen below, is fairly short. Part (ii.)
was much better attempted due to the provision of the formula sheet. In (iii.), many got the
wrong answer for Σᵢ (xi − x)², although method marks were awarded as appropriate.
Full solutions are:
i. Since:

   ŷi = β̂₀ + β̂₁ xi = ȳ + β̂₁(xi − x̄)

then:

   ŷi − ȳ = β̂₁(xi − x̄).

The required identity follows from this immediately.
(4 marks)
ii. We have:

   β̂₁ = (8.39 + 1.06 × 13.14/12) / (14.09 − (−1.06)²/12) = 0.6823

   β̂₀ = (13.14 + 1.06 × 0.6823)/12 = 1.1553

   σ̂² = ((27.80 − 13.14 × 13.14/12) − (0.6823)²(14.09 − 1.06 × 1.06/12))/10 = 0.6896.

Note the identity in (i.) gives the regression sum of squares, hence subtracting it from the total
sum of squares gives the residual sum of squares used to obtain σ̂².
(6 marks)
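As a cross-check, the part (ii.) arithmetic can be reproduced directly from the printed summary statistics; a minimal sketch:

```python
# Reproduce the part (ii.) estimates from the summary statistics
# quoted in the solution above (n = 12).
n = 12
Sxy = 8.39 + 1.06 * 13.14 / 12       # corrected sum of cross-products
Sxx = 14.09 - (-1.06) ** 2 / 12      # corrected sum of squares of x
Syy = 27.80 - 13.14 * 13.14 / 12     # total sum of squares of y

b1 = Sxy / Sxx
b0 = (13.14 + 1.06 * b1) / 12
sigma2_hat = (Syy - b1 ** 2 * Sxx) / (n - 2)

print(round(b1, 4), round(b0, 4), round(sigma2_hat, 4))
```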
iii. For x = 0.5, we have:

   µ̂(x) = 1.1553 + 0.6823 × 0.5 = 1.4965.

Also:

   Σᵢ (xi − x)² = Σᵢ (xi − x̄)² + n(x̄ − x)² = (14.09 − 1.06 × 1.06/12) + 12 × (1.06/12 − 0.5)² = 16.03

and Σᵢ (xi − x̄)² = 13.9964. Since t_{0.025, 10} = 2.228, a 95% confidence interval for µ(x) is:

   µ̂(x) ± t_{0.025, 10} · σ̂ · (Σᵢ (xi − x)² / (n Σⱼ (xj − x̄)²))^{1/2}
   = 1.4965 ± 2.228 × 0.6896 × √(16.03/(12 × 13.9964))
   = 1.4965 ± 0.4747
   = (1.02, 1.97).
(5 marks)