Stat 330 (Spring 2015)
Slide set 27
Last update: April 1, 2015

Review: What is MLE? How to find it (5 steps)?

Example for MLE:
Example: Let X1, . . . , Xn be i.i.d. N(μ, σ²), where both μ and σ² are unknown.
♣ θ may be multidimensional: Θ ⊂ R^p with p > 1; here θ = (μ, σ²).
x1, · · · , xn are the data/sample values of X1, · · · , Xn.
♥ What is the pdf of a normal random variable?

        f(x) = (1/√(2πσ²)) · e^(−(x−μ)²/(2σ²))

♦ Since we have values from n independent variables, the likelihood function is a product of n densities:

        L(μ, σ²) = ∏_{i=1}^n (1/√(2πσ²)) · e^(−(xi−μ)²/(2σ²)) = (2πσ²)^(−n/2) · e^(−∑_{i=1}^n (xi−μ)²/(2σ²))

♦ The log-likelihood is:

        l(μ, σ²) = log L(μ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) ∑_{i=1}^n (xi − μ)²

♦ Since we now have two parameters, μ and σ², we need to take two partial derivatives of the log-likelihood:

        (∂/∂μ) log L(μ, σ²) = 0 − (1/(2σ²)) · ∑_{i=1}^n (xi − μ) · (−2) = (1/σ²) ∑_{i=1}^n (xi − μ)

        (∂/∂σ²) log L(μ, σ²) = −n/(2σ²) + (1/(2(σ²)²)) ∑_{i=1}^n (xi − μ)²

♦ We need to find values for μ and σ² that yield zeros for both derivatives at the same time.
♦ Setting (d/dμ) log L(μ, σ²) = 0 gives

        μ̂ = (1/n) ∑_{i=1}^n xi = x̄.

♦ Plugging this value into the derivative for σ² and setting (d/dσ²) log L(μ̂, σ²) = 0 gives

        σ̂² = (1/n) ∑_{i=1}^n (xi − μ̂)².

♠ Do you find something different? σ̂² ≠ s² (the divisor is n, not n − 1)! The MLE is biased!
♠ However, bias does not ruin MLEs, because of other nice features like small MSE, etc.
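The closed-form estimates above are easy to check numerically. A minimal sketch (the true parameter values and variable names here are illustrative, not from the slides):

```python
import random

random.seed(0)
mu_true, sigma_true = 5.0, 2.0
n = 1000
xs = [random.gauss(mu_true, sigma_true) for _ in range(n)]

# MLE of mu: the sample mean x-bar
mu_hat = sum(xs) / n

# MLE of sigma^2 uses divisor n (biased); s^2 uses divisor n - 1 (unbiased)
sigma2_mle = sum((x - mu_hat) ** 2 for x in xs) / n
s2 = sum((x - mu_hat) ** 2 for x in xs) / (n - 1)

print(mu_hat, sigma2_mle, s2)  # sigma2_mle = s2 * (n-1)/n, so sigma2_mle < s2
```

The bias is exactly the factor (n − 1)/n, which vanishes as n grows, which is one reason it does not ruin the MLE in practice.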
Topic 2: Confidence intervals:

Motivation: The last lectures provided a way to compute point estimates for parameters. Based on that, it is natural to ask "how good is this point estimate?" or "how close is the estimate to the true value of the parameter?"

Further thoughts: Instead of just looking at the point estimate, we will now try to compute an interval around the estimated parameter value in which the true parameter is "likely" to fall. Such an interval is called a confidence interval.

Definition: An interval (L, U) is a (1 − α) · 100% confidence interval for the parameter θ if it contains the parameter with probability 1 − α:

        P(L < θ < U) = 1 − α.

The coverage probability 1 − α is called the confidence level.

Equivalent definition: Let θ̂ be an estimate of θ. If

        P(|θ̂ − θ| < e) ≥ 1 − α,

we say that the interval (θ̂ − e, θ̂ + e) is a (1 − α) · 100% confidence interval for θ; 2e is the size of the confidence interval.

♠ Remark 1: L and U in the first definition are functions of the sample, that is, L = L(X1, · · · , Xn) and U = U(X1, · · · , Xn). In other words, L and U are random variables.
♠ Remark 2: In the above definitions, α is a value near 0.
♠ Remark 3: The true value θ is either within the confidence interval or not.
♠ Remark 4: min_{θ∈Θ} Pθ(θ ∈ (L, U)) is also called the confidence coefficient (C.C. for short).

Interpretation of confidence intervals (CI):
Wrong interpretation: I computed a 95% CI, it is (2, 8), so I can say my parameter θ must be in this interval with probability 95%.
♠ This is WRONG! (2, 8) is a fixed interval; it either contains θ or it does not (in other words, the probability that θ is in this particular interval is either 0 or 1).
Right interpretation:
(a) The confidence interval can be expressed in terms of samples (or repeated samples): "Were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time."
(b) The probability associated with a confidence interval may also be considered from a pre-experiment point of view, in the same context in which arguments for the random allocation of treatments to study items are made. Here the experimenter sets out the way in which they intend to calculate a confidence interval and knows, before they do the actual experiment, that the interval they will end up calculating has a certain chance of covering the true but unknown value. This is very similar to the "repeated sample" interpretation above, except that it avoids relying on hypothetical repeats of a sampling procedure that may not be repeatable in any meaningful sense.
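The "repeated samples" interpretation can be demonstrated by simulation: draw many samples, build the interval from each, and count how often the fixed true mean is covered. A sketch, assuming normal data with known σ so that the coverage of the large-sample interval is exact:

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)
mu, sigma, n, alpha = 0.0, 1.0, 30, 0.10
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2} for a 90% interval

covered = 0
trials = 2000
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    half = z * sigma / n ** 0.5         # half-width e of the interval
    if xbar - half < mu < xbar + half:  # did this sample's interval cover mu?
        covered += 1

print(covered / trials)  # close to 0.90
```

Each individual interval either covers μ or it does not; the 90% refers to the long-run fraction of intervals, across samples, that do.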
Construct CI:
♥ However, we do not know the true value. For different parameters, we have different methods to construct a CI.
♠ If the true value of the parameter lies outside the 90% CI once it has been calculated, then an event has occurred which had a probability of 10% (or less) of happening by chance.

Large sample CI for μ:
♠ Situation: we have a large set of observed values (n > 30, usually) x1, · · · , xn; these values are realizations of i.i.d. X1, . . . , Xn with E[Xi] = μ and Var[Xi] = σ².
♠ X̄ is an unbiased estimator for μ.
♠ By the CLT, X̄ is an approximately normally distributed random variable with E[X̄] = μ and Var[X̄] = σ²/n, i.e. X̄ ∼ N(μ, σ²/n); then Z := (X̄ − μ)/(σ/√n) ∼ N(0, 1), whose cdf is Φ.

Large sample CI of μ: A (1 − α) · 100% confidence interval for μ is given as

        ( x̄ − zα/2 · σ/√n , x̄ + zα/2 · σ/√n )

where zα/2 = Φ⁻¹(1 − α/2) and −zα/2 = Φ⁻¹(α/2).
♠ In practice, we plug in s = √( (1/(n−1)) ∑_{i=1}^n (xi − x̄)² ) for σ when a value for it is not available.

♠ How to find Φ⁻¹(p) from the table?
♠ Recall Φ⁻¹(1 − α/2) is defined by P(Z ≤ zα/2) = 1 − α/2.
Some useful critical values zα/2, depending on α, are:

        α      0.1    0.05   0.02   0.01
        zα/2   1.65   1.96   2.33   2.58

Examples for CI:
Example 1: Suppose we want to find a 95% confidence interval for the mean salary of an ISU employee. A random sample of 100 ISU employees gives us a sample mean salary of x̄ = 21543. Suppose the standard deviation of salaries is known to be 3000.
♥ By using the above expression, we get a 95% confidence interval as:

        21543 ± Φ⁻¹(1 − α/2) · 3000/√100 = 21543 ± Φ⁻¹(0.975) · 300

where Φ⁻¹(1 − α/2) = Φ⁻¹(1 − 0.05/2) = Φ⁻¹(0.975) = 1.96.
♥ The 95% confidence interval is then: 21543 ± 588, i.e. if we repeat this study 100 times (with 100 different employees each time), we can say: in 95 out of 100 studies, the true parameter μ falls into a 588 range around x̄.

Example 2: Suppose we want to analyze some complicated queueing system for which we have no formulas and theory. We are interested in the mean queue length of the system after reaching steady state.
♠ The only thing possible for us is to run simulations of this system and look at the queue length at some large time t, e.g. t = 1000 hrs.
♠ After 50 simulations, we have got data:

        x1  = number in queue at time 1000 hrs   (in 1st simulation)
        x2  = number in queue at time 1000 hrs   (in 2nd simulation)
        ...
        x50 = number in queue at time 1000 hrs   (in 50th simulation)

♠ Our observations yield an average queue length of x̄ = 21.5 and s = √( (1/(n−1)) ∑_{i=1}^n (xi − x̄)² ) = 15.
♠ A 90% confidence interval is given as (α = 1 − 0.9 = 0.1):

        ( x̄ − z0.05 · s/√n , x̄ + z0.05 · s/√n ) = ( 21.5 − 1.65 · 15/√50 , 21.5 + 1.65 · 15/√50 ) = (17.9998, 25.0002)
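Both examples follow the same formula, so a small helper makes the computation explicit (`large_sample_ci` is our name, not from the slides; note the code uses exact normal quantiles, e.g. 1.6449 rather than the table's rounded 1.65, so Example 2 comes out a little tighter than the slides' (17.9998, 25.0002)):

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(xbar, sigma, n, alpha):
    """(1 - alpha)*100% CI for mu: xbar ± z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2} = Phi^{-1}(1 - alpha/2)
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

# Example 1: 95% CI for the mean salary (sigma = 3000 known, n = 100)
lo, hi = large_sample_ci(xbar=21543, sigma=3000, n=100, alpha=0.05)
print(lo, hi)  # roughly 21543 ± 588

# Example 2: 90% CI for the mean queue length (s = 15 plugged in for sigma)
lo2, hi2 = large_sample_ci(xbar=21.5, sigma=15, n=50, alpha=0.10)
print(lo2, hi2)
```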
Construct CI (Cont'd):
Large sample CI for proportion p:
♠ Let p denote a proportion of a large population, or a probability.
♠ In order to get an estimate of this proportion, we can take a sample of n individuals from the population and check each one of them, whether or not they fulfill the criterion to be in that proportion of interest.
♠ This corresponds to a Bernoulli-n-sequence, where we are only interested in the number of successes, X, which in our case corresponds to the number of individuals that qualify for the interesting subgroup.
♥ Recall: for a Bernoulli random variable, E(X) = p and Var(X) = p(1 − p).
♠ An unbiased estimator of p is p̂ = X̄, whose value is p̂ = x̄.
♥ With X1, · · · , Xn i.i.d., E(X̄) = (1/n) ∑_{i=1}^n E(Xi) = p, and Var(X̄) = (1/n) Var(X1) = p(1 − p)/n.
♥ Using the C.L.T., the (1 − α) · 100% CI is p̂ ± e where e = zα/2 · √(p(1 − p)/n).
♦ In the above expression, we need to use an appropriate value to substitute for p:
1. Conservative method: p := 0.5, so that the (1 − α) · 100% CI for p is p̂ ± zα/2/(2√n) (why conservative?).
2. Substitution method: use p̂ instead of p, so that the (1 − α) · 100% CI for p is p̂ ± zα/2 · √(p̂(1 − p̂)/n).
♦ Remarks:
1. For large n, or p̂ close to 0.5, there is almost no difference at all.
2. Conservative C.I.s are larger than C.I.s found by substitution.

Example: In the 2002 season the baseball player Sammy Sosa had a batting average of 0.288. (The batting average is the ratio of the number of hits and the times at bat.) Sammy Sosa was at bat 555 times in the 2002 season. Could the "true" batting average still be 0.300?
♠ Compute a 95% CI for the true batting average to study the above question.
♥ The conservative method gives: 0.288 ± 1.96/(2√555) ⇒ 0.288 ± 0.042
♥ The substitution method gives: 0.288 ± 1.96 · √(0.288(1 − 0.288)/555) ⇒ 0.288 ± 0.038
♠ The substitution method gives a slightly smaller confidence interval, but both intervals contain 0.3.
♥ There is not enough evidence to allow the conclusion that the true average is NOT 0.3; namely, 0.3 could be the true average in 2002.
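The two substitution choices can be sketched in one helper (`proportion_ci` is our name, not from the slides). The conservative method uses p = 0.5 because p(1 − p) is maximized there, so its half-width can never understate the true standard error:

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, alpha, conservative=False):
    """Large-sample CI for a proportion: substitution or conservative method."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if conservative:
        e = z / (2 * sqrt(n))                  # p := 0.5, the worst case
    else:
        e = z * sqrt(p_hat * (1 - p_hat) / n)  # plug p-hat in for p
    return p_hat - e, p_hat + e

# Sammy Sosa, 2002: p-hat = 0.288, n = 555 at-bats, 95% confidence
cons = proportion_ci(0.288, 555, 0.05, conservative=True)
subs = proportion_ci(0.288, 555, 0.05)
print(cons)  # about 0.288 ± 0.042
print(subs)  # about 0.288 ± 0.038; both intervals contain 0.300
```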