LECTURE 23: Classical statistics

• Readings: Section 9.1
  (not responsible for t-based confidence intervals, in pp. 471-473)

[Figure: two block diagrams contrasting Bayesian and classical inference.
Bayesian: a prior pΘ(θ) and a model pX|Θ(x | θ) generate the observation X,
which an estimator maps to an estimate Θ̂; example: Θ ∈ {0, 1}, X = Θ + W,
W ∼ fW(w). Classical: an unknown constant θ selects a model pX(x; θ); the
estimator maps X to Θ̂, with no prior on θ.]
• Outline
  – Classical statistics
  – Maximum likelihood (ML) estimation
  – Estimating a sample mean
  – Confidence intervals (CIs)
  – CIs using an estimated variance

Classical statistics

• Model: X ∼ pX(x; θ)

• also for vectors X and θ:
  pX1,...,Xn(x1, . . . , xn; θ1, . . . , θm)

• These are NOT conditional probabilities; θ is NOT random
  – mathematically: many models, one for each possible value of θ

• Problem types:
  – Hypothesis testing:
    H0: θ = 1/2 versus H1: θ = 3/4
  – Composite hypotheses:
    H0: θ = 1/2 versus H1: θ ≠ 1/2
  – Estimation: design an estimator Θ̂,
    to keep estimation error Θ̂ − θ small
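To make the "many models, one for each possible value of θ" view concrete, here is a small sketch (not from the lecture; the data set is hypothetical) that evaluates the likelihood of one fixed Bernoulli data set under the two hypotheses H0: θ = 1/2 and H1: θ = 3/4:

```python
# Sketch: in classical statistics there is one candidate model p_X(x; theta)
# per value of theta. We score a fixed data set under each candidate.
from math import prod

x = [1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical observed coin flips

def likelihood(theta, data):
    """p_X(x; theta) for i.i.d. Bernoulli(theta) observations."""
    return prod(theta if xi == 1 else 1 - theta for xi in data)

for theta in (0.5, 0.75):
    print(theta, likelihood(theta, x))
```

With 6 heads in 8 flips, the data are more likely under θ = 3/4, which is the kind of comparison a hypothesis test formalizes.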
Maximum Likelihood Estimation

• Model, with unknown parameter(s): X ∼ pX(x; θ)

• Pick θ that "makes data most likely":

  θ̂ML = arg max_θ pX(x; θ)

• Compare to Bayesian MAP estimation:

  θ̂MAP = arg max_θ pΘ|X(θ | x) = arg max_θ pX|Θ(x | θ) pΘ(θ) / pX(x)

• Example: X1, . . . , Xn: i.i.d., exponential(θ)

  max_θ ∏_{i=1}^n θ e^{−θ xi}

  max_θ [ n log θ − θ ∑_{i=1}^n xi ]

  θ̂ML = n / (x1 + · · · + xn),   Θ̂n = n / (X1 + · · · + Xn)

Desirable properties of estimators
(should hold FOR ALL θ !!!)

• Unbiased: E[Θ̂n] = θ
  – exponential example, with n = 1: E[1/X1] = ∞ ≠ θ   (biased)

• Consistent: Θ̂n → θ (in probability)
  – exponential example: (X1 + · · · + Xn)/n → E[X] = 1/θ
  – can use this to show that: Θ̂n = n/(X1 + · · · + Xn) → 1/E[X] = θ

• "Small" mean squared error (MSE):

  E[(Θ̂ − θ)²] = var(Θ̂ − θ) + (E[Θ̂ − θ])² = var(Θ̂) + (bias)²
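The closed-form exponential ML estimate n/(x1 + · · · + xn) can be sanity-checked numerically. A minimal sketch, assuming simulated data with a hypothetical true rate θ = 2.0, compares the closed form against a crude grid search over the log-likelihood n log θ − θ ∑ xi:

```python
# Sketch: verify the closed-form exponential ML estimate against a
# grid search over the log-likelihood (simulated data, theta_true = 2.0).
import math
import random

random.seed(0)
theta_true = 2.0
xs = [random.expovariate(theta_true) for _ in range(1000)]

theta_ml = len(xs) / sum(xs)  # closed-form ML estimate n / sum(x_i)

def log_likelihood(theta, data):
    # n*log(theta) - theta*sum(x_i), the exponential log-likelihood
    return len(data) * math.log(theta) - theta * sum(data)

# crude grid search over candidate theta values in (0, 10]
grid = [0.01 * k for k in range(1, 1001)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, xs))

print(theta_ml, theta_grid)  # both should be near theta_true = 2.0
```

The two answers agree up to the grid resolution, and both are consistent: they approach θ as n grows, as the slide's limit argument predicts.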
Estimate a mean

• X1, . . . , Xn: i.i.d., mean θ, variance σ²

  Xi = θ + Wi,   Wi: i.i.d., mean 0, variance σ²

• Θ̂n = sample mean = Mn = (X1 + · · · + Xn)/n

Properties:

• E[Θ̂n] = θ   (unbiased)

• WLLN: Θ̂n → θ   (consistency)

• MSE: σ²/n

• Sample mean often turns out to also be the ML estimate.
  E.g., if Xi ∼ N(θ, σ²), i.i.d.

Confidence intervals (CIs)

• An estimate Θ̂n may not be informative enough

• A 1 − α confidence interval is a (random) interval [Θ̂n⁻, Θ̂n⁺], s.t.

  P(Θ̂n⁻ ≤ θ ≤ Θ̂n⁺) ≥ 1 − α,   ∀ θ

  – often α = 0.05, or 0.25, or 0.01
  – interpretation is subtle

• CI in estimation of the mean: Θ̂n = (X1 + · · · + Xn)/n
  – normal tables: Φ(1.96) = 1 − 0.05/2

  P( |Θ̂n − θ| / (σ/√n) ≤ 1.96 ) ≈ 0.95   (CLT)

  P( Θ̂n − 1.96 σ/√n ≤ θ ≤ Θ̂n + 1.96 σ/√n ) ≈ 0.95

• More generally: let z be s.t. Φ(z) = 1 − α/2

  P( Θ̂n − z σ/√n ≤ θ ≤ Θ̂n + z σ/√n ) ≈ 1 − α
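The subtlety in interpreting a CI is that the interval is random while θ is fixed. A simulation sketch (hypothetical parameters θ = 5, σ = 2, n = 100) makes this visible: across many repetitions of the experiment, roughly a 0.95 fraction of the intervals [Mn − 1.96σ/√n, Mn + 1.96σ/√n] contain θ:

```python
# Sketch: empirical coverage of the 95% CI for the mean with known sigma.
# The interval is random; theta is a fixed constant.
import math
import random

random.seed(1)
theta, sigma, n, trials = 5.0, 2.0, 100, 2000
half = 1.96 * sigma / math.sqrt(n)  # half-width; Phi(1.96) = 1 - 0.05/2

covered = 0
for _ in range(trials):
    mn = sum(random.gauss(theta, sigma) for _ in range(n)) / n
    if mn - half <= theta <= mn + half:
        covered += 1

coverage = covered / trials
print(coverage)  # should be close to 0.95
```

Note what is NOT being claimed: for any single realized interval, θ is either in it or not; the 0.95 refers to the procedure's long-run frequency, for every fixed θ.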
The case of unknown σ

• Option 1: use upper bound on σ
  – if Xi Bernoulli: σ ≤ 1/2

• Option 2: use ad hoc estimate of σ
  – if Xi Bernoulli(θ): σ̂ = √(Θ̂(1 − Θ̂))

• Option 3: use generic estimate of the variance
  – Start from σ² = E[(Xi − θ)²]:

    σ̂n² = (1/n) ∑_{i=1}^n (Xi − θ)² → σ²   (but do not know θ)

  – replace the unknown θ by Θ̂n:

    Ŝn² = (1/(n−1)) ∑_{i=1}^n (Xi − Θ̂n)² → σ²   (unbiased: E[Ŝn²] = σ²)
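A short sketch comparing the three options on one simulated Bernoulli data set (hypothetical parameters θ = 0.3, n = 500). It computes the 95% CI half-width 1.96 σ̂/√n under each substitute for σ:

```python
# Sketch: three stand-ins for the unknown sigma of Bernoulli(theta) data.
import math
import random

random.seed(2)
theta, n = 0.3, 500
xs = [1 if random.random() < theta else 0 for _ in range(n)]

mn = sum(xs) / n  # sample mean, the estimate of theta

sigma_bound = 0.5                               # Option 1: sigma <= 1/2
sigma_adhoc = math.sqrt(mn * (1 - mn))          # Option 2: sqrt(est*(1-est))
s2 = sum((x - mn) ** 2 for x in xs) / (n - 1)   # Option 3: unbiased S_n^2
sigma_generic = math.sqrt(s2)

for s in (sigma_bound, sigma_adhoc, sigma_generic):
    print(1.96 * s / math.sqrt(n))  # CI half-widths; Option 1 is the widest
```

Option 1 always yields a valid but conservative (wider) interval; for 0/1 data, Options 2 and 3 differ only by the factor √(n/(n−1)) and are nearly identical for large n.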
MIT OpenCourseWare
http://ocw.mit.edu
6.041SC Probabilistic Systems Analysis and Applied Probability
Fall 2013
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.