Data analysis
Ben Graham
MA930, University of Warwick
October 15, 2015
Statistics
Def 5.2.1

- Let $X_1, \dots, X_n \sim f(x \mid \theta)$, independent:
  $$f(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta)$$
- Let $T(x_1, \dots, x_n)$ denote some function of the data, i.e. $T : \mathbb{R}^n \to \mathbb{R}$. $T$ is a statistic.
- Examples: $T(x_1, \dots, x_n) = x_1$; $T(x_1, \dots, x_n) = \max_i x_i$; the mean, median, etc.
- $\theta$, $EX$, $\mathrm{Var}(X)$, etc., are not statistics: they depend on the unknown distribution, not only on the data.
- The probability distribution of $T$ is called the sampling distribution of $T$.
Common statistics
- $X_1, \dots, X_n$ i.i.d.r.v.
- Sample mean:
  $$\bar X = \frac{X_1 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^n X_i$$
- Sample variance:
  $$S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2 = \frac{1}{n-1} \left[ \sum_{i=1}^n X_i^2 - n \bar X^2 \right]$$
- If the mean $EX_i$ and variance $\mathrm{Var}(X_i)$ exist, then $\bar X$ and $S^2$ are unbiased estimates of them (a quick simulation check follows below).
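Unbiasedness can be checked by simulation; a minimal sketch (the values of $\mu$, $\sigma^2$ and the sample size are arbitrary choices, not from the slides):

set.seed(1)
mu <- 2; sigma2 <- 9   # true mean and variance (illustrative values)
n <- 10                # sample size
est <- replicate(100000, {
  x <- rnorm(n, mean = mu, sd = sqrt(sigma2))
  c(mean(x), var(x))   # var() uses the 1/(n-1) convention, as above
})
rowMeans(est)          # both entries should be close to c(2, 9)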
Trimmed mean
- Trimmed/truncated mean: $p \in [0, 1]$
- Remove the smallest $np$ and the biggest $np$ items from the sample
- Take the mean of what is left
- e.g. LIBOR: 18 banks, top and bottom 4 removed
- e.g. the Cauchy location parameter: the mean does not exist, but the trimmed mean is still a sensible estimator (see the sketch after this list)
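In R, mean() takes a trim argument giving the fraction removed from each end; a short sketch (the quote values and seed are illustrative, not real LIBOR data):

# LIBOR-style trimming: 18 quotes, drop the top 4 and the bottom 4
set.seed(10)
quotes <- rnorm(18, mean = 0.5, sd = 0.05)  # 18 hypothetical bank quotes
mean(quotes, trim = 4/18)                   # mean of the middle 10 quotes
# Cauchy location: the plain mean is erratic, the trimmed mean is not
x <- rcauchy(10000, location = 3)
mean(x)                                     # unreliable: the Cauchy has no mean
mean(x, trim = 0.25)                        # stable estimate of the location, 3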
Def 5.4.1 Order Statistics
- The order statistics of a random sample $X_1, \dots, X_n$ are the values placed into increasing order: $X_{(1)}, \dots, X_{(n)}$.
- $X_{(1)} = \min_i X_i$
- $X_{(2)} =$ second smallest
- ...
- $X_{(n)} = \max_i X_i$
- The sample range is $R = X_{(n)} - X_{(1)}$.
- The sample median is
  $$M = \begin{cases} X_{((n+1)/2)} & n \text{ odd} \\ \tfrac{1}{2}\bigl(X_{(n/2)} + X_{(n/2+1)}\bigr) & n \text{ even.} \end{cases}$$
- The median may give a better sense of what is typical than the mean (a short R illustration follows).
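A short R illustration of these definitions (the data vector is arbitrary):

x <- c(5, 1, 4, 1, 9, 2)  # n = 6, even
sort(x)                   # the order statistics X(1), ..., X(6)
diff(range(x))            # sample range X(6) - X(1) = 8
median(x)                 # (X(3) + X(4))/2 = (2 + 4)/2 = 3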
Quantiles
- For $p \in [0, 1]$, the $p$ quantile is (R, method 7 of 9)
  $$(1 - \gamma)\, x_{(j)} + \gamma\, x_{(j+1)}, \qquad (n-1)p < j \le (n-1)p + 1, \qquad \gamma = np + 1 - p - j$$
  (implemented directly after this list)
- Quantiles are a continuous function of the data:
  data=c(1,3,7)
  q=seq(0,1,length.out = 1000)
  plot(quantile(data,q))
- Quartiles: $i/4$ quantiles
- Deciles: $i/10$ quantiles
- Percentiles: $i/100$ quantiles
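To see that the rule above is exactly R's default, here is a direct implementation (quantile7 is a name introduced for this sketch) compared against the built-in quantile():

quantile7 <- function(x, p) {
  xs <- sort(x)
  n <- length(xs)
  h <- (n - 1) * p + 1        # fractional index
  j <- pmin(floor(h), n - 1)  # integer part: (n-1)p < j <= (n-1)p + 1
  g <- h - j                  # gamma = np + 1 - p - j
  (1 - g) * xs[j] + g * xs[j + 1]
}
x <- c(1, 3, 7)
quantile7(x, c(0.25, 0.5, 0.9))           # 2.0 3.0 6.2
quantile(x, c(0.25, 0.5, 0.9), type = 7)  # same values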
Theorem 5.4.4 Distribution of the order statistics
- Sample of size $n$ with c.d.f. $F_X$ and p.d.f. $f_X$
- Binomial distribution:
  $$F_{X_{(j)}}(x) = \sum_{k=j}^{n} \binom{n}{k}\, [F_X(x)]^k\, [1 - F_X(x)]^{n-k}$$
- Differentiate:
  $$f_{X_{(j)}}(x) = \frac{n!}{(j-1)!\,(n-j)!}\, f_X(x)\, [F_X(x)]^{j-1}\, [1 - F_X(x)]^{n-j}.$$
  (numerical check below)
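A numerical check of the c.d.f. formula (the Beta comparison is a standard fact, not on the slide): for Uniform(0,1) samples, $X_{(j)} \sim \mathrm{Beta}(j,\, n-j+1)$, so the binomial sum must agree with pbeta():

# c.d.f. of the j-th order statistic, computed via the binomial sum
F_order <- function(u, j, n) {
  sapply(u, function(v) sum(choose(n, j:n) * v^(j:n) * (1 - v)^(n - (j:n))))
}
n <- 10; j <- 3
u <- seq(0.05, 0.95, by = 0.15)  # F_X(x) = x for Uniform(0,1)
F_order(u, j, n)                 # binomial sum
pbeta(u, j, n - j + 1)           # identical values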
QQ-plots
- $X_{(1)} \le \cdots \le X_{(n)}$ order statistics
- Suspected c.d.f. $F$
- The $j$-th order statistic is expected to be $\approx F^{-1}\bigl(\frac{j - 1/2}{n}\bigr)$
- Plot $X_{(1)} \le \cdots \le X_{(n)}$ against $F^{-1}\bigl(\frac{1/2}{n}\bigr), F^{-1}\bigl(\frac{3/2}{n}\bigr), \dots, F^{-1}\bigl(\frac{n - 1/2}{n}\bigr)$
- Fits the line $y = x$ → $F$ is the right c.d.f.
- N.B. expect some noise:
  plot(apply(replicate(10000,sort(runif(11))),1,sd))  # spread of each uniform order statistic
  plot(apply(replicate(10000,sort(rnorm(11))),1,sd))  # same for normal samples
- Fits some straight line → $F$ is the correct c.d.f. for $aX + b$, for some $a, b \in \mathbb{R}$.
- Not a line → $F$ is not really the c.d.f. (A worked normal example follows.)
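A minimal worked QQ-plot, assuming the data really are standard normal so the points should hug $y = x$:

set.seed(2)
n <- 100
x <- rnorm(n)
p <- ((1:n) - 0.5) / n  # plotting positions (j - 1/2)/n
plot(qnorm(p), sort(x),
     xlab = "theoretical quantiles", ylab = "order statistics")
abline(0, 1)            # points near y = x: F is the right c.d.f.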
More sample mean
- Characteristic function:
  $$\varphi_{\bar X}(t) = [\varphi_X(t/n)]^n \approx \left[1 + \frac{it\,EX}{n} + o(t/n)\right]^n$$
- Thm 5.3.1: If $X_i \sim N(\mu, \sigma^2)$ are independent then
  - $\bar X \sim N(\mu, \sigma^2/n)$
  - $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$
  - $\bar X$ and $S^2$ are independent.
- Proof: For simplicity, assume $\mu = 0$ and $\sigma^2 = 1$.
Ingredients for the proof
- If $e_1, \dots, e_n$ form an orthonormal basis for $\mathbb{R}^n$, then $(e_i \cdot X)_{i=1}^n$ are i.i.d.r.v. $N(0, 1)$.
- Let $e_1 = \bigl(\tfrac{1}{\sqrt n}, \dots, \tfrac{1}{\sqrt n}\bigr)$.
- The $\Gamma(\alpha, \beta)$ distribution is defined by
  $$f(x) = C(\alpha, \beta)\, x^{\alpha - 1} \exp(-\beta x).$$
- The $\chi^2_k$ distribution is the $\Gamma(k/2, 1/2)$ distribution.
- If $X \sim N(0, 1)$, then $X^2 \sim \chi^2_1$.
- The sum of $k$ independent $\chi^2_1$ r.v. is $\chi^2_k$. (These ingredients are assembled into a proof sketch below.)
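Filling in how the ingredients combine (a sketch of the standard argument, not spelled out on the slide): write $Y_i = e_i \cdot X$, so $Y_1 = \sqrt n\, \bar X$. Then
$$\bar X = \frac{Y_1}{\sqrt n}, \qquad (n-1)S^2 = \sum_{i=1}^n X_i^2 - n \bar X^2 = |X|^2 - Y_1^2 = \sum_{i=2}^n Y_i^2.$$
Since the $Y_i$ are i.i.d. $N(0, 1)$: $\bar X \sim N(0, 1/n)$, $\sum_{i=2}^n Y_i^2 \sim \chi^2_{n-1}$, and $\bar X$ and $S^2$ are independent because they involve disjoint sets of the $Y_i$.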
Derived distributions
- If $Z \sim N(0, 1)$ and $V \sim \chi^2_k$ are independent then $Z/\sqrt{V/k} \sim t_k$ (Student's $t$)
- $\dfrac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}$ (simulation check below)
- If $U \sim \chi^2_k$ and $V \sim \chi^2_\ell$ are independent, then $\dfrac{U/k}{V/\ell} \sim F_{k,\ell}$
- Suppose $X_1, \dots, X_m \sim N(\mu_X, \sigma_X^2)$ and $Y_1, \dots, Y_n \sim N(\mu_Y, \sigma_Y^2)$. Then
  $$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{m-1,\, n-1}.$$
- These distributions are also used for linear regression/ANOVA.
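A quick simulation check of the $t_{n-1}$ claim (sample size and parameters are illustrative choices): studentised means of normal samples should match the $t_{n-1}$ density.

set.seed(3)
n <- 5; mu <- 10; sigma <- 2
tstat <- replicate(100000, {
  x <- rnorm(n, mu, sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))  # the studentised mean
})
hist(tstat, breaks = 200, freq = FALSE, xlim = c(-6, 6))
curve(dt(x, df = n - 1), add = TRUE, col = "red")  # t_4 density overlay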
Convergence in probability
- Def 5.5.1: A sequence of random variables $X_1, X_2, \dots$ converges in probability to a random variable $X$ if, for every $\varepsilon > 0$,
  $$\lim_{n \to \infty} P(|X_n - X| \ge \varepsilon) = 0.$$
- Thm 5.5.2 (Weak Law of Large Numbers) Let $X_1, \dots, X_n$ be i.i.d.r.v. with mean $\mu$ and variance $\sigma^2 < \infty$ (or just $E|X| < \infty$). Define $\bar X_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then $\bar X_n$ converges in probability to $\mu$.
- Proof: Characteristic functions:
  $$\varphi_{\bar X_n}(t) = [\varphi_X(t/n)]^n \approx \left[1 + \frac{it\,EX}{n} + o(t/n)\right]^n \to e^{it\mu},$$
  the characteristic function of the constant $\mu$. (A running-means illustration follows.)
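A picture of the WLLN in action (an illustrative sketch; the exponential distribution is an arbitrary choice): running means settle down to $\mu = 1$.

set.seed(4)
x <- rexp(10000, rate = 1)        # i.i.d. with mean mu = 1
xbar <- cumsum(x) / seq_along(x)  # running means X̄_1, X̄_2, ...
plot(xbar, type = "l", xlab = "n", ylab = "running mean")
abline(h = 1, lty = 2)            # the true mean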
Ex 5.5.3 Consistency of $S^2$

- Sample variance
  $$S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n-1} \sum_{i=1}^n X_i^2 - \frac{n}{n-1}\, \bar X_n^2$$
- $S^2$ converges to $\sigma^2$ in probability if $E|X|^2 < \infty$.
Def 5.5.6 Almost sure convergence
- A sequence of random variables $X_1, \dots, X_n$ converges almost surely to a random variable $X$ if, for all $\varepsilon > 0$,
  $$P\left(\lim_{n \to \infty} |X_n - X| < \varepsilon\right) = 1.$$
- N.B. the limit is now on the inside.
- $X_n \sim$ Bernoulli($1/n$), independent: converges in probability to 0, but not almost surely (see the simulation below).
- Theorem 5.5.9 Strong Law of Large Numbers: Let $X_1, X_2, \dots$ be i.i.d.r.v. with mean $\mu$ and variance $\sigma^2 < \infty$ (or, even better, just $E|X| < \infty$). Define $\bar X_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then $\bar X_n$ converges almost surely to $\mu$.
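A simulation sketch of the Bernoulli($1/n$) example (the horizon $N$ is arbitrary). $P(X_n = 1) = 1/n \to 0$ gives convergence in probability, but $\sum_n 1/n = \infty$, so by the second Borel–Cantelli lemma ones keep occurring forever and no sample path settles at 0:

set.seed(5)
N <- 100000
x <- rbinom(N, size = 1, prob = 1 / (1:N))  # X_n ~ Bernoulli(1/n)
mean(x[1:100]); mean(x[(N - 99):N])  # ones become much rarer late on
max(which(x == 1))                   # yet a 1 still occurs very late
plot(cumsum(x), type = "l", xlab = "n",
     ylab = "ones so far")           # grows like log(n), without bound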
6.2 Sufficient statistics

- Def 6.2.1 A statistic $T(X)$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $X$ given the value of $T(X)$ does not depend on $\theta$.
- Thm 6.2.2 If $f(x \mid \theta)/f(T(x) \mid \theta)$ is constant as a function of $\theta$, then $T(X)$ is sufficient.
- Thm 6.2.6 $T(X)$ is sufficient iff $f(x \mid \theta) = g(T(x) \mid \theta)\, h(x)$ for some $g, h$.
- Example: Independent $X_i \sim$ Bernoulli($\theta$), $\theta \in (0, 1)$ (worked below).
- Example: Independent $X_1, \dots, X_N \sim$ Uniform($0, \theta$), $\theta > 0$.
- Example: Independent $X_1, \dots, X_n \sim N(\theta, \sigma^2)$, $\theta \in \mathbb{R}$.
- Example: Independent $X_1, \dots, X_n \sim N(\theta_1, \theta_2)$, $\theta_1 \in \mathbb{R}$, $\theta_2 > 0$.
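Working the Bernoulli example through the factorisation theorem (a standard calculation): take $T(x) = \sum_{i=1}^n x_i$. Then
$$f(x \mid \theta) = \prod_{i=1}^n \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{T(x)} (1 - \theta)^{n - T(x)},$$
which has the form $g(T(x) \mid \theta)\, h(x)$ with $h(x) = 1$, so $T(X) = \sum_i X_i$ is sufficient for $\theta$.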
Minimal sufficient statistics
6.3 Likelihood principle
- Random sample $X = (X_1, \dots, X_n)$
- $X_i \sim f(x_i \mid \theta)$, p.m.f. or p.d.f.
- $X \sim \prod_i f(x_i \mid \theta) = f(x \mid \theta)$
- Likelihood function: $L(\theta \mid x) = f(x \mid \theta)$
- Likelihood principle: if $L(\theta \mid x)/L(\theta \mid y)$ is independent of $\theta$, then the conclusions drawn from $x$ and $y$ should be identical (a classic illustration follows).
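A classic illustration of the principle (a standard textbook example, not on the slide): observe 3 successes in 12 Bernoulli($\theta$) trials. Whether the design was "fix 12 trials" (binomial) or "sample until the 3rd success" (negative binomial), the likelihoods are proportional:
$$L_{\text{bin}}(\theta) = \binom{12}{3} \theta^3 (1 - \theta)^9, \qquad L_{\text{nb}}(\theta) = \binom{11}{2} \theta^3 (1 - \theta)^9,$$
so their ratio is independent of $\theta$, and the likelihood principle says the two experiments should lead to identical conclusions about $\theta$.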
Chapter 7 Point estimation
7.2.2 Maximum Likelihood Estimator
- $L(\theta \mid x) = \prod_i f(x_i \mid \theta)$, $\theta \in \mathbb{R}^k$
- MLE: Statistic $\hat\theta(x) = \arg\max_\theta L(\theta \mid x)$ (numerical sketch below)
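A numerical sketch (the Bernoulli data are an assumed example, not from the slides): maximising the log-likelihood over $\theta$ recovers the closed-form MLE $\hat\theta = \bar x$.

set.seed(6)
x <- rbinom(50, size = 1, prob = 0.3)  # Bernoulli(0.3) sample
loglik <- function(theta) sum(dbinom(x, 1, theta, log = TRUE))
optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)$maximum
mean(x)  # closed-form MLE; agrees up to numerical tolerance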