Data analysis
Ben Graham
MA930, University of Warwick
October 15, 2015

Statistics (Def 5.2.1)
- Let $X_1, \dots, X_n \sim f(x \mid \theta)$ be independent, so
  $f(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta)$.
- Let $T(x_1, \dots, x_n)$ denote some function of the data, i.e. $T : \mathbb{R}^n \to \mathbb{R}$. $T$ is a statistic. Examples:
  - $T(x_1, \dots, x_n) = x_1$
  - $T(x_1, \dots, x_n) = \max_i x_i$
  - mean, median, etc.
- $\theta$, $EX$, $\mathrm{Var}(X)$, etc., are not statistics: they depend on the unknown distribution, not only on the data.
- The probability distribution of $T$ is called the sampling distribution of $T$.

Common statistics
- $X_1, \dots, X_n$ i.i.d.r.v.
- Sample mean:
  $\bar{X} = \frac{X_1 + \dots + X_n}{n} = \frac{1}{n} \sum_{i=1}^n X_i$
- Sample variance:
  $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n-1} \Big[ \sum_{i=1}^n X_i^2 - n \bar{X}^2 \Big]$
- If the mean $EX_i$ and the variance $\mathrm{Var}(X_i)$ exist, then $\bar{X}$ and $S^2$ are unbiased estimators of them.

Trimmed mean
- Trimmed/truncated mean, $p \in [0, 1]$:
  - remove the smallest and the biggest $np$ items from the sample;
  - take the mean of what is left.
- E.g. LIBOR: 18 banks, with the top and bottom 4 quotes removed.
- E.g. the Cauchy location parameter: the sample mean is not consistent there, but a trimmed mean is.

Def 5.4.1 Order statistics
- The order statistics of a random sample $X_1, \dots, X_n$ are the sample values placed into increasing order: $X_{(1)}, \dots, X_{(n)}$.
  - $X_{(1)} = \min_i X_i$
  - $X_{(2)} =$ the second smallest
  - ...
  - $X_{(n)} = \max_i X_i$
- The sample range is $R = X_{(n)} - X_{(1)}$.
- The sample median is
  $M = \begin{cases} X_{((n+1)/2)} & n \text{ odd} \\ \tfrac{1}{2}\big(X_{(n/2)} + X_{(n/2+1)}\big) & n \text{ even.} \end{cases}$
- The median may give a better sense of what is typical than the mean.

Quantiles
- For $p \in [0, 1]$, the $p$ quantile is (R, method 7 of 9)
  $(1 - \gamma)\, x_{(j)} + \gamma\, x_{(j+1)}$, where $(n-1)p < j \le (n-1)p + 1$ and $\gamma = np + 1 - p - j$.
- Quantiles are a continuous function of the data:
    data=c(1,3,7)
    q=seq(0,1,length.out=1000)
    plot(quantile(data,q))
- Quartiles: the $i/4$ quantiles. Deciles: the $i/10$ quantiles. Percentiles: the $i/100$ quantiles.

Theorem 5.4.4 Distribution of the order statistics
- Sample of size $n$ with c.d.f. $F_X$ and p.d.f. $f_X$.
- Binomial distribution:
  $F_{X_{(j)}}(x) = \sum_{k=j}^n \binom{n}{k} [F_X(x)]^k [1 - F_X(x)]^{n-k}$
- Differentiate:
  $f_{X_{(j)}}(x) = \frac{n!}{(j-1)!\,(n-j)!}\, f_X(x)\, [F_X(x)]^{j-1} [1 - F_X(x)]^{n-j}.$

QQ-plots
- $X_{(1)} \le \dots \le X_{(n)}$ order statistics; suspected c.d.f. $F$.
- The $j$-th order statistic is expected to be $\approx F^{-1}\big(\frac{j - 1/2}{n}\big)$.
- Plot $X_{(1)} \le \dots \le X_{(n)}$ against $F^{-1}\big(\frac{1/2}{n}\big), F^{-1}\big(\frac{3/2}{n}\big), \dots, F^{-1}\big(\frac{n - 1/2}{n}\big)$.
- Fit to the line $y = x$: $F$ is the right c.d.f.
- N.B. expect some noise; the spread of each order statistic over 10,000 replicates:
    plot(apply(replicate(10000,sort(runif(11))),1,sd))
    plot(apply(replicate(10000,sort(rnorm(11))),1,sd))
- Fit to some line: $F$ is the correct c.d.f. for some $aX + b$, $a, b \in \mathbb{R}$.
- Not a line: $F$ is not really the c.d.f.

More sample mean
- Characteristic function:
  $\varphi_{\bar{X}}(t) = [\varphi_X(t/n)]^n \approx \Big[1 + \frac{it\, EX}{n} + o(t/n)\Big]^n$
- Thm 5.3.1: If $X_i \sim N(\mu, \sigma^2)$ are independent, then
  - $\bar{X} \sim N(\mu, \sigma^2/n)$,
  - $(n-1) S^2 / \sigma^2 \sim \chi^2_{n-1}$, and
  - $\bar{X}$ and $S^2$ are independent.
- Proof: for simplicity, assume $\mu = 0$ and $\sigma^2 = 1$.

Ingredients for the proof
- If $e_1, \dots, e_n$ form an orthonormal basis for $\mathbb{R}^n$, then $(e_i \cdot X)_{i=1}^n$ are i.i.d.r.v. $N(0, 1)$. Let $e_1 = (\frac{1}{\sqrt{n}}, \dots, \frac{1}{\sqrt{n}})$.
- The $\Gamma(\alpha, \beta)$ distribution is defined by
  $f(x) = C(\alpha, \beta)\, x^{\alpha - 1} \exp(-\beta x)$.
- The $\chi^2_k$ distribution is the $\Gamma(k/2, 1/2)$ distribution.
- If $X \sim N(0, 1)$, then $X^2 \sim \chi^2_1$.
- The sum of $k$ independent $\chi^2_1$ r.v. is $\chi^2_k$.

Derived distributions
- If $Z \sim N(0, 1)$ and $V \sim \chi^2_k$ are independent, then $Z / \sqrt{V/k} \sim t_k$ (Student's $t$).
- In particular, $\frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1}$.
- If $U \sim \chi^2_k$ and $V \sim \chi^2_\ell$ are independent, then $\frac{U/k}{V/\ell} \sim F_{k,\ell}$.
- Suppose $X_1, \dots, X_m \sim N(\mu_X, \sigma^2_X)$ and $Y_1, \dots, Y_n \sim N(\mu_Y, \sigma^2_Y)$. Then
  $\frac{S^2_X / \sigma^2_X}{S^2_Y / \sigma^2_Y} \sim F_{m-1, n-1}.$
- These distributions are also used for linear regression/ANOVA.
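As a sanity check on the method-7 quantile formula above, here is a minimal R sketch; the helper type7 is my name, not part of the notes, and it simply transcribes the formula before comparing against the built-in quantile():

    # Hand-rolled method-7 quantile, transcribing the formula above
    type7 <- function(x, p) {
      x <- sort(x)
      n <- length(x)
      h <- (n - 1) * p + 1          # j = floor(h) satisfies (n-1)p < j <= (n-1)p + 1
      j <- floor(h)
      g <- h - j                    # gamma = np + 1 - p - j
      if (j >= n) return(x[n])      # p = 1 edge case
      (1 - g) * x[j] + g * x[j + 1]
    }
    data <- c(1, 3, 7)
    c(type7(data, 0.25), unname(quantile(data, 0.25, type = 7)))  # both give 2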
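The QQ-plot recipe above can be tried directly. A minimal sketch, assuming the plotting positions $(j - 1/2)/n$ and normal data; the sample size, parameters, and seed are arbitrary choices of mine:

    # QQ-plot by hand: order statistics against F^{-1}((j - 1/2)/n),
    # with F the suspected c.d.f. (here the standard normal)
    set.seed(1)
    x <- rnorm(100, mean = 2, sd = 3)   # data actually from N(2, 3^2)
    n <- length(x)
    p <- (seq_len(n) - 0.5) / n         # (j - 1/2)/n for j = 1, ..., n
    plot(qnorm(p), sort(x))             # a line, but not y = x ...
    abline(a = 2, b = 3)                # ... so F is correct for some aX + b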
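Thm 5.3.1 is also easy to probe by simulation. A sketch, with $\mu$, $\sigma$, and $n$ chosen arbitrarily: the empirical mean and variance of $\bar{X}$ should match $\mu$ and $\sigma^2/n$, the mean of $(n-1)S^2/\sigma^2$ should be $n - 1$ (the $\chi^2_{n-1}$ mean), and $\bar{X}$ and $S^2$ should be uncorrelated, consistent with independence:

    # Simulation check of Thm 5.3.1
    set.seed(2)
    mu <- 1; sigma <- 2; n <- 5
    sims <- replicate(10000, {
      x <- rnorm(n, mu, sigma)
      c(mean(x), var(x))                 # (X-bar, S^2) for one sample
    })
    xbar <- sims[1, ]; s2 <- sims[2, ]
    c(mean(xbar), var(xbar), sigma^2 / n)  # approx. 1, approx. 0.8, exactly 0.8
    mean((n - 1) * s2 / sigma^2)           # approx. n - 1 = 4
    cor(xbar, s2)                          # approx. 0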
Convergence in probability
- Def 5.5.1: A sequence of random variables $X_1, X_2, \dots$ converges in probability to a random variable $X$ if for every $\epsilon > 0$,
  $\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0.$
- Thm 5.5.2 (Weak Law of Large Numbers): Let $X_1, \dots, X_n$ be i.i.d.r.v. with mean $\mu$ and variance $\sigma^2 < \infty$ (or just $E|X| < \infty$). Define $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then $\bar{X}_n$ converges in probability to $\mu$.
- Proof: characteristic functions:
  $\varphi_{\bar{X}_n}(t) = [\varphi_X(t/n)]^n \approx \Big[1 + \frac{it\, EX}{n} + o(t/n)\Big]^n \to e^{it\mu},$
  the characteristic function of the constant $\mu$.

Ex 5.5.3 Consistency of S²
- Sample variance:
  $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2 = \frac{1}{n-1} \sum_{i=1}^n X_i^2 - \frac{n}{n-1} \bar{X}_n^2$
- $S^2$ converges to $\sigma^2$ in probability if $E|X|^2 < \infty$.

Def 5.5.6 Almost sure convergence
- A sequence of random variables $X_1, \dots, X_n$ converges almost surely to a random variable $X$ if, for all $\epsilon > 0$,
  $P\big(\lim_{n \to \infty} |X_n - X| < \epsilon\big) = 1.$
- N.B. the limit is now on the inside.
- $X_n \sim$ Bernoulli$(1/n)$ independent: converges in probability to 0, but not almost surely.
- Theorem 5.5.9 (Strong Law of Large Numbers): Let $X_1, X_2, \dots$ be i.i.d.r.v. with mean $\mu$ and variance $\sigma^2 < \infty$ (or, even better, just $E|X| < \infty$). Define $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then $\bar{X}_n$ converges almost surely to $\mu$.

6.2 Sufficient statistics
- Def 6.2.1: A statistic $T(X)$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $X$ given the value of $T(X)$ does not depend on $\theta$.
- Thm 6.2.2: If $f(x \mid \theta) / f(T(x) \mid \theta)$ is constant as a function of $\theta$, then $T(X)$ is sufficient.
- Thm 6.2.6 (Factorization Theorem): $T(X)$ is sufficient iff $f(x \mid \theta) = g(T(x) \mid \theta)\, h(x)$ for some $g, h$.
- Example: independent $X_i \sim$ Bernoulli$(\theta)$, $\theta \in (0, 1)$.
- Example: independent $X_1, \dots, X_N \sim$ Uniform$(0, \theta)$, $\theta > 0$.
- Example: independent $X_1, \dots, X_n \sim N(\theta, \sigma^2)$, $\theta \in \mathbb{R}$.
- Example: independent $X_1, \dots, X_n \sim N(\theta_1, \theta_2)$, $\theta_1 \in \mathbb{R}$, $\theta_2 > 0$.
- Minimal sufficient statistics.

6.3 Likelihood principle
- Random sample $X = (X_1, \dots, X_n)$, $X_i \sim f(x_i \mid \theta)$ pmf or pdf, so $X \sim \prod_i f(x_i \mid \theta) = f(x \mid \theta)$.
- Likelihood function: $L(\theta \mid x) = f(x \mid \theta)$.
- Likelihood principle: if $L(\theta \mid x) / L(\theta \mid y)$ is independent of $\theta$, then the conclusions drawn from $x$ and $y$ should be identical.

Chapter 7 Point estimation
7.2.2 Maximum Likelihood Estimator
- $L(\theta \mid x) = \prod_i f(x_i \mid \theta)$, $\theta \in \mathbb{R}^k$.
- MLE: the statistic $\hat{\theta}(x) = \arg\max_\theta L(\theta \mid x)$.
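The two laws of large numbers above, and the consistency of $S^2$ from Ex 5.5.3, can be watched numerically. A minimal sketch using Exp(1) draws, so that $\mu = \sigma^2 = 1$; the distribution and seed are my choices:

    # Running sample mean of i.i.d. Exp(1) draws settles at mu = 1
    set.seed(3)
    x <- rexp(10000)
    plot(cumsum(x) / seq_along(x), type = "l")  # X-bar_n against n
    abline(h = 1)
    # Consistency of S^2: the sample variance approaches sigma^2 = 1
    sapply(c(10, 100, 1000, 10000), function(k) var(x[1:k]))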
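Finally, the Bernoulli example ties the last two sections together: by the Factorization Theorem, $T(x) = \sum_i x_i$ is sufficient, and maximizing $L(\theta \mid x)$ gives $\hat{\theta} = \bar{x}$. A quick sketch; the helper loglik and the true $\theta = 0.3$ are my choices:

    # Bernoulli(theta): the likelihood depends on x only through sum(x),
    # and the numerical argmax of the log-likelihood is the sample mean
    set.seed(4)
    x <- rbinom(50, size = 1, prob = 0.3)
    tot <- sum(x)                        # sufficient statistic T(x)
    loglik <- function(th) tot * log(th) + (length(x) - tot) * log(1 - th)
    optimize(loglik, c(1e-6, 1 - 1e-6), maximum = TRUE)$maximum
    mean(x)                              # agrees with the argmax above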