Data analysis
Ben Graham
MA930, University of Warwick
October 15, 2015

Statistics (Def 5.2.1)
- Let $X_1, \dots, X_n \sim f(x \mid \theta)$ be independent, so
  $f(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta)$.
- Let $T(x_1, \dots, x_n)$ denote some function of the data, i.e. $T : \mathbb{R}^n \to \mathbb{R}$. $T$ is a statistic. Examples:
  - $T(x_1, \dots, x_n) = x_1$
  - $T(x_1, \dots, x_n) = \max_i x_i$
  - mean, median, etc.
- $\theta$, $EX$, $\mathrm{Var}(X)$, etc., are not statistics: they depend on the unknown distribution, not only on the data.
- The probability distribution of $T$ is called the sampling distribution of $T$.

Common statistics
- $X_1, \dots, X_n$ i.i.d.r.v.
- Sample mean:
  $\bar{X} = \frac{X_1 + \dots + X_n}{n} = \frac{1}{n} \sum_{i=1}^n X_i$
- Sample variance:
  $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n-1} \Big[ \sum_{i=1}^n X_i^2 - n \bar{X}^2 \Big]$
- If the mean $EX_i$ and the variance $\mathrm{Var}(X_i)$ exist, then $\bar{X}$ and $S^2$ are unbiased estimators of them.

Trimmed mean
- Trimmed/truncated mean, $p \in [0, 1]$:
  - remove the smallest and the biggest $np$ items from the sample;
  - take the mean of what is left.
- E.g. LIBOR: 18 banks, with the top and bottom 4 quotes removed.
- E.g. the Cauchy location parameter: the sample mean is not consistent there, but a trimmed mean is.

Def 5.4.1 Order statistics
- The order statistics of a random sample $X_1, \dots, X_n$ are the sample values placed into increasing order: $X_{(1)}, \dots, X_{(n)}$.
  - $X_{(1)} = \min_i X_i$
  - $X_{(2)} =$ the second smallest
  - ...
  - $X_{(n)} = \max_i X_i$
- The sample range is $R = X_{(n)} - X_{(1)}$.
- The sample median is
  $M = \begin{cases} X_{((n+1)/2)} & n \text{ odd} \\ \tfrac{1}{2}\big(X_{(n/2)} + X_{(n/2+1)}\big) & n \text{ even.} \end{cases}$
- The median may give a better sense of what is typical than the mean.

Quantiles
- For $p \in [0, 1]$, the $p$ quantile is (R, method 7 of 9)
  $(1 - \gamma)\, x_{(j)} + \gamma\, x_{(j+1)}$, where $(n-1)p < j \le (n-1)p + 1$ and $\gamma = np + 1 - p - j$.
- Quantiles are a continuous function of the data:
    data=c(1,3,7)
    q=seq(0,1,length.out=1000)
    plot(quantile(data,q))
- Quartiles: the $i/4$ quantiles. Deciles: the $i/10$ quantiles. Percentiles: the $i/100$ quantiles.

Theorem 5.4.4 Distribution of the order statistics
- Sample of size $n$ with c.d.f. $F_X$ and p.d.f. $f_X$.
- Binomial distribution:
  $F_{X_{(j)}}(x) = \sum_{k=j}^n \binom{n}{k} [F_X(x)]^k [1 - F_X(x)]^{n-k}$
- Differentiate:
  $f_{X_{(j)}}(x) = \frac{n!}{(j-1)!\,(n-j)!}\, f_X(x)\, [F_X(x)]^{j-1} [1 - F_X(x)]^{n-j}.$

QQ-plots
- $X_{(1)} \le \dots \le X_{(n)}$ order statistics; suspected c.d.f. $F$.
- The $j$-th order statistic is expected to be $\approx F^{-1}\big(\frac{j - 1/2}{n}\big)$.
- Plot $X_{(1)} \le \dots \le X_{(n)}$ against $F^{-1}\big(\frac{1/2}{n}\big), F^{-1}\big(\frac{3/2}{n}\big), \dots, F^{-1}\big(\frac{n - 1/2}{n}\big)$.
- Fit to the line $y = x$: $F$ is the right c.d.f.
- N.B. expect some noise; the spread of each order statistic over 10,000 replicates:
    plot(apply(replicate(10000,sort(runif(11))),1,sd))
    plot(apply(replicate(10000,sort(rnorm(11))),1,sd))
- Fit to some line: $F$ is the correct c.d.f. for some $aX + b$, $a, b \in \mathbb{R}$.
- Not a line: $F$ is not really the c.d.f.

More sample mean
- Characteristic function:
  $\varphi_{\bar{X}}(t) = [\varphi_X(t/n)]^n \approx \Big[1 + \frac{it\, EX}{n} + o(t/n)\Big]^n$
- Thm 5.3.1: If $X_i \sim N(\mu, \sigma^2)$ are independent, then
  - $\bar{X} \sim N(\mu, \sigma^2/n)$,
  - $(n-1) S^2 / \sigma^2 \sim \chi^2_{n-1}$, and
  - $\bar{X}$ and $S^2$ are independent.
- Proof: for simplicity, assume $\mu = 0$ and $\sigma^2 = 1$.

Ingredients for the proof
- If $e_1, \dots, e_n$ form an orthonormal basis for $\mathbb{R}^n$, then $(e_i \cdot X)_{i=1}^n$ are i.i.d.r.v. $N(0, 1)$. Let $e_1 = (\frac{1}{\sqrt{n}}, \dots, \frac{1}{\sqrt{n}})$.
- The $\Gamma(\alpha, \beta)$ distribution is defined by
  $f(x) = C(\alpha, \beta)\, x^{\alpha - 1} \exp(-\beta x)$.
- The $\chi^2_k$ distribution is the $\Gamma(k/2, 1/2)$ distribution.
- If $X \sim N(0, 1)$, then $X^2 \sim \chi^2_1$.
- The sum of $k$ independent $\chi^2_1$ r.v. is $\chi^2_k$.

Derived distributions
- If $Z \sim N(0, 1)$ and $V \sim \chi^2_k$ are independent, then $Z / \sqrt{V/k} \sim t_k$ (Student's $t$).
- In particular, $\frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1}$.
- If $U \sim \chi^2_k$ and $V \sim \chi^2_\ell$ are independent, then $\frac{U/k}{V/\ell} \sim F_{k,\ell}$.
- Suppose $X_1, \dots, X_m \sim N(\mu_X, \sigma^2_X)$ and $Y_1, \dots, Y_n \sim N(\mu_Y, \sigma^2_Y)$. Then
  $\frac{S^2_X / \sigma^2_X}{S^2_Y / \sigma^2_Y} \sim F_{m-1, n-1}.$
- These distributions are also used for linear regression/ANOVA.
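As a sanity check on the method-7 quantile formula above, here is a minimal R sketch; the helper type7 is my name, not part of the notes, and it simply transcribes the formula before comparing against the built-in quantile():

    # Hand-rolled method-7 quantile, transcribing the formula above
    type7 <- function(x, p) {
      x <- sort(x)
      n <- length(x)
      h <- (n - 1) * p + 1          # j = floor(h) satisfies (n-1)p < j <= (n-1)p + 1
      j <- floor(h)
      g <- h - j                    # gamma = np + 1 - p - j
      if (j >= n) return(x[n])      # p = 1 edge case
      (1 - g) * x[j] + g * x[j + 1]
    }
    data <- c(1, 3, 7)
    c(type7(data, 0.25), unname(quantile(data, 0.25, type = 7)))  # both give 2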
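The QQ-plot recipe above can be tried directly. A minimal sketch, assuming the plotting positions $(j - 1/2)/n$ and normal data; the sample size, parameters, and seed are arbitrary choices of mine:

    # QQ-plot by hand: order statistics against F^{-1}((j - 1/2)/n),
    # with F the suspected c.d.f. (here the standard normal)
    set.seed(1)
    x <- rnorm(100, mean = 2, sd = 3)   # data actually from N(2, 3^2)
    n <- length(x)
    p <- (seq_len(n) - 0.5) / n         # (j - 1/2)/n for j = 1, ..., n
    plot(qnorm(p), sort(x))             # a line, but not y = x ...
    abline(a = 2, b = 3)                # ... so F is correct for some aX + b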
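Thm 5.3.1 is also easy to probe by simulation. A sketch, with $\mu$, $\sigma$, and $n$ chosen arbitrarily: the empirical mean and variance of $\bar{X}$ should match $\mu$ and $\sigma^2/n$, the mean of $(n-1)S^2/\sigma^2$ should be $n - 1$ (the $\chi^2_{n-1}$ mean), and $\bar{X}$ and $S^2$ should be uncorrelated, consistent with independence:

    # Simulation check of Thm 5.3.1
    set.seed(2)
    mu <- 1; sigma <- 2; n <- 5
    sims <- replicate(10000, {
      x <- rnorm(n, mu, sigma)
      c(mean(x), var(x))                 # (X-bar, S^2) for one sample
    })
    xbar <- sims[1, ]; s2 <- sims[2, ]
    c(mean(xbar), var(xbar), sigma^2 / n)  # approx. 1, approx. 0.8, exactly 0.8
    mean((n - 1) * s2 / sigma^2)           # approx. n - 1 = 4
    cor(xbar, s2)                          # approx. 0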
Convergence in probability
- Def 5.5.1: A sequence of random variables $X_1, X_2, \dots$ converges in probability to a random variable $X$ if for every $\epsilon > 0$,
  $\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0.$
- Thm 5.5.2 (Weak Law of Large Numbers): Let $X_1, \dots, X_n$ be i.i.d.r.v. with mean $\mu$ and variance $\sigma^2 < \infty$ (or just $E|X| < \infty$). Define $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then $\bar{X}_n$ converges in probability to $\mu$.
- Proof: characteristic functions:
  $\varphi_{\bar{X}_n}(t) = [\varphi_X(t/n)]^n \approx \Big[1 + \frac{it\, EX}{n} + o(t/n)\Big]^n \to e^{it\mu},$
  the characteristic function of the constant $\mu$.

Ex 5.5.3 Consistency of S²
- Sample variance:
  $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2 = \frac{1}{n-1} \sum_{i=1}^n X_i^2 - \frac{n}{n-1} \bar{X}_n^2$
- $S^2$ converges to $\sigma^2$ in probability if $E|X|^2 < \infty$.

Def 5.5.6 Almost sure convergence
- A sequence of random variables $X_1, \dots, X_n$ converges almost surely to a random variable $X$ if, for all $\epsilon > 0$,
  $P\big(\lim_{n \to \infty} |X_n - X| < \epsilon\big) = 1.$
- N.B. the limit is now on the inside.
- $X_n \sim$ Bernoulli$(1/n)$ independent: converges in probability to 0, but not almost surely.
- Theorem 5.5.9 (Strong Law of Large Numbers): Let $X_1, X_2, \dots$ be i.i.d.r.v. with mean $\mu$ and variance $\sigma^2 < \infty$ (or, even better, just $E|X| < \infty$). Define $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then $\bar{X}_n$ converges almost surely to $\mu$.

6.2 Sufficient statistics
- Def 6.2.1: A statistic $T(X)$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $X$ given the value of $T(X)$ does not depend on $\theta$.
- Thm 6.2.2: If $f(x \mid \theta) / f(T(x) \mid \theta)$ is constant as a function of $\theta$, then $T(X)$ is sufficient.
- Thm 6.2.6 (Factorization Theorem): $T(X)$ is sufficient iff $f(x \mid \theta) = g(T(x) \mid \theta)\, h(x)$ for some $g, h$.
- Example: independent $X_i \sim$ Bernoulli$(\theta)$, $\theta \in (0, 1)$.
- Example: independent $X_1, \dots, X_N \sim$ Uniform$(0, \theta)$, $\theta > 0$.
- Example: independent $X_1, \dots, X_n \sim N(\theta, \sigma^2)$, $\theta \in \mathbb{R}$.
- Example: independent $X_1, \dots, X_n \sim N(\theta_1, \theta_2)$, $\theta_1 \in \mathbb{R}$, $\theta_2 > 0$.
- Minimal sufficient statistics.

6.3 Likelihood principle
- Random sample $X = (X_1, \dots, X_n)$, $X_i \sim f(x_i \mid \theta)$ pmf or pdf, so $X \sim \prod_i f(x_i \mid \theta) = f(x \mid \theta)$.
- Likelihood function: $L(\theta \mid x) = f(x \mid \theta)$.
- Likelihood principle: if $L(\theta \mid x) / L(\theta \mid y)$ is independent of $\theta$, then the conclusions drawn from $x$ and $y$ should be identical.

Chapter 7 Point estimation
7.2.2 Maximum Likelihood Estimator
- $L(\theta \mid x) = \prod_i f(x_i \mid \theta)$, $\theta \in \mathbb{R}^k$.
- MLE: the statistic $\hat{\theta}(x) = \arg\max_\theta L(\theta \mid x)$.
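The two laws of large numbers above, and the consistency of $S^2$ from Ex 5.5.3, can be watched numerically. A minimal sketch using Exp(1) draws, so that $\mu = \sigma^2 = 1$; the distribution and seed are my choices:

    # Running sample mean of i.i.d. Exp(1) draws settles at mu = 1
    set.seed(3)
    x <- rexp(10000)
    plot(cumsum(x) / seq_along(x), type = "l")  # X-bar_n against n
    abline(h = 1)
    # Consistency of S^2: the sample variance approaches sigma^2 = 1
    sapply(c(10, 100, 1000, 10000), function(k) var(x[1:k]))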
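Finally, the Bernoulli example ties the last two sections together: by the Factorization Theorem, $T(x) = \sum_i x_i$ is sufficient, and maximizing $L(\theta \mid x)$ gives $\hat{\theta} = \bar{x}$. A quick sketch; the helper loglik and the true $\theta = 0.3$ are my choices:

    # Bernoulli(theta): the likelihood depends on x only through sum(x),
    # and the numerical argmax of the log-likelihood is the sample mean
    set.seed(4)
    x <- rbinom(50, size = 1, prob = 0.3)
    tot <- sum(x)                        # sufficient statistic T(x)
    loglik <- function(th) tot * log(th) + (length(x) - tot) * log(1 - th)
    optimize(loglik, c(1e-6, 1 - 1e-6), maximum = TRUE)$maximum
    mean(x)                              # agrees with the argmax above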