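All of Statistics: Chapter 7
Toby Xu, UW-Madison, 07/02/07

The Empirical Distribution Function

Def: The empirical distribution function $\hat{F}_n$ is the CDF that puts mass $1/n$ at each data point $X_i$. Formally,

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$$

where

$$I(X_i \le x) = \begin{cases} 1 & \text{if } X_i \le x \\ 0 & \text{if } X_i > x \end{cases}$$

Theorems: at any fixed value of $x$,

$$E(\hat{F}_n(x)) = F(x)$$

$$V(\hat{F}_n(x)) = \frac{F(x)(1 - F(x))}{n}$$

$$\text{MSE} = \frac{F(x)(1 - F(x))}{n} \to 0$$

and, uniformly in $x$,

$$\sup_x |\hat{F}_n(x) - F(x)| \to 0$$

The supremum, or least upper bound, of a set $S$ of real numbers is denoted by $\sup(S)$ and is defined to be the smallest real number that is greater than or equal to every number in $S$.

DKW inequality: for any $\epsilon > 0$,

$$P\left(\sup_x |F(x) - \hat{F}_n(x)| > \epsilon\right) \le 2e^{-2n\epsilon^2}$$

DKW: confidence band

$$L(x) = \max\{\hat{F}_n(x) - \epsilon_n, 0\}$$

$$U(x) = \min\{\hat{F}_n(x) + \epsilon_n, 1\}$$

where

$$\epsilon_n = \sqrt{\frac{1}{2n} \log\left(\frac{2}{\alpha}\right)}$$

For any $F$,

$$P(L(x) \le F(x) \le U(x) \text{ for all } x) \ge 1 - \alpha$$

Statistical Functionals

A statistical functional $T(F)$ is any function of $F$.

Mean: $\mu = \int x \, dF(x)$
Variance: $\sigma^2 = \int (x - \mu)^2 \, dF(x)$
Median: $m = F^{-1}(1/2)$

The plug-in estimator of $\theta = T(F)$ is defined by $\hat{\theta}_n = T(\hat{F}_n)$.

If $T(F) = \int r(x) \, dF(x)$ for some function $r(x)$, then $T$ is called a linear functional.

Statistical Functionals continued

The plug-in estimator for the linear functional $T(F) = \int r(x) \, dF(x)$ is

$$T(\hat{F}_n) = \int r(x) \, d\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} r(X_i)$$

Assume we can find the standard error se. Then in many cases

$$T(\hat{F}_n) \approx N(T(F), \text{se}^2)$$

Normal-based 95% confidence interval: $T(\hat{F}_n) \pm 2\,\text{se}$

Examples:

The Mean: let $\mu = T(F) = \int x \, dF(x)$. The plug-in estimator is

$$\hat{\mu} = \int x \, d\hat{F}_n(x) = \bar{X}_n, \qquad \text{se} = \sqrt{V(X)/n}$$

The Variance: $\sigma^2 = T(F) = V(X) = \int x^2 \, dF(x) - \left(\int x \, dF(x)\right)^2$. The plug-in estimator is

$$\hat{\sigma}^2 = \int x^2 \, d\hat{F}_n(x) - \left(\int x \, d\hat{F}_n(x)\right)^2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \left(\frac{1}{n} \sum_{i=1}^{n} X_i\right)^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$

Sample Variance:

$$S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$$

Examples Continued

The Skewness:

$$\kappa = \frac{E(X - \mu)^3}{\sigma^3} = \frac{\int (x - \mu)^3 \, dF(x)}{\left\{\int (x - \mu)^2 \, dF(x)\right\}^{3/2}}$$

Correlation:

$$\hat{\rho} = \frac{\sum_i (X_i - \bar{X}_n)(Y_i - \bar{Y}_n)}{\sqrt{\sum_i (X_i - \bar{X}_n)^2} \sqrt{\sum_i (Y_i - \bar{Y}_n)^2}}$$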
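The definitions on these slides translate fairly directly into code. The following is a minimal Python sketch, not part of the original slides: it builds the empirical distribution function, the DKW confidence band, and a few plug-in estimators using only the standard library. All function names (`ecdf`, `dkw_band`, `plug_in_variance`, and so on) are hypothetical names chosen for illustration, and the standard error of the mean is estimated by plugging in the plug-in variance, which is an assumption the slides do not spell out.

```python
import math
from bisect import bisect_right


def ecdf(data):
    """Return F_hat_n as a function: the fraction of data points <= x."""
    xs = sorted(data)
    n = len(xs)

    def f_hat(x):
        # bisect_right gives the number of sorted values that are <= x
        return bisect_right(xs, x) / n

    return f_hat


def dkw_band(data, alpha=0.05):
    """Return (L, U), the lower and upper functions of the 1 - alpha DKW band."""
    n = len(data)
    f_hat = ecdf(data)
    # eps_n = sqrt( log(2 / alpha) / (2n) )
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

    def lower(x):
        return max(f_hat(x) - eps, 0.0)

    def upper(x):
        return min(f_hat(x) + eps, 1.0)

    return lower, upper


def plug_in_mean(data):
    """Plug-in estimate of the mean: the sample average."""
    return sum(data) / len(data)


def plug_in_variance(data):
    """Plug-in estimate of the variance: divides by n, not n - 1."""
    m = plug_in_mean(data)
    return sum((x - m) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Sample variance S_n^2: divides by n - 1."""
    m = plug_in_mean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)


def plug_in_skewness(data):
    """Third central moment divided by the plug-in variance to the power 3/2."""
    m = plug_in_mean(data)
    third = sum((x - m) ** 3 for x in data) / len(data)
    return third / plug_in_variance(data) ** 1.5


def plug_in_correlation(xs, ys):
    """Plug-in correlation of paired samples xs and ys."""
    mx, my = plug_in_mean(xs), plug_in_mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)) * math.sqrt(
        sum((y - my) ** 2 for y in ys)
    )
    return num / den


def mean_interval(data):
    """Normal-based interval mean +/- 2 se.

    Assumption: se is estimated as sqrt(plug-in variance / n).
    """
    m = plug_in_mean(data)
    se = math.sqrt(plug_in_variance(data) / len(data))
    return m - 2.0 * se, m + 2.0 * se


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data, for illustration only
    lower, upper = dkw_band(sample, alpha=0.05)
    print(ecdf(sample)(3.0), lower(3.0), upper(3.0))
    print(plug_in_variance(sample), sample_variance(sample))
    print(mean_interval(sample))
```

With a sample this small the DKW band is very wide, which is expected since $\epsilon_n$ shrinks only at rate $1/\sqrt{n}$. The sketch keeps `plug_in_variance` and `sample_variance` separate to make the $1/n$ versus $1/(n-1)$ distinction explicit.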