Chapter 2 Statistics Review

A. Random variables
1. Expected value:
Definition: if X is a discrete random variable, the mean (or expected value) of X is the
weighted average of the possible outcomes, where the probabilities of the outcomes
serve as the weights:

$\mu_X = E(X) = \sum_i p_i X_i = p_1 X_1 + p_2 X_2 + \cdots + p_n X_n$,

where $p_i$ is the probability of the $i$th outcome, $i = 1, 2, \ldots, n$.
Interpretation:
A random variable is a variable that has a probability associated with each outcome; the
outcome is not controlled.
Discrete random variable: the set of outcomes is finite or countably infinite.
Continuous random variable: the set of outcomes is uncountably infinite, so the probability of
any single outcome is essentially zero.
For a continuous random variable (e.g. a normal random variable), the probability density
function is used to calculate the probability over an interval, i.e. the area under the curve.
E( ) is the expectations operator, and $\sum_i p_i = 1$.

$\bar{X} = \dfrac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \dfrac{1}{N}\sum_{i=1}^{N} X_i$, where N is the number of observations
.......... the "sample mean", used to estimate $\mu_X$.

$\bar{X}$ changes from sample to sample; it is not a fixed number, since the outcomes selected
will generally not be the same, and there is a probability associated with each value of $\bar{X}$.
Thus $\bar{X}$ is also a random variable, and we can calculate $E(\bar{X})$.
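For illustration, here is a minimal Python sketch (numpy assumed; the three-outcome distribution is hypothetical) that computes E(X) from a probability table and a sample mean from data:

    import numpy as np

    # hypothetical discrete distribution: outcomes and their probabilities
    outcomes = np.array([1.0, 2.0, 3.0])
    probs = np.array([0.2, 0.5, 0.3])            # the p_i must sum to 1

    mu_X = np.sum(probs * outcomes)              # E(X) = sum of p_i * X_i
    print("E(X) =", mu_X)

    # the sample mean estimates mu_X and changes from sample to sample
    sample = np.random.default_rng(0).choice(outcomes, size=50, p=probs)
    print("sample mean =", sample.mean())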
2. Variance: measures the dispersion, i.e. the spread of the values.

$Var(X) = \sigma_X^2 = \sum_i p_i [X_i - E(X)]^2 = E[X - E(X)]^2 = E(X^2) - \mu_X^2$ .......... the "population variance", a constant

$\sigma_X = \sqrt{Var(X)}$ .......... the "population standard deviation"

$\hat{\sigma}_X^2 = S_X^2 = \dfrac{\sum_i (X_i - \bar{X})^2}{N - 1}$ .......... the "sample variance", used to estimate $\sigma_X^2$

$\hat{\sigma}_X = S_X = \sqrt{S_X^2}$ .......... the "sample standard deviation"
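A quick numerical sketch (numpy assumed; the data vector is hypothetical) contrasting the divisor-N variance with the N−1 sample versions:

    import numpy as np

    x = np.array([4.0, 7.0, 6.0, 5.0, 8.0])      # hypothetical sample

    # sample variance and standard deviation use the N-1 divisor (ddof=1)
    s2 = x.var(ddof=1)                           # sum((x - xbar)^2) / (N - 1)
    s = x.std(ddof=1)
    print("sample variance S^2 =", s2)
    print("sample std dev S =", s)
    print("divisor-N variance =", x.var())       # numpy's default ddof=0 divides by N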
3. Joint distribution:
(the linear relation between X and Y, two jointly distributed random variables)

$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$

The covariance measures the linear relationship between X and Y. It depends on the units of
X and Y: different units give a different $Cov(X, Y)$.
$Cov(X, Y) > 0$: the best-fitting line has a positive slope; a positive relationship between X and Y.
$Cov(X, Y) < 0$: the best-fitting line has a negative slope; a negative relationship between X and Y.
$Cov(X, Y) = 0$: there is no linear relationship between X and Y, but there may be a nonlinear
relationship.

$\rho_{XY} = \dfrac{Cov(X, Y)}{\sigma_X \sigma_Y}$ .......... the "population correlation coefficient", which is scale free.

$\rho_{XY} > 0$: a positive correlation; $\rho_{XY} < 0$: a negative correlation;
$\rho_{XY} = 0$: no linear relationship between X and Y.
$-1 \le \rho_{XY} \le 1$; $\rho_{XY} = 1$: the observations lie exactly on a line with positive slope; $\rho_{XY} = -1$: they lie
exactly on a line with negative slope.
N
Covˆ( X , Y ) 
(X
i 1
i
 X )(Yi  Y )
,
N 1
N
 XY
cov̂( X , Y )

=
S X SY
(X
i 1
i
 X )(Yi  Y )
 ( X  X ) ( Y
2
i
i
Y )
“sample correlation coefficient”
2
2
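A minimal numpy sketch (hypothetical paired data) of the sample covariance and sample correlation:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # hypothetical data
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    n = len(x)
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)   # sample covariance
    r_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))              # sample correlation

    # numpy's built-ins give the same answers
    print(cov_xy, np.cov(x, y, ddof=1)[0, 1])
    print(r_xy, np.corrcoef(x, y)[0, 1])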
Ex: joint probability distribution of X and Y

              X = 1     X = 2     X = 3     Prob(Y)
    Y = 6     0.175     0.088     0.035     0.298
    Y = 5     0.070     0.210     0.105     0.385
    Y = 4     0.018     0.105     0.194     0.317
    Prob(X)   0.263     0.403     0.334     1

$\sum_{i=1}^{N_i} p(X_i) = 1$, $\sum_{j=1}^{N_j} p(Y_j) = 1$, $\sum_i \sum_j p_{ij} = 1$ (joint probabilities)
E(X) = 0.263×1 + 0.403×2 + 0.334×3 = 2.071
Var(X) = 4.881 − (2.071)² = 0.591959
E(Y) = 0.298×6 + 0.385×5 + 0.317×4 = 4.981
Var(Y) = 25.425 − (4.981)² = 0.614639
Cov(X, Y) = $\sum_i \sum_j p_{ij} X_i Y_j - E(X)E(Y)$ = 10.001 − 2.071×4.981 ≈ −0.315
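These numbers can be checked with a short numpy sketch over the joint table above:

    import numpy as np

    x_vals = np.array([1.0, 2.0, 3.0])
    y_vals = np.array([6.0, 5.0, 4.0])
    # joint probabilities p_ij: rows indexed by Y (6, 5, 4), columns by X (1, 2, 3)
    p = np.array([[0.175, 0.088, 0.035],
                  [0.070, 0.210, 0.105],
                  [0.018, 0.105, 0.194]])

    px, py = p.sum(axis=0), p.sum(axis=1)        # marginal distributions of X and Y
    EX, EY = px @ x_vals, py @ y_vals
    VarX = px @ x_vals**2 - EX**2
    VarY = py @ y_vals**2 - EY**2
    EXY = y_vals @ p @ x_vals                    # sum over i, j of p_ij * X_i * Y_j
    print(EX, VarX, EY, VarY, EXY - EX * EY)     # 2.071, 0.591959, 4.981, 0.614639, about -0.315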
4. Formulas
E(b) = b, Var(b) = 0; E(aX) = aE(X), Var(aX) = a²Var(X);
E(aX + b) = aE(X) + b, Var(aX + b) = a²Var(X)

$Var(aX + b) = E[(aX + b) - E(aX + b)]^2 = E[aX + b - aE(X) - b]^2$
$= E\{a[X - E(X)]\}^2 = a^2 E[X - E(X)]^2 = a^2 Var(X)$

E(X + Y) = E(X) + E(Y), Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

$Var(X + Y) = E[(X + Y) - E(X + Y)]^2 = E[X + Y - E(X) - E(Y)]^2$
$= E\{[X - E(X)] + [Y - E(Y)]\}^2 = E\{[X - E(X)]^2 + [Y - E(Y)]^2 + 2[X - E(X)][Y - E(Y)]\}$
$= Var(X) + Var(Y) + 2Cov(X, Y)$
If X and Y are independent (hence linearly uncorrelated), then E(X + Y) = E(X) + E(Y),
Cov(X, Y) = 0, $\rho_{XY}$ = 0, and Var(X + Y) = Var(X) + Var(Y), since

$Cov(X, Y) = E\{[X - E(X)][Y - E(Y)]\} = E[XY - X\mu_Y - Y\mu_X + \mu_X \mu_Y]$
$= E(XY) - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y = E(XY) - \mu_X \mu_Y = 0$
(because independence implies E(XY) = E(X)E(Y)).

Cov(X, Y) = 0, however, does not imply that X and Y are independent.
$E(X \cdot X) = E(X^2) \neq [E(X)]^2$ in general: X is not independent of itself.
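A small simulation sketch (numpy assumed; the constants a, b and the distributions are arbitrary choices) illustrating Var(aX + b) = a²Var(X) and Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(2.0, 3.0, size=200_000)               # arbitrary X
    y = 0.5 * x + rng.normal(0.0, 1.0, size=200_000)     # Y correlated with X
    a, b = 2.0, 5.0

    print(np.var(a * x + b), a**2 * np.var(x))            # approximately equal
    lhs = np.var(x + y)
    rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
    print(lhs, rhs)                                       # approximately equal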
B. (Probability) distributions:
1. The normal distribution:
$X \sim N(\mu_X, \sigma_X^2)$
2. The standard normal distribution:
$Z = \dfrac{X - \mu_X}{\sigma_X}$, $Z \sim N(0, 1)$
3. The Chi-square distribution:
$\chi_N^2 = \sum_{i=1}^{N} Z_i^2 = Z_1^2 + Z_2^2 + \cdots + Z_N^2$
where the $Z_i$ are N independent normally distributed random variables with mean 0 and variance 1.
As N gets larger, the χ² distribution becomes approximately normal.
The range of χ² is 0 to ∞.
4. The t distribution:
If (1) Z is normally distributed with mean 0 and variance 1,
(2) $\chi_N^2$ is distributed as Chi-square with N degrees of freedom, and
(3) Z and $\chi_N^2$ are independent,
then
$\dfrac{Z}{\sqrt{\chi_N^2 / N}} \sim t_N$
As N gets large, the t distribution tends toward the normal distribution.
5. The F distribution:
If X and Y are independent and distributed as Chi-square with N1 and N2 degrees of
freedom, respectively, then
$\dfrac{X / N_1}{Y / N_2} \sim F_{N_1, N_2}$
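To see these definitions in action, here is a minimal simulation sketch (numpy assumed; the degrees of freedom are arbitrary) that builds χ², t, and F variates from standard normals:

    import numpy as np

    rng = np.random.default_rng(2)
    N1, N2, draws = 5, 10, 100_000

    z = rng.standard_normal((draws, N1))
    chi2_N1 = (z**2).sum(axis=1)                 # chi-square with N1 df: sum of squared N(0,1)
    t_N1 = rng.standard_normal(draws) / np.sqrt(chi2_N1 / N1)    # t with N1 df
    chi2_N2 = (rng.standard_normal((draws, N2))**2).sum(axis=1)
    f_N1_N2 = (chi2_N1 / N1) / (chi2_N2 / N2)    # F with (N1, N2) df

    print(chi2_N1.mean())                        # close to N1, since E[chi-square_N] = N
    print(t_N1.mean())                           # close to 0
    print(f_N1_N2.mean())                        # close to N2 / (N2 - 2) = 1.25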
Assume the $X_i$'s are independent of each other. Then $\bar{X} \sim (\mu_X, \sigma_X^2 / N)$, regardless of the
distribution of X:

$E(\bar{X}) = E\left[\dfrac{\sum_i X_i}{N}\right] = \dfrac{1}{N}\sum_i E(X_i) = \dfrac{1}{N} E(X_1 + X_2 + \cdots + X_N)$
$= \dfrac{1}{N}(\mu_X + \mu_X + \cdots + \mu_X) = \dfrac{N \mu_X}{N} = \mu_X$
$\sigma_{\bar{X}}^2 = Var(\bar{X}) = Var\left(\dfrac{1}{N}\sum_i X_i\right) = \dfrac{1}{N^2} Var\left(\sum_i X_i\right) = \dfrac{1}{N^2} \cdot N\sigma_X^2 = \dfrac{\sigma_X^2}{N}$

since
$Var\left(\sum_i X_i\right) = E\left[(X_1 + X_2 + \cdots + X_N) - E(X_1 + X_2 + \cdots + X_N)\right]^2$
$= E\{[X_1 - E(X_1)] + [X_2 - E(X_2)] + \cdots + [X_N - E(X_N)]\}^2$
$= \sum_i E[X_i - E(X_i)]^2 + 2\sum_{i \ne j} E\{[X_i - E(X_i)][X_j - E(X_j)]\}$ (the cross terms vanish by independence)
$= \sum_i Var(X_i) = Var(X_1) + Var(X_2) + \cdots + Var(X_N) = \sigma_X^2 + \sigma_X^2 + \cdots + \sigma_X^2 = N\sigma_X^2$
Here the $X_i$ are i.i.d. (each independent of the other observations, identically distributed).
If $X \sim N(\mu_X, \sigma_X^2)$, then $\bar{X} \sim N(\mu_X, \sigma_X^2 / N)$.
* Central limit theorem:
If $X \sim (\mu_X, \sigma_X^2)$ (with any distribution), then $\bar{X} \sim N(\mu_X, \sigma_X^2 / N)$ approximately, as N (the sample
size) increases.
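A minimal sketch (numpy assumed; the skewed exponential population is an arbitrary choice) showing the sample mean's distribution approaching normality:

    import numpy as np

    rng = np.random.default_rng(3)
    mu, sigma2 = 1.0, 1.0                        # exponential(1) has mean 1 and variance 1
    N, reps = 50, 20_000

    # draw `reps` samples of size N and record each sample mean
    xbars = rng.exponential(scale=1.0, size=(reps, N)).mean(axis=1)

    print(xbars.mean())                          # close to mu_X
    print(xbars.var())                           # close to sigma_X^2 / N = 0.02
    # a histogram of xbars looks approximately normal even though X itself is skewed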
With $S = \hat{\sigma}_X = \sqrt{\dfrac{\sum_i (X_i - \bar{X})^2}{N - 1}}$, what is the distribution of $\dfrac{\bar{X} - \mu_0}{S / \sqrt{N}}$?
C. Hypothesis testing:
Ex.
H0: μ = 100 ... the null hypothesis
H1: μ ≠ 100 ... the alternative hypothesis
α = 0.05 ... the level of significance
Given the value of $\bar{X}$, the test statistic is $Z = \dfrac{\bar{X} - 100}{\sqrt{\sigma^2 / N}}$, if σ² is known.
Compare Z with the critical value Zc to decide whether or not to accept H0.
If we accept H0: "we cannot reject H0 at the 95% confidence level based on the data we have."
If we reject H0: "we can reject H0 at the 5% level of significance."
Type I error: rejecting H0 when H0 is true; the probability of making a type I error is α.
Type II error: accepting H0 when H0 is false; the probability of making a type II error is difficult to
determine.
When the confidence interval is widened, the probability of a type I error falls and that of a type II error rises.
※
$\dfrac{\bar{X} - \mu_0}{S / \sqrt{N}} = \dfrac{(\bar{X} - \mu_0)\sqrt{N}}{S} = \dfrac{(\bar{X} - \mu_0)\sqrt{N} / \sigma}{\sqrt{\dfrac{(N-1)S^2}{(N-1)\sigma^2}}}$
(1) If $X \sim N(\mu_X, \sigma_X^2)$, then $\bar{X} \sim N(\mu_X, \sigma_X^2 / N)$: the numerator $(\bar{X} - \mu_0)\sqrt{N} / \sigma$ is a standard
normal variable Z, and $\dfrac{(N-1)S^2}{\sigma^2}$ is $\chi_{N-1}^2$
(since $\dfrac{(N-1)S^2}{\sigma^2} = \sum_i \left(\dfrac{X_i - \bar{X}}{\sigma}\right)^2$, which behaves like $\sum_i \left(\dfrac{X_i - \mu_X}{\sigma}\right)^2$ with one degree of freedom used up by $\bar{X}$).
Therefore
$\dfrac{\bar{X} - \mu_0}{S / \sqrt{N}} \sim \dfrac{Z}{\sqrt{\chi_{N-1}^2 / (N-1)}} \sim t_{N-1}$,
if the numerator and denominator are independent.
(2) If X is not distributed $N(\mu_X, \sigma_X^2)$, then we will appeal to the central limit theorem.
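A small sketch of the test above (numpy and scipy assumed; μ0 = 100 follows the example, while the sample itself is hypothetical):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    x = rng.normal(103.0, 10.0, size=25)         # hypothetical sample
    mu0, alpha = 100.0, 0.05

    # t statistic: (Xbar - mu0) / (S / sqrt(N)), since sigma^2 is unknown
    N = len(x)
    t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(N))
    t_crit = stats.t.ppf(1 - alpha / 2, df=N - 1)
    print(t_stat, t_crit, "reject H0" if abs(t_stat) > t_crit else "cannot reject H0")

    # scipy's built-in one-sample t test gives the same statistic
    print(stats.ttest_1samp(x, popmean=mu0))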
D. Point and interval estimates:
1. Point estimate: e.g. $\bar{X}$ = 40, but we do not know how close the true value is to it.
2. Interval estimate:
$\bar{X} \pm Z_{\alpha/2} \dfrac{\sigma_X}{\sqrt{N}}$, α = 0.05, if $\sigma_X$ is known.
$\bar{X} \pm t_{\alpha/2,\,N-1} \dfrac{S_X}{\sqrt{N}}$, α = 0.05, if $\sigma_X$ is unknown and N is small (< 31).
Over repeated samples, 95% of such intervals will include the true value.
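A minimal sketch (numpy and scipy assumed; the sample is hypothetical) of the t-based interval for a small sample:

    import numpy as np
    from scipy import stats

    x = np.array([11.2, 9.8, 10.5, 10.1, 9.6, 10.9, 10.3, 9.9])   # hypothetical, N < 31
    alpha = 0.05

    N = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    t_half = stats.t.ppf(1 - alpha / 2, df=N - 1)                 # t_{alpha/2, N-1}
    half_width = t_half * s / np.sqrt(N)
    print(f"95% interval: {xbar - half_width:.3f} to {xbar + half_width:.3f}")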
E. Properties of estimators: unbiasedness, efficiency, consistency
1. Unbiasedness:
$\hat{\beta}$ is an unbiased estimator of β if $E(\hat{\beta}) = \beta$ (the true value).
Ex. $\bar{X}$ is an unbiased estimator of $\mu_X$; ex. $S_X^2$ is an unbiased estimator of $\sigma_X^2$:
$E(S^2) = E\left[\dfrac{\sum (X - \bar{X})^2}{N - 1}\right] = E\left\{\dfrac{1}{N - 1}\left[\sum (X - \mu_X)^2 - N(\bar{X} - \mu_X)^2\right]\right\}$
$= \dfrac{1}{N - 1}\left(N\sigma_X^2 - N \cdot \dfrac{\sigma_X^2}{N}\right) = \sigma_X^2$

because
$\sum (X - \bar{X})^2 = \sum [(X - \mu_X) - (\bar{X} - \mu_X)]^2 = \sum [(X - \mu_X)^2 + (\bar{X} - \mu_X)^2 - 2(X - \mu_X)(\bar{X} - \mu_X)]$
$= \sum (X - \mu_X)^2 + N(\bar{X} - \mu_X)^2 - 2(\bar{X} - \mu_X)\sum (X - \mu_X)$
$= \sum (X - \mu_X)^2 + N(\bar{X} - \mu_X)^2 - 2N(\bar{X} - \mu_X)^2$
$= \sum (X - \mu_X)^2 - N(\bar{X} - \mu_X)^2$
bias = $E(\hat{\beta}) - \beta$
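A simulation sketch (numpy assumed; the normal population and sample size are arbitrary) comparing the N−1 estimator S² with the divisor-N estimator, whose bias is visible:

    import numpy as np

    rng = np.random.default_rng(5)
    sigma2, N, reps = 4.0, 10, 50_000

    samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, N))
    s2_unbiased = samples.var(axis=1, ddof=1)    # divisor N - 1
    s2_biased = samples.var(axis=1, ddof=0)      # divisor N

    print(s2_unbiased.mean())                    # close to sigma^2 = 4.0 (unbiased)
    print(s2_biased.mean())                      # close to (N-1)/N * sigma^2 = 3.6 (biased down)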
2. Efficiency:
Define: (a) minimum mean square error,
MSE($\hat{\beta}$) = $E(\hat{\beta} - \beta)^2$ = [bias($\hat{\beta}$)]² + Var($\hat{\beta}$).

Proof: $E(\hat{\beta} - \beta)^2 = E\{[\hat{\beta} - E(\hat{\beta})] + [E(\hat{\beta}) - \beta]\}^2$
$= E[\hat{\beta} - E(\hat{\beta})]^2 + [E(\hat{\beta}) - \beta]^2 + 2E\{[\hat{\beta} - E(\hat{\beta})][E(\hat{\beta}) - \beta]\}$
$= var(\hat{\beta}) + [\text{bias}(\hat{\beta})]^2 + 2[E(\hat{\beta}) - \beta]\,E[\hat{\beta} - E(\hat{\beta})]$
$= var(\hat{\beta}) + [\text{bias}(\hat{\beta})]^2 + 0$

Checking $\bar{X}$: its values lie mostly around μ.
When bias($\hat{\beta}$) = 0, MSE($\hat{\beta}$) = Var($\hat{\beta}$).
If we have an unbiased estimator with a large dispersion around the true value (i.e. high variance) and a
biased estimator with low variance, we might prefer the biased estimator to the unbiased one in order to
maximize the precision of prediction.
Def. (b) If, for a given sample size, (1) $\hat{\beta}$ is unbiased and (2) Var($\hat{\beta}$) ≤ Var($\tilde{\beta}$), where $\tilde{\beta}$ is any
other unbiased estimator of β, then $\hat{\beta}$ is an efficient estimator of β.
Cramer-Rao lower bound: gives a lower limit for the variance of any unbiased estimator.
Ex. The lower bound for Var($\bar{X}$) is $\dfrac{\sigma^2}{N}$; the lower bound for the variance of $\hat{\sigma}^2$ is $\dfrac{2\sigma^4}{N}$
(among all unbiased estimators, the minimum attained is $var(\hat{\sigma}^2) = \dfrac{2\sigma^4}{N - k}$).
3. Consistency:
The probability limit of $\hat{\beta}$:
plim($\hat{\beta}$) = β as N → ∞;
$\lim_{N \to \infty} \text{prob}(\beta - \delta < \hat{\beta} < \beta + \delta) = 1$, for any δ > 0,
i.e. $\text{prob}(|\hat{\beta} - \beta| < \delta) \to 1$ and $\text{prob}(|\hat{\beta} - \beta| > \delta) \to 0$ as N → ∞.
Criteria for consistency:
(a) $\hat{\beta}$ is a consistent estimator of β if plim($\hat{\beta}$) = β as N → ∞;
that is, consistency ⟺ plim($\hat{\beta}$) = β.
(b) Mean square error consistency (a sufficient condition for consistency, but not a necessary one):
if $\hat{\beta}$ is an estimator of β and $\lim_{N \to \infty} \text{MSE}(\hat{\beta}) = 0$, then $\hat{\beta}$ is a consistent estimator of β.
E.g. $\bar{X}$ is a consistent estimator of $\mu_X$:
$\text{MSE}(\bar{X}) = [\text{bias}(\bar{X})]^2 + var(\bar{X}) = 0 + \dfrac{\sigma_X^2}{N}$, so $\lim_{N \to \infty} \text{MSE}(\bar{X}) = 0$.

Slutsky's theorem: If plim($\hat{\beta}$) = β and g($\hat{\beta}$) is a continuous function of $\hat{\beta}$,
then plim g($\hat{\beta}$) = g[plim($\hat{\beta}$)].
(This is how "consistency carries over"; it need not carry over when g is not continuous.)
Unbiasedness does not carry over: in general E[g($\hat{\beta}$)] ≠ g[E($\hat{\beta}$)].
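A small sketch (numpy assumed; the uniform population is an arbitrary choice) illustrating the consistency of $\bar{X}$: the probability that it falls within a small δ of $\mu_X$ grows toward 1 as N increases:

    import numpy as np

    rng = np.random.default_rng(6)
    mu, delta = 0.5, 0.02                        # mean of Uniform(0, 1) and a small tolerance

    for N in (10, 100, 1_000, 10_000):
        xbars = rng.uniform(0.0, 1.0, size=(2_000, N)).mean(axis=1)
        # fraction of sample means within delta of the true mean
        print(N, np.mean(np.abs(xbars - mu) < delta))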
4. Asymptotic unbiasedness: as N becomes larger and larger,
$\hat{\beta}$ is an asymptotically unbiased estimator of β if $\lim_{N \to \infty} E(\hat{\beta}) = \beta$.
If an estimator is unbiased, then it is also asymptotically unbiased.
Ex. $\tilde{\sigma}^2 = \dfrac{1}{N}\sum_i (X_i - \bar{X})^2$ is asymptotically unbiased:
$E(\tilde{\sigma}^2) = \dfrac{N - 1}{N}\sigma^2$, so $\lim_{N \to \infty} E(\tilde{\sigma}^2) = \sigma^2$.
5. Asymptotic efficiency: as N → ∞,
$\hat{\beta}$ is an asymptotically efficient estimator of β if all of the following conditions are satisfied:
(a) $\hat{\beta}$ has an asymptotic distribution with finite mean and finite variance;
(b) $\hat{\beta}$ is consistent;
(c) no other consistent estimator of β has a smaller asymptotic variance than $\hat{\beta}$.
If an estimator is efficient, then it is also asymptotically efficient.
Possible combinations of these properties (three example estimators):
(1) unbiasedness, asym. unbiasedness, efficiency, asym. efficiency, consistency
(2) biasedness, asym. unbiasedness, inefficiency, asym. efficiency, consistency
(3) biasedness, asym. unbiasedness, inefficiency, asym. inefficiency, inconsistency
Review of linear algebra
1. A matrix A is idempotent iff AA = A.
2. If the inner product of two vectors vanishes (i.e., the scalar is 0), then the vectors are
orthogonal.
[An inner product (or scalar, or dot product) of two vectors is a row vector times a column vector,
yielding a scalar. An outer product of two vectors is a column vector times a row vector,
yielding a matrix.]
3. The rank of any matrix A, ρ(A), is the size of the largest non-vanishing determinant
contained in A.
Or: the rank is the maximum number of linearly independent rows or columns of A, where
$a_1, a_2, \ldots, a_N$ is a set of linearly independent vectors iff
$\sum_{j=1}^{N} k_j a_j = 0$ necessarily implies $k_1 = k_2 = \cdots = k_N = 0$.
Ex. $A = \begin{bmatrix} 2 & 3 \\ 8 & 6 \end{bmatrix}$, $|A| = 12 - 24 = -12 \neq 0$, so $\rho(A) = 2$;
$B = \begin{bmatrix} 3 & 6 \\ 2 & 4 \end{bmatrix}$, $|B| = 12 - 12 = 0$, so $\rho(B) = 1$.
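The same example can be checked numerically (numpy assumed):

    import numpy as np

    A = np.array([[2.0, 3.0], [8.0, 6.0]])
    B = np.array([[3.0, 6.0], [2.0, 4.0]])

    print(np.linalg.det(A), np.linalg.matrix_rank(A))   # -12.0, rank 2 (nonsingular)
    print(np.linalg.det(B), np.linalg.matrix_rank(B))   # ~0.0, rank 1 (singular)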
Properties:
a. 0 ≤ ρ(A) = an integer ≤ min(M, N), where A is an M×N matrix.
b. ρ($I_N$) = N for the N×N identity matrix, and ρ(0) = 0.
c. ρ(A) = ρ(A') = ρ(A'A) = ρ(AA').
d. If A and B are of the same order, ρ(A + B) ≤ ρ(A) + ρ(B).
e. If AB is defined, ρ(AB) ≤ min[ρ(A), ρ(B)].
f. If A is diagonal, ρ(A) = the number of nonzero elements.
g. If A is idempotent, ρ(A) = trace(A).
h. The rank of a matrix is not changed if one row (or column) is multiplied by a nonzero
constant, or if such a multiple of one row (column) is added to another row (column).
4. A square matrix A of order N is nonsingular iff it is of full rank, ρ(A) = N (or |A| ≠ 0);
otherwise, it is singular.
The rank of a matrix is unchanged by premultiplying or postmultiplying it by a nonsingular matrix.
5. Differentiation in matrix notation (rules):
a. If $X = \begin{bmatrix} X_1 \\ \vdots \\ X_M \end{bmatrix}$ and $a = \begin{bmatrix} a_1 \\ \vdots \\ a_M \end{bmatrix}$ are M×1 vectors, where the $a_i$ (i = 1, 2, …, M) are constants, then
$\dfrac{\partial (a'X)}{\partial X} = a$.
b. If A is a symmetric matrix of order M×M whose typical element is a constant $a_{ij}$, then
$\dfrac{\partial (X'AX)}{\partial X} = 2AX$;
if A is not symmetric,
$\dfrac{\partial (X'AX)}{\partial X} = (A' + A)X$.
c. If A is a symmetric matrix of order M×M whose typical element is a constant $a_{ij}$, then
$\dfrac{\partial^2 (X'AX)}{\partial X\,\partial X'} = 2A$ and $\dfrac{\partial (X'AX)}{\partial A} = XX'$.
If Y and Z are vectors and B is a matrix, then
$\dfrac{\partial (Y'BZ)}{\partial Y} = BZ$, $\dfrac{\partial (Y'BZ)}{\partial Z} = B'Y$, $\dfrac{\partial (Y'BZ)}{\partial B} = YZ'$.
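A quick numerical sketch (numpy assumed; the 3×3 matrix and the vector are arbitrary) checking the rule ∂(X'AX)/∂X = (A' + A)X against a finite-difference gradient:

    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.normal(size=(3, 3))                  # not symmetric in general
    x = rng.normal(size=3)

    def quad_form(v):
        return v @ A @ v                         # the quadratic form X'AX as a scalar

    analytic = (A.T + A) @ x                     # the matrix-calculus rule
    eps = 1e-6
    numeric = np.array([(quad_form(x + eps * e) - quad_form(x - eps * e)) / (2 * eps)
                        for e in np.eye(3)])
    print(analytic)
    print(numeric)                               # agrees with `analytic` to about 1e-6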
Formulas for matrices:
(AB)' = B'A'
(A')' = A
AB ≠ BA in general (A is M×N, B is N×K)
(A ± C)' = A' ± C' (A and C both M×N)
$(DE)^{-1} = E^{-1}D^{-1}$, where D and E are M×M, iff det D ≠ 0 and det E ≠ 0 (i.e., iff D and E are nonsingular matrices)
$(D^{-1})' = (D')^{-1}$
det D = det D'
trace(D ± E) = trace(D) ± trace(E)
trace(AF) = trace(FA), where A is M×N and F is N×M
trace(scalar) = scalar
E[trace(D)] = trace[E(D)]