Statistics 512 Notes I
D. Small
Reading: Section 5.1
Basic idea of statistical inference: we draw a sample of data from a population, and then use statistical tools to make inferences about the population from the sample.
Statistical Experiment: Observe data X. The distribution of X is P.
$P(X \in E)$ = "Probability X is in E"
Model: Family of possible P's, $P \in \mathcal{P} = \{P_\theta, \theta \in \Theta\}$.
We call $\theta$ a parameter of the distribution.
Examples:
1. Binomial model. Toss a coin n independent times with P("Success") = p on each toss. X = # of successes.
$\theta = p$, $\Theta = [0,1]$
2. Normal location model. Observe $X = (X_1, \ldots, X_n)$, $X_i$ independent and identically distributed (iid) with a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$:
$$f(x; \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\}$$
$\Theta = (-\infty, \infty)$
3. Normal model with unknown mean and variance. Observe $X = (X_1, \ldots, X_n)$, $X_i$ iid with a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.
$\Theta = (-\infty, \infty) \times (0, \infty)$
4. Nonparametric model. Observe $X = (X_1, \ldots, X_n)$, $X_i$ iid real valued.
$\mathcal{P} = \{\text{all distributions on } \mathbb{R}\}$
$\theta = \text{cdf of the distribution of } X_i$
5. Survey sampling. There is a finite population of units $1, \ldots, N$ that have variables $Y_1, \ldots, Y_N$ associated with them. We observe Y for n of the units $u_1, \ldots, u_n$, i.e., we observe $X_1 = Y_{u_1}, \ldots, X_n = Y_{u_n}$.
$\theta = \{Y_1, \ldots, Y_N\}$
We are usually interested in a particular function of $\theta$ such as the population mean, $\frac{Y_1 + \cdots + Y_N}{N}$.
Two methods of choosing the units:
(A) Sampling with replacement: $u_1, \ldots, u_n$ are iid from the uniform distribution on $\{1, 2, \ldots, N\}$.
(B) Sampling without replacement (simple random sample): Each unit will appear in the sample at most once. Each of the $\binom{N}{n}$ possible samples has the same probability.
If N is much greater than n, the two sampling methods are practically the same.
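To illustrate this last point, here is a minimal simulation sketch (not from the notes; it assumes Python with numpy, and the population values and sizes are made up) comparing the sampling distribution of the sample mean under the two schemes when N is much greater than n:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical finite population: N units with values Y_1, ..., Y_N.
    N, n, reps = 10_000, 20, 50_000
    Y = rng.normal(loc=50, scale=10, size=N)

    def mean_of_sample(replace):
        # Draw n units (with or without replacement) and average their Y's.
        return Y[rng.choice(N, size=n, replace=replace)].mean()

    with_rep = np.array([mean_of_sample(True) for _ in range(reps)])
    without_rep = np.array([mean_of_sample(False) for _ in range(reps)])

    # With N much greater than n, the two sampling distributions
    # of the sample mean are nearly identical.
    print(with_rep.mean(), with_rep.std())
    print(without_rep.mean(), without_rep.std())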
Statistical Inference: Statement about some aspect of $\theta$ based on a statistical experiment.
Note: We might not be interested in the entire $\theta$ but only some function of it, e.g., in Examples 3 and 4, we might only be interested in the mean of the distribution.
Types of Inferences we will study:
1. Point estimation: Give the best estimate of the function of $\theta$ we are interested in.
2. Interval estimation (confidence intervals): Give an interval (set) in which the function of $\theta$ lies, along with a statement about how certain we are that the function of $\theta$ lies in the interval.
3. Hypothesis testing: Choose between two hypotheses about $\theta$.
Point Estimation
Goal of point estimation is to provide the single "best guess" of some quantity of interest $g(\theta)$.
$g(\theta)$ is a fixed unknown quantity.
A point estimator is any function of the data h(X). The
point estimator depends on the data so h(X) is a random
variable.
Examples of point estimators:
Binomial model: X ~ Binomial(n, p), n known.
Point estimator for p: $h(X) = X/n$.
Notation: We sometimes denote the point estimator for a parameter by putting a hat on it, i.e., $\hat{p} = X/n$. Also we sometimes add a subscript n to denote the sample size, $\hat{p}_n = X/n$.
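As a quick illustration (a sketch not in the notes; the true p and the sample size are made up, and numpy is assumed), simulating the binomial experiment and computing $\hat{p}_n$:

    import numpy as np

    rng = np.random.default_rng(1)

    p_true, n = 0.3, 100         # p_true is hypothetical; unknown in practice
    X = rng.binomial(n, p_true)  # X = number of successes in n tosses

    p_hat = X / n                # point estimate of p
    print(X, p_hat)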
Normal model with unknown mean $\mu$ and known or unknown variance $\sigma^2$:
Point estimator for $\mu$: $\hat{\mu}_n = \bar{X} = \frac{X_1 + \cdots + X_n}{n}$.
Sampling distribution: A point estimator h(X) is a function of the sample, so h(X) is a random variable. The distribution of a point estimator h(X) over repeated samples is called the sampling distribution of h(X).
Example: Normal location model. Observe $X = (X_1, \ldots, X_n)$, $X_i$ independent and identically distributed (iid) with a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$.
$$\hat{\mu}_n = \bar{X} = \frac{X_1 + \cdots + X_n}{n}$$
Sampling distribution: $\hat{\mu}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
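A short simulation sketch (illustrative only; the values of $\mu$, $\sigma$, and n are made up, and numpy is assumed) checking that the sample mean over repeated samples follows the $N(\mu, \sigma^2/n)$ sampling distribution:

    import numpy as np

    rng = np.random.default_rng(2)

    mu, sigma, n, reps = 5.0, 2.0, 25, 100_000  # hypothetical values

    # Draw `reps` independent samples of size n; record each sample mean.
    mu_hats = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

    # Should be close to mu = 5 and sigma/sqrt(n) = 0.4, respectively.
    print(mu_hats.mean(), mu_hats.std())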
Properties of a point estimator:
1. Bias. The bias of an estimator of $g(\theta)$ is defined by
$$\text{bias}_\theta[h(X_1, \ldots, X_n)] = E_\theta[h(X_1, \ldots, X_n)] - g(\theta)$$
We say that $h(X_1, \ldots, X_n)$ is unbiased if $\text{bias}_\theta[h(X_1, \ldots, X_n)] = 0$ for all $\theta \in \Theta$.
Here $E_\theta$ refers to the expectation with respect to the sampling distribution of the data $f(x_1, \ldots, x_n; \theta)$. It does not mean we are averaging over a distribution for $\theta$.
An unbiased estimator is suitably "centered."
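A Monte Carlo sketch of bias (illustrative only; the shrinkage estimator $(X+1)/(n+2)$ is not from the notes and is included purely to exhibit a nonzero bias, and the parameter values are made up):

    import numpy as np

    rng = np.random.default_rng(3)

    p_true, n, reps = 0.3, 20, 200_000  # hypothetical values
    X = rng.binomial(n, p_true, size=reps)

    # hat{p} = X/n is unbiased; the shrinkage estimator (X+1)/(n+2)
    # has bias (1 - 2p)/(n + 2), about 0.018 here.
    print(np.mean(X / n) - p_true)              # approximately 0
    print(np.mean((X + 1) / (n + 2)) - p_true)  # approximately 0.018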
2. Consistency: A reasonable requirement for an estimator
is that it should converge to the true parameter value as we
collect more and more information.
A point estimator $h(X_1, \ldots, X_n)$ of a parameter $g(\theta)$ is consistent if $h(X_1, \ldots, X_n) \xrightarrow{P} g(\theta)$ for all $\theta \in \Theta$.
Recall the definition of convergence in probability (Section 4.2): $h(X_1, \ldots, X_n) \xrightarrow{P} g(\theta)$ means that for all $\epsilon > 0$,
$$\lim_{n \to \infty} P[\,|h(X_1, \ldots, X_n) - g(\theta)| \geq \epsilon\,] = 0.$$
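A simulation sketch of consistency for $\bar{X}_n$ in the normal location model (illustrative only; all numerical values are made up, numpy assumed): the probability that $\bar{X}_n$ misses $\mu$ by more than $\epsilon$ shrinks as n grows.

    import numpy as np

    rng = np.random.default_rng(4)

    mu, sigma, eps, reps = 5.0, 2.0, 0.5, 10_000  # hypothetical values

    for n in [10, 100, 1000]:
        means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
        # Estimated P(|Xbar_n - mu| >= eps) shrinks toward 0 as n grows.
        print(n, np.mean(np.abs(means - mu) >= eps))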
3. Mean Square Error. A good estimator should on average be accurate. A measure of the accuracy of an estimator is the average squared error of the estimator:
$$\text{MSE}_\theta[h(X_1, \ldots, X_n)] = E_\theta[\{h(X_1, \ldots, X_n) - g(\theta)\}^2]$$
Example: Suppose that an iid sample $X_1, \ldots, X_n$ is drawn from the uniform distribution on $[0, \theta]$ where $\theta$ is an unknown parameter and the distribution of $X_i$ is
$$f_X(x; \theta) = \begin{cases} \frac{1}{\theta} & 0 < x < \theta \\ 0 & \text{elsewhere} \end{cases}$$
Consider the following estimator of $\theta$:
$$W = h(X_1, \ldots, X_n) = \max_i X_i$$
Sampling distribution of W:
If $w < 0$, $P(W \leq w) = 0$. If $0 \leq w \leq \theta$,
$$P(W \leq w) = P(X_1 \leq w, \ldots, X_n \leq w) = [P(X_1 \leq w)]^n = \left(\frac{w}{\theta}\right)^n$$
If $w > \theta$, $P(W \leq w) = 1$.
Thus,
$$F_W(w) = \begin{cases} 0 & \text{if } w < 0 \\ \left(\frac{w}{\theta}\right)^n & \text{if } 0 \leq w \leq \theta \\ 1 & \text{if } w > \theta \end{cases}$$
and
$$f_W(w) = \begin{cases} \frac{n w^{n-1}}{\theta^n} & 0 \leq w \leq \theta \\ 0 & \text{elsewhere} \end{cases}$$
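A quick simulation sketch (illustrative; $\theta$, n, and the test point w are made up, numpy assumed) checking the derived cdf of W against the empirical distribution of the sample maximum:

    import numpy as np

    rng = np.random.default_rng(6)

    theta, n, reps = 2.0, 10, 100_000  # hypothetical values
    W = rng.uniform(0, theta, size=(reps, n)).max(axis=1)

    w = 1.5
    print(np.mean(W <= w))   # empirical P(W <= w)
    print((w / theta) ** n)  # derived F_W(w) = (w/theta)^n, about 0.056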
Bias:
$$E_\theta[W] = \int_{-\infty}^{\infty} w f_W(w)\,dw = \int_0^\theta w \cdot \frac{n w^{n-1}}{\theta^n}\,dw = \frac{n w^{n+1}}{(n+1)\theta^n}\bigg|_0^\theta = \left(\frac{n}{n+1}\right)\theta$$
$$\text{Bias} = E_\theta[W] - \theta = \left(\frac{n}{n+1}\right)\theta - \theta = -\frac{1}{n+1}\theta$$
There is a bias in W but it might still be consistent.
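A Monte Carlo sketch (illustrative; made-up values, numpy assumed) checking $E_\theta[W] = \frac{n}{n+1}\theta$:

    import numpy as np

    rng = np.random.default_rng(7)

    theta, n, reps = 2.0, 10, 200_000  # hypothetical values
    W = rng.uniform(0, theta, size=(reps, n)).max(axis=1)

    print(W.mean())             # Monte Carlo estimate of E[W]
    print(n / (n + 1) * theta)  # exact: (n/(n+1)) * theta = 1.818...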
Consistency:
Let $W_n$ denote W for a sample of size n.
For any $\epsilon > 0$ (with $\epsilon < \theta$),
$$P(|W_n - \theta| \leq \epsilon) = P(\theta - \epsilon \leq W_n \leq \theta + \epsilon) = \int_{\theta - \epsilon}^{\theta} \frac{n w_n^{n-1}}{\theta^n}\,dw_n = 1 - \left(\frac{\theta - \epsilon}{\theta}\right)^n$$
Note that for any $\epsilon > 0$, it is possible to find an n making $[(\theta - \epsilon)/\theta]^n$ as small as desired. Thus, $\lim_{n \to \infty} P(|W_n - \theta| \leq \epsilon) = 1$ and $W_n$ is consistent.
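A numerical sketch (illustrative; $\theta$ and $\epsilon$ are made up) of $P(|W_n - \theta| \leq \epsilon) = 1 - [(\theta - \epsilon)/\theta]^n$ approaching 1 as n grows:

    theta, eps = 2.0, 0.1  # hypothetical values, with eps < theta

    for n in [10, 50, 100, 500]:
        # Exact probability from the derivation above; approaches 1.
        print(n, 1 - ((theta - eps) / theta) ** n)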