Distribution for Sample Mean
Proposition
Let X1 , X2 , . . . , Xn be a random sample from a distribution with
mean value µ and standard deviation σ. Then
1. E(X̄) = µ_X̄ = µ
2. V(X̄) = σ²_X̄ = σ²/n and σ_X̄ = σ/√n
In words, the expected value of the sample mean equals the
population mean; this is the unbiasedness property. And the
variance of the sample mean equals 1/n times the population
variance.
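A short derivation sketch of these two facts: write X̄ = (X1 + X2 + · · · + Xn)/n and use the fact that a random sample consists of independent Xi's, each with mean µ and variance σ². Then
E(X̄) = (1/n)[E(X1) + · · · + E(Xn)] = (1/n)(nµ) = µ
V(X̄) = (1/n²)[V(X1) + · · · + V(Xn)] = (1/n²)(nσ²) = σ²/n,
and taking the square root of the variance gives σ_X̄ = σ/√n.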
Proposition
Let X1 , X2 , . . . , Xn be a random sample from a distribution with
mean value µ and standard deviation σ. Define
T0 = X1 + X2 + · · · + Xn. Then
E(T0) = nµ, V(T0) = nσ², and σ_T0 = √n σ
Proposition
Let X1 , X2 , . . . , Xn be a random sample from a normal distribution
with mean value µ and standard deviation σ. Then for any n, X̄ is
normally distributed (with mean value µ and standard deviation
σ/√n), as is T0 (with mean value nµ and standard deviation √n σ).
The Central Limit Theorem (CLT)
Let X1 , X2 , . . . , Xn be a random sample from a distribution with
mean value µ and standard deviation σ. Then if n is sufficiently
large, X̄ has approximately a normal distribution with mean value µ
and standard deviation σ/√n, and T0 also has approximately a
normal distribution with mean value nµ and standard deviation
√n σ. The larger the value of n, the better the approximation.
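As an illustration (with made-up numbers, not from any example in this deck): suppose a population has µ = 50 and σ = 10, and a sample of size n = 100 is taken. The CLT says X̄ is approximately normal with mean 50 and standard deviation σ/√n = 10/√100 = 1, so for instance P(X̄ ≤ 52) ≈ Φ((52 − 50)/1) = Φ(2) ≈ .9772.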
Distribution for Linear Combinations
Proposition
Let X1 , X2 , . . . , Xn have mean values µ1 , µ2 , . . . , µn , respectively,
and variances σ12 , σ22 , . . . , σn2 , respectively.
1. Whether or not the Xi's are independent,
E(a1X1 + a2X2 + · · · + anXn) = a1E(X1) + a2E(X2) + · · · + anE(Xn)
= a1µ1 + a2µ2 + · · · + anµn
2. If X1, X2, . . . , Xn are independent,
V(a1X1 + a2X2 + · · · + anXn) = a1²V(X1) + a2²V(X2) + · · · + an²V(Xn)
= a1²σ1² + a2²σ2² + · · · + an²σn²
Proposition (Continued)
Let X1 , X2 , . . . , Xn have mean values µ1 , µ2 , . . . , µn , respectively,
and variances σ12 , σ22 , . . . , σn2 , respectively.
3. More generally, for any X1, X2, . . . , Xn,
V(a1X1 + a2X2 + · · · + anXn) = Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj Cov(Xi, Xj)
We call a1 X1 + a2 X2 + · · · + an Xn a linear combination of the
Xi ’s.
Example (Problem 64)
Suppose your waiting time for a bus in the morning is uniformly
distributed on [0,8], whereas waiting time in the evening is
uniformly distributed on [0,10] independent of morning waiting
time.
a. If you take the bus each morning and evening for a week,
what is your total expected waiting time?
b. What is the variance of your total waiting time?
c. What are the expected value and variance of the difference
between morning and evening waiting times on a given day?
d. What are the expected value and variance of the difference
between total morning waiting time and total evening waiting
time in a particular week?
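A sketch of the computation (assuming a 5-day week of commutes, a common textbook convention; adjust the counts if 7 days is intended). For a uniform [0, 8] waiting time the mean is 4 and the variance is 8²/12 = 16/3; for uniform [0, 10] the mean is 5 and the variance is 100/12 = 25/3.
a. E(total) = 5(4) + 5(5) = 45 min.
b. By independence, V(total) = 5(16/3) + 5(25/3) = 205/3 ≈ 68.33.
c. For a single day, E(X − Y) = 4 − 5 = −1 and V(X − Y) = 16/3 + 25/3 = 41/3 ≈ 13.67.
d. For the week, E = 5(4) − 5(5) = −5 and V = 5(16/3) + 5(25/3) = 205/3 ≈ 68.33.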
Corollary
E (X1 − X2 ) = E (X1 ) − E (X2 ) and, if X1 and X2 are independent,
V (X1 − X2 ) = V (X1 ) + V (X2 ).
Proposition
If X1 , X2 , . . . , Xn are independent, normally distributed rv’s (with
possibly different means and/or variances), then any linear
combination of the Xi's also has a normal distribution. In
particular, the difference X1 − X2 between two independent,
normally distributed variables is itself normally distributed.
Example (Problem 62)
Manufacture of a certain component requires three different
machining operations. Machining time for each operation has a
normal distribution, and the three times are independent of one
another. The mean values are 15, 30, and 20 min, respectively, and
the standard deviations are 1, 2, and 1.5 min, respectively.
What is the probability that it takes at most 1 hour of machining
time to produce a randomly selected component?
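A sketch of the solution: the total machining time T = X1 + X2 + X3 is normal with mean 15 + 30 + 20 = 65 min and variance 1² + 2² + 1.5² = 7.25, so σ_T = √7.25 ≈ 2.69 min. Hence
P(T ≤ 60) = P(Z ≤ (60 − 65)/2.69) ≈ Φ(−1.86) ≈ .0314.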
Point Estimation
Example (a variant of Problem 62, Ch5)
Manufacture of a certain component requires three different
machining operations. The total time for manufacturing one such
component is known to have a normal distribution. However, the
mean µ and variance σ² of that normal distribution are unknown.
Suppose we did an experiment in which we manufactured 10 components
and recorded the operation times, with the following sample times:
component   1     2     3     4     5     6     7     8     9     10
time        63.8  60.5  65.3  65.7  61.9  68.2  68.1  64.8  65.8  65.4
What can we say about the population mean µ and population
variance σ²?
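One natural single-number answer, worked out from the sample above:
x̄ = (63.8 + 60.5 + · · · + 65.4)/10 = 649.5/10 = 64.95
s² = Σ(xi − x̄)²/(10 − 1) ≈ 52.55/9 ≈ 5.84.
The notion of a point estimate, defined below, makes this idea precise.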
Example (a variant of Problem 64, Ch5)
Suppose the waiting time for a certain bus in the morning is
uniformly distributed on [0, θ], where θ is unknown. We record 10
waiting times as follows:
observation   1    2    3    4    5    6    7    8    9    10
time          7.6  1.8  4.8  3.9  7.1  6.1  3.6  0.1  6.5  3.5
What can we say about the parameter θ?
Definition
A point estimate of a parameter θ is a single number that can be
regarded as a sensible value for θ. A point estimate is obtained by
selecting a suitable statistic and computing its value from the
given sample data. The selected statistic is called the point
estimator of θ.
e.g. X̄ = Σ_{i=1}^{10} Xi/10 is a point estimator of µ for the
normal distribution example.
The largest sample observation X(10) (the maximum of the 10
observations) is a point estimator of θ for the uniform
distribution example.
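For the recorded waiting times this gives the point estimate θ̂ = x(10) = 7.6. Other sensible estimators exist as well; for instance, since the mean of a uniform [0, θ] distribution is θ/2, the statistic 2X̄ also estimates θ, and here 2x̄ = 2(4.5) = 9.0. This is exactly the situation addressed next.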
Problem: when there is more than one point estimator of a
parameter θ, which one of them should we use?
There are a few criteria for selecting the best point estimator:
unbiasedness,
minimum variance,
and mean square error.
Definition
A point estimator θ̂ is said to be an unbiased estimator of θ if
E (θ̂) = θ for every possible value of θ. If θ̂ is not unbiased, the
difference E (θ̂) − θ is called the bias of θ̂.
Principle of Unbiased Estimation
When choosing among several different estimators of θ, select one
that is unbiased.
Proposition
Let X1 , X2 , . . . , Xn be a random sample from a distribution with
mean µ and variance σ 2 . Then the estimators
µ̂ = X̄ = Σ_{i=1}^{n} Xi/n and σ̂² = S² = Σ_{i=1}^{n} (Xi − X̄)²/(n − 1)
are unbiased estimators of µ and σ², respectively.
If in addition the distribution is continuous and symmetric, then
the sample median X̃ and any trimmed mean are also unbiased
estimators of µ.
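A sketch of why the divisor n − 1 makes S² unbiased: writing Σ(Xi − X̄)² = ΣXi² − nX̄² and using E(Xi²) = σ² + µ² and E(X̄²) = σ²/n + µ²,
E[Σ(Xi − X̄)²] = n(σ² + µ²) − n(σ²/n + µ²) = (n − 1)σ²,
so dividing by n − 1 gives E(S²) = σ², whereas dividing by n would give the biased expected value (n − 1)σ²/n.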
Principle of Minimum Variance Unbiased Estimation
Among all estimators of θ that are unbiased, choose the one that
has minimum variance. The resulting θ̂ is called the minimum
variance unbiased estimator (MVUE) of θ.
Theorem
Let X1 , X2 , . . . , Xn be a random sample from a normal distribution
with mean µ and variance σ 2 . Then the estimator µ̂ = X is the
MVUE for µ.
Definition
Let θ̂ be a point estimator of parameter θ. Then the quantity
E[(θ̂ − θ)²] is called the mean square error (MSE) of θ̂.
Proposition
MSE = E[(θ̂ − θ)²] = V(θ̂) + [E(θ̂) − θ]²
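This decomposition follows by adding and subtracting E(θ̂) inside the square:
E[(θ̂ − θ)²] = E[(θ̂ − E(θ̂) + E(θ̂) − θ)²] = E[(θ̂ − E(θ̂))²] + [E(θ̂) − θ]² = V(θ̂) + (bias)²,
since the cross term 2[E(θ̂) − θ]E[θ̂ − E(θ̂)] equals zero. In particular, for an unbiased estimator the MSE is just V(θ̂).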
Definition
The standard error of an estimator θ̂ is its standard deviation
σ_θ̂ = √V(θ̂). If the standard error itself involves unknown
parameters whose values can be estimated, substitution of these
estimates into σ_θ̂ yields the estimated standard error (estimated
standard deviation) of the estimator. The estimated standard error
can be denoted either by σ̂_θ̂ or by s_θ̂.
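For example, for the sample mean X̄ the standard error is σ_X̄ = σ/√n; since σ is usually unknown, substituting the sample standard deviation s gives the estimated standard error s_X̄ = s/√n.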
Methods of Point Estimation
Definition
Let X1 , X2 , . . . , Xn be a random sample from a distribution with
pmf or pdf f(x). For k = 1, 2, 3, . . . , the kth population moment,
or kth moment of the distribution f(x), is E(X^k).
The kth sample moment is (1/n) Σ_{i=1}^{n} Xi^k.
Definition
Let X1 , X2 , . . . , Xn be a random sample from a distribution with
pmf or pdf f (x; θ1 , . . . , θm ), where θ1 , . . . , θm are parameters
whose values are unknown. Then the moment estimators
θ̂1 , . . . , θ̂m are obtained by equating the first m sample moments to
the corresponding first m population moments and solving for
θ1 , . . . , θm .
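For instance, tying back to the uniform waiting-time example: if X1, . . . , Xn is a random sample from a uniform distribution on [0, θ], the first population moment is E(X) = θ/2. Equating it to the first sample moment X̄ gives θ/2 = X̄, so the moment estimator is θ̂ = 2X̄; for the 10 recorded waiting times this yields θ̂ = 2(4.5) = 9.0.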
Suppose that a coin is biased, and it is known that the average
proportion of heads is one of the three values p = .2, .3, or .8. An
experiment consists of tossing the coin twice and observing the
number of heads. This could be modeled as a random sample
X1 , X2 of size n = 2 from a Bernoulli distribution, Xi ∼ BER(p),
where the parameter is one of .2, .3, .8.
Consider the joint pmf of the random sample,
f(x1, x2; p) = p^(x1+x2) (1 − p)^(2−x1−x2)   for xi = 0 or 1.
The values of f(x1, x2; p) are provided as follows:
p     (0,0)   (0,1)   (1,0)   (1,1)
.2    .64     .16     .16     .04
.3    .49     .21     .21     .09
.8    .04     .16     .16     .64
The estimate that maximizes the “likelihood” for an observed pair
(x1, x2) is
p̂ = .2 if (x1, x2) = (0, 0),
p̂ = .3 if (x1, x2) = (0, 1) or (1, 0),
p̂ = .8 if (x1, x2) = (1, 1).
Definition
Let X1 , X2 , . . . , Xn have joint pmf or pdf f (x1 , x2 , . . . , xn ; θ) where
the parameter θ is unknown. When x1 , . . . , xn are the observed
sample values and the above function f is regarded as a function
of θ, it is called the likelihood function and often is denoted by
L(θ). The maximum likelihood estimate (mle) θ̂ is the value of θ
that maximizes the likelihood function, so that
f(x1, x2, . . . , xn; θ̂) ≥ f(x1, x2, . . . , xn; θ)   for all θ.
When the Xi's are substituted in place of the xi's, the maximum
likelihood estimator results.
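As an example (again for the uniform waiting-time model), let X1, . . . , Xn be a random sample from a uniform distribution on [0, θ]. The likelihood is
L(θ) = 1/θ^n if θ ≥ max(x1, . . . , xn), and 0 otherwise.
Since 1/θ^n decreases as θ grows, L(θ) is maximized by taking θ as small as the data allow, so the mle is θ̂ = max(X1, . . . , Xn); for the recorded waiting times, θ̂ = 7.6. Note this differs from the moment estimator 2X̄ = 9.0, illustrating that different methods can produce different estimators.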
The Invariance Principle
Let θ̂ be the mle of the parameter θ. Then the mle of any function
h(θ) of this parameter is the function h(θ̂).
Proposition
Under very general conditions on the joint distribution of the
sample, when the sample size n is large, the maximum likelihood
estimator of any parameter θ is approximately unbiased [E (θ̂) ≈ θ]
and has variance that is nearly as small as can be achieved by any
estimator. Stated another way, the mle θ̂ is approximately the
MVUE of θ.