Applied Statistics I
Liang Zhang
Department of Mathematics, University of Utah
July 7, 2008
Covariance
Definition
The covariance between two rv's X and Y is
\[
\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] =
\begin{cases}
\sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, p(x, y) & X, Y \text{ discrete} \\
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f(x, y)\, dx\, dy & X, Y \text{ continuous}
\end{cases}
\]
Remark: The covariance depends on both the set of possible pairs and the
probabilities.
Proposition
\[ \mathrm{Cov}(X, Y) = E(XY) - \mu_X \cdot \mu_Y \]
Example (Problem 75 revisit)
A restaurant serves three fixed-price dinners costing $12, $15, and $20.
For a randomly selected couple dining at this restaurant, let X = the
cost of the man's dinner and Y = the cost of the woman's
dinner. Suppose the joint pmf of X and Y is

p(x, y)    y = 12   y = 15   y = 20
x = 12      .05      .05      .10
x = 15      .05      .10      .35
x = 20      0        .20      .10

Then
\[ \mathrm{Cov}(X, Y) = E(XY) - \mu_X \cdot \mu_Y = 276.7 - 15.9 \cdot 17.45 = -0.755 \]
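The arithmetic in this example can be checked numerically. The following is a minimal sketch, not part of the original slides; the helper names `expectation` and `covariance` are illustrative choices. It applies the shortcut formula Cov(X, Y) = E(XY) − µX · µY to the joint pmf of the dinner example.

```python
# Joint pmf of (X, Y) from the dinner example: keys are (x, y), values are p(x, y).
joint_pmf = {
    (12, 12): 0.05, (12, 15): 0.05, (12, 20): 0.10,
    (15, 12): 0.05, (15, 15): 0.10, (15, 20): 0.35,
    (20, 12): 0.00, (20, 15): 0.20, (20, 20): 0.10,
}


def expectation(pmf, g):
    """Return E[g(X, Y)] for a discrete joint pmf given as {(x, y): p}."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


def covariance(pmf):
    """Return Cov(X, Y) via the shortcut E(XY) - mu_X * mu_Y."""
    mu_x = expectation(pmf, lambda x, y: x)
    mu_y = expectation(pmf, lambda x, y: y)
    e_xy = expectation(pmf, lambda x, y: x * y)
    return e_xy - mu_x * mu_y


mu_x = expectation(joint_pmf, lambda x, y: x)      # 15.9
mu_y = expectation(joint_pmf, lambda x, y: y)      # 17.45
e_xy = expectation(joint_pmf, lambda x, y: x * y)  # 276.7
cov_xy = covariance(joint_pmf)
print(round(cov_xy, 3))  # -0.755
```

Storing the pmf as a dictionary keyed by (x, y) keeps the double sum in the definition a single pass over the table.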
If we change the unit for the previous example from dollars to cents, then
the joint pmf becomes

p(x, y)     y = 1200   y = 1500   y = 2000
x = 1200      .05        .05        .10
x = 1500      .05        .10        .35
x = 2000      0          .20        .10

And correspondingly,
\[ \mathrm{Cov}(X, Y) = E(XY) - \mu_X \cdot \mu_Y = -7550 \]
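The effect of a change of units follows from the definition of covariance. As a short worked step (the constants a and b are introduced here for illustration only), for any constants a and b,
\[
\mathrm{Cov}(aX, bY) = E[(aX - a\mu_X)(bY - b\mu_Y)] = ab\, E[(X - \mu_X)(Y - \mu_Y)] = ab\, \mathrm{Cov}(X, Y).
\]
Converting dollars to cents corresponds to a = b = 100, so the covariance of the dinner costs in cents is
\[
100 \cdot 100 \cdot (-0.755) = -7550.
\]
The sign is preserved and only the magnitude changes, which is why covariance alone is hard to interpret as a measure of the strength of a relationship.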
Definition
The correlation coefficient of X and Y, denoted by Corr(X, Y), \(\rho_{X,Y}\), or
just \(\rho\), is defined by
\[ \rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y} \]
e.g. for the previous example, the correlation coefficient of X and Y is
\[ \rho = \frac{-0.755}{2.91 \cdot 2.94} = -0.09 \]
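The value of ρ for the dinner example can be reproduced from the joint pmf alone. This is a minimal sketch under the same table of probabilities; the function name `expectation` is an illustrative choice, not something from the slides.

```python
import math

# Joint pmf of (X, Y) from the dinner example (costs in dollars).
joint_pmf = {
    (12, 12): 0.05, (12, 15): 0.05, (12, 20): 0.10,
    (15, 12): 0.05, (15, 15): 0.10, (15, 20): 0.35,
    (20, 12): 0.00, (20, 15): 0.20, (20, 20): 0.10,
}


def expectation(pmf, g):
    """Return E[g(X, Y)] for a discrete joint pmf given as {(x, y): p}."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


mu_x = expectation(joint_pmf, lambda x, y: x)
mu_y = expectation(joint_pmf, lambda x, y: y)

# Variances via V(X) = E(X^2) - mu_X^2, then standard deviations.
sd_x = math.sqrt(expectation(joint_pmf, lambda x, y: x * x) - mu_x ** 2)
sd_y = math.sqrt(expectation(joint_pmf, lambda x, y: y * y) - mu_y ** 2)

cov_xy = expectation(joint_pmf, lambda x, y: x * y) - mu_x * mu_y
rho = cov_xy / (sd_x * sd_y)

print(round(sd_x, 2), round(sd_y, 2), round(rho, 2))  # 2.91 2.94 -0.09
```

Rescaling both costs to cents multiplies the covariance by 10000 and each standard deviation by 100, so ρ is unchanged.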
Proposition
1. Corr(aX + b, cY + d) = Corr(X, Y) if a · c > 0.
2. −1 ≤ Corr(X, Y) ≤ 1.
3. ρ = 1 or −1 iff Y = aX + b for some a and b with a ≠ 0.
4. If X and Y are independent, then ρ = 0. However, ρ = 0 does not
imply that X and Y are independent.
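A standard counterexample for property 4, not taken from the slides, is X uniform on {−1, 0, 1} with Y = X². The sketch below uses exact fractions to show that the covariance is zero even though Y is a function of X.

```python
from fractions import Fraction

# Hypothetical illustration: X is uniform on {-1, 0, 1} and Y = X^2.
third = Fraction(1, 3)
joint_pmf = {(-1, 1): third, (0, 0): third, (1, 1): third}


def expectation(pmf, g):
    """Return E[g(X, Y)] for a discrete joint pmf given as {(x, y): p}."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


cov_xy = (expectation(joint_pmf, lambda x, y: x * y)
          - expectation(joint_pmf, lambda x, y: x)
          * expectation(joint_pmf, lambda x, y: y))

# Independence would require p(0, 0) = P(X = 0) * P(Y = 0).
p_x0 = sum(p for (x, y), p in joint_pmf.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint_pmf.items() if y == 0)
p_00 = joint_pmf[(0, 0)]

print(cov_xy)             # 0, so rho = 0
print(p_00, p_x0 * p_y0)  # 1/3 versus 1/9, so X and Y are dependent
```

Correlation only measures linear association, so a purely quadratic relationship like this one is invisible to it.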
Statistics and Their Distributions
Assume we are running an online retail store. Several quantities may be of
great interest to us, such as the nationwide distribution of buyers, the time
a customer spends on the website per visit, and customers' satisfaction.
For some quantities, such as the time spent per visit, we may get data for
the "whole population". For others, such as customers' satisfaction, we can
only obtain information from a sample.
If we also take future customers into account, then even in theory we cannot
obtain information about the whole population. All the information we are
dealing with now comes from a sample.
For example, we have the following data about the time spent per
visit (in min.) from a sample of size 10:

i        1    2    3    4    5    6    7    8    9   10
time    24   51   12   95   26    5   33   62   31   27

Each observation is "random", i.e. we cannot predict the exact value
before we obtain the observation.
Therefore we can associate a random variable \(X_i\) with the ith observation.
Often, we are interested in some overall properties of the sample.
For example, the maximum of the previous sample is 95;
the minimum is 5;
the mean is 36.6;
the median is 29;
and the standard deviation is 26.4.
Sometimes, these characteristics are more interesting to us than the
sample data themselves.
We call these characteristics statistics.
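The summary values quoted for the size-10 sample can be recomputed with Python's standard library. This is a small sketch; it assumes the quoted standard deviation is the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes.

```python
import statistics

# Time spent per visit (in minutes) for the size-10 sample.
times = [24, 51, 12, 95, 26, 5, 33, 62, 31, 27]

summary = {
    "max": max(times),
    "min": min(times),
    "mean": statistics.mean(times),
    "median": statistics.median(times),
    "stdev": statistics.stdev(times),  # sample standard deviation, divisor n - 1
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Each entry of `summary` is the observed value of a statistic; a different sample of 10 visits would give different values.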
Definition
A statistic is any quantity whose value can be calculated from sample
data.
Remark:
1. A statistic is a random variable. The reason is that, prior to obtaining data,
we are not sure what value of any particular statistic will result.
We use uppercase letters to denote statistics and lowercase letters to
denote the calculated or observed values of statistics.
Remark (continued):
2. A statistic must be calculated from sample data.
For example, if in addition to the size-10 sample from the previous
example, we also know that the time spent per visit is normally
distributed with mean \(\mu\) and variance \(\sigma^2\), then neither the population
mean \(\mu\) nor the population variance \(\sigma^2\) is a statistic.
The sample mean and the sample variance, by contrast, are valid
statistics; they are denoted by \(\bar{X}\) and \(S^2\), respectively.
3. Any statistic, being a random variable, has a probability distribution.
The probability distribution of a statistic is referred to as its sampling
distribution.
The sampling distribution of a statistic DEPENDS on the sample size n.
Definition
The random variables \(X_1, X_2, \ldots, X_n\) are said to form a (simple) random
sample of size n if
1. The \(X_i\)'s are independent random variables.
2. Every \(X_i\) has the same probability distribution.
In words, \(X_1, X_2, \ldots, X_n\) form a random sample if the \(X_i\)'s are
independent and identically distributed (iid).
Remark:
When sampling with replacement or from an infinite (conceptual)
population, the two conditions are satisfied and the result can be regarded
as a random sample.
For sampling WITHOUT replacement from a finite population, although
consecutive observations are not independent and identically distributed,
we can still regard the result as approximately a random sample if the sample
size n is much smaller than the population size N.
In practice, if n/N ≤ .05 (at most 5% of the population is sampled), we
can regard the sample as a random sample.
Deriving Sampling Distributions
Example (Problem 38)
There are two traffic lights on my way to work. Let \(X_1\) be the number of
lights at which I must stop, and suppose that the distribution of \(X_1\) is as
follows:

x1        0    1    2
p(x1)    .2   .5   .3

with \(\mu = 1.1\) and \(\sigma^2 = .49\).
Let \(X_2\) be the number of lights at which I must stop on the way home; \(X_2\)
is independent of \(X_1\). Assume that \(X_2\) has the same distribution as \(X_1\), so
that \(X_1, X_2\) is a random sample of size n = 2.
a. Let \(\bar{X} = (X_1 + X_2)/2\). Find the probability distribution of \(\bar{X}\).
b. Calculate \(P(\bar{X} \le 1)\).
c. Calculate \(\mu_{\bar{X}}\). How does it relate to \(\mu\), the population mean?
d. Calculate \(\sigma^2_{\bar{X}}\). How does it relate to \(\sigma^2\), the population variance?
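For a small discrete population like this one, the sampling distribution of the sample mean can be derived by enumerating every possible sample. The sketch below is one way to do that; the function name `sampling_dist_of_mean` is illustrative, and exact fractions are used to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population pmf of the number of stops at the traffic lights.
pmf = {0: Fraction(2, 10), 1: Fraction(5, 10), 2: Fraction(3, 10)}


def sampling_dist_of_mean(pmf, n):
    """Enumerate all samples of size n (iid draws) and return the pmf of the sample mean."""
    dist = defaultdict(Fraction)
    for outcome in product(pmf, repeat=n):
        prob = Fraction(1)
        for x in outcome:
            prob *= pmf[x]  # independence: joint probability is the product
        dist[Fraction(sum(outcome), n)] += prob
    return dict(dist)


xbar_dist = sampling_dist_of_mean(pmf, 2)

p_at_most_1 = sum(p for v, p in xbar_dist.items() if v <= 1)
mean_xbar = sum(v * p for v, p in xbar_dist.items())
var_xbar = sum(v * v * p for v, p in xbar_dist.items()) - mean_xbar ** 2

for value in sorted(xbar_dist):
    print(float(value), float(xbar_dist[value]))
print(float(p_at_most_1), float(mean_xbar), float(var_xbar))
```

The enumeration gives the pmf .04, .20, .37, .30, .09 on the values 0, .5, 1, 1.5, 2, so that P(X̄ ≤ 1) = .61, the mean of X̄ is 1.1 = µ, and the variance of X̄ is .245 = σ²/2.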
Example
A certain system consists of two identical components. The lifetime of
each component is assumed to have an exponential distribution with
parameter λ. The system will work if at least one component works
properly and the two components are assumed to work independently. Let
\(X_1\) and \(X_2\) be the lifetimes of the two components, respectively. What can
we say about the lifetime of the system \(T_0 = X_1 + X_2\)?
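Taking \(T_0 = X_1 + X_2\) as stated, and assuming λ is the rate parameter (so each component lifetime has pdf \(\lambda e^{-\lambda x}\) for \(x \ge 0\)), the density of \(T_0\) can be sketched by convolving the two exponential densities. For \(t \ge 0\),
\[
f_{T_0}(t) = \int_0^t \lambda e^{-\lambda x} \cdot \lambda e^{-\lambda (t - x)}\, dx
= \lambda^2 e^{-\lambda t} \int_0^t dx
= \lambda^2 t\, e^{-\lambda t}.
\]
This is a gamma density with shape parameter 2 and scale parameter \(1/\lambda\), so \(E(T_0) = 2/\lambda\) and \(V(T_0) = 2/\lambda^2\). In particular, \(T_0\) is no longer exponentially distributed.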
Example
A certain system consists of two identical components. The lifetime of
each component is assumed to have an exponential distribution with
parameter λ = 3. The system will work only if both components work properly,
and the two components are assumed to work independently. Let \(X_1\) and
\(X_2\) be the lifetimes of the two components, respectively. Then the lifetime
of the system is \(T_1 = \min(X_1, X_2)\). What is the distribution of the
average lifetime of 5 such systems?
This time, direct derivation of the sampling distribution is complicated.
Instead, we use the method of simulation.
Simulation Experiments
1. Use some software to generate a size-5 random sample \(X_1, \ldots, X_5\) whose
distribution is EXP(3);
2. Generate another size-5 random sample \(Y_1, \ldots, Y_5\) whose distribution is EXP(3);
3. Construct the data set \(\min(X_i, Y_i)\) for i = 1, ..., 5 from these two
random samples;
4. Calculate the mean of the data set. This is one simulation;
5. Simulate another 499 times;
6. Construct the histogram of the 500 simulation results.
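The six steps above can be sketched with Python's standard library. This is an illustration under stated assumptions rather than the software used for the slides: EXP(3) is read as an exponential distribution with rate λ = 3, the seed 2008 is arbitrary, and the helper names and the text histogram are illustrative choices.

```python
import random
import statistics


def simulate_mean_system_lifetime(rng, rate=3.0, n_systems=5):
    """One simulation (steps 1-4): mean of min(X_i, Y_i) over n_systems systems."""
    xs = [rng.expovariate(rate) for _ in range(n_systems)]  # step 1
    ys = [rng.expovariate(rate) for _ in range(n_systems)]  # step 2
    lifetimes = [min(x, y) for x, y in zip(xs, ys)]         # step 3
    return statistics.mean(lifetimes)                       # step 4


def histogram_counts(values, n_bins=10):
    """Return (bin_edges, counts) for equal-width bins spanning the data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


rng = random.Random(2008)  # arbitrary seed, fixed only for reproducibility
results = [simulate_mean_system_lifetime(rng) for _ in range(500)]  # step 5

edges, counts = histogram_counts(results)  # step 6
for left, count in zip(edges, counts):
    print(f"{left:6.3f} | {'*' * (count // 2)}")
```

Under the rate assumption, min(X, Y) for two independent EXP(3) lifetimes is exponential with rate 6, so the simulated averages should be centered near 1/6.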
The larger the sample size, the smaller the spread of the sampling
distribution of the sample mean.
Example (Problem 45)
Carry out a simulation experiment using a statistical computer package or
other software to study the sampling distribution of \(\bar{X}\) when the
population distribution is lognormal with E(ln(X)) = 3 and V(ln(X)) = 1.
Consider the four sample sizes n = 10, 20, 30, and 50, and in each case use
500 replications.
1. Use some software to generate a size-10 random sample whose
distribution is LOGN(3,1);
2. Calculate the mean of the random sample. This is one simulation;
3. Simulate another 499 times;
4. Construct the histogram of the 500 simulation results;
5. Repeat the simulation for n = 20, 30 and 50.
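One possible way to run this experiment with Python's standard library is sketched below. It reads LOGN(3,1) as a lognormal distribution whose logarithm has mean 3 and standard deviation 1, which matches the arguments of `random.lognormvariate`; the seed and the function name `simulate_sample_means` are arbitrary choices for illustration.

```python
import math
import random
import statistics


def simulate_sample_means(rng, n, replications=500, mu=3.0, sigma=1.0):
    """Return `replications` sample means, each from a size-n lognormal sample."""
    return [
        statistics.mean([rng.lognormvariate(mu, sigma) for _ in range(n)])
        for _ in range(replications)
    ]


rng = random.Random(45)  # arbitrary seed, fixed only for reproducibility
population_mean = math.exp(3.0 + 0.5 * 1.0 ** 2)  # lognormal mean exp(mu + sigma^2 / 2)

all_means = {}
summary = {}
for n in (10, 20, 30, 50):
    means = simulate_sample_means(rng, n)
    all_means[n] = means
    summary[n] = (statistics.mean(means), statistics.stdev(means))
    print(f"n = {n:2d}: center = {summary[n][0]:6.2f}, spread = {summary[n][1]:5.2f}")
```

A histogram of each list in `all_means` would show the shape of the sampling distribution; the printed spread should shrink as n grows, while the center stays near the population mean of about 33.1.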
As the sample size becomes larger, the sampling distribution looks more
like the normal distribution.
Distribution for Sample Mean
Proposition
Let \(X_1, X_2, \ldots, X_n\) be a random sample from a distribution with mean
value \(\mu\) and standard deviation \(\sigma\). Then
1. \(E(\bar{X}) = \mu_{\bar{X}} = \mu\)
2. \(V(\bar{X}) = \sigma^2_{\bar{X}} = \sigma^2/n\) and \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\)
In words, the expected value of the sample mean equals the population
mean, which is called the unbiasedness property,
and the variance of the sample mean equals 1/n of the population variance.
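Both parts of the proposition follow in a few lines from the iid assumption. As a sketch of the reasoning, linearity of expectation gives
\[
E(\bar{X}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu,
\]
and, because the \(X_i\) are independent, the variance of the sum is the sum of the variances:
\[
V(\bar{X}) = \frac{1}{n^2}\, V\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}.
\]
Independence is needed only for the variance; the result for the mean holds for any identically distributed \(X_i\).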
Example (Problem 38 revisit)
There are two traffic lights on my way to work. Let \(X_1\) be the number of
lights at which I must stop, and suppose that the distribution of \(X_1\) is as
follows:

x1        0    1    2
p(x1)    .2   .5   .3

with \(\mu = 1.1\) and \(\sigma^2 = .49\).
Let \(X_2\) be the number of lights at which I must stop on the way home; \(X_2\)
is independent of \(X_1\). Assume that \(X_2\) has the same distribution as \(X_1\), so
that \(X_1, X_2\) is a random sample of size n = 2.
Let \(\bar{X} = (X_1 + X_2)/2\) denote the average number of stops.
a. Calculate \(\mu_{\bar{X}}\).
b. Calculate \(\sigma^2_{\bar{X}}\).
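With the rules for the mean and variance of a sample mean, both parts reduce to one line of arithmetic. Using \(\mu = 1.1\), \(\sigma^2 = .49\) and n = 2,
\[
\mu_{\bar{X}} = \mu = 1.1, \qquad
\sigma^2_{\bar{X}} = \frac{\sigma^2}{n} = \frac{.49}{2} = .245, \qquad
\sigma_{\bar{X}} = \sqrt{.245} \approx .495.
\]
No enumeration of the sampling distribution is needed for these two quantities.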
Proposition
Let \(X_1, X_2, \ldots, X_n\) be a random sample from a distribution with mean
value \(\mu\) and standard deviation \(\sigma\). Define \(T_0 = X_1 + X_2 + \cdots + X_n\). Then
\[ E(T_0) = n\mu, \quad V(T_0) = n\sigma^2, \quad \sigma_{T_0} = \sqrt{n}\,\sigma \]
Proposition
Let \(X_1, X_2, \ldots, X_n\) be a random sample from a normal distribution with
mean value \(\mu\) and standard deviation \(\sigma\). Then for any n, \(\bar{X}\) is normally
distributed (with mean value \(\mu\) and standard deviation \(\sigma/\sqrt{n}\)), as is \(T_0\)
(with mean value \(n\mu\) and standard deviation \(\sqrt{n}\,\sigma\)).
Example (Problem 54)
Suppose the sediment density (g/cm) of a randomly selected specimen
from a certain region is normally distributed with mean 2.65 and standard
deviation .85 (suggested in “Modeling Sediment and Water Column
Interactions for Hydrophobic Pollutants”, Water Research, 1984:
1169-1174).
a. If a random sample of 25 specimens is selected, what is the
probability that the sample average sediment density is at most 3.00?
b. How large a sample size would be required to ensure that the above
probability is at least .99?
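Both parts can be worked out numerically from the fact that the sample mean of a normal sample is normal with mean µ and standard deviation σ/√n. The sketch below uses `statistics.NormalDist` from the Python standard library; the constant and function names are illustrative.

```python
import math
from statistics import NormalDist

MU, SIGMA, THRESHOLD = 2.65, 0.85, 3.00


def prob_mean_at_most(threshold, n):
    """P(sample mean <= threshold) for a size-n sample from N(MU, SIGMA^2)."""
    return NormalDist(MU, SIGMA / math.sqrt(n)).cdf(threshold)


# Part a: n = 25, so the standard deviation of the sample mean is .85 / 5 = .17.
p_a = prob_mean_at_most(THRESHOLD, 25)

# Part b: smallest n for which the probability reaches .99.
# The probability increases with n because THRESHOLD is above MU.
n_required = 1
while prob_mean_at_most(THRESHOLD, n_required) < 0.99:
    n_required += 1

print(round(p_a, 4), n_required)
```

For part a this gives a probability of about .98. For part b the search returns n = 32 when the exact normal quantile is used; a hand calculation with the rounded table value z = 2.33 gives a bound just above 32 and hence n = 33, so the answer depends slightly on rounding.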