Uploaded by brucenyambe03

CONCEPT OF A STATISTIC

Any measurable characteristic of a population is called a population parameter.
(Greek letters are used for population parameters.)
A statistic is a numerical property of a sample.
Usually the population is too large to calculate these parameters. In order to estimate a
population parameter, we take a random sample from the population and use observations
from the items in it to estimate the required parameters.
EXAMPLE
A manufacturer makes three sizes of toaster. 40% of the toasters sell for K16, 50%
sell for K20 and 10% sell for K30.
a. Find the mean and variance of the value of the toasters.
A sample of 2 toasters is sent to a shop.
b. List all the possible prices of the samples that could be sent.
c. Find the sampling distribution for the mean price $\bar{X}$ of these samples.
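A sampling distribution like this one can be built by listing every ordered sample and adding up the probabilities that give each value of the statistic. The sketch below does this for the toaster prices. It assumes the population is large enough that the two toasters are drawn independently with the stated proportions; the function names are illustrative and not part of the notes.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population from the toaster example: price -> proportion of toasters.
population = {16: Fraction(2, 5), 20: Fraction(1, 2), 30: Fraction(1, 10)}

def mean_and_variance(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    mu = sum(v * p for v, p in dist.items())
    var = sum(v * v * p for v, p in dist.items()) - mu ** 2
    return mu, var

def sampling_distribution(dist, n, statistic):
    """Exact distribution of `statistic` over all ordered samples of size n.

    Assumes independent draws, i.e. a population large enough that the
    proportions do not change as items are removed.
    """
    result = defaultdict(Fraction)
    for sample in product(dist, repeat=n):
        prob = Fraction(1)
        for value in sample:
            prob *= dist[value]
        result[statistic(sample)] += prob
    return dict(result)

def sample_mean(sample):
    return Fraction(sum(sample), len(sample))

mu, var = mean_and_variance(population)
mean_dist = sampling_distribution(population, 2, sample_mean)
for value in sorted(mean_dist):
    print(float(value), mean_dist[value])
```

Passing a different statistic (for example a mode or median function) to `sampling_distribution` gives the corresponding sampling distribution in the same way.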
EXAMPLE
A supermarket sells a large number of 3-litre and 2-litre cartons of milk.
They are sold in the ratio 3:2.
a. Find the mean and variance of the milk content in this population of cartons.
A random sample of 3 cartons is taken from the shelves ($X_1$, $X_2$ and $X_3$).
b. List all the possible samples.
c. Find the sampling distribution of the mean $\bar{X}$.
d. Find the sampling distribution of the mode $M$.
e. Find the sampling distribution of the median $N$ of these samples.
EXAMPLE
A large bag contains pawns. Sixty per cent of the pawns have the number 0 on
them and forty per cent have the number 1.
a. Find the mean and variance for this population of pawns.
A simple random sample of size 3 is taken from this population.
b. List all possible samples.
c. Find the sampling distribution for the mean
$\bar{X} = \frac{X_1 + X_2 + X_3}{3}$
where $X_1$, $X_2$ and $X_3$ are the random variables representing the first, second and third pawns sampled.
d. Hence find E($\bar{X}$) and Var($\bar{X}$).
e. Find the sampling distribution for the mode M.
f. Hence find E(M) and Var(M).
CENTRAL LIMIT THEOREM
The Central Limit Theorem says that if $X_1, X_2, X_3, \ldots, X_n$ is a random sample of
size n from a population with mean $\mu$ and variance $\sigma^2$ then, for large n, $\bar{X}$ is
approximately normally distributed:
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
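The statement can be checked informally by simulation. The sketch below draws many samples of size 50 from a population in which 60% of items are 0 and 40% are 1 (an assumption chosen for illustration), and compares the mean and variance of the simulated sample means with $\mu$ and $\sigma^2/n$. The seed, sample size and number of repeats are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

def simulate_sample_means(draw, n, repeats, rng):
    """Return `repeats` sample means, each from n independent calls to draw(rng)."""
    return [fmean(draw(rng) for _ in range(n)) for _ in range(repeats)]

def draw_item(rng):
    # Assumed population: 0 with probability 0.6, 1 with probability 0.4.
    return 1 if rng.random() < 0.4 else 0

rng = random.Random(12345)   # fixed seed so the run is repeatable
n = 50
means = simulate_sample_means(draw_item, n, 10000, rng)

mu, sigma2 = 0.4, 0.4 * 0.6  # population mean and variance
print("mean of sample means:    ", fmean(means), " theory:", mu)
print("variance of sample means:", pvariance(means), " theory:", sigma2 / n)
```

A histogram of `means` would also look roughly bell-shaped, even though the population itself takes only two values.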
CONFIDENCE INTERVAL (C.I.)
The value of $\hat{\theta}$, which is an estimator of $\theta$, is found from a sample. It is used as an
unbiased estimate for the population parameter $\theta$ and is very unlikely to be exactly
equal to $\theta$.
There is no way of establishing, from the sample data only, how close the estimate
is.
INSTEAD, YOU CAN FORM A CONFIDENCE INTERVAL FOR $\theta$.
A confidence interval (C.I.) for a population parameter $\theta$ is a range of values
defined so that there is a specific probability that the true value of the parameter
lies within that range.
You could establish a 90% confidence interval, or a 95% confidence interval.
A 95% confidence interval is an interval such that there is a 0.95 probability that
the interval contains $\theta$.
Different samples will generate different confidence intervals since estimates for the
parameter will change based on the data in the sample and the sample size.
Recall that $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
Hence, if you know the population standard deviation, you can establish a
confidence interval for the population mean $\mu$ using the standardized normal
distribution.
EXAMPLE
Show that a 95% confidence interval for $\mu$, based on a sample of size n, is given
by
$\left( \bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}} \right)$
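One way to set out the argument is sketched below. It assumes that $\sigma$ is known and that $\bar{X} \sim N(\mu, \sigma^2/n)$, either because the population is normal or because n is large; 1.96 is the standard normal value with 2.5% in each tail.

```latex
\begin{align*}
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} &\sim N(0, 1) \\
P(-1.96 < Z < 1.96) &= 0.95 \\
P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) &= 0.95 \\
P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) &= 0.95
\end{align*}
```

Replacing $\bar{X}$ by the observed sample mean $\bar{x}$ gives the interval stated in the example.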
We are sometimes interested in the width of a confidence interval. The width of a
confidence interval is the difference between
THE UPPER CONFIDENCE LIMIT AND THE LOWER CONFIDENCE LIMIT.
This is $2 \times z \times \frac{\sigma}{\sqrt{n}}$,
where z is the value from the tables for the chosen confidence level (for example, z = 1.96 for 95%).
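The interval and width formulas translate directly into a few helper functions. This is a minimal sketch, assuming a known $\sigma$ and using `statistics.NormalDist` from the Python standard library (3.8 or later) in place of printed tables; the function names are illustrative, and the figures in the last lines are example inputs only.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_value(confidence):
    """Two-sided critical value, e.g. about 1.96 for confidence=0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def confidence_interval(xbar, sigma, n, confidence=0.95):
    """(lower, upper) limits for the mean when sigma is known."""
    half_width = z_value(confidence) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def interval_width(sigma, n, confidence=0.95):
    """Width 2 * z * sigma / sqrt(n) of the interval."""
    return 2 * z_value(confidence) * sigma / sqrt(n)

def sample_size_for_width(sigma, max_width, confidence=0.95):
    """Smallest n for which the interval width is at most max_width."""
    return ceil((2 * z_value(confidence) * sigma / max_width) ** 2)

# Example inputs: sigma = 1.5, n = 100, sample mean 5.30.
low, high = confidence_interval(5.30, 1.5, 100)
print(round(low, 3), round(high, 3))
print(sample_size_for_width(2.5, 1.5, confidence=0.99))
```

Because `inv_cdf` returns the critical value to full precision, results can differ in the last decimal place from answers worked with table values such as 1.96 or 2.5758.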
EXAMPLE
QUESTION ONE
The breaking strains of string produced at a certain factory are normally distributed
with standard deviation 1.5 kg. A sample of 100 lengths of string from a certain batch was tested
and the mean breaking strain was 5.30 kg.
a. Find a 95% confidence interval for the mean breaking strain of string in this batch.
The manufacturer becomes concerned if the lower 95% confidence limit falls below 5 kg.
A sample of 80 lengths of string from another batch gave a mean breaking strain of 5.31 kg.
b. Will the manufacturer be concerned?
QUESTION TWO
A random sample of size 25 is taken from a normal population with standard deviation of 2.5.
The mean of the sample was 17.8
a. Find a 99% C.I. for the population mean $\mu$.
b. What size sample is required to obtain a 99% C.I. of width at most 1.5?
c. What confidence level would be associated with the interval based on the above sample of
25 but of width 1.5, i.e. (17.05, 18.55)?
QUESTION TWO
A random sample of size 9 is taken from a normal distribution with variance 36.
The sample mean is 128.
a. Find a 95% confidence interval for the mean of the distribution.
b. Find a 99% confidence interval for the mean of the distribution.
QUESTION THREE
A random sample of size 25 is taken from a normal distribution with standard deviation 4.
The sample mean is 85.
a. Find a 90% confidence interval for the mean of the distribution.
b. Find a 95% confidence interval for the mean of the distribution.
QUESTION FOUR
A normal distribution has standard deviation 15. Estimate the sample size required
if the following confidence intervals for the mean should have width of less than 2.
a. 90%
b. 95%
c. 99%
QUESTION FIVE
An experienced poultry farmer knows that the mean weight $\mu$ kg for a large
population of chickens will vary from season to season but the standard deviation
of the weights should remain at 0.70 kg.
A random sample of 100 chickens is taken from the population and the weight $x$ kg
of each chicken in the sample is recorded, giving $\sum x = 190.2$.
Find a 95% confidence interval for $\mu$.
QUESTION SIX
It is known that each year the standard deviation of the marks in a certain
examination is 13.5 but the mean mark will fluctuate.
An examiner wishes to estimate the mean mark $\mu$ of all the candidates on the
examination but he only has the marks of a sample of 250 candidates which give a
sample mean of 68.4.
a. What assumption about these candidates must the examiner make in order to
use this sample mean to calculate a confidence interval for $\mu$?
b. Assuming that the above assumption is justified, calculate a 95% confidence
interval for $\mu$.
Later the examiner discovers that the actual value of $\mu$ was 65.3.
c. What conclusions might the examiner draw about his sample?
QUESTION SEVEN
The managing director of a certain firm has commissioned a survey to estimate the
mean expenditure of customers on electrical appliances. A random sample of 100
people were questioned and the research team presented the managing director
with a 95% confidence interval of (K128.14, K141.86).
The director says that this interval is too wide and wants a confidence interval of
total width K10.
a. Using the same value of $\bar{x}$, find the confidence limits in this case.
b. Find the level of confidence for the interval in part a.
The managing director is still not happy and now wishes to know how large a
sample would be required to obtain a 95% confidence interval of total width no
more than K10.
c. Find the smallest size of sample that will satisfy this request.
QUESTION EIGHT
a. The error made when a certain instrument is used to measure the body length
of a butterfly of a particular species is known to be normally distributed with
mean 0 and standard deviation 1 mm.
Calculate, to 3 decimal places, the probability that the error made when the
instrument is used once is numerically less than 0.4 mm.
b. Given that the body length of a butterfly is measured 9 times with the
instrument, calculate, to 3 decimal places, the probability that the mean of the 9
readings will be within 0.5 mm of the true length.
c. Given that the mean of the 9 readings was 22.53 mm, determine a 98%
confidence interval for the true body length of the butterfly.
CONFIDENCE INTERVAL FOR A LARGE SAMPLE
In the examples considered so far it has been assumed that the samples are drawn from a
normal distribution and that the variance of this distribution is known.
In practice you will not always be sure that the population is normal and you may or may not
have accurate information about its variance.
However, the method of calculating a confidence interval which has been described in this
chapter can still be applied provided that the sample is large.
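For a large sample the same calculation can be carried out with the sample standard deviation s standing in for $\sigma$. The sketch below assumes n is large enough for the Central Limit Theorem to apply; the data set is made up purely to show the call and carries no meaning.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def large_sample_ci(data, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses s (divisor n - 1) in place of sigma and relies on the Central
    Limit Theorem, so it is only reasonable when the sample is large.
    """
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Made-up data for illustration only: 200 values cycling through 0..6.
data = [i % 7 for i in range(200)]
low, high = large_sample_ci(data)
print(round(low, 3), round(high, 3))
```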
ESTIMATING POPULATION PARAMETERS
➢ A statistic that is used to estimate a population parameter is called an ESTIMATOR.
➢ The particular value of the estimator generated from the sample taken is called an
ESTIMATE.
There is a need to determine how reliable these sample statistics are as estimators for the
corresponding population parameters.
Since all the $X_i$ are random variables having the same mean and variance as the population, you
can sometimes find the expected value of a statistic T, E(T), which will tell you what the 'average'
value of the statistic should be.
The BIAS is simply the
EXPECTED VALUE OF THE ESTIMATOR
MINUS
THE PARAMETER OF THE POPULATION
it is estimating.
➢ If a statistic T is used as an estimator for a population parameter $\theta$, then
BIAS = $E(T) - \theta$.
➢ If a statistic T is used as an estimator for a population parameter $\theta$ and
$E(T) = \theta$, so that $E(T) - \theta = 0$,
then T is an unbiased estimator for $\theta$.
➢ An unbiased estimator for $\sigma^2$ is given by the sample variance $S^2$:
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
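The effect of the divisor n − 1 can be seen on a small population by computing expected values exactly. The sketch below assumes a population in which 60% of items are 0 and 40% are 1, with samples of size 3 drawn independently, and compares the bias of $S^2$ with the bias of the same sum of squares divided by n. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed population for illustration: value -> probability.
population = {0: Fraction(3, 5), 1: Fraction(2, 5)}

def expected_value(statistic, dist, n):
    """E(statistic) over all ordered samples of size n drawn independently."""
    total = Fraction(0)
    for sample in product(dist, repeat=n):
        prob = Fraction(1)
        for value in sample:
            prob *= dist[value]
        total += prob * statistic(sample)
    return total

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

def s_squared(sample):
    return sum_sq_dev(sample) / (len(sample) - 1)   # divisor n - 1

def divisor_n_variance(sample):
    return sum_sq_dev(sample) / len(sample)         # divisor n

mu = sum(v * p for v, p in population.items())
sigma2 = sum(v * v * p for v, p in population.items()) - mu ** 2

bias_s2 = expected_value(s_squared, population, 3) - sigma2
bias_div_n = expected_value(divisor_n_variance, population, 3) - sigma2
print("bias of S^2:", bias_s2)
print("bias with divisor n:", bias_div_n)
```

For this population the first bias is exactly zero, while dividing by n underestimates $\sigma^2$ on average.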
One reason that the sample mean is used as an estimator for $\mu$ is that
➢ The variance of the estimator, Var($\bar{X}$) = $\frac{\sigma^2}{n}$, decreases as n increases.
➢ For larger values of n, the value of an estimate is more likely to
be close to the population mean.
➢ So, a larger value of n will result in a better estimator.
➢ The standard deviation of an estimator is called the standard error of the estimator.
Standard error of $\bar{X}$ = $\frac{\sigma}{\sqrt{n}}$, which is estimated by $\frac{s}{\sqrt{n}}$ when $\sigma$ is unknown.
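The standard error formula can also be turned around to ask how large a sample is needed. The following is a minimal sketch with illustrative function names; the numbers in the last lines are example inputs only.

```python
from math import ceil, sqrt

def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n)."""
    return sd / sqrt(n)

def sample_size_for_standard_error(sd, target):
    """Smallest n for which sd / sqrt(n) is strictly less than target."""
    n = max(1, ceil((sd / target) ** 2))
    # Step up if the candidate only equals the target (or rounding interfered).
    while standard_error(sd, n) >= target:
        n += 1
    return n

print(standard_error(0.70, 100))
print(sample_size_for_standard_error(3, 0.5))
```

The loop matters when $(\text{sd}/\text{target})^2$ is a whole number: with sd = 3 and target 0.5, n = 36 gives a standard error of exactly 0.5, so the next integer is needed for "less than".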
EXAMPLE
QUESTION ONE
A random sample $X_1, X_2, X_3, \ldots, X_n$ is taken from a population with distribution $N(\mu, \sigma^2)$.
Show that:
I. $E(\bar{X}) = \mu$
II. $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
QUESTION TWO
Show that $S^2 = \frac{1}{n-1} \sum (X_i - \bar{X})^2 = \frac{1}{n-1} \left( \sum X_i^2 - n\bar{X}^2 \right)$ is an unbiased estimator for $\sigma^2$.
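The second form of $S^2$ is convenient when only n, $\sum x$ and $\sum x^2$ are available. The sketch below computes unbiased estimates from those summaries and cross-checks the result against `statistics.variance` on a small made-up data set. The function name is illustrative, and the first call uses the summary values n = 20, $\sum x = 1300$, $\sum x^2 = 84\,685$ that appear in a later question on this sheet.

```python
from statistics import mean, variance

def unbiased_estimates(n, sum_x, sum_x2):
    """Return (x-bar, s^2) from n, the sum of x and the sum of x^2."""
    xbar = sum_x / n
    s2 = (sum_x2 - n * xbar ** 2) / (n - 1)
    return xbar, s2

xbar, s2 = unbiased_estimates(20, 1300, 84685)
print(xbar, round(s2, 3))

# Cross-check the two forms of the formula on made-up data.
data = [2, 4, 4, 5, 7, 9]
from_summaries = unbiased_estimates(len(data), sum(data), sum(x * x for x in data))
from_raw_data = (mean(data), variance(data))   # variance() uses divisor n - 1
print(from_summaries, from_raw_data)
```

With very large values and small spread, the summary form can lose precision in floating point because it subtracts two nearly equal numbers; the deviation form is safer in that situation.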
QUESTION THREE
The table below summarizes the number of breakdowns $X$ on a busy road on 30 randomly
chosen days.
Number of breakdowns    2   3   4   5   6   7   8   9
Number of days          3   5   4   3   5   4   4   2
a. Calculate unbiased estimates of the mean and variance of the number of breakdowns.
Twenty more days were randomly sampled, and this sample had
$\bar{x} = 6.0$ and $s^2 = 5.0$.
b. Treating the 50 results as a single sample, obtain further unbiased estimates of the
population mean and variance.
c. Find the standard error of this new estimate of the mean.
d. Estimate the size of the sample required to achieve a standard error of less than 0.25.
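Combining two samples is easiest through the totals n, $\sum x$ and $\sum x^2$, since these simply add. The sketch below reads the totals off the frequency table, recovers them for the second sample from $\sum x = n\bar{x}$ and $\sum x^2 = (n-1)s^2 + n\bar{x}^2$, and then pools them. It assumes the second sample contains 20 days, so that the total is the 50 results mentioned in part b; the function names are illustrative.

```python
from fractions import Fraction

def summaries_from_frequencies(freq):
    """(n, sum x, sum x^2) from a {value: frequency} table."""
    n = sum(freq.values())
    sum_x = sum(x * f for x, f in freq.items())
    sum_x2 = sum(x * x * f for x, f in freq.items())
    return n, sum_x, sum_x2

def summaries_from_estimates(n, xbar, s2):
    """Recover (n, sum x, sum x^2) from n, x-bar and the unbiased s^2."""
    return n, n * xbar, (n - 1) * s2 + n * xbar ** 2

def unbiased_estimates(n, sum_x, sum_x2):
    """Return (x-bar, s^2) as exact fractions from integer totals."""
    xbar = Fraction(sum_x, n)
    return xbar, (sum_x2 - n * xbar ** 2) / (n - 1)

breakdowns = {2: 3, 3: 5, 4: 4, 5: 3, 6: 5, 7: 4, 8: 4, 9: 2}
first = summaries_from_frequencies(breakdowns)
second = summaries_from_estimates(20, 6, 5)      # assumes 20 further days
combined = tuple(a + b for a, b in zip(first, second))

xbar, s2 = unbiased_estimates(*combined)
print(combined, float(xbar), round(float(s2), 3))
```

Averaging the two sample variances directly would not give the same answer, because the pooled sum of squares also depends on how far each sample mean lies from the combined mean.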
QUESTION FOUR
The lengths of metal bars produced by a certain machine are normally distributed with mean $\mu$
and standard deviation $\sigma$. A random sample of 10 metal bars is taken, and their lengths
$X_1, X_2, X_3, \ldots, X_{10}$ are measured.
Write down the distributions of the following.
a. $\sum_{i=1}^{10} X_i$
b. $\frac{2X_1 - 3X_{10}}{5}$
c. $\sum_{i=1}^{10} (X_i - \mu)$
d. $\bar{X}$
e. $\sum_{i=1}^{5} X_i - \sum_{i=6}^{10} X_i$
f. $\sum_{i=1}^{10} \left( \frac{X_i - \mu}{\sigma} \right)$
QUESTION FIVE
A large bag of coins contains 1 cent, 5 cent and 10 cent coins in the ratio 2 : 2 : 1.
a. Find the mean and the variance for the value of coins in this population.
A random sample of two coins is taken and their values $X_1$ and $X_2$ are recorded.
b. List all the possible observations from this sample.
c. Find the sampling distribution for the mean $\bar{X} = \frac{X_1 + X_2}{2}$.
d. Hence show that E($\bar{X}$) = $\mu$ and Var($\bar{X}$) = $\frac{\sigma^2}{n}$.
QUESTION SIX
Find unbiased estimates of the mean and variance of the populations from which the following
random samples have been taken.
a. 21.3 19.6 18.5 22.3 17.4 16.3 18.9 17.6 18.7 16.5 19.3 21.8 20.1 22.0
b. 1, 2, 5, 1, 6, 4, 1, 3, 2, 8, 5, 6, 2, 4, 3, 1
c. 120.4 230.6 356.1 129.8 185.6 147.6 258.3 329.7 249.3
QUESTION SEVEN
Find unbiased estimates of the mean and variance of the populations for which the random
samples with the following summaries have been made.
a. $n = 120$, $\sum x = 4368$, $\sum x^2 = 162\,466$
b. $n = 30$, $\sum x = 270$, $\sum x^2 = 2546$
c. $n = 1037$, $\sum x = 1140.7$, $\sum x^2 = 1278.08$
d. $n = 15$, $\sum x = 168$, $\sum x^2 = 162\,466$
QUESTION EIGHT
A sample of size 6 is taken from a population that is normally distributed with mean 10 and
standard deviation 2.
a. Find the probability that the sample mean is greater than 12.
b. State, with a reason, if your answer is an approximation.
QUESTION NINE
A machine fills cartons in such a way that the amount of drink in each carton is distributed
normally with a mean of 40 cm$^3$ and a standard deviation of 1.5 cm$^3$.
A sample of four cartons is examined.
a. Find the probability that the mean amount of drink is more than 40.5 cm$^3$.
A sample of 49 cartons is examined.
b. Find the probability that the mean amount of drink is more than 40.5 cm$^3$ on this occasion.
QUESTION TEN
Cartons of orange juice are filled by a machine. A sample of 10 cartons selected at random from
the production line contained the following quantities of orange juice (in ml).
201.2 205.0 209.1 202.3 204.6 206.4 210.1 201.9 203.7 207.3
Calculate unbiased estimates of the mean and variance of the population from which this sample
was taken.
QUESTION ELEVEN
A manufacturer of self-build furniture requires bolts of two lengths, 5 cm and 10 cm, in the ratio
2 : 1 respectively.
a. Find the mean $\mu$ and the variance $\sigma^2$ for the lengths of bolts in this population.
A random sample of three bolts is selected from a large box containing bolts in the required ratio.
b. List all the possible observations from this sample.
c. Find the sampling distribution for the mean $\bar{X}$.
d. Hence find E($\bar{X}$) and Var($\bar{X}$).
e. Find the sampling distribution of the mode $M$.
f. Hence find E($M$) and Var($M$).
g. Find the bias when $M$ is used as an estimator of the population mode.
QUESTION TWELVE
A machine operator checks a random sample of 20 bottles from a production line in order to
estimate the mean volume of bottles (in cm$^3$) from this production run. The 20 values can be
summarized as $\sum x = 1300$ and $\sum x^2 = 84\,685$.
a. Use this sample to find unbiased estimates of $\mu$ and $\sigma^2$.
A supervisor knows from experience that the standard deviation of volumes on this process,
$\sigma$, should be 3 cm$^3$ and he wishes to have an estimate of $\mu$ that has a standard error of less than
0.5 cm$^3$.
b. Recommend a sample size for the supervisor, showing working to support your
recommendation.
c. Does your recommended sample size guarantee a standard error of less than 0.5 cm$^3$?
Give a reason for your answer.
The supervisor takes a further sample of size 16 and finds $\sum x = 1060$.
d. Combine the two samples to obtain a revised estimate of $\mu$.
QUESTION THIRTEEN
To work for a company, applicants need to complete a medical test. The probability of each
applicant passing the test is $p$, independent of any other applicant. The medicals are carried out over two
days: on the first day $n$ applicants are seen and on the next day $2n$ are seen.
Let $X_1$ be the number of applicants who pass the test on the first day and
let $X_2$ be the number of applicants who pass the test on the second day.
a. Write down $E(X_1)$, $E(X_2)$, Var($X_1$) and Var($X_2$).
b. Show that $\frac{X_1}{n}$ and $\frac{X_2}{2n}$ are both unbiased estimates of $p$ and state, giving a reason, which one
you would prefer to use.
c. Show that $X = \frac{1}{2}\left(\frac{X_1}{n} + \frac{X_2}{2n}\right)$ is an unbiased estimator of $p$.
d. Show that $Y = \frac{X_1 + X_2}{3n}$ is an unbiased estimator of $p$.
e. Which of the statistics $\frac{X_1}{n}$, $\frac{X_2}{2n}$, $X$ or $Y$ is the best estimator of $p$?
The statistic $T = \frac{2X_1 + X_2}{3n}$ is proposed as an estimator of $p$.
f. Find the bias.
QUESTION FOURTEEN
In a bag that contains a large number of counters, the number 0 is written on 40% of the counters,
the number 1 is written on 20% of the counters and the number 2 is written on the remaining 40%
of the counters.
a. Find the mean $\mu$ and the variance $\sigma^2$ for this population of counters.
A random sample of size 3 is taken from the bag.
b. List all the possible observations from this sample.
c. Find the sampling distribution of the mean $\bar{X}$.
d. Find E($\bar{X}$) and Var($\bar{X}$).
e. Find the sampling distribution for the median $N$.
f. Hence, find E($N$) and Var($N$).
g. Show that $N$ is an unbiased estimator of $\mu$.
h. Explain which estimator, $\bar{X}$ or $N$, you would choose as an estimator of $\mu$.
QUESTION FIFTEEN
A factory worker checks a random sample of 20 bottles from a production line in order to
estimate the mean volume of bottles (in cm$^3$) from this production run. The 20 values can be
summarized as $\sum x = 1300$ and $\sum x^2 = 84\,685$.
a. Use this sample to find unbiased estimates of $\mu$ and $\sigma^2$.
A factory manager knows from experience that the standard deviation of volumes on this
process, $\sigma$, should be 3 cm$^3$ and he wishes to have an estimate of $\mu$ that has a standard
error of less than 0.5 cm$^3$.
b. Recommend a sample size for the manager, showing working to support your
recommendation.
c. Does the recommended sample size guarantee a standard error of less than 0.5 cm$^3$?
Give a reason for your answer.
The manager takes a further sample of 16 and finds $\sum x = 1060$.
d. Combine the two samples to obtain a revised estimate of $\mu$.