Unit 6: Statistical Inference

Standard Error of Means
Let a random sample of size $n_1$ be drawn from a population with mean $\mu_1$ and standard deviation $\sigma_1$. Similarly, let another random sample of size $n_2$ be drawn from a population with mean $\mu_2$ and standard deviation $\sigma_2$.
For a single sample of size $n$ drawn from a population with mean $\mu$ and standard deviation $\sigma$, the mean and standard error of the sample mean are
$$E(\bar{x}) = \mu, \qquad SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
Difference in Sample Means
Let $\bar{x}_1$ be the mean of the sample drawn from the first population and $\bar{x}_2$ be the mean of the sample drawn from the second population. Then the mean and standard error of the difference of the sample means are:
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2, \qquad SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
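As a quick illustration, here is a minimal Python sketch that evaluates these two formulas. The population parameters and sample sizes below are made-up values chosen only for the example, not figures from the text.

```python
from math import sqrt

# Assumed (illustrative) population parameters and sample sizes
mu1, sigma1, n1 = 52.0, 6.0, 40
mu2, sigma2, n2 = 50.0, 5.0, 35

# Mean and standard error of the difference of sample means
expected_diff = mu1 - mu2                          # E(x1_bar - x2_bar) = mu1 - mu2
se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)    # SE(x1_bar - x2_bar)

print(f"E(x1_bar - x2_bar) = {expected_diff:.3f}")
print(f"SE(x1_bar - x2_bar) = {se_diff:.3f}")
```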
Standard Error of Proportions
Let a random sample of size $n$ be drawn from a population with proportion $P$. Then the sample proportion $p$ has mean $P$ and standard error
$$E(p) = P, \qquad SE(p) = \sqrt{\frac{PQ}{n}}$$
where $Q = 1 - P$.
Difference of Sample Proportions
Let a random sample of size $n_1$ be drawn from a population with proportion $P_1$. Similarly, let another random sample of size $n_2$ be drawn from a population with proportion $P_2$.
Let $p_1$ be the proportion of the sample drawn from the first population and $p_2$ be the proportion of the sample drawn from the second population.
Then the mean and standard error of the difference of the sample proportions are
$$E(p_1 - p_2) = P_1 - P_2, \qquad SE(p_1 - p_2) = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$
or, when the two population proportions are equal ($P_1 = P_2 = P$, as under the null hypothesis),
$$SE(p_1 - p_2) = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
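A small Python sketch of both forms of this standard error. The proportions and sample sizes are illustrative assumptions, and the common value $P$ is illustrated here with the pooled average of the two proportions.

```python
from math import sqrt

# Assumed (illustrative) population proportions and sample sizes
P1, n1 = 0.40, 120
P2, n2 = 0.35, 150
Q1, Q2 = 1 - P1, 1 - P2

# General form: possibly unequal population proportions
se_general = sqrt(P1 * Q1 / n1 + P2 * Q2 / n2)

# Form used when P1 = P2 = P (e.g. under the null hypothesis);
# here P is illustrated as the pooled average of P1 and P2
P = (n1 * P1 + n2 * P2) / (n1 + n2)
Q = 1 - P
se_common = sqrt(P * Q * (1 / n1 + 1 / n2))

print(f"SE (general form)  = {se_general:.4f}")
print(f"SE (common P form) = {se_common:.4f}")
```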
The Hypothesis Concept
A hypothesis is an assumption about a population. To test whether the assumption is right or wrong, we use hypothesis testing.
The null hypothesis (H0) is the basic assumption. Let's say you feel the mean height of a class of 20 students is 150 cm. This is the null hypothesis. The alternative hypothesis (H1) can then be:
a) The mean height is not equal to 150 cm
b) The mean height is more than 150 cm
c) The mean height is less than 150 cm
To decide whether the null hypothesis or the alternative hypothesis is correct, we use an appropriate test statistic.
The test statistic has the general form
$$Z_{cal} = \frac{\text{Relevant statistic} - \text{Hypothesized value}}{\text{Standard error of the statistic}}$$
Depending on the alternative hypothesis, the critical region or critical values are determined. The critical region represents the range of values in which $Z_{cal}$ may fall, and it is represented on a normal curve.
For example, in the previous case where 150 cm is the hypothesized mean of the class, the alternative hypotheses correspond to the following tests:
a) The mean height is not equal to 150 cm (two-tailed test)
b) The mean height is more than 150 cm (right-tailed test)
c) The mean height is less than 150 cm (left-tailed test)
The normal curve is drawn to show the region in which $Z_{cal}$ falls, in order to either accept or reject the null hypothesis.
The critical values (k) that help us decide are:

α       Two-tailed test (-k, +k)     One-tailed test, left (-k)     One-tailed test, right (+k)
5%      -1.96, +1.96                 -1.65                          +1.65
1%      -2.58, +2.58                 -2.33                          +2.33
If the calculated value $Z_{cal}$ lies in the acceptance region, the null hypothesis is accepted; otherwise it is rejected.
a) If the test is two-tailed and α = 5%, the null hypothesis is accepted if $-1.96 < Z_{cal} < 1.96$
b) If the test is right-tailed and α = 5%, the null hypothesis is accepted if $Z_{cal} < 1.65$
c) If the test is left-tailed and α = 5%, the null hypothesis is accepted if $Z_{cal} > -1.65$
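The critical values in the table above can be reproduced, and the accept/reject decision automated, with a short Python sketch. The helper function `decide` and the example value of $Z_{cal}$ are illustrative assumptions; the quantiles come from the standard normal distribution in Python's statistics module.

```python
from statistics import NormalDist

std_normal = NormalDist()      # standard normal distribution
alpha = 0.05

# Critical values, matching the table above
k_two = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 (two-tailed, 5%)
k_one = std_normal.inv_cdf(1 - alpha)       # about 1.645 (one-tailed, 5%)

def decide(z_cal: float, tail: str, alpha: float = 0.05) -> str:
    """Accept or reject H0 for a given Z_cal and type of test (illustrative helper)."""
    q = NormalDist().inv_cdf(1 - (alpha / 2 if tail == "two" else alpha))
    if tail == "two":
        ok = -q < z_cal < q
    elif tail == "right":
        ok = z_cal < q
    else:                       # "left"
        ok = z_cal > -q
    return "accept H0" if ok else "reject H0"

print(round(k_two, 2), round(k_one, 2))   # approximately 1.96 and 1.64
print(decide(2.10, "two"))                # reject H0 at the 5% level
```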
Test for Population Mean
π‘₯Μ… − πœ‡ 0
𝜎
√𝑛
𝑍=
Where π‘₯Μ…the sample mean, σ is the population standard deviation. If it is unknown then it is replaced by
sample standard deviation ‘s’ and ‘n’ is sample size
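A minimal sketch of this test in Python. The numbers (sample mean 148 cm, hypothesized mean 150 cm, standard deviation 10 cm, n = 36) are assumptions chosen for illustration, not data from the text.

```python
from math import sqrt
from statistics import NormalDist

# Assumed (illustrative) data
x_bar, mu0, sigma, n = 148.0, 150.0, 10.0, 36

# Z statistic for a test of the population mean
z_cal = (x_bar - mu0) / (sigma / sqrt(n))     # -1.2 for these numbers

# Two-tailed decision at alpha = 5%
k = NormalDist().inv_cdf(0.975)               # about 1.96
print(f"Z_cal = {z_cal:.2f}, critical value = ±{k:.2f}")
print("accept H0" if -k < z_cal < k else "reject H0")
```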
Test for Equality of Means of Two Populations
The test statistic is
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
where $\bar{x}_1$ and $\bar{x}_2$ are the sample means and $\sigma_1$ and $\sigma_2$ are the population standard deviations. If they are unknown, they are replaced by the sample standard deviations $s_1$ and $s_2$; $n_1$ and $n_2$ are the sample sizes.
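A sketch of the two-sample Z test in Python, with made-up summary statistics (the sample sizes, means and standard deviations below are assumptions for illustration).

```python
from math import sqrt
from statistics import NormalDist

# Assumed (illustrative) summary statistics for the two samples
x1_bar, s1, n1 = 67.5, 2.5, 40
x2_bar, s2, n2 = 68.0, 2.0, 50

# Population sigmas are unknown, so s1 and s2 are used in their place
z_cal = (x1_bar - x2_bar) / sqrt(s1**2 / n1 + s2**2 / n2)

k = NormalDist().inv_cdf(0.975)    # two-tailed critical value at 5%
print(f"Z_cal = {z_cal:.2f}")
print("accept H0 (means equal)" if -k < z_cal < k else "reject H0")
```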
Test for Population Proportion
The test statistic is
$$Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}}, \qquad p = \frac{x}{n}$$
where $x$ is the number of members of the sample possessing the attribute, $n$ is the sample size, and $p$ is the sample proportion.
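A minimal Python sketch, assuming a hypothesized proportion P = 0.5 and a sample with x = 62 successes out of n = 100 (illustrative numbers only).

```python
from math import sqrt
from statistics import NormalDist

# Assumed (illustrative) data
P, x, n = 0.5, 62, 100
Q = 1 - P
p = x / n                                  # sample proportion

z_cal = (p - P) / sqrt(P * Q / n)          # 2.4 for these numbers

k = NormalDist().inv_cdf(0.975)            # two-tailed critical value at 5%
print(f"p = {p}, Z_cal = {z_cal:.2f}")
print("accept H0" if -k < z_cal < k else "reject H0")
```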
Test for Equality of Proportions of Two Populations
The test statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$p_1 = \frac{x_1}{n_1}, \qquad p_2 = \frac{x_2}{n_2}, \qquad P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}, \qquad Q = 1 - P$$
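A sketch of the two-proportion Z test in Python; the success counts x1, x2 and the sample sizes below are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Assumed (illustrative) counts of successes and sample sizes
x1, n1 = 45, 100
x2, n2 = 60, 120

p1, p2 = x1 / n1, x2 / n2
P = (n1 * p1 + n2 * p2) / (n1 + n2)     # pooled proportion = (x1 + x2) / (n1 + n2)
Q = 1 - P

z_cal = (p1 - p2) / sqrt(P * Q * (1 / n1 + 1 / n2))

k = NormalDist().inv_cdf(0.975)          # two-tailed critical value at 5%
print(f"Z_cal = {z_cal:.2f}")
print("accept H0 (proportions equal)" if -k < z_cal < k else "reject H0")
```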
t-tests (Small Sample Tests, Samples of Size < 30)
This test is used for:
a) Testing the significance of the mean of a population using a small sample
b) Testing the difference between the means of two populations
c) Testing the difference between the means of two populations using paired observations
Testing Mean of a Population using Small Sample
The test statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}}$$
where
$\bar{x}$ is the sample mean,
$\mu$ is the hypothesized mean of the population,
$n$ is the sample size, and
$s$ is the standard deviation of the sample, given by
$$s = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}$$
Under H0, the test statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} \quad \text{with } (n - 1) \text{ degrees of freedom}$$
Degrees of freedom are the number of independent observations. If there are $n$ observations, then
$$DF = n - c$$
where $c$ is the number of independent constraints.
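A minimal Python sketch of this small-sample test, following the formulas above (s computed with divisor n, and $t = (\bar{x} - \mu)/(s/\sqrt{n-1})$). The sample data and hypothesized mean are assumptions for illustration, and scipy is used only to obtain the t critical value.

```python
from math import sqrt
from scipy.stats import t as t_dist

# Assumed (illustrative) small sample and hypothesized mean
x = [148, 152, 151, 149, 150, 153, 147, 151]
mu0 = 150.0
n = len(x)

x_bar = sum(x) / n
s = sqrt(sum(xi**2 for xi in x) / n - (sum(x) / n) ** 2)   # s as defined above (divisor n)

t_cal = (x_bar - mu0) / (s / sqrt(n - 1))
df = n - 1

k = t_dist.ppf(0.975, df)     # two-tailed critical value at 5%
print(f"t_cal = {t_cal:.3f}, critical value = ±{k:.3f}, df = {df}")
print("accept H0" if -k < t_cal < k else "reject H0")
```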
Testing Difference between Means of Two Populations using two Samples
Given two samples of sizes $n_1$ and $n_2$ with means $\bar{x}_1$ and $\bar{x}_2$ and standard deviations $s_1$ and $s_2$, we may be interested in testing the hypothesis that the samples come from the same normal population.
The test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_c^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{with } (n_1 + n_2 - 2) \text{ degrees of freedom}$$
where the pooled variance is
$$s_c^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$$
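A sketch of this pooled two-sample t test in Python; the two small samples below are illustrative assumptions, and scipy supplies the t critical value.

```python
from math import sqrt
from scipy.stats import t as t_dist

# Assumed (illustrative) small samples
x1 = [24, 27, 26, 23, 25, 26]
x2 = [22, 24, 23, 25, 21]
n1, n2 = len(x1), len(x2)

x1_bar = sum(x1) / n1
x2_bar = sum(x2) / n2

# Pooled variance s_c^2 as defined above
sc2 = (sum((v - x1_bar) ** 2 for v in x1)
       + sum((v - x2_bar) ** 2 for v in x2)) / (n1 + n2 - 2)

t_cal = (x1_bar - x2_bar) / sqrt(sc2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

k = t_dist.ppf(0.975, df)     # two-tailed critical value at 5%
print(f"t_cal = {t_cal:.3f}, df = {df}")
print("accept H0 (same population)" if -k < t_cal < k else "reject H0")
```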
Testing the Difference between Means of Two Populations using Paired Observations
The test statistic is
$$t = \frac{\bar{d}}{s_d / \sqrt{n - 1}} \quad \text{with } (n - 1) \text{ degrees of freedom}$$
where $d$ is the difference between each pair of observations, $\bar{d}$ is the mean of the differences, and
$$s_d = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$$
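A sketch of the paired test in Python; the before/after values are illustrative assumptions.

```python
from math import sqrt
from scipy.stats import t as t_dist

# Assumed (illustrative) paired observations (e.g. before and after a treatment)
before = [72, 75, 70, 78, 74, 73]
after  = [70, 73, 71, 74, 72, 70]

d = [a - b for a, b in zip(after, before)]   # paired differences
n = len(d)

d_bar = sum(d) / n
s_d = sqrt(sum(di**2 for di in d) / n - (sum(d) / n) ** 2)   # s_d as defined above

t_cal = d_bar / (s_d / sqrt(n - 1))
df = n - 1

k = t_dist.ppf(0.975, df)     # two-tailed critical value at 5%
print(f"t_cal = {t_cal:.3f}, df = {df}")
print("accept H0 (no difference)" if -k < t_cal < k else "reject H0")
```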
Chi-Square Test
If $Z_1, Z_2, \ldots, Z_n$ are $n$ independently distributed standard normal variables, then
$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$
follows a chi-square distribution with $n$ degrees of freedom.
Chi-square tests are used for the following:
1. To test if a population has a given variance / standard deviation
2. To test the 'goodness of fit' of a theoretical distribution to an observed distribution
3. To test the independence of attributes in a contingency table.
Test for Variance / Standard Deviation
If the variance of a normal population is not known, we may want to test whether the population has a given variance $\sigma_0^2$.
The null hypothesis is:
H0: $\sigma^2 = \sigma_0^2$ (the population variance is $\sigma_0^2$)
The alternative hypothesis is one of:
1. H1: $\sigma^2 \neq \sigma_0^2$ (the test is two-tailed)
2. H1: $\sigma^2 > \sigma_0^2$ (the test is one-tailed with the critical region in the upper tail)
3. H1: $\sigma^2 < \sigma_0^2$ (the test is one-tailed with the critical region in the lower tail)
The test statistic is
$$\chi^2 = \frac{n s^2}{\sigma_0^2}$$
which is a chi-square variate with $(n - 1)$ degrees of freedom.
If the value of $s$ is not given, it can be calculated using
$$s^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2, \qquad u = (x - A)$$
where $A$ is an assumed mean (arbitrary origin).
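A minimal sketch of the variance test in Python; the sample, the hypothesized variance, and the choice of an upper-tailed alternative are illustrative assumptions, with scipy providing the chi-square critical value.

```python
from scipy.stats import chi2

# Assumed (illustrative) sample and hypothesized variance
x = [12.1, 11.8, 12.5, 12.9, 11.6, 12.3, 12.7, 11.9, 12.4, 12.2]
sigma0_sq = 0.10
n = len(x)

mean = sum(x) / n
s_sq = sum((xi - mean) ** 2 for xi in x) / n   # s^2 with divisor n, as in n*s^2/sigma0^2

chi2_cal = n * s_sq / sigma0_sq
df = n - 1

# Upper-tailed test at 5% (H1: sigma^2 > sigma0^2)
k = chi2.ppf(0.95, df)
print(f"chi2_cal = {chi2_cal:.3f}, critical value = {k:.3f}, df = {df}")
print("accept H0" if chi2_cal < k else "reject H0")
```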
To Test Goodness of Fit
Suppose there are observed frequencies $O_i$, and let theoretical frequencies $E_i$ be fitted to the observed distribution.
The null hypothesis is:
H0: The theoretical frequency distribution is a good fit to the observed frequency distribution
The alternative hypothesis is
H1: The theoretical frequency distribution is not a good fit to the observed frequency distribution
The test statistic is
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Under H0, this is a chi-square variate with $(n - c)$ degrees of freedom, where
$n$ is the number of terms in the $\chi^2$ column after pooling frequencies less than 5 with the adjacent ones, and
$c$ is one more than the number of parameters estimated from the observed distribution.
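A sketch of the goodness-of-fit calculation in Python. The observed die-roll counts and the fitted uniform expected frequencies are illustrative assumptions; in practice $E_i$ comes from whatever theoretical distribution is being fitted, and $c$ depends on how many of its parameters were estimated.

```python
from scipy.stats import chi2

# Assumed (illustrative) observed die-roll counts and a fitted uniform (fair die) model
observed = [22, 17, 20, 26, 22, 13]            # O_i
total = sum(observed)
expected = [total / 6] * 6                     # E_i under the uniform model

chi2_cal = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n_terms = len(observed)    # classes after pooling small frequencies (none needed here)
c = 1                      # no parameters estimated from the data, so c = 1
df = n_terms - c

k = chi2.ppf(0.95, df)     # upper-tailed critical value at 5%
print(f"chi2_cal = {chi2_cal:.3f}, critical value = {k:.3f}, df = {df}")
print("good fit (accept H0)" if chi2_cal < k else "not a good fit (reject H0)")
```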
Test for Independence of Attributes
Suppose we want to test the independence of two attributes A and B in a population. We apply the chi-square test as follows:
            A          not A      Total
B           a          c          a + c
not B       b          d          b + d
Total       a + b      c + d      N = a + b + c + d
The null hypothesis is:
H0: Attributes A and B are independent
The alternative hypothesis is:
H1: Attributes A and B are not independent.
The test statistic is
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$$
Under H0, this is a chi-square variate with 1 degree of freedom.
This test is one-tailed (upper tail).
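A sketch of the 2×2 independence test in Python using the shortcut formula above; the cell counts a, b, c, d are illustrative assumptions. For larger tables, or as a cross-check, scipy.stats.chi2_contingency performs the general contingency-table test.

```python
from scipy.stats import chi2

# Assumed (illustrative) 2x2 cell counts, laid out as in the table above
a, c = 30, 20     # row B
b, d = 25, 45     # row "not B"
N = a + b + c + d

chi2_cal = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

k = chi2.ppf(0.95, 1)      # upper-tailed critical value at 5%, 1 d.f.
print(f"chi2_cal = {chi2_cal:.3f}, critical value = {k:.3f}")
print("independent (accept H0)" if chi2_cal < k else "not independent (reject H0)")
```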