
StatDS Ch8-13 Jan2023

8
Testing Hypothesis for Two Population Parameters
SECTIONS
8.1 Testing Hypothesis for Two Population Means
    8.1.1 Two Independent Samples
    8.1.2 Paired Sample
8.2 Testing Hypothesis for Two Population Variances
8.3 Testing Hypothesis for Two Population Proportions
CHAPTER OBJECTIVES
In Chapter 7, we discussed how to test hypotheses about the parameters of a single population. In this chapter, we discuss testing hypotheses to compare the parameters of two populations.
Section 8.1 discusses t-tests for the hypothesis of two population means, both when the samples are independent and when they are paired.
Section 8.2 discusses an F-test for the hypothesis of two population variances.
Section 8.3 discusses a Z-test for the hypothesis of two population proportions when the samples are large enough.

8.1 Testing Hypothesis for Two Population Means

There are many examples of comparing the means of two populations, such as the following:
- Is there a difference between the starting salaries of male and female graduates among this year's college graduates?
- Is there a difference in the weight of products produced on two production lines?
- Did a special training course for typists actually increase their typing speed?

A comparison of two population means (μ1 and μ2) is made by testing the hypothesis that the difference of the population means is greater than, less than, or equal to zero. The method of comparison differs depending on whether the samples are drawn independently from each population or not (the latter case is referred to as paired samples).
8.1.1 Two Independent Samples

Testing hypotheses for two population means can generally be divided into three types, depending on the alternative hypothesis:

1) H0: μ1 − μ2 = D0,  H1: μ1 − μ2 > D0
2) H0: μ1 − μ2 = D0,  H1: μ1 − μ2 < D0
3) H0: μ1 − μ2 = D0,  H1: μ1 − μ2 ≠ D0

Here D0 is the hypothesized value of the difference of the population means.
When samples are selected independently from each population, the estimator of the difference of the population means, μ1 − μ2, is the difference of the sample means, x̄1 − x̄2. The sampling distribution of all possible differences of sample means is approximately normal with mean μ1 − μ2 and variance σ1²/n1 + σ2²/n2 if both sample sizes are large enough.

Since the population variances σ1² and σ2² are usually unknown, their estimates s1² and s2² are used to test the hypothesis. The test statistic differs slightly depending on the assumption about the two population variances. If the two populations follow normal distributions and their variances can be assumed equal, the test of the difference of two population means uses the following statistic:

    t = ( (x̄1 − x̄2) − D0 ) / ( Sp · √(1/n1 + 1/n2) )

where

    Sp² = ( (n1 − 1)s1² + (n2 − 1)s2² ) / (n1 + n2 − 2)

Sp² is an estimator of the common population variance called the pooled variance; it is a weighted average of the two sample variances s1² and s2², using the sample sizes as weights, when the population variances are assumed to be the same. The statistic above follows a t-distribution with n1 + n2 − 2 degrees of freedom and is used to test the difference of two population means as follows:
Table 8.1.1 Testing hypothesis of two population means
- independent samples, populations are normal distributions,
  two population variances are assumed to be equal

Type of Hypothesis                        Decision Rule
1) H0: μ1 − μ2 = D0, H1: μ1 − μ2 > D0    If t > t(n1+n2−2; α), then reject H0, else accept H0
2) H0: μ1 − μ2 = D0, H1: μ1 − μ2 < D0    If t < −t(n1+n2−2; α), then reject H0, else accept H0
3) H0: μ1 − μ2 = D0, H1: μ1 − μ2 ≠ D0    If |t| > t(n1+n2−2; α/2), then reject H0, else accept H0

where t = ( (x̄1 − x̄2) − D0 ) / ( Sp √(1/n1 + 1/n2) )

※ If sample sizes are large enough (n1 ≥ 30, n2 ≥ 30), the t distribution is approximately close to the standard normal distribution and the decision rule may use the standard normal distribution.
Example 8.1.1
Two machines produce cookies at a factory, and the average weight of a cookie bag should be 270g. Cookie bags were sampled from each of the two machines to examine their weights. The average weight of 15 cookie bags taken from machine 1 was 275g with a standard deviation of 12g, and the average weight of 14 cookie bags taken from machine 2 was 269g with a standard deviation of 10g. Test whether the weights of cookie bags produced by the two machines are different at the 1% significance level. Check the test result using『eStatU』.

Answer
The hypothesis of this problem is H0: μ1 − μ2 = 0, H1: μ1 − μ2 ≠ 0. Hence, the decision rule is as follows:

'If |t| = | (x̄1 − x̄2) / ( Sp √(1/n1 + 1/n2) ) | > t(n1+n2−2; 0.005), then reject H0.'
The information in this example can be summarized as follows:

    n1 = 15, x̄1 = 275, s1 = 12
    n2 = 14, x̄2 = 269, s2 = 10

Therefore,

    Sp² = ( (15−1)·12² + (14−1)·10² ) / (15 + 14 − 2) = 3316 / 27 = 122.81
    t = (275 − 269) / ( √122.81 · √(1/15 + 1/14) ) = 6 / 4.118 = 1.457
    t(27; 0.005) = 2.7707

Since 1.457 < 2.7707, H0 cannot be rejected.
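The pooled-variance calculation above can be checked with a short script. This is only a sketch (the helper name `pooled_t_test` is ours, not from the text), assuming SciPy is available for the t percentile:

```python
import math
from scipy import stats

def pooled_t_test(n1, mean1, s1, n2, mean2, s2, d0=0.0):
    """t statistic and df for H0: mu1 - mu2 = d0 under equal variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    t = (mean1 - mean2 - d0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Example 8.1.1: machine 1 (n=15, mean 275, sd 12), machine 2 (n=14, mean 269, sd 10)
t, df = pooled_t_test(15, 275, 12, 14, 269, 10)
crit = stats.t.ppf(1 - 0.01 / 2, df)       # two-sided 1% critical value t(27; 0.005)
print(round(t, 3), df, round(crit, 4))     # → 1.457 27 2.7707
```

The same numbers can be reproduced from raw data with `scipy.stats.ttest_ind(..., equal_var=True)` when the observations themselves are available.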

In the『eStatU』menu, select 'Testing Hypothesis: μ1, μ2'. In the window shown in <Figure 8.1.1>, check the 'not equal' alternative hypothesis at [Hypothesis], check the equal-variances case at [Test Type], check the 1% significance level, check 'independent sample', and enter the sample sizes n1, n2, the sample means x̄1, x̄2, and the sample variances as in <Figure 8.1.1>.

<Figure 8.1.1> Testing hypothesis for two population means using『eStatU』

Clicking the [Execute] button will show the result of the hypothesis test as in <Figure 8.1.2>.

<Figure 8.1.2> Testing hypothesis for two population means – case of the same population variances
If the variances of the two populations are different, the test statistic

    t = ( (x̄1 − x̄2) − D0 ) / √( s1²/n1 + s2²/n2 )

does not follow a t distribution even if the populations are normally distributed. Testing the difference of two population means when the population variances are different is called the Behrens-Fisher problem, and several methods to solve it have been studied. The Satterthwaite method approximates the degrees of freedom n1 + n2 − 2 of the t distribution in the decision rules of Table 8.1.1 with φ as follows:

    φ = ( s1²/n1 + s2²/n2 )² / ( (s1²/n1)² / (n1 − 1) + (s2²/n2)² / (n2 − 1) )

Table 8.1.2 summarizes the decision rules when the two population variances are different.
Table 8.1.2 Testing hypothesis of two population means
- independent samples, populations are normal distributions,
  two population variances are assumed to be different

Type of Hypothesis                        Decision Rule
1) H0: μ1 − μ2 = D0, H1: μ1 − μ2 > D0    If t > t(φ; α), then reject H0, else accept H0
2) H0: μ1 − μ2 = D0, H1: μ1 − μ2 < D0    If t < −t(φ; α), then reject H0, else accept H0
3) H0: μ1 − μ2 = D0, H1: μ1 − μ2 ≠ D0    If |t| > t(φ; α/2), then reject H0, else accept H0

where t = ( (x̄1 − x̄2) − D0 ) / √( s1²/n1 + s2²/n2 )

Example 8.1.2
If the two population variances are assumed to be different in [Example 8.1.1], test whether the weights of cookie bags produced by the two machines are equal or not at the 1% significance level. Check the test result using『eStatU』.

Answer
Since the population variances are assumed to be different, the degrees of freedom φ of the t distribution are approximated as follows:

    φ = ( 12²/15 + 10²/14 )² / ( (12²/15)² / (15 − 1) + (10²/14)² / (14 − 1) )
      = (9.6 + 7.143)² / (6.583 + 3.925) = 280.3 / 10.51 ≈ 26

The test statistic and critical value are

    t = (275 − 269) / √(12²/15 + 10²/14) = 6 / 4.092 = 1.466
    t(26; 0.005) = 2.779

Since 1.466 < 2.779, H0 cannot be rejected.
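The Satterthwaite approximation is easy to script. A minimal sketch (the helper name `welch_t` is ours), using the summary statistics of this example:

```python
import math

def welch_t(n1, mean1, s1, n2, mean2, s2, d0=0.0):
    """t statistic and Satterthwaite df when the variances are not assumed equal."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (mean1 - mean2 - d0) / math.sqrt(v1 + v2)
    phi = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, phi

t, phi = welch_t(15, 275, 12, 14, 269, 10)
print(round(t, 3), round(phi, 1))   # → 1.466 26.7
```

In practice φ is usually rounded down to an integer before looking up the t table; with raw data, `scipy.stats.ttest_ind(..., equal_var=False)` performs the same test.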
To practice using『eStatU』, select the different-population-variances assumption at [Test Type] in the window of <Figure 8.1.1> and click the [Execute] button to see the result as shown in <Figure 8.1.3>.

<Figure 8.1.3> Testing hypothesis for two population means – case of two different population variances
Example 8.1.3 (Monthly wages of male and female graduates)
Samples of 10 male and 10 female college graduates of this year were randomly taken and their monthly wages were examined as follows (unit: 10,000 KRW):

Male:   272 255 278 282 296 312 356 296 302 312
Female: 276 280 369 285 303 317 290 250 313 307

⇨ eBook ⇨ EX080103_WageByGender.csv

Using『eStat』, answer the following questions.
1) If the population variances are assumed to be the same, test at the 5% significance level whether the average monthly wages of males and females are the same.
2) If the population variances are assumed to be different, test at the 5% significance level whether the average monthly wages of males and females are the same.

Answer
1) In『eStat』, enter the raw data of gender (M or F) and income on the sheet as shown in <Figure 8.1.4>. This type of data input is similar in most statistical packages. After entering the data, click the icon for testing two population means and select 'Analysis Var' as V2 and 'By Group' variable as V1. A 95% confidence interval graph comparing the sample means of the two populations will be displayed as in <Figure 8.1.5>.
<Figure 8.1.4> Data input for testing two population means
<Figure 8.1.5> Dot graph and confidence intervals by gender for testing two population means

In the options window shown in <Figure 8.1.6>, located below the Graph Area, enter the mean difference D0 = 0 for the desired test, select the variance assumption σ1² = σ2², select the 5% significance level, and click the [t-test] button. The graphical result of the hypothesis test for the two population means will then be shown as in <Figure 8.1.7> and the test result as in <Figure 8.1.8>.

<Figure 8.1.6> Options to test for two population means
<Figure 8.1.7> Testing hypothesis for μ1 and μ2 – case of the same population variances

<Figure 8.1.8> Result of testing hypothesis for two population means if population variances are the same

2) Select the variance assumption σ1² ≠ σ2² in the options window and click the [t-test] button under the graph to display the graph of the hypothesis test and the test result table as in <Figure 8.1.9> and <Figure 8.1.10>.

<Figure 8.1.9> Testing hypothesis for μ1 and μ2 – case of the different population variances
<Figure 8.1.10> Result of testing hypothesis for two population means if population variances are different
[Practice 8.1.1] (Oral Cleanliness by Brushing Method)
Oral cleanliness scores were examined for 8 subjects who use the basic brushing method (coded 1) and 7 subjects who use the rotation method (coded 2). The data are saved at the following location of『eStat』.
⇨ eBook ⇨ PR080101_ToothCleanByBrushMethod.csv
1) If the population variances are the same, test at the 5% significance level whether the scores for the two brushing methods are the same using『eStat』.
2) If the population variances are different, test at the 5% significance level whether the scores for the two brushing methods are the same using『eStat』.
8.1.2 Paired Sample

The tests for two population means in the previous section are based on two samples drawn independently from each population. In some cases, however, it is difficult to draw samples independently, or, if samples are drawn independently, the resulting analysis may be meaningless because the characteristics of individual samples differ too much.

For example, suppose you give typists special training to increase their typing speed and want to see whether the training has been effective. If different samples are taken before and after the training, it is difficult to measure the effect of the training, because individual differences in typing speed are large. If, instead, you measure the typing speed of the same typist before and after the training, the effect of the training can be assessed while controlling for individual differences.

A hypothesis test that uses the same samples in similar experiments to compare the means of two populations is called a paired comparison. In a paired comparison, we calculate the difference di between the paired observations x1i and x2i, as shown in Table 8.1.3, and obtain the mean (d̄) and the variance (s_d²) of the differences.
Table 8.1.3 Data for a paired comparison

Sample of population 1 (x1i) | Sample of population 2 (x2i) | Difference of pair (di)
x11                          | x21                          | d1 = x11 − x21
x12                          | x22                          | d2 = x12 − x22
...                          | ...                          | ...
x1n                          | x2n                          | dn = x1n − x2n

Mean of differences:     d̄ = (1/n) Σ di
Variance of differences: s_d² = (1/(n−1)) Σ (di − d̄)²

When two populations with normal distributions have the same mean, the statistic t = d̄ / (s_d / √n) follows a t distribution with n − 1 degrees of freedom. This allows testing the difference of the two population means in the paired-comparison case as follows:

Table 8.1.4 Testing hypothesis of two population means (paired comparison)
- two populations are normal distributions, paired sample case

Type of Hypothesis                        Decision Rule
1) H0: μ1 − μ2 = D0, H1: μ1 − μ2 > D0    If t > t(n−1; α), then reject H0, else accept H0
2) H0: μ1 − μ2 = D0, H1: μ1 − μ2 < D0    If t < −t(n−1; α), then reject H0, else accept H0
3) H0: μ1 − μ2 = D0, H1: μ1 − μ2 ≠ D0    If |t| > t(n−1; α/2), then reject H0, else accept H0

where t = (d̄ − D0) / (s_d / √n)
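The paired statistic in Table 8.1.4 reduces to a one-sample t on the differences. A minimal sketch (the helper name `paired_t` is ours), using illustrative differences:

```python
import math

def paired_t(d, d0=0.0):
    """t = (dbar - d0) / (s_d / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(d)
    dbar = sum(d) / n
    s_d = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
    return (dbar - d0) / (s_d / math.sqrt(n)), n - 1

# illustrative differences d_i = x_1i - x_2i
t, df = paired_t([-6, -2, 1, -5, -4, 1, -6, -7])
print(round(t, 3), df)   # → -3.13 7
```

The returned t is then compared with the t(n−1) percentile matching the alternative hypothesis, exactly as in the decision rules above.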
Example 8.1.4
The following is the result of a special training to improve the typing speed of eight typists, with typing speeds measured before and after the training. Test whether or not the typing speed has increased at the 5% significance level. Assume that the typing speed follows a normal distribution. Check the test result using『eStat』and『eStatU』.

id | Typing speed before training (words/min) | Typing speed after training (words/min)
1  | 52 | 58
2  | 60 | 62
3  | 63 | 62
4  | 43 | 48
5  | 46 | 50
6  | 56 | 55
7  | 62 | 68
8  | 50 | 57
Answer
This problem tests the null hypothesis H0: μ1 − μ2 = 0 against the alternative hypothesis H1: μ1 − μ2 < 0 to compare the typing speed of typists before training (population 1) and after training (population 2) using paired samples. Therefore, the decision rule is as follows:

'If t = d̄ / (s_d / √n) < −t(n−1; 0.05), then reject H0.'

Calculating the differences (di) of the paired samples before and after training, the mean (d̄) and standard deviation (s_d) of the differences are as follows:
id | Before training (words/min) | After training (words/min) | Difference di
1  | 52 | 58 | −6
2  | 60 | 62 | −2
3  | 63 | 62 |  1
4  | 43 | 48 | −5
5  | 46 | 50 | −4
6  | 56 | 55 |  1
7  | 62 | 68 | −6
8  | 50 | 57 | −7

Mean: d̄ = −3.5
Standard deviation: s_d = 3.162
The test statistic is as follows:

    t = d̄ / (s_d / √n) = −3.5 / (3.162 / √8) = −3.13
    −t(7; 0.05) = −1.895

Since −3.13 < −1.895, H0 is rejected, and we conclude that the training increased the typing speed.
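The hand calculation can be checked against SciPy's paired t test (the `alternative` keyword requires SciPy 1.6 or later):

```python
from scipy import stats

before = [52, 60, 63, 43, 46, 56, 62, 50]
after = [58, 62, 62, 48, 50, 55, 68, 57]

# one-sided paired t test of H1: mean(before - after) < 0
res = stats.ttest_rel(before, after, alternative='less')
print(round(res.statistic, 2), round(res.pvalue, 4))
```

The statistic matches the −3.13 computed above, and the p-value falls below 0.05, agreeing with the rejection of H0.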
In the『eStatU』menu, select 'Testing Hypothesis: μ1, μ2', select the alternative hypothesis H1: μ1 − μ2 < 0 at [Hypothesis], check the 5% significance level, check 'paired sample' at [Test Type], and enter the data of sample 1 and sample 2 of the paired samples at [Sample Data] as in <Figure 8.1.11>.

<Figure 8.1.11> Testing hypothesis for two population means using『eStatU』- paired sample

Click the [Execute] button to calculate the sample mean and sample standard deviation of the differences (d̄ and s_d) and to show the result of the hypothesis test as in <Figure 8.1.12>.

<Figure 8.1.12> Result of testing hypothesis for two population means using『eStatU』- paired sample

In『eStat』, the paired data are entered in two columns as shown in <Figure 8.1.13>. Click the icon for testing two population means and select 'Analysis Var' as V1 and 'By Group' as V2 to show the dot graph and the confidence interval for the differences of the paired data as in <Figure 8.1.14>.
⇨ eBook ⇨ EX080104_TypingSpeedEducation.csv

<Figure 8.1.13> Data input of paired sample
<Figure 8.1.14> Dot graph of difference data of paired sample
Enter the mean difference D0 = 0 for the desired test in the options window below the graph, select the 5% significance level, and press the [t-test] button to display the result of the hypothesis test for paired samples as in <Figure 8.1.15> and <Figure 8.1.16>.

<Figure 8.1.15> Testing hypothesis for two population means using『eStat』- paired sample
<Figure 8.1.16> Result of testing hypothesis for two population means using『eStat』- paired sample
[Practice 8.1.2]
Randomly sampled data of (wife age, husband age) for 8 couples are as follows:
(28, 28) (29, 30) (18, 21) (29, 33) (22, 22) (18, 21) (40, 35) (24, 29)
⇨ eBook ⇨ PR080102_CoupleAge.csv
Test whether the population mean of wives' ages is the same as the population mean of husbands' ages or not. Use the significance level of 0.05.

8.2 Testing Hypothesis for Two Population Variances

Consider the following examples of comparing two population variances.
- When comparing two population means in the previous section, we saw that, for small samples, the decision rules differed depending on whether the two population variances were the same or different. How can we test whether two population variances are the same?
- The quality of bolts used to assemble cars depends on strict specifications for their diameters. The average diameters of bolts produced by two factories are said to be the same, and the factory whose diameters have the smaller variance is considered the superior producer. How can you compare the variances of the diameters?

When comparing the variances σ1² and σ2² of two populations, the ratio σ1²/σ2² is used instead of the difference of the variances. If the ratio of variances is greater than, less than, or equal to 1, we can see that σ1² is greater than, less than, or equal to σ2². The reason for using the ratio instead of the difference is that the sampling distribution of the ratio of variances is easy to find mathematically. If two populations follow normal distributions, and if n1 and n2 samples are collected randomly from each population, the ratio of the two sample variances s1² and s2²,

    F = (s1²/σ1²) / (s2²/σ2²),

follows an F distribution with numerator degrees of freedom n1 − 1 and denominator degrees of freedom n2 − 1. Using this fact, we can test hypotheses on the ratio of the population variances.

The F distribution is an asymmetric family of distributions with two parameters, the numerator degrees of freedom and the denominator degrees of freedom. <Figure 8.2.1> shows F distributions for different parameters.

<Figure 8.2.1> F distribution with different degrees of freedom

Testing hypotheses for two population variances can be performed using the F distribution as in Table 8.2.1.
Table 8.2.1 Testing hypothesis for two population variances
- two populations are normally distributed

Type of Hypothesis                  Decision Rule
1) H0: σ1² = σ2², H1: σ1² > σ2²    If F > F(n1−1, n2−1; α), then reject H0, else accept H0
2) H0: σ1² = σ2², H1: σ1² < σ2²    If F < F(n1−1, n2−1; 1−α), then reject H0, else accept H0
3) H0: σ1² = σ2², H1: σ1² ≠ σ2²    If F > F(n1−1, n2−1; α/2) or F < F(n1−1, n2−1; 1−α/2), then reject H0, else accept H0

where F = s1²/s2² and F(df1, df2; α) denotes the upper 100α percentile of the F distribution

Example 8.2.1
A company that produces bolts has two plants. One day, 12 bolts produced in Plant 1 were sampled randomly and the variance s1² of their diameters was calculated; 10 bolts produced in Plant 2 were sampled randomly and the variance s2² of their diameters was calculated. Test whether the variances of the bolt diameters from the two plants are the same or not at the 5% significance level. Check the test result using『eStatU』.
Answer
The hypothesis of this problem is H0: σ1² = σ2², H1: σ1² ≠ σ2², and its decision rule is as follows:

'If F > F(11, 9; 0.025) or F < F(11, 9; 0.975), then reject H0, else accept H0.'

The test statistic is the ratio of the two sample variances, F = s1²/s2². Comparing this value with the two percentiles of the F distribution above, the statistic falls between them; hence the hypothesis H0 cannot be rejected, and we conclude that the two variances are equal.
In the『eStatU』menu, select 'Testing Hypothesis: σ1², σ2²'. At the window shown in <Figure 8.2.2>, enter n1 = 12, n2 = 10 and the two sample variances s1² and s2². Click the [Execute] button to reveal the hypothesis test result shown in <Figure 8.2.3>.

<Figure 8.2.2> Data input for testing hypothesis of two population variances using『eStatU』

<Figure 8.2.3> Testing hypothesis for two population variances using『eStatU』
Example 8.2.2 (Income of college graduates, data of [Example 8.1.3])
Samples of 10 male and 10 female graduates of the college this year were taken and their average monthly incomes were examined as follows (unit: 10,000 KRW). Test whether the variances of the two populations are equal.

Male:   272 255 278 282 296 312 356 296 302 312
Female: 276 280 369 285 303 317 290 250 313 307

⇨ eBook ⇨ EX080103_WageByGender.csv

Answer
In『eStat』, enter the gender and income in two columns on the sheet as shown in <Figure 8.2.4>. This type of data input is similar in most statistical packages. Once you have entered the data, click on the icon for testing two population variances and select 'Analysis Var' as V2 and 'By Group' as V1. A mean-standard deviation graph for each group will then appear as in <Figure 8.2.5>.

<Figure 8.2.4> Data input for testing two population variances
<Figure 8.2.5> Dot graph and mean-standard deviation interval of each group
If you click the [F-Test] button in the options window below the graph, a test result graph using the F distribution, as in <Figure 8.2.6>, appears in the Graph Area, and the result table, as in <Figure 8.2.7>, appears in the Log Area.

<Figure 8.2.6> Testing hypothesis for two population variances
<Figure 8.2.7> Result table of testing two population variances
[Practice 8.2.1]
Tire products from two companies are known to have the same average life span of 80,000 km. However, there seems to be a difference in the variance. Sixteen tires from each of the two companies were randomly selected and run under similar conditions to measure their life spans. The sample variances were 4,500 and 2,500, respectively. Using『eStatU』, test the null hypothesis that the variances of the tire life spans of the two products are the same at the 5% significance level.

8.3 Testing Hypothesis for Two Population Proportions

Consider the following examples which compare two population proportions.
- Is there a gender gap in the approval rating of a particular candidate in this year's presidential election?
- A factory has two machines that make products. Do the two machines have different defect rates?

Comparing the proportions p1 and p2 of two populations is done by testing the difference of the two proportions, p1 − p2, as in the comparison of two population means. The difference of the sample proportions p̂1 − p̂2 from the two populations follows a normal distribution with mean p1 − p2 and variance p1(1−p1)/n1 + p2(1−p2)/n2 when the two sample sizes are large enough. Since we do not know the population proportions p1 and p2, to estimate the variance, the weighted average p̂ of the two sample proportions p̂1 and p̂2, using the sample sizes as weights, is used as follows:

    p̂ = (n1 p̂1 + n2 p̂2) / (n1 + n2)

The test of hypotheses for two population proportions uses the following test statistic:

    Z = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) )
Table 8.3.1 Testing hypothesis for two population proportions
- two independent large samples

Type of Hypothesis                  Decision Rule
1) H0: p1 − p2 = 0, H1: p1 − p2 > 0    If Z > z(α), then reject H0, else accept H0
2) H0: p1 − p2 = 0, H1: p1 − p2 < 0    If Z < −z(α), then reject H0, else accept H0
3) H0: p1 − p2 = 0, H1: p1 − p2 ≠ 0    If |Z| > z(α/2), then reject H0, else accept H0

where Z = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) )
Example 8.3.1
A survey was conducted for a presidential election, and samples were selected independently from the male and female populations. 54 out of 225 samples from the male population supported candidate A, and 52 out of 175 samples from the female population supported candidate A. Test whether there is a difference in the approval ratings of the male and female populations at the 5% significance level. Check the result using『eStatU』.

Answer
The hypothesis of this problem is H0: p1 − p2 = 0, H1: p1 − p2 ≠ 0, and its decision rule is as follows:

'If |Z| = | (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) ) | > z(0.025), then reject H0, else accept H0.'

Since p̂1 = 54/225 = 0.240 and p̂2 = 52/175 = 0.297, the pooled estimate p̂ and the test statistic can be calculated as follows:

    p̂ = (54 + 52) / (225 + 175) = 106 / 400 = 0.265

    |Z| = | (0.240 − 0.297) / √( 0.265 × 0.735 × (1/225 + 1/175) ) | = 1.28

    z(0.025) = 1.96

Since 1.28 < 1.96, the hypothesis H0 cannot be rejected, and we conclude that there is not enough evidence that the approval ratings of males and females are different.
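The calculation above can be scripted; `two_prop_z` is our name for the helper, not a function from eStatU:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled estimate under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Example 8.3.1: 54 of 225 males, 52 of 175 females support candidate A
z = two_prop_z(54, 225, 52, 175)
print(round(abs(z), 2))   # → 1.28
```

For a two-sided test, |Z| is compared with z(α/2), exactly as in Table 8.3.1.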
In the『eStatU』menu, select 'Testing Hypothesis: p1, p2' and enter the sample sizes n1 = 225 and n2 = 175 together with the numbers of supporters, 54 and 52, as shown in <Figure 8.3.1>. Clicking the [Execute] button will show the result of the hypothesis test as shown in <Figure 8.3.2>.

<Figure 8.3.1> Data input for testing two population proportions in『eStatU』
<Figure 8.3.2> Result of testing hypothesis for two population proportions using『eStatU』

Example 8.3.2
In 2000, a simple random sample of 1,000 people aged 15 to 29 across the country was examined for marriage status, and 63.5 percent were single. In 2020, another 1,000 people were surveyed independently, and 69.8 percent of them were single. From this fact, can you say that there has been a tendency to marry late in recent years? In other words, test at the 5% significance level whether the population aged 15 to 29 in 2020 is more likely to be single than in 2000. What is the p-value of this test?
Answer
The hypothesis of this problem is H0: p1 − p2 = 0, H1: p1 − p2 < 0, and its decision rule is as follows:

'If Z = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) ) < −z(0.05), then reject H0, else accept H0.'

Since p̂1 = 0.635 and p̂2 = 0.698, the pooled estimate p̂ and the test statistic are as follows:

    p̂ = (1000 × 0.635 + 1000 × 0.698) / (1000 + 1000) = 0.6665

    Z = (0.635 − 0.698) / √( 0.6665 × 0.3335 × (1/1000 + 1/1000) ) = −2.989

    −z(0.05) = −1.645

Therefore, H0 is rejected, and we conclude that the proportion of unmarried people in 2020 has increased. The p-value can be calculated as follows:

    p-value = P(Z < −2.989) = 0.0014
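A quick numeric check of this one-sided test (small rounding differences from the hand calculation are expected); SciPy supplies the normal cdf for the p-value:

```python
import math
from scipy import stats

n1 = n2 = 1000
p1, p2 = 0.635, 0.698            # single proportions in 2000 and 2020
p = (n1 * p1 + n2 * p2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
pvalue = stats.norm.cdf(z)       # one-sided H1: p1 < p2
print(round(z, 2), round(pvalue, 4))   # → -2.99 0.0014
```

Since the p-value 0.0014 is far below the significance level 0.05, the rejection of H0 is confirmed.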
[Practice 8.3.1]
In a company, the labor union found that 63 percent of 200 salesmen who did not receive a college education wanted to take it even now. The company did a similar study 10 years ago, and then only 58 percent of 100 salesmen wanted it. Test the null hypothesis that the desire for college education is not different from 10 years ago using the significance level of 0.05. Samples were selected independently.

In the previous two examples of comparing two population proportions, the two sample proportions were calculated from independent samples. Suppose instead that two candidates ran in an election and one thousand samples were selected to test whether there was any difference in the candidates' approval ratings. The approval ratings p̂1 and p̂2 of the two candidates obtained from the sample are not independent, because, unlike the two previous examples, they are calculated from one set of samples. So the test method should be different. The following statistic is used to test whether there is a difference in the approval ratings of the two candidates:

    Z = (p̂1 − p̂2) / SE(p̂1 − p̂2), where SE(p̂1 − p̂2) = √( (p̂1 + p̂2 − (p̂1 − p̂2)²) / n )

is the standard error of p̂1 − p̂2. Assuming that the two population proportions are equal, the estimated value of SE(p̂1 − p̂2) is as follows:

    SE(p̂1 − p̂2) = √( (p̂1 + p̂2) / n )

If the sample size is large, the test statistic follows a normal distribution, which allows proper testing of hypotheses according to the form of the alternative hypothesis. As such, it is important to distinguish between sample proportions from independent samples and from non-independent samples when comparing two population proportions.
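The dependent-proportions statistic under H0: p1 = p2 can be sketched as follows (the function name and the poll numbers are illustrative, not from the text):

```python
import math

def dependent_prop_z(p1_hat, p2_hat, n):
    """Z for two proportions estimated from ONE sample of size n,
    using the H0 (p1 = p2) standard error sqrt((p1_hat + p2_hat) / n)."""
    se = math.sqrt((p1_hat + p2_hat) / n)
    return (p1_hat - p2_hat) / se

# illustrative poll: candidate ratings of 46% and 42% from one sample of 1000
z = dependent_prop_z(0.46, 0.42, 1000)
print(round(z, 3))   # → 1.348
```

Compare this with `two_prop_z`-style independent-sample formulas: here the standard error comes from a single multinomial sample, so the covariance between p̂1 and p̂2 is built into the formula.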

Exercise
8.1 An analyst studies two types of advertising methods (A and B) tried by retailers. The variable is
the sum of the amount spent on advertising over the past year. The following is the sample
statistics extracted independently from retailers of each type. (Unit million USD)
Type A:  n₁ = 60,  x̄₁ = 14.8,  s₁² = 0.180
Type B:  n₂ = 70,  x̄₂ = 14.5,  s₂² = 0.133
From these data, can you conclude that type A retailers have invested more in advertising than
type B retailers? (Significance level = 0.05)
8.2 Paper-making plants are looking to buy one of two forests. The following are diameters of 50 trees sampled from each forest. From these data, test at the significance level of 0.05 whether the trees in area B are on average smaller than those in area A. What is the p-value of this test?
Area A:  x̄₁ = 28.25,  s₁² = 25
Area B:  x̄₂ = 22.50,  s₂² = 16
8.3 In order to compare the period of residence at the current house in regions A and B, the following statistics were obtained from simple random samples of 100 households in A and 150 households in B. From these data, can you conclude that households in region A have lived there for a shorter time on average than those in region B? (Significance level = 0.05)
Region A:  x̄₁ = 33 months,  s₁² = 900
Region B:  x̄₂ = 49 months,  s₂² = 1,050
8.4 An advertising analyst surveyed how much working men and housewives were exposed to
advertisements on radio, TV, newspaper and magazines. The survey item was the number of
advertisements that each group encountered in a particular week and the sample mean and
standard deviation of each group are as shown in the table below. From these data, can you say
that housewives are exposed to more advertisements on average than working men? (Significance
level = 0.05)
Group          n     Sample Mean   Sample Standard Deviation
Working men   100        200                  50
Housewives    144        225                  60

8.5
One company wants to test whether a female employee uses the phone longer than a male
employee. A sample survey of 10 males and 10 females for one-day call time measurement are
as follows. Is there a difference in the average call time between male and female? Use the 5%
significance level.
Male:    8  6  4  6  2  2  8 10 10  4
Female:  4  4 10  2  8  4 10  8 13 14
(unit: minutes)
8.6 One factory tries to compare the adhesion of motor oil from two companies. Among the products of each company, 32 products were randomly selected and tested as follows. Based on these data, can you conclude that the adhesion means of the two company products are different? (Significance level = 0.05)
Company A:
13 52 46 74 21 25 52 73 60 11 66 43 35 11 65 70 38 55 71 51 10 44
19 42 11 35 39 25 17 51 25 18
Company B:
67 72 36 25 47 65 24 41 48 45 35 16 58 76 35 47 42 48 45 50 66 56
69 60 80 45 47 69 75 43 46 64
8.7 An industrial psychologist thinks that a major factor in workers changing jobs is the self-esteem workers attach to their individual work. The psychologist thinks that workers who change jobs frequently (group A) have lower self-esteem than those who do not (group B). The following data, sampled independently from each group, measure the self-esteem scores.
Group A
60 45 42 62 68 54 52 55 44 41
Group B
70 72 74 74 76 91 71 78 78 83 50 52 66 65 53 52
Can this data support the psychologist's idea? Assume that scores of the population are normally
distributed and that the population variance is not known but the same. (Significance level = 0.01)
8.8 In a business administration department of a university, a debate arose over claims that men have
more knowledge of the stock market than women. To calm the dispute, the instructor sampled
each of 15 men and women independently and tested them for knowledge of the stock market.
The result is as follows:
Women
73 96 74 55 91 50 46 82 79 79 50 46 81 83
Men
57 78 42 44 91 65 63 60 97 85 92 42 86 81 64
According to the data, on average, can you say that men have more knowledge of the stock
market than women? Use the significance level of 0.05. What assumptions do you need?
8.9 An oil company has developed a gasoline additive intended to improve fuel mileage. Sixteen pairs of cars were used to compare fuel mileage to see whether it actually improved. The cars in each pair match in structure, model, engine size, and other relevant characteristics. One car of each pair, selected at random, drove the test course using gasoline with the additive; the other drove the same course using gasoline without the additive. The following table shows the km per liter for each pair. Do these data support the claim that the additive increases fuel mileage? Assume that fuel mileage is normally distributed. Use the 5% significance level.
(unit: km/liter)
pair              1     2     3     4     5     6     7     8
Additive (X1)    17.1  12.7  11.6  15.8  14.0  17.8  14.7  16.3
No Additive (X2) 16.3  11.6  11.2  14.9  12.8  17.1  13.4  15.4

pair              9    10    11    12    13    14    15    16
Additive (X1)    10.8  14.9  19.7  11.4  11.4   9.3  19.0  10.1
No Additive (X2) 10.1  13.7  18.3  11.0  10.5   8.7  17.9   9.4
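The paired structure of the table above (one before/after difference per pair) is exactly what a paired t test uses. A brief Python sketch with scipy, shown only to illustrate the method, not as the exercise's worked answer:

```python
from scipy.stats import ttest_rel

# km per liter for each matched pair (table for Exercise 8.9)
with_additive = [17.1, 12.7, 11.6, 15.8, 14.0, 17.8, 14.7, 16.3,
                 10.8, 14.9, 19.7, 11.4, 11.4, 9.3, 19.0, 10.1]
without_additive = [16.3, 11.6, 11.2, 14.9, 12.8, 17.1, 13.4, 15.4,
                    10.1, 13.7, 18.3, 11.0, 10.5, 8.7, 17.9, 9.4]

# Paired t test on the differences d_i = x1_i - x2_i; the one-sided
# alternative is "the additive increases mileage" (mean difference > 0)
t0, p_value = ttest_rel(with_additive, without_additive, alternative='greater')
```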
8.10 A study deals with a survey on whether car accidents in a village can be reduced effectively by
increasing the number of street lamps. The following table shows the average number of accidents
per night, one year before and one year after putting street lamps on 12 locations. Does this data
provide evidence that street lamps have reduced nightly car accidents? Use the 5% significance
level.
Location   A   B   C   D   E   F   G   H   I   J   K   L
Before     8  12   5   4   6   3   4   3   2   6   6   9
After      5   2   1   4   2   2   3   4   3   5   4   3
8.11 The survey results of (wife's age, husband's age) obtained by sampling 16 couples are as follows:
(28, 28) (29, 30) (18, 21) (29, 33) (22, 22) (18, 21) (40, 35) (24, 29)
(21, 31) (20, 24) (20, 34) (23, 25) (33, 39) (33, 35) (40, 29) (39, 40)
Test whether the wife’s age is the same as the husband’s age or not. Use the significance level
of 0.05.
8.12 One person is considering the use of a test to compare two population means. Sixteen samples are randomly taken from each of the two populations, and the sample variances are 28.5 and 9.5. Do these data show evidence that the two population variances are the same? (Significance level = 0.05)
8.13 Certain studies have been planned to compare two relaxing drugs for office workers in stressful jobs. A medical team sampled eight workers for each of the two drugs and collected data on tension. The two sample variances are s₁² = 2916 and s₂² = 4624. Using the significance level of 0.05, can these data be said to show a difference in the two population variances of tension? Explain the necessary assumptions.
8.14 Let  and  be the number of days it takes for a plant to sprout its wide leaves and narrow
leaves, respectively. The measured data are as follows:
  , 
 ,   ,
  , 
 ,   .
If  ∼      and  ∼      , test the following hypothesis using the 5% significance
level.
 :   ,  :   
8.15 Two tire products are both known to have an average life span of 80,000 km. However, there seems to be a difference in the variance. Sixteen tires from each of the two companies were randomly selected and run under similar conditions to measure their life spans. The sample variances were 4,500 and 2,200, respectively.
1) Test the null hypothesis that the variances of the tire life of the two products are the same, at the significance levels of both 0.10 and 0.05.
2) Obtain 90% and 95% confidence intervals of the ratio σ₁²/σ₂².
8.16 A carpet manufacturer is looking for materials that can withstand temperatures above 250 degrees Fahrenheit. One of two materials is a natural material and the other is a cheap artificial material; the two have the same properties except for their heat-resistance levels. In a heat-resistance experiment that independently selected 250 samples from each of the two materials, 36 samples of the natural material and 45 samples of the man-made material failed at temperatures above 250 degrees Fahrenheit. Is there a difference in the heat resistance of the two materials from these data, using the significance level of 0.05?
8.17 A labor union of a company found that 63 percent of 150 salespeople who had not received a college education still wished to pursue one. The company did a similar study 10 years ago, when only 58 percent of 160 people felt the same. Test the null hypothesis that the desire for a college education is not different from 10 years ago, using the significance level of 0.05. Samples were selected independently.
8.18 When we examined 200 sampled companies of type A, we found that 12% of them spent more than 1% of their total sales on advertising. Among another 200 companies of type B, independently selected and examined, 15% spent more than 1% of their total sales on advertising. Test the following hypotheses with the significance level of 0.05.
H₀: p₁ ≤ p₂ ,  H₁: p₁ > p₂
8.19 In a company, a study was conducted on the leisure activities of sales staff and managing staff. 400 persons were selected independently from each of the sales and managing staff; 288 sales staff and 260 managing staff answered that they usually spend their leisure time on sports activities. From these data, can you say that the percentages of the two groups spending leisure time on sports activities are the same? Use the significance level of 0.05.
8.20 In September 2013, a research institute surveyed 260 men and 263 women about a political issue, with the response results shown below. Do you think there are significant differences in their ways of thinking on the political issue? Specify the null hypothesis and the alternative hypothesis and test at the 5% significance level.

         Yes    No
Men      57%    43%
Women    65%    35%
8.21 In order to see whether the unemployment rates in two cities are different, samples of 500 people were randomly selected from each of the two cities, and the numbers of unemployed persons were found to be 35 and 25, respectively. Can you say that the unemployment rates in the two cities are different? Describe the necessary assumptions and calculate the p-value.
Multiple Choice Exercise
8.1 One professor claims that 'A student who studies in the morning will get a better math score than a student who studies in the evening.' Assume that μ₁ is the average exam score of students who study in the morning and μ₂ is the average exam score of students who study in the evening. What is the null hypothesis of this test?
① μ₁ > μ₂    ② μ₁ ≧ μ₂
③ μ₁ ≠ μ₂    ④ μ₁ = μ₂
8.2 What is the alternative hypothesis of the test in question 8.1?
① μ₁ > μ₂    ② μ₁ ≧ μ₂
③ μ₁ ≠ μ₂    ④ μ₁ = μ₂

8.3
A researcher claims that "After the age of 40, there is no difference in weight between males and females." Assume the average weight of males aged 40 and over is μ₁ and the average weight of females aged 40 and over is μ₂. What is the alternative hypothesis of the test?
① μ₁ = μ₂    ② μ₁ ≠ μ₂
③ μ₁ > μ₂    ④ μ₁ < μ₂
8.4 We want to test whether two population means are equal or not using t-test. Which one of the
following is not a required assumption?
① Populations are normal distributions.
② Two population variances are the same.
③ Samples are selected independently.
④ Samples are collected using cluster sampling method.
8.5 Which sampling distribution is used to test whether two population means are equal or not when
sample sizes are small?
① Normal distribution
② t-distribution
③ Chi square distribution
④ F-distribution
8.6 16 couples are randomly selected to compare their ages as follows. What is the name of this kind
of data?
(woman age, man age)
(28, 28) (29, 30) (18, 21) (29, 33) (22, 22) (18, 21) (40, 35) (24, 29)
(21, 31) (20, 24) (20, 34) (23, 25) (33, 39) (33, 35) (40, 29) (39, 40)
① independent data    ② paired data
③ random data         ④ cluster data
8.7 Which sampling distribution is used to test whether two population variances are equal or not when
populations are normally distributed?
① Normal distribution
② t-distribution
③ Chi square distribution
④ F-distribution
8.8 Which sampling distribution is used to test whether two population proportions are equal or not
when sample sizes are large enough?
① Normal distribution
② t-distribution
③ Chi square distribution
④ F-distribution
8.9 In a company, a comparative study was conducted on the leisure activities of sales staff and managing staff. 400 staff members were selected independently from each of the sales and managing staff and surveyed. We found that 288 sales staff and 260 managing staff answered that they usually spend their leisure time on sports activities. Which of the following is the null hypothesis for comparing the two groups?
① p₁ = p₂    ② p₁ ≠ p₂
③ p₁ > p₂    ④ p₁ < p₂
8.10 Which of the following is the alternative hypothesis in question 8.9?
① p₁ = p₂    ② p₁ ≠ p₂
③ p₁ > p₂    ④ p₁ < p₂
(Answers)
8.1 ④, 8.2 ①, 8.3 ②, 8.4 ④, 8.5 ②, 8.6 ②, 8.7 ④, 8.8 ①, 8.9 ①, 8.10 ②,
9
Testing Hypothesis for
Several Population Means

SECTIONS
9.1 Analysis of Variance for Experiments of Single Factor
9.1.1 Multiple Comparison
9.1.2 Residual Analysis
9.2 Design of Experiments for Sampling
9.2.1 Completely Randomized Design
9.2.2 Randomized Block Design
9.3 Analysis of Variance for Experiments of Two Factors

CHAPTER OBJECTIVES
In testing hypothesis of the population mean described in Chapters 7 and 8, the number of populations was one or two. However, many cases are encountered where there are three or more population means to compare.
The analysis of variance (ANOVA) is used to test whether several population means are equal or not. The ANOVA was first published by the British statistician R. A. Fisher as a test method applied to the study of agriculture, but today its principles are applied in many experimental sciences, including economics, business administration, psychology and medicine.
In section 9.1, the one-way ANOVA for a single factor is introduced. In section 9.2, experimental designs for sampling are introduced. In section 9.3, the two-way ANOVA for two-factor experiments is introduced.
9.1 Analysis of Variance for Experiments of Single Factor
Ÿ
In Section 8.1, we discussed how to compare the means of two populations by testing hypotheses. This chapter discusses how to compare the means of several populations. There are many examples of comparing means of several populations, as follows:
- Are average hours of library usage for each grade the same?
- Are yields of three different rice seeds equal?
- In a chemical reaction, are response rates the same at four different
temperatures?
- Are average monthly wages of college graduates the same at three different
cities?
Ÿ
The group variable used to distinguish groups of the population, such as the grade or the rice seed, is called a factor.
Factor
Definition
The group variable used to distinguish groups of the population is called
a factor.
Ÿ
This section describes the one-way analysis of variance (ANOVA) which compares population means when there is a single factor. Section 9.2 describes how experiments are designed to extract sample data. Section 9.3 describes the two-way ANOVA to compare several population means when there are two factors. Let's take a look at the following example.

Example 9.1.1
In order to compare the English proficiency of each grade at a university, samples were randomly selected from each grade to take the same English test, and the data are as in Table 9.1.1. The right column is a calculation of the averages Ȳ₁., Ȳ₂., Ȳ₃., Ȳ₄. for each grade.
Table 9.1.1 English Proficiency Score by Grade

Grade   English Proficiency Score   Average
1       81 75 69 90 72 83           Ȳ₁. = 78.3
2       65 80 73 79 81 69           Ȳ₂. = 74.5
3       72 67 62 76 80              Ȳ₃. = 71.4
4       89 94 79 88                 Ȳ₄. = 87.5

⇨ eBook ⇨ EX090101_EnglishScoreByGrade.csv
1) Using『eStat』, draw a dot graph of test scores for each grade and compare their
averages.
2) We want to test a hypothesis whether average scores of each grade are the same
or not. Set up a null hypothesis and an alternative hypothesis.
3) Apply the one-way analysis of variances to test the hypothesis in question 2).
4) Use『eStat』to check the result of the ANOVA test.
Example 9.1.1
Answer
1) If you draw a dot graph of English scores by each grade, you can see whether
scores of each grade are similar. If you plot the 95% confidence interval of the
population mean studied in Chapter 6 on each dot graph, you can see a more
detailed comparison.
Example 9.1.1
Answer
(continued)
w In order to draw a dot graph with data shown in Table 9.1.1 using 『eStat』,
enter data on the sheet and set variable names to 'Grade' and 'Score' as shown in
<Figure 9.1.1>. In the variable selection box which appears by clicking the ANOVA
icon
on the main menu of『eStat』, select 'Analysis Var' as ‘Score’ and 'By
Group' as ‘Grade’. The dot graph of English scores by each grade and the 95%
confidence interval are displayed as shown in <Figure 9.1.2>.
<Figure 9.1.1> 『eStat』 data input for ANOVA
<Figure 9.1.2> 95% Confidence Interval by grade
w To review the normality of the data, pressing the [Histogram] button under this
graph (<Figure 9.1.3>) will draw the histogram and normal distribution together, as
shown in <Figure 9.1.4>.
Example 9.1.1
Answer
(continued)
<Figure 9.1.3> Options of ANOVA
<Figure 9.1.4> Histogram of English score by grade

w <Figure 9.1.2> shows the sample means as Ȳ₁. = 78.3, Ȳ₂. = 74.5, Ȳ₃. = 71.4, Ȳ₄. = 87.5. The sample mean of the 4th grade is relatively large, and the order of the sample means is Ȳ₃. < Ȳ₂. < Ȳ₁. < Ȳ₄. . Ȳ₂. and Ȳ₁. are similar, but Ȳ₄. is much greater than the other three. Therefore, it can be expected that the population means μ₁ and μ₂ would be the same and μ₄ would differ from the three other population means. However, we need to test whether this difference in sample means is statistically significant.
2) In this example, the null hypothesis to test is that the population means of English scores of the four grades are all the same, and the alternative hypothesis is that they are not all the same. In other words, if μ₁, μ₂, μ₃, μ₄ are the population means of English scores for each grade, the hypothesis to test can be written as follows:

Null hypothesis          H₀: μ₁ = μ₂ = μ₃ = μ₄
Alternative hypothesis   H₁: at least one pair of μᵢ is not the same
3) A measure that can be considered first as a basis for testing differences in multiple sample means would be the distance from each mean to the overall mean. In other words, if the overall sample mean for all 21 students is expressed as Ȳ.., the squared distance from each sample mean to the overall mean, weighted by the number of samples in each grade, is as follows. This squared distance is called the between sum of squares (SSB) or the treatment sum of squares (SSTr).

SSTr = 6(Ȳ₁. − Ȳ..)² + 6(Ȳ₂. − Ȳ..)² + 5(Ȳ₃. − Ȳ..)² + 4(Ȳ₄. − Ȳ..)² = 643.633

If the squared distance SSTr is close to zero, all sample means of English scores for the four grades are similar.
Example 9.1.1
Answer
(continued)
w However, this treatment sum of squares can become larger as the number of populations increases. It requires modification to become a test statistic for determining whether several population means are equal. The squared distance from each observation to the sample mean of its grade is called the within sum of squares (SSW) or the error sum of squares (SSE), defined below.

SSE = (Y₁₁ − Ȳ₁.)² + (Y₁₂ − Ȳ₁.)² + ⋯ + (Y₁₆ − Ȳ₁.)²
    + (Y₂₁ − Ȳ₂.)² + (Y₂₂ − Ȳ₂.)² + ⋯ + (Y₂₆ − Ȳ₂.)²
    + (Y₃₁ − Ȳ₃.)² + (Y₃₂ − Ȳ₃.)² + ⋯ + (Y₃₅ − Ȳ₃.)²
    + (Y₄₁ − Ȳ₄.)² + (Y₄₂ − Ȳ₄.)² + ⋯ + (Y₄₄ − Ȳ₄.)²
    = 839.033
w If the population distributions of English scores in each grade follow normal distributions and their variances are the same, the following test statistic has the F distribution with degrees of freedom (4 − 1) and (21 − 4):

F = [ SSTr / (4 − 1) ] / [ SSE / (21 − 4) ]

This statistic can be used to test whether the population English scores of the four grades are the same or not. In the test statistic, the numerator SSTr/(4 − 1) is called the treatment mean square (MSTr), which implies a variance between grade means. The denominator SSE/(21 − 4) is called the error mean square (MSE), which implies a variance within each grade. Thus, the above test statistic is based on the ratio of two variances, which is why the test of multiple population means is called an analysis of variance (ANOVA).
w The calculated test statistic, which is the observed F value F₀, using the data of English scores for each grade is as follows:

F₀ = [ SSTr / (4 − 1) ] / [ SSE / (21 − 4) ] = (643.633/3) / (839.033/17) = 4.347

Since F₀ = 4.347 > F(3, 17; 0.05) = 3.20, the null hypothesis that the population means of English scores of each grade are the same, H₀: μ₁ = μ₂ = μ₃ = μ₄, is rejected at the 5% significance level. In other words, there is a difference in the population means of English scores of each grade.
w The following ANOVA table provides a single view of the above calculation.

Factor      Sum of Squares    Degrees of freedom   Mean Squares         F value
Treatment   SSTr = 643.633    4 − 1 = 3            MSTr = 643.633/3     F₀ = 4.347
Error       SSE = 839.033     21 − 4 = 17          MSE = 839.033/17
Total       SST = 1482.666    20
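The hand calculation above can be cross-checked in Python with scipy's one-way ANOVA routine (a sketch outside the book's 『eStat』 workflow):

```python
from scipy import stats

# English proficiency scores by grade from Table 9.1.1
grade1 = [81, 75, 69, 90, 72, 83]
grade2 = [65, 80, 73, 79, 81, 69]
grade3 = [72, 67, 62, 76, 80]
grade4 = [89, 94, 79, 88]

# One-way ANOVA F test of H0: mu1 = mu2 = mu3 = mu4
f0, p_value = stats.f_oneway(grade1, grade2, grade3, grade4)
# f0 is about 4.347, matching the ANOVA table; p_value is below 0.05,
# so the null hypothesis is rejected at the 5% significance level.
```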
4) In <Figure 9.1.3>, if you select the significance level of 5% and the confidence level of 95%, and click the [ANOVA F test] button, a graph showing the location of the test statistic in the F distribution appears as shown in <Figure 9.1.5>. Also, in the Log Area, the mean and confidence interval tables and the test result for each grade appear as in <Figure 9.1.6>.
Example 9.1.1
Answer
(continued)
<Figure 9.1.5> 『eStat』 ANOVA F test
<Figure 9.1.6> 『eStat』 Basic Statistics and ANOVA table
w The analysis of variance is also possible using『eStatU』. Entering the data as in
<Figure 9.1.7> and clicking the [Execute] button will have the same result as in
<Figure 9.1.5>.
Example 9.1.1
Answer
(continued)
<Figure 9.1.7> ANOVA data input at 『eStatU』

Ÿ
The above example refers to two variables: the English score and the grade. A variable such as the English score is called an analysis variable or a response variable; the response variable is mostly a continuous variable. The variable used to distinguish populations, such as the grade, is called a group variable or a factor variable, which is mostly a categorical variable. Each value of a factor variable is called a level of the factor, and the number of these levels is the number of populations to be compared. In the above example, the factor has four levels: 1st, 2nd, 3rd and 4th grade. The terms 'response' and 'factor' originated in analyzing data from experiments in engineering, agriculture, medicine and pharmacy.
Ÿ
The analysis of variance method that examines the effect of a single factor on the response variable is called the one-way ANOVA. Table 9.1.2 shows the typical data structure of the one-way ANOVA when the number of levels of a factor is k and the numbers of observations at each level are n₁, …, nₖ.
Table 9.1.2 Notation of the one-way ANOVA

Factor     Observed values of sample     Average
Level 1    Y₁₁  Y₁₂  ⋯  Y₁ₙ₁             Ȳ₁.
Level 2    Y₂₁  Y₂₂  ⋯  Y₂ₙ₂             Ȳ₂.
⋯          ⋯                             ⋯
Level k    Yₖ₁  Yₖ₂  ⋯  Yₖₙₖ             Ȳₖ.
Ÿ
The statistical model for the one-way analysis of variance is given as follows:

Yᵢⱼ = μ + αᵢ + εᵢⱼ ,   i = 1, 2, …, k;  j = 1, 2, …, nᵢ

Yᵢⱼ represents the j-th observed value of the response variable for the i-th level of the factor. The population mean of the i-th level, μᵢ, is represented as μ + αᵢ, where μ is the mean of the entire population and αᵢ is the effect of the i-th level on the response variable. εᵢⱼ denotes the error term of the j-th observation for the i-th level, and all error terms are assumed to be independent of each other and to follow the same normal distribution with mean 0 and variance σ².
Ÿ
The error term εᵢⱼ is a random variable representing variation in the response variable due to reasons other than the levels of the factor. For example, in the English score example, differences in English performance for each grade can be caused by variables other than grade, such as individual study hours, gender and IQ. By assuming that these variations are relatively small compared to the variation due to differences in grade, the error term can be interpreted as the sum of these various causes.
Ÿ
The hypothesis to test can be represented using αᵢ instead of μᵢ as follows:

Null hypothesis          H₀: α₁ = α₂ = ⋯ = αₖ = 0
Alternative hypothesis   H₁: at least one αᵢ is not equal to 0

In order to test the hypothesis, the analysis of variance table shown in Table 9.1.3 is used.
Table 9.1.3 Analysis of variance table of the one-way ANOVA

Factor      Sum of Squares   Degrees of freedom   Mean Squares          F value
Treatment   SSTr             k − 1                MSTr = SSTr/(k − 1)   F₀ = MSTr/MSE
Error       SSE              n − k                MSE = SSE/(n − k)
Total       SST              n − 1

(n = n₁ + n₂ + ⋯ + nₖ)
Ÿ
The three sums of squares for the analysis of variance can be described as follows. First define the following statistics:

Ȳᵢ. : mean of observations at the i-th level
Ȳ.. : mean of all observations

SST = Σᵢ Σⱼ (Yᵢⱼ − Ȳ..)² ,  i = 1, …, k; j = 1, …, nᵢ :
The sum of squared distances between the observed values of the response variable and the mean of all observations is called the total sum of squares (SST).

SSTr = Σᵢ nᵢ (Ȳᵢ. − Ȳ..)² ,  i = 1, …, k :
The sum of squared distances between the mean of each level and the mean of all observations is called the treatment sum of squares (SSTr). It represents the variation between level means.

SSE = Σᵢ Σⱼ (Yᵢⱼ − Ȳᵢ.)² ,  i = 1, …, k; j = 1, …, nᵢ :
The sum of squared distances between the observations of the i-th level and the mean of the i-th level, referred to as 'within variation', is called the error sum of squares (SSE).
Ÿ
The degrees of freedom of each sum of squares are determined by the following logic. The SST consists of n squared terms (Yᵢⱼ − Ȳ..)², but Ȳ.. must be calculated first, before SST can be calculated; hence the degrees of freedom of SST is n − 1. The SSE consists of n squared terms (Yᵢⱼ − Ȳᵢ.)², but the k values Ȳ₁., …, Ȳₖ. must be calculated first, before SSE can be calculated; hence the degrees of freedom of SSE is n − k. The degrees of freedom of SSTr is the degrees of freedom of SST minus the degrees of freedom of SSE, which is k − 1.
Ÿ
In the one-way analysis of variance, the following relations always hold:

Partition of sums of squares and degrees of freedom
Sum of squares:       SST = SSTr + SSE
Degrees of freedom:   (n − 1) = (k − 1) + (n − k)
Ÿ
The sum of squares divided by its corresponding degrees of freedom is referred to as the mean square, and Table 9.1.3 defines the treatment mean square (MSTr) and the error mean square (MSE). As with the sums of squares, the treatment mean square implies the average variation between the levels of the factor, and the error mean square implies the average variation among observations within each level. Therefore, if MSTr is relatively much larger than MSE, we can conclude that the population means of the levels, μᵢ, are not all the same. So by what criterion can we say it is relatively much larger?
Ÿ
The calculated F value, F₀, in the last column of the ANOVA table represents the relative size of MSTr and MSE. If the assumptions based on statistical theory are satisfied, and if the null hypothesis H₀: α₁ = α₂ = ⋯ = αₖ = 0 is true, then the test statistic below follows an F distribution with degrees of freedom (k − 1) and (n − k):

F₀ = MSTr / MSE = [ SSTr/(k − 1) ] / [ SSE/(n − k) ]

Ÿ
Therefore, when the significance level of the test is α, if the calculated value F₀ is greater than the critical value F(k − 1, n − k; α), then the null hypothesis is rejected. That is, it is concluded that the population means of the factor levels are not all the same.
One-way analysis of variance F test

Null hypothesis          H₀: α₁ = α₂ = ⋯ = αₖ = 0
Alternative hypothesis   H₁: at least one αᵢ is not equal to 0
Test statistic           F₀ = MSTr / MSE
Decision rule            If F₀ > F(k − 1, n − k; α), then reject H₀

(Note: 『eStat』 calculates the p-value of this test. Hence, if the p-value is smaller than the significance level α, then reject the null hypothesis.)
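The boxed procedure can also be written out directly from the sum-of-squares formulas; the following Python sketch (an illustration, not the book's 『eStat』 implementation) also verifies the partition SST = SSTr + SSE:

```python
from scipy.stats import f as f_dist

def one_way_anova(groups, alpha=0.05):
    """One-way ANOVA F test built from the SST/SSTr/SSE formulas.
    groups: one list of observations per factor level."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    level_means = [sum(g) / len(g) for g in groups]

    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, level_means))
    sse = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, level_means))
    sst = sum(sum((y - grand_mean) ** 2 for y in g) for g in groups)
    assert abs(sst - (sstr + sse)) < 1e-6      # partition: SST = SSTr + SSE

    mstr, mse = sstr / (k - 1), sse / (n - k)
    f0 = mstr / mse
    f_crit = f_dist.ppf(1 - alpha, k - 1, n - k)  # F(k-1, n-k; alpha)
    return f0, f_crit, f0 > f_crit                 # reject H0 if f0 > f_crit

# English scores by grade from Example 9.1.1
groups = [[81, 75, 69, 90, 72, 83], [65, 80, 73, 79, 81, 69],
          [72, 67, 62, 76, 80], [89, 94, 79, 88]]
f0, f_crit, reject = one_way_anova(groups)
# f0 is about 4.347 and f_crit about 3.20, so H0 is rejected
```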
[Practice 9.1.1]
(Plant Growth by Condition)
Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control (labeled 'ctrl') and two different treatment conditions (labeled 'trt1' and 'trt2'). The weight data, with 30 observations on the control and the two treatments, are saved at the following location of 『eStat』. Answer the following questions using 『eStat』.
⇨ eBook ⇨ PR090101_Rdatasets_PlantGrowth.csv
1) Draw a dot graph of weights for each of the control and treatment conditions.
2) Test a hypothesis whether the weights are the same or not. Use the 5% significance level.
9.1.1 Multiple Comparison
Ÿ
If the F test of the one-way ANOVA does not show a significant difference between the levels of the factor, it can be concluded that there is no difference between the level populations. However, if you conclude that there are significant differences between the levels, as in [Example 9.1.1], you need to examine which levels differ from each other.
Ÿ
The analysis of differences between population means after ANOVA requires several tests of mean differences to be performed simultaneously, and this is called multiple comparisons. The hypothesis of the multiple comparisons to test whether the level means μᵢ and μⱼ are equal is as follows:

H₀: μᵢ = μⱼ ,  H₁: μᵢ ≠ μⱼ   (i ≠ j;  i, j = 1, 2, …, k)

It means that there are k(k − 1)/2 tests to be done simultaneously for the multiple comparisons if there are k levels of the factor.
• There are many multiple comparison tests, but Tukey's Honestly Significant Difference (HSD) test is most commonly used. The statistic for Tukey's HSD test to compare means μᵢ and μⱼ is the sample mean difference Ȳᵢ. − Ȳⱼ., and the decision rule to test H₀: μᵢ = μⱼ is as follows:

  If |Ȳᵢ. − Ȳⱼ.| > HSDᵢⱼ, then reject H₀

  where HSDᵢⱼ = q_{α; k, n−k} · √( (MSE/2) · (1/nᵢ + 1/nⱼ) )

nᵢ and nⱼ are the numbers of samples (repetitions) at level i and level j, MSE is the mean squared error, and q_{α; k, n−k} is the right-tail 100·α percentile of the studentized range distribution with parameter k and n−k degrees of freedom. (It can be found at 『eStatU』 (<Figure 9.1.8>).)
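Outside 『eStatU』, the same percentile is available from SciPy's studentized range distribution. The sketch below computes the HSD cutoff for hypothetical data with unequal repetitions (the group values are made up for illustration):

```python
import math
from scipy.stats import studentized_range

# hypothetical data: k = 3 levels with unequal numbers of repetitions
groups = {
    1: [81, 75, 69, 90, 72],
    2: [65, 80, 73, 79],
    3: [72, 60, 57, 66, 70, 61],
}
k = len(groups)
n = sum(len(g) for g in groups.values())

# MSE = SSE / (n - k), the pooled within-level variance
sse = sum((y - sum(g) / len(g)) ** 2 for g in groups.values() for y in g)
mse = sse / (n - k)

# right-tail 5% percentile of the studentized range, parameters k and n - k
q = studentized_range.ppf(0.95, k, n - k)

def hsd(i, j):
    """HSD cutoff for comparing levels i and j (unequal-n form)."""
    return q * math.sqrt((mse / 2) * (1 / len(groups[i]) + 1 / len(groups[j])))

diff_12 = abs(sum(groups[1]) / len(groups[1]) - sum(groups[2]) / len(groups[2]))
print(diff_12 > hsd(1, 2))  # reject H0: mu_1 = mu_2 ?
```

For these made-up data the mean difference is well below the cutoff, so H₀ is not rejected.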
9.1 Analysis of Variance for Experiments of Single Factor / 39
<Figure 9.1.8> 『eStatU』 HSD percentile table
Example 9.1.2
In [Example 9.1.1], the analysis of variance of English scores by grade concluded that the null hypothesis was rejected and that the average English scores for each grade were not all the same. Now let us apply the multiple comparisons to check where the differences exist among the school grades with the significance level of 5%. Use 『eStat』 to check the result.
Answer
w The hypothesis of the multiple comparisons is H₀: μᵢ = μⱼ versus H₁: μᵢ ≠ μⱼ, and the decision rule is as follows:
  'If |Ȳᵢ. − Ȳⱼ.| > HSDᵢⱼ, then reject H₀.'
Since there are four school grades (k = 4), 4·3/2 = 6 multiple comparisons are possible as follows. The right-tail 5% percentile of the studentized range distribution, q_{0.05; 4, n−4}, used in these tests can be found at 『eStatU』.
1) H₀: μ₁ = μ₂,  H₁: μ₁ ≠ μ₂
   HSD₁₂ = q_{0.05; 4, n−4} · √( (MSE/2) · (1/n₁ + 1/n₂) ) = 11.530
   Since |Ȳ₁. − Ȳ₂.| is smaller than HSD₁₂ = 11.530, accept H₀.

2) H₀: μ₁ = μ₃,  H₁: μ₁ ≠ μ₃
   HSD₁₃ = q_{0.05; 4, n−4} · √( (MSE/2) · (1/n₁ + 1/n₃) ) = 12.092
   Since |Ȳ₁. − Ȳ₃.| is smaller than HSD₁₃ = 12.092, accept H₀.
3) H₀: μ₁ = μ₄,  H₁: μ₁ ≠ μ₄
   HSD₁₄ = q_{0.05; 4, n−4} · √( (MSE/2) · (1/n₁ + 1/n₄) )
   Since |Ȳ₁. − Ȳ₄.| is smaller than HSD₁₄, accept H₀.

4) H₀: μ₂ = μ₃,  H₁: μ₂ ≠ μ₃
   HSD₂₃ = q_{0.05; 4, n−4} · √( (MSE/2) · (1/n₂ + 1/n₃) ) = 12.092
   Since |Ȳ₂. − Ȳ₃.| is smaller than HSD₂₃ = 12.092, accept H₀.

5) H₀: μ₂ = μ₄,  H₁: μ₂ ≠ μ₄
   HSD₂₄ = q_{0.05; 4, n−4} · √( (MSE/2) · (1/n₂ + 1/n₄) ) = 12.891
   Since |Ȳ₂. − Ȳ₄.| is greater than HSD₂₄ = 12.891, reject H₀.

6) H₀: μ₃ = μ₄,  H₁: μ₃ ≠ μ₄
   HSD₃₄ = q_{0.05; 4, n−4} · √( (MSE/2) · (1/n₃ + 1/n₄) ) = 13.396
   Since |Ȳ₃. − Ȳ₄.| is greater than HSD₃₄ = 13.396, reject H₀.
w The result of the above multiple comparisons shows that there is a difference between μ₂ and μ₄ and between μ₃ and μ₄, as can be seen in the dot graph with averages in <Figure 9.1.1>. It also shows that μ₁ has no significant difference from the other means.
w If you click [Multiple Comparison] in the options of the ANOVA as in <Figure 9.1.3>, 『eStat』 shows the result of Tukey's multiple comparisons as in <Figure 9.1.9>. 『eStat』 also shows the mean difference and 95% HSD value for each pair of sample means after rearranging the levels of rows and columns in ascending order of the sample means.
w The next table shows that, if the HSD test result for a pair of levels is significant at the 5% significance level, * is marked; if it is significant at the 1% significance level, ** is marked; and if it is not significant, the cell is left blank.
<Figure 9.1.9> HSD Multiple Comparisons
w For the analysis of mean differences, confidence intervals for each level may also be used. <Figure 9.1.2> shows the 95% confidence interval of the mean for each level. This confidence interval is created using the formula described in Chapter 6; the only difference is that the estimate of the error variance, σ², is the pooled variance using all observations rather than the sample variance of the observed values at each level. In the ANOVA table, MSE is this pooled variance.
w In post-hoc analysis using these confidence intervals, there is a difference between means if the confidence intervals do not overlap, so the same conclusion can be obtained as in the previous HSD test.
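This interval-overlap check can be sketched as follows. The level means, repetition counts and MSE here are hypothetical stand-ins; the t percentile comes from SciPy rather than the Chapter 6 tables:

```python
import math
from scipy.stats import t

# hypothetical values: level means, repetitions per level, pooled MSE, total n, k levels
means = {1: 66.0, 2: 58.5, 3: 52.0}
n_i = {1: 10, 2: 10, 3: 10}
mse, n, k = 40.0, 30, 3

tcrit = t.ppf(0.975, n - k)  # 95% two-sided, using the df of MSE

# confidence interval for each level mean with the pooled variance MSE
ci = {i: (m - tcrit * math.sqrt(mse / n_i[i]),
          m + tcrit * math.sqrt(mse / n_i[i])) for i, m in means.items()}

def overlap(i, j):
    """True if the two confidence intervals overlap (no significant difference)."""
    (lo1, hi1), (lo2, hi2) = ci[i], ci[j]
    return lo1 <= hi2 and lo2 <= hi1

print(overlap(1, 2), overlap(1, 3))
```

For these made-up numbers, levels 1 and 2 overlap but levels 1 and 3 do not.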
[Practice 9.1.2]
By using the data of [Practice 9.1.1]
⇨ eBook ⇨ PR090101_Rdatasets_PlantGrowth.csv
apply the multiple comparisons to check where differences exist among the control and the two treatments with the significance level of 5%. Use 『eStat』.
9.1.2 Residual Analysis
• Another statistical analysis related to the ANOVA is a residual analysis. The various hypothesis tests in the ANOVA are performed on the condition that assumptions about the error term εᵢⱼ hold. Assumptions about the error terms include independence (the εᵢⱼ are independent of each other), homoscedasticity (each εᵢⱼ has the same constant variance σ²), normality (each εᵢⱼ is normally distributed), etc. The validity of these assumptions should always be investigated. However, since εᵢⱼ cannot be observed, the residual, as the estimate of εᵢⱼ, is used to check the assumptions. The residuals in the ANOVA are defined as the deviations used in the equation of the error sum of squares, for example, eᵢⱼ = Yᵢⱼ − Ȳᵢ. in the one-way analysis of variance.
Example 9.1.3
In [Example 9.1.1] of English score comparison by the grade, apply the residual analysis
using『eStat』.
Answer
w If you click on [Standardized Residual Plot] of the ANOVA option in <Figure 9.1.3>,
a scatter plot of residuals versus fitted values appears as shown in <Figure 9.1.10>.
In this scatter plot, if the residuals show no unusual tendency around zero and appear random, the assumptions of independence and homoscedasticity are considered valid. There is no unusual tendency in this scatter plot. Normality of the residuals can be checked by drawing a histogram of the residuals.
<Figure 9.1.10> Residual plot of the ANOVA
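The residuals eᵢⱼ = Yᵢⱼ − Ȳᵢ. are easy to compute by hand; a minimal sketch with hypothetical score data:

```python
# hypothetical one-way layout: level -> observations
data = {
    "grade1": [70, 75, 80],
    "grade2": [60, 62, 67],
    "grade3": [55, 58, 64],
}

# residual = observation minus its level mean (the estimate of the error term)
residuals = {}
for level, ys in data.items():
    mean = sum(ys) / len(ys)
    residuals[level] = [y - mean for y in ys]

# residuals within each level always sum to zero (up to rounding)
for level, rs in residuals.items():
    print(level, [round(r, 2) for r in rs], round(sum(rs), 10))
```

Plotting these residuals against the fitted values Ȳᵢ. gives the scatter plot described above.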
[Practice 9.1.3]
By using the data of [Practice 9.1.1]
⇨ eBook ⇨ PR090101_Rdatasets_PlantGrowth.csv
apply the residual analysis using『eStat』.
9.2 Design of Experiments for Sampling
• Data such as the English scores by grade in [Example 9.1.1] are not difficult to collect as samples from each grade population. However, in experiments in fields such as engineering, medicine, or agriculture, it is often difficult to collect a large number of samples because of the influence of many external factors, so sampling must be done very carefully. This section discusses how to design experiments for collecting a small number of data from experiments.
9.2.1 Completely Randomized Design
• In order to identify accurately the differences that may exist among the levels of a factor, you should design the experiments so that other factors have as little influence as possible. One method to do this is to randomize the whole experiment. For example, consider experiments to compare the fuel mileage per liter of gasoline for three types of cars A, B and C. We want to measure the fuel mileage for five different cars of each type. One driver may try to drive all 15 cars. However, if only five cars can be measured per day, the measurement will take place over a total of three days. In this case, changes in daily weather, wind speed and wind direction can influence the fuel mileage, which raises the question of which cars should be measured on each day.
• If five drivers (1, 2, 3, 4, 5) plan to measure the fuel mileage of all the cars in one day, the fuel mileage may be affected by the driver. One solution would be to allocate the 15 cars randomly to the five drivers and then to randomize the sequence of experiments as well. For example, each car is numbered from 1 to 15 and the fuel mileage experiment is conducted in the order of the numbers drawn at random. Such an experiment reduces the likelihood of differences caused by external factors, such as the driver, daily wind speed and wind direction, because randomization makes all external factors affect all observed measurements equally. This method of experiments is called a completely randomized design of experiments. Table 9.2.1 shows an example allocation of experiments by this method. Symbols A, B and C represent the three types of cars.
Table 9.2.1 Example of completely randomized design of experiments

  Driver   |    1    |    2    |    3    |    4    |    5
  Car Type | B, B, C | A, C, B | B, A, A | C, A, B | A, C, C

• In general, in order to achieve the purpose of the analysis of variance, it is necessary to plan experiments thoroughly in advance so that proper data are obtained. The completely randomized design explained above is studied in detail in the Design of Experiments area of Statistics. From the standpoint of experimental design, the one-way analysis of variance technique is called an analysis of the single factor design.
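The random allocation described above can be sketched in a few lines (the seed is arbitrary, chosen only for reproducibility):

```python
import random

random.seed(7)  # arbitrary seed for reproducibility

# 15 cars: five of each type A, B, C, to be allocated to 5 drivers (3 cars each)
cars = ["A"] * 5 + ["B"] * 5 + ["C"] * 5
random.shuffle(cars)  # randomize the allocation / experiment order

# drivers 1..5 each get three consecutive cars from the shuffled list
allocation = {driver: cars[3 * (driver - 1): 3 * driver] for driver in range(1, 6)}
for driver, types in allocation.items():
    print(driver, types)
```

As in Table 9.2.1, a given driver may end up with an unbalanced mix of car types, which motivates the randomized block design of the next section.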
9.2.2 Randomized Block Design
• In the completely randomized design experiments for measuring the fuel mileage explained in the previous section, 15 cars were randomly allocated to five drivers. However, the example allocation in Table 9.2.1 shows a problem of this completely randomized design. For example, Driver 1 will experiment only with car types B and C, and Driver 3 only with car types A and B, so the variation between drivers will not be averaged out in the test. Thus, if there is significant variation between drivers in measuring the fuel mileage, the error term of the analysis of variance may not be a simple experimental error. In order to eliminate this problem, each driver may be required to experiment with each car type exactly once, which is known as a randomized block design. Table 9.2.2 shows an example of a possible allocation in this case. In this table, the values in parentheses are the observed fuel mileages.
Table 9.2.2 Example of randomized block design

  Driver                  |    1    |    2    |    3    |    4    |    5
  Car Type (fuel mileage) | A(22.4) | B(12.6) | C(18.7) | A(21.1) | A(24.5)
                          | C(20.2) | C(15.2) | A(19.7) | B(17.8) | C(23.8)
                          | B(16.3) | A(16.1) | B(15.9) | C(18.9) | B(21.0)
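In contrast with the completely randomized design, each driver (block) now tests every car type once, and only the order within each block is randomized; a sketch (arbitrary seed):

```python
import random

random.seed(11)  # arbitrary seed for reproducibility

car_types = ["A", "B", "C"]
drivers = [1, 2, 3, 4, 5]

# randomized block design: every block contains all treatments,
# in a randomly chosen order within the block
plan = {d: random.sample(car_types, k=len(car_types)) for d in drivers}
for d, order in plan.items():
    print(d, order)
```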
• Table 9.2.2 shows that the total observations are divided into five groups by driver, called blocks, so that observations within a block share the same characteristics. The variable representing the blocks, such as the driver, is referred to as a block variable. A block variable is generally introduced when the experimental results are influenced significantly by this variable, which is distinct from the factor. For example, when examining the yield of rice varieties, if the fields of the rice paddy used in the experiment do not have the same fertility, divide the fields into several blocks of equal fertility and then plant all varieties of rice in each block of the paddy. This eliminates the influence of unequal paddy fertility and allows a more accurate examination of the differences in yield between rice varieties.
• The statistical model of the randomized block design with b blocks can be represented as follows:

  Yᵢⱼ = μ + αᵢ + βⱼ + εᵢⱼ,   i = 1, 2, ⋯, k;  j = 1, 2, ⋯, b

In this equation, βⱼ is the effect of the j-th level of the block variable on the response variable. In the randomized block design, the variation resulting from differences between the levels of the block variable can be separated from the error term independently of the variation of the factor. The total variation is divided as follows:

  Yᵢⱼ − Ȳ.. = (Ȳᵢ. − Ȳ..) + (Ȳ.ⱼ − Ȳ..) + (Yᵢⱼ − Ȳᵢ. − Ȳ.ⱼ + Ȳ..)
• If you square both sides of the equation above and then sum over all i and j, you can obtain several sums of squares, as in the one-way analysis of variance:

Total sum of squares, degrees of freedom kb − 1:

  SST = Σᵢ Σⱼ (Yᵢⱼ − Ȳ..)²

Error sum of squares, degrees of freedom (k−1)(b−1):

  SSE = Σᵢ Σⱼ (Yᵢⱼ − Ȳᵢ. − Ȳ.ⱼ + Ȳ..)²

Treatment sum of squares, degrees of freedom k − 1:

  SSTr = Σᵢ Σⱼ (Ȳᵢ. − Ȳ..)² = b Σᵢ (Ȳᵢ. − Ȳ..)²

Block sum of squares, degrees of freedom b − 1:

  SSB = Σᵢ Σⱼ (Ȳ.ⱼ − Ȳ..)² = k Σⱼ (Ȳ.ⱼ − Ȳ..)²

• The following facts are always established in the randomized block design.
Division of the sum of squares and degrees of freedom

  Sum of squares:       SST = SSE + SSTr + SSB
  Degrees of freedom:   kb − 1 = (k−1)(b−1) + (k−1) + (b−1)

• Table 9.2.3 shows the ANOVA table of the randomized block design. In this ANOVA table, if you combine the sum of squares and degrees of freedom of the block variable and of the error variation, they become the sum of squares and degrees of freedom of the error term in the one-way ANOVA table 9.1.3.
Table 9.2.3 Analysis of Variance Table of the randomized block design

  Variation | Sum of Squares | Degrees of freedom | Mean Squares             | F value
  Treatment | SSTr           | k − 1              | MSTr = SSTr/(k−1)        | F₁ = MSTr/MSE
  Block     | SSB            | b − 1              | MSB = SSB/(b−1)          | F₂ = MSB/MSE
  Error     | SSE            | (k−1)(b−1)         | MSE = SSE/((k−1)(b−1))   |
  Total     | SST            | kb − 1             |                          |
• In the randomized block design, the entire set of experiments is not randomized, unlike the completely randomized design; only the experiments within each block are randomized.
• Another important thing to note in the randomized block design is that, although the variation of the block variable is separated from the error variation, the main objective is still to test the difference between the levels of the factor, as in the one-way analysis of variance. The test for differences between the levels of the block variable is not of primary interest, because the block variable is used to reduce the error variation and to make the test for differences between the levels of the factor more accurate.
• In addition, the error mean square (MSE) does not always decrease, because although the block variation is separated from the error variation of the one-way analysis of variance, the degrees of freedom are also reduced.
Example 9.2.1
Table 9.2.4 is the rearrangement of the fuel mileage data in Table 9.2.2 by the five drivers and the car types.

Table 9.2.4 Fuel mileage data by five drivers and three car types

  Car Type       | Driver 1 |   2   |   3   |   4   |   5   | Average (Ȳᵢ.)
  A              |   22.4   | 16.1  | 19.7  | 21.1  | 24.5  | 20.76
  B              |   16.3   | 12.6  | 15.9  | 17.8  | 21.0  | 16.72
  C              |   20.2   | 15.2  | 18.7  | 18.9  | 23.8  | 19.36
  Average (Ȳ.ⱼ)  |   19.63  | 14.63 | 18.10 | 19.27 | 23.10 | 18.947

⇨ eBook ⇨ EX090201_GasMilage.csv
1) Assuming that these data have been measured by the completely randomized design, use 『eStat』 to do the analysis of variance on whether the three car types have the same fuel mileage.
2) Assuming that these data have been measured by the randomized block design, use 『eStat』 to do the analysis of variance on whether the three car types have the same fuel mileage.
Answer
1) In 『eStat』, enter the data as shown in <Figure 9.2.1> and click the analysis of variance icon. Select 'Analysis Var' as Miles and 'By Group' as Car in the variable selection box; then the confidence interval graph for each car type will appear as in <Figure 9.2.2>.

<Figure 9.2.1> Data input for randomized block design for 『eStat』 ANOVA
<Figure 9.2.2> Dot graph and 95% confidence interval for population mean of each car type

w Click the [ANOVA F-test] button in the option below the graph to reveal the ANOVA graph as in <Figure 9.2.3> and the ANOVA table as in <Figure 9.2.4>. The result of the ANOVA is that there is no difference in fuel mileage between the car types. The same is true for the multiple comparison tests in <Figure 9.2.5>.
<Figure 9.2.3> ANOVA of gas mileage
<Figure 9.2.4> ANOVA table of gas mileage
<Figure 9.2.5> Multiple comparisons by car
2) If these data had been collected using the randomized block design, the block sum of squares can be separated from the error sum of squares. Adding the Driver variable to 'by Group' in the variable selection box of 『eStat』 will give a scatter plot of fuel mileage by driver for each car type as shown in <Figure 9.2.6>. This scatter plot shows a significant difference in fuel mileage per driver.
<Figure 9.2.6> Fuel mileages for each driver
w Click the [ANOVA F-test] button in the options window below the graph to reveal the two-way mean table shown in <Figure 9.2.7> and the ANOVA table shown in <Figure 9.2.8>. This ANOVA table clearly shows a decrease in the error sum of squares, which significantly reduces the mean square of the error. This is due to the large variation between drivers being separated from the error variation. Factor B (driver) represents the block sum of squares separated from the error term, and its p-value shows that the block (driver) effect is statistically significant. The F value for the hypothesis H₀: μ₁ = μ₂ = μ₃ of fuel mileage by Factor A (car type) is 43.447 and is greater than F_{2,8; 0.05} = 4.46, so you can reject H₀ at the significance level of 0.05. Consequently, significant differences in fuel mileage between car types can be found by removing the variation of the block from the error term.
<Figure 9.2.7> Two-way mean table by car and driver
(There is no standard deviation of single data and denoted as NaN)
<Figure 9.2.8> ANOVA table for randomized block design
w On average, car type A has the best fuel mileage. In order to examine the differences between car types further, the multiple comparison test of the previous section can be applied. In this example, one HSD value can be used for all mean comparisons, because the number of repetitions at each level is the same:

  HSD = q_{0.05; 3, 8} · √( (MSE/2) · (1/5 + 1/5) ) = 1.257

Therefore, there is a significant difference in fuel mileage between all three car types, since the differences between the mean values (4.04, 1.40, 2.64) are all greater than the critical value of 1.257.
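The randomized block quantities in this example can be reproduced by hand; a pure-Python sketch using the Table 9.2.4 data:

```python
# fuel mileage by car type (treatment) and driver (block), Table 9.2.4
data = {
    "A": [22.4, 16.1, 19.7, 21.1, 24.5],
    "B": [16.3, 12.6, 15.9, 17.8, 21.0],
    "C": [20.2, 15.2, 18.7, 18.9, 23.8],
}
k, b = len(data), 5               # k treatments, b blocks
ys = [y for row in data.values() for y in row]
grand = sum(ys) / (k * b)

# treatment (car type) and block (driver) means
t_means = {t: sum(row) / b for t, row in data.items()}
b_means = [sum(data[t][j] for t in data) / k for j in range(b)]

sst = sum((y - grand) ** 2 for y in ys)
sstr = b * sum((m - grand) ** 2 for m in t_means.values())
ssb = k * sum((m - grand) ** 2 for m in b_means)
sse = sst - sstr - ssb            # SST = SSTr + SSB + SSE

mstr = sstr / (k - 1)
mse = sse / ((k - 1) * (b - 1))
f_value = mstr / mse
print(round(f_value, 3))          # about 43.45, matching the ANOVA table
```

The resulting MSE (about 0.48) is also the value that enters the HSD computation above.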
w The same analysis of the randomized block design can be done using 『eStatU』 by entering the data as follows and clicking the [Execute] button.

<Figure 9.2.9> Data input for 『eStatU』 randomized block design

[Practice 9.2.1]
The following is the result of an agronomist's survey of the yield of four varieties of wheat, using a randomized block design with three cultivated areas (blocks). Test whether the mean yields of the four wheat varieties are the same or not with 5% significance level.

  Wheat Type | Cultivated Area 1 |  2 |  3
  A          | 50                | 60 | 56
  B          | 59                | 52 | 51
  C          | 55                | 55 | 52
  D          | 58                | 58 | 55

⇨ eBook ⇨ PR090201_WheatAreaYield.csv
9.2.3 Latin Square Design
• In the randomized block design experiments for measuring the fuel mileage explained in the previous section, there is one extraneous source of block variation, namely the driver. If the researcher feels that there is an additional source of variation, such as road type, there are two identifiable sources of extraneous block variation, i.e., two block variables. In this case, the researcher needs a design that will isolate and remove both sources of block variation from the residual. The Latin square design is such a design.
• In the Latin square design, we assign one source of extraneous variation to the columns of the square and the second source to the rows of the square. We then assign the treatments in such a way that each treatment occurs once and only once in each row and each column. The number of rows, the number of columns, and the number of treatments, therefore, are all equal. Table 9.2.5 shows a typical 3 × 3 Latin square with three rows, three columns and three treatments designated by the capital letters A, B, C.
Table 9.2.5 A 3 × 3 Latin square by three drivers and three road types for three car types (A, B, C)

                   | Column 1, Road 1 | Column 2, Road 2 | Column 3, Road 3
  Row 1, Driver 1  |        A         |        B         |        C
  Row 2, Driver 2  |        B         |        C         |        A
  Row 3, Driver 3  |        C         |        A         |        B
• Table 9.2.6 shows a typical 4 × 4 Latin square with four rows, four columns and four treatments designated by the capital letters A, B, C, D.

Table 9.2.6 A 4 × 4 Latin square by four drivers and four road types for four car types (A, B, C, D)

                   | Column 1, Road 1 | Column 2, Road 2 | Column 3, Road 3 | Column 4, Road 4
  Row 1, Driver 1  |        A         |        B         |        C         |        D
  Row 2, Driver 2  |        B         |        C         |        D         |        A
  Row 3, Driver 3  |        C         |        D         |        A         |        B
  Row 4, Driver 4  |        D         |        A         |        B         |        C
• In the Latin square design, treatments can be assigned randomly in such a way that each car type occurs once and only once in each row and each column. Therefore, there are many possible 3 × 3 and 4 × 4 Latin square designs. We get randomization in the Latin square by randomly selecting a square of the desired dimension from all possible squares of that dimension. One method of doing this is to randomly assign a different treatment to each cell in each column, with the restriction that each treatment must appear once and only once in each row.
• Small Latin squares provide only a small number of degrees of freedom for the error mean square, so a minimum size of 5 × 5 is usually recommended.
• The hypothesis of the Latin square design with r treatments is as follows:

  Null hypothesis         H₀: μ₁ = μ₂ = ⋯ = μᵣ
  Alternative hypothesis  H₁: at least one pair of μₖ is not equal

• The statistical model of the r × r Latin square design with r treatments can be represented as follows:

  Yᵢⱼₖ = μ + αᵢ + βⱼ + τₖ + εᵢⱼₖ,   i = 1, ⋯, r;  j = 1, ⋯, r;  k = 1, ⋯, r

where the treatment level k in each cell is determined by the row i and the column j of the given Latin square.
• In this equation, αᵢ is the effect of the i-th level of the row block variable on the response variable, βⱼ is the effect of the j-th level of the column block variable, and τₖ is the effect of the k-th level of the treatment. Notation for the row averages, column averages and treatment averages of r × r Latin square data is as follows:

Table 9.2.7 Notation for row means, column means and treatment means of r × r Latin square data

                  | Column 1 | Column 2 | ⋯ | Column r | Row Average
  Row 1           |   Y₁₁    |   Y₁₂    | ⋯ |   Y₁ᵣ    |   Ȳ₁..
  Row 2           |   Y₂₁    |   Y₂₂    | ⋯ |   Y₂ᵣ    |   Ȳ₂..
  ⋮               |          |          |   |          |
  Row r           |   Yᵣ₁    |   Yᵣ₂    | ⋯ |   Yᵣᵣ    |   Ȳᵣ..
  Column Average  |   Ȳ.₁.   |   Ȳ.₂.   | ⋯ |   Ȳ.ᵣ.   |   Ȳ...

  Treatment average: Ȳ..ₖ,  k = 1, 2, ⋯, r
• In the Latin square design, the variation resulting from differences between the levels of the two block variables can be separated from the error term independently of the variation of the factor. The total variation is divided as follows:

  Yᵢⱼₖ − Ȳ... = (Ȳᵢ.. − Ȳ...) + (Ȳ.ⱼ. − Ȳ...) + (Ȳ..ₖ − Ȳ...) + (Yᵢⱼₖ − Ȳᵢ.. − Ȳ.ⱼ. − Ȳ..ₖ + 2Ȳ...)

If you square both sides of the equation above and sum over all i and j, you can obtain the following sums of squares:

Total sum of squares, degrees of freedom r² − 1:

  SST = Σᵢ Σⱼ (Yᵢⱼₖ − Ȳ...)²

Error sum of squares, degrees of freedom (r−1)(r−2):

  SSE = Σᵢ Σⱼ (Yᵢⱼₖ − Ȳᵢ.. − Ȳ.ⱼ. − Ȳ..ₖ + 2Ȳ...)²

Row sum of squares, degrees of freedom r − 1:

  SSR = r Σᵢ (Ȳᵢ.. − Ȳ...)²

Column sum of squares, degrees of freedom r − 1:

  SSC = r Σⱼ (Ȳ.ⱼ. − Ȳ...)²

Treatment sum of squares, degrees of freedom r − 1:

  SSTr = r Σₖ (Ȳ..ₖ − Ȳ...)²
• The following facts are always established in the Latin square design. Table 9.2.8 shows the ANOVA table of the Latin square design.

Division of the sum of squares and degrees of freedom

  Sum of squares:       SST = SSE + SSR + SSC + SSTr
  Degrees of freedom:   r² − 1 = (r−1)(r−2) + (r−1) + (r−1) + (r−1)

Table 9.2.8 ANOVA table of the Latin square design

  Variation | Sum of Squares | Degrees of freedom | Mean Squares             | F value
  Treatment | SSTr           | r − 1              | MSTr = SSTr/(r−1)        | F = MSTr/MSE
  Row       | SSR            | r − 1              | MSR = SSR/(r−1)          |
  Column    | SSC            | r − 1              | MSC = SSC/(r−1)          |
  Error     | SSE            | (r−1)(r−2)         | MSE = SSE/((r−1)(r−2))   |
  Total     | SST            | r² − 1             |                          |

Example 9.2.2
Table 9.2.9 is the fuel mileage data of four car types (A, B, C, D) measured by four drivers and four road types with the Latin square design.

Table 9.2.9 Fuel mileage data by four drivers and four road types of four car types (A, B, C, D)

                   | Column 1, Road 1 | Column 2, Road 2 | Column 3, Road 3 | Column 4, Road 4
  Row 1, Driver 1  |      A(22)       |      B(16)       |      C(19)       |      D(21)
  Row 2, Driver 2  |      B(24)       |      C(16)       |      D(12)       |      A(15)
  Row 3, Driver 3  |      C(17)       |      D(21)       |      A(20)       |      B(15)
  Row 4, Driver 4  |      D(18)       |      A(18)       |      B(23)       |      C(22)

Use 『eStatU』 to do the analysis of variance on whether the four car types have the same fuel mileage.
Answer
w In 『eStatU』 – 'Testing Hypothesis ANOVA – Latin Square Design', select the number of treatments r = 4 and enter the data as shown in <Figure 9.2.10>.

<Figure 9.2.10> Data input for Latin square design in 『eStatU』
9.2 Design of Experiments for Sampling / 53

Example 9.2.2
Answer
(continued)
w Click the [Execute] button to show the dot graph by car type in the Latin square design as in <Figure 9.2.11> and the ANOVA table as in <Figure 9.2.12>. The dot graph and the result of the ANOVA show that there is no difference in fuel mileage between the car types.
<Figure 9.2.11> Dot graph by car type in Latin square design
<Figure 9.2.12> ANOVA table of Latin square design
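The Latin square decomposition for Table 9.2.9 can be verified in plain Python; this sketch forms only the F value for the treatment:

```python
# Table 9.2.9: rows = drivers, columns = roads, letters = car types
square = [
    [("A", 22), ("B", 16), ("C", 19), ("D", 21)],
    [("B", 24), ("C", 16), ("D", 12), ("A", 15)],
    [("C", 17), ("D", 21), ("A", 20), ("B", 15)],
    [("D", 18), ("A", 18), ("B", 23), ("C", 22)],
]
r = len(square)
ys = [y for row in square for (_, y) in row]
grand = sum(ys) / (r * r)

row_means = [sum(y for _, y in row) / r for row in square]
col_means = [sum(square[i][j][1] for i in range(r)) / r for j in range(r)]
trt_sums = {}
for row in square:
    for trt, y in row:
        trt_sums[trt] = trt_sums.get(trt, 0) + y
trt_means = {trt: s / r for trt, s in trt_sums.items()}

sst = sum((y - grand) ** 2 for y in ys)
ssr = r * sum((m - grand) ** 2 for m in row_means)
ssc = r * sum((m - grand) ** 2 for m in col_means)
sstr = r * sum((m - grand) ** 2 for m in trt_means.values())
sse = sst - ssr - ssc - sstr      # SST = SSR + SSC + SSTr + SSE

f_value = (sstr / (r - 1)) / (sse / ((r - 1) * (r - 2)))
print(round(f_value, 3))          # well below 1: no car-type difference
```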
[Practice 9.2.2]
To study the effect of packaging on the sales of a certain cereal, a researcher tries four different packaging methods (treatments) at four different times of the week (columns) in four different supermarket chains (rows). The variable of interest is daily sales. The following table shows the results of the study. Do these data show a significant difference in shoppers' response to the different packaging methods? Let α = 0.05.

  Store | Time of week 1 |   2   |   3   |   4
  1     |     A(50)      | B(60) | C(56) | D(63)
  2     |     B(59)      | C(52) | D(51) | A(57)
  3     |     C(55)      | D(55) | A(52) | B(56)
  4     |     D(58)      | A(58) | B(55) | C(61)
9.3 Analysis of Variance for Experiments of Two Factors
• If there are two factors affecting the response variable, the analysis is called a two-way analysis of variance. This technique is frequently used in experiments in engineering, medicine and agriculture. The response variable is observed at each combination of the levels of the two factors (denoted as A and B). In general, it is advisable to repeat the experiment at least twice at each combination of levels of the two factors, if possible, in order to increase the reliability of the experimental results.
• When data are obtained from repeated experiments at each factor-level combination, the two-way ANOVA tests whether the population means of the levels of factor A are the same (called the main effect test of factor A), as in the one-way ANOVA, and whether the population means of the levels of factor B are the same (called the main effect test of factor B). In addition, the two-way ANOVA tests whether the effect of one factor A is influenced by the levels of the other factor B (called the interaction effect test). For example, in a chemical process, if the amount of product increases with pressure when the temperature is low, but decreases with pressure when the temperature is high, then an interaction effect exists between the two factors temperature and pressure. An interaction effect exists when the effect of one factor changes with changes in the level of the other factor.

Definition
Main effect and Interaction effect
When data are obtained from repeated experiments at each factor level, the two-way ANOVA tests whether the population means of the levels of factor A are the same (the main effect test of factor A), as in the one-way ANOVA, and whether the population means of the levels of factor B are the same (the main effect test of factor B).
The two-way ANOVA also tests whether the effect of one factor A is influenced by the levels of the other factor B (the interaction effect test).
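The two-way decomposition behind these tests can be sketched for a small hypothetical balanced layout (a levels of A, b levels of B, r repetitions per cell; the values below are made up):

```python
# hypothetical 2 x 2 layout with 2 repetitions per cell: cell[(i, j)] = observations
cell = {
    (1, 1): [10.0, 12.0], (1, 2): [20.0, 18.0],
    (2, 1): [11.0, 13.0], (2, 2): [14.0, 16.0],
}
a, b, r = 2, 2, 2
ys = [y for obs in cell.values() for y in obs]
grand = sum(ys) / (a * b * r)

cmean = {ij: sum(obs) / r for ij, obs in cell.items()}
amean = {i: sum(cmean[(i, j)] for j in range(1, b + 1)) / b for i in range(1, a + 1)}
bmean = {j: sum(cmean[(i, j)] for i in range(1, a + 1)) / a for j in range(1, b + 1)}

ssa = b * r * sum((m - grand) ** 2 for m in amean.values())        # main effect A
ssb = a * r * sum((m - grand) ** 2 for m in bmean.values())        # main effect B
ssab = r * sum((cmean[(i, j)] - amean[i] - bmean[j] + grand) ** 2  # interaction
               for i in range(1, a + 1) for j in range(1, b + 1))
sse = sum((y - cmean[ij]) ** 2 for ij, obs in cell.items() for y in obs)
sst = sum((y - grand) ** 2 for y in ys)
print(round(ssa, 3), round(ssb, 3), round(ssab, 3), round(sse, 3))
```

Here SST always splits exactly into SSA + SSB + SSAB + SSE, which is the identity the two-way ANOVA table is built on.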
Example 9.3.1
Table 9.3.1 shows the yield data of three repeated agricultural experiments for each
combination of four fertilizer levels and three rice types to investigate the yield of rice.
Table 9.3.1 Yield of rice by fertilizer and rice type (unit: kg)

  Fertilizer | Rice Type 1 | Rice Type 2 | Rice Type 3
  1          | 64, 66, 70  | 72, 81, 64  | 74, 51, 65
  2          | 65, 63, 58  | 57, 43, 52  | 47, 58, 67
  3          | 59, 68, 65  | 66, 71, 59  | 58, 45, 42
  4          | 58, 50, 49  | 57, 61, 53  | 53, 59, 38
⇨ eBook ⇨ EX090301_YieldByRiceFertilzer.csv
1) Find the average yield for each combination of fertilizers and rice types.
2) Using 『eStat』, draw a scatter plot with the rice types (1, 2 and 3) as X-axis and
the yield as Y-axis. Separate the color of dots in the scatter plot by the type of
fertilizer. Then, show the average of the combinations at each level on the scatter
plot and connect them with lines for each type of fertilizer to observe.
3) Test the main effects of fertilizers and rice types and test the interaction effect of
the two factors.
4) Using『eStat』, check the result of the two-way analysis of variance.
Example 9.3.1
Answer
1) For convenience, let us call the rice type factor A and the fertilizer factor B. The averages of the rice yield for each level combination of the two factors are shown in Table 9.3.2. Denote the m-th rice yield at level i of factor A and level j of factor B as Yᵢⱼₘ, and the average of each combination as Ȳᵢⱼ.. Also, denote the average of level i of factor A as Ȳᵢ.., the average of level j of factor B as Ȳ.ⱼ., and the overall average as Ȳ....

Table 9.3.2 Average yield of rice by fertilizer and rice type (unit: kg)

  Fertilizer (Factor B)  | Rice Type 1 | Rice Type 2 | Rice Type 3 | Row Average (Ȳ.ⱼ.)
  1                      |   66.67     |   72.33     |   63.33     |   67.44
  2                      |   62.00     |   50.67     |   57.33     |   56.67
  3                      |   64.00     |   65.33     |   48.33     |   59.22
  4                      |   52.33     |   57.00     |   50.00     |   53.11
  Column Average (Ȳᵢ..)  |   61.25     |   61.33     |   54.75     |   Ȳ... = 59.11
··
2) To draw a scatter plot for the two-way ANOVA using 『eStat』, enter data as
<Figure 9.3.1> where the fertilizer is variable 1, the rice type is variable 2 and the
rice yield is variable 3.
『
』
<Figure 9.3.1> Data input for
two-way ANOVA in eStat
w In the variable selection box which appears by clicking the ANOVA icon
on the
main menu, select 'Analysis Var' as Yield and 'By Group' as Rice and Fertilizer, then
the scatter plot of the yield by rice type will appear as in <Figure 9.3.2>. In
addition, the average yields at each rice type by fertilizer are marked as dots
linking them with lines by fertilizer. In this graph, rice type 1 always yields more
than rice type 3 regardless of the fertilizer used. Rice type 2 varies in yield
depending on the type of fertilizer used, which shows the existence of interaction,
and the use of fertilizer 1 usually results in a high yield regardless of the rice
types.
56
/ Chapter 9 Testing Hypothesis for Several Population Means

<Figure 9.3.2> Yields by rice types and fertilizer types
3) Testing factor A, which is to test the main effect of rice type, means testing
the following null hypothesis:
$H_0$: The average yields of the three rice types are the same.
w If the null hypothesis is rejected, we conclude that the main effect of rice type
exists. In order to test the main effect of rice type, as in the one-way analysis
of variance, we calculate the sum of squared distances from each average yield
$\bar{Y}_{i\cdot\cdot}$ of rice type $i$ to the overall average yield
$\bar{Y}_{\cdot\cdot\cdot}$:

SSA $= 12\{(\bar{Y}_{1\cdot\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2 + (\bar{Y}_{2\cdot\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2 + (\bar{Y}_{3\cdot\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2\} = 342.3889$

where the weight of 12 on each squared term is the number of data for each rice
type. Since there are 3 rice types, the degrees of freedom of SSA is (3-1), and
the sum of squares SSA divided by (3-1), 342.3889/2 = 171.1944, is called the
mean square of factor A, MSA.
w Testing factor B, which is to test the main effect of fertilizer type, means
testing the following null hypothesis:
$H_0$: The average yields of the four fertilizer types are the same.
w If the null hypothesis is rejected, we conclude that the main effect of fertilizer
type exists. In order to test the main effect of fertilizer type, as in the one-way
analysis of variance, we calculate the sum of squared distances from each average
yield $\bar{Y}_{\cdot j\cdot}$ of fertilizer type $j$ to the overall average yield
$\bar{Y}_{\cdot\cdot\cdot}$:

SSB $= 9\{(\bar{Y}_{\cdot 1\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2 + (\bar{Y}_{\cdot 2\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2 + (\bar{Y}_{\cdot 3\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2 + (\bar{Y}_{\cdot 4\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2\} = 1002.8889$

where the weight of 9 on each squared term is the number of data for each
fertilizer type. Since there are 4 fertilizer types, the degrees of freedom of SSB
is (4-1), and the sum of squares SSB divided by (4-1), 1002.8889/3 = 334.2963, is
called the mean square of factor B, MSB.
w Testing the interaction effect of rice and fertilizer (denoted factor AB) means
testing the following null hypothesis:
$H_0$: There is no interaction effect between rice type and fertilizer type.
w If the null hypothesis is rejected, we conclude that there is an interaction
effect between rice types and fertilizer types. In order to test the interaction
effect, we calculate the sum of squared terms obtained from each cell average
$\bar{Y}_{ij\cdot}$ by subtracting the average yield $\bar{Y}_{i\cdot\cdot}$ of
rice type $i$, subtracting the average yield $\bar{Y}_{\cdot j\cdot}$ of fertilizer
type $j$, and adding the overall average yield $\bar{Y}_{\cdot\cdot\cdot}$:

SSAB $= 3\sum_{i=1}^{3}\sum_{j=1}^{4}(\bar{Y}_{ij\cdot}-\bar{Y}_{i\cdot\cdot}-\bar{Y}_{\cdot j\cdot}+\bar{Y}_{\cdot\cdot\cdot})^2 = 588.9444$

where the weight of 3 on each squared term is the number of data in each cell of
rice and fertilizer type. The degrees of freedom of SSAB is (3-1)(4-1), and the
sum of squares SSAB divided by (3-1)(4-1), 588.9444/6 = 98.1574, is called the
mean square of the interaction AB, MSAB.
w It is not possible to test each effect immediately using these sums of squares;
the error sum of squares must also be calculated. To obtain it, first calculate
the total sum of squares, which is the sum of squared distances from each
observation to the overall average:

SST $= \sum_{i=1}^{3}\sum_{j=1}^{4}\sum_{k=1}^{3}(Y_{ijk}-\bar{Y}_{\cdot\cdot\cdot})^2 = 3267.5556$

This total sum of squares can be proven mathematically to be the sum of the other
sums of squares as follows:

SST = SSA + SSB + SSAB + SSE

Therefore, the error sum of squares can be calculated as follows:

SSE = SST - SSA - SSB - SSAB = 3267.5556 - 342.3889 - 1002.8889 - 588.9444 = 1333.3333
w If the yields for each rice type and fertilizer type are assumed to be normally
distributed with equal variances, the statistic obtained by dividing each mean
square by the error mean square follows an $F$ distribution. Therefore, the main
effects and the interaction effect can be tested using $F$ distributions. If the
model includes an interaction effect, we test it first. The testing results using
the 5% significance level are as follows:
① Testing the interaction effect of rice and fertilizer:

$F_{AB}$ = MSAB / MSE = (588.9444/6) / (1333.3333/24) = 98.1574 / 55.5556 = 1.77
$F_{6,24;\,0.05}$ = 2.51

Since $F_{AB}$ < $F_{6,24;\,0.05}$ = 2.51, we conclude that there is no
interaction. The interaction between rice and fertilizer seen in <Figure 9.3.2>
is so small that it is not statistically significant and may be due to random
error. The p-value of $F_{AB}$ calculated using『eStat』is 0.1488.
② Testing the main effect of rice type (factor A):

$F_{A}$ = MSA / MSE = (342.3889/2) / (1333.3333/24) = 171.1944 / 55.5556 = 3.08
$F_{2,24;\,0.05}$ = 3.40

Since $F_{A}$ < $F_{2,24;\,0.05}$ = 3.40, we cannot reject the null hypothesis
that the average yields of the rice types are the same. There is not enough
statistical evidence that average yields differ depending on rice type. The
p-value of $F_{A}$ calculated using『eStat』is 0.0644.
③ Testing the main effect of fertilizer type (factor B):

$F_{B}$ = MSB / MSE = (1002.8889/3) / (1333.3333/24) = 334.2963 / 55.5556 = 6.02
$F_{3,24;\,0.05}$ = 3.01

Since $F_{B}$ > $F_{3,24;\,0.05}$ = 3.01, we reject the null hypothesis that the
average yields of the fertilizer types are the same. There is enough statistical
evidence that average yields differ depending on fertilizer type. Since there is
no interaction effect by ①, we can conclude that fertilizer 1 produces a higher
yield than the other fertilizers. The p-value of $F_{B}$ calculated using『eStat』
is 0.0033.
w The result of the two-way analysis of variance is shown in Table 9.3.3.

Table 9.3.3 Two-way analysis of variance of yields by rice and fertilizer types

Factor          | Sum of Squares | Degrees of Freedom | Mean Squares | F value | p-value
Rice Type       | 342.3889       | 2                  | 171.1944     | 3.0815  | 0.0644
Fertilizer Type | 1002.8889      | 3                  | 334.2963     | 6.0173  | 0.0033
Interaction     | 588.9444       | 6                  | 98.1574      | 1.7668  | 0.1488
Error           | 1333.3333      | 24                 | 55.5556      |         |
Total           | 3267.5556      | 35                 |              |         |
4) If you press the [ANOVA F-test] button in the options window below <Figure 9.3.2>
of『eStat』, the two-dimensional table of means / standard deviations for each
level combination as in <Figure 9.3.3> and the two-way analysis of variance table as
in <Figure 9.3.4> will appear in the Log Area.
<Figure 9.3.3> Two dimensional mean / standard deviation table
<Figure 9.3.4> two-way analysis of variance table
Ÿ
Let us generalize the theory of the two-way analysis of variance discussed in the
example above. Let $Y_{ijk}$ be the random variable representing the $k$-th
observation at the $i$-th level of factor A, which has $a$ levels, and the $j$-th
level of factor B, which has $b$ levels. A statistical model of the two-way
analysis of variance is as follows:

$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$,
$i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, r$

$\mu$ : total mean
$\alpha_i$ : effect of the $i$-th level of factor A
$\beta_j$ : effect of the $j$-th level of factor B
$(\alpha\beta)_{ij}$ : interaction effect of the $i$-th level of factor A and the $j$-th level of factor B
$\epsilon_{ijk}$ : error terms which are independent and follow $N(0, \sigma^2)$
Ÿ
Assume that experiments are repeated $r$ times equally at the $i$-th level of
factor A and the $j$-th level of factor B, so the total number of observations is
$abr$. The total sum of squared distances from each observation to the total mean
$\bar{Y}_{\cdot\cdot\cdot}$ can be partitioned into the following sums of squares,
similar to the one-way analysis of variance.

Total sum of squares: SST $= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}(Y_{ijk}-\bar{Y}_{\cdot\cdot\cdot})^2$ ; degrees of freedom: $abr-1$

Factor A sum of squares: SSA $= br\sum_{i=1}^{a}(\bar{Y}_{i\cdot\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2$ ; degrees of freedom: $a-1$

Factor B sum of squares: SSB $= ar\sum_{j=1}^{b}(\bar{Y}_{\cdot j\cdot}-\bar{Y}_{\cdot\cdot\cdot})^2$ ; degrees of freedom: $b-1$

Interaction sum of squares: SSAB $= r\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{Y}_{ij\cdot}-\bar{Y}_{i\cdot\cdot}-\bar{Y}_{\cdot j\cdot}+\bar{Y}_{\cdot\cdot\cdot})^2$ ; degrees of freedom: $(a-1)(b-1)$

Error sum of squares: SSE $= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}(Y_{ijk}-\bar{Y}_{ij\cdot})^2$ ; degrees of freedom: $ab(r-1)$

Partition of Sum of Squares and degrees of freedom
Sum of Squares: SST = SSA + SSB + SSAB + SSE
Degrees of freedom: $abr-1 = (a-1) + (b-1) + (a-1)(b-1) + ab(r-1)$
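The partition above translates directly into code. The following Python function is a sketch (an illustration added here, not part of『eStat』) that computes each sum of squares from a balanced data array of shape (a, b, r):

```python
import numpy as np

def two_way_ss(y):
    """Sums of squares for a balanced two-way layout; y has shape (a, b, r)."""
    a, b, r = y.shape
    grand = y.mean()                      # overall mean
    mean_a = y.mean(axis=(1, 2))          # level means of factor A, shape (a,)
    mean_b = y.mean(axis=(0, 2))          # level means of factor B, shape (b,)
    mean_ab = y.mean(axis=2)              # cell means, shape (a, b)

    SST = ((y - grand) ** 2).sum()
    SSA = b * r * ((mean_a - grand) ** 2).sum()
    SSB = a * r * ((mean_b - grand) ** 2).sum()
    SSAB = r * ((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2).sum()
    SSE = SST - SSA - SSB - SSAB          # by the partition identity
    return SST, SSA, SSB, SSAB, SSE
```

For Example 9.3.1 the array would have shape (3, 4, 3). Computing SSE directly as the sum of $(Y_{ijk}-\bar{Y}_{ij\cdot})^2$ gives the same value, which is one way to verify the partition identity for a balanced layout.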

Ÿ
The two-way analysis of variance is summarized as Table 9.3.4.

Table 9.3.4 Two-way analysis of variance table

Factor      | Sum of Squares | Degrees of Freedom | Mean Squares                    | F value
Factor A    | SSA            | $a-1$              | MSA = SSA/($a-1$)               | $F_A$ = MSA/MSE
Factor B    | SSB            | $b-1$              | MSB = SSB/($b-1$)               | $F_B$ = MSB/MSE
Interaction | SSAB           | $(a-1)(b-1)$       | MSAB = SSAB/(($a-1$)($b-1$))    | $F_{AB}$ = MSAB/MSE
Error       | SSE            | $ab(r-1)$          | MSE = SSE/($ab(r-1)$)           |
Total       | SST            | $abr-1$            |                                 |
Two-way analysis of variance without repetition of experiments
If there is no repeated observation at each level combination of the two
factors, the interaction effect cannot be estimated, and the interaction
row is deleted from the two-way ANOVA table above. In this case, the
analysis of variance table is the same as that of the randomized block
design in Table 9.2.3.
Ÿ
Testing hypotheses for the main effects and the interaction effect of factor A
and factor B are as follows. If the model includes an interaction effect, it is
reasonable to test the interaction effect first, because the way the main effect
tests are interpreted depends on whether the interaction effect is significant.

1) $F$ test for the interaction effect:
$H_0$: $(\alpha\beta)_{11} = (\alpha\beta)_{12} = \cdots = (\alpha\beta)_{ab} = 0$
If $F_{AB}$ = MSAB/MSE > $F_{(a-1)(b-1),\; ab(r-1);\; \alpha}$, then reject $H_0$

2) $F$ test for the main effect of factor A:
$H_0$: $\alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$
If $F_A$ = MSA/MSE > $F_{a-1,\; ab(r-1);\; \alpha}$, then reject $H_0$

3) $F$ test for the main effect of factor B:
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_b = 0$
If $F_B$ = MSB/MSE > $F_{b-1,\; ab(r-1);\; \alpha}$, then reject $H_0$

(『eStat』calculates the p-value for each of these tests and tests using it.
That is, for each test, if the p-value is less than the significance level, the
null hypothesis $H_0$ is rejected.)
Ÿ
If the test for interaction effect is not significant, a test of the main effects of
each factor can be performed to test significant differences between levels.
However, if there is a significant interaction effect, the test for the main effects
of each factor is meaningless, so an analysis should be made on which level
combinations of factors show differences in the means.

Ÿ
If you conclude that significant differences exist between the levels of a
factor, as in the one-way analysis of variance, you can compare confidence
intervals at each level to see where the differences appear. A residual analysis
is also necessary to investigate the validity of the assumptions.

[Practice 9.3.1]
The result of an experiment at a production plant of an electronic component to
investigate the life of the product due to changes in temperature (two levels)
and humidity (two levels) is as follows. Analyze the data using the analysis of
variance with the 5% significance level.
(Unit: time)

              | Humidity 1     | Humidity 2
Temperature 1 | 6.29 6.38 6.25 | 5.80 5.92 5.78
Temperature 2 | 5.95 6.05 5.89 | 6.32 6.44 6.29
⇨ eBook ⇨ PR090301_LifeByTemperatureHumidity.csv
Design of experiments for the two-way analysis of variance
Even in the two-way analysis of variance, the sample data obtained at each
level combination of the two factors in engineering or agriculture can be
influenced by other factors, so care should be taken in sampling. In order
to accurately identify the differences that may exist between the levels of
a factor, it is advisable to minimize the influence of other factors. One
of the most commonly used methods of doing this is the completely
randomized design, which randomizes the entire set of experiments. There
are many other experimental design methods; for more information, refer to
references on the design of experiments with several factors.

Exercise
9.1 Complete the following ANOVA table.

Factor    | Sum of Squares | df | Mean Squares | F ratio
Treatment | 154.9199       | 4  | ________     | ______
Error     | ________       | __ | ________     |
Total     | 200.4773       | 39 |              |
9.2 Answer the following questions based on this ANOVA table.

Factor    | Sum of Squares | df | Mean Squares | F ratio
Treatment | 5.05835        | 2  | 2.52917      | 1.0438
Error     | 65.42090       | 27 | 2.4230       |
1) How many levels of treatment are compared?
2) How many total number of observations are there?
3) Can you conclude that the levels of treatment are significantly different with the
5% significance level? Why?
9.3 In order to test customers' responses to new products, four different exhibition
methods (A, B, C and D) were used by a company. Each exhibition method was used
in nine stores, selected from 36 stores that met the company's criteria. The
total sales for the weekend are shown in the following table.

Exhibition Method | Sales for the weekend in 9 stores (unit: 1000 USD)
A                 | 5  6  7  7  8  6  7  7  6
B                 | 2  2  2  3  3  2  3  3  2
C                 | 2  2  3  3  2  2  2  3  3
D                 | 6  6  7  8  8  8  6  6  6
1) Draw a scatter plot of sales (y axis) and exhibition method (x axis). Mark the
average sales of each exhibition method and connect them with a line.
2) Test that the sales by each exhibition method are different in the amount of sales
with the 5% significance level. Can you conclude that one of the exhibition
methods shows significant effect on sales?
9.4 The following table shows mileages in km per liter obtained from experiments to
compare three brands of gasoline. In this experiment, seven cars of the same type
were used in a similar situation to reduce the variation between cars.

Gasoline | Mileage in km / liter
A        | 14  19  19  16  15  17  20
B        | 20  21  18  20  19  19  18
C        | 20  26  23  24  23  25  23
1) Calculate the average mileage of each gasoline brand. Draw a scatter plot of gas
mileage (y axis) and gasoline brand (x axis) to compare.
2) From this data, test whether there are differences between gasoline brands in gas
mileage with the 5% significance level.
Chapter 9 Exercise / 63
9.5 The result of a survey on job satisfaction of three companies (A, B, and C) is as follows. Test
whether the averages of job satisfaction of the three companies are different with the 5%
significance level.
Company | Job satisfaction score
A       | 69 67 65 59 68 61 66
B       | 56 63 55 59 52 57
C       | 71 72 70 68 74
9.6 Psychologists were asked to investigate the job satisfaction of salespeople in three companies: A,
B and C. Ten salespeople were randomly selected from each company and a test to measure the
job satisfaction was conducted. Test scores are as follows. From this data, can we claim that the
average scores of the job satisfaction of three companies are different with the significance level
of 0.05?
Company | Job satisfaction score
A       | 67 65 59 59 58 61 66 53 51 64
B       | 66 68 55 59 61 66 62 65 64 74
C       | 87 80 67 89 80 84 78 65 72 85
9.7 An advertising agency experimented to find out the effects of various forms (A, B, C, D and E) of
TV advertising. Fifty television viewers were shown five forms of TV commercials for a cold
medicine in random order one by one. The effect of advertising after viewing was measured and
recorded as follows. Test an appropriate hypothesis with the 5% significance level.
Forms of TV Advertising | Measured effect of advertising
A                       | 20 23 21 23 26 24 26 23 20 24
B                       | 28 27 22 28 23 29 27 25 28 21
C                       | 33 34 25 26 27 33 25 32 25 34
D                       | 33 29 31 29 27 25 26 26 33 32
E                       | 49 41 41 39 41 48 43 43 46 35
9.8 The following is the result of an agronomist's survey of the yield of four varieties of wheat by using
the randomized block design of three cultivated areas (block). Test whether the mean yields of the
four wheats are the same or not with the 5% significance level.
           | Cultivated Area
Wheat Type | 1  | 2  | 3  | Average ($\bar{Y}_{i\cdot}$)
A          | 60 | 61 | 56 | 59
B          | 59 | 52 | 51 | 54
C          | 55 | 55 | 52 | 54
D          | 58 | 58 | 55 | 57
9.9 Answer the following questions based on the following ANOVA table.

Factor | Sum of Squares | df | Mean Squares | F value | p-value
A      | 12.3152        | 2  | 6.1575       | 29.4021 | < 0.005
B      | 19.7844        | 3  | 6.5948       | 31.4898 | < 0.005
AB     | 8.9416         | 6  | 1.4902       | 7.1159  | < 0.005
Error  | 10.0525        | 48 | 0.2094       |         |
Total  | 51.0938        | 59 |              |         |
1) What method of analysis was used?
2) What conclusions can be obtained from the above analysis table? The significance
level is 0.05.
9.10 Research was conducted to compare the job satisfaction of workers in the assembly process with
different working conditions. Another concern is the relationship between the job satisfaction and
years of service. Observers would like to investigate the interaction effect between the years of
service and working conditions. The following table shows the level of the job satisfaction obtained
from the survey. Analyze the data using an appropriate methodology.
                 | Working condition
Years of service | Good           | Fair           | Bad
< 5              | 12 15 15 14 12 | 10 10 9 10 9   | 8 7 7 8 6
5 - 10           | 12 14 12 10 11 | 10 10 14 14 10 | 10 11 12 10 14
11 or more       | 9 10 9 9 10    | 10 11 10 10 12 | 12 14 15 15 15
9.11 The following table shows the level of anxiety induced by work stress among 27
workers, classified by years of service and job-induced pressure. Analyze the data
using the analysis of variance with the 5% significance level.

                            | Job-induced pressure (Factor B)
Years of service (Factor A) | Good     | Fair     | Bad
< 5                         | 25 28 22 | 18 23 19 | 17 24 19
5 - 10                      | 28 32 30 | 16 24 20 | 18 22 20
11 or more                  | 25 35 30 | 14 16 15 | 10 8 12

9.12 A fertilizer manufacturer hired a research team to study the yields of three grain
seeds (A, B, C) and three types of fertilizer (1, 2, 3). Three grain seeds in
combination with three types of fertilizer were used, and the experiment was
repeated three times at each combination of treatments. Each combination of
treatments was randomly assigned to 27 different regions. Analyze the data using
the analysis of variance with the 5% significance level.

          | Fertilizer type
Seed type | 1        | 2        | 3
A         | 5 8 7    | 8 8 10   | 10 9 10
B         | 6 8 6    | 10 12 11 | 15 14 14
C         | 7 8 10   | 12 12 14 | 16 10 18
9.13 The result of an experiment at a production plant of an electronic component to
investigate the life of the product due to changes in temperature (two levels) and
humidity (two levels) is as follows. Analyze the data using the analysis of
variance with the 5% significance level.

(Unit: time)

              | Humidity 1     | Humidity 2
Temperature 1 | 6.29 6.38 6.25 | 5.80 5.92 5.78
Temperature 2 | 5.95 6.05 5.89 | 6.32 6.44 6.29
9.14 The result of a fertilizer manufacturer's experiment on the production of soybeans
from two seeds using three types of fertilizer (A, B, and C) is as follows. Each
fertilizer and seed combination was tested four times. Analyze the data using the
analysis of variance with the 5% significance level.

Fertilizer | Seed 1      | Seed 2
A          | 5 8 7 6     | 8 6 8 10
B          | 8 8 10 10   | 12 11 12 14
C          | 10 12 10 10 | 14 16 16 18

Multiple Choice Exercise
9.1 Who first announced the ANOVA method?
① Laspeyres
③ Fisher
② Paasche
④ Edgeworth
9.2 What is the abbreviation of the analysis of variance?
① ANOVA
②
③
④
9.3 Which areas are not the area of application for the analysis of variance?
① marketing survey
③ economy forecasting
② quality control
④ medical experiment
9.4 Which sampling distribution is used for the analysis of variance?
①  distribution
③  distribution

②  distribution
④ Normal distribution
9.5 Which is the correct process for the one-way ANOVA?
a. Calculate Total SS, Treatment SS, Error SS
b. Set the hypothesis
c. Test the hypothesis
d. Calculate the variance ratio in the ANOVA table
e. Find the value in the F distribution table
①a→b→c→d→e
③b→a→d→e→c
②b→d→e→a→c
④b→e→d→a→c
9.6 Which is the correct relationship between the total sum of squares (SST), between sum of squares
(SSB), error sum of squares (SSE)?
① SST = SSB + SSE
③ SST = SSE - SSB
② SST = SSB - SSE
④ SST = SSB * SSE
9.7 If    and the observed  ratio is 6.90 in the ANOVA table, what is your conclusion
with the 5% significance level?
① significantly different
③ very similar
② no significant difference
④ unknown
9.8 Which does not appear in the analysis of variance table?
① sum of squares
③ degrees of freedom
② F ratio
④ standard deviation
Chapter 9 Multiple Choice Exercise / 67

9.9 What is the name of the variable which affects the response variable in an experimental design?
① cause element
③ dependent variable
② independent variable
④ factor
9.10 In order to compare the fuel mileage of three types of cars, three drivers would like to drive cars,
but fuel mileage may be affected by the driver. What is the name of variable like drivers?
① block variable
③ dependent variable
② independent variable
④ factor
9.11 When we compare the fuel mileage of three types of cars, which experimental design is used to
reduce the effect of drivers?
① completely randomized design
② latin square method
③ two-way ANOVA
④ randomized block design
9.12 What is called the effect of a factor A that varies depending on the level of the factor B?
① main effect of factor A
③ two-way ANOVA
② main effect of factor B
④ interaction effect
(Answers)
9.1 ③, 9.2 ①, 9.3 ③, 9.4 ②, 9.5 ③, 9.6 ①, 9.7 ①, 9.8 ④, 9.9 ④, 9.10 ①,
9.11 ④, 9.12 ④
10
Nonparametric Testing
Hypothesis
SECTIONS
CHAPTER OBJECTIVES
10.1 Nonparametric Test for Location of
Single Population
10.1.1 Sign Test
10.1.2 Wilcoxon Signed Rank Sum Test
The hypothesis tests in Chapters 7 through
9 are based on assumptions such as that the
populations of continuous data follow
normal distributions. However, in real-world
data, such assumptions may not be satisfied.
10.2 Nonparametric Test for Comparing
Locations of Two Populations
10.2.1 Independent Samples: Wilcoxon
Rank Sum Test
10.2.2 Paired Samples: Wilcoxon Signed
Rank Sum Test
This chapter introduces nonparametric
methods for testing hypotheses, which
convert data into forms such as ranks and
do not require assumptions on the
population distribution.
10.3 Nonparametric Test for Comparing
Locations of Several Populations
10.3.1 Completely Randomized Design:
Kruskal-Wallis Test
10.3.2 Randomized block design:
Friedman Test
Section 10.1 introduces tests for the location
parameter of single population such as the
Sign Test and Signed Rank Test.
Section 10.2 introduces tests for comparing
location parameters of two populations such
as the Wilcoxon Rank Sum Test.
Section 10.3 introduces tests for comparing
location parameter of several populations
such as the Kruskal-Wallis Test and Friedman
Test.
70
/ Chapter 10 Nonparametric Testing Hypothesis

10.1 Nonparametric Test for the Location Parameter of Single
Population
Ÿ
The hypothesis test for a population mean in Chapter 7 can be done using the t
distribution in the case of a small sample if the population is assumed to follow
a normal distribution. If we make some assumptions about a population
distribution and test a population parameter using sample data, it is called a
parametric test. The hypothesis tests for two population parameters in Chapter 8
and the analysis of variance in Chapter 9 are also parametric tests, because they
assume that the populations follow normal distributions.
Ÿ
However, for real-world data it may not be appropriate to assume that a
population follows a normal distribution, or there may not be enough samples to
justify such an assumption. In some cases the data collected are not continuous
but ordinal, such as ranks, and then the parametric tests are not appropriate. In
such cases, methods that test population parameters by converting the data into
signs or ranks, without assumptions on the population distribution, are called
distribution-free or nonparametric tests.
Ÿ
Since a nonparametric test utilizes converted data such as signs or ranks, there
may be some loss of information about the data. Therefore, if a population can be
assumed to follow a normal distribution, there is no reason to use a
nonparametric test. In fact, when a population follows a normal distribution, a
nonparametric test has a higher probability of a type 2 error at the same
significance level. However, a nonparametric test is more appropriate if the data
come from a population that does not follow a normal distribution.
Ÿ
The hypothesis test for a population mean in Chapter 7 is based on the central
limit theorem for the sampling distribution of all possible sample means. The
nonparametric tests instead use signs, by examining whether data values are
smaller or larger than the central location parameter of the population (the Sign
Test of Section 10.1.1), or use ranks, by calculating the ranking of the data
(the Wilcoxon Signed Rank Test of Section 10.1.2). Here the central location
parameter can be the population mean or the population median, but it usually
refers to the population median, which is not affected by extreme data points.
Ÿ
Estimation of a population parameter can also be made using nonparametric
methods, but this chapter only introduces nonparametric hypothesis tests. Those
interested in nonparametric estimation should refer to the relevant literature.
10.1.1 Sign Test
Ÿ
Let us take a look at the sign test with the following example.

Example 10.1.1
A bag of cookies is marked with a weight of 200g. Ten bags are randomly selected
from several retailers and examined their weights as follows. Can you say that there
are as many cookies in the bag as the weight marked?
203 204 197 195 201 205 198 199 194 207
⇨ eBook ⇨ EX100101_CookieWeight.csv
1) Draw a histogram of the data to check whether a testing hypothesis using a
parametric method can be performed.
2) Test the hypothesis by using a nonparametric method which utilizes the sign data by
examining whether data values are smaller or larger than 200 with the significance
level of 5%.
3) Check the result of the above test using『eStatU』.
10.1 Nonparametric Test for the Location Parameter of Single Population / 71

Example 10.1.1
Answer
1) The null and alternative hypotheses to test the population mean $\mu$ can be
written as follows:
$H_0$: $\mu$ = 200,    $H_1$: $\mu$ ≠ 200
In order to test this hypothesis using the parametric t-test of Chapter 7, it is
necessary to assume that the population is normally distributed, because the
sample size of 10 is small. Let us check whether the sample data look normally
distributed by using a histogram. Enter the data in『eStat』as shown in
<Figure 10.1.1>.

<Figure 10.1.1> Data input for cookie weight
w Click the icon of the testing hypothesis for the population mean and select
‘Weight’ as the analysis variable in the variable selection box. A dot graph with
the 95% confidence interval will appear as in <Figure 10.1.2>. If you click the
[Histogram] button in the options window below the graph, a histogram as shown in
<Figure 10.1.3> will appear. The histogram does not give sufficient support to
assume that the population follows a normal distribution. In such cases, applying
a parametric hypothesis test may lead to errors.

<Figure 10.1.2> Dot graph of the cookie weight
<Figure 10.1.3> Histogram of the cookie weight
2) In this case, the sample data can be converted to sign data by examining only
whether the weight of each cookie bag is greater than 200g (marked +) or less
than 200g (marked −).

sample data | 203 204 197 195 201 205 198 199 194 207
sign data   |  +   +   −   −   +   +   −   −   −   +
If the numbers of + signs and − signs are similar, the median weight of a cookie
bag would be approximately 200g. If the number of + signs is larger than the
number of − signs, the weight of a cookie bag tends to be greater than 200g; if
the number of − signs is larger, it tends to be less than 200g.
w Since the sign data only record whether an observation is larger or smaller than
200 and never use the concept of the mean, the procedure can be considered a test
of the population median ($M$) as follows:
$H_0$: $M$ = 200,    $H_1$: $M$ ≠ 200
w In the sign data above, ‘the number of + signs’ (denote it $N_+$) or ‘the number
of − signs’ (denote it $N_-$) follows a binomial distribution with parameters
$n$ = 10, $p$ = 0.5 (<Figure 10.1.4>).
<Figure 10.1.4> Binomial distribution when =10, =0.5
w Therefore, if $H_0$ is correct, the number of + signs is most likely to be
around 5, and values such as 0, 1 or 9, 10 are very unlikely. In order to test
$H_0$: $M$ = 200 with the 5% significance level, since it is a two-sided test,
the rejection region should have 2.5% probability at both ends of the binomial
distribution, so it is approximately as follows:

If the number of + signs ($N_+$) is either 0, 1 (cumulative probability from the
left is 0.011) or 9, 10 (cumulative probability from the right is 0.011), then
reject $H_0$.

This rejection region has a total probability of 2×0.011 = 0.022, which is
smaller than the significance level of 0.05. When we use a discrete distribution
such as the binomial, it may be impossible to find a rejection region whose
probability exactly equals the significance level. If we include one more value
on each side of the rejection region, the decision rule is as follows:

If the number of + signs ($N_+$) is either 0, 1, 2 (cumulative probability from
the left is 0.055) or 8, 9, 10 (cumulative probability from the right is 0.055),
then reject $H_0$.

This rejection region has a total probability of 2×0.055 = 0.110, which is
greater than the significance level of 0.05. Therefore, the middle values 1.5
(between 1 and 2) and 8.5 (between 8 and 9) can be used in the decision rule as
follows:

If the number of + signs ($N_+$) < 1.5 or $N_+$ > 8.5, then reject $H_0$.
This method is also approximate. In the case of testing using a discrete
distribution, it is not possible to say which of the above decision rules is
'right'; the analyst should select a critical value near the significance level.
In this example, the number of + signs ($N_+$) is 5, so we cannot reject the
null hypothesis $H_0$. In other words, the median weight of the cookie bags can
be regarded as 200g.
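The binomial probabilities used above, and the resulting decision, can be checked with a short computation. The following is a sketch in Python using scipy (an illustration added here; the data and the cutoffs 1.5 and 8.5 are the ones from this example):

```python
from scipy.stats import binom

n, p = 10, 0.5   # number of bags; probability of a + sign under H0

# Cumulative probabilities at the candidate cutoffs
print(round(binom.cdf(1, n, p), 3))   # P(N+ <= 1) = 0.011
print(round(binom.cdf(2, n, p), 3))   # P(N+ <= 2) = 0.055

# Sign test of H0: median = 200 for the cookie data
data = [203, 204, 197, 195, 201, 205, 198, 199, 194, 207]
n_plus = sum(x > 200 for x in data)   # number of + signs
reject = (n_plus < 1.5) or (n_plus > 8.5)
print(n_plus, reject)                 # 5 False -> cannot reject H0
```

By symmetry of B(10, 0.5), the right-tail probabilities from 9 and from 8 equal the left-tail values printed above.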
3) Enter the data in『eStatU』as shown in <Figure 10.1.5> and press the [Execute] button to show the test result as in <Figure 10.1.6>. The graph shows the critical lines for the values containing the 5% significance level (2.5% in each tail). For a discrete distribution such as the binomial, the choice of the final rejection region should be determined by the analyst.

<Figure 10.1.5> Data input for the sign test in『eStatU』

<Figure 10.1.6> Result of the sign test using『eStatU』
[Practice 10.1.1]
A psychologist randomly selected 9 workers with disabilities from among production workers employed at various factories in a large industrial complex, and their work competency scores were examined as follows. The psychologist wants to test whether the population median score is 40. Assume the population distribution is symmetric about the mean.

32, 52, 21, 39, 23, 55, 36, 27, 37

⇨ eBook ⇨ PR100101_CompetencyScore.csv

1) Check whether a parametric test is possible.
2) Apply the sign test with the significance level of 5%.
74
/ Chapter 10 Nonparametric Testing Hypothesis

• When the population median is M, the sign test tests whether M = M0 against M > M0 (or M < M0, or M ≠ M0). If the population distribution is symmetric about the mean, the sign test is also a test of the population mean, because the mean and the median coincide in that case.

• When there are n sample observations, the test statistic for the sign test is the number of observations greater than M0, denoted K. Under the null hypothesis, the random variable K = 'number of + signs' follows a binomial distribution with parameters n and p = 0.5, i.e., B(n, 0.5). One can equally use the number of observations less than M0, K' = n − K, which also follows B(n, 0.5); we use K in this section. Let b_α(n) denote the right-tail 100 × α percentile of B(n, 0.5); the exact percentile value may not exist because the distribution is discrete, and in that case the middle value of the two nearest attainable values is often used. Table 10.1.1 summarizes the decision rule for each type of hypothesis of the sign test.
Table 10.1.1 Decision rule of the sign test

Test statistic: K = 'number of + sign data'

Type of Hypothesis                   Decision Rule
1) H0: M = M0  vs  H1: M > M0        If K > b_α(n), then reject H0, else accept H0
2) H0: M = M0  vs  H1: M < M0        If K < b_(1−α)(n), then reject H0, else accept H0
3) H0: M = M0  vs  H1: M ≠ M0        If K > b_(α/2)(n) or K < b_(1−α/2)(n), then reject H0, else accept H0

☞ What if an observed value is the same as M0? Observations equal to M0 are not used in the sign test; in other words, reduce n accordingly.
• As studied in Chapter 5, the binomial distribution B(n, 0.5) can be approximated by the normal distribution N(n/2, n/4) if n is sufficiently large. Therefore, if the sample size is large, the test statistic K = 'number of + sign data' can be tested using the normal distribution N(n/2, n/4). Table 10.1.2 summarizes the decision rule for each hypothesis of the sign test in the case of large samples.
Table 10.1.2 Decision rule of the sign test (large sample case)

Test statistic: Z = (K − n/2) / √(n/4), where K = 'number of + sign data'

Type of Hypothesis                   Decision Rule
1) H0: M = M0  vs  H1: M > M0        If Z > z_α, then reject H0, else accept H0
2) H0: M = M0  vs  H1: M < M0        If Z < −z_α, then reject H0, else accept H0
3) H0: M = M0  vs  H1: M ≠ M0        If |Z| > z_(α/2), then reject H0, else accept H0
10.1.2 Wilcoxon Signed Rank Sum Test

• The sign test described in the previous section converts the sample data to + or − signs according to whether each observation is larger or smaller than the median M0. In doing so, most of the information in the original data is lost. To apply the Wilcoxon signed rank sum test, we first subtract M0 from each observation and take the absolute value. Ranks are assigned to these absolute values, and we calculate the sum of the ranks of the observations greater than M0 and the sum of the ranks of the observations smaller than M0. If the two rank sums are similar, we conclude that the population median is equal to M0. This signed rank sum test is the most widely used nonparametric method for testing the central location parameter of a population: it takes into account not only whether each observation is larger or smaller than M0, but also the relative magnitudes of the deviations.

Example 10.1.2
Using the cookie weight data of [Example 10.1.1], apply the signed rank sum test to see whether the median weight of the cookie bags is 200g or not with the significance level of 5%.

203 204 197 195 201 205 198 199 194 207

⇨ eBook ⇨ EX100101_CookieWeight.csv

Check the result of the signed rank sum test using『eStatU』.
Example 10.1.2
Answer

The hypothesis for this problem is to test whether the population median (M) is 200g or not:

H0: M = 200
H1: M ≠ 200

The signed rank sum test not only checks whether each observation is greater than M0 = 200g (+ sign) or not (− sign), but also ranks the values of |data − 200|. If there are tied values, the average rank is assigned to each of them. For example, since there are two tied values of '1', which is the smallest among the |data − 200| values, the corresponding ranks 1 and 2 are averaged to 1.5, and this average rank is assigned to each value '1'.
Sample data   Sign   |data − 200|   Rank of |data − 200|   Ranks of '+' sign
203           +      3              4.5                    4.5
204           +      4              6                      6
197           −      3              4.5
195           −      5              7.5
201           +      1              1.5                    1.5
205           +      5              7.5                    7.5
198           −      2              3
199           −      1              1.5
194           −      6              9
207           +      7              10                     10
                                    Rank sum of '+' sign:  W+ = 4.5 + 6 + 1.5 + 7.5 + 10 = 29.5
The sum of all ranks is 1 + 2 + ⋯ + 10 = 55. If the rank sum of the + sign data (W+) and the rank sum of the − sign data (W−) are similar (around 55/2 = 27.5), the null hypothesis M = 200g would be plausible. In this example W+ = 29.5 and W− = 25.5. Since W+ is greater than W−, the weights greater than 200g appear to be dominant. But how large a difference is statistically significant?
To investigate how large a value is statistically significant when the null hypothesis is true, the sampling distribution of the random variable W+ = 'rank sum of + sign data' (or W− = 'rank sum of − sign data') must be known. If H0 is true, the number of possible cases for W+ is 2^10 = 1024, as shown in Table 10.1.3. It is not easy to examine all of these possible rankings by hand to create a distribution table.『eStatU』shows the distribution of the Wilcoxon signed rank sum as in <Figure 10.1.7> and its table as in Table 10.1.4.
Table 10.1.3 All possible cases of W+ = 'rank sum of + sign data'

Number of data    All possible combinations       All possible rank sums
with + sign       of ranks                        of W+
0                 { }                             0
1                 {1}, {2}, ⋯ , {10}              1, 2, ⋯ , 10
2                 {1,2}, {1,3}, ⋯ , {1,10},       3, 4, ⋯ , 11,
                  {2,3}, ⋯ , {2,10},              5, ⋯ , 12,
                  ⋯                               ⋯
                  {9,10}                          19
⋯                 ⋯                               ⋯
10                {1,2, ⋯ ,10}                    55
<Figure 10.1.7> Distribution of Wilcoxon signed rank sum when n = 10

Table 10.1.4 Distribution of Wilcoxon signed rank sum when n = 10

Wilcoxon Signed Rank Sum Distribution, n = 10
x     P(X = x)   P(X ≤ x)   P(X ≥ x)
0     0.0010     0.0010     1.0000
1     0.0010     0.0020     0.9990
2     0.0010     0.0029     0.9980
3     0.0020     0.0049     0.9971
4     0.0020     0.0068     0.9951
5     0.0029     0.0098     0.9932
6     0.0039     0.0137     0.9902
7     0.0049     0.0186     0.9863
8     0.0059     0.0244     0.9814
9     0.0078     0.0322     0.9756
⋯     ⋯          ⋯          ⋯
47    0.0059     0.9814     0.0244
48    0.0049     0.9863     0.0186
49    0.0039     0.9902     0.0137
50    0.0029     0.9932     0.0098
51    0.0020     0.9951     0.0068
52    0.0020     0.9971     0.0049
53    0.0010     0.9980     0.0029
54    0.0010     0.9990     0.0020
55    0.0010     1.0000     0.0010
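The table can be reproduced by brute force: under H0 every subset of the ranks {1, …, 10} is equally likely to be the set of '+' ranks, so each of the 2^10 = 1024 subsets has probability 1/1024. A small sketch:

```python
from itertools import combinations

n = 10
counts = {}  # counts[s] = number of subsets of {1,...,n} with rank sum s
for k in range(n + 1):
    for subset in combinations(range(1, n + 1), k):
        s = sum(subset)
        counts[s] = counts.get(s, 0) + 1

total = 2 ** n  # 1024 equally likely sign patterns under H0
p_le_8 = sum(c for s, c in counts.items() if s <= 8) / total
print(round(p_le_8, 4))  # matches P(X <= 8) = 0.0244 in Table 10.1.4
```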
Since it is a two-sided test with the 5% significance level, we look for 2.5% percentiles at both ends: P(X ≤ 8) = 0.0244 and P(X ≥ 47) = 0.0244. For a discrete distribution we cannot find exact 2.5% percentiles, so the decision rule can be written as follows:

'If W+ ≤ 8 or W+ ≥ 47, then reject H0'

Since W+ = 29.5 in this problem, we cannot reject H0.
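scipy offers a ready-made version of this test; a sketch is shown below. (Because the data contain ties, scipy falls back to a normal approximation rather than the exact table, so its p-value differs slightly from the exact calculation above.)

```python
from scipy.stats import wilcoxon

weights = [203, 204, 197, 195, 201, 205, 198, 199, 194, 207]
diffs = [w - 200 for w in weights]

# Two-sided Wilcoxon signed rank sum test of H0: median difference = 0
res = wilcoxon(diffs, alternative='two-sided')
print(res.pvalue)  # well above 0.05, consistent with not rejecting H0
```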
After entering the data in『eStatU』as in <Figure 10.1.8>, pressing the [Execute] button will calculate the sample statistics and show the test result as in <Figure 10.1.9>. The critical lines are the values containing the 5% significance level from both sides (the probability at each end is 2.5%). For a discrete distribution, the choice of the final rejection region should be determined by the analyst.

<Figure 10.1.8>『eStatU』Signed rank sum test

<Figure 10.1.9> Signed rank sum test in『eStatU』

The signed rank sum test may also be done using『eStat』. If you enter the data as shown in <Figure 10.1.10>, select 'Weight' as the analysis variable in the variable selection box and click the icon for testing the population mean, a dot graph with the 95% confidence interval for the population mean will appear as in <Figure 10.1.11>.
<Figure 10.1.10> Data input for cookie weight

<Figure 10.1.11> Dot graph and confidence interval of cookie weight

Enter the value 200 in the options below the graph and click the [Wilcoxon Signed Rank Sum Test] button to display the test result graph and result table as in <Figure 10.1.12>.

<Figure 10.1.12> Result of the Wilcoxon Signed Rank Sum Test
• If we denote the population median as M, the signed rank sum test tests whether the population median equals M0 or is greater than (or less than, or not equal to) M0. However, if the population distribution is symmetric about the mean, the signed rank sum test becomes a test about the population mean, because the population median and mean are the same. The basic statistical model is as follows:

X_i = M + ε_i ,   i = 1, 2, ⋯ , n

where the ε_i are independent, symmetric about 0 and follow the same distribution.

• If X_1, X_2, ⋯ , X_n are the sample data, the ranks of |X_i − M0| are calculated first, and the sum of the ranks of the data greater than M0 (the + sign data with X_i − M0 > 0), denoted W+, is calculated. W+ is the test statistic for the signed rank sum test, and its sampling distribution, denoted W(n), is obtained for testing hypotheses by considering all possible cases.『eStatU』provides W(n) and its table for small sample sizes. Let w_α(n) denote the right-tail 100 × α percentile of the W(n) distribution; the exact percentile may not exist because W(n) is a discrete distribution, and the middle value of the two adjacent attainable values is usually used as an approximation. Table 10.1.5 summarizes the decision rule of the Wilcoxon signed rank sum test for each type of hypothesis.
Table 10.1.5 Decision rule of the Wilcoxon signed rank sum test

Test statistic: W+ = rank sum of the + sign data of X_i − M0

Type of Hypothesis                   Decision Rule
1) H0: M = M0  vs  H1: M > M0        If W+ > w_α(n), then reject H0, else accept H0
2) H0: M = M0  vs  H1: M < M0        If W+ < w_(1−α)(n), then reject H0, else accept H0
3) H0: M = M0  vs  H1: M ≠ M0        If W+ > w_(α/2)(n) or W+ < w_(1−α/2)(n), then reject H0, else accept H0

☞ What if an observed value is the same as M0? Observations equal to M0 are not used in the test; in other words, reduce n accordingly.
[Practice 10.1.2]
A psychologist randomly selected 9 workers with disabilities from among production workers employed at various factories in a large industrial complex, and their work competency scores were examined as follows. The psychologist wants to test whether the population median score is 45. Assume the population distribution is symmetric about the mean.

32, 52, 21, 39, 23, 55, 36, 27, 37

⇨ eBook ⇨ PR100101_CompetencyScore.csv

1) Check whether a parametric test is possible.
2) Apply the Wilcoxon signed rank sum test with the significance level of 5%.
3) Compare this test result with the sign test of [Practice 10.1.1].
• If the sample size is large enough, the test statistic W+ is approximately normal with the following mean E(W+) and variance V(W+) when the null hypothesis is true:

E(W+) = n(n+1)/4
V(W+) = n(n+1)(2n+1)/24
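These moments can be checked by enumerating all sign patterns for a small n (an independent sketch, not part of the text's derivation):

```python
from itertools import product

n = 8
sums = []
# Under H0 each of the 2^n sign patterns is equally likely
for signs in product([0, 1], repeat=n):
    sums.append(sum(r for r, s in zip(range(1, n + 1), signs) if s))

mean = sum(sums) / len(sums)
var = sum((s - mean) ** 2 for s in sums) / len(sums)
print(mean, var)  # n(n+1)/4 = 18.0 and n(n+1)(2n+1)/24 = 51.0
```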
• Table 10.1.6 summarizes the decision rule of the signed rank sum test for each type of hypothesis in the large sample case.
Table 10.1.6 Decision rule of the Wilcoxon signed rank sum test (large sample case)

Test statistic: Z = (W+ − n(n+1)/4) / √(n(n+1)(2n+1)/24)

Type of Hypothesis                   Decision Rule
1) H0: M = M0  vs  H1: M > M0        If Z > z_α, then reject H0, else accept H0
2) H0: M = M0  vs  H1: M < M0        If Z < −z_α, then reject H0, else accept H0
3) H0: M = M0  vs  H1: M ≠ M0        If |Z| > z_(α/2), then reject H0, else accept H0
• The distribution W(n) does not depend on the population distribution; in other words, the Wilcoxon signed rank sum test is a distribution-free test. For example, if n = 3, the distribution W(3) can be obtained by listing all 2³ = 8 sign patterns:

Sign of rank 1   Sign of rank 2   Sign of rank 3   Value of W+
−                −                −                0
+                −                −                1
−                +                −                2
−                −                +                3
+                +                −                3
+                −                +                4
−                +                +                5
+                +                +                6
• Therefore, the distribution W(3) is given as follows, regardless of the population distribution:

w            0     1     2     3     4     5     6
P(W+ = w)   1/8   1/8   1/8   2/8   1/8   1/8   1/8
• If there are ties among the values |X_i − M0|, the average rank is assigned when the ranking is obtained. In this case, for large samples, the variance of W+ is calculated with the following modified formula:

V(W+) = n(n+1)(2n+1)/24 − Σ_j t_j(t_j − 1)(t_j + 1)/48

Here g = (number of tie groups) and t_j = (size of the j-th tie group, i.e., the number of observations in that group); if there are no ties, every tie group has size 1 and the correction term is 0.
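A small sketch of the corrected variance (using the tie-correction formula as given above):

```python
from collections import Counter

def signed_rank_variance(abs_devs):
    """Large-sample variance of W+ with the tie correction applied."""
    n = len(abs_devs)
    base = n * (n + 1) * (2 * n + 1) / 24
    # t_j = size of each group of tied absolute deviations
    correction = sum(t * (t - 1) * (t + 1) / 48
                     for t in Counter(abs_devs).values())
    return base - correction

# With no ties the correction term vanishes: 4*5*9/24 = 7.5
print(signed_rank_variance([1, 2, 3, 4]))
# Cookie example: |data - 200| has three tied pairs, so 96.25 - 0.375
print(signed_rank_variance([3, 4, 3, 5, 1, 5, 2, 1, 6, 7]))
```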
10.2 Nonparametric Test for Location Parameters of Two Populations / 81

10.2 Nonparametric Test for Location Parameters of Two Populations

• The tests of two population means in Chapter 8 used the t-distribution for small samples, provided that each population could be assumed to follow a normal distribution. However, the assumption of a normal population may not be appropriate for real-world data, or there may not be enough sample data to justify it. Also, if the collected data are ordinal, such as rankings, the parametric t-test is not appropriate. In such cases, a nonparametric method is used to test parameters by converting the data to ranks, without assuming a distribution for the population. This section introduces the Wilcoxon rank sum test.

• Nonparametric tests convert data into ranks, so some information in the data may be lost. Therefore, if the data are normally distributed, there is no reason to apply a nonparametric test. However, a nonparametric method is more appropriate when the data do not follow a normal distribution.

• As in Chapter 8, this section introduces nonparametric tests for the location parameters of two populations, both for samples drawn independently from each population and for paired samples.
10.2.1 Independent Samples: Wilcoxon Rank Sum Test

• Let's take a look at the Wilcoxon rank sum test with the following example.

Example 10.2.1
A professor teaches statistics courses to students in the Department of Economics and the Department of Management. In order to compare the exam scores of students in the two departments, seven students were randomly sampled from the Economics Department and six students from the Management Department, and their scores were as follows:

Department of Economics    87 75 65 95 90 81 93
Department of Management   57 85 90 83 87 71

⇨ eBook ⇨ EX100201_ScoreByDepartment.csv

1) Draw a histogram of the data to verify whether the test can be performed using a parametric method.
2) Apply the Wilcoxon rank sum test with the significance level of 5%.
3) Check the result of the Wilcoxon rank sum test using『eStat』.
Example 10.2.1
Answer

1) The hypotheses of this problem, for the two population means μ1 and μ2, are as follows:

H0: μ1 = μ2
H1: μ1 ≠ μ2

Since the sample sizes n1 = 7 and n2 = 6 from each population are small, it is necessary to assume that the populations are normally distributed in order to apply the parametric t-test. In order to check whether each sample follows a normal distribution, let us draw a histogram using『eStat』. Enter the data in『eStat』as shown in <Figure 10.2.1>.
<Figure 10.2.1> Data input at『eStat』
Click the icon for testing two population means in the main menu. Select 'Score' as the 'Analysis Var' and 'Dept' as the 'By Group' variable. Then two dot graphs, together with 95% confidence intervals for each population mean, will appear as in <Figure 10.2.2>. The average score of students in the Economics Department appears to be higher than that of students in the Management Department, but the difference should be tested for statistical significance. Pressing the [Histogram] button in the options window below the graph reveals the histograms and normal distribution curves for each department as in <Figure 10.2.3>.

<Figure 10.2.2> Dot graph and confidence interval by department

<Figure 10.2.3> Histogram by department
2) Looking at the histograms, the small number of observations is not sufficient to assume that each population follows a normal distribution. In such a case, applying the parametric t-test may lead to errors. A nonparametric test instead examines a location parameter of the population, such as the median, which is not so sensitive to extreme values. The hypotheses for this problem test whether the medians M1 and M2 of the two populations are equal:

H0: M1 = M2
H1: M1 ≠ M2

The Wilcoxon rank sum test first ranks all observations in the combined sample and then calculates the sum of the ranks in each sample. If there is a tie, the average rank is used. To obtain the ranks of the combined sample, it is convenient to arrange each sample in ascending order as shown in Table 10.2.1. The rank sums W1 and W2 of the two samples serve as the basis of the test statistic for the Wilcoxon rank sum test.
Table 10.2.1 A table to calculate ranks in a combined sample

Sorted Data    Sorted Data    Ranks of     Ranks of
of Sample 1    of Sample 2    Sample 1     Sample 2
               57                          1
65                            2
               71                          3
75                            4
81                            5
               83                          6
               85                          7
87                            8.5
               87                          8.5
90                            10.5
               90                         10.5
93                            12
95                            13
Sum of ranks                  W1 = 55      W2 = 36
The sum of all ranks is 1 + 2 + ⋯ + 13 = 91. The rank sum of sample 1 is W1 = 55 and that of sample 2 is W2 = 36; note that W1 + W2 = 91. If W1 and W2 are similar (after allowing for the different sample sizes), the null hypothesis that the two population medians are the same is plausible. In this example W1 is larger than W2, so the median of population 1 appears to be larger than the median of population 2. But how much difference in the rank sums is statistically significant when the sample sizes are taken into account?
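The rank sums can be sketched as follows (scipy's `rankdata` assumed; it assigns average ranks to ties):

```python
from scipy.stats import rankdata

econ = [87, 75, 65, 95, 90, 81, 93]   # sample 1
mgmt = [57, 85, 90, 83, 87, 71]       # sample 2

ranks = rankdata(econ + mgmt)          # ranks in the combined sample
w1 = sum(ranks[:len(econ)])
w2 = sum(ranks[len(econ):])
print(w1, w2)  # 55.0 and 36.0; together 1 + 2 + ... + 13 = 91
```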
To investigate how large a difference in the rank sums is statistically significant when the null hypothesis is true, the sampling distribution of the random variable W = 'rank sum of sample 2' (or W1 = 'rank sum of sample 1') must be known. If H0 is true, the number of possible cases for W is 13C6 = 1716, as shown in Table 10.2.2. It is not easy to examine all of these possible rankings by hand to find the distribution table.『eStatU』provides the Wilcoxon rank sum distribution and its table as shown in <Figure 10.2.4>.
Table 10.2.2 All possible ranks for the six data of sample 2 when n = 13

All possible combinations of ranks    Sum of ranks, W
{1,2,3,4,5,6}                         21
{1,2,3,4,5,7}                         22
⋯                                     ⋯
{8,9,10,11,12,13}                     63

<Figure 10.2.4> Wilcoxon rank sum distribution when n1 = 7, n2 = 6
Table 10.2.3 Wilcoxon rank sum distribution when n1 = 7, n2 = 6

Wilcoxon rank sum distribution, n1 = 7, n2 = 6
x     P(X = x)   P(X ≤ x)   P(X ≥ x)
21    0.0006     0.0006     1.0000
22    0.0006     0.0012     0.9994
23    0.0012     0.0023     0.9988
24    0.0017     0.0041     0.9977
25    0.0029     0.0070     0.9959
26    0.0041     0.0111     0.9930
27    0.0064     0.0175     0.9889
28    0.0082     0.0256     0.9825
29    0.0111     0.0367     0.9744
⋯     ⋯          ⋯          ⋯
55    0.0111     0.9744     0.0367
56    0.0082     0.9825     0.0256
57    0.0064     0.9889     0.0175
58    0.0041     0.9930     0.0111
59    0.0029     0.9959     0.0070
60    0.0017     0.9977     0.0041
61    0.0012     0.9988     0.0023
62    0.0006     0.9994     0.0012
63    0.0006     1.0000     0.0006
Since the hypothesis requires a two-sided test with the significance level of 5%, we look for 2.5% percentiles at both ends: P(X ≤ 28) = 0.0256 and P(X ≥ 56) = 0.0256. Since the distribution is discrete, there is no exact 2.5% percentile, so the decision rule can be set as follows:

'If W ≤ 28 or W ≥ 56, then reject H0'

In this problem W = 36, and therefore we cannot reject H0, which means the difference between W1 and W2 is not statistically significant.
3) In『eStatU』, enter the data as in <Figure 10.2.5> and click the [Execute] button. It will calculate the sample statistics and show the test result graph as in <Figure 10.2.6>. The two critical lines, which correspond to 2.5% from each end, are shown here. For a discrete distribution such as this, the choice of the final rejection region should be determined by the analyst.

<Figure 10.2.5> Data input for the Wilcoxon rank sum test at『eStatU』
<Figure 10.2.6> Wilcoxon rank sum test using『eStatU』

The rank sum test can also be performed using『eStat』. After you see <Figure 10.2.2>, click the [Wilcoxon Rank Sum Test] button in the options window below the graph. Then a test result graph as shown in <Figure 10.2.6> will appear in the Graph Area and a test result table as in <Figure 10.2.7> will appear in the Log Area.

<Figure 10.2.7> Result table of the Wilcoxon rank sum test
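scipy exposes an equivalent test through the Mann–Whitney U statistic, which is the rank sum minus its minimum possible value: U1 = W1 − n1(n1+1)/2 = 55 − 28 = 27. A sketch (because of the ties, scipy uses a normal approximation here; depending on the scipy version, the reported statistic may be U1 or the smaller of U1 and U2 = 15):

```python
from scipy.stats import mannwhitneyu

econ = [87, 75, 65, 95, 90, 81, 93]
mgmt = [57, 85, 90, 83, 87, 71]

res = mannwhitneyu(econ, mgmt, alternative='two-sided')
print(res.statistic, res.pvalue)  # p-value well above 0.05: do not reject H0
```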
• Let's generalize the Wilcoxon rank sum test described in [Example 10.2.1]. Denote the random samples selected independently from each of the two populations as follows. The sample sizes are n1 and n2 respectively, and n = n1 + n2.

Sample 1:  X_1, X_2, ⋯ , X_n1
Sample 2:  Y_1, Y_2, ⋯ , Y_n2

For convenience, assume n1 ≥ n2. If n1 ≤ n2, you can swap the roles of X and Y.
• The statistical model of the Wilcoxon rank sum test is as follows:

X_i = μ + ε_i ,            i = 1, 2, ⋯ , n1
Y_j = μ + Δ + ε_(n1+j) ,   j = 1, 2, ⋯ , n2    (one may write Δ = μ2 − μ1)

Here Δ is the difference between the location parameters, and the ε's are independent and follow the same continuous distribution, symmetric about 0.

• The test statistic for the Wilcoxon rank sum test is the sum of the ranks, W, of Y_1, Y_2, ⋯ , Y_n2 in the combined sample X_1, ⋯ , X_n1, Y_1, ⋯ , Y_n2. The distribution of the random variable W = 'sum of the ranks of the Y sample' can be obtained by investigating all nCn2 possible assignments of ranks to Y, and is denoted W(n1, n2).『eStatU』provides the Wilcoxon rank sum distribution W(n1, n2) and its table for small sample sizes. Let w_α(n1, n2) denote the right-tail 100 × α percentile; the exact percentile may not exist because W(n1, n2) is a discrete distribution, and in this case the middle value of the two percentiles nearest to w_α(n1, n2) is often used as an approximation. Table 10.2.4 summarizes the decision rule for each type of hypothesis.
Table 10.2.4 Wilcoxon rank sum test

Test statistic: W = 'sum of the ranks assigned to the sample of Y'

Type of Hypothesis                   Decision Rule
1) H0: Δ = 0  vs  H1: Δ > 0          If W > w_α(n1, n2), then reject H0, else accept H0
2) H0: Δ = 0  vs  H1: Δ < 0          If W < w_(1−α)(n1, n2), then reject H0, else accept H0
3) H0: Δ = 0  vs  H1: Δ ≠ 0          If W > w_(α/2)(n1, n2) or W < w_(1−α/2)(n1, n2), then reject H0, else accept H0

☞ If there is a tie in the combined sample, assign the average rank.

[Practice 10.2.1]
A company wants to compare two methods of providing information about a new product. Among company employees, 17 were randomly selected and divided into two groups. The first group learned about the new product by method A, and the second group by method B. At the end of the experiment, the employees took a test to measure their knowledge of the new product, and their test scores were as follows:

Method A:  50 59 60 71 80 78 72 77 73
Method B:  52 54 58 78 65 61 60 72

⇨ eBook ⇨ PR100201_ScoreByMethod.csv

1) Can we apply a parametric test to conclude that the population means of the two groups are different?
2) Apply a nonparametric test to examine whether the median values of the two groups are different. Test with the significance level of 0.05.
• When the null hypothesis is true and the samples are large enough, the test statistic W is approximately normal with the following mean E(W) and variance V(W):

E(W) = n2(n + 1)/2
V(W) = n1 n2 (n + 1)/12
• Table 10.2.5 summarizes the decision rule for each hypothesis type of the Wilcoxon rank sum test when the samples are large enough.
Table 10.2.5 Wilcoxon rank sum test (large sample case)

Test statistic: Z = (W − n2(n + 1)/2) / √(n1 n2 (n + 1)/12)

Type of Hypothesis                   Decision Rule
1) H0: Δ = 0  vs  H1: Δ > 0          If Z > z_α, then reject H0, else accept H0
2) H0: Δ = 0  vs  H1: Δ < 0          If Z < −z_α, then reject H0, else accept H0
3) H0: Δ = 0  vs  H1: Δ ≠ 0          If |Z| > z_(α/2), then reject H0, else accept H0
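For the data of [Example 10.2.1] the large-sample statistic works out as follows (n1 = 7 and n2 = 6 are really below the usual large-sample threshold; the sketch only makes the formula concrete, and the tie correction is ignored):

```python
import math

n1, n2 = 7, 6
n = n1 + n2
w = 36  # rank sum of sample 2 from Table 10.2.1

mean_w = n2 * (n + 1) / 2          # 42.0
var_w = n1 * n2 * (n + 1) / 12     # 49.0
z = (w - mean_w) / math.sqrt(var_w)
print(z)  # about -0.857, far inside the usual +-1.96 cutoffs
```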
• The distribution of the rank sum statistic, W(n1, n2), does not depend on the population distribution; that is, the rank sum test is a distribution-free test. For example, if n1 = 3 and n2 = 2, the distribution W(3, 2) can be found as follows. The number of possible rank assignments for Y is 5C2 = 10.

Ranks of X (sample 1)    Ranks of Y (sample 2)    Value of W
3, 4, 5                  1, 2                     3
2, 4, 5                  1, 3                     4
2, 3, 5                  1, 4                     5
2, 3, 4                  1, 5                     6
1, 4, 5                  2, 3                     5
1, 3, 5                  2, 4                     6
1, 3, 4                  2, 5                     7
1, 2, 5                  3, 4                     7
1, 2, 4                  3, 5                     8
1, 2, 3                  4, 5                     9
• Therefore, the distribution W(3, 2) is given as follows, regardless of the population distribution:

w           3      4      5      6      7      8      9
P(W = w)   1/10   1/10   2/10   2/10   2/10   1/10   1/10
• If there is a tie in the combined sample, the average rank is assigned to each tied observation. In this case, for large samples, the variance of W should be modified as follows:

V(W) = (n1 n2 / 12) [ (n + 1) − Σ_j t_j(t_j − 1)(t_j + 1) / (n(n − 1)) ]

Here g = (number of tie groups) and t_j = (size of the j-th tie group, i.e., the number of observations in that group); if there are no ties, every tie group has size 1 and the correction term is 0.
10.2.2 Paired Samples: Wilcoxon Signed Rank Sum Test

• Section 8.1.2 discussed testing hypotheses for two population means using paired samples. Paired samples are used when it is difficult to draw samples independently from the two populations, or when independently drawn samples would differ so much in the characteristics of their members that the resulting analysis would be meaningless. If the two populations are normally distributed, the t-test is applied to the differences of the paired samples, as described in Section 8.1.2. However, if the normality assumption cannot be satisfied, the Wilcoxon signed rank sum test of Section 10.1.2, a nonparametric test, can be applied to the differences of the paired samples.

• For paired samples, first calculate the differences D_i = X_i − Y_i for each pair, as shown in Table 10.2.6. We then examine the normality of the differences to check whether a parametric test is applicable. If it is not, we apply the Wilcoxon signed rank sum test to the differences.
Table 10.2.6 Data of differences for paired samples

Pair number    Sample of population 1    Sample of population 2    Difference
1              X_1                       Y_1                       D_1 = X_1 − Y_1
2              X_2                       Y_2                       D_2 = X_2 − Y_2
...            ...                       ...                       ...
n              X_n                       Y_n                       D_n = X_n − Y_n

• Let's take a look at the next example.

Example 10.2.2
The following is a survey of eight young couples. The husband's age and wife's age of each couple are recorded:

(28, 28) (30, 29) (34, 31) (29, 32) (28, 29) (31, 33) (39, 35) (34, 29)

⇨ eBook ⇨ EX100202_AgeOfCouple.csv

1) Calculate the differences for each pair and draw their histogram to check whether a parametric test is applicable.
2) Apply the Wilcoxon signed rank sum test to see whether the husband's age is greater than the wife's age, with the significance level of 0.05.
3) Check the result of the above signed rank sum test using『eStat』.
Example 10.2.2
Answer
1) The differences between the husband's and wife's ages are as follows:

Table 10.2.7 Data of age differences between husband and wife

Number    Husband (X)    Wife (Y)    Difference (D = X − Y)
1         28             28           0
2         30             29           1
3         34             31           3
4         29             32          -3
5         28             29          -1
6         31             33          -2
7         39             35           4
8         34             29           5
The histogram of the differences, drawn using『eStat』(the module for testing a population mean), is shown in <Figure 10.2.8>. Looking at the histogram, there is not sufficient evidence that the differences follow a normal distribution, because the number of observations is small. In such a case, applying a parametric test may lead to errors. An appropriate nonparametric method for this problem is the Wilcoxon signed rank sum test on the differences.
<Figure 10.2.8> Histogram of age difference
2) The hypotheses test whether the population median of the husband's age (M1) is the same as the population median of the wife's age (M2):

H0: M1 = M2
H1: M1 ≠ M2

Since the samples are paired, the hypotheses can be written in terms of the population median of the differences (M_D):

H0: M_D = 0
H1: M_D ≠ 0

In order to apply the signed rank sum test to the differences, we mark each difference greater than 0 with a + sign and each difference less than 0 with a − sign, and assign ranks to the values |difference − 0|. Then we calculate the sum of the ranks with + signs and the sum of the ranks with − signs. If a difference is 0, the observation is omitted. If there are ties among the differences, average ranks are assigned.
Difference data   Sign   |data − 0|   Rank of |data − 0|   Ranks of '+' sign
1                 +      1            1.5                  1.5
3                 +      3            4.5                  4.5
-3                −      3            4.5
-1                −      1            1.5
-2                −      2            3
4                 +      4            6                    6
5                 +      5            7                    7
                                      Rank sum of '+' sign: W+ = 1.5 + 4.5 + 6 + 7 = 19
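The paired computation can be sketched as follows (scipy's `rankdata` assumed; the zero difference of the first couple is dropped, as the text prescribes):

```python
from scipy.stats import rankdata

husband = [28, 30, 34, 29, 28, 31, 39, 34]
wife = [28, 29, 31, 32, 29, 33, 35, 29]

diffs = [h - w for h, w in zip(husband, wife) if h != w]  # drop zero differences
ranks = rankdata([abs(d) for d in diffs])                 # average ranks for ties

w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
print(len(diffs), w_plus)  # n = 7 usable pairs, W+ = 19.0
```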
In『eStatU』, the distribution of the Wilcoxon signed rank sum when n = 7 is shown in <Figure 10.2.9> and Table 10.2.8.
Table 10.2.8 Wilcoxon signed rank sum distribution when n = 7

Wilcoxon Signed Rank Sum Distribution, n = 7
x    P(X = x)   P(X ≤ x)   P(X ≥ x)
0    0.0078     0.0078     1.0000
1    0.0078     0.0156     0.9922
2    0.0078     0.0234     0.9844
3    0.0156     0.0391     0.9766
⋯    ⋯          ⋯          ⋯
25   0.0156     0.9766     0.0391
26   0.0078     0.9844     0.0234
27   0.0078     0.9922     0.0156
28   0.0078     1.0000     0.0078
w Since it is a two-sided test with the significance level of 5%, we look for the 2.5 percentiles at both ends: P(X ≤ 2) = 0.0234 and P(X ≥ 26) = 0.0234. Since it is a discrete distribution, there is no exact value of the 2.5 percentile. Therefore, the decision rule is as follows:

If W+ ≤ 2 or W+ ≥ 26, reject H0

Since W+ = 19 in this problem, we cannot reject the null hypothesis H0, and we conclude that the population medians of the husband’s age and the wife’s age are not significantly different.
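The manual computation above can be scripted. The following plain-Python sketch (an illustration only, not『eStat』itself) drops the zero difference, averages tied ranks, and accumulates the signed rank sum W+ for the husband-and-wife data:

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j + 2) / 2   # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

husband = [28, 30, 34, 29, 28, 31, 39, 34]
wife    = [28, 29, 31, 32, 29, 33, 35, 29]
d = [h - w for h, w in zip(husband, wife) if h != w]   # drop the zero difference
r = average_ranks([abs(x) for x in d])                 # ranks of |difference|
w_plus = sum(ri for ri, di in zip(r, d) if di > 0)     # sum of '+' ranks
print(w_plus)   # 19.0, and 2 < 19 < 26, so H0 is not rejected
```
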
3) Enter the data as shown in <Figure 10.2.9> in『eStat』and click the icon for the test of a population mean. If you select the variable ‘Difference’ as the analysis variable, a dot graph with the 95% confidence interval for the population mean difference will appear. If you enter 0 as the testing value in the hypothesis option and click the [Execute] button, you will see the test result as in <Figure 10.2.10> and <Figure 10.2.11>. Two critical lines marking the 2.5 percentiles on each side are shown. For a discrete distribution, the choice of the final decision rule should be determined by the analyst.
<Figure 10.2.9> Data input of differences
10.2 Nonparametric Test for Location Parameters of Two Populations / 91
<Figure 10.2.10>『eStat』Signed rank sum test
<Figure 10.2.11> Result of Wilcoxon signed rank sum test
Ÿ The Wilcoxon signed rank test for paired samples tests whether the population median of the differences between the two populations, M_D, is zero or not. If we denote the paired samples as (X_1, Y_1), (X_2, Y_2), ⋯, (X_n, Y_n), the Wilcoxon signed rank sum test first calculates the differences D_i = X_i − Y_i and assigns ranks to |D_i|. The sum of the ranks of the |D_i| with + sign, W+, is used as the test statistic.『eStatU』provides the distribution of W+, denoted W(n), for small values of n. W(n; α) refers to the right-tail 100 × α percentile of this distribution, which may not have an exact value because the distribution is discrete; in that case, the average of the two values nearest the percentile is used approximately. Table 10.2.9 summarizes the decision rules of the Wilcoxon signed rank sum test for paired samples by type of hypothesis.
Table 10.2.9 Wilcoxon signed rank sum test for paired samples

Test Statistic: W+ = ‘sum of the ranks of |D_i| with + sign’

Type of Hypothesis              Decision Rule
1) H0: M_D = 0,  H1: M_D > 0    If W+ ≥ W(n; α), then reject H0, else accept H0
2) H0: M_D = 0,  H1: M_D < 0    If W+ ≤ n(n+1)/2 − W(n; α), then reject H0, else accept H0
3) H0: M_D = 0,  H1: M_D ≠ 0    If W+ ≤ n(n+1)/2 − W(n; α/2) or W+ ≥ W(n; α/2), then reject H0, else accept H0
What if there is a 0 among the differences of the paired samples?
☞ If a difference is 0, that observation is omitted from further analysis. That is, n is decreased.
[Practice 10.2.2]
An oil company has developed a gasoline additive intended to improve fuel mileage. Eight pairs of cars were used to compare fuel mileage and see whether it actually improves. The two cars in each pair are alike in structure, model, engine size, and other relevant characteristics. One car of each pair, selected at random, drove the test course on gasoline with the additive; the other car of the pair drove the same course on gasoline without the additive. The following table shows the km per liter for each pair.
Pair               1     2     3     4     5     6     7     8
Additive (X1)    17.1  12.7  11.6  15.8  14.0  17.8  14.7  16.3
No Additive (X2) 16.3  11.6  11.2  14.9  12.8  17.1  13.4  15.4
Difference        0.8   1.1   0.4   0.9   1.2   0.7   1.3   0.9
⇨ eBook ⇨ PR100202_DifferenceOfMileage.csv
Apply a nonparametric test to check whether the additive increases fuel mileage. Use the significance level of 0.05.
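A hand computation for paired data like this can be checked programmatically. Under H0, each nonzero difference is equally likely to carry a + or – sign, so the exact null distribution of W+ follows from enumerating all 2^n sign patterns. A plain-Python sketch (for verification only; not part of『eStat』):

```python
from itertools import product

def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j + 2) / 2
        i = j + 1
    return ranks

diffs = [0.8, 1.1, 0.4, 0.9, 1.2, 0.7, 1.3, 0.9]   # Additive - No Additive
d = [x for x in diffs if x != 0]                   # drop zero differences
r = average_ranks([abs(x) for x in d])
w_plus = sum(ri for ri, di in zip(r, d) if di > 0)

# Exact right-tail p-value: enumerate every sign pattern (2^8 = 256 cases)
n = len(d)
count = sum(1 for signs in product((0, 1), repeat=n)
            if sum(ri * s for ri, s in zip(r, signs)) >= w_plus)
p_right = count / 2 ** n
print(w_plus, p_right)   # 36.0 0.00390625
```

Since the alternative is one-sided (the additive increases mileage), the right-tail probability is the relevant one.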
Ÿ
If the sample size of the paired sample is large, use the normal distribution
approximation formula shown in Table 10.1.6.
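The large-sample normal approximation can be sketched as follows, assuming the usual null moments of W+, namely mean n(n+1)/4 and variance n(n+1)(2n+1)/24 (without a tie correction), which is what Table 10.1.6 is expected to contain. It is applied here to the n = 7 example purely for illustration; n is really too small for the approximation:

```python
from statistics import NormalDist

def z_signed_rank(w_plus, n):
    """Standardize the signed rank sum W+ under H0 (no tie correction)."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    return (w_plus - mean) / var ** 0.5

z = z_signed_rank(19, 7)                        # (19 - 14) / sqrt(35)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 3), round(p_two_sided, 3))       # 0.845 0.398
```

The approximate two-sided p-value of about 0.40 agrees with the exact-table conclusion that H0 cannot be rejected.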
10.3 Nonparametric Test for Location Parameters of Several Populations
Ÿ The tests for several population means in Chapter 9 were possible if each population could be assumed to follow a normal distribution with the same population variance. However, the assumption that the population follows a normal distribution may not be true for real-world data, or there may not be enough data to justify assuming a normal distribution. Alternatively, if the data are ordinal, such as ranks, a parametric test is not appropriate. In this case, a nonparametric test is used by converting the data into ranks, without making assumptions about the population distribution. This section introduces the Kruskal-Wallis test, corresponding to the completely randomized design of experiments, and the Friedman test, corresponding to the randomized block design of experiments in Chapter 9.
Ÿ Since nonparametric tests use converted data such as ranks, there may be some loss of information about the data. Therefore, if the data are normally distributed, there is no reason to apply a nonparametric test. However, a nonparametric test is more appropriate if the data were selected from a population that does not follow a normal distribution.
10.3.1 Completely Randomized Design: Kruskal-Wallis Test
Ÿ The Kruskal–Wallis test extends the Wilcoxon rank sum test for two populations. Consider the following example.

Example 10.3.1
The results of a job satisfaction survey of sampled employees at three companies are as follows. From these data, can you say that the three companies have different job satisfaction? (unit: points out of 100)

Company A:  69  67  65  59
Company B:  56  63  55
Company C:  71  72  70
⇨ eBook ⇨ EX100301_JobSatisfaction.csv
1) Draw a histogram of the data to see whether the comparison of the job satisfaction
for the three companies can be made using a parametric test.
2) Using the Kruskal-Wallis test, which is a nonparametric test, find whether the three companies have the same job satisfaction or not with the significance level of 5%.
3) Check the above result of the Kruskal-Wallis test using『eStat』.
Answer
1) The parametric method for testing the hypothesis that three population means are the same is the one-way analysis of variance studied in Chapter 9, and it requires the assumption that the populations follow normal distributions. Since the sample sizes, n_1 = 4, n_2 = 3, n_3 = 3, are small, we need to examine whether each sample satisfies the normality assumption.
w Enter the data as shown in <Figure 10.3.1> in『eStat』.
<Figure 10.3.1>『eStat』data input
w Click the ANOVA icon
. Select ‘Score’ as ‘Analysis Var’ and ‘Company’ as ‘by
Group’ variable in the variable selection box. Then a dot graph with the 95%
confidence interval of each population mean will appear as in <Figure 10.3.2>.
Company C has the highest average of satisfaction scores, followed by Company A
and Company B. However, it should be tested if these differences are statistically
significant. Clicking the [Histogram] button in the options window below the graph
will reveal the histogram and its normal distribution curve for each company, as
in <Figure 10.3.3>.
<Figure 10.3.2> Dot graph and the confidence interval by company
<Figure 10.3.3> Histogram by company
w Looking at the histogram, the data are not sufficient to assume that the populations follow normal distributions, because the number of data points is so small. In such a case, applying a parametric hypothesis test such as the ANOVA F-test may lead to errors. The hypothesis for this problem is to test whether the location parameters θ_1, θ_2, θ_3 of the three populations are the same or not:

H0: θ_1 = θ_2 = θ_3
H1: At least one pair of location parameters is not the same.
w The Kruskal–Wallis test combines all three samples into a single set of data and calculates the ranks of the combined data. If there is a tie, the average rank is assigned. Then the sum of the ranks in each sample, R_i (i = 1, 2, 3), is calculated. The test statistic H of the Kruskal–Wallis test is similar to the F-test statistic applied to the rank-converted sample data:

H = 12/(N(N+1)) × ( R_1²/n_1 + R_2²/n_2 + ⋯ + R_k²/n_k ) − 3(N+1),   where N = n_1 + n_2 + ⋯ + n_k
w To obtain the ranks of the combined sample, it is convenient to arrange the data of each sample in ascending order separately and then rank the whole data, as shown in Table 10.3.1.
Table 10.3.1 A table to calculate the sum of ranks in each sample

Sample 1      Sample 2      Sample 3      Sample 1   Sample 2   Sample 3
Sorted Data   Sorted Data   Sorted Data     Rank       Rank       Rank
                 55                                      1
                 56                                      2
   59                                          3
                 63                                      4
   65                                          5
   67                                          6
   69                                          7
                               70                                    8
                               71                                    9
                               72                                   10
Sum of ranks                               R_1 = 21   R_2 = 7   R_3 = 27
w The total sum of ranks is 1 + 2 + ⋯ + 10 = 55. The sum of ranks for sample 1 is R_1 = 21, for sample 2 R_2 = 7, and for sample 3 R_3 = 27. Taking the number of data in each sample into account, if R_1, R_2, and R_3 are similar, the null hypothesis that the three population location parameters are the same would be accepted. In this example, despite the small sample size of sample 3, R_3 is larger than R_1 and R_2. Also, R_1 is larger than R_2. Based on these differences, can you conclude that the three population location parameters are statistically different?
w In the above example, the H statistic is as follows:

H = 12/(10 × 11) × ( 21²/4 + 7²/3 + 27²/3 ) − 3 × 11 = 7.318
If the null hypothesis is true, the distribution of the test statistic must be known to judge how large a value of H is statistically significant. Under H0, every assignment of the ranks {1, 2, 3, ⋯, 10} to the three samples is equally likely, and the number of such assignments is 10!/(4!·3!·3!) = 4,200. It is not easy to examine all of these possible rankings to create a distribution table of H.『eStatU』shows the distribution of the Kruskal–Wallis H for n_1 = 4, n_2 = 3, n_3 = 3 as in <Figure 10.3.4>, and a part of the distribution table as in Table 10.3.2. As shown in the figure, the distribution of H is asymmetrical.
<Figure 10.3.4> Kruskal-Wallis H distribution when n_1 = 4, n_2 = 3, n_3 = 3
Table 10.3.2 Kruskal-Wallis H distribution when n_1 = 4, n_2 = 3, n_3 = 3

    x     P(X = x)   P(X ≤ x)   P(X ≥ x)
  0.018    0.0162     0.0162     1.0000
  0.045    0.0133     0.0295     0.9838
    ⋯        ⋯          ⋯          ⋯
  5.727    0.0048     0.9543     0.0505
  5.791    0.0095     0.9638     0.0457
  5.936    0.0019     0.9657     0.0362
  5.982    0.0076     0.9733     0.0343
  6.018    0.0019     0.9752     0.0267
  6.155    0.0019     0.9771     0.0248
  6.300    0.0057     0.9829     0.0229
  6.564    0.0033     0.9862     0.0171
  6.664    0.0010     0.9871     0.0138
  6.709    0.0029     0.9900     0.0129
  6.745    0.0038     0.9938     0.0100
  7.000    0.0019     0.9957     0.0062
  7.318    0.0019     0.9976     0.0043
  7.436    0.0010     0.9986     0.0024
  8.018    0.0014     1.0000     0.0014
w The H test is a right-tail test, and the 5 percentile from the right corresponding to the significance level is approximately P(X ≥ 5.727) = 0.0505. Note that there is no exact 5 percentile in the case of a discrete distribution. Hence, the decision rule to test the null hypothesis is as follows:

‘If H ≥ 5.727, then reject H0’

Since H = 7.318 in this example, we reject H0.
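The rank sums and the H statistic above can be verified with a short script (a plain-Python sketch of the computation, not『eStat』itself):

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j + 2) / 2
        i = j + 1
    return ranks

groups = [[69, 67, 65, 59],        # Company A
          [56, 63, 55],            # Company B
          [71, 72, 70]]            # Company C
combined = [x for g in groups for x in g]   # rank the combined sample
it = iter(average_ranks(combined))
rank_sums = [sum(next(it) for _ in g) for g in groups]
N = len(combined)
H = 12 / (N * (N + 1)) * sum(R * R / len(g)
                             for R, g in zip(rank_sums, groups)) - 3 * (N + 1)
print(rank_sums, round(H, 3))   # [21.0, 7.0, 27.0] 7.318
```
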
3) In『eStatU』, enter data as <Figure 10.3.5> and click the [Execute] button. Then the
sample statistics are calculated and the test result is shown as in <Figure 10.3.6>.
The critical line for values containing 5 percentile of the significance level is shown
here. For a discrete distribution, the choice of the final rejection region shall be
determined by the analyst.
<Figure 10.3.5>『eStatU』Kruskal-Wallis test
<Figure 10.3.6> Kruskal-Wallis test
w 『eStat』may also be used to conduct the Kruskal–Wallis H test. Enter data as
<Figure 10.3.1> and click the ANOVA icon
. Select ‘Score’ as ‘Analysis Var’ and
‘Company’ as ‘by Group’ variable in the variable selection box. Then a dot graph
with the 95% confidence interval of the population mean in each company will
appear as <Figure 10.3.2>. If you press the [Kruskal–Wallis test] button in the
options window below the graph, the same test graph and test result table will
appear as in <Figure 10.3.7>.
<Figure 10.3.7> Result of the Kruskal-Wallis test
Ÿ Let us generalize the Kruskal–Wallis H test described so far. Denote the random samples collected independently from the k populations (the levels of one factor), with sample sizes n_1, n_2, ..., n_k, as follows:

(X_i1, X_i2, ⋯, X_in_i),   i = 1, 2, ⋯, k
Table 10.3.3 Notation for random samples from each level

              Level 1    Level 2    ⋯    Level k
              X_11       X_21       ⋯    X_k1
              X_12       X_22       ⋯    X_k2
                ⋮          ⋮               ⋮
              X_1n_1     X_2n_2     ⋯    X_kn_k
Level Mean    X̄_1·       X̄_2·       ⋯    X̄_k·       Total Mean  X̄_··
Ÿ The statistical model of the Kruskal-Wallis test is as follows:

X_ij = μ + α_i + ε_ij,   i = 1, ⋯, k;  j = 1, ⋯, n_i

where Σ_{i=1}^{k} α_i = 0. Here α_i represents the effect of level i, and the ε_ij are independent and follow the same continuous distribution.
Ÿ The hypothesis of the Kruskal-Wallis test is as follows:

H0: α_1 = α_2 = ⋯ = α_k = 0
H1: At least one pair of α_i is not equal.
Ÿ For the Kruskal–Wallis test, ranking data for the combined sample must be created. Table 10.3.4 shows the notation of the ranking data for each level.
Table 10.3.4 Notation of ranking data in each level

                  Level 1    Level 2    ⋯    Level k
                  R_11       R_21       ⋯    R_k1
                  R_12       R_22       ⋯    R_k2
                    ⋮          ⋮               ⋮
                  R_1n_1     R_2n_2     ⋯    R_kn_k
Sum of ranks      R_1·       R_2·       ⋯    R_k·
Mean of ranks     R̄_1·       R̄_2·       ⋯    R̄_k·       Total mean of ranks  R̄_·· = (N+1)/2
Ÿ The sums of squares for the one-way analysis of variance studied in Chapter 9, computed from the ranking data in Table 10.3.4, are as follows:

SSTr = Σ_{i=1}^{k} n_i (R̄_i· − R̄_··)²
SST  = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (R_ij − R̄_··)² = N(N+1)(N−1)/12   (a constant when there are no ties)
SSE  = SST − SSTr
Ÿ Also, the statistic for the F-test is as follows:

F = (SSTr/(k−1)) / (SSE/(N−k)) = (SSTr/(k−1)) / ((SST − SSTr)/(N−k))

Ÿ Since SST is a constant, the statistic for the F-test is an increasing function of SSTr. The statistic for the Kruskal-Wallis test, H, is proportional to SSTr as follows:

H = 12/(N(N+1)) Σ_{i=1}^{k} n_i (R̄_i· − R̄_··)² = 12/(N(N+1)) Σ_{i=1}^{k} R_i·²/n_i − 3(N+1)
Ÿ The multiplicative constant 12/(N(N+1)) in the definition of the H statistic is intended to ensure that the statistic approximately follows the chi-square distribution with k − 1 degrees of freedom.
Ÿ The distribution of the Kruskal-Wallis test statistic H, denoted H(n_1, ⋯, n_k), can be obtained by considering all possible assignments of the ranks {1, 2, ⋯, N} to the samples, of which there are N!/(n_1! n_2! ⋯ n_k!).『eStatU』provides the table of H(n_1, ⋯, n_k) for small sample sizes. H(n_1, ⋯, n_k; α) denotes the right-tail 100 × α percentile, but it might not have an exact value because H(n_1, ⋯, n_k) is a discrete distribution; in this case, the middle of the two adjacent values of the 100 × α percentile is often used. The decision rule of the Kruskal-Wallis test is given in Table 10.3.5.
Table 10.3.5 Kruskal-Wallis test

Test Statistic: H
Hypothesis                                   Decision Rule
H0: α_1 = α_2 = ⋯ = α_k = 0                  If H > H(n_1, ⋯, n_k; α), then reject H0,
H1: At least one pair of α_i is not equal.   else accept H0.

☞ If there are tied values in the combined sample, assign the average of ranks.
Ÿ The distribution of the Kruskal-Wallis H statistic is independent of the population distribution. In other words, the Kruskal-Wallis test is a distribution-free test.
Ÿ If the null hypothesis is true and the sample sizes are large enough, the test statistic H is approximated by the chi-square distribution with k − 1 degrees of freedom. Table 10.3.6 summarizes the decision rule for the Kruskal-Wallis test in the case of large samples.
Table 10.3.6 Kruskal-Wallis test in case of large samples

Test Statistic: H
Hypothesis                                   Decision Rule
H0: α_1 = α_2 = ⋯ = α_k = 0                  If H > χ²_{k−1; α}, then reject H0,
H1: At least one pair of α_i is not equal.   else accept H0.
Ÿ If there is a tie in the combined sample, the average rank is assigned to each tied observation. In this case, the statistic H is modified as follows:

H′ = H / ( 1 − Σ_{j=1}^{g} (t_j³ − t_j) / (N³ − N) )

Here g = (number of tied groups) and t_j = (the size of the jth tie group, i.e., the number of observations in that tie group). If there is no tie, the size of each tie group is 1 and t_j = 1.
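A small numerical illustration of the tie correction (the data below are hypothetical, chosen only to create a three-way tie at the value 2):

```python
from collections import Counter

def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j + 2) / 2
        i = j + 1
    return ranks

def kruskal_H(groups):
    """Kruskal-Wallis H computed from the combined-sample average ranks."""
    combined = [x for g in groups for x in g]
    N = len(combined)
    it = iter(average_ranks(combined))
    rank_sums = [sum(next(it) for _ in g) for g in groups]
    return 12 / (N * (N + 1)) * sum(R * R / len(g)
                                    for R, g in zip(rank_sums, groups)) - 3 * (N + 1)

groups = [[1, 2, 2], [3, 2, 4]]          # hypothetical data: value 2 occurs 3 times
combined = [x for g in groups for x in g]
N = len(combined)
H = kruskal_H(groups)
# tie correction: one tied group of size t = 3, so 1 - (3**3 - 3)/(6**3 - 6)
correction = 1 - sum(t**3 - t for t in Counter(combined).values()) / (N**3 - N)
H_prime = H / correction
print(round(H, 4), round(H_prime, 4))    # 2.3333 2.6344
```

Since the correction factor is below 1, the corrected statistic H′ is slightly larger than H.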
[Practice 10.3.1]
A bread maker wants to compare three new methods of mixing ingredients. Fifteen loaves of bread were made, five by each mixing method (A, B, C), and a group of judges who did not know the differences in the ingredient mixing ratios gave the following scores. Test the null hypothesis that there is no difference in taste among the mixing methods at the significance level of 0.05.

Method A:  72  88  70  87  71
Method B:  85  89  86  82  90
Method C:  94  94  88  87  89
⇨ eBook ⇨ PR100301_ScoreByMixingMethod.csv
10.3.2 Randomized Block Design: Friedman Test

Ÿ In Section 9.2, we studied the randomized block design to measure the fuel mileage of three types of cars while reducing the impact of the block factor, i.e., the driver. If each population follows a normal distribution, the sample data are analyzed using the F-test based on the two-way analysis of variance without interaction. However, the assumption that a population follows a normal distribution may not be appropriate for real-world data, or there may not be enough data to justify it. Alternatively, the collected data may not be continuous but ordinal, such as ranks, in which case a parametric test is not appropriate. In such cases, nonparametric tests are used to test location parameters by converting data to ranks without assuming a population distribution. This section introduces the Friedman test corresponding to the randomized block design experiments in Section 9.2.2.
Ÿ Let us take a look at the Friedman test using [Example 9.2.1], the car fuel mileage measurement problem.

Example 10.3.2
The fuel mileage of the three types of cars (A, B, and C) was measured using the randomized block design as in Table 9.2.4, and the data are rearranged in Table 10.3.7.
Table 10.3.7 Fuel mileage of the three types of cars

Driver (Block)   Car A   Car B   Car C
      1          22.4    16.3    20.2
      2          16.1    12.6    15.2
      3          19.7    15.9    18.7
      4          21.1    17.8    18.9
      5          24.5    21.0    23.8
⇨ eBook ⇨ EX090201_GasMileage.csv
1) Draw a histogram of the data to see if the fuel mileage of the three cars can be
tested by a parametric method.
2) Using the Friedman test, which is a nonparametric method for the randomized block design, test whether the fuel mileage of the three types of cars is different with the significance level of 5%.
3) Check the result of the above Friedman test using『eStatU』.
Answer
1) Enter data in『eStat』as shown in <Figure 10.3.8>.

<Figure 10.3.8>『eStat』data input
w Click the analysis of variance icon. Select ‘Miles’ as ‘Analysis Var’ and ‘Car’
as ‘by Group’. Then the dot graph by car type and the 95% confidence interval for
the population mean will appear. Again, clicking the [Histogram] button in the
options window below the graph will show the histogram and normal distribution
curve for each car type as shown in <Figure 10.3.9>.
<Figure 10.3.9> Histogram of fuel mileage by car
w Looking at the histogram, it is not sufficient to assume that each population follows a normal distribution, because of the small number of data points. In such a case, applying the parametric F-test may lead to errors.
w The hypothesis for this problem is to test whether or not the location parameters θ_1, θ_2, θ_3 of the three populations are the same:

H0: θ_1 = θ_2 = θ_3
H1: At least one pair of location parameters is not equal.
w The Friedman test first ranks the fuel mileages measured within each driver (block) and then calculates the sum of ranks, R_i (i = 1, 2, 3), for each of the three types of cars (Table 10.3.8). If there is a tie, the average rank is assigned.
Table 10.3.8 Ranking within each block

Driver (Block)   Car A    Car B    Car C
      1            3        1        2
      2            3        1        2
      3            3        1        2
      4            3        1        2
      5            3        1        2
Sum of ranks    R_1 = 15  R_2 = 5  R_3 = 10

The sum of ranks for Car A is R_1 = 15, for Car B R_2 = 5, and for Car C R_3 = 10. The rank sums look different; are the differences statistically significant?
w The Friedman test statistic S can be considered as the F statistic of the two-way analysis of variance applied to these ranking data:

S = 12/(nk(k+1)) Σ_{i=1}^{k} R_i² − 3n(k+1),   where k is the number of populations (treatments) and n is the number of blocks
In this example, k = 3 and n = 5, and the S statistic is as follows:

S = 12/(5 × 3 × 4) × ( 15² + 5² + 10² ) − 3 × 5 × 4 = 70 − 60 = 10

The distribution of the test statistic S when the null hypothesis is true must be known to judge how large a value of S is statistically significant. Since the number of possible rank configurations when k = 3 and n = 5 is (3!)⁵ = 7,776, it is not easy to examine all of them to obtain the distribution.『eStatU』provides the distribution of the test statistic S for k = 3, n = 5 as in <Figure 10.3.10>, and its distribution table as Table 10.3.9. As shown in the graph, the distribution of S is asymmetrical.
<Figure 10.3.10> Friedman S distribution when k = 3, n = 5
Table 10.3.9 Friedman S distribution when k = 3, n = 5

    x     P(X = x)   P(X ≤ x)   P(X ≥ x)
  0.000    0.0463     0.0463     1.0000
  0.400    0.2623     0.3086     0.9537
  1.200    0.1698     0.4784     0.6914
  1.600    0.1543     0.6327     0.5216
  2.800    0.1852     0.8179     0.3673
  3.600    0.0579     0.8758     0.1821
  4.800    0.0309     0.9066     0.1242
  5.200    0.0540     0.9606     0.0934
  6.400    0.0154     0.9761     0.0394
  7.600    0.0154     0.9915     0.0239
  8.400    0.0077     0.9992     0.0085
 10.000    0.0008     1.0000     0.0008
w The Friedman test is a right-sided test. Looking for the 5 percentile from the right tail corresponding to the significance level, the nearest value is P(X ≥ 6.4) = 0.0394. Since it is a discrete distribution, there is no exact value of the 5 percentile. Hence, the rejection region with the significance level of 5% can be written as follows:

‘If S ≥ 6.4, then reject H0’

Since S = 10 in this example, H0 is rejected.
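The within-block ranking, the statistic S, and even the exact tail probability can be scripted. The enumeration below walks through all (3!)⁵ = 7,776 equally likely rank configurations under H0 (a plain-Python sketch for illustration, not『eStatU』itself):

```python
from itertools import permutations, product

# Fuel mileage: rows = drivers (blocks), columns = cars A, B, C
mileage = [(22.4, 16.3, 20.2),
           (16.1, 12.6, 15.2),
           (19.7, 15.9, 18.7),
           (21.1, 17.8, 18.9),
           (24.5, 21.0, 23.8)]
k, n = 3, len(mileage)

rank_sums = [0] * k
for row in mileage:                    # rank within each block (no ties here)
    for rank, j in enumerate(sorted(range(k), key=lambda c: row[c]), start=1):
        rank_sums[j] += rank           # rank 1 = worst mileage in the block

S = 12 * sum(R * R for R in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)

# Exact null distribution: each block's ranking is an independent, uniformly
# random permutation of (1, ..., k); S >= S_obs iff sum of squared rank sums
# >= the observed one, which keeps the comparison in exact integers
sumsq_obs = sum(R * R for R in rank_sums)
count = sum(1 for config in product(list(permutations(range(1, k + 1))), repeat=n)
            if sum(c * c for c in (sum(col) for col in zip(*config))) >= sumsq_obs)
p_exact = count / 6 ** n               # (k!)^n = 6^5 = 7776 configurations
print(rank_sums, S, round(p_exact, 6))   # [15, 5, 10] 10.0 0.000772
```

The exact tail probability agrees with P(X ≥ 10.000) = 0.0008 in Table 10.3.9.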
3) Enter data in『eStatU』as in <Figure 10.3.11> and click the [Execute] button. The sample statistics and test graph will be shown as in <Figure 10.3.12>. The critical line containing 5% of the significance level is shown here. For a discrete distribution, the choice of the final rejection region should be determined by the analyst.

<Figure 10.3.11> Data input for the Friedman test in『eStatU』
<Figure 10.3.12> Result of the Friedman test using『eStatU』

Ÿ Let us generalize the Friedman test described so far using the above example. Assume that there are k levels (treatments) and n blocks, and denote the data as follows:
Table 10.3.10 Notation of n random samples for k levels with a randomized block design

Block      Level 1   Level 2   ⋯   Level k
  1         X_11      X_21     ⋯    X_k1
  2         X_12      X_22     ⋯    X_k2
  ⋮           ⋮         ⋮             ⋮
  n         X_1n      X_2n     ⋯    X_kn
Mean        X̄_1·      X̄_2·     ⋯    X̄_k·       Total Mean  X̄_··
Ÿ A statistical model of the Friedman test is as follows:

X_ij = μ + α_i + β_j + ε_ij,   i = 1, ⋯, k;  j = 1, ⋯, n

Here α_i is the effect of level i, which satisfies Σ_{i=1}^{k} α_i = 0, and β_j is the effect of block j, which satisfies Σ_{j=1}^{n} β_j = 0. The ε_ij are independent and follow the same continuous distribution.
Ÿ The hypothesis of the Friedman test is as follows:

H0: α_1 = α_2 = ⋯ = α_k = 0
H1: At least one pair of α_i is different.
Ÿ For the Friedman test, ranking data within each block must be created. Table 10.3.11 shows the notation of the ranking data for each level.
Table 10.3.11 Notation of rank data in each level

Block              Level 1   Level 2   ⋯   Level k
  1                 R_11      R_21     ⋯    R_k1
  2                 R_12      R_22     ⋯    R_k2
  ⋮                   ⋮         ⋮             ⋮
  n                 R_1n      R_2n     ⋯    R_kn
Sum of ranks        R_1·      R_2·     ⋯    R_k·
Average of ranks    R̄_1·      R̄_2·     ⋯    R̄_k·       Total average of ranks  R̄_·· = (k+1)/2
Ÿ If we apply the analysis of variance to the rank data of Table 10.3.11 instead of the observation data of Section 9.2, the total sum of squares SST and the block sum of squares SSB are constants. The treatment sum of squares SSTr is as follows:

SSTr = n Σ_{i=1}^{k} (R̄_i· − R̄_··)²

Therefore, the F test statistic can be written as follows:

F = (SSTr/(k−1)) / (SSE/((k−1)(n−1))),   where SSE = SST − SSB − SSTr and SST − SSB is a constant

Ÿ That is, since SST − SSB is a constant, the F test statistic is an increasing function of SSTr.
The Friedman test statistic S is proportional to SSTr as follows:

S = 12n/(k(k+1)) Σ_{i=1}^{k} (R̄_i· − R̄_··)² = 12/(nk(k+1)) Σ_{i=1}^{k} R_i·² − 3n(k+1)

Ÿ The reason the S statistic carries the multiplicative constant 12n/(k(k+1)) is to make S approximately follow a chi-square distribution with k − 1 degrees of freedom.
Ÿ The distribution of the Friedman test statistic S is denoted as S(k, n).『eStatU』provides the distribution of S(k, n) for small values of k and n. S(k, n; α) denotes the right-tail 100 × α percentile, but it might not have an exact value because it is a discrete distribution; in this case, the middle value of the two nearest values of S(k, n) is often used approximately. Table 10.3.12 summarizes the decision rule of the Friedman test.
Table 10.3.12 Friedman Test

Test Statistic: S
Hypothesis                                  Decision Rule
H0: α_1 = α_2 = ⋯ = α_k = 0                 If S > S(k, n; α), then reject H0,
H1: At least one pair of α_i is different   else accept H0.

☞ If there are tied values within a block, use the average rank.
Ÿ The distribution of the Friedman statistic S is independent of the population distribution. In other words, the Friedman test is a distribution-free test.
Ÿ If the null hypothesis is true and the sample is large enough, the test statistic S is approximated by the chi-square distribution with k − 1 degrees of freedom. Table 10.3.13 summarizes the decision rule for the Friedman test in the large-sample case.
Table 10.3.13 Friedman Test – large sample case

Test Statistic: S
Hypothesis                                  Decision Rule
H0: α_1 = α_2 = ⋯ = α_k = 0                 If S > χ²_{k−1; α}, then reject H0,
H1: At least one pair of α_i is different   else accept H0.
Ÿ If there is a tie within a block, the average rank is assigned to each tied observation. In this case, the statistic S is modified as follows:

S′ = S / ( 1 − Σ_{j=1}^{g} (t_j³ − t_j) / (n(k³ − k)) )

Here g = (number of tied groups) and t_j = (the size of the jth tie group, i.e., the number of observations in that tie group). If there is no tie, the size of each tie group is 1 and t_j = 1.
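For k = 3 treatments, the large-sample approximation of Table 10.3.13 uses k − 1 = 2 degrees of freedom, and the chi-square right-tail probability then has the closed form e^(−x/2). A minimal sketch, applied to the S = 10 result of Example 10.3.2 purely for illustration (n = 5 is really too small for the approximation to be accurate):

```python
import math

def chi2_right_tail_df2(x):
    """P(X >= x) for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2)

p_approx = chi2_right_tail_df2(10.0)
print(round(p_approx, 4))   # 0.0067, versus the exact tail 0.0008 in Table 10.3.9
```

Both the exact and the approximate p-values are below 0.05, so the conclusion is unchanged, but the discrepancy shows why the exact table is preferred for small n.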
[Practice 10.3.2]
The following is the result of an agronomist’s survey of the yield of four varieties of wheat, using a randomized block design over three cultivated areas (blocks). Apply the Friedman test to determine whether the mean yields of the four wheat varieties are the same or not with the 5% significance level.

Wheat Type   Area 1   Area 2   Area 3
    A          50       59       55
    B          58       60       52
    C          55       58       56
    D          51       52       55
⇨ eBook ⇨ PR100302_WheatAreaYield.csv
Exercise
10.1 A psychologist has randomly selected 12 handicapped workers from the production workers employed at various factories in a large industrial complex, and their work competency scores were examined as follows. The psychologist wants to test whether the population average score is 45. Assume the population distribution is symmetric about the mean.
32, 52, 21, 39, 23, 55, 36, 27, 37, 41, 34, 51
1) Check whether a parametric test is possible.
2) Apply the sign test with the significance level of 5%.
3) Apply the Wilcoxon signed rank test with the significance level of 5%.
10.2 A tire production company wants to test whether a new manufacturing process can produce a more durable tire than the existing process. Tires from both processes were tested to obtain the following data: (unit: 1000 )

Existing Process:  62  76  61  90  74  74  75  63
New Process:       73  53  61  65  60  53  70  63
1) Check whether a parametric test is possible.
2) Apply the Wilcoxon rank sum test whether the new process and the existing
process have the same durability or not with the significance level of 5%.
10.3 A company wants to compare two methods of obtaining information about a new product. Among
company employees, 19 were randomly selected and divided into two groups. The first group
learned about the new product by the method A, and the second group learned by the method B.
At the end of the experiment, the employees took a test to measure their knowledge of the new
product and their test scores are as follows. Can we conclude from these data that the median
values of the two groups are different? Test with the significance level of 0.05.
Method A
50 59 60 71 80 78 72 77 73 75
Method B
52 54 58 78 65 61 60 72 60
10.4 Ten men and ten women working in the same profession were selected independently, and their monthly salaries were surveyed. Can you say that a man in this profession earns more than a woman? Test with the significance level of 0.05. (Unit: 10 USD)

Man:    381  294  296  389  281  194  193  286  384  494
Woman:  284  279  288  383  489  287  496  393  277  371
10.5 To find out the fuel mileage improvement effect of a new gasoline additive, 10 cars in the same condition were selected. The gas mileage of each car was tested with and without the gasoline additive, running the same road at the same speed, and the following data were obtained. Test whether the new gasoline additive is effective in improving fuel mileage with the significance level of 0.05.

Gas mileage (unit: km/liter)
With additives:     11.7  13.8  11.2   7.7   8.2  16.3  14.2  19.4  13.9  15.5
Without additives:  10.3  12.9  12.5   9.5  11.2  14.6  15.9  18.5  12.0  15.1
10.6 In order to determine the efficacy of a new pain reliever, seven persons were tested with aspirin
and the new pain reliever. The experiment times for the two pain relievers were sufficiently spaced,
and the order of the medication experiments was randomly determined. The time (in minutes) until
feeling pain relief was measured as follows. Do the data indicate that the new pain reliever gives
faster pain relief than aspirin? Test with the significance level of 0.05.
Person ID           1   2   3   4   5   6   7
Aspirin            15   7  20  14  12  13  20
New pain reliever  11  17  10  14  16  17  11
10.7 A person was asked to taste 15 coffee samples and rank them from 1 (least liked) to 15 (best). The 15
samples are taken from each of three types of coffee (A, B, C) and are tasted in random
order. The following table shows the ranking of preference by coffee type. Test the null
hypothesis that there is no difference in the preferences for the three types of coffee at the significance level
of 0.05.
Coffee Type   Ranking
A             9 10 11 12 13
B             14 1 5 7 8
C             2 3 4 15 6
10.8 A bread maker wants to compare four new mixes of ingredients. Five breads were made with each
mixing ratio of ingredients, a total of 20 breads, and a panel of judges who did not know the
difference in the mixing ratios gave the following scores. Test the null hypothesis
that there is no difference in taste according to the mixing ratio of ingredients at the significance
level of 0.05.
Mixing Ratio
          Method A   Method B   Method C   Method D
Scores    72         85         94         91
          88         89         94         93
          70         86         88         92
          87         82         87         95
          71         90         89         96
Multiple Choice Exercise
10.1 Which of the following is NOT a reason to use a nonparametric test?
① The population is not normally distributed.
② The data are ordinal.
③ The data follow a normal distribution.
④ There are extreme points (outliers) in the sample.
10.2 Which of the following nonparametric tests is for testing the location parameter of a single
population?
① Wilcoxon signed rank sum test
② Wilcoxon rank sum test
③ Kruskal-Wallis test
④ Friedman test
10.3 Which of the following nonparametric tests is for testing the location parameters of two
populations?
① Wilcoxon signed rank sum test
② Wilcoxon rank sum test
③ Kruskal-Wallis test
④ Friedman test
10.4 Which of the following nonparametric tests is for testing the location parameters of multiple
populations?
① Wilcoxon signed rank sum test
② Wilcoxon rank sum test
③ Kruskal-Wallis test
④ Friedman test
10.5 Which of the following nonparametric tests is appropriate for testing the randomized block
design?
① Wilcoxon signed rank sum test
② Wilcoxon rank sum test
③ Kruskal-Wallis test
④ Friedman test
10.6 What is the sign test?
① Test for the location parameter of single population
② Test for two location parameters of two populations
③ Test for several location parameters of multiple populations
④ Test for the randomized block design
10.7 What is the transformation of data that is often used for nonparametric tests?
① log transformation
② exponential transformation
③ (0-1) transformation
④ ranking transformation
10.8 What is the test statistic used for the sign test?
① rank
② (number of + signs)
③ degrees of freedom
④ (number of + signs) − (number of − signs)
10.9 What is the test statistic used for testing two location parameters of two populations using a
nonparametric test?
① (number of + signs)
② sum of ranks in population 2
③ (number of - signs)
④ (sum of ranks in population 1) + (sum of ranks in population 2)
10.10 What is the theoretical basis for the H statistic used for the Kruskal-Wallis test?
① Within sum of squares of rank data
② Error sum of squares of rank data
③ Total sum of squares of rank data
④ Treatment sum of squares of rank data
(Answers)
10.1 ③, 10.2 ①, 10.3 ②, 10.4 ③, 10.5 ④, 10.6 ①, 10.7 ④, 10.8 ④, 10.9 ②, 10.10 ④
11
Testing Hypothesis for
Categorical Data
SECTIONS
11.1 Goodness of Fit Test
11.1.1 Goodness of Fit Test for Categorical Data
11.1.2 Goodness of Fit Test for Continuous Data
11.2 Testing Hypothesis for Contingency Table
11.2.1 Independence Test
11.2.2 Homogeneity Test
CHAPTER OBJECTIVES
The hypothesis tests that we have studied
from Chapter 7 to Chapter 10 are for
continuous data. In this chapter, we describe
testing hypothesis for categorical data.
Section 11.1 describes the goodness of fit
test for the frequency table of categorical
data.
Section 11.2 describes the independence
and homogeneity tests for the contingency
table of two categorical variables.
114
/ Chapter 11 Testing Hypothesis for Categorical Data
11.1 Goodness of Fit Test
Ÿ
The frequency table of categorical data discussed in Chapter 4 counts the
frequencies of the possible values of a categorical variable. If this frequency table is for
sample data from a population, we naturally wonder what the frequency
distribution of the population would be. The goodness of fit test tests the hypothesis
that the population follows a particular distribution, based on the sample
frequency distribution. In this section, we discuss the goodness of fit test for
categorical distributions (Section 11.1.1) and the goodness of fit test for
continuous distributions (Section 11.1.2).
11.1.1 Goodness of Fit Test for Categorical Data
Ÿ
Consider the goodness of fit test for a categorical distribution using the example
below.
Example 11.1.1
The result of a survey of 150 people before a local election to find out the approval
ratings of three candidates is as follows. Looking at this frequency table alone, it seems
that candidate A has a 40 percent approval rating, higher than the other candidates.
Based on this sample survey, perform the goodness of fit test of whether the three candidates
have the same approval rating or not. Use『eStatU』with the 5% significance level.

Candidate   Number of Supporters   Percent
A           60                     40.0%
B           50                     33.3%
C           40                     26.7%
Total       150                    100%
Answer
w Assume candidate A, B, and C's approval ratings are p_A, p_B, p_C respectively.
The hypothesis for this problem is as follows:
H₀: The three candidates have the same approval rating. (i.e., p_A = p_B = p_C = 1/3)
H₁: The three candidates have different approval ratings.
w If the null hypothesis H₀ is true, that is, the three candidates have the same approval
rating, each candidate will have 150 × (1/3) = 50 supporters out of the total 150
people. This is referred to as the 'expected frequency' of each candidate when H₀ is
true. For each candidate, the number of observed supporters in the sample is
called the 'observed frequency'. If H₀ is true, the observed and expected numbers
of supporters can be summarized as the following table.
Candidate   Observed frequency (O_i)   Expected frequency (E_i)
A           O₁ = 60                    E₁ = 50
B           O₂ = 50                    E₂ = 50
C           O₃ = 40                    E₃ = 50
Total       150                        150
11.1 Goodness of Fit Test / 115
w If H₀ is true, the observed frequencies (O_i) and the expected frequencies (E_i) will
roughly coincide. Therefore, in order to test the hypothesis, a statistic which uses the
squared differences between O_i and E_i is used. Specifically, the statistic to test the
hypothesis is as follows:

χ² = Σᵢ₌₁³ (O_i − E_i)² / E_i
   = (O₁ − E₁)²/E₁ + (O₂ − E₂)²/E₂ + (O₃ − E₃)²/E₃

If the observed value of this test statistic is close to zero, it can be considered that
H₀ is true, because each O_i is close to E_i. If the observed value is large, H₀ will be
rejected. The question is, 'How large a value of the test statistic should be considered
statistically significant?' It can be shown that this test statistic
approximately follows the chi-square distribution with k − 1 degrees of freedom if
the expected frequencies are large enough. Here k is the number of categories (i.e.,
candidates) in the table, and it is 3 in this example. Therefore, the decision rule to
test the hypothesis is as follows:
‘If χ²_obs > χ²_{k−1; α}, reject H₀, else do not reject H₀’
w The statistic χ²_obs can be calculated as follows:

χ²_obs = (60 − 50)²/50 + (50 − 50)²/50 + (40 − 50)²/50 = 2.0 + 0.0 + 2.0 = 4.0

Since the significance level α is 5%, the critical value can be found from the
chi-square distribution as follows:

χ²_{k−1; α} = χ²_{2; 0.05} = 5.991

Since 4.0 < 5.991, H₀ cannot be rejected. In other words, although the above sample
frequency table shows that the approval ratings of the three candidates differ, this
difference does not provide sufficient evidence to conclude that the three
candidates have different approval ratings.
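The chi-square computation above can be sketched in a few lines of Python. This is a hypothetical check, not part of『eStatU』, and uses only the standard library; the critical value 5.991 is taken from the chi-square table as in the text.

```python
# Chi-square goodness of fit test for Example 11.1.1 (illustrative sketch).
observed = [60, 50, 40]              # observed supporters of A, B, C
n = sum(observed)                    # 150
expected = [n / 3] * 3               # 50 each under H0: p_A = p_B = p_C = 1/3

# chi-square statistic: sum of (O_i - E_i)^2 / E_i
chi2_obs = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 5.991                     # chi-square table value, df = 3 - 1 = 2, alpha = 0.05

print(chi2_obs)              # 4.0
print(chi2_obs > critical)   # False -> H0 cannot be rejected
```

The same decision rule applies to any number of categories: only `observed` and the hypothesized probabilities change.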
w Using each candidate's sample approval rating p̂_A = 60/150 = 0.400,
p̂_B = 50/150 = 0.333, p̂_C = 40/150 = 0.267, 95% confidence intervals for the
population proportion of each candidate's approval rating, using the formula
p̂ ± 1.96·√(p̂(1 − p̂)/n) (refer to Chapter 6.4), are as follows:

A: 0.400 ± 1.96·√(0.400·0.600/150)  ⇔  (0.322, 0.478)
B: 0.333 ± 1.96·√(0.333·0.667/150)  ⇔  (0.258, 0.409)
C: 0.267 ± 1.96·√(0.267·0.733/150)  ⇔  (0.196, 0.337)

The overlapping of the confidence intervals of the three candidates' approval ratings
means that no candidate's approval rating is clearly separated from the
others.
w In the data input box that appears by selecting the 'Goodness of Fit Test' module of
『eStatU』, enter the 'Observed Frequency' and 'Expected Probability' data as
shown in <Figure 11.1.1>. After entering the data, select the significance level and
click the [Execute] button to calculate the 'Expected Frequency' and to see the result of
the chi-square test. Note that this chi-square goodness of fit test should be
applied only when the expected frequency of each category is at least 5.
<Figure 11.1.1> Goodness of fit test in『eStatU』
<Figure 11.1.2>『eStatU』Chi-square Goodness of Fit Test
Ÿ
Consider a categorical variable X which has k possible values
v₁, v₂, ⋯, v_k with probabilities p₁, p₂, ⋯, p_k respectively. In other words,
the probability distribution of the categorical variable X is as follows:

X            v₁   v₂   ⋯   v_k   Total
P(X = v_i)   p₁   p₂   ⋯   p_k   1

When random samples are collected from the population of the categorical
random variable X and their observed frequencies are O₁, O₂, ⋯, O_k, the
hypothesis to test whether the population probability distribution of
(v₁, v₂, ⋯, v_k) is (p₁, p₂, ⋯, p_k) is as follows:

H₀: The distribution of (O₁, O₂, ⋯, O_k) is from the distribution (p₁, p₂, ⋯, p_k)
H₁: The distribution of (O₁, O₂, ⋯, O_k) is not from the distribution (p₁, p₂, ⋯, p_k)

Ÿ
If the total number of samples n is large enough, the above hypothesis can be
tested using the chi-square test statistic as follows:

χ²_obs = Σᵢ₌₁ᵏ (O_i − E_i)² / E_i

‘If χ²_obs > χ²_{k−1−m; α}, then reject H₀’

Here, E₁ = n·p₁, E₂ = n·p₂, ⋯, E_k = n·p_k are the expected frequencies, and m is the
number of population parameters estimated from the sample data. In [Example
11.1.1], since no population parameter was estimated from the sample,
m = 0.
Goodness of Fit Test
Consider a categorical variable X which has k possible
values v₁, v₂, ⋯, v_k with probabilities p₁, p₂, ⋯, p_k respectively. Let the
observed frequencies for each value of X from n samples be
O₁, O₂, ⋯, O_k, the expected frequencies be
E₁ = n·p₁, E₂ = n·p₂, ⋯, E_k = n·p_k, and the significance
level be α.
Hypothesis:
H₀: The distribution of (O₁, O₂, ⋯, O_k) follows (p₁, p₂, ⋯, p_k)
H₁: The distribution of (O₁, O₂, ⋯, O_k) does not follow (p₁, p₂, ⋯, p_k)
Decision Rule:
‘If χ²_obs = Σᵢ₌₁ᵏ (O_i − E_i)²/E_i > χ²_{k−1−m; α}, then reject H₀’
m is the number of population parameters estimated from the samples.
☞
In order to use the chi-square goodness of fit test, all expected
frequencies E_i should be at least 5.
A category which has an expected frequency less than 5 can be
merged with another category.
[Practice 11.1.1]
Market shares of toothpaste brands A, B, C and D are known to be 0.3, 0.6, 0.08, and 0.02
respectively. The result of a survey of 600 people on their toothpaste brand is as
follows. Can you conclude from these data that the known market shares are incorrect?
Use『eStatU』with the significance level of 5%.

Brand                 A     B     C    D    Total
Number of Customers   192   342   44   22   600
11.1.2 Goodness of Fit Test for Continuous Data
Ÿ
The goodness of fit test for categorical data using the chi-square distribution can
also be used for continuous data. The following is an example of the goodness of
fit test of whether data are drawn from a normally distributed population. The
parametric statistical tests from Chapter 6 to Chapter 9 require the assumption
that the population is normally distributed, and the goodness of fit test in this
section can be used to test for normality.
Example 11.1.2
The ages of 30 people who visited a library in the morning are as follows. Test the
hypothesis that the population is normally distributed at the significance level of 5%.
28 55 26 35 43 47 47 17 35 36 48 47 34 28 43
20 30 53 27 32 34 43 18 38 29 44 67 48 45 43
⇨ eBook ⇨ EX110102_AgeOfLibraryVisitor.csv
Answer
w Age is a continuous variable, but you can make a frequency distribution by dividing
its possible values into intervals, as we studied for the histogram in Chapter 3. This is called a
categorization of the continuous data.
w Let's find a frequency table which starts at the age of 10 with an interval size of
10. The histogram module of『eStat』makes this frequency table easy to obtain. If you
enter the data as shown in <Figure 11.1.3>, click the histogram icon and select Age
from the variable selection box, then the histogram of <Figure 11.1.4> will appear.
<Figure 11.1.3> Data input at『eStat』
<Figure 11.1.4> Default histogram of age
w If you specify 'start interval' as 10 and 'interval width' as 10 in the options window
below the histogram, the histogram of <Figure 11.1.4> is adjusted as in <Figure
11.1.5>. If you click the [Frequency Table] button, the frequency table shown in
<Figure 11.1.6> will appear in the Log Area. The interval size can be
determined by the researcher.
<Figure 11.1.5> Adjusted histogram of age
<Figure 11.1.6> Frequency table of the adjusted histogram
w Since the normal distribution is a continuous distribution defined on
−∞ < X < ∞, the frequency table of <Figure 11.1.6> can be written as follows:

Table 11.1.2 Frequency table of age with adjusted intervals
Interval id   Interval        Observed frequency
1             X < 20          2
2             20 ≤ X < 30     6
3             30 ≤ X < 40     8
4             40 ≤ X < 50     11
5             50 ≤ X < 60     2
6             60 ≤ X          1
w The frequency table of the sample data, Table 11.1.2, can be used to test the
goodness of fit of the sample data to a normal distribution using the
chi-square distribution. The hypothesis of this problem is as follows:
H₀: Sample data follow a normal distribution
H₁: Sample data do not follow a normal distribution
w This hypothesis does not specify which normal distribution, and therefore the
population mean μ and the population variance σ² should be estimated from the
sample data. Pressing the 'Basic Statistics' icon on the main menu of『eStat』will
display a table of basic statistics in the Log Area, as shown in <Figure 11.1.7>. The
sample mean is 38.000 and the sample standard deviation is 11.519.
<Figure 11.1.7> Descriptive statistics of age
Hence, the above hypothesis can be written in detail as follows:
H₀: Sample data follow N(38.000, 11.519²).
H₁: Sample data do not follow N(38.000, 11.519²).
w In order to find the expected frequency of each interval when H₀ is true, the
expected probability of each interval is first calculated using the normal distribution
N(38.000, 11.519²) as follows. The normal distribution module of『eStatU』makes
it easy to calculate the probability of an interval. In the normal distribution module
of『eStatU』, enter the mean of 38.000 and the standard deviation of 11.519. Click
the second radio button of the P(X ≤ x) type and enter 20, then press the [Execute]
button to calculate the probability as shown in <Figure 11.1.8>.

P(X < 20) = P(Z < (20 − 38.000)/11.519) = P(Z < −1.563) = 0.059

<Figure 11.1.8> Calculation of normal probability using『eStatU』

Similarly, you can calculate the following probabilities:
P(20 ≤ X < 30) = P(−1.563 ≤ Z < −0.694) = 0.185
P(30 ≤ X < 40) = P(−0.694 ≤ Z < 0.174) = 0.325
P(40 ≤ X < 50) = P(0.174 ≤ Z < 1.042) = 0.282
P(50 ≤ X < 60) = P(1.042 ≤ Z < 1.910) = 0.121
P(X ≥ 60) = P(Z ≥ 1.910) = 0.028
w The expected frequency of each interval can be calculated by multiplying the sample size of 30 by the
expected probability obtained above. The observed frequencies,
expected probabilities, and expected frequencies for each interval are
summarized in the following table.
Table 11.1.3 Observed and expected frequencies for each interval of the
N(38.000, 11.519²) distribution
Interval id   Interval        Observed frequency   Expected probability   Expected frequency
1             X < 20          2                    0.059                  1.77
2             20 ≤ X < 30     6                    0.185                  5.55
3             30 ≤ X < 40     8                    0.325                  9.75
4             40 ≤ X < 50     11                   0.282                  8.46
5             50 ≤ X < 60     2                    0.121                  3.63
6             60 ≤ X          1                    0.028                  0.84
w Since the expected frequencies of the 1st and 6th intervals are less than 5, these
intervals should be combined with adjacent intervals for testing the goodness of fit
using the chi-square distribution, as in Table 11.1.4. The expected frequency of the last
interval is still less than 5, but combining it further would leave only three
intervals, so we demonstrate the calculation as it is. Note that, due to rounding
error, the sum of the expected probabilities may not be exactly equal to 1 and the
sum of the expected frequencies may not be exactly 30 in Table 11.1.4.
Table 11.1.4 Revised table after combining intervals with small expected
frequencies
Interval id   Interval        Observed frequency   Expected probability   Expected frequency
1             X < 30          8                    0.244                  7.32
2             30 ≤ X < 40     8                    0.325                  9.75
3             40 ≤ X < 50     11                   0.282                  8.46
4             50 ≤ X          3                    0.149                  4.47
Total                         30                   1.000                  30.00
w The test statistic for the goodness of fit test is as follows:

χ²_obs = (8 − 7.32)²/7.32 + (8 − 9.75)²/9.75 + (11 − 8.46)²/8.46 + (3 − 4.47)²/4.47 = 1.623

Since the number of intervals is 4, k becomes 4, and m = 2, because two
population parameters μ and σ² were estimated from the sample data. Therefore,
the critical value is as follows:

χ²_{k−1−m; α} = χ²_{4−1−2; 0.05} = χ²_{1; 0.05} = 3.841

Since the observed test statistic is less than the critical value, we cannot reject the null
hypothesis that the sample data follow N(38.000, 11.519²).
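The same calculation can be sketched with Python's standard library `statistics.NormalDist`. This is an illustrative check, not『eStat』output; the merged intervals and the estimates μ = 38.000 and σ = 11.519 are taken from the text above.

```python
from statistics import NormalDist

nd = NormalDist(mu=38.0, sigma=11.519)   # estimated from the 30 ages
n = 30
# merged intervals: X < 30, 30 <= X < 40, 40 <= X < 50, 50 <= X
bounds = [(-float("inf"), 30), (30, 40), (40, 50), (50, float("inf"))]
observed = [8, 8, 11, 3]

probs = [nd.cdf(b) - nd.cdf(a) for a, b in bounds]   # expected probabilities
expected = [n * p for p in probs]                    # expected frequencies
chi2_obs = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = k - 1 - m = 4 - 1 - 2 = 1; chi-square critical value 3.841 at alpha = 0.05
print(chi2_obs < 3.841)   # True -> normality is not rejected
```

Because the expected probabilities are not rounded here, the statistic comes out near 1.62, slightly different from the hand calculation with rounded expected frequencies.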
w The test result can be verified using the 'Goodness of Fit Test' module of『eStatU』. In the input
box that appears by selecting the 'Goodness of Fit Test' module, enter the data for
'Observed Frequency' and 'Expected Probability' from Table 11.1.4, as shown in
<Figure 11.1.9>. After entering the data, select the significance level and press the
[Execute] button to calculate the 'Expected Frequency' and produce the chi-square test
result (<Figure 11.1.10>).
<Figure 11.1.9> Data input for goodness of fit test in『eStatU』
<Figure 11.1.10> Chi-square goodness of fit test using『eStatU』
[Practice 11.1.2] (Otter length)
Data of 30 otter lengths can be found at the following location of『eStat』:
⇨ eBook ⇨ PR110102_OtterLength.csv
Test the hypothesis that the population is normally distributed at the significance level
of 5% using『eStat』.
11.2 Testing Hypothesis for Contingency Table / 123
11.2 Testing Hypothesis for Contingency Table
Ÿ
The contingency table, or cross table, discussed in Chapter 4 places the possible
values of two categorical variables in rows and columns, respectively, and records
the frequency of each cell in which the values of the two variables intersect. If this
contingency table is for sample data taken from a population, we can ask what
the contingency table of the population would look like. A test for the contingency
table is usually an analysis of the relation between the two categorical variables,
and it can be divided into the independence test and the homogeneity test
according to the sampling method used to obtain the data.
11.2.1 Independence Test
Ÿ
The independence test of the contingency table investigates whether two
categorical variables are independent when samples are extracted from one
population. Consider the independence test with the following example.
Example 11.2.1
In order to investigate whether college students' wearing of glasses is
independent of gender, a sample of 100 students was collected and the contingency
table was prepared as follows:

Table 11.2.1 Wearing glasses by gender
         Wear Glasses   No Glasses   Total
Men      40             10           50
Women    20             30           50
Total    60             40           100

⇨ eBook ⇨ EX110201_GlassesByGender.csv
1) Using『eStat』, draw a line graph of the use of eyeglasses by men and women.
2) Test the hypothesis at 5% of the significance level to see if the gender variable and
the wearing of glasses are independent or related to each other.
3) Check the result of the independence test using『eStatU』.
Answer
1) Enter the data in『eStat』as shown in <Figure 11.2.1>.
<Figure 11.2.1> Data input
w Select the 'Line Graph' icon from the main menu. If you click the variables ‘Gender’,
‘Glasses’, ‘NoGlasses’ one by one, then a line graph as shown in <Figure 11.2.2>
will appear in the Graph Area. If you look at the line graph, you can see that the
proportions of wearing glasses for men and women are different: 80% of
men wear glasses, while only 40% of women do. In such cases, the gender variable
and the wearing of glasses are considered related. When two variables are related
like this, the two lines of the line graph intersect each other.
<Figure 11.2.2> Line graph of wearing glasses by gender
2) If the two variables are not related (i.e., if they are independent of each
other), the proportion of glasses wearers among men and among women should
each equal 60%, the proportion of all students wearing glasses in Table 11.2.1.
In other words, if the two variables are independent, the
contingency table should be as follows:

Table 11.2.2 Contingency table when gender and wearing
glasses are independent
         Wear Glasses   No Glasses   Total
Men      30             20           50
Women    30             20           50
Total    60             40           100
w If there is little difference between the observed contingency table and the
contingency table in the case of independence, the two categorical variables are said to
be independent of each other. If the differences are very large, the two categorical
variables are related to each other. The independence test is a statistical method
for determining whether two categorical variables of the population are independent of
each other by using the observed contingency table obtained from the sample. The
independence test uses the chi-square distribution and the hypothesis is as follows:
H₀: The two variables of the contingency table are independent of each other.
H₁: The two variables of the contingency table are related.
w The test statistic for testing this hypothesis utilizes the difference between the
observed frequencies of the contingency table from the sample and the expected
frequencies of the contingency table when the two variables are assumed to be
independent, which is similar to the goodness of fit test. The test statistic in this
example is as follows:

χ²_obs = (40 − 30)²/30 + (10 − 20)²/20 + (20 − 30)²/30 + (30 − 20)²/20 = 16.67

This test statistic follows a chi-square distribution with (r − 1)(c − 1) degrees of
freedom, where r is the number of rows (number of possible values of the row
variable) and c is the number of columns (number of possible values of the column
variable). Therefore, the decision rule to test the hypothesis is as follows:
‘If χ²_obs > χ²_{(r−1)(c−1); α}, then reject H₀.’
In this example, χ²_obs = 16.67 is greater than the critical value
χ²_{(2−1)(2−1); 0.05} = χ²_{1; 0.05} = 3.841. Therefore, the null hypothesis
that the two variables are independent of each other is rejected, and we conclude that
gender and wearing glasses are related.
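A minimal sketch of this 2 × 2 computation in Python (a hypothetical check using only the standard library, not『eStatU』), with the expected frequency of each cell taken as (row total) × (column total) / n:

```python
table = [[40, 10],   # Men:   wear glasses, no glasses
         [20, 30]]   # Women: wear glasses, no glasses

row_tot = [sum(row) for row in table]        # [50, 50]
col_tot = [sum(col) for col in zip(*table)]  # [60, 40]
n = sum(row_tot)                             # 100

chi2_obs = 0.0
for i, row in enumerate(table):
    for j, o in enumerate(row):
        e = row_tot[i] * col_tot[j] / n      # expected frequency under H0
        chi2_obs += (o - e) ** 2 / e

# df = (2 - 1)(2 - 1) = 1; chi-square critical value 3.841 at alpha = 0.05
print(round(chi2_obs, 2))     # 16.67
print(chi2_obs > 3.841)       # True -> reject H0: gender and glasses are related
```

The double loop mirrors the hand calculation: one squared-difference term per cell of the contingency table.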
3) In the independence test module of『eStatU』, enter the data as shown in <Figure 11.2.3> and
press the [Execute] button to display the result of the chi-square test as shown in
<Figure 11.2.4>.
<Figure 11.2.3>『eStatU』Test of Independence
<Figure 11.2.4>『eStatU』Chi-square test of independence
Ÿ
Assume that the variable A has r attributes A₁, A₂, ⋯, A_r, and the variable B has
c attributes B₁, B₂, ⋯, B_c. Let p_ij denote the probability of the cell of the A_i and
B_j attributes in the contingency table of A and B, as in Table 11.2.3. Here
p_i· = p_i1 + p_i2 + ⋯ + p_ic denotes the marginal probability of A_i, and
p_·j = p_1j + p_2j + ⋯ + p_rj denotes the marginal probability of B_j.

Table 11.2.3 Notation of probabilities in an r × c contingency table
Variable A \ Variable B   B₁     B₂     ⋯   B_c    Total
A₁                        p_11   p_12   ⋯   p_1c   p_1·
A₂                        p_21   p_22   ⋯   p_2c   p_2·
⋮                         ⋮      ⋮          ⋮      ⋮
A_r                       p_r1   p_r2   ⋯   p_rc   p_r·
Total                     p_·1   p_·2   ⋯   p_·c   1

Ÿ
If two events A_i and B_j are independent, P(A_i ∩ B_j) = P(A_i)·P(B_j) and hence
p_ij = p_i· × p_·j. If the two variables A and B are independent, all i and j should
satisfy this property, and testing it is called the independence test.
H₀: Variables A and B are independent,
i.e., p_ij = p_i· × p_·j, i = 1, 2, ⋯, r; j = 1, 2, ⋯, c
H₁: Variables A and B are not independent.
Ÿ
In order to test whether the two variables of the population are independent, let the
observed frequencies, O_ij, of the contingency table from n samples
be as follows:

Table 11.2.4 Observed frequencies O_ij of an r × c contingency table
Variable A \ Variable B   B₁     B₂     ⋯   B_c    Total
A₁                        O_11   O_12   ⋯   O_1c   O_1·
A₂                        O_21   O_22   ⋯   O_2c   O_2·
⋮                         ⋮      ⋮          ⋮      ⋮
A_r                       O_r1   O_r2   ⋯   O_rc   O_r·
Total                     O_·1   O_·2   ⋯   O_·c   n

Ÿ
If the null hypothesis H₀ is true, i.e., if the two variables are independent of each
other, the expected frequency of cell (i, j) will be E_ij = n·p_i·p_·j. Since we do
not know the population p_i· and p_·j, we use their estimates O_i·/n and O_·j/n,
and the estimate of the expected frequency, Ê_ij, is as follows:

Ê_ij = n × (O_i·/n) × (O_·j/n) = O_i· × O_·j / n

Ÿ
The expected frequencies in the case of independence can be interpreted as keeping the
proportions of each attribute of the B variable, O_·j/n, j = 1, ⋯, c,
within each attribute of the A variable.
Table 11.2.5 Expected frequencies Ê_ij of an r × c contingency table
Variable A \ Variable B   B₁             B₂             ⋯   B_c
A₁                        O_1·O_·1/n     O_1·O_·2/n     ⋯   O_1·O_·c/n
A₂                        O_2·O_·1/n     O_2·O_·2/n     ⋯   O_2·O_·c/n
⋮                         ⋮              ⋮                  ⋮
A_r                       O_r·O_·1/n     O_r·O_·2/n     ⋯   O_r·O_·c/n
Ÿ
The test statistic utilizes the differences between O_ij and Ê_ij as follows:

χ²_obs = Σᵢ₌₁ʳ Σⱼ₌₁ᶜ (O_ij − Ê_ij)² / Ê_ij

This test statistic approximately follows a chi-square distribution with (r − 1)(c − 1)
degrees of freedom. Therefore, the decision rule to test the hypothesis with
significance level α is as follows:

‘If χ²_obs > χ²_{(r−1)(c−1); α}, then reject H₀’
Independence Test
Hypothesis:
H₀: Variables A and B are independent,
i.e., p_ij = p_i· × p_·j, i = 1, 2, ⋯, r; j = 1, 2, ⋯, c
H₁: Variables A and B are not independent.
Decision Rule:
‘If χ²_obs = Σᵢ₌₁ʳ Σⱼ₌₁ᶜ (O_ij − Ê_ij)²/Ê_ij > χ²_{(r−1)(c−1); α}, then reject H₀’
where r is the number of attributes of the row variable and c is the
number of attributes of the column variable.
☞
In order to use the chi-square distribution for the independence test,
all expected frequencies should be at least 5.
If the expected frequency of a cell is smaller than 5, the cell should be
combined with an adjacent cell for analysis.
Ÿ
Consider an example of the independence test with more rows and columns.
Example 11.2.2
A market research institute surveyed 500 people on how three beverage products (A, B
and C) are preferred by region and obtained the following contingency table.

Table 11.2.6 Survey of beverage preference by region
Region        A     B     C     Total
New York      52    64    24    140
Los Angeles   60    59    52    171
Atlanta       50    65    74    189
Total         162   188   150   500

⇨ eBook ⇨ EX110202_BeverageByRegion.csv
1) Draw a line graph of beverage preference by region using『eStat』and analyze the
graph.
2) Test whether the beverage preference by the region is independent of each other at
the significance level of 5%.
3) Check the result of the independence test using『eStatU』.
Answer
1) Enter the data in『eStat』as shown in <Figure 11.2.5>.
<Figure 11.2.5> Data input
w Select 'Line Graph' and click the variables ‘Region’, ‘A’, ‘B’, and ‘C’ in order; then the
line graph shown in <Figure 11.2.6> will appear. If you look at the line graph, you
can see the lines crossing from region to region, so the regional
preferences differ. Can you statistically conclude that region and beverage
preference are related?
<Figure 11.2.6> Line graph by region and beverage
2) The hypothesis for the independence test is as follows:
  Region and beverage preference are independent.
  Region and beverage preference are not independent.
w In order to calculate the expected frequencies, we first calculate the proportions of
each beverage preference without considering the region as follows:

p̂_A = 162/500 = 0.324,  p̂_B = 188/500 = 0.376,  p̂_C = 150/500 = 0.300

w If the two variables are independent, these proportions should hold in each region.
Hence, the expected frequencies in each region can be calculated as follows:

New York:     Ê₁₁ = 140 × 0.324 = 45.36,  Ê₁₂ = 140 × 0.376 = 52.64,  Ê₁₃ = 140 × 0.300 = 42.00
Los Angeles:  Ê₂₁ = 171 × 0.324 = 55.40,  Ê₂₂ = 171 × 0.376 = 64.30,  Ê₂₃ = 171 × 0.300 = 51.30
Atlanta:      Ê₃₁ = 189 × 0.324 = 61.24,  Ê₃₂ = 189 × 0.376 = 71.06,  Ê₃₃ = 189 × 0.300 = 56.70
w The chi-square test statistic and critical value are as follows:

χ² = Σᵢ Σⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ
   = (52 − 45.36)²/45.36 + (64 − 52.64)²/52.64 + ⋯ + (74 − 56.70)²/56.70
   = 19.82

degrees of freedom = (3−1)(3−1) = 4,   critical value χ²_{4; 0.05} = 9.488

Since χ² = 19.82 > 9.488, the null hypothesis H₀ is rejected at the significance
level of 5% and we conclude that region and beverage preference are related.
3) In the independence test of『eStatU』, enter data as shown in <Figure 11.2.7> and
click the [Execute] button to display the result of the chi-square test as shown in
<Figure 11.2.8>.
<Figure 11.2.7> Data input for Independence Test at eStatU
<Figure 11.2.8> Chi-square Independence Test at eStatU
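Outside of『eStat』, the same computation can be cross-checked in Python. The following is a minimal sketch using scipy; the library choice and variable names are ours, not part of the textbook's tooling:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed frequencies from Table 11.2.6
# (rows: New York, Los Angeles, Atlanta; columns: beverages A, B, C)
observed = np.array([[52, 64, 24],
                     [60, 59, 52],
                     [50, 65, 74]])

# correction=False: no Yates continuity correction, matching the textbook formula
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(f"chi-square = {chi2:.2f}, df = {dof}, p-value = {p_value:.4f}")
# chi-square ≈ 19.82 with df = 4; since the p-value is below 0.05,
# H0 (independence) is rejected, in agreement with the hand calculation
```

The `expected` array returned by `chi2_contingency` reproduces the expected-frequency table computed above.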

Ÿ
As described in Chapter 4, if a contingency table is made using raw data (<Figure
11.2.9>), eStat provides the result of the independence test as shown in <Figure
11.2.10>. In this case, if a cell of the contingency table has a small expected
frequency, the test result should be interpreted carefully.
<Figure 11.2.9> Raw data input for independence test
<Figure 11.2.10> eStat contingency table and independence test
[Practice 11.2.1]
A guidance counselor surveyed 100 high school students for reading and watching TV.
The following table was obtained by classifying each item as high and low. Using the
significance level of 0.05, are these data sufficient to claim that the reading and TV
viewing are related? Check the test result using『eStatU』.
                 Reading
  TV viewing    High    Low    Total
  High            40     18       58
  Low             31     11       42
  Total           71     29      100
⇨ eBook ⇨ EX110201_TV_Reading.csv.

11.2.2 Homogeneity Test
Ÿ
The independence test described in the previous section was for a contingency
table of two categorical variables based on sample data from a single population.
However, a similar contingency table may arise when a separate sample is drawn
from each of several populations. This is often the case when the research can be
done more efficiently that way, or when time and space constraints are imposed.
For example, if you want to compare the English scores of freshman, sophomore,
junior and senior students in a university, it is reasonable to take samples from
each grade and analyze them. In this case, the contingency table is as follows:
Table 11.2.7 A contingency table of English score by grade level

  English score   Freshman   Sophomore   Junior   Senior
  A                   -          -          -        -
  B                   -          -          -        -
  C                   -          -          -        -
  D                   -          -          -        -
Ÿ
If this contingency table is derived from each grade population, the question of
interest is not the independence of English score and grade level, but whether the
four distributions of English scores are equal. The hypothesis for a contingency table
of samples drawn from multiple populations is as follows. It is called the
homogeneity test.
H₀: Distributions of several populations for a categorical variable are homogeneous.
H₁: Distributions of several populations for a categorical variable are not homogeneous.
Ÿ
The test statistic for the homogeneity test is the same as that of the independence
test:

χ² = Σᵢ Σⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ

Here r is the number of attributes of the categorical variable and c is the
number of populations, and the statistic approximately follows a chi-square
distribution with (r−1)(c−1) degrees of freedom.
Homogeneity Test
Hypothesis:
H₀: Several population distributions for a categorical variable are homogeneous.
H₁: Several population distributions for a categorical variable are not
homogeneous.
Decision Rule:

If χ² = Σᵢ Σⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ > χ²_{(r−1)(c−1); α}, then reject H₀.

Here r is the number of attributes of the categorical variable and c is the number of
populations.
☞
In order to use the chi-square distribution for the homogeneity test,
all expected frequencies should be at least 5.
If an expected frequency of a cell is smaller than 5, the cell is
combined with an adjacent cell for analysis.

Example 11.2.3
In order to investigate whether viewers of TV programs are different by age for three
programs (A, B and C), 200, 100 and 100 samples were taken separately from the
population of young people(20s), middle-aged people (30s and 40s), and older people
(50s and over) respectively. Their preference of the program were summarized as
follows. Test whether TV program preferences vary by age group at the significance
level of 5%.
Table 11.2.8 Preference of TV program by age group

  TV Program    Young   Middle Aged   Older   Total
  A               120        10         10      140
  B                30        75         30      135
  C                50        15         60      125
  Total           200       100        100      400

Answer
w The hypothesis of this problem is as follows:
H₀: TV program preferences for different age groups are homogeneous.
H₁: TV program preferences for different age groups are not homogeneous.
w Proportions of each program preference for the combined samples are as follows:

140/400 = 0.35,   135/400 = 0.3375,   125/400 = 0.3125

Therefore, the expected frequencies for each age group when H₀ is true are as
follows:

Young:        200 × 0.35 = 70.0    200 × 0.3375 = 67.5     200 × 0.3125 = 62.5
Middle Aged:  100 × 0.35 = 35.0    100 × 0.3375 = 33.75    100 × 0.3125 = 31.25
Older:        100 × 0.35 = 35.0    100 × 0.3375 = 33.75    100 × 0.3125 = 31.25
w Test statistic and critical value are as follows:

χ² = Σᵢ Σⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ
   = (120 − 70)²/70 + (30 − 67.5)²/67.5 + ⋯ + (60 − 31.25)²/31.25
   = 180.50

degrees of freedom = (3−1)(3−1) = 4,   critical value χ²_{4; 0.05} = 9.488

Since χ² = 180.50 is greater than the critical value, H₀ is rejected. TV programs have
different preferences for different age groups.
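Since the homogeneity test uses the same chi-square computation as the independence test, the same scipy routine can be used to check this example. A sketch (tooling and names are our own):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed frequencies from Table 11.2.8
# (rows: programs A, B, C; columns: Young, Middle Aged, Older,
#  with fixed sample sizes 200, 100, 100 per column)
observed = np.array([[120, 10, 10],
                     [ 30, 75, 30],
                     [ 50, 15, 60]])

chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(f"chi-square = {chi2:.2f}, df = {dof}")
# chi-square ≈ 180.50 with df = 4, far above the 5% critical value 9.488,
# so the hypothesis of homogeneous preference distributions is rejected
```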

[Practice 11.2.2]
To evaluate the effectiveness of typing training, 100 documents by company employees
who received typing training and 100 documents by employees who did not receive
typing training were evaluated. Evaluated documents are classified as good, normal, and
low. The following table shows a classification of the evaluation for the total 200
documents according to whether or not they received training. Test the null hypothesis
that the distributions of the document evaluation are the same in both populations. Use
α = 0.05 and check your test result using『eStatU』.
  Training              Good   Normal   Low   Total
  Typing training         48       39    13     100
  No typing training      12       26    62     100
  Total                   60       65    75     200

Exercise
11.1 300 customers selected randomly were asked on which day of the week they usually went to the
grocery store and gave the following responses. Can you conclude that customers' preferences differ
across the days of the week? Use the 5% significance level. Check the test result using 『eStatU』.

  Day                    Mon   Tue   Wed   Thu   Fri   Sat   Sun   Total
  Number of Customers     10    20    40    40    80    60    50     300
11.2 The market shares of toothpaste brands A, B, C and D are known to be 0.3, 0.6, 0.08, and 0.02
respectively. The result of a survey of 600 people for the toothpaste brands is as follows. Can
you conclude from these data that the existing market shares are incorrect? Use α = 0.05 and
check your test result using 『eStatU』.

  Brand                   A     B    C    D   Total
  Number of Customers   192   342   44   22     600
11.3 The following table shows the distribution by score of an aptitude test taken by 223 workers
at a plant. The mean and variance from the sample data are 75 and 386 respectively. Test
whether the scores of the aptitude test follow a normal distribution. Use α = 0.05 and check your
test result using 『eStatU』.

  Score interval    Number of Workers
  X < 40                   10
  40 ≤ X < 50              12
  50 ≤ X < 60              17
  60 ≤ X < 70              37
  70 ≤ X < 80              55
  80 ≤ X < 90              51
  90 ≤ X < 100             34
  X ≥ 100                   7
  Total                   223
11.4 The following data shows the highest temperature of a city during the month of August. Test
whether the temperature data follow a normal distribution with the 5% significance level. (Unit: °C)
29, 29, 34, 35, 35, 31, 32, 34, 38, 34, 33, 31, 31, 30, 34, 35,
34, 32, 32, 29, 28, 30, 29, 31, 29, 28, 30, 29, 29, 27, 28.
11.5 For market research, a company obtained data on the educational level and socio-economic status
of 375 housewives and summarized a contingency table as follows. Test the null hypothesis that
social and economic status and educational level are independent at the significance level of 0.05.
Check the test result using 『eStatU』.
                                 Education Level
  Socio-economic
  status        Elementary   Middle   High   College   Above   Total
  1                 10           7      3        4        1      25
  2                 14          10      7        4        2      37
  3                  9          25     13       18        3      68
  4                  7           9     38       44        6     104
  5                  3           8     14       18       62     105
  6                  2           3      8       10       13      36
  Total             45          62     83       98       87     375

11.6 Government agencies surveyed workers who wanted to get a job and classified 532 respondents
according to gender and technical level as follows. Do these data provide sufficient evidence
that the technical level and gender are related? Use α = 0.05 and check your test result using
『eStatU』.

  Technical Level          Male   Female   Total
  Skilled worker            106        6     112
  Semi-skilled worker        93       39     132
  Unskilled worker          215       73     288
  Total                     414      118     532
11.7 A guidance counselor surveyed 110 high school students for reading and watching TV. The
following table was obtained by classifying each item as high and low. At the significance level of
0.05, are these data sufficient to claim that the reading and TV viewing are related? Check the
test result using 『eStatU』.
                 Reading
  TV viewing    High    Low    Total
  High            40     18       58
  Low             41     11       52
  Total           81     29      110
11.8 165 defective products produced in two plants operated by the same company were classified
depending on whether they were due to low occupational awareness or low quality raw materials
by each plant. Test the null hypothesis that the cause of the defect and production plant are
independent with the significance level of 0.05. Check the test result using 『eStatU』.
                                   Plant
  Cause of defect                 A     B    Total
  Low occupational awareness     21    72       93
  Low quality raw materials      46    26       72
  Total                          67    98      165
11.9 To evaluate the effectiveness of typing training, 110 documents by company employees who
received typing training and 120 documents by employees who did not receive typing training were
evaluated. Evaluated documents are classified as good, normal, and low. The following table
shows a classification of the evaluation for the total 230 documents according to whether or not they
received training. Test the null hypothesis that typing training and document evaluation are
independent. Use α = 0.05 and check your test result using 『eStatU』.

  Training              Good   Normal   Low   Total
  Typing training         48       39    23     110
  No typing training      12       36    72     120
  Total                   60       75    95     230

11.10 A company with three large plants applied different working conditions and wage systems to the
three plants and, six months later, asked workers about their satisfaction with the new system. 250
workers from each of the three plants were randomly selected and the survey results were as follows.
Is there sufficient evidence that workers at each plant have different satisfaction levels? Test with the
significance level of 0.05. Check the test result using 『eStatU』.

  Plant     Very satisfied   Satisfied   Average   Not satisfied   Total
  Plant 1        135              70         25           20         250
  Plant 2        145              80         15           10         250
  Plant 3        140              75         20           15         250
  Total          420             225         60           45         750

Multiple Choice Exercise
11.1 What test do you need to investigate whether the sample data follow a theoretical distribution?
① Goodness of fit test
② Independence test
③ Test for population proportion
④ Test for two population means

11.2 In order to test whether sample data of a continuous variable follow a distribution, what is the first
necessary work for the goodness of fit test?
① log transformation
② frequency distribution of intervals
③ [0,1] transformation
④ frequency distribution

11.3 How do you test the hypothesis that the two categorical variables of a sample from a population
have no relation?
① Goodness of fit test
② Independence test
③ Test for population proportion
④ Test for homogeneity

11.4 How do you test the hypothesis that the samples from two categorical populations have the same
distribution?
① Goodness of fit test
② Independence test
③ Test for population proportion
④ Test for homogeneity

11.5 Which of the following statistical distributions is used to test for a contingency table?
① t distribution
② χ² distribution
③ binomial distribution
④ Normal distribution

(Answers)
11.1 ①, 11.2 ②, 11.3 ②, 11.4 ④, 11.5 ②

12
Correlation and Regression
Analysis
SECTIONS
12.1 Correlation Analysis
12.2 Simple Linear Regression Analysis
12.2.1 Simple Linear Regression Model
12.2.2 Estimation of Regression
Coefficient
12.2.3 Goodness of Fit for Regression
Line
12.2.4 Analysis of Variance for
Regression
12.2.5 Inference for Regression
12.2.6 Residual Analysis
12.3 Multiple Linear Regression Analysis
12.3.1 Multiple Linear Regression Model
12.3.2 Estimation of Regression
Coefficient
12.3.3 Goodness of Fit for Regression
and Analysis of Variance
12.3.4 Inference for Multiple Linear
Regression
CHAPTER OBJECTIVES
From Chapter 7 to Chapter 10, we
discussed the estimation and testing of
hypotheses about parameters such as the
population mean and variance for a single
variable.
This chapter describes a correlation analysis
for two or more variables.
If variables are related with each other,
then a regression analysis is described to
see how this association can be modeled
and used.
Simple linear regression analysis and
multiple linear regression analysis are
discussed.

12.1 Correlation Analysis
Ÿ
The easiest way to observe the relation of two variables is to draw a scatter plot
with one variable as the X axis and the other as the Y axis. If two variables are
related, the data will gather together with a certain pattern, and if not related, the
data will be scattered around. Correlation analysis is a method of analyzing the
degree of linear relationship between two variables: it investigates how linearly the
other variable increases or decreases as one variable increases.

Example 12.1.1
Based on a survey of advertising costs and sales for 10 companies that make the
same product, we obtained the data in Table 12.1.1. Using『eStat』, draw
a scatter plot for this data and investigate the relation of the two variables.
Table 12.1.1 Advertising costs and sales (unit: 1 million USD)

  Company         1    2    3    4    5    6    7    8    9   10
  Advertise (X)   4    6    6    8    8    9    9   10   12   12
  Sales (Y)      39   42   45   47   50   50   52   55   57   60
⇨ eBook ⇨ EX120101_SalesByAdvertise.csv.
Answer
w Using『eStat』, enter data as shown in <Figure 12.1.1>. If you select Sales as
'Y Var' and Advertise as 'by X Var' in the variable selection box that appears when
you click the scatter plot icon on the main menu, the scatter plot will appear as
shown in <Figure 12.1.2>. As we might expect, the scatter plot shows that the more
a company invests in advertising, the more its sales increase; moreover, the form of
the increase is linear.

<Figure 12.1.1> Data input in eStat
<Figure 12.1.2> Scatter plot of sales by advertise
Ÿ
The relation between two variables can be roughly investigated using a scatter
plot like this. However, a measure of the extent of the relation can be used
together to provide a more accurate and objective view of the relation between
two variables. As a measure of the relation between two variables, there is the
covariance. The population covariance of the two variables X and Y is denoted as
σ_XY = Cov(X, Y). When the random samples of the two variables are given as
(x₁, y₁), (x₂, y₂), ⋯, (xₙ, yₙ), the estimate of the population covariance using
samples, which is called the sample covariance, s_XY, is defined as follows:

s_XY = (1/(n−1)) Σᵢ (xᵢ − x̄)(yᵢ − ȳ)
Ÿ
In the above equation, x̄ and ȳ represent the sample means of X and Y
respectively.
Ÿ
In order to understand the meaning of covariance, consider a case where Y
increases as X increases. If the value of X is larger than x̄ and the value of Y is
larger than ȳ, then (xᵢ − x̄)(yᵢ − ȳ) has a positive value. Also, if the
value of X is smaller than x̄ and the value of Y is smaller than ȳ, then
(xᵢ − x̄)(yᵢ − ȳ) has a positive value. Therefore, their mean value, which is the
covariance, tends to be positive. Conversely, if the value of the covariance is
negative, the value of the other variable decreases as the value of one variable
increases. Hence, by calculating the covariance, we can see the relation between two
variables: positive correlation (i.e., increasing the value of one variable will
increase the value of the other) or negative correlation (i.e., increasing the value
of one variable will decrease the value of the other).
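The sign behavior of the covariance described above can be illustrated with a small numeric sketch; the data here are made up purely for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([2.0, 4.0, 5.0, 4.0, 6.0])    # tends to increase with x
y_down = np.array([9.0, 7.0, 6.0, 4.0, 2.0])  # tends to decrease with x

# np.cov uses the (n-1) divisor by default, matching the sample covariance formula
cov_up = np.cov(x, y_up)[0, 1]
cov_down = np.cov(x, y_down)[0, 1]

print(cov_up, cov_down)
# The covariance is positive for the increasing pair and negative for the
# decreasing pair, as the argument above predicts
```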
Ÿ
Covariance itself is a good measure but, since the covariance depends on the
units of X and Y, it is difficult to interpret its magnitude and
inconvenient to compare with other data. The standardized
covariance, which divides the covariance by the standard deviations of X and Y, σ_X
and σ_Y, to obtain a measure unrelated to the type of variable or specific
unit, is called the population correlation coefficient and denoted as ρ.

Population Correlation Coefficient:   ρ = σ_XY / (σ_X σ_Y)

Ÿ
<Figure 12.1.3> shows different scatter plots and their values of the correlation
coefficient.
<Figure 12.1.3> Different scatter plots and their correlation coefficients.

Ÿ
The correlation coefficient ρ is interpreted as follows:
1) ρ has a value between -1 and +1. A ρ value closer to +1 indicates a strong
positive linear relation and a ρ value closer to -1 indicates a strong negative
linear relation. The linear relationship weakens as the value of ρ gets close to 0.
2) If all the corresponding values of X and Y are located on a straight line, the
value of ρ is either +1 (if the slope of the straight line is positive) or -1 (if
the slope of the straight line is negative).
3) The correlation coefficient ρ is only a measure of the linear relationship between
two variables. Therefore, in the case of ρ = 0, there is no linear relationship
between the two variables, but there may be a different relationship. (See the
scatter plot (f) in <Figure 12.1.3>.)
Ÿ
『eStatU』provides a simulation of scatter plot shapes for different correlations as
in <Figure 12.1.4>.

<Figure 12.1.4> Simulation of correlation coefficient at eStatU
Ÿ
An estimate of the population correlation coefficient using samples of two
variables is called the sample correlation coefficient and denoted as r. The
formula for the sample correlation coefficient r can be obtained by replacing
each parameter with its estimate in the formula for the population correlation
coefficient:

r = s_XY / (s_X s_Y)

where s_XY is the sample covariance and s_X, s_Y are the sample standard deviations of X
and Y as follows:

s_X = √( (1/(n−1)) Σᵢ (xᵢ − x̄)² ),   s_Y = √( (1/(n−1)) Σᵢ (yᵢ − ȳ)² )

Therefore, the formula for r can be written as follows:

r = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / √( Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)² )
  = S_XY / √( S_XX · S_YY )

where S_XX = Σᵢ (xᵢ − x̄)², S_YY = Σᵢ (yᵢ − ȳ)², and S_XY = Σᵢ (xᵢ − x̄)(yᵢ − ȳ).
Example 12.1.2
Find the sample covariance and correlation coefficient for the advertising costs and
sales of [Example 12.1.1].
Answer
w To calculate the sample covariance and correlation coefficient, it is convenient to
make the following table. This table can also be used for calculations in regression
analysis.

Table 12.1.2 A table for calculating the covariance

    i     xᵢ     yᵢ     xᵢ²     yᵢ²    xᵢyᵢ
    1      4     39      16    1521     156
    2      6     42      36    1764     252
    3      6     45      36    2025     270
    4      8     47      64    2209     376
    5      8     50      64    2500     400
    6      9     50      81    2500     450
    7      9     52      81    2704     468
    8     10     55     100    3025     550
    9     12     57     144    3249     684
   10     12     60     144    3600     720
  Sum     84    497     766   25097    4326
  Mean   8.4   49.7
w Terms which are necessary to calculate the covariance and correlation coefficient
are as follows:

S_XX = Σᵢ xᵢ² − n x̄² = 766 − 10 × 8.4² = 60.4
S_YY = Σᵢ yᵢ² − n ȳ² = 25097 − 10 × 49.7² = 396.1
S_XY = Σᵢ xᵢyᵢ − n x̄ ȳ = 4326 − 10 × 8.4 × 49.7 = 151.2

w S_XX, S_YY and S_XY represent the sum of squares of X, the sum of squares of Y,
and the sum of cross products of X and Y. Hence, the covariance and correlation
coefficient are as follows:

s_XY = S_XY / (n−1) = 151.2 / 9 = 16.8

r = S_XY / √( S_XX · S_YY ) = 151.2 / √( 60.4 × 396.1 ) = 0.978

This value of the correlation coefficient is consistent with the scatter plot which
shows a strong positive correlation of the two variables.
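The hand calculation in this example can be checked numerically. A sketch in Python with numpy (our own tooling choice, not part of the textbook):

```python
import numpy as np

# Advertising costs and sales from Table 12.1.1
x = np.array([4, 6, 6, 8, 8, 9, 9, 10, 12, 12], dtype=float)
y = np.array([39, 42, 45, 47, 50, 50, 52, 55, 57, 60], dtype=float)

s_xy = np.cov(x, y)[0, 1]     # sample covariance, (n-1) divisor
r = np.corrcoef(x, y)[0, 1]   # sample correlation coefficient

print(f"s_xy = {s_xy:.1f}, r = {r:.3f}")
# s_xy = 16.8 and r ≈ 0.978, matching the hand calculation above
```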

Ÿ
The sample correlation coefficient r can be used for testing hypotheses about the
population correlation coefficient. The main interest in testing hypotheses about ρ is
H₀: ρ = 0, which tests the existence of a linear correlation. This test can be done
using the t distribution as follows:

Testing the population correlation coefficient ρ:
Null Hypothesis: H₀: ρ = 0
Test Statistic:

t₀ = r √(n−2) / √(1 − r²)

t₀ follows a t distribution with n−2 degrees of freedom.
Rejection Region of H₀:
1) H₁: ρ > 0 : Reject H₀ if t₀ > t_{n−2; α}
2) H₁: ρ < 0 : Reject H₀ if t₀ < −t_{n−2; α}
3) H₁: ρ ≠ 0 : Reject H₀ if |t₀| > t_{n−2; α/2}

Example 12.1.3
In Example 12.1.2, test the hypothesis that the population correlation coefficient
between advertising cost and the sales amount is zero at the significance level of 0.05.
(Since the sample correlation coefficient is 0.978, which is close to 1, this test would not
be required in practice.)
Answer
w The value of the test statistic t₀ is as follows:

t₀ = r √(n−2) / √(1 − r²) = 0.978 × √8 / √(1 − 0.978²) = 13.26

Since it is greater than t_{8; 0.025} = 2.306, H₀: ρ = 0 should be rejected.
w With the selected variables of 『eStat』 as in <Figure 12.1.1>, click the regression icon
on the main menu; the scatter plot with a regression line will then appear.
Clicking the [Correlation and Regression] button below this graph will show the
output as <Figure 12.1.5> in the Log Area with the result of the regression analysis.
The values of this result are slightly different from the textbook, which is a
rounding error associated with the number of digits below the decimal point. The same
conclusion is obtained: the p-value for the correlation test is 0.0001, less than the
significance level of 0.05, and therefore the null hypothesis is rejected.

<Figure 12.1.5> Testing hypothesis of correlation using eStat
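The t statistic and p-value above can also be reproduced as a sketch; scipy's `pearsonr` reports the two-sided p-value directly (the variable names here are ours):

```python
import numpy as np
from scipy.stats import pearsonr

# Advertising costs and sales from Table 12.1.1
x = np.array([4, 6, 6, 8, 8, 9, 9, 10, 12, 12], dtype=float)
y = np.array([39, 42, 45, 47, 50, 50, 52, 55, 57, 60], dtype=float)

r, p_value = pearsonr(x, y)                   # two-sided test of H0: rho = 0
n = len(x)
t0 = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)   # textbook test statistic

print(f"t0 = {t0:.2f}, p-value = {p_value:.4f}")
# t0 ≈ 13.1 exceeds t(8; 0.025) = 2.306 and the p-value is far below 0.05,
# so H0: rho = 0 is rejected
```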

[Practice 12.1.1]
A professor of statistics argues that a student’s final test score can be predicted from
his/her midterm. Ten students were randomly selected and their mid-term and final
exam scores are as follows:
  id            1    2    3    4    5    6    7    8    9   10
  Mid-term X   92   65   75   83   95   87   96   53   77   68
  Final Y      87   71   75   84   93   82   98   42   82   60

⇨ eBook ⇨ PR120101_MidtermFinal.csv.
1) Draw a scatter plot of this data with the mid-term score on X axis and final score
on Y axis. What do you think is the relationship between mid-term and final scores?
2) Find the sample correlation coefficient and test the hypothesis that the population
correlation coefficient is zero with the significance level of 0.05.
Ÿ
If there are three or more variables in the analysis, the relationships can be
viewed using the scatter plots for each combination of two variables, and the
sample correlation coefficients can be obtained. However, to make it easier to see
the relationship between the variables, the correlations between the variables can
be arranged in a matrix format which is called a correlation matrix.『eStat』
shows the result of a correlation matrix and the significance test for those values.
The result of the test shows the t value and p-value.

Example 12.1.4
Draw a scatter plot matrix and correlation coefficient matrix using four variables of the
iris data saved in the following location of『eStat』.
⇨ eBook ⇨ EX120104_Iris.csv
The variables are Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width. Test the
hypothesis whether the correlation coefficients are equal to zero.
Answer
w From 『eStat』, load the data and click the 'Regression' icon. When the variable
selection box appears, select the four variables of Sepal.Length, Sepal.Width,
Petal.Length, and Petal.Width, then the scatter plot matrix will be shown as <Figure
12.1.6>.
w It is observed that the Sepal.Length and the Petal.Length, and the Petal.Length and
the Petal.Width are related.

Example 12.1.4
Answer
(continued)

<Figure 12.1.6> Scatter plot matrix using eStat

w When selecting the [Regression Analysis] button from the options below the graph, the
basic statistics and correlation coefficient matrix as in <Figure 12.1.7> appear in
the Log Area with the test result. It can be seen that all correlations are significant
except the correlation coefficient between Sepal.Length and Sepal.Width.

<Figure 12.1.7> Descriptive statistics and correlation matrix using eStat
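A correlation matrix like the one『eStat』reports can also be computed with numpy. As a sketch, the following uses the three variables of [Practice 12.1.2] below; pairing the values by row position is our assumption about that data:

```python
import numpy as np

# Data from [Practice 12.1.2]: smoking rate X1, weight/height ratio X2,
# exercise time Y for 20 people
x1 = np.array([24, 0, 25, 0, 5, 18, 20, 0, 15, 6,
               0, 15, 18, 5, 10, 0, 12, 0, 15, 12], dtype=float)
x2 = np.array([53, 47, 50, 52, 40, 44, 46, 45, 56, 40,
               45, 47, 41, 38, 51, 43, 38, 36, 43, 45], dtype=float)
y = np.array([11, 22, 7, 26, 22, 15, 9, 23, 15, 24,
              27, 14, 13, 21, 20, 24, 15, 24, 12, 16], dtype=float)

# Each row of the stacked array is one variable; corrcoef returns the 3x3 matrix
corr = np.corrcoef(np.vstack([x1, x2, y]))

print(np.round(corr, 3))
# The matrix is symmetric with ones on the diagonal; the off-diagonal entries
# are the pairwise sample correlation coefficients
```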

[Practice 12.1.2]
A health scientist randomly selected 20 people to determine the effects of smoking and
obesity on their physical strength and examined the average daily smoking rate (X₁,
number/day), the ratio of weight by height (X₂, kg/m), and the time to exercise with a
certain intensity (Y, in hours). Draw a scatter plot matrix and test whether there is a
correlation among smoking, obesity and the exercising time.

  id   smoking rate X₁   weight/height X₂   exercise time Y
   1        24                 53                 11
   2         0                 47                 22
   3        25                 50                  7
   4         0                 52                 26
   5         5                 40                 22
   6        18                 44                 15
   7        20                 46                  9
   8         0                 45                 23
   9        15                 56                 15
  10         6                 40                 24
  11         0                 45                 27
  12        15                 47                 14
  13        18                 41                 13
  14         5                 38                 21
  15        10                 51                 20
  16         0                 43                 24
  17        12                 38                 15
  18         0                 36                 24
  19        15                 43                 12
  20        12                 45                 16

⇨ eBook ⇨ PR120102_SmokingObesityExercis.csv.
12.2 Simple Linear Regression Analysis
Ÿ
Regression analysis is a statistical method that first establishes a reasonable
mathematical model of relationships between variables, estimates the model using
measured values of the variables, and then uses the estimated model to describe
the relationship between the variables, or to apply it to analysis such as
forecasting. For example, a mathematical model of the relationship between sales
(Y) and advertising costs (X) would not only explain the relationship between sales
and advertising costs, but would also be able to predict the amount of sales that
a given investment in advertising would produce.

Definition
Regression analysis is a statistical method that first establishes a
reasonable mathematical model of relationships between variables,
estimates the model using measured values of the variables, and then
uses the estimated model to describe the relationship between the
variables, or to apply it to analysis such as forecasting.
Ÿ
As such, regression analysis is intended to investigate and predict the degree
of relation between variables and the shape of the relation. In regression analysis,
a mathematical model of the relation between variables is called a regression
equation, and the variable affected by other related variables is called a
dependent variable. The dependent variable is the variable we would like to
describe, which is usually observed in response to other variables, so it is also
called a response variable. In addition, variables that affect the dependent variable
are called independent variables. The independent variable is also referred to as
the explanatory variable, because it is used to describe the dependent variable. In
the previous example, if the objective is to analyse the change in sales amounts
resulting from increases and decreases in advertising costs, the sales amount is a
dependent variable and the advertising cost is an independent variable.
Ÿ
If the number of independent variables included in the regression equation is one,
it is called a simple linear regression. If the number of independent variables is
two or more, it is called a multiple linear regression.
12.2.1 Simple Linear Regression Model
Ÿ
Simple linear regression analysis has only one independent variable and the
regression equation is as follows:

Y = α + β X

Ÿ
In other words, the regression equation is represented by a linear equation of
the independent variable, and α and β are unknown parameters which represent
the intercept and the slope respectively. α and β are called the regression
coefficients. The above equation represents an unknown linear relationship
between Y and X in the population and is therefore referred to as the population
regression equation.
In order to estimate the regression coefficients α and β, observations of the
dependent and independent variables are required, i.e., samples. In general, all of
these observations are not located on a line. This is because, even if Y and X
have an exact linear relation, there may be measurement errors in the
observations, or there may not be an exact linear relationship between Y and X.
Therefore, the regression equation can be written by considering these errors
together as follows:

Yᵢ = α + β xᵢ + εᵢ,   i = 1, 2, ⋯, n

Ÿ
where i is the subscript representing the iᵗʰ observation, and εᵢ is a random
variable indicating an error with a mean of zero and a variance σ², independent
of each other. The error εᵢ indicates how far the observation yᵢ is away
from the population regression equation. The above equation includes the
unknown population parameters α, β and σ² and is therefore referred to as a
population regression model.
If a and b are the estimated regression coefficients using samples, the fitted
regression equation can be written as follows. It is referred to as the sample
regression equation.

ŷᵢ = a + b xᵢ

In this expression, ŷᵢ represents the estimated value of Y at x = xᵢ as predicted
by the fitted regression equation. These predicted values cannot exactly match the
actual observed values yᵢ, and the differences between these two values are called
residuals and denoted as eᵢ.

Residuals:   eᵢ = yᵢ − ŷᵢ,   i = 1, 2, ⋯, n

Ÿ
The regression analysis makes some assumptions about the unobservable error εᵢ.
Since the residuals eᵢ calculated using the sample values have similar
characteristics to εᵢ, they are used to investigate the validity of these
assumptions. (Refer to Section 12.2.6 for residual analysis.)
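As a sketch of these definitions, one can fit a line to the advertising data of Example 12.1.1 and compute the residuals; `np.polyfit` performs the least squares fit that is derived in the next subsection:

```python
import numpy as np

# Advertising costs and sales from Table 12.1.1
x = np.array([4, 6, 6, 8, 8, 9, 9, 10, 12, 12], dtype=float)
y = np.array([39, 42, 45, 47, 50, 50, 52, 55, 57, 60], dtype=float)

b, a = np.polyfit(x, y, 1)   # least squares slope b and intercept a
y_hat = a + b * x            # fitted values y-hat_i = a + b * x_i
e = y - y_hat                # residuals e_i = y_i - y-hat_i

print(f"a = {a:.3f}, b = {b:.3f}, sum of residuals = {e.sum():.1e}")
# For a least squares line with an intercept, the residuals sum to
# (numerically) zero
```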
12.2 Simple Linear Regression Analysis / 149
12.2.2 Estimation of Regression Coefficients

When sample data (x₁, y₁), (x₂, y₂), ⋯, (xₙ, yₙ) are given, a straight line representing them can be drawn in many ways. Since one of the main objectives of regression analysis is prediction, we would like the estimated regression line to make the residuals, i.e., the errors that occur when predicting the value of Y, as small as possible. However, it is not possible to minimize the residuals at all points simultaneously, so the line should be chosen to make the residuals small as a whole. The most widely used such method minimizes the total sum of squared residuals and is called the method of least squares.
Method of Least Squares Regression

Definition: A method of estimating the regression coefficients so that the total sum of the squared errors occurring at each observation is minimized, i.e., find b₀ and b₁ which minimize

  Σᵢ eᵢ² = Σᵢ (yᵢ − b₀ − b₁xᵢ)²
To obtain the values of b₀ and b₁ by the least squares method, the sum of squares above is differentiated partially with respect to b₀ and b₁, and each derivative is set equal to zero. If the solution of these equations is denoted b₀ and b₁, the equations can be written as follows:

  n·b₀ + b₁·Σxᵢ = Σyᵢ
  b₀·Σxᵢ + b₁·Σxᵢ² = Σxᵢyᵢ

The above expressions are called the normal equations. The solution b₀ and b₁ of these normal equations is called the least squares estimator of β₀ and β₁ and is given as follows:
Least Squares Estimators of β₀ and β₁

  b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
  b₀ = ȳ − b₁·x̄
If we divide both the numerator and the denominator of b₁ by n − 1, b₁ can be written as b₁ = Sxy / Sx², where Sxy is the sample covariance of X and Y and Sx² is the sample variance of X. Since the correlation coefficient is r = Sxy / (Sx·Sy) and, therefore, Sxy = r·Sx·Sy, the slope b₁ can also be calculated by using the correlation coefficient as follows:

  b₁ = r · ( Sy / Sx )
150
/ Chapter 12 Correlation and Regression Analysis
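This equivalence can be checked numerically. The short Python sketch below is not part of the text (which works in 『eStat』); the x and y values are made-up illustrative numbers, and the check simply confirms that Sxy/Sxx and r·(Sy/Sx) produce the same slope.

```python
from math import sqrt

# Made-up illustrative data (not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Sxx = sum((xi - xbar) ** 2 for xi in x)
Syy = sum((yi - ybar) ** 2 for yi in y)

b1_direct = Sxy / Sxx                      # b1 = Sxy / Sxx
r = Sxy / sqrt(Sxx * Syy)                  # correlation coefficient
b1_via_r = r * sqrt(Syy / Sxx)             # b1 = r * (Sy/Sx); the (n-1) factors cancel

print(abs(b1_direct - b1_via_r) < 1e-9)    # True: the two formulas agree
```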
Example 12.2.1
In [Example 12.1.1], find the least squares estimates of the slope and intercept if the sales amount is the dependent variable and the advertising cost is the independent variable. Predict the amount of sales when 10 has been spent on advertising.

Answer
In [Example 12.1.1], the calculations required to obtain the intercept and slope have already been made. Using them, the slope and intercept are as follows:

  b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 151.2 / 60.4 = 2.5033
  b₀ = ȳ − b₁·x̄ = 49.7 − 2.5033 × 8.4 = 28.672

Therefore, the fitted regression line is ŷ = 28.672 + 2.5033x.

<Figure 12.2.1> shows the fitted regression line on the original data. The meaning of the slope value, 2.5033, is that if the advertising cost increases by one unit (i.e., one million), sales increase by about 2.5 million.

<Figure 12.2.1> Simple linear regression using 『eStat』

The prediction of the sales amount of a company with an advertising cost of 10 can be obtained by using the fitted sample regression line as follows:

  ŷ = 28.672 + 2.5033 × 10 = 53.705

In other words, sales of 53.705 million are expected. That is not to say that all companies with advertising costs of 10 million USD have sales of 53.705 million USD, but that the average amount of their sales is about that; there may be some differences among individual companies.
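The example can be reproduced outside 『eStat』. The plain-Python sketch below (Python is not part of the text; it is only a numerical check) fits the least squares line to the advertising-cost and sales data and predicts at x = 10.

```python
# Least squares fit for the advertising cost (x) / sales (y) data
# of Example 12.2.1, followed by a prediction at x = 10.
x = [4, 6, 6, 8, 8, 9, 9, 10, 12, 12]
y = [39, 42, 45, 47, 50, 50, 52, 55, 57, 60]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n          # 8.4 and 49.7

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)     # slope = 151.2 / 60.4
b0 = ybar - b1 * xbar                        # intercept

print(round(b1, 4), round(b0, 3))   # 2.5033 28.672
print(round(b0 + b1 * 10, 3))       # prediction at x = 10: 53.705
```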
[Practice 12.2.1]
Using the data of [Practice 12.1.1] for the mid-term and final exam scores, find the least squares estimates of the slope and intercept if the final exam score is the dependent variable and the mid-term score is the independent variable. Predict the final exam score when the mid-term score is 80.
12.2.3 Goodness of Fit for Regression Line

After estimating the regression line, we should investigate how valid it is. Since the objective of a regression analysis is to describe the dependent variable as a function of an independent variable, it is necessary to find out how well this function explains the observations. A residual standard error and a coefficient of determination are used for such validation studies.

The residual standard error s is a measure of the extent to which observations are scattered around the estimated line. First, the sample variance of the residuals can be defined as follows:

  s² = Σ(yᵢ − ŷᵢ)² / (n − 2)
The residual standard error s is defined as the square root of s². The s² is an estimate of σ², the extent to which the observations yᵢ are spread around the population regression line. A small value of s² or s indicates that the observations are close to the estimated regression line, which in turn means that the regression line represents the relationship between the two variables well. However, it is not clear how small the residual standard error s should be, although a smaller value is better. In addition, the size of s depends on the unit of Y. To eliminate this shortcoming, a relative measure called the coefficient of determination is defined. The coefficient of determination is the ratio of the variation explained by the regression line to the total variation of the observations yᵢ, so it is a relative measure that can be used regardless of the type and unit of the variable.
As in the analysis of variance in Chapter 9, the following partitions of the sum of squares and the degrees of freedom hold in regression analysis:

Partitions of the sum of squares and degrees of freedom
  Sum of squares:      SST = SSR + SSE
  Degrees of freedom:  (n − 1) = 1 + (n − 2)

Description of the above three sums of squares is as follows:

Total sum of squares:  SST = Σ(yᵢ − ȳ)²
The total sum of squares (SST) indicates the total variation in the observed values of Y. This SST has n − 1 degrees of freedom, and if SST is divided by its degrees of freedom, it becomes the sample variance of Y.
Error sum of squares:  SSE = Σ(yᵢ − ŷᵢ)²
The error sum of squares (SSE) of the residuals represents the unexplained part of the total variation of Y. Since the calculation of this sum of squares requires the estimation of the two parameters β₀ and β₁, SSE has n − 2 degrees of freedom. This is the reason why, in the calculation of the sample variance of the residuals s², we divided by n − 2.
Regression sum of squares:  SSR = Σ(ŷᵢ − ȳ)²
The regression sum of squares (SSR) indicates the variation explained by the regression line among the total variation of Y. This sum of squares has one degree of freedom.
If the estimated regression equation fully explained the variation in all samples (i.e., if all observations were on the sample regression line), the unexplained variation SSE would be zero. Thus, if the portion of SSE in the total sum of squares SST is small, or equivalently if the portion of SSR is large, the estimated regression model is more suitable. Therefore, the ratio of SSR to the total variation SST, called the coefficient of determination, is defined as a measure of the suitability of the regression line as follows:

  R² = SSR / SST = 1 − SSE / SST

The value of the coefficient of determination is always between 0 and 1, and the closer the value is to 1, the more concentrated the samples are around the regression line, which means that the estimated regression line explains the observations well.
Example 12.2.2
Calculate the value of the residual standard error and the coefficient of determination for the data on advertising costs and sales.

Answer
To obtain the residual standard error and the coefficient of determination, it is convenient to make the following Table 12.2.1. Here, the estimated value ŷᵢ of the sales at each value of xᵢ uses the fitted regression line ŷ = 28.672 + 2.5033x.

Table 12.2.1 Useful calculations for the residual standard error and coefficient of determination
(The sums of the last three columns are SST, SSR and SSE respectively.)

    i     xᵢ     yᵢ      ŷᵢ      (yᵢ−ȳ)²    (ŷᵢ−ȳ)²    (yᵢ−ŷᵢ)²
    1      4     39    38.639    114.49     122.346      0.130
    2      6     42    43.645     59.29      36.663      2.706
    3      6     45    43.645     22.09      36.663      1.836
    4      8     47    48.651      7.29       1.100      2.726
    5      8     50    48.651      0.09       1.100      1.820
    6      9     50    51.154      0.09       2.114      1.332
    7      9     52    51.154      5.29       2.114      0.716
    8     10     55    53.657     28.09      15.658      1.804
    9     12     57    58.663     53.29      80.335      2.766
   10     12     60    58.663    106.09      80.335      1.788
  Sum     84    497   496.522    396.10     378.429     17.622
  Average 8.4  49.7

In Table 12.2.1, SST = 396.1, SSR = 378.429 and SSE = 17.622. Here, the relationship SST = SSR + SSE does not hold exactly because of rounding in the calculations. The sample variance of the residuals is as follows:

  s² = Σ(yᵢ − ŷᵢ)² / (n − 2) = 17.622 / (10 − 2) ≈ 2.20
Example 12.2.2 Answer (continued)
Hence, the residual standard error is s = √2.20 ≈ 1.48. The coefficient of determination is as follows:

  R² = SSR / SST ≈ 0.956

This means that 95.6% of the total variation in the 10 observed sales amounts can be explained by the simple linear regression model using the variable of advertising costs, so this regression line is quite useful.
Click the [Correlation and Regression] button in the option below the graph of <Figure 12.2.1> to show the coefficient of determination and the estimation error as in <Figure 12.2.2>.

<Figure 12.2.2> Correlation and descriptive statistics

[Practice 12.2.2]
Using the data of [Practice 12.1.1] for the mid-term and final exam scores, calculate the value of the residual standard error and the coefficient of determination.
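The decomposition in Table 12.2.1 can also be checked computationally. The Python sketch below (not part of the text, which uses 『eStat』; the data are those of the advertising example) computes SST, SSR, SSE, the residual variance s² and R² directly from the definitions.

```python
# Sums of squares, residual variance and R^2 for the advertising data,
# computed from the definitions in Section 12.2.3.
x = [4, 6, 6, 8, 8, 9, 9, 10, 12, 12]
y = [39, 42, 45, 47, 50, 50, 52, 55, 57, 60]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

SST = sum((yi - ybar) ** 2 for yi in y)                # total variation
SSR = sum((yh - ybar) ** 2 for yh in yhat)             # explained by the line
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # residual variation

s2 = SSE / (n - 2)      # sample variance of the residuals
R2 = SSR / SST          # coefficient of determination
print(round(SST, 1), round(s2, 2), round(R2, 3))   # 396.1 2.2 0.956
```

Because no intermediate values are rounded here, SST = SSR + SSE holds up to floating-point error, unlike the hand-rounded table.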
12.2.4 Analysis of Variance for Regression

If we divide the three sums of squares obtained in the above example by their degrees of freedom, each becomes a kind of variance. For example, if you divide SST by its n − 1 degrees of freedom, it becomes the sample variance of the observed values y₁, y₂, ⋯, yₙ. If you divide SSE by its n − 2 degrees of freedom, it becomes s², an estimate of the variance σ² of the error. For this reason, addressing the problems associated with regression using the partition of the sum of squares is called the ANOVA of regression. The information required for the ANOVA, such as the calculated sums of squares and degrees of freedom, is compiled in an ANOVA table as shown in Table 12.2.2.

Table 12.2.2 Analysis of variance table for simple linear regression

  Factor       Sum of squares   Degrees of freedom   Mean squares          F value
  Regression   SSR              1                    MSR = SSR/1           F₀ = MSR/MSE
  Error        SSE              n − 2                MSE = SSE/(n − 2)
  Total        SST              n − 1
The sum of squares divided by its degrees of freedom is referred to as a mean square, and Table 12.2.2 defines the regression mean square (MSR) and the error mean square (MSE) respectively. As the expression indicates, MSE is the same statistic as s², the estimate of σ².

The F value given in the last column is used for testing the hypothesis H₀: β₁ = 0 against H₁: β₁ ≠ 0. If β₁ is not 0, the F value can be expected to be large, because the assumed regression line is valid and the variation of Y is explained in large part by the regression line. Therefore, we can conversely decide that β₁ is not zero if the calculated F ratio is large enough. If the assumptions about the error terms mentioned in the population regression model are valid and the error terms follow a normal distribution, the distribution of the F value when the null hypothesis is true follows the F distribution with 1 and n − 2 degrees of freedom. Therefore, if F₀ > F(1, n−2; α), we can reject H₀: β₁ = 0.

F test for the simple linear regression:
  Hypothesis:     H₀: β₁ = 0  vs.  H₁: β₁ ≠ 0
  Decision rule:  If F₀ = MSR/MSE > F(1, n−2; α), then reject H₀

(In 『eStat』, the p-value for this test is calculated and the decision can be made using this p-value. That is, if the p-value is less than the significance level, the null hypothesis H₀ is rejected.)
Example 12.2.3
Prepare an ANOVA table for the example of the advertising costs and test the hypothesis using the 5% significance level.

Answer
Using the sums of squares calculated in [Example 12.2.2], the ANOVA table is prepared as follows:

  Factor       Sum of squares   Degrees of freedom   Mean squares              F value
  Regression   378.42           1                    MSR = 378.42/1 = 378.42   F₀ = 378.42/2.20 = 172.0
  Error        17.62            10 − 2               MSE = 17.62/8 = 2.20
  Total        396.04           10 − 1

Since the calculated F value of 172.0 is much greater than F(1, 8; 0.05) = 5.32, we reject the null hypothesis H₀: β₁ = 0 at the significance level α = 0.05.

Click the [Correlation and Regression] button in the options window below the graph of <Figure 12.2.1> to show the result of the ANOVA as in <Figure 12.2.3>.

<Figure 12.2.3> Regression analysis of variance using 『eStat』

[Practice 12.2.3]
Using the data in [Practice 12.1.1] for the mid-term and final exam scores, prepare an ANOVA table and test it using the 5% significance level.
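The F calculation of the example is a one-line ratio. The Python sketch below (only a check; the text itself uses 『eStat』) reuses the mean squares from the table of Example 12.2.3; the critical value F(1, 8; 0.05) = 5.32 is taken from an F table.

```python
# ANOVA F test for the advertising example.
MSR = 378.42          # regression mean square from the ANOVA table
MSE = 2.20            # error mean square, the estimate of sigma^2
F0 = MSR / MSE        # F statistic
F_crit = 5.32         # F(1, 8; 0.05) from an F table

print(round(F0, 1), F0 > F_crit)   # 172.0 True -> reject H0: beta1 = 0
```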
12.2.5 Inference for Regression

One assumption on the error term εᵢ in the population regression model is that it follows a normal distribution with mean zero and variance σ². Under this assumption the regression coefficients and other parameters can be estimated and tested. Note that, under the assumption above, the response Yᵢ = β₀ + β₁Xᵢ + εᵢ follows a normal distribution with mean β₀ + β₁Xᵢ and variance σ².
1) Inference for the parameter β₁

The parameter β₁, the slope of the regression line, indicates the existence and extent of a linear relationship between the dependent and the independent variable. The inference for β₁ can be summarized as follows. In particular, the test of the hypothesis H₀: β₁ = 0 is used to decide whether the independent variable describes the dependent variable significantly. The F test for the hypothesis H₀: β₁ = 0 described in the ANOVA of regression is theoretically the same as the t test below. 『eStat』 calculates the p-value under the null hypothesis; if this p-value is less than the significance level, the null hypothesis is rejected and the regression is said to be significant.

Inference for the parameter β₁

  Point estimate:  b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Sxx ,  where Sxx = Σ(xᵢ − x̄)²
                   b₁ ~ N( β₁ , σ²/Sxx )
  Standard error of the estimate b₁:  SE(b₁) = s / √Sxx ,  where s² = SSE/(n − 2)
  Confidence interval for β₁:  b₁ ± t(n−2; α/2) · SE(b₁)
  Testing hypothesis:
    Null hypothesis:  H₀: β₁ = β₁₀
    Test statistic:   t₀ = (b₁ − β₁₀) / SE(b₁)
    Rejection region: if H₁: β₁ > β₁₀ , reject H₀ when t₀ > t(n−2; α)
                      if H₁: β₁ < β₁₀ , reject H₀ when t₀ < −t(n−2; α)
                      if H₁: β₁ ≠ β₁₀ , reject H₀ when |t₀| > t(n−2; α/2)
2) Inference for the parameter β₀

The inference for the parameter β₀, the intercept of the regression line, can be summarized as below. The parameter β₀ is not of much interest in most analyses, because it represents the average value of the response variable when the independent variable is 0.

Inference for the parameter β₀

  Point estimate:  b₀ = ȳ − b₁·x̄ ,  b₀ ~ N( β₀ , σ²·(1/n + x̄²/Sxx) )
  Standard error of the estimate b₀:  SE(b₀) = s·√(1/n + x̄²/Sxx)
  Confidence interval for β₀:  b₀ ± t(n−2; α/2) · SE(b₀)
  Testing hypothesis:
    Null hypothesis:  H₀: β₀ = β₀₀
    Test statistic:   t₀ = (b₀ − β₀₀) / SE(b₀)
    Rejection region: if H₁: β₀ > β₀₀ , reject H₀ when t₀ > t(n−2; α)
                      if H₁: β₀ < β₀₀ , reject H₀ when t₀ < −t(n−2; α)
                      if H₁: β₀ ≠ β₀₀ , reject H₀ when |t₀| > t(n−2; α/2)
3) Inference for the average of Y

At any point X = x*, the dependent variable Y has the average value μ* = β₀ + β₁x*. Estimation of μ* is also considered important, because it means predicting the mean value of Y.

Inference for the average value μ* = β₀ + β₁x*

  Point estimate:  ŷ* = b₀ + b₁x*
  Standard error of the estimate ŷ*:  SE(ŷ*) = s·√( 1/n + (x* − x̄)²/Sxx )
  Confidence interval for μ*:  ŷ* ± t(n−2; α/2) · SE(ŷ*)

The width of the confidence interval of the mean value μ* depends, through the standard error of the estimate, on the given value of x*. As the formula for the standard error shows, this width is narrowest at x* = x̄ and becomes wider the farther x* is from x̄. If we calculate the confidence interval for the mean value of Y at each point x and connect the upper and lower limits, we obtain a confidence band above and below the sample regression line.
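The boxed formulas can be carried out directly in a few lines. The Python sketch below (only a check; the text itself relies on 『eStat』 output) computes the standard error, confidence interval and t statistic for the slope of the advertising example, plus a confidence interval for the mean response at x = 8; the critical value t(8; 0.025) = 2.306 is taken from a t table.

```python
from math import sqrt

# Inference for the slope and the mean response in the advertising example.
x = [4, 6, 6, 8, 8, 9, 9, 10, 12, 12]
y = [39, 42, 45, 47, 50, 50, 52, 55, 57, 60]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
SSE = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(SSE / (n - 2))                     # residual standard error

t_crit = 2.306                              # t(8; 0.025) from a t table

se_b1 = s / sqrt(Sxx)                       # SE(b1) = s / sqrt(Sxx)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
t0 = b1 / se_b1                             # t statistic for H0: beta1 = 0

xstar = 8                                   # mean response at x = 8
mu_hat = b0 + b1 * xstar
se_mu = s * sqrt(1 / n + (xstar - xbar) ** 2 / Sxx)
ci_mu = (mu_hat - t_crit * se_mu, mu_hat + t_crit * se_mu)

print(round(se_b1, 4), round(t0, 2))        # 0.1908 13.12
print(tuple(round(v, 3) for v in ci_mu))    # (47.603, 49.794)
```

Note that t₀² ≈ 172 here, matching the F value of the regression ANOVA, as the theory says it should.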
Example 12.2.4
Let us make inferences about each parameter using the results of the regression analysis of the previous data for the sales amount and advertising costs. Use 『eStat』 to check the test results and the confidence band.

Answer
1) Inference for β₁
The point estimate of β₁ is b₁ = 2.5033 and the standard error of b₁ is as follows:

  SE(b₁) = s / √Sxx = 1.4832 / √60.4 = 0.1908

Hence, the 95% confidence interval of β₁, using t(8; 0.025) = 2.306, is as follows:

  2.5033 ± 2.306 × 0.1908 ,  i.e., the interval (2.063, 2.943)

The test statistic for the hypothesis H₀: β₁ = 0, H₁: β₁ ≠ 0 is as follows:

  t₀ = 2.5033 / 0.1908 = 13.12

Since |t₀| > t(8; 0.025) = 2.306, the null hypothesis H₀: β₁ = 0 is rejected at the significance level α = 0.05. (Note that t₀² = 13.12² ≈ 172, the F value obtained in the ANOVA.) The result of this two-sided test can also be obtained from the confidence interval: since the 95% confidence interval (2.063, 2.943) does not include 0, the null hypothesis H₀: β₁ = 0 can be rejected.

2) Inference for β₀
The point estimate of β₀ is b₀ = 28.672 and its standard error is as follows:

  SE(b₀) = s·√(1/n + x̄²/Sxx) = 1.4832 × √(1/10 + 8.4²/60.4) = 1.670

Since the value of the t statistic is t₀ = 28.672 / 1.670 = 17.17 and |t₀| > t(8; 0.025) = 2.306, the null hypothesis H₀: β₀ = 0 is also rejected at the significance level α = 0.05.

3) Inference for the average value of Y
In 『eStat』, the standard error of ŷ, the estimate of μ = β₀ + β₁x, is calculated at each point x. For example, the point estimate of the mean at x = 8 is ŷ = 28.672 + 2.5033 × 8 = 48.699 and its standard error is 0.475. Hence, the 95% confidence interval of the mean at x = 8 is as follows:

  48.699 ± 2.306 × 0.475 ,  i.e., the interval (47.603, 49.794)

The confidence interval at any other value of x can be calculated in the same way; as we discussed, the confidence interval becomes wider as x moves farther from x̄.

If you select the [Confidence Band] button from the options below the regression graph of <Figure 12.2.1>, you can see the confidence band on the scatter plot together with the regression line as in <Figure 12.2.4>. If you click the [Correlation and Regression] button, the inference result for each parameter appears in the Log Area as shown in <Figure 12.2.5>.
Example 12.2.4 Answer (continued)

<Figure 12.2.4> Confidence band using 『eStat』
<Figure 12.2.5> Testing hypothesis of regression coefficients

[Practice 12.2.4]
Using the data in [Practice 12.1.1] for the mid-term and final exam scores, make inferences about each parameter using 『eStat』 and draw the confidence band.
12.2.6 Residual Analysis

The inferences for the regression parameters in the previous section are all based on some assumptions about the error term ε included in the population regression model. Therefore, the satisfaction of these assumptions is an important precondition for making valid inferences. However, because the error term is unobservable, the residuals, as estimates of the error terms, are used to investigate the validity of these assumptions, which is referred to as residual analysis. First, let us look at the assumptions of the regression model.

Assumptions of the regression model
  1. The assumed model Y = β₀ + β₁X + ε is correct.
  2. The expectation of the error terms εᵢ is 0.
  3. (Homoscedasticity) The variance of εᵢ is σ², the same for all X.
  4. (Independence) The error terms εᵢ are independent of each other.
  5. (Normality) The error terms εᵢ are normally distributed.
Review the references for the meaning of these assumptions. The validity of these assumptions is generally investigated using scatter plots of the residuals. The following scatter plots are primarily used for each assumption:

1) Residuals versus predicted values (i.e., eᵢ vs ŷᵢ)
2) Residuals versus independent variables (i.e., eᵢ vs xᵢ)
3) Residuals versus observation order (i.e., eᵢ vs i)
In the above scatter plots, if the residuals show no particular trend around zero and appear random, then each assumption is considered valid.

The assumption that the error term ε follows a normal distribution can be investigated, when there is a large amount of data, by drawing a histogram of the residuals to see whether its shape is similar to that of the normal distribution. Another method is to use the quantile-quantile (Q-Q) scatter plot of the residuals. In general, if the Q-Q scatter plot of the residuals forms a straight line, the residuals can be considered normally distributed.

Since residuals also depend on the unit of the dependent variable, standardized values of the residuals, called standardized residuals, are used for a consistent analysis of the residuals. Both the scatter plots of the residuals described above and the Q-Q scatter plot are created using the standardized residuals. In particular, if a standardized residual falls outside a cutoff band around zero (±2.5 is a commonly used choice), an anomaly or an outlier can be suspected.

Example 12.2.5
Draw a scatter plot of the residuals and a Q-Q scatter plot for the advertising cost example.

Answer
When you click the [Residual Plot] button from the options below the regression graph of <Figure 12.2.1>, the scatter plot of the standardized residuals versus the predicted values appears as shown in <Figure 12.2.6>. If you click the [Residual Q-Q Plot] button, <Figure 12.2.7> appears. Although the scatter plot of the residuals shows no significant pattern, the Q-Q plot deviates considerably from a straight line, so the normality of the error term is somewhat questionable. In such cases, the values of the response variable need to be re-analyzed after taking a logarithmic or square-root transformation.
<Figure 12.2.6> Residual plot
Example 12.2.5 Answer (continued)

<Figure 12.2.7> Residual Q-Q Plot
[Practice 12.2.5]
Using the data in [Practice 12.1.1] for the mid-term and final exam scores, draw a scatter plot of the residuals and a Q-Q scatter plot.
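The standardized residuals can also be computed directly. The Python sketch below (not part of the text, which uses eStat's buttons) standardizes the residuals of the advertising example by dividing by the residual standard error s, a simple standardization, and uses ±2.5 as an assumed outlier cutoff; some packages instead divide each residual by s·√(1 − hᵢ) using leverages.

```python
from math import sqrt

# Standardized residuals for the advertising data, flagging |r| > 2.5.
x = [4, 6, 6, 8, 8, 9, 9, 10, 12, 12]
y = [39, 42, 45, 47, 50, 50, 52, 55, 57, 60]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # raw residuals
s = sqrt(sum(e * e for e in resid) / (n - 2))           # residual standard error

std_resid = [e / s for e in resid]                      # simple standardization
outliers = [i for i, r in enumerate(std_resid) if abs(r) > 2.5]
print(outliers)   # [] -> no suspected outliers in this data
```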
In 『eStatU』, it is possible to experiment with how much a regression line is affected by an extreme point (<Figure 12.2.8>). A point can be created by clicking the mouse on the screen in the link below. If you create multiple points, you can see how much the regression line changes each time. You can also observe how sensitive the correlation coefficient and the coefficient of determination are as you move a point with the mouse.

<Figure 12.2.8> Simulation experiment of regression analysis at 『eStatU』
12.3 Multiple Linear Regression Analysis

In actual applications of regression analysis, multiple regression models with two or more independent variables are used more frequently than the simple linear regression with one independent variable. This is because it is rare for a dependent variable to be sufficiently explained by a single independent variable; in most cases, a dependent variable is related to several independent variables. For example, sales may be significantly affected by advertising costs, as in the simple linear regression example, but also by product quality ratings and by the number and size of the stores where the product is sold. The statistical model used to identify the relationship between one dependent variable and several independent variables is called multiple linear regression analysis. The simple and multiple linear regression analyses differ only in the number of independent variables involved; there is no difference in the method of analysis.
12.3.1 Multiple Linear Regression Model

In the multiple linear regression model, it is assumed that the dependent variable Y and the k independent variables have the following relation:

  Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₖXₖ + ε

This means that the dependent variable is represented by a linear function of the independent variables plus a random error term, as in the simple linear regression model. The assumptions on the error terms are the same as in the simple linear regression. In the above equation, β₀ is the Y-intercept and βⱼ is the slope for Xⱼ, which indicates the effect of Xⱼ on Y when the other independent variables are held fixed.
Example 12.3.1
When logging trees in forest areas, it is necessary to investigate the amount of timber in those areas. Since it is difficult to measure the volume of a tree directly, we can think of ways to estimate the volume using the diameter and height of a tree, which are relatively easy to measure. The data in Table 12.3.1 are the measured diameter, height and volume of a sample of 15 trees in a region. (The diameter was measured at a point 1.5 meters above the ground.) Draw a scatter plot matrix of this data and consider a regression model for this problem.

Table 12.3.1 Diameter, height and volume of trees

  Diameter(cm)   Height(m)   Volume(m³)
     21.0         21.33        0.291
     21.8         19.81        0.291
     22.3         19.20        0.288
     26.6         21.94        0.464
     27.1         24.68        0.532
     27.4         25.29        0.557
     27.9         20.11        0.441
     27.9         22.86        0.515
     29.7         21.03        0.603
     32.7         22.55        0.628
     32.7         25.90        0.956
     33.7         26.21        0.775
     34.7         21.64        0.727
     35.0         19.50        0.704
     40.6         21.94        1.084

  ⇨ eBook ⇨ EX120301_TreeVolume.csv
Example 12.3.1 Answer
Load the data saved at the following location of 『eStat』: ⇨ eBook ⇨ EX120301_TreeVolume.csv

In the variable selection box which appears after selecting the regression icon, select the volume as 'Y variable' and the diameter and height as 'by X variable' to display a scatter plot matrix as shown in <Figure 12.3.1>. It can be observed that there is a high correlation between volume and diameter, and that volume and height, and diameter and height, are also somewhat related.

<Figure 12.3.1> Scatterplot matrix
<Figure 12.3.2> Correlation matrix

Since the volume is to be estimated using the diameter and height of the tree, the volume is the dependent variable Y, the diameter and height are the independent variables X₁ and X₂ respectively, and the following regression model can be considered:

  Yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + εᵢ ,  i = 1, 2, ⋯, 15

[Practice 12.3.1]
A health scientist randomly selected 20 people to determine the effect of smoking and obesity on physical strength and examined the average daily smoking rate (X₁, number/day), the ratio of weight by height (X₂, kg/m), and the time to continue to exercise with a certain intensity (Y, in hours). Draw a scatter plot matrix of this data and consider a regression model for this problem.

  Smoking rate X₁   Weight/height ratio X₂   Exercise time Y
       24                   53                     11
        0                   47                     22
       25                   50                      7
        0                   52                     26
        5                   40                     22
       18                   44                     15
       20                   46                      9
        0                   45                     23
       15                   56                     15
        6                   40                     24
        0                   45                     27
       15                   47                     14
       18                   41                     13
        5                   38                     21
       10                   51                     20
        0                   43                     24
       12                   38                     15
        0                   36                     24
       15                   43                     12
       12                   45                     16

  ⇨ eBook ⇨ PR120301_SmokingObesityExercis.csv
In general, matrices and vectors are used to simplify the expression of the formulas and the calculations. For example, if there are k independent variables, the population multiple regression model at the observations i = 1, 2, ⋯, n can be presented compactly as follows:

  Y = Xβ + ε

Here Y, X, β and ε are defined as follows:

  Y = (Y₁, Y₂, ⋯, Yₙ)′ : the n×1 vector of observations of the dependent variable
  X : the n×(k+1) matrix whose i-th row is (1, xᵢ₁, xᵢ₂, ⋯, xᵢₖ)
  β = (β₀, β₁, ⋯, βₖ)′ : the (k+1)×1 vector of regression coefficients
  ε = (ε₁, ε₂, ⋯, εₙ)′ : the n×1 vector of errors
12.3.2 Estimation of Regression Coefficients

In a multiple regression analysis, it is necessary to estimate the k + 1 regression coefficients β₀, β₁, ⋯, βₖ using samples. In this case the least squares method, which minimizes the sum of squared errors, is also used. We find β which minimizes the following sum of squared errors:

  Σᵢ εᵢ² = ε′ε = (Y − Xβ)′(Y − Xβ)

As in the simple linear regression, the error sum of squares above is differentiated with respect to β and set equal to zero, which gives the normal equation. The solution of this equation, denoted b and called the least squares estimate of β, satisfies the following normal equation:

  X′X b = X′Y

Therefore, if the inverse matrix of X′X exists, the least squares estimator b of β is as follows:

  b = (X′X)⁻¹ X′Y

(Note: statistical packages use a different formula, because the above formula can cause large computational errors.)

If the estimated regression coefficients are b = (b₀, b₁, ⋯, bₖ)′, the estimate of the response variable yᵢ is as follows:

  ŷᵢ = b₀ + b₁xᵢ₁ + b₂xᵢ₂ + ⋯ + bₖxᵢₖ

The residuals are as follows:

  eᵢ = yᵢ − ŷᵢ = yᵢ − (b₀ + b₁xᵢ₁ + ⋯ + bₖxᵢₖ)

By using vector notation, the residual vector e can be defined as follows:

  e = Y − Xb
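The formula b = (X′X)⁻¹X′Y can be sketched in plain Python by solving the normal equation X′X b = X′Y directly (the text computes this in eStat). The data below are made-up values generated exactly as y = 1 + 2x₁ + 3x₂, so the recovered coefficients should be 1, 2 and 3; real packages solve the system with more numerically stable decompositions, as the note above says.

```python
# Least squares for multiple regression via the normal equation X'X b = X'Y.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    # (rows of A) x (columns of B) dot products
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, rhs):
    """Solve A v = rhs by Gauss-Jordan elimination (A square, invertible)."""
    m = [row[:] + [ri] for row, ri in zip(A, rhs)]   # augmented matrix
    k = len(m)
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(m[r][i]))   # partial pivoting
        m[i], m[p] = m[p], m[i]
        m[i] = [v / m[i][i] for v in m[i]]                 # normalize pivot row
        for r in range(k):
            if r != i:
                m[r] = [vr - m[r][i] * vi for vr, vi in zip(m[r], m[i])]
    return [row[-1] for row in m]

# Made-up data with an exact linear relation y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * c for a, c in zip(x1, x2)]

X = [[1, a, c] for a, c in zip(x1, x2)]      # design matrix with intercept column
Xt = transpose(X)
XtX = matmul(Xt, X)                          # X'X
XtY = [sum(r * yi for r, yi in zip(row, y)) for row in Xt]   # X'Y
coef = solve(XtX, XtY)                       # least squares estimates b
print([round(v, 6) for v in coef])           # [1.0, 2.0, 3.0]
```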
12.3.3 Goodness of Fit for Regression and Analysis of Variance

In order to investigate the validity of the estimated regression equation in the multiple regression analysis, the residual standard error and the coefficient of determination are also used. In the simple linear regression analysis, the computational formulas for these measures were given as functions of the residuals, i.e., of the observed values of Y and their predicted values, so they do not depend on the number of independent variables. Therefore, the same formulas can be used in the multiple linear regression; there is only a difference in the degrees of freedom that each sum of squares has.

In the multiple linear regression analysis, the residual standard error is defined as follows:

  s = √( Σ(yᵢ − ŷᵢ)² / (n − k − 1) )

The difference from the simple linear regression is that the degrees of freedom for the residuals are n − k − 1, because the k + 1 regression coefficients must be estimated in order to calculate the residuals. As in the simple linear regression, s² is the same statistic as the residual mean square (MSE). The coefficient of determination is given by R² = SSR/SST and its interpretation is the same as in the simple linear regression.

The sums of squares are defined by the same formulas as in the simple linear regression and can be partitioned with the corresponding degrees of freedom as follows; the analysis of variance table is shown in Table 12.3.2.

  Sum of squares:      SST = SSR + SSE
  Degrees of freedom:  (n − 1) = k + (n − k − 1)
Table 12.3.2 Analysis of variance table for multiple linear regression analysis

    Source       Sum of Squares   Degrees of Freedom   Mean Squares            F value
    Regression   SSR              k                    MSR = SSR/k             F₀ = MSR/MSE
    Error        SSE              n − k − 1            MSE = SSE/(n − k − 1)
    Total        SST              n − 1
The F value in the above ANOVA table is used to test the significance of the regression equation, where the null hypothesis is that no independent variable is linearly related to the dependent variable:

    H₀ : β₁ = β₂ = ⋯ = βₖ = 0
    H₁ : At least one of the k coefficients βᵢ is not equal to 0

Since F₀ follows the F distribution with k and n − k − 1 degrees of freedom under the null hypothesis, we can reject H₀ at the significance level α if F₀ > F(k, n − k − 1; α). Each βᵢ can also be tested individually, as described in the following sections. (『eStat』 also calculates the p-value for this test, so the p-value can be used directly: if the p-value is less than the significance level, the null hypothesis is rejected.)
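A minimal sketch of this ANOVA decomposition and overall F test, assuming NumPy and SciPy are available (the data are simulated purely for illustration):

```python
import numpy as np
from scipy import stats

# Simulated data (for illustration): n observations, k = 2 independent variables
rng = np.random.default_rng(1)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b

SST = np.sum((y - y.mean()) ** 2)   # total sum of squares,      df = n - 1
SSE = np.sum((y - y_hat) ** 2)      # error sum of squares,      df = n - k - 1
SSR = SST - SSE                     # regression sum of squares, df = k

MSR = SSR / k
MSE = SSE / (n - k - 1)
F0 = MSR / MSE
p_value = stats.f.sf(F0, k, n - k - 1)   # P(F > F0) under H0
R2 = SSR / SST                            # coefficient of determination
print(F0, p_value, R2)
```

With a strong simulated signal, the p-value is far below 0.05, so H₀: β₁ = β₂ = 0 would be rejected, mirroring the decision rule above.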
12.3.4 Inference for Multiple Linear Regression

Parameters that are of interest in multiple linear regression, as in the simple linear regression, are the expected value of Y and each regression coefficient β₀, β₁, …, βₖ. Inference on these parameters is made possible by obtaining the probability distribution of the point estimates bᵢ. Under the assumption that the error terms εᵢ are independent and all have the distribution N(0, σ²), it can be shown that the distribution of bᵢ is as follows:

    bᵢ ~ N( βᵢ , cᵢᵢ σ² ) ,   i = 0, 1, …, k

The above cᵢᵢ is the iᵗʰ diagonal element of the (k+1) × (k+1) matrix (X′X)⁻¹. In addition, using the estimate s instead of the parameter σ, you can make inferences about each regression coefficient using the t distribution.
166
/ Chapter 12 Correlation and Regression Analysis

Inference on regression coefficient βᵢ

    Point estimate:                    bᵢ
    Standard error of point estimate:  SE(bᵢ) = s · √cᵢᵢ
    Confidence interval of βᵢ:         bᵢ ± t(n − k − 1; α/2) · SE(bᵢ)

    Testing hypothesis:
    Null hypothesis:  H₀ : βᵢ = βᵢ₀
    Test statistic:   t₀ = ( bᵢ − βᵢ₀ ) / SE(bᵢ)

    Rejection region:
    if H₁ : βᵢ > βᵢ₀ ,   t₀ > t(n − k − 1; α)
    if H₁ : βᵢ < βᵢ₀ ,   t₀ < −t(n − k − 1; α)
    if H₁ : βᵢ ≠ βᵢ₀ ,   |t₀| > t(n − k − 1; α/2)

(Since 『eStat』 calculates the p-value under the null hypothesis H₀ : βᵢ = 0, the p-value can be used for testing the hypothesis.)
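The standard errors, t statistics, p-values and confidence intervals for the regression coefficients can be sketched in code as follows (simulated data again; this is an illustration, not 『eStat』 itself):

```python
import numpy as np
from scipy import stats

# Simulated data (for illustration): n = 25 observations, k = 2 variables
rng = np.random.default_rng(7)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=0.8, size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
s2 = resid @ resid / (n - k - 1)            # MSE, the estimate of sigma^2

C = np.linalg.inv(X.T @ X)                  # c_ii are its diagonal elements
se = np.sqrt(s2 * np.diag(C))               # SE(b_i) = s * sqrt(c_ii)

t0 = b / se                                 # test statistic for H0: beta_i = 0
p = 2 * stats.t.sf(np.abs(t0), n - k - 1)   # two-sided p-values

t_crit = stats.t.ppf(0.975, n - k - 1)      # 95% confidence intervals
ci = np.column_stack([b - t_crit * se, b + t_crit * se])
print(np.column_stack([b, se, t0, p]))
print(ci)
```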
Residual analysis of the multiple linear regression is the same as in the simple linear regression.

Example 12.3.2
For the tree data of [Example 12.3.1], obtain the least squares estimate of each coefficient of the proposed regression equation using 『eStat』 and apply the analysis of variance, the test of goodness of fit, and the tests for the regression coefficients.

Answer
In the options window below the scatter plot matrix in <Figure 12.3.1>, click the [Regression Analysis] button. Then you can find the estimated regression line and the ANOVA table, as shown in <Figure 12.3.3>, in the Log Area. The estimated regression equation is as follows:

    Ŷ = b₀ + b₁X₁ + b₂X₂    (estimated coefficients as shown in <Figure 12.3.3>)

In this equation, the coefficient 0.037 represents the increase in the volume of the tree when the diameter increases by 1 (cm).

The p-value calculated from the ANOVA table in <Figure 12.3.3> at the F value of 73.12 is less than 0.0001, so you can reject the null hypothesis H₀ : β₁ = β₂ = 0 at the significance level of 0.05. The coefficient of determination, R² = 0.924, implies that 92.4% of the total variance of the dependent variable is explained by the regression line. Based on these two results, we can conclude that the diameter and height of the tree are quite useful in estimating the volume.
Example 12.3.2 (continued)

<Figure 12.3.3> Result of Multiple Linear Regression

Using the t value t(n − k − 1; 0.025) and the standard errors SE(b₁) and SE(b₂) from the result in <Figure 12.3.3>, the 95% confidence intervals for each regression coefficient can be calculated as follows. Any difference between this calculation and <Figure 12.3.3> is due to rounding error below the decimal point.

    95% confidence interval for β₁ :  b₁ ± t(n − k − 1; 0.025) · SE(b₁)
    95% confidence interval for β₂ :  b₂ ± t(n − k − 1; 0.025) · SE(b₂)

In the hypothesis tests of H₀ : βᵢ = 0 versus H₁ : βᵢ ≠ 0, i = 1, 2, each p-value is less than the significance level of 0.05, so you can reject each null hypothesis.

The scatter plot of the standardized residuals is shown in <Figure 12.3.4> and the Q-Q plot is shown in <Figure 12.3.5>. There is no particular pattern in the scatter plot of the standardized residuals, but there is one outlier, and the Q-Q plot shows that the assumption of normality is reasonably satisfied.
<Figure 12.3.4> Residual analysis of multiple linear
regression
Example 12.3.2 (continued)

<Figure 12.3.5> Q-Q plot of multiple linear regression

[Practice 12.3.2]
Apply a multiple regression model using 『eStat』 to the regression model of [Practice 12.3.1]. Obtain the least squares estimate of each coefficient of the proposed regression equation and apply the analysis of variance, the test of goodness of fit, and the tests for the regression coefficients.
Chapter 12 Exercise / 169

Exercise
12.1 A survey was conducted on the level of education (X, the period of education after graduating from high school, unit: year) and the annual income (Y, unit: one thousand USD) of 10 businessmen.

    id                    1   2   3   4   5   6   7   8   9  10
    Education Period (X)  4   2   0   3   4   4   5   5   2   2
    Annual Income (Y)    50  37  35  45  57  49  60  47  39  50

1) Draw a scatter plot of the data and interpret it.
2) Calculate the sample correlation coefficient.
3) Apply regression analysis with annual income as the dependent variable and the level of education as the independent variable.
12.2 The following data show the studying time for a week (X) and the grade (Y) of six students.

    Studying time X (hours)  15   28   13   20    4   10
    Grade Y                 2.0  2.7  1.3  1.9  0.9  1.7

1) Find the regression line and a 95% confidence interval for β₁ (the additional grade score expected when a student studies one more hour a week).
2) Calculate a 99% confidence interval for the average score of a student who studies an average of 12 hours a week.
3) Test the hypothesis H₀ : β₁ = 0 versus H₁ : β₁ ≠ 0 (significance level = 0.01).
12.3 A professor of statistics argues that a student's final exam score can be predicted from his/her mid-term score. Five students were randomly selected and their mid-term and final exam scores are as follows:

    id          1   2   3   4   5
    Mid-term X  92  65  75  83  95
    Final Y     87  71  75  84  93

1) Draw a scatter plot of these data with the mid-term score on the X axis and the final score on the Y axis. What do you think is the relationship between mid-term and final scores?
2) Find the regression line and analyse the result.
12.4 An economist argues that there is a clear relationship between coffee and sugar prices: 'When people buy coffee, they will also buy sugar. Isn't it natural that the higher the demand, the higher the price?' We collected the following sample data to test his theory.

    Year          1985   1986   1987   1988   1989   1990   1991
    Coffee Price  0.68   1.21   1.92   1.81   1.55   1.87   1.56
    Sugar Price   0.245  0.126  0.092  0.086  0.101  0.223  0.212

1) Prepare a scatter plot with the coffee price on the X axis and the sugar price on the Y axis. Are these data consistent with the economist's theory?
2) Test the economist's theory by using regression analysis.
12.5 A rope manufacturer thinks that the strength of a rope is proportional to its nylon content. Ten ropes are randomly selected and their data are as follows:

    % Nylon (X)          0   10   20   20   30   30   40   50   60   70
    Strength (psi) (Y)  260  360  490  510  600  600  680  820  910  990

1) Draw a scatter plot with % Nylon on the X axis and strength on the Y axis. Find a regression line using the least squares method. Draw the estimated regression line on the scatter plot.
2) Estimate the strength of a rope with 33% nylon.
3) Estimate the strength of a rope with 66% nylon.
4) The strengths of the two ropes with 20% nylon in the data are different. How can you explain this variation in a regression model?
5) Estimate the strength of a rope with 0% nylon. Why is this estimate different from the observed value of 260?
6) Obtain a 95% confidence interval for the strength of the 0% nylon rope.
7) If the observed strength of the 0% nylon rope were outside the confidence interval in 6), how would you interpret this result?
12.6 A health scientist randomly selected 20 people to determine the effects of smoking and obesity on physical strength, and examined the average daily smoking rate (X₁, number/day), the ratio of weight to height (X₂, kg/m), and the time for which exercise of a certain intensity could be continued (Y, in hours). Test whether smoking and obesity affect the exercising time at a certain intensity. Apply a multiple regression model by using 『eStat』.

    Smoking rate X₁   Ratio of weight by height X₂   Time to continue exercise Y
          24                     53                             11
           0                     47                             22
          25                     50                              7
           0                     52                             26
           5                     40                             22
          18                     44                             15
          20                     46                              9
           0                     45                             23
          15                     56                             15
           6                     40                             24
           0                     45                             27
          15                     47                             14
          18                     41                             13
           5                     38                             21
          10                     51                             20
           0                     43                             24
          12                     38                             15
           0                     36                             24
          15                     43                             12
          12                     45                             16
12.7 The price of old watches in an antique auction is said to be determined by the age of the watch and the number of bidders. In order to see if this is true, 32 recently auctioned alarm clocks were examined for the elapsed period after manufacture (X₁, in years), the number of bidders (X₂) and the auction price (Y, in 1,000 USD) as follows. Test the hypothesis that the auction price of an alarm clock increases with the number of bidders, using the multiple linear regression model. (significance level: 0.05)

    Elapsed Period X₁   Number of Bidders X₂   Auction Price Y
         127                  13                   1235
         115                  12                   1080
         127                   7                    845
         150                   9                   1522
         156                   6                   1047
         182                  11                   1979
         156                  12                   1822
         132                  10                   1253
         137                   9                   1297
         113                   9                    946
         137                  15                   1713
         117                  11                   1024
         137                   8                   1147
         153                   6                   1092
         117                  13                   1152
         126                  10                   1336
         170                  14                   2131
         182                   8                   1550
         162                  11                   1884
         184                  10                   2041
         143                   6                    854
         159                   9                   1483
         108                  14                   1055
         175                   8                   1545
         108                   6                    729
         179                   9                   1792
         111                  15                   1175
         187                   8                   1593
         111                   7                    785
         115                   7                    744
         194                   5                   1356
         168                   7                   1262
Chapter 12 Multiple Choice Exercise / 173

Multiple Choice Exercise

12.1 The variables X and Y have a strong relationship given by the quadratic equation Y = X², as shown in the following table. What is their sample correlation coefficient?

    X  …  -3  -2  -1   0   1   2   3  …
    Y  …   9   4   1   0   1   4   9  …

    ① 1    ② 0    ③ -1    ④ 0.5
12.2 Which is a wrong description of the correlation coefficient r?
    ① 0 ≤ r ≤ 1
    ② if r = -1, perfect negative correlation
    ③ if r = 0, no linear correlation
    ④ if r < 0, negative correlation
12.3 Which is a right description of the correlation coefficient r?
    ① if r > 0, there is a strong positive correlation between X and Y.
    ② if |r| is close to 0, there is a weak linear correlation between X and Y.
    ③ if r is negative, then Y increases when X increases.
    ④ if r is near -1, there is a weak linear correlation between X and Y.
12.4 If the sample correlation coefficient between Xᵢ and Yᵢ, i = 1, 2, …, n, is r, what is the sample correlation coefficient between 2X + 3 and 5Y + 2?
    ① r    ② 2r    ③ 5r + 3    ④ 10r + 2
12.5 If the sample correlation coefficient between X and Y is r, what is the sample correlation coefficient between X and 3Y + 1?
    ① r    ② 2r    ③ 3r    ④ 3r + 1
12.6 When the points on a scatter plot do not tend to be linear, what is the sample correlation coefficient r close to?
    ① r ≥ 0    ② r ≤ 0    ③ |r| is close to 1    ④ |r| is close to 0
12.7 Find the sample correlation coefficient between X and Y of the following data.

    X  10  20  30  40
    Y   2   4   6   8

    ① 1    ② 0.3    ③ 0.4    ④ 0.5
12.8 If the correlation coefficient of two variables X and Y is 0, which description is right?
    ① There is no linear relationship between the two variables X and Y.
    ② There is a linear relationship between the two variables X and Y.
    ③ The two variables X and Y have a strong relationship.
    ④ The two variables X and Y have a strong linear relationship.
12.9 Which one of the following descriptions of the sample correlation coefficient r is not right?
    ① r is a random variable.
    ② -1 ≤ r ≤ 1
    ③ r is a measure of the linear relationship between two variables.
    ④ The distribution of r is a normal distribution.
12.10 Find the sample correlation coefficient between X and Y of the following data.

    X  1  2  3  4  5
    Y  5  4  3  2  1

    ① -1    ② 1    ③ 0    ④ 0.5
12.11 Find the sample correlation coefficient r between X and Y of the following data.

    X   1  2  3  4  5  6
    Y  -1  1  3  5  7  9

    ① -0.5    ② 0    ③ 0.5    ④ 1
12.12 If X and Y are independent, what is the sample correlation coefficient r?
    ① 1    ② -1    ③ 0    ④ 0.5
12.13 Which one of the following is right as the range of the sample correlation coefficient r?
    ① 0 ≤ r ≤ 1    ② -2 ≤ r ≤ 2    ③ -1 ≤ r ≤ 1    ④ -∞ < r < ∞
12.14 Which one of the following is right as a description of the sample correlation coefficient r between X and Y?
    ① if r = -1, the value of Y is directly proportional to the value of X.
    ② if r = 1, the value of Y is directly proportional to the value of X.
    ③ if r = 0, the value of Y is inversely proportional to the value of X.
    ④ if r = -1, the value of Y is not related to the value of X.
12.15 Which one of the following is not right as a description of the sample correlation coefficient r between X and Y?
    ① -1 ≤ r ≤ 1
    ② The distribution of r is a normal distribution.
    ③ r is a random variable.
    ④ The formula to calculate r is

        r = Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) / √( Σᵢ (Xᵢ − X̄)² · Σᵢ (Yᵢ − Ȳ)² )
12.16 If two variables X and Y have a strong quadratic relation, what is the sample correlation coefficient r?
    ① r ≒ 1    ② r ≒ -1    ③ r ≒ 0    ④ It cannot be known without further information on r.
12.17 Which one of the following pairs has a positive correlation?
    ① height of a mountain and air pressure
    ② weight and height
    ③ monthly income and Engel's coefficient
    ④ amount of production and price
12.18 If all points lie on a straight line in a scatter plot, what is the characteristic of the correlation coefficient?
    ① perfect correlation    ② strong correlation    ③ weak correlation    ④ no correlation
12.19 If the sample correlation coefficient r is negative, what is the characteristic of the correlation?
    ① inverse correlation    ② positive correlation    ③ weak correlation    ④ usual correlation
12.20 Find the sample correlation coefficient between X and Y of the following data.

    X  1  2  3  4
    Y  1  4  3  6

    ① 0.29    ② 0.53    ③ 0.87    ④ 0.98
12.21 Find the sample covariance between X and Y of the following data.

    X  1  2  3  4
    Y  5  5  5  5

    ① 1    ② 0    ③ 0.5    ④ -1
12.22 Find the sample correlation coefficient between X and Y of the following data.

    X   1   2   3   4  5
    Y  17  15  13  11  9

    ① 0    ② 1    ③ -0.5    ④ -1
12.23 Find the sample covariance between X and Y of the following data.

    X  1  2   3   4   5
    Y  6  8  10  12  14

    ① 3    ② 4    ③ 10    ④ 20
12.24 Find the regression line between X and Y using the following data.

    X  1  2  3   4   5
    Y  1  4  7  10  13

    ① ŷ = -2 + 3x    ② ŷ = 2 + 3x    ③ ŷ = -2 - 3x    ④ ŷ = 3 - 2x
12.25 If the standard deviations of the X and Y variables are 4.06 and 2.65 respectively, and the covariance is 10.50, what is the sample correlation coefficient r?
    ① 10.759    ② 0.532    ③ 1.025    ④ 0.976
12.26 If we know the sample correlation coefficient r and the standard deviations of X and Y, s_X and s_Y respectively, what is the regression line equation?
    ① ŷ = ȳ + r · (s_Y / s_X)(x − x̄)
    ② ŷ = ȳ + r · (s_X / s_Y)(x − x̄)
    ③ ŷ = x̄ + r · (s_Y / s_X)(x − ȳ)
    ④ ŷ = ȳ − r · (s_Y / s_X)(x − x̄)
12.27 If the sample correlation coefficient of two random variables X and Y is r, the sample means are x̄ and ȳ, and the sample standard deviations are s_X and s_Y, what is the regression line of Y on X?
    ① ŷ − ȳ = r · (s_Y / s_X)(x − x̄)
    ② ŷ − ȳ = r · (s_X / s_Y)(x − x̄)
    ③ ŷ − x̄ = r · (s_Y / s_X)(x − ȳ)
    ④ ŷ − ȳ = (1/r) · (s_Y / s_X)(x − x̄)
12.28 Find the regression coefficient b of the regression line ŷ = a + bx using the following data.

                                 X     Y
    sample mean                 40    30
    sample standard deviation    4     3
    correlation coefficient       0.75

    ① 0.56    ② 0.07    ③ 1.00    ④ 1.53
12.29 Which one of the following statements is true about the two regression lines of variables X and Y, the regression line of Y on X and the regression line of X on Y?
    ① The two regression lines always coincide.
    ② The two regression lines are always parallel.
    ③ The two regression lines meet at the point (x̄, ȳ) and in general do not coincide.
    ④ The two regression lines are always perpendicular.
12.30 Find the regression coefficient b of the regression line ŷ = a + bx using the following data.

                                 X     Y
    sample mean                 12    13
    sample standard deviation    3     4
    correlation coefficient       r = 0.6

    ① 0.6    ② 0.7    ③ 0.8    ④ 0.9
12.31 Which one is a wrong explanation of the relation between the regression coefficient b and the sample correlation coefficient r?
    ① If b = 0, then r = 0 (no correlation)
    ② If b > 0, then r > 0 (positive correlation)
    ③ If b = 1, then r = 1 (perfect correlation)
    ④ If b < 0, then r < 0 (negative correlation)
12.32 If a regression line is ŷ = a + 0.4x and the sample standard deviations of X and Y are 4 and 2 respectively, what is the value of the sample correlation coefficient r?
    ① 1    ② 0.8    ③ 0.5    ④ 0.4
(Answers)
12.1 ②, 12.2 ①, 12.3 ②, 12.4 ①, 12.5 ①, 12.6 ④, 12.7 ①, 12.8 ①, 12.9 ④, 12.10 ①,
12.11 ④, 12.12 ③, 12.13 ③, 12.14 ②, 12.15 ②, 12.16 ③, 12.17 ②, 12.18 ①, 12.19 ①, 12.20 ③,
12.21 ②, 12.22 ④, 12.23 ②, 12.24 ①, 12.25 ④, 12.26 ①, 12.27 ①, 12.28 ①, 12.29 ③, 12.30 ③,
12.31 ③, 12.32 ②
13
Time Series Analysis

SECTIONS
13.1 What is Time Series Analysis?
13.2 Smoothing of Time Series
13.3 Transformation of Time Series
13.4 Regression Model and Forecasting
13.5 Exponential Smoothing Model and Forecasting
13.6 Seasonal Model and Forecasting

CHAPTER OBJECTIVES
In this chapter, we study data observed over time, called time series, and find out about:
- What time series analysis is and what types of time series models there are.
- How to smooth a time series.
- How to transform a time series.
- Prediction methods using a regression model.
- Prediction methods using an exponential smoothing model.
- How to predict future values with models for seasonal time series.
We will focus mainly on descriptive methods and simple models; the Box-Jenkins model and other theoretical models will not be discussed.
180
/ Chapter 13 Time Series Analysis

13.1 What is Time Series Analysis?

Time series data refers to data recorded over time. In general, observations are made at regular time intervals such as a year, season, month, or day, and such a series is called a discrete time series. There are also time series that are observed continuously, but this book deals only with the analysis of discrete time series.

An example of a discrete time series is the population of Korea shown in [Table 13.1.1]. These data are from the census conducted every five years in Korea from 1925 to 2020 (except for 1944 and 1949).
[Table 13.1.1] Population of Korea
(Source: Korea National Statistical Office, Census till 2010, Registered Census 2015, 2020)

    Year  Population     Year  Population     Year  Population
    1925  19,020,030     1960  24,989,241     1995  44,553,710
    1930  20,438,108     1966  29,159,640     2000  45,985,289
    1935  22,208,102     1970  31,435,252     2005  47,041,434
    1940  23,547,465     1975  34,678,972     2010  47,990,761
    1944  25,120,174     1980  37,406,815     2015  51,069,375
    1949  20,166,756     1985  40,419,652     2020  51,829,136
    1955  21,502,386     1990  43,390,374
As the table above shows, it is not easy to understand the overall shape of a time series displayed as numbers. The first step in time series analysis is to observe the time series by drawing a time series plot, with time on the X axis and the time series values on the Y axis. For example, the time series plot of the total population of Korea is shown in <Figure 13.1.1>.

<Figure 13.1.1> Time Series of Korea Population

Observing this figure, Korea's population has an overall increasing trend, but the population decreased sharply in 1944-1949 due to World War II. It can be seen that the population expanded rapidly after the Korean war in 1953 and that the growth slowed after 1990. The growth has slowed further in the last 10 years. By observing a time series in this way, trends, change points, and outliers
13.1 What is Time Series Analysis? / 181

can be observed, which is helpful in selecting an analysis model or method suitable for the data.

Time series that we frequently encounter include monthly sales of department stores and companies, the daily composite stock index, annual crop production, yearly export and import series, and yearly national income and economic growth rate.

[Table 13.1.2] shows the percent increase in monthly sales of the US toy/game industry for the past 6 years, and <Figure 13.1.2> is a plot of this time series. As it is the rate of change from the previous month, it can be observed that the data are seasonal, showing a large increase in November and December every year and moving up and down around 0. However, May 2020 is an extreme value, with an increase rate of 211%, unlike other years. For a time series, you can often examine the characteristics of the data better by converting the raw series into rates of change.
[Table 13.1.2] Percent Increase, Monthly Sales of Toy/Game in US (%)
(Source: Bureau of Census, US)

    Year.Month  Percent     Year.Month  Percent     Year.Month  Percent
                Increase                Increase                Increase
    2016.01      -66.7      2018.01      -63.6      2020.01      -49.1
    2016.02        2.5      2018.02        3.6      2020.02        2.2
    2016.03       12.5      2018.03       39.8      2020.03      -28.2
    2016.04       -9.0      2018.04      -21.0      2020.04      -58.2
    2016.05       -0.6      2018.05        5.9      2020.05      211.1
    2016.06       -4.4      2018.06      -12.4      2020.06       26.8
    2016.07        4.3      2018.07      -16.9      2020.07       -0.8
    2016.08        0.0      2018.08        5.2      2020.08        7.0
    2016.09        6.1      2018.09        7.5      2020.09        4.9
    2016.10        8.6      2018.10        8.5      2020.10        5.8
    2016.11       56.4      2018.11       54.9      2020.11       44.1
    2016.12       53.6      2018.12        5.8      2020.12        8.5
    2017.01      -65.6      2019.01      -46.2      2021.01      -37.1
    2017.02       -0.1      2019.02       -3.8      2021.02      -12.2
    2017.03       14.7      2019.03       16.3      2021.03       37.0
    2017.04       -5.7      2019.04       -8.4      2021.04      -10.3
    2017.05       -2.4      2019.05        6.6      2021.05       -0.5
    2017.06       -5.5      2019.06       -5.3      2021.06       -2.0
    2017.07        1.3      2019.07        0.8      2021.07        4.6
    2017.08        4.2      2019.08        7.7      2021.08        1.8
    2017.09        8.4      2019.09       -1.2      2021.09        5.2
    2017.10        7.2      2019.10       12.2      2021.10        6.4
    2017.11       54.9      2019.11       46.7      2021.11       40.0
    2017.12       45.5      2019.12       11.7      2021.12       10.6
<Figure 13.1.2> Percent Increase, Monthly Sales of Toy/Game in US(%)
Most time series have four components: trend, seasonal, cyclical, and irregular factors. A trend exists when a time series follows a certain pattern, such as a line or a curve, as time elapses, and there are various types of trends. Trends reflect consumption behavior, population changes, and inflation that appear in a time series over a long period of time. Seasonal factors are short-term, regular fluctuations that occur quarterly, monthly, or by day of the week. Time series such as monthly rainfall, average temperature, and ice cream sales have seasonal factors. Seasonal factors generally have a short cycle; fluctuations whose cycle extends over a long period of time, and which are not due to the season, are called cyclical factors. By observing these cyclical factors, it is possible to predict booms or recessions of the economy. <Figure 13.1.3> shows the US S&P 500 Index from 1997 to 2016, in which a six-year cycle can be observed.

<Figure 13.1.3> US S&P500 Index (1997-2016)

Factors that cannot be explained by trend, seasonal, or cyclical factors are called irregular or random factors; they refer to variation that appears from random causes, unrelated to the regular movement over time.
13.1.1 Time Series Model

By observing a time series, you can predict how it will change in the future by building a time series model that fits the probabilistic characteristics of the data. Because time series observed in practice have very diverse forms, time series models are also very diverse, from simple to very complex. In general, time series models for a single variable can be divided into the following four categories.

A. Regression Model

A model that explains data or predicts the future by expressing a time series as a function of time is the most intuitive and easiest model to understand. That is, when a time series is an observation of the random variables Y₁, Y₂, …, Yₙ, it is expressed as the following model:

    Yₜ = f(t) + εₜ ,   t = 1, 2, …, n

Here εₜ is the error of the time series that cannot be explained by the function f(t). In general, εₜ is assumed to be independent with E(εₜ) = 0 and Var(εₜ) = σ², which is called a white noise. For example, the following models can be applied to a time series which is horizontal or has a linear trend:

    Horizontal:     Yₜ = β₀ + εₜ
    Linear Trend:   Yₜ = β₀ + β₁t + εₜ
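As an illustration, such a linear trend model can be fitted by least squares on t; the short series below is made up:

```python
import numpy as np

# Made-up series with an upward linear trend plus small noise
y = np.array([10.2, 11.1, 12.3, 12.8, 14.1, 14.9, 16.2, 16.8])
t = np.arange(1, len(y) + 1)

# Fit the linear trend model Y_t = b0 + b1 * t by least squares;
# np.polyfit with degree 1 returns (slope, intercept)
b1, b0 = np.polyfit(t, y, 1)

# One-step-ahead forecast: extrapolate the trend to t = n + 1
forecast = b0 + b1 * (len(y) + 1)
print(b0, b1, forecast)
```

Forecasting with this model is simply an extrapolation of the fitted function f(t), which is why it suits slowly changing series.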
B. Decomposition Model

The model that decomposes a time series into four factors, i.e., trend (Tₜ), cycle (Cₜ), seasonal (Sₜ), and irregular (Iₜ), is an analysis method that has been used for a long time based on empirical facts. It can be divided into the additive model and the multiplicative model:

    Additive Model:        Yₜ = Tₜ + Cₜ + Sₜ + Iₜ
    Multiplicative Model:  Yₜ = Tₜ · Cₜ · Sₜ · Iₜ

Here Tₜ, Cₜ, Sₜ are deterministic functions and Iₜ is a random variable. If we take the logarithm of a multiplicative model, it becomes an additive model. If the number of data points is not large enough, the cycle factor can be omitted from the model.
C. Exponential Smoothing Model

Time series data are often more related to recent data than to past data. The above two types of models do not take into account the stronger relationship between recent observations and the present. Models using moving averages and exponential smoothing are often used to explain and predict data, exploiting the fact that forecasts depend more on recent data.
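As a preview of the exponential smoothing models in Section 13.5, here is a minimal sketch of simple exponential smoothing; the series and the smoothing constant alpha are illustrative:

```python
def exponential_smoothing(y, alpha):
    """Simple exponential smoothing: S_t = alpha*Y_t + (1-alpha)*S_{t-1}.

    Recent observations get weight alpha, alpha*(1-alpha), ... so the
    influence of older data decays geometrically.
    """
    s = [y[0]]                       # a common convention: start at the first value
    for value in y[1:]:
        s.append(alpha * value + (1 - alpha) * s[-1])
    return s

# Illustrative series; the last smoothed value serves as a one-step forecast
series = [12.0, 13.5, 11.8, 14.2, 13.1, 15.0]
smoothed = exponential_smoothing(series, alpha=0.3)
print(smoothed)
```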
D. Box-Jenkins ARIMA Model

The above models cannot be applied to all types of time series; the analyst selects and applies them according to the type of data. Box and Jenkins presented the following general ARIMA model that can be applied to all time series of stationary or nonstationary type:

    Yₜ = φ₁Yₜ₋₁ + φ₂Yₜ₋₂ + ⋯ + φₚYₜ₋ₚ + εₜ − θ₁εₜ₋₁ − θ₂εₜ₋₂ − ⋯ − θqεₜ₋q

The ARIMA model considers the observed time series as a sample extracted from a population time series, studies the probabilistic properties of each model, and establishes an appropriate time series model through parameter estimation and testing. For the ARIMA model, autocorrelation coefficients between time lags are used to identify a model. The ARIMA model is beyond the scope of this book, so interested readers are encouraged to consult the bibliography.
Among the above time series models, the regression model and the ARIMA model are systematic models based on statistical theory, while the decomposition model and the exponential smoothing model are methods based on experience and intuition. In general, regression models using mathematical functions and decomposition models are known to be suitable for predicting slowly changing time series, whereas exponential smoothing and ARIMA models are known to be effective in predicting rapidly changing time series.

No time series model can predict sudden changes. And because time series take so many different forms, it cannot be said that one time series model is always superior to another. Therefore, rather than applying only one model to a time series, it is necessary to build and compare several models, combine different models, or determine the final model by also weighing the opinions of experts familiar with the time series.
13.1.2 Evaluation of Time Series Model

Let Y₁, Y₂, …, Yₙ be the observed values of the time series and Ŷ₁, Ŷ₂, …, Ŷₙ be the values predicted by the model. If the model fits exactly, the observed and predicted values are the same, and the model error εₜ is zero. In general, it is assumed that the errors εₜ of the time series model are independent random variables which follow the same normal distribution with mean 0 and variance σ². The accuracy of a time series model can be evaluated using the residuals, Yₜ − Ŷₜ, the difference between the observed and predicted values. In general, the following mean squared error (MSE) is commonly used to measure the accuracy of a model; the smaller the MSE, the more appropriate the fitted model is judged to be.

    MSE = (1/n) Σₜ₌₁ⁿ ( Yₜ − Ŷₜ )²

The mean squared error is used as an estimator of the variance σ² of the error. Since the MSE can have a large value, the root mean squared error is often used:

    RMSE = √MSE
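A direct computation of these accuracy measures (the observed and predicted numbers are made up for illustration):

```python
import math

# Observed values and model predictions (made up for illustration)
observed  = [102.0, 108.0, 115.0, 121.0, 130.0]
predicted = [100.0, 110.0, 114.0, 123.0, 128.0]

n = len(observed)
residuals = [y - yhat for y, yhat in zip(observed, predicted)]

# MSE = (1/n) * sum of squared residuals; RMSE = sqrt(MSE)
mse = sum(e * e for e in residuals) / n
rmse = math.sqrt(mse)
print(mse, rmse)   # smaller values indicate a better-fitting model
```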
13.2 Smoothing of Time Series

The original time series can be used to build a time series model by observing trends, but in many cases a time series can be understood better by smoothing. In a time series such as a stock price, it is often difficult to find
13.2 Smoothing of Time Series / 185

a trend because of temporary or short-term fluctuations due to accidental events or cyclical factors. In this case, smoothing techniques are used as a method to grasp the overall long-term trend effectively by removing temporary or short-term fluctuations. The centered moving average method and the exponential smoothing method are widely used.
13.2.1 Centered Moving Average

The time series in [Table 13.2.1] is the world crude oil price based on the closing price of each year from 1987 to 2022. Looking at <Figure 13.2.1>, it can be seen that the short-term fluctuations in this time series are large. However, causes such as oil shocks are short-term and not persistent, so if we are interested in the long-term trend of crude oil prices, it is more effective to remove the fluctuations caused by short-term causes.
[Table 13.2.1] Price of Crude Oil (End of Year Price, US$) and 5-point Moving Average

Year  Price of Oil  5-point Moving Average    Year  Price of Oil  5-point Moving Average
1987      16.74            -                  2005      61.06         58.746
1988      17.12            -                  2006      60.85         61.164
1989      21.84         20.666                2007      95.95         68.370
1990      28.48         21.216                2008      44.60         74.434
1991      19.15         20.630                2009      79.39         82.030
1992      19.49         19.816                2010      91.38         81.206
1993      14.19         18.028                2011      98.83         91.920
1994      17.77         19.378                2012      91.83         86.732
1995      19.54         19.010                2013      98.17         75.882
1996      25.90         18.600                2014      53.45         66.866
1997      17.65         20.198                2015      37.13         60.592
1998      12.14         21.634                2016      53.75         49.988
1999      25.76         20.446                2017      60.46         51.526
2000      26.72         23.158                2018      45.15         53.804
2001      19.96         27.232                2019      61.14         58.096
2002      31.21         30.752                2020      48.52         67.394
2003      32.51         37.620                2021      75.21            -
2004      43.36         45.798                2022     106.95            -
186
/ Chapter 13 Time Series Analysis
<Figure 13.2.1> Price of Crude Oil and 5-point Moving Average
The N-point centered moving average of a time series is the average of N data values centered at a single point in time. For example, in the crude oil price data, the value of the 5-point moving average for a specific year is the average of the data for the two years before that year, that year itself, and the two years after it. Expressed as a formula, if M_t denotes the moving average at time t, the 5-point centered moving average is as follows:

    M_t = (Y_{t−2} + Y_{t−1} + Y_t + Y_{t+1} + Y_{t+2}) / 5

For example, the 5-point centered moving average for 1989 is as follows.

    M_1989 = (Y_1987 + Y_1988 + Y_1989 + Y_1990 + Y_1991) / 5
           = (16.74 + 17.12 + 21.84 + 28.48 + 19.15) / 5 = 20.666
[Table 13.2.1] shows the values of all the 5-point centered moving averages obtained in this way, and <Figure 13.2.1> is the graph of the 5-point moving average. Note that the moving averages for the first two years and the last two years cannot be obtained. It can be seen that the graph of the moving average is better for grasping the long-term trend than the graph of the original data, because short-term fluctuations are removed.
The choice of the value N for the N-point moving average is important. A large value of N gives a smoother moving average, but it has the disadvantage of losing more points at both ends and of being insensitive to important trend changes. On the other hand, if you choose a small N, you lose less data at both ends, but you may not achieve the smoothing effect because short-term fluctuations are not sufficiently eliminated. In general, try a few values of N, balancing the smoothing effect against missing important changes and against losing too many points at both ends.
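The centered moving average for odd N can be sketched in a few lines of Python; the example below reproduces the 1989 value computed above from the first five prices of [Table 13.2.1]:

```python
def centered_moving_average(y, n):
    """N-point centered moving average for odd N; the first and last
    (N-1)//2 positions cannot be computed and are returned as None."""
    k = n // 2
    out = [None] * len(y)
    for t in range(k, len(y) - k):
        out[t] = sum(y[t - k : t + k + 1]) / n
    return out

# Crude oil prices 1987-1991 from [Table 13.2.1]
prices = [16.74, 17.12, 21.84, 28.48, 19.15]
ma5 = centered_moving_average(prices, 5)
print(round(ma5[2], 3))  # 20.666 -- the 1989 value
```

The `None` entries at both ends correspond to the years for which no moving average exists, matching the empty cells in the table.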
If the value of N is an even number, there is a difficulty in obtaining a centered moving average with the same number of data points on both sides of the base year. For example, the center of the 4-point moving average of 1987 to 1990 lies between 1988 and 1989. If we denote this as M_{1988.5}, it can be calculated as follows:

    M_{1988.5} = (Y_1987 + Y_1988 + Y_1989 + Y_1990) / 4
               = (16.74 + 17.12 + 21.84 + 28.48) / 4 = 21.045
The 4-point moving average obtained in this way is called a non-central 4-point moving average. For even N, the non-central moving average does not align with an observation year of the original data, which is inconvenient. In this case, the centered moving average is calculated as the average of two adjacent non-central moving averages. In other words, the centered 4-point moving average for 1989 is the average of M_{1988.5} and M_{1989.5}:

    M_1989 = (M_{1988.5} + M_{1989.5}) / 2 = (21.045 + 21.6475) / 2 = 21.346
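The even-N case, averaging two adjacent non-central moving averages so the result aligns with an observation year, can be sketched as follows (the 1989 value is computed from the 1987 to 1991 prices of [Table 13.2.1]):

```python
def centered_ma_even(y, n):
    """Centered moving average for even N: average the two adjacent
    non-central N-point moving averages so the result aligns with an
    observation time.  Positions without a value are None."""
    k = n // 2
    out = [None] * len(y)
    for t in range(k, len(y) - k):
        left = sum(y[t - k : t + k]) / n       # non-central MA centered just before t
        right = sum(y[t - k + 1 : t + k + 1]) / n  # non-central MA centered just after t
        out[t] = (left + right) / 2
    return out

prices = [16.74, 17.12, 21.84, 28.48, 19.15]  # 1987-1991
print(round(centered_ma_even(prices, 4)[2], 3))  # 21.346 -- centered 4-point MA for 1989
```

Averaging the two windows is equivalent to a weighted average that gives the two outermost observations half weight, which is the standard centering trick for even N.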
If the time series is quarterly or monthly, a 4-point or 12-point centered moving average is an average over one full year, so it is often used to observe the data with the seasonality removed.
13.2.2 Exponential Smoothing
A 3-point moving average can be considered a weighted average of three data values, each with weight 1/3:

    M_t = (1/3)Y_t + (1/3)Y_{t−1} + (1/3)Y_{t−2}

When the weights are w_1, w_2, ⋯, w_n, the weighted moving average M_t of the time series is defined as follows:

    M_t = Σ_{i=1}^{n} w_i Y_i,   where w_i ≥ 0 and Σ_{i=1}^{n} w_i = 1,

and n is the number of data values.
Various weighted averages with different weights can be used depending on the purpose. Among them, a smoothing method that gives more weight to data closer to the present and smaller weight to data farther from the present is called exponential smoothing. The exponential smoothing method is determined by an exponential smoothing constant α with a value between 0 and 1. The exponentially smoothed data E_t are calculated as follows:

    E_2 = αY_2 + (1 − α)E_1
    E_3 = αY_3 + (1 − α)E_2
    ⋯⋯
    E_t = αY_t + (1 − α)E_{t−1}

Here an initial value E_1 is required; usually E_1 = Y_1 is used, and the average of the data can also be used. The exponentially smoothed value E_t at time t gives weight α to the current data and weight (1 − α) to the previous smoothed value. The exponentially smoothed value E_t can be represented with the original data Y_t as follows:

    E_t = αY_t + α(1 − α)Y_{t−1} + α(1 − α)²Y_{t−2} + ⋯ + α(1 − α)^{t−2}Y_2 + (1 − α)^{t−1}E_1
Therefore, the exponential smoothing method uses all data from the present and the past, but gives the current data the highest weight and lower weights as the distance from the present increases.
Exponential smoothing of the crude oil price in [Table 13.2.1] with initial value E_1 = Y_1 and exponential smoothing constant α = 0.3 proceeds as follows.

    E_1 = Y_1 = 16.740
    E_2 = 0.3 × 17.12 + 0.7 × 16.740 = 16.854
    E_3 = 0.3 × 21.84 + 0.7 × 16.854 = 18.350

All data exponentially smoothed with α = 0.3 are given in [Table 13.2.2]. It can be seen that, in the exponential smoothing method, there is no loss of data at either end, unlike the moving average method. The crude oil price time series and the exponentially smoothed data are shown in <Figure 13.2.2>. The smoothed data are not significantly different from the original data. If the value of α is small, more weight is given to past data than to the present, making the smoothed series less sensitive to sudden changes in the current data. Conversely, the closer the value of α is to 1, that is, the more weight is given to the current data, the more the smoothed data resemble the original data, and the smoothing effect disappears.
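The recursion E_t = αY_t + (1 − α)E_{t−1} is a two-line loop; the sketch below reproduces the first three smoothed values worked out above from [Table 13.2.1]:

```python
def exponential_smoothing(y, alpha, e1=None):
    """Exponential smoothing: E_1 = Y_1 by default,
    then E_t = alpha*Y_t + (1 - alpha)*E_{t-1}."""
    e = [y[0] if e1 is None else e1]
    for t in range(1, len(y)):
        e.append(alpha * y[t] + (1 - alpha) * e[-1])
    return e

prices = [16.74, 17.12, 21.84]  # crude oil 1987-1989 from [Table 13.2.1]
E = exponential_smoothing(prices, 0.3)
print([round(v, 3) for v in E])  # [16.74, 16.854, 18.35]
```

Unlike the moving average, the smoothed series has a value at every time point, so nothing is lost at the ends.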
[Table 13.2.2] Price of Crude Oil and Exponential Smoothing with α = 0.3

Year  Price of Oil  Exponential Smoothing    Year  Price of Oil  Exponential Smoothing
1987      16.74          16.740              2005      61.06          40.554
1988      17.12          16.854              2006      60.85          46.643
1989      21.84          18.350              2007      95.95          61.435
1990      28.48          21.389              2008      44.60          56.385
1991      19.15          20.717              2009      79.39          63.286
1992      19.49          20.349              2010      91.38          71.714
1993      14.19          18.501              2011      98.83          79.849
1994      17.77          18.282              2012      91.83          83.443
1995      19.54          18.659              2013      98.17          87.861
1996      25.90          20.832              2014      53.45          77.538
1997      17.65          19.877              2015      37.13          65.416
1998      12.14          17.556              2016      53.75          61.916
1999      25.76          20.017              2017      60.46          61.479
2000      26.72          22.028              2018      45.15          56.580
2001      19.96          21.408              2019      61.14          57.948
2002      31.21          24.348              2020      48.52          55.120
2003      32.51          26.797              2021      75.21          61.146
2004      43.36          31.766              2022     106.95          74.888
<Figure 13.2.2> Price of Crude Oil and Exponential Smoothing with α = 0.3
13.2.3 Filtering by Moving Median

The N-point centered moving median of a time series is the median of N data values centered at a single point in time. For example, in the crude oil price data, the value of the 5-point moving median for a specific year is the median of the data for the two years before that year, that year itself, and the two years after it. If the data are denoted Y_{t−2}, Y_{t−1}, Y_t, Y_{t+1}, Y_{t+2} and sorted from smallest to largest as Y_(1), Y_(2), Y_(3), Y_(4), Y_(5), the moving median is m_t = Y_(3).
For example, the 1989 centered moving median for the crude oil prices in [Table 13.2.1] is as follows:

    m_1989 = median(16.74, 17.12, 21.84, 28.48, 19.15) = 19.15
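The moving median differs from the moving average only in the summary statistic applied to each window; a minimal sketch using the standard library, checked against the 1989 value above:

```python
import statistics

def centered_moving_median(y, n):
    """N-point centered moving median for odd N; the ends are None."""
    k = n // 2
    out = [None] * len(y)
    for t in range(k, len(y) - k):
        out[t] = statistics.median(y[t - k : t + k + 1])
    return out

prices = [16.74, 17.12, 21.84, 28.48, 19.15]  # 1987-1991 from [Table 13.2.1]
print(centered_moving_median(prices, 5)[2])  # 19.15 -- the 1989 value
```

Because the median ignores how extreme the largest and smallest window values are, a single outlier (such as the 2008 price swing) cannot pull the filtered series the way it pulls the moving average.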
[Table 13.2.3] and <Figure 13.2.3> show all the 5-point moving median values obtained in this way and their graph. Note that the moving medians for the first two years and the last two years are not available. Because the centered moving median removes extreme values, it is called filtering, and the resulting time series is much smoother than the original data.
[Table 13.2.3] Price of Crude Oil and 5-point Centered Moving Median

Year  Price of Oil  5-point Moving Median    Year  Price of Oil  5-point Moving Median
1987      16.74            -                 2005      61.06          60.85
1988      17.12            -                 2006      60.85          60.85
1989      21.84          19.15               2007      95.95          61.06
1990      28.48          19.49               2008      44.60          79.39
1991      19.15          19.49               2009      79.39          91.38
1992      19.49          19.15               2010      91.38          91.38
1993      14.19          19.15               2011      98.83          91.83
1994      17.77          19.49               2012      91.83          91.83
1995      19.54          17.77               2013      98.17          91.83
1996      25.90          17.77               2014      53.45          53.75
1997      17.65          19.54               2015      37.13          53.75
1998      12.14          25.76               2016      53.75          53.45
1999      25.76          19.96               2017      60.46          53.75
2000      26.72          25.76               2018      45.15          53.75
2001      19.96          26.72               2019      61.14          60.46
2002      31.21          31.21               2020      48.52          61.14
2003      32.51          32.51               2021      75.21            -
2004      43.36          43.36               2022     106.95            -
<Figure 13.2.3> Price of Crude Oil and 5-point Centered Moving Median
If the value of N is an even number, there is the same difficulty in obtaining a centered moving median with the same number of data points on both sides of the base year. For example, the center of the 4-point moving median of 1987 to 1990 lies between 1988 and 1989. If we denote this as m_{1988.5}, it can be calculated as follows:

    m_{1988.5} = median(16.74, 17.12, 21.84, 28.48) = (17.12 + 21.84) / 2 = 19.48

The 4-point moving median obtained in this way is called the non-central 4-point moving median. As with the moving average, for even N the non-central moving median does not align with an observation year of the original data, which is inconvenient. In this case, the centered value is calculated as the average of two adjacent non-central moving medians. In other words, the centered 4-point moving median for 1989 is the mean of m_{1988.5} and m_{1989.5}.
13.3 Transformation of Time Series
A time series can be examined by plotting the raw data directly, but to examine various characteristics, the percentage increase or decrease is often examined, as is an index expressed as a percentage relative to a base period. In addition, to examine the relation to previous data, the series can be compared with a time lag or converted into a horizontal (trend-free) series using differencing. When the variance of a time series increases with time, it is sometimes converted into a form suitable for a time series model by a logarithmic, square root, or Box-Cox transformation.
13.3.1 Percent
A. Percentage Change
In a time series you can examine the increase or decrease of a value directly, but the change is easier to observe as a percentage increase or decrease. When the time series is Y_1, Y_2, ⋯, Y_n, the percentage change R_t compared to the previous data point is as follows.

    R_t = (Y_t − Y_{t−1}) / Y_{t−1} × 100,   t = 2, 3, ⋯, n

[Table 13.3.1] shows the number of houses in Korea from 2010 to 2020, and <Figure 13.3.1> shows the percentage change from the previous year. Looking at this rate of change, it can be observed that although the original time series has an overall increasing trend, the year-over-year rate of change varies considerably. For example, there was a 2.23% increase in the number of houses in 2014 compared to the previous year, and a 2.48% increase in 2018:

    R_2014 = (19161.2 − 18742.1) / 18742.1 × 100 ≈ 2.23
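The percentage-change transform is a one-line list comprehension; the sketch below applies it to the first three values of [Table 13.3.1]:

```python
def percent_change(y):
    """R_t = (Y_t - Y_{t-1}) / Y_{t-1} * 100 for t = 2, ..., n."""
    return [(y[t] - y[t - 1]) / y[t - 1] * 100 for t in range(1, len(y))]

# Number of houses 2010-2012 from [Table 13.3.1] (unit 1000)
houses = [17738.8, 18082.1, 18414.4]
for r in percent_change(houses):
    print(round(r, 2))
```

The returned list is one element shorter than the input, since no change can be computed for the first year; note also that the table appears to truncate rather than round its percentages, so the last digit may differ slightly from rounded output.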
[Table 13.3.1] Number of Houses in Korea and Percent Change
(Korea National Statistical Office, unit: 1000 houses)

Year  Number of Houses  % Change
2010       17738.8         -
2011       18082.1        1.93
2012       18414.4        1.83
2013       18742.1        1.77
2014       19161.2        2.23
2015       19559.1        2.07
2016       19877.1        1.62
2017       20313.4        2.19
2018       20818.0        2.48
2019       21310.1        2.36
2020       21673.5        1.70
<Figure 13.3.1> Number of Houses in Korea and Percent Change
B. Simple Index
Another way to use percentages to characterize changes over time is to calculate an index number. An index number indicates the change of a time series over time. The index number I_t of a time series at a certain time point t is the percentage of the data value relative to the value at a predetermined time point, called the base period.

    I_t = Y_t / Y_base × 100,   t = 1, 2, ⋯, n

The most commonly used indices in economics are the price index and the quantity index. For example, the consumer price index is a price index indicating the price change of a set of goods that reflects overall consumer prices, and an index indicating the change in total electricity consumption each year is a quantity index. There are several methods of calculating an index; they are broadly divided into the simple index, when the index represents a single item, and the composite index, when it represents several items, as in the consumer price index.
[Table 13.3.2] is a simple index of the number of houses in Korea from 2010 to 2020, with 2010 as the base period. Looking at the figure for the index, there is no significant change from the original time series and its trend in this case. It can be seen that there was a 22.18% increase in the number of houses in 2020 compared to 2010:

    I_2020 = Y_2020 / Y_2010 × 100 = 21673.5 / 17738.8 × 100 = 122.18
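A simple index is just each value expressed as a percentage of the base-period value; the sketch below reproduces the 2020 index from [Table 13.3.2]:

```python
def simple_index(y, base=0):
    """I_t = Y_t / Y_base * 100, so the base period's index is 100."""
    return [v / y[base] * 100 for v in y]

# 2010 (base) and 2020 house counts from [Table 13.3.2]
houses = [17738.8, 21673.5]
idx = simple_index(houses)
print(round(idx[1], 2))  # 122.18
```

Because the transform is a single rescaling, the indexed series always has exactly the same shape as the original; only the units change.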
[Table 13.3.2] Simple Index of Number of Houses in Korea
(Korea National Statistical Office, unit: 1000 houses)

Year  Number of Houses  Simple Index (Base: 2010)
2010       17738.8            100.00
2011       18082.1            101.94
2012       18414.4            103.81
2013       18742.1            105.66
2014       19161.2            108.02
2015       19559.1            110.26
2016       19877.1            112.05
2017       20313.4            114.51
2018       20818.0            117.36
2019       21310.1            120.13
2020       21673.5            122.18
<Figure 13.3.2> Simple Index of Number of Houses in Korea
C. Composite Index
A composite index measures the change in price or quantity of several goods together: a specific time point is set as the base period, and the data at each time point are expressed as a percentage of the base period. The most widely used composite index is the consumer price index, which reflects the price fluctuations of about 500 products in Korea that affect consumer prices. Other commonly used composite indices include the composite stock index, which tracks the price fluctuations of all listed stocks traded in the stock market.
For a composite index, a weighted composite index that weights the price of each product by the quantity consumed is often used. When calculating such a weighted composite index, using the quantities consumed at the base period as weights is called the Laspeyres method, and using the quantities consumed at the current period as weights is called the Paasche method. In general, the Laspeyres weighted composite index is widely used, and the consumer price index is a representative example. The Paasche price index is used when the consumption of the goods used as weights varies greatly over time, and it can be used only when the consumption at each time point is known; surveying the quantities consumed at every time point is expensive.
Let P_{1t}, P_{2t}, ⋯, P_{kt} be the prices of k products at time point t, and Q_{10}, Q_{20}, ⋯, Q_{k0} be the quantities of each product consumed at the base period. The formula for each composite index is as follows:

    Laspeyres index:  I_t = (P_{1t}Q_{10} + P_{2t}Q_{20} + ⋯ + P_{kt}Q_{k0}) / (P_{10}Q_{10} + P_{20}Q_{20} + ⋯ + P_{k0}Q_{k0}) × 100

    Paasche index:    I_t = (P_{1t}Q_{1t} + P_{2t}Q_{2t} + ⋯ + P_{kt}Q_{kt}) / (P_{10}Q_{1t} + P_{20}Q_{2t} + ⋯ + P_{k0}Q_{kt}) × 100
The data in [Table 13.3.3] show the price and quantity of three metals by month in 2020.

[Table 13.3.3] Composite Index of Three Metal Prices ($/ton) and Production Quantities (ton)

Month  Copper Price  Copper Qty  Iron Price  Iron Qty  Lead Price  Lead Qty  Laspeyres  Paasche
  1       1361.6       100.7        213        4311      530.0       46.1     100.00    100.00
  2       1399.0        95.1        213        4497      520.0       47.0     100.31    100.28
  3       1483.6       104.0        213        5083      529.0       51.0     101.13    101.01
  4       1531.6        95.6        213        5077      540.0       23.0     101.63    101.35
  5       1431.2       103.3        213        5166      531.0       26.5     100.65    100.57
  6       1383.8       106.9        213        4565      580.0       13.5     100.42    100.27
  7       1326.8        95.9        213        4329      642.8       27.4     100.16     99.98
  8       1328.8        96.7        213        4057      602.6       25.8     100.00     99.87
  9       1307.8        95.7        213        3473      513.6       20.5      99.43     99.38
 10       1278.4        89.1        213        3739      480.8       24.6      99.01     99.07
 11       1354.2       100.5        213        3817      528.4       21.5      99.92     99.92
 12       1305.2        96.9        213        3694      462.2       27.9      99.18     99.21
In [Table 13.3.3], the Laspeyres index for February with January as the base period is as follows.

    I_2 = (1399.0 × 100.7 + 213 × 4311 + 520.0 × 46.1) / (1361.6 × 100.7 + 213 × 4311 + 530.0 × 46.1) × 100 = 100.31

Similarly, the Paasche index is as follows:

    I_2 = (1399.0 × 95.1 + 213 × 4497 + 520.0 × 47.0) / (1361.6 × 95.1 + 213 × 4497 + 530.0 × 47.0) × 100 = 100.28

In [Table 13.3.3], it can be seen that the production quantities of iron and lead in the last months of the year differ significantly from the quantities in January, the base period. In such a case, when the quantity fluctuates greatly and the quantity at each time point is known, the Paasche index is preferable because it appropriately reflects the price change at each time point.
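The two index formulas differ only in which quantity vector supplies the weights; a sketch that reproduces the February values worked out above from [Table 13.3.3]:

```python
def laspeyres(p_t, p_0, q_0):
    """Weighted composite index using base-period quantities as weights."""
    num = sum(pt * q for pt, q in zip(p_t, q_0))
    den = sum(p0 * q for p0, q in zip(p_0, q_0))
    return num / den * 100

def paasche(p_t, p_0, q_t):
    """Weighted composite index using current-period quantities as weights."""
    num = sum(pt * q for pt, q in zip(p_t, q_t))
    den = sum(p0 * q for p0, q in zip(p_0, q_t))
    return num / den * 100

# January (base) and February prices/quantities for copper, iron, lead
p_jan = [1361.6, 213.0, 530.0]; q_jan = [100.7, 4311.0, 46.1]
p_feb = [1399.0, 213.0, 520.0]; q_feb = [95.1, 4497.0, 47.0]
print(round(laspeyres(p_feb, p_jan, q_jan), 2))  # 100.31
print(round(paasche(p_feb, p_jan, q_feb), 2))    # 100.28
```

The Laspeyres version needs only base-period quantities, which is why it is cheaper to maintain in practice; the Paasche version needs fresh quantity data at every time point.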
13.3.2 Time Lag and Difference
A. Time Lag
In a time series, current data are usually related to past data. A lag is a transformation for comparing the data at the present time with the observations one or more time points in the past. That is, when the observed time series is Y_1, Y_2, ⋯, Y_n, the series with lag 1 is Y_1, Y_2, ⋯, Y_{n−1}, aligned so that the lagged value at time t is Y_{t−1}. Note that, in the case of lag k, there are no data for the first k time points.
The correlation coefficient between the lagged data and the raw data is called the autocorrelation coefficient. If the mean of the time series is Ȳ, the lag-k autocorrelation r_k is defined as follows:

    r_k = Σ_{t=k+1}^{n} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1}^{n} (Y_t − Ȳ)²,   k = 0, 1, 2, ⋯, n−1

The values r_1, r_2, ⋯ are called the autocorrelation function and are used to determine a time series model.
[Table 13.3.4] shows the monthly consumer price index for the past two years together with lags 1 to 12 of these data, and the autocorrelation coefficients are shown in [Table 13.3.5]. <Figure 13.3.3> shows the original time series and the autocorrelation function.
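The lag-k autocorrelation defined above can be computed directly from its formula; the series in the sketch is a short made-up example rather than the CPI data:

```python
def autocorrelation(y, k):
    """Lag-k autocorrelation:
    r_k = sum_{t=k+1}^{n} (Y_t - Ybar)(Y_{t-k} - Ybar) / sum_{t=1}^{n} (Y_t - Ybar)^2
    (indices here are 0-based, so the numerator runs over t = k, ..., n-1)."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

y = [1.0, 2.0, 3.0, 4.0]
print(autocorrelation(y, 0))            # 1.0 -- a series is perfectly correlated with itself
print(round(autocorrelation(y, 1), 2))  # 0.25
```

By construction r_0 = 1, and for a trending series like the CPI in [Table 13.3.4] the coefficients decay slowly as k grows, which is the pattern [Table 13.3.5] shows.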
[Table 13.3.4] Monthly Consumer Price Index and Time Lag 1, Lag 2, ⋯, Lag 12

time  Year/Month   CPI    Lag 1   Lag 2   ⋯   Lag 12
  1   2020.01     102.3     -       -            -
  2   2020.02     102.8   102.3     -            -
  3   2020.03     103.7   102.8   102.3          -
  4   2020.04     104.1   103.7   102.8          -
  5   2020.05     104.0   104.1   103.7          -
  6   2020.06     104.3   104.0   104.1          -
  7   2020.07     104.5   104.3   104.0          -
  8   2020.08     104.9   104.5   104.3          -
  9   2020.09     104.8   104.9   104.5          -
 10   2020.10     104.8   104.8   104.9          -
 11   2020.11     104.2   104.8   104.8          -
 12   2020.12     104.4   104.2   104.8          -
 13   2021.01     105.0   104.4   104.2        102.3
 14   2021.02     105.5   105.0   104.4        102.8
 15   2021.03     106.1   105.5   105.0        103.7
 16   2021.04     106.7   106.1   105.5        104.1
 17   2021.05     107.1   106.7   106.1        104.0
 18   2021.06     107.0   107.1   106.7        104.3
 19   2021.07     106.7   107.0   107.1        104.5
 20   2021.08     107.4   106.7   107.0        104.9
 21   2021.09     108.0   107.4   106.7        104.8
 22   2021.10     107.7   108.0   107.4        104.8
 23   2021.11     107.8   107.7   108.0        104.2
 24   2021.12     108.3   107.8   107.7        104.4

[Table 13.3.5] Autocorrelation Function

 k   autocorrelation
 1       0.8318
 2       0.6772
 3       0.5651
 4       0.4479
 5       0.3333
 6       0.2547
 7       0.1647
 8       0.0755
 9      -0.0143
10      -0.0854
11      -0.1737
<Figure 13.3.3> Autocorrelation Function Graph
B. Differencing
Since the price index in [Table 13.3.4] has a linear trend, a model for this trend can be built, but in some cases a model is created after changing the time series to a horizontal one. The way to transform a linear trend into a horizontal one is differencing. When the time series is Y_1, Y_2, ⋯, Y_n, the first-order difference ∇Y_t is as follows:

    ∇Y_t = Y_t − Y_{t−1},   t = 2, 3, ⋯, n

If the raw data have a linear trend, the first-order difference of the time series is a horizontal time series, because the difference measures the change (slope) at each step. Differencing the first-order differences ∇Y_t again gives the second-order difference:

    ∇²Y_t = ∇Y_t − ∇Y_{t−1} = Y_t − 2Y_{t−1} + Y_{t−2},   t = 3, 4, ⋯, n

If the raw data have a trend in the form of a quadratic curve, the second-order difference of the time series becomes a horizontal time series.
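The claims above are easy to verify numerically: differencing a linear series once, or a quadratic series twice, yields a constant (horizontal) series. A minimal sketch with made-up series:

```python
def difference(y, order=1):
    """Apply first-order differencing (Y_t - Y_{t-1}) `order` times.
    Each pass shortens the series by one value."""
    for _ in range(order):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y

# A linear series (slope 2) differences to a constant series
print(difference([3.0, 5.0, 7.0, 9.0]))      # [2.0, 2.0, 2.0]
# A quadratic series (Y_t = t^2) needs second-order differencing
print(difference([1.0, 4.0, 9.0, 16.0], 2))  # [2.0, 2.0]
```

Note the loss of one data point per pass, which mirrors the t = 2, ⋯, n and t = 3, ⋯, n ranges in the formulas.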
<Figure 13.3.4> shows the first-order differences of the time series in [Table 13.3.4]; the differenced series is horizontal.
<Figure 13.3.4> 1st Order Differencing of Consumer Price Index
13.3.3 Mathematical Transformation
If the original data of the time series are used as they are, modeling may not be easy, or the data may not satisfy various assumptions. In this case, we can fit the desired model after an appropriate functional transformation, such as a log transformation. The functions commonly used for mathematical transformations are as follows.

    Log function:             Z_t = log(Y_t)
    Square root function:     Z_t = √Y_t
    Square function:          Z_t = Y_t²
    Box-Cox transformation:   Z_t = (Y_t^λ − 1) / λ  if λ ≠ 0;   Z_t = log(Y_t)  if λ = 0

[Table 13.3.6] is a toy company's quarterly sales, and <Figure 13.3.5> is a plot of these data. The data are seasonal by quarter, but the dispersion of sales increases over time. It is not easy to apply a time series model to data whose dispersion increases over time. In this case, the log transformation Z_t = log(Y_t) can reduce the dispersion as time increases, as shown in <Figure 13.3.5>, so that a model can be applied. After predicting by applying the model to the log-transformed data, the exponential transformation Y_t = exp(Z_t) is applied again to obtain predictions for the raw data.
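The Box-Cox family above includes the log transform as its λ = 0 limit; a minimal sketch with made-up values (λ = 1 merely shifts the data, so it is a convenient sanity check):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1)/lam for lam != 0, log(y) for lam = 0.
    Assumes all values are positive."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

print(box_cox([1.0, 2.0, 4.0], 1))                        # [0.0, 1.0, 3.0]
print([round(z, 4) for z in box_cox([1.0, math.e], 0)])   # [0.0, 1.0]
```

As with the log transform, predictions made on the transformed scale must be mapped back with the inverse transform (exp for λ = 0) before being reported on the original scale.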
[Table 13.3.6] Quarterly Sales of a Toy Company (unit: million $)

 t   Year/Quarter     Sales
 1   2017 Quarter 1    38.0
 2   2017 Quarter 2    53.6
 3   2017 Quarter 3    57.5
 4   2017 Quarter 4   200.0
 5   2018 Quarter 1    56.5
 6   2018 Quarter 2    75.8
 7   2018 Quarter 3    78.3
 8   2018 Quarter 4   269.7
 9   2019 Quarter 1    70.2
10   2019 Quarter 2    92.7
11   2019 Quarter 3   101.8
12   2019 Quarter 4   332.6
13   2020 Quarter 1    97.3
14   2020 Quarter 2   123.7
15   2020 Quarter 3   132.9
16   2020 Quarter 4   429.4
17   2021 Quarter 1   138.3
18   2021 Quarter 2   167.6
19   2021 Quarter 3   189.9
20   2021 Quarter 4   545.9
<Figure 13.3.5> Log Transformation of Toy Sales
The square root transform is used for a purpose similar to that of the log transform, and the square transform can be used when the variance decreases with time. The Box-Cox transform generalizes these transformations.
13.4 Regression Model and Forecasting
If there is a trend factor showing a continuous increase or decrease in the time series, the regression model studied in Chapter 12 can be applied. For example, if the time series shows a linear trend, the linear regression model is applied with the time series as observed values of the random variables Y_1, Y_2, ⋯, Y_n and time as t = 1, 2, ⋯, n:

    Y_t = β_0 + β_1·t + ε_t

Here ε_t is the error term with mean 0 and variance σ².
A characteristic of the linear model is that it increases by a slope β_1 of constant magnitude over time.
When the estimated regression coefficients are b_0 and b_1, the validity of the linear regression model is tested in the same way as described in Chapter 12. The standard error of estimate and the coefficient of determination are often used. In a linear trend model, σ represents the degree to which observations scatter around the estimated regression line at each time point. As an estimate of σ, the following standard error is used.

    s = √( Σ_{t=1}^{n} (Y_t − Ŷ_t)² / (n − 2) )

A smaller standard error s indicates that the observed values are close to the estimated regression line, which means the regression model fits well.
The coefficient of determination is the ratio of the regression sum of squares, RSS, explained out of the total sum of squares, TSS.

    R² = RSS / TSS
The value of the coefficient of determination is always between 0 and 1; the closer it is to 1, the more densely the samples cluster around the regression line, which means the estimated regression equation explains the observations well.
As explained in Chapter 12, since it is difficult to set absolute criteria for the adequacy of the standard error or the coefficient of determination, a hypothesis test is used to determine whether the trend parameter β_1 is zero or not.

    Hypothesis:       H_0: β_1 = 0,   H_1: β_1 ≠ 0
    Test statistic:   t = b_1 / SE(b_1),   where SE(b_1) = s / √( Σ_{t=1}^{n} (t − t̄)² )
    Rejection region: reject H_0 at significance level α if |t| > t_{n−2; α/2}

If the null hypothesis H_0: β_1 = 0 is not rejected, the model cannot be considered valid.
The assumptions on the error ε_t are tested using the residuals, the differences between the observed time series values and the predicted values; this is called residual analysis. Residual analysis usually examines whether assumptions about the error terms, such as independence and equal variance, are satisfied by drawing a scatter plot of the residuals over time or a scatter plot of the residuals against the predicted values. If the residuals show no specific trend around 0 and appear random in these scatter plots, the assumptions are considered valid. To examine the normality assumption of the error term, draw a normal probability plot of the residuals; if the points form approximately a straight line, the normal distribution assumption is judged appropriate.
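The trend fit, the standard error s, and the t statistic for H_0: β_1 = 0 can all be computed from the formulas above. The sketch below is a minimal implementation on a short made-up series (not the GDP data), so the numbers are purely illustrative:

```python
import math

def fit_linear_trend(y):
    """Least-squares fit of Y_t = b0 + b1*t with t = 1, ..., n.
    Returns (b0, b1, s, t_stat): intercept, slope, standard error of
    estimate, and the statistic b1 / SE(b1) for testing H0: beta1 = 0."""
    n = len(y)
    ts = range(1, n + 1)
    tbar = (n + 1) / 2
    ybar = sum(y) / n
    sxx = sum((t - tbar) ** 2 for t in ts)
    sxy = sum((t - tbar) * (v - ybar) for t, v in zip(ts, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * tbar
    sse = sum((v - (b0 + b1 * t)) ** 2 for t, v in zip(ts, y))
    s = math.sqrt(sse / (n - 2))           # standard error of estimate
    se_b1 = s / math.sqrt(sxx)             # SE(b1)
    t_stat = b1 / se_b1 if se_b1 > 0 else float("inf")
    return b0, b1, s, t_stat

# A noisy but clearly increasing series: slope near 2, large |t|
b0, b1, s, t_stat = fit_linear_trend([3.1, 4.9, 7.2, 8.8, 11.0])
print(round(b1, 3))  # 1.97
```

The large t statistic here would lead to rejecting H_0: β_1 = 0, i.e., the trend is judged significant; in practice the residuals should still be checked as described above.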
If the linear regression model is suitable, the predicted value Ŷ_{n+τ} = b_0 + b_1·(n + τ) at the time point n + τ can be interpreted as a point estimate of the mean of the random variable Y_{n+τ} at that time point, and the confidence interval for this mean is as follows:

    Ŷ_{n+τ} ± t_{n−2; α/2} · SE(Ŷ_{n+τ})

Here SE(Ŷ_{n+τ}) = s·√( 1/n + (n + τ − t̄)² / Σ_{t=1}^{n} (t − t̄)² ).
If the trend has the form of a quadratic, cubic, or higher polynomial, the following multiple linear regression models can be assumed.

    (Quadratic)   Y_t = β_0 + β_1·t + β_2·t² + ε_t
    (Cubic)       Y_t = β_0 + β_1·t + β_2·t² + β_3·t³ + ε_t

The prediction method is similar to that of the simple linear regression model above.
If the trend is not a polynomial as above, the following models can also be considered.

    (Square root)   Y_t = β_0 + β_1·√t + ε_t
    (Log)           Y_t = β_0 + β_1·log(t) + ε_t

These models are the same as the linear regression model if √t or log(t) is replaced with a variable in the simple linear regression, and the prediction method is similar.
In addition, the function types to which the linear regression model can be applied by transformation are as follows.

    (Power)         Y_t = β_0 · t^{β_1} · ε_t
    (Exponential)   Y_t = β_0 · e^{β_1·t} · ε_t

For these two models, the parameters should be estimated using nonlinear regression, but if the error term is ignored, a linear model can be estimated approximately by taking logarithms:

    (Power)         log(Y_t) = log(β_0) + β_1·log(t)
    (Exponential)   log(Y_t) = log(β_0) + β_1·t
Korea's GDP from 1991 to 2020 is shown in [Table 13.4.1]. <Figure 13.4.1> shows three regression models applied to these data. Among them, the quadratic model has the largest value of R² = 0.9591, so it can be said to be the most suitable model for this time series. However, additional validation of the model is required.
[Table 13.4.1] GDP of Korea

Year  GDP (billion $)    Year  GDP (billion $)
1991       330.65        2006      1053.22
1992       355.53        2007      1172.61
1993       392.67        2008      1047.34
1994       463.62        2009       943.67
1995       566.58        2010      1143.98
1996       610.17        2011      1253.16
1997       569.76        2012      1278.43
1998       383.33        2013      1370.80
1999       497.51        2014      1484.32
2000       576.18        2015      1465.77
2001       547.66        2016      1499.36
2002       627.25        2017      1623.07
2003       702.72        2018      1725.37
2004       793.18        2019      1651.42
2005       934.90        2020      1638.26
<Figure 13.4.1> GDP of Korea and Three Regression Models
13.5 Exponential Smoothing Model and Forecasting
When a time series moves along a trend, the future can be predicted well with a regression model. However, a regression model may not be appropriate for a time series that moves dynamically hour to hour or day to day. In this case, a moving average model or an exponential smoothing model can be used. The time series model is explained for two cases: a stationary trend and a linear trend.
13.5.1 Stationary Time Series
A time series is called stationary if its statistical properties, such as the mean, variance, and covariance, are constant over time. When a time series consists of the observed values of the random variables Y_1, Y_2, ⋯, Y_n, a stationary time series follows the model below, which varies around a constant value μ:

    Y_t = μ + ε_t,   t = 1, 2, ⋯, n

Here μ is an unknown parameter and ε_t is an error term; the ε_t are independent with mean 0 and variance σ².
A. Single Moving Average Model
In a stationary time series model, the estimated value of , 
, is the mean of
the data. .


  



 

Using this model, the prediction $\tau$ time points ahead of the current time $n$, denoted $\hat Y_{n+\tau}$, is as follows:

$$\hat Y_{n+\tau} = \overline{Y}_n, \quad \tau = 1, 2, \cdots$$

This is called the simple average model.
The simple average model uses all observations up to the current time. However, the unknown parameter $\beta_0$ may shift slightly over time, so it is reasonable to give more weight to recent data than to past data for prediction. If a weight of $1/m$ is given only to the most recent $m$ observations at the present time $n$ and the weight of the remaining observations is set to 0, the estimated value of $\beta_0$ is as follows:

$$\hat\beta_0 = \frac{1}{m}\sum_{t=n-m+1}^{n} Y_t = \frac{Y_n + Y_{n-1} + \cdots + Y_{n-m+1}}{m}$$
This is called the single moving average at the time point $n$ and is denoted by $M_n$. The $m$-point single moving average is the average of the $m$ observations adjacent to the time point $n$. Notice that the observations $Y_{n-m+1}, \cdots, Y_n$ are independent of each other by assumption, but the moving averages $M_m, M_{m+1}, \cdots, M_n$ are not independent of each other; they are correlated, because successive moving averages share observations.
The behavior of the single moving average depends on the size of $m$. When $m$ is large, the moving average is insensitive to fluctuations of the original time series and changes gradually; when $m$ is small, it is sensitive to fluctuations. Therefore, when the fluctuation of the original time series is small, a small value of $m$ is commonly used, and when the fluctuation is large, a large value of $m$ is commonly used.
Using the single moving average at the time point $n$, the predicted value at the time point $n+\tau$ and its mean and variance are as follows:

$$\hat Y_{n+\tau} = M_n, \quad \tau = 1, 2, \cdots$$
$$E(\hat Y_{n+\tau}) = E(M_n) = \beta_0$$
$$Var(\hat Y_{n+\tau}) = Var(M_n) = \frac{\sigma^2}{m}$$

When the single moving average model is used, the 95% confidence interval estimation of the predicted value is approximately as follows, where $\sigma^2$ is estimated by the mean squared error (MSE) of the one-step-ahead forecasts:

$$M_n \pm 1.96\sqrt{Var(M_n)} \;\Rightarrow\; M_n \pm 1.96\,\frac{\sqrt{MSE}}{\sqrt{m}}$$
The monthly sales for the last two years of a furniture company are shown in [Table 13.5.1]. A 6-point moving average was obtained, and the residual between each observation and the forecast made one time point earlier was calculated. <Figure 13.5.1> shows this time series. The series fluctuates up and down around approximately 95; such a time series is called a stationary time series.
When $m = 6$, the moving average for the first 5 time points cannot be obtained. The moving average at time 6 is as follows:

$$M_6 = \frac{95+100+87+123+90+96}{6} = 98.50$$

Therefore, the one-step-ahead prediction made at time 6 is $\hat Y_7 = M_6 = 98.50$, and the residual at time 7 is as follows:

$$Y_7 - \hat Y_7 = 75 - 98.50 = -23.50$$

In the same way, the moving averages at the remaining time points, the one-step-ahead forecasts, and the residuals are as shown in [Table 13.5.1], so the mean squared error is as follows:

$$MSE = \frac{1}{18}\sum_{t=7}^{24}\left(Y_t - \hat Y_t\right)^2 = 331.22$$

The forecast of sales for each of the next three months is the last moving average, $M_{24} = 104.67$, and the 95% confidence interval for the forecast is as follows:

$$M_{24} \pm 1.96\,\frac{\sqrt{MSE}}{\sqrt{m}} \;\Rightarrow\; 104.67 \pm 1.96\sqrt{\frac{331.22}{6}} \;\Rightarrow\; [90.10,\ 119.23]$$
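The calculation above can be sketched in Python (a minimal sketch, not the book's software; the data and $m = 6$ follow [Table 13.5.1]):

```python
# Minimal sketch of the 6-point single moving average forecast
# for the furniture sales data (not the book's software).
import math

sales = [95, 100, 87, 123, 90, 96, 75, 78, 106, 104, 89, 83,
         118, 86, 86, 112, 85, 101, 135, 120, 76, 115, 90, 92]
m = 6

# M[t] = average of the m most recent observations up to time t (1-indexed)
M = {t: sum(sales[t - m:t]) / m for t in range(m, len(sales) + 1)}

# the one-step-ahead forecast at time t is M[t-1]; residuals exist from t = m+1
residuals = [sales[t - 1] - M[t - 1] for t in range(m + 1, len(sales) + 1)]
mse = sum(e * e for e in residuals) / len(residuals)

forecast = M[len(sales)]              # forecast for each of the next months
half = 1.96 * math.sqrt(mse / m)      # approximate 95% half-width
print(round(forecast, 2), round(mse, 2))    # M_24 = 104.67, MSE = 331.22
print(round(forecast - half, 2), round(forecast + half, 2))
```

The printed interval matches the [90.10, 119.23] obtained above.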
※ Moving average at the initial period
Since the $m$-point single moving average cannot be obtained before the time point $m$, the prediction model cannot be applied there. When there are many observations, this may not be a big problem, but when the number of data points is small, it can affect the prediction. To solve this problem, the moving averages at the initial period, up to the time point $m-1$, can be obtained as follows:

$$M_1 = Y_1,\quad M_2 = \frac{Y_1 + Y_2}{2},\quad \cdots,\quad M_{m-1} = \frac{Y_1 + Y_2 + \cdots + Y_{m-1}}{m-1}$$
[Table 13.5.1] Monthly Sales of a Furniture Company, 6-point Moving Average, One-Time Forecast and Residuals

Time t   Sales (million $)   6-pt MA M_t   Forecast Ŷ_t = M_{t-1}   Residual Y_t - Ŷ_t
  1        95
  2       100
  3        87
  4       123
  5        90
  6        96        98.50
  7        75        95.17        98.50       -23.50
  8        78        91.50        95.17       -17.17
  9       106        94.67        91.50        14.50
 10       104        91.50        94.67         9.33
 11        89        91.33        91.50        -2.50
 12        83        89.17        91.33        -8.33
 13       118        96.34        89.17        28.83
 14        86        97.67        96.34       -10.33
 15        86        94.34        97.67       -11.67
 16       112        95.67        94.34        17.67
 17        85        95.00        95.67       -10.67
 18       101        98.00        95.00         6.00
 19       135       100.84        98.00        37.00
 20       120       106.50       100.84        19.17
 21        76       104.84       106.50       -30.50
 22       115       105.34       104.84        10.17
 23        90       106.17       105.34       -15.33
 24        92       104.67       106.17       -14.17
<Figure 13.5.1> Monthly Sales of a Furniture Company, 6-point Moving Average and One-Time Forecast
B. Single Exponential Smoothing Model
In the single moving average model, the same weight $1/m$ is given only to the latest $m$ observations, and the earlier observations are completely ignored by setting their weight to 0. The single exponential smoothing method compensates for this shortcoming of the moving average model by assigning weights to all observations when predicting future values from past observations, while giving more weight to recent data. The single exponential smoothing model uses the exponentially smoothed value as the predicted value.
The single exponential smoothing model calculates a weighted average of the exponential smoothing value $S_{t-1}$ at the immediately preceding time point and the observation $Y_t$ at the time point $t$. Assuming that the initial exponential smoothing value is $S_0$ and that $\alpha$ is a real number between 0 and 1, the single exponential smoothing value $S_t$ is defined as follows:

$$S_1 = \alpha Y_1 + (1-\alpha) S_0$$
$$S_2 = \alpha Y_2 + (1-\alpha) S_1$$
$$\cdots$$
$$S_t = \alpha Y_t + (1-\alpha) S_{t-1}$$

Here $\alpha$ is called the smoothing constant; the single exponential smoothing value is the weighted average that gives weight $\alpha$ to the most recent observation and weight $1-\alpha$ to the exponential smoothing value $S_{t-1}$ at the time $t-1$. You can better understand the meaning of exponential smoothing if you expand the recursive equation as follows:

$$S_t = \alpha Y_t + (1-\alpha) S_{t-1}$$
$$= \alpha Y_t + (1-\alpha)\left[\alpha Y_{t-1} + (1-\alpha) S_{t-2}\right]$$
$$= \alpha Y_t + \alpha(1-\alpha) Y_{t-1} + (1-\alpha)^2 S_{t-2}$$
$$= \alpha Y_t + \alpha(1-\alpha) Y_{t-1} + \alpha(1-\alpha)^2 Y_{t-2} + \cdots + (1-\alpha)^t S_0$$
In other words, in the single exponential smoothing value $S_t$, the most recent observation $Y_t$ is given the weight $\alpha$, the next most recent observation is given $\alpha(1-\alpha)$, the next $\alpha(1-\alpha)^2$, and so on, gradually smaller weights. Therefore, if $\alpha$ is small, the current observation is given a small weight, and the exponential smoothing value is insensitive to fluctuations of the time series; if $\alpha$ is large, the current observation is given a large weight, and the exponential smoothing value is sensitive to fluctuations of the time series. In general, a value between 0.1 and 0.3 is often used for $\alpha$.
In order to obtain a single exponential smoothing value, an initial smoothing value $S_0$ is required; the first observation, the sample average of several initial data points, or the overall sample average can be used. The exponential smoothing method has the advantage of being less affected by extreme points or interventions than the ARIMA model and of being easy to use, although the selection of the smoothing constant is arbitrary and it is difficult to obtain a prediction interval.
The predicted value at the time point $t+\tau$ using the single exponential smoothing model, and its mean and variance, are as follows:

$$\hat Y_{t+\tau} = S_t, \quad \tau = 1, 2, \cdots$$
$$E(\hat Y_{t+\tau}) = E(S_t) = \beta_0$$
$$Var(S_t) = \frac{\alpha}{2-\alpha}\,\sigma^2$$

Therefore, when the single exponential smoothing model is used, the 95% interval estimation of the predicted value, which also accounts for the error variance $\sigma^2$ of a new observation, is approximately as follows:

$$S_t \pm 1.96\sqrt{\left(1+\frac{\alpha}{2-\alpha}\right)MSE}$$
To the data of [Table 13.5.1], predict sales for the next three months by a single exponential smoothing model with smoothing constant $\alpha$ = 0.1. Let us use the first observed value as the initial exponential smoothing value, that is, $S_0 = Y_1 = 95$. The exponential smoothing values for the first three time points are as follows:

$$S_1 = 0.1 \times Y_1 + 0.9 \times S_0 = 0.1 \times 95 + 0.9 \times 95 = 95$$
$$S_2 = 0.1 \times Y_2 + 0.9 \times S_1 = 0.1 \times 100 + 0.9 \times 95 = 95.50$$
$$S_3 = 0.1 \times Y_3 + 0.9 \times S_2 = 0.1 \times 87 + 0.9 \times 95.50 = 94.65$$

At each time point, the prediction one time point ahead is as follows:

$$\hat Y_2 = S_1 = 95, \qquad \hat Y_3 = S_2 = 95.50, \qquad \hat Y_4 = S_3 = 94.65$$

Hence the residuals using the above predicted values are as follows:

$$Y_2 - \hat Y_2 = 100 - 95 = 5.00$$
$$Y_3 - \hat Y_3 = 87 - 95.50 = -8.50$$
$$Y_4 - \hat Y_4 = 123 - 94.65 = 28.35$$

In the same way, the single exponential smoothing values at the remaining time points, the one-step-ahead predictions, and the residuals are as shown in [Table 13.5.2]. Therefore, the mean squared error is as follows:

$$MSE = \frac{1}{24}\sum_{i=1}^{24}\left(Y_i - \hat Y_i\right)^2 \approx 269.7$$

In terms of the mean squared error, the MSE of the 6-point single moving average model is 331.22, so the exponential smoothing model can be said to have a better fit.
The forecast of sales for each of the next three months is the last exponential smoothing value, $S_{24} = 98.66$, and the 95% confidence interval for the forecast is approximately as follows:

$$S_{24} \pm 1.96\sqrt{\left(1+\frac{\alpha}{2-\alpha}\right)MSE} \;\Rightarrow\; [65.27,\ 132.05]$$
[Table 13.5.2] summarizes the above calculations, and <Figure 13.5.2> shows the prediction after one time point and the prediction for the next 3 months using the single exponential smoothing model with $\alpha$ = 0.1.
[Table 13.5.2] Exponential Smoothing with α = 0.1, One-Time Forecast and Residuals

Time t   Sales (million $)   Smoothing S_t   Forecast Ŷ_t = S_{t-1}   Residual Y_t - Ŷ_t
  1        95        95.00        95.00         0.00
  2       100        95.50        95.00         5.00
  3        87        94.65        95.50        -8.50
  4       123        97.48        94.65        28.35
  5        90        96.74        97.48        -7.48
  6        96        96.66        96.74        -0.74
  7        75        94.50        96.66       -21.66
  8        78        92.85        94.50       -16.50
  9       106        94.16        92.85        13.15
 10       104        95.15        94.16         9.84
 11        89        94.53        95.15        -6.15
 12        83        93.38        94.53       -11.53
 13       118        95.84        93.38        24.62
 14        86        94.86        95.84        -9.84
 15        86        93.97        94.86        -8.86
 16       112        95.77        93.97        18.03
 17        85        94.70        95.77       -10.77
 18       101        95.33        94.70         6.30
 19       135        99.29        95.33        39.67
 20       120       101.36        99.29        20.71
 21        76        98.83       101.36       -25.36
 22       115       100.45        98.83        16.17
 23        90        99.40       100.45       -10.45
 24        92        98.66        99.40        -7.40
<Figure 13.5.2> Exponential Smoothing with α = 0.1 and One Time
Forecast
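The smoothing recursion can likewise be sketched in Python (a minimal sketch, not the book's software; $\alpha = 0.1$ and $S_0 = Y_1 = 95$ as in the example):

```python
# Minimal sketch of single exponential smoothing for the furniture sales.
sales = [95, 100, 87, 123, 90, 96, 75, 78, 106, 104, 89, 83,
         118, 86, 86, 112, 85, 101, 135, 120, 76, 115, 90, 92]
alpha = 0.1

S = [sales[0]]                                   # S_0 = Y_1 = 95
for y in sales:                                  # S_t = a*Y_t + (1-a)*S_{t-1}
    S.append(alpha * y + (1 - alpha) * S[-1])

# the one-step-ahead forecast at time t is S_{t-1}, so the residual is Y_t - S_{t-1}
residuals = [y - s for y, s in zip(sales, S[:-1])]
mse = sum(e * e for e in residuals) / len(residuals)

print(round(S[-1], 2))    # S_24 = 98.66, the forecast for the next months
```

The computed MSE agrees with the value used above up to rounding.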
※ Initial value of exponential smoothing
Since the initial exponential smoothing value $S_0$ at the time point $t = 0$ cannot be observed, one of the following three methods is commonly used:
1) the first observation, i.e., $S_0 = Y_1$;
2) the partial average of the first $k$ observations, i.e., $S_0 = (Y_1 + Y_2 + \cdots + Y_k)/k$;
3) the mean of the entire series up to the time point $n$, i.e., $S_0 = (Y_1 + Y_2 + \cdots + Y_n)/n$.
※ Initial smoothing constant
The same smoothing constant $\alpha$ can be applied to all time points, but the following choice is also used to reduce the effect of the initial value $S_0$:

$$\alpha_t = \frac{1}{t}, \quad \text{until } \frac{1}{t} \text{ reaches } \alpha$$
13.5.2 Linear Trend Time Series
A. Double Moving Average Model
In the previous section, we saw that the single moving average model can be applied to a stationary time series. What would happen if the single moving average were applied to a time series with a linear trend? For a time series with a linear trend, $Y_t = \beta_0 + \beta_1 t + \epsilon_t$, the $m$-point single moving average at time $n$ is as follows:

$$M_n = \frac{Y_n + Y_{n-1} + \cdots + Y_{n-m+1}}{m}$$

Its expected value can be shown to be as follows:
$$E(M_n) = \beta_0 + \beta_1 n - \beta_1\frac{m-1}{2}$$

That is, in the case of a linear trend model, the single moving average $M_n$ is biased downward by $\beta_1\frac{m-1}{2}$. For example, if the consumer price index with a linear trend in [Table 13.5.3] is predicted one time point ahead using the 5-point single moving average, the result is as shown in <Figure 13.5.3>. The predicted value using $M_n$ underestimates the time series $Y_t$.
<Figure 13.5.3> 5-pt Moving Average of Consumer Price Index with Linear
Trend
In the case of a linear trend, one way to eliminate the bias of the single moving average model is the double moving average, which takes the moving average of the single moving averages. The $m$-point double moving average $M_n^{(2)}$ at the time $n$ and its expected value are as follows:

$$M_n^{(2)} = \frac{M_n + M_{n-1} + \cdots + M_{n-m+1}}{m}$$
$$E(M_n^{(2)}) = \beta_0 + \beta_1 n - 2\,\beta_1\frac{m-1}{2}$$
Since $E(M_n)$ and $E(M_n^{(2)})$ are two equations in the same two parameters, $\beta_0$ and $\beta_1$ can be estimated by solving the system of two equations as follows:

$$\hat\beta_1 = \frac{2}{m-1}\left(M_n - M_n^{(2)}\right)$$
$$\hat\beta_0 + \hat\beta_1\,n = 2M_n - M_n^{(2)}$$
Therefore, the predicted value at the time point $n+\tau$ using the double moving average at time $n$ is as follows:

$$\hat Y_{n+\tau} = 2M_n - M_n^{(2)} + \frac{2}{m-1}\left(M_n - M_n^{(2)}\right)\tau, \quad \tau = 1, 2, \cdots$$
Such a double moving average model can be said to be a kind of heuristic method. That is, although logical, it is not based on any optimization such as the least squares method. However, it can be approximated by the least squares method, which we omit in this book.
[Table 13.5.3] is a calculation table for predicting the consumer price index using the 5-point double moving average model. The third column is the 5-point single moving average $M_t$, which cannot be calculated for time points 1 to 4. The fourth column is the 5-point double moving average $M_t^{(2)}$, which cannot be calculated until 5 single moving averages are available, that is, for time points 1 to 8. Using $M_9$ and $M_9^{(2)}$, the prediction one time point ahead made at time 9, $\hat Y_{10}$, is as follows:

$$\hat Y_{10} = 2M_9 - M_9^{(2)} + \frac{2}{5-1}\left(M_9 - M_9^{(2)}\right)\times 1 = 2 \times 104.50 - 104.028 + 0.5 \times (104.50 - 104.028) = 105.208$$
The predicted values calculated in the same way are shown in the fifth column.
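The double moving average forecast can be sketched in Python (a minimal sketch, not the book's software), reproducing the first prediction made at time 9:

```python
# Minimal sketch of the m-point double moving average forecast
# for the consumer price index of [Table 13.5.3].
cpi = [102.3, 102.8, 103.7, 104.1, 104.0, 104.3, 104.5, 104.9, 104.8,
       104.8, 104.2, 104.4, 105.0, 105.5, 106.1, 106.7, 107.1, 107.0,
       106.7, 107.4, 108.0, 107.7, 107.8, 108.3]
m = 5

# single moving average M1[t] for 1-indexed t = m..n
M1 = {t: sum(cpi[t - m:t]) / m for t in range(m, len(cpi) + 1)}

# double moving average: the m-point moving average of the single moving averages
ts = sorted(M1)
M2 = {ts[i]: sum(M1[ts[j]] for j in range(i - m + 1, i + 1)) / m
      for i in range(m - 1, len(ts))}

n = 9                                       # earliest time with both M_n and M_n^(2)
slope = 2 / (m - 1) * (M1[n] - M2[n])       # beta_1 estimate
level = 2 * M1[n] - M2[n]                   # beta_0 + beta_1 * n estimate
forecast = level + slope * 1                # one-step-ahead forecast for time 10
print(round(forecast, 3))                   # 105.208, as in the table
```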
[Table 13.5.3] Double Moving Average of Consumer Price Index and One-Time Forecast

Time t   CPI Y_t   5-pt Single MA M_t   5-pt Double MA M_t^(2)   One-Time Forecast Ŷ_t
  1      102.3
  2      102.8
  3      103.7
  4      104.1
  5      104.0     103.38
  6      104.3     103.78
  7      104.5     104.12
  8      104.9     104.36
  9      104.8     104.50     104.028
 10      104.8     104.66     104.284     105.2080
 11      104.2     104.64     104.456     105.2240
 12      104.4     104.62     104.556     104.9160
 13      105.0     104.64     104.612     104.7160
 14      105.5     104.78     104.668     104.6820
 15      106.1     105.04     104.744     104.9480
 16      106.7     105.54     104.924     105.4840
 17      107.1     106.08     105.216     106.4640
 18      107.0     106.48     105.584     107.3760
 19      106.7     106.72     105.972     107.8240
 20      107.4     106.98     106.360     107.8420
 21      108.0     107.24     106.700     107.9100
 22      107.7     107.36     106.956     108.0500
 23      107.8     107.52     107.164     107.9660
 24      108.3     107.84     107.388     108.0540
<Figure 13.5.4> Forecast using Double Moving Average of Consumer Price
Index
B. Holt Double Exponential Smoothing Model
Holt proposed a model for a linear trend time series $Y_t = \beta_0 + \beta_1 t + \epsilon_t$ which uses one smoothing constant for the level and another for the trend. This is called Holt's linear trend exponential smoothing model or the two-parameter double exponential smoothing model. Let $S_0$ and $b_0$ be the initial values of the intercept and slope, let $\alpha$ be the smoothing constant of the level, and let $\beta$ be the smoothing constant of the trend. The predicted value $\hat Y_t$, level $S_t$, and trend $b_t$ are as follows:

Predicted value: $\hat Y_t = S_{t-1} + b_{t-1}, \quad t = 1, 2, \cdots$
Level: $S_t = \alpha Y_t + (1-\alpha)(S_{t-1} + b_{t-1}), \quad t = 1, 2, \cdots$
Trend: $b_t = \beta\,(S_t - S_{t-1}) + (1-\beta)\,b_{t-1}, \quad t = 1, 2, \cdots$
That is, the level is the weighted average of the current observation $Y_t$ and the predicted value $S_{t-1} + b_{t-1}$, and the trend is the weighted average of the level difference $S_t - S_{t-1}$ between the time points $t$ and $t-1$ and the trend $b_{t-1}$ at the time point $t-1$. For this model, initial values of the level $S_0$ and slope $b_0$ are required, and the intercept and slope of a simple linear regression of all observations are widely used as initial estimates. As with the single exponential smoothing model, a value between 0.1 and 0.3 is often used for each of the smoothing constants $\alpha$ and $\beta$.
The predicted value for the time point $t+\tau$ at time $t$ using the trend exponential smoothing model is as follows:

$$\hat Y_{t+\tau} = S_t + b_t\,\tau, \quad \tau = 1, 2, \cdots$$
Such a trend exponential smoothing model is also a kind of heuristic method; although logical, it is not based on any optimization such as the least squares method.
The simple linear regression model fitted to all the data in [Table 13.5.4] gives the following initial estimates:

$$\hat Y_t = 102.57 + 0.234\,t$$
[Table 13.5.4] is a calculation table for predicting the consumer price index with the Holt double exponential smoothing model, using these initial values $S_0 = 102.57$ and $b_0 = 0.234$ together with the smoothing constants $\alpha = 0.1$ and $\beta = 0.1$. The third column is the level $S_t$, the fourth column is the trend $b_t$, and the fifth column is the one-step-ahead prediction $\hat Y_t = S_{t-1} + b_{t-1}$. Therefore, the forecast of the consumer price index for the next three months is as follows:

t = 25 : $\hat Y_{25} = S_{24} + b_{24} \times 1 = 108.20 + 0.237 \times 1 = 108.44$
t = 26 : $\hat Y_{26} = S_{24} + b_{24} \times 2 = 108.20 + 0.237 \times 2 = 108.67$
t = 27 : $\hat Y_{27} = S_{24} + b_{24} \times 3 = 108.20 + 0.237 \times 3 = 108.91$
[Table 13.5.4] Forecasting using Holt Double Exponential Smoothing Model of Consumer Price Index

Time t   CPI Y_t   Level S_t   Trend b_t   One-Time Forecast Ŷ_t
  0        -       102.57      0.234         -
  1      102.3     102.76      0.229       102.81
  2      102.8     102.97      0.227       102.99
  3      103.7     103.25      0.232       103.20
  4      104.1     103.54      0.239       103.48
  5      104.0     103.80      0.241       103.78
  6      104.3     104.07      0.243       104.04
  7      104.5     104.33      0.245       104.31
  8      104.9     104.61      0.249       104.58
  9      104.8     104.85      0.248       104.86
 10      104.8     105.07      0.245       105.10
 11      104.2     105.20      0.234       105.31
 12      104.4     105.33      0.224       105.44
 13      105.0     105.50      0.218       105.56
 14      105.5     105.70      0.216       105.72
 15      106.1     105.93      0.218       105.91
 16      106.7     106.20      0.223       106.15
 17      107.1     106.49      0.230       106.43
 18      107.0     106.75      0.233       106.72
 19      106.7     106.96      0.230       106.98
 20      107.4     107.21      0.232       107.19
 21      108.0     107.50      0.238       107.44
 22      107.7     107.73      0.237       107.73
 23      107.8     107.95      0.236       107.97
 24      108.3     108.20      0.237       108.19
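Holt's recursions can be sketched in Python (a minimal sketch, not the book's software; the smoothing constants $\alpha = \beta = 0.1$ and the rounded initial values $S_0 = 102.57$, $b_0 = 0.234$ are taken from the example above, so last digits may differ slightly from [Table 13.5.4]):

```python
# Minimal sketch of Holt's double exponential smoothing for the CPI example.
cpi = [102.3, 102.8, 103.7, 104.1, 104.0, 104.3, 104.5, 104.9, 104.8,
       104.8, 104.2, 104.4, 105.0, 105.5, 106.1, 106.7, 107.1, 107.0,
       106.7, 107.4, 108.0, 107.7, 107.8, 108.3]
alpha, beta = 0.1, 0.1
S, b = 102.57, 0.234       # initial level and trend from the regression line

for y in cpi:
    S_prev = S
    S = alpha * y + (1 - alpha) * (S + b)      # level update
    b = beta * (S - S_prev) + (1 - beta) * b   # trend update

# forecasts for the next three time points: S_24 + b_24 * tau
forecasts = [round(S + b * tau, 2) for tau in (1, 2, 3)]
print(forecasts)
```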
<Figure 13.5.5> shows the predicted values obtained using the Holt double exponential smoothing model.
<Figure 13.5.5> Forecasting using Holt Double Exponential Smoothing
Model of CPI
13.6 Seasonal Model and Forecasting
Ÿ As seasonal time series models, a multiplicative model using the centered moving average and the Holt-Winters model are introduced.
13.6.1 Seasonal Multiplicative Model
Ÿ Assume that a time series $Y_t$ with a seasonal period $L$ can be expressed as the product of a trend ($T_t$), a seasonal ($S_t$), and an irregular component ($I_t$) as follows:

$$Y_t = T_t \cdot S_t \cdot I_t$$
Ÿ The ratio-to-moving-average method removes the trend and irregular components to obtain the seasonal component as follows.
(Step 1) For the time series, find the $L$-point centered moving average $M_t$. This moving average represents the trend component $T_t$ after removing the seasonal and irregular components from the time series.
(Step 2) Divide the time series $Y_t$ by the trend component $M_t$ obtained in Step 1. This value reflects the seasonal and irregular components $S_t \cdot I_t$ and is called the seasonal ratio:

$$\frac{Y_t}{M_t} = \frac{T_t \cdot S_t \cdot I_t}{T_t} = S_t \cdot I_t$$

(Step 3) Calculate the trimmed average of the seasonal ratios of each season obtained in Step 2. This gives the seasonal index, but a normalization should be performed so that the sum of the seasonal indices is $L$.
Ÿ After obtaining the seasonal indices as above, divide the original time series by the seasonal index of the corresponding season. The result is called the deseasonalized time series:

$$\text{Deseasonalized time series: } D_t = \frac{Y_t}{S_t}$$

Ÿ This deseasonalized time series $D_t$ reflects the trend component $T_t$ (together with the irregular component). An appropriate time series model is applied to the deseasonalized data to predict future values, and the prediction is then multiplied by the corresponding seasonal index to obtain the final predicted value of the desired season.
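Steps 1-3 can be sketched in Python for this chapter's quarterly example (a minimal sketch, not the book's software; the trimmed average drops the largest and smallest ratio of each quarter, and tiny rounding differences from the book's table may remain):

```python
# Minimal sketch of the ratio-to-moving-average seasonal indices (L = 4)
# for the quarterly sales of [Table 13.6.1].
sales = [75, 60, 54, 59, 86, 65, 63, 80, 90, 72, 66, 85, 100, 78, 72, 93]
L, n = 4, 16

# Step 1: L-point centered moving average (mean of two adjacent L-point MAs)
ma = [sum(sales[i:i + L]) / L for i in range(n - L + 1)]
centered = {t: (ma[t - L // 2 - 1] + ma[t - L // 2]) / 2
            for t in range(L // 2 + 1, n - L // 2 + 1)}   # 1-indexed t = 3..14

# Step 2: seasonal ratios Y_t / M_t
ratios = {t: sales[t - 1] / centered[t] for t in centered}

# Step 3: trimmed average of the ratios per quarter, normalized to sum to L
def trimmed_mean(v):
    v = sorted(v)
    return sum(v[1:-1]) / len(v[1:-1]) if len(v) > 2 else sum(v) / len(v)

raw = [trimmed_mean([r for t, r in ratios.items() if (t - 1) % L == q])
       for q in range(L)]
index = [round(x * L / sum(raw), 4) for x in raw]
print(index)   # close to the indices 1.1991, 0.9159, 0.8472, 1.0378 of the text
```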
Ÿ [Table 13.6.1] shows a company's quarterly sales. Since the seasonal period is 4, the 4-point centered moving average is shown in [Table 13.6.1]. Dividing the original time series by the 4-point centered moving average gives the seasonal ratios:

$$\frac{Y_t}{M_t} = S_t \cdot I_t$$
[Table 13.6.1] Quarterly Sales of a Company

 t   Year Quarter   Sales   4-point MA   Centered 4-pt MA   Seasonal Ratio   Deseasonal Data D_t
 1   2018 Q1         75                                                       61.292
 2   2018 Q2         60      62.000                                           64.915
 3   2018 Q3         54      64.750       63.375             0.852            63.759
 4   2018 Q4         59      66.000       65.375             0.902            58.700
 5   2019 Q1         86      68.250       67.125             1.281            70.281
 6   2019 Q2         65      73.500       70.875             0.917            70.324
 7   2019 Q3         63      74.500       74.000             0.851            74.385
 8   2019 Q4         80      76.250       75.375             1.061            79.593
 9   2020 Q1         90      77.000       76.625             1.175            73.550
10   2020 Q2         72      78.250       77.625             0.928            77.898
11   2020 Q3         66      80.750       79.500             0.830            77.928
12   2020 Q4         85      82.250       81.500             1.043            84.567
13   2021 Q1        100      83.750       83.000             1.205            81.722
14   2021 Q2         78      85.750       84.750             0.920            84.389
15   2021 Q3         72                                                       85.012
16   2021 Q4         93                                                       92.527

(The 4-point moving averages fall between quarters; the centered moving average at time $t$ averages the two adjacent 4-point moving averages.)
Ÿ [Table 13.6.2] shows the seasonal ratios arranged by year and quarter. Removing the maximum and minimum values of each quarter and averaging the rest (the trimmed average) gives the trimmed mean column. Since the sum of these values is 4.0197, they are normalized so that the seasonal indices sum to 4:

$$\hat S_1 = 1.1991, \quad \hat S_2 = 0.9159, \quad \hat S_3 = 0.8472, \quad \hat S_4 = 1.0378$$
[Table 13.6.2] Seasonal Index

Quarter        2018    2019    2020    2021    Trimmed Mean of Seasonal Ratio   Seasonal Index S_t
1st Quarter     -      1.281   1.175   1.205            1.2050                       1.1991
2nd Quarter     -      0.917   0.928   0.920            0.9204                       0.9159
3rd Quarter    0.852   0.851   0.830    -               0.8514                       0.8472
4th Quarter    0.902   1.061   1.043    -               1.0429                       1.0378
sum                                                     4.0197
The last column of [Table 13.6.1] shows the deseasonalized data $D_t = Y_t / S_t$ obtained by dividing the original data by the seasonal index of each quarter. The linear regression line fitted to the deseasonalized data is as follows (<Figure 13.6.1>):

$$\hat D_t = 58.72 + 1.92\,t$$
Therefore, the forecast for the next one year is as follows:

Time 17 : $\hat Y_{17} = (58.72 + 1.92 \times 17) \times 1.1991 = 109.55$
Time 18 : $\hat Y_{18} = (58.72 + 1.92 \times 18) \times 0.9159 = 85.44$
Time 19 : $\hat Y_{19} = (58.72 + 1.92 \times 19) \times 0.8472 = 80.65$
Time 20 : $\hat Y_{20} = (58.72 + 1.92 \times 20) \times 1.0378 = 100.79$

<Figure 13.6.2> is a graph of the seasonal forecasts.
<Figure 13.6.1> Forecasting Model of Deseasonalized Sales of a Company
<Figure 13.6.2> Forecasting of Quarterly Sales of a Company
13.6.2 Holt-Winters Model
Ÿ Assume that a time series with a seasonal period $L$ is observed over $m$ cycles as follows:

           Season 1        Season 2        ...   Season L
  --------------------------------------------------------------
  Cycle 1  $Y_1$           $Y_2$           ...   $Y_L$
  Cycle 2  $Y_{L+1}$       $Y_{L+2}$       ...   $Y_{2L}$
  ...
  Cycle m  $Y_{(m-1)L+1}$  $Y_{(m-1)L+2}$  ...   $Y_{mL}$
  --------------------------------------------------------------
Ÿ The Holt-Winters model is an extension of Holt's linear double exponential smoothing method of the previous section to a seasonal model. It consists of a level component $E_t$, a trend component $b_t$, and a seasonal component $S_t$. There are an additive model and a multiplicative model, but only the multiplicative model is introduced here.

Level: $E_t = \alpha\,\dfrac{Y_t}{S_{t-L}} + (1-\alpha)\,(E_{t-1} + b_{t-1})$
Trend: $b_t = \beta\,(E_t - E_{t-1}) + (1-\beta)\,b_{t-1}$
Seasonal: $S_t = \gamma\,\dfrac{Y_t}{E_{t-1}+b_{t-1}} + (1-\gamma)\,S_{t-L}$
Forecast: $\hat Y_{t+\tau} = (E_t + b_t\,\tau)\;S_{t+\tau-L(k+1)}, \quad \tau = 1, 2, \cdots$

Here $k$ is the integer part of $(\tau-1)/L$.
$E_t$ is the time series level: the exponential smoothing of the current level with seasonality removed, $Y_t/S_{t-L}$, and the value predicted one time point ago, $E_{t-1}+b_{t-1}$. $b_t$ is the slope: the exponential smoothing of the level difference $E_t - E_{t-1}$ between the current and the previous time point and the previous slope $b_{t-1}$. $S_t$ is the seasonal index: the exponential smoothing of the current seasonal component $Y_t/(E_{t-1}+b_{t-1})$ and the seasonal component of the previous cycle, $S_{t-L}$.
Ÿ [Table 13.6.3] calculates the exponential smoothing values of the level, slope, and seasonal indices by applying the Holt-Winters model with $\alpha = 0.3$, $\beta = 0.3$, $\gamma = 0.3$ to the quarterly sales of the company. The last column is the one-step-ahead prediction $\hat Y_t = (E_{t-1} + b_{t-1})\,S_{t-L}$. The initial values $E_0 = 61.2$ and $b_0 = 1.61$ are the intercept and slope of the linear regression of all the data, and the initial values of the seasonal index, $S_{-3}, \cdots, S_0$, are the seasonal indices of the multiplicative model obtained in [Table 13.6.2].
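The recursions can be sketched in Python with this example's data and initial values (a minimal sketch, not the book's software; $\alpha = \beta = \gamma = 0.3$):

```python
# Minimal sketch of the multiplicative Holt-Winters recursions
# for the quarterly sales, with the initial values of [Table 13.6.3].
sales = [75, 60, 54, 59, 86, 65, 63, 80, 90, 72, 66, 85, 100, 78, 72, 93]
L = 4
alpha = beta = gamma = 0.3

level, trend = 61.2, 1.61                   # E_0, b_0 from the regression
season = [1.1991, 0.9159, 0.8472, 1.0378]   # S_{-3}..S_0 from [Table 13.6.2]

for t, y in enumerate(sales):               # t = 0 corresponds to time 1
    s_old = season[t % L]                   # seasonal index one cycle ago
    base = level + trend                    # one-step-ahead base E_{t-1}+b_{t-1}
    level_prev = level
    level = alpha * y / s_old + (1 - alpha) * base
    trend = beta * (level - level_prev) + (1 - beta) * trend
    season[t % L] = gamma * y / base + (1 - gamma) * s_old

# forecasts for the next year: (E_16 + b_16*tau) * S_{16+tau-4}
forecasts = [round((level + trend * tau) * season[(16 + tau - 1) % L], 2)
             for tau in (1, 2, 3, 4)]
print(forecasts)   # about [109.02, 85.15, 79.12, 99.5]
```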
[Table 13.6.3] Holt-Winters Forecasting Model of Quarterly Sales

  t   Year Quarter   Sales Y_t   Level E_t   Slope b_t   Seasonal S_t   One-Time Forecast Ŷ_t
 -3                                                        1.1991
 -2                                                        0.9159
 -1                                                        0.8472
  0                               61.2        1.61         1.0378
  1   2018 Q1          75         62.7312     1.5863       1.1976          75.315
  2   2018 Q2          60         64.6753     1.6937       0.9210          58.908
  3   2018 Q3          54         65.5795     1.4568       0.8371          56.230
  4   2018 Q4          59         63.9809     0.5402       0.9905          69.570
  5   2019 Q1          86         66.7081     1.1963       1.2382          77.270
  6   2019 Q2          65         68.7061     1.4368       0.9319          62.539
  7   2019 Q3          63         71.6766     1.8969       0.8555          58.720
  8   2019 Q4          80         75.7320     2.5445       1.0195          72.874
  9   2020 Q1          90         76.5997     2.0414       1.2117          96.920
 10   2020 Q2          72         78.2283     1.9176       0.9270          73.282
 11   2020 Q3          66         79.2477     1.6481       0.8459          68.561
 12   2020 Q4          85         81.6382     1.8708       1.0289          82.477
 13   2021 Q1         100         83.2158     1.7829       1.2074         101.184
 14   2021 Q2          78         84.7427     1.7061       0.9242          78.791
 15   2021 Q3          72         86.0501     1.5865       0.8420          73.124
 16   2021 Q4          93         88.4618     1.8341       1.0386          90.169
<Figure 13.6.3> is the Holt-Winters forecast for the next one year, which is calculated as follows:

$$\hat Y_{17} = (E_{16} + b_{16} \times 1)\,S_{13} = (88.4618 + 1.8341 \times 1) \times 1.2074 = 109.02$$
$$\hat Y_{18} = (E_{16} + b_{16} \times 2)\,S_{14} = (88.4618 + 1.8341 \times 2) \times 0.9242 = 85.15$$
$$\hat Y_{19} = (E_{16} + b_{16} \times 3)\,S_{15} = (88.4618 + 1.8341 \times 3) \times 0.8420 = 79.12$$
$$\hat Y_{20} = (E_{16} + b_{16} \times 4)\,S_{16} = (88.4618 + 1.8341 \times 4) \times 1.0386 = 99.50$$
<Figure 13.6.3> Holt-Winters Forecasting of Quarterly Sales
Exercise
For the next exercises (13.1 - 13.4), draw a graph of the time series data, apply an appropriate smoothing method and transformation, and find an appropriate prediction model to predict the next year.
13.1 The following table provides data on the number of items damaged in shipment during 2001-2014 for a manufacturer.

Year   Items   Year   Items
2001    533    2008    291
2002    373    2009    228
2003    132    2010    204
2004    555    2011    349
2005    168    2012    234
2006    281    2013    209
2007    175    2014    176
13.2 The following table shows the sales volume (in thousands of dollars) of a retail store between 2001 and 2014.

Year    Sales   Year    Sales
2001      815   2008   12,529
2002    1,276   2009   12,824
2003    4,752   2010   13,777
2004    7,535   2011   15,379
2005   10,122   2012   18,705
2006    9,642   2013   17,632
2007   14,100   2014   16,571
13.3 The following table shows the number of items repaired during a company's warranty period between 2001 and 2014.

Year   Items   Year   Items
2001    749    2008    611
2002    709    2009    600
2003    700    2010    574
2004    678    2011    559
2005    611    2012    543
2006    641    2013    534
2007    631    2014    524
13.4 The following table shows the annual sales (unit: billion $) of a company for 11 years.

Year   Sales   Year   Sales
2012    12     2018    20
2013    14     2019    22
2014    18     2020    27
2015    20     2021    24
2016    18     2022    30
2017    16
13.5 The following data show the prices of silver and crude oil between 2000 and 2015. Find the percentage changes of the silver and crude oil prices and the price indices, and overlay them in one picture.

Year   Silver ($/ounce)   Crude Oil ($/barrel)     Year   Silver ($/ounce)   Crude Oil ($/barrel)
2000       1.771              1.80                 2008       5.440             15.40
2001       1.546              2.18                 2009      11.090             18.00
2002       1.684              2.48                 2010      20.633             28.00
2003       2.558              5.18                 2011      10.481             32.00
2004       4.708             10.46                 2012       7.950             34.00
2005       4.419             11.51                 2013      11.439             30.00
2006       4.353             11.51                 2014       8.141             26.00
2007       4.620             12.70                 2015       6.192             26.00
13.6 The following table shows the number of skis sold by a sports merchandise seller in 2017-2021.
1) Predict the next year with a multiplicative seasonal model.
2) Predict the next year using the Holt-Winters seasonal model.

Month   2017   2018   2019   2020   2021
  1       0      3      9     13      4
  2       2      0      2      4     12
  3      10      5     46     56      6
  4       4      4     11     30     10
  5      89     14     14     90     17
  6      33     23     30     20     32
  7      11      7     22     15     24
  8       4     11      4     11      9
  9      17     11      7      6     10
 10       5      4      4      5      5
 11      17      4      0      1     17
 12       0      8      2      7      1