Tutorial in Statistics
Exercise 1 Which of the following variables are categorical (or qualitative) and which
are quantitative?
(i) The color of cars involved in several severe accidents.
(ii) The length of time required for rats to move through a maze.
(iii) The classification of police administration as city, county or state.
(iv) The ratings given to pizza in a taste test as poor, good or excellent.
(v) The number of times subjects in a sociological research study have been
married.
Solution The variables in (i), (iii) and (iv) are categorical, since their values are
non-numerical labels that place each observation in a category.
The variables in (ii) and (v) take numerical values, obtained by measuring and by
counting respectively, and are therefore quantitative.
Exercise 2: The areas of the continents of the world, in millions of square kilometres,
are presented in the table below.
Continent         Area
Africa            30.3
Asia              26.9
Europe             4.9
North America     24.3
Oceania            8.5
South America     17.9
U.S.S.R.          20.5
Total            133.3
Display this data using (i) a bar chart and (ii) a pie chart.
Solution (a) The areas of the continents in millions of square kilometres, together
with each area as a percentage of the total area, are presented in the table below.
Continent         Area (millions of sq. km)   Percentage Area
Africa             30.3                       30.3/133.3 = 22.7%
Asia               26.9                       26.9/133.3 = 20.2%
Europe              4.9                       4.9/133.3 = 3.7%
North America      24.3                       24.3/133.3 = 18.2%
Oceania             8.5                       8.5/133.3 = 6.4%
South America      17.9                       17.9/133.3 = 13.4%
U.S.S.R.           20.5                       20.5/133.3 = 15.4%
Total             133.3                       100%
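The percentage column can be checked with a few lines of Python. This is a minimal sketch, not part of the original solution; the dictionary `areas` and the helper name `percentage_shares` are illustrative choices.

```python
# Areas in millions of square kilometres, copied from the table in Exercise 2.
areas = {
    "Africa": 30.3, "Asia": 26.9, "Europe": 4.9, "North America": 24.3,
    "Oceania": 8.5, "South America": 17.9, "U.S.S.R.": 20.5,
}

def percentage_shares(values):
    """Return each value as a percentage of the total, rounded to 1 decimal place."""
    total = sum(values.values())
    return {name: round(100 * v / total, 1) for name, v in values.items()}

shares = percentage_shares(areas)
for name, pct in shares.items():
    print(f"{name:15s} {areas[name]:6.1f} {pct:5.1f}%")
```

The same shares, multiplied by 3.6, would give the slice angles in degrees for the pie chart.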
(i) The requested bar charts are given below.
[Bar chart: "Area of Continents"; vertical axis: area in millions of sq. kilometres; horizontal axis: continents, in the order Africa, Asia, N. America, U.S.S.R., S. America, Oceania, Europe]
[Bar chart: "Percentage Area of Continents"; vertical axis: percentage of total area; horizontal axis: continents, in the same order]
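(ii) The requested pie chart is given below.
[Pie chart: "Area of Continents"; each slice is labelled with its area rounded to the nearest million sq. kilometres and its percentage of the total: Africa (30, 22.7%), Asia (27, 20.2%), N. America (24, 18.2%), U.S.S.R. (21, 15.4%), S. America (18, 13.4%), Oceania (9, 6.4%), Europe (5, 3.7%)]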
Exercise 3: The breakdown of total dollars spent on business trips in the United States is
estimated as follows: (a) 41% on air fares, (b) 22% on lodgings, (c) 12% on meals, (d) 8%
on car rentals and (e) the remainder on other expenses.
(i) Construct a pie chart to show this information.
(ii) Construct a bar chart to show this information.
Solution
We have 41% on air fares, 22% on lodgings, 12% on meals and 8% on car rentals, which
leaves 100% - (41% + 22% + 12% + 8%) = 17% for other expenses, so the total is 100%.
(i)
[Pie chart: "Percentage breakdown of Business Trips"; slices: air fares (41%), lodgings (22%), meals (12%), car rentals (8%), other expenses (17%)]
[Pie chart: "Percentage breakdown of Business Trips", slices arranged in decreasing order clockwise: air fares (41%), lodgings (22%), other expenses (17%), meals (12%), car rentals (8%)]
(ii)
[Bar chart: "Breakdown of Expenses of Business Trips"; vertical axis: percentage; bars: air fares, car rentals, lodgings, meals, other expenses]
[Bar chart: "Breakdown of Expenses of Business Trips", bars in decreasing order; vertical axis: percentage; bars: air fares, lodgings, other expenses, meals, car rentals]
Exercise 4 The final marks of 80 students at a university are recorded below
68 84 75 82 68 90 62 88 76 93
73 79 88 73 60 93 71 59 85 75
61 65 75 87 74 62 95 78 63 72
66 78 82 75 94 77 69 74 68 60
96 78 89 61 75 95 60 79 83 71
79 62 67 97 78 85 76 65 71 75
65 80 73 57 88 78 62 76 53 74
86 67 73 81 72 63 76 75 85 77.
(a) Construct the Frequency Distribution, the Relative Frequency Distribution and the %
Frequency Distribution table for these data, and draw the histograms for (i) the
Frequency Distribution and Frequency Polygon, and (ii) the % Frequency
Distribution and % Frequency Polygon.
Comment on the shape of the histogram.
Take 5 equal classes.
(b) Construct a Cumulative Frequency Distribution table and draw the
respective histogram and Ogive. Take 5 equal classes.
(c) How many students got less than or equal to 75 marks, and how many got less than
or equal to 80 marks?
(d) Construct a Cumulative % Frequency Distribution table and draw the
respective histogram and Ogive. Take 5 equal classes.
(e) What % of students got less than or equal to 84 marks, and what % got less than
or equal to 89 marks?
Solution
(a) We take 5 classes of equal width. Then
$$\text{class size} = \frac{97 - 53}{5} = \frac{44}{5} = 8.8 \approx 9.$$
Take equal class size = 9.
We will use 5 classes of equal size 9 with midpoints at 57.5, 66.5, 75.5,
84.5 and 93.5 respectively. The class boundaries are shown below in the
frequency table together with the equal class width.
The required Frequency Distribution Table is given below.
Examination Marks
Class Boundaries      Tally                                      Class midpoint   Class Width   Frequency
53 to less than 62    ||||/ |||                                  57.5             9              8
62 to less than 71    ||||/ ||||/ ||||/ |                        66.5             9             16
71 to less than 80    ||||/ ||||/ ||||/ ||||/ ||||/ ||||/ |||    75.5             9             33
80 to less than 89    ||||/ ||||/ ||||                           84.5             9             14
89 to less than 98    ||||/ ||||                                 93.5             9              9
                                                                                                Sum = 80
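The class width and the frequency column can be reproduced programmatically. The following is a minimal sketch, assuming integer data and classes of the form "lower to less than upper" that start at the smallest observation; the function name `frequency_table` is illustrative and not from the tutorial.

```python
# Final marks of the 80 students from Exercise 4.
marks = [
    68, 84, 75, 82, 68, 90, 62, 88, 76, 93,
    73, 79, 88, 73, 60, 93, 71, 59, 85, 75,
    61, 65, 75, 87, 74, 62, 95, 78, 63, 72,
    66, 78, 82, 75, 94, 77, 69, 74, 68, 60,
    96, 78, 89, 61, 75, 95, 60, 79, 83, 71,
    79, 62, 67, 97, 78, 85, 76, 65, 71, 75,
    65, 80, 73, 57, 88, 78, 62, 76, 53, 74,
    86, 67, 73, 81, 72, 63, 76, 75, 85, 77,
]

def frequency_table(data, num_classes):
    """Count integer data in equal-width classes [lower, upper) starting at min(data)."""
    low, high = min(data), max(data)
    # Smallest integer width strictly greater than range / num_classes,
    # so the maximum always falls inside the last class (44 / 5 = 8.8 -> 9).
    width = (high - low) // num_classes + 1
    counts = [0] * num_classes
    for x in data:
        counts[(x - low) // width] += 1
    classes = [(low + i * width, low + (i + 1) * width) for i in range(num_classes)]
    return width, classes, counts

width, classes, counts = frequency_table(marks, 5)
for (lower, upper), count in zip(classes, counts):
    print(f"{lower} to less than {upper}: midpoint {(lower + upper) / 2}, frequency {count}")
```

Rounding the width up rather than to the nearest integer is a design choice: it guarantees that five classes cover the whole range.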
[Histogram with frequency polygon: "Frequency Distribution of Examination Marks of 80 students and Frequency Polygon"; vertical axis: number of students; horizontal axis: examination marks, with class midpoints 57.5, 66.5, 75.5, 84.5, 93.5]
N.B. The shape of the histogram is single peaked and is approximately symmetrical.
The complete frequency table including (i) frequency, (ii) relative frequency and (iii) %
frequency is given below.
Class Boundaries      Class midpoint   Frequency   Relative Frequency   % Frequency
53 to less than 62    57.5              8          8/80 = 0.10          10.00%
62 to less than 71    66.5             16          16/80 = 0.20         20.00%
71 to less than 80    75.5             33          33/80 = 0.4125       41.25%
80 to less than 89    84.5             14          14/80 = 0.175        17.50%
89 to less than 98    93.5              9          9/80 = 0.1125        11.25%
                                       Sum = 80    Sum = 1.0            Sum = 100%
The last two columns are calculated from the third column as follows.
Note 1: $\text{Relative frequency of a class} = \dfrac{\text{frequency of that class}}{\text{sum of all frequencies}}$.
Note 2: $\text{Percentage} = \text{Relative frequency} \times 100$.
Note: The Relative Frequency and the Percentage Frequency histograms have the same shape;
only the units on the vertical axis differ. Hence we will plot only the Percentage Frequency,
as it is the one most frequently used.
[Histogram with frequency polygon: "% Frequency Distribution of Examination Marks of 80 students and % Frequency Polygon"; vertical axis: % of students; horizontal axis: examination marks, with class midpoints 57.5, 66.5, 75.5, 84.5, 93.5]
N.B. The shape of the histogram is single peaked and is approximately symmetrical.
(b)
Cumulative Frequency Distribution
Definition: A cumulative frequency distribution gives the total number of values that
fall below the upper boundary of each class. We illustrate the cumulative frequency
distribution and the % cumulative frequency using the exam results of the 80 students.
Definition: An ogive (ojive) is a curve drawn for the cumulative frequency distribution
by joining with straight lines the dots marked above the upper boundaries of classes at
heights equal to the cumulative frequencies of the respective classes.
Example 1 Construct a cumulative frequency distribution and an ogive for the
exam results of 80 students given in the table above.
Examination Marks
Class Boundary        Frequency   Cumulative Frequency
53 to less than 62     8          8
62 to less than 71    16          8+16 = 24
71 to less than 80    33          8+16+33 = 57
80 to less than 89    14          8+16+33+14 = 71
89 to less than 98     9          8+16+33+14+9 = 80
                      Sum = 80
The lower boundary of the first class, 53, is taken as the lower limit of every class in the
cumulative frequency distribution, while the upper boundaries of all classes are the same
as in the frequency distribution table. To obtain the cumulative frequency of a class, add
the frequency of that class to the frequencies of all the preceding classes. The cumulative
frequencies are recorded in the third column and the class boundaries in
the first column.
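The running-total rule just described can be sketched with `itertools.accumulate` from the Python standard library. The variable names are illustrative; the frequencies are those of the exam-mark classes.

```python
from itertools import accumulate

# Class frequencies for the exam marks, in class order.
frequencies = [8, 16, 33, 14, 9]
upper_boundaries = [62, 71, 80, 89, 98]

# Each cumulative frequency is the class frequency plus all preceding ones.
cumulative = list(accumulate(frequencies))

for upper, total in zip(upper_boundaries, cumulative):
    print(f"53 to less than {upper}: {total}")
```

The last cumulative frequency should always equal the number of observations, which is a quick check on the arithmetic.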
[Cumulative histogram with ogive: "Cumulative Frequency Distribution of Examination Marks of 80 students and Ogive"; vertical axis: cumulative number of students (0 to 80); horizontal axis: examination marks, with class midpoints 57.5, 66.5, 75.5, 84.5, 93.5]
(c) The advantage of the cumulative frequency table and the ogive is that they can answer
questions such as the following.
Example: "How many students got an exam mark less than or equal to 75, and how many less than or equal to 80?"
Answer: approximately 40 and 57 respectively, read from the ogive.
(d)
Cumulative Relative Frequency and Cumulative Percentage
The Cumulative Relative Frequency and the Cumulative Percentage are easily obtained from the
cumulative frequency distribution using the following formulae.
$$\text{Cumulative relative frequency} = \frac{\text{cumulative frequency of a class}}{\text{total observations in the data set}}$$
$$\text{Cumulative percentage} = \text{Cumulative relative frequency} \times 100$$
We will illustrate the Cumulative Relative Frequency and Cumulative Percentage using
the example above.
Class Boundaries   Cumulative Relative Frequency   Cumulative Percentage
53-62              8/80 = 0.10                      10.00%
62-71              24/80 = 0.30                     30.00%
71-80              57/80 = 0.7125                   71.25%
80-89              71/80 = 0.8875                   88.75%
89-98              80/80 = 1.00                    100.00%
Note: The Cumulative Relative Frequency and the Cumulative Percentage plots have the same
shape; only the units on the vertical axis differ. Hence we will plot only the
Cumulative Percentage and its ogive.
[Cumulative histogram with ogive: "Cumulative % Frequency Distribution of Examination Marks of 80 students and Ogive"; vertical axis: cumulative % of students (0 to 100); horizontal axis: examination marks, with class midpoints 57.5, 66.5, 75.5, 84.5, 93.5]
(e) The advantage of the % cumulative frequency table and the ogive is that they can
answer questions such as the following.
Example: "What % of students got an exam mark less than or equal to 84, and what % less than or equal to 89?"
Answer: approximately 80.00% and 88.75% respectively.
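Reading a value off the ogive amounts to linear interpolation between the plotted points. The sketch below makes that explicit for the cumulative percentages of the exam marks; `ogive_value` is a hypothetical helper name, and the point (53, 0) is added as the assumed starting point of the ogive.

```python
def ogive_value(x, boundaries, heights):
    """Linearly interpolate the ogive at x.

    boundaries: class boundaries, beginning with the lower boundary of the first class.
    heights: ogive heights at those boundaries (the first height is 0).
    """
    if x <= boundaries[0]:
        return heights[0]
    if x >= boundaries[-1]:
        return heights[-1]
    for i in range(1, len(boundaries)):
        if x <= boundaries[i]:
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = heights[i - 1], heights[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

boundaries = [53, 62, 71, 80, 89, 98]
cumulative_percent = [0.0, 10.0, 30.0, 71.25, 88.75, 100.0]

print(round(ogive_value(84, boundaries, cumulative_percent), 2))
print(round(ogive_value(89, boundaries, cumulative_percent), 2))
```

Interpolation gives about 79% at 84 marks, consistent with the "approximately 80%" read from the graph, and exactly 88.75% at the class boundary 89.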
Exercise 5 Consider the following example. The total payrolls (rounded to millions of dollars) for
all 30 major league baseball teams in the U.S.A. for 1999 are given in the table below.
Total Payrolls of Major League Baseball Teams for 1999
Team                 Total Payroll            Team                Total Payroll
                     (millions of dollars)                        (millions of dollars)
Anaheim              51                       Milwaukee           43
Arizona              70                       Minnesota           16
Atlanta              79                       Montreal            15
Baltimore            75                       New York Mets       72
Boston               72                       New York Yankees    92
Chicago Cubs         55                       Oakland             25
Chicago White Sox    25                       Philadelphia        30
Cincinnati           38                       Pittsburgh          24
Cleveland            74                       St. Louis           46
Colorado             54                       San Diego           47
Detroit              37                       San Francisco       46
Florida              15                       Seattle             45
Houston              56                       Tampa Bay           38
Kansas City          17                       Texas               81
Los Angeles          77                       Toronto             49
(a) Construct the Frequency Distribution, the Relative Frequency Distribution and the %
Frequency Distribution table for these data, and draw the histograms for (i) the
Frequency Distribution and Frequency Polygon, and (ii) the % Frequency
Distribution and % Frequency Polygon.
Comment on the shape of the histogram.
Take 5 equal classes.
(b) Construct a Cumulative Frequency Distribution table and draw the
respective histogram and Ogive. Take 5 equal classes.
(c) Find the number of major league baseball teams with a payroll of $50 million or less.
(d) Construct a Cumulative % Frequency Distribution table and draw the
respective histogram and Ogive. Take 5 equal classes.
(e) What % of major league baseball teams had a 1999 payroll of $62 million or less?
Solution
(a)
First we decide on the number of classes, say 5. Then
$$\text{class size} = \frac{92 - 15}{5} = \frac{77}{5} = 15.4.$$
Take class size = 16.
We will use 5 classes of equal size 16 with midpoints at 23, 39, 55, 71 and
87 respectively.
The required Frequency Distribution Table is given below.
Total Payroll (millions of dollars)
Class Limits          Tally        Class Boundaries   Class midpoint   Class Width   Frequency
15 to less than 31    ||||/ |||    14.5-30.5          23               16            8
31 to less than 47    ||||/ ||     30.5-46.5          39               16            7
47 to less than 63    ||||/ |      46.5-62.5          55               16            6
63 to less than 79    ||||/ |      62.5-78.5          71               16            6
79 to less than 95    |||          78.5-94.5          87               16            3
                                                                                     Sum = 30
The resulting histogram and polygon are shown below.
[Histogram with frequency polygon: "Frequency Distribution and Frequency Polygon of the Payroll of the Major Baseball Teams"; vertical axis: number of teams (0 to 8); horizontal axis: payroll in millions of dollars, with class midpoints 23, 39, 55, 71, 87]
Note: The shape of the overall histogram is right-skewed.
The complete frequency table including (i) frequency, (ii) relative frequency and (iii) %
frequency is given below.
Total Payroll
(millions of dollars)   Tally        Class midpoint   Frequency   Relative Frequency   % Frequency
15 to less than 31      ||||/ |||    23               8           8/30 = 0.2667        26.67%
31 to less than 47      ||||/ ||     39               7           7/30 = 0.2333        23.33%
47 to less than 63      ||||/ |      55               6           6/30 = 0.2           20.00%
63 to less than 79      ||||/ |      71               6           6/30 = 0.2           20.00%
79 to less than 95      |||          87               3           3/30 = 0.1           10.00%
                                                      Sum = 30    Sum = 1.0            Sum = 100%
The last two columns are calculated from the fourth column as follows.
Note 1: $\text{Relative frequency of a class} = \dfrac{\text{frequency of that class}}{\text{sum of all frequencies}}$.
Note 2: $\text{Percentage} = \text{Relative frequency} \times 100$.
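Notes 1 and 2 can be applied mechanically to the payroll frequencies. This is a small sketch using exact fractions so that the relative frequencies sum to exactly 1; the variable names are illustrative.

```python
from fractions import Fraction

# Class frequencies for the 30 baseball payrolls, in class order.
frequencies = [8, 7, 6, 6, 3]
total = sum(frequencies)

# Note 1: relative frequency = class frequency / sum of all frequencies.
relative = [Fraction(f, total) for f in frequencies]
# Note 2: percentage = relative frequency x 100.
percent = [float(100 * r) for r in relative]

for f, r, p in zip(frequencies, relative, percent):
    print(f"{f}/{total} = {float(r):.4f} = {p:.2f}%")
```

Rounding each percentage to two decimals gives 26.67%, 23.33%, 20%, 20% and 10%, which sum to 100%.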
From this frequency distribution table we can draw the histograms for the Frequency
Distribution, the Relative Frequency Distribution and the % Frequency Distribution. Since
the Relative Frequency Distribution and the % Frequency Distribution differ only in the
units on the vertical axis, we plot only the % Frequency
Distribution.
The resulting histogram and polygon are shown below.
[Histogram with frequency polygon: "% Frequency Distribution and % Polygon of the Payroll of the Major Baseball Teams"; vertical axis: % of teams (0 to 30); horizontal axis: payroll in millions of dollars, with class midpoints 23, 39, 55, 71, 87]
Note: The shape of the overall histogram is right-skewed.
(b)
Cumulative Frequency Distribution
Definition: A cumulative frequency distribution gives the total number of values that
fall below the upper boundary of each class. We illustrate the cumulative frequency
distribution using the total payrolls (rounded to millions of dollars) for all 30 major
league baseball teams in the U.S.A. for 1999.
Example 1 Construct a cumulative frequency distribution for the total payrolls (rounded
to millions of dollars) for all 30 major league baseball teams in the U.S.A. for 1999,
using the classes given in the table below.
Total Payroll
(millions of dollars)   Class Boundaries   Frequency   Cumulative Frequency
15-31                   15-31              8           8
31-47                   15-47              7           8+7 = 15
47-63                   15-63              6           8+7+6 = 21
63-79                   15-79              6           8+7+6+6 = 27
79-95                   15-95              3           8+7+6+6+3 = 30
                                           Sum = 30
The lower limit of the first class, 15, is taken as the lower limit of every class in the
cumulative frequency distribution, while the upper limits of all classes are the same as in
the frequency distribution table. To obtain the cumulative frequency of a class, add the
frequency of that class to the frequencies of all the preceding classes. The cumulative
frequencies are recorded in the last column and the cumulative class boundaries in the second
column.
Definition: An ogive (ojive) is a curve drawn for the cumulative frequency distribution
by joining with straight lines the dots marked above the upper boundaries of classes at
heights equal to the cumulative frequencies of the respective classes.
The Cumulative Frequency Distribution and Ogive of the Payroll of the Major Baseball Teams
are given below.
[Cumulative histogram with ogive: "Cumulative Distribution and Ogive of the Payroll of the Major Baseball Teams"; vertical axis: cumulative number of teams (0 to 30); horizontal axis: payroll in millions of dollars, with class midpoints 23, 39, 55, 71, 87]
(c) One advantage of the ogive is that it can be used to approximate the cumulative
frequency for any interval.
Find the number of major league baseball teams with a payroll of $50 million or less.
Answer: approximately 17, read from the ogive.
(d)
Cumulative Relative Frequency and Cumulative Percentage
The Cumulative Relative Frequency and the Cumulative Percentage are easily obtained from the
cumulative frequency distribution using the following formulae.
$$\text{Cumulative relative frequency} = \frac{\text{cumulative frequency of a class}}{\text{total observations in the data set}}$$
$$\text{Cumulative percentage} = \text{Cumulative relative frequency} \times 100$$
We will illustrate the Cumulative Relative Frequency and Cumulative Percentage using
the example above.
Class Boundaries   Cumulative Relative Frequency   Cumulative Percentage
15-31              8/30 = 0.267                     26.7%
15-47              15/30 = 0.500                    50.0%
15-63              21/30 = 0.700                    70.0%
15-79              27/30 = 0.900                    90.0%
15-95              30/30 = 1.00                    100.0%
Note: The Cumulative Relative Frequency and the Cumulative Percentage plots have the same
shape; only the units on the vertical axis differ. Hence we will plot only the
Cumulative Percentage.
The Cumulative Percentage and the % Ogive for the Payroll of Major League Baseball
Teams are given below.
The coordinates of the ogive are (15, 0), (31, 26.7), (47, 50), (63, 70), (79, 90) and (95, 100).
[Cumulative histogram with ogive: "Cumulative % Distribution and % Ogive of the Payroll of the Major Baseball Teams"; vertical axis: cumulative % of teams (0 to 100); horizontal axis: payroll in millions of dollars, with class midpoints 23, 39, 55, 71, 87]
(e) What % of major league baseball teams had a 1999 payroll of $62 million or less?
Answer: approximately 70% of major league baseball teams had a 1999 payroll of $62 million
or less.
Exercise 6 The incomes in 2001 of 16 randomly chosen people from the U.S. census who
have high school diplomas but no third-level qualifications were, to the nearest thousand
dollars,
12 43 20 5 67 32 19 6 43 47 21 40 31 25 22 24.
Find (i) the range, (ii) the five-number summary and (iii) the interquartile range (IQR).
Solution
Note that n = 16.
Step 1: Arrange the data in ascending order:
5 6 12 19 20 21 22 24 25 31 32 40 43 43 47 67
(i) The Range = 67 - 5 = 62.
(ii) The median M is at the $\frac{n+1}{2} = \frac{17}{2} = 8.5$th position in the list, i.e.
between the 8th and 9th positions, so
$$M = \frac{24 + 25}{2} = 24.5.$$
The first quartile $Q_1$ is the median of the values in the ordered list that lie
to the left of the location of the overall median M, that is, the median of
5 6 12 19 20 21 22 24
which is at the $\frac{n+1}{2} = \frac{9}{2} = 4.5$th position (where n = 8), i.e.
$$Q_1 = \frac{19 + 20}{2} = 19.5.$$
Alternatively, the first quartile $Q_1$ is at the $\frac{n+1}{4} = \frac{17}{4} = 4.25$th
position of the overall ordered list, i.e. $Q_1$ is between the 4th and 5th positions,
that is $Q_1 = \frac{19 + 20}{2} = 19.5$.
The third quartile $Q_3$ is the median of the values in the ordered list that lie
to the right of the location of the overall median M, that is, the median of
25 31 32 40 43 43 47 67
which is at the $\frac{n+1}{2} = \frac{9}{2} = 4.5$th position (where n = 8), i.e. $Q_3$ is
between the 4th and 5th positions, that is
$$Q_3 = \frac{40 + 43}{2} = 41.5.$$
Alternatively, the third quartile $Q_3$ is at the
$\frac{3(n+1)}{4} = \frac{3 \times 17}{4} = 12.75$th position of the overall ordered list,
i.e. $Q_3$ is between the 12th and 13th positions, that is $Q_3 = \frac{40 + 43}{2} = 41.5$.
Thus the five-number summary is
Minimum = 5, $Q_1 = 19.5$, M = 24.5, $Q_3 = 41.5$ and Maximum = 67.
(iii) The Interquartile Range (IQR) $= Q_3 - Q_1 = 41.5 - 19.5 = 22.0$.
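The median-of-halves rule used above can be written as a short function. This is a sketch of that rule only; the name `five_number_summary` is illustrative, and for an odd number of observations it assumes the overall median is left out of both halves.

```python
from statistics import median

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum, with quartiles as medians of the two halves."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For odd n the middle observation (the median itself) is excluded from both halves.
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return values[0], median(lower), median(values), median(upper), values[-1]

incomes = [12, 43, 20, 5, 67, 32, 19, 6, 43, 47, 21, 40, 31, 25, 22, 24]
minimum, q1, m, q3, maximum = five_number_summary(incomes)
print(minimum, q1, m, q3, maximum)
print("Range:", maximum - minimum, "IQR:", q3 - q1)
```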
Minitab gives the following results.
Descriptive Statistics: non-graduate income
Variable     N    Mean    Median   TrMean   StDev   SE Mean
non-grad    16    28.56   24.50    27.50    16.37   4.09
Variable    Minimum   Maximum   Q1      Q3
non-grad    5.00      67.00     19.25   42.25
N.B. Minitab's Q1 and Q3 (19.25 and 42.25 respectively) differ slightly from the values
we calculated (19.5 and 41.5).
Note: Some software packages use slightly different rules to calculate the
quartiles, so computer results may differ slightly from the results calculated by the
rules above. The difference is usually small and can be ignored.
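One common alternative rule places the quartiles at positions (n+1)/4 and 3(n+1)/4 and interpolates linearly between neighbouring values. Python's `statistics.quantiles` with `method="exclusive"` follows that positional rule, and for this data set it reproduces the figures in the Minitab output; that the two agree in general is an assumption, not something stated in the tutorial.

```python
from statistics import quantiles

incomes = [12, 43, 20, 5, 67, 32, 19, 6, 43, 47, 21, 40, 31, 25, 22, 24]

# Quartiles at positions (n+1)/4, 2(n+1)/4 and 3(n+1)/4 with linear interpolation.
q1, m, q3 = quantiles(incomes, n=4, method="exclusive")
print(q1, m, q3)
```

Here position 4.25 gives 19 + 0.25 x (20 - 19) = 19.25 and position 12.75 gives 40 + 0.75 x (43 - 40) = 42.25, whereas the median-of-halves rule simply averages the two neighbouring values.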
Stem-and-Leaf Display: non-graduate income
Stem-and-leaf of non-grad   N = 16
Leaf Unit = 1.0
  2    0  56
  4    1  29
 (5)   2  01245
  7    3  12
  5    4  0337
  1    5
  1    6  7
[Boxplot: "non-graduate income"; vertical axis from 0 to 70]
A boxplot (box-and-whisker plot) is a graph of the five-number summary and has the
following form.
[Boxplot of the numbers 5 7 9 10 11 13 15; axis marked at 5, 10 and 15]
The central box spans the quartiles Q1 and Q3, a line in the box marks the median M,
and lines (the whiskers) extend from the box out to the smallest and largest data values.
Exercise 7: The incomes of 15 people with bachelor's degrees, chosen at random
from the U.S. Census Bureau in March 2002, were, to the nearest thousand
dollars, 110 25 50 50 55 30 35 30 4 32 50 30 31 74 60.
Find the boxplot for this set of data.
Solution Find the five-number summary for these data. From previous work, the
five-number summary is
Minimum = 4, $Q_1 = 30$, M = 35, $Q_3 = 55$ and Maximum = 110, and the boxplot is
[Boxplot: "Income of Graduates"; horizontal axis: income to the nearest thousand dollars, from 0 to 100; the value 110 is marked with an asterisk]
Note: The asterisk in the boxplot indicates that the value 110 may be an outlier.
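The tutorial does not say which rule produced the asterisk. A widely used convention, assumed here, flags any value more than 1.5 x IQR below Q1 or above Q3; the helper name `outlier_fences` is illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences under the (assumed) 1.5 x IQR convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

incomes = [110, 25, 50, 50, 55, 30, 35, 30, 4, 32, 50, 30, 31, 74, 60]
low, high = outlier_fences(30, 55)  # Q1 and Q3 from the five-number summary
flagged = [x for x in incomes if x < low or x > high]
print(low, high, flagged)
```

With IQR = 25 the fences are -7.5 and 92.5, so under this convention 110 is the only value flagged, which agrees with the asterisk in the boxplot.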
Exercise 8 The incomes in 2001 of 16 randomly chosen people from the U.S. census who
have high school diplomas but no third-level qualifications were, to the nearest thousand
dollars,
12 43 20 5 67 32 19 6 43 47 21 40 31 25 22 24.
Find the boxplot for this set of data and compare it with the boxplot for graduates.
Solution Find the five-number summary for these data. From previous work, the
five-number summary is
Minimum = 5, $Q_1 = 19.5$, M = 24.5, $Q_3 = 41.5$ and Maximum = 67, and the boxplot is
[Boxplot: "Income of Non-Graduates"; horizontal axis: income to the nearest thousand dollars, from 0 to 70]
[Boxplot: "Income of Graduates"; horizontal axis: income to the nearest thousand dollars, from 0 to 100]
Because boxplots show less detail than histograms or stemplots, they are best used for
side-by-side comparison of more than one distribution, as in the two boxplots above. Be sure
to include a numerical scale in the graph. When you look at a boxplot, first locate the
median, which marks the center of the distribution. Then look at the spread. The quartiles
show the spread of the middle half of the data, and the extremes (the smallest and largest
observations) show the spread of the entire data set. We see from the two boxplots that
holders of a bachelor's degree as a group earn considerably more than people with no
education beyond high school. For example, the first quartile for college graduates (30) is
higher than the median for high school graduates (24.5). The spread of the middle half of
incomes (the box in the boxplot) is roughly the same for both groups.
A boxplot also gives an indication of the symmetry or skewness of a distribution. In a
symmetric distribution, the first and third quartiles are equally distant from the median. In
most distributions that are skewed to the right, on the other hand, the third quartile will be
farther above the median than the first quartile is below it. That is the case for both
distributions here. The extremes behave the same way, but remember that
they are just single observations and may say little about the distribution as a whole.
Exercise 9
Given the data set 11, 10, 9, 18, 11, 8, 3, calculate
(i) the range, (ii) the median, (iii) the mean and (iv) the variance $s^2$ and the standard
deviation $s$.
Solution
Put the data in ascending order: 3, 8, 9, 10, 11, 11, 18.
Range = 18 - 3 = 15.
Median = 10.
Mean: $\bar{x} = \dfrac{3 + 8 + 9 + 10 + 11 + 11 + 18}{7} = \dfrac{70}{7} = 10$.
Data    Deviation       Squared Deviation
3       3 - 10 = -7     (-7)^2 = 49
8       8 - 10 = -2     (-2)^2 = 4
9       9 - 10 = -1     (-1)^2 = 1
10      10 - 10 = 0     (0)^2 = 0
11      11 - 10 = 1     (1)^2 = 1
11      11 - 10 = 1     (1)^2 = 1
18      18 - 10 = 8     (8)^2 = 64
        Sum = 0         Sum = 120
The variance is
$$s^2 = \frac{(3-10)^2 + (8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (11-10)^2 + (18-10)^2}{6} = \frac{120}{6} = 20.$$
The standard deviation is $s = \sqrt{s^2} = \sqrt{20} = 4.4721359 \approx 4.47$.
Exercise 10
Given the data set 75, 10, 30, 10, 15, 30, calculate
(i) the range, (ii) the median, (iii) the mean and (iv) the variance $s^2$ and the standard
deviation $s$.
Solution
Put the data in ascending order: 10, 10, 15, 30, 30, 75.
Range = 75 - 10 = 65.
Median $= \dfrac{15 + 30}{2} = \dfrac{45}{2} = 22.5$.
Mean: $\bar{x} = \dfrac{10 + 10 + 15 + 30 + 30 + 75}{6} = \dfrac{170}{6} = 28.333333 \approx 28.33$.
Data    Deviation                Squared Deviation
10      10 - 28.33 = -18.33      (-18.33)^2 = 335.9889
10      10 - 28.33 = -18.33      (-18.33)^2 = 335.9889
15      15 - 28.33 = -13.33      (-13.33)^2 = 177.6889
30      30 - 28.33 = 1.67        (1.67)^2 = 2.7889
30      30 - 28.33 = 1.67        (1.67)^2 = 2.7889
75      75 - 28.33 = 46.67       (46.67)^2 = 2178.0889
        Sum = 0.02 ≈ 0           Sum = 3033.3334
The deviations sum to 0.02 rather than exactly 0 only because the mean was rounded to 28.33.
The variance is
$$s^2 = \frac{(10-28.33)^2 + (10-28.33)^2 + (15-28.33)^2 + (30-28.33)^2 + (30-28.33)^2 + (75-28.33)^2}{5} = \frac{3033.3334}{5} = 606.6668.$$
The standard deviation is $s = \sqrt{s^2} = \sqrt{606.6668} = 24.630604 \approx 24.63$.
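The hand calculations in Exercises 9 and 10 can be checked with a short function that follows the same steps: compute the mean, square the deviations, and divide their sum by n - 1. This is a minimal sketch; `sample_variance` is an illustrative name.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)

exercise_9 = [11, 10, 9, 18, 11, 8, 3]
exercise_10 = [75, 10, 30, 10, 15, 30]

for name, data in [("Exercise 9", exercise_9), ("Exercise 10", exercise_10)]:
    s2 = sample_variance(data)
    print(f"{name}: variance = {s2:.4f}, standard deviation = {sqrt(s2):.2f}")
```

Because the code keeps the unrounded mean 170/6, its variance for Exercise 10 is 606.6667 rather than 606.6668; the standard deviation still rounds to 24.63.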