Data Description

TOPIC 6 : DATA DESCRIPTION
6.1 – Introduction to Data
6.2 – Measures of Location
6.3 – Measures of Dispersion
6.1 – Introduction to Data
Learning outcomes:
At the end of this topic, students should be able to:
(a) identify discrete and continuous data
(b) identify ungrouped and grouped data
(c) construct and interpret stem-and-leaf diagrams
Statistics involves collecting, organizing,
presenting and analyzing data in order to obtain
useful information for decision making.
POPULATION is a collection of all elements
whose characteristics are being studied.
SAMPLE is a group of elements drawn from a
population which is representative of that population.
A sample is a subset of a population.
Parameter is a numerical measurement describing
some characteristic of a population,
e.g. mean, median, mode.
Variable is a characteristic or attribute that
can take different values,
e.g. the height of a student.
Quantitative variables produce either discrete
data or continuous data.
DATA
A collection of observations, measurements or
information obtained from a study that
is carried out.
Data
• Quantitative data: data that can be measured numerically
  • Discrete data
  • Continuous data
• Qualitative data: data that cannot assume a numerical
  value but can be classified into categories
Discrete data
Discrete data are data that can take only separate,
countable values, usually integers.
E.g. the number of teachers in a school
Continuous data
Continuous data are data that can assume any
numerical value in a certain interval on the real
line.
E.g. the heights of students in KMM:
130.2 cm, 132.5 cm, 131.8 cm, ...
Raw data can be represented as ungrouped data
or grouped data.
Data
(a) UNGROUPED DATA
data which are listed as a sequence or in the form
of a frequency table, but without the use of
intervals
(b) GROUPED DATA
data which are categorized into class intervals
Example
The following are the lengths of 12 leaves collected
from a garden, measured to the nearest cm.
10  11  14  12  13  9  9  11  10  11  13  12
These data are called raw data.
The data can be summarized
as a
FREQUENCY DISTRIBUTION TABLE.
Length of leaf (cm)    9   10   11   12   13   14
Frequency              2    2    3    2    2    1
The data shown in this frequency distribution
above is known as ungrouped data
The frequency distribution below shows the
same data but grouped into the following
intervals.
Length of leaf (cm)    9-10   11-12   13-14
Frequency                 4       5       3
Data in the form of the frequency distribution
table shown above is known as grouped data.
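As a quick check of both tables, here is a minimal Python sketch that tallies the raw leaf lengths first value by value and then by class interval. The helper name `class_label` and its `start` and `width` parameters are illustrative choices, not part of the original notes.

```python
from collections import Counter

# Leaf lengths (cm) from the example above.
lengths = [10, 11, 14, 12, 13, 9, 9, 11, 10, 11, 13, 12]

# Ungrouped frequency table: one row per distinct value.
ungrouped = dict(sorted(Counter(lengths).items()))


def class_label(value, start=9, width=2):
    """Return the class interval (as text) that contains `value`."""
    lower = start + ((value - start) // width) * width
    return f"{lower}-{lower + width - 1}"


# Grouped frequency table: classes 9-10, 11-12 and 13-14.
grouped = dict(Counter(class_label(v) for v in sorted(lengths)))

print(ungrouped)
print(grouped)
```

The grouped table is shorter but loses detail: it no longer shows how many leaves were exactly 9 cm or 10 cm long.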
Stem and leaf diagram
• A stem-and-leaf diagram is another technique for
displaying quantitative data.
• Each value is divided into two parts, which are the
stem and the leaf.
• The digit(s) in the greatest place value(s) of the
data values are the stems.
• The digits in the next greatest place
values are the leaves.
• For example, if all the data are two-digit numbers,
the digit in the tens place is used for the stem and
the digit in the ones place is used for the leaf.
Example 1
Construct a stem-and-leaf diagram for
the data below:
12, 13, 21, 27, 33, 34, 35, 37, 40, 41
Steps for constructing a stem-and-leaf diagram.
Step 1
• Separate each value into two parts, i.e. the stem
and the leaf.
• Since each given value consists of two digits,
the first digit is used as the stem and the
second digit as the leaf.
(When the values are large, the stem can consist of
several digits.)
Step 2
• Draw a vertical line and list the stems on the left
in order of magnitude, starting from the
smallest.
Step 3
• List the leaves, i.e. the corresponding second digits,
on the right of the vertical line.
Solution
Stem | Leaf
1    | 2 3
2    | 1 7
3    | 3 4 5 7
4    | 0 1
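The three construction steps can be sketched in Python as follows. This is only an illustration for non-negative whole numbers split at the ones digit; the function name `stem_and_leaf` is an assumption of this sketch.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: [leaves]} with stems and leaves in ascending order."""
    diagram = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # tens part is the stem, ones digit is the leaf
        diagram[stem].append(leaf)
    return dict(diagram)


data = [12, 13, 21, 27, 33, 34, 35, 37, 40, 41]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting first means every row of leaves comes out already ordered, which is what makes the diagram easy to read.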
Example 2
Construct a stem-and-leaf diagram for
the data of a test scores for a group of
students:
92, 92, 96, 98, 83, 85, 72, 74, 76, 78, 78
79, 61, 64, 64, 67, 68, 50, 50, 52, 58, 58
Solution
Test scores out of 100
Stem | Leaf
9    | 2 2 6 8
8    | 3 5
7    | 2 4 6 8 8 9
6    | 1 4 4 7 8
5    | 0 0 2 8 8
Based on the stem and leaf diagram:
• 4 students got a mark in the 90's on their
test out of 100.
• 2 students received the same mark of 92.
• No marks were received below 50.
• No mark of 100 was received.
Counting the total number of leaves gives the number
of students who took the test (22 here).
Exercise
Try your own Stem and Leaf diagram with
the following temperatures for June
77 80 82 68 65 59 61
57 50 62 61 70 69 64
67 70 62 65 65 73 76
87 80 82 83 79 79 71
80 77
Solution
Temperatures
Stem | Leaf
5    | 0 7 9
6    | 1 1 2 2 4 5 5 5 7 8 9
7    | 0 0 1 3 6 7 7 9 9
8    | 0 0 0 2 2 3 7
6.2 – Measures of Location
Learning outcomes:
At the end of this topic, students should be able to:
(a) Find and interpret the mean, mode and median
for ungrouped data.
(b) Find and interpret the mean, mode, median,
quartiles and percentiles for grouped data.
(c) Construct and interpret box-and-whisker plots.
Data
• UNGROUPED DATA: data which are listed as a sequence
  or in the form of a frequency table, but without the
  use of intervals
  Measures: mean, mode, median
• GROUPED DATA: data which are categorized into
  class intervals
  Measures: mean, mode, median, quartiles and percentiles
Ungrouped Data
Mean
• The sum of the values of all observations
divided by the total number of observations.
• Using the symbol $\bar{x}$,
$$\text{Mean, } \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$$
Example 1
a) Find the mean of a set of numbers
3, 5, 7, 4, 5, 9, 6
b) Find the mean of the following set of data
Number of male children    0   1   2   3   4   5
Frequency                  2   5   7   3   2   1
Solution 1(a)
Solution 1(b)
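Both parts of Example 1 can be checked with a short Python sketch. Part (a) applies the mean formula directly; part (b) weights each value by its frequency, which is the same as writing out every observation and averaging. The variable names are illustrative.

```python
# Part (a): a plain list of observations.
numbers = [3, 5, 7, 4, 5, 9, 6]
mean_a = sum(numbers) / len(numbers)

# Part (b): ungrouped frequency table {value: frequency}.
male_children = {0: 2, 1: 5, 2: 7, 3: 3, 4: 2, 5: 1}
total_f = sum(male_children.values())
total_fx = sum(x * f for x, f in male_children.items())
mean_b = total_fx / total_f

print(round(mean_a, 2), round(mean_b, 2))
```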
Median
• The middle value when a set of data is arranged in
order of magnitude (in ascending or descending).
• For a set of data
x1, x2, x3,..., xn arranged in order
of magnitude, there are two cases.
CASE 1: the number of data, n, is odd
Median = the $\left(\frac{n+1}{2}\right)$th value
CASE 2: the number of data, n, is even
Median = mean of the two middle values
Example 2
Find the median for the following sets of data.
a) 180, 186, 191, 201, 209, 219, 220
b) 24, 21, 28, 36, 17, 32, 20
c) 3.56, 8.61, 4.35, 2.71, 5.48, 6.22
Solution 2(a)
Solution 2(b)
Solution 2(c)
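The two cases for the median can be sketched in Python as below. The data sets are read from Example 2 as a) the seven three-digit values, b) the seven two-digit values and c) the six decimal values; the order in which they are listed does not affect the median. The function name `median` is illustrative.

```python
def median(values):
    """Median of ungrouped data, following the odd and even cases."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Case 1: n is odd, take the ((n + 1) / 2)th value.
        return ordered[mid]
    # Case 2: n is even, take the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


set_a = [180, 186, 191, 201, 209, 219, 220]
set_b = [24, 21, 28, 36, 17, 32, 20]
set_c = [3.56, 8.61, 4.35, 2.71, 5.48, 6.22]

for name, data in (("a", set_a), ("b", set_b), ("c", set_c)):
    print(name, median(data))
```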
Mode
• The mode of a set of data is the value that
occurs most frequently.
Example 3
Find the mode for the following set of data.
a) 5, 2, 3, 3, 5, 4, 28, 5
b) 2, 3, 5, 8, 10
c) 0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5
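A data set can have one mode, several modes or none. The following Python sketch returns every value that shares the highest frequency, and treats a set in which no value repeats as having no mode. The function name `modes` and that last convention are assumptions of this sketch.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency (empty list if nothing repeats)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so no mode is reported
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5, 2, 3, 3, 5, 4, 28, 5]))
print(modes([2, 3, 5, 8, 10]))
print(modes([0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5]))
```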
Example 4
Find the mode for the following data:
x    20   33   40   52
f     4   10    6    7
The highest frequency is 10.
Solution
Grouped Data
Mean
• For grouped data, the mean is given by
$$\bar{x} = \frac{\sum fx}{\sum f}$$
• f = the frequency of each class
• x = class mark
Class mark = the mid-point of each class
Example 1
The table shows the distribution of the fat
content of 40 pieces of food. Find the mean for
the following distribution.
Fat content    Frequency
0.1 - 1.0       4
1.1 - 2.0       5
2.1 - 3.0       7
3.1 - 4.0      13
4.1 - 5.0      11
Solution 1
Fat content    Frequency, f    Class mark, x    fx
0.1 - 1.0       4
1.1 - 2.0       5
2.1 - 3.0       7
3.1 - 4.0      13
4.1 - 5.0      11
$$\text{mean, } \bar{x} = \frac{\sum fx}{\sum f}$$
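The class mark and fx columns for the fat content table can be filled in with a short Python sketch. Each class is written as (lower limit, upper limit, frequency); the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each fat content class.
classes = [(0.1, 1.0, 4), (1.1, 2.0, 5), (2.1, 3.0, 7), (3.1, 4.0, 13), (4.1, 5.0, 11)]

total_f = 0
total_fx = 0.0
for lower, upper, f in classes:
    x = (lower + upper) / 2  # class mark = mid-point of the class
    total_f += f
    total_fx += f * x
    print(f"{lower} - {upper}: f = {f}, x = {x:.2f}, fx = {f * x:.2f}")

mean = total_fx / total_f
print("sum f =", total_f, "sum fx =", round(total_fx, 2), "mean =", round(mean, 2))
```

Because the class marks stand in for the unknown individual values, the result is an estimate of the mean, not its exact value.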
Median
• The median of a frequency distribution for grouped
data can be estimated by using the formula
$$\text{Median} = L_k + \left(\frac{\frac{n}{2} - F_{k-1}}{f_k}\right) C$$
$L_k$ = lower boundary of the median class
$n$ = number of data or the sum of the frequencies
$F_{k-1}$ = cumulative frequency before the median class
$f_k$ = frequency of the median class
$C$ = class width
Example 2
Find the median given that the lengths of a sample
of 90 pieces of leaves from a tree are recorded in
the table (Figure 1):
Lengths (cm)    4–5   6–7   8–9   10–11   12–13   14–15
Frequency         2     6    14      31      30       7
Figure 1
Solution 2
Length (cm)    Frequency    Cumulative frequency
4–5             2
6–7             6
8–9            14
10–11          31
12–13          30
14–15           7
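The median formula for grouped data can be traced with the Python sketch below. It assumes the lengths are recorded to the nearest cm, so the class 4–5 has boundaries 3.5 and 5.5, and so on; the function name `grouped_median` is illustrative.

```python
def grouped_median(classes):
    """Estimate the median from (lower_boundary, upper_boundary, frequency) rows."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cumulative frequency before the current class, F_{k-1}
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= n / 2:
            # This is the median class: interpolate inside it.
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no data")


# Class limits and frequencies from Figure 1.
limits = [(4, 5, 2), (6, 7, 6), (8, 9, 14), (10, 11, 31), (12, 13, 30), (14, 15, 7)]
# Assumption: boundaries lie 0.5 cm either side of the class limits.
classes = [(lo - 0.5, hi + 0.5, f) for lo, hi, f in limits]

leaf_median = grouped_median(classes)
print(round(leaf_median, 2))
```

Here n/2 = 45, the cumulative frequencies are 2, 8, 22, 53, 83, 90, so the median class is 10–11 with lower boundary 9.5.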
Mode
$$\text{Mode} = L_B + \left(\frac{d_1}{d_1 + d_2}\right) C$$
$L_B$ = lower class boundary of the mode class
$d_1$ = the difference between the mode class
frequency and the frequency of the PREVIOUS class
$d_2$ = the difference between the mode class frequency
and the frequency of the class AFTER the mode
class
$C$ = class width
Example 3
The table below shows the distribution of the
heights of 30 plants of type B which have been
planted for 6 weeks. These heights are measured
to the nearest cm. Estimate the mode of this
distribution.
Heights (cm)    3–5   6–8   9–11   12–14   15–17   18–20
f                 1     2     11      10       5       1
Mode class: 9–11
Solution 3
$$\text{Mode} = L_B + \left(\frac{d_1}{d_1 + d_2}\right) C$$
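A minimal Python sketch of the mode formula applied to the plant heights follows. Since the heights are measured to the nearest cm, the class 9–11 is taken to have boundaries 8.5 and 11.5; the function name `grouped_mode` is illustrative, and it assumes a single class has the highest frequency.

```python
def grouped_mode(classes):
    """Estimate the mode from (lower_boundary, upper_boundary, frequency) rows."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))  # position of the mode class
    lower, upper, fk = classes[k]
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = fk - before  # difference from the previous class
    d2 = fk - after   # difference from the next class
    return lower + d1 / (d1 + d2) * (upper - lower)


limits = [(3, 5, 1), (6, 8, 2), (9, 11, 11), (12, 14, 10), (15, 17, 5), (18, 20, 1)]
# Assumption: boundaries lie 0.5 cm either side of the class limits.
classes = [(lo - 0.5, hi + 0.5, f) for lo, hi, f in limits]

plant_mode = grouped_mode(classes)
print(round(plant_mode, 1))
```

With d1 = 11 - 2 = 9 and d2 = 11 - 10 = 1, the estimate sits close to the upper end of the mode class because the next class is almost as frequent.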
Quartiles
$$Q_k = L_k + \left(\frac{\frac{k}{4}n - F_{k-1}}{f_k}\right) C_k, \qquad k = 1, 2, 3$$
$L_k$ = lower class boundary of the class
containing the quartile
$F_{k-1}$ = cumulative frequency before the class
containing the quartile
$n$ = the number of data
$f_k$ = frequency of the class containing the quartile
$C_k$ = class width of the class containing the quartile
Example 4
The table shows the marks of 250 pre-university
students in an examination.
Marks              0-9   10-19   20-29   30-39   40-49   50-59   60-69   70-79
No. of students     15      20      25      24      12      31      71      52
Estimate the:
a) First quartile
b) Third quartile
Solution 4(a)
Solution 4(b)
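Both quartiles in Example 4 can be estimated with one Python helper that implements the quartile formula. The sketch assumes whole-number marks, so the class 30-39 has boundaries 29.5 and 39.5; the function name `grouped_quantile` and its parameters are illustrative.

```python
def grouped_quantile(classes, k, parts):
    """Estimate the k-th of `parts` quantiles from (lower, upper, frequency) rows."""
    n = sum(f for _, _, f in classes)
    target = k * n / parts  # (k/4)n for quartiles
    cumulative = 0          # F_{k-1}
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("k must not exceed parts")


limits = [(0, 9, 15), (10, 19, 20), (20, 29, 25), (30, 39, 24),
          (40, 49, 12), (50, 59, 31), (60, 69, 71), (70, 79, 52)]
# Assumption: boundaries lie 0.5 marks either side of the class limits.
classes = [(lo - 0.5, hi + 0.5, f) for lo, hi, f in limits]

q1 = grouped_quantile(classes, 1, 4)
q3 = grouped_quantile(classes, 3, 4)
print(round(q1, 2), round(q3, 2))
```

For Q1 the target position is 250/4 = 62.5, which falls in the class 30-39; for Q3 it is 187.5, which falls in the class 60-69.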
Percentiles
Percentiles divide the data set into 100 equal parts.
The kth percentile can be obtained by the formula below:
$$P_k = L_k + \left(\frac{\frac{k}{100}n - F_{k-1}}{f_k}\right) C_k, \qquad k = 1, 2, 3, \dots, 99$$
where
$L_k$ = lower class boundary of the class
containing the percentile
$F_{k-1}$ = cumulative frequency BEFORE the class
containing the percentile
$n$ = the number of data
$f_k$ = frequency of the class containing the percentile
$C_k$ = class width of the class containing the percentile
NOTES!!
The 25th percentile is called the 1st quartile, Q1
P25 = Q1
The median is the 50th percentile, also
called the second quartile, Q2
Median = P50 = Q2
The 75th percentile is called the 3rd quartile, Q3
P75 = Q3
Example 5
The following table shows the weekly pocket
money of 50 students in a secondary school.
Pocket money (RM)    20< x <25   25< x <30   30< x <35   35< x <40   40< x <45
f                        10          15          16           5           4
Find the 40th and 90th percentiles.
Solution 5
Pocket money (RM)    f     Cumulative frequency
20< x <25           10
25< x <30           15
30< x <35           16
35< x <40            5
40< x <45            4
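The percentile formula can be checked on the pocket money table with the Python sketch below. The classes are already given by their boundaries, so no 0.5 adjustment is needed; the function name `grouped_percentile` is illustrative.

```python
def grouped_percentile(classes, k):
    """Estimate the k-th percentile from (lower, upper, frequency) rows."""
    n = sum(f for _, _, f in classes)
    target = k * n / 100  # (k/100)n
    cumulative = 0        # cumulative frequency before the current class
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("k must be between 1 and 99")


# (lower boundary, upper boundary, frequency) from the pocket money table.
classes = [(20, 25, 10), (25, 30, 15), (30, 35, 16), (35, 40, 5), (40, 45, 4)]

p40 = grouped_percentile(classes, 40)
p90 = grouped_percentile(classes, 90)
print(round(p40, 2), round(p90, 2))
```

The cumulative frequencies are 10, 25, 41, 46, 50, so the 40th percentile (position 20) lies in the second class and the 90th percentile (position 45) in the fourth.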
Box and Whisker Plots
A box plot summarizes data using the
median, quartiles, and the extreme (least
and greatest) values. It is used to provide a
graphical display of the center and
variation of a data set.
Construction of Box and Whisker Plots
Step 1 :
Arrange the data in order from least to greatest.
Step 2 :
Find the median, quartiles, and the extreme (least and
greatest) values.
Step 3 :
Connect the quartiles to each other to make a box,
and then connect the box to the minimum and
maximum with lines.
Example 1
Draw a box-and-whisker plot for the following set
of data.
3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7
Solution 1
Step 1: Arrangement of data
Arrange your numbers from the least to
the greatest:
1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
Step 2: Find the median, quartile 1 and quartile 3
1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
• Find the median from the ordered list: cross off
one number from each side until you
reach the middle number (or numbers).
• If there are two numbers in the middle,
add those 2 middle numbers together:
6 + 7 = 13
• Then divide by 2:
13 ÷ 2 = 6.5
• The median is 6.5.
• Then split the numbers into the left and right sides
of the median:
1, 2, 3, 4, 5, 6, 6 │ 7, 8, 9, 10, 11, 13, 14
• Find the median of each half:
Left median = 4
Right median = 10
• The left median is called the LOWER
QUARTILE, Q1.
• The right median is called the UPPER
QUARTILE, Q3.
Step 3 : Connect the quartiles to each other to
make a box, and then connect the box to the
minimum and maximum with lines.
• Draw a number line from the smallest to
the largest number without skipping any
numbers (here, from 1 to 14).
• Put circles at the LOWER and UPPER
Quartiles (4 and 10).
• Draw a box connecting the circles at
the LOWER and UPPER Quartiles.
• Put a circle at the median (6.5).
• Draw a line through the box at the median.
• Put circles at the high and low points (14 and 1).
• Draw lines that connect the high and low
points to the box.
[Figure: completed box and whisker plot on a number line from 1 to 14]
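The five values needed for the plot can be computed with a short Python sketch that follows the same steps: sort, find the median, then find the median of each half. For an odd number of values this sketch leaves the middle value out of both halves, which is one common convention and an assumption here; the function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    left = ordered[:half]            # values below the median position
    right = ordered[half + n % 2:]   # values above the median position
    return ordered[0], median(left), median(ordered), median(right), ordered[-1]


data = [3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7]
print(five_number_summary(data))
```

The box is drawn from Q1 to Q3, the line inside the box at the median, and the whiskers out to the minimum and maximum.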
Symmetry and Skewness
1. Symmetrical distribution: The whiskers are
the same length and the median is at the centre of
the box.
[Figure: box plot with Q2 midway between Q1 and Q3]
2. Positively skewed distribution: The left
whisker is shorter than the right whisker and
the median is nearer to Q1.
[Figure: box plot with Q2 close to Q1]
3. Negatively skewed distribution: The left
whisker is longer than the right whisker and
the median is nearer to Q3.
[Figure: box plot with Q2 close to Q3]
6.3 – Measures of Dispersion
Learning outcomes:
At the end of this topic, students should be able to:
(a) Find and interpret variance and standard
deviation for ungrouped data.
(b) Find and interpret variance and standard
deviation for grouped data.
(c) Find and interpret Pearson’s
coefficient of skewness.
Data
• UNGROUPED DATA: variance and standard deviation
• GROUPED DATA: variance and standard deviation
Variance and standard deviation for
Ungrouped data.
For ungrouped data:
$$\text{Mean, } \bar{x} = \frac{\sum x}{n}$$
$$\text{Variance, } s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
$$\text{Standard deviation, } s = \sqrt{s^2}$$
Example 1
Find the mean, variance and standard deviation for the
data below
2, 7, 10, 9, 2, 5, 16
Solution 1
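The variance formula for ungrouped data, with divisor n - 1, can be applied to Example 1 with the Python sketch below. The function name `sample_variance` is illustrative.

```python
import math


def sample_variance(values):
    """Variance using (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [2, 7, 10, 9, 2, 5, 16]
mean = sum(data) / len(data)
variance = sample_variance(data)
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

Here the sum of the values is 51 and the sum of their squares is 519, which is all the formula needs.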
Exercise
Find the mean and standard deviation of the
set of numbers
5, 2, 3, 8, 6
Answer:
Mean = 4.8
Standard deviation = 2.39
Example 2
A set of numbers {1, 6, 3, 2, 8, 5, x, y} has a mean of 4 and a
variance of $\frac{36}{7}$. Show that x + y = 7 and hence find
the values of x and y.
Solution 2
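One way to check the answer numerically is sketched below in Python. The mean gives x + y, the variance formula (with divisor n - 1) gives x² + y², and the two together give the product xy, so x and y are the roots of a quadratic. The variable names are illustrative.

```python
import math

known = [1, 6, 3, 2, 8, 5]
n = 8
mean = 4
variance = 36 / 7

# From the mean: sum of all values = n * mean, so x + y is what is left over.
sum_xy = n * mean - sum(known)

# From the variance: sum of squares = (n - 1) * variance + (sum of values)^2 / n.
sum_squares = (n - 1) * variance + (n * mean) ** 2 / n
squares_xy = sum_squares - sum(k * k for k in known)  # x^2 + y^2

# (x + y)^2 = x^2 + y^2 + 2xy gives the product xy.
product_xy = (sum_xy ** 2 - squares_xy) / 2

# x and y are the roots of t^2 - (x + y) t + xy = 0.
root = math.sqrt(sum_xy ** 2 - 4 * product_xy)
x, y = (sum_xy - root) / 2, (sum_xy + root) / 2
print(sum_xy, round(x, 6), round(y, 6))
```

The two roots can be assigned to x and y in either order.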
Variance and standard deviation for
Grouped data.
$$\text{Variance, } s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$$
with x = class midpoint
f = frequency
$$\text{Standard deviation, } s = \sqrt{s^2}$$
Example 3
Find the mean, variance and standard deviation
for the data below.
Marks            f
0 ≤ x < 20       9
20 ≤ x < 40     29
40 ≤ x < 60     42
60 ≤ x < 80     26
80 ≤ x < 100    14
Solution 3
Marks            f     Midpoint, x    fx    fx²
0 ≤ x < 20       9
20 ≤ x < 40     29
40 ≤ x < 60     42
60 ≤ x < 80     26
80 ≤ x < 100    14
n = 120
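The midpoint, fx and fx² columns, and the resulting mean, variance and standard deviation, can be worked out with the Python sketch below. Each class is written as (lower boundary, upper boundary, frequency); the variable names are illustrative.

```python
import math

# (lower boundary, upper boundary, frequency) for each class of marks.
classes = [(0, 20, 9), (20, 40, 29), (40, 60, 42), (60, 80, 26), (80, 100, 14)]

n = 0
sum_fx = 0.0
sum_fx2 = 0.0
for lower, upper, f in classes:
    x = (lower + upper) / 2  # class midpoint
    n += f
    sum_fx += f * x
    sum_fx2 += f * x * x
    print(f"{lower}-{upper}: f = {f}, x = {x:g}, fx = {f * x:g}, fx^2 = {f * x * x:g}")

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)
print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```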
Pearson Coefficient of Skewness
The Pearson coefficient of skewness provides a
numerical measure of the skewness of a distribution.
Denoted by Sk, it is calculated as follows:
$$S_k = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$$
OR
$$S_k = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$$
The two forms give the same value when
3(mean - median) = mean - mode,
an empirical relationship that holds only approximately
for moderately skewed data.
Note :
If Sk is positive, the distribution is positively skewed.
If Sk is negative, the distribution is negatively skewed.
Example 4
Find the Pearson's coefficient of skewness.
1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8
Solution 4
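Both forms of the coefficient can be evaluated for this data set with Python's standard `statistics` module, as sketched below. `statistics.stdev` uses the divisor n - 1, matching the variance formula in these notes. The variable names are illustrative.

```python
import statistics

data = [1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)

sk_median = 3 * (mean - median) / s
sk_mode = (mean - mode) / s
print(round(sk_median, 3), round(sk_mode, 3))
```

The two forms give slightly different values for this data because the relationship between them is only approximate, but both are positive, indicating a slight positive skew.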
Example 5
In an exam, the marks of 120 students are summarized as below:
$\sum fx = 3108$, $\sum fx^2 = 82398$
Mode = 27.6
Find the mean, standard deviation and Pearson's
coefficient of skewness for the distribution and
interpret the result.
Solution 5
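Because only the totals and the mode are given, the mode form of the coefficient is the one that applies here. A minimal Python sketch of the calculation, with illustrative variable names:

```python
import math

n = 120
sum_fx = 3108
sum_fx2 = 82398
mode = 27.6

mean = sum_fx / n
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)
sk = (mean - mode) / std_dev

print(round(mean, 2), round(std_dev, 2), round(sk, 3))
```

The mean is below the mode, so the coefficient is negative and the distribution of marks is negatively skewed.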
Exercise
The marks for 400 KMM students in the first quiz
are given below
Marks    Number of students
0-9       44
10-19     56
20-29     64
30-39     78
40-49     60
50-59     40
60-69     36
70-79     18
80-89      4
Estimate the mean, median and standard deviation
for the above sample. By calculating Pearson’s
coefficient of skewness, state the type of
distribution for the above data.
Answers:
mean = 35.3
median = 34.1
standard deviation = 20.1
Pearson’s coefficient of skewness = 0.179
(skewed to the right)