
Chapter 1 Data Representation

Probability and Statistics
Dr Yehya Mesalam
Textbook
• Probability & Statistics for Engineers & Scientists, Ronald E. Walpole, 9th edition, 2012, Pearson
Brief list of Course topics
1. Introduction to statistics and data analysis.
2. Introduction to probability theory.
3. Random variables and probability distributions.
4. Mathematical expectation.
5. Some discrete probability distributions.
6. Some continuous probability distributions.
7. Functions of random variables.
8. Fundamental sampling distributions and data descriptions.
Evaluation Scheme
• Midterm Exam, Report, and Activities: 30 + 30 = 60 (60%)
• Final exam: 40 (40%)
• Total: 100 (100%)
Chapter 1
Introduction to statistics and
data analysis
What is Statistics?
• Statistics is the science of collecting,
organizing, summarizing, and analyzing
information to draw conclusions or answer
questions.
• Statistics is a way to get information from data.
It is the science of uncertainty.
Steps of Statistical Practice
• Preparation: Set clearly defined goals and questions of interest for the investigation
• Data collection: Make a plan of which data to collect and how to collect it
• Data analysis: Apply appropriate statistical methods to extract information from the data
• Data interpretation: Interpret the information and draw conclusions
Statistical Methods
• Descriptive statistics include the collection, presentation and description of numerical data.
• Inferential statistics include making inferences and decisions from the collected data using appropriate statistical methods.
• Model building includes developing prediction equations to understand a complex system.
Descriptive Statistics
• Descriptive statistics involves the arrangement,
summary, and presentation of data, to enable
meaningful interpretation, and to support decision
making.
• Descriptive statistics methods make use of
– graphical techniques
– numerical descriptive measures.
• The methods presented apply both to
– the entire population
– the sample
Basic Definitions
• Population: The collection of all items of interest in a
particular study.
• Sample: A set of data drawn from the population;
a subset of the population available for observation
• Parameter: A descriptive measure of the population,
e.g., mean
• Statistic: A descriptive measure of a sample
• Variable: A characteristic of interest about each
element of a population or sample.
Collecting Data
• Target Population: The population about
which we want to draw inferences.
• Sampled Population: The actual population
from which the sample has been taken.
Types of Variables
• Qualitative: Ordinal or Non-ordinal
• Quantitative: Discrete or Continuous
Types of data - examples
Numerical data
Age | Income
55 | 75000
42 | 68000
. | .
Weight gain
+10
+5
.
Nominal data
Person | Marital status
1 | married
2 | single
3 | single
. | .
Computer | Brand
1 | IBM
2 | Dell
3 | IBM
. | .
Types of data - examples
A descriptive statistic for nominal data is the proportion of data that falls into each category.
Brand | Frequency | Proportion
IBM | 25 | 50%
Dell | 11 | 22%
Compaq | 8 | 16%
Other | 6 | 12%
Total | 50 | 100%
Types of Variables
•Qualitative variables (what, which type…)
measure a quality or characteristic on each
experimental unit. (categorical data)
•Examples:
•Hair color (black, brown, blonde…)
•Make of car (Dodge, Honda, Ford…)
•Gender (male, female)
•State of birth (Iowa, Arizona,….)
Types of Variables
•Quantitative variables (How big, how
many) measure a numerical quantity on each
experimental unit. (denoted by x)
Discrete if it can assume only a finite or
countable number of values.
Continuous if it can assume the infinitely
many values corresponding to the points
on a line interval.
Graphing Qualitative Variables
• Use a data distribution to describe:
– What values of the variable have been measured
– How often each value has occurred
• “How often” can be measured 3 ways:
– Frequency
– Relative frequency = Frequency/n
– Percent frequency = Relative frequency* 100
Example
• A bag contains 25 colored balls.
• Raw data: the 25 balls, shown as colored icons on the slide.
• Statistical table:
Color | Tally | Frequency | Relative frequency | Percent
Red | III | 3 | 3/25 = .12 | 12%
Blue | IIIIII | 6 | 6/25 = .24 | 24%
Green | IIII | 4 | 4/25 = .16 | 16%
Orange | IIIII | 5 | 5/25 = .20 | 20%
Brown | III | 3 | 3/25 = .12 | 12%
Yellow | IIII | 4 | 4/25 = .16 | 16%
Graphs
(Figures: a bar chart and a Pareto chart of frequency by color, and a pie chart of the same data.)
Pie chart percentages: Brown 12.0%, Green 16.0%, Yellow 16.0%, Orange 20.0%, Red 12.0%, Blue 24.0%.
Angle = Relative frequency × 360
Example
A sample of 30 persons who often consume donuts
were asked what variety of donuts was their favourite.
The responses from these 30 persons were as follows:
glazed
frosted
glazed
frosted
filled
filled
filled
plain
plain
other
other
filled
other
other
frosted
plain
glazed
glazed
other
glazed
glazed
other
glazed
frosted
glazed
other
frosted
filled
filled
filled
Construct a frequency distribution table for these data.
Solution
Relative Frequency and Percentage Distributions
Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies)
Calculating Percentage Frequency
Percentage Frequency = (Relative frequency) × 100
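The relative-frequency and percentage formulas can be sketched in Python for the donut responses as they are listed in the example. This is only an illustration: the function name `frequency_table` and the dictionary layout are assumptions, and the pie-chart angle uses the rule Angle = Relative frequency × 360 stated earlier.

```python
from collections import Counter

# The 30 responses, in the order they are listed in the example.
responses = [
    "glazed", "frosted", "glazed", "frosted", "filled", "filled",
    "filled", "plain", "plain", "other", "other", "filled",
    "other", "other", "frosted", "plain", "glazed", "glazed",
    "other", "glazed", "glazed", "other", "glazed", "frosted",
    "glazed", "other", "frosted", "filled", "filled", "filled",
]


def frequency_table(values):
    """Return frequency, relative frequency, percent and pie angle per category."""
    counts = Counter(values)
    n = sum(counts.values())
    table = {}
    for category, freq in counts.most_common():
        relative = freq / n
        table[category] = {
            "frequency": freq,
            "relative": relative,
            "percent": relative * 100,
            "angle": relative * 360,
        }
    return table


table = frequency_table(responses)
for category, row in table.items():
    print(f"{category:8s} {row['frequency']:3d} {row['relative']:.3f} "
          f"{row['percent']:5.1f}% {row['angle']:6.1f} deg")
```

The relative frequencies sum to 1 and the angles to 360, which is a quick check that no response was dropped.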
Graphical Presentation of Qualitative Data
A graph made of bars whose heights represent the
frequencies of respective categories is called a bar
graph.
Graphical Presentation of Qualitative Data
A circle divided into portions that represent the
relative frequencies or percentages of a population
or a sample belonging to different categories is
called a pie chart.
Calculating Angle Sizes for the Pie Chart
(Table shown on the slide.)
Pie chart for the percentage distribution
(Figure shown on the slide.)
Scatter Plot
(Figures shown on the slides.)
Dot Plot
Draw the dot plot for the following data, then calculate
the mean, median, and mode
0.86, 0.49, 0.46, 0.52, 0.62, 0.79, 0.75, 0.47, 0.26, 0.43
Mean calculation:
$\sum x = x_1 + x_2 + x_3 + \dots + x_n = 5.65$
$\bar{x} = \frac{\sum x}{n} = \frac{5.65}{10} = 0.565$
Dot Plot
Median calculation: rearrange the data in ascending order.
0.26, 0.43, 0.46, 0.47, 0.49, 0.52, 0.62, 0.75, 0.79, 0.86
n = 10, so the median lies between the 5th and 6th ordered values.
Median = (0.49 + 0.52)/2 = 0.505
Mode: no mode (every value occurs once).
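The mean, median and mode of the dot-plot data can be checked with a short Python sketch. The helper `modes` is a hypothetical name; it assumes the convention used on these slides that a data set in which every value occurs once has no mode.

```python
from collections import Counter
from statistics import mean, median

data = [0.86, 0.49, 0.46, 0.52, 0.62, 0.79, 0.75, 0.47, 0.26, 0.43]


def modes(values):
    """Return the most frequent values, or an empty list if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print("mean  :", round(mean(data), 3))
print("median:", round(median(data), 3))
print("modes :", modes(data))
```

Calling `modes([21, 28, 28, 41, 43, 43])` returns both 28 and 43, the case with more than one mode.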
Stem and Leaf Displays
In a stem-and-leaf display of quantitative data, each
value is divided into two portions – a stem and a leaf.
The leaves for each stem are shown separately in a
display.
Example
The following are the scores of 30 college students
on a statistics test:
75
69
83
52 80
72 81
84 77
96
61
64
65
76
71
79
86
87
71
79
72
87
68
92
93
50
57
95
92
98
Construct a stem-and-leaf display.
Solution
(Stem-and-leaf display shown on the slides.)
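Since the display itself did not survive on the slides, here is a minimal Python sketch of how a stem-and-leaf display could be built for the 30 test scores. It assumes two-digit integer data, with the tens digit as the stem and the units digit as the leaf; `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict

scores = [
    75, 69, 83, 52, 80, 72, 81, 84, 77, 96,
    61, 64, 65, 76, 71, 79, 86, 87, 71, 79,
    72, 87, 68, 92, 93, 50, 57, 95, 92, 98,
]


def stem_and_leaf(values):
    """Group each value into stem (tens digit) and sorted leaves (units digit)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return {stem: groups[stem] for stem in sorted(groups)}


display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

For data such as the monthly rents in the next example, the stem would have to cover more than one digit; that variation is not shown here.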
Example
The following data give the monthly rents paid by a
sample of 30 households selected from a small town.
880
1210
1151
1081 721
985 1231
630 1175
1075 1023
932
850
952 1100
775
825
1140
1235
1000
750
750
915
1140
965
1191
1370
960
1035
1280
Construct a stem-and-leaf display for these data.
Solution
(Stem-and-leaf display shown on the slide.)
Example
Construct a stem-and-leaf display for the given data. (Data shown on the slide.)
Solution
(Stem-and-leaf display shown on the slides.)
Mean
The mean for ungrouped data is obtained by dividing the sum of all values by the number of values in the data set. Thus,
Mean for population data: $\mu = \frac{\sum x}{N}$
Mean for sample data: $\bar{x} = \frac{\sum x}{n}$
where $\sum x$ is the sum of all values; $N$ is the population size; $n$ is the sample size; $\mu$ is the population mean; $\bar{x}$ is the sample mean.
Mean
1. Most common measure of central tendency
2. Acts as 'balance point'
3. Affected by extreme values ('outliers')
4. Denoted $\bar{x}$, where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$
Example
Find the mean of cash donations made by these eight persons.
319, 199, 110, 63, 21, 315, 26, 63
Solution
$\sum x = x_1 + x_2 + \dots + x_8 = 319 + 199 + 110 + 63 + 21 + 315 + 26 + 63 = 1116$
$\bar{x} = \frac{\sum x}{n} = \frac{1116}{8} = 139.5$, that is, $139.5 million
Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$
Median
1. Measure of central tendency
2. Middle value in ordered sequence
• If n is odd, middle value of sequence
• If n is even, average of 2 middle values
3. Position of median in sequence
• n is odd: Order $= \frac{n+1}{2}$
• n is even: Order $= \frac{n}{2}$ and $\frac{n}{2} + 1$
4. Not affected by extreme values
Median Example
• Raw Data: 24.1 22.6 21.5 23.7 22.6
• Ordered: 21.5 22.6 22.6 23.7 24.1
• Position: 1 2 3 4 5
Positioning point $= \frac{n+1}{2} = \frac{5+1}{2} = 3.0$
Median = 22.6
Median Example
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6
Positioning points $= \frac{n}{2}$ and $\frac{n}{2} + 1$, that is, $\frac{6}{2} = 3$ and 4
Median $= \frac{7.7 + 8.9}{2} = 8.30$
Mode
1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative
data
Mode Example
• No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
• More Than 1 Mode
Raw Data: 21 28 28 41 43 43
Range
1. Measure of dispersion
2. Difference between largest & smallest
observations
Range = R = Max Value – Min Value
3. Ignores how data are distributed
(Figure: two dot plots on the scale 7 8 9 10 with different distributions; for each, Range = 10 – 7 = 3.)
Variance & Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{x}$ or $\mu$)
(Figure: dot plot on the scale 4 6 8 10 12 with $\bar{x} = 8.3$ marked.)
Standard Notation
Measure | Sample | Population
Mean | $\bar{x}$ | $\mu$
Standard Deviation | $s$ | $\sigma$
Variance | $s^2$ | $\sigma^2$
Size | $n$ | $N$
Variance and Standard Deviation
Basic Formulas for the Variance and Standard Deviation for Ungrouped Data
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ and $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ and $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
where σ² is the population variance, s² is the sample variance, σ is the population standard deviation, and s is the sample standard deviation.
Variance and Standard Deviation
Short-cut Formulas for the Variance and Standard Deviation for Ungrouped Data
$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$ and $s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}$
$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$ and $s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$
where σ² is the population variance, s² is the sample variance, σ is the population standard deviation, and s is the sample standard deviation.
Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 8.3$
$s^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \dots + (7.7 - 8.3)^2}{6 - 1} = 6.368$
Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
n = 6, $\sum x = 49.8$, $\sum x^2 = 445.18$
$s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)} = \frac{6 \times 445.18 - (49.8)^2}{6 \times 5} = 6.368$
$s = \sqrt{6.368} = 2.523$
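The two routes to the sample variance, the basic formula and the short-cut formula, can be compared in a few lines of Python. The function names below are illustrative; `statistics.variance` from the standard library is included only as an independent check.

```python
from math import sqrt
from statistics import variance

data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]


def variance_basic(xs):
    """Sample variance from squared deviations about the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Sample variance from the sums of x and x squared only."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


s2 = variance_basic(data)
print("basic    :", round(s2, 3))
print("short-cut:", round(variance_shortcut(data), 3))
print("library  :", round(variance(data), 3))
print("s        :", round(sqrt(s2), 3))
```

The short-cut form avoids computing the mean first, which is why it suits hand calculation; in floating point the two can differ in the last digits.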
Summary of Variation Measures
Measure | Formula | Description
Range | $X_{Max} - X_{Min}$ | Total Spread
Standard Deviation (Sample) | $\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ | Dispersion about Sample Mean
Standard Deviation (Population) | $\sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ | Dispersion about Population Mean
Variance (Sample) | $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ | Squared Dispersion about Sample Mean
Box-and-whisker Plot
A box-and-whisker plot, also called a box plot, displays the five-number summary of a set of data.
The five-number summary is
• The minimum value
• First quartile (Q1)
• Median
• Third quartile (Q3)
• The maximum value
Box-and-whisker Plot
In a box plot, we draw a box from the first quartile to
the third quartile. A vertical line goes through the box
at the median. The whiskers go from each quartile to
the minimum or maximum.
(Figure: box plot labeled, from left to right, minimum, lower quartile, median, upper quartile, maximum.)
Example
Construct a box-and-whisker plot for the following
data.
85,92,78,88,90,88,89
Solution
1. Order the test scores from least to greatest.
78, 85, 88, 88, 89, 90, 92
2. Find the median of the test scores.
88
3. Find the quartiles.
• The first quartile (Q1) is the median of the data points to the left of the median.
85
• The third quartile (Q3) is the median of the data points to the right of the median.
90
Solution
4. Complete the five-number summary by finding the min and the max.
Min = 78
Max = 92
Five-number summary: Min = 78, Q1 = 85, Median = 88, Q3 = 90, Max = 92
(Figure: the ordered data 78 85 88 88 89 90 92 with these five values marked.)
Example
Use the given data to make a box-and-whisker plot.
31, 23, 33, 35, 26, 24, 31, 29
Solution
Order the data from least to greatest. Then find the
minimum, lower quartile, median, upper quartile, and
maximum.
23 24 26 29 31 31 33 35
median: (29 + 31)/2 = 30
lower quartile: (24 + 26)/2 = 25
upper quartile: (31 + 33)/2 = 32
minimum: 23
maximum: 35
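The quartile rule used in both box-plot examples (Q1 and Q3 are the medians of the values to the left and right of the median) can be sketched in Python as follows. The name `five_number_summary` is illustrative, and other quartile conventions exist that give slightly different values.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values to the left of the median
    upper = xs[half + n % 2:]    # values to the right (skips the middle value if n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


print(five_number_summary([85, 92, 78, 88, 90, 88, 89]))
print(five_number_summary([31, 23, 33, 35, 26, 24, 31, 29]))
```

Both calls reproduce the summaries worked out by hand in the two examples.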
Solution
Draw the box and whiskers.
Draw a number line and plot a point above each value.
23 24 26 29 31 31 33 35
(Figure: box-and-whisker plot drawn over a number line running from 22 to 38 in steps of 2.)
Frequency Histograms
• Divide the range of the data into 5-12 subintervals of equal length.
• Calculate the approximate width of the subinterval as Range/number of subintervals.
• Round the approximate width up to a convenient value.
• Sturges' rule: K = 1 + 3.3 log(N), where K is the number of subintervals, N is the number of observations, and log is the base-10 logarithm.
• Create a statistical table including the subintervals, their frequencies and relative frequencies.
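As a small illustration of the first steps, the sketch below evaluates Sturges' rule and the rounded-up class width in Python. The function names are assumptions, and rounding up to the next integer is just one reading of "a convenient value".

```python
from math import ceil, log10


def sturges_classes(n):
    """Suggested number of subintervals K = 1 + 3.3 log10(N)."""
    return 1 + 3.3 * log10(n)


def class_width(lowest, highest, k):
    """Approximate width Range / k, rounded up to the next integer."""
    return ceil((highest - lowest) / k)


print(sturges_classes(100))     # about 7.6, so 7 or 8 classes
print(class_width(30, 99, 7))   # range 69 over 7 classes, rounded up
```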
Example
• The following are balances (in $) of 100 accounts receivable taken from the ledger of XYZ Store.
31 38 41 52 59 46 74 69 93 60
69 83 78 74 77 35 79 80 71 65
56 69 34 33 92 37 60 43 51 61
74 68 83 49 34 71 58 83 94 66
78 48 34 50 68 65 64 95 92 81
77 84 41 40 38 38 60 67 50 86
76 99 38 94 48 70 80 95 98 42
55 49 54 60 62 70 88 94 85 51
59 68 51 87 53 57 54 46 46 76
69 64 61 63 78 55 66 73 75 64
Example
• Using 7 equal intervals with the lowest starting at 30, compute the mean and the variance using the short-cut method.
• Calculate the mode and median (analytically and graphically).
• Estimate the value below which 75% of the values fall.
Solution
• Determine the Min value = 31
• Determine the Max value = 99
• Calculate the range = Max – Min
• But the starting point is given as 30, so use Min = 30
• Range = 99 – 30 = 69
• Interval length C = Range / No. of intervals
• C = 69 / 7 = 9.86, rounded up to 10
Solution
Classes
L.L | U.L
30 | 39
40 | 49
50 | 59
60 | 69
70 | 79
80 | 89
90 | 99
Solution
f = Frequency, F = Cumulative Frequency, f relative = Relative Frequency, X = class mark
L.L | U.L | f | F | f relative | X
30 | 39 | 11 | 11 | 0.11 | 34.5
40 | 49 | 12 | 23 | 0.12 | 44.5
50 | 59 | 16 | 39 | 0.16 | 54.5
60 | 69 | 23 | 62 | 0.23 | 64.5
70 | 79 | 17 | 79 | 0.17 | 74.5
80 | 89 | 11 | 90 | 0.11 | 84.5
90 | 99 | 10 | 100 | 0.10 | 94.5
Sum | | 100 | | 1 |
Class width C = 10
Class mark (X) or mid point = (L.L + U.L)/2
X1 = (30 + 39)/2 = 34.5
Solution
L.L | U.L | f | X | f*X | f*(X - X̄)² | f*X²
30 | 39 | 11 | 34.5 | 379.5 | 9637.76 | 13092.75
40 | 49 | 12 | 44.5 | 534 | 4609.92 | 23763
50 | 59 | 16 | 54.5 | 872 | 1474.56 | 47524
60 | 69 | 23 | 64.5 | 1483.5 | 3.68 | 95685.75
70 | 79 | 17 | 74.5 | 1266.5 | 1838.72 | 94354.25
80 | 89 | 11 | 84.5 | 929.5 | 4577.76 | 78542.75
90 | 99 | 10 | 94.5 | 945 | 9241.6 | 89302.5
Sum | | 100 | | 6410 | 31384 | 442265
Mean: $\bar{X} = \frac{\sum f_i X_i}{n} = \frac{6410}{100} = 64.1$
Variance: $S^2 = \frac{\sum f_i (X_i - \bar{X})^2}{n - 1} = \frac{n \sum f_i X_i^2 - (\sum f_i X_i)^2}{n(n - 1)} = 317.010101$
S.D (s) = 17.80477748
C.V: $CV = \frac{s}{\bar{X}} = 0.277765639$
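The grouped-data calculation can be reproduced from the raw balances with a short Python sketch. It assumes integer-valued data, so that the class 30-39 has class mark 30 + (10 - 1)/2 = 34.5; the names `grouped_table` and `grouped_mean_variance` are illustrative.

```python
balances = [
    31, 38, 41, 52, 59, 46, 74, 69, 93, 60,
    69, 83, 78, 74, 77, 35, 79, 80, 71, 65,
    56, 69, 34, 33, 92, 37, 60, 43, 51, 61,
    74, 68, 83, 49, 34, 71, 58, 83, 94, 66,
    78, 48, 34, 50, 68, 65, 64, 95, 92, 81,
    77, 84, 41, 40, 38, 38, 60, 67, 50, 86,
    76, 99, 38, 94, 48, 70, 80, 95, 98, 42,
    55, 49, 54, 60, 62, 70, 88, 94, 85, 51,
    59, 68, 51, 87, 53, 57, 54, 46, 46, 76,
    69, 64, 61, 63, 78, 55, 66, 73, 75, 64,
]


def grouped_table(values, start, width, k):
    """Return class marks and frequencies for k classes of equal width."""
    freqs = [0] * k
    for v in values:
        freqs[(v - start) // width] += 1
    # Class mark = (lower limit + upper limit) / 2 for integer class limits.
    marks = [start + i * width + (width - 1) / 2 for i in range(k)]
    return marks, freqs


def grouped_mean_variance(marks, freqs):
    """Mean and sample variance computed from class marks and frequencies."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
    return mean, var


marks, freqs = grouped_table(balances, start=30, width=10, k=7)
mean, var = grouped_mean_variance(marks, freqs)
print(freqs)
print(round(mean, 2), round(var, 4), round(var ** 0.5, 4), round(var ** 0.5 / mean, 4))
```

These are statistics of the class marks, not of the raw values, so they approximate rather than equal the ungrouped mean and variance.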
histogram
(Figures: the frequency histogram built up one class at a time; vertical axis frequency from 0 to 25.)
Take scale every 1 cm = 10 $ or degree as given in your data.
(Figure: bars centered on the class marks X1 = 34.5, X2 = 44.5, X3 = 54.5, ..., X7 = 94.5, each of width C and extending C/2 on either side of its class mark.)
Solution
L.L | U.L | L.B | U.B
30 | 39 | 29.5 | 39.5
40 | 49 | 39.5 | 49.5
50 | 59 | 49.5 | 59.5
60 | 69 | 59.5 | 69.5
70 | 79 | 69.5 | 79.5
80 | 89 | 79.5 | 89.5
90 | 99 | 89.5 | 99.5
The graph of the histogram must be drawn on the boundaries, not on the limits.
L.B for class i: $L.B_i = \frac{L.L_i + U.L_{i-1}}{2}$
U.B for class i: $U.B_i = \frac{L.L_{i+1} + U.L_i}{2}$
L.B 1 = (30 + 29)/2 = 29.5
U.B 1 = (39 + 40)/2 = 39.5
L.B 2 = (40 + 39)/2 = 39.5
Then L.B i = U.B i-1, or U.B i = L.B i+1
L.L and U.L are the lower limit and upper limit of the class.
L.B and U.B are the lower boundary and upper boundary of the class.
histogram
(Figure: frequency histogram drawn on the class boundaries 29.5, 39.5, 49.5, 59.5, ...; horizontal axis class mark from 24.5 to 104.5, vertical axis frequency from 0 to 25, with a second vertical axis for relative frequency fr.)
Polygon
(Figures: the frequency polygon built up step by step over the class marks; same axes as the histogram.)
histogram
(Figure: the mode located graphically on the histogram.)
Median
Class limit | fi | Less than F
30-39 | 11 | 11
40-49 | 12 | 23
50-59 | 16 | 39
60-69 | 23 | 62
70-79 | 17 | 79
80-89 | 11 | 90
90-99 | 10 | 100
Sum | 100 |
$\tilde{X} = L_{med} + C \cdot \frac{\frac{n}{2} - F_{med-1}}{f_{med}}$
n/2 = 50, so the median class is 60-69, with $L_{med}$ = 60 - 0.5 = 59.5
Median = 59.5 + 10(50 - 39)/23 = 64.28
Mode
Class limit | fi
30-39 | 11
40-49 | 12
50-59 | 16
60-69 | 23
70-79 | 17
80-89 | 11
90-99 | 10
Sum | 100
$\hat{X} = L_{mod} + C \cdot \frac{\Delta_1}{\Delta_1 + \Delta_2}$
The modal class is 60-69, with $L_{mod}$ = 60 - 0.5 = 59.5
$\Delta_1 = 23 - 16 = 7$
$\Delta_2 = 23 - 17 = 6$
Mode = 59.5 + 10(7/13) = 64.88
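The analytical median and mode formulas for grouped data can be sketched in Python as below. The function names are illustrative, and the mode function assumes a single modal class whose frequency exceeds at least one of its neighbours.

```python
lower_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [11, 12, 16, 23, 17, 11, 10]
C = 10  # class width


def grouped_median(lbs, fs, width):
    """L_med + C * (n/2 - F_before) / f_med for the class containing n/2."""
    n = sum(fs)
    cum_before = 0
    for lb, f in zip(lbs, fs):
        if cum_before + f >= n / 2:
            return lb + width * (n / 2 - cum_before) / f
        cum_before += f
    raise ValueError("empty frequency table")


def grouped_mode(lbs, fs, width):
    """L_mod + C * d1 / (d1 + d2) for the class with the largest frequency."""
    i = max(range(len(fs)), key=fs.__getitem__)
    before = fs[i - 1] if i > 0 else 0
    after = fs[i + 1] if i + 1 < len(fs) else 0
    d1 = fs[i] - before
    d2 = fs[i] - after
    return lbs[i] + width * d1 / (d1 + d2)


print(round(grouped_median(lower_boundaries, freqs, C), 2))
print(round(grouped_mode(lower_boundaries, freqs, C), 2))
```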
Ogives (Less Than & More Than)
Lower Boundary | Less Than | More Than
29.5 | 0 | 100
39.5 | 11 | 89
49.5 | 23 | 77
59.5 | 39 | 61
69.5 | 62 | 38
79.5 | 79 | 21
89.5 | 90 | 10
99.5 | 100 | 0
More Than = n - Less Than, so More Than + Less Than = n
Ogives (Less Than & More Than)
(Figure: the Less Than and More Than ogives, cumulative frequency from 0 to 110 against lower boundary from 29.5 to 99.5.)
The median is read at cumulative frequency n/2 = 50, where the two ogives intersect.
Ogives (Less Than & More Than)
Estimate the value below which 75% of the values fall.
n = 100 corresponds to 100%, so 75% corresponds to a cumulative frequency of 75.
At frequency value 75, draw a horizontal line that cuts the Less Than and More Than ogives, then read off the required values on the lower-boundary axis:
• 75% of the sample lies below the value 77 (Less Than ogive).
• 75% of the sample lies above the value 51 (More Than ogive).
(Figure: the two ogives with the horizontal line at cumulative frequency 75.)
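Reading a value off an ogive amounts to linear interpolation between two neighbouring boundary points. The Python sketch below does this for both ogives; `value_at_cumulative` is an illustrative name, and straight-line segments between the plotted points are assumed, as in the hand-drawn graph.

```python
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
less_than = [0, 11, 23, 39, 62, 79, 90, 100]
n = less_than[-1]
more_than = [n - c for c in less_than]  # More Than = n - Less Than


def value_at_cumulative(xs, cumulative, target):
    """Interpolate the boundary value where the ogive reaches `target`."""
    for i in range(len(xs) - 1):
        c0, c1 = cumulative[i], cumulative[i + 1]
        if c0 != c1 and min(c0, c1) <= target <= max(c0, c1):
            return xs[i] + (xs[i + 1] - xs[i]) * (target - c0) / (c1 - c0)
    raise ValueError("target is outside the range of the ogive")


below_75 = value_at_cumulative(boundaries, less_than, 75)
above_75 = value_at_cumulative(boundaries, more_than, 75)
median = value_at_cumulative(boundaries, less_than, n / 2)
print(round(below_75, 2), round(above_75, 2), round(median, 2))
```

The interpolated values agree, after rounding, with the graphical readings of about 77 and 51, and the median matches the analytical result.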
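Short Cut Method
$\bar{X} = X_0 + C \frac{\sum f_i d_i}{n}$
$S^2 = C^2 \cdot \frac{n \sum f_i d_i^2 - (\sum f_i d_i)^2}{n(n - 1)}$
where $X_0$ is the class mark chosen as origin and $d_i = (X_i - X_0)/C$.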
Short Cut Method
L.L | U.L | f | F | f relative | d | f*d | f*d²
30 | 39 | 11 | 11 | 0.11 | -3 | -33 | 99
40 | 49 | 12 | 23 | 0.12 | -2 | -24 | 48
50 | 59 | 16 | 39 | 0.16 | -1 | -16 | 16
60 | 69 | 23 | 62 | 0.23 | 0 | 0 | 0
70 | 79 | 17 | 79 | 0.17 | 1 | 17 | 17
80 | 89 | 11 | 90 | 0.11 | 2 | 22 | 44
90 | 99 | 10 | 100 | 0.10 | 3 | 30 | 90
Sum | | 100 | | 1 | | -4 | 314
$\bar{X} = X_0 + C \frac{\sum f_i d_i}{n}$
$\bar{X}$ = 64.5 + 10(-4/100) = 64.1
$S^2 = C^2 \cdot \frac{n \sum f_i d_i^2 - (\sum f_i d_i)^2}{n(n - 1)}$
S² = 10² × [(100 × 314 - (-4)²)/(100 × 99)] = 317.010101
S.D (s) = 17.80477748
C.V: $CV = \frac{s}{\bar{X}} = 0.277765639$
Short Cut Method
The same mean is obtained with a different origin, here $X_0$ = 34.5:
L.L | U.L | f | F | f relative | d | f*d
30 | 39 | 11 | 11 | 0.11 | 0 | 0
40 | 49 | 12 | 23 | 0.12 | 1 | 12
50 | 59 | 16 | 39 | 0.16 | 2 | 32
60 | 69 | 23 | 62 | 0.23 | 3 | 69
70 | 79 | 17 | 79 | 0.17 | 4 | 68
80 | 89 | 11 | 90 | 0.11 | 5 | 55
90 | 99 | 10 | 100 | 0.10 | 6 | 60
Sum | | 100 | | 1 | | 296
$\bar{X} = X_0 + C \frac{\sum f_i d_i}{n}$
$\bar{X}$ = 34.5 + 10(296/100) = 64.1
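The short-cut (coded) method can be sketched in Python to show that the choice of origin does not change the results. The function name and argument layout are assumptions made for this illustration.

```python
marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [11, 12, 16, 23, 17, 11, 10]


def coded_mean_variance(marks, freqs, origin, width):
    """Mean and sample variance from coded deviations d = (X - X0) / C."""
    n = sum(freqs)
    d = [(x - origin) / width for x in marks]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    mean = origin + width * sum_fd / n
    var = width ** 2 * (n * sum_fd2 - sum_fd ** 2) / (n * (n - 1))
    return mean, var


# Origin at the middle class mark, then at the first class mark.
print(coded_mean_variance(marks, freqs, origin=64.5, width=10))
print(coded_mean_variance(marks, freqs, origin=34.5, width=10))
```

Choosing the origin near the centre of the table keeps the coded values small, which is the practical reason for using 64.5 in hand calculation.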
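Example
Complete the following table, and then find the mean, mode, median, variance and CV.
Draw the Histogram, Frequency polygon, Relative frequency histogram
All classes have the same width C.
Class limits | xi | fi | F | di | fi di | fi di²
50 - ? | ? | 8 | ? | ? | ? | ?
? - ? | ? | 10 | ? | ? | ? | ?
? - ? | ? | ? | 34 | 0 | ? | ?
? - ? | ? | 14 | ? | ? | ? | ?
? - ? | ? | 10 | ? | ? | ? | ?
? - ? | ? | ? | ? | ? | ? | ?
? - 119 | ? | ? | 65 | ? | ? | 16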
Solution
The seven classes of width C run from 50 up to 120: 120 = 50 + 7C, then C = 10.
Class limits | xi | fi | F | di | fi di | fi di²
50 - 59 | 54.5 | 8 | 8 | -2 | -16 | 32
60 - 69 | 64.5 | 10 | 18 | -1 | -10 | 10
70 - 79 | 74.5 | 16 | 34 | 0 | 0 | 0
80 - 89 | 84.5 | 14 | 48 | 1 | 14 | 14
90 - 99 | 94.5 | 10 | 58 | 2 | 20 | 40
100 - 109 | 104.5 | 6 | 64 | 3 | 18 | 54
110 - 119 | 114.5 | 1 | 65 | 4 | 4 | 16
Sum | | 65 | | | 30 | 166
Short Cut Method
$\bar{X} = X_0 + C \frac{\sum f_i d_i}{n}$
Mean = 74.5 + 10(30/65) = 79.12
$S^2 = C^2 \cdot \frac{n \sum f_i d_i^2 - (\sum f_i d_i)^2}{n(n - 1)}$
Variance = (10)² [65 × 166 - (30)²] / [65 × 64] = 237.74
S.D = (237.74)^0.5 = 15.42
Shape
1. Describes how data are distributed
2. Measures of Shape
• Skew = Symmetry
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
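Moment
About the Origin
$m'_K = \frac{\sum f_i X_i^K}{n}$
$m'_1 = \frac{\sum f_i X_i}{n} = \bar{X}$
$m'_2 = \frac{\sum f_i X_i^2}{n}$
$m'_3 = \frac{\sum f_i X_i^3}{n}$
$m'_4 = \frac{\sum f_i X_i^4}{n}$
About the Mean
$m_K = \frac{\sum f_i (X_i - \bar{X})^K}{n}$
$m_1 = \frac{\sum f_i (X_i - \bar{X})}{n} = 0$
$m_2 = \frac{\sum f_i (X_i - \bar{X})^2}{n}$
$m_3 = \frac{\sum f_i (X_i - \bar{X})^3}{n}$
$m_4 = \frac{\sum f_i (X_i - \bar{X})^4}{n}$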
Moment
• Coefficient of Skewness $= \frac{m_3}{\sqrt{m_2^3}}$ (see page 38)
Coefficient < 0: Left-Skewed (Mean < Median), skewness to left
Coefficient = 0: Symmetric (Mean = Median), normal distribution
Coefficient > 0: Right-Skewed (Median < Mean), skewness to right
Moment
• Coefficient of Kurtosis $= \frac{m_4}{m_2^2}$
Coefficient > 3: Leptokurtic
Coefficient = 3: Symmetric, normal distribution
Coefficient < 3: Platykurtic
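The moment-based shape measures can be sketched in Python for grouped data. This is an illustration only: it reads the garbled skewness denominator as m2 raised to the power 3/2, which is an assumption, and the function names are hypothetical. The frequencies are those of the 100 account balances used earlier.

```python
marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [11, 12, 16, 23, 17, 11, 10]


def central_moment(marks, freqs, k):
    """k-th moment about the mean: sum f (X - mean)^k / n."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    return sum(f * (x - mean) ** k for f, x in zip(freqs, marks)) / n


def skewness(marks, freqs):
    """m3 / m2^(3/2): negative for left skew, positive for right skew."""
    return central_moment(marks, freqs, 3) / central_moment(marks, freqs, 2) ** 1.5


def kurtosis(marks, freqs):
    """m4 / m2^2: compared with 3, the value for a normal distribution."""
    return central_moment(marks, freqs, 4) / central_moment(marks, freqs, 2) ** 2


print(central_moment(marks, freqs, 2))
print(skewness(marks, freqs), kurtosis(marks, freqs))
```

Note that the moments divide by n, whereas the sample variance formula elsewhere in these notes divides by n - 1, so m2 is slightly smaller than S².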
Example
• From the given graph, complete the following tables, draw the histogram
and polygon, determine the mode and median graphically, and calculate
the mean, median, mode, variance, standard deviation, and coefficient
of variation
Class limits
xi
fi
fr
Solution
Class limits | xi | fi | fr | d | fd
12 - 16 | 14 | 6 | 6/80 | -3 | -18
17 - 21 | 19 | 8 | 8/80 | -2 | -16
22 - 26 | 24 | 14 | 14/80 | -1 | -14
27 - 31 | 29 | 24 | 24/80 | 0 | 0
32 - 36 | 34 | 14 | 14/80 | 1 | 14
37 - 41 | 39 | 8 | 8/80 | 2 | 16
42 - 46 | 44 | 6 | 6/80 | 3 | 18
Sum | | 80 | 1 | | 0
$\bar{X}$ = 29 + 5(0/80) = 29
Example
• Complete the table, then compute the mean and variance, and find the mode and median analytically and graphically.
Class limit | Frequency | Relative frequency | Boundaries | Cumulative frequency
? - ? | ? | ? | More than ? | 100
20 - ? | ? | ? | More than 19.95 | 92
? - ? | 17 | ? | More than ? | ?
? - ? | ? | ? | More than ? | 46
? - ? | ? | 0.12 | More than 37.95 | ?
? - ? | ? | ? | More than ? | 5
Sum | ? | ? | More than ? | ?
Solution
Class limit | Frequency | Relative frequency | Boundaries | Cumulative frequency
14 - 19.9 | 8 | 0.08 | More than 13.95 | 100
20 - 25.9 | 29 | 0.29 | More than 19.95 | 92
26 - 31.9 | 17 | 0.17 | More than 25.95 | 63
32 - 37.9 | 29 | 0.29 | More than 31.95 | 46
38 - 43.9 | 12 | 0.12 | More than 37.95 | 17
44 - 49.9 | 5 | 0.05 | More than 43.95 | 5
Sum | 100 | 1 | More than 49.95 | 0
Solution
L.L | U.L | f | d | f d | f d²
14 | 19.9 | 8 | -3 | -24 | 72
20 | 25.9 | 29 | -2 | -58 | 116
26 | 31.9 | 17 | -1 | -17 | 17
32 | 37.9 | 29 | 0 | 0 | 0
38 | 43.9 | 12 | 1 | 12 | 12
44 | 49.9 | 5 | 2 | 10 | 20
Sum | | 100 | | -77 | 237
Mean $\bar{X}$ = 30.33
Variance S² = 64.62181818
S.D = 8.038769693
Solution
(Figure: histogram titled "viscosity" with class frequencies 8, 29, 17, 29, 12, 5 at class marks 16.95, 22.95, 28.95, 34.95, 40.95, 46.95; the mode is marked on each of the two tallest bars.)
Solution
(Figure: Less Than and More Than ogives against lower boundary; Less Than values 0, 8, 37, 54, 83, 95, 100 and More Than values 100, 92, 63, 46, 17, 5, 0; the median is read at cumulative frequency 50.)