Elementary Statistics

Elementary Statistics
Dr. Ghamsary
Chapter 2
Elementary Statistics
M. Ghamsary, Ph.D.
Chapter 02
Descriptive Statistics
Grouped vs Ungrouped Data
• Ungrouped data: data that have not been summarized in any way; they are also called raw data.
• Grouped data: data that have been organized into a frequency distribution.
Raw Data: When data are collected in original form, they are called raw data.
The following are the scores on the first test of the statistics class in fall of 2004.
76  62  68  69  79  90  79  86  52  97
78  55  96  89  73  66  88  92  94  50
71  89  78  88  58  76  59  92  93  88
86  66  81  85  85  70  55  62  80  60
80  72  82  86  99  63  75  83  78  61
Table 2.1: Data from Test #1 of fall 2007
Stem-and-Leaf: One method of displaying a set of data is with a stem-and-leaf plot.
Stem | Leaf
5    | 025589
6    | 012236689
7    | 012356688899
8    | 001235566688899
9    | 02234679
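The stem-and-leaf display above can be reproduced with a short Python sketch. This is only an illustration: the function name `stem_and_leaf` is hypothetical, and it assumes two-digit integer scores so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# Scores from Table 2.1
scores = [
    76, 62, 68, 69, 79, 90, 79, 86, 52, 97,
    78, 55, 96, 89, 73, 66, 88, 92, 94, 50,
    71, 89, 78, 88, 58, 76, 59, 92, 93, 88,
    86, 66, 81, 85, 85, 70, 55, 62, 80, 60,
    80, 72, 82, 86, 99, 63, 75, 83, 78, 61,
]

def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted leaves (units digits) as a string."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # e.g. 76 -> stem 7, leaf 6
        plot[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in plot.items()}

plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(stem, "|", plot[stem])
```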
Grouped Data: raw data that have been organized into a frequency distribution.
Frequency Distribution: the organization of raw data in table form, using classes and frequencies.
Class | Frequency
50-59 | 6
60-69 | 9
70-79 | 12
80-89 | 15
90-99 | 8
• Classes: The number of classes in the above table is 5.
• Class Limits: the smallest and largest data values that can belong to each class.
• Lower Class Limit: the lowest number in each class. In the above table 50 is the lower class limit of the first class, 60 is the lower class limit of the 2nd class, etc.
• Upper Class Limit: the highest number in each class. In the above table 59 is the upper class limit of the first class, 69 is the upper class limit of the 2nd class, etc.
• Class Width: found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class. In the above table the class width is 60 - 50 = 10.
Class Boundaries are used to separate the classes so that there are no gaps in the frequency
distribution.
Class | Class Boundaries | Frequency
50-59 | 49.5-59.5        | 6
60-69 | 59.5-69.5        | 9
70-79 | 69.5-79.5        | 12
80-89 | 79.5-89.5        | 15
90-99 | 89.5-99.5        | 8
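Cumulative Frequency: the running total of the frequencies up to and including a given class.
Relative Frequency: the frequency of a class divided by the total number of observations, n.
Class | Frequency | Cumulative Frequency | Relative Frequency
50-59 | 6         | 6                    | 6/50=0.12
60-69 | 9         | 9+6=15               | 9/50=0.18
70-79 | 12        | 12+15=27             | 12/50=0.24
80-89 | 15        | 15+27=42             | 15/50=0.30
90-99 | 8         | 8+42=50              | 8/50=0.16
n=50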
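As a cross-check of the frequency, cumulative frequency, and relative frequency columns, here is a minimal Python sketch. The function name `frequency_table` and its parameters are illustrative assumptions, and the sketch assumes integer scores and equal-width classes starting at 50 with width 10, as in the table above.

```python
from itertools import accumulate

# Scores from Table 2.1
scores = [
    76, 62, 68, 69, 79, 90, 79, 86, 52, 97,
    78, 55, 96, 89, 73, 66, 88, 92, 94, 50,
    71, 89, 78, 88, 58, 76, 59, 92, 93, 88,
    86, 66, 81, 85, 85, 70, 55, 62, 80, 60,
    80, 72, 82, 86, 99, 63, 75, 83, 78, 61,
]

def frequency_table(data, start, width, n_classes):
    """Return rows of (lower limit, upper limit, frequency, cumulative, relative)."""
    counts = [0] * n_classes
    for value in data:
        counts[(value - start) // width] += 1  # index of the class containing value
    cumulative = list(accumulate(counts))      # running totals
    n = len(data)
    rows = []
    for k, (f, cf) in enumerate(zip(counts, cumulative)):
        lower = start + k * width
        rows.append((lower, lower + width - 1, f, cf, f / n))
    return rows

table = frequency_table(scores, start=50, width=10, n_classes=5)
for lower, upper, f, cf, rel in table:
    print(f"{lower}-{upper}  f={f:2d}  cumulative={cf:2d}  relative={rel:.2f}")
```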
Most Popular Graphs in Statistics
The most commonly used graphs in statistics are:
1. The Histogram
2. The Frequency Polygon
3. The Cumulative Frequency Graph
4. The Bar Chart
5. Pie Chart
6. Pareto Charts
7. Dot Plot
8. Stem-Leaf
9. Time Series Graph
1. The Histogram
o Making decisions about a process, product, or procedure that could be improved after
examining the variation (example: Should the school invest in a computer-based tutoring
program for low achieving students in Algebra I after examining the grade distribution? Are
more shafts being produced out of specifications that are too big rather than too small?)
o Displaying easily the variation in the process (example: Which units are causing the most
difficulty for students? Is the variation in a process due to parts that are too long or parts that
are too short?)
[Figure: Histogram of Test1 with fitted normal curve. Horizontal axis: Test1; vertical axis: Frequency. Mean = 76.8, StDev = 12.98, N = 50.]
[Figure: Probability Plot of Test1, Normal - 95% CI. Horizontal axis: Test1; vertical axis: Percent. Mean = 76.8, StDev = 12.98, N = 50, AD = 0.537, P-Value = 0.161.]
2. The frequency polygon
o Making decisions about a process, product or procedure that could be improved
(example: a frequency polygon for 642 psychology test scores, shown below to the right.)
Midpoint x | Frequency f
54.5       | 6
64.5       | 9
74.5       | 12
84.5       | 15
94.5       | 8
[Figure: Scatterplot of f vs x. Horizontal axis: Midpoints x; vertical axis: f.]
3. The Cumulative Frequency Graph (Ogive)
Cumulative frequency is used to determine the number of observations that lie above
(or below) a particular value.
Upper Class Boundaries | Cumulative Frequency
59.5                   | 6
69.5                   | 15
79.5                   | 27
89.5                   | 42
99.5                   | 50
[Figure: Scatterplot of Cumulative f vs x. Horizontal axis: Upper Class Boundaries; vertical axis: Cumulative f.]
4. The bar chart
Bar charts are useful for comparing classes or groups of data. A class or group can have a
single category of data or they can be broken down further into multiple categories for
greater depth of analysis.
Class | Grade | Frequency
50-59 | F     | 6
60-69 | D     | 9
70-79 | C     | 12
80-89 | B     | 15
90-99 | A     | 8
[Figure: Bar chart of Frequency by Grade (F, D, C, B, A). Vertical axis: Frequency.]
5. Pie Chart
o A pie chart is a way of summarizing a set of categorical data or displaying the different
values of a given variable (example: percentage distribution).
o Pie charts usually show the component parts of a whole. Often you will see a segment of the drawing separated from the rest of the pie in order to emphasize an important piece of information.
[Figure: Pie chart of grades. A: 8, 16.0%; F: 6, 12.0%; D: 9, 18.0%; B: 15, 30.0%; C: 12, 24.0%.]
6. Pareto Charts
A Pareto chart is used to graphically summarize and display the relative importance of the
differences between groups of data.
[Figure: Pareto chart of grades, with bars in the order B, C, D, A, F. Vertical axis: Frequency.]
7. Dot plot
A dot plot displays each data value as a dot above its position on a number line, so that repeated values stack vertically.
[Figure: Dotplot of Test1. Horizontal axis: Test1.]
8. Stem-Leaf
o The Stem-and-Leaf Plot summarizes the shape of a set of data (the distribution) and
provides extra detail regarding individual values.
o They are usually used when there are large amounts of numbers to analyze. Series of
scores on sports teams, series of temperatures or rainfall over a period of time, series of
classroom test scores are examples of when Stem and Leaf Plots could be used.
Stem | Leaf
5    | 025589
6    | 012236689
7    | 012356688899
8    | 001235566688899
9    | 02234679
9. Time series Graph
Month | Price of AOL | Price of MSFT
Jan   | 65           | 110
Feb   | 60           | 115
Mar   | 58           | 120
Apr   | 62           | 100
May   | 55           | 95
Jun   | 50           | 90
Jul   | 48           | 85
Aug   | 55           | 75
Sep   | 57           | 80
Oct   | 50           | 60
Nov   | 48           | 50
Dec   | 40           | 40
[Figure: Time Series Plot of AOL, MSFT. Horizontal axis: Month; one line per variable (AOL, MSFT).]
Type of Distributions:
There are several different kinds of distributions, but the following are the most commonly used in statistics.
• Symmetric, normal, or bell shaped
• Positively skewed, right tailed, or skewed to the right side
• Negatively skewed, left tailed, or skewed to the left side
• Uniform
• Symmetric, Bell Shape, or Normal Distribution
[Figure: histogram of a symmetric, bell-shaped distribution]
• Positively skewed
[Figure: histogram of a positively skewed distribution]
• Negatively skewed
[Figure: histogram of a negatively skewed distribution]
• Uniform
[Figure: histogram of a uniform distribution]
Test1 | Sex | Grade || Test1 | Sex | Grade
76    | 1   | C     || 76    | 1   | C
62    | 1   | D     || 59    | 1   | F
68    | 1   | D     || 92    | 1   | A
69    | 1   | D     || 93    | 1   | A
79    | 0   | C     || 88    | 0   | B
90    | 0   | A     || 86    | 0   | B
79    | 1   | C     || 66    | 0   | D
86    | 1   | B     || 81    | 1   | B
52    | 0   | F     || 85    | 0   | B
97    | 1   | A     || 85    | 0   | B
78    | 1   | C     || 70    | 1   | C
55    | 1   | F     || 55    | 1   | F
96    | 1   | A     || 62    | 1   | D
89    | 1   | B     || 80    | 1   | B
73    | 0   | C     || 60    | 1   | D
66    | 0   | D     || 80    | 1   | B
88    | 1   | B     || 72    | 1   | C
92    | 0   | A     || 82    | 0   | B
94    | 1   | A     || 86    | 1   | B
50    | 1   | F     || 99    | 1   | A
71    | 0   | C     || 63    | 1   | D
89    | 0   | B     || 75    | 1   | C
78    | 1   | C     || 83    | 1   | B
88    | 0   | B     || 78    | 0   | C
58    | 1   | F     || 61    | 1   | D
Sex: 1=Female 0=Male
[Figure: Chart of Count by Sex (Male, Female), grouped by Grade (F, D, C, B, A).]
[Figure: Chart of Count by Grade (F, D, C, B, A), grouped by Sex (Male, Female).]
[Figure: Chart of Percent by Grade (F, D, C, B, A), grouped by Sex (Male, Female).]
[Figure: Boxplot of Test1 vs Sex (Female, Male). Vertical axis: Test1.]
Numerical measurements:
• Statistic: any value or measure obtained from a sample.
• Parameter: any value or measure obtained from a specific population.
Measures of central tendency: the Mean, Median, and Mode.
Mean: the sum of the scores in the data set divided by the total number of scores.
o Sample Mean: denoted by $\bar{x}$ and defined by
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \text{or simply} \quad \bar{x} = \frac{\sum x}{n}.$$
o Population Mean: denoted by $\mu$ and defined by
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \quad \text{or simply} \quad \mu = \frac{\sum x}{N}.$$
Note: The sample mean $\bar{x}$ is an unbiased estimate of the population mean $\mu$.
Example1: Find the mean of 10, 7, 3, 12, 18.
$$\bar{x} = \frac{10 + 7 + 3 + 12 + 18}{5} = \frac{50}{5} = 10.$$
Example2: Find the mean of 10, 7, 3, 12, 18, 13, 17, 15, 25, 30.
$$\bar{x} = \frac{10 + 7 + 3 + 12 + 18 + 13 + 17 + 15 + 25 + 30}{10} = \frac{150}{10} = 15$$
Example3: Find the mean of the scores on test #1, 2004, in the data set in this chapter.
$$\bar{x} = \frac{76 + 62 + \cdots + 78 + 61}{50} = 76.8$$
Median: is defined to be the midpoint of the data set that is arranged from smallest to largest.
Example4: Find the median of 10, 7, 3, 12, 15.
Solution: First we must sort the data set as follows: 3, 7, 10, 12, 15.
The median is 10.
Example5: Find the median of 10, 7, 3, 12, 15, 20.
Solution: After we sort we get: 3, 7, 10, 12, 15, 20.
As we observe, there are 2 middle observations. So to find the median we average these 2 values,
namely: Median=(10+12)/2 =11.
Example6: The median of the scores on test #1, 2004, in the data set is 78.50.
Mode: is defined to be the value in the data set that occurs most frequently.
Example7A: Find the mode of 10, 7, 3, 12, 15, 3.
Mode is 3.
Example7B: Find the mode of 10, 7, 3, 10, 15, 3.
Modes are 3 and 10.
Example7C: Find the mode of 10, 7, 3, 10, 10, 3.
Mode is 10.
Example7D: Find the mode of 10, 7, 3, 10, 7, 3.
There is no mode, since all values occur with same frequency
Example7E: Find the mode of 10, 7, 3, 12, 15, 18.
There is no mode, since no values occur more than once.
Example 8: Find the mean, the median, and the mode of the data set:
10, 17, 13, 12, 15, 18, 10, 17, 14, 16, 35, 28, 22, 17, 23, 12, 15, 28, 10, 20
Solution: First we must sort the data set:
10, 10, 10, 12, 12, 13, 14, 15, 15, 16, 17, 17, 17, 18, 20, 22, 23, 28, 28, 35
o Mean: $\bar{x} = \frac{10 + 10 + 10 + 12 + \cdots + 28 + 28 + 35}{20} = \frac{352}{20} = 17.6$
o Median: $\frac{16 + 17}{2} = 16.5$, since there are 2 middle observations
o Mode: 10 and 17
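The three measures in Example 8 can be checked with Python's standard `statistics` module. This is a sketch rather than part of the original notes; `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Data set from Example 8 (unsorted, as given)
data = [10, 17, 13, 12, 15, 18, 10, 17, 14, 16,
        35, 28, 22, 17, 23, 12, 15, 28, 10, 20]

mean = statistics.mean(data)        # sum of scores divided by their count
median = statistics.median(data)    # averages the two middle values when n is even
modes = statistics.multimode(data)  # every value tied for the highest frequency

print(mean, median, modes)
```

One difference from the convention in these notes: when every value occurs equally often (Examples 7D and 7E), `multimode` returns all of the values, whereas the notes report that there is no mode.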
Example 9: Find the mean, the median, and the mode of data set:
25, 42, 18, 37, 25, 18, 40, 57, 64, 66, 85, 86, 92
85, 88, 92, 67, 33, 75, 85, 48, 60, 80, 60, 50
Example10: Find the mean, the median, and the mode of data set:
12.37, 13.33, 32.67, 12.37, 26.45
Example11A: Find the mean for the following grouped data.
Class | Frequency
50-59 | 6
60-69 | 9
70-79 | 12
80-89 | 15
90-99 | 8
Solution: First we need to find the class marks (midpoints), and then we use the following formula:
$$\bar{x} = \frac{\sum [x \cdot f]}{n},$$
where
x : is the midpoint or class mark,
f : is the frequency, and
n : is the number of data points.
Class | Frequency f | Class mark x | x·f
50-59 | 6           | 54.5         | 327
60-69 | 9           | 64.5         | 580.5
70-79 | 12          | 74.5         | 894
80-89 | 15          | 84.5         | 1267.5
90-99 | 8           | 94.5         | 756
Total | n = Σf = 50 |              | Σ x·f = 3825
So the mean is
$$\bar{x} = \frac{\sum [x \cdot f]}{n} = \frac{3825}{50} = 76.5$$
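The grouped-mean calculation of Example 11A can be sketched in Python as follows. The function name `grouped_mean` and the tuple layout (lower limit, upper limit, frequency) are assumptions made for illustration.

```python
# Each class is (lower limit, upper limit, frequency), as in Example 11A
classes = [(50, 59, 6), (60, 69, 9), (70, 79, 12), (80, 89, 15), (90, 99, 8)]

def grouped_mean(classes):
    """Mean of grouped data: sum of (class mark * frequency) divided by n."""
    n = sum(f for _, _, f in classes)
    total = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    return total / n

print(grouped_mean(classes))
```

Note that the grouped mean (76.5) differs slightly from the mean of the raw scores (76.8), because every score in a class is treated as if it were equal to the class mark.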
Example11B: Find the mean for the following grouped data.
Class | Frequency
00-04 | 4
05-09 | 10
10-14 | 12
15-19 | 20
20-24 | 8
25-29 | 6
Weighted Average (Mean): The formula above is also called the weighted average or weighted mean. It can also be written as follows:
$$\bar{x} = \frac{\sum [w \cdot x]}{\sum w}$$
where w is the weight and x is the score.
Example12: Find the GPA of John, who has the following courses with the corresponding units and grades.
Course  | Units | Grade
English | 5     | A
Math    | 3     | F
Spanish | 2     | D
Solution: In this problem, x is the point value of each grade (A = 4, D = 1, F = 0) and w is the number of units:
$$\bar{x} = \frac{\sum [w \cdot x]}{\sum w} = \frac{[5 \cdot 4] + [3 \cdot 0] + [2 \cdot 1]}{5 + 3 + 2} = \frac{20 + 0 + 2}{10} = \frac{22}{10} = 2.2.$$
Example13: A teacher is teaching 3 classes. There are 30 students in the first class, with an average of 70 on the final exam. The second class has 40 students, with an average of 60 on the final exam. The 3rd class has 20 students, with an average of 80 on the final exam. Find the weighted average of the three classes combined.
Solution: Let x be the class average and w be the number of students.
$$\bar{x} = \frac{\sum [w \cdot x]}{\sum w} = \frac{70(30) + 60(40) + 80(20)}{30 + 40 + 20} = \frac{2100 + 2400 + 1600}{90} = \frac{6100}{90} \approx 67.8$$
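Examples 12 and 13 both apply the weighted-mean formula, which can be sketched in Python as below. The function name `weighted_mean` is hypothetical, and the grade points (A = 4, D = 1, F = 0) are the values used in the worked solution of Example 12.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 12: grade points weighted by course units
gpa = weighted_mean(values=[4, 0, 1], weights=[5, 3, 2])

# Example 13: class averages weighted by class sizes
combined = weighted_mean(values=[70, 60, 80], weights=[30, 40, 20])

print(round(gpa, 2), round(combined, 1))
```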
Measures of Variation
• Range
• Variance
• Standard Deviation
The Range: is defined to be the highest value minus the lowest value in the data set.
The Variance: is defined by the following:
Sample:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
(short cut formula of the sample variance).
Population:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \quad \text{or} \quad \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$$
(short cut formula of the population variance).
Standard deviation: is the positive square root of the variance.
$$\text{Standard deviation} = \sqrt{\text{Variance}}$$
Sample:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Population:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Example14A: Find the range, variance, and the standard deviation of the following data set.
3, 0, 7, 5, 15.
Solution:
o Range: Largest - Smallest = 15 - 0 = 15
o Variance: If we use the formula $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, first we need to find the sample mean $\bar{x}$.
So $\bar{x} = \frac{3 + 0 + 7 + 5 + 15}{5} = \frac{30}{5} = 6$; then we substitute in the above formula and we get
$$s^2 = \frac{(3 - 6)^2 + (0 - 6)^2 + (7 - 6)^2 + (5 - 6)^2 + (15 - 6)^2}{5 - 1},$$
$$s^2 = \frac{(-3)^2 + (-6)^2 + (1)^2 + (-1)^2 + (9)^2}{5 - 1},$$
$$s^2 = \frac{9 + 36 + 1 + 1 + 81}{5 - 1} = \frac{128}{4} = 32.$$
So the variance is $s^2 = 32$.
The same computation in table form:
x     | x - x̄        | (x - x̄)²
3     | 3-6=-3       | 9
0     | 0-6=-6       | 36
7     | 7-6=1        | 1
5     | 5-6=-1       | 1
15    | 15-6=9       | 81
Total | Σ(x - x̄) = 0 | Σ(x - x̄)² = 128
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{128}{5 - 1} = \frac{128}{4} = 32$$
o Standard deviation: As we know, the standard deviation is the positive square root of the variance:
$$\text{standard deviation} = \sqrt{\text{Variance}} = \sqrt{32} \approx 5.66$$
But if we use the short cut formula
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1},$$
first we need to find the sum, $\sum x$, and the sum of squares, $\sum x^2$:
$$\sum x = 3 + 0 + 7 + 5 + 15 = 30$$
$$\sum x^2 = 3^2 + 0^2 + 7^2 + 5^2 + 15^2 = 9 + 0 + 49 + 25 + 225 = 308$$
Then we have
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{308 - \frac{(30)^2}{5}}{5 - 1} = \frac{308 - \frac{900}{5}}{4} = \frac{308 - 180}{4} = \frac{128}{4} = 32,$$
which is exactly the same as above.
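To confirm that the defining formula and the short cut formula agree for Example 14A, here is a small Python sketch. The two function names are illustrative only; the standard library function `statistics.variance` computes the same sample variance.

```python
import math

data = [3, 0, 7, 5, 15]  # data set from Example 14A

def sample_variance(data):
    """Defining formula: sum of squared deviations from the mean over (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_variance_shortcut(data):
    """Short cut formula: (sum of x^2 - (sum of x)^2 / n) over (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

data_range = max(data) - min(data)
variance = sample_variance(data)
std_dev = math.sqrt(variance)

print(data_range, variance, sample_variance_shortcut(data), round(std_dev, 2))
```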
----------------------------------------------------------------------------------
Example14B: Find the range, variance, and the standard deviation of the following data set.
10, 17, 13, 12, 15, 18, 10, 17, 14, 16
28, 22, 17, 23, 12, 15, 28, 10, 20, 35
Solution:
Example15A: Find the standard deviation for the following grouped data.
Class | Frequency
50-59 | 6
60-69 | 9
70-79 | 12
80-89 | 15
90-99 | 8
Solution: First we modify the above formula for the variance. We need to find the class marks (midpoints), and then we use the following formula:
$$s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot f}{n - 1} \quad \text{or} \quad s^2 = \frac{\sum [x^2 f] - \frac{(\sum x f)^2}{n}}{n - 1}$$
where
x : is the midpoint or class mark
f : is the frequency
n : is the number of data points
We already know the mean:
$$\bar{x} = \frac{\sum [x \cdot f]}{n} = \frac{3825}{50} = 76.5$$
Class | f           | x    | x·f          | (x - x̄)²         | (x - x̄)²·f
50-59 | 6           | 54.5 | 327          | (54.5-76.5)²=484 | 2904
60-69 | 9           | 64.5 | 580.5        | (64.5-76.5)²=144 | 1296
70-79 | 12          | 74.5 | 894          | (74.5-76.5)²=4   | 48
80-89 | 15          | 84.5 | 1267.5       | (84.5-76.5)²=64  | 960
90-99 | 8           | 94.5 | 756          | (94.5-76.5)²=324 | 2592
Total | n = Σf = 50 |      | Σ x·f = 3825 |                  | Σ(x - x̄)²·f = 7800
After substitution in $s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot f}{n - 1}$ we get
$$s^2 = \frac{7800}{50 - 1} = 159.18,$$
and hence the standard deviation will be $s = \sqrt{159.18} \approx 12.6$.
If we use the short cut formula
$$s^2 = \frac{\sum [x^2 f] - \frac{(\sum x f)^2}{n}}{n - 1},$$
we need the following table.
Class | f           | x    | x·f          | x²·f
50-59 | 6           | 54.5 | 327          | (54.5)²·6 = 17821.5
60-69 | 9           | 64.5 | 580.5        | (64.5)²·9 = 37442.25
70-79 | 12          | 74.5 | 894          | (74.5)²·12 = 66603
80-89 | 15          | 84.5 | 1267.5       | (84.5)²·15 = 107103.75
90-99 | 8           | 94.5 | 756          | (94.5)²·8 = 71442
Total | n = Σf = 50 |      | Σ x·f = 3825 | Σ x²·f = 300412.5
$$s^2 = \frac{300412.5 - \frac{(3825)^2}{50}}{50 - 1} = \frac{300412.5 - \frac{14630625}{50}}{49} = \frac{300412.5 - 292612.5}{49} = \frac{7800}{49} = 159.18$$
and hence the standard deviation will be $s = \sqrt{159.18} \approx 12.6$, which is the same as the above result.
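Both grouped-data variance formulas from Example 15A can be checked with the Python sketch below. The function names and the (lower limit, upper limit, frequency) tuple layout are assumptions for illustration.

```python
import math

# Each class is (lower limit, upper limit, frequency), as in Example 15A
classes = [(50, 59, 6), (60, 69, 9), (70, 79, 12), (80, 89, 15), (90, 99, 8)]

def grouped_variance(classes):
    """Defining formula: sum of (x - mean)^2 * f over (n - 1), with x the class mark."""
    marks = [((lower + upper) / 2, f) for lower, upper, f in classes]
    n = sum(f for _, f in marks)
    mean = sum(x * f for x, f in marks) / n
    return sum((x - mean) ** 2 * f for x, f in marks) / (n - 1)

def grouped_variance_shortcut(classes):
    """Short cut formula: (sum of x^2 f - (sum of x f)^2 / n) over (n - 1)."""
    marks = [((lower + upper) / 2, f) for lower, upper, f in classes]
    n = sum(f for _, f in marks)
    sum_xf = sum(x * f for x, f in marks)
    sum_x2f = sum(x * x * f for x, f in marks)
    return (sum_x2f - sum_xf ** 2 / n) / (n - 1)

variance = grouped_variance(classes)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 1))
```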
Example15B: Find the standard deviation for the following grouped data.
Class | Frequency
00-04 | 4
05-09 | 10
10-14 | 12
15-19 | 20
20-24 | 8
25-29 | 6
Question 1. What will happen to the mean, median, mode, range, and standard deviation if we add a fixed number, c, to all values in the data set?
Answer. The mean, median, and mode will increase by c units, but the range and standard deviation will not change.
Question 2. What will happen to the mean, median, mode, range, and standard deviation if we subtract a fixed number, c, from all values in the data set?
Answer. The mean, median, and mode will decrease by c units, but the range and standard deviation will not change.
Question 3. What will happen to the mean, median, mode, range, and standard deviation if we multiply all values in the data set by a fixed number, c?
Answer. The mean, median, and mode will be multiplied by c, and the range and standard deviation will be multiplied by |c| (by c itself when c is positive).
Example 16:
       | X    | X+7     | X-7     | X*7
       | 15   | 15+7=22 | 15-7=8  | 15*7=105
       | 13   | 13+7=20 | 13-7=6  | 13*7=91
       | 15   | 15+7=22 | 15-7=8  | 15*7=105
       | 15   | 15+7=22 | 15-7=8  | 15*7=105
       | 22   | 22+7=29 | 22-7=15 | 22*7=154
Mean   | 16   | 16+7=23 | 16-7=9  | 16*7=112
Median | 15   | 15+7=22 | 15-7=8  | 15*7=105
Mode   | 15   | 15+7=22 | 15-7=8  | 15*7=105
Range  | 9    | 9       | 9       | 9*7=63
Sd     | 3.46 | 3.46    | 3.46    | 3.46*7=24.22
In general if Y = aX + b, then we have
• Mean of Y = a·[Mean of X] + b, or $\bar{y} = a\bar{x} + b$
• Standard deviation of Y = |a|·[standard deviation of X], or $S_Y = |a| S_X$
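The rule for Y = aX + b can be checked numerically on the data of Example 16. This Python sketch uses the standard `statistics` module; the helper name `transform` is hypothetical, and the case a = -2, b = 5 is an extra illustration (not from the notes) of why the absolute value |a| is needed.

```python
import math
import statistics

x = [15, 13, 15, 15, 22]  # data set X from Example 16

def transform(data, a, b):
    """Apply y = a*x + b to every value."""
    return [a * value + b for value in data]

mean_x = statistics.mean(x)
sd_x = statistics.stdev(x)  # sample standard deviation

for a, b in [(1, 7), (1, -7), (7, 0), (-2, 5)]:
    y = transform(x, a, b)
    # The mean follows a*mean + b; the standard deviation scales by |a| only
    assert math.isclose(statistics.mean(y), a * mean_x + b)
    assert math.isclose(statistics.stdev(y), abs(a) * sd_x)

print(mean_x, round(sd_x, 2))
```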
Empirical Rule
If the distribution of a data set is bell shaped or normal, then
• Approximately 68% of scores are within one standard deviation of the mean. They fall in the interval $\bar{x} - 1s$ to $\bar{x} + 1s$.
• Approximately 95% of scores are within two standard deviations of the mean. They fall in the interval $\bar{x} - 2s$ to $\bar{x} + 2s$.
• Approximately 99.7% of scores are within three standard deviations of the mean. They fall in the interval $\bar{x} - 3s$ to $\bar{x} + 3s$.
Example17. Suppose the IQ scores are normally distributed with a mean of µ = 100 and a standard deviation of σ = 15. Then by the empirical rule
• Approximately 68% of scores are in the interval 100-15 to 100+15, or 85 to 115.
• Approximately 95% of scores are in the interval 100-2(15) to 100+2(15), or 70 to 130.
• Approximately 99.7% of scores are in the interval 100-3(15) to 100+3(15), or 55 to 145.
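The three empirical-rule intervals can be generated from any mean and standard deviation. The following Python sketch (with the hypothetical function name `empirical_intervals`) reproduces the IQ intervals of Example 17.

```python
def empirical_intervals(mean, sd):
    """Return (label, low, high) for 1, 2, and 3 standard deviations around the mean."""
    coverage = [(1, "68%"), (2, "95%"), (3, "99.7%")]
    return [(label, mean - k * sd, mean + k * sd) for k, label in coverage]

intervals = empirical_intervals(mean=100, sd=15)
for label, low, high in intervals:
    print(f"Approximately {label} of scores fall between {low} and {high}")
```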
Coefficient of Variation
The coefficient of variation is defined to be the standard deviation divided by the mean.
Coefficient of variation (CV) = $\frac{s}{\bar{x}}$. If $\bar{x}$ is 0 or close to 0, then this measure should not be used.
Normally this measure is used when we have 2 or more groups of data with different units.
Example18.
Class   | Mean | Standard deviation | CV
Class A | 129  | 11                 | 11/129 = .085 or 8.5%
Class B | 150  | 25                 | 25/150 = .167 or 16.7%
Class C | 60   | 15                 | 15/60 = .25 or 25.0%
Class C has the greatest relative variation.
Measures of Position
• Standard Scores
$$z = \frac{x - \bar{x}}{s} \quad \text{or} \quad z = \frac{x - \mu}{\sigma},$$
where $\bar{x}$ or $\mu$ is the mean and s or σ is the standard deviation.
This value, z, measures the deviation from the mean in number of standard deviations, and it has no unit.
Example19. Suppose John is taking 3 classes with the following scores. In which class does he have the better score?
Class   | Test score                 | Mean | Standard deviation | z
Class A | English test score = 145   | 129  | 11                 | Z=(145-129)/11 = 1.45
Class B | Physics test score = 190   | 150  | 25                 | Z=(190-150)/25 = 1.60
Class C | Statistics test score = 88 | 60   | 15                 | Z=(88-60)/15 = 1.87
So his score in class C is relatively the highest.
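The comparison in Example 19 comes down to computing one z-score per class and picking the largest. A minimal Python sketch follows; the function name `z_score` and the dictionary layout are assumptions for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# (score, class mean, class standard deviation) from Example 19
classes = {
    "English": (145, 129, 11),
    "Physics": (190, 150, 25),
    "Statistics": (88, 60, 15),
}

z = {name: z_score(*values) for name, values in classes.items()}
best = max(z, key=z.get)  # class with the highest relative score

for name, value in z.items():
    print(f"{name}: z = {value:.2f}")
print("Best relative score:", best)
```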
Percentiles
The percentile corresponding to a given score x is denoted by P and is given by the following formula:
$$P = \frac{\text{number of scores less than } x}{\text{total number of scores}} \cdot 100$$
Example20. John has a score of 88 in a class of 20 students. Find the percentile rank of his score.
81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
61, 75, 76, 79, 58, 88, 82, 95, 90, 67.
Solution: In any problem of finding a percentile, we must first sort the data set from smallest to largest.
58, 61, 62, 63, 65, 65, 67, 70, 75, 75
76, 76, 78, 79, 81, 82, 88, 90, 90, 95.
$$P = \frac{\text{number of scores less than } x}{\text{total number of scores}} \cdot 100 = \frac{16}{20} \cdot 100 = 80$$
So John's score is at the 80th percentile, which means 80% of all scores are below 88.
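The percentile-rank formula counts the scores strictly below x, so sorting is only needed when counting by hand. A short Python sketch, with the hypothetical function name `percentile_rank`:

```python
# Class scores from Example 20 (unsorted, as given)
scores = [81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
          61, 75, 76, 79, 58, 88, 82, 95, 90, 67]

def percentile_rank(data, x):
    """Percentile of x: 100 * (number of scores less than x) / (total number of scores)."""
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)

print(percentile_rank(scores, 88))
```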
Finding the Score Corresponding to a Given Percentile
Example21. In the data set of Example 20, find the score corresponding to the 12th percentile.
Solution:
Step1: Make sure the data are sorted.
58, 61, 62, 63, 65, 65, 67, 70, 75, 75
76, 76, 78, 79, 81, 82, 88, 90, 90, 95
Step2: Compute L = p% of n, where L is the location of the score.
In this example L = 12% of 20 = 0.12(20) = 2.4, which rounds up to 3.
Step3: Go to the data set and pick the score at the 3rd position, which is 62.
It is usually written as P12=62.
Note:
• If L is not a whole number, round up to the next whole number.
• If L is a whole number, use the average of the scores at the Lth and (L+1)th locations.
Example22. In the data set of Example 20, find the score corresponding to the 40th percentile.
Step1: As before, the sorted data are
58, 61, 62, 63, 65, 65, 67, 70, 75, 75
76, 76, 78, 79, 81, 82, 88, 90, 90, 95
Step2: L = 40% of 20 = 0.40(20) = 8, which is a whole number, so we pick the average of the 8th and 9th scores.
Step3: The 8th score is 70 and the 9th score is 75, and their average is (70+75)/2=72.5. So P40=72.5.
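The locator rule used in Examples 21 and 22 can be sketched in Python as follows. The function name `percentile_score` is hypothetical, and the sketch assumes p is a whole-number percent strictly between 0 and 100. Other software (for example `statistics.quantiles` in Python) interpolates differently and may give slightly different answers.

```python
# Class scores from Example 20
scores = [81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
          61, 75, 76, 79, 58, 88, 82, 95, 90, 67]

def percentile_score(data, p):
    """Score at the p-th percentile using the location rule L = p% of n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)  # L = whole + remainder/100
    if remainder:
        # L is not a whole number: round up to position whole + 1 (index whole)
        return ordered[whole]
    # L is a whole number: average the L-th and (L+1)-th scores
    return (ordered[whole - 1] + ordered[whole]) / 2

print(percentile_score(scores, 12), percentile_score(scores, 40))
```

Using integer arithmetic for L avoids floating-point surprises when deciding whether L is a whole number.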
Deciles: divide the data set into 10 groups.
D1 = 10th percentile, which is the same as P10
D2 = 20th percentile, which is the same as P20
…….
D9 = 90th percentile, which is the same as P90
Quartiles: divide the data set into 4 groups.
Q1 = first quartile or 25th percentile, which is the same as P25
Q2 = second quartile or 50th percentile, which is the same as P50. This is also the median.
Q3 = third quartile or 75th percentile, which is the same as P75
Inter-Quartile Range (IQR): the difference between the 3rd and 1st quartiles, defined by IQR = Q3 - Q1.
Example23. In data set of example 20, find the score corresponding to
• D2
• Q1
• Q3
• IQR
Outlier: An outlier is an extremely high or an extremely low data value. To check for outliers we compute Q1-1.5(IQR) and Q3+1.5(IQR). A suspected score is said to be an outlier if
• it is below Q1-1.5(IQR), or
• it is above Q3+1.5(IQR).
Example24. Is there any outlier in the following data set?
55, 55, 36, 52, 51, 46, 49, 41, 47, 61,
46, 51, 86, 44, 51, 41, 41, 53, 51, 48
Sorted Data
36, 41, 41, 41, 44, 46, 46, 47, 48, 49,
51, 51, 51, 51, 52, 53, 55, 55, 61, 86
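The notes leave Example 24 unanswered. One way to work it, assuming the quartiles are found with the chapter's location rule (L = p% of n), is sketched below in Python. The function names are hypothetical, and other quartile conventions could give slightly different fences.

```python
# Data set from Example 24
data = [55, 55, 36, 52, 51, 46, 49, 41, 47, 61,
        46, 51, 86, 44, 51, 41, 41, 53, 51, 48]

def percentile_score(values, p):
    """Score at the p-th percentile (0 < p < 100) using the location rule L = p% of n."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]  # L not whole: round up to the next position
    return (ordered[whole - 1] + ordered[whole]) / 2  # L whole: average two scores

def find_outliers(values):
    """Return (lower fence, upper fence, list of outliers) using the 1.5*IQR rule."""
    q1 = percentile_score(values, 25)
    q3 = percentile_score(values, 75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [v for v in values if v < low or v > high]

low, high, outliers = find_outliers(data)
print(low, high, outliers)
```

Under this rule Q1 = 45 and Q3 = 52.5, so IQR = 7.5 and the fences are 33.75 and 63.75; the score 86 lies above the upper fence, while 36 does not fall below the lower one.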
Five commonly used Statistics: The five numbers that are used frequently to summarize any data set are
Minimum, Q1, Q2, Q3, Maximum
Box plot or box-and-whisker plot: another graphical representation of a data set. We use the five commonly used statistics to draw the box plot. The box plot can provide answers to the following questions:
o Is a factor significant?
o Does the location differ between subgroups?
o Does the variation differ between subgroups?
o Are there any outliers?
Example25. In the data set of Example 20, find the 5 common statistics.
58, 61, 62, 63, 65, 65, 67, 70, 75, 75
76, 76, 78, 79, 81, 82, 88, 90, 90, 95
1. Minimum: 58.
2. Q1: L = 25% of 20 = 0.25(20) = 5. Since this is a whole number, we use the average of the 5th and 6th observations. In the above ordered data set the 5th score is 65 and the 6th score is 65, so their average is also 65. So Q1=65.
3. Q2: L = 50% of 20 = 0.50(20) = 10. Again, since this is a whole number, we use the average of the 10th and 11th observations. In the above ordered data set the 10th score is 75 and the 11th score is 76, so their average is (75+76)/2=75.5. So Q2=75.5.
4. Q3: L = 75% of 20 = 0.75(20) = 15. This is a whole number, so we use the average of the 15th and 16th observations. In the above ordered data set the 15th score is 81 and the 16th score is 82, so their average is (81+82)/2=81.5. So Q3=81.5.
5. Maximum: 95.
So the five statistics are
58, 65, 75.5, 81.5, and 95.
[Figure: Boxplot of C1. Vertical axis: C1.]
Example26. In the data set of Example 24, find the 5 common statistics.
Example27. For the data set below, use a computer to find the descriptive statistics and plot all appropriate charts, for all variables, that have been discussed so far.
Test1 | Sex | Grade || Test1 | Sex | Grade
76    | 1   | C     || 76    | 1   | C
62    | 1   | D     || 59    | 1   | F
68    | 1   | D     || 92    | 1   | A
69    | 1   | D     || 93    | 1   | A
79    | 0   | C     || 88    | 0   | B
90    | 0   | A     || 86    | 0   | B
79    | 1   | C     || 66    | 0   | D
86    | 1   | B     || 81    | 1   | B
52    | 0   | F     || 85    | 0   | B
97    | 1   | A     || 85    | 0   | B
78    | 1   | C     || 70    | 1   | C
55    | 1   | F     || 55    | 1   | F
96    | 1   | A     || 62    | 1   | D
89    | 1   | B     || 80    | 1   | B
73    | 0   | C     || 60    | 1   | D
66    | 0   | D     || 80    | 1   | B
88    | 1   | B     || 72    | 1   | C
92    | 0   | A     || 82    | 0   | B
94    | 1   | A     || 86    | 1   | B
50    | 1   | F     || 99    | 1   | A
71    | 0   | C     || 63    | 1   | D
89    | 0   | B     || 75    | 1   | C
78    | 1   | C     || 83    | 1   | B
88    | 0   | B     || 78    | 0   | C
58    | 1   | F     || 61    | 1   | D
Sex: 1=Female 0=Male
Descriptive Statistics: Test1
Variable | Sex    | N  | N* | Mean  | SE Mean | StDev | Minimum | Q1    | Median | Q3    | Maximum
Test1    | Female | 34 | 0  | 75.59 | 2.36    | 13.76 | 50.00   | 62.00 | 77.00  | 86.50 | 99.00
Test1    | Male   | 16 | 0  | 79.38 | 2.77    | 11.10 | 52.00   | 71.50 | 83.50  | 88.00 | 92.00
[Figure: Boxplot of Test1 vs Sex (Female, Male). Vertical axis: Test1.]
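The descriptive statistics table above appears to come from a statistics package. As a rough cross-check, several of its columns can be recomputed by group with Python's standard `statistics` module. The function name `describe_by_group` is hypothetical; quartiles are left out because software packages use differing quartile conventions.

```python
import statistics

# Test1 scores and Sex codes (1 = Female, 0 = Male), left column block then right
test1 = [76, 62, 68, 69, 79, 90, 79, 86, 52, 97, 78, 55, 96, 89, 73,
         66, 88, 92, 94, 50, 71, 89, 78, 88, 58,
         76, 59, 92, 93, 88, 86, 66, 81, 85, 85, 70, 55, 62, 80, 60,
         80, 72, 82, 86, 99, 63, 75, 83, 78, 61]
sex = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 0, 1,
       1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
labels = {1: "Female", 0: "Male"}

def describe_by_group(values, groups):
    """Summarize values separately for each group code."""
    summary = {}
    for code in sorted(set(groups)):
        subset = [v for v, g in zip(values, groups) if g == code]
        summary[labels[code]] = {
            "n": len(subset),
            "mean": statistics.mean(subset),
            "stdev": statistics.stdev(subset),  # sample standard deviation
            "min": min(subset),
            "median": statistics.median(subset),
            "max": max(subset),
        }
    return summary

summary = describe_by_group(test1, sex)
for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```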