Frequency Distribution Table

STATISTICS
• Branch of mathematics that
deals with the systematic
method of collecting,
classifying, presenting,
analyzing and interpreting
quantitative data
DIVISION OF STATISTICS
• DESCRIPTIVE
• To summarize and describe
the group characteristics of
data
• INFERENTIAL
• Drawing conclusions or
judgments about a population
based on a representative
sample
POPULATION
• Consists of the totality of the
observations with which one is
concerned
SAMPLE
• Subset taken from a population
of objects or observations
Variable
Is a characteristic or information of
interest that is observable or measurable
from every individual or object under
consideration.
Types of variables
• Qualitative or Categorical Variable
• Quantitative or Numerical Variable
Types of Quantitative Variables
• Discrete Quantitative
• Continuous Quantitative
Levels of Measurement of Variables:
• Nominal Level (Classificatory Scale)
Lowest level of measurement; it simply labels, names, or
categorizes without any implicit or explicit ordering of
the levels.
• Ordinal Level (Ranking Scale)
Labels or classes with an implied ordering among the
labels.
• Interval Level
The unit of measurement is arbitrary and there is
no “true zero” point.
• Ratio Level
Contains all the properties of the interval level and,
in addition, has a “true zero” point.
STEPS IN STATISTICAL INQUIRY
• Collection of Data
• Processing of Data
• Presentation of Data
• Analysis of Data
• Interpretation of Data
TYPES OF DATA
• INTERNAL DATA
• Company’s own data
• EXTERNAL DATA
• Outside sources
METHODS OF DATA COLLECTION
• Interview or direct method
• Questionnaire or indirect
method
• Registration method (e.g. NSO)
• Observation
• Experimentation
Sampling Techniques
One of the most important parts of research work
that needs preparation and planning is choosing
the right and appropriate sampling method.
• Random Sampling
A recommended process to prevent the
possibility of a biased or erroneous inference.
Under the concept of randomness, each member
of the population has an equal chance to be
included in the sample gathered.
• Stratified Random Sampling
This sampling technique is done through
dividing the population into categories or
strata and getting the members at random
proportionate to each stratum or sub – group.
• Systematic Random Sampling
Refers to a process of selecting every nth
element in the population until the desired
sample size is acquired.
• Cluster Sampling
Is the advantageous procedure when the
population is spread over a wide geographical area.
CLUSTER – refers to an intact group whose members
share a common characteristic
• Multistage Sampling
More complex sampling technique, which
includes the following steps:
a)Divide the population into strata.
b)Divide each stratum into clusters.
c) Draw a sample from each cluster using the
simple random sampling technique
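The sampling techniques above can be sketched in a few lines of Python. This is only an illustration: the population of 100 ID numbers, the two strata "A" and "B", the fixed seed, and the random starting point used for systematic sampling are all assumptions, not part of the slides.

```python
import random

def simple_random_sample(population, k, rng):
    """Random sampling: every member has an equal chance of selection."""
    return rng.sample(population, k)

def systematic_sample(population, k, rng):
    """Systematic sampling: take every step-th member after a random start."""
    step = len(population) // k
    start = rng.randrange(step)
    return [population[start + j * step] for j in range(k)]

def stratified_sample(strata, k, rng):
    """Stratified sampling: draw from each stratum in proportion to its size.

    Rounding the shares can make the total differ slightly from k in general.
    """
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(k * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample

rng = random.Random(0)                    # fixed seed so the sketch is repeatable
population = list(range(1, 101))          # hypothetical IDs 1..100
strata = {"A": population[:60], "B": population[60:]}  # hypothetical strata

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 10, rng)
```

With 60 members in stratum A and 40 in stratum B, a stratified sample of 10 takes 6 from A and 4 from B.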
PROCESSING OF DATA
• EDITING – to detect errors
• CODING – assigning numerals
and other symbols to be able to
group them
• CLASSIFYING – sorting and
grouping
PRESENTATION OF DATA
• Textual
• Tabular
• Graphic
• Bar
• Line
• Pie Chart
• Scatter
• Pictograph
OTHER TERMS
• VARIABLE – fundamental
quantity that changes
• DISCRETE VARIABLE – takes only separate,
countable values (no in-betweens)
• CONTINUOUS VARIABLE – can take any value
in between two given values
• CONSTANT – does not change
How to organize data?
Frequency Distribution Table:
is the organization of raw data in table form
Consider the midyear scores of 45 students in
Statistics
29 27 28 27 34 29 27 27 28
25 23 35 25 29 33 23 27 33
27 22 40 27 21 29 22 25 29
25 21 20 21 23 25 30 20 28
30 29 28 30 27 27 27 19 30
Steps in Constructing Frequency
Distribution Table
• Find the range r. The range is the difference
between the highest score and the lowest
score.
• Decide on the number of classes. A class is a
grouping or category. The ideal number of
classes is between 5 and 15.
• Determine the class interval i. Class interval, or
simply interval, is the size of each class. A common
choice is the range divided by the number of classes,
rounded up.
• Determine the classes starting with
the lowest class.
• Determine the class frequency (f) for
each class by counting the tally. The
column for tally is optional.
The following numerical values are
relevant in dealing with frequency
distribution:
1. Class mark. It is the middle value in a
class.
2. Class boundaries. They are often
described as the true limits.
The lower boundary of a class is
0.5 less than its lower limit, and
the upper boundary is 0.5 more
than its upper limit.
3. Cumulative frequency. It is found by
adding the frequencies starting from
the lowest class.
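As a rough illustration of the steps above, the sketch below builds a grouped frequency table for the 45 midyear scores listed earlier. The choices of 8 classes, a class size of 3, and a lowest class limit of 19 are assumptions made for this example, and the helper name `grouped_frequency_table` is hypothetical. The function assumes whole-number data, so boundaries sit 0.5 outside the limits.

```python
scores = [
    29, 27, 28, 27, 34, 29, 27, 27, 28,
    25, 23, 35, 25, 29, 33, 23, 27, 33,
    27, 22, 40, 27, 21, 29, 22, 25, 29,
    25, 21, 20, 21, 23, 25, 30, 20, 28,
    30, 29, 28, 30, 27, 27, 27, 19, 30,
]

def grouped_frequency_table(data, lowest_limit, class_size, num_classes):
    """Return one dict per class, starting with the lowest class."""
    rows = []
    cumulative = 0
    for k in range(num_classes):
        lower = lowest_limit + k * class_size
        upper = lower + class_size - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "mark": (lower + upper) / 2,      # class mark = middle of the class
            "f": freq,
            "cf": cumulative,                 # running total from the lowest class
        })
    return rows

r = max(scores) - min(scores)                 # range: highest minus lowest score
table = grouped_frequency_table(scores, lowest_limit=19, class_size=3, num_classes=8)

for row in table:
    print(row["limits"], row["boundaries"], row["mark"], row["f"], row["cf"])
```

The cumulative frequency of the last class should equal the number of scores, which is a quick check that no score fell outside the classes.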
Grouped Frequency Distributions
Class Limits   Class Boundaries   Class Mark (X)   Tally       Frequency   Cumulative Frequency
24 - 30        23.5 – 30.5        27               III         3           3
31 - 37        30.5 – 37.5        34               I           1           4
38 - 44        37.5 – 44.5        41               IIII        5           9
45 - 51        44.5 – 51.5        48               IIII IIII   9           18
52 - 58        51.5 – 58.5        55               IIII I      6           24
59 - 65        58.5 – 65.5        62               I           1           25
Total = 25
Example 1: These data represent the record high
temperatures in °F for each of the 50 states.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Construct a grouped frequency distribution for the data using 7 classes
Example 2: Statistics Test Score of 50 Students
88 85 60 75 63 55 78 90 86 40
62 83 46 78 90 62 40 47 55 52
63 76 85 87 63 62 51 48 76 72
88 72 71 70 60 83 56 54 52 43
65 63 67 42 73 79 80 77 76 60
Using the data,
1. Construct a grouped frequency
distribution (GFD) with 11 classes.
2. Construct a histogram, a
frequency polygon and an ogive from
the data.
Graphical Presentation of Data
A histogram is a bar-graph-like
representation of a frequency distribution.
The rectangular bars are drawn without space
between them. The height of each bar
corresponds to the frequency of the class
and the width corresponds to the class
size (each bar spans its class boundaries).
- A well-balanced histogram should
have a height of 60%, 67% or 75%
of its width.
- A frequency polygon is a line
graph where the frequency of each
class is plotted against the
corresponding class mark.
- An ogive (pronounced "oh-jive")
is a line graph where the
cumulative frequency of each class
is plotted against the
corresponding upper class boundary.
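The points to plot for a frequency polygon and an ogive can be computed directly from a frequency table. The sketch below uses the six classes (24 - 30 through 59 - 65) from the grouped frequency distribution table earlier in this document; the variable names are illustrative only, and no plotting library is assumed.

```python
# (lower limit, upper limit, frequency), lowest class first
classes = [
    (24, 30, 3), (31, 37, 1), (38, 44, 5),
    (45, 51, 9), (52, 58, 6), (59, 65, 1),
]

polygon_points = []   # (class mark, frequency)
ogive_points = []     # (upper class boundary, cumulative frequency)
running_total = 0
for lower, upper, freq in classes:
    running_total += freq
    polygon_points.append(((lower + upper) / 2, freq))
    ogive_points.append((upper + 0.5, running_total))

print(polygon_points)
print(ogive_points)
```

The last ogive point always has the total number of observations as its height, here 25.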
Cumulative Frequency Graph (ogive)
Exercises #1:
I. Classify the following according to the scale of
measurement. Write N if your answer is nominal, O if
ordinal, I if interval or R if ratio.
______ 1. Newborns arranged according to gender.
______ 2. Banking hours of the different types of banks in
Metro Manila.
______ 3. Peso-dollar exchange rate.
______ 4. Temperature range of patients afflicted with
pneumonia.
______ 5. Lead content in toys manufactured in the Philippines.
II. Classify the following as descriptive statistics or inferential
statistics. Write D if your answer is descriptive and write I if
your answer is inferential.
______ 1. The time it takes a shipment of perishable goods to reach
its destination.
______ 2. The number of times the peso-dollar rate fluctuates
during the week.
______ 3. Based on the medical record of the patient, the patient has
a high sugar level considered to be critical.
______ 4. The farm produce in Baguio.
______ 5. For the past month, there was an increased number of
cases of cholera in hog farms in Bulacan.
III: Statistics Test Score of 50 Students
88 62 63 88 65 85 60 75 63 55
78 90 86 40 83 46 78 90 62 40
47 55 52 76 85 87 63 62 51 48
76 72 72 71 70 60 83 56 54 52
43 63 67 42 73 79 80 77 76 60
Construct the GFD for the Statistics Test Scores with 11 classes.
Example 1: The following data represent the
weekly savings of employees in a manufacturing
company.
49 57 54 84 82 52 91 29 43 67
47 67 38 38 65 50 16 78 18 65
56 58 35 35 48 71 59 39 73 71
26 24 34 46 57 52 39 34 61 63
39 28 42 85 61 25 9 44 46 29
Using the data,
1. Construct a frequency
distribution with 11 classes.
2. Construct a histogram and a
frequency polygon from the data.
3. Construct a frequency
distribution using 9 class intervals.
4. Construct a histogram and a
frequency polygon from the data.
MEASURES OF CENTRAL TENDENCY
-It is a statistic that serves as a representative
of the data under investigation.
-This tends to lie within the center of the set
of data.
-There are three measures of central
tendency, namely the mean, median and
mode.
The Mean (x̄)
• It is the most important, the most useful,
and the most widely used measure of
central tendency.
• It refers to the sum of all the given values
or items in a distribution divided by the
number of values or items summed.
• Mean has limitations and uses.
The Mean is Used
• for interval and ratio measurement;
• if higher statistical computations are wanted;
• if there are no extreme values in the
distribution, since it is easily affected by
extremely low or extremely high scores;
that is, when the distribution is approximately
normal;
• when greater reliability of
the measure of central
tendency is wanted, since its
computation includes all the
given values.
The Limitations of the Mean
• It is the most widely used average because it is
the most familiar. It is, however, often misused.
It cannot be used if the clustering of values or
items is not substantial. An example is when
representing the scores or values 10 and 100,
since they are far apart.
• When the given values do not tend to cluster
around a central value, the mean is a poor
measure of central location.
• It is easily affected by extremely large or small
values. One small value can easily pull down
the mean.
• The mean alone cannot be used to
compare distributions, since the means
of two or more distributions may be
the same while their characteristics
may be entirely different. The means of
distribution A, whose values are 80, 85
and 90, and distribution B, whose values
are 86, 85 and 84, are both 85.
However, we cannot conclude that both
distributions possess the same
characteristics, since their patterns of
dispersion or variation are markedly
different despite having the same
mean.
The formulas for computing the Mean
are:
• Ungrouped Data
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
• Where:
$\bar{x}$ is the mean, $x_i$ stands for the values or
items and n is the number of respondents.
Grouped Data:
The midpoint formula
$$\bar{x} = \frac{\sum_{i=1}^{n} X_i f_i}{n}$$
• Where:
• $\bar{x}$ – is the mean
• $X_i f_i$ – is the product of the class mark and the
frequency
• n – is the number of respondents
The Mean for Grouped data can
also be computed using the CODED
FORMULA:
$$\bar{x} = AM + \left(\frac{\sum_{i=1}^{n} x_i f_i}{n}\right) i$$
• Where:
• AM – assumed mean
• $x_i$ – unit deviation of each class mark from the
assumed mean (counted in class sizes)
• i – class size
• n – number of cases
Example: Compute for the mean using the two
formulas.
Class Interval   f
90 – 94          2
85 – 89          6
80 – 84          3
75 – 79          8
70 – 74          5
65 – 69          2
60 – 64          10
55 – 59          3
50 – 54          4
45 – 49          3
40 – 44          4
Solution for Mean (Using Midpoint Formula)
Class Interval   Class Mark (X)   Frequency (f)   X_i f_i
90 – 94          92               2               184
85 – 89          87               6               522
80 – 84          82               3               246
75 – 79          77               8               616
70 – 74          72               5               360
65 – 69          67               2               134
60 – 64          62               10              620
55 – 59          57               3               171
50 – 54          52               4               208
45 – 49          47               3               141
40 – 44          42               4               168
Σf_i = 50        ΣX_i f_i = 3,370
Using Midpoint Formula:
Solving for the mean $\bar{x}$:
$$\bar{x} = \frac{\sum_{i=1}^{n} X_i f_i}{n} = \frac{3{,}370}{50} = 67.4$$
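The midpoint computation can be checked with a short script. This is a minimal sketch using the class limits and frequencies from the example; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the example
classes = [
    (90, 94, 2), (85, 89, 6), (80, 84, 3), (75, 79, 8),
    (70, 74, 5), (65, 69, 2), (60, 64, 10), (55, 59, 3),
    (50, 54, 4), (45, 49, 3), (40, 44, 4),
]

n = sum(f for _, _, f in classes)                       # total frequency
total = sum(((lo + hi) / 2) * f for lo, hi, f in classes)  # sum of X_i * f_i
mean = total / n

print(n, total, mean)
```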
Solution for Mean (Using Unit Deviation Formula)
Class Interval   Class Mark (X)   Frequency (f)   x_i   x_i f_i
90 – 94          92               2               5     10
85 – 89          87               6               4     24
80 – 84          82               3               3     9
75 – 79          77               8               2     16
70 – 74          72               5               1     5
65 – 69          67 (AM)          2               0     0
60 – 64          62               10              -1    -10
55 – 59          57               3               -2    -6
50 – 54          52               4               -3    -12
45 – 49          47               3               -4    -12
40 – 44          42               4               -5    -20
n = 50           Σx_i f_i = 4
Using Unit Deviation Method:
Solving for the mean $\bar{x}$:
$$\bar{x} = AM + \left(\frac{\sum_{i=1}^{n} x_i f_i}{n}\right) i = 67 + \left(\frac{4}{50}\right)(5) = 67 + 0.4 = 67.4$$
Where:
The Assumed Mean (AM)
may be one of the
class marks, but
preferably one which
is located at the
center of the
distribution or one
which has the highest
frequency.
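The unit deviation (coded) method can be sketched the same way. The script below takes 67 as the assumed mean and 5 as the class size, as in the example, and derives each unit deviation from the class marks; the variable names are illustrative.

```python
class_marks = [92, 87, 82, 77, 72, 67, 62, 57, 52, 47, 42]
freqs = [2, 6, 3, 8, 5, 2, 10, 3, 4, 3, 4]
assumed_mean = 67
class_size = 5

# Unit deviation: how many class sizes each class mark lies from the assumed mean
unit_devs = [(x - assumed_mean) // class_size for x in class_marks]

n = sum(freqs)
sum_xf = sum(d * f for d, f in zip(unit_devs, freqs))
mean = assumed_mean + (sum_xf / n) * class_size

print(unit_devs, sum_xf, mean)
```

Both formulas give the same mean; the coded formula only keeps the intermediate numbers small.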
The Median (x̃)
• This is the middle value in a set of quantities. It
separates an ordered set of data into two equal
parts. Half of the quantities are found above the
median and the other half below it.
• To find the median of ungrouped data, follow
these steps:
1. Arrange the quantities either in ascending or
descending order.
2. Number the quantities
consecutively from 1 to n.
3. If n is odd, the median is the
((n+1)/2)th quantity. If n is even, the
median is the mean of the (n/2)th
and (n/2 + 1)th quantities.
The Median is Used
• for ordinal or ranked measurement;
• if there are extreme cases, so that the
distribution is markedly skewed;
• if we desire to know whether the cases
fall within the upper half or the lower
half of the distribution;
• for an open-ended distribution; that is, when
the lowest or the highest class interval or
both are open, such as "50 and below" or
"100 and above".
Limitations of the Median:
• It is easily affected by the number of items in a
distribution.
• It cannot be determined if the given values are
not arranged according to magnitude.
• If several values are contained in a distribution,
it becomes a laborious task to arrange them
according to magnitude.
• Its value is not as accurate as the mean
because it is just an ordinal statistic.
Formula for finding the Median:
• To get the median for ungrouped data,
we simply arrange the data from the
highest value to the lowest value or vice
versa. The median is the middle value
in the distribution.
• If there is an odd number of observations,
the middle value is the median. Ex. 6, 7,
8, 9, 10, 12, 16 (median = 9)
• If the number of observations is even, the
average of the two middle scores is the
median. Ex. 8, 7, 6, 5, 4, 3 (median = (6 + 5) ÷ 2 = 5.5)
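The rule for ungrouped data can be written as a small function. This is a minimal sketch; the name `median_ungrouped` is illustrative, and Python's built-in `statistics.median` gives the same result.

```python
def median_ungrouped(values):
    """Median of ungrouped data: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th value, which is index n // 2 when counting from 0
        return ordered[middle]
    # even n: mean of the (n / 2)th and (n / 2 + 1)th values
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_example = median_ungrouped([6, 7, 8, 9, 10, 12, 16])
even_example = median_ungrouped([8, 7, 6, 5, 4, 3])
print(odd_example, even_example)
```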
Grouped Data:
$$\tilde{x} = u + \left(\frac{\frac{n}{2} - cf}{f_i}\right) i$$
• where:
• u – exact lower limit of the class interval containing the
median
• n/2 – one half of the total number of cases
• cf – cumulative frequency immediately below u
• i – class interval
• $f_i$ – frequency of the class interval containing the
median
Solution for Median
Class Interval       Frequency (f)   Cumulative frequency (cf)
90 – 94              2               50
85 – 89              6               48
80 – 84              3               42
75 – 79              8               39
70 – 74              5               31
65 – 69 (u = 64.5)   2               26
60 – 64              10              24 = cf
55 – 59              3               14
50 – 54              4               11
45 – 49              3               7
40 – 44              4               4
Σf = n = 50
Solving for Median:
$$\tilde{x} = u + \left(\frac{\frac{n}{2} - cf}{f_i}\right) i = 64.5 + \left(\frac{25 - 24}{2}\right)(5) = 67$$
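The grouped median can be sketched as a function that walks up the classes until the cumulative frequency reaches n/2. The function name `grouped_median` is illustrative, and the sketch assumes whole-number class limits (so the exact lower limit is 0.5 below the lower limit) and classes listed from lowest to highest.

```python
def grouped_median(classes):
    """classes: list of (lower limit, upper limit, frequency), lowest class first."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf_below = 0                       # cumulative frequency below the current class
    for lower, upper, f in classes:
        if cf_below + f >= half:       # this class contains the median
            u = lower - 0.5            # exact lower limit
            i = upper - lower + 1      # class size
            return u + ((half - cf_below) / f) * i
        cf_below += f

classes = [
    (40, 44, 4), (45, 49, 3), (50, 54, 4), (55, 59, 3),
    (60, 64, 10), (65, 69, 2), (70, 74, 5), (75, 79, 8),
    (80, 84, 3), (85, 89, 6), (90, 94, 2),
]
median = grouped_median(classes)
print(median)
```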
Examples:
Solve for the median of ungrouped data
• Find the median of the set of measures: 23, 15,
9, 30, 27, 10, 18, 14, 13.
• Find the median of: 12.6, 15.0, 19.8, 17.9, 11.7, 18.6, 14.1, 13.4
The Mode (x̂)
• It is the quantity with the highest
frequency.
• A set of data is a unimodal distribution if it
contains only one mode. For instance,
the set 11, 15, 13, 15, 14, 13, 15 is
unimodal. The mode is 15, with a frequency
of 3.
• A set is a bimodal distribution if it contains
two modes. For example, the sets
88, 89, 82, 82, 82, 89, 88, 89 and
63, 55, 57, 60, 60, 66, 56, 58, 57
are bimodal. The modes are 82 and
89 for the first set, and 57 and 60 for the second.
• A set of data with three modes is
trimodal. But the distribution 40,
44, 37, 37, 44, 40 has no mode, since
every value occurs equally often.
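The examples above can be verified by counting how often each value occurs. The sketch below follows the convention used in the slides that a set has no mode when all of its distinct values occur equally often; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all share the same frequency, no value stands out
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

unimodal = modes([11, 15, 13, 15, 14, 13, 15])
bimodal_1 = modes([88, 89, 82, 82, 82, 89, 88, 89])
bimodal_2 = modes([63, 55, 57, 60, 60, 66, 56, 58, 57])
no_mode = modes([40, 44, 37, 37, 44, 40])
print(unimodal, bimodal_1, bimodal_2, no_mode)
```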
The Mode is Used
• for nominal or categorical data;
• if the most popular or most typical
case or value in the distribution is
wanted;
• if a rough or quick estimate of a
central value is wanted.
The Limitations of the Mode
• It is rarely or seldom used since it
does not always exist.
• It is very unstable because its value
changes depending on the
approaches used in finding it.
• Its value is just a rough estimate of
the center of concentration of a
distribution.
Formula for Mode of Grouped Data
• The mode in grouped data is roughly the class mark or
midpoint of the class with the highest
frequency (the modal class). It can be estimated
more closely with the formula:
$$\hat{x} = u + \left(\frac{d_1}{d_1 + d_2}\right) i$$
• where:
• u – exact lower limit of the modal class
• d1 – difference between the frequency of the
modal class and the next class lower in value
• d2 – difference between the frequency of the
modal class and the next class higher in value
• i – class size of the modal class
Solution for Mode
Class Interval          Frequency (f)
90 – 94                 2
85 – 89                 6
80 – 84                 3
75 – 79                 8
70 – 74                 5
65 – 69                 2
60 – 64 (modal class)   10
55 – 59                 3
50 – 54                 4
45 – 49                 3
40 – 44                 4
Σf = n = 50
d1 = 10 – 3 = 7
d2 = 10 – 2 = 8
Solving for Mode:
$$\hat{x} = u + \left(\frac{d_1}{d_1 + d_2}\right) i = 59.5 + \left(\frac{7}{15}\right)(5) = 61.8 \approx 62$$
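The grouped mode formula can be sketched as follows. The function name `grouped_mode` is illustrative. Two assumptions not stated in the slides: a modal class at either end of the table treats the missing neighbour as having frequency 0, and when several classes tie for the highest frequency the lowest one is used.

```python
def grouped_mode(classes):
    """classes: list of (lower limit, upper limit, frequency), lowest class first."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                 # position of the modal class
    lower, upper, f_modal = classes[m]
    f_lower = freqs[m - 1] if m > 0 else 0      # next class lower in value
    f_higher = freqs[m + 1] if m < len(freqs) - 1 else 0  # next class higher in value
    d1 = f_modal - f_lower
    d2 = f_modal - f_higher
    u = lower - 0.5                             # exact lower limit of the modal class
    i = upper - lower + 1                       # class size
    return u + (d1 / (d1 + d2)) * i

classes = [
    (40, 44, 4), (45, 49, 3), (50, 54, 4), (55, 59, 3),
    (60, 64, 10), (65, 69, 2), (70, 74, 5), (75, 79, 8),
    (80, 84, 3), (85, 89, 6), (90, 94, 2),
]
mode = grouped_mode(classes)
print(round(mode, 1))
```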
Example: Compute for the mean, median, and mode
given the age brackets of the workers in a certain
factory.
Age       No. of Workers (f)
42 – 44   15
39 – 41   18
36 – 38   23
33 – 35   20
30 – 32   24
27 – 29   16
24 – 26   25
21 – 23   12
18 – 20   10
15 – 17   13
Skewness in Relation to Central
Tendency
• The measures of central tendency are helpful in
describing the characteristics of a given
distribution.
• When the values of the mean, median and
mode are all equal, they are all
represented by a single point in the distribution.
• The distribution in such a case is normal or
symmetrical.
- If the values of the mean, median
and mode are not the same, the
curve or distribution is skewed or
asymmetrical.
- There are two types of skewed
distribution.
* Positively Skewed – the curve has
a long right tail. Most scores
accumulate at the lower (left) end, while
a few extremely high values stretch the
tail to the right. The mean is easily
affected by these extreme cases, so it is
pulled into the tail of the distribution
and its value is higher than the median.
The mean is also found to the right of
the mode; skewness in this case is
approximated by the distance of the
mean from the mode.
* Negatively Skewed – the curve has a
long left tail. Most scores accumulate at
the higher (right) end, while a few
extremely low values stretch the tail to
the left. The mean is pulled into the tail
of the curve, so the value of the mean is
lower than the median because the
extreme cases are found at the left of
the distribution.
Quantiles:
• These are values which divide the distribution
into a given number of equal parts.
• There are three types of quantiles:
• Quartiles – divide the distribution into four equal
parts.
• Deciles – divide the distribution into ten equal parts.
• Percentiles – divide the distribution into one
hundred equal parts.
Percentiles (for ungrouped data)
• Are position measures used in
educational and health-related
fields to indicate the position of
an individual in a group.
Percentile formula:
• The percentile corresponding to a given value X is
computed by using the following formula:
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100\%$$
Example
1. A teacher gives a 20-point test to 10
students. The scores are shown here. Find
the percentile rank of a score of 12.
18,15, 12, 6, 8, 2, 3, 5, 20, 10
Solution:
• Arrange the data in order from lowest to
highest.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
• Then substitute into the formula. Six values
(2, 3, 5, 6, 8, 10) are below 12, so
$$\text{Percentile} = \frac{6 + 0.5}{10} \times 100\% = 65\text{th percentile}$$
Thus, a student who scored 12 did better than
65% of the class.
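The percentile formula is easy to check in code. This is a minimal sketch using the ten test scores from the example; the function name `percentile_rank` is illustrative.

```python
def percentile_rank(values, x):
    """Percentile rank of x: (number of values below x + 0.5) / total * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
rank_of_12 = percentile_rank(scores, 12)
print(rank_of_12)
```

Sorting is not needed here, because the formula only counts how many values are below X.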
Procedure Table
• Finding a data value corresponding to a given
percentile
STEP 1: Arrange the scores according to
magnitude or size (lowest to highest).
STEP 2: Compute
$$c = \frac{n \cdot p}{100}$$
where: n = total number of values
p = percentile
STEP 3A: If c is not a whole number, round up to
the next whole number. Starting at the
lowest value, count over to the
number that corresponds to the
rounded-up value.
STEP 3B: If c is a whole number, use the
value halfway between the cth and
(c + 1)st values when counting up from
the lowest value.
EXAMPLE 2:
• Using the scores in Example 1:
a. find the 25th percentile.
b. find the 60th percentile.
SOLUTION:
• For a:
STEP 1: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20
STEP 2:
$$c = \frac{10 \cdot 25}{100} = 2.5$$
STEP 3: c is not a whole number, so round up to c = 3.
The 3rd value is 5; hence, the value 5 corresponds to
the 25th percentile.
SOLUTION:
• For b:
STEP 1: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20
STEP 2:
$$c = \frac{10 \cdot 60}{100} = 6$$
STEP 3: c is a whole number, so use the value halfway
between the 6th and 7th values.
6th value = 10, 7th value = 12
then,
$$\frac{10 + 12}{2} = 11$$
Hence, 11 corresponds to the 60th
percentile.
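The procedure table can be turned into a short function. This is a sketch under the assumption that p is a whole-number percentile strictly between 0 and 100; the function name `value_at_percentile` is illustrative.

```python
def value_at_percentile(values, p):
    """Data value at the p-th percentile, following Steps 1 to 3 above."""
    ordered = sorted(values)                 # Step 1
    n = len(ordered)
    c, remainder = divmod(n * p, 100)        # Step 2: c = n * p / 100
    if remainder != 0:
        # Step 3A: round c up; the (c + 1)th value sits at index c when counting from 0
        return ordered[c]
    # Step 3B: halfway between the c-th and (c + 1)-th values
    return (ordered[c - 1] + ordered[c]) / 2

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
p25 = value_at_percentile(scores, 25)
p60 = value_at_percentile(scores, 60)
print(p25, p60)
```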
Examples:
1. Find the 20th percentile or P20 of
the following scores:
25, 22, 20, 16, 17, 12, 8, 6, 5
2. Find the 60th percentile of the
following scores: 99, 95, 80, 75, 70,
60, 40
Quartiles and Deciles (for ungrouped data)
• Finding data values corresponding to Q1, Q2,
and Q3
STEP 1: Arrange the data in order from lowest to
highest.
STEP 2: Find the median of the data values. This
is the value for Q2.
STEP 3: Find the median of the data values that
fall below Q2. This is the value for Q1.
STEP 4: Find the median of the data values that
fall above Q2. This is Q3.
Example:
• Find Q1, Q2, Q3 for the data set
15, 13, 6, 5, 12, 50, 22, 18
SOLUTION:
STEP 1: 5, 6, 12, 13, 15, 18, 22, 50
STEP 2:
$$Q_2 = \frac{13 + 15}{2} = 14$$
STEP 3: values less than 14: 5, 6, 12, 13
$$Q_1 = \frac{6 + 12}{2} = 9$$
STEP 4: values greater than 14: 15, 18, 22, 50
$$Q_3 = \frac{18 + 22}{2} = 20$$
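The four steps can be sketched with Python's standard `statistics.median`. The function name `quartiles` is illustrative. Note that this follows the slides literally by taking the values strictly below and strictly above Q2; other textbooks and software use slightly different quartile conventions and may give different results.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves steps described above."""
    ordered = sorted(values)                         # Step 1
    q2 = median(ordered)                             # Step 2
    lower_half = [v for v in ordered if v < q2]      # values that fall below Q2
    upper_half = [v for v in ordered if v > q2]      # values that fall above Q2
    return median(lower_half), q2, median(upper_half)  # Steps 3 and 4

q1, q2, q3 = quartiles([15, 13, 6, 5, 12, 50, 22, 18])
print(q1, q2, q3)
```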
Computations of the Quantiles for
Grouped Data
• The computation for grouped data is
similar to that of the median.
• The formula is
$$P_p = u + \left(\frac{np - cf}{f}\right) i$$
where:
$P_p$ – the desired quantile
u – exact lower limit of the class interval
containing $P_p$
n – number of cases
p – proportion corresponding to the desired
quantile
cf – cumulative frequency immediately below
the class interval containing $P_p$
f – frequency of the class interval containing $P_p$
i – class interval
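The quantile formula generalizes the grouped median, so one function covers quartiles, deciles and percentiles by changing p. The sketch below reuses the class-interval table from the earlier mean and median example, not the efficiency ratings exercise. The function name `grouped_quantile` is illustrative, and whole-number class limits are assumed.

```python
def grouped_quantile(classes, p):
    """classes: (lower limit, upper limit, frequency), lowest first; p is a proportion."""
    n = sum(f for _, _, f in classes)
    target = n * p                       # the np term of the formula
    cf_below = 0
    for lower, upper, f in classes:
        if f > 0 and cf_below + f >= target:   # class containing the quantile
            u = lower - 0.5              # exact lower limit
            i = upper - lower + 1        # class size
            return u + ((target - cf_below) / f) * i
        cf_below += f

classes = [
    (40, 44, 4), (45, 49, 3), (50, 54, 4), (55, 59, 3),
    (60, 64, 10), (65, 69, 2), (70, 74, 5), (75, 79, 8),
    (80, 84, 3), (85, 89, 6), (90, 94, 2),
]
q1 = grouped_quantile(classes, 0.25)     # first quartile
p50 = grouped_quantile(classes, 0.50)    # same as the median
print(q1, p50)
```

With p = 0.50 the result matches the median of 67 found earlier, which is a useful sanity check on the formula.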
The efficiency ratings of 200 faculty members of
a certain college were taken and are shown
below.
CI        f
73 – 75   2
76 – 78   6
79 – 81   11
82 – 84   18
85 – 87   20
88 – 90   39
91 – 93   55
94 – 96   39
97 – 99   10
1. Compute for the value of the
mean, median and mode.
2. Determine the value of the
following:
a. lower boundary of the 2nd
quartile class
b. upper limit of the 3rd
quartile class
c. class mark of the 78th percentile class
d. frequency of the 8th decile class
e. cumulative frequency before the 5th decile
class
3. Determine the value of the following:
a. Q1    e. D4
b. P36   f. P55
c. D5    g. P79
d. D7    h. Q4