Math-01-Module-DATA-MANAGEMENT

Mathematics in Our World | Mathematics as a Tool: Data Management
Module 3
Mathematics as a Tool: Data Management
Contents
A. Basic Concepts in Statistics
B. Measures of Central Tendency
C. Measures of Dispersion
D. Measures of Relative Position
E. Normal Distribution
F. Linear Regression and Correlation
Department of Mathematics
College of Arts and Sciences
Mariano Marcos State University
2019
Mathematics in the Modern World
“It is easy to lie with statistics. It is hard to tell the truth without statistics.”
Andrejs Dunkel
Introduction
Data management is a process by which information is acquired and
processed to ensure the accessibility and reliability of the data for its users. One of
the most important tools in processing and managing such information is statistics.
Statistics is utilized in most areas of human endeavor. It is commonly used in education,
research, business, agriculture, and other fields, and even in everyday life activities.
Definition 1: Statistics is a science which deals with the collection, organization,
presentation, analysis, and interpretation of data so as to give more meaningful
information.
Data, or pieces of information, may be collected by conducting a survey,
interview, observation, or experiment. The data gathered can be properly
organized and presented graphically by a line graph, bar graph, or pictograph, or with
the aid of a statistical table known as a frequency distribution table (FDT). A concise
and meaningful conclusion is obtained from the analysis and interpretation of data.
Relevant information can be deduced from the analysis of numerical descriptions,
and predictions about the whole population may be made based on a small group.
The work of statistics covers a wide area of concern. Thus, statistics is
subdivided into two branches, namely descriptive statistics and inferential statistics.
Definition 2: Descriptive statistics refers to the collection, organization,
summary, and presentation of data, while inferential statistics deals with the
interpretation and analysis of data, where a conclusion about the population is
drawn from a subset of the population.
In descriptive statistics, a set of data is simply described without drawing any
inferences or implications. The data is merely summarized and discussed in a clear,
concise and informative manner. In inferential statistics, information or inferences
concerning a large group known as population is provided based on the study of a
representative group or selected members in the population which are identified as
sample. Calculating the average rating of a class of 40 students in Math 01
illustrates descriptive statistics, while determining the performance of the same
class based on the performance of 10 randomly selected members of the class
exhibits inferential statistics.
BASIC TERMS
Some of the basic terminologies and notations involved in statistics are the
following:
a. Population - a collection or set of things or objects under consideration
b. Sample - a subset or representative group of the population
c. Data - refers to the information gathered in a research
Statistical data are classified according to their sources as primary data
or secondary data.
• Primary data – information gathered from respondents by the researcher
himself.
• Secondary data – information obtained from published materials or data
gathered by other individuals or agencies. These are the data which are
transcribed from original sources.
d. Array – listing of observations which are arranged in an increasing or
decreasing magnitude
e. Parameter - a value which is computed from a population
f. Statistic – a value which is computed from a sample
g. Variable – a characteristic of interest that has been observed or measured on
every member of the population or sample.
A variable may be quantitative or qualitative, where a quantitative variable is
further classified as discrete or continuous.
i. Quantitative/Numerical variable – describes the amount or number of
an element of a sample or population
➢ Discrete – takes on a countable number of values (it is usually expressed as a
whole number)
Example: number of books owned by a student
➢ Continuous – measured on a continuous scale (it takes any value
within a range or interval)
Example: height of the students (in feet)
ii. Qualitative/Categorical variable – describes the quality, category, or
character of an element of a population or sample
Examples:
gender (male or female)
hair color (black, brown, blonde)
level of satisfaction of a student on his grade (highly satisfied,
satisfied, not satisfied)
Levels of Measurement
A more detailed distinction, termed as the levels of measurement, is used by
some researchers in examining the information that is collected. It is classified as
follows:
1. Nominal Measurement - numbers or symbols are used to code or classify
each element in the population. Note that the assigned numbers have no
numerical meaning.
Examples: gender, educational background, employment status
2. Ordinal Measurement – uses numerical categories that express a
meaningful order. There is no indication of distance between positions. The
numbers become meaningful because they reveal whether one class or
category is more or less than the other. Categories are ranked according to
the order of their value on the property, like first, second, third; oldest, next
oldest, youngest.
Example: rank in a beauty contest
3. Interval Measurement – has equal intervals. There is significance to the
distance between any two values. It tells us that one unit differs by a certain
amount of the property from another unit. It has no absolute zero.
Examples: aptitude test score, temperature
4. Ratio Measurement – a variable measured at this level not only includes the
concepts of order and interval, but also includes the idea of “nothingness”, or
absolute zero.
Examples: height, weight, age
Remark: The scale of measurement depends mainly on the method of
measurement and not on the property being measured.
For instance, the weight of a pack of milk measured in kilograms has a
ratio scale (zero kilograms means no weight), but if the packs are labelled as small,
medium, or large, the weight is measured on an ordinal scale.
Measure of Central Tendency
One way of summarizing the data is to describe the data set by using
descriptive measures. Among the most commonly used descriptive measures
are the measures of central tendency and the measures of dispersion.
Definition 3: A measure of central tendency (or central location) is a single
value that is used to identify the “center” of the data set or set of observations.
The three measures of central tendency are the mean, median, and mode,
where the mean is the most familiar measure of the “center”.
Definition 4: The mean, also known as the arithmetic average, is the sum of all the
observed values divided by the number of observations in the data set. It can be
computed as $\mu = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i$ is the $i$th observation and $n$ is the number of
observations in the data set.
The mean of the population is symbolized by the lowercase Greek letter “mu”,
$\mu$, while the mean of the sample is represented by $\bar{x}$ (x-bar).
Example 1: The scores of five students who are selected randomly in a class of
Math 01 are as follows: 44, 37, 41, 35 and 32. Find their average score.
Solution:
Applying the mean of ungrouped data gives $\bar{x} = \frac{44 + 37 + 41 + 35 + 32}{5} = \frac{189}{5} = 37.8$.
Hence, the average score of the five students is 37.8.
The means of subgroups can be combined to come up with the group mean,
known as the weighted mean. This can be calculated using the formula
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$
where
$x_i$ is the $i$th observation
$f_i$ is the frequency or weight for each observation
$\sum_{i=1}^{n} f_i$ is the total of the frequencies
Example 2: If the final examination of a class in statistics is given the weight 2, the
average of the quizzes the weight 3, and a project report the weight 1, what would be the
mean grade of a student who got the grades 90, 85, and 87, respectively?
Solution:
$\bar{x} = \frac{(2)(90) + (3)(85) + (1)(87)}{2 + 3 + 1} = \frac{522}{6} = 87$
The mean grade of the student is 87.
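The two computations above can be checked with a short Python sketch. The function name weighted_mean and the variable names are illustrative choices, not part of the module; the data are taken from Examples 1 and 2.

```python
from statistics import fmean

# Example 1: scores of five randomly selected students
scores = [44, 37, 41, 35, 32]
mean_score = fmean(scores)  # sum of the scores divided by their count

def weighted_mean(values, weights):
    """Sum of weight * value divided by the total of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 2: final exam (weight 2), quizzes (weight 3), project (weight 1)
grades = [90, 85, 87]
weights = [2, 3, 1]
mean_grade = weighted_mean(grades, weights)

print(mean_score)  # about 37.8
print(mean_grade)  # 87.0
```

Setting every weight to 1 reduces the weighted mean to the ordinary mean.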
Remarks:
1. The mean may not be an actual observation in the data set.
2. The mean reflects the magnitude of every observation since every observation
contributes to the value of the mean.
3. The mean is not a good measure of central tendency if there is an extreme value
or observation since it is easily affected by extreme values. The best measure of
center for this case is the median.
Definition 5: The median is a single value which divides an array of observations into
two equal parts such that 50% of the observations fall above it and the remaining
50% fall below it. It may be written symbolically as $\tilde{x}$, read as “x-tilde”.
The median of a data set consisting of an odd number of observations is the
middlemost value in the list. That is, $\tilde{x} = x_{\frac{n+1}{2}}$, where $n$ is the number of observations. If $n$
is even, the median is the average of the two middlemost values. It can be computed
as $\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$, where $x_{\frac{n}{2}}$ and $x_{\frac{n}{2}+1}$ are the two middlemost values. Take note that the
observations are first arranged in an array (from lowest to highest) before getting
the median value.
Example 1: The number of books owned by the eleven children are as follows: 5, 2, 4, 6,
5, 10, 7, 6, 9, 8, 6. What is the median?
Solution:
Arrange the data in an array form: 2, 4, 5, 5, 6, 6, 6, 7, 8, 9, 10. Since the list
contains 11 numbers then the median is the middlemost value (6th number) which is 6.
Example 2: Compute the median of the data set: 2.5, 4.0, 5.8, 3.5, 2.5, 8.2, 7.1, 3.7
Solution:
Forming an array, we have 2.5, 2.5, 3.5, 3.7, 4.0, 5.8, 7.1, 8.2. There are
8 values, hence, the median is calculated as $\tilde{x} = \frac{3.7 + 4.0}{2} = 3.85$.
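As a quick check of both median examples, here is a minimal Python sketch using the standard library's statistics.median, which sorts the data and then takes the middle value (odd count) or the average of the two middle values (even count). The variable names are illustrative.

```python
from statistics import median

# Example 1: eleven observations (odd count), so the 6th value of the array is the median
books = [5, 2, 4, 6, 5, 10, 7, 6, 9, 8, 6]
median_books = median(books)

# Example 2: eight observations (even count), so the 4th and 5th values are averaged
values = [2.5, 4.0, 5.8, 3.5, 2.5, 8.2, 7.1, 3.7]
median_values = median(values)

print(sorted(books), median_books)    # the array and its median, 6
print(sorted(values), median_values)  # the array and its median, about 3.85
```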
Remarks:
1. The median value may not be an actual observation in the data set.
2. The median is a positional value, hence, it is not affected by the presence of
extreme observations.
3. When the data are qualitative, the median cannot be computed, so the center is
described by determining the mode.
Definition 6: The mode is an observation that occurs most frequently in the given
data set.
Example 1: Find the mode in the following sets of scores.
a) set A: 36, 36, 12, 29, 35, 45, 50, 45, 45, 53
b) set B: 8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10
c) set C: 39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50
d) set D: 2, 9, 8, 12, 5, 13, 6, 10
Solution:
The mode in set A is 45 because 45 occurs most frequently in the list. Both 6 and
11 occur most often in set B; therefore, set B has two modes, 6 and 11. The
modes in set C are 25, 37, and 45 since these numbers have the highest frequency. Each
element in set D has the same number of occurrences; thus, the data set has no mode.
The distribution of data may be classified as unimodal, bimodal, trimodal or
multimodal distribution depending upon the number of modal values in the given data
set. In the above example, set A is unimodal, set B is bimodal and set C is trimodal.
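The counting behind these answers can be sketched in Python. The helper modes below is a hypothetical name, and its "no mode" rule encodes this module's convention for set D (all values equally frequent); other textbooks and libraries may treat that case differently.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s), or [] when all values are equally frequent."""
    counts = Counter(data)
    highest = max(counts.values())
    # Module convention: if every distinct value occurs equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

set_a = [36, 36, 12, 29, 35, 45, 50, 45, 45, 53]
set_b = [8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10]
set_c = [39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50]
set_d = [2, 9, 8, 12, 5, 13, 6, 10]

print(modes(set_a))  # [45]          unimodal
print(modes(set_b))  # [6, 11]       bimodal
print(modes(set_c))  # [25, 37, 45]  trimodal
print(modes(set_d))  # []            no mode
```

Because the function only counts occurrences, it works equally well on qualitative data such as shirt colors.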
Example 2: What is the modal color of the shirt worn by the students if the data gathered
were as follows: white, gray, gray, black, white, red, red, gray, black, white, white, red,
gray, red, gray, black, red, red, gray, gray, black?
Solution:
Since gray has the highest frequency, it follows that the modal color of the shirt
worn by the students is gray.
Remarks:
1. The mode can be used for both quantitative and qualitative data.
2. It is very much affected by the method of grouping.
3. It is determined by the frequency and not by the values of the observations.
DO THESE!
1. Company ABC is awarding the top ten most outstanding workers in their
company every year. The ages of the top ten awardees for the year 2018 are 47,
53, 36, 60, 30, 28, 42, 43, 38 and 52. Determine the mean, median and mode of
the ages.
2. The mean weight of 50 Balikbayan boxes is 135 kg. What is the approximate
total weight of all the boxes?
3. The average height of the four basketball players is 74 inches. If the height of the
three players are 69 inches, 72 inches and 78 inches, what is the height of the
fourth player?
4. What is the median of the distribution given by 23, 17, 12, 8, 14, 25, 19, 22, 18?
If the maximum value is replaced by 40, what effect will this have on the median?
How about if the minimum is replaced by 0?
5. The final grades of a student in the six subjects she enrolled in last semester are shown
below.
Subject         Number of Units   Final Grade
Calculus 1      5                 2.25
English 3       3                 2.0
Psychology 1    3                 1.5
Finance 2       3                 2.0
Accounting 3    6                 2.25
Humanities      3                 1.75
Determine her average grade. If the subjects had an equal number of units, what
would be her average?
MEASURE OF DISPERSION
In some cases, describing the data using the measures of central tendency
alone is not enough to provide sufficient information concerning a population or
sample. It should be supplemented by an analysis of how the individual elements of
the population/sample tend to cluster around the central tendency. Thus, an
analysis of the variability of the observations may be applied.
Definition 7: A measure of dispersion/measure of variation is a quantity that
measures the spread or variability of the values in a given set of data.
The most commonly used measures of dispersion are the range, variance,
and standard deviation. The range is the simplest and easiest to compute, but it gives
only a rough estimate of the dispersion.
Definition 8: The range, R, is the difference between the highest value (H) and
lowest value (L) in the data set. That is, R = H – L.
Example 1. Compare the performances of the three students based on their ratings
(in percent) in the 5 long tests.
Solution:
Student A :
83, 80, 89, 78, 70
Student B :
78, 79, 80, 81, 82
Student C :
80, 80, 80, 80, 80
In terms of central tendency, the students perform equally since
they all have the same average rating of 80%. However, looking at the variability of their
ratings, Student A has the highest range (89 – 70 = 19) compared to Student B
(82 – 78 = 4) and Student C (80 – 80 = 0). This
shows that the scores of Student A are more dispersed than those of the others. The ratings of
Student A fluctuate while those of Student B are evenly spaced over a narrow interval. On the other
hand, Student C has a range equal to zero, so his ratings are all concentrated at the
mean, indicating that the distribution has no spread.
Example 2. The average daily allowances (in pesos) of 12 college students studying
at University Y are 112, 127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25
and 113. Find the range.
Solution:
Given: H = 165.5 and L = 99.75, then the range is R = H – L = 165.5 – 99.75 = 65.75.
The range of the daily allowances of the 12 college students is 65.75 pesos.
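The range R = H – L is simple enough to verify directly. In this minimal Python sketch the function name data_range is an illustrative choice; the data come from the two examples above.

```python
def data_range(data):
    """Range R = H - L: highest value minus lowest value."""
    return max(data) - min(data)

# Example 1: ratings of the three students in five long tests
student_a = [83, 80, 89, 78, 70]
student_b = [78, 79, 80, 81, 82]
student_c = [80, 80, 80, 80, 80]

# Example 2: average daily allowances (in pesos) of 12 college students
allowances = [112, 127, 118, 147.5, 165.5, 99.75, 150, 145, 145, 102, 136.25, 113]

print(data_range(student_a), data_range(student_b), data_range(student_c))  # 19 4 0
print(data_range(allowances))  # 65.75
```

Only the two extreme values enter the computation, which is why the range is regarded as a rough measure of spread.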
Remarks:
1. The larger the value of the range, the more dispersed the observations are.
2. The range considers only the extreme values or observations in the data set.
A more reliable measure in describing the spread of a set of observations is
the standard deviation. Most researchers use this measure in the treatment of data.
Its computation includes all the values in the data set.
Definition 9: The standard deviation is the positive square root of the variance.
The variance is the average of the squared deviations of every observation from
the mean.
The standard deviation and variance can be obtained from a population or a
sample, but most applications use the sample rather than the population because
complete enumeration of a population is rarely practical. The unit of the variance is the
squared unit of the data, while that of the standard deviation is the same as the unit
of the data set. The following symbols are used to designate these measures for a
population and a sample.
              Standard deviation   Variance
Population    $\sigma$             $\sigma^2$
Sample        $s$                  $s^2$
The variance and standard deviation of a population are calculated by using
the formulas below.
Variance and Standard Deviation of a Population: Let $x_1, x_2, \ldots, x_N$ be the
$N$ elements of a population. Then, the population variance is $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
and the population standard deviation is $\sigma = \sqrt{\sigma^2}$.
Sample Variance: Let $x_1, x_2, \ldots, x_n$ be a random sample of $n$ observations.
Then, the sample variance is $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ and the standard deviation of the
sample is $s = \sqrt{s^2}$.
Example 1: The following are the scores of a student in all her long exams in
Calculus: 83, 80, 89, 78, and 70. Calculate the standard deviation.
Solution:
The mean is $\mu = \frac{400}{5} = 80$.
$x$          $x - \mu$    $(x - \mu)^2$
83           3            9
80           0            0
89           9            81
78           -2           4
70           -10          100
Total 400                 194
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{194}{5} = 38.8$ (Variance)
$\sigma = \sqrt{38.8} \approx 6.23$ (Standard deviation)
The standard deviation of the population is 6.23.
The result indicates that, on the average, the percentage scores of the student
tend to deviate from the mean by 6.23 units.
Example 2: The following data were obtained by sampling from a population.
10, 12, 14, 15, 17, 18, 18, 24
Find the variance and the standard deviation of the sample.
Solution:
The sample mean is $\bar{x} = \frac{128}{8} = 16$.
$x$          $x - \bar{x}$    $(x - \bar{x})^2$
10           -6               36
12           -4               16
14           -2               4
15           -1               1
17           1                1
18           2                4
18           2                4
24           8                64
Total                         130
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{130}{7} \approx 18.57$
$s = \sqrt{18.57} \approx 4.31$
The variance is 18.57 while the standard deviation is approximately 4.31.
What can you infer from this?
Remarks: A large amount of standard deviation indicates that, on the average, the
data values will be far from the mean while the standard deviation of smaller amount
shows that, on the average, the data values will be close to the mean.
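The only difference between the population and sample computations is the divisor: N for a population and n – 1 for a sample. The following Python sketch makes that explicit; the function name variance and its sample flag are illustrative choices, and the data are those of Examples 1 and 2.

```python
from math import sqrt

def variance(data, sample=False):
    """Average squared deviation from the mean.

    Divides by N for a population and by n - 1 for a sample.
    """
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    divisor = n - 1 if sample else n
    return squared_deviations / divisor

# Example 1: exam scores treated as a population
exam_scores = [83, 80, 89, 78, 70]
pop_var = variance(exam_scores)
pop_sd = sqrt(pop_var)

# Example 2: data treated as a sample
sample_data = [10, 12, 14, 15, 17, 18, 18, 24]
samp_var = variance(sample_data, sample=True)
samp_sd = sqrt(samp_var)

print(round(pop_var, 2), round(pop_sd, 2))    # 38.8 6.23
print(round(samp_var, 2), round(samp_sd, 2))  # 18.57 4.31
```

The standard library offers the same computations as statistics.pvariance, statistics.pstdev, statistics.variance, and statistics.stdev.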
DO THESE!
Answer the following. Show a complete and neat solution for each problem.
1. An interview was made to a class of 20 college students to determine the
number of books owned by the students. The data gathered are as follows:
4, 9, 0, 1, 3, 24, 12, 3, 30, 12, 7, 13, 18, 4, 5, 15, 8, 10, 10, and 12. Treating
the data as a population, calculate the standard deviation.
2. (Adapted from Mathematics: A Practical Odyssey.) To settle an argument over
who is the better bowler between Danny and George, the two agreed to bowl
six games, and whoever has the highest “average” will be the better bowler. Their
bowling scores are presented in the table below. Compute and compare their
averages. Who is the better bowler?
George   Danny
185      182
135      185
200      188
185      185
250      180
155      190
3. (Mathematical Excursions by Aufmann ). A consumer testing agency has tested
the strengths of 3 brands of
inch rope. The results of the tests are shown
in the following table. According to the same test results, which company
produces
inch rope for which the breaking point has the smallest standard
deviation?
Company        Rope breaking point (in pounds)
Trustworthy    122, 141, 151, 114, 108, 149, 125
Brand X        128, 127, 148, 164, 97, 109, 137
NeverSnap      112, 121, 138, 131, 134, 139, 135
4. Ten used trail bikes are randomly selected from a bike shop, and the
odometer reading of each is recorded as follows.
1,902; 103; 653; 1,901; 788; 361; 216; 363; 223; 656
Solve for the standard deviation and interpret.
Measures of Relative Position
A statistical tool which is significant in identifying the position of an
observation relative to the other elements in a given data set is the measure of relative
position.
Definition 10: A measure of relative position is a statistical measure that
provides the specific location of an observation relative to the other values when
the data are in ranked order.
This measure divides the data set into subgroups such that a specific portion
of the data set belongs to the lower bracket and the remaining portion to the higher bracket.
Percentiles, deciles, and quartiles are among the most commonly used measures of
relative position.
Definition 11:
The percentile, denoted by $P_i$, is a value that divides an array of observations
into 100 equal parts such that $i\%$ of all the observations lie below $P_i$.
The quartile, denoted by $Q_i$, is a value that divides an array of observations into
four equal parts such that $(i \times 25)\%$ of all the observations lie below $Q_i$.
The decile, denoted by $D_i$, is a value that divides an array of observations into
ten equal parts such that $(i \times 10)\%$ of all the observations lie below $D_i$.
In determining the desired measure, the data must first be arranged in
increasing order. The percentiles consist of 99
partition points located at $P_1, P_2, \ldots, P_{99}$, where 1% of the total observations
are lower than $P_1$ and the remaining 99% are higher than $P_1$, 2% of the total
observations are found below $P_2$ and 98% are above it, and so on.
Analogous to this, quartiles have the subdivisions described by $Q_1$ (the first
quartile, which has 25% of the observations falling below it and the remaining 75%
above it), $Q_2$ (the second quartile, which is equal to the median and has 50% of the
observations below it), and $Q_3$ (the third quartile, with 75% of the total observations
falling below it and the remaining 25% lying above it).
The deciles are the 1st decile ($D_1$), 2nd decile ($D_2$), ..., and 9th
decile ($D_9$). The lowest decile $D_1$ corresponds to a value in the set wherein 10% of
the whole observations are located below $D_1$, the second decile $D_2$ corresponds to a
value in which 20% of the entire observations are lower than $D_2$, and so on up to
the last decile $D_9$, which has a value positioned at the top such that 90% of all the
observations are located below the value corresponding to $D_9$.
Definition 12: Formula for the Percentile
The location of the percentile $P_i$ of ungrouped data consisting of $n$ observations
can be computed as $\frac{ni}{100}$; that is, $P_i$ is the value in the $\frac{ni}{100}$th place of the array.
Remarks:
1. The quartile and decile can be determined by solving for the equivalent percentile.
a. $Q_i = P_{25i}$.
b. $D_i = P_{10i}$.
2. Given a data set, then Median $= Q_2 = D_5 = P_{50}$.
Example 1: Joy was told that relative to the other scores on a long exam in Statistics,
her score was the 95th percentile. This means that at least 95% of those who took the
test had scores less than or equal to Joy’s score, while at least 5% had a score higher
than Joy’s.
Example 2: Given the following data set: 25, 5, 6, 12, 8, 16, 17, 22, 20, 9. Compute
the following.
a) 20th percentile
b) 56th percentile
c) first quartile
d) 2nd quartile
e) 3rd decile
f) seventh decile
Solutions:
Arrange the scores in increasing order.
5, 6, 8, 9, 12, 16, 17, 20, 22, 25
a) 20th percentile
$\frac{ni}{100} = \frac{(10)(20)}{100} = 2$ (location of the 20th percentile)
This means that the 20th percentile is the second score from the lowest.
So, $P_{20} = 6$.
b) 56th percentile
$\frac{ni}{100} = \frac{(10)(56)}{100} = 5.6$
When the result is not exact, round it to the nearest whole number. The
56th percentile is approximately described by the 6th value in the data set.
Thus, $P_{56} \approx 16$.
Note: Interpolation may be applied to find an exact value
corresponding to the 56th percentile. The location 5.6 means that the 56th
percentile is between the 5th and 6th values. To interpolate, multiply the
difference of the 5th and 6th values by the decimal part, then add the result
to the 5th value. That is, $(16 - 12) \times 0.6 = 2.4$. So, $P_{56} = 12 + 2.4 = 14.4$,
which is the exact value.
c) first quartile, $Q_1$
Since $Q_1 = P_{25}$, the location is $\frac{(10)(25)}{100} = 2.5$; therefore, $Q_1$
is located halfway between the 2nd and 3rd values in the list. So,
$Q_1 = \frac{6 + 8}{2} = 7$.
d) 2nd quartile
Note that $Q_2$ has the same value as the median. Solving for the median
gives $\tilde{x} = \frac{12 + 16}{2} = 14$. So, $Q_2 = 14$.
e) 3rd decile
$D_3 = P_{30}$, with location $\frac{(10)(30)}{100} = 3$ (3rd value from the lowest)
Therefore, $D_3 = 8$.
f) seventh decile
$D_7 = P_{70}$, with location $\frac{(10)(70)}{100} = 7$ (7th number in the list)
The seventh decile is 17.
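The location rule used in this example (position n·i/100 in the ordered list, with interpolation when the position is not a whole number) can be sketched in Python as follows. The function name percentile is an illustrative choice, and the sketch assumes an integer i from 1 to 99. It follows this module's rule only; statistical software commonly uses other percentile conventions, and for the 2nd quartile the module takes the median rather than applying the location rule.

```python
def percentile(data, i):
    """i-th percentile by the module's location rule n*i/100, with interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    # Split the location n*i/100 into its whole part and the remainder out of 100.
    whole, remainder = divmod(n * i, 100)
    if whole < 1:
        return ordered[0]          # location falls before the first value
    lower = ordered[whole - 1]     # value at the whole-number position (1-based)
    if remainder == 0:
        return lower
    upper = ordered[whole]         # next value in the array
    return lower + (remainder / 100) * (upper - lower)

scores = [25, 5, 6, 12, 8, 16, 17, 22, 20, 9]

print(percentile(scores, 20))  # 6     (location 2)
print(percentile(scores, 56))  # about 14.4 (location 5.6, interpolated)
print(percentile(scores, 25))  # 7.0   (first quartile, location 2.5)
print(percentile(scores, 30))  # 8     (third decile, location 3)
print(percentile(scores, 70))  # 17    (seventh decile, location 7)
```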
Box - and - Whisker Plot
Definition 12: A diagram showing the representation of a 5-point summary of a
data set specified by the lowest and the highest values, the values corresponding
to $Q_1$ and $Q_3$, and the median is called a box-and-whisker plot, also known as a
box plot.
The five important numbers are arranged increasingly on a horizontal or
vertical scale. Diagrammatically, we have
Diagram from Mathematical Excursions by Aufmann
Here is a summary of the construction of a box plot.
Steps in the Construction of a Box-and-Whisker Plot
1. Arrange the values in increasing order.
2. Compute $Q_1$, the median, and $Q_3$.
3. Locate the five numbers (lowest and highest values, $Q_1$, median, and $Q_3$)
on the number line and draw a rectangle (box) above the scale covering $Q_1$, the
median, and $Q_3$, then draw a line segment across the box passing through the
median.
4. Connect the box to the extreme values by line segments (known as whiskers).
4. Connect the box to the extreme values by a line segment (known as whisker).
Example: Draw a box-and-whisker plot for the given data set: 23, 15, 5, 6, 12, 8, 16,
17, 22, 20, 9, 10.
Solution:
• Arrange the values in increasing order.
5, 6, 8, 9, 10, 12, 15, 16, 17, 20, 22, 23
• Identify the lowest and highest values and compute $Q_1$, the median, and $Q_3$.
Lowest value is 5 and highest value is 23.
$Q_1 = P_{25}$: location $\frac{(12)(25)}{100} = 3$, so $Q_1 = 8$
$Q_3 = P_{75}$: location $\frac{(12)(75)}{100} = 9$, so $Q_3 = 17$
Median = $\frac{12 + 15}{2} = 13.5$
Follow steps 3 and 4 to draw the figure.
Stem-and-leaf display
An informative arrangement of data where actual values of the observations
are displayed can be visualized through the use of the stem-and-leaf display.
Definition 13. A stem-and-leaf display is an organized diagram showing the
relative position of every element in the data set such that the leading digit(s)
become the stem and the trailing digit(s) become the leaf.
Example. The table lists the number of words used by 30 students in their reflection.
Draw a stem-and-leaf display of these data.
63    57    49    100   49    61    20    50    73    89
37    99    80    33    84    75    24    43    56    27
55    58    15    57    63    29    58    83    32    77
Answer:
Stem   Leaf
1      5
2      0 4 7 9
3      2 3 7
4      3 9 9
5      0 5 6 7 7 8 8
6      1 3 3
7      3 5 7
8      0 3 4 9
9      9
10     0
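The display above can be reproduced with a few lines of Python. In this sketch the function name stem_and_leaf is an illustrative choice; it assumes whole-number data and uses the last digit as the leaf and the remaining leading digit(s) as the stem.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group sorted whole numbers by stem (leading digits); leaves are the last digits."""
    groups = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # e.g. 57 -> stem 5, leaf 7; 100 -> stem 10, leaf 0
        groups[stem].append(leaf)
    return dict(groups)

words = [63, 57, 49, 100, 49, 61, 20, 50, 73, 89,
         37, 99, 80, 33, 84, 75, 24, 43, 56, 27,
         55, 58, 15, 57, 63, 29, 58, 83, 32, 77]

display = stem_and_leaf(words)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>4} | {leaves}")
```

Each printed row lists one stem followed by its leaves in increasing order, matching the rows of the answer table.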
DO THESE!
1. An interview was made to a class of 20 college students to determine the
number of books owned by the students. The data gathered are as follows:
4, 9, 0, 1, 3, 24, 12, 3, 30, 12, 7, 13, 18, 4, 5, 15, 8, 10, 10, and 12.
a. Solve for the following measures and interpret the result.
i.
ii.
iii.
b. Construct a box-and-whiskers plot.
c. Create the stem-and-leaf display.
2. Consider the scores of the two bowlers in the previous exercise.
George   185   135   200   185   250   155
Danny    182   185   188   185   180   190
a. Compare their scores which correspond to
i)
ii)
b. If the scores of Danny and George are combined to form a single
population, compute for i)
ii)
.
NORMAL DISTRIBUTION
When most of the observations are near the “center” and the distribution of
data is nearly the same on both sides, the distribution is said to follow a normal
distribution. It is one of the most commonly used distributions in the
field of Statistics and has various applications.
Definition 14: A normal distribution, also known as the Gaussian distribution, is a
continuous probability distribution which is drawn graphically as a smooth bell-shaped
curve called the normal curve, having an area under it equal to
one.
Properties of a Normal Distribution
Any normal distribution has the following properties:
1. The total area under the normal curve is one.
2. The three measures of central tendency given by the mean, median and
mode are all equal.
3. It is symmetric with respect to the vertical line $x = \mu$.
4. The curve is asymptotic with respect to the horizontal axis in both
directions.
The proportion of values in a given data set which is normally distributed is
based on the mean and the standard deviation of the data set. That is,
• about 68% of the observations fall within 1 standard deviation away from the
mean;
• about 95% of the observations fall within 2 standard deviations away from the
mean; and
• about 99.7% of the observations fall within 3 standard deviations away from
the mean.
The diagram shows the different percentages defined by the empirical rules
for normal distributions.
Diagram from Mathematical Excursion by Aufmann
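The three percentages of the empirical rule can be checked numerically. For a normal distribution, the proportion of values within k standard deviations of the mean equals erf(k/√2), where erf is the error function from Python's math module. The function name proportion_within is an illustrative choice.

```python
from math import erf, sqrt

def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The rounded figures 68%, 95%, and 99.7% in the empirical rule are approximations of these values.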
Every normal distribution has its own mean and standard deviation, so areas based on
the standard normal distribution will be used.
Definition: A standard normal distribution is a distribution of a random variable
with mean zero and standard deviation equal to one. That is, $Z \sim N(0, 1)$.
A random variable X with mean $\mu$ and standard deviation $\sigma$ can be transformed
into a standard normal variable Z with mean zero and standard deviation equal to
one by using the formula $Z = \frac{X - \mu}{\sigma}$.
Rules in Finding the Areas Under the Normal Curve
Case 1. $P(Z < z)$
When the area under the curve is located to the left of $z$, simply read the
value corresponding to $z$ in the table of areas under the normal curve.
Example: 1. Find the area to the left of
.
2. Give the probability (
)
Solution:
Case 2. $P(Z > z) = 1 - P(Z < z)$
Example: Find $P(Z >$ ____$)$.
Solution:
Case 3. $P(a < Z < b) = P(Z < b) - P(Z < a)$
This is applied when the area is bounded between two ordinates or values in
an interval.
Example: What is the area bounded between Z = -1.22 and Z = 2.03?
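As a cross-check on table lookups, the area between two z-values can be computed as the difference of two left-tail areas, P(Z < 2.03) – P(Z < -1.22). This minimal Python sketch uses statistics.NormalDist from the standard library (Python 3.8 or later), whose cdf method returns the area to the left of a value; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

left_of_upper = Z.cdf(2.03)    # area to the left of Z = 2.03
left_of_lower = Z.cdf(-1.22)   # area to the left of Z = -1.22
area_between = left_of_upper - left_of_lower

print(round(left_of_upper, 4))   # 0.9788
print(round(left_of_lower, 4))   # 0.1112
print(round(area_between, 4))    # 0.8676
```

An area to the right of a value is obtained the same way, as 1 minus the left-tail area.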
Applications:
Example 1: (Mathematical Excursions by Aufmann) During 1 week, an overnight delivery
company found that the weights of its parcels were normally distributed, with a mean
of 24 oz and a standard deviation of 6 oz.
a. What percent of the parcels weighed between 12 oz and 30 oz?
b. What percent of the parcels weighed more than 42 oz?
Solution:
a.
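One way to work through the parcel example numerically is sketched below in Python, assuming the stated mean of 24 oz and standard deviation of 6 oz. Each weight is first converted to a z-score with Z = (X – μ)/σ, and the areas are then taken from statistics.NormalDist. The helper name z_score and the variable names are illustrative; answers read from a printed table or from the empirical rule may differ slightly from these figures.

```python
from statistics import NormalDist

mu, sigma = 24, 6  # mean and standard deviation of parcel weights, in ounces

def z_score(x, mean, sd):
    """Standardize x: the number of standard deviations x lies from the mean."""
    return (x - mean) / sd

Z = NormalDist()  # standard normal distribution

# a. percent of parcels weighing between 12 oz and 30 oz
z_low, z_high = z_score(12, mu, sigma), z_score(30, mu, sigma)   # -2.0 and 1.0
between = Z.cdf(z_high) - Z.cdf(z_low)

# b. percent of parcels weighing more than 42 oz
z_42 = z_score(42, mu, sigma)                                    # 3.0
above = 1 - Z.cdf(z_42)

print(f"between 12 oz and 30 oz: {between:.1%}")   # about 81.9%
print(f"more than 42 oz: {above:.2%}")             # about 0.13%
```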
Example 2: The salaries of employees of a certain company in Metro Manila have a
mean of Php5000 and a standard deviation of Php1000. What is the probability that
an employee selected will have a salary of
a. more than Php 5000?
b. between Php 5,750 and Php 6,500?
c. less than Php 9,000?
Exercises: Show a complete solution for each problem.
2. Given a normal distribution with µ = 50 and σ = 10, find the probability that X
assumes a value between 45 and 62.
3. Given a normal distribution with µ = 300 and σ = 50, find the probability that X
assumes a value greater than 362.
4. In the qualifying examination for the admittance to college, the mean score was
65 and the standard deviation was 8. If 1,265 students took the qualifying exam,
how many of them scored between 60 and 75?
5. Records show that in a certain hospital the distribution of the “length of stay” of
its patients is normal with a mean of 10.5 days and a standard deviation of 2
days.
a. What percentage of the patients stayed 8 days?
b. What is the probability that a patient stays in the hospital between 9 and 11
days?
6. An electrical firm manufactures light bulbs that have a length of life that is
normally distributed with mean equal to 800 hours and a standard deviation of 40
hours. Find the probability that a bulb burns between 778 and 834 hours.
CORRELATION AND REGRESSION
Several research studies focus on the relationships between two or more
variables. For instance, a teacher may want to know if the study habits of students
relate to their performance in the classroom. A businessman needs to predict the
selling prices of his products based on the monthly consumption demand. A
doctor needs to find out if there is evidence of a relationship between cholesterol
and triglyceride levels. An agriculturist wants to know if the level of experience and
practices of the farmers in planting tobacco greatly affect their production. All of
these questions involve the correlation and regression analysis of data.
Correlation and regression are two related statistical tools. Correlation is used
to find out if there is a relationship between two variables while regression is a
means to predict or forecast the value of one variable in terms of the other.
Definition: Correlation analysis is a method used to measure the degree of
relationship or association between two or more variables.
The relationship between two variables can be shown graphically by
sketching the scatter diagram.
Scatter diagram – also known as a scatter plot, is a pictorial presentation showing
the relationship between two variables. It shows the direction and shape of the
association being conveyed. This is done by plotting the points corresponding to
the observations/data in the first quadrant of a rectangular coordinate system.
Example:
Types of Correlation:
1. Positive correlation – a direct relationship between two variables exists.
That is, as one variable increases (decreases), the other also increases (decreases).
2. Negative correlation – an inverse relationship exists between the variables.
Here, one variable increases as the other decreases or vice versa.
3. Zero correlation – exists when high or low scores in one variable are not
systematically associated with high or low scores in the other variable. It indicates
that there is no correlation between the variables. The points in the scatter
diagram are scattered randomly.
Diagram showing the positive, negative and zero correlation.
Remark: The relationship between two variables may be described by its magnitude
or its strength. In terms of strength, the correlation may be perfect, high, moderate,
or low. In a perfect correlation, all points in the scatter diagram lie on a straight line.
The degree or strength of relationship between two variables may also be
described by computing a single number called the correlation coefficient.
The Pearson Correlation Coefficient (r)
- named after the English mathematician Karl Pearson (1857 – 1936)
- measures relationships in variables that are linearly related.
- its value ranges from -1 to +1
- it is computed through the formula
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
The correlation coefficient r may be interpreted using the correlation scale shown
below:
Range of Values        Interpretation
±1                     Perfect positive (negative) correlation
±0.91 to ±0.99         Very high positive (negative) correlation
±0.71 to ±0.90         High positive (negative) correlation
±0.51 to ±0.70         Moderately positive (negative) correlation
±0.31 to ±0.50         Low positive (negative) correlation
±0.01 to ±0.30         Negligible positive (negative) correlation
0.00                   No correlation
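The computation of r and its reading against the correlation scale can be sketched in Python as follows. This is a minimal illustration using the usual computational (raw-sums) formula for Pearson's r; the function names `pearson_r` and `interpret` and the five data pairs are hypothetical and are not taken from the module. Values that fall between two listed bands (for example 0.905) are assigned to the higher band, which is an assumption made here.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator

def interpret(r):
    """Map r to the verbal scale; in-between values go to the higher band."""
    size = abs(r)
    if size == 0:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    if math.isclose(size, 1.0):
        return f"Perfect {direction} correlation"
    if size > 0.90:
        return f"Very high {direction} correlation"
    if size > 0.70:
        return f"High {direction} correlation"
    if size > 0.50:
        return f"Moderately {direction} correlation"
    if size > 0.30:
        return f"Low {direction} correlation"
    return f"Negligible {direction} correlation"

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), "-", interpret(r))
```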
Testing the Significance of r
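The t – test is used to verify whether the computed value of r is statistically
significant or not. The test statistic can be computed by using the formula
$$t = r\sqrt{\frac{n-2}{1-r^2}}$$
where n is the number of paired observations. The statistic has n – 2 degrees of freedom.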
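A short Python sketch of the significance test is given below. It assumes the standard form of the test statistic, t = r·sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The function name `t_statistic` and the value r = 0.70 are hypothetical, chosen only to show the mechanics; the critical value 2.306 is the usual two-tailed t-table entry for 8 degrees of freedom at the 5% level.

```python
import math

def t_statistic(r, n):
    """t value for testing whether a correlation r from n pairs differs from 0."""
    if n <= 2:
        raise ValueError("need more than 2 pairs (df = n - 2 must be positive)")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical inputs: r = 0.70 from a sample of n = 10 pairs.
r, n = 0.70, 10
t = t_statistic(r, n)

# Two-tailed critical value from a t-table: df = 8, 5% level of significance.
T_CRITICAL = 2.306

print(f"t = {t:.3f}, df = {n - 2}")
print("significant" if abs(t) > T_CRITICAL else "not significant")
```

The decision rule used here is to declare r significant when |t| exceeds the tabulated critical value for the chosen level of significance.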
Example: A research study was conducted to determine the correlation between
students’ grades in English and their grades in Mathematics. A random sample of 10
students in a class was taken and the results are tabulated below.
Use the 5% level of significance.
Student No.
9
2
3
4
5
6
7
8
93
89
84
91
90
83
75
81
10
English grade
84
1
77
Mathematics grade
78
85
91
86
80
88
89
87
78
76
REGRESSION – describes the process of estimating the relationship between two
variables. The relationship is estimated by fitting a straight line through the given
data. The least squares method is useful in determining the equation of the line that
best fits the data. This line is known as the regression line, which keeps the prediction
errors to a minimum. It is given by the equation
$$\hat{y} = a + bx$$
where
$\hat{y}$ is the predicted value,
$b$ is the regression coefficient (slope of the line),
$a$ is the y – intercept of the line, which is
computed as
$$a = \bar{y} - b\bar{x}$$
where
$\bar{x}$ is the mean of the x – values
$\bar{y}$ is the mean of the y – values
To find the slope,
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
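The least squares computation can be sketched in Python as shown below. The sketch assumes the usual raw-sums formula for the slope and takes the intercept as the mean of y minus the slope times the mean of x; the symbols a (intercept) and b (slope), the function names, and the five data pairs are illustrative assumptions, not values from the module.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator   # slope
    a = sum_y / n - b * (sum_x / n)                  # intercept: y_bar - b * x_bar
    return a, b

def predict(x, a, b):
    """Predicted value on the fitted line for a given x."""
    return a + b * x

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(f"y_hat = {a:.2f} + {b:.2f}x")
print(f"prediction at x = 6: {predict(6, a, b):.2f}")
```

To predict one variable from another, the variable whose value is known plays the role of x and the variable to be estimated plays the role of y.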
Example. Using the data in the example above, estimate a student's grade in English if
the student's Mathematics grade is 90. What regression equation is used?