Uploaded by Bana Hammad

FOM 11 – Chapter 5: Statistics
Statistics is a mathematical discipline concerned with the collection, organization, display, analysis, interpretation, and presentation of data.
Data is collected to study a population, which can be any group of interest, such as people, animals, or trees. Data points are often represented as X.
A population includes all measurements of interest. When census data (data from every member of the population) cannot be collected, a portion of the population is carefully selected to form a sample.
After the data is collected, a statistician studies its behavior using two kinds of statistical measures:
1. Measures of central tendency, which describe the middle of the data:
• Mean: the sum of all the data entries divided by the number of entries. The mean of a population is represented by the Greek letter μ (mu). The mean of a sample is represented as x̄ (x-bar).
• Median: the data entry that has half of the data below it and half of the data above it. When the data entries are written in numerical order, the median is the middle value if the number of entries is odd, and the mean of the two middle values if the number of entries is even.
• Mode: the data entry that occurs most often.
2. Measures of dispersion, which describe how the data is spread out:
• Maximum (max): the greatest value in the data set.
• Minimum (min): the smallest value in the data set.
• Range: the difference between the maximum and minimum values in the data set (range = max − min).
• Variance and standard deviation.
Notes:
• It is always helpful to arrange the data in ascending order before finding the max, min, median, mode, and so on.
• Dispersion has a value of zero if all the data points in a set are identical, and it increases as the data becomes more spread out.
• An outlier is a value in the data set that is very different from the other data values.
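The measures defined above can be checked with a short Python sketch. It uses the standard `statistics` module; the function name `summarize` is hypothetical. Because the Example 1 data set is not included here, the input is the Example 6 hit counts written out one entry at a time.

```python
from statistics import mean, median, multimode

def summarize(data):
    """Compute the measures of central tendency and dispersion from this section."""
    values = sorted(data)  # ascending order, as recommended in the notes
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": multimode(values),  # a list, because a data set can have more than one mode
        "max": values[-1],
        "min": values[0],
        "range": values[-1] - values[0],
    }

# Example 6 hit counts written out one entry at a time:
# 0 hits five times, 1 hit ten times, 2 hits four times, 3 hits three times, 4 hits once.
hits = [0] * 5 + [1] * 10 + [2] * 4 + [3] * 3 + [4] * 1

result = summarize(hits)
print(result)
```

For this data the mean is 31/23 ≈ 1.35, the median and the mode are both 1, and the range is 4 − 0 = 4.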
5.1 Exploring Data:
Example 1: Determine the mean, median, mode, maximum, minimum, and range for the data set below.
5.2 Frequency Tables, Histograms and Frequency Polygons
Sometimes data values may be repeated frequently. A frequency table tells us how often a data value or an event
occurs. For example, the table below organizes students based on the number of siblings they have.
Number of Siblings    Number of Students
0                     ____
1                     ____
2                     ____
3                     ____
4                     ____
5+                    ____
A frequency distribution organizes data into a set of intervals and can be shown as a table or a graph. For example, in the table above, each row is an interval of the frequency distribution. Intervals are also referred to as bins.
A histogram is a bar graph that shows a frequency distribution. The horizontal axis represents the intervals and
the vertical axis represents the frequency.
A frequency polygon is the graph of a frequency distribution produced by plotting a point at the midpoint of each interval, at a height equal to that interval's frequency, and joining the points with straight lines.
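As a sketch of how the points of a frequency polygon are found, the snippet below takes intervals given as (lower, upper) pairs and pairs each interval's midpoint with its frequency; joining these points in order gives the polygon. The function name `polygon_points` is hypothetical, and the intervals and frequencies are taken from the Example 7 table.

```python
def polygon_points(intervals, frequencies):
    """Return (midpoint, frequency) pairs; joining them in order gives the frequency polygon."""
    points = []
    for (lower, upper), freq in zip(intervals, frequencies):
        midpoint = (lower + upper) / 2  # centre of the interval on the horizontal axis
        points.append((midpoint, freq))
    return points

# Intervals and frequencies from the Example 7 table (hours per week of video games).
intervals = [(3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
frequencies = [7, 11, 16, 19, 12]

points = polygon_points(intervals, frequencies)
print(points)  # [(4.0, 7), (6.0, 11), (8.0, 16), (10.0, 19), (12.0, 12)]
```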
• When dealing with a lot of data, it is usually easier to let each interval cover a range of values rather than creating a separate interval for every data value, which quickly becomes unmanageable for a large data set.
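A minimal sketch of grouping raw data into intervals, assuming equal-width intervals that include their lower endpoint and exclude their upper endpoint (a convention chosen here, not stated in the notes). It prints a tally and a frequency for each interval, like the table in Example 2. Since the Example 2 data is not reproduced here, the input is the ten player heights from Examples 4 and 5; the start value 150, the width 20, and the function name `frequency_table` are illustrative assumptions.

```python
def frequency_table(data, start, width):
    """Group data into intervals [start, start + width), [start + width, start + 2*width), ...
    and return a list of (lower, upper, frequency) rows.
    Assumes start is at or below the smallest data value."""
    rows = []
    lower = start
    while lower <= max(data):
        upper = lower + width
        # Each value is counted in exactly one interval: lower <= x < upper.
        freq = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, freq))
        lower = upper
    return rows

# Heights of the players from Examples 4 and 5 combined.
heights = [170, 182, 192, 193, 212, 152, 154, 174, 180, 220]

table = frequency_table(heights, start=150, width=20)
for lower, upper, freq in table:
    print(f"{lower} to <{upper}   {'|' * freq:<6} {freq}")
```

Choosing a wider or narrower interval changes how many rows the table has; the total of the frequency column always equals the number of data values.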
Example 2: Create a frequency table and histogram for the data given below.
Range: __
__ to_______
Flow Rate    Tally    Frequency
• We often use histograms to determine the distribution (shape) of data. When data is normally distributed, its frequency distribution has a bell shape: the intervals in the middle have higher frequencies than the intervals at either end. Many types of data naturally have this distribution; the heights, weights, and temperatures of living organisms are examples of data that are normally distributed.
Although data is often normally distributed, sometimes it is skewed to one side or has no distinct peak. Many other shapes are possible; the common named shapes are listed below.
• Skewed left (less data on the left)
• Skewed right (less data on the right)
• Uniform (no peak)
• Bimodal (two peaks)
Example 3: The magnitude of an earthquake is measured using the Richter scale. The higher the magnitude, the
more severe the earthquake is. Based on the histograms shown below, in what years did the most damage from
earthquakes occur?
5.3 Measures of Dispersion:
So far we have used one measure of dispersion, the range. The range does not describe how the data is distributed inside that interval, so the standard deviation is used for this purpose.
Deviation is the difference between a data value and the mean of the data set, written (x − μ), where x is the data value and μ is the mean. The deviation is negative for values below the mean.
Standard deviation is a measure of the dispersion, or scatter, of data values in relation to the mean.
A low standard deviation shows that most data is close to the mean, so the data is more consistent.
A high standard deviation shows that the data is scattered farther from the mean and the data is less consistent.
The standard deviation of a population is represented by the Greek letter σ (sigma) and is calculated with the formula below, where:
σ = standard deviation of a population
μ = mean of the population
Σ = "the sum of"
x = each data entry
n = number of entries
Steps for calculating the standard deviation:
1. Calculate the deviation from the mean, (x − μ), for each value.
2. Square each of the deviations obtained in step 1.
3. Find the sum of all the squared deviations.
4. Divide the sum by the number of data entries, n.
5. Take the square root of the result.
∑(๐‘ฅ − ๐œ‡)2
๐œŽ=√
๐‘›
Example 4: The values 170, 182, 192, 193, and 212 represent the heights of the players on a basketball team. Calculate the standard deviation of the team's heights.
μ = ______

x        (x − μ)        (x − μ)²
170
182
192
193
212
∑(๐‘ฅ − ๐œ‡)2 =
๐œŽ=
Example 5: The heights of the players on a different basketball team are 152, 154, 174, 180, and 220. Calculate the standard deviation of this team's heights.
μ = ______

x        (x − μ)        (x − μ)²
152
154
174
180
220
∑(๐‘ฅ − ๐œ‡)2 =
๐œŽ=
Q: Based on the standard deviation, how do the heights of the two teams compare in consistency?
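To compare the two teams, the same calculation can be run on both data sets. This sketch uses `statistics.pstdev`, the Python standard library's population standard deviation, which divides by n as in the formula for σ.

```python
from statistics import mean, pstdev

team_a = [170, 182, 192, 193, 212]  # Example 4 heights
team_b = [152, 154, 174, 180, 220]  # Example 5 heights

# pstdev is the population standard deviation: it divides the sum of squared deviations by n.
sigma_a = pstdev(team_a)
sigma_b = pstdev(team_b)

print(f"Example 4 team: mean = {mean(team_a):.1f}, sigma = {sigma_a:.2f}")  # mean = 189.8, sigma = 13.86
print(f"Example 5 team: mean = {mean(team_b):.1f}, sigma = {sigma_b:.2f}")  # mean = 176.0, sigma = 24.56
```

The Example 5 team has the larger standard deviation, so its heights are spread farther from the mean and are less consistent.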
Steps for Calculating Standard Deviation from a Frequency Table:
1. To calculate the mean, multiply each data value by its frequency. This gives the sum of all the data entries that have that value.
2. Add all of these products, then divide by the sum of the frequencies, which is the number of entries in the data set: μ = Σ(x × f) ÷ Σf.
3. To calculate the standard deviation, find the squared deviation (x − μ)² for each data value, then multiply it by that value's frequency to get f × (x − μ)².
4. Find the sum of all the f × (x − μ)² values, then divide by the sum of the frequencies.
5. Take the square root of the number from step 4.
6. These steps can be organized in a table.
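The steps above can be sketched in Python as follows. The function name `frequency_table_sd` is hypothetical; it returns the mean and the population standard deviation, and is applied here to the Example 6 table.

```python
import math

def frequency_table_sd(values, freqs):
    """Mean and population standard deviation from a frequency table."""
    n = sum(freqs)                                            # sum of frequencies = number of entries
    mu = sum(x * f for x, f in zip(values, freqs)) / n        # steps 1-2: sum of x*f, divided by n
    weighted = sum(f * (x - mu) ** 2 for x, f in zip(values, freqs))  # step 3, summed in step 4
    return mu, math.sqrt(weighted / n)                        # step 4: divide by n; step 5: square root

# Example 6: number of hits (x) and frequency (f)
hits = [0, 1, 2, 3, 4]
freq = [5, 10, 4, 3, 1]

mu, sigma = frequency_table_sd(hits, freq)
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")  # mu = 1.35, sigma = 1.09
```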
Example 6: Calculate the standard deviation from the frequency table below.
Number of Hits (x)   Frequency (f)   x × f   (x − μ)   (x − μ)²   f × (x − μ)²
0                    5
1                    10
2                    4
3                    3
4                    1
• It is also possible to calculate the standard deviation from a frequency table whose intervals are given as ranges rather than specific data values. Because the individual data values cannot be seen, the midpoint of each interval is treated as the data value for every entry in that interval, and the standard deviation is then calculated as above for a frequency table. The result is an estimate, because the actual values within each interval are unknown.
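A minimal sketch of the midpoint method, assuming each interval is given as a (lower, upper) pair; the function name `grouped_sd` is hypothetical. It is applied here to the Example 7 table.

```python
import math

def grouped_sd(intervals, freqs):
    """Estimate the mean and population standard deviation of grouped data by
    treating every entry in an interval as if it were the interval's midpoint."""
    mids = [(lo + hi) / 2 for lo, hi in intervals]  # midpoint stands in for each data value
    n = sum(freqs)
    mu = sum(x * f for x, f in zip(mids, freqs)) / n
    weighted = sum(f * (x - mu) ** 2 for x, f in zip(mids, freqs))
    return mu, math.sqrt(weighted / n)

# Example 7: hours per week spent playing video games (grade 11 females)
intervals = [(3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
freqs = [7, 11, 16, 19, 12]

mu, sigma = grouped_sd(intervals, freqs)
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")  # mu = 8.55, sigma = 2.49
```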
Example 7: Angelo conducts a survey to determine the number of hours per week that grade 11 females in her school play video games. She obtained the following data. Calculate the mean and the standard deviation for the data in the table below.
Hours per Week   Frequency (f)   Midpoint (x)   x × f   (x − μ)   (x − μ)²   f × (x − μ)²
3-5              7
5-7              11
7-9              16
9-11             19
11-13            12
Janice conducted the same survey for grade 11 males and found that the mean was μ = 12.84 and the standard deviation was σ = 2.16.
Which group has the greater mean? Which group has the greater standard deviation?
Homework:
#1, 2, page 211
#2, 3, 7, 11, page 221
#1 a and c, 2, 4, 6, 9 a, 13, page 233