Uploaded by Ryan Seipp

Numerical Data Measures: Mean, Median, Standard Deviation

DESCRIBING DATA: NUMERICAL MEASURES
Dr. Ashish Chandra
College of Business
Chapter Three
Objective 1
Compute and interpret the mean, the median, and the mode
Ways to Measure the central location of Data
A measure of location is a value used to describe the central
tendency of a set of data
Common measures of location:
• Mean
• Median
• Mode

The mean is the most widely reported measure of location
Measure the central location of Data - via MEAN
Mean: Two ways to compute
(1) Population Mean
(2) Sample Mean
(1) Population Mean – Property of the population (Average of all data points)
Definition – PARAMETER: Any characteristic of a population. (Population Mean is a parameter)
Question: What type of statistics are we doing when we compute the population mean? (Recall Chapter 1)
Answer: Descriptive Statistics
Example: Population Mean
There are 42 exits on I-75 through the state of Kentucky. Listed below are the distances
between exits (in miles).
1. Why is this information a population?
The table has 42 data points and covers every exit on I-75 in Kentucky, so it is the entire population.
2. What is the mean number of miles between exits?
Interpretation: 4.57 miles is the typical (average) distance between exits. (Population parameter)
Sample Mean
STATISTIC - A characteristic of a sample.
Any measurable characteristic/property of a sample is called a statistic
Sample Mean is a statistic
Question: What type of statistics are we doing when we compute the sample mean? (Recall Chapter 1)
Answer: Inferential Statistics
Example: Sample Mean
What is the sample mean number of hours of mobile usage?
Properties of the Population Mean
All the data values are used in the calculation of population mean
The population mean is unique
The sum of the deviations from the mean (population or sample) is zero
• Weakness: The mean is affected by extreme values.
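The last two properties can be checked numerically. The sketch below borrows the Facebook-hours sample that appears in the median example of these slides; the extreme value of 200 is a hypothetical addition, used only to show how strongly one outlier can pull the mean.

```python
from statistics import mean

# Facebook-hours sample taken from the median example in these slides.
hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]

m = mean(hours)                        # arithmetic mean of the sample
deviations = [x - m for x in hours]    # signed distance of each value from the mean
total_deviation = sum(deviations)      # zero, up to floating-point rounding

# Hypothetical extreme value (not in the slides) to show the weakness of the mean.
with_outlier = hours + [200]
m_outlier = mean(with_outlier)

print("mean:", m)
print("sum of deviations:", total_deviation)
print("mean after adding 200:", m_outlier)
```

Because of floating-point arithmetic, the printed sum of deviations may be a tiny number rather than exactly zero.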
The Median
MEDIAN: The midpoint of the values after they have been ordered from the
minimum to the maximum values or from the maximum to the minimum values.
• The value in the middle of a set of ordered data
• Describes the center of data sets that contain one or just a few extreme values
• With an odd number of data points, the median is the single middle value
Finding the Median

To find the median of a data set with an even number of observations, sort the
observations and calculate the average of the two middle values.
The number of hours a sample of 10 adults used Facebook last month:
3 5 7 5 9 1 3 9 17 10
Arranging the data in ascending order gives:
1 3 3 5 5 7 9 9 10 17
The two middle values are 5 and 7, so the median is (5 + 7) / 2 = 6.
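The same procedure can be sketched in Python. The helper name median_by_hand is hypothetical and is shown only to mirror the sort-and-average steps; the standard library's statistics.median gives the same answer.

```python
from statistics import median

# Facebook-hours sample from the slide.
hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]

def median_by_hand(values):
    """Sort the values, then take the middle one (odd n) or average the two middle ones (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_hand(hours))  # 6.0
print(median(hours))          # 6.0
```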
Characteristics of the Median

The median is the value in the middle of a set of ordered data

It is not influenced by extreme values
• In cases where the mean is not representative of the data, use the median.

About fifty percent of the observations are larger than the median and about fifty percent are smaller

It is unique to a set of data
The Mode
MODE The value of the observation that occurs most frequently.

Given Data: 1 3 3 5 5 7 9 9 10 17

Construct frequency table

Identify the entry/entries with the maximum appearance

A set of data can have more than one mode

A set of data could have no mode, for example: 1 3 5 7 9 10 17

Given data (1 3 3 5 5 7 9 9 10 17):
Entry    Frequency
1        1
3        2
5        2
7        1
9        2
10       1
17       1
More than one mode: 3, 5, 9

Data with no repeats (1 3 5 7 9 10 17):
Entry    Frequency
1        1
3        1
5        1
7        1
9        1
10       1
17       1
NO mode, as no entry is repeated

Sports data:
Entry       Frequency
Tennis      6
Soccer      7
Baseball    3
Mode: Soccer
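The frequency-table procedure can be sketched with collections.Counter. The function name modes is hypothetical, and returning an empty list when nothing repeats is an assumption made to match the "no mode" convention used in these slides (the standard library's statistics.multimode would instead return every value in that case).

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency, or [] if no value repeats."""
    counts = Counter(values)          # frequency table: value -> count
    top = max(counts.values())        # assumes values is non-empty
    if top == 1:
        return []                     # slide convention: no repeated entry means no mode
    return [v for v, c in counts.items() if c == top]

print(modes([1, 3, 3, 5, 5, 7, 9, 9, 10, 17]))  # [3, 5, 9]
print(modes([1, 3, 5, 7, 9, 10, 17]))           # []

# For data that is already a frequency table, pick the entry with the largest count.
sports = {"Tennis": 6, "Soccer": 7, "Baseball": 3}
print(max(sports, key=sports.get))              # Soccer
```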
Objective 2
• We studied the mean, median, and mode: measures of the central tendency of the data set
• We will study dispersion: measures of the variation or spread in the data set
Why study Dispersion?
• Dispersion: Measures the variation or spread in the data set
• The mean in both cases is 50
• In case one, the data are LESS spread out than in case two
• Less spread = data are closely clustered around the mean
(Hence, the mean is a good representative of the data)
• Large spread = data points are more scattered
(Hence, the mean may not be a good representative of the data)
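A small numerical illustration of the same idea follows. The two data sets are hypothetical, chosen only so that both means equal 50; they are not the data behind the original slide figure.

```python
from statistics import mean

# Hypothetical data sets, chosen only so that both means equal 50.
case_one = [48, 49, 50, 51, 52]   # closely clustered around the mean
case_two = [10, 30, 50, 70, 90]   # widely scattered

for name, data in (("case one", case_one), ("case two", case_two)):
    m = mean(data)
    # Largest distance of any observation from the mean.
    farthest = max(abs(x - m) for x in data)
    print(name, "mean:", m, "farthest point from mean:", farthest)
```

Both means are identical, yet the observations in the second set sit much farther from 50, so the mean alone says little about any single value there.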
• Ways to measure dispersion
1) Range
2) Variance
3) Standard Deviation
1) Range = Maximum value – Minimum value
• Only 2 data values are used for its computation
• Super quick to compute
• Takes only the extreme values into account
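As a quick sketch, the range of the Facebook-hours sample from the median example can be computed as below. The second data set is hypothetical and shows that very different data can share the same range, since only the two extremes matter.

```python
def value_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]   # sample from the slides
same_extremes = [1, 1, 1, 1, 17]           # hypothetical data with the same min and max

print(value_range(hours))          # 16
print(value_range(same_extremes))  # 16
```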
Variance: Population Variance
Population Variance ($\sigma^2$):
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
$\sigma^2$ = Population variance ($\sigma$ is the Greek letter “sigma”). Read as “sigma raised to power 2” (sigma squared)
x = Value of observations in the population
𝜇 = Population mean
N = Number of observations in the population
Properties of Population variance
• All observations are used in the calculation
• Unit of population variance: think about the units of the mean, median, mode, and range
Understanding unit of population variance
Unit of Mean, Median, Mode, and Range : centimeters
Unit of Variance: centimeters squared ($\mathrm{cm}^2$)
Example: $\frac{\sum (100\,\mathrm{cm} - 50\,\mathrm{cm})^2}{100{,}000}$ is measured in $\mathrm{cm}^2$
Computing Population Variance
The number of traffic citations issued last year, by month, in Normal is reported below. What is the population variance?
Step 1: Compute the population mean
Step 2: Compute the following table
Step 3: Sum the squared deviations and divide by N
Population variance for the number of citations is 124
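The three steps can be sketched in Python. The citation counts from the slide's table are not reproduced in this text, so the data below are hypothetical values chosen to keep the arithmetic easy to follow; they do not yield the variance of 124 quoted above.

```python
# Hypothetical population (NOT the citation data from the slide).
data = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(data)

# Step 1: compute the population mean.
mu = sum(data) / N

# Step 2: build the table of squared deviations (x - mu)^2.
squared_devs = [(x - mu) ** 2 for x in data]

# Step 3: sum the squared deviations and divide by N.
sigma_sq = sum(squared_devs) / N

print("mean:", mu)                        # 5.0
print("squared deviations:", squared_devs)
print("population variance:", sigma_sq)   # 4.0
```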
Population Standard Deviation (σ)
Population Variance ($\sigma^2$): $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Population Standard Deviation ($\sigma$): $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Important points to Note:
• Unit of Population Variance ($\sigma^2$): centimeters squared
• Unit of Population Standard Deviation ($\sigma$): centimeters
• Population Standard Deviation and Population Variance cannot be negative
• Population Variance: Average squared distance of the observations from the population mean μ
• Population Standard Deviation: Square root of the average squared distance of the observations from μ
Population vs Sample
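Population Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Population Standard Deviation ($\sigma$): $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation (s): $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
$\mu$: Population Mean
$\bar{x}$: Sample Mean obtained from the Sample
N: Number of observations in the population
n: Number of observations in the Sample
x (in the population formulas): Observations from the POPULATION
x (in the sample formulas): Observations from the chosen SAMPLE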
Units of Sample Variance & Population Variance?
Units of Sample Standard Deviation & Population Standard Deviation?
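To compare the two sets of formulas side by side, Python's statistics module offers both versions: pvariance and pstdev divide by N, while variance and stdev divide by n − 1. The sketch reuses the Facebook-hours data from the median example. The slides describe those data as a sample, so treating them as a population here is an assumption made purely for comparison.

```python
from statistics import pvariance, pstdev, variance, stdev

# Facebook-hours data from the slides (described there as a sample of 10 adults).
hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]

pop_var = pvariance(hours)   # divides by N      (population formula)
pop_sd = pstdev(hours)       # square root of pop_var
samp_var = variance(hours)   # divides by n - 1  (sample formula)
samp_sd = stdev(hours)       # square root of samp_var

print("population variance:", pop_var, "population sd:", pop_sd)
print("sample variance:    ", samp_var, "sample sd:    ", samp_sd)
```

The sample versions come out slightly larger because the same sum of squared deviations is divided by n − 1 instead of N. On the units questions: both variances are in squared units (hours squared here) and both standard deviations are in the original unit (hours).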
Interpretation of Standard Deviation
THE EMPIRICAL RULE (Sample or Population): ONLY for a symmetrical, bell-shaped frequency distribution, approximately
68% of the observations (sample / population) will lie within plus and minus one standard deviation of the mean, about 95%
of the observations will lie within plus or minus 2 standard deviations of the mean, and practically all (99.7%) will lie within 3
standard deviations of the mean.
• Symmetrical
• Bell Shaped
Say Mean (sample/ population) = 100
Standard Deviation = 10
• 68% of the observations lie in
[Mean – St. Deviation, Mean + St. Deviation]
= [100 – 10, 100 + 10] = [90, 110]
• 95% of the observations lie in
[Mean – 2 x St. Deviation, Mean + 2 x St. Deviation]
= [100 – 2 x 10, 100 + 2 x 10] = [80, 120]
• 99.7% of the observations lie in
[Mean – 3 x St. Deviation, Mean + 3 x St. Deviation]
= [100 – 3 x 10, 100 + 3 x 10] = [70, 130]
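The interval arithmetic above can be sketched as a small helper. The function names are hypothetical. The second function uses an exact normal distribution as a stand-in for "symmetrical, bell-shaped", which is an assumption; it shows where the approximate 68%, 95%, and 99.7% figures come from.

```python
from statistics import NormalDist

def empirical_intervals(mean, sd):
    """Return the mean +/- k*sd intervals for k = 1, 2, 3, keyed by the rule's percentage."""
    labels = {1: "68%", 2: "95%", 3: "99.7%"}
    return {label: (mean - k * sd, mean + k * sd) for k, label in labels.items()}

intervals = empirical_intervals(100, 10)
for label, (low, high) in intervals.items():
    print(label, "of observations lie in", [low, high])

def normal_share_within(k):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, "sd:", round(normal_share_within(k), 4))
```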
How is the above analysis useful if you have a symmetrical, bell-shaped frequency distribution
with 10 billion observations?
Interpretation of Standard Deviation
What should we do when we have a NON-symmetrical, NON-bell-shaped frequency distribution?
Example: Dupree Paint Company employees contribute a mean of $51.54 to the company’s profit-sharing plan
every two weeks. The standard deviation of biweekly contributions is $7.51. At least what percent of the
contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean, that is,
between $25.26 and $77.83?
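A question of this "at least what percent" form is usually answered with Chebyshev's theorem, which the slide does not name explicitly: for any distribution, at least 1 − 1/k² of the observations lie within k standard deviations of the mean, for k > 1. A minimal sketch under that assumption, with a hypothetical function name:

```python
def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

mean_contribution = 51.54   # mean biweekly contribution in dollars, from the slide
sd = 7.51                   # standard deviation in dollars, from the slide
k = 3.5                     # number of standard deviations, from the slide

low = mean_contribution - k * sd    # lower end of the interval, about 25.26
high = mean_contribution + k * sd   # upper end of the interval, about 77.83
bound = chebyshev_lower_bound(k)

print("interval:", low, "to", high)
print("at least", round(bound * 100, 1), "percent of contributions")  # at least 91.8 percent
```

So at least about 92% of the contributions fall between $25.26 and $77.83, whatever the shape of the distribution.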