Distribution of a Data Set

DAY 3
14 Jan 2014
Today is
A. January 14, 2014
B. January 13, 2013
Recap
• Organizing Data
• Qualitative & Quantitative Data
• Frequency distribution & relative frequency
distribution
• Single-value grouping, Limit grouping, Cut point
grouping
• Histogram, Dotplots, Stem-and-leaf diagrams
Objective of the day:
• Distribution shapes
• Descriptive Measures => Central Measures => Mean,
Median, Mode.
• Measures of Variation => Standard Deviation
Section 2.4
Distribution Shapes
Definition 2.10
Distribution of a Data Set
The distribution of a data set is a table, graph, or
formula that provides the values of the observations and
how often they occur.
Relative-frequency histogram and approximating smooth curve for
the distribution of heights
Common distribution shapes
Example:
Relative-frequency histogram for household size
Identify the shape of the distribution.
Definition 2.12
Population and Sample Distributions; Distribution of a Variable
The distribution of population data is called the population
distribution, or the distribution of the variable.
The distribution of sample data is called a sample distribution.
Population distribution and six sample distributions for household size
Key Facts: Population and Sample Distributions
For a simple random sample, the sample distribution
approximates the population distribution (i.e., the
distribution of the variable under consideration). The
larger the sample size, the better the approximation
tends to be.
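The key fact above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the household-size proportions in `POPULATION` are hypothetical and are not taken from the slides, and sampling is done with replacement via `random.Random.choices`, which only approximates a simple random sample from a very large population. The helper names `sample_distribution` and `max_error` are made up for this illustration.

```python
import random
from collections import Counter

# Hypothetical population distribution of household size
# (size -> relative frequency). Illustrative values only.
POPULATION = {1: 0.27, 2: 0.34, 3: 0.16, 4: 0.14, 5: 0.06, 6: 0.02, 7: 0.01}


def sample_distribution(n, rng):
    """Relative-frequency distribution of a random sample of size n."""
    sizes = list(POPULATION)
    weights = list(POPULATION.values())
    sample = rng.choices(sizes, weights=weights, k=n)  # sampling with replacement
    counts = Counter(sample)
    return {size: counts[size] / n for size in sizes}


def max_error(n, rng):
    """Largest gap between sample and population relative frequencies."""
    dist = sample_distribution(n, rng)
    return max(abs(dist[size] - POPULATION[size]) for size in POPULATION)


rng = random.Random(2014)  # fixed seed so the run is repeatable
err_small = max_error(20, rng)
err_large = max_error(100_000, rng)
print("n = 20:", err_small)
print("n = 100000:", err_large)
```

With these settings the larger sample should track the population distribution much more closely than the sample of 20, which is the tendency the key fact describes.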
Chapter 3
Descriptive Measures
Descriptive Measures
Numbers that describe a data set.
Section 3.1
Measures of Center
Measure of Center
Descriptive measures that indicate where the center or most typical
value of a data set lies are called measures of central tendency or
measures of center.
Three most important measures of center:
1. Mean
2. Median
3. Mode
Definition 3.1
Mean of a Data Set
The mean of a data set is the sum of the observations
divided by the number of observations.
mean = sum of the observations / the number of observations.
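As a quick check of Definition 3.1, here is a minimal Python sketch. It uses the values of Data Set I as they are listed in the median and mode examples of these slides; the function name `mean` is just a label chosen for this illustration.

```python
# Data Set I, as listed in the median and mode examples of these slides.
data_set_1 = [300, 300, 300, 300, 300, 300, 400, 400, 450, 450, 800, 940, 1050]


def mean(observations):
    """Sum of the observations divided by the number of observations."""
    return sum(observations) / len(observations)


print(round(mean(data_set_1), 2))  # 6290 / 13, about 483.85
```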
Example:
Data Set I
Data Set II
Means in Data Set I and Data Set II
Definition 3.2
Median of a Data Set
Arrange the data in increasing order.
• If the number of observations is odd, then the median is
the observation exactly in the middle of the ordered list.
• If the number of observations is even, then the median is
the mean of the two middle observations in the ordered list.
In both cases, if we let n denote the number of observations,
then the median is at position (n + 1) / 2 in the ordered list.
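The two cases of Definition 3.2 can be sketched in Python as follows. This is an illustrative implementation, not code from the course; the function name `median` is an assumption. Note that position (n + 1) / 2 counts from 1, while Python lists count from 0.

```python
def median(observations):
    """Median by the position rule (n + 1) / 2 on the ordered list."""
    ordered = sorted(observations)  # arrange the data in increasing order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a whole number (index n // 2 from 0).
        return ordered[middle]
    # Even n: position (n + 1) / 2 falls halfway between the two
    # middle observations, so take their mean.
    return (ordered[middle - 1] + ordered[middle]) / 2


data_set_1 = [300, 300, 300, 300, 300, 300, 400, 400, 450, 450, 800, 940, 1050]
print(median(data_set_1))    # n = 13, position 7 -> 400
print(median([1, 2, 3, 4]))  # n = 4, position 2.5 -> 2.5
```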
Definition 3.3
Mode of a Data Set
Find the frequency of each value in the data set.
• If no value occurs more than once, then the data set has
no mode.
• Otherwise, any value that occurs with the greatest
frequency is a mode of the data set.
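Definition 3.3 translates into a short frequency count. The sketch below is one possible reading of the definition; the function name `modes` is an assumption, and it returns a list because a data set can have more than one mode, or none.

```python
from collections import Counter


def modes(observations):
    """All values with the greatest frequency; empty if nothing repeats."""
    counts = Counter(observations)  # frequency of each value
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # No value occurs more than once: the data set has no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)


data_set_1 = [300, 300, 300, 300, 300, 300, 400, 400, 450, 450, 800, 940, 1050]
print(modes(data_set_1))       # [300]
print(modes([1, 2, 3]))        # []
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```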
Example:
Data Set I
Median in Data Set I
Data Set I
300 300 300 300 300 300 400 400 450 450 800 940 1050
Median is at the position (n+1)/2 = (13+1)/2 = 7
Median = ?
Example:
Data Set I
Median in Data Set I
Data Set I
300 300 300 300 300 300 400 400 450 450 800 940 1050
Median is at the position (n+1)/2 = (13+1)/2 = 7
Median = 400
Example:
Data Set I
Mode in Data Set I
Data Set I
300 300 300 300 300 300 400 400 450 450 800 940 1050
Mode = ?
Example:
Data Set I
Mode in Data Set I
Data Set I
300 300 300 300 300 300 400 400 450 450 800 940 1050
Mode = 300
Example:
Data Set I
Data Set II
Mean, Median, and Mode in Data Set I and Data Set II
Definition 3.4
COMPARISON OF MEAN, MEDIAN, MODE:
1. Note that the mean is pulled in the direction of the skewness, i.e.
in the direction of the extreme observation. The mean is sensitive
to extreme observations (very large or very small in comparison to
the rest of the data). The mean is not a resistant measure of
center.
2. The median is not pulled into the direction of the most extreme
observations. The median is not sensitive to extremes, i.e. the
median is a resistant measure of center.
3. When the data are skewed, the median is therefore the preferred
measure of center.
4. The mode may not be near the center and thus may not be useful as a
measure of center.
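Points 1 and 2 can be checked numerically. In this sketch, Data Set I is taken from the slides, while the modified data set `with_outlier` is a made-up variation in which the largest observation is replaced by a much larger one, purely for illustration.

```python
from statistics import mean, median

data_set_1 = [300, 300, 300, 300, 300, 300, 400, 400, 450, 450, 800, 940, 1050]

# Hypothetical variation: replace the largest observation (1050) with 10500.
with_outlier = data_set_1[:-1] + [10500]

print(round(mean(data_set_1), 2), median(data_set_1))      # 483.85 400
print(round(mean(with_outlier), 2), median(with_outlier))  # 1210.77 400
```

The mean more than doubles because of a single extreme observation, while the median does not move.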
Relative positions of the mean and median for
(a) right-skewed, (b) symmetric, and (c) left-skewed distributions
Section 3.2
Measures of Variation
Example:
Five starting players on two basketball teams
Example:
Shortest and tallest starting players on the teams
Definition 3.5
Range of a Data Set
The range of a data set is given by the formula
Range = Max – Min,
where Max and Min denote the maximum and minimum
observations, respectively.
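A minimal sketch of Definition 3.5 in Python. The basketball heights are not listed in these slides, so the example reuses Data Set I from Section 3.1; the function name `data_range` is an assumption chosen to avoid shadowing Python's built-in `range`.

```python
def data_range(observations):
    """Range = Max - Min."""
    return max(observations) - min(observations)


data_set_1 = [300, 300, 300, 300, 300, 300, 400, 400, 450, 450, 800, 940, 1050]
print(data_range(data_set_1))  # 1050 - 300 = 750
```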
∑
$\sum_{N=1}^{10} N = ?$
$\sum_{N=1}^{10} N = 1+2+3+4+5+6+7+8+9+10$
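The sum of N for N = 1 to 10 can be evaluated directly. A one-line Python check, where `range(1, 11)` yields the integers 1 through 10:

```python
total = sum(range(1, 11))  # 1 + 2 + ... + 10
print(total)  # 55
```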
Definition 3.6
Example:
Five starting players on basketball Team I.
Example:
Five starting players on basketball Team II
Example:
Five starting players on basketball Team I
Formula
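A minimal sketch, assuming the formula meant here is the sample standard deviation, s = sqrt( Σ(x − x̄)² / (n − 1) ), which is the measure named in the objectives. The heights below are illustrative values in inches and are not taken from the slides; the function name `sample_std_dev` is likewise an assumption.

```python
import math
import statistics

# Illustrative heights (inches) for five players; not taken from the slides.
heights = [72, 73, 76, 76, 78]


def sample_std_dev(observations):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(observations)
    x_bar = sum(observations) / n  # sample mean
    squared_deviations = sum((x - x_bar) ** 2 for x in observations)
    return math.sqrt(squared_deviations / (n - 1))


s = sample_std_dev(heights)
print(round(s, 3))                          # sqrt(24 / 4) = sqrt(6), about 2.449
print(round(statistics.stdev(heights), 3))  # standard library cross-check
```

The standard library's `statistics.stdev` uses the same n − 1 divisor, so the two printed values should agree.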
Summary:
• Distribution shapes
• Descriptive Measures => Central Measures => Mean,
Median, Mode.
• Measures of Variation => Standard Deviation
Next ...
• Lab: Finish Section 2.3 and Quiz 1 (1.1-2.3)
• Sections: 3.3 & 3.4
Thank You
