Day 3, 14 Jan 2014

Today is:
A. January 14, 2014
B. January 13, 2013

Recap
• Organizing data
• Qualitative and quantitative data
• Frequency distribution and relative-frequency distribution
• Single-value grouping, limit grouping, cutpoint grouping
• Histograms, dotplots, stem-and-leaf diagrams

Objectives of the day:
• Distribution shapes
• Descriptive measures => Measures of center => Mean, median, mode
• Measures of variation => Standard deviation

Section 2.4 Distribution Shapes

Definition 2.10 Distribution of a Data Set
The distribution of a data set is a table, graph, or formula that provides the values of the observations and how often they occur.

[Figure: Relative-frequency histogram and approximating smooth curve for the distribution of heights]

[Figure: Common distribution shapes]

Example: Relative-frequency histogram for household size. Identify the shape of the distribution.

Definition 2.12 Population and Sample Distributions; Distribution of a Variable
The distribution of population data is called the population distribution, or the distribution of the variable. The distribution of sample data is called a sample distribution.

[Figure: Population distribution and six sample distributions for household size]

Key Facts: Population and Sample Distributions
For a simple random sample, the sample distribution approximates the population distribution (i.e., the distribution of the variable under consideration). The larger the sample size, the better the approximation tends to be.

Chapter 3 Descriptive Measures
Descriptive measures are numbers that describe a data set.

Section 3.1 Measures of Center
Descriptive measures that indicate where the center or most typical value of a data set lies are called measures of central tendency, or measures of center. The three most important measures of center are:
1. Mean
2. Median
3. Mode
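A minimal Python sketch of the three measures listed above, following Definitions 3.1–3.3 and using the Data Set I values from the examples in this section. The helper names `mean`, `median`, and `modes` are hypothetical. Python's standard `statistics` module also provides `mean`, `median`, and `multimode`, but `multimode` returns every value when no value repeats, whereas Definition 3.3 says such a data set has no mode; the sketch handles that case explicitly. It assumes a non-empty list of numbers.

```python
from collections import Counter

def mean(data):
    """Sum of the observations divided by the number of observations (Definition 3.1)."""
    return sum(data) / len(data)

def median(data):
    """Middle observation of the ordered list, or the mean of the two middle ones (Definition 3.2)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 in 1-based counting is index n // 2 here.
        return ordered[mid]
    # Even n: average the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(data):
    """All values with the greatest frequency; empty list if no value occurs more than once (Definition 3.3)."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]

data_set_1 = [300, 300, 300, 300, 300, 300, 400, 400, 450, 450, 800, 940, 1050]

print(round(mean(data_set_1), 2))  # 483.85
print(median(data_set_1))          # 400
print(modes(data_set_1))           # [300]
```

The mean (about 483.85) sits well above the median (400) because the few large observations (800, 940, 1050) pull it to the right, which is the behavior described in the comparison of the three measures.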
Definition 3.1 Mean of a Data Set
The mean of a data set is the sum of the observations divided by the number of observations:
Mean = (sum of the observations) / (number of observations)

Example: Data Set I and Data Set II
[Figure: Means in Data Set I and Data Set II]

Definition 3.2 Median of a Data Set
Arrange the data in increasing order.
• If the number of observations is odd, then the median is the observation exactly in the middle of the ordered list.
• If the number of observations is even, then the median is the mean of the two middle observations in the ordered list.
In both cases, if we let n denote the number of observations, then the median is at position (n + 1)/2 in the ordered list. When n is even, this position falls halfway between the two middle observations.

Definition 3.3 Mode of a Data Set
Find the frequency of each value in the data set.
• If no value occurs more than once, then the data set has no mode.
• Otherwise, any value that occurs with the greatest frequency is a mode of the data set.

Example: Median in Data Set I
Data Set I (ordered): 300 300 300 300 300 300 400 400 450 450 800 940 1050
The median is at position (n + 1)/2 = (13 + 1)/2 = 7.
Median = 400

Example: Mode in Data Set I
Data Set I (ordered): 300 300 300 300 300 300 400 400 450 450 800 940 1050
Mode = 300 (it occurs six times, more often than any other value)

Example: Data Set I and Data Set II
[Figure: Mean, median, and mode in Data Set I and Data Set II]

Definition 3.4 Comparison of Mean, Median, and Mode
1. The mean is pulled in the direction of the skewness, i.e., toward the extreme observations. The mean is sensitive to extreme observations (very large or very small in comparison to the rest of the data), so the mean is not a resistant measure of center.
2. The median is not pulled in the direction of the most extreme observations. The median is not sensitive to extremes; that is, the median is a resistant measure of center.
3. Therefore, when the data are skewed, the median is the preferred measure of center.
4. The mode may not be near the center and thus may not be useful as a measure of center.

[Figure: Relative positions of the mean and median for (a) right-skewed, (b) symmetric, and (c) left-skewed distributions]

Section 3.2 Measures of Variation
Example: Five starting players on two basketball teams
Example: Shortest and tallest starting players on the teams

Definition 3.5 Range of a Data Set
The range of a data set is given by the formula Range = Max – Min, where Max and Min denote the maximum and minimum observations, respectively.

Summation notation:
$\sum_{N=1}^{10} N = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55$

Definition 3.6

Example: Five starting players on basketball Team I
Example: Five starting players on basketball Team II
Example: Five starting players on basketball Team I

Summary:
• Distribution shapes
• Descriptive measures => Measures of center => Mean, median, mode
• Measures of variation => Standard deviation

Next:
• Lab: Finish Section 2.3 and Quiz 1 (Sections 1.1-2.3)
• Sections 3.3 and 3.4

Thank You
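A minimal Python sketch of the measures of variation from Section 3.2: the range (Definition 3.5), the summation example, and the standard deviation named in the objectives. The text of Definition 3.6 and the basketball players' heights did not survive in these notes, so this sketch ASSUMES the usual sample standard deviation s = sqrt(Σ(x − x̄)² / (n − 1)) and reuses Data Set I from Section 3.1 instead of the team data. The helper names `data_range` and `sample_std` are hypothetical.

```python
import math

# Data Set I from Section 3.1 (the basketball team heights are not reproduced here).
data_set_1 = [300, 300, 300, 300, 300, 300, 400, 400, 450, 450, 800, 940, 1050]

def data_range(data):
    """Range = Max - Min (Definition 3.5)."""
    return max(data) - min(data)

def sample_std(data):
    """Sample standard deviation with divisor n - 1.

    ASSUMPTION: Definition 3.6 is missing from these notes; this uses the
    common sample formula s = sqrt(sum((x - x_bar)**2) / (n - 1)).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(data) / n                              # sample mean
    squared_deviations = [(x - x_bar) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / (n - 1))

# Summation notation: the sum of N for N = 1, ..., 10.
summation_example = sum(n for n in range(1, 11))

print(summation_example)                  # 55
print(data_range(data_set_1))             # 750
print(round(sample_std(data_set_1), 2))   # 265.79
```

The range uses only the two most extreme observations, while the standard deviation uses every observation's distance from the mean; that difference is why two teams with the same range can still differ in variation.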