Education 793 Class Notes
Descriptive Statistics, Central Tendency and Variability
10 September 2003

Today’s agenda
• Lab details finalized
• Your announcements
• Chapter 3: Frequency Distributions
• Chapter 4: Central Tendency and Variability

Data Matrix
A data matrix is a matrix in which subjects are listed in the rows and the variables measured on each subject are listed in the columns.

Example
Case  Sex  Age  Race
1     m    12   white
2     f    11   asian
3     f    13   african american
4     m    13   white
5     m    12   asian
6     f    11   white

Basic Terms and Concepts
• Frequency Distribution: Orders values from lowest to highest and gives the number or percent of subjects with each value.
• Can be presented as:
  • table
  • histogram
  • frequency polygon
  • stem and leaf plot

Tables and Polygons
[Figure]

Histograms and Bar Charts
[Figure]

Stem and Leaf
[Figure]

Shape of Frequency Distributions
• Symmetric: the two halves of the distribution mirror each other

Modality
• Modality: the number of relative peaks the distribution exhibits
[Figure panels: Unimodal, Bimodal, Rectangular, Multimodal]

Skew
[Figure panels: A. Left (Negative) Skew; B. Right (Positive) Skew]

Kurtosis (Peakedness)
[Figure panels: High Peak, Long Tails; Flat Peak, No Tails]

Central Tendency and Variability
1) Central tendency
   Mode
   Median
   Mean
2) Variation
   Range
   Semi-interquartile Range
   Variance
   Standard deviation

Central Tendency
Mode: Most frequently occurring score in a distribution
Median: Point on the distribution below which one-half (50%) of the scores fall
Mean: Arithmetic average of the scores within a distribution
  Special properties:
  • The sum of the deviations of the scores from the mean is zero
  • The sum of the squared deviations about the mean is at a minimum (smaller than about any other value)

Mean, Median, Mode
A distribution of GRE scores:
  340  450  510  580  580  600  620  660  670  710
  Mode: 580   Median: 590   Mean: 572

A second distribution of GRE scores:
  340  450  510  580  580  600  620  660  670  1710*
  Mode: 580   Median: 590   Mean: 672
  *Heidi’s score

Box Plots
[Figure]

Shapes and Statistics
[Figure panels: A., B.]
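The GRE example from the Mean, Median, Mode slide can be checked with a few lines of Python. This is only a sketch: it assumes the scores on the slide form a single list of ten values per distribution, and the names `summarize` and `with_outlier` are illustrative, not part of the course materials. It also checks the first special property of the mean, that deviations from it sum to zero.

```python
from statistics import mean, median, mode

# GRE scores from the first distribution on the slide (assumed reading of the layout)
scores = [340, 450, 510, 580, 580, 600, 620, 660, 670, 710]

# Second distribution: the top score 710 is replaced by the extreme score 1710
with_outlier = scores[:-1] + [1710]


def summarize(xs):
    """Return (mode, median, mean) for a list of scores."""
    return mode(xs), median(xs), mean(xs)


print("original:    ", summarize(scores))
print("with outlier:", summarize(with_outlier))

# Special property of the mean: deviations from it sum to zero
m = mean(scores)
print("sum of deviations:", sum(x - m for x in scores))
```

Comparing the two summaries shows the point of the slide: one extreme score moves the mean by 100 points while the mode and median stay where they were.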

Variability
Identifying the middle of a distribution with a measure of central tendency is of limited value unless one also knows how much the scores in the distribution differ from each other. Measures of how much the scores differ from each other should therefore always accompany measures of central tendency. These are called measures of "spread" or variability.

Basic Measures
• Range
  The range is the simplest measure of spread: it is equal to the difference between the largest and the smallest values. It is useful mainly because it is so easily understood. Because it is based on only two values, however, it is very sensitive to extreme scores, and it is otherwise seldom used in statistical practice.
• The semi-interquartile range
  Computed as one half the difference between the 75th percentile (often called Q3) and the 25th percentile (Q1), or (Q3 - Q1) / 2. Because half the scores in a distribution lie between Q1 and Q3, the semi-interquartile range is 1/2 the distance needed to cover 1/2 the scores. In a symmetric distribution, an interval stretching from one semi-interquartile range below the median to one semi-interquartile range above the median will contain 1/2 of the scores. This need not be true for a skewed distribution, however.

Variance and Standard Deviation
The variance is a widely used measure of spread. It is based on the squared deviation of each score from the mean. For a sample in which $\bar{x}$ is the mean and $n$ is the number of scores, the sum of the squared deviations is divided by $n - 1$:

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$

The standard deviation is the square root of the variance. It is the most commonly used measure of spread. An important attribute of the standard deviation is that, when scores are normally distributed, the percentile rank of any given score can be computed from the mean and the standard deviation alone.
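The measures of spread defined above can be sketched in Python. The data are the nine scores used in the hand calculation that follows. The helper names are illustrative, and the quartiles come from `statistics.quantiles` with its default "exclusive" method, which is an assumption: textbooks and software use several slightly different quartile conventions, so Q1 and Q3 may differ a little from a hand calculation.

```python
import math
from statistics import mean, quantiles, stdev, variance

# The nine scores from the hand-calculation example
x = [12, 17, 22, 14, 12, 19, 13, 15, 11]


def value_range(xs):
    """Difference between the largest and smallest values."""
    return max(xs) - min(xs)


def semi_interquartile_range(xs):
    """(Q3 - Q1) / 2, using the default quartile convention of statistics.quantiles."""
    q1, _, q3 = quantiles(xs, n=4)
    return (q3 - q1) / 2


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)


print("range:   ", value_range(x))
print("SIQR:    ", semi_interquartile_range(x))
print("variance:", sample_variance(x))
print("std dev: ", math.sqrt(sample_variance(x)))
```

The hand-written `sample_variance` follows the formula on the slide; the library functions `variance` and `stdev` are imported so that the result can be compared with them.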
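
Calculating Standard Deviation

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$

Or, to calculate by hand, there is a shortcut formula:

$$ s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} $$

Numbers
   X    X^2
  12    144
  17    289
  22    484
  14    196
  12    144
  19    361
  13    169
  15    225
  11    121
 ---   ----
 135   2133   (column sums)

$$ s^2 = \frac{2133 - \frac{135^2}{9}}{9 - 1} = \frac{2133 - 2025}{8} = \frac{108}{8} = 13.5 $$

The standard deviation is the square root of this value: $s = \sqrt{13.5} \approx 3.67$.

Next Week
• Coursepack: Say it with Figures, Chapter 7:
  • The Cross-Tabulation Refines
  • and
Available through JSTOR at www.lib.umich.edu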
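As a closing check on the hand calculation from the Calculating Standard Deviation slide, the sketch below evaluates the shortcut formula and the definitional formula on the same nine scores. The function names are illustrative only; the point is that both routes give the same variance.

```python
import math

# The nine scores from the Numbers slide
x = [12, 17, 22, 14, 12, 19, 13, 15, 11]


def shortcut_variance(xs):
    """Sample variance via the shortcut: (sum X^2 - (sum X)^2 / N) / (N - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(v * v for v in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def definitional_variance(xs):
    """Sample variance via squared deviations from the mean."""
    n = len(xs)
    m = sum(xs) / n
    return sum((v - m) ** 2 for v in xs) / (n - 1)


s2 = shortcut_variance(x)
print("sum X:    ", sum(x))
print("sum X^2:  ", sum(v * v for v in x))
print("variance: ", s2)
print("std dev:  ", math.sqrt(s2))
print("formulas agree:", math.isclose(s2, definitional_variance(x)))
```

The shortcut needs only the two column sums, which is why it is convenient by hand. In floating-point code the definitional form is usually the safer choice, because subtracting two large, nearly equal sums can lose precision.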