Teaching Assistant Chapter Highlights and Quiz Review
Quiz Info, Notes from Chap 3
Authored by Jeremy Cox

Quizzes over Chapter 3
Quiz 3-1 (Wednesday, January 24, 2001): Section 3.1
Quiz 3-2 (Friday, January 26, 2001): Section 3.2
Quiz 3-3 (Monday, January 28, 2001): Section 3.3, Cumulative

These notes are not meant to replace a thorough reading and study of the chapter; they only supplement the text’s explanation and point out commonly misunderstood points.

Chapter Highlights
This chapter begins our study of the “beef” of statistics. Descriptive statistics are the core of all statistics, in both calculation and use: all inferential statistics are based on descriptive statistics, so whenever you use statistics, you will use descriptive statistics. The material of the chapter is rather brief, though; about half of it is examples of calculation. In short: it is time to get serious about studying the material. Ask lots of questions!

Here is also a friendly tip: before and after reading the chapter, look over the chapter outline. For Chapter 3, it starts on p57. After reading, also look over the selected formulas in the front and/or back covers to get an idea of what is important to understand and study.

Section 3.1 Statistics for Central Tendency
Statistics of central tendency try to describe the average or typical case. The mean, median, and mode are three different statistics for doing this.

p59 The mean is the arithmetic average. It is easy to compute and is used to calculate many other statistics. NOTE: $\bar{x}$ represents the mean of a sample, and $\mu$ the mean of a population.

p62 The mean is the arithmetic center: the deviations of the high scores above the mean are exactly balanced in total magnitude by the deviations of the low scores below it. Thus, the mean is greatly affected by outliers. Whereas the median (below) counts only the fact that a score lies above or below the middle, the mean accounts for the score’s size as well. See Figure 3.2, p64.

p64 The median, aka Md, is the middle score. p69 It is balanced by having the same number of scores above it as below it.
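A small Python sketch makes the contrast between the mean and the median concrete. The scores are hypothetical, and `center_summary` is an illustrative helper, not something from the text; it simply calls the standard library’s `statistics.mean` and `statistics.median`.

```python
from statistics import mean, median

def center_summary(scores):
    """Return (mean, median) for a list of numeric scores."""
    return mean(scores), median(scores)

# Hypothetical scores, chosen only to illustrate the effect of one outlier.
scores = [2, 3, 3, 4, 5, 6, 7]
with_outlier = scores + [70]  # add one extreme high score

print(center_summary(scores))        # mean about 4.29, median 4
print(center_summary(with_outlier))  # mean 12.5, median 4.5
```

One extreme score drags the mean from about 4.29 up to 12.5, while the median moves only from 4 to 4.5: the median responds to where the new score falls in the ordering, not to how large it is.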
The median is thought of as the “true center”: it is literally the middle of the distribution. It is only slightly sensitive to outliers. Ask yourself: why did I say slightly, instead of never affected by outliers? The answer is that if one side of a distribution has more outliers than the other, those extra scores change which score sits in the middle, so the median shifts by a few positions in the ordered list. How far it shifts depends on how many outliers are unbalanced, not on how extreme they are.

p70 The mode is the most frequently occurring score. Note that for grouped distributions, the mode is the most frequently occurring class interval, and for a nominal variable, it is the largest category. If two scores (or intervals, or categories) are equally frequent and more frequent than all the others, they are both modes. You can have more than one mode. (For instance, in a rectangular distribution every score is a mode.)

p71 Notice that having more than one mode indicates that the distribution has multiple peaks (unless the modal scores or class intervals are adjacent). Thus, the mode can tell us whether a distribution is multimodal. See Figure 3.5. Also, if the mode is not equal to the median, you should immediately ask whether the distribution is unimodal.

p72 The mode is the only measure of central tendency available for nominal data.

Comments on central tendency: A wise man looks at all three measures, because information can be learned by comparing them. We will learn more comparisons as we progress.

Section 3.2 Statistics for Variability
The question of how the scores vary from one another is different from the question of where their center is. When we talk of scores varying, we are talking about how spread out the distribution is, that is, how different the scores are from each other. This concept is very difficult to explain and understand; it is best to first learn what the statistics of variability are, and then learn to “get a feel” for what variability is. After all, it is easier to understand central tendency by starting from the familiar idea of the average than the other way around. So it is with variability.

p73 The range measures how wide the distribution is.
The range is only meaningful for interval or ratio variables. It is equal to the high score minus the low score, so it is highly sensitive to outliers: it is determined by them!

p74 An alternative way to communicate the range is “the values range from $x_1$ to $x_2$.” This tells us more, because we get two pieces of information, $x_1$ and $x_2$, instead of one, $x_2 - x_1$.

The interquartile range (IQR) is the width of the middle half of the distribution, between the 25th and 75th percentiles (the first and third quartiles). Because it ignores the lowest and highest quarters of the scores, it excludes outliers. It is solely a descriptive statistic, and we will not see it again after this chapter. However, we will use the IQR, and the box plot that is based on it, in lab.

p76 Variance is by far the most important statistic of variability. We use different notation for samples, $s^2$, and for populations, $\sigma^2$. Mathematically, the variance is the average of the squared deviations about the mean. “Deviation” is a fancy word for the distance of a score from the mean. Thus,

$$\text{deviation} = x - \bar{x}, \qquad s^2 = \frac{\sum (x - \bar{x})^2}{N}.$$

p79 Notice that we can estimate $\sigma^2$ from a sample:

$$\hat{s}^2 = \frac{\sum (x - \bar{x})^2}{N - 1} = s^2 \, \frac{N}{N - 1}.$$

Above, the corrected sample variance, $\hat{s}^2$, is always larger than $s^2$ (unless all the scores are identical, in which case both are 0), and the second equality is the formula that interconverts them. You are probably asking yourself, WHO CARES? This has no point! The point is that variance is better known as the square of the standard deviation, and the standard deviation is the statistic used the most.

p80 The sum of squares is the sum of the squared deviations, $\sum (x - \bar{x})^2$. Note that this is the numerator in the variance equation. This statistic is of little (to no) use right now, but we will use it a great deal later.

Discussion of standard deviation
Imagine we want a statistic that is a standard unit, with which we can compare all distributions. This unit should measure deviation from the mean: the number should be small if the scores are close to the mean, and large if they are far away. The key idea here is that close and far are relative terms, depending upon each distribution.
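The Section 3.2 statistics can be checked on a small data set with a minimal Python sketch. The scores and the helper name `variability_summary` are hypothetical. The quartiles come from the standard library’s `statistics.quantiles` with `method="inclusive"`; textbooks and software use several slightly different percentile conventions, so a hand-computed IQR may differ a little.

```python
from statistics import pvariance, quantiles, variance

def variability_summary(scores):
    """Range, IQR, sum of squares, s^2 (divide by N), and s-hat^2 (divide by N - 1)."""
    n = len(scores)
    m = sum(scores) / n                     # sample mean, x-bar
    ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "range": max(scores) - min(scores),
        "iqr": q3 - q1,
        "ss": ss,
        "s2": ss / n,            # s^2
        "s2_hat": ss / (n - 1),  # corrected s-hat^2
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
summary = variability_summary(scores)
print(summary)

# The standard library agrees: pvariance divides by N, variance by N - 1.
assert summary["s2"] == pvariance(scores)
assert abs(summary["s2_hat"] - variance(scores)) < 1e-12
# Interconversion: s-hat^2 = s^2 * N / (N - 1)
n = len(scores)
assert abs(summary["s2_hat"] - summary["s2"] * n / (n - 1)) < 1e-12
```

For these scores the range is 7, the IQR is 1.5 (under the inclusive convention), the sum of squares is 32, $s^2 = 4$, and $\hat{s}^2 = 32/7 \approx 4.57$.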
We cannot simply average the deviations, because by the definition of the mean, $\sum (x - \bar{x}) = 0$: the positive and negative deviations always cancel. So we have to settle for squaring the deviations, averaging them (which gives the variance), and taking the square root to return to the original units. This is the standard deviation!

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$

Note that we also have a corrected standard deviation, based on the corrected variance:

$$\hat{s} = \sqrt{\hat{s}^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$$

Section 3.3 Skew
Skew is unevenness (asymmetry) in a distribution. Memorize these facts:
1. In an ideal, symmetric distribution, $\bar{x} = Md$; see p85, Figure 3.8(c).
2. In a positive skew, $\bar{x} > Md$; see p85, Figure 3.8(a).
3. In a negative skew, $\bar{x} < Md$; see p85, Figure 3.8(b).

Notice that the direction of the skew (positive or negative) refers to the relationship of $\bar{x}$ to Md. However, this simple comparison does not tell us the magnitude of the skew relative to the distribution. The difference $\bar{x} - Md$ could be $-10$, but when considering yearly salaries, this is nearly negligible. Thus, Pearson developed a measurement of skew that is relative to the distribution, by dividing by the standard deviation:

$$Sk = \frac{3(\bar{x} - Md)}{s}$$
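The standard deviation and Pearson’s skew can be sketched in Python as follows. The data and helper names are hypothetical. One assumption: `pearson_skew` uses $s$ (with $N$ in the denominator), matching the formula above; some texts use the corrected $\hat{s}$ instead, which gives a value of slightly smaller magnitude.

```python
from math import isclose, sqrt
from statistics import mean, median, pstdev, stdev

def standard_deviations(scores):
    """Return (s, s_hat): the SD with N, and the corrected SD with N - 1, in the denominator."""
    n = len(scores)
    m = mean(scores)
    ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations
    return sqrt(ss / n), sqrt(ss / (n - 1))

def pearson_skew(scores):
    """Pearson's Sk = 3 * (mean - median) / s, using s (N in the denominator)."""
    s, _ = standard_deviations(scores)
    return 3 * (mean(scores) - median(scores)) / s

right_tail = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical scores, longer tail to the right
left_tail = [10 - x for x in right_tail]  # mirror image, longer tail to the left

# Raw deviations cancel out, which is why they are squared first.
m = mean(right_tail)
print(sum(x - m for x in right_tail))  # 0

s, s_hat = standard_deviations(right_tail)
print(s, s_hat)  # s is 2.0; s_hat is slightly larger
print(isclose(s, pstdev(right_tail)), isclose(s_hat, stdev(right_tail)))  # True True

print(pearson_skew(right_tail))  # positive: mean 5 > median 4.5
print(pearson_skew(left_tail))   # negative: mean 5 < median 5.5
```

Mirroring the scores flips the sign of Sk (0.75 versus -0.75) without changing $s$. Because Sk divides the mean-median gap by $s$, multiplying every score by a positive constant (say, converting salaries from dollars to thousands of dollars) leaves Sk unchanged.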