Teaching Assistant Chapter Highlights and Outline

Teaching Assistant Chapter Highlights and Quiz Review
Quiz Info, Notes from Chap 3
Authored by Jeremy Cox
Quizzes over Chapter 3
Wednesday, January 24, 2001    Quiz 3-1: Section 3.1
Friday, January 26, 2001       Quiz 3-2: Section 3.2
Monday, January 28, 2001       Quiz 3-3: Section 3.3, Cumulative
These notes are not meant to replace a thorough reading and study of the chapter, only to
supplement the text’s explanation and point out commonly misunderstood points.
Chapter Highlights
This chapter begins our study of the “beef” of statistics. Descriptive statistics are
the core of all statistics, in both calculation and use. All inferential statistics are
based on descriptive statistics, so whenever you use statistics, you will use descriptive
statistics. The material of the chapter is rather brief, and about half of it is examples
of calculation. In short: it is time to get serious about studying the material. Ask lots
of questions!
Here is also a friendly tip: before and after reading the chapter, look over the
outline of the chapter. For Chapter 3, this starts on p57. After reading, also look over the
selected formulas in the front and/or back covers to get an idea of what is important to
understand and study.
Section 3.1 Statistics for Central Tendency
Statistics of central tendency try to describe the average or typical case.
Mean, median, and mode are three different statistics.
p59 Mean is the mathematical average. It is easy to compute and used to
calculate many other statistics.
NOTE: $\bar{x}$ represents the mean of a sample, $\mu$ the mean of a population.
p62 The mean is the arithmetic center, which means that the scores above the mean are
balanced, in total distance from the mean, by the scores below it. Thus, it is greatly
affected by outliers. Whereas the median (below) only counts the fact that a score exists,
the mean accounts for its size as well. See Figure 3.2 p64.
p64 The median, aka Md, is the middle score. p69 It is balanced by having the same
number of scores above as below. This is thought of as the “true center”. Thus, the
median is literally the center of the distribution. It is only slightly sensitive to
outliers. Ask yourself: why did I say “slightly” instead of “never affected by outliers”?
The answer is that if one side of a distribution has more outliers than the other, the
unbalanced outliers move the median slightly toward that side. What matters is how many
such scores there are, not how extreme they are.
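To get a feel for the difference, here is a small Python sketch. The scores are made up for illustration (they are not from the text), and it relies only on the standard `statistics` module. It replaces one score with an outlier, then adds two outliers on the same side.

```python
from statistics import mean, median

# Made-up scores for illustration; they are not from the text.
scores = [2, 3, 4, 5, 6]
one_outlier = [2, 3, 4, 5, 60]          # top score replaced by an outlier
unbalanced = [2, 3, 4, 5, 6, 60, 70]    # two extra outliers, both on the high side

for label, data in [("original", scores),
                    ("one outlier", one_outlier),
                    ("unbalanced outliers", unbalanced)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
```

The mean jumps from 4 to 14.8 with a single outlier, while the median stays at 4. Only when the outliers pile up on one side does the median move, and then only from 4 to 5.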
p70 The mode is the most frequently occurring score. Note that for grouped
distributions, the mode is the most frequently occurring class interval, and for a
nominal variable, the largest category.
If there are two equally frequent scores/intervals/categories, they are both
modes. You can have more than one mode. (For instance, in the rectangular distribution
every score would be a mode.)
p71 Notice that having more than one mode is an indicator that the distribution
has multiple peaks (unless the modal scores or class intervals are adjacent). Thus, the
number of modes is an indicator of whether a distribution is multimodal. See Figure 3.5.
Also, if the mode is not equal to the median, you should immediately ask whether the
distribution is unimodal.
p72 Mode is the only central measure available for nominal data.
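As a rough sketch of how ties are handled, the Python below returns every value tied for the highest frequency. The function name `modes` and the data are hypothetical, chosen only for illustration. It assumes a non-empty list.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, in first-seen order.

    Assumes `values` is non-empty (max() fails on an empty sequence).
    """
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 2, 3, 4, 5, 5, 6]))   # two modes: [2, 5]
print(modes([1, 2, 3, 4]))               # rectangular: [1, 2, 3, 4]
print(modes(["red", "blue", "red"]))     # nominal data: ['red']
```

The last call works on category labels, which is why the mode is available for nominal data when the mean and median are not.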
Comments on central tendency:
A wise man looks at all three measures. Information can be learned by comparing
them. We will learn more comparisons as we progress.
Section 3.2 Statistics for Variability
The question of how the scores vary from one another is different from the question of where their center lies.
When we talk of scores varying, we are talking about how the distribution changes and
how different the scores are from each other. This concept is very difficult to explain and
understand; it is best to learn what the statistics of variability are, and then learn to “get a
feel” for what variability is. Certainly, it is easier to understand central tendency from
the idea that it is like the average than in reverse. So it is with variability.
p73 Range measures how wide the distribution is. (This is only meaningful for
interval or ratio variables.) The range is equal to the high score minus the low score.
Thus, the range is highly sensitive to outliers -- it is determined by them!
p74 An alternative way to communicate range is “the values range from $x_1$ to $x_2$.”
This tells us more, for we get two pieces of information, $x_1$ and $x_2$, instead of one, $x_2 - x_1$.
The interquartile range is the width of the middle half of the distribution, between the
25th and 75th percentiles (the first and third quartiles). This excludes outliers. It is
solely a descriptive statistic and we will never see it again after this chapter. However,
we will use the interquartile range, and a box plot based upon it, in lab.
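The sketch below compares the range and the interquartile range on made-up scores, with and without one outlier. The helper `spread` is a hypothetical name. Quartiles are taken from Python's `statistics.quantiles` with its default method, which is an assumption on my part: the text may use a slightly different quartile convention, so hand calculations can differ a little.

```python
from statistics import quantiles

def spread(scores):
    """Return (range, interquartile range) for a list of numeric scores."""
    q1, _q2, q3 = quantiles(scores, n=4)   # three cut points: Q1, Q2, Q3
    return max(scores) - min(scores), q3 - q1

with_outlier = [1, 3, 4, 5, 6, 7, 9, 40]   # made-up scores; 40 is an outlier
without_outlier = with_outlier[:-1]         # same scores with the outlier dropped

print(spread(with_outlier))
print(spread(without_outlier))
```

Dropping the single outlier shrinks the range from 39 to 8, while the interquartile range changes far less.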
p76 Variance is by far the most important statistic. We use different notation for
samples, $s^2$, and populations, $\sigma^2$.
Mathematically, the variance is the average of the squared deviations about the mean. A
deviation is a fancy word for the distance from the mean. Thus,
deviation $= x - \bar{x}$ and
$$s^2 = \frac{\sum (x - \bar{x})^2}{N}$$
p79 Notice that we can estimate $\sigma^2$ from a sample.
$$\sigma^2 \approx \hat{s}^2 = \frac{\sum (x - \bar{x})^2}{N - 1} = \frac{N}{N - 1}\, s^2$$
Above, the corrected sample variance, $\hat{s}^2$, is always larger than $s^2$, and there is a
formula to interconvert them.
You are probably asking yourself, WHO CARES? This has no point! Variance is
better known as the square of the standard deviation, because the standard deviation is the
statistic used the most.
p80 Sum of Squares is the sum of squared deviations. Note that this is the
numerator in the variance equation. This statistic is of little (to no) use right now. We
will use it a great deal later.
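Here is a minimal Python sketch of the sum of squares, the variance (dividing by N), and the corrected variance (dividing by N - 1). The function names and the scores are made up for illustration. Note one assumption about naming: Python's standard library calls the divide-by-N version `pvariance` and the divide-by-(N - 1) version `variance`, which does not match the notation in these notes.

```python
from statistics import pvariance, variance

def sum_of_squares(scores):
    """Sum of squared deviations about the mean (numerator of the variance)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def sample_variance(scores):
    """s^2: sum of squares divided by N."""
    return sum_of_squares(scores) / len(scores)

def corrected_variance(scores):
    """s-hat^2: sum of squares divided by N - 1 (needs at least two scores)."""
    return sum_of_squares(scores) / (len(scores) - 1)

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up data: N = 8, mean = 5
n = len(scores)
ss = sum_of_squares(scores)           # 32.0
s2 = sample_variance(scores)          # 4.0
s2_hat = corrected_variance(scores)   # 32/7, about 4.571

print(ss, s2, s2_hat)
# Conversion between the two: s-hat^2 = N / (N - 1) * s^2
print(abs(s2_hat - n / (n - 1) * s2) < 1e-12)
# Cross-check against the standard library (see the naming caveat above).
print(abs(pvariance(scores) - s2) < 1e-12, abs(variance(scores) - s2_hat) < 1e-12)
```

With these scores the corrected variance (about 4.571) is larger than the uncorrected one (4.0), as expected.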
Discussion of standard deviation
Imagine we want a statistic which is a standard unit, with which we can compare
all distributions. This unit should measure how far scores typically deviate from the
mean. The number should be small if the scores are close to the mean, and large if they
are far away. The key idea here is that close and far are relative terms, depending upon
each distribution.
We cannot use the average deviation, because by the definition of the mean,
$\sum (x - \bar{x}) = 0$, so the average deviation is always zero. So we have to settle for
the square root of the average squared deviation, that is, the square root of the variance.
This is the standard deviation!
s s 
 x  x 
sˆ  sˆ 
 x  x 
2
2
N
Note that we also have a corrected standard deviation, based on the corrected variance.
2
2
N 1
Section 3.3 Skew
Skew is unevenness (asymmetry) in a distribution.
Memorize these facts:
1. In an ideal distribution, $Md = \bar{x}$; see p85 Figure 3.8(c).
2. In a positive skew, $\bar{x} > Md$; see p85 Figure 3.8(a).
3. In a negative skew, $\bar{x} < Md$; see p85 Figure 3.8(b).
Notice that the skew (positive or negative) refers to the relationship of $\bar{x}$ to $Md$.
These regular measurements of skew do not tell us the magnitude of the skew relative
to the distribution. A skew could be –10, but when considering yearly
salaries, this is nearly negligible.
Thus, Pearson developed a measurement of skew that is relative to the distribution
by using the standard deviation:
$$Sk = \frac{3(\bar{x} - Md)}{s}$$
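A small Python sketch of Pearson's measure follows. The function name `pearson_skew` and all the data are hypothetical. It assumes $s$ is the standard deviation with N in the denominator, which the standard library calls `pstdev`.

```python
from statistics import mean, median, pstdev

def pearson_skew(scores):
    """Sk = 3 * (mean - median) / s, with s computed using N in the denominator.

    If every score is identical, s is 0 and the division fails.
    """
    s = pstdev(scores)
    return 3 * (mean(scores) - median(scores)) / s

scores = [1, 2, 2, 3, 3, 3, 4, 10]       # made-up data with a long right tail
salaries = [x * 1000 for x in scores]    # same shape, much larger units
mirrored = [-x for x in scores]          # same shape, flipped left to right

print(pearson_skew(scores))     # positive: the mean exceeds the median
print(pearson_skew(salaries))   # same value despite the larger units
print(pearson_skew(mirrored))   # negative: the mean is below the median
```

Because the difference between mean and median is divided by the standard deviation, rescaling the scores leaves Sk unchanged. This is what makes it comparable across distributions.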