Psych 5500/6500
Measures of Central Tendency
Fall, 2008

Measures of Central Tendency
Various ways of indicating the most typical or average score.
1. Mean
2. Median
3. Mode

The Mean
$$\bar{Y} = \frac{\sum Y}{n}$$
‘$\bar{Y}$’ is the mean of Y.
‘n’ is the number of scores in the sample.
$\sum Y$ or $\sum_{i=1}^{n} Y_i$ means sum all of the scores.
Some people use ‘n’ to represent the size of a sample and ‘N’ to represent the size of a population. I use either ‘n’ or ‘N’ apparently arbitrarily and then I depend upon context to make it clear.

Summation Symbols
Y = 3, 4, 5, 8 and n = 4, so $Y_1 = 3$, $Y_2 = 4$, $Y_3 = 5$, $Y_4 = 8$.
$$\sum Y = \sum_{i=1}^{n} Y_i = 3 + 4 + 5 + 8 = 20$$
$\sum Y$ implies that you will start with the first number and add all the way to the last number.
$$\sum_{i=2}^{3} Y_i = 4 + 5 = 9$$

The Mean (Computation Example)
$$\bar{Y} = \frac{\sum Y}{n}$$
Y: 5, 4, 1, 3, 2, 7
$$\sum Y = 5 + 4 + 1 + 3 + 2 + 7 = 22, \qquad n = 6$$
$$\bar{Y} = \frac{22}{6} = 3.6666666\ldots \approx 3.67$$

Rounding Conventions
The sample mean was 3.666666… with an infinite number of 6’s to the right of the decimal point. I would like to establish the following rules about rounding off your answers:
1. Go at least two places to the right of the decimal point (e.g. rounding off at 3.66 or 3.666 are ok but 3.6 is not). If you are using SPSS or having your calculator keep track of your intermediate calculations it won’t be rounding off at all and that is fine.
2. If the first number after that is ‘5’ or greater round up, if it is ‘4’ or less don’t round up. Thus 3.666 is rounded to 3.67, while 3.333 is rounded to 3.33.
Now, if you know something about the topic of ‘significant figures’ this policy doesn’t make any sense. It will, however, keep all of you in the same ballpark when it comes to computing answers and handing them in to be graded.

Mean (Interesting Property #1)
The mean is the balance point of a frequency distribution.
Y = 1, 2, 2, 3, 3, 3, 5, 6 and $\bar{Y} = 3.13$

Effect of Outliers
One extreme score can have a big effect on the mean.
Change the last 6 to a 10: Y = 1, 2, 2, 3, 3, 3, 5, 10 and $\bar{Y} = 3.63$

Outliers (cont.)
Change the last 6 to a 50: Y = 1, 2, 2, 3, 3, 3, 5, 50 and $\bar{Y} = 8.63$
Thus one outlier can dramatically affect the mean, making it no longer an effective representation of the majority of the scores.

Outliers and Skewed Data
An extreme score (extreme when compared to the other scores in the distribution) is called an outlier. A distribution that has a number of extreme scores off in just one direction is said to be skewed. In general the mean is not a good measure of central tendency when you have an outlier or skewed data, because it is pulled toward the extreme scores and is no longer representative of the majority of the scores.

The Median
The median is the middle score: the score that half of the scores are less than and half of the scores are greater than.

The Median (Computation)
Step 1: First put the scores in order from smallest to largest.
Step 2: If n is odd then the median is the one score in the middle; if n is even then the median is the mean of the two middle scores.

Median (Computation example)
Example when ‘n’ is odd.
Y = 1, 6, 5, 3, 2, 4, 2
Step 1: 1, 2, 2, 3, 4, 5, 6
Step 2: as n is odd (n = 7) there is one score in the middle. The median = 3.

Median (Computation example)
Example when ‘n’ is even.
Y = 12, 9, 10, 8, 11, 7
Step 1: 7, 8, 9, 10, 11, 12
Step 2: as n is even (n = 6) there are two scores in the middle. The median = (9 + 10)/2 = 9.5.

Median (Interesting Property #1)
The median divides the area of the histogram into two equal parts.

Effect of Outliers
The median is not affected by an outlier.

Median: Special Case
Sometimes, when the median is a value that occurs more than once in the data, the simple formula I gave doesn’t quite work. For example, say your data are:
Y = 1, 2, 2, 2, 3, 4
The median is ‘2’ but there is only one score below ‘2’ while there are two scores above ‘2’. In this case a median of 2 does not divide the area of the distribution into two equal pieces (see next slide).
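The mean and median computations from the slides above can be checked with a minimal Python sketch. The function names `mean`, `median`, and `round_half_up` are my own, chosen for illustration; they are not what SPSS or the course uses. `round_half_up` follows the rounding convention stated earlier (a following digit of 5 or greater rounds up), which is not what Python's built-in `round` does for a value like 3.125.

```python
from decimal import Decimal, ROUND_HALF_UP


def mean(scores):
    """Sum all of the scores and divide by n."""
    return sum(scores) / len(scores)


def median(scores):
    """Sort the scores, then take the middle score (n odd)
    or the mean of the two middle scores (n even)."""
    ordered = sorted(scores)              # Step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # Step 2, n odd: one middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # Step 2, n even


def round_half_up(value, places=2):
    """Round so that a following digit of 5 or more rounds up."""
    quantum = Decimal(10) ** -places      # 0.01 when places == 2
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


original = [1, 2, 2, 3, 3, 3, 5, 6]
with_outlier = [1, 2, 2, 3, 3, 3, 5, 50]

print(round_half_up(mean(original)), median(original))          # 3.13 3.0
print(round_half_up(mean(with_outlier)), median(with_outlier))  # 8.63 3.0
```

The outlier moves the mean from 3.13 to 8.63 while the median stays at 3, which is the contrast the slides draw.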
Note we have 1 + 1/2 + 1/2 + 1/2 = 2.5 boxes below the median, while we have 2 + 1/2 + 1/2 + 1/2 = 3.5 boxes above the median.

If we tweak the value of the median a tad, then we get 1 + 2/3 + 2/3 + 2/3 = 3 boxes below the median, and 2 + 1/3 + 1/3 + 1/3 = 3 boxes above the median.

Final Word on Median
The ‘tweaking’ of the median to preserve its definition of dividing the area of the distribution into two equal parts is rarely done. Usually the simpler formula I have given (arrange the scores then find the middle of that list) is used; this is what SPSS does. Consequently, we will state that the median of Y = 1, 2, 2, 2, 3, 4 is ‘2’.

Mean, Median, and Skewed Data
• The median is often preferred over the mean when you have skewed data.
• Price of homes: $100,000, $130,000, $160,000, $180,000, $2,200,000
  Mean = $554,000
  Median = $160,000

The Mode
The mode is the score that occurs the most.
Y = 2, 4, 5, 5, 5, 7, 8, 9
Mode = 5
Sometimes there is no mode, sometimes there is more than one mode.

Mode (Semi-Interesting Property)
The mode is the peak of a histogram.

Bimodal Distributions
The term bimodal is used when there are two peaks in the distribution even if both peaks aren’t exactly the same size. On a survey question measuring people’s views on a very controversial topic (one that few people feel neutral about) you might get a clump of low scores (with its own mode) and a clump of high scores (with its own mode), and the distribution could be called bimodal even if the two peaks are not identical in height (see graph below).

Nominal Scales and Central Tendency
Racial background:
1 = African American
2 = Asian American
3 = European American
4 = Native American
Y = 1, 1, 2, 4
Mean = 2, Median = 1.5, Mode = 1
Only the mode makes sense.
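Several of the numbers on the median and mode slides above can be reproduced with a short Python sketch. The helper `interpolated_median` is hypothetical and is only one way to do the ‘tweaking’: it assumes each score is drawn as a box one unit wide centred on its value (so the three 2’s span 1.5 to 2.5), which matches the box counts on the slides but is my assumption, not a formula given in the course. `statistics.multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode


def interpolated_median(scores, width=1.0):
    """Value that splits the histogram area in half.

    Assumption (for illustration only): each score is a box of the
    given width centred on its value.
    """
    ordered = sorted(scores)
    n = len(ordered)
    middle = median(ordered)                      # the simple median
    below = sum(1 for y in ordered if y < middle)
    tied = sum(1 for y in ordered if y == middle)
    if tied == 0:                                 # falls between two distinct scores
        return middle
    lower_edge = middle - width / 2               # left edge of the tied boxes
    return lower_edge + width * (n / 2 - below) / tied


special_case = [1, 2, 2, 2, 3, 4]
print(median(special_case))                           # simple median
print(round(interpolated_median(special_case), 2))    # tweaked median, about 2.17

prices = [100_000, 130_000, 160_000, 180_000, 2_200_000]
print(mean(prices), median(prices))                   # 554000 160000

print(multimode([2, 4, 5, 5, 5, 7, 8, 9]))            # [5]
```

With the tweaked value of about 2.17, two thirds of each ‘2’ box lies below the median, giving 3 boxes on each side. Note that `multimode` returns every value when all scores occur equally often, which corresponds to the ‘no mode’ case on the slides.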
Ordinal Scales and the Mean
Size of household debt:
1 = None ($0)
2 = Tiny ($1 to $500)
3 = Very Small ($501-$1000)
4 = Large ($1000+)
One person had a debt of $200 (Y = 2) and one person had a debt of $2,000,000 (Y = 4).
Y = 2, 4
Mean = 3: you are saying that the average debt in the sample was ‘Very Small’. This obviously isn’t working.

Ordinal Scales and the Median
Size of household debt:
1 = None ($0)
2 = Tiny ($1 to $500)
3 = Very Small ($501-$1000)
4 = Large ($1000+)
Y = 1, 3, 4
Median = 3: you are saying that half the sample had a debt that was very small or less, and half had a debt that was very small or larger. This makes sense.

Ordinal Scales and the Mode
Size of household debt:
1 = None ($0)
2 = Tiny ($1 to $500)
3 = Very Small ($501-$1000)
4 = Large ($1000+)
Y = 1, 2, 3, 3, 3, 4
Mode = 3: you are saying that the score that happened the most in the sample was a ‘3’. This also makes sense.

Rank Scales and Central Tendency (1)
Within a sample:
Order of finish in a foot race: Y = 1, 2, 3, 4
Mean = 2.5, Median = 2.5, no mode
You will get exactly the same values anytime you race four people, so what good are they?

Rank Scales and Central Tendency (2)
When rank scores are used it is usually within a somewhat more complicated experimental design (e.g. one involving two groups). An example would be to take ten out-of-shape people, randomly divide them into two groups of 5 people each, have one group do a lot of training, then have all ten run a race and measure how they place in the race (a rank measure). The data might look like this:
Training group: Y = 1, 2, 4, 5, 7 (median = 4)
No training group: Y = 3, 6, 8, 9, 10 (median = 8)
It looks like the ‘training group’ placed better in the race than the ‘no training group’. To compare the performance of the two groups you could compare the medians of the two groups (it would be inappropriate to use the means of the groups because these are ordinal-type numbers). There is no mode so you can’t use that.
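The two rank-scale points above can be illustrated with a small Python sketch. The helper name `rank_summary` is made up for this example. It shows that the ranks 1 through n always give the same mean and median, and then compares the two group medians from the race example.

```python
from statistics import mean, median


def rank_summary(n):
    """Mean and median of the ranks 1..n; these depend only on n."""
    ranks = list(range(1, n + 1))
    return mean(ranks), median(ranks)


# Any race with four finishers gives the same answer.
print(rank_summary(4))                         # (2.5, 2.5)

# Ranks split across two groups are informative: compare group medians.
training = [1, 2, 4, 5, 7]
no_training = [3, 6, 8, 9, 10]
print(median(training), median(no_training))   # 4 8
```

A lower median place for the training group is what the slide reads as ‘placed better’; the sketch only describes the sample and does not test whether the difference is reliable.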
Cardinal Scales and Central Tendency
How many magazines various households subscribe to:
Y = 1, 1, 2, 4
Mean = 2, Median = 1.5, Mode = 1
They all make sense.

Selecting a Measure of Central Tendency
1. The most important guideline for selecting which measure of central tendency to use is to select the one that does the best job of representing the data given what you are trying to determine. Sometimes you would be more interested in knowing that most families in some sample had 2 children than you would in knowing that the average number of children per household was 2.43. Common sense, what you need to know, and which measure best represents what you need to know, will all determine which measure(s) you select.
2. By far, more statistical tools (including the ones we will be covering in this class) are developed around the mean than around any other measure of central tendency. Also, more people understand the mean as ‘the average’ than understand the other measures.
3. The median does a better job than the mean at describing skewed data. There are many more tools you can apply to the mean, however, and so it may make more sense to make the data less skewed so you can use the mean. We will learn how to deskewify the data later in this class (don’t try to find that word in a dictionary).
4. The measurement scale you use might determine which measure of central tendency would be appropriate (see the earlier slides).
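Guideline 4 can be sketched in Python as a lookup from measurement scale to the measures the earlier slides describe as making sense. The names `SENSIBLE_MEASURES` and `summarize` are hypothetical, and the table is my reading of the slides rather than a rule stated in them; rank scales are left out because the slides treat them as useful only when comparing groups.

```python
from statistics import mean, median, multimode

# Measures the earlier slides describe as meaningful for each scale
# (an interpretation for illustration, not an official rule).
SENSIBLE_MEASURES = {
    "nominal": ("mode",),
    "ordinal": ("median", "mode"),
    "cardinal": ("mean", "median", "mode"),
}

# multimode returns a list, since there can be more than one mode.
MEASURES = {"mean": mean, "median": median, "mode": multimode}


def summarize(scores, scale):
    """Compute only the measures that make sense for the given scale."""
    return {name: MEASURES[name](scores) for name in SENSIBLE_MEASURES[scale]}


magazines = [1, 1, 2, 4]      # cardinal: number of subscriptions
background = [1, 1, 2, 4]     # nominal: category codes

print(summarize(magazines, "cardinal"))
print(summarize(background, "nominal"))
```

The same four numbers yield three usable summaries on a cardinal scale but only the mode on a nominal scale, which is the contrast between the magazine and racial background examples.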