Measures of Position
Where does a certain data value fit in relative to the other data values?
To accompany Hawkes lesson 3.3
Original content by D.R.S.

Nth Place
• The highest and the lowest
• 2nd highest, 3rd highest, etc.
• “If I made $60,000, I would be 6th richest.”

Another view: “How does my x compare to the mean?”
• “Am I in the middle of the pack?”
• “Am I above or below the middle?”
• “Am I extremely high or extremely low?”
• z Score is the measuring stick

z Score: x is how many standard deviations away from the mean?
If you know the x value
• Population: z = (x − μ) / σ
• Sample: z = (x − x̄) / s
To work backward from z to x
• Population: x = z ⋅ σ + μ
• Sample: x = z ⋅ s + x̄

z score is also called “Standard Score”
• No matter what x is measured in or how large or small the x values are…
• The z score of the mean will be 0
  – Because numerator x − x̄ turns out to be 0.
• If x is above the mean, its z is positive.
  – Because numerator x − x̄ turns out to be positive.
• If x is below the mean, its z is negative.
  – Because numerator x − x̄ turns out to be negative.

z score values
• Typically round to two decimal places.
  – Don’t say “0.2589”, say “0.26”
• If not two decimal places, pad
  – Don’t say “2”, say “2.00”
  – Don’t say “-1.1”, say “-1.10”
• z scores are almost always in the interval −4 < z < 4. Be very suspicious if you calculate a z score that’s not a small number.

Practice: Given x, compute z
Find the z scores corresponding to the x salary values, given that the mean x̄ = $51,168 and the standard deviation s = $16,291.
• x = $90,000
• x = $70,000
• x = $50,000
• x = $30,000
• x = $10,000

Practice: Given z, compute x
Find the x scores (salaries) corresponding to these z standard scores, given that the mean x̄ = $51,168 and the standard deviation s = $16,291.
• z = 0
• z = 1 and z = −1
• z = 2 and z = −2
• z = 3 and z = −3

Two parallel axes (scales), x and z

Example: Using z scores to compare unlike items
The Literature test
• The mean score was 77 points.
• The standard deviation was 11 points.
• Sue earned 91 points.
• Find her z score for this test.
The Biology test
• The mean score was 47 points.
• The standard deviation was 6 points.
• Sue earned 55 points.
• Find her z score for this test.
• On which test did she have the “better” performance?

z scores caution with negatives
• Example: compare test scores on two different tests to ascertain “Which score was the more outstanding of the two?”
• Be careful if the z scores turn out to be negative. Which is the better performance? z = −1.99 or z = −0.34?
• Stop and think back to your basic number line and the meaning of “<” and “>”

Percentiles
• “What percent of the values are lower than my value?”
  – 90th percentile is pretty high
  – 50th percentile is right in the middle
  – 10th percentile is pretty low
• If you scored in the 99th percentile on your SAT, I hope you got a scholarship.

Salary data for our percentile examples
• With these salary values again
• What’s the percentile for a salary of $59,000?
• You can see it’s going to be higher than 50th because it’s in the top half.

Example: Given x, find the percentile
• Count x = how many values are below $59,000
• Count n = how many values are in the data set
• Formula for percentile: P = (x / n) ⋅ 100%
• Here we have x = 15 values lower than our $59,000
• Here we have n = 20 values in the data set.
• P = (15 / 20) ⋅ 100%, so P = 75, “75th percentile”

Continued: Given x, find the percentile
• P = (15 / 20) ⋅ 100%, so P = 75
• Do not say “75%”, but say “the 75th percentile”
• Other sources use different formulas, beware!
  – Some other books use x + 0.5 in the numerator.
  – Excel has two different answers, PERCENTILE.EXC and PERCENTILE.INC functions.
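
The z score formulas and the percentile formula P = (x / n) ⋅ 100% from the slides above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the list `ranks` is placeholder data (the whole numbers 1 to 20), not the salary data set from the slides.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def x_from_z(z, mean, sd):
    """Work backward from a z score to the original x value."""
    return z * sd + mean


def percentile_rank(values, x):
    """Percentile of x: the share of values strictly below x, times 100."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


# Salary practice: mean and standard deviation as given on the slides.
mean, sd = 51168, 16291
for salary in (90000, 70000, 50000, 30000, 10000):
    # The .2f format rounds and pads to two decimal places.
    print(f"x = {salary:>6}   z = {z_score(salary, mean, sd):.2f}")
for z in (0, 1, -1, 2, -2, 3, -3):
    print(f"z = {z:>2}   x = {x_from_z(z, mean, sd)}")

# Sue's two tests: the larger z score is the more outstanding result.
literature = z_score(91, 77, 11)
biology = z_score(55, 47, 6)
print(f"Literature z = {literature:.2f}, Biology z = {biology:.2f}")

# Placeholder data (hypothetical): 15 of these 20 values are below 16.
ranks = list(range(1, 21))
print(percentile_rank(ranks, 16))
```

Formatting with `.2f` follows the rounding and padding advice on the “z score values” slide. Comparing `literature` and `biology` with `>` also handles negative z scores correctly, since −0.34 is greater than −1.99.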
Given Percentile P, find the x value
• Formula: position from bottom L = n ⋅ P / 100
  – Again, n = how many data values in the set
  – and P = the percentile rank that’s given.
• Is there a decimal remainder in position L?
  – If so, then BUMP UP to the next highest whole # and take the value in that position.
  – Or if L is an exact whole number, take the average of the values in positions L and (L + 1).
• Note: Book uses lowercase l instead of L.

Given Percentile P, find the x value
• Example: What is the 31st percentile in the salary data?
• 31st percentile: plug in n = 20, P = 31
• Compute L = 20 ⋅ 31 / 100 = 6.2. It has a remainder.
• Bump it up! L = 7.
  – Not rounding, but rather bumpety-upping
• So we look 7 positions from the bottom
• “The 31st percentile is $44,476”

Given Percentile P, find the x value
• Example: What is the 40th percentile in the salary data? Plug in n = 20, P = 40
• Compute L = 20 ⋅ 40 / 100 = 8. Exact integer!
• So count L = 8th and L + 1 = 9th from bottom.
• 40th = (47043 + 47692) / 2
• “The 40th percentile is $47,367.50, or $47,368.”

Excel gives different answers
• Excel does some fancy interpolation

Quartiles Q1, Q2, Q3
• Data values are arranged from low to high.
• The Quartiles divide the data into four groups.
• Q2 is just another name for the Median.
• Q1 = the median of the values from the lowest up to Q2
• Q3 = the median of the values from Q2 up to the highest
• It gets tricky, depending on how many values.

Quartiles example
• 10, 20, 30, 40, 50, 60, 70, 80, 90
• The Second Quartile, Q2 = median = 50
• Find the medians of the subsets left and right.
• Keep the 50 in each of those subsets.
• The First Quartile, Q1 = median of { 10, 20, 30, 40, 50 } = 30
• The Third Quartile, Q3 = median of { 50, 60, 70, 80, 90 } = 70

Quartiles example
• 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
• Q2 = median = (50 + 60) / 2 = 55
(two middle #s)
• Leave the 50 and 60 in place; do not reuse 55
• Q1 = median of {10, 20, 30, 40, 50} = 30
• Q3 = median of {60, 70, 80, 90, 100} = 80

Quartiles example
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = median = (50 + 60) / 2 = 55 (two middle #s).
• 55 isn’t really there so you can’t remove it!
• Leave the 50 and 60 in place
• Q1 = median of {0, 10, 20, 30, 40, 50} = 25
• Q3 = median of {60, 70, 80, 90, 100, 110} = 85
• Two middle numbers happened again!

Interquartile Range
• Definition: IQR = Q3 – Q1
• In the previous example, 85 – 25 = 60.
• Interquartile Range measures how spread out the middle of the data are
  – The lowest quartile (x < Q1) is not involved
  – And the highest quartile (x > Q3) is not involved.

Quartiles with TI-84
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Put values into a TI-84 List
• Use STAT, CALC, 1-Var Stats
• Scroll down down down to get to them.

There is disagreement about Quartiles
• The TI-84 sometimes gives different answers than the method we use in the Hawkes materials
• Excel might give different answers from both Hawkes and the TI-84.
• Use the Hawkes method in this course’s work
• Be aware of the others
  – You should know how to use TI-84 and Excel
  – You should be aware that differences can occur.

Quartiles with TI-84 vs. Hawkes
• 10, 20, 30, 40, 50, 60, 70, 80, 90
• We got Q1 = 30 and Q3 = 70 before.
• Hawkes keeps the 50, using 10, 20, 30, 40, 50 to compute Q1.
• But the TI-84 throws out 50 and uses 10, 20, 30, 40.
• Hawkes says the TI-84 is computing “hinges”.

Quartiles in Excel
• =QUARTILE.INC(cells, 1 or 2 or 3) seems to give the same results as the old QUARTILE function
• There’s new =QUARTILE.EXC(cells, 1 or 2 or 3)
• Excel does fancy interpolation stuff and may give different Q1 and Q3 answers compared to the TI-84 and our by-hand methods.
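
The “bump up or average” percentile rule and the two by-hand quartile methods described above can be sketched in Python as follows. The function names and the `keep_median` switch are made up for illustration; the second quartile method only mirrors what the slides say the TI-84 does (throwing out the middle value) and is not taken from TI documentation. P is assumed to be a whole number strictly between 0 and 100.

```python
from statistics import median


def percentile_position(n, p):
    """Return (L, exact) for the p-th percentile of n sorted values.

    L counts from the bottom, starting at 1. Integer arithmetic is used
    so that a case like 20 * 31 / 100 = 6.2 is handled without rounding error.
    """
    whole, remainder = divmod(n * p, 100)
    if remainder:
        return whole + 1, False  # decimal remainder: bump up
    return whole, True           # exact: average positions L and L + 1


def value_at_percentile(values, p):
    """Value at the p-th percentile using the bump-up / average rule."""
    data = sorted(values)
    position, exact = percentile_position(len(data), p)
    if exact:
        return (data[position - 1] + data[position]) / 2
    return data[position - 1]


def quartiles(values, keep_median=True):
    """Return (Q1, Q2, Q3) as medians of the lower and upper halves.

    keep_median=True keeps a single middle value in both halves;
    keep_median=False drops it from both halves.
    With an even count the two choices give the same halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2 if keep_median else n // 2
    return median(data[:half]), median(data), median(data[n - half:])


nine = [10, 20, 30, 40, 50, 60, 70, 80, 90]
twelve = list(range(0, 120, 10))

print(percentile_position(20, 31))         # position for the 31st percentile
print(percentile_position(20, 40))         # position for the 40th percentile
print(quartiles(nine))                     # middle value kept
print(quartiles(nine, keep_median=False))  # middle value dropped
q1, q2, q3 = quartiles(twelve)
print("IQR =", q3 - q1)
```

Running `percentile_position` with n = 20 reproduces the positions L = 7 and L = 8 from the salary examples; the salary values themselves are not reproduced here, so `value_at_percentile` needs a data list supplied by the reader.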
The Five Number Summary
• Again: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = median = (50 + 60) / 2 = 55, Q1 = 25 and Q3 = 85
• “The Five Number Summary” is defined as: the minimum, then Q1, Q2, Q3, then the maximum
• For this set of numbers, the Five Number Summary is “0, 25, 55, 85, 110”

The Five Number Summary
• Again: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = 55, Q1 = 25, Q3 = 85
• Min is 0, Max is 110
• For this set of numbers, the Five Number Summary is “0, 25, 55, 85, 110”
• Box Plot (figure): Min 0, Q1 25, Q2 55, Q3 85, Max 110
• TI-84 can do Box Plot too, but again its quartiles disagree with the way Hawkes defines quartiles.

Why Box Plot?
• Don’t lose sight of the big picture here:
  – We have a data set
  – It’s a bunch of numbers
  – We want to summarize the data
• Summarize means make it into a sound bite
  – We must be Concise – don’t say too much
  – We must be Informative – don’t say too little

We must be Concise
• Bad: “Here is a report that tells you the mean and the variance and the standard deviation and the quartiles and the percentiles from 0 to 100… and the marketing survey analyzed by demographic subgroups …” (there is a place for that, but not right now)
• Good: “Got fifteen seconds? Here’s what we found.”

Notice the pieces of the boxplot:
• Horizontal scale, maybe a little beyond the min and the max. A generic number line.
• The five numbers.
• The box holds the quartiles
  – With a line in the middle at the median.
• The whiskers extend out to the min and the max.

TI-84 Boxplot
• See instructions on separate handout.
• Caution again that TI-84 computes quartiles differently from Hawkes and differently from Excel, so the results aren’t always going to agree.

Additional Topics
• Might not be needed for Hawkes homework
• But you should be aware of them
• Quintiles and Deciles
• Interquartile Range and Outliers
• TI-84 Box Plot

Quintiles and Deciles
• You might also encounter
  – Quintiles, dividing data set into 5 groups.
  – Deciles, dividing data set into 10 groups.
• Reconcile everything back with percentiles:
  – Quartiles correspond to percentiles 25, 50, 75
  – Deciles correspond to percentiles 10, 20, …, 90
  – Quintiles correspond to percentiles 20, 40, 60, 80

Interquartile Range and Outliers
• Concept: An OUTLIER is a wacky far-out abnormally small or large data value compared to the rest of the data set.
• We’d like something more precise.
• Define: IQR = Interquartile Range = Q3 – Q1.
• Define: If x < x̄ − 1.5 ⋅ IQR, x is an Outlier.
• Define: If x > x̄ + 1.5 ⋅ IQR, x is an Outlier.
• (Other books might make different definitions)

Outliers Example
• Here’s a quick elementary example:
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20
• Mean x̄ = 6.8 and IQR = 9 – 3 = 6
• Or in the Hawkes method, which keeps the median 6 in both halves, Q1 = 3.5 and Q3 = 8.5, so the interquartile range is 8.5 – 3.5 = 5 (the two methods won’t always agree; this example continues with IQR = 6)

Outliers Example
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20
• We found IQR = 6 and the mean is 6.8
• One definition uses IQR ∗ 1.5 to define outliers
• Here, 6 ∗ 1.5 = 9
• Anything more than 9 units away from the mean is then considered to be abnormally small or large.
• 6.8 – 9 = −2.2, nothing smaller than −2.2
• 6.8 + 9 = 15.8: the 20 is an outlier.

No-Outliers Example
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10
• Mean x̄ = 5.9 and IQR = 9 – 3 = 6 (coincidence that x̄ ≈ IQR, insignificant)
• IQR ∗ 1.5 = 9
• Anything more than 9 units away from the mean is abnormal. 5.9 − 9 = −3.1; 5.9 + 9 = 14.9
• This data set has No Outliers.

Outliers: Good or Bad?
• “I have an outlier in my data set. Should I be concerned?”
  – Could be bad data. A bad measurement. Somebody not being honest with the pollster.
  – Could be legitimately remarkable data, genuine true data that’s extraordinarily high or low.
• “What should I do about it?”
  – The presence of an outlier is shouting for attention. Evaluate it and make an executive decision.
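
The Five Number Summary and the outlier rule used in these slides (more than 1.5 ⋅ IQR away from the mean) can be sketched in Python as below. The function names are hypothetical. The summary assumes the keep-the-median quartile method, and the outlier check takes the IQR as an input rather than computing it, because the by-hand, TI-84, and Excel quartile methods can disagree. Other books define outliers differently, so treat this as one definition among several.

```python
from statistics import mean, median


def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max), keeping a single middle value in both halves."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[n - half:]), data[-1])


def outliers_by_mean_rule(values, iqr):
    """Values more than 1.5 * IQR away from the mean (the rule on these slides)."""
    center = mean(values)
    low, high = center - 1.5 * iqr, center + 1.5 * iqr
    return [v for v in values if v < low or v > high]


twelve = list(range(0, 120, 10))
print(five_number_summary(twelve))

with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20]
no_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10]
# IQR = 6 is the value the slides use for both data sets.
print(outliers_by_mean_rule(with_outlier, 6))
print(outliers_by_mean_rule(no_outlier, 6))
```

The code uses the unrounded mean (about 6.82 rather than 6.8), so its cutoffs differ slightly from the hand-rounded ones, but the conclusion for both data sets is the same.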