Measures of Position
Where does a certain data value fit in relative to the other data values?

Nth Place
• The highest and the lowest
• 2nd highest, 3rd highest, etc.
• “If I made $60,000, I would be 6th richest.”

Another view: “How does my x compare to the mean?”
• “Am I in the middle of the pack?”
• “Am I above or below the middle?”
• “Am I extremely high or extremely low?”
• The z score is the measuring stick

z Score: x is how many standard deviations away from the mean?
If you know the x value
• Population: \( z = \dfrac{x - \mu}{\sigma} \)
• Sample: \( z = \dfrac{x - \bar{x}}{s} \)
To work backward from z to x
• Population: \( x = z \cdot \sigma + \mu \)
• Sample: \( x = z \cdot s + \bar{x} \)

z score is also called “Standard Score”
• No matter what x is measured in or how large or small the x values are:
• The z score of the mean will be 0.
  – Because the numerator \( x - \bar{x} \) turns out to be 0.
• If x is above the mean, its z is positive.
  – Because the numerator \( x - \bar{x} \) turns out to be positive.
• If x is below the mean, its z is negative.
  – Because the numerator \( x - \bar{x} \) turns out to be negative.

z score basics, continued
• Typically round to two decimal places.
  – Don’t say “0.2589”, say “0.26”
• If not two decimal places, pad
  – Don’t say “2”, say “2.00”
  – Don’t say “-1.1”, say “-1.10”
• z scores are almost always in the interval \( -4 < z < 4 \). Be very suspicious if you calculate a z score that’s not a small number.

Practice computing z scores
• What are the z scores for the salary values x = 90000, 70000, 50000, 30000, 10000?
• What are the salaries corresponding to the z scores 0.5, −0.5, 1, −1, 2, −2, 3, −3?
• Helpful necessary information:

Two parallel axes (scales), x and z

z scores can compare unlike values
• Textbook’s example on next slide: they compare test scores on two different tests to ascertain “Which score was the more outstanding of the two?”
• Be careful if the z scores turn out to be negative. Which is the better performance? z = −1.99 or z = −0.34?
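The z score formulas above can be sketched in a few lines of Python. This is only an illustration: the function names `z_score`, `x_from_z`, and `fmt` are made up for this sketch, and the numbers are the test-score values from the textbook example (mean 50, standard deviation 10 for calculus; mean 25, standard deviation 5 for history).

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd


def x_from_z(z, mean, sd):
    """Work backward from a z score to the original x value."""
    return z * sd + mean


def fmt(z):
    """Report a z score with exactly two decimal places (rounds and pads)."""
    return f"{z:.2f}"


# Hypothetical helper names; the data values come from the textbook example.
calculus = z_score(65, 50, 10)
history = z_score(30, 25, 5)

print(fmt(calculus), fmt(history))   # 1.50 1.00
print(x_from_z(calculus, 50, 10))    # 65.0

# With negative z scores, the larger (less negative) one is the better relative position.
print(max(-1.99, -0.34))             # -0.34
```

The same two functions work for a population or a sample; only the mean and standard deviation passed in differ.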
Example 3-29: Test Scores
A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative positions on the two tests.
• Calculus: \( z = \dfrac{X - \bar{X}}{s} = \dfrac{65 - 50}{10} = 1.5 \)
• History: \( z = \dfrac{X - \bar{X}}{s} = \dfrac{30 - 25}{5} = 1.0 \)
She has a higher relative position in the Calculus class.
(Bluman, Chapter 3)

Percentiles
• “What percent of the values are lower than my value?”
  – 90th percentile is pretty high
  – 50th percentile is right in the middle
  – 10th percentile is pretty low
• If you scored in the 99th percentile on your SAT, I hope you got a scholarship.

Given value x, what’s its percentile?
• With these salary values again
• What’s the percentile for a salary of $59,000?
• You can see it’s going to be higher than 50th.

Example: Finding the percentile
• Count b = how many values are below $59,000, and n = how many data values are in the set
• Formula for percentile: \( p = \dfrac{b + 0.5}{n} \cdot 100\% \)
• \( p = \dfrac{15 + 0.5}{20} \cdot 100\% = 77.5\% \)
• Round to the 78th percentile

Excel will find the percentile
• Excel will compute it but slightly differently.
• PERCENTRANK.EXC(cells, value)
• For $59,000 Excel gives 0.74
• It does some fancy “interpolation” to come up with its results

Given percentile, what’s the x value?
• Formula: position from bottom \( c = \dfrac{n \cdot p}{100} \)
  – Again, n = how many data values in the set
  – and p = the percentile rank that’s given.
  – If there’s a decimal remainder, drop it.
  – If it’s an integer, take the average of the c-th and (c + 1)-th values.
• 33rd percentile: \( c = \dfrac{20 \cdot 33}{100} = 6.6 \)
• So we look 6 positions from the bottom
• The value there is $43,546
• Excel: =PERCENTILE.EXC(cells,0.33)=$44,411

Quartiles Q1, Q2, Q3
• Data values are arranged from low to high.
• The Quartiles divide the data into four groups.
• Q2 is just another name for the Median.
• Q1 = the median of the values from the lowest up to Q2
• Q3 = the median of the values from Q2 up to the highest
• It gets tricky, depending on how many values.

Quartiles example
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
• Q2 = median = 50 in the middle.
• Remove it and split into subsets left and right.
• Q1 = median(0, 10, 20, 30, 40) = 20
• Q3 = median(60, 70, 80, 90, 100) = 80

Quartiles example
• 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
• Q2 = median = \( \dfrac{50 + 60}{2} = 55 \) (two middle #s).
• 55 isn’t really there so you can’t remove it!
• Leave the 50 and 60 in place
• Q1 = median(10, 20, 30, 40, 50) = 30
• Q3 = median(60, 70, 80, 90, 100) = 80

Quartiles example
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = median = \( \dfrac{50 + 60}{2} = 55 \) (two middle #s).
• 55 isn’t really there so you can’t remove it!
• Leave the 50 and 60 in place
• Q1 = median(0, 10, 20, 30, 40, 50) = 25
• Q3 = median(60, 70, 80, 90, 100, 110) = 85
• Two middle numbers happened again!

Quartiles with TI-84
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Put values into a TI-84 List
• Use STAT, CALC, 1-Var Stats

Quartiles in Excel
• =QUARTILE.INC(cells, 1 or 2 or 3) seems to give the same results as the old QUARTILE function
• There’s new =QUARTILE.EXC(cells, 1 or 2 or 3)
• Excel does fancy interpolation stuff and may give different Q1 and Q3 answers compared to the TI-84 and our by-hand methods.

Quintiles and Deciles
• You might also encounter
  – Quintiles, dividing data set into 5 groups.
  – Deciles, dividing data set into 10 groups.
• Reconcile everything back with percentiles:
  – Quartiles correspond to percentiles 25, 50, 75
  – Deciles correspond to percentiles 10, 20, …, 90
  – Quintiles correspond to percentiles 20, 40, 60, 80

Interquartile Range and Outliers
• Concept: An OUTLIER is a wacky far-out abnormally small or large data value compared to the rest of the data set.
• We’d like something more precise.
• Define: IQR = Interquartile Range = Q3 − Q1.
• Define: If \( x < \bar{x} - 1.5 \cdot IQR \), x is an Outlier.
• Define: If \( x > \bar{x} + 1.5 \cdot IQR \), x is an Outlier.
• (Other books might make different definitions; a common one measures the 1.5 · IQR distance from Q1 and Q3 instead of from the mean.)

Outliers Example
• Here’s a quick elementary example:
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20
• Mean \( \bar{x} = 6.8 \) and IQR = 9 − 3 = 6
• IQR · 1.5 = 9
• Anything more than 9 units away from the mean is abnormal. 6.8 − 9 = −2.2; 6.8 + 9 = 15.8
• Outlier, Outlier, Pants on Fire.
• The 20 is an outlier.

No-Outliers Example
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10
• Mean \( \bar{x} = 5.9 \) and IQR = 9 − 3 = 6 (coincidence that \( \bar{x} \approx IQR \), insignificant)
• IQR · 1.5 = 9
• Anything more than 9 units away from the mean is abnormal. 5.9 − 9 = −3.1; 5.9 + 9 = 14.9
• This data set has No Outliers.

Outliers: Good or Bad?
• “I have an outlier in my data set. Should I be concerned?”
  – Could be bad data. A bad measurement. Somebody not being honest with the pollster.
  – Could be legitimately remarkable data, genuine true data that’s extraordinarily high or low.
• “What should I do about it?”
  – The presence of an outlier is shouting for attention. Evaluate it and make an executive decision.
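
The by-hand quartile method and the outlier rule from these slides can be checked with a short Python sketch. It assumes the slides’ own definitions (split the sorted data around the median, and flag values more than 1.5 · IQR away from the mean); the function names `quartiles` and `outliers` are invented for this illustration, and Excel or other software may give different Q1 and Q3 values.

```python
from statistics import mean, median


def quartiles(values):
    """By-hand method: Q2 is the median; Q1 and Q3 are the medians of the halves.

    With an odd count the middle value is removed before splitting;
    with an even count the two middle values stay in their halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


def outliers(values):
    """Values more than 1.5 * IQR away from the mean (the rule used in these slides)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    center = mean(values)
    low, high = center - 1.5 * iqr, center + 1.5 * iqr
    return [x for x in values if x < low or x > high]


print(quartiles(range(0, 120, 10)))                     # (25.0, 55.0, 85.0)
print(outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20]))    # [20]
print(outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10]))    # []
```

To try the more common textbook rule instead, the two bounds would become `q1 - 1.5 * iqr` and `q3 + 1.5 * iqr`.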