3.3 Position

Measures of Position
Where does a certain data value fit in
relative to the other data values?
Nth Place
• The highest and the lowest
• 2nd highest, 3rd highest, etc.
• “If I made $60,000, I would be 6th richest.”
Another view: “How does my 𝑥
compare to the mean?”
• “Am I in the middle of the pack?”
• “Am I above or below the middle?”
• “Am I extremely high or extremely low?”
• 𝑧 Score is the measuring stick
𝒛 Score: 𝑥 is how many standard
deviations away from the mean?
If you know the 𝑥 value
• Population: $z = \frac{x - \bar{x}}{\sigma}$
• Sample: $z = \frac{x - \bar{x}}{s}$
To work backward from 𝑧 to 𝑥
• Population: $x = z \cdot \sigma + \bar{x}$
• Sample: $x = z \cdot s + \bar{x}$
𝑧 score is also called “Standard Score”
• No matter what 𝑥 is measured in or how large
or small the 𝑥 values are….
• The 𝑧 score of the mean will be 0
– Because numerator $x - \bar{x}$ turns out to be 0.
• If 𝑥 is above the mean, its 𝑧 is positive.
– Because numerator $x - \bar{x}$ turns out to be positive
• If 𝑥 is below the mean, its 𝑧 is negative.
– Because numerator $x - \bar{x}$ turns out to be negative
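The formulas and sign rules above can be checked with a short Python sketch. The function names `z_score` and `x_from_z`, and the mean of 100 and standard deviation of 15, are made up for illustration and do not come from the slides.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd


def x_from_z(z, mean, sd):
    """Work backward from a z score to the original x value."""
    return z * sd + mean


# Hypothetical mean and standard deviation, chosen only for illustration.
mean, sd = 100, 15

at_mean = z_score(100, mean, sd)  # numerator is 0, so z is 0
above = z_score(130, mean, sd)    # x above the mean, so z is positive
below = z_score(85, mean, sd)     # x below the mean, so z is negative
back = x_from_z(above, mean, sd)  # recovers the x that produced `above`

print(at_mean, above, below, back)
```

The same two functions work for a population or a sample; only the standard deviation passed in (𝜎 or 𝑠) changes.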
𝑧 score basics, continued
• Typically round to two decimal places.
– Don’t say “0.2589”, say “0.26”
• If there are fewer than two decimal places, pad with zeros
– Don’t say “2”, say “2.00”
– Don’t say “-1.1”, say “-1.10”
• 𝑧 scores are almost always in the interval
−4 < 𝑧 < 4. Be very suspicious if you
calculate a 𝑧 score outside that range.
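A small Python sketch of these reporting habits, assuming hypothetical helper names `format_z` and `looks_suspicious` (neither is from the slides):

```python
def format_z(z):
    """Round a z score to two decimal places, padding with zeros."""
    return f"{z:.2f}"


def looks_suspicious(z):
    """True when z falls outside the usual interval -4 < z < 4."""
    return not (-4 < z < 4)


# The three values used on the slide.
formatted = [format_z(0.2589), format_z(2), format_z(-1.1)]
print(formatted)

# A made-up z score, large enough to deserve a second look.
print(looks_suspicious(7.3))
```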
Practice computing z scores
• What are the 𝑧 scores for the salary values
𝑥 = 90000, 70000, 50000, 30000, 10000?
• What are the salaries corresponding to the 𝑧
scores 0.5, −0.5, 1, −1, 2, −2, 3, −3?
• Helpful necessary information:
Two parallel axes (scales), 𝑥 and 𝑧
𝑧 scores can compare unlike values
• Textbook’s example on next slide – they
compare test scores on two different tests to
ascertain “Which score was the more
outstanding of the two?”
• Be careful if the 𝑧 scores turn out to be
negative. Which is the better performance?
𝑧 = −1.99 or 𝑧 = −0.34 ?
Example 3-29: Test Scores
A student scored 65 on a calculus test that had a
mean of 50 and a standard deviation of 10; she scored
30 on a history test with a mean of 25 and a standard
deviation of 5. Compare her relative positions on the
two tests.
Calculus: $z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$
History: $z = \frac{X - \bar{X}}{s} = \frac{30 - 25}{5} = 1.0$
She has a higher relative position in the Calculus class.
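The comparison in Example 3-29 can be reproduced in a few lines of Python. The numbers are the ones given in the example; the function name `z_score` is only an illustrative choice.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd


calculus_z = z_score(65, 50, 10)  # score 65, mean 50, standard deviation 10
history_z = z_score(30, 25, 5)    # score 30, mean 25, standard deviation 5

# The larger z score marks the higher relative position.
better = "Calculus" if calculus_z > history_z else "History"
print(calculus_z, history_z, better)
```

The same "larger 𝑧 wins" comparison holds when both scores are negative: −0.34 is larger than −1.99, so it is the better performance.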
Bluman, Chapter 3
Percentiles
• “What percent of the values are lower than
my value?”
– 90th percentile is pretty high
– 50th percentile is right in the middle
– 10th percentile is pretty low
• If you scored in the 99th percentile on your
SAT, I hope you got a scholarship.
Given value 𝑥, what’s its percentile?
• With these salary values again
• What’s the percentile for a salary of $59,000?
• You can see it’s going to be higher than 50th.
Example: Finding the percentile
• Count 𝑘 = how many values are below $59,000
• Formula for percentile: $p = \frac{k + 0.5}{n} \cdot 100\%$
• $p = \frac{15 + 0.5}{20} \cdot 100\% = 77.5$
• 78th percentile
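Here is a minimal Python sketch of the percentile formula. The slide's salary list is not reproduced in the text, so the sketch uses the slide's counts (𝑘 = 15, 𝑛 = 20) directly, plus a tiny made-up list to show the counting step. The function names are hypothetical.

```python
def percentile_from_count(k, n):
    """Percentile from k values below x in a data set of n values."""
    return (k + 0.5) * 100 / n


def percentile_of(values, x):
    """Count the values below x, then apply the same formula."""
    k = sum(1 for v in values if v < x)
    return percentile_from_count(k, len(values))


# The slide's numbers: 15 of the 20 salaries are below $59,000.
slide_p = percentile_from_count(15, 20)

# Made-up data, only to show the counting step: 3 of 5 values are below 40.
toy_p = percentile_of([10, 20, 30, 40, 50], 40)

print(slide_p, toy_p)
```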
Excel will find the percentile
• Excel will compute it but slightly differently.
• PERCENTRANK.EXC(cells, value)
• For $59,000 Excel gives 0.74
• It does some fancy “interpolation” to come up with its results
Given Percentile, what’s 𝑥 value?
• Formula: position from bottom $c = \frac{n \cdot p}{100}$
– Again, 𝑛 = how many data values in the set
– and 𝑝 = the percentile rank that’s given.
– If there’s a decimal remainder, drop it.
– If it’s integer, take average of 𝑐 th and (𝑐 + 1)th.
• 33rd percentile: $c = \frac{20 \cdot 33}{100} = 6.6$
• So we look 6 positions from the bottom
Given percentile, find π‘₯ (continued)
• The value at that position is $43,546
• Excel: =PERCENTILE.EXC(cells, 0.33) gives $44,411
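The by-hand rule can be sketched in Python as follows. This follows the rule exactly as the slide words it (drop the decimal remainder) and reads "𝑐 positions from the bottom" as the 𝑐-th smallest value, which is an assumption. Some textbooks round 𝑐 up instead, which would select the next value. The function name and the data (10, 20, …, 200) are made up, since the slide's salary list is not in the text.

```python
def value_at_percentile(values, p):
    """Value at percentile p, using the slide's position rule."""
    data = sorted(values)
    n = len(data)
    c, remainder = divmod(n * p, 100)  # c = n*p/100, split into whole part and leftover
    c = int(c)
    if c < 1 or c >= n:
        raise ValueError("percentile is too extreme for this many values")
    if remainder == 0:
        # c is an integer: average the c-th and (c+1)-th values (1-based).
        return (data[c - 1] + data[c]) / 2
    # Otherwise the decimal part is dropped and the c-th value is used.
    return data[c - 1]


# Made-up data set of 20 values: 10, 20, ..., 200.
data = list(range(10, 201, 10))

p33 = value_at_percentile(data, 33)  # c = 6.6, so look at position 6
p25 = value_at_percentile(data, 25)  # c = 5 exactly, so average positions 5 and 6

print(p33, p25)
```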
Quartiles Q1, Q2, Q3
• Data values are arranged from low to high.
• The Quartiles divide the data into four groups.
• Q2 is just another name for the Median.
• Q1 = Find the Median of Lowest to Q2 values
• Q3 = Find the Median of Q2 to Highest values
• It gets tricky, depending on how many values.
Quartiles example
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
• Q2 = median = 50 in the middle.
• Remove it and split into subsets left and right.
• Q1 = median(0, 10, 20, 30, 40) = 20
• Q3 = median(60, 70, 80, 90, 100) = 80
Quartiles example
• 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
• Q2 = median = $\frac{50 + 60}{2} = 55$ (two middle #s).
• 55 isn’t really there so you can’t remove it!
• Leave the 50 and 60 in place
• Q1 = median(10, 20, 30, 40, 50) = 30
• Q3 = median(60, 70, 80, 90, 100) = 80
Quartiles example
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Q2 = median = $\frac{50 + 60}{2} = 55$ (two middle #s).
• 55 isn’t really there so you can’t remove it!
• Leave the 50 and 60 in place
• Q1 = median(0, 10, 20, 30, 40, 50) = 25
• Q3 = median(60, 70, 80, 90, 100, 110) = 85
• Two middle numbers happened again!
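The by-hand method from the three quartile examples can be sketched in Python. The function name `quartiles_by_hand` is a hypothetical choice; the three data sets are the ones from the slides.

```python
from statistics import median


def quartiles_by_hand(values):
    """Q1, Q2, Q3 using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q2 = median(data)
    lower = data[:half]
    # Odd count: the median is a real data value, so remove it.
    # Even count: the median is not in the data, so nothing is removed.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), q2, median(upper)


eleven = quartiles_by_hand(range(0, 101, 10))   # 0, 10, ..., 100
ten = quartiles_by_hand(range(10, 101, 10))     # 10, 20, ..., 100
twelve = quartiles_by_hand(range(0, 111, 10))   # 0, 10, ..., 110

print(eleven, ten, twelve)
```

The `n % 2` check is the whole "it gets tricky" part: it decides whether the median is removed before splitting.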
Quartiles with TI-84
• 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110
• Put values into a TI-84 List
• Use STAT, CALC, 1-Var Stats
Quartiles in Excel
• =QUARTILE.INC(cells, 1 or 2 or 3) seems to
give the same results as the old QUARTILE
function
• There’s new =QUARTILE.EXC(cells, 1 or 2 or 3)
• Excel does fancy interpolation stuff and may
give different Q1 and Q3 answers compared to
the TI-84 and our by-hand methods.
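To see how interpolation can move Q1 and Q3, here is a sketch using Python's standard `statistics.quantiles`, which offers `"inclusive"` and `"exclusive"` methods. Treating these as counterparts of Excel's .INC and .EXC functions is an assumption, not something the slides state. The data is the 12-value set from the quartile examples.

```python
from statistics import quantiles

data = list(range(0, 111, 10))  # 0, 10, ..., 110

# n=4 asks for the three cut points that split the data into four groups.
inclusive = quantiles(data, n=4, method="inclusive")
exclusive = quantiles(data, n=4, method="exclusive")

print(inclusive)
print(exclusive)
```

Both methods agree with the by-hand Q2 of 55, but their Q1 and Q3 differ from each other and from the by-hand answers of 25 and 85.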
Quintiles and Deciles
• You might also encounter
– Quintiles, dividing data set into 5 groups.
– Deciles, dividing data set into 10 groups.
• Reconcile everything back with percentiles:
– Quartiles correspond to percentiles 25, 50, 75
– Deciles correspond to percentiles 10, 20, …, 90
– Quintiles correspond to percentiles 20, 40, 60, 80
Interquartile Range and Outliers
• Concept: An OUTLIER is a wacky far-out
abnormally small or large data value
compared to the rest of the data set.
• We’d like something more precise.
• Define: IQR = Interquartile Range = Q3 – Q1.
• Define: If $x < \bar{x} - 1.5 \cdot IQR$, 𝑥 is an Outlier.
• Define: If $x > \bar{x} + 1.5 \cdot IQR$, 𝑥 is an Outlier.
• (Other books might make different definitions)
Outliers Example
• Here’s a quick elementary example:
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20
• Mean $\bar{x}$ = 6.8 and 𝐼𝑄𝑅 = 9 – 3 = 6
• 𝐼𝑄𝑅 ∗ 1.5 = 9
• Anything more than 9 units away from $\bar{x}$ is
abnormal. Outlier, Outlier, Pants on Fire.
• The 20 is an outlier.
No-Outliers Example
• Data values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10
• Mean $\bar{x}$ = 5.9 and 𝐼𝑄𝑅 = 9 – 3 = 6
(coincidence that $\bar{x} \approx IQR$, insignificant)
• 𝐼𝑄𝑅 ∗ 1.5 = 9
• Anything more than 9 units away from $\bar{x}$ is
abnormal. 5.9 − 9 = −3.1; 5.9 + 9 = 14.9
• This data set has No Outliers.
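Both outlier examples can be checked with a Python sketch of the definition used in these slides, which measures 1.5 ∙ 𝐼𝑄𝑅 from the mean. Other books measure from Q1 and Q3 instead and can give different answers. The function name `find_outliers` is hypothetical, and the quartiles use the by-hand median-of-halves method.

```python
from statistics import mean, median


def find_outliers(values):
    """Values more than 1.5 * IQR away from the mean (this deck's definition)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    # Remove the median only when it is an actual data value (odd count).
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    reach = 1.5 * (q3 - q1)
    center = mean(data)
    return [x for x in data if x < center - reach or x > center + reach]


with_outlier = find_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20])
no_outlier = find_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10])

print(with_outlier, no_outlier)
```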
Outliers: Good or Bad?
• “I have an outlier in my data set.
Should I be concerned?”
– Could be bad data. A bad measurement.
Somebody not being honest with the pollster.
– Could be legitimately remarkable data, genuine
true data that’s extraordinarily high or low.
• “What should I do about it?”
– The presence of an outlier is shouting for
attention. Evaluate it and make an executive
decision.