Sierra Leone - LICH

MEASURES OF CENTRALITY
Last lecture summary
• Mode
• Distribution
Life expectancy data
Minimum = 47.8 (Sierra Leone)
Maximum = 83.4 (Japan)
Life expectancy data
all countries
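Life expectancy data
Median = 73.2 (Egypt), country no. 99 of the 197 countries sorted by life expectancy
Half of the countries have a smaller value, half a larger one.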
Life expectancy data
Maximum = 83.4
Median = 73.2
Minimum = 47.8
Q1
1st quartile = 64.7 (Sao Tomé & Príncipe), position 50 of 197 (¼ of the way)
¼ of the countries have a smaller value, ¾ a larger one.
Q3
3rd quartile = 76.7 (Netherlands Antilles), position 148 of 197 (¾ of the way)
¾ of the countries have a smaller value, ¼ a larger one.
Life expectancy data
Maximum = 83.4
3rd quartile = 76.7
Median = 73.2
1st quartile = 64.7
Minimum = 47.8
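The positions 50, 99 and 148 on the slides match one common convention. The slides do not name it, so treat this as an assumption. Under that convention, the p-th quantile of n sorted values sits at position 1 + p(n - 1). For the 197 countries:

```latex
\[
\text{pos}(p) = 1 + p\,(n - 1), \qquad n = 197
\]
\[
\text{pos}(0.25) = 1 + 0.25 \cdot 196 = 50, \quad
\text{pos}(0.50) = 1 + 0.50 \cdot 196 = 99, \quad
\text{pos}(0.75) = 1 + 0.75 \cdot 196 = 148
\]
```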
Box plot
[Figure: box plot marking the minimum, 1st quartile, median, 3rd quartile, and maximum]
Quartiles, median – how to do it?
Find min, max, median, Q1, Q3 in these data.
Then draw the box plot.
79, 68, 88, 69, 90, 74, 87, 93, 76
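Below is a minimal Python sketch of these steps. It assumes the quartile convention that interpolates linearly at position 1 + p(n - 1) in the sorted data. This is what Python's statistics.quantiles(..., method="inclusive") does, and it is also R's default. The name five_number_summary is only an illustrative choice.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Quartiles come from statistics.quantiles(method="inclusive"), i.e. linear
    interpolation at 1-based position 1 + p*(n - 1) in the sorted data.
    Other textbook conventions can give slightly different Q1 and Q3.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

data = [79, 68, 88, 69, 90, 74, 87, 93, 76]
print(five_number_summary(data))  # (68, 74.0, 79.0, 88.0, 93)
```

By hand, many textbooks instead take Q1 and Q3 as the medians of the lower and upper halves, leaving out the overall median when n is odd. For these nine values that gives Q1 = 71.5 and Q3 = 89. The answer therefore depends on the convention used.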
Another example
78, 93, 68, 84, 90, 74
   Min. 1st Qu.  Median 3rd Qu.    Max.
  68.00   75.00   81.00   88.50   93.00
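The output above has the layout of R's summary(). R's default quantile rule (type 7) interpolates between neighbouring sorted values, and that explains why Q1 = 75 here rather than one of the data values. The sketch below reproduces the five numbers by hand. The helper name quantile_type7 is hypothetical.

```python
def quantile_type7(values, p):
    """Quantile by linear interpolation at 1-based position 1 + p*(n - 1).

    This matches R's default quantile type (type 7).
    """
    data = sorted(values)
    pos = p * (len(data) - 1)   # 0-based fractional position in the sorted data
    lower = int(pos)            # index of the sorted value just below pos
    frac = pos - lower          # fraction of the way toward the next value
    if lower + 1 < len(data):
        return data[lower] + frac * (data[lower + 1] - data[lower])
    return float(data[lower])   # p == 1: the maximum

data = [78, 93, 68, 84, 90, 74]
for label, p in [("Min.", 0), ("1st Qu.", 0.25), ("Median", 0.5),
                 ("3rd Qu.", 0.75), ("Max.", 1)]:
    print(f"{label:>8} {quantile_type7(data, p):.2f}")
```

For Q1 the sorted data are 68, 74, 78, 84, 90, 93. The 0-based position is 0.25 × 5 = 1.25, so the value is 74 + 0.25 × (78 - 74) = 75.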
Percentiles
[Figure: percentile growth chart; x-axis: age (years)]
http://www.rustovyhormon.cz/on-line-rustove-grafy
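Percentiles generalize quartiles. Roughly k % of the data lie below the k-th percentile, so Q1, the median and Q3 are the 25th, 50th and 75th percentiles. A growth chart is read the other way around: you find which percentile a given measurement falls at. The sketch below assumes the same interpolation convention as the R output earlier in this lecture. The functions percentile and percentile_rank are hypothetical helpers, and "share of values strictly below x" is only one simple definition of percentile rank.

```python
import statistics

def percentile(values, k):
    """k-th percentile (1 <= k <= 99) via statistics.quantiles(method="inclusive")."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    cut_points = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    return cut_points[k - 1]

def percentile_rank(values, x):
    """Percentage of values strictly below x (one simple definition)."""
    return 100 * sum(v < x for v in values) / len(values)

data = [78, 93, 68, 84, 90, 74]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))  # 75.0 81.0 88.5
print(percentile_rank(data, 84))  # 50.0
```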
Skeleton data
• Estimate age at death from skeletal remains
• Common problem in forensic anthropology
• Based on wear and deterioration of certain bones
• Measurements on 400 skeletons
• Two estimation methods
• Di Gangi et al., aspects of the first rib
• Suchey-Brooks, most common, pubic bone
http://www.bestcoloringpagesforkids.com/wp-content/uploads/2013/07/Skeleton-Coloring-Page.gif
• 400 skeletons, the estimated and the actual age of death
DiGangi method: modified box plot
    Min.      Q1  Median      Q3    Max.
  -60.00  -23.00  -13.00   -5.00   32.00
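Assume the modified box plot uses the common 1.5 × IQR fences, the same rule this lecture later uses to define outliers. Under that assumption, the summary above already shows that both the minimum and the maximum are drawn as separate points. A minimal check:

```python
# Summary values of the DiGangi data, copied from the table above.
q1, q3 = -23.0, -5.0
minimum, maximum = -60.0, 32.0

iqr = q3 - q1                  # 18.0
lower_fence = q1 - 1.5 * iqr   # -50.0
upper_fence = q3 + 1.5 * iqr   # 22.0

# A modified box plot draws the whiskers only to the most extreme values
# inside the fences and marks anything beyond them as a separate point.
print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("min beyond lower fence:", minimum < lower_fence)  # True
print("max beyond upper fence:", maximum > upper_fence)  # True
```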
Mean
• Mathematical notation: Σ (Greek capital letter sigma) means SUM in mathematics
• Another measure of the center of the data: the mean (average)
• $\text{mean} = \dfrac{\sum \text{data values}}{\#\text{ of data values}}$
• Data values: $x_1, x_2, \ldots, x_n$
• $\text{mean} = \bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
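The formula translates directly into code: add up the values and divide by how many there are. The sketch uses the ten-value sample that appears later in this lecture. The function name mean is illustrative, and Python's statistics.mean gives the same result.

```python
def mean(values):
    """Arithmetic mean: the sum of the data values divided by their count."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

sample = [10, 5, 3, 2, 19, 1, 7, 11, 1, 1]
print(mean(sample))  # 6.0
```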
Robust statistic
Median = -13
Mean = -14.2
Mean is not a robust statistic: a few extreme values can pull it far from the center.
Median is a robust statistic: it is hardly affected by extreme values.
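To see the difference in numbers, take the ten-value sample used later in this lecture and, hypothetically, change its largest value from 19 to 190. The mean jumps from 6 to 23.1, while the median stays at 4.

```python
import statistics

sample = [10, 5, 3, 2, 19, 1, 7, 11, 1, 1]
# Hypothetical variant: the same sample with one value made extreme (19 -> 190).
with_outlier = [10, 5, 3, 2, 190, 1, 7, 11, 1, 1]

for name, data in [("original", sample), ("with outlier", with_outlier)]:
    print(f"{name:>12}: mean = {statistics.mean(data)}, "
          f"median = {statistics.median(data)}")
```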
Trimmed mean
Median = -13
Mean = -14.2
10% trimmed mean: eliminate the lowest and the highest 10% of the data (i.e. 40 points at each end).
10% trimmed mean = mean of the 320 middle data values = -13.8
Trimmed mean is more robust than the mean.
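As a formula, the trimmed mean drops k values from each end and averages the rest. The notation here is assumed: $x_{(1)} \le \dots \le x_{(n)}$ are the sorted data values and $\alpha$ is the trimming proportion. Rounding $k$ down is one common convention. For the 400 skeletons with $\alpha = 0.10$, it reproduces the 320 values on the slide:

```latex
\[
\bar{x}_{\alpha} = \frac{1}{n - 2k} \sum_{i=k+1}^{n-k} x_{(i)},
\qquad k = \lfloor \alpha n \rfloor
\]
\[
n = 400,\ \alpha = 0.10:\quad k = 40,\quad n - 2k = 320
\]
```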
Salaries of 25 players of the soccer team NY Red Bulls in 2012:
33 750, 33 750, 33 750, 33 750, 44 000, 44 000, 44 000, 44 000, 45 566, 65 000, 95 000, 103 500, 112 495, 138 188, 141 666, 181 500, 185 000, 190 000, 194 375, 195 000, 205 000, 292 500, 301 999, 4 600 000, 5 600 000
median = 112 495
mean = 518 311
8% trimmed mean = 128 109 (8% of 25 = 2 players removed at each end)
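The three numbers above can be reproduced with the sketch below. trimmed_mean is a hypothetical helper that drops int(proportion × n) values from each end. If SciPy is available, scipy.stats.trim_mean(salaries, 0.08) should give the same value, because it also cuts conservatively.

```python
import statistics

# Salaries of the 25 NY Red Bulls players in 2012, from the list above.
salaries = [
    33_750, 33_750, 33_750, 33_750, 44_000, 44_000, 44_000, 44_000,
    45_566, 65_000, 95_000, 103_500, 112_495, 138_188, 141_666, 181_500,
    185_000, 190_000, 194_375, 195_000, 205_000, 292_500, 301_999,
    4_600_000, 5_600_000,
]

def trimmed_mean(values, proportion):
    """Mean after dropping int(proportion * n) values from each end of the sorted data."""
    data = sorted(values)
    k = int(proportion * len(data))   # values dropped at each end (rounded down)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

print("median          =", statistics.median(salaries))
print("mean            =", statistics.mean(salaries))
print("8% trimmed mean =", trimmed_mean(salaries, 0.08))
```

The exact mean is 518 311.56; the slide shows it truncated to 518 311. The two top salaries alone account for most of the gap between the mean and the median.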
MEASURES OF VARIABILITY
Setting the scene
QUESTION
[Figure: two distributions, each marked with its mean, mode, and median (Mean1/Mean2, Mode1/Mode2, Median1/Median2)]
Range = max - min
Range
Does the range change when we add new data to a dataset?
• Always
• Sometimes
• Never
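The quiz can be checked directly. The sketch uses the ten-value sample from later in this lecture, and the added values 8 and 100 are hypothetical. A new value inside the current [min, max] leaves the range unchanged. A value outside it moves the min or the max and so changes the range.

```python
def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)

sample = [10, 5, 3, 2, 19, 1, 7, 11, 1, 1]
print(data_range(sample))          # 18
print(data_range(sample + [8]))    # 18: 8 lies between min and max
print(data_range(sample + [100]))  # 99: 100 becomes the new maximum
```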
Adding Mark Zuckerberg
Cut off data
Interquartile range, IQR = Q3 - Q1
Let's take this quiz: answer yes or no.
1. About 50% of the data fall within the IQR.
2. The IQR is affected by every value in the data set.
3. The IQR is not affected by outliers.
4. The mean is always between Q1 and Q3.
0 1 1 1 2 2 2 2 2 3 3 3 90
Q1 = 1, Q2 = 2 (median), Q3 = 3
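A quick numeric check on the data above. It assumes the same interpolating quartile rule as earlier in the lecture; the median-of-halves method gives the same quartiles here. The single value 90 pulls the mean up to about 8.6, well above Q3 = 3.

```python
import statistics

data = [0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 90]
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
avg = statistics.mean(data)

print(q1, q2, q3)        # 1.0 2.0 3.0
print(round(avg, 2))     # 8.62
print(q1 <= avg <= q3)   # False: the mean lies above Q3
```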
Define outlier
Sample
$38,946
$43,420
$49,191
$50,430
$50,557
$52,580
$53,595
$54,135
$60,181
$10,000,000
$x$ is an outlier if $x < Q_1 - 1.5 \times \mathrm{IQR}$
OR
$x > Q_3 + 1.5 \times \mathrm{IQR}$
What values are outliers for this
data set?
1. $60,000
2. $80,000
3. $100,000
4. $200,000
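The 1.5 × IQR rule can be applied to the sample with the sketch below. It assumes the interpolating quartile rule used by statistics.quantiles(method="inclusive"), which gives Q1 = 49 500.75 and Q3 = 54 000. The median-of-halves method gives Q1 = 49 191 and Q3 = 54 135, with fences 41 775 and 61 551. Both conventions classify the four candidate values the same way. The is_outlier name is illustrative.

```python
import statistics

sample = [38_946, 43_420, 49_191, 50_430, 50_557,
          52_580, 53_595, 54_135, 60_181, 10_000_000]

q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(x):
    """Apply the 1.5 x IQR rule using the fences computed from the sample."""
    return x < lower_fence or x > upper_fence

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} and {upper_fence}")
for candidate in [60_000, 80_000, 100_000, 200_000]:
    print(candidate, is_outlier(candidate))
```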
Problem with IQR
[Figure: normal, bimodal, and uniform distributions]
Options for measuring variability
• Find the average distance between all pairs of data
values.
• Find the average distance between each data value and
either the max or the min.
• Find the average distance between each data value and
the mean.
Average distance from mean
Sample mean: $\bar{x} = 60 / 10 = 6$
Sample  Deviation from mean ($x_i - \bar{x}$)
    10     4
     5    -1
     3    -3
     2    -4
    19    13
     1    -5
     7     1
    11     5
     1    -5
     1    -5
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
Find the average distance between
each data value and the mean.
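A direct attempt shows why averaging the raw deviations fails. The positive and negative deviations cancel, so the average is always 0. In general, $\sum (x_i - \bar{x}) = \sum x_i - n\bar{x} = 0$ by the definition of $\bar{x}$. A minimal sketch on the sample above:

```python
sample = [10, 5, 3, 2, 19, 1, 7, 11, 1, 1]
x_bar = sum(sample) / len(sample)          # 6.0
deviations = [x - x_bar for x in sample]   # 4.0, -1.0, -3.0, ...

print(deviations)
print(sum(deviations))                     # 0.0
print(sum(deviations) / len(deviations))   # 0.0
```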
Preventing cancellation
• How can we prevent the negative and positive deviations
from cancelling each out?
1. Ignore (i.e. delete) the negative sign.
2. Multiply each deviation by two.
3. Square each deviation.
4. Take the absolute value of each deviation.
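The sketch below tries the four options on the same sample. Doubling still cancels to 0. Deleting the sign and taking the absolute value are the same operation, and both give an average distance of 4.6. Squaring gives an average squared deviation of 31.2, which is in squared units of the data.

```python
sample = [10, 5, 3, 2, 19, 1, 7, 11, 1, 1]
x_bar = sum(sample) / len(sample)
deviations = [x - x_bar for x in sample]
n = len(deviations)

doubled = sum(2 * d for d in deviations) / n     # option 2: still cancels
absolute = sum(abs(d) for d in deviations) / n   # options 1 and 4: same result
squared = sum(d ** 2 for d in deviations) / n    # option 3

print("doubled: ", doubled)    # 0.0
print("absolute:", absolute)   # 4.6
print("squared: ", squared)    # 31.2
```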