MEASURES OF CENTRALITY

Last lecture summary
• Mode
• Distribution

Life expectancy data
Minimum = 47.8 (Sierra Leone)
Maximum = 83.4 (Japan)

Life expectancy data, all countries
Median = 73.2 (Egypt, country 99 of 197 in sorted order): half of the countries are larger, half are smaller.

Q1
1st quartile = 64.7 (São Tomé & Príncipe, country 50 of 197, ¼ of the way): ¼ smaller, ¾ larger.

Q3
3rd quartile = 76.7 (Netherlands Antilles, country 148 of 197, ¾ of the way): ¾ smaller, ¼ larger.

Life expectancy data
Maximum = 83.4
3rd quartile = 76.7
Median = 73.2
1st quartile = 64.7
Minimum = 47.8

Box plot
[Figure: box plot, labelled from top to bottom: maximum, 3rd quartile, median, 1st quartile, minimum]

Quartiles, median: how to do it?
Find min, max, median, Q1, Q3 in these data. Then draw the box plot.
79, 68, 88, 69, 90, 74, 87, 93, 76

Another example
78, 93, 68, 84, 90, 74

Min.    1st Qu.  Median  3rd Qu.  Max.
68.00   75.00    81.00   88.50    93.00

Percentiles
[Figure: percentile chart; axis label: age [years]]
http://www.rustovyhormon.cz/on-line-rustove-grafy

Skeleton data
• Estimate age at death from skeletal remains
• Common problem in forensic anthropology
• Based on wear and deterioration of certain bones
• Measurements on 400 skeletons
• Two estimation methods
  • DiGangi et al., aspects of the first rib
  • Suchey-Brooks, most common, pubic bone
• 400 skeletons, the estimated and the actual age at death
Image: http://www.bestcoloringpagesforkids.com/wp-content/uploads/2013/07/Skeleton-Coloring-Page.gif

DiGangi: modified boxplot

Min.     Q1       Median   Q3      Max.
-60.00   -23.00   -13.00   -5.00   32.00

Mean
• Mathematical notation: Σ … Greek letter capital sigma, means SUM in mathematics
• Another measure of the center of the data: mean (average)
• mean = (sum of data values) / (# of data values)
• Data values: $x_1, x_2, \ldots, x_n$
• mean = $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Robust statistic
Median = -13
Mean = -14.2
The mean is not a robust statistic. The median is a robust statistic.

Trimmed mean
Median = -13
Mean = -14.2
10% trimmed mean … eliminate the upper and the lower 10% of the data (i.e. 40 points at each end).
10% trimmed mean = mean of the 320 middle data values = -13.8
The trimmed mean is more robust than the mean.

Salaries of 25 New York Red Bulls players (soccer) in 2012
33 750, 33 750, 33 750, 33 750, 44 000, 44 000, 44 000, 44 000, 45 566, 65 000, 95 000, 103 500, 112 495, 138 188, 141 666, 181 500, 185 000, 190 000, 194 375, 195 000, 205 000, 292 500, 301 999, 4 600 000, 5 600 000
median = 112 495
mean = 518 311
8% trimmed mean = 128 109

MEASURES OF VARIABILITY

Setting the mood

QUESTION
[Figure: two distributions, labelled Mean1, Mean2, Mode1, Mode2, Median1, Median2]

Range
Range = MAX - min

Range
The range changes when we add new data to the dataset:
• Always
• Sometimes
• Never

Adding Mark Zuckerberg

Cut off data

Interquartile range, IQR
Let's take this quiz. Answer yes or no.
1. About 50% of the data fall within the IQR.
2. The IQR is affected by every value in the data set.
3. The IQR is not affected by outliers.
4. The mean is always between Q1 and Q3.

0 1 1 1 2 2 2 2 2 3 3 3 90
Q1 = 1, Q2, Q3 = 3

Define outlier
Sample: $38,946, $43,420, $49,191, $50,430, $50,557, $52,580, $53,595, $54,135, $60,181, $10,000,000
Outlier < Q1 − 1.5 × IQR OR Outlier > Q3 + 1.5 × IQR
What values are outliers for this data set?
1. $60,000
2. $80,000
3. $100,000
4. $200,000

Problem with IQR
[Figure: three distributions: normal, bimodal, uniform]

Options for measuring variability
• Find the average distance between all pairs of data values.
• Find the average distance between each data value and either the max or the min.
• Find the average distance between each data value and the mean.

Average distance from mean

Sample   Deviation from mean $(x_i - \bar{x})$
10        4
5        -1
3        -3
2        -4
19       13
1        -5
7         1
11        5
1        -5
1        -5

$\sum (x_i - \bar{x}) = 0$

Find the average distance between each data value and the mean.

Preventing cancellation
• How can we prevent the negative and positive deviations from cancelling each other out?
1. Ignore (i.e. delete) the negative sign.
2. Multiply each deviation by two.
3. Square each deviation.
4. Take absolute value of each deviation.
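
The deviation table and the cancellation question above can be checked numerically. The following is a minimal sketch in Python, using the ten sample values from the "Average distance from mean" slide. The variable names are illustrative, and dividing by n (rather than n − 1) is an assumption made here to match the slide's wording "average distance"; the slides do not say which divisor they intend.

```python
# Sample from the "Average distance from mean" slide.
sample = [10, 5, 3, 2, 19, 1, 7, 11, 1, 1]

n = len(sample)
mean = sum(sample) / n                      # 60 / 10 = 6.0
deviations = [x - mean for x in sample]     # x_i - mean for each value

# Raw deviations cancel: positives and negatives sum to zero.
sum_dev = sum(deviations)

# Multiplying by two only rescales the deviations; they still cancel.
sum_doubled = sum(2 * d for d in deviations)

# Dropping the sign (absolute value) prevents cancellation.
# Assumption: divide by n, not n - 1.
mean_abs_dev = sum(abs(d) for d in deviations) / n

# Squaring each deviation also prevents cancellation.
mean_sq_dev = sum(d ** 2 for d in deviations) / n

print("mean:", mean)                              # 6.0
print("sum of deviations:", sum_dev)              # 0.0
print("sum of doubled deviations:", sum_doubled)  # 0.0
print("mean absolute deviation:", mean_abs_dev)   # 4.6
print("mean squared deviation:", mean_sq_dev)     # 31.2
```

Ignoring the negative sign and taking the absolute value are the same operation, so both give 4.6 here. Squaring gives a result in squared units (31.2), which is why a square root is commonly taken afterwards.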