Course: A0392 - Statistik Ekonomi (Economic Statistics)
Year: 2006
Meeting 02: Descriptive Numerical Measures

Outline
• Measures of central tendency
• Measures of variation
• Measures of position (location)

Basic Business Statistics: Numerical Descriptive Measures

Chapter Topics
• Measures of central tendency: mean, median, mode, geometric mean
• Quartiles
• Measures of variation: range, interquartile range, variance and standard deviation, coefficient of variation
• Shape: symmetric, skewed, using box-and-whisker plots
• The empirical rule and the Bienaymé-Chebyshev rule
• Coefficient of correlation
• Pitfalls in numerical descriptive measures and ethical issues

Summary Measures
• Central tendency: mean, median, mode, geometric mean
• Quartile
• Variation: range, variance, standard deviation, coefficient of variation

Measures of Central Tendency
• Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ (sample), $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ (population)
• Median
• Mode
• Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$

Mean (Arithmetic Mean)
• Mean (arithmetic mean) of data values
– Sample mean, with sample size n:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
– Population mean, with population size N:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

Mean (Arithmetic Mean) (continued)
• The most common measure of central tendency
• Affected by extreme values (outliers)
[Figure: two dot plots; the first data set has mean = 5, the second, which includes an extreme value, has mean = 6]

Mean (Arithmetic Mean) (continued)
• Approximating the arithmetic mean
– Used when raw data are not available
$$\bar{X} \approx \frac{\sum_{j=1}^{c} m_j f_j}{n}$$
where n = sample size, c = number of classes in the frequency distribution, $m_j$ = midpoint of the jth class, $f_j$ = frequency of the jth class.

Median
• Robust measure of central tendency
• Not affected by extreme values
[Figure: the same two dot plots; the median is 5 in both, even though the second data set includes an extreme value]
• In an ordered array, the median is the "middle" number
– If n or N is odd, the median is the middle number
– If n or N is even, the median is the average of the two middle numbers

Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• There may not be a mode
• There may be several modes
• Used for either numerical or categorical data
[Figure: two dot plots; the first data set (axis 0 to 14) has mode = 9, the second (axis 0 to 6) has no mode]

Geometric Mean
• Useful in measuring the rate of change of a variable over time
$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$
• Geometric mean rate of return
– Measures the status of an investment over time
$$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$$

Example
An investment of $100,000 declined to $50,000 at the end of year one and rebounded to $100,000 at the end of year two:
$R_1 = -0.5$ (or -50%), $R_2 = 1$ (or 100%)
Average rate of return:
$$\bar{R} = \frac{(-0.5) + (1)}{2} = 0.25 \text{ (or 25\%)}$$
Geometric rate of return:
$$\bar{R}_G = [(1 - 0.5) \times (1 + 1)]^{1/2} - 1 = [(0.5) \times (2)]^{1/2} - 1 = 1^{1/2} - 1 = 0 \text{ (or 0\%)}$$
The geometric rate (0%) matches the fact that the investment ended where it started; the arithmetic average (25%) overstates the return.

Quartiles
• Split ordered data into 4 quarters: 25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
• Position of the i-th quartile:
$$\text{Position of } Q_i = \frac{i(n + 1)}{4}$$
Data in ordered array: 11 12 13 16 16 17 18 21 22
Position of $Q_1 = \frac{1(9 + 1)}{4} = 2.5$, so $Q_1 = \frac{12 + 13}{2} = 12.5$
• Q1 and Q3 are measures of noncentral location
• Q2 = median, a measure of central tendency

Measures of Variation
• Range
• Interquartile range
• Variance: population variance, sample variance
• Standard deviation: population standard deviation, sample standard deviation
• Coefficient of variation

Range
• Measure of variation
• Difference between the largest and the smallest observations:
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
• Ignores how data are distributed
[Figure: two differently distributed data sets on a 7 to 12 axis, both with Range = 12 - 7 = 5]

Interquartile Range
• Measure of variation
• Also known as midspread: the spread in the middle 50%
• Difference between the first and third quartiles
Data in ordered array: 11 12 13 16 16 17 17 18 21
$$\text{Interquartile range} = Q_3 - Q_1 = 17.5 - 12.5 = 5$$
• Not affected by extreme values

Variance
• Important measure of variation
• Shows variation about the mean
– Sample variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
– Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Standard Deviation
• Most important measure of variation
• Shows variation about the mean
• Has the same units as the original data
– Sample standard deviation:
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
– Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
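The measures above can be checked with a short Python sketch. This is a minimal illustration, assuming Python 3.8+ (for `math.prod`) and only the standard library. The function names (`mean`, `grouped_mean`, `median`, `modes`, `geometric_mean_return`, `quartile`, `sample_variance`, `population_variance`, `sample_std`) are hypothetical and do not come from the slides. Two conventions are assumptions. `quartile` places $Q_i$ at position $i(n+1)/4$ and interpolates linearly between neighbouring values, which reproduces the slides' averaging at half positions such as 2.5 but may differ from other textbook rounding rules at positions like 2.25. `modes` reports "no mode" only when no value repeats.

```python
import math
from collections import Counter


def mean(xs):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(xs) / len(xs)


def grouped_mean(midpoints, freqs):
    """Approximate mean from a frequency distribution: sum(m_j * f_j) / n."""
    n = sum(freqs)  # sample size = total frequency
    return sum(m * f for m, f in zip(midpoints, freqs)) / n


def median(xs):
    """Middle value of the ordered array; mean of the two middle values if n is even."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def modes(xs):
    """Values that occur most often; [] when every value occurs only once."""
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so treat the data as having no mode
    return sorted(v for v, c in counts.items() if c == top)


def geometric_mean_return(returns):
    """[(1 + R_1)(1 + R_2)...(1 + R_n)]^(1/n) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


def quartile(xs, i):
    """Q_i at position i(n + 1)/4 of the ordered array (1-based), with
    linear interpolation between neighbours (assumed convention)."""
    s = sorted(xs)
    pos = i * (len(s) + 1) / 4
    lower = int(pos)  # whole-number part of the position
    if lower < 1:
        return s[0]
    if lower >= len(s):
        return s[-1]
    return s[lower - 1] + (pos - lower) * (s[lower] - s[lower - 1])


def sample_variance(xs):
    """S^2: squared deviations from the sample mean, divided by n - 1."""
    x_bar = mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


def population_variance(xs):
    """sigma^2: squared deviations from the population mean, divided by N."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_std(xs):
    """S: square root of the sample variance, in the units of the data."""
    return math.sqrt(sample_variance(xs))


ordered = [11, 12, 13, 16, 16, 17, 17, 18, 21]  # data from the IQR slide
q1, q3 = quartile(ordered, 1), quartile(ordered, 3)
print(q1, q3, q3 - q1)                    # 12.5 17.5 5.0
print(max(ordered) - min(ordered))        # 10 (range)
print(median(ordered), modes(ordered))    # 16 [16, 17]
print(mean([-0.5, 1.0]), geometric_mean_return([-0.5, 1.0]))  # 0.25 0.0
```

The last line reruns the investment example: the arithmetic average of the two yearly returns is 25%, while the geometric rate is 0%. The interquartile range of 5 matches the slide. The IQR data also has two modes (16 and 17), which illustrates that a data set can have several modes.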
Standard Deviation (continued)
• Approximating the standard deviation
– Used when the raw data are not available and the only source of data is a frequency distribution
$$S \approx \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}}$$
where n = sample size, c = number of classes in the frequency distribution, $m_j$ = midpoint of the jth class, $f_j$ = frequency of the jth class.

Comparing Standard Deviations
[Figure: Data A, Data B and Data C plotted on the same 11 to 21 axis; all three have mean = 15.5, with s = 3.338, s = 0.9258 and s = 4.57 respectively]
The three data sets share the same mean but differ in spread, and the standard deviation captures that difference.

Coefficient of Variation
• Measure of relative variation
• Expressed as a percentage (%)
• Shows variation relative to the mean
• Used to compare two or more sets of data measured in different units
$$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$$
• Sensitive to outliers

Shape of a Distribution
• Describes how data are distributed
• Measures of shape: symmetric or skewed
– Left-skewed: Mean < Median < Mode
– Symmetric: Mean = Median = Mode
– Right-skewed: Mode < Median < Mean

Exploratory Data Analysis
• Box-and-whisker plot: graphical display of data using the 5-number summary
[Figure: box-and-whisker plot marking X_smallest, Q1, Median (Q2), Q3 and X_largest above a number line running from 4 to 12]

Distribution Shape and Box-and-Whisker
[Figure: box-and-whisker plots for a left-skewed, a symmetric and a right-skewed distribution, each marking Q1, Q2 and Q3]

The Empirical Rule
• For most bell-shaped data sets, roughly 68% of the observations fall within 1 standard deviation around the mean
• Roughly 95% of the observations fall within 2 standard deviations around the mean
• Roughly 99.7% of the observations fall within 3 standard deviations around the mean

The Bienaymé-Chebyshev Rule
• The percentage of observations contained within distances of k standard deviations around the mean must be at least
$$\left(1 - \frac{1}{k^2}\right) \times 100\% \quad (k > 1)$$
– Applies regardless of the shape of the data set
– At least 75% of the observations must be contained within distances of 2 standard deviations around the mean
– At least 88.89% of the observations must be contained within distances of 3 standard deviations around the mean
– At least 93.75% of the observations must be contained within distances of 4 standard deviations around the mean

Coefficient of Correlation
• Measures the strength of the linear relationship between 2 quantitative variables
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

Features of the Correlation Coefficient
• Unit free
• Ranges between -1 and 1
• The closer to -1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker any linear relationship

Scatter Plots of Data with Various Correlation Coefficients
[Figure: five scatter plots of Y against X with r = -1, r = -0.6, r = 0, r = 0.6 and r = 1]
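The Bienaymé-Chebyshev bound and the coefficient of correlation can both be checked numerically. The sketch below is a minimal illustration, assuming only Python's `math` module. The function names `chebyshev_min_share` and `correlation` are hypothetical. `correlation` follows the formula for r term by term: cross-products of deviations in the numerator, and the square root of the product of the two sums of squares in the denominator.

```python
import math


def chebyshev_min_share(k):
    """Bienayme-Chebyshev lower bound 1 - 1/k^2 on the share of observations
    within k standard deviations of the mean, for any data shape (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


def correlation(xs, ys):
    """Coefficient of correlation r for paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples of equal length n >= 2")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    # A constant sample makes sxx or syy zero; r is then undefined and
    # this division raises ZeroDivisionError
    return sxy / math.sqrt(sxx * syy)


for k in (2, 3, 4):
    print(k, round(100 * chebyshev_min_share(k), 2))  # 75.0, 88.89, 93.75

print(correlation([1, 2, 3], [2, 4, 6]))   # 1.0
print(correlation([1, 2, 3], [6, 4, 2]))   # -1.0
```

The loop reproduces the 75%, 88.89% and 93.75% bounds from the slide. Multiplying every X value by a constant (for example, changing units) leaves r unchanged, which is what "unit free" means.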