2ndEnglish Descriptive Statistics

REVIEW OF CH. 3;4

Measure: Mean

Raw data:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Grouped data:
\[ \bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} \]
\( X_i \): the midpoint of the i-th class.
\( f_i \): the frequency in that class.

Measure: Median

Raw data:
1- Sort the data.
2- Calculate:
- odd sample: Median = the [(n+1)/2]th value.
- even sample: Median = (the [n/2]th value + the [n/2 + 1]th value)/2.
n: the sample size.

Grouped data:
\[ MD = A + \frac{\frac{n}{2} - f_1}{f_2 - f_1} \times L \]
A is the lower limit (boundary) of the class of the median.
\( f_1 \) is the cumulative frequency of all the classes before the class of the median.
\( f_2 \) is the cumulative frequency up to and including the class of the median, so \( f_2 - f_1 \) is the frequency of that class.
L is the width of the class.

Measure: Mode

Raw data:
The value that occurs most often.

Grouped data:
\[ \text{Mode} = A + \frac{f - f_1}{2f - f_1 - f_2} \times L \]
A is the lower limit (boundary) of the class containing the mode.
f is the largest frequency (the frequency of the class containing the mode).
\( f_1 \) is the frequency of the class preceding the class containing the mode.
\( f_2 \) is the frequency of the class following the class containing the mode.
L is the width of the class.

Measure: 1st quartile (Q1)

Raw data:
1- Sort the data.
2- Calculate (n+1)/4:
- if (n+1)/4 is an integer, Q1 = the [(n+1)/4]th value.
- otherwise, Q1 = L + F×(U − L).
L: the value at the position given by the integer part of (n+1)/4.
U: the value at the position given by (n+1)/4 rounded up.
F: the fraction part of (n+1)/4.
Ex: 2; 5; 7; 8; 9; 11
(n+1)/4 = 1.75, so Q1 lies between the 1st and 2nd values:
Q1 = 2 + 0.75×(5 − 2) = 4.25.

Grouped data:
\[ Q_1 = A + \frac{\frac{n}{4} - f_1}{f_2 - f_1} \times L \]
A is the lower limit (boundary) of the class of Q1 (the class containing position n/4).
\( f_1 \) is the cumulative frequency of all the classes before the class of Q1.
\( f_2 \) is the cumulative frequency up to and including the class of Q1.
L is the width of the class.

Measure: 3rd quartile (Q3)

Raw data:
1- Sort the data.
2- Calculate 3(n+1)/4:
- if 3(n+1)/4 is an integer, Q3 = the [3(n+1)/4]th value.
- otherwise, Q3 = L + F×(U − L), with L, U and F defined as for Q1 but using 3(n+1)/4.
Ex: 2; 5; 7; 8; 9; 11
3(n+1)/4 = 5.25, so Q3 lies between the 5th and 6th values:
Q3 = 9 + 0.25×(11 − 9) = 9.5.

Grouped data:
\[ Q_3 = A + \frac{\frac{3n}{4} - f_1}{f_2 - f_1} \times L \]
A is the lower limit (boundary) of the class of Q3 (the class containing position 3n/4).
\( f_1 \) is the cumulative frequency of all the classes before the class of Q3.
\( f_2 \) is the cumulative frequency up to and including the class of Q3.
L is the width of the class.

Measure: Midrange

Raw data: (Lowest + Highest)/2
Grouped data: (Lowest boundary + Highest boundary)/2

Measure: Sample variance

Raw data:
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]

Grouped data:
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 f_i}{\sum_{i=1}^{n} f_i - 1} \]
\( X_i \): the midpoint of the i-th class.
\( f_i \): the frequency in that class.

Measure: Range

Raw data: (Highest − Lowest)
Grouped data: (Highest boundary − Lowest boundary)

Measure: Interquartile range
\[ Q = Q_3 - Q_1 \]

Measure: Semi-interquartile range
\[ \frac{Q_3 - Q_1}{2} \]

Measure: Coefficient of variation (relative measure)
\[ \frac{S}{\bar{X}} \times 100\%, \qquad S = \sqrt{S^2} \]

Measure: Range Rule of Thumb
\[ S \approx \frac{\text{Range}}{4} \]
(Chebyshev's Theorem when k = 2.)

Important Notes

1-

2- Properties of the Mean
- Uses all data values.
- Varies less than the median or mode.
- Used in computing other statistics, such as the variance.
- Unique, and not necessarily one of the data values.
- Affected by extremely high or low values, called outliers.
- Cannot be used for nominal or ordinal data.

Properties of the Median
- Does not use all data values.
- Affected less than the mean by extremely high or extremely low values.
- Cannot be used for nominal data.

Properties of the Mode
- Easiest measure to compute.
- Can be used with nominal data.
- Not always unique, and may not exist.

Properties of the Midrange
- Easy to compute.
- Affected by extremely high or low values in a data set.

3- Chebyshev's Theorem and the Empirical Rule

Chebyshev's Theorem:
\[ P(\mu - k\sigma < x < \mu + k\sigma) \ge 1 - \frac{1}{k^2}, \qquad k > 1 \]

Number of standard     Minimum proportion within     Minimum percentage within
deviations, k          k standard deviations         k standard deviations
2                      1 − 1/4 = 3/4                 75%
3                      1 − 1/9 = 8/9                 88.89%
4                      1 − 1/16 = 15/16              93.75%

EX: The mean price of houses in a certain neighborhood is $50,000, and the standard deviation is $10,000.
1- Find the price range for which at least 55% of the houses will sell.
2- Find the price range for which at least 75% of the houses will sell.

Solution:
1- Chebyshev's Theorem states that at least 55% of a data set will fall within 1.5 standard deviations of the mean, since 1 − 1/(1.5)² ≈ 0.5556.
Lowest value = 50000 − 1.5 × 10000 = 35000
Highest value = 50000 + 1.5 × 10000 = 65000
2- Chebyshev's Theorem states that at least 75% of a data set will fall within 2 standard deviations of the mean.
Lowest value = 50000 − 2 × 10000 = 30000
Highest value = 50000 + 2 × 10000 = 70000
Note: there is a negative relation between the accuracy and the estimated range: guaranteeing a larger percentage requires a wider, less precise range.

In the case that the shape of the distribution of the data is roughly bell-shaped, the Empirical Rule states that:
- The interval (μ − σ, μ + σ) will contain approximately 68% of all the measurements.
- The interval (μ − 2σ, μ + 2σ) will contain approximately 95% of all the measurements.
- The interval (μ − 3σ, μ + 3σ) will contain approximately 99.7% of all the measurements.

EX: A survey of local companies found that the mean expenditure on traveling for individuals was $0.25 per month. The standard deviation was $0.025.
1- Using Chebyshev's theorem, find the minimum percentage of the individuals' expenditures that will fall between $0.20 and $0.30.
2- Assuming the distribution of the individuals' expenditures is bell-shaped, find the approximate percentage of the expenditures that will fall between $0.20 and $0.30.

Solution:
1- Compute the value of k:
\[ k = \frac{0.30 - 0.25}{0.025} = 2 \quad \text{or} \quad k = \frac{0.25 - 0.20}{0.025} = 2 \]
At least 75% of the individuals' expenditures will fall between $0.20 and $0.30.
2- By the Empirical Rule with k = 2, approximately 95% of the individuals' expenditures will fall between $0.20 and $0.30.
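
The raw-data quartile rule and the Chebyshev examples above can be checked with a short script. The following is a minimal Python sketch and not part of the original review: the function names (quartile, chebyshev_min_proportion, k_interval) are illustrative assumptions, and the quartile function assumes at least 3 data values so that the positions (n+1)/4 and 3(n+1)/4 fall inside the sorted data.

```python
import math


def quartile(data, which):
    """Q1 (which=1) or Q3 (which=3) by the which*(n+1)/4 position rule."""
    values = sorted(data)                 # step 1: sort the data
    n = len(values)
    if which not in (1, 3):
        raise ValueError("which must be 1 or 3")
    if n < 3:
        raise ValueError("this sketch assumes at least 3 data values")
    position = which * (n + 1) / 4        # 1-based position in the sorted data
    whole = math.floor(position)          # integer part of the position
    fraction = position - whole           # F: fraction part of the position
    lower = values[whole - 1]             # L: value at the integer-part position
    if fraction == 0:
        return lower                      # position is an integer: take that value
    upper = values[whole]                 # U: value at the position rounded up
    return lower + fraction * (upper - lower)


def chebyshev_min_proportion(k):
    """Minimum proportion within k standard deviations: 1 - 1/k^2, for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


def k_interval(mean, sd, k):
    """Interval (mean - k*sd, mean + k*sd)."""
    return mean - k * sd, mean + k * sd


sample = [2, 5, 7, 8, 9, 11]
print(quartile(sample, 1), quartile(sample, 3))   # 4.25 9.5

# House prices: mean 50000, standard deviation 10000
print(chebyshev_min_proportion(1.5))              # about 0.5556, i.e. at least 55%
print(k_interval(50000, 10000, 1.5))              # (35000.0, 65000.0)
print(chebyshev_min_proportion(2))                # 0.75
print(k_interval(50000, 10000, 2))                # (30000, 70000)

# Travel expenditures: mean 0.25, standard deviation 0.025, upper bound 0.30
k = (0.30 - 0.25) / 0.025
print(round(k, 6), chebyshev_min_proportion(round(k, 6)))
```

The Empirical Rule percentages (68%, 95%, 99.7%) are not computed here because they are approximations that apply only to bell-shaped data, whereas the Chebyshev bound holds for any distribution.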