Business Statistics: Module 2. Descriptive Statistics

Module 2. Descriptive Statistics: Numerical Characteristics Measures

Descriptive statistics – tools used to summarize and present gathered data in graphical, tabular, or numerical form so that readers can easily understand it. Its numerical measures are measures of central tendency, variability, and shape of distribution. In this module, we focus on ungrouped data and the numerical characteristics of a sample.

Measures of Location/Central Tendency

Central tendency – measures how the data are gathered around a central value.

Mean – the most common measure of central tendency; the average value of a given set of data; serves as the fulcrum or balance point of a set of data; describes what is typical in a set of data; all values play an equal role; it is greatly affected by extremely high or extremely low values.

$\bar{x} = \frac{\sum x}{n}$

where
$\bar{x}$ = mean (x bar)
$\sum$ = sum or total
$x$ = a value
$n$ = number of observations in the sample

A survey of 10 randomly selected students was conducted to determine their daily allowances (in pesos). Compute the average daily allowance of the students using the data shown below.

100 150 150 100 200 70 140 170 90 120

First, add all the values, then divide the total by the number of observations:

$\bar{x} = \frac{100 + 150 + 150 + 100 + 200 + 70 + 140 + 170 + 90 + 120}{10} = \frac{1290}{10} = 129$

We can say that the average daily allowance of the 10 students was Php129.

Median – the middle value in a given set of data arranged from lowest to highest; it is not affected by extremely high or low values, provided the number of values does not change.

Using the given data on daily allowances, determine the median daily allowance.

100 150 150 100 200 70 140 170 90 120

First, arrange the values from lowest to highest:

70 90 100 100 120 140 150 150 170 200

Then determine the median rank:

median rank = $\frac{n + 1}{2} = \frac{10 + 1}{2} = \frac{11}{2} = 5.5$
Because the rank 5.5 falls between the 5th and 6th values, take the average of those two values:

median = $\frac{120 + 140}{2} = \frac{260}{2} = 130$

The median daily allowance of the 10 students was Php130.

Mode – the simplest measure of central tendency; the value that occurs most frequently in a set of data.
No mode – no value is repeated in the set of data; if there is no repeated value, DO NOT write 0 or leave the item blank; write "None" or "No mode".
Unimodal – one value occurs most frequently in the set of data.
Bimodal – the set of data has two modes.
Multimodal – the set of data has more than two modes.

The data on the daily allowances of the 10 students have two modes, 100 and 150, because each appears twice in the set of data.

Measures of Variation/Dispersion

Variation – measures how the values are scattered or dispersed around the central value.

Range – also called the spread; the simplest measure of variation; the difference between the highest and lowest values in a set of data.

Range (R) = Highest value (H) – Lowest value (L)

Using the data given in the measures of central tendency, let us measure the variation. A survey of 10 randomly selected students was conducted to determine their daily allowances (in pesos). Compute the range.

100 150 150 100 200 70 140 170 90 120

Range = 200 – 70 = 130

The daily allowances of the 10 students spanned a range of Php130.

Interquartile range (IQR) – also known as the mid-spread; measures the dispersion between the third and first quartiles.

Interquartile range (IQR) = third quartile (Q3) – first quartile (Q1)

Before we can calculate the IQR, we need to determine the ranks and values of the third and first quartiles. The formulas for the ranks are:

Q3 rank = $\frac{3(n + 1)}{4}$
Q1 rank = $\frac{n + 1}{4}$

where $n$ = number of values in the set of data.

We also need to arrange the data from lowest to highest. Using the data on the 10 students' daily allowances, let us arrange the values accordingly and determine the interquartile range.
70 90 100 100 120 140 150 150 170 200

Q3 rank = $\frac{3(10 + 1)}{4} = 8.25$

To find the Q3 value, take the 8th value (150) and add to it the product of 0.25 and the difference between the 9th (170) and 8th (150) values:

Q3 = 150 + 0.25(170 – 150) = 155

Q1 rank = $\frac{10 + 1}{4} = 2.75$

To find the Q1 value, take the 2nd value (90) and add to it the product of 0.75 and the difference between the 3rd (100) and 2nd (90) values:

Q1 = 90 + 0.75(100 – 90) = 97.5

IQR = Q3 – Q1 = 155 – 97.5 = 57.5

The interquartile range of the students' daily allowances was Php57.50.

Variance ($S^2$) – the sum of the squared differences between each x value and the mean, divided by the number of observations minus 1.

$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

where
$x$ = any value in the set of data
$\bar{x}$ = mean
$\sum$ = total or summation
$n$ = number of observations in the sample

Let's compute the variance using the data on the students' daily allowances.

$x$        $(x - \bar{x})^2$
70         $(70 - 129)^2 = 3481$
90         $(90 - 129)^2 = 1521$
100        $(100 - 129)^2 = 841$
100        $(100 - 129)^2 = 841$
120        $(120 - 129)^2 = 81$
140        $(140 - 129)^2 = 121$
150        $(150 - 129)^2 = 441$
150        $(150 - 129)^2 = 441$
170        $(170 - 129)^2 = 1681$
200        $(200 - 129)^2 = 5041$
Σ 1,290    Σ 14,490

$\bar{x} = \frac{1290}{10} = 129$

$S^2 = \frac{14490}{10 - 1} = 1610$

First, get the total of all the x values (1290).
Next, get the mean ($\bar{x}$) by dividing the total (1290) by the number of values (in this example, n is 10).
Then get the squared difference between each x value and $\bar{x}$, and add up these squared differences (14490).
Finally, divide that total (14490) by the number of values minus 1 (10 – 1). The quotient is the variance (1610).

Standard Deviation (S) – the square root of the variance.

$S = \sqrt{S^2} = \sqrt{1610} = 40.12$

The daily allowances of the 10 students deviate by about Php40.12 around the mean of Php129.
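
As a cross-check on the hand computations so far, the following is a minimal Python sketch that uses only the standard library `statistics` module. The variable names are illustrative and are not part of the module. It relies on two behaviors of that library: `statistics.quantiles` with its default "exclusive" method interpolates at rank i(n + 1)/4, which matches the quartile rank formulas used above, and `statistics.variance` and `statistics.stdev` divide by n – 1 (sample formulas).

```python
import statistics

# Daily allowances (in pesos) of the 10 surveyed students
allowances = [100, 150, 150, 100, 200, 70, 140, 170, 90, 120]

# Measures of central tendency
mean = statistics.mean(allowances)        # sum of values / n
median = statistics.median(allowances)    # average of the 5th and 6th sorted values
modes = statistics.multimode(allowances)  # every value tied for the highest frequency

# Measures of variation
value_range = max(allowances) - min(allowances)

# Quartiles by the (n + 1) rank rule; "exclusive" is the default method
q1, q2, q3 = statistics.quantiles(allowances, n=4)
iqr = q3 - q1

variance = statistics.variance(allowances)  # sample variance, divides by n - 1
std_dev = statistics.stdev(allowances)      # square root of the sample variance

print("mean:", mean)
print("median:", median)
print("modes:", modes)
print("range:", value_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
```

Other software may use a different quartile convention (for example, ranks based on n – 1 instead of n + 1), so quartile and IQR values from a spreadsheet or another library can differ slightly from the hand-computed ones.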
Coefficient of Variation (CV) – a measure of variation expressed as a percentage; measures the standard deviation relative to the mean.

$CV = \frac{S}{\bar{x}} \times 100\%$

$CV = \frac{40.12}{129} \times 100\% = 31.10\%$

The standard deviation (40.12) is 31.10% of the mean (129) of the daily allowances of the 10 students.

Z-score – also called the standardized value; measures the location of a value in terms of its deviation from the mean, in units of the standard deviation; it identifies extreme values or outliers (values that are distant from the mean); values with z-scores beyond ±3.00 are considered outliers.

$Z = \frac{x - \bar{x}}{S}$

Let's use the lowest and highest daily allowances to calculate the lowest and highest z-scores in the set of data.

$Z_{lowest} = \frac{70 - 129}{40.12} = -1.47$
$Z_{highest} = \frac{200 - 129}{40.12} = 1.77$

Based on the computed z-scores, none of the values in the set of data is an outlier.

Pattern of Distribution

Shape – shows the pattern of distribution of a set of data, which can be either symmetrical or skewed.

Symmetrical – values are normally distributed; high and low values appear about equally often; most of the values are in the middle, near the mean, with few high or low values; the shape is bell-shaped; the mean and median are equal; skewness is zero.

Skewed – an imbalanced or distorted distribution of values, skewed either to the left or to the right.

Left skewed or negatively skewed – most of the values are high, and a few very low values stretch the left tail of the shape; the mean ($\bar{x}$) is less than (<) the median.

Right skewed or positively skewed – most of the values are low, and a few very high values stretch the right tail of the shape; the mean ($\bar{x}$) is greater than (>) the median.

Let's determine the shape of the data on daily allowances. The mean and median daily allowances of the 10 students were Php129 and Php130, respectively.
Because the mean (Php129) is less than the median (Php130), we can say that the data are negatively skewed, although only slightly, since the two values differ by just Php1.

End of Module Questions

1. In your opinion, which of the three measures of central tendency is the best to use to analyze a set of data? Explain why you chose that particular measure.
2. Choose the one measure of variation you consider best, and explain why you prefer it over the other measures.

End of Module Exercises

1. The following values were taken from a sample of seven (7): 10, 12, 15, 8, 10, 13, 9
   a. Compute the mean, median, and mode.
   b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
   c. Determine if there are any outliers in the set of data.
   d. How is the set of data distributed? Present your basis.
2. Analyze the data taken from a sample of 15: 3.5; 3.0; 3.75; 4.25; 4.0; 4.88; 3.25; 3.5; 4.0; 3.75; 3.98; 4.15; 4.0; 3.0; 4.0
   a. Compute the mean, median, and mode.
   b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
   c. Determine if there are any outliers in the set of data.
   d. How is the set of data distributed? Present your basis.
3. A grade 7 student has been saving money every week for the last two months. Her weekly savings vary depending on her weekly school expenses. Her weekly savings (in pesos) are as follows: 50, 60, 55, 40, 45, 70, 100, 80.
   a. Compute the mean, median, and mode.
   b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
   c. Determine if there are any outliers in the set of data.
   d. How is the set of data distributed? Present your basis.
4. A survey was conducted among 12 college students about the number of hours they spend every day on Facebook and playing online games. The results are shown in the table:
   Facebook    Online games
   3.5         4.5
   3           4
   2.5         3.5
   4           5
   4.5         4
   2           3.5
   1.5         4
   3           4.5
   2           3
   2.5         3.5
   3           2.5
   3.5         3

   For the number of hours the students spend on Facebook and on online games, separately:
   a. Compute the mean, median, and mode.
   b. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
   c. Determine if there are any outliers in the set of data.
   d. How is the set of data distributed? Present your basis.