Business Analytics, 5e
Chapter 2 – Descriptive Statistics

Camm, Cochran, Fry, Ohlmann, Business Analytics, 5th Edition. © 2024 Cengage Group. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Email
cmpg222.potch@gmail.com

Aliases
Alias is the same as your preferred name
35025662 40997375 41166906 41877632 41211944 41443640 40903702 43633455

Textbook

Practical Help
• Download the practical and open the Word document.
• Questions to be answered are in the Word document.
• Question 6 has a figure labelled "DATA file - ceotime" on the left.
• Download "Data_files.zip" from Resources for the FEMS group.
• Unzip the file and go to "Chapter 02" to find the Excel file for "ceotime."
• The same applies to the other questions.
• Place all answers in a single Excel file as indicated in the assessment.

Practical Help (Continued)

Chapter Contents
2.1 Overview of Using Data: Definitions and Goals
2.2 Types of Data
2.3 Exploring Data in Excel
2.4 Creating Distributions from Data
2.5 Measures of Location
2.6 Measures of Variability
2.7 Analyzing Distributions
2.8 Measures of Association Between Two Variables
Summary

Learning Objectives (1 of 3)
After completing this chapter, you will be able to:
LO 2-1 Identify and describe different data types, including population and sample data, quantitative and categorical data, and cross-sectional and time-series data.
LO 2-2 Generate insights through sorting, filtering, and conditional formatting data.
LO 2-3 Construct and interpret frequency, relative frequency, and percent frequency distributions for categorical data.
LO 2-4 Construct and interpret frequency, relative frequency, and percent frequency distributions for quantitative data.
LO 2-5 Construct and interpret histograms and frequency polygons to visualize the distribution of quantitative data.

Learning Objectives (2 of 3)
LO 2-6 Construct and interpret cumulative frequency, cumulative relative frequency, and cumulative percent frequency distributions for quantitative data.
LO 2-7 Interpret the shape of a distribution of data and identify positive skewness, negative skewness, and symmetric distributions.
LO 2-8 Calculate and interpret measures of location such as the mean, median, geometric mean, and mode.
LO 2-9 Calculate and interpret measures of variability such as the range, variance, standard deviation, and coefficient of variation.
LO 2-10 Analyze and interpret distributions of data using percentiles, quartiles, z-scores, and the empirical rule.

Learning Objectives (3 of 3)
LO 2-11 Identify outliers in a set of data.
LO 2-12 Construct and interpret a boxplot.
LO 2-13 Create and interpret a scatter chart for two quantitative variables.
LO 2-14 Calculate and interpret the covariance and correlation coefficient for two quantitative variables.

2.5 Mean
The most common measure of central location is the mean, the average of all the data values.
The population mean is denoted by the Greek letter $\mu$.
For a sample with $n$ observations, the sample mean $\bar{x}$ is computed as follows:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

where $x_i$ is the $i$th observation.
DATAfile: homesales
For the sample of 12 home sales, $n = 12$ and the mean home selling price is $\bar{x} = \frac{\sum x_i}{12}$.
In Excel, the value for the mean is calculated using =AVERAGE(B2:B13).

2.5 Median
The median is the value in the middle of a data set when data are arranged in ascending order.
To compute the median, arrange the data in ascending order. Then
a.
if n is odd, the median is the middle value
b. if n is even, the median is the average of the two middle values
For the home sales data, $n = 12$ is even, and the median is computed as the average of the 6th and 7th (middle) values.
Because extremely small and large data values influence the mean, the median is the preferred measure of central location for highly skewed data.

2.5 Mode
The mode of a data set is the value that occurs with the greatest frequency.
In Excel, we can find the mode using the MODE.SNGL function.
The greatest frequency may occur at two or more different values. In these instances, more than one mode exists.
• If the data have exactly two modes, the data are said to be bimodal.
• If the data have more than two modes, the data are said to be multimodal.
In Excel, we can find multiple modes using the MODE.MULT function.
• For the home sales data, =MODE.MULT(B2:B13) returns two modes.

2.5 Geometric Mean
The geometric mean is a measure of central location calculated by finding the $n$th root of the product of $n$ values.
The general formula for the geometric mean, denoted $\bar{x}_g$, follows.

$$\bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = \left[(x_1)(x_2)\cdots(x_n)\right]^{1/n}$$

The geometric mean is often used in analyzing growth rates in financial data, where using the arithmetic mean will provide misleading results.
It should be applied any time you want to determine the mean rate of change over several successive periods (be it years, quarters, weeks, etc.).
Other common applications include changes in populations of species, crop yields, pollution levels, and birth and death rates.

2.5 An Application of the Geometric Mean
DATAfile: mutualfundreturns
With a percentage annual return for year 1 of −22.1%, the balance in the fund at the end of year 1 is the initial investment multiplied by $1 + (-0.221) = 0.779$.
We refer to 0.779 as the growth factor for year 1.
Generalizing the results, at the end of year 10, the initial investment would be worth the initial amount multiplied by the product of the 10 growth factors:

$$(0.779)(1.287)(1.109)(1.049)(1.158)(1.055)(0.630)(1.265)(1.151)(1.021) \approx 1.334493$$

Year   Return (%)   Growth Factor
1      -22.1        0.779
2       28.7        1.287
3       10.9        1.109
4        4.9        1.049
5       15.8        1.158
6        5.5        1.055
7      -37.0        0.630
8       26.5        1.265
9       15.1        1.151
10       2.1        1.021

Thus, the fund's average annual return follows from the geometric mean of the growth factors, $\bar{x}_g = \sqrt[10]{1.334493} \approx 1.029275$, which corresponds to an average annual return of about 2.93% (see notes for the Excel formula).

2.6 Measures of Variability
It is often desirable to consider measures of variability (or dispersion).
Consider the annual payouts for two different investment funds, A and B.
Although the mean payout is the same for the two funds, their histograms differ because the payouts associated with Fund B have greater variability.
In this section, we present several ways to measure variability.

2.6 Range
The range is the simplest measure of variability, and it is defined as
Range = Largest Value – Smallest Value
For the home sales data, the range is the largest selling price minus the smallest selling price.
In Excel, the range is computed using the MAX and MIN functions.
=MAX(B2:B13)-MIN(B2:B13)
However, the range's sensitivity to extreme data values makes it a poor choice to measure the dispersion in a data set.

2.6 Variance
The variance is a measure of variability that utilizes all the data.
The variance is based on the deviation about the mean, written as $(x_i - \bar{x})$ for a sample.
In most statistical applications, when we compute a sample variance, we are interested in using it to estimate the unknown population variance.
For a random sample, if the sum of the squared deviations about the sample mean is divided by $n - 1$, and not $n$, the resulting sample variance provides an unbiased estimate of the population variance.
For this reason, the sample variance, denoted by $s^2$, is defined as follows.

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
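As a rough illustration of the sample variance definition, the sketch below divides the sum of squared deviations by n − 1 and, for comparison, by n. It uses the five class sizes from the computation example in this section; the function names are illustrative and do not come from the textbook.

```python
# Illustrative sketch only: function names are assumptions, not textbook code.

def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def variance_dividing_by_n(values):
    """Same sum of squared deviations, but divided by n instead of n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


class_sizes = [46, 54, 42, 46, 32]  # class-size data from this section

print(sample_variance(class_sizes))         # 64.0
print(variance_dividing_by_n(class_sizes))  # 51.2
```

Dividing by n − 1 gives the larger value, which matches what Excel's VAR.S function computes for sample data.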
2.6 Computation of the Variance
Consider the data on the class size from five college classes:
46 54 42 46 32
The table shows the computations of the squared deviations about the mean, $(x_i - \bar{x})^2$.

$x_i$    $\bar{x}$    $(x_i - \bar{x})$    $(x_i - \bar{x})^2$
46       44             2                     4
54       44            10                   100
42       44            -2                     4
46       44             2                     4
32       44           -12                   144
Total                   0                   256

The sample variance is computed as:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{256}{4} = 64$$

In Excel, the sample variance is computed using the formula VAR.S.
For the home sales data, we have =VAR.S(B2:B13) = 9,037,501,420.

2.6 Standard Deviation
The positive square root of the variance is the standard deviation.
The sample standard deviation, $s$, is a point estimate of the population standard deviation, $\sigma$, and is derived from the sample variance as follows:

$$s = \sqrt{s^2}$$

Because of the square root, the units of the variance, (students)² in our example, are converted to students in the standard deviation.
• The standard deviation always has the same units as the original data.
In Excel, the sample standard deviation is computed using the formula STDEV.S.
For the home sales data, we have =STDEV.S(B2:B13) = $95,065.77.

2.6 Coefficient of Variation
The coefficient of variation, usually expressed as a percentage, measures how large the standard deviation is relative to the mean.

$$\left(\frac{\text{Standard Deviation}}{\text{Mean}} \times 100\right)\%$$

For the class size example, $\bar{x} = 44$ and $s = 8$ students. The coefficient of variation is $(8/44 \times 100)\% = 18.2\%$.
In words, the coefficient of variation tells us that the sample standard deviation is 18.2% of the value of the sample mean.
For the home sales data example, the coefficient of variation is computed in the same way, by dividing the sample standard deviation of $95,065.77 by the sample mean home selling price.

Summary
• In this chapter, we have introduced descriptive statistics to summarize data.
• We began by defining data types and data sources.
• We presented several useful functions for modifying data in Excel.
• We introduced the concept of a distribution and explained how to describe it using different interpretations of frequency counts and how to visualize it.
• We then introduced measures of location, such as the mean and median, and measures of variability, such as the variance and standard deviation.
• We also presented additional measures for analyzing data distributions.
• Finally, we discussed how to visualize the relationship between two variables and how to measure their linear association using the covariance and the correlation coefficient.
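The measures of location and variability from this chapter can also be cross-checked outside Excel. The following is a minimal sketch using Python's standard library, assuming the class-size data and the mutual fund growth factors shown in this chapter; the variable names are illustrative, and the comments point to the comparable Excel functions.

```python
import math
import statistics

# Class-size data from the variance computation example.
class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)                # Excel: AVERAGE
median = statistics.median(class_sizes)            # Excel: MEDIAN
modes = statistics.multimode(class_sizes)          # Excel: MODE.MULT
value_range = max(class_sizes) - min(class_sizes)  # Excel: MAX - MIN
sample_var = statistics.variance(class_sizes)      # Excel: VAR.S
sample_sd = statistics.stdev(class_sizes)          # Excel: STDEV.S
cv_percent = sample_sd / mean * 100                # coefficient of variation

# Growth factors for years 1-10 from the mutual fund returns table.
growth_factors = [0.779, 1.287, 1.109, 1.049, 1.158,
                  1.055, 0.630, 1.265, 1.151, 1.021]

# Geometric mean: nth root of the product of the n growth factors.
geometric_mean = math.prod(growth_factors) ** (1 / len(growth_factors))
average_annual_return_percent = (geometric_mean - 1) * 100

print(f"mean={mean}, median={median}, modes={modes}, range={value_range}")
print(f"variance={sample_var}, sd={sample_sd:.2f}, CV={cv_percent:.1f}%")
print(f"geometric mean={geometric_mean:.6f}, "
      f"average annual return={average_annual_return_percent:.2f}%")
```

The class-size results agree with the worked values in the chapter (mean 44, variance 64, standard deviation 8, coefficient of variation 18.2%), and the geometric mean is computed directly from the growth factors rather than from the percentage returns.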