Uploaded by Ntuthuko Ndlovu

Chapter 2 2.5 - 2.6

Business Analytics, 5e
Chapter 2 – Descriptive Statistics
Camm, Cochran, Fry, Ohlmann, Business Analytics, 5th Edition. © 2024 Cengage Group. All Rights Reserved.
May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Email
cmpg222.potch@gmail.com
Alias is the same as
your preferred name
Aliases
35025662
40997375
41166906
41877632
41211944
41443640
40903702
43633455
Textbook
Practical Help
• Download the practical and open the Word document
• Questions to be answered are in the Word document
• Question 6 has a figure labelled "DATA file - ceotime" on the left
• Download "Data_files.zip" from Resources for the FEMS group.
• Unzip the file and go to "Chapter 02" to find the Excel file for "ceotime."
• The same applies to the other questions
• Place all answers in a single Excel file as indicated in the assessment
Practical Help (Continued)
Chapter Contents
2.1 Overview of Using Data: Definitions and Goals
2.2 Types of Data
2.3 Exploring Data in Excel
2.4 Creating Distributions from Data
2.5 Measures of Location
2.6 Measures of Variability
2.7 Analyzing Distributions
2.8 Measures of Association Between Two Variables
Summary
Learning Objectives (1 of 3)
After completing this chapter, you will be able to:
LO 2-1 Identify and describe different data types, including population
and sample data, quantitative and categorical data, and cross-sectional and time-series data.
LO 2-2 Generate insights through sorting, filtering, and conditional
formatting data.
LO 2-3 Construct and interpret frequency, relative frequency, and percent
frequency distributions for categorical data.
LO 2-4 Construct and interpret frequency, relative frequency, and percent
frequency distributions for quantitative data.
LO 2-5 Construct and interpret histograms and frequency polygons to
visualize the distribution of quantitative data.
Learning Objectives (2 of 3)
LO 2-6 Construct and interpret cumulative frequency, cumulative relative
frequency, and cumulative percent frequency distributions for
quantitative data.
LO 2-7 Interpret the shape of a distribution of data and identify positive
skewness, negative skewness, and symmetric distributions.
LO 2-8 Calculate and interpret measures of location such as the mean,
median, geometric mean, and mode.
LO 2-9 Calculate and interpret measures of variability such as the range,
variance, standard deviation, and coefficient of variation.
LO 2-10 Analyze and interpret distributions of data using percentiles,
quartiles, z-scores, and the empirical rule.
Learning Objectives (3 of 3)
LO 2-11 Identify outliers in a set of data.
LO 2-12 Construct and interpret a boxplot.
LO 2-13 Create and interpret a scatter chart for two quantitative variables.
LO 2-14 Calculate and interpret the covariance and correlation coefficient
for two quantitative variables.
2.5 Mean
The most common measure of central location is the mean, the average of
all the data values. The population mean is denoted by the Greek letter $\mu$.
For a sample with $n$ observations, the sample mean $\bar{x}$ is computed as follows.
$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
where $x_i$ is the $i$th observation.
DATAfile: homesales
For the sample of 12 home sales, the mean home selling price is calculated in Excel using =AVERAGE(B2:B13).
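The sample mean formula can be sketched in a few lines of Python. This is only an illustration: the home sales prices are not listed on the slide, so the sketch uses the five class sizes from the Section 2.6 example, and the function name sample_mean is an assumption made for this example.

```python
# Class sizes from the five-class example used in Section 2.6.
class_sizes = [46, 54, 42, 46, 32]


def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    return sum(values) / len(values)


mean_class_size = sample_mean(class_sizes)
print(mean_class_size)  # 44.0
```

The result matches the mean of 44 shown in the variance computation table for the same data.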
2.5 Median
The median is the value in the middle of a data set when data are arranged
in ascending order.
To compute the median, arrange the data in ascending order. Then
a. if n is odd, the median is the middle value
b. if n is even, the median is the average of the two middle values
For the home sales data, $n = 12$ is even, and the median is computed as the
average of the 6th and 7th (middle) values.
Because extremely small and large data values influence the mean, the
median is the preferred measure of central location for highly skewed data.
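The odd/even rule for the median can be sketched as follows. The function name median and the four-value even sample are assumptions made for illustration; the five class sizes come from the Section 2.6 example.

```python
def median(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)          # arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n is odd: take the middle value
        return ordered[mid]
    # n is even: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


class_sizes = [46, 54, 42, 46, 32]    # n = 5 (odd)
median_class_size = median(class_sizes)
print(median_class_size)              # 46

even_sample = [10, 20, 30, 40]        # hypothetical data, n = 4 (even)
median_even = median(even_sample)
print(median_even)                    # 25.0
```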
2.5 Mode
The mode of a data set is the value that occurs with the greatest frequency.
In Excel, we can find the mode using the MODE.SNGL function.
The greatest frequency may occur at two or more different values. In these
instances, more than one mode exists.
• If the data have exactly two modes, the data are said to be bimodal.
• If the data have more than two modes, the data are said to be
multimodal.
In Excel, we can find multiple modes using the MODE.MULT function.
• For the home sales data, =MODE.MULT(B2:B13) returns two modes.
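A minimal Python sketch of finding one or more modes is shown below. It is an illustration of the idea, not a reproduction of Excel's behavior; the function name modes and the bimodal sample are assumptions, and the class sizes come from the Section 2.6 example.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the greatest frequency, sorted."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


class_sizes = [46, 54, 42, 46, 32]
class_modes = modes(class_sizes)
print(class_modes)                    # [46]

bimodal_sample = [1, 2, 2, 3, 3, 4]   # hypothetical data with two modes
bimodal_modes = modes(bimodal_sample)
print(bimodal_modes)                  # [2, 3]
```

Note that when no value repeats, this sketch returns every value in the data, since each one ties for the greatest frequency.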
2.5 Geometric Mean
The geometric mean is a measure of central location calculated by finding
the $n$th root of the product of $n$ values.
The general formula for the geometric mean, denoted $\bar{x}_g$, follows.
$$\bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = \left[(x_1)(x_2)\cdots(x_n)\right]^{1/n}$$
The geometric mean is often used in analyzing growth rates in financial data,
where using the arithmetic mean will provide misleading results.
It should be applied any time you want to determine the mean rate of change
over several successive periods (years, quarters, weeks, etc.).
Other common applications include changes in populations of species, crop
yields, pollution levels, and birth and death rates.
2.5 An Application of the Geometric Mean
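DATAfile: mutualfundreturns

Year    Return (%)    Growth Factor
1       -22.1         0.779
2        28.7         1.287
3        10.9         1.109
4         4.9         1.049
5        15.8         1.158
6         5.5         1.055
7       -37.0         0.630
8        26.5         1.265
9        15.1         1.151
10        2.1         1.021

With a percentage annual return for year 1 of −22.1%, the balance in the fund at the end of year 1 is the initial investment multiplied by $(1 - 0.221) = 0.779$.
We refer to 0.779 as the growth factor for year 1.
Generalizing the results, at the end of year 10, the initial investment would be worth the initial amount multiplied by the product of the 10 growth factors:
$$(0.779)(1.287)(1.109)(1.049)(1.158)(1.055)(0.630)(1.265)(1.151)(1.021) = 1.334493$$
The geometric mean of the 10 growth factors is $\sqrt[10]{1.334493} = 1.029275$.
Thus, the fund's average annual return is $(1.029275 - 1) \times 100\% = 2.9275\%$ (see notes for the Excel formula).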
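As a rough check of the mutual fund calculation, the sketch below multiplies the ten growth factors from the table and takes the 10th root. The variable names are assumptions chosen for this example; only the growth factors come from the slide.

```python
import math

# Growth factors for years 1 through 10, taken from the table above.
growth_factors = [0.779, 1.287, 1.109, 1.049, 1.158,
                  1.055, 0.630, 1.265, 1.151, 1.021]

n = len(growth_factors)
product = math.prod(growth_factors)        # value of 1 unit invested after 10 years
geometric_mean = product ** (1 / n)        # nth root of the product
average_annual_return_pct = (geometric_mean - 1) * 100

print(round(product, 6))                   # 1.334493
print(round(geometric_mean, 6))            # 1.029275
print(round(average_annual_return_pct, 2)) # 2.93
```

For comparison, the arithmetic mean of the ten annual returns is 5.04%, which overstates the growth actually achieved over the ten years.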
2.6 Measures of Variability
It is often desirable to consider measures of variability, or dispersion.
Consider the annual payouts for two different investment funds, A and B.
Although the mean payout is the same for the two funds, their histograms differ because the payouts associated with Fund B have greater variability.
In this section, we present several ways to measure variability.
2.6 Range
The range is the simplest measure of variability, and it is defined as
Range = Largest Value - Smallest Value
For the home sales data, the range is computed in Excel using the MAX and MIN functions.
=MAX(B2:B13)-MIN(B2:B13)
However, the range's sensitivity to extreme data values makes it a poor choice
to measure the dispersion in a data set.
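The sensitivity of the range to a single extreme value can be illustrated with a short sketch. The class sizes come from the Section 2.6 example; the extra value of 400 is a hypothetical outlier added only for illustration.

```python
def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


class_sizes = [46, 54, 42, 46, 32]
range_original = data_range(class_sizes)
print(range_original)                 # 22

# Adding one hypothetical extreme value changes the range dramatically,
# even though the other five observations are unchanged.
with_outlier = class_sizes + [400]
range_with_outlier = data_range(with_outlier)
print(range_with_outlier)             # 368
```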
2.6 Variance
The variance is a measure of variability that utilizes all the data.
The variance is based on the deviation about the mean, written as $(x_i - \bar{x})$ for a sample.
In most statistical applications, we compute a sample variance in order to
estimate the unknown population variance.
For a random sample, if the sum of the squared deviations about the sample
mean is divided by $n - 1$, and not $n$, the resulting sample variance provides
an unbiased estimate of the population variance.
For this reason, the sample variance, denoted by $s^2$, is defined as follows.
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
2.6 Computation of the Variance
Consider the data on the class size from five college classes:
46 54 42 46 32
The table shows the computations of the squared deviations about the mean, $(x_i - \bar{x})^2$.

Class Size ($x_i$)    Mean ($\bar{x}$)    Deviation ($x_i - \bar{x}$)    Squared Deviation ($(x_i - \bar{x})^2$)
46                    44                    2                              4
54                    44                   10                            100
42                    44                   -2                              4
46                    44                    2                              4
32                    44                  -12                            144
Total                                       0                            256

The sample variance is computed as:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{256}{4} = 64$$
In Excel, the sample variance is computed using the VAR.S function. For the
home sales data, we have =VAR.S(B2:B13) = 9,037,501,420.
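The table computation for the class size data can be reproduced step by step in Python. This is a sketch with variable names chosen for this example; the final line cross-checks the hand computation against the standard library's statistics.variance, which also divides by n - 1.

```python
import math
import statistics

class_sizes = [46, 54, 42, 46, 32]

n = len(class_sizes)
mean = sum(class_sizes) / n                        # 44.0

# Deviations about the mean and their squares, as in the table.
deviations = [x - mean for x in class_sizes]
squared_deviations = [d ** 2 for d in deviations]

sum_of_deviations = sum(deviations)                # always 0 about the mean
sum_of_squares = sum(squared_deviations)

# Divide by n - 1 (not n) to get the sample variance.
sample_variance = sum_of_squares / (n - 1)

print(sum_of_deviations)   # 0.0
print(sum_of_squares)      # 256.0
print(sample_variance)     # 64.0

# Cross-check against the standard library's sample variance.
assert math.isclose(sample_variance, statistics.variance(class_sizes))
```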
2.6 Standard Deviation
The positive square root of the variance is the standard deviation.
The sample standard deviation, $s$, is a point estimate of the population
standard deviation, $\sigma$, and is derived from the sample variance as follows:
$$s = \sqrt{s^2}$$
Because of the square root, the variance, $s^2 = 64$ in our example, is converted to $s = \sqrt{64} = 8$ students in the standard deviation.
• The standard deviation always has the same units as the original data.
In Excel, the sample standard deviation is computed using the STDEV.S function.
For the home sales data, we have =STDEV.S(B2:B13) =
$95,065.77.
2.6 Coefficient of Variation
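The coefficient of variation, usually expressed as a percentage, measures
how large the standard deviation is relative to the mean.
$$\left(\frac{\text{Standard Deviation}}{\text{Mean}} \times 100\right)\%$$
For the class size example, $\bar{x} = 44$ and $s = 8$ students. The coefficient of variation is
$$\left(\frac{8}{44} \times 100\right)\% = 18.2\%$$
In words, the coefficient of variation tells us that the sample standard deviation
is 18.2% of the value of the sample mean.
For the home sales data example, the coefficient of variation is computed in the same way, by dividing the sample standard deviation by the sample mean.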
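The standard deviation and coefficient of variation for the class size example can be sketched as follows. The variable names are assumptions made for this example, and the sample variance is recomputed here so the block runs on its own.

```python
import math

class_sizes = [46, 54, 42, 46, 32]

n = len(class_sizes)
mean = sum(class_sizes) / n
sample_variance = sum((x - mean) ** 2 for x in class_sizes) / (n - 1)

# The standard deviation is the positive square root of the variance,
# so it is expressed in the same units as the data (students).
sample_std = math.sqrt(sample_variance)

# Coefficient of variation: standard deviation as a percentage of the mean.
coefficient_of_variation = sample_std / mean * 100

print(sample_std)                          # 8.0
print(round(coefficient_of_variation, 1))  # 18.2
```

Because the coefficient of variation is unit-free, it can be used to compare the variability of data sets measured on different scales, such as class sizes and home prices.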
Summary
• In this chapter, we have introduced descriptive statistics to summarize data.
• We began by defining data types and data sources.
• We presented several useful functions for modifying data in Excel.
• We introduced the concept of a distribution and explained how to describe it
using different interpretations of the frequency of counts and visualize it.
• We then introduced measures of location, such as mean and median, and
variability, such as variance and standard deviation.
• We also presented additional measures for analyzing data distributions.
• Finally, we discussed how to visualize the relationship between two
variables and how to measure their linear association using the covariance and
correlation coefficient.