Introduction/Review
Chapters 1-4
McGraw-Hill/Irwin
Copyright © 2012 by The McGraw-Hill Companies, Inc. All rights reserved.
Learning Objectives
Chapter 1
1 Understand descriptive and inferential statistics.
2 Understand the differences between a sample and a population.
3 Understand the relationship between variable and data.
4 Understand types of data.
Chapter 2
1 Construct frequency table/frequency distribution for a dataset.
2 Understand a relative frequency distribution.
3 Present data from a frequency distribution in a histogram.
4 Understand a cumulative frequency distribution.
Chapter 3
1 Identify and compute the mean.
2 Explain and apply measures of dispersion.
3 Compute variance and standard deviation for a dataset.
Chapter 4
1 Understand the relationship between two variables.
2 Create and interpret a scatter plot.
3 Develop and explain a contingency table.
What is Statistics?
Chapter 1
Population versus Sample
A population is a collection of all possible individuals, objects, or
measurements of interest.
A sample is a portion, or part, of the population of interest.
Inferential Statistics
Inferential Statistics: A decision, estimate,
prediction, or generalization about a
population, based on a sample.
Note: In statistics, the words population and sample have a
broader meaning than in everyday usage. A population or sample may
consist of individuals or objects.
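To make the idea of generalizing from a sample concrete, here is a minimal Python sketch. The population of exam marks is hypothetical and generated at random, and the sample size of 50 is an arbitrary choice; the point is only that the sample mean serves as an estimate of a population mean that would normally be unknown.

```python
import random
import statistics

# Hypothetical population: exam marks for 1,000 students (illustrative values only).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(40, 100) for _ in range(1000)]

# Draw a simple random sample of 50 students without replacement.
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # estimate computed from the sample

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

In practice only the sample would be available, so the second figure is the one used to draw conclusions about the first.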
Why take a sample instead of studying every member of the population?
1. Prohibitive cost of census
2. Not possible to test or inspect all members of a population being studied
Variables and data
A variable is some characteristic of a population
or sample.



The values of the variable are the possible observations of the
variable.
The value of a variable varies from one observation to another.
Example: the mark on a statistics exam.
Data are observed values of a variable.
Types of Data
• Interval: real numbers. Also called quantitative or numerical.
  Examples: heights, weights, incomes
• Nominal: categories. Also called qualitative or categorical.
  Observations of a qualitative variable can only be classified and counted.
  Examples: marital status, gender
• Ordinal: categories that can be ordered.
  Example: rating of a program
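The difference between nominal and ordinal data can be sketched in a few lines of Python. All observations below are hypothetical, and the four-level rating scale is an assumed example rather than one taken from the slides.

```python
from collections import Counter

# Hypothetical observations of a nominal variable (marital status).
marital_status = ["single", "married", "married", "divorced",
                  "single", "married", "widowed", "married"]

# Nominal data can only be classified and counted.
counts = Counter(marital_status)
for category, count in counts.most_common():
    print(f"{category:<10}{count}")

# Ordinal data are categories too, but they have a meaningful order.
rating_order = ["poor", "fair", "good", "excellent"]  # assumed rating scale
ratings = ["good", "poor", "excellent", "good", "fair"]
sorted_ratings = sorted(ratings, key=rating_order.index)
print(sorted_ratings)
```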
Describing Data:
Frequency Tables, Frequency
Distributions, and Graphic
Presentation
Chapter 2
Describing Data with Tables
and Graphs - Example


The Applewood Auto Group (AAG) sells a wide range of vehicles through
its four dealerships. Ms. Kathryn Ball, a member of the senior
management team at AAG, is responsible for tracking and analyzing
vehicle sales and the profitability of those vehicles. Kathryn would
like to summarize the profit earned on the vehicles sold with tables,
charts, and graphs that she would review monthly. She wants to know
the profit per vehicle sold, as well as the lowest and highest amount
of profit.
Partial data for 180 customers are shown in the table on the right.
Frequency Table/Frequency
Distribution
Class: A class is an interval of numbers.
• All classes in a frequency distribution should cover the complete range of observations.
• Classes should be of the same length.
• In Excel, a class is referred to as a Bin.
Class frequency: The number of observations in each class.
Frequency Table for Profits on Cars Sold Last Month at Applewood Auto Group by Location
Constructing a Frequency Table

Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes (k) is the “2 to the
k rule”: choose the smallest k such that 2^k > n, where n is the sample
size/number of observations.
There were 180 vehicles sold, so n = 180. If we try k = 7, then 2^7 = 128,
which is less than 180. Hence, 7 is not enough classes. If we let k = 8,
then 2^8 = 256, which is greater than 180. So the recommended number of
classes is 8.

Step 2: Determine the class interval or width.
The formula is: i ≥ (H − L)/k, where i is the class interval, H is the
highest observed value ($3,292), L is the lowest observed value ($294),
and k is the number of classes (8).
Here (3,292 − 294)/8 = $374.75. Round up to some convenient number like $400.
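Steps 1 and 2 can be sketched in Python as follows. The function names and the `round_to` parameter (the granularity used for "some convenient number") are assumptions made for this illustration; the inputs n = 180, H = 3,292 and L = 294 come from the example.

```python
import math

def number_of_classes(n: int) -> int:
    """Smallest k such that 2**k > n (the "2 to the k rule")."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_width(highest: float, lowest: float, k: int, round_to: float) -> float:
    """Minimum width (H - L) / k, rounded up to a multiple of round_to."""
    minimum_width = (highest - lowest) / k
    return math.ceil(minimum_width / round_to) * round_to

k = number_of_classes(180)              # 2**7 = 128 is too small, 2**8 = 256 works
width = class_width(3292, 294, k, 100)  # 374.75 rounded up to the next hundred
print(k, width)
```

Rounding to the nearest hundred is only one reasonable reading of "convenient"; a different `round_to` gives a different, equally valid width as long as it is at least (H − L)/k.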
Constructing a Frequency Table - Example

Step 3: Set the
individual class limits

Step 4: Count the
number of items in
each class.
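Steps 3 and 4 can be sketched as below, assuming classes that start at $200 with a width of $400 and that each class includes its lower limit but not its upper limit. The profit values are hypothetical and are not the Applewood data; `frequency_table` is a helper name invented for this sketch.

```python
def frequency_table(values, lower, width, k):
    """Count observations in k classes of equal width starting at `lower`.

    Each class runs from its lower limit up to, but not including, its upper limit.
    """
    counts = [0] * k
    for v in values:
        index = int((v - lower) // width)
        if not 0 <= index < k:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    return [(lower + i * width, lower + (i + 1) * width, counts[i]) for i in range(k)]

# Hypothetical profits (not the Applewood data set).
profits = [294, 1387, 1754, 1817, 1040, 1273, 1529, 3082, 1951, 2692, 3292, 2230]
table = frequency_table(profits, lower=200, width=400, k=8)
for low, high, freq in table:
    print(f"${low:>5,} up to ${high:>5,}: {freq}")
```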
Relative Frequency Distribution
• A relative frequency distribution is obtained by dividing each of
the class frequencies by the total number of observations.
• A relative frequency captures the relationship between a class
total and the total number of observations.
TABLE 2–8 Relative Frequency Distribution of Profit for Vehicles
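As a small illustration of the definition, the following Python sketch converts class frequencies into relative frequencies. The classes and counts are hypothetical and are not the values from Table 2–8.

```python
# Hypothetical classes and class frequencies (illustrative only).
classes = ["$200 up to $600", "$600 up to $1,000", "$1,000 up to $1,400",
           "$1,400 up to $1,800", "$1,800 up to $2,200"]
class_frequencies = [2, 5, 8, 4, 1]

total = sum(class_frequencies)
relative_frequencies = [f / total for f in class_frequencies]

for label, freq, rel in zip(classes, class_frequencies, relative_frequencies):
    print(f"{label:<22}{freq:>3}{rel:>8.2f}")
```

The relative frequencies always add up to 1, which makes distributions built from different sample sizes directly comparable.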
Graphic Presentation of a Frequency
Distribution

Histograms
Cumulative frequency distributions
Histogram
HISTOGRAM A graph in which the classes are marked on the
horizontal axis and the class frequencies on the vertical axis. The
class frequencies are represented by the heights of the bars and
the bars are drawn adjacent to each other.
Histogram Using Excel


• Open the Applewood data.
• Determine the number of classes with the 2 to the k rule, 2^k > n:
  2^8 = 256 > 180. So the recommended number of classes is 8.
• Determine the width of classes: find the maximum and minimum of the data
  with =MAX(data range) and =MIN(data range). Max = 3,292; min = 294, so
  (3,292 − 294)/8 = 374.75, rounded up to ≈ 400.
• In a new column, type the upper limits of the class intervals: 200, 600, 1000,
  …, 3400. We call them Bin.
• Click Data, Data Analysis, and Histogram.
• Specify the Input Range (B3:B182) and the Bin Range (the upper limits
  you just entered). Click Chart Output. Click Labels if the first row contains
  names (not in this case, though).
• Click OK, and you will see a frequency table and a histogram for the data.
• Follow steps 3. d-g on page 54 in the textbook to make further changes to
  the graph.
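The binning that results from a list of upper limits can be mimicked in Python. This sketch assumes that a value is counted in the first bin whose upper limit is greater than or equal to it, and that values above the last limit fall into a "More" bucket; the function name and the profit values are hypothetical.

```python
from bisect import bisect_left

def excel_style_histogram(values, bins):
    """Count values per bin, where each bin is identified by its upper limit.

    Assumption: a value is counted in the first bin whose upper limit is
    greater than or equal to it; values above the last limit go to "More".
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)  # final slot is the "More" bucket
    for v in values:
        counts[bisect_left(bins, v)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))

bins = list(range(200, 3401, 400))  # 200, 600, ..., 3400
profits = [294, 600, 601, 1754, 1817, 3292, 3500]  # hypothetical values
result = excel_style_histogram(profits, bins)
for label, count in result:
    print(f"{label:>5}: {count}")
```

Under this assumption a value equal to an upper limit, such as 600, is counted in the bin labeled 600, which differs from the "$200 up to $600" convention where 600 would belong to the next class.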
Cumulative Frequency
Distribution
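A minimal sketch of a cumulative frequency distribution, assuming the usual definition: the cumulative frequency of a class is the number of observations that fall below its upper limit, that is, the running total of the class frequencies. The class limits and counts are hypothetical.

```python
from itertools import accumulate

# Hypothetical upper class limits and class frequencies (illustrative only).
upper_limits = [600, 1000, 1400, 1800, 2200]
class_frequencies = [2, 5, 8, 4, 1]

# Running total of the class frequencies.
cumulative_frequencies = list(accumulate(class_frequencies))

for limit, cum in zip(upper_limits, cumulative_frequencies):
    print(f"less than ${limit:,}: {cum}")
```

The last cumulative frequency always equals the total number of observations.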
Describing Data:
Numerical Measures
Chapter 3
Parameter Versus Statistics
PARAMETER A measurable characteristic of a population.
STATISTIC A measurable characteristic of a sample.
Notations
                     Population   Sample
Mean                 μ            X̄
Variance             σ²           s²
Standard Deviation   σ            s
Proportion           p            p̂
Also called standard error
Mean—Measures of location
The purpose of a measure of location is to pinpoint the center of a
distribution of data.
• Population mean μ is usually unknown.
• Sample mean X̄ is calculated by summing the values and dividing by the
  sample size.
EXAMPLE – Sample Mean
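As a worked instance of the sample mean, the calculation below uses the hourly-wage sample ($12, $20, $16, $18, $19) from the variance examples in this deck. It is offered as an illustration and is not necessarily the example originally shown on this slide.

```latex
\[
\bar{X} = \frac{\sum X}{n} = \frac{12 + 20 + 16 + 18 + 19}{5} = \frac{85}{5} = 17
\]
```

The sample mean is therefore $17 per hour.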
Dispersion: Variance and Standard Deviation
• The mean only describes the center of the data; it does not tell us
  anything about the spread of the data.
• The dispersion in a set of data can be used to compare the spread in
  two or more distributions.
• The variances (var) and standard deviations (sd) are nonnegative.
• The population variance and standard deviation are usually unknown.
Sample Variance and Standard Deviation
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}, \qquad s = \sqrt{s^2} \]
Where:
s² is the sample variance and s is the sample standard deviation
X is the value of each observation in the sample
X̄ is the mean of the sample
n is the sample size
EXAMPLE—Formula 1
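The hourly wages for a sample of part-time employees at Home Depot are:
$12, $20, $16, $18, and $19. What are the sample variance and the sample
standard deviation?
\[ \bar{X} = \frac{\sum X}{n} = \frac{85}{5} = 17 \]
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(-5)^2 + 3^2 + (-1)^2 + 1^2 + 2^2}{5 - 1} = \frac{40}{4} = 10 \]
\[ s = \sqrt{10} = 3.16 \text{ dollars} \]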
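The deviation-based calculation can be checked with a short Python sketch using the same five wages. The variable names are chosen for this illustration only.

```python
import math

wages = [12, 20, 16, 18, 19]  # hourly wages from the example
n = len(wages)

mean = sum(wages) / n
# Squared deviation of each observation from the sample mean.
squared_deviations = [(x - mean) ** 2 for x in wages]

# Divide by n - 1 because this is a sample, not a population.
sample_variance = sum(squared_deviations) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean}, variance = {sample_variance}, sd = {sample_sd:.2f}")
```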
EXAMPLE—Formula 2
The hourly wages for a sample of part-time employees at Home Depot are:
$12, $20, $16, $18, and $19. What are the sample variance and the sample
standard deviation?

Hourly Wage (X)    X²
$12                $144
 20                 400
 16                 256
 18                 324
 19                 361
Total $85          $1,485

\[ s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} = \frac{1485 - \frac{(85)^2}{5}}{5 - 1} = \frac{1485 - 1445}{4} = 10 \]
\[ s = \sqrt{10} = 3.16 \text{ dollars} \]
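The shortcut calculation, which needs only the sum of the values and the sum of their squares, can be sketched in Python and compared against the deviation-based definition. The variable names are assumptions for this illustration.

```python
wages = [12, 20, 16, 18, 19]  # hourly wages from the example
n = len(wages)

sum_x = sum(wages)                          # column total of X
sum_x_squared = sum(x * x for x in wages)   # column total of X squared

# Shortcut form: (sum of squares - (sum)^2 / n) / (n - 1)
variance_shortcut = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

# Deviation form, for comparison.
mean = sum_x / n
variance_definition = sum((x - mean) ** 2 for x in wages) / (n - 1)

print(sum_x, sum_x_squared, variance_shortcut, variance_definition)
```

Both forms give the same variance; the shortcut avoids computing each deviation, which is convenient when working by hand from a table of X and X² totals.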
Excel
• Sample mean: =AVERAGE(data range)
• Sample variance: =VAR(data range)
• Sample standard deviation: =STDEV(data range)
Illustration: Applewood
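For readers working outside Excel, Python's standard `statistics` module offers counterparts to these three functions. The sketch below uses the hourly-wage sample from the earlier examples rather than the Applewood data, which is not reproduced here.

```python
import statistics

wages = [12, 20, 16, 18, 19]  # hourly-wage sample from the earlier examples

sample_mean = statistics.mean(wages)          # counterpart of =AVERAGE
sample_variance = statistics.variance(wages)  # counterpart of =VAR (divides by n - 1)
sample_sd = statistics.stdev(wages)           # counterpart of =STDEV (divides by n - 1)

print(sample_mean, sample_variance, round(sample_sd, 2))
```

`statistics.pvariance` and `statistics.pstdev` are the population versions and divide by n instead of n − 1, so they should not be used for sample data.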
Describing Data:
Displaying and
Exploring Data
Chapter 4
Describing Relationship between Two
Variables



When we study the relationship between two variables we refer to the
data as bivariate.
One graphical technique we use to show the relationship between
variables is called a scatter diagram.
To draw a scatter diagram we need two variables. We scale one
variable along the horizontal axis (X-axis) of a graph and the other
variable along the vertical axis (Y-axis).
Describing Relationship between Two
Variables – Scatter Diagram Examples
The relationship between the auction price and the odometer reading of cars.
The relationship between the age of a bus and its yearly maintenance cost.
Describing Relationship between Two Variables –
Scatter Diagram Excel Example
In the example of the Applewood Auto Group, we gathered
information concerning several variables, including the profit
earned from the sale of 180 vehicles sold last month and the age
of the purchaser.
Is there a relationship between the profit earned on a vehicle sale
and the age of the purchaser?
Would it be reasonable to conclude that the more expensive
vehicles are purchased by older buyers?
• The scatter diagram shows a rather weak positive relationship.
• We will study the relationship between variables more extensively later.
• Example: Applewood. Excel instruction: textbook, p. 136, #7.