Introduction/Review
Chapters 1-4
McGraw-Hill/Irwin Copyright © 2012 by The McGraw-Hill Companies, Inc. All rights reserved.

Learning Objectives

Chapter 1
1 Understand descriptive and inferential statistics.
2 Understand the differences between a sample and a population.
3 Understand the relationship between variable and data.
4 Understand types of data.

Chapter 2
1 Construct a frequency table/frequency distribution for a dataset.
2 Understand a relative frequency distribution.
3 Present data from a frequency distribution in a histogram.
4 Understand a cumulative frequency distribution.

Chapter 3
1 Identify and compute the mean.
2 Explain and apply measures of dispersion.
3 Compute variance and standard deviation for a dataset.

Chapter 4
1 Understand the relationship between two variables.
2 Create and interpret a scatter plot.
3 Develop and explain a contingency table.

What is Statistics?
Chapter 1

Population versus Sample
A population is a collection of all possible individuals, objects, or measurements of interest.
A sample is a portion, or part, of the population of interest.

Inferential Statistics
Inferential statistics: a decision, estimate, prediction, or generalization about a population, based on a sample.
Note: In statistics the words population and sample have a broader meaning. A population or sample may consist of individuals or objects.

Why take a sample instead of studying every member of the population?
1. Prohibitive cost of a census.
2. Not possible to test or inspect all members of the population being studied.

Variables and Data
A variable is some characteristic of a population or sample.
The values of the variable are the possible observations of the variable. The value of a variable varies from one observation to another. Example: the mark on a statistics exam.
Data are observed values of a variable.
Types of Data
Interval: real numbers. Also called quantitative or numerical. Examples: heights, weights, incomes.
Nominal: categories. Also called qualitative or categorical. Observations of a qualitative variable can only be classified and counted. Examples: marital status, gender.
Ordinal: categories that can be ordered. Example: rating of a program.

Describing Data: Frequency Tables, Frequency Distributions, and Graphic Presentation
Chapter 2

Describing Data with Tables and Graphs - Example
The Applewood Auto Group (AAG) sells a wide range of vehicles through its four dealerships. Ms. Kathryn Ball, a member of the senior management team at AAG, is responsible for tracking and analyzing vehicle sales and the profitability of those vehicles. Kathryn would like to summarize the profit earned on the vehicles sold with tables, charts, and graphs that she would review monthly. She wants to know the profit per vehicle sold, as well as the lowest and highest amount of profit. Partial data for 180 customers are shown in the table on the right.

Frequency Table/Frequency Distribution
Class: A class is an interval of numbers.
Frequency Table for Profits on Cars Sold Last Month at Applewood Auto Group by Location
All classes in a frequency distribution should cover the complete range of observations.
Classes should be of the same length.
In Excel, a class is referred to as a Bin.
Class frequency: The number of observations in each class.

Constructing a Frequency Table
Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes (k) is the “2 to the k rule”: choose the smallest k such that 2^k > n, where n is the sample size/number of observations.
There were 180 vehicles sold, so n = 180. If we try k = 7, then 2^7 = 128, somewhat less than 180. Hence, 7 is not enough classes. If we let k = 8, then 2^8 = 256, which is greater than 180. So the recommended number of classes is 8.
Step 2: Determine the class interval or width.
The formula is: i ≥ (H − L)/k, where i is the class interval, H is the highest observed value ($3,292), L is the lowest observed value ($294), and k is the number of classes (8).
Here (3,292 − 294)/8 = 374.75. Round up to some convenient number like $400.

Constructing a Frequency Table - Example
Step 3: Set the individual class limits.
Step 4: Count the number of items in each class.

Relative Frequency Distribution
• A relative frequency distribution is obtained by dividing each of the class frequencies by the total number of observations.
• A relative frequency captures the relationship between a class total and the total number of observations.
TABLE 2–8 Relative Frequency Distribution of Profit for Vehicles

Graphic Presentation of a Frequency Distribution
Histograms
Cumulative frequency distributions

Histogram
HISTOGRAM: A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

Histogram Using Excel
1. Open the Applewood data.
2. Determine the number of classes with the 2 to the k rule: 2^k > n. Here 2^8 = 256 > 180, so the recommended number of classes is 8.
3. Determine the width of the classes. Find the maximum and minimum of the data: =MAX(data range), =MIN(data range). Max = 3,292; min = 294. The width is (3,292 − 294)/8 = 374.75, rounded up to 400.
4. In a new column type the upper limits of the class intervals: 200, 600, 1000, …, 3400. We call them Bin.
5. Click Data, Data Analysis, and Histogram.
6. Specify the Input Range (B3:B182) and the Bin Range (the upper limits you just entered).
7. Click Chart Output. Click Labels if the first row contains names (not in this case, though).
8. Click OK, and you will see a frequency table and a histogram for the data.
9. Follow steps 3. d-g on page 54 in the textbook to make further changes to the graph.
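
The class-construction steps above (the 2 to the k rule, the class width, and the class limits) can be cross-checked with a short Python sketch. This is only an illustration, not the textbook's or Excel's procedure: the function names are hypothetical, rounding the width up to the next multiple of 100 is an assumption standing in for "some convenient number", and the starting limit of 200 is taken from the Bin list on the slide.

```python
import math

def number_of_classes(n):
    """Smallest k with 2**k > n (the "2 to the k rule")."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_width(high, low, k, round_to=100):
    """Return (raw width, width rounded up to a multiple of round_to).

    round_to=100 is an assumed notion of a "convenient number".
    """
    raw = (high - low) / k
    rounded = math.ceil(raw / round_to) * round_to
    return raw, rounded

def class_limits(start, width, k):
    """Return k (lower, upper) pairs of equal width, beginning at start."""
    return [(start + i * width, start + (i + 1) * width) for i in range(k)]

# Values from the slides: n = 180 vehicles, H = 3,292, L = 294.
k = number_of_classes(180)
raw, width = class_width(3292, 294, k)
limits = class_limits(200, width, k)  # 200 is the first limit on the slide

print(k)                      # 8
print(raw, width)             # 374.75 400
print(limits[0], limits[-1])  # (200, 600) (3000, 3400)
```

The upper limits of these pairs (600, 1000, …, 3400) match the Bin values entered in the Excel steps.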
Histogram Using Excel

Cumulative Frequency Distribution

Describing Data: Numerical Measures
Chapter 3

Parameter Versus Statistics
PARAMETER: A measurable characteristic of a population.
STATISTIC: A measurable characteristic of a sample.

Notations
                     Population   Sample
Mean                 μ            X̄
Variance             σ²           s²
Standard Deviation   σ            s
Proportion           p            p̂
Also called standard error

Mean: Measures of Location
The purpose of a measure of location is to pinpoint the center of a distribution of data.
The population mean μ is usually unknown.
The sample mean X̄ is calculated by summing the values and dividing by the sample size.

EXAMPLE – Sample Mean

Dispersion: Variance and Standard Deviation
The mean only describes the center of the data; it does not tell us anything about the spread of the data.
The dispersion in a set of data can be used to compare the spread in two or more distributions.
The variance (var) and standard deviation (sd) are nonnegative.
The population variance and standard deviation are usually unknown.

Sample Variance and Standard Deviation
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum X^2 - \frac{1}{n}\left(\sum X\right)^2}{n - 1}, \qquad s = \sqrt{s^2} \]
Where:
s² is the sample variance and s is the sample standard deviation
X is the value of each observation in the sample
X̄ is the mean of the sample
n is the sample size

EXAMPLE: Formula 1
The hourly wages for a sample of part-time employees at Home Depot are: $12, $20, $16, $18, and $19. What are the sample variance and the sample standard deviation?
\[ \bar{X} = \frac{85}{5} = 17, \qquad s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{25 + 9 + 1 + 1 + 4}{5 - 1} = \frac{40}{4} = 10 \]
\[ s = \sqrt{10} \approx 3.16 \text{ dollars} \]

EXAMPLE: Formula 2
The hourly wages for a sample of part-time employees at Home Depot are: $12, $20, $16, $18, and $19. What are the sample variance and the sample standard deviation?
Hourly Wage (X)    X²
$12                144
20                 400
16                 256
18                 324
19                 361
Total $85          1,485

\[ s^2 = \frac{\sum X^2 - \frac{1}{n}\left(\sum X\right)^2}{n - 1} = \frac{1485 - \frac{1}{5}(85)^2}{5 - 1} = \frac{1485 - 1445}{4} = 10 \]
\[ s = \sqrt{10} \approx 3.16 \text{ dollars} \]

Excel
Sample mean: =AVERAGE(data range)
Sample variance: =VAR(data range)
Sample standard deviation: =STDEV(data range)
Illustration: Applewood

Describing Data: Displaying and Exploring Data
Chapter 4

Describing Relationship between Two Variables
When we study the relationship between two variables we refer to the data as bivariate.
One graphical technique we use to show the relationship between variables is called a scatter diagram.
To draw a scatter diagram we need two variables. We scale one variable along the horizontal axis (X-axis) of a graph and the other variable along the vertical axis (Y-axis).

Describing Relationship between Two Variables – Scatter Diagram Examples
The relationship between the auction price and the odometer reading of cars.
The relationship between the age of a bus and the yearly maintenance cost.

Describing Relationship between Two Variables – Scatter Diagram Excel Example
In the example of the Applewood Auto Group, we gathered information concerning several variables, including the profit earned from the sale of 180 vehicles sold last month and the age of the purchaser. Is there a relationship between the profit earned on a vehicle sale and the age of the purchaser? Would it be reasonable to conclude that the more expensive vehicles are purchased by older buyers?

Describing Relationship between Two Variables – Scatter Diagram Excel Example
• The scatter diagram shows a rather weak positive relationship.
• We will study the relationship between variables more extensively later.
• Example: Applewood. Excel instructions: textbook, p. 136, #7.
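
Returning to the Chapter 3 hourly-wage example, the two sample-variance formulas can be cross-checked with a few lines of Python. This is a minimal sketch offered as a check on the arithmetic, not as part of the course's Excel workflow; the variable names are chosen here for illustration, and the standard-library `statistics` module is used only as an independent comparison.

```python
import math
import statistics

# Hourly wages (dollars) from the Home Depot example.
wages = [12, 20, 16, 18, 19]
n = len(wages)

# Sample mean: sum of the values divided by the sample size.
mean = sum(wages) / n

# Formula 1: sum of squared deviations from the mean, divided by n - 1.
var_definition = sum((x - mean) ** 2 for x in wages) / (n - 1)

# Formula 2: (sum of X^2 minus (sum of X)^2 / n), divided by n - 1.
var_shortcut = (sum(x * x for x in wages) - sum(wages) ** 2 / n) / (n - 1)

# Sample standard deviation is the square root of the sample variance.
sd = math.sqrt(var_definition)

print(mean)                                   # 17.0
print(var_definition, var_shortcut)           # 10.0 10.0
print(f"{sd:.2f}")                            # 3.16
print(statistics.variance(wages) == var_definition)  # True
```

Both formulas give a sample variance of 10 and a standard deviation of about 3.16 dollars, matching the worked examples.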