STUDENT TESTED, FACULTY APPROVED
THE PROCESS

Like all 4LTR Press solutions, Behavioral Sciences STAT begins and ends with student and faculty feedback. For the Statistics for the Behavioral Sciences course, here's the process we used:

Conduct research with students on their challenges and learning preferences. As psychology majors, students taking statistics for the behavioral sciences expressed a need for a broad base of examples and applications. Specifically, they were looking for resources that would help them apply key formulas in a relevant, accessible fashion.

Develop the ideal product mix with students to address each course's needs. The first 4-color product in the statistics for the behavioral sciences course, Behavioral Sciences STAT offers students a visually engaging experience that makes the material more accessible. Additionally, Review and Tech cards provide students with a convenient resource to use in class and after graduation.

Share student feedback and validate product mix with faculty. Adopters of the first edition found that Behavioral Sciences STAT supported the way they teach by providing an efficient presentation of the concepts with current examples. Discussions were richer because students came to class better prepared, having read the chapter.

Publish a Student-Tested, Faculty-Approved solution. Behavioral Sciences STAT delivers an engaging mixture of print and digital tools that expose students to a variety of applications to prepare them to be professionals, while supporting instructors with a suite of tracking and assessment resources.
4LTR PRESS TIMELINE: CONTINUOUSLY IMPROVING

[Timeline infographic, Spring 2006 through June 2009. Legible milestones: April 2007, student conversations begin; 2008, MKTG publishes and launches a new debate about how best to engage today's students; January 2009, our title count grows to 8 solutions across business disciplines; June 2009, marks the 1 millionth dollar saved by students. Other milestones shown: faculty broadly endorse our student-tested, faculty-approved approach but suggest a title change from Marketing To Go to MKTG, officially launching the 4LTR Press brand; first adoption of MKTG; early adopters embrace the consistent approach and adopt multiple 4LTR Press solutions to drive better outcomes; our first adoption of 20+ titles at a single school.]
This is an electronic version of the print textbook. Due to electronic rights restrictions,
some third party content may be suppressed. Editorial review has deemed that any suppressed
content does not materially affect the overall learning experience. The publisher reserves the right
to remove content from this title at any time if subsequent rights restrictions require it. For
valuable information on pricing, previous editions, changes to current editions, and alternate
formats, please visit www.cengage.com/highered to search by ISBN#, author, title, or keyword for
materials in your areas of interest.
Important Notice: Media content referenced within the product description or the product text
may not be available in the eBook version.
Behavioral Sciences STAT2
Gary Heiman

Product Director: Jon-David Hague
Product Manager: Timothy Matray
Content Developer: Thomas Finn
Content Coordinator: Jessica Alderman
Product Assistant: Nicole Richards
Media Developer: Jasmin Tokatlian
Brand Manager: Jennifer Levanduski
Market Development Manager: Christine Sosa
Content Project Manager: Michelle Clark
Art Director: Jennifer Wahi
Manufacturing Planner: Karen Hunt
Rights Acquisitions Specialist: Roberta Broyer
Production Service: Integra
Photo and Text Researcher: PreMedia Global
Copy Editor: Integra
Text and Cover Designer: Trish Knapke
Cover Image: Cheryl Graham/iStockPhoto
Compositor: Integra

© 2015, 2012 Cengage Learning
WCN: 02-200-203

ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706. For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be e-mailed to permissionrequest@cengage.com.

Library of Congress Control Number: 2013936603
ISBN-13: 978-1-285-45814-4
ISBN-10: 1-285-45814-1

Cengage Learning
200 First Stamford Place, 4th Floor
Stamford, CT 06902
USA

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at www.cengage.com/global.

Cengage Learning products are represented in Canada by Nelson Education, Ltd.

To learn more about Cengage Learning Solutions, visit www.cengage.com. Purchase any of our products at your local college store or at our preferred online store www.cengagebrain.com.

Printed in the United States of America
1 2 3 4 5 6 7 17 16 15 14 13
For my wife Karen, the love of my life
BRIEF CONTENTS

1 Introduction to Statistics and Research 2
2 Creating and Using Frequency Distributions 20
3 Summarizing Scores with Measures of Central Tendency 36
4 Summarizing Scores with Measures of Variability 52
5 Describing Data with z-Scores and the Normal Curve 68
6 Using Probability to Make Decisions about Data 88
7 Overview of Statistical Hypothesis Testing: The z-Test 106
8 Hypothesis Testing Using the One-Sample t-Test 126
9 Hypothesis Testing Using the Two-Sample t-Test 140
10 Describing Relationships Using Correlation and Regression 162
11 Hypothesis Testing Using the One-Way Analysis of Variance 184
12 Understanding the Two-Way Analysis of Variance 202
13 Chi Square and Nonparametric Procedures 218
Appendix A: Math Review and Additional Computing Formulas 234
Appendix B: Statistical Tables 252
Appendix C: Answers to Odd-Numbered Study Problems 264
Index 276
CONTENTS

1 Introduction to Statistics and Research 2
1-1 Learning about Statistics 3
1-2 The Logic of Research 5
1-3 Understanding Relationships 7
1-4 Applying Descriptive and Inferential Statistics 10
1-5 Understanding Experiments and Correlational Studies 11
1-6 The Characteristics of Scores 15

2 Creating and Using Frequency Distributions 20
2-1 Some New Symbols and Terminology 21
2-2 Understanding Frequency Distributions 22
2-3 Types of Frequency Distributions 25
2-4 Relative Frequency and the Normal Curve 29
2-5 Understanding Percentile and Cumulative Frequency 32

3 Summarizing Scores with Measures of Central Tendency 36
3-1 Some New Symbols and Procedures 37
3-2 What Is Central Tendency? 37
3-3 Computing the Mean, Median, and Mode 39
3-4 Applying the Mean to Research 44
3-5 Describing the Population Mean 49
4 Summarizing Scores with Measures of Variability 52
4-1 Understanding Variability 53
4-2 The Range 55
4-3 The Sample Variance and Standard Deviation 55
4-4 The Population Variance and Standard Deviation 59
4-5 Summary of the Variance and Standard Deviation 61
4-6 Computing the Formulas for Variance and Standard Deviation 62
4-7 Statistics in the Research Literature: Reporting Means and Variability 65

5 Describing Data with z-Scores and the Normal Curve 68
5-1 Understanding z-Scores 69
5-2 Using the z-Distribution to Interpret Scores 72
5-3 Using the z-Distribution to Compare Different Variables 74
5-4 Using the z-Distribution to Compute Relative Frequency 75
5-5 Using z-Scores to Describe Sample Means 79

6 Using Probability to Make Decisions about Data 88
6-1 Understanding Probability 89
6-2 Probability Distributions 90
6-3 Obtaining Probability from the Standard Normal Curve 92
6-4 Random Sampling and Sampling Error 94
6-5 Deciding Whether a Sample Represents a Population 96
7 Overview of Statistical Hypothesis Testing: The z-Test 106
7-1 The Role of Inferential Statistics in Research 107
7-2 Setting Up Inferential Procedures 108
7-3 Performing the z-Test 113
7-4 Interpreting Significant and Nonsignificant Results 115
7-5 Summary of the z-Test 117
7-6 The One-Tailed Test 118
7-7 Statistics in the Research Literature: Reporting the Results 120
7-8 Errors in Statistical Decision Making 121

8 Hypothesis Testing Using the One-Sample t-Test 126
8-1 Understanding the One-Sample t-Test 127
8-2 Performing the One-Sample t-Test 128
8-3 Interpreting the t-Test 133
8-4 Estimating μ by Computing a Confidence Interval 135
8-5 Statistics in the Research Literature: Reporting t 138

9 Hypothesis Testing Using the Two-Sample t-Test 140
9-1 Understanding the Two-Sample Experiment 141
9-2 The Independent-Samples t-Test 142
9-3 Performing the Independent-Samples t-Test 144
9-4 The Related-Samples t-Test 149
9-5 Performing the Related-Samples t-Test 152
9-6 Statistics in the Research Literature: Reporting a Two-Sample Study 156
9-7 Describing Effect Size 156
10 Describing Relationships Using Correlation and Regression 162
10-1 Understanding Correlations 163
10-2 The Pearson Correlation Coefficient 171
10-3 Significance Testing of the Pearson r 174
10-4 Statistics in the Research Literature: Reporting r 178
10-5 An Introduction to Linear Regression 178
10-6 The Proportion of Variance Accounted For: r² 180

11 Hypothesis Testing Using the One-Way Analysis of Variance 184
11-1 An Overview of the Analysis of Variance 185
11-2 Components of the ANOVA 189
11-3 Performing the ANOVA 191
11-4 Performing the Tukey HSD Test 196
11-5 Statistics in the Research Literature: Reporting ANOVA 198
11-6 Effect Size and Eta² 198
11-7 A Word about the Within-Subjects ANOVA 199

12 Understanding the Two-Way Analysis of Variance 202
12-1 Understanding the Two-Way Design 203
12-2 Understanding Main Effects 204
12-3 Understanding the Interaction Effect 207
12-4 Completing the Two-Way ANOVA 209
12-5 Interpreting the Two-Way Experiment 214
13 Chi Square and Nonparametric Procedures 218
13-1 Parametric versus Nonparametric Statistics 219
13-2 Chi Square Procedures 220
13-3 The One-Way Chi Square: The Goodness of Fit Test 220
13-4 The Two-Way Chi Square: The Test of Independence 224
13-5 Statistics in the Research Literature: Reporting χ² 229
13-6 A Word about Nonparametric Procedures for Ordinal Scores 229

Appendix A: Math Review and Additional Computing Formulas 234
A-1 Review of Basic Math 234
A-2 Computing Confidence Intervals for the Two-Sample t-Test 238
A-3 Computing the Linear Regression Equation 239
A-4 Computing the Two-Way Between-Subjects ANOVA 241
A-5 Computing the One-Way Within-Subjects ANOVA 247

Appendix B: Statistical Tables 252
Appendix C: Answers to Odd-Numbered Study Problems 264
Index 276
Chapter 1
INTRODUCTION TO STATISTICS AND RESEARCH

GOING FORWARD

Your goals in this chapter are to learn:
• The logic of research and the purpose of statistical procedures.
• What a relationship between scores is.
• When and why descriptive and inferential procedures are used.
• What the difference is between an experiment and a correlational study, and what the independent variable, the conditions, and the dependent variable are.
• What the four scales of measurement are.

Sections
1-1 Learning about Statistics
1-2 The Logic of Research
1-3 Understanding Relationships
1-4 Applying Descriptive and Inferential Statistics
1-5 Understanding Experiments and Correlational Studies
1-6 The Characteristics of Scores

Okay, so you're taking a course in statistics. What does this involve? Well, first of all, statistics involve math, but if that makes you a little nervous, you can relax: You do not need to be a math wizard to do well in this course. You need to know only how to add, subtract, multiply, and divide—and use a calculator. Also, the term statistics is often shorthand for statistical procedures, and statisticians have already developed the statistical procedures you'll be learning about. So you won't be solving simultaneous equations, performing proofs and derivations, or doing other mystery math. You will simply learn how to select the statistical procedure—the formula—that is appropriate for a given situation and then compute and interpret the answer. And don't worry, there are not that many to learn, and these fancy-sounding "procedures" include such simple things as computing an average or drawing a graph. (A quick refresher in math basics is in Appendix A.1. If you can do that, you'll be fine.)
Instead of thinking of statistics as math problems, think of them as tools that psychologists and
other behavioral researchers employ when “analyzing” the results of their research. Therefore, for
you to understand statistics, your first step is to
understand the basics of research so that you can
see how statistics fit in. To get you started, in this
chapter we will discuss (1) what learning statistics
involves, (2) the logic of research and the purpose
of statistics, (3) the two major types of studies that
researchers conduct, and (4) the four ways that
researchers measure behaviors.
1-1 LEARNING ABOUT STATISTICS
Why is it important to learn statistics? Statistical procedures are an important part of the research that forms
the basis for psychology and other behavioral sciences.
People involved with these sciences use statistics and statistical concepts every day. Even if you are not interested
in conducting research yourself, understanding statistics
is necessary for comprehending other people’s research
and for understanding your chosen field of study.
How do researchers use statistics? Behavioral
research always involves measuring behaviors. For
example, to study intelligence, researchers measure
the IQ scores of individuals, or to study memory, they
measure the number of things that people remember
or forget. We call these scores the data. Any study
typically produces a very large batch of data, and it
is at this point that researchers apply statistical procedures, because statistics help us to make sense out of
the data. We do this in four ways.
1. First, we organize the scores so that we can see
any patterns in the data. Often this simply involves
creating a table or graph.
2. Second, we summarize the data. Usually we don’t
want to examine each individual score in a study,
and a summary—such as the average score—
allows us to quickly understand the general
characteristics of the data.
3. Third, statistics communicate the results of a study.
You will learn the standard techniques and symbols we use to quickly and clearly communicate
results, especially in published research reports.
4. Finally, we use statistics to interpret what the
data indicate. All behavioral research is designed
to answer a question about a behavior and,
ultimately, we must decide what the data tell us
about that behavior.
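The first two of these jobs are easy to picture concretely. Below is a minimal sketch of organizing scores into a frequency table and then summarizing them with an average; the code is an illustration in Python (not something the book uses), and the scores are invented.

```python
# A minimal sketch of the first two uses of statistics: organizing
# scores into a frequency table and summarizing them with an average.
# Python and the invented scores are illustrative, not from the book.
from collections import Counter

scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 4]  # an invented batch of data

# 1. Organize: a frequency table shows any pattern in the scores.
for score, count in sorted(Counter(scores).items()):
    print(f"score {score} occurred {count} time(s)")

# 2. Summarize: the average describes the whole batch with one number.
print(f"average score: {sum(scores) / len(scores):.2f}")
```

The remaining two jobs, communicating and interpreting, are about how you report the answer and what you conclude from it, not about what you compute.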
THE PURPOSE OF STATISTICAL PROCEDURES IS TO MAKE SENSE OUT OF DATA.

You'll see there are actually only a few different ways that behavioral research is generally conducted, and for each way, there are slightly different formulas that we use. Thus, in a nutshell, the purpose of this course is to familiarize you with each research approach, teach you the appropriate formulas for that approach, and show you how to use the answers you compute to make sense out of the data (by organizing, summarizing, communicating, and interpreting).

Along the way, it is easy to get carried away and concentrate on only the formulas and calculations. However, don't forget that statistics are a research tool that you must learn to apply. Therefore, more than anything else, your goal is to learn when to use each procedure and how to interpret its answer.

1-1a Studying Statistics

The nature of statistics leads to some "rules" for how to approach this topic and how to use this book.

• You will be learning novel ways to think about the information conveyed by numbers. You need to carefully read and study the material, and often you will need to read it again. Don't try to "cram" statistics. You won't learn anything (and your brain will melt). You must translate the new terminology and symbols into things that you understand, and that takes time and effort.

• Don't skip something if it seems difficult because concepts and formulas build upon previous ones. Following each major topic in a chapter, test yourself with the in-chapter "Quick Practice." If you have problems with it, go back—you missed something. (Also, the beginning of each chapter lists what you should understand from previous chapters. Make sure you do.)

• Researchers use a shorthand "code" for describing statistical analyses and communicating research results. A major part of learning statistics is learning this code. Once you speak the language, much of the mystery of statistics will evaporate. So learn (memorize) the terminology by using the glossary in the page margins and the other learning aids that are provided.

• The only way to learn statistics is to do statistics, so you must practice using the formulas and concepts. Therefore, at the end of each chapter are study questions that you should complete. Seriously work on these questions. (This is the practice test before the real test!) The answers to the odd-numbered problems are in Appendix C, and your instructor has the answers to the even-numbered problems.

• At the end of this book are two tear-out "Review Cards" for each chapter. They include: (1) a Chapter Summary, with linkage to key vocabulary terms; (2) a Procedures and Formulas section, where you can review how to use the formulas and procedures (keep it handy when doing the end-of-chapter study questions); and (3) a Putting It All Together fill-in-the-blank exercise that reviews concepts, procedures, and vocabulary. (Complete this for all chapters to create a study guide for the final exam.)

• You cannot get too much practice, so also visit the CourseMate website as described on the inside cover of this book. A number of study tools are provided for each chapter, including printable flashcards, interactive crossword puzzles, and more practice problems.
1-1b Using the SPSS Computer Program

In this book we'll use formulas to compute the answers "by hand" so that you can see how each is produced. Once you are familiar with statistics, though, you will want to use a computer. One of the most popular statistics programs is called SPSS. At the end of most chapters in this book is a brief section relating SPSS to the chapter's procedures, and you'll find step-by-step instructions on one of the Chapter Review Cards. (Review Card 1.4 describes how to get started by entering data.) These instructions are appropriate for version 20 and other recent versions of SPSS. Establish a routine of using the data from odd-numbered study problems at the end of a chapter and checking your answers in Appendix C.
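To see the shape of that routine without SPSS in front of you, here is a rough parallel sketched in Python (my substitution for SPSS, not part of the book): enter the data, run a descriptive procedure, and compare the result with the answer you compute by hand.

```python
# A rough parallel to the SPSS routine described above, written in
# Python as a stand-in for SPSS. The data-entry step becomes a list,
# the point-and-click procedure becomes a function call, and the last
# step is still checking the answer against the one worked by hand.
import statistics

study_hours = [1, 1, 2, 2, 3, 4, 4, 5]  # invented scores, "entered" as data

mean_hours = statistics.mean(study_hours)
print(f"mean study time: {mean_hours:.2f} hours")  # compare to the hand answer
```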
TO BRUSH UP ON YOUR MATH SKILLS, CHECK OUT THE REVIEW OF BASIC MATH IN APPENDIX A.1 ON PAGE 234.
But remember, computer programs do only what you
tell them to do. SPSS cannot decide which statistical
procedure to compute in a particular situation, nor
can it interpret the answer for you. You really must
learn when to use each statistic and what the answer
means.
1-2 THE LOGIC OF RESEARCH

The goal of behavioral research is to understand the "laws of nature" that apply to the behaviors of living organisms. That is, researchers assume that specific influences govern every behavior of all members of a particular group. Although any single study is a small step in this process, our goal is to understand every factor that influences the behavior. Thus, when researchers study such things as the mating behavior of sea lions or social interactions among humans, they are ultimately studying the laws of nature.

The reason a study is a small step is because nature is very complex. Therefore, research involves a series of translations that simplify things so that we can examine a specific influence on a specific behavior in a specific situation. Then, using our findings, we generalize back to the broader behaviors and laws we began with. For example, here's an idea for a simple study. Say that we think a law of nature is that people must study information in order to learn it. We translate this into the more specific hypothesis that "the more you study statistics, the better you'll learn them." Next, we will translate the hypothesis into a situation where we can observe and measure specific people who study specific material in different amounts, to see if they do learn differently. Based on what we observe, we have evidence for working back to the general law regarding studying and learning.

The first part of this translation process involves samples and populations.

population: The large group of individuals to which a law of nature applies.
sample: A relatively small subset of a population intended to represent the population.
participants: The individuals who are measured in a sample.

1-2a Samples and Populations

When researchers talk of a behavior occurring in nature, they say it occurs in the population. A population is the entire group of individuals to which a law of nature applies (whether all humans, all men, all 4-year-old English-speaking children, etc.). For our example, the population might be all college students who take statistics. A population usually contains all possible members—past, present, and future—so we usually consider it to be infinitely large. However, to study an infinite population would take roughly forever! Instead, we study a sample from the population. A sample is a relatively small subset of a population that is intended to represent, or stand in for, the population. Thus, we might study the students in your statistics class as a sample representing the population of all college students studying statistics. The individuals measured in a sample are called the participants and it is their scores that constitute our data.
Although researchers ultimately discuss the behavior of individuals, in statistics we often go directly to their scores. Thus, we will talk about the population of scores as if we have already measured the behavior of everyone in the population in a particular situation. Likewise, we will talk about a sample of scores, implying that we have already measured our participants. Thus, a population is the complete group of scores that would be found for everyone in a particular situation, and a sample is a subset of those scores that we actually measure in that situation.

The logic behind samples and populations is this: We use the scores in a sample to infer—to estimate—the scores we would expect to find in the population if we could measure it. Then by translating the scores back into the behaviors they reflect, we can infer the behavior of the population. By describing the behavior of the population, we are describing how nature works, because the population is the entire group to which the law of nature applies. Thus, if we observe that greater studying leads to better learning for the sample of students in your statistics class, we will infer that similar scores and behaviors would be found in the population of all statistics students. This provides evidence that, in nature, more studying does lead to better learning.

Notice that the above assumes that a sample is representative of the population. We discuss this issue in later chapters, but put simply, the individuals in a representative sample accurately reflect the individuals that are found in the population. This means that our inferences about the scores and behaviors found in the population will also be accurate. Thus, if your class is representative of all college students, then the scores the class obtains are a good example of the scores that everyone in the population would obtain.

On the other hand, any sample can be unrepresentative and then it inaccurately reflects the population. The reason this occurs is simply due to random chance—the "luck of the draw" of who we happen to select for a sample. Thus, maybe, simply because of who happened to enroll in your statistics class, it contains some very unusual, atypical students who are not at all like those in the population. If so, then their behaviors and scores will mislead us about those of the typical statistics student. Therefore, as you'll see, researchers always consider the possibility that a conclusion about the population—about nature—might be incorrect because it might be based on an unrepresentative sample.

Researchers study the behavior of the individuals in a sample by measuring specific variables.
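Here is that sample-to-population logic in runnable form, as a minimal simulation sketch (Python and every number in it are my illustration, not the book's): we can only ever compute the sample's average, and we use it to estimate the average of the population we could not measure.

```python
# A minimal simulation of inferring a population from a sample.
# The "population" here is invented so we can peek at the true answer;
# in real research, only the sample is ever measured.
import random

random.seed(42)
population = [random.randint(0, 6) for _ in range(100_000)]  # study-time scores

sample = random.sample(population, 25)  # the participants we actually measure
sample_mean = sum(sample) / len(sample)
true_mean = sum(population) / len(population)

print(f"sample mean (what we compute): {sample_mean:.2f}")
print(f"population mean (what the sample mean estimates): {true_mean:.2f}")
```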
variable: Anything about a behavior or situation that, when measured, can produce two or more different scores.

1-2b Understanding Variables

We measure aspects of the situation that we think influence a behavior, and we measure aspects of the behavior itself. The aspects of the situation or behavior that we measure are called variables. A variable is anything that can produce two or more different scores. A few of the variables found in behavioral research include characteristics of an individual, like your age, race, gender, personality type, political affiliation, and physical attributes. Other variables measure your reactions, such as how anxious, angry, or aggressive you are, or how attractive you think someone is. Sometimes variables reflect performance, such as how hard you work at a task or how well you recall a situation. And variables also measure characteristics of a situation, like the amount of noise, light, or heat that is present; the difficulty or attributes of a task; or
how many others are present and the types of interactions you have with them.
Variables fall into two general categories. If a
score indicates the amount of a variable that is present,
the variable is a quantitative variable. A person’s
height, for example, is a quantitative variable. Some
variables, however, cannot be measured in amounts,
but instead classify or categorize an individual on
the basis of some characteristic. These variables are
called qualitative variables. A person’s gender, for
example, is qualitative because the “score” of male or
female indicates a quality, or category.
For our research on studying and learning statistics, say that to measure “studying,” we select the
variable of the number of hours that students spent
studying for a particular statistics test. To measure
“learning,” we select the variable of their performance on the test. After measuring participants’
scores on these variables, we examine the relationship
between them.
Table 1.1
Scores Showing a Relationship between the Variables of Study Time and Test Grades
FYI: The data presented in this book are fictional. Any resemblance to real data is purely a coincidence.

Student   Study Time in Hours   Test Grades
Gary      1                     F
Bo        1                     F
Sue       2                     D
Tony      2                     D
Sidney    3                     C
Ann       4                     B
Rose      4                     B
Lou       5                     A
1-3 UNDERSTANDING RELATIONSHIPS
If nature relates those mental activities we call studying to those mental activities we call learning, then different amounts of learning should occur with different amounts of studying. In other words, there should
be a relationship between studying and learning. A
relationship is a pattern in which, as the scores on
one variable change, the scores on the other variable
change in a consistent manner. In our example, we
predict the relationship in which the longer you study,
the higher your test grade will be.
Say that we ask some students how long they studied for a test and their subsequent grades on the test.
We obtain the data in Table 1.1. To see the relationship, first look at those people who studied for 1 hour
and see their grade. Then look at those who studied
2 hours, and see that they had a different grade from
those studying 1 hour. And so on. These scores form
a relationship because as the study-time scores change
(increase), the test grades also change in a consistent
fashion (also increase). Further, when study-time
scores do not change (e.g., Gary and Bo both studied
for 1 hour), the grades also do not change (they both
received Fs). We often use the term association when
talking about relationships: Here, low study times are
associated with low test grades and high study times
are associated with high test grades.
In a relationship, as the scores on
one variable change, the scores
on the other variable change in
a consistent manner.
Because we see a relationship in these sample data, we have evidence that in nature, studying and learning do operate as we hypothesized: The amount someone studies does seem to make a difference in test grades. In the same way, whenever a law of nature ties behaviors or events together, then we'll see that particular scores from one variable are associated with particular scores from another variable so that a relationship is formed. Therefore, most research is designed to investigate relationships, because relationships are the tell-tale signs of a law at work.

quantitative variable: A variable for which scores reflect the amount of the variable that is present.
qualitative variable: A variable for which scores reflect a quality or category that is present.
relationship: A pattern between two variables where a change in one variable is accompanied by a consistent change in the other.

A major use of statistical procedures is to examine the scores in a relationship and the pattern they form. The simplest relationships fit one of two patterns. Let's call one
variable X and the other Y. Then, sometimes the relationship fits the description “the more you X, the
more you Y.” Examples of this include the following:
The more you study, the higher your grade; the more
alcohol you drink, the more you fall down; the more
often you speed, the more traffic tickets you receive;
and even that old saying “The bigger they are, the
harder they fall.”
At other times, the relationship fits the description
“the more you X, the less you Y.” Examples of this
include the following: The more you study, the fewer
the errors you make; the more alcohol you drink, the
less coordinated you are; the more you “cut” classes,
the lower your grades; and even that old saying “The
more you practice statistics, the less difficult they are.”
Relationships may also form more complicated
patterns where, for example, more X at first leads to
more Y, but beyond a certain point, even more X leads
to less Y. For example, the more you exercise the better you feel, until you reach a certain point, beyond
which more exercise leads to feeling less well, due to
pain and exhaustion.
Although the above examples involve quantitative
variables, we can also study relationships that involve
qualitative variables. For example, gender is a commonly studied qualitative variable. If you think of
being male or female as someone’s “score” on the gender variable, then we see a relationship when, as gender
scores change, scores on another variable also change.
For example, saying that men tend to be taller than
women is actually describing a relationship, because
as gender scores change (going from men to women),
their corresponding height scores tend to decrease.
1-3a The Consistency of a Relationship
Table 1.1 showed a perfectly consistent association
between hours of study time and test grades: All
those who studied the same amount received the same
grade. In a perfectly consistent relationship, a score
on one variable is always paired with one and only
one score on the other variable. This makes for a very
clear, obvious pattern when you examine the data. In
the real world, however, not everyone who studies for
the same amount of time will receive the same test
grade. (Life is not fair.) A relationship can be present
even if there is only some degree of consistency. Then,
as the scores on one variable change, the scores on the
other variable tend to change in a consistent fashion.
This produces a less obvious pattern in the data.
Table 1.2
Scores Showing a Relationship between Study Time and Number of Errors on Test

Student   X Hours of Study   Y Errors on Test
Amy       1                  12
Karen     1                  13
Joe       1                  11
Cleo      2                  11
Jack      2                  10
Maria     2                  9
Terry     3                  9
Manny     3                  10
Chris     4                  9
Sam       4                  8
Gary      5                  7
For example, Table 1.2 presents a less consistent
relationship between the number of hours studied
and the number of errors made on the test. Notice
that the variables are also labeled X and Y. When
looking at a relationship, get in the habit of asking,
“As the X scores increase, do the Y scores change in
a consistent fashion?” Answer this by again looking
at one study-time score (at one X score) and seeing
the error scores (the Y scores) that are paired with it.
Then look at the next X score and see the Y scores
paired with it. Two aspects of the data in Table 1.2
produce a less consistent relationship: First, not
everyone who studies for a particular time receives
the same error score (e.g., 12, 13, and 11 errors are
all paired with 1 hour). Second, sometimes a particular error score is paired with different studying
scores (e.g., 11 errors occur with both 1 and 2 hours
of study). These aspects cause overlapping groups of
different error scores to occur at each study time, so
the overall pattern is harder to see. In fact, the greater
the differences among the group of Y scores at an X
and the more the Y scores overlap between groups,
the less consistent the relationship will be. Nonetheless, we still see the pattern where more studying
tends to be associated with lower error scores, so a
relationship is present. Essentially, one batch of error
scores occurs at one study-time score, but a different
batch of error scores tends to occur at the next studytime score.
Notice that the less consistent relationship above
still supports our original hypothesis about how nature
operates: We see that, at least to some degree, nature
does relate studying and test errors. Thus, we will always
examine the relationship in our data, no matter how
consistent it is. A particular study can produce anywhere
between a perfectly consistent relationship and no relationship. In Chapter 10 we will discuss in depth how
to describe and interpret the consistency of a particular
relationship. (As you’ll see, the degree of consistency in
a relationship is called its strength, and a less consistent
relationship is a weaker relationship.) Until then, it is
enough for you to simply know what a relationship is.
A relationship is present
(though not perfectly consistent)
if there tends to be a different
group of Y scores associated
with each X score. A relationship
is not present when virtually the
same batch of Y scores is paired
with every X score.
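That rule can be checked mechanically: collect the batch of Y scores paired with each X and compare the batches from one X to the next. Here is a minimal sketch using the scores from Tables 1.2 and 1.3 (the code itself, and the choice of Python, are mine rather than the book's):

```python
# Group Y scores by X to see whether a different batch of Y scores
# occurs at each X (a relationship is present) or virtually the same
# batch shows up at every X (no relationship). The pairs are the
# scores from Tables 1.2 and 1.3.
from collections import defaultdict

def batches_by_x(pairs):
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return dict(sorted(groups.items()))

table_1_2 = [(1, 12), (1, 13), (1, 11), (2, 11), (2, 10), (2, 9),
             (3, 9), (3, 10), (4, 9), (4, 8), (5, 7)]
table_1_3 = [(1, 12), (1, 10), (1, 8), (2, 11), (2, 10), (2, 9),
             (3, 12), (3, 9), (3, 10), (4, 11), (4, 10), (4, 8)]

print(batches_by_x(table_1_2))  # batches drift lower as X increases
print(batches_by_x(table_1_3))  # essentially the same batch at every X
```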
1-3b When No Relationship Is Present

At the other extreme, sometimes the scores from two variables do not form a relationship. For example, say that we had obtained the data shown in Table 1.3. Here, no relationship is present because the error scores paired with 1 hour are essentially the same as the error scores paired with 2 hours, and so on. Thus, virtually the same (but not identical) batch of error scores shows up at each study time, so no pattern of increasing or decreasing errors is present. These data show that how long people study does not make a consistent difference in their error scores. Therefore, this result would not provide evidence that studying and learning operate as we think.

Less studying may lead to more errors ...

Table 1.3
Scores Showing No Relationship between Hours of Study Time and Number of Errors on Test

Student   X Hours of Study   Y Errors on Test
Amy       1                  12
Karen     1                  10
Joe       1                  8
Cleo      2                  11
Jack      2                  10
Maria     2                  9
Terry     3                  12
Manny     3                  9
Chris     3                  10
Sam       4                  11
Jane      4                  10
Gary      4                  8
> Quick Practice

A relationship is present when, as the scores on one variable change, the scores on another variable tend to change in a consistent fashion.

More Examples
Below, Sample A shows a perfect relationship: One Y score occurs at only one X. Sample B shows a less consistent relationship: Sometimes different Ys occur at a particular X, and the same Y occurs with different Xs. Sample C shows no relationship: The same Ys tend to show up at every X.

Sample A      Sample B      Sample C
X   Y         X   Y         X   Y
1   20        1   12        1   12
1   20        1   15        1   15
1   20        1   20        1   20
2   25        2   20        2   20
2   25        2   30        2   12
2   25        2   40        2   15
3   30        3   40        3   20
3   30        3   40        3   15
3   30        3   50        3   12

For Practice
Which samples show a perfect, inconsistent, or no relationship?

Sample A     Sample B     Sample C     Sample D
X   Y        X    Y       X    Y       X    Y
2   4        80   80      33   28      40   60
2   4        80   79      33   20      40   60
3   6        85   76      43   27      45   60
3   6        85   75      43   20      45   60
4   8        90   71      53   20      50   60
4   8        90   70      53   28      50   60

> Answers
A: Perfect Relationship
B: Inconsistent Relationship
C and D: No Relationship

1-4 APPLYING DESCRIPTIVE AND INFERENTIAL STATISTICS

Statistics help us make sense out of data, and now you can see that "making sense" means to understand the scores and the relationship they form. However, because we are always talking about samples and populations, we separate statistical procedures into those that apply to samples and those that apply to populations.

Descriptive statistics are procedures for organizing and summarizing sample data. The answers from such procedures are often a single number that describes important information about the scores. (When you see descriptive, think describe.) A sample's average, for example, is an important descriptive statistic because in one number we summarize all scores in the sample. Descriptive statistics are also used to describe the relationship in sample data. For our study-time research, for example, we'd want to know whether a relationship is present, how consistently errors decrease with increased study time, and so on. (We'll discuss the common descriptive procedures in the next few chapters.)

After describing the sample, we want to use that information to estimate or infer the data we would find if we could measure the entire population. However, we cannot automatically assume that the scores and the relationship we see in the sample are what we would see in the population: Remember, the sample might be unrepresentative, so that it misleads us about the population. Therefore, first we apply additional statistical procedures. Inferential statistics are procedures for drawing inferences about the scores and relationship that would be found in the population. Essentially, inferential procedures help us to decide whether our sample accurately represents the relationship found in the population. If it does, then, for
example, we would use the class average as an estimate of the average score we'd find in the population of students. Or, we would use the relationship in our sample to estimate how, for everyone, greater learning tends to occur with greater studying. (We discuss inferential procedures in the second half of this book.)

descriptive statistics: Procedures for organizing and summarizing sample data.
inferential statistics: Procedures for determining whether sample data accurately represent the relationship in the population.
statistic: A number that describes a sample of scores; symbolized by a letter from the English alphabet.
parameter: A number that describes a population of scores; symbolized by a letter from the Greek alphabet.

1-4a Statistics versus Parameters

Researchers use the following system so that we know when we are describing a sample and when we are describing a population. A number that describes an aspect of the scores in a sample is called a statistic. Thus, a statistic is an answer obtained from a descriptive procedure. We compute different statistics to describe different aspects of the data, and the symbol for each is a different letter from the English alphabet. On the other hand, a number that describes an aspect of the scores in the population is called a parameter. Thus, a parameter is obtained when applying inferential procedures. The symbols for the different parameters are letters from the Greek alphabet.

For example, the average in your statistics class is a sample average, a descriptive statistic that is symbolized by a letter from the English alphabet. If we then estimate the average in the population, we are estimating a parameter, and the symbol for a population average is a letter from the Greek alphabet.

After performing the appropriate descriptive and inferential procedures, we stop being a "statistician" and return to being a behavioral scientist: We interpret the results in terms of the underlying behaviors, psychological principles, sociological influences, and so on, that they reflect. This completes the circle, because by describing the behavior of everyone in the population in a given situation, we are describing how a law of nature operates.
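For instance, with the average (the specific symbols below are the widely used convention, shown for illustration; the book introduces its own symbols in later chapters):

$$\bar{X} = \frac{\Sigma X}{N} \quad \text{(a sample average: an English letter, so a statistic)}$$

$$\mu \quad \text{(a population average: the Greek letter mu, so a parameter)}$$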
1-5 UNDERSTANDING EXPERIMENTS AND CORRELATIONAL STUDIES

In research we can examine a relationship using a variety of different kinds of studies. In other words, we use different designs. The design of a study is how it is laid out—how many samples are examined, how participants are selected and tested, and so on. A study's design is important because different designs require different descriptive and inferential procedures. Recall that your goal is to learn when to use each statistical procedure and, in part, that means learning the particular procedures that are appropriate for a particular design. (On the tear-out cards in your book is a decision tree for selecting procedures, which you should refer to as you learn statistics.)

To begin, recognize that we have two major types of designs because we have two general ways of demonstrating a relationship: using experiments or using correlational studies.

design: The way in which a study is laid out.
experiment: A study in which one variable is actively changed or manipulated and scores on another variable are measured to determine whether a relationship occurs.
independent variable: In an experiment, a variable manipulated by the experimenter that is hypothesized to cause a change in the dependent variable.
condition: An amount or category of the independent variable that creates the specific situation under which participants' scores on the dependent variable are measured.
dependent variable: In an experiment, the behavior or attribute of participants that is measured; expected to be influenced by the independent variable.
1-5a Experiments

In an experiment, the researcher actively changes or manipulates one variable and then measures participants' scores on another variable to see if a relationship is produced. For example, say that we study amount of study time and test errors in an experiment. We decide to compare 1, 2, 3, and 4 hours of study time, so we select four samples of students. We have one sample study for 1 hour, administer the statistics test, and count the number of errors each participant makes. We have another sample study for 2 hours, administer the test, and count their errors, and so on. Then we determine if we have produced the relationship where, as we increase study time, error scores tend to decrease.

You must understand the components of an experiment and learn their names.

THE INDEPENDENT VARIABLE  An independent variable is the variable that is changed or manipulated by the experimenter. We manipulate this variable because we assume that doing so will cause the behavior and scores on the other variable to change. Thus, in our example above, amount of study time is our independent variable: We manipulate study time because doing this should cause participants' error scores to change in the predicted way. (To prove that this variable is actually the cause is a very difficult task, which we'll save for an advanced discussion. In the meantime, be cautious when using the word cause.) You can remember independent because this variable occurs independently of participants' wishes (we'll have some participants study for 4 hours whether they want to or not).

Technically, a true independent variable is manipulated by doing something to participants. However, there are many variables that an experimenter cannot manipulate in this way. For example, we might hypothesize that growing older causes a change in some behavior, but we can't make some people be 20 years old and make others be 60 years old. Instead, we would manipulate the variable by selecting one sample of 20-year-olds and one sample of 60-year-olds. We will also call this type of variable an independent variable (although technically it is called a quasi-independent variable). Statistically, we treat all independent variables the same.

Thus, the experimenter is always in control of the independent variable, either by determining what is done to each sample or by determining a characteristic of the individuals in each sample. Therefore, a participant's "score" on the independent variable is determined by the experimenter: Above, students in the sample that studied 1 hour have a score of 1 on the study-time variable; people in the 20-year-old sample have a score of 20 on the age variable.

CONDITIONS OF THE INDEPENDENT VARIABLE  An independent variable is the overall variable a researcher manipulates, which is potentially composed of many different amounts or categories. From these the researcher selects the conditions. A condition is the specific amount or category of the independent variable that creates the situation under which participants are studied. Thus, although our independent variable is amount of study time—which could be any amount—our conditions involve 1, 2, 3, or 4 hours of study. Likewise, if we compare 20-year-olds to 60-year-olds, then 20 and 60 are each a condition of the independent variable of age.

THE DEPENDENT VARIABLE  The dependent variable is the variable that measures a behavior or attribute of participants that we expect will be
influenced by the independent variable. Therefore, we measure participants' scores on the dependent variable in each condition. You can remember dependent because whether a score is high or low presumably depends on a participant's reaction to the condition. (This variable reflects the behavior that is "caused" in the relationship.)

Table 1.4
Summary of Identifying an Experiment's Components

Researcher's Activity              Role of Variable                              Name of Variable        Amounts of Variable Present
Researcher manipulates variable →  Variable influences a behavior →             Independent variable →  Conditions
Researcher measures variable →     Variable measures behavior that is influenced →  Dependent variable →    Scores

Thus, in our studying experiment, test errors
is our dependent variable because these scores depend on how participants respond to their particular study time. Or, in a different experiment, if we compare the activity levels of 20- and 60-year-olds, then participants' activity level is the dependent variable because presumably it depends on their age. Note: The dependent variable is also called the "dependent measure" and we obtain "dependent scores."

IDENTIFYING AN EXPERIMENT'S COMPONENTS  It is important that you can identify independent and dependent variables, so let's practice: Say my experiment is to determine if a person's concentration is improved immediately after physical exercise. First, recognize that implicitly, we are always looking for a relationship, so I'm really asking, "Is it true that the more people exercise, the more their concentration improves?" Therefore, also implicitly, I'm going to need to measure the concentration of different participants after I make them get different amounts of exercise. So what are the variables? Use Table 1.4, which summarizes the decision process. (The table is also on Review Card 1.2.) What is the variable I'm manipulating because I think it influences a behavior? Amount of exercise; so it is my independent variable (and the different amounts that participants exercise are my conditions). What is the variable I'm measuring because it reflects a behavior I think is being influenced? Concentration; so it is my dependent variable that produces my data.

1-5b Drawing Conclusions from an Experiment

As we change the conditions of the independent variable, participants' scores on the dependent variable should also change in a consistent fashion. To see this relationship, a useful way to diagram an experiment is shown in Table 1.5. Each column in the diagram is a condition of the independent variable (here, amount of study time). The numbers in a column are the scores on the dependent variable from participants who were tested under that condition (here, each score is the number of test errors).

Table 1.5
Diagram of an Experiment Involving the Independent Variable of Number of Hours Spent Studying and the Dependent Variable of Number of Errors Made on a Statistics Test

Independent Variable: Number of Hours Spent Studying
                         Condition 1:   Condition 2:   Condition 3:   Condition 4:
                         1 Hour         2 Hours        3 Hours        4 Hours
Dependent Variable:          13              9              7              5
Number of Errors Made        12              8              6              3
on a Statistics Test         11              7              5              2

Remember that a condition determines participants' scores on the independent variable: Participants in the 1-hour condition each have a score of "1" on the independent variable, those under 2 hours have a score of "2," and so on. Thus, the diagram communicates pairs of scores consisting of 1-13, 1-12, 1-11; then 2-9, 2-8, 2-7, etc.

Now look for the relationship as we did previously: First look at the error scores paired with 1 hour,
then at the error scores paired with 2 hours, and so
on. The pattern here forms a relationship where,
as study-time scores increase, error scores tend to
decrease. Essentially, participants in the 1-hour condition produce one batch of error scores, those in the
2-hour condition produce a different, lower batch of
error scores, and so on.
We use this diagram because it facilitates applying
our statistics. For example, it makes sense to compute
the average error score in each condition (each column).
Notice, however, that we apply statistics to the dependent variable. We do not know what scores our participants will produce, so these are the scores that we need
help in making sense of (especially in a more realistic
study where we might have 100 different scores in each
column). We do not compute anything about the independent variable because we know all about it (e.g.,
above we have no reason to compute the average of 1,
2, 3, and 4 hours). Rather, the conditions simply form
the groups of dependent scores that we then examine.
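To make the column-wise summary concrete, here is a minimal sketch in Python (our own illustration, not part of the text) that stores the Table 1.5 scores by condition and computes the mean error score in each column:

```python
# Dependent scores (test errors) from Table 1.5, grouped by the
# condition of the independent variable (hours of study).
errors_by_condition = {
    1: [13, 12, 11],   # 1 hour
    2: [9, 8, 7],      # 2 hours
    3: [7, 6, 5],      # 3 hours
    4: [5, 3, 2],      # 4 hours
}

# We summarize only the dependent variable: one mean per condition.
for hours, errors in sorted(errors_by_condition.items()):
    mean_errors = sum(errors) / len(errors)
    print(f"{hours} hour(s) of study: mean errors = {mean_errors:.2f}")
```

Running this prints means of 12.00, 8.00, 6.00, and 3.33, showing the relationship in the diagram: as study time increases, mean errors decrease.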
Thus, we will use specific descriptive procedures
to summarize the sample’s scores and the relationship
found in an experiment. Then, to infer that we’d see a
similar relationship if we tested the entire population,
we have specific inferential procedures for experiments. Finally, we will translate the relationship back
to the original hypothesis about studying and learning
that we began with, so that we can add to our understanding of nature.
1-5c Correlational Studies

correlational study  A design in which participants' scores on two variables are measured, without manipulation of either variable, to determine whether they form a relationship
Not all research is an experiment.
Sometimes we do not manipulate or
change either variable and instead
conduct a correlational study. In a
correlational study, the researcher measures par-
ticipants’ scores on two variables and then determines
whether a relationship is present. Thus, in an experiment the researcher attempts to make a relationship
happen, while in a correlational study the researcher
is a passive observer who looks to see if a relationship
exists. For example, we used a correlational approach
previously when we simply asked some students how
long they studied for a test and what their test grade
was. Or, we would have a correlational design if we
asked people their career choices and measured their
personality, asking, “Is career choice related to personality type?”
As usual, we want to first describe and understand
the relationship we’ve observed in the sample, and correlational designs have their own descriptive statistical
procedures for doing this. Here we do not know the
scores that participants will produce for either variable, so the starting point for making sense of them is
often to compute the average score on each variable.
Also, to decide about the relationship we would find in
the population, we have specific correlational inferential procedures. Finally, as with an experiment, we will
translate the relationship back to the original hypothesis about studying and learning that we began with so
that we can add to our understanding of nature.
In a correlational study, the researcher simply measures participants' scores on two variables to determine if a relationship exists.
> Quick Practice
> In an experiment, the researcher changes the conditions of the independent variable and then measures participants' behavior using the dependent variable.
> In a correlational design the researcher measures participants on two variables.
More Examples
In a study, participants’ relaxation scores are measured
after they’ve been in a darkened room for either 10,
20, or 30 minutes. This is an experiment because the
researcher controls the length of time in the room. The
independent variable is length of time, the conditions
are 10, 20, or 30 minutes, and the dependent variable
is relaxation.
A survey measures participants’ patriotism and
also asks how often they’ve voted. This is a correlational
design because the researcher passively measures both
variables.
For Practice
1. In an experiment, the _____ is changed by the
researcher to see if it produces a change in participants’ scores on the _____.
2. To see if drinking influences one's ability to drive,
each participant's level of coordination is measured
after drinking 1, 2, or 3 ounces of alcohol. The independent variable is _____, the conditions are _____,
and the dependent variable is _____.
3. In an experiment, the _____ variable reflects participants’ behaviors or attributes.
4. We measure the age and income of 50 people to
see if older people tend to make more money. What
type of design is this?
> Answers
1. independent variable; dependent variable
2. amount of alcohol; 1, 2, or 3 ounces; level of coordination
3. dependent
4. correlational

1-6 THE CHARACTERISTICS OF SCORES

We have one more issue to consider when selecting the descriptive or inferential procedure to use in a particular experiment or correlational study. Although we always measure one or more variables, the numbers that comprise the scores can have different underlying mathematical characteristics. The particular characteristics of our scores determine which procedures we should use, because the kinds of math we can perform depend on the kinds of numbers we have. Therefore, always pay attention to two important characteristics of your scores: the scale of measurement involved and whether the measurements are continuous or discrete.
1-6a The Four Types of
Measurement Scales
Numbers mean different things in different contexts.
The meaning of a 1 on a license plate is different from
that of a 1 in a race, which is different still from the
meaning of a 1 in a hockey score. The kind of information that scores convey depends on the scale of
measurement that is used in measuring the variable.
There are four types of measurement scales: nominal,
ordinal, interval, and ratio.
With a nominal scale, we do not measure an
amount, but rather we categorize or classify individuals. For example, to “measure” your gender, we
classify you as either male or female. Rather than
using these labels, however, it is easier for us (and for
computers) to use numbers to identify the categories.
For example, we might assign a “1” to each male and
a “2” to each female. These scores involve a nominal
scale because the numbers are used simply for identification (so for nominal, think name). Such scores
are assigned arbitrarily—they don’t reflect an amount,
and we could use any other numbers. Thus, the key
here is that nominal scores indicate only that one
individual is qualitatively different from another. So, the numbers on football uniforms or on your credit card are nominal scales. In research, we have nominal variables when studying different types of schizophrenia or different therapies. These variables can occur in any design, so for example, in a correlational study, we might measure the political affiliation of participants using a nominal scale by assigning a 5 to democrats, a 10 to republicans, and so on. Then we might also measure participants' income, to determine whether as party affiliation "scores" change, income scores also change. Or, if an experiment compares the job satisfaction scores of workers in several different occupations, the independent variable is the nominal variable of type of occupation.

nominal scale  A scale in which each score is used for identification and does not indicate an amount

A different approach is to use an ordinal scale. Here the scores indicate rank order—anything that is akin to 1st, 2nd, 3rd, … is ordinal. (Ordinal sounds like ordered.) In our studying example, we'd have an ordinal scale if we assigned a 1 to students who scored best on the test, a 2 to those in second place, and so on. Then we'd ask, "As study times change, do students' ranks also tend to change?" Or, if an experiment compares 1st graders to 2nd graders, then this independent variable involves an ordinal scale. The key here is that ordinal scores indicate only a relative amount—identifying who scored relatively high or low. Also, there is no score of 0, and the same amount does not separate every pair of adjacent scores: 1st may be only slightly ahead of 2nd, but 2nd may be miles away from 3rd. Other examples of ordinal variables include clothing size (e.g., small, medium, large), college year (e.g., freshman or sophomore), and letter grades (e.g., A or B).

ordinal scale  A scale in which scores indicate rank order

A third approach is to use an interval scale. Here each score indicates an actual quantity, and an equal amount separates any adjacent scores. (For interval scores, remember equal intervals between them.) However, although interval scales do include
the number 0, it is not a true zero—it does not mean
that none of the variable is present. Therefore, the
key is that you can have less than this amount, so an
interval scale allows negative numbers. For example,
temperature (in Celsius or Fahrenheit) involves an
interval scale: Because 0° does not mean that zero
heat is present, you can have even less heat at −1°. In
research, interval scales are common with intelligence
or personality tests: A score of zero does not mean
zero intelligence or zero personality. Or, in our studying research we might determine the average test score
and then assign students a zero if they are average; a
+1, +2, etc., for the amount they are above average;
and a −1, −2, etc., for the amount they are below
average. Then we’d see if more positive scores tend to
occur with higher study times. Or, if we create conditions based on whether participants are in a positive,
negative, or neutral mood, then this independent variable reflects an interval scale.
interval scale  A scale in which scores measure actual amounts; but zero does not mean zero amount is present, so negative numbers are possible

The final approach is to use a ratio scale. Here,
like interval scores, each score measures an actual
quantity, and an equal amount separates adjacent
scores. However, 0 truly means that none of the variable is present. Therefore, the key is that you cannot
have negative numbers, because you cannot have less
than nothing. Also, only with a true zero can we make
“ratio” statements, such as “4 is twice as much as 2.”
(So for ratio, think ratio!) We used ratio scales in our
previous examples when measuring the number of
errors and the number of hours studied. Likewise, if
we compare the conditions of having people on diets
consisting of either 1,000, 1,500, or 2,000 calories a
day, then this independent variable involves a ratio
scale. Other examples of ratio variables include the
level of income in a household, the amount of time
required to complete a task, or the number of items
in a list to be recalled by participants. (See Review
Card 1.2 for a summary of the four scales of measurement.)
ratio scale  A scale in which scores measure actual amounts and zero means no amount of the variable is present, so negative numbers are not possible

We can study relationships that involve any combination of the above scales.
1-6b Continuous versus Discrete
Variables
In addition to considering the scale used to measure a
variable, you must also consider whether the variable
is continuous or discrete. A continuous variable can
be measured in fractional amounts, so decimals make
sense. That is, our measurements continue between
the whole-number amounts, and there is no limit to
how small a fraction may be. Thus, the variable of
age is continuous because it is perfectly intelligent to
say that someone is 19.6879 years old. On the other
hand, some variables are discrete variables, which
are measured in fixed amounts that cannot be broken
into smaller amounts. Usually the amounts are labeled
using whole numbers, so decimals do not make sense.
For example, being male or female, or being in 1st
grade versus 2nd grade are discrete variables, because
you can be in one group or you can be in the other
group, but you can’t be in between. Some variables
may be labeled using fractions, as with shoe sizes, but
they are still discrete because they cannot be broken
into smaller units.

continuous variable  A variable that can be measured in fractional amounts

discrete variable  A variable that cannot be measured in fractional amounts
Usually researchers assume that variables measured using nominal or ordinal scales are discrete,
but that variables measured using interval or ratio
scales are at least theoretically continuous. For example, intelligence tests are designed to produce whole-number scores, so you cannot have an IQ of 95.6. But
theoretically an IQ of 95.6 makes sense, so intelligence is a theoretically continuous (interval) variable.
Likewise, it sounds strange if the government reports
that the average family has 2.4 children, because this
is a discrete (ratio) variable and no one has .4 of a
child. However, it makes sense to treat this as theoretically continuous, because we can interpret what it
means if the average this year is 2.4, but last year it
was 2.8. (I’ve heard that a recent survey showed the
average American home contains 2.5 people and 2.7
televisions!)
> Quick Practice
> Nominal scales identify categories and ordinal scales reflect rank order. Both interval and ratio scales measure actual quantities, but negative numbers can occur with interval scales and not with ratio scales.
> Interval and ratio scales are assumed to be continuous, which allows fractional amounts; nominal and ordinal scales are assumed to be discrete, which does not allow fractional amounts.
More Examples
If your grade on an essay exam is based on the number
of correct statements you include, then a ratio scale is
involved; if it is based on how much your essay is better
or worse than what the professor expected, an interval
scale is involved; if it indicates that yours was relatively
one of the best or worst essays in the class, this is an
ordinal scale (as is pass/fail); if it is based on the last digit
of your ID number, then a nominal scale is involved. If
you can receive one grade or another, but nothing in
between, it involves a discrete scale; if fractions are possible, it involves a continuous scale.
For Practice
1. Whether you are ahead or behind when gambling
involves a(n) _____ scale.
2. The number of hours you slept last night involves
a(n) _____ scale.
3. Your blood type involves a(n) _____ scale.
4. Whether you are a lieutenant or major in the army involves a(n) _____ scale.
5. If scores can contain fractions, the variable is _____; if fractions are not possible, the variable is _____.

> Answers
1. interval
2. ratio
3. nominal
4. ordinal
5. continuous; discrete

Whether a variable is continuous or discrete and whether it is measured using a nominal, ordinal, interval, or ratio scale are factors that determine which statistical procedure to apply.
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com.
STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. What is the goal of behavioral research?
2. Why is it important for students of behavioral
research to understand statistics?
3. (a) What is a population? (b) What is a sample?
(c) How are samples used to make conclusions
about populations? (d) What are researchers really
referring to when they talk about the population?
4. (a) What is a variable? (b) What is a quantitative
variable? (c) What is a qualitative variable?
5. What pattern among the X and Y scores do you
see when: (a) A relationship exists between them?
(b) No relationship is present?
6. What is the difference in the pattern among the
X and Y scores between (a) a perfectly consistent
relationship and (b) a less consistent (weaker)
relationship?
7. (a) What is a representative sample? (b) What is
an unrepresentative sample? (c) What produces an
unrepresentative sample?
8. What is the general purpose of experiments and
correlational studies?
9. What is the difference between an experiment and
a correlational study?
10. In an experiment, what is the dependent variable?
11. What is the difference between the independent
variable and the conditions of the independent
variable?
12. (a) What are descriptive statistics used for?
(b) What are inferential statistics used for?
13. (a) What is the difference between a statistic and a
parameter? (b) What types of symbols are used for
statistics and for parameters?
14. Define the four scales of measurement.
15. (a) Distinguish between continuous and discrete variables. (b) Which scales are usually assumed to be discrete, and which are assumed to be continuous?
16. What are the two aspects of a study to consider when selecting the descriptive or inferential statistics you should employ?
17. Researcher A gives participants various amounts of alcohol and then observes any decrease in their ability to walk. Researcher B notes the various amounts of alcohol that participants drink at a party and then observes any decrease in their ability to walk. Which study is an experiment, and which is a correlational study? Why?
18. Maria asked a sample of college students about their favorite beverage. Based on what the majority said, she concluded that most college students prefer drinking carrot juice to other beverages! What statistical argument can you give for not accepting this conclusion?
19. In the following experiments, identify the independent variable, the conditions, and the dependent variable: (a) studying whether final exam scores are influenced by whether concurrent background music is soft, loud, or absent; (b) comparing students from small, medium, and large colleges with respect to how much fun they have during the semester; (c) studying whether being first-, second-, or third-born is related to intelligence; (d) studying whether length of daily exposure to a sunlamp (15 versus 60 minutes) accounts for differences in depression; (e) studying whether being in a room with blue walls, green walls, or red walls influences aggressive behavior in adolescents.
20. Use the words relationship, sample, population, statistic, and parameter to describe the flow of a research study to determine whether a relationship exists in nature.
21. Which of the following data sets show a relationship?

    Sample A     Sample B     Sample C     Sample D
    X    Y       X    Y       X    Y       X    Y
    1    10      20   40      13   20      92   76
    1    10      20   42      13   19      92   75
    1    10      22   40      13   18      92   77
    2    20      22   41      13   17      95   74
    2    20      23   40      13   15      95   74
    3    30      24   40      13   14      97   73
    3    30      24   42      13   13      97   74
22. Which sample in problem 21 shows the most consistent relationship? How do you know?
23. What pattern do we see when the results of an experiment show a relationship?
24. Indicate whether a researcher would conduct an experiment or a correlational study when studying: (a) whether different amounts of caffeine consumed in 1 hour influence speed of completing a complex task; (b) the relationship between number of extracurricular activities and GPA; (c) the relationship between the number of pairs of sneakers owned and the person's athleticism; (d) how attractive men rate a woman when she is wearing one of three different types of perfume; (e) the relationship between GPA and the ability to pay off school loans; (f) the influence of different amounts of beer consumed on a person's mood.
25. In the chart below, identify the characteristics of each variable.

    Variable                                      Continuous or Discrete    Type of Measurement Scale
    Personality type                              _____                     _____
    Academic major                                _____                     _____
    Number of minutes before and after an event   _____                     _____
    Restaurant ratings (best, next best, etc.)    _____                     _____
    Speed (miles per hr)                          _____                     _____
    Dollars in your pocket                        _____                     _____
    Change in weight (in lb)                      _____                     _____
    Savings account balance                       _____                     _____
    Reaction time                                 _____                     _____
    Letter grades                                 _____                     _____
    Clothing size                                 _____                     _____
    Registered voter                              _____                     _____
    Therapeutic approach                          _____                     _____
    Schizophrenia type                            _____                     _____
    Work absences                                 _____                     _____
    Words recalled                                _____                     _____
Chapter 2
CREATING AND USING FREQUENCY DISTRIBUTIONS
LOOKING BACK
Be sure you understand from Chapter 1:
• What nominal, ordinal, interval, and ratio scales of measurement are.
• What continuous and discrete measurements are.

GOING FORWARD
Your goals in this chapter are to learn:
• What frequency is and how a frequency distribution is created.
• When to graph frequency distributions using a bar graph, histogram, or polygon.
• What normal, skewed, and bimodal distributions are.
• What relative frequency and percentile are and how we use the area under the normal curve to compute them.
Sections
2-1 Some New Symbols and Terminology
2-2 Understanding Frequency Distributions
2-3 Types of Frequency Distributions
2-4 Relative Frequency and the Normal Curve
2-5 Understanding Percentile and Cumulative Frequency

So we're off into the world of descriptive statistics. Recall that a goal is to make sense of the scores by organizing and summarizing them. One important way to do this is to create tables and graphs, because they show the scores you've obtained and they make it easier to see the relationship between two variables that is hidden in the data. Before we examine the relationship between two variables, however, we first summarize the scores on each variable alone. Therefore, this chapter will discuss the common ways to describe scores from one variable by using a frequency distribution. You'll see (1) how to show a frequency distribution in a table or graph, (2) the common patterns found in frequency distributions, and (3) how to use a frequency distribution to compute additional information about scores.
2-1 SOME NEW SYMBOLS
AND TERMINOLOGY
The scores we initially measure in a study are called the raw scores. Descriptive statistics help us boil down raw scores into an interpretable, "digestible" form. There are several ways to do this, but the starting point is to count the number of times each score occurred. The number of times a score occurs in a set of data is the score's frequency. If we examine the frequencies of every score in the data, we create a frequency distribution. The term distribution is the general name researchers have for any organized set of data. In a frequency distribution, the scores are organized based on each score's frequency. (Actually, researchers have several ways to describe frequency, so technically, when we simply count the frequency of each score, we are creating a simple frequency distribution.)

The symbol for a score's frequency is the lowercase f. To find f for a score, count how many times that score occurs. If three participants scored 66, then 66 occurred three times, so the frequency of 66 is 3 and so f = 3. Creating a frequency distribution involves counting the frequency of every score in the data.
In most statistical procedures, we also count the
total number of scores we have. The symbol for the
total number of scores in a set of data is the uppercase
N. Thus, N = 43 means that we have 43 scores. Note
that N is not the number of different scores, so even
if all 43 scores in a sample are the same score, N still
equals 43.
raw scores  The scores initially measured in a study

frequency (f)  The number of times each score occurs in a set of data; also called simple frequency

frequency distribution  A distribution showing the number of times each score occurs in the data

The frequency of a score is symbolized by f. The total number of scores in the data is symbolized by N.
2-2 UNDERSTANDING FREQUENCY DISTRIBUTIONS
The first step when trying to understand any set of
scores is to ask the most obvious question, “What are
the scores that were obtained?” In fact, buried in any
data are two important things to know: Which scores
occurred, and how often did each occur? These questions are answered simultaneously by looking at the
frequency of each score. Thus, frequency distributions
are important because they provide a simple and clear
way to show the scores in a set of data. Because of this,
they are always the first step when beginning to understand the scores from a study. Further, they are also
a building block for upcoming statistical procedures.
One way to see a frequency distribution is in a table.
2-2a Presenting Frequency
in a Table
Let’s begin with the following raw scores. (They might
measure one of the variables from a correlational study,
or they might be dependent scores from an experiment.)
14  13  14  14  13  13
15  14  11  15  15  17
13  14  10  14  12  15
In this disorganized arrangement, it is difficult
to make sense of these scores. Watch what happens,
though, when we arrange them into the frequency
table in Table 2.1.
Table 2.1
Simple Frequency Distribution Table
The left-hand column identifies each score, and the right-hand column contains the frequency with which the score occurred.

Score    f
17       1
16       0
15       4
14       6
13       4
12       1
11       1
10       1
      Total: 18 = N
Researchers have several rules of thumb for making a frequency table. Start with a score column and an
f column. The score column has the highest score in the
data at the top of the column. Below that are all possible whole-number scores in decreasing order, down
to the lowest score that occurred. Here, our highest
score is 17, the lowest score is 10, and although no one
obtained a score of 16, we still include it. In the f column opposite each score is the score’s frequency: In the
sample there is one 17, zero 16s, four 15s, and so on.
Not only can we see the frequency of each score,
we can also determine the combined frequency of several scores by adding together their individual fs. For
example, the score of 13 has an f of 4 and the score of
14 has an f of 6, so their
combined frequency is 10.
Notice that, although
8 scores are in the score
column, N is not 8. We had
18 scores in the original
sample, so N is 18. You
can see this by adding
together all of the individual
frequencies in the f column:
The 1 person scoring 17 plus
the 4 people scoring 15, and so on, adds up
to the 18 people in the sample. In a frequency distribution, the sum of the frequencies always equals N.
> Quick Practice
>
A frequency distribution shows the
number of times participants obtained
each score.
More Examples
The scores 15, 16, 13, 16, 15, 17, 16, 15, 17, and 15 contain one 13, no 14s, four 15s, and so on, producing this frequency table:

Scores    f
17        2
16        3
15        4
14        0
13        1
For Practice
1. What is the difference between f and N?
2. Create a frequency table for these scores:
7, 9, 6, 6, 9, 7, 7, 6, and 6.
3. What is the N here?
4. What is the frequency of 6 and 7 together?
> Answers
1. f is the number of times a score occurs; N is the total number of scores in the data.
2.
Scores    f
9         2
8         0
7         3
6         4
3. N = 9
4. f = 3 + 4 = 7
2-2b Graphing a
Frequency Distribution
When researchers talk of a frequency distribution, they often imply a graph that
shows the frequencies of each score. (A
review of basic graphing is in Appendix
A.1.) To graph a frequency distribution,
place the scores on the X axis. Place
frequency on the Y axis. Then we have
several ways to draw the graph of a frequency distribution, depending on the
scale of measurement that the raw scores
reflect. We may create a bar graph, a histogram, or a polygon.
Figure 2.1
Frequency Bar Graphs for Nominal and Ordinal Data
The height of each bar indicates the frequency of the corresponding score on the X axis.

Nominal Variable of Political Affiliation
Party          f
Libertarian    1
Socialist      3
Democrat       8
Republican     6
[Bar graph: f (0 to 8) on the Y axis; political affiliation (Rep., Dem., Soc., Lib.) on the X axis]

Ordinal Variable of Military Rank
Rank          f
General       3
Colonel       8
Lieutenant    4
Sergeant      5
[Bar graph: f (0 to 8) on the Y axis; military rank (Sgt., Lt., Col., Gen.) on the X axis]

CREATING BAR GRAPHS  We graph a frequency distribution of nominal or ordinal scores by creating a bar graph. A bar graph has a vertical bar centered over each X score and the height of the bar corresponds to the score's frequency. Notably, adjacent bars do not touch.

Figure 2.1 shows the frequency tables and bar graphs of two samples. The upper table and graph are from a survey in which we counted the number of participants in each category of the nominal variable of political party affiliation. The X axis is labeled using the "scores" of political party, and because this is a nominal variable, they can be arranged in any order. In the frequency table, we see that 6 people were Republicans, so we draw a bar at a height of 6 above "Rep.," and so on.

The lower table and graph are from a survey in which we counted the number of participants having a particular military rank (an ordinal variable). The ranks are arranged on the X axis from lowest to highest. Again, the height of each bar is the "score's" frequency.

bar graph  A graph showing a vertical bar over each X score, but adjacent bars do not touch; used with nominal or ordinal scores
A graph of a frequency distribution always shows the scores on the X axis and their frequency on the Y axis.

histogram  A frequency graph similar to a bar graph but with adjacent bars touching; used with a small range of interval or ratio scores

frequency polygon  A frequency graph showing a data point above each score, with the adjacent points connected by straight lines; used with many different interval or ratio scores

data point  A dot plotted on a graph to represent a pair of X and Y scores
The reason we create bar
graphs with nominal and ordinal scales is that both are discrete
scales: You can be in one group or
the next, but not in between. The
space between the bars communicates this. On the other hand, recall
that interval and ratio scales are
usually assumed to be at least theoretically continuous: They allow fractional amounts that continue between
the whole numbers. To communicate
this, these scales are graphed using
continuous (connected) figures. We
may create two types of graphs here,
either a histogram or a polygon.
CREATING HISTOGRAMS We create
a histogram when we have a small
number of different interval or ratio
scores. A histogram is similar to a bar graph except
that in a histogram, the adjacent bars touch. For
example, say that we measured the ratio variable of
number of parking tickets that participants received,
obtaining the data in Figure 2.2. Again, the height of
each bar indicates the corresponding score’s frequency.
Because the adjacent bars touch, there is no gap between the scores on the X axis. This communicates that the X variable is continuous, with no gaps in our measurements.

In a histogram the adjacent bars touch; in a bar graph they do not.

Figure 2.2
Histogram Showing the Frequency of Parking Tickets in a Sample

Score    f
7        1
6        4
5        5
4        4
3        6
2        7
1        9
[Histogram: f (0 to 9) on the Y axis; number of parking tickets (1 to 7) on the X axis]
CREATING FREQUENCY POLYGONS Usually, we don’t
create a histogram when we have many different interval or ratio scores, such as if our participants had from
1 to 50 parking tickets. The 50 bars would need to be
very skinny, so the graph would be difficult to read. We
have no rule for what number of scores is too large,
but when a histogram is unworkable, we create a frequency polygon. Construct a frequency polygon by
placing a “dot” over each score on the X axis at the
height that corresponds to the appropriate frequency
on the Y axis. Then connect adjacent dots with straight
lines. To illustrate this, Figure 2.3 shows the previous
parking ticket data plotted as a frequency polygon. For
an X of 1, the frequency is 9; for an X of 2, f = 7; and
so on. Because each line continues between two adjacent dots, we again communicate that our measurements continue between the two scores on the X axis,
meaning that this is a continuous variable.
Notice that the polygon also includes on the X
axis the next score above the highest score in the data
and the next score below the lowest score (in Figure
2.3, scores of 0 and 8 are included). These added
scores have a frequency of 0, so the curve touches the
X axis. In this way we create a complete geometric
figure—a polygon—with the X axis as its base.
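As an aside, here is a minimal plotting sketch in Python with matplotlib (our own illustration; the text itself uses no software) that draws the parking-ticket data both ways:

```python
import matplotlib.pyplot as plt

# Parking-ticket scores (X) and their frequencies (f) from Figure 2.2.
scores = [1, 2, 3, 4, 5, 6, 7]
freqs  = [9, 7, 6, 4, 5, 4, 1]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))

# Histogram style: adjacent bars touch (width=1.0 leaves no gap).
ax1.bar(scores, freqs, width=1.0, edgecolor="black")
ax1.set(xlabel="Number of parking tickets", ylabel="f", title="Histogram")

# Polygon: a data point over each score, joined by straight lines, with
# added scores of 0 and 8 at f = 0 so the figure touches the X axis.
poly_x = [0] + scores + [8]
poly_y = [0] + freqs + [0]
ax2.plot(poly_x, poly_y, marker="o")
ax2.set(xlabel="Number of parking tickets", ylabel="f", title="Frequency polygon")

plt.tight_layout()
plt.show()
```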
Also, here is an important new term: A “dot”
plotted on any graph is called a data point.
Thus, in Figure 2.3 we placed a data point over the X of 4 at an f of 4.

Figure 2.3
Frequency Polygon Showing the Frequency of Parking Tickets in a Sample

Score    f
7        1
6        4
5        5
4        4
3        6
2        7
1        9
[Frequency polygon: f on the Y axis; number of parking tickets (0 to 8) on the X axis, with the curve touching the X axis at 0 and 8]

> Quick Practice
> Create a bar graph with nominal or ordinal scores, a histogram with a few interval/ratio scores, and a polygon with many different interval/ratio scores.
More Examples
Using the data from a survey, (1) create a bar graph of
the frequency of male or female participants (a nominal
variable); (2) create a bar graph of the number of people
who are first-born, second-born, etc. (an ordinal variable);
(3) create a histogram of the frequency of participants
falling in each of five salary ranges (a few ratio scores);
(4) create a polygon of the frequency for each individual
salary reported (many ratio scores).
For Practice
1. A _____ has a separate bar above each score, a _____ contains bars that touch, and a _____ has dots connected with straight lines.
2. A "dot" plotted on a graph is called a _____.
3. To show the frequency of people who are above an average weight by either 0, 5, 10, or 15 pounds, plot a _____.
4. To show the number in a sample preferring chocolate or vanilla ice cream, plot a _____.
5. To show the number of people who are above average weight by each amount between 0 and 100 pounds, plot a _____.

> Answers
1. bar graph; histogram; polygon  2. data point  3. histogram  4. bar graph  5. polygon

GROUPED DISTRIBUTIONS  So far we have created frequency distributions that show each individual score. However, sometimes we have too many scores to produce a manageable table or graph. Then we create a grouped distribution. In a grouped distribution, individual scores are first combined into small groups, and then we report the total frequency (or other information) for each group. For example, in some data we might group the scores 0, 1, 2, 3, and 4 into the "0–4" group. Then we would add the f for the score of 0 to the f for the score of 1, and so on, to obtain the frequency of all scores between 0 and 4. Likewise, we would combine the scores between 5 and 9 into another group, and so on. Then we report the total f for each group.

grouped distribution  A distribution created by combining individual scores into small groups and then reporting the total frequency (or other description) of each group

This technique can be used to reduce the size of a table and to make bar graphs, histograms, and polygons more manageable. When graphing a grouped frequency distribution, on the X axis we use the middle score in each group to represent the group. Thus, the X of 2 would represent the 0–4 group, and the X of 7 would represent the 5–9 group. On the Y axis we plot the total frequency of all scores in each group. (Consult an advanced statistics book for more about creating grouped distributions.)
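To illustrate the grouping arithmetic, here is a minimal sketch in Python (our own illustration; the raw scores and group width are hypothetical):

```python
from collections import Counter

# Hypothetical raw scores with too many different values to tabulate one by one.
scores = [0, 2, 3, 3, 4, 5, 5, 6, 8, 9, 11, 12, 12, 14, 17, 18, 21, 23, 24, 24]

width = 5  # each group spans 5 scores: 0-4, 5-9, 10-14, ...

# Total f for each group: a score s falls in the group starting at (s // width) * width.
grouped_f = Counter((s // width) * width for s in scores)

for start in sorted(grouped_f):
    label = f"{start}-{start + width - 1}"
    middle = start + width // 2   # the middle score represents the group on the X axis
    print(f"group {label} (plotted at X = {middle}): f = {grouped_f[start]}")
```

Consistent with the text, the 0–4 group is plotted at an X of 2 and the 5–9 group at an X of 7.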
2-3 TYPES OF FREQUENCY
DISTRIBUTIONS
Although bar graphs and histograms are more common in published research, polygons are an important
component of statistical procedures. This is because, in
many different situations, nature produces frequency
distributions that have similar
characteristics and that form
polygons having the same
general shape. We have several common shapes, and
your first task is to learn their
names. When we apply them,
we think of the ideal version
of the polygon that would be
produced in the population.
By far the most important
frequency distribution is the
normal distribution. (This is
the big one, folks.)
2-3a The Normal Distribution
Figure 2.4 shows the polygon of the ideal normal
distribution for some test scores from a population.
Although specific mathematical properties define
this polygon, in general it is a bell-shaped curve. But
don’t call it a bell curve (that’s so pedestrian!). Call it
a normal curve or a normal distribution, or say
that the scores are normally distributed.
Because this polygon represents an infinite population, it is slightly different from that for a sample. First,
we cannot count the f of each score, so no numbers
occur on the Y axis. Simply remember that frequencies
increase as we proceed higher up the Y axis. Second, the
polygon is a smooth curved line. The
population contains so many different whole and decimal scores that the individual data points form the curved line. Nonetheless, to see the frequency of a score, locate the score on the X axis and then move upward until you reach the line forming the polygon. Then, moving horizontally, determine whether the frequency of the score is relatively high or low.

normal curve  The symmetrical, bell-shaped curve produced by graphing a normal distribution

normal distribution  A set of scores in which the middle score has the highest frequency and, proceeding toward higher or lower scores, the frequencies at first decrease slightly but then decrease drastically, with the highest and lowest scores having very low frequency

Figure 2.4
The Ideal Normal Curve
Scores farther above and below the middle scores occur with progressively lower frequencies.
[Normal curve: f on the Y axis; test scores (0 to 55) on the X axis, with a tail at the far left and far right]

As you can see, in a normal distribution, the score with the highest frequency is the middle score (in Figure 2.4 it is the score of 30). The normal curve is symmetrical, meaning that the left half below the middle score is a mirror image of the right half above the middle score. As you proceed away from the middle, the frequencies decrease, with the highest and lowest scores having relatively very low frequency. However, no matter how low or high a score might be, the curve never actually touches the X axis. This is because, in an infinite population, theoretically any score might occur sometime, so the frequencies approach—but never reach—zero.

Note: In the language of statistics, the portions of a normal curve containing the relatively low-frequency, extreme high or extreme low scores are each called a tail of the distribution. In Figure 2.4 the tails are roughly below the score of 15 and above the score of 45.

tail of the distribution  The far-left or far-right portion of a frequency polygon containing the relatively low-frequency, extreme scores
The reason that the normal curve is important is because it is a very common distribution in behavioral research. For most of the variables that we study, most of the individuals have scores at or close to the middle score, with progressively fewer individuals scoring at the more extreme, higher or lower scores. Because of this, the normal curve is very common in our upcoming statistical procedures. Therefore, before you proceed, be sure that you can read the normal curve. Can you see in Figure 2.4 that the most frequent scores are between 25 and 35? Do you see that a score of 15 has a relatively low frequency and a score of 45
has the same low frequency? Do you see that there are relatively few scores in the tail above 50 or in the tail below 10? Do you see that the farther into a tail a score lies, the less frequently the score occurs?
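For the curious, here is a minimal sketch in Python with numpy and matplotlib (our own illustration; the mean of 30 and the spread are chosen to mimic Figure 2.4) that draws an idealized bell-shaped curve:

```python
import numpy as np
import matplotlib.pyplot as plt

# Test scores on the X axis; a mean of 30 and a spread chosen so the
# tails fall roughly below 15 and above 45, as in Figure 2.4.
x = np.linspace(0, 60, 500)
mu, sigma = 30.0, 7.5
f = np.exp(-0.5 * ((x - mu) / sigma) ** 2)  # bell shape (unscaled)

plt.plot(x, f)
plt.xlabel("Test scores")
plt.ylabel("f")
plt.title("The Ideal Normal Curve")
plt.show()
```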
2-3b Skewed Distributions
Not all variables form normal distributions. One of the most common non-normal distributions is a skewed distribution. A skewed distribution is similar to a normal distribution except that it produces a polygon with only one pronounced tail. As shown in Figure 2.5, a distribution may be either negatively skewed or positively skewed.

A negatively skewed distribution contains low-frequency, extreme low scores but does not contain low-frequency, extreme high scores. The polygon on the left in Figure 2.5 shows an idealized negatively skewed distribution. This pattern might be found, for example, by measuring the running speed of professional football players. Most would tend to run at the higher speeds, but a relatively few linemen lumber in at the slower speeds. (To remember negatively skewed, remember that the pronounced tail is over the lower scores, sloping toward zero, where the negative scores would be.)

On the other hand, a positively skewed distribution contains low-frequency, extreme high scores but does not contain low-frequency, extreme low scores. The polygon on the right in Figure 2.5 shows a positively skewed distribution. This pattern is often found, for example, when measuring participants' "reaction times" to a sudden stimulus. Usually, scores tend to be rather low (fast), but every once in a while, a person "falls asleep at the switch," requiring a large amount of time that produces a high score. (To remember positively skewed, remember that the tail slopes away from zero, toward the higher, positive scores.)

Whether a skewed distribution is negative or positive corresponds to whether the distinct tail slopes toward or away from zero.

negatively skewed distribution  An asymmetrical distribution with low-frequency, extreme low scores, but without corresponding low-frequency, extreme high scores; its polygon has only one pronounced tail, over the lower scores

positively skewed distribution  An asymmetrical distribution with low-frequency, extreme high scores, but without corresponding low-frequency, extreme low scores; its polygon has only one pronounced tail, over the higher scores

Figure 2.5
Idealized Skewed Distributions
The direction in which the distinctive tail slopes indicates whether the skew is positive or negative.
[Two polygons plotting f against low to high scores: on the left, negative skew, with the pronounced tail over the low scores; on the right, positive skew, with the pronounced tail over the high scores]
2-3c Bimodal Distributions

bimodal distribution  A distribution forming a symmetrical polygon with two humps where there are relatively high-frequency scores, with center scores that have the same frequency

An idealized bimodal distribution is shown in Figure 2.6. A bimodal distribution forms a symmetrical polygon containing two distinct humps, each reflecting relatively high-frequency scores. At the center of each hump is one score that occurs more frequently than the surrounding scores, and technically the two center scores have the same frequency. Such a distribution would occur with test scores, for example, if most students scored around 60 or 80, with few students falling far below 60 or scoring in the 70s or 90s.

Figure 2.6
Idealized Bimodal Distribution
[A bimodal polygon: f on the Y axis against low to high scores on the X axis, with two humps of equal height]

2-3d Labeling Frequency Distributions
You need to know the names of the previous distributions because descriptive statistics describe the important characteristics of data, and one very important characteristic is the shape of the frequency distribution. First, the shape allows us to understand the data. If, for example, I tell you my data form a normal distribution, you can mentally envision the distribution and instantly understand how my participants generally performed. Also, the shape is important in determining which statistical procedures to employ. Many of our statistics are applied only when we have a normal distribution, while others are for non-normal distributions. Therefore, the first step when examining any data is to identify the shape of the frequency distribution that is present.

Data in the real world, however, never form the perfect curves we've discussed. Instead, the scores will form a bumpy, rough approximation to the ideal distribution. For example, data never form a perfect normal curve, and at best only come close to that shape. However, rather than drawing a different, approximately normal curve in every study, we simplify the task by using the ideal normal curve we saw previously as our one "model" of any distribution that generally has this shape. This gives us one reasonably accurate way of envisioning the many approximately normal distributions that researchers encounter. The same is true for the other shapes we've seen.

We also apply the names of the previous distributions to samples as a way of summarizing and communicating their general shapes. Figure 2.7 shows several examples, as well as the corresponding labels we might use. (Notice that we even apply these names to histograms and bar graphs.) We assume that in the population, the additional scores and their frequencies would "fill in" the sample curve, smoothing it out to be closer to the ideal curve.

Figure 2.7
Simple Frequency Distributions of Sample Data with Appropriate Labels
[Four sample polygons, each plotting f against low to high scores: normal, positively skewed, negatively skewed, and bimodal]
> Quick Practice
> The most common frequency distributions are normal distributions, negatively or positively skewed distributions, and bimodal distributions.

More Examples
The variable of intelligence (IQ) usually forms a normal distribution: The most common scores are in the middle, with higher or lower IQs occurring progressively less often. If IQ was positively skewed, there would be only one distinct tail, located at the higher scores. If IQ was negatively skewed, there would be only a distinct tail at the lower scores. If IQ formed a bimodal distribution, there would be two distinct humps in the curve containing the highest-frequency scores.

For Practice
1. Arrange the scores below from most frequent to least frequent.
[A normal curve with four scores marked A, B, C, and D along the X axis.]
2. What label should be given to each of the following?
[Four frequency polygons, labeled (a) through (d), each plotting f against scores.]

> Answers
1. C, B, A, D
2. a. positively skewed; b. bimodal; c. normal; d. negatively skewed

2-4 RELATIVE FREQUENCY AND THE NORMAL CURVE

We will return to frequency distributions—especially the normal curve—throughout this course. However, counting the frequency of scores is not the only thing we do. Another important procedure is to describe scores using relative frequency.

2-4a Understanding Relative Frequency

relative frequency: The proportion of time a score occurs in a distribution

Relative frequency is the proportion of the time that a score occurs in a distribution. Any proportion is a decimal number between 0 and 1 that indicates a fraction of the total. Thus, we use relative frequency to indicate what fraction of the sample is produced by the times that a particular score occurs. In other words, we determine the proportion of N that is made up by the f of a score. In symbols we have this formula:

THE FORMULA FOR RELATIVE FREQUENCY IS
Relative frequency = f/N

Simply divide the score's frequency (f) by the total number of scores (N). For example, if a score occurred 5 times in a sample of 10 scores, then

Relative frequency = f/N = 5/10 = .50

The score has a relative frequency of .50, meaning that the score occurred .50 of the time in this sample. Or, say that a score occurred 6 times out of an N of 9. Then its relative frequency is 6/9, which is .67. We usually "round off" relative frequency to two decimals. Finally, we might find that several scores have a combined frequency of 10 in a sample of 30 scores: 10/30 equals .33, so these scores together have a relative frequency of .33—they make up .33 of this sample.

As you can see here, one reason to compute relative frequency is because it can be easier to interpret than simple frequency. Saying that a score has a frequency of 6 can be difficult to interpret because we have no frame of reference—is this a high frequency or not? However, we can easily interpret the relative frequency of .67 because it means that the score occurred 67% of the time.
We can also begin with relative frequency and work in the other direction, computing the corresponding simple frequency. To transform relative frequency into simple frequency, we multiply the relative frequency times N. Thus, if a score's relative frequency is .40 when N is 10, then (.40)(10) gives f = 4.

Finally, sometimes we transform relative frequency to percent so that we have the percent of the time that a score occurs. (Remember that officially, relative frequency is a proportion.) To transform relative frequency to a percent, multiply the proportion by 100. If a score's relative frequency is .50, then we have (.50)(100), so this score occurred 50% of the time. To transform a percent back into relative frequency, divide the percent by 100. The score that is 50% of the sample has a relative frequency of 50%/100 = .50. (For further review of proportions, percents, and rounding, consult Appendix A.1.)
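If you want to verify these conversions yourself, here is a minimal sketch in Python; the language and the sample numbers are our own choices for illustration, not part of the text:

    # Relative frequency and its transformations, using the f = 5, N = 10 example.
    f, N = 5, 10
    rel_freq = f / N                  # relative frequency = f/N -> .50
    percent = rel_freq * 100          # to percent: multiply by 100 -> 50.0
    rel_from_percent = percent / 100  # back to relative frequency -> .50
    simple_f = rel_freq * N           # back to simple frequency: multiply by N -> 5.0
    print(rel_freq, percent, rel_from_percent, simple_f)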
When reading research, you may encounter frequency tables that show relative frequency. Here, the raw scores are arranged as we saw previously, but next to the simple frequency column, an additional column shows each score's relative frequency. Also, the rules for creating the different graphs we've seen are the same for relative frequency, except that the Y axis shows values of relative frequency ranging from 0 to 1.0.

You should know what relative frequency is, but we will not emphasize the above formula. Instead, you will see that a core element of descriptive and inferential procedures is to compute relative frequency using the normal curve.

2-4b Finding Relative Frequency Using the Normal Curve

To understand how we use the normal curve to compute relative frequency, first think about the curve in a novel way. Imagine you are in a helicopter flying over a large parking lot that contains a mass of people crowded together. The outline of the mass has that bell shape of a normal curve. Upon closer inspection, you see an X axis and a Y axis laid out on the ground, and at the marker for each X score are people standing in line who received that score. The lines of people are packed so tightly together that, from the air, all you see are the tops of many heads, in a solid "sea of humanity." If you paint a line that goes behind the last person in line at each score, you would have the normal curve shown in Figure 2.8.

Relative frequency indicates the proportion of time (out of N) that a score occurred.

Figure 2.8
Parking Lot View of the Ideal Normal Curve
The height of the curve above any score reflects the number of people standing at that score.
[A normal curve plotting f against scores 0 to 55, with a tail at each end and the highest point over the score of 30.]

From this perspective, the height of the curve above any score reflects the number of people standing in line at that score. Thus, in Figure 2.8, the score of 30 has the highest frequency because the longest line of people is standing at this score in the parking lot. Likewise, say that we counted the people in line at each score between 30 and 35. If we added them together, we would have the combined frequencies for these scores.

The reason for using this "parking lot view" is so you won't think of the normal curve as just a line floating above the X axis. Instead, think of the space under the curve—the space between the polygon's line and the X axis—as forming a solid figure that has area. This area represents the individuals and their scores in our data. The entire parking lot contains everyone
we've studied and 100% of the scores. Therefore, any portion of the parking lot—any portion of the area under the curve—corresponds to that portion of our data, which is its relative frequency.

proportion of the area under the curve: The proportion of the total area under the normal curve at certain scores, which represents the relative frequency of those scores

For example, in Figure 2.8, a vertical line is drawn through the middle score of 30, and so half (.50) of the parking lot is to the left of that line. Because the complete parking lot contains all participants, a part that is .50 of the parking lot contains 50% of the participants. (We can ignore those relatively few people who are straddling the line.) Participants who are standing to the left of the line received scores below 30. So, in total, 50% of the participants received scores below 30. Now turn this around: If 50% of the participants obtained scores below 30, then the scores below 30 occurred 50% of the time. Thus, the scores below 30 have a combined relative frequency of .50.

The logic here is so simple that it almost sounds tricky. But it's not! If you "slice off" one-half of the parking lot, then you have one-half of the participants and one-half of the scores, so those scores occur .50 of the time. Or, if your slice is 25% of the parking lot, then you have 25% of the participants and 25% of the scores, so those scores occur .25 of the time. And so on.

This is how we describe what we are doing using statistical terms: The total space occupied by people in the parking lot is the total area under the normal curve. We draw a line vertically to create a "slice" of the polygon containing particular scores. The area of this portion of the curve is the space occupied by the people having those scores. We then compare this area to the total area to determine the proportion of the area under the curve that we have selected. This proportion corresponds to the combined relative frequency of the selected scores.

Of course, statisticians don't fly around in helicopters, eyeballing parking lots, so here's a different approach: Say that by using a ruler and protractor, we determine that in Figure 2.9 the entire polygon occupies an area of 6 square inches on this page. This total area corresponds to all scores in the sample. Say that the area under the curve between the scores of 30 and 35 covers 2 square inches. This area is due to the number of times these scores occur. Therefore, the scores between 30 and 35 occupy 2 out of the 6 square inches created by all scores, so these scores constitute 2/6, or .33, of the distribution. Thus, the scores between 30 and 35 occur .33 of the time, so they have a relative frequency of .33.

Figure 2.9
Finding the Proportion of the Total Area under the Curve
The complete curve occupies 6 square inches, with scores between 30 and 35 occupying 2 square inches.
[A normal curve plotting f against scores 0 to 55; the total area under the curve is labeled 6 square inches, and the slice between 30 and 35 is labeled 2 square inches.]

Thus, usually we will have normally distributed scores. To determine the relative frequency of particular scores, we will identify the "slice" of the normal curve containing those scores. Then the proportion of the total area under the curve occupied by that slice will equal the relative frequency of those scores. (Although it is possible to create a very narrow slice containing only one score, we will usually seek the combined relative frequency of a number of adjacent scores that form a relatively wide slice.) Using the area under the curve is especially useful because in Chapter 5, you'll see that statisticians have created a system for easily finding the area under any part of the curve. (No, you won't need a ruler and a protractor.) Until then:
The total area under the normal curve corresponds to all scores, so the proportion of this area occupied by some scores is the proportion of the time those scores occur, which is their relative frequency.

> Quick Practice
> Relative frequency is the proportion of the time that a score occurs.
> The area under the normal curve corresponds to 100% of a sample, so a proportion of the curve will contain that proportion of the scores, which is their combined relative frequency.

More Examples
In the following normal curve, the shaded portion is .15 of the total area (so 15% of people in the parking lot are standing at these scores). Thus, scores between 55 and 60 occur .15 of the time, so their combined relative frequency is .15. Above the score of 70 is .50 of the curve, so scores above 70 have a combined relative frequency of .50.
[A normal curve plotting f against scores 45 to 95; the area between 55 and 60 is shaded and labeled .15, and the area above 70 is labeled .50.]

For Practice
1. If a score occurs 23% of the time, its relative frequency is _____.
2. If a score's relative frequency is .34, it occurs _____ percent of the time.
3. If scores occupy .20 of the area under the curve, they have a relative frequency of _____.
4. Say that the scores between 15 and 20 have a relative frequency of .40. They make up _____ of the area under the normal curve.

> Answers
1. 23%/100 = .23  2. (.34)(100) = 34  3. .20  4. .40
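Chapter 5 presents the standard system for finding these areas. As a preview, here is a hedged sketch using Python's scipy library; the mean of 30 and standard deviation of 10 are hypothetical values chosen only to mimic Figure 2.8, so the exact proportion is illustrative rather than the book's answer:

    # Proportion of the area under an idealized normal curve between two scores.
    from scipy.stats import norm

    mean, sd = 30, 10  # assumed values, not from the text
    slice_area = norm.cdf(35, loc=mean, scale=sd) - norm.cdf(30, loc=mean, scale=sd)
    print(round(slice_area, 2))  # ~.19: the combined relative frequency of scores 30-35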
2-5 UNDERSTANDING
PERCENTILE AND
CUMULATIVE FREQUENCY
We have one other approach for describing scores,
and it is used when we want to know the standing of
a particular score in terms of the other scores that are
above or below it. Then, the most common procedure
is to compute the score’s percentile. A percentile is
usually defined as the percent of all scores in the data
that are below a particular score. (Essentially, your
percentile tells you the percent of all scores that you
are beating.) For example, say that the score of 40 is
at the 50th percentile. Then we say that 50% of the
scores are below 40 (and 50% of the scores are above
40). Or, if you scored at the 75th percentile, then 75%
of the group scored lower than you (and 25% scored
above you).
The formula for computing an exact percentile is very involved, so the easiest way to compute it is by using SPSS. However, you should know the name of another way to organize scores that is part of the computations, called cumulative frequency. Cumulative frequency is the number of scores in the data that are at or below a particular score. For example, say that in some data, 3 people had the score of 20 and 5 people scored below 20. The cumulative frequency for the score of 20 is 3 + 5 = 8, indicating that 8 people scored at or below 20. If 2 people scored at 21, and 8 scored below 21 (at 20 or below), then the cumulative frequency for 21 is 2 + 8 = 10. And so on.

percentile: The percentage of all scores in the sample that are below a particular score
cumulative frequency: The number of scores in the data that are at or below a particular score

Figure 2.10
Normal Distribution Showing the Area under the Curve to the Left of Selected Scores
[A normal curve plotting f against scores 0 to 55; .15 of the area lies to the left of the score of 20, .50 to the left of 30, and .85 to the left of 45.]
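A few lines of Python can reproduce this tally. The raw scores below are hypothetical, chosen so that 5 people score below 20, 3 people score 20, and 2 people score 21, matching the example just given:

    # Cumulative frequency: a running total of scores at or below each score.
    from collections import Counter

    scores = [17, 18, 18, 19, 19, 20, 20, 20, 21, 21]  # made-up raw data
    freq = Counter(scores)
    cum_f = 0
    for score in sorted(freq):
        cum_f += freq[score]
        print(score, freq[score], cum_f)  # the row for 20 shows 8; the row for 21 shows 10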
Computing a percentile then involves transforming a score’s cumulative frequency into something
like a percent of the total. Researchers prefer percentile over cumulative frequency because percentile is
usually easier to interpret. For example, if we know
only that 10 people scored at 21 or below, it is difficult to evaluate this score. However, knowing that
21 is, for example, at the 90th percentile, gives a
clearer understanding of this score and of the entire
sample.
You may have noticed that with cumulative frequency, we talked of the number of people scoring at
or below a score, but with percentile we talked about
only those scoring below a score. Technically, a percentile is the percent of the scores at or below a score.
However, usually we are dealing with a large sample
or population when computing percentile, so the relatively few participants at the score are a negligible
portion of the total and we can ignore them. (Recall
that we ignored those relatively few people who were
straddling the line back in Figure 2.8.) Therefore,
researchers usually interpret percentile as the percent
of all scores below a particular score.
Note: You may encounter names that researchers have for specific percentiles. The 10th percentile is
called the first decile, the 20th percentile is the second
decile, and so on. Likewise, the 25th percentile is the
first quartile, and so on.
Because we can ignore the people straddling the line in our parking lot view, a quick way to find an approximate percentile is to use the area under the normal curve. Percentile describes the scores that are lower than a particular score, and on the normal curve, lower scores are to the left. Therefore, the percentile for a score corresponds to the percent of the area under the curve that is to the left of the score.
For example, Figure 2.10 shows that .50 of the
curve is to the left of the score of 30. Because scores
to the left of 30 are below it, 50% of the distribution
is below 30 (in the parking lot, 50% of the people are
standing to the left of the line and all of their scores
are less than 30). Thus, the score of 30 is at the 50th
percentile. Likewise, say that we find .15 of the distribution is to the left of the score of 20; 20 is at the
15th percentile.
We can also work the other way to find the score
at a given percentile. Say that we seek the score at the
85th percentile. We would measure over to the right
until 85% of the area under the curve is to the left of a
certain point. If, as in Figure 2.10, the score of 45 is at
that point, then 45 is at the 85th percentile.
Percentile indicates the percent of all
scores below a particular score. On
the normal curve, a score’s percentile
is the percent of the area under the
curve to the left of the score.
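Again as a preview of Chapter 5, both directions can be sketched with scipy; the curve here assumes a hypothetical mean of 30 and standard deviation of 10, so the second answer will not exactly match Figure 2.10:

    from scipy.stats import norm

    mean, sd = 30, 10  # assumed values for illustration
    # Percentile of a score: the percent of the area to the left of it.
    print(round(norm.cdf(30, loc=mean, scale=sd) * 100))  # 50 -> 30 is at the 50th percentile
    # Working the other way: the score with 85% of the area to its left.
    print(round(norm.ppf(.85, loc=mean, scale=sd), 2))    # ~40.36 under these assumptions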
USING SPSS
Review Card 2.4 provides instructions for using SPSS to compute simple frequency, percent, and percentile
in a set of data. You can also create bar graphs and histograms, as well as more elaborate graphs.
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. What do these symbols mean? (a) N; (b) f ?
2. Why must the sum of all fs in a sample equal N ?
3. (a) What is the difference between a bar graph
and a histogram? (b) With what kind of data is
each used?
4. What is a dot plotted on a graph called?
5. (a) What is the difference between a histogram
and a polygon? (b) With what kind of data is
each used?
6. (a) What does it mean when a score is in a tail of
a normal distribution? (b) What is the difference
between scores in the left-hand tail and scores in
the right-hand tail?
7. (a) What is the difference between a score’s simple
frequency and its relative frequency?
(b) What is the difference between a score’s
cumulative frequency and its percentile?
8. (a) What is the advantage of computing relative
frequency instead of simple frequency?
(b) What is the advantage of computing percentile
instead of cumulative frequency?
9. (a) What is the difference between the polygon
for a skewed distribution and the polygon for a
normal distribution? (b) What is the difference
between the polygon for a bimodal distribution
and the polygon for a normal distribution?
10. What is the difference between the graphs for a
negatively skewed distribution and a positively
skewed distribution?
11. What is the difference between how we use the
proportion of the total area under the normal
curve to determine relative frequency and how we
use it to determine percentile?
12. In reading psychological research, you encounter
the following statements. Interpret each one.
(a) “The IQ scores were approximately normally
distributed.” (b) “A bimodal distribution of physical agility scores was observed.” (c) “The distribution of the patients’ memory scores was severely
negatively skewed.”
13. What type of frequency graph is appropriate when
counting the number of: (a) Blondes, brunettes,
redheads, or “others” attending a college?
(b) People having each body weight reported in a
statewide survey? (c) Children in each grade at an
elementary school? and (d) Car owners reporting
above-average, average, or below-average
problems with their car?
14. The distribution of scores on a statistics test is
positively skewed. What does this indicate about
the difficulty of the test?
15. The distribution of salaries at a large corporation
is negatively skewed. (a) What would this indicate
about the pay at this company? (b) If your salary
is in the tail of this distribution, what should you
conclude about your salary?
16. (a) On a normal distribution of exam scores, Crystal
scored at the 10th percentile, so she claims that
she outperformed 90% of her class. Why is she correct or incorrect? (b) Ernesto’s score is in a tail of
the normal curve, so he claims to have one of the
highest scores. Why is he correct or incorrect?
17. Interpret each of the following. (a) You scored at the
35th percentile. (b) Your score has a relative frequency of .40. (c) Your score is in the upper tail of the
normal curve. (d) Your score is in the left-hand tail of
the normal curve. (e) The cumulative frequency of
your score is 50. (f) Using the area under the normal
curve, your score is at the 60th percentile.
18. Draw a normal curve and identify the approximate
location of the following scores. (a) You have the
most frequent score. (b) You have a low-frequency
score, but it is higher than most. (c) You have one
of the lower scores, but it has a relatively high
frequency. (d) Your score seldom occurred.
19. The following shows the distribution of final exam scores in a large introductory psychology class. The proportion of the total area under the curve is given for two segments.
[A normal curve of exam scores ranging from 45 to 95, with the proportions .30 and .20 marked for two segments of the area under the curve.]
(a) Order the scores 45, 60, 70, 72, and 85 from most frequent to least frequent.
(b) What is the percentile of a score of 60?
(c) What proportion of the sample scored below 70?
(d) What proportion scored between 60 and 70?
(e) What proportion scored above 80?
(f) What is the percentile of a score of 80?

20. The following normal distribution is based on a sample of data. The shaded area represents 13% of the area under the curve.
[A normal curve with individual scores marked x and four scores labeled A, B, C, and D along the X axis; the shaded area represents 13% of the area under the curve.]
(a) What is the relative frequency of scores between A and B?
(b) What is the relative frequency of scores between A and C?
(c) What is the relative frequency of scores between B and C?
(d) Rank-order A, B, C, and D to reflect the order of scores from the highest to the lowest frequency.
(e) Rank-order A, B, C, and D to reflect the order of scores from the highest to the lowest score.

21. Organize the ratio scores below in a table and show their simple frequency and relative frequency.
49 52 47 52 52 47 49 47 50 51 50 49 50 50 50 53 51 49

22. Draw a simple frequency polygon using the data in problem 21.
23. What type of graph should you create when
counting the frequency of: (a) The brands of cell
phones owned by students? Why? (b) The different body weights reported in a statewide survey?
Why? (c) The people falling into one of eight salary ranges? Why? (d) The number of students who
were absent from a class either at the beginning,
middle, or end of the semester? Why?
24. An experimenter studies vision in low light by
having participants sit in a darkened room for
either 5, 15, or 25 minutes and then testing their
ability to correctly identify 20 objects. (a) What is
the independent variable here? (b) What are the
conditions? (c) What is the dependent variable?
(d) You would use the scores from which variable
to create a frequency distribution?
25. (a) Why do we create a bar graph with a nominal
or ordinal X variable? (b) Why do we connect data
points with straight lines with an interval or ratio
X variable?
Chapter
3
SUMMARIZING SCORES
WITH MEASURES OF
CENTRAL TENDENCY
LOOKING BACK
Be sure you understand:
• From Chapter 1, the logic of statistics and parameters, what independent and dependent variables are, and how experiments show a relationship.
• From Chapter 2, what normal, skewed, and bimodal distributions are, and how to compute percentile using the area under the curve.

GOING FORWARD
Your goals in this chapter are to learn:
• What central tendency is.
• What the mean, median, and mode indicate and when each is appropriate.
• The uses of the mean.
• What deviations around the mean are.
• How to interpret and graph the results of an experiment.

Sections
3-1 Some New Symbols and Procedures
3-2 What Is Central Tendency?
3-3 Computing the Mean, Median, and Mode
3-4 Applying the Mean to Research
3-5 Describing the Population Mean
The frequency distributions discussed in Chapter 2 are
important because the shape of the distribution is
always the first important characteristic of data for us
to know. However, graphs and tables are not the most
efficient way to summarize a distribution. Instead, we compute
individual numbers—statistics—that provide information about
the scores. This chapter discusses statistics that describe the
important characteristic of data called central tendency. The
following sections present (1) the concept of central tendency,
(2) the three measures of central tendency, and (3) how we use
each measure to summarize and interpret data. But first, here
are some new symbols and procedures.
3-1 SOME NEW SYMBOLS
AND PROCEDURES
Beginning in this chapter, we will be using common
statistical formulas. In them, we use X as the generic
symbol for a score. When a formula says to do something to X, it means to do it to all of the scores you
are calling X.
A new symbol you'll see is Σ, the Greek capital letter S, called sigma. Sigma is the "summation sign," indicating to add together the scores. It always appears with a symbol for scores, especially ΣX. In words, ΣX is called "sum of X" and literally means to find the sum of the X scores. Thus, ΣX for the scores 5, 6, and 9 is 5 + 6 + 9, which is 20, so ΣX = 20. Notice we do not care whether each X is a different score. If the scores are 4, 4, and 4, then ΣX = 12.
Also, often the answers from a formula will contain decimals that we must “round off.” The rule for
rounding is to round the final answer to two more
decimal places than were in the original raw scores.
Usually we’ll have whole-number scores and then
the answer contains two decimal places, even if they
are zeros. However, carry more decimal places during your calculations: e.g., if the answer will have
two decimals, have at least three decimal places in
your calculations. (See Appendix A.1 to review how
to round.) Now on to central tendency.
A final answer should contain
two more decimal places than
are in the original raw scores.
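As a quick check of both ideas, here is a minimal Python sketch; Python is our choice of calculator here, not the book's, and the scores are the ones from above:

    # Sigma X and the rounding rule for whole-number raw scores.
    scores = [5, 6, 9]
    sum_x = sum(scores)           # sigma X = 20
    answer = sum_x / len(scores)  # 6.666... carried during the calculation
    print(round(answer, 2))       # final answer: 6.67 (two more decimals than the raw scores)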
3-2 WHAT IS CENTRAL
TENDENCY?
Statistics that measure central tendency are important
because they answer a basic question about data: Are
the scores in a distribution generally high scores or
generally low scores? For example, after taking a test
in some class, you first wonder how you did, but then
you wonder how the whole class did. Did everyone
generally score high, low, or what? You need this information to understand both how the class performed
and how you performed relative to everyone else. But
it is difficult to do this by looking at individual scores,
or even at a frequency distribution. Instead, it is better
if you know something like the class average. Likewise, in all research, the first step is to shrink the data into one summary score, called a measure of central tendency, that describes the sample as a whole.

sum of X (ΣX): The sum of the scores in a sample

To understand central tendency, first change your perspective about what a score indicates. Think of a variable as a continuum (a straight line) and think of a score as indicating a participant's location on that variable. For example, if I am 70 inches tall, don't think that I have 70 inches of height. Instead, as in Figure 3.1, my score is at the "address" labeled 70 inches. If my brother is 60 inches tall, then he is located at 60. The idea is not so much that he is 10 inches shorter than I am, but rather that we are separated by a distance of 10 inches. Thus, scores are locations, and the difference between any two scores is the distance between them.

Figure 3.1
Locations of Individual Scores on the Variable of Height
[A continuum running from lower to higher scores, with individual X's marking locations; the scores 60 and 70 are marked, separated by a distance of 10 inches.]

From this perspective a frequency polygon shows the location of all scores in a distribution. For example, Figure 3.2 shows the height scores from two samples. In our "parking lot view" of each normal curve, participants' scores determine where they stand: A higher score puts them on the right side of the curve, a lower score puts them on the left side, and a middle score puts them in a crowd in the middle. Further, with two distributions containing different scores, then the distributions have different locations on the variable.

So when we ask, "Are the scores in a distribution generally high scores or generally low scores?", we are actually asking, "Where on the variable is the distribution located?" A measure of central tendency is a statistic that indicates the location of a distribution on a variable. Listen to its name: It indicates where the center of the distribution tends to be located. Thus, it is the point on the variable around where most of the scores are located, and it provides an "address" for the distribution. In Sample A in Figure 3.2, most of the scores are in the neighborhood of 59, 60, and 61 inches, so a measure of central tendency will indicate that the distribution is located around 60 inches. In Sample B, the distribution is centered at 70 inches.

measures of central tendency: Statistics that summarize the location of a distribution on a variable by indicating where the center of the distribution tends to be located

Notice how descriptive statistics allow us to understand a distribution without looking at every score. If I told you only that one normal distribution is centered at 60 and another is centered around 70, you could mentally envision Figure 3.2 and have a good idea about all of the scores in the data. You'll

Figure 3.2
Two Sample Polygons on the Variable of Height
Each polygon indicates the locations of the scores and their frequencies.
[Two normal polygons plotting f against height in inches: Sample A is centered near 60, with scores from about 58 to 62, and Sample B is centered near 70, with scores from about 68 to 72.]
see other statistics that add to our understanding of a distribution, but measures of central tendency are at the core of summarizing data.

The first step in summarizing any set of data is to compute its central tendency.

3-3 COMPUTING THE MEAN, MEDIAN, AND MODE

In the following sections we consider the three common ways to measure central tendency: the mode, the median, and the mean.

3-3a The Mode

The mode is a score that has the highest frequency in the data. For example, say that we have these scores: 2, 3, 3, 4, 4, 4, 4, 5, 5, 6. The score of 4 is the mode. (There is no conventional symbol for the mode.) The frequency polygon of these scores is shown in the upper portion of Figure 3.3. It shows that the mode does summarize this distribution because the scores are located around 4. Notice that the polygon is roughly a normal curve, with the highest point over the mode. When a polygon has one hump, such as on the normal curve, the distribution is called unimodal, indicating that one score qualifies as the mode.

Figure 3.3
A Unimodal Distribution (a) and a Bimodal Distribution (b)
Each vertical line marks a highest point on the distribution, indicating the most frequent score, which is the mode.
[Panel (a) plots f against test scores 1 to 7, with one hump peaking at 4; panel (b) plots f against test scores 1 to 13, with humps peaking at 5 and 9.]

However, we may not always have only one mode. Consider the scores 2, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12. Here, the two scores of 5 and 9 are tied for the most frequent score. This sample is plotted in the lower portion of Figure 3.3. In Chapter 2 such a distribution was called bimodal because it has two modes. Identifying the two modes does summarize this distribution, because most of the scores are either around 5 or around 9.

The mode is the preferred measure of central tendency when scores reflect a nominal scale of measurement (when participants are categorized using a qualitative variable). For example, say that we asked some people their favorite flavor of ice cream and
counted the number of people choosing each category.
Reporting that the mode was "Goopy Chocolate" does summarize the results, indicating that more people chose this flavor than any other.

mode: A score having the highest frequency in the data
unimodal: A distribution whose frequency polygon has only one hump and thus has only one score qualifying as the mode
bimodal: A distribution whose frequency polygon shows two humps, each centered over a score having the highest frequency, so there are two modes

The mode is a score with the highest frequency and is used to summarize nominal data.
There are, however, two potential limitations of
the mode. First, the distribution may contain many
scores that all have the same highest frequency, and
then the mode does not summarize the data. In the
most extreme case we might obtain scores such as 4,
4, 5, 5, 6, 6, 7, 7. Here, there is no mode.
A second limitation is that the mode does not
take into account any scores other than the most frequent score(s), so it ignores much of the information
in the data. This can produce a misleading summary.
For example, in the skewed distribution containing
7, 7, 7, 20, 20, 21, 22, 22, 23, and 24, the mode is 7.
However, most scores are not around 7 but instead
are in the low 20s. Thus, the mode may not accurately summarize where most scores in a distribution
are located.
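For readers who like to check such results by computer, Python's statistics module offers multimode, which returns every score tied for the highest frequency. A sketch with the scores from this section; in the extreme case where all frequencies are tied, it simply returns all of the scores, reflecting that no single mode summarizes such data:

    # Finding the mode(s); multimode handles unimodal and bimodal data alike.
    from statistics import multimode

    print(multimode([2, 3, 3, 4, 4, 4, 4, 5, 5, 6]))                    # [4]
    print(multimode([2, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12]))  # [5, 9]
    print(multimode([4, 4, 5, 5, 6, 6, 7, 7]))                          # all tied: no useful mode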
Because of these limitations, for ordinal, interval,
or ratio scores, we usually rely on one of the other
measures of central tendency, such as the median.
3-3b The Median
The median is simply another name for the score at the 50th percentile. Recall that researchers usually say that 50% of the distribution is below the 50th percentile and 50% is above it. Thus, if the score of 10 is the median, then 50% of the scores are below 10 and 50% are above 10.

median (Mdn): The score located at the 50th percentile

The median presents fewer potential problems than the mode because (1) a distribution can have only one median and (2) the median will usually be around where most of the scores in a distribution are located. The symbol for the median is Mdn.

Figure 3.4 illustrates how the median summarizes a distribution. Recall from Chapter 2 that because a score's percentile is the proportion of the area under the curve that is to the left of the score, the median separates the lower 50% of the distribution from the upper 50%. Thus, on the normal curve in Figure 3.4, the score at the vertical line is the 50th percentile, so that score is the median. (Notice that here, the median is also the mode.) Likewise, in the skewed distribution in Figure 3.4, 50% of the curve is to the left of the vertical line, so the score at the line is the median. In both cases, the median is a reasonably accurate "address" for the entire distribution, with most of the scores around that point.

Figure 3.4
Location of the Median in a Normal Distribution and in a Skewed Distribution
The vertical line indicates the location of the median, with one-half of the distribution on each side of it.
[Two polygons plotting f against scores from low to high: a normal distribution and a skewed distribution, each with a vertical line marking the median.]

We have two ways to calculate the median. Usually we have at least an approximately normal distribution, so one way (the way we'll use in this book) is to estimate the median as the middle score that separates the two halves of the distribution. However, in real research we need more precision than an estimate can give. Therefore, the other way is to actually compute the median using our data. This involves a very tedious formula, so use SPSS, as described on Review Card 3.4.

The median is the preferred measure of central tendency when the data are ordinal scores. For example, say that several students ranked how well a college professor teaches on a scale of 1 to 10. Reporting that the professor's median ranking was 3 communicates that 50% of the students rated the professor as number 1, 2, or 3. Also, as you'll see, the median is preferred when interval or ratio scores form a skewed distribution. However, the median still ignores some information in the data because it reflects only the frequency of scores, and doesn't consider their mathematical values. Therefore, the median is not our first choice for describing the central tendency of normally distributed interval or ratio scores.
The median (Mdn) is the score
at the 50th percentile and is
used to summarize ordinal or
skewed interval or ratio scores.
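If SPSS is not at hand, Python's statistics.median gives the middle score directly. Note that this simple version takes the middle score (or the average of the middle two) rather than SPSS's interpolation formula, so results can differ slightly when many scores are tied; the scores here are the ones used in the Quick Practice box that follows:

    from statistics import median

    print(median([1, 3, 3, 4, 5, 6, 9]))  # 4: half of the scores fall below this point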
> Quick Practice
> The mode is the most frequent score in the data.
> The median is the 50th percentile.

More Examples
With the scores 1, 3, 3, 4, 5, 6, and 9, the most frequent score is 3, so the mode is 3. We calculate that the score having 50% of the scores below it is 4, so the median is 4.

For Practice
1. What is the mode in 4, 6, 8, 6, 3, 6, 8, 7, 9, and 8?
2. When is the median the same score as the mode?
3. With what types of scores is the mode the preferred statistic?
4. With what types of scores is the median the preferred statistic?

> Answers
1. In this bimodal data, both 6 and 8 are modes
2. When the data form a normal curve
3. With nominal scores
4. With ordinal or skewed interval/ratio scores

3-3c The Mean

By far the most common measure of central tendency in behavioral research is the mean. The mean is defined as the score located at the mathematical center of a distribution, but it is what most people call the average. Compute a mean in the same way that you compute an average: Add up the scores and then divide by the number of scores you added. Unlike the mode or the median, the mean considers the magnitude of every score, so it does not ignore any information in the data.

Let's first compute the mean of a sample. Usually we use X to stand for the raw scores in a sample and then the symbol for the sample mean is X̄. To compute the X̄, recall that the symbol that indicates "add up the scores" is ΣX, and the symbol for the number of scores is N, so

THE FORMULA FOR THE SAMPLE MEAN IS
X̄ = ΣX/N

For example, say we have the scores 3, 4, 7, 6:

STEP 1: Compute ΣX. Add the scores together. Here, ΣX = 4 + 3 + 7 + 6 = 20
STEP 2: Determine N. Here, N = 4
STEP 3: Divide ΣX by N. Here, X̄ = 20/4 = 5

Saying that the mean of these scores is 5 indicates that the center of this distribution is located at the score of 5.

mean: The score located at the mathematical center of a distribution
X̄: The symbol used to represent the sample mean

What is the mathematical center of a distribution? Think of the center as the distribution's balance point.
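The three steps translate directly into Python; a minimal sketch using the same scores:

    # Computing the sample mean: X-bar = sigma X / N.
    scores = [3, 4, 7, 6]
    sum_x = sum(scores)  # STEP 1: sigma X = 20
    n = len(scores)      # STEP 2: N = 4
    x_bar = sum_x / n    # STEP 3: 20/4 = 5.0
    print(x_bar)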
Figure 3.5
The Mean as the Balance Point of Any Distribution
[Two number lines shown as balance beams: on the left, the scores 3, 4, 6, and 7 balance at X̄ = 5; on the right, an asymmetrical distribution balances at X̄ = 4.]

For example, on the left side of Figure 3.5 are the scores of 3, 4, 6, and 7. The mean of 5 is at the point that balances the distribution. On the right side of Figure 3.5, the mean is the balance point in a different distribution that is not symmetrical. Here, the mean of 4 balances this distribution.

We compute the mean with only interval or ratio data, because it usually does not make sense to compute the average of nominal scores (e.g., finding the average of Democrats and Republicans) or of ordinal scores (e.g., finding the average position in a race). In addition, the distribution should be symmetrical and unimodal. In particular, the mean is appropriate with a normal distribution. For example, say we have the scores 1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7, which form the roughly normal distribution shown in Figure 3.6. Here, ΣX = 44 and N = 11, so the mean score—the center—is 4. The reason we use the mean with such data is because the mean is the mathematical center of any distribution: On a normal distribution the center is the point around where most of the scores are located, so the mean is an accurate summary and provides an accurate address for the distribution.

Figure 3.6
Location of the Mean on a Normal Distribution
The vertical line indicates the location of the mean score, which is the center of the distribution.
[A polygon plotting f against scores 1 to 8, with a vertical line at the mean of 4.]

> Quick Practice
> The mean is the average score, located at the mathematical center of the distribution.
> Compute the mean with a normal, or approximately normal, distribution of interval or ratio scores.

More Examples
To find the mean of the scores 3, 4, 6, 8, 7, 3, and 5: ΣX = 3 + 4 + 6 + 8 + 7 + 3 + 5 = 36, and N = 7. Then X̄ = 36/7 = 5.1428; this rounds to 5.14.

For Practice
1. What is the symbol for the sample mean?
2. What is the mean of 7, 6, 1, 4, 5, and 2?
3. With what data is the X̄ appropriate?
4. How is a mean interpreted?

> Answers
1. X̄  2. ΣX = 25, N = 6, X̄ = 4.1666, rounding to 4.17
3. With normally distributed interval or ratio scores
4. It is the center or balance point of the distribution.

[Photo: If the mean was used to measure the real estate values of these properties, the price of the one mansion would skew higher the true value of this row of run-down houses.]

3-3d Comparing the Mean, Median, and Mode

In a normal distribution, all three measures of central tendency are located at the same score. For example, in Figure 3.6 the mean of 4 also splits the curve in half, so 4 is the median. Also, 4 has the highest
frequency, so 4 is the mode. If a distribution is only
roughly normal, then the mean, median, and mode
will be close to the same score. However, because the
mean uses all information in the data, and because
it has special mathematical properties, the mean is
the basis for most of the inferential procedures we
will see. Therefore, when you are summarizing interval or ratio scores, always compute the mean unless
it clearly provides an inaccurate description of the
distribution.
The mean will inaccurately describe a skewed distribution. This is because the mean must balance the
distribution and to do that, the mean will be pulled
toward the extreme tail of the distribution. In that
case, the mean will not be where most of the scores
are located. You can see this starting with the symmetrical distribution containing the scores 1, 2, 2,
2, 3. The mean is 2 and this accurately describes the
scores. However, including the score of 20 would give the skewed sample 1, 2, 2, 2, 3, 20. Now the mean is pulled up to 5. But! Most of these scores are not at or near 5. As this illustrates, the mean is always at the mathematical center, but in a skewed distribution that center is not where most of the scores are located.

The solution is to use the median to summarize a skewed distribution. Figure 3.7 shows the relative positions of the mean, median, and mode in skewed distributions. In both graphs the mean is pulled toward the extreme tail and is not where most scores are located. Each distribution is also not centered around its mode. Thus, of the three measures, the median most accurately reflects the central tendency—the overall address—of a skewed distribution.

Figure 3.7
Measures of Central Tendency for Skewed Distributions
The vertical lines show the relative positions of the mean, median, and mode.
[Two polygons plotting f against scores from low to high: one positively skewed and one negatively skewed, each with vertical lines marking the mode, median, and mean; in each, the mean lies closest to the extreme tail.]

It is for the above reasons that the government uses the median to summarize such skewed distributions as that of yearly income or the price of houses. For example, the median income in the United States is approximately $50,000 a year. But a relatively small number of corporate executives, movie stars, professional athletes, and the like make millions! Averaging in these high incomes would produce a mean much higher than the median. However, most incomes are not located around this higher score, so the median is a better summary of this distribution.
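A short sketch makes the pull of the tail concrete, using the text's own scores:

    # One extreme score pulls the mean toward the tail but barely moves the median.
    from statistics import mean, median

    print(mean([1, 2, 2, 2, 3]), median([1, 2, 2, 2, 3]))          # 2 and 2
    print(mean([1, 2, 2, 2, 3, 20]), median([1, 2, 2, 2, 3, 20]))  # 5.0 versus 2.0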
With interval or ratio scores the mean is used to summarize normal distributions; the median is used to summarize skewed distributions.

Believe it or not, we've now covered the basic measures of central tendency. In sum, the first step in summarizing data is to compute a measure of central tendency to describe the score around which the distribution tends to be located. Determine which measure to compute based on (1) the scale of measurement of the scores, and (2) the shape of the distribution. For help, read research related to your study and see how other researchers computed central tendency.

• Compute the mode with nominal data or with a distinctly bimodal distribution of any type of scores.
• Compute the median with ordinal scores or with a very skewed distribution of interval/ratio scores.
• Compute the mean with a normal or approximately normal distribution of interval or ratio scores.

3-4 APPLYING THE MEAN TO RESEARCH

Most often the data in behavioral research are summarized using the mean. This is because most often, we measure variables using interval or ratio scores that naturally form a roughly normal distribution. Because the mean is used so extensively, in the following sections we will delve further into its characteristics and uses.

3-4a Deviations around the Mean

deviation: The distance a score is from the mean; indicates how much the score differs from the mean
sum of the deviations around the mean: The sum of all differences between the scores and the mean; symbolized by Σ(X − X̄)

First, you need to understand why the mean is at the center or "balance point" of a distribution. The answer is because the mean is just as far from the scores above it as it is from the scores below it. That is, the total distance that some scores are above the mean equals the total distance that the other scores are below the mean. The distance separating a score from the mean is called the score's deviation, indicating the amount the score "deviates" from the mean. A score's deviation is equal to the score minus the mean. In symbols this is written as:

X − X̄

Thus, if the sample mean is 47, a score of 50 deviates by +3 because 50 − 47 is +3. A score of 40 deviates from the mean of 47 by −7 because 40 − 47 = −7.
Always subtract the mean from the
raw score when computing a score’s
deviation.
Do not think of deviations as positive or negative
numbers in the traditional sense. Think of a deviation
as having two components: the number, which indicates distance from the mean (which is always positive), and the sign, which indicates direction from the
mean. Thus, a positive deviation indicates that the
score is greater than the mean, and a negative deviation indicates that the score is less than the mean. The
size of the deviation (regardless of its sign) indicates
the distance the score lies from the mean: The larger
the deviation, the farther the score is from the mean. A
deviation of 0 indicates that the score equals the mean.
When we determine the deviations of all the
scores in a sample, we find the deviations around the
mean. Then the sum of the deviations around the
mean is the sum of all differences between the scores
and the mean. And here’s why the mean is the mathematical center of a distribution:
The sum of the deviations around the mean
always equals zero.
Table 3.1
Computing Deviations around the Mean

X   minus   X̄   equals   Deviation
3     −     5      =        −2
4     −     5      =        −1
6     −     5      =        +1
7     −     5      =        +2
                 Sum =       0
For example, the scores 3, 4, 6, and 7 have a mean of 5. Table 3.1 shows how to compute the sum of the deviations around the mean for these scores. First, we subtract the mean from each score to obtain the score's deviation. Then we add the deviations together.
The sum is zero. In fact, for
any distribution having any
shape, the sum of the deviations around the mean will
equal zero. This is because the sum of the positive deviations equals the sum of the negative deviations, so the
sum of all deviations is zero. In this way the mean is the
center of a distribution, because the mean is an equal
distance from the scores above and below it.
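You can confirm this with the scores from Table 3.1 (a minimal sketch; any set of scores gives the same zero sum):

    # Deviations around the mean always sum to zero.
    scores = [3, 4, 6, 7]
    x_bar = sum(scores) / len(scores)         # 5.0
    deviations = [x - x_bar for x in scores]  # [-2.0, -1.0, 1.0, 2.0]
    print(deviations, sum(deviations))        # the sum is 0.0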
Note: Some of the formulas in later chapters involve computing deviations around the mean, so you need to know how we communicate the sum of the deviations using symbols. Combining the symbol for a deviation, (X − X̄), with the symbol for sum, Σ, gives the sum of the deviations as Σ(X − X̄). We always work inside parentheses first, so this says to first subtract the mean from each score to find each deviation. Then we add all of the deviations together. Thus, in symbols, Σ(X − X̄) = 0.
The importance of the sum of the deviations
equaling zero is that it makes the mean literally the
score around which everyone in the sample scored:
Some scores are above the mean to the same extent
that others are below it. Therefore, we think of the
mean as the typical score, because it is the one score
that more or less describes everyone’s score, with the
same amounts of more and less. This is why the mean
is so useful for summarizing a group of scores.
This characteristic is also why the mean is the best
score to use if you want to predict an individual’s score.
Because the mean is the center score, any errors in our
predictions will cancel out over the long run. Here’s why:
The amount of error in one prediction is the difference between what someone actually gets (X) and what we predict he or she will get (X̄). In symbols, this difference is X − X̄, which we've seen is a deviation. But alter your perspective: In this context, a deviation is the amount of error we have when we predict the mean as someone's score. If we determine the amount of error in every prediction, our total error is equal to the sum of the deviations, which equals zero.
For example, if the mean on an exam is 80, we'll predict a score of 80 for every student in the class and, of course, sometimes we will be wrong. Say that one student scored 70. We would predict an 80, so we'd be off by −10 because this person's score deviates from the mean by −10. However, the mean is the central score, so another student would score 90. By estimating an 80 again, we'd be off by +10 because this person deviates by +10. And so on, so that over the entire sample, our errors will balance out to a total of zero because the total positive deviations cancel out the total negative deviations. (Likewise, any students taking the exam in the future should be like those we've tested and score around 80. Thus, we'd also predict they would score 80, and our errors would again balance out to zero.)
If we consistently predict any score other than the
mean, the total error will be greater than zero. However, a basic rule of statistics is that if we can’t perfectly
describe every score, then the next best thing is to have
our errors balance out to zero. There is an old joke
about two statisticians shooting at a target. One hits 1
foot to the left of the target, and the other hits 1 foot
to the right. “Congratulations,” one says, “we got it!”
Likewise, we want our errors—our over- and underestimates—to balance out to zero. Only the mean provides this balancing capability. Therefore, when we do
not know anything else about the scores, we predict
that any individual will score at the mean score.
So, to sum up, remember these three things about
deviations around the mean:
1. A deviation equals X − X̄ and indicates a score's distance above or below the mean.
2. The mean is the central score, so the positive and
negative deviations cancel out and the sum of the
deviations around the mean equals zero.
3. A deviation also indicates the amount of error between the X̄ we predict for someone and the X that she or he actually gets. The total error over all such predictions equals the sum of the deviations, which is zero.
3-4b Summarizing Research

Now you can understand why researchers compute the mean anytime we have a sample of normally distributed interval or ratio scores. So, if we've merely observed some participants, we compute the mean number of times they exhibit a particular behavior, or we compute the mean response in a survey.
Table 3.2
Number of Words Correctly Recalled from a 5-, 10-, or 15-Item List

Condition 1:   Condition 2:    Condition 3:
5-Item List    10-Item List    15-Item List
4              7               10
3              6               9
2              5               8
In a correlational study we compute the mean score on the X variable and the mean score on the Y variable (symbolized by Ȳ).
Such means are then used to summarize the sample and
to predict any individual’s score in this situation.
The predominant way to summarize experiments
is also to compute means. As an example, say that we
conduct a very simple study of memory by having
participants recall a list in one of three conditions in
which the list contains 5, 10, or 15 words. Our dependent variable is the number of words recalled from
the list. Table 3.2 above shows some idealized recall
scores from the participants in each condition (each
column). A relationship appears to be present because
we see a different batch of higher recall scores occurring with each increase in list length.
A real experiment would employ a much larger
N, and so to see the relationship buried in the scores,
we would compute a mean (or other measure of central tendency) for each condition. When selecting the
appropriate measure, remember that the scores are from
the dependent variable, so compute the mean, median,
or mode depending upon (1) the scale of measurement
of the dependent variable and (2) for interval or ratio
scores, the shape of the distribution.
In our recall experiment, we compute the mean of
each condition, producing Table 3.3 below. When you
are reading research, you will usually see only the means,
and not the original raw scores. To interpret each mean,
envision the scores that typically would produce it. In
our condition 1, for example, a normal distribution producing a mean of 3 would contain scores distributed above and below 3, with most scores close to 3. We then
use this information to describe the scores: In condition
1, for example, we’d say participants score around 3, or
the typical score is 3, and we’d predict that any participant would have a score of 3 in this situation.
To see the relationship that is present, look at the pattern formed by the means: Because a different mean indicates a different batch of raw scores that produced it, a
relationship is present when the means change as the conditions change. Table 3.3 shows a relationship because as
the conditions change from 5 to 10 to 15 items in a list,
the means indicate that the recall scores also change from
around 3 to around 6 to around 9, respectively.
Note, however, that not all means must change for
a relationship to be present. If, for example, our means
were 3, 5, and 5, respectively, then at least sometimes
we see a different batch of recall scores occurring for
different list lengths, and so a relationship is present.
On the other hand, if the means for the three conditions had been 5, 5, and 5, this would indicate that
essentially the same batch of recall scores occurred
regardless of list length, so no relationship is present.
Let’s assume that the data in Table 3.3 are representative of how the population behaves in this
situation. (We must perform inferential procedures
to check this.) If so, then we have demonstrated that
list length makes a difference in the mean scores and
thus in the individual recall scores. It is important to
recognize that demonstrating a difference between the
means is the same thing as demonstrating a relationship. In each case we are saying that a different group
of scores occurs in each condition. This is important
because in published research, researchers often imply
that they have found a relationship by saying that they
have found a difference between the means. If they
find no difference, they have not found a relationship.
In an experiment, if the means change as
the conditions change, then the raw scores
are changing, and a relationship is present.
Table 3.3
Means of Conditions in Memory Experiment

Condition 1:   Condition 2:    Condition 3:
5-Item List    10-Item List    15-Item List
X̄ = 3          X̄ = 6           X̄ = 9
The above logic for interpreting an experiment also
applies to the median and mode. A relationship is present if a different median or mode occurs in two or more
conditions, because this indicates that a different batch
of raw scores is occurring as the conditions change.
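To see how these summaries are produced, here is a brief Python sketch (ours, not the book's) that computes the mean of each condition from Table 3.2:

```python
# Compute the mean of the dependent (recall) scores in each condition,
# reproducing Table 3.3 from the Table 3.2 data.
conditions = {
    "5-item list": [4, 3, 2],
    "10-item list": [7, 6, 5],
    "15-item list": [10, 9, 8],
}
means = {name: sum(scores) / len(scores) for name, scores in conditions.items()}
print(means)  # {'5-item list': 3.0, '10-item list': 6.0, '15-item list': 9.0}
# Different means across conditions indicate different batches of raw scores,
# and thus a relationship; identical means (e.g., 5, 5, 5) would indicate none.
```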
GRAPHING THE RESULTS OF AN EXPERIMENT

A graph is often a good way to present the results of an
experiment, especially when they are complicated. To
create a graph, begin by labeling the X axis with the
conditions of the independent variable. Label the Y
axis with the mean (or mode or median) of the dependent scores. (Do not be confused by the fact that we
used X to represent the scores in the formula for computing the mean. We still plot the means on the Y axis.)
Complete the graph by creating either a line graph or a
bar graph. The type of graph to select is determined by
the characteristics of the independent variable.
Create a line graph when the independent variable
is an interval or a ratio variable. In a line graph we
plot a data point for each condition, and then connect
adjacent points with straight lines. For example, in our
previous memory study, the independent variable of list
length is a ratio variable. Therefore, we create the line
graph in the upper portion of Figure 3.8. A data point
is placed above the 5-item condition opposite the mean
of 3, a data point is above the 10-item condition at the mean of 6, and a data point is above the 15-item condition at the mean of 9. Then we connect adjacent data points with straight lines.

Figure 3.8
Line Graphs Showing the Relationship between Mean Words Recalled and List Length and No Relationship
[Two line graphs, each with length of list (5, 10, 15) on the X axis and mean words recalled (0–10) on the Y axis: in the upper graph the means rise from 3 to 6 to 9; in the lower graph the means form a horizontal line at 5.]
We use straight lines here for the same reason we used them when producing polygons: When the X variable involves an interval or ratio scale, we assume it is a continuous variable. The lines show that the relationship continues between the scores on the X axis. For example, we assume that if there had been a 6-item list, the mean recall score would fall on the line connecting the means for the 5- and 10-item lists.

The graph conveys the same information as the sample means did back in Table 3.3. Look at the overall pattern: If the vertical positions of the data points go up or down as the conditions change, then the means are changing. Different sample means indicate different scores in each condition, so a relationship is present. However, say that instead, in every condition the mean was 5, producing the lower graph in Figure 3.8. The result is a horizontal line, indicating that the mean score stays the same as the conditions change, so essentially the same recall scores occur in each condition and no relationship is present.
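If you want to reproduce a graph like the upper panel of Figure 3.8 yourself, here is a short sketch using Python's matplotlib library (an assumption on our part; the book itself uses SPSS):

```python
# Plot condition means as a line graph: appropriate because list length
# is a ratio independent variable.
import matplotlib.pyplot as plt

list_lengths = [5, 10, 15]   # conditions of the independent variable
mean_recall = [3, 6, 9]      # mean of the dependent scores in each condition

plt.plot(list_lengths, mean_recall, marker="o")  # points joined by straight lines
plt.xlabel("Length of list")
plt.ylabel("Mean words recalled")
plt.xticks(list_lengths)
plt.show()
```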
If data points form a line that is not
horizontal, the Y scores are changing
as X changes, and a relationship is
present.
The other type of graph used to summarize experiments is a bar graph, like we saw in Chapter 2. Create a bar graph when the independent variable is a nominal or ordinal variable. Place a bar above each condition on the X axis to the height on the Y axis that corresponds to the mean, median, or mode for that condition. As usual, adjacent bars do not touch. We use bars here for the same reason we used them in Chapter 2: Nominal and ordinal X variables are assumed to be discrete, meaning you can have one score or another score, but nothing in between. The spaces between the bars communicate this.

line graph: A graph of an experiment's results when the independent variable is an interval or ratio variable; plotted by connecting data points with straight lines; as opposed to a bar graph, used when the independent variable is a nominal or ordinal variable
Figure 3.9
Bar Graph Showing Mean Words Recalled as a Function of College Major
The height of each bar corresponds to the mean score for the condition.
[A bar graph with college major (physics, psychology, English) on the X axis and mean words recalled (0–14) on the Y axis.]
For example, say that we conducted an experiment comparing the recall scores of psychology majors, English majors, and physics majors. This independent variable involves a nominal scale, so we have the bar graph shown in Figure 3.9 above. Because the tops of the bars do not form a horizontal line, we know that different means and thus different scores are in each condition. We see that individual scores are around 8 for physics majors, around 4 for psychology majors, and around 12 for English majors, so a relationship is present.
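A matching sketch for a bar graph like Figure 3.9, again assuming Python with matplotlib rather than the book's SPSS:

```python
# Plot condition means as a bar graph: appropriate because college major
# is a nominal independent variable.
import matplotlib.pyplot as plt

majors = ["Physics", "Psychology", "English"]   # nominal conditions
mean_recall = [8, 4, 12]                        # mean words recalled per condition

plt.bar(majors, mean_recall)   # separate bars mark a discrete X variable
plt.xlabel("College major")
plt.ylabel("Mean words recalled")
plt.show()
```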
Note: In some experiments, we may measure a
nominal or an ordinal dependent variable. In that case
we would plot the mode or median on the Y axis for
each condition. Then, again depending on the characteristics of the independent variable, we would create
either a line or bar graph.
> Quick Practice
> Graph the independent variable on the X axis and the mean, median, or mode of the dependent scores on the Y axis.
> Create a line graph when the independent variable is interval or ratio; create a bar graph when it is nominal or ordinal.
More Examples
Say that men and women rated their satisfaction with
an instructor, and the mean scores were 20 and 30, respectively. To graph this, gender is a nominal independent variable, so plot a bar graph, with the labels “men”
and “women” on X and the mean for each gender on Y.
Or, say we measure the satisfaction scores of students
tested with either a 10-, 40-, or 80-question final exam
and, because the scores form very skewed distributions,
we compute the median in each condition. Test length
is a ratio independent variable, so plot a line graph, with
the labels 10, 40, and 80 on X and the median of each
condition on Y.
For Practice
1. The independent variable is plotted on the ____
axis, and the dependent variable is plotted on the
____ axis.
2. A ____ shows a data point above each X, with adjacent points connected with straight lines. A ____
shows a discrete bar above each X.
3. The characteristics of the ____ variable determine
whether to compute the mean, median, or mode.
4. The characteristics of the ____ variable determine
whether to plot a line or bar graph.
5. Create a bar graph with ____ or ____ variables.
48
6. Create a line graph with ____ or ____ variables.
> Answers
1. X; Y
2. line graph; bar graph
3. dependent
4. independent
5. nominal; ordinal
6. interval; ratio
The scale of measurement of the
dependent variable determines
which measure of central tendency
to compute. The scale of the
independent variable determines the
type of graph to create.
3-5 DESCRIBING THE POPULATION MEAN

Recall that ultimately we seek to describe the population of scores found in a given situation. Populations are unwieldy, so we also summarize them using measures of central tendency. Usually we have normally distributed interval or ratio scores, so usually we describe the population mean. The symbol for a population mean is the Greek letter μ (pronounced "mew"). Thus, to indicate that the population mean is 143, we'd say μ = 143. We use μ simply to show that we're talking about a population, as opposed to a sample, but a population mean has the same characteristics as a sample mean: μ is the average of the scores in the population, it is the center of the distribution, and the sum of the deviations around μ equals zero. Thus, μ is the score around which everyone in the population scored, it is the typical score, and it is the score we predict for any individual in the population.

The symbol for a population mean is μ.
μ: The symbol used to represent the population mean

To visualize this relationship in published research, we usually assume that a graph of the population relationship would look like the graph from the sample data we created earlier.
However, so that you can understand the statistics we
discuss later, you should envision the relationship in the
population in the following way. We assume that we
know two things: By estimating each μ, we know where
on the dependent variable each population would be
located. Also, by assuming that recall scores are normally distributed, we know the shape of each distribution. Thus, we can envision the populations of recall
scores we expect as the frequency polygons shown in
Figure 3.10. (These are frequency distributions, so the
dependent (recall) scores are on the X axis.) The figure
shows a relationship because, as the conditions of the
independent variable change, scores on the dependent
variable change so that we see a different population of
scores for each condition. Essentially, for every 5 items
added to a list, the distribution slides to the right, going
from scores around 3 to around 6 to around 9.
Conversely, say that we had found no relationship where, for example, every X̄ was 3. Then, we'd envision one normal distribution located at μ = 3 for all three conditions.
Notice that by envisioning the relationship in the
population, we have the scores for describing everyone’s behavior, so we are describing how nature operates in this situation. In fact, as we’ve done here, in
every study we (1) use the sample means (or other
measures) to describe the relationship in the sample,
(2) perform our inferential procedures, and (3) use the
sample data to envision the relationship found in the
population—in nature. At that point we have basically
achieved the goal of our research and we are finished
with our statistical analyses.
How do we determine μ? If all scores in the population are known, then we compute μ using the same formula used to compute the sample mean, so μ = ΣX/N. Usually, however, a population is infinitely large, so instead, we perform the inferential process we've discussed previously, using the mean of a sample to estimate μ. Thus, if a sample's mean in a particular situation is 99, then, assuming the sample accurately represents the population, we estimate that μ in that situation is also 99.
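As a minimal illustration of this estimation step (with hypothetical scores, not from the text):

```python
# With the population unavailable, the sample mean serves as our estimate of mu.
sample = [97, 99, 101, 98, 100]
estimate_of_mu = sum(sample) / len(sample)   # X-bar = sum(X)/N
print(estimate_of_mu)                        # 99.0, our estimate of mu
```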
Likewise, ultimately we wish to describe any experiment in terms of the scores that would be found if we tested the entire population in each condition. For example, assume that the data from our previous list-length study is representative. Because the mean in the 5-item condition was 3, we expect that everyone should score around 3 in this situation, so we estimate that if the population recalled a 5-item list, the μ would be 3. Similarly, we infer that if the population recalled a 10-item list, μ would equal our condition's mean of 6, and if the population recalled a 15-item list, μ would be 9.

Figure 3.10
Locations of Populations of Recall Scores as a Function of List Length
Each distribution contains the recall scores we would expect to find if the population were tested under each condition.
[Three normal curves on an X axis of recall scores (0–12), with f on the Y axis: one centered at the μ for a 5-item list, one at the μ for a 10-item list, and one at the μ for a 15-item list.]
USING SPSS
SPSS will compute the three measures of central tendency we’ve discussed for one sample of data (along with
other statistics discussed in Chapter 4). Instructions for this are on Review Card 3.4. However, it is not necessary
to repeatedly run this routine to compute the mean for each condition in an experiment. Instead, SPSS
computes the means for all conditions at once as part of performing the experiment’s inferential procedures,
which we’ll discuss later.
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. (a) What does a measure of central tendency indicate? (b) What are the three measures of central
tendency?
2. What two aspects of the data determine which
measure of central tendency to compute?
3. What is the mode, and with what type of data is it
most appropriate?
4. What is the median, and with what type of data is
it most appropriate?
5. What is the mean, and with what type of data is it
most appropriate?
6. (a) Why does the mean accurately summarize
a normal distribution? (b) Why does the
mean inaccurately summarize a skewed
distribution?
7. What do you know about the shape of the
distribution if the median is a considerably lower
score than the mean?
8. What two pieces of information about the
location of a raw score does a deviation score
convey?
9. (a) What does (X − X̄) indicate? (b) What does Σ(X − X̄) indicate? (c) What two steps must be performed to compute Σ(X − X̄) for a sample?
(d) Why does the sum of the deviations around
the mean equal zero?
10. Why do we use the mean of a sample to predict
anyone’s score in that sample?
11. For the following data, compute (a) the mean and
(b) the mode.
55 57 59 58 60 57 56 58 61 58 59
12. (a) In question 11, what is your best estimate of
the median? (b) Why?
13. The following distribution shows the locations of five scores.
[A distribution with f on the Y axis; five locations along the X axis are labeled A, B, C, D, and E.]
a. Match the deviation scores −7, +1, 0, −2, and +5 with their locations.
A = _____  B = _____  C = _____  D = _____  E = _____
b. Arrange the deviation scores to show the
highest to lowest raw scores.
c. Arrange the deviation scores to show the
raw scores having the highest to lowest
frequency.
14. (a) You misplaced one of the scores in a sample,
but you have the other data below. What score
should you guess for the missing score? (b) Why?
15 12 13 14 11 14 13 13 12 11 15
15. A researcher collected the following sets of data.
For each, indicate the measure of central tendency
she should compute: (a) the following IQ scores:
60, 72, 63, 83, 68, 74, 90, 86, 74, and 80; (b) the
following error scores: 10, 15, 18, 15, 14, 13, 42, 15,
12, 14, and 42; (c) the following blood types: A−, A−, O, A+, AB−, A+, O, O, O, and AB+; (d) the following grades: B, D, C, A, B, F, C, B, C, D, and D.
16. On a normal distribution, four participants
obtained the following deviation scores: −5, 0, +3, and +1. (a) Which person obtained the lowest
raw score? How do you know? (b) Which person’s
Behavioral Sciences STAT2
Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
raw score had the lowest frequency? How do
you know? (c) Which person’s raw score had the
highest frequency? How do you know? (d) Which
person obtained the highest raw score? How do
you know?
17. Kevin claims a deviation of +5 is always better than a deviation of −5. Why is he correct or incorrect?
18. (a) What is the symbol "μ" called and what does
it stand for? (b) How do we usually determine its
value?
19. In an experiment: (a) Which variable is plotted on
the X axis? (b) Which variable is plotted on the
Y axis? (c) How do you recognize the independent variable of an experiment? (d) How do you
recognize the dependent variable?
20. (a) In an experiment, what is the rule for when
to make a bar graph? (b) Define these scales.
(c) What is the rule for when to make a line graph?
(d) Define these scales.
21. For the following experimental results, interpret specifically the relationship between the independent and dependent variables:
[A line graph with hours of sleep deprivation (1–8) on the X axis and mean number of errors (0–40) on the Y axis.]
22. (a) If you participated in the study in question 21
and had been deprived of 5 hours of sleep, how
many errors do you think you would have made?
(b) If we tested all people in the world after
5 hours of sleep deprivation, how many errors
do you think each would make? (c) What symbol
stands for your prediction in part b?
23. You hear that a line graph of mean scores from
the Grumpy Emotionality Test slants downward as
the researcher increased the amount of sunlight
present in the room where participants were
tested. (Hint: Sketch this graph.) (a) What does this
tell you about the mean scores for the conditions?
(b) What does this tell you about the raw scores in
the conditions? (c) What would we expect to find
regarding the populations and their μs? (d) Should
we conclude there is a relationship between
emotionality and sunlight in nature?
24. For each of the experiments below, determine (1) which variable should be plotted on the Y axis and which on the X axis, (2) whether the researcher should create a line graph or a bar graph, and (3) how she should summarize scores on the dependent variable: (a) a study of income for different age groups we've selected; (b) a study of politicians' positive votes on environmental issues after we've classified them as having or not having a wildlife refuge in their political district; (c) a study of running speed depending on the amount of carbohydrates we've given participants; (d) a study of rates of alcohol abuse depending on which ethnic group we examine.

25. You conduct a study to determine the impact that varying the amount of noise in an office has on worker productivity. You obtain the following productivity scores.

Condition 1:   Condition 2:     Condition 3:
Low Noise      Medium Noise     Loud Noise
15             13               12
19             11               9
13             14               7
13             10               8

(a) Productivity scores are normally distributed ratio scores. Compute the summaries of this experiment. (b) Draw the appropriate graph for these data. (c) Draw how we would envision the populations produced by this experiment. (d) What conclusions should you draw from this experiment?
26. In a study of participants’ speeds of texting, the
researchers concluded, “We found a difference
between the three means for the three age
groups, with slower speeds occurring with
increased age. However, no speed differences were
found between the overall means for males and
females.” Based on this conclusion, describe the
relationship we expect to find in nature between
texting speed and (a) age; (b) gender.
Chapter 4
SUMMARIZING SCORES WITH MEASURES OF VARIABILITY

LOOKING BACK
Be sure you understand:
• From Chapter 2, how to read and interpret the normal curve, and how to use the proportion of the area under the curve to determine relative frequency.
• From Chapter 3, what X̄ and μ stand for, what a deviation is, and why the sum of the deviations around the mean is zero.

GOING FORWARD
Your goals in this chapter are to learn:
• What is meant by variability.
• What the range indicates.
• What the standard deviation and variance are and how to interpret them.
• How to compute the standard deviation and variance when describing a sample, when describing the population, and when estimating the population.

Sections
4-1 Understanding Variability
4-2 The Range
4-3 The Sample Variance and Standard Deviation
4-4 The Population Variance and Standard Deviation
4-5 Summary of the Variance and Standard Deviation
4-6 Computing Formulas for the Variance and Standard Deviation
4-7 Statistics in the Research Literature: Reporting Means and Variability

You have seen that the first steps in dealing with data are to consider the shape of the distribution and to compute the mean (or other measure of central tendency). This information simplifies the distribution and allows you to envision its general properties. But not everyone will behave in the same way, so we may see many different scores. Therefore, to completely describe data you must also determine whether there are large or small differences among the scores. This chapter discusses the statistics for describing the differences among scores, which are called measures of variability. In the following sections we discuss (1) the concept of variability, (2) the statistics that describe variability in the sample, and (3) the statistics that describe variability in the population.
4-1 UNDERSTANDING
VARIABILITY
Computing a measure of variability is important
because it answers the question “How large are the
differences among the scores?” Without it, a measure
of central tendency provides an incomplete description. For example, look at Table 4.1. Each sample has
a mean of 6, so without looking at the raw scores,
you might think they are identical distributions. But,
Sample A contains scores that differ greatly from each
other and from the mean, Sample B contains scores
that differ less, and in Sample C there are no differences among the scores.
Table 4.1
Three Different Distributions Having the Same Mean Score

Sample A   Sample B   Sample C
0          8          6
2          7          6
6          6          6
10         5          6
12         4          6
X̄ = 6      X̄ = 6      X̄ = 6
Thus, to completely describe a set of data, we
need to calculate statistics called measures of variability. Measures of variability describe the extent to
which scores in a distribution differ from each other. In
a sample with more frequent, larger differences among
the scores, these statistics will produce larger numbers
and we say the scores (and the underlying behaviors)
are more variable or show greater variability.
Measures of variability communicate three
aspects of the data. First, the opposite of variability is
consistency. Small variability indicates that the scores
are consistently close to each other. Larger variability indicates a variety of scores that are inconsistent.
Second, the amount of variability implies how accurately a measure of central tendency describes the distribution. Our focus will be on the mean and normal
distributions: The more that scores differ from each
other, the less accurately they are summarized by the
mean. Conversely, the smaller the variability, the closer
the scores are to each other and to the mean. Third,
we’ve seen that the difference between two scores can
be thought of as the distance that separates them.
From this perspective, greater differences indicate
greater distances between the scores, so measures of variability indicate how spread out a distribution is. (For this reason, researchers—and SPSS—also refer to variability as dispersion: With greater variability, the scores are more "dispersed.")

measures of variability: Statistics that summarize the extent to which scores in a distribution differ from one another
So, without even looking at the scores in the samples back in Table 4.1, by knowing their variability
we’d know that Sample C contains consistent scores
that are close to each other, so the mean of 6 accurately
describes them. Sample B contains scores that differ
more—are more spread out—so they are less consistent and 6 is not so accurate a summary. Sample A contains very inconsistent scores that are spread out far
from each other, so 6 poorly describes most of them.
You can see the same aspects when describing or
envisioning a larger normal distribution. For example,
Figure 4.1 shows three normal distributions that differ
because of their variability. Let’s use our “parking lot
view.” If our statistic indicates relatively small variability,
Figure 4.1
Three Variations of the Normal Curve
[Three normal curves, each with scores on the X axis (roughly 25–75) and f on the Y axis: Distribution A is narrow, Distribution B is wider, and Distribution C is the most spread out.]
it implies a distribution similar to Distribution A: This
is very narrow because most of the people are standing in long lines located close to the mean (with few
standing at, say, 40 or 60). Thus, most scores are close
to each other, so their differences are small and this is
why our statistic indicates small variability. However,
if our statistic indicates intermediate variability, then it
implies a distribution more like B: This is more spread
out because longer lines of people are located at scores
farther above and below the mean (more people stand
near 40 and 60). In other words, a greater variety of
scores occur here, producing more frequent and larger
differences, and this is why our statistic is larger. Finally,
when our statistic shows very large variability, we envision a distribution like C. It is very wide because long
lines of people are at scores located farther into the tails
(here, scores beyond 40 and 60 occur often). Therefore,
frequently scores are anywhere between very low and
very high, producing many large differences, and this is
why the statistic is so large.
Researchers have several ways to measure variability. The following sections discuss the three most
common measures of variability: the range, the variance, and the standard deviation. But first:
> Quick Practice
> Measures of variability describe the amount that scores differ from each other.
> When scores are variable, we see frequent large differences among the scores, indicating that participants are behaving inconsistently.
More Examples
If a survey produced high variability, then each person
had a rather different answer—and score—than the
next. This produces a wide normal curve. If the variability
on an exam is small, then many students obtained either
the same or close to the same score. This produces a
narrow normal curve.
For Practice
1. When researchers measure the differences among scores, they measure _________.
2. The opposite of variability is _________.
3. When the variability in a sample is large, are the scores close together or very different from each other?
4. If a distribution is wide or spread out, then the variability is _________.

> Answers
1. variability  2. consistency  3. different  4. large

4-2 THE RANGE
One way to describe variability is to determine how far the lowest score is from the highest score. The descriptive statistic that indicates the distance between the two most extreme scores in a distribution is called the range.
THE FORMULA FOR THE RANGE IS
Range = Highest score − Lowest score
For example, the scores back in Sample A (0, 2, 6, 10, 12) have a range of 12 − 0 = 12. The less variable scores in Sample B (4, 5, 6, 7, 8) have a range of 8 − 4 = 4. And the perfectly consistent Sample C (6, 6, 6, 6, 6) has a range of 6 − 6 = 0.
Thus, the range does communicate the spread in the data. However, the range is a rather crude measure. Because it involves only the two most extreme scores, the range is based on the least typical and often least frequent scores, while ignoring all other scores. Therefore, we usually use the range as our sole measure of variability only with nominal or ordinal data.
With nominal data, we compute the range by counting the number of categories we're examining: For example, there is more consistency if the participants in a study belong to 1 of 4 political parties than if they belong to 1 of 14 parties.
With ordinal data, the range is the distance between the lowest and highest rank: If 100 runners finish a race spanning only the 5 positions from 1st through 5th, this is a close race with many ties; if they span 75 positions, the runners are more spread out.
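Here is a minimal Python sketch (ours, not the book's) applying the range formula to the three samples from Table 4.1:

```python
# The range: highest score minus lowest score.
def score_range(scores):
    return max(scores) - min(scores)

print(score_range([0, 2, 6, 10, 12]))  # 12 (Sample A)
print(score_range([4, 5, 6, 7, 8]))    # 4  (Sample B)
print(score_range([6, 6, 6, 6, 6]))    # 0  (Sample C)
```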
4-3 THE SAMPLE VARIANCE AND STANDARD DEVIATION
Most behavioral research involves interval or ratio scores that form a normal distribution. In such situations (when the mean is appropriate), we use two similar measures of variability called the variance and the standard deviation.
Understand that we use the
variance and the standard deviation to describe how different the
scores are from each other. We calculate them, however, by measuring
how much the scores differ from the
mean. Because the mean is the center of a distribution, when scores are
spread out from each other, they are also spread out
from the mean. When scores are close to each other,
they are close to the mean.
The variance and standard
deviation are the two measures
of variability that indicate how
much the scores are spread out
around the mean.
Mathematically, the distance between a score and
the mean is the difference between the score and the
mean. This difference is symbolized by X − X̄, which,
as in Chapter 3, is the amount that a score deviates
from the mean. Because some scores will deviate
from the mean by more than others, it makes sense to
compute the average amount the scores deviate from
the mean. The larger the “average of the deviations,”
the greater the variability or spread
between the scores and the mean.
range: The distance between the highest and lowest scores in a set of data

We cannot, however, simply compute the average of the deviations. To compute an average, we
first sum the scores, so we would first sum the deviations. In symbols, this is Σ(X − X̄). Recall, however, that the sum of the deviations always equals
zero because the positive deviations cancel out the
negative deviations. Therefore, the average of the
deviations will always be zero.
Thus, we want a statistic like the average of
the deviations so that we know the average amount
the scores differ from the mean. But, because the
average of the deviations is always zero, we calculate slightly more complicated statistics called the
variance and standard deviation. Think of them,
however, as each producing a number that indicates
something like the average or typical amount that
the scores differ from the mean.
Table 4.2
Calculation of Variance Using the Defining Formula

Age Score (X)   X̄    (X − X̄)    (X − X̄)²
2               5    −3          9
3               5    −2          4
4               5    −1          1
5               5     0          0
6               5    +1          1
7               5    +2          4
8               5    +3          9

N = 7           Σ(X − X̄) = 0    Σ(X − X̄)² = 28
4-3a Understanding the
Sample Variance
If the problem with the average of the deviations is
that the positive and negative deviations always cancel out to zero, then one solution is to square the deviations. This removes all negatives, so the sum of the
squared deviations is not necessarily zero and neither
is their average.
By finding the average squared deviation, we
compute the variance. The sample variance is the
average of the squared deviations of scores around the
sample mean. Our symbol for the sample variance is
S²X. Always include the squared sign (²). The capital
S indicates that we are describing a sample, and the
subscript X indicates it is a sample of X scores.
We have a formula for the sample variance that
defines it:
THE DEFINING FORMULA FOR THE SAMPLE VARIANCE IS

S²X = Σ(X − X̄)² / N

This formula is important because it shows you the basis for the variance. Later we will see a different, faster formula to use when you are actually computing the variance. But first, to understand the concept, say that we measure the ages of some children. As shown in Table 4.2, we first compute each deviation, (X − X̄), by subtracting the mean (which here is 5) from each score. Next, as shown in the far-right column, we square each deviation. Adding the squared deviations gives Σ(X − X̄)², which here is 28. The N is 7, and so

S²X = Σ(X − X̄)² / N = 28/7 = 4

This sample's variance is 4. In other words, the average squared deviation of the age scores around the mean is 4.

sample variance (S²X): The average of the squared deviations of scores around the sample mean

The symbol for the sample variance is S²X, and it indicates the average squared deviation.
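The same computation in a short Python sketch (ours, not the book's), using the age scores from Table 4.2:

```python
# The defining formula for the sample variance, applied to the Table 4.2 ages.
ages = [2, 3, 4, 5, 6, 7, 8]
mean = sum(ages) / len(ages)                       # X-bar = 5
squared_devs = [(x - mean) ** 2 for x in ages]     # 9, 4, 1, 0, 1, 4, 9
print(sum(squared_devs) / len(ages))               # 28/7 = 4.0
```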
The good news is that the variance is a legitimate measure of variability. The bad news, however,
is that the variance does not make much sense as the
“average deviation.” There are two problems. First,
squaring the deviations makes them very large, so the
variance is unrealistically large. To say that our age
scores differ from their mean by an average of 4 is silly,
because none of the scores actually deviates from the
mean by this much. The second problem is that variance is rather bizarre because it measures in squared
units. We measured ages, so the variance indicates that
the scores deviate from the mean by 4 squared years
(whatever that means!).
So, the variance is only very roughly analogous to
the “average” deviation. The variance is not a waste of
time, however, because it is used extensively in statistics. Also, variance does communicate the relative variability of scores. If one sample has S²X = 1 and another has S²X = 3, you know that the first sample is less variable (more consistent) and more accurately described
by its mean. Further, looking back at Figure 4.1, for
the smaller variance you might envision a distribution
like Distribution A, while for the larger variance, you
would envision one more like Distribution B or C.
Thus, think of variance as a number that generally
communicates how variable the scores are: The larger
the variance, the more the scores are spread out. The
measure of variability that more directly communicates
the “average of the deviations” is the standard deviation.
4-3b Understanding the Sample Standard Deviation

The sample variance is always an unrealistically large number because we square each deviation. A way to solve this problem is to take the square root of the variance. The answer is called the standard deviation. The sample standard deviation is the square root of the sample variance (the square root of the average squared deviation of scores around the mean). Conversely, squaring the standard deviation produces the variance.

To create the formula that defines the standard deviation, we simply add the symbol for the square root to the previous defining formula for variance.

THE DEFINING FORMULA FOR THE SAMPLE STANDARD DEVIATION IS

SX = √[Σ(X − X̄)² / N]

sample standard deviation (SX): The square root of the sample variance; interpreted as somewhat like the "average" deviation

Notice that the symbol for the sample standard deviation is SX, which is the square root of the symbol for the sample variance. To compute SX we first compute everything inside the square root sign to get the variance, as we did in Table 4.2. In the age scores the variance was 4. We compute the square root of the variance to find the standard deviation:

SX = √4,  so  SX = 2

The standard deviation of the age scores is 2.

The standard deviation is as close as we come to the "average of the deviations," so we interpret our SX of 2 as indicating that the age scores differ from the mean by something like an "average" of 2. Further, the standard deviation uses the same units as the raw scores, so the scores differ from the mean age by an "average" of 2 years. Thus, our younger participants who were below the mean usually missed it by about 2 years. When our older participants were above the mean, they were above it by an "average" of about 2 years.

The symbol for the sample standard deviation is SX, and we interpret it as somewhat like the average deviation of scores around the mean.

Notice that the standard deviation also allows us to envision how spread out the distribution is and, correspondingly, how accurately the mean summarizes the scores. If SX is relatively large, then a large proportion of scores are relatively far from the mean, which is why they produce a large "average" deviation. Therefore, we envision a relatively wider distribution. If SX is smaller, then more often scores are close to the mean and
produce a smaller “average.” Then we envision a narrower distribution.
In fact, there is a mathematical relationship
between the standard deviation and the normal curve,
so we can precisely describe where most of the scores
in a distribution are located.
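For the computation itself, here is a brief Python sketch (ours, not the book's) that takes the square root of the Table 4.2 variance:

```python
# The sample standard deviation: the square root of the variance,
# back in the original units (years).
import math

ages = [2, 3, 4, 5, 6, 7, 8]
mean = sum(ages) / len(ages)                                # 5.0
variance = sum((x - mean) ** 2 for x in ages) / len(ages)   # 4.0
print(math.sqrt(variance))                                  # 2.0 years
```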
4-3c The Standard Deviation
and Area under the Curve
We’ve seen that the mean is the point around which a
distribution is located, but “around” can be anywhere
from a skinny distribution close to the mean to a wide
distribution very spread out from the mean. The standard deviation allows us to quantify “around.” To do
this we first find the raw score that is located at "plus 1 standard deviation from the mean" or, in symbols, at +1SX. We also find the score located at "minus 1 standard deviation from the mean," or at −1SX. For example, say that on a statistics exam the mean is 80 and the standard deviation is 5. The score at −1SX is at 80 − 5, which is 75. The score at +1SX is at 80 + 5, which is 85. Figure 4.2 shows about where these
scores are located on a normal distribution.
As you can see, most of the distribution is
between these two scores. In fact, we know precisely how much of the distribution is here because
the standard deviation is related to the geometric
properties of a normal curve (like the constant Pi is
related to the geometric properties of circles). Recall
that any “slice” of the normal curve contains an area
under the curve and that this area translates into
relative frequency, which is the proportion of time
that the scores in the slice occur. Because of its shape,
about 34% of the area under the normal curve is in
the slice between the mean and the score that is 1
standard deviation from the mean. (Technically, it is
34.13%.) So, in Figure 4.2, about 34% of the scores
are between 75 and 80, and 34% of the scores are
between 80 and 85. Altogether, 68% of the scores
are between the scores at −1SX and +1SX from the
mean. Conversely, 16% of the scores are in the tail
below 75, and 16% are above 85. (If the distribution is only approximately normal, then we expect to
see approximately the above percentages.) Thus, it is
accurate to say that most of the scores are around the
mean of 80 between 75 and 85, because the majority
of scores (68%) are here.
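A quick way to convince yourself of the 68% figure is to simulate it; the following Python sketch (ours, not the book's) draws scores from a normal distribution with a mean of 80 and a standard deviation of 5:

```python
# Simulate the 68% rule: count scores falling between 75 (-1 SD) and 85 (+1 SD).
import random

random.seed(1)
scores = [random.gauss(80, 5) for _ in range(100_000)]
within_one_sd = sum(75 <= s <= 85 for s in scores) / len(scores)
print(round(within_one_sd, 3))   # approximately 0.68, i.e., about 68% of scores
```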
Approximately 34% of the scores
in any normal distribution are
between the mean and the score
that is 1 standard deviation from
the mean.
Figure 4.2
Normal Distribution Showing Scores at Plus or Minus One Standard Deviation
With SX = 5, the score of 75 is at −1SX and the score of 85 is at +1SX. The percentages are the approximate percentages of the scores falling into each portion of the distribution.
[A normal curve with f on the Y axis: about 34% of the scores fall between 75 (−1SX) and the mean of 80, 34% between 80 and 85 (+1SX), 68% between 75 and 85, and 16% in each tail.]

The characteristic bell shape of any normal distribution always places 68% of the distribution between the scores that are −1SX and +1SX from the mean. Look back at Figure 4.1 once more. In Distribution A most scores are relatively close to the mean. This will produce a small SX that is, let's say, 5. Because all scores are relatively close to the mean, 68% of them will be in the small area between 45 (50 − 5) and 55 (50 + 5). However, Distribution B is more spread out, producing a larger SX (say it's 7). Because the distribution is wider, the middle 68% is also wider, and mathematically this corresponds to between 43 and 57. Finally, Distribution C is the most spread out, with the
largest SX (let’s say it’s 12). Because this distribution is
so wide, the middle 68% is also very wide, spanning
scores far from the mean, at 38 and 62.
In summary, here is how to describe a distribution: If you know the data form a normal distribution, you can envision its general shape. If you know
the mean, you know where the center of the distribution is and what the typical score is. And if you
know the standard deviation, you know whether the
distribution is relatively wide or narrow, you know
the “average” amount that scores deviate from the
mean, and you know between which two scores the
middle 68% of the distribution lies.
> Quick Practice
> The sample variance (S²X) and the sample standard deviation (SX) are the two statistics to use with the mean to describe variability.
> The variance is the average squared deviation from the mean.
> The standard deviation is interpreted as the "average" amount that scores deviate from the mean.

More Examples
For the scores 5, 6, 7, 8, 9, the X̄ = 7. The variance (S²X) is the average squared deviation of the scores around the mean (here, S²X = 2). The standard deviation is the square root of the variance: Here, SX = 1.41, so when participants missed the mean, they were above or below 7 by an "average" of 1.41. Further, in a normal distribution, about 34% of the scores would be between the X̄ and 8.41 (7 + 1.41). About 34% of the scores would be between the X̄ and 5.59 (7 − 1.41).

For Practice
1. The symbol for the sample variance is _________.
2. The symbol for the sample standard deviation is _________.
3. What is the difference between computing the standard deviation and computing the variance?
4. In Sample A, SX = 6.82; in Sample B, SX = 11.41. Sample A is _________ (more/less) variable and most scores tend to be _________ (closer to/farther from) the mean.
5. If X̄ = 10 and SX = 2, then 68% of the scores fall between _________ and _________.

> Answers
1. S²X  2. SX  3. The standard deviation is the square root of the variance.  4. less; closer to  5. 8; 12
4-4 THE POPULATION VARIANCE AND STANDARD DEVIATION

population standard deviation (σX): The square root of the population variance, or the square root of the average squared deviation of scores around the population mean

population variance (σ²X): The average squared deviation of scores around the population mean

Recall that our ultimate goal is to describe the population of scores. Sometimes researchers have access to a population, and then they directly calculate the actual population variance and standard deviation. The symbol for the true or actual population standard deviation is σX. (The σ is the lowercase Greek letter s, called sigma.) Because the squared standard deviation is the variance, the symbol for the true population variance is σ²X. The defining formulas for σX and σ²X are similar to those we saw for a sample:

POPULATION STANDARD DEVIATION

σX = √[Σ(X − μ)² / N]

POPULATION VARIANCE

σ²X = Σ(X − μ)² / N
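As a minimal Python sketch (ours, not the book's), treating a small set of hypothetical scores as a complete population so that μ is known exactly:

```python
# The population formulas: divide by N, using mu rather than X-bar.
import statistics

population = [1, 2, 3, 4, 5, 6, 7]
mu = sum(population) / len(population)                                 # mu = 4
variance = sum((x - mu) ** 2 for x in population) / len(population)    # sigma^2 = 4.0
std_dev = variance ** 0.5                                              # sigma = 2.0
print(variance, std_dev)
print(statistics.pvariance(population), statistics.pstdev(population)) # same results
```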
The only novelty here is that we are computing
how far each score deviates from the population mean,
symbolized by μ. Otherwise, the population standard
deviation and variance tell us the same things about
the population that we saw previously for a sample:
Both are ways of measuring variability, indicating
how much the scores are spread out. Further, we can
interpret the population standard deviation as the "average" deviation of the scores around μ, with 68% of the scores in the population falling between the scores that are at −1σX and +1σX from μ.

Usually you will not have a population of scores available, so you will not have to compute these parameters. However, you will encounter situations where, based on much previous research, researchers already know about a population, and so the variance or standard deviation is given to you. Therefore, learn these symbols and use their formulas to understand what each represents.

biased estimators: The formula for the variance or standard deviation involving a final division by N, used to describe a sample, but that tends to underestimate the population variability

unbiased estimators: The formula for the variance or standard deviation involving a final division by N − 1; calculated using sample data to estimate the population variability

So, to sum up, we've seen how to describe the variability in a known sample (using S²X or SX) and how to describe the variability in a known population (using σ²X or σX). However, we must discuss one other situation: Although the ultimate goal of research is usually to describe the population, often we will not know all of the scores in the population. In such situations, we use our sample data to infer or estimate the variability in the population.

4-4a Estimating the Population Variance and Standard Deviation
We use the variability in a sample to estimate the
variability we’d find if we could measure the entire
population. However, we do not use the previous formulas for the sample variance and sample standard
deviation as the basis for this estimate. Those formulas are used only when describing the variability of a
sample. In statistical terminology, the formulas for SX2
and SX are called the biased estimators: When used
to estimate the population, they are biased toward
underestimating the true population parameters. Such
a bias is a problem because, as we saw in the previous chapter, if we cannot be accurate, we at least want
our under- and overestimates to cancel out over the
long run. (Remember the two statisticians shooting
targets?) With the biased estimators, the under- and
overestimates do not cancel out. Instead, although the
sample variance and sample standard deviation accurately describe a sample, they are too often too small
to use as estimates of the population. Here’s why:
To accurately estimate a population, we use a random sample. Here we are estimating deviations in the
THE DEFINING FORMULAS FOR
THE UNBIASED ESTIMATORS OF THE
POPULATION VARIANCE AND STANDARD
DEVIATION ARE
Behavioral Sciences STAT2
Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Capital S represents the Sample; lowercase s represents the population estimate.
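The contrast between the two estimators is easy to see numerically. Below is a brief Python sketch (our illustration, with our own function names) that applies both divisions to the same sample:

```python
# A sketch contrasting the biased (divide by N) and unbiased
# (divide by N - 1) estimators of the population variance.
def biased_variance(sample):        # S_X^2: describes the sample itself
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def unbiased_variance(sample):      # s_X^2: estimates the population
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

ages = [3, 5, 2, 6, 7, 4, 8]
print(biased_variance(ages))              # 4.0
print(round(unbiased_variance(ages), 2))  # 4.67 -- larger, as the text explains
```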
4-4b Interpreting the Estimated Population Variance and Standard Deviation

Interpret the estimated population variance and standard deviation in the same way as you did S²X and SX, except that here we describe what we expect the "average deviation" in the population to be, how spread out we expect the distribution will be, and how accurately we expect the population to be summarized by μ. For example, based on a statistics class with a mean of 80, we'd infer that the population would have a μ of 80. The size of s²X or sX estimates how spread out the population is, so if sX turned out to be 6, we'd expect the "average" amount that individual scores deviate from 80 would be about 6. We can also determine the scores at −1sX and +1sX from μ, so we'd expect 68% of the population to score between 74 (80 − 6) and 86 (80 + 6).

Notice that, assuming a sample is representative, we have reached our ultimate goal of describing the population of scores. Further, because these scores reflect behavior, this description gives us a good idea of how most individuals in the population behave in this situation (which is why we conduct research in the first place).

estimated population variance (s²X) The unbiased estimate of the population variance calculated from sample data using N − 1

estimated population standard deviation (sX) The unbiased estimate of the population standard deviation calculated from sample data using N − 1
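As a quick worked check of the statistics-class example (a sketch using the text's numbers), the expected middle 68% of the population is simply the inferred μ plus or minus the estimated standard deviation:

```python
# Sketch: about 68% of a normal population falls between
# mu - s_X and mu + s_X (the text's class example).
mu_estimate = 80   # inferred population mean
s_x = 6            # estimated population standard deviation
print(mu_estimate - s_x, mu_estimate + s_x)   # 74 86
```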
4-5 SUMMARY OF THE VARIANCE AND STANDARD DEVIATION

To keep track of the different statistics you've seen, remember that variability refers to the differences between scores, which we describe by computing the variance and standard deviation. In each, we are finding the difference between each score and the mean and then calculating something, more or less, like the average deviation. Organize your thinking about the particular measures of variability using Figure 4.3. Any standard deviation is merely the square root of the corresponding variance. For either measure, compute the descriptive versions when the scores are available: When describing how far the scores are spread out from X̄, we use the sample variance (S²X) and the sample standard deviation (SX). When describing how far the scores are spread out from μ, we use the population variance (σ²X) and the population standard deviation (σX). When the complete population of scores is unavailable, we infer the variability of the population based on a sample by computing the unbiased estimators (sX or s²X). These inferential formulas require a final division by N − 1 instead of by N.
We use S²X and SX to describe a sample, σ²X and σX to describe the true population, and s²X and sX to estimate the population.

Figure 4.3
Organizational Chart of Descriptive and Inferential Measures of Variability

Describing variability (differences between scores):
Descriptive measures are used to describe a known sample or population; in their formulas, the final division uses N. To describe the sample variance, compute S²X; taking the square root gives the sample standard deviation, SX. To describe the population variance, compute σ²X; taking the square root gives the population standard deviation, σX.
Inferential measures are used to estimate the population based on a sample; in their formulas, the final division uses N − 1. To estimate the population variance, compute s²X; taking the square root gives the estimated population standard deviation, sX.
4-6 COMPUTING THE FORMULAS FOR VARIANCE AND STANDARD DEVIATION

The defining formulas we've seen are important because they show that the core computation in any version of the variance and standard deviation is to measure how far the scores are from their mean. However, in everyday use these formulas are very time-consuming and mind-numbing. By reworking the defining formulas, we have less obvious but faster "computing formulas" for describing the sample and for estimating the population. To create these formulas, the symbol for the mean (X̄) in the defining formulas is replaced by its formula (ΣX/N). Then some serious reducing is performed.

These formulas involve two new symbols that you must master for later statistics, too. They are:

1. The sum of squared Xs: The symbol ΣX² indicates to find the sum of the squared Xs. To do so, first square each X (each raw score) and then sum (add up) the squared Xs. Thus, to find ΣX² for the scores 2, 2, and 3, add 2² + 2² + 3², which becomes 4 + 4 + 9, which equals 17.

2. The squared sum of X: The symbol (ΣX)² indicates to find the squared sum of X. To do so, work inside the parentheses first, so find the sum of the X scores. Then square that sum. Thus, to find (ΣX)² for the scores 2, 2, and 3, you have (2 + 2 + 3)², which is (7)², which is 49.

sum of squared Xs (ΣX²) Calculated by squaring each score in a sample and adding the squared scores

squared sum of X [(ΣX)²] Calculated by adding all scores and then squaring their sum

ΣX² indicates the sum of the squared Xs, and (ΣX)² indicates the squared sum of X.
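Because the two symbols are so easy to confuse, here is a one-minute Python sketch (ours) computing both quantities for the text's scores of 2, 2, and 3:

```python
# Sum of squared Xs versus squared sum of X: the order of
# squaring and summing matters.
scores = [2, 2, 3]
sum_of_squared_xs = sum(x ** 2 for x in scores)   # sigma X^2: 4 + 4 + 9 = 17
squared_sum_of_x = sum(scores) ** 2               # (sigma X)^2: 7^2 = 49
print(sum_of_squared_xs, squared_sum_of_x)        # 17 49
```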
4-6a Computing the Sample Variance and Standard Deviation

The formulas here were derived from the defining formulas for describing the sample variance and sample standard deviation, so our final division is by N.

THE COMPUTING FORMULA FOR THE SAMPLE VARIANCE IS

$S_X^2 = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N}$

This says to first find the sum of X (ΣX), square that sum, and divide the squared sum by N. Then subtract that result from the sum of the squared Xs (ΣX²). Finally, divide that quantity by N.

For example, we can arrange our original age scores as shown in Table 4.3.

Table 4.3
Calculation of Variance Using the Computational Formula

X Score    X²
2          4
3          9
4          16
5          25
6          36
7          49
8          64
ΣX = 35    ΣX² = 203

STEP 1: Find ΣX, ΣX², and N. Here, ΣX is 35, ΣX² is 203, and N is 7. Putting these quantities into the formula, we have

$S_X^2 = \frac{203 - \frac{(35)^2}{7}}{7}$

STEP 2: Compute the squared sum of X. Here the squared sum of X is 35², which is 1225, so

$S_X^2 = \frac{203 - \frac{1225}{7}}{7}$

STEP 3: Divide the (ΣX)² by N. Here 1225 divided by 7 equals 175, so

$S_X^2 = \frac{203 - 175}{7}$

STEP 4: Subtract in the numerator. Because 203 minus 175 equals 28, we have

$S_X^2 = \frac{28}{7}$

STEP 5: Divide. After dividing 28 by 7, we have

$S_X^2 = 4$

Again, the sample variance for these age scores is 4, and it is interpreted as we discussed previously.

Do not read any further until you understand how to work this formula!
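The same five steps translate directly into code. Here is a short Python sketch (ours) that mirrors them for the Table 4.3 age scores:

```python
# The computing formula for the sample variance, following Steps 1-5.
ages = [2, 3, 4, 5, 6, 7, 8]          # the age scores from Table 4.3
n = len(ages)                         # N = 7
sum_x = sum(ages)                     # Step 1: sigma X = 35
sum_x2 = sum(x ** 2 for x in ages)    # Step 1: sigma X^2 = 203
squared_sum = sum_x ** 2              # Step 2: (sigma X)^2 = 1225
sample_variance = (sum_x2 - squared_sum / n) / n   # Steps 3-5
print(sample_variance)                # 4.0
```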
Recall that a standard deviation is the square root of the variance, so the computing formula for the sample standard deviation merely adds the square root symbol to the previous formula for the variance.

THE COMPUTING FORMULA FOR THE SAMPLE STANDARD DEVIATION IS

$S_X = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N}}$

Again, we'll use the age scores in Table 4.3.

STEP 1: Find ΣX, ΣX², and N. Again, ΣX is 35, ΣX² is 203, and N is 7, so

$S_X = \sqrt{\frac{203 - \frac{(35)^2}{7}}{7}}$

Follow Steps 2-5 described previously for computing the variance. Inside the square root symbol will be the variance, which here is again 4, so

$S_X = \sqrt{4}$

STEP 6: Compute the square root.

$S_X = 2$

As we saw originally, the standard deviation of these age scores is 2; interpret it as we did then.

> Quick Practice

> ΣX² indicates to square each score and then find the sum of the squared Xs.
> (ΣX)² indicates to sum the scores and then find the squared sum of X.

More Examples
For the scores 5, 6, 7, 8, 9, we compute the sample variance as:
1. ΣX = 5 + 6 + 7 + 8 + 9 = 35; ΣX² = 5² + 6² + 7² + 8² + 9² = 255; N = 5, so
$S_X^2 = \frac{255 - \frac{(35)^2}{5}}{5} = \frac{255 - 245}{5} = 2$
2. To compute the sample standard deviation, find the above variance and then find its square root:
$S_X = \sqrt{2} = 1.41$

For Practice
For the scores 2, 4, 5, 6, 6, 7:
1. What is (ΣX)²?
2. What is ΣX²?
3. What is the sample variance?
4. What is the sample standard deviation?

> Answers
1. (30)² = 900
2. 2² + 4² + 5² + 6² + 6² + 7² = 166
3. S²X = (166 − 900/6)/6 = 2.667
4. SX = √2.667 = 1.63
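If you want to check your For Practice work programmatically, this small Python sketch (ours) reproduces answers 1 through 4:

```python
# Verifying the "For Practice" answers with the computing formulas.
import math

scores = [2, 4, 5, 6, 6, 7]
n = len(scores)
squared_sum = sum(scores) ** 2                # 1. (sigma X)^2 = 900
sum_x2 = sum(x ** 2 for x in scores)          # 2. sigma X^2 = 166
s2 = (sum_x2 - squared_sum / n) / n           # 3. sample variance
print(squared_sum, sum_x2)                    # 900 166
print(round(s2, 3), round(math.sqrt(s2), 2))  # 2.667 1.63
```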
4-6b Computing the Estimated Population Variance and Standard Deviation

The only difference between the formulas for estimating the population and the previous formulas for describing the sample is that here, the final division is by N − 1.

THE COMPUTING FORMULA FOR THE ESTIMATED POPULATION VARIANCE IS

$s_X^2 = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N - 1}$
Notice that in the numerator we still divide by N and we use all scores in the sample.

For example, previously we had the age scores of 3, 5, 2, 6, 7, 4, 8. To estimate the population variance, follow the same steps as before. First, find ΣX and ΣX². Here, ΣX = 35 and ΣX² = 203. Also, N = 7, so N − 1 = 6. Putting these quantities into the formula gives

$s_X^2 = \frac{203 - \frac{(35)^2}{7}}{6}$

Work through this formula the same way you did for the sample variance: 35² is 1225, and 1225 divided by 7 equals 175, so

$s_X^2 = \frac{203 - 175}{6}$

Next, 203 minus 175 equals 28, so

$s_X^2 = \frac{28}{6}$

and the final answer is

$s_X^2 = 4.67$

This answer is slightly larger than the sample variance for these scores, which was S²X = 4. Although 4 accurately describes the sample variance, we estimate that the variance in the corresponding population is 4.67. In other words, if we could measure all scores in the population and then compute the true population variance, we would expect σ²X to be 4.67.
The formula for the estimated population standard deviation merely adds the square root sign to the above formula for the variance.

THE COMPUTING FORMULA FOR THE ESTIMATED POPULATION STANDARD DEVIATION IS

$s_X = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N - 1}}$

Using our age scores and performing the steps inside the square root sign as we did above produces 4.67. Therefore, sX is √4.67, which is 2.16. If we could compute the standard deviation using the entire population of scores, we would expect σX to equal 2.16.
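For what it's worth, Python's built-in statistics module uses this same N − 1 division, so it can serve as a check on your hand computations (a sketch, ours):

```python
# statistics.variance and statistics.stdev divide by N - 1 (the
# unbiased estimators); pvariance and pstdev divide by N.
import statistics

ages = [3, 5, 2, 6, 7, 4, 8]
print(round(statistics.variance(ages), 2))  # 4.67 -- estimated population variance
print(round(statistics.stdev(ages), 2))     # 2.16 -- estimated population standard deviation
print(statistics.pvariance(ages))           # 4 -- the divide-by-N (descriptive) version
```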
4-7 STATISTICS IN THE
RESEARCH LITERATURE:
REPORTING MEANS AND
VARIABILITY
The standard deviation is most often reported in
published research because it more directly communicates how consistently close the individual scores
are to the mean and because it allows us to determine
the middle 68% of the distribution. Thus, the mean
from a study might describe the number of times participants exhibited a particular behavior, and a small
standard deviation indicates they consistently did so.
Or, in a survey, the mean might describe the typical
opinion held by participants, but a large standard
deviation indicates substantial disagreement among
them. The same approach is used in experiments, in
which we compute the mean and standard deviation
in each condition. Then each mean indicates the typical score and the score we predict for anyone in that
condition. The standard deviation indicates how consistently close the actual scores are to that mean. Or
instead, often researchers report the estimated population standard deviation to estimate the variability of
scores if everyone in the population was tested under
that condition.
You should be aware that the rules for reporting research results (such as those we saw for creating tables and graphs) are part of the guidelines for
research publications established by the American
Psychological Association (APA). We will also follow
this “APA format” or “APA style” when we discuss
how to report various statistics. However, research
journals that follow APA format do not always
use our statistical symbols. Instead (as if you don’t
already have enough symbols!), the symbol they use
for the sample mean is M. The symbol for the standard deviation is SD, and unless otherwise specified,
you should assume it is the estimated population version. On the other hand, when a report discusses the
true population parameters, our Greek symbols μ and σ are used.
USING SPSS
SPSS will simultaneously compute the mean, median, mode, range, standard deviation, and variance for a
sample of data. The SPSS instructions on Review Card 4.4 show you how. The program computes only the
unbiased estimators of the variance and standard deviation (our s²X and sX).
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. What does a larger measure of variability communicate about: (a) the size of differences among
the scores in a distribution? (b) how consistently
the participants behaved? (c) how spread out the
distribution is?
2. In any research, why is describing the variability
important?
3. Thinking back on the previous three chapters,
what are the three major pieces of information we
need to know to summarize a distribution?
4. What is the difference between what a measure
of central tendency tells us and what a measure of
variability tells us?
5. (a) What is the range? (b) Why is it not the most
accurate measure of variability? (c) When is it primarily used?
6. (a) What is the mathematical definition of the variance? (b) Mathematically, how is a sample’s variance
related to its standard deviation and vice versa?
7. (a) What do both the variance and the standard
deviation tell you about a sample? (b) Which measure will you usually want to compute? Why?
8. Why is the mean a less accurate description of the
distribution if the variability is large than if it is small?
9. (a) What do SX, sX, and σX have in common? (b) How do they differ in their use?
10. (a) What do S²X, s²X, and σ²X have in common? (b) How do they differ in their use?
11. (a) How do we determine the scores that mark the
middle 68% of a sample? (b) How do we determine
the scores that mark the middle 68% of a known
population? (c) How do we estimate the scores that
mark the middle 68% of an unknown population?
12. Why are your estimates of the population variance and standard deviation always larger than the corresponding values that describe a sample from that population?
13. In a condition of an experiment, a researcher
obtains the following scores:
3 2 1 0 7 4 8 6 6 4
Determine the following: (a) the range, (b) the
variance, (c) the standard deviation, (d) the two
scores between which 68% of the scores lie.
14. If you could test the entire population in question
13, what would you expect each of the following
to be? (a) The typical, most common score; (b) the
variance; (c) the standard deviation; (d) the two
scores between which 68% of the scores lie.
15. Tiffany has a normal distribution of scores ranging from 2 to 9. (a) She computed the variance to be 2.06. What should you conclude about this answer, and why? (b) She recomputes the standard deviation to be 18. What should you conclude, and why? (c) If she computed that SX = 0, what would this indicate?
16. From his statistics grades, Demetrius has X̄ = 60 and SX = 20. Andrew has X̄ = 60 and SX = 8. (a) Who is the more inconsistent student, and why? (b) Who is more accurately described as a 60 student, and why? (c) For which student can you more accurately predict the next test score, and why? (d) Who is more likely to do either extremely well or extremely poorly on the next exam, and why?
17. Consider these normally distributed ratio scores from an experiment:

Condition A    Condition B    Condition C
12             33             47
11             33             48
11             34             49
10             31             48
(a) What “measures” should you compute to
summarize the experiment? (b) Compute the
appropriate descriptive statistics and summarize
the relationship in the sample data. (c) How
consistent does it appear the participants were
in each condition?
18. Say that you conducted the experiment
in question 17 on the entire population.
(a) Summarize the relationship that you’d
expect to observe. (b) Compute how consistently
you’d expect participants to behave in each
condition.
19. In two studies, the mean is 40 but in Study A, SX = 5, and in Study B, SX = 10. (a) What is the difference in the appearance of the distributions from these studies? (b) Where do you expect the majority of scores to fall in each study?
20. Consider these normally distributed ratio scores from an experiment:

Condition 1    Condition 2    Condition 3
18             8              3
13             11             9
9              6              5
(a) What should you do to summarize the
experiment? (b) Summarize the relationship
in the sample data. (c) How consistent are the
scores in each condition?
21. Say that you conducted the experiment in question
20 on the entire population. (a) Summarize the
relationship that you’d expect to observe.
(b) How consistently would you expect participants
to behave in each condition?
22. (a) What are the symbols for the true population
variance and standard deviation? (b) What are the
symbols for the biased estimators of the variance
and standard deviation? (c) What are the symbols
for the unbiased estimators of the variance and standard deviation? (d) When do we use the unbiased
estimators? When do we use the biased estimators?
23. For each of the following, indicate the conditions
of the independent variable, the scores from which
variable to analyze, whether it is appropriate to
compute the mean and standard deviation, and the
type of graph you would create. (a) We test whether
participants laugh longer (in seconds) at jokes told
on a sunny or a rainy day. (b) We compare groups
who have been alcoholics for 1, 3, or 5 years. In each,
we measure participants’ income. (c) We count the
number of creative ideas produced by participants
who slept either 6, 7, or 8 hours the night before.
24. What is a researcher communicating with each of the following statements? (a) "The line graph of the means was close to flat, although the variability in each condition was quite large." (b) "For the sample of men (M = 14 and SD = 3), we conclude. . . ." (c) "We expect that in the population, the average score is 14 and the standard deviation is 3.5. . . ."
Chapter
5
DESCRIBING DATA
WITH z-SCORES AND
THE NORMAL CURVE
LOOKING BACK
Be sure you understand:
• From Chapter 2, that relative frequency is the proportion of time that scores occur and that it corresponds to the area under the normal curve. A percentile equals the percent of the area under the curve to the left of a score.
• From Chapter 3, what a deviation is and that the larger a deviation, the farther a score is from the mean.
• From Chapter 4, that SX and σX indicate the "average" deviation of scores around X̄ and μ, respectively.

GOING FORWARD
Your goals in this chapter are to learn:
• What a z-score is.
• What a z-distribution is and how it indicates a score's relative standing.
• How the standard normal curve is used with z-scores to determine relative frequency, simple frequency, and percentile.
• How to compute z-scores for sample means and then determine their relative frequency.
• What the sampling distribution of means is and what the standard error of the mean is.
Sections
5-1 Understanding z-Scores
5-2 Using the z-Distribution to Interpret Scores
5-3 Using the z-Distribution to Compare Different Variables
5-4 Using the z-Distribution to Compute Relative Frequency
5-5 Using z-Scores to Describe Sample Means

In previous chapters we have summarized an entire distribution of scores. In this chapter we'll take a different approach and discuss the statistic to use when we want to interpret an individual score. Here we ask the question "How does any particular score compare to the other scores in a sample or population?" We answer this question by transforming raw scores into "z-scores." In the following sections, we discuss (1) the logic of z-scores and their simple computation, (2) how z-scores are used to evaluate individual raw scores, and (3) how the same logic can be used to evaluate sample means.
5-1 UNDERSTANDING
z-SCORES
Researchers transform raw scores into z-scores
because we usually don’t know how to interpret a
raw score: We don’t know whether, in nature, a score
should be considered high or low, good or bad, or
what. Instead, the best we can do is to compare a
score to the other scores in the distribution, describing the score’s relative standing. Relative standing
reflects the systematic evaluation of a score by
comparing it to the sample or population in which
the score occurs. The way to determine the relative
standing of a score is to transform it into a z-score.
Using the z-score, we can easily compare the score to
the group of scores, so we’ll know whether the individual’s underlying raw score is relatively good, bad,
or in between.
To see how this is done, say we are conducting a study at Prunepit University in which the first
step is to measure the attractiveness of a sample of
males. The scores form the normal curve shown in
Figure 5.1 on the next page. We want to interpret
these scores, especially those of three men: Chuck,
who scored 35; Archie, who scored 65; and Jerome,
who scored 90. You already know that the way to
do this is to use a score’s location on the distribution
to determine its frequency, relative frequency, and
percentile. For example, Chuck’s score is far below
the mean and has a rather low frequency. Also, the
proportion of the area under the curve at his score
is small, so his score has a low relative frequency.
And because little of the distribution is to the left of
(below) his score, he also has a low percentile. On the
other hand, Archie is somewhat above the mean, so
he is somewhat above the 50th percentile. Also, the
height of the curve at his score is large, so his score
has a rather high frequency and relative frequency.
And then there’s Jerome: His score is far above the
mean, with a low frequency and relative frequency,
and a very high percentile.
The problem with the above descriptions is that
they are subjective and imprecise, and to get them we
had to look at all scores in the distribution. The way
to obtain the above information, but more precisely
and without looking at every score, is to compute
each man’s z-score. Then we can determine exactly
where on the distribution a score is located so that we
can precisely determine the score’s frequency, relative
frequency, and percentile.
5-1a Describing a Score's Relative Location as a z-Score

relative standing A systematic evaluation of a score by comparing it to the sample or population in which it occurs

Figure 5.1
Frequency Distribution of Attractiveness Scores at Prunepit U
Scores for three individuals are identified on the X axis: Chuck at 35, Archie at 65, and Jerome at 90, on a normal curve of attractiveness scores running from 25 to 95.

We began the description of each man's score above by noting whether it is above or below the
mean. Likewise, our first calculation is to measure how far a raw score is from the mean by computing the score's deviation, which equals X − X̄. For example, Jerome's score of 90 deviates from the mean of 60 by 90 − 60 = +30. A deviation of +30 sounds as if it might be large, but is it? We need a frame of reference. For the entire distribution, only a few scores deviate by as much as Jerome's score, and that makes his an impressively high score. Thus, a score is impressive if it is far from the mean, and "far" is determined by how frequently other scores deviate from the mean by that amount.

Therefore, to interpret a score's location, we must compare its deviation to the other deviations. As you saw in Chapter 4, the standard deviation is interpreted as the "average deviation." By comparing a score's deviation to the standard deviation, we can describe the score in terms of this average deviation. For example, say that in the attractiveness data, the sample standard deviation is 10. Jerome's deviation of +30 is equivalent to 3 standard deviations, so Jerome's raw score is located 3 standard deviations above the mean. His raw score is impressive because it is three times as far above the mean as the "average" amount that scores are above the mean.

By transforming Jerome's deviation into standard deviation units, we have computed his z-score. A z-score indicates the distance a raw score is from the mean when measured in standard deviations. The symbol for a z-score is z. A z-score always has two components: (1) either a positive or a negative sign, which indicates whether the raw score is above or below the mean; and (2) the absolute value of the z-score (ignoring the sign), which indicates how far the score is from the mean in standard deviations. So, because Jerome is above the mean by 3 standard deviations, his z-score is +3. If he had been below the mean by this amount, he would have z = −3.

z-score The statistic that indicates the distance a score is from its mean when measured in standard deviation units

A z-score indicates how far a raw score is above or below the mean when measured in standard deviations.
Thus, like any score, a z-score is a location on a distribution. However, it also simultaneously communicates the distance it is from the mean. Therefore, knowing that Jerome scored at z = +3 provides us with a frame of reference that we do not have by knowing only that his raw score was 90.

5-1b Computing z-Scores in a Sample or Population

We computed Jerome's z-score by first subtracting the mean from his raw score and then dividing by the standard deviation, so:

THE FORMULA FOR TRANSFORMING A RAW SCORE IN A SAMPLE INTO A z-SCORE IS

$z = \frac{X - \overline{X}}{S_X}$
(This is both the defining and the computing formula.)

To find Jerome's z-score,

STEP 1: Determine the X̄ and SX. Then, filling in the above formula gives

$z = \frac{X - \overline{X}}{S_X} = \frac{90 - 60}{10}$

STEP 2: Find the deviation in the numerator. Always subtract X̄ from X. Then

$z = \frac{+30}{10}$

STEP 3: Divide and you have

$z = +3$

Likewise, Archie's raw score is 65, so

$z = \frac{X - \overline{X}}{S_X} = \frac{65 - 60}{10} = \frac{+5}{10} = +.5$

Archie's raw score is literally one-half of 1 standard deviation above the mean.

Notice it is important to always include a positive or a negative sign when computing a z-score. Chuck, for example, has a raw score of 35, so

$z = \frac{X - \overline{X}}{S_X} = \frac{35 - 60}{10} = \frac{-25}{10} = -2.5$

Here, 35 minus 60 results in a deviation of minus 25, so his z-score is −2.5. This tells us that Chuck's raw score is 2.5 standard deviations below the mean.

Always include a positive or a negative sign when computing a z-score.

Of course, a raw score that equals the mean produces a z-score of 0. Above, our mean is 60, so for an individual's score of 60, we subtract 60 − 60, so z = 0.

We can also compute a z-score for a score in a population, if we know the population mean (μ) and the true standard deviation of the population (σX). (We do not compute z-scores using the estimated population standard deviation.) The logic here is the same as in the previous formula, but using the population symbols gives

THE FORMULA FOR TRANSFORMING A RAW SCORE IN A POPULATION INTO A z-SCORE IS

$z = \frac{X - \mu}{\sigma_X}$

Now the answer indicates how far a raw score lies from the population mean, measured using the population standard deviation. For example, say that in the population of attractiveness scores, μ = 60 and σX = 10. Jerome's raw score of 90 is again a z = (90 − 60)/10 = +3, but now this is his location in the population.

5-1c Computing a Raw Score When z Is Known

Sometimes we know a z-score and want to transform it back to the raw score that produced it. For example, say that Leon scored at z = +1. What is his attractiveness score? With X̄ = 60 and SX = 10, his z-score indicates that he is 1 standard deviation above the mean. In other words, he is 10 points above 60, so his raw score is 70. What did we just do? We multiplied his z-score times SX and then added the mean. So,

THE FORMULA FOR TRANSFORMING A z-SCORE IN A SAMPLE INTO A RAW SCORE IS

$X = (z)(S_X) + \overline{X}$
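Both transformations are one-liners in code. Here is a Python sketch (ours) using the Prunepit U numbers (mean 60, standard deviation 10):

```python
# z from a raw score, and a raw score from z.
def z_score(x, mean, sd):
    return (x - mean) / sd

def raw_score(z, mean, sd):
    return z * sd + mean

for name, x in [("Jerome", 90), ("Archie", 65), ("Chuck", 35)]:
    print(name, z_score(x, 60, 10))    # 3.0, 0.5, -2.5

print(raw_score(1, 60, 10))            # 70 -- Leon's raw score
```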
So, to find Leon’s raw score,
z-distribution
The distribution
produced by
transforming all raw
scores in a distribution
into z-scores
STEP 1: Determine the X and SX.
Ours were X ⫽ 60 and
SX ⫽ 10, so
X ⫽ (⫹1)(10) ⫹ 60
STEP 2: Multiply z times SX. This gives:
X ⫽ ⫹10 ⫹ 60
STEP 3: Add, and you have
More Examples
In a sample, X ⫽ 25 and S X ⫽ 5. To find z for X ⫽ 32:
z⫽
X ⫺ X 32 ⫺ 25 ⫹7
⫽
⫽
⫽ ⫹1.4
SX
5
5
To find the raw score for z ⫽ ⫺.43:
X ⫽ (z)(S X ) ⫹ X ⫽ (⫺.43)(5) ⫹ 25
⫽ ⫺2.15 ⫹ 25 ⫽ 22.85
X ⫽ 70
Adding a negative number is the same as subtracting its positive value, so
X ⫽ 47
Brian’s raw score here is 47.
The above logic also applies to finding the raw
score for a z from a population, except that we use the
symbols for the population.
THE FORMULA FOR TRANSFORMING A
z-SCORE IN A POPULATION INTO A RAW
SCORE IS
X ⫽ (z)(sX) ⫹ m
Here, we multiply the z-score times the population
standard deviation and then add m.
> Quick Practice
>
>
72
A ⫹ z indicates that the raw score is
above the mean, a ⫺ z that it is below
the mean.
The absolute value of z indicates
the score’s distance from the
mean, measured in standard
deviations.
With m ⫽ 100 and sX ⫽ 16,
3. What is the z for X ⫽ 132?
4. What X produces z ⫽ ⫹1.4?
> Answers
1. z ⫽ (44 ⫺ 50)/10 ⫽ ⫺.60
X ⫽ ⫺13 ⫹ 60
2. What X produces z ⫽ ⫺1.3?
2. X ⫽ (⫺1.3)(10) ⫹ 50 ⫽ 37
so
1. What is z for X ⫽ 44?
3. z ⫽ (132 ⫺ 100)/16 ⫽ ⫹2
X ⫽ (⫺1.3)(10) ⫹ 60
For Practice
With X ⫽ 50 and SX ⫽ 10,
4. X ⫽ (⫹1.4)(16) ⫹ 100 ⫽ 122.4
The raw score of 70 corresponds to a z of ⫹1.
In another case, say that Brian has a z-score of
⫺1.3. Then with X ⫽ 60 and SX ⫽ 10,
5-2 USING THE
z-DISTRIBUTION TO
INTERPRET SCORES
The reason that z-scores are so useful is that they
directly communicate the relative standing of a raw
score within a distribution. The way to see this is to
envision any sample or population as a z-distribution.
A z-distribution is the distribution produced by
transforming all raw scores in the data into z-scores.
For example, our attractiveness scores produce the
z-distribution shown in Figure 5.2.
Notice the two ways the X axis is labeled. This shows that by creating a z-distribution, we change only the way that we identify each score. Saying that Jerome has a z of +3 is merely another way of saying that he has a raw score of 90. Because he is still at the same location in the distribution, his z-score has the same frequency, relative frequency, and percentile as his raw score.

Figure 5.2
z-Distribution of Attractiveness Scores at Prunepit U
The labels on the X axis show first the raw scores (25 through 95) and then the corresponding z-scores (−3 through +3). Chuck, Archie, and Jerome are marked at their locations.

Because all z-distributions are laid out in the same way, z-scores form a standard way to communicate relative standing. The z-score of 0 always indicates that the raw score equals the mean and is in the center of the distribution (and is also the median and mode). A positive z-score indicates that the z-score (and raw score) is above and graphed to the right of the mean. Positive z-scores become increasingly larger as we look farther to the right. Larger positive z-scores (and their corresponding raw scores) occur less frequently. Conversely, a negative z-score indicates that the z-score (and raw score) is below and graphed to the left of the mean. Because z-scores measure the distance a score is from the mean, negative z-scores become increasingly larger as we look farther to the left. Larger negative z-scores (and their corresponding raw scores) occur less frequently.

Notice, however, that a negative z-score is not automatically a bad score. For some variables (e.g., credit card debt), a low raw score is the goal and so a larger negative z-score is a better score. Also notice that most of the z-scores are between +3 and −3. The symbol "±" means "plus or minus," so we can restate this by saying that most z-scores are between ±3. A z-score near ±3 indicates a raw score that is one of the highest or lowest scores in the distribution and has a very low frequency.

All normal z-distributions are similar because of three important characteristics:

1. A z-distribution always has the same shape as the raw score distribution. When the underlying raw score distribution is normal, its z-distribution is normal.

2. The mean of any z-distribution is 0. Whatever the mean of the raw scores is, it transforms into a z-score of 0. (Also, the average of the positive and negative z-scores is 0.)

3. The standard deviation of any z-distribution is 1. Whether the standard deviation of the raw scores is 10 or 100, a score at that distance from the mean is a distance of 1 when transformed into a z-score, so the "average deviation" is now 1. (Also, if we compute SX using the z-scores in a z-distribution, the answer will be 1.)

The larger a z-score, whether positive or negative, the farther the corresponding raw score is from the mean, and the less frequently the z-score and raw score occur.

Because all z-distributions are similar, you can determine the relative standing of any raw score by computing its z-score and envisioning a z-distribution like that in Figure 5.2. Then, if the z-score is close to zero, the raw score is near the mean and is a very frequent, common score. A z greater than about ±1 indicates a raw score that is less frequent. The closer
the z is to ±3, the farther into a tail it is, and the closer the raw score is to being one of the few highest or lowest scores in the distribution.

Behavioral research usually involves normally distributed scores for which we compute the mean and standard deviation, so computing z-scores and using the z-distribution are usually appropriate. Besides describing relative standing as above, z-scores have two additional uses: (1) comparing scores from different distributions and (2) computing the relative frequency of scores.
5-3 USING THE z-DISTRIBUTION TO COMPARE DIFFERENT VARIABLES

A second important use of z-scores is to compare scores from different variables. Here's a new example. Say that Althea received a grade of 38 on her statistics quiz and a grade of 45 on her English paper. These scores reflect different kinds of tasks, so it's like comparing apples to oranges. The solution is to transform the raw scores from each class into z-scores. Then we can compare Althea's relative standing in English to her relative standing in statistics, and we are no longer comparing apples and oranges.

Note: z-scores equate or standardize different distributions, so they are often referred to as standard scores.

Say that for the statistics quiz, the X̄ was 30 and the SX was 5. Althea's grade of 38 becomes z = +1.6. For the English paper, the X̄ was 40 and the SX was 10, so Althea's 45 becomes z = +.5. Althea's z of +1.6 in statistics is farther above the mean than her z of +.5 in English is above the mean, so she performed relatively better in statistics. Say that another student, Millie, obtained raw scores that produced z = −2 in statistics and z = −1 in English. Millie did better in English because her z-score of −1 is less distance below the mean.

To see just how comparable the z-scores from these two classes are, we can plot their z-distributions on the same graph. Figure 5.3 shows the result, with the original raw scores also plotted. (The English curve is taller because of a higher frequency at each score.) Although the classes have different distributions of raw scores, the location of each z-score is the same. For example, any normal distribution is centered over its mean. This center is at z = 0, regardless of whether this corresponds to a 30 in statistics or a 40 in English. Also, scores that are +1SX above their respective means are at z = +1 regardless of whether this corresponds to a 35 in statistics or a 50 in English. Likewise, the raw scores of 40 in statistics and 60 in English are both 2 standard deviations above their respective means, so both are at the same location, called z = +2. And so on: When two raw scores are the same distance in standard deviations from their respective mean, they produce the same z-score and are at the same location in the z-distribution.

Figure 5.3
Comparison of Distributions for Statistics and English Grades, Plotted on the Same Set of Axes
The z-score axis runs from −3 to +3; the statistics raw scores run from 15 to 45, and the English raw scores run from 10 to 70. Millie is at z = −2 in statistics and z = −1 in English; Althea is at z = +.5 in English and z = +1.6 in statistics.

Using this z-distribution, we can see that Althea scored higher in statistics than in English, but Millie scored higher in English than in statistics.

To compare raw scores from two different variables, transform the scores into z-scores.
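Here is the Althea example in code, a Python sketch (ours) that standardizes both grades before comparing them:

```python
# Comparing scores from different variables by converting to z-scores.
def z(x, mean, sd):
    return (x - mean) / sd

althea_statistics = z(38, 30, 5)    # +1.6
althea_english = z(45, 40, 10)      # +0.5
print(althea_statistics, althea_english)
print(althea_statistics > althea_english)   # True: relatively better in statistics
```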
5-4 USING THE
z-DISTRIBUTION TO
COMPUTE RELATIVE
FREQUENCY
The third important use of z-scores is to determine the
relative frequency of specific raw scores. Recall that relative frequency is the proportion of time that a score
occurs and that it can be computed using the proportion
of the total area under the curve. Usually we are interested in finding the combined relative frequency for several scores in a “slice” of the normal curve. We can use
the z-distribution to determine this because, as we saw in
the previous section, a particular z-score is always at the
same location on the z-distribution for any variable. By
being at the same location, the z-score will always delineate the same slice, cutting off the same proportion of the
curve. Therefore, the relative frequency of the z-scores in
a slice will be the same on all normal z-distributions. The
relative frequency of those z-scores is also the relative
frequency of the corresponding raw scores.
To understand this, look again at Figure 5.3.
Although the heights of the two curves differ, the proportion under each curve is the same. For example, 50%
of each distribution is to the left of its mean, and this is
where all of the negative z-scores are. In other words,
negative z-scores occur 50% of the time, so they have a
combined relative frequency of .50. Having determined
the relative frequency of the z-scores, we work backward to identify the corresponding raw scores. In Figure 5.3, the statistics students having negative z-scores
have raw scores between 15 and 30, so the relative
frequency of these scores is .50. The English students
having negative z-scores have raw scores between 10
and 40, so the relative frequency of these scores is .50.
Here’s another example. Recall from Chapter 4
that about 34% of the normal distribution is between
the mean and the score that is 1 standard deviation
above the mean (at +1SX). Now you know that a score at +1SX from the mean produces a z-score of +1. Thus, in Figure 5.3, statistics scores between 30 and 35 occur .34 of the time. English scores between 40 and 50 occur .34 of the time. Likewise, we know that 68% of the scores are between the scores at −1SX and +1SX, which translates into between the z-scores of +1 and −1. Thus, in Figure 5.3, 68% of the statistics
scores are between 25 and 35, and 68% of the English
scores are between 30 and 50.
We can also determine the relative frequencies
for any other portion of a distribution. To do so, we
employ the standard normal curve.
5-4a The Standard Normal Curve
Because the relative frequency of a particular z-score is
always the same for any normal distribution, we don’t
need to draw a different z-distribution for each variable we measure. Instead, we envision one standard
curve that, in fact, is called the standard normal curve.
The standard normal curve is a perfect normal
z-distribution that serves as our model of any approximately normal z-distribution. The idea is that most
raw scores produce only an approximately normal
z-distribution. However, to simplify things, we operate as if the z-distribution fits this perfect standard
normal curve. We use this curve to first determine the
relative frequency of particular z-scores. Then, as we
did above, we work backward to determine the relative frequency of the corresponding raw scores. This
is the relative frequency we would expect if our data
formed a perfect normal distribution. Therefore, this
approach is most accurate when (1) we have a large
sample (or population) of (2) interval or ratio scores
that (3) come close to forming a normal distribution.
standard normal curve A perfect normal curve that serves as a model of any approximately normal z-distribution

You may compute z-scores using either of our formulas for finding a z-score in a sample or in a population. Then the first step is to determine the relative frequency of the z-scores by looking at the area under the standard normal curve. Statisticians have already determined the proportion of the area under various parts of the curve, as shown in Figure 5.4. The numbers above the X axis indicate
the proportion of the total area between the z-scores.
The numbers below the X axis indicate the proportion of the total area between the mean and the
z-score. (You won’t need to memorize them.)
Each proportion is also the relative frequency of the z-scores (and raw scores) located in that part of the curve. For example, between a z of 0 and a z of +1 is .3413 of the area under the curve, so about 34% of the scores really are here. Likewise, between z = −1 and z = +1 equals .3413 + .3413, which is .6826, so about 68% of the scores are located here. Or, between the z-scores of +1 and +2 is .1359 of the curve, so scores occur here 13.59% of the time. And combining this with the .3413 that is between the mean and a z of +1 gives a total of .4772 between the mean and z = +2. Also, we can add together nonadjacent portions. For example, out in the upper tail beyond z = +2 is .0228 of the curve (because .0215 + .0013 = .0228). In the lower tail beyond z = −2 is also .0228. Adding the two tails gives a total of .0456 of all scores that fall beyond z = ±2. And so on. (Notice that z-scores beyond +3 or beyond −3 occur only .0013 of the time, which is why the range of z is essentially between ±3.)
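These proportions are fixed properties of the normal curve, so any statistics environment can reproduce them. Here is a Python sketch (ours) that computes them from the standard normal cumulative proportion, written with math.erf:

```python
# Phi(z) is the proportion of the standard normal curve below z;
# differences between Phi values give the area in any "slice."
import math

def phi(z):
    return (1 + math.erf(z / math.sqrt(2))) / 2

print(round(phi(1) - phi(0), 4))   # 0.3413 -- between the mean and z = +1
print(round(phi(2) - phi(0), 4))   # 0.4772 -- between the mean and z = +2
print(round(1 - phi(2), 4))        # 0.0228 -- upper tail beyond z = +2
```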
We usually begin with a particular raw score in mind and then compute its z-score. For example, back in our original attractiveness scores, say that Steve has a raw score of 80. With X̄ = 60 and SX = 10, we have

$z = \frac{X - \overline{X}}{S_X} = \frac{80 - 60}{10} = +2$

His z-score is +2.

Figure 5.4
Proportions of Total Area under the Standard Normal Curve
The curve is symmetrical: 50% of the scores fall below the mean, and 50% fall above the mean. Between the mean and z = ±1 is .3413 of the area; between z = ±1 and z = ±2 is .1359; between z = ±2 and z = ±3 is .0215; and beyond z = ±3 is .0013. Cumulatively, .3413 of the area lies between the mean and z = 1, .4772 between the mean and z = 2, and .4987 between the mean and z = 3.

We can envision Steve's location as in Figure 5.5. We might first ask what proportion of the scores are expected to fall between the mean and Steve's score. We saw above that .4772 of the total area falls between the mean and z = +2. Therefore, we also expect .4772, or 47.72%, of our attractiveness scores to fall between the mean score of 60 and Steve's score of 80. Conversely, .0228 of the area (and scores) are above his score.

We might also ask how many people scored between the mean and Steve's score. Then we would convert relative frequency to simple frequency by multiplying the N of the sample times the relative frequency. Say that our N was 1000. If we expect .4772 of all scores to fall between the mean and a z of +2, then (.4772)(1000) = 477.2, so we expect about 477 people to have scores between the mean and Steve's score.

We can also determine a score's expected percentile (the percent of the scores below, graphed to the left of, a score). As in Figure 5.5, on a normal distribution the mean is the median (the 50th percentile). A positive z-score is above the mean, so Steve's score
In addition, Steve's score is above the 47.72% of the scores that fall between the mean and his score. Thus, we add the .50 of the scores below the mean to the .4772 of the scores between the mean and his score. This gives a total of .9772 of all scores that are below Steve's score. We usually round off percentile to a whole number, so Steve's raw score is at the 98th percentile.

Finally, we can work in the opposite direction to find a raw score at a particular relative frequency or percentile. Say that we seek the score that demarcates the upper .0228 of the distribution. First, in terms of z-scores, we see that above a z = +2 is .0228 of the distribution. Then, to find the raw score that corresponds to this z, we use the formula for transforming a z-score into a raw score: X = (z)(SX) + X̄. We'll find that above a raw score of 80 is .0228 of the distribution.

To determine the relative frequency of raw scores, transform them into z-scores and then use the standard normal curve.

Figure 5.5
Location of Steve's Score on the z-Distribution of Attractiveness Scores
Steve's raw score of 80 is a z-score of +2. [The curve spans raw scores of 30 to 90 (z-scores of −3 to +3); .4772 of the area lies between the mean of 60 and Steve's score of 80, .0228 lies above it, and .50 + .4772 = .9772 lies below it.]

5-4b Using the z-Table

So far, our examples have involved whole-number z-scores, but with real data, a z-score may contain decimals. To find the proportion of the total area under the standard normal curve for any two-decimal z-score, look in Table 1 of Appendix B. A portion of this "z-table" is reproduced in Table 5.1.

Table 5.1
Sample Portion of the z-Table
[The drawing at the top of the table shows a normal curve with the slices for −z and +z labeled: B is the area between the mean and z, and C is the area beyond z in the tail.]

A        B                      C
z        Area between the       Area beyond z
         mean and z             in the tail
1.60     .4452                  .0548
1.61     .4463                  .0537
1.62     .4474                  .0526
1.63     .4484                  .0516
1.64     .4495                  .0505
1.65     .4505                  .0495
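You can verify these entries by computer. A small sketch (again assuming Python with scipy, which is my addition and not part of the text): column B is the area between the mean and z, and column C is the area beyond z in the tail.

from scipy.stats import norm

for z in (1.60, 1.61, 1.62, 1.63, 1.64, 1.65):
    col_b = norm.cdf(z) - 0.5    # area between the mean and z
    col_c = 1 - norm.cdf(z)      # area beyond z in the tail
    print(f"z = {z:.2f}   B = {col_b:.4f}   C = {col_c:.4f}")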
Say that you seek the area under the curve above or below a z = +1.63. First, locate the z in column A, labeled "z." Then move to the right. Column B is labeled "Area between the mean and z." It contains each proportion under the curve between the mean and the z identified in column A. Here, .4484 of the curve is between the mean and the z of +1.63. This is shown in Figure 5.6. Because the z is positive, we place this area between the mean and the z on the right-hand side of the distribution. Next, Column C is labeled "Area beyond z in the tail." It contains each proportion under the curve that is in the tail beyond the z-score. Here, .0516 of the curve is in the right-hand tail of the distribution beyond the z of +1.63 (also shown in Figure 5.6).

Figure 5.6
Distribution Showing the Area under the Curve for z = +1.63 and z = −1.63
[On each side of the curve, column B's .4484 lies between the mean and the z, and column C's .0516 lies in the tail beyond it.]

Notice that the z-table shows no positive or negative signs. You must decide whether your z is positive or negative and place the areas in their appropriate locations. Thus, if we had the negative z of −1.63, columns B and C would provide the respective areas shown on the left-hand side of Figure 5.6. If you get confused when using the z-table, look at the drawing of the normal curve at the top of the table, as in Table 5.1. The different slices are labeled to indicate the part of the curve described in each column.

We can also work in the opposite direction, starting with a specific proportion and finding the corresponding z-score. First, find the proportion in column B or C, depending on the area you seek. Then identify the z-score in column A. For example, say that you seek the z-score that marks off .4484 of the curve between the mean and z. Find .4484 in column B of the table, and then, in column A, the z-score is 1.63.

Use the information from the z-table as we have done previously. For example, say that we examine Anthony's raw score, which happens to produce the z of −1.63. This is located on the far-left side of Figure 5.6. If we seek the relative frequency of scores between his score and the mean, then from column B, we expect .4484 of the scores to be here. For simple frequency, multiply N times the proportion: If N is 1000, then we expect (1000)(.4484) = 448 men to have a z-score and raw score here. For percentile, we want the percent of the distribution to the left of his score, so from column C we see that below Anthony's score is .0516 of the curve, so he is at about the 5th percentile. Finally, if we began by asking what raw score creates either the slice containing .0516 or the slice containing .4484 of the curve, we would first find the proportion in column C or B, respectively, and then look in column A to find the z-score of −1.63. Then we would transform the z-score to its corresponding raw score using our previous formula.

Note: If you seek a proportion not in the z-table, use the z-score for the proportion that is nearest to what you seek. Thus, say we seek .2000 in column B. The nearest proportion is .1985, so z = ±.52.

Table 5.2 summarizes all of the procedures we have discussed.

Table 5.2
Summary of Steps When Using the z-Table

If You Seek                                                 First, You Should                       Then You
Relative frequency of scores between X and X̄               transform X to z                        find area in column B*
Relative frequency of scores beyond X in tail               transform X to z                        find area in column C*
X that marks a given relative frequency between X and X̄    find relative frequency in column B     transform z to X
X that marks a given relative frequency beyond X in tail    find relative frequency in column C     transform z to X
Percentile of an X above X̄                                 transform X to z                        find area in column B and add .50
Percentile of an X below X̄                                 transform X to z                        find area in column C

*To find the simple frequency of the scores, multiply relative frequency times N.
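Both directions in Table 5.2 have direct computational analogs: the forward lookups are the normal curve's cumulative area, and the reverse lookups are its inverse. A hedged sketch (Python/scipy assumed; I am reusing X̄ = 60 and SX = 10 from the attractiveness example, which the Anthony passage itself does not specify):

from scipy.stats import norm

mean, sd = 60, 10

# Forward: the percentile of a raw score whose z is -1.63 (Anthony)
print(norm.cdf(-1.63))       # about .0516, i.e., roughly the 5th percentile

# Reverse: the raw score that marks off the upper .0228 of the distribution
z = norm.ppf(1 - 0.0228)     # about +2.00
print(mean + z * sd)         # about 80, matching the earlier example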
> Quick Practice

> To find the relative frequency of scores above or below a raw score, transform it into a z-score. From the z-table, find the proportion of the area under the curve above or below that z.

> To find the raw score at a specified relative frequency, find the proportion in the z-table and transform the corresponding z into its raw score.

More Examples
With X̄ = 40 and SX = 4:
To find the relative frequency of raw scores above 45: z = (X − X̄)/SX = (45 − 40)/4 = +1.25. Saying "above" indicates "in the upper tail," so from column C the relative frequency is .1056.
To find the percentile of the score of 41.5: z = (41.5 − 40)/4 = +.38. Between this positive z and X̄, from column B, is .1480. This score is at about the 65th percentile because .1480 + .50 = .6480, or roughly .65.
To find the proportion below z = −.38: "Below" indicates "the lower tail," so from column C it is .3520.
To find the raw score at the 65th percentile: We have .65 − .50 = .15. Then, from column B, the proportion closest to .15 is .1517, so z = +.39. Then X = (+.39)(4) + 40 = 41.56.

For Practice
For a sample: X̄ = 65, SX = 12, and N = 1000.
1. What is the relative frequency of scores below 59?
2. What is the percentile of 75?
3. How many scores are between the mean and 70?
4. What raw score delineates the top 3%?

> Answers
1. z = (59 − 65)/12 = −.50; "below" is the lower tail, so from column C it is .3085.
2. z = (75 − 65)/12 = +.83; between z and the mean, from column B, is .2967. Then .2967 + .50 = .7967, or about the 80th percentile.
3. z = (70 − 65)/12 = +.42; from column B it is .1628; (.1628)(1000) gives about 163 scores.
4. The "top" is the upper tail, so from column C the proportion closest to .03 is .0301, with z = +1.88; so X = (+1.88)(12) + 65 = 87.56.

5-5 USING z-SCORES TO DESCRIBE SAMPLE MEANS

We can also use the logic of z-scores to describe the relative standing of an entire sample. We do this by computing a z-score for the sample's mean.

To see how the procedure works, say that we give a subtest of the Scholastic Aptitude Test (SAT) to a sample of 25 students at Prunepit U. Their mean score is 520. Nationally, the mean of individual SAT scores is 500 (and σX is 100), so it appears that at least some Prunepit students scored relatively high, pulling the mean up to 520. But how do we interpret the performance of the sample as a whole? The problem is the same as when we examined individual raw scores: Without a frame of reference, we don't know whether a particular sample mean is high, low, or in between.

The solution is to evaluate a sample mean by computing its z-score. Previously, a z-score compared a particular raw score to the other raw scores that occur in a particular situation. Now we'll compare our sample mean to the other sample means that occur in a particular situation. However, our discussion must first take a small detour to see how to create a distribution showing these other means. This distribution is called the sampling distribution of means.
sampling distribution of means: A frequency distribution showing all possible sample means that occur when samples of a particular size are drawn from a population.

central limit theorem: A statistical principle that defines the mean, standard deviation, and shape of a sampling distribution.

5-5a The Sampling Distribution of Means

If the national average of SAT scores is 500, then we can envision a population of SAT scores where μ = 500. When we selected our sample of 25 students and then obtained their SAT scores, we essentially drew a sample of 25 scores from this population. To evaluate our sample mean, we first create a distribution showing all other possible means that could occur when selecting a sample of 25 scores from this population. To do so, pretend that we record all SAT scores from the population on slips of paper and put them in a large hat. Then we select a sample with the same size N as ours, compute the sample mean, replace the scores in the hat, draw another 25 scores, compute the mean, and so on. We do this an infinite number of times so that we create the population of means. Because the scores selected in each sample will not be identical, not all sample means will be identical. If we then construct a frequency polygon of the different values of X̄ we obtained, we would create a sampling distribution of means. The sampling distribution of means is the frequency distribution of all possible sample means that occur when an infinite number of samples of the same size N are selected from one raw score population. Our SAT sampling distribution of means is shown in Figure 5.7. This is similar to a distribution of raw scores, except that each "score" along the X axis is a sample mean.

Figure 5.7
Sampling Distribution of SAT Means
The X axis shows the different values of X̄ obtained when sampling the SAT population. [A normal curve centered at μ = 500, with lower means to the left and higher means to the right.]

Of course, in reality we cannot sample the SAT population an "infinite" number of times. However, we know that the sampling distribution would look like Figure 5.7 because of the central limit theorem. The central limit theorem is a statistical principle that defines the shape, the mean, and the standard deviation of a sampling distribution. From the central limit theorem, we know the following:

1. A sampling distribution is always an approximately normal distribution. Here our sampling distribution is a normal distribution centered around 500. In the right-hand portion of the curve are means above 500, and in the left-hand portion are means below 500. It is a normal distribution for the following reasons: Most scores in the population are close to 500, so most samples will contain scores close to 500, which will produce sample means close to 500. Sometimes, though, just by chance, strange samples will contain primarily scores that are farther below or above 500, and this will produce means that are farther below or above 500 and that occur less frequently. Once in a great while, very unusual samples will occur that result in sample means that deviate greatly from 500.

2. The mean of the sampling distribution equals the mean of the underlying raw score population used to create the sampling distribution. The sampling distribution is the population of all possible sample means, so its mean is symbolized by μ, and it stands for the average sample mean. (That's right: here, μ is the mean of the means!) The μ of our sampling distribution equals 500 because the mean of the underlying raw score population is 500. Because the individual SAT scores are balanced around 500, over the long run the sample means created from those scores will also be balanced around 500, so the average mean (μ) will equal 500.
3. The standard deviation of the sampling distribution is mathematically related to the standard deviation of the raw score population. As you'll see in a moment, the variability of the raw scores influences the variability of the sample means.

Note: We will always refer to the population of raw scores used to create the sampling distribution as the underlying raw score population. Here, our sampling distribution of SAT means was created from the underlying raw score population of SAT scores. The μ of the sampling distribution always equals the μ of the underlying raw score population.

The sampling distribution of means is a normal distribution having the same μ as the underlying raw score population used to create it, and it shows all possible sample means that can occur when sampling from that raw score population.

The importance of the central limit theorem is that we can describe a sampling distribution without having to infinitely sample a population of raw scores. Therefore, we can create the sampling distribution of means for any raw score population.

Why do we want to see the sampling distribution? We took a small detour, but the original problem was to evaluate our Prunepit mean of 520. Once we envision the distribution back in Figure 5.7, we have a model of the frequency distribution of all sample means that occur when measuring SAT scores. Then we can use this distribution to determine the relative standing of our sample mean.

The sampling distribution is a normal distribution, and you already know how to determine the relative standing of any "score" on a normal distribution: We use z-scores. That is, we will determine where our mean of 520 falls on the X axis of this sampling distribution by finding its distance from the μ of the sampling distribution, measured using the standard deviation of the sampling distribution. Then we will know whether our mean is (a) one of the frequent means that are relatively close to the average sample mean that occurs with this underlying population, or (b) one of the higher or lower means that seldom occur with this population.

To compute the z-score for a sample mean, we need one more piece of information: the "standard deviation" of the sampling distribution.

5-5b The Standard Error of the Mean

standard error of the mean (σX̄): The standard deviation of the sampling distribution of means.

The standard deviation of the sampling distribution of means is called the standard error of the mean. Like a standard deviation, the standard error of the mean can be thought of as the "average" amount that the sample means deviate from the μ of the sampling distribution. That is, in some sampling distributions, the sample means may be very different from one another and deviate greatly from the average sample mean. In other distributions, the means may be very similar and deviate little from μ.

For now, we'll discuss the true standard error of the mean, as if we had actually computed it using the entire sampling distribution. Its symbol is σX̄. The σ indicates that we are describing a population, and the subscript X̄ indicates it is a population of sample means. The central limit theorem tells us that σX̄ can be found using this formula:

THE FORMULA FOR THE TRUE STANDARD ERROR OF THE MEAN IS

σX̄ = σX/√N

Notice that the formula involves σX: This is the true standard deviation of the underlying raw score population used to create the sampling distribution. The size of σX̄ depends on the size of σX because
more variable raw scores are likely to produce very different samples each time, so their means will differ more (and σX̄ will be larger). Less variable scores will produce more similar samples and means (and σX̄ will be smaller). The size of σX̄ also depends on the size of our N. The larger the N, the more each sample is like the population, so the sample means will be closer to the population mean and to each other (and σX̄ will be smaller). A smaller N allows for more different samples each time (and σX̄ will be larger).

To compute σX̄ for our SAT example:

STEP 1: Identify the σX of the underlying raw score population and the N used to create your sample. For the SAT, the σX is 100, and the N was 25, so σX̄ = σX/√N = 100/√25

STEP 2: Compute the square root of N. The square root of 25 is 5, so σX̄ = 100/5

STEP 3: Divide, and we have σX̄ = 20

This indicates that in our SAT sampling distribution, the individual sample means differ from the μ of 500 by an "average" of 20 points.

The μ of the sampling distribution equals the μ of the underlying raw score population the sample is selected from.

The standard error of the mean (σX̄) is the standard deviation of the sampling distribution of means.

Now, at last, we can calculate a z-score for our sample mean.

5-5c Computing a z-Score for a Sample Mean

We use this formula to compute a z-score for a sample mean:

THE FORMULA FOR TRANSFORMING A SAMPLE MEAN INTO A z-SCORE IS

z = (X̄ − μ)/σX̄

In the formula, X̄ is our sample mean, μ is the mean of the sampling distribution (which equals the mean of the underlying raw score population), and σX̄ is the standard error of the mean, which we computed above. The answer is a z-score that indicates how far the sample mean is from the mean of the sampling distribution (μ), measured in standard error units (σX̄).

To compute the z-score for our Prunepit sample:

STEP 1: Compute the standard error of the mean (σX̄) as described above, and identify the sample mean and the μ of the sampling distribution. For our data, X̄ = 520, μ = 500, and σX̄ = 20, so we have z = (X̄ − μ)/σX̄ = (520 − 500)/20

STEP 2: Subtract μ from X̄. Then z = +20/20

STEP 3: Dividing gives z = +1
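The arithmetic in these steps is compact enough to script. A minimal sketch (Python assumed; only the standard math module is used):

import math

sigma_x, n = 100, 25          # SD of the underlying SAT population, and sample N
mu, xbar = 500, 520           # population mean and the Prunepit sample mean

sem = sigma_x / math.sqrt(n)  # standard error of the mean: 100/5 = 20
z = (xbar - mu) / sem         # (520 - 500)/20 = +1.0
print(sem, z)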
Thus, a sample mean of 520 has a z-score of +1 on the SAT sampling distribution of means that occurs when N is 25.

Here's another example that combines everything we've done, again using our SAT population, where μ = 500 and σX = 100. Say that over at Podunk College, a sample of 25 SAT scores produced a mean of 460. To find their z-score:

1. First, compute the standard error of the mean (σX̄): σX̄ = σX/√N = 100/√25 = 100/5 = 20

2. Then find z: z = (X̄ − μ)/σX̄ = (460 − 500)/20 = −40/20 = −2

The Podunk sample has a z-score of −2 on the sampling distribution of SAT means.

Everything we said previously about a z-score for an individual score applies to a z-score for a sample mean. So, because our original Prunepit mean has a z-score of +1, we know that it is above the μ of the sampling distribution by an amount equal to the "average" amount that sample means deviate above μ. Our Podunk sample, however, has a z-score of −2, so its mean is relatively low compared to other means that occur in this situation.

And here's the nifty part: Because a sampling distribution is always an approximately normal distribution, transforming all of the sample means in the sampling distribution into z-scores produces a normal z-distribution. Recall that the standard normal curve is our model of any normal z-distribution. Therefore, as we did previously with raw scores, we can use the standard normal curve and z-table to describe the relative frequency of sample means in any part of a sampling distribution.

5-5d Describing the Relative Frequency of Sample Means

Figure 5.8 shows the standard normal curve applied to our SAT sampling distribution. These are the same proportions that we used to describe individual raw scores. Here, however, each proportion is the expected relative frequency of the sample means that occur in this situation. For example, the sample mean of 520 from Prunepit U has a z of +1. As shown, and as in column B of the z-table, .3413 of all scores fall between the mean and a z of +1 on any normal distribution. Therefore, .3413 of all sample means are expected to fall here, so we expect .3413 of all SAT sample means to be between 500 and 520 (when N is 25). Or, for sample means above our sample mean, from column C of the z-table, above a z of +1 is .1587 of the distribution. Therefore, we expect that .1587 of all SAT sample means will be above 520.

Figure 5.8
Proportions of the Standard Normal Curve Applied to the Sampling Distribution of SAT Means
[SAT means of 440 through 560 along the X axis line up with z-scores of −3 through +3, and the usual proportions (.0013, .0215, .1359, .3413, .3413, .1359, .0215, .0013) apply to the slices between them.]
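Because these are just standard normal areas, the same z-table proportions can be pulled from a computer. A sketch (Python/scipy assumed, as in the earlier examples):

from scipy.stats import norm

# Prunepit's mean of 520 has z = +1 on this sampling distribution
print(norm.cdf(1) - 0.5)   # about .3413 between the mean and z = +1 (column B)
print(1 - norm.cdf(1))     # about .1587 beyond z = +1 (column C)

# Podunk's mean of 460 has z = -2
print(norm.cdf(-2))        # about .0228 of sample means fall below 460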
On the other hand, the Podunk sample mean was 460, producing a z of −2. From column B of the z-table, a total of .4772 of the distribution falls between the mean and this z-score. Therefore, we expect .4772 of SAT means to be between 500 and 460. From column C, we expect only .0228 of the means to be below 460.

We can use this same procedure to describe sample means from any normally distributed variable. To be honest, though, researchers do not often compute the z-score for a sample mean solely to determine relative frequency (nor does SPSS include this routine). However, it is extremely important that you understand this procedure, because it is the essence of all upcoming inferential statistical procedures (and you'll definitely be doing those!).

Apply the standard normal curve model and the z-table to any sampling distribution.

5-5e Summary of Describing a Sample Mean with a z-Score

To describe a sample mean from any underlying raw score population:

1. Create the sampling distribution of means with a μ equal to the μ of the underlying raw score population.

2. Compute the z-score for the sample mean:
   a. Using the σX of the underlying raw score population and your sample N, compute the standard error of the mean, σX̄.
   b. Compute z, finding how far your X̄ is from the μ of the sampling distribution, measured in standard error units.

3. Use the z-table to determine the relative frequency of z-scores above or below this z-score, which is the relative frequency of sample means above or below your mean.

> Quick Practice

> We can describe a sample mean by computing its z-score and using the z-table to determine the relative frequency of sample means above or below it.

More Examples
On a test, μ = 100, σX = 16, and our N = 64. What proportion of sample means will be above X̄ = 103? First, compute the standard error of the mean (σX̄): σX̄ = σX/√N = 16/√64 = 16/8 = 2. Next compute z: z = (X̄ − μ)/σX̄ = (103 − 100)/2 = +3/2 = +1.5. Finally, examine the z-table: The area above this z is the upper tail of the distribution, so from column C it is .0668. This is the proportion of sample means expected to be above a mean of 103.

For Practice
A population of raw scores has μ = 75 and σX = 22; our N = 100 and X̄ = 80.
1. The μ of the sampling distribution here equals _____.
2. The symbol for the standard error of the mean is _____, and here it equals _____.
3. The z-score for a sample mean of 80 is _____.
4. How often will sample means between 75 and 80 occur in this situation?

> Answers
1. 75
2. σX̄; 22/√100 = 2.2
3. z = (80 − 75)/2.2 = +2.27
4. From column B: .4884 of the time
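The three-step summary above fits naturally into one small function. A sketch under the same assumptions as the earlier examples (Python/scipy; the function name and return values are mine, not the book's):

import math
from scipy.stats import norm

def describe_sample_mean(mu, sigma_x, n, xbar):
    sem = sigma_x / math.sqrt(n)   # step 2a: standard error of the mean
    z = (xbar - mu) / sem          # step 2b: z-score for the sample mean
    tail = 1 - norm.cdf(abs(z))    # step 3: area beyond z (column C)
    return sem, z, tail

# The For Practice data: mu = 75, sigma_x = 22, N = 100, X-bar = 80
print(describe_sample_mean(75, 22, 100, 80))  # (2.2, about +2.27, about .0116)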
USING SPSS
As described on Review Card 5.4, SPSS will simultaneously transform an entire sample of raw scores into
z-scores. It does not, however, provide the information found in the z-table.
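For readers without SPSS, the same whole-sample transformation can be sketched in a few lines of Python (my illustration; it is not an SPSS feature or anything the book provides). Note that statistics.pstdev divides by N, matching the descriptive SX used in this book:

import statistics

scores = [60, 70, 80, 50, 40]            # any sample of raw scores
mean = statistics.mean(scores)           # the sample mean
sd = statistics.pstdev(scores)           # SD computed with N in the denominator

z_scores = [(x - mean) / sd for x in scores]
print(z_scores)                          # each raw score expressed as a z-score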
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. (a) What does a z-score indicate? (b) What are the
three major uses of z-scores with individuals’ scores?
2. What two factors determine the size of a z-score?
3. (a) What is a z-distribution? (b) Is a z-distribution
always a normal distribution? (c) What is the mean
and standard deviation of any z-distribution?
4. Why are z-scores called “standard scores”?
5. (a) What is the standard normal curve? (b) How
is it used to describe raw scores? (c) The standard
normal curve is most appropriate when raw scores
have what characteristics?
6. An instructor says that your test grade produced a
very large positive z-score. (a) How well did you do
on the test? (b) What do you know about your raw
score’s relative frequency? (c) What does it mean if
you scored at the 80th percentile? (d) What distribution would the instructor examine to make the
conclusion in part c?
7. An exam produced scores with X̄ = 86 and SX = 12. (a) What is the z-score for a raw score of 80? (b) What is the z-score for a raw score of 98? (c) What is the raw score for z = −1.5? (d) What is the raw score for z = +1?

8. Another exam produced raw scores with a sample mean of 25 and standard deviation of 2. Find the following: (a) the z-score for X = 31; (b) the z-score for X = 18; (c) the raw score for z = −2.5; (d) the raw score for z = +.5

9. Which z-score in each of the following pairs corresponds to the lower raw score? (a) z = +1.0 or z = +2.3; (b) z = −2.8 or z = −1.7; (c) z = −.70 or z = +.20; (d) z = 0 or z = +1.4

10. For each pair in question 9, which z-score has the higher frequency?

11. What are the steps for using the z-table to find: (a) the relative frequency of raw scores in a specified slice of a distribution? (b) the percentile for a raw score below the mean? (c) the percentile for a raw score above the mean? (d) the raw score that cuts off a specified relative frequency or percentile?

12. In a normal distribution, what proportion of all scores would fall into each of the following areas? (a) between the mean and z = +1.89; (b) below z = −2.30; (c) between z = −1.25 and z = +2.75; (d) above z = +1.96 and below −1.96

13. For a distribution, X̄ = 100, SX = 16, and N = 500. (a) What is the relative frequency of scores between 76 and the mean? (b) How many participants are expected to score between 76 and the mean? (c) What is the percentile of someone scoring 76? (d) How many participants are expected to score above 76?

14. (a) What is a sampling distribution of means? (b) How do we use it? (c) What do we mean by the "underlying raw score population"?

15. (a) What three things does the central limit theorem tell us about the sampling distribution of means? (b) Why is the central limit theorem useful when we want to describe a sample mean?

16. What is the standard error of the mean and what does it indicate?

17. In an English class, Emily earned a 76 (with X̄ = 85, SX = 10). Her friend Amber in French class earned a 60 (with X̄ = 50, SX = 4). Should Emily be bragging about how much better she did? Why?

18. What are the steps for finding the relative frequency of sample means above or below a specified mean?
19. Derrick received a 55 on a biology test (with X̄ = 50) and a 45 on a philosophy test (with X̄ = 50). He is considering whether or not to ask his professor to curve the grades using z-scores. (a) Does he want the SX to be large or small in biology? Why? (b) Does he want the SX to be large or small in philosophy? Why?

20. It seems that everyone I meet claims to have an IQ above 145, and often above 160. I know that most IQ tests produce a normal distribution with a μ at about 100 and a σX of about 15. Why do I doubt their claims?

21. Students may be classified as having a math dysfunction (and not have to take statistics) if they score below the 25th percentile on a diagnostic test. The μ of the test is 75 and σX = 10. Approximately what raw score is the cutoff score needed to avoid taking statistics?

22. For the diagnostic test in problem 21, we want to create the sampling distribution of means when N = 64. (a) What does this distribution show? (b) What is the shape of the distribution and what is its μ? (c) Calculate σX̄ for this distribution. (d) What is your answer in part c called, and what information does it provide? (e) Determine the relative frequency of sample means above 77.

23. Candice has two job offers and must decide which one to accept. The job in City A pays $43,000, and the average cost of living is $45,000, with a standard deviation of $15,000. The job in City B pays $46,000, but the average cost of living is $50,000, with a standard deviation of $18,000. Assuming salaries are normally distributed, which is the better job offer? Why?

24. Suppose you own shares of a company's stock. Over the past 10 trading days, its mean selling price has been $14.89. For the history of the company, the average price of the stock has been $10.43 (with σX = $5.60). You wonder if the mean selling price for the next 10 days can be expected to get much higher. Should you wait to sell, or should you sell now?

25. A researcher develops a test for identifying intellectually gifted children, with a μ of 56 and a σX of 8. (a) What percentage of children are expected to score below 60? (b) What percentage of the scores will be below 54? (c) A gifted child is defined as being in the top 20%. What is the minimum test score needed to qualify as gifted?

26. Using the test in question 25, you measure 64 children, obtaining an X̄ of 57.28. Dexter says that because this X̄ is so close to the μ of 56, this sample is rather average. (a) Perform the appropriate procedure to evaluate this mean. (b) Decide if Dexter's assertion is correct by using percent to describe this mean's relative standing.
Chapter 6
USING PROBABILITY TO MAKE DECISIONS ABOUT DATA

LOOKING BACK
Be sure you understand:
• From Chapter 1, the logic of using a sample to draw inferences about the population.
• From Chapter 2, that relative frequency is the proportion of time that scores occur.
• From Chapter 5, how to compute a z-score for raw scores or sample means, and how to determine their relative frequency using the standard normal curve and z-table.

GOING FORWARD
Your goals in this chapter are to learn:
• What probability is.
• How random sampling should produce a representative sample.
• How to use a sampling distribution of means to decide whether a sample represents a particular population.
• How to compute the probability of raw scores and sample means using z-scores.
• How sampling error may produce an unrepresentative sample.

Sections
6-1 Understanding Probability
6-2 Probability Distributions
6-3 Obtaining Probability from the Standard Normal Curve
6-4 Random Sampling and Sampling Error
6-5 Deciding Whether a Sample Represents a Population

You now know most of the common descriptive statistics used in behavioral research. Therefore, you are ready to begin learning the other type of statistical procedure, called inferential statistics. Recall that these procedures are used to draw inferences from sample data about the scores and relationships found in nature, in what we call the population. This chapter sets the foundation for these procedures by introducing you to the "wonderful" world of probability. Don't worry, though, because the discussion is rather simple, and there is little in the way of formulas. However, you do need to understand the basics. In the following sections we'll discuss (1) what probability is, (2) how to determine probability using the normal curve, and (3) how to use probability to draw conclusions about a sample mean.
6-1 UNDERSTANDING PROBABILITY

Probability is used to describe random or chance events. By random we mean that nature is being fair, with no bias toward one event over another (no rigged roulette wheels or loaded dice). In statistical terminology, a random event that does occur in a given situation is our sample. The larger collection of all possible events that might occur in this situation is the population. Thus, the sample could be drawing a particular playing card from the population of all cards in the deck, or, when tossing a coin, the sample is the sequence of heads and tails we see from the population of all possible combinations of heads and tails. In research, the sample is the particular group of individuals selected from the population of individuals we are interested in.

random sampling: Selecting samples so that all members of the population have the same chance of being selected.

probability (p): The likelihood of an event when a population is randomly sampled; equal to the event's relative frequency in the population.

Because probability deals only with random events, we compute probability only for samples that are obtained through random sampling. Random sampling involves selecting a sample in such a way that all events or individuals in the population have an equal chance of being selected. Thus, in research, random sampling is anything akin to drawing participants' names from a large hat that contains all names in the population. A particular sample occurs or does not occur solely because of the luck of the draw.

But how can we describe an event that occurs only by chance? By paying attention to how often the event occurs over the long run. Intuitively, we use this logic all the time. If event A happens frequently over the long run, we think it is likely to happen again now, and we say that it has a high probability. If event B happens infrequently over the long run, we think that it is unlikely to happen now, and we say that it has a low probability.

Using our terminology, when we discuss events occurring "over the long run," we are talking about how often they occur in the population of all possible events. When we decide that one event happens frequently in the population, we are making a relative judgment and describing the event's relative frequency. This is the proportion of time that the event occurs out of all events that might occur in the population. This is also the event's probability. The probability of an event is equal to the event's relative frequency in the population of possible events that can occur. The symbol for probability is p.

Probability is essentially a system for expressing our confidence that a particular random event will occur. First we assume that an
event's past relative frequency will continue over the long run into the future. Then we express our confidence that the event will occur in any single sample by using a number between 0 and 1 to express this relative frequency as a probability. For example, I am a rotten typist and I randomly make typos 80% of the time. This means that in the population of my typing, typos occur with a relative frequency of .80. We expect the relative frequency of typos to continue at a rate of .80 in anything else I type. This expected relative frequency is expressed as a probability, so the probability is .80 that I will make a typo when I type the next woid.

Likewise, all probabilities communicate our confidence in an event. So if event A has a relative frequency of zero in a particular situation, then p = 0. This means that we do not expect A to occur in this situation because it never does. If A has a relative frequency of .10 in this situation, then it has a probability of .10: Because it occurs only 10% of the time in the population, we have some, but not much, confidence that A will occur in the next sample. On the other hand, if A has a probability of .95, we are confident that it will occur: It occurs 95% of the time in this situation, so we expect it to occur in 95% of our samples. Therefore, our confidence is .95 that it will occur now, so we say p = .95. At the most extreme, an event's relative frequency can be 1: It is 100% of the population, so p = 1. Here we are positive it will occur because it always occurs.

An event cannot happen less than 0% of the time nor more than 100% of the time, so a probability can never be less than 0 or greater than 1. Also, all events together constitute 100% of the population. This means that the probabilities of all events must add up to 1. So, if the probability of my making a typo is .80, then because 1 − .80 = .20, the probability is .20 that a word will be error free.

Finally, understand that except when p equals either 0 or 1, we are never certain that an event will or will not occur. The probability of an event is its relative frequency over the long run (in the population). It is up to chance, luck, whether the event occurs in our sample. So, even though I make typos 80% of the time, I may go for quite a while without making one. That 20% of the time when I make no typos has to occur sometime. Thus, it is only over the long run that we expect to see precisely 80% typos.

People who fail to understand that probability implies over the long run fall victim to what psychologists call the "gambler's fallacy." For example, after observing my errorless typing for a while, the fallacy would be thinking that errors "must" occur now, essentially concluding that errors have become more likely. Or, say we're flipping a coin and get seven heads in a row. The fallacy would be thinking that a head is now less likely to occur, because it's already occurred too often (as if the coin decides, "Hold it. That's enough heads for a while!"). The mistake of the gambler's fallacy is failing to recognize that the probability of an event is not altered by whether or not the event occurs over the short run: Probability is determined by what happens over the long run.

probability distribution: The probability of every event in a population.

6-2 PROBABILITY DISTRIBUTIONS

To compute the probability of an event, we need only determine its relative frequency in the population. When we know the relative frequency of every event in a population, we have a probability distribution. A probability distribution indicates the probability of all possible events in a population.

One way to create a probability distribution is to observe the relative frequency of events, creating an empirical probability distribution. Typically, however, we cannot observe the entire population, so the probability distribution is based on the
observed frequencies of events in a sample, which are used to represent the population. For example, say that Dr. Fraud is sometimes very cranky, and his crankiness is random. We observe him on 18 days and he is cranky on 6 of them. Relative frequency equals f/N, so the relative frequency of his crankiness is 6/18, or .33. We expect that he will continue to be cranky 33% of the time, so the probability that he will be cranky today is p = .33. Conversely, he was not cranky on 12 of the 18 days, which is .67. Thus, p = .67 that he will not be cranky today. Because his cranky days plus his noncranky days constitute all possible events, we have the complete probability distribution for his crankiness.

Another way to create a probability distribution is to devise a theoretical probability distribution, which is based on how we assume nature distributes events in the population. From such a model, we determine the expected relative frequency of each event in the population, which is then the probability of each event. For example, consider tossing a coin. We assume that nature has no bias toward heads or tails, so over the long run we expect the relative frequency of heads to be .50 and the relative frequency of tails to be .50. Thus, we have a theoretical probability distribution for coin tosses: The probability of a head on any toss is p = .50 and the probability of a tail is p = .50.

Or, consider drawing a playing card from a deck of 52 cards. Because there is no bias in favor of any card, we expect each card to occur at a rate of once out of every 52 draws over the long run. Thus, each card has a relative frequency of 1/52, or .0192, so the probability of drawing any specific card on a single draw is p = .0192.

Finally, if your state's lottery says you have a 1 in 17 million chance of winning, it is because there are 17 million different number combinations to select from. Assuming no bias, we expect to draw all 17 million combinations equally often over the long run. Therefore, because we'll draw your selection once out of every 17 million draws, your chance of winning today is 1 in 17 million. (Also, to the lottery machine, there is nothing special about a sequence like "1, 2, 3, 4," so it has the same probability of being selected as a sequence that looks more random, like "3, 9, 1, 6.")

And that is the logic of probability: We devise a probability distribution based on the relative frequency of each event in the population. An event's relative frequency equals its probability of occurring in a particular sample.

> Quick Practice

> An event's probability equals its relative frequency in the population.

> A probability distribution indicates all probabilities for a population.

More Examples
One hundred raffle tickets have been sold. Assuming no bias, each should be selected at a rate of 1 out of 100 draws over the long run. Therefore, the probability that you hold the winning ticket is p = 1/100 = .01.

For Practice
1. The probability of any event equals its ______ in the ______.
2. Probability applies only to what kinds of events?
3. If 25 people are in your class, what is the probability the professor will randomly call on you?
4. Tossing a coin 10 times produces 10 heads. What is the p of getting a head on the next toss?

> Answers
1. relative frequency; population
2. random
3. p = 1/25 = .04
4. p = .50
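An empirical probability distribution is just counting and dividing. A toy sketch (Python; the 18 days of data are invented to match the Dr. Fraud example):

# 18 observed days: 1 = cranky, 0 = not cranky (6 cranky days, as in the example)
days = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

p_cranky = days.count(1) / len(days)        # relative frequency f/N = 6/18, about .33
p_not_cranky = days.count(0) / len(days)    # 12/18, about .67
print(round(p_cranky, 2), round(p_not_cranky, 2))   # the two p's sum to 1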
6-3 OBTAINING PROBABILITY FROM THE STANDARD NORMAL CURVE

Enough about flipping coins and winning the lottery. Now that you understand probability, you are ready to see how researchers determine the probability of data. Here, our theoretical probability distribution is the standard normal curve. You learned in Chapter 5 that the way to use the standard normal curve is to first compute a z-score to identify the scores in a part of the curve. Then from the z-table, we determine the proportion of the area under the curve for those scores. This proportion is also the relative frequency of those scores. But now you've learned in this chapter that the relative frequency of an event is its probability. Therefore, by finding the proportion of the area under the curve for particular scores, we also find the probability of those scores.

For example, let's say we have a normal distribution of scores that we collected as part of a national study. Say it has a mean of 60. Also say that we want to know the probability of randomly selecting a score below 60 (as if we were reaching into a big hat containing all scores from this population). Because 60 is the mean and we seek all scores below it, we are talking about the raw scores that produce negative z-scores. In the standard normal curve, negative z-scores constitute 50% of the area under the curve, so they occur .50 of the time in the population. Therefore, the probability is .50 that we will randomly select a negative z-score. Because negative z-scores correspond to raw scores below 60, the probability is also .50 that we will select a raw score below 60. And, because ultimately these scores are produced by participants, the probability is also .50 that we will randomly select a participant with such a score.

In truth, researchers seldom use the above procedure to determine the probability of individual scores. However, they do use this procedure as part of inferential statistics to determine the probability of sample means.

6-3a Finding the Probability of Sample Means

We can compute the probability of obtaining particular sample means by using a sampling distribution of means, which is another theoretical probability distribution. Recall that a sampling distribution is the frequency distribution of all possible sample means that occur when a particular raw score population is sampled an infinite number of times using a particular N. For example, one last time let's look at SAT scores. We begin with the population of students who have taken an SAT subtest. By collecting their scores, we have the SAT raw score population. Say its μ is 500 and σX is 100. The sampling distribution shows us what to expect if we were to infinitely sample this underlying SAT raw score population. Say our N is 25. Then, it is as if we randomly selected 25 scores (by reaching into a large hat again), computed X̄, returned the scores to the hat, drew another sample and computed X̄, and so on. Figure 6.1 shows the resulting SAT sampling distribution.

Recognize that the different values of X̄ occur here simply because of the luck of the draw of which scores are in the sample each time. Sometimes a sample mean higher than 500 occurs because, by chance, the sample contains predominantly high scores. At other times, a sample mean lower than 500 occurs because, by chance, the sample contains predominantly low scores. Thus, the sampling distribution provides a picture of how often different sample means occur simply because of random chance.

The sampling distribution is useful because, without actually sampling the underlying raw score population, we can see all of the means that occur, and we can determine the probability of randomly obtaining any particular means. (Now it is like we are reaching into a large hat containing all of the sample means.)

Figure 6.1
Sampling Distribution of SAT Means When N = 25
[A normal curve centered at μ = 500; sample means of 440 through 560 line up with z-scores of −3 through +3, with .3413 of the area between the mean and z = +1 and .0228 beyond z = ±2.]
However, understand how we phrase this: We are finding the probability of obtaining particular sample means when we do draw a random sample from our underlying raw score population. For example, we know that the most likely sample mean is 500 when we are sampling our SAT raw score population, because that's the mean that happens most often in our sampling distribution. Or, we know that 50% of the time, we'll get an SAT mean below 500, because the sampling distribution shows us that's how often those means occur when sampling this raw score population.

We can also be more specific. Say we seek the probability that a mean will be between 500 and 520. To find this probability, first we transform the sample mean of 520 into a z-score. As in Chapter 5, the first step is to compute the standard error of the mean. Recall that its formula is

σX̄ = σX/√N

With N = 25 and σX = 100, we have

σX̄ = σX/√N = 100/√25 = 100/5 = 20

Next we compute the sample mean's z-score using the formula

z = (X̄ − μ)/σX̄

So we have

z = (X̄ − μ)/σX̄ = (520 − 500)/20 = +20/20 = +1

Now we have rephrased the question to seek the probability that a z-score will be between the mean and z = +1.

Next, we use the z-table to find the proportion of the area under the normal curve. Back in Figure 6.1 we see that (from column B of the z-table) the relative frequency of z-scores between the mean and z = +1 is .3413. Therefore, we know that .3413 of the time, we'll get a mean between 500 and 520 when we are sampling our underlying SAT raw score population, because the sampling distribution shows us that's how often those means occur. So, when we select a sample of 25 scores from the underlying SAT raw score population, the probability is .3413 that the sample mean will be between 500 and 520.

And here's the most important part: Our underlying population of raw scores also reflects a population of students who have taken the SAT. When we select a sample of scores it is the same as selecting a sample of students who have those scores. Therefore, the probability of randomly selecting a particular sample mean is also the probability of randomly selecting a sample of participants whose scores produce that mean. Thus, we've determined that if we randomly select 25 participants from the population of students who have taken the SAT, the probability is .3413 that their sample mean will be between 500 and 520.

Here's another example: Say we seek the probability of obtaining SAT sample means above 540. As in the right-hand tail of Figure 6.1, a mean of 540 has a z-score of +2. As shown (in column C of the z-table), the relative frequency of z-scores beyond this z is .0228. Therefore, the probability is .0228 that we will select a sample whose SAT scores produce a mean higher than 540.

Finally, say we seek the probability of means that are either above 540 or below 460. This translates into seeking z-scores beyond a z of ±2. In Figure 6.1, beyond z = +2 in the right-hand tail is .0228 of the curve, and beyond z = −2 in the left-hand tail is also .0228 of the curve. When we talk about one area of the distribution or another area, we add the two areas together. Therefore, a total of .0456 of the curve contains z-scores beyond ±2, so the probability is .0456 that we'll obtain a mean above 540 or below 460.

THE PROBABILITY OF SELECTING A PARTICULAR SAMPLE MEAN IS THE SAME AS THE PROBABILITY OF RANDOMLY SELECTING A SAMPLE OF PARTICIPANTS WHOSE SCORES PRODUCE THAT MEAN.
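All three of these probabilities come straight from the normal curve of the sampling distribution, so they can also be computed directly. A sketch (Python/scipy assumed, as before):

import math
from scipy.stats import norm

mu, sigma_x, n = 500, 100, 25
sem = sigma_x / math.sqrt(n)   # standard error of the mean: 20

p_between = norm.cdf(520, mu, sem) - norm.cdf(500, mu, sem)   # about .3413
p_above_540 = 1 - norm.cdf(540, mu, sem)                      # about .0228
p_beyond_both = p_above_540 + norm.cdf(460, mu, sem)          # about .0455 (z-table: .0456)
print(p_between, p_above_540, p_beyond_both)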
Before proceeding, be sure you understand how a sampling distribution indicates the probability of sample means. In particular, look again at Figure 6.1 and see what happens when means have a larger z-score that places them farther into the tail of the sampling distribution: The height of the curve above the means decreases, indicating that they occur less often. Therefore, a sample mean having a larger z-score is less likely to occur when we are dealing with the underlying raw score population. For example, in Figure 6.1, a mean of 560 has a z-score of +3, indicating that we are very unlikely to select a sample of students from the SAT population that has such a high mean.

representative sample: A sample in which the individuals and scores accurately reflect the individuals and scores in the population.

The larger the absolute value of a sample mean's z-score, the less likely the mean is to occur when samples are drawn from the underlying raw score population.

> Quick Practice

> To find the probability of particular sample means, envision the sampling distribution, compute the z-score, and apply the z-table.
> Sample means farther into the tail of the sampling distribution are less likely.

More Examples
In a population, μ = 35 and σX = 8. What is the probability of obtaining a sample (N = 16) with a mean above X̄ = 38.3? First compute the standard error of the mean: σX̄ = σX/√N = 8/√16 = 2. Then z = (X̄ − μ)/σX̄ = (38.3 − 35)/2 = +1.65. The means above 38.3 are in the upper tail of the distribution, so from column C of the z-table, sample means above 38.3 have a p = .0495.

For Practice
1. With μ = 500, σX = 100, and N = 25, what is the probability of selecting a X̄ above 530?
2. Approximately what is the probability of selecting an SAT sample mean having a z-score between ±1?
3. For some raw scores, if μ = 100, are we more likely to obtain a sample mean close to 100 or a mean very different from 100?
4. The farther sample means are into the tail of the sampling distribution, the lower/higher their probability.

> Answers
1. σX̄ = 100/√25 = 20; z = (530 − 500)/20 = +1.5; p = .0668
2. With 68% of the distribution here, p = .68.
3. A mean close to 100 is more likely.
4. lower
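If you prefer to verify the practice answers by machine instead of the z-table, the same standard-library approach works (a sketch, not part of the text):

```python
from math import sqrt
from statistics import NormalDist

# Practice item 1: mu = 500, sigma = 100, N = 25; p(sample mean above 530)?
se = 100 / sqrt(25)                        # standard error = 20
z = (530 - 500) / se                       # z = +1.5
print(round(1 - NormalDist().cdf(z), 4))   # 0.0668, matching column C of the z-table
```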
6-4 RANDOM SAMPLING
AND SAMPLING ERROR
Now that you can compute the probability of sample
means, you are ready to begin learning about inferential
statistics. The first step is to understand why researchers
need such procedures. Recall that in research, we select
a sample of participants from the population we wish
to describe. Then we want to conclude that the way
our sample behaves is the way the population would
behave, if we could observe it. We summarize this by
using our sample’s mean to estimate the population’s
mean. However, it is at this step that researchers need
inferential statistics because there is no guarantee that
the sample accurately reflects the population. In other
words, we can never be certain that a sample is representative of the population. In a representative sample,
the characteristics of the individuals and scores in the
sample accurately reflect the characteristics of the individuals and scores in the population. For example, if
55% of a population is women, then a representative
sample has 55% women so that we’ll have the correct
mix of male and female scores. Or, if for some reason
20% of the population scored 475 on the SAT, then a
representative sample has 20% scoring at 475. And so
on. To put it simply, a representative sample is a miniature version of the population. More importantly, when
we have a representative sample, our statistics will also be accurate: If the μ in the SAT population is 500, then the X̄ in a representative sample will be 500.
The reason researchers select participants using
random sampling is to produce a representative
sample. A random sample should be representative
because, by being unselective when choosing participants, we allow the characteristics of the population
to occur naturally in the sample, the way they occur
in the population. Thus, when 55% of the population
is women, a random sample should also have 55%
women, because that is how often we should encounter
women when randomly selecting participants. In the
same way, random sampling should produce a sample
having all of the characteristics of the population.
At least we hope it works that way! A random
sample should be representative, but nothing forces this
to occur. The problem is that just by the luck of the
draw, a sample may be unrepresentative, having characteristics that do not match those of the population. So,
for example, 55% of a population may be women but,
by chance, we might select substantially more or fewer
women, and then the sample would not have the correct mix of male and female scores. Or, 20% of the population may score at 475, but simply through the luck
of who is selected, this score may occur more or less
often in our sample. The problem is that if the sample
is different from the population, then our statistics will
be inaccurate: Although the μ in the population may be 500, the sample mean will be some other number.
Thus, any sample may be unrepresentative of the population from which it is selected. Because this is such a serious problem, we have a name for it: we say the sample reflects sampling error. Sampling error occurs when random chance produces an unrepresentative sample, with the result that a sample statistic (such as X̄) is not equal to the population parameter it represents (such as μ). In plain English, because of the luck of the draw, a sample may contain too many high scores or too many low scores relative to the population, so the sample is in error to some degree in representing the population.

sampling error: When random chance produces an unrepresentative sample from a population, with the result that the sample's statistic is different from the population parameter it represents.
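Sampling error is easy to see in a simulation. The sketch below (illustrative only; the seed and sample count are arbitrary choices of ours) draws random samples of N = 25 from a normal population with μ = 500 and σX = 100; each sample mean misses μ by a different amount purely through the luck of the draw.

```python
import random

random.seed(1)                      # arbitrary seed, so the run is repeatable
mu, sigma, n = 500, 100, 25

for i in range(5):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    # Each mean differs from mu = 500 only because of which scores
    # happened to be selected -- that difference is sampling error.
    print(f"sample {i + 1}: mean = {xbar:6.1f}, sampling error = {xbar - mu:+.1f}")
```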
Sampling error is a central problem for researchers and is the reason you must understand inferential statistics. As you'll see, in research we will always be able to specify one known population that our sample may be representing. The dilemma for researchers occurs when our X̄ is different from this population's μ: Maybe this is because of sampling error, and we have an unrepresentative sample from this population. Or, maybe this is because we have a representative sample but from some other population.
A sample may poorly represent
one population because of
sampling error, or it may
accurately represent some other
population.
For example, say that we return to Prunepit University and in a random sample obtain a mean SAT score of 550. This is surprising, because the ordinary national population of SAT scores has a μ of 500. Therefore, we should have obtained a sample mean of 500 if our sample was perfectly representative of this population. How do we explain a sample mean of 550? In every research study, we will use inferential statistics to decide between the following two opposing explanations for why a sample mean is different from a particular population mean.

1. Maybe we have sampling error. Perhaps we obtained a sample of relatively high SAT scores simply because of the luck of the draw of who was selected for our sample. Thus, maybe chance produced an unrepresentative sample and, although it doesn't look like it, we are still representing the ordinary SAT population where μ equals 500.

2. Maybe the sample does not represent this population. After all, these are Prunepit students, so maybe they are not part of the ordinary SAT population. Instead, they might belong to a different population, one having a μ that is not 500. Maybe, for example, they belong to a population where μ is 550, and our sample perfectly represents this population.
The solution to this dilemma is to use inferential
statistics to make a decision about the population
being represented by our sample. The next chapter
puts all of this into a research context, but in the following sections we will examine the basics for making
this decision.
6-5 DECIDING WHETHER
A SAMPLE REPRESENTS
A POPULATION
We deal with the possibility of sampling error in this
way: Because we rely on random sampling, how representative a sample is depends on random chance—
the luck of the draw of which individuals and scores
are selected. Therefore, we can determine whether
our sample is likely to come from and represent a particular population. If the sample is likely to occur when that population is sampled, then we decide that the sample does represent that population. If our sample is unlikely to occur when that population is sampled, then we decide that the sample does not represent that population, and instead represents some other population.
Here’s a simple example. You come
across a paragraph of someone’s typing, but you don’t know whose. Is it
mine? D
Does it represent the population
of my typing?
t
Say that the paragraph
contains zero typos. It’s possible that
contain
some q
quirk of chance produced such
an un
unrepresentative sample, but it’s
not likely:
l
I type errorless words
only 20% of the time, so the probability that I could produce an entire
errorless paragraph is extremely
error
small. Thus, because such a sample
small
is unlikely to come from the population
of my typing, you should conclude that the sample represents the population of another, competent
typist where such a sample is more likely.
On the other hand, say that there are typos in
78% of the words in the paragraph. This is consistent
with what you would expect if the sample represents
my typing, but with a little sampling error. Although
I make typos 80% of the time over the long run, you
should not expect precisely 80% typos in every sample. Rather, a sample with 78% errors seems likely
to occur simply by chance when the population of
my typing is sampled. Thus, you can accept that this
paragraph represents my typing, although somewhat
poorly.
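To see why the errorless paragraph is so improbable, suppose (purely for illustration; the text does not give a paragraph length) that the paragraph contains 50 words and each word is independently errorless with probability .20:

```python
# Probability that every word is errorless when p(errorless word) = .20
p_word = 0.20
n_words = 50                  # hypothetical paragraph length
print(p_word ** n_words)      # ~1.1e-35: for all practical purposes, never
```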
We use the same logic to decide if our Prunepit
sample represents the ordinary population of SAT
scores where μ is 500: We will determine the probability of obtaining a sample mean of 550 from this
population. As you’ve seen, we determine the probability of a sample mean by computing its z-score on
the sampling distribution of means. Thus, we first
envision the sampling distribution created from the
ordinary SAT population. This is shown in Figure 6.2.
The next step is to calculate the z-score for our sample
mean so that we can locate it on this distribution and
thus determine its likelihood.
Figure 6.2
Sampling Distribution of SAT Means Showing Two Possible Locations of Our Sample Mean
[A normal curve of sample means centered at μ = 500, with location A close to the center and location B far into the right-hand tail.]

When evaluating a mean's z-score, we will use this logic: Even if we are representing the population where μ is 500, we should not expect a perfectly representative sample having a X̄ of exactly 500.000. . . . (Think how unlikely that is!) Instead, if our sample represents this population, then the sample mean should be close to 500. For example, say that the z-score for our sample mean is at location A in Figure 6.2. Read what the frequency distribution indicates by following the dotted line. Remember, the sampling distribution shows what to expect if we are sampling from the underlying SAT raw score population. We see that this mean has a relatively high frequency and so is very likely in this situation. In other words, when we are representing this population, often a sample will be unrepresentative to this extent, containing slightly higher scores than occur in the population. Thus, this is exactly the kind of mean we'd expect if our Prunepit sample came from this SAT population and we had a little sampling error. Therefore, using statistical terminology, we say we retain the idea that our sample probably comes from and represents the ordinary SAT population, accepting that the difference between our X̄ and μ reflects sampling error.

However, say that instead, our sample has a z-score at location B in Figure 6.2: Following the dashed line shows that this is a very infrequent and unlikely mean. Seldom is a sample so unrepresentative of the ordinary SAT population that it will produce this mean. In other words, our sample is unlikely to be representing this population, because this mean almost never happens with this population. Therefore, we say that we reject that our sample represents this population, rejecting that the difference between the X̄ and μ reflects sampling error. Instead, it makes more sense to conclude that the sample represents some other raw score population that has some other μ, where this sample mean is more likely. We would make the same decision for a sample mean in the extreme lower tail of the sampling distribution.
THIS IS THE LOGIC USED IN ALL INFERENTIAL
PROCEDURES, SO BE SURE THAT YOU UNDERSTAND IT:
WE WILL ALWAYS BEGIN WITH A KNOWN, UNDERLYING RAW SCORE
POPULATION THAT A SAMPLE MAY OR MAY NOT REPRESENT. FROM THE
UNDERLYING RAW SCORE POPULATION, WE ENVISION THE SAMPLING
DISTRIBUTION OF MEANS THAT WOULD BE PRODUCED. THEN WE DETERMINE
THE LOCATION OF OUR SAMPLE MEAN ON THE SAMPLING DISTRIBUTION.
THE FARTHER INTO THE TAIL OF THE SAMPLING DISTRIBUTION THE SAMPLE
MEAN IS, THE LESS LIKELY THAT THE SAMPLE COMES FROM AND REPRESENTS
THE UNDERLYING RAW SCORE POPULATION THAT WE BEGAN WITH.
> Quick Practice

> If the z-score shows the sample mean is unlikely in the sampling distribution, reject that the sample is merely poorly representing the underlying raw score population.
> If the z-score shows that the sample mean is likely in the sampling distribution, accept that the sample is representing the underlying raw score population, although somewhat poorly.
More Examples
On a sampling distribution created from body weights in the United States, a sample's mean produces a z = +5.00! This indicates that such a mean is extremely unlikely when representing this population, so we reject that our sample represents this population. However, say that another mean produced a z = −.2. Such a mean is close to μ and very likely, so our sample is likely to be representing this population, although with some sampling error.
For Practice
1. ______ communicates that a sample mean is different from the μ it represents.
2. Sampling error occurs because of ______.
3. A sample mean has a z = +1 on the sampling distribution created from the population of psychology majors. Is this likely to be a sample of psychology majors?
4. A sample mean has a z = −4 on the previous sampling distribution. Is this likely to be a sample of psychology majors?

> Answers
1. sampling error 2. random chance 3. yes 4. no

6-5a Setting Up the Sampling Distribution

To decide if our Prunepit sample represents the ordinary SAT population where μ = 500, we must perform two tasks: (1) Determine the probability of obtaining our sample from this population, and (2) decide whether this probability makes the sample unlikely to be representing this population. We perform both tasks simultaneously by setting up the sampling distribution.

We formalize the decision process in this way: At some point, a sample mean is so far above or below 500 that it is unbelievable that chance produced such an unrepresentative sample. Any samples beyond this point that are farther into the tail are even more unbelievable. To identify this point, as shown in Figure 6.3, we literally draw a line in each tail of the distribution. In statistical terms, the shaded areas beyond the lines make up the region of rejection. As shown, very infrequently are samples so poor at representing the SAT population that they have means in the region of rejection.

Thus, the region of rejection is the part of a sampling distribution containing means that are so unlikely that we reject that they represent the underlying raw score population. Essentially, we "shouldn't" get a sample mean that lies in the region of rejection if we're representing the ordinary SAT population because such means almost never occur with this population. Therefore, if we do get such a mean, we probably aren't representing this population: We reject that our sample represents the underlying raw score population and decide that the sample represents some other population.

Samples that have means in the region of rejection are so unrepresentative of the underlying raw score population that it's a better bet they represent some other population.

Conversely, if our Prunepit mean is not in the region of rejection, then our sample is not unlikely to be representing the ordinary SAT population. In fact, by our definition, samples not in the region of rejection are likely to represent this population. In such cases we retain the idea that our sample is simply poorly representing this population of SAT scores.
region of rejection: The extreme tails of a sampling distribution containing those sample means considered unlikely to be representing the underlying raw score population.

criterion: The probability that defines whether a sample is unlikely to represent the underlying raw score population.

critical value: The score that marks the inner edge of the region of rejection in a sampling distribution; values that fall beyond it lie in the region of rejection.

Figure 6.3
Setup of Sampling Distribution of Means Showing the Region of Rejection
[A normal curve of sample means centered at μ = 500, with sample means less than 500 to the left and greater than 500 to the right. The shaded region of rejection in each tail equals 2.5%.]
How do we know where to draw the line that
starts the region of rejection? By defining our criterion.
The criterion is the probability that defines samples
as unlikely to be representing the raw score population. Researchers usually use .05 as their criterion
probability. (You’ll see why in Chapter 7.) Thus, using
this criterion, those sample means that together occur
only 5% of the time are defined as so unlikely that if
we get any one of them, we’ll reject that our sample
represents the underlying raw score population.
Our criterion then determines the size of our
region of rejection. In Figure 6.3, the sample means
that occur 5% of the time are those that make up the
extreme 5% of the sampling distribution. However,
we’re talking about means above or below 500 that
together are a total of 5% of the curve. Therefore, we
divide the 5% in half so the extreme 2.5% of the sampling distribution will form our region of rejection in
each tail.
Now the task is to determine if our sample mean
falls into the region of rejection. To do this, we compare the sample’s z-score to the critical value.
The criterion probability that defines samples as unlikely (and also determines the size of the region of rejection) is usually p = .05.
6-5b Identifying the
Critical Value
There is a specific z-score at the
spot on the sampling distribution
where the line marks the beginning
of the region of rejection. Because
z-scores get larger as we go farther
into the tails, if the z-score for our sample is larger
than the z-score at the line, then our sample mean
lies in the region of rejection. The z-score at the line
is called the critical value. A critical value marks the
inner edge of the region of rejection and thus identifies
the value required for a sample to fall into the region
of rejection. Essentially, it is the minimum z-score that
defines a sample as too unlikely.
How do we determine the critical value? By
considering our criterion. With a criterion of .05,
the region of rejection in each tail is the extreme
.025 of the total area under the curve. From column
C in the z-table, the extreme .025 lies beyond the
z-score of 1.96. Therefore, in each tail, the region
of rejection begins at 1.96, so ±1.96 is the critical value of z. Thus, as shown in Figure 6.4, labeling the inner edges of the region of rejection with ±1.96 completes how you should set up the sampling distribution. (Note: In the next chapter, using
both tails like this is called a two-tailed test.)
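A critical value is simply the z-score that cuts off the criterion area. Here is a minimal numeric check (our sketch, using the standard library, with the text's two-tailed .05 criterion):

```python
from statistics import NormalDist

criterion = 0.05

# Two-tailed test: half of the criterion (.025) goes in each tail, so the
# cutoff is the z-score with .975 of the curve below it.
z_crit = NormalDist().inv_cdf(1 - criterion / 2)
print(round(z_crit, 2))       # 1.96, so the critical values are -1.96 and +1.96
```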
We’ll use Figure 6.4 to determine whether our
Prunepit mean lies in the region of rejection by comparing our sample’s z-score to the critical value.
A sample mean lies in the region of rejection if its
z-score is beyond the critical value.
Thus, if our Prunepit mean has a z-score that is larger than ±1.96, then the sample lies in the region of rejection. If the z-score is smaller than or equal to the critical value, then the sample is not in the region of rejection.
Figure 6.4
Setup of Sampling Distribution of SAT Means Showing Region of Rejection and Critical Values
[A normal curve of sample means from 440 to 560 (z-scores −3 to +3) centered at μ = 500. The region of rejection in each tail equals 2.5%, beginning at the critical values of −1.96 and +1.96.]
6-5c Deciding Whether the Sample Represents a Population

When a sample's z-score is beyond the critical value, reject that the sample represents the underlying raw score population. When the z-score is not beyond the critical value, retain the idea that the sample represents the underlying raw score population.

Finally, we can evaluate our sample mean of 550 from Prunepit U. First, we compute the sample's z-score on the sampling distribution created from the ordinary SAT raw score population. There, σX = 100 and N = 25, so the standard error of the mean is

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{N}} = \frac{100}{\sqrt{25}} = 20$$

Then the z-score is

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{550 - 500}{20} = +2.5$$
To complete the procedure, we compare the sample's z-score to the critical value to determine where the sample mean is on the sampling distribution. As shown in Figure 6.5, our sample's z of +2.5 (and the underlying sample mean of 550) lies in the right-hand region of rejection. This tells us that a sample mean of 550 is among those means that are extremely unlikely to occur when the sample represents the ordinary population of SAT scores. In other words, very seldom does chance (the luck of the draw) produce such unrepresentative samples from this population, so it is not a good bet that chance produced our sample from this population. Therefore, we reject that our sample represents the population of SAT raw scores having a μ of 500.

Notice that we make a definitive, yes-or-no decision. Because our sample is unlikely to represent the SAT raw score population where μ is 500, we decide that no, it does not represent this population.
We wrap up our conclusions in this way: If the sample does not represent the ordinary SAT population, then
it must represent some other population. For example,
perhaps the Prunepit students obtained the high mean
of 550 because they lied about their scores, so they may
represent the population of students who lie about the
SAT. Regardless, we use the sample mean to estimate the μ of the population that the sample does represent. A sample having a mean of 550 is most likely to come from a population having a μ of 550. Therefore, our best estimate is that the Prunepit sample represents a population of SAT scores that has a μ of 550.

Figure 6.5
Completed Sampling Distribution of SAT Means Showing Location of the Prunepit U Sample Relative to the Critical Value
[A normal curve of sample means from 440 to 560 (z-scores −3 to +3) centered at μ = 500, with the 2.5% region of rejection in each tail beyond the critical values of −1.96 and +1.96. The Prunepit mean of X̄ = 550 (z = +2.5) lies in the right-hand region of rejection; a mean of X̄ = 474 lies between the critical values.]

On the other hand, say that our sample mean had been 474,
resulting in a z-score of (474 − 500)/20 = −1.30. Because −1.30 does not lie beyond the critical value of ±1.96, this sample mean is not in the region of rejection. Look at Figure 6.5, and see that the sample mean of 474 is a relatively frequent and thus likely mean. Therefore, we know that chance will often produce such a sample from the ordinary SAT population, so it is a good bet that chance produced our sample from this population. Because of this, we can accept that random chance produced a less than perfectly representative sample for us but that it probably represents the SAT population where μ is 500.
6-5d Summary of How to
Decide If a Sample Represents the
Underlying Raw Score Population
The basic question answered by all inferential statistical procedures is “Does the sample represent a particular raw score population?” To answer this:
1. Set up the sampling distribution. Draw the sampling distribution of means with a μ equal to the μ of the underlying raw score population. Select the criterion probability (usually .05), locate the region of rejection, and determine the critical value (±1.96 in a two-tailed test).

2. Compute the sample mean and its z-score.
a. Compute the standard error of the mean, σX̄.
b. Compute z using X̄ and the μ of the sampling distribution.

3. Compare the sample's z-score to the critical value. If the sample's z is beyond the critical value, it is in the region of rejection: Reject that the sample represents the underlying raw score population. If the sample's z is not beyond the critical value, do not reject that the sample represents the underlying population.
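These three steps translate directly into a few lines of code. The function below is a sketch of the decision rule only (the name and structure are ours, not the book's); it is demonstrated with the chapter's SAT values of μ = 500, σX = 100, and N = 25.

```python
from math import sqrt
from statistics import NormalDist

def decide(xbar, mu, sigma, n, criterion=0.05):
    """Two-tailed decision: does the sample mean represent the
    underlying raw score population located at mu?"""
    se = sigma / sqrt(n)                                  # step 2a: standard error
    z = (xbar - mu) / se                                  # step 2b: the mean's z-score
    z_crit = NormalDist().inv_cdf(1 - criterion / 2)      # step 1: critical value (+/-1.96)
    decision = "reject" if abs(z) > z_crit else "retain"  # step 3: compare
    return z, decision

print(decide(550, 500, 100, 25))   # (2.5, 'reject'): in the region of rejection
print(decide(474, 500, 100, 25))   # (-1.3, 'retain'): not in the region of rejection
```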
6-5e Other Ways to Set Up the Sampling Distribution

Previously, we placed the region of rejection in both
tails of the distribution because we wanted to identify
unrepresentative sample means that were either too
far above or too far below 500. Instead, we can place
the region of rejection in only one tail of the distribution. (In the next chapter you’ll see why you would
want to use this one-tailed test.)
Say that we are interested only in those SAT means
that are less than 500, having negative z-scores. Our
criterion is still .05, but now we place the entire region
of rejection in the lower, left-hand tail of the sampling
distribution, as shown in part (a) of Figure 6.6. This
produces a different critical value. The extreme lower
5% of a distribution lies beyond the critical value of
−1.645. Therefore, the z-score for our sample must lie beyond −1.645 for it to be in the region of rejection. If it does, we will again conclude that the sample
mean is so unlikely to occur when sampling the SAT
raw score population that we’ll reject that our sample
represents this population. If the z-score is anywhere
else on the sampling distribution, even far into
the upper tail, we will
not reject that the sample
represents the population where μ = 500.
On the other hand,
say that we’re interested
only in those sample
means greater than 500,
having positive z-scores.
Then we place the entire
region of rejection in
the upper, right-hand
tail of the sampling distribution, as shown in
part (b) of Figure 6.6.
Now the critical value
is +1.645, so only if our sample's z-score is beyond +1.645 does the
sample mean lie in the
region of rejection. Then
we reject that our sample
represents the underlying
raw score population.
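For a one-tailed test, the entire .05 criterion sits in a single tail, which is why the cutoff moves in from ±1.96 to 1.645. A quick check (our sketch, standard library only):

```python
from statistics import NormalDist

snd = NormalDist()

print(round(snd.inv_cdf(0.05), 3))   # -1.645: critical value when the region is in the lower tail
print(round(snd.inv_cdf(0.95), 3))   # +1.645: critical value when the region is in the upper tail
```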
Figure 6.6
Setup of SAT Sampling Distribution to Test (a) Negative z-Scores and (b) Positive z-Scores
[(a) A normal curve of sample means from 440 to 560 (z-scores −3 to +3) with μ = 500; the entire 5% region of rejection is in the lower tail, beyond the critical value of −1.645. (b) The same curve with the entire 5% region of rejection in the upper tail, beyond the critical value of +1.645.]
> Quick Practice

> To decide if a sample represents a particular raw score population, compute the sample mean's z-score and compare it to the critical value on the sampling distribution.

More Examples
A sample of SAT scores (N = 25) produces X̄ = 460. Does the sample represent the SAT population where μ = 500 and σX = 100? Compute z: σX̄ = σX/√N = 100/√25 = 20; z = (X̄ − μ)/σX̄ = (460 − 500)/20 = −2. With a criterion of .05 and the region of rejection in both tails, the critical value is ±1.96. The sampling distribution is like Figure 6.5. The z of −2 is beyond −1.96, so it is in the region of rejection. Conclusion: The sample does not represent this SAT population.

For Practice
1. The region of rejection contains those samples considered to be ______ to represent the underlying raw score population.
2. The ______ defines the z-score that is required for a sample to be in the region of rejection.
3. For a sample to be in the region of rejection, its z-score must be ______ the critical value.
4. On a test, μ = 60 and σX = 18. A sample (N = 100) produces X̄ = 65. Using the .05 criterion and both tails, does this sample represent this population?

> Answers
1. unlikely 2. critical value 3. larger than (beyond) 4. σX̄ = 18/√100 = 1.80; z = (65 − 60)/1.80 = +2.78. This z is beyond ±1.96, so reject that the sample represents this population; it's likely to represent the population with μ = 65.
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered questions are in Appendix C.)
1. (a) What does probability convey about an event’s
occurrence? (b) What is the probability of a
random event based on?
2. What is random sampling?
3. In a sample with a mean of 46 and a standard
deviation of 8, what is the probability of randomly
selecting (a) a score above 64? (b) a score between
40 and 50?
4. The population from which the sample in problem 3 was randomly drawn has a μ of 51 and a σX of 14. What is the probability of obtaining a random sample of 25 scores with a mean (a) below 46? (b) below 46 or above 56?
5. David’s uncle is building a house on land that
has been devastated by hurricanes 160 times in the
past 200 years. However, there hasn’t been a major
storm there in 13 years, so his uncle says this is a
safe investment. David argues that he is wrong
because a hurricane must be due soon. What is
wrong with the reasoning of both men?
6. Four airplanes from different airlines have crashed
in the past two weeks. This terrifies Megan, who
must travel on a plane. Her travel agent claims
that the probability of a plane crash is minuscule.
Who is correct and why?
7. What does the term “sampling error” indicate?
8. When testing the representativeness of a sample
mean: (a) What is the criterion probability?
(b) What is the region of rejection? (c) What is the
critical value?
9. (a) What does comparing a sample’s z-score to the
critical value indicate? (b) What decision do we
make about a sample when its z-score is beyond
the critical value, and why?
10. What is the difference between using both tails
versus one tail of the sampling distribution in terms
of (a) the region of rejection? (b) the critical value?
11. Sharon asks a sample of students their choices in
the election for class president. She concludes that
Ramone will win. It turns out that Darius wins.
What is the statistical explanation for this incorrect prediction?
12. (a) Why does random sampling produce representative samples? (b) Why does random sampling produce unrepresentative samples?

13. Why do we reject that a sample represents the underlying raw score population if the sample mean is in the region of rejection?

14. Why do we retain that a sample represents the underlying raw score population if the sample mean is not in the region of rejection?
15. A researcher obtains a sample mean of 66, which produces a z of +1.45. The researcher uses critical values of ±1.96 and decides to reject that the sample represents the underlying raw score population having a μ of 60. (a) Draw the sampling distribution and indicate the approximate locations of X̄, μ, the computed z-score, and the critical values. (b) Is the researcher's conclusion correct? Explain your answer.
16. In a population, μ = 100 and σX = 25. A sample of 150 people has X̄ = 120. Using two tails and the
.05 criterion: (a) What is the critical value? (b) Is
this sample mean in the region of rejection? How
do you know? (c) What does the mean’s location indicate about the likelihood of this sample
occurring in this population? (d) What should we
conclude about the sample?
17. The mean of a population of raw scores is 33 (σX = 12). Our sample has X̄ = 36.8 (N = 30). Using the .05 criterion and the upper tail of the sampling distribution:
(a) What is the critical value? (b) Is the sample mean in
the region of rejection? How do you know? (c) What
does the mean’s location indicate about the likelihood
of this sample occurring in this population? (d) What
should we conclude about the sample?
18. We obtain a X̄ = 46.8 (with N = 15). This sample may represent the population where μ = 50 (σX = 11).
Using the .05 criterion and the lower tail of the
sampling distribution: (a) What is our critical value?
(b) Is this sample in the region of rejection? How do
you know? (c) What should we conclude about the
sample and why?
19. The mean of a population of raw scores is 28 (σX = 9). Your X̄ = 34 (with N = 35). Using the .05
criterion with the region of rejection in both tails
of the sampling distribution, is the sample representative of this population? Why or why not?
20. The mean of a population of raw scores is 60 (σX = 16). Your X̄ = 66 (with N = 40). Using the
.05 criterion with the region of rejection in the
lower tail, should you reject that the sample represents this population and why?
21. On a test of motor coordination, the population
of average bowlers has a mean score of 24, with
a standard deviation of 6. A random sample of
30 bowlers at Bubba’s Bowling Alley has a sample
mean of 26. A second random sample of 30 bowlers, at Babette’s Bowling Alley, has a mean of 18.
Using the .05 criterion and both tails of the sampling distribution, decide if each sample represents
the population of average bowlers.
22. (a) In question 21, if a particular sample does not
represent the population of average bowlers, what
is your best estimate of the population m it does represent? (b) Explain the logic behind this conclusion.
23. A couple with eight daughters decides to have one
more baby, because they think this time, they are
sure to have a boy! Is this reasoning accurate?
24. In the population of typical statistics students, μ = 75 on a national final exam (σX = 6.4). For 25 students who studied statistics using a new technique, X̄ = 72.1. Using two tails of the sampling distribution and the .05 criterion: (a) What is the critical value? (b) Is this sample mean in the region of rejection? How do you know? (c) Should we conclude that the sample represents the population of typical statistics students?
25. In a study you obtain the following data measuring the aggressive tendencies of some football
players:
40 30 39 40 41 39 31 28 33
(a) Researchers have found that μ = 30 in the population of non–football players, with σX = 5.
Using both tails of the sampling distribution,
determine whether your football players
represent this population. (b) What do you
conclude about the population of football players
and its m?
USE THE TOOLS.
• Rip out the Review Cards in the back of your book to study.
Or Visit CourseMate for:
• Full, interactive eBook (search, highlight, take notes)
• Review Flashcards (Print or Online) to master key terms
• Test yourself with Auto-Graded Quizzes
• Bring concepts to life with Games, Videos,
and Animations!
Complete the Speak Up
survey in CourseMate at
www.cengagebrain.com
Follow us at
www.facebook.com/4ltrpress
Go to CourseMate for Behavioral Sciences STAT2 to
begin using these tools.
Access at www.cengagebrain.com
Chapter 7
OVERVIEW OF STATISTICAL
HYPOTHESIS TESTING:
THE z-TEST
LOOKING BACK
Be sure you understand:
• From Chapter 1, what the conditions of an independent variable are and what the dependent variable is.
• From Chapter 4, that a relationship in the population occurs when different means from the conditions represent different μs and thus different distributions of dependent scores.
• From Chapter 6, that when a sample's z-score falls in the region of rejection, the sample is unlikely to represent the underlying raw score population.

GOING FORWARD
Your goals in this chapter are to learn:
• Why the possibility of sampling error causes researchers to perform inferential statistics.
• When experimental hypotheses lead to either one-tailed or two-tailed tests.
• How to create the null and alternative hypotheses.
• When and how to perform the z-test.
• How to interpret significant and nonsignificant results.
• What Type I errors, Type II errors, and power are.
Sections
7-1 The Role of Inferential Statistics in Research
7-2 Setting Up Inferential Procedures
7-3 Performing the z-Test
7-4 Interpreting Significant and Nonsignificant Results
7-5 Summary of the z-Test
7-6 The One-Tailed Test
7-7 Statistics in the Research Literature: Reporting the Results
7-8 Errors in Statistical Decision Making

In the previous chapter, you learned the basics involved in inferential statistics. Now we'll put these procedures into a research context and present the statistical language and symbols used to describe them. Until further notice, we'll be talking about experiments. This chapter shows (1) how to set up an inferential procedure, (2) how to perform the "z-test," (3) how to interpret the results of a procedure, and (4) the way to describe potential errors in our conclusions.
7-1 THE ROLE OF
INFERENTIAL STATISTICS
IN RESEARCH
As you saw in the previous chapter, a random sample
may be unrepresentative of a population because, just
by the luck of the draw, the sample may contain too
many high scores or too many low scores relative to
the population. Because the sample is not perfectly
representative of the population, it reflects sampling
error, so the sample mean does not equal the population mean.
Sampling error is a potential problem in all behavioral research. Recall that the goal of research is to use
sample data to describe the relationship found in the
population. The dilemma for researchers is that sampling error may produce misleading sample data, so
that we draw incorrect conclusions about the population. For example, in an experiment we hope to see
a relationship in which, as we change the conditions
of the independent variable, participants’ scores on
the dependent variable change in a consistent fashion. If the means for our conditions differ, we infer
that, in nature, each condition would produce a different population of scores located at a different μ.
But! Here is where sampling error comes in. Perhaps
we are wrong and the relationship does not exist in
nature. Maybe all of the scores actually come from
the same population, and the means in our conditions
differ simply because of which participants we happened to select for each—because of sampling error.
We won’t know this, of course, so we will be misled
into thinking the relationship does exist. For example,
say we compare the creativity scores of some men and
women, although we are unaware that in nature, men
and women do not differ on this variable. Through
sampling error, however, we might select some females
who are more creative than our males or vice versa.
Then sampling error will mislead us into thinking
there’s a relationship here, although really there is not.
Or, perhaps there is a relationship in the population, but because of sampling error, we see a different
relationship in our sample data. For example, say we
measure the heights of some men and women and, by
chance, obtain a sample of short men and a sample
of tall women. If we didn’t already know that in the
population men are taller, sampling error would mislead us into concluding that women are taller.
inferential statistics: Procedures for deciding whether sample data represent a particular relationship in the population.

Thus, in every study it is possible that we are being misled by sampling error, so that the relationship we see in our sample data is not the relationship found in nature. This is the reason why, in every study, we apply inferential statistics. Inferential statistics are used to decide whether sample data represent a particular relationship in the population. Essentially, we decide whether we should believe our sample data: Should we believe what the
sample data appear to indicate about the relationship in the population, or instead, is it likely that the sample relationship is a coincidence produced by sampling error that misrepresents what is found in the population?

The specific inferential procedure we should use in a particular experiment depends first on the characteristics of our dependent variable. We have two general categories of procedures to choose from. The first is parametric statistics. Parametric statistics are procedures that require specific assumptions about the raw score populations being represented. Each procedure has its own assumptions, but there are two assumptions common to all parametric procedures: (1) The population of dependent scores should be at least approximately normally distributed. (2) The scores should be interval or ratio scores. So, parametric procedures are used when it is appropriate to compute the mean.

The other category is nonparametric statistics, which are inferential procedures that do not require stringent assumptions about the populations being represented. These procedures are used with nominal or ordinal scores or with skewed interval or ratio distributions. So, nonparametric procedures are used when we compute the median or the mode.

parametric statistics: Inferential procedures that require certain assumptions about the raw score population represented by the sample; used when we compute the mean.

nonparametric statistics: Inferential procedures that do not require stringent assumptions about the raw score population represented by the sample; used with the median and mode.

experimental hypotheses: Two statements describing the predicted relationship that may or may not be demonstrated by a study.
In this and upcoming chapters we will discuss the most common parametric statistics. (Chapter 13 deals with nonparametrics.) Once we have decided to use parametric procedures, we select a specific procedure depending on the particulars of our experiment's design. However, we set up all inferential procedures in a similar way.

Parametric and nonparametric inferential statistics are for deciding if the data reflect a relationship in nature, or if sampling error is misleading us into thinking there is this relationship.
7-2 SETTING UP
INFERENTIAL PROCEDURES
Researchers perform four steps in an experiment:
They create the experimental hypotheses, design
and conduct the experiment to test the hypotheses,
translate the experimental hypotheses into statistical
hypotheses, and test the statistical hypotheses.
7-2a Creating the Experimental
Hypotheses
An experiment always tests two experimental
hypotheses which describe the possible outcomes
of the study. In general, one hypothesis states we will
demonstrate that the predicted relationship operates in
the population. By “predicted relationship” we mean
that manipulating the independent variable will have
the expected influence on dependent scores. The other
hypothesis states we will not demonstrate the predicted relationship in the population (manipulating
the independent variable will not “work” as expected).
We can predict a relationship in one of two ways.
Sometimes we predict that changing the conditions of
the independent variable will cause dependent scores
to change, even though we are not sure whether the
scores will increase or decrease. This leads to a "two-tailed" test. A two-tailed test is used when we do
not predict the direction in which dependent scores
will change. Thus, we’d have a two-tailed hypothesis
if we thought men and women differ in creativity but
were unsure who would score higher.
The other approach is a one-tailed test. A
one-tailed test is used when we do predict the
direction in which dependent scores will change. We
may predict that scores will only increase, or that they
will only decrease. Thus, we’d have a one-tailed test
if we predicted only that men are more creative than
women. Or, we might predict only that women are
more creative than men.
Let’s first examine a study involving a two-tailed
test. Say we’ve discovered a substance related to intelligence that we will test on humans using an “IQ pill.”
The number of pills a person consumes is our independent variable, and the person’s IQ score is our dependent variable. Say that we believe the pill will affect IQ,
but we are not sure whether it will make people smarter
or dumber. Therefore, we have a two-tailed test.
When you are applying inferential procedures, be
sure to identify the “predicted relationship.” Here we
implicitly predict that the more of the pill participants
consume, the more their IQ scores will change. And
note: We will simplify things by assuming that if IQ
scores change, the only variable that could cause this
change is our pill. (In real research we do not make
this assumption.) Thus, here are our two-tailed experimental hypotheses:
1. We will demonstrate that the pill works by either
increasing or decreasing IQ scores.
2. We will not demonstrate that the pill works,
because IQ scores will not change.
7-2b Designing a One-Sample
Experiment
We could design an experiment to test the IQ pill in
a number of different ways. However, the simplest is
as a one-sample experiment. We will randomly select
one sample of participants and give each person, say,
one pill. Later we’ll give participants an IQ test. The
sample will represent the population of people when
they have taken one pill, and the sample X̄ will represent that population's μ.
However, to demonstrate a relationship, we must
demonstrate that different conditions produce different populations having different μs. Therefore, to perform a one-sample experiment, we must already know
the population mean for participants tested under
another condition of the independent variable. So, we
must compare the population receiving one pill represented by our sample to some other, known population that receives a different amount of the pill. One
population we know about that has received a different amount is the population that has received zero
amount of our pill. Say that our IQ test has been given
to many people over the years who have not taken
the pill, and that this population has a μ of 100. We
will compare this population without the pill to the
population with the pill represented by our sample. If
the population without the pill has a different μ than
the population with the pill, then we have evidence of
a relationship in the population.
7-2c Creating the
Statistical Hypotheses
So that we can apply statistical procedures, we translate the experimental hypotheses into two statistical
hypotheses. Statistical hypotheses describe the population parameters that the sample data represent
if the predicted relationship does or
does not exist. The two statistical
hypotheses are called the alternative
hypothesis and the null hypothesis.
THE ALTERNATIVE HYPOTHESIS It
is easier to create the alternative
hypothesis first, because it corresponds to the experimental hypothesis that the experiment does work as
predicted. The alternative hypothesis describes the population parameters that the sample data represent if the predicted relationship occurs in nature. It is always the hypothesis of a difference: It says that changing the independent variable "works" by producing the predicted difference in scores in the population. You can see our predicted relationship in Figures 7.1 and 7.2.

Figure 7.1 shows the relationship if the pill increases IQ. Without the pill, the population is centered at a score of 100. By giving everyone one pill, however, all scores tend to increase so that the distribution moves to the right, over to the higher scores. We don't know how much scores will increase, but we do know that the μ of the population with the pill will be greater than 100, because 100 is the μ of the population without the pill.

On the other hand, Figure 7.2 shows the relationship if the pill decreases IQ. Here, the distribution with the pill moves to the left, over to the lower scores. We also don't know how much scores might decrease, but we do know that the μ of the population with the pill will be less than 100.

Our alternative hypothesis will communicate all of the above. However, every procedure involves an alternative hypothesis so we use symbols to quickly express it. If the pill works as predicted, then the population with the pill will have a μ that is either greater than or less than 100. In other words, the population mean with the pill will not equal 100. The symbol for the alternative hypothesis is Ha. The symbol for not equal is "≠," so our alternative hypothesis is

$$H_a: \mu \neq 100$$

This proposes that our sample mean represents a μ not equal to 100. Because the μ without the pill is 100, we know that Ha implies there is a relationship in the population. (In a two-tailed test, Ha is always that μ ≠ some value.)

two-tailed test: The type of inferential test used when we do not predict whether dependent scores will increase or decrease.
one-tailed test: The type of inferential test used when we predict that dependent scores will only increase or will only decrease.
statistical hypotheses: Statements that describe the population parameters the sample statistics represent if the predicted relationship exists or does not exist.
alternative hypothesis (Ha): The hypothesis describing the population parameters the sample data represent if the predicted relationship does exist in nature.
null hypothesis (H0): The hypothesis describing the population parameters the sample data represent if the predicted relationship does not exist in nature.

Figure 7.1
Relationship in the Population If the IQ Pill Increases IQ Scores
[Two overlapping normal distributions of IQ scores: the no-pill population centered at μ = 100 and the one-pill population shifted to the right, centered at a μ greater than 100.]
Figure 7.2
Relationship in the Population If the IQ Pill Decreases IQ Scores
[Two overlapping distributions of IQ scores, from low to high: "One pill" shifted to the left, centered at a μ < 100, and "No pill" centered at μ = 100.]
THE NULL HYPOTHESIS The statistical hypothesis corresponding to the experimental hypothesis that the experiment does not work as predicted is called the null hypothesis. The null hypothesis describes the population parameters that the sample data represent if the predicted relationship does not occur in nature. It is always the hypothesis of no difference: It says that changing the independent variable does not "work" because it does not produce the predicted difference in scores in the population.

If the IQ pill does not work, then it would be as if the pill were not present. The population of IQ scores without the pill has a μ of 100. Therefore, if the pill does not work, then the population of scores will be unchanged and μ will still be 100. Accordingly, if we measured the population with and without the pill, we would have one population of scores, located at the μ of 100, as shown in Figure 7.3.

Figure 7.3
Population of Scores If the IQ Pill Does Not Affect IQ Scores
[A single distribution of IQ scores, from low to high, centered at μ = 100 with or without the pill.]

Our null hypothesis will communicate the above, but we again express it using symbols. The symbol for the null hypothesis is H0. (The subscript is 0 because null means zero, as in zero relationship.) The null hypothesis for the IQ pill study is

H0: μ = 100

This proposes that our sample with the pill represents the population where μ is 100. Because this is the same population found without the pill, we know that H0 implies the predicted relationship does not occur in nature. (In a two-tailed test, H0 is always that μ = some value.)

> Quick Practice

> The null hypothesis shows the value of μ that our X̄ represents if the predicted relationship does not exist.
> The alternative hypothesis shows the value of μ that our X̄ represents if the predicted relationship does exist.
More Examples

In an experiment, we compare a sample of men to the population of women, who have a μ of 75. We predict simply that men are different from women, so this is a two-tailed test. The alternative hypothesis is that our men represent a different population, so their μ is not 75; thus, Ha: μ ≠ 75. The null hypothesis is that men are the same as women, so the men's μ is also 75; thus, H0: μ = 75.
The alternative hypothesis (Ha) always says the sample data represent a μ that reflects the predicted relationship. The null hypothesis (H0) says the sample data represent the μ that's found when the predicted relationship is not present.

For Practice

1. A _____ test is used when we do not predict the direction that scores will change; a _____ test is used when we do predict the direction that scores will change.

2. The _____ hypothesis says the sample data represent a population where the predicted relationship exists. The _____ hypothesis says the sample data represent a population where the predicted relationship does not exist.
3. The μ for adults on a test is 140. We test a sample of children to see if they are different from adults. What are Ha and H0?

4. The μ for days absent among workers is 15.6. We train a sample of new workers and ask whether the training changes worker absenteeism in the population. What are Ha and H0?
> Answers

1. two-tailed; one-tailed
2. alternative; null
3. Ha: μ ≠ 140; H0: μ = 140
4. Ha: μ ≠ 15.6; H0: μ = 15.6
7-2d The Logic of Statistical Hypothesis Testing
The statistical hypotheses for the IQ pill study are H0: μ = 100 and Ha: μ ≠ 100. Remember, these are hypotheses (guesses) about the population that is represented by our sample with the pill. (We have no uncertainty about what happens without the pill; we know the μ there.) Notice that, together, H0 and Ha include all possibilities, so one or the other must be true. We use inferential procedures to test (choose between) these hypotheses, so these procedures are called "statistical hypothesis testing."

Say that we randomly selected 36 people, gave them the pill, measured their IQs, and found their mean was 105. We would like to say this: People who have not taken this pill have a mean IQ of 100, so if the pill did not work, the sample mean should have been 100. Therefore, a mean of 105 suggests that the pill does work, raising IQ scores about 5 points. If the pill does this for the sample, it should do this for the population, so we expect that a population receiving the pill would have a μ of 105. A μ of 105 is "not equal to 100," so our results fit our alternative hypothesis (Ha: μ ≠ 100), and it looks like the pill works. Apparently, if we measured everyone in the population with and without the pill, we would have the two distributions shown previously in Figure 7.1, with the population that received the pill located at the μ of 105.
But hold on! We just assumed that our sample is perfectly representative of the population it represents. But what if we have sampling error? Maybe we obtained a mean of 105 not because the pill works, but because we inaccurately represented the situation where the pill does not work. After all, it is unlikely that any sample is perfectly representative, so even if our sample represents the population where μ is 100, we don't expect our X̄ to equal exactly 100! So, maybe the pill does nothing, but by chance we happened to select participants who already had an above-average IQ. Thus, maybe the null hypothesis is correct: Maybe our sample actually represents the population where μ is 100.
Likewise, in any study we cannot automatically infer that the predicted relationship exists in nature when the sample data show the relationship, because we are still confronted by our two choices:

1. The H0, which implies we are being misled by sampling error: By chance we obtained an unrepresentative sample that produced data that coincidentally fit the predicted relationship. This gives the appearance of the relationship in nature although in reality this relationship does not exist. Therefore, we should not believe that the independent variable influences scores as our sample indicates. (In our example, the μ with the pill is really 100.)

2. The Ha, which implies we are representing the predicted relationship: We obtained sample data that fit the predicted relationship because this relationship operates in nature and it produced our data. Therefore, we can believe that the independent variable influences scores as our sample indicates. (In our example, the μ with the pill is really 105.)
Before we can believe our sample, we must first be sure we are not being misled by sampling error. However, the only way to prove whether H0 is true is to give the pill to everyone in the population and see whether μ is 100 or 105. We cannot do that. We can, however, determine how likely it is that H0 is true. That is, using the procedure discussed in the previous chapter, we will determine the likelihood of obtaining a sample mean of 105 from the population that has a μ of 100. If such a mean is too unlikely, then we will reject that our sample represents this population, rejecting that H0 is the correct hypothesis for our study.

All parametric and nonparametric procedures use the above logic. To select the correct procedure for a particular experiment, you should check that the design and dependent scores fit the "assumptions" of the procedure. The IQ pill study meets the assumptions of the parametric procedure called the z-test.
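This likelihood can be made concrete with a few lines of code. The following is a minimal sketch of ours (the text itself performs no computation here), using the values the next section will introduce (σX = 15, N = 36):

```python
from math import sqrt
from scipy.stats import norm

mu_H0 = 100    # the mu that H0 says our sample represents
sigma_x = 15   # known standard deviation of raw IQ scores (given in Section 7-3)
n = 36         # sample size
x_bar = 105    # obtained sample mean

sem = sigma_x / sqrt(n)   # standard error of the mean: 15/6 = 2.5
# Probability of obtaining a sample mean of 105 or higher when H0 is true:
p = norm.sf(x_bar, loc=mu_H0, scale=sem)
print(f"P(X-bar >= {x_bar} | mu = {mu_H0}) = {p:.4f}")   # about .0228
```

If this probability is small enough, we will reject that our sample represents the no-pill population; the next section formalizes "small enough" as alpha.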
7-3 PERFORMING THE z-TEST

z-test  The parametric procedure used in a single-sample experiment when the standard deviation of the raw score population is known

alpha (α)  The Greek letter that symbolizes the criterion probability

The z-test is the procedure for computing a z-score for a sample mean that we've discussed in previous chapters. The z-test has four assumptions:

1. We have randomly selected one sample.

2. The dependent variable is at least approximately normally distributed in the population and involves an interval or ratio scale.

3. We know the mean of the population of raw scores under another condition of the independent variable.

4. We know the true standard deviation of the population (σX) described by the null hypothesis.

Say that from the research literature, we know that IQ scores meet the requirements of the z-test and that in the population where μ is 100, the standard deviation is 15. Therefore, the next step is to perform the z-test. (Note: SPSS does not perform the z-test.)

7-3a Setting Up the Sampling Distribution for a Two-Tailed Test

To perform the z-test we must create the sampling distribution of means and identify the region of rejection as we did in the previous chapter. The finished sampling distribution is shown in Figure 7.4. To create it we performed the following steps (and added some new symbols).

1. Create the sampling distribution of means from the underlying raw score population that H0 says our sample represents. In our pill study, we want to see the likelihood of getting our sample mean from the population where μ = 100. Thus, it is as if we have created the sampling distribution by infinitely sampling IQ scores from the population of people who have not taken the pill.

2. Identify the μ of the sampling distribution as the value of μ given in the null hypothesis. In our pill study, if we infinitely sample the raw score population where the average score is 100, then the average sample mean will also be 100.

3. Select the alpha. Recall that the criterion is the probability that defines sample means as unlikely to represent the underlying raw score population. The symbol for the criterion is α, the Greek letter alpha. Usually our criterion, our "alpha level," is .05, so in symbols we say α = .05.

4. Locate the region of rejection. Recall that we may use one or both tails of the sampling distribution. Which arrangement to use depends on whether we have a two-tailed or one-tailed test. Above, we created a two-tailed hypothesis, predicting that the pill makes people either smarter or dumber, producing an X̄ that is either larger than 100 or smaller than 100. Therefore, we have a two-tailed test, with part of the region of rejection in each tail.

5. Determine the critical value. We'll abbreviate the critical value of z as zcrit. Recall that with α = .05, the region of rejection in each tail is 2.5% of the distribution. From the z-table, z = 1.96 demarcates this region. Thus, we complete Figure 7.4 by adding that zcrit is ±1.96.

Figure 7.4
Sampling Distribution of IQ Means for a Two-Tailed Test
[The sampling distribution of means, centered at μ = 100 (z = 0), with a region of rejection equal to 2.5% in each tail, marked by the critical values zcrit = −1.96 and zcrit = +1.96.]
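The ±1.96 in Figure 7.4 comes from the z-table, but the same value can be recovered from the standard normal curve. A minimal sketch (ours, not the book's):

```python
from scipy.stats import norm

alpha = 0.05
# Two-tailed test: split alpha into 2.5% in each tail of the sampling distribution.
z_crit = norm.ppf(1 - alpha / 2)
print(f"two-tailed z_crit = ±{z_crit:.2f}")   # ±1.96
```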
7-3b Computing z

Now it's time to compute the z-score for our sample mean. The z-score we compute is "obtained" from the data, so we'll call it z obtained, which we abbreviate as zobt. You know how to compute this from previous chapters.

THE FORMULA FOR THE z-TEST IS

zobt = (X̄ − μ)/σX̄

where

σX̄ = σX/√N

First, we compute the standard error of the mean (σX̄). In the formula, σX is the known standard deviation of the underlying raw score population that H0 says our sample represents. For our IQ pill study, σX is 15 and N is 36, so

σX̄ = σX/√N = 15/√36 = 15/6 = 2.5

Now compute zobt: The value of μ to put in the formula is always the μ of the sampling distribution, which is also the μ of the raw score population that H0 says the sample represents (here μ = 100). The X̄ is computed from the scores in the sample. Then the z-score for our sample mean of 105 is

zobt = (X̄ − μ)/σX̄ = (105 − 100)/2.5 = +5/2.5 = +2

The mean of the sampling distribution always equals the μ of the raw score population that H0 says we are representing.

The final step is to interpret this zobt by comparing it to zcrit.

7-3c Comparing the Obtained z to the Critical Value

The sampling distribution always describes the situation when the null hypothesis is true: Here, it shows all possible means that occur when, as H0 says, our sample comes from the population where μ is 100 (from the situation where the pill does not work). If we are to believe H0, the sampling distribution should show that an X̄ of 105 is relatively frequent and thus likely in this situation. However, Figure 7.5 shows just the opposite.

Figure 7.5
Sampling Distribution of IQ Means
[The same sampling distribution centered at μ = 100, with zcrit = ±1.96; the sample mean of 105 is located at zobt = +2.00, beyond the positive critical value.]

A zobt of +2 tells us that an X̄ of 105 seldom occurs
by chance when we are representing the population where μ is 100. This makes it difficult to believe that our sample mean of 105 occurred by chance from this population: A mean like ours hardly ever occurs in the situation where the pill doesn't work. In fact, because our zobt of +2 is beyond the zcrit of ±1.96, our sample is in the region of rejection. Therefore, we conclude that our sample is so unlikely to represent the population where μ = 100 that we reject that the sample represents this population. In other words, we reject that our results are merely a poor representation of the situation where the pill does not work.
In statistical terms, we say that we have "rejected" the null hypothesis. If we reject H0, then we are left with Ha, and so we "accept Ha." Here, Ha is μ ≠ 100, so we accept that our sample represents a population where μ is not 100.
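The entire two-tailed decision for the pill study reduces to a few lines. This sketch is our translation of the chapter's arithmetic, not code from the text:

```python
from math import sqrt

mu_H0, sigma_x, n, x_bar = 100, 15, 36, 105
z_crit = 1.96                       # two-tailed critical value at alpha = .05

sem = sigma_x / sqrt(n)             # sigma sub X-bar: 15/6 = 2.5
z_obt = (x_bar - mu_H0) / sem       # (105 - 100)/2.5 = +2.00

if abs(z_obt) > z_crit:             # sample mean in the region of rejection?
    print(f"z_obt = {z_obt:+.2f}: reject H0 and accept Ha (significant)")
else:
    print(f"z_obt = {z_obt:+.2f}: retain H0 (nonsignificant)")
```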
If this makes your head spin, it may be because
the logic actually involves a “double negative.”
When our sample falls in the region of rejection, we
say “no” to H0. But H0 says there is no relationship
involving our pill. By rejecting H0 we are saying “no”
to no relationship. This is actually saying, “Yes, there
is a relationship involving our pill,” which is what
Ha says.
Thus, it appears we have evidence of a relationship in nature such that the pill would change the population of IQ scores. In fact, we can be more specific: A sample mean of 105 is most likely to represent the population where μ is 105. So without the pill, the population of IQ scores is at a μ of 100, but with the pill we expect scores will rise to produce a population with a μ at about 105.

REJECTING THE NULL HYPOTHESIS IS SAYING "NO" TO THE IDEA THAT THERE IS NO RELATIONSHIP.

A sample statistic that lies beyond the critical value is in the region of rejection, so we reject H0 and accept Ha.
7-4 INTERPRETING SIGNIFICANT AND NONSIGNIFICANT RESULTS

Once we have made a decision about the statistical hypotheses (H0 and Ha), we then make a decision about the corresponding original experimental hypothesis. When we reject H0 we also reject the experimental hypothesis that our independent variable does not work as predicted. Therefore, we will reject that our pill does not work. By accepting Ha we accept that our pill appears to work.

7-4a Interpreting Significant Results

The way to communicate that we have rejected H0 and accepted Ha is to use the term significant. Significant does not mean important or impressive. Significant indicates that our results are unlikely to occur if the predicted relationship does not exist in the population. Therefore, we imply that the relationship found in the experiment is "believable," representing a relationship found in nature, and that it does not reflect sampling error from the situation where the relationship does not exist.

significant  Describes results that are unlikely to result from sampling error when the predicted relationship does not exist; it indicates rejection of the null hypothesis
Although we accept that a relationship exists, there are three very important restrictions on how far we can go with this claim. First, we never prove that H0 is false. The sampling distribution in Figure 7.5 shows that a mean of 105 does occur once in a while when we are representing the population where μ is 100. Maybe our sample was one of these times. Maybe the pill did not work, and our sample was very unrepresentative of this.

Second, we do not prove that our independent variable caused the scores to change. In our pill study, although we're confident that our sample represents a population with a μ above 100, we have not proven that it was the pill that produced these scores. Some other, hidden variable might have actually caused the higher scores in the sample.

Finally, we do not know the exact μ of the population represented by our sample. In our pill study, assuming that the pill does increase IQ scores, the population μ would probably not be exactly 105. Our sample mean may contain (you guessed it) sampling error! That is, the sample may accurately reflect that the pill increases IQ, but it may not perfectly represent how much the pill increases scores. Therefore, we conclude that the μ produced by our pill would probably be around 105.

Bearing these qualifications in mind, we interpret the X̄ of 105 the way we wanted to several pages back: Apparently, the pill increases IQ scores by about 5 points. But now, because the results are significant, we are confident we are not being misled by sampling error. Instead, we are confident we have evidence of a relationship found in nature. Therefore, after describing this relationship, we return to being behavioral researchers and attempt to explain how nature operates in terms of the variables and behaviors we are studying. Thus, our final step would be to describe how the ingredients in the pill affect intelligence, what brain mechanisms are involved, and so on.

7-4b Interpreting Nonsignificant Results

Let's say that the IQ pill had instead produced a sample mean of 99. Now the z-score for the sample is

zobt = (X̄ − μ)/σX̄ = (99 − 100)/2.5 = −1/2.5 = −.40

As in Figure 7.6, a zobt of −.40 is not beyond the zcrit of ±1.96, so the sample mean does not lie in the region of rejection. This indicates that a mean of 99 is likely when sampling the population where μ = 100. Thus, our sample mean was likely to have occurred if we were representing this population. In other words, our mean of 99 was a mean you'd expect if we were representing the situation where the pill does not work and we had a little sampling error. Therefore, we will not conclude that the pill works. After all, it makes no sense to claim that the pill works if the results were likely to occur without the pill. Thus, our null hypothesis, that our sample represents the population of scores without the pill, is a reasonable hypothesis, so we will not reject it. However, we have not proven that H0 is true; in such situations, we "fail to reject H0" or we "retain H0."

Figure 7.6
Sampling Distribution of IQ Means
[The same sampling distribution centered at μ = 100, with zcrit = ±1.96; the sample mean of 99 is located at zobt = −.40, inside the region between the critical values.]

nonsignificant  Describes results that are likely to result from sampling error when the predicted relationship does not exist; it indicates failure to reject the null hypothesis

To communicate the above we say we have nonsignificant or not significant results. (Don't say "insignificant.") Nonsignificant indicates that the
relationship shown by our sample data was likely to
have occurred through chance sampling error, without
there being a real relationship in nature.
Nonsignificant results do not prove that the
independent variable does not work. We have simply
failed to find convincing evidence that it does work.
The only thing we’re sure of is that sampling error
could have produced our data. Therefore, we still have
two hypotheses that are both viable:
1. H0, that the results do not really represent a
relationship
2. Ha, that the results do represent a relationship
Maybe the pill does not work. Or, maybe the pill
does work, but the sample data do not convincingly
show this. We simply don’t know. Therefore, when you
do not reject H0, do not say anything about whether the
independent variable influences behavior or not. All that
you can say is that you have failed to convincingly demonstrate the predicted relationship in the population.
7-5 SUMMARY OF THE z-TEST

Altogether, the preceding discussion can be summarized as follows. For a one-sample experiment that meets the assumptions of the z-test:

1. Determine the experimental hypotheses and create the statistical hypotheses. Identify the predicted relationship the study may demonstrate, and determine if it is a one- or a two-tailed test. Then H0 describes the μ that the X̄ represents if the predicted relationship does not exist. Ha describes the μ that the X̄ represents if the relationship does exist.

2. Set up the sampling distribution. Select α, locate the region of rejection, and determine the critical value.

3. Compute the X̄ and zobt. First, compute σX̄. Then in the formula for z, the value of μ is the μ of the sampling distribution, which is also the μ of the raw score population that H0 says is being represented.

4. Compare zobt to zcrit. If zobt lies beyond zcrit, then reject H0, accept Ha, and say the results are "significant." Then describe the relationship. If zobt does not lie beyond zcrit, do not reject H0, and say the results are "nonsignificant." Do not draw any conclusions about the relationship.

> Quick Practice

> If zobt lies beyond zcrit, reject H0; the results are significant, and conclude there is evidence for the predicted relationship. Otherwise, the results are not significant, and make no conclusion about the relationship.

More Examples

We test a new technique for teaching reading. Without it, the μ on a reading test is 220, with σX = 15. An N of 25 participants has X̄ = 211.55. Then:

1. With a two-tailed test, H0: μ = 220; Ha: μ ≠ 220.

2. Compute zobt: σX̄ = σX/√N = 15/√25 = 3; zobt = (X̄ − μ)/σX̄ = (211.55 − 220)/3 = −2.817.

3. With α = .05, zcrit is ±1.96, and the sampling distribution is like Figure 7.4.

4. The zobt of −2.817 is beyond the zcrit of −1.96, so the results are significant: The data reflect a relationship, with the μ of the population using the technique at around 211.55, while those not using it remain at μ = 220.

Another reading study produced zobt = −1.83. This zobt is not beyond zcrit, so the results are not significant: Make no conclusion about the influence of the new technique.
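The four summary steps translate directly into one small function. The following is a sketch under the chapter's assumptions (known σX, two-tailed test); the function name and return values are ours:

```python
from math import sqrt
from scipy.stats import norm

def z_test_two_tailed(x_bar, mu_H0, sigma_x, n, alpha=0.05):
    """Steps 2-4 of the summary: set up, compute z_obt, compare to z_crit."""
    z_crit = norm.ppf(1 - alpha / 2)   # step 2: locate the region of rejection
    sem = sigma_x / sqrt(n)            # step 3: standard error of the mean...
    z_obt = (x_bar - mu_H0) / sem      #         ...then z_obt
    return z_obt, z_crit, abs(z_obt) > z_crit   # step 4: compare

# The reading-technique example from More Examples above:
z_obt, z_crit, significant = z_test_two_tailed(211.55, mu_H0=220, sigma_x=15, n=25)
print(f"z_obt = {z_obt:+.3f}, z_crit = ±{z_crit:.2f}, significant = {significant}")
```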
For Practice

We test whether a sample of 36 successful dieters are more or less satisfied with their appearance than is the population of nondieters, where μ = 40 (σX = 12).

1. What are H0 and Ha?

2. The X̄ for dieters is 45. Compute zobt.

3. Set up the sampling distribution.

4. What should we conclude?

> Answers

1. H0: μ = 40; Ha: μ ≠ 40

2. σX̄ = 12/√36 = 2; zobt = (45 − 40)/2 = +2.5

3. With α = .05 the sampling distribution has a region of rejection in each tail, with zcrit = ±1.96 (as in Figure 7.4).

4. The zobt of +2.5 is beyond the zcrit of ±1.96, so the results are significant: The population of dieters is more satisfied (at a μ around 45) than the population of nondieters (at μ = 40).
7-6 THE ONE-TAILED TEST

Recall that a one-tailed test is used when we predict the direction in which scores will change. The statistical hypotheses and sampling distribution are different in a one-tailed test.

A one-tailed null hypothesis always includes that μ equals some value. Test H0 by testing whether the sample data represent the population with that μ.

7-6a The One-Tailed Test for Increasing Scores

Say that we had developed a "smart pill." Then the experimental hypotheses are (1) the pill makes people smarter by increasing IQ, or (2) the pill does not make people smarter. For the statistical hypotheses, start with the alternative hypothesis: People without the pill produce a μ of 100, so if the pill makes them smarter, our sample will represent a population with a μ greater than 100. The symbol for greater than is ">"; therefore, Ha: μ > 100. For the null hypothesis, if the pill does not work as predicted, either it will leave IQ scores unchanged or it will decrease them (making people dumber). Then our sample mean represents a μ either equal to 100 or less than 100. The symbol for less than or equal to is "≤", so H0: μ ≤ 100.

We test H0 by testing whether the sample represents the raw score population in which μ equals 100. This is because our sample mean must be above 100 for us to even begin to conclude that our pill makes people smarter. If we then find that our mean is too high to represent a μ equal to 100, then we automatically reject that it represents a μ less than 100.
Thus, as shown in Figure 7.7, the sampling distribution again shows the means that occur if we are representing a μ = 100 (the situation where the pill does nothing to IQ). We again set α = .05, but the region of rejection is in only one tail of the sampling distribution. You can identify which tail to put the region in by identifying the result you must have to be able to claim that your independent variable works as predicted (to support Ha). To say that the pill makes people smarter, the sample mean must be significant and larger than 100. Means that are significantly larger than 100 are in a region of rejection in the upper tail of the sampling distribution. Then, as in the previous chapter, the region of rejection is 5% of the curve, so zcrit is +1.645.

Say that after testing the pill, we find X̄ = 106.58. The sampling distribution is still based on the population with μ = 100 and σX = 15. Say that N = 36, so

σX̄ = 15/√36 = 2.5

Then

zobt = (106.58 − 100)/2.5 = +2.63
As in Figure 7.7, this zobt is beyond zcrit, so it is in the region of rejection. Therefore, the sample mean is unlikely to represent the population having μ = 100, and it's even less likely to represent a population that has a μ < 100. Therefore, we reject the null hypothesis that μ ≤ 100, and accept the alternative hypothesis that μ > 100. We conclude that the pill produces a significant increase in IQ scores, and estimate that with the pill, μ would equal about 106.58.

If zobt had not been in the region of rejection, we would have retained H0 and had no evidence as to whether the pill makes people smarter or not.

Figure 7.7
Sampling Distribution of IQ Means for a One-Tailed Test of Whether Scores Increase
[The sampling distribution centered at μ = 100, with the region of rejection, equal to 5%, entirely in the upper tail beyond zcrit = +1.645; the obtained zobt = +2.63 lies beyond it.]

Recognize there is a risk to using one-tailed tests. The one-tailed zobt is significant only if it lies beyond zcrit and has the same sign. So, above, our results would not have been significant if the pill had produced very low IQ scores and a very large negative zobt. We would have had no region of rejection in the lower tail; and you cannot move the region after the fact to make the results significant. (After developing a "smart pill," it would make no sense to suddenly say, "Whoops, I meant to call it a dumb pill.") Therefore, use a one-tailed test only when it is the appropriate test of your independent variable, that is, when the variable can "work" only if scores go in one direction. Otherwise, you may miss a relationship that, with a two-tailed test, would be significant.

7-6b The One-Tailed Test for Decreasing Scores

A study where a one-tailed test would be appropriate is if we had created a pill to lower IQ scores. If the pill works, then the sample mean represents a μ less than 100. The symbol for less than is "<", so Ha: μ < 100. However, if the pill does not work, either it will leave scores unchanged, or it will increase scores. Then, the sample mean represents a μ greater than or equal to 100. The symbol for greater than or equal to is "≥", so H0: μ ≥ 100. However, we again test H0 by testing whether μ = 100.

Figure 7.8
Sampling Distribution of IQ Means for a One-Tailed Test of Whether Scores Decrease
[The sampling distribution centered at μ = 100, with the region of rejection, equal to 5%, entirely in the lower tail beyond zcrit = −1.645.]
PERFORM A ONE-TAILED TEST ONLY WHEN IT IS THE APPROPRIATE WAY TO TEST YOUR INDEPENDENT VARIABLE.
For us to conclude that the pill lowers IQ, our sample mean must be significantly less than 100. Therefore, the region of rejection is in the lower tail of the distribution, as in Figure 7.8. With α = .05, zcrit is now −1.645. If the sample produces a negative zobt beyond −1.645 (for example, zobt = −1.69), then we reject the H0 that the sample mean represents a μ equal to or greater than 100, and accept the Ha that the sample represents a μ less than 100. Then we have evidence the pill works. However, if zobt does not fall in the region of rejection (for example, if zobt = −1.25), we do not reject H0, and we have no evidence as to whether the pill works or not.
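A one-tailed version of the test differs only in where the region of rejection sits and in requiring the predicted sign. A sketch of ours for the "smart pill" prediction that scores increase:

```python
from math import sqrt
from scipy.stats import norm

mu_H0, sigma_x, n, x_bar = 100, 15, 36, 106.58
alpha = 0.05

z_crit = norm.ppf(1 - alpha)      # +1.645: the whole 5% is in the upper tail
sem = sigma_x / sqrt(n)           # 2.5
z_obt = (x_bar - mu_H0) / sem     # +2.63

# Significant only if z_obt is beyond z_crit in the predicted direction.
if z_obt > z_crit:
    print(f"z_obt = {z_obt:+.2f}: reject H0 (mu <= 100), accept Ha (mu > 100)")
else:
    print(f"z_obt = {z_obt:+.2f}: retain H0; no evidence either way")
```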
7-7 STATISTICS IN THE RESEARCH LITERATURE: REPORTING THE RESULTS

When reading published reports, you'll often see statements such as "the IQ pill produced a significant difference in IQ scores," or "it had a significant effect on IQ." These are just other ways to say that the results reflect a believable relationship, because they are unlikely to occur through sampling error. However, whether a result is significant depends on the probability used to define "unlikely," so we must also indicate our α. The APA format for reporting a result is to indicate the statistic computed, the obtained value, and α. Thus, for a significant zobt of +2.00, a research report would have

z = +2.00, p < .05

Notice that instead of indicating that α equals .05, we indicate that the probability (p) is less than .05. (We'll discuss the reason for this shortly.) For a nonsignificant zobt of −.40, we would have

z = −.40, p > .05

Notice that with nonsignificant results, the p is greater than .05.

We include p < .05 when reporting a significant result and p > .05 when reporting a nonsignificant result.

> Quick Practice

> Perform a one-tailed test when predicting the direction the scores will change.
> When predicting that X̄ will be higher than μ, the region of rejection is in the upper tail of the sampling distribution. When predicting that X̄ will be lower than μ, the region of rejection is in the lower tail.

More Examples

We predict that learning statistics will increase a student's IQ. Those not learning statistics have μ = 100 and σX = 15. For 25 statistics students, X̄ = 108.6.

1. With a one-tailed test, Ha: μ > 100; H0: μ ≤ 100.

2. Compute zobt: σX̄ = σX/√N = 15/√25 = 3; zobt = (X̄ − μ)/σX̄ = (108.6 − 100)/3 = +2.87.

3. With α = .05, zcrit is +1.645. The sampling distribution is as in Figure 7.7.

4. The zobt of +2.87 is beyond zcrit, so the results are significant: Learning statistics gives a μ around 108.6, while people not learning statistics have μ = 100.

Say that a different mean produced zobt = +1.47. This is not beyond zcrit, so it is not significant. We'd have no evidence whether or not learning statistics raises IQ.

For Practice

You test the effectiveness of a new weight-loss diet.

1. Why is this a one-tailed test?

2. For the population of nondieters, μ = 155. What are Ha and H0?

3. In which tail is the region of rejection?

4. With α = .05, the zobt for the sample of dieters is −1.86. What do you conclude?

> Answers

1. Because a successful diet lowers weight scores

2. Ha: μ < 155 and H0: μ ≥ 155

3. The left-hand tail

4. The zobt is beyond zcrit of −1.645, so it is significant: The μ for dieters will be less than the μ of 155 for nondieters.
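Returning to the reporting format above: producing the report line is mechanical once zobt and α are known. A small sketch of ours (this helper is hypothetical, not part of any APA tooling), assuming a two-tailed test at α = .05:

```python
def report_z(z_obt, z_crit=1.96, alpha=0.05):
    """Return an APA-style result string such as 'z = +2.00, p < .05'."""
    direction = "<" if abs(z_obt) > z_crit else ">"   # significant -> p < alpha
    p_str = f"{alpha:.2f}".lstrip("0")                # .05, per APA style
    return f"z = {z_obt:+.2f}, p {direction} {p_str}"

print(report_z(2.00))    # z = +2.00, p < .05  (significant)
print(report_z(-0.40))   # z = -0.40, p > .05  (nonsignificant)
```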
7-8 ERRORS IN STATISTICAL DECISION MAKING
We have one other issue to consider when performing
hypothesis testing, and it involves potential errors in
our decisions. Regardless of whether we conclude that
the sample does or does not represent the predicted
relationship, we may be wrong.
7-8a Type I Errors

Sometimes the variables we investigate are not related in nature, and so H0 is really true. When we are in this situation, if we obtain data that cause us to reject H0, then we make a Type I error. A Type I error is defined as rejecting H0 when H0 is true. This error occurs because our sample so poorly represents the situation where the independent variable does not work that we are fooled into concluding that the variable does work. For example, say our pill does not make people smarter, but sampling error produces a very high mean that falls in the region of rejection. Then we'll mistakenly conclude that the pill does work.

We never know when we make a Type I error, because we never know whether our variables are related in nature. However, we do know that the theoretical probability of a Type I error equals α. Here's why: When discussing the possibility of Type I errors, it is a given that H0 is true and should not be rejected. So assume that the IQ pill does not work, and that we can obtain our sample only from the underlying raw score population where μ is 100. Assume α = .05. If we repeated our experiment many times in this situation, we'd again have the sampling distribution back in Figure 7.6 showing the different means obtained over the long run when the pill does not work. However, 5% of the time we would obtain extreme sample means that fall in the region of rejection and cause us to reject H0 even though H0 is true. Rejecting H0 when it is true is a Type I error, so over the long run, Type I errors will occur 5% of the time. Therefore, if we happen to be in the situation where H0 is true, the theoretical probability of making a Type I error is .05. (The same is true in a one-tailed test.)
You either will or will not make the correct decision when H0 is true, so the probability of avoiding a Type I error is 1 − α. That is, if 5% of the time our samples are in the region of rejection when H0 is true, then 95% of the time they are not in the region of rejection when H0 is true. Therefore, with α = .05, the theoretical probability is .95 that we will avoid a Type I error by retaining H0 when it is true.
Although the theoretical probability of a Type I error equals α, the actual probability is slightly less than α. This is because the region of rejection includes the critical value, but to reject H0, our zobt must be larger than the critical value. We cannot determine the precise area under the curve for the point located at zcrit, so we can't remove it from our 5%. All we can say is that when α is .05, the region of rejection is slightly less than 5% of the curve. Because the region of rejection is less than α, the probability of a Type I error is also less than α.
Thus, in our previous examples when we rejected H0, the probability that we were making a Type I error was less than .05. That is why the APA format is to report significant results using p < .05. This is code for "the probability of a Type I error is less than .05." On the other hand, we report nonsignificant results using p > .05. This communicates that we did not call a result significant because to do so would require a region of rejection greater than .05 of the curve. But then the probability of a Type I error would be greater than our α of .05, and that's unacceptable.

Type I error  Rejecting the null hypothesis when it is true (that is, saying the independent variable works when it does not)

The reason we never use an α larger than .05 is because then a Type I error is too likely. Instead, we limit the chances of making this
error because it can lead to serious consequences. Researchers can cause enormous damage by claiming, for example, that new drugs, therapies, or surgical techniques work when they actually do not. In fact, sometimes making a Type I error is so dangerous that we want to reduce its probability even further. In that case, we usually set alpha at .01, so the probability of making a Type I error is p < .01. For example, if the smart pill had some dangerous side effects, we would set α = .01 so that we are even less likely to conclude that the pill works when it does not. However, we use the term significant in an all-or-nothing fashion: A result is not "more" significant when α = .01 than when α = .05. If zobt lies anywhere in a region of rejection, the result is significant, period! The only difference is that when α = .01, the probability that we've made a Type I error is smaller.
When H0 is true: Rejecting H0 is a Type I error, and its theoretical probability is α; retaining H0 is avoiding a Type I error, and its probability is 1 − α.
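The claim that the theoretical probability of a Type I error equals α is easy to check by simulation: draw many samples from the no-pill population, so that H0 is true by construction, and count how often the two-tailed test rejects anyway. A sketch of ours:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
mu_H0, sigma_x, n, z_crit = 100, 15, 36, 1.96
trials = 100_000

# Every sample comes from the population where mu = 100, so H0 is true
# and every rejection is, by definition, a Type I error.
means = rng.normal(mu_H0, sigma_x, size=(trials, n)).mean(axis=1)
z_obt = (means - mu_H0) / (sigma_x / np.sqrt(n))
print(f"Type I error rate ≈ {np.mean(np.abs(z_obt) > z_crit):.3f}")  # near .05
```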
7-8b Type II Errors

In addition to Type I errors, we must also be concerned about making a different kind of error. Sometimes the variables we investigate are related in nature and so H0 is really false. When we are in this situation, if we obtain data that cause us to retain H0, then we make a Type II error. A Type II error is defined as retaining H0 when H0 is false (and Ha is true). In other words, we fail to identify that an independent variable really does work. This error occurs because the sample so poorly represents the situation where the independent variable works that we are fooled into concluding that the variable does not work. For example, say our pill does make people smarter but it raises scores by so little that the sample mean is not high enough to fall in the region of rejection. Then we'll mistakenly conclude that the pill does not work.

Type II error  Retaining the null hypothesis when it is false (that is, failing to identify that the independent variable does work as predicted)
Anytime we discuss the possibility of Type II
errors, it is a given that, unknown to us, H0 is false and
should be rejected. Thus, in our examples where we
retained H0 and did not conclude that our pill worked,
it is possible we made a Type II error. (Researchers
can determine the probability of making this error, but
the computations are beyond the introductory level.)
Conversely, if we reject H0 in this situation, then we
avoid a Type II error: We’ve made the correct decision
by concluding that the pill works when it does work.
In a study where the predicted
relationship does exist in nature,
concluding that the relationship
does not exist is a Type II error;
concluding that the relationship
does exist avoids this error.
To help you distinguish between Type I and Type II
errors, remember that the type of error you can
potentially make is determined by your situation—
what nature “says” about whether there is a relationship between your variables. “Type I” is the error
that may occur when the predicted relationship does
not exist in nature. “Type II” is the error that may
occur when the predicted relationship does exist in
nature. Then, whether you actually make the error
depends on whether you disagree with nature in each
situation. Rejecting H0 when nature says there is not
a relationship is a Type I error; retaining H0 when
nature says there is a relationship is a Type II error.
Also, you cannot be in a situation where there is and
is not a relationship at the same time, so if you can
possibly make one error, you cannot make the other
error. Lastly, remember that you might be making a
correct decision. Thus, one of four outcomes is possible in any study:

When H0 is true (there is no relationship):

1. Our data cause us to reject H0, so we make a Type I error.

2. Our data cause us to retain H0, so we avoid making a Type I error.

When H0 is false (the relationship exists):

3. Our data cause us to retain H0, so we make a Type II error.

4. Our data cause us to reject H0, so we avoid making a Type II error.
7-8c Power
We’ve seen that we can minimize the probability of
making Type I errors by selecting a small a. We can
also minimize the probability of making Type II errors,
but in an indirect way—by maximizing the probability that we will avoid them. This probability is called
power. Power is the probability that we will reject H0
when it is false, and correctly conclude that the sample data represent a real relationship. In other words,
power is the probability of not making a Type II error.
Remember that a Type II error occurs when the
predicted relationship does exist in nature, so to avoid
the error we should reject H0 and have significant
results. Therefore, researchers maximize the power of
a study by maximizing the chances that the results
will be significant. Then we are confident that if the
relationship exists in nature, we will not miss it. If
we still end up retaining H0, we are confident that
this is because the relationship is not there. Therefore,
we are confident in our decision to retain H0, and, in
statistical lingo, we are confident we are avoiding a
Type II error.
We have several ways to increase the power of a
study. First, it is better to design a study that employs
parametric procedures, because they are more powerful than nonparametric procedures: Because of its
theoretical basis, a parametric test is more likely to
produce significant results. Second, a one-tailed test
is more powerful than a two-tailed test: A zobt is more
likely to be beyond the one-tailed zcrit of 1.645 than
the two-tailed zcrit of 1.96, so the one-tailed test is
more likely to be significant. Finally, testing a larger N
provides greater power: With more scores in a sample
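Power can be estimated the same way as the Type I error rate: make H0 false by construction and count how often the test correctly rejects. A sketch of ours; the true μ of 105 is an assumption for illustration, and the N effect is visible immediately:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
mu_H0, mu_true, sigma_x, z_crit = 100, 105, 15, 1.96   # H0 is false here

for n in (9, 36, 100):
    means = rng.normal(mu_true, sigma_x, size=(50_000, n)).mean(axis=1)
    z_obt = (means - mu_H0) / (sigma_x / np.sqrt(n))
    power = np.mean(np.abs(z_obt) > z_crit)    # P(reject a false H0)
    print(f"N = {n:3d}: power ≈ {power:.2f}")  # rises as N grows
```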
> Quick Practice

> A Type I error is rejecting a true H0.
> A Type II error is retaining a false H0.
> Power is the probability of not making a Type II error.
More Examples
When H0 is true, there is no relationship: If the data
cause us to reject H0, we make a Type I error. To decrease
the likelihood of this, we keep alpha small. If the data
cause us to retain H0, we avoid this error. When H0 is
false, there is a relationship: If the data cause us to retain
H0, we make a Type II error. If the data cause us to reject
H0, we avoid this error. To increase the likelihood of this,
we increase power.
For Practice

1. Claiming that an independent variable works even though in nature it does not is a _____ error.

2. Failing to conclude that an independent variable works even though in nature it does is a _____ error.

3. If we reject H0, we cannot make a _____ error.

4. If we retain H0, we cannot make a _____ error.

5. To be confident in a decision to retain H0, our power should be _____.

> Answers

1. Type I  2. Type II  3. Type II  4. Type I  5. high
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered questions are in Appendix C.)
1. Why does the possibility of sampling error present
a problem to researchers when inferring a relationship in the population?
2. What are inferential statistics used for?
3. What does α stand for, and what two things does it determine?
4. (a) What are the two major categories of inferential procedures? (b) What characteristics of your
study determine which category is appropriate
for your study? (c) What is a statistical reason for
preferring to design a study where you can use
parametric procedures?
5. A researcher obtains sample data showing that
participants who wear a pedometer tend to exercise more often than those who do not wear one.
What are the two possible statistical explanations
for this result?
6. Some of the children given a new flu vaccine later
develop a neurological disease. Parents claim the
vaccine caused the disease. What are the two
possible statistical explanations for this result?
7. (a) What does H0 stand for and what does it communicate? (b) What does Ha stand for and what
does it communicate?
8. (a) When do you use a one-tailed test? (b) When
do you use a two-tailed test?
9. (a) What does "significant" convey about the results of an experiment? (b) Why is obtaining significant results a goal of behavioral research?

10. In a study we reject H0. Which of the following statements are incorrect and why? (a) "Now we know that H0 is false." (b) "We have proof that our sample mean represents a particular μ." (c) "We have proof that the independent variable causes scores to change as predicted." (d) "It is not possible that the difference between X̄ and μ is due to sampling error." (e) "We have evidence that the predicted relationship does not exist."

11. In a study we retain H0. Which of the following statements are incorrect and why? (a) "We have proof that the independent variable does not cause scores to change as predicted." (b) "We have convincing evidence that the independent variable does not work." (c) "We should conclude that there is no relationship in the population." (d) "We have insignificant results." (e) "We have no information about the relationship in the population." (f) "The independent variable may work, but we might have sampling error in representing this."

12. (a) In plain English, what is the incorrect statement you could make about a relationship when it does not exist in nature? (b) Which statistical hypothesis says the relationship does not exist? (c) What is the incorrect decision you can make with this hypothesis? (d) What is our name for this error? (e) What is the incorrect statement you could make about a relationship when it does exist in nature? (f) How do we make an incorrect decision about H0 when a relationship does exist? (g) What is our name for this error?

13. We ask if the attitudes toward fuel costs of 100 owners of hybrid electric cars (X̄ = 76) are different from those reported in a national survey of owners of non-hybrid cars (μ = 65 and σX = 24). Higher scores indicate a more positive attitude. (a) What is the predicted relationship here? (b) Is this a one- or a two-tailed test? (c) In words, state the H0 and the Ha. (d) Compute zobt. (e) What is zcrit? (f) What do you conclude about attitudes here? (g) Report your results in the correct format.

14. We ask if visual memory ability for a sample of 25 art majors (X̄ = 49) is better than that for a population of engineering majors (μ = 45 and σX = 14). Higher scores indicate a better memory. (a) What is the predicted relationship here? (b) Is this a one- or a two-tailed test? (c) In words, state H0 and Ha. (d) Compute zobt. (e) What is zcrit? (f) What do you conclude about differences in visual memory ability? (g) Report your results using the correct format.

15. (a) In question 13, what is the probability we made a Type I error? What would be the error in terms of the independent and dependent variables? (b) What is the probability we made a Type II error? What would be the error in terms of the independent and dependent variables?

16. (a) In question 14, is it possible we made a Type I error? What would be the error in terms of the independent and dependent variables? (b) Is it possible we made a Type II error? What would be the error in terms of the independent and dependent variables?
17. Using the independent and dependent variables,
describe the experimental hypotheses when we
study: (a) whether the amount of pizza consumed
by college students during finals week increases
relative to the amount consumed during the rest
of the semester, (b) whether having participants
do breathing exercises alters their blood pressure,
(c) whether sensitivity to pain is affected if we
increase participants’ hormone levels, (d) whether
frequency of daydreaming decreases when we
increase the amount of light in the room.
18. For each study in question 17, indicate whether a one- or a two-tailed test should be used and give the H0 and Ha. Assume we have a one-sample experiment and that the μ of the dependent scores is 50 before we change the independent variable.
19. Listening to music while taking a test may be relaxing or distracting. Jonas tests 49 participants while they listen to music, obtaining X̄ = 74.36. The mean of the population taking this test without music is 70 (σX = 12). (a) Is this a one-tailed or a two-tailed test? Why? (b) Using symbols, what are H0 and Ha? (c) Compute zobt. (d) What is zcrit? (e) Does he have evidence of a relationship in the population? If so, describe the relationship.
20. Laura asks whether attending a private school versus a public school leads to higher or lower performance on a test of social skills. A sample of 100 students from a private school produces a mean of 71.30 on the test, and the mean for the population of students from a public school is 75.62 (σX = 28.0). (a) Should she use a one-tailed or a two-tailed test? Why? (b) Using symbols, what are H0 and Ha? (c) Compute zobt. (d) With α = .05, what is zcrit? (e) What should she conclude about this relationship?
21. Melissa measures the self-esteem of a sample of statistics students, predicting that the challenge of this course lowers their self-esteem relative to that of the typical college student, where nationally the μ = 28 and σX = 11.35. She obtains these scores:
44 55 39 17 27 38 36 24 36
(a) Summarize the sample data. (b) Is this a one-tailed or a two-tailed test? Why? (c) What are H0 and Ha? (d) Compute zobt. (e) What is zcrit? (f) What should she conclude about the relationship here? (g) What three possible situations involving these variables might be present in nature?
22. (a) What is power? (b) Why do researchers want
to maximize power? (c) Is power more important
when we reject or when we retain H0? Why?
(d) Why is a one-tailed test more powerful than
a two-tailed test?
23. Arlo claims that with a one-tailed test, the smaller
zcrit makes us more likely to reject H0 even if the
independent variable doesn’t work, so we are
more likely to make a Type I error. Why is he
correct or incorrect?
24. Researcher A finds a significant relationship
between increasing stress level and ability to concentrate. Researcher B repeats this study but finds
a nonsignificant result. Identify the statistical error
that each researcher may have made.
25. Amber says increasing power also makes us more
likely to reject H0 when it is true, making a Type I
error more likely. Why is she correct or incorrect?
Chapter 8
HYPOTHESIS TESTING USING THE ONE-SAMPLE t-TEST
LOOKING BACK
Be sure you understand:
• From Chapter 4, that sX is the estimated population standard deviation, sX² is the estimated population variance, and both involve dividing by N − 1.
• From Chapter 7, the components of hypothesis testing and what significant indicates.

GOING FORWARD
Your goals in this chapter are to learn:
• The difference between the z-test and the t-test.
• How the t-distribution and degrees of freedom are used.
• When and how to perform the t-test.
• What is meant by the confidence interval for μ and how it is computed.

Sections
8-1 Understanding the One-Sample t-Test
8-2 Performing the One-Sample t-Test
8-3 Interpreting the t-Test
8-4 Estimating μ by Computing a Confidence Interval
8-5 Statistics in the Research Literature: Reporting t

The logic of hypothesis testing discussed in the previous chapter is common to all inferential procedures. Therefore, for the remainder of this book, your goal is to learn how slightly different procedures (with different formulas) are applied when we have different research designs. We begin the process in this chapter by introducing the t-test, which is very similar to the previous z-test. The chapter presents (1) the similarities and differences between the z-test and the t-test, (2) when and how to perform the t-test, and (3) a new procedure, called the confidence interval, that is used to more precisely estimate μ.
8-1 UNDERSTANDING THE
ONE-SAMPLE t-TEST
Like the z-test, the t-test is used for significance testing in a one-sample experiment. In fact, the t-test is used more often in behavioral research. This is because the z-test requires that we know the population standard deviation (σX). However, usually researchers do not know such things about the population because they're exploring uncharted areas of behavior. Instead, we must estimate the population variability by using the sample data to compute the unbiased estimators (the N − 1 formulas) of the population's standard deviation or variance. Then we compute something like a z-score for our sample mean, but, because the formula is slightly different, it is called t. The one-sample t-test is the parametric procedure used in a one-sample experiment when the standard deviation of the raw score population is not known.
Here's an example that requires the t-test: Say that a fashion/lifestyle magazine targeted toward "savvy" young women asks readers to complete an online survey (with cool prizes) to measure their level of optimism about the future. The survey uses an interval scale, where 0 is neutral, positive scores indicate optimism, and negative scores indicate pessimism. The magazine reports an average score of 10 (so the μ is 10). Say that we then ask, "How would men score on this survey?" To answer this, we'll perform a one-sample experiment by giving the survey to a comparable sample of men and use their X̄ to estimate the μ for the population of men. Then we can compare the μ for men to the μ of 10 for women. If men score differently from women, then we've found a relationship in which, as gender changes, optimism scores change.

one-sample t-test  The parametric procedure used in a one-sample experiment when the standard deviation of the raw score population is estimated

Magazines don't concern themselves with reporting a standard deviation, so we have a one-sample experiment where we don't know the σX of the raw score population. Therefore, the t-test is appropriate. As usual, we first set up the statistical test.

1. The statistical hypotheses: We're open-minded and look for any kind of difference, so we have a two-tailed test. If men are different from women, then our sample represents a μ for men that will not equal the μ for women of 10, so Ha is μ ≠ 10. If men are not different, then their μ will equal that of women, so H0 is μ = 10.
Use the z-test when σX is known; use the t-test when it is not known.
2. Alpha: We select alpha; .05 sounds good.
3. Check the assumptions: The one-sample t-test requires the following:
   a. You have a one-sample experiment using interval or ratio scores.
   b. The raw score population forms a normal distribution.
   c. The variability of the raw score population is estimated from the sample.

Our study meets these assumptions, so we proceed. For simplicity, say we test 9 men. (For adequate power, you should never collect so few scores.) Say the sample produces a X̄ of 7.78. On the one hand (as in Ha), based on this X̄ we might conclude that the population of men would score around a μ of 7.78. Because women score at a μ of 10, maybe we have demonstrated a relationship between gender and optimism. On the other hand (as in H0), maybe gender is not related to optimism and we are being misled by sampling error: Maybe by chance we selected some exceptionally pessimistic men for our sample, but in the population men are not different from women and so the sample is simply poorly representing the men's population where μ = 10.

To test this null hypothesis, we'll use the same logic we've used previously: H0 says that the men's mean represents the population where μ is 10. We will compute tobt, which will locate our sample on the sampling distribution of means produced when we are sampling from this raw score population. The critical value that marks the region of rejection is tcrit. If tobt is beyond tcrit, our sample mean lies in the region of rejection, so we'll reject the idea that the sample represents the population where μ = 10.

The only novelties here are that tobt is calculated differently than zobt and that tcrit comes from the "t-distribution." Therefore, first we'll see how to compute tobt and then we'll see how to set up the sampling distribution.
8-2 PERFORMING THE
ONE-SAMPLE t-TEST
The computation of tobt consists of three steps that parallel the three steps in the z-test.

The first step in the z-test was to find the true standard deviation (σX) of the raw score population. For the t-test, we can compute the estimated standard deviation (sX) or, as we'll see, the estimated population variance (sX²). Their formulas are shown here:

$$s_X = \sqrt{\frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N-1}} \qquad \text{and} \qquad s_X^2 = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N-1}$$

The second step of the z-test was to compute the standard error of the mean (σX̄), which is the standard deviation of the sampling distribution of means. However, because now we estimate the population variability, we compute the estimated standard error of the mean, which is an estimate of the standard deviation of the sampling distribution of means. The symbol for the estimated standard error of the mean is sX̄. (The s stands for an estimate of the population, and the subscript X̄ indicates it is a population of means.)

estimated standard error of the mean (sX̄)  An estimate of the standard deviation of the sampling distribution of means, used in calculating the one-sample t-test

Previously, we computed the standard error using this formula:

$$\sigma_{\bar X} = \frac{\sigma_X}{\sqrt{N}}$$

Using the estimated population standard deviation produces this very similar formula:

$$s_{\bar X} = \frac{s_X}{\sqrt{N}}$$

You may use this formula to compute sX̄. However, to make your life a little easier and to prepare for formulas in the next chapter, we'll make a small change. Instead of computing the estimated standard deviation as above, we will compute the estimated population variance (sX²). Recall that their difference is that the standard deviation requires first computing the variance and then has the added step of finding its square root. So, using the symbol for the variance, here is how the previous formula computed the estimated standard error:

$$s_{\bar X} = \frac{\sqrt{s_X^2}}{\sqrt{N}}$$

Finding the square root in the numerator gives the standard deviation, and then dividing by the square root of N gives the standard error. However, to avoid all that square rooting, we can replace the two square root signs with one big one, producing this:

THE FORMULA FOR THE ESTIMATED STANDARD ERROR OF THE MEAN IS

$$s_{\bar X} = \sqrt{\frac{s_X^2}{N}}$$

This formula divides the estimated population variance by the N of our sample and then takes the square root.

Finally, the third step in the z-test was to compute zobt using this formula:

$$z_{\text{obt}} = \frac{\bar X - \mu}{\sigma_{\bar X}}$$

Now, using the estimated standard error, we have the very similar final step of computing tobt.

THE FORMULA FOR THE ONE-SAMPLE t-TEST IS

$$t_{\text{obt}} = \frac{\bar X - \mu}{s_{\bar X}}$$

X̄ is the sample mean, μ is the mean of the sampling distribution (which equals the value of μ that H0 says we are representing), and sX̄ is the estimated standard error of the mean computed above.

For example, say that our optimism study yielded the data in Table 8.1.

STEP 1: Compute the X̄ and the estimated variance using the sample data. Here X̄ = 7.78. The sX² equals

$$s_X^2 = \frac{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}{N-1} = \frac{574 - \frac{(70)^2}{9}}{9-1} = 3.695$$
Table 8.1
Optimism Scores of Nine Men

Participant    Score (X)    X²
1              9            81
2              8            64
3              10           100
4              7            49
5              8            64
6              8            64
7              6            36
8              4            16
9              10           100
N = 9          ΣX = 70      ΣX² = 574
               X̄ = 7.78
STEP 2: Compute the estimated standard error of the mean.

$$s_{\bar X} = \sqrt{\frac{s_X^2}{N}} = \sqrt{\frac{3.695}{9}} = \sqrt{.411} = .64$$

STEP 3: Compute tobt.

$$t_{\text{obt}} = \frac{\bar X - \mu}{s_{\bar X}} = \frac{7.78 - 10}{.64} = -3.47$$

Thus, our tobt is −3.47.
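For readers who want to check this arithmetic by computer, here is a minimal sketch of the three steps in Python (assuming NumPy is available; the book itself uses SPSS, so this code and its variable names are purely illustrative):

    import numpy as np

    scores = np.array([9, 8, 10, 7, 8, 8, 6, 4, 10])  # the data in Table 8.1
    mu = 10                        # the mu that H0 says our sample represents
    N = len(scores)

    # STEP 1: sample mean and estimated population variance (the N - 1 formula)
    xbar = scores.mean()           # 7.78
    s2 = scores.var(ddof=1)        # 3.6944..., which the text rounds to 3.695

    # STEP 2: estimated standard error of the mean
    s_xbar = np.sqrt(s2 / N)       # about .64

    # STEP 3: t_obt
    t_obt = (xbar - mu) / s_xbar   # about -3.47
    print(round(xbar, 2), round(s_xbar, 2), round(t_obt, 2))

Running this reproduces the values computed in the three steps above.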
> Quick Practice
> Perform the one-sample t-test in a one-sample experiment when you do not know the population standard deviation.

More Examples
In a study, H0 is that μ = 60. The X̄ = 62, sX² = 25, and N = 36. To compute tobt:

$$s_{\bar X} = \sqrt{\frac{s_X^2}{N}} = \sqrt{\frac{25}{36}} = \sqrt{.694} = .833$$

$$t_{\text{obt}} = \frac{\bar X - \mu}{s_{\bar X}} = \frac{62 - 60}{.833} = \frac{+2}{.833} = +2.40$$

For Practice
In a study, H0 is that μ = 6. The data are 6, 7, 9, 8, 8.
1. To compute tobt, what two statistics are computed first?
2. What do you compute next?
3. Compute the tobt.

> Answers
1. X̄ and sX²
2. sX̄
3. X̄ = 7.6, sX² = 1.30, N = 5; sX̄ = √(1.30/5) = .51; tobt = (7.6 − 6)/.51 = +3.137
8-2a The t-Distribution and df
To evaluate a tobt we must compare it to tcrit, and for that we examine the t-distribution. Think of the t-distribution in the following way. Once again we infinitely draw samples of the same size N from the raw score population described by H0. For each sample we compute the X̄ and its tobt. Then we plot the frequency distribution of the different means, labeling the X axis with tobt as well. Thus, the t-distribution is the distribution of all possible values of t computed for random sample means selected from the raw score population described by H0.
You can envision the t-distribution as in Figure 8.1. As we saw with z-scores, increasing positive values of t are located farther to the right of μ; increasing negative values of t are located farther to the left of μ. As usual, this sampling distribution is still showing the different means that occur when H0 is true. So, if our tobt places our mean close to the center of the distribution, then we have a mean that is frequent and likely when we are representing the population described by H0. (In our example, our sample of men is likely to represent the population where μ is 10.) But, if tobt places our mean far into a tail of the distribution, then we have a mean that hardly ever happens and is very unlikely when we are representing the population described by H0. (Our sample of men is unlikely to represent the population where μ is 10.)

t-distribution  The sampling distribution of all values of t that occur when samples of a particular size are selected from the raw score population described by the null hypothesis

[Figure 8.1  Example of a t-Distribution of Random Sample Means. The X axis shows the sample means, labeled with the corresponding values of t from −3 to +3 and centered on μ.]
As usual, to determine if our mean is far enough
into a tail to be in the region of rejection, we first identify the critical value of t. But we have one important novelty here: The t-distribution does not fit the
perfect standard normal curve (and z-table) the way
our previous sampling distributions did. Instead, there
are actually many versions of the t-distribution, each having
a slightly different
shape. The shape of a
particular distribution
depends on the size of
the samples that are
used when creating
it. When using small
samples, the t-distribution is only roughly
normally distributed.
With larger samples,
Behavioral Sciences STAT2
Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
the t-distribution is a progressively closer approximation to the perfect normal curve. This is because with a larger sample, our estimate of the population variance or standard deviation is closer to the true population variance and standard deviation. As we saw in the z-test, when we have the true population variability the sampling distribution is a normal curve.

The fact that there are differently shaped t-distributions for different sample sizes is important for one reason: Our region of rejection should contain precisely that portion of the curve defined by our α. If α = .05, then we want the critical value to mark off precisely the extreme 5% of the curve as the region of rejection. However, for distributions that are shaped differently, the point that marks the extreme 5% will be at different locations on the X axis of the distribution. Because this point is at the critical value, with differently shaped t-distributions we must use different critical values of t. For example, Figure 8.2 shows two t-distributions. Notice the size of the blue region of rejection in the tails of Distribution A. Say that this is the extreme 5% of Distribution A and has the critical value shown. If we also use this tcrit on Distribution B, the region of rejection is larger, containing more than 5% of the distribution. Conversely, the tcrit marking off 5% of Distribution B will mark off less than 5% of Distribution A. (The same problem exists for a one-tailed test.)

[Figure 8.2  Comparison of Two t-Distributions Based on Different Sample Ns. Both distributions are centered on μ, with values of t from −4 to +4 on the X axis and the critical value for each distribution marked.]

This issue is important because not only is α the size of the region of rejection, it is also the probability of a Type I error (which is rejecting H0 when it is true). Unless we use the appropriate tcrit for a particular t-distribution, the actual probability of a Type I error will not equal our α, and that's not supposed to happen! Thus, the obvious solution is to examine the t-distribution that is created when using the same sample size as in our study. For the particular shape of this distribution we determine the specific value of tcrit. Only then will the region of rejection (and the probability of a Type I error) equal our α.

However, in this context, the size of a sample is not determined by N. Recall that when computing the estimated variance or estimated standard deviation, the final division involves the quantity N − 1. In Chapter 4 we saw that this is the number of scores in a sample that actually reflect the variability in the population. Thus, it is the size of the quantity N − 1 that determines the shape of the t-distribution and our tcrit for a particular study. We have a special name for "N − 1": It is
called the degrees of freedom and is symbolized as df. Thus

degrees of freedom (df)  The number of scores in a sample that reflect the variability in the population; they determine the shape of the sampling distribution when estimating σX

THE FORMULA FOR DEGREES OF FREEDOM IN THE ONE-SAMPLE t-TEST IS

df = N − 1

where N is the number of scores in the sample. In our optimism study, N = 9 so df = 8.
The larger the df, the closer the t-distribution comes to forming a normal curve. However, a tremendously large sample is not required to produce a normal t-distribution. When df is greater than 120, the t-distribution is virtually identical to the standard normal z-distribution. But when df is between 1 and 120 (which is often the case in research), a differently shaped t-distribution will occur for each df. Therefore, a different tcrit is required for each df.

Thus, you will no longer automatically use the critical values of 1.96 or 1.645 as you did in previous chapters. Instead, when your df is between 1 and 120, use the df to identify the appropriate sampling distribution for your study. The tcrit on that distribution will accurately mark off the region of rejection so that the probability of a Type I error equals your α. So, in the optimism study with an N of 9, we will use the tcrit from the t-distribution for df = 8. In a different study, however, where N might be 25, we would use the different tcrit from the t-distribution for df = 24. And so on.

The appropriate tcrit for the one-sample t-test comes from the t-distribution that has df equal to N − 1.
8-2b Using the t-Table

We obtain the different values of tcrit from Table 2 in Appendix B, titled "Critical Values of t." In this "t-table" you'll find separate tables for two-tailed and one-tailed tests. Table 8.2 contains a portion of the two-tailed table.

To find the appropriate tcrit, first locate the appropriate column for your α level (either .05 or .01). Then find the
Table 8.2
A Portion of the t-Table

            Alpha Level
df      α = .05     α = .01
1       12.706      63.657
2        4.303       9.925
3        3.182       5.841
4        2.776       4.604
5        2.571       4.032
6        2.447       3.707
7        2.365       3.499
8        2.306       3.355
value of tcrit in the row at the df for your sample. For example, in the optimism study, df is 8. For a two-tailed test with α = .05 and df = 8, tcrit is 2.306.

In a different study, say the sample N is 61. Therefore, the df = N − 1 = 60. Look in Table 2 of Appendix B to find tcrit. With α = .05, the two-tailed tcrit = 2.000; the one-tailed tcrit = 1.671.
The table contains no positive or negative signs. In a two-tailed test you add the "±", and in a one-tailed test you supply the appropriate "+" or "−".
Finally, the t-tables contain critical values for only
some values of df. When the df of your sample does
not appear in the table, select the two dfs that bracket
above and below your df and use their values of tcrit
(e.g., if your df = 65, use the tcrit at dfs of 60 and 120).
This gives you a larger and a smaller tcrit, with your
actual tcrit in between. Then:
1. If your tobt is larger than the larger tcrit, then
your results are significant. If you are beyond a
number larger than your actual tcrit, then you are
automatically beyond the actual tcrit.
2. If your tobt is smaller than the smaller tcrit, then
your results are not significant. If you are not
beyond a number smaller than your actual tcrit,
then you won’t be beyond the actual tcrit.
Rarely, tobt will fall between the two critical values: Then either perform the t-test using SPSS, or consult an advanced book to use “linear interpolation” to
compute the precise tcrit.
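Alternatively, statistical software can compute the exact tcrit for any df, so no bracketing or interpolation is needed. As a hedged illustration (this sketch uses SciPy's t distribution, our own tooling choice rather than anything the book prescribes), the ppf (inverse-CDF) method returns critical values directly:

    from scipy import stats

    alpha, df = .05, 8
    t_crit_two = stats.t.ppf(1 - alpha / 2, df)  # 2.306: alpha/2 in each tail
    t_crit_one = stats.t.ppf(1 - alpha, df)      # 1.860: all of alpha in one tail

    # A df missing from the table is no problem, e.g., df = 65:
    print(stats.t.ppf(1 - .05 / 2, 65))          # about 1.997, between 2.000 and 1.980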
8-3 INTERPRETING
THE t-TEST
Once you've calculated tobt and identified tcrit, you can make a decision about your results. Remember our optimism study? We must decide whether the men's mean of 7.78 represents the same population of scores that women have, where μ is 10. Our tobt is −3.47, and the two-tailed tcrit is ±2.306. Thus, we can envision the sampling distribution in Figure 8.3. Remember, this shows all means that occur by chance when H0 is true (here, when our sample represents the population where μ is 10). But our tobt lies beyond tcrit, so the results are significant: Our X̄ is so unlikely to occur if we were representing the population where μ is 10 that we reject that this is the population our sample represents. So, we reject H0 and accept Ha. With α = .05, the probability is p < .05 that we've made a Type I error (by incorrectly rejecting H0). We interpret these results using the same rules and cautions discussed in the previous chapter.
But remember: Finding a significant result is not the end of the story. First we describe the relationship we've demonstrated. With a sample mean of 7.78, our best estimate is that the μ for men is around 7.78. Because women have a different μ, at 10, we conclude that our results demonstrate a relationship in the population where, as we change from men to women, optimism scores change from a μ around 7.78 to a μ around 10. Finally, we return to being researchers and interpret the relationship in psychological or sociological terms: Why are women more optimistic than men? Are there social/cultural reasons or perhaps physiological reasons? Or instead, do men only act more pessimistic as part of a masculinity issue? And so on.

If tobt had not fallen beyond tcrit (for example, if tobt = +1.32), then it would not lie in the region of rejection and would not be significant. Then we would consider whether we had sufficient power to avoid making a Type II error (incorrectly retaining H0 and missing the relationship). And, we would apply the rules for interpreting nonsignificant results as discussed in the previous chapter, concluding that we have no evidence, one way or the other, regarding a relationship between gender and optimism scores.
8-3a Performing One-Tailed Tests
As usual, we perform one-tailed tests only when we
can confidently predict the direction of the relationship. If we had a reason to predict that men score
higher than women (who have a m of 10), then Ha
would be that the sample represents a population with
m greater than 10 (Ha: m ⬎ 10). Our H0 is always that
our predictions are wrong, so here it would be that
the sample represents a population with a m less than
or equal to 10 (H0: m ⱕ 10). We compute tobt as shown
previously, but we find the one-tailed tcrit from the
t-table for our df and a. To decide which tail of the
sampling distribution to put the region of rejection in,
determine what’s needed to support Ha. For our sample to represent a population of higher scores, the X
must be greater than 10 and be significant. As shown
[Figure 8.3  Two-Tailed t-Distribution for df = 8 When H0 Is True and μ = 10. The X axis shows the means and values of t from −3 to +3; the regions of rejection lie beyond tcrit = ±2.306, and our X̄ = 7.78 produces tobt = −3.47, beyond the lower tcrit.]
[Figure 8.4  H0 Sampling Distribution of t for a One-Tailed Test. Left: for H0: μ ≤ 10 and Ha: μ > 10, the .05 region of rejection is in the upper tail, beyond +tcrit. Right: for H0: μ ≥ 10 and Ha: μ < 10, the .05 region of rejection is in the lower tail, beyond −tcrit.]
As shown in the left-hand sampling distribution in Figure 8.4, such means are in the upper tail, so tcrit is positive.
On the other hand, say we had predicted that men score lower than women. Now Ha is that μ is less than 10, and H0 is that μ is greater than or equal to 10. For our sample to represent a population of lower scores, the X̄ must be less than 10 and be significant. As shown in the right-hand sampling distribution in Figure 8.4, such means are in the lower tail, so tcrit is negative.

In either example, if tobt is in the region of rejection, then the X̄ is unlikely to represent a μ described by H0. Therefore, reject H0, accept Ha, and describe the results as significant. If tobt is not in the region of rejection, then the results are not significant.
8-3b Summary of the One-Sample t-Test
The one-sample t-test is used with a one-sample experiment involving normally distributed interval or ratio scores when the variability in the population is not known. Then

1. Create either the two-tailed or one-tailed H0 and Ha.
2. Compute tobt:
   a. Compute X̄ and sX².
   b. Compute sX̄.
   c. Compute tobt.
3. Envision the sampling t-distribution and use df = N − 1 to find tcrit in the t-table.
4. Compare tobt to tcrit. If tobt is beyond tcrit, the results are significant; describe the populations and interpret the relationship. If tobt is not beyond tcrit, the results are not significant; make no conclusion about the relationship.

> Quick Practice
> Perform the one-sample t-test when σX is unknown.

More Examples
In a study, μ is 40. We predict our condition will change scores relative to this μ. This is a two-tailed test, so H0: μ = 40; Ha: μ ≠ 40. Then compute tobt. The 25 scores produce X̄ = 46 and sX² = 196. We compute tobt to be +2.14. Next, we find tcrit: With α = .05 and df = 24, tcrit = ±2.064. The tobt lies beyond the tcrit. Conclusion: The independent variable significantly increases scores from a μ of 40 to a μ around 46.

For Practice
We test if artificial sunlight during the winter months lowers one's depression. Without the light, a depression test has μ = 8. With the light, our sample with N = 41 produced X̄ = 6. The tobt = −1.83.
1. What are the hypotheses?
2. What is tcrit?
3. What is the conclusion?
4. If N had been 50, would the results be significant?

> Answers
1. To "lower" is a one-tailed test: Ha: μ < 8; H0: μ ≥ 8.
2. With α = .05 and df = 40, tcrit = −1.684.
3. tobt is beyond tcrit. Conclusion: Artificial sunlight significantly lowers depression scores from a μ of 8 to a μ around 6.
4. Yes; the −tcrit would be between −1.684 and −1.671.
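The four summary steps can also be collapsed into one small routine. The following is only a sketch (Python with NumPy and SciPy; the function name and arguments are our own invention, not anything from the book or SPSS), using the "More Examples" study above to show the flow:

    import numpy as np
    from scipy import stats

    def one_sample_t(xbar, s2, n, mu, alpha=.05):
        """Steps 2-4 of the summary, for a two-tailed test."""
        s_xbar = np.sqrt(s2 / n)                    # step 2b: estimated standard error
        t_obt = (xbar - mu) / s_xbar                # step 2c
        t_crit = stats.t.ppf(1 - alpha / 2, n - 1)  # step 3: df = N - 1
        return t_obt, t_crit, abs(t_obt) > t_crit   # step 4: True means significant

    # The "More Examples" study: mu = 40, N = 25 scores, xbar = 46, s2 = 196
    print(one_sample_t(46, 196, 25, 40))            # about (+2.14, 2.064, True)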
8-4 ESTIMATING μ BY COMPUTING A CONFIDENCE INTERVAL
As you've seen, after rejecting H0, we estimate the population μ that the sample mean represents. There are two ways to estimate μ.

The first way is point estimation, in which we describe a point on the dependent variable at which the population μ is expected to fall. We base this estimate on our sample mean. Earlier, for example, we estimated that the μ of the men's population is located on the optimism variable at the point equal to our men's sample mean of 7.78. However, the problem with point estimation is that it is extremely vulnerable to sampling error. Our sample of men probably does not perfectly represent the population of men, so if we actually tested the entire population, μ probably would not be exactly 7.78. This is why we have been saying that the μ for men is around 7.78.

The other, better way to estimate a μ is to include the possibility of sampling error and perform interval estimation. With interval estimation, we specify a range of values within which we expect the population parameter to fall.
One way that you often encounter such intervals in real life is when you hear of a result accompanied by "plus or minus" some amount. This is called the margin of error. For example, during an election you may hear that a survey showed that 45% of the voters support a particular candidate, with a margin of error of ±3%. The survey would involve a sample, however, so it may contain sampling error in representing the population. The margin of error defines this sampling error by indicating that, if we could ask the entire population, we expect the result would be within ±3 of 45%. That is, we would expect the actual percent of the population that supports the candidate to be inside the interval that is between 42% and 48%. Thus, the margin of error describes an interval by describing a central value, with plus or minus some amount.
In behavioral research we perform interval estimation in a similar way by creating a confidence interval. Confidence intervals can be used to describe various population parameters, but the most common is for estimating μ. The confidence interval for μ describes the interval within which we are confident that a population μ falls. So, in our optimism study, instead of merely saying that our sample of men represents a μ somewhere around 7.78, we can use a confidence interval to define "around." To do so, we'll identify the values of μ that the sample mean is likely to represent. You can visualize this as shown here:

μlow . . . μ μ μ μ 7.78 μ μ μ μ . . . μhigh
(values of μ, one of which is likely to be represented by our sample mean)

The μlow is the lowest μ that our sample mean is likely to represent, and μhigh is the highest μ that the mean is likely to represent. When we compute these two values, we have the confidence interval because we are confident that the μ being represented by our sample falls between them.

point estimation  A way to estimate a population parameter by describing a point on the variable at which the population parameter is expected to fall
interval estimation  A way to estimate a population parameter by describing an interval within which the population parameter is expected to fall
margin of error  Describes an interval by describing a central value, with plus or minus some amount
confidence interval for μ  A range of values of μ within which we are confident that the actual μ is found

When is a sample mean likely to represent a particular μ? It depends on sampling error. For example, intuitively we know that sampling error is unlikely to produce a sample mean of 7.78 if μ is, say, 500: A sample "couldn't" be that unrepresentative. In other words, 7.78 is significantly different from 500. But
sampling error is likely to produce a sample mean of 7.78 if, for example, μ is 8 or 9: That's a believable amount of sampling error. In other words, 7.78 is not significantly different from these μs. Thus, a sample mean is likely to represent any μ that the mean is not significantly different from. The logic behind a confidence interval is to compute the highest and lowest values of μ that are not significantly different from our sample mean. All μs between these two values are also not significantly different from the sample mean, so the mean is likely to represent one of them. In other words, the μ being represented by our sample mean is likely to fall within this interval.
8-4a Computing the Confidence Interval

The t-test forms the basis for the confidence interval, and it works like this. We seek the highest μ above our sample mean that is not significantly different from our mean and the lowest μ below our sample mean that is not significantly different from our mean. The most that a μ and sample mean can differ and still not be significant is when they produce a tobt that equals tcrit. We can state this using the formula for the t-test:

$$t_{\text{crit}} = \frac{\bar X - \mu}{s_{\bar X}}$$

Now we simply need to rearrange this formula to find the value of μ that, along with our X̄ and sX̄, produces an answer equal to the tcrit for our study. However, we want to do this twice, once describing the highest μ above our X̄ and once describing the lowest μ below our X̄. Therefore we always use the two-tailed value of tcrit. Then we find the μ that produces a −tobt equal to −tcrit, and we find the μ that produces a +tobt equal to +tcrit. Luckily, we can combine all of these steps into this one formula.

THE FORMULA FOR THE CONFIDENCE INTERVAL FOR μ IS

$$(s_{\bar X})(-t_{\text{crit}}) + \bar X \;\le\; \mu \;\le\; (s_{\bar X})(+t_{\text{crit}}) + \bar X$$

The μ in the formula stands for our μ that we are estimating. The components to the left of μ will produce μlow, so we are confident our μ is greater than or equal to this value. The components to the right of μ will produce μhigh, so we are confident that our μ is less than or equal to this μ. In the formula, the X̄ and sX̄ are from your data. Find the two-tailed value of tcrit in the t-table at your α for df = N − 1, where N is the sample N.

A confidence interval describes the values of μ that are not significantly different from our sample mean, so it is likely our mean represents one of them.

We compute a confidence interval only after finding a significant tobt. This is because we must be sure our sample is not representing the μ described by H0 before we estimate any other μ it might represent. Thus, we determined that our men represent a μ that is different from that of women, so now we can describe that μ.

STEP 1: Find the two-tailed tcrit and fill in the formula. For our optimism study, the two-tailed tcrit for α = .05 and df = 8 is ±2.306. The X̄ = 7.78 and sX̄ = .64. Filling in the formula, we have
(.64)(−2.306) + 7.78 ≤ μ ≤ (.64)(+2.306) + 7.78

STEP 2: Multiply each tcrit times sX̄. After multiplying .64 times −2.306 and +2.306, we have
−1.476 + 7.78 ≤ μ ≤ +1.476 + 7.78

STEP 3: Add the above positive and negative answers to the X̄. After adding ±1.476 to 7.78, we have
6.30 ≤ μ ≤ 9.26

This is the finished confidence interval: We are confident our sample mean represents a μ that is greater than or equal to 6.30, but less than or equal to 9.26. In other words, if we could measure the optimism scores of all men in the population, we expect their μ would be between these two values. (Notice that after Step 2, you have the margin of error, because we expect the μ is 7.78, plus or minus 1.476.)
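As an illustrative check of these three steps, the 95% confidence interval for the optimism data can be computed directly. This is again a Python/NumPy/SciPy sketch under the same assumptions as the earlier code blocks, not the book's own procedure:

    import numpy as np
    from scipy import stats

    scores = np.array([9, 8, 10, 7, 8, 8, 6, 4, 10])   # Table 8.1
    N = len(scores)
    xbar = scores.mean()                                # 7.78
    s_xbar = np.sqrt(scores.var(ddof=1) / N)            # about .64

    t_crit = stats.t.ppf(1 - .05 / 2, N - 1)            # two-tailed, df = 8: 2.306
    mu_low = xbar - t_crit * s_xbar                     # about 6.30
    mu_high = xbar + t_crit * s_xbar                    # about 9.26
    print(round(mu_low, 2), round(mu_high, 2))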
> Quick Practice
> A confidence interval for μ provides a range of μs, any one of which our X̄ is likely to represent.

More Examples
A tobt is significant with X̄ = 50, N = 20, and sX̄ = 4.7. To compute the 95% confidence interval, df = 19, so the two-tailed tcrit = ±2.093. Then,
(sX̄)(−tcrit) + X̄ ≤ μ ≤ (sX̄)(+tcrit) + X̄
(4.7)(−2.093) + 50 ≤ μ ≤ (4.7)(+2.093) + 50
(−9.837) + 50 ≤ μ ≤ (+9.837) + 50
40.16 ≤ μ ≤ 59.84

Use the two-tailed critical value when computing a confidence interval even if you have performed a one-tailed t-test.

For Practice
1. What does this 95% confidence interval indicate: 15 ≤ μ ≤ 20?
2. With N = 22, you perform a one-tailed test (α = .05). What is tcrit for computing the confidence interval?
3. The tobt is significant when X̄ = 35, sX̄ = 3.33, and N = 22. Compute the 95% confidence interval.

> Answers
1. We are 95% confident that our X̄ represents a μ between 15 and 20.
2. With df = 21, the two-tailed tcrit = ±2.080.
3. (3.33)(−2.080) + 35 ≤ μ ≤ (3.33)(+2.080) + 35 = 28.07 ≤ μ ≤ 41.93
Because we created our interval using the tcrit for an α of .05, there is a 5% chance that our μ is outside of this interval. On the other hand, there is a 95% chance that the μ is within the interval. Therefore, we have created what is called the 95% confidence interval: We are 95% confident that the interval between 6.30 and 9.26 contains our μ. Usually this gives us sufficient confidence. However, had we used the tcrit for α = .01, the interval would have spanned a wider range, giving us even more confidence that the interval contained the μ. Then we would have created the 99% confidence interval. Usually, researchers report the 95% confidence interval.

Thus, we conclude our one-sample t-test by saying, with 95% confidence, that our sample of men represents a μ between 6.30 and 9.26. The center of the interval is still at our X̄ of 7.78, but now we have much more information than if we had merely said μ is somewhere around 7.78. Therefore, you should compute a confidence interval anytime you are describing the μ represented by the X̄ in a condition of a significant experiment, or in any type of study in which you believe a X̄ represents a distinct population μ.
Compute a confidence interval to estimate the μ represented by the X̄ of a condition in an experiment.
8-5 STATISTICS IN THE
RESEARCH LITERATURE:
REPORTING t
Report the results of a one- or two-tailed t-test in the same way that you reported the z-test, but also include the df. In our optimism study, we had 8 df, the tobt was −3.47, and with α = .05, the result was significant. We report this as:

t(8) = −3.47, p < .05

Notice the df in parentheses. (Had these results not been significant, then we'd have p > .05.)
Usually, confidence intervals are reported in sentence form, and we always indicate the confidence level used. So you might say, "The 95% confidence interval for the μ of men was 6.30 to 9.26."
Note that researchers usually report the smallest value of α at which a result is significant. For example, it turns out that when α is .01, tcrit is ±3.355, so our tobt of −3.47 also would be significant if we had used the .01 level. Therefore, instead of saying p < .05 above, we would provide more information by reporting that p < .01, because then we know that the probability of a Type I error is not in the neighborhood of .04, .03, or .02.
Further, computer programs like SPSS determine the precise, minimum size of the region of rejection that our tobt falls into, so researchers often report the exact probability of a Type I error. For example, you might see "p = .04." This indicates that tobt falls into a region of rejection that is .04 of the curve, and therefore the probability of a Type I error equals .04. This probability is less than the maximum of .05 that we require, so we conclude this result is significant. On the other hand, say that you see p = .07. Here, a larger region of rejection is needed for the results to be significant. However, with a region of rejection that is .07 of the curve, the probability of a Type I error is now .07. This probability is larger than the maximum of .05 that we require, so we conclude this result is not significant.
USING SPSS

As described on Review Card 8.4, SPSS will perform the one-sample t-test, computing tobt, the X̄, sX, sX̄, and the 95% confidence interval. Also, as described in the previous section, SPSS indicates the smallest region of rejection that our tobt will fall into. This is labeled as "Sig. (2-tailed)" and tells you the smallest two-tailed alpha level at which your tobt can be considered significant.
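If SPSS is not available, the same one-sample t-test and its exact two-tailed p can be obtained in many environments. As one hedged example (SciPy is our choice of tool here, not the book's), ttest_1samp reproduces the optimism result:

    import numpy as np
    from scipy import stats

    scores = np.array([9, 8, 10, 7, 8, 8, 6, 4, 10])   # Table 8.1
    t_obt, p_two_tailed = stats.ttest_1samp(scores, popmean=10)
    print(round(t_obt, 2), round(p_two_tailed, 3))      # about -3.47, with p just under .01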
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered questions are in Appendix C.)
1. Using the terms relationship and sampling
error, what are the two explanations for any
experiment’s results?
2. (a) Why do we reject H0 when tobt is in the region
of rejection? (b) Why do we retain H0 when tobt is
not in the region of rejection?
3. (a) How must your experiment be designed so that
you can perform the one-sample t-test? (b) What
type of dependent scores should you have? (c) How
do you choose between the t-test and the z-test?
4. (a) What is the difference between sX and σX? (b) How is their use the same?
5. (a) Why don’t we always use 1.96 or 1.645 as
the critical value in the t-test? (b) Why are there
different values of tcrit when samples have different Ns? (c) What must you determine in order to
find tcrit in a particular study? (d) What does df equal in the one-sample t-test?
6. With α = .05 in a two-tailed test: (a) What is tcrit when N is 25? (b) What is tcrit when N is 61? (c) Why does tcrit decrease as N increases?
7. What is the final step when dealing with significant results in any study?
8. (a) What does a confidence interval for μ indicate? (b) When do we compute a confidence interval?
9. We conduct a one-sample experiment to determine if performing 15 minutes of meditation daily reduces an individual's stress levels: X̄ = 73.2. In the population of people who don't meditate, stress levels produce μ = 88. With N = 31 we compute tobt = +2.98. (a) Is this a one- or a two-tailed test? (b) What are H0 and Ha? (c) What is tcrit? (d) Are the results significant? (e) Should we compute a confidence interval? (f) To compute the confidence interval, what is tcrit?
10. Say our X̄ = 44. (a) Estimate the μ using point estimation. (b) What additional information does a confidence interval tell you? (c) Why is computing a confidence interval better than using a point estimate?
11. The layout of this textbook is different from that of the typical textbook. We ask whether this is beneficial or detrimental to learning statistics. On a national statistics exam, μ = 68.5 for students using other textbooks. A sample of 10 students using this book has X̄ = 78.5, sX² = 130.5. (a) What are H0 and Ha for this study? (b) Compute tobt. (c) With α = .05, what is tcrit? (d) What do you conclude about the use of this book? (e) Report your results using the correct format. (f) Compute the confidence interval for μ if appropriate.
12. A researcher predicts that smoking cigarettes decreases a person's sense of smell. On a test of olfactory sensitivity, the μ for nonsmokers is 18.4. A sample of 12 people who smoke a pack a day produces X̄ = 16.25, sX² = 4.75. (a) What are H0 and Ha for this study? (b) Compute tobt. (c) What is your tcrit? (d) What should the researcher conclude about this relationship? (e) Report your results using the correct format. (f) Compute the confidence interval for μ if appropriate.
13. Bonita studies if hearing an argument in favor of an issue alters participants' attitudes toward the issue one way or the other. She presents a brief argument to 8 people. In a national survey about this issue, μ = 50. She obtains X̄ = 53.25 and sX² = 69.86. (a) What are H0 and Ha? (b) What is tobt? (c) What is tcrit? (d) Report the results in the correct format and indicate if they are significant. Should she compute the confidence interval for μ? (e) What should Bonita conclude about the relationship?
14. In question 13, (a) what error has Bonita potentially made? (b) With her variables, what would the error be? (c) What statistical principle should she be concerned with and why?
15. We ask whether people who usually use the grammar-checking function in a word-processing program make more or fewer grammatical errors in a hand-written draft. On a national test of students using such programs, the number of errors per page is μ = 12. A sample prohibited from using the program for a semester has these scores:
8, 12, 10, 9, 6, 7
(a) What are H0 and Ha? (b) Perform the t-test and draw the appropriate conclusion. (c) Compute the confidence interval if appropriate.
16. We study the effect of wearing uniforms in middle school on attitudes toward achieving good grades. On a national survey, the average attitude score for students who do not wear uniforms is μ = 79. A sample of 41 students who wear uniforms has scores of X̄ = 83.5, sX² = 159.20. (a) What are H0 and Ha? (b) Perform the procedures to decide about the effect of wearing uniforms. (c) If we measured the population wearing uniforms, what μ do you confidently predict we would find?
17. You read that the starting mean salary in your chosen profession is $46,000 per year, ±$4,000. (a) What is this type of estimate called? (b) Interpret this report.
18. (a) Is the one-tailed or two-tailed tcrit used to compute a confidence interval? (b) Why?
19. Senator Smith has an approval rating of 35%, and Senator Jones has an approval rating of 37%. For both, the margin of error is ±3%. A news report says that, statistically, they are tied. Explain why this is true.
20. SPSS computed two t-tests with the following results. For each, using α = .05, should you conclude the result is significant, and why? (a) p = .03; (b) p = .09
21. While reading research reports, you encounter the following statements. For each, identify the N, the predicted relationship, the outcome, and the possible type of error being made. (a) "When we examined the perceptual skills data (M = 55, SD = 11.44), comparing adolescents to adults produced t(45) = +3.76, p < .01." (b) "The influence of personality type failed to produce a difference in emotionality scores, with t(99) = −1.72, p > .05."
22. In a two-tailed test, α = .05 and N is 35. (a) Is tobt = +2.019 significant? (b) Is tobt = −2.47 significant?
23. Report the results in problem 22 using the correct format.
24. Study A reports results with p = .031. Study B reports results with p < .001. What is the difference between these results in terms of: (a) how significant they are? (b) the size of their critical values? (c) the size of their regions of rejection? (d) the probability of a statistical error?
25. Summarize the steps involved in conducting a one-sample experiment (with a t-test) from beginning to end.
Chapter 9
HYPOTHESIS TESTING USING THE TWO-SAMPLE t-TEST
LOOKING BACK
Be sure you understand:
• From Chapter 1, what a condition, independent variable, and dependent variable are.
• From Chapter 8, how to perform the one-sample t-test using the t-distribution and df, and what a confidence interval is.

GOING FORWARD
Your goals in this chapter are to learn:
• The logic of a two-sample experiment.
• The difference between independent samples and related samples.
• When and how to perform the independent-samples t-test.
• When and how to perform the related-samples t-test.
• What effect size is and how it is measured using Cohen's d or rpb².

Sections
9-1 Understanding the Two-Sample Experiment
9-2 The Independent-Samples t-Test
9-3 Performing the Independent-Samples t-Test
9-4 The Related-Samples t-Test
9-5 Performing the Related-Samples t-Test
9-6 Statistics in the Research Literature: Reporting a Two-Sample Study
9-7 Describing Effect Size

This chapter presents the two-sample t-test, which is the major parametric procedure used when an experiment involves two samples. As the name implies, this test is similar to the one-sample t-test you saw in Chapter 8. However, we have two ways to create a two-sample design, and each requires a different procedure and formulas. So that you don't get confused, view the discussion of each procedure as separate and distinct: a mini-chapter. First we will discuss the t-test for one type of two-sample experiment, called the independent-samples t-test. Then we will discuss the t-test for the other type of two-sample experiment, called the related-samples t-test. Finally, we will discuss a new technique for describing the relationship in either type of experiment, called effect size.
9-1 UNDERSTANDING THE TWO-SAMPLE EXPERIMENT

The one-sample experiment discussed in previous chapters is not often found in real research, because it requires that we know the value of μ for a population under one condition of the independent variable. However, because we explore new behaviors and variables, we usually do not know μ ahead of time. Instead, the much more common approach is to conduct a two-sample experiment, measuring participants' dependent scores under two conditions of the independent variable. Condition 1 produces one sample mean—call it X̄1—that represents μ1, the μ we would find if we tested everyone in the population under Condition 1. Condition 2 produces another sample mean—call it X̄2—that represents μ2, the μ we would find if we tested everyone in the population under Condition 2. A possible outcome from such an experiment is shown in Figure 9.1. If each condition represents a different population, then the experiment has demonstrated a relationship in nature.

However, there's the usual problem of sampling error. Even though we may have different sample means, changing the conditions of the independent variable may not really change the dependent scores in nature. Instead, we might find the same population of scores under each condition, but one or both of our conditions poorly represent this population.

Figure 9.1
Relationship in the Population in a Two-Sample Experiment
As the conditions change, the population tends to change in a consistent fashion.
[Two overlapping frequency distributions of dependent scores, plotted from low scores to high scores: the distribution for Condition 1 is centered at μ1 (with sample mean X̄1), and the distribution for Condition 2 is centered at μ2 (with sample mean X̄2), shifted toward higher scores.]
independent-samples t-test  The parametric procedure used to test sample means from two independent samples.
independent samples  Samples created by selecting each participant for one condition without regard to the participants selected for any other condition.
homogeneity of variance  The requirement that the populations represented in a study have equal variances.

Thus, in Figure 9.1 we might find only the lower or upper distribution, or we might find one in between. Therefore, before we make any conclusions about the experiment, we must determine whether the difference between the sample means reflects sampling error.

The parametric statistical procedure for determining whether the results of a two-sample experiment are significant is the two-sample t-test. However, we have two different ways to create the samples, so we have two different versions of the t-test: One is called the independent-samples t-test, and the other is the related-samples t-test.

The two ways to calculate the two-sample t-test are the independent-samples t-test and the related-samples t-test.

9-2 THE INDEPENDENT-SAMPLES t-TEST

The independent-samples t-test is the parametric procedure for testing two sample means from independent samples. Two samples are independent when we randomly select participants for a condition without regard to who else has been selected for either condition. Then the scores in one sample are not influenced by—are "independent" of—the scores in the other sample. You can recognize independent samples by the absence of things such as matching the participants in one condition with those in the other condition or repeatedly testing the same participants in both conditions.

Here is a study that calls for the independent-samples t-test. We propose that people may recall an event differently when they are hypnotized. To test this, we'll have two groups watch a videotape of a supposed robbery. Later, one group will be hypnotized and then answer 30 questions about the event. The other group will answer the questions without being hypnotized. Thus, the conditions of the independent variable are the presence or absence of hypnosis, and the dependent variable is the amount of information correctly recalled. This design is shown in Table 9.1. We will compute the mean of each condition (each column). If the means differ, we'll have evidence of a relationship where, as amount of hypnosis changes, recall scores also change.

Table 9.1
Diagram of Hypnosis Study Using an Independent-Samples Design
The independent variable is amount of hypnosis, and the dependent variable is recall.

Recall Scores →   No Hypnosis: X X X X X …   Hypnosis: X X X X X …

First we check that our study meets the assumptions of the statistical test. In addition to requiring independent samples, this t-test has two other requirements:
1. The dependent scores are normally distributed interval or ratio scores.
2. Here's a new one: The populations have homogeneous variance.

Homogeneity of variance means that the variances (s²X) of the populations being represented are equal. You can determine if your data meet these assumptions by seeing how other researchers analyze your variables in the research literature.

Note: You are not required to have the same number of participants in each condition, but the samples should not be massively unequal.
9-2a Statistical Hypotheses for the Independent-Samples t-Test

Depending on our experimental hypotheses, we may perform either a one- or two-tailed test. Let's begin with a two-tailed test: We simply predict that the hypnosis condition will produce recall scores that are different from those in the no-hypnosis condition, so our samples represent different populations that have different μs.

First, the alternative hypothesis: The predicted relationship exists if one population mean (μ1) is larger or smaller than the other (μ2). That is, μ1 should not equal μ2. We could state this as Ha: μ1 ≠ μ2, but there is a better way. If the two μs are not equal, then their difference does not equal zero. Thus, the two-tailed alternative hypothesis is

Ha: μ1 − μ2 ≠ 0

Ha implies that the means from our conditions each represent a different population of recall scores, so a relationship is present.

Now, the null hypothesis: If no relationship exists, then if we tested everyone under the two conditions, each time we would find the same population of recall scores having the same μ. In other words, μ1 equals μ2. We could state this as H0: μ1 = μ2, but, again, there is a better way. If the two μs are equal, then their difference is zero. Thus, the two-tailed null hypothesis is

H0: μ1 − μ2 = 0

H0 implies that both sample means represent the same population of recall scores, which has the same μ, so no relationship is present. If our sample means differ, H0 maintains that this is due to sampling error in representing that one μ.

Notice that the above hypotheses do not contain a specific value of μ. Therefore, they are the two-tailed hypotheses for any dependent variable. However, this is true only when you test whether the data represent zero difference between the populations. This is the most common approach and the one we will use. (You can also test for nonzero differences: You might know of an existing difference between two populations and test if the independent variable alters that difference. Consult an advanced statistics book for the details of this test.)

As usual, we test the null hypothesis, and to do that we examine the sampling distribution.

sampling distribution of differences between means  Shows all differences between two means that occur when samples are drawn from the population of scores that H0 says we are representing.
9-2b The Sampling Distribution for the Independent-Samples t-Test

To understand the sampling distribution, let's say that we find a mean recall score of 20 in the no-hypnosis condition and a mean of 23 in the hypnosis condition. We summarize these results using the difference between our means. Here, changing from no hypnosis to hypnosis results in a difference in mean recall of 3 points. We always test H0 by finding the probability of obtaining our results when no relationship is present, so here we will determine the probability of obtaining a difference of 3 between our X̄s when they actually represent zero difference in the population.

Think of the sampling distribution as being created in the following way. We select two random samples from one raw score population, compute the means, and arbitrarily subtract one from the other. (Essentially, this is what H0 implies that we did in our study, coincidentally labeling the sample containing higher scores as "Hypnosis.") When we arbitrarily subtract one mean from the other, the result is the difference between the means, symbolized by X̄1 − X̄2. If we do this an infinite number of times and plot the frequency distribution, we have the sampling distribution of differences between means. This is the distribution of all possible differences between two means when both samples come from the one raw score population that H0 says we are representing. You can envision this sampling distribution as in Figure 9.2.
The mean of the sampling distribution is zero because, most often, both sample means will equal the μ of the raw score population, so their difference will be zero. Sometimes, however, because of sampling error, both sample means will not equal μ or each other. Depending on whether X̄1 or X̄2 is larger, the difference will be positive or negative. In Figure 9.2, the negative differences are in the left half of the distribution, and the positive differences are in the right half. Small negative or positive differences occur frequently, but larger ones do not.

Figure 9.2
Sampling Distribution of Differences between Means When H0: μ1 − μ2 = 0
The X axis has two labels: Each X̄1 − X̄2 symbolizes the difference between two sample means; when labeled as t, a larger ±t indicates a larger difference between means that is less likely when H0 is true.
[The distribution is centered at μ = 0, with t values running from −3.0 to +3.0; larger negative differences lie to the left and larger positive differences to the right.]

To test H0, we compute a new version of tobt to determine where on this sampling distribution the difference between our sample means lies. As in Figure 9.2, the larger the value of ±tobt, the farther into the tail of the distribution our difference lies, so the less likely it is to occur when H0 is true (when really there is no relationship).

The independent-samples t-test determines the probability of obtaining our difference between X̄s when H0 is true.

9-3 PERFORMING THE INDEPENDENT-SAMPLES t-TEST

Before computing tobt we must first expand our set of symbols. Previously, N has been the number of scores in a sample, but actually N is the total number of scores in a study. (Until now we've had only one sample, so N was both.) Now, with two conditions, we will use the lowercase n to stand for the number of scores in each sample. Thus, n1 is the number of scores in Condition 1, and n2 is the number of scores in Condition 2. (Adding the ns together equals N.) Likewise, we will compute an estimated population variance for each condition, using the symbols s²1 for the variance in Condition 1 and s²2 for the variance in Condition 2.

N stands for the total number of scores in an experiment; n stands for the number of scores in a condition.

Arbitrarily decide which condition will be Condition 1 and which will be Condition 2. Then, as in the previous chapter, you compute tobt by performing three steps: (1) estimating the variance of the raw score population, (2) computing the estimated standard error of the sampling distribution, and (3) computing tobt.

STEP 1: Compute the mean and estimated population variance in each condition. Using the scores in Condition 1, compute X̄1 and s²1; using the scores in Condition 2, compute X̄2 and s²2.

THE FORMULA FOR THE ESTIMATED VARIANCE IN A CONDITION IS

s²X = [ΣX² − (ΣX)²/n] / (n − 1)
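If you like to check hand calculations by computer, this formula translates directly into a few lines of Python. The following is our own minimal sketch (the function name and the five scores are invented for illustration; they are not from the text):

```python
def estimated_variance(scores):
    """Estimated population variance: [ΣX² − (ΣX)²/n] / (n − 1)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(estimated_variance([20, 22, 23, 25, 25]))  # 4.5
```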
STEP 2: Compute the pooled variance. Both s²1 and s²2 estimate the population variance, but each may contain sampling error. To obtain the best estimate, we compute an average of the two. Each variance is "weighted" based on the size of its sample. This weighted average is called the pooled variance, and its symbol is s²pool.

pooled variance (s²pool)  The weighted average of the sample variances in a two-sample t-test.

THE FORMULA FOR THE POOLED VARIANCE IS

s²pool = [(n1 − 1)s²1 + (n2 − 1)s²2] / [(n1 − 1) + (n2 − 1)]

This says to determine n1 − 1 and multiply this by the s²1 that you computed above. Likewise, find n2 − 1 and multiply it by s²2. Add the results together and divide by the sum of (n1 − 1) + (n2 − 1).

For example, say that the hypnosis study produced the results shown in Table 9.2.

Table 9.2
Data from the Hypnosis Study

                          Condition 1: Hypnosis   Condition 2: No Hypnosis
Mean recall score         X̄1 = 23                X̄2 = 20
Number of participants    n1 = 17                 n2 = 15
Estimated variance        s²1 = 9.0               s²2 = 7.5

Filling in the formula, we have

s²pool = [(17 − 1)9.0 + (15 − 1)7.5] / [(17 − 1) + (15 − 1)]

After subtracting, we have

s²pool = [(16)9.0 + (14)7.5] / (16 + 14)

In the numerator, 16 times 9 is 144, and 14 times 7.5 is 105. In the denominator, 16 plus 14 is 30, so

s²pool = (144 + 105)/30 = 249/30 = 8.3

The s²pool is our estimate of the variability in the raw score population that H0 says we are representing. As in previous procedures, once we know how spread out the underlying raw score population is, we can determine how spread out the sampling distribution is by computing the standard error.

STEP 3: Compute the standard error of the difference. The standard error of the difference is the standard deviation of the sampling distribution of differences between means (of the distribution back in Figure 9.2). The symbol for the standard error of the difference is sX̄1−X̄2.

standard error of the difference (sX̄1−X̄2)  The estimated standard deviation of the sampling distribution of differences between the means.

In previous chapters we computed the standard error by dividing the variance by N and then taking the square root. However, instead of dividing by N we can multiply by 1/N. Then for the two-sample t-test, we substitute the pooled variance and our two ns, producing this formula:

THE FORMULA FOR THE STANDARD ERROR OF THE DIFFERENCE IS

sX̄1−X̄2 = √[s²pool(1/n1 + 1/n2)]

To compute sX̄1−X̄2, first reduce the fractions 1/n1 and 1/n2 to decimals. Then add them together. Then multiply the sum by s²pool, which you computed in Step 2. Then find the square root.

For the hypnosis study, s²pool is 8.3, n1 is 17, and n2 is 15. Filling in the formula gives

sX̄1−X̄2 = √[8.3(1/17 + 1/15)]
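Under the same caveat (our own sketch, not part of the text), Steps 2 and 3 for the Table 9.2 data can be checked like this:

```python
import math

# Step 2: pooled variance for the hypnosis data in Table 9.2
s2_pool = ((17 - 1) * 9.0 + (15 - 1) * 7.5) / ((17 - 1) + (15 - 1))
print(s2_pool)            # 8.3

# Step 3: standard error of the difference
se_diff = math.sqrt(s2_pool * (1 / 17 + 1 / 15))
print(round(se_diff, 3))  # 1.021 (the text gets 1.023 by rounding 1/17 and 1/15 to .059 and .067)
```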
First, 1/17 is .059 and 1/15 is .067, so

sX̄1−X̄2 = √[8.3(.059 + .067)]

After adding,

sX̄1−X̄2 = √[8.3(.126)] = √1.046 = 1.023

STEP 4: Compute tobt. In previous chapters we've calculated how far the result of our study (X̄) was from the mean of the sampling distribution (μ) when measured in standard error units. Now the "result of our study" is the difference between our two sample means, which we symbolize as (X̄1 − X̄2). The mean of the sampling distribution is the difference between the μs described by H0 and is symbolized by (μ1 − μ2). Finally, our standard error is sX̄1−X̄2. All together, we have

THE FORMULA FOR THE INDEPENDENT-SAMPLES t-TEST IS

tobt = [(X̄1 − X̄2) − (μ1 − μ2)] / sX̄1−X̄2

Here, X̄1 and X̄2 are the sample means, sX̄1−X̄2 is computed in Step 3, and the value of μ1 − μ2 is specified by the null hypothesis. We write H0 as μ1 − μ2 = 0 to indicate that the value of μ1 − μ2 to put into this formula is always zero. Then the formula measures how far our difference between X̄s is from the zero difference between the μs that H0 says we are representing, when measured in standard error units.

For the hypnosis study, our sample means are 23 and 20, the difference between μ1 and μ2 is 0, and sX̄1−X̄2 is 1.023. Putting these values into the formula gives

tobt = [(23 − 20) − 0] / 1.023

After subtracting the means:

tobt = (+3 − 0)/1.023 = +3/1.023 = +2.93

Our tobt is +2.93. Thus, the difference between our sample means is located at something like a z-score of +2.93 on the sampling distribution of differences when both samples represent the same population.

> Quick Practice

To compute the independent-samples tobt:
> Compute X̄1, s²1, and n1; X̄2, s²2, and n2.
> Then compute the pooled variance (s²pool).
> Then compute the standard error of the difference (sX̄1−X̄2).
> Then compute tobt.

More Examples
An independent-samples study produced the following data: X̄1 = 27, s²1 = 36, n1 = 11, X̄2 = 21, s²2 = 33, and n2 = 11.

s²pool = [(n1 − 1)s²1 + (n2 − 1)s²2] / [(n1 − 1) + (n2 − 1)] = [(10)36 + (10)33] / (10 + 10) = 34.5

sX̄1−X̄2 = √[s²pool(1/n1 + 1/n2)] = √[34.5(1/11 + 1/11)] = 2.506

tobt = [(X̄1 − X̄2) − (μ1 − μ2)] / sX̄1−X̄2 = [(27 − 21) − 0] / 2.506 = +2.394

For Practice
We find X̄1 = 33, s²1 = 16, n1 = 21, X̄2 = 27, s²2 = 13, and n2 = 21.
1. Compute the pooled variance (s²pool).
2. Compute the standard error of the difference (sX̄1−X̄2).
3. Compute tobt.

> Answers
1. s²pool = [(20)16 + (20)13] / (20 + 20) = 14.5
2. sX̄1−X̄2 = √[14.5(1/21 + 1/21)] = 1.18
3. tobt = [(33 − 27) − 0] / 1.18 = +5.08
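Putting the four steps together in code is a good way to check your work. This is our own sketch (the function names are ours), computing tobt from the summary statistics; note that carrying full precision gives 2.94 where the text's intermediate rounding gives 2.93:

```python
import math

def pooled_variance(var1, n1, var2, n2):
    """Step 2: weighted average of the two variance estimates."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / ((n1 - 1) + (n2 - 1))

def t_obt_independent(m1, var1, n1, m2, var2, n2):
    s2_pool = pooled_variance(var1, n1, var2, n2)        # Step 2
    se_diff = math.sqrt(s2_pool * (1 / n1 + 1 / n2))     # Step 3
    return ((m1 - m2) - 0) / se_diff                     # Step 4, with H0: μ1 − μ2 = 0

# Hypnosis study (Table 9.2):
print(round(t_obt_independent(23, 9.0, 17, 20, 7.5, 15), 2))  # 2.94 (text: 2.93 after rounding)

# "More Examples" data from the Quick Practice box:
print(round(t_obt_independent(27, 36, 11, 21, 33, 11), 2))    # 2.4 (text: 2.394 after rounding)

# Optional cross-check, assuming SciPy is installed (it takes standard
# deviations, not variances):
# from scipy.stats import ttest_ind_from_stats
# ttest_ind_from_stats(23, math.sqrt(9.0), 17, 20, math.sqrt(7.5), 15)
```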
9-3a Interpreting the Independent-Samples t-Test

To determine if tobt is significant, we compare it to tcrit, which is found in the t-table (Table 2 in Appendix B). We again obtain tcrit using degrees of freedom, but with two samples, the df is computed differently.

THE FORMULA FOR THE DEGREES OF FREEDOM FOR THE INDEPENDENT-SAMPLES t-TEST IS

df = (n1 − 1) + (n2 − 1)

where each n is the number of scores in a condition. Another way of expressing this is df = (n1 + n2) − 2.

For the hypnosis study, n1 = 17 and n2 = 15, so df equals (17 − 1) + (15 − 1), which is 30. With alpha at .05, the two-tailed tcrit is ±2.042.
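Rather than reading tcrit from the t-table, you can compute it, assuming SciPy is available (scipy.stats.t.ppf is the inverse CDF of the t-distribution); this sketch is ours, not part of the text:

```python
from scipy.stats import t

df = (17 - 1) + (15 - 1)          # 30
t_crit = t.ppf(1 - 0.05 / 2, df)  # two-tailed critical value at α = .05
print(round(t_crit, 3))           # 2.042, matching the t-table
```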
Figure 9.3
H0 Sampling Distribution of Differences between Means When μ1 − μ2 = 0
The tobt shows the location of a difference of +3.0.
[The distribution is centered at 0, with t values from −3.0 to +3.0; −tcrit = −2.042 and +tcrit = +2.042 mark the regions of rejection, and +tobt = +2.93 falls beyond +tcrit.]

The complete sampling distribution is in Figure 9.3. It shows all differences between sample means that occur through sampling error when the samples really represent no difference in the population (when hypnosis does not influence recall). Our H0 says that the difference of +3 between our sample means is merely a poor representation of no difference. But the sampling distribution shows that a difference of +3 hardly ever occurs when the samples represent no difference. Therefore, it is difficult to believe that our difference of +3 represents no difference. In fact, the tobt lies beyond tcrit, so the results are significant: Our difference of +3 is so unlikely to occur if our samples are representing no difference in the population that we reject that this is what they represent.

Thus, we reject H0 and accept Ha that our data represent a difference between μs that is not zero. We can summarize this by saying that our difference of +3 is significantly different from 0. Or we can say that our two means differ significantly from each other. The mean for hypnosis (23) is larger than the mean for no hypnosis (20), so we can also conclude that hypnosis leads to significantly higher recall scores. (As usual with α = .05, the probability of a Type I error is p < .05.)

If our tobt had not been beyond tcrit, then this would indicate that our difference between means occurs often when representing the situation where there is no difference in the population. Therefore, we would not reject H0, and we would have no evidence for or against a relationship between hypnosis and recall. Then we would consider if our design had sufficient power so that we'd be confident we had not made a Type II error (retaining a false H0).

Because we did find a significant result, we describe and interpret the relationship. First, from our sample means, we expect the μ for no hypnosis to be around 20 and the μ for hypnosis to be around 23. To precisely describe these μs, we could compute a confidence interval for each μ, using
the formula in the previous chapter and the data from one condition at a time.

Alternatively, we could compute the confidence interval for the difference between two μs. In our study, the difference between our means was +3, so we expect the difference between the population μs for hypnosis and no hypnosis to be around +3. This confidence interval describes a range of differences between μs within which we are confident the actual difference between our μs falls. The computations for this confidence interval are presented in Appendix A.2.

Finally, remember that finding a significant result is not the end of the story. Now we become behavioral researchers again, interpreting and explaining the relationship in terms of our variables: What are the psychological or neurological mechanisms that resulted in improved recall after hypnosis? And note: An important piece of information for interpreting the influence of your independent variable is called the effect size, which is described in the final section of this chapter.

9-3b Performing One-Tailed Tests on Independent Samples

Recall that one-tailed tests are used only when we can confidently predict the direction the dependent scores will change. For example, we could have conducted the hypnosis study using a one-tailed test if we had reason to believe that hypnosis results in higher recall scores than no hypnosis. Everything discussed above applies here, but to prevent confusion, use more meaningful subscripts than 1 and 2. For example, use the subscript h for hypnosis and n for no-hypnosis. Then follow these steps:

1. Decide which X̄ and corresponding μ is expected to be larger. (We think the μ for hypnosis is larger.)
2. Arbitrarily decide which condition to subtract from the other. (We'll subtract no-hypnosis from hypnosis.)
3. Decide whether the difference will be positive or negative. (Subtracting what should be the smaller μn from the larger μh should produce a positive difference, one that's greater than zero.)
4. Create Ha and H0 to match this prediction. (Our Ha is that μh − μn > 0; H0 is that μh − μn ≤ 0.)
5. Locate the region of rejection based on your predictions and subtraction. (We expect a positive difference that is in the right-hand tail of the sampling distribution, so tcrit is positive.)
6. Complete the t-test as we did previously. Be careful to subtract the X̄s in the same way you subtracted the μs! (We used μh − μn, so we'd compute X̄h − X̄n.)

Confusion arises because, while still predicting a larger μh, we could have reversed Ha, saying μn − μh < 0. Subtracting the larger μh from the smaller μn should produce a negative difference, so now the region of rejection is in the left-hand tail, and tcrit is negative.

9-3c Summary of the Independent-Samples t-Test

After checking that the study meets the assumptions, the independent-samples t-test involves the following.

1. Create either the two-tailed or the one-tailed H0 and Ha.
2. Compute tobt by following these four steps.
   a. Compute X̄1, s²1, and n1; X̄2, s²2, and n2.
   b. Compute the pooled variance (s²pool).
   c. Compute the standard error of the difference (sX̄1−X̄2).
   d. Compute tobt.
3. Set up the sampling distribution: Find tcrit in the t-table using df = (n1 − 1) + (n2 − 1).
4. Compare tobt to tcrit. If tobt is beyond tcrit, the results are significant; describe the relationship. If tobt is not beyond tcrit, the results are not significant; make no conclusion about the relationship.
5. If the results are significant, compute the "effect size" as described in Section 9-7.
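For reference, the whole procedure above collapses into one function. This is our own illustrative sketch, not the text's: it starts from raw scores (the two score lists are invented), uses Python's standard library for the n − 1 variance, and uses SciPy only for the critical value:

```python
import math
from statistics import mean, variance  # variance() divides by n − 1, like s²X
from scipy.stats import t

def independent_samples_t_test(scores1, scores2, alpha=0.05):
    """Steps 1-4 plus the two-tailed significance decision."""
    n1, n2 = len(scores1), len(scores2)
    m1, m2 = mean(scores1), mean(scores2)
    s2_1, s2_2 = variance(scores1), variance(scores2)               # Step 1
    s2_pool = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)   # Step 2
    se_diff = math.sqrt(s2_pool * (1 / n1 + 1 / n2))                # Step 3
    t_obt = (m1 - m2) / se_diff                                     # Step 4
    df = n1 + n2 - 2
    t_crit = t.ppf(1 - alpha / 2, df)
    return t_obt, df, abs(t_obt) > t_crit

# Invented scores, just to show the call and decision:
print(independent_samples_t_test([21, 24, 23, 25, 22], [18, 20, 19, 21, 17]))
# (4.0, 8, True) — tobt = +4.0 is beyond tcrit = ±2.306, so significant
```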
> Quick Practice

> Perform the independent-samples t-test in experiments that test two independent samples.

More Examples
We perform a two-tailed experiment, so H0: μ1 − μ2 = 0 and Ha: μ1 − μ2 ≠ 0. We find X̄1 = 24, s²1 = 9, n1 = 14, X̄2 = 21, s²2 = 9.4, and n2 = 16.
(continued)
Then

s²pool = [(n1 − 1)s²1 + (n2 − 1)s²2] / [(n1 − 1) + (n2 − 1)] = [(13)9 + (15)9.4] / (13 + 15) = 9.214

sX̄1−X̄2 = √[s²pool(1/n1 + 1/n2)] = √[9.214(1/14 + 1/16)] = 1.111

tobt = [(X̄1 − X̄2) − (μ1 − μ2)] / sX̄1−X̄2 = [(24 − 21) − 0] / 1.111 = +2.70

With α = .05 and df = (n1 − 1) + (n2 − 1) = 28, tcrit = ±2.048. The tobt is significant: We expect μ1 to be 24 and μ2 to be 21.
For Practice
We test whether "cramming" for an exam is harmful to grades. Condition 1 crams for a pretend exam, but Condition 2 does not. Each n = 31, the cramming X̄ is 43 (s²X = 64), and the no-cramming X̄ is 48 (s²X = 83.6).
1. Subtracting cramming from no cramming, what are H0 and Ha?
2. Will tcrit be positive or negative?
3. Compute tobt.
4. What do you conclude about this relationship?

> Answers
1. Ha: μnc − μc > 0; H0: μnc − μc ≤ 0
2. Positive
3. s²pool = (1920 + 2508)/60 = 73.80; sX̄nc−X̄c = √[73.80(.065)] = 2.190; tobt = (+5)/2.190 = +2.28
4. With α = .05 and df = 60, tcrit = +1.671, so tobt is significant: μc is around 43; μnc is around 48.

9-4 THE RELATED-SAMPLES t-TEST

Now we will discuss the other way to analyze the results of a two-sample experiment. The related-samples t-test is the parametric procedure used with two related samples. Related samples occur when we pair each score in one sample with a particular score in the other sample. Researchers create related samples to have more equivalent and thus comparable samples. The two types of research designs that produce related samples are matched-samples designs and repeated-measures designs.

related-samples t-test  The parametric procedure used for testing sample means from two related samples.
related samples  Samples created by matching each participant in one condition with a participant in the other condition or by repeatedly measuring the same participants under all conditions.
matched-samples design  When each participant in one condition is matched with a participant in the other condition.
repeated-measures design  When the same participants are measured under all conditions of the independent variable.

In a matched-samples design, the researcher matches each participant in one condition with a particular participant in the other condition. The matching is based on a variable relevant to the behavior being studied, but not the independent or dependent variable. For example, say we are studying some aspect of playing basketball, so we might match participants on the variable of their height. We would select pairs of people who are the same height and assign one member of the pair to each condition. Thus, if two people are 6 feet tall, one is assigned to one condition and the other to the other condition. Likewise, a 4-foot person in one condition is matched with a 4-footer in the other condition, and so on. Then, overall, the conditions are comparable in height, so we'd proceed with the experiment. In the same way, we might match participants using their age or physical ability, or we might use naturally occurring pairs, such as roommates or identical twins.

The other way to create related samples is with a repeated-measures design, in which each participant is tested under all conditions of the independent variable. That is, first participants are tested under Condition 1, and then the same participants are tested under Condition 2. Although we have one sample of participants, we have two samples of scores.

Matched-groups and repeated-measures designs are analyzed in the same way, using the related-samples
t-test. (Related samples are also called dependent samples.) Except for requiring related samples, the assumptions for this t-test are the same as for the independent-samples t-test: (1) The dependent variable involves normally distributed interval or ratio scores and (2) the populations being represented have homogeneous variance. Because related samples form pairs of scores, the ns in the two samples must be equal.

9-4a The Logic of the Related-Samples t-Test

Let's say that we have a new therapy to test on arachnophobes—people who are overly frightened by spiders. From the local phobia club we randomly select the unpowerful N of five participants and test our therapy using repeated measures of two conditions: before therapy and after therapy. Before therapy we measure each person's fear response to a picture of a spider, measuring heart rate, perspiration, etc., and compute a "fear" score between 0 (no fear) and 20 (holy terror!). After providing the therapy, we again measure each person's fear response to the picture. (A before-and-after, or pretest/posttest, design such as this always uses the related-samples t-test.)

The left side of Table 9.3 shows the fear scores from the two conditions. First, we compute the mean of each condition. Before therapy the mean fear score is 14.80, and after therapy the mean is 11.20. It looks as if therapy reduces fear scores by an average of 14.80 − 11.20 = 3.6 points.

Table 9.3
Finding the Difference Scores in the Phobia Study
Each D = Before − After

Participant    Before Therapy   After Therapy    D      D²
1 (Millie)     11               8                +3      9
2 (Archie)     16               11               +5     25
3 (Jerome)     20               15               +5     25
4 (Althea)     17               11               +6     36
5 (Leon)       10               11               −1      1
N = 5          X̄ = 14.80       X̄ = 11.20       ΣD = +18; D̄ = +3.6   ΣD² = 96

But, here we go again! On the one hand, maybe we are accurately representing that the therapy works in nature: If we tested all such participants before and after therapy, we would have two populations of fear scores having different μs. On the other hand, maybe we are inaccurately representing that the therapy does nothing to fear scores: If we tested everyone before and after therapy, each time we would find the same population of fear scores with the same μ.

But, if we are testing the same people and our therapy does nothing, why do our samples have different before and after scores that produce different means? Because people are seldom perfectly consistent: Through random psychological and physiological fluctuations, anyone's performance on a task may change from moment to moment. So, perhaps by luck, our participants were having a particularly bad, scary day when they were tested before therapy, but were having a good, not-so-scary day when tested after therapy. This would give the appearance that the therapy reduces fear scores, because one or both measurements are in error when we sample participants' behaviors. So, maybe the therapy does nothing, but we have sampling error in representing this, and we obtained different means simply through the luck of the draw of when we happened to test participants.

To resolve this issue, we perform the t-test. However, for advanced statistical reasons, we cannot directly create a sampling distribution for related samples the way we did for independent samples. Instead, we must first transform the raw scores and then perform the t-test on the transformed scores. As in the right side of Table 9.3, we transform the data by finding the difference between the two fear scores for each participant. Thus, we subtract Millie's after score (8) from her before score (11) for a difference of +3; we subtract Archie's after score (11) from his before score (16) for a difference of +5; and so on. Notice that the symbol for a difference score is D. Here we arbitrarily subtracted after-therapy from before-therapy. You could subtract in the opposite direction; just be sure
to subtract all scores in the same direction. If this were a matched-samples design, we'd subtract the scores in each pair of matched participants.

Next, compute the mean difference, symbolized as D̄. Add the positive and negative differences to find the sum of the differences, symbolized by ΣD. Then divide by N, the number of difference scores. In Table 9.3, D̄ = 18/5 = +3.6: The before scores were, on average, 3.6 points higher than the after scores. Notice this is the same difference we found when we subtracted the means of the original fear scores. (As in the far right-hand column of Table 9.3, later we'll need to square each difference and then find the sum, finding ΣD².)

Finally, here's a surprise: Because now we have one sample mean from one sample of scores, we perform the one-sample t-test! The fact that we have difference scores is irrelevant, so we create the statistical hypotheses and test them in virtually the same way that we did with the one-sample t-test in the previous chapter.

The related-samples t-test is performed by applying the one-sample t-test to the difference scores.
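The transformation is easy to check by computer. Here is our own small Python sketch (variable names are ours) that forms the Ds, ΣD, ΣD², and D̄ for the Table 9.3 data:

```python
before = [11, 16, 20, 17, 10]   # fear scores from Table 9.3
after  = [8, 11, 15, 11, 11]

d_scores = [b - a for b, a in zip(before, after)]   # each D = Before − After
sum_d  = sum(d_scores)                 # ΣD  = +18
sum_d2 = sum(d * d for d in d_scores)  # ΣD² = 96
mean_d = sum_d / len(d_scores)         # D̄  = +3.6
print(d_scores, sum_d, sum_d2, mean_d) # [3, 5, 5, 6, -1] 18 96 3.6
```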
9-4b Statistical Hypotheses for the Related-Samples t-Test

Our sample of difference scores represents the population of difference scores that would result if we measured everyone in the population before therapy and again after therapy, and then computed their difference scores. This population of difference scores has a μ that we identify as μD. To create the statistical hypotheses, we determine the predicted values of μD in H0 and Ha.

mean difference  The mean of the differences between the paired scores in a related-samples t-test; symbolized as D̄ in the sample and μD in the population.

Let's first perform a two-tailed test, predicting that the therapy either raises or lowers fear scores. The H0 always says our independent variable does not work as predicted, so here it is as if we had provided no therapy and simply measured everyone's fear score twice. Ideally, everyone should have the same score on both occasions, so everyone in the population should have a D of zero. However, this will not occur, because of those random fluctuations that cause participants to behave inconsistently. Implicitly, H0 says everyone has good days and bad days so, by chance, each person may exhibit higher or lower fear scores at different times. This produces a variety of Ds that are sometimes positive numbers and sometimes negative numbers, creating the population of different Ds shown in the top portion of Figure 9.4. Notice that larger positive or negative Ds occur less frequently. Because chance produces the positive and negative Ds, over the long run they should balance out so that the average D in this population (μD) is zero. And, because these Ds are generated from the situation where the therapy does not work, this is the population that H0 says our sample of Ds represents. Thus, we have:

H0: μD = 0

This implies that our D̄ represents a μD of 0. If D̄ does not equal 0, it is because of sampling error in representing this population. (Likewise, in a matched-pairs design, each pair of individuals would not perform identically, so H0 would still say we have a population of Ds with μD = 0.)

For the alternative hypothesis, if the therapy alters fear scores in the population, then either the before scores or the after scores will be consistently higher. Then, after subtracting them, the population of Ds will tend to contain only positive or only negative scores. Therefore, μD will be a positive or a negative number and not zero. So, the alternative hypothesis is:

Ha: μD ≠ 0

As usual, we test H0 by examining the sampling distribution, which here is called the sampling distribution of mean differences. It is shown in the bottom portion of Figure 9.4. The underlying raw score population used to create the sampling distribution is the population of Ds in the top portion of Figure 9.4 that H0 says we
are representing. As usual, it is as if we have infinitely sampled this population using our N, and each time computed D̄. Thus, the sampling distribution of mean differences shows all possible values of D̄ that occur when samples are drawn from the population of difference scores that H0 says we are representing. Remember, these D̄s are produced when our two samples of raw scores represent no relationship in the population. So, for the phobia study, the sampling distribution essentially shows all values of D̄ we might get by chance when the therapy does not work. Because larger positive or negative D̄s occur less frequently, the D̄s that are farther into the tails of the distribution are less likely to occur when H0 is true and the therapy does not work.

sampling distribution of mean differences  Shows all possible values of D̄ that occur when samples are drawn from the population of difference scores that H0 says we are representing.

Figure 9.4
Population of Difference Scores Described by H0 and the Resulting Sampling Distribution of Mean Differences
[Top panel: the distribution of difference scores, centered at μD = 0, with larger negative Ds to the left and larger positive Ds to the right. Bottom panel: the resulting distribution of mean differences (D̄s), also centered at μD = 0.]

Notice that the hypotheses H0: μD = 0 and Ha: μD ≠ 0 and the above sampling distribution are appropriate for the two-tailed test for any dependent variable when you test whether there is zero difference between your conditions. This is the most common approach and the one we'll discuss. (You can also test whether you've altered a nonzero difference. Consult an advanced statistics book for details.)

We test H0 by determining where on the above sampling distribution our D̄ is located. To do that, we compute tobt.

9-5 PERFORMING THE RELATED-SAMPLES t-TEST

Computing tobt here is identical to computing the one-sample t-test discussed in Chapter 8—only the symbols have been changed, from X̄ to D̄. There, we first computed the estimated population variance, then the standard error of the mean, and then tobt. We perform the same three steps here.

STEP 1: Compute s²D, which is the estimated variance of the population of difference scores shown on the top in Figure 9.4. Replacing the Xs in our previous formula for variance with Ds gives

THE FORMULA FOR THE ESTIMATED VARIANCE OF THE DIFFERENCE SCORES IS

s²D = [ΣD² − (ΣD)²/N] / (N − 1)

Note: For all computations in this t-test, N equals the number of difference scores.

Using the phobia data from Table 9.3, we have

s²D = [96 − (18)²/5] / (5 − 1) = [96 − 64.8] / 4 = 7.8
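As a check on the arithmetic, the same formula in Python (our own sketch), applied to the five Ds from Table 9.3:

```python
def variance_of_differences(d_scores):
    """s²D = [ΣD² − (ΣD)²/N] / (N − 1)."""
    n = len(d_scores)
    sum_d = sum(d_scores)
    sum_d2 = sum(d * d for d in d_scores)
    return (sum_d2 - sum_d ** 2 / n) / (n - 1)

print(variance_of_differences([3, 5, 5, 6, -1]))  # 7.8, as in the phobia study
```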
STEP 2: Compute the standard error of the mean difference. This is the standard deviation of the sampling distribution of D̄. Its symbol is sD̄.

standard error of the mean difference (sD̄)  The standard deviation of the sampling distribution of mean differences.

THE FORMULA FOR THE STANDARD ERROR OF THE MEAN DIFFERENCE IS

sD̄ = √(s²D/N)

For the phobia study, s²D = 7.8 and N = 5, so

sD̄ = √(s²D/N) = √(7.8/5) = √1.56 = 1.249

STEP 3: Find tobt.

THE FORMULA FOR THE RELATED-SAMPLES t-TEST IS

tobt = (D̄ − μD)/sD̄

In the formula, D̄ is the mean of your difference scores, sD̄ is computed as above, and μD is the value given in H0. (It is always 0 unless you are testing for a nonzero difference.)

For the phobia study, D̄ is +3.6, sD̄ is 1.249, and μD equals 0, so

tobt = (D̄ − μD)/sD̄ = (+3.6 − 0)/1.249 = +2.88

So, tobt = +2.88.

9-5a Interpreting the Related-Samples t-Test

Interpret tobt by comparing it to tcrit from the t-table in Appendix B.

THE FORMULA FOR THE DEGREES OF FREEDOM FOR THE RELATED-SAMPLES t-TEST IS

df = N − 1

where N is the number of difference scores.

For the phobia study, with α = .05 and df = 4, the tcrit is ±2.776.

Figure 9.5
Two-Tailed Sampling Distribution of D̄s When μD = 0
[The distribution is centered at μD = 0; −tcrit = −2.776 and +tcrit = +2.776 cut off the regions of rejection, and +tobt = +2.88 (D̄ = +3.6) falls beyond +tcrit.]

The complete sampling distribution is shown in Figure 9.5. Remember, it shows the distribution of all D̄s that occur when we are representing a population of Ds where μD is 0 (when our therapy does not work). But, we see
that a D̄ like ours hardly ever occurs when the sample represents this population. In fact, our tobt lies beyond tcrit, so we conclude that our D̄ of +3.6 is unlikely to represent the population of Ds where μD = 0. Therefore, the results are significant: We reject H0 and accept Ha, concluding that the sample represents a μD around +3.6. (As usual, with α = .05, the probability of a Type I error here is p < .05.)

Now we work backward to our original fear scores. Recall that our D̄ of +3.6 is equal to the difference between the original mean fear score for before therapy (X̄ = 14.80) and the mean fear score for after therapy (X̄ = 11.20). According to H0, this difference is due to sampling error, and we are really representing zero difference in the population. However, using D̄ we have determined that +3.6 is significantly different from zero: Our data are unlikely to poorly represent zero difference in the population. Therefore, it is also unlikely that the original means of our fear scores poorly represent zero difference. Thus, we conclude that the means of 14.80 and 11.20 differ significantly from each other and are unlikely to represent the same population of fear scores. Instead, the therapy appears to work, with the data representing a relationship in the population such that fear scores change from a μ around 14.80 before therapy to a μ around 11.20 after therapy.

If the related-samples tobt is significant, then the original raw score means differ significantly from each other.

As usual, now we describe and interpret this relationship. Again, a helpful statistic for doing this is to compute the effect size, as described in Section 9-7. Also, it would be useful to compute a confidence interval to better estimate the μ of the fear scores for each condition. However, we cannot do that! The confidence interval for μ requires independent samples, which we do not have. We can, however, compute a confidence interval for μD. For example, with our D̄ of +3.6, we assume that if we measured the entire population before and after therapy, the resulting population of Ds would have a μD around +3.6. The confidence interval provides a range of values around +3.6, within which we are confident the actual μD falls. The computations for this confidence interval are presented in Appendix A.2.

If tobt had not been beyond tcrit, the results would not be significant and we would make no conclusions about whether our therapy influences fear scores (and we'd again consider our power).
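The full related-samples computation, again as our own sketch: since the test is just the one-sample t-test applied to the Ds, a paired t-test routine should agree. Assuming SciPy is available, scipy.stats.ttest_rel provides that cross-check:

```python
import math
from scipy.stats import ttest_rel

before = [11, 16, 20, 17, 10]
after  = [8, 11, 15, 11, 11]
d = [b - a for b, a in zip(before, after)]
n = len(d)

mean_d = sum(d) / n                                         # D̄ = +3.6
s2_d = (sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1)  # s²D = 7.8
se_d = math.sqrt(s2_d / n)                                  # sD̄ ≈ 1.249
t_obt = (mean_d - 0) / se_d
print(round(t_obt, 2))           # 2.88

# Cross-check with SciPy's paired t-test:
print(ttest_rel(before, after))  # statistic ≈ 2.88, two-tailed p ≈ .045
```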
9-5b One-Tailed Hypotheses with the Related-Samples t-Test

As usual, we perform a one-tailed test only when we can confidently predict the direction of the difference between the two conditions. Realistically, in the phobia study, we would predict we'd find lower scores in the after-therapy condition. Then to create Ha, we first arbitrarily decide which condition to subtract from which and what the differences should be. We subtracted after from before, so lower after-therapy scores would produce positive differences. Then our D̄ should be positive, representing a positive μD. Therefore, we have:

Ha: μD > 0

Conversely, H0 always implies the independent variable doesn't work, so here it would be that we have higher or unchanged fear scores after therapy. This would produce Ds that are either negative or at zero, respectively. Therefore, we have

H0: μD ≤ 0

We again examine the sampling distribution that occurs when μD = 0, like in Figure 9.5, except we use only one tail. We are predicting a positive D̄, which will be on the right-hand side of the distribution, so the region of rejection is in only the upper tail, and the one-tailed tcrit (from the t-table) is positive.

Had we predicted higher after scores, then by subtracting after from before, the Ds and their D̄ should be negative, representing a negative μD. Then, Ha: μD < 0, and H0: μD ≥ 0. Now the region of rejection is in the lower tail and tcrit is negative.

For either test, compute tobt and find tcrit as we did previously. Be sure you subtract to get your Ds in the same way as when you created your hypotheses.
9-5c Summary of the Related-Samples t-Test

After checking that the design is matched-samples or repeated-measures and meets the assumptions, the related-samples t-test involves the following:

1. Create either the two-tailed or one-tailed H0 and Ha.
2. Compute tobt.
   a. Compute the difference score for each pair of scores.
   b. Compute D̄ and s²D.
   c. Compute sD̄.
   d. Compute tobt.
3. Create the sampling distribution and, using df = N − 1, find tcrit in the t-table.
4. Compare tobt to tcrit. If tobt is beyond tcrit, the results are significant; describe the populations of raw scores and interpret the relationship. If tobt is not beyond tcrit, the results are not significant; make no conclusion about the relationship.
5. If the results are significant, compute the "effect size" as described in Section 9-7.

> Quick Practice

> Perform the related-samples t-test with a matched-groups or repeated-measures design.

More Example
In a two-tailed study, we compare husband-and-wife pairs, with H0: μD = 0 and Ha: μD ≠ 0. Subtracting wife − husband produces

Wife       4     5     3     5     X̄ = 4.25
Husband    6     8     9     8     X̄ = 7.75
D         −2    −3    −6    −3

D̄ = ΣD/N = −14/4 = −3.5

s²D = [ΣD² − (ΣD)²/N] / (N − 1) = [58 − (−14)²/4] / 3 = 3

sD̄ = √(s²D/N) = √(3/4) = .866

tobt = (D̄ − μD)/sD̄ = (−3.5 − 0)/.866 = −4.04

With α = .05 and df = 3, tcrit is ±3.182. The tobt is significant. For wives, we expect μ is 4.25, and for husbands, we expect μ is 7.75.

For Practice
A two-tailed study tests the same participants in both Conditions A and B, with these data:

A: 8, 10, 9, 8, 11
B: 7, 5, 6, 5, 6

1. This way of producing related samples is called a ______ design.
2. What are H0 and Ha?
3. Subtracting A − B, perform the t-test.
4. Subtracting A − B, what are H0 and Ha if we predicted that B would produce lower scores?

> Answers
1. repeated-measures
2. H0: μD = 0; Ha: μD ≠ 0
3. D̄ = 17/5 = +3.4; s²D = 2.8; sD̄ = √(2.8/5) = .748; tobt = (3.4 − 0)/.748 = +4.55. With α = .05 and df = 4, tcrit = ±2.776, so tobt is significant.
4. H0: μD ≤ 0; Ha: μD > 0
9-6 STATISTICS IN THE RESEARCH LITERATURE: REPORTING A TWO-SAMPLE STUDY

Report the results of an independent- or related-samples t-test using the same format as in previous chapters. For example, in our hypnosis study, the tobt of +2.93 was significant with 30 df, so we report t(30) = +2.93, p < .05. As usual, df is in parentheses, and because α = .05, the probability is less than .05 that we've made a Type I error. Also, as described in the next section, you should include a measure of "effect size" for any significant result. In fact, the American Psychological Association requires published research to report effect size.

In addition, as in Chapter 3, we report the mean and standard deviation from each condition. Also, with two or more conditions, researchers often include graphs of their results. Recall that we graph the results of an experiment by plotting the mean of each condition on the Y axis and the conditions of the independent variable on the X axis.

Note: In a related-samples study, report the means and standard deviations of the original raw scores—not the Ds—and graph these means.
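If you graph by computer, such a plot takes only a few lines of matplotlib. This is our own minimal sketch for the hypnosis means, not an example from the text:

```python
import matplotlib.pyplot as plt

# Conditions of the independent variable on X, condition means on Y
conditions = ["No Hypnosis", "Hypnosis"]
means = [20, 23]

plt.bar(conditions, means)
plt.xlabel("Condition")
plt.ylabel("Mean recall score")
plt.title("Mean recall by condition")
plt.show()
```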
9-7 DESCRIBING EFFECT SIZE

An important statistic for describing a significant relationship is called a measure of effect size. The "effect" is the influence that the independent variable had on dependent scores. Effect size indicates the amount of influence that changing the conditions of the independent variable had on dependent scores. For example, the extent to which changing the amount of hypnosis produced differences in recall scores is the effect size of hypnosis.

effect size: The amount of influence that changing the conditions of the independent variable had on dependent scores

Cohen's d: A measure of effect size in a two-sample experiment that reflects the magnitude of the differences between the means of the conditions

We want to identify those variables that most influence a behavior, so the larger the effect size, the more scientifically important the independent variable is. But! Remember that "significant" does not mean "important"; it means only that the sample relationship is "believable" because it is unlikely to be due to sampling error. Although a relationship must be significant to be potentially important, it can be significant and still be unimportant.

We have two methods for measuring effect size. The first is to compute Cohen's d.

9-7a Effect Size Using Cohen's d

One approach for describing effect size is in terms of how big the difference is between the means of the conditions. For example, the presence/absence of hypnosis produced a difference between the means of 3, so this is the size of its effect. However, we don't know if 3 should be considered a large amount or not. To decide this, we need a frame of reference, so we also consider the estimated population standard deviation. Recall that a standard deviation reflects the "average" amount that scores differ. Thus, if individual scores differ by an "average" of, say, 30, then large differences between scores frequently occur, so a difference of 3 between their means is not all that impressive. However, if scores differ by an "average" of, say, only 5, then a difference between their means of 3 is more impressive.

Cohen's d measures effect size by describing the size of the difference between the means relative to the population standard deviation. We have two versions of how it is computed, depending on which two-sample t-test we have performed.
THE FORMULAS FOR COHEN'S d ARE

Independent-samples t-test:  $d = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s^2_{pool}}}$

Related-samples t-test:  $d = \dfrac{\bar{D}}{\sqrt{s^2_D}}$

In the formula for the independent-samples t-test, the difference between the conditions is measured as $\bar{X}_1 - \bar{X}_2$, and the standard deviation comes from the square root of the pooled variance. For our hypnosis study, the means were 23 and 20, and $s^2_{pool}$ was 8.3, so

$d = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s^2_{pool}}} = \dfrac{23 - 20}{\sqrt{8.3}} = \dfrac{+3}{2.88} = +1.04$
This tells us that the effect of changing our conditions
was to change scores by an amount that is slightly
larger than 1 standard deviation.
In the formula for the related-samples t-test, the difference between the conditions is measured by $\bar{D}$, and the standard deviation comes from finding the square root of the estimated variance ($s^2_D$). In our phobia study, $\bar{D} = +3.6$ and $s^2_D = 7.8$, so

$d = \dfrac{\bar{D}}{\sqrt{s^2_D}} = \dfrac{+3.6}{\sqrt{7.8}} = \dfrac{+3.6}{2.79} = +1.29$

Thus, the effect size of the therapy was 1.29.
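Both versions of d reduce to one-line computations. Here is a minimal Python sketch (our own illustration, not part of the text) that reproduces the two values just computed from the summary statistics:

```python
import math

def cohens_d_independent(mean1: float, mean2: float, s2_pool: float) -> float:
    """Independent samples: d = (mean1 - mean2) / sqrt(pooled variance)."""
    return (mean1 - mean2) / math.sqrt(s2_pool)

def cohens_d_related(mean_diff: float, s2_d: float) -> float:
    """Related samples: d = (mean of D) / sqrt(estimated variance of D)."""
    return mean_diff / math.sqrt(s2_d)

# Hypnosis study: means of 23 and 20 with a pooled variance of 8.3
print(round(cohens_d_independent(23, 20, 8.3), 2))  # 1.04
# Phobia study: mean difference of +3.6 with an estimated variance of 7.8
print(round(cohens_d_related(3.6, 7.8), 2))         # 1.29
```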
The larger the effect size, the greater the influence that an independent variable has on dependent scores and thus the more important the variable is.

We can interpret the above ds in two ways. First, the larger the absolute size of d, the larger the impact of the independent variable. In fact, Cohen¹ proposed the following guidelines.

Values of d    Interpretation of Effect Size
d = .2         small effect
d = .5         medium effect
d = .8         large effect

Thus, in our previous examples we found two very large effects.

Second, we can compare the size of different ds to determine the relative impact of different independent variables. In our previous examples, the d for hypnosis was 1.04, but for therapy it was 1.29. Therefore, in the respective studies, the therapy manipulation had a slightly larger impact on the dependent variable.

The other way to measure effect size is by computing the proportion of variance accounted for.

¹Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.

9-7b Effect Size Using Proportion of Variance Accounted For

Instead of measuring effect size in terms of the size of the changes in scores as above, we can also determine how consistently the scores change. Here, an independent variable has a greater impact the more it controls behavior: It produces one set of similar behaviors and scores for everyone in one condition, while producing a different set of similar behaviors and scores for everyone in a different condition. A variable is more minor, however, when it exhibits less control over behaviors and scores.

In statistical terminology, when we describe the consistency among scores produced by the conditions, we are describing the proportion of variance accounted for. To see how this works, here are some possible fear scores from our phobia study.

Before Therapy   After Therapy
      10               5
      11               6
      12               7

Each after-therapy score is 5 points lower than the corresponding before-therapy score. These differences among the scores can be attributed to changing the conditions of our independent variable. However, we also see differences among scores within each condition: In the before scores, for example, one participant had a 10 while someone else had an 11. These differences cannot be attributed to changing the independent variable. Thus, out of all the differences among these six scores, some differences seem to have been produced by changing the independent variable while others were not.

In other words, some proportion of all differences among the scores can be attributed to changing our conditions. By attributing these differences to changing the conditions, we explain their cause or account for them. And finally, recall that one way to measure differences among scores is to measure their variance. So, altogether, we are describing "the proportion of variance accounted for."
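To make "accounting for variance" concrete with the six fear scores above, we can compare the differences produced by changing conditions against all the differences in the data. This minimal Python sketch is our own illustration of the general idea only (the chapter's actual statistic for a two-sample t-test, r²pb, is developed next); it replaces each score with its condition mean and asks what fraction of the total variance that captures:

```python
def variance(scores):
    """Population variance: average squared deviation from the mean."""
    m = sum(scores) / len(scores)
    return sum((s - m) ** 2 for s in scores) / len(scores)

before = [10, 11, 12]   # the example fear scores above
after = [5, 6, 7]
all_scores = before + after

# Differences attributable to changing the conditions: replace each
# score with its condition's mean, then measure the remaining variance.
mean_before = sum(before) / len(before)   # 11
mean_after = sum(after) / len(after)      # 6
explained = variance([mean_before] * 3 + [mean_after] * 3)

proportion = explained / variance(all_scores)
print(round(proportion, 2))  # ≈ 0.90: most differences here track the conditions
```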
proportion of variance accounted for: In an experiment, the proportion of the differences in dependent scores associated with changing the conditions of the independent variable

squared point-biserial correlation coefficient (r²pb): Indicates the proportion of variance in dependent scores that is accounted for by the independent variable in a two-sample experiment

In an experiment, the proportion of variance accounted for is the proportion of all differences in dependent scores that can be attributed to changing our conditions. The larger the proportion, the more that differences in scores seem to be consistently caused by changing the independent variable, so the larger is the variable's effect in determining scores.

However, adjust your expectations about what is "large." Any behavior is influenced by many variables, so one variable by itself will have a modest effect. Therefore, here is a rough guide of what to expect in real research: An effect size less than .09 is considered small; between about .10 and .25 is considered moderate and is relatively common; and above .25 is considered large and is rare.

In the next chapter, we will see that the proportion of variance accounted for depends on the consistency of the relationship. There, we will also discuss the statistic for summarizing a relationship called the correlation coefficient. It turns out that the mathematical steps needed to determine the proportion of variance accounted for are accomplished by computing the appropriate correlation coefficient and then squaring it. In the two-sample experiment, we describe the proportion of variance accounted for by computing the squared point-biserial correlation coefficient. Its symbol is r²pb. It also turns out that these steps are largely accomplished by computing tobt. So,

THE FORMULA FOR r²pb IS

$r^2_{pb} = \dfrac{(t_{obt})^2}{(t_{obt})^2 + df}$

This can be used with either the independent-samples or the related-samples t-test. In the numerator, square tobt. In the denominator, add (tobt)² to the df from the study. For independent samples, df = (n₁ − 1) + (n₂ − 1), but for related samples, df = N − 1.

In our hypnosis study, tobt = +2.93 with df = 30. So

$r^2_{pb} = \dfrac{(t_{obt})^2}{(t_{obt})^2 + df} = \dfrac{(2.93)^2}{(2.93)^2 + 30} = \dfrac{8.585}{38.585} = .22$

Thus, .22 or 22% of the differences in our recall scores are accounted for by changing our hypnosis conditions. You can understand why this is a moderate effect by considering that if 22% of the differences in scores are due to changing our conditions, then 78% of the differences are due to something else (perhaps motivation or the participant's IQ played a role). Therefore, hypnosis is only one of a number of variables that influence memory here, and thus, it is only somewhat important in determining scores.

On the other hand, in the phobia study tobt = +2.88 and df = 4, so

$r^2_{pb} = \dfrac{(t_{obt})^2}{(t_{obt})^2 + df} = \dfrac{(2.88)^2}{(2.88)^2 + 4} = .68$

This indicates that 68% of all differences in our fear scores are associated with before- or after-therapy. Therefore, our therapy had an extremely large effect size and is an important variable in determining fear scores.

Further, we use effect size to compare the importance of different independent variables: The therapy accounts for 68% of the variance in fear scores, but hypnosis accounts for only 22% of the variance in recall scores. Therefore, in the respective relationships, the therapy variable had a much larger effect, so it is scientifically more important for understanding the relationship and the behavior.
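Because r²pb depends only on tobt and df, it takes one line of code once the t-test is done. A minimal Python sketch (our own illustration):

```python
def r_squared_pb(t_obt: float, df: int) -> float:
    """Squared point-biserial correlation: r²pb = t² / (t² + df)."""
    return t_obt ** 2 / (t_obt ** 2 + df)

# Hypnosis study (independent samples): tobt = +2.93, df = 30
print(round(r_squared_pb(2.93, 30), 2))  # 0.22
# Phobia study (related samples): tobt = +2.88, df = 4
print(round(r_squared_pb(2.88, 4), 2))   # 0.67; the text reports .68, presumably
                                         # from less-rounded intermediate values
```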
Effect size as measured by the proportion of variance accounted for indicates how consistently an independent variable influences dependent scores.
USING SPSS

As described on Review Card 9.4, SPSS performs the independent-samples or the related-samples t-test, including computing the mean and estimated population variance for each condition and the two-tailed α at which tobt is significant. The program also computes the confidence interval for the difference between two μs for the independent-samples test and the confidence interval for μD for the related-samples test. It does not compute measures of effect size.
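For readers working outside SPSS, the same two tests are available in Python's scipy library; this minimal sketch is our own illustration, and the raw scores here are made up for demonstration. As with SPSS, effect size is not included and must be computed separately, as shown earlier:

```python
from scipy import stats

# Independent-samples t-test on two separate groups (hypothetical scores)
group1 = [23, 25, 20, 24, 22]
group2 = [19, 21, 18, 22, 20]
t_ind, p_ind = stats.ttest_ind(group1, group2)
print(f"independent samples: t = {t_ind:+.2f}, two-tailed p = {p_ind:.3f}")

# Related-samples t-test on paired before/after scores (hypothetical)
before = [10, 11, 12, 11, 10]
after = [5, 7, 6, 6, 5]
t_rel, p_rel = stats.ttest_rel(before, after)
print(f"related samples:     t = {t_rel:+.2f}, two-tailed p = {p_rel:.3f}")
```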
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. (a) With what design do you perform the
independent-samples t-test? (b) With what design
do you perform the related-samples t-test?
2. (a) How do you identify the independent variable
in an experiment? (b) How do you identify the
dependent variable in an experiment? (c) What is
the goal of an experiment?
3. What are the two explanations for obtaining a
different mean in each condition of a two-sample
experiment?
4. (a) How do you create independent samples?
(b) What is the difference between n and N?
5. Explain the two ways to create related samples.
6. In addition to requiring independent or related
samples, the two-sample t-test has what other
assumptions?
7. (a) What does the sampling distribution of the
differences between means show? (b) What information does the standard error of the difference
provide? (c) What does being in the region of
rejection of a sampling distribution indicate?
8. What does “homogeneity of variance” mean?
9. (a) What does the sampling distribution of mean
differences show? (b) What information does the
standard error of the mean difference provide?
(c) What does being in the region of rejection of
this sampling distribution indicate?
10. (a) What does effect size indicate? (b) What does d
indicate? (c) What does the proportion of variance
accounted for indicate? (d) What statistic describes
this proportion in the two-sample t-test?
11. (a) What is your final task after finding a significant
result? (b) Why is effect size useful at this stage?
12. For the following, which type of t-test is
required? (a) Studying the effects of a memory
drug on Alzheimer’s patients by testing a group
of patients before and after administration of
the drug. (b) Studying whether men and women
rate the persuasiveness of an argument delivered
by a female speaker differently. (c) The study
described in part b, but with the added
requirement that for each man of a particular age,
there is a woman of the same age.
13. We solicit two groups of professors at our school: One group agrees to check their e-mail once per day; the other group can check it as often as they wish. After two weeks, we give everyone a standard test of productivity to determine who has the higher (more productive) scores. The productivity scores from two independent samples are

Once:  X̄ = 43, s²X = 22.79, n = 15
Often: X̄ = 39, s²X = 24.6, n = 15

(a) What are H0 and Ha? (b) Compute tobt. (c) With α = .05, what is tcrit? (d) What should we conclude about this relationship? (e) Using our two approaches, how big is the effect of checking e-mail on productivity? (f) Describe how you would graph these results.
14. We investigate if a period of time feels longer or shorter when people are bored compared to when they are not bored. Using independent samples, we obtain these estimates of the time period (in minutes):

Sample 1 (bored):     X̄ = 14.5, s²X = 10.22, n = 28
Sample 2 (not bored): X̄ = 9.0, s²X = 14.6, n = 34

(a) What are H0 and Ha? (b) Compute tobt. (c) With α = .05, what is tcrit? (d) What should we conclude about this relationship? (e) Using our two approaches, how important is boredom in determining how quickly time seems to pass?
15. A researcher asks if people score higher or lower on a questionnaire measuring their emotional well-being when they are exposed to much sunshine compared to when they're exposed to little sunshine. A sample of 8 people is measured under both levels of sunshine and produces these well-being scores:

Low:  14 13 17 15 18 17 14 16
High: 18 12 20 19 22 19 19 16

(a) Subtracting low from high, what are H0 and Ha? (b) Compute tobt. (c) With α = .05, what do you conclude about this study? (d) What is the predicted well-being score for someone when tested under low sunshine? Under high sunshine? (e) How consistently are dependent scores changed by changing the amount of sunlight? (f) How scientifically important are these results?

16. A researcher investigates whether classical music is more or less soothing to air-traffic controllers than modern music. She plays a classical selection for one group and a modern selection for another. She gives each person an irritability questionnaire and obtains the following:

Classical: n = 6, X̄ = 14.69, s²X = 8.4
Modern:    n = 6, X̄ = 17.21, s²X = 11.6

(a) Subtracting C − M, what are H0 and Ha? (b) What is tobt? (c) With α = .05, are the results significant? (d) Report the results using the correct format. (e) What should she conclude about the relationship in nature between type of music and irritability? (f) What other statistics should be computed?

17. We predict that children exhibit more aggressive acts after watching a violent television show. The scores for ten participants before and after watching the show are

Sample 1 (After):  5 6 4 4 7 3 2 1 4 3
Sample 2 (Before): 4 6 3 2 4 1 0 0 5 2

(a) Subtracting before from after, what are H0 and Ha? (b) Compute tobt. (c) With α = .05, what is tcrit? (d) What should we conclude about this relationship? (e) How large is the effect of violence in terms of the difference it produces in aggression scores?

18. You investigate whether the older or the younger male in pairs of brothers tends to be more extroverted. You obtain the following extroversion scores:

Sample 1 (Younger): 10 11 18 12 15 13 19 15
Sample 2 (Older):   18 17 19 16 15 19 13 20

(a) What are H0 and Ha? (b) Compute tobt. (c) With α = .05, what is tcrit? (d) What should you conclude about this relationship? (e) Which of our approaches should we use to determine the effect size here?
19. A rather dim student proposes testing the conditions of "male" and "female" using a repeated-measures design. What's wrong with this idea?

20. With α = .05 and df = 40, a significant independent-samples tobt was +4.55. How would you report this in the literature?
21. An experimenter investigated the effects of a sensitivity-training course on policemen's effectiveness at resolving domestic disputes (using independent samples who had or had not completed the course). The dependent variable was the ability to resolve a domestic dispute. These success scores were obtained:

No Course: 11 14 10 12 8 15 12 13 9 11
Course:    13 16 14 17 11 14 15 18 12 11
(a) Should a one-tailed or a two-tailed test be used? (b) What are H0 and Ha? (c) Subtracting course from no course, compute tobt and determine whether it is significant. (d) What conclusions can the experimenter draw from these results? (e) Using our two approaches, compute the effect size and interpret it.
22. (a) What does it mean that an independent variable “accounts” for more of the variance? (b) Why
is such a variable more scientifically important?
23. When reading a research article, you encounter the following statements. For each, identify the design, the statistical procedure performed, if a Type I or Type II error is possibly being made, and the influence of the independent variable in the relationship being studied. (a) "The t-test indicated a significant difference between the mean for men (M = 5.4) and the mean for women (M = 9.3), with t(58) = +7.93, p < .01. Unfortunately, the effect size was only r²pb = .08." (b) "The t-test indicated that participants' weights were significantly reduced after three weeks of dieting, with t(40) = 3.56, p < .05, and r²pb = .26."
24. For each of the following, which type of t-test is
required? (a) Studying whether psychology or sociology majors are more prone to math errors on a
statistics exam. (b) The study in part a, but for each
psychology major, there is a sociology major with
the same reading score. (c) A study of the spending habits of a group of teenagers, comparing the
amount of money each spends in an electronic
games store and in a clothing store. (d) A study of
the effects of a new anti-anxiety drug, measuring
participants’ anxiety before and after administration of the drug. (e) Testing whether women in the
U.S. Army are more aggressive than women in the
U.S. Marine Corps.
25. (a) When do you perform a parametric inferential
procedure in an experiment? (b) What are the four
parametric inferential procedures for experiments
that we have discussed so far in this book, and
what is the design in which each is used?
Chapter 10

DESCRIBING RELATIONSHIPS USING CORRELATION AND REGRESSION

LOOKING BACK
Be sure you understand:
• From Chapter 1, the difference between an experiment and a correlational study, and how to recognize a relationship between the scores of two variables.
• From Chapter 4, that greater variability indicates that scores are not consistently close to each other.
• From the previous chapters, the basics of significance testing.

GOING FORWARD
Your goals in this chapter are to learn:
• How to create and interpret a scatterplot.
• What a regression line is.
• When and how to compute the Pearson r.
• How to perform significance testing of the Pearson r.
• The logic of predicting scores using linear regression and of r².

Sections
10-1 Understanding Correlations
10-2 The Pearson Correlation Coefficient
10-3 Significance Testing of the Pearson r
10-4 Statistics in the Research Literature: Reporting r
10-5 An Introduction to Linear Regression
10-6 The Proportion of Variance Accounted For: r²

Recall that in a relationship we see a pattern where, as the scores on one variable change, the scores on the other variable also change in a consistent manner. This chapter presents a new statistic for describing a relationship called the correlation coefficient. In the following sections we will discuss (1) what a correlation coefficient is and how to interpret it, (2) how to compute the most common coefficient, (3) how to perform inferential hypothesis testing of it, and (4) how we use a relationship to predict unknown scores.
10-1 UNDERSTANDING CORRELATIONS
Whenever we find a relationship, we then want to know its characteristics: What pattern is formed, how consistently do the scores change together, in what direction do the scores change, and so on. The best and easiest way to answer these questions is by computing a correlation coefficient. The term correlation means relationship, and a correlation coefficient is a number that describes a relationship. The correlation coefficient is a statistic that describes the important characteristics of a relationship. We compute the correlation coefficient using our sample data, and the answer is a single number that quantifies the pattern in a relationship. No other statistic can do this. Thus, the correlation coefficient is important because it simplifies a complex relationship involving many scores into one easily interpreted statistic.

correlation coefficient: A statistic that describes the important characteristics of a relationship

Correlation coefficients are most commonly used to summarize the relationship found in a correlational design, but computing a correlation coefficient does not create this type of study. Recall from Chapter 1 that in a correlational design, we simply measure participants' scores on two variables. For example, as people drink more coffee, they typically become more nervous. To study this in a correlational study, we might ask participants the amount of coffee they had consumed that day and also measure how nervous they were. This is different from an experiment because there we would manipulate coffee consumption by assigning some people to a one-cup condition, others to a two-cup condition, and so on, and then measure their nervousness. Thus, the major difference here is how we go about demonstrating the relationship.

However, it is important to note another major distinction between these designs: A correlational design does not allow us to conclude that changes in X cause changes in Y. It may be that X does cause Y. But it also may be that Y causes X or that a third variable influences both X and Y. Thus, if we find that higher coffee scores are associated with higher nervousness scores, it may be that more coffee makes people more nervous. But, it may instead be that participants who were already more nervous then drank more coffee. Or, perhaps a hidden variable was operating: Perhaps some participants had less sleep than others the night before testing, and this caused these people both to be more nervous and to drink more coffee.
So remember, a correlation by itself does not indicate causality. You must also consider the research design used to demonstrate the relationship. In experiments we have more control over the variables and the situation, so they tend to provide better evidence for identifying the causes of a behavior. Correlational research, and correlations themselves, simply describe how nature relates the variables, without identifying the cause.

scatterplot: A graph of the individual data points from a set of X-Y pairs
10-1a Distinguishing Characteristics of Correlational Analysis

There are four major differences between how we handle data in a correlational analysis versus in an experiment. First, in our coffee experiment we would examine the mean nervousness score (the Y scores) for each condition of the amount of coffee consumed (each X). Then we would examine the relationship they show. However, with a correlational study we typically have a rather large range of different X scores: People will probably report many different amounts of coffee consumed in a day. Comparing the mean nervousness scores for so many amounts would be very difficult. Therefore, in correlational procedures we do not compute a mean Y score at each X. Instead, the correlation coefficient simultaneously summarizes the entire relationship that is formed by all pairs of X-Y scores in the data.

A second difference is that, because we examine all pairs of X-Y scores, correlational procedures involve one sample: In correlational analysis, N stands for the number of pairs of scores in the data.

Third, in a correlational study, either variable may be labeled as the X or Y variable. How do we decide which is X or Y? Any relationship may be seen as asking, "For a given X, what are the Y scores?" So, simply identify your "given" variable and it is X. Thus, if we ask, "For a given amount of coffee, what are the nervousness scores?", then amount of coffee is the X variable and nervousness is the Y. Conversely, if we ask, "For a given nervousness score, what is the amount of coffee consumed?", then nervousness is X and amount of coffee is Y. (Note: Researchers disagree about whether to then call X and Y the independent and dependent variables. The safer approach is to reserve these terms for experiments.)

Finally, the data are graphed differently in correlational research. We use the individual pairs of scores to create a scatterplot. A scatterplot is a graph of the individual data points from a set of X-Y pairs. Figure 10.1 shows some data and then the scatterplot we might have from studying nervousness and coffee consumption. (Real research typically involves a larger N, and the data points will not form such a pretty pattern.) To create this scatterplot, the first person in the data had 1 cup of coffee and a nervousness score of 1, so we place a data point above an X of 1 cup at the height of a score of 1 on the Y variable of nervousness. The second

Figure 10.1  Scatterplot Showing Nervousness as a Function of Coffee Consumption
Each data point is created using a participant's coffee consumption as the X score and nervousness as the Y score.

Cups of Coffee (X):      1 1 1 2 2 3 3 4 4 5 5 6 6
Nervousness Scores (Y):  1 1 2 2 3 4 5 5 6 8 9 9 10

[Scatterplot: nervousness scores (Y) plotted against cups of coffee consumed (X).]
participant had the same X and Y scores, so that data
point is on top of the previous one. The third participant scored an X of 1 and Y of 2, and so on.
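Plotting software builds a scatterplot the same way, one point per X-Y pair. A minimal Python sketch using matplotlib (our own illustration, not something the text uses), with the pairs from Figure 10.1:

```python
import matplotlib.pyplot as plt

# The X-Y pairs from Figure 10.1: cups of coffee and nervousness scores
cups = [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
nervousness = [1, 1, 2, 2, 3, 4, 5, 5, 6, 8, 9, 9, 10]

plt.scatter(cups, nervousness)            # one data point per participant
plt.xlabel("Cups of coffee consumed")     # the "given" X variable
plt.ylabel("Nervousness scores")          # the Y variable
plt.title("Nervousness as a function of coffee consumption")
plt.show()
```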
We interpret a scatterplot by examining the overall pattern formed by the data points. We read a graph
from left to right and see that the Xs become larger.
So, get in the habit of describing a relationship by asking, “As the X scores increase, what happens to the
Ys?” Back in Figure 10.1, as the X scores increase, the
data points move higher on the graph, indicating that
the corresponding Y scores are higher. Thus, the scatterplot reflects a relationship. Recall that in any relationship as the X scores increase, the Y scores change
such that different values of Y tend to be paired with
different values of X.
Drawing the scatterplot allows you to see your
particular relationship and to map out the best way to
summarize it. The shape and orientation of a scatterplot
reflect the characteristics of the relationship that are
described by the correlation coefficient. The two important characteristics of any relationship that we need to
know about are the type and the strength of the relationship. The following sections discuss these characteristics.
10-1b Types of Relationships

The type of relationship that is present in a set of data is the overall direction in which the Y scores change as the X scores change. The two general types of relationships are linear and nonlinear relationships.

type of relationship: The overall direction the Y scores tend to change as the X scores increase

linear relationship: A relationship in which the Y scores tend to change in only one direction as the X scores increase

LINEAR RELATIONSHIPS  The term linear means "straight line," and a linear relationship forms a pattern that follows one straight line. This is because in a linear relationship, as the X scores increase, the Y scores tend to change in only one direction. Figure 10.2 shows two linear relationships: between the amount of time that students study and their test performance, and between the number of hours that students watch television and the amount of time they sleep. These are linear because as students study longer, their grades tend only to increase, and as they watch more television, their sleep time tends only to decrease.

Figure 10.2  Scatterplots Showing Positive and Negative Linear Relationships
[Left panel: positive linear study–test relationship; test scores (Y) rise with hours of study time per day (X). Right panel: negative linear television–sleep relationship; hours of sleep (Y) fall with hours of TV time per day (X).]

To better see the overall pattern in a scatterplot, visually summarize it by drawing a line around its outer edges. As in Figure 10.2, a scatterplot that forms a slanted ellipse indicates a linear relationship: By slanting,
it indicates the Y scores are changing as the X scores increase; by slanting in one direction, it indicates it is a linear relationship.

Further, we also summarize the relationship by drawing a line through the scatterplot. This line is called the regression line. While the correlation coefficient is the statistic that summarizes a relationship, the regression line is the line that summarizes the relationship. The linear regression line is the straight line that summarizes a relationship by passing through the center of the scatterplot. That is, although not all data points are on the line, the distance that some are above the line equals the distance that others are below it, so that, "on average," the regression line passes through the center of the scatterplot. Therefore, think of the regression line as reflecting the pattern that all data points more or less follow, so it shows the linear (straight-line) relationship hidden in the data.

regression line: The straight line that summarizes a linear relationship by passing through the center of the scatterplot

positive linear relationship: A relationship in which the Y scores tend to increase as the X scores increase

negative linear relationship: A relationship in which the Y scores tend to decrease as the X scores increase

nonlinear (curvilinear) relationship: A relationship in which the Y scores change their direction of change as the X scores increase

The different ways that the scatterplots slant back in Figure 10.2 illustrate the two subtypes of linear relationships that occur, depending on the direction in which the Y scores change. The study–test relationship is a positive relationship. In a positive linear relationship, as the X scores increase, the Y scores also tend to increase. Any relationship that fits the pattern "the more X, the more Y" is a positive linear relationship.

On the other hand, the television–sleep relationship is a negative relationship. In a negative linear relationship, as the X scores increase, the Y scores tend to decrease. Any relationship that fits the pattern "the more X, the less Y" is a negative linear relationship. (Note: The term negative does not mean there is something wrong with the relationship: It merely indicates the direction in which the Y scores change as the X scores increase.)

A linear relationship follows one straight line and may be positive (with increasing Y scores) or negative (with decreasing Y scores).

NONLINEAR RELATIONSHIPS  If a relationship is not linear, then it is nonlinear, meaning that the data cannot be summarized by one straight line. In a nonlinear relationship, as the X scores increase, the Y scores do not only increase or only decrease: At some point, the Y scores alter their direction of change. (Note: Another name for nonlinear is curvilinear.) Nonlinear relationships come in many different shapes, but Figure 10.3 shows two common ones. On the left is the relationship between a person's age and the amount of time required to move from one place to another. At first, as age increases, movement time decreases; but, beyond a certain age, the time scores change direction and begin to increase. (This is called a U-shaped pattern.) The scatterplot on the right shows the relationship between the number of alcoholic drinks consumed and feeling well. At first, people tend to feel better as they drink, but beyond a certain point, drinking more makes them feel progressively worse. (This pattern reflects an inverted U-shaped relationship.) Curvilinear relationships may be more complex than those above, producing a wavy pattern that repeatedly changes direction. Also, the scatterplot does not need to be curved to be nonlinear. Scatterplots similar to those in Figure 10.3 might be best summarized by two straight regression lines that form a V and an inverted V, respectively. Or we might see lines that form other angles. All are still nonlinear relationships, because they cannot be summarized by one straight line.

Figure 10.3  Scatterplots Showing Nonlinear Relationships
[Left panel: U-shaped pattern; time for movement (Y) first falls and then rises with age in years (X). Right panel: inverted U-shaped pattern; feeling of wellness (Y) first rises and then falls with alcoholic drinks consumed (X).]

Note that the preceding terminology is also used to describe the type of relationship found in experiments. If, as the amount of the independent variable (X) increases, the dependent scores (Y) also increase, then you have a positive linear relationship. If the dependent scores decrease as the independent variable increases, then you have a negative linear relationship. And if, as the independent variable increases, the dependent scores alter their direction of change, then you have a nonlinear relationship.
HOW THE COEFFICIENT DESCRIBES THE TYPE OF
RELATIONSHIP The first step when summarizing data
is to decide on the specific correlation coefficient to
compute. This begins with deciding between a linear
and a nonlinear version. Most behavioral research
uses a linear correlation coefficient—one designed to
summarize a linear relationship. How do you know
whether your data form a linear relationship? If the
scatterplot generally follows a straight line, then a linear correlation is appropriate. (We will discuss only
linear correlations.)
By computing a linear correlation coefficient, we
communicate to other researchers that we have a linear relationship. Then, the coefficient itself communicates whether the relationship is positive or negative.
Sometimes our computations will produce a negative
number (with a minus sign), indicating we have a
negative linear relationship. Other data will produce
a positive number (and we place a plus sign with it),
indicating we have a positive linear relationship.
The other characteristic of a relationship communicated by the correlation coefficient is the strength of
the relationship.
10-1c Strength of the Relationship

Recall that a relationship can exhibit varying degrees of consistency. The strength of a relationship is the extent to which one value of Y is consistently paired with one and only one value of X. (This is also referred to as the degree of association.) The strength of a relationship is indicated by the absolute value of the correlation coefficient (ignoring the sign). The larger the coefficient, the stronger, more consistent the relationship is. The largest possible value of a correlation coefficient is 1, and the smallest value is 0. When you include the sign, a linear correlation coefficient can be any value between −1 and +1. Thus, the closer the coefficient is to ±1, the more consistently one value of Y is paired with one and only one value of X.

strength of a relationship: The extent to which one value of Y is consistently associated with one and only one value of X

A linear correlation coefficient has two components: the sign, indicating a positive or a negative relationship, and the absolute value, indicating the strength of the relationship.

Recognize that correlation coefficients do not directly measure units of "consistency." Thus, if one correlation coefficient is .40 and another is .80, you cannot conclude that the second relationship
is twice as strong as the first. Instead, evaluate any correlation coefficient by comparing it to the extreme values of 0 and ±1. The starting point is a perfect relationship.

THE PERFECT CORRELATION  A correlation coefficient of +1 or −1 describes a perfectly consistent linear relationship. Figure 10.4 shows an example of each. In this and the following figures, first look at the scores to see how they pair up; then look at the scatterplot. Other data having the same correlation coefficient will produce similar patterns having similar scatterplots.

Figure 10.4  Data and Scatterplots Reflecting Perfect Positive and Negative Correlations
Perfect positive, coefficient = +1:  X: 1 1 1 3 3 3 5 5 5   Y: 2 2 2 5 5 5 8 8 8
Perfect negative, coefficient = −1:  X: 1 1 1 3 3 3 5 5 5   Y: 8 8 8 5 5 5 2 2 2

Interpreting any correlation coefficient involves envisioning the scatterplot that is present. Here are four related ways to think about what a coefficient tells you about the relationship. First, the coefficient indicates the relative degree of consistency with which Ys are paired with Xs. A coefficient of ±1 indicates that one, identical Y score was obtained by everyone who obtains a particular X. Then, every time X changes, the Y scores all change to one new value.

Second, the opposite of consistency is variability, so the coefficient communicates the variability in the group of Y scores paired with each X. When the coefficient is ±1, only one Y is paired with an X, so there is no variability among the Y scores paired with each X.

Third, the coefficient indicates how closely the scatterplot fits the regression line. Because a coefficient equal to ±1 indicates zero variability or spread among the Y scores at each X, the data points form a perfect straight-line relationship so that they all lie on the regression line.

Fourth, the coefficient communicates the relative accuracy of our predictions. A goal of behavioral science is to predict the specific behaviors, and the scores that reflect them, that occur in a particular
situation. We do this using relationships because a particular Y score is naturally paired with a particular X score. Therefore, if we know someone's X, we use the relationship to predict that individual's Y. A coefficient of ±1 indicates perfect accuracy in our predictions: Because only one value of Y occurs with an X, by knowing someone's X, we can predict exactly what his or her Y will be. For example, in both graphs an X of 3 is always paired with a Y of 5, so we predict that anyone scoring an X of 3 will have a Y score of 5.
The correlation coefficient
communicates how consistently
the Ys are paired with X, the
variability in Ys at each X, how
closely the scatterplot fits the
regression line, and the accuracy
in our predictions of Y.
INTERMEDIATE STRENGTH  A correlation coefficient that is not ±1 indicates that the data form a linear relationship to only some degree. The key to understanding the strength of any relationship is this:

A RELATIONSHIP BECOMES WEAKER AS THE VARIABILITY (DIFFERENCES) AMONG THE GROUP OF Y SCORES PAIRED WITH EACH X BECOMES LARGER.

The correlation coefficient communicates this because, as the variability in the Ys at each X becomes larger, the correlation coefficient becomes smaller.

For example, Figure 10.5 shows data that produce a correlation coefficient of +.98. Again, interpret the coefficient in four ways:

1. Consistency: An absolute value less than 1 indicates that not every participant who obtained a particular X obtained the same Y. However, a coefficient of +.98 is close to +1, so here we have close to perfect association between the X and Y scores.

2. Variability: A coefficient less than ±1 indicates there is variability among the Y scores at each X. In other words, different Y scores are now paired with an X. However, +.98 is close to +1, indicating this variability is relatively small so that the different Ys at an X are relatively close to each other.

3. The scatterplot: A coefficient less than ±1 indicates variability in Y at each X, so the data points at an X are vertically spread out above and below the regression line. However, a coefficient of +.98 is close to +1, so we know the data points are close to the regression line, resulting in a scatterplot that is a narrow ellipse.

4. Predictions: When the coefficient is not ±1, there is not one Y score for a particular X, so we can predict only around what someone's Y score will be. In Figure 10.5, at an X of 1 are Y scores of 1 and 2. Split the difference and for each person here we'd predict a Y of around 1.5. But no one scored exactly 1.5, so we'll have some error in our predictions. However, +.98 is close to +1, indicating that, overall, our error will be relatively small.

Figure 10.5  Data and Scatterplot Reflecting a Correlation Coefficient of +.98
X: 1 1 1 3 3 3 5 5 5
Y: 1 2 2 4 5 5 7 8 8
On the other hand, Figure 10.6 shows data that produce a coefficient of −.28. Because this coefficient is not very close to ±1, this tells us:

1. this relationship is not close to showing perfectly consistent association;

2. the variability in the Y scores paired with each X is relatively large;

3. the large variability produces data points on the scatterplot at each X that are vertically very spread out above and below the regression line, forming a relatively wide scatterplot; and

4. because each X is paired with many different Ys, knowing a participant's X will not get us close to his or her Y, so our predictions will contain large amounts of error.

Greater variability in the group of Y scores at each X reduces the strength of a relationship and the size of the correlation coefficient.

Figure 10.6  Data and Scatterplot Reflecting a Correlation Coefficient of −.28
X: 1 1 1 3 3 3 5 5 5
Y: 9 6 3 8 6 3 7 5 1

ZERO CORRELATION  The lowest possible value of the correlation coefficient is 0, indicating that no relationship is present. Figure 10.7 shows data that produce such a coefficient. A correlation coefficient of 0 is as far as possible from ±1, telling us the scatterplot is as far as possible from forming a slanted straight line. Therefore, we know the following:

1. No Y score tends to be consistently associated with only one X, and instead, virtually the same batch of Y scores is paired with every X.

2. The spread in Y at any X is at maximum and equals the overall spread of Y in the data.

3. The scatterplot is horizontal and elliptical (or circular), and in no way hugs the regression line.

4. Because each X is paired with virtually all Y scores, knowing someone's X score is no help in predicting his or her Y score.

Figure 10.7  Data and Scatterplot Reflecting a Correlation Coefficient of 0
X: 1 1 1 3 3 3 5 5 5
Y: 3 5 7 3 5 7 3 5 7
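The datasets in Figures 10.4 through 10.7 make a convenient check on any correlation routine. A minimal Python sketch using numpy (our own illustration) recovers each coefficient described above:

```python
import numpy as np

x = [1, 1, 1, 3, 3, 3, 5, 5, 5]  # the same X scores appear in every figure

datasets = {
    "perfect positive (+1)": [2, 2, 2, 5, 5, 5, 8, 8, 8],  # Figure 10.4
    "perfect negative (-1)": [8, 8, 8, 5, 5, 5, 2, 2, 2],  # Figure 10.4
    "intermediate (+.98)":   [1, 2, 2, 4, 5, 5, 7, 8, 8],  # Figure 10.5
    "weak (-.28)":           [9, 6, 3, 8, 6, 3, 7, 5, 1],  # Figure 10.6
    "zero (0)":              [3, 5, 7, 3, 5, 7, 3, 5, 7],  # Figure 10.7
}

for label, y in datasets.items():
    r = np.corrcoef(x, y)[0, 1]  # Pearson r between x and y
    print(f"{label}: r = {r:+.2f}")
```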
The larger a correlation coefficient (whether positive or negative), the stronger the linear relationship, because the Ys are spread out less at each X, and so the data come closer to forming a straight line.

> Quick Practice

> In a positive linear relationship, as X increases, Y increases. In a negative linear relationship, as X increases, Y decreases.

> The larger the correlation coefficient, the more consistently one Y occurs with one X, the smaller the variability in Ys at an X, the narrower the scatterplot, and the more accurate our predictions.

More Examples
A coefficient of +.84 indicates (1) as X increases, Y consistently increases; (2) the Y scores paired with a particular X show little variability; (3) the scatterplot is a narrow ellipse, with the data points lying near the upward-slanting regression line; and (4) by knowing an individual's X, we can closely predict his/her Y score. However, a coefficient of +.38 indicates (1) as X increases, Y somewhat consistently increases; (2) a variety of Y scores are paired with each X; (3) the scatterplot is a relatively wide ellipse around the upward-slanting regression line; and (4) knowing an X score produces only moderately accurate predictions of the paired Y score.

For Practice
1. In a(n) _____ relationship, as the X scores increase, the Y scores increase or decrease only. This is not true in a(n) _____ relationship.
2. The more that you smoke cigarettes, the lower is your healthiness. This is a(n) _____ linear relationship, producing a scatterplot that slants _____ as X increases.
3. The more that you exercise, the better is your muscle tone. This is a(n) _____ linear relationship, producing a scatterplot that slants _____ as X increases.
4. In a stronger relationship, the variability among the Y scores at each X is _____, producing a scatterplot that forms a(n) _____ ellipse.
5. The _____ line summarizes the scatterplot.

> Answers
1. linear; nonlinear  2. negative; down  3. positive; up  4. smaller; narrower  5. regression
10-2 THE PEARSON CORRELATION COEFFICIENT

Statisticians have developed several different linear correlation coefficients that are used with different scales of measurement and different designs. However, the most common correlation in behavioral research is the Pearson correlation coefficient, which describes the linear relationship between two interval variables, two ratio variables, or one interval and one ratio variable. The symbol for the Pearson correlation coefficient in a sample is the lowercase r. All of the example coefficients in the previous section were rs.

Pearson correlation coefficient (r): The coefficient that describes the linear relationship between two interval or ratio variables

restricted range: Occurs when the range of scores on a variable is limited, producing an r that is smaller than it would be if the range was not restricted

In addition to requiring interval or ratio scores, the r has two other requirements. First, the X and Y scores should each form an approximately normal distribution. Second, we should avoid a restricted range of X or Y. A restricted range occurs when the range of scores from a variable is limited so that we have only a few
different scores that are close together. Then we will inaccurately describe the relationship, obtaining an r that is smaller than it would be if the range was not restricted. Generally, a restricted range occurs when researchers are too selective when obtaining participants. So, for example, if we are interested in the relationship between participants' high school grades and their subsequent salaries, we should not restrict the range of grades by studying only honor students. Instead, we should include all students to get the widest range of grades possible.

10-2a Computing the Pearson r

Computing r requires that we have collected pairs of X and Y scores. Then we use the same symbols for Y that we've previously used for X, so $\Sigma Y$ is the sum of the Y scores, $\Sigma Y^2$ is the sum of the squared Y scores, and $(\Sigma Y)^2$ is the squared sum of Y. You will also see two new symbols: We have $\Sigma XY$, called the sum of the cross products. This says to first multiply each X score in a pair times its corresponding Y score. Then sum all of the resulting products. We also have $(\Sigma X)(\Sigma Y)$. This says to find the sum of the Xs and the sum of the Ys. Then multiply the two sums together.

Mathematically, r determines the "average" amount the X and Y scores correspond. However, as you saw in Chapter 5, we compare scores from different variables by transforming them into z-scores. Thus, computing r involves transforming each Y score into a z-score (call it zY) and transforming each X score into a z-score (call it zX). Then, because z-scores involve positive and negative numbers that always balance out to zero, we can measure their correspondence by multiplying each pair of z-scores together as in this formula:

THE DEFINING FORMULA FOR THE PEARSON r IS

$r = \dfrac{\Sigma(z_X z_Y)}{N}$

This says to multiply each zX times the paired zY, sum the products, and divide by N. The answer will always be between ±1. If in each pair the zs tend to have the same sign, then the more the z-scores match, the closer the r is to +1. If in each pair the zs tend to have opposite signs, then the more the two z-scores match, the closer the r is to −1.

However, not only is this formula extremely time-consuming, it also leads to substantial rounding errors. To derive a better computing formula for r, the symbols zX and zY in the above formula were replaced by their respective formulas. Then in each z-score formula, the symbols for the mean and standard deviation were replaced by their computing formulas. This produces a monster of a formula. After reducing it, we have the smaller monster below.

THE COMPUTING FORMULA FOR THE PEARSON r IS

$r = \dfrac{N(\Sigma XY) - (\Sigma X)(\Sigma Y)}{\sqrt{[N(\Sigma X^2) - (\Sigma X)^2][N(\Sigma Y^2) - (\Sigma Y)^2]}}$

In the numerator, the N (the number of pairs of scores) is multiplied by $\Sigma XY$. Then subtract the quantity $(\Sigma X)(\Sigma Y)$. In the denominator, in the left brackets multiply N times $\Sigma X^2$, and from that subtract $(\Sigma X)^2$. In the right brackets, multiply N times $\Sigma Y^2$, and from that subtract $(\Sigma Y)^2$. Multiply the answers from the brackets together, and then find the square root. Then divide the denominator into the numerator and, voilà, the answer is the Pearson r.

For example, say that we ask 10 people the number of times they visited a
doctor in the last year and the number of glasses of
orange juice they drink daily. To describe the linear
relationship between juice drinking and doctor visits,
we compute r. Table 10.1 shows a good way to set up
the data.
STEP 1: Compute X, (X) 2, X 2, Y, (Y) 2,
Y 2, XY, and N. As in Table 10.1, in addition to the columns for X and Y, make columns for X 2 and Y 2. Also, make a column
for XY where you multiply each X times its
paired Y. Then sum all of the columns. Then
square X and Y.
Filling in the formula for r, we get

r = [N(ΣXY) − (ΣX)(ΣY)] / √{[N(ΣX²) − (ΣX)²][N(ΣY²) − (ΣY)²]}

r = [10(52) − (17)(47)] / √{[10(45) − 289][10(275) − 2209]}

STEP 2: Compute the numerator. 10 times 52 is 520, and 17 times 47 is 799. Now, we have

r = (520 − 799) / √{[10(45) − 289][10(275) − 2209]}

Complete the numerator: 799 from 520 is −279. (Note the negative sign.)

STEP 3: Compute the denominator and then divide. First perform the operations within each set of brackets. In the left brackets above, 10 times 45 is 450. In the right brackets, 10 times 275 is 2750. This gives

r = −279 / √[(450 − 289)(2750 − 2209)]

On the left, subtracting 450 − 289 gives 161. On the right, subtracting 2750 − 2209 gives 541. So

r = −279 / √[(161)(541)]

Now multiply: 161 times 541 equals 87,101. After taking the square root we have

r = −279 / 295.129 = −.95
Table 10.1
Sample Data for Computing the r between Orange Juice Consumed (the X variable) and Doctor Visits (the Y variable)

              Glasses of Juice per Day    Doctor Visits per Year
Participant        X         X²                Y         Y²        XY
 1                 0          0                8         64         0
 2                 0          0                7         49         0
 3                 1          1                7         49         7
 4                 1          1                6         36         6
 5                 1          1                5         25         5
 6                 2          4                4         16         8
 7                 2          4                4         16         8
 8                 3          9                4         16        12
 9                 3          9                2          4         6
10                 4         16                0          0         0
N = 10        ΣX = 17    ΣX² = 45          ΣY = 47   ΣY² = 275  ΣXY = 52
             (ΣX)² = 289                  (ΣY)² = 2209
Our correlation coefficient between orange juice drinks and doctor visits is −.95. (Note: We usually round off a correlation coefficient to two decimals.) Interpret this r as we discussed previously: On a scale of 0 to ±1, our −.95 indicates an extremely strong negative linear relationship: Each amount of orange juice is associated with a very small range of doctor visits, and as juice scores increase, doctor visits consistently decrease. Therefore, we envision a very narrow scatterplot that slants downward. Further, based on participants' juice scores, we can very accurately predict their frequency of doctor visits.
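As a cross-check on this arithmetic, here is a minimal Python sketch (an illustration, not part of the text) that applies the computing formula to the Table 10.1 scores; numpy's corrcoef gives the same value.

```python
# Sketch: the computing formula for Pearson r applied to Table 10.1.
import math
import numpy as np

X = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 4])   # glasses of juice per day
Y = np.array([8, 7, 7, 6, 5, 4, 4, 4, 2, 0])   # doctor visits per year
N = len(X)

num = N * (X * Y).sum() - X.sum() * Y.sum()              # -279
den = math.sqrt((N * (X**2).sum() - X.sum()**2) *
                (N * (Y**2).sum() - Y.sum()**2))         # about 295.129
r = num / den

print(round(r, 2))                         # -0.95
print(round(np.corrcoef(X, Y)[0, 1], 2))   # same answer
```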
> Quick Practice
The Pearson correlation coefficient (r)
describes the linear relationship
between two interval and/or ratio
variables.
More Examples

X    Y
1    3
1    2
2    4
2    5
3    5
3    6

To compute r for the above scores: ΣX = 12, (ΣX)² = 144, ΣX² = 28, ΣY = 25, (ΣY)² = 625, ΣY² = 115, ΣXY = 56, and N = 6.

r = [N(ΣXY) − (ΣX)(ΣY)] / √{[N(ΣX²) − (ΣX)²][N(ΣY²) − (ΣY)²]}

r = [6(56) − (12)(25)] / √{[6(28) − 144][6(115) − 625]}

In the numerator, 6 times 56 is 336, and 12 times 25 is 300, so

r = 36 / √{[6(28) − 144][6(115) − 625]}

In the denominator, 6 times 28 is 168; 6 times 115 is 690, so

r = 36 / √[(168 − 144)(690 − 625)] = 36 / √[(24)(65)] = 36 / √1560 = 36 / 39.497 = +.91

For Practice

Compute r for the following:

X    Y
1    1
1    3
2    2
2    4
3    4

> Answers

r = [5(28) − (9)(14)] / √{[5(19) − 81][5(46) − 196]} = 14 / √[(14)(34)] = 14 / 21.817 = +.64

10-3 SIGNIFICANCE TESTING OF THE PEARSON r

The Pearson r describes a sample. Ultimately, however, we wish to describe the relationship that occurs in nature—in the population. Therefore, we use the sample's correlation coefficient to estimate or infer the coefficient we'd find if we could study everyone in the population. But before we can believe that the sample correlation represents the relationship found in the population, we must first perform statistical hypothesis testing and conclude that r is significant.

Never accept that a sample correlation coefficient reflects a relationship in nature unless it is significant.
Here’s a new example. We are interested in the
relationship between a man’s age and his physical
agility. We select 25 men, measure their age and their
Behavioral Sciences STAT2
Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
H a: r ⬆ 0
On the other hand, the null hypothesis is always
that the predicted relationship does not exist, so here
it is that the correlation in the population is zero. So,
H0 : r 0
distribution showing
all possible values of
r that occur when
samples are drawn
from a population in
which r is zero
obtained a slanting elliptical sample scatterplot from
this population, producing an r that does not equal
zero. So, in our example, age and agility are not really
related, but the scores in our sample happen to pair up
so that it looks like they’re related. (On the other hand,
Ha implies that the population’s scatterplot would be
similar to the sample’s slanting elliptical scatterplot.)
As usual, we test H0, so here we determine the likelihood of obtaining our sample’s r when r is zero. To do
so, we examine the sampling distribution of r. To create
this, it is as if we infinitely sample the population back
in Figure 10.8, each time computing r. The sampling
distribution of r shows all possible values of the r
Figure 10.8
Scatterplot of a Population for Which r 0, as
Described by H0
Our r results from sampling error when selecting a sample from this
scatterplot.
Sample
scatterplots
Test scores
These are the two-tailed hypotheses whenever you
test that the sample either does or does not represent a
relationship. This is the most common approach and
the one we’ll use. (You can also test the H0 that your
sample represents a nonzero r. Consult an advanced
statistics book for details.)
sampling
distribution
of r A frequency
© iStockphoto.com/Alexander Yakovlev
agility, and using the previous formula, compute that
r .45. This suggests that the correlation in the
population would also be .45.
The symbol for the Pearson population correlation
coefficient is the Greek letter R, called “rho.” A r is interpreted in the same way as r: It is a number between 0 and
{1, indicating either a positive or a negative linear relationship in the population. The larger the absolute value
of r, the stronger the relationship and the more closely
the population’s scatterplot hugs the regression line.
Thus, we might estimate that r would equal .45
if we measured the agility and age of all men. But, on
the other hand, there is always the potential problem
of sampling error. Maybe these variables are not really
related in nature, but, through the luck of the draw of
who we tested, the sample data coincidentally form a
pattern that produces an r equal to .45. This leads
to our statistical hypotheses. As usual, we can perform
either a one- or two-tailed test.
The two-tailed test is used when we do not predict
the direction of the relationship, predicting that the correlation will be either positive or negative. First, any
alternative hypothesis always says the predicted relationship exists. If there is a correlation in the population
that is either positive or negative, then r does not equal
zero. So,
10-3a The Sampling
Distribution of r
Our H0 implies that we obtained our r because of
sampling error—that is, we have an unrepresentative
sample that poorly represents zero correlation in the
population. You can understand this by looking at
Figure 10.8. It shows the scatterplot in the population
that H0 says we would find: There is no relationship
here, so r is 0. However, H0 implies that, by chance, we
Age
Chapter 10: Describing Relationships Using Correlation and Regression
175
Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
that occur by chance when samples are drawn from a population in which ρ is zero. Such a distribution is shown in Figure 10.9. The only novelty here is that along the X axis are different values of r. When ρ = 0, the most frequent sample r is also 0, so the mean of the sampling distribution—the average r—is 0. Because of sampling error, however, sometimes we might obtain a positive r and sometimes a negative r. Most often the r will be relatively small (close to 0), but less frequently we may obtain a larger r that falls more into the tails of the distribution. Thus, the larger the r (whether positive or negative), the less likely it is to occur when the sample represents a population where ρ = 0.

To test H0, we determine where our r is located in this "r-distribution." The size of our r directly communicates its location. For example, the mean of the sampling distribution is always 0, so our r of −.45 is a distance of .45 below the mean. Therefore, we test H0 simply by examining our obtained r, which is robt. To determine whether robt is in the region of rejection, we compare it to rcrit.
10-3b Drawing Conclusions about r
As with the t-distribution, the shape of the r-distribution is slightly different for each df, so there is a different value of rcrit for each df. Table 3 in Appendix B gives the critical values of the Pearson correlation coefficient. Use this "r-table" in the same way you've used the t-table: Find rcrit for either a one- or a two-tailed test at the appropriate α and df.
THE FORMULA FOR THE DEGREES OF FREEDOM FOR THE PEARSON CORRELATION COEFFICIENT IS:

df = N − 2

where N is the number of X-Y pairs in the data
[Figure 10.9: Sampling Distribution of r When ρ = 0. It is an approximately normal distribution, with values of r plotted along the X axis, ranging from −1.0 to +1.0 around a mean (μ) of 0.]

[Figure 10.10: H0 Sampling Distribution of r When H0: ρ = 0. For the two-tailed test, there is a region of rejection for positive values of robt and for negative values of robt; here rcrit = ±.396 and robt = −.45.]
For our example, N was 25, so df = 23. For α = .05, the two-tailed rcrit is ±.396. We set up the sampling distribution as in Figure 10.10. An robt of −.45 is beyond the rcrit of ±.396, so it is in the region of rejection. Thus, our r is so unlikely to occur if we had been representing the population where ρ is 0 that we reject the H0 that we were representing this population. We conclude that the robt is "significantly different from zero." (If your df is not listed in the r-table, use the bracketing df and critical values as we did for the t-test.)

The rules for interpreting a significant result here are the same as with previous statistics. In particular, α is again the theoretical probability of a Type I error. Here, a Type I error is rejecting the H0 that the population correlation is zero, when in fact the correlation is zero.
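For readers working in software, here is a hedged Python sketch (not the book's r-table procedure) of the same decision: r is converted to the equivalent t statistic, t = r·√(df/(1 − r²)), and an exact two-tailed p-value replaces the rcrit lookup.

```python
# Sketch: significance test of r via the equivalent t statistic.
import math
from scipy import stats

r_obt = -.45            # obtained r from the age/agility example
N = 25                  # number of X-Y pairs
df = N - 2              # df = N - 2 for the Pearson r

t = r_obt * math.sqrt(df / (1 - r_obt ** 2))
p = 2 * stats.t.sf(abs(t), df)   # two-tailed p-value

print(round(p, 3))               # about .02 -- less than alpha = .05,
                                 # so r is significant, as in the text
```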
Because the sample robt is −.45, our best estimate is that in the population, ρ equals −.45. However, recognizing that the sample may contain sampling error, we say that ρ is around −.45. (We could more precisely describe this ρ by computing a confidence interval for ρ: Consult an advanced textbook.)

If robt did not lie beyond rcrit, then we would retain H0 and conclude that the sample may represent a population where ρ = 0. However, we have not proven there is no relationship in the population. Also, we would be concerned about having sufficient power so that we did not make a Type II error by missing a relationship that exists. Therefore, we make no claims about the relationship, one way or the other.
10-3c One-Tailed Tests of r

If we had predicted only a positive correlation or only a negative correlation, then we would have performed a one-tailed test.

THE ONE-TAILED HYPOTHESES FOR TESTING A CORRELATION COEFFICIENT

Predicting a positive correlation: H0: ρ ≤ 0; Ha: ρ > 0
Predicting a negative correlation: H0: ρ ≥ 0; Ha: ρ < 0

[Figure 10.11: H0 Sampling Distribution of r Where ρ = 0 for One-Tailed Tests. When predicting a positive correlation, the region of rejection is in the upper tail of the distribution, beyond +rcrit; when predicting a negative correlation, it is in the lower tail, beyond −rcrit.]

When predicting a positive relationship, we are saying that ρ will be greater than zero (in Ha) but H0 says we are wrong. When predicting a negative relationship, we are saying that ρ will be less than zero (in Ha) but H0 says we are wrong.
We test each H0 by again examining the sampling distribution of r, created for when ρ = 0. From the r-table in Appendix B, find the one-tailed critical value for df and α, and set up one of the sampling distributions shown in Figure 10.11. When predicting a positive correlation, use the upper distribution: robt is significant if it falls beyond the positive rcrit. When predicting a negative correlation, use the lower distribution: robt is significant if it falls beyond the negative rcrit.
10-3d Summary of the Pearson Correlation Coefficient

The Pearson correlation coefficient describes the strength and direction of a linear relationship between normally distributed interval/ratio variables.
1. Compute robt.
2. Create either the two-tailed or the one-tailed H0 and Ha.
3. Set up the sampling distribution and, using df = N − 2 (where N is the number of pairs), find rcrit in the r-table.
4. If robt is beyond rcrit, the results are significant, so describe the relationship and the population correlation coefficient (ρ). If robt is not beyond rcrit, the results are not significant, so make no conclusion about the relationship.
5. Further describe significant relationships using the linear regression and r² procedures discussed in the next sections.
> Quick Practice

You should always perform hypothesis testing on a correlation coefficient to be confident it is not produced by sampling error.

More Examples

We compute an r = +.32 (N = 42). We predict some kind of relationship, so H0: ρ = 0; Ha: ρ ≠ 0. With α = .05 and df = 42 − 2 = 40, the two-tailed rcrit = ±.304. The robt is beyond rcrit, so it is significant: We expect the population correlation coefficient (ρ) to be around +.32.

For Practice

We predict a negative relationship and obtain robt = −.44.
1. What are H0 and Ha?
2. With α = .05 and N = 10, what is rcrit?
3. What is the conclusion about robt?
4. What is the conclusion about the relationship in the population?

> Answers
1. H0: ρ ≥ 0; Ha: ρ < 0
2. df = 8, rcrit = −.549
3. not significant
4. Make no conclusion.
linear regression: The procedure used to predict Y scores based on correlated X scores.
predictor variable: The known X scores used in linear regression procedures to predict unknown Y scores.
criterion variable: The unknown Y scores that are predicted in linear regression procedures.
10-4 STATISTICS IN THE RESEARCH LITERATURE: REPORTING r

Report the Pearson correlation coefficient using the same format as with previous statistics. Thus, in our agility study, the robt of −.45 was significant with 23 df. We report this as r(23) = −.45, p < .05. As usual, df is in parentheses, and because α = .05, the probability of a Type I error is less than .05.
Understand that, although theoretically a correlation coefficient may be as large as ±1, in real research such values do not occur. Any data will reflect the behaviors of living organisms, who always show variability. Therefore, adjust your expectations about real correlation coefficients: Typically, researchers obtain coefficients in the neighborhood of ±.30 to ±.50, so below ±.30 is considered rather weak and above ±.50 is considered extremely strong.

Finally, published research often involves a rather large N, producing a complex scatterplot that is difficult to read. Therefore, instead, a graph showing only the regression line may be included.
10-5 AN INTRODUCTION
TO LINEAR REGRESSION
We’ve seen that in a relationship, particular Y scores are
naturally paired with particular X scores. Therefore, if
we know an individual’s X score and the relationship
between X and Y, we can predict the individual’s Y
score. The statistical procedure for making such predictions is called linear regression. Linear regression
is the procedure for predicting unknown Y scores
based on known correlated X scores. To use regression, we first establish the relationship by computing
r for a sample and determining that it is significant.
Then, essentially, the regression procedure identifies
the Y score that everyone in our sample scored around
when they scored at a particular X. We predict that
anyone else at that X would also score around that Y.
Therefore, we can measure the X scores of individuals
who were not in our sample and we have a good idea
of what their corresponding Y score would be.
For example, the reason that students take the
Scholastic Aptitude Test (SAT) when applying to some
colleges is because researchers have previously established that SAT scores are positively correlated with
college grades: We know the typical college grade
average (Y) that is paired with a particular SAT score
(X). Therefore, through regression techniques, the SAT
scores of applying students are used to predict their
future college performance. If the predicted grades are
high enough, the student is admitted to the college.
Because we base our predictions on someone’s X
score, in correlational research the X variable is often
referred to as the predictor variable. The Y variable
is called the criterion variable. Thus, above, SAT
score is the predictor variable and college grade average is the criterion variable.
You can understand the regression procedure by
looking at Figure 10.12. First, in the scatterplot on
the left we see that participants who scored an X of 1
[Figure 10.12: Graphs Showing the Actual Y Scores and the Predicted Y Scores from the Regression Line. Left panel ("Actual scores"): Y scores (0–7) plotted against X scores (0–5). Right panel ("Predicted scores"): the regression line, with the predicted Y′ = 4 marked at X = 3.]
had Ys of 1, 2, or 3, so literally they scored around 2. Thus, our best prediction for anyone in the future who scores an X of 1 is to predict a Y of 2. Likewise, for any other X, our best prediction is the central value of Y. So, for example, for people scoring an X of 3, we see Ys of 3, 4, and 5, so for anyone else scoring an X of 3, we predict a Y of 4.

Notice that because the regression line passes through the center of the scatterplot, you could obtain these predicted Y scores by traveling vertically from an X until you intercept the regression line, and then traveling horizontally until you intercept the Y axis. For example, the arrows in the right-hand graph of Figure 10.12 show that at the X of 3, the predicted Y score is again 4. The symbol for this predicted Y score is Y′, which is pronounced "Y prime." In fact, the Y′ at any X is the value of Y falling on the regression line. Therefore, the regression line consists of the data points formed by pairing every possible value of X with its corresponding value of Y′. (In a less symmetrical scatterplot, the regression line would still, "on average," pass through the center of the scatterplot so that, considering the entire linear relationship in the data, our best prediction would still be the value of Y′ falling on the regression line.)

Performing linear regression is like reading off the value of Y′ on the regression line at a particular X as we did above, but we can be more precise than that. Instead, we compute the linear regression equation. The linear regression equation is the equation that produces the value of Y′ at each X and defines the straight line that summarizes a relationship. Thus, the
equation allows us to do two things: First, we use the equation to produce the value of Y′ at several Xs. When we plot the data points for these X-Y′ pairs and connect them with a line, we have plotted the regression line. Second, because the equation allows us to determine the Y′ at any X, we can use the equation to directly predict anyone's Y score.

The general form of the linear regression equation is

Y′ = bX + a

The b stands for the slope of the line, a number indicating the degree and direction the line slants. The X stands for the score on the X variable. The a stands for the Y intercept, the value of Y when the regression line intercepts, or crosses, the Y axis.
So, using our data, we compute a and b. Then the formula says that to find the value of Y′ for a given X score, multiply b times the score and add a. Essentially, the regression equation describes how, starting at a particular value of Y (the Y intercept), the Y scores tend to change at a particular rate (described by the slope) as the X scores increase. Then, the center of the Y scores paired with a particular X is Y′. This Y′ is the predicted Y score for anyone scoring that X. (Appendix A.3 provides an expanded discussion of the components of the regression equation and shows their computation. SPSS also computes them.)
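As a concrete illustration (a sketch, not the Appendix A.3 computing formulas), scipy's linregress gives b and a for the Table 10.1 scores, and the equation then predicts Y′ for any X:

```python
# Sketch: computing the regression equation Y' = bX + a with scipy,
# using the Table 10.1 juice/doctor-visit scores.
from scipy import stats

X = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
Y = [8, 7, 7, 6, 5, 4, 4, 4, 2, 0]

result = stats.linregress(X, Y)
b, a = result.slope, result.intercept   # b is about -1.73, a about 7.65

x = 2
y_prime = b * x + a                      # predicted Y' at X = 2
print(round(y_prime, 2))                 # about 4.18
```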
predicted Y score (Y′): In linear regression, the predicted Y score at a particular X, based on the linear relationship summarized by the regression line.
linear regression equation: The equation that produces the value of Y′ at each X and defines the straight line that summarizes a relationship.

Of course, not all relationships are equal, so we are also concerned about the accuracy of our predictions. We determine this accuracy by seeing how well we can predict the actual Y scores in our data. Recall from previous sections that the larger the correlation coefficient, the more accurate our predictions will be. This is because in a stronger relationship, there is less variability or "spread" among the group of Y scores at each X, so the Y scores are closer to the regression line. Therefore, the Y scores at an X are closer to the central Y′
we predict, so our predictions are closer to the actual scores. You can see this in the scatterplot back in Figure 10.5, where a high correlation produced a narrow scatterplot with data points close to the regression line. The data points represent actual Y scores, and the regression line represents Y′ scores, so the Y scores are close to their Y′ and we have greater accuracy. Conversely, the weaker relationship back in Figure 10.6 produced a scatterplot that is vertically spread out so that data points are relatively far from the regression line. Therefore, the actual Y scores are far from their Y′, so we have less accuracy.
Thus, the relative size of r indicates the relative accuracy of our predictions. However, we can directly measure the accuracy by using an advanced statistic called the standard error of the estimate. (See Appendix A.3 for its formula.) The standard error of the estimate is somewhat like the "average" difference between the actual Y scores that participants would obtain and the Y′ scores we predict for them. Therefore, the larger it is, the greater our "average" error when using regression procedures to predict Y scores.
10-6 THE PROPORTION OF VARIANCE ACCOUNTED FOR: r²
From the correlation coefficient we can compute one
more piece of information about a relationship, called
the proportion of variance accounted for. In the previous chapter we saw that with experiments this measured
the “effect size.” With correlational studies, we don’t call
it effect size, because we cannot confidently conclude
that changing X caused Y to change, so we can’t say that
X had an “effect.” The logic, however, is the same. In any
relationship, the proportion of variance accounted
for describes the proportion of all differences in Y scores
that are associated with changes in the X variable. For
example, consider the scores in this simple relationship:
proportion of variance accounted for: In a correlational design, the proportion of the differences in Y scores associated with changes in X.

X    Y
1    1
1    2
1    3
2    5
2    6
2    7
When we change from an X of 1 to an X of 2,
the Ys change from scores around 2 to scores around
6. These are differences in Y associated with changes
in X. However, we also see differences in Y when X
does not change: At an X of 1, for example, one participant had a 1 while someone else had a 2. These
are differences not associated with changes in X.
Thus, out of all the differences among these six Y
scores, some differences are associated with changing X and some differences are not. The proportion
of variance accounted for is the proportion of all differences in Y scores that are associated with changes
in X scores.
Luckily, most of the mathematical operations
needed to measure the variability in Y that is associated with changing X are performed by first computing the Pearson r. Then the proportion of variance
accounted for is
THE FORMULA FOR THE PROPORTION OF VARIANCE ACCOUNTED FOR IS

Proportion of variance accounted for = r²
Not too tough! Compute r and then square it. For example, previously our age and agility scores produced r = −.45, so r² = (−.45)² = .20. Thus, out of all of the differences in the agility scores, .20 or 20% of them are associated with differences in our men's ages. (And 80% of the differences in the agility scores are not associated with changes in age.)
Thus, while r describes the consistency of the pairing of a particular X with a particular Y in a relationship, r² is slightly different: It indicates the extent to which the differences in Y occur along with changes in X. The r² can be as low as 0 (when r = 0), indicating that no differences in Y scores are associated with X, to as high as 1 (when r = ±1), indicating that 100% of the changes in Y occur when X changes, with no differences in Y occurring at the same X. However, we previously noted that in real research, correlation coefficients tend to be between ±.30 and ±.50. Therefore, squaring these values indicates that the proportion of variance accounted for is usually between .09 and .25.
Computing the proportion of variance accounted for is important because it gives an indication of the accuracy of our predictions if we were to perform the linear regression procedure. If 20% of the differences in agility scores are related to a man's age, then by using the resulting regression equation, we can predict these 20% of the differences. In other words, we are, on average, 20% closer to knowing a man's specific agility score when we use this relationship to predict scores, compared to if we did not use—or know about—the relationship. Therefore, this relationship improves our ability to predict agility scores by 20%.
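One way to see "variance accounted for" numerically: r² equals the proportional reduction in squared prediction error when the regression line, rather than the overall mean of Y, is used to predict the Table 10.1 scores. A minimal Python sketch (an illustration, not part of the text):

```python
# Sketch: r^2 as the proportional reduction in squared error,
# computed from the Table 10.1 scores.
import numpy as np
from scipy import stats

X = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 4])
Y = np.array([8, 7, 7, 6, 5, 4, 4, 4, 2, 0])

res = stats.linregress(X, Y)
Y_prime = res.slope * X + res.intercept

ss_total    = ((Y - Y.mean()) ** 2).sum()   # errors using the mean of Y
ss_residual = ((Y - Y_prime) ** 2).sum()    # errors using the regression line

print(round(1 - ss_residual / ss_total, 2))  # about .89
print(round(res.rvalue ** 2, 2))             # same value: r squared
```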
The proportion of variance accounted for is used to judge the usefulness and scientific importance of a relationship. Although a relationship must be significant in order to be potentially important (and for us to even compute r²), it can be significant and still be unimportant. The r² indicates the importance of a relationship because the larger it is, the closer the relationship gets us to our goal of understanding behavior. That is, by understanding the differences in Y scores that are associated with changes in X, we are actually describing the differences in behaviors that are associated with changes in X.

r² indicates the proportion of variance accounted for by the relationship. This is the proportion of all differences in Y that occur with changes in X.

Further, we compare different relationships by comparing each r². Say that in addition to correlating age and agility, we also correlated a man's weight with his agility, finding a significant robt = .60, so r² = .36. Thus, while a man's age accounts for only 20% of the differences in agility scores, his weight accounts for 36% of these differences. Therefore, the relationship involving a man's weight is the more useful and important relationship, because with it we are better able to predict and understand differences in physical agility.
USING SPSS
Review Card 10.4 contains instructions for using SPSS to compute the Pearson r, simultaneously performing either
a one- or two-tailed significance test. SPSS will also compute the mean and estimated population standard deviation for the X and Y scores. A separate routine will compute the components of the linear regression equation.
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. What is the difference between an experiment
and a correlational study in terms of how we
(a) collect the data? (b) examine the relationship?
2. In a correlational study, how do you decide which
variable is X and which is Y?
3. (a) What is a scatterplot? (b) What is a regression
line?
4. Define: (a) positive linear relationship;
(b) negative linear relationship; (c) nonlinear
relationship.
5. (a) When do you compute a Pearson correlation
coefficient? (b) What two characteristics of a linear relationship are described by the coefficient?
6. As the value of r approaches ±1, what does it
indicate about the following? (a) The consistency
that Ys are paired with X; (b) the variability of the
Y scores at each X; (c) the closeness of Y scores to
the regression line at each X; (d) the accuracy with
which we can predict Y if X is known.
7. What are the two statistical explanations for why
we obtain a particular r in a study?
8. (a) What does ρ stand for? (b) How do we determine the value of ρ?
9. Summarize the steps involved in hypothesis testing of the Pearson correlation coefficient, including additional analyses you may perform.
10. (a) What is linear regression used for? (b) How do
researchers set up and use the linear regression procedure to predict unknown Y scores? (c) What is the
symbol for a predicted Y score and what is it called?
11. (a) What is the proportion of variance accounted
for? (b) How is it measured in a correlational
study? (c) How do we use the size of this statistic
to judge a relationship?
12. For each of the following, indicate whether it
is a positive linear, negative linear, or nonlinear
relationship: (a) Quality of performance (Y) increases
with increased mental arousal (X) up to an optimal
level; then quality of performance decreases with
increased arousal. (b) More overweight people (X)
are less healthy (Y). (c) As number of minutes of
exercise increases each week (X), dieting individuals
lose more pounds (Y). (d) The number of bears in an
area (Y) decreases as the area becomes increasingly
populated by humans (X).
13. Ritchie sees the data in question 12(d) and concludes, “We should stop people from moving into
bear country so that we can preserve the bear
population.” Why is he correct or incorrect?
14. John finds r = .60 between the variables of number of hours studied (X) and number of errors on a statistics test (Y). He also finds r = .30 between the variables of size of the classroom (X) and number of errors on the test (Y). (a) Describe
the relative shapes of the two scatterplots.
(b) Describe the relative amount of variability in
Y scores at each X in each study. (c) Describe the
relative closeness of Y scores to the regression line
in each study. (d) Which scatterplot will lead to
more accurate predictions and why?
15. In question 14, Kim claims that the relationship involving hours studied (r = .60) is twice as strong as the relationship with classroom size (r = .30),
so it is twice as accurate for predicting test scores.
(a) Is she correct? (b) Which variable is better for
predicting test errors and how do you know?
16. Andres asked if there is a relationship between
the quality of sneakers worn by a sample of 20
volleyball players and their average number of
points scored per game. He computed r = +.21 and immediately claimed he had evidence that better-quality sneakers are related to better performance. (a) Is his claim correct? Why? (b) What are H0 and Ha? (c) With α = .05, what is rcrit? (d)
What should he conclude, and how should he
report the results? (e) What other computations
should he perform to describe this relationship?
17. Tasha asked whether the number of errors made
on a math test (X ) is related to the person’s level
of satisfaction with his/her performance (Y ). She
obtained these scores.
Participant    Errors (X)    Satisfaction (Y)
1              9             3
2              8             2
3              4             8
4              6             5
5              7             4
6              10            2
7              5             7
(a) Summarize this relationship. (Hint: Compute
something!) (b) What does this tell you about
the sample relationship? (c) What are H0 and Ha?
(d) With α = .05, what is rcrit? (e) What do you
conclude about this relationship in nature? (f) Report
the results using the correct format. (g) What
proportion of the differences in participants’
satisfaction is linked to their error scores?
18. A researcher believes nurses are absent from work
more frequently when they score higher on a test of
“psychological burnout.” These data are collected:
Participant    Burnout (X)    Absences (Y)
1              2              4
2              1              7
3              2              6
4              3              9
5              4              6
6              4              8
7              7              7
8              7              10
9              8              11
(a) Compute the correlation coefficient. (b) What
are H0 and Ha? (c) With α = .05, what is rcrit?
(d) What do you conclude about the strength of
this relationship in nature? (e) Report the results
using the correct format. (f) How scientifically
useful is this relationship?
19. We predict that the more people are initially
attracted to a person of the opposite sex, the
more anxious they become before their first date.
We obtain these scores.
Participant    Attraction (X)    Anxiety (Y)
1              2                 8
2              6                 14
3              1                 5
4              3                 8
5              6                 10
6              9                 15
7              6                 8
8              6                 8
9              4                 7
10             2                 6
(a) Compute the correlation coefficient. (b) Is
this a relatively strong or relatively weak sample
relationship, and what does that tell you about
the Y scores at each X? (c) What are H0 and Ha?
(d) What is rcrit? (e) Report your results using the
correct format. What do you conclude about this
relationship in the population? (f) How much does
knowing participants’ attraction scores improve
your ability to predict their anxiety scores?
20. Ramona measures how positive a person's mood is and how creative he or she is, obtaining the following interval scores:

Participant    Mood (X)    Creativity (Y)
1              10          7
2              8           6
3              9           11
4              6           4
5              5           5
6              3           7
7              7           4
8              2           5
9              4           6
10             1           4
(a) Compute the correlation coefficient. (b) What are H0 and Ha? (c) With α = .05, what is rcrit? (d) What should she conclude about this relationship in nature? (e) Report the results using the correct format. (f) How scientifically useful is this relationship?

21. Alonzo suspects that as a person's stress level changes, so does the amount of his or her impulse buying. He collects data from 72 people and obtains an robt = .38. (a) Using α = .05, is this r significant? (b) Report his results using the correct format. (c) What other statistics would be appropriate to compute?
22. Bertha computes the correlation between
participants’ physical strength and college
grade average using SPSS. She gets r = .09,
p < .001. She concludes that this relationship
is very significant and is a useful tool for
predicting which college applicants are more
likely to succeed academically. Do you agree or
disagree? Why?
23. We wish to determine what happens to creativity
scores as participants’ intelligence scores
increase. Which variable is our X and which
is Y?
24. (a) What do we mean by “restricted range”?
(b) How do researchers create a restricted
range? (c) Why should we avoid a restricted
range?
25. Relationship A has r = .20; Relationship B has r = .40. (a) The relationship with the scatterplot that more closely hugs the regression line is _____. (b) The relationship having Y scores closer to the Y′ scores that we'll predict for them is ______. (c) Relationship A improves our understanding of differences in behavior Y by ______%. (d) Relationship B improves our understanding of differences in behavior Y by ______%. (e) Relationship B is ______ times as useful in predicting scores as Relationship A.
Chapter 11
HYPOTHESIS TESTING USING THE ONE-WAY ANALYSIS OF VARIANCE

LOOKING BACK
Be sure you understand:
• From Chapter 1, what an independent variable, a condition, and a dependent variable are.
• From Chapter 4, that variance measures the differences among scores.
• From Chapter 7, why we limit the probability of a Type I error to .05.
• From Chapter 9, what independent samples and related samples are and what effect size is.

GOING FORWARD
Your goals in this chapter are to learn:
• The terminology of analysis of variance.
• When and how to compute Fobt.
• Why Fobt should equal 1 if H0 is true, and why it is greater than 1 if H0 is false.
• When and how to compute Tukey's HSD test.
• How eta squared describes effect size.

Sections
11-1 An Overview of the Analysis of Variance
11-2 Components of the ANOVA
11-3 Performing the ANOVA
11-4 Performing the Tukey HSD Test
11-5 Statistics in the Research Literature: Reporting ANOVA
11-6 Effect Size and Eta²
11-7 A Word about the Within-Subjects ANOVA

In this chapter we return to analyzing experiments. We have only one more common type of parametric procedure to learn, and it is called the analysis of variance. This chapter will show you (1) the general logic behind the analysis of variance, (2) how to perform this procedure, and (3) an additional part to this analysis called post hoc tests.
11-1 AN OVERVIEW OF
THE ANALYSIS OF VARIANCE
It is important to know about the analysis of variance
because it is the most common inferential statistical
procedure used to analyze experiments. Why? Because
there are actually many versions of this procedure, so
it can be used with many different designs: It can be
applied to independent or related samples, to an independent variable involving any number of conditions,
and to a study involving any number of independent
variables. Such complex designs are common because,
first, an adequate test of the experimental hypotheses
may require such a design. Second, after all of the time
and effort involved in setting up a study, often little
more is needed to test additional conditions or variables.
Then we learn even more about a behavior (which is the
purpose of research). Therefore, you’ll often encounter
the analysis of variance when conducting your own
research or when reading about the research of others.
The analysis of variance has its own language,
which is also commonly used in research:
1. Analysis of variance is abbreviated as ANOVA.
2. An independent variable is also called a factor.
3. Each condition of the independent variable is
also called a level or a treatment, and differences produced by the independent variable are
a treatment effect.
4. The symbol for the number of levels in a factor
is k.
5. We have slightly different formulas for an
ANOVA depending on our design. A one-way
ANOVA is performed when one independent variable is tested. (A “two-way” ANOVA is used with
two independent variables, and so on.)
6. When an independent variable is studied using independent samples, it is called a between-subjects factor and involves using the formulas for a between-subjects ANOVA.
7. When a factor is studied using related samples, it
is called a within-subjects factor and requires
the formulas for a within-subjects ANOVA.
In this chapter we discuss the one-way between-subjects ANOVA. (The formulas for a one-way within-subjects ANOVA are presented in Appendix A.5.) As an example, let's examine how people perform a task, depending on how difficult they believe the task will be—the "perceived difficulty" of the task. We'll create three conditions containing the unpowerful n of
5 participants each and provide them with the same
easy 10 math problems. However, we will tell participants in Level 1 that the problems are easy, in Level 2
that the problems are of medium difficulty, and in
Level 3 that the problems are difficult. Our dependent
variable is the number of problems that participants
correctly solve within an allotted time.
Table 11.1
ANOVA Key Terms

FACTOR: An independent variable.
LEVELS (TREATMENTS): The conditions of the independent variable.
TREATMENT EFFECT: The result of changing the conditions of an independent variable so that different populations of scores having different μs are produced.
ONE-WAY ANOVA: The ANOVA performed when an experiment has only one independent variable.
BETWEEN-SUBJECTS FACTOR: An independent variable that is studied using independent samples in all conditions.
BETWEEN-SUBJECTS ANOVA: The type of ANOVA that is performed when a study involves between-subjects factors.
WITHIN-SUBJECTS FACTOR: An independent variable that is studied using related samples in all conditions.
WITHIN-SUBJECTS ANOVA: The type of ANOVA performed when a study involves within-subjects factors.
The way to diagram such a one-way ANOVA is
shown in Table 11.2. Each column is a level of the
factor, containing the scores of participants tested
under that condition (here, symbolized by X). The
mean of each level is the mean of the scores from
that column. Because we have three levels, k = 3.
(Notice that the general format is to label the factor
as factor A, with levels A1, A2, A3, and so on.)
Our purpose here is the same as in the two-sample t-test, except now we have three conditions. We hope to see a relationship where, as we change the level of perceived difficulty, the mean number of correct solutions will also change. We would like to conclude that this relationship is also found in nature, where each sample and X̄ represent a different population located at a different μ. But there's the usual problem: Maybe changing perceived difficulty does nothing, and if we tested all participants in the population under all three conditions, we would repeatedly find the same population of scores having the same μ. By chance, however, our three samples poorly represent this one population, and because some of the samples happen to contain lower or higher scores than the other samples, we have the appearance of a relationship.
Therefore, we must test whether the differences between our sample means reflect sampling error. The analysis of variance (ANOVA) is the parametric procedure for determining whether significant differences occur in an experiment containing two or more sample means. Notice that when you have only two conditions, you can use either a two-sample t-test or the ANOVA: You'll reach the same conclusions, and both have the same probability of making Type I and Type II errors. However, you must use ANOVA when you have more than two conditions.

analysis of variance (ANOVA): The parametric procedure for hypothesis testing in an experiment containing two or more conditions.
Table 11.2
Diagram of a Study Having Three Levels of One Factor
Each column represents a condition of the independent variable.

Factor A: Independent Variable of Perceived Difficulty
Level A1:    Level A2:    Level A3:
Easy         Medium       Difficult
X            X            X
X            X            X
X            X            X
X            X            X
X            X            X
X̄1           X̄2           X̄3

Conditions: k = 3
The one-way between-subjects ANOVA requires
that
1. all conditions contain independent samples.
2. the dependent scores are normally distributed
interval or ratio scores.
3. the variances of the populations are
homogeneous.
Note: The ns in all conditions need not be equal,
although they should not be massively different. However, these procedures are much easier to perform with
equal ns.
11-1a Controlling the
Experiment-Wise Error Rate
You might be wondering why we even need ANOVA. Couldn't we use the independent-samples t-test to test for significant differences among the three means of our perceived-difficulty study? Couldn't we test whether X̄1 differs from X̄2, then test whether X̄2 differs from X̄3, and then test whether X̄1 differs from X̄3? The answer is that we cannot use this approach because of the resulting probability of making a Type I error (rejecting a true H0).

To understand this, we must distinguish between the probability of making a Type I error when comparing a pair of means and the probability of making a Type I error somewhere in the experiment. In previous chapters we've seen that alpha (α) is the probability of making a Type I error when we compare a pair of means. The probability of making a Type I error somewhere among the comparisons in an experiment is called the experiment-wise error rate. Until now, we have made only one comparison in an experiment, so with α = .05, the experiment-wise error rate was also .05. But if we performed multiple t-tests in the present study, we could make a Type I error when comparing X̄1 to X̄2, or when comparing X̄2 to X̄3, or when comparing X̄1 to X̄3. Now we have three opportunities to make a Type I error, so the overall probability of making at least one Type I error somewhere in the experiment—the experiment-wise error rate—is much greater than .05. Remember, a Type I error is the dangerous error of concluding the independent variable has an effect when really it does not. The probability of making a Type I error should never be greater than .05.

Therefore, we do not perform multiple t-tests because the resulting experiment-wise error rate would be greater than our alpha. Instead, we perform ANOVA because it limits the experiment-wise error rate, so that when we are finished with all of our decisions, the probability that we've made any Type I errors will equal our α.

experiment-wise error rate: The probability of making a Type I error when comparing all means in an experiment.

The reason for performing ANOVA is that it keeps the experiment-wise error rate equal to α.
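To get a rough feel for the inflation, here is a small Python sketch; it assumes, purely for illustration, that the comparisons are independent, so the chance of at least one Type I error is 1 − (1 − α)^c. (The comparisons in a real study are not independent, but the rate is inflated either way.)

```python
# Sketch: approximate experiment-wise error rate for c comparisons,
# under the simplifying assumption that the comparisons are independent.
alpha = .05

for c in (1, 2, 3, 6):
    experiment_wise = 1 - (1 - alpha) ** c   # P(at least one Type I error)
    print(c, round(experiment_wise, 3))
# c = 1 gives .05, but c = 3 already gives about .143 -- well above .05
```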
11-1b Statistical Hypotheses in ANOVA

ANOVA tests only two-tailed hypotheses. The null hypothesis is that there are no differences between the populations represented by the conditions. Thus, for our perceived-difficulty study with the three levels of easy, medium, and difficult, we have

H0: μ1 = μ2 = μ3

In general, when we perform ANOVA on a factor with k levels, the null hypothesis is H0: μ1 = μ2 = . . . = μk. The ". . . = μk" indicates there are as many μs as there are levels. However, the alternative hypothesis is not that all μs are different, or Ha: μ1 ≠ μ2 ≠ μ3. A study may demonstrate differences between some but not all
conditions. Perhaps our data represent a difference between μ1 and μ2, but not between μ1 and μ3, or perhaps only μ2 and μ3 differ. To communicate this idea, the alternative hypothesis is

Ha: not all μs are equal

Ha implies that a relationship is present because the population mean represented by one of the sample means is different from the population mean represented by at least one other sample mean. As usual, we test H0, so ANOVA tests whether all sample means represent the same population mean.

If Fobt is significant, then perform post hoc comparisons to determine which specific means differ significantly.
11-1c The F Statistic and Post Hoc
Comparisons
Completing an ANOVA requires two major steps.
First, we compute the statistic called F to determine
whether any of the means represent different μs. We
calculate Fobt, which we compare to Fcrit.
When Fobt is not significant, it indicates there are
no significant differences between any means. Then,
the experiment has failed to demonstrate a relationship and it’s back to the drawing board.
When Fobt is significant, it indicates only that
somewhere among the means at least two or more
of them differ significantly. But, Fobt does not indicate
which specific means differ significantly. So, if Fobt for
the perceived-difficulty study is significant, we will
know we have one or more significant differences
somewhere among the means of the easy, medium, and
difficult levels, but we won’t know where they are.
Therefore, when Fobt is significant we perform a second statistical procedure, called post hoc comparisons.
Post hoc comparisons are like t-tests in which we
compare all pairs of means from a factor, one pair at a
time, to determine which means differ significantly from
each other. For the difficulty study we’ll compare the
means from easy and medium, from easy and difficult,
and from medium and difficult. However, we perform
post hoc comparisons only when Fobt is significant. A significant Fobt followed by post hoc comparisons ensures
that the experiment-wise probability of
a Type I error will equal our alpha.
post hoc comparisons: Procedures used to compare all pairs of means in a significant factor to determine which means differ significantly from each other.

The one exception to performing post hoc comparisons is when you have only two levels in the factor. Then the significant difference indicated by Fobt must be between the only two means in the study, so it is unnecessary to perform post hoc comparisons.
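For orientation, here is a hedged Python sketch of the overall F test (scipy's f_oneway) using made-up scores for the three perceived-difficulty conditions; it returns only Fobt and its p-value, so a significant result would still need post hoc comparisons:

```python
# Sketch of a one-way between-subjects ANOVA in Python, using
# made-up scores (n = 5 per level); these are not the book's data.
from scipy import stats

easy   = [9, 8, 7, 8, 9]
medium = [7, 6, 7, 8, 6]
hard   = [4, 5, 6, 5, 4]

F_obt, p = stats.f_oneway(easy, medium, hard)
print(round(F_obt, 1), round(p, 4))
# F is about 20.9 with p well below .05: at least two of the three
# means differ significantly, but the F alone does not say which two.
```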
> Quick Practice
>
>
The one-way ANOVA is performed when
testing two or more conditions from one
independent variable.
A significant Fobt followed by post hoc comparisons indicates which level means differ significantly, with the experiment-wise error rate equal to α.

More Examples

We measure the mood of participants after they have won $0, $10, or $20 in a rigged card game. With one independent variable, a one-way design is involved, and the factor is the amount of money won. The levels are $0, $10, and $20. If independent samples receive each treatment, we perform the between-subjects ANOVA. (Otherwise, perform the within-subjects ANOVA.) A significant Fobt will indicate that at least two of the conditions produced significant differences in mean mood scores. Perform post hoc comparisons to determine which levels differ significantly, comparing the mean mood scores for $0 versus $10, $0 versus $20, and $10 versus $20. The probability of a Type I error in the study—the experiment-wise error rate—equals α.
For Practice
1. A study involving one independent variable is a(n)
_____ design.
2. Perform the ____ ANOVA when a study involves
independent samples; perform the _____ ANOVA
when it involves related samples.
3. An independent variable is also called a(n) _____,
and a condition is also called a _____ or _____.
In the ANOVA we compute this variance from
two perspectives, called the mean square within
groups and the mean square between groups.
4. The _____ will indicate whether any of the conditions differ, and then the _____ will indicate which
specific conditions differ.
5. The probability of a Type I error in the study is called
the ______.
> Answers
1. one-way
2. between-subjects; within-subjects
3. factor; level; treatment
4. Fobt; post hoc comparisons
5. experiment-wise error rate
11-2 COMPONENTS OF
THE ANOVA
The logic and components of all versions of the ANOVA are very similar. In each case, the analysis of variance does exactly that—it “analyzes variance.” But we do not call it variance. Instead, ANOVA has its own terminology. We begin with the defining formula for the estimated population variance, with its parts relabeled: the numerator is the sum of squares (SS), the denominator is the degrees of freedom (df), and the quotient is the mean square (MS):

sX² = Σ(X − X̄)² / (N − 1) = SS/df = MS
In the numerator of the variance, we find the sum of
the squared deviations between the mean and each
score. In ANOVA, the “sum of the squared deviations”
is shortened to sum of squares, which is abbreviated
as SS. In the denominator we divide by N − 1, which is our degrees of freedom or df. Recall that dividing the sum of the squared deviations by the df produces the “average” or the “mean” of the squared deviations. In ANOVA, this is shortened to mean square, which is symbolized as MS. So, when we compute MS we are computing an estimate of the population variance. In the ANOVA we compute this variance from two perspectives, called the mean square within groups and the mean square between groups.
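To make the notation concrete, here is a tiny sketch (my own illustration, not from the text) of the computation in Python:

```python
# A minimal sketch (not from the text): the mean square is just SS divided
# by df, which is the estimated population variance.
scores = [3, 5, 7, 9]                       # hypothetical scores
mean = sum(scores) / len(scores)            # the mean is 6
SS = sum((x - mean) ** 2 for x in scores)   # sum of squares: 20
df = len(scores) - 1                        # degrees of freedom: 3
MS = SS / df                                # mean square: about 6.67
print(SS, df, round(MS, 2))
```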
11-2a The Mean Square
within Groups
The mean square within groups describes the
variability in scores within the conditions of an experiment. It is symbolized by MSwn. Recall that variance
is a way to measure the differences among the scores.
Here, we find the differences among the scores within
each condition and “pool” them (like we did in the
independent-samples t-test). Thus, the MSwn is the
“average” variability of the scores within each condition. Because we look at scores within one condition
at a time, MSwn stays the same regardless of whether
H0 is true or false. Either way, the MSwn is an estimate
of the variability of individual scores in any of the
populations being represented.
11-2b The Mean Square
between Groups
The other variance we compute is the mean square
between groups. The mean square between groups
describes the differences between the means of the conditions in a factor. It is symbolized by MSbn. We measure the differences between the means by treating them
as scores, finding the “average” amount they deviate from their mean, which is the overall mean of the experiment. In the same way that the deviations of raw scores around their mean describe how different the scores are from each other, the deviations of the sample means around their overall mean describe how different the sample means are from each other.

As we’ll see, performing the ANOVA involves first using our data to compute the MSwn and MSbn. The final step is to then compare them by computing Fobt.

sum of squares (SS): The sum of the squared deviations of a set of scores around the mean of those scores.
mean square (MS): In ANOVA, an estimated population variance.
mean square within groups (MSwn): Describes the variability of scores within the conditions.
mean square between groups (MSbn): Describes the variability among the means in a factor.

IF H0 IS TRUE, THEN FOBT SHOULD EQUAL 1
11-2c Comparing the Mean Squares: The Logic of the F-Ratio

The key to understanding ANOVA is first understanding what MSbn reflects when the null hypothesis is true. In this case, even though all conditions represent the same population and μ, we will not perfectly represent this because of sampling error, so the means of our conditions will not necessarily equal each other. Thus, MSbn is our way of measuring how much the means of our levels differ from each other because of sampling error. Essentially, here the MSbn is an estimate of the “average” difference between sample means that sampling error will produce when we are representing one underlying raw score population.

The test of H0 is based on the fact that statisticians have shown that when samples of scores are selected from one population, the size of the differences among the sample means will equal the size of the differences among individual scores. This makes sense because how much the sample means differ depends on how much the individual scores differ. Say that the variability in the population is small so that all scores are very close to each other. When we select samples we will have little variety in scores to choose from, so each sample will contain close to the same scores as the next and their means also will be close to each other. However, if the variability is very large, we have many different scores available. When we select samples of these scores, we will often encounter a very different batch each time, so the means also will be very different each time.

In one population, the variability of sample means equals the variability of individual scores.

Here, then, is the logic of the ANOVA. We know that when we are dealing with one population, the variability of sample means from that population equals the variability of individual scores from that population. If H0 is true in our study and we are dealing with one population, then in our data, the variability of the sample means should equal the variability of the individual scores. The MSbn estimates the variability of the sample means, and the MSwn estimates the variability of the individual scores. Therefore: If we are dealing with only one population, our MSbn should equal our MSwn. So, if H0 is true for our study, the answer we compute for MSbn should be the same answer we compute for MSwn.

An easy way to determine if two numbers are equal is to make a fraction out of them, which is what we do when computing Fobt.

THE FORMULA FOR Fobt IS

Fobt = MSbn / MSwn

This fraction is referred to as the F-ratio. The F-ratio equals the MSbn divided by the MSwn. (The MSbn is always on top!)

F-ratio: The ratio of the mean square between groups to the mean square within groups.

If we place the same number in the numerator as in the denominator, the ratio will equal 1. Thus, when H0 is true and we are representing one population, the MSbn should equal the MSwn, so Fobt should equal 1.

Of course Fobt may not equal exactly 1.00 when H0 is true, because we may have sampling error in either MSbn or MSwn. That is, either the differences among our individual scores and/or the differences among our means may inaccurately represent the corresponding differences in the population. Therefore, realistically, we expect that when H0 is true for our study, Fobt will be “around” 1. In fact, statisticians have shown that when Fobt is a fraction less than 1, mathematically this can occur only because H0 is true and we have sampling error in representing this. (Each MS is a variance that squares differences, so Fobt cannot be a negative number.)

It gets interesting, however, as Fobt becomes larger than 1. No matter what our data show, the H0 implies that Fobt is “trying” to equal 1, and if it does not, it’s because of sampling error. Let’s think about that. If Fobt = 2, it is twice what H0 says it should be, although according to H0, we simply had a little bad luck in representing the one population that is present. Or say that Fobt = 4, which means the MSbn is four times the size of MSwn, and Fobt is four times what it should
be if H0 is correct. Yet H0 says that MSbn would have
equaled MSwn, but we happened to get a few unrepresentative scores. If Fobt is, say, 10, then it and the MSbn
are ten times what H0 says they should be! Still, H0
says, “No big deal—a little sampling error.”
As this illustrates, the larger the Fobt, the more
difficult it is to believe that our data are poorly representing the situation where H0 is true. Of course, if
sampling error won’t explain so large an Fobt, then we
need something else that will. The answer is our independent variable. When Ha is true so that changing our
conditions would involve more than one population of
scores, MSbn will be larger than MSwn, and Fobt will be
larger than 1. Further, the more that changing the levels of our factor changes scores, the larger will be the
differences between our level means and so the larger
will be MSbn. However, the MSwn stays the same. Thus,
greater differences produced by our factor will produce
a larger Fobt. Turning this around, the larger the Fobt, the
more it appears that Ha is true. Putting this all together:
THE LARGER THE FOBT, THE LESS LIKELY THAT H0 IS TRUE AND THE MORE LIKELY THAT HA IS TRUE.

If our Fobt is large enough to be beyond Fcrit, we will conclude that our Fobt is so unlikely to occur if H0 were true that we will reject H0 and accept Ha. (This is the logic of all ANOVAs, whether for a between- or a within-subjects design.)

> Quick Practice
> Fobt = MSbn / MSwn
> The MSbn measures the differences among level means.
> The MSwn measures the differences among individual scores.

More Examples
In a study, MSbn = 6 and MSwn = 6, so Fobt = 1. The MSbn equals the MSwn when all samples belong to the same population. Therefore, retain the H0 that all conditions represent the same population. Say instead that MSbn = 24 and MSwn = 6, so Fobt = 4. Because MSbn is so much larger than MSwn, at least two conditions might represent different populations. If Fobt is beyond Fcrit, these results are unlikely to be due to sampling error, so accept the Ha that at least two conditions represent different populations.

For Practice
1. MSwn is the symbol for the _____, and MSbn is the symbol for the _____.
2. Differences between the individual scores in the population are estimated by _____.
3. Differences between sample means in the population are estimated by _____.
4. The larger the Fobt, the _____ likely that H0 is true.

> Answers
1. mean square within groups; mean square between groups
2. MSwn
3. MSbn
4. less

11-3 PERFORMING THE ANOVA

Now we can discuss the computations involved in performing the ANOVA. In the beginning of this chapter you saw that the formula for a mean square involves dividing the sum of squares by the degrees of freedom. In symbols, this is

MS = SS / df

Adding subscripts, we compute the mean square between groups (MSbn) by computing the sum of squares between groups (SSbn) and dividing by the degrees of freedom between groups (dfbn). Likewise, we compute the mean square within groups (MSwn) by computing the sum of squares within groups (SSwn) and dividing by the degrees of freedom within groups (dfwn). With MSbn and MSwn, we compute Fobt.
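Before turning to the computations, the logic of the F-ratio can be checked with a short simulation. This is my own sketch, not part of the text, and the population values (100 and 15) are arbitrary: when every condition samples the same population, Fobt averages close to 1.

```python
# A small simulation sketch (not from the text): when H0 is true and all
# conditions come from one population, Fobt should be "around" 1.
import random

def f_obt(groups):
    k = len(groups)
    n = len(groups[0])
    N = k * n
    grand = sum(sum(g) for g in groups) / N
    ss_bn = sum(n * (sum(g) / n - grand) ** 2 for g in groups)
    ss_wn = sum(sum((x - sum(g) / n) ** 2 for x in g) for g in groups)
    return (ss_bn / (k - 1)) / (ss_wn / (N - k))

random.seed(1)
fs = [f_obt([[random.gauss(100, 15) for _ in range(5)] for _ in range(3)])
      for _ in range(2000)]
print(round(sum(fs) / len(fs), 2))  # near 1, as the H0 logic predicts
```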
If all this strikes you as the most confusing thing
ever devised, you’ll find an ANOVA summary table very
helpful. Table 11.3 below shows the general format:
Table 11.3
Summary Table of One-Way ANOVA

Source     Sum of Squares    df      Mean Square    F
Between    SSbn              dfbn    MSbn           Fobt
Within     SSwn              dfwn    MSwn
Total      SStot             dftot

The “Source” column identifies each source of variability, either between or within, and we also consider the total. Using the following formulas, we’ll compute the components for the other columns.

11-3a Computing Fobt

Say that we performed the perceived-difficulty study discussed earlier, telling participants that some math problems were easy, of medium difficulty, or difficult, and measuring the number of problems they solved. The data are presented in Table 11.4. As shown in the following sections, there are four parts to computing the Fobt: finding (1) the sum of squares, (2) the degrees of freedom, (3) the mean squares, and (4) Fobt.

Table 11.4
Data from Perceived-Difficulty Experiment

Factor A: Perceived Difficulty
Level A1:     Level A2:     Level A3:
Easy          Medium        Difficult      Totals
 9             4             1
12             6             3
 4             8             4
 8             2             5
 7            10             2
ΣX = 40       ΣX = 30       ΣX = 15        ΣXtot = 85
ΣX² = 354     ΣX² = 220     ΣX² = 55       ΣX²tot = 629
n1 = 5        n2 = 5        n3 = 5         N = 15
X̄1 = 8        X̄2 = 6        X̄3 = 3         k = 3

COMPUTING THE SUMS OF SQUARES  The computations here require four steps.

STEP 1: Compute the sums and means. As in Table 11.4, compute ΣX, ΣX², and X̄ for each level (each column). Each n is the number of scores in the level. Then add together the ΣX from all levels to get the total, which is “ΣXtot.” Also, add together the ΣX² from all levels to get the total, which is “ΣX²tot.” Add the ns together to obtain N.

STEP 2: Compute the total sum of squares (SStot).

THE FORMULA FOR THE TOTAL SUM OF SQUARES IS

SStot = ΣX²tot − (ΣXtot)²/N

Using the data from Table 11.4, ΣX²tot = 629, ΣXtot = 85, and N = 15, so

SStot = 629 − (85)²/15 = 629 − 7225/15 = 629 − 481.67

Thus, SStot = 147.33

STEP 3: Compute the sum of squares between groups (SSbn).

THE FORMULA FOR THE SUM OF SQUARES BETWEEN GROUPS IS

SSbn = Σ((ΣX in column)²/n in column) − (ΣXtot)²/N

In Table 11.4, each column represents a level of the factor. Find the ΣX for a column, square that ΣX, and then divide by the n in that level. After doing this for all levels, add the results together and subtract the quantity (ΣXtot)²/N:

SSbn = ((40)²/5 + (30)²/5 + (15)²/5) − (85)²/15

so

SSbn = (320 + 180 + 45) − 481.67
SSbn = 545 − 481.67 = 63.33

STEP 4: Compute the sum of squares within groups (SSwn). Mathematically, SStot equals SSbn plus SSwn. So, the total minus the between leaves the within.

THE FORMULA FOR THE SUM OF SQUARES WITHIN GROUPS IS

SSwn = SStot − SSbn

In the example, SStot is 147.33 and SSbn is 63.33 so

SSwn = 147.33 − 63.33
SSwn = 84.00

COMPUTING THE DEGREES OF FREEDOM  Compute the dfbn, dfwn, and dftot.

STEP 1: The degrees of freedom between groups equal k − 1, where k is the number of levels in the factor. In the example there are three levels of perceived difficulty, so k = 3. Thus, dfbn = 2.

STEP 2: The degrees of freedom within groups equal N − k, where N is the total N in the experiment and k is the number of levels in the factor. In the example N is 15 and k is 3, so dfwn = 15 − 3 = 12.

STEP 3: The degrees of freedom total equals N − 1, where N is the total N in the experiment. In the example N is 15, so dftot = 15 − 1 = 14.

The dftot must equal the dfbn plus the dfwn. At this point the ANOVA summary table looks like this:

Source     Sum of Squares    df    Mean Square    F
Between    63.33             2     MSbn           Fobt
Within     84.00             12    MSwn
Total      147.33            14

COMPUTING THE MEAN SQUARES  You can work directly from the summary table to compute the mean squares.

STEP 1: Compute the mean square between groups.

THE FORMULA FOR THE MEAN SQUARE BETWEEN GROUPS IS

MSbn = SSbn/dfbn

From the summary table,

MSbn = 63.33/2 = 31.67

STEP 2: Compute the mean square within groups.

THE FORMULA FOR THE MEAN SQUARE WITHIN GROUPS IS

MSwn = SSwn/dfwn

For the example,

MSwn = 84/12 = 7.00

Do not compute the mean square for SStot because it has no use.

COMPUTING THE F  Finally, compute Fobt.

THE FORMULA FOR Fobt IS

Fobt = MSbn/MSwn

In the example MSbn is 31.67 and MSwn is 7.00, so

Fobt = MSbn/MSwn = 31.67/7.00 = 4.52
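The arithmetic above is easy to mirror in code. Here is a minimal sketch (my own addition, not part of the text) that reproduces the chapter's numbers for the Table 11.4 data using the formulas just shown:

```python
# A sketch (not from the text) reproducing the chapter's computations
# for the Table 11.4 data with the SS, df, and MS formulas shown above.
easy      = [9, 12, 4, 8, 7]
medium    = [4, 6, 8, 2, 10]
difficult = [1, 3, 4, 5, 2]
groups = [easy, medium, difficult]

N = sum(len(g) for g in groups)                       # 15
k = len(groups)                                       # 3
sum_x_tot = sum(sum(g) for g in groups)               # 85
sum_x2_tot = sum(x * x for g in groups for x in g)    # 629

ss_tot = sum_x2_tot - sum_x_tot ** 2 / N              # 147.33
ss_bn = (sum(sum(g) ** 2 / len(g) for g in groups)
         - sum_x_tot ** 2 / N)                        # 63.33
ss_wn = ss_tot - ss_bn                                # 84.00

ms_bn = ss_bn / (k - 1)                               # 31.67
ms_wn = ss_wn / (N - k)                               # 7.00
print(round(ms_bn / ms_wn, 2))                        # Fobt = 4.52
```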
Now the completed ANOVA summary table is

Source                  Sum of Squares    df    Mean Square    F
Between (difficulty)    63.33             2     31.67          4.52
Within                  84.00             12    7.00
Total                   147.33            14

The Fobt is placed in the row labeled “Between.” (Because this row reflects differences due to our treatment, we may also include the name of the independent variable here.)

> Quick Practice
> To compute Fobt, compute SStot, SSbn, and SSwn and dftot, dfbn, and dfwn. Dividing SSbn by dfbn gives MSbn; dividing SSwn by dfwn gives MSwn. Dividing MSbn by MSwn gives Fobt.

More Examples
We test participants under conditions A1 and A2.

A1            A2
4             6
5             8
3             9
5             8
X̄1 = 4.25     X̄2 = 7.75
ΣX = 17       ΣX = 31        ΣXtot = 48
ΣX² = 75      ΣX² = 245      ΣX²tot = 320
n1 = 4        n2 = 4         N = 8

1. Compute the sums of squares:
SStot = ΣX²tot − (ΣXtot)²/N = 320 − (48²/8) = 32
SSbn = Σ((ΣX in column)²/n in column) − (ΣXtot)²/N = (17²/4 + 31²/4) − 48²/8 = 24.5
SSwn = SStot − SSbn = 32 − 24.5 = 7.5

2. Compute the degrees of freedom:
dfbn = k − 1 = 2 − 1 = 1
dfwn = N − k = 8 − 2 = 6
dftot = N − 1 = 8 − 1 = 7

3. Compute the mean squares:
MSbn = SSbn/dfbn = 24.5/1 = 24.5
MSwn = SSwn/dfwn = 7.5/6 = 1.25

4. Compute Fobt:
Fobt = MSbn/MSwn = 24.5/1.25 = 19.60

For Practice
1. What two components are needed to compute any mean square?
2. For between groups, to compute ______ we divide ______ by ______. For within groups, to compute ______ we divide ______ by ______.
3. Finally, Fobt equals ______ divided by ______.

> Answers
1. The SS and the df
2. MSbn, SSbn, dfbn; MSwn, SSwn, dfwn
3. MSbn, MSwn

F-distribution: The sampling distribution of all values of F that occur when the null hypothesis is true and all conditions represent one population μ.
11-3b Interpreting Fobt
The final step is to compare Fobt to Fcrit, and for that
we examine the F-distribution. The F-distribution is
the sampling distribution showing the various values
of F that occur when H0 is true and all conditions represent one population. To create it, it is as if, using our
ns and k, we select the scores for all of our conditions
from one raw score population (like H0 says we did in
our experiment), and compute MSwn, MSbn, and then
Fobt. We do this an infinite number of times, and plotting the various Fs we obtain produces the sampling
distribution, as shown in Figure 11.1.
The F-distribution is skewed because there is no
limit to how large Fobt can be, but it cannot be less
than zero. The mean of the distribution is 1 because,
most often when H0 is true, MSbn will equal MSwn, and so F will equal 1. The right-hand tail, however, shows that sometimes, by chance, F is greater than 1. Because our Fobt can reflect a relationship in the population only when it is greater than 1, the entire region of rejection is in this upper tail of the F-distribution. (That’s right, ANOVA involves two-tailed hypotheses, but they are tested using only the upper tail of the sampling distribution.)

Figure 11.1
Sampling Distribution of F When H0 Is True for dfbn = 2 and dfwn = 12
[Figure: a positively skewed distribution with its mean near 1; the α = .05 region of rejection is the upper tail beyond Fcrit = 3.88, and Fobt = 4.52 falls inside it.]

The F-distribution is actually a family of curves, each having a slightly different shape, depending on our degrees of freedom. However, two values of df determine the shape of an F-distribution: the df used when computing the mean square between groups (dfbn) and the df used when computing the mean square within groups (dfwn). Therefore, to obtain Fcrit, turn to Table 4 in Appendix B, titled “Critical Values of F.” A portion of this “F-table” is presented in Table 11.5 below. Across the top of the table, the columns are labeled “df between groups.” On the left-hand side, the rows are labeled “df within groups.” Locate the appropriate column and row using the dfs from your study. The critical values in dark type are for α = .05, and those in light type are for α = .01. For our example, dfbn = 2 and dfwn = 12. For α = .05, the Fcrit is 3.88. (If your df is not listed in the F-table, use the bracketing dfs and their critical values as we’ve done previously.)

Table 11.5
Portion of Table 4 in Appendix B, “Critical Values of F”

Degrees of Freedom Within Groups       Degrees of Freedom Between Groups
(degrees of freedom in                 (degrees of freedom in numerator of F-ratio)
denominator of F-ratio)        α          1        2        3        4        5
1                             .05        161      200      216      225      230
                              .01      4,052    4,999    5,403    5,625    5,764
—                              —          —        —        —        —        —
11                            .05       4.84     3.98     3.59     3.36     3.20
                              .01       9.65     7.20     6.22     5.67     5.32
12                            .05       4.75     3.88     3.49     3.26     3.11
                              .01       9.33     6.93     5.95     5.41     5.06

As shown in Figure 11.1, in our perceived-difficulty study, Fobt is 4.52 and Fcrit is 3.88. Our H0 says that Fobt is greater than 1 because of sampling error and that actually, we are poorly representing no relationship in the population. However, our Fobt is beyond Fcrit and in the region of rejection, so we reject H0: Our Fobt is so unlikely to occur if our samples were representing no difference in the population that we reject that this is what they represent. Therefore, we conclude that the Fobt is significant and that the factor of perceived difficulty produces a significant difference in mean performance scores.

Of course, had Fobt been less than Fcrit, then the corresponding differences between our means would not be too unlikely to occur when H0 is true, so we would not reject it. Then, as usual, we’d draw no conclusion about the influence of our independent variable, one way or the other (and we would consider if we had sufficient power to prevent a Type II error).
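For readers working by computer, the table lookup can also be done in software. A minimal sketch (my own addition, not from the text; it assumes SciPy is installed):

```python
# A sketch (not from the text): the F-table lookup done with SciPy.
from scipy import stats

f_crit = stats.f.ppf(1 - 0.05, dfn=2, dfd=12)  # critical value at alpha = .05
p = stats.f.sf(4.52, dfn=2, dfd=12)            # exact p for our Fobt
print(round(f_crit, 2))  # about 3.89, matching the table's 3.88
print(round(p, 3))       # about .034, so p < .05 and Fobt is significant
```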
Because we rejected H0 and accepted Ha, we return to the means from the levels:

Perceived Difficulty
Easy         Medium       Difficult
X̄1 = 8       X̄2 = 6       X̄3 = 3

To see the treatment effect, look at the overall pattern: Because the means change, the scores that produce them are changing, so a relationship is present; as perceived difficulty increases, performance scores decrease. However, we do not know if every increase in difficulty results in a significant decrease in scores. To determine that, we must perform the post hoc comparisons.

11-4 PERFORMING THE TUKEY HSD TEST

Remember that a significant Fobt indicates at least one significant difference somewhere among the level means. To determine which means differ, we perform post hoc comparisons. Statisticians have developed a variety of post hoc procedures that differ in how likely they are to produce Type I or Type II errors. One common procedure that has reasonably low error rates is Tukey’s HSD test. It is used only when the ns in all levels of the factor are equal. The HSD is a rearrangement of the t-test that computes the minimum difference between two means that is required for them to differ significantly. (HSD stands for Honestly Significant Difference.) The four steps to performing the HSD test are:

Tukey’s HSD test: The post hoc procedure performed with ANOVA to compare means from a factor when all levels have equal ns.

STEP 1: Find qk. Use Table 5 in Appendix B, entitled “Values of Studentized Range Statistic.” Locate the column labeled with the k that corresponds to the number of means in your factor. Find the row labeled with the dfwn used to compute your Fobt. (If your df is not in the table, use the df in the table that is closest to it.) Then find the qk for the appropriate α. For our perceived-difficulty study, k = 3, dfwn = 12, and α = .05, so qk = 3.77.

STEP 2: Compute the HSD.

THE FORMULA FOR TUKEY’S HSD IS

HSD = (qk)(√(MSwn/n))

MSwn is the denominator from your significant F-ratio, and n is the number of scores in each level of the factor. In the example MSwn was 7.0, n was 5, and qk is 3.77, so

HSD = (3.77)(√(7.0/5)) = (3.77)(√1.4) = (3.77)(1.183) = 4.46

Thus, HSD is 4.46.

STEP 3: Determine the differences between each pair of means. Subtract each mean from every other mean. Ignore whether differences are positive or negative because this is a two-tailed test. The differences for the perceived-difficulty study can be diagramed as shown below.

Easy (X̄1 = 8) and Medium (X̄2 = 6): difference = 2.0
Medium (X̄2 = 6) and Difficult (X̄3 = 3): difference = 3.0
Easy (X̄1 = 8) and Difficult (X̄3 = 3): difference = 5.0
HSD = 4.46

Listed with each pair of levels is the absolute difference between their means.

STEP 4: Compare each difference to the HSD. If the absolute difference between two means is greater than the HSD, then these means
differ significantly. (It’s as if you performed a t-test on them and tobt was significant.) If the absolute difference between two means is less than or equal to the HSD, then it is not a significant difference (and would not produce a significant tobt).

In our example, the HSD was 4.46. The means from the easy level (8) and the difficult level (3) differ by more than 4.46, so they differ significantly. The mean from the medium level (6), however, differs from the other means by less than 4.46, so it does not differ significantly from them.

Thus, our final conclusion is that we demonstrated a relationship between performance and perceived difficulty, but only when we changed from the easy to the difficult condition. If everyone in the population were tested under these two conditions, we would expect to find two populations of scores, one for the easy condition at a μ around 8, and one for the difficult condition at a μ around 3. Further, we could compute a confidence interval for each μ (using only the scores in a condition and the formula in Chapter 8) to describe an interval within which we are confident the μ produced by the condition would fall. However, we cannot say anything about the population produced by the medium condition, because it did not differ significantly from the other conditions. Finally, as usual, we return to being behavioral researchers and interpret the results in terms of our variables and behaviors: “Psychologically,” why and how does the perceived difficulty of a task influence performance?

11-4a Summary of the one-way ANOVA

It’s been a long haul, but after checking the assumptions, here is how to perform a one-way ANOVA:
1. The null hypothesis is H0: μ1 = μ2 = . . . = μk, and the alternative hypothesis is Ha: not all μs are equal.
2. Compute Fobt.
   a. Compute the sums of squares and the degrees of freedom.
   b. Compute the mean squares.
   c. Compute Fobt.
3. Compare Fobt to Fcrit. Find Fcrit in the F-table using dfbn and dfwn. Envision the F-distribution as in Figure 11.1. If Fobt is larger than Fcrit, then Fobt is significant, indicating that the means in at least two conditions differ significantly.
4. With a significant Fobt, more than two levels, and equal ns, perform the Tukey HSD test.
   a. Find qk in Table 5, using k and dfwn.
   b. Compute the HSD.
   c. Find the difference between each pair of level means.
   d. Any differences larger than the HSD are significant.
5. Draw conclusions about the influence of your independent variable by considering the significant means of your levels. Also consider the measure of effect size described in Section 11-6.

> Quick Practice
> Perform post hoc comparisons when Fobt is significant to determine which levels differ significantly.
> Perform Tukey’s HSD test when all ns are equal.

More Examples
An Fobt is significant, with X̄1 = 4.0, X̄2 = 1.5, and X̄3 = 6.8. All n = 11, MSwn = 20.61, and dfwn = 30. To compute Tukey’s HSD, find qk. For k = 3 and dfwn = 30, qk = 3.49. Then:

HSD = (qk)(√(MSwn/n)) = (3.49)(√(20.61/11)) = 4.78

The differences are 4.0 − 1.5 = 2.5; 1.5 − 6.8 = −5.3; 4.0 − 6.8 = −2.8. Comparing each difference to HSD = 4.78, only X̄2 and X̄3 differ significantly.
For Practice
We have X̄1 = 16.50, X̄2 = 11.50, and X̄3 = 8.92, with n = 21 in each condition, MSwn = 63.44, and dfwn = 60.
1. Which post hoc test should we perform?
2. What is qk here?
3. What is the HSD?
4. Which means differ significantly?

> Answers
1. Tukey’s HSD
2. For k = 3 and dfwn = 60, qk = 3.40.
3. HSD = (3.40)(√(63.44/21)) = 5.91
4. Only X̄1 and X̄3 differ significantly.
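These computations can also be checked by computer. Below is a minimal sketch (my own addition, not from the text): SciPy's f_oneway runs the one-way between-subjects ANOVA on the chapter's Table 11.4 data, and the HSD line simply mirrors the formula from Step 2.

```python
# A sketch (not from the text): the chapter's ANOVA via SciPy, then the
# HSD formula from Step 2 (qk = 3.77 comes from Table 5 in Appendix B).
from math import sqrt
from scipy import stats

easy, medium, difficult = [9, 12, 4, 8, 7], [4, 6, 8, 2, 10], [1, 3, 4, 5, 2]
f_obt, p = stats.f_oneway(easy, medium, difficult)
print(round(f_obt, 2))       # 4.52, matching the summary table

ms_wn, n, qk = 7.0, 5, 3.77  # values from the perceived-difficulty example
hsd = qk * sqrt(ms_wn / n)
print(round(hsd, 2))         # 4.46; only easy vs. difficult (5.0) exceeds it
```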
11-5 STATISTICS IN THE RESEARCH LITERATURE: REPORTING ANOVA

In research publications, an Fobt is reported using the same format as previous statistics, except that we include both the dfbn and the dfwn. In the perceived-difficulty study, the significant Fobt was 4.52, with dfbn = 2 and dfwn = 12. We report this as

F(2, 12) = 4.52, p < .05

Note: In the parentheses always report dfbn and then dfwn.

Usually the HSD value is not reported. Instead, indicate that the Tukey HSD was performed, give the alpha level used, and identify which levels differ significantly. However, for completeness, the means and standard deviations from all levels are reported, even those that do not differ significantly. Likewise, when graphing the results, the means from all levels are plotted.

11-6 EFFECT SIZE AND ETA SQUARED

Recall that in experiments we describe the effect size, which tells us how large an impact the independent variable had on dependent scores. In Chapter 9, you saw one measure of effect size, called Cohen’s d, but it is generally used only with two-sample designs. Instead, with larger designs we compute the proportion of variance accounted for. Recall that this is the proportion of all differences among dependent scores in an experiment that are produced by changing our conditions. The greater the proportion of differences we can account for, the greater the impact of the independent variable in that the more it controls the behavior. This produces a stronger or more consistent relationship in which differences in scores tend to occur only when the conditions change. Then we see a set of similar behaviors and scores for everyone in one condition, with a different set of similar behaviors and scores for everyone in a different condition.

In ANOVA, this effect size is computed by squaring a new correlation coefficient symbolized by the Greek letter “eta” (pronounced “ay-tah”). The symbol for eta squared is η². Eta squared indicates the proportion of variance in dependent scores that is accounted for by changing the levels of a factor. An η² can be used to describe any linear or nonlinear relationship containing two or more levels of a factor. In a particular experiment, η² will be a proportion between 0 and 1, indicating the extent to which dependent scores change as the independent variable changes.

eta squared (η²): A measure of effect size in ANOVA, indicating the proportion of variance in the dependent variable that is accounted for by changing the levels of a factor.

THE FORMULA FOR ETA SQUARED IS

η² = SSbn/SStot

The SSbn reflects the differences that occur when we change the conditions. The SStot reflects the differences among all scores in the experiment. Thus, η² reflects the proportion of all differences in scores that are associated with changing the conditions.

For example, for the perceived-difficulty study, SSbn was 63.33 and SStot was 147.33. So

η² = SSbn/SStot = 63.33/147.33 = .43

Thus, 43% of all differences in scores were accounted for by changing the levels of perceived difficulty. Because 43% is a substantial amount, this factor
plays an important role in determining participants’
scores, so it is also a scientifically important variable
for understanding participants’ underlying behaviors.
η² (eta squared) measures the
effect size of a factor by indicating
the proportion of all differences
in dependent scores that is
accounted for by changing the
levels of a factor.
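Because η² needs nothing beyond the summary table, it is a one-line computation. A minimal sketch (my own illustration, not from the text):

```python
# A minimal sketch (not from the text): eta squared from the summary table.
ss_bn, ss_tot = 63.33, 147.33
print(round(ss_bn / ss_tot, 2))  # 0.43: 43% of the variance accounted for
```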
11-7 A WORD ABOUT THE
WITHIN-SUBJECTS ANOVA
Recall that we create related samples either by
matching participants in each condition or by using
repeated measures of the same participants. However, matching participants is usually unworkable
when we have more than two levels. Instead, in our
perceived-difficulty study, we might have repeatedly measured one group of participants under all
three levels of difficulty. This would equate the conditions for such things as participants’ math ability or their math anxiety. Then, in each condition
we’d give them different but equivalent problems
to solve.
Repeated measures would create a one-way
within-subjects design, so we would perform the one-way within-subjects ANOVA. As shown in Appendix A.5, the only difference in this analysis is how we
compute the denominator of the F-ratio. Otherwise,
we are still testing the same H0 that the conditions do
not represent different populations. We use the same
logic in which Fobt should equal 1 if H0 is true, so the
larger the Fobt is, the less likely that H0 is true and
the more likely that Ha is true. If Fobt is larger than
Fcrit, then it is significant, so we perform Tukey’s HSD
(we’ll have equal ns) and compute η². The results
are reported and interpreted as in the between-subjects design.
USING SPSS
Review Card 11.4 describes how to use SPSS to perform the one-way between-subjects ANOVA, including
the HSD test. The program also computes the mean and estimated population standard deviation for
each condition, determines the 95% confidence interval for each mean, and plots a line graph of the means.
However, it does not compute η² (and the “partial eta squared” provided is not what we’ve discussed). Instructions are also provided for the one-way within-subjects ANOVA. Here, SPSS will not perform the HSD test.
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. What does each of the following terms mean?
(a) ANOVA; (b) one-way design; (c) factor;
(d) level; (e) treatment; (f) between subjects;
(g) within subjects; (h) k.
2. (a) How do you identify which variable in a study
is the factor? (b) How do you identify the levels of
a factor? (c) How do you identify the dependent
variable?
3. What are the two statistical explanations for why
we have differing means among the conditions of
an experiment?
4. (a) Describe all the requirements of a design
for performing the one-way between-subjects
ANOVA. (b) Describe the requirements of a design
for performing the one-way within-subjects
ANOVA.
5. (a) What is the experiment-wise error rate?
(b) How do multiple t-tests create a problem with
the experiment-wise error rate? (c) How does
ANOVA fix this problem?
6. What are two reasons for conducting a study with
more than two levels of a factor?
7. Summarize the steps involved in analyzing an experiment when k > 2.
8. (a) When is it necessary to perform post hoc
comparisons? (b) Why do we perform post hoc
tests? (c) What is the name of the post hoc
test discussed in this chapter? (d) When do you
perform it?
9. In words, what are H0 and Ha in ANOVA?
10. What are the two types of mean squares and what
does each estimate in the population?
11. (a) Why should Fobt equal 1 if the data represent
the H0 situation? (b) Why is Fobt greater than
1 when the data represent the Ha situation?
(c) What does a significant Fobt indicate about the
means of the levels of a factor?
12. (a) What is η² called? (b) What does it measure?
(c) What does it tell us about the influence of a
factor?
13. (a) What does the F-distribution show? (b) What
do we know about an Fobt if it is in the region of
rejection? (c) How does such an Fobt relate back to
what our conditions represent?
14. A study compares four levels. (a) What is H0?
(b) What is Ha? (c) Why is Ha not written as
Ha: μ1 ≠ μ2 ≠ μ3 ≠ μ4?
15. (a) Dixon computes an Fobt of .63. How should this
be interpreted? (b) He computes another Fobt of
1.7. How should this be interpreted?
16. Lauren obtained a significant Fobt from an
experiment with five levels. She immediately
concluded that changing each level results in a
significant change in the dependent variable.
(a) Is she correct? Why or why not? (b) What must
she do?
17. A report says that the between-subjects factor of
participants’ salaries produced significant differences in self-esteem. (a) Describe the design of this
study. (b) What was the outcome of the ANOVA
and what does it indicate? (c) What is the researcher’s next step?
18. A report says that the level of math anxiety for
30 statistics students decreased over the duration of the semester. (a) How were the students
tested? (b) What was the factor? (c) What was the
outcome of the ANOVA and what does it indicate? (d) What do we call testing the same participants in this way? (e) In ANOVA, what do we call
this design?
19. In a study in which k = 3, n = 21, X̄1 = 45.3, X̄2 = 16.9, and X̄3 = 8.2, you obtain these sums of squares:

    Source     Sum of Squares    df    Mean Square    F
    Between    147.32            —     —              —
    Within     862.99            —     —
    Total      1010.31           —
(a) Complete the ANOVA summary table. (b) What
are H0 and Ha? (c) What is Fcrit? (d) What do you
conclude about Fobt? Report your results in the
correct format. (e) Perform the Tukey HSD test if
appropriate. (f) What do you conclude about this
relationship in the population? (g) What is the
effect size in this study, and what does this tell you
about the influence of the independent variable?
20. A researcher investigated the effect of different volumes of background noise on participants’ accuracy rates while performing a difficult task. He tested three groups (n = 11) and obtained the following means: for low volume, X̄ = 66.5; for medium, X̄ = 61.5; for loud, X̄ = 48.25. He computed the following sums of squares:

    Source     Sum of Squares    df    Mean Square    F
    Between    452.16
    Within     522.75
    Total      974.91
(a) Complete the ANOVA summary table. (b) What
are H0 and Ha? (c) What is Fcrit? (d) What do you
conclude about Fobt? Report your results in the
correct format. (e) Perform the Tukey HSD test
if appropriate. (f) What do you conclude about
this relationship? (g) What is the effect size in
this study, and what does this tell you about this
factor?
21. A researcher investigated the number of viral infections people contracted as a function of the amount of stress they experienced during a 6-month period. She obtained the following data:

    Amount of Stress
    Negligible    Minimal    Moderate    Severe
    Stress        Stress     Stress      Stress
    2             1          4           1
    4             3          2           3
    6             5          7           5
    5             7          8           4

    (a) What are H0 and Ha? (b) Complete the ANOVA summary table. (c) What is Fcrit? (d) What do you conclude about Fobt? Report your results in the correct format. (e) Perform the Tukey HSD test if appropriate. (f) What do you conclude about this study? (g) Compute the effect size and interpret it.

22. Here are data from an experiment studying the effect of age on creativity scores:

    Age 4    Age 6    Age 8    Age 10
    3        5        7        4
    3        9        11       14
    10       10       9        12
    9        8        9        7
    7        6        4        5

    (a) Compute Fobt and create an ANOVA summary table. (b) What do you conclude about Fobt? (c) Perform post hoc comparisons if appropriate. (d) What should you conclude about this relationship? (e) How important is age in determining creativity scores? (f) Describe how you would graph these results.

23. In an ANOVA, with dfbn = 4 and dfwn = 51, you have Fobt = 4.63. (a) Is the Fobt significant? (b) How did you determine this?

24. We compare the final exam grades of students taking statistics in the morning (X̄ = 76.2), in the afternoon (X̄ = 74.45), and in the evening (X̄ = 72.53). With n = 10 we compute the following sums of squares.

    Source     Sum of Squares    df    Mean Square    F
    Between    127.60
    Within     693.45
    Total      821.05

    (a) Complete the ANOVA summary table. (b) What is Fcrit? (c) Is there a significant difference between the class means? (d) What other procedures should be performed? (e) Based on these results, what psychological explanation can you give for why the time of day the class meets has no influence on grades?

25. Considering the chapters you’ve read, identify the inferential procedure to perform for the following studies: (a) Doing well in statistics should reduce students’ math phobia, so we measure their fear after selecting groups who received a final grade of either an A, B, C, or D. (b) To determine if recall is better or worse than recognition, participants study a list of words, and then half of them recall the words and the other half perform a recognition test. (c) We test the aggressiveness of a group of rats after 1, 3, 5, and 7 weeks to see if they become more aggressive as they grow older. (d) We want to use students’ scores on the first exam in a course to predict their final exam grades. (e) We ask if pilots have quicker reaction times than the copilots they usually fly with.
Chapter 12
UNDERSTANDING THE TWO-WAY ANALYSIS OF VARIANCE

LOOKING BACK
Be sure you understand:
• From Chapter 11, the terms factor and level, what a significant F indicates, and how to perform and interpret the Tukey HSD test and η².

GOING FORWARD
Your goals in this chapter are to learn:
• What a two-way ANOVA is.
• How to calculate main effect means and cell means.
• What a significant main effect indicates.
• What a significant interaction indicates.
• How to perform the Tukey HSD test on the interaction.
• How to interpret the results of a two-way experiment.

Sections
12-1 Understanding the Two-Way Design
12-2 Understanding Main Effects
12-3 Understanding the Interaction Effect
12-4 Completing the Two-Way ANOVA
12-5 Interpreting the Two-Way Experiment

In the previous chapter, we used ANOVA to test the means from one factor. In this chapter, we’ll expand the experiment to involve two factors. This analysis is similar to the previous ANOVA, except here we compute several Fs. The good news is we will NOT focus on the computations. Nowadays we usually analyze such experiments using SPSS or other programs (although the formulas for the between-subjects version are presented in Appendix A.4). So, think of this chapter as teaching you how to understand the computer’s output. You will frequently encounter such designs in research publications and in research you may conduct yourself, so you need to understand the basic logic, terminology, and purpose of such ANOVAs. The following sections present (1) the general layout of a two-factor experiment, (2) what the ANOVA indicates about the influence of your variables, (3) how to compute a special case of the Tukey HSD that SPSS does not perform, and (4) how to interpret a completed study.
12-1 UNDERSTANDING
THE TWO-WAY DESIGN
The two-way ANOVA is the parametric inferential
procedure that is applied to designs that involve two
independent variables. When both factors involve independent samples, we perform the two-way between-subjects ANOVA. When both factors involve related
samples, we perform the two-way within-subjects
ANOVA. If one factor is tested using independent samples and the other factor involves related samples, we
perform the two-way mixed-design ANOVA. The
logic of these ANOVAs is identical except for slight
variations in the formulas. In this chapter we’ll discuss
the between-subjects version.
A specific design is described using the number of
levels in each factor. If, for example, factor A has two
levels and factor B has two levels, we have a “two-by-two” ANOVA, which is written as 2 × 2. Or, with four levels of one factor and three levels of the other, we have a 4 × 3 ANOVA, and so on. Each factor can
involve any number of levels.
Here’s a semi-fascinating idea for a study. Let’s say we are interested in what aspects of a message make it more or less persuasive. One obvious physical characteristic is how loud the message is: Does a louder message “grab” your attention and make you more persuaded? To answer this question, we will present a recorded message supporting a fictitious politician to participants at one of three volumes. Volume is measured in decibels, but to simplify things we’ll call our volumes soft, medium, and loud. Say we are also interested in differences in how our male and female participants are persuaded, so our other factor is the gender of the listener. So, we have a two-way experiment involving three levels of volume and two levels of gender. The dependent variable measures how persuasive the message is, with higher scores indicating greater persuasiveness.

two-way ANOVA: The parametric procedure performed when an experiment contains two independent variables.
two-way between-subjects ANOVA: The parametric procedure performed when both factors are between-subjects factors.
two-way within-subjects ANOVA: The parametric procedure performed when both factors are within-subjects factors.
two-way mixed-design ANOVA: The parametric procedure performed with one within-subjects factor and one between-subjects factor.

Understand that this two-way design will tell us everything we would learn by conducting two one-way studies: one that compared only the persuasiveness scores from the three levels of volume, and one that compared the scores of women and men. However, the advantage of the two-way design is that we will also be able to study something that we’d otherwise miss—the interaction between gender and volume. For now, think of an interaction as the influence of combining the two factors. Interactions are important because they often influence a behavior in nature. Thus, a primary reason for conducting a study with two (or more) factors is to
observe the interaction between them. A second reason is again that once you’ve created a design for studying one factor, often only a minimum of additional effort is required to include additional factors.

cell: In a two-way ANOVA, the combination of one level of one factor with one level of the other factor.
factorial design: A design in which all levels of one factor are combined with all levels of the other factor.
main effect: The effect on the dependent scores of changing the levels of one factor after collapsing over the other factor.

The way to organize our 3 × 2 design is shown in Table 12.1. In the diagram:
1. Each column represents a level of the volume factor. (In general we’ll call the column factor “factor A.”) Thus, the scores in column A1 are from participants tested under soft volume.
2. Each row represents a level of the gender factor. (In general we’ll call the row factor “factor B.”) Thus, scores in row B1 are from male participants.
3. Each small square produced by combining a level of factor A with a level of factor B is called a cell. Here, we have six cells, each containing a sample of three participants who are one gender and given one volume. For example, the highlighted cell contains scores from three females presented with medium volume. (With 3 participants per cell, we have a total of 9 males and 9 females, so N = 18.)
4. In a “multi-factor” design like this, when we combine all levels of one factor with all levels of the other factor, the design is also called a factorial design. Here, all levels of gender are combined with all levels of our volume factor.

We perform the two-way between-subjects ANOVA if (1) each cell is an independent sample, and (2) we have normally distributed interval or ratio scores that have homogeneous variance. Note: These procedures are much easier when we have equal cell ns throughout, so we’ll assume we do. Then, as in the following sections, any two-way ANOVA involves examining three things: the two main effects and the interaction effect.

12-2 UNDERSTANDING MAIN EFFECTS

The first step in the two-way ANOVA is to examine the influence of each factor by itself. This is called a factor’s main effect. The main effect of a factor is the overall effect that changing the levels of that factor has on dependent scores while we ignore the other factors in the study. So, in the persuasiveness study, we will examine the main effect that changing volume by itself has on scores. Then we will examine the main effect that changing gender by itself has on scores. In any two-way ANOVA, we examine the main effect of factor A and the main effect of factor B.
12-2a The Main Effect of Factor A
In the persuasiveness study, the way to examine the
main effect of volume by itself is to ignore gender. To
do this, we will literally erase the horizontal line that
separates the rows of males and females in Table 12.1.
Table 12.1
A 3 × 2 Design for the Factors of Volume and Gender
Each column represents a level of the volume factor; each row represents a level of the gender factor; each cell contains the scores of participants tested under a particular combination of volume and gender.

                               Factor A: Volume
                     Level A1:    Level A2:    Level A3:
                     Soft         Medium       Loud
Factor B:  Level B1:  9            8            18
Gender     Male       4            12           17
                      11           13           15

           Level B2:  2            9            6
           Female     6            10           8
                      4            17           4
                                                          N = 18
(Each box of three scores is one of the six cells.)
Once we erase that horizontal line, we treat the
experiment as if it were this one-way design:
Factor A: Volume
Level A1:     Level A2:     Level A3:
Soft          Medium        Loud
 9             8             18
 4             12            17
11             13            15
 2             9             6
 6             10            8
 4             17            4
X̄A1 = 6       X̄A2 = 11.5    X̄A3 = 11.33
By ignoring the distinction between males and
females, we simply have six people in each column,
so we have a study consisting of one factor with three
levels of volume. Then, as usual, we find the mean of
each level by averaging the scores in each column.
However, in a two-way design, these means are called
the main effect means. Here we have the main effect
means for volume.
In statistical terminology, we computed the main
effect means for volume by collapsing the factor of
gender. Collapsing a factor refers to averaging
together all scores from all levels of that factor. When
we collapse one factor, we make it disappear (like we
did with gender), so we are left with the main effect
means for the remaining factor. Thus, a main effect
mean is the mean of the level of one factor after collapsing the other factor.
Once we have the main effect means, we can determine the main effect of the factor. To see a main effect,
look at how the main effect means change as the levels
of the factor change. For the main effect of volume,
we look at how persuasiveness scores change as volume increases: Scores go up from around 6 (at soft)
to around 11.5 (at medium), but then drop slightly to
around 11.33 (at high). So it appears there is a main
effect—an influence—of changing the levels of volume.
BUT! There’s the usual problem. Although there
appears to be a relationship between volume and
scores, maybe we are being misled by sampling error.
Maybe changing volume does nothing, but by chance
we happened to get three samples containing different scores. Therefore, to determine if these are significant differences—if there is a significant main
effect of the volume factor—we essentially perform
a one-way ANOVA that compares these main effect
means. The H0 says there is no difference between the levels of factor A in the population, so we have H0: μA1 = μA2 = μA3. The Ha is that at least two of the main effect means reflect different populations, so we have Ha: not all μA are equal.

collapsing  Averaging together scores from all levels of one factor to calculate the main effect means for the other factor

main effect means  The means of the levels of one factor after collapsing the levels of the other factor
When we examine the main effect
of factor A, we look at the overall
mean of each level of A.
We test this H0 by computing an Fobt, which we'll call FA. Approach this exactly as you did the one-way ANOVA in the previous chapter. First, we compare FA to Fcrit, and if it is significant, it indicates that at least two means from factor A differ significantly. Then (with equal ns) we determine which specific levels differ by performing the Tukey HSD test. We also compute the factor's effect size (η²) and graph the main effect means. Then we describe and interpret the relationship (here describing how changing volume influences persuasiveness scores).
12-2b The Main Effect of Factor B
After analyzing the main effect of factor A, we examine the main effect of factor B. To see this main effect,
we collapse the levels of factor A (volume), so we erase
the vertical lines separating the levels of volume back
in Table 12.1. Then we have this:
Factor B: Gender
Level B1: Male       9   4   11   8   12   13   18   17   15     X̄B1 = 11.89
Level B2: Female     2   6    4   9   10   17    6    8    4     X̄B2 = 7.33
We simply have the persuasiveness scores of 9
males and 9 females, ignoring the fact that some of
each heard the message at different volumes. So now
we have a study consisting of one factor with two
levels. Averaging the scores in each row yields the
mean persuasiveness score for each gender. These are
our main effect means for gender.
To see the main effect of this factor, we again look
at the pattern of the means: Apparently, changing
from males to females leads to a drop in scores from
around 11.89 to around 7.33. As usual, though, there
is the problem of sampling error, so to determine if
this is a significant difference—if there is a significant
main effect of the gender factor—we perform essentially another one-way ANOVA that compares these
main effect means. Our H0 says there is no difference between the levels of factor B in the population, so we have H0: μB1 = μB2. Our Ha is that at least two of the main effect means reflect different populations, so we have Ha: not all μB are equal.
We test this H0 by computing another Fobt, which is FB. We compare this to Fcrit, and if FB is significant, it indicates that at least two main effect means from factor B differ significantly. Then, if needed, we perform the Tukey HSD test to determine which means differ, we compute the effect size, and we describe and interpret this relationship (here describing how gender influences persuasiveness scores).

When we examine the main effect of factor B, we look at the overall mean for each level of B by examining the row means.

> Quick Practice

> Collapsing (averaging together) the scores from the levels of factor B produces the main effect means for factor A. Differences among these means reflect the main effect of A.

> Collapsing the levels of A produces the main effect means for factor B. Differences among these means reflect the main effect of B.

More Examples
We compare the effects of two dose levels of a "smart pill" and two levels of age. We have the IQ scores shown here.

                          Factor A: Dose
                      One Pill    Two Pills
Factor B:  10 years     100          140
Age                     105          145       X̄ = 125
                        110          150
           20 years     110          110
                        115          115       X̄ = 115
                        120          120
                     X̄ = 110      X̄ = 130

The column means are the main effect means for dose: The main effect is that mean IQ increases from 110 to 130 as dosage increases. The row means are the main effect means for age: The main effect is that mean IQ decreases from 125 to 115 as age increases.

For Practice
In this study:

            B1            B2
A1       2, 2, 2      11, 10, 9
A2       5, 4, 3       7, 6, 5

1. The means produced by collapsing across factor B equal _____ and _____. They are called the _____ means for factor _____.
2. Describe the main effect of A.
3. The means produced by collapsing across factor A are _____ and _____. They are called the _____ means for factor _____.
4. Describe the main effect of B.

> Answers
1. X̄A1 = 6; X̄A2 = 5; main effect; A
2. Changing from A1 to A2 produces a decrease in scores.
3. X̄B1 = 3; X̄B2 = 8; main effect; B
4. Changing from B1 to B2 produces an increase in scores.
12-3 UNDERSTANDING THE INTERACTION EFFECT

After we have examined the main effects of factors A and B, we examine the effect of their interaction. The interaction of two factors is called a two-way interaction. It is the influence on scores created by combining each level of factor A with each level of factor B. In our example, it is combining each volume with each gender. In general an interaction is identified as A × B. Here, factor A has 3 levels and factor B has 2 levels, so it is a 3 × 2 (say "3 by 2") interaction. Because an interaction examines the influence of combining the levels of the factors, we do not collapse (ignore) either factor. Instead, we examine the cell means. A cell mean is the mean of the scores from one cell. The cell means for the interaction between volume and gender are shown in Table 12.2. Here we will compare the mean in male–soft to the mean in male–medium, then compare it to the mean in female–soft, and so on.

Table 12.2
The Volume by Gender Interaction
Each mean is a cell mean.

                            Factor A: Volume
                      Soft        Medium       Loud
Factor B:  Male      X̄ = 8       X̄ = 11      X̄ = 16.67
Gender     Female    X̄ = 4       X̄ = 12      X̄ = 6

For a main effect we compare the level means. For the interaction effect we compare the cell means.

However, examining an interaction is not as simple as saying that the cell means differ significantly from each other. Instead, the way to look at an interaction is to look at the influence of changing the levels of factor A under one level of factor B. Then see if this effect—this pattern—for factor A is different when you look at the other levels of factor B. For example, here is the first row of Table 12.2, showing the relationship between volume and scores for males. As volume increases, the means also increase, forming an approximately positive linear relationship.

B1: Male
                  Factor A: Volume
           Soft        Medium       Loud
          X̄ = 8       X̄ = 11      X̄ = 16.67

However, now look at the relationship between volume and scores for females, using the cell means from the bottom row of Table 12.2.

B2: Female
                  Factor A: Volume
           Soft        Medium       Loud
          X̄ = 4       X̄ = 12      X̄ = 6
Here, as volume increases, the means first increase but then decrease, producing a nonlinear relationship. Thus, there is a different relationship between volume and persuasiveness scores for each gender level. A two-way interaction effect is present when the relationship between one factor and the dependent scores changes as the levels of the other factor change. In other words, an interaction effect occurs when the influence of changing one factor is not the same for each level of the other factor. Here, for example, increasing volume does not have the same effect on males that it does on females.

An easy way to spot that an interaction effect is present is that you must use the word "depends" when describing the influence of a factor: What effect does increasing the volume have? It depends on whether we're talking about males or females. Likewise, you can see the interaction effect by looking at the difference between males and females at each volume. Who scores higher, males or females? It depends on which level of volume we're talking about.

cell mean  The mean of the scores from one cell in a two-way design

two-way interaction effect  Occurs when the relationship between one factor and the dependent scores depends on the level of the other factor that is present

Conversely, an interaction effect would not be present if the cell means formed the same pattern
for males and females. For example, say the cell means
had been as follows:
                            Factor A: Volume
                      Soft        Medium       Loud
Factor B:  Male      X̄ = 5       X̄ = 10      X̄ = 15
Gender     Female    X̄ = 20      X̄ = 25      X̄ = 30
A two-way interaction effect
indicates that the influence one
factor has on scores depends on
which level of the other factor
is present.
Increasing the volume increases scores by about 5
points, regardless of whether it’s for males or females.
Or, females always score higher, regardless of volume.
Thus, an interaction effect is not present when the
influence of changing the levels of one factor does
not depend on which level of the other variable we
are talking about. In other words, there’s no interaction when we see the same relationship between the
dependent scores and one factor for each level of the
other factor.
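As an illustration (a sketch of ours, not the book's), the "same pattern" idea can be checked numerically: with no interaction the male-female difference is constant across volumes, so the lines are parallel, while the Table 12.2 means give a changing difference.

```python
# Sketch: compare the male-female difference at each volume level.
import numpy as np

parallel = np.array([[5.0, 10.0, 15.0],       # males (the cell means above)
                     [20.0, 25.0, 30.0]])     # females
interaction = np.array([[8.0, 11.0, 16.67],   # males (Table 12.2)
                        [4.0, 12.0, 6.0]])    # females

for cells in (parallel, interaction):
    diff = cells[0] - cells[1]                # male minus female per volume
    same = np.allclose(diff, diff[0])         # constant difference = parallel lines
    print(diff, "no interaction" if same else "interaction pattern")
```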
Here’s another example of an interaction. Say
that in a different study, for one factor we measure
whether participants are in a sad or a happy mood.
Our second factor involves having participants learn a
list of 15 happy words (e.g., love, beauty, etc.) or a list
of 15 sad words (e.g., death, pain, etc.). Each group
then recalls its list. Research suggests we would obtain
mean recall scores forming a pattern like this:
So, in our persuasiveness study it appears we have
an interaction effect. But . . . that’s right: Perhaps by
chance we obtained cell means that form such a pattern, but in the population (in nature), these variables
do not interact in this way. To determine if we have
a significant interaction effect, we perform essentially
another one-way ANOVA that compares the cell
means. To write the H0 and Ha in symbols is complicated, but in words, H0 is that the cell means do not
represent an interaction effect in the population, and
Ha is that at least some of the cell means do represent
an interaction effect in the population.
To test H0, we compute another Fobt, called FAB. If
FAB is significant, it indicates that at least two of the
cell means differ significantly in a way that produces
an interaction effect. Then we perform a slightly different version of the Tukey HSD test, we compute the
effect size, and we describe and interpret the relationship (here describing how the different combinations
of volume and gender influence persuasiveness scores).
                         Words
                   Happy        Sad
Mood   Sad        X̄ = 5       X̄ = 10
       Happy      X̄ = 10      X̄ = 5
Is there an interaction effect here? Yes, because the relationship between recall scores and sad/happy words changes with the level of mood. Participants recall more sad words than happy words when in a sad mood. But they recall fewer sad words than happy words when in a happy mood. So, what is the influence of sad or happy words on recall? It depends on what mood participants are in. Or, in which mood will people recall words best? It depends on whether they are recalling happy or sad words.
> Quick Practice

> We examine the interaction effect by looking at the cell means. An effect is present if the relationship between one factor and the dependent scores changes as the levels of the other factor change.

More Examples
Here are the cell means when factor A is dose of the smart pill and factor B is age of participants.

                         Factor A: Dose
                     One Pill     Two Pills
Factor B:  10 years   X̄ = 105     X̄ = 145
Age        20 years   X̄ = 115     X̄ = 115

(continued)
We see an interaction effect because the influence of
increasing dose depends on participants’ age. Dosage
increases mean IQ for 10-year-olds from 105 to 145, but
it does not change mean IQ for 20-year-olds (always at
115). Or the influence of increasing age depends on
dose. With 1 pill, 20-year-olds score higher (115) than
10-year-olds (105), but with 2 pills, 10-year-olds score
higher (145) than 20-year-olds (115).
For Practice
A study produces these data:

            B1            B2
A1       2, 2, 2      11, 10, 9
A2       5, 4, 3       7, 6, 5
1. The means to examine for the interaction are called
the _____ means.
2. When we change from A1 to A2 for B1, the cell means
are _____ and _____.
3. When we change from A1 to A2 for B2, the cell means
are _____ and _____.
4. How does the influence of changing from A1 to A2
depend on the level of B that is present?
5. Is an interaction effect present?
> Answers
1. cell 2. 2, 4 3. 10, 6 4. Under B1 the means increase;
under B2 they decrease. 5. yes
12-4 COMPLETING
THE TWO-WAY ANOVA
As you’ve seen, in the two-way ANOVA we
compute three Fs: one for the main effect
of factor A, one for the main effect of factor B, and one for the interaction of A B.
The formulas for the two-way betweensubjects ANOVA applied to our persuasiveness data are presented in Appendix A.4.
The completed ANOVA Summary Table is
shown here in Table 12.3. (This is similar
to the output produced by SPSS.)
The row labeled Factor A reflects the differences
between groups due to the main effect of changing
volume. The row labeled Factor B reflects the differences between groups due to the main effect of gender. The row labeled Interaction reflects the differences
between groups (cells) formed by combining volume
with gender. The row labeled Within reflects the differences among individual scores within each cell, which
are then pooled. (In SPSS, this row is labeled Error.)
The logic and calculations for the Fs here are the
same as in the one-way ANOVA. For each, if H0 is
true and the data represent no relationship in the population, then the variability of the means (the MSbn)
should equal the variability of the scores (the MSwn).
Then the F-ratio of the MSbn divided by the MSwn
should equal 1. However, the larger the Fobt, the less
likely that H0 is true.
The novelty here is that we have three versions of the
MSbn, one for factor A, one for factor B, and one for the
interaction. To compute them, in each row of Table 12.3
we first compute the appropriate sum of squares and
then divide by the appropriate df. This produces the
corresponding mean square between groups. In the row
labeled Within, dividing the sum of squares by the df
produces the one MSwn we use to compute all three Fs.
Then, the Fobt for factor A (volume) of 7.14 is produced
by dividing 58.73 by 8.22. The Fobt for factor B (gender)
of 11.36 is produced by dividing 93.39 by 8.22. And
the Fobt for the interaction (volume × gender) of 6.25 is produced by dividing 51.39 by 8.22.
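As a quick arithmetic check (a small sketch of ours, not part of the text), each mean square and F in Table 12.3 follows directly from the sums of squares and dfs:

```python
# Verify Table 12.3: MS = SS/df for each row, and F = MS / MSwn.
ss = {"Factor A (volume)": 117.45, "Factor B (gender)": 93.39,
      "Interaction": 102.77, "Within": 98.67}
df = {"Factor A (volume)": 2, "Factor B (gender)": 1,
      "Interaction": 2, "Within": 12}

ms_wn = ss["Within"] / df["Within"]               # about 8.22
for source in ("Factor A (volume)", "Factor B (gender)", "Interaction"):
    ms = ss[source] / df[source]
    print(f"{source}: F = {ms / ms_wn:.2f}")      # 7.14, 11.36, 6.25
```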
Each Fobt is tested by comparing it to Fcrit from the
F-table in Appendix B. To find the Fcrit for a particular
Fobt, use the dfbn and dfwn used to compute that Fobt.
You may have different dfs for each Fobt. Above, for our factor A and the interaction, we find Fcrit using dfbn = 2 and dfwn = 12. However, for factor B, we use dfbn = 1 and dfwn = 12. Also, which of your Fs are significant depends solely on your particular data: Any combination of the main effects and/or interaction may or may not be significant.
Table 12.3
Completed Summary Table of Two-Way ANOVA

Source                       Sum of Squares    df    Mean Square       F
Between
  Factor A (volume)              117.45         2       58.73        7.14
  Factor B (gender)               93.39         1       93.39       11.36
  Interaction (vol × gen)        102.77         2       51.39        6.25
Within                            98.67        12        8.22
Total                            412.28        17
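For readers working outside SPSS, here is a brief sketch of ours (not the book's procedure) that reproduces Table 12.3 from the raw scores in Table 12.1 using statsmodels; because the design is balanced, the default sum-of-squares type gives the same values.

```python
# Sketch: the two-way between-subjects ANOVA from the Table 12.1 raw scores.
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

cells = {("male", "soft"): [9, 4, 11],      ("male", "medium"): [8, 12, 13],
         ("male", "loud"): [18, 17, 15],    ("female", "soft"): [2, 6, 4],
         ("female", "medium"): [9, 10, 17], ("female", "loud"): [6, 8, 4]}
rows = [{"gender": g, "volume": v, "score": s}
        for (g, v), scores in cells.items() for s in scores]
data = pd.DataFrame(rows)

model = ols("score ~ C(volume) * C(gender)", data=data).fit()
table = anova_lm(model)          # SS, df, and F for A, B, A x B, and Within
# Eta squared for each effect row (the Residual row can be ignored here):
table["eta_sq"] = table["sum_sq"] / table["sum_sq"].sum()
print(table)                     # should match Table 12.3 within rounding
```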
As shown in Appendix A.4, the persuasiveness data produced significant Fs for factor A (volume), for factor B (gender), and for the interaction. Table 12.4 shows all of the means for the persuasiveness study. The means inside the matrix are again the cell means. The means under the matrix are again the column means, which are the main effect means for volume. The means to the right of the matrix are again the row means, which are the main effect means for gender.

Table 12.4
Main Effect and Interaction Means from the Persuasiveness Study

                                Factor A: Volume
                       Level A1:    Level A2:    Level A3:
                       Soft         Medium       Loud
Factor B:  Level B1: Male     8         11          16.67      X̄male = 11.89
Gender     Level B2: Female   4         12           6         X̄fem = 7.33
                       X̄soft = 6    X̄med = 11.5   X̄loud = 11.33

Notice that instead of using the individual raw scores in a column to compute the main effect mean of the column, we can average together the two cell means in the column (e.g., for soft volume, (8 + 4)/2 = 6). Likewise, to compute the main effect mean for a row, we can average together the three cell means in the row (e.g., for males, (8 + 11 + 16.67)/3 = 11.89).

To understand and interpret the results of a two-way ANOVA, you should examine the means from each significant main effect and interaction by performing the HSD test, graphing the means, and computing η².

12-4a Examining the Main Effects

Approach each main effect as the separate one-way ANOVA that we originally diagrammed. As usual, a significant Fobt merely indicates differences somewhere among the main effect means. Therefore, if the ns are equal and we have more than two levels in the factor, we determine which specific main effect means differ by performing Tukey's HSD test. For one factor at a time, we find the differences among all main effect means and then follow the procedure described in the previous chapter.

THE FORMULA FOR THE TUKEY HSD TEST IS

HSD = (qk)(√(MSwn/n))

where MSwn is from the ANOVA and n is the number of scores in a level. Find qk in Table 5 in Appendix B using the k for the factor and dfwn from the ANOVA.

Be aware the k and n may be different for each factor, so you may need to compute a separate HSD for each factor. In our study, k for volume is 3, but k for gender is 2. Also, be careful when identifying n. Look back at Table 12.1. When we collapsed gender to get the main effect means of volume, we combined two groups of 3 scores each, so our n in the HSD when comparing volume means is 6. (There are 6 scores in each column.) However, when we collapsed volume to get the main effect means of gender, we combined three groups of 3 scores each, so our n in the HSD comparing gender means is 9! (There are 9 scores in each row.)

As shown in Appendix A.4, the HSD test for our main effect means for the volume factor indicates that the soft condition (6) differs significantly from both the medium (11.5) and loud (11.33) levels. However, medium and loud do not differ significantly. For the gender factor, it must be that the mean for males (11.89) differs significantly from the mean for females (7.33). If factor B had involved more than two levels, however, we would compute a new HSD and proceed as usual.
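A sketch of ours (assuming SciPy 1.7 or later for the studentized range distribution) shows how the two main-effect HSDs use different k and n:

```python
# HSD for each main effect; note the different k and n per factor.
from math import sqrt
from scipy.stats import studentized_range

ms_wn, df_wn = 8.22, 12
factors = [("volume", 3, 6, [6, 11.5, 11.33]),   # k = 3 levels, n = 6 per level
           ("gender", 2, 9, [11.89, 7.33])]      # k = 2 levels, n = 9 per level

for name, k, n, means in factors:
    qk = studentized_range.ppf(0.95, k, df_wn)   # qk at alpha = .05
    hsd = qk * sqrt(ms_wn / n)
    print(name, round(hsd, 2), means)            # differences > HSD are significant
```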
Also, it is appropriate to produce a separate graph for each significant main effect. As you saw in Chapter 3, we show the relationship between the levels of the factor (independent variable) on the X axis, and the main effect means (dependent variable) on the Y axis. Include all means, even those that do not differ significantly.

Finally, SPSS will not compute the effect size. Therefore, you should compute η² to determine the proportion of variance accounted for by each significant main effect.

THE FORMULA FOR η² IS

η² = SSbn/SStot

To compute η² for factor A, use the sum of squares for factor A as the SSbn in the above formula. So, from Table 12.3, η² = 117.45/412.28 = .28. To compute η² for factor B, use the sum of squares for factor B as the SSbn, so 93.39/412.28 = .23. Thus, changing volume accounts for 28% of the differences in persuasiveness scores, and changing gender accounts for 23% of the variance in these scores.

12-4b Examining the Interaction Effect

We examine an interaction effect in the same ways that we examine a main effect. So first, using the above formula, we can determine η² for the interaction. We find the sums of squares back in Table 12.3, and then η² = 102.77/412.28 = .25. Thus, our various combinations of volume and gender in the interaction account for 25% of the differences in persuasiveness scores.

Second, an interaction can be a beast to interpret, so always graph it! As usual, label the Y axis with the mean of the dependent variable. To produce the simplest graph, place on the X axis the factor with the most levels. So, for the persuasiveness study the X axis is labeled with the three volume levels. Then we plot the cell means.

The resulting graph is shown in Figure 12.1. As in any graph, we are showing the relationship between the X and Y variables, so here we show the relationship between volume and persuasiveness. However, we plot two lines on the graph to show this relationship twice, once for males and once for females. So, approach this in the same way as when we examined the means for the interaction effect. There we first looked at the relationship between volume and persuasiveness scores for only males: From Table 12.4 their cell means are X̄soft = 8, X̄medium = 11, and X̄loud = 16.67. We plot these three means and connect the data points with straight lines. Then look at the relationship between volume and scores for females: Their cell means are X̄soft = 4, X̄medium = 12, and X̄loud = 6. We plot these means and connect their adjacent data points with straight lines. (Notice: Always provide a key to identify each line.)

The way to read the graph is to look at one line at a time. For males (the dashed line), as volume increases, mean persuasiveness scores increase. However, for females (the solid line), as volume increases, persuasiveness scores first increase but then decrease. Thus, we see one relationship for males and a different relationship for females, so the graph shows an interaction effect.

Figure 12.1
Graph of Cell Means, Showing the Interaction of Volume and Gender
[Line graph: mean persuasiveness (0 to 18) on the Y axis; volume of message (Soft, Medium, Loud) on the X axis; one line for males, one for females.]
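A short matplotlib sketch of ours (not from the text) produces a graph like Figure 12.1 from the cell means:

```python
# Sketch of Figure 12.1: one line per gender across the three volume levels.
import matplotlib.pyplot as plt

volumes = ["Soft", "Medium", "Loud"]
plt.plot(volumes, [8, 11, 16.67], "o--", label="Male")   # dashed line for males
plt.plot(volumes, [4, 12, 6], "o-", label="Female")      # solid line for females
plt.xlabel("Volume of message")
plt.ylabel("Mean persuasiveness")
plt.ylim(0, 18)
plt.legend()    # the key identifying each line
plt.show()      # nonparallel lines reveal the interaction
```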
An interaction effect is present
when its graph produces lines
that are not parallel.
Graph the interaction by drawing
a separate line for each level
of one factor that shows the
relationship between the other
factor on the X axis and the
dependent (Y) scores.
Take note that an interaction effect can produce a graph showing any kind of pattern, except that it always produces lines that are not parallel. Each line shows the relationship between X and Y, so a line that is shaped or oriented differently from another line indicates a different relationship. Therefore, when the lines are not parallel, they indicate that the relationship between X and Y changes depending on the level of the second factor, so an interaction effect is present. Conversely, when an interaction effect is not present, the lines will be virtually parallel, with each line depicting essentially the same relationship between X and Y.

12-4c Performing the Tukey HSD Test on the Interaction

We also apply the Tukey HSD test to a significant interaction effect so that we can determine which of the cell means differ significantly. However, SPSS will not perform this test, and it is slightly different from the test for main effects.

First, recognize that we do not compare every cell mean to every other cell mean. Look at Table 12.5. We would not, for example, compare the mean for males at soft volume to the mean for females at medium volume. Because the two cells differ both in terms of gender and volume, we cannot determine which of these variables caused the difference. Therefore, we are confused or "confounded." A confounded comparison occurs when two cells differ along more than one factor. Other examples of confounded comparisons would involve comparing cells connected by the dashed lines in Table 12.5.

Instead, we perform only unconfounded comparisons, in which two cells differ along only one factor. The cells connected by solid lines in Table 12.5 are examples of unconfounded comparisons. Thus, we compare only cell means within the same column because these differences result from factor B. We compare means within the same row because these differences result from factor A. We do not, however, make any diagonal comparisons, because they are confounded comparisons.

confounded comparison  A comparison of two cells that differ along more than one factor

unconfounded comparison  A comparison of two cells that differ along only one factor

Table 12.5
Interaction Means for Persuasiveness Study
Any horizontal or vertical comparison is unconfounded; any diagonal comparison is confounded.

                              Factor A: Volume
                        Soft        Medium       Loud
Factor B:  B1: Male    X̄ = 8       X̄ = 11      X̄ = 16.67
Gender     B2: Female  X̄ = 4       X̄ = 12      X̄ = 6
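To make the rule concrete, here is a small sketch of ours that enumerates which pairs of cells in Table 12.5 are unconfounded (they share a row or a column) and which are confounded (diagonal pairs):

```python
# Sketch: classify every pair of cells as unconfounded or confounded.
from itertools import combinations

cells = {("male", "soft"): 8, ("male", "medium"): 11, ("male", "loud"): 16.67,
         ("female", "soft"): 4, ("female", "medium"): 12, ("female", "loud"): 6}

for (c1, m1), (c2, m2) in combinations(cells.items(), 2):
    if c1[0] == c2[0] or c1[1] == c2[1]:       # same gender row or same volume column
        print("unconfounded:", c1, "vs", c2, "diff =", round(abs(m1 - m2), 2))
    else:                                      # differs on both factors
        print("confounded (skip):", c1, "vs", c2)
```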
We also have one other difference when performing the HSD test on an interaction. Recall that to compute the HSD requires qk. Previously, we found qk in Table 5 in Appendix B using k, the number of means being compared. However, because we are not comparing all cell means, we must "adjust k." You obtain the adjusted k from the table titled Values of Adjusted k at the top of Table 5 in Appendix B. A portion of the table is below:

Values of Adjusted k

Design of Study    Number of Cell Means in Study    Adjusted Value of k
2 × 2                           4                            3
2 × 3                           6                            5
2 × 4                           8                            6

In the left-hand column locate your design: Ours is a 3 × 2 or a 2 × 3. In the middle column confirm the number of cell means in the interaction: We have 6. In the right-hand column is the adjusted k: For our study it is 5. Use the adjusted k as the value of k for finding qk. For our study, in Table 5 we look in the column for k equal to 5. With α = .05 and dfwn = 12, the qk is 4.51. We compute the HSD using this qk and the usual formula. Notice that n is different than in the main effects: Back in Table 12.1, each cell mean is based on the 3 scores in the cell, so now the n in the HSD is 3.

To complete the HSD test, look back at Table 12.5. For each column, we subtract every mean in the column from every other mean in that column. For each row, we subtract every mean in the row from every other mean in that row. Any difference between two cell means that is larger than the HSD is significant. As shown in Appendix A.4 our persuasiveness study produced only three significant differences: (1) between females at soft volume and females at medium volume, (2) between males at soft volume and males at loud volume, and (3) between males at loud volume and females at loud volume.

Performing the HSD test on the interaction requires making only unconfounded comparisons and finding the adjusted k.

> Quick Practice

> The graph of an interaction shows the relationship between one factor on X and dependent scores on Y for each level of the other factor.

> When performing Tukey's HSD test on an interaction effect, determine the adjusted value of k and make only unconfounded comparisons.

More Examples
We obtain the cell means below on the left. To produce the graph of the interaction on the right, plot data points at 2 and 6 for B1 and connect them with a solid line. Plot data points at 10 and 4 for B2 and connect them with a dashed line.

         A1         A2
B1     X̄ = 2      X̄ = 6
B2     X̄ = 10     X̄ = 4

[Graph: mean score (0 to 10) on the Y axis, A1 and A2 on the X axis; solid line for B1, dashed line for B2.]

Say that dfwn = 16, MSwn = 5.19, and the n per cell is 5. For the HSD, from Table 5 in Appendix B, the adjusted k is 3, so

HSD = (qk)(√(MSwn/n)) = (3.65)(√(5.19/5)) = 3.72

The unconfounded comparisons involve subtracting the means in each column and each row. All differences are significant except when comparing 6 versus 4.

For Practice
We obtain the following data:

         A1          A2
B1     X̄ = 13     X̄ = 14
B2     X̄ = 12     X̄ = 22

The dfwn = 12, MSwn = 4.89, and n = 4.

1. The adjusted k is _____.
2. The qk is _____.
3. The HSD is _____.
4. Which cell means differ significantly?

> Answers
1. 3  2. 3.77  3. 4.17  4. only 12 versus 22 and 14 versus 22
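As a sketch of ours (again assuming SciPy 1.7 or later), the interaction HSD for the persuasiveness study follows directly from the adjusted k:

```python
# Sketch: the interaction HSD using the adjusted k.
from math import sqrt
from scipy.stats import studentized_range

qk = studentized_range.ppf(0.95, 5, 12)   # adjusted k = 5, dfwn = 12 -> about 4.51
hsd = qk * sqrt(8.22 / 3)                 # MSwn = 8.22, n = 3 scores per cell
print(round(qk, 2), round(hsd, 2))        # about 4.51 and 7.46
# Only unconfounded differences larger than this HSD are significant, which
# reproduces the three significant differences listed above.
```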
12-5 INTERPRETING THE
TWO-WAY EXPERIMENT
The way to interpret an experiment is to look at the
significant differences between means from the post
hoc comparisons for all significant main effects and
interaction effects. All of the differences found in the
persuasiveness study are summarized in Table 12.6.
Each line connecting two means indicates that they
differ significantly.
Often the interpretation of a two-way study may
focus on the significant interaction, even when main
effects are significant. This is because our conclusions
about main effects may be contradicted by the interaction. After all, the interaction indicates that the influence of one factor depends on the levels of the other
factor and vice versa, so you should not act like either
factor has a consistent effect by itself. For example,
look at the main effect means for gender in Table 12.6:
This leads to the conclusion that males (at 11.89) score
higher than females (at 7.33). However, now look at
the cell means of the interaction: Gender differences
depend on volume because only in the loud condition is
there a significant difference between males (at 16.67)
and females (at 6). Therefore, it is inaccurate to conclude that males always score higher than females.
Likewise, look at the main effect means for volume: Increasing volume from soft to medium significantly raises scores (from 6 to 11.5), as does increasing
volume from soft to loud (from 6 to 11.33). However,
the interaction indicates that increasing volume from
soft to medium produces a significant difference only
for females (from 4 to 12). Increasing volume from
soft to loud produces a significant difference only for
males (from 8 to 16.67). So, it is inaccurate to conclude that increasing volume always has the same effect.

When the interaction is not significant we can focus our interpretation on the main effects, because then they have a more consistent effect. For completeness, however, always perform all analyses of significant main and interaction effects, and report all significant and nonsignificant results. Report each Fobt using the same format we used for the one-way ANOVA.
Also, the size of η² for each main effect and the interaction can guide your conclusions. In our study, all of our effects are rather large, with volume and gender accounting for 28% and 23% of the variance, respectively, and the interaction accounting for 25% of the variance. Therefore, they are all of equal importance in understanding differences in persuasiveness scores. However, had any of the effects been small, we would downplay the role of that effect in our interpretations. In particular, if the interaction's effect is small, then although the interaction contradicts the main effect, it is only slightly and inconsistently contradictory. In such cases, you may focus your interpretation on significant main effects that have a more substantial effect.
So, looking at the significant cell means in Table
12.6 we conclude our persuasiveness study by saying
that increasing the volume of a message beyond soft
tends to increase persuasiveness scores in the population, but this increase occurs for females with medium
volume and for males with loud volume. Further, differences in persuasiveness scores occur between males
and females in the population, but only if the volume
of the message is loud.
The primary interpretation of a
two-way ANOVA may focus on
the significant interaction.
Summary of Significant Differences in the Persuasiveness Study
Each line connects two means that differ significantly.

                                Factor A: Volume
                       Level A1:    Level A2:    Level A3:
                       Soft         Medium       Loud
Factor B:  Level B1: Male     8         11          16.67      X̄male = 11.89
Gender     Level B2: Female   4         12           6         X̄fem = 7.33
                       X̄soft = 6    X̄med = 11.5   X̄loud = 11.33
USING SPSS
See Review Card 12.4 for instructions for performing the two-way between-subjects ANOVA. In addition, you can choose for SPSS to (1) compute all descriptive statistics, including main effect and cell means; (2) graph the main effects and interaction; and (3) perform the Tukey HSD test, but on main effects only. You must perform the HSD test on the interaction as described in this chapter and in Appendix A.4. Also, you must compute each η².
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. (a) When do you perform the two-way ANOVA?
(b) What information can be obtained from a
two-way ANOVA that cannot be obtained from
two one-way designs that test the same factors?
2. Which type of ANOVA is used in a two-way
design when (a) both factors are tested using
independent samples? (b) one factor involves
independent samples and the other factor involves
related samples? (c) both factors involve related
samples?
3. Explain the following terms: (a) factorial; (b) cell;
(c) collapsing a factor.
4. What is the difference between a main effect
mean and a cell mean?
5. (a) What do we mean by the “main effect” of
factor A? (b) How are the main effect means of
factor A computed? (c) What does a significant FA
indicate about these means?
6. (a) How do we obtain the means examined in an
interaction effect? (b) What does a significant
interaction effect indicate about the factors and
the dependent scores?
7. (a) Identify the F-ratios computed in a two-way
ANOVA and what they apply to. (b) What must
be done for each significant effect in a two-way
ANOVA before interpreting the experiment?
(c) Why must this be done?
8. (a) What is a confounded comparison, and how
do you spot it in a study’s diagram? (b) What
is an unconfounded comparison, and when
does it occur in a study’s diagram? (c) Why
don’t we perform post hoc tests on confounded
comparisons?
9. We study the effect of factor A, which is four different memory techniques, and factor B, which is participants' age of 15, 20, or 25 years old. We test
10 participants per cell. (a) Draw a diagram of this
study. (b) Using two numbers, describe the design.
(c) What is the n in each level when computing
the main effect means for the memory factor?
(d) What is the n in each level when computing the
main effect means for the age factor? (e) What is
the n in each group when performing the HSD test
on the interaction?
10. For a 2 × 2 ANOVA, describe in words the statistical hypotheses for factor A, factor B, and A × B.
11. (a) What is the major difference when
computing the HSD for a main effect and for
an interaction? (b) What is the major difference
when finding the differences between the means
in the HSD test for a main effect and for an
interaction?
12. (a) When is it appropriate to compute η² in a
two-way ANOVA? (b) For each effect, what does it
tell you?
13. The diagram below shows the means from a study.

           A1     A2     A3
    B1     10      8      7      8.33
    B2      9     13     18     13.33
          9.5   10.5   12.5
(a) Does there appear to be an interaction, and
if so, why? (b) What are the main effect
means for A, and what is the apparent
conclusion about the main effect of
factor A? (c) What are the main effect means
for B, and what is the apparent conclusion about this main effect? (d) Graph the
interaction.
14. Joanne examined eye–hand coordination scores for three levels of reward and three levels of practice. She obtained the means given below.

                                Practice
                        Low    Medium    High
    Reward   Low         4       5        15        8
             Medium     10       5        15       10
             High        7      14        15       12
                         7       8        15

(a) What are the main effect means for reward, and if Fobt is significant, what do they appear to indicate? (b) What are the main effect means for practice, and if Fobt is significant, what do they appear to indicate? (c) What procedure should be performed to confirm your answers in parts a and b? (d) What else has she forgotten to examine?
15. In question 14: (a) If FA×B is significant, what is the apparent conclusion about scores from the
viewpoint of increasing rewards? (b) How does
the interaction contradict your conclusions about
the main effect of rewards in question 14?
(c) What should she do next? (d) How would
you find unconfounded comparisons of the cell
means? (e) The dfwn 60. For this HSD what is the
appropriate qk?
16. Felix measured participants' preferences for two brands of soft drinks (factor A). For each brand he tested male and female participants (factor B). The ANOVA produces all significant Fs. The MSwn = 9.45, n = 11 per cell, and dfwn = 40. The means are below.

                               Factor A
                       Level A1:     Level A2:
                       Brand X       Brand Y
    Level B1: Males       14            29         21.5
    Level B2: Females     25            12         18.5
                        19.5          20.5

(a) What are the main effect means for brands? Describe this main effect on preferences. (b) What are the main effect means for gender? Describe this main effect on preferences. (c) Perform Tukey's HSD test where appropriate. (d) Describe the interaction. (e) Describe a graph of the interaction when factor A is on the X axis. (f) Why does the interaction contradict your conclusions about the main effects?

17. Below are the cell means of three experiments. For each, compute the main effect means and decide whether there appears to be an effect of A, B, and/or A × B.

    Study 1           Study 2           Study 3
        A1   A2           A1   A2           A1   A2
    B1   2    4       B1  10    5       B1   8   14
    B2  12   14       B2   5   10       B2   8    2

18. In question 17, if you label the X axis with factor A and graph the cell means, what pattern will you see for each interaction?

19. We classified participants as high- or low-frequency cell phone users, and also as having one of four levels of part-time income (from low to high). The dependent variable was satisfaction with their social lives. The ANOVA produced only a significant main effect for income level and a significant interaction effect. What can you conclude about differences between mean satisfaction scores occurring with (a) phone usage? (b) income level? (c) the interaction?

20. You measure the dependent variable of participants' relaxation level as a function of whether they meditated before being tested, and whether they were shown a film containing a low, medium, or high amount of fantasy. Here are the data and the ANOVA.

                                Amount of Fantasy
                        Low              Medium           High
    Meditation          5, 6, 2, 2, 5    7, 5, 6, 9, 5    9, 8, 10, 10, 10
    No Meditation       10, 10, 9, 10, 10    2, 5, 4, 3, 2    5, 6, 5, 7, 6

    Source                Sum of Squares    df    Mean Square        F
    A: Fantasy                42.467         2      21.233       13.134
    B: Meditation               .833         1        .833         .515
    A × B: Interaction       141.267         2      70.633       43.691
    Within                    38.800        24       1.617
    Total                    223.367        29

(a) Which effects are significant? (b) Compute the main effect means and the interaction means. (c) Perform the Tukey HSD test where appropriate. (d) What do you conclude about the relationship(s) this study demonstrates? (e) Evaluate the impact of each effect.
21. A 2 × 2 design tests participants' frustration levels when solving problems as a function of the difficulty
of the problem and whether they are math or logic
problems. The results are that logic problems produce significantly more frustration than math problems; greater difficulty leads to significantly greater
frustration; and difficult math problems produce
significantly greater frustration than difficult logic
problems, but the reverse is true for easy problems.
Which effects are significant in the ANOVA?
22. In question 21, say instead the researcher found no
difference between math and logic problems, but
frustration significantly increases with greater difficulty, and this is true for both math and logic problems. Which effects are significant in this ANOVA?
23. Summarize the steps in analyzing a two-way experiment and describe what each step accomplishes.
24. (a) What do researchers do to create a design that
fits a two-way ANOVA? (b) What must be true
about the dependent variable? (c) Which versions
of ANOVA are available?
25. For the following, identify the parametric procedure to perform. (a) We measure babies’ irritability when their mothers are present and when
they are absent. (b) We test the driving ability of
participants who score either high, medium, or
low on the trait of “thrill seeker.” For each type,
we test some participants who have had either 0,
1, or 2 accidents. (c) We compare the degree of
alcoholism in participants with alcoholic parents
to those with nonalcoholic parents. (d) Our participants identify visual patterns after sitting in a
dim room for 1 minute, again after 15 minutes,
and again after 30 minutes. (e) To test if creativity
scores change with age, we test groups of 5-, 10-,
and 15-year-olds. (f) We measure the happiness
of some mothers and the number of children they
have to determine if happiness and number of
children are related.
Chapter 13
CHI SQUARE AND NONPARAMETRIC PROCEDURES

LOOKING BACK
Be sure you understand:
• From Chapter 2, the four types of measurement scales (nominal, ordinal, interval, and ratio).
• From Chapter 9, the independent-samples t-test and the related-samples t-test.
• From Chapter 11, the one-way between-subjects or within-subjects ANOVA.
• From Chapter 12, what a two-way interaction indicates.

GOING FORWARD
Your goals in this chapter are to learn:
• When to use nonparametric statistics.
• The logic and use of the one-way chi square.
• The logic and use of the two-way chi square.
• The names of the nonparametric procedures used with ordinal scores.

Sections
13-1 Parametric versus Nonparametric Statistics
13-2 Chi Square Procedures
13-3 The One-Way Chi Square: The Goodness of Fit Test
13-4 The Two-Way Chi Square: The Test of Independence
13-5 Statistics in the Research Literature: Reporting χ²
13-6 A Word about Nonparametric Procedures for Ordinal Scores

Previous chapters have discussed the category of inferential statistics called parametric procedures. Now we'll turn to the other category, called nonparametric statistics. Nonparametric procedures are still used for deciding whether the relationship in the sample accurately represents the relationship in the population. Therefore, H0 and Ha, sampling distributions, Type I and Type II errors, alpha, critical values, and significance all apply. Although a number of different nonparametric procedures are available, we'll focus on the most common ones. This chapter presents (1) the one-way chi square, (2) the two-way chi square, and (3) a brief review of the procedures for ordinal scores.
13-1 PARAMETRIC VERSUS
NONPARAMETRIC STATISTICS
Previous parametric procedures have required that
our dependent scores involve an interval or ratio scale,
that the scores are normally distributed, and that the
population variances are homogeneous. But sometimes researchers obtain data that do not fit these
requirements. Some dependent variables are nominal
variables (e.g., whether someone is male or female).
Sometimes we can measure a dependent variable only
by assigning ordinal scores or “ranks” (e.g., judging
this participant as showing the most of an attribute,
this one the second-most, and so on). And sometimes
a variable involves an interval or ratio scale, but the
populations are severely skewed and/or do not have
homogeneous variance (e.g., we saw that yearly
income forms a positively skewed distribution).
It is better to design a study that allows us to use
parametric procedures, because they are more powerful than nonparametric procedures. Recall this means
we are less likely to make a Type II error, which is missing a relationship that actually exists in nature. But,
on the other hand, if our data violate the rules of a
parametric procedure, then we increase the probability
of making a Type I error (rejecting H0 when it’s true),
so that the actual probability of a Type I error will be
larger than the alpha level we’ve set. Therefore, when
data clearly do not fit a parametric procedure, we
turn to nonparametric procedures. Nonparametric
statistics are inferential procedures used with either
nominal or ordinal data. That is, some nonparametric
procedures are appropriate if we originally measure
participants using nominal scores. Other nonparametric procedures are appropriate for ordinal scores.
However, we have two ways of obtaining such scores.
Our original raw scores may indicate each participant’s
rank. Or, our original scores may be interval or ratio
scores that violate the rules of parametric procedures,
so we transform the scores into ranks, assigning the
highest score a “1,” the next highest a “2,” and so on.
Either way, we then apply a nonparametric procedure
for ordinal data.
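For instance, a tiny sketch of ours using SciPy's rankdata shows the transformation just described; rankdata ranks ascending, so negating the scores gives the highest score rank 1. The example scores are hypothetical.

```python
# Sketch: transforming interval/ratio scores into ranks (highest score = rank 1).
from scipy.stats import rankdata

scores = [88, 72, 95, 60]                # hypothetical raw scores
ranks = rankdata([-s for s in scores])   # negate so the highest score ranks first
print(ranks)                             # [2. 3. 1. 4.]
```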
Use nonparametric statistics when dependent scores are measured using ordinal or nominal scales.

In published research, the most common nonparametric data is nominal data, and then the chi square procedure is performed.

nonparametric statistics  Inferential procedures used with nominal or ordinal (ranked) data
13-2 CHI SQUARE PROCEDURES

Chi square procedures are performed when participants are measured using a nominal variable. With nominal variables we do not measure an amount, but rather we indicate the category that participants fall into, and then count the number—the frequency—of individuals in each category. Thus, we have nominal variables when counting how many individuals answer yes or no to a question; how many claim to vote Republican, Democratic, or Socialist; how many say they were or were not abused children; and so on.

After counting the frequency of "category membership" in the sample, we want to draw inferences about the population. For example, we might find that out of 100 people, 40 say yes to a question and 60 say no. These numbers indicate how the frequencies are distributed in the sample. But can we then infer that if we asked the entire population this question, 40% would also say yes and 60% would say no, or would we see a different distribution, say with a 50-50 split? To make inferences about the frequencies in the population, we perform the chi square procedure (pronounced "kigh square"). The chi square procedure is the nonparametric procedure for testing whether the frequencies in each category in sample data represent specified frequencies in the population. The symbol for the chi square statistic is χ².

Theoretically, there is no limit to the number of categories—levels—you may have in a variable and no limit to the number of variables you may have. Therefore, we describe a chi square design in the same way we described ANOVA: When a study has only one variable, we use the one-way chi square; when a study has two variables, we use the two-way chi square.

chi square procedure (χ²)  The nonparametric procedure for testing whether the frequencies of category membership in the sample represent the predicted frequencies in the population

one-way chi square  The procedure for testing whether the frequencies of category membership on one variable represent the predicted distribution in the population

Use the chi square procedure (χ²) when you count the number of participants falling into different categories.
13-3 THE ONE-WAY CHI
SQUARE: THE GOODNESS
OF FIT TEST
The one-way chi square is computed when data
consist of the frequencies with which participants
belong to the different categories of one variable. Here
is an example. Being right-handed or left-handed is
apparently related to brain organization, and many of
history’s great geniuses were left-handed. Therefore,
using an IQ test, we select a sample of 50 geniuses.
Then we count how many are left- or right-handed
(ambidextrous is not an option). The results are
shown here:
Handedness
Left-Handers Right-Handers
fo 10
fo 40
k2
N total fo 50
Each column contains the frequency with which
participants are in that category. We call this the
observed frequency, symbolized by fo. The sum of
the fos from all categories equals N, the total number
of participants in the study. Notice that k stands for
the number of categories, or levels, and here k 2.
So, 10 of the 50 geniuses (20%) are left-handers,
and 40 of them (80%) are right-handers. We might
argue that the same distribution of 20% left-handers
and 80% right-handers would occur in the population of all geniuses. But
there is the usual problem:
sampling error. Maybe our
sample is unrepresentative,
© iStockphoto.com/Hamza Türkkol
220
Behavioral Sciences STAT2
Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
those who disagree
d
with a statement). (5) For
theoretical rreasons, the “expected frequencies”
must be at least
le
5 per category.
13-3a Computing the
One-Way X 2
© iStockphoto.com/Marek Uliasz
The first step in computing x 2 is
to translate H0 into the expected
frequency for each category. The
expected frequency is the
frequency we expect in a category if the sample data perfectly represent the distribution
of frequencies in the population
described by H0. The symbol for an
expected frequency is fe. Our H0
is that the frequencies of left- and
right-handedness are equal. We
translate this into the expected
frequency in each group based
on our N. If our samples perfectly represented equal frequencies, then out o
of our 50 participants, 25 should be
right-handed and
an 25 should be left-handed. Thus,
the expected frequency
fre
in each category is fe 25.
When H0 is that the frequencies in the categories are equal, the fe will be the same in all categories,
and there’s a shortcut for computing it:
© iStockphoto.com/MARIA TOUTOUDAKI
Technically, there is a
relationship in our data here,
because the frequencies change
as handedness changes. Usually,
researchers test the H0 that there
is no difference among the frequencies in the categories in the
population, meaning there is no
relationship in the population. For the moment we'll ignore that there are more right-handers than left-handers in the world. Therefore, our H0 is that the frequencies of left- and right-handed geniuses are equal in the population. We have no conventional way to write this in symbols, so simply write H0: all frequencies in the population are equal. This implies that if the observed frequencies (fo) in the sample are not equal, it is because of sampling error. The alternative hypothesis always implies that the relationship does exist in the population, so here it implies that, at a minimum, the frequencies of left- and right-handed geniuses are not equal.
However, like in ANOVA, a study may involve more
than two levels of a variable, and all levels need not
represent a difference in the population. Therefore,
our general way of stating Ha is: not all frequencies in
the population are equal. For our handedness study,
Ha implies that our observed frequencies represent
the frequencies of left- and right-handers found in the
population of geniuses.
We can test only whether the frequencies are or are not equal, so the one-way χ² tests only two-tailed hypotheses. Also, the one-way χ² has five assumptions: (1) Participants are categorized along one variable having two or more categories, and we count the frequency in each category. (2) Each participant can be in only one category (i.e., you cannot have repeated measures). (3) Category membership is independent: The fact that an individual is in one category does not influence the probability that another participant will be in any category. (4) We include the responses of all participants in the study (i.e., you would not count only the number of right-handers, or in a different study, you would count both those who agree and those who disagree with a statement). (5) For theoretical reasons, the expected frequencies must be at least 5 per category.

13-3a Computing the One-Way χ²

The first step in computing χ² is to translate H0 into the expected frequency for each category. The expected frequency is the frequency we expect in a category if the sample data perfectly represent the distribution of frequencies in the population described by H0. The symbol for an expected frequency is fe. Our H0 is that the frequencies of left- and right-handedness are equal. We translate this into the expected frequency in each group based on our N. If our sample perfectly represented equal frequencies, then out of our 50 participants, 25 should be right-handed and 25 should be left-handed. Thus, the expected frequency in each category is fe = 25.

When H0 is that the frequencies in the categories are equal, the fe will be the same in all categories, and there's a shortcut for computing it:
THE FORMULA FOR EACH EXPECTED FREQUENCY WHEN TESTING AN H0 OF NO DIFFERENCE IS:

fe in each category = N/k
Thus, in our study, with N = 50 and k = 2, the fe in each category = 50/2 = 25. (Sometimes fe may contain a decimal. For example, if we included a third category, ambidextrous, then k = 3, and each fe would be 16.67.)
For the handedness study we have these frequencies:

Handedness
               Left-Handers    Right-Handers
               fo = 10         fo = 40
               fe = 25         fe = 25

expected frequency (fe): The frequency expected in a category if the data perfectly represent the distribution of frequencies described by the null hypothesis.
The χ² compares the difference between our observed frequencies and the expected frequencies. We compute an obtained χ², which we call χ²obt.

THE FORMULA FOR THE CHI SQUARE IS:

χ²obt = Σ[(fo − fe)²/fe]

This says to find the difference between fo and fe in each category, square that difference, and then divide it by the fe for that category. After doing this for all categories, sum the quantities, and the answer is χ²obt. Thus, altogether we have:
STEP 1: Compute the fe for each category. We computed our fe to be 25 per category.
STEP 2: Create the fraction (fo − fe)²/fe for each category. The formula becomes

χ²obt = (10 − 25)²/25 + (40 − 25)²/25
STEP 3: Perform the subtraction in the numerator of each fraction. After subtracting,

χ²obt = (−15)²/25 + (15)²/25

STEP 4: Square the numerator in each fraction. This gives

χ²obt = 225/25 + 225/25

STEP 5: Perform the division in each fraction and then sum the results.

χ²obt = 9 + 9 = 18

STEP 6: Compare χ²obt to χ²crit. This is discussed below.

13-3b Interpreting the One-Way χ²

We interpret χ²obt by determining its location on the χ² sampling distribution. The χ²-distribution contains all possible values of χ² that occur when H0 is true. Thus, for the handedness study, the χ²-distribution is the distribution of all possible values of χ² when the frequencies in the two categories in the population are equal. You can envision the χ²-distribution as shown in Figure 13.1.

Even though the χ²-distribution is not at all normal, it is used in the same way as previous sampling distributions. When the data perfectly represent the H0 situation, so that each fo equals its fe, then χ² is zero. However, sometimes by chance the observed frequencies differ from the expected frequencies, producing a χ² greater than zero. The larger the differences, the larger the χ². But, the larger the χ², the less likely it is to occur when H0 is true. Because χ² can become only larger, we again have two-tailed hypotheses but one region of rejection.

To determine if χ²obt is significant, we compare it to the critical value, symbolized by χ²crit. As with previous statistics, the χ²-distribution changes shape as the degrees of freedom change, so to find the appropriate value of χ²crit, first determine the degrees of freedom.

THE FORMULA FOR THE DEGREES OF FREEDOM IN A ONE-WAY CHI SQUARE IS:

df = k − 1

Remember that k is the number of categories.

Figure 13.1
Sampling Distribution of χ² When H0 Is True
(The distribution plots f against values of χ² greater than 0; with α = .05, the region of rejection lies in the upper tail beyond χ²crit.)

χ²-distribution: The sampling distribution of values of χ² that occur when the samples represent the frequencies described by the null hypothesis.
Find the critical value of χ² in Table 6 in Appendix B, titled "Critical Values of Chi Square." For the handedness study, k = 2 so df = 1, and with α = .05, the χ²crit = 3.84. Our χ²obt of 18 is larger than this χ²crit, so the results are significant: The differences between our observed and expected frequencies are so large that they, and the χ²obt they produce, are unlikely to occur if H0 is true. Therefore, we reject the H0 that our categories are poorly representing a distribution of equal frequencies in the population (rejecting that we are poorly representing that geniuses are equally left- or right-handed).
When our χ²obt is significant, we accept the Ha that the sample represents frequencies in the population that are not equal. In fact, as in our samples, we would expect to find about 20% left-handers and 80% right-handers in the population of geniuses. We conclude that we have evidence of a relationship between the categories of handedness and the frequency with which geniuses fall into each. Then, as usual, we interpret the relationship, here attempting to explain what aspects of being left-handed and being a genius are related.
Unlike ANOVA, a significant one-way chi square
that involves more than two conditions usually is not
followed by post hoc comparisons. Instead, we simply
use the observed frequency in each category to estimate the frequencies that would be found in the population. Also, there is no measure of effect size here.
If χ²obt had not been significant, we would not reject H0 and would have no evidence, one way or the other, regarding how handedness is distributed among geniuses.
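For readers who want to check such computations in software, here is a minimal sketch using SciPy (an assumption on my part; the text itself computes χ² by hand and points to SPSS). SciPy's chisquare function defaults to equal expected frequencies, which matches this H0.

import scipy.stats as stats  # assumed dependency, not part of the text

f_o = [10, 40]                 # observed frequencies: left- and right-handers
result = stats.chisquare(f_o)  # f_exp defaults to N/k = 25 per category

print(result.statistic)        # 18.0, matching the chi-square obtained above
print(result.pvalue)           # far below .05, so the result is significant

The reported p value makes the table lookup of χ²crit = 3.84 unnecessary: when p is less than α, χ²obt lies in the region of rejection.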
13-3c The “Goodness of Fit” Test
Notice that the one-way chi square procedure is also
called the goodness of fit test: Essentially, it tests
how “good” the “fit” is between our data and the frequencies we expect if H0 is true. This is simply another
way of asking whether sample data are likely to represent the frequencies in the population described
by H0. This name is especially appropriate when we
test an H0 that does not say the frequencies in all
categories are equal in the population. Instead, from
past research or from a hypothesis, we may create a
“model” of how the frequencies in the different categories are distributed. Then we compute the expected
frequencies (fe) using the model and test whether the
data “fit” the model.
For example, in the handedness study we ignored
the fact that right-handers are more common in the
real world than left-handers. Only about 10% of the
general population is left-handed, so we should have
tested whether our geniuses fit this model. Now H0
is that our geniuses are like the general population,
being 10% left-handed and 90% right-handed. Our
Ha is that the data represent a population of geniuses
that does not have this distribution. Each fe is again
based on our H0. Say that we had tested our previous 50 geniuses. Our H0 says that left-handed geniuses should occur 10% of the time: 10% of 50 is 5, so fe = 5. Right-handed geniuses should occur 90% of the time: 90% of 50 is 45, so fe = 45. Then we compute χ²obt as we did previously:

χ²obt = Σ[(fo − fe)²/fe] = (10 − 5)²/5 + (40 − 45)²/45 = 5.56
With α = .05 and k = 2, the χ²crit is again 3.84. Therefore, the χ²obt of 5.56 is significant: We reject H0 and conclude that the observed frequencies are significantly different from what we would expect if handedness in the population of geniuses were distributed as it is in the general population. Instead, we estimate that in the population of geniuses, 20% are left-handers and 80% are right-handers.

goodness of fit test: A name for the one-way chi square procedure, because it tests how "good" the "fit" is between the data and H0.
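To verify this "model" test in software, the same SciPy sketch works by passing the model's expected frequencies explicitly (again an assumption; SciPy is not mentioned in the text):

import scipy.stats as stats  # assumed dependency

f_o = [10, 40]   # observed left- and right-handed geniuses
f_e = [5, 45]    # the 10%/90% model applied to N = 50

result = stats.chisquare(f_o, f_exp=f_e)
print(round(result.statistic, 2))  # 5.56, matching the computation above
print(result.pvalue < .05)         # True, so we reject H0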
> Quick Practice
> The one-way χ² is used when counting the frequency of category membership on one variable.

More Examples
Below are the number of acts of graffiti that occur on walls painted white, painted blue, or covered with chalkboard. H0 is that the frequencies of graffiti are equal in the population. With N = 30, fe = N/k = 30/3 = 10 for each category.

White        Blue         Chalk
fo = 8       fo = 5       fo = 17
fe = 10      fe = 10      fe = 10

χ²obt = Σ[(fo − fe)²/fe] = (8 − 10)²/10 + (5 − 10)²/10 + (17 − 10)²/10 = 4/10 + 25/10 + 49/10 = 7.80

With df = k − 1 = 2 and α = .05, χ²crit = 5.99, so the wall coverings produce a significant difference in the frequency of graffiti acts. In the population, we expect 27% of graffiti on white walls, 17% on blue walls, and 57% on chalkboard walls.

For Practice
1. The one-way chi square is used when we count the ______ that participants fall into different ______.
2. We find fo = 21 in category A and fo = 39 in category B. H0 is that the frequencies are equal. The fe for A is ______, and the fe for B is ______.
3. Compute χ²obt for question 2.
4. The df = ______, so at α = .05, χ²crit is ______.
5. We conclude the χ²obt is ______, so in the population we expect membership is around _____% in A and around _____% in B.

> Answers
1. frequency; categories
2. Each fe = 60/2 = 30.
3. χ²obt = (21 − 30)²/30 + (39 − 30)²/30 = 5.40
4. 1; 3.84
5. significant; 35; 65
13-4 THE TWO-WAY CHI SQUARE: THE TEST OF INDEPENDENCE
The two-way chi square is used when we count the
frequency of category membership along two variables.
(This is similar to the factorial ANOVA discussed in
the previous chapter.) The procedure for computing χ² is the same regardless of the number of categories
in each variable. The assumptions of the two-way chi
square are the same as for the one-way chi square.
13-4a Logic of the Two-Way
Chi Square
Here is a study that calls for a two-way chi square. At
one time psychologists claimed that someone with a
“Type A” personality tends to be very pressured and
never seems to have enough time. The “Type B” personality, however, tends not to be so pressured, and
is more relaxed and mellow. A controversy developed
over whether people with Type A personalities are
less healthy, especially when it comes to having heart
attacks. Therefore, say that we select a sample of 80
people and determine how many are Type A and how
many are Type B. We then count the frequency of heart
attacks in each type. We must also count how many
in each type have not had heart attacks. Therefore,
we have two categorical variables: personality type
(A or B) and health (heart attack or no heart attack).
Table 13.1 shows the layout of this study. Notice, with
two rows and two columns, this is a 2 × 2 ("2 by 2") matrix, so we have a 2 × 2 design. With different variables, the design might be a 2 × 3, a 3 × 4, etc.
Table 13.1
A Two-Way Chi Square Design Comparing Participants' Personality Type and Health

                           Personality Type
                           Type A    Type B
Health   Heart Attack      fo        fo
         No Heart Attack   fo        fo
Although this looks like a two-way ANOVA, it is not analyzed like one. The two-way χ² is also called the test of independence: It tests only whether the frequency of participants falling into the categories of one variable is independent of, or unrelated to, the frequency of their falling into the categories of the other variable. Thus, in our example we will test whether the frequencies of having or not having a heart attack are independent of the frequencies of being Type A or Type B. Essentially, the two-way χ² tests the interaction, which, as in the two-way ANOVA, tests whether the influence of one factor depends on the level of the other factor that is present. Thus, we'll ask, "Does the frequency of people having heart attacks depend on their frequency of being Type A or Type B?"

two-way chi square: The procedure for testing whether category membership on one variable is independent of category membership on the other variable; also called the "Test of Independence."
To understand "independence," Table 13.2 shows an example where category membership is perfectly independent. Here, the frequency of having or not having a heart attack does not depend on the frequency of being Type A or Type B. Another way to view the two-way χ² is as a test of whether a correlation exists between the two variables. When variables are independent, there is no correlation, and using the categories from one variable is no help in predicting the frequencies for the other variable. Here, knowing if people are Type A or Type B does not help to predict if they have heart attacks (and heart attacks do not help in predicting personality type).

Table 13.2
Example of Independence
Personality type and heart attacks are perfectly independent.

                           Personality Type
                           Type A    Type B
Health   Heart Attack      fo = 20   fo = 20
         No Heart Attack   fo = 20   fo = 20
However, Table 13.3 shows a pattern we might see when the variables are totally dependent. Here, the frequency of a heart attack or no heart attack depends on personality type. Likewise, a perfect correlation exists here because whether people are Type A or Type B is a perfect predictor of whether or not they have had a heart attack (and vice versa).

Table 13.3
Example of Dependence
Personality type and heart attacks are perfectly dependent.

                           Personality Type
                           Type A    Type B
Health   Heart Attack      fo = 40   fo = 0
         No Heart Attack   fo = 0    fo = 40
Say that our actual data are shown in Table 13.4. A degree of dependence occurs here because a heart attack tends to be more frequent for Type A, while no heart attack is more frequent for Type B. Therefore, some degree of correlation exists between the variables.
Table 13.4
Observed Frequencies as a Function of Personality Type and Health

                           Personality Type
                           Type A    Type B
Health   Heart Attack      fo = 25   fo = 10
         No Heart Attack   fo = 5    fo = 40
                                                N = 80
On the one hand, we'd like to conclude that this correlation occurs in the population. But, on the other hand, we have the usual problem of sampling error: Perhaps a correlation really does not exist in the population, but by chance we obtained frequencies that poorly represent this, misleading us to believe the correlation exists. The above choices translate into our null and alternative hypotheses. In the two-way chi square:

1. H0 is that category membership on one variable is independent of (not correlated with) category membership on the other variable. If the sample data look correlated, this is due to sampling error.

2. Ha is that category membership on the two variables in the population is dependent (correlated).
13-4b Computing the Two-Way Chi Square

Again, the first step is to compute the expected frequencies. To do so, first compute the total of the observed frequencies in each column and the total of the observed frequencies in each row. This is shown in Table 13.5. Also, note N, the total of all observed frequencies. Now we compute the expected frequency in each cell. Each fe is based on the probability of a participant falling into the cell if the two variables are independent. The expected frequency then equals this probability multiplied by N. Luckily, the steps involved in this can be combined to produce this formula:

THE FORMULA FOR THE EXPECTED FREQUENCY IN A CELL OF A TWO-WAY CHI SQUARE IS:

fe = (cell's row total fo)(cell's column total fo)/N

To find the fe for a particular cell, multiply the total observed frequency for the row containing the cell times the total observed frequency for the column containing the cell. Then divide by the N of the study. Thus, to compute the two-way χ²:

STEP 1: Compute the fe for each cell. Table 13.5 shows the computations of all fe for the example.

Table 13.5
Diagram Containing fo and fe for Each Cell
Each fe equals the row total times the column total, divided by N.

                              Personality Type
                              Type A               Type B
Health  Heart Attack          fo = 25              fo = 10              Row total = 35
                              fe = 13.125          fe = 21.875
                              = (35)(30)/80        = (35)(50)/80
        No Heart Attack       fo = 5               fo = 40              Row total = 45
                              fe = 16.875          fe = 28.125
                              = (45)(30)/80        = (45)(50)/80
                              Column total = 30    Column total = 50    N = 80

STEP 2: Compute χ²obt. Use the same formula as in the one-way design, which is

χ²obt = Σ[(fo − fe)²/fe]

First form a fraction for each cell: In the numerator, square the difference between the fe and fo for the cell. In the denominator is the fe for the cell. Thus, from the data in Table 13.5 we have

χ²obt = (25 − 13.125)²/13.125 + (10 − 21.875)²/21.875 + (5 − 16.875)²/16.875 + (40 − 28.125)²/28.125
STEP 3: Perform the subtraction in the numerator of each fraction. After subtracting, we have

χ²obt = (11.875)²/13.125 + (−11.875)²/21.875 + (−11.875)²/16.875 + (11.875)²/28.125

STEP 4: Square the numerator in each fraction. This gives

χ²obt = 141.016/13.125 + 141.016/21.875 + 141.016/16.875 + 141.016/28.125

STEP 5: Perform the division in each fraction and then sum the results.

χ²obt = 10.74 + 6.45 + 8.36 + 5.01

so

χ²obt = 30.56

STEP 6: Compare χ²obt to χ²crit. First, determine the degrees of freedom. In the diagram of your study, count the number of rows and columns. Then:

THE FORMULA FOR THE DEGREES OF FREEDOM IN A TWO-WAY CHI SQUARE IS:

df = (Number of rows − 1)(Number of columns − 1)

For our study, df is (2 − 1) multiplied by (2 − 1), which is 1. Find the critical value of χ² in Table 6 in Appendix B. At α = .05 and df = 1, the χ²crit is 3.84.

Our χ²obt of 30.56 is larger than χ²crit, so it is significant: The differences between our observed and expected frequencies are unlikely to occur if our data represent variables that are independent. Therefore, we reject H0 that the variables are independent and accept the alternative hypothesis that the sample represents variables that are dependent in the population. In other words, the correlation is significant such that the frequency of having or not having a heart attack depends on the frequency of being Type A or Type B (and vice versa).

If χ²obt is not larger than the critical value, we do not reject H0. Then we cannot say whether these variables are independent or not.

A significant two-way χ² indicates that the sample data are likely to represent variables that are dependent (correlated) in the population.

13-4c Describing the Relationship in a Two-Way Chi Square

A significant two-way chi square indicates a significant correlation between the two variables. To determine the size of this correlation, we compute one of two new correlation coefficients, either the phi coefficient or the contingency coefficient.

If you have performed a 2 × 2 chi square and it is significant, compute the phi coefficient. The symbol for the phi coefficient is Φ, and its value can be between 0 and 1. Think of phi as comparing your data to the ideal situations that were illustrated back in Tables 13.2 and 13.3, when the variables are or are not dependent. A value of 0 indicates that the data are perfectly independent. The larger the value of phi, the closer the data come to being perfectly dependent. (Real research tends to find values in the range of .20 to .50.)

phi coefficient (Φ): The statistic that describes the strength of the relationship in a two-way chi square that is a 2 × 2 design.
THE FORMULA FOR THE PHI COEFFICIENT IS:

Φ = √(χ²obt/N)

The formula says to divide the χ²obt by N (the total number of participants) and then find the square root. For the heart attack study, χ²obt was 30.56 and N was 80, so

Φ = √(χ²obt/N) = √(30.56/80) = √.382 = .62

Thus, on a scale of 0 to 1, where 1 indicates perfect dependence, the correlation is .62 between the frequency of heart attacks and the frequency of personality types.
Further, recall that by squaring a correlation coefficient we obtain the proportion of variance accounted for, which is the proportion of differences in one variable that is associated with the other variable. If we do not compute the square root in the formula above, we have Φ². For our study Φ² = .38. This is analogous to r², indicating that about 38% of the differences in whether people have heart attacks are associated with differences in their personality type, and vice versa.
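As a software cross-check, here is a sketch using SciPy's chi2_contingency (an assumption; the text uses hand computation and SPSS). Passing correction=False disables Yates' continuity correction so the 2 × 2 result matches the hand computation above.

import math
import scipy.stats as stats  # assumed dependency

observed = [[25, 10],   # heart attack:    Type A, Type B
            [5, 40]]    # no heart attack: Type A, Type B

chi2, p, df, f_e = stats.chi2_contingency(observed, correction=False)
print(round(chi2, 2), df)   # 30.56 and 1, as computed above
print(f_e)                  # the fe values: 13.125, 21.875, 16.875, 28.125

phi = math.sqrt(chi2 / 80)  # N = 80
print(round(phi, 2))        # 0.62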
The other correlation coefficient is the contingency coefficient, symbolized by C. This is used to describe a significant two-way chi square that is not a 2 × 2 design (when it is a 2 × 3, a 3 × 3, etc.).

THE FORMULA FOR THE CONTINGENCY COEFFICIENT IS:

C = √(χ²obt/(N + χ²obt))

This says to first add N to the χ²obt in the denominator. Then divide that quantity into χ²obt, and then find the square root. Interpret C in the same way as Φ. Likewise, C² is analogous to Φ².

contingency coefficient (C): The statistic that describes the strength of the relationship in a two-way chi square that is not a 2 × 2 design.
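A sketch of the corresponding arithmetic follows; the χ²obt and N here are hypothetical values, chosen only to illustrate the formula:

import math

chi2_obt = 8.00   # hypothetical significant chi-square from, say, a 2 x 3 design
N = 60            # hypothetical total number of participants

C = math.sqrt(chi2_obt / (N + chi2_obt))
print(round(C, 2))   # 0.34, interpreted on the same 0-to-1 scale as phi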
> Quick Practice
> The two-way χ² is used when counting the frequency of category membership on two variables.
> The H0 is that category membership for one variable is independent of category membership for the other variable.
More Examples
We count the participants who like or dislike statistics and their gender. The H0 is that liking/disliking is independent of gender. The results are

                 Like             Dislike
Male             fo = 20          fo = 10          Total fo = 30
                 fe = 15          fe = 15
Female           fo = 5           fo = 15          Total fo = 20
                 fe = 10          fe = 10
                 Total fo = 25    Total fo = 25

As above, first compute each fe:

fe = (row total fo)(column total fo)/N

For example, for male–like: fe = (30)(25)/50 = 15. Then

χ²obt = Σ[(fo − fe)²/fe] = (20 − 15)²/15 + (10 − 15)²/15 + (5 − 10)²/10 + (15 − 10)²/10 = 8.334

df = (Number of Rows − 1)(Number of Columns − 1) = (2 − 1)(2 − 1) = 1

With α = .05, χ²crit = 3.84, so χ²obt is significant: The frequency of liking/disliking statistics depends on whether participants are male or female.
For Practice
1. The two-way χ² is used when counting the ______ with which participants fall into the ______ of two variables.
2. The H0 is that the categories of one variable are ______ of those of the other variable.
3. Below are the frequencies for people who are satisfied/dissatisfied with their job and who do/don't work overtime. What is the fe in each cell? (N = 34)

                 Overtime    No Overtime
Satisfied        fo = 11     fo = 3
Dissatisfied     fo = 8      fo = 12

4. Compute χ²obt.

5. The df = ______ and χ²crit = ______.

6. What do you conclude about these variables?

> Answers
1. frequency; categories
2. independent
3. For satisfied–overtime, fe = 7.824; for satisfied–no overtime, fe = 6.176; for dissatisfied–overtime, fe = 11.176; for dissatisfied–no overtime, fe = 8.824.
4. χ²obt = (11 − 7.824)²/7.824 + (3 − 6.176)²/6.176 + (8 − 11.176)²/11.176 + (12 − 8.824)²/8.824 = 4.968
5. 1; 3.84
6. χ²obt is significant: The frequency of job satisfaction/dissatisfaction depends on the frequency of overtime/no overtime.

Figure 13.2
Frequencies of (a) Left- and Right-Handed Geniuses and (b) Heart Attacks and Personality Type
(Panel (a) is a bar graph of f for left- and right-handed geniuses; panel (b) is a bar graph of f for heart attacks and no heart attacks within Type A and Type B personality types.)
13-5 STATISTICS IN THE RESEARCH LITERATURE: REPORTING χ²

The chi square is reported like previous results, except that in addition to df, we also include the N. For example, in our one-way design involving geniuses and handedness, we tested an N of 50, df was 1, and the significant χ²obt was 18. We report these results as

χ²(1, N = 50) = 18.00, p < .05
We report a two-way χ² using the same format.
As usual, a graph is useful for summarizing the data.
For a one-way design, label the Y axis with frequency
and the X axis with the categories, and then plot the fo
in each category. Because the X variable is nominal, we
create a bar graph. The upper bar graph in Figure 13.2
shows the results of our handedness study.
The lower graph in Figure 13.2 shows the bar
graph for our heart attack study. To graph the data
from a two-way design, place frequency on the Y axis
and one of the nominal variables on the X axis. The
levels of the other variable are indicated in the body
of the graph. (This is similar to the way a two-way
interaction was plotted in the previous chapter, except
that here we create bar graphs.)
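If you prefer to build such graphs in software, here is a sketch of the one-way bar graph using matplotlib (an assumed library choice; the text does not prescribe a graphing tool):

import matplotlib.pyplot as plt  # assumed dependency

categories = ["Left", "Right"]   # levels of the nominal X variable
f_o = [10, 40]                   # observed frequency in each category

plt.bar(categories, f_o)         # a bar graph, because X is nominal
plt.xlabel("Handedness")
plt.ylabel("f")
plt.title("Frequencies of Left- and Right-Handed Geniuses")
plt.show()

For a two-way design, the same idea applies with grouped bars: one cluster per category on the X axis and one bar per level of the second variable.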
13-6 A WORD ABOUT NONPARAMETRIC PROCEDURES FOR ORDINAL SCORES
Recall that we also have nonparametric procedures
that are used with ordinal (rank-ordered) scores. The
procedures are the same regardless of whether the
original raw scores were ranks or
were interval or ratio scores that you
then transformed to ranks.
Although the computations for nonparametric procedures are different from those for parametric procedures, their logic and rules are the same. A relationship occurs here when the ordinal scores consistently change. For example, in an experiment we might see the scores change from predominantly low ranks in one condition (with many participants tied at, or near, 1st, 2nd, etc.) to predominantly higher ranks in another condition (with many participants at or near, say, 20th). The null hypothesis says our data show this pattern because of sampling error: We are poorly representing that no relationship occurs in the population, where each condition contains a mix of high and low ranks. The alternative hypothesis says our data reflect the relationship that would be found in
the population. We test H0 by computing an obtained statistic that describes
our data. By comparing it to a critical
value, we determine whether the sample relationship
is significant. If it is, then our data are so unlikely to
occur when H0 is true that we reject that H0 was true
for our study. Instead, we conclude that the predicted
relationship exists in the population (in nature). If the
data are not significant, we retain H0 and make no conclusion about the relationship, one way or the other.
In the literature you will encounter a number of nonparametric procedures. The computations for each are found in more advanced textbooks (or you can use SPSS). We won't dwell on their computations because you are now experienced enough to compute and understand them if you encounter them. However, you should know when we use the most common procedures.

Perform nonparametric procedures for ranked data when the dependent variable is measured in, or transformed to, ordinal scores.

Spearman correlation coefficient (rs): The coefficient that describes the linear relationship between pairs of ranked scores.
Mann–Whitney test: The nonparametric version of the independent-samples t-test for ranked scores.
Wilcoxon test: The nonparametric version of the related-samples t-test for ranked scores.
Kruskal–Wallis test: The nonparametric version of the one-way between-subjects ANOVA for ranked scores.
Friedman test: The nonparametric version of the one-way within-subjects ANOVA for ranked scores.
13-6a Common Nonparametric
Procedures for Ranked Scores
1. The Spearman correlation coefficient is analogous to the Pearson correlation coefficient for ranked data. Its symbol is rs. It produces a number between −1 and +1 that describes the strength and type of linear relationship that is present when data consist of pairs of X-Y scores that are both ordinal scores. If significant, rs estimates the corresponding population coefficient.
2. The Mann–Whitney test is analogous to the
independent-samples t-test. It is performed when
a study contains two independent samples of
ordinal scores.
3. The Wilcoxon test is analogous to the related-samples t-test. It is performed when a study has two related samples of ordinal scores. Recall that related samples occur either through matching or through repeated measures.
4. The Kruskal–Wallis test is analogous to
a one-way between-subjects ANOVA. It
is performed when a study has one factor
with at least three conditions, and each
involves independent samples of ordinal
scores.
5. The Friedman test is analogous to a
one-way within-subjects ANOVA. It is
performed when a study has one factor
with at least three levels, and each involves
related samples of ordinal scores.
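For reference, SciPy offers a version of each of these procedures (an assumption; the text points to SPSS instead). The rank data below are made up purely to show the call signatures:

import scipy.stats as stats  # assumed dependency

x = [1, 2, 3, 4, 5, 6]   # paired ranks for the Spearman rs
y = [2, 1, 4, 3, 6, 5]
a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]   # made-up rank samples

print(stats.spearmanr(x, y))             # Spearman correlation coefficient
print(stats.mannwhitneyu(a, b))          # two independent samples
print(stats.wilcoxon(x, y))              # two related samples
print(stats.kruskal(a, b, c))            # three or more independent samples
print(stats.friedmanchisquare(a, b, c))  # three or more related samples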
USING SPSS
Check out Review Card 13.4 for instructions on using SPSS to perform the one-way or the two-way chi square procedure. The program computes χ²obt and provides the usual minimum α level. In the two-way design, the program also computes Φ or C.
Also, SPSS will perform the nonparametric procedures for ordinal scores discussed in this chapter. Consult an advanced SPSS text.
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out
and use the Chapter Review Cards in the back of your book. Check out the additional study aids online
in CourseMate at www.cengagebrain.com
STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. What do all nonparametric inferential
procedures have in common with all parametric
procedures?
2. (a) Which variable in an experiment determines
whether to use parametric or nonparametric
procedures? (b) Which two scales of measurement
always require nonparametric procedures?
3. (a) What two things can be “wrong” with interval
or ratio scores that lead us to use nonparametric
procedures? (b) What must you do to the interval/
ratio scores first?
4. (a) Why, if possible, should you design a study that
meets the assumptions of a parametric test instead
of a nonparametric test? (b) Explain the error this
relates to. (c) Why shouldn’t you use parametric procedures if data violate the assumptions?
(d) Explain this error.
5. What do researchers do to create a design
requiring a one-way chi square?
6. What do researchers do to create a design
requiring a two-way chi square?
7. (a) What is the major difference between two
studies if one uses the one-way ANOVA and the
other uses the one-way chi square? (b) How is
the purpose of the ANOVA and the chi square
the same?
8. (a) What is the symbol for observed frequency
and what does it refer to? (b) What is the symbol for expected frequency and what does it
refer to?
9. (a) When calculating χ²obt, what makes it become a larger number? (b) Why does a larger χ²obt mean that H0 is less likely to be true? (c) What does the χ² sampling distribution show? (d) Why do we reject H0 when χ²obt is in the region of rejection and significant?
10. (a) Usually what is H0 in a one-way chi square?
(b) How do we interpret a significant one-way
chi square? (c) What is H0 in a two-way chi square?
(d) How do we interpret a significant two-way
chi square?
11. What are the two ways to go about computing the fe in a one-way χ², depending upon our experimental hypotheses?
12. (a) When is the phi coefficient computed, and
when is the contingency coefficient computed?
(b) What do both indicate?
13. A survey of 89 women finds that 34 prefer to
go out with men much taller than themselves,
and 55 prefer going out with men slightly
taller than themselves. We ask whether there
is really no preference in the population. (a)
What procedure should we perform? (b) In
words, what are H0 and Ha? (c) What must you compute before calculating χ²obt, and what are your answers? (d) Compute χ²obt. What do you conclude about the preferences of women in the population? (e) Describe how you would graph these results.
14. A report about an isolated community claims
there are more newborn females than males,
although we assume equal frequencies for
each. Records indicate 628 boys and 718 girls
born in the past month. (a) What are H0 and Ha?
(b) Compute the appropriate statistic. (c) What
do you conclude about birthrates in this
community? (d) Report your results in the
correct format.
15. The following data reflect the frequency with
which people voted in the last election and were
satisfied with the officials elected. We wonder if
voting and satisfaction are correlated.
                  Satisfied
                  Yes     No
Voted   Yes       48      35
        No        33      52
20. (a) What is the name of the nonparametric
correlation coefficient for ranked data and what is
its symbol? (b) What do you use it for?
(a) What procedure should we perform? (b) What are H0 and Ha? (c) What must you compute before calculating χ²obt, and what answers did you compute? (d) Compute χ²obt. (e) What do you conclude about these variables? (f) How consistent is this relationship?
16. As part of the above study, we also counted
the frequency of the different political party
affiliations for men and women to see if they are
related. The following data were obtained:
                  Affiliation
                  Republican   Democrat   Other
Gender   Men      18           43         14
         Women    39           23         18
(a) What procedure should we perform? (b) What are H0 and Ha? (c) What do you compute next, and what are your answers? (d) Compute χ²obt. (e) What do you conclude about gender and party affiliation in the population? (f) How consistent is this relationship?
17. In the general population, political party affiliation
is 30% Republican, 55% Democrat, and 15% Other.
To determine whether these percentages also “fit”
the elderly population, we ask a sample of 100
senior citizens and find 26 Republicans, 66 Democrats, and 8 Other. (a) What procedure should we
perform? (b) What are H0 and Ha? (c) What must
you compute next and what are your answers?
(d) Compute χ²obt. (e) What do you conclude about party affiliation among senior citizens?
18. After testing 40 participants, a significant χ²obt of 13.31 was obtained. With α = .05 and df = 2, how would this result be reported in a publication?
19. Foofy counts the students who like Professor Demented and those who like Professor Randomsampler. She then performs a one-way χ² to test for a significant difference between the frequencies of students liking each professor. (a) Why is this approach incorrect? (Hint: Check the assumptions of the one-way χ².) (b) How should she analyze the data?
21. What is the nonparametric version of each of
the following? (a) a one-way between-subjects
ANOVA; (b) an independent-samples t-test;
(c) a related-samples t-test; (d) a one-way
within-subjects ANOVA
22. A researcher performed the Mann–Whitney
test and found a significant difference between
psychologists and sociologists. Without knowing
anything else, what does this tell you about the
researcher’s ultimate conclusions about sampling
error versus a relationship in nature?
23. Select the nonparametric procedure to perform
for the following: (a) We test the effect of a pain
reliever on rankings of the emotional content of
words describing pain. One group is tested before
and after taking the drug. (b) We test the effect of
four different colors of spaghetti sauce on its tastiness. A different sample tastes each color, and tastiness scores are ranked. (c) Last semester a teacher
gave 25 As, 35 Bs, 20 Cs, and 10 Ds. According to
college policy, each grade should occur 20% of the
time. Is the teacher diverging from the college’s
model? (d) We examine the (skewed) reaction time
scores after one group of participants consumes 1,
3, and then 5 alcoholic drinks. (e) We test whether
two levels of family income produced a difference
in the percentage of income spent on clothing last
year. Percentages are then ranked.
24. (a) How do you recognize when you need to perform χ²? (b) How do you recognize whether to perform a one-way or a two-way χ²? (c) Summarize the steps when performing the χ² procedure.
25. Thinking back on this and previous chapters, what
three aspects of the design of your independent
variable(s) and one aspect of your dependent variable determine the specific inferential procedure
to perform in a particular experiment?
appendix A

MATH REVIEW AND ADDITIONAL COMPUTING FORMULAS

Sections
A-1 Review of Basic Math
A-2 Computing Confidence Intervals for the Two-Sample t-Test
A-3 Computing the Linear Regression Equation
A-4 Computing the Two-Way Between-Subjects ANOVA
A-5 Computing the One-Way Within-Subjects ANOVA
A-1
REVIEW OF BASIC MATH
The following is a review of the math used in performing statistical
procedures. There are accepted systems for identifying mathematical
operations, for rounding answers, for computing a proportion and a
percent, and for creating graphs.
A-1a Identifying Mathematical Operations
Here are the mathematical operations you'll use in statistics, and they are simple ones. Addition is indicated by the plus sign, so for example, 4 + 2 is 6. (I said this was simple!) Subtraction is indicated by the minus sign. We read from left to right, so X − Y is read as "X minus Y." This order is important because 10 − 4, for example, is 6, but 4 − 10 is −6. With subtraction, pay attention to what is subtracted from what and whether the answer is positive or negative. Adding two negative numbers together gives a larger negative number, so −4 + (−3) = −7. Adding a negative number to a positive number is the same as subtracting the negative's amount, so 5 + (−2) = 5 − 2 = 3. When subtracting a negative number, a double negative produces a positive. Thus, in 4 − (−3), the minus −3 becomes +3, so we have 4 + 3 = 7.
X
We indicate division by forming a fraction, such as . The number
Y
above the dividing line is called the numerator, and the number
below the line is called the denominator. Always express fractions as
decimals, dividing the denominator into the numerator. (After all, 1/2
equals .5, not 2!)
Multiplication is indicated in one of two ways. We may place
two components next to each other: XY means “X times Y.” Or we
may indicate multiplication using parentheses: 4(2) and (4)(2) both
mean “4 times 2.”
The symbol X² means square the score, so if X is 4, X² is 16. Conversely, √X means "Find the square root of X," so √4 is 2. (The symbol √ also means "Use your calculator.")
A-1b Determining the Order of
Mathematical Operations
Statistical formulas often call for a series of mathematical steps. Sometimes the steps are set apart by parentheses. Parentheses mean "the quantity," so always find the quantity inside the parentheses first and then perform the operations outside of the parentheses on that quantity. For example, (2)(4 + 3) indicates to multiply 2 times "the quantity 4 plus 3." So first add, which gives (2)(7), and then multiply to get 14.

A square root sign also operates on "the quantity," so always compute the quantity inside the square root sign first. Thus, √(2 + 7) means find the square root of the quantity 2 plus 7; so √(2 + 7) becomes √9, which is 3.
Most formulas are giant fractions. Pay attention to how far the dividing line is drawn because the length of a dividing line determines the quantity that is in the numerator and the denominator. For example, you might see a formula that looks like this:

X = (6/3 + 14) / √64

The longer dividing line means you should divide the square root of 64 into the quantity in the numerator, so first work in the numerator. Before you can add 6/3 to 14, you must reduce 6/3 by dividing: 6/3 = 2. Then you have

X = (2 + 14) / √64

Now adding 2 + 14 gives 16, so

X = 16 / √64

Before we can divide we must find the square root of 64, which is 8, so we have

X = 16/8 = 2
When working with complex formulas, perform one
step at a time and then rewrite the formula. Trying to
do several steps in your head is a good way to make
mistakes.
If you become confused in reading a formula, remember that there is a rule for the order of mathematical operations. Often this is summarized with PEMDAS, or you may recall the phrase "Please Excuse My Dear Aunt Sally." Either way, the letters indicate that, unless otherwise indicated, first compute inside any Parentheses, then compute Exponents (squaring and square roots), then Multiply or Divide, and finally, Add or Subtract. Thus, for (2)(4) + 5, first multiply 2 times 4 and then add 5. For 2² + 3², first square each number, resulting in 4 + 9, which is then 13. Finally, an important distinction is whether a squared sign is inside or outside of parentheses. Thus, in (2² + 3²) we square first, giving (4 + 9), so the answer is 13. But! For (2 + 3)² we add first, so 2 + 3 = 5, and then squaring gives (5)², so the answer is 25!
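The same precedence rules can be checked in Python (my choice of tool here, not the text's), where ** is the exponent operator:

# each line mirrors an example from the paragraph above
print((2) * (4 + 3))             # 14: parentheses first, then multiply
print((2) * (4) + 5)             # 13: multiply before adding
print(2**2 + 3**2)               # 13: square each number, then add
print((2 + 3) ** 2)              # 25: add inside parentheses, then square
print((6 / 3 + 14) / 64 ** 0.5)  # 2.0: the giant-fraction example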
A-1c Working with Formulas
We use a formula to find an answer, and we have symbols that stand for that answer. For example, in the formula B = AX + K, the B stands for the answer we will obtain. The symbol for the unknown answer is always isolated on one side of the equal sign, but we will know the numbers to substitute for the symbols on the other side of the equal sign. For example, to find B, say that A = 4, X = 11, and K = 3. In working any formula, the first step is to copy the formula and then rewrite it, replacing the symbols with their known values. Thus, start with

B = AX + K

Filling in the numbers gives

B = 4(11) + 3

Rewrite the formula after performing each mathematical operation. Above, multiplication takes precedence over addition, so multiply and then rewrite the formula as

B = 44 + 3

After adding,

B = 47

Do not skip rewriting the formula after each step!
A-1d Rounding Numbers
Close counts in statistics, so you must carry out calculations to the appropriate number of decimal places.
Usually, you must “round off” your answer. The rule
is this: Always carry out calculations so that your final
answer after rounding has two more decimal places
than the original scores. Usually, we have whole-number scores (e.g., 2 and 11) so the final answer contains two decimal places. But say the original scores
contain one decimal place (e.g., 1.4 and 12.3). Here
the final answer should contain three decimal places.
So when beginning a problem, first decide the
number of decimals that should be in your final answer.
However, if there are intermediate computing steps,
do not round off to this number of decimals at each
step. This will produce substantial error in your final
answer. Instead, before rounding, carry out each intermediate step to more decimals than you’ll ultimately
need. Then the error introduced will be smaller. If the
final answer is to contain two decimal places, round
off your intermediate answers to at least three decimal
places. Then after you’ve completed all calculations,
round off the final answer to two decimal places.
Round off your final answer to
two more decimal places than
are in the original scores.
To round off a calculation use the following rules:
If the number in the next decimal place is 5
or greater, round up. For example, to round to
two decimal places, 2.366 is rounded to 2.370,
which becomes 2.37.
If the number in the next decimal place is less
than 5, round down: 3.524 is rounded to 3.520,
which becomes 3.52.
We add zeros to the right of the decimal point to indicate the level of precision we are using. For example, rounding 4.996 to two decimal places produces 5, but to show we used the precision of two decimal places, we report it as 5.00.
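In Python (again my choice, not the text's), note that the built-in round() rounds halves to the even digit, so the decimal module is the closer match to the "5 or greater rounds up" rule above:

from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value):
    # round to two decimal places, with halves rounding upward
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(round_half_up(2.366))  # 2.37
print(round_half_up(3.524))  # 3.52
print(round_half_up(4.996))  # 5.00, showing both decimal places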
A-1e Computing Proportions
and Percents
Sometimes we will transform an individual’s original score into a proportion. A proportion is a decimal number between 0 and 1 that indicates a fraction
236
of the total. To transform a number to a proportion,
divide the number by the total. If 4 out of 10 people
pass an exam, then the proportion of people passing
the exam is 4/10, which equals .4. Or, if you score 6
correct on a test out of a possible 12, the proportion
you have correct is 6/12, which is .5.
We can also work in the opposite direction from a
known proportion to find the number out of the total
it represents. Here, multiply the proportion times the
total. Thus, to find how many questions out of 12 you
must answer correctly to get .5 correct, multiply .5
times 12, and voilà, the answer is 6.
We can also transform a proportion into a percent. A percent (or percentage) is a proportion multiplied by 100. Above, your proportion correct was
.5, so you had (.5)(100) or 50% correct. Altogether,
to transform the original test score of 6 out of 12 to
a percent, first divide the score by the total to find
the proportion and then multiply by 100. Thus (6/12)
(100) equals 50%.
To transform a percent back into a proportion,
divide the percent by 100 (above, 50/100 equals .5).
Altogether, to find the test score that corresponds to a
certain percent, transform the percent to a proportion
and then multiply the proportion times the total number possible. Thus, to find the score that corresponds
to 50% of 12, transform 50% to a proportion, which
is .5 and then multiply .5 times 12. So, 50% of 12 is
equal to (50/100)(12), which is 6.
Recognize that a percent is a whole unit: Think
of 50% as 50 of those things called percents. On the
other hand, a decimal in a percent is a proportion of
one percent. Thus, .2% is .2, or two-tenths, of one
percent, which is .002 of the total.
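These transformations are one-liners in code; here is a Python sketch of the running test-score example:

score, total = 6, 12

proportion = score / total       # 6/12 = 0.5
percent = proportion * 100       # 50.0
score_back = (50 / 100) * total  # 6.0: from a percent back to the score

print(proportion, percent, score_back)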
A-1f Creating Graphs
Recall that the horizontal line across the bottom of
a graph is the X axis, and the vertical line at the left-hand side is the Y axis. (Draw the Y axis so that it is
about 60 to 75% of the length of the X axis.) Where
the two axes intersect is always labeled as a score of
zero on X and a score of zero on Y. On the X axis,
scores become larger positive scores as you move to
the right. On the Y axis, scores become larger positive
scores as you move upward.
Say that we measured the height and weight of
several people. We decide to place weight on the Y
axis and height on the X axis. (How to decide this
is discussed later.) We plot the scores as shown in
Behavioral Sciences STAT2
Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Figure A.1. Notice that because the lowest height
score is 63, the lowest label on the X axis is also 63.
The symbol // in the axis indicates that we cut out the
part between 0 and 63. We do this with either axis
when there is a large gap between 0 and the lowest
score we are plotting.
In the body of the graph, we plot the scores from
the table above the graph. Jane is 63 inches tall and
weighs 130 pounds, so we place a dot above the
height of 63 and opposite the weight of 130. And so
on. As mentioned in Chapter 2, each dot on a graph
is called a data point. Notice that you read the graph
by using the scores on one axis and the data points.
For example, to find the weight of the person who has
a height of 67, travel vertically from 67 to the data
point and then horizontally to the Y axis: 165 is the
corresponding weight.
Always label the X and Y axes to indicate what
the scores measure (not just X and Y), and always give
your graph a title indicating what it describes.
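Here is a matplotlib sketch of the same plot (an assumed tool choice; the text draws its graphs by hand):

import matplotlib.pyplot as plt  # assumed dependency

height = [63, 64, 65, 66, 67, 68]        # X axis, in inches
weight = [130, 140, 155, 160, 165, 170]  # Y axis, in pounds

plt.scatter(height, weight)  # one data point per person
plt.xlabel("Height (in inches)")
plt.ylabel("Weight (in pounds)")
plt.title("Plot of Height and Weight Scores")
plt.show()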
Practice these concepts with the following practice
problems.
Figure A.1
Plot of Height and Weight Scores

Person   Height   Weight
Jane     63       130
Bob      64       140
Mary     65       155
Tony     66       160
Sue      67       165
Mike     68       170

(The graph plots weight in pounds, from 130 to 170, on the Y axis against height in inches, from 63 to 68, on the X axis, with one data point per person.)

For Practice

1. Round off the following numbers to two decimal places: (a) 13.7462 (b) 10.043 (c) 10.047 (d) .079 (e) 1.004

2. The intermediate answers in a formula based on whole-number scores are X = 4.3467892 and Y = 3.3333. What values of X and Y do we use when performing the next step in the calculations?

3. For Q = (X − Y)(X² + Y²), find Q when X = 8 and Y = 2.

4. Below, find D when X = 14 and Y = 3.

D = ((X − Y)/Y)(√X)

5. Using the formula in problem 4, find D when X = 9 and Y = −4.

6. (a) What proportion is 5 out of 15? (b) What proportion of 50 is 10? (c) One in a thousand equals what proportion?

7. Transform each answer in problem 6 to a percent.

8. Of the 40 students in a gym class, 35% played volleyball and 27.5% ran track. (a) What proportion of the class played volleyball? (b) How many students played volleyball? (c) How many ran track?

9. You can earn a total of 135 points in your statistics course. To pass you need 60% of these points. (a) How many points must you earn to pass the course? (b) You actually earned a total of 115 points. What percent of the total did you earn?

10. Create a graph showing the data points for the following scores:

X Score: Student's Age          20   25   35   45   25   40   45
Y Score: Student's Test Score   10   30   20   60   55   70   3
Appendix A: Math Review and Additional Computing Formulas
237
Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
> Answers

1. (a) 13.75 (b) 10.04 (c) 10.05 (d) .08 (e) 1.00
2. Carry at least three places, so X = 4.347 and Y = 3.333.
3. $Q = (8 - 2)(64 + 4) = (6)(68) = 408$
4. $D = \left(\frac{14 - 3}{3}\right)(\sqrt{14}) = (3.667)(3.742) = 13.72$
5. $D = \left(\frac{9 - (-4)}{-4}\right)(\sqrt{9}) = (-3.250)(3) = -9.75$
6. (a) 5/15 = .33 (b) 10/50 = .20 (c) 1/1000 = .001
7. (a) 33% (b) 20% (c) .1%
8. (a) 35%/100 = .35 (b) (.35)(40) = 14 (c) (27.5%/100)(40) = 11
9. (a) 60% of 135 is (60/100)(135) = 81 (b) (115/135)(100) = 85%
10. [Plot of Students' Age and Test Scores: the data points plotted with Age, 20 through 45, on the X axis and Test scores, 0 through 80, on the Y axis.]
A-2 COMPUTING CONFIDENCE INTERVALS FOR THE TWO-SAMPLE t-TEST
Two versions of a confidence interval can be used to describe the results from the two-sample t-test described in Chapter 9. For the independent-samples t-test we compute the confidence interval for the difference between two μs; for the related-samples t-test we compute the confidence interval for μD.
A-2a Confidence Interval for the Difference between Two μs

The confidence interval for the difference between two μs describes a range of differences between two μs, any one of which is likely to be represented by the difference between our two sample means. This procedure is appropriate when the sample means are from independent samples. For example, in Chapter 9 we discussed the experiment that compared recall scores under the conditions of hypnosis and no-hypnosis. We found a difference of 3 between the sample means (X̄1 − X̄2). If we could examine the corresponding μ1 and μ2, we'd expect that their difference (μ1 − μ2) would be around 3. We say "around" because we may have sampling error, so the actual difference between μ1 and μ2 might be 2 or 4. The confidence interval contains the highest and lowest values around 3 that the difference between our sample means is likely to represent.

THE FORMULA FOR THE CONFIDENCE INTERVAL FOR THE DIFFERENCE BETWEEN TWO μs IS

$(s_{\bar{X}_1-\bar{X}_2})(-t_{crit}) + (\bar{X}_1-\bar{X}_2) \leq \mu_1-\mu_2 \leq (s_{\bar{X}_1-\bar{X}_2})(+t_{crit}) + (\bar{X}_1-\bar{X}_2)$

Here, μ1 − μ2 stands for the unknown difference we are estimating. The t_crit is the two-tailed value found for the appropriate α at df = (n1 − 1) + (n2 − 1). The values of $s_{\bar{X}_1-\bar{X}_2}$ and (X̄1 − X̄2) are computed in the independent-samples t-test.
The confidence interval for the difference between two μs describes the difference between the population means represented by the difference between our sample means in the independent-samples t-test.
In the hypnosis study, the two-tailed t_crit for df = 30 and α = .05 is 2.042, $s_{\bar{X}_1-\bar{X}_2}$ is 1.023, and X̄1 − X̄2 is 3. Filling in the formula gives

$(1.023)(-2.042) + (3) \leq \mu_1-\mu_2 \leq (1.023)(+2.042) + (3)$
Multiplying 1.023 times ±2.042 gives

$-2.089 + (3) \leq \mu_1-\mu_2 \leq +2.089 + (3)$

So finally,

$.911 \leq \mu_1-\mu_2 \leq 5.089$
Because α = .05, this is the 95% confidence interval: We are 95% confident that the interval between .911 and 5.089 contains the difference we'd find between the μs for no-hypnosis and hypnosis. In essence, if someone asked how big the average difference in recall is between when the population is under hypnosis and when it is not, we'd be 95% confident the difference is between .91 and 5.09.
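This arithmetic is easy to verify by machine. A minimal Python sketch, assuming the scipy library for the t-table lookup (the variable names are ours):

```python
from scipy import stats

# Hypnosis example: difference between the means = 3, standard error = 1.023, df = 30
diff, se, df, alpha = 3.0, 1.023, 30, .05
t_crit = stats.t.ppf(1 - alpha / 2, df)          # two-tailed critical value, about 2.042
low = (se * -t_crit) + diff
high = (se * +t_crit) + diff
print(f"{low:.3f} <= mu1 - mu2 <= {high:.3f}")   # 0.911 <= mu1 - mu2 <= 5.089
```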
For Practice

1. In question 13 of the study problems in Chapter 9, what is the 95% confidence interval for the difference between the μs?
2. In question 21 of the study problems in Chapter 9, what is the 95% confidence interval for the difference between the μs?

> Answers

1. $(1.78)(-2.048) + 4 \leq \mu_1-\mu_2 \leq (1.78)(+2.048) + 4$, so $.35 \leq \mu_1-\mu_2 \leq 7.65$
2. $(1.03)(-2.101) + (-2.6) \leq \mu_1-\mu_2 \leq (1.03)(+2.101) + (-2.6)$, so $-4.76 \leq \mu_1-\mu_2 \leq -.44$

A-2b Computing the Confidence Interval for μD

The other confidence interval is used with the related-samples t-test to describe the μ of the population of difference scores (μD) that is represented by our sample of difference scores (D̄). The confidence interval for μD describes a range of values of μD, one of which our sample mean is likely to represent. The interval contains the highest and lowest values of μD that are not significantly different from D̄.

THE FORMULA FOR THE CONFIDENCE INTERVAL FOR μD IS

$(s_{\bar{D}})(-t_{crit}) + \bar{D} \leq \mu_D \leq (s_{\bar{D}})(+t_{crit}) + \bar{D}$

The t_crit is the two-tailed value for df = N − 1, where N is the number of difference scores. The $s_{\bar{D}}$ is the standard error of the mean difference computed in the t-test, and D̄ is the mean of the difference scores.

For example, in Chapter 9 we compared the fear scores of participants who had or had not received our phobia therapy. We found that the mean difference score in the sample was D̄ = 3.6, $s_{\bar{D}}$ = 1.25, and with α = .05 and df = 4, t_crit is ±2.776. Filling in the formula gives

$(1.25)(-2.776) + 3.6 \leq \mu_D \leq (1.25)(+2.776) + 3.6$

which becomes

$(-3.47) + 3.6 \leq \mu_D \leq (+3.47) + 3.6$

and so

$.13 \leq \mu_D \leq 7.07$

Thus, we are 95% confident that our sample mean of differences represents a population μD within this interval. In other words, if we performed this study on the entire population, we would expect the average difference in before- and after-therapy scores to be between .13 and 7.07.

For Practice

1. In question 15 of the study problems in Chapter 9, what is the 95% confidence interval for μD?
2. In question 17 of the study problems in Chapter 9, what is the 95% confidence interval for μD?

> Answers

1. $(.75)(-2.365) + 2.63 \leq \mu_D \leq (.75)(+2.365) + 2.63$, so $.86 \leq \mu_D \leq 4.40$
2. $(.359)(-2.262) + 1.2 \leq \mu_D \leq (.359)(+2.262) + 1.2$, so $.39 \leq \mu_D \leq 2.01$
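The related-samples interval can be checked the same way. A minimal Python sketch with the phobia-therapy values from the text (again assuming scipy):

```python
from scipy import stats

# Phobia-therapy example: D-bar = 3.6, standard error of D-bar = 1.25, df = 4
d_bar, se_d, df, alpha = 3.6, 1.25, 4, .05
t_crit = stats.t.ppf(1 - alpha / 2, df)          # about 2.776
low = (se_d * -t_crit) + d_bar
high = (se_d * +t_crit) + d_bar
print(f"{low:.2f} <= muD <= {high:.2f}")         # 0.13 <= muD <= 7.07
```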
A-3 COMPUTING THE LINEAR REGRESSION EQUATION
As discussed in Chapter 10, computing the linear
regression equation involves computing two components: the slope and the Y intercept.
First we compute the slope of the regression
line, which is symbolized by b. This is a number that
mathematically conveys the direction and the amount
that the regression line is slanted. A negative number indicates a negative linear relationship; a positive
number indicates a positive relationship. A slope of 0
indicates no relationship.
THE FORMULA FOR THE SLOPE OF THE REGRESSION LINE IS

$b = \frac{N(\Sigma XY) - (\Sigma X)(\Sigma Y)}{N(\Sigma X^2) - (\Sigma X)^2}$

N is the number of pairs of scores in the sample, and X and Y are the scores in our data. Notice that the numerator of the formula here is the same as the numerator of the formula for r, and that the denominator of the formula here is the left-hand quantity in the denominator of the formula for r. [An alternative formula is b = (r)(S_Y/S_X).]
For example, in Chapter 10 we examined the relationship between daily juice consumption and yearly doctor visits and found r = −.95. In the data (in Table 10.1) we found ΣX = 17, (ΣX)² = 289, ΣX² = 45, ΣY = 47, ΣXY = 52, and N = 10. Filling in the above formula for b gives:

$b = \frac{10(52) - (17)(47)}{10(45) - 289} = \frac{520 - 799}{450 - 289} = \frac{-279}{161} = -1.733$

Thus, in this negative relationship we have a negative slope of b = −1.733.
Next we compute the Y intercept, symbolized
by a. This is the value of Y when the regression line
crosses the Y axis. (Notice that a negative number is
also possible here if the regression line crosses the Y
axis at a point below the X axis.)
THE FORMULA FOR THE Y INTERCEPT OF THE REGRESSION LINE IS

$a = \bar{Y} - (b)(\bar{X})$

Here we first multiply the mean of the X scores times the slope of the regression line. Then we subtract that quantity from the mean of the Y scores. For our example, X̄ = 1.70, Ȳ = 4.70, and b = −1.733, so

$a = 4.70 - (-1.733)(1.70) = 4.70 - (-2.946)$
Subtracting a negative number is the same as adding its positive value, so

$a = 4.70 + 2.946 = 7.646$
Thus, when we plot the regression line, it will cross the
Y axis at the Y of 7.646.
A-3a Applying the Linear Regression Equation
We apply the slope and the Y intercept in the linear
regression equation.
THE LINEAR REGRESSION EQUATION IS

$Y' = bX + a$

This says that to obtain the Y′ for a particular X, multiply the X by b and then add a. For our example data, substituting our values of b and a, we have

$Y' = -1.733X + 7.646$
This is the equation for the regression line that summarizes our juice and doctor visits data.
To plot the regression line: We need at least two data points to plot a line. Therefore, choose a low value of X, insert it into the completed regression equation, and calculate the value of Y′. Choose a higher value of X and calculate Y′ for that X. (Do not select values of X that are above or below those found in your original data.) Plot your X–Y′ values and connect the data points with a straight line.

To predict a Y score: To predict an individual's Y score, enter his or her X score into the completed regression equation and compute the corresponding Y′. This is the Y score we predict for anyone who scores at that X.
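The slope, intercept, and prediction steps reduce to a few lines of code. A minimal Python sketch using the summary quantities from the juice example (the function name is ours):

```python
# Summary quantities from the juice example in Table 10.1
N, sum_x, sum_y, sum_xy, sum_x2 = 10, 17, 47, 52, 45

b = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)   # slope, about -1.733
a = (sum_y / N) - b * (sum_x / N)                              # Y intercept, about 7.646

def predict(x):
    """Predicted Y' for a given X, from Y' = bX + a."""
    return b * x + a

print(round(b, 3), round(a, 3), round(predict(2.0), 2))
```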
A-3b The Standard Error of the Estimate

We compute the standard error of the estimate to determine the amount of error we expect to have when we use a relationship to predict Y scores. Its symbol is $s_{Y'}$.

THE FORMULA FOR THE STANDARD ERROR OF THE ESTIMATE IS

$s_{Y'} = (S_Y)(\sqrt{1 - r^2})$

This says to find the square root of the quantity 1 − r² and then multiply it times the standard deviation of all Y scores (S_Y). In our juice and doctor visits data in Table 10.1,
the ΣY = 47, ΣY² = 275, and N = 10. Thus, first we compute S_Y:

$S_Y = \sqrt{\frac{\Sigma Y^2 - \frac{(\Sigma Y)^2}{N}}{N}} = \sqrt{\frac{275 - \frac{(47)^2}{10}}{10}} = \sqrt{5.41} = 2.326$
Our r in these data is −.95, so the standard error of the estimate is

$s_{Y'} = (S_Y)(\sqrt{1 - r^2}) = (2.326)(\sqrt{1 - (-.95)^2}) = (2.326)(\sqrt{1 - .9025}) = (2.326)(.312) = .73$
Thus, in this relationship, $s_{Y'}$ = .73. It indicates that we expect the "average error" in our predictions to be .73 when we use this regression equation to predict Y scores. For example, if someone's predicted Y score is 2, we expect to be "off" by about .73, so we expect his or her actual Y score to be between 1.27 and 2.73.
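A quick machine check of S_Y and the standard error of the estimate, using only Python's standard library:

```python
import math

# Y summary data from Table 10.1 and the correlation found there
N, sum_y, sum_y2, r = 10, 47, 275, -.95

S_Y = math.sqrt((sum_y2 - sum_y ** 2 / N) / N)   # standard deviation of Y, about 2.326
s_y_prime = S_Y * math.sqrt(1 - r ** 2)          # standard error of the estimate, about .73
print(round(S_Y, 3), round(s_y_prime, 2))
```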
For Practice

1. Compute the regression equation for each set of scores.

a.
| X | Y |
| 1 | 3 |
| 1 | 2 |
| 2 | 4 |
| 2 | 5 |
| 3 | 5 |
| 3 | 6 |

b.
| X | Y |
| 1 | 5 |
| 1 | 3 |
| 2 | 4 |
| 2 | 3 |
| 3 | 2 |
| 4 | 1 |
2. What will the standard error of the estimate indicate
for each set of scores?
> Answers

1. a. $b = \frac{6(56) - (12)(25)}{6(28) - (12)^2} = 1.5$; a = 4.167 − (1.5)(2) = 1.17; Y′ = 1.5X + 1.17
   b. $b = \frac{6(32) - (13)(18)}{6(35) - (13)^2} = -1.024$; a = 3 − (−1.024)(2.167) = 5.219; Y′ = −1.024X + 5.219
2. It indicates the "average error" we expect between the actual Y scores and the predicted Y′ scores.
A-4 COMPUTING THE TWO-WAY BETWEEN-SUBJECTS ANOVA

As discussed in Chapter 12, the following presents the formulas for computing the two-way between-subjects ANOVA, the Tukey HSD test for main effects and interactions, and η².
A-4a Computing the ANOVA

Chapter 12 discusses a 3 × 2 design for the factors of volume of a message and participants' gender, and the dependent variable of persuasiveness. Organize the data as shown in Table A.1. The ANOVA involves five parts: computing (1) the sums and means, (2) the sums of squares, (3) the degrees of freedom, (4) the mean squares, and (5) the Fs.
COMPUTING THE SUMS AND MEANS

STEP 1: Compute ΣX and ΣX² in each cell. Note the n of the cell. For example, in the male–soft cell, ΣX = 4 + 9 + 11 = 24; ΣX² = 4² + 9² + 11² = 218; n = 3. Also, compute the mean in each cell (for the male–soft cell, X̄ = 8). These are the interaction means.

STEP 2: Compute ΣX vertically in each column of the study's diagram. Add the ΣXs from the cells in a column (e.g., for soft, ΣX = 24 + 12 = 36). Note the n in each column (here, n = 6) and compute the mean for each column (e.g., X̄_soft = 6). These are the main effect means for factor A.

STEP 3: Compute ΣX horizontally in each row of the diagram. Add the ΣXs from the cells in a row (for males, ΣX = 24 + 33 + 50 = 107). Note the n in each row (here, n = 9). Compute the mean for each row (e.g., X̄_male = 11.89). These are the main effect means for factor B.

STEP 4: Compute ΣX_tot. Add the ΣXs from the levels (columns) of factor A, so ΣX_tot = 36 + 69 + 68 = 173.

STEP 5: Compute ΣX²_tot. Add the ΣX²s from all cells, so ΣX²_tot = 218 + 377 + 838 + 56 + 470 + 116 = 2075. Note N = 18.
Table A.1
Summary of Data for 3 × 2 ANOVA
(Factor A: Volume; Factor B: Gender)

| | A1: Soft | A2: Medium | A3: Loud | Row totals |
| B1: Male | 4, 9, 11 | 8, 12, 13 | 18, 17, 15 | |
| | X̄ = 8; ΣX = 24; ΣX² = 218; n = 3 | X̄ = 11; ΣX = 33; ΣX² = 377; n = 3 | X̄ = 16.67; ΣX = 50; ΣX² = 838; n = 3 | X̄_male = 11.89; ΣX = 107; n = 9 |
| B2: Female | 2, 6, 4 | 9, 10, 17 | 6, 8, 4 | |
| | X̄ = 4; ΣX = 12; ΣX² = 56; n = 3 | X̄ = 12; ΣX = 36; ΣX² = 470; n = 3 | X̄ = 6; ΣX = 18; ΣX² = 116; n = 3 | X̄_female = 7.33; ΣX = 66; n = 9 |
| Column totals | X̄_soft = 6; ΣX = 36; n = 6 | X̄_med = 11.5; ΣX = 69; n = 6 | X̄_loud = 11.33; ΣX = 68; n = 6 | ΣX_tot = 173; ΣX²_tot = 2075; N = 18 |

COMPUTING THE SUMS OF SQUARES

STEP 1: Compute the total sum of squares.

THE FORMULA FOR THE TOTAL SUM OF SQUARES IS

$SS_{tot} = \Sigma X^2_{tot} - \frac{(\Sigma X_{tot})^2}{N}$

This says to divide (ΣX_tot)² by N and then subtract the answer from ΣX²_tot. From Table A.1, ΣX_tot = 173, ΣX²_tot = 2075, and N = 18. Filling in the formula gives

$SS_{tot} = 2075 - \frac{(173)^2}{18} = 2075 - 1662.72 = 412.28$

Note: (ΣX_tot)²/N above is also used later and is called the correction (here, the correction equals 1662.72).
STEP 2: Compute the sum of squares for factor A. Always have factor A form your columns.

THE FORMULA FOR THE SUM OF SQUARES BETWEEN GROUPS FOR COLUMN FACTOR A IS

$SS_A = \Sigma\left[\frac{(\Sigma X \text{ in the column})^2}{n \text{ of scores in the column}}\right] - \frac{(\Sigma X_{tot})^2}{N}$

This says to square the ΣX in each column of the study's diagram, divide by the n in the column, add the answers together, and subtract the correction.
From Table A.1 the column sums are 36, 69, and 68, and n was 6, so

$SS_A = \left(\frac{36^2}{6} + \frac{69^2}{6} + \frac{68^2}{6}\right) - \frac{(173)^2}{18} = (216 + 793.5 + 770.67) - 1662.72$

$SS_A = 1780.17 - 1662.72 = 117.45$

STEP 3: Compute the sum of squares between groups for factor B. Factor B should form the rows.

THE FORMULA FOR THE SUM OF SQUARES BETWEEN GROUPS FOR ROW FACTOR B IS

$SS_B = \Sigma\left[\frac{(\Sigma X \text{ in the row})^2}{n \text{ of scores in the row}}\right] - \frac{(\Sigma X_{tot})^2}{N}$

This says to square the ΣX for each row of the diagram and divide by the n in the level. Then add the answers and subtract the correction. In Table A.1, the row sums are 107 and 66, and n was 9, so

$SS_B = \left(\frac{107^2}{9} + \frac{66^2}{9}\right) - 1662.72 = 1756.11 - 1662.72 = 93.39$

STEP 4: Compute the sum of squares between groups for the interaction. First, compute the overall sum of squares between groups, SS_bn.

THE FORMULA FOR THE OVERALL SUM OF SQUARES BETWEEN GROUPS IS

$SS_{bn} = \Sigma\left[\frac{(\Sigma X \text{ in the cell})^2}{n \text{ of scores in the cell}}\right] - \frac{(\Sigma X_{tot})^2}{N}$

Find (ΣX)² for each cell and divide by the n of the cell. Then add the answers together and subtract the correction. From Table A.1,

$SS_{bn} = \left(\frac{24^2}{3} + \frac{33^2}{3} + \frac{50^2}{3} + \frac{12^2}{3} + \frac{36^2}{3} + \frac{18^2}{3}\right) - 1662.72$

$SS_{bn} = 1976.33 - 1662.72 = 313.61$

To find SS_A×B, subtract the sum of squares for both main effects (in Steps 2 and 3) from the overall SS_bn. Thus,

THE FORMULA FOR THE SUM OF SQUARES BETWEEN GROUPS FOR THE INTERACTION IS

$SS_{A\times B} = SS_{bn} - SS_A - SS_B$

In our example, SS_bn = 313.61, SS_A = 117.45, and SS_B = 93.39, so

$SS_{A\times B} = 313.61 - 117.45 - 93.39 = 102.77$

STEP 5: Compute the sum of squares within groups. Subtract the overall SS_bn in Step 4 from the SS_tot in Step 1 to obtain the SS_wn.

THE FORMULA FOR THE SUM OF SQUARES WITHIN GROUPS IS

$SS_{wn} = SS_{tot} - SS_{bn}$

Above, SS_tot = 412.28 and SS_bn = 313.61, so

$SS_{wn} = 412.28 - 313.61 = 98.67$
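All five sums of squares can be verified with a short script. A minimal Python sketch using the raw scores from Table A.1 (the dictionary layout is our own; small differences from the text's values reflect the text's intermediate rounding):

```python
# Raw scores from Table A.1, one list per cell
cells = {
    ("male", "soft"): [4, 9, 11],      ("male", "medium"): [8, 12, 13],
    ("male", "loud"): [18, 17, 15],    ("female", "soft"): [2, 6, 4],
    ("female", "medium"): [9, 10, 17], ("female", "loud"): [6, 8, 4],
}
scores = [x for cell in cells.values() for x in cell]
N = len(scores)
correction = sum(scores) ** 2 / N                       # (Sum of Xtot)^2 / N, about 1662.72

ss_tot = sum(x ** 2 for x in scores) - correction       # about 412.28
ss_a = sum(sum(x for (g, v), c in cells.items() if v == vol for x in c) ** 2 / 6
           for vol in ("soft", "medium", "loud")) - correction   # about 117.44
ss_b = sum(sum(x for (g, v), c in cells.items() if g == gen for x in c) ** 2 / 9
           for gen in ("male", "female")) - correction           # about 93.39
ss_bn = sum(sum(c) ** 2 / len(c) for c in cells.values()) - correction   # about 313.61
ss_axb = ss_bn - ss_a - ss_b                            # about 102.78
ss_wn = ss_tot - ss_bn                                  # about 98.67
print(round(ss_tot, 2), round(ss_a, 2), round(ss_b, 2), round(ss_axb, 2), round(ss_wn, 2))
```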
COMPUTING THE DEGREES OF FREEDOM

STEP 1: The degrees of freedom between groups for factor A is k_A − 1, where k_A is the number of levels in factor A. (In our example, k_A is the three levels of volume, so df_A = 2.)

STEP 2: The degrees of freedom between groups for factor B is k_B − 1, where k_B is the number of levels in factor B. (In our example, k_B is the two levels of gender, so df_B = 1.)

STEP 3: The degrees of freedom between groups for the interaction is the df for factor A multiplied by the df for factor B. (In our example, df_A = 2 and df_B = 1, so df_A×B = 2.)

STEP 4: The degrees of freedom within groups equals N − k_A×B, where N is the total N of the study and k_A×B is the number of cells in the study. (In our example, N is 18 and we have six cells, so df_wn = 18 − 6 = 12.)
STEP 5: The degrees of freedom total equals N − 1. Use this to check your previous calculations, because the sum of the above dfs should equal df_tot. (In our example, df_tot = 17.)
Place each SS and df in the ANOVA summary table as shown in Table A.2. Perform the remainder of the computations using this table.

Table A.2
Summary Table of Two-Way ANOVA with df and Sums of Squares

| Source | Sum of Squares | df | Mean Square | F |
| Between: Factor A (volume) | 117.45 | 2 | MS_A | F_A |
| Between: Factor B (gender) | 93.39 | 1 | MS_B | F_B |
| Between: Interaction (vol × gen) | 102.77 | 2 | MS_A×B | F_A×B |
| Within | 98.67 | 12 | MS_wn | |
| Total | 412.28 | 17 | | |

COMPUTING THE MEAN SQUARES

STEP 1: Compute the mean square between groups for factor A.

THE FORMULA FOR THE MEAN SQUARE BETWEEN GROUPS FOR FACTOR A IS

$MS_A = \frac{SS_A}{df_A}$

From Table A.2,

$MS_A = \frac{117.45}{2} = 58.73$

STEP 2: Compute the mean square between groups for factor B.

THE FORMULA FOR THE MEAN SQUARE BETWEEN GROUPS FOR FACTOR B IS

$MS_B = \frac{SS_B}{df_B}$

In our example,

$MS_B = \frac{93.39}{1} = 93.39$

STEP 3: Compute the mean square between groups for the interaction.

THE FORMULA FOR THE MEAN SQUARE BETWEEN GROUPS FOR THE INTERACTION IS

$MS_{A\times B} = \frac{SS_{A\times B}}{df_{A\times B}}$

Thus, we have

$MS_{A\times B} = \frac{102.77}{2} = 51.39$

STEP 4: Compute the mean square within groups.

THE FORMULA FOR THE MEAN SQUARE WITHIN GROUPS IS

$MS_{wn} = \frac{SS_{wn}}{df_{wn}}$

Thus, we have

$MS_{wn} = \frac{98.67}{12} = 8.22$

COMPUTING F

STEP 1: Compute the F_obt for factor A.

THE FORMULA FOR THE MAIN EFFECT OF FACTOR A IS

$F_A = \frac{MS_A}{MS_{wn}}$

In our example, we have

$F_A = \frac{58.73}{8.22} = 7.14$

STEP 2: Compute the F_obt for factor B.

THE FORMULA FOR THE MAIN EFFECT OF FACTOR B IS

$F_B = \frac{MS_B}{MS_{wn}}$
Thus,

$F_B = \frac{93.39}{8.22} = 11.36$

STEP 3: Compute the F_obt for the interaction.

THE FORMULA FOR THE INTERACTION EFFECT IS

$F_{A\times B} = \frac{MS_{A\times B}}{MS_{wn}}$

Thus, we have

$F_{A\times B} = \frac{51.39}{8.22} = 6.25$

And now the finished summary table is in Table A.3.

Table A.3
Completed Summary Table of Two-Way ANOVA

| Source | Sum of Squares | df | Mean Square | F |
| Between: Factor A (volume) | 117.45 | 2 | 58.73 | 7.14 |
| Between: Factor B (gender) | 93.39 | 1 | 93.39 | 11.36 |
| Between: Interaction (vol × gen) | 102.77 | 2 | 51.39 | 6.25 |
| Within | 98.67 | 12 | 8.22 | |
| Total | 412.28 | 17 | | |

INTERPRETING EACH F  Determine whether each F_obt is significant by comparing it to the appropriate F_crit. To find each F_crit in the F-table (Table 4 in Appendix B), use the df_bn and the df_wn used in computing the corresponding F_obt.

1. To find F_crit for testing F_A, use df_A as the df between groups and df_wn. In our example, df_A = 2 and df_wn = 12. So for α = .05, the F_crit is 3.88.
2. To find F_crit for testing F_B, use df_B as the df between groups and df_wn. In our example, df_B = 1 and df_wn = 12. So at α = .05, the F_crit is 4.75.
3. To find F_crit for the interaction, use df_A×B as the df between groups and df_wn. In our example, df_A×B = 2 and df_wn = 12. Thus, at α = .05, the F_crit is 3.88.

Interpret each F_obt as you have previously: If an F_obt is larger than its F_crit, the corresponding main effect or interaction effect is significant. For the example, all three effects are significant: F_A (7.14) is larger than its F_crit (3.88), F_B (11.36) is larger than its F_crit (4.75), and F_A×B (6.25) is larger than its F_crit (3.88).
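The divisions in the summary table, and the lookup of each F_crit, can also be automated. A minimal Python sketch, assuming scipy for the F distribution (scipy returns 3.89 where the printed table rounds to 3.88):

```python
from scipy import stats

# Sums of squares and dfs from Table A.2
ss = {"A": 117.45, "B": 93.39, "AxB": 102.77, "wn": 98.67}
df = {"A": 2, "B": 1, "AxB": 2, "wn": 12}

ms = {k: ss[k] / df[k] for k in ss}                  # 58.73, 93.39, 51.39, and 8.22
for k in ("A", "B", "AxB"):
    f_obt = ms[k] / ms["wn"]
    f_crit = stats.f.ppf(1 - .05, df[k], df["wn"])   # critical F at alpha = .05
    print(k, round(f_obt, 2), round(f_crit, 2))
# A 7.14 3.89 / B 11.36 4.75 / AxB 6.25 3.89
```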
A-4b Performing the Tukey HSD Test

Perform post hoc comparisons on any significant F_obt. If the ns in all levels are equal, perform Tukey's HSD procedure. However, the procedure is computed differently for an interaction than for a main effect.

PERFORMING TUKEY'S HSD TEST ON MAIN EFFECTS  Perform the HSD on each main effect, using the procedure described in Chapter 11 for a one-way design.

THE FORMULA FOR THE HSD IS

$HSD = (q_k)\left(\sqrt{\frac{MS_{wn}}{n}}\right)$

The MS_wn is from the two-way ANOVA, and q_k is found in Table 5 of Appendix B for df_wn and k (where k is the number of levels in the factor). The n in the formula is the number of scores in a level. Be careful here: For each factor there may be a different value of n and of k. In the example, six scores went into each mean for a level of volume (each column), but nine scores went into each mean for a level of gender (each row). The n is the n in each group that you are presently comparing! Also, because q_k depends on k, when factors have a different k, they have different values of q_k.

After computing the HSD for a factor, find the difference between each pair of its main effect means. Any difference that is larger than the HSD is significant. In the example, for volume, n = 6; MS_wn = 8.22; and with α = .05, k = 3, and df_wn = 12, the q_k is 3.77. Thus, the HSD is 4.41. The main effect mean for soft (6) differs from the means for medium (11.5) and loud (11.33) by more than 4.41, so these are significant differences. The means for medium and loud, however, differ by less than 4.41, so they do not differ significantly. No HSD is needed for the gender factor.

PERFORMING TUKEY'S HSD TEST ON INTERACTION EFFECTS  The post hoc comparisons for a significant interaction involve the cell means. However, as discussed in Chapter 12, we perform only
unconfounded comparisons, in which two cells
differ along only one factor. Therefore, we find
the differences only between the cell means within
the same column or within the same row. Then
we compare the differences to the HSD. However,
when computing the HSD for an interaction, we
find qk using a slightly different procedure.
Previously, we found q_k in Table 5 using k, the number of means being compared. For an interaction, we first determine the adjusted k. This value "adjusts" for the actual number of unconfounded comparisons you will make. Obtain the adjusted k from Table A.4 (or at the beginning of Table 5 of Appendix B). In the left-hand column locate the design of your study. Do not be concerned about the order of the numbers. We called our persuasiveness study a 3 × 2 design, so look at the row labeled "2 × 3." Reading across that row, confirm that the middle column contains the number of cell means in the interaction (we have 6). In the right-hand column is the adjusted k (for our study it is 5). The adjusted k is the value of k to use to obtain q_k from Table 5. Thus, for the persuasiveness study with α = .05, df_wn = 12, and in the column labeled k = 5, the q_k is 4.51. Now compute the HSD using the same formula used previously. In each cell are 3 scores, so
$HSD = (q_k)\left(\sqrt{\frac{MS_{wn}}{n}}\right) = (4.51)\left(\sqrt{\frac{8.22}{3}}\right) = 7.47$

The HSD for the interaction is 7.47.
Table A.4
Values of Adjusted k

| Design of Study | Number of Cell Means in Study | Adjusted Value of k |
| 2 × 2 | 4 | 3 |
| 2 × 3 | 6 | 5 |
| 2 × 4 | 8 | 6 |
| 3 × 3 | 9 | 7 |
| 3 × 4 | 12 | 8 |
| 4 × 4 | 16 | 10 |
| 4 × 5 | 20 | 12 |

The differences between our cell means are shown in Table A.5. The absolute difference between each pair of unconfounded cell means is listed after the table. Any difference between two means that is larger than the HSD is a significant difference.

Table A.5
Table of the Interaction Cells Showing the Difference between Unconfounded Means

| | A1: Soft | A2: Medium | A3: Loud |
| B1: Male | X̄ = 8.0 | X̄ = 11.0 | X̄ = 16.67 |
| B2: Female | X̄ = 4.0 | X̄ = 12.0 | X̄ = 6.0 |

Unconfounded differences (HSD = 7.47): within the male row, 3.0 (soft vs. medium), 5.67 (medium vs. loud), and 8.67 (soft vs. loud); within the female row, 8.0 (soft vs. medium), 6.0 (medium vs. loud), and 2.0 (soft vs. loud); within each column, 4.0 (soft), 1.0 (medium), and 10.67 (loud).

Only three differences are significant: (1) between the mean for females at the soft volume and the mean for females at the medium volume, (2) between the mean for males at the soft volume and the mean for males at the loud volume, and (3) between the mean for males at the loud volume and the mean for females at the loud volume.
A-4c Computing η²

In the two-way ANOVA, we again compute eta squared (η²) to describe effect size: the proportion of variance in dependent scores that is accounted for by a relationship. Compute a separate η² for each significant main and interaction effect.
THE FORMULA FOR ETA SQUARED IS

$\eta^2 = \frac{SS_{bn}}{SS_{tot}}$
Here, we divide the SS_tot into the sum of squares between groups for each significant effect, either SS_A, SS_B, or SS_A×B. For example, for our factor A (volume), SS_A was 117.45 and SS_tot was 412.28. Therefore, the η² is .28. Thus, the main effect of changing the volume of a message accounts for 28% of our differences in persuasiveness scores. For the gender factor, SS_B is 93.39, so η² is .23: The conditions of male or female account for an additional 23% of the variance in scores. Finally, for the interaction, SS_A×B is 102.77, so η² = .25: The particular combination of gender and volume we created in our cells accounts for an additional 25% of the differences in persuasiveness scores.
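A minimal Python sketch of these eta squared computations:

```python
# Eta squared for each significant effect in the persuasiveness example
ss_tot = 412.28
for name, ss in [("volume", 117.45), ("gender", 93.39), ("interaction", 102.77)]:
    print(name, round(ss / ss_tot, 2))   # .28, .23, and .25
```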
For Practice

1. A study compared the performance of males and females tested by either a male or a female experimenter. Here are the data (Factor A: Participants; Factor B: Experimenter):

| | Level A1: Males | Level A2: Females |
| Level B1: Male experimenter | 6, 11, 9, 10, 9 | 8, 14, 17, 16, 19 |
| Level B2: Female experimenter | 8, 10, 9, 7, 10 | 4, 6, 5, 5, 7 |

a. Using α = .05, perform an ANOVA and complete the summary table.
b. Compute the main effect means and interaction means.
c. Perform the appropriate post hoc comparisons.
d. What do you conclude about the relationships that this study demonstrates?
e. Compute the effect size where appropriate.

> Answers

1. a.
| Source | Sum of Squares | df | Mean Square | F |
| Factor A | 7.20 | 1 | 7.20 | 1.19 |
| Factor B | 115.20 | 1 | 115.20 | 19.04 |
| Interaction | 105.80 | 1 | 105.80 | 17.49 |
| Within groups | 96.80 | 16 | 6.05 | |
| Total | 325.00 | 19 | | |

For each factor, df = 1 and 16, so F_crit = 4.49: Factor B and the interaction are significant, p < .05.

b. For factor A, X̄1 = 8.9, X̄2 = 10.1; for factor B, X̄1 = 11.9, X̄2 = 7.1; for the interaction, X̄_A1B1 = 9.0, X̄_A1B2 = 8.8, X̄_A2B1 = 14.8, X̄_A2B2 = 5.4.

c. Because factor A is not significant and factor B contains only two levels, such tests are unnecessary for them. For A × B, adjusted k = 3, so q_k = 3.65, and $HSD = (3.65)(\sqrt{6.05/5}) = 4.02$; the only significant differences are between males and females tested by a male, and between females tested by a male and females tested by a female.

d. Conclude that a relationship exists between gender and test scores when testing is done by a male, and that male versus female experimenters produce a relationship when testing females, p < .05.

e. For B, η² = 115.2/325 = .35; for A × B, η² = 105.8/325 = .33.
A-5 COMPUTING THE ONE-WAY WITHIN-SUBJECTS ANOVA
This section contains formulas for the one-way within-subjects ANOVA discussed in Chapter 11. (However, it also involves the concept of an interaction described in Chapter 12, which is briefly explained here.) This ANOVA is used when either the same participants are measured repeatedly or different participants are matched under all levels of one factor. (Statistical terminology still uses the old-fashioned term subjects instead of the more modern participants.) The other assumptions are (1) the dependent variable is a normally distributed ratio or interval variable and (2) the population variances are homogeneous.
A-5a Logic of the One-Way Within-Subjects ANOVA
As an example, say we’re interested in whether one’s
form of dress influences how comfortable one feels
in a social setting. On three consecutive days, we ask
participants to “greet” people arriving for a different
experiment. On day one, participants dress casually;
on another day, they dress semiformally; on another day, they dress formally. Each day participants complete a questionnaire measuring the dependent variable of their comfort level. Our data are shown in Table A.6. As usual, we test whether the means from the levels represent different μs. Therefore, H0: μ1 = μ2 = μ3, and Ha: Not all μs are equal.

Table A.6
One-Way Repeated-Measures Study of the Factor of Type of Dress

| Subjects | Level A1: Casual | Level A2: Semiformal | Level A3: Formal | |
| 1 | 4 | 9 | 1 | ΣX_sub = 14 |
| 2 | 6 | 12 | 3 | ΣX_sub = 21 |
| 3 | 8 | 4 | 4 | ΣX_sub = 16 |
| 4 | 2 | 8 | 5 | ΣX_sub = 15 |
| 5 | 10 | 7 | 2 | ΣX_sub = 19 |
| Total | ΣX = 30; ΣX² = 220; n1 = 5; X̄1 = 6 | ΣX = 40; ΣX² = 354; n2 = 5; X̄2 = 8 | ΣX = 15; ΣX² = 55; n3 = 5; X̄3 = 3 | ΣX_tot = 85; ΣX²_tot = 629; N = 15; k = 3 |

Notice that this one-way ANOVA can be viewed as a two-way ANOVA: Factor A (the columns) is one factor, and the different participants or subjects (the rows) are a second factor, here with five levels. That is, essentially we created a "condition" when we combined Subject 1 and Casual dress, producing a score of 4. This situation is different from when we combined Subject 2 with Casual dress, producing a score of 6. As discussed in Chapter 12, such a combined condition is called a "cell," and combining subjects with type of dress creates the "interaction" between these two variables.

In Chapter 12, we computed F by dividing by the mean square within groups (MS_wn). This was an estimate of the variability in the population. We computed MS_wn using the differences between the scores in a condition or cell and their mean. However, in Table A.6, each cell contains only one score. Therefore, the mean of each cell is the score in the cell, and the differences within a cell are always zero. So, we cannot compute MS_wn in the usual way. Instead, the mean square for the interaction between factor A and subjects (abbreviated MS_A×subs) reflects the variability of scores. It is because of the variability among people that the effect of type of dress will change as we change the "levels" of which participant we test. Therefore, MS_A×subs is our estimate of the variance in the scores, and it is used as the denominator of the F-ratio.

A-5b Computing the One-Way Within-Subjects ANOVA

STEP 1: Compute the X̄, the ΣX, and the ΣX² for each level of factor A (each column). Then compute ΣX_tot and ΣX²_tot. Also, compute ΣX_sub, which is the ΣX for each participant's scores (each row). Then follow these steps.

STEP 2: Compute the total sum of squares.

THE FORMULA FOR THE TOTAL SUM OF SQUARES IS

$SS_{tot} = \Sigma X^2_{tot} - \frac{(\Sigma X_{tot})^2}{N}$

From the example, we have

$SS_{tot} = 629 - \frac{(85)^2}{15} = 629 - 481.67 = 147.33$

Note that the quantity (ΣX_tot)²/N is the correction in the following computations. (Here, the correction is 481.67.)
STEP 3: Compute the sum of squares for the column factor, factor A.

THE FORMULA FOR THE SUM OF SQUARES BETWEEN GROUPS IS

$SS_A = \Sigma\left[\frac{(\text{Sum of scores in the column})^2}{n \text{ of scores in the column}}\right] - \frac{(\Sigma X_{tot})^2}{N}$

Find ΣX in each level (column) of factor A, square the sum, and divide by the n of the level. After doing this for all levels, add the results together and subtract the correction. In the example,

$SS_A = \left(\frac{30^2}{5} + \frac{40^2}{5} + \frac{15^2}{5}\right) - 481.67$

$SS_A = 545 - 481.67 = 63.33$

STEP 4: Find the sum of squares for the row factor of subjects.

THE FORMULA FOR THE SUM OF SQUARES FOR SUBJECTS IS

$SS_{subs} = \frac{(\Sigma X_{sub1})^2 + (\Sigma X_{sub2})^2 + \dots + (\Sigma X_{subn})^2}{k_A} - \frac{(\Sigma X_{tot})^2}{N}$

Square the sum for each subject (ΣX_sub). Then add the squared sums together. Next, divide by k_A, the number of levels of factor A. Finally, subtract the correction. In the example,

$SS_{subs} = \frac{14^2 + 21^2 + 16^2 + 15^2 + 19^2}{3} - 481.67$

$SS_{subs} = 493 - 481.67 = 11.33$

STEP 5: Find the sum of squares for the interaction by subtracting the sums of squares for the other factors from the total.

THE FORMULA FOR THE INTERACTION OF FACTOR A BY SUBJECTS IS

$SS_{A\times subs} = SS_{tot} - SS_A - SS_{subs}$

In the example,

$SS_{A\times subs} = 147.33 - 63.33 - 11.33 = 72.67$

STEP 6: Determine the degrees of freedom.

THE DEGREES OF FREEDOM BETWEEN GROUPS FOR FACTOR A IS

$df_A = k_A - 1$

k_A is the number of levels of factor A. In the example, k_A = 3, so df_A is 2.

THE DEGREES OF FREEDOM FOR THE INTERACTION IS

$df_{A\times subs} = (k_A - 1)(k_{subs} - 1)$

k_A is the number of levels of factor A, and k_subs is the number of participants. In the example, with three levels of factor A and five subjects, df_A×subs = (2)(4) = 8.

STEP 7: Find the mean squares for factor A and the interaction.

THE FORMULA FOR THE MEAN SQUARE FOR FACTOR A IS

$MS_A = \frac{SS_A}{df_A}$

In our example,

$MS_A = \frac{63.33}{2} = 31.67$

THE FORMULA FOR THE MEAN SQUARE FOR THE INTERACTION OF FACTOR A BY SUBJECTS IS

$MS_{A\times subs} = \frac{SS_{A\times subs}}{df_{A\times subs}}$

In our example,

$MS_{A\times subs} = \frac{72.67}{8} = 9.08$
STEP 8: Find F_obt.

THE FORMULA FOR THE WITHIN-SUBJECTS F-RATIO IS

$F_{obt} = \frac{MS_A}{MS_{A\times subs}}$

In the example,

$F_{obt} = \frac{31.67}{9.08} = 3.49$
The finished summary table is

| Source | Sum of Squares | df | Mean Square | F |
| Subjects | 11.33 | 4 | | |
| Factor A (dress) | 63.33 | 2 | 31.67 | 3.49 |
| Interaction (A × subjects) | 72.67 | 8 | 9.08 | |
| Total | 147.33 | 14 | | |
STEP 9: Find the critical value of F in Table 4 (the F-table) of Appendix B. Use df_A as the degrees of freedom between groups and df_A×subs as the degrees of freedom within groups. In the example, for α = .05, df_A = 2, and df_A×subs = 8, the F_crit is 4.46.
Interpret the above F_obt the same way you did in Chapter 11. Our F_obt is not larger than F_crit, so it is not significant. Had F_obt been significant, then at least two of the means from the levels of type of dress would differ significantly. Then, for post hoc comparisons, graphing, eta squared, and confidence intervals, follow the procedures discussed in Chapter 11. However, in any of those formulas, in place of the term MS_wn use MS_A×subs.
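The complete within-subjects computation can be verified with a short script. A minimal Python sketch using the Table A.6 scores (the variable names are ours):

```python
# Scores from Table A.6: rows are subjects, columns are the three levels of dress
data = [[4, 9, 1], [6, 12, 3], [8, 4, 4], [2, 8, 5], [10, 7, 2]]
k, n = len(data[0]), len(data)                       # k = 3 levels, n = 5 subjects
N = k * n
correction = sum(sum(row) for row in data) ** 2 / N  # (85)^2 / 15, about 481.67

ss_tot = sum(x ** 2 for row in data for x in row) - correction                   # 147.33
ss_a = sum(sum(row[j] for row in data) ** 2 / n for j in range(k)) - correction  # 63.33
ss_subs = sum(sum(row) ** 2 for row in data) / k - correction                    # 11.33
ss_a_subs = ss_tot - ss_a - ss_subs                                              # 72.67

f_obt = (ss_a / (k - 1)) / (ss_a_subs / ((k - 1) * (n - 1)))                     # about 3.49
print(round(f_obt, 2))
```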
For Practice

1. We study the influence of practice on eye-hand coordination. We test people with no practice, 1 hour of practice, or 2 hours of practice. The columns give the amount of practice:

| Subjects | None | 1 Hour | 2 Hours |
| 1 | 4 | 3 | 6 |
| 2 | 3 | 5 | 5 |
| 3 | 1 | 4 | 3 |
| 4 | 3 | 4 | 6 |
| 5 | 1 | 5 | 6 |
| 6 | 2 | 6 | 7 |
| 7 | 2 | 4 | 5 |
| 8 | 1 | 3 | 8 |

a. What are H0 and Ha?
b. Complete the ANOVA summary table.
c. With α = .05, what do you conclude about F_obt?
d. Perform the post hoc comparisons.
e. What is the effect size in this study?

2. You measure 21 students' degrees of positive attitude toward statistics at four equally spaced intervals during the semester. The mean score for each level is: time 1, 62.50; time 2, 64.68; time 3, 69.32; and time 4, 72.00. You obtain the following sums of squares:

| Source | Sum of Squares | df | Mean Square | F |
| Subjects | 402.79 | | | |
| Factor A | 189.30 | | | |
| A × subjects | 688.32 | | | |
| Total | 1280.41 | | | |

a. What are H0 and Ha?
b. Complete the ANOVA summary table.
c. With α = .05, what do you conclude about F_obt?
d. Perform the appropriate post hoc comparisons.
e. What is the effect size in this study?
f. What should you conclude about this relationship?
> Answers

1. a. H0: μ1 = μ2 = μ3; Ha: Not all μs are equal.
b. SS_tot = 477 − 392.04; SS_A = 445.125 − 392.04; and SS_subs = (1205/3) − 392.04:

| Source | Sum of Squares | df | Mean Square | F |
| Subjects | 9.63 | 7 | | |
| Factor A | 53.08 | 2 | 26.54 | 16.69 |
| A × subjects | 22.25 | 14 | 1.59 | |
| Total | 84.96 | 23 | | |

c. With df_A = 2 and df_A×subs = 14, the F_crit is 3.74. The F_obt is significant.
d. The q_k = 3.70 and HSD = 1.65. The means for 0, 1, and 2 hours are 2.13, 4.25, and 5.75, respectively. Significant differences occurred between 0 and 1 hour and between 0 and 2 hours, but not between 1 and 2 hours.
e. Eta squared: η² = 53.08/84.96 = .62.

2. a. H0: μ1 = μ2 = μ3 = μ4; Ha: Not all μs are equal.
b.
| Source | Sum of Squares | df | Mean Square | F |
| Subjects | 402.79 | 20 | | |
| Factor A | 189.30 | 3 | 63.10 | 5.50 |
| A × subjects | 688.32 | 60 | 11.47 | |
| Total | 1280.41 | 83 | | |

c. With df_A = 3 and df_A×subs = 60, the F_crit is 2.76. The F_obt is significant.
d. The q_k = 3.74 and HSD = 2.76. The means at time 1 and time 2 differ from those at times 3 and 4, but time 1 and time 2 do not differ significantly, and neither do times 3 and 4.
e. Eta squared: η² = 189.30/1280.41 = .15.
f. Attitudes during the second half of the semester are significantly higher than during the first half, but this is a small to moderate effect.
appendix B
STATISTICAL TABLES
Sections
Table 1  Proportions of Area under the Standard Normal Curve: The z-Table
Table 2  Critical Values of t: The t-Table
Table 3  Critical Values of the Pearson Correlation Coefficient: The r-Table
Table 4  Critical Values of F: The F-Table
Table 5  Values of the Studentized Range Statistic, q_k
Table 6  Critical Values of Chi Square: The χ²-Table
Table 1
Proportions of Area under the Standard Normal Curve: The z-Table

Column (A) lists z-score values. Column (B) lists the proportion of the area between the mean and the z-score value. Column (C) lists the proportion of the area beyond the z-score in the tail of the distribution. (Note: Because the normal distribution is symmetrical, areas for negative z-scores are the same as those for positive z-scores.)

[The body of the table, the z values and their areas, is not reproduced here.]
Table 2
Critical Values of t: The t-Table

(Note: Values of −t_crit = values of +t_crit.) Critical values are listed by df for two-tailed and one-tailed tests, at alpha levels α = .05 and α = .01. [The table body is not reproduced here.]

From Table 12 of E. Pearson and H. Hartley, Biometrika Tables for Statisticians, Vol. 1, 3rd ed. Cambridge: Cambridge University Press, 1966. Reprinted with the permission of the Biometrika Trustees.
Table 3
Critical Values of the Pearson Correlation Coefficient: The r-Table

Critical values are listed by df (number of pairs − 2) for two-tailed and one-tailed tests, at α = .05 and α = .01. [The table body is not reproduced here.]

From R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, 6th ed. Copyright © 1963, R. A. Fisher and F. Yates. Reprinted by permission of Pearson Education Limited.
Table 4
Critical Values of F: The F-Table

Critical values for α = .05 are in dark numbers; critical values for α = .01 are in light numbers. Rows give the degrees of freedom within groups (degrees of freedom in the denominator of the F-ratio); columns give the degrees of freedom between groups (degrees of freedom in the numerator of the F-ratio), from 1 through 20. [The table body is not reproduced here.]

From G. Snedecor and W. Cochran, Statistical Methods, 8th edition. Copyright © 1989 by the Iowa State University Press.
Table 5
Values of the Studentized Range Statistic, q_k

For a one-way ANOVA, or a comparison of the means from a main effect, the value of k is the number of means in the factor. To compare the means from an interaction, find the appropriate design (or number of cell means) in the table below and obtain the adjusted value of k. Then use the adjusted k as k to find the value of q_k.

Values of Adjusted k
| Design of Study | Number of Cell Means in Study | Adjusted Value of k |
| 2 × 2 | 4 | 3 |
| 2 × 3 | 6 | 5 |
| 2 × 4 | 8 | 6 |
| 3 × 3 | 9 | 7 |
| 3 × 4 | 12 | 8 |
| 4 × 4 | 16 | 10 |
| 4 × 5 | 20 | 12 |

Values of q_k for α = .05 are dark numbers and for α = .01 are light numbers. Rows give the degrees of freedom within groups (degrees of freedom in the denominator of the F-ratio); columns give k, the number of means being compared, from 2 through 12. [The table body is not reproduced here.]

From B. J. Winer, Statistical Principles in Experimental Design, McGraw-Hill, Copyright © 1962. Reproduced by permission of the McGraw-Hill Companies, Inc.
Table 6
Critical Values of Chi Square: The χ²-Table

Critical values of χ² are listed by df, at α = .05 and α = .01. [The table body is not reproduced here.]

From R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, 6th ed. Copyright © 1963, R. A. Fisher and F. Yates. Reprinted by permission of Pearson Education Limited.
appendix C
ANSWERS TO ODD-NUMBERED STUDY PROBLEMS
Chapter 1
1. To understand the laws of nature pertaining to
the behaviors of living organisms.
3. (a) It is the large group of individuals (or scores)
to which we think a law of nature applies. (b) It
is a subset of the population that is actually
measured and that represents or stands in for
the population. (c) Assuming the sample is
representative, we use the scores and behaviors
observed in the sample to infer the scores and
behaviors that would be found in the population.
(d) The behavior of everyone in a specified group
in nature.
5. (a) A relationship exists when as the scores on one
variable change, the scores on the other variable
change in a consistent fashion. (b) No consistent
pattern of change occurs with virtually the same
batch of Y scores occurring at every X.
7. (a) It accurately reflects the scores and relationship in the population. (b) It is different from
and inaccurately reflects the data found in the
population. (c) The luck of the draw of the
particular participants selected for the sample.
9. In an experiment, the researcher actively controls
and manipulates one variable (the independent
variable). In a correlational study, the researcher
passively measures participants’ scores on two
variables.
11. The independent variable is the overall variable
the researcher is interested in; the conditions
are the specific amounts or categories of the
independent variable under which participants
are tested.
13. (a) Statistics describe an aspect of a sample, and
parameters describe an aspect of a population.
(b) Statistics are symbolized by English letters,
and parameters are symbolized by Greek letters.
15. (a) A continuous variable allows for fractional amounts. A discrete variable measures fixed amounts that cannot be broken into smaller amounts. (b) Nominal and ordinal scales are assumed to be discrete; interval and ratio scales are assumed to be continuous.
17. Researcher A has an experiment because alcohol consumption is manipulated. Researcher B has a correlational study because both variables are simply measured.
19. (a) The independent variable is volume of music. The conditions are whether the music is soft, loud, or absent. The dependent variable is the final exam score. (b) The independent variable is size of the college. The conditions are small, medium, and large. The dependent variable is the amount of fun had. (c) The independent variable is birth order. The conditions are being born first, second, or third. The dependent variable is level of intelligence. (d) The independent variable is length of exposure to the lamp. The conditions are 15 or 60 minutes. The dependent variable is amount of depression. (e) The independent variable is wall color. The conditions are blue, green, or red walls. The dependent variable is aggression.
21. Sample A (as X increases, Y increases) and Sample D (as X increases, Y tends to decrease).
23. We see a group of similar dependent scores in one condition, and a different group of similar scores in the next condition, and so on.
25.
| Variable | Continuous or Discrete | Type of Measurement Scale |
| Personality type | Discrete | Nominal |
| Academic major | Discrete | Nominal |
| Number of minutes before and after an event | Continuous | Interval |
| Restaurant ratings (best, next best, etc.) | Discrete | Ordinal |
| Speed | Continuous | Ratio |
| Dollars in your pocket | Discrete | Ratio |
| Change in weight | Continuous | Interval |
| Savings account balance | Discrete | Ratio |
| Reaction time | Continuous | Ratio |
| Letter grades | Discrete | Ordinal |
| Clothing size | Discrete | Ordinal |
| Registered voter | Discrete | Nominal |
| Therapeutic approach | Discrete | Nominal |
| Schizophrenia type | Discrete | Nominal |
| Work absences | Discrete | Ratio |
| Words recalled | Discrete | Ratio |
Chapter 2
1. (a) N is the number of scores in a sample. (b) f is
the frequency of a score or scores.
3. (a) In a bar graph adjacent bars do not touch; in
a histogram they do. (b) Bar graphs are used with
nominal or ordinal scores; histograms are used
with interval or ratio scores.
5. (a) A histogram has a bar above each score; a
polygon has data points above the scores that are
connected by straight lines. (b) Histograms are
used with a few different interval or ratio scores;
polygons are used with many different interval/
ratio scores.
7. (a) Simple frequency is the number of times a
score occurs; relative frequency is the proportion of time the score occurs. (b) Cumulative
frequency is the number of scores at or below a
particular score; percentile is usually defined as
the percent of the scores below a particular score.
9. (a) A skewed distribution has one distinct tail;
a normal distribution has two. (b) A bimodal
distribution has two distinct humps above
the two highest-frequency scores; a normal
distribution has one hump and one highest-frequency score.
11. For relative frequency, we find the proportion of the total area under the curve at the specified scores. For percentile, we find the proportion of the total area under the curve that is to the left of a particular score.
13. (a) Bar graph. (b) Polygon. (c) Bar graph. (d) Histogram.
15. (a) The most frequent salaries tend to be in
the middle to high range, with relatively few
extremely low salaries. (b) Yours is one of the
lowest, least common salaries.
17. (a) 35% of the sample scored below you.
(b) Your score occurred 40% of the time. (c) It is
one of the highest and least frequent scores. (d) It
is one of the lowest and least frequent scores.
(e) 50 participants had either this score or a score
below it. (f) 60% of the area under the curve is to
the left of (below) your score.
19. (a) 70, 72, 60, 85, 45. (b) Because .20 of the area under the curve is to the left of 60, it's at the 20th percentile. (c) With .50 of the area under the curve to the left of 70, .50 of the sample is below 70. (d) With .50 of the area under the curve below 70, and .20 of the area under the curve below 60, then .50 − .20 = .30 of the area under the curve is between 60 and 70. (e) .20. (f) With .50 below 70 and .30 between 80 and 70, a total of .50 + .30 = .80 of the curve is below 80, so it is at the 80th percentile.
21.
    Score   f   Relative Frequency
    53      1   .06
    52      3   .17
    51      2   .11
    50      5   .28
    49      4   .22
    48      0   .00
    47      3   .17
23. (a) Bar graph; for a nominal (categorical) variable. (b) Polygon; for many different ratio scores.
(c) Histogram; for only 8 different ratio scores.
(d) Bar graph; for an ordinal variable.
25. (a) These variables are assumed to be discrete,
and the spaces between bars communicate a discrete variable. (b) These variables are assumed to
be continuous, and the lines between data points
communicate that the variable continues between
the plotted X scores.
Chapter 3
1. (a) It indicates where on a variable most scores
tend to be located. (b) The mode, median, and
mean.
3. The mode is the most frequently occurring score;
it is used with nominal scores.
5. The mean is the average score, the mathematical
center of a distribution; it is used with normal
distributions of interval or ratio scores.
7. The distribution is positively skewed, with the
mean pulled toward the extreme high scores in
the tail.
9. (a) It is the symbol for a score’s deviation from
the mean. (b) It is the symbol for the sum of the
deviations around the mean. (c) Compute all
deviations in the sample and then find their sum.
(d) The mean is the center, so the positive deviations cancel out the negative deviations, producing a total of zero.
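The zero-sum property in 9(d) is easy to verify numerically. A minimal sketch (using made-up scores, not data from the book's problem) shows that deviations around the mean always cancel:

    # Deviations (X - mean) around the mean always sum to (essentially) zero.
    scores = [2, 5, 7, 9, 12]             # any sample works
    mean = sum(scores) / len(scores)      # the mathematical center: 7.0
    deviations = [x - mean for x in scores]
    print(deviations, sum(deviations))    # [-5, -2, 0, 2, 5] -> sum is 0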
11. (a) ΣX = 638, N = 11, X̄ = 58. (b) The mode is 58.
13. (a) A = −7, B = −2, C = 0, D = +1, E = +5. (b) +5, +1, 0, −2, −7. (c) 0, +1, −2, +5, −7.
15. (a) Mean. (b) Median (these ratio scores are
skewed). (c) Mode (this is a nominal variable).
(d) Median (this is an ordinal variable).
17. He is incorrect if the variable is something on
which it is undesirable to have a high score (e.g.,
number of errors on a test). In that case, being
below the mean with a negative deviation is
better.
19. (a) The independent variable. (b) The dependent
variable. (c) It is the variable manipulated by the
researcher that supposedly influences a behavior. (d) It measures participants’ behavior that
is expected to be influenced by the independent
variable.
21. Mean errors do not change until there have been
5 hours of sleep deprivation. Mean errors then
increase as sleep deprivation increases.
23. (a) Reading the graph left to right, the mean
scores on the Grumpy Test decrease as
sunlight increases. (b) Individual Grumpy
raw scores tend to decrease as sunlight
increases. (c) The populations of Emotionality
scores and, therefore, the μs would tend to
decrease as sunlight increases. (d) Yes, these
data provide evidence of a relationship in
nature.
25. (a) The means for conditions 1, 2, and 3 are 15,
12, and 9, respectively.
(b) [Line graph: mean productivity on the Y axis (8 to 15) plotted against noise level on the X axis (low, medium, high); the plotted means fall from 15 at low noise to 12 at medium to 9 at high noise.]
(c) [Three frequency distributions of productivity scores, with the μ for high noise centered near 9, the μ for medium noise near 12, and the μ for low noise near 15.]
(d) You have evidence for a relationship where,
as noise level decreases, the typical productivity
score increases from around 9 to around 12 to
around 15.
Chapter 4
1. (a) There are larger and/or more frequent differences among the scores. (b) The behaviors are
more inconsistent. (c) The distribution is wider,
more spread out.
3. The shape of the distribution, and the statistics that indicate its central tendency and its
variability.
5. (a) The range is the distance between the highest
and lowest scores in a distribution. (b) It includes
only the most extreme and often least-frequent
scores. (c) With nominal or ordinal scores.
7. (a) They communicate how much the scores are
spread out around the mean. (b) The standard
deviation because it is closer to being the “average deviation,” and we can use it to find the
middle 68% of the distribution.
9. (a) All are forms of the standard deviation, communicating the "average" amount scores differ from the mean. (b) SX is a sample's standard deviation; sX is an estimate of the population's standard deviation based on a sample; and σX is the population's true standard deviation.
11. (a) Find the scores at +1SX and at −1SX from the mean. (b) Find the scores at +1σX and at −1σX from μ. (c) Use X̄ to estimate μ, and then find the scores at +1sX and at −1sX from the estimated μ.
13. (a) Range = 8 − 0 = 8. (b) ΣX = 41, ΣX² = 231, N = 10; so S²X = (231 − 168.1)/10 = 6.29. (c) SX = √6.29 = 2.51. (d) With X̄ = 4.1 and SX = 2.51, the scores are 4.1 − 2.51 = 1.59 and 4.1 + 2.51 = 6.61.
15. (a) She made an error. Variance measures the
distance that scores are from the mean, and distance cannot be a negative number. (b) She made
another error. Her “average” deviation cannot
be greater than the range between the lowest and
highest scores. (c) It would incorrectly indicate
that everyone has the same score.
17. (a) Compute the mean and sample standard
deviation in each condition. (b) As we change the
conditions from A to B to C, the dependent scores
change from around a mean of 11.00 to 32.75
to 48.00, respectively. The SX for the three conditions are .71, 1.09, and .71, respectively. (c) The
SX for the three conditions seem small, so participants scored consistently in each condition.
19. (a) Study A has a relatively narrow/skinny distribution, and Study B has a wider distribution. (b) In A, between 35 (40 − 5) and 45 (40 + 5); in B, between 30 (40 − 10) and 50 (40 + 10).
21. (a) Based on the sample means for conditions 1, 2, and 3, we'd expect a μ of about 13.33, 8.33, and 5.67, respectively. (b) Somewhat inconsistently: computing each sX indicates we should expect a σX of about 4.51, 2.52, and 3.06, respectively.
23. (a) Conditions are sunny versus rainy; compute X̄ and SX using the ratio scores of length of laughter; create a bar graph for this discrete independent variable. (b) Conditions are years of alcoholism; for these skewed income scores, compute the median, not the X̄ and SX; create a line graph for this continuous independent variable. (c) Conditions are hours slept; compute the X̄ and SX for the ratio scores of number of ideas; create a line graph for this continuous independent variable.
Chapter 5
1. (a) A z-score indicates the distance a score is
above or below the mean when measured in
standard deviation units. (b) z-scores are used
to determine relative standing, compare scores
from different variables, and compute relative
frequency and percentile.
3. (a) It is the distribution that results from
transforming a distribution of raw scores into
z-scores. (b) No, only when the raw scores are
normally distributed. (c) The mean is 0 and the
standard deviation is 1.
5. (a) It is our model of the perfect normal z-distribution. (b) It is used as a model of any normal
distribution of raw scores after being transformed to z-scores. (c) When scores are a large
group of normally distributed interval or ratio
scores.
7. (a) z = (80 − 86)/12 = −.50. (b) z = (98 − 86)/12 = +1. (c) X = (−1.5)(12) + 86 = 68. (d) X = (+1)(12) + 86 = 98.
9. (a) z = 1. (b) z = 2.8. (c) z = .70. (d) z = 0.
11. (a) Convert the raw score that marks the slice to z;
find in column B or C the proportion in the slice,
which is also the relative frequency of the scores
in the slice. (b) Convert the raw score to z; the
proportion of the curve to the left of z (in column
C) becomes the percentile. (c) Convert the raw
score to z; the proportion of the curve between the
z and the mean (in column B) plus .50 becomes
the percentile. (d) Find the specified proportion
in column B or C, identify the z in column A, and
transform the z to its corresponding raw score.
13. (a) z = (76 − 100)/16 = −1.5; from column B, the relative frequency is .4332. (b) (.4332)(500) = 216.6 people. (c) From column C, it is .0668, or about the 7th percentile. (d) With .4332 between z and the mean, and .50 above the mean, .4332 + .50 = .9332; then (.9332)(500) = 466.6 people.
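The column B and column C values in the z-table are areas under the standard normal curve, so answer 13 can be double-checked with scipy (a sketch, assuming scipy is available; it is not part of the book's procedure):

    from scipy.stats import norm

    z = (76 - 100) / 16          # -1.5
    col_C = norm.cdf(z)          # area below z: about .0668 (the 7th percentile)
    col_B = 0.5 - col_C          # area between this negative z and the mean: .4332
    print(col_B * 500)           # about 216.6 people between 76 and the mean
    print(norm.cdf(1.5) * 500)   # about 466.6 people below a z of +1.5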
15. (a) That it is normally distributed, that its μ equals the μ of the underlying raw score population, and that its standard deviation is related to the standard deviation of the raw score population. (b) Because it describes the sampling
distribution for any population without our
having to actually measure all possible sample
means.
17. No. To compare the scores we need z-scores: For Emily, z = (76 − 85)/10 = −.90; for Amber, z = (60 − 50)/4 = +2.5. Relative to their respective classes, Amber did much better than Emily.
19. (a) Small. This will give him a large positive
z-score, placing him at the top of his class.
(b) Large. Then he will have a small negative z
and will be relatively close to the mean.
21. From column C in the z-table, the 25th percentile is at approximately z = −.67. The cutoff score is X = (−.67)(10) + 75 = 68.3.
23. To make the salaries comparable, compute z. For City A, z = (43,000 − 45,000)/15,000 = −.13. For City B, z = (46,000 − 50,000)/18,000 = −.22. City A is the better offer, because her income will be closer to (less below) the average cost of living in that city.
25. (a) z = (60 − 56)/8 = +.50, so from column B, .1915 of the curve is between 60 and 56. Adding the .50 of the curve below the mean gives a total of .6915, so 69.15% of the curve is expected to be below 60. (b) z = (54 − 56)/8 = −.25, so from column C, .4013 of the curve is below 54. (c) The approximate upper 20% of the curve from column C is .2005, at z = +.84. The corresponding raw score is X = (.84)(8) + 56 = 62.72.
Chapter 6
1. (a) It is our expectation or confidence the event
will occur. (b) The relative frequency of the event
in the population.
3. (a) z = (64 − 46)/8 = +2.25; from column C, p = .0122. (b) z = (40 − 46)/8 = −.75; z = (50 − 46)/8 = +.50; using column B, p = .2734 + .1915 = .4649.
5. The p of a hurricane is 160/200 = .80. The uncle
may be looking at an unrepresentative sample
over the past 13 years. David uses the gambler’s
fallacy, failing to realize that p is based on the
long run, and so there may not be a hurricane
soon.
7. It indicates that by chance, we’ve selected too
many high or too many low scores, so a sample is
unrepresentative of its population. Then X̄ does not equal the μ it represents.
9. (a) It indicates whether or not the sample’s
z-score (and X̄) lies in the region of rejection.
(b) We reject that the sample comes from or
represents the underlying raw score population,
because it is very unlikely to do so.
11. She had sampling error, obtaining an unrepresentative sample that contained a majority of
Ramone’s supporters, but the majority in the
population supported Darius.
13. Because it is a mean and a sample that is very
unlikely to occur if we were representing that
population.
15. (a) [Sketch of the H0 sampling distribution: a normal curve centered at μ = 60, with regions of rejection beyond the critical values z = −1.96 and z = +1.96, and the sample's z of +1.45 falling between them.]
(b) No: The sample’s z-score is not beyond the
critical value, so this is a frequent sample mean
when representing this population so we should
not reject that the sample represents the population with m 60.
17. (a) +1.645. (b) Yes, because σX̄ = 2.19, so z = (36.8 − 33)/2.19 = +1.74, which lies beyond +1.645. (c) Such a sample is unlikely to occur when representing this population. (d) Reject that the sample represents this population.
19. σX̄ = 1.521, so z = (34 − 28)/1.521 = +3.945. No, because the z is beyond the critical value of ±1.96, so this sample is unlikely to represent this population.
For Bubba’s, z (26 24)/1.10 1.82, which
is not beyond 1.96. Retain that this sample
represents the population of average bowlers. For
Babette’s, z (18 24)/1.10 5.45, which is
beyond 1.96. Reject that this sample represents
the population of average bowlers.
23. No: Having a boy now is no more likely than at
any other time.
25. (a) First compute the X̄, which is 35.67; σX̄ = 5/√9 = 1.67; z = (35.67 − 30)/1.67 = +3.40. With a critical value of ±1.96, conclude that your football players do not represent this population. (b) Football players, as represented by your sample, form a population different from non–football players, with a μ of about 35.67.
Chapter 7
1. By poorly representing a population, a sample
may mislead us to incorrectly describe an existing relationship, or to describe a relationship that
does not exist.
3. α stands for the criterion probability; it determines the size of the region of rejection and the theoretical probability of a Type I error.
5. Perhaps wearing the device causes people to exercise more, and the researcher’s result accurately
reflects this. Or, perhaps wearing the device does
nothing and, by chance, the researcher happened
to select an unrepresentative sample who naturally tend to exercise more than in the general
population.
7. (a) The null hypothesis, indicating the sample
represents a population where the predicted
relationship does not exist. (b) The alternative
hypothesis, indicating the sample represents a
population where the predicted relationship
does exist.
9. (a) That the sample relationship was so unlikely
to occur if there was not the predicted relationship in nature that we conclude there is this
relationship in nature. (b) Because then we can
believe that we’ve found a relationship and are
not being misled by sampling error.
11. (a), (b), and (c) are incorrect because by retaining
H0, we still have both hypotheses, so we can make
no conclusions about the relationship; in (d) the
results are nonsignificant; (e) and (f) are correct.
13. (a) That owning a hybrid causes different attitudes than not owning one. (b) Two-tailed. (c) H0: The μ of hybrid owners equals that of nonhybrid owners, at 65; Ha: The μ of hybrid owners is not equal to that of nonhybrid owners, so it is not equal to 65. (d) σX̄ = 24/√100 = 2.40; zobt = (76 − 65)/2.40 = +4.58. (e) zcrit = ±1.96. (f) The zobt of +4.58 is beyond the zcrit of +1.96, so the results are significant: Owners of hybrid cars have significantly more positive attitudes (with μ around 76) than owners of nonhybrid cars (with μ = 65). (g) z = +4.58, p < .05.
15. (a) The probability of a Type I error is p < .05; this is concluding that type of car influences attitudes, when really it does not. (b) By rejecting H0, there is no chance of a Type II error, which would be retaining H0; this is not concluding that type of car influences attitudes, when really it does.
17. (a) Changing the independent variable from not
finals week to finals week increases the dependent variable of amount of pizza consumed; we
will not demonstrate an increase. (b) Changing
the independent variable from not performing
breathing exercises to performing them changes
the dependent scores of blood pressure; we will
not demonstrate a change in scores. (c) Changing the independent variable by increasing hormone levels changes the dependent scores of pain
sensitivity; we will not demonstrate a change in
scores. (d) Changing the independent variable
by increasing amount of light will decrease the
dependent scores of frequency of daydreams; we
will not demonstrate a decrease.
19. (a) A two-tailed test. (b) H0: μ = 70; Ha: μ ≠ 70. (c) σX̄ = 12/√49 = 1.71; zobt = (74.36 − 70)/1.71 = +2.55. (d) zcrit = ±1.96. (e) Yes; because zobt is beyond zcrit, the results are significant: Changing from no music to music results in test scores changing from a μ of 70 to a μ around 74.36.
21. (a) The X̄ is 35.11; SX = 10.629. (b) One-tailed test—she predicts that taking statistics lowers self-esteem. (c) H0: μ ≥ 28; Ha: μ < 28. (d) σX̄ = 11.35/√9 = 3.78; zobt = (35.11 − 28)/3.78 = +1.88. (e) zcrit = −1.645. (f) Because the positive zobt is not beyond the negative zcrit, the results are not significant. She can make no claim about the relationship. (g) Perhaps statistics do nothing to self-esteem; perhaps they lower self-esteem and she poorly represented this; or perhaps they actually raise self-esteem, as the X̄ suggests.
23. He is incorrect because the total size of the region of rejection (which is α) is the same for a one- or a two-tailed test. This is also the probability of making a Type I error, so it is equally likely with either type of test.
25. Incorrect: Whether we make a Type I or Type II error is determined by whether the independent variable "works" in nature. Power applies when it does work, increasing the probability of rejecting H0 when it is false. When H0 is true, then α is the probability of making a Type I error.
Chapter 8
1. Either (1) the predicted relationship does not
occur in nature, but by chance, sampling error
produced unrepresentative data that make it look
like the relationship occurs or (2) the data accurately represent the predicted relationship, which
does occur in nature.
3. (a) As a one-sample experiment. (b) Normally distributed interval or ratio scores. (c) Perform the t-test when the σX of the underlying raw score population is not known; perform the z-test when σX is known.
5. (a) Because the t-distribution is not a perfectly
normal distribution like the z-distribution is.
(b) Different Ns produce differently shaped
t-distributions, so a different tcrit is needed to
demarcate a region of rejection equal to α. (c) The degrees of freedom (df). (d) df = N − 1.
7. To describe the relationship and interpret it in
terms of our variables.
9. (a) One-tailed. (b) H0: μ ≤ 88; Ha: μ > 88. (c) With df = 30 and α = .05, the one-tailed tcrit is +1.697. (d) Yes. (e) Yes. (f) The two-tailed tcrit is ±2.042 (not the one-tailed +1.697).
11. (a) H0: μ = 68.5; Ha: μ ≠ 68.5. (b) sX̄ = √(130.5/10) = 3.61; tobt = (78.5 − 68.5)/3.61 = +2.77. (c) With df = 9, tcrit = ±2.262. (d) Compared to other books, which produce μ = 68.5, this book produces a significant improvement in exam scores, with a μ around 78.5. (e) t(9) = +2.77, p < .05. (f) (3.61)(−2.262) + 78.5 ≤ μ ≤ (3.61)(+2.262) + 78.5, so 70.33 ≤ μ ≤ 86.67.
13. (a) H0: μ = 50; Ha: μ ≠ 50.
(b) tobt = (53.25 − 50)/8.44 = +.39.
(c) For df = 7, tcrit = ±2.365.
(d) t(7) = +.39, p > .05, so the results are not significant; do not compute the confidence interval.
(e) She can make no conclusion about whether the argument changes people's attitudes.
15. (a) H0: μ = 12; Ha: μ ≠ 12. (b) X̄ = 8.667; s²X = 4.67; sX̄ = √(4.67/6) = .882; tobt = (8.667 − 12)/.882 = −3.78. With df = 5, tcrit = ±2.571. The results are significant: Not using the program significantly reduces grammatical errors. (c) (.882)(−2.571) + 8.667 ≤ μ ≤ (.882)(+2.571) + 8.667, so 6.399 ≤ μ ≤ 10.935.
17. (a) A margin of error. (b) If we could measure
the population in your profession, we expect the
mean to be between $42,000 and $50,000.
19. In the population, Smith’s true rating is between
32% and 38%, and Jones’ is between 34% and
40%. Because of the overlap, Smith might be
ahead, or Jones might be ahead. Using these statistics there is no clear winner, so we conclude
there is a tie.
21. (a) N = 46; that adolescents and adults differ in perceptual skills; with p < .01, the difference is significant; Type I error. (b) N = 100; that each personality type produces a different population of emotionality scores; with p > .05, the difference is not significant; Type II error.
23. (a) t(34) = +2.019, p > .05. (b) t(34) = +2.47, p < .05.
25. Create the experimental hypotheses and design a study to obtain the X̄ of dependent scores under one condition to compare to a known μ under another condition. Create the one- or two-tailed H0 and Ha, and perform the t-test to determine if X̄ differs significantly from μ. If the results are significant, describe and interpret the relationship and compute the confidence interval for the μ being represented. If the results are not significant, make no conclusion about the predicted relationship, one way or the other.
Chapter 9
1. (a) In an experiment with two independent
samples. (b) In an experiment with two related
samples.
3. Either (1) changing the conditions does not produce the predicted relationship in nature, but by
chance, sampling error produced unrepresentative sample data that produce different means,
making it look like the relationship occurs or
(2) the data accurately represent that changing
the conditions produces the predicted relationship that does occur in nature.
5. Using matched pairs, where each participant in
one condition is paired with a participant in the
other condition, or using repeated measures,
where the same sample of participants is tested
under all conditions.
7. (a) It is a distribution showing all differences
between two means that occur when two samples
are drawn from the one population of raw scores
described by H0. (b) It is the standard deviation
of the sampling distribution of differences
between means from independent samples.
(c) That the difference between our means is
unlikely to occur if we are representing the
population described by H0, where there is not
the predicted relationship.
9. (a) It is a distribution showing all values of D̄ that occur when samples represent a population of difference scores where μD = 0 and where the predicted relationship does not exist. (b) It is the standard deviation of the sampling distribution of D̄. (c) That our mean difference (D̄) is unlikely to occur if we are representing the population described by H0, where there is not the predicted relationship.
11. (a) To interpret the results in terms of the variables and underlying behaviors. (b) Because it
indicates the size of the impact the independent
variable has on the dependent variable and the
behavior it reflects.
13. (a) H0: μ1 − μ2 = 0; Ha: μ1 − μ2 ≠ 0.
(b) s²pool = 23.695; sX̄1−X̄2 = 1.78; tobt = (43 − 39)/1.78 = +2.25.
(c) With df = (15 − 1) + (15 − 1) = 28, tcrit = ±2.048.
(d) The results are significant: In the population, Once (with μ around 43) leads to more productivity than Often (with μ around 39).
(e) d = (43 − 39)/√23.695 = .82; r²pb = (2.25)²/[(2.25)² + 28] = .15. This is a moderate to large effect.
(f) Label the X axis as e-mail checked; label the Y axis as mean productivity. Plot the data point for Once at Y = 43; plot the data point for Often at Y = 39. Create a bar graph: Frequency of checking is measured as an ordinal variable.
15. (a) H0: μD = 0; Ha: μD ≠ 0.
(b) tobt = (2.63 − 0)/.75 = +3.51.
(c) With df = 7, tcrit = ±2.365, so people exposed to much sunshine exhibit a significantly higher well-being score than when exposed to less sunshine.
(d) The X̄ of 15.5; the X̄ of 18.13.
(e) r²pb = (3.51)²/[(3.51)² + 7] = .64; 64% of the changes are due to changing sunlight.
(f) By accounting for 64% of the variance, these results are very important.
17. (a) H0: μD ≤ 0; Ha: μD > 0.
(b) D̄ = +1.2, s²D = 1.289, sD̄ = .359; tobt = (1.2 − 0)/.359 = +3.34.
(c) With df = 9, tcrit = +1.833.
(d) The results are significant. In the population, children exhibit more aggressive acts after watching the show (with μ about 3.9) than they do before watching the show (with μ about 2.7).
(e) d = 1.2/√1.289 = 1.06; a relatively large difference.
19. You cannot test the same people first when they’re
males and then again when they’re females.
21. (a) Two-tailed.
(b) H0: μ1 − μ2 = 0; Ha: μ1 − μ2 ≠ 0.
(c) X̄1 = 11.5, s²1 = 4.72; X̄2 = 14.1, s²2 = 5.86; s²pool = 5.29, sX̄1−X̄2 = 1.03; tobt = (11.5 − 14.1)/1.03 = −2.52. With df = 18, tcrit = ±2.101, so tobt is significant.
(d) Police who've taken this course are more successful at solving disputes than police who have not taken it. The μ for the police with the course is around 14.1, and the μ for the police without the course is around 11.5.
(e) d = 1.13; r²pb = .26; taking the course is important for effectively resolving disputes.
23. (a) Independent-samples design; independent-samples t-test; a significant result, so Type I error (with p < .01); changing from male to female increased scores from around 5.4 to 9.3, but this was an inconsistent change because it accounts for only 8% of the variance in scores.
(b) Repeated-measures design; related-samples t-test; a significant result (with p < .05), so Type I error; dieting decreased participants' weights and accounts for 26% of the variance in scores, which is a consistent effect.
25. (a) When the dependent variable is measured using normally distributed interval or ratio scores that have homogeneity of variance. (b) z-test, used in a one-sample experiment when the σX of the underlying raw score population is known; one-sample t-test, used in a one-sample experiment when the σX of the underlying raw score population is not known; independent-samples t-test, used when two independent samples are tested under two conditions; related-samples t-test, used when participants are tested under two conditions, with either matched pairs of different participants or repeated measures of the same participants.
Chapter 10
1. (a) In experiments we manipulate one variable
and measure participants on another variable;
in correlational studies we measure participants
on two variables. (b) In experiments we compute
the mean of the dependent (Y) scores for each
condition of the independent variable (X); in
correlational studies the correlation coefficient
simultaneously examines the relationship formed
by all X-Y pairs.
3. (a) It is a graph of the individual data points
formed by a sample of X-Y pairs. (b) It is the
summary straight line drawn through the center
of the scatterplot.
5. (a) When you want to summarize the relationship between two normally distributed interval
or ratio variables. (b) The type (direction) of the
relationship and its strength.
7. Either (1) the r accurately reflects the predicted
relationship that does occur in nature or (2) the
predicted relationship does not occur in nature, but
by chance, we obtained misleading sample data
that produces the r so that it looks like the relationship occurs.
9. Create the two- or the one-tailed H0 and Ha. Compute robt. Set up the sampling distribution and, using df = N − 2, find rcrit. If robt is larger than rcrit, the coefficient is significantly different from zero, so estimate the population coefficient (ρ) as being around the value of robt. Determine the proportion of variance accounted for using r² and perform linear regression procedures.
11. (a) It describes the proportion of all of the differences in Y scores that are associated with changing X scores. (b) By computing r². (c) The larger the r², the more accurately we can predict scores (and the underlying behavior), so the relationship is scientifically more useful and important.
13. He is incorrect, inferring that this relationship
shows that an increased population causes fewer
bears.
15. (a) No; she should square each r, giving r² = (.60)² = .36 and r² = (.30)² = .09. The relationship with hours studied is four times more consistent than that with classroom size. (b) Study time, because it accounts for 36% of the differences in test scores; classroom size accounts for only 9%.
17. (a) First compute r. ΣX = 49, ΣX² = 371, (ΣX)² = 2401, ΣY = 31, ΣY² = 171, (ΣY)² = 961, ΣXY = 188, and N = 7, so r = (1316 − 1519)/√((196)(236)) = −.94. (b) This is a very strong relationship, with close to one Y score paired with each X. (c) H0: ρ = 0; Ha: ρ ≠ 0. (d) With df = 5, the two-tailed rcrit = ±.754. (e) The coefficient is significant, so we expect that in the population ρ is around −.94. (f) r(5) = −.94, p < .05. (g) r² = (−.94)² = .88, so 88% of the differences in participants' satisfaction scores are linked to their error scores.
19. (a) ΣX = 45, ΣX² = 259, ΣY = 89, ΣY² = 887, ΣXY = 460, N = 10; r = (4600 − 4005)/√((565)(949)) = +.81. (b) Very strong: Close to one value of Y tends to be associated with one value of X. (c) H0: ρ ≤ 0; Ha: ρ > 0. (d) With df = 8, the one-tailed rcrit = +.549. (e) r(8) = +.81, p < .05: This r is significant, so we expect ρ is around +.81. (f) r² = .66, so predictions are 66% more accurate.
21. (a) With N = 78 there is no rcrit for df = 76. The nearest bracketing values are ±.232 and ±.217 for a df of 70 and 80, respectively. The robt of .38 is well beyond these critical values, so it is significant. (b) r(76) = +.38, p < .05. (c) Compute r² and the linear regression equation.
23. Implicitly we are asking, “For a given intelligence
score, what creativity score occurs?”, so intelligence is X and creativity is Y.
25. (a) B. (b) B. (c) 4. (d) 16. (e) 4.
Chapter 11
1. (a) Analysis of variance. (b) A study that contains
one independent variable. (c) An independent
variable. (d) A condition of the independent variable. (e) Another name for a level. (f) All samples
are independent. (g) All samples are related, usually through a repeated-measures design. (h) The
number of levels in a factor.
3. Either (1) the independent variable influences
the behavior in nature, producing the different
means in the different conditions so that there
is a relationship or (2) the independent variable
does not influence the behavior in nature,
but by chance we obtain sample data that
produce different means, making it appear the
relationship exists.
5. (a) It is the probability of making a Type I error
after comparing all possible pairs of means in
an experiment. (b) Multiple t-tests result in an
experiment-wise error rate that is larger than our
alpha. (c) Performing ANOVA and then post hoc
tests keeps the experiment-wise error rate equal
to alpha.
7. Using either a between- or a within-subjects ANOVA, compute Fobt and compare it to Fcrit. If Fobt is significant, perform Tukey's HSD test if all ns are equal. Describe the effect size by computing η². Graph the results and/or compute confidence intervals for each condition. Interpret the results "psychologically."
9. The H0 is that the independent variable does not have an influence on behaviors or scores, so the means from the levels "really" represent the same population μ. The Ha is that the independent variable has an influence on behaviors and scores, so the means from two or more levels represent different population μs.
11. (a) Because the treatment does not work, one population is present and then MSbn should equal MSwn, so the F-ratio is 1. (b) Because the treatment produces differences in scores among the conditions, producing an MSbn that is larger than MSwn, so the F-ratio is greater than 1. (c) It indicates that two or more of the level means probably represent different values of μ, so the data represent a relationship in the population.
13. (a) It shows all values of F that occur by chance
when H0 is true (only one population is represented). (b) That the Fobt hardly ever occurs when
H0 is true. (c) It indicates that our conditions are
unlikely to represent only one population.
15. (a) The MSbn is less than the MSwn, and H0 is assumed to be true. (b) He made an error: Fobt cannot be a negative number.
17. (a) Several independent samples of participants
were created based on their particular salaries.
Then their self-esteem was measured. (b) The
salary factor produced a significant Fobt,
indicating “believable” differences among the
mean self-esteem scores for two or more conditions. (c) Perform a post hoc test to identify
which salary levels differ.
19. (a)
    Source    Sum of Squares   df   Mean Square   F
    Between   147.32           2    73.66         5.12
    Within    862.99           60   14.38
    Total     1010.31          62

(b) H0: μ1 = μ2 = μ3; Ha: Not all μs are equal.
(c) For dfbn = 2 and dfwn = 60, Fcrit = 3.15.
(d) The Fobt is significant: F(2, 60) = 5.12, p < .05.
(e) HSD = (3.40)√(19.18/21) = 3.25. All three means differ from each other by more than 3.25, so all differ significantly.
(f) We expect that changing the levels of the factor produces a change in scores from a μ around 45.3 to a μ around 16.9 to a μ around 8.2.
(g) η² = 147.32/1010.31 = .15. The factor determines 15% of the differences in scores, so it does not have a very large or important influence.
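The entries in a summary table like the one in 19(a) are linked by MS = SS/df and F = MSbn/MSwn, with Fcrit coming from the F distribution. A sketch (scipy assumed available; an illustrative aside):

    from scipy.stats import f as f_dist

    ss_bn, ss_wn = 147.32, 862.99
    df_bn, df_wn = 2, 60
    ms_bn = ss_bn / df_bn                     # 73.66
    ms_wn = ss_wn / df_wn                     # 14.38
    F_obt = ms_bn / ms_wn                     # 5.12
    F_crit = f_dist.ppf(0.95, df_bn, df_wn)   # about 3.15 at alpha = .05
    eta2 = ss_bn / (ss_bn + ss_wn)            # eta squared = .15
    print(F_obt, F_crit, eta2)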
21. (a) H0: μ1 = μ2 = μ3 = μ4; Ha: Not all μs are equal.
(b)
    Source    Sum of Squares   df   Mean Square   F
    Between   47.69            3    15.897        9.19
    Within    20.75            12   1.729
    Total     68.44            15

(c) For dfbn = 3 and dfwn = 12, Fcrit = 3.49.
(d) The Fobt is significant: F(3, 12) = 9.19, p < .05.
(e) X̄1 = 2.0, X̄2 = 3.0, X̄3 = 5.75, X̄4 = 6.0; HSD = (4.20)√(1.73/4) = 2.76; significant differences are between negligible and moderate, negligible and severe, and minimal and severe.
(f) As stress levels increased, illness rates significantly increased, although only differences between nonadjacent stress conditions were significant.
(g) η² = 47.69/68.44 = .70; increasing the stress level accounted for 70% of the variance in illness rates.
23. (a) Yes. (b) A dfwn = 51 is not in the F-table, but the bracketing dfs of 50 and 55 have critical values of 2.56 and 2.54, respectively. The Fobt is well beyond the Fcrit that would be at dfwn = 51, so it is significant.
25. (a) One-way between-subjects ANOVA. (b) Independent-samples t-test or between-subjects ANOVA. (c) One-way within-subjects (repeated-measures) ANOVA. (d) Pearson r and linear regression. (e) With these matched pairs, the related-samples t-test or the within-subjects ANOVA.
Chapter 12
1. (a) When the experiment simultaneously examines two independent variables. (b) The effect of
the interaction of the two variables.
3. (a) All levels of one factor are combined with all
levels of the other factor. (b) The combination
of a level of factor A with a level of factor B.
(c) Collapsing a factor is averaging together all
scores from all levels of the factor.
5. (a) The overall effect on the dependent scores of
changing the levels of that factor. (b) By averaging together all scores from all levels of factor B
so that we have only the mean for each level of A.
(c) That at least two of the means are likely to
represent different ms.
7. (a) The FA for the main effect of factor A, FB
for the main effect of factor B, and FA×B for the
interaction. (b) Post hoc comparisons must be
performed. (c) To determine which specific levels
or cells differ significantly.
9. (a) It forms a matrix with four columns for the levels of A and three rows for the levels of B, with n = 10 per cell. (b) 4 × 3. (c) 30. (d) 40. (e) 10.
11. (a) Use k when finding qk for a main effect; use the adjusted k when finding qk for the interaction. (b) Find the difference among each pair of means for a main effect; find only unconfounded differences among the cell means in the interaction.
13. (a) Yes: Because changing the levels of A produced one relationship for B1 (decreasing scores) and a different relationship for B2 (increasing scores). (b) For A1, X̄ = 9.5; for A2, X̄ = 10.5; for A3, X̄ = 12.5. Changing A increases scores from around 9.5 to around 10.5 to around 12.5. (c) For B1, X̄ = 8.33; for B2, X̄ = 13.33. Changing B increases scores from around 8.33 to around 13.33.
(d) [Line graph: mean scores on the Y axis (0 to 20) plotted for levels A1, A2, and A3 of factor A on the X axis, with a separate line for B1 and for B2.]
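The main effect means in 13(b) and (c) come from collapsing (averaging) across the other factor. With the cell means in a 2 × 3 array, numpy's axis argument does the collapsing; the cell means below are hypothetical, chosen only to reproduce the answer's values, since the original problem's cells are not shown here:

    import numpy as np

    # Rows are B1 and B2; columns are A1, A2, A3 (hypothetical cell means).
    cells = np.array([[12.0,  8.0,  5.0],    # B1: scores decrease across A
                      [ 7.0, 13.0, 20.0]])   # B2: scores increase across A

    print(cells.mean(axis=0))   # collapse over B -> main effect of A: [9.5, 10.5, 12.5]
    print(cells.mean(axis=1))   # collapse over A -> main effect of B: [8.33, 13.33]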
15. (a) The way the scores change with increasing
reward depends on the level of practice: For
low practice, scores increase and then decrease;
for medium practice, scores are level and then
increase; for high practice, scores are unchanged.
(b) As reward increases, performance does not
increase under every level of practice, contradicting the main effect means. (c) Perform the HSD
test on the cell means. (d) Subtract each mean
from every other mean within each column
and within each row. (e) For this 3 × 3 design, adjusted k = 7 and dfwn = 60, so qk = 4.31.
17. Study 1: For A, means are 7 and 9; for B, means
are 3 and 13. Apparently there are effects for A
and B but not for A × B. Study 2: For A, means
are 7.5 and 7.5; for B, means are 7.5 and 7.5.
There is no effect for A or B but there is an effect
for A × B. Study 3: For A, means are 8 and 8; for
B, means are 11 and 5. There is no effect for A,
but there are effects for B and A × B.
19. (a) No difference occurs between the means for
low and high users. (b) The means are different
for two or more of the income levels. (c) The difference between the means for high or low users
depends on income level. (Or, differences among
income levels depend on whether participants are
in the high or the low usage group.)
21. The main effect for math versus logic problems,
the main effect for difficulty level, and the interaction of math or logic and difficulty level are all
significant.
23. Perform the ANOVA to compute FA for testing
the differences among main effect means of
factor A, FB for testing the differences among
main effect means of factor B, and FA×B for
testing the cell means for the interaction.
Perform the Tukey HSD test for each significant
F to determine which means differ significantly.
Compute η² to determine the size of each
significant effect.
25. (a) Related-samples t-test or one-way within-subjects ANOVA. (b) Two-way between-subjects ANOVA. (c) Independent-samples t-test or one-way between-subjects ANOVA. (d) One-way within-subjects ANOVA. (e) One-way between-subjects ANOVA. (f) Pearson correlation coefficient.
Chapter 13
1. Both types of procedures test whether, due to
sampling error, the data poorly represent the
absence of the predicted relationship in the
population.
3. (a) They form non-normal distributions, or they
do not have homogeneous variance. (b) Transform them to ranked scores.
5. They count the frequency of participants falling
into each category of one variable.
7. (a) In the ANOVA, the researcher measures the
amount of a behavior or attribute; in the chi
square, the researcher counts the number of participants showing or not showing the behavior or
attribute. (b) They both test whether the groups
differ significantly, representing a relationship in
nature.
9. (a) It becomes larger as the differences between each fo and fe become larger. (b) Because there is a larger difference between the frequencies we have obtained and the frequencies we should have obtained if H0 was true. (c) It shows the frequency of all values of χ²obt that occur when H0 is true. (d) Because our χ²obt is so unlikely to occur if H0 was true that we reject that it is true.
11. If testing no difference between the groups, then each fe = N/k. If testing the goodness of fit of some other model, each fe is based on the percentage given in the model.
13. (a) The one-way χ². (b) H0: In the population, the frequencies of women's preferences for much or slightly taller men are equal; Ha: The frequencies of women's preferences for much or slightly taller men are not equal in the population. (c) Compute each fe: With N = 89, fe = 89/2 = 44.5 for each group. (d) χ²obt = [(34 − 44.5)²/44.5] + [(55 − 44.5)²/44.5] = 4.96. With df = 1, χ²crit = 3.84, so the results are significant. Conclude: In the population, about 55/89, or 62%, of women prefer slightly taller men, and about 38% prefer much taller men, p < .05. (e) Label the Y axis with f. Label the X axis with each preference, and for each, draw a bar to the height of its f.
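Parts 13(c) and (d) are the goodness-of-fit computation, which scipy reproduces directly (a sketch, assuming scipy is available):

    from scipy.stats import chisquare

    f_o = [34, 55]                       # observed: much taller, slightly taller
    f_e = [44.5, 44.5]                   # expected under H0: N/k = 89/2 each
    chi2, p = chisquare(f_o, f_exp=f_e)
    print(chi2, p)                       # about 4.96; p < .05 (chi2_crit = 3.84 at df = 1)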
15. (a) The two-way χ². (b) H0: In the population, the frequency of voting or not is independent of the frequency of satisfaction or not; Ha: The frequency of voting or not is dependent on the frequency of satisfaction or not, and vice versa. (c) Compute each fe. For voters: satisfied fe = (83)(81)/168 = 40.02, dissatisfied fe = (83)(87)/168 = 42.98; for nonvoters: satisfied fe = (81)(85)/168 = 40.98, dissatisfied fe = (85)(87)/168 = 44.02. (d) χ²obt = [(48 − 40.02)²/40.02] + [(35 − 42.98)²/42.98] + [(33 − 40.98)²/40.98] + [(52 − 44.02)²/44.02] = 6.07. (e) For df = 1, χ²crit = 3.84. The results are significant, so in the population, satisfaction with election results is correlated with—depends on—voting. (f) For the phi coefficient: φ = √(6.07/168) = .19, so the relationship is somewhat consistent.
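The test of independence in answer 15 is what scipy.stats.chi2_contingency computes; passing correction=False matches the hand computation above (a sketch, as an aside):

    import math
    from scipy.stats import chi2_contingency

    # Rows: voters, nonvoters; columns: satisfied, dissatisfied.
    table = [[48, 35],
             [33, 52]]
    chi2, p, df, f_e = chi2_contingency(table, correction=False)
    phi = math.sqrt(chi2 / 168)          # N = 168
    print(chi2, p, df, phi)              # about 6.07, p < .05, df = 1, phi = .19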
17. (a) The one-way χ². (b) H0: The elderly population is 30% Republican, 55% Democrat, and 15% Other; Ha: The elderly population is not distributed this way. (c) Compute each fe: For Republicans, fe = (.30)(100) = 30; for Democrats, fe = (.55)(100) = 55; for Other, fe = (.15)(100) = 15. (d) χ²obt = .53 + 2.20 + 3.27 = 6.00. (e) For df = 2, χ²crit = 5.99, so the results are significant: Party affiliation among senior citizens is different from affiliations in the general population. As in our samples, we expect 26% Republican, 66% Democrat, and 8% Other.
19. (a) The frequency of students disliking each professor must be included. (b) She should perform the two-way χ² to test whether liking or disliking one professor is correlated with liking or disliking the other professor.
21. (a) Kruskal–Wallis test. (b) Mann–Whitney test.
(c) Wilcoxon test. (d) Friedman test.
23. (a) Wilcoxon test. (b) Kruskal–Wallis test.
(c) One-way chi square. (d) Transform the scores
to ranks, and then perform the Friedman test.
(e) Mann–Whitney test.
25. For the independent variable: whether independent or related samples were tested, the number of
independent variables, and if only one variable, the
number of conditions. For the dependent variable:
whether the scores meet the requirements of parametric or a nonparametric procedure.
INDEX
A
alpha (α), 113, 128
alternative hypothesis (Ha),
109–110
creating, 111
defined, 110, 111
in independent-sample t-tests,
147–148
interaction effects and, 208
main effects and, 205–206
mean square and, 191
in one-tailed tests, 148,
154–156
in one-way ANOVA,
187–188
in one-way chi square,
221–223
Pearson r and, 175–177
population mean in, 110,
143, 188
proving, 115–117
rejecting, 115–117
in related-sample t-tests,
151–154
in t-distribution, 133–134
in two-way chi square, 226
in within-subjects
ANOVA, 199
American Psychological
Association (APA)
symbols, 65
analysis of variance (ANOVA). See one-way analysis of variance (ANOVA); two-way analysis of variance (ANOVA)
area under the curve, 30–31. See
also proportion of the area
under the curve
association. See also
correlational study
degree of, 167
relationship and, 7
assumptions
in independent-samples t-test,
142, 148
in one-sample t-test, 128
in one-way ANOVA, 197
in one-way chi square,
197, 221
in parametric statistics,
108, 112
in related samples t-test, 149,
150, 155
in two-way chi square, 224
in z-tests, 113, 117
average. See mean
B
bar graph, 23–24, 47–48, 229
behavioral research. See research
bell-shaped curve, 26. See
normal curve
between-subjects ANOVA,
185, 186
two-way, 203
between-subjects factor, 185, 186
biased estimators, 60–61
bimodal distribution, 27–28, 39
C
cause and independent
variables, 12
cell, 204
cell mean, 207
central limit theorem, 80–81
central tendency, 37–38. See also
measure of central tendency
chance, 89–92. See also
probability (p)
chi square distribution, 222–223
chi square procedures (χ2)
defined, 220
nonparametric statistics
and (See nonparametric
statistics)
one-way (See one-way
chi square)
reporting, 229
sampling distribution,
222–223
SPSS to perform, 231
two-way (See two-way chi
square)
Cohen’s d, 156–157
collapsing, 205
condition
defined, 12
of independent variables, 12
confidence interval for μ, 137
computing, 136–137
defined, 135–137
to estimate population mean,
135–137
in independent-samples t-test,
147–148
in related-samples t-test, 154
reporting, 138
in Tukey HSD test, 197
confounded comparison, 212
consistency
correlation coefficients and,
167–168, 169
vs. measures of variability,
53–55
in proportion of variance
accounted for,
157–158, 180
of relationships, 8–9
contingency coefficient (C), 228
continuous variables, 16–17
defined, 16
vs. discrete variables, 16–17
correlation, 163–164
causality and, 164
defined, 163
negative, 177
perfect, 168–169
positive, 177
type of relationship described
by, 167
zero, 170, 176
correlational design, 163
correlational study, 162–183
characteristics of, 164–165
defined, 14
vs. experiment, 164
linear relationships and,
165–166
mean score and, 45–46
nonlinear relationships and,
166–167
scatterplots and, 164–165
strength of relationship and,
167–171
type of relationship and,
165–167
correlation coefficient. See also
correlational study; Pearson
correlation coefficient (r)
computing, 163
consistency and,
167–168, 169
contingency, 228
defined, 163
linear, 167
measures of variability and,
168–169
phi, 227–228
vs. regression line, 166
Spearman, 230
squared point-biserial, 158
criterion
defined, 99
selecting, 101
symbol for, 113
criterion variable, 178
critical value
in computing confidence
interval, 136–137
defined, 99
determining, 114
identifying, 99–100
interpreting t-test and, 133
in one-tailed test, 119–120,
133–134
region of rejection and,
99–100, 101
representative samples,
99–101
t-distribution and,
130–132
t-table and, 132
z compared to, 99–100,
114–115
cumulative frequency, 32–33
defined, 33
percentile and, 32–33
curvilinear relationships. See
nonlinear relationships
D
data, statistics and, 3
data point, 24–25
degree of association. See
strength of relationship
degrees of freedom (df ),
130–132
computing, 193
defined, 132
in independent-samples
t-test, 147
in one-samples t-test, 132
in one-way ANOVA, 189,
191–195
in one-way chi square, 222
in Pearson r, 176
in related-samples t-test, 153
t-distribution and, 130–132
in two-way chi square, 227
dependent samples. See
related samples
dependent variables, 12–13, 46
descriptive statistics, 10
design
correlational, 163
defined, 11
factorial, 204
matched-samples, 149
pretest/posttest, 150
repeated-measures, 149
two-way mixed-design
ANOVA, 203
deviation
defined, 44
sum of, around the mean,
44–45
difference scores, 150–153
discrete scales, 24
discrete variables, 16–17
vs. continuous variables, 16–17
defined, 16
dispersion, 53. See also measures
of variability
E
effect size, 156–158
Cohen’s d and, 156–157
defined, 156
measure of, 156
one-way ANOVA and,
198–199
using proportion of variance
accounted for, 157–158
empirical probability
distribution, 90
error. See also standard error
sampling, 94–96
sum of the deviations around
the mean and, 45
Type I, 121–122
Type II, 122–123
estimated population standard
deviation, 60–61
computing, 64–65
defined, 61
defining formulas for unbiased
estimators of, 60–61
estimated population variance,
60–61
computing, 64–65
defined, 61
defining formulas for unbiased
estimators of, 60–61
estimated standard error of the
mean, 128–130
estimating population μ by
computing confidence
interval, 135–137
estimators, biased and unbiased,
60–61
eta squared (η 2), 198–199, 211
expected frequency (fe)
defined, 221
in one-way chi square, 221
in two-way chi square,
226–227
experimental hypotheses
creating, 108–109
defined, 108
experiments
components of, 13
vs. correlational study, 164
correlational study and, 14
defined, 12
dependent variables and,
12–13
drawing conclusions from, 13–14
independent variables and, 12
experiment-wise error rate, 287
F
factorial design, 204
factors. See also independent
variables
between-subjects, 185, 186
collapsing, 205
in computing degrees of
freedom, 193
defined, 185, 186
eta squared and, 198–199
levels in, 185, 186
main effect of, 204–206
mean square between groups
and, 189
post hoc comparisons and, 188
Tukey HSD test and, 196
within-subjects, 185, 186
F distribution, 194–196
formulas. See specific procedures
F-ratio, 190–191
frequency
cumulative, 32–33
defined, 21
expected, 221, 226–227
polygons, 24–25
relative, 29–32
symbol for, 20
frequency distributions, 20–35
defined, 21
graphing, 23–25
labeling, 28
simple, 21, 22, 28
symbols and terminology
used in, 21
in table, 22–23
types of, 25–28
frequency polygons, 24–25
Friedman test, 230
F statistic, 188
F-table, 195, 197
G
gambler’s fallacy, 90
generalize, 5
goodness of fit test, 223–224.
See also one-way chi square
defined, 223
graphs
bar graphs, 23–24, 47–48, 229
data points on, 24–25
of frequency distributions,
23–25
frequency polygons, 24–25
of grouped distributions,
25–26
histograms, 24
of interaction effect, 211–212
line graph, 47
in one-way ANOVA, 198
regression line, 178, 179
scatterplots, 164–165
in two-way ANOVA, 210
X axis, 23
Y axis, 23
grouped distributions
defined, 25
graphs of, 25–26
sampling distribution for,
143–144
standard error of the difference
and, 145–146
statistical hypothesis for, 143
summary of, 148–149
independent variables. See also
factors
cause and, 12
conditions of, 12
defined, 12
treatment effect and, 185
inferential statistics, 10–11,
107–112
experimental hypotheses and,
108–109
one-sample experiment
and, 109
statistical hypotheses and,
109–112
defined, 107
population mean and, 107
in research, 94–96, 107–108
sampling error and, 95–96
interaction effect, two-way,
207–209, 211–212
interval estimation, 135
interval scale, 16
inverted U-shaped relationship,
166, 167
K
H
Kruskal–Wallis test, 230
histogram, 24
homogeneity of variance, 142
Honestly Significant Difference
test. See HSD test
HSD test, 196–197
hypotheses in research, 5
hypothesis testing, 106–139.
See also specific tests
alternative hypothesis and,
109–110
errors, 121–123
experimental hypotheses and,
108–109
inferential statistics and,
107–112
null hypothesis and, 110,
111–112
one-tailed test and, 118–120
results, reporting, 120–121
statistical, 109–112
statistical hypotheses and,
109–112
two-tailed test and,
113–114
L
I
independence, test of,
224–225
independent samples, 142
independent-samples t-test,
142–149
assumptions in, 142, 148
defined, 142
homogeneity of variance
and, 142
interpreting, 147–148
one-tailed tests and, 148
performing, 144–149
pooled variance and, 145
labeling frequency distributions,
28
laws of nature, 5
levels (k), 185, 186
linear correlation coefficient, 167
linear regression, 178–180
criterion variable in, 178
defined, 178
predicted Y score in,
179–180
predictions and, 178–180
predictor variable in, 178
procedure, 178–179
linear regression equation, 179
linear regression line, 166. See
also regression line
linear relationships, 165–166
line graph, 47
M
main effect, 204–206
defined, 204
examining, 210–211
of factor A, 204–205
of factor B, 205–206
main effect mean, 205
Mann–Whitney test, 230
margin of error, 135
matched-samples design, 149
mean, 41–48. See also
sample mean
applied to research, 45–48
as balance point of
distribution, 41–42
cell, 207
defined, 41
deviations around, 44
estimated standard error of,
128–130
inferential procedures and, 43
location of, on normal
distribution, 42
main effect, 205
vs. mode and median, 42–44
reporting, 65
sampling distribution of,
79–81
sampling distribution of
differences between,
143–144
standard error of, 81–82
sum of deviations around,
44–45
mean difference
defined, 151
sampling distribution of, 152
standard error of, 153
mean squares, 189–191
comparing, 190–191
computing, 193
formula, 189
F-ratio and, 190–191
between groups, 189
within groups, 189
measurement scales, 15–16
interval, 16
nominal, 15–16
ordinal, 16
ratio, 16
measure of central tendency,
37–49. See also mean;
median; mode
defined, 38
mean, 41–45
median, 40–41
mode, 39–40
measure of effect size, 156
measures of variability, 52–67
vs. consistency, 53–55
correlation coefficient and,
168–169
defined, 53
importance of, 53
normal distributions and, 54
population variance and
standard deviation, 59–61
range, 55
reporting, 65
sample standard deviation
and, 57–58
sample variance and, 55–57
in SPSS, 66
standard deviation and, 55,
57–59
median, 40–41
defined, 41
vs. mode and mean, 42–44
mixed-design ANOVA,
two-way, 203
mode, 39–40
bimodal distribution and, 39
defined, 39
limitations of, 39–40
vs. median and mean, 42–44
unimodal distribution and, 39
N
negative linear relationship, 166
negatively skewed
distributions, 27
95% confidence interval, 137
nominal scale, 15–16, 39,
48, 219
nonlinear relationships, 166–167
non-normal distributions, 27, 28
nonparametric statistics
chi square procedure and, 220
defined, 108, 219
for ordinal scores, 229–230
vs. parametric statistics, 219
for ranked scores, 230
SPSS to perform, 231
nonsignificant
defined, 116–117
results, 115, 116–117
normal curve. See also standard
normal curve
defined, 26
to find relative frequency,
30–32
importance of, 26–27
proportion of the area under,
31–32
tail of the distribution and, 26
normal distribution, 26–27. See
also normal curve
defined, 26
location of mean on, 42
measures of variability and, 54
null hypothesis (H0), 110, 111–112
creating, 111
defined, 110, 111
F distribution and, 194–195
in independent-sample t-tests, 143–148
interaction effects and, 208
main effects and, 205–206
mean square and, 189–191
in nonparametric procedures, 230
in one-sample t-tests, 127–128
in one-tailed tests, 118–120, 133–134, 148, 154–156
in one-way ANOVA, 187–188
in one-way chi square, 221–223
Pearson r and, 175–177
population mean in, 188, 221
power and, 123
proving, 115–117
rejecting, 115–117
in related-sample t-tests,
151–154
in t-distribution, 130–131,
133–134
in two-way ANOVA, 209
in two-way chi square,
224, 226
in Type I errors, 121–122
in Type II errors, 122–123
in within-subjects
ANOVA, 199
number of scores (N), 21,
24, 29
number of scores (n), 144
O
observed frequency (fo), 220–221
one-sample experiment, 109
one-sample t-test, 126–139
assumptions in, 128
defined, 127
interpreting, 133–135
one-tailed tests and,
133–134
performing, 128–132
setting up, 127–128
statistical hypotheses and, 127
summary of, 134–135
one-tailed test, 101, 118–120
defined, 109
independent-samples t-test
and, 148
in one-sample t-test, 133–134
of Pearson r, 177
related-samples t-test and,
154–155
scores decreasing, 119–120
scores increasing, 118–119
t-distribution and df and,
130–132
t-table and, 132
one-way analysis of variance
(ANOVA), 185–201. See also
factors
assumptions in, 197
components of, 189–191
defined, 185, 186
degrees of freedom in, 189,
191, 193, 194, 195
diagramming, 186
effect size and, 198–199
eta squared and, 198–199
experiment-wise error rate
and, 187
F statistic and, 188
key terms, 185, 186
performing, 191–196
post hoc comparisons
and, 188
reasons for conducting, 186
reporting, 198
SPSS to perform, 199
statistical hypotheses in,
187–188
summary of, 197–198
summary table of, 192
Tukey HSD test in,
196–197
within-subjects, 199
one-way chi square,
220–224
assumptions in, 197, 221
computing, 221–222
defined, 220
goodness of fit test and,
223–224
interpreting, 222–223
observed frequency in,
220–221
SPSS to perform, 231
ordinal scale (scores), 16–17, 24,
229–230
P
parameter
defined, 11
vs. statistic, 11
parametric statistics, 108
assumptions in, 108, 112
defined, 108
vs. nonparametric
statistics, 219
participants, 5–6
Pearson correlation coefficient
(r), 171–178
computing, 172–174
defined, 171
drawing conclusions about,
176–177
one-tailed test of, 177
reporting, 178
restricted range in, 171–172
sampling distribution of,
175–176
significance testing of, 174–177
summary of, 177–178
percentile
cumulative frequency and,
32–33
defined, 32
relative frequency and, 30
percent, 30
perfect correlation, 168–169
perfectly consistent
relationship, 8
phi coefficient (φ), 227–228
point estimation, 135
polygon, 24–25
pooled variance, 145
population mean (μ)
in alternative hypothesis, 110,
143, 188
in computing z-score, 71
confidence interval used to
estimate, 135–137
describing, 49
inferential statistics and, 107
interval estimation used to
estimate, 135
margin of error and, 135
in null hypothesis, 188, 221
in one-sample experiment, 109
point estimation used to
estimate, 135
population standard deviation
and, 59
population variance and, 59
vs. sample mean, 94, 96
populations
defined, 5
inferential statistics and, 10–11
samples and, 5–6
population standard deviation
defined, 59
estimated, 60–61
interpreting, 61
population mean and, 59
population variance
defined, 59
estimated, 60–61
interpreting, 61
population mean and, 59
positive linear relationship, 166
positively skewed
distributions, 27
post hoc comparisons. See also
Tukey HSD test
defined, 188
in one-way ANOVA, 188–189,
196
in one-way chi square, 223
in two-way ANOVA, 214
power, 123
predicted Y score (Y'), 179–180
predictions
deviations around the mean
and, 45
to interpret coefficient,
168–170
linear regression and, 178–180
one-tailed tests and, 133
predictor variable, 178
pretest/posttest design, 150
probability (p), 88–104
defined, 89
logic of, 91
probability distributions and,
90–91
random sampling and, 89,
94–96
representative samples and,
96–102
of sample means, 92–94, 96–97
sampling error and, 94–96
standard normal curve and,
obtaining from, 92–94
of Type I errors, 121–122
of Type II errors, 122
probability distributions, 90–91
proportion of the area under
the normal curve
defined, 31
finding, 31–32
standard deviation and, 58–59
z-distribution to determine,
75–79
proportion of variance
accounted for
computing, 180–181
defined, 180
effect size using, 157–158,
198, 211
Q
qualitative variables, 7
quantitative variables, 7
quasi-independent variables, 12
R
random sampling, 94–96
defined, 89
sampling error and, 94–96
range, 55
ranked scores, 230
ratio scale, 16
raw scores. See also z-scores
computing when z is known,
71–72
decimal places and, 37
defined, 21
means and, 41, 44
relative frequency and, 30
standard deviation and,
57, 58
in z-distribution, 72–75
region of rejection
critical value and, 99–100, 101
defined, 98
sampling distribution and,
98–99, 101–102
regression line
vs. correlation coefficient, 166
linear, 166
predicted Y score and,
179–180
scatterplot and, 169, 170,
175, 179
related samples, 149–150
related-samples t-test, 149–155
assumptions in, 149, 150, 155
defined, 149
interpreting, 153–154
logic of, 150–151
matched-samples design
and, 149
mean difference in, 151
one-tailed tests and,
154–155
performing, 152–155
repeated-measures design
and, 149
sampling distribution of mean
differences and, 152
standard error of the mean
difference and, 153
statistical hypotheses for,
151–152
summary of, 155
in two-sample experiment, 142
relationships. See also
correlational study
absence of, 9–10
association and, 7
consistency of, 8–9
defined, 7
strength of, 9, 167–171
types of, 165–167
weakness of, 9
relative frequency, 29–32
computing, 29–30
defined, 29
normal curve used to find,
30–32
of sample means, 83–84
standard normal curve and,
75–76
z-distribution to compute,
75–79
z-tables and, 76–78
relative standing, 69
defined, 69
z-distribution and, 73–74
repeated-measures design, 149
representative samples
critical value and, 99–101
defined, 94
likelihood of representativeness
and, 96–102
random sampling and, 94–96
sampling distribution and,
setting up, 98–99,
101–102
sampling error and, 94–96
vs. unrepresentative samples, 6
research. See also statistics
conducting, 4
confidence interval created in,
135–136
distributions in (See frequency
distributions)
inferential statistics in, 94–96,
107–108
linear correlation coefficient
in, 167
logic of, 5–7
mean in, 41, 44, 45–48, 74
normal curve in, 26–27
Pearson correlation coefficient
in, 171–174
populations and, 5–6
samples and, 5–6
sampling error in, 109
standard deviation in, 55, 76
t-test in (See one-sample t-test;
two-sample t-test)
variables in, 6–9
variance in, 55
research literature, statistics in
reporting ANOVA, 198
reporting χ2, 229
reporting means and
variability, 65
reporting Pearson r, 178
reporting significant/
nonsignificant results,
120–121
reporting t, 136
reporting two-sample
study, 156
restricted range, 171–172
rho (ρ), 175
rounding, rule for, 37
S
sample mean
computing z-score for,
82–83
formula for, 41
vs. population mean, 94, 96
probability of, 92–94, 96–97
relative frequency of, 83–84
z-score to describe, 79–84
samples. See also representative
samples
defined, 5
independent, 142
populations and, 5–6
random sampling and, 89
related, 149–150
sample standard deviation,
57–58
sample variance, 55–57
computing, 63–64
defined, 56
formula for, 56
sampling distribution
chi square, 222–223
of differences between means,
143–144
of F, 194–196
for independent-samples t-test,
143–144
of mean differences, 152
of means, 79–81
for one-tailed test, 101
of Pearson r, 175–176
probability of sample means
and, 92–94, 96–97
region of rejection and, 98–99,
101–102
representativeness and,
100–101
setting up, 98–100, 101–102
for two-tailed test, 99, 101
sampling error
defined, 95
inferential statistics and, 95–96
probability and, 94–96
scales of measurement. See
measurement scales
scatterplot, 164–165. See also
correlation coefficient
scores
characteristics of, 15–17
continuous vs. discrete
variables and, 16–17
deviation of, 46
frequency of (See frequency
distributions)
as locations on continuum, 38
measurement scales and, 15–16
number of, 21
one-tailed test for decreasing,
119–120
one-tailed test for increasing,
118–119
pairs of, 13, 164, 172
populations of vs. samples
of, 6
raw (See raw scores)
sigma (σ). See population standard deviation
sigma (Σ). See sum
significance testing
of Pearson r, 174–177
t-test used for, 127
significant, defined, 115
significant results, determining,
115–116
simple frequency distributions,
21, 22, 28
skewed distributions, 27
slope of the line, 179
Spearman correlation coefficient
(rs), 230
spread of scores. See measures
of variability; normal
distribution; standard
deviation
SPSS
chi square procedures and, 231
measures of central tendency
and, 40, 50–51
nonparametric procedures
and, 231
one-sample t-test and, 138
one-way ANOVA and, 199
Pearson r and, 181
percentiles and, 32–33
statistics and, 4–5
two-sample t-test and, 159
two-way ANOVA and, 215
two-way chi square and, 231
z-scores in, 85–86
squared point-biserial correlation
coefficient, 158
squared sum of X, 62–63
squared sum of Y, 172
standard deviation. See also
population standard
deviation
area under the curve, 58–59
computing, 63–64
formulas for, computing, 63–64
measures of variability and,
55, 57–59
summary of, 61–62
standard error
of the difference, 145–146
of the estimate, 180
of the mean, 81–82
of the mean difference, 153
standard normal curve
defined, 75
probability obtained from, 92
relative frequency of sample
means and, 83
t-distribution and, 130
z-distribution and, 75–77
z-table and, 83–84
standard scores. See z-scores
statistical hypotheses, 109–112
alternative hypothesis and,
109–110
creating, 109–112
defined, 109
for independent-samples
t-test, 143
logic of, 112
null hypothesis and, 110,
111–112
one-sample t-test and, 127
in one-way ANOVA, 187–188
parametric statistics in, 112
for related-samples t-test,
151–152
statistical procedures. See
statistics
statistics. See also research
defined, 11
descriptive, 10
inferential, 10–11
vs. parameter, 11
purpose of, 3–4
SPSS computer program
and, 4–5
studying, 4
strength of relationship, 9,
167–171
defined, 167
intermediate strength, 169–170
perfect correlation, 168–169
zero correlation, 170
sum
of cross products, 172
of the deviations around the
mean, 44–45
of squares (SS), 189, 192–193
of X, 37, 62
of Y, 172
symmetrical distribution. See
normal distribution
T
tables
frequency distributions in,
22–23
F-tables, 195, 197
summary table of one-way
ANOVA, 192
t-tables, 132
z-tables, 76–78, 83–84
tail of distribution, 26
t-distribution
defined, 130
degrees of freedom and,
130–132
test of independence, 224–225.
See also two-way chi square
defined, 224–225
theoretical probability
distribution, 91
treatment, 185, 186
treatment effect, 185, 186, 196
t-table, 132
t-tests. See one-sample t-test;
two-sample t-test
Tukey HSD test
in one-way ANOVA, 196–197
in two-way ANOVA, 212–213
two-sample t-test, 140–161
effect size and, 156–158
experiment, understanding,
141–142
two-sample t-test (Continued)
independent-samples t-test and,
142–149
related-samples t-test and, 142,
149–155
reporting, 156
two-tailed test, 99, 101
defined, 109
sampling distribution for,
113–114
two-way analysis of variance
(ANOVA), 203–217
cells in, 204
completing, 209–213
defined, 203
interpreting, 214
main effects and, 204–206
reasons for conducting,
203–204
SPSS to perform, 215
Tukey HSD test and, 212–213
two-way between-subjects, 203
two-way interaction effect and,
207–209
two-way mixed design, 203
two-way within-subjects, 203
two-way between-subjects
ANOVA, 203
two-way chi square, 224–229
assumptions in, 224
computing, 226–227
defined, 224, 225
logic of, 224–226
relationship in, 227–229
SPSS to perform, 231
test of independence and,
224–225
two-way interaction effect,
207–209
two-way mixed design
ANOVA, 203
factorial design in, 204
two-way within-subjects
ANOVA, 203
Type I errors, 121–122
defined, 121
experiment-wise error rate
and, 187
probability of, 121–122
Type II errors, 122–123
defined, 122
power and, 123
probability of, 122
type of relationship,
165–167
coefficient used to
describe, 167
linear, 165–166
nonlinear, 166–167
terminology to describe, 166
U
unbiased estimators, 60–61
unconfounded comparison, 212
under the curve, 30–31. See also
proportion of the area under
the curve
unimodal distribution, 39
unrepresentative samples, 6
U-shaped pattern, 166
V
variability. See measures of
variability
variables
continuous vs. discrete,
16–17
defined, 6
dependent, 12–13
independent, 12, 185
qualitative vs. quantitative, 7
quasi-independent, 12
relationships between,
7–10
in research, 6–9
understanding, 6–7
z-distribution to compare,
74–75
variance
estimated population variance,
60–61
population, 59–61
sample, 55–57
W
weakness of relationship, 9
Wilcoxon test, 230
within-subjects ANOVA, 199
two-way, 203
X
X axis, 23
X variable. See independent
variable; predictor variable
Y
Y axis, 23
Y intercept, 179
Y variable. See dependent
variable; criterion variable
Z
z-distribution, 72–79
area under the curve and,
75–79
characteristics of, 73
to compare different variables,
74–75
to compute relative frequency,
75–79
defined, 72
to interpret scores, 72–74
raw scores in, 72–75
zero correlation, 170, 176
z-scores, 69–72, 79–84
computing, 114
computing for sample mean,
82–83
computing in sample or
population, 70–71
computing raw score when z is
known, 71–72
critical value compared to,
99–100, 114–115
critical value of, 99–100,
114–115
defined, 70
to describe sample means,
79–84
to determine relative frequency
of raw scores, 75–79
raw scores transformed
into, 69
relative location as, 69–70
relative standing of, 69
SPSS and, 85–86
z-tables, 76–78, 83–84
z-test, 113–115. See also one-sample t-test
assumptions in, 113, 117
comparing z to critical value
and, 114–115
computing z and, 114
defined, 113
nonsignificant results,
interpreting, 115,
116–117
sampling distribution for
two-tailed test and,
113–114
significant results, interpreting,
115–116
summary of, 117–118
reviewcard
CHAPTER 1 INTRODUCTION TO STATISTICS
AND RESEARCH
CHAPTER SUMMARY
1-1 Learning about Statistics
• Statistical procedures are used to make sense out of data obtained in behavioral research: They are used to organize, summarize, and communicate data and to interpret what the data indicate.
1-2 The Logic of Research
• The population is the group of individuals to which a law of nature—and our conclusions—apply.
• The subset of the population that is actually measured in a study is the sample.
• The individuals measured in the sample are the participants.
• The scores and behaviors of the sample are used to infer (estimate) the scores and behaviors found in the population.
• A representative sample accurately describes the population. However, by chance, a sample may be unrepresentative.
• A variable is anything about a situation or behavior that can produce two or more different scores.
• Variables may be quantitative, and reflect an amount, or qualitative, and reflect a quality or category.
KEY TERMS: population, sample, participants, variable, quantitative, qualitative
1-3 Understanding Relationships
• A relationship occurs when, as the scores on one variable change, the scores on the other variable tend to change in a consistent fashion.
• The consistency in a relationship is also referred to as its strength.
• In a perfectly consistent relationship, only one value of Y is associated with one X, and a different Y is associated with a different X.
• In a weaker relationship, one batch of Y scores is associated with one X, and a different batch of Y scores is associated with a different X.
• When no relationship is present, virtually the same batch of Y scores is paired with every X.
KEY TERMS: relationship
1-4 Applying Descriptive and Inferential Statistics
• Descriptive statistics are used to describe sample data.
• Inferential statistics are used for deciding whether the sample data accurately represent the relationship found in the population.
• A statistic is a number that describes a characteristic of a sample of scores, symbolized using a letter from the English alphabet.
• A parameter is a number that describes a characteristic of a population of scores, symbolized using a letter from the Greek alphabet.
KEY TERMS: descriptive statistics, inferential statistics, statistic, parameter
1-5 Understanding Experiments and Correlational Studies
• A study’s design is the way in which the study is laid out.
• In an experiment, we manipulate the independent variable and then measure participants’ scores on the dependent variable. A specific amount or category of the independent variable that participants are tested under is called a condition.
• An experiment shows a relationship if participants’ dependent scores tend to consistently change as we change the conditions of the independent variable.
• In a correlational study, neither variable is actively manipulated. Scores on both variables are simply measured, and then the relationship between them is examined.
KEY TERMS: design, experiment, independent variable, dependent variable, condition, correlational study
CHAPTER SUMMARY
1-6 The Characteristics of Scores
The four scales of measurement are:
• A nominal scale, in which numbers name or identify a quality or characteristic.
• An ordinal scale, in which numbers indicate rank order.
• An interval scale, in which numbers measure a specific amount, but with no true zero, so negative numbers are allowed.
• A ratio scale, in which numbers measure a specific amount and zero indicates truly zero amount.
• A continuous variable can be measured in fractional amounts.
• A discrete variable cannot be measured in fractional amounts.
PROCEDURES AND FORMULAS
Summary of Identifying an Experiment’s Components
Researcher’s Activity | Role of Variable | Name of Variable | Amounts of Variable Present
Researcher manipulates variable | Variable influences a behavior | Independent variable | Conditions
Researcher measures variable | Variable measures behavior that is influenced | Dependent variable | Scores
Summary of Measurement Scales:
Scale | What Does the Scale Indicate? | Equal Unit of Measurement? | True Zero? | How Might the Scale Be Used in Research? | Additional Examples
Nominal | Quality | No | No | To identify males and females as 1 and 2 | Social Security numbers
Ordinal | Relative quantity | No | No | To judge who is 1st, 2nd, etc., in aggressiveness | Elementary school grade
Interval | Quantity | Yes | No | To convey the results of intelligence and personality tests | Individual’s standing relative to class average
Ratio | Quantity | Yes | Yes | To count the number of correct answers on a test | Distance traveled
Nominal and ordinal scales are assumed to be discrete, in which fractions are not possible.
Interval and ratio scales are assumed to be continuous, in which fractions are possible.
Summary of the Flow of Research
1. Based on a hypothesis about nature, we design either an experiment or a correlational study to observe the relationship
between our variables in the sample.
2. Depending on the design and scale of measurement used, we select particular descriptive statistics to understand the scores
and the relationship in the sample.
3. Depending on the design and scale used, we select particular inferential procedures to decide whether the sample
accurately represents the scores and relationship found in the population.
4. By describing the scores and relationship in the population, we are describing the behavior of everyone in a particular
situation, so we are describing an aspect of nature.
reviewcard
CHAPTER 1 INTRODUCTION TO
STATISTICS AND RESEARCH
PUTTING IT ALL TOGETHER
1. The purpose of statistical procedures is to make data meaningful by organizing, _____, communicating, and _____ scores.
2. The large group of individuals or scores to which a law of nature applies is called the _____.
3. The small group of individuals we measure in a study is called the _____.
4. The individuals we measure in a study are called the _____.
5. To draw a research conclusion, we use the scores in the _____ to estimate the scores in the
_____.
6. For inferences about a population to be accurate, the sample must be _____ of the
population.
7. However, a sample may be unrepresentative because of _____ in determining which participants were selected.
8. Anything about a situation or behavior that when measured produces different scores is
called a(n) _____.
9. If we are measuring actual amounts of a variable, it is a(n) _____ variable.
10. If we are identifying qualities or categories, we have a(n) _____ variable.
11. When the scores on one variable change in a consistent manner as the scores on another
variable change, a(n) _____ is present.
12. The clearer the pattern in a relationship, the more _____ the X and Y scores pair up.
13. How consistently the Y scores are associated with each X is also referred to as the _____ of a
relationship.
14. The procedures used to describe a sample of data are called _____ procedures.
15. The procedures used to make inferences about the scores and relationship in the population
are called _____ procedures.
16. A number describing an aspect of scores in the population is called a(n) _____.
17. A number describing an aspect of the sample data is called a(n) _____.
18. The layout of a study is called its _____.
19. The two general types of designs used to demonstrate a relationship are _____ and _____.
20. When we actively manipulate one variable to create a relationship with another variable, we
are conducting a(n) _____.
21. When we passively measure scores from two variables to discover a relationship, we are
conducting a(n) _____.
22. In an experiment, the variable manipulated by the experimenter is called the _____ variable.
23. In an experiment, the variable that measures participants’ behavior and produces our data is
the _____ variable.
24. The situation created by each amount or category of the independent variable is called
a(n) _____.
25. Say that we measure the number of hours spent “social networking” by students who have
spent 1, 2, 3, or 4 years in college. The hours spent networking is our _____ variable, and year
in college is our _____ variable.
26. In question 25, if we measure only freshmen and seniors, then we have two _____.
27. In question 25, we will demonstrate a relationship if the _____ scores for each group tend to
be different.
28. The particular descriptive or inferential procedures we use in a study are determined by the
_____ of the study and the _____ used to measure the variables.
29. When we use numbers to identify qualitative or category differences, we have a(n) _____ scale.
30. When the scores indicate rank order, we have a(n) _____ scale.
31. When scores measure an amount, but there is no true zero, we have a(n) _____ scale.
32. When scores measure an amount and zero indicates truly zero amount, we have a(n) _____ scale.
33. A variable that can be measured in fractional amounts is called a(n) _____ variable.
34. A variable that cannot be measured in fractional amounts is called a(n) _____ variable.
SPSS INSTRUCTIONS
Entering Data and Naming Variables
• The first step is to input the data. Open SPSS to the large grid labeled “Data Editor.” Across the top is a “Menu Bar” with buttons for drop-down menus such as File or Analyze.
• Enter a set of scores from one variable in one column, one score per box. You may simply type in the data, but if you do so, the variable will be named “VAR00001” and so on.
• It is better to give variables meaningful names. To name them, click on Variable View at the bottom of the editor.
• In the left column under “Name,” click on the first rectangle and type a variable’s name. Press Enter on the keyboard. The information that appears is the SPSS defaults, including rounding to two decimal places. Click a rectangle to change the default.
• For a second variable, click on the second rectangle in the “Name” column and enter the variable’s name.
• Click on Data View and type in the scores under the corresponding variable.
• When participants are measured on two variables, each row holds the scores from the same participant.
• To save data, on the Menu Bar click File and then Save.
• To retrieve a file to add more data or to analyze it, double-click on the saved file. This also opens SPSS.
Answers to Putting It All Together
1. summarizing; interpreting
2. population
3. sample
4. participants
5. sample; population
6. representative
7. chance
8. variable
9. quantitative
10. qualitative
11. relationship
12. consistently
13. strength
14. descriptive
15. inferential
16. parameter
17. statistic
18. design
19. experiments; correlational studies
20. experiment
21. correlational study
22. independent
23. dependent
24. condition
25. dependent; independent
26. conditions
27. networking
28. design; scale
29. nominal
30. ordinal
31. interval
32. ratio
33. continuous
34. discrete
reviewcard
CHAPTER 2 CREATING AND USING
FREQUENCY DISTRIBUTIONS
CHAPTER SUMMARY
2-1 Some New Symbols and Terminology
• The initial scores in a study are the raw scores.
• A score’s (simple) frequency is the number of times the score occurs.
• The symbol for simple frequency is f.
• A frequency distribution shows the frequency of each score in the data.
• The symbol for the total number of scores in the data is N.
KEY TERMS: raw score, frequency, f, frequency distribution, N
2-2 Understanding Frequency Distributions
• Create a bar graph (adjacent bars do not touch) with nominal or ordinal scores.
• Create a histogram (adjacent bars touch) with a small range of interval or ratio scores.
• A data point is a dot plotted on a graph.
• Create a polygon (data points connected by straight lines) with a large range of interval or ratio scores.
• In a grouped distribution, different X scores are grouped together and their combined frequency is reported.
KEY TERMS: bar graph, histogram, data point, polygon, grouped distribution
2-3 Types of Frequency Distributions
• In a normal distribution forming a normal curve, extreme high and low scores are relatively infrequent, scores closer to the middle score are more frequent, and the middle score occurs most frequently.
• The low-frequency, extreme low and extreme high scores are in the tails of a normal distribution.
• A negatively skewed distribution shows a pronounced tail only for low-frequency, extreme low scores.
• A positively skewed distribution shows a pronounced tail only for low-frequency, extreme high scores.
• A bimodal distribution shows two humps containing relatively high-frequency scores, with a score in each having the highest frequency.
KEY TERMS: normal distribution, normal curve, tail, negatively skewed distribution, positively skewed distribution, bimodal distribution
2-4 Relative Frequency and the Normal Curve
• The relative frequency of a score is the proportion of time that it occurred.
• The total area under the normal curve represents 100% of the scores. A proportion of this area occupied by particular scores equals the combined relative frequency of those scores.
KEY TERMS: relative frequency, area under the normal curve
2-5 Understanding Percentile and Cumulative Frequency
• Percentile is the percent of all scores below a given score.
• Cumulative frequency is the number of scores at or below a particular score.
• On the normal curve the percentile of a score is the percent of the area under the curve to the left of the score.
KEY TERMS: percentile, cumulative frequency
PROCEDURES AND FORMULAS
When to create a bar graph, histogram, or polygon
Consider the scale of measurement of scores on the X axis.
Graph
When Used?
How Produced?
Bar graph
With nominal or ordinal X scores
Adjacent bars do not touch
Histogram
With small range of interval/ratio Adjacent bars do touch
X scores
Polygon
With large range of interval/ratio
X scores
Straight lines; add points above
and below actual scores
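This decision rule is mechanical enough to express in a few lines of code. Below is a minimal Python sketch of the table’s logic; the function name choose_graph and its string labels are illustrative, not from the text:

def choose_graph(scale, score_range="small"):
    # Pick a frequency graph from the X variable's scale of measurement.
    # scale: "nominal", "ordinal", "interval", or "ratio" (assumed labels)
    # score_range: "small" or "large"; matters only for interval/ratio scores
    if scale in ("nominal", "ordinal"):
        return "bar graph"        # adjacent bars do not touch
    if scale in ("interval", "ratio"):
        # histogram for a small range of scores, polygon for a large range
        return "histogram" if score_range == "small" else "polygon"
    raise ValueError("unknown scale: " + scale)

print(choose_graph("ordinal"))         # bar graph
print(choose_graph("ratio", "large"))  # polygon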
Computing Relative Frequency, Proportions,
and Percents
1. To compute a score’s relative frequency, divide its frequency (f ) by the total
number of scores (N ).
2. To transform relative frequency to simple frequency, multiply the relative
frequency times N.
3. To transform relative frequency to a percent, multiply the proportion by 100.
4. To compute a percent beginning with a raw score, perform steps 1 and 3 above.
5. Using the area under the normal curve:
• The combined relative frequency for a group of scores equals the proportion of the area in the slice above those scores.
• A score’s percentile equals the proportion of the area under the curve to the left of the score.
Chapter Formulas
Relative Frequency = f / N
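As a check on these steps, here is a short Python sketch that computes each score’s simple frequency, relative frequency, and percent; the scores are made up for illustration:

from collections import Counter

scores = [3, 5, 5, 6, 6, 6, 7, 9]   # hypothetical raw scores
N = len(scores)                      # total number of scores
f = Counter(scores)                  # simple frequency of each score

for score in sorted(f):
    rel_f = f[score] / N             # relative frequency = f/N
    print(score, f[score], rel_f, rel_f * 100)   # score, f, rel. f, percent

For the score of 6 this prints a frequency of 3, a relative frequency of .375, and 37.5 percent; multiplying the relative frequency by N (step 2) recovers the simple frequency of 3.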
reviewcard
CHAPTER 2 CREATING AND USING
FREQUENCY DISTRIBUTIONS
PUTTING IT ALL TOGETHER
1. The number of times a score occurs in the data is called the score’s __________ and is symbolized as _________.
2. A(n) _________ organizes the data according to the number of times each score occurs.
3. The total number of scores in a sample is symbolized by _________ and equals the sum of
the _________ of the individual scores.
4. In a frequency table, scores are listed starting from the top with the __________ score.
5. A graph of a frequency distribution shows the scores on the _______ axis and their corresponding frequency on the _______ axis.
6. A frequency graph showing vertical bars that do not touch is called a(n) _________ graph
and is used when the scale is either _________ or _________.
7. The height of each bar indicates the score’s _________.
8. The gap between adjacent bars indicates the variable on the _________ axis is a _________
variable.
9. When adjacent bars on the graph touch, the graph is called a(n) _________.
10. The lack of a gap between adjacent bars indicates the X variable is __________, so this
graph is produced when the scale is either _________ or _________.
11. When a frequency graph is created by plotting a dot above each score and connecting
adjacent dots with straight lines, the graph is called a(n) ________.
12. The dots on a graph are called ________ .
13. Polygons are created when the measurement scale is either _______ or _______.
14. The continuous line formed by the polygon indicates the scores were measured using a
_________ scale.
15. Histograms are preferred when we have a _________ range of X scores, and polygons are
preferred when we have a ________ range of X scores.
16. With too many scores to plot individually, we may combine scores into small groups
and report the total frequency for each group. Such distributions are called _________
distributions.
17. A distribution forming a symmetrical, bell-shaped polygon is known as a ________
distribution.
18. In a normal distribution, scores near the________ of the distribution occur frequently, while
the highest and lowest scores occur ________.
19. The portions of the curve containing low-frequency, extreme high or low scores are called
the ____________ of the distribution.
20. A nonsymmetrical distribution that has only one distinct tail is called a(n) _________
distribution.
21. When the only tail is at the extreme low scores, the distribution is _________ skewed; when
the only tail is at the extreme high scores, the distribution is ___________skewed.
22. A symmetrical distribution with two areas in which there are high-frequency scores is called
a(n) __________ distribution.
23. Most behavioral research produces distributions that form a ________ distribution.
24. The proportion of time that a score occurs in the data is called the score’s __________ .
25. A score’s relative frequency is computed by dividing the score’s ________ by _______.
26. To convert a relative frequency back to simple frequency, _________ the relative frequency
by N.
27. To convert a relative frequency to percent, _________ it by 100.
28. To convert a percent to relative frequency, _________ it by 100.
29. A “slice” of the normal curve that contains a particular proportion of the area under the curve also contains that proportion of all _________ in the data.
30. The proportion of the area under the normal curve in a slice at certain scores equals those scores’ combined _________.
31. The percent of the scores that are lower than a particular score is that score’s __________.
32. The number of scores that are at or below a particular score is the score’s _________.
33. When using the normal curve, a score’s percentile equals the proportion of the area under the normal curve that is located to the _________ of the score.
Answers to Putting It All Together
1. frequency; f
2. frequency distribution
3. N; frequencies
4. highest
5. X; Y
6. bar; nominal; ordinal
7. frequency
8. X; discrete
9. histogram
10. continuous; interval; ratio
11. polygon
12. data points
13. interval; ratio
14. continuous
15. small; large
16. grouped
17. normal
18. middle; infrequently
19. tails
20. skewed
21. negatively; positively
22. bimodal
23. normal
24. relative frequency
25. f; N
26. multiply
27. multiply
28. divide
29. scores
30. relative frequency
31. percentile
32. cumulative frequency
33. left
SPSS INSTRUCTIONS
Frequency Tables and Percentile
• Enter the data and name it as described in Chapter 1.
• On the Menu Bar, click Analyze, Descriptive Statistics, and Frequencies.
• Click the arrow to move the variable to “Variable(s).”
• Click OK. The frequency table appears. (“Cumulative percent” is not the precise percentile.)
• To find the score at a particular percentile, click Analyze, Descriptive Statistics, Frequencies, and move the data to “Variable(s).”
• Click Statistics. Checkmark “Percentile(s)” and type the percentile you seek.
• Click Add. Add other percentiles, quartiles, or cut points.
• Click Continue and OK. The percentile(s) will be listed in the “Statistics” table.
Graphs
• To plot bar graphs and histograms, click Analyze, Descriptive Statistics, and Frequencies.
• Click Charts. Select “Chart Type.” Click Continue. Right-click on the graph to export it.
• To plot polygons and more complex graphs, on the Menu Bar, click Graphs and then Chart Builder. If asked, click OK to define the chart.
• Under “Choose from,” select the graph’s style, and from the gallery, drag and drop the version you want.
• Drag and drop your variable name to the X axis. (These graphs can also be used for plotting other types of X–Y relationships.)
reviewcard
CHAPTER 3 SUMMARIZING SCORES WITH
MEASURES OF CENTRAL TENDENCY
CHAPTER SUMMARY
3-1 Some New Symbols and Procedures
• The symbol ΣX stands for the sum of X.
• Round a final answer to two more decimal places than were in the original raw scores.
KEY TERMS: sum of X
3-2 What Is Central Tendency?
• Measures of central tendency indicate a distribution’s location on a variable, indicating where the center of the distribution tends to be.
• The three measures of central tendency are the mean, median, and mode.
KEY TERMS: central tendency
3-3 Computing the Mean, Median, and Mode
• The mode is the most frequently occurring score or scores in a distribution, and is used primarily to summarize nominal data.
• A distribution with only one mode is unimodal; a distribution with two modes is bimodal.
• The median (Mdn) is the score at the 50th percentile. It is used primarily with ordinal data and with skewed interval or ratio data.
• The mean is the average score, located at the mathematical center of a distribution. It is used with interval or ratio data that form a normal distribution.
• The symbol for a sample mean is X̄.
KEY TERMS: mode, unimodal, bimodal, median (Mdn), mean, X̄
3-4 Applying the Mean to Research
• The amount a score differs from the mean is its deviation, computed as X − X̄.
• The sum of the deviations around the mean, Σ(X − X̄), equals zero. This makes the mean the best score to predict for any individual because the total error across all predictions will equal zero.
• In an experiment, a relationship between the independent and dependent variables is present when the means from two or more conditions are different. No relationship is present when all means are the same.
• When graphing the results of an experiment, the independent variable is plotted on the X axis and the dependent variable on the Y axis.
• A line graph is created when the independent variable is measured using a ratio or an interval scale. A bar graph is created when the independent variable is measured using a nominal or ordinal scale.
• On a graph, if the data form a pattern that is not horizontal, then the Y scores are changing as the X scores change, and a relationship is present. If the data form a horizontal line, then the Y scores are not changing as X changes, and a relationship is not present.
KEY TERMS: deviation, sum of the deviations around the mean, line graph, bar graph
3-5 Describing the Population Mean
• The symbol for a population mean is μ.
• The X̄ in each condition of an experiment is the best estimate of (1) the score of any participant in that condition and (2) the mean that would be found if the population was tested under that condition.
• We conclude that a relationship in the population is present when we infer different values of μ, implying different distributions of dependent scores for two or more conditions of the independent variable.
KEY TERMS: μ
PROCEDURES AND FORMULAS
Selecting a Measure of Central Tendency
Type of Data | Compute | What It Is
Nominal scores | Mode | Most frequent score
Ordinal scores | Median | 50th percentile
Skewed interval or ratio scores | Median | 50th percentile
Normally distributed interval or ratio scores | Mean | Average score
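All three measures are easy to verify with Python’s standard library. A minimal sketch with made-up scores follows (statistics.multimode requires Python 3.8 or later):

import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 6, 7]     # hypothetical interval scores

print(statistics.multimode(scores))       # mode(s): [5]
print(statistics.median(scores))          # median: 5 (50th percentile)
print(statistics.mean(scores))            # mean: 4.444...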
Steps in Summarizing an Experiment
1. Identify the independent and dependent variables.
2. Summarize the dependent scores. Depending on the characteristics of the
dependent variable, compute the mean, median, or mode in each condition.
3. Graph the results. Depending on the characteristics of the independent
variable, create a line graph (with interval or ratio variables) or a bar graph (with
nominal or ordinal variables).
Chapter Formulas
Sample Mean: X̄ = ΣX / N
Deviation: (X − X̄)
Sum of the Deviations: Σ(X − X̄)
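A short Python sketch of these formulas, using made-up scores, also confirms the rule from section 3-4 that the deviations around the mean sum to zero:

scores = [4, 6, 8, 10]                    # hypothetical sample
N = len(scores)

mean = sum(scores) / N                    # X̄ = ΣX / N
deviations = [x - mean for x in scores]   # each deviation is (X − X̄)

print(mean)             # 7.0
print(deviations)       # [-3.0, -1.0, 1.0, 3.0]
print(sum(deviations))  # Σ(X − X̄) = 0.0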
reviewcard
CHAPTER 3 SUMMARIZING SCORES WITH
MEASURES OF CENTRAL TENDENCY
PUTTING IT ALL TOGETHER
1. Measures of central tendency describe the general _________ of a distribution on a variable.
2. The three measures of central tendency are the _________.
3. The most frequently occurring score in a distribution is called the _________ and is the
preferred measure when the scores involve a(n) _________ scale of measurement.
4. The score at the 50th percentile is the measure of central tendency called the _________ and
symbolized as _________.
5. The term “50th percentile” means that 50% of the scores are _________ this score.
6. The median is preferred when the data involve a(n) _________ scale.
7. The statistic located at the mathematical center of a distribution is the _________.
8. The mean is the preferred measure with a normal distribution of _________ or _________
scores.
9. The _________ is the preferred measure with skewed interval and ratio data.
10. The symbol for a sample mean is _________.
11. The formula for a sample mean is _________ divided by _________.
12. In words, “ΣX” is called the _________.
13. The difference between a score and the mean is called the score’s _________.
14. The symbol for a deviation is _________.
15. The distance a score is from the mean is indicated by the _________ of the deviation, and
the direction a score is from the mean is indicated by the _________ of the deviation.
16. A positive deviation indicates the score is _________ than the mean, and a negative
deviation indicates the score is _________ than the mean.
17. A score equal to the mean has a deviation of _________.
18. When we add together all the deviations in a sample, we find the “_________.”
19. We indicate this sum in symbols as _________.
20. The sum of the deviations around the mean always equals _________.
21. To predict any individual’s score, we should predict the _________.
22. Then, in symbols, the difference between a participant’s actual score and his or her predicted
score is _________.
23. The total of all the prediction errors in symbols is _________, and always equals _________.
24. In an experiment, we measure participants’ scores on the _________ variable under each
condition of the _________ variable.
25. We decide which statistics to compute based on the characteristics of the _________
variable.
26. We decide which type of graph to create based on the characteristics of the _________
variable.
27. We usually summarize experiments by computing the _________ for each _________.
28. When a relationship is present, the _________ change as the conditions change.
29. To graph an experiment, place the _________ variable on X and the _________ variable
on Y.
30. Create a _________ graph if the _________ variable is interval or ratio; create a _________
graph if the variable is nominal or ordinal.
31. A graph shows a relationship if the pattern formed by the data points is not _________.
32. The symbol for a population mean is _________.
33. We usually estimate the population mean by computing _________ in a sample.
34. The ultimate goal of an experiment is to conclude that changing our conditions would produce different _________ located at different values of _________.
SPSS INSTRUCTIONS
Computing Central Tendency
• Enter the data as in Chapter 1.
• On the Menu Bar, select Analyze, Descriptive Statistics, and Frequencies.
• Move each variable to “Variable(s).”
• Click Statistics. In the “Frequencies: Statistics” box, check each measure under Central Tendency that you seek.
• Click Continue and OK. Your answers will appear in the “Statistics” box.
Answers to Putting It All Together
1. location
2. mode, median, and mean
3. mode; nominal
4. median; Mdn
5. below
6. ordinal
7. mean
8. interval; ratio
9. median
10. X̄
11. ΣX; N
12. sum of X
13. deviation
14. X − X̄
15. size; sign
16. larger; smaller
17. zero
18. sum of the deviations around the mean
19. Σ(X − X̄)
20. zero
21. mean
22. X − X̄
23. Σ(X − X̄); zero
24. dependent; independent
25. dependent
26. independent
27. mean; condition
28. means and scores
29. independent; dependent
30. line; independent; bar
31. horizontal
32. μ
33. X̄
34. populations; μ
reviewcard
CHAPTER 4 SUMMARIZING SCORES WITH
MEASURES OF VARIABILITY
CHAPTER SUMMARY
4-1 Understanding Variability
• Measures of variability indicate how much the scores differ from each other, how accurately the mean represents the scores, and how much the distribution is spread out.
KEY TERMS: measures of variability
4-2 The Range
• The range is the difference between the highest and the lowest scores.
• It is used as the sole measure of variability with nominal or ordinal scores.
KEY TERMS: range
4-3 The Sample Variance and Standard Deviation
• The variance and standard deviation are used with the mean to describe a normal distribution of interval or ratio scores.
• The sample variance (S²X) is the average of the squared deviations of scores around the mean.
• The sample standard deviation (SX) is the square root of the variance, but can be interpreted as like the “average” amount that scores deviate from the mean.
• On a normal distribution, 34% of the scores are between the mean and the score that is 1 standard deviation from the mean.
KEY TERMS: sample variance (S²X), sample standard deviation (SX)
4-4 The Population Variance and Standard Deviation
• The true population variance (σ²X) indicates the average squared deviation of scores around μ.
• The true population standard deviation (σX) can be interpreted as somewhat like the “average” amount that scores deviate from μ.
• The formulas for SX and S²X are biased estimators of the population’s variability because they use N in the denominator.
• The formulas for sX and s²X are the unbiased estimators of the population’s variability because they use N − 1 in the denominator.
KEY TERMS: population variance (σ²X), population standard deviation (σX), biased estimators, unbiased estimators, estimated population variance (s²X), estimated population standard deviation (sX)
4-5 Summary of the Variance and Standard Deviation
• For the variance: S²X describes the sample, σ²X describes the population, and s²X is the estimated population variance based on a sample.
• For the standard deviation: SX describes the sample, σX describes the population, and sX is the estimated population standard deviation based on a sample.
4-6 Computing Formulas for the Variance and Standard Deviation
• The symbol (ΣX)² is the squared sum of X.
• The symbol ΣX² is the sum of squared Xs.
KEY TERMS: squared sum of X [(ΣX)²], sum of squared Xs (ΣX²)
4-7 Statistics in the Research Literature: Reporting Means and Variability
• In research publications, the symbol for the sample mean is M, and the symbol for the estimated population standard deviation is SD.
PROCEDURES AND FORMULAS
Organizing the Measures of Variability
Describing variability (differences between scores):
• Descriptive measures are used to describe a known sample or population. In formulas, the final division uses N.
  - To describe the population variance, compute σ²X. Taking the square root gives the population standard deviation, σX.
  - To describe the sample variance, compute S²X. Taking the square root gives the sample standard deviation, SX.
• Inferential measures are used to estimate the population based on a sample. In formulas, the final division uses N − 1.
  - To estimate the population variance, compute s²X. Taking the square root gives the estimated population standard deviation, sX.
Chapter Formulas
1. The formula for the range is
Range = highest score − lowest score
2. The computing formula for the sample variance is
S²X = [ΣX² − (ΣX)²/N] / N
3. The computing formula for the sample standard deviation is
SX = √{[ΣX² − (ΣX)²/N] / N}
4. The computing formula for estimating the population variance is
s²X = [ΣX² − (ΣX)²/N] / (N − 1)
5. The computing formula for estimating the population standard deviation is
sX = √{[ΣX² − (ΣX)²/N] / (N − 1)}
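The computing formulas translate directly into code. A minimal Python sketch with made-up scores shows that only the final division differs between the sample and estimated-population versions:

from math import sqrt

scores = [2, 4, 6, 8]                 # hypothetical sample
N = len(scores)
sum_x = sum(scores)                   # ΣX
sum_x2 = sum(x ** 2 for x in scores)  # ΣX²

ss = sum_x2 - sum_x ** 2 / N          # ΣX² − (ΣX)²/N

S2 = ss / N          # sample variance, S²X                    -> 5.0
S = sqrt(S2)         # sample standard deviation, SX           -> 2.236...
s2 = ss / (N - 1)    # estimated population variance, s²X      -> 6.666...
s = sqrt(s2)         # estimated population std. deviation, sX -> 2.581...

print(S2, S, s2, s)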
reviewcard
CHAPTER 4 SUMMARIZING SCORES WITH
MEASURES OF VARIABILITY
PUTTING IT ALL TOGETHER
1. In statistics the extent to which scores differ from one another is called _______.
2. A distribution with greater variability is _______ spread out around the mean, so the mean is
_______ accurate for describing the scores.
3. The opposite of variability is _______.
4. The measure of variability indicating how far the lowest score is from the highest score is the
_______.
5. It is used primarily when variables involve _______ or _______ scales.
6. The range is computed by subtracting the _______ score from the _______ score.
7. When the mean is the appropriate measure of central tendency, the two measures of
variability to compute are the _______ and _______.
8. The smaller these statistics are, the _______ the scores are spread out around the mean, and
the _______ the scores differ from one another.
9. The variance is defined as the _______ deviation of the scores around the mean.
10. The symbol for the sample variance is _______.
11. The variance can be difficult to interpret because it measures the variable in _______ units.
12. The measure of variability more similar to the average deviation of the scores around the
mean is called the _______.
13. Mathematically, the standard deviation equals the _______ of the variance.
14. The symbol for the sample standard deviation is _______.
15. The larger the standard deviation, the _______ the distribution is spread out.
16. To describe where most of the scores in a sample are located, we find the scores at _______
and _______ 1 standard deviation from the mean.
17. In a normal distribution, _______% of all scores fall between the mean and a score that is
1 standard deviation from the mean.
18. Between the scores at +1SX and −1SX from the mean are _______% of the scores.
19. The population variance and standard deviation indicate how spread out the scores are
around _______.
20. The symbol for the true population variance is _______, and for the true population
standard deviation, it is _______.
21. We expect 68% of the population to fall between the scores at ______ and _______.
22. We do not estimate the population variability using the formulas for SX and S²X because they
are called the _______.
23. They are called this because they tend to produce an answer that is too _______.
24. This occurs because they divide by N when only _______ of the scores in a sample reflect the
variability in the population.
25. Instead, we estimate the variability in the population using the _______, which divide
by _______.
26. The symbol for the estimated population standard deviation is ________, and the symbol
for the estimated population variance is _______.
27. In symbols, for the variance we compute _______ to estimate _______; for the standard
deviation we compute _______ to estimate _______.
28. In published research the measure of variability researchers usually report is
the _______.
29. Publications usually use the symbol _______ for the sample mean and the
symbol ________ for the standard deviation.
SPSS INSTRUCTIONS
Computing Variability
• After entering the data, on the Menu Bar select Analyze, Descriptive Statistics,
and Frequencies. Move each variable to “Variable(s).”
• Click Statistics. Check the measures of Central Tendency and the measures of
Dispersion that you seek.
• Click Continue and OK.
• In the “Statistics” box, the standard (std.) deviation and variance given are the
estimated population versions. (You’ll encounter the “S.E. mean” in Chapter 5.)
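For readers working outside SPSS, the same values can be reproduced in Python; this is a sketch assuming NumPy is available, not part of the card. The key detail is that SPSS reports the estimated population versions, which NumPy computes when ddof=1:

    # Reproducing the SPSS "Statistics" box values (illustrative data).
    import numpy as np

    scores = np.array([2, 3, 4, 5, 6])
    print(scores.mean())          # mean
    print(scores.var(ddof=1))     # variance (s²X): final division uses N − 1
    print(scores.std(ddof=1))     # std. deviation (sX)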
Answers to Putting It All Together
1. variability
2. more; less
3. consistency
4. range
5. nominal; ordinal
6. lowest; highest
7. variance; standard deviation
8. less; less
9. average squared
10. S²X
11. squared
12. standard deviation
13. square root
14. SX
15. more
16. plus; minus
17. 34
18. 68
19. μ
20. σ²X; σX
21. +1σX; −1σX
22. biased estimators
23. small
24. N − 1
25. unbiased estimators; N − 1
26. sX; s²X
27. s²X; σ²X; sX; σX
28. estimated population standard deviation
29. M; SD
reviewcard
CHAPTER 5 DESCRIBING DATA WITH z-SCORES
AND THE NORMAL CURVE
CHAPTER SUMMARY
5-1
Understanding z-Scores
• The relative standing of a score reflects a systematic evaluation of it relative to a
sample or a population.
• A z-score indicates the distance a score is from the mean when measured in
standard deviations.
• z-scores are used to describe the relative standing of raw scores, to compare
scores from different variables, and to determine their relative frequency.
5-2
relative standing
z-score (z)
Using the z-Distribution to Interpret Scores
• A z-distribution is produced by transforming all raw scores in a distribution into
z-scores.
• The mean of a z-distribution is 0 and the standard deviation is 1 (see the sketch below).
• A positive z-score indicates that the raw score is above the mean, and a negative
z-score indicates that the raw score is below the mean.
• The larger the absolute value of z, the farther the raw score is from the mean,
so the less frequently the z-score and raw score occur.
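A minimal Python sketch of this transformation (illustrative data, not from the card) confirms the properties above:

    # Transforming raw scores into a z-distribution.
    import numpy as np

    raw = np.array([2, 3, 4, 5, 6])
    z = (raw - raw.mean()) / raw.std()   # z = (X − X̄)/SX; std() divides by N
    print(z)                  # ≈ [-1.41, -0.71, 0, 0.71, 1.41]
    print(z.mean(), z.std())  # ≈ 0 and 1: the mean and SD of any z-distribution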
5-3
KEY TERMS
z-distribution
Using the z-Distribution to Compare Different Variables
• z-scores equate scores from different variables by comparing them using relative
standing.
• z-scores are also called “standard scores.”
• A particular z-score is always at the same relative location on the z-distribution for
any variable.
5-4
Using the z-Distribution to Compute Relative Frequency
• The standard normal curve is a perfect normal z-distribution that is our model of
the z-distribution. It is used with normally distributed, interval or ratio scores.
• Raw scores that produce the same z-score have the same relative frequency and
percentile.
• Using the standard normal curve and z-table, we determine the proportion of the
area under the curve that is above or below a particular z-score. This proportion is
also the expected relative frequency of raw scores in this portion of the curve.
5-5
standard normal curve
Using z-Scores to Describe Sample Means
• The sampling distribution of means is the frequency distribution of all sample
means that occur when an underlying raw score population is infinitely sampled
using a particular N.
• The central limit theorem shows that a sampling distribution is approximately
normal, has a μ equal to the μ of the underlying raw score population, and has
variability related to the variability of the raw scores (see the simulation sketch
at the end of this card).
• The true standard error of the mean (σX̄) is the standard deviation of the sampling
distribution of means.
• A z-score for a sample mean indicates how far the mean is from the μ of the
sampling distribution when measured in standard error units.
• Using the standard normal curve and z-table, we determine the proportion of the
area under the curve that is above or below our mean’s z-score. This proportion
is the relative frequency of sample means above or below our mean that occur
when sampling from the underlying raw score population.
sampling distribution of means
central limit theorem
standard error of the mean (σX̄)
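As a rough illustration of the theorem (a simulation sketch with made-up numbers, not part of the card), repeatedly sampling from even a non-normal population produces a distribution of means centered on μ with standard deviation σX/√N:

    # Simulating the sampling distribution of means.
    import numpy as np

    rng = np.random.default_rng(0)
    population = rng.uniform(0, 100, size=100_000)  # a non-normal population
    mu, sigma = population.mean(), population.std()

    N = 25
    samples = rng.choice(population, size=(10_000, N))
    means = samples.mean(axis=1)                    # 10,000 sample means

    print(means.mean(), mu)              # mean of the means ≈ μ
    print(means.std(), sigma / N**0.5)   # SD of means ≈ σX/√N, the standard error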
PROCEDURES AND FORMULAS
Summary of Steps When Using the z-Table

If You Seek                                       First, You Should                     Then You
Relative frequency of scores between X and X̄     transform X to z                      find area in column B*
Relative frequency of scores beyond X in tail    transform X to z                      find area in column C*
X that marks a given relative frequency          find relative frequency in column B   transform z to X
  between X and X̄
X that marks a given relative frequency          find relative frequency in column C   transform z to X
  beyond X in tail
Percentile of an X above X̄                       transform X to z                      find area in column B and add .50
Percentile of an X below X̄                       transform X to z                      find area in column C

*To find the simple frequency of the scores, multiply relative frequency times N.
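The areas in columns B and C can also be checked against a standard normal CDF. A sketch, assuming SciPy is available (illustrative, not part of the card):

    # Checking z-table areas with the standard normal CDF.
    from scipy.stats import norm

    z = 1.0
    col_B = norm.cdf(z) - 0.50    # area between the mean and z ≈ .3413
    col_C = 1.0 - norm.cdf(z)     # area beyond z in the tail ≈ .1587
    print(col_B, col_C)
    print(col_B + 0.50)           # percentile of an X above the mean ≈ .8413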
Summary of Steps When Computing a z-Score for a
Sample Mean
1. Create the sampling distribution of means with μ equal to the μ of the
underlying raw score population.
2. Compute the sample mean’s z-score:
a. Compute the standard error of the mean, σX̄.
b. Compute z.
3. Use the z-table to determine the relative frequency of scores above or below
this z-score. This is the relative frequency of sample means when sampling from
the underlying raw score population.
Chapter Formulas
1. The formula for a z-score in a sample is

   $z = \dfrac{X - \overline{X}}{S_X}$

2. The formula for a z-score in a population is

   $z = \dfrac{X - \mu}{\sigma_X}$

3. The formula for transforming a z-score to its raw score in a sample is

   $X = (z)(S_X) + \overline{X}$

4. The formula for transforming a z-score to its raw score in a population is

   $X = (z)(\sigma_X) + \mu$

5. The formula for the standard error of the mean is

   $\sigma_{\overline{X}} = \dfrac{\sigma_X}{\sqrt{N}}$

6. The formula for computing a z-score for a sample mean is

   $z = \dfrac{\overline{X} - \mu}{\sigma_{\overline{X}}}$
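A short numeric sketch of formulas 5 and 6 (the population values here are assumed for illustration; they are not from the card):

    # z-score for a sample mean.
    import math

    mu, sigma_x = 100.0, 15.0    # μ and σX of the raw score population (assumed)
    N, x_bar = 25, 106.0         # sample size and obtained sample mean (assumed)

    sigma_xbar = sigma_x / math.sqrt(N)   # standard error: 15/√25 = 3
    z = (x_bar - mu) / sigma_xbar         # (106 − 100)/3 = +2.0
    print(sigma_xbar, z)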
reviewcard
CHAPTER 5 DESCRIBING DATA WITH z-SCORES
AND THE NORMAL CURVE
PUTTING IT ALL TOGETHER
1. When we evaluate a raw score relative to other scores in the data, we are describing
the score’s _____.
2. The best way to accomplish this evaluation is by computing the score’s _____.
3. The definition of a z-score is that it indicates the _____ a raw score deviates from the mean
when measured in _____ units.
4. The symbol for a z-score is _____.
5. A positive z-score indicates that the raw score is _____ than the mean and graphed to the
_____ of it.
6. A negative z-score indicates that the raw score is _____ than the mean and graphed to the
_____ of it.
7. The size of the z-score (ignoring the sign) indicates the _____ the score is from the mean.
8. The z-score for a raw score that equals the mean is _____.
9. The larger a positive z-score is, the _____ is the raw score, and the _____ frequently it occurs.
10. The larger a negative z-score is, the _____ is the raw score, and the _____ frequently it
occurs.
11. Seldom are z-scores greater than ± _____.
12. Transforming all raw scores in a distribution to z-scores results in a(n) _____.
13. The mean of a z-distribution always equals _____ and the standard deviation always equals
_____.
14. One reason to transform raw scores to z-scores is to make scores on different variables _____,
because we compare participants’ _____ in each sample.
15. Another reason to compute z-scores is to determine the _____ of raw scores.
16. Relative frequency tells us the _____ a score occurs.
17. The model of the z-distribution we employ is called the _____.
18. To use the standard normal curve, we first compute a(n) _____ to identify a slice of the
normal curve.
19. Then, from the z-table we determine the _____ in the slice.
20. Each proportion also equals the _____ of the corresponding z-scores in the slice.
21. The relative frequency of the z-scores is also the expected relative frequency of the
corresponding _____.
22. A raw score’s percentile equals the proportion of the curve that is to the _____ of its z-score.
23. When computing a percentile, we add .50 to the proportion obtained from the z-table when
the z-score has a(n) _____ sign, but not when the z-score has a(n) _____ sign.
24. We also use z-scores to describe the _____ of a sample mean when compared to a
distribution of all possible sample means that might occur.
25. This distribution is called the _____.
26. The statistical principle called the _____ defines the shape, the mean, and the standard
deviation of a sampling distribution.
27. The mean of the sampling distribution always equals the mean of the _____ population.
28. The standard deviation of the sampling distribution is called the _____.
29. The symbol for the true standard error of the mean is _____.
30. A z-score for a sample mean indicates the amount the mean deviates from the _____ of the
sampling distribution when measured in _____ units.