STUDENT TESTED, FACULTY APPROVED

THE PROCESS

Like all 4LTR Press solutions, Behavioral Sciences STAT begins and ends with student and faculty feedback. For the Statistics for the Behavioral Sciences course, here's the process we used:

1. Conduct research with students on their challenges and learning preferences. As psychology majors, students taking statistics for the behavioral sciences expressed a need for a broad base of examples and applications. Specifically, they were looking for resources that would help them apply key formulas in a relevant, accessible fashion.

2. Develop the ideal product mix with students to address each course's needs. The first 4-color product in the statistics for the behavioral sciences course, Behavioral Sciences STAT offers students a visually engaging experience that makes the material more accessible. Additionally, Review and Tech cards provide students with a convenient resource to use in class and after graduation.

3. Share student feedback and validate the product mix with faculty. Adopters of the first edition found that Behavioral Sciences STAT supported the way they teach by providing an efficient presentation of the concepts with current examples. Discussions were richer because students came to class better prepared, having read the chapter.

4. Publish a Student-Tested, Faculty-Approved solution.

Behavioral Sciences STAT delivers an engaging mixture of print and digital tools that exposes students to a variety of applications to prepare them to be professionals, while supporting instructors with a suite of tracking and assessment resources.

4LTR PRESS TIMELINE: CONTINUOUSLY IMPROVING. Milestones from Spring 2006 onward include: student conversations begin (April 2007); faculty broadly endorse our student-tested, faculty-approved approach but suggest a title change from Marketing To Go to MKTG, officially launching the 4LTR Press brand; first adoption of MKTG; MKTG publishes and launches a new debate about how best to engage today's students (2008); early adopters embrace the consistent approach and adopt multiple 4LTR Press solutions to drive better outcomes; our first adoption of 20+ titles at a single school; our title count grows to 8 solutions across business disciplines (January 2009); and the 1 millionth dollar saved by students (June 2009).

This is an electronic version of the print textbook. Due to electronic rights restrictions, some third party content may be suppressed. Editorial review has deemed that any suppressed content does not materially affect the overall learning experience.
The publisher reserves the right to remove content from this title at any time if subsequent rights restrictions require it. For valuable information on pricing, previous editions, changes to current editions, and alternate formats, please visit www.cengage.com/highered to search by ISBN, author, title, or keyword for materials in your areas of interest.

Important Notice: Media content referenced within the product description or the product text may not be available in the eBook version.

Behavioral Sciences STAT2
Gary Heiman

Product Director: Jon-David Hague
Product Manager: Timothy Matray
Content Developer: Thomas Finn
Content Coordinator: Jessica Alderman
Product Assistant: Nicole Richards
Media Developer: Jasmin Tokatlian
Brand Manager: Jennifer Levanduski
Market Development Manager: Christine Sosa
Content Project Manager: Michelle Clark
Art Director: Jennifer Wahi
Manufacturing Planner: Karen Hunt
Rights Acquisitions Specialist: Roberta Broyer
Production Service: Integra
Photo and Text Researcher: PreMedia Global
Copy Editor: Integra
Text and Cover Designer: Trish Knapke
Cover Image: Cheryl Graham/iStockPhoto
Compositor: Integra

© 2015, 2012 Cengage Learning
WCN: 02-200-203

ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706. For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be e-mailed to permissionrequest@cengage.com.

Library of Congress Control Number: 2013936603
ISBN-13: 978-1-285-45814-4
ISBN-10: 1-285-45814-1

Cengage Learning
200 First Stamford Place, 4th Floor
Stamford, CT 06902
USA

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at www.cengage.com/global.

Cengage Learning products are represented in Canada by Nelson Education, Ltd.

To learn more about Cengage Learning Solutions, visit www.cengage.com. Purchase any of our products at your local college store or at our preferred online store www.cengagebrain.com.

Printed in the United States of America
1 2 3 4 5 6 7 17 16 15 14 13
For my wife Karen, the love of my life

BRIEF CONTENTS

1 Introduction to Statistics and Research 2
2 Creating and Using Frequency Distributions 20
3 Summarizing Scores with Measures of Central Tendency 36
4 Summarizing Scores with Measures of Variability 52
5 Describing Data with z-Scores and the Normal Curve 68
6 Using Probability to Make Decisions about Data 88
7 Overview of Statistical Hypothesis Testing: The z-Test 106
8 Hypothesis Testing Using the One-Sample t-Test 126
9 Hypothesis Testing Using the Two-Sample t-Test 140
10 Describing Relationships Using Correlation and Regression 162
11 Hypothesis Testing Using the One-Way Analysis of Variance 184
12 Understanding the Two-Way Analysis of Variance 202
13 Chi Square and Nonparametric Procedures 218
Appendix A: Math Review and Additional Computing Formulas 234
Appendix B: Statistical Tables 252
Appendix C: Answers to Odd-Numbered Study Problems 264
Index 276

CONTENTS

1 Introduction to Statistics and Research 2
1-1 Learning about Statistics 3
1-2 The Logic of Research 5
1-3 Understanding Relationships 7
1-4 Applying Descriptive and Inferential Statistics 10
1-5 Understanding Experiments and Correlational Studies 11
1-6 The Characteristics of Scores 15

2 Creating and Using Frequency Distributions 20
2-1 Some New Symbols and Terminology 21
2-2 Understanding Frequency Distributions 22
2-3 Types of Frequency Distributions 25
2-4 Relative Frequency and the Normal Curve 29
2-5 Understanding Percentile and Cumulative Frequency 32

3 Summarizing Scores with Measures of Central Tendency 36
3-1 Some New Symbols and Procedures 37
3-2 What Is Central Tendency? 37
3-3 Computing the Mean, Median, and Mode 39
3-4 Applying the Mean to Research 44
3-5 Describing the Population Mean 49
4 Summarizing Scores with Measures of Variability 52
4-1 Understanding Variability 53
4-2 The Range 55
4-3 The Sample Variance and Standard Deviation 55
4-4 The Population Variance and Standard Deviation 59
4-5 Summary of the Variance and Standard Deviation 61
4-6 Computing the Formulas for Variance and Standard Deviation 62
4-7 Statistics in the Research Literature: Reporting Means and Variability 65

5 Describing Data with z-Scores and the Normal Curve 68
5-1 Understanding z-Scores 69
5-2 Using the z-Distribution to Interpret Scores 72
5-3 Using the z-Distribution to Compare Different Variables 74
5-4 Using the z-Distribution to Compute Relative Frequency 75
5-5 Using z-Scores to Describe Sample Means 79

6 Using Probability to Make Decisions about Data 88
6-1 Understanding Probability 89
6-2 Probability Distributions 90
6-3 Obtaining Probability from the Standard Normal Curve 92
6-4 Random Sampling and Sampling Error 94
6-5 Deciding Whether a Sample Represents a Population 96

7 Overview of Statistical Hypothesis Testing: The z-Test 106
7-1 The Role of Inferential Statistics in Research 107
7-2 Setting Up Inferential Procedures 108
7-3 Performing the z-Test 113
7-4 Interpreting Significant and Nonsignificant Results 115
7-5 Summary of the z-Test 117
7-6 The One-Tailed Test 118
7-7 Statistics in the Research Literature: Reporting the Results 120
7-8 Errors in Statistical Decision Making 121

8 Hypothesis Testing Using the One-Sample t-Test 126
8-1 Understanding the One-Sample t-Test 127
8-2 Performing the One-Sample t-Test 128
8-3 Interpreting the t-Test 133
8-4 Estimating μ by Computing a Confidence Interval 135
8-5 Statistics in the Research Literature: Reporting t 138

9 Hypothesis Testing Using the Two-Sample t-Test 140
9-1 Understanding the Two-Sample Experiment 141
9-2 The Independent-Samples t-Test 142
9-3 Performing the Independent-Samples t-Test 144
9-4 The Related-Samples t-Test 149
9-5 Performing the Related-Samples t-Test 152
9-6 Statistics in the Research Literature: Reporting a Two-Sample Study 156
9-7 Describing Effect Size 156
10 Describing Relationships Using Correlation and Regression 162
10-1 Understanding Correlations 163
10-2 The Pearson Correlation Coefficient 171
10-3 Significance Testing of the Pearson r 174
10-4 Statistics in the Research Literature: Reporting r 178
10-5 An Introduction to Linear Regression 178
10-6 The Proportion of Variance Accounted For: r² 180

11 Hypothesis Testing Using the One-Way Analysis of Variance 184
11-1 An Overview of the Analysis of Variance 185
11-2 Components of the ANOVA 189
11-3 Performing the ANOVA 191
11-4 Performing the Tukey HSD Test 196
11-5 Statistics in the Research Literature: Reporting ANOVA 198
11-6 Effect Size and Eta² 198
11-7 A Word about the Within-Subjects ANOVA 199

12 Understanding the Two-Way Analysis of Variance 202
12-1 Understanding the Two-Way Design 203
12-2 Understanding Main Effects 204
12-3 Understanding the Interaction Effect 207
12-4 Completing the Two-Way ANOVA 209
12-5 Interpreting the Two-Way Experiment 214

13 Chi Square and Nonparametric Procedures 218
13-1 Parametric versus Nonparametric Statistics 219
13-2 Chi Square Procedures 220
13-3 The One-Way Chi Square: The Goodness of Fit Test 220
13-4 The Two-Way Chi Square: The Test of Independence 224
13-5 Statistics in the Research Literature: Reporting χ² 229
13-6 A Word about Nonparametric Procedures for Ordinal Scores 229

Appendix A: Math Review and Additional Computing Formulas 234
A-1 Review of Basic Math 234
A-2 Computing Confidence Intervals for the Two-Sample t-Test 238
A-3 Computing the Linear Regression Equation 239
A-4 Computing the Two-Way Between-Subjects ANOVA 241
A-5 Computing the One-Way Within-Subjects ANOVA 247
Appendix B: Statistical Tables 252
Appendix C: Answers to Odd-Numbered Study Problems 264
Index 276

Chapter 1
INTRODUCTION TO STATISTICS AND RESEARCH

GOING FORWARD

Your goals in this chapter are to learn:
• The logic of research and the purpose of statistical procedures.
• What a relationship between scores is.
• When and why descriptive and inferential procedures are used.
• What the difference is between an experiment and a correlational study, and what the independent variable, the conditions, and the dependent variable are.
• What the four scales of measurement are.

Sections
1-1 Learning about Statistics
1-2 The Logic of Research
1-3 Understanding Relationships
1-4 Applying Descriptive and Inferential Statistics
1-5 Understanding Experiments and Correlational Studies
1-6 The Characteristics of Scores

Okay, so you're taking a course in statistics. What does this involve?
Well, first of all, statistics involve math, but if that makes you a little nervous, you can relax: You do not need to be a math wizard to do well in this course. You need to know only how to add, subtract, multiply, and divide—and use a calculator. Also, the term statistics is often shorthand for statistical procedures, and statisticians have already developed the statistical procedures you'll be learning about. So you won't be solving simultaneous equations, performing proofs and derivations, or doing other mystery math. You will simply learn how to select the statistical procedure—the formula—that is appropriate for a given situation and then compute and interpret the answer. And don't worry, there are not that many to learn, and these fancy-sounding "procedures" include such simple things as computing an average or drawing a graph. (A quick refresher in math basics is in Appendix A.1. If you can do that, you'll be fine.)

Instead of thinking of statistics as math problems, think of them as tools that psychologists and other behavioral researchers employ when "analyzing" the results of their research. Therefore, for you to understand statistics, your first step is to understand the basics of research so that you can see how statistics fit in. To get you started, in this chapter we will discuss (1) what learning statistics involves, (2) the logic of research and the purpose of statistics, (3) the two major types of studies that researchers conduct, and (4) the four ways that researchers measure behaviors.

1-1 LEARNING ABOUT STATISTICS

Why is it important to learn statistics? Statistical procedures are an important part of the research that forms the basis for psychology and other behavioral sciences. People involved with these sciences use statistics and statistical concepts every day. Even if you are not interested in conducting research yourself, understanding statistics is necessary for comprehending other people's research and for understanding your chosen field of study.

How do researchers use statistics? Behavioral research always involves measuring behaviors. For example, to study intelligence, researchers measure the IQ scores of individuals, or to study memory, they measure the number of things that people remember or forget. We call these scores the data. Any study typically produces a very large batch of data, and it is at this point that researchers apply statistical procedures, because statistics help us to make sense out of the data. We do this in four ways.

1. First, we organize the scores so that we can see any patterns in the data. Often this simply involves creating a table or graph.

2. Second, we summarize the data.
Usually we don’t want to examine each individual score in a study, and a summary—such as the average score— allows us to quickly understand the general characteristics of the data. 3. Third, statistics communicate the results of a study. You will learn the standard techniques and symbols we use to quickly and clearly communicate results, especially in published research reports. 4. Finally, we use statistics to interpret what the data indicate. All behavioral research is designed to answer a question about a behavior and, ultimately, we must decide what the data tell us about that behavior. Chapter 1: Introduction to Statistics and Research Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 3 is learning this code. Once you speak the language, much of the mystery of statistics will evaporate. So learn (memorize) the terminology by using the glossary in the page margins and the other learning aids that are provided. THE PURPOSE OF STATISTICAL PROCEDURES IS TO MAKE SENSE OUT OF DATA. You’ll see there are actually only a few different ways that behavioral research is generally conducted, and for each way, there are slightly different formulas that we use. Thus, in a nutshell, the purpose of this course is to familiarize you with each research approach, teach you the appropriate formulas for that approach, and show you how to use the answers you compute to make sense out of the data (by organizing, summarizing, communicating, and interpreting). Along the way, it is easy to get carried away and concentrate on only the formulas and calculations. However, don’t forget that statistics are a research tool that you must learn to apply. Therefore, more than anything else, your goal is to learn when to use each procedure and how to interpret its answer. 1-1a Studying Statistics The nature of statistics leads to some “rules” for how to approach this topic and how to use this book. • You will be learning novel ways to think about the information conveyed by numbers. You need to carefully read and study the material, and often you will need to read it again. Don’t try to “cram” statistics. You won’t learn anything (and your brain will melt). You must translate the new terminology and symbols into things that you understand, and that takes time and effort. • Don’t skip something if it seems difficult because concepts and formulas build upon previous ones. Following each major topic in a chapter, test yourself with the in-chapter “Quick Practice.” If you have problems with it, go back—you missed something. (Also, the beginning of each chapter lists what you should understand from previous chapters. Make sure you do.) • Researchers use a shorthand “code” for describing statistical analyses and communicating research results. A major part of learning statistics 4 • The only way to learn statistics is to do statistics, so you must practice using the formulas and concepts. Therefore, at the end of each chapter are study questions that you should complete. Seriously work on these questions. (This is the practice test before the real test!) 
The answers to the odd-numbered problems are in Appendix C, and your instructor has the answers to the even-numbered problems.

• At the end of this book are two tear-out "Review Cards" for each chapter. They include: (1) a Chapter Summary, with linkage to key vocabulary terms; (2) a Procedures and Formulas section, where you can review how to use the formulas and procedures (keep it handy when doing the end-of-chapter study questions); and (3) a Putting It All Together fill-in-the-blank exercise that reviews concepts, procedures, and vocabulary. (Complete this for all chapters to create a study guide for the final exam.)

• You cannot get too much practice, so also visit the CourseMate website as described on the inside cover of this book. A number of study tools are provided for each chapter, including printable flashcards, interactive crossword puzzles, and more practice problems.

1-1b Using the SPSS Computer Program

In this book we'll use formulas to compute the answers "by hand" so that you can see how each is produced. Once you are familiar with statistics, though, you will want to use a computer. One of the most popular statistics programs is called SPSS. At the end of most chapters in this book is a brief section relating SPSS to the chapter's procedures, and you'll find step-by-step instructions on one of the Chapter Review Cards. (Review Card 1.4 describes how to get started by entering data.) These instructions are appropriate for version 20 and other recent versions of SPSS. Establish a routine of using the data from odd-numbered study problems at the end of a chapter and checking your answers in Appendix C.

TO BRUSH UP ON YOUR MATH SKILLS, CHECK OUT THE REVIEW OF BASIC MATH IN APPENDIX A.1 ON PAGE 234.

But remember, computer programs do only what you tell them to do. SPSS cannot decide which statistical procedure to compute in a particular situation, nor can it interpret the answer for you. You really must learn when to use each statistic and what the answer means.
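To make the four uses of statistics from this section concrete, here is a minimal sketch in Python (not SPSS, which is the program the book actually discusses); the ten quiz scores and all names in it are invented purely for illustration.

    # Illustrative only: the four ways statistics help us make sense of data.
    scores = [7, 9, 6, 8, 9, 10, 7, 8, 8, 9]   # fabricated quiz scores

    # 1. Organize: a simple frequency table reveals any pattern in the scores.
    frequency = {}
    for score in sorted(scores):
        frequency[score] = frequency.get(score, 0) + 1

    # 2. Summarize: one number (here the average) characterizes the whole batch.
    mean = sum(scores) / len(scores)

    # 3. Communicate: report the results in a standard, compact form.
    print("Frequency table:", frequency)        # {6: 1, 7: 2, 8: 3, 9: 3, 10: 1}
    print(f"N = {len(scores)}, mean = {mean:.2f}")

    # 4. Interpret: decide what the numbers say about the behavior. Most scores
    # cluster around 8, so the "typical" student scored near the mean.

As the text stresses, the software (or sketch) only computes; deciding that the average is the appropriate summary here, and deciding what it says about the behavior, is still your job.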
1-2 THE LOGIC OF RESEARCH

The goal of behavioral research is to understand the "laws of nature" that apply to the behaviors of living organisms. That is, researchers assume that specific influences govern every behavior of all members of a particular group. Although any single study is a small step in this process, our goal is to understand every factor that influences the behavior. Thus, when researchers study such things as the mating behavior of sea lions or social interactions among humans, they are ultimately studying the laws of nature.

The reason a study is a small step is because nature is very complex. Therefore, research involves a series of translations that simplify things so that we can examine a specific influence on a specific behavior in a specific situation. Then, using our findings, we generalize back to the broader behaviors and laws we began with. For example, here's an idea for a simple study. Say that we think a law of nature is that people must study information in order to learn it. We translate this into the more specific hypothesis that "the more you study statistics, the better you'll learn them." Next, we will translate the hypothesis into a situation where we can observe and measure specific people who study specific material in different amounts, to see if they do learn differently. Based on what we observe, we have evidence for working back to the general law regarding studying and learning. The first part of this translation process involves samples and populations.

1-2a Samples and Populations

When researchers talk of a behavior occurring in nature, they say it occurs in the population. A population is the entire group of individuals to which a law of nature applies (whether all humans, all men, all 4-year-old English-speaking children, etc.). For our example, the population might be all college students who take statistics. A population usually contains all possible members—past, present, and future—so we usually consider it to be infinitely large. However, to study an infinite population would take roughly forever! Instead, we study a sample from the population. A sample is a relatively small subset of a population that is intended to represent, or stand in for, the population. Thus, we might study the students in your statistics class as a sample representing the population of all college students studying statistics. The individuals measured in a sample are called the participants, and it is their scores that constitute our data.

population: The large group of individuals to which a law of nature applies
sample: A relatively small subset of a population intended to represent the population
participants: The individuals who are measured in a study

Although researchers ultimately discuss the behavior of individuals, in statistics we often go directly to their scores. Thus, we will talk about the population of scores as if we have already measured the behavior of everyone in the population in a particular situation. Likewise, we will talk about a sample of scores, implying that we have already measured our participants. Thus, a population is the complete group of scores that would be found for everyone in a particular situation, and a sample is a subset of those scores that we actually measure in that situation.

The logic behind samples and populations is this: We use the scores in a sample to infer—to estimate—the scores we would expect to find in the population if we could measure it. Then by translating the scores back into the behaviors they reflect, we can infer the behavior of the population. By describing the behavior of the population, we are describing how nature works, because the population is the entire group to which the law of nature applies. Thus, if we observe that greater studying leads to better learning for the sample of students in your statistics class, we will infer that similar scores and behaviors would be found in the population of all statistics students. This provides evidence that, in nature, more studying does lead to better learning.

Notice that the above assumes that a sample is representative of the population. We discuss this issue in later chapters, but put simply, the individuals in a representative sample accurately reflect the individuals that are found in the population. This means that our inferences about the scores and behaviors found in the population will also be accurate. Thus, if your class is representative of all college students, then the scores the class obtains are a good example of the scores that everyone in the population would obtain. On the other hand, any sample can be unrepresentative, and then it inaccurately reflects the population. The reason this occurs is simply due to random chance—the "luck of the draw" of who we happen to select for a sample. Thus, maybe, simply because of who happened to enroll in your statistics class, it contains some very unusual, atypical students who are not at all like those in the population. If so, then their behaviors and scores will mislead us about those of the typical statistics student. Therefore, as you'll see, researchers always consider the possibility that a conclusion about the population—about nature—might be incorrect because it might be based on an unrepresentative sample.
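This sample-to-population logic can be made concrete with a small simulation. The sketch below is my illustration, not anything from the book: the "population" of test scores is fabricated, and the numbers exist only to show how random samples usually (but not always) resemble the population.

    # Illustrative only: a fabricated population and random samples drawn from it.
    import random

    random.seed(1)
    population = [random.gauss(75, 10) for _ in range(100_000)]  # scores for "everyone"
    population_mean = sum(population) / len(population)          # what we want to infer

    for i in range(3):
        sample = random.sample(population, 25)     # 25 randomly chosen participants
        sample_mean = sum(sample) / len(sample)
        print(f"Sample {i + 1}: mean = {sample_mean:.1f} "
              f"(population mean = {population_mean:.1f})")

    # Most random samples land near the population mean, but occasionally the
    # "luck of the draw" produces an unrepresentative sample whose scores would
    # mislead us about the population -- exactly the risk the text describes.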
A few of the variables found in behavioral research include characteristics of an individual, like your age, race, gender, personality type, political affiliation, and physical attributes. Other variables measure your reactions, such as how anxious, angry, or aggressive you are, or how attractive you think someone is. Sometimes variables reflect performance, such as how hard you work at a task or how well you recall a situation. And variables also measure characteristics of a situation, like the amount of noise, light, or heat that is present; the difficulty or attributes of a task; or Erik Isakson/Getty Images Although researchers ultimately discuss the behavior of individuals, in statistics we often go directly to their scores. Thus, we will talk about the population of scores as if we have already measured the behavior of everyone in the population in a particular situation. Likewise, we will talk about a sample of scores, implying that we have already measured our participants. Thus, a population is the complete group of scores that would be found for everyone in a particular situation, and a sample is a subset of those scores that we actually measure in that situation. The logic behind samples and populations is this: We use the scores in a sample to infer—to estimate— the scores we would expect to find in the population if we could measure it. Then by translating the scores back into the behaviors they reflect, we can infer the behavior of the population. By describing the behavior of the population, we are describing how nature works, because the population is the entire group to which the law of nature applies. Thus, if we observe that greater studying leads to better learning for the sample of students in your statistics class, we will infer that similar scores and behaviors would be found in the population of all statistics students. This provides evidence that, in nature, more studying does lead to better learning. Notice that the above assumes that a sample is representative of the population. We discuss this issue in later chapters, but put simply, the individuals in a representative sample accurately reflect the individuals that are found in the population. This means that then our inferences variable Anything about the scores and behaviors found about a behavior or in the population will also be accusituation that, when measured, can produce rate. Thus, if your class is representatwo or more different tive of all college students, then the scores scores the class obtains are a good Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. how many others are present and the types of interactions you have with them. Variables fall into two general categories. If a score indicates the amount of a variable that is present, the variable is a quantitative variable. A person’s height, for example, is a quantitative variable. Some variables, however, cannot be measured in amounts, but instead classify or categorize an individual on the basis of some characteristic. These variables are called qualitative variables. 
A person’s gender, for example, is qualitative because the “score” of male or female indicates a quality, or category. For our research on studying and learning statistics, say that to measure “studying,” we select the variable of the number of hours that students spent studying for a particular statistics test. To measure “learning,” we select the variable of their performance on the test. After measuring participants’ scores on these variables, we examine the relationship between them. Table 1.1 Scores Showing a Relationship between the Variables of Study Time and Test Grades FYI: The data presented in this book are fictional. Any resemblance to real data is purely a coincidence. Student Gary Bo Sue Tony Sidney Ann Rose Lou Study Time in Hours 1 1 2 2 3 4 4 5 Test Grades F F D D C B B A 1-3 UNDERSTANDING RELATIONSHIPS If nature relates those mental activities we call studying to those mental activities we call learning, then different amounts of learning should occur with different amounts of studying. In other words, there should be a relationship between studying and learning. A relationship is a pattern in which, as the scores on one variable change, the scores on the other variable change in a consistent manner. In our example, we predict the relationship in which the longer you study, the higher your test grade will be. Say that we ask some students how long they studied for a test and their subsequent grades on the test. We obtain the data in Table 1.1. To see the relationship, first look at those people who studied for 1 hour and see their grade. Then look at those who studied 2 hours, and see that they had a different grade from those studying 1 hour. And so on. These scores form a relationship because as the study-time scores change (increase), the test grades also change in a consistent fashion (also increase). Further, when study-time scores do not change (e.g., Gary and Bo both studied for 1 hour), the grades also do not change (they both received Fs). We often use the term association when talking about relationships: Here, low study times are associated with low test grades and high study times are associated with high test grades. In a relationship, as the scores on one variable change, the scores on the other variable change in a consistent manner. Because we see a relationship in these sample data, we have evidence that in nature, studying and learning do operate as we hypothesized: quantitative The amount someone studies does variable A seem to make a difference in test variable for which grades. In the same way, whenever scores reflect the amount of the a law of nature ties behaviors or variable that is present events together, then we’ll see that qualitative particular scores from one variable variable A are associated with particular scores variable for which from another variable so that a relascores reflect a quality tionship is formed. Therefore, most or category that is present research is designed to investigate relationships, because relationships relationship A pattern between are the tell-tale signs of a law at work. two variables A major use of statistical prowhere a change cedures is to examine the scores in in one variable is accompanied by a a relationship and the pattern they consistent change in form. The simplest relationships fit the other one of two patterns. Let’s call one Chapter 1: Introduction to Statistics and Research Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. 
A major use of statistical procedures is to examine the scores in a relationship and the pattern they form. The simplest relationships fit one of two patterns. Let's call one variable X and the other Y. Then, sometimes the relationship fits the description "the more you X, the more you Y." Examples of this include the following: The more you study, the higher your grade; the more alcohol you drink, the more you fall down; the more often you speed, the more traffic tickets you receive; and even that old saying "The bigger they are, the harder they fall." At other times, the relationship fits the description "the more you X, the less you Y." Examples of this include the following: The more you study, the fewer the errors you make; the more alcohol you drink, the less coordinated you are; the more you "cut" classes, the lower your grades; and even that old saying "The more you practice statistics, the less difficult they are." Relationships may also form more complicated patterns where, for example, more X at first leads to more Y, but beyond a certain point, even more X leads to less Y. For example, the more you exercise, the better you feel, until you reach a certain point, beyond which more exercise leads to feeling less well, due to pain and exhaustion.

Although the above examples involve quantitative variables, we can also study relationships that involve qualitative variables. For example, gender is a commonly studied qualitative variable. If you think of being male or female as someone's "score" on the gender variable, then we see a relationship when, as gender scores change, scores on another variable also change. For example, saying that men tend to be taller than women is actually describing a relationship, because as gender scores change (going from men to women), their corresponding height scores tend to decrease.

1-3a The Consistency of a Relationship

Table 1.1 showed a perfectly consistent association between hours of study time and test grades: All those who studied the same amount received the same grade. In a perfectly consistent relationship, a score on one variable is always paired with one and only one score on the other variable. This makes for a very clear, obvious pattern when you examine the data.

In the real world, however, not everyone who studies for the same amount of time will receive the same test grade. (Life is not fair.) A relationship can be present even if there is only some degree of consistency. Then, as the scores on one variable change, the scores on the other variable tend to change in a consistent fashion. This produces a less obvious pattern in the data.

For example, Table 1.2 presents a less consistent relationship between the number of hours studied and the number of errors made on the test.

Table 1.2
Scores Showing a Relationship between Study Time and Number of Errors on Test

Student   X: Hours of Study   Y: Errors on Test
Amy       1                   12
Karen     1                   13
Joe       1                   11
Cleo      2                   11
Jack      2                   10
Maria     2                   9
Terry     3                   9
Manny     3                   10
Chris     4                   9
Sam       4                   8
Gary      5                   7

Notice that the variables are also labeled X and Y. When looking at a relationship, get in the habit of asking, "As the X scores increase, do the Y scores change in a consistent fashion?" Answer this by again looking at one study-time score (at one X score) and seeing the error scores (the Y scores) that are paired with it.
Then look at the next X score and see the Y scores paired with it. Two aspects of the data in Table 1.2 produce a less consistent relationship: First, not everyone who studies for a particular time receives the same error score (e.g., 12, 13, and 11 errors are all paired with 1 hour). Second, sometimes a particular error score is paired with different studying scores (e.g., 11 errors occur with both 1 and 2 hours of study). These aspects cause overlapping groups of different error scores to occur at each study time, so the overall pattern is harder to see. In fact, the greater the differences among the group of Y scores at an X and the more the Y scores overlap between groups, the less consistent the relationship will be. Nonetheless, we still see the pattern where more studying tends to be associated with lower error scores, so a relationship is present. Essentially, one batch of error scores occurs at one study-time score, but a different batch of error scores tends to occur at the next study-time score.

Notice that the less consistent relationship above still supports our original hypothesis about how nature operates: We see that, at least to some degree, nature does relate studying and test errors. Thus, we will always examine the relationship in our data, no matter how consistent it is. A particular study can produce anywhere between a perfectly consistent relationship and no relationship. In Chapter 10 we will discuss in depth how to describe and interpret the consistency of a particular relationship. (As you'll see, the degree of consistency in a relationship is called its strength, and a less consistent relationship is a weaker relationship.) Until then, it is enough for you to simply know what a relationship is.

A relationship is present (though not perfectly consistent) if there tends to be a different group of Y scores associated with each X score. A relationship is not present when virtually the same batch of Y scores is paired with every X score.

1-3b When No Relationship Is Present

At the other extreme, sometimes the scores from two variables do not form a relationship. For example, say that we had obtained the data shown in Table 1.3.

Table 1.3
Scores Showing No Relationship between Hours of Study Time and Number of Errors on Test

Student   X: Hours of Study   Y: Errors on Test
Amy       1                   12
Karen     1                   10
Joe       1                   8
Cleo      2                   11
Jack      2                   10
Maria     2                   9
Terry     3                   12
Manny     3                   9
Chris     3                   10
Sam       4                   11
Jane      4                   10
Gary      4                   8

Here, no relationship is present because the error scores paired with 1 hour are essentially the same as the error scores paired with 2 hours, and so on. Thus, virtually the same (but not identical) batch of error scores shows up at each study time, so no pattern of increasing or decreasing errors is present. These data show that how long people study does not make a consistent difference in their error scores. Therefore, this result would not provide evidence that studying and learning operate as we think.
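The same batch-collecting idea separates Table 1.2 from Table 1.3. In this sketch (again my own illustration; only the scores come from the two tables), the batches of Y scores drift downward across the X scores in Table 1.2 but stay essentially the same at every X in Table 1.3.

    # Collect the batch of Y scores paired with each X score.
    def batches(pairs):
        groups = {}
        for x, y in pairs:
            groups.setdefault(x, []).append(y)
        return groups

    table_1_2 = [(1, 12), (1, 13), (1, 11), (2, 11), (2, 10), (2, 9),
                 (3, 9), (3, 10), (4, 9), (4, 8), (5, 7)]
    table_1_3 = [(1, 12), (1, 10), (1, 8), (2, 11), (2, 10), (2, 9),
                 (3, 12), (3, 9), (3, 10), (4, 11), (4, 10), (4, 8)]

    print(batches(table_1_2))
    # {1: [12, 13, 11], 2: [11, 10, 9], 3: [9, 10], 4: [9, 8], 5: [7]}
    # The batches overlap but tend to decrease as X increases: a weaker,
    # but real, relationship.

    print(batches(table_1_3))
    # {1: [12, 10, 8], 2: [11, 10, 9], 3: [12, 9, 10], 4: [11, 10, 8]}
    # Virtually the same batch appears at every X: no relationship.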
Scores Showing No Relationship between Hours of Study Time and Number of Errors on Test Chapter 1: Introduction to Statistics and Research Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 9 1-4 APPLYING DESCRIPTIVE > Quick Practice > AND INFERENTIAL STATISTICS A relationship is present when, as the scores on one variable change, the scores on another variable tend to change in a consistent fashion. Statistics help us make sense out of data, and now you can see that “making sense” means to understand the scores and the relationship they form. However, because we are always talking about samples and populations, we separate statistical procedures into those that apply to samples and those that apply to populations. Descriptive statistics are procedures for organizing and summarizing sample data. The answers from such procedures are often a single number that describes important information about the scores. (When you see descriptive, think describe.) A sample’s average, for example, is an important descriptive statistic because in one number we summarize all scores in the sample. Descriptive statistics are also used to describe the relationship in sample data. For our study-time research, for example, we’d want to know whether a relationship is present, how consistently errors decrease with increased study time, and so on. (We’ll discuss the common descriptive procedures in the next few chapters.) After describing the sample, we want to use that information to estimate or infer the data we would find if we could measure the entire population. However, we cannot automatically assume that the scores and the relationship we see in the sample are what we would see in the population: Remember, the sample might be unrepresentative, so that it misleads us about the population. Therefore, first we apply additional statistical procedures. Inferential statistics are procedures for drawing inferences about the scores and relationship that would be found in the population. Essentially, inferential procedures help us to decide whether our sample accurately represents the relationship found in the population. If it does, then, for More Examples Below, Sample A shows a perfect relationship: One Y score occurs at only one X. Sample B shows a less consistent relationship: Sometimes different Ys occur at a particular X, and the same Y occurs with different Xs. Sample C shows no relationship: The same Ys tend to show up at every X. A X 1 1 1 2 2 2 3 3 3 B Y 20 20 20 25 25 25 30 30 30 C X 1 1 1 2 2 2 3 3 3 Y 12 15 20 20 30 40 40 40 50 X 1 1 1 2 2 2 3 3 3 Y 12 15 20 20 12 15 20 15 12 For Practice Which samples show a perfect, inconsistent, or no relationship? X 2 2 3 3 4 4 B Y 4 4 6 6 8 8 X 80 80 85 85 90 90 C Y 80 79 76 75 71 70 X 33 33 43 43 53 53 D Y 28 20 27 20 20 28 X 40 40 45 45 50 50 Y 60 60 60 60 60 60 © iStockphoto.com/Rob Broek A > Answers A: Perfect Relationship B: Inconsistent Relationship C and D: No Relationship 10 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. 
If it does, then, for example, we would use the class average as an estimate of the average score we'd find in the population of students. Or, we would use the relationship in our sample to estimate how, for everyone, greater learning tends to occur with greater studying. (We discuss inferential procedures in the second half of this book.)

descriptive statistics: Procedures for organizing and summarizing sample data
inferential statistics: Procedures for determining whether sample data accurately represent the relationship in the population

1-4a Statistics versus Parameters

Researchers use the following system so that we know when we are describing a sample and when we are describing a population. A number that describes an aspect of the scores in a sample is called a statistic. Thus, a statistic is an answer obtained from a descriptive procedure. We compute different statistics to describe different aspects of the data, and the symbol for each is a different letter from the English alphabet. On the other hand, a number that describes an aspect of the scores in the population is called a parameter. Thus, a parameter is obtained when applying inferential procedures. The symbols for the different parameters are letters from the Greek alphabet. For example, the average in your statistics class is a sample average, a descriptive statistic that is symbolized by a letter from the English alphabet. If we then estimate the average in the population, we are estimating a parameter, and the symbol for a population average is a letter from the Greek alphabet.

statistic: A number that describes a sample of scores; symbolized by a letter from the English alphabet
parameter: A number that describes a population of scores; symbolized by a letter from the Greek alphabet

After performing the appropriate descriptive and inferential procedures, we stop being a "statistician" and return to being a behavioral scientist: We interpret the results in terms of the underlying behaviors, psychological principles, sociological influences, and so on, that they reflect. This completes the circle, because by describing the behavior of everyone in the population in a given situation, we are describing how a law of nature operates.
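As a preview of this English-letter/Greek-letter convention (the book introduces its actual symbols in later chapters, so treat the notation below as a common textbook example rather than this book's official definition), the sample mean is a statistic that is used to estimate the population mean:

    % English letter for the sample statistic; Greek letter for the population parameter.
    \[
    \bar{X} = \frac{\Sigma X}{N} \quad \text{(statistic: the sample average)}
    \qquad \text{estimates} \qquad
    \mu \quad \text{(parameter: the population average)}
    \]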
1-5 UNDERSTANDING EXPERIMENTS AND CORRELATIONAL STUDIES

In research we can examine a relationship using a variety of different kinds of studies. In other words, we use different designs. The design of a study is how it is laid out—how many samples are examined, how participants are selected and tested, and so on. A study's design is important because different designs require different descriptive and inferential procedures. Recall that your goal is to learn when to use each statistical procedure and, in part, that means learning the particular procedures that are appropriate for a particular design. (On the tear-out cards in your book is a decision tree for selecting procedures, which you should refer to as you learn statistics.) To begin, recognize that we have two major types of designs because we have two general ways of demonstrating a relationship: using experiments or using correlational studies.

design: The way in which a study is laid out
experiment: A study in which one variable is actively changed or manipulated and scores on another variable are measured to determine whether a relationship occurs
independent variable: In an experiment, a variable manipulated by the experimenter that is hypothesized to cause a change in the dependent variable
condition: An amount or category of the independent variable that creates the specific situation under which participants' scores on the dependent variable are measured
dependent variable: In an experiment, the behavior or attribute of participants that is measured; expected to be influenced by the independent variable

1-5a Experiments

In an experiment, the researcher actively changes or manipulates one variable and then measures participants' scores on another variable to see if a relationship is produced. For example, say that we study amount of study time and test errors in an experiment. We decide to compare 1, 2, 3, and 4 hours of study time, so we select four samples of students. We have one sample study for 1 hour, administer the statistics test, and count the number of errors each participant makes. We have another sample study for 2 hours, administer the test, and count their errors, and so on. Then we determine if we have produced the relationship where, as we increase study time, error scores tend to decrease. You must understand the components of an experiment and learn their names.

THE INDEPENDENT VARIABLE

An independent variable is the variable that is changed or manipulated by the experimenter. We manipulate this variable because we assume that doing so will cause the behavior and scores on the other variable to change. Thus, in our example above, amount of study time is our independent variable: We manipulate study time because doing this should cause participants' error scores to change in the predicted way. (To prove that this variable is actually the cause is a very difficult task, which we'll save for an advanced discussion. In the meantime, be cautious when using the word cause.) You can remember independent because this variable occurs independently of participants' wishes (we'll have some participants study for 4 hours whether they want to or not).

Technically, a true independent variable is manipulated by doing something to participants. However, there are many variables that an experimenter cannot manipulate in this way.
For example, we might hypothesize that growing older causes a change in some behavior, but we can't make some people be 20 years old and make others be 60 years old. Instead, we would manipulate the variable by selecting one sample of 20-year-olds and one sample of 60-year-olds. We will also call this type of variable an independent variable (although technically it is called a quasi-independent variable). Statistically, we treat all independent variables the same. Thus, the experimenter is always in control of the independent variable, either by determining what is done to each sample or by determining a characteristic of the individuals in each sample. Therefore, a participant's "score" on the independent variable is determined by the experimenter: Above, students in the sample that studied 1 hour have a score of 1 on the study-time variable; people in the 20-year-old sample have a score of 20 on the age variable.

CONDITIONS OF THE INDEPENDENT VARIABLE

An independent variable is the overall variable a researcher manipulates, which is potentially composed of many different amounts or categories. From these the researcher selects the conditions. A condition is the specific amount or category of the independent variable that creates the situation under which participants are studied. Thus, although our independent variable is amount of study time—which could be any amount—our conditions involve 1, 2, 3, or 4 hours of study. Likewise, if we compare 20-year-olds to 60-year-olds, then 20 and 60 are each a condition of the independent variable of age.
To see this relationship, a useful way to diagram an experiment is shown in Table 1.5. Each column in the diagram is a condition of the independent variable (here, amount of study time). The numbers in a column are the scores on the dependent variable from participants who were tested under that condition (here, each score is the number of test errors). Remember that a condition determines participants' scores on the independent variable: Participants in the 1-hour condition each have a score of "1" on the independent variable, those under 2 hours have a score of "2," and so on. Thus, the diagram communicates pairs of scores consisting of 1-13, 1-12, 1-11; then 2-9, 2-8, 2-7, etc.

Table 1.5
Diagram of an Experiment Involving the Independent Variable of Number of Hours Spent Studying and the Dependent Variable of Number of Errors Made on a Statistics Test

                              Independent Variable: Number of Hours Spent Studying
                              Condition 1:   Condition 2:   Condition 3:   Condition 4:
                              1 Hour         2 Hours        3 Hours        4 Hours
Dependent Variable:           13             9              7              5
Number of Errors Made on      12             8              6              3
a Statistics Test             11             7              5              2

Now look for the relationship as we did previously: First look at the error scores paired with 1 hour, then at the error scores paired with 2 hours, and so on. The pattern here forms a relationship where, as study-time scores increase, error scores tend to decrease. Essentially, participants in the 1-hour condition produce one batch of error scores, those in the 2-hour condition produce a different, lower batch of error scores, and so on.

We use this diagram because it facilitates applying our statistics. For example, it makes sense to compute the average error score in each condition (each column).
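As a concrete illustration of that computation, here is a minimal sketch (my own addition, not the text's; Python and all of the names in it are my choice):

```python
# Hypothetical sketch: the error scores from Table 1.5, grouped by the
# study-time condition under which each sample was tested.
conditions = {
    1: [13, 12, 11],  # 1 hour of study
    2: [9, 8, 7],     # 2 hours
    3: [7, 6, 5],     # 3 hours
    4: [5, 3, 2],     # 4 hours
}

for hours, errors in conditions.items():
    mean_errors = sum(errors) / len(errors)
    print(f"{hours} hour(s) of study: average errors = {mean_errors:.2f}")
```

The averages fall as study time rises (12.00, 8.00, 6.00, 3.33), which is exactly the relationship the diagram is meant to reveal.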
Notice, however, that we apply statistics to the dependent variable. We do not know what scores our participants will produce, so these are the scores that we need help in making sense of (especially in a more realistic study where we might have 100 different scores in each column). We do not compute anything about the independent variable because we know all about it (e.g., above we have no reason to compute the average of 1, 2, 3, and 4 hours). Rather, the conditions simply form the groups of dependent scores that we then examine. Thus, we will use specific descriptive procedures to summarize the sample's scores and the relationship found in an experiment. Then, to infer that we'd see a similar relationship if we tested the entire population, we have specific inferential procedures for experiments. Finally, we will translate the relationship back to the original hypothesis about studying and learning that we began with, so that we can add to our understanding of nature.

1-5c Correlational Studies

correlational study  A design in which participants' scores on two variables are measured, without manipulation of either variable, to determine whether they form a relationship

Not all research is an experiment. Sometimes we do not manipulate or change either variable and instead conduct a correlational study. In a correlational study, the researcher measures participants' scores on two variables and then determines whether a relationship is present. Thus, in an experiment the researcher attempts to make a relationship happen, while in a correlational study the researcher is a passive observer who looks to see if a relationship exists. For example, we used a correlational approach previously when we simply asked some students how long they studied for a test and what their test grade was. Or, we would have a correlational design if we asked people their career choices and measured their personality, asking, "Is career choice related to personality type?"

As usual, we want to first describe and understand the relationship we've observed in the sample, and correlational designs have their own descriptive statistical procedures for doing this. Here we do not know the scores that participants will produce for either variable, so the starting point for making sense of them is often to compute the average score on each variable. Also, to decide about the relationship we would find in the population, we have specific correlational inferential procedures. Finally, as with an experiment, we will translate the relationship back to the original hypothesis that we began with so that we can add to our understanding of nature.

In a correlational study, the researcher simply measures participants' scores on two variables to determine if a relationship exists.

> Quick Practice
> In an experiment, the researcher changes the conditions of the independent variable and then measures participants' behavior using the dependent variable.
> In a correlational design the researcher measures participants on two variables.

More Examples
In a study, participants' relaxation scores are measured after they've been in a darkened room for either 10, 20, or 30 minutes.
This is an experiment because the researcher controls the length of time in the room. The independent variable is length of time, the conditions are 10, 20, or 30 minutes, and the dependent variable is relaxation. A survey measures participants' patriotism and also asks how often they've voted. This is a correlational design because the researcher passively measures both variables.

For Practice
1. In an experiment, the _____ is changed by the researcher to see if it produces a change in participants' scores on the _____.
2. To see if drinking influences one's ability to drive, each participant's level of coordination is measured after drinking 1, 2, or 3 ounces of alcohol. The independent variable is _____, the conditions are _____, and the dependent variable is _____.
3. In an experiment, the _____ variable reflects participants' behaviors or attributes.
4. We measure the age and income of 50 people to see if older people tend to make more money. What type of design is this?

> Answers
1. independent variable; dependent variable
2. amount of alcohol; 1, 2, or 3 ounces; level of coordination
3. dependent
4. correlational

1-6 THE CHARACTERISTICS OF SCORES

We have one more issue to consider when selecting the descriptive or inferential procedure to use in a particular experiment or correlational study. Although we always measure one or more variables, the numbers that comprise the scores can have different underlying mathematical characteristics. The particular characteristics of our scores determine which procedures we should use, because the kinds of math we can perform depend on the kinds of numbers we have. Therefore, always pay attention to two important characteristics of your scores: the scale of measurement involved and whether the measurements are continuous or discrete.

1-6a The Four Types of Measurement Scales

Numbers mean different things in different contexts. The meaning of a 1 on a license plate is different from that of a 1 in a race, which is different still from the meaning of a 1 in a hockey score. The kind of information that scores convey depends on the scale of measurement that is used in measuring the variable. There are four types of measurement scales: nominal, ordinal, interval, and ratio.

nominal scale  A scale in which each score is used for identification and does not indicate an amount

With a nominal scale, we do not measure an amount, but rather we categorize or classify individuals. For example, to "measure" your gender, we classify you as either male or female. Rather than using these labels, however, it is easier for us (and for computers) to use numbers to identify the categories. For example, we might assign a "1" to each male and a "2" to each female. These scores involve a nominal scale because the numbers are used simply for identification (so for nominal, think name). Such scores are assigned arbitrarily; they don't reflect an amount, and we could use any other numbers. Thus, the key here is that nominal scores indicate only that one individual is qualitatively different from another. So, the numbers on football uniforms or on your credit card are nominal scales.
In research, we have nominal variables when studying different types of schizophrenia or different therapies. These variables can occur in any design, so for example, in a correlational study we might measure the political affiliation of participants using a nominal scale by assigning a 5 to democrats, a 10 to republicans, and so on. Then we might also measure participants' income, to determine whether, as party affiliation "scores" change, income scores also change. Or, if an experiment compares the job satisfaction scores of workers in several different occupations, the independent variable is the nominal variable of type of occupation.

ordinal scale  A scale in which scores indicate rank order

A different approach is to use an ordinal scale. Here the scores indicate rank order; anything that is akin to 1st, 2nd, 3rd, ... is ordinal. (Ordinal sounds like ordered.) In our studying example, we'd have an ordinal scale if we assigned a 1 to students who scored best on the test, a 2 to those in second place, and so on. Then we'd ask, "As study times change, do students' ranks also tend to change?" Or, if an experiment compares 1st graders to 2nd graders, then this independent variable involves an ordinal scale. The key here is that ordinal scores indicate only a relative amount, identifying who scored relatively high or low. Also, there is no score of 0, and the same amount does not separate every pair of adjacent scores: 1st may be only slightly ahead of 2nd, but 2nd may be miles away from 3rd. Other examples of ordinal variables include clothing size (e.g., small, medium, large), college year (e.g., freshman or sophomore), and letter grades (e.g., A or B).

interval scale  A scale in which scores measure actual amounts, but zero does not mean zero amount is present, so negative numbers are possible

A third approach is to use an interval scale. Here each score indicates an actual quantity, and an equal amount separates any adjacent scores. (For interval scores, remember equal intervals between them.) However, although interval scales do include the number 0, it is not a true zero; it does not mean that none of the variable is present. Therefore, the key is that you can have less than this amount, so an interval scale allows negative numbers. For example, temperature (in Celsius or Fahrenheit) involves an interval scale: Because 0 degrees does not mean that zero heat is present, you can have even less heat at -1 degree. In research, interval scales are common with intelligence or personality tests: A score of zero does not mean zero intelligence or zero personality. Or, in our studying research we might determine the average test score and then assign students a zero if they are average; a +1, +2, etc., for the amount they are above average; and a -1, -2, etc., for the amount they are below average.
Then we'd see if more positive scores tend to occur with higher study times. Or, if we create conditions based on whether participants are in a positive, negative, or neutral mood, then this independent variable reflects an interval scale.

ratio scale  A scale in which scores measure actual amounts and zero means no amount of the variable is present, so negative numbers are not possible

The final approach is to use a ratio scale. Here, like interval scores, each score measures an actual quantity, and an equal amount separates adjacent scores. However, 0 truly means that none of the variable is present. Therefore, the key is that you cannot have negative numbers, because you cannot have less than nothing. Also, only with a true zero can we make "ratio" statements, such as "4 is twice as much as 2." (So for ratio, think ratio!) We used ratio scales in our previous examples when measuring the number of errors and the number of hours studied. Likewise, if we compare the conditions of having people on diets consisting of either 1,000, 1,500, or 2,000 calories a day, then this independent variable involves a ratio scale. Other examples of ratio variables include the level of income in a household, the amount of time required to complete a task, or the number of items in a list to be recalled by participants. (See Review Card 1.2 for a summary of the four scales of measurement.) We can study relationships that involve any combination of the above scales.
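To make the distinctions concrete, here is a small sketch of my own (not from the text; the example scores are hypothetical) showing which arithmetic each scale supports:

```python
# Hypothetical example scores on each scale of measurement.
nominal  = [1, 2, 2, 1]   # 1 = one category, 2 = another: identification only
ordinal  = [1, 2, 3]      # ranks: order matters, but spacing between ranks may differ
interval = [-2, 0, 3]     # deviations from an average: equal spacing, but no true zero
ratio    = [2, 4, 0]      # error counts: equal spacing and a true zero

# Only a true zero makes a "ratio" statement meaningful:
print(ratio[1] / ratio[0])  # 2.0 -- 4 errors really are twice as many as 2 errors

# The same division applied to interval scores would be misleading:
# a score of 3 (above average) is not "minus 1.5 times" a score of -2.
```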
1-6b Continuous versus Discrete Variables

In addition to considering the scale used to measure a variable, you must also consider whether the variable is continuous or discrete.

continuous variable  A variable that can be measured in fractional amounts

discrete variable  A variable that cannot be measured in fractional amounts

A continuous variable can be measured in fractional amounts, so decimals make sense. That is, our measurements continue between the whole-number amounts, and there is no limit to how small a fraction may be. Thus, the variable of age is continuous because it is perfectly intelligent to say that someone is 19.6879 years old. On the other hand, some variables are discrete variables, which are measured in fixed amounts that cannot be broken into smaller amounts. Usually the amounts are labeled using whole numbers, so decimals do not make sense. For example, being male or female, or being in 1st grade versus 2nd grade, are discrete variables, because you can be in one group or the other, but you can't be in between. Some variables may be labeled using fractions, as with shoe sizes, but they are still discrete because they cannot be broken into smaller units.

Usually researchers assume that variables measured using nominal or ordinal scales are discrete, but that variables measured using interval or ratio scales are at least theoretically continuous. For example, intelligence tests are designed to produce whole-number scores, so you cannot have an IQ of 95.6. But theoretically an IQ of 95.6 makes sense, so intelligence is a theoretically continuous (interval) variable. Likewise, it sounds strange if the government reports that the average family has 2.4 children, because this is a discrete (ratio) variable and no one has .4 of a child. However, it makes sense to treat this as theoretically continuous, because we can interpret what it means if the average this year is 2.4, but last year it was 2.8. (I've heard that a recent survey showed the average American home contains 2.5 people and 2.7 televisions!)

Whether a variable is continuous or discrete and whether it is measured using a nominal, ordinal, interval, or ratio scale are factors that determine which statistical procedure to apply.

> Quick Practice
> Nominal scales identify categories and ordinal scales reflect rank order. Both interval and ratio scales measure actual quantities, but negative numbers can occur with interval scales and not with ratio scales.
> Interval and ratio scales are assumed to be continuous, which allows fractional amounts; nominal and ordinal scales are assumed to be discrete, which does not allow fractional amounts.

More Examples
If your grade on an essay exam is based on the number of correct statements you include, then a ratio scale is involved; if it is based on how much your essay is better or worse than what the professor expected, an interval scale is involved; if it indicates that yours was relatively one of the best or worst essays in the class, this is an ordinal scale (as is pass/fail); if it is based on the last digit of your ID number, then a nominal scale is involved. If you can receive one grade or another, but nothing in between, it involves a discrete scale; if fractions are possible, it involves a continuous scale.

For Practice
1. Whether you are ahead or behind when gambling involves a(n) _____ scale.
2. The number of hours you slept last night involves a(n) _____ scale.
3. Your blood type involves a(n) _____ scale.
4. Whether you are a lieutenant or major in the army involves a(n) _____ scale.
5. If scores can contain fractions, the variable is _____; if fractions are not possible, the variable is _____.

> Answers
1. interval
2. ratio
3. nominal
4. ordinal
5. continuous; discrete

Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com

STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. What is the goal of behavioral research?
2. Why is it important for students of behavioral research to understand statistics?
3. (a) What is a population? (b) What is a sample? (c) How are samples used to make conclusions about populations? (d) What are researchers really referring to when they talk about the population?
4. (a) What is a variable? (b) What is a quantitative variable? (c) What is a qualitative variable?
5. What pattern among the X and Y scores do you see when: (a) A relationship exists between them? (b) No relationship is present?
6. What is the difference in the pattern among the X and Y scores between (a) a perfectly consistent relationship and (b) a less consistent (weaker) relationship?
7. (a) What is a representative sample? (b) What is an unrepresentative sample? (c) What produces an unrepresentative sample?
8. What is the general purpose of experiments and correlational studies?
9. What is the difference between an experiment and a correlational study?
10. In an experiment, what is the dependent variable?
11. What is the difference between the independent variable and the conditions of the independent variable?
12. (a) What are descriptive statistics used for? (b) What are inferential statistics used for?
13. (a) What is the difference between a statistic and a parameter? (b) What types of symbols are used for statistics and for parameters?
14. Define the four scales of measurement.
15. (a) Distinguish between continuous and discrete variables. (b) Which scales are usually assumed to be discrete, and which are assumed to be continuous?
16. What are the two aspects of a study to consider when selecting the descriptive or inferential statistics you should employ?
17. Researcher A gives participants various amounts of alcohol and then observes any decrease in their ability to walk. Researcher B notes the various amounts of alcohol that participants drink at a party and then observes any decrease in their ability to walk. Which study is an experiment, and which is a correlational study? Why?
18. Maria asked a sample of college students about their favorite beverage. Based on what the majority said, she concluded that most college students prefer drinking carrot juice to other beverages! What statistical argument can you give for not accepting this conclusion?
19. In the following experiments, identify the independent variable, the conditions, and the dependent variable: (a) studying whether final exam scores are influenced by whether concurrent background music is soft, loud, or absent; (b) comparing students from small, medium, and large colleges with respect to how much fun they have during the semester; (c) studying whether being first-, second-, or third-born is related to intelligence; (d) studying whether length of daily exposure to a sunlamp (15 versus 60 minutes) accounts for differences in depression; (e) studying whether being in a room with blue walls, green walls, or red walls influences aggressive behavior in adolescents.
20. Use the words relationship, sample, population, statistic, and parameter to describe the flow of a research study to determine whether a relationship exists in nature.
21. Which of the following data sets show a relationship?

   Sample A     Sample B     Sample C     Sample D
   X    Y       X    Y       X    Y       X    Y
   1    10      20   40      13   20      92   76
   1    10      20   42      13   19      92   75
   1    10      22   40      13   18      92   77
   2    20      22   41      13   17      95   74
   2    20      23   40      13   15      95   74
   3    30      24   40      13   14      97   73
   3    30      24   42      13   13      97   74
22. Which sample in problem 21 shows the most consistent relationship? How do you know?
23. What pattern do we see when the results of an experiment show a relationship?
24. Indicate whether a researcher would conduct an experiment or a correlational study when studying: (a) whether different amounts of caffeine consumed in 1 hour influence speed of completing a complex task; (b) the relationship between number of extracurricular activities and GPA; (c) the relationship between the number of pairs of sneakers owned and the person's athleticism; (d) how attractive men rate a woman when she is wearing one of three different types of perfume; (e) the relationship between GPA and the ability to pay off school loans; (f) the influence of different amounts of beer consumed on a person's mood.
25. In the chart below, identify the characteristics of each variable.

   Variable                                      Continuous or Discrete   Type of Measurement Scale
   Personality type                              _____                    _____
   Academic major                                _____                    _____
   Number of minutes before and after an event   _____                    _____
   Restaurant ratings (best, next best, etc.)    _____                    _____
   Speed (miles per hr)                          _____                    _____
   Dollars in your pocket                        _____                    _____
   Change in weight (in lb)                      _____                    _____
   Savings account balance                       _____                    _____
   Reaction time                                 _____                    _____
   Letter grades                                 _____                    _____
   Clothing size                                 _____                    _____
   Registered voter                              _____                    _____
   Therapeutic approach                          _____                    _____
   Schizophrenia type                            _____                    _____
   Work absences                                 _____                    _____
   Words recalled                                _____                    _____

Chapter 2
CREATING AND USING FREQUENCY DISTRIBUTIONS

LOOKING BACK
Be sure you understand from Chapter 1:
• What nominal, ordinal, interval, and ratio scales of measurement are.
• What continuous and discrete measurements are.

GOING FORWARD
Your goals in this chapter are to learn:
• What frequency is and how a frequency distribution is created.
• When to graph frequency distributions using a bar graph, histogram, or polygon.
• What normal, skewed, and bimodal distributions are.
• What relative frequency and percentile are and how we use the area under the normal curve to compute them.

Sections
2-1 Some New Symbols and Terminology
2-2 Understanding Frequency Distributions
2-3 Types of Frequency Distributions
2-4 Relative Frequency and the Normal Curve
2-5 Understanding Percentile and Cumulative Frequency

So we're off into the world of descriptive statistics. Recall that a goal is to make sense of the scores by organizing and summarizing them. One important way to do this is to create tables and graphs, because they show the scores you've obtained and they make it easier to see the relationship between two variables that is hidden in the data. Before we examine the relationship between two variables, however, we first summarize the scores on each variable alone. Therefore, this chapter will discuss the common ways to describe scores from one variable by using a frequency distribution. You'll see (1) how to show a frequency distribution in a table or graph, (2) the common patterns found in frequency distributions, and (3) how to use a frequency distribution to compute additional information about scores.
2-1 SOME NEW SYMBOLS AND TERMINOLOGY

raw scores  The scores initially measured in a study

frequency (f)  The number of times each score occurs in a set of data; also called simple frequency

frequency distribution  A distribution showing the number of times each score occurs in the data

The scores we initially measure in a study are called the raw scores. Descriptive statistics help us boil down raw scores into an interpretable, "digestible" form. There are several ways to do this, but the starting point is to count the number of times each score occurred. The number of times a score occurs in a set of data is the score's frequency. If we examine the frequencies of every score in the data, we create a frequency distribution. The term distribution is the general name researchers have for any organized set of data. In a frequency distribution, the scores are organized based on each score's frequency. (Actually, researchers have several ways to describe frequency, so technically, when we simply count the frequency of each score, we are creating a simple frequency distribution.)

The symbol for a score's frequency is the lowercase f. To find f for a score, count how many times that score occurs. If three participants scored 66, then 66 occurred three times, so the frequency of 66 is 3 and so f = 3.

Creating a frequency distribution involves counting the frequency of every score in the data.

In most statistical procedures, we also count the total number of scores we have. The symbol for the total number of scores in a set of data is the uppercase N. Thus, N = 43 means that we have 43 scores. Note that N is not the number of different scores, so even if all 43 scores in a sample are the same score, N still equals 43.

The frequency of a score is symbolized by f. The total number of scores in the data is symbolized by N.

2-2 UNDERSTANDING FREQUENCY DISTRIBUTIONS

The first step when trying to understand any set of scores is to ask the most obvious question, "What are the scores that were obtained?" In fact, buried in any data are two important things to know: Which scores occurred, and how often did each occur? These questions are answered simultaneously by looking at the frequency of each score. Thus, frequency distributions are important because they provide a simple and clear way to show the scores in a set of data. Because of this, they are always the first step when beginning to understand the scores from a study. Further, they are also a building block for upcoming statistical procedures.
One way to see a frequency distribution is in a table.

2-2a Presenting Frequency in a Table

Let's begin with the following raw scores. (They might measure one of the variables from a correlational study, or they might be dependent scores from an experiment.)

14  13  14  14  13  13
15  14  11  15  15  17
13  14  10  14  12  15

In this disorganized arrangement, it is difficult to make sense of these scores. Watch what happens, though, when we arrange them into the frequency table in Table 2.1.

Table 2.1
Simple Frequency Distribution Table
The left-hand column identifies each score, and the right-hand column contains the frequency with which the score occurred.

Score    f
  17     1
  16     0
  15     4
  14     6
  13     4
  12     1
  11     1
  10     1
      Total: 18 = N

Researchers have several rules of thumb for making a frequency table. Start with a score column and an f column. The score column has the highest score in the data at the top of the column. Below that are all possible whole-number scores in decreasing order, down to the lowest score that occurred. Here, our highest score is 17, the lowest score is 10, and although no one obtained a score of 16, we still include it. In the f column opposite each score is the score's frequency: In the sample there is one 17, zero 16s, four 15s, and so on.

Not only can we see the frequency of each score, we can also determine the combined frequency of several scores by adding together their individual fs. For example, the score of 13 has an f of 4 and the score of 14 has an f of 6, so their combined frequency is 10.

Notice that, although 8 scores are in the score column, N is not 8. We had 18 scores in the original sample, so N is 18. You can see this by adding together all of the individual frequencies in the f column: The 1 person scoring 17 plus the 4 people scoring 15, and so on, adds up to the 18 people in the sample. In a frequency distribution, the sum of the frequencies always equals N.

> Quick Practice
> A frequency distribution shows the number of times participants obtained each score.

More Examples
The scores 15, 16, 13, 16, 15, 17, 16, 15, 17, and 15 contain one 13, no 14s, four 15s, and so on, producing this frequency table:

Scores   f
  17     2
  16     3
  15     4
  14     0
  13     1

For Practice
1. What is the difference between f and N?
2. Create a frequency table for these scores: 7, 9, 6, 6, 9, 7, 7, 6, and 6.
3. What is the N here?
4. What is the frequency of 6 and 7 together?

> Answers
1. f is the number of times a score occurs; N is the total number of scores in the data.
2. Scores 9, 8, 7, and 6, with f of 2, 0, 3, and 4, respectively.
3. N = 9
4. f = 3 + 4 = 7
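If you were tallying frequencies by computer, the same table takes only a few lines; this is a minimal sketch of my own (Python's Counter is my choice, not something the text prescribes):

```python
from collections import Counter

# The raw scores from Table 2.1.
scores = [14, 13, 14, 14, 13, 13, 15, 14, 11, 15, 15, 17, 13, 14, 10, 14, 12, 15]

freq = Counter(scores)  # maps each score to its f
N = len(scores)         # the total number of scores

# Print the table from the highest score down to the lowest, including
# whole-number scores that never occurred (here, 16 gets f = 0).
for score in range(max(scores), min(scores) - 1, -1):
    print(f"{score:>5}  {freq[score]}")
print(f"Total: {sum(freq.values())} = N = {N}")
```

The sum of the printed frequencies equals N, just as the text notes.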
f ⫽ 3 ⫹ 4 ⫽ 7 3. N ⫽ 9 4 6 3 7 0 8 2 9 f Scores © iStockphoto.com/Bridgette Braley 2. 1. f is the number of times a score occurs; N is the total number of scores in the data. 2-2b Graphing a Frequency Distribution When researchers talk of a frequency distribution, they often imply a graph that shows the frequencies of each score. (A review of basic graphing is in Appendix A.1.) To graph a frequency distribution, place the scores on the X axis. Place frequency on the Y axis. Then we have several ways to draw the graph of a frequency distribution, depending on the scale of measurement that the raw scores reflect. We may create a bar graph, a histogram, or a polygon. Figure 2.1 Frequency Bar Graphs for Nominal and Ordinal Data ta The height of each ach bar indicates the frequency of the corresponding score on the X axis. Nominal Variable of Political Affiliation Party f Libertarian Socialist Democrat Republican 1 3 8 6 8 7 6 5 f 4 3 2 1 0 CREATING BAR GRAPHS We graph a frequency distribution of nominal or ordinal scores by creating a bar graph. A bar graph has a vertical bar centered over each X score and the height of the bar corresponds to the score’s frequency. Notably, adjacent bars do not touch. Figure 2.1 shows the frequency tables and bar graphs of two samples. The upper table and graph are from a survey in which we counted the number of participants in each category of the nominal variable of political party affiliation. The X axis is labeled using the “scores” of political party, and because Ordinal Variable of Military Rank Rank f General Colonel Lieutenant Sergeant 3 8 4 5 Rep. Dem. Soc. Political affiliation Lib. Sgt. Lt. Col. Military rank Gen. 8 7 6 5 f 4 3 2 1 0 Chapter 2: Creating and Using Frequency Distributions Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 23 Figure 2.2 Histogram Showing the Frequency of Parking Tickets in a Sample A graph of a frequency distribution always shows the scores on the X axis and their frequency on the Y axis. histogram A frequency graph similar to a bar graph but with adjacent bars touching; used with a small range of interval or ratio scores frequency polygon A frequency graph showing a data point above each score, with the adjacent points connected by straight lines; used with many different interval or ratio scores data point A dot plotted on a graph to represent a pair of X and Y scores The reason we create bar graphs with nominal and ordinal scales is that both are discrete scales: You can be in one group or the next, but not in between. The space between the bars communicates this. On the other hand, recall that interval and ratio scales are usually assumed to be at least theoretically continuous: They allow fractional amounts that continue between the whole numbers. To communicate this, these scales are graphed using continuous (connected) figures. We may create two types of graphs here, either a histogram or a polygon. CREATING HISTOGRAMS We create a histogram when we have a small number of different interval or ratio scores. A histogram is similar to a bar graph except that in a histogram, the adjacent bars touch. 
For example, say that we measured the ratio variable of number of parking tickets that participants received, obtaining the data in Figure 2.2. Again, the height of each bar indicates the corresponding score’s frequency. Because the adjacent bars touch, there is no gap between the scores on the X axis. This communicates In a histogram the adjacent bars touch; in a bar graph they do not. 24 Score f 7 6 5 4 3 2 1 1 4 5 4 6 7 9 9 8 7 6 f 5 4 3 2 1 0 1 2 3 4 5 6 Number of parking tickets 7 that the X variable is continuous, with no gaps in our measurements. CREATING FREQUENCY POLYGONS Usually, we don’t create a histogram when we have many different interval or ratio scores, such as if our participants had from 1 to 50 parking tickets. The 50 bars would need to be very skinny, so the graph would be difficult to read. We have no rule for what number of scores is too large, but when a histogram is unworkable, we create a frequency polygon. Construct a frequency polygon by placing a “dot” over each score on the X axis at the height that corresponds to the appropriate frequency on the Y axis. Then connect adjacent dots with straight lines. To illustrate this, Figure 2.3 shows the previous parking ticket data plotted as a frequency polygon. For an X of 1, the frequency is 9; for an X of 2, f ⫽ 7; and so on. Because each line continues between two adjacent dots, we again communicate that our measurements continue between the two scores on the X axis, meaning that this is a continuous variable. Notice that the polygon also includes on the X axis the next score above the highest score in the data and the next score below the lowest score (in Figure 2.3, scores of 0 and 8 are included). These added scores have a frequency of 0, so the curve touches the X axis. In this way we create a complete geometric figure—a polygon—with the X axis as its base. Also, here is an important new term: A “dot” plotted on any graph is called a data point. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. > Quick Practice Figure 2.3 Frequency Polygon Showing the Frequency of Parking Tickets in a Sample Score f 7 6 5 4 3 2 1 1 4 5 4 6 7 9 > 9 8 7 6 f Create a bar graph with nominal or ordinal scores, a histogram with a few interval/ratio scores, and a polygon with many different interval/ratio scores. grouped distribution A distribution created by combining individual scores into small groups and then reporting the total frequency (or other description) of each group 5 4 3 2 1 0 1 2 3 4 5 6 Number of parking tickets 7 8 More Examples Using the data from a survey, (1) create a bar graph of the frequency of male or female participants (a nominal variable); (2) create a bar graph of the number of people who are first-born, second-born, etc. (an ordinal variable); (3) create a histogram of the frequency of participants falling in each of five salary ranges (a few ratio scores); (4) create a polygon of the frequency for each individual salary reported (many ratio scores). For Practice GROUPED DISTRIBUTIONS So far we have created fre- quency distributions that show each individual score. 
However, sometimes we have too many scores to produce a manageable table or graph. Then we create a grouped distribution. In a grouped distribution, individual scores are first combined into small groups, and then we report the total frequency (or other information) for each group. For example, in some data we might group the scores 0, 1, 2, 3, and 4 into the “0–4” group. Then we would add the f for the score of 0 to the f for the score of 1, and so on, to obtain the frequency of all scores between 0 and 4. Likewise, we would combine the scores between 5 and 9 into another group, and so on. Then we report the total f for each group. This technique can be used to reduce the size of a table and to make bar graphs, histograms, and polygons more manageable. When graphing a grouped frequency distribution, on the X axis we use the middle score in each group to represent the group. Thus, the X of 2 would represent the 0–4 group, and the X of 7 would represent the 5–9 group. On the Y axis we plot the total frequency of all scores in each group. (Consult an advanced statistics book for more about creating grouped distributions.) 1. A _____has a separate bar above each score, a _____ contains bars that touch, and a ______ has dots connected with straight lines. 2. A “dot” plotted on a graph is called a ______. 3. To show the frequency of people who are above an average weight by either 0, 5, 10, or 15 pounds, plot a _____. 4. To show the number in a sample preferring chocolate or vanilla ice cream, plot a _____. 5. To show the number of people who are above average weight by each amount between 0 and 100 pounds, plot a _____. > Answers 1. bar graph; histogram; polygon 2. data point 3. histogram 4. bar graph 5. polygon Thus, in Figure 2.3 we placed a data point over the X of 4 at an f of 4. 2-3 TYPES OF FREQUENCY DISTRIBUTIONS Although bar graphs and histograms are more common in published research, polygons are an important component of statistical procedures. This is because, in many different situations, nature produces frequency Chapter 2: Creating and Using Frequency Distributions Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 25 © iStockphoto.com/Marek Uliasz distributions that have similar characteristics and that form polygons having the same general shape. We have several common shapes, and your first task is to learn their names. When we apply them, we think of the ideal version of the polygon that would be produced in the population. By far the most important frequency distribution is the normal distribution. (This is the big one, folks.) 2-3a The Normal Distribution Figure 2.4 shows the polygon of the ideal normal distribution for some test scores from a population. Although specific mathematical properties define this polygon, in general it is a bell-shaped curve. But don’t call it a bell curve (that’s so pedestrian!). Call it a normal curve or a normal distribution, or say that the scores are normally distributed. Because this polygon represents an infinite population, it is slightly different from that for a sample. 
2-3 TYPES OF FREQUENCY DISTRIBUTIONS

Although bar graphs and histograms are more common in published research, polygons are an important component of statistical procedures. This is because, in many different situations, nature produces frequency distributions that have similar characteristics and that form polygons having the same general shape. We have several common shapes, and your first task is to learn their names. When we apply them, we think of the ideal version of the polygon that would be produced in the population. By far the most important frequency distribution is the normal distribution. (This is the big one, folks.)

2-3a The Normal Distribution

Figure 2.4 shows the polygon of the ideal normal distribution for some test scores from a population. Although specific mathematical properties define this polygon, in general it is a bell-shaped curve. But don't call it a bell curve (that's so pedestrian!). Call it a normal curve or a normal distribution, or say that the scores are normally distributed. Because this polygon represents an infinite population, it is slightly different from that for a sample. First, we cannot count the f of each score, so no numbers occur on the Y axis. Simply remember that frequencies increase as we proceed higher up the Y axis. Second, the polygon is a smooth curved line. The population contains so many different whole and decimal scores that the individual data points form the curved line. Nonetheless, to see the frequency of a score, locate the score on the X axis and then move upward until you reach the line forming the polygon. Then, moving horizontally, determine whether the frequency of the score is relatively high or low.

normal curve  The symmetrical, bell-shaped curve produced by graphing a normal distribution

normal distribution  A set of scores in which the middle score has the highest frequency and, proceeding toward higher or lower scores, the frequencies at first decrease slightly but then decrease drastically, with the highest and lowest scores having very low frequency

tail of the distribution  The far-left or far-right portion of a frequency polygon containing the relatively low-frequency, extreme scores

Figure 2.4
The Ideal Normal Curve
Scores farther above and below the middle scores occur with progressively lower frequencies. (Test scores from 0 to 55 and beyond appear on the X axis; the curve peaks at 30, with a tail at each extreme.)

As you can see, in a normal distribution, the score with the highest frequency is the middle score (in Figure 2.4 it is the score of 30). The normal curve is symmetrical, meaning that the left half below the middle score is a mirror image of the right half above the middle score. As you proceed away from the middle, the frequencies decrease, with the highest and lowest scores having relatively very low frequency. However, no matter how low or high a score might be, the curve never actually touches the X axis. This is because, in an infinite population, theoretically any score might occur sometime, so the frequencies approach, but never reach, zero. Note: In the language of statistics, the portions of a normal curve containing the relatively low-frequency, extreme high or extreme low scores are each called a tail of the distribution. In Figure 2.4 the tails are roughly below the score of 15 and above the score of 45.

The reason the normal curve is important is that it is a very common distribution in behavioral research. For most of the variables that we study, most of the individuals have scores at or close to the middle score, with progressively fewer individuals scoring at the more extreme, higher or lower scores. Because of this, the normal curve is very common in our upcoming statistical procedures. Therefore, before you proceed, be sure that you can read the normal curve. Can you see in Figure 2.4 that the most frequent scores are between 25 and 35? Do you see that a score of 15 has a relatively low frequency and a score of 45 has the same low frequency? Do you see that there are relatively few scores in the tail above 50 or in the tail below 10? Do you see that the farther into a tail a score lies, the less frequently the score occurs?
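One way to convince yourself of this shape is to simulate it. The sketch below is my own (numpy is my choice; the mean of 30 matches Figure 2.4, but the spread is an arbitrary, hypothetical value):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
scores = rng.normal(loc=30, scale=7, size=100_000)  # many normally distributed scores

# Middle scores are common; scores far into a tail are rare.
print(f"proportion between 25 and 35: {np.mean((scores >= 25) & (scores <= 35)):.3f}")
print(f"proportion at or above 45:    {np.mean(scores >= 45):.3f}")
```

Roughly half of the simulated scores land between 25 and 35, while only a percent or two land at 45 or above, mirroring what the curve shows.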
2-3b Skewed Distributions

Not all variables form normal distributions. One of the most common non-normal distributions is a skewed distribution. A skewed distribution is similar to a normal distribution except that it produces a polygon with only one pronounced tail. As shown in Figure 2.5, a distribution may be either negatively skewed or positively skewed.

negatively skewed distribution  An asymmetrical distribution with low-frequency, extreme low scores, but without corresponding low-frequency, extreme high scores; its polygon has only one pronounced tail, over the lower scores

positively skewed distribution  An asymmetrical distribution with low-frequency, extreme high scores, but without corresponding low-frequency, extreme low scores; its polygon has only one pronounced tail, over the higher scores

Figure 2.5
Idealized Skewed Distributions
The direction in which the distinctive tail slopes indicates whether the skew is positive or negative. (The left polygon, with its tail over the low scores, shows negative skew; the right polygon, with its tail over the high scores, shows positive skew.)

A negatively skewed distribution contains low-frequency, extreme low scores but does not contain low-frequency, extreme high scores. The polygon on the left in Figure 2.5 shows an idealized negatively skewed distribution. This pattern might be found, for example, by measuring the running speed of professional football players. Most would tend to run at the higher speeds, but a relatively few linemen lumber in at the slower speeds. (To remember negatively skewed, remember that the pronounced tail is over the lower scores, sloping toward zero, where the negative scores would be.)

On the other hand, a positively skewed distribution contains low-frequency, extreme high scores but does not contain low-frequency, extreme low scores. The polygon on the right in Figure 2.5 shows a positively skewed distribution. This pattern is often found, for example, when measuring participants' "reaction times" to a sudden stimulus. Usually, scores tend to be rather low (fast), but every once in a while, a person "falls asleep at the switch," requiring a large amount of time that produces a high score. (To remember positively skewed, remember that the tail slopes away from zero, toward the higher, positive scores.)

Whether a skewed distribution is negative or positive corresponds to whether the distinct tail slopes toward or away from zero.
At the center of each hump is one score that occurs more frequently than the surrounding scores, and technically the two center scores have the same frequency. Such a distribution would occur with test scores, for example, if most students scored around 60 or 80, with few students falling far below 60 or scoring in the 70s or 90s. 2-3d Labeling Frequency Distributions Figure 2.6 Idealized Bimodal Distribution Bimodal f Low ... Middle scores ... approximately normal curve in every study, we simplify the task by using the ideal normal curve we saw previously as our one “model” of any distribution that generally has this shape. This gives us one reasonably accurate way of envisioning the many approximately normal distributions that researchers encounter. The same is true for the other shapes we’ve seen. We also apply the names of the previous distributions to samples as a way of summarizing and communicating their general shapes. Figure 2.7 shows several examples, as well as the corresponding labels we might use. (Notice that we even apply these names to histograms and bar graphs.) We assume that in the population, the additional scores and their frequencies would “fill in” the sample curve, smoothing it out to be closer to the ideal curve. You need to know the names of the previous distributions because descriptive statistics describe the important characteristics of data, and one very important characteristic is the shape of the frequency distribution. First, the shape allows us to understand the data. If, for example, I tell you my data form a normal distribution, you can mentally envision the distribution and instantly understand how my participants generally performed. Also, the shape is important in determining which statistical procedures to employ. Many of our statistics are applied only when we have a normal distribution, while others are for non-normal Figure 2.7 distributions. Therefore, Simple Frequency Distributions of Sample Data with Appropriate Labels the first step when examNormal Positively skewed ining any data is to identify the shape of the frequency distribution that f f is present. Data in the real world, however, never form the perfect curves Low ... Middle ... High Low ... Middle ... we’ve discussed. Instead, scores scores the scores will form a Bimodal Negatively skewed bumpy, rough approximation to the ideal distribution. For example, f f data never form a perfect normal curve, and at best only come close to that Low ... Middle ... Low ... Middle ... High shape. However, rather scores scores than drawing a different, 28 High High High Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 2-4 RELATIVE FREQUENCY > Quick Practice > AND THE NORMAL CURVE The most common frequency distributions are normal distributions, negatively or positively skewed distributions, and bimodal distributions. More Examples The variable of intelligence (IQ) usually forms a normal distribution: The most common scores are in the middle, with higher or lower IQs occurring progressively less often. 
> Quick Practice
> The most common frequency distributions are normal distributions, negatively or positively skewed distributions, and bimodal distributions.

More Examples
The variable of intelligence (IQ) usually forms a normal distribution: The most common scores are in the middle, with higher or lower IQs occurring progressively less often. If IQ was positively skewed, there would be only one distinct tail, located at the higher scores. If IQ was negatively skewed, there would be only a distinct tail at the lower scores. If IQ formed a bimodal distribution, there would be two distinct humps in the curve containing the highest-frequency scores.

For Practice
1. Arrange the scores below from most frequent to least frequent. (The box shows a distribution with four scores marked A, B, C, and D.)
2. What label should be given to each of the following? (The box shows four small frequency polygons, labeled (a) through (d).)

> Answers
1. C, B, A, D
2. a. positively skewed; b. bimodal; c. normal; d. negatively skewed

2-4 RELATIVE FREQUENCY AND THE NORMAL CURVE

We will return to frequency distributions, especially the normal curve, throughout this course. However, counting the frequency of scores is not the only thing we do. Another important procedure is to describe scores using relative frequency.

2-4a Understanding Relative Frequency

relative frequency  The proportion of time a score occurs in a distribution

Relative frequency is the proportion of the time that a score occurs in a distribution. Any proportion is a decimal number between 0 and 1 that indicates a fraction of the total. Thus, we use relative frequency to indicate what fraction of the sample is produced by the times that a particular score occurs. In other words, we determine the proportion of N that is made up by the f of a score. In symbols we have this formula:

THE FORMULA FOR RELATIVE FREQUENCY IS
Relative frequency = f/N

Simply divide the score's frequency (f) by the total number of scores (N). For example, if a score occurred 5 times in a sample of 10 scores, then

Relative frequency = f/N = 5/10 = .50

The score has a relative frequency of .50, meaning that the score occurred .50 of the time in this sample. Or, say that a score occurred 6 times out of an N of 9. Then its relative frequency is 6/9, which is .67. We usually "round off" relative frequency to two decimals. Finally, we might find that several scores have a combined frequency of 10 in a sample of 30 scores: 10/30 equals .33, so these scores together have a relative frequency of .33; they make up .33 of this sample.
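In code, the formula is a one-liner; this sketch (my own, not the text's) reproduces the three examples just given:

```python
def relative_frequency(f, N):
    """Proportion of the N scores accounted for by a score occurring f times."""
    return f / N

print(relative_frequency(5, 10))             # 0.5  -> the score occurred .50 of the time
print(round(relative_frequency(6, 9), 2))    # 0.67
print(round(relative_frequency(10, 30), 2))  # 0.33 -> the combined f of several scores
```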
To transform relative frequency to a percent, multiply the proportion by 100. If a score's relative frequency is .50, then we have (.50)(100), so this score occurred 50% of the time. To transform a percent back into relative frequency, divide the percent by 100. The score that is 50% of the sample has a relative frequency of 50/100 = .50. (For further review of proportions, percents, and rounding, consult Appendix A.1.)

When reading research, you may encounter frequency tables that show relative frequency. Here, the raw scores are arranged as we saw previously, but next to the simple frequency column, an additional column shows each score's relative frequency. Also, the rules for creating the different graphs we have seen are the same for relative frequency, except that the Y axis shows values of relative frequency ranging from 0 to 1.0.

You should know what relative frequency is, but we will not emphasize the above formula. Instead, you will see that a core element of descriptive and inferential procedures is to compute relative frequency using the normal curve.

2-4b Finding Relative Frequency Using the Normal Curve

To understand how we use the normal curve to compute relative frequency, first think about the curve in a novel way. Imagine you are in a helicopter flying over a large parking lot that contains a mass of people crowded together. The outline of the mass has that bell shape of a normal curve. Upon closer inspection, you see an X axis and a Y axis laid out on the ground, and at the marker for each X score are people standing in line who received that score. The lines of people are packed so tightly together that, from the air, all you see are the tops of many heads, in a solid "sea of humanity." If you paint a line that goes behind the last person in line at each score, you would have the normal curve shown in Figure 2.8.

Figure 2.8 Parking Lot View of the Ideal Normal Curve. The height of the curve above any score reflects the number of people standing in line at that score.

From this perspective, the height of the curve above any score reflects the number of people standing in line at that score. Thus, in Figure 2.8, the score of 30 has the highest frequency because the longest line of people is standing at this score in the parking lot. Likewise, say that we counted the people in line at each score between 30 and 35. If we added them together, we would have the combined frequencies for these scores.

Relative frequency indicates the proportion of time (out of N) that a score occurred.

The reason for using this "parking lot view" is so you won't think of the normal curve as just a line floating above the X axis. Instead, think of the space under the curve—the space between the polygon's line and the X axis—as forming a solid figure that has area. This area represents the individuals and their scores in our data. The entire parking lot contains everyone we have studied and 100% of the scores.
Therefore, any portion of the parking lot—any portion of the area under the curve—corresponds to that portion of our data, which is its relative frequency.

proportion of the area under the curve: The proportion of the total area under the normal curve at certain scores, which represents the relative frequency of those scores.

For example, in Figure 2.8, a vertical line is drawn through the middle score of 30, so half (.50) of the parking lot is to the left of that line. Because the complete parking lot contains all participants, a part that is .50 of the parking lot contains 50% of the participants. (We can ignore those relatively few people who are straddling the line.) Participants who are standing to the left of the line received scores below 30. So, in total, 50% of the participants received scores below 30. Now turn this around: If 50% of the participants obtained scores below 30, then the scores below 30 occurred 50% of the time. Thus, the scores below 30 have a combined relative frequency of .50.

The logic here is so simple that it almost sounds tricky. But it's not! If you "slice off" one-half of the parking lot, then you have one-half of the participants and one-half of the scores, so those scores occur .50 of the time. Or, if your slice is 25% of the parking lot, then you have 25% of the participants and 25% of the scores, so those scores occur .25 of the time. And so on.

This is how we describe what we are doing using statistical terms: The total space occupied by people in the parking lot is the total area under the normal curve. We draw a line vertically to create a "slice" of the polygon containing particular scores. The area of this portion of the curve is the space occupied by the people having those scores. We then compare this area to the total area to determine the proportion of the area under the curve that we have selected. This proportion corresponds to the combined relative frequency of the selected scores.

Of course, statisticians don't fly around in helicopters, eyeballing parking lots, so here's a different approach: Say that by using a ruler and protractor, we determine that in Figure 2.9 the entire polygon occupies an area of 6 square inches on this page. This total area corresponds to all scores in the sample. Say that the area under the curve between the scores of 30 and 35 covers 2 square inches. This area is due to the number of times these scores occur. Therefore, the scores between 30 and 35 occupy 2 out of the 6 square inches created by all scores, so these scores constitute 2/6, or .33, of the distribution.

Figure 2.9 Finding the Proportion of the Total Area under the Curve. The complete curve occupies 6 square inches, with scores between 30 and 35 occupying 2 square inches.
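Chapter 5 introduces a system for finding such areas without a ruler. As a preview, statistical software can compute the area in a slice of the mathematical normal curve directly. Here is a hedged Python sketch using scipy's cumulative distribution function; the mean of 30 and standard deviation of 10 are hypothetical values, chosen only so the curve is centered at 30 like the one in Figure 2.9:

```python
from scipy.stats import norm

mu, sd = 30, 10   # a hypothetical ideal normal curve centered at 30 (SD assumed)

# cdf(x) gives the area to the left of x, so subtracting isolates the slice
slice_area = norm.cdf(35, loc=mu, scale=sd) - norm.cdf(30, loc=mu, scale=sd)
print(round(slice_area, 2))   # about .19 of the area: a relative frequency of about .19
```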
Thus, the scores between 30 and 35 occur .33 of the time, so they have a relative frequency of .33.

Usually we will have normally distributed scores. To determine the relative frequency of particular scores, we will identify the "slice" of the normal curve containing those scores. Then the proportion of the total area under the curve occupied by that slice will equal the relative frequency of those scores. (Although it is possible to create a very narrow slice containing only one score, we will usually seek the combined relative frequency of a number of adjacent scores that form a relatively wide slice.) Using the area under the curve is especially useful because in Chapter 5, you'll see that statisticians have created a system for easily finding the area under any part of the curve. (No, you won't need a ruler and a protractor.) Until then:

The total area under the normal curve corresponds to all scores, so the proportion of this area occupied by some scores is the proportion of the time those scores occur, which is their relative frequency.

> Quick Practice

Relative frequency is the proportion of the time that a score occurs. The area under the normal curve corresponds to 100% of a sample, so a proportion of the curve will contain that proportion of the scores, which is their combined relative frequency.

More Examples: In the following normal curve, the shaded portion is .15 of the total area (so 15% of the people in the parking lot are standing at these scores). Thus, scores between 55 and 60 occur .15 of the time, so their combined relative frequency is .15. Above the score of 70 is .50 of the curve, so scores above 70 have a combined relative frequency of .50. (A normal curve over scores 45 to 95 is shown, with a shaded slice of .15 between 55 and 60 and the upper .50 marked above 70.)

For Practice

1. If a score occurs 23% of the time, its relative frequency is _____.
2. If a score's relative frequency is .34, it occurs _____ percent of the time.
3. If scores occupy .20 of the area under the curve, they have a relative frequency of _____.
4. Say that the scores between 15 and 20 have a relative frequency of .40. They make up _____ of the area under the normal curve.

> Answers

1. 23/100 = .23
2. (.34)(100) = 34
3. .20
4. .40

2-5 UNDERSTANDING PERCENTILE AND CUMULATIVE FREQUENCY

We have one other approach for describing scores, and it is used when we want to know the standing of a particular score relative to the other scores above or below it. The most common procedure is to compute the score's percentile. A percentile is usually defined as the percent of all scores in the data that are below a particular score. (Essentially, your percentile tells you the percent of all scores that you are beating.) For example, say that the score of 40 is at the 50th percentile. Then we say that 50% of the scores are below 40 (and 50% of the scores are above 40). Or, if you scored at the 75th percentile, then 75% of the group scored lower than you (and 25% scored above you).

percentile: The percent of all scores in the sample that are below a particular score.

The formula for computing an exact percentile is very involved, so the easiest way to compute it is by using SPSS. However, you should know the name of another way to organize scores that is part of the computations, called cumulative frequency.
Cumulative frequency is the number of scores in the data that are at or below a particular score. For example, say that in some data, 3 people had the score of 20 and 5 people scored below 20. The cumulative frequency for the score of 20 is 3 + 5 = 8, indicating that 8 people scored at or below 20. If 2 people scored at 21, and 8 scored below 21 (at 20 or below), then the cumulative frequency for 21 is 2 + 8 = 10. And so on. Computing a percentile then involves transforming a score's cumulative frequency into something like a percent of the total.

cumulative frequency: The number of scores in the data that are at or below a particular score.

Researchers prefer percentile over cumulative frequency because percentile is usually easier to interpret. For example, if we know only that 10 people scored at 21 or below, it is difficult to evaluate this score. However, knowing that 21 is, for example, at the 90th percentile gives a clearer understanding of this score and of the entire sample.

You may have noticed that with cumulative frequency, we talked of the number of people scoring at or below a score, but with percentile we talked about only those scoring below a score. Technically, a percentile is the percent of the scores at or below a score. However, usually we are dealing with a large sample or population when computing percentile, so the relatively few participants at the score are a negligible portion of the total and we can ignore them. (Recall that we ignored those relatively few people who were straddling the line back in Figure 2.8.) Therefore, researchers usually interpret percentile as the percent of all scores below a particular score.

Note: You may encounter names that researchers have for specific percentiles. The 10th percentile is called the first decile, the 20th percentile is the second decile, and so on. Likewise, the 25th percentile is the first quartile, and so on.

Because we can ignore the people straddling the line in our parking lot view, a quick way to find an approximate percentile is to use the area under the normal curve. Percentile describes the scores that are lower than a particular score, and on the normal curve, lower scores are to the left. Therefore, the percentile for a score corresponds to the percent of the area under the curve that is to the left of the score. For example, Figure 2.10 shows that .50 of the curve is to the left of the score of 30. Because scores to the left of 30 are below it, 50% of the distribution is below 30 (in the parking lot, 50% of the people are standing to the left of the line, and all of their scores are less than 30). Thus, the score of 30 is at the 50th percentile. Likewise, say that we find .15 of the distribution is to the left of the score of 20; then 20 is at the 15th percentile.

Figure 2.10 Normal Distribution Showing the Area under the Curve to the Left of Selected Scores. (.15 of the area lies to the left of 20, .50 to the left of 30, and .85 to the left of 45.)

We can also work the other way to find the score at a given percentile. Say that we seek the score at the 85th percentile. We would measure over to the right until 85% of the area under the curve is to the left of a certain point. If, as in Figure 2.10, the score of 45 is at that point, then 45 is at the 85th percentile.
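To make the bookkeeping concrete, here is a small Python sketch that tallies cumulative frequencies and converts each to a rough percentile by dividing by N. The scores are hypothetical, and real research would use SPSS or a comparable package:

```python
from collections import Counter

scores = [18, 19, 19, 20, 20, 20, 21, 21, 22, 23]   # hypothetical data, N = 10
N = len(scores)
f = Counter(scores)

cum_f = 0
for score in sorted(f):
    cum_f += f[score]                # number of scores at or below this score
    percentile = 100 * cum_f / N     # rough percent of scores at or below
    print(score, f[score], cum_f, round(percentile))
```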
Percentile indicates the percent of all scores below a particular score. On the normal curve, a score's percentile is the percent of the area under the curve to the left of the score.

USING SPSS

Review Card 2.4 provides instructions for using SPSS to compute simple frequency, percent, and percentile in a set of data. You can also create bar graphs and histograms, as well as more elaborate graphs.

Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com.

STUDY PROBLEMS

(Answers for odd-numbered problems are in Appendix C.)

1. What do these symbols mean? (a) N; (b) f?
2. Why must the sum of all fs in a sample equal N?
3. (a) What is the difference between a bar graph and a histogram? (b) With what kind of data is each used?
4. What is a dot plotted on a graph called?
5. (a) What is the difference between a histogram and a polygon? (b) With what kind of data is each used?
6. (a) What does it mean when a score is in a tail of a normal distribution? (b) What is the difference between scores in the left-hand tail and scores in the right-hand tail?
7. (a) What is the difference between a score's simple frequency and its relative frequency? (b) What is the difference between a score's cumulative frequency and its percentile?
8. (a) What is the advantage of computing relative frequency instead of simple frequency? (b) What is the advantage of computing percentile instead of cumulative frequency?
9. (a) What is the difference between the polygon for a skewed distribution and the polygon for a normal distribution? (b) What is the difference between the polygon for a bimodal distribution and the polygon for a normal distribution?
10. What is the difference between the graphs for a negatively skewed distribution and a positively skewed distribution?
11. What is the difference between how we use the proportion of the total area under the normal curve to determine relative frequency and how we use it to determine percentile?
12. In reading psychological research, you encounter the following statements. Interpret each one. (a) "The IQ scores were approximately normally distributed." (b) "A bimodal distribution of physical agility scores was observed." (c) "The distribution of the patients' memory scores was severely negatively skewed."
13. What type of frequency graph is appropriate when counting the number of: (a) blondes, brunettes, redheads, or "others" attending a college? (b) people having each body weight reported in a statewide survey? (c) children in each grade at an elementary school? (d) car owners reporting above-average, average, or below-average problems with their car?
14. The distribution of scores on a statistics test is positively skewed. What does this indicate about the difficulty of the test?
15. The distribution of salaries at a large corporation is negatively skewed. (a) What would this indicate about the pay at this company? (b) If your salary is in the tail of this distribution, what should you conclude about your salary?
16. (a) On a normal distribution of exam scores, Crystal scored at the 10th percentile, so she claims that she outperformed 90% of her class. Why is she correct or incorrect? (b) Ernesto's score is in a tail of the normal curve, so he claims to have one of the highest scores. Why is he correct or incorrect?
17. Interpret each of the following. (a) You scored at the 35th percentile. (b) Your score has a relative frequency of .40. (c) Your score is in the upper tail of the normal curve. (d) Your score is in the left-hand tail of the normal curve. (e) The cumulative frequency of your score is 50. (f) Using the area under the normal curve, your score is at the 60th percentile.
18. Draw a normal curve and identify the approximate location of the following scores. (a) You have the most frequent score. (b) You have a low-frequency score, but it is higher than most. (c) You have one of the lower scores, but it has a relatively high frequency. (d) Your score seldom occurred.
19. The following shows the distribution of final exam scores in a large introductory psychology class. The proportion of the total area under the curve is given for two segments. (A normal curve over exam scores from 45 to 95 is shown; two segments of the curve are marked as .20 and .30 of the total area.) (a) Order the scores 45, 60, 70, 72, and 85 from most frequent to least frequent. (b) What is the percentile of a score of 60? (c) What proportion of the sample scored below 70? (d) What proportion scored between 60 and 70? (e) What proportion scored above 80? (f) What is the percentile of a score of 80?
20. The following normal distribution is based on a sample of data. The shaded area represents 13% of the area under the curve. (A normal curve is shown with four scores marked A, B, C, and D from left to right along the X axis; the shaded slice lies between two of them.) (a) What is the relative frequency of scores between A and B? (b) What is the relative frequency of scores between A and C? (c) What is the relative frequency of scores between B and C? (d) Rank-order A, B, C, and D to reflect the order of scores from the highest to the lowest frequency. (e) Rank-order A, B, C, and D to reflect the order of scores from the highest to the lowest score.
21. Organize the ratio scores below in a table and show their simple frequency and relative frequency.
49 52 47 52 52 47 49 47 50
51 50 49 50 50 50 53 51 49
22. Draw a simple frequency polygon using the data in problem 21.
23. What type of graph should you create when counting the frequency of: (a) the brands of cell phones owned by students? Why? (b) the different body weights reported in a statewide survey? Why? (c) the people falling into one of eight salary ranges? Why? (d) the number of students who were absent from a class either at the beginning, middle, or end of the semester? Why?
24. An experimenter studies vision in low light by having participants sit in a darkened room for either 5, 15, or 25 minutes and then testing their ability to correctly identify 20 objects. (a) What is the independent variable here? (b) What are the conditions? (c) What is the dependent variable? (d) You would use the scores from which variable to create a frequency distribution?
25. (a) Why do we create a bar graph with a nominal or ordinal X variable? (b) Why do we connect data points with straight lines with an interval or ratio X variable?

Chapter 3 SUMMARIZING SCORES WITH MEASURES OF CENTRAL TENDENCY

LOOKING BACK

Be sure you understand:
• From Chapter 1, the logic of statistics and parameters, what independent and dependent variables are, and how experiments show a relationship.
• From Chapter 2, what normal, skewed, and bimodal distributions are, and how to compute percentile using the area under the curve.

GOING FORWARD

Your goals in this chapter are to learn:
• What central tendency is.
• What the mean, median, and mode indicate and when each is appropriate.
• The uses of the mean.
• What deviations around the mean are.
• How to interpret and graph the results of an experiment.

Sections
3-1 Some New Symbols and Procedures
3-2 What Is Central Tendency?
3-3 Computing the Mean, Median, and Mode
3-4 Applying the Mean to Research
3-5 Describing the Population Mean

The frequency distributions discussed in Chapter 2 are important because the shape of the distribution is always the first important characteristic of data for us to know. However, graphs and tables are not the most efficient way to summarize a distribution. Instead, we compute individual numbers—statistics—that provide information about the scores. This chapter discusses statistics that describe the important characteristic of data called central tendency. The following sections present (1) the concept of central tendency, (2) the three measures of central tendency, and (3) how we use each measure to summarize and interpret data. But first, here are some new symbols and procedures.

3-1 SOME NEW SYMBOLS AND PROCEDURES

Beginning in this chapter, we will be using common statistical formulas. In them, we use X as the generic symbol for a score. When a formula says to do something to X, it means to do it to all of the scores you are calling X.

A new symbol you'll see is Σ, the Greek capital letter S, called sigma. Sigma is the "summation sign," indicating to add together the scores. It always appears with a symbol for scores, especially ΣX. In words, ΣX is called "the sum of X" and literally means to find the sum of the X scores. Thus, ΣX for the scores 5, 6, and 9 is 5 + 6 + 9, which is 20, so ΣX = 20. Notice we do not care whether each X is a different score. If the scores are 4, 4, and 4, then ΣX = 12.

sum of X (ΣX): The sum of the scores in a sample.

Also, often the answers from a formula will contain decimals that we must round off. The rule for rounding is to round the final answer to two more decimal places than were in the original raw scores. Usually we'll have whole-number scores, and then the answer contains two decimal places, even if they are zeros. However, carry more decimal places during your calculations: For example, if the answer will have two decimals, keep at least three decimal places in your calculations. (See Appendix A.1 to review how to round.)
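Both conventions are easy to mirror in software. Here is a minimal Python sketch, an illustrative aside rather than part of the book's SPSS materials, showing ΣX and the rounding rule for whole-number scores:

```python
scores = [5, 6, 9]      # whole-number raw scores, so final answers get two decimals

sum_x = sum(scores)     # ΣX = 5 + 6 + 9 = 20
N = len(scores)

answer = sum_x / N      # carry extra decimals while computing (6.666...)
print(f"{answer:.2f}")  # round only the final answer: 6.67
```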
A final answer should contain two more decimal places than are in the original raw scores.

Now on to central tendency.

3-2 WHAT IS CENTRAL TENDENCY?

Statistics that measure central tendency are important because they answer a basic question about data: Are the scores in a distribution generally high scores or generally low scores? For example, after taking a test in some class, you first wonder how you did, but then you wonder how the whole class did. Did everyone generally score high, low, or what? You need this information to understand both how the class performed and how you performed relative to everyone else. But it is difficult to do this by looking at individual scores, or even at a frequency distribution. Instead, it is better if you know something like the class average. Likewise, in all research, the first step is to shrink the data into one summary score, called a measure of central tendency, that describes the sample as a whole.

To understand central tendency, first change your perspective about what a score indicates. Think of a variable as a continuum (a straight line) and think of a score as indicating a participant's location on that variable. For example, if I am 70 inches tall, don't think that I have 70 inches of height. Instead, as in Figure 3.1, my score is at the "address" labeled 70 inches. If my brother is 60 inches tall, then he is located at 60.
The idea is not so much that he is 10 inches shorter than I am, but rather that we are separated by a distance of 10 inches. Thus, scores are locations, and the difference between any two scores is the distance between them.

Figure 3.1 Locations of Individual Scores on the Variable of Height. (A number line of height scores, with 60 and 70 marked and the distance of 10 inches between them indicated.)

From this perspective, a frequency polygon shows the location of all scores in a distribution. For example, Figure 3.2 shows the height scores from two samples. In our "parking lot view" of each normal curve, participants' scores determine where they stand: A higher score puts them on the right side of the curve, a lower score puts them on the left side, and a middle score puts them in a crowd in the middle. Further, with two distributions containing different scores, the distributions have different locations on the variable.

So when we ask, "Are the scores in a distribution generally high scores or generally low scores?", we are actually asking, "Where on the variable is the distribution located?" A measure of central tendency is a statistic that indicates the location of a distribution on a variable. Listen to its name: It indicates where the center of the distribution tends to be located. Thus, it is the point on the variable around where most of the scores are located, and it provides an "address" for the distribution. In Sample A in Figure 3.2, most of the scores are in the neighborhood of 59, 60, and 61 inches, so a measure of central tendency will indicate that the distribution is located around 60 inches. In Sample B, the distribution is centered at 70 inches.

measures of central tendency: Statistics that summarize the location of a distribution on a variable by indicating where the center of the distribution tends to be located.

Figure 3.2 Two Sample Polygons on the Variable of Height. Each polygon indicates the locations of the scores and their frequencies: Sample A is centered near 60 inches and Sample B near 70 inches.

Notice how descriptive statistics allow us to understand a distribution without looking at every score. If I told you only that one normal distribution is centered at 60 and another is centered around 70, you could mentally envision Figure 3.2 and have a good idea about all of the scores in the data. You'll see other statistics that add to our understanding of a distribution, but measures of central tendency are at the core of summarizing data.

The first step in summarizing any set of data is to compute its central tendency.

3-3 COMPUTING THE MEAN, MEDIAN, AND MODE

In the following sections we consider the three common ways to measure central tendency: the mode, the median, and the mean.

3-3a The Mode

The mode is a score that has the highest frequency in the data. For example, say that we have these scores: 2, 3, 3, 4, 4, 4, 4, 5, 5, 6. The score of 4 is the mode. (There is no conventional symbol for the mode.) The frequency polygon of these scores is shown in the upper portion of Figure 3.3. It shows that the mode does summarize this distribution, because the scores are located around 4. Notice that the polygon is roughly a normal curve, with the highest point over the mode. When a polygon has one hump, such as on the normal curve, the distribution is called unimodal, indicating that one score qualifies as the mode.

Figure 3.3 A Unimodal Distribution (a) and a Bimodal Distribution (b). Each vertical line marks a highest point on the distribution, indicating a most frequent score, which is a mode.

However, we may not always have only one mode.
Consider the scores 2, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12. Here, the two scores of 5 and 9 are tied for the most frequent score. This sample is plotted in the lower portion of Figure 3.3. In Chapter 2 such a distribution was called bimodal because it has two modes. Identifying the two modes does summarize this distribution, because most of the scores are either around 5 or around 9.

mode: A score having the highest frequency in the data.
unimodal: A distribution whose frequency polygon has only one hump and thus only one score qualifying as the mode.
bimodal: A distribution whose frequency polygon shows two humps, each centered over a score having the highest frequency, so there are two modes.

The mode is the preferred measure of central tendency when scores reflect a nominal scale of measurement (when participants are categorized using a qualitative variable). For example, say that we asked some people their favorite flavor of ice cream and counted the number of people choosing each category. Reporting that the mode was "Goopy Chocolate" does summarize the results, indicating that more people chose this flavor than any other.

The mode is a score with the highest frequency and is used to summarize nominal data.

There are, however, two potential limitations of the mode. First, the distribution may contain many scores that all have the same highest frequency, and then the mode does not summarize the data. In the most extreme case we might obtain scores such as 4, 4, 5, 5, 6, 6, 7, 7. Here, there is no mode. A second limitation is that the mode does not take into account any scores other than the most frequent score(s), so it ignores much of the information in the data. This can produce a misleading summary. For example, in the skewed distribution containing 7, 7, 7, 20, 20, 21, 22, 22, 23, and 24, the mode is 7. However, most scores are not around 7 but instead are in the low 20s. Thus, the mode may not accurately summarize where most scores in a distribution are located. Because of these limitations, for ordinal, interval, or ratio scores we usually rely on one of the other measures of central tendency, such as the median.
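Python's standard library can report every tied mode, which makes the bimodal and no-mode cases above easy to see. A minimal sketch, offered as an illustrative aside using the chapter's own example scores:

```python
from statistics import multimode

print(multimode([2, 3, 3, 4, 4, 4, 4, 5, 5, 6]))        # [4]: unimodal
print(multimode([2, 3, 4, 5, 5, 5, 6, 7, 8, 9, 9, 9]))  # [5, 9]: bimodal
print(multimode([4, 4, 5, 5, 6, 6, 7, 7]))              # every score ties, so no useful mode
```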
3-3b The Median

The median is simply another name for the score at the 50th percentile. Recall that researchers usually say that 50% of the distribution is below the 50th percentile and 50% is above it. Thus, if the score of 10 is the median, then 50% of the scores are below 10 and 50% are above 10. The median presents fewer potential problems than the mode because (1) a distribution can have only one median and (2) the median will usually be around where most of the scores in the distribution are located. The symbol for the median is Mdn.

median (Mdn): The score located at the 50th percentile.

Figure 3.4 illustrates how the median summarizes a distribution. Recall from Chapter 2 that because a score's percentile is the proportion of the area under the curve that is to the left of the score, the median separates the lower 50% of the distribution from the upper 50%. Thus, on the normal curve in Figure 3.4, the score at the vertical line is the 50th percentile, so that score is the median. (Notice that here, the median is also the mode.) Likewise, in the skewed distribution in Figure 3.4, 50% of the curve is to the left of the vertical line, so the score at the line is the median. In both cases, the median is a reasonably accurate "address" for the entire distribution, with most of the scores around that point.

Figure 3.4 Location of the Median in a Normal Distribution and in a Skewed Distribution. The vertical line indicates the location of the median, with one-half of the distribution on each side of it.

We have two ways to calculate the median. Usually we have at least an approximately normal distribution, so one way (the way we'll use in this book) is to estimate the median as the middle score that separates the two halves of the distribution. However, in real research we need more precision than an estimate can give. Therefore, the other way is to actually compute the median using our data. This involves a very tedious formula, so use SPSS, as described on Review Card 3.4.

The median is the preferred measure of central tendency when the data are ordinal scores. For example, say that several students ranked how well a college professor teaches on a scale of 1 to 10. Reporting that the professor's median ranking was 3 communicates that 50% of the students rated the professor as number 1, 2, or 3. Also, as you'll see, the median is preferred when interval or ratio scores form a skewed distribution. However, the median still ignores some information in the data because it reflects only the frequency of scores and does not consider their mathematical values. Therefore, the median is not our first choice for describing the central tendency of normally distributed interval or ratio scores.

The median (Mdn) is the score at the 50th percentile and is used to summarize ordinal or skewed interval or ratio scores.

> Quick Practice

The mode is the most frequent score in the data. The median is the 50th percentile.

More Examples: With the scores 1, 3, 3, 4, 5, 6, and 9, the most frequent score is 3, so the mode is 3. We calculate that the score having 50% of the scores below it is 4, so the median is 4.

For Practice

1. What is the mode in 4, 6, 8, 6, 3, 6, 8, 7, 9, and 8?
2. When is the median the same score as the mode?
3. With what types of scores is the mode the preferred statistic?
4. With what types of scores is the median the preferred statistic?

> Answers

1. In this bimodal data, both 6 and 8 are modes.
2. When the data form a normal curve.
3. With nominal scores.
4. With ordinal or skewed interval/ratio scores.
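As a quick check on the estimation approach, Python's statistics.median returns the middle score of a sorted sample (or the average of the two middle scores). An illustrative sketch using the Quick Practice scores; this is the simple middle-score method, not the more involved formula the book delegates to SPSS:

```python
from statistics import median

print(median([1, 3, 3, 4, 5, 6, 9]))  # 4: the middle of the seven sorted scores
print(median([1, 2, 3, 4, 5, 6]))     # 3.5: with an even N, the two middle scores are averaged
```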
3-3c The Mean

By far the most common measure of central tendency in behavioral research is the mean. The mean is defined as the score located at the mathematical center of a distribution, but it is what most people call the average. Compute a mean in the same way that you compute an average: Add up the scores and then divide by the number of scores you added. Unlike the mode or the median, the mean considers the magnitude of every score, so it does not ignore any information in the data.

mean: The score located at the mathematical center of a distribution.
X̄: The symbol used to represent the sample mean.

Let's first compute the mean of a sample. Usually we use X to stand for the raw scores in a sample, and then the symbol for the sample mean is X̄. To compute X̄, recall that the symbol that indicates "add up the scores" is ΣX, and the symbol for the number of scores is N, so

THE FORMULA FOR THE SAMPLE MEAN IS

X̄ = ΣX/N

For example, say we have the scores 3, 4, 7, 6:

STEP 1: Compute ΣX. Add the scores together. Here, ΣX = 3 + 4 + 7 + 6 = 20.

STEP 2: Determine N. Here, N = 4.

STEP 3: Divide ΣX by N. Here, X̄ = 20/4 = 5.
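The same three steps translate directly into code. A minimal Python sketch, illustrative only, since the book's own computations use SPSS and Review Card 3.4:

```python
scores = [3, 4, 7, 6]

sum_x = sum(scores)     # Step 1: ΣX = 20
N = len(scores)         # Step 2: N = 4
mean = sum_x / N        # Step 3: X̄ = ΣX/N = 5.0

print(f"{mean:.2f}")    # "5.00": whole-number raw scores, so the answer gets two decimals
```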
Saying that the mean of these scores is 5 indicates that the center of this distribution is located at the score of 5. What is the mathematical center of a distribution? Think of the center as the distribution's balance point.

Figure 3.5 The Mean as the Balance Point of Any Distribution. (Left: the scores 3, 4, 6, and 7 balance at a mean of 5. Right: an asymmetrical distribution balances at a mean of 4.)

For example, on the left side of Figure 3.5 are the scores of 3, 4, 6, and 7. The mean of 5 is at the point that balances the distribution. On the right side of Figure 3.5, the mean is the balance point in a different distribution that is not symmetrical. Here, the mean of 4 balances this distribution.

We compute the mean with only interval or ratio data, because it usually does not make sense to compute the average of nominal scores (e.g., finding the average of Democrats and Republicans) or of ordinal scores (e.g., finding the average position in a race). In addition, the distribution should be symmetrical and unimodal. In particular, the mean is appropriate with a normal distribution. For example, say we have the scores 1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7, which form the roughly normal distribution shown in Figure 3.6. Here, ΣX = 44 and N = 11, so the mean score—the center—is 4. The reason we use the mean with such data is that the mean is the mathematical center of any distribution: On a normal distribution the center is the point around where most of the scores are located, so the mean is an accurate summary and provides an accurate address for the distribution.

Figure 3.6 Location of the Mean on a Normal Distribution. The vertical line indicates the location of the mean score, which is the center of the distribution.

> Quick Practice

The mean is the average score, located at the mathematical center of the distribution. Compute the mean with a normal, or approximately normal, distribution of interval or ratio scores.

More Examples: To find the mean of the scores 3, 4, 6, 8, 7, 3, and 5: ΣX = 3 + 4 + 6 + 8 + 7 + 3 + 5 = 36, and N = 7. Then X̄ = 36/7 = 5.1428, which rounds to 5.14.

For Practice

1. What is the symbol for the sample mean?
2. What is the mean of 7, 6, 1, 4, 5, and 2?
3. With what data is X̄ appropriate?
4. How is a mean interpreted?

> Answers

1. X̄
2. ΣX = 25, N = 6, X̄ = 4.1666, rounding to 4.17
3. With normally distributed interval or ratio scores.
4. It is the center or balance point of the distribution.

3-3d Comparing the Mean, Median, and Mode

In a normal distribution, all three measures of central tendency are located at the same score. For example, in Figure 3.6 the mean of 4 also splits the curve in half, so 4 is the median. Also, 4 has the highest frequency, so 4 is the mode. If a distribution is only roughly normal, then the mean, median, and mode will be close to the same score. However, because the mean uses all information in the data, and because it has special mathematical properties, the mean is the basis for most of the inferential procedures we will see. Therefore, when you are summarizing interval or ratio scores, always compute the mean unless it clearly provides an inaccurate description of the distribution.

The mean will inaccurately describe a skewed distribution. This is because the mean must balance the distribution, and to do that, the mean is pulled toward the extreme tail of the distribution. In that case, the mean will not be where most of the scores are located. You can see this starting with the symmetrical distribution containing the scores 1, 2, 2, 2, 3. The mean is 2, and this accurately describes the scores. However, including the score of 20 would give the skewed sample 1, 2, 2, 2, 3, 20. Now the mean is pulled up to 5. But most of these scores are not at or near 5. As this illustrates, the mean is always at the mathematical center, but in a skewed distribution that center is not where most of the scores are located.

If the mean were used to summarize the real estate values of a row of run-down houses next to one mansion, the mansion's price would skew the summary above the true value of the houses.

The solution is to use the median to summarize a skewed distribution. Figure 3.7 shows the relative positions of the mean, median, and mode in skewed distributions. In both graphs the mean is pulled toward the extreme tail and is not where most scores are located. Each distribution is also not centered around its mode.

Figure 3.7 Measures of Central Tendency for Skewed Distributions. The vertical lines show the relative positions of the mean, median, and mode in a positively skewed and a negatively skewed distribution, with the mean nearest the extreme tail in each.
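You can watch the mean get dragged by the tail while the median holds its ground. A minimal Python sketch using the chapter's own scores (illustrative only):

```python
from statistics import mean, median

symmetric = [1, 2, 2, 2, 3]
skewed = [1, 2, 2, 2, 3, 20]   # one extreme score creates a positive skew

print(mean(symmetric), median(symmetric))  # 2 and 2: both sit where most scores are
print(mean(skewed), median(skewed))        # 5 and 2.0: the mean is pulled toward the tail
```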
Thus, of the three measures, the median most accurately reflects the central tendency—the overall address—of a skewed distribution. It is for the above reasons that the government uses the median to summarize such skewed distributions as that of yearly income or the price of houses. For example, the median income in the United States is approximately $50,000 a year. But a relatively small number of corporate executives, movie stars, professional athletes, and the like make millions! Averaging in these high incomes would produce a mean much higher than the median. However, most incomes are not located around this higher score, so the median is a better summary of this distribution.

With interval or ratio scores, the mean is used to summarize normal distributions; the median is used to summarize skewed distributions.

Believe it or not, we have now covered the basic measures of central tendency. In sum, the first step in summarizing data is to compute a measure of central tendency to describe the score around which the distribution tends to be located. Determine which measure to compute based on (1) the scale of measurement of the scores and (2) the shape of the distribution. For help, read research related to your study and see how other researchers computed central tendency.

• Compute the mode with nominal data or with a distinctly bimodal distribution of any type of scores.
• Compute the median with ordinal scores or with a very skewed distribution of interval/ratio scores.
• Compute the mean with a normal or approximately normal distribution of interval or ratio scores.

3-4 APPLYING THE MEAN TO RESEARCH

Most often the data in behavioral research are summarized using the mean. This is because most often, we measure variables using interval or ratio scores that naturally form a roughly normal distribution. Because the mean is used so extensively, in the following sections we will delve further into its characteristics and uses.

3-4a Deviations around the Mean

First, you need to understand why the mean is at the center or "balance point" of a distribution. The answer is that the mean is just as far from the scores above it as it is from the scores below it. That is, the total distance that some scores are above the mean equals the total distance that the other scores are below the mean.
The distance separating a score from the mean is called the score's deviation, indicating the amount the score "deviates" from the mean. A score's deviation is equal to the score minus the mean. In symbols this is written as:

X − X̄

deviation: The distance a score is from the mean; indicates how much the score differs from the mean.
sum of the deviations around the mean: The sum of all differences between the scores and the mean; symbolized by Σ(X − X̄).

Thus, if the sample mean is 47, a score of 50 deviates by +3 because 50 − 47 is +3. A score of 40 deviates from the mean of 47 by −7 because 40 − 47 = −7.

Always subtract the mean from the raw score when computing a score's deviation.

Do not think of deviations as positive or negative numbers in the traditional sense. Think of a deviation as having two components: the number, which indicates distance from the mean (and which is always positive), and the sign, which indicates direction from the mean. Thus, a positive deviation indicates that the score is greater than the mean, and a negative deviation indicates that the score is less than the mean. The size of the deviation (regardless of its sign) indicates the distance the score lies from the mean: The larger the deviation, the farther the score is from the mean. A deviation of 0 indicates that the score equals the mean.

When we determine the deviations of all the scores in a sample, we find the deviations around the mean. Then the sum of the deviations around the mean is the sum of all differences between the scores and the mean. And here's why the mean is the mathematical center of a distribution: The sum of the deviations around the mean always equals zero.

For example, the scores 3, 4, 6, and 7 have a mean of 5. Table 3.1 shows how to compute the sum of the deviations around the mean for these scores. First, we subtract the mean from each score to obtain the score's deviation. Then we add the deviations together. The sum is zero.

Table 3.1 Computing Deviations around the Mean

X    minus   X̄    equals   Deviation
3    −       5    =        −2
4    −       5    =        −1
6    −       5    =        +1
7    −       5    =        +2
                  Sum =     0

In fact, for any distribution having any shape, the sum of the deviations around the mean will equal zero. This is because the sum of the positive deviations equals the sum of the negative deviations, so the sum of all deviations is zero. In this way the mean is the center of a distribution, because the mean is an equal distance from the scores above and below it.

Note: Some of the formulas in later chapters involve computing deviations around the mean, so you need to know how we communicate the sum of the deviations using symbols. Combining the symbol for a deviation, (X − X̄), with the symbol for sum, Σ, gives the sum of the deviations as Σ(X − X̄). We always work inside parentheses first, so this says to first subtract the mean from each score to find each deviation. Then we add all of the deviations together. Thus, in symbols, Σ(X − X̄) = 0.

The importance of the sum of the deviations equaling zero is that it makes the mean literally the score around which everyone in the sample scored: Some scores are above the mean to the same extent that others are below it. Therefore, we think of the mean as the typical score, because it is the one score that more or less describes everyone's score, with the same amounts of more and less. This is why the mean is so useful for summarizing a group of scores.
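Here is the Table 3.1 computation as a short Python sketch (illustrative only), confirming that the deviations around the mean sum to zero:

```python
scores = [3, 4, 6, 7]
x_bar = sum(scores) / len(scores)         # X̄ = 20/4 = 5.0

deviations = [x - x_bar for x in scores]  # X − X̄ for each score
print(deviations)                         # [-2.0, -1.0, 1.0, 2.0]
print(sum(deviations))                    # 0.0: Σ(X − X̄) = 0
```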
This characteristic is also why the mean is the best score to use if you want to predict an individual's score. Because the mean is the center score, any errors in our predictions will cancel out over the long run. Here's why: The amount of error in one prediction is the difference between what someone actually gets (X) and what we predict he or she will get (X̄). In symbols, this difference is X − X̄, which we have seen is a deviation. But alter your perspective: In this context, a deviation is the amount of error we have when we predict the mean as someone's score. If we determine the amount of error in every prediction, our total error is equal to the sum of the deviations, which equals zero.

For example, if the mean on an exam is 80, we'll predict a score of 80 for every student in the class and, of course, sometimes we will be wrong. Say that one student scored 70. We would predict an 80, so we'd be off by −10, because this person's score deviates from the mean by −10. However, the mean is the central score, so another student would score 90. By estimating an 80 again, we'd be off by +10, because this person deviates by +10. And so on, so that over the entire sample, our errors will balance out to a total of zero, because the total positive deviations cancel out the total negative deviations. (Likewise, any students taking the exam in the future should be like those we've tested and score around 80. Thus, we'd also predict they would score 80, and our errors would again balance out to zero.)

If we consistently predict any score other than the mean, the total error will be greater than zero. However, a basic rule of statistics is that if we can't perfectly describe every score, then the next best thing is to have our errors balance out to zero. There is an old joke about two statisticians shooting at a target. One hits 1 foot to the left of the target, and the other hits 1 foot to the right. "Congratulations," one says, "we got it!" Likewise, we want our errors—our over- and underestimates—to balance out to zero. Only the mean provides this balancing capability. Therefore, when we do not know anything else about the scores, we predict that any individual will score at the mean score.

So, to sum up, remember these three things about deviations around the mean:

1. A deviation equals X − X̄ and indicates a score's distance above or below the mean.
2. The mean is the central score, so the positive and negative deviations cancel out, and the sum of the deviations around the mean equals zero.
3. A deviation also indicates the amount of error between the X̄ we predict for someone and the X that he or she actually gets. The total error over all such predictions equals the sum of the deviations, which is zero.

3-4b Summarizing Research

Now you can understand why researchers compute the mean anytime we have a sample of normally distributed interval or ratio scores. So, if we've merely observed some participants, we compute the mean number of times they exhibit a particular behavior, or we compute the mean response in a survey.
In a correlational study, we compute the mean score on the X variable and the mean score on the Y variable (symbolized by Ȳ). Such means are then used to summarize the sample and to predict any individual's score in this situation.

The predominant way to summarize experiments is also to compute means. As an example, say that we conduct a very simple study of memory by having participants recall a list in one of three conditions, in which the list contains 5, 10, or 15 words. Our dependent variable is the number of words recalled from the list. Table 3.2 shows some idealized recall scores from the participants in each condition (each column).

Table 3.2 Number of Words Correctly Recalled from a 5-, 10-, or 15-Item List

Condition 1: 5-Item List   Condition 2: 10-Item List   Condition 3: 15-Item List
4                          7                           10
3                          6                           9
2                          5                           8

A relationship appears to be present because we see a different batch of higher recall scores occurring with each increase in list length. A real experiment would employ a much larger N, and so to see the relationship buried in the scores, we would compute a mean (or other measure of central tendency) for each condition. When selecting the appropriate measure, remember that the scores are from the dependent variable, so compute the mean, median, or mode depending upon (1) the scale of measurement of the dependent variable and (2) for interval or ratio scores, the shape of the distribution.

In our recall experiment, we compute the mean of each condition, producing Table 3.3.

Table 3.3 Means of Conditions in Memory Experiment

Condition 1: 5-Item List   Condition 2: 10-Item List   Condition 3: 15-Item List
X̄ = 3                      X̄ = 6                       X̄ = 9

When you are reading research, you will usually see only the means, and not the original raw scores. To interpret each mean, envision the scores that typically would produce it. In our condition 1, for example, a normal distribution producing a mean of 3 would contain scores distributed above and below 3, with most scores close to 3. We then use this information to describe the scores: In condition 1, for example, we'd say participants scored around 3, or the typical score is 3, and we'd predict that any participant would have a score of 3 in this situation.

To see the relationship that is present, look at the pattern formed by the means: Because a different mean indicates a different batch of raw scores that produced it, a relationship is present when the means change as the conditions change. Table 3.3 shows a relationship because as the conditions change from 5 to 10 to 15 items in a list, the means indicate that the recall scores also change, from around 3 to around 6 to around 9, respectively.

Note, however, that not all means must change for a relationship to be present. If, for example, our means were 3, 5, and 5, respectively, then at least sometimes we see a different batch of recall scores occurring for different list lengths, and so a relationship is present. On the other hand, if the means for the three conditions had been 5, 5, and 5, this would indicate that essentially the same batch of recall scores occurred regardless of list length, so no relationship is present.

Let's assume that the data in Table 3.3 are representative of how the population behaves in this situation. (We must perform inferential procedures to check this.) If so, then we have demonstrated that list length makes a difference in the mean scores and thus in the individual recall scores.
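Computing the condition means from Table 3.2 takes only a few lines. A minimal Python sketch, illustrative only, since a real analysis would use SPSS:

```python
from statistics import mean

recall = {                    # Table 3.2: recall scores by condition
    "5-item": [4, 3, 2],
    "10-item": [7, 6, 5],
    "15-item": [10, 9, 8],
}

for condition, scores in recall.items():
    # Means of 3, 6, and 9: the means change with the conditions,
    # so a relationship is present.
    print(condition, mean(scores))
```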
It is important to recognize that demonstrating a difference between the means is the same thing as demonstrating a relationship. In each case we are saying that a different group of scores occurs in each condition. This is important because in published research, researchers often imply that they have found a relationship by saying that they have found a difference between the means. If they find no difference, they have not found a relationship. In an experiment, if the means change as the conditions change, then the raw scores are changing, and a relationship is present. Means of Conditions in Memory Experiment 46 Condition 1: 5-Item List Condition 2: 10-Item List Condition 3: 15-Item List X⫽3 X⫽6 X⫽9 The above logic for interpreting an experiment also applies to the median and mode. A relationship is present if a different median or mode occurs in two or more conditions, because this indicates that a different batch of raw scores is occurring as the conditions change. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. graph is often a good way to present the results of an experiment, especially when they are complicated. To create a graph, begin by labeling the X axis with the conditions of the independent variable. Label the Y axis with the mean (or mode or median) of the dependent scores. (Do not be confused by the fact that we used X to represent the scores in the formula for computing the mean. We still plot the means on the Y axis.) Complete the graph by creating either a line graph or a bar graph. The type of graph to select is determined by the characteristics of the independent variable. Create a line graph when the independent variable is an interval or a ratio variable. In a line graph we plot a data point for each condition, and then connect adjacent points with straight lines. For example, in our previous memory study, the independent variable of list length is a ratio variable. Therefore, we create the line graph in the upper portion of Figure 3.8. A data point is placed above the 5-item condition opposite the mean Figure 3.8 Mean words recalled Line Graphs Showing the Relationship between Mean Words Recalled and List Length and No Relationship 10 9 8 7 6 5 4 3 2 1 Mean words recalled 0 5 10 Length of list 15 10 9 8 7 6 5 4 3 2 1 0 5 10 Length of list 15 of 3 errors, a data point is above the 10-item condition at the mean of 6 errors, and a data point is above the 15-item condition at the mean of 9 errors. Then we connect adjacent data points with straight lines. We use straight lines here for the same reason we used them when producing polygons: When the X variriable involves an interval or ratio scale, we assume it iss a continuous variable. The lines show that the relationship hip continues between the scores on the X axis. For exammple, we assume that if there had been a 6-item list, the he mean error score would fall on the line connecting the he means for the 5- and 10-item lists. The graph conveys the same information as the he sample means did back in Table 3.3. 
Look at the overall all pattern: If the vertical positions of the data points go up or down as the conditions change, then the means ns are changing. Different sample means indicate differerent scores in each condition, so a relationship is present. nt. However, say that instead, in every condition the mean an was 5, producing the lower graph in Figure 3.8. The he result is a horizontal line, indicating that the mean score ore stays the same as the conditions change, so essentially the same recall scores occur in each condition and no relationship is present. ©iStockphoto.com/Ivan Kmit GRAPHING THE RESULTS OF AN EXPERIMENT A If data points form a line that is not horizontal, the Y scores are changing as X changes, and a relationship is present. The other type of graph used to summarize experiments is a bar graph, like we saw in Chapter 2. Create a bar graph when the independent variable is a nominal or ordinal variable. Place a bar above each condition on the X axis to the height on the Y axis that corresponds to the mean, median, or mode for that condition. As usual, adjacent bars do not touch. We use bars here for the same reason we used them in Chapter 2: Nominal and line graph A graph of an experiment’s results when the independent variable is an interval or ratio variable; plotted by connecting data points with straight lines; as opposed to a bar graph, used when the independent variable is a nominal or ordinal variable Chapter 3: Summarizing Scores with Measures of Central Tendency Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 47 Figure 3.9 Bar Graph Showing Mean Words Recalled as a Function of College Major The height of each bar corresponds to the mean score for the condition. 14 Mean words recalled 12 10 > > 8 6 4 2 0 > Quick Practice Physics Psychology College major English ordinal X variables are assumed to be discrete, meaning you can have one score or another score, but nothing in between. The spaces between the bars communicate this. For example, say that we conducted an experiment comparing the recall errors made by psychology majors, English majors, and physics majors. This independent variable involves a nominal scale, so we have the bar graph shown in Figure 3.9 above. Because the tops of the bars do not form a horizontal line, we know that different means and thus different scores are in each condition. We see that individual scores are around 8 for physics majors, around 4 for psychology majors, and around 12 for English majors, so a relationship is present. Note: In some experiments, we may measure a nominal or an ordinal dependent variable. In that case we would plot the mode or median on the Y axis for each condition. Then, again depending on the characteristics of the independent variable, we would create either a line or bar graph. Graph the independent variable on the X axis and the mean, median, or mode of the dependent scores on the Y axis. Create a line graph when the independent variable is interval or ratio; create a bar graph when it is nominal or ordinal. 
More Examples Say that men and women rated their satisfaction with an instructor, and the mean scores were 20 and 30, respectively. To graph this, gender is a nominal independent variable, so plot a bar graph, with the labels “men” and “women” on X and the mean for each gender on Y. Or, say we measure the satisfaction scores of students tested with either a 10-, 40-, or 80-question final exam and, because the scores form very skewed distributions, we compute the median in each condition. Test length is a ratio independent variable, so plot a line graph, with the labels 10, 40, and 80 on X and the median of each condition on Y. For Practice 1. The independent variable is plotted on the ____ axis, and the dependent variable is plotted on the ____ axis. 2. A ____ shows a data point above each X, with adjacent points connected with straight lines. A ____ shows a discrete bar above each X. 3. The characteristics of the ____ variable determine whether to compute the mean, median, or mode. 4. The characteristics of the ____ variable determine whether to plot a line or bar graph. 5. Create a bar graph with ____ or ____ variables. 48 6. Create a line graph with ____ or ____ variables. > Answers 1. X; Y 2. line graph; bar graph 3. dependent 4. independent 5. nominal; ordinal 6. interval; ratio The scale of measurement of the dependent variable determines which measure of central tendency to compute. The scale of the independent variable determines the type of graph to create. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. / to.com ckpho © iSto Uliasz k re Ma 3-5 DESCRIBING THE POPULATION MEAN Recall that ultimately we seek to describe the population of scores found in a given situation. Populations are unwieldy, so we also summarize them using measures of central tendency. Usually we have normally distributed interval or ratio scores, so usually we describe the population mean. The symbol for a population mean is the Greek letter M (pronounced “mew”). Thus, to indicate that the population mean is 143, we’d say m ⫽ 143. We use m simply to show that we’re talking about a population, as opposed to a sample, but a population mean has the same characteristics as a sample mean: m is the average of the scores in the population, it is the center of the distribution, and the sum of the deviations around m equals zero. Thus, m is the score around which everyone in the population scored, it is the typical score, and it is the score we predict for any individual in the population. The symbol for a population mean is μ To visualize this M The symbol used relationship in pubto represent the population mean lished research, we usually assume that a graph of the population relationship would look like the graph from the sample data we created earlier. However, so that you can understand the statistics we discuss later, you should envision the relationship in the population in the following way. We assume that we know two things: By estimating each m, we know where on the dependent variable each population would be located. 
Also, by assuming that recall scores are normally distributed, we know the shape of each distribution. Thus, we can envision the populations of recall scores we expect as the frequency polygons shown in Figure 3.10. (These are frequency distributions, so the dependent (recall) scores are on the X axis.) The figure shows a relationship because, as the conditions of the independent variable change, scores on the dependent variable change so that we see a different population of scores for each condition. Essentially, for every 5 items added to a list, the distribution slides to the right, going from scores around 3 to around 6 to around 9. Conversely, say that we had found no relationship where, for example, every X was 3. Then, we’d envision one normal distribution located at m ⫽ 3 for all three conditions. Notice that by envisioning the relationship in the population, we have the scores for describing everyone’s behavior, so we are describing how nature operates in this situation. In fact, as we’ve done here, in every study we (1) use the sample means (or other measures) to describe the relationship in the sample, (2) perform our inferential procedures, and (3) use the sample data to envision the relationship found in the population—in nature. At that point we have basically achieved the goal of our research and we are finished with our statistical analyses. How do we determine m? If all scores in the population are known, then we compute m using the same formula used to compute the sample mean, so m ⫽ ⌺X/N. Usually, however, a population is infinitely large, so instead, we perform the inferential process we’ve discussed previously, using the mean of a sample to estimate m. Thus, if a sample’s mean in a particular situation is 99, then, assuming the sample accurately represents the population, we estimate that m in that situation is also 99. Likewise, ultimately we wish to describe any experiment in terms of the scores that would be found if we tested the entire population in each Figure 3.10 condition. For example, assume that the Locations of Populations of Recall Scores as a Function of List Length data from our previous list-length study Each distribution contains the recall scores we would expect to find if the population were tested is representative. Because the mean in under each condition. the 5-item condition was 3, we expect that everyone should score around 3 in µ for µ for µ for 5-item 10-item 15-item this situation, so we estimate that if the list list list population recalled a 5-item list, the m f would be 3. Similarly, we infer that if the population recalled a 10-item list, m would equal our condition’s mean of 6, 0 1 2 3 4 5 6 7 8 9 10 11 12 and if the population recalled a 15-item Recall scores list, m would be 9. Chapter 3: Summarizing Scores with Measures of Central Tendency Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 49 USING SPSS SPSS will compute the three measures of central tendency we’ve discussed for one sample of data (along with other statistics discussed in Chapter 4). Instructions for this are on Review Card 3.4. 
However, it is not necessary to repeatedly run this routine to compute the mean for each condition in an experiment. Instead, SPSS computes the means for all conditions at once as part of performing the experiment’s inferential procedures, which we’ll discuss later. Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com STUDY PROBLEMS (Answers for odd-numbered problems are in Appendix C.) 1. (a) What does a measure of central tendency indicate? (b) What are the three measures of central tendency? 13. The following distribution shows the locations of five scores. 2. What two aspects of the data determine which measure of central tendency to compute? 3. What is the mode, and with what type of data is it most appropriate? 4. What is the median, and with what type of data is it most appropriate? 5. What is the mean, and with what type of data is it most appropriate? 6. (a) Why does the mean accurately summarize a normal distribution? (b) Why does the mean inaccurately summarize a skewed distribution? 7. What do you know about the shape of the distribution if the median is a considerably lower score than the mean? 8. What two pieces of information about the location of a raw score does a deviation score convey? 9. (a) What does (X ⫺ X ) indicate? (b) What does ⌺(X ⫺ X ) indicate? (c) What two steps must be performed to compute ⌺(X ⫺ X ) for a sample? (d) Why does the sum of the deviations around the mean equal zero? 10. Why do we use the mean of a sample to predict anyone’s score in that sample? 11. For the following data, compute (a) the mean and (b) the mode. 55 57 59 58 60 57 56 58 61 58 59 12. (a) In question 11, what is your best estimate of the median? (b) Why? 50 f A B C D E a. Match the deviation scores ⫺7, ⫹1, 0, ⫺2, and ⫹5 with their locations. A ⫽ _____ B ⫽ _____ C ⫽ _____ D ⫽ _____ E ⫽ _____ b. Arrange the deviation scores to show the highest to lowest raw scores. c. Arrange the deviation scores to show the raw scores having the highest to lowest frequency. 14. (a) You misplaced one of the scores in a sample, but you have the other data below. What score should you guess for the missing score? (b) Why? 15 12 13 14 11 14 13 13 12 11 15 15. A researcher collected the following sets of data. For each, indicate the measure of central tendency she should compute: (a) the following IQ scores: 60, 72, 63, 83, 68, 74, 90, 86, 74, and 80; (b) the following error scores: 10, 15, 18, 15, 14, 13, 42, 15, 12, 14, and 42; (c) the following blood types: A2, A2, O, A1, AB2, A1, O, O, O, and AB1; (d) the following grades: B, D, C, A, B, F, C, B, C, D, and D. 16. On a normal distribution, four participants obtained the following deviation scores: ⫺5, 0, ⫹3, and ⫹1. (a) Which person obtained the lowest raw score? How do you know? (b) Which person’s Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. raw score had the lowest frequency? How do you know? 
(c) Which person’s raw score had the highest frequency? How do you know? (d) Which person obtained the highest raw score? How do you know? 17. Kevin claims a deviation of ⫹5 is always better than a deviation of ⫺5. Why is he correct or incorrect? 18. (a) What is the symbol “m” called and what does it stand for? (b) How do we usually determine its value? 19. In an experiment: (a) Which variable is plotted on the X axis? (b) Which variable is plotted on the Y axis? (c) How do you recognize the independent variable of an experiment? (d) How do you recognize the dependent variable? 20. (a) In an experiment, what is the rule for when to make a bar graph? (b) Define these scales. (c) What is the rule for when to make a line graph? (d) Define these scales. Mean number of errors 21. For the following experimental results, interpret specifically the relationship between the independent and dependent variables: 40 35 30 25 20 15 10 5 0 Y axis and which on the X axis, (2) whether the researcher should create a line graph or a bar graph, and (3) how she should summarize scores on the dependent variable: (a) a study of income for different age groups we’ve selected; (b) a study of politicians’ positive votes on environmental issues after we’ve classified them as having or not having a wildlife refuge in their political district; (c) a study of running speed depending on the amount of carbohydrates we’ve given participants; (d) a study of rates of alcohol abuse depending on which ethnic group we examine. 25. You conduct a study to determine the impact that varying the amount of noise in an office has on worker productivity. You obtain the following productivity scores. Condition 1: Low Noise Condition 2: Medium Noise Condition 3: Loud Noise 15 19 13 13 13 11 14 10 12 9 7 8 (a) Productivity scores are normally distributed ratio scores. Compute the summaries of this experiment. (b) Draw the appropriate graph for these data. (c) Draw how we would envision the populations produced by this experiment. (d) What conclusions should you draw from this experiment? 1 2 4 5 6 3 Hours of sleep deprivation 7 8 22. (a) If you participated in the study in question 21 and had been deprived of 5 hours of sleep, how many errors do you think you would have made? (b) If we tested all people in the world after 5 hours of sleep deprivation, how many errors do you think each would make? (c) What symbol stands for your prediction in part b? 23. You hear that a line graph of mean scores from the Grumpy Emotionality Test slants downward as the researcher increased the amount of sunlight present in the room where participants were tested. (Hint: Sketch this graph.) (a) What does this tell you about the mean scores for the conditions? (b) What does this tell you about the raw scores in the conditions? (c) What would we expect to find regarding the populations and their ms? (d) Should we conclude there is a relationship between emotionality and sunlight in nature? 24. For each of the experiments below, determine (1) which variable should be plotted on the 26. In a study of participants’ speeds of texting, the researchers concluded, “We found a difference between the three means for the three age groups, with slower speeds occurring with increased age. However, no speed differences were found between the overall means for males and females.” Based on this conclusion, describe the relationship we expect to find in nature between texting speed and (a) age; (b) gender. 
Chapter 3: Summarizing Scores with Measures of Central Tendency Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 51 Chapter 4 SUMMARIZING SCORES WITH MEASURES OF VARIABILITY LOOKING BACK GOING F O R WA R D Be sure you understand: Your goals in this chapter are to learn: • From Chapter 2, how to read and interpret the normal curve, and how to use the proportion of the area under the curve to determine relative frequency. • What is meant by variability. • From Chapter 3, what X and m stand for, what a deviation is, and why the sum of the deviations around the mean is zero. Sections 4-1 4-2 4-3 Understanding Variability The Range • What the range indicates. • What the standard deviation and variance are and how to interpret them. • How to compute the standard deviation and variance when describing a sample, when describing the population, and when estimating the population. Y ou have seen that the first steps in dealing with data are to consider the shape of the distribution and to compute the mean (or other measure of central tendency). This information simplifies the distribu- tion and allows you to envision its general properties. But not The Sample Variance and Standard Deviation everyone will behave in the same way, so we may see many 4-4 The Population Variance and Standard Deviation must also determine whether there are large or small differ- 4-5 Summary of the Variance and Standard Deviation 4-6 Computing Formulas for the Variance and Standard Deviation 4-7 52 Statistics in the Research Literature: Reporting Means and Variability different scores. Therefore, to completely describe data you ences among the scores. This chapter discusses the statistics for describing the differences among scores, which are called measures of variability. In the following sections we discuss (1) the concept of variability, (2) the statistics that describe variability in the sample, and (3) the statistics that describe variability in the population. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. ©Stanth/Shutterstock.com 4-1 UNDERSTANDING VARIABILITY Computing a measure of variability is important because it answers the question “How large are the differences among the scores?” Without it, a measure of central tendency provides an incomplete description. For example, look at Table 4.1. Each sample has a mean of 6, so without looking at the raw scores, you might think they are identical distributions. But, Sample A contains scores that differ greatly from each other and from the mean, Sample B contains scores that differ less, and in Sample C there are no differences among the scores. 
Table 4.1 Three Different Distributions Having the Same Mean Score Sample A 0 2 6 10 12 Sample B 8 7 6 5 4 Sample C 6 6 6 6 6 X 6 X 6 X 6 Thus, to completely describe a set of data, we need to calculate statistics called measures of variability. Measures of variability describe the extent to which scores in a distribution differ from each other. In a sample with more frequent, larger differences among the scores, these statistics will produce larger numbers and we say the scores (and the underlying behaviors) are more variable or show greater variability. Measures of variability communicate three aspects of the data. First, the opposite of variability is consistency. Small variability indicates that the scores are consistently close to each other. Larger variability indicates a variety of scores that are inconsistent. Second, the amount of variability implies how accurately a measure of central tendency describes the distribution. Our focus will be on the mean and normal distributions: The more that scores differ from each other, the less accurately they are summarized by the mean. Conversely, the smaller the variability, the closer the scores are to each other and to the mean. Third, we’ve seen that the difference between two scores can be thought of as the distance that separates them. From this perspective, greater differences indicate greater distances between the scores, measures of so measures of variability indicate variability how spread out a distribution is. Statistics that (For this reason, researchers—and summarize the extent SPSS—also refer to variability as to which scores in a distribution differ dispersion: With greater variability, from one another the scores are more “dispersed.”) Chapter 4: Summarizing Scores with Measures of Variability Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 53 So, without even looking at the scores in the samples back in Table 4.1, by knowing their variability we’d know that Sample C contains consistent scores that are close to each other, so the mean of 6 accurately describes them. Sample B contains scores that differ more—are more spread out—so they are less consistent and 6 is not so accurate a summary. Sample A contains very inconsistent scores that are spread out far from each other, so 6 poorly describes most of them. You can see the same aspects when describing or envisioning a larger normal distribution. For example, Figure 4.1 shows three normal distributions that differ because of their variability. Let’s use our “parking lot view.” If our statistic indicates relatively small variability, Figure 4.1 Three Variations of the Normal Curve Distribution A X f 0 30 35 40 45 50 55 60 65 70 Scores Distribution B X f 0 30 35 40 45 50 55 60 65 70 75 Scores Distribution C it implies a distribution similar to Distribution A: This is very narrow because most of the people are standing in long lines located close to the mean (with few standing at, say, 40 or 60). Thus, most scores are close to each other, so their differences are small and this is why our statistic indicates small variability. 
However, if our statistic indicates intermediate variability, then it implies a distribution more like B: This is more spread out because longer lines of people are located at scores farther above and below the mean (more people stand near 40 and 60). In other words, a greater variety of scores occur here, producing more frequent and larger differences, and this is why our statistic is larger. Finally, when our statistic shows very large variability, we envision a distribution like C. It is very wide because long lines of people are at scores located farther into the tails (here, scores beyond 40 and 60 occur often). Therefore, frequently scores are anywhere between very low and very high, producing many large differences, and this is why the statistic is so large. Researchers have several ways to measure variability. The following sections discuss the three most common measures of variability: the range, the variance, and the standard deviation. But first: > Quick Practice > > Measures of variability describe the amount that scores differ from each other. When scores are variable, we see frequent large differences among the scores, indicating that participants are behaving inconsistently. More Examples If a survey produced high variability, then each person had a rather different answer—and score—than the next. This produces a wide normal curve. If the variability on a exam is small, then many students obtained either the same or close to the same score. This produces a narrow normal curve. X f For Practice 1. When researchers measure the differences among scores, they measure _________. 0 54 25 30 35 40 45 50 55 60 65 70 75 Scores 2. The opposite of variability is _________. (continued) Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 4-3 THE SAMPLE 3. When the variability in a sample is large, are the scores close together or very different from each other? VARIANCE AND VAR STANDARD STA DEVIATION DE 4. If a distribution is wide or spread out, then the variability is _________. > Answers 3. Different 4. large One way to describe variability is to determine how w far the lowest score is from the highest score. The he descriptive statistic that indicates the distance between en the two most extreme scores in a distribution is called ed the range. THE FORMULA FOR THE RANGE IS Glyn Jones/Cardinal/Corbis Range Highest score Lowest score For example, the scores back in Sample A (0, 2, 6, 10, range of 12 0 12. The less variable 12) have a ran Sample B (4, 5, 6, 7, 8) have a range of scores in Sam And the perfectly consistent Sample C 8 4 4. A (6, 6, 6, 6, 6) has a range of 6 6 0. Thus, the range does communiccate the spread in the data. However, the range is a rather crude measure. th Because it involves only the two most B extreme scores, the range is based on the extre typical and often least frequent scores, least typ all other scores. Therefore, we while ignoring ign use the range as our sole measure of usually us variability only with nominal or ordinal data. 
With nominal data, we compute the range the number of categories we’re by counting counti examining: For example, there is more consisexamining the participants in a study belong to 1 tency if th parties than if they belong to 1 of of 4 political politi parties. 14 parties With ordinal data, the range is the disbetween the lowest and highest rank: If tance be runners finish a race spanning only the 5 100 run positions from 1st through 5th, this is a close race with many ties; if they span 75 positions, the runners are more spread out. © Kaz Chiba/Taxi Japan/Getty Images 1. variability 2. consistency 4-2 THE RANGE Most behavioral research involves Mo interval or ratio scores that form a int normal distribution. In such situanor tions (when the mean is appropritio ate), we use two similar measures ate of variability called the variance o and the standard deviation. Understand that we use the variance and the standard deviation to describe how different the scores are from each other. We calculate them, however, by measuring how much the scores differ from the mean. Because the mean is the cenmean ter of a distribution, when scores are spread out from each other, they are also spread out from the mean. When scores are close to each other, they are close to the mean. The variance and standard deviation are the two measures of variability that indicate how much the scores are spread out around the mean. Mathematically, the distance between a score and the mean is the difference between the score and the mean. This difference is symbolized by X X, which, as in Chapter 3, is the amount that a score deviates from the mean. Because some scores will deviate from the mean by more than others, it makes sense to compute the average amount the scores deviate from the mean. The larger the “average of the deviations,” the greater the variability or spread between the scores and the mean. range The distance between the highest We cannot, however, simply and lowest scores in a compute the average of the deviaset of data tions. To compute an average, we Chapter 4: Summarizing Scores with Measures of Variability Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 55 first sum the scores, so we would first sum the deviations. In symbols, this is (X X). Recall, however, that the sum of the deviations always equals zero because the positive deviations cancel out the negative deviations. Therefore, the average of the deviations will always be zero. Thus, we want a statistic like the average of the deviations so that we know the average amount the scores differ from the mean. But, because the average of the deviations is always zero, we calculate slightly more complicated statistics called the variance and standard deviation. Think of them, however, as each producing a number that indicates something like the average or typical amount that the scores differ from the mean. 
Table 4.2 Calculation of Variance Using the Defining Formula Age Score 2 3 4 X 5 (X X) 3 5 5 2 5 1 0 (X X)2 9 4 1 5 6 7 5 5 1 2 1 4 8 5 3 9 N7 0 (X X ) 2 28 4-3a Understanding the Sample Variance If the problem with the average of the deviations is that the positive and negative deviations always cancel out to zero, then one solution is to square the deviations. This removes all negatives, so the sum of the squared deviations is not necessarily zero and neither is their average. By finding the average squared deviation, we compute the variance. The sample variance is the average of the squared deviations of scores around the sample mean. Our symbol for the sample variance is S 2X. Always include the squared sign (2). The capital S indicates that we are describing a sample, and the subscript X indicates it is a sample of X scores. We have a formula for the sample variance that defines it: THE DEFINING FORMULA FOR THE SAMPLE VARIANCE IS SX2 S X2 (X X) 2 28 4 N 7 This sample’s variance is 4. In other words, the average squared deviation of the age scores around the mean is 4. The symbol for the sample variance is S2X, and it indicates the average squared deviation. (X X ) 2 N This formula is important because it shows you the basis for the variance. Later we will see a different, faster formula to use when you are actually computing the variance. But sample 2 variance (S X) The first, to understand the concept, say that we measure the ages of some chilaverage of the squared deviations of scores dren. As shown in Table 4.2, we first around the sample compute each deviation, (X X), by mean subtracting the mean (which here is 5) 56 from each score. Next, as shown in the far-right column, we square each deviation. Adding the squared deviations gives (X X) 2, which here is 28. The N is 7 and so The good news is that the variance is a legitimate measure of variability. The bad news, however, is that the variance does not make much sense as the “average deviation.” There are two problems. First, squaring the deviations makes them very large, so the variance is unrealistically large. To say that our age scores differ from their mean by an average of 4 is silly, because none of the scores actually deviates from the mean by this much. The second problem is that variance is rather bizarre because it measures in squared Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. units. We measured ages, so the variance indicates that the scores deviate from the mean by 4 squared years (whatever that means!). So, the variance is only very roughly analogous to the “average” deviation. The variance is not a waste of time, however, because it is used extensively in statistics. Also, variance does communicate the relative variability of scores. If one sample has S 2X 1 and another has S 2X 3, you know that the first sample is less variable (more consistent) and more accurately described by its mean. 
Further, looking back at Figure 4.1, for the smaller variance you might envision a distribution like Distribution A, while for the larger variance, you would envision one more like Distribution B or C. Thus, think of variance as a number that generally communicates how variable the scores are: The larger the variance, the more the scores are spread out. The measure of variability that more directly communicates the “average of the deviations” is the standard deviation. To compute SX we first compute everything inside the square root sign to get the variance, as we did in Table 4.2. In the age scores the variance was 4. We compute the square root of the variance to find the standard deviation: so SX 2 The standard deviation of the age scores is 2. The standard deviation is as close as we come to the “average of the deviations,” so we interpret our SX of 2 as indicating that the age scores differ from the mean by something like an “average” of 2. Further, the standard deviation uses the same units as the raw scores, so the scores differ from the mean age by an “average” of 2 years. Thus, our younger participants who were below the mean usually missed it by about 2 years. When our older participants were above the mean, they were above it by an “average” of about 2 years. (X X )2 SX C N Notice that the symymbol for the sample standard ard deviation is SX, which is the square root of the symbol bol for the sample variance. © iStockphoto.com/Helder Almeida Standard Deviation THE DEFINING FORMULA FOR THE SAMPLE STANDARD D DEVIATION IS root of the sample variance; interpreted as somewhat like the “average” deviation SX 14 4-3b Understanding the Sample The sample variance is always an unrealistically large number because we square each deviation. A way to solve this problem is to take the square root of the variance. The answer is called the standard deviation. The sample standard deviation is the square root of the sample variance (the square root of the average squared deviation of scores around the mean). Conversely, squaring the standard deviation produces the variance. To create the formula that defines the standard deviation, we simply add the symbol forr the square root to the previous vious defining formula for variance. nce. sample standard deviation (SX ) The square The symbol for the sample standard deviation is SX, and we interpret it as somewhat like the average deviation of scores around the mean. Notice that the standa standard deviation also allows us to envision how spread out the distribution is and, correspondingly, how accurately ac the mean summarizes the t scores. If SX is relatively large, then a large proportion large of scores s are relatively far from tthe mean, which is why they produce pr a large “average” deviation. Therefore, we envision deviatio a relativ relatively wider distribution. If SX is ssmaller, then more often sscores sc ores are close to the mean and Chapter 4: Summarizing Scores with Measures of Variability Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 57 produce a smaller “average.” Then we envision a narrower distribution. 
In fact, there is a mathematical relationship between the standard deviation and the normal curve, so we can precisely describe where most of the scores in a distribution are located. 4-3c The Standard Deviation and Area under the Curve We’ve seen that the mean is the point around which a distribution is located, but “around” can be anywhere from a skinny distribution close to the mean to a wide distribution very spread out from the mean. The standard deviation allows us to quantify “around.” To do this we first find the raw score that is located at “plus 1 standard deviation from the mean” or, in symbols, at 1SX. We also find the score located at “minus 1 standard deviation from the mean,” or at 1SX. For example, say that on a statistics exam the mean is 80 and the standard deviation is 5. The score at 1SX is at 80 5, which is 75. The score at 1SX is at 80 5, which is 85. Figure 4.2 shows about where these scores are located on a normal distribution. As you can see, most of the distribution is between these two scores. In fact, we know precisely how much of the distribution is here because the standard deviation is related to the geometric properties of a normal curve (like the constant Pi is related to the geometric properties of circles). Recall that any “slice” of the normal curve contains an area under the curve and that this area translates into relative frequency, which is the proportion of time that the scores in the slice occur. Because of its shape, about 34% of the area under the normal curve is in the slice between the mean and the score that is 1 standard deviation from the mean. (Technically, it is 34.13%.) So, in Figure 4.2, about 34% of the scores are between 75 and 80, and 34% of the scores are between 80 and 85. Altogether, 68% of the scores are between the scores at 1SX and 1SX from the mean. Conversely, 16% of the scores are in the tail below 75, and 16% are above 85. (If the distribution is only approximately normal, then we expect to see approximately the above percentages.) Thus, it is accurate to say that most of the scores are around the mean of 80 between 75 and 85, because the majority of scores (68%) are here. Approximately 34% of the scores in any normal distribution are between the mean and the score that is 1 standard deviation from the mean. The characteristic bell shape of any normal distribution always places 68% of the distribution between the scores that are 1SX and 1SX from the mean. Look Figure 4.2 back at Figure 4.1 once more. Normal Distribution Showing Scores at Plus or Minus One Standard In Distribution A most scores Deviation are relatively close to the mean. With SX 5, the score of 75 is at − 1SX and the score of 85 is at 1SX. The percentages are the This will produce a small SX that approximate percentages of the scores falling into each portion of the distribution. is, let’s say, 5. Because all scores are relatively close to the mean, X 68% of them will be in the small area between 45 (50 5) and 55 (50 5). However, Distribution B is more spread f 34% 34% out, producing a larger SX (say it’s 7). Because the distribution 16% 16% 68% is wider, the middle 68% is also wider, and mathematically this 75 80 85 0 corresponds to between 43 and – 1SX + 1SX 57. Finally, Distribution C is the most spread out, with the 58 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. 
Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 4. In Sample A, SX 6.82; in Sample B, SX 11.41. Sample A is _________ (more/less) variable and most scores tend to be _________ (closer to/farther from) the mean. 5. If X 10 and SX 2, then 68% of the scores fall between _________ and _________. > Answers 4. less; closer to 5. 8; 12 3. The standard deviation is the square root of the variance. largest SX (let’s say it’s 12). Because this distribution is so wide, the middle 68% is also very wide, spanning scores far from the mean, at 38 and 62. In summary, here is how to describe a distribution: If you know the data form a normal distribution, you can envision its general shape. If you know the mean, you know where the center of the distribution is and what the typical score is. And if you know the standard deviation, you know whether the distribution is relatively wide or narrow, you know the “average” amount that scores deviate from the mean, and you know between which two scores the middle 68% of the distribution lies. 1. SX2 2. SX > Quick Practice > > > 2 X The sample variance (S ) and the sample standard deviation (SX) are the two statistics to use with the mean to describe variability. The variance is the average squared deviation from the mean. The standard deviation is interpreted as the “average” amount that scores deviate from the mean. More Examples For the scores 5, 6, 7, 8, 9, the X 7. The variance (S 2X ) is the average squared deviation of the scores around the mean (here, S 2X 2). The standard deviation is the square root of the variance: Here, SX 1.41, so when participants missed the mean, they were above or below 7 by an “average” of 1.41. Further, in a normal distribution, about 34% of the scores would be between the X and 8.41 (7 1.41). About 34% of the scores would be between the X and 5.59 (7 1.41). For Practice 4-4 THE population standard deviation (sX) POPULATION VARIANCE AND STANDARD DEVIATION The square root of the population variance, or the square root of the average squared deviation of scores around the population mean Recall that our ultimate goal is to describe the population of scores. population variance Sometimes researchers have access to (sX2) The average a population, and then they directly squared deviation of calculate the actual population variscores around the ance and standard deviation. The population mean symbol for the true or actual population standard deviation is sX. (The s is the lowercase Greek letter s, called sigma.) Because the squared standard deviation is the variance, the symbol for the true population variance is sX2. The defining formulas for sX and sX2 are similar to those we saw for a sample: POPULATION STANDARD DEVIATION sX (X m) 2 B N POPULATION VARIANCE sX2 (X m) 2 N 1. The symbol for the sample variance is _________. 2. The symbol for the sample standard deviation is _________. 3. What is the difference between computing the standard deviation and computing the variance? The only novelty here is that we are computing how far each score deviates from the population mean, symbolized by m. 
Otherwise, the population standard deviation and variance tell us the same things about the population that we saw previously for a sample: Both are ways of measuring variability, indicating how much the scores are spread out. Further, we can Chapter 4: Summarizing Scores with Measures of Variability Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 59 interpret the population standard deviation as the “average” deviation The of the scores around m, with 68% of formula for the variance or standard deviation the scores in the population falling involving a final division between the scores that are at 1sX by N, used to describe a and 1sX from m. sample, but that tends to underestimate the Usually you will not have a poppopulation variability ulation of scores available, so you unbiased will not have to compute these paraestimators The meters. However, you will encounter formula for the variance situations where, based on much preor standard deviation vious research, researchers already involving a final division by N 1; calculated know about a population, and so using sample data to the variance or standard deviation is estimate the population given to you. Therefore, learn these variability symbols and use their formulas to understand what each represents. So, to sum up, we’ve seen how to describe the variability in a known sample (using SX2 or SX) and how to describe the variability in a known population (using sX2 or sX). However, we must discuss one other situation: Although the ultimate goal of research is usually to describe the population, often we will not know all of the scores in the population. In such situations, we use our sample data to infer or estimate the variability in the population. population, so we need a random sample of deviations. Yet, when we measure the variability of a sample, the mean is our reference point, so we have the restriction that the sum of the deviations must equal zero. Because of this, not all deviations in the sample are “free” to be random and to reflect the variability found in the population. For example, say that the mean of five scores is 6, and that four of the scores are 1, 5, 7, and 9. The sum of their deviations is 2. The final score can only be 8, because we must have a deviation of 2 so that the sum of all deviations is zero. Because this deviation is determined by the other scores, it is not a random deviation that reflects the variability in the population. Instead, only the deviations produced by the other four scores reflect the variability in the population. The same would be true for any four of the five scores. In fact, out of the N deviations in any sample, only N 1 of them (the N of the sample minus 1) actually reflect the variability in the population. However, if only N 1 of the deviations reflect the population’s variability, then when we get the “average” deviation, we should divide by N 1. The problem with the formulas for the previous biased estimators (SX and SX2 ) is that they divide by N. Because we divide by too large a number, the answer tends to be too small. 
Instead, if we divide by N 1, we compute the unbiased estimators of the population variance and population standard deviation. biased estimators 4-4a Estimating the Population Variance and Standard Deviation 60 Estimated Population Variance sX2 (X X)2 N1 Estimated Population Standard Deviation sX (X X)2 C N1 © joingate/Shutterstock.com We use the variability in a sample to estimate the variability we’d find if we could measure the entire population. However, we do not use the previous formulas for the sample variance and sample standard deviation as the basis for this estimate. Those formulas are used only when describing the variability of a sample. In statistical terminology, the formulas for SX2 and SX are called the biased estimators: When used to estimate the population, they are biased toward underestimating the true population parameters. Such a bias is a problem because, as we saw in the previous chapter, if we cannot be accurate, we at least want our under- and overestimates to cancel out over the long run. (Remember the two statisticians shooting targets?) With the biased estimators, the under- and overestimates do not cancel out. Instead, although the sample variance and sample standard deviation accurately describe a sample, they are too often too small to use as estimates of the population. Here’s why: To accurately estimate a population, we use a random sample. Here we are estimating deviations in the THE DEFINING FORMULAS FOR THE UNBIASED ESTIMATORS OF THE POPULATION VARIANCE AND STANDARD DEVIATION ARE Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Capital S represents the Sample; lowercase s represents ents the population estimate. mate. 4-4b Interpreting thee Estimated Population Variance and Standard Deviation Interpret the estimated population variance and standard deviation in the same way as you did S 2X and SX, except that here we describe what we expect the estimated “average deviation” in the population population variance (s2X) to be, how spread out we expect the The unbiased estimate distribution will be, and how accuof the population rately we expect the population to be variance calculated summarized by m. For example, based from sample data using N 1 on a statistics class with a mean of 80, we’d infer that the population would estimated population have a m of 80. The size of sX2 or sX standard estimates how spread out the popudeviation lation is, so if sX turned out to be 6, (sX) The unbiased estimate of the we’d expect the “average” amount population standard that individual scores deviate from 80 deviation calculated would be about 6. We can also deterfrom sample data using N 1 mine the scores at 1sX and 1sX from m, so we’d expect 68% of the population to score between 74 (80 – 6) and 86 (80 6). Notice that, assuming a sample is representative, we have reached our ultimate goal of describing the population of scores. Further, because these scores reflect behavior, this description gives us a good idea of how most individuals in the population behave in this situation (which is why we conduct research in the first place). 
4-5 SUMMARY OF THE VARIANCE AND STANDARD DEVIATION To keep track of the different statistics you’ve seen, remember that variability refers to the differences between scores, which we describe by computing the variance and standar standard deviation. In each, we are finding the difference be between each score and the mean and then calculating something, more or less, like the average devia deviation. Organize your thinking about the particOrganiz ular measu measures of variability using Figure 4.3. Any standard standa deviation is merely the square root of the th corresponding variance. For either measure, meas compute the descriptive versions wh when the scores are available: When describing how far the scores are desc spread out from X, we use the sample sp vvariance (SX2 ) and the sample standard deviation (SX). When describd iing how far the scores are spread out from m, we use the population o vvariance 1s2X 2 and the population sstandard deviation 1sX 2. When the complete population of scores is th © iStockphoto.com/Imagesbybarbara The first formula above is for the estimated population variance. Notice that it involves the same basic computation we saw in our sample variance: We are finding the amount each score deviates from the mean, which will then form our estimate of how much the scores in the population deviate from m. The only novelty is that in computing the “average” of the squared deviations, we divide by N 1 instead of by N. The second formula above is the defining formula for the estimated population standard deviation. As with our previous formulas, we have simply added the square root symbol to the formula for the variance: The estimated standard deviation is the square root of the estimated variance. Because we have new formulas that produce new statistics, we also have new symbols. The symbol for the unbiased estimated population variance is the lowercase sX2 . The square root of the variance is the standard deviation, so the symbol for the unbiased estimated population standard deviation is sX. To keep your symbols straight, remember that the symbols for the sample involve the capital or big S, and in those formulas you divide by the “big” value of N. The symbols for estimates of the population involve the lowercase or small s, and here you divide by the smaller quantity N 1. Also, think of s 2X and sX as the inferential versions, because the only time you use them is to infer the variance or standard deviation of the population based on a sample. Think of S 2X and SX as the descriptive variance and standard deviation, because they are used to describe the sample. Chapter 4: Summarizing Scores with Measures of Variability Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 
Figure 4.3 Organizational Chart of Descriptive and Inferential Measures of Variability

Describing variability (differences between scores):
• Descriptive measures are used to describe a known sample or population. In their formulas, the final division uses $N$. To describe the sample variance, compute $S_X^2$; taking the square root gives the sample standard deviation, $S_X$. To describe the population variance, compute $\sigma_X^2$; taking the square root gives the population standard deviation, $\sigma_X$.
• Inferential measures are used to estimate the population based on a sample. In their formulas, the final division uses $N - 1$. To estimate the population variance, compute $s_X^2$; taking the square root gives the estimated population standard deviation, $s_X$.

4-6 COMPUTING THE FORMULAS FOR VARIANCE AND STANDARD DEVIATION

The defining formulas we've seen are important because they show that the core computation in any version of the variance and standard deviation is to measure how far the scores are from their mean. However, in everyday use these formulas are very time-consuming and mind-numbing. By reworking the defining formulas, we have less obvious but faster "computing formulas" for describing the sample and for estimating the population. To create these formulas, the symbol for the mean ($\bar{X}$) in the defining formulas is replaced by its formula ($\Sigma X / N$). Then some serious reducing is performed. These formulas involve two new symbols that you must master for later statistics, too. They are:

1. The sum of squared Xs: The symbol $\Sigma X^2$ indicates to find the sum of the squared Xs. To do so, first square each $X$ (each raw score) and then sum (add up) the squared Xs. Thus, to find $\Sigma X^2$ for the scores 2, 2, and 3, add $2^2 + 2^2 + 3^2$, which becomes $4 + 4 + 9$, which equals 17.

2. The squared sum of X: The symbol $(\Sigma X)^2$ indicates to find the squared sum of $X$. To do so, work inside the parentheses first, so find the sum of the $X$ scores. Then square that sum. Thus, to find $(\Sigma X)^2$ for the scores 2, 2, and 3, you have $(2 + 2 + 3)^2$, which is $(7)^2$, which is 49.

sum of squared Xs ($\Sigma X^2$) Calculated by squaring each score in a sample and adding the squared scores

squared sum of X [$(\Sigma X)^2$] Calculated by adding all scores and then squaring their sum

$\Sigma X^2$ indicates the sum of the squared Xs, and $(\Sigma X)^2$ indicates the squared sum of X.
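In code, the difference between the two symbols is just the order of operations; a two-line sketch (ours, not the text's):

```python
scores = [2, 2, 3]

sum_of_squared_xs = sum(x ** 2 for x in scores)  # ΣX²:   4 + 4 + 9 = 17
squared_sum_of_x = sum(scores) ** 2              # (ΣX)²: 7² = 49
```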
4-6a Computing the Sample Variance and Standard Deviation

The formulas here were derived from the defining formulas for describing the sample variance and sample standard deviation, so our final division is by $N$.

THE COMPUTING FORMULA FOR THE SAMPLE VARIANCE IS

$$S_X^2 = \frac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N}$$

This says to first find the sum of $X$ ($\Sigma X$), square that sum, and divide the squared sum by $N$. Then subtract that result from the sum of the squared Xs ($\Sigma X^2$). Finally, divide that quantity by $N$. For example, we can arrange our original age scores as shown in Table 4.3.

Table 4.3 Calculation of Variance Using the Computational Formula

X Score   X²
2         4
3         9
4         16
5         25
6         36
7         49
8         64
ΣX = 35   ΣX² = 203

STEP 1: Find $\Sigma X$, $\Sigma X^2$, and $N$. Here, $\Sigma X$ is 35, $\Sigma X^2$ is 203, and $N$ is 7. Putting these quantities into the formula, we have

$$S_X^2 = \frac{203 - \dfrac{(35)^2}{7}}{7}$$

STEP 2: Compute the squared sum of $X$. Here the squared sum of $X$ is $35^2$, which is 1225, so

$$S_X^2 = \frac{203 - \dfrac{1225}{7}}{7}$$

STEP 3: Divide $(\Sigma X)^2$ by $N$. Here 1225 divided by 7 equals 175, so

$$S_X^2 = \frac{203 - 175}{7}$$

STEP 4: Subtract in the numerator. Because 203 minus 175 equals 28, we have

$$S_X^2 = \frac{28}{7}$$

STEP 5: Divide. After dividing 28 by 7 we have

$$S_X^2 = 4$$

Again, the sample variance for these age scores is 4, and it is interpreted as we discussed previously. Do not read any further until you understand how to work this formula!

Recall that a standard deviation is the square root of the variance, so the computing formula for the sample standard deviation merely adds the square root symbol to the previous formula for the variance.

THE COMPUTING FORMULA FOR THE SAMPLE STANDARD DEVIATION IS

$$S_X = \sqrt{\frac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N}}$$

Again, we'll use the age scores in Table 4.3.

STEP 1: Find $\Sigma X$, $\Sigma X^2$, and $N$. Again, $\Sigma X$ is 35, $\Sigma X^2$ is 203, and $N$ is 7, so

$$S_X = \sqrt{\frac{203 - \dfrac{(35)^2}{7}}{7}}$$

STEPS 2–5: Follow Steps 2–5 described previously for computing the variance. Inside the square root symbol will be the variance, which here is again 4, so

$$S_X = \sqrt{4}$$

STEP 6: Compute the square root: $S_X = 2$.

As we saw originally, the standard deviation of these age scores is 2; interpret it as we did then.

> Quick Practice
> $\Sigma X^2$ indicates to square each score and then find the sum of the squared Xs. $(\Sigma X)^2$ indicates to sum the scores and then find the squared sum of X.

More Examples
For the scores 5, 6, 7, 8, 9, we compute the sample variance as:
1. $\Sigma X = 5 + 6 + 7 + 8 + 9 = 35$; $\Sigma X^2 = 5^2 + 6^2 + 7^2 + 8^2 + 9^2 = 255$; $N = 5$, so
$$S_X^2 = \frac{255 - \dfrac{(35)^2}{5}}{5} = \frac{255 - 245}{5} = 2$$
2. To compute the sample standard deviation, find the above variance and then find its square root: $S_X = \sqrt{2} = 1.41$.

For Practice
For the scores 2, 4, 5, 6, 6, 7:
1. What is $(\Sigma X)^2$?
2. What is $\Sigma X^2$?
3. What is the sample variance?
4. What is the sample standard deviation?

> Answers
1. $(30)^2 = 900$
2. $2^2 + 4^2 + 5^2 + 6^2 + 6^2 + 7^2 = 166$
3. $S_X^2 = (166 - 900/6)/6 = 2.667$
4. $S_X = \sqrt{2.667} = 1.63$
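The computing formula translates directly into code. Here is a small sketch (ours, not part of the text; the helper names are our own) that mirrors Steps 1 through 6 for the age scores in Table 4.3:

```python
import math

def sample_variance(scores):
    # Computing formula: S_X^2 = (ΣX² - (ΣX)²/N) / N
    n = len(scores)                       # STEP 1: find ΣX, ΣX², and N
    sum_x = sum(scores)                   # ΣX  = 35
    sum_x2 = sum(x ** 2 for x in scores)  # ΣX² = 203
    numerator = sum_x2 - sum_x ** 2 / n   # STEPS 2-4: 203 - 1225/7 = 28
    return numerator / n                  # STEP 5: 28 / 7 = 4.0

def sample_std_dev(scores):
    # STEP 6: the standard deviation is the square root of the variance
    return math.sqrt(sample_variance(scores))

ages = [2, 3, 4, 5, 6, 7, 8]
print(sample_variance(ages))  # 4.0
print(sample_std_dev(ages))   # 2.0
```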
4-6b Computing the Estimated Population Variance and Standard Deviation

The only difference between the formulas for estimating the population and the previous formulas for describing the sample is that here, the final division is by $N - 1$.

THE COMPUTING FORMULA FOR THE ESTIMATED POPULATION VARIANCE IS

$$s_X^2 = \frac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N - 1}$$

Notice that in the numerator we still divide by $N$ and we use all scores in the sample. For example, previously we had the age scores of 3, 5, 2, 6, 7, 4, 8. To estimate the population variance, follow the same steps as before. First, find $\Sigma X$ and $\Sigma X^2$. Here, $\Sigma X = 35$ and $\Sigma X^2 = 203$. Also, $N = 7$, so $N - 1 = 6$. Putting these quantities into the formula gives

$$s_X^2 = \frac{203 - \dfrac{(35)^2}{7}}{6}$$

Work through this formula the same way you did for the sample variance: $35^2$ is 1225, and 1225 divided by 7 equals 175, so

$$s_X^2 = \frac{203 - 175}{6}$$

Next, 203 minus 175 equals 28, so

$$s_X^2 = \frac{28}{6}$$

and the final answer is $s_X^2 = 4.67$.

This answer is slightly larger than the sample variance for these scores, which was $S_X^2 = 4$. Although 4 accurately describes the sample variance, we estimate that the variance in the corresponding population is 4.67. In other words, if we could measure all scores in the population and then compute the true population variance, we would expect $\sigma_X^2$ to be 4.67.

The formula for the estimated population standard deviation merely adds the square root sign to the above formula for the variance.

THE COMPUTING FORMULA FOR THE ESTIMATED POPULATION STANDARD DEVIATION IS

$$s_X = \sqrt{\frac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N - 1}}$$

Using our age scores and performing the steps inside the square root sign as we did above produces 4.67. Therefore, $s_X$ is $\sqrt{4.67}$, which is 2.16. If we could compute the standard deviation using the entire population of scores, we would expect $\sigma_X$ to equal 2.16.
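For comparison, a parallel sketch (again ours, not the text's): the only change from the sample version is the final division by N − 1.

```python
import math

def est_population_variance(scores):
    # Computing formula: s_X^2 = (ΣX² - (ΣX)²/N) / (N - 1)
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x ** 2 for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)  # final division by N - 1, not N

ages = [3, 5, 2, 6, 7, 4, 8]
print(est_population_variance(ages))             # 28 / 6 ~ 4.67
print(math.sqrt(est_population_variance(ages)))  # s_X ~ 2.16
```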
4-7 STATISTICS IN THE RESEARCH LITERATURE: REPORTING MEANS AND VARIABILITY

The standard deviation is most often reported in published research because it more directly communicates how consistently close the individual scores are to the mean and because it allows us to determine the middle 68% of the distribution. Thus, the mean from a study might describe the number of times participants exhibited a particular behavior, and a small standard deviation indicates they consistently did so. Or, in a survey, the mean might describe the typical opinion held by participants, but a large standard deviation indicates substantial disagreement among them.

The same approach is used in experiments, in which we compute the mean and standard deviation in each condition. Then each mean indicates the typical score and the score we predict for anyone in that condition. The standard deviation indicates how consistently close the actual scores are to that mean. Or instead, researchers often report the estimated population standard deviation to estimate the variability of scores if everyone in the population were tested under that condition.

You should be aware that the rules for reporting research results (such as those we saw for creating tables and graphs) are part of the guidelines for research publications established by the American Psychological Association (APA). We will also follow this "APA format" or "APA style" when we discuss how to report various statistics. However, research journals that follow APA format do not always use our statistical symbols. Instead (as if you don't already have enough symbols!), the symbol they use for the sample mean is M. The symbol for the standard deviation is SD, and unless otherwise specified, you should assume it is the estimated population version. On the other hand, when a report discusses the true population parameters, our Greek symbols $\mu$ and $\sigma$ are used.

USING SPSS

SPSS will simultaneously compute the mean, median, mode, range, standard deviation, and variance for a sample of data. The SPSS instructions on Review Card 4.4 show you how. The program computes only the unbiased estimators of the variance and standard deviation (our $s_X^2$ and $s_X$).

Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com

STUDY PROBLEMS

(Answers for odd-numbered problems are in Appendix C.)

1. What does a larger measure of variability communicate about: (a) the size of differences among the scores in a distribution? (b) how consistently the participants behaved? (c) how spread out the distribution is?
2. In any research, why is describing the variability important?
3. Thinking back on the previous three chapters, what are the three major pieces of information we need to know to summarize a distribution?
4. What is the difference between what a measure of central tendency tells us and what a measure of variability tells us?
5. (a) What is the range? (b) Why is it not the most accurate measure of variability? (c) When is it primarily used?
6. (a) What is the mathematical definition of the variance? (b) Mathematically, how is a sample's variance related to its standard deviation and vice versa?
7. (a) What do both the variance and the standard deviation tell you about a sample? (b) Which measure will you usually want to compute? Why?
8. Why is the mean a less accurate description of the distribution if the variability is large than if it is small?
9. (a) What do $S_X$, $s_X$, and $\sigma_X$ have in common? (b) How do they differ in their use?
10. (a) What do $S_X^2$, $s_X^2$, and $\sigma_X^2$ have in common? (b) How do they differ in their use?
11. (a) How do we determine the scores that mark the middle 68% of a sample? (b) How do we determine the scores that mark the middle 68% of a known population? (c) How do we estimate the scores that mark the middle 68% of an unknown population?
12. Why are your estimates of the population variance and standard deviation always larger than the corresponding values that describe a sample from that population?
13. In a condition of an experiment, a researcher obtains the following scores: 3 2 1 0 7 4 8 6 6 4. Determine the following: (a) the range, (b) the variance, (c) the standard deviation, (d) the two scores between which 68% of the scores lie.
14. If you could test the entire population in question 13, what would you expect each of the following to be? (a) The typical, most common score; (b) the variance; (c) the standard deviation; (d) the two scores between which 68% of the scores lie.
15. Tiffany has a normal distribution of scores ranging from 2 to 9. (a) She computed the variance to be −2.06.
What should you conclude about this answer, and why? (b) She recomputes the standard deviation to be 18. What should you conclude, and why? (c) If she computed that $S_X = 0$, what would this indicate?
16. From his statistics grades, Demetrius has a $\bar{X} = 60$ and $S_X = 20$. Andrew has $\bar{X} = 60$ and $S_X = 8$. (a) Who is the more inconsistent student, and why? (b) Who is more accurately described as a 60 student, and why? (c) For which student can you more accurately predict the next test score, and why? (d) Who is more likely to do either extremely well or extremely poorly on the next exam, and why?
17. Consider these normally distributed ratio scores from an experiment:

Condition A   Condition B   Condition C
12            33            47
11            33            48
11            34            49
10            31            48

(a) What "measures" should you compute to summarize the experiment? (b) Compute the appropriate descriptive statistics and summarize the relationship in the sample data. (c) How consistent does it appear the participants were in each condition?
18. Say that you conducted the experiment in question 17 on the entire population. (a) Summarize the relationship that you'd expect to observe. (b) Compute how consistently you'd expect participants to behave in each condition.
19. In two studies, the mean is 40, but in Study A, $S_X = 5$, and in Study B, $S_X = 10$. (a) What is the difference in the appearance of the distributions from these studies? (b) Where do you expect the majority of scores to fall in each study?
20. Consider these normally distributed ratio scores from an experiment:

Condition 1   Condition 2   Condition 3
18            8             3
13            11            9
9             6             5

(a) What should you do to summarize the experiment? (b) Summarize the relationship in the sample data. (c) How consistent are the scores in each condition?
21. Say that you conducted the experiment in question 20 on the entire population. (a) Summarize the relationship that you'd expect to observe. (b) How consistently would you expect participants to behave in each condition?
22. (a) What are the symbols for the true population variance and standard deviation? (b) What are the symbols for the biased estimators of the variance and standard deviation? (c) What are the symbols for the unbiased estimators of the variance and standard deviation? (d) When do we use the unbiased estimators? When do we use the biased estimators?
23. For each of the following, indicate the conditions of the independent variable, the scores from which variable to analyze, whether it is appropriate to compute the mean and standard deviation, and the type of graph you would create. (a) We test whether participants laugh longer (in seconds) at jokes told on a sunny or a rainy day. (b) We compare groups who have been alcoholics for 1, 3, or 5 years. In each, we measure participants' income. (c) We count the number of creative ideas produced by participants who slept either 6, 7, or 8 hours the night before.
24. What is a researcher communicating with each of the following statements?
(a) "The line graph of the means was close to flat, although the variability in each condition was quite large." (b) "For the sample of men (M = 14 and SD = 3), we conclude. . . ." (c) "We expect that in the population, the average score is 14 and the standard deviation is 3.5. . . ."

Chapter 5 DESCRIBING DATA WITH z-SCORES AND THE NORMAL CURVE

LOOKING BACK
Be sure you understand:
• From Chapter 2, that relative frequency is the proportion of time that scores occur and that it corresponds to the area under the normal curve, and that a percentile equals the percent of the area under the curve to the left of a score.
• From Chapter 3, what a deviation is and that the larger a deviation, the farther a score is from the mean.
• From Chapter 4, that $S_X$ and $\sigma_X$ indicate the "average" deviation of scores around $\bar{X}$ and $\mu$, respectively.

GOING FORWARD
Your goals in this chapter are to learn:
• What a z-score is.
• What a z-distribution is and how it indicates a score's relative standing.
• How the standard normal curve is used with z-scores to determine relative frequency, simple frequency, and percentile.
• What the sampling distribution of means is and what the standard error of the mean is.
• How to compute z-scores for sample means and then determine their relative frequency.

Sections
5-1 Understanding z-Scores
5-2 Using the z-Distribution to Interpret Scores
5-3 Using the z-Distribution to Compare Different Variables
5-4 Using the z-Distribution to Compute Relative Frequency
5-5 Using z-Scores to Describe Sample Means

In previous chapters we have summarized an entire distribution of scores. In this chapter we'll take a different approach and discuss the statistic to use when we want to interpret an individual score. Here we ask the question "How does any particular score compare to the other scores in a sample or population?" We answer this question by transforming raw scores into "z-scores." In the following sections, we discuss (1) the logic of z-scores and their simple computation, (2) how z-scores are used to evaluate individual raw scores, and (3) how the same logic can be used to evaluate sample means.

5-1 UNDERSTANDING z-SCORES

Researchers transform raw scores into z-scores because we usually don't know how to interpret a raw score: We don't know whether, in nature, a score should be considered high or low, good or bad, or what. Instead, the best we can do is to compare a score to the other scores in the distribution, describing the score's relative standing.
Relative standing reflects the systematic evaluation of a score by comparing it to the sample or population in which the score occurs. The way to determine the relative standing of a score is to transform it into a z-score. Using the z-score, we can easily compare the score to the group of scores, so we'll know whether the individual's underlying raw score is relatively good, bad, or in between.

relative standing A systematic evaluation of a score by comparing it to the sample or population in which it occurs

To see how this is done, say we are conducting a study at Prunepit University in which the first step is to measure the attractiveness of a sample of males. The scores form the normal curve shown in Figure 5.1. We want to interpret these scores, especially those of three men: Chuck, who scored 35; Archie, who scored 65; and Jerome, who scored 90.

Figure 5.1 Frequency Distribution of Attractiveness Scores at Prunepit U. Scores for three individuals (Chuck, 35; Archie, 65; Jerome, 90) are identified on the X axis.

You already know that the way to do this is to use a score's location on the distribution to determine its frequency, relative frequency, and percentile. For example, Chuck's score is far below the mean and has a rather low frequency. Also, the proportion of the area under the curve at his score is small, so his score has a low relative frequency. And because little of the distribution is to the left of (below) his score, he also has a low percentile. On the other hand, Archie is somewhat above the mean, so he is somewhat above the 50th percentile. Also, the height of the curve at his score is large, so his score has a rather high frequency and relative frequency. And then there's Jerome: His score is far above the mean, with a low frequency and relative frequency, and a very high percentile.

The problem with the above descriptions is that they are subjective and imprecise, and to get them we had to look at all scores in the distribution. The way to obtain the above information, but more precisely and without looking at every score, is to compute each man's z-score. Then we can determine exactly where on the distribution a score is located so that we can precisely determine the score's frequency, relative frequency, and percentile.

5-1a Describing a Score's Relative Location as a z-Score

We began the description of each man's score above by noting whether it is above or below the mean. Likewise, our first calculation is to measure how far a raw score is from the mean by computing the score's deviation, which equals $X - \bar{X}$. For example, Jerome's score of 90 deviates from the mean of 60 by $90 - 60 = +30$. A deviation of +30 sounds as if it might be large, but is it? We need a frame of reference. For the entire distribution, only a few scores deviate by as much as Jerome's score, and that makes his an impressively high score.
Thus, a score is impressive if it is far from the mean, and "far" is determined by how frequently other scores deviate from the mean by that amount. Therefore, to interpret a score's location, we must compare its deviation to the other deviations. As you saw in Chapter 4, the standard deviation is interpreted as the "average deviation." By comparing a score's deviation to the standard deviation, we can describe the score in terms of this average deviation. For example, say that in the attractiveness data, the sample standard deviation is 10. Jerome's deviation of +30 is equivalent to 3 standard deviations, so Jerome's raw score is located 3 standard deviations above the mean. His raw score is impressive because it is three times as far above the mean as the "average" amount that scores are above the mean.

By transforming Jerome's deviation into standard deviation units, we have computed his z-score. A z-score indicates the distance a raw score is from the mean when measured in standard deviations. The symbol for a z-score is z.

z-score The statistic that indicates the distance a score is from its mean when measured in standard deviation units

A z-score always has two components: (1) either a positive or a negative sign, which indicates whether the raw score is above or below the mean; and (2) the absolute value of the z-score (ignoring the sign), which indicates how far the score is from the mean in standard deviations. So, because Jerome is above the mean by 3 standard deviations, his z-score is +3. If he had been below the mean by this amount, he would have z = −3.

A z-score indicates how far a raw score is above or below the mean when measured in standard deviations.

Thus, like any score, a z-score is a location on a distribution. However, it also simultaneously communicates the distance it is from the mean. Therefore, knowing that Jerome scored at z = +3 provides us with a frame of reference that we do not have by knowing only that his raw score was 90.

5-1b Computing z-Scores in a Sample or Population

We computed Jerome's z-score by first subtracting the mean from his raw score and then dividing by the standard deviation, so:

THE FORMULA FOR TRANSFORMING A RAW SCORE IN A SAMPLE INTO A z-SCORE IS

$$z = \frac{X - \bar{X}}{S_X}$$

(This is both the defining and the computing formula.)

To find Jerome's z-score,

STEP 1: Determine the $\bar{X}$ and $S_X$. Then, filling in the above formula gives

$$z = \frac{X - \bar{X}}{S_X} = \frac{90 - 60}{10}$$
For example, say that in the population of attractiveness scores, m ⫽ 60 and sX ⫽ 10. Jerome’s raw score of 90 is again a z ⫽ (90 ⫺ 60)/10 ⫽ ⫹3, but now this is his location in the population. STEP 3: Divide and you have z ⫽ ⫹3 Likewise, Archie’s raw score is 65, so z⫽ X ⫺ X 65 ⫺ 60 ⫹5 ⫽ ⫽ ⫽ ⫹.5 SX 10 10 Archie’s raw score is literally one-half of 1 standard deviation above the mean. Notice it is important to always include a positive or a negative sign when computing a z-score. Chuck, for example, has a raw score of 35, so z⫽ Always include a positive or a negative sign when computing a z-score. X ⫺ X 35 ⫺ 60 ⫺25 ⫽ ⫽ ⫽ ⫺2.5 SX 10 10 Here, 35 minus 60 results in a deviation of minus 25, so his z-score is ⫺2.5. This tells us that Chuck’s raw score is 2.5 standard deviations below the mean. Of course, a raw score that equals the mean produces a z-score of 0. Above, our mean is 60, so for an individual’s score of 60, we subtract 60 ⫺ 60, so z ⫽ 0. We can also compute a z-score for a score in a population, if we know the population mean (m) and the true 5-1c Computing a Raw Score When z Is Known © iStockphoto.com/Stephen Rees/© iStockphoto.com/Ines Koleva Chuck is Here. X⫺m sX Sometimes we know a z-score and want to transform it back to the raw score that produced it. For example, say that Leon scored at z ⫽ ⫹1. What is his attractiveness score? With X ⫽ 60 and SX ⫽ 10, his z-score indicates that he is 1 standard deviation above the mean. In other words, he is 10 points above 60, so his raw score is 70. What did we just do? We multiplied his z-score times SX and then added the mean. So, THE FORMULA FOR TRANSFORMING A z-SCORE IN A SAMPLE INTO A RAW SCORE IS X ⫽ (z)(SX) ⫹ X Chapter 5: Describing Data with z-Scores and the Normal Curve Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 71 So, to find Leon’s raw score, z-distribution The distribution produced by transforming all raw scores in a distribution into z-scores STEP 1: Determine the X and SX. Ours were X ⫽ 60 and SX ⫽ 10, so X ⫽ (⫹1)(10) ⫹ 60 STEP 2: Multiply z times SX. This gives: X ⫽ ⫹10 ⫹ 60 STEP 3: Add, and you have More Examples In a sample, X ⫽ 25 and S X ⫽ 5. To find z for X ⫽ 32: z⫽ X ⫺ X 32 ⫺ 25 ⫹7 ⫽ ⫽ ⫽ ⫹1.4 SX 5 5 To find the raw score for z ⫽ ⫺.43: X ⫽ (z)(S X ) ⫹ X ⫽ (⫺.43)(5) ⫹ 25 ⫽ ⫺2.15 ⫹ 25 ⫽ 22.85 X ⫽ 70 Adding a negative number is the same as subtracting its positive value, so X ⫽ 47 Brian’s raw score here is 47. The above logic also applies to finding the raw score for a z from a population, except that we use the symbols for the population. THE FORMULA FOR TRANSFORMING A z-SCORE IN A POPULATION INTO A RAW SCORE IS X ⫽ (z)(sX) ⫹ m Here, we multiply the z-score times the population standard deviation and then add m. > Quick Practice > > 72 A ⫹ z indicates that the raw score is above the mean, a ⫺ z that it is below the mean. The absolute value of z indicates the score’s distance from the mean, measured in standard deviations. With m ⫽ 100 and sX ⫽ 16, 3. What is the z for X ⫽ 132? 4. What X produces z ⫽ ⫹1.4? > Answers 1. z ⫽ (44 ⫺ 50)/10 ⫽ ⫺.60 X ⫽ ⫺13 ⫹ 60 2. What X produces z ⫽ ⫺1.3? 2. 
So, to find Leon's raw score,

STEP 1: Determine the $\bar{X}$ and $S_X$. Ours were $\bar{X} = 60$ and $S_X = 10$, so

$$X = (+1)(10) + 60$$

STEP 2: Multiply z times $S_X$. This gives:

$$X = +10 + 60$$

STEP 3: Add, and you have

$$X = 70$$

The raw score of 70 corresponds to a z of +1. In another case, say that Brian has a z-score of −1.3. Then with $\bar{X} = 60$ and $S_X = 10$,

$$X = (-1.3)(10) + 60$$

so

$$X = -13 + 60$$

Adding a negative number is the same as subtracting its positive value, so $X = 47$. Brian's raw score here is 47.

The above logic also applies to finding the raw score for a z from a population, except that we use the symbols for the population.

THE FORMULA FOR TRANSFORMING A z-SCORE IN A POPULATION INTO A RAW SCORE IS

$$X = (z)(\sigma_X) + \mu$$

Here, we multiply the z-score times the population standard deviation and then add $\mu$.

> Quick Practice
> A +z indicates that the raw score is above the mean; a −z indicates that it is below the mean.
> The absolute value of z indicates the score's distance from the mean, measured in standard deviations.

More Examples
In a sample, $\bar{X} = 25$ and $S_X = 5$. To find z for $X = 32$: $z = (32 - 25)/5 = +7/5 = +1.4$. To find the raw score for $z = -.43$: $X = (-.43)(5) + 25 = -2.15 + 25 = 22.85$.

For Practice
With $\bar{X} = 50$ and $S_X = 10$,
1. What is z for $X = 44$?
2. What X produces $z = -1.3$?
With $\mu = 100$ and $\sigma_X = 16$,
3. What is the z for $X = 132$?
4. What X produces $z = +1.4$?

> Answers
1. $z = (44 - 50)/10 = -.60$
2. $X = (-1.3)(10) + 50 = 37$
3. $z = (132 - 100)/16 = +2$
4. $X = (+1.4)(16) + 100 = 122.4$
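Both transformations are one-liners in code; here is a sketch (our helper names, not the book's), checked against the chapter's examples:

```python
def z_score(x, mean, sd):
    # z = (X - mean) / standard deviation
    return (x - mean) / sd

def raw_score(z, mean, sd):
    # X = (z)(standard deviation) + mean
    return z * sd + mean

print(z_score(90, 60, 10))      # Jerome: +3.0
print(z_score(35, 60, 10))      # Chuck:  -2.5
print(raw_score(+1, 60, 10))    # Leon:   70.0
print(raw_score(-1.3, 60, 10))  # Brian:  47.0
```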
5-2 USING THE z-DISTRIBUTION TO INTERPRET SCORES

The reason that z-scores are so useful is that they directly communicate the relative standing of a raw score within a distribution. The way to see this is to envision any sample or population as a z-distribution. A z-distribution is the distribution produced by transforming all raw scores in the data into z-scores.

z-distribution The distribution produced by transforming all raw scores in a distribution into z-scores

For example, our attractiveness scores produce the z-distribution shown in Figure 5.2. Notice the two ways the X axis is labeled. This shows that by creating a z-distribution, we change only the way that we identify each score. Saying that Jerome has a z of +3 is merely another way of saying that he has a raw score of 90. Because he is still at the same location in the distribution, his z-score has the same frequency, relative frequency, and percentile as his raw score.

Figure 5.2 z-Distribution of Attractiveness Scores at Prunepit U. The labels on the X axis show first the raw scores (25 to 95) and then the z-scores (−3 to +3); Chuck, Archie, and Jerome are marked at their locations.

Because all z-distributions are laid out in the same way, z-scores form a standard way to communicate relative standing. The z-score of 0 always indicates that the raw score equals the mean and is in the center of the distribution (and is also the median and mode). A positive z-score indicates that the z-score (and raw score) is above and graphed to the right of the mean. Positive z-scores become increasingly larger as we look farther to the right. Larger positive z-scores (and their corresponding raw scores) occur less frequently. Conversely, a negative z-score indicates that the z-score (and raw score) is below and graphed to the left of the mean. Because z-scores measure the distance a score is from the mean, negative z-scores become increasingly larger as we look farther to the left. Larger negative z-scores (and their corresponding raw scores) occur less frequently. Notice, however, that a negative z-score is not automatically a bad score. For some variables (e.g., credit card debt), a low raw score is the goal, and so a larger negative z-score is a better score. Also notice that most of the z-scores are between +3 and −3. The symbol "±" means "plus or minus," so we can restate this by saying that most z-scores are between ±3. A z-score near ±3 indicates a raw score that is one of the highest or lowest scores in the distribution and has a very low frequency.

All normal z-distributions are similar because of three important characteristics:

1. A z-distribution always has the same shape as the raw score distribution. When the underlying raw score distribution is normal, its z-distribution is normal.
2. The mean of any z-distribution is 0. Whatever the mean of the raw scores is, it transforms into a z-score of 0. (Also, the average of the positive and negative z-scores is 0.)
3. The standard deviation of any z-distribution is 1. Whether the standard deviation of the raw scores is 10 or 100, a score at that distance from the mean is a distance of 1 when transformed into a z-score, so the "average deviation" is now 1. (Also, if we compute $S_X$ using the z-scores in a z-distribution, the answer will be 1.)

The larger a z-score, whether positive or negative, the farther the corresponding raw score is from the mean, and the less frequently the z-score and raw score occur.

Because all z-distributions are similar, you can determine the relative standing of any raw score by computing its z-score and envisioning a z-distribution like that in Figure 5.2. Then, if the z-score is close to zero, the raw score is near the mean and is a very frequent, common score. A z greater than about ±1 indicates a raw score that is less frequent. The closer the z is to ±3, the farther into a tail it is, and the closer the raw score is to being one of the few highest or lowest scores in the distribution.
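You can verify characteristics 2 and 3 numerically; a small sketch (ours, with made-up scores) z-transforms a set of raw scores:

```python
import statistics

raw = [35, 40, 50, 60, 60, 70, 80, 85]
mean = statistics.mean(raw)   # 60
sd = statistics.pstdev(raw)   # the descriptive S_X

zs = [(x - mean) / sd for x in raw]
print(statistics.mean(zs))    # ~0: the mean of any z-distribution is 0
print(statistics.pstdev(zs))  # 1.0: its standard deviation is 1
```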
Behavioral research usually involves normally distributed scores for which we compute the mean and standard deviation, so computing z-scores and using the z-distribution are usually appropriate. Besides describing relative standing as above, z-scores have two additional uses: (1) comparing scores from different distributions and (2) computing the relative frequency of scores.

5-3 USING THE z-DISTRIBUTION TO COMPARE DIFFERENT VARIABLES

A second important use of z-scores is to compare scores from different variables. Here's a new example. Say that Althea received a grade of 38 on her statistics quiz and a grade of 45 on her English paper. These scores reflect different kinds of tasks, so it's like comparing apples to oranges. The solution is to transform the raw scores from each class into z-scores. Then we can compare Althea's relative standing in English to her relative standing in statistics, and we are no longer comparing apples and oranges. Note: z-scores equate or standardize different distributions, so they are often referred to as standard scores.

Say that for the statistics quiz, the $\bar{X}$ was 30 and the $S_X$ was 5. Althea's grade of 38 becomes z = +1.6. For the English paper, the $\bar{X}$ was 40 and the $S_X$ was 10, so Althea's 45 becomes z = +.5. Althea's z of +1.6 in statistics is farther above the mean than her z of +.5 in English is above the mean, so she performed relatively better in statistics. Say that another student, Millie, obtained raw scores that produced z = −2 in statistics and z = −1 in English. Millie did better in English because her z-score of −1 is less distance below the mean.

To see just how comparable the z-scores from these two classes are, we can plot their z-distributions on the same graph. Figure 5.3 shows the result, with the original raw scores also plotted. (The English curve is taller because of a higher frequency at each score.)

Figure 5.3 Comparison of Distributions for Statistics and English Grades, Plotted on the Same Set of Axes. The X axis shows z-scores from −3 to +3, with statistics raw scores 15 to 45 and English raw scores 10 to 70; Millie (z = −2 in statistics, z = −1 in English) and Althea (z = +.5 in English, z = +1.6 in statistics) are marked.

Although the classes have different distributions of raw scores, the location of each z-score is the same. For example, any normal distribution is centered over its mean. This center is at z = 0, regardless of whether this corresponds to a 30 in statistics or a 40 in English. Also, scores that are $+1S_X$ above their respective means are at z = +1, regardless of whether this corresponds to a 35 in statistics or a 50 in English. Likewise, the raw scores of 40 in statistics and 60 in English are both 2 standard deviations above their respective means, so both are at the same location, called z = +2. And so on: When two raw scores are the same distance in standard deviations from their respective mean, they produce the same z-score and are at the same location in the z-distribution. Using this z-distribution, we can see that Althea scored higher in statistics than in English, but Millie scored higher in English than in statistics.
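Althea's comparison, in code (our sketch):

```python
def z_score(x, mean, sd):
    return (x - mean) / sd

althea_statistics = z_score(38, 30, 5)  # +1.6
althea_english = z_score(45, 40, 10)    # +0.5
# +1.6 > +0.5, so Althea stands relatively higher in statistics
```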
In Figure 5.3, the statistics students having negative z-scores have raw scores between 15 and 30, so the relative frequency of these scores is .50. The English students having negative z-scores have raw scores between 10 and 40, so the relative frequency of these scores is .50. Here’s another example. Recall from Chapter 4 that about 34% of the normal distribution is between the mean and the score that is 1 standard deviation above the mean (at⫹1SX). Now you know that a score at⫹1SX from the mean produces a z-score of ⫹1. Thus, in Figure 5.3, statistics scores between 30 and 35 occur .34 of the time. English scores between 40 and 50 occur .34 of the time. Likewise, we know that 68% of the scores are between the scores as ⫺1S X and ⫹1SX, which translates into between the z-scores of ⫹1 and ⫺1. Thus, in Figure 5.3, 68% of the statistics scores are between 25 and 35, and 68% of the English scores are between 30 and 50. We can also determine the relative frequencies for any other portion of a distribution. To do so, we employ the standard normal curve. 5-4a The Standard Normal Curve Because the relative frequency of a particular z-score is always the same for any normal distribution, we don’t need to draw a different z-distribution for each variable we measure. Instead, we envision one standard curve that, in fact, is called the standard normal curve. The standard normal curve is a perfect normal z-distribution that serves as our model of any approximately normal z-distribution. The idea is that most raw scores produce only an approximately normal z-distribution. However, to simplify things, we operate as if the z-distribution fits this perfect standard normal curve. We use this curve to first determine the relative frequency of particular z-scores. Then, as we did above, we work backward to determine the relative frequency of the corresponding raw scores. This is the relative frequency we would expect if our data formed a perfect normal distribution. Therefore, this approach is most accurate when (1) we have a large sample (or population) of (2) interval or ratio scores that (3) come close to forming a normal distribution. You may compute z-scores using either of our formulas for finding a z-score in a sample or in a population. Then the first step is to determine the relative frequency of the z-scores by looking standard at the area under the standard nornormal curve mal curve. Statisticians have already A perfect normal determined the proportion of the curve that serves as a model of any area under various parts of the curve, approximately normal as shown in Figure 5.4. The numz-distribution bers above the X axis indicate Chapter 5: Describing Data with z-Scores and the Normal Curve Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 75 ©iStockphoto.com/Audrey Roorda the proportion of the total area between the z-scores. The numbers below the X axis indicate the proportion of the total area between the mean and the z-score. (You won’t need to memorize them.) Each proportion is also the relative frequency of the z-scores—and raw scores—located in that part of the curve. 
For example, between a z of 0 and a z of ⫹1 is .3413 of the area under the curve, so about 34% of the scores really are here. Likewise, between z ⫽ ⫺1 and z ⫽ ⫹1 equals .3413 ⫹ .3413, which is .6826, so about 68% of the scores are located here. Or, between the z-scores of ⫹1 and ⫹2 is .1359 of the curve, so scores occur here 13.59% of the time. And combining this with the .3413 that is between the mean and a z of ⫹1 gives a total of .4772 between the mean and z ⫽ ⫹2. Also, we can add together nonadjacent portions. For example, out in the upper tail beyond z ⫽ ⫹2 is .0228 of the curve (because .0215 ⫹ .0013 ⫽ .0228). In the lower tail beyond z ⫽ ⫺2 is also .0228. Adding the two tails gives a total of .0456 of all scores that fall beyond z⫽{2. And so on. (Notice that z-scores beyond ⫹3 or beyond ⫺3 occur only .0013 of the time, which is why the range of z is essentially between {3.) We usually begin with a particular raw score in mind and then compute its z-score. For example, back in our original attractiveness scores, say that Steve has a raw score of 80. With X ⫽ 60 and SX ⫽ 10, we have We can envision Steve’s location as in Figure 5.5. We might first ask what proportion of the scores are expected to fall between the mean and Steve’s score. We saw above that .4772 of the total area falls between the mean and z ⫽ ⫹2. Therefore, we also expect .4772, or 47.72%, of our attractiveness scores to fall between the mean score of 60 and Steve’s score of 80. Conversely, .0228 of the area—and scores—are above his score. We might also ask how many people scored between X ⫺ X 80 ⫺ 60 the mean and Steve’s score. Then we would convert ⫽ ⫽ ⫹2 z⫽ SX 10 relative frequency to simple frequency by multiplying the N of the sample times the relative frequency. Say His z-score is ⫹2. that our N was 1000. If we expect .4772 of all scores to fall between the mean and a z of ⫹2, then Figure 5.4 (.4772)(1000) ⫽ 477.2, so we expect about 477 Proportions of Total Area under the Standard Normal Curve people to have scores The curve is symmetrical: 50% of the scores fall below the mean, and 50% fall above the mean. between the mean and .50 .50 Steve’s score. Mean We can also determine a score’s expected percentile (the percent f of the scores below— graphed to the left of—a score). As in .0013 .0215 .1359 .3413 .3413 .1359 .0215 .0013 Figure 5.5, on a normal z-scores –3 –2 –1 0 +1 +2 +3 distribution the mean .3413 .3413 is the median (the 50th .4772 .4772 percentile). A positive .4987 .4987 z-score is above the mean, so Steve’s score 76 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 5-4b Using the Figure 5.5 z-Table Location of Steve’s Score on the z-Distribution of Attractiveness Scores Steve’s raw score of 80 is a z-score of ⫹2. So far, our examples have involved whole-number .50 .50 Mean z-scores, but with real data, a z-score may contain decimals. To find the prof portion of the total area under the standard normal curve for any two-decimal z-score, look in Table 1 of Appendix B. 
A portion Raw scores 30 35 40 45 50 55 60 65 70 75 80 85 90 of this “z-table” is repro–3 –2 –1 0 +1 +2 +3 z-scores duced in Table 5.1. .50 .4772 .0228 Say that you seek the .9772 area under the curve above or below a z ⫽ ⫹1.63. Steve First, locate the z in column A, labeled “z.” Then move to the right. Column B is labeled “Area between of ⫹2 is above the 50th percentile. In addition, Steve’s the mean and z.” It contains each proportion under the score is above the 47.72% of the scores that fall curve between the mean and the z identified in column A. between the mean and his score. Thus, we add the .50 Here, .4484 of the curve is between the mean and the z of of the scores below the mean to the .4772 of the scores ⫹1.63. This is shown in Figure 5.6. Because the z is posibetween the mean and his score. This gives a total of tive, we place this area between the mean and the z on the .9772 of all scores that are below Steve’s score. We usually round off percentile to a whole number, so Steve’s raw score is at the 98th percentile. Table 5.1 Finally, we can work in the opposite direction to Sample Portion of the z-Table find a raw score at a particular relative frequency or percentile. Say that we seek the score that demarcates the upper .0228 of the distribution. First in terms of z-scores, we see that above a z ⫽ ⫹2 is .0228 of the B B distribution. Then to find the raw score that corresponds to this z, we use the formula for transforming a z-score into a raw score: X ⫽ (z)(SX) ⫹ X. We’ll C C find that above a raw score of 80 is .0228 of the distribution. ⫺z ⫹z X A To determine the relative frequency of raw scores, transform them into z-scores and then use the standard normal curve. z B Area between the mean and z C Area beyond z in the tail 1.60 1.61 1.62 1.63 1.64 1.65 .4452 .4463 .4474 .4484 .4495 .4505 .0548 .0537 .0526 .0516 .0505 .0495 Chapter 5: Describing Data with z-Scores and the Normal Curve Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 77 the relative frequency of scores between his score Distribution Showing the Area under the Curve for z ⫽ ⫹1.63 and the mean, then from and z ⫽ ⫺1.63 column B, we expect .4484 of the scores to be here. For simple freColumn B Column B quency, multiply N times the proportion: If N is 1000, then we expect f Column C Column C (1000)(.4484) ⫽ 448 men to have a z-score and raw score here. For percen.4484 .4484 .0516 .0516 tile, we want the percent z = –1.63 z = +1.63 X of the distribution to the left of his score, so from column C we see that below Anthony’s score is .0516 of the curve, so he right-hand side of the distribution. Next, Column C is is at about the 5th percentile. Finally, if we began by labeled “Area beyond z in the tail.” It contains each proasking what raw score creates either the slice containportion under the curve that is in the tail beyond the ing .0516 or the slice containing .4484 of the curve, z-score. Here, .0516 of the curve is in the right-hand tail we would first find the proportion in column C or of the distribution beyond the z of ⫹1.63 (also shown B, respectively, and then look in column A to find in Figure 5.6). 
the z-score of ⫺1.63. Then we would transform the Notice that the z-table shows no positive or z-score to its corresponding raw score using our previnegative signs. You must decide whether your z ous formula. is positive or negative and place the areas in their Note: If you seek a proportion not in the z-table, appropriate locations. Thus, if we had the negative use the z-score for the proportion that is nearest to z of ⫺1.63, columns B and C would provide the what you seek. Thus, say we seek .2000 in column B. respective areas shown on the left-hand side of FigThe nearest proportion is .1985, so z ⫽ { .52. ure 5.6. If you get confused when using the z-table, Table 5.2 summarizes all of the procedures we look at the drawing of the normal curve at the top have discussed. of the table, as was in Table 5.1. The different slices are labeled to indicate the part of the curve described in each column. Table 5.2 We can also work in the opposite Summary of Steps When Using the z-Table direction, starting with a specific proportion and finding the corresponding If You Seek First, You Should Then You z-score. First, find the proportion in Relative frequency of scores transform X to z find area in column B* column B or C, depending on the area between X and X you seek. Then identify the z-score in Relative frequency of scores transform X to z find area in column C* column A. For example, say that you beyond X in tail seek the z-score that marks off .4484 X that marks a given relative find relative frequency transform z to X of the curve between the mean and z. frequency between X and X in column B Find .4484 in column B of the table, and then, in column A, the z-score X that marks a given relative find relative frequency transform z to X is 1.63. frequency beyond X in tail in column C Use the information from the transform X to z find area in column B Percentile of an X above X z-table as we have done previously. For and add .50 example, say that we examine Anthony’s transform X to z find area in column C Percentile of an X below X raw score, which happens to produce the z of ⫺1.63. This is located on the *To find the simple frequency of the scores, multiply relative frequency times N. far-left side of Figure 5.6. If we seek Figure 5.6 78 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. > > To find the relative frequency of scores above or below a raw score, transform it into a z-score. From the z-table, find the proportion of the area under the curve above or below that z. To find the raw score at a specified relative frequency, find the proportion in the z-table and transform the corresponding z into its raw score. More Examples With X ⫽ 40 and SX ⫽ 4, To find the relative frequency of raw scores above 45: z ⫽ (X ⫺ X)/SX ⫽ (45 ⫺ 40)/4 ⫽ ⫹1.25. Saying “above” indicates "in the upper tail," so from column C the relative frequency is .1056. To find the percentile of the score of 41.5: z ⫽ (41.5 ⫺ 40)/4 ⫽ ⫹.38. Between this positive z and X in column B is .1480. This score is at the 65th percentile because .1480 ⫹ .50 ⫽ .6480 ⫽ .65. 
> Quick Practice
> To find the relative frequency of scores above or below a raw score, transform it into a z-score. From the z-table, find the proportion of the area under the curve above or below that z.
> To find the raw score at a specified relative frequency, find the proportion in the z-table and transform the corresponding z into its raw score.

More Examples
With $\bar{X} = 40$ and $S_X = 4$:
To find the relative frequency of raw scores above 45: $z = (X - \bar{X})/S_X = (45 - 40)/4 = +1.25$. Saying "above" indicates the upper tail, so from column C the relative frequency is .1056.
To find the percentile of the score of 41.5: $z = (41.5 - 40)/4 = +.38$. Between this positive z and $\bar{X}$, in column B, is .1480. This score is at the 65th percentile because .1480 + .50 = .6480 ≈ .65.
To find the proportion below $z = -.38$: "Below" indicates the lower tail, so from column C it is .3520.
To find the raw score at the 65th percentile: .65 − .50 = .15. Then, from column B, the proportion closest to .15 is .1517, so z = +.39. Then $X = (+.39)(4) + 40 = 41.56$.

For Practice
For a sample: $\bar{X} = 65$, $S_X = 12$, and $N = 1000$.
1. What is the relative frequency of scores below 59?
2. What is the percentile of 75?
3. How many scores are between the mean and 70?
4. What raw score delineates the top 3%?

> Answers
1. $z = (59 - 65)/12 = -.50$; "below" is the lower tail, so from column C it is .3085.
2. $z = (75 - 65)/12 = +.83$; between z and the mean, from column B, is .2967. Then .2967 + .50 = .7967 ≈ 80th percentile.
3. $z = (70 - 65)/12 = +.42$; from column B is .1628; (.1628)(1000) gives about 163 scores.
4. The "top" is the upper tail, so from column C the proportion closest to .03 is .0301, with z = +1.88; so $X = (+1.88)(12) + 65 = 87.56$.

5-5 USING z-SCORES TO DESCRIBE SAMPLE MEANS

We can also use the logic of z-scores to describe the relative standing of an entire sample. We do this by computing a z-score for the sample's mean. To see how the procedure works, say that we give a subtest of the Scholastic Aptitude Test (SAT) to a sample of 25 students at Prunepit U. Their mean score is 520. Nationally, the mean of individual SAT scores is 500 (and $\sigma_X$ is 100), so it appears that at least some Prunepit students scored relatively high, pulling the mean to 520. But how do we interpret the performance of the sample as a whole?

The problem is the same as when we examined individual raw scores: Without a frame of reference, we don't know whether a particular sample mean is high, low, or in between. The solution is to evaluate a sample mean by computing its z-score. Previously, a z-score compared a particular raw score to the other raw scores that occur in a particular situation. Now we'll compare our sample mean to the other sample means that occur in a particular situation. However, our discussion must first take a small detour to see how to create a distribution showing these other means. This distribution is called the sampling distribution of means.
To do so, pretend that we record all SAT scores from the population on slips of paper and put them in a large hat. Then we select a sample with the same size N as ours, compute the sample mean, replace the scores in the hat, draw another 25 scores, compute the mean, and so on. We do this an infinite number of times so that we create the population of means. Because the scores selected in each sample will not be identical, not all sample means will be identical. If we then construct a frequency polygon of the different values of X̄ we obtained, we would create a sampling distribution of means. The sampling distribution of means is the frequency distribution of all possible sample means that occur when an infinite number of samples of the same size N are selected from one raw score population. Our SAT sampling distribution of means is shown in Figure 5.7. This is similar to a distribution of raw scores, except that each "score" along the X axis is a sample mean.

Figure 5.7 Sampling Distribution of SAT Means. The X axis shows the different values of X̄ obtained when sampling the SAT population: a normal curve centered on µ = 500, with lower means to the left and higher means to the right.

Of course, in reality we cannot sample the SAT population an "infinite" number of times. However, we know that the sampling distribution would look like Figure 5.7 because of the central limit theorem. The central limit theorem is a statistical principle that defines the shape, the mean, and the standard deviation of a sampling distribution.

central limit theorem: A statistical principle that defines the mean, standard deviation, and shape of a sampling distribution.

From the central limit theorem, we know the following (the simulation sketch after this list illustrates all three points):

1. A sampling distribution is always an approximately normal distribution. Here our sampling distribution is a normal distribution centered around 500. In the right-hand portion of the curve are means above 500, and in the left-hand portion are means below 500. It is a normal distribution for the following reasons: Most scores in the population are close to 500, so most samples will contain scores close to 500, which will produce sample means close to 500. Sometimes, though, just by chance, strange samples will contain primarily scores that are farther below or above 500, and this will produce means that are farther below or above 500 and that occur less frequently. Once in a great while, very unusual samples will occur that result in sample means that deviate greatly from 500.

2. The mean of the sampling distribution equals the mean of the underlying raw score population used to create the sampling distribution. The sampling distribution is the population of all possible sample means, so its mean is symbolized by µ, and it stands for the average sample mean. (That's right: here, µ is the mean of the means!) The µ of our sampling distribution equals 500 because the mean of the underlying raw score population is 500. Because the individual SAT scores are balanced around 500, over the long run, the sample means created from those scores will also be balanced around 500, so the average mean (µ) will equal 500.

3. The standard deviation of the sampling distribution is mathematically related to the standard deviation of the raw score population. As you'll see in a moment, the variability of the raw scores influences the variability of the sample means.
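Although we cannot sample infinitely, a computer can play the hat long enough to watch the central limit theorem operate. Here is a brief simulation sketch of our own; it simply assumes the SAT population is normal with µ = 500 and σX = 100.

    import numpy as np

    rng = np.random.default_rng(seed=0)
    mu, sigma, n = 500, 100, 25

    # Draw 100,000 samples of N = 25 each and keep every sample's mean
    means = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

    print(round(means.mean(), 1))          # close to 500: mu is the mean of the means
    print(round(means.std(), 1))           # close to 20: the spread derived in 5-5b
    print(round((means > 560).mean(), 4))  # means beyond z = +3 are very rare

Rerunning the sketch with a larger N shrinks the spread of the means, previewing the role N plays in the formula of the next section.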
Note: We will always refer to the population of raw scores used to create the sampling distribution as the underlying raw score population. Here, our sampling distribution of SAT means was created from the underlying raw score population of SAT scores.

The µ of the sampling distribution always equals the µ of the underlying raw score population.

The sampling distribution of means is a normal distribution having the same µ as the underlying raw score population used to create it, and it shows all possible sample means that can occur when sampling from that raw score population. The importance of the central limit theorem is that we can describe a sampling distribution without having to infinitely sample a population of raw scores. Therefore, we can create the sampling distribution of means for any raw score population.

Why do we want to see the sampling distribution? We took a small detour, but the original problem was to evaluate our Prunepit mean of 520. Once we envision the distribution back in Figure 5.7, we have a model of the frequency distribution of all sample means that occur when measuring SAT scores. Then we can use this distribution to determine the relative standing of our sample mean. The sampling distribution is a normal distribution, and you already know how to determine the relative standing of any "score" on a normal distribution: We use z-scores. That is, we will determine where our mean of 520 falls on the X axis of this sampling distribution by finding its distance from the µ of the sampling distribution when measured using the standard deviation of the sampling distribution. Then we will know whether our mean is (a) one of the frequent means that are relatively close to the average sample mean that occurs with this underlying population, or (b) one of the higher or lower means that seldom occur with this population. To compute the z-score for a sample mean, we need one more piece of information: the "standard deviation" of the sampling distribution.

5-5b The Standard Error of the Mean

The standard deviation of the sampling distribution of means is called the standard error of the mean. Like a standard deviation, the standard error of the mean can be thought of as the "average" amount that the sample means deviate from the µ of the sampling distribution. That is, in some sampling distributions, the sample means may be very different from one another and deviate greatly from the average sample mean. In other distributions, the means may be very similar and deviate little from µ.

standard error of the mean (σX̄): The standard deviation of the sampling distribution of means.

For now, we'll discuss the true standard error of the mean, as if we had actually computed it using the entire sampling distribution. Its symbol is σX̄. The σ indicates that we are describing a population, and the subscript X̄ indicates it is a population of sample means. The central limit theorem tells us that σX̄ can be found using this formula:

THE FORMULA FOR THE TRUE STANDARD ERROR OF THE MEAN IS
σX̄ = σX/√N

Notice that the formula involves σX: This is the true standard deviation of the underlying raw score population used to create the sampling distribution.
The size of σX̄ depends on the size of σX because more variable raw scores are likely to produce very different samples each time, so their means will differ more (and σX̄ will be larger). Less variable scores will produce more similar samples and means (and σX̄ will be smaller). The size of σX̄ also depends on the size of our N. The larger the N, the more each sample is like the population, so the sample means will be closer to the population mean and to each other (and σX̄ will be smaller). A smaller N allows for more different samples each time (and σX̄ will be larger).

To compute σX̄ for our SAT example:

STEP 1: Identify the σX of the underlying raw score population and the N used to create your sample. For the SAT, the σX is 100, and the N was 25, so
σX̄ = σX/√N = 100/√25

STEP 2: Compute the square root of N. The square root of 25 is 5, so
σX̄ = 100/5

STEP 3: Divide, and we have
σX̄ = 20

This indicates that in our SAT sampling distribution, the individual sample means differ from the µ of 500 by an "average" of 20 points.

The µ of the sampling distribution equals the µ of the underlying raw score population the sample is selected from. The standard error of the mean (σX̄) is the standard deviation of the sampling distribution of means.

Now, at last, we can calculate a z-score for our sample mean.

5-5c Computing a z-Score for a Sample Mean

We use this formula to compute a z-score for a sample mean:

THE FORMULA FOR TRANSFORMING A SAMPLE MEAN INTO A z-SCORE IS
z = (X̄ − µ)/σX̄

In the formula, X̄ is our sample mean, µ is the mean of the sampling distribution (which equals the mean of the underlying raw score population), and σX̄ is the standard error of the mean, which we computed above. The answer is a z-score that indicates how far the sample mean is from the mean of the sampling distribution (µ), measured in standard error units (σX̄). To compute the z-score for our Prunepit sample:

STEP 1: Compute the standard error of the mean (σX̄) as described above, and identify the sample mean and the µ of the sampling distribution. For our data, X̄ = 520, µ = 500, and σX̄ = 20, so we have
z = (X̄ − µ)/σX̄ = (520 − 500)/20

STEP 2: Subtract µ from X̄. Then
z = +20/20

STEP 3: Dividing gives
z = +1

Thus, a sample mean of 520 has a z-score of +1 on the SAT sampling distribution of means that occurs when N is 25.
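In code, the three steps reduce to two lines of arithmetic (a sketch of our own; only the numbers are specific to the SAT example):

    from math import sqrt

    sigma_x, n = 100, 25
    sigma_xbar = sigma_x / sqrt(n)   # Steps 1-3: the standard error, 100/5 = 20

    xbar, mu = 520, 500
    z = (xbar - mu) / sigma_xbar     # (520 - 500)/20
    print(sigma_xbar, z)             # 20.0 1.0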
Here's another example that combines everything we've done, again using our SAT population, where µ = 500 and σX = 100. Say that over at Podunk College, a sample of 25 SAT scores produced a mean of 460. To find their z-score:

1. First, compute the standard error of the mean (σX̄):
σX̄ = σX/√N = 100/√25 = 100/5 = 20

2. Then find z:
z = (X̄ − µ)/σX̄ = (460 − 500)/20 = −40/20 = −2

The Podunk sample has a z-score of −2 on the sampling distribution of SAT means. Everything we said previously about a z-score for an individual score applies to a z-score for a sample mean. So, because our original Prunepit mean has a z-score of +1, we know that it is above the µ of the sampling distribution by an amount equal to the "average" amount that sample means deviate above µ. Our Podunk sample, however, has a z-score of −2, so its mean is relatively low compared to other means that occur in this situation.

5-5d Describing the Relative Frequency of Sample Means

And here's the nifty part: Because a sampling distribution is always an approximately normal distribution, transforming all of the sample means in the sampling distribution into z-scores produces a normal z-distribution. Recall that the standard normal curve is our model of any normal z-distribution. Therefore, as we did previously with raw scores, we can use the standard normal curve and z-table to describe the relative frequency of sample means in any part of a sampling distribution.

Figure 5.8 Proportions of the Standard Normal Curve Applied to the Sampling Distribution of SAT Means. The curve is centered on µ with .50 of the area on each side; the slices between whole z-scores contain .3413, .1359, and .0215 of the means, with .0013 beyond z = ±3. SAT means of 440 through 560 line up with z-scores of −3 through +3.

Figure 5.8 shows the standard normal curve applied to our SAT sampling distribution. These are the same proportions that we used to describe individual raw scores. Here, however, each proportion is the expected relative frequency of the sample means that occur in this situation. For example, the sample mean of 520 from Prunepit U has a z of +1. As shown, and as in column B of the z-table, .3413 of all scores fall between the mean and a z of +1 on any normal distribution. Therefore, .3413 of all sample means are expected to fall here, so we expect .3413 of all SAT sample means to be between 500 and 520 (when N is 25). Or, for sample means above our sample mean, from column C of the z-table, above a z of +1 is .1587 of the distribution. Therefore, we expect that .1587 of all SAT sample means will be above 520.

On the other hand, the Podunk sample mean was 460, producing a z of −2. From column B of the z-table, a total of .4772 of the distribution falls between the mean and this z-score. Therefore, we expect .4772 of SAT means to be between 500 and 460. From column C, we expect only .0228 of the means to be below 460. We can use this same procedure to describe sample means from any normally distributed variable.
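Because the proportions in Figure 5.8 are simply areas of the standard normal curve, they can be recovered directly, with scipy's cumulative normal standing in for the z-table (a supplementary check of our own):

    from scipy.stats import norm

    # Area of each slice of the sampling distribution between whole z-scores
    cuts = [-3, -2, -1, 0, 1, 2, 3]
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        print(f"z {lo:+d} to {hi:+d}: {norm.cdf(hi) - norm.cdf(lo):.4f}")
    # prints .0215, .1359, .3413, .3413, .1359, .0215; each tail beyond
    # z = +/-3 holds the remaining .0013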
To be honest, though, researchers do not often compute the z-score for a sample mean solely to determine relative frequency (nor does SPSS include this routine). However, it is extremely important that you understand this procedure, because it is the essence of all upcoming inferential statistical procedures (and you'll definitely be doing those!).

5-5e Summary of Describing a Sample Mean with a z-Score

To describe a sample mean from any underlying raw score population:
1. Create the sampling distribution of means, with a µ equal to the µ of the underlying raw score population.
2. Compute the z-score for the sample mean:
   a. Using the σX of the underlying raw score population and your sample N, compute the standard error of the mean, σX̄.
   b. Compute z, finding how far your X̄ is from the µ of the sampling distribution, measured in standard error units.
3. Use the z-table to determine the relative frequency of z-scores above or below this z-score, which is the relative frequency of sample means above or below your mean.

> Quick Practice
> We can describe a sample mean by computing its z-score and using the z-table to determine the relative frequency of sample means above or below it.
> Apply the standard normal curve model and the z-table to any sampling distribution.

More Examples
On a test, µ = 100, σX = 16, and our N = 64. What proportion of sample means will be above X̄ = 103? First, compute the standard error of the mean (σX̄): σX̄ = σX/√N = 16/√64 = 16/8 = 2. Next compute z: z = (X̄ − µ)/σX̄ = (103 − 100)/2 = +3/2 = +1.5. Finally, examine the z-table: The area above this z is the upper tail of the distribution, so from column C is .0668. This is the proportion of sample means expected to be above a mean of 103.

For Practice
A population of raw scores has µ = 75 and σX = 22; our N = 100 and X̄ = 80.
1. The µ of the sampling distribution here equals _____.
2. The symbol for the standard error of the mean is _____, and here it equals _____.
3. The z-score for a sample mean of 80 is _____.
4. How often will sample means between 75 and 80 occur in this situation?

> Answers
1. 75
2. σX̄; 22/√100 = 2.2
3. z = (80 − 75)/2.2 = +2.27
4. From column B: .4884 of the time

USING SPSS
As described on Review Card 5.4, SPSS will simultaneously transform an entire sample of raw scores into z-scores. It does not, however, provide the information found in the z-table.
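The summary steps also condense into one short function. This sketch is our own (the function name is invented for illustration); SPSS stops at the z-scores, but scipy's normal-curve routines supply what the z-table would:

    from math import sqrt
    from scipy.stats import norm

    def relfreq_means_above(xbar, mu, sigma_x, n):
        """Steps 2 and 3: z-score the sample mean, then take the upper-tail area."""
        sigma_xbar = sigma_x / sqrt(n)  # Step 2a: the standard error of the mean
        z = (xbar - mu) / sigma_xbar    # Step 2b: distance from mu in error units
        return norm.sf(z)               # Step 3: area beyond z (the table's column C)

    # The More Examples problem above: mu = 100, sigma_x = 16, N = 64
    print(round(relfreq_means_above(103, 100, 16, 64), 4))  # 0.0668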
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com

STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)

1. (a) What does a z-score indicate? (b) What are the three major uses of z-scores with individuals' scores?
2. What two factors determine the size of a z-score?
3. (a) What is a z-distribution? (b) Is a z-distribution always a normal distribution? (c) What is the mean and standard deviation of any z-distribution?
4. Why are z-scores called "standard scores"?
5. (a) What is the standard normal curve? (b) How is it used to describe raw scores? (c) The standard normal curve is most appropriate when raw scores have what characteristics?
6. An instructor says that your test grade produced a very large positive z-score. (a) How well did you do on the test? (b) What do you know about your raw score's relative frequency? (c) What does it mean if you scored at the 80th percentile? (d) What distribution would the instructor examine to make the conclusion in part c?
7. An exam produced scores with X̄ = 86 and SX = 12. (a) What is the z-score for a raw score of 80? (b) What is the z-score for a raw score of 98? (c) What is the raw score for z = −1.5? (d) What is the raw score for z = +1?
8. Another exam produced raw scores with a sample mean of 25 and standard deviation of 2. Find the following: (a) the z-score for X = 31; (b) the z-score for X = 18; (c) the raw score for z = −2.5; (d) the raw score for z = +.5.
9. Which z-score in each of the following pairs corresponds to the lower raw score? (a) z = +1.0 or z = +2.3; (b) z = −2.8 or z = −1.7; (c) z = −.70 or z = +.20; (d) z = 0 or z = +1.4.
10. For each pair in question 9, which z-score has the higher frequency?
11. What are the steps for using the z-table to find: (a) the relative frequency of raw scores in a specified slice of a distribution? (b) the percentile for a raw score below the mean? (c) the percentile for a raw score above the mean? (d) the raw score that cuts off a specified relative frequency or percentile?
12. In a normal distribution, what proportion of all scores would fall into each of the following areas? (a) between the mean and z = +1.89; (b) below z = −2.30; (c) between z = −1.25 and z = +2.75; (d) above z = +1.96 and below −1.96.
13. For a distribution, X̄ = 100, SX = 16, and N = 500. (a) What is the relative frequency of scores between 76 and the mean? (b) How many participants are expected to score between 76 and the mean? (c) What is the percentile of someone scoring 76? (d) How many participants are expected to score above 76?
14. (a) What is a sampling distribution of means? (b) How do we use it? (c) What do we mean by the "underlying raw score population"?
15. (a) What three things does the central limit theorem tell us about the sampling distribution of means? (b) Why is the central limit theorem useful when we want to describe a sample mean?
16. What is the standard error of the mean, and what does it indicate?
17. In an English class, Emily earned a 76 (with X̄ = 85, SX = 10). Her friend Amber in French class earned a 60 (with X̄ = 50, SX = 4). Should Emily be bragging about how much better she did? Why?
18. What are the steps for finding the relative frequency of sample means above or below a specified mean?
19. Derrick received a 55 on a biology test (with X̄ = 50) and a 45 on a philosophy test (with X̄ = 50). He is considering whether or not to ask his professor to curve the grades using z-scores. (a) Does he want the SX to be large or small in biology? Why?
(b) Does he want the SX to be large or small in philosophy? Why?
20. It seems that everyone I meet claims to have an IQ above 145, and often above 160. I know that most IQ tests produce a normal distribution with a µ at about 100 and a σX of about 15. Why do I doubt their claims?
21. Students may be classified as having a math dysfunction—and not have to take statistics—if they score below the 25th percentile on a diagnostic test. The µ of the test is 75 and σX = 10. Approximately what raw score is the cutoff score needed to avoid taking statistics?
22. For the diagnostic test in problem 21, we want to create the sampling distribution of means when N = 64. (a) What does this distribution show? (b) What is the shape of the distribution and what is its µ? (c) Calculate σX̄ for this distribution. (d) What is your answer in part c called, and what information does it provide? (e) Determine the relative frequency of sample means above 77.
23. Candice has two job offers and must decide which one to accept. The job in City A pays $43,000, and the average cost of living is $45,000, with a standard deviation of $15,000. The job in City B pays $46,000, but the average cost of living is $50,000, with a standard deviation of $18,000. Assuming salaries are normally distributed, which is the better job offer? Why?
24. Suppose you own shares of a company's stock. Over the past 10 trading days, its mean selling price has been $14.89. For the history of the company, the average price of the stock has been $10.43 (with σX = $5.60). You wonder if the mean selling price for the next 10 days can be expected to get much higher. Should you wait to sell, or should you sell now?
25. A researcher develops a test for identifying intellectually gifted children, with a µ of 56 and a σX of 8. (a) What percentage of children are expected to score below 60? (b) What percentage of the scores will be below 54? (c) A gifted child is defined as being in the top 20%. What is the minimum test score needed to qualify as gifted?
26. Using the test in question 25, you measure 64 children, obtaining a X̄ of 57.28. Dexter says that because this X̄ is so close to the µ of 56, this sample is rather average. (a) Perform the appropriate procedure to evaluate this mean. (b) Decide if Dexter's assertion is correct by using percent to describe this mean's relative standing.
Chapter 6: USING PROBABILITY TO MAKE DECISIONS ABOUT DATA

LOOKING BACK
Be sure you understand:
• From Chapter 1, the logic of using a sample to draw inferences about the population.
• From Chapter 2, that relative frequency is the proportion of time that scores occur.
• From Chapter 5, how to compute a z-score for raw scores or sample means, and how to determine their relative frequency using the standard normal curve and z-table.

GOING FORWARD
Your goals in this chapter are to learn:
• What probability is.
• How random sampling should produce a representative sample.
• How to compute the probability of raw scores and sample means using z-scores.
• How sampling error may produce an unrepresentative sample.
• How to use a sampling distribution of means to decide whether a sample represents a particular population.

Sections
6-1 Understanding Probability
6-2 Probability Distributions
6-3 Obtaining Probability from the Standard Normal Curve
6-4 Random Sampling and Sampling Error
6-5 Deciding Whether a Sample Represents a Population

You now know most of the common descriptive statistics used in behavioral research. Therefore, you are ready to begin learning the other type of statistical procedure, called inferential statistics. Recall that these procedures are used to draw inferences from sample data about the scores and relationship found in nature—in what we call the population. This chapter sets the foundation for these procedures by introducing you to the "wonderful" world of probability. Don't worry, though, because the discussion is rather simple, and there is little in the way of formulas. However, you do need to understand the basics. In the following sections we'll discuss (1) what probability is, (2) how to determine probability using the normal curve, and (3) how to use probability to draw conclusions about a sample mean.

6-1 UNDERSTANDING PROBABILITY

Probability is used to describe random or chance events. By random we mean that nature is being fair, with no bias toward one event over another (no rigged roulette wheels or loaded dice). In statistical terminology, a random event that does occur in a given situation is our sample. The larger collection of all possible events that might occur in this situation is the population. Thus, the sample could be drawing a particular playing card from the population of all cards in the deck, or, when tossing a coin, the sample is the sequence of heads and tails we see from the population of all possible combinations of heads and tails. In research, the sample is the particular group of individuals selected from the population of individuals we are interested in. Because probability deals only with random events, we compute probability only for samples that are obtained through random sampling.
Random sampling involves selecting a sample in such a way that all events or individuals in the population have an equal chance of being selected. Thus, in research, random sampling is anything akin to drawing participants' names from a large hat that contains all names in the population. A particular sample occurs or does not occur solely because of the luck of the draw.

random sampling: Selecting samples so that all members of the population have the same chance of being selected.

But how can we describe an event that occurs only by chance? By paying attention to how often the event occurs over the long run. Intuitively, we use this logic all the time. If event A happens frequently over the long run, we think it is likely to happen again now, and we say that it has a high probability. If event B happens infrequently over the long run, we think that it is unlikely to happen now, and we say that it has a low probability.

Using our terminology, when we discuss events occurring "over the long run," we are talking about how often they occur in the population of all possible events. When we decide that one event happens frequently in the population, we are making a relative judgment and describing the event's relative frequency. This is the proportion of time that the event occurs out of all events that might occur in the population. This is also the event's probability. The probability of an event is equal to the event's relative frequency in the population of possible events that can occur. The symbol for probability is p.

probability (p): The likelihood of an event when a population is randomly sampled; equal to the event's relative frequency in the population.

Probability is essentially a system for expressing our confidence that a particular random event will occur. First we assume that an event's past relative frequency will continue over the long run into the future. Then we express our confidence that the event will occur in any single sample by using a number between 0 and 1 to express this relative frequency as a probability. For example, I am a rotten typist and I randomly make typos 80% of the time. This means that in the population of my typing, typos occur with a relative frequency of .80. We expect the relative frequency of typos to continue at a rate of .80 in anything else I type. This expected relative frequency is expressed as a probability, so the probability is .80 that I will make a typo when I type the next woid.

Likewise, all probabilities communicate our confidence in an event. So if event A has a relative frequency of zero in a particular situation, then p = 0. This means that we do not expect A to occur in this situation because it never does. If A has a relative frequency of .10 in this situation, then it has a probability of .10: Because it occurs only 10% of the time in the population, we have some—but not much—confidence that A will occur in the next sample. On the other hand, if A has a probability of .95, we are confident that it will occur: It occurs 95% of the time in this situation, so we expect it to occur in 95% of our samples. Therefore, our confidence is .95 that it will occur now, so we say p = .95. At the most extreme, an event's relative frequency can be 1: It is 100% of the population, so p = 1. Here we are positive it will occur because it always occurs. An event cannot happen less than 0% of the time nor more than 100% of the time, so a probability can never be less than 0 or greater than 1. Also, all events together constitute 100% of the population. This means that the probabilities of all events must add up to 1. So, if the probability of my making a typo is .80, then because 1 − .80 = .20, the probability is .20 that a word will be error free.

Finally, understand that except when p equals either 0 or 1, we are never certain that an event will or will not occur. The probability of an event is its relative frequency over the long run (in the population). It is up to chance—luck—whether the event occurs in our sample. So, even though I make typos 80% of the time, I may go for quite a while without making one. That 20% of the time when I make no typos has to occur sometime. Thus, it is only over the long run that we expect to see precisely 80% typos.

People who fail to understand that probability implies over the long run fall victim to what psychologists call the "gambler's fallacy." For example, after observing my errorless typing for a while, the fallacy would be thinking that errors "must" occur now, essentially concluding that errors have become more likely. Or, say we're flipping a coin and get seven heads in a row. The fallacy would be thinking that a head is now less likely to occur, because it's already occurred too often (as if the coin decides, "Hold it. That's enough heads for a while!"). The mistake of the gambler's fallacy is failing to recognize that the probability of an event is not altered by whether or not the event occurs over the short run: Probability is determined by what happens over the long run.
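A short simulation makes "over the long run" concrete (our own sketch; the .80 typo rate is just the running example). Short stretches wander, the long run settles at .80, and, contrary to the gambler's fallacy, a streak of error-free words leaves the next word's chances unchanged:

    import numpy as np

    rng = np.random.default_rng(seed=1)
    typos = rng.random(100_000) < 0.80          # True = typo, word by word

    print(round(typos[:20].mean(), 2))          # a short run can stray from .80
    print(round(typos.mean(), 3))               # the long run hugs .80

    # After three error-free words in a row, the next word is still a typo
    # with p = .80 -- the coin (or typist) keeps no memory of the streak
    clean3 = ~typos[:-3] & ~typos[1:-2] & ~typos[2:-1]
    print(round(typos[3:][clean3].mean(), 3))   # still about .80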
6-2 PROBABILITY DISTRIBUTIONS

To compute the probability of an event, we need only determine its relative frequency in the population. When we know the relative frequency of every event in a population, we have a probability distribution. A probability distribution indicates the probability of all possible events in a population.

probability distribution: The probability of every event in a population.

One way to create a probability distribution is to observe the relative frequency of events, creating an empirical probability distribution. Typically, however, we cannot observe the entire population, so the probability distribution is based on the observed frequencies of events in a sample, which are used to represent the population. For example, say that Dr. Fraud is sometimes very cranky, and his crankiness is random. We observe him on 18 days and he is cranky on 6 of them.
Relative frequency equals f/N, so the relative frequency of his crankiness is 6/18, or .33. We expect that he will continue to be cranky 33% of the time, so the probability that he will be cranky today is p = .33. Conversely, he was not cranky on 12 of the 18 days, which is .67. Thus, p = .67 that he will not be cranky today. Because his cranky days plus his noncranky days constitute all possible events, we have the complete probability distribution for his crankiness.

Another way to create a probability distribution is to devise a theoretical probability distribution, which is based on how we assume nature distributes events in the population. From such a model, we determine the expected relative frequency of each event in the population, which is then the probability of each event. For example, consider tossing a coin. We assume that nature has no bias toward heads or tails, so over the long run we expect the relative frequency of heads to be .50 and the relative frequency of tails to be .50. Thus, we have a theoretical probability distribution for coin tosses: The probability of a head on any toss is p = .50 and the probability of a tail is p = .50. Or, consider drawing a playing card from a deck of 52 cards. Because there is no bias in favor of any card, we expect each card to occur at a rate of once out of every 52 draws over the long run. Thus, each card has a relative frequency of 1/52, or .0192, so the probability of drawing any specific card on a single draw is p = .0192.

Finally, if your state's lottery says you have a 1 in 17 million chance of winning, it is because there are 17 million different number combinations to select from. Assuming no bias, we expect to draw all 17 million combinations equally often over the long run. Therefore, because we'll draw your selection once out of every 17 million draws, your chance of winning today is 1 in 17 million. (Also, to the lottery machine, there is nothing special about a sequence like "1, 2, 3, 4," so it has the same probability of being selected as a sequence that looks more random, like "3, 9, 1, 6.")

And that is the logic of probability: We devise a probability distribution based on the relative frequency of each event in the population. An event's relative frequency equals its probability of occurring in a particular sample.

> Quick Practice
> An event's probability equals its relative frequency in the population.
> A probability distribution indicates all probabilities for a population.

More Examples
One hundred raffle tickets have been sold. Assuming no bias, each should be selected at a rate of 1 out of 100 draws over the long run. Therefore, the probability that you hold the winning ticket is p = 1/100 = .01.

For Practice
1. The probability of any event equals its ______ in the ______.
2. Probability applies only to what kinds of events?
3. If 25 people are in your class, what is the probability the professor will randomly call on you?
4. Tossing a coin 10 times produces 10 heads. What is the p of getting a head on the next toss?

> Answers
1. relative frequency; population
2. random
3. p = 1/25 = .04
4. p = .50
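Both kinds of distribution are simple to represent in code (an illustrative sketch of our own; Dr. Fraud's 18 days and the 52-card deck are the text's examples):

    # Empirical: observed relative frequencies (f/N) stand in for the population
    observed = {"cranky": 6, "not cranky": 12}
    n = sum(observed.values())
    empirical = {event: f / n for event, f in observed.items()}
    print(empirical)                     # cranky: 0.33..., not cranky: 0.67...

    # Theoretical: assume nature is unbiased, so all 52 cards are equally likely
    p_any_card = 1 / 52
    print(round(p_any_card, 4))          # 0.0192
    print(round(52 * p_any_card, 4))     # all probabilities together add up to 1.0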
6-3 OBTAINING PROBABILITY FROM THE STANDARD NORMAL CURVE

Enough about flipping coins and winning the lottery. Now that you understand probability, you are ready to see how researchers determine the probability of data. Here, our theoretical probability distribution is the standard normal curve. You learned in Chapter 5 that the way to use the standard normal curve is to first compute a z-score to identify the scores in a part of the curve. Then from the z-table, we determine the proportion of the area under the curve for those scores. This proportion is also the relative frequency of those scores. But now you've learned in this chapter that the relative frequency of an event is its probability. Therefore, by finding the proportion of the area under the curve for particular scores, we also find the probability of those scores.

For example, let's say we have a normal distribution of scores that we collected as part of a national study. Say it has a mean of 60. Also say that we want to know the probability of randomly selecting a score below 60 (as if we were reaching into a big hat containing all scores from this population). Because 60 is the mean and we seek all scores below it, we are talking about the raw scores that produce negative z-scores. In the standard normal curve, negative z-scores constitute 50% of the area under the curve, so they occur .50 of the time in the population. Therefore, the probability is .50 that we will randomly select a negative z-score. Because negative z-scores correspond to raw scores below 60, the probability is also .50 that we will select a raw score below 60. And, because ultimately these scores are produced by participants, the probability is also .50 that we will randomly select a participant with such a score.

In truth, researchers seldom use the above procedure to determine the probability of individual scores. However, they do use this procedure as part of inferential statistics to determine the probability of sample means.

6-3a Finding the Probability of Sample Means

We can compute the probability of obtaining particular sample means by using a sampling distribution of means, which is another theoretical probability distribution. Recall that a sampling distribution is the frequency distribution of all possible sample means that occur when a particular raw score population is sampled an infinite number of times using a particular N. For example, one last time let's look at SAT scores. We begin with the population of students who have taken an SAT subtest. By collecting their scores, we have the SAT raw score population. Say its µ is 500 and σX is 100. The sampling distribution shows us what to expect if we were to infinitely sample this underlying SAT raw score population. Say our N is 25. Then, it is as if we randomly selected 25 scores (by reaching into a large hat again), computed X̄, returned the scores to the hat, drew another 25 scores and computed X̄, and so on. Figure 6.1 shows the resulting SAT sampling distribution.

Figure 6.1 Sampling Distribution of SAT Means When N = 25. A normal curve centered on µ = 500; sample means of 440 through 560 line up with z-scores of −3 through +3, with .3413 of the curve between the mean and z = +1 and .0228 beyond z = ±2.

Recognize that the different values of X̄ occur here simply because of the luck of the draw of which scores are in the sample each time. Sometimes a sample mean higher than 500 occurs because, by chance, the sample contains predominantly high scores. At other times, a sample mean lower than 500 occurs because, by chance, the sample contains predominantly low scores. Thus, the sampling distribution provides a picture of how often different sample means occur simply because of random chance. The sampling distribution is useful because, without actually sampling the underlying raw score population, we can see all of the means that occur, and we can determine the probability of randomly obtaining any particular means. (Now it is like we are reaching into a large hat containing all of the sample means.)
Therefore, the probability is .50 that we will randomly select a negative z-score. Because negative z-scores correspond to raw scores below 60, the probability is also .50 that we will select a raw score below 60. And, because ultimately these scores are produced by participants, the probability is also .50 that we will randomly select a participant Figure 6.1 with such a score. Sampling Distribution of SAT Means When N ⫽ 25 In truth, researchµ ers seldom use the above procedure to determine the probability of individual f .3413 scores. However, they .0228 do use this procedure as part of inferential statistics to determine Sample means 440 460 480 500 z-scores –3 –2 –1 0 the probability of sample means. 92 .0228 520 +1 540 +2 560 +3 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. However, understand how we phrase this: We are finding the probability of obtaining particular sample means when we do draw a random sample from our underlying raw score population. For example, we know that the most likely sample mean is 500 when we are sampling our SAT raw score population, because that’s the mean that happens most often in our sampling distribution. Or, we know that 50% of the time, we’ll get an SAT mean below 500, because the sampling distribution shows us that’s how often those means occur when sampling this raw score population. We can also be more specific. Say we seek the probability that a mean will be between 500 and 520. To find this probability, first we transform the sample mean of 520 into a z-score. As in Chapter 5, the first step is to compute the standard error of the mean. Recall that its formula is sX ⫽ sX 1N With N ⫽ 25 and sX ⫽ 100, we have sX ⫽ sX 1N ⫽ 100 100 ⫽ 20 ⫽ 5 125 Next we compute the sample mean’s z-score using the formula z⫽ X ⫺m sX So we have z⫽ X ⫺ m 520 ⫺ 500 ⫹20 ⫽ ⫽ ⫽ ⫹1 sX 20 20 ©iStockphoto.com/Jesus Jauregui/ImagePixel Now we have rephrased the question to seek the probability that a z-score will be between the mean and z ⫽ ⫹1. Next, we use the z-table to find the proportion of the area under the normal no curve. Back in Figure 6.1 we see that (from column B of the th z-table) the relative frequency of z-scores between the mean and z ⫽ ⫹1 is .3413. Therefore, we know that .3413 of the time, tim we’ll get a mean between 500 5 and 520 when we are sa sampling our underlying SAT ra raw score population, because the sampling b distribution shows us that’s how often those means occur. So, when we select a sample of 25 scores from the underlying SAT raw score population, the probability is .3413 that the sample mean will be between 500 and 520. And here’s the most important part: Our underlying population of raw scores also reflects a population of students who have taken the SAT. When we select a sample of scores it is the same as selecting a sample of students who have those scores. Therefore, the probability of randomly selecting a particular sample mean is also the probability of randomly selecting a sample of participants whose scores produce that mean. 
Thus, we’ve determined that if we randomly select 25 participants from the population of students who have taken the SAT, the probability is .3413 that their sample mean will between 500 and 520. Here’s another example: Say we seek the probability of obtaining SAT sample means above 540. As in the right-hand tail of Figure 6.1, a mean of 540 has a z-score of ⫹2. As shown (in column C of the z-table), the relative frequency of z-scores beyond this z is .0228. Therefore, the probability is .0228 that we will select a sample whose SAT scores produce a mean higher than 540. Finally, say we seek the probability of means that are either above 540 or below 460. This translates into seeking z-scores beyond a z of {2. In Figure 6.1, beyond z ⫽ ⫹2 in the right-hand tail is .0228 of the curve, and beyond z ⫽ ⫺2 in the left-hand tail is also .0228 of the curve. When we talk about one area of the distribution or another area, we add the two areas together. Therefore, a total of .0456 of the curve contains z-scores beyond {2, so the probability is .0456 that we’ll obtain a mean above 540 or below 460. PROBABILITY OF SELECTING A PARTICULAR SAMPLE MEAN IS THE THE SAME AS THE PROBABILITY OF RANDOMLY SELECTING A SAMPLE OF PARTICIPANTS WHOSE SCORES PRODUCE THAT MEAN. Chapter 6: Using Probability to Make Decisions about Data Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 93 For Practice 1. With m ⫽ 500, sX ⫽ 100, and N ⫽ 25, what is the probability of selecting a X above 530? 2. Approximately what is the probability of selecting an SAT sample mean having a z-score between {1? 3. For some raw scores, if m ⫽ 100, are we more likely to obtain a sample mean close to 100 or a mean very different from 100? 4. The farther sample means are into the tail of the sampling distribution, the lower/higher their probability. > Answers 4. lower Before proceeding, be sure you understand how a sampling distribution indicates the probability of samwhich the individuals and scores accurately ple means. In particular, look again reflect the individuals at Figure 6.1 and see what happens and scores in the when means have a larger z-score that population places them farther into the tail of the sampling distribution: The height of the curve above the means decreases, indicating that they occur less often. Therefore, a sample mean having a larger z-score is less likely to occur when we are dealing with the underlying raw score population. For example, in Figure 6.1, a mean of 560 has a z-score of ⫹3, indicating that we are very unlikely to select a sample of students from the SAT population that has such a high mean. representative sample A sample in 3. A mean close to 100 is more likely. 2. With 68% of the distribution here, p ⫽ .68. 1. sX ⫽ 100/ 225 ⫽ 20; z ⫽ (530 ⫺ 500)/20 ⫽ ⫹1.5; p ⫽ .0668 The larger the absolute value of a sample mean’s z-score, the less likely the mean is to occur when samples are drawn from the underlying raw score population. > Quick Practice > > To find the probability of particular sample means, envision the sampling distribution, compute the z-score, and apply the z-table. 
Sample means farther into the tail of the sampling distribution are less likely. More Examples In a population, m ⫽ 35 and sX ⫽ 8. What is the probability of obtaining a sample (N ⫽ 16) with a mean above X ⫽ 38.3?, first compute the standard error of the mean: sX ⫽ sX / 2N ⫽ 8/ 216 ⫽ 2. Then z ⫽ ( x ⫺ m)/sX ⫽ (38.3 ⫺ 35)/2 ⫽ ⫹1.65. The means above 38.3 are in the upper tail of the distribution, so from column C of the z-table, sample means above 38.3 have a p ⫽ .0495. 94 6-4 RANDOM SAMPLING AND SAMPLING ERROR Now that you can compute the probability of sample means, you are ready to begin learning about inferential statistics. The first step is to understand why researchers need such procedures. Recall that in research, we select a sample of participants from the population we wish to describe. Then we want to conclude that the way our sample behaves is the way the population would behave, if we could observe it. We summarize this by using our sample’s mean to estimate the population’s mean. However, it is at this step that researchers need inferential statistics because there is no guarantee that the sample accurately reflects the population. In other words, we can never be certain that a sample is representative of the population. In a representative sample, the characteristics of the individuals and scores in the sample accurately reflect the characteristics of the individuals and scores in the population. For example, if 55% of a population is women, then a representative sample has 55% women so that we’ll have the correct mix of male and female scores. Or, if for some reason 20% of the population scored 475 on the SAT, then a representative sample has 20% scoring at 475. And so on. To put it simply, a representative sample is a miniature version of the population. More importantly, when Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Wayne HUTCHINSON/Alamy 4FR/E+/Getty Images we have a representative sample, our statistics will also be accurate: If the m in the SAT population is 500, then the X in a representative sample will be 500. The reason researchers select participants using random sampling is to produce a representative sample. A random sample should be representative because, by being unselective when choosing participants, we allow the characteristics of the population to occur naturally in the sample, the way they occur in the population. Thus, when 55% of the population is women, a random sample should also have 55% women, because that is how often we should encounter women when randomly selecting participants. In the same way, random sampling should produce a sample having all of the characteristics of the population. At least we hope it works that way! A random sample should be representative, but nothing forces this to occur. The problem is that just by the luck of the draw, a sample may be unrepresentative, having characteristics that do not match those of the population. 
So, for example, 55% of a population may be women but, by chance, we might select substantially more or fewer women, and then the sample would not have the correct mix of male and female scores. Or, 20% of the population may score at 475, but simply through the luck of who is selected, this score may occur more or less often in our sample. The problem is that if the sample is different from the population, then our statistics will be inaccurate: Although the m in the population may be 500, the sample mean will be some other number. Thus, any sample may be unrepresentative of the population from which it is selected. Because this is such a serious problem, we have a name for it—we say the sample reflects sampling error. Sampling error occurs when random chance produces an sampling unrepresentative sample, with the error When random chance result that a sample statistic (such produces an as X) is not equal to the population unrepresentative parameter it represents (such as m). In sample from a population, with plain English, because of the luck of the result that the the draw, a sample may contain too sample’s statistic is many high scores or too many low different from the scores relative to the population, so population parameter it represents the sample is in error to some degree in representing the population. Sampling error is a central problem for researchers and is the reason you must understand inferential statistics. As you’ll see, in research we will always be able to specify one known population that our sample may be representing. The dilemma for researchers occurs when our X is different from this population’s m: Maybe this is because of sampling error, and we have an unrepresentative sample from this population. Or, maybe this is because we have a representative sample but from some other population. A sample may poorly represent one population because of sampling error, or it may accurately represent some other population. Chapter 6: Using Probability to Make Decisions about Data Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 95 1. Maybe we have sampling error. rror. Perhaps we obtained a sample ple of relatively high SAT scoress simply because of the luck of the draw of who was selected for or our sample. Thus, maybe chancee produced an unrepresentative sample and, although it doesn’t look like it, we aree still representing the ordinary SAT population ation where m equals 500. 2. Maybe the sample does not represent this population. After all, these are Prunepit students, so maybe they are not part of the ordinary SAT population. Instead, they might belong to a different population, one having a m that is not 500. Maybe, for example, they belong to a population where m is 550, and our sample perfectly represents this population. The solution to this dilemma is to use inferential statistics to make a decision about the population being represented by our sample. The next chapter puts all of this into a research context, but in the following sections we will examine the basics for making this decision. 
6-5 DECIDING WHETHER A SAMPLE REPRESENTS A POPULATION We deal with the possibility of sampling error in this way: Because we rely on random sampling, how representative a sample is depends on random chance— the luck of the draw of which individuals and scores are selected. Therefore, we can determine whether 96 ou sample is likely to come our from and represent a particular fro population. If the sample is likely pop to occur when that population is sampled, then we decide that the sam sample does represent that popusam lation. If our sample is unlikely to lation occur wh when that population is sampled, then we de decide that the sample does not represent that population, and instead represents some other population. Here’s a simple example. You come across a paragraph of someone’s typing, but you don’t know whose. Is it mine? D Does it represent the population of my typing? t Say that the paragraph contains zero typos. It’s possible that contain some q quirk of chance produced such an un unrepresentative sample, but it’s not likely: l I type errorless words only 20% of the time, so the probability that I could produce an entire errorless paragraph is extremely error small. Thus, because such a sample small is unlikely to come from the population of my typing, you should conclude that the sample represents the population of another, competent typist where such a sample is more likely. On the other hand, say that there are typos in 78% of the words in the paragraph. This is consistent with what you would expect if the sample represents my typing, but with a little sampling error. Although I make typos 80% of the time over the long run, you should not expect precisely 80% typos in every sample. Rather, a sample with 78% errors seems likely to occur simply by chance when the population of my typing is sampled. Thus, you can accept that this paragraph represents my typing, although somewhat poorly. We use the same logic to decide if our Prunepit sample represents the ordinary population of SAT scores where m is 500: We will determine the probability of obtaining a sample mean of 550 from this population. As you’ve seen, we determine the probability of a sample mean by computing its z-score on the sampling distribution of means. Thus, we first envision the sampling distribution created from the ordinary SAT population. This is shown in Figure 6.2. The next step is to calculate the z-score for our sample mean so that we can locate it on this distribution and thus determine its likelihood. ©iStockphoto.com/Eliza Snow For example, say that we return to Prunepit University and in a random sample obtain a mean SAT score of 550. This is surprising, rising, because the ordinary national population of SAT scores has a m of 500. 00. Therefore, we should have obtained d a sample mean of 500 if our sample was perfectly representative of this population. n. How do we explain a sample mean of 550? In every research study, we will use inferential statistics to decide between the following two opposing posing explanations for why a sample mean is different rent from a particular population mean. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. 
[Figure 6.2: Sampling Distribution of SAT Means Showing Two Possible Locations of Our Sample Mean. The distribution is centered at μ = 500; location A lies near the center, and location B lies far into the upper tail.]

When evaluating a mean's z-score, we will use this logic: Even if we are representing the population where μ is 500, we should not expect a perfectly representative sample having a X̄ of exactly 500.000. . . . (Think how unlikely that is!) Instead, if our sample represents this population, then the sample mean should be close to 500. For example, say that the z-score for our sample mean is at location A in Figure 6.2. Read what the frequency distribution indicates by following the dotted line. Remember, the sampling distribution shows what to expect if we are sampling from the underlying SAT raw score population. We see that this mean has a relatively high frequency and so is very likely in this situation. In other words, when we are representing this population, often a sample will be unrepresentative to this extent, containing slightly higher scores than occur in the population. Thus, this is exactly the kind of mean we'd expect if our Prunepit sample came from this SAT population and we had a little sampling error. Therefore, using statistical terminology, we say we retain the idea that our sample probably comes from and represents the ordinary SAT population, accepting that the difference between our X̄ and μ reflects sampling error.

However, say that instead, our sample has a z-score at location B in Figure 6.2: Following the dashed line shows that this is a very infrequent and unlikely mean. Seldom is a sample so unrepresentative of the ordinary SAT population that it will produce this mean. In other words, our sample is unlikely to be representing this population, because this mean almost never happens with this population. Therefore, we say that we reject that our sample represents this population, rejecting that the difference between the X̄ and μ reflects sampling error. Instead, it makes more sense to conclude that the sample represents some other raw score population that has some other μ, where this sample mean is more likely. We would make the same decision for a sample mean in the extreme lower tail of the sampling distribution.

THIS IS THE LOGIC USED IN ALL INFERENTIAL PROCEDURES, SO BE SURE THAT YOU UNDERSTAND IT: WE WILL ALWAYS BEGIN WITH A KNOWN, UNDERLYING RAW SCORE POPULATION THAT A SAMPLE MAY OR MAY NOT REPRESENT. FROM THE UNDERLYING RAW SCORE POPULATION, WE ENVISION THE SAMPLING DISTRIBUTION OF MEANS THAT WOULD BE PRODUCED. THEN WE DETERMINE THE LOCATION OF OUR SAMPLE MEAN ON THE SAMPLING DISTRIBUTION. THE FARTHER INTO THE TAIL OF THE SAMPLING DISTRIBUTION THE SAMPLE MEAN IS, THE LESS LIKELY THAT THE SAMPLE COMES FROM AND REPRESENTS THE UNDERLYING RAW SCORE POPULATION THAT WE BEGAN WITH.
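In code, locating a sample mean on the sampling distribution amounts to computing its z-score and asking how often a mean at least that extreme occurs. This minimal Python sketch uses the SAT values from the running example; the mean of 505 standing in for location A is our own assumption:

    from statistics import NormalDist

    mu, sigma, n = 500, 100, 25        # ordinary SAT population; N as in the example
    sem = sigma / n**0.5               # standard error of the mean = 20

    def locate(sample_mean):
        z = (sample_mean - mu) / sem   # z-score on the sampling distribution
        p = 2 * (1 - NormalDist().cdf(abs(z)))  # chance of a mean at least
        return z, p                             # this far from mu, either tail

    print(locate(505))   # like location A: z = +0.25, p about .80 (very likely)
    print(locate(550))   # like location B: z = +2.50, p about .01 (very unlikely)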
> Quick Practice

> If the z-score shows the sample mean is unlikely in the sampling distribution, reject that the sample is merely poorly representing the underlying raw score population.
> If the z-score shows that the sample mean is likely in the sampling distribution, accept that the sample is representing the underlying raw score population, although somewhat poorly.

More Examples
On a sampling distribution created from body weights in the United States, a sample's mean produces a z = +5.00! This indicates that such a mean is extremely unlikely when representing this population, so we reject that our sample represents this population. However, say that another mean produced a z = −.2. Such a mean is close to μ and very likely, so our sample is likely to be representing this population, although with some sampling error.

For Practice
1. ______ communicates that a sample mean is different from the μ it represents.
2. Sampling error occurs because of ______.
3. A sample mean has a z = +1 on the sampling distribution created from the population of psychology majors. Is this likely to be a sample of psychology majors?
4. A sample mean has a z = −4 on the previous sampling distribution. Is this likely to be a sample of psychology majors?

> Answers
1. sampling error 2. random chance 3. yes 4. no

6-5a Setting Up the Sampling Distribution

To decide if our Prunepit sample represents the ordinary SAT population where μ = 500, we must perform two tasks: (1) Determine the probability of obtaining our sample from this population, and (2) decide whether this probability makes the sample unlikely to be representing this population. We perform both tasks simultaneously by setting up the sampling distribution.

We formalize the decision process in this way: At some point, a sample mean is so far above or below 500 that it is unbelievable that chance produced such an unrepresentative sample. Any samples beyond this point that are farther into the tail are even more unbelievable. To identify this point, as shown in Figure 6.3, we literally draw a line in each tail of the distribution. In statistical terms, the shaded areas beyond the lines make up the region of rejection. As shown, very infrequently are samples so poor at representing the SAT population that they have means in the region of rejection. Thus, the region of rejection is the part of a sampling distribution containing means that are so unlikely that we reject that they represent the underlying raw score population.

Essentially, we "shouldn't" get a sample mean that lies in the region of rejection if we're representing the ordinary SAT population, because such means almost never occur with this population. Therefore, if we do get such a mean, we probably aren't representing this population: We reject that our sample represents the underlying raw score population and decide that the sample represents some other population.

Samples that have means in the region of rejection are so unrepresentative of the underlying raw score population that it's a better bet they represent some other population.

Conversely, if our Prunepit mean is not in the region of rejection, then our sample is not unlikely to be representing the ordinary SAT population. In fact, by our definition, samples not in the region of rejection are likely to represent this population. In such cases we retain the idea that our sample is simply poorly representing this population of SAT scores.
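A short simulation shows why means in the region of rejection are rare when the sample really does come from the underlying population. This sketch assumes a normally distributed SAT-like population (our simplification):

    import random

    random.seed(2)
    mu, sigma, n = 500, 100, 25
    sem = sigma / n**0.5                     # 20

    # Count how often a random sample mean from the mu = 500 population lands
    # beyond z = +/-1.96, i.e., in the two-tailed region of rejection.
    trials, extreme = 10_000, 0
    for _ in range(trials):
        x_bar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if abs((x_bar - mu) / sem) > 1.96:
            extreme += 1
    print(extreme / trials)                  # close to .05, the criterion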
[Figure 6.3: Setup of Sampling Distribution of Means Showing the Region of Rejection. The distribution is centered at μ = 500; the shaded region of rejection in each tail equals 2.5%, covering sample means far less than 500 and far greater than 500.]

region of rejection: The extreme tails of a sampling distribution containing those sample means considered unlikely to be representing the underlying raw score population.

criterion: The probability that defines whether a sample is unlikely to represent the underlying raw score population.

critical value: The score that marks the inner edge of the region of rejection in a sampling distribution; values that fall beyond it lie in the region of rejection.

How do we know where to draw the line that starts the region of rejection? By defining our criterion. The criterion is the probability that defines samples as unlikely to be representing the raw score population. Researchers usually use .05 as their criterion probability. (You'll see why in Chapter 7.) Thus, using this criterion, those sample means that together occur only 5% of the time are defined as so unlikely that if we get any one of them, we'll reject that our sample represents the underlying raw score population.

Our criterion then determines the size of our region of rejection. In Figure 6.3, the sample means that occur 5% of the time are those that make up the extreme 5% of the sampling distribution. However, we're talking about means above or below 500 that together are a total of 5% of the curve. Therefore, we divide the 5% in half so the extreme 2.5% of the sampling distribution will form our region of rejection in each tail.

The criterion probability that defines samples as unlikely—and also determines the size of the region of rejection—is usually p = .05.

Now the task is to determine if our sample mean falls into the region of rejection. To do this, we compare the sample's z-score to the critical value.

6-5b Identifying the Critical Value

There is a specific z-score at the spot on the sampling distribution where the line marks the beginning of the region of rejection. Because z-scores get larger as we go farther into the tails, if the z-score for our sample is larger than the z-score at the line, then our sample mean lies in the region of rejection. The z-score at the line is called the critical value. A critical value marks the inner edge of the region of rejection and thus identifies the value required for a sample to fall into the region of rejection. Essentially, it is the minimum z-score that defines a sample as too unlikely.

How do we determine the critical value? By considering our criterion. With a criterion of .05, the region of rejection in each tail is the extreme .025 of the total area under the curve. From column C in the z-table, the extreme .025 lies beyond the z-score of 1.96. Therefore, in each tail, the region of rejection begins at 1.96, so ±1.96 is the critical value of z. Thus, as shown in Figure 6.4, labeling the inner edges of the region of rejection with ±1.96 completes how you should set up the sampling distribution.
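The ±1.96 cutoff can be recovered from the criterion itself rather than looked up in the z-table. A minimal sketch using Python's standard library:

    from statistics import NormalDist

    alpha = 0.05                                  # the usual criterion
    # Two-tailed: split alpha between the tails, then find the z-score that
    # leaves alpha/2 in the upper tail (inv_cdf inverts the normal CDF).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    print(round(z_crit, 2))                       # 1.96 -> critical values +/-1.96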
(Note: In the next chapter, using both tails like this is called a two-tailed test.)

We'll use Figure 6.4 to determine whether our Prunepit mean lies in the region of rejection by comparing our sample's z-score to the critical value. A sample mean lies in the region of rejection if its z-score is beyond the critical value. Thus, if our Prunepit mean has a z-score that is larger than ±1.96, then the sample lies in the region of rejection. If the z-score is smaller than or equal to the critical value, then the sample is not in the region of rejection.

[Figure 6.4: Setup of Sampling Distribution of SAT Means Showing Region of Rejection and Critical Values. Sample means run from 440 to 560 (z-scores −3 to +3) around μ = 500; the region of rejection in each tail equals 2.5%, beginning at the critical values of −1.96 and +1.96.]

When a sample's z-score is beyond the critical value, reject that the sample represents the underlying raw score population. When the z-score is not beyond the critical value, retain the idea that the sample represents the underlying raw score population.

6-5c Deciding Whether the Sample Represents a Population

Finally, we can evaluate our sample mean of 550 from Prunepit U. First, we compute the sample's z-score on the sampling distribution created from the ordinary SAT raw score population. There, σX = 100 and N = 25, so the standard error of the mean is

σX̄ = σX/√N = 100/√25 = 20

Then the z-score is

z = (X̄ − μ)/σX̄ = (550 − 500)/20 = +2.5

To complete the procedure, we compare the sample's z-score to the critical value to determine where the sample mean is on the sampling distribution. As shown in Figure 6.5, our sample's z of +2.5—and the underlying sample mean of 550—lies in the right-hand region of rejection. This tells us that a sample mean of 550 is among those means that are extremely unlikely to occur when the sample represents the ordinary population of SAT scores. In other words, very seldom does chance—the luck of the draw—produce such unrepresentative samples from this population, so it is not a good bet that chance produced our sample from this population. Therefore, we reject that our sample represents the population of SAT raw scores having a μ of 500. Notice that we make a definitive, yes-or-no decision. Because our sample is unlikely to represent the SAT raw score population where μ is 500, we decide that no, it does not represent this population.

We wrap up our conclusions in this way: If the sample does not represent the ordinary SAT population, then it must represent some other population. For example, perhaps the Prunepit students obtained the high mean of 550 because they lied about their scores, so they may represent the population of students who lie about the SAT.
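Putting the pieces together, the entire decision for the Prunepit mean fits in a few lines. This is an illustrative sketch of the computation just shown, not a statistical package:

    mu, sigma, n = 500, 100, 25      # ordinary SAT population and our sample size
    x_bar = 550                      # the Prunepit sample mean
    z_crit = 1.96                    # two-tailed critical value for the .05 criterion

    sem = sigma / n**0.5             # standard error: 100 / 5 = 20
    z_obt = (x_bar - mu) / sem       # (550 - 500) / 20 = +2.5

    if abs(z_obt) > z_crit:          # beyond the critical value?
        print(f"z = {z_obt:+.2f}: in the region of rejection; reject")
    else:
        print(f"z = {z_obt:+.2f}: not in the region of rejection; retain")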
Regardless, we use the sample mean to estimate the μ of the population that the sample does represent. A sample having a mean of 550 is most likely to come from a population having a μ of 550. Therefore, our best estimate is that the Prunepit sample represents a population of SAT scores that has a μ of 550.

[Figure 6.5: Completed Sampling Distribution of SAT Means Showing Location of the Prunepit U Sample Relative to the Critical Value. Sample means run from 440 to 560 (z-scores −3 to +3) around μ = 500, with critical values at −1.96 and +1.96; the mean of 474 falls near the center, while the Prunepit mean of 550 (z = +2.5) falls in the right-hand region of rejection.]

On the other hand, say that our sample mean had been 474, resulting in a z-score of (474 − 500)/20 = −1.30. Because −1.30 does not lie beyond the critical value of ±1.96, this sample mean is not in the region of rejection. Look at Figure 6.5, and see that the sample mean of 474 is a relatively frequent and thus likely mean. Therefore, we know that chance will often produce such a sample from the ordinary SAT population, so it is a good bet that chance produced our sample from this population. Because of this, we can accept that random chance produced a less than perfectly representative sample for us but that it probably represents the SAT population where μ is 500.

6-5d Summary of How to Decide If a Sample Represents the Underlying Raw Score Population

The basic question answered by all inferential statistical procedures is "Does the sample represent a particular raw score population?" To answer this (the steps are sketched in code after this list):

1. Set up the sampling distribution. Draw the sampling distribution of means with a μ equal to the μ of the underlying raw score population. Select the criterion probability (usually .05), locate the region of rejection, and determine the critical value (±1.96 in a two-tailed test).

2. Compute the sample mean and its z-score.
   a. Compute the standard error of the mean, σX̄.
   b. Compute z using X̄ and the μ of the sampling distribution.

3. Compare the sample's z-score to the critical value. If the sample's z is beyond the critical value, it is in the region of rejection: Reject that the sample represents the underlying raw score population. If the sample's z is not beyond the critical value, do not reject that the sample represents the underlying population.
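The three steps translate directly into code. A minimal two-tailed sketch (the function name is ours):

    def represents_population(x_bar, mu, sigma, n, z_crit=1.96):
        """Two-tailed decision: does the sample mean represent the population?"""
        sem = sigma / n**0.5          # step 2a: standard error of the mean
        z = (x_bar - mu) / sem        # step 2b: z on the sampling distribution
        decision = "reject" if abs(z) > z_crit else "retain"  # step 3
        return z, decision

    print(represents_population(550, 500, 100, 25))   # (2.5, 'reject')
    print(represents_population(474, 500, 100, 25))   # (-1.3, 'retain')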
6-5e Other Ways to Set Up the Sampling Distribution

Previously, we placed the region of rejection in both tails of the distribution because we wanted to identify unrepresentative sample means that were either too far above or too far below 500. Instead, we can place the region of rejection in only one tail of the distribution. (In the next chapter you'll see why you would want to use this one-tailed test.)

Say that we are interested only in those SAT means that are less than 500, having negative z-scores. Our criterion is still .05, but now we place the entire region of rejection in the lower, left-hand tail of the sampling distribution, as shown in part (a) of Figure 6.6. This produces a different critical value. The extreme lower 5% of a distribution lies beyond the critical value of −1.645. Therefore, the z-score for our sample must lie beyond −1.645 for it to be in the region of rejection. If it does, we will again conclude that the sample mean is so unlikely to occur when sampling the SAT raw score population that we'll reject that our sample represents this population. If the z-score is anywhere else on the sampling distribution, even far into the upper tail, we will not reject that the sample represents the population where μ = 500.

On the other hand, say that we're interested only in those sample means greater than 500, having positive z-scores. Then we place the entire region of rejection in the upper, right-hand tail of the sampling distribution, as shown in part (b) of Figure 6.6. Now the critical value is plus 1.645, so only if our sample's z-score is beyond +1.645 does the sample mean lie in the region of rejection. Then we reject that our sample represents the underlying raw score population.

[Figure 6.6: Setup of SAT Sampling Distribution to Test (a) Negative z-Scores and (b) Positive z-Scores. Both panels show sample means from 440 to 560 (z-scores −3 to +3). In (a) the entire 5% region of rejection is in the lower tail, beyond the critical value of −1.645; in (b) it is in the upper tail, beyond the critical value of +1.645.]

> Quick Practice

To decide if a sample represents a particular raw score population, compute the sample mean's z-score and compare it to the critical value on the sampling distribution.

More Examples
A sample of SAT scores (N = 25) produces X̄ = 460. Does the sample represent the SAT population where μ = 500 and σX = 100? Compute z: σX̄ = σX/√N = 100/√25 = 20; z = (X̄ − μ)/σX̄ = (460 − 500)/20 = −2. With a criterion of .05 and the region of rejection in both tails, the critical value is ±1.96. The sampling distribution is like Figure 6.5. The z of −2 is beyond −1.96, so it is in the region of rejection. Conclusion: The sample does not represent this SAT population.

For Practice
1. The region of rejection contains those samples considered to be ______ to represent the underlying raw score population.
2. The ______ defines the z-score that is required for a sample to be in the region of rejection.
3. For a sample to be in the region of rejection, its z-score must be ______ the critical value.
4. On a test, μ = 60 and σX = 18. A sample (N = 100) produces X̄ = 65. Using the .05 criterion and both tails, does this sample represent this population?

> Answers
1. unlikely
2. critical value
3. larger than (beyond)
4. σX̄ = 18/√100 = 1.80; z = (65 − 60)/1.80 = +2.78. This z is beyond ±1.96, so reject that the sample represents this population; it's likely to represent the population with μ = 65.
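The one-tailed critical values, and the arithmetic in For Practice item 4 above, can both be verified with the standard library:

    from statistics import NormalDist

    alpha = 0.05
    # One-tailed: the entire 5% sits in a single tail, so the cutoff moves
    # inward from 1.96 to about 1.645.
    print(round(NormalDist().inv_cdf(alpha), 3))      # -1.645 (lower-tail test)
    print(round(NormalDist().inv_cdf(1 - alpha), 3))  # +1.645 (upper-tail test)

    # Checking For Practice item 4 above (two-tailed):
    sem = 18 / 100**0.5                               # 1.80
    print(round((65 - 60) / sem, 2))                  # +2.78, beyond +/-1.96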
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com.

STUDY PROBLEMS

(Answers for odd-numbered questions are in Appendix C.)

1. (a) What does probability convey about an event's occurrence? (b) What is the probability of a random event based on?
2. What is random sampling?
3. In a sample with a mean of 46 and a standard deviation of 8, what is the probability of randomly selecting (a) a score above 64? (b) a score between 40 and 50?
4. The population from which the sample in problem 3 was randomly drawn has a μ of 51 and a σX of 14. What is the probability of obtaining a random sample of 25 scores with a mean (a) below 46? (b) below 46 or above 56?
5. David's uncle is building a house on land that has been devastated by hurricanes 160 times in the past 200 years. However, there hasn't been a major storm there in 13 years, so his uncle says this is a safe investment. David argues that he is wrong because a hurricane must be due soon. What is wrong with the reasoning of both men?
6. Four airplanes from different airlines have crashed in the past two weeks. This terrifies Megan, who must travel on a plane. Her travel agent claims that the probability of a plane crash is minuscule. Who is correct and why?
7. What does the term "sampling error" indicate?
8. When testing the representativeness of a sample mean: (a) What is the criterion probability? (b) What is the region of rejection? (c) What is the critical value?
9. (a) What does comparing a sample's z-score to the critical value indicate? (b) What decision do we make about a sample when its z-score is beyond the critical value, and why?
10. What is the difference between using both tails versus one tail of the sampling distribution in terms of (a) the region of rejection? (b) the critical value?
11. Sharon asks a sample of students their choices in the election for class president. She concludes that Ramone will win. It turns out that Darius wins. What is the statistical explanation for this incorrect prediction?
12. (a) Why does random sampling produce representative samples? (b) Why does random sampling produce unrepresentative samples?
13. Why do we reject that a sample represents the underlying raw score population if the sample mean is in the region of rejection?
14. Why do we retain that a sample represents the underlying raw score population if the sample mean is not in the region of rejection?
15. A researcher obtains a sample mean of 66, which produces a z of +1.45. The researcher uses critical values of ±1.96 and decides to reject that the sample represents the underlying raw score population having a μ of 60. (a) Draw the sampling distribution and indicate the approximate locations of X̄, μ, the computed z-score, and the critical values. (b) Is the researcher's conclusion correct? Explain your answer.
16. In a population, μ = 100 and σX = 25. A sample of 150 people has X̄ = 120. Using two tails and the .05 criterion: (a) What is the critical value? (b) Is this sample mean in the region of rejection? How do you know?
(c) What does the mean's location indicate about the likelihood of this sample occurring in this population? (d) What should we conclude about the sample?
17. The mean of a population of raw scores is 33 (σX = 12). Our sample has X̄ = 36.8 (N = 30). Using the .05 criterion and the upper tail of the sampling distribution: (a) What is the critical value? (b) Is the sample mean in the region of rejection? How do you know? (c) What does the mean's location indicate about the likelihood of this sample occurring in this population? (d) What should we conclude about the sample?
18. We obtain a X̄ = 46.8 (with N = 15). This sample may represent the population where μ = 50 (σX = 11). Using the .05 criterion and the lower tail of the sampling distribution: (a) What is our critical value? (b) Is this sample in the region of rejection? How do you know? (c) What should we conclude about the sample and why?
19. The mean of a population of raw scores is 28 (σX = 9). Your X̄ = 34 (with N = 35). Using the .05 criterion with the region of rejection in both tails of the sampling distribution, is the sample representative of this population? Why or why not?
20. The mean of a population of raw scores is 60 (σX = 16). Your X̄ = 66 (with N = 40). Using the .05 criterion with the region of rejection in the lower tail, should you reject that the sample represents this population and why?
21. On a test of motor coordination, the population of average bowlers has a mean score of 24, with a standard deviation of 6. A random sample of 30 bowlers at Bubba's Bowling Alley has a sample mean of 26. A second random sample of 30 bowlers, at Babette's Bowling Alley, has a mean of 18. Using the .05 criterion and both tails of the sampling distribution, decide if each sample represents the population of average bowlers.
22. (a) In question 21, if a particular sample does not represent the population of average bowlers, what is your best estimate of the population μ it does represent? (b) Explain the logic behind this conclusion.
23. A couple with eight daughters decides to have one more baby, because they think this time, they are sure to have a boy! Is this reasoning accurate?
24. In the population of typical statistics students, μ = 75 on a national final exam (σX = 6.4). For 25 students who studied statistics using a new technique, X̄ = 72.1. Using two tails of the sampling distribution and the .05 criterion: (a) What is the critical value? (b) Is this sample mean in the region of rejection? How do you know? (c) Should we conclude that the sample represents the population of typical statistics students?
25. In a study you obtain the following data measuring the aggressive tendencies of some football players:
40 30 39 40 41 39 31 28 33
(a) Researchers have found that μ = 30 in the population of non–football players, with σX = 5. Using both tails of the sampling distribution, determine whether your football players represent this population.
(b) What do you conclude about the population of football players and its μ?

Chapter 7
OVERVIEW OF STATISTICAL HYPOTHESIS TESTING: THE z-TEST

LOOKING BACK
Be sure you understand:
• From Chapter 1, what the conditions of an independent variable are and what the dependent variable is.
• From Chapter 4, that a relationship in the population occurs when different means from the conditions represent different μs and thus different distributions of dependent scores.
• From Chapter 6, that when a sample's z-score falls in the region of rejection, the sample is unlikely to represent the underlying raw score population.

GOING FORWARD
Your goals in this chapter are to learn:
• Why the possibility of sampling error causes researchers to perform inferential statistics.
• When experimental hypotheses lead to either one-tailed or two-tailed tests.
• How to create the null and alternative hypotheses.
• When and how to perform the z-test.
• How to interpret significant and nonsignificant results.
• What Type I errors, Type II errors, and power are.

In the previous chapter, you learned the basics involved in inferential statistics. Now we'll put these procedures into a research context and present the statistical language and symbols used to describe them. Until further notice, we'll be talking about experiments. This chapter shows (1) how to set up an inferential procedure, (2) how to perform the "z-test," (3) how to interpret the results of a procedure, and (4) the way to describe potential errors in our conclusions.

Sections
7-1 The Role of Inferential Statistics in Research
7-2 Setting Up Inferential Procedures
7-3 Performing the z-Test
7-4 Interpreting Significant and Nonsignificant Results
7-5 Summary of the z-Test
7-6 The One-Tailed Test
7-7 Statistics in the Research Literature: Reporting the Results
7-8 Errors in Statistical Decision Making
7-1 THE ROLE OF INFERENTIAL STATISTICS IN RESEARCH

As you saw in the previous chapter, a random sample may be unrepresentative of a population because, just by the luck of the draw, the sample may contain too many high scores or too many low scores relative to the population. Because the sample is not perfectly representative of the population, it reflects sampling error, so the sample mean does not equal the population mean. Sampling error is a potential problem in all behavioral research. Recall that the goal of research is to use sample data to describe the relationship found in the population. The dilemma for researchers is that sampling error may produce misleading sample data, so that we draw incorrect conclusions about the population.

For example, in an experiment we hope to see a relationship in which, as we change the conditions of the independent variable, participants' scores on the dependent variable change in a consistent fashion. If the means for our conditions differ, we infer that, in nature, each condition would produce a different population of scores located at a different μ. But! Here is where sampling error comes in. Perhaps we are wrong and the relationship does not exist in nature. Maybe all of the scores actually come from the same population, and the means in our conditions differ simply because of which participants we happened to select for each—because of sampling error. We won't know this, of course, so we will be misled into thinking the relationship does exist. For example, say we compare the creativity scores of some men and women, although we are unaware that in nature, men and women do not differ on this variable. Through sampling error, however, we might select some females who are more creative than our males or vice versa. Then sampling error will mislead us into thinking there's a relationship here, although really there is not.

Or, perhaps there is a relationship in the population, but because of sampling error, we see a different relationship in our sample data. For example, say we measure the heights of some men and women and, by chance, obtain a sample of short men and a sample of tall women. If we didn't already know that in the population men are taller, sampling error would mislead us into concluding that women are taller.

Thus, in every study it is possible that we are being misled by sampling error, so that the relationship we see in our sample data is not the relationship found in nature. This is the reason why, in every study, we apply inferential statistics. Inferential statistics are used to decide whether sample data represent a particular relationship in the population. Essentially, we decide whether we should believe our sample data: Should we believe what the sample data appear to indicate about the relationship in the population, or instead, is it likely that the sample relationship is a coincidence produced by sampling error that misrepresents what is found in the population?

inferential statistics: Procedures for deciding whether sample data represent a particular relationship in the population.
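A short simulation makes the creativity example concrete: even when two groups come from the same population, their sample means will rarely be equal. This is a minimal sketch, and the "creativity" scale and its values are invented purely for illustration:

    import random

    random.seed(3)
    # Two groups drawn from the SAME population, so no relationship exists
    # in nature between group membership and scores.
    men = [random.gauss(50, 10) for _ in range(20)]
    women = [random.gauss(50, 10) for _ in range(20)]

    print(sum(men) / 20, sum(women) / 20)
    # The two means differ by chance alone: sampling error can mimic a relationship.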
The specific inferential procedure we should use in a particular experiment depends first on the characteristics of our dependent variable. We have two general categories of procedures to choose from. The first is parametric statistics. Parametric statistics are procedures that require specific assumptions about the raw score populations being represented. Each procedure has its own assumptions, but there are two assumptions common to all parametric procedures: (1) The population of dependent scores should be at least approximately normally distributed. (2) The scores should be interval or ratio scores. So, parametric procedures are used when it is appropriate to compute the mean.

The other category is nonparametric statistics, which are inferential procedures that do not require stringent assumptions about the populations being represented. These procedures are used with nominal or ordinal scores or with skewed interval or ratio distributions. So, nonparametric procedures are used when we compute the median or the mode.

parametric statistics: Inferential procedures that require certain assumptions about the raw score population represented by the sample; used when we compute the mean.

nonparametric statistics: Inferential procedures that do not require stringent assumptions about the raw score population represented by the sample; used with the median and mode.

experimental hypotheses: Two statements describing the predicted relationship that may or may not be demonstrated by a study.

Parametric and nonparametric inferential statistics are for deciding if the data reflect a relationship in nature, or if sampling error is misleading us into thinking there is this relationship.

In this and upcoming chapters we will discuss the most common parametric statistics. (Chapter 13 deals with nonparametrics.) Once we have decided to use parametric procedures, we select a specific procedure depending on the particulars of our experiment's design. However, we set up all inferential procedures in a similar way.

7-2 SETTING UP INFERENTIAL PROCEDURES

Researchers perform four steps in an experiment: They create the experimental hypotheses, design and conduct the experiment to test the hypotheses, translate the experimental hypotheses into statistical hypotheses, and test the statistical hypotheses.

7-2a Creating the Experimental Hypotheses

An experiment always tests two experimental hypotheses which describe the possible outcomes of the study. In general, one hypothesis states we will demonstrate that the predicted relationship operates in the population. By "predicted relationship" we mean that manipulating the independent variable will have the expected influence on dependent scores. The other hypothesis states we will not demonstrate the predicted relationship in the population (manipulating the independent variable will not "work" as expected).

We can predict a relationship in one of two ways.
Sometimes we predict that changing the conditions of the independent variable will cause dependent scores to change, even though we are not sure whether the scores will increase or decrease. This leads to a "two-tailed" test. A two-tailed test is used when we do not predict the direction in which dependent scores will change. Thus, we'd have a two-tailed hypothesis if we thought men and women differ in creativity but were unsure who would score higher.

The other approach is a one-tailed test. A one-tailed test is used when we do predict the direction in which dependent scores will change. We may predict that scores will only increase, or that they will only decrease. Thus, we'd have a one-tailed test if we predicted only that men are more creative than women. Or, we might predict only that women are more creative than men.

two-tailed test: The type of inferential test used when we do not predict whether dependent scores will increase or decrease.

one-tailed test: The type of inferential test used when we predict that dependent scores will only increase or will only decrease.

statistical hypotheses: Statements that describe the population parameters the sample statistics represent if the predicted relationship exists or does not exist.

Let's first examine a study involving a two-tailed test. Say we've discovered a substance related to intelligence that we will test on humans using an "IQ pill." The number of pills a person consumes is our independent variable, and the person's IQ score is our dependent variable. Say that we believe the pill will affect IQ, but we are not sure whether it will make people smarter or dumber. Therefore, we have a two-tailed test.

When you are applying inferential procedures, be sure to identify the "predicted relationship."

Here we implicitly predict that the more of the pill participants consume, the more their IQ scores will change. And note: We will simplify things by assuming that if IQ scores change, the only variable that could cause this change is our pill. (In real research we do not make this assumption.) Thus, here are our two-tailed experimental hypotheses:

1. We will demonstrate that the pill works by either increasing or decreasing IQ scores.
2. We will not demonstrate that the pill works, because IQ scores will not change.

7-2b Designing a One-Sample Experiment

We could design an experiment to test the IQ pill in a number of different ways. However, the simplest is as a one-sample experiment. We will randomly select one sample of participants and give each person, say, one pill. Later we'll give participants an IQ test. The sample will represent the population of people when they have taken one pill, and the sample X̄ will represent that population's μ.

However, to demonstrate a relationship, we must demonstrate that different conditions produce different populations having different μs. Therefore, to perform a one-sample experiment, we must already know the population mean for participants tested under another condition of the independent variable. So, we must compare the population receiving one pill represented by our sample to some other, known population that receives a different amount of the pill. One population we know about that has received a different amount is the population that has received zero amount of our pill. Say that our IQ test has been given to many people over the years who have not taken the pill, and that this population has a μ of 100. We will compare this population without the pill to the population with the pill represented by our sample. If the population without the pill has a different μ than the population with the pill, then we have evidence of a relationship in the population.

7-2c Creating the Statistical Hypotheses

So that we can apply statistical procedures, we translate the experimental hypotheses into two statistical hypotheses. Statistical hypotheses describe the population parameters that the sample data represent if the predicted relationship does or does not exist. The two statistical hypotheses are called the alternative hypothesis and the null hypothesis.

THE ALTERNATIVE HYPOTHESIS

It is easier to create the alternative hypothesis first, because it corresponds to the experimental hypothesis that the experiment does work as predicted.
Say that our IQ test has been given to many people over the years who have not taken the pill, and that this population has a m of 100. We will compare this population without the pill to the population with the pill represented by our sample. If the population without the pill has a different m than the population with the pill, then we have evidence of a relationship in the population. 7-2c Creating the Statistical Hypotheses So that we can apply statistical procedures, we translate the experimental hypotheses into two statistical hypotheses. Statistical hypotheses describe the population parameters that the sample data represent if the predicted relationship does or does not exist. The two statistical hypotheses are called the alternative hypothesis and the null hypothesis. THE ALTERNATIVE HYPOTHESIS It is easier to create the alternative hypothesis first, because it corresponds to the experimental hypothesis that the experiment does work as two-tailed test The type of inferential test used when we do not predict whether dependent scores will increase or decrease one-tailed test The type of inferential test used when we predict that dependent scores will only increase or will only decrease statistical hypotheses Statements that describe the population parameters the sample statistics represent if the predicted relationship exists or does not exist Chapter 7: Overview of Statistical Hypothesis Testing: The z-Test 109 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. predicted. The alternative hypothesis describes the scores might decrease, but we do know that the m of the population parameters that the sample data represent if population with the pill will be less than 100. the predicted relationship occurs in nature. It is always Our alternative hypothesis will communicate all the hypothesis of a difference: It says that changing the of the above. However, every procedure involves an independent variable “works” by producing the prealternative hypothesis so we use symbols to quickly dicted difference in scores in the population. You can express it. If the pill works as predicted, then the popsee our predicted relationship in Figures 7.1 and 7.2. ulation with the pill will have a m that is either greater Figure 7.1 shows the relationship if the pill than or less than 100. In other words, the population increases IQ. Without the pill, the population is cenmean with the pill will not equal 100. The symbol for tered at a score of 100. By giving everyone one pill, the alternative hypothesis is Ha. The symbol for not however, all scores tend to increase so that the distriequal is “⬆,” so our alternative hypothesis is bution moves to the right, over to the higher scores. Ha: m ⬆ 100 We don’t know how much scores will increase, but we do know that the m of the population with the pill This proposes that our sample mean represents a will be greater than 100, because 100 is the m of the m not equal to 100. Because the m without the pill is population without the pill. 100, we know that Ha implies there is a relationship in On the other hand, Figure 7.2 shows the relationthe population. 
(In a two-tailed test, Ha is always that ship if the pill decreases IQ. Here, the distribution with m ⬆ some value.) the pill moves to the left, over to the lower Figure 7.1 scores. We also don’t Relationship in the Population If the IQ Pill Increases IQ Scores know how much No pill µ = 100 alternative hypothesis (Ha) The hypothesis describing the population parameters the sample data represent if the predicted relationship does exist in nature f null hypothesis (H0) The hypothesis describing the population parameters the sample data represent if the predicted relationship does not exist in nature One pill µ > 100 X X X Low IQ scores X X X X X X X X X X X X X X X High IQ scores X X X X High IQ scores IQ scores Figure 7.2 Relationship in the Population If the IQ Pill Decreases IQ Scores ©iStockphoto.com/Timur Nisametdinov One pill µ < 100 No pill µ = 100 f X X X Low IQ scores X X X X X X X X X X X IQ scores 110 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. The statistical Figure 7.3 hypothesis corresponding to the experimental Population of Scores If the IQ Pill Does Not Affect IQ Scores hypothesis that the experiment does not work as predicted is called the null hypothesis. The µ = 100 null hypothesis describes the population with or without pill parameters that the sample data represent if the predicted relationship does not occur in nature. It is always the hypothesis of no difference: It says that changing the independent variable does not “work” because it does not f produce the predicted difference in scores in the population. If the IQ pill does not work, then it would be X X X X X X X X X X X as if the pill were not present. The population of Low IQ scores High IQ scores IQ scores without the pill has a m of 100. ThereIQ scores fore, if the pill does not work, then the population of scores will be unchanged and m will still be 100. Accordingly, if we measured the population with and without the pill, we would have one population of scores, located at the m of 100, as shown in Figure 7.3. Our null hypothesis will communicate the The null hypothesis shows the value of – above but we again express it using symbols. m that our X represents if the predicted The symbol for the null hypothesis is H0. (The relationship does not exist. subscript is 0 because null means zero, as in zero relationship.) The null hypothesis for the The alternative hypothesis shows the – IQ pill study is value of m that our X represents if the ©iStockphoto.com/Joanne Harris and Daniel Bubnich THE NULL HYPOTHESIS > Quick Practice H0: m ⫽ 100 This proposes that our sample with the pill represents the population where m is 100. Because this is the same population found without the pill, we know that H0 implies the predicted relationship does not occur in nature. (In a two-tailed test, H0 is always that m ⫽ some value.) > > predicted relationship does exist. More Examples In a experiment, we compare a sample of men to the population of women who have a m of 75. We predict simply that men are different from women, so this is a two-tailed test. 
The alternative hypothesis is that our men represent a different population, so their m is not 75; thus, Ha:m ⬆ 75. The null hypothesis is that men are the same as women, so the men’s m is also 75, so H0:m ⫽ 75. For Practice The alternative hypothesis (Ha) always says the sample data represent a m that reflects the predicted relationship. The null hypothesis (H0) says the sample data represent the m that’s found when the predicted relationship is not present. 1. A _____ test is used when we do not predict the direction that scores will change; a _____ test is used when we do predict the direction that scores will change. 2. The _____ hypothesis says the sample data represent a population where the predicted relationship exists. The _____ hypothesis says the sample data represent a population where the predicted relationship does not exist. (continued) Chapter 7: Overview of Statistical Hypothesis Testing: The z-Test 111 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 3. The m for adults on a test is 140. We test a sample of children to see if they are different from adults. What are Ha and H0? 4. The m for days absent among workers is 15.6. We train a sample of new workers and ask whether the training changes worker absenteeism in the population. What are Ha and H0? > Answers 4. Ha: m ⬆ 15.6; H0: m ⫽ 15.6 3. Ha: m ⬆ 140; H0: m ⫽ 140 2. alternative; null 1. two-tailed; one-tailed 7-2d The Logic of Statistical Hypothesis Testing The statistical hypotheses for the IQ pill study are H0: m ⫽ 100 and Ha: m ⬆ 100. Remember, these are hypotheses—guesses—about the population that is represented by our sample with the pill. (We have no uncertainty about what happens without the pill; we know the m there.) Notice that, together, H0 and Ha include all possibilities, so one or the other must be true. We use inferential procedures to test (choose between) these hypotheses, so these procedures are called “statistical hypothesis testing.” Say that we randomly selected 36 people, gave them the pill, measured their IQs, and found their mean was 105. We would like to say this: People who have not taken this pill have a mean IQ of 100, so if the pill did not work, the sample mean should have been 100. Therefore, a mean of 105 suggests that the pill does work, raising IQ scores about 5 points. If the pill does this for the sample, it should do this for the population, so we expect that a population receiving the pill would have a m of 105. A m of 105 is “not equal to 100,” so our results fit our alternative hypothesis (Ha: m ⬆ 100), and it looks like the pill works. Apparently, if we measured everyone in the population with and without the pill, we would have the two distributions shown previously in Figure 7.1, with the population that received the pill located at the m of 105. But hold on! We just assumed that our sample is perfectly representative of the population it represents. 112 But what if we have sampling error? Maybe we obtained a mean of 105 not because the pill works, but because we inaccurately represented the situation where the pill does not work. 
After all, it is unlikely that any sample is perfectly representative, so even if our sample represents the population where m is 100, we don’t expect our X to equal exactly 100! So, maybe the pill does nothing, but by chance we happened to select participants who already had an above-average IQ. Thus, maybe the null hypothesis is correct: Maybe our sample actually represents the population where m is 100. Likewise, in any study we cannot automatically infer that the predicted relationship exists in nature when the sample data show the relationship, because we are still confronted by our two choices: 1. The H0 , which implies we are being misled by sampling error: By chance we obtained an unrepresentative sample that produced data that coincidentally fit the predicted relationship. This gives the appearance of the relationship in nature although in reality this relationship does not exist. Therefore, we should not believe that the independent variable influences scores as our sample indicates. (In our example, the m with the pill is really 100.) 2. The Ha, which implies we are representing the predicted relationship: We obtained sample data that fit the predicted relationship because this relationship operates in nature and it produced our data. Therefore, we can believe that the independent variable influences scores as our sample indicates. (In our example, the m with the pill is really 105.) Before we can believe our sample, we must first be sure we are not being misled by sampling error. However, the only way to prove whether H0 is true is to give the pill to everyone in the population and see whether m is 100 or 105. We cannot do that. We can, however, determine how likely it is that H0 is true. That is, using the procedure discussed in the previous chapter, we will determine the likelihood of obtaining a sample mean of 105 from the population that has a m of 100. If such a mean is too unlikely, then we will reject that our sample represents this population, rejecting that H0 is the correct hypothesis for our study. All parametric and nonparametric procedures use the above logic. To select the correct procedure for a particular experiment, you should check that the design and dependent scores fit the “assumptions” of the procedure. The IQ pill study meets the assumptions of the parametric procedure called the z-test. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 7-3 PERFORMINGG populat population that H0 says z-test The our sam sample represents. parametric procedure used in a singleIn our p pill study, we sample experiment want to see the likeliwhen the standard hood o of getting our deviation of the raw sample mean from sampl score population is known the p population where m ⫽ 100. whe alpha (A) The Greek letter that Thus, it is as if Th symbolizes the we have created w criterion probability the sampling th distribution by di infinitely sampling IQ scores from in the population of people who have th not taken the pill. 
no THE z-TEST ©iStockphoto.com/Anne-Louise Quarfoth The z-test is the procedure for computing a z-score for a sample ple mean that we’ve discussed in prerevious chapters. The z-test has four ur assumptions: 1. We have randomly selected onee sample. 2. The dependent variable is at least approximately normally distributed in the population and involves an interval or ratio scale. 2. Identify the m of the sampling distribution as the value of m given in the null hypothesis. In our pill study, if we infinitely sample the raw score population where the average score is 100, then the average sample mean will also be 100. 3. We know the mean of the population of raw scores under another condition of the independent variable. 4. We know the true standard deviation of the population (sX) described by the null hypothesis. 3. Select the alpha. Recall that the criterion is the probability that defines sample means as unlikely to represent the underlying raw score population. The symbol for the criterion is A, the Greek letter alpha. Usually our criterion, our “alpha level,” is .05, so in symbols we say a ⫽ .05. Say that from the research literature, we know that IQ scores meet the requirements of the z-test and that in the population where m is 100, the standard deviation is 15. Therefore, the next step is to perform the z-test. (Note: SPSS does not perform the z-test.) 4. Locate the region of rejection. Recall that we may use one or both tails of the sampling distribution. Which arrangement to use depends on whether we have a two-tailed or one-tailed test. Above, we created a two-tailed hypothesis, predicting that the pill makes people either smarter or dumber, producing a X that is either larger than 100 or 7-3a Setting Up the Sampling Distribution for a Two-Tailed Test To perform the z-test we must create the sampling distribution of means and identify Figure 7.4 the region of rejecSampling Distribution of IQ Means for a Two-Tailed Test tion as we did in the A region of rejection is in each tail of the distribution, marked by the critical values of { 1.96. previous chapter. The finished sampling disµ tribution is shown in Figure 7.4. To create it we performed the following steps (and f Region of rejection added some new equals 2.5% symbols). 1. Create the sampling distribution of means from the underlying raw score Sample means z-scores X –3 X X X –2 zcrit = –1.96 X –1 X 100 0 X X +1 X Region of rejection equals 2.5% X +2 X X +3 zcrit = +1.96 Chapter 7: Overview of Statistical Hypothesis Testing: The z-Test 113 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. smaller than 100. Therefore, we have a twotailed test, with part of the region of rejection in each tail. our sample represents. For our IQ pill study, sX is 15 and N is 36, so 5. Determine the critical value. We’ll abbreviate the critical value of z as zcrit. Recall that with a ⫽ .05, the region of rejection in each tail is 2.5% of the distribution. From the z-table, z ⫽ 1.96 demarcates this region. Thus, we complete Figure 7.4 by adding that zcrit is {1.96. 
7-3b Computing z

Now it's time to compute the z-score for our sample mean. The z-score we compute is "obtained" from the data, so we'll call it z obtained, which we abbreviate as zobt. You know how to compute this from previous chapters.

THE FORMULA FOR THE z-TEST IS

zobt = (X̄ − μ)/σX̄   where σX̄ = σX/√N

First, we compute the standard error of the mean (σX̄). In the formula, σX is the known standard deviation of the underlying raw score population that H0 says our sample represents. For our IQ pill study, σX is 15 and N is 36, so

σX̄ = σX/√N = 15/√36 = 15/6 = 2.5

Now compute zobt. The value of μ to put in the formula is always the μ of the sampling distribution, which is also the μ of the raw score population that H0 says the sample represents (here μ = 100). The X̄ is computed from the scores in the sample. Then the z-score for our sample mean of 105 is

zobt = (X̄ − μ)/σX̄ = (105 − 100)/2.5 = +5/2.5 = +2

The mean of the sampling distribution always equals the μ of the raw score population that H0 says we are representing.

The final step is to interpret this zobt by comparing it to zcrit.

7-3c Comparing the Obtained z to the Critical Value

The sampling distribution always describes the situation when the null hypothesis is true: Here, it shows all possible means that occur when, as H0 says, our sample comes from the population where μ is 100 (from the situation where the pill does not work). If we are to believe H0, the sampling distribution should show that a X̄ of 105 is relatively frequent and thus likely in this situation. However, Figure 7.5 shows just the opposite.

Figure 7.5  Sampling Distribution of IQ Means. The sample mean of 105 is located at zobt = +2.00.

A zobt of +2 tells us that a X̄ of 105 seldom occurs by chance when we are representing the population where μ is 100. This makes it difficult to believe that our sample mean of 105 occurred by chance from this population: A mean like ours hardly ever occurs in the situation where the pill doesn't work. In fact, because our zobt of +2 is beyond the zcrit of ±1.96, our sample is in the region of rejection. Therefore, we conclude that our sample is so unlikely to represent the population where μ = 100 that we reject that the sample represents this population. In other words, we reject that our results merely poorly represent the situation where the pill does not work. In statistical terms, we say that we have "rejected" the null hypothesis. If we reject H0, then we are left with Ha, and so we "accept Ha." Here, Ha is μ ≠ 100, so we accept that our sample represents a population where μ is not 100.

If this makes your head spin, it may be because the logic actually involves a "double negative." When our sample falls in the region of rejection, we say "no" to H0. But H0 says there is no relationship involving our pill. By rejecting H0 we are saying "no" to no relationship. This is actually saying, "Yes, there is a relationship involving our pill," which is what Ha says.
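The arithmetic above takes only a few lines of code to verify. Here is a minimal Python sketch of the computation (our illustration; the variable names are ours, not the text's):

```python
import math

# The pill example: the H0 population has mu = 100 and sigma_X = 15;
# the N = 36 participants produced a sample mean of 105.
mu, sigma_x, n, x_bar = 100.0, 15.0, 36, 105.0

sigma_x_bar = sigma_x / math.sqrt(n)   # standard error: 15/6 = 2.5
z_obt = (x_bar - mu) / sigma_x_bar     # (105 - 100)/2.5 = +2.0

z_crit = 1.96                          # two-tailed, alpha = .05
print(z_obt, abs(z_obt) > z_crit)      # 2.0 True, so reject H0
```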
Thus, it appears we have evidence of a relationship in nature such that the pill would change the population of IQ scores. In fact, we can be more specific: A sample mean of 105 is most likely to represent the population where μ is 105. So without the pill, the population of IQ scores is at a μ of 100, but with the pill we expect scores will rise to produce a population with a μ at about 105.

REJECTING THE NULL HYPOTHESIS IS SAYING "NO" TO THE IDEA THAT THERE IS NO RELATIONSHIP.

A sample statistic that lies beyond the critical value is in the region of rejection, so we reject H0 and accept Ha.

7-4 INTERPRETING SIGNIFICANT AND NONSIGNIFICANT RESULTS

Once we have made a decision about the statistical hypotheses (H0 and Ha), we then make a decision about the corresponding original experimental hypothesis. When we reject H0 we also reject the experimental hypothesis that our independent variable does not work as predicted. Therefore, we will reject that our pill does not work. By accepting Ha we accept that our pill appears to work.

7-4a Interpreting Significant Results

The way to communicate that we have rejected H0 and accepted Ha is to use the term significant. Significant does not mean important or impressive. Significant indicates that our results are unlikely to occur if the predicted relationship does not exist in the population. Therefore, we imply that the relationship found in the experiment is "believable," representing a relationship found in nature, and that it does not reflect sampling error from the situation where the relationship does not exist.

significant  Describes results that are unlikely to result from sampling error when the predicted relationship does not exist; it indicates rejection of the null hypothesis.

Although we accept that a relationship exists, there are three very important restrictions on how far we can go with this claim. First, we never prove that H0 is false. The sampling distribution in Figure 7.5 shows that a mean of 105 does occur once in a while when we are representing the population where μ is 100. Maybe our sample was one of these times. Maybe the pill did not work, and our sample was very unrepresentative of this.

Second, we do not prove that our independent variable caused the scores to change. In our pill study, although we're confident that our sample represents a population with a μ above 100, we have not proven that it was the pill that produced these scores. Some other, hidden variable might have actually caused the higher scores in the sample.

Finally, we do not know the exact μ of the population represented by our sample. In our pill study, assuming that the pill does increase IQ scores, the population μ would probably not be exactly 105. Our sample mean may contain (you guessed it) sampling error! That is, the sample may accurately reflect that the pill increases IQ, but it may not perfectly represent how much the pill increases scores. Therefore, we conclude that the μ produced by our pill would probably be around 105.

Bearing these qualifications in mind, we interpret the X̄ of 105 the way we wanted to several pages back: Apparently, the pill increases IQ scores by about 5 points. But now, because the results are significant, we are confident we are not being misled by sampling error. Instead, we are confident we have evidence of a relationship found in nature. Therefore, after describing this relationship, we return to being behavioral researchers and attempt to explain how nature operates in terms of the variables and behaviors we are studying. Thus, our final step would be to describe how the ingredients in the pill affect intelligence, what brain mechanisms are involved, and so on.

7-4b Interpreting Nonsignificant Results

Let's say that the IQ pill had instead produced a sample mean of 99. Now the z-score for the sample is

zobt = (X̄ − μ)/σX̄ = (99 − 100)/2.5 = −1/2.5 = −.40

As in Figure 7.6, a zobt of −.40 is not beyond the zcrit of ±1.96, so the sample mean does not lie in the region of rejection. This indicates that a mean of 99 is likely when sampling the population where μ = 100. Thus, our sample mean was likely to have occurred if we were representing this population. In other words, our mean of 99 was a mean you'd expect if we were representing the situation where the pill does not work and we had a little sampling error. Therefore, we will not conclude that the pill works. After all, it makes no sense to claim that the pill works if the results were likely to occur without the pill. Thus, our null hypothesis (that our sample represents the population of scores without the pill) is a reasonable hypothesis, so we will not reject it. However, we have not proven that H0 is true; in such situations, we "fail to reject H0" or we "retain H0."

Figure 7.6  Sampling Distribution of IQ Means. The sample mean of 99 has a zobt of −.40.

To communicate the above we say we have nonsignificant or not significant results. (Don't say "insignificant.") Nonsignificant indicates that the relationship shown by our sample data was likely to have occurred through chance sampling error, without there being a real relationship in nature.

nonsignificant  Describes results that are likely to result from sampling error when the predicted relationship does not exist; it indicates failure to reject the null hypothesis.

Nonsignificant results do not prove that the independent variable does not work. We have simply failed to find convincing evidence that it does work. The only thing we're sure of is that sampling error could have produced our data. Therefore, we still have two hypotheses that are both viable:
1. H0, that the results do not really represent a relationship

2. Ha, that the results do represent a relationship

Maybe the pill does not work. Or, maybe the pill does work, but the sample data do not convincingly show this. We simply don't know. Therefore, when you do not reject H0, do not say anything about whether the independent variable influences behavior or not. All that you can say is that you have failed to convincingly demonstrate the predicted relationship in the population.

Nonsignificant indicates that we have failed to reject H0 because our results are not in the region of rejection and are thus likely to occur when there is not a relationship in nature. Nonsignificant results provide no convincing evidence, one way or the other, as to whether a relationship exists in nature.

7-5 SUMMARY OF THE z-TEST

Altogether, the preceding discussion can be summarized as follows (a brief code sketch of these steps appears after the list). For a one-sample experiment that meets the assumptions of the z-test:

1. Determine the experimental hypotheses and create the statistical hypotheses. Identify the predicted relationship the study may demonstrate, and determine if it is a one- or a two-tailed test. Then H0 describes the μ that the X̄ represents if the predicted relationship does not exist. Ha describes the μ that the X̄ represents if the relationship does exist.

2. Set up the sampling distribution. Select α, locate the region of rejection, and determine the critical value.

3. Compute the X̄ and zobt. First, compute σX̄. Then in the formula for z, the value of μ is the μ of the sampling distribution, which is also the μ of the raw score population that H0 says is being represented.

4. Compare zobt to zcrit. If zobt lies beyond zcrit, then reject H0, accept Ha, and say the results are "significant." Then describe the relationship. If zobt does not lie beyond zcrit, do not reject H0, and say the results are "nonsignificant." Do not draw any conclusions about the relationship.
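Here is a short Python sketch of these steps for the two-tailed case (our illustration; the function name and default zcrit are ours). It is applied to the reading-technique study from the Quick Practice box that follows.

```python
import math

def z_test_two_tailed(x_bar, mu, sigma_x, n, z_crit=1.96):
    """Compute sigma-X-bar and z_obt (step 3), then compare z_obt to
    z_crit (step 4). The default z_crit = 1.96 assumes the two-tailed,
    alpha = .05 sampling distribution from step 2."""
    sigma_x_bar = sigma_x / math.sqrt(n)
    z_obt = (x_bar - mu) / sigma_x_bar
    return z_obt, abs(z_obt) > z_crit  # True means "significant"

# Reading-technique example: mu = 220, sigma_X = 15, N = 25, X-bar = 211.55.
z_obt, significant = z_test_two_tailed(211.55, 220, 15, 25)
print(round(z_obt, 3), significant)    # -2.817 True
```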
> Quick Practice

> If zobt lies beyond zcrit, reject H0; the results are significant, and conclude there is evidence for the predicted relationship. Otherwise, the results are not significant, and make no conclusion about the relationship.

More Examples

We test a new technique for teaching reading. Without it, the μ on a reading test is 220, with σX = 15. An N of 25 participants has X̄ = 211.55. Then:

1. With a two-tailed test, H0: μ = 220; Ha: μ ≠ 220.

2. Compute zobt: σX̄ = σX/√N = 15/√25 = 3; zobt = (X̄ − μ)/σX̄ = (211.55 − 220)/3 = −2.817.

3. With α = .05, zcrit is ±1.96, and the sampling distribution is like Figure 7.4.

4. The zobt of −2.817 is beyond the zcrit of −1.96, so the results are significant: The data reflect a relationship, with the μ of the population using the technique at around 211.55, while for those not using it, μ = 220.

Another reading study produced zobt = −1.83. This zobt is not beyond zcrit, so the results are not significant: Make no conclusion about the influence of the new technique.

For Practice

We test whether a sample of 36 successful dieters is more or less satisfied with their appearance than the population of nondieters, where μ = 40 (σX = 12).

1. What are H0 and Ha?

2. The X̄ for dieters is 45. Compute zobt.

3. Set up the sampling distribution.

4. What should we conclude?

> Answers

1. H0: μ = 40; Ha: μ ≠ 40

2. σX̄ = 12/√36 = 2; zobt = (45 − 40)/2 = +2.5

3. With α = .05 the sampling distribution has a region of rejection in each tail, with zcrit = ±1.96 (as in Figure 7.4).

4. The zobt of +2.5 is beyond the zcrit of ±1.96, so the results are significant: The population of dieters is more satisfied (at a μ around 45) than the population of nondieters (at μ = 40).

7-6 THE ONE-TAILED TEST

Recall that a one-tailed test is used when we predict the direction in which scores will change. The statistical hypotheses and sampling distribution are different in a one-tailed test.

A one-tailed null hypothesis always includes that μ equals some value. Test H0 by testing whether the sample data represent the population with that μ.

7-6a The One-Tailed Test for Increasing Scores

Say that we had developed a "smart pill." Then the experimental hypotheses are (1) the pill makes people smarter by increasing IQ, or (2) the pill does not make people smarter. For the statistical hypotheses, start with the alternative hypothesis: People without the pill produce a μ of 100, so if the pill makes them smarter, our sample will represent a population with a μ greater than 100. The symbol for greater than is ">"; therefore, Ha: μ > 100. For the null hypothesis, if the pill does not work as predicted, either it will leave IQ scores unchanged or it will decrease them (making people dumber). Then our sample mean represents a μ either equal to 100 or less than 100. The symbol for less than or equal to is "≤", so H0: μ ≤ 100.

We test H0 by testing whether the sample represents the raw score population in which μ equals 100. This is because our sample mean must be above 100 for us to even begin to conclude that our pill makes people smarter. If we then find that our mean is too high to represent a μ equal to 100, then we automatically reject that it represents a μ less than 100.
Thus, as shown in Figure 7.7, the sampling distribution again shows the means that occur if we are representing a μ = 100 (the situation where the pill does nothing to IQ). We again set α = .05, but the region of rejection is in only one tail of the sampling distribution. You can identify which tail to put the region in by identifying the result you must have to be able to claim that your independent variable works as predicted (to support Ha). To say that the pill makes people smarter, the sample mean must be significant and larger than 100. Means that are significantly larger than 100 are in a region of rejection in the upper tail of the sampling distribution. Then, as in the previous chapter, the region of rejection is 5% of the curve, so zcrit is +1.645.

Figure 7.7  Sampling Distribution of IQ Means for a One-Tailed Test of Whether Scores Increase. The region of rejection (5% of the curve) is entirely in the upper tail, beyond zcrit = +1.645.

Say that after testing the pill, we find X̄ = 106.58. The sampling distribution is still based on the population with μ = 100 and σX = 15. Say that N = 36, so

σX̄ = 15/√36 = 2.5

Then

zobt = (106.58 − 100)/2.5 = +2.63

As in Figure 7.7, this zobt is beyond zcrit, so it is in the region of rejection. Therefore, the sample mean is unlikely to represent the population having μ = 100, and it's even less likely to represent a population that has a μ < 100. Therefore, we reject the null hypothesis that μ ≤ 100, and accept the alternative hypothesis that μ > 100. We conclude that the pill produces a significant increase in IQ scores, and estimate that with the pill, μ would equal about 106.58. If zobt had not been in the region of rejection, we would have retained H0 and had no evidence as to whether the pill makes people smarter or not.

Recognize that there is a risk to using one-tailed tests. The one-tailed zobt is significant only if it lies beyond zcrit and has the same sign. So, above, our results would not have been significant if the pill had produced very low IQ scores and a very large negative zobt. We would have had no region of rejection in the lower tail, and you cannot move the region after the fact to make the results significant. (After developing a "smart pill," it would make no sense to suddenly say, "Whoops, I meant to call it a dumb pill.") Therefore, use a one-tailed test only when it is the appropriate test of your independent variable: when the variable can "work" only if scores go in one direction. Otherwise, you may miss a relationship that, with a two-tailed test, would be significant.
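To double-check the arithmetic of this one-tailed example, here is a brief Python sketch (our illustration, with our own variable names):

```python
import math

# "Smart pill" example: Ha: mu > 100, H0: mu <= 100, tested at alpha = .05.
mu, sigma_x, n, x_bar = 100.0, 15.0, 36, 106.58
z_crit = 1.645                                   # upper tail only

z_obt = (x_bar - mu) / (sigma_x / math.sqrt(n))  # 6.58/2.5 = +2.632
print(round(z_obt, 2), z_obt > z_crit)           # 2.63 True, so reject H0
```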
7-6b The One-Tailed Test for Decreasing Scores

A study where a one-tailed test would be appropriate is if we had created a pill to lower IQ scores. If the pill works, then the sample mean represents a μ less than 100. The symbol for less than is "<", so Ha: μ < 100. However, if the pill does not work, either it will leave scores unchanged, or it will increase scores. Then, the sample mean represents a μ greater than or equal to 100. The symbol for greater than or equal to is "≥", so H0: μ ≥ 100. However, we again test H0 by testing whether μ = 100.

PERFORM A ONE-TAILED TEST ONLY WHEN IT IS THE APPROPRIATE WAY TO TEST YOUR INDEPENDENT VARIABLE.

For us to conclude that the pill lowers IQ, our sample mean must be significantly less than 100. Therefore, the region of rejection is in the lower tail of the distribution, as in Figure 7.8. With α = .05, zcrit is now minus 1.645.

Figure 7.8  Sampling Distribution of IQ Means for a One-Tailed Test of Whether Scores Decrease. The region of rejection (5% of the curve) is entirely in the lower tail, beyond zcrit = −1.645.

If the sample produces a negative zobt beyond −1.645 (for example, zobt = −1.69), then we reject the H0 that the sample mean represents a μ equal to or greater than 100, and accept the Ha that the sample represents a μ less than 100. Then we have evidence that the pill works. However, if zobt does not fall in the region of rejection (for example, if zobt = −1.25), we do not reject H0, and we have no evidence as to whether the pill works or not.

7-7 STATISTICS IN THE RESEARCH LITERATURE: REPORTING THE RESULTS

When reading published reports, you'll often see statements such as "the IQ pill produced a significant difference in IQ scores," or "it had a significant effect on IQ." These are just other ways to say that the results reflect a believable relationship, because they are unlikely to occur through sampling error. However, whether a result is significant depends on the probability used to define "unlikely," so we must also indicate our α. The APA format for reporting a result is to indicate the statistic computed, the obtained value, and α. Thus, for a significant zobt of +2.00, a research report would have

z = +2.00, p < .05

Notice that instead of indicating that α equals .05, we indicate that the probability (p) is less than .05. (We'll discuss the reason for this shortly.) For a nonsignificant zobt of −.40, we would have

z = −.40, p > .05

Notice that with nonsignificant results, the p is greater than .05.

We include p < .05 when reporting a significant result and p > .05 when reporting a nonsignificant result.

> Quick Practice

> Perform a one-tailed test when predicting the direction the scores will change. When predicting that X̄ will be higher than μ, the region of rejection is in the upper tail of the sampling distribution. When predicting that X̄ will be lower than μ, the region of rejection is in the lower tail.

More Examples

We predict that learning statistics will increase a student's IQ. Those not learning statistics have μ = 100 and σX = 15. For 25 statistics students, X̄ = 108.6.

1. With a one-tailed test, Ha: μ > 100; H0: μ ≤ 100.

2. Compute zobt: σX̄ = σX/√N = 15/√25 = 3; zobt = (X̄ − μ)/σX̄ = (108.6 − 100)/3 = +2.87.

3. With α = .05, zcrit is +1.645. The sampling distribution is as in Figure 7.7.

4. The zobt of +2.87 is beyond zcrit, so the results are significant: Learning statistics gives a μ around 108.6, while people not learning statistics have μ = 100.

Say that a different mean produced zobt = +1.47. This is not beyond zcrit, so it is not significant. We'd have no evidence whether or not learning statistics raises IQ.

For Practice

You test the effectiveness of a new weight-loss diet.

1. Why is this a one-tailed test?

2. For the population of nondieters, μ = 155. What are Ha and H0?
3. In which tail is the region of rejection?

4. With α = .05, the zobt for the sample of dieters is −1.86. What do you conclude?

> Answers

1. Because a successful diet lowers weight scores

2. Ha: μ < 155 and H0: μ ≥ 155

3. The left-hand tail

4. The zobt is beyond the zcrit of −1.645, so it is significant: The μ for dieters will be less than the μ of 155 for nondieters.

7-8 ERRORS IN STATISTICAL DECISION MAKING

We have one other issue to consider when performing hypothesis testing, and it involves potential errors in our decisions. Regardless of whether we conclude that the sample does or does not represent the predicted relationship, we may be wrong.

7-8a Type I Errors

Sometimes the variables we investigate are not related in nature, and so H0 is really true. When we are in this situation, if we obtain data that cause us to reject H0, then we make a Type I error. A Type I error is defined as rejecting H0 when H0 is true. This error occurs because our sample so poorly represents the situation where the independent variable does not work that we are fooled into concluding that the variable does work. For example, say our pill does not make people smarter, but sampling error produces a very high mean that falls in the region of rejection. Then we'll mistakenly conclude that the pill does work.

Type I error  Rejecting the null hypothesis when it is true (that is, saying the independent variable works when it does not).

We never know when we make a Type I error, because we never know whether our variables are related in nature. However, we do know that the theoretical probability of a Type I error equals α. Here's why: When discussing the possibility of Type I errors, it is a given that H0 is true and should not be rejected. So assume that the IQ pill does not work, and that we can obtain our sample only from the underlying raw score population where μ is 100. Assume α = .05. If we repeated our experiment many times in this situation, we'd again have the sampling distribution back in Figure 7.6 showing the different means obtained over the long run when the pill does not work. However, 5% of the time we would obtain extreme sample means that fall in the region of rejection and cause us to reject H0 even though H0 is true. Rejecting H0 when it is true is a Type I error, so over the long run, Type I errors will occur 5% of the time. Therefore, if we happen to be in the situation where H0 is true, the theoretical probability of making a Type I error is .05. (The same is true in a one-tailed test.)

You either will or will not make the correct decision when H0 is true, so the probability of avoiding a Type I error is 1 − α. That is, if 5% of the time our samples are in the region of rejection when H0 is true, then 95% of the time they are not in the region of rejection when H0 is true. Therefore, with α = .05, the theoretical probability is .95 that we will avoid a Type I error by retaining H0 when it is true.
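You can watch this long-run 5% rate emerge in a simulation. The following Monte Carlo sketch is our illustration (not a procedure from the text): it repeatedly draws samples from the population that H0 describes and counts how often the two-tailed z-test rejects anyway.

```python
import math, random

# When H0 is true (the pill does nothing), every rejection is a Type I error.
random.seed(1)
mu, sigma_x, n, z_crit = 100.0, 15.0, 36, 1.96
se = sigma_x / math.sqrt(n)

reps = 10_000
rejections = 0
for _ in range(reps):
    # Draw one sample of N scores from the H0 population and test it.
    x_bar = sum(random.gauss(mu, sigma_x) for _ in range(n)) / n
    if abs((x_bar - mu) / se) > z_crit:
        rejections += 1              # rejecting a true H0: a Type I error

print(rejections / reps)             # close to .05, as the theory says
```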
Although the theoretical probability of a Type I error equals α, the actual probability is slightly less than α. This is because the region of rejection includes the critical value, but to reject H0, our zobt must be larger than the critical value. We cannot determine the precise area under the curve for the point located at zcrit, so we can't remove it from our 5%. All we can say is that when α is .05, the region of rejection is slightly less than 5% of the curve. Because the region of rejection is less than α, the probability of a Type I error is also less than α. Thus, in our previous examples when we rejected H0, the probability that we were making a Type I error was less than .05. That is why the APA format is to report significant results using p < .05. This is code for "the probability of a Type I error is less than .05."

On the other hand, we report nonsignificant results using p > .05. This communicates that we did not call a result significant because to do so would require a region of rejection greater than .05 of the curve. But then the probability of a Type I error would be greater than our α of .05, and that's unacceptable.

The reason we never use an α larger than .05 is that then a Type I error is too likely. Instead, we limit the chances of making this error because it can lead to serious consequences. Researchers can cause enormous damage by claiming, for example, that new drugs, therapies, or surgical techniques work when they actually do not. In fact, sometimes making a Type I error is so dangerous that we want to reduce its probability even further. In that case, we usually set alpha at .01, so the probability of making a Type I error is p < .01. For example, if the smart pill had some dangerous side effects, we would set α = .01 so that we are even less likely to conclude that the pill works when it does not. However, we use the term significant in an all-or-nothing fashion: A result is not "more" significant when α = .01 than when α = .05. If zobt lies anywhere in a region of rejection, the result is significant, period! The only difference is that when α = .01, the probability that we've made a Type I error is smaller.

When H0 is true: Rejecting H0 is a Type I error, and its theoretical probability is α; retaining H0 is avoiding a Type I error, and its probability is 1 − α.

7-8b Type II Errors

In addition to Type I errors, we must also be concerned about making a different kind of error. Sometimes the variables we investigate are related in nature and so H0 is really false. When we are in this situation, if we obtain data that cause us to retain H0, then we make a Type II error. A Type II error is defined as retaining H0 when H0 is false (and Ha is true). In other words, we fail to identify that an independent variable really does work. This error occurs because the sample so poorly represents the situation where the independent variable works that we are fooled into concluding that the variable does not work. For example, say our pill does make people smarter but it raises scores by so little that the sample mean is not high enough to fall in the region of rejection. Then we'll mistakenly conclude that the pill does not work.

Type II error  Retaining the null hypothesis when it is false (that is, failing to identify that the independent variable does work as predicted).

Anytime we discuss the possibility of Type II errors, it is a given that, unknown to us, H0 is false and should be rejected. Thus, in our examples where we retained H0 and did not conclude that our pill worked, it is possible we made a Type II error.
(Researchers can determine the probability of making this error, but the computations are beyond the introductory level.) Conversely, if we reject H0 in this situation, then we avoid a Type II error: We've made the correct decision by concluding that the pill works when it does work.

In a study where the predicted relationship does exist in nature, concluding that the relationship does not exist is a Type II error; concluding that the relationship does exist avoids this error.

To help you distinguish between Type I and Type II errors, remember that the type of error you can potentially make is determined by your situation: what nature "says" about whether there is a relationship between your variables. "Type I" is the error that may occur when the predicted relationship does not exist in nature. "Type II" is the error that may occur when the predicted relationship does exist in nature. Then, whether you actually make the error depends on whether you disagree with nature in each situation. Rejecting H0 when nature says there is not a relationship is a Type I error; retaining H0 when nature says there is a relationship is a Type II error. Also, you cannot be in a situation where there is and is not a relationship at the same time, so if you can possibly make one error, you cannot make the other error. Lastly, remember that you might be making a correct decision. Thus, one of four outcomes is possible in any study:

When H0 is true (there is no relationship):

1. Our data cause us to reject H0, so we make a Type I error.

2. Our data cause us to retain H0, so we avoid making a Type I error.

When H0 is false (the relationship exists):

3. Our data cause us to retain H0, so we make a Type II error.

4. Our data cause us to reject H0, so we avoid making a Type II error.
Remember that a Type II error occurs when the predicted relationship does exist in nature, so to avoid the error we should reject H0 and have significant results. Therefore, researchers maximize the power of a study by maximizing the chances that the results will be significant. Then we are confident that if the relationship exists in nature, we will not miss it. If we still end up retaining H0, we are confident that this is because the relationship is not there. Therefore, we are confident in our decision to retain H0, and, in statistical lingo, we are confident we are avoiding a Type II error. We have several ways to increase the power of a study. First, it is better to design a study that employs parametric procedures, because they are more powerful than nonparametric procedures: Because of its theoretical basis, a parametric test is more likely to produce significant results. Second, a one-tailed test is more powerful than a two-tailed test: A zobt is more likely to be beyond the one-tailed zcrit of 1.645 than the two-tailed zcrit of 1.96, so the one-tailed test is more likely to be significant. Finally, testing a larger N provides greater power: With more scores in a sample > Quick Practice > > A Type I error is rejecting a true H0. A Type II error is retaining a false H0. Power is the probability of not making a Type II error. More Examples When H0 is true, there is no relationship: If the data cause us to reject H0, we make a Type I error. To decrease the likelihood of this, we keep alpha small. If the data cause us to retain H0, we avoid this error. When H0 is false, there is a relationship: If the data cause us to retain H0, we make a Type II error. If the data cause us to reject H0, we avoid this error. To increase the likelihood of this, we increase power. For Practice 1. Claiming that an independent variable works even though in nature it does not is a _____ error. 2. Failing to conclude that an independent variable works even though in nature it does is a _____ error. 3. If we reject H0, we cannot make a _____ error. 5. To be confident in a decision to retain H0, our power should be _____. > Answers 1. Type I 2. Type II 3. Type II 4. Type I 5. high ©iStockphoto.com/Dan Tero 4. If we retain H0, we cannot make a _____ error. Chapter 7: Overview of Statistical Hypothesis Testing: The z-Test 123 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com STUDY PROBLEMS (Answers for odd-numbered questions are in Appendix C.) 1. Why does the possibility of sampling error present a problem to researchers when inferring a relationship in the population? 2. What are inferential statistics used for? 3. What does a stand for, and what two things does it determine? 4. (a) What are the two major categories of inferential procedures? (b) What characteristics of your study determine which category is appropriate for your study? 
Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com

STUDY PROBLEMS

(Answers for odd-numbered questions are in Appendix C.)

1. Why does the possibility of sampling error present a problem to researchers when inferring a relationship in the population?

2. What are inferential statistics used for?

3. What does α stand for, and what two things does it determine?

4. (a) What are the two major categories of inferential procedures? (b) What characteristics of your study determine which category is appropriate for your study? (c) What is a statistical reason for preferring to design a study where you can use parametric procedures?

5. A researcher obtains sample data showing that participants who wear a pedometer tend to exercise more often than those who do not wear one. What are the two possible statistical explanations for this result?

6. Some of the children given a new flu vaccine later develop a neurological disease. Parents claim the vaccine caused the disease. What are the two possible statistical explanations for this result?

7. (a) What does H0 stand for and what does it communicate? (b) What does Ha stand for and what does it communicate?

8. (a) When do you use a one-tailed test? (b) When do you use a two-tailed test?

9. (a) What does "significant" convey about the results of an experiment? (b) Why is obtaining significant results a goal of behavioral research?

10. In a study we reject H0. Which of the following statements are incorrect and why? (a) "Now we know that H0 is false." (b) "We have proof that our sample mean represents a particular μ." (c) "We have proof that the independent variable causes scores to change as predicted." (d) "It is not possible that the difference between X̄ and μ is due to sampling error." (e) "We have evidence that the predicted relationship does not exist."

11. In a study we retain H0. Which of the following statements are incorrect and why? (a) "We have proof that the independent variable does not cause scores to change as predicted." (b) "We have convincing evidence that the independent variable does not work." (c) "We should conclude that there is no relationship in the population." (d) "We have insignificant results." (e) "We have no information about the relationship in the population." (f) "The independent variable may work, but we might have sampling error in representing this."

12. (a) In plain English, what is the incorrect statement you could make about a relationship when it does not exist in nature? (b) Which statistical hypothesis says the relationship does not exist? (c) What is the incorrect decision you can make with this hypothesis? (d) What is our name for this error? (e) What is the incorrect statement you could make about a relationship when it does exist in nature? (f) How do we make an incorrect decision about H0 when a relationship does exist? (g) What is our name for this error?

13. We ask if the attitudes toward fuel costs of 100 owners of hybrid electric cars (X̄ = 76) are different from those reported in a national survey of owners of non-hybrid cars (μ = 65 and σX = 24). Higher scores indicate a more positive attitude. (a) What is the predicted relationship here? (b) Is this a one- or a two-tailed test? (c) In words, state the H0 and the Ha. (d) Compute zobt. (e) What is zcrit? (f) What do you conclude about attitudes here? (g) Report your results in the correct format.

14. We ask if visual memory ability for a sample of 25 art majors (X̄ = 49) is better than that for a population of engineering majors (μ = 45 and σX = 14). Higher scores indicate a better memory. (a) What is the predicted relationship here? (b) Is this a one- or a two-tailed test? (c) In words, state H0 and Ha. (d) Compute zobt. (e) What is zcrit? (f) What do you conclude about differences in visual memory ability? (g) Report your results using the correct format.

15. (a) In question 13, what is the probability we made a Type I error? What would be the error in terms of the independent and dependent variables? (b) What is the probability we made a Type II error? What would be the error in terms of the independent and dependent variables?

16. (a) In question 14, is it possible we made a Type I error? What would be the error in terms of the independent and dependent variables? (b) Is it possible we made a Type II error? What would be the error in terms of the independent and dependent variables?
Arlo claims that with a one-tailed test, the smaller zcrit makes us more likely to reject H0 even if the independent variable doesn’t work, so we are more likely to make a Type I error. Why is he correct or incorrect? 24. Researcher A finds a significant relationship between increasing stress level and ability to concentrate. Researcher B repeats this study but finds a nonsignificant result. Identify the statistical error that each researcher may have made. 25. Amber says increasing power also makes us more likely to reject H0 when it is true, making a Type I error more likely. Why is she correct or incorrect? Chapter 7: Overview of Statistical Hypothesis Testing: The z-Test 125 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Chapter 8 HYPOTHESIS TESTING USING THE ONE-SAMPLE t-TEST LOOKING BACK GOING F O R WA R D Be sure you understand: Your goals in this chapter are to learn: • From Chapter 4, that sX is the estimated population standard deviation, sX2 is the estimated population variance, and both involve dividing by N⫺1. • The difference between the z-test and the t-test. • From Chapter 7, the components of hypothesis testing and what significant indicates. Sections 8-1 Understanding the One-Sample t-Test 8-2 Performing the One-Sample t-Test 8-3 Interpreting the t-Test 8-4 Estimating M by Computing a Confidence Interval 8-5 126 • How the t-distribution and degrees of freedom are used. • When and how to perform the t-test. • What is meant by the confidence interval for m and how it is computed. T he logic of hypothesis testing discussed in the previous chapter is common to all inferential procedures. Therefore, for the remainder of this book, your goal is to learn how slightly different procedures (with different formulas) are applied when we have different research designs. We begin the process in this chapter by introducing the t-test, which is very similar to the previous z-test. The chapter presents (1) the similarities and differences between the z-test and the t-test, (2) when and how to perform the t-test, and (3) a new procedure—called the confidence interval—that is used to more precisely estimate m. Stastics in the Research Literature: Reporting t Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. ©Yuri Arcurs/Shutterstock.com 8-1 UNDERSTANDING THE ONE-SAMPLE t-TEST Like the z-test, the t-test is used for significance testing in a one-sample experiment. In fact, the t-test is used more often in behavioral research. This is because the z-test requires that we know the population standard deviation (sX). However, usually researchers do not know such things about the population because they’re exploring uncharted areas of behavior. 
Instead, we must estimate the population variability by using the sample data to compute the unbiased estimators (the N⫺1 formulas) of the population’s standard deviation or variance. Then we compute something like a z-score for our sample mean, but, because the formula is slightly different, it is called t. The one-sample t-test is the parametric procedure used in a one-sample experiment when the standard deviation of the raw score population is not known. Here’s an example that requires the t-test: Say that a fashion/lifestyle magazine targeted toward “savvy” young women asks readers to complete an online survey (with cool prizes) to measure their level of optimism about the future. The survey uses an interval scale, where 0 is neutral, positive scores indicate optimism, and negative scores indicate pessimism. The magazine reports an average score of 10 (so the m is 10). Say that we then ask, “How would men score on this survey?” To answer this, we’ll perform a one-sample experiment by giving the survey to a comparable sample of one-sample men and use their X to estimate the m t-test The parametric procedure for the population of men. Then we used in a one-sample can compare the m for men to the m experiment when of 10 for women. If men score differthe standard ently from women, then we’ve found deviation of the raw score population is a relationship in which, as gender estimated changes, optimism scores change. Magazines don’t concern themselves with reporting a standard deviation, so we have a one-sample experiment where we don’t know the sX of the raw score population. Therefore, the t-test is appropriate. As usual, we first set up the statistical test. 1. The statistical hypotheses: We’re open-minded and look for any kind of difference, so we have a two-tailed test. If men are different from women, then our sample represents a m for men that will not equal the m for women of 10, so Ha is m ⬆ 10. If men are not different, then their m will equal that of women, so H0 is m ⫽ 10. Use the z-test when sX is known; use the t-test when it is not known. Chapter 8: Hypothesis Testing Using the One-Sample t-Test 127 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. estimated standard error of the mean (sX– ) An estimate of the standard deviation of the sampling distribution of means, used in calculating the one-sample t-test 2. Alpha: We select alpha; .05 sounds good. not related to optimism and we are being misled by sampling error: Maybe by chance we selected some exceptionally pessimistic men for our sample, but in the population men are not different from women and so the sample is simply poorly representing the men’s population where m ⫽ 10. To test this null hypothesis, we’ll use the same logic we’ve used previously: H0 says that the men’s mean represents the population where m is 10. We will compute tobt, which will locate our sample on the sampling distribution of means produced when we are sampling from this raw score population. The critical value that marks the region of rejection is tcrit. 
If tobt is beyond tcrit, our sample mean lies in the region of rejection, so we’ll reject the idea that the sample represents the population where m ⫽ 10. The only novelties here are that tobt is calculated differently than zobt and that tcrit comes from the “t-distribution.” Therefore, first we’ll see how to compute tobt and then we’ll see how to set up the sampling distribution. 3. Check the assumptions: The one-sample t-test requires the following: a. You have a one-sample experiment using interval or ratio scores. b. The raw score population forms a normal distribution. c. The variability of the raw score population is estimated from the sample. Our study meets these assumptions, so we proceed. For simplicity, say we test 9 men. (For adequate power, you should never collect so few scores.) Say the sample produces a X of 7.78. On the one hand (as in Ha), based on this X we might conclude that the population of men would score around a m of 7.78. Because women score at a m of 10, maybe we have demonstrated a relationship between gender and optimism. On the other hand (as in H0), maybe gender is 8-2 PERFORMING THE ONE-SAMPLE t-TEST The computation of tobt consists of three steps that parallel the three steps in the z-test. The first step in the z-test was to find the true standard deviation (sX) of the raw score population. For the t-test, we can compute the estimated standard deviation (sX), or, as we’ll see, the estimated population variance (sX2 ). Their formulas are shown here: (⌺X ) 2 N N⫺1 ⌺X 2 ⫺ sX ⫽ R and (⌺X ) 2 ⌺X ⫺ N 2 sX ⫽ N⫺1 ©iStockphoto.com/Zorani 2 128 The second step of the z-test was to compute the standard error of the mean (sX), which is the standard deviation of the sampling distribution of means. However, because now we estimate the population variability, we compute the estimated standard error of the mean, which is an estimate of the Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. standard deviation of the sampling distribution of means. The symbol for the estimated standard error of the mean is sX . (The s stands for an estimate of the population, and the subscript X indicates it is a population of means.) Previously, we computed the standard error using this formula: X ⫺m sX Now, using the estimated standard error, we have the very similar final step of computing tobt. THE FORMULA FOR THE ONE-SAMPLE t-TEST IS tobt ⫽ sX 1N You may use this formula to compute sX. However, to make your life a little easier and to prepare for formulas in the next chapter, we’ll make a small change. Instead of computing the estimated standard deviation as above, we will compute the estimated population variance (sX2 ). Recall that their difference is that the standard deviation requires first computing the variance, and then has the added step of finding its square root. So, using the symbol for the variance, here is how the previous formula computed the estimated standard error. 
2sX2 ©iStockphoto.com/Seraficus 1N Using the estimated population standard deviation produces this very similar formula: sX ⫽ zobt ⫽ sX sX ⫽ sX ⫽ Finally, the third step in the z-test was to compute zobt using this formula: X ⫺m sX X is the sample mean, m is the mean of the sampling distribution (which equals the value of m that H0 says we are representing), and sX is the estimated standard error of the mean computed above. For example, say that our optimism study yielded the data in Table 8.1. STEP 1: Compute the X and the estimated variance using the sample data. Here X ⫽ 7.78. The sX2 equals (⌺X ) 2 (70)2 574 ⫺ N 9 ⫽ ⫽ 3.695 N⫺1 9⫺1 ⌺X 2 ⫺ sX2 ⫽ 1N Finding the square root in the numerator gives the standard deviation, and then dividing by the square root of N gives the standard error. However, to avoid all that square rooting, we can replace the two square root signs with one big one, producing this: THE FORMULA FOR THE ESTIMATED STANDARD ERROR OF THE MEAN IS sX ⫽ sX2 BN This formula divides the estimated population variance by the N of our sample and then takes the square root. Table 8.1 Optimism Scores of Nine Men Participants 1 2 3 4 5 6 7 8 9 N⫽9 Scores (X) 9 8 10 7 8 8 6 4 10 X2 81 64 100 49 64 64 36 16 100 ⌺X ⫽ 70 ⌺X 2 ⫽ 574 X ⫽ 7.78 Chapter 8: Hypothesis Testing Using the One-Sample t-Test 129 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. The sampling distribution of all values of t that occur when samples of a particular size are selected from the raw score population described by the null hypothesis sX ⫽ sX2 BN ⫽ 3.695 B 9 ⫽ 2.411 ⫽ .64 STEP 3: Compute tobt. tobt ⫽ X ⫺ m 7.78 ⫺ 10 ⫽ ⫽ ⫺3.47 sX .64 Thus, our tobt is ⫺3.47. > Quick Practice > Perform the one-sample t-test in a one-sample experiment when you do not know the population standard deviation. More Examples In a study, H0 is that m ⫽ 60. The X ⫽ 62, sX2 ⫽ 25, and N ⫽ 36. To compute tobt: sX ⫽ tobt ⫽ sX2 BN ⫽ 25 ⫽ 1.694 ⫽ .833 B 36 X ⫺ m 62 ⫺ 60 ⫹2 ⫽ ⫽ ⫽ ⫹2.40 sX .833 .833 For Practice In a study, H0 is that m ⫽ 6. The data are 6, 7, 9, 8, 8. 1. To compute tobt, what two statistics are computed first? 2. What do you compute next? 3. Compute the tobt. > Answers: 3. X ⫽ 7.6, sX2 ⫽ 1.30, N ⫽ 5; s X ⫽ 11.30/5 ⫽ .51; tobt ⫽ (7.6 ⫺ 6)>.51 ⫽ ⫹3.137 2. s X 1. X and sX2 130 8-2a The t-Distribution and df To evaluate a tobt we must compare it to tcrit, and for that we examine the t-distribution. Think of the t-distribution in the following way. Once again we infinitely draw samples of the same size N from the raw score population described by H0. For each sample we compute the X and its tobt. Then we plot the frequency distribution of the different means, labeling the X axis with tobt as well. Thus, the t-distribution is the distribution of all possible values of t computed for random sample means selected from the raw score population described by H0. You can envision the t-distribution as in Figure 8.1. As we saw with z-scores, increasing positive values of t are located farther to the right of m; increasing negative values of t are located farther to the left of m. 
8-2a The t-Distribution and df

To evaluate a tobt we must compare it to tcrit, and for that we examine the t-distribution. Think of the t-distribution in the following way. Once again we infinitely draw samples of the same size N from the raw score population described by H0. For each sample we compute the X̄ and its tobt. Then we plot the frequency distribution of the different means, labeling the X axis with tobt as well. Thus, the t-distribution is the distribution of all possible values of t computed for random sample means selected from the raw score population described by H0. You can envision the t-distribution as in Figure 8.1.

[Figure 8.1: Example of a t-Distribution of Random Sample Means. The X axis shows sample means below and above μ = 10, labeled with values of t from −3 to +3; lower means lie to the left of μ and higher means to the right.]

As we saw with z-scores, increasing positive values of t are located farther to the right of μ; increasing negative values of t are located farther to the left of μ.

As usual, this sampling distribution is still showing the different means that occur when H0 is true. So, if our tobt places our mean close to the center of the distribution, then we have a mean that is frequent and likely when we are representing the population described by H0. (In our example, our sample of men is likely to represent the population where μ is 10.) But if tobt places our mean far into a tail of the distribution, then we have a mean that hardly ever happens and is very unlikely when we are representing the population described by H0. (Our sample of men is unlikely to represent the population where μ is 10.)

As usual, to determine if our mean is far enough into a tail to be in the region of rejection, we first identify the critical value of t. But we have one important novelty here: The t-distribution does not fit the perfect standard normal curve (and z-table) the way our previous sampling distributions did. Instead, there are actually many versions of the t-distribution, each having a slightly different shape. The shape of a particular distribution depends on the size of the samples that are used when creating it. When using small samples, the t-distribution is only roughly normally distributed. With larger samples, the t-distribution is a progressively closer approximation to the perfect normal curve. This is because with a larger sample, our estimate of the population variance or standard deviation is closer to the true population variance and standard deviation. As we saw in the z-test, when we have the true population variability, the sampling distribution is a normal curve.

The fact that there are differently shaped t-distributions for different sample sizes is important for one reason: Our region of rejection should contain precisely that portion of the curve defined by our α. If α = .05, then we want the critical value to mark off precisely the extreme 5% of the curve as the region of rejection. However, for distributions that are shaped differently, the point that marks the extreme 5% will be at different locations on the X axis of the distribution. Because this point is at the critical value, with differently shaped t-distributions we must use different critical values of t. For example, Figure 8.2 shows two t-distributions. Notice the size of the blue region of rejection in the tails of Distribution A. Say that this is the extreme 5% of Distribution A and has the critical value shown. If we also use this tcrit on Distribution B, the region of rejection is larger, containing more than 5% of the distribution. Conversely, the tcrit marking off 5% of Distribution B will mark off less than 5% of Distribution A. (The same problem exists for a one-tailed test.)

[Figure 8.2: Comparison of Two t-Distributions Based on Different Sample Ns. A narrower Distribution A and a wider Distribution B are centered on the same μ, with the X axis labeled with values of t from −4 to +4; the critical values marked on both curves are those for Distribution A.]
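A quick numerical illustration of this point, assuming scipy is installed (a sketch, not a computation from the text): the tail area beyond the normal curve's 1.96 cutoff is well above .025 for small-df t-distributions and approaches .025 as df grows:

    from scipy import stats

    cutoff = 1.96  # the two-tailed .05 critical value for the normal curve

    for df in (5, 10, 30, 120):
        # proportion of the t-distribution lying beyond +1.96 in one tail
        tail = stats.t.sf(cutoff, df)
        print(df, round(tail, 4))

    # For comparison, the normal curve puts exactly .025 beyond 1.96:
    print(round(stats.norm.sf(cutoff), 4))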
This issue is important because not only is α the size of the region of rejection, it is also the probability of a Type I error (which is rejecting H0 when it is true). Unless we use the appropriate tcrit for a particular t-distribution, the actual probability of a Type I error will not equal our α, and that's not supposed to happen! Thus, the obvious solution is to examine the t-distribution that is created when using the same sample size as in our study. For the particular shape of this distribution we determine the specific value of tcrit. Only then will the region of rejection (and the probability of a Type I error) equal our α.

However, in this context, the size of a sample is not determined by N. Recall that when computing the estimated variance or estimated standard deviation, the final division involves the quantity N − 1. In Chapter 4 we saw that this is the number of scores in a sample that actually reflect the variability in the population. Thus, it is the size of the quantity N − 1 that determines the shape of the t-distribution and our tcrit for a particular study.

We have a special name for "N − 1": It is called the degrees of freedom and is symbolized as df.

degrees of freedom (df): The number of scores in a sample that reflect the variability in the population; they determine the shape of the sampling distribution when estimating σX̄.

THE FORMULA FOR DEGREES OF FREEDOM IN THE ONE-SAMPLE t-TEST IS

$df = N - 1$

where N is the number of scores in the sample.

In our optimism study, N = 9 so df = 8. The larger the df, the closer the t-distribution comes to forming a normal curve. However, a tremendously large sample is not required to produce a normal t-distribution. When df is greater than 120, the t-distribution is virtually identical to the standard normal z-distribution. But when df is between 1 and 120 (which is often the case in research), a differently shaped t-distribution will occur for each df. Therefore, a different tcrit is required for each df. Thus, you will no longer automatically use the critical values of 1.96 or 1.645 as you did in previous chapters. Instead, when your df is between 1 and 120, use the df to identify the appropriate sampling distribution for your study. The tcrit on that distribution will accurately mark off the region of rejection so that the probability of a Type I error equals your α.

So, in the optimism study with an N of 9, we will use the tcrit from the t-distribution for df = 8. In a different study, however, where N might be 25, we would use the different tcrit from the t-distribution for df = 24. And so on. The appropriate tcrit for the one-sample t-test comes from the t-distribution that has df equal to N − 1.
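Outside of a printed t-table, these critical values come from the inverse of the t-distribution. A sketch assuming scipy (not the book's procedure) that reproduces the tcrit of 2.306 for df = 8:

    from scipy import stats

    alpha = .05

    # Two-tailed: put alpha/2 in each tail, so look up the 1 - alpha/2 quantile.
    for df in (8, 24, 60, 120):
        t_crit_two = stats.t.ppf(1 - alpha / 2, df)
        t_crit_one = stats.t.ppf(1 - alpha, df)
        print(df, round(t_crit_two, 3), round(t_crit_one, 3))
    # df = 8 gives 2.306 (two-tailed), matching Table 8.2 below.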
8-2b Using the t-Table

We obtain the different values of tcrit from Table 2 in Appendix B, titled "Critical Values of t." In this "t-table" you'll find separate tables for two-tailed and one-tailed tests. Table 8.2 contains a portion of the two-tailed table. To find the appropriate tcrit, first locate the appropriate column for your α level (either .05 or .01). Then find the value of tcrit in the row at the df for your sample.

Table 8.2 A Portion of the t-Table

           Alpha Level
df    α = .05    α = .01
1     12.706     63.657
2      4.303      9.925
3      3.182      5.841
4      2.776      4.604
5      2.571      4.032
6      2.447      3.707
7      2.365      3.499
8      2.306      3.355

For example, in the optimism study, df is 8. For a two-tailed test with α = .05 and df = 8, tcrit is 2.306. In a different study, say the sample N is 61. Therefore, the df = N − 1 = 60. Look in Table 2 of Appendix B to find tcrit. With α = .05, the two-tailed tcrit = 2.000; the one-tailed tcrit = 1.671. The table contains no positive or negative signs. In a two-tailed test you add the "±", and in a one-tailed test you supply the appropriate "+" or "−".

Finally, the t-tables contain critical values for only some values of df. When the df of your sample does not appear in the table, select the two dfs that bracket above and below your df and use their values of tcrit (e.g., if your df = 65, use the tcrit at dfs of 60 and 120). This gives you a larger and a smaller tcrit, with your actual tcrit in between. Then:

1. If your tobt is larger than the larger tcrit, then your results are significant. If you are beyond a number larger than your actual tcrit, then you are automatically beyond the actual tcrit.
2. If your tobt is smaller than the smaller tcrit, then your results are not significant. If you are not beyond a number smaller than your actual tcrit, then you won't be beyond the actual tcrit.

Rarely, tobt will fall between the two critical values: Then either perform the t-test using SPSS, or consult an advanced book to use "linear interpolation" to compute the precise tcrit.
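The bracketing rules can be expressed as a small helper function. This sketch is my own illustration (the function name is hypothetical, not from the book); it applies the two rules and flags the rare in-between case:

    def decide_with_bracketing(t_obt, t_crit_larger, t_crit_smaller):
        """Two-tailed decision when your df is not in the t-table.

        t_crit_larger / t_crit_smaller: the critical values at the two
        bracketing dfs (the smaller df gives the larger tcrit).
        """
        if abs(t_obt) > t_crit_larger:
            return "significant"        # beyond even the larger tcrit
        if abs(t_obt) < t_crit_smaller:
            return "not significant"    # not beyond even the smaller tcrit
        return "indeterminate: use SPSS or linear interpolation"

    # Example from the text: df = 65 brackets to dfs of 60 and 120, whose
    # two-tailed tcrits at alpha = .05 are 2.000 and 1.980.
    print(decide_with_bracketing(2.30, 2.000, 1.980))   # significant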
8-3 INTERPRETING THE t-TEST

Once you've calculated tobt and identified tcrit, you can make a decision about your results. Remember our optimism study? We must decide whether the men's mean of 7.78 represents the same population of scores that women have, where μ is 10. Our tobt is −3.47, and the two-tailed tcrit is ±2.306. Thus, we can envision the sampling distribution in Figure 8.3.

[Figure 8.3: Two-Tailed t-Distribution for df = 8 When H0 Is True and μ = 10. The distribution is centered on μ = 10, with tcrit = ±2.306 marking the regions of rejection; our X̄ of 7.78 falls at tobt = −3.47, beyond −2.306 in the left tail.]

Remember, this shows all means that occur by chance when H0 is true—here, when our sample represents the population where μ is 10. But our tobt lies beyond tcrit, so the results are significant: Our X̄ is so unlikely to occur if we were representing the population where μ is 10 that we reject that this is the population our sample represents. So, we reject H0 and accept Ha. With α = .05, the probability is p < .05 that we've made a Type I error (by incorrectly rejecting H0). We interpret these results using the same rules and cautions discussed in the previous chapter.

But remember: Finding a significant result is not the end of the story. First we describe the relationship we've demonstrated. With a sample mean of 7.78, our best estimate is that the μ for men is around 7.78. Because women have a different μ, at 10, we conclude that our results demonstrate a relationship in the population where, as we change from men to women, optimism scores change from a μ around 7.78 to a μ around 10. Finally, we return to being researchers and interpret the relationship in psychological or sociological terms: Why are women more optimistic than men? Are there social/cultural reasons or perhaps physiological reasons? Or instead, do men only act more pessimistic as part of a masculinity issue? And so on.

If tobt had not fallen beyond tcrit (for example, if tobt = +1.32), then it would not lie in the region of rejection and would not be significant. Then we would consider whether we had sufficient power to avoid making a Type II error (incorrectly retaining H0 and missing the relationship). And we would apply the rules for interpreting nonsignificant results as discussed in the previous chapter, concluding that we have no evidence—one way or the other—regarding a relationship between gender and optimism scores.

8-3a Performing One-Tailed Tests

As usual, we perform one-tailed tests only when we can confidently predict the direction of the relationship. If we had a reason to predict that men score higher than women (who have a μ of 10), then Ha would be that the sample represents a population with μ greater than 10 (Ha: μ > 10). Our H0 is always that our predictions are wrong, so here it would be that the sample represents a population with a μ less than or equal to 10 (H0: μ ≤ 10). We compute tobt as shown previously, but we find the one-tailed tcrit from the t-table for our df and α.

To decide which tail of the sampling distribution to put the region of rejection in, determine what's needed to support Ha. For our sample to represent a population of higher scores, the X̄ must be greater than 10 and be significant. As shown in the left-hand sampling distribution in Figure 8.4, such means are in the upper tail, so tcrit is positive. On the other hand, say we had predicted that men score lower than women. Now Ha is that μ is less than 10, and H0 is that μ is greater than or equal to 10. For our sample to represent a population of lower scores, the X̄ must be less than 10 and be significant. As shown in the right-hand sampling distribution in Figure 8.4, such means are in the lower tail, so tcrit is negative.

[Figure 8.4: H0 Sampling Distributions of t for a One-Tailed Test. One panel shows the test of H0: μ ≤ 10 versus Ha: μ > 10, with the .05 region of rejection in the upper tail at +tcrit; the other shows H0: μ ≥ 10 versus Ha: μ < 10, with the region of rejection in the lower tail at −tcrit.]
In either example, if tobt is in the region of rejection, then the X̄ is unlikely to represent a μ described by H0. Therefore, reject H0, accept Ha, and describe the results as significant. If tobt is not in the region of rejection, then the results are not significant.

8-3b Summary of the One-Sample t-Test

The one-sample t-test is used with a one-sample experiment involving normally distributed interval or ratio scores when the variability in the population is not known. Then:

1. Create either the two-tailed or one-tailed H0 and Ha.
2. Compute tobt.
   a. Compute X̄ and s²X.
   b. Compute sX̄.
   c. Compute tobt.
3. Envision the sampling t-distribution and use df = N − 1 to find tcrit in the t-table.
4. Compare tobt to tcrit. If tobt is beyond tcrit, the results are significant; describe the populations and interpret the relationship. If tobt is not beyond tcrit, the results are not significant; make no conclusion about the relationship.

> Quick Practice

> Perform the one-sample t-test when σX is unknown.

More Examples
In a study, μ is 40. We predict our condition will change scores relative to this μ. This is a two-tailed test, so H0: μ = 40; Ha: μ ≠ 40. Then compute tobt. The 25 scores produce X̄ = 46 and s²X = 196. We compute tobt to be +2.14. Next, we find tcrit: With α = .05 and df = 24, tcrit = ±2.064. The tobt lies beyond the tcrit. Conclusion: The independent variable significantly increases scores from a μ of 40 to a μ around 46.

For Practice
We test if artificial sunlight during the winter months lowers one's depression. Without the light, a depression test has μ = 8. With the light, our sample with N = 41 produced X̄ = 6. The tobt = −1.83.
1. What are the hypotheses?
2. What is tcrit?
3. What is the conclusion?
4. If N had been 50, would the results be significant?

> Answers
1. To "lower" is a one-tailed test: Ha: μ < 8; H0: μ ≥ 8.
2. With α = .05 and df = 40, tcrit = −1.684.
3. tobt is beyond tcrit. Conclusion: Artificial sunlight significantly lowers depression scores from a μ of 8 to a μ around 6.
4. Yes; the −tcrit would be between −1.684 and −1.671.
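For the artificial-sunlight item, only summary statistics are given, but the decision can still be scripted. A sketch assuming scipy; tobt is taken from the problem, and exact critical values replace the table bracketing used in answer 4:

    from scipy import stats

    alpha, N = .05, 41
    t_obt = -1.83                       # given in the For Practice item
    df = N - 1                          # 40

    # One-tailed test predicting *lower* scores: the region of rejection
    # is in the left tail, so tcrit is negative.
    t_crit = stats.t.ppf(alpha, df)     # about -1.684
    print(t_obt < t_crit)               # True -> significant

    # Question 4: with N = 50, df = 49
    print(t_obt < stats.t.ppf(alpha, 49))   # still True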
8-4 ESTIMATING μ BY COMPUTING A CONFIDENCE INTERVAL

As you've seen, after rejecting H0, we estimate the population μ that the sample mean represents. There are two ways to estimate μ. The first way is point estimation, in which we describe a point on the dependent variable at which the population μ is expected to fall. We base this estimate on our sample mean. Earlier, for example, we estimated that the μ of the men's population is located on the optimism variable at the point equal to our men's sample mean of 7.78. However, the problem with point estimation is that it is extremely vulnerable to sampling error. Our sample of men probably does not perfectly represent the population of men, so if we actually tested the entire population, μ probably would not be exactly 7.78. This is why we have been saying that the μ for men is around 7.78.

The other, better way to estimate a μ is to include the possibility of sampling error and perform interval estimation. With interval estimation, we specify a range of values within which we expect the population parameter to fall. One way that you often encounter such intervals in real life is when you hear of a result accompanied by "plus or minus" some amount. This is called the margin of error. For example, during an election you may hear that a survey showed that 45% of the voters support a particular candidate, with a margin of error of ±3%. The survey would involve a sample, however, so it may contain sampling error in representing the population. The margin of error defines this sampling error by indicating that, if we could ask the entire population, we expect the result would be within ±3 of 45%. That is, we would expect the actual percent of the population that supports the candidate to be inside the interval that is between 42% and 48%. Thus, the margin of error describes an interval by describing a central value, plus or minus some amount.

In behavioral research we perform interval estimation in a similar way by creating a confidence interval. Confidence intervals can be used to describe various population parameters, but the most common is for estimating μ. The confidence interval for μ describes the interval within which we are confident that a population μ falls. So, in our optimism study, instead of merely saying that our sample of men represents a μ somewhere around 7.78, we can use a confidence interval to define "around." To do so, we'll identify the values of μ that the sample mean is likely to represent. You can visualize this as shown here:

μlow … μ μ μ μ 7.78 μ μ μ μ … μhigh
(values of μ, one of which is likely to be represented by our sample mean)

point estimation: A way to estimate a population parameter by describing a point on the variable at which the population parameter is expected to fall.
interval estimation: A way to estimate a population parameter by describing an interval within which the population parameter is expected to fall.
margin of error: Describes an interval by describing a central value, plus or minus some amount.
confidence interval for μ: A range of values of μ within which we are confident that the actual μ is found.

The μlow is the lowest μ that our sample mean is likely to represent, and μhigh is the highest μ that the mean is likely to represent. When we compute these two values, we have the confidence interval, because we are confident that the μ being represented by our sample falls between them.

When is a sample mean likely to represent a particular μ? It depends on sampling error. For example, intuitively we know that sampling error is unlikely to produce a sample mean of 7.78 if μ is, say, 500: A sample "couldn't" be that unrepresentative. In other words, 7.78 is significantly different from 500.
But sampling error is likely to produce a sample mean of 7.78 if, for example, μ is 8 or 9: That's a believable amount of sampling error. In other words, 7.78 is not significantly different from these μs. Thus, a sample mean is likely to represent any μ that the mean is not significantly different from. The logic behind a confidence interval is to compute the highest and lowest values of μ that are not significantly different from our sample mean. All μs between these two values are also not significantly different from the sample mean, so the mean is likely to represent one of them. In other words, the μ being represented by our sample mean is likely to fall within this interval.

A confidence interval describes the values of μ that are not significantly different from our sample mean, so it is likely our mean represents one of them.

8-4a Computing the Confidence Interval

The t-test forms the basis for the confidence interval, and it works like this. We seek the highest μ above our sample mean that is not significantly different from our mean and the lowest μ below our sample mean that is not significantly different from our mean. The most that a μ and sample mean can differ and still not be significant is when they produce a tobt that equals tcrit. We can state this using the formula for the t-test:

$t_{\text{crit}} = \dfrac{\bar X - \mu}{s_{\bar X}}$

Now we simply need to rearrange this formula to find the value of μ that, along with our X̄ and sX̄, produces an answer equal to the tcrit for our study. However, we want to do this twice, once describing the highest μ above our X̄ and once describing the lowest μ below our X̄. Therefore we always use the two-tailed value of tcrit. Then we find the μ that produces a −tobt equal to −tcrit, and we find the μ that produces a +tobt equal to +tcrit. Luckily, we can combine all of these steps into this one formula.

THE FORMULA FOR THE CONFIDENCE INTERVAL FOR μ IS

$(s_{\bar X})(-t_{\text{crit}}) + \bar X \;\le\; \mu \;\le\; (s_{\bar X})(+t_{\text{crit}}) + \bar X$

We compute a confidence interval only after finding a significant tobt. This is because we must be sure our sample is not representing the μ described by H0 before we estimate any other μ it might represent. Thus, we determined that our men represent a μ that is different from that of women, so now we can describe that μ.

The μ in the formula stands for the μ that we are estimating. The components to the left of μ will produce μlow, so we are confident our μ is greater than or equal to this value. The components to the right of μ will produce μhigh, so we are confident that our μ is less than or equal to this value. In the formula, the X̄ and sX̄ are from your data.
Find the two-tailed value of tcrit in the t-table at your α for df = N − 1, where N is the sample N.

STEP 1: Find the two-tailed tcrit and fill in the formula. For our optimism study, the two-tailed tcrit for α = .05 and df = 8 is ±2.306. The X̄ = 7.78 and sX̄ = .64. Filling in the formula, we have

$(.64)(-2.306) + 7.78 \;\le\; \mu \;\le\; (.64)(+2.306) + 7.78$

STEP 2: Multiply each tcrit times sX̄. After multiplying .64 times −2.306 and +2.306, we have

$-1.476 + 7.78 \;\le\; \mu \;\le\; +1.476 + 7.78$

STEP 3: Add the above positive and negative answers to the X̄. After adding ±1.476 to 7.78, we have

$6.30 \;\le\; \mu \;\le\; 9.26$

This is the finished confidence interval: We are confident our sample mean represents a μ that is greater than or equal to 6.30, but less than or equal to 9.26. In other words, if we could measure the optimism scores of all men in the population, we expect their μ would be between these two values. (Notice that after Step 2, you have the margin of error, because we expect the μ is 7.78, plus or minus 1.476.)

Because we created our interval using the tcrit for an α of .05, there is a 5% chance that our μ is outside of this interval. On the other hand, there is a 95% chance that the μ is within the interval. Therefore, we have created what is called the 95% confidence interval: We are 95% confident that the interval between 6.30 and 9.26 contains our μ. Usually this gives us sufficient confidence. However, had we used the tcrit for α = .01, the interval would have spanned a wider range, giving us even more confidence that the interval contained the μ. Then we would have created the 99% confidence interval. Usually, researchers report the 95% confidence interval.

Thus, we conclude our one-sample t-test by saying, with 95% confidence, that our sample of men represents a μ between 6.30 and 9.26. The center of the interval is still at our X̄ of 7.78, but now we have much more information than if we had merely said μ is somewhere around 7.78. Therefore, you should compute a confidence interval anytime you are describing the μ represented by the X̄ in a condition of a significant experiment, or in any type of study in which you believe a X̄ represents a distinct population μ.

Compute a confidence interval to estimate the μ represented by the X̄ of a condition in an experiment.

> Quick Practice

> A confidence interval for μ provides a range of μs, any one of which our X̄ is likely to represent.

More Examples
A tobt is significant with X̄ = 50, N = 20, and sX̄ = 4.7. To compute the 95% confidence interval, df = 19, so the two-tailed tcrit = ±2.093. Then:

$(s_{\bar X})(-t_{\text{crit}}) + \bar X \le \mu \le (s_{\bar X})(+t_{\text{crit}}) + \bar X$
$(4.7)(-2.093) + 50 \le \mu \le (4.7)(+2.093) + 50$
$(-9.837) + 50 \le \mu \le (+9.837) + 50$
$40.16 \le \mu \le 59.84$

Use the two-tailed critical value when computing a confidence interval even if you have performed a one-tailed t-test.

For Practice
1. What does this 95% confidence interval indicate: 15 ≤ μ ≤ 20?
2. With N = 22, you perform a one-tailed test (α = .05). What is tcrit for computing the confidence interval?
3. The tobt is significant when X̄ = 35, sX̄ = 3.33, and N = 22. Compute the 95% confidence interval.

> Answers
1. We are 95% confident that our X̄ represents a μ between 15 and 20.
2. With df = 21, the two-tailed tcrit = ±2.080.
3. (3.33)(−2.080) + 35 ≤ μ ≤ (3.33)(+2.080) + 35 = 28.07 ≤ μ ≤ 41.93
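The three steps can be checked in a few lines. A sketch assuming scipy (the tiny differences from 6.30 and 9.26 come from rounding sX̄ to .64):

    from scipy import stats

    xbar, s_xbar, N = 7.78, .64, 9        # from the optimism study
    alpha = .05

    # STEP 1: two-tailed tcrit at df = N - 1 = 8 -> about 2.306
    t_crit = stats.t.ppf(1 - alpha / 2, N - 1)

    # STEP 2: the margin of error, about 1.476
    margin = s_xbar * t_crit

    # STEP 3: add the +- margin to the mean
    low, high = xbar - margin, xbar + margin
    print(round(low, 2), round(high, 2))   # about 6.30 and 9.26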
8-5 STATISTICS IN THE RESEARCH LITERATURE: REPORTING t

Report the results of a one- or two-tailed t-test in the same way that you reported the z-test, but also include the df. In our optimism study, we had 8 df, the tobt was −3.47, and with α = .05, the result was significant. We report this as:

t(8) = −3.47, p < .05

Notice the df in parentheses. (Had these results not been significant, then we'd have p > .05.) Usually, confidence intervals are reported in sentence form, and we always indicate the confidence level used. So you might say, "The 95% confidence interval for the μ of men was 6.30 to 9.26."

Note that researchers usually report the smallest value of α at which a result is significant. For example, it turns out that when α is .01, tcrit is ±3.355, so our tobt of −3.47 also would be significant if we had used the .01 level. Therefore, instead of saying p < .05 above, we would provide more information by reporting p < .01, because then we know that the probability of a Type I error is not in the neighborhood of .04, .03, or .02.

Further, computer programs like SPSS determine the precise, minimum size of the region of rejection that our tobt falls into, so researchers often report the exact probability of a Type I error. For example, you might see "p = .04." This indicates that tobt falls into a region of rejection that is .04 of the curve, and therefore the probability of a Type I error equals .04. This probability is less than the maximum of .05 that we require, so we conclude this result is significant. On the other hand, say that you see p = .07. Here, a larger region of rejection is needed for the results to be significant. However, with a region of rejection that is .07 of the curve, the probability of a Type I error is now .07. This probability is larger than the maximum of .05 that we require, so we conclude this result is not significant.

USING SPSS

As described on Review Card 8.4, SPSS will perform the one-sample t-test, computing tobt, the X̄, sX, sX̄, and the 95% confidence interval. Also, as described in the previous section, SPSS indicates the smallest region of rejection that our tobt will fall into. This is labeled as "Sig. (2-tailed)" and tells you the smallest two-tailed alpha level at which your tobt can be considered significant.

Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com
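The exact value that SPSS labels "Sig. (2-tailed)" can be computed directly from the t-distribution. A sketch assuming scipy, not the book's procedure; for our tobt of −3.47 with df = 8 it comes out near .008, which is why p < .01 can be reported:

    from scipy import stats

    t_obt, df = -3.47, 8

    # Two-tailed p: the proportion of the t-distribution at least this
    # far from 0 in either tail.
    p_two_tailed = 2 * stats.t.sf(abs(t_obt), df)
    print(round(p_two_tailed, 4))   # about .0085, so p < .01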
STUDY PROBLEMS

(Answers for odd-numbered questions are in Appendix C.)

1. Using the terms relationship and sampling error, what are the two explanations for any experiment's results?
2. (a) Why do we reject H0 when tobt is in the region of rejection? (b) Why do we retain H0 when tobt is not in the region of rejection?
3. (a) How must your experiment be designed so that you can perform the one-sample t-test? (b) What type of dependent scores should you have? (c) How do you choose between the t-test and the z-test?
4. (a) What is the difference between sX and sX̄? (b) How is their use the same?
5. (a) Why don't we always use 1.96 or 1.645 as the critical value in the t-test? (b) Why are there different values of tcrit when samples have different Ns? (c) What must you determine in order to find tcrit in a particular study? (d) What does df equal in the one-sample t-test?
6. With α = .05 in a two-tailed test: (a) What is tcrit when N is 25? (b) What is tcrit when N is 61? (c) Why does tcrit decrease as N increases?
7. What is the final step when dealing with significant results in any study?
8. (a) What does a confidence interval for μ indicate? (b) When do we compute a confidence interval?
9. We conduct a one-sample experiment to determine if performing 15 minutes of meditation daily reduces an individual's stress levels: X̄ = 73.2. In the population of people who don't meditate, stress levels produce μ = 88. With N = 31 we compute tobt = +2.98. (a) Is this a one- or a two-tailed test? (b) What are H0 and Ha? (c) What is tcrit? (d) Are the results significant? (e) Should we compute a confidence interval? (f) To compute the confidence interval, what is tcrit?
10. Say our X̄ = 44. (a) Estimate the μ using point estimation. (b) What additional information does a confidence interval tell you? (c) Why is computing a confidence interval better than using a point estimate?
11. The layout of this textbook is different from that of the typical textbook. We ask whether this is beneficial or detrimental to learning statistics. On a national statistics exam, μ = 68.5 for students using other textbooks. A sample of 10 students using this book has X̄ = 78.5, s²X = 130.5. (a) What are H0 and Ha for this study? (b) Compute tobt. (c) With α = .05, what is tcrit? (d) What do you conclude about the use of this book? (e) Report your results using the correct format. (f) Compute the confidence interval for μ if appropriate.
12. A researcher predicts that smoking cigarettes decreases a person's sense of smell. On a test of olfactory sensitivity, the μ for nonsmokers is 18.4. A sample of 12 people who smoke a pack a day produces X̄ = 16.25, s²X = 4.75. (a) What are H0 and Ha for this study? (b) Compute tobt. (c) What is your tcrit? (d) What should the researcher conclude about this relationship? (e) Report your results using the correct format. (f) Compute the confidence interval for μ if appropriate.
13. Bonita studies if hearing an argument in favor of an issue alters participants' attitudes toward the issue one way or the other. She presents a brief argument to 8 people. In a national survey about this issue, μ = 50. She obtains X̄ = 53.25 and s²X = 69.86. (a) What are H0 and Ha? (b) What is tobt? (c) What is tcrit? (d) Report the results in the correct format and indicate if they are significant. Should she compute the confidence interval for μ? (e) What should Bonita conclude about the relationship?
14. In question 13, (a) what error has Bonita potentially made? (b) With her variables, what would the error be? (c) What statistical principle should she be concerned with and why?
15. We ask whether people who usually use the grammar-checking function in a word-processing program make more or fewer grammatical errors in a hand-written draft. On a national test of students using such programs, the number of errors per page is μ = 12. A sample prohibited from using the program for a semester has these scores: 8, 12, 10, 9, 6, 7.
(a) What are H0 and Ha? (b) Perform the t-test and draw the appropriate conclusion. (c) Compute the confidence interval if appropriate.
16. We study the effect of wearing uniforms in middle school on attitudes toward achieving good grades. On a national survey, the average attitude score for students who do not wear uniforms is μ = 79. A sample of 41 students who wear uniforms has scores of X̄ = 83.5, s²X = 159.20. (a) What are H0 and Ha? (b) Perform the procedures to decide about the effect of wearing uniforms. (c) If we measured the population wearing uniforms, what μ do you confidently predict we would find?
17. You read that the starting mean salary in your chosen profession is $46,000 per year, ±$4,000. (a) What is this type of estimate called? (b) Interpret this report.
18. (a) Is the one-tailed or two-tailed tcrit used to compute a confidence interval? (b) Why?
19. Senator Smith has an approval rating of 35%, and Senator Jones has an approval rating of 37%. For both, the margin of error is ±3%. A news report says that, statistically, they are tied. Explain why this is true.
20. SPSS computed two t-tests with the following results. For each, using α = .05, should you conclude the result is significant, and why? (a) p = .03; (b) p = .09
21. While reading research reports, you encounter the following statements. For each, identify the N, the predicted relationship, the outcome, and the possible type of error being made. (a) "When we examined the perceptual skills data (M = 55, SD = 11.44), comparing adolescents to adults produced t(45) = +3.76, p < .01." (b) "The influence of personality type failed to produce a difference in emotionality scores, with t(99) = −1.72, p > .05."
22. In a two-tailed test, α = .05 and N is 35. (a) Is tobt = +2.019 significant? (b) Is tobt = −2.47 significant?
23. Report the results in problem 22 using the correct format.
24. Study A reports results with p = .031. Study B reports results with p < .001. What is the difference between these results in terms of: (a) how significant they are? (b) the size of their critical values? (c) the size of their regions of rejection? (d) the probability of a statistical error?
25. Summarize the steps involved in conducting a one-sample experiment (with a t-test) from beginning to end.

Chapter 9 HYPOTHESIS TESTING USING THE TWO-SAMPLE t-TEST

LOOKING BACK
Be sure you understand:
• From Chapter 1, what a condition, independent variable, and dependent variable are.
• From Chapter 8, how to perform the one-sample t-test using the t-distribution and df, and what a confidence interval is.

GOING FORWARD
Your goals in this chapter are to learn:
• The logic of a two-sample experiment.
• The difference between independent samples and related samples.
• When and how to perform the independent-samples t-test.
• When and how to perform the related-samples t-test.
• What effect size is and how it is measured using Cohen's d or r²pb.

Sections
9-1 Understanding the Two-Sample Experiment
9-2 The Independent-Samples t-Test
9-3 Performing the Independent-Samples t-Test
9-4 The Related-Samples t-Test
9-5 Performing the Related-Samples t-Test
9-6 Statistics in the Research Literature: Reporting a Two-Sample Study
9-7 Describing Effect Size

This chapter presents the two-sample t-test, which is the major parametric procedure used when an experiment involves two samples. As the name implies, this test is similar to the one-sample t-test you saw in Chapter 8. However, we have two ways to create a two-sample design, and each requires a different procedure and formulas. So that you don't get confused, view the discussion of each procedure as separate and distinct—a mini-chapter. First we will discuss the t-test for one type of two-sample experiment, called the independent-samples t-test. Then we will discuss the t-test for the other type of two-sample experiment, called the related-samples t-test. Finally, we will discuss a new technique for describing the relationship in either type of experiment, called effect size.

9-1 UNDERSTANDING THE TWO-SAMPLE EXPERIMENT

The one-sample experiment discussed in previous chapters is not often found in real research, because it requires that we know the value of μ for a population under one condition of the independent variable. However, because we explore new behaviors and variables, we usually do not know μ ahead of time. Instead, the much more common approach is to conduct a two-sample experiment, measuring participants' dependent scores under two conditions of the independent variable. Condition 1 produces one sample mean—call it X̄1—that represents μ1, the μ we would find if we tested everyone in the population under Condition 1. Condition 2 produces another sample mean—call it X̄2—that represents μ2, the μ we would find if we tested everyone in the population under Condition 2. A possible outcome from such an experiment is shown in Figure 9.1.

[Figure 9.1: Relationship in the Population in a Two-Sample Experiment. As the conditions change, the population tends to change in a consistent fashion: the distribution for Condition 1, centered on μ1, sits at lower dependent scores, and the distribution for Condition 2, centered on μ2, sits at higher scores.]

If each condition represents a different population, then the experiment has demonstrated a relationship in nature. However, there's the usual problem of sampling error. Even though we may have different sample means, changing the conditions of the independent variable may not really change the dependent scores in nature. Instead, we might find the same population of scores under each condition, but one or both of our conditions poorly represent this population.
Thus, in Figure 9.1 we might find only the lower or upper distribution, or we might find one in between. Therefore, before we make any conclusions about the experiment, we must determine whether the difference between the sample means reflects sampling error. The parametric statistical procedure for determining whether the results of a two-sample experiment are significant is the two-sample t-test. However, we have two different ways to create the samples, so we have two different versions of the t-test: One is called the independent-samples t-test, and the other is the related-samples t-test.

The two ways to calculate the two-sample t-test are the independent-samples t-test and the related-samples t-test.

9-2 THE INDEPENDENT-SAMPLES t-TEST

The independent-samples t-test is the parametric procedure for testing two sample means from independent samples. Two samples are independent when we randomly select participants for a condition without regard to who else has been selected for either condition. Then the scores in one sample are not influenced by—are "independent" of—the scores in the other sample. You can recognize independent samples by the absence of things such as matching the participants in one condition with those in the other condition or repeatedly testing the same participants in both conditions.

independent-samples t-test: The parametric procedure used to test sample means from two independent samples.
independent samples: Samples created by selecting each participant for one condition without regard to the participants selected for any other condition.
homogeneity of variance: The requirement that the populations represented in a study have equal variances.

Here is a study that calls for the independent-samples t-test. We propose that people may recall an event differently when they are hypnotized. To test this, we'll have two groups watch a videotape of a supposed robbery. Later, one group will be hypnotized and then answer 30 questions about the event. The other group will answer the questions without being hypnotized. Thus, the conditions of the independent variable are the presence or absence of hypnosis, and the dependent variable is the amount of information correctly recalled. This design is shown in Table 9.1. We will compute the mean of each condition (each column). If the means differ, we'll have evidence of a relationship where, as amount of hypnosis changes, recall scores also change.

First we check that our study meets the assumptions of the statistical test. In addition to requiring independent samples, this t-test has two other requirements:
1. The dependent scores are normally distributed interval or ratio scores.
2. Here's a new one: The populations have homogeneous variance. Homogeneity of variance means that the variances (σ²X) of the populations being represented are equal.

You can determine if your data meet these assumptions by seeing how other researchers analyze your variables in the research literature. Note: You are not required to have the same number of participants in each condition, but the samples should not be massively unequal.
Table 9.1 Diagram of Hypnosis Study Using an Independent-Samples Design
The independent variable is amount of hypnosis, and the dependent variable is recall.

Recall Scores:
    Hypnosis    No Hypnosis
    X           X
    X           X
    X           X
    ⋮           ⋮

9-2a Statistical Hypotheses for the Independent-Samples t-Test

Depending on our experimental hypotheses, we may perform either a one- or two-tailed test. Let's begin with a two-tailed test: We simply predict that the hypnosis condition will produce recall scores that are different from those in the no-hypnosis condition, so our samples represent different populations that have different μs.

First, the alternative hypothesis: The predicted relationship exists if one population mean (μ1) is larger or smaller than the other (μ2). That is, μ1 should not equal μ2. We could state this as Ha: μ1 ≠ μ2, but there is a better way. If the two μs are not equal, then their difference does not equal zero. Thus, the two-tailed alternative hypothesis is

$H_a\!: \mu_1 - \mu_2 \ne 0$

Ha implies that the means from our conditions each represent a different population of recall scores, so a relationship is present.

Now, the null hypothesis: If no relationship exists, then if we tested everyone under the two conditions, each time we would find the same population of recall scores having the same μ. In other words, μ1 equals μ2. We could state this as H0: μ1 = μ2, but, again, there is a better way. If the two μs are equal, then their difference is zero. Thus, the two-tailed null hypothesis is

$H_0\!: \mu_1 - \mu_2 = 0$

H0 implies that both sample means represent the same population of recall scores, which has the same μ, so no relationship is present. If our sample means differ, H0 maintains that this is due to sampling error in representing that one μ.

Notice that the above hypotheses do not contain a specific value of μ. Therefore, they are the two-tailed hypotheses for any dependent variable. However, this is true only when you test whether the data represent zero difference between the populations. This is the most common approach and the one we will use. (You can also test for nonzero differences: You might know of an existing difference between two populations and test if the independent variable alters that difference. Consult an advanced statistics book for the details of this test.)

As usual, we test the null hypothesis, and to do that we examine the sampling distribution.

9-2b The Sampling Distribution for the Independent-Samples t-Test

To understand the sampling distribution, let's say that we find a mean recall score of 20 in the no-hypnosis condition and a mean of 23 in the hypnosis condition. We summarize these results using the difference between our means. Here, changing from no hypnosis to hypnosis results in a difference in mean recall of 3 points.

sampling distribution of differences between means: Shows all differences between two means that occur when samples are drawn from the population of scores that H0 says we are representing.
We always test H0 by finding the probability of obtaining our results when no relationship is present, so here we will determine the probability of obtaining a difference of 3 between our X̄s when they actually represent zero difference in the population.

Think of the sampling distribution as being created in the following way. We select two random samples from one raw score population, compute the means, and arbitrarily subtract one from the other. (Essentially, this is what H0 implies that we did in our study, coincidentally labeling the sample containing higher scores as "Hypnosis.") When we arbitrarily subtract one mean from the other, the result is the difference between the means, symbolized by X̄1 − X̄2. If we do this an infinite number of times and plot the frequency distribution, we have the sampling distribution of differences between means. This is the distribution of all possible differences between two means when both samples come from the one raw score population that H0 says we are representing. You can envision this sampling distribution as in Figure 9.2.
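The infinite sampling described here can be approximated by simulation. In this sketch (illustrative only; the normal population and its parameter values are my own choices, not from the study), the differences between pairs of sample means pile up around zero, as Figure 9.2 shows:

    import numpy as np

    rng = np.random.default_rng(0)

    # One raw score population, as H0 implies (mu and sigma are arbitrary
    # illustrative choices, not values from the study).
    mu, sigma = 20, 3
    n1, n2 = 17, 15
    trials = 100_000

    x1 = rng.normal(mu, sigma, (trials, n1)).mean(axis=1)
    x2 = rng.normal(mu, sigma, (trials, n2)).mean(axis=1)
    diffs = x1 - x2                      # arbitrarily subtract one mean from the other

    print(diffs.mean())                  # near 0, the center of the distribution
    print((np.abs(diffs) >= 3).mean())   # differences as large as +-3 are rare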
Arbitrarily decide which condition will be Condition 1 and which will be Condition 2. Then, as in the previous chapter, you compute tobt by performing three steps: (1) estimating the variance of the raw score population, (2) computing the estimated stanThe independent-samples t-test dard error of the sampling distribution, and (3) computing tobt. determines the probability of obtaining our difference between Xs when H0 is true. 9-3 PERFORMING THE INDEPENDENT-SAMPLES t-TEST Before computing tobt we must first expand our set of symbols. Previously, N has been the number of scores in a sample, but actually N is the total number 144 STEP 1: Compute the mean and estimated population variance in each condition. Using the scores in Condition 1, compute X1 and s21; using the scores in Condition 2, compute X2 and s22. THE FORMULA FOR THE ESTIMATED VARIANCE IN A CONDITION IS (⌺X ) 2 n n⫺1 ⌺X 2 ⫺ s2x ⫽ Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. can determine how spread out the sampling distribution is by computing the standard error. N stands for the total number of scores in an experiment; n stands for the number of scores in a condition. STEP 2: Compute the pooled variance. Both s21 and s22 estimate the population variance, but each may contain sampling error. To obtain the best estimate, we compute an average of the two. Each variance is “weighted” based on the size of its sample. This weighted average is called the pooled variance, and its symbol s2pool. THE FORMULA FOR THE POOLED VARIANCE IS s2pool ⫽ (n1 ⫺ 1)s21 ⫹ (n2 ⫺ 1)s22 (n1 ⫺ 1) ⫹ (n2 ⫺ 1) STEP 3: Compute the standard error of the difference. The standard error of the difference is the standard deviation of the sampling distribution of differences between means (of the distribution back in Figure 9.2). The symbol for the standard error of the difference is sX–– ⴚX–– . 1 (17 ⫺ 1)9.0 ⫹ (15 ⫺ 1)7.5 (17 ⫺ 1) ⫹ (15 ⫺ 1) After subtracting, we have s2pool ⫽ (16)9.0 ⫹ (14)7.5 16 ⫹ 14 In the numerator, 16 times 9 is 144, and 14 times 7.5 is 105. In the denominator, 16 plus 14 is 30, so 144 ⫹ 105 249 s2pool ⫽ ⫽ ⫽ 8.3 30 30 The s2pool is our estimate of the variability in the raw score population that H0 says we are representing. As in previous procedures, once we know how spread out the underlying raw score population is, we standard error of the difference (sX––1ⴚX––2) The estimated standard deviation of the sampling distribution of differences between the means 2 THE FORMULA FOR THE STANDARD ERROR OF THE DIFFERENCE IS 1 s2pool ⫽ The weighted average of the sample variances in a twosample t-test In previous chapters we computed the standard error by dividing the variance by N and then taking the square root. However, instead of dividing by N we can multiply by 1/N. Then for the two-sample t-test, we substitute the pooled variance and our two ns, producing this formula: sX ⫺X ⫽ This says to determine n1 ⫺ 1 and multiply this by the s21 that you computed above. Likewise, find n2 ⫺ 1 and multiply it by s22. Add the results together and divide by the sum of (n1 ⫺ 1) ⫹ (n2 ⫺ 1). 
For example, say that the hypnosis study produced the results shown in Table 9.2. Filling in the formula, we have pooled variance (s2pool) 2 B 2 (spool )a 1 1 ⫹ b n1 n2 To compute sX ⫺X , first reduce the fraction 1/n1 and 1/n2 to decimals. Then add them together. Then multiply the sum times s2pool, which you computed in Step 2. Then find the square root. For the hypnosis study, s2pool is 8.3, n1 is 17, and n2 is 15. Filling in the formula gives 1 sX ⫺X ⫽ 1 2 B 2 (8.3)a 1 1 ⫹ b 17 15 Table 9.2 Data from the Hypnosis Study Condition 1: Hypnosis X1 ⫽ 23 Condition 2: No Hypnosis X 2 ⫽ 20 Number of participants n1 ⫽ 17 n2 ⫽ 15 Estimated variance s21 ⫽ 9.0 s22 ⫽ 7.5 Mean recall score Chapter 9: Hypothesis Testing Using the Two-Sample t-Test 145 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. First, 1/17 is .059 and 1/15 is .067, so > Quick Practice sX ⫺X ⫽ 28.3(.059 ⫹ .067) 1 2 To compute the independent-samples tobt: After adding, sX ⫺X ⫽ 28.3(.126) ⫽ 21.046 ⫽ 1.023 1 2 STEP 4: Compute tobt. In previous chapters we’ve calculated how far the result of our study (X) was from the mean of the sampling distribution (m) when measured in standard error units. Now the “result of our study” is the difference between our two sample means, which we symbolize as (X1 ⫺ X2). The mean of the sampling distribution is the difference between the ms described by H0 and is symbolized by (m1 – m2). Finally, our standard error is sX ⫺X . All together, we have 1 > > > > Compute X1, s21, and n1; X2, s22, and n2. Then compute the pooled variance (s2pool). Then compute the standard error of the difference (sX––1⫺X––2). Then compute tobt. More Examples An independent-samples study produced the following data: X1 ⫽ 27, s21 ⫽ 36, n1 ⫽ 11, X2 ⫽ 21, s22 ⫽ 33, and n2 ⫽ 11. 2 s2pool ⫽ THE FORMULA FOR THE INDEPENDENTSAMPLES t-TEST IS tobt ⫽ (X1 ⫺ X2) ⫺ (m1 ⫺ m2) sX ⫺X 1 sX ⫺X ⫽ 2 1 Here, X1 and X2 are the sample means, sX ⫺X is computed in Step 3, and the value of m1 ⫺ m2 is specified by the null hypothesis. We write H0 as m1 ⫺ m2 ⫽ 0 to indicate that the value of m1 ⫺ m2 to put into this formula is always zero. Then the formula measures how far our difference between Xs is from the zero difference between the ms that H0 says we are representing, when measured in standard error units. For the hypnosis study, our sample means are 23 and 20, the difference between m1 and m2 is 0, and sX ⫺X is 1.023. Putting these values into the formula gives 1 1 2 ⫽ 2 2 tobt ⫽ ⫽ (23 ⫺ 20) ⫺ 0 1.023 tobt ⫽ (n1 ⫺ 1)s21 ⫹ (n2 ⫺ 1)s22 (n1 ⫺ 1) ⫹ (n2 ⫺ 1) (10)36 ⫹ (10)33 ⫽ 34.5 10 ⫹ 10 B B s2pool a 1 1 ⫹ b n1 n2 34.5a 1 1 ⫹ b ⫽ 2.506 11 11 (X1 ⫺ X2) ⫺ (m1 ⫺ m2) sX ⫺X 1 2 (27 ⫺ 21) ⫺ 0 ⫽ ⫽ ⫹2.394 2.506 For Practice We find X1 ⫽ 33, s21 ⫽ 16, n1 ⫽ 21, X2 ⫽ 27, s22 ⫽ 13, and n2 ⫽ 21 1. Compute the pooled variance (s2pool). 2. Compute the standard error of the difference (sX ⫺X ). 1 2 3. Compute tobt. After subtracting the means: 1 2 2. sX ⫺X ⫽ 2 1. spool ⫽ B 14.5a 1 1 ⫹ b ⫽ 1.18 21 21 (20)16 ⫹ (20)13 ⫽ 14.5 20 ⫹ 20 146 (33 ⫺ 27) ⫺ 0 ⫽ ⫹5.08 1.18 Our tobt is ⫹2.93. 
Thus, the difference between our sample means is located at something like a z-score of +2.93 on the sampling distribution of differences when both samples represent the same population.

9-3a Interpreting the Independent-Samples t-Test

To determine if $t_{obt}$ is significant, we compare it to $t_{crit}$, which is found in the t-table (Table 2 in Appendix B). We again obtain $t_{crit}$ using degrees of freedom, but with two samples, the df is computed differently.

THE FORMULA FOR THE DEGREES OF FREEDOM FOR THE INDEPENDENT-SAMPLES t-TEST IS
$$df = (n_1 - 1) + (n_2 - 1)$$
where each n is the number of scores in a condition. Another way of expressing this is $df = (n_1 + n_2) - 2$.

For the hypnosis study, $n_1 = 17$ and $n_2 = 15$, so df equals $(17-1)+(15-1)$, which is 30. With alpha at .05, the two-tailed $t_{crit}$ is ±2.042.

[Figure 9.3: The H0 sampling distribution of differences between means when $\mu_1 - \mu_2 = 0$. The X axis shows values of $\overline{X}_1 - \overline{X}_2$ and the corresponding values of t, with $-t_{crit} = -2.042$, $+t_{crit} = +2.042$, and $+t_{obt} = +2.93$ marking the location of our difference of +3.0.]

The complete sampling distribution is in Figure 9.3. It shows all differences between sample means that occur through sampling error when the samples really represent no difference in the population (when hypnosis does not influence recall). Our $H_0$ says that the difference of +3 between our sample means is merely a poor representation of no difference. But the sampling distribution shows that a difference of +3 hardly ever occurs when the samples represent no difference. Therefore, it is difficult to believe that our difference of +3 represents no difference. In fact, the $t_{obt}$ lies beyond $t_{crit}$, so the results are significant: Our difference of +3 is so unlikely to occur if our samples are representing no difference in the population that we reject that this is what they represent. Thus, we reject $H_0$ and accept $H_a$: our data represent a difference between $\mu$s that is not zero.

We can summarize this by saying that our difference of +3 is significantly different from 0. Or we can say that our two means differ significantly from each other. The mean for hypnosis (23) is larger than the mean for no hypnosis (20), so we can also conclude that hypnosis leads to significantly higher recall scores. (As usual with α = .05, the probability of a Type I error is p < .05.)

If our $t_{obt}$ had not been beyond $t_{crit}$, then this would indicate that our difference between means occurs often when representing the situation where there is no difference in the population. Therefore, we would not reject $H_0$, and we would have no evidence for or against a relationship between hypnosis and recall. Then we would consider if our design had sufficient power so that we'd be confident we had not made a Type II error (retaining a false $H_0$).
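Instead of the printed t-table, the same two-tailed $t_{crit}$ can be obtained programmatically. A small sketch using SciPy's t distribution (SciPy is our illustration here; the book itself relies on the t-table and SPSS):

```python
from scipy import stats

alpha = 0.05
df = (17 - 1) + (15 - 1)                 # df = 30 for the hypnosis study

t_crit = stats.t.ppf(1 - alpha / 2, df)  # upper cutoff; the two-tailed region is +/- t_crit
print(round(t_crit, 3))                  # 2.042

t_obt = 2.93
print(abs(t_obt) > t_crit)               # True -> t_obt is beyond t_crit, so reject H0
```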
Because we did find a significant result, we describe and interpret the relationship. First, from our sample means, we expect the μ for no hypnosis to be around 20 and the μ for hypnosis to be around 23. To precisely describe these μs, we could compute a confidence interval for each μ, using the formula in the previous chapter and the data from one condition at a time. Alternatively, we could compute the confidence interval for the difference between two μs. In our study, the difference between our means was +3, so we expect the difference between the population μs for hypnosis and no hypnosis to be around +3. This confidence interval describes a range of differences between μs within which we are confident the actual difference between our μs falls. The computations for this confidence interval are presented in Appendix A.2.

Finally, remember that finding a significant result is not the end of the story. Now we become behavioral researchers again, interpreting and explaining the relationship in terms of our variables: What are the psychological or neurological mechanisms that resulted in improved recall after hypnosis? And note: An important piece of information for interpreting the influence of your independent variable is called the effect size, which is described in the final section of this chapter.

9-3b Performing One-Tailed Tests on Independent Samples

Recall that one-tailed tests are used only when we can confidently predict the direction the dependent scores will change. For example, we could have conducted the hypnosis study using a one-tailed test if we had reason to believe that hypnosis results in higher recall scores than no hypnosis. Everything discussed above applies here, but to prevent confusion, use more meaningful subscripts than 1 and 2. For example, use the subscript h for hypnosis and n for no-hypnosis. Then follow these steps (a short code sketch for locating the one-tailed $t_{crit}$ appears after these steps):

1. Decide which $\overline{X}$ and corresponding μ is expected to be larger. (We think the μ for hypnosis is larger.)
2. Arbitrarily decide which condition to subtract from the other. (We'll subtract no-hypnosis from hypnosis.)
3. Decide whether the difference will be positive or negative. (Subtracting what should be the smaller $\mu_n$ from the larger $\mu_h$ should produce a positive difference, one that's greater than zero.)
4. Create $H_a$ and $H_0$ to match this prediction. (Our $H_a$ is that $\mu_h - \mu_n > 0$; $H_0$ is that $\mu_h - \mu_n \le 0$.)
5. Locate the region of rejection based on your predictions and subtraction. (We expect a positive difference that is in the right-hand tail of the sampling distribution, so $t_{crit}$ is positive.)
6. Complete the t-test as we did previously. Be careful to subtract the $\overline{X}$s in the same way you subtracted the μs! (We used $\mu_h - \mu_n$, so we'd compute $\overline{X}_h - \overline{X}_n$.)

Confusion arises because, while still predicting a larger $\mu_h$, we could have reversed $H_a$, saying $\mu_n - \mu_h < 0$. Subtracting the larger $\mu_h$ from the smaller $\mu_n$ should produce a negative difference, so now the region of rejection is in the left-hand tail, and $t_{crit}$ is negative.
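For a one-tailed test the entire α sits in the predicted tail, so the cutoff moves. A brief sketch (again assuming SciPy, which the book does not use) showing both one-tailed critical values for df = 30:

```python
from scipy import stats

alpha, df = 0.05, 30

t_crit_pos = stats.t.ppf(1 - alpha, df)  # +1.697: predicting a positive difference (right tail)
t_crit_neg = stats.t.ppf(alpha, df)      # -1.697: predicting a negative difference (left tail)
print(round(t_crit_pos, 3), round(t_crit_neg, 3))
```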
9-3c Summary of the Independent-Samples t-Test

After checking that the study meets the assumptions, the independent-samples t-test involves the following:

1. Create either the two-tailed or the one-tailed $H_0$ and $H_a$.
2. Compute $t_{obt}$ by following these four steps:
   a. Compute $\overline{X}_1$, $s^2_1$, and $n_1$; $\overline{X}_2$, $s^2_2$, and $n_2$.
   b. Compute the pooled variance ($s^2_{pool}$).
   c. Compute the standard error of the difference ($s_{\overline{X}_1-\overline{X}_2}$).
   d. Compute $t_{obt}$.
3. Set up the sampling distribution: Find $t_{crit}$ in the t-table using $df = (n_1-1)+(n_2-1)$.
4. Compare $t_{obt}$ to $t_{crit}$. If $t_{obt}$ is beyond $t_{crit}$, the results are significant; describe the relationship. If $t_{obt}$ is not beyond $t_{crit}$, the results are not significant; make no conclusion about the relationship.
5. If the results are significant, compute the "effect size" as described in Section 9.7.

> Quick Practice

> Perform the independent-samples t-test in experiments that test two independent samples.

More Examples
We perform a two-tailed experiment, so $H_0$: $\mu_1 - \mu_2 = 0$ and $H_a$: $\mu_1 - \mu_2 \ne 0$. The data are X̄1 = 24, s²1 = 9, n1 = 14, X̄2 = 21, s²2 = 9.4, and n2 = 16. Then
$$s^2_{pool} = \frac{(13)9 + (15)9.4}{13+15} = 9.214$$
$$s_{\overline{X}_1-\overline{X}_2} = \sqrt{9.214\left(\frac{1}{14}+\frac{1}{16}\right)} = 1.111$$
$$t_{obt} = \frac{(24-21)-0}{1.111} = +2.70$$
With α = .05 and $df = (n_1-1)+(n_2-1) = 28$, $t_{crit} = \pm 2.048$. The $t_{obt}$ is significant: We expect $\mu_1$ to be 24 and $\mu_2$ to be 21.

For Practice
We test whether "cramming" for an exam is harmful to grades. Condition 1 crams for a pretend exam, but Condition 2 does not. Each n = 31, the cramming X̄ is 43 (s² = 64), and the no-cramming X̄ is 48 (s² = 83.6).
1. Subtracting cramming from no cramming, what are $H_0$ and $H_a$?
2. Will $t_{crit}$ be positive or negative?
3. Compute $t_{obt}$.
4. What do you conclude about this relationship?

> Answers
1. $H_a$: $\mu_{nc} - \mu_c > 0$; $H_0$: $\mu_{nc} - \mu_c \le 0$
2. Positive
3. $s^2_{pool} = (1920 + 2508)/60 = 73.80$; $s_{\overline{X}_1-\overline{X}_2} = \sqrt{73.80(.065)} = 2.190$; $t_{obt} = (5)/2.190 = +2.28$
4. With α = .05 and df = 60, $t_{crit} = +1.671$; $t_{obt}$ is significant: $\mu_c$ is around 43 and $\mu_{nc}$ is around 48.

9-4 THE RELATED-SAMPLES t-TEST

Now we will discuss the other way to analyze the results of a two-sample experiment. The related-samples t-test is the parametric procedure used with two related samples. Related samples occur when we pair each score in one sample with a particular score in the other sample. Researchers create related samples to have more equivalent and thus comparable samples. The two types of research designs that produce related samples are matched-samples designs and repeated-measures designs.

In a matched-samples design, the researcher matches each participant in one condition with a particular participant in the other condition. The matching is based on a variable relevant to the behavior being studied, but not the independent or dependent variable. For example, say we are studying some aspect of playing basketball, so we might match participants on the variable of their height. We would select pairs of people who are the same height and assign one member of the pair to each condition.
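SciPy can also run the entire independent-samples test from summary statistics like those in the More Examples box above. A sketch (note that ttest_ind_from_stats expects standard deviations, not variances, and returns a two-tailed p value rather than a $t_{crit}$ comparison):

```python
import math
from scipy import stats

# The "More Examples" data above; the hand computation gave t_obt = +2.70 with df = 28.
result = stats.ttest_ind_from_stats(
    mean1=24, std1=math.sqrt(9.0), nobs1=14,
    mean2=21, std2=math.sqrt(9.4), nobs2=16,
    equal_var=True,                    # pool the variances, as in the formulas above
)
print(round(result.statistic, 2))      # 2.7
print(result.pvalue < 0.05)            # True -> significant at alpha = .05
```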
Thus, if two people are 6 feet tall, one is assigned to one condition and the other to the other condition. Likewise, a 4-foot person in one condition is matched with a 4-footer in the other condition, and so on. Then, overall, the conditions are comparable in height, so we'd proceed with the experiment. In the same way, we might match participants using their age or physical ability, or we might use naturally occurring pairs, such as roommates or identical twins.

The other way to create related samples is with a repeated-measures design, in which each participant is tested under all conditions of the independent variable. That is, first participants are tested under Condition 1, and then the same participants are tested under Condition 2. Although we have one sample of participants, we have two samples of scores. Thus, related samples are created either by matching each participant in one condition with a participant in the other condition or by repeatedly measuring the same participants under all conditions.

Matched-samples and repeated-measures designs are analyzed in the same way, using the related-samples t-test. (Related samples are also called dependent samples.) Except for requiring related samples, the assumptions for this t-test are the same as for the independent-samples t-test: (1) the dependent variable involves normally distributed interval or ratio scores, and (2) the populations being represented have homogeneous variance. Because related samples form pairs of scores, the ns in the two samples must be equal.

9-4a The Logic of the Related-Samples t-Test

Let's say that we have a new therapy to test on arachnophobes, people who are overly frightened by spiders. From the local phobia club we randomly select the unpowerful N of five participants and test our therapy using repeated measures of two conditions: before therapy and after therapy. Before therapy we measure each person's fear response to a picture of a spider, measuring heart rate, perspiration, etc., and compute a "fear" score between 0 (no fear) and 20 (holy terror!). After providing the therapy, we again measure each person's fear response to the picture.
(A before-and-after, or pretest/posttest, design such as this always uses the related-samples t-test.)

The left side of Table 9.3 shows the fear scores from the two conditions. First, we compute the mean of each condition. Before therapy the mean fear score is 14.80, and after therapy the mean is 11.20. It looks as if therapy reduces fear scores by an average of 14.80 − 11.20 = 3.6 points.

Table 9.3  Finding the Difference Scores in the Phobia Study (each D = Before − After)

Participant    Before Therapy    After Therapy       D      D²
1 (Millie)          11                 8             +3       9
2 (Archie)          16                11             +5      25
3 (Jerome)          20                15             +5      25
4 (Althea)          17                11             +6      36
5 (Leon)            10                11             −1       1
N = 5           X̄ = 14.80         X̄ = 11.20      ΣD = +18  ΣD² = 96
                                                  D̄ = +3.6

But, here we go again! On the one hand, maybe we are accurately representing that the therapy works in nature: If we tested all such participants before and after therapy, we would have two populations of fear scores having different μs. On the other hand, maybe we are inaccurately representing that the therapy does nothing to fear scores: If we tested everyone before and after therapy, each time we would find the same population of fear scores with the same μ.

But, if we are testing the same people and our therapy does nothing, why do our samples have different before and after scores that produce different means? Because people are seldom perfectly consistent. Through random psychological and physiological fluctuations, anyone's performance on a task may change from moment to moment. So, perhaps by luck, our participants were having a particularly bad, scary day when they were tested before therapy, but were having a good, not-so-scary day when tested after therapy. This would give the appearance that the therapy reduces fear scores, because one or both measurements are in error when we sample participants' behaviors. So, maybe the therapy does nothing, but we have sampling error in representing this, and we obtained different means simply through the luck of the draw of when we happened to test participants.

To resolve this issue, we perform the t-test. However, for advanced statistical reasons, we cannot directly create a sampling distribution for related samples the way we did for independent samples. Instead, we must first transform the raw scores and then perform the t-test on the transformed scores. As in the right side of Table 9.3, we transform the data by finding the difference between the two fear scores for each participant. Thus, we subtract Millie's after score (8) from her before score (11) for a difference of +3; we subtract Archie's after score (11) from his before score (16) for a difference of +5; and so on. Notice that the symbol for a difference score is D. Here we arbitrarily subtracted after-therapy from before-therapy. You could subtract in the opposite direction; just be sure to subtract all scores in the same direction. If this were a matched-samples design, we'd subtract the scores in each pair of matched participants.

Next, compute the mean difference, symbolized as $\overline{D}$. Add the positive and negative differences to find the sum of the differences, symbolized by $\Sigma D$. Then divide by N, the number of difference scores. In Table 9.3, $\overline{D} = 18/5 = +3.6$: The before scores were, on average, 3.6 points higher than the after scores. Notice this is the same difference we found when we subtracted the means of the original fear scores. (As in the far right-hand column of Table 9.3, later we'll need to square each difference and then find the sum, finding $\Sigma D^2$.)
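Turning the raw scores of Table 9.3 into difference scores takes only a line or two. A minimal Python sketch (variable names are ours) that reproduces ΣD = +18, ΣD² = 96, and $\overline{D}$ = +3.6:

```python
before = [11, 16, 20, 17, 10]   # fear scores before therapy (Table 9.3)
after  = [ 8, 11, 15, 11, 11]   # fear scores after therapy

# Subtract every pair in the same direction: D = before - after.
D = [b - a for b, a in zip(before, after)]
print(D)                         # [3, 5, 5, 6, -1]
print(sum(D))                    # 18  (sum of the Ds)
print(sum(d ** 2 for d in D))    # 96  (sum of the squared Ds, needed for the variance)
print(sum(D) / len(D))           # 3.6 (the mean difference, D-bar)
```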
Finally, here's a surprise: Because now we have one sample mean from one sample of scores, we perform the one-sample t-test! The fact that we have difference scores is irrelevant, so we create the statistical hypotheses and test them in virtually the same way that we did with the one-sample t-test in the previous chapter.

The related-samples t-test is performed by applying the one-sample t-test to the difference scores.

9-4b Statistical Hypotheses for the Related-Samples t-Test

Our sample of difference scores represents the population of difference scores that would result if we measured everyone in the population before therapy and again after therapy, and then computed their difference scores. This population of difference scores has a μ that we identify as $\mu_D$. To create the statistical hypotheses, we determine the predicted values of $\mu_D$ in $H_0$ and $H_a$.

Let's first perform a two-tailed test, predicting that the therapy either raises or lowers fear scores. The $H_0$ always says our independent variable does not work as predicted, so here it is as if we had provided no therapy and simply measured everyone's fear score twice. Ideally, everyone should have the same score on both occasions, so everyone in the population should have a D of zero. However, this will not occur, because of those random fluctuations that cause participants to behave inconsistently. Implicitly, $H_0$ says everyone has good days and bad days so, by chance, each person may exhibit higher or lower fear scores at different times. This produces a variety of Ds that are sometimes positive numbers and sometimes negative numbers, creating the population of different Ds shown in the top portion of Figure 9.4. Notice that larger positive or negative Ds occur less frequently. Because chance produces the positive and negative Ds, over the long run they should balance out so that the average D in this population ($\mu_D$) is zero. And, because these Ds are generated from the situation where the therapy does not work, this is the population that $H_0$ says our sample of Ds represents. Thus, we have:

$$H_0: \mu_D = 0$$

This implies that our $\overline{D}$ represents a $\mu_D$ of 0. If $\overline{D}$ does not equal 0, it is because of sampling error in representing this population. (Likewise, in a matched-pairs design, each pair of individuals would not perform identically, so $H_0$ would still say we have a population of Ds with $\mu_D = 0$.)

For the alternative hypothesis, if the therapy alters fear scores in the population, then either the before scores or the after scores will be consistently higher. Then, after subtracting them, the population of Ds will tend to contain only positive or only negative scores. Therefore, $\mu_D$ will be a positive or a negative number and not zero. So, the alternative hypothesis is:

$$H_a: \mu_D \ne 0$$

As usual, we test $H_0$ by examining the sampling distribution, which here is called the sampling distribution of mean differences. It is shown in the bottom portion of Figure 9.4.
The underlying raw score population used to create the sampling distribution is the population of Ds in the top portion of Figure 9.4 that $H_0$ says we are representing. As usual, it is as if we have infinitely sampled this population using our N, and each time computed $\overline{D}$. Thus, the sampling distribution of mean differences shows all possible values of $\overline{D}$ that occur when samples are drawn from the population of difference scores that $H_0$ says we are representing. (The mean difference, the mean of the differences between the paired scores, is symbolized as $\overline{D}$ in the sample and $\mu_D$ in the population.)

[Figure 9.4: The population of difference scores described by H0 (top) and the resulting sampling distribution of mean differences (bottom). Both distributions are centered at $\mu_D = 0$, with larger negative Ds (or $\overline{D}$s) in the left tail and larger positive ones in the right tail.]

Remember, these $\overline{D}$s are produced when our two samples of raw scores represent no relationship in the population. So, for the phobia study, the sampling distribution essentially shows all values of $\overline{D}$ we might get by chance when the therapy does not work. Because larger positive or negative $\overline{D}$s occur less frequently, the $\overline{D}$s that are farther into the tails of the distribution are less likely to occur when $H_0$ is true and the therapy does not work.

Notice that the hypotheses $H_0$: $\mu_D = 0$ and $H_a$: $\mu_D \ne 0$ and the above sampling distribution are appropriate for the two-tailed test for any dependent variable when you test whether there is zero difference between your conditions. This is the most common approach and the one we'll discuss. (You can instead test whether you've altered a nonzero difference; consult an advanced statistics book for details.) We test $H_0$ by determining where on the above sampling distribution our $\overline{D}$ is located. To do that, we compute $t_{obt}$.

9-5 PERFORMING THE RELATED-SAMPLES t-TEST

Computing $t_{obt}$ here is identical to computing the one-sample t-test discussed in Chapter 8; only the symbols have been changed, from $\overline{X}$ to $\overline{D}$. There, we first computed the estimated population variance, then the standard error of the mean, and then $t_{obt}$. We perform the same three steps here.

STEP 1: Compute $s^2_D$, which is the estimated variance of the population of difference scores shown at the top of Figure 9.4. Replacing the Xs in our previous formula for variance with Ds gives

THE FORMULA FOR THE ESTIMATED VARIANCE OF THE DIFFERENCE SCORES IS
$$s^2_D = \frac{\Sigma D^2 - \dfrac{(\Sigma D)^2}{N}}{N-1}$$

Note: For all computations in this t-test, N equals the number of difference scores. Using the phobia data from Table 9.3, we have
$$s^2_D = \frac{96 - \dfrac{(18)^2}{5}}{4} = 7.8$$
STEP 2: Compute the standard error of the mean difference. This is the standard deviation of the sampling distribution of $\overline{D}$. Its symbol is $s_{\overline{D}}$.

THE FORMULA FOR THE STANDARD ERROR OF THE MEAN DIFFERENCE IS
$$s_{\overline{D}} = \sqrt{\frac{s^2_D}{N}}$$

For the phobia study, $s^2_D = 7.8$ and N = 5, so
$$s_{\overline{D}} = \sqrt{\frac{7.8}{5}} = \sqrt{1.56} = 1.249$$

STEP 3: Find $t_{obt}$.

THE FORMULA FOR THE RELATED-SAMPLES t-TEST IS
$$t_{obt} = \frac{\overline{D} - \mu_D}{s_{\overline{D}}}$$

In the formula, $\overline{D}$ is the mean of your difference scores, $s_{\overline{D}}$ is computed as above, and $\mu_D$ is the value given in $H_0$. (It is always 0 unless you are testing for a nonzero difference.) For the phobia study, $\overline{D}$ is +3.6, $s_{\overline{D}}$ is 1.249, and $\mu_D$ equals 0, so
$$t_{obt} = \frac{\overline{D} - \mu_D}{s_{\overline{D}}} = \frac{+3.6 - 0}{1.249} = +2.88$$

So, $t_{obt} = +2.88$.

9-5a Interpreting the Related-Samples t-Test

Interpret $t_{obt}$ by comparing it to $t_{crit}$ from the t-table in Appendix B.

THE FORMULA FOR THE DEGREES OF FREEDOM FOR THE RELATED-SAMPLES t-TEST IS
$$df = N - 1$$
where N is the number of difference scores.

For the phobia study, with α = .05 and df = 4, the $t_{crit}$ is ±2.776. The complete sampling distribution is shown in Figure 9.5.

[Figure 9.5: The two-tailed sampling distribution of $\overline{D}$s when $\mu_D = 0$, with $-t_{crit} = -2.776$, $+t_{crit} = +2.776$, and $+t_{obt} = +2.88$ (at $\overline{D} = +3.6$) marked.]

Remember, Figure 9.5 shows the distribution of all $\overline{D}$s that occur when we are representing a population of Ds where $\mu_D$ is 0 (when our therapy does not work). But, we see that a $\overline{D}$ like ours hardly ever occurs when the sample represents this population. In fact, our $t_{obt}$ lies beyond $t_{crit}$, so we conclude that our $\overline{D}$ of +3.6 is unlikely to represent the population of Ds where $\mu_D = 0$. Therefore, the results are significant: We reject $H_0$ and accept $H_a$, concluding that the sample represents a $\mu_D$ around +3.6. (As usual, with α = .05, the probability of a Type I error here is p < .05.)

Now we work backward to our original fear scores. Recall that our $\overline{D}$ of +3.6 is equal to the difference between the original mean fear score for before therapy ($\overline{X} = 14.80$) and the mean fear score for after therapy ($\overline{X} = 11.20$). According to $H_0$, this difference is due to sampling error, and we are really representing zero difference in the population. However, using $\overline{D}$ we have determined that +3.6 is significantly different from zero: Our data are unlikely to poorly represent zero difference in the population. Therefore, it is also unlikely that the original means of our fear scores poorly represent zero difference.
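The three steps, plus the df = N − 1 significance check, can be verified in a few lines. A sketch reproducing the phobia numbers, with SciPy's ttest_rel (our cross-check, not part of the book's procedure) running the same paired test directly from the raw scores:

```python
import math
from scipy import stats

before = [11, 16, 20, 17, 10]   # Table 9.3 fear scores
after  = [ 8, 11, 15, 11, 11]

D = [b - a for b, a in zip(before, after)]
N = len(D)

D_bar  = sum(D) / N                                        # +3.6
s2_D   = (sum(d**2 for d in D) - sum(D)**2 / N) / (N - 1)  # Step 1: 7.8
s_Dbar = math.sqrt(s2_D / N)                               # Step 2: 1.249
t_obt  = (D_bar - 0) / s_Dbar                              # Step 3: 2.88
print(round(s2_D, 1), round(s_Dbar, 3), round(t_obt, 2))

# Cross-check with SciPy's paired t-test, which performs the same computation.
result = stats.ttest_rel(before, after)
print(round(result.statistic, 2))  # 2.88
print(result.pvalue < 0.05)        # True -> significant with df = N - 1 = 4
```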
Thus, we conclude that the means of 14.80 and 11.20 differ significantly from each other and are unlikely to represent the same population of fear scores. Instead, the therapy appears to work, with the data representing a relationship in the population such that fear scores change from a μ around 14.80 before therapy to a μ around 11.20 after therapy.

If the related-samples $t_{obt}$ is significant, then the original raw score means differ significantly from each other.

As usual, now we describe and interpret this relationship. Again, a helpful statistic for doing this is the effect size, as described in Section 9.7. Also, it would be useful to compute a confidence interval to better estimate the μ of the fear scores for each condition. However, we cannot do that! The confidence interval for μ requires independent samples, which we do not have. We can, however, compute a confidence interval for $\mu_D$. For example, with our $\overline{D}$ of +3.6, we assume that if we measured the entire population before and after therapy, the resulting population of Ds would have a $\mu_D$ around +3.6. The confidence interval provides a range of values around +3.6, within which we are confident the actual $\mu_D$ falls. The computations for this confidence interval are presented in Appendix A.2.

If $t_{obt}$ had not been beyond $t_{crit}$, the results would not be significant and we would make no conclusions about whether our therapy influences fear scores (and we'd again consider our power).

9-5b One-Tailed Hypotheses with the Related-Samples t-Test

As usual, we perform a one-tailed test only when we can confidently predict the direction of the difference between the two conditions. Realistically, in the phobia study, we would predict we'd find lower scores in the after-therapy condition. Then to create $H_a$, we first arbitrarily decide which condition to subtract from which and what the differences should be. We subtracted after from before, so lower after-therapy scores would produce positive differences. Then our $\overline{D}$ should be positive, representing a positive $\mu_D$. Therefore, we have:

$$H_a: \mu_D > 0$$

Conversely, $H_0$ always implies the independent variable doesn't work, so here it would be that we have higher or unchanged fear scores after therapy. This would produce Ds that are either negative or at zero, respectively. Therefore, we have

$$H_0: \mu_D \le 0$$

We again examine the sampling distribution that occurs when $\mu_D = 0$, as in Figure 9.5, except we use only one tail. We are predicting a positive $\overline{D}$, which will be on the right-hand side of the distribution, so the region of rejection is in only the upper tail, and the one-tailed $t_{crit}$ (from the t-table) is positive. Had we predicted higher after scores, then by subtracting after from before, the Ds and their $\overline{D}$ should be negative, representing a negative $\mu_D$. Then $H_a$: $\mu_D < 0$ and $H_0$: $\mu_D \ge 0$.
Now the region of rejection is in the lower tail and $t_{crit}$ is negative. For either test, compute $t_{obt}$ and find $t_{crit}$ as we did previously. Be sure you subtract to get your Ds in the same way as when you created your hypotheses.

9-5c Summary of the Related-Samples t-Test

After checking that the design is matched-samples or repeated-measures and meets the assumptions, the related-samples t-test involves the following:

1. Create either the two-tailed or one-tailed $H_0$ and $H_a$.
2. Compute $t_{obt}$:
   a. Compute the difference score for each pair of scores.
   b. Compute $\overline{D}$ and $s^2_D$.
   c. Compute $s_{\overline{D}}$.
   d. Compute $t_{obt}$.
3. Create the sampling distribution and, using df = N − 1, find $t_{crit}$ in the t-table.
4. Compare $t_{obt}$ to $t_{crit}$. If $t_{obt}$ is beyond $t_{crit}$, the results are significant; describe the populations of raw scores and interpret the relationship. If $t_{obt}$ is not beyond $t_{crit}$, the results are not significant; make no conclusion about the relationship.
5. If the results are significant, compute the "effect size" as described in Section 9.7.

> Quick Practice

> Perform the related-samples t-test with a matched-samples or repeated-measures design.

More Examples
In a two-tailed study, we compare husband-and-wife pairs, with $H_0$: $\mu_D = 0$ and $H_a$: $\mu_D \ne 0$. Subtracting wife − husband produces

Wife       4    5    3    5     X̄ = 4.25
Husband    6    8    9    8     X̄ = 7.75
D         −2   −3   −6   −3     D̄ = −3.5

$$\overline{D} = -14/4 = -3.5$$
$$s^2_D = \frac{58 - \dfrac{(-14)^2}{4}}{3} = 3$$
$$s_{\overline{D}} = \sqrt{3/4} = .866$$
$$t_{obt} = \frac{-3.5 - 0}{.866} = -4.04$$

With α = .05 and df = 3, $t_{crit}$ is ±3.182. The $t_{obt}$ is significant. For wives, we expect μ is 4.25, and for husbands, we expect μ is 7.75.

For Practice
A two-tailed study tests the same participants in both Conditions A and B, with these data:

A    8   10   9   8   11
B    7    5   6   5    6

1. This way of producing related samples is called a ______ design.
2. What are $H_0$ and $H_a$?
3. Subtracting A − B, perform the t-test.
4. Subtracting A − B, what are $H_0$ and $H_a$ if we predicted that B would produce lower scores?

> Answers
1. repeated-measures
2. $H_0$: $\mu_D = 0$; $H_a$: $\mu_D \ne 0$
3. $\overline{D} = 17/5 = +3.4$; $s^2_D = 2.8$; $s_{\overline{D}} = \sqrt{2.8/5} = .748$; $t_{obt} = (3.4 - 0)/.748 = +4.55$. With α = .05, $t_{crit} = \pm 2.776$ and $t_{obt}$ is significant.
4. $H_0$: $\mu_D \le 0$; $H_a$: $\mu_D > 0$
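The For Practice data can be checked the same way. A brief sketch using scipy.stats.ttest_rel (our cross-check, not part of the book's procedure); note that full precision gives $t_{obt}$ ≈ 4.54, while the answer's +4.55 comes from rounding $s_{\overline{D}}$ to .748:

```python
from scipy import stats

A = [8, 10, 9, 8, 11]
B = [7,  5, 6, 5,  6]

result = stats.ttest_rel(A, B)     # paired t-test on the A - B differences, df = 4
print(round(result.statistic, 2))  # 4.54 (the answer's +4.55 reflects rounding)
print(result.pvalue < 0.05)        # True -> significant, matching t_crit = +/-2.776
```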
9-6 STATISTICS IN THE RESEARCH LITERATURE: REPORTING A TWO-SAMPLE STUDY

Report the results of an independent- or related-samples t-test using the same format as in previous chapters. For example, in our hypnosis study, the $t_{obt}$ of +2.93 was significant with 30 df, so we report t(30) = +2.93, p < .05. As usual, df is in parentheses, and because α = .05, the probability is less than .05 that we've made a Type I error. Also, as described in the next section, you should include a measure of "effect size" for any significant result. In fact, the American Psychological Association requires published research to report effect size.

In addition, as in Chapter 3, we report the mean and standard deviation from each condition. Also, with two or more conditions, researchers often include graphs of their results. Recall that we graph the results of an experiment by plotting the mean of each condition on the Y axis and the conditions of the independent variable on the X axis. Note: In a related-samples study, report the means and standard deviations of the original raw scores, not the Ds, and graph these means.

9-7 DESCRIBING EFFECT SIZE

An important statistic for describing a significant relationship is a measure of effect size. The "effect" is the influence that the independent variable had on dependent scores. Effect size indicates the amount of influence that changing the conditions of the independent variable had on dependent scores. For example, the extent to which changing the amount of hypnosis produced differences in recall scores is the effect size of hypnosis.

We want to identify those variables that most influence a behavior, so the larger the effect size, the more scientifically important the independent variable is. But! Remember that "significant" does not mean "important"; it means only that the sample relationship is "believable" because it is unlikely to be due to sampling error. Although a relationship must be significant to be potentially important, it can be significant and still be unimportant. We have two methods for measuring effect size. The first is to compute Cohen's d.

9-7a Effect Size Using Cohen's d

One approach for describing effect size is in terms of how big the difference is between the means of the conditions. For example, the presence/absence of hypnosis produced a difference between the means of 3, so this is the size of its effect. However, we don't know if 3 should be considered a large amount or not. To decide this, we need a frame of reference, so we also consider the estimated population standard deviation. Recall that a standard deviation reflects the "average" amount that scores differ. Thus, if individual scores differ by an "average" of, say, 30, then large differences between scores frequently occur, so a difference of 3 between their means is not all that impressive. However, if scores differ by an "average" of, say, only 5, then a difference between their means of 3 is more impressive. Cohen's d measures effect size by describing the size of the difference between the means relative to the population standard deviation. We have two versions of how it is computed, depending on which two-sample t-test we have performed.

THE FORMULAS FOR COHEN'S d ARE

Independent-samples t-test:
$$d = \frac{\overline{X}_1 - \overline{X}_2}{\sqrt{s^2_{pool}}}$$

Related-samples t-test:
$$d = \frac{\overline{D}}{\sqrt{s^2_D}}$$

In the formula for the independent-samples t-test, the difference between the conditions is measured as $\overline{X}_1 - \overline{X}_2$, and the standard deviation comes from the square root of the pooled variance. For our hypnosis study, the means were 23 and 20, and $s^2_{pool}$ was 8.3, so
$$d = \frac{\overline{X}_1 - \overline{X}_2}{\sqrt{s^2_{pool}}} = \frac{23 - 20}{\sqrt{8.3}} = \frac{+3}{2.88} = 1.04$$

This tells us that the effect of changing our conditions was to change scores by an amount that is slightly larger than 1 standard deviation.

In the formula for the related-samples t-test, the difference between the conditions is measured by $\overline{D}$, and the standard deviation comes from finding the square root of the estimated variance ($s^2_D$). In our phobia study, $\overline{D} = +3.6$ and $s^2_D = 7.8$, so
$$d = \frac{\overline{D}}{\sqrt{s^2_D}} = \frac{+3.6}{\sqrt{7.8}} = \frac{+3.6}{2.79} = 1.29$$

Thus, the effect size of the therapy was 1.29. The larger the effect size, the greater the influence that an independent variable has on dependent scores, and thus the more important the variable is.

We can interpret the above ds in two ways. First, the larger the absolute size of d, the larger the impact of the independent variable. In fact, Cohen¹ proposed the following guidelines:

Values of d    Interpretation of Effect Size
d = .2         small effect
d = .5         medium effect
d = .8         large effect

Thus, in our previous examples we found two very large effects. Second, we can compare the size of different ds to determine the relative impact of different independent variables. In our previous examples, the d for hypnosis was 1.04, but for therapy it was 1.29. Therefore, in the respective studies, the therapy manipulation had a slightly larger impact on the dependent variable.

¹ Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
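Both versions of d are simple ratios, so they are easy to script. A minimal sketch (function names are ours) reproducing the two effect sizes above:

```python
import math

def cohens_d_independent(mean1, mean2, s2_pool):
    """d for the independent-samples t-test: mean difference over the pooled SD."""
    return (mean1 - mean2) / math.sqrt(s2_pool)

def cohens_d_related(D_bar, s2_D):
    """d for the related-samples t-test: mean difference score over the SD of the Ds."""
    return D_bar / math.sqrt(s2_D)

print(round(cohens_d_independent(23, 20, 8.3), 2))  # 1.04  (hypnosis study)
print(round(cohens_d_related(3.6, 7.8), 2))         # 1.29  (phobia study)
```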
The other way to measure effect size is by computing the proportion of variance accounted for.

9-7b Effect Size Using Proportion of Variance Accounted For

Instead of measuring effect size in terms of the size of the changes in scores as above, we can also determine how consistently the scores change. Here, an independent variable has a greater impact the more it controls behavior: It produces one set of similar behaviors and scores for everyone in one condition, while producing a different set of similar behaviors and scores for everyone in a different condition. A variable is more minor, however, when it exhibits less control over behaviors and scores. In statistical terminology, when we describe the consistency among scores produced by the conditions, we are describing the proportion of variance accounted for. To see how this works, here are some possible fear scores from our phobia study:

Before Therapy    After Therapy
     10                 5
     11                 6
     12                 7

Each after-therapy score is 5 points lower than the corresponding before-therapy score. These differences among the scores can be attributed to changing the conditions of our independent variable. However, we also see differences among scores within each condition: In the before scores, for example, one participant had a 10 while someone else had an 11. These differences cannot be attributed to changing the independent variable. Thus, out of all the differences among these six scores, some differences seem to have been produced by changing the independent variable while others were not. In other words, some proportion of all differences among the scores can be attributed to changing our conditions. By attributing these differences to changing the conditions, we explain their cause, or account for them. And finally, recall that one way to measure differences among scores is to measure their variance.
So, altogether, we are describing "the proportion of variance accounted for." In an experiment, the proportion of variance accounted for is the proportion of all differences in dependent scores that can be attributed to changing our conditions. The larger the proportion, the more the differences in scores seem to be consistently caused by changing the independent variable, so the larger the variable's effect in determining scores.

However, adjust your expectations about what is "large." Any behavior is influenced by many variables, so one variable by itself will have a modest effect. Therefore, here is a rough guide of what to expect in real research: An effect size less than .09 is considered small; between about .10 and .25 it is considered moderate and is relatively common; and above .25 it is considered large and is rare.

In the next chapter, we will see that the proportion of variance accounted for depends on the consistency of the relationship. There, we will also discuss the statistic for summarizing a relationship called the correlation coefficient. It turns out that the mathematical steps needed to determine the proportion of variance accounted for are accomplished by computing the appropriate correlation coefficient and then squaring it. In the two-sample experiment, we describe the proportion of variance accounted for by computing the squared point-biserial correlation coefficient. Its symbol is $r^2_{pb}$, and it indicates the proportion of variance in dependent scores that is accounted for by the independent variable in a two-sample experiment. It also turns out that these steps are largely accomplished by computing $t_{obt}$. So,

THE FORMULA FOR $r^2_{pb}$ IS
$$r^2_{pb} = \frac{(t_{obt})^2}{(t_{obt})^2 + df}$$

This can be used with either the independent-samples or the related-samples t-test. In the numerator, square $t_{obt}$. In the denominator, add $(t_{obt})^2$ to the df from the study. For independent samples, $df = (n_1-1)+(n_2-1)$, but for related samples, $df = N - 1$.

In our hypnosis study, $t_{obt} = +2.93$ with df = 30. So
$$r^2_{pb} = \frac{(2.93)^2}{(2.93)^2 + 30} = \frac{8.585}{38.585} = .22$$

Thus, .22, or 22%, of the differences in our recall scores are accounted for by changing our hypnosis conditions. You can understand why this is a moderate effect by considering that if 22% of the differences in scores are due to changing our conditions, then 78% of the differences are due to something else (perhaps motivation or the participants' IQ played a role). Therefore, hypnosis is only one of a number of variables that influence memory here, and thus it is only somewhat important in determining scores.

On the other hand, in the phobia study $t_{obt} = +2.88$ and df = 4, so
$$r^2_{pb} = \frac{(2.88)^2}{(2.88)^2 + 4} = .68$$

This indicates that 68% of all differences in our fear scores are associated with before- or after-therapy. Therefore, our therapy had an extremely large effect size and is an important variable in determining fear scores.
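The conversion from $t_{obt}$ to $r^2_{pb}$ is one line of arithmetic. A minimal sketch reproducing both effect sizes (at full precision the phobia value rounds to .67; the text's .68 reflects rounding):

```python
def r2_pb(t_obt, df):
    """Squared point-biserial correlation: proportion of variance accounted for."""
    return t_obt**2 / (t_obt**2 + df)

print(round(r2_pb(2.93, 30), 2))  # 0.22 (hypnosis: moderate effect)
print(round(r2_pb(2.88, 4), 2))   # 0.67 (phobia therapy: very large; .68 in the text)
```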
Further, we use effect size to compare the importance of different independent variables: The therapy accounts for 68% of the variance in fear scores, but hypnosis accounts for only 22% of the variance in recall scores. Therefore, in the respective relationships, the therapy variable had a much larger effect, so it is scientifically more important for understanding the relationship and the behavior.

Effect size as measured by the proportion of variance accounted for indicates how consistently an independent variable influences dependent scores.

USING SPSS

As described on Review Card 9.4, SPSS performs the independent-samples or the related-samples t-test, including computing the mean and estimated population variance for each condition and the two-tailed α at which $t_{obt}$ is significant. The program also computes the confidence interval for the difference between two μs for the independent-samples test and the confidence interval for $\mu_D$ for the related-samples test. It does not compute measures of effect size.

Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com.

STUDY PROBLEMS

(Answers for odd-numbered problems are in Appendix C.)

1. (a) With what design do you perform the independent-samples t-test? (b) With what design do you perform the related-samples t-test?
2. (a) How do you identify the independent variable in an experiment? (b) How do you identify the dependent variable in an experiment? (c) What is the goal of an experiment?
3. What are the two explanations for obtaining a different mean in each condition of a two-sample experiment?
4. (a) How do you create independent samples? (b) What is the difference between n and N?
5. Explain the two ways to create related samples.
6. In addition to requiring independent or related samples, the two-sample t-test has what other assumptions?
7. (a) What does the sampling distribution of the differences between means show? (b) What information does the standard error of the difference provide? (c) What does being in the region of rejection of a sampling distribution indicate?
8. What does "homogeneity of variance" mean?
9. (a) What does the sampling distribution of mean differences show? (b) What information does the standard error of the mean difference provide? (c) What does being in the region of rejection of this sampling distribution indicate?
10. (a) What does effect size indicate? (b) What does d indicate? (c) What does the proportion of variance accounted for indicate? (d) What statistic describes this proportion in the two-sample t-test?
11. (a) What is your final task after finding a significant result? (b) Why is effect size useful at this stage?
12. For the following, which type of t-test is required? (a) Studying the effects of a memory drug on Alzheimer's patients by testing a group of patients before and after administration of the drug. (b) Studying whether men and women rate the persuasiveness of an argument delivered by a female speaker differently. (c) The study described in part b, but with the added requirement that for each man of a particular age, there is a woman of the same age.
13. We solicit two groups of professors at our school: One group agrees to check their e-mail once per day; the other group can check it as often as they wish. After two weeks, we give everyone a standard test of productivity to determine who has the higher (more productive) scores. The productivity scores from the two independent samples are

Once:  X̄ = 43, s²x = 22.79, n = 15
Often: X̄ = 39, s²x = 24.6,  n = 15

(a) What are H0 and Ha? (b) Compute tobt. (c) With α = .05, what is tcrit? (d) What should we conclude about this relationship? (e) Using our two approaches, how big is the effect of checking e-mail on productivity? (f) Describe how you would graph these results.
14. We investigate if a period of time feels longer or shorter when people are bored compared to when they are not bored. Using independent samples, we obtain these estimates of the time period (in minutes):

Sample 1 (bored):     X̄ = 14.5, s²x = 10.22, n = 28
Sample 2 (not bored): X̄ = 9.0,  s²x = 14.6,  n = 34

(a) What are H0 and Ha? (b) Compute tobt. (c) With α = .05, what is tcrit? (d) What should we conclude about this relationship? (e) Using our two approaches, how important is boredom in determining how quickly time seems to pass?
A researcher investigates whether classical music is more or less soothing to air-traffic controllers than modern music. She plays a classical selection for one group and a modern selection for another. She gives each person an irritability questionnaire and obtains the following: Classical: n ⫽ 6, X ⫽ 14.69, s2x ⫽ 8.4 Modern: n ⫽ 6, X ⫽ 17.21, s2x ⫽ 11.6 (a) Subtracting C ⫺ M what are H0 and Ha? (b) What is tobt? (c) With a ⫽ .05, are the results significant? (d) Report the results using the correct format. (e) What should she conclude about the relationship in nature between type of music and irritability? (f) What other statistics should be computed? 17. We predict that children exhibit more aggressive acts after watching a violent television show. The scores for ten participants before and after watching the show are Sample 1 (After) 5 6 4 4 7 3 2 1 4 3 160 Sample 2 (Before) 4 6 3 2 4 1 0 0 5 2 Sample 1 (Younger) 10 11 18 12 15 13 19 15 Sample 2 (Older) 18 17 19 16 15 19 13 20 (a) What are H0 and Ha? (b) Compute tobt. (c) With a ⫽ .05, what is tcrit? (d) What should you conclude about this relationship? (e) Which of our approaches should we use to determine the effect size here? 19. A rather dim student proposes testing the conditions of “male” and “female” using a repeatedmeasures design. What’s wrong with this idea? 20. With a ⫽ .05 and df ⫽ 40, a significant independent-samples tobt was ⫹ 4.55 How would you report this in the literature? 21. An experimenter investigated the effects of a sensitivity-training course on policemen’s effectiveness at resolving domestic disputes (using independent samples who had or had not completed the course). The dependent variable was the ability to resolve a domestic dispute. These success scores were obtained: No Course 11 14 10 12 8 15 12 13 9 11 Course 13 16 14 17 11 14 15 18 12 11 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. (a) Should a one-tailed or a two-tailed test be used? (b) What are H0 and Ha? (c) Subtracting course form no course, compute tobt and determine whether it is significant. (d) What conclusions can the experimenter draw from these results? (e) Using our two approaches, compute the effect size and interpret it. 22. (a) What does it mean that an independent variable “accounts” for more of the variance? (b) Why is such a variable more scientifically important? 23. When reading a research article, you encounter the following statements. For each, identify the design, the statistical procedure performed, if a Type I or Type II error is possibly being made, and the influence of the independent variable in the relationship being studied. (a) “The t-test indicated a significant difference between the mean for men (M = 5.4) and the mean for women (M = 9.3), with t(58) = +7.93, p < .01. Unfortunately, the effect size was only r2pb ⫽ .08.” (b) “The t-test indicated that participants’ weights were significantly reduced after three weeks of dieting, with t(40) = 3.56, p < .05, and r2pb ⫽ .26.” 24. For each of the following, which type of t-test is required? 
(a) Studying whether psychology or sociology majors are more prone to math errors on a statistics exam. (b) The study in part a, but for each psychology major, there is a sociology major with the same reading score. (c) A study of the spending habits of a group of teenagers, comparing the amount of money each spends in an electronic games store and in a clothing store. (d) A study of the effects of a new anti-anxiety drug, measuring participants’ anxiety before and after administration of the drug. (e) Testing whether women in the U.S. Army are more aggressive than women in the U.S. Marine Corps. 25. (a) When do you perform a parametric inferential procedure in an experiment? (b) What are the four parametric inferential procedures for experiments that we have discussed so far in this book, and what is the design in which each is used? Chapter 9: Hypothesis Testing Using the Two-Sample t-Test 161 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Chapter 10 DESCRIBING RELATIONSHIPS USING CORRELATION AND REGRESSION LOOKING BACK GOING F O R WA R D Be sure you understand: Your goals in this chapter are to learn: • From Chapter 1, the difference between an experiment and a correlational study, and how to recognize a relationship between the scores of two variables. • How to create and interpret a scatterplot. • From Chapter 4, that greater variability indicates that scores are not consistently close to each other. • The logic of predicting scores using linear regression and of r 2. • What a regression line is. • When and how to compute the Pearson r. • How to perform significance testing of the Pearson r. • From the previous chapters, the basics of significance testing. R ecall that in a relationship we see a pattern where, Sections 10-1 Understanding Correlations 10-2 The Pearson Correlation Coefficient 10-3 as the scores on one variable change, the scores on the other variable also change in a consistent manner. This chapter presents a new statistic for describing a relationship called the correlation coefficient. In the following sections we will discuss (1) what a correlation coefficient is and Significance Testing of the Pearson r how to interpret it, (2) how to compute the most common coef- 10-4 Statistics in the Research Literature: Reporting r and (4) how we use a relationship to predict unknown scores. 10-5 An Introduction to Linear Regression 10-6 The Proportion of Variance Accounted For: r 2 162 ficient, (3) how to perform inferential hypothesis testing of it, Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 
10-1 UNDERSTANDING CORRELATIONS

Whenever we find a relationship, we want to know its characteristics: What pattern is formed, how consistently do the scores change together, in what direction do the scores change, and so on. The best (and easiest) way to answer these questions is by computing a correlation coefficient. The term correlation means relationship, and a correlation coefficient is a number that describes a relationship: It is a statistic that describes the important characteristics of a relationship. We compute the correlation coefficient using our sample data, and the answer is a single number that quantifies the pattern in a relationship. No other statistic can do this. Thus, the correlation coefficient is important because it simplifies a complex relationship involving many scores into one easily interpreted statistic.

Correlation coefficients are most commonly used to summarize the relationship found in a correlational design, but computing a correlation coefficient does not create this type of study. Recall from Chapter 1 that in a correlational design, we simply measure participants' scores on two variables. For example, as people drink more coffee, they typically become more nervous. To study this in a correlational study, we might ask participants the amount of coffee they had consumed that day and also measure how nervous they were. This is different from an experiment, because there we would manipulate coffee consumption by assigning some people to a one-cup condition, others to a two-cup condition, and so on, and then measure their nervousness. Thus, the major difference here is how we go about demonstrating the relationship.

However, it is important to note another major distinction between these designs: A correlational design does not allow us to conclude that changes in X cause changes in Y. It may be that X does cause Y. But it also may be that Y causes X, or that a third variable influences both X and Y. Thus, if we find that higher coffee scores are associated with higher nervousness scores, it may be that more coffee makes people more nervous. But it may instead be that participants who were already more nervous then drank more coffee. Or perhaps a hidden variable was operating: Perhaps some participants had less sleep than others the night before testing, and this caused these people both to be more nervous and to drink more coffee.

So remember, a correlation by itself does not indicate causality. You must also consider the research design used to demonstrate the relationship. In experiments we have more control over the variables and the situation, so they tend to provide better evidence for identifying the causes of a behavior.
Correlational research (and correlations) simply describes how nature relates the variables, without identifying the cause.

10-1a Distinguishing Characteristics of Correlational Analysis

There are four major differences between how we handle data in a correlational analysis versus in an experiment. First, in our coffee experiment we would examine the mean nervousness score (the Y scores) for each condition of the amount of coffee consumed (each X). Then we would examine the relationship they show. However, with a correlational study we typically have a rather large range of different X scores: People will probably report many different amounts of coffee consumed in a day. Comparing the mean nervousness scores for so many amounts would be very difficult. Therefore, in correlational procedures we do not compute a mean Y score at each X. Instead, the correlation coefficient simultaneously summarizes the entire relationship that is formed by all pairs of X-Y scores in the data.

A second difference is that, because we examine all pairs of X-Y scores, correlational procedures involve one sample: In correlational analysis, N stands for the number of pairs of scores in the data.

Third, in a correlational study, either variable may be labeled as the X or Y variable. How do we decide which is X or Y? Any relationship may be seen as asking, "For a given X, what are the Y scores?" So simply identify your "given" variable, and it is X. Thus, if we ask, "For a given amount of coffee, what are the nervousness scores?", then amount of coffee is the X variable and nervousness is the Y. Conversely, if we ask, "For a given nervousness score, what is the amount of coffee consumed?", then nervousness is X and amount of coffee is Y. (Note: Researchers disagree about whether to then call X and Y the independent and dependent variables. The safer approach is to reserve these terms for experiments.)

Finally, the data are graphed differently in correlational research. We use the individual pairs of scores to create a scatterplot. A scatterplot is a graph of the individual data points from a set of X-Y pairs. Figure 10.1 shows some data and then the scatterplot we might have from studying nervousness and coffee consumption. (Real research typically involves a larger N, and the data points will not form such a pretty pattern.)

Figure 10.1 Scatterplot Showing Nervousness as a Function of Coffee Consumption. Each data point is created using a participant's coffee consumption as the X score and nervousness as the Y score.

Cups of Coffee (X): 1 1 1 2 2 3 3 4 4 5 5 6 6
Nervousness (Y):    1 1 2 2 3 4 5 5 6 8 9 9 10

To create this scatterplot, the first person in the data had 1 cup of coffee and a nervousness score of 1, so we place a data point above an X of 1 cup at the height of a score of 1 on the Y variable of nervousness. The second participant had the same X and Y scores, so that data point is on top of the previous one.
The third participant scored an X of 1 and a Y of 2, and so on. We interpret a scatterplot by examining the overall pattern formed by the data points. We read a graph from left to right and see that the Xs become larger. So get in the habit of describing a relationship by asking, "As the X scores increase, what happens to the Ys?" Back in Figure 10.1, as the X scores increase, the data points move higher on the graph, indicating that the corresponding Y scores are higher. Thus, the scatterplot reflects a relationship. Recall that in any relationship, as the X scores increase, the Y scores change such that different values of Y tend to be paired with different values of X. Drawing the scatterplot allows you to see your particular relationship and to map out the best way to summarize it. The shape and orientation of a scatterplot reflect the characteristics of the relationship that are described by the correlation coefficient. The two important characteristics of any relationship that we need to know about are the type and the strength of the relationship. The following sections discuss these characteristics.

10-1b Types of Relationships

The type of relationship that is present in a set of data is the overall direction in which the Y scores change as the X scores change. The two general types of relationships are linear and nonlinear relationships.

LINEAR RELATIONSHIPS The term linear means "straight line," and a linear relationship forms a pattern that follows one straight line. This is because in a linear relationship, as the X scores increase, the Y scores tend to change in only one direction. Figure 10.2 shows two linear relationships: between the amount of time that students study and their test performance, and between the number of hours that students watch television and the amount of time they sleep. These are linear because as students study longer, their grades tend only to increase, and as they watch more television, their sleep time tends only to decrease.

Figure 10.2 Scatterplots Showing Positive and Negative Linear Relationships. The left panel plots test scores against hours of study time per day (a positive linear study–test relationship); the right panel plots hours of sleep against hours of TV time per day (a negative linear television–sleep relationship).

To better see the overall pattern in a scatterplot, visually summarize it by drawing a line around its outer edges. As in Figure 10.2, a scatterplot that forms a slanted ellipse indicates a linear relationship.
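If you want to reproduce a plot like Figure 10.1 yourself, a short script is enough. The sketch below is my addition, not part of the text; it assumes Python with the matplotlib library and plots the coffee and nervousness scores from Figure 10.1.

```python
# A minimal sketch of Figure 10.1: one data point per X-Y pair.
import matplotlib.pyplot as plt

# The 13 pairs of scores from Figure 10.1 (N = the number of pairs).
cups = [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]          # X scores
nervousness = [1, 1, 2, 2, 3, 4, 5, 5, 6, 8, 9, 9, 10]  # Y scores

plt.scatter(cups, nervousness)   # identical pairs plot on top of each other
plt.xlabel("Cups of coffee consumed")
plt.ylabel("Nervousness scores")
plt.title("Nervousness as a Function of Coffee Consumption")
plt.show()  # the points drift upward to the right: a positive relationship
```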
By slanting, the ellipse indicates that the Y scores are changing as the X scores increase; by slanting in one direction, it indicates a linear relationship. Further, we also summarize the relationship by drawing a line through the scatterplot. This line is called the regression line. While the correlation coefficient is the statistic that summarizes a relationship, the regression line is the line that summarizes the relationship. The linear regression line is the straight line that summarizes a relationship by passing through the center of the scatterplot. That is, although not all data points are on the line, the distance that some are above the line equals the distance that others are below it, so that, "on average," the regression line passes through the center of the scatterplot. Therefore, think of the regression line as reflecting the pattern that all data points more or less follow, so it shows the linear (straight-line) relationship hidden in the data.

The different ways that the scatterplots slant back in Figure 10.2 illustrate the two subtypes of linear relationships, depending on the direction in which the Y scores change. The study–test relationship is a positive relationship. In a positive linear relationship, as the X scores increase, the Y scores also tend to increase. Any relationship that fits the pattern "the more X, the more Y" is a positive linear relationship. On the other hand, the television–sleep relationship is a negative relationship. In a negative linear relationship, as the X scores increase, the Y scores tend to decrease. Any relationship that fits the pattern "the more X, the less Y" is a negative linear relationship. (Note: The term negative does not mean there is something wrong with the relationship: It merely indicates the direction in which the Y scores change as the X scores increase.)

NONLINEAR RELATIONSHIPS If a relationship is not linear, then it is nonlinear, meaning that the data cannot be summarized by one straight line. In a nonlinear (also called curvilinear) relationship, as the X scores increase, the Y scores do not only increase or only decrease: At some point, the Y scores alter their direction of change.
Nonlinear relationships come in many different shapes, but Figure 10.3 shows two common ones. On the left is the relationship between a person's age and the amount of time required to move from one place to another. At first, as age increases, movement time decreases; but beyond a certain age, the time scores change direction and begin to increase. (This is called a U-shaped pattern.) The scatterplot on the right shows the relationship between the number of alcoholic drinks consumed and feeling well. At first, people tend to feel better as they drink, but beyond a certain point, drinking more makes them feel progressively worse. (This pattern reflects an inverted U-shaped relationship.) Curvilinear relationships may be more complex than these, producing a wavy pattern that repeatedly changes direction. Also, the scatterplot does not need to be curved to be nonlinear. Scatterplots similar to those in Figure 10.3 might be best summarized by two straight regression lines that form a V and an inverted V, respectively, or by lines that form other angles. All are still nonlinear relationships, because they cannot be summarized by one straight line.

Figure 10.3 Scatterplots Showing Nonlinear Relationships. The left panel plots time for movement against age in years (a U-shaped pattern); the right panel plots feeling of wellness against alcoholic drinks consumed (an inverted U-shaped pattern).

Note that the preceding terminology is also used to describe the type of relationship found in experiments. If, as the amount of the independent variable (X) increases, the dependent scores (Y) also increase, then you have a positive linear relationship. If the dependent scores decrease as the independent variable increases, then you have a negative linear relationship. And if, as the independent variable increases, the dependent scores alter their direction of change, then you have a nonlinear relationship.

A linear relationship follows one straight line and may be positive (with increasing Y scores) or negative (with decreasing Y scores).

HOW THE COEFFICIENT DESCRIBES THE TYPE OF RELATIONSHIP The first step when summarizing data is to decide on the specific correlation coefficient to compute. This begins with deciding between a linear and a nonlinear version. Most behavioral research uses a linear correlation coefficient, one designed to summarize a linear relationship. How do you know whether your data form a linear relationship? If the scatterplot generally follows a straight line, then a linear correlation is appropriate. (We will discuss only linear correlations.) By computing a linear correlation coefficient, we communicate to other researchers that we have a linear relationship. Then the coefficient itself communicates whether the relationship is positive or negative. Sometimes our computations will produce a negative number (with a minus sign), indicating we have a negative linear relationship. Other data will produce a positive number (and we place a plus sign with it), indicating we have a positive linear relationship.

The other characteristic of a relationship communicated by the correlation coefficient is the strength of the relationship.

10-1c Strength of the Relationship

A linear correlation coefficient has two components: the sign, indicating a positive or a negative relationship, and the absolute value, indicating the strength of the relationship.

Recall that a relationship can exhibit varying degrees of consistency. The strength of a relationship is the extent to which one value of Y is consistently paired with one and only one value of X.
(This is also referred to as the degree of association.) The strength of a relationship is indicated by the absolute value of the correlation coefficient (ignoring the sign). The larger the coefficient, the stronger and more consistent the relationship is. The largest possible value of a correlation coefficient is 1, and the smallest value is 0. When you include the sign, a linear correlation coefficient can be any value between −1 and +1. Thus, the closer the coefficient is to ±1, the more consistently one value of Y is paired with one and only one value of X.

Recognize that correlation coefficients do not directly measure units of "consistency." Thus, if one correlation coefficient is .40 and another is .80, you cannot conclude that the second relationship is twice as strong as the first. Instead, evaluate any correlation coefficient by comparing it to the extreme values of 0 and ±1. The starting point is a perfect relationship.

THE PERFECT CORRELATION A correlation coefficient of +1 or −1 describes a perfectly consistent linear relationship. Figure 10.4 shows an example of each. In this and the following figures, first look at the scores to see how they pair up; then look at the scatterplot. Other data having the same correlation coefficient will produce similar patterns with similar scatterplots.

Figure 10.4 Data and Scatterplots Reflecting Perfect Positive and Negative Correlations

Perfect positive coefficient = +1
X: 1 1 1 3 3 3 5 5 5
Y: 2 2 2 5 5 5 8 8 8

Perfect negative coefficient = −1
X: 1 1 1 3 3 3 5 5 5
Y: 8 8 8 5 5 5 2 2 2

Interpreting any correlation coefficient involves envisioning the scatterplot that is present. Here are four related ways to think about what a coefficient tells you about the relationship. First, the coefficient indicates the relative degree of consistency with which Ys are paired with Xs. A coefficient of ±1 indicates that one identical Y score was obtained by everyone who obtains a particular X. Then, every time X changes, the Y scores all change to one new value. Second, the opposite of consistency is variability, so the coefficient communicates the variability in the group of Y scores paired with each X. When the coefficient is ±1, only one Y is paired with an X, so there is no variability among the Y scores paired with each X. Third, the coefficient indicates how closely the scatterplot fits the regression line. Because a coefficient equal to ±1 indicates zero variability or spread among the Y scores at each X, the data points form a perfect straight-line relationship, so they all lie on the regression line. Fourth, the coefficient communicates the relative accuracy of our predictions.
A goal of behavioral science is to predict the specific behaviors, and the scores that reflect them, that occur in a particular situation. We do this using relationships, because a particular Y score is naturally paired with a particular X score. Therefore, if we know someone's X, we use the relationship to predict that individual's Y. A coefficient of ±1 indicates perfect accuracy in our predictions: Because only one value of Y occurs with an X, by knowing someone's X we can predict exactly what his or her Y will be. For example, in both graphs in Figure 10.4 an X of 3 is always paired with a Y of 5, so we predict that anyone scoring an X of 3 will have a Y score of 5.

The correlation coefficient communicates how consistently the Ys are paired with X, the variability in Ys at each X, how closely the scatterplot fits the regression line, and the accuracy in our predictions of Y.

INTERMEDIATE STRENGTH A correlation coefficient that is not ±1 indicates that the data form a linear relationship to only some degree. The key to understanding the strength of any relationship is this: A RELATIONSHIP BECOMES WEAKER AS THE VARIABILITY (DIFFERENCES) AMONG THE GROUP OF Y SCORES PAIRED WITH EACH X BECOMES LARGER. The correlation coefficient communicates this because, as the variability in the Ys at each X becomes larger, the correlation coefficient becomes smaller.

For example, Figure 10.5 shows data that produce a correlation coefficient of +.98.

Figure 10.5 Data and Scatterplot Reflecting a Correlation Coefficient of +.98
X: 1 1 1 3 3 3 5 5 5
Y: 1 2 2 4 5 5 7 8 8
Again, interpret the coefficient in four ways:

1. Consistency: An absolute value less than 1 indicates that not every participant who obtained a particular X obtained the same Y. However, a coefficient of .98 is close to 1, so here we have close to perfect association between the X and Y scores.
2. Variability: A coefficient less than ±1 indicates there is variability among the Y scores at each X. In other words, different Y scores are now paired with an X. However, .98 is close to 1, indicating this variability is relatively small, so the different Ys at an X are relatively close to each other.
3. The scatterplot: A coefficient less than 1 indicates variability in Y at each X, so the data points at an X are vertically spread out above and below the regression line. However, a coefficient of .98 is close to 1, so we know the data points are close to the regression line, resulting in a scatterplot that is a narrow ellipse.
4. Predictions: When the coefficient is not ±1, there is not one Y score for a particular X, so we can predict only around what someone's Y score will be. In Figure 10.5, at an X of 1 are Y scores of 1 and 2. Split the difference, and for each person here we'd predict a Y of around 1.5. But no one scored exactly 1.5, so we'll have some error in our predictions. However, .98 is close to 1, indicating that, overall, our error will be relatively small.

On the other hand, Figure 10.6 shows data that produce a coefficient of −.28. Because this coefficient is not very close to 1 in absolute value, it tells us:

1. this relationship is not close to showing perfectly consistent association;
2. the variability in the Y scores paired with each X is relatively large;
3. the large variability produces data points on the scatterplot at each X that are vertically very spread out above and below the regression line, forming a relatively wide scatterplot; and
4. because each X is paired with many different Ys, knowing a participant's X will not get us close to his or her Y, so our predictions will contain large amounts of error.

Figure 10.6 Data and Scatterplot Reflecting a Correlation Coefficient of −.28
X: 1 1 1 3 3 3 5 5 5
Y: 9 6 3 8 6 3 7 5 1

Greater variability in the group of Y scores at each X reduces the strength of a relationship and the size of the correlation coefficient.

ZERO CORRELATION The lowest possible value of the correlation coefficient is 0, indicating that no relationship is present. Figure 10.7 shows data that produce such a coefficient.

Figure 10.7 Data and Scatterplot Reflecting a Correlation Coefficient of 0
X: 1 1 1 3 3 3 5 5 5
Y: 3 5 7 3 5 7 3 5 7

A correlation coefficient of 0 is as far as possible from ±1, telling us the scatterplot is as far as possible from forming a slanted straight line. Therefore, we know the following:

1. No Y score tends to be consistently associated with only one X; instead, virtually the same batch of Y scores is paired with every X.
2. The spread in Y at any X is at its maximum and equals the overall spread of Y in the data.
3. The scatterplot is horizontal and elliptical (or circular), and in no way hugs the regression line.
4. Because each X is paired with virtually all Y scores, knowing someone's X score is no help in predicting his or her Y score.
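To check these interpretations against the actual numbers, it helps to compute the coefficients yourself. The sketch below is my addition, assuming Python with NumPy; np.corrcoef is NumPy's standard correlation routine. It computes r for the data sets in Figures 10.4 through 10.7.

```python
# A quick check of the example coefficients, using NumPy's corrcoef.
import numpy as np

x = [1, 1, 1, 3, 3, 3, 5, 5, 5]  # the same X scores appear in every figure

datasets = {
    "Figure 10.4 (perfect +)": [2, 2, 2, 5, 5, 5, 8, 8, 8],
    "Figure 10.4 (perfect -)": [8, 8, 8, 5, 5, 5, 2, 2, 2],
    "Figure 10.5":             [1, 2, 2, 4, 5, 5, 7, 8, 8],
    "Figure 10.6":             [9, 6, 3, 8, 6, 3, 7, 5, 1],
    "Figure 10.7":             [3, 5, 7, 3, 5, 7, 3, 5, 7],
}

for name, y in datasets.items():
    r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
    print(f"{name}: r = {r:+.2f}")
# Prints +1.00, -1.00, +0.98, -0.28, and +0.00 (no relationship).
```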
This is a(n) _____ linear relationship, producing a scatterplot that slants _____ as X increases. 4. In a stronger relationship, the variability among the Y scores at each X is _____, producing a scatterplot that forms a(n) _____ ellipse. 5. The _____ line summarizes the scatterplot. > Answers > Quick Practice 1. linear; nonlinear 2. negative; down 3. positive; up > 4. smaller; narrower 5. regression > In a positive linear relationship, as X increases, Y increases. In a negative linear relationship, as X increases, Y decreases. The larger the correlation coefficient, the more consistently one Y occurs with one X, the smaller the variability in Ys at an X, the narrower the scatterplot, and the more accurate our predictions. More Examples A coefficient of .84 indicates (1) as X increases, Y consistently increases; (2) the Y scores paired with a particular X show little variability; (3) the scatterplot is a narrow ellipse, with the data points lying near the upward-slanting regression line; and (4) by knowing an individual’s X, we can closely predict his/her Y score. However, a coefficient of .38 indicates (1) as X increases, Y somewhat consistently increases; (2) a variety of Y scores are paired with each X; (3) the scatterplot is a relatively wide ellipse around the upward-slanting regression line; and (4) knowing an X score produces only moderately accurate predictions of the paired Y score. For Practice 1. In a(n) _____ relationship, as the X scores increase, the Y scores increase or decrease only. This is not true in a(n) _____ relationship. 10-2 THE PEARSON CORRELATION COEFFICIENT Statisticians have developed several different linear correlation coefficients that are used with different scales of measurement and different designs. However, the most common correlation in behavioral research is the Pearson correlation coefficient, which describes the linear relationship between two interval variables, two ratio variables, or one interval and one ratio variable. The Pearson symbol for the Pearson correlation correlation coefficient in a sample is the lowercoefficient (r) The coefficient that case r. All of the example coefficients describes the linear in the previous section were rs. relationship between In addition to requiring intertwo interval or ratio val or ratio scores, the r has two variables other requirements. First, the X restricted and Y scores should each form range Occurs when the range of an approximately normal distriscores on a variable bution. Second, we should avoid is limited, producing a restricted range of X or Y. A an r that is smaller than it would be if restricted range occurs when the the range was not range of scores from a variable is restricted limited so that we have only a few Chapter 10: Describing Relationships Using Correlation and Regression 171 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. multiplying each pair of z-scores together as in this formula: r different scores that are close together. Then we will inaccurately describe the relationship, obtaining an r that is smaller than it would be if the range was not restricted. 
Generally, a restricted range occurs when researchers are too selective when obtaining participants. So, for example, if we are interested in the relationship between participants’ high school grades and their subsequent salaries, we should not restrict the range of grades by studying only honor students. Instead, we should include all students to get the widest range of grades possible. 10-2a Computing the Pearson r Computing r requires that we have collected pairs of X and Y scores. Then we use the same symbols for Y that we’ve previously used for X, so Y is the sum of the Y scores, Y 2 is the sum of the squared Y scores, and (Y)2 is the squared sum of Y. You will also see two new symbols: We have XY, called the sum of the cross products. This says to first multiply each X score in a pair times its corresponding Y score. Then sum all of the resulting products. We also have (X)(Y). This says to find the sum of the Xs and the sum of the Ys. Then multiply the two sums together. Mathematically, r determines the “average” amount the X and Y scores correspond. However, as you saw in Chapter 5, we compare scores from different variables by transforming them into z-scores. Thus, computing r involves transforming each Y score into a z-score (call it zY) and transforming each X score into a z-score (call it zX). Then, because z-scores involve positive and negative numbers that always balance out to zero, we can measure their correspondence by 172 (zX zY ) N This says to multiply each zX times the paired zY, sum the products, and divide by N. The answer will always be between {1. If in each pair tthe zs tend to have the same sign, then the more the zz-scores match, the closer the r is to 1. If in each pair the t zs tend to have opposite signs, then the more the two z-scores match, the closer the r is to 1. However, not only is this formula extremely time-consuming, it also leads to substantial rounding errors. To derive a better computing formula for r, the symbols zX and zY in the above formula were replaced by their respective formulas. Then in each z-score formula, the symbols for the mean and standard deviation were replaced by their computing formulas. This produces a monster of a formula. After reducing it, we have the smaller monster below. THE COMPUTING FORMULA FOR THE PEARSON r IS r N(X Y ) (X )(Y ) 2[N(X 2) (X ) 2][N(Y 2) (Y ) 2] In the numerator, the N (the number of pairs of scores) is multiplied by XY. Then subtract the quantity (X)(Y). In the he denominator, in the left brackets multiply N times X X 2, and from that subtract act (X) 2. In the right brackets, multiply N times Y 2, and from that subtract (Y) 2. Multiply the he answers from the brackets together, and then find the square root. Then n divide the denominator into nto the numerator and, voilà, là, the answer is the Pearson on r. For example, say ay that we ask 10 people the number of times they visited a Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 
doctor in the last year and the number of glasses of orange juice they drink daily. To describe the linear relationship between juice drinking and doctor visits, we compute r. Table 10.1 shows a good way to set up the data.

Table 10.1 Sample Data for Computing the r between Orange Juice Consumed (the X variable) and Doctor Visits (the Y variable)

Participant | Glasses of Juice per Day (X) | X² | Doctor Visits per Year (Y) | Y² | XY
1  | 0 | 0  | 8 | 64 | 0
2  | 0 | 0  | 7 | 49 | 0
3  | 1 | 1  | 7 | 49 | 7
4  | 1 | 1  | 6 | 36 | 6
5  | 1 | 1  | 5 | 25 | 5
6  | 2 | 4  | 4 | 16 | 8
7  | 2 | 4  | 4 | 16 | 8
8  | 3 | 9  | 4 | 16 | 12
9  | 3 | 9  | 2 | 4  | 6
10 | 4 | 16 | 0 | 0  | 0
N = 10; ΣX = 17; (ΣX)² = 289; ΣX² = 45; ΣY = 47; (ΣY)² = 2209; ΣY² = 275; ΣXY = 52

STEP 1: Compute ΣX, (ΣX)², ΣX², ΣY, (ΣY)², ΣY², ΣXY, and N. As in Table 10.1, in addition to the columns for X and Y, make columns for X² and Y². Also make a column for XY, where you multiply each X times its paired Y. Then sum all of the columns, and square ΣX and ΣY. Filling in the formula for r, we get

r = [10(52) − (17)(47)] / √{[10(45) − 289][10(275) − 2209]}

STEP 2: Compute the numerator. 10 times 52 is 520, and 17 times 47 is 799. Subtracting 799 from 520 gives −279. (Note the negative sign.)

STEP 3: Compute the denominator and then divide. First perform the operations within each set of brackets. In the left brackets, 10 times 45 is 450, and subtracting 450 − 289 gives 161. In the right brackets, 10 times 275 is 2750, and subtracting 2750 − 2209 gives 541. Now multiply: 161 times 541 equals 87,101. Taking the square root gives 295.129, so

r = −279 / 295.129 = −.95

Our correlation coefficient between orange juice drinks and doctor visits is −.95. (Note: We usually round off a correlation coefficient to two decimals.) Interpret this r as we discussed previously: On a scale of 0 to ±1, our −.95 indicates an extremely strong negative linear relationship: Each amount of orange juice is associated with a very small range of doctor visits, and as juice scores increase, doctor visits consistently decrease. Therefore, we envision a very narrow scatterplot that slants downward. Further, based on participants' juice scores, we can very accurately predict their frequency of doctor visits.
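If you have software handy, both formulas are easy to check. The sketch below is my addition, assuming Python with NumPy; it computes r for the Table 10.1 data twice, once with the defining z-score formula and once with the computing formula.

```python
# Computing the Pearson r for the Table 10.1 data in two equivalent ways.
import numpy as np

x = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 4])   # glasses of juice per day
y = np.array([8, 7, 7, 6, 5, 4, 4, 4, 2, 0])   # doctor visits per year
n = len(x)                                      # N = the number of pairs

# Defining formula: r = sum(zx * zy) / N, using N-based standard deviations.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
r_defining = np.sum(zx * zy) / n

# Computing formula: r = [N(SumXY) - (SumX)(SumY)] / sqrt([...][...]).
num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt((n * np.sum(x**2) - np.sum(x)**2) *
              (n * np.sum(y**2) - np.sum(y)**2))
r_computing = num / den

print(round(r_defining, 2), round(r_computing, 2))  # both print -0.95
```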
Ultimately, however, we wish to describe the relationship that occurs in nature—in the population. Therefore, we use the sample’s correlation coefficient to estimate or infer the coefficient we’d find if we could study everyone in the population. But before we can believe that the sample correlation represents the relationship found in the population, we must first perform statistical hypothesis testing and conclude that r is significant. 36 2 3 6(28) 144 43 6(115) 625 4 In the denominator, 6 times 28 is 168; 6 times 115 is 690, so r TESTING OF THE PEARSON r 2 In the numerator, 6 times 56 is 336, and 12 times 25 is 300, so r 10-3 SIGNIFICANCE 36 2 3 168 144 43 690 625 4 36 Never accept that a sample correlation coefficient reflects a relationship in nature unless it is significant. 2 3 24 43 65 4 36 21560 36 .91 39.497 Here’s a new example. We are interested in the relationship between a man’s age and his physical agility. We select 25 men, measure their age and their Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. H a: r ⬆ 0 On the other hand, the null hypothesis is always that the predicted relationship does not exist, so here it is that the correlation in the population is zero. So, H0 : r 0 distribution showing all possible values of r that occur when samples are drawn from a population in which r is zero obtained a slanting elliptical sample scatterplot from this population, producing an r that does not equal zero. So, in our example, age and agility are not really related, but the scores in our sample happen to pair up so that it looks like they’re related. (On the other hand, Ha implies that the population’s scatterplot would be similar to the sample’s slanting elliptical scatterplot.) As usual, we test H0, so here we determine the likelihood of obtaining our sample’s r when r is zero. To do so, we examine the sampling distribution of r. To create this, it is as if we infinitely sample the population back in Figure 10.8, each time computing r. The sampling distribution of r shows all possible values of the r Figure 10.8 Scatterplot of a Population for Which r 0, as Described by H0 Our r results from sampling error when selecting a sample from this scatterplot. Sample scatterplots Test scores These are the two-tailed hypotheses whenever you test that the sample either does or does not represent a relationship. This is the most common approach and the one we’ll use. (You can also test the H0 that your sample represents a nonzero r. Consult an advanced statistics book for details.) sampling distribution of r A frequency © iStockphoto.com/Alexander Yakovlev agility, and using the previous formula, compute that r .45. This suggests that the correlation in the population would also be .45. The symbol for the Pearson population correlation coefficient is the Greek letter R, called “rho.” A r is interpreted in the same way as r: It is a number between 0 and {1, indicating either a positive or a negative linear relationship in the population. 
The larger the absolute value of r, the stronger the relationship and the more closely the population’s scatterplot hugs the regression line. Thus, we might estimate that r would equal .45 if we measured the agility and age of all men. But, on the other hand, there is always the potential problem of sampling error. Maybe these variables are not really related in nature, but, through the luck of the draw of who we tested, the sample data coincidentally form a pattern that produces an r equal to .45. This leads to our statistical hypotheses. As usual, we can perform either a one- or two-tailed test. The two-tailed test is used when we do not predict the direction of the relationship, predicting that the correlation will be either positive or negative. First, any alternative hypothesis always says the predicted relationship exists. If there is a correlation in the population that is either positive or negative, then r does not equal zero. So, 10-3a The Sampling Distribution of r Our H0 implies that we obtained our r because of sampling error—that is, we have an unrepresentative sample that poorly represents zero correlation in the population. You can understand this by looking at Figure 10.8. It shows the scatterplot in the population that H0 says we would find: There is no relationship here, so r is 0. However, H0 implies that, by chance, we Age Chapter 10: Describing Relationships Using Correlation and Regression 175 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. that occur by chance when samples are drawn from a population in which r is zero. Such a distribution is shown in Figure 10.9. The only novelty here is that along the X axis are different values of r. When r 0, the most frequent sample r is also 0, so the mean of the sampling distribution—the average r—is 0. Because of sampling error, however, sometimes we might obtain a positive r and sometimes a negative r. Most often the r will be relatively small (close to 0), but less frequently we may obtain a larger r that falls more into the tails of the distribution. Thus, the larger the r (whether positive or negative), the less likely it is to occur when the sample represents a population where r 0. To test H0, we determine where our r is located in this “r-distribution.” The size of our r directly communicates its location. For example, the mean of the sampling distribution is always 0, so our r of .45 is a distance of .45 below the mean. Therefore, we test H0 simply by examining our obtained r, which is robt. To determine whether robt is in the region of rejection, we compare it to rcrit. 10-3b Drawing Conclusions about r As with the t-distribution, the shape of the r-distribution is slightly different for each df, so there is a different value of rcrit for each df. Table 3 in Appendix B gives the critical values of the Pearson correlation coefficient. Use this “r-table” in the same way you’ve used the t-table: Find rcrit for either a one- or a twotailed test at the appropriate a and df. 
THE FORMULA FOR THE DEGREES OF FREEDOM FOR THE PEARSON CORRELATION COEFFICIENT IS: df N 2 where N is the number of X-Y pairs in the data Figure 10.9 Sampling Distribution of r When r 0 It is an approximately normal distribution, with values of r plotted along the X axis. μ f –1. 0 ... –r –r –r –r –r –r –r –r 0 +r +r +r +r +r +r +r +r ... +1.0 Values of r Figure 10.10 H0 Sampling Distribution of r When H0: r 0 For the two-tailed test, there is a region of rejection for positive values of robt and for negative values of robt. μ f Values of r –1.0 ... –r robt = –.45 176 –r –r –r –r –r –rcrit = –.396 0 +r +r +r +r +r +r +rcrit = +.396 ... +1.0 For our example, N was 25, so df 23. For a .05, the two-tailed rcrit is { .396. We set up the sampling distribution as in Figure 10.10. An robt of .45 is beyond the rcrit of {.396, so it is in the region of rejection. Thus, our r is so unlikely to occur if we had been representing the population where r is 0 that we reject the H0 that we were representing this population. We conclude that the robt is “significantly different from zero.” (If your df is not listed in the r-table, use the bracketing df and critical values as we did for the t-test.) The rules for interpreting a significant result here are the same as with previous statistics. In particular, a is again the theoretical probability of a Type I error. Here, a Type I error is rejecting the H0 that the population correlation is zero, when in fact the correlation is zero. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Because the sample robt is .45, our best estimate is that in the population, r equals .45. However, recognizing that the sample may contain sampling error, we say that r is around .45. (We could more precisely describe this r by computing a confidence interval for r: Consult an advanced textbook.) If robt did not lie beyond rcrit, then we would retain H0 and conclude that the sample may represent a population where r 0. However, we have not proven there is no relationship in the population. Also, we would be concerned about having sufficient power so that we did not make a Type II error by missing a relationship that exists. Therefore, we make no claims about the relationship, one way or the other. Figure 10.11 H0 Sampling Distribution of r Where r 0 for One-Tailed Test Predicting positive correlation H0 : ρ ≤ 0 Ha: ρ > 0 μ f –1.0 ... –r –r –r –r 0 +r +r +r +r ... +1.0 10-3c One-Tailed Tests of r +rcrit Predicting negative correlation If we had predicted only a positive correlation or only a negative correlation, then we would have performed a one-tailed test. H0: ρ ≥ 0 Ha: ρ < 0 μ THE ONE-TAILED HYPOTHESES FOR TESTING A CORRELATION COEFFICIENT © ColorBlind Images/Blend Images/Jupiterimages Predicting a positive Predicting a negative correlation correlation H0 : r 0 H0 : r 0 H a: r 0 H a: r f –1.0 ... 0 –r –r –r –r 0 +r +r +r +r ... +1.0 –rcrit When predicting a positive relationship, we are saying that r will be greater than zero (in Ha) but H0 says we are wrong. 
When predicting a negative relationship, we are saying that r will be less than zero (in Ha) but H0 says we are wrong. We test each H0 by again examining the sampling distribution of o r, created for when r 0. From the r-table in Appendix B, find the one-tailed critical value for df and a, and set up one of the sampling distributions shown in Figure 10.11. When predicting a positive correlation, use the upper distribution: roobt is significant if it falls beyond the positive rcrit. When predicting a negative correlation, use the lower distribution: robt is significant if it falls beyond the negative rcrit. 10-3d Summary of the Pearson Correlation Coefficient The Pearson correlation coefficient describes the strength and direction of a linear relationship between normally distributed interval/ratio variables. 1. Compute robt. 2. Create either the two-tailed or the one-tailed H0 and Ha. 3. Set up the sampling distribution and, using df N 2 (where N is the number of pairs), find rcrit in the r-table. 4. If robt is beyond rcrit, the results are significant, so describe the relationship and population correlation coefficient (r). If robt is not beyond rcrit, the results are not significant and make no conclusion about the relationship. 5. Further describe significant relationships using the linear regression and r2 procedures discussed in the next sections. Chapter 10: Describing Relationships Using Correlation and Regression 177 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. > Quick Practice > You should always perform hypothesis testing on a correlation coefficient to be confident it is not produced by sampling error. More Examples We compute an r .32 (N 42). We predict some kind of relationship, so H0: r 0; Ha: r ≠ 0. With a .05 and df 42 2 40, the two-tailed rcrit {.304. The robt is beyond rcrit, so it is significant: We expect the population correlation coefficient (r) to be around .32. For Practice We predict a negative relationship and obtain robt .44. 1. What are H0 and Ha? 2. With a .05 and N 10, what is rcrit? 3. What is the conclusion about robt? 4. What is the conclusion about the relationship in the population? > Answers 4. Make no conclusion. 3. not significant 2. df 8, rcrit .549 1. H0: r ≥ 0; Ha: r < 0 linear regression The procedure used to predict Y scores based on correlated X scores predictor variable The known X scores used in linear regression procedures to predict unknown Y scores criterion variable The unknown Y scores that are predicted in linear regression procedures 178 10-4 STATISTICS IN THE RESEARCH LITERATURE: REPORTING r Report the Pearson correlation coefficient using the same format as with previous statistics. Thus, in our agility study, the robt of .45 was significant with 23 df. We report this as r(23) .45, p < .05. As usual, df is in parentheses and because a .05, the probability of a Type I error is less than .05. Understand that, although theoretically a correlation coefficient may be as large as {1, in real research such values do not occur. Any data will reflect the behaviors of living organisms, who always show variability. 
Therefore, adjust your expectations about real correlation coefficients: Typically, researchers obtain coefficients in the neighborhood of {.30 to {.50, so below {.30 is considered rather weak and above {.50 is considered extremely strong. Finally, published research often involves a rather large N, producing a complex scatterplot that is difficult to read. Therefore, instead, a graph showing only the regression line may be included. 10-5 AN INTRODUCTION TO LINEAR REGRESSION We’ve seen that in a relationship, particular Y scores are naturally paired with particular X scores. Therefore, if we know an individual’s X score and the relationship between X and Y, we can predict the individual’s Y score. The statistical procedure for making such predictions is called linear regression. Linear regression is the procedure for predicting unknown Y scores based on known correlated X scores. To use regression, we first establish the relationship by computing r for a sample and determining that it is significant. Then, essentially, the regression procedure identifies the Y score that everyone in our sample scored around when they scored at a particular X. We predict that anyone else at that X would also score around that Y. Therefore, we can measure the X scores of individuals who were not in our sample and we have a good idea of what their corresponding Y score would be. For example, the reason that students take the Scholastic Aptitude Test (SAT) when applying to some colleges is because researchers have previously established that SAT scores are positively correlated with college grades: We know the typical college grade average (Y) that is paired with a particular SAT score (X). Therefore, through regression techniques, the SAT scores of applying students are used to predict their future college performance. If the predicted grades are high enough, the student is admitted to the college. Because we base our predictions on someone’s X score, in correlational research the X variable is often referred to as the predictor variable. The Y variable is called the criterion variable. Thus, above, SAT score is the predictor variable and college grade average is the criterion variable. You can understand the regression procedure by looking at Figure 10.12. First, in the scatterplot on the left we see that participants who scored an X of 1 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Figure 10.12 Graphs Showing the Actual Y Scores and the Predicted Y Scores from the Regression Line Y scores Actual scores Predicted scores 7 7 6 6 5 5 4 Y' 4 4 3 3 2 2 1 1 0 1 2 3 4 X scores 0 5 had Ys of 1, 2, or 3, so literally they scored around 2. Thus, our best prediction for anyone in the future who scores an X of 1 is to predict a Y of 2. Likewise, for any other X, our best prediction is the central value of Y. So, for example, for people scoring an X of 3, we see Ys of 3, 4, and 5, so for anyone else scoring an X of 3, we predict a Y of 4. 
Notice that because the regression line passes through the center of the scatterplot, you could obtain these predicted Y scores by traveling vertically from an X until you intercept the regression line, and then traveling horizontally until you intercept the Y axis. For example, the arrows in the right-hand graph of Figure 10.12 show that at the X of 3, the predicted Y score is again 4. The symbol for this predicted Y score is Y which is pronounced “Y prime.” In fact, the Y at any X is the value of Y falling on the regression line. Therefore, the regression line consists of the data points formed by pairing every possible value of X with its corresponding value of Y . (In a less symmetrical scatterplot, the regression line would still, “on average,” pass through the center of the scatterplot so that, considering the entire linear relationship in the data, our best prediction would still be the value of Y falling on the regression line.) Performing linear regression is like reading off the value of Y on the regression line at a particular X as we did above, but we can be more precise than that. Instead, we compute the linear regression equation. The linear regression equation is the equation that produces the value of Y at each X and defines the straight line that summarizes a relationship. Thus, the 1 2 3 4 X scores 5 equation allows us to do two things: First, we use the equation to produce the value of Y at several Xs. When we plot the data points for these X-Y pairs and connect them with a line, we have plotted the regression line. Second, because the equation allows us to determine the Y at any X, we can use the equation to directly predict anyone’s Y score. The general form of the linear regression equation is Y bX a. The b stands for the slope of the line, a number indicating the degree and direction the line slants. The X stands for the score on the X variable. The a stands for the Y intercept, the value of Y when the regression line intercepts, or crosses, the Y axis. So, using our data, we compute a and b. Then the formula says that to find the value of Y for a given X score, multiply b times the score and add a. Essentially, the regression equation describes how, starting at a particular value of Y (the Y intercept), the Y scores tend to change at a particular rate (described by the slope) as the X scores increase. Then, the center of the Y scores paired with a particular X is Y . This Y is the predicted Y score for anyone scoring that X. (Appendix A.3 provides an expanded discussion of the components of the regression equation and shows their computation. SPSS also computes them.) Of course, not all relationships are equal, so we are also concerned about the accuracy of our predictions. We determine this accuracy by seeing how well we can predict the predicted Y score (Y ) In linear actual Y scores in our data. Recall regression, the predicted from previous sections that the Y score at a particular larger the correlation coefficient, X, based on the linear relationship summarized the more accurate our predicby the regression line tions will be. This is because in a linear regression stronger relationship, there is less equation The variability or “spread” among the equation that produces group of Y scores at each X, so the the value of Y at each X and defines the straight Y scores are closer to the regresline that summarizes a sion line. 
Therefore, the Y scores relationship at an X are closer to the central Y Chapter 10: Describing Relationships Using Correlation and Regression 179 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. we predict, so our predictions are closer to the actual scores. You can see this in the scatterplot back in Figure 10.5, where a high correlation produced a narrow scatterplot with data points close to the regression line. The data points represent actual Y scores, and the regression line represents Y scores, so the Y scores are close to their Y and we have greater accuracy. Conversely, the weaker relationship back in Figure 10.6 produced a scatterplot that is vertically spread out so that data points are relatively far from the regression line. Therefore, the actual Y scores are far from their Y , so we have less accuracy. Thus, the relative size of r indicates the relative accuracy of our predictions. However, we can directly measure the accuracy by using an advanced statistic called the standard error of the estimate. (See Appendix A.3 for its formula.) The standard error of the estimate is somewhat like the “average” difference between the actual Y scores that participants would obtain and the Y scores we predict for them. Therefore, the larger it is, the greater our “average” error when using regression procedures to predict Y scores. 10-6 THE PROPORTION OF VARIANCE ACCOUNTED FOR: r2 From the correlation coefficient we can compute one more piece of information about a relationship, called the proportion of variance accounted for. In the previous chapter we saw that with experiments this measured the “effect size.” With correlational studies, we don’t call it effect size, because we cannot confidently conclude that changing X caused Y to change, so we can’t say that X had an “effect.” The logic, however, is the same. In any relationship, the proportion of variance accounted for describes the proportion of all differences in Y scores that are associated with changes in the X variable. For example, consider the scores in this simple relationship: proportion of variance accounted for In a correlational design, the proportion of the differences in Y scores associated with changes in X 180 X 1 1 1 Y 1 2 3 2 2 2 5 6 7 When we change from an X of 1 to an X of 2, the Ys change from scores around 2 to scores around 6. These are differences in Y associated with changes in X. However, we also see differences in Y when X does not change: At an X of 1, for example, one participant had a 1 while someone else had a 2. These are differences not associated with changes in X. Thus, out of all the differences among these six Y scores, some differences are associated with changing X and some differences are not. The proportion of variance accounted for is the proportion of all differences in Y scores that are associated with changes in X scores. Luckily, most of the mathematical operations needed to measure the variability in Y that is associated with changing X are performed by first computing the Pearson r. 
Then the proportion of variance accounted for is THE FORMULA FOR THE PROPORTION OF VARIANCE ACCOUNTED FOR IS Proportion of variance accounted for r 2 Not too tough! Compute r and then square it. For example, previously our age and agility scores produced r .45, so r2 (.45)2 .20. Thus, out of all of the differences in the agility scores, .20 or 20% of them are associated with differences in our men’s ages. (And 80% of the differences in the agility scores are not associated with changes in age.) Thus, while r describes the consistency of the pairing of a particular X with a particular Y in a relationship, r2 is slightly different: It indicates the extent to which the differences in Y occur along with changes in X. The r2 can be as low as 0 (when r 0), indicating that no differences in Y scores are associated with X, to as high as 1 (when r { 1), indicating that 100% of the changes in Y occur when X changes, with no differences in Y occurring at the same X. However, we previously noted that in real research, correlation coefficients tend to be between .30 and .50. Therefore, squaring these values indicates that the proportion of variance accounted for is usually between .09 and .25. Computing the proportion of variance accounted for is important because it gives an indication of the accuracy of our predictions if we were to perform the linear regression procedure. If 20% of the differences in agility scores are related to a man’s age, then by Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. using the resulting regression equation, we can predict these 20% of the differences. In other her words, we are, on average, 20% closer to knowing ng a man’s specific agility score when we use this relationship elationship to predict scores, compared to if we did not use—or know about—the relationship. Therefore, e, this relationship improves our ability to predict agility scores by 20%. The proportion of variance accounted for is used to judge the usefulness and scientific importance of a relationship. Although a relationship must be significant in order to be potentially important (and for us ZUMA Press, Inc./Alamy r2 indicates the proportion of variance accounted for by y the relationship. This is the e proportion of all differences in Y that occur with changes in X. to even compute r 2), it can be significant and an still be unimportant. The r 2 indicates indicate the importance of a relationship beca because the larger it is, the closer the relationship gets us to our goal of relat understanding behavior. That is, by underst understanding the differences in Y underst scores that th are associated with changes in X, we are actually describing the differences in behaviors that are associated with changes in X. Further, we compare different Furth relationships by comparing each r2. relationsh Say that tha in addition to correlating age and agility, we also correlated a man’s w weight with his agility, finding a significant robt .60, so r 2 .36. 
signifi Thus, while a man’s age accounts for only 20% of the differences in agility scores, his weight accounts for 36% of these differences. Therefore, the relationship involving a man’s weight is the more useful and important relationship, because with it we are better able to predict and understand differences in physical agility. USING SPSS Review Card 10.4 contains instructions for using SPSS to compute the Pearson r, simultaneously performing either a one- or two-tailed significance test. SPSS will also compute the mean and estimated population standard deviation for the X and Y scores. A separate routine will compute the components of the linear regression equation. Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com STUDY PROBLEMS (Answers for odd-numbered problems are in Appendix C.) 1. What is the difference between an experiment and a correlational study in terms of how we (a) collect the data? (b) examine the relationship? 2. In a correlational study, how do you decide which variable is X and which is Y? 3. (a) What is a scatterplot? (b) What is a regression line? 4. Define: (a) positive linear relationship; (b) negative linear relationship; (c) nonlinear relationship. Chapter 10: Describing Relationships Using Correlation and Regression 181 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 5. (a) When do you compute a Pearson correlation coefficient? (b) What two characteristics of a linear relationship are described by the coefficient? 6. As the value of r approaches { 1, what does it indicate about the following? (a) The consistency that Ys are paired with X; (b) the variability of the Y scores at each X; (c) the closeness of Y scores to the regression line at each X; (d) the accuracy with which we can predict Y if X is known. 7. What are the two statistical explanations for why we obtain a particular r in a study? 8. (a) What does r stand for? (b) How do we determine the value of r? 9. Summarize the steps involved in hypothesis testing of the Pearson correlation coefficient, including additional analyses you may perform. 10. (a) What is linear regression used for? (b) How do researchers set up and use the linear regression procedure to predict unknown Y scores? (c) What is the symbol for a predicted Y score and what is it called? 11. (a) What is the proportion of variance accounted for? (b) How is it measured in a correlational study? (c) How do we use the size of this statistic to judge a relationship? 12. For each of the following, indicate whether it is a positive linear, negative linear, or nonlinear relationship: (a) Quality of performance (Y) increases with increased mental arousal (X) up to an optimal level; then quality of performance decreases with increased arousal. (b) More overweight people (X) are less healthy (Y). (c) As number of minutes of exercise increases each week (X), dieting individuals lose more pounds (Y). 
(d) The number of bears in an area (Y) decreases as the area becomes increasingly populated by humans (X). 13. Ritchie sees the data in question 12(d) and concludes, “We should stop people from moving into bear country so that we can preserve the bear population.” Why is he correct or incorrect? 14. John finds r .60 between the variables of number of hours studied (X ) and number of errors on a statistics test (Y ). He also finds r .30 between the variables of size of the classroom (X ) and number of errors on the test (Y ). (a) Describe the relative shapes of the two scatterplots. (b) Describe the relative amount of variability in Y scores at each X in each study. (c) Describe the relative closeness of Y scores to the regression line in each study. (d) Which scatterplot will lead to more accurate predictions and why? 15. In question 14, Kim claims that the relationship involving hours studied (r .60) is twice as strong as the relationship with classroom size (r .30), so it is twice as accurate for predicting test scores. (a) Is she correct? (b) Which variable is better for predicting test errors and how do you know? 182 16. Andres asked if there is a relationship between the quality of sneakers worn by a sample of 20 volleyball players and their average number of points scored per game. He computed r .21 and immediately claimed he had evidence that better-quality sneakers are related to better performance. (a) Is his claim correct? Why? (b) What are H0 and Ha? (c) With a .05, what is rcrit? (d) What should he conclude, and how should he report the results? (e) What other computations should he perform to describe this relationship? 17. Tasha asked whether the number of errors made on a math test (X ) is related to the person’s level of satisfaction with his/her performance (Y ). She obtained these scores. Participant Errors (X) Satisfaction (Y) 1 2 3 4 5 6 7 9 8 4 6 7 10 5 3 2 8 5 4 2 7 (a) Summarize this relationship. (Hint: Compute something!) (b) What does this tell you about the sample relationship? (c) What are H0 and Ha? (d) With a .05, what is rcrit? (e) What do you conclude about this relationship in nature? (f) Report the results using the correct format. (g) What proportion of the differences in participants’ satisfaction is linked to their error scores? 18. A researcher believes nurses are absent from work more frequently when they score higher on a test of “psychological burnout.” These data are collected: Participant Burnout (X) Absences (Y ) 1 2 3 4 5 6 7 8 9 2 1 2 3 4 4 7 7 8 4 7 6 9 6 8 7 10 11 (a) Compute the correlation coefficient. (b) What are H0 and Ha? (c) With a .05, what is rcrit? (d) What do you conclude about the strength of this relationship in nature? (e) Report the results using the correct format. (f) How scientifically useful is this relationship? Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 19. We predict that the more people are initially attracted to a person of the opposite sex, the more anxious they become before their first date. We obtain these scores. 
Participant Attraction (X) Anxiety (Y) 1 2 3 4 5 6 7 8 9 10 2 6 1 3 6 9 6 6 4 2 21. Alonzo suspects that as a person’s stress level changes, so does the amount of his or her impulse buying. He collects data from 72 people and obtains an robt .38. (a) Using a .05, is this r significant? (b) Report his results using the correct format. (c) What other statistics would be appropriate to compute? 8 14 5 8 10 15 8 8 7 6 (a) Compute the correlation coefficient. (b) Is this a relatively strong or relatively weak sample relationship, and what does that tell you about the Y scores at each X? (c) What are H0 and Ha? (d) What is rcrit? (e) Report your results using the correct format. What do you conclude about this relationship in the population? (f) How much does knowing participants’ attraction scores improve your ability to predict their anxiety scores? 20. Ramona measures how positive a person’s mood is and how creative he or she is, obtaining the following interval scores: Participant Mood (X) Creativity (Y ) 1 2 3 4 5 6 7 8 9 10 10 8 9 6 5 3 7 2 4 1 7 6 11 4 5 7 4 5 6 4 (a) Compute the correlation coefficient. (b) What are H0 and Ha? (c) With a .05, what is rcrit? (d) What should she conclude about this relationship in nature? (e) Report the results using the correct format. (f) How scientifically useful is this relationship? 22. Bertha computes the correlation between participants’ physical strength and college grade average using SPSS. She gets r .09, p < .001. She concludes that this relationship is very significant and is a useful tool for predicting which college applicants are more likely to succeed academically. Do you agree or disagree? Why? 23. We wish to determine what happens to creativity scores as participants’ intelligence scores increase. Which variable is our X and which is Y? 24. (a) What do we mean by “restricted range”? (b) How do researchers create a restricted range? (c) Why should we avoid a restricted range? 25. Relationship A has r .20; Relationship B has r .40. (a) The relationship with the scatterplot that more closely hugs the regression line is _____. (b) The relationship having Y scores closer to the Y scores that we’ll predict for them is ______. (c) Relationship A improves our understanding of differences in behavior Y by ______%. (d) Relationship B improves our understanding of differences in behavior Y by ______%. (e) Relationship B is ______ times as useful in predicting scores as Relationship A. Chapter 10: Describing Relationships Using Correlation and Regression 183 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Chapter 11 HYPOTHESIS TESTING USING THE ONE-WAY ANALYSIS OF VARIANCE LOOKING BACK GOING F O R WA R D Be sure you understand: Your goals in this chapter are to learn: • From Chapter 1, what an independent variable, a condition, and a dependent variable are. • The terminology of analysis of variance. • From Chapter 4, that variance measures the differences among scores. • From Chapter 7, why we limit the probability of a Type I error to .05. • When and how to compute Fobt. 
• Why Fobt should equal 1 if H0 is true, and why it is greater than 1 if H0 is false. • When and how to compute Tukey’s HSD test. • How eta squared describes effect size. • From Chapter 9, what independent samples and related samples are and what effect size is. I n this chapter we return to analyzing experiments. We have Sections 11-1 An Overview of the Analysis of Variance 11-2 11-3 11-4 Components of the ANOVA 11-5 Statistics in the Research Literature: Reporting ANOVA 11-6 11-7 Effect Size and Eta2 184 Performing the ANOVA only one more common type of parametric procedure to learn, and it is called the analysis of variance. This chapter will show you (1) the general logic behind the analysis of variance, (2) how to perform this procedure, and (3) an additional part to this analysis called post hoc tests. Performing the Tukey HSD Test A Word about the Within-Subjects ANOVA Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. © Influx Productions/Photodisc/Jupiterimages 11-1 AN OVERVIEW OF THE ANALYSIS OF VARIANCE It is important to know about the analysis of variance because it is the most common inferential statistical procedure used to analyze experiments. Why? Because there are actually many versions of this procedure, so it can be used with many different designs: It can be applied to independent or related samples, to an independent variable involving any number of conditions, and to a study involving any number of independent variables. Such complex designs are common because, first, an adequate test of the experimental hypotheses may require such a design. Second, after all of the time and effort involved in setting up a study, often little more is needed to test additional conditions or variables. Then we learn even more about a behavior (which is the purpose of research). Therefore, you’ll often encounter the analysis of variance when conducting your own research or when reading about the research of others. The analysis of variance has its own language, which is also commonly used in research: 1. Analysis of variance is abbreviated as ANOVA. 2. An independent variable is also called a factor. 3. Each condition of the independent variable is also called a level or a treatment, and differences produced by the independent variable are a treatment effect. 4. The symbol for the number of levels in a factor is k. 5. We have slightly different formulas for an ANOVA depending on our design. A one-way ANOVA is performed when one independent variable is tested. (A “two-way” ANOVA is used with two independent variables, and so on.) 6. When an independent variable is studied using independent samples, it is called a betweensubjects factor and involves using the formulas for a between-subjects ANOVA. 7. When a factor is studied using related samples, it is called a within-subjects factor and requires the formulas for a within-subjects ANOVA. In this chapter we discuss the one-way betweensubjects ANOVA. (The formulas for a one-way withinsubjects ANOVA are presented in Appendix A.5.) 
As an example, let’s examine how people perform a task, depending on how difficult they believe the task will be—the “perceived difficulty” of the task. We’ll create three conditions containing the unpowerful n of 5 participants each and provide them with the same easy 10 math problems. However, we will tell participants in Level 1 that the problems are easy, in Level 2 that the problems are of medium difficulty, and in Level 3 that the problems are difficult. Our dependent variable is the number of problems that participants correctly solve within an allotted time. Chapter 11: Hypothesis Testing Using the One-Way Analysis of Variance 185 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 11.1 Anova Key Terms FACTOR An independent variable LEVELS (TREATMENTS) The conditions of the independent variable TREATMENT EFFECT The result of changing the conditions of an independent variable so that different populations of scores having different ms are produced ONE-WAY ANOVA The ANOVA performed when an experiment has only one independent variable The way to diagram such a one-way ANOVA is shown in Table 11.2. Each column is a level of the factor, containing the scores of participants tested under that condition (here, symbolized by X). The mean of each level is the mean of the scores from that column. Because we have three levels, k 3. (Notice that the general format is to label the factor as factor A, with levels A1, A2, A3, and so on.) Our purpose here is the same as in the two-sample t-test, except now we have three conditions. We hope to see a relationship where, as we change the level of perceived difficulty, the mean number of correct solutions will also change. We would like to conclude that this relationship is also found in nature, where each sample and X represent a different population located at a different m. But there’s the usual problem: Maybe changing perceived difficulty does nothing, and if we tested all participants in the population under all three conditions, we would repeatedly find the same population of scores having the same m. By chance, however, our three samples poorly represent this one population, and because some of the samples happen to contain lower or higher scores than the other samples, we have the appearance of a relationship. Therefore, we must test whether the differences between our sample means reflect sampling error. The analysis of variance (ANOVA) is the parametric procedure for determining whether significant differences analysis of variance occur in an experiment containing (ANOVA) The two or more sample means. 
Notice parametric procedure that when you have only two confor hypothesis testing in an experiment ditions, you can use either a twocontaining two or more sample t-test or the ANOVA: You’ll conditions reach the same conclusions, and 186 BETWEEN-SUBJECTS FACTOR An independent variable that is studied using independent samples in all conditions BETWEEN-SUBJECTS ANOVA The type of ANOVA that is performed when a study involves between-subjects factors WITHIN-SUBJECTS FACTOR An independent variable that is studied using related samples in all conditions WITHIN-SUBJECTS ANOVA The type of ANOVA performed when a study involves within-subjects factors both have the same probability of making Type I and Type II errors. However, you must use ANOVA when you have more than two conditions. Table 11.2 Diagram of a Study Having Three Levels of One Factor Each column represents a condition of the independent variable. Factor A: Independent Variable of Perceived Difficulty Level A1: Level A2: Level A3: Easy Medium Difficult X X X X X X X X X X X X X X X X1 X2 Conditions k3 X3 The one-way between-subjects ANOVA requires that 1. all conditions contain independent samples. 2. the dependent scores are normally distributed interval or ratio scores. 3. the variances of the populations are homogeneous. Note: The ns in all conditions need not be equal, although they should not be massively different. However, these procedures are much easier to perform with equal ns. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. ©Zeamonkey Images/Shutterstock.com ©Peter Clark/Shutterstock.com ©Fedor Selivanov/Shutterstock.com LEVEL 1 LEVEL 2 LEVEL 3 11-1a Controlling the Experiment-Wise Error Rate You might be wondering why we even need ANOVA. Couldn’t we use the independent-samples t-test to test for significant differences among the three means of our perceived-difficulty study? Couldn’t we test whether X1 differs from X2, then test whether X2 differs from X3, and then test whether X1 differs from X3? The answer is that we cannot use this approach because of the resulting probability of making a Type I error (rejecting a true H0). To understand this, we must distinguish between the probability of making a Type I error when comparing a pair of means and the probability of making a Type I error somewhere in the experiment. In previous chapters we’ve seen that alpha (a) is the probability of making a Type I error when we compare a pair of means. The probability of making a Type I error somewhere among the comparisons in an experiment is called the experiment-wise error rate. Until now, we have made only one comparison in an experiment, so with a .05, the experiment-wise error rate was also .05. But if we performed multiple t-tests in the present study, we could make a Type I error when comparing X1 to X2, or when comparing X2 to X3, or when comparing X1 to X3. Now we have three opportunities to make a Type I error, so the overall probability of making at least one Type I error somewhere in the experiment—the experiment-wise error rate—is much greater than .05. 
Remember, a Type I error is the dangerous error of concluding the independent variable has an effect when really it does not. The probability of making a Type I error should never be greater than .05. Therefore, we do not perform multiple t-tests because the resulting experiment-wise error rate would be greater than our alpha. Instead, we perform The reason for performing ANOVA is that it keeps the experiment-wise error rate equal to a. ANOVA because it limits the experiment-wise error rate, so that when we are finished with all of our decisions, the probability that we’ve made any Type I errors will equal our a. 11-1b Statistical Hypotheses in ANOVA ANOVA tests only two-tailed hypotheses. The null hypothesis is that there are no differences between the populations represented by the conditions. Thus, for our perceived-difficulty study with the three levels of easy, medium, and difficult, we have H0: m1 m2 m3 In general, when we perform ANOVA on a factor with k levels, the null hypothesis is H0: m1 m2 c mk. The “ . . . mk” indicates there are experimentas many ms as there are levels. wise error However, the alternative rate The hypothesis is not that all ms are probability of making a Type I error when different, or Ha: m1 ⬆ m2 ⬆ m3. A comparing all means study may demonstrate differin an experiment ences between some but not all Chapter 11: Hypothesis Testing Using the One-Way Analysis of Variance 187 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. conditions. Perhaps our data represent a difference between m1 and m2, but not between m1 and m3, or perhaps only m2 and m3 differ. To communicate this idea, the alternative hypothesis is If Fobt is significant, then perform post hoc comparisons to determine which specific means differ significantly. Ha: not all ms are equal Ha implies that a relationship is present because the population mean represented by one of the sample means is different from the population mean represented by at least one other sample mean. As usual, we test H0, so ANOVA tests whether all sample means represent the same population mean. 11-1c The F Statistic and Post Hoc Comparisons Completing an ANOVA requires two major steps. First, we compute the statistic called F to determine whether any of the means represent different ms. We calculate Fobt, which we compare to Fcrit. When Fobt is not significant, it indicates there are no significant differences between any means. Then, the experiment has failed to demonstrate a relationship and it’s back to the drawing board. When Fobt is significant, it indicates only that somewhere among the means at least two or more of them differ significantly. But, Fobt does not indicate which specific means differ significantly. So, if Fobt for the perceived-difficulty study is significant, we will know we have one or more significant differences somewhere among the means of the easy, medium, and difficult levels, but we won’t know where they are. Therefore, when Fobt is significant we perform a second statistical procedure, called post hoc comparisons. 
Post hoc comparisons are like t-tests in which we compare all pairs of means from a factor, one pair at a time, to determine which means differ significantly from each other. For the difficulty study we’ll compare the means from easy and medium, from easy and difficult, and from medium and difficult. However, we perform post hoc comparisons only when Fobt is significant. A significant Fobt followed by post hoc comparisons ensures that the experiment-wise probability of a Type I error will equal our alpha. post hoc The one exception to performing comparisons Procedures used to post hoc comparisons is when you compare all pairs of have only two levels in the factor. Then means in a significant the significant difference indicated by factor to determine which means differ Fobt must be between the only two significantly from each means in the study, so it is unnecessary other to perform post hoc comparisons. 188 > Quick Practice > > The one-way ANOVA is performed when testing two or more conditions from one independent variable. A significant Fobt followed by post hoc comparisons indicates which level means differ significantly, with the experiment-wise error rate equal to a. More Examples We measure the mood of participants after they have won $0, $10, or $20 in a rigged card game. With one independent variable, a one-way design is involved, and the factor is the amount of money won. The levels are $0, $10, or $20. If independent samples receive each treatment, we perform the between-subjects ANOVA. (Otherwise, perform the within-subjects ANOVA.) A significant Fobt will indicate that at least two of the conditions produced significant differences in mean mood scores. Perform post hoc comparisons to determine which levels differ significantly, comparing the mean mood scores for $0 versus $10, $0 versus $20, and $10 versus $20. The probability of a Type I error in the study—the experiment-wise error rate—equals a. For Practice 1. A study involving one independent variable is a(n) _____ design. 2. Perform the ____ ANOVA when a study involves independent samples; perform the _____ ANOVA when it involves related samples. 3. An independent variable is also called a(n) _____, and a condition is also called a _____ or _____. (continued) Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. In the ANOVA we compute this variance from two perspectives, called the mean square within groups and the mean square between groups. 4. The _____ will indicate whether any of the conditions differ, and then the _____ will indicate which specific conditions differ. 5. The probability of a Type I error in the study is called the ______. > Answers 5. experiment-wise error rate 4. Fobt; post hoc comparisons 3. factor; level; treatment 2. between-subjects; within-subjects 1. one-way 11-2 COMPONENTS OF THE ANOVA The logic and components of all versions of the ANOVA are very similar. In each case, the analysis of variance does exactly that—it “analyzes variance.” But we do not call it variance. Instead, ANOVA has more of its own terminology. 
We begin with the defining formula for the estimated population variance on the left: s2x (X X )2 sum of squares N1 degress of freedom SS mean square MS df In the numerator of the variance, we find the sum of the squared deviations between the mean and each score. In ANOVA, the “sum of the squared deviations” is shortened to sum of squares, which is abbreviated as SS. In the denominator we divide by N 1, which is our degrees of freedom or df. Recall that dividing the sum of the squared deviations by the df produces the “average” or the “mean” of the squared deviations. In ANOVA, this is shortened to mean square, which is symbolized as MS. So, when we compute MS we are computing an estimate of the population variance. 11-2a The Mean Square within Groups The mean square within groups describes the variability in scores within the conditions of an experiment. It is symbolized by MSwn. Recall that variance is a way to measure the differences among the scores. Here, we find the differences among the scores within each condition and “pool” them (like we did in the independent-samples t-test). Thus, the MSwn is the “average” variability of the scores within each condition. Because we look at scores within one condition at a time, MSwn stays the same regardless of whether H0 is true or false. Either way, the MSwn is an estimate of the variability of individual scores in any of the populations being represented. 11-2b The Mean Square between Groups The other variance we compute is the mean square between groups. The mean square between groups describes the differences between the means of the conditions in a factor. It is symbolized by MSbn. We measure the differences between the means by treating them as scores, finding the “average” amount they deviate from their mean, which sum of is the overall mean of the experiment. squares (SS) The sum of the In the same way that the deviations squared deviations of of raw scores around their mean a set of scores around describe how different the scores are the mean of those from each other, the deviations of scores the sample means around their overmean square all mean describe how different the (MS) In ANOVA, an estimated population sample means are from each other. variance As we’ll see, performing the mean square ANOVA involves first using our within groups data to compute the MSwn and MSbn. (MSwn) Describes The final step is to then compare the variability of scores within the them by computing Fobt. conditions IF H0 IS TRUE THEN F OBT SHOULD EQUAL 1 mean square between groups (MSbn) Describes the variability among the means in a factor Chapter 11: Hypothesis Testing Using the One-Way Analysis of Variance 189 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 11-2c Comparing the population. If H0 is true in our study and we are dealing with one population, then Mean Squares: The Logic in our data, the variability of the sample of the F-Ratio means should equal the variability of the individual scores. 
The MSbn estimates The key to understanding ANOVA is first the variability of the sample means, and understanding what MSbn reflects when the the MSwn estimates the variability null hypothesis true. In this case, of the individual scores. Thereeven though all conditions represent fore: If we are dealing with only the same population and m, we will one population, our MSbn should not perfectly represent this because equal our MSwn. So, if H0 is true ©iStockphoto.com/Marek Uliasz of sampling error, so the means of for our study, the answer we comour conditions will not necessarily equal each other. pute for MSbn should be the same answer we compute Thus, MSbn is our way of measuring how much the for MSwn. means of our levels differ from each other because of An easy way to determine if two numbers are sampling error. Essentially, here the MSbn is an estimate equal is to make a fraction out of them, which is what of the “average” difference between sample means that we do when computing Fobt. sampling error will produce when we are representing one underlying raw score population. THE FORMULA FOR Fobt IS The test of H0 is based on the fact that statisticians have shown that when samples of scores are selected MSbn Fobt from one population, the size of the differences among MSwn the sample means will equal the size of the differences among individual scores. This makes sense because how This fraction is referred to as the F-ratio. The F-ratio much the sample means differ depends on how much equals the MSbn divided by the MSwn. (The MSbn is the individual scores differ. Say that the variability in always on top!) the population is small so that all scores are very close If we place the same number in the numerator as to each other. When we select samples we will have litin the denominator, the ratio will equal 1. Thus, when tle variety in scores to choose from, so each sample will H0 is true and we are representing one population, the contain close to the same scores as the next and their MSbn should equal the MSwn, so Fobt should equal 1. means also will be close to each other. However, if the Of course Fobt may not equal exactly 1.00 when H0 variability is very large, we have many different scores is true, because we may have sampling error in either available. When we select samples of these scores, we MSbn or MSwn. That is, either the differences among will often encounter a very different batch each time, so our individual scores and/or the differences among our the means also will be very different each time. means may inaccurately represent the corresponding differences in the population. Therefore, realistically, we expect that when H0 is true for our study, Fobt will be “around” 1. In fact, statisticians have shown that when Fobt is a fraction less than 1, mathematically this can occur only because H0 is true and we have sampling In one population, the variability error in representing this. (Each MS is a variance that of sample means equals the squares differences, so Fobt cannot be a negative number.) variability of individual scores. It gets interesting, however, as Fobt becomes larger than 1. No matter what our data show, the H0 implies that Fobt is “trying” to equal 1, and if it does not, it’s Here, then, is the logic of the because of sampling error. Let’s think about that. If ANOVA. 
We know that when we Fobt 2, it is twice what H0 says it should be, although F-ratio The ratio are dealing with one population, the according to H0, we simply had a little bad luck in of the mean square between groups to the variability of sample means from representing the one population that is present. Or mean square within that population equals the variabilsay that Fobt 4, which means the MSbn is four times groups ity of individual scores from that the size of MSwn, and Fobt is four times what it should 190 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. be if H0 is correct. Yet H0 says that MSbn would have equaled MSwn, but we happened to get a few unrepresentative scores. If Fobt is, say, 10, then it and the MSbn are ten times what H0 says they should be! Still, H0 says, “No big deal—a little sampling error.” As this illustrates, the larger the Fobt, the more difficult it is to believe that our data are poorly representing the situation where H0 is true. Of course, if sampling error won’t explain so large an Fobt, then we need something else that will. The answer is our independent variable. When Ha is true so that changing our conditions would involve more than one population of scores, MSbn will be larger than MSwn, and Fobt will be larger than 1. Further, the more that changing the levels of our factor changes scores, the larger will be the differences between our level means and so the larger will be MSbn. However, the MSwn stays the same. Thus, greater differences produced by our factor will produce a larger Fobt. Turning this around, the larger the Fobt, the more it appears that Ha is true. Putting this all together: population. Therefore, retain the H0 that all conditions represent the same population. Say instead that MSbn 24 and MSwn 6, so Fobt 4. Because MSbn is so much larger than MSwn, at least two conditions might represent different populations. If Fobt is beyond Fcrit, these results are unlikely to be due to sampling error, so accept the Ha that at least two conditions represent different populations. For Practice 1. MSwn is the symbol for the _____, and MSbn is the symbol for the _____. 2. Differences between the individual scores in the population are estimated by _____. 3. Differences between sample means in the population are estimated by _____. 4. The larger the Fobt, the _____ likely that H0 is true. > Answers 3. MSbn 4. less 1. mean square within groups; mean square between groups If our Fobt is large enough to be beyond Fcrit, we will conclude that our Fobt is so unlikely to occur if H0 were true that we will reject H0 and accept Ha. (This is the logic of all ANOVAs, whether for a between- or a within-subjects design.) The MSbn measures the differences among level means. THE ANOVA MS The MSwn measures the differences among individual scores Fobt 11-3 PERFORMING Now we can discuss the computations involved in performing the ANOVA. In the beginning of this chapter you saw that the formula for a mean square involves dividing the sum of squares by the degrees of freedom. In symbols, this is > Quick Practice > > > 2. 
MSwn THE LARGER THE FOBT, THE LESS LIKELY THAT H0 IS TRUE AND THE MORE LIKELY THAT HA IS TRUE. MSbn MSwn More Examples In a study, MSbn 6 and MSwn 6, so Fobt 1. The MSbn equals the MSwn when all samples belong to the same SS df Adding subscripts, we compute the mean square between groups (MSbn) by computing the sum of squares between groups (SSbn) and dividing by the degrees of freedom between groups (dfbn). Likewise, we compute the mean square within groups (MSwn) by computing the sum of squares within groups (SSwn) and dividing by the degrees of freedom within groups (dfwn). With MSbn and MSwn, we compute Fobt. Chapter 11: Hypothesis Testing Using the One-Way Analysis of Variance 191 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. If all this strikes you as the most confusing thing ever devised, you’ll find an ANOVA summary table very helpful. Table 11.3 below shows the general format: (2) the degrees of freedom, (3) the mean squares, and (4) Fobt. COMPUTING THE SUMS OF SQUARES The computa- Table 11.3 tions here require four steps. Summary Table of One-Way ANOVA STEP 1: Compute the sums and means. As in Table 11.4, compute X, X 2, and X for each level (each column). Each n is the number of scores in the level. Then add together the X from all levels to get the total, which is “Xtot.” Also, add together the X 2 from all levels to get the total, which is “X 2tot.” Add the ns together to obtain N. Source Between Within Total Sum of Squares SSbn SSwn SStot Mean Square MSbn MSwn df dfbn dfwn dftot F Fobt STEP 2: Compute the total sum of squares (SStot ). The “Source” column identifies each source of variability, either between or within, and we also consider the total. Using the following formulas, we’ll compute the components for the other columns. 11-3a Computing Fobt Say that we performed the perceived-difficulty study discussed earlier, telling participants that some math problems were easy, of medium difficulty, or difficult, and measuring the number of problems they solved. The data are presented in Table 11.4. As shown in the following sections, there are four parts to computing the Fobt: finding (1) the sum of squares, Data from Perceived-Difficulty Experiment Factor A: Perceived Difficulty Level A1: Level A2: Level A3: Easy Medium Difficult 9 4 1 12 6 3 4 8 4 8 2 5 7 10 2 X 30 X 354 X 220 n1 5 n2 5 X1 8 192 2 X2 6 (Xtot) 2 N 2 Using the data from Table 11.4, Xtot 629, Xtot 85, and N 15, so (85)2 SStot 629 15 7225 SStot 629 15 SStot 629 481.67 (Xtot) 2 (X in column) 2 SSbn a b n in column N X 15 Xtot 85 X 2 55 Xtot2 629 n3 5 N 15 k3 X3 3 STEP 3: Compute the sum of squares between groups (SSbn ). THE FORMULA FOR THE SUM OF SQUARES BETWEEN GROUPS IS Totals 2 SStot Xtot2 Thus, SStot 147.33 Table 11.4 X 40 THE FORMULA FOR THE TOTAL SUM OF SQUARES IS In Table 11.4, each column represents a level of the factor. Find the X for a column, square that X, and then divide by the n in that level. 
After doing this for all levels, add the results together and subtract the quantity (Xtot)2 >N: SSbn a (40)2 (30)2 (15)2 (85)2 b 5 5 5 5 so SSbn (320 180 45) 481.67 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. COMPUTING CO and MEAN SQUARES You can work S SSbn 545 481.67 63.33 directly from the sumd mary table to compute ma the mean squares. th © iStockphoto.com/Malerapaso STEP 4: Compute the sum of squares uares within groups (SSwn). Mathathematically, SStot equals SS Sbn plus SSwn. So, the total minuss the between leaves the within. THE STEP 1: Compute the mean ST square between groups. THE FORMULA FOR THE MEAN SQUARE BETWEEN GROUPS IS THE FORMULA FOR THE SUM OF SQUARES Q WITHIN GROUPS IS SSwn SStot SSbn MSbn SSbn dfbn From the summary table In the example, SStot is 147.33 and SSbn is 63.33 so SSwn 147.33 63.33 SSwn 84.00 STEP 2: Compute the mean square within groups. COMPUTING THE DEGREES OF FREEDOM Compute the dfbn, dfwn, and dftot. STEP 1: The degrees of freedom between groups equal k 1, where k is the number of levels in the factor. In the example are three levels of perceived difficulty, so k 3. Thus, dfbn 2. STEP 2: The degrees of freedom within groups equal N k, where N is the total N in the experiment and k is the number of levels in the factor. In the example N is 15 and k is 3, so dfwn 15 3 12. STEP 3: The degrees of freedom total equals N 1, where N is the total N in the experiment. In the example N is 15, so dftot 15 1 14 The dftot must equal the dfbn plus the dfwn. At this point the ANOVA summary table looks like this: Source Between Within Total Sum of Squares 63.33 84.00 147.33 63.33 31.67 2 MSbn df 2 12 14 Mean Square MSbn MSwn F Fobt THE FORMULA FOR THE MEAN SQUARE WITHIN GROUPS IS MSwn SSwn dfwn For the example MSwn 84 7.00 12 Do not compute the mean square for SStot because it has no use. COMPUTING THE F Finally, compute Fobt. THE FORMULA FOR Fobt IS Fobt MSbn MSwn In the example MSbn is 31.67 and MSwn is 7.00, so Fobt MSbn MSwn 31.67 4.52 7.00 Chapter 11: Hypothesis Testing Using the One-Way Analysis of Variance 193 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. F-distribution The sampling distribution of all values of F that occur when the null hypothesis is true and all conditions represent one population m Now the completed ANOVA summary table is Source Between (difficulty) Within Total Sum of Squares 63.33 df 2 84.00 147.33 12 14 Mean F Square 31.67 4.52 7.00 2. Compute the degrees of freedom: dfbn k 1 2 1 1 dfwn N k 8 2 6 dftot N 1 8 1 7 3. 
Compute the mean squares: MSbn SSbn >dfbn 24.5>1 24.5 MSwn SSwn >dfwn 7.5>6 1.25 The Fobt is placed in the row labeled “Between.” (Because this row reflects differences due to our treatment, we may also include the name of the independent variable here.) 4. Compute Fobt. > Quick Practice For Practice > To compute Fobt, compute SStot, SSbn, and SSwn and dftot, dfbn, and dfwn. Dividing SSbn by dfbn gives MSbn; dividing SSwn by dfwn gives MSwn. Dividing MSbn by MSwn gives Fobt. More Examples We test participants under conditions A1 and A2. 6 8 9 8 X1 4.25 X 17 X 2 75 n1 4 X 7.75 X 31 X 2 245 n2 4 Xtot 48 X tot2 320 N8 1. Compute the sums of squares: SStot X2tot SSbn a a (Xtot)2 N 320 a 482 b 32 8 (Xtot)2 (X in column)2 b n in column N 172 312 482 b a b 24.5 4 4 8 SSwn SStot SSbn 32 24.5 7.5 194 3. Finally, Fobt equals ______ divided by ______. > Answers 1. The SS and the df 4 5 3 5 2. For between groups, to compute ______ we divide ______ by ______. For within groups, to compute ______ we divide ______ by ______. 2. MSbn, SSbn, dfbn; MSwn, SSwn, dfwn A2 1. What two components are needed to compute any mean square? 3. MSbn, MSwn A1 Fobt MSbn >MSwn 24.5>1.25 19.60 11-3b Interpreting Fobt The final step is to compare Fobt to Fcrit, and for that we examine the F-distribution. The F-distribution is the sampling distribution showing the various values of F that occur when H0 is true and all conditions represent one population. To create it, it is as if, using our ns and k, we select the scores for all of our conditions from one raw score population (like H0 says we did in our experiment), and compute MSwn, MSbn, and then Fobt. We do this an infinite number of times, and plotting the various Fs we obtain produces the sampling distribution, as shown in Figure 11.1. The F-distribution is skewed because there is no limit to how large Fobt can be, but it cannot be less than zero. The mean of the distribution is 1 because, Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. determine the shape of an F-distribution: the df used when computSampling Distribution of F When H0 Is True for dfbn 2 and dfwn 12 ing the mean square between groups (dfbn) and the df used when computμ ing the mean square within groups (dfwn). Therefore, to obtain Fcrit, turn to Table 4 in Appendix B, titled f “Critical Values of F.” A portion of this “F-table” is presented in Table 11.5 below. Across the top of the α = .05 table, the columns are labeled “df between groups.” On the left-hand F F 1.0 F F F F F F F F F F F 0 side, the rows are labeled “df within Fcrit = 3.88 groups.” Locate the appropriate Fobt = 4.52 column and row using the dfs from your study. The critical values in dark type are for a .05, and those in light type are for a .01. For our example, dfbn 2 and dfwn 12. For a .05, the Fcrit most often when H0 is true, MSbn will equal MSwn, is 3.88. (If your df is not listed in the F-table, use the and so F will equal 1. The right-hand tail, however, bracketing dfs and their critical values as we’ve done shows that sometimes, by chance, F is greater than 1. previously.) 
Because our Fobt can reflect a relationship in the popuAs shown in Figure 11.1, in our perceived-difficulty lation only when it is greater than 1, the entire region study, Fobt is 4.52 and Fcrit is 3.88. Our H0 says that Fobt of rejection is in this upper tail of the F-distribution. is greater than 1 because of sampling error and that (That’s right, ANOVA involves two-tailed hypotheses, actually, we are poorly representing no relationship in but they are tested using only the upper tail of the the population. However, our Fobt is beyond Fcrit and in sampling distribution.) the region of rejection, so we reject H0: Our Fobt is so The F-distribution is actually a family of curves, unlikely to occur if our samples were representing no each having a slightly different shape, depending on difference in the population that we reject that this is our degrees of freedom. However, two values of df what they represent. Therefore, we conclude that the Fobt is significant and that Table 11.5 the factor of perceived diffiPortion of Table 4 in Appendix B, “Critical Values of F” culty produces a significant difference in mean perforDegrees of Freedom mance scores. Within Groups Degrees of Freedom Between Groups Of course, had Fobt (degrees of freedom (degrees of freedom in numerator of F-ratio) been less than Fcrit, then the in denominator corresponding differences a of F-ratio) 1 2 3 4 5 between our means would 1 .05 161 200 216 225 230 not be too unlikely to occur .01 4,052 4,999 5,403 5,625 5,764 when H0 is true, so we — — — — — — — would not reject it. Then, as usual, we’d draw no conclu— — — — — — — sion about the influence of 11 .05 4.84 3.98 3.59 3.36 3.20 our independent variable, .01 9.65 7.20 6.22 5.67 5.32 one way or the other 12 .05 4.75 3.88 3.49 3.26 3.11 (and we would consider if we had sufficient power to .01 9.33 6.93 5.95 5.41 5.06 prevent a Type II error). Figure 11.1 Chapter 11: Hypothesis Testing Using the One-Way Analysis of Variance 195 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Studio DL/Corbis STEP 1: Find qk. Use Table 5 in Appendix B, entitled “Values of Studentized Range Statistic.” Locate the column labeled with the k that corresponds to the number of means in your factor. Find the row labeled with the dfwn used to compute your Fobt. (If your df is not in the table, use the df in the table that is closest to it.) Then find the qk for the appropriate a. For our perceived-difficulty study, k 3, dfwn 12, and a .05, so qk 3.77. STEP 2: Compute the HSD. Because we rejected H0 and accepted Ha, we return to the means from the levels: THE FORMULA FOR TUKEY’S HSD IS HSD (qk)a Perceived Difficulty Easy Medium Difficult X1 8 X2 6 X3 3 To see the treatment effect, look at the overall pattern: Because the means change, the scores that produce them are changing, so a relationship is present; as perceived difficulty increases, performance scores decrease. However, we do not know if every increase in difficulty results in a significant decrease in scores. To determine that, we must perform the post hoc comparisons. 
11-4 PERFORMING THE TUKEY HSD TEST

Remember that a significant Fobt indicates at least one significant difference somewhere among the level means. To determine which means differ, we perform post hoc comparisons. Statisticians have developed a variety of post hoc procedures that differ in how likely they are to produce Type I or Type II errors. One common procedure that has reasonably low error rates is Tukey's HSD test. It is used only when the ns in all levels of the factor are equal. The HSD is a rearrangement of the t-test that computes the minimum difference between two means that is required for them to differ significantly. (HSD stands for Honestly Significant Difference.)

Tukey's HSD test  The post hoc procedure performed with ANOVA to compare means from a factor when all levels have equal ns.

Because we rejected H0 and accepted Ha, we return to the means from the levels:

           Perceived Difficulty
Easy          Medium        Difficult
X̄1 = 8        X̄2 = 6        X̄3 = 3

To see the treatment effect, look at the overall pattern: Because the means change, the scores that produce them are changing, so a relationship is present; as perceived difficulty increases, performance scores decrease. However, we do not know if every increase in difficulty results in a significant decrease in scores. To determine that, we must perform the post hoc comparisons. The four steps to performing the HSD test are:

STEP 1: Find qk. Use Table 5 in Appendix B, entitled "Values of Studentized Range Statistic." Locate the column labeled with the k that corresponds to the number of means in your factor. Find the row labeled with the dfwn used to compute your Fobt. (If your df is not in the table, use the df in the table that is closest to it.) Then find the qk for the appropriate α. For our perceived-difficulty study, k = 3, dfwn = 12, and α = .05, so qk = 3.77.

STEP 2: Compute the HSD.

THE FORMULA FOR TUKEY'S HSD IS
HSD = (qk)(√(MSwn/n))

MSwn is the denominator from your significant F-ratio, and n is the number of scores in each level of the factor. In the example MSwn was 7.0, n was 5, and qk is 3.77, so

HSD = (3.77)(√(7.0/5)) = (3.77)(1.183) = 4.46

Thus, HSD is 4.46.

STEP 3: Determine the differences between each pair of means. Subtract each mean from every other mean. Ignore whether differences are positive or negative because this is a two-tailed test. The differences for the perceived-difficulty study can be diagrammed as shown below; on the line connecting any two levels is the absolute difference between their means.

Easy          Medium        Difficult
X̄1 = 8        X̄2 = 6        X̄3 = 3
   |----2.0----|----3.0----|
   |---------5.0----------|
HSD = 4.46

STEP 4: Compare each difference to the HSD. If the absolute difference between two means is greater than the HSD, then these means differ significantly. (It's as if you performed a t-test on them and tobt was significant.) If the absolute difference between two means is less than or equal to the HSD, then it is not a significant difference (and would not produce a significant tobt).

In our example, the HSD was 4.46. The means from the easy level (8) and the difficult level (3) differ by more than 4.46, so they differ significantly. The mean from the medium level (6), however, differs from the other means by less than 4.46, so it does not differ significantly from them. Thus, our final conclusion is that we demonstrated a relationship between performance and perceived difficulty, but only when we changed from the easy to the difficult condition. If everyone in the population were tested under these two conditions, we would expect to find two populations of scores, one for the easy condition at a μ around 8 and one for the difficult condition at a μ around 3. Further, we could compute a confidence interval for each μ (using only the scores in a condition and the formula in Chapter 8) to describe an interval within which we are confident the μ produced by the condition would fall. However, we cannot say anything about the population produced by the medium condition, because it did not differ significantly from the other conditions. Finally, as usual, we return to being behavioral researchers and interpret the results in terms of our variables and behaviors: "Psychologically," why and how does the perceived difficulty of a task influence performance?
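The four steps are easy to automate. Below is a small Python sketch of the same computation (our illustration, with our own names); instead of Table 5 it draws qk from scipy's studentized-range distribution, which requires scipy 1.7 or later.

```python
from itertools import combinations
from math import sqrt
from scipy.stats import studentized_range

means = {"easy": 8.0, "medium": 6.0, "difficult": 3.0}
ms_wn, n, k, df_wn = 7.0, 5, 3, 12

q_k = studentized_range.ppf(0.95, k, df_wn)   # about 3.77, matching Table 5
hsd = q_k * sqrt(ms_wn / n)                   # about 4.46

for (name1, m1), (name2, m2) in combinations(means.items(), 2):
    diff = abs(m1 - m2)
    print(f"{name1} vs {name2}: diff = {diff:.1f}",
          "significant" if diff > hsd else "ns")
# Only easy vs difficult (5.0) exceeds the HSD.
```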
11-4a Summary of the One-Way ANOVA

It's been a long haul, but after checking the assumptions, here is how to perform a one-way ANOVA:

1. The null hypothesis is H0: μ1 = μ2 = . . . = μk, and the alternative hypothesis is Ha: not all μs are equal.
2. Compute Fobt. (a) Compute the sums of squares and the degrees of freedom. (b) Compute the mean squares. (c) Compute Fobt.
3. Compare Fobt to Fcrit. Find Fcrit in the F-table using dfbn and dfwn. Envision the F-distribution as in Figure 11.1. If Fobt is larger than Fcrit, then Fobt is significant, indicating that the means in at least two conditions differ significantly.
4. With a significant Fobt, more than two levels, and equal ns, perform the Tukey HSD test. (a) Find qk in Table 5, using k and dfwn. (b) Compute the HSD. (c) Find the difference between each pair of level means. (d) Any differences larger than the HSD are significant.
5. Draw conclusions about the influence of your independent variable by considering the significant means of your levels. Also consider the measure of effect size described in Section 11-6.

> Quick Practice

> Perform post hoc comparisons when Fobt is significant to determine which levels differ significantly.
> Perform Tukey's HSD test when all ns are equal.

More Examples
An Fobt is significant, with X̄1 = 4.0, X̄2 = 1.5, and X̄3 = 6.8. All n = 11, MSwn = 20.61, and dfwn = 30. To compute Tukey's HSD, find qk. For k = 3 and dfwn = 30, qk = 3.49. Then:

HSD = (qk)(√(MSwn/n)) = (3.49)(√(20.61/11)) = 4.78

The differences are |4.0 − 1.5| = 2.5; |1.5 − 6.8| = 5.3; |4.0 − 6.8| = 2.8. Comparing each difference to HSD = 4.78, only X̄2 and X̄3 differ significantly.

For Practice
We have X̄1 = 16.50, X̄2 = 11.50, and X̄3 = 8.92, with n = 21 in each condition, MSwn = 63.44, and dfwn = 60.
1. Which post hoc test should we perform?
2. What is qk here?
3. What is the HSD?
4. Which means differ significantly?

> Answers
1. Tukey's HSD
2. For k = 3 and dfwn = 60, qk = 3.40.
3. HSD = (3.40)(√(63.44/21)) = 5.91
4. Only X̄1 and X̄3 differ significantly.

11-5 STATISTICS IN THE RESEARCH LITERATURE: REPORTING ANOVA

In research publications, an Fobt is reported using the same format as previous statistics, except that we include both the dfbn and the dfwn. In the perceived-difficulty study, the significant Fobt was 4.52, with dfbn = 2 and dfwn = 12. We report this as

F(2, 12) = 4.52, p < .05

Note: In the parentheses always report dfbn and then dfwn. Usually the HSD value is not reported. Instead, indicate that the Tukey HSD was performed, give the alpha level used, and identify which levels differ significantly. However, for completeness, the means and standard deviations from all levels are reported, even those that do not differ significantly. Likewise, when graphing the results, the means from all levels are plotted.

11-6 EFFECT SIZE AND ETA SQUARED

Recall that in experiments we describe the effect size, which tells us how large an impact the independent variable had on dependent scores. In Chapter 9, you saw one measure of effect size, called Cohen's d, but it is generally used only with two-sample designs. Instead, with larger designs we compute the proportion of variance accounted for. Recall that this is the proportion of all differences among dependent scores in an experiment that are produced by changing our conditions.
The greater the proportion of differences we can account for, the greater the impact of the independent variable, in that the more it controls the behavior. This produces a stronger, more consistent relationship in which differences in scores tend to occur only when the conditions change: We see a set of similar behaviors and scores for everyone in one condition, with a different set of similar behaviors and scores for everyone in a different condition.

In ANOVA, this effect size is computed by squaring a new correlation coefficient symbolized by the Greek letter "eta" (pronounced "ay-tah"). The symbol for eta squared is η². Eta squared indicates the proportion of variance in dependent scores that is accounted for by changing the levels of a factor. An η² can be used to describe any linear or nonlinear relationship containing two or more levels of a factor. In a particular experiment, η² will be a proportion between 0 and 1, indicating the extent to which dependent scores change as the independent variable changes.

eta squared (η²)  A measure of effect size in ANOVA, indicating the proportion of variance in the dependent variable that is accounted for by changing the levels of a factor.

THE FORMULA FOR ETA SQUARED IS
η² = SSbn/SStot

The SSbn reflects the differences that occur when we change the conditions. The SStot reflects the differences among all scores in the experiment. Thus, η² reflects the proportion of all differences in scores that are associated with changing the conditions. For example, for the perceived-difficulty study, SSbn was 63.33 and SStot was 147.33. So

η² = SSbn/SStot = 63.33/147.33 = .43

Thus, 43% of all differences in scores were accounted for by changing the levels of perceived difficulty. Because 43% is a substantial amount, this factor plays an important role in determining participants' scores, so it is also a scientifically important variable for understanding participants' underlying behaviors.

η² (eta squared) measures the effect size of a factor by indicating the proportion of all differences in dependent scores that is accounted for by changing the levels of a factor.
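Because η² is just a ratio of two sums of squares, a check in code is one line; this tiny sketch (ours, not the book's) reproduces the perceived-difficulty result.

```python
def eta_squared(ss_bn, ss_tot):
    """Proportion of variance accounted for by a factor: SSbn / SStot."""
    return ss_bn / ss_tot

# Perceived-difficulty study: SSbn = 63.33, SStot = 147.33.
print(round(eta_squared(63.33, 147.33), 2))   # 0.43
```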
11-7 A WORD ABOUT THE WITHIN-SUBJECTS ANOVA

Recall that we create related samples either by matching participants in each condition or by using repeated measures of the same participants. However, matching participants is usually unworkable when we have more than two levels. Instead, in our perceived-difficulty study, we might have repeatedly measured one group of participants under all three levels of difficulty. This would equate the conditions for such things as participants' math ability or their math anxiety. Then, in each condition we'd give them different but equivalent problems to solve.

Repeated measures would create a one-way within-subjects design, so we would perform the one-way within-subjects ANOVA. As shown in Appendix A.5, the only difference in this analysis is how we compute the denominator of the F-ratio. Otherwise, we are still testing the same H0 that the conditions do not represent different populations. We use the same logic in which Fobt should equal 1 if H0 is true, so the larger the Fobt, the less likely that H0 is true and the more likely that Ha is true. If Fobt is larger than Fcrit, then it is significant, so we perform Tukey's HSD (we'll have equal ns) and compute η². The results are reported and interpreted as in the between-subjects design.

USING SPSS

Review Card 11.4 describes how to use SPSS to perform the one-way between-subjects ANOVA, including the HSD test. The program also computes the mean and estimated population standard deviation for each condition, determines the 95% confidence interval for each mean, and plots a line graph of the means. However, it does not compute η² (and the "partial eta squared" it provides is not what we've discussed). Instructions are also provided for the one-way within-subjects ANOVA. Here, SPSS will not perform the HSD test.

Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com

STUDY PROBLEMS

(Answers for odd-numbered problems are in Appendix C.)

1. What does each of the following terms mean? (a) ANOVA; (b) one-way design; (c) factor; (d) level; (e) treatment; (f) between subjects; (g) within subjects; (h) k.
2. (a) How do you identify which variable in a study is the factor? (b) How do you identify the levels of a factor? (c) How do you identify the dependent variable?
3. What are the two statistical explanations for why we have differing means among the conditions of an experiment?
4. (a) Describe all the requirements of a design for performing the one-way between-subjects ANOVA. (b) Describe the requirements of a design for performing the one-way within-subjects ANOVA.
5. (a) What is the experiment-wise error rate? (b) How do multiple t-tests create a problem with the experiment-wise error rate? (c) How does ANOVA fix this problem?
6. What are two reasons for conducting a study with more than two levels of a factor?
7. Summarize the steps involved in analyzing an experiment when k > 2.
8. (a) When is it necessary to perform post hoc comparisons? (b) Why do we perform post hoc tests? (c) What is the name of the post hoc test discussed in this chapter? (d) When do you perform it?
9. In words, what are H0 and Ha in ANOVA?
10. What are the two types of mean squares and what does each estimate in the population?
11. (a) Why should Fobt equal 1 if the data represent the H0 situation? (b) Why is Fobt greater than 1 when the data represent the Ha situation? (c) What does a significant Fobt indicate about the means of the levels of a factor?
12. (a) What is η² called? (b) What does it measure? (c) What does it tell us about the influence of a factor?
13. (a) What does the F-distribution show? (b) What do we know about an Fobt if it is in the region of rejection? (c) How does such an Fobt relate back to what our conditions represent?
14. A study compares four levels. (a) What is H0? (b) What is Ha? (c) Why is Ha not written as Ha: μ1 ≠ μ2 ≠ μ3 ≠ μ4?
15. (a) Dixon computes an Fobt of .63. How should this be interpreted? (b) He computes another Fobt of 1.7. How should this be interpreted?
16. Lauren obtained a significant Fobt from an experiment with five levels. She immediately concluded that changing each level results in a significant change in the dependent variable. (a) Is she correct? Why or why not? (b) What must she do?
17. A report says that the between-subjects factor of participants' salaries produced significant differences in self-esteem. (a) Describe the design of this study. (b) What was the outcome of the ANOVA and what does it indicate? (c) What is the researcher's next step?
18. A report says that the level of math anxiety for 30 statistics students decreased over the duration of the semester. (a) How were the students tested? (b) What was the factor? (c) What was the outcome of the ANOVA and what does it indicate? (d) What do we call testing the same participants in this way? (e) In ANOVA, what do we call this design?
19. In a study in which k = 3, n = 21, X̄1 = 45.3, X̄2 = 16.9, and X̄3 = 8.2, you obtain these sums of squares:

Source     Sum of Squares    df    Mean Square    F
Between        147.32        __        __         __
Within         862.99        __        __
Total         1010.31        __

(a) Complete the ANOVA summary table. (b) What are H0 and Ha? (c) What is Fcrit? (d) What do you conclude about Fobt? Report your results in the correct format. (e) Perform the Tukey HSD test if appropriate. (f) What do you conclude about this relationship in the population? (g) What is the effect size in this study, and what does this tell you about the influence of the independent variable?
20. A researcher investigated the effect of different volumes of background noise on participants' accuracy rates while performing a difficult task. He tested three groups (n = 11) and obtained the following means: for low volume, X̄ = 66.5; for medium, X̄ = 61.5; for loud, X̄ = 48.25. He computed the following sums of squares:

Source     Sum of Squares    df    Mean Square    F
Between        452.16
Within         522.75
Total          974.91

(a) Complete the ANOVA summary table. (b) What are H0 and Ha? (c) What is Fcrit? (d) What do you conclude about Fobt? Report your results in the correct format. (e) Perform the Tukey HSD test if appropriate. (f) What do you conclude about this relationship? (g) What is the effect size in this study, and what does this tell you about this factor?
21. A researcher investigated the number of viral infections people contracted as a function of
the amount of stress they experienced during a 6-month period. She obtained the following data:

                  Amount of Stress
Negligible    Minimal    Moderate    Severe
    2            1           4          1
    4            3           2          3
    6            5           7          5
    5            7           8          4

(a) What are H0 and Ha? (b) Complete the ANOVA summary table. (c) What is Fcrit? (d) What do you conclude about Fobt? Report your results in the correct format. (e) Perform the Tukey HSD test if appropriate. (f) What do you conclude about this study? (g) Compute the effect size and interpret it.
22. Here are data from an experiment studying the effect of age on creativity scores:

Age 4    Age 6    Age 8    Age 10
  3        5        7         4
  3        9       11        14
 10       10        9        12
  9        8        9         7
  7        6        4         5

(a) Compute Fobt and create an ANOVA summary table. (b) What do you conclude about Fobt? (c) Perform post hoc comparisons if appropriate. (d) What should you conclude about this relationship? (e) How important is age in determining creativity scores? (f) Describe how you would graph these results.
23. In an ANOVA, with dfbn = 4 and dfwn = 51, you have Fobt = 4.63. (a) Is the Fobt significant? (b) How did you determine this?
24. We compare the final exam grades of students taking statistics in the morning (X̄ = 76.2), in the afternoon (X̄ = 74.45), and in the evening (X̄ = 72.53). With n = 10 we compute the following sums of squares:

Source     Sum of Squares    df    Mean Square    F
Between        127.60
Within         693.45
Total          821.05

(a) Complete the ANOVA summary table. (b) What is Fcrit? (c) Is there a significant difference between the class means? (d) What other procedures should be performed? (e) Based on these results, what psychological explanation can you give for why the time of day the class meets has no influence on grades?
25. Considering the chapters you've read, identify the inferential procedure to perform for the following studies: (a) Doing well in statistics should reduce students' math phobia, so we measure their fear after selecting groups who received a final grade of either an A, B, C, or D. (b) To determine if recall is better or worse than recognition, participants study a list of words, and then half of them recall the words and the other half perform a recognition test. (c) We test the aggressiveness of a group of rats after 1, 3, 5, and 7 weeks to see if they become more aggressive as they grow older. (d) We want to use students' scores on the first exam in a course to predict their final exam grades. (e) We ask if pilots have quicker reaction times than the copilots they usually fly with.

Chapter 12
UNDERSTANDING THE TWO-WAY ANALYSIS OF VARIANCE

LOOKING BACK
Be sure you understand:
• From Chapter 11, the terms factor and level, what a significant F indicates, and how to perform and interpret the Tukey HSD test and η².

GOING FORWARD
Your goals in this chapter are to learn:
• What a two-way ANOVA is.
• How to calculate main effect means and cell means.
• What a significant main effect indicates.
• What a significant interaction indicates.
• How to perform the Tukey HSD test on the interaction.
• How to interpret the results of a two-way experiment.

Sections
12-1 Understanding the Two-Way Design
12-2 Understanding Main Effects
12-3 Understanding the Interaction Effect
12-4 Completing the Two-Way ANOVA
12-5 Interpreting the Two-Way Experiment

In the previous chapter, we used ANOVA to test the means from one factor. In this chapter, we'll expand the experiment to involve two factors. This analysis is similar to the previous ANOVA, except here we compute several Fs. The good news is we will NOT focus on the computations. Nowadays we usually analyze such experiments using SPSS or other programs (although the formulas for the between-subjects version are presented in Appendix A.4). So, think of this chapter as teaching you how to understand the computer's output. You will frequently encounter such designs in research publications and in research you may conduct yourself, so you need to understand the basic logic, terminology, and purpose of such ANOVAs. The following sections present (1) the general layout of a two-factor experiment, (2) what the ANOVA indicates about the influence of your variables, (3) how to compute a special case of the Tukey HSD that SPSS does not perform, and (4) how to interpret a completed study.

12-1 UNDERSTANDING THE TWO-WAY DESIGN

The two-way ANOVA is the parametric inferential procedure that is applied to designs that involve two independent variables. When both factors involve independent samples, we perform the two-way between-subjects ANOVA. When both factors involve related samples, we perform the two-way within-subjects ANOVA. If one factor is tested using independent samples and the other factor involves related samples, we perform the two-way mixed-design ANOVA. The logic of these ANOVAs is identical except for slight variations in the formulas. In this chapter we'll discuss the between-subjects version.

A specific design is described using the number of levels in each factor. If, for example, factor A has two levels and factor B has two levels, we have a "two-by-two" ANOVA, which is written as 2 × 2. Or, with four levels of one factor and three levels of the other, we have a 4 × 3 ANOVA, and so on. Each factor can involve any number of levels.

Here's a semi-fascinating idea for a study. Let's say we are interested in what aspects of a message make it more or less persuasive. One obvious physical characteristic is how loud the message is: Does a louder message "grab" your attention and make you more persuaded? To answer this question, we will present a recorded message supporting a fictitious politician to participants at one of three volumes. Volume is measured in decibels, but to simplify things we'll call our volumes soft, medium, and loud. Say we are also interested in differences in how our male and female participants are persuaded, so our other factor is the gender of the listener.
So, we have a two-way experiment involving three levels of volume and two levels of gender. The dependent variable measures how persuasive the message is, with higher scores indicating greater persuasiveness. Understand that this two-way design will tell us everything we would learn by conducting two one-way studies: one that compared only the persuasiveness scores from the three levels of volume, and one that compared the scores of women and men. However, the advantage of the two-way design is that we will also be able to study something that we'd otherwise miss: the interaction between gender and volume. For now, think of an interaction as the influence of combining the two factors. Interactions are important because they often influence a behavior in nature. Thus, a primary reason for conducting a study with two (or more) factors is to observe the interaction between them. A second reason is again that once you've created a design for studying one factor, often only a minimum of additional effort is required to include additional factors.

two-way ANOVA  The parametric procedure performed when an experiment contains two independent variables.
two-way between-subjects ANOVA  The parametric procedure performed when both factors are between-subjects factors.
two-way within-subjects ANOVA  The parametric procedure performed when both factors are within-subjects factors.
two-way mixed-design ANOVA  The parametric procedure performed with one within-subjects factor and one between-subjects factor.

The way to organize our 3 × 2 design is shown in Table 12.1. In the diagram:

1. Each column represents a level of the volume factor. (In general we'll call the column factor "factor A.") Thus, the scores in column A1 are from participants tested under soft volume.
2. Each row represents a level of the gender factor. (In general we'll call the row factor "factor B.") Thus, scores in row B1 are from male participants.
3. Each small square produced by combining a level of factor A with a level of factor B is called a cell. Here, we have six cells, each containing a sample of three participants who are one gender and given one volume. For example, the highlighted cell contains scores from three females presented with medium volume.

cell  In a two-way ANOVA, the combination of one level of one factor with one level of the other factor.
(With 3 participants per cell, we have a total of 9 males and 9 females, so N = 18.)
4. In a "multi-factor" design like this, when we combine all levels of one factor with all levels of the other factor, the design is also called a factorial design. Here, all levels of gender are combined with all levels of our volume factor.

factorial design  A design in which all levels of one factor are combined with all levels of the other factor.

Table 12.1
A 3 × 2 Design for the Factors of Volume and Gender
Each column represents a level of the volume factor; each row represents a level of the gender factor; each cell contains the scores of participants tested under a particular combination of volume and gender. (In the original, the cell for females at medium volume is highlighted as one of the six cells.)

                                Factor A: Volume
                      Level A1:     Level A2:     Level A3:
                      Soft          Medium        Loud
Factor B:  Level B1:   9             8             18
Gender     Male        4             12            17
                       11            13            15
           Level B2:   2             9             6
           Female      6             10            8
                       4             17            4
                                                            N = 18

We perform the two-way between-subjects ANOVA if (1) each cell is an independent sample, and (2) we have normally distributed interval or ratio scores that have homogeneous variance. Note: These procedures are much easier when we have equal cell ns throughout, so we'll assume we do. Then, as in the following sections, any two-way ANOVA involves examining three things: the two main effects and the interaction effect.

12-2 UNDERSTANDING MAIN EFFECTS

The first step in the two-way ANOVA is to examine the influence of each factor by itself. This is called a factor's main effect. The main effect of a factor is the overall effect that changing the levels of that factor has on dependent scores while we ignore the other factors in the study. So, in the persuasiveness study, we will examine the main effect that changing volume by itself has on scores. Then we will examine the main effect that changing gender by itself has on scores. In any two-way ANOVA, we examine the main effect of factor A and the main effect of factor B.

main effect  The effect on the dependent scores of changing the levels of one factor after collapsing over the other factor.

12-2a The Main Effect of Factor A

In the persuasiveness study, the way to examine the main effect of volume by itself is to ignore gender. To do this, we will literally erase the horizontal line that separates the rows of males and females in Table 12.1. Once we erase that horizontal line, we treat the experiment as if it were this one-way design:

              Factor A: Volume
Level A1:     Level A2:     Level A3:
Soft          Medium        Loud
 9             8             18
 4             12            17
 11            13            15
 2             9             6
 6             10            8
 4             17            4
X̄A1 = 6       X̄A2 = 11.5    X̄A3 = 11.33

By ignoring the distinction between males and females, we simply have six people in each column, so we have a study consisting of one factor with three levels of volume. Then, as usual, we find the mean of each level by averaging the scores in each column. However, in a two-way design, these means are called the main effect means. Here we have the main effect means for volume. In statistical terminology, we computed the main effect means for volume by collapsing the factor of gender. Collapsing a factor refers to averaging together all scores from all levels of that factor. When we collapse one factor, we make it disappear (like we did with gender), so we are left with the main effect means for the remaining factor. Thus, a main effect mean is the mean of one level of one factor after collapsing the other factor.

collapsing  Averaging together scores from all levels of one factor to calculate the main effect means for the other factor.
main effect means  The means of the levels of one factor after collapsing the levels of the other factor.

Once we have the main effect means, we can determine the main effect of the factor. To see a main effect, look at how the main effect means change as the levels of the factor change.
For the main effect of volume, we look at how persuasiveness scores change as volume increases: Scores go up from around 6 (at soft) to around 11.5 (at medium), but then drop slightly to around 11.33 (at loud). So it appears there is a main effect, an influence, of changing the levels of volume.

BUT! There's the usual problem. Although there appears to be a relationship between volume and scores, maybe we are being misled by sampling error. Maybe changing volume does nothing, but by chance we happened to get three samples containing different scores. Therefore, to determine if these are significant differences (if there is a significant main effect of the volume factor) we essentially perform a one-way ANOVA that compares these main effect means. The H0 says there is no difference between the levels of factor A in the population, so we have H0: μA1 = μA2 = μA3. The Ha is that at least two of the main effect means reflect different populations, so we have Ha: not all μA are equal.

When we examine the main effect of factor A, we look at the overall mean of each level of A.

We test this H0 by computing an Fobt, which we'll call FA. Approach this exactly as you did the one-way ANOVA in the previous chapter. First, we compare FA to Fcrit, and if it is significant, it indicates that at least two means from factor A differ significantly. Then (with equal ns) we determine which specific levels differ by performing the Tukey HSD test. We also compute the factor's effect size (η²) and graph the main effect means. Then we describe and interpret the relationship (here describing how changing volume influences persuasiveness scores).
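Collapsing is easy to picture in code. The sketch below (our own illustration) pools the Table 12.1 scores both ways: it produces the volume main effect means used above, and the gender main effect means discussed in the next subsection.

```python
# Raw scores from Table 12.1: cells[gender][volume] holds one cell's scores.
cells = {
    "male":   {"soft": [9, 4, 11], "medium": [8, 12, 13], "loud": [18, 17, 15]},
    "female": {"soft": [2, 6, 4],  "medium": [9, 10, 17], "loud": [6, 8, 4]},
}

def mean(xs):
    return sum(xs) / len(xs)

# Collapsing gender: pool both genders' scores within each volume column.
for vol in ["soft", "medium", "loud"]:
    pooled = cells["male"][vol] + cells["female"][vol]
    print(vol, round(mean(pooled), 2))    # 6.0, 11.5, 11.33

# Collapsing volume: pool all three volumes within each gender row.
for gen in ["male", "female"]:
    pooled = [x for col in cells[gen].values() for x in col]
    print(gen, round(mean(pooled), 2))    # 11.89, 7.33
```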
As usual, though, there is the problem of sampling error, so to determine if this is a significant difference—if there is a significant main effect of the gender factor—we perform essentially another one-way ANOVA that compares these main effect means. Our H0 says there is no difference between the levels of factor B in the population, so we have H0: mB mB . Our Ha is that at least two of the main effect means reflect different populations, so we have Ha: not all mB are equal. X 130 Collapsing the levels of A produces the main effect means for factor B. Differences among these means reflect the main effect of B. For Practice In this study. A1 B1 B2 2 2 2 11 10 9 A2 5 4 3 7 6 5 1. The means produced by collapsing across factor B equal _____ and _____. They are called the _____ means for factor _____. 2. Describe the main effect of A. 3. The means produced by collapsing across factor A are _____ and _____. They are called the _____ means for factor _____. 4. Describe the main effect of B. > Answers 1. XA1 6; XA2 5; main effect; A Collapsing (averaging together) the scores from the levels of factor B produces the main effect means for factor A. Differences among these means reflect the main effect of A. The column means are the main effect means for dose: The main effect is that mean IQ increases from 110 to 130 as dosage increases. The row means are the main effect means for age: The main effect is that mean IQ decreases from 125 to 115 as age increases. 2. Changing from A1 to A2 produces a decrease in scores. > Quick Practice 206 X 110 X 115 3. XB1 3; XB2 8; main effect; B We test this H0 by computing another Fobt, which is FB. We compare this to Fcrit, and if FB is significant, it indicates that at least two main effect means from factor B differ significantly. Then, if needed, we perform the Tukey HSD test to determine which means differ, we compute the effect size, and we describe and interpret this relationship (here describing how gender influences persuasiveness scores). > 20 years X 125 2 When we examine the main effect of factor B, we look at the overall mean for each level of B by examining the row means. > 10 years Factor B: Age Factor A: Dose One Pill Two Pills 100 140 105 145 110 150 110 110 115 115 120 120 4. Changing from B1 to B2 produces an increase in scores. 1 More Examples We compare the effects of two dose levels of a “smart pill” and two levels of age. We have the IQ scores shown here. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 12-3 UNDERSTANDING under one level of factor B. Then see if this effect—this pattern—for factor A is different when you look at the other levels of factor B. For example, here is the first row of Table 12.2, showing the relationship between volume and scores for males. As volume increases, the means also increase, forming an approximately positive linear relationship. THE INTERACTION EFFECT After we have examined the main effects of factors A and B, we examine the effect of their interaction. The interaction of two factors is called a two-way interaction. 
It is the influence on scores created by combining each level of factor A with each level of factor B. In our example, it is combining each volume with each gender. In general an interaction is identified as A B. Here, factor A has 3 levels and factor B has 2 levels, so it is a 3 2 (say “3 by 2”) interaction. Because an interaction examines the influence of combining the levels of the factors, we do not collapse (ignore) either factor. Instead, we examine the cell means. A cell mean is the mean of the scores from one cell. The cell means for the interaction between volume and gender are shown in Table 12.2. Here we will compare the mean in malesoft to the mean in male-medium, then compare it to the mean in female–soft, and so on. B1: Male However, now look at the relationship between volume and scores for females, using the cell means from the bottom row of Table 12.2. B2: Female For the interaction effect we compare the cell means. ns. For a main effect we compare the level means. ns. However, examining an interaction n is not as simple as saying that the celll means differ significantly from each other. Instead, the way to look at an interaction is to look at the influence of changing the levels of factor A © iStockphoto.com/archives Table 12.2 The Volume by Gender Interaction Each mean is a cell mean. Factor A: Volume Soft Medium Loud Factor B: Gender Male X8 X 11 X 16.67 Female X4 X 12 X6 Factor A: Volume Soft Medium Loud X 8 X 11 X 16.67 Factor A: Volume Soft Medium Loud X 4 X 12 X6 Here, as volume increases, the means first increase but then decrease, producing a nonlinear relationship. Thus, there is a different relationship between volume and persuasiveness scores for each gender level. A two-way interaction int effect is present when the relationship between one factor and the dependent scores changes ch as the levels of the other factor change. In other words, an interaction effect occurs when w the influence of changing one factor is not n the same for each level of the other factor. facto Here, for example, increasing volume does doe not have the same effect on males that it i does on females. An easy way to spot that an interaction effect is present is that you must use the word “depends” when describing the influence of a factor: What effect does increasing the volume have? h It depends on whether wh we’re talking cell mean The about abou males or females. mean of the scores Likewise, Likew you can see the from one cell in a interaction effect by looking two-way design at the difference between males and two-way females at each volume. Who score interaction effect Occurs higher, males or females? It depends when the relationship on which level of volume we’re talkbetween one factor ing about. and the dependent scores depends on Conversely, an interaction the level of the other effect would not be present if the factor that is present cell means formed the same pattern Chapter 12: Understanding the Two-Way Analysis of Variance 207 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. for males and females. 
For example, say the cell means had been as follows: Factor A: Volume Soft Medium Loud Factor B: Gender Male X5 X 10 X 15 Female X 20 X 25 X 30 A two-way interaction effect indicates that the influence one factor has on scores depends on which level of the other factor is present. Increasing the volume increases scores by about 5 points, regardless of whether it’s for males or females. Or, females always score higher, regardless of volume. Thus, an interaction effect is not present when the influence of changing the levels of one factor does not depend on which level of the other variable we are talking about. In other words, there’s no interaction when we see the same relationship between the dependent scores and one factor for each level of the other factor. Here’s another example of an interaction. Say that in a different study, for one factor we measure whether participants are in a sad or a happy mood. Our second factor involves having participants learn a list of 15 happy words (e.g., love, beauty, etc.) or a list of 15 sad words (e.g., death, pain, etc.). Each group then recalls its list. Research suggests we would obtain mean recall scores forming a pattern like this: So, in our persuasiveness study it appears we have an interaction effect. But . . . that’s right: Perhaps by chance we obtained cell means that form such a pattern, but in the population (in nature), these variables do not interact in this way. To determine if we have a significant interaction effect, we perform essentially another one-way ANOVA that compares the cell means. To write the H0 and Ha in symbols is complicated, but in words, H0 is that the cell means do not represent an interaction effect in the population, and Ha is that at least some of the cell means do represent an interaction effect in the population. To test H0, we compute another Fobt, called FAB. If FAB is significant, it indicates that at least two of the cell means differ significantly in a way that produces an interaction effect. Then we perform a slightly different version of the Tukey HSD test, we compute the effect size, and we describe and interpret the relationship (here describing how the different combinations of volume and gender influence persuasiveness scores). Words Happy Sad X 10 X5 Happy X5 X 10 Is there an interaction effect here? Yes, because the relationship between recall scores and sad/happy words changes with the level of mood. Participants recall more sad words than happy words when in a sad mood. But they recall fewer sad words than happy words when in a happy mood. So, what is the influence of sad or happy words on recall? It depends on what mood particir, in which pants are iin. Or, ple recall mood wil will people words be best? It depends on whet whether they are py or sad recalling happy words. 208 > Quick Practice > We examine the interaction effect by looking at the cell means. An effect is present if the relationship between one factor and the dependent scores changes as the levels of the other factor change. More Examples Mor Here are the cell means when factor A is dose of the smart pill and factor B is age of participants. smar © Thinkstock/Jupiter Images Mood Sad Factor A: Dose One Pill Two Pills Factor B: 10 years Age 20 years X 105 X 145 X 115 X 115 (continued) Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 
> Quick Practice

> We examine the interaction effect by looking at the cell means. An effect is present if the relationship between one factor and the dependent scores changes as the levels of the other factor change.

More Examples
Here are the cell means when factor A is dose of the smart pill and factor B is age of participants.

                           Factor A: Dose
                      One Pill      Two Pills
Factor B:  10 years   X̄ = 105       X̄ = 145
Age        20 years   X̄ = 115       X̄ = 115

We see an interaction effect because the influence of increasing dose depends on participants' age. Dosage increases mean IQ for 10-year-olds from 105 to 145, but it does not change mean IQ for 20-year-olds (always at 115). Or, the influence of increasing age depends on dose. With 1 pill, 20-year-olds score higher (115) than 10-year-olds (105), but with 2 pills, 10-year-olds score higher (145) than 20-year-olds (115).

For Practice
A study produces these data:

          A1           A2
B1      2, 2, 2      5, 4, 3
B2     11, 10, 9     7, 6, 5

1. The means to examine for the interaction are called the _____ means.
2. When we change from A1 to A2 for B1, the cell means are _____ and _____.
3. When we change from A1 to A2 for B2, the cell means are _____ and _____.
4. How does the influence of changing from A1 to A2 depend on the level of B that is present?
5. Is an interaction effect present?

> Answers
1. cell
2. 2, 4
3. 10, 6
4. Under B1 the means increase; under B2 they decrease.
5. yes

12-4 COMPLETING THE TWO-WAY ANOVA

As you've seen, in the two-way ANOVA we compute three Fs: one for the main effect of factor A, one for the main effect of factor B, and one for the interaction of A × B. The formulas for the two-way between-subjects ANOVA applied to our persuasiveness data are presented in Appendix A.4. The completed ANOVA summary table is shown here in Table 12.3. (This is similar to the output produced by SPSS.)

Table 12.3
Completed Summary Table of Two-Way ANOVA

Source                       Sum of Squares    df    Mean Square      F
Between
  Factor A (volume)              117.45         2       58.73       7.14
  Factor B (gender)               93.39         1       93.39      11.36
  Interaction (vol × gen)        102.77         2       51.39       6.25
Within                            98.67        12        8.22
Total                            412.28        17

The row labeled Factor A reflects the differences between groups due to the main effect of changing volume. The row labeled Factor B reflects the differences between groups due to the main effect of gender. The row labeled Interaction reflects the differences between groups (cells) formed by combining volume with gender. The row labeled Within reflects the differences among individual scores within each cell, which are then pooled. (In SPSS, this row is labeled Error.)

The logic and calculations for the Fs here are the same as in the one-way ANOVA. For each, if H0 is true and the data represent no relationship in the population, then the variability of the means (the MSbn) should equal the variability of the scores (the MSwn), and the F-ratio of the MSbn divided by the MSwn should equal 1. However, the larger the Fobt, the less likely that H0 is true. The novelty here is that we have three versions of the MSbn: one for factor A, one for factor B, and one for the interaction. To compute them, in each row of Table 12.3 we first compute the appropriate sum of squares and then divide by the appropriate df. This produces the corresponding mean square between groups. In the row labeled Within, dividing the sum of squares by the df produces the one MSwn we use to compute all three Fs. Then, the Fobt for factor A (volume) of 7.14 is produced by dividing 58.73 by 8.22. The Fobt for factor B (gender) of 11.36 is produced by dividing 93.39 by 8.22. And the Fobt for the interaction (volume × gender) of 6.25 is produced by dividing 51.39 by 8.22.
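Since all three Fs share the one MSwn, the divisions are simple to replicate; this short Python sketch (ours) reproduces them from the Table 12.3 mean squares.

```python
# Mean squares from Table 12.3; each F divides a between-groups
# mean square by the single within-groups mean square (8.22).
ms = {"factor A (volume)": 58.73,
      "factor B (gender)": 93.39,
      "A x B interaction": 51.39}
ms_wn = 8.22

for source, ms_bn in ms.items():
    print(f"{source}: F = {ms_bn / ms_wn:.2f}")
# factor A (volume): F = 7.14
# factor B (gender): F = 11.36
# A x B interaction: F = 6.25
```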
Each Fobt is tested by comparing it to Fcrit from the F-table in Appendix B. To find the Fcrit for a particular Fobt, use the dfbn and dfwn used to compute that Fobt. You may have different dfs for each Fobt: Above, for our factor A and the interaction, we find Fcrit using dfbn = 2 and dfwn = 12, but for factor B we use dfbn = 1 and dfwn = 12. Also, which of your Fs are significant depends solely on your particular data: Any combination of the main effects and/or interaction may or may not be significant.

As shown in Appendix A.4, the persuasiveness data produced significant Fs for factor A (volume), for factor B (gender), and for the interaction. Table 12.4 shows all of the means for the persuasiveness study. The means inside the matrix are again the cell means. The means under the matrix are again the column means, which are the main effect means for volume. The means to the right of the matrix are again the row means, which are the main effect means for gender.

Table 12.4
Main Effect and Interaction Means from the Persuasiveness Study

                            Factor A: Volume
                    Level A1:    Level A2:     Level A3:
                    Soft         Medium        Loud
Factor B:  Male      8            11            16.67        X̄male = 11.89
Gender     Female    4            12            6            X̄fem = 7.33
                    X̄soft = 6    X̄med = 11.5   X̄loud = 11.33

Notice that instead of using the individual raw scores in a column to compute the main effect mean of the column, we can average together the two cell means in the column (e.g., for soft volume, (8 + 4)/2 = 6). Likewise, to compute the main effect mean for a row, we can average together the three cell means in the row (e.g., for males, (8 + 11 + 16.67)/3 = 11.89).

To understand and interpret the results of a two-way ANOVA, you should examine the means from each significant main effect and interaction by performing the HSD test, graphing the means, and computing η².
12-4a Examining the Main Effects

Approach each main effect as the separate one-way ANOVA that we originally diagrammed. As usual, a significant Fobt merely indicates differences somewhere among the main effect means. Therefore, if the ns are equal and we have more than two levels in the factor, we determine which specific main effect means differ by performing Tukey's HSD test. For one factor at a time, we find the differences among all main effect means and then follow the procedure described in the previous chapter.

THE FORMULA FOR THE TUKEY HSD TEST IS
HSD = (qk)(√(MSwn/n))

where MSwn is from the ANOVA and n is the number of scores in a level. Find qk in Table 5 in Appendix B using the k for the factor and dfwn from the ANOVA.

Be aware that k and n may be different for each factor, so you may need to compute a separate HSD for each factor. In our study, k for volume is 3, but k for gender is 2. Also, be careful when identifying n. Look back at Table 12.1. When we collapsed gender to get the main effect means of volume, we combined two groups of 3 scores each, so our n in the HSD when comparing volume means is 6. (There are 6 scores in each column.) However, when we collapsed volume to get the main effect means of gender, we combined three groups of 3 scores each, so our n in the HSD comparing gender means is 9! (There are 9 scores in each row.)

As shown in Appendix A.4, the HSD test for our main effect means for the volume factor indicates that the soft condition (6) differs significantly from both the medium (11.5) and loud (11.33) levels. However, medium and loud do not differ significantly. For the gender factor, it must be that the mean for males (11.89) differs significantly from the mean for females (7.33).
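The two HSDs are easy to verify; this sketch (ours, with qk drawn from scipy rather than Table 5, and with the HSD values worked out by us from MSwn = 8.22) reproduces the pattern of significant differences just described.

```python
from math import sqrt
from scipy.stats import studentized_range

ms_wn, df_wn = 8.22, 12

# Volume: k = 3 levels, n = 6 scores per level (each column of Table 12.1).
q_vol = studentized_range.ppf(0.95, 3, df_wn)   # about 3.77
hsd_vol = q_vol * sqrt(ms_wn / 6)               # about 4.42
# Differences: soft vs medium = 5.5, soft vs loud = 5.33, medium vs loud = 0.17;
# only the two comparisons involving soft exceed this HSD.

# Gender: k = 2 levels, n = 9 scores per level (each row of Table 12.1).
q_gen = studentized_range.ppf(0.95, 2, df_wn)   # about 3.08
hsd_gen = q_gen * sqrt(ms_wn / 9)               # about 2.94
# Males vs females: 11.89 - 7.33 = 4.56, which exceeds this HSD.

print(round(hsd_vol, 2), round(hsd_gen, 2))
```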
As in any graph, we are showing the relationship between the X and Y variables, so here we show the relationship between volume nd persuasiveness. However, we plot Graph of Cell Means, Showing the Interaction of Volume and Gender 18 16 Male Female 14 Mean persuasiveness two levels, however, we would compute a new HSD and proceed as usual. Also, it is appropriate to produce a separate graph ph for each significant main effect. As you saw in Chapter 3, we show the relationship between the levels of the factor (independent variable) on the X axis, and the main effect means (dependent variable) on the he Y axis. Include all means, even those that do not differ significantly. Finally, SPSS will not compute the effect size. Therefore, you should compute h 2 to determine the proportion of variance accounted for by each significant main effect. 12 10 8 6 4 2 0 Soft Medium Volume of message Loud Chapter 12: Understanding the Two-Way Analysis of Variance 211 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. An interaction effect is present when its graph produces lines that are not parallel. Graph the interaction by drawing a separate line for each level of one factor that shows the relationship between the other factor on the X axis and the dependent (Y) scores. comparing cells connected by the dashed lines in Table 12.5. Instead, we perform only unconfounded comparisons, in which two cells differ along only one factor. The cells connected by solid lines in Table 12.5 are examples of unconfounded comparisons. Thus, we compare only cell means within the same column because these differences result from factor B. We compare means within the same row because these differences result from factor A. We do not, however, make any diagonal comparisons, because they are confounded comparisons. Take note that an interaction effect can produce a graph showing any kind of pattern, except that it always produces lines that are not parallel. Each line shows the relationship between X and Y, so a line that is shaped or oriented differently from another line indicates a different relationship. Therefore, when the lines are not parallel, they indicate that the relationship between X and Y changes depending on the level of the second factor, so an interaction effect is present. Conversely, when an interaction effect is Table 12.5 not present, the lines will be virtually parallel, Interaction Means for Persuasiveness Study with each line depicting essentially the same Any horizontal or vertical comparison is unconfounded; any diagonal comparison relationship between X and Y. 12-4c Performing the Tukey Soft Factor A: Volume Medium Loud B1: Male X 8 X 11 X 16.67 B2: Female X 4 X 12 X 6 HSD Test on the Interaction We also apply the Tukey HSD test to a signifiFactor B: cant interaction effect so that we can deterGender mine which of the cell means differ significantly. However, SPSS will not perform this test, and it is slightly different from the test for main effects. First, recognize that we do not compare every cell mean to every other cell mean. Look at Table 12.5. 
12-4c Performing the Tukey HSD Test on the Interaction

We also apply the Tukey HSD test to a significant interaction effect so that we can determine which of the cell means differ significantly. However, SPSS will not perform this test, and it is slightly different from the test for main effects.

First, recognize that we do not compare every cell mean to every other cell mean. Look at Table 12.5. We would not, for example, compare the mean for males at soft volume to the mean for females at medium volume. Because the two cells differ both in terms of gender and volume, we cannot determine which of these variables caused the difference. Therefore, we are confused or "confounded." A confounded comparison occurs when two cells differ along more than one factor. Other examples of confounded comparisons would involve comparing cells connected by the dashed lines in Table 12.5. Instead, we perform only unconfounded comparisons, in which two cells differ along only one factor. The cells connected by solid lines in Table 12.5 are examples of unconfounded comparisons. Thus, we compare only cell means within the same column, because these differences result from factor B. We compare means within the same row, because these differences result from factor A. We do not, however, make any diagonal comparisons, because they are confounded comparisons.

Table 12.5 Interaction Means for Persuasiveness Study
Any horizontal or vertical comparison is unconfounded; any diagonal comparison is confounded.

                          Factor A: Volume
                      Soft        Medium      Loud
Factor B: Gender
  B1: Male            X̄ = 8      X̄ = 11     X̄ = 16.67
  B2: Female          X̄ = 4      X̄ = 12     X̄ = 6

We also have one other difference when performing the HSD test on an interaction. Recall that computing the HSD requires qk. Previously, we found qk in Table 5 in Appendix B using k, the number of means being compared. However, because we are not comparing all cell means, we must "adjust k." You obtain the adjusted k from the table titled Values of Adjusted k at the top of Table 5 in Appendix B. A portion of the table is below:

Values of Adjusted k
  Design of Study     Number of Cell Means in Study     Adjusted Value of k
  2 × 2               4                                 3
  2 × 3               6                                 5
  2 × 4               8                                 6

In the left-hand column locate your design: Ours is a 3 × 2 or a 2 × 3. In the middle column confirm the number of cell means in the interaction: We have 6. In the right-hand column is the adjusted k: For our study it is 5. Use the adjusted k as the value of k for finding qk. For our study, in Table 5 we look in the column for k equal to 5. With α = .05 and dfwn = 12, the qk is 4.51. We compute the HSD using this qk and the usual formula: HSD = (qk)(√(MSwn/n)). Notice that n is different than in the main effects: Back in Table 12.1, each cell mean is based on the 3 scores in the cell, so now the n in the HSD is 3.
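As a check on this arithmetic, here is a small Python sketch of the HSD formula; the qk, MSwn, and n values are taken from the Quick Practice example that follows below.

```python
# A small sketch of the HSD computation for an interaction, using the
# numbers from the Quick Practice example below (q_k = 3.65 for an
# adjusted k of 3, MS_wn = 5.19, and n = 5 scores per cell).
import math

def tukey_hsd(q_k, ms_wn, n):
    """HSD = q_k * sqrt(MS_wn / n); larger differences are significant."""
    return q_k * math.sqrt(ms_wn / n)

print(round(tukey_hsd(3.65, 5.19, 5), 2))  # 3.72, as in the Quick Practice
```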
To complete the HSD test, look back at Table 12.5. For each column, we subtract every mean in the column from every other mean in that column. For each row, we subtract every mean in the row from every other mean in that row. Any difference between two cell means that is larger than the HSD is significant. As shown in Appendix A.4, our persuasiveness study produced only three significant differences: (1) between females at soft volume and females at medium volume, (2) between males at soft volume and males at loud volume, and (3) between males at loud volume and females at loud volume.

Performing the HSD test on the interaction requires making only unconfounded comparisons and finding the adjusted k.

> Quick Practice
> The graph of an interaction shows the relationship between one factor on X and dependent scores on Y for each level of the other factor.
> When performing Tukey's HSD test on an interaction effect, determine the adjusted value of k and make only unconfounded comparisons.

More Examples
We obtain the cell means below on the left. To produce the graph of the interaction on the right, plot data points at 2 and 6 for B1 and connect them with a solid line. Plot data points at 10 and 4 for B2 and connect them with a dashed line.

        A1         A2
B1      X̄ = 2     X̄ = 6
B2      X̄ = 10    X̄ = 4

Say that dfwn = 16, MSwn = 5.19, and the n per cell is 5. For the HSD, from Table 5 in Appendix B, the adjusted k is 3, so

HSD = (qk)(√(MSwn/n)) = (3.65)(√(5.19/5)) = 3.72

The unconfounded comparisons involve subtracting the means in each column and each row. All differences are significant except when comparing 6 versus 4.

For Practice
We obtain the following data:

        A1          A2
B1      X̄ = 13     X̄ = 14
B2      X̄ = 12     X̄ = 22

The dfwn = 12, MSwn = 4.89, and n = 4.
1. The adjusted k is _____.
2. The qk is _____.
3. The HSD is _____.
4. Which cell means differ significantly?

Answers
1. 3
2. 3.77
3. 4.17
4. Only 12 versus 22 and 14 versus 22

12-5 INTERPRETING THE TWO-WAY EXPERIMENT

The way to interpret an experiment is to look at the significant differences between means from the post hoc comparisons for all significant main effects and interaction effects. All of the differences found in the persuasiveness study are summarized in Table 12.6. Each line connecting two means indicates that they differ significantly.

Often the interpretation of a two-way study may focus on the significant interaction, even when main effects are significant. This is because our conclusions about main effects may be contradicted by the interaction. After all, the interaction indicates that the influence of one factor depends on the levels of the other factor and vice versa, so you should not act as if either factor has a consistent effect by itself. For example, look at the main effect means for gender in Table 12.6: They lead to the conclusion that males (at 11.89) score higher than females (at 7.33). However, now look at the cell means of the interaction: Gender differences depend on volume, because only in the loud condition is there a significant difference between males (at 16.67) and females (at 6). Therefore, it is inaccurate to conclude that males always score higher than females. Likewise, look at the main effect means for volume: Increasing volume from soft to medium significantly raises scores (from 6 to 11.5), as does increasing volume from soft to loud (from 6 to 11.33). However, the interaction indicates that increasing volume from soft to medium produces a significant difference only for females (from 4 to 12), and increasing volume from soft to loud produces a significant difference only for males (from 8 to 16.67). So, it is inaccurate to conclude that increasing volume always has the same effect.

When the interaction is not significant, we can focus our interpretation on the main effects, because then they have a more consistent effect. For completeness, however, always perform all analyses of significant main and interaction effects, and report all significant and nonsignificant results.
Report each Fobt using the same format we used for the one-way ANOVA. Also, the size of η² for each main effect and the interaction can guide your conclusions. In our study, all of our effects are rather large, with volume and gender accounting for 28% and 23% of the variance, respectively, and the interaction accounting for 25% of the variance. Therefore, they are all of equal importance in understanding differences in persuasiveness scores. However, had any of the effects been small, we would downplay the role of that effect in our interpretations. In particular, if the interaction's effect is small, then although the interaction contradicts the main effects, it is only slightly and inconsistently contradictory. In such cases, you may focus your interpretation on significant main effects that have a more substantial effect.

So, looking at the significant cell means in Table 12.6, we conclude our persuasiveness study by saying that increasing the volume of a message beyond soft tends to increase persuasiveness scores in the population, but this increase occurs for females with medium volume and for males with loud volume. Further, differences in persuasiveness scores occur between males and females in the population, but only if the volume of the message is loud.

The primary interpretation of a two-way ANOVA may focus on the significant interaction.

Table 12.6 Summary of Significant Differences in the Persuasiveness Study
In the original figure, lines connect each pair of means that differ significantly (the pairs identified in sections 12-4a and 12-4c).

                           Factor A: Volume
                   A1: Soft     A2: Medium    A3: Loud
Factor B: Gender
  B1: Male             8            11           16.67        X̄male = 11.89
  B2: Female           4            12            6           X̄fem = 7.33
                  X̄soft = 6    X̄med = 11.5   X̄loud = 11.33

USING SPSS
See Review Card 12.4 for instructions for performing the two-way between-subjects ANOVA. In addition, you can choose for SPSS to (1) compute all descriptive statistics, including main effect and cell means; (2) graph the main effects and interaction; and (3) perform the Tukey HSD test, but on main effects only. You must perform the HSD test on the interaction as described in this chapter and in Appendix A.4. Also, you must compute each η².

Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com

STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. (a) When do you perform the two-way ANOVA? (b) What information can be obtained from a two-way ANOVA that cannot be obtained from two one-way designs that test the same factors?
2. Which type of ANOVA is used in a two-way design when (a) both factors are tested using independent samples? (b) one factor involves independent samples and the other factor involves related samples? (c) both factors involve related samples?
3. Explain the following terms: (a) factorial; (b) cell; (c) collapsing a factor.
4. What is the difference between a main effect mean and a cell mean?
5.
(a) What do we mean by the "main effect" of factor A? (b) How are the main effect means of factor A computed? (c) What does a significant FA indicate about these means?
6. (a) How do we obtain the means examined in an interaction effect? (b) What does a significant interaction effect indicate about the factors and the dependent scores?
7. (a) Identify the F-ratios computed in a two-way ANOVA and what they apply to. (b) What must be done for each significant effect in a two-way ANOVA before interpreting the experiment? (c) Why must this be done?
8. (a) What is a confounded comparison, and how do you spot it in a study's diagram? (b) What is an unconfounded comparison, and when does it occur in a study's diagram? (c) Why don't we perform post hoc tests on confounded comparisons?
9. We study the effect of factor A, which is four different memory techniques, and factor B, which is participants' age of 15, 20, or 25 years. We test 10 participants per cell. (a) Draw a diagram of this study. (b) Using two numbers, describe the design. (c) What is the n in each level when computing the main effect means for the memory factor? (d) What is the n in each level when computing the main effect means for the age factor? (e) What is the n in each group when performing the HSD test on the interaction?
10. For a 2 × 2 ANOVA, describe in words the statistical hypotheses for factor A, factor B, and A × B.
11. (a) What is the major difference when computing the HSD for a main effect and for an interaction? (b) What is the major difference when finding the differences between the means in the HSD test for a main effect and for an interaction?
12. (a) When is it appropriate to compute η² in a two-way ANOVA? (b) For each effect, what does it tell you?
13. The diagram below shows the means from a study.

        A1      A2      A3
B1      10       8       7      8.33
B2       9      13      18     13.33
       9.5    10.5    12.5

(a) Does there appear to be an interaction, and if so, why? (b) What are the main effect means for A, and what is the apparent conclusion about the main effect of factor A? (c) What are the main effect means for B, and what is the apparent conclusion about this main effect? (d) Graph the interaction.
14. Joanne examined eye–hand coordination scores for three levels of reward and three levels of practice. She obtained the means given below.

                           Practice
                     Low    Medium    High
Reward    Low          4       10        7       7
          Medium       5        5       14       8
          High        15       15       15      15
                       8       10       12

(a) What are the main effect means for reward, and if Fobt is significant, what do they appear to indicate? (b) What are the main effect means for practice, and if Fobt is significant, what do they appear to indicate? (c) What procedure should be performed to confirm your answers in parts a and b? (d) What else has she forgotten to examine?
15. In question 14: (a) If FA×B is significant, what is the apparent conclusion about scores from the viewpoint of increasing rewards? (b) How does the interaction contradict your conclusions about the main effect of rewards in question 14? (c) What should she do next?
(d) How would you find unconfounded comparisons of the cell means? (e) The dfwn = 60. For this HSD, what is the appropriate qk?
16. Felix measured participants' preferences for two brands of soft drinks (factor A). For each brand he tested male and female participants (factor B). The ANOVA produces all significant Fs. The MSwn = 9.45, n = 11 per cell, and dfwn = 40. The means are below.

                        Factor A
                 A1: Brand X    A2: Brand Y
B1: Males             14             29          21.5
B2: Females           25             12          18.5
                    19.5           20.5

(a) What are the main effect means for brands? Describe this main effect on preferences. (b) What are the main effect means for gender? Describe this main effect on preferences. (c) Perform Tukey's HSD test where appropriate. (d) Describe the interaction. (e) Describe a graph of the interaction when factor A is on the X axis. (f) Why does the interaction contradict your conclusions about the main effects?
17. Below are the cell means of three experiments. For each, compute the main effect means and decide whether there appears to be an effect of A, B, and/or A × B.

   Study 1             Study 2             Study 3
        A1    A2            A1    A2            A1    A2
B1       2     4    B1      10     5    B1       8    14
B2      12    14    B2       5    10    B2       8     2

18. In question 17, if you label the X axis with factor A and graph the cell means, what pattern will you see for each interaction?
19. We classified participants as high- or low-frequency cell phone users, and also as having one of four levels of part-time income (from low to high). The dependent variable was satisfaction with their social lives. The ANOVA produced only a significant main effect for income level and a significant interaction effect. What can you conclude about differences between mean satisfaction scores occurring with (a) phone usage? (b) income level? (c) the interaction?
20. You measure the dependent variable of participants' relaxation level as a function of whether they meditated before being tested, and whether they were shown a film containing a low, medium, or high amount of fantasy. Here are the data and the ANOVA.

                             Amount of Fantasy
                     Low              Medium           High
Meditation           5 6 2 2 5        7 5 6 9 5        9 8 10 10 10
No Meditation        10 10 9 10 10    2 5 4 3 2        5 6 5 7 6

Source               Sum of Squares    df    Mean Square    F
A: Fantasy                   42.467     2         21.233    13.134
B: Meditation                  .833     1           .833      .515
A × B: Interaction          141.267     2         70.633    43.691
Within                       38.800    24          1.617
Total                       223.367    29

(a) Which effects are significant? (b) Compute the main effect means and the interaction means. (c) Perform the Tukey HSD test where appropriate. (d) What do you conclude about the relationship(s) this study demonstrates? (e) Evaluate the impact of each effect.
21. A 2 × 2 design tests participants' frustration levels when solving problems as a function of the difficulty of the problem and whether they are math or logic problems. The results are that logic problems produce significantly more frustration than math problems; greater difficulty leads to significantly greater frustration; and difficult math problems produce significantly greater frustration than difficult logic problems, but the reverse is true for easy problems.
Which effects are significant in the ANOVA?
22. In question 21, say instead the researcher found no difference between math and logic problems, but frustration significantly increases with greater difficulty, and this is true for both math and logic problems. Which effects are significant in this ANOVA?
23. Summarize the steps in analyzing a two-way experiment and describe what each step accomplishes.
24. (a) What do researchers do to create a design that fits a two-way ANOVA? (b) What must be true about the dependent variable? (c) Which versions of ANOVA are available?
25. For the following, identify the parametric procedure to perform. (a) We measure babies' irritability when their mothers are present and when they are absent. (b) We test the driving ability of participants who score either high, medium, or low on the trait of "thrill seeker." For each type, we test some participants who have had either 0, 1, or 2 accidents. (c) We compare the degree of alcoholism in participants with alcoholic parents to those with nonalcoholic parents. (d) Our participants identify visual patterns after sitting in a dim room for 1 minute, again after 15 minutes, and again after 30 minutes. (e) To test if creativity scores change with age, we test groups of 5-, 10-, and 15-year-olds. (f) We measure the happiness of some mothers and the number of children they have to determine if happiness and number of children are related.

Chapter 13 CHI SQUARE AND NONPARAMETRIC PROCEDURES

LOOKING BACK Be sure you understand:
• From Chapter 2, the four types of measurement scales (nominal, ordinal, interval, and ratio).
• From Chapter 9, the independent-samples t-test and the related-samples t-test.
• From Chapter 11, the one-way between-subjects or within-subjects ANOVA.
• From Chapter 12, what a two-way interaction indicates.

GOING FORWARD Your goals in this chapter are to learn:
• When to use nonparametric statistics.
• The logic and use of the one-way chi square.
• The logic and use of the two-way chi square.
• The names of the nonparametric procedures used with ordinal scores.

Sections
13-1 Parametric versus Nonparametric Statistics
13-2 Chi Square Procedures
13-3 The One-Way Chi Square: The Goodness of Fit Test
13-4 The Two-Way Chi Square: The Test of Independence
13-5 Statistics in the Research Literature: Reporting χ²
13-6 A Word about Nonparametric Procedures for Ordinal Scores

Previous chapters have discussed the category of inferential statistics called parametric procedures. Now we'll turn to the other category, called nonparametric statistics. Nonparametric procedures are still used for deciding whether the relationship in the sample accurately represents the relationship in the population. Therefore, H0 and Ha, sampling distributions, Type I and Type II errors, alpha, critical values, and significance all apply. Although a number of different nonparametric procedures are available, we'll focus on the most common ones. This chapter presents (1) the one-way chi square, (2) the two-way chi square, and (3) a brief review of the procedures for ordinal scores.
13-1 PARAMETRIC VERSUS NONPARAMETRIC STATISTICS

Previous parametric procedures have required that our dependent scores involve an interval or ratio scale, that the scores are normally distributed, and that the population variances are homogeneous. But sometimes researchers obtain data that do not fit these requirements. Some dependent variables are nominal variables (e.g., whether someone is male or female). Sometimes we can measure a dependent variable only by assigning ordinal scores or "ranks" (e.g., judging this participant as showing the most of an attribute, this one the second-most, and so on). And sometimes a variable involves an interval or ratio scale, but the populations are severely skewed and/or do not have homogeneous variance (e.g., we saw that yearly income forms a positively skewed distribution).

It is better to design a study that allows us to use parametric procedures, because they are more powerful than nonparametric procedures. Recall that this means we are less likely to make a Type II error, which is missing a relationship that actually exists in nature. On the other hand, if our data violate the rules of a parametric procedure, then we increase the probability of making a Type I error (rejecting H0 when it's true), so that the actual probability of a Type I error will be larger than the alpha level we've set. Therefore, when data clearly do not fit a parametric procedure, we turn to nonparametric procedures.

Nonparametric statistics are inferential procedures used with either nominal or ordinal (ranked) data. That is, some nonparametric procedures are appropriate if we originally measure participants using nominal scores. Other nonparametric procedures are appropriate for ordinal scores. However, we have two ways of obtaining such scores: Our original raw scores may indicate each participant's rank, or our original scores may be interval or ratio scores that violate the rules of parametric procedures, so we transform the scores into ranks, assigning the highest score a "1," the next highest a "2," and so on. Either way, we then apply a nonparametric procedure for ordinal data.

Use nonparametric statistics when dependent scores are measured using ordinal or nominal scales.

In published research, the most common nonparametric data is nominal data, and then the chi square procedure is performed.
13-2 CHI SQUARE PROCEDURES

Chi square procedures are performed when participants are measured using a nominal variable. With nominal variables we do not measure an amount, but rather we indicate the category that participants fall into, and then count the number (the frequency) of individuals in each category. Thus, we have nominal variables when counting how many individuals answer yes or no to a question; how many claim to vote Republican, Democratic, or Socialist; how many say they were or were not abused children; and so on. After counting the frequency of "category membership" in the sample, we want to draw inferences about the population. For example, we might find that out of 100 people, 40 say yes to a question and 60 say no. These numbers indicate how the frequencies are distributed in the sample. But can we then infer that if we asked the entire population this question, 40% would also say yes and 60% would say no, or would we see a different distribution, say with a 50-50 split?

To make inferences about the frequencies in the population, we perform the chi square procedure (pronounced "kigh square"). The chi square procedure is the nonparametric procedure for testing whether the frequencies in each category in sample data represent specified frequencies in the population. The symbol for the chi square statistic is χ².

Theoretically, there is no limit to the number of categories (levels) you may have in a variable and no limit to the number of variables you may have. Therefore, we describe a chi square design in the same way we described ANOVA: When a study has only one variable, we use the one-way chi square; when a study has two variables, we use the two-way chi square.

Use the chi square procedure (χ²) when you count the number of participants falling into different categories.

13-3 THE ONE-WAY CHI SQUARE: THE GOODNESS OF FIT TEST

The one-way chi square is computed when data consist of the frequencies with which participants belong to the different categories of one variable. Here is an example. Being right-handed or left-handed is apparently related to brain organization, and many of history's great geniuses were left-handed. Therefore, using an IQ test, we select a sample of 50 geniuses. Then we count how many are left- or right-handed (ambidextrous is not an option). The results are shown here:

Handedness
Left-Handers      Right-Handers
fo = 10           fo = 40            k = 2; total fo = N = 50

Each column contains the frequency with which participants are in that category. We call this the observed frequency, symbolized by fo. The sum of the fos from all categories equals N, the total number of participants in the study. Notice that k stands for the number of categories, or levels; here k = 2. So, 10 of the 50 geniuses (20%) are left-handers, and 40 of them (80%) are right-handers. We might argue that the same distribution of 20% left-handers and 80% right-handers would occur in the population of all geniuses.
But there is the usual problem: sampling error. Maybe our sample is unrepresentative, so that in the population of all geniuses we would not find this distribution of right- and left-handers. Maybe our results poorly represent some other distribution. As usual, this is the null hypothesis, implying that we are being misled by sampling error. Technically, there is a relationship in our data here, because the frequencies change as handedness changes. Usually, researchers test the H0 that there is no difference among the frequencies in the categories in the population, meaning there is no relationship in the population. For the moment we'll ignore that there are more right-handers than left-handers in the world. Therefore, our H0 is that the frequencies of left- and right-handed geniuses are equal in the population. We have no conventional way to write this in symbols, so simply write H0: all frequencies in the population are equal. This implies that if the observed frequencies (fo) in the sample are not equal, it is because of sampling error.

The alternative hypothesis always implies that the relationship does exist in the population, so here it implies that, at a minimum, the frequencies of left- and right-handed geniuses are not equal. However, like in ANOVA, a study may involve more than two levels of a variable, and all levels need not represent a difference in the population. Therefore, our general way of stating Ha is: not all frequencies in the population are equal. For our handedness study, Ha implies that our observed frequencies represent the frequencies of left- and right-handers found in the population of geniuses. We can test only whether the frequencies are or are not equal, so the one-way χ² tests only two-tailed hypotheses.
Also, the one-way χ² has five assumptions: (1) Participants are categorized along one variable having two or more categories, and we count the frequency in each category. (2) Each participant can be in only one category (i.e., you cannot have repeated measures). (3) Category membership is independent: The fact that an individual is in one category does not influence the probability that another participant will be in any category. (4) We include the responses of all participants in the study (i.e., you would not count only the number of right-handers, or in a different study, you would count both those who agree and those who disagree with a statement). (5) For theoretical reasons, the "expected frequencies" must be at least 5 per category.

13-3a Computing the One-Way χ²

The first step in computing χ² is to translate H0 into the expected frequency for each category. The expected frequency is the frequency we expect in a category if the sample data perfectly represent the distribution of frequencies in the population described by H0. The symbol for an expected frequency is fe. Our H0 is that the frequencies of left- and right-handedness are equal. We translate this into the expected frequency in each group based on our N. If our sample perfectly represented equal frequencies, then out of our 50 participants, 25 should be right-handed and 25 should be left-handed. Thus, the expected frequency in each category is fe = 25. When H0 is that the frequencies in the categories are equal, the fe will be the same in all categories, and there's a shortcut for computing it:

THE FORMULA FOR EACH EXPECTED FREQUENCY WHEN TESTING AN H0 OF NO DIFFERENCE IS: fe in each category = N/k

Thus, in our study, with N = 50 and k = 2, the fe in each category = 50/2 = 25. (Sometimes fe may contain a decimal. For example, if we included a third category, ambidextrous, then k = 3, and each fe would be 16.67.) For the handedness study we have these frequencies:

Handedness
Left-Handers      Right-Handers
fo = 10           fo = 40
fe = 25           fe = 25

The χ² compares the difference between our observed frequencies and the expected frequencies. We compute an obtained χ², which we call χ²obt.

THE FORMULA FOR THE CHI SQUARE IS: χ²obt = Σ[(fo − fe)²/fe]

This says to find the difference between fo and fe in each category, square that difference, and then divide it by the fe for that category. After doing this for all categories, sum the quantities, and the answer is χ²obt. Thus, altogether we have:

STEP 1: Compute the fe for each category. We computed our fe to be 25 per category.

STEP 2: Create the fraction (fo − fe)²/fe for each category. The formula becomes
χ²obt = (10 − 25)²/25 + (40 − 25)²/25

STEP 3: Perform the subtraction in the numerator of each fraction. After subtracting,
χ²obt = (−15)²/25 + (15)²/25

STEP 4: Square the numerator in each fraction. This gives
χ²obt = 225/25 + 225/25

STEP 5: Perform the division in each fraction and then sum the results.
χ²obt = 9 + 9 = 18, so χ²obt = 18

STEP 6: Compare χ²obt to χ²crit. This is discussed below.
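For readers checking this outside of hand computation, here is a minimal sketch assuming SciPy is available; with no expected frequencies supplied, scipy.stats.chisquare tests this same H0 of equal frequencies.

```python
# A minimal sketch of the one-way chi square, assuming SciPy is available.
# With no f_exp argument, scipy.stats.chisquare uses fe = N/k for every
# category, matching the H0 of no difference.
from scipy.stats import chisquare

observed = [10, 40]  # left- and right-handed geniuses
result = chisquare(observed)
print(result.statistic)  # 18.0, matching the chi-square computed above
print(result.pvalue)     # well below .05, so the result is significant
```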
13-3b Interpreting the One-Way χ²

We interpret χ²obt by determining its location on the χ² sampling distribution. The χ²-distribution contains all possible values of χ² that occur when H0 is true. Thus, for the handedness study, the χ²-distribution is the distribution of all possible values of χ² when the frequencies in the two categories in the population are equal. You can envision the χ²-distribution as shown in Figure 13.1.

[Figure 13.1: Sampling Distribution of χ² When H0 Is True. The distribution starts at χ² = 0 and is skewed to the right, with the region of rejection (α = .05) in the upper tail beyond χ²crit.]

Even though the χ²-distribution is not at all normal, it is used in the same way as previous sampling distributions. When the data perfectly represent the H0 situation, so that each fo equals its fe, then χ² is zero. However, sometimes by chance the observed frequencies differ from the expected frequencies, producing a χ² greater than zero. The larger the differences, the larger the χ². But the larger the χ², the less likely it is to occur when H0 is true. Because χ² can become only larger, we again have two-tailed hypotheses but one region of rejection.

To determine if χ²obt is significant, we compare it to the critical value, symbolized by χ²crit. As with previous statistics, the χ²-distribution changes shape as the degrees of freedom change, so to find the appropriate value of χ²crit, first determine the degrees of freedom.

THE FORMULA FOR THE DEGREES OF FREEDOM IN A ONE-WAY CHI SQUARE IS: df = k − 1

Remember that k is the number of categories.

Find the critical value of χ² in Table 6 in Appendix B, titled "Critical Values of Chi Square." For the handedness study, k = 2 so df = 1, and with α = .05, the χ²crit = 3.84. Our χ²obt of 18 is larger than this χ²crit, so the results are significant: The differences between our observed and expected frequencies are so large that they, and the χ²obt they produce, are unlikely to occur if H0 is true. Therefore, we reject the H0 that our categories are poorly representing a distribution of equal frequencies in the population (rejecting that we are poorly representing that geniuses are equally left- or right-handed).

When our χ²obt is significant, we accept the Ha that the sample represents frequencies in the population that are not equal. In fact, as in our samples, we would expect to find about 20% left-handers and 80% right-handers in the population of geniuses. We conclude that we have evidence of a relationship between the categories of handedness and the frequency with which geniuses fall into each. Then, as usual, we interpret the relationship, here attempting to explain what aspects of being left-handed and being a genius are related.

Unlike ANOVA, a significant one-way chi square that involves more than two conditions usually is not followed by post hoc comparisons. Instead, we simply use the observed frequency in each category to estimate the frequencies that would be found in the population. Also, there is no measure of effect size here.

If χ²obt had not been significant, we would not reject H0 and would have no evidence, one way or the other, regarding how handedness is distributed among geniuses.
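For readers working in Python rather than from Table 6, the critical values can be sketched from the χ²-distribution directly, assuming SciPy is available.

```python
# A sketch of looking up chi-square critical values, assuming SciPy.
from scipy.stats import chi2

print(round(chi2.ppf(1 - .05, df=1), 2))  # 3.84, the critical value used here
print(round(chi2.ppf(1 - .05, df=2), 2))  # 5.99, used in the Quick Practice below
```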
13-3c The "Goodness of Fit" Test

Notice that the one-way chi square procedure is also called the goodness of fit test: Essentially, it tests how "good" the "fit" is between our data and the frequencies we expect if H0 is true. This is simply another way of asking whether sample data are likely to represent the frequencies in the population described by H0.

This name is especially appropriate when we test an H0 that does not say the frequencies in all categories are equal in the population. Instead, from past research or from a hypothesis, we may create a "model" of how the frequencies in the different categories are distributed. Then we compute the expected frequencies (fe) using the model and test whether the data "fit" the model. For example, in the handedness study we ignored the fact that right-handers are more common in the real world than left-handers. Only about 10% of the general population is left-handed, so we should have tested whether our geniuses fit this model. Now H0 is that our geniuses are like the general population, being 10% left-handed and 90% right-handed. Our Ha is that the data represent a population of geniuses that does not have this distribution.

Each fe is again based on our H0. Say that we had tested our previous 50 geniuses. Our H0 says that left-handed geniuses should occur 10% of the time: 10% of 50 is 5, so fe = 5. Right-handed geniuses should occur 90% of the time: 90% of 50 is 45, so fe = 45. Then we compute χ²obt as we did previously:

χ²obt = Σ[(fo − fe)²/fe] = (10 − 5)²/5 + (40 − 45)²/45 = 5.56

With α = .05 and k = 2, the χ²crit is again 3.84. Therefore, the χ²obt of 5.56 is significant: We reject H0 and conclude that the observed frequencies are significantly different from what we would expect if handedness in the population of geniuses was distributed as it is in the general population. Instead, we estimate that in the population of geniuses, 20% are left-handers and 80% are right-handers. (The goodness of fit test is thus another name for the one-way chi square procedure, because it tests how "good" the "fit" is between the data and H0.)
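The same test can be sketched in Python by supplying the model's expected frequencies, again assuming SciPy is available.

```python
# A sketch of the "goodness of fit" variant, assuming SciPy: the expected
# frequencies now follow the 10%/90% handedness model, not an equal split.
from scipy.stats import chisquare

observed = [10, 40]   # left- and right-handed geniuses
expected = [5, 45]    # 10% and 90% of N = 50 under the model
result = chisquare(observed, f_exp=expected)
print(round(result.statistic, 2))  # 5.56, matching the computation above
print(result.pvalue < .05)         # True: reject the model's H0
```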
CHI SQUARE: THE TEST OF INDEPENDENCE The two-way chi square is used when we count the frequency of category membership along two variables. (This is similar to the factorial ANOVA discussed in the previous chapter.) The procedure for computing x 2 is the same regardless of the number of categories in each variable. The assumptions of the two-way chi square are the same as for the one-way chi square. 13-4a Logic of the Two-Way Chi Square Here is a study that calls for a two-way chi square. At one time psychologists claimed that someone with a “Type A” personality tends to be very pressured and never seems to have enough time. The “Type B” personality, however, tends not to be so pressured, and is more relaxed and mellow. A controversy developed over whether people with Type A personalities are less healthy, especially when it comes to having heart attacks. Therefore, say that we select a sample of 80 people and determine how many are Type A and how many are Type B. We then count the frequency of heart attacks in each type. We must also count how many in each type have not had heart attacks. Therefore, we have two categorical variables: personality type (A or B) and health (heart attack or no heart attack). Table 13.1 shows the layout of this study. Notice, with two rows and two columns, this is a 2 2 (“2 by 2”) matrix, so we have a 2 2 design. With different variables, the design might be a 2 3, a 3 4, etc. Although this looks like a two-way ANOVA, it is not analyzed like one. The two-way x 2 is also called the test of independence: It tests only whether the frequency of participants falling into the categories of one Table 13.1 A Two-Way Chi Square Design Comparing Participants’ Personality Type and Health > Answers 4. 1; 3.84 5. significant; 35; 65 2 3. xobt (30 21)2 (30 39)2 5.40 30 30 1. frequency; categories 2. Each fe 60/2 30. 224 Health Heart Attack No Heart Attack Personality Type Type A Type B fo fo fo fo Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. two-way chi square The © iStockphoto.com/Eric Isselée © iStockphoto.com/Eric Isselée procedure for testing whether category membership on one variable is independent of category membership on the other variable; also called the “Test of Independence” Table 13.3 Example of Dependence Personality type and heart attacks are perfectly dependent. Personality Type Type A Type B Health Heart Attack fo 40 fo 0 No Heart Attack fo 0 fo 40 © iStockphoto.com/Dino Ablakovic variable is independent of, or unrenrelated to, the frequency of their falling alling into the categories of the other variable. riable. Thus, in our example we will testt whether the frequencies of having or not having aving a heart attack are independent of the frequencies quencies of being Type A or Type B. Essentially, the two-way x 2 tests the interaction, which, as in the two-way ANOVA, tests whether the influence of one factor depends on the level of the other factor that is present. 
Thus, we’ll ask, “Does the frequency of people having heart attacks depend on their frequency of being Type A or Type B?” To understand “independence,” Table 13.2 shows an example where category membership is perfectly independent. Here, the frequency of having or not having a heart attack does not depend on the frequency of being Type A or Type B. Another way to view the two-way x 2 is as a test of whether a correlation exists between the two variables. When variables are independent, there is no correlation, and using the categories from one variable is no help in predicting the frequencies for the other variable. Here, knowing if people are Type A or Type B does not help to predict if they have heart attacks (and heart attacks do not help in predicting personality type). How However, Table 13.3 shows a pattern we might see when the variables are totally dependent. Here, the freto quency of a heart attack or no heart q attack depends on personality type. at Likewise, a perfect correlation exists Like here because whether people are Type her A or o Type B is a perfect predictor of whether or not they have had a heart wh attack (and vice versa). attac Say that our actual data are shown in Table 13.4. A degree of dependence occurs here because a heart attack tends to be more frequent for Type A, while no Table 13.4 13 4 Observed Frequencies as a Function of Personality Type and Health Table 13.2 Example of Independence Personality type and heart attacks are perfectly independent. Personality Type Type A Type B Health Heart Attack fo 20 fo 20 No Heart Attack fo 20 fo 20 Personality Type Type A Type B Health Heart Attack fo 25 No Heart Attack fo 5 fo 10 fo 40 N 80 Chapter 13: Chi Square and Nonparametric Procedures 225 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 1. H0 is that category membership on one variable is independent of (not correlated with) category membership on the other variable. If the sample data look correlated, this is due to sampling error. 2. Ha is that category membership on the two variables in the population is dependent (correlated). 13-4b Computing the Two-Way Each fe is based on the probability of a participant falling into the cell if the two variables are independent. The expected frequency then equals this probability multiplied by N. Luckily, the steps involved in this can be combined to produce this formula: THE FORMULA FOR THE EXPECTED FREQUENCY IN A CELL OF A TWO-WAY CHI SQUARE IS: fe (Cell>s row total fo )(Cell>s column total fo ) N To find the fe for a particular cell, multiply the total observed frequency for the row containing the cell times the total observed frequency for the column containing the cell. Then divide by the N of the study. Thus, to compute the two-way x 2: STEP 1: Compute the fe for each cell. Table 13.5 shows the computations of all fe for the example. STEP 2: Compute 2obt. Use the same formula as in the one-way design, which is Chi Square Again, the first step is to compute the expected frequencies. To do so, first compute the total of the observed frequencies in each column and the total of the observed frequencies in each row. 
Thus, to compute the two-way χ²:

STEP 1: Compute the fe for each cell. Table 13.5 shows the computations of all fe for the example.

STEP 2: Compute χ²obt. Use the same formula as in the one-way design. First form a fraction for each cell: In the numerator, square the difference between the fe and fo for the cell; in the denominator is the fe for the cell. Thus, from the data in Table 13.5 we have
χ²obt = Σ[(fo − fe)²/fe]
      = (25 − 13.125)²/13.125 + (10 − 21.875)²/21.875 + (5 − 16.875)²/16.875 + (40 − 28.125)²/28.125

STEP 3: Perform the subtraction in the numerator of each fraction. After subtracting, we have
χ²obt = (11.875)²/13.125 + (−11.875)²/21.875 + (−11.875)²/16.875 + (11.875)²/28.125

STEP 4: Square the numerator in each fraction. This gives
χ²obt = 141.016/13.125 + 141.016/21.875 + 141.016/16.875 + 141.016/28.125

STEP 5: Perform the division in each fraction and then sum the results.
χ²obt = 10.74 + 6.45 + 8.36 + 5.01, so χ²obt = 30.56

STEP 6: Compare χ²obt to χ²crit. First, determine the degrees of freedom. In the diagram of your study, count the number of rows and columns. Then:

THE FORMULA FOR THE DEGREES OF FREEDOM IN A TWO-WAY CHI SQUARE IS:
df = (number of rows − 1)(number of columns − 1)

For our study, df is (2 − 1) multiplied times (2 − 1), which is 1. Find the critical value of χ² in Table 6 in Appendix B. At α = .05 and df = 1, the χ²crit is 3.84. Our χ²obt of 30.56 is larger than χ²crit, so it is significant: The differences between our observed and expected frequencies are unlikely to occur if our data represent variables that are independent. Therefore, we reject H0 that the variables are independent and accept the alternative hypothesis that the sample represents variables that are dependent in the population. In other words, the correlation is significant, such that the frequency of having or not having a heart attack depends on the frequency of being Type A or Type B (and vice versa). If χ²obt is not larger than the critical value, we do not reject H0. Then we cannot say whether these variables are independent or not.

A significant two-way χ² indicates that the sample data are likely to represent variables that are dependent (correlated) in the population.
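The whole test can also be sketched with SciPy's contingency-table routine; correction=False disables a continuity correction the chapter does not use, so the output matches the hand computation.

```python
# A sketch of the full two-way test, assuming SciPy is available.
from scipy.stats import chi2_contingency

observed = [[25, 10],   # heart attack:    Type A, Type B
            [5, 40]]    # no heart attack: Type A, Type B

chi2, p, df, expected = chi2_contingency(observed, correction=False)
print(round(chi2, 2))  # 30.56, as computed above
print(df)              # 1
print(expected)        # the fe values from Table 13.5
print(p < .05)         # True: reject independence
```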
In other words, the correlation is significant such that the frequency of having or not © iStockphoto.com/Paul Kline For our study, df is (2 1) multiplied times (2 1), which is 1. Find the critical value of x 2 in Table 6 in 2 Appendix B. At a .05 and df 1, the xcrit is 3.84. A significant two-way chi square indicates a significant correlation between the two variables. To determine the size of this correlation, we compute one of two new correlation coefficients, either the phi coefficient or the contingency coefficient. If you have performed a 2 2 chi square and it is significant, compute the phi coefficient. The symbol for the phi coefficient is F, and its value can be between 0 and 1. Think of phi as comparing your data to the ideal situations that were illustrated back in Tables 13.2 and 13.3, when the variables are or are not dependent. A value of 0 indicates that the data are perfectly independent. The larger the value of phi, Chapter 13: Chi Square and Nonparametric Procedures 227 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. the closer the data come to being perfectly dependent. (Real research tends to find values in the range of .20 to .50.) THE FORMULA FOR THE PHI COEFFICIENT IS f 2 xobt B N 2 The formula says to divide the xobt by N (the total number of participants) and then find the square root. 2 For the heart attack study, xobt was 30.56 and N was 80, so f 2 xobt B N 30.56 2.382 .62 A 80 Thus, on a scale of 0 to 1, where 1 indicates perfect dependence, the correlation is .62 between the frequency of heart attacks and the frequency of personality types. Further, recall that by squaring a correlation coefficient we obtain the proportion of variance accounted for, which is the proportion of differences in one variable that is associated with the other variable. If we do not compute the square root in the formula above, we have f 2. For our study f 2 .38. This is analogous to r 2, indicating that about 38% of the differences in whether people have heart attacks are associated with differences in their personality type—and vice versa. The other correlation coefficient is the contingency coefficient, symbolized by C. This is used to describe a significant two-way chi square that is not a 2 2 design (when it is a 2 3, a 3 3, etc.). THE FORMULA FOR THE CONTINGENCY COEFFICIENT IS C 2 xobt 2 B N xobt > Quick Practice > > The two-way x2 is used when counting the frequency of category membership on two variables. The H0 is that category membership for one variable is independent of category membership for the other variable. More Examples We count the participants who like or dislike statistics and their gender. The H0 is that liking/disliking is independent of gender. The results are Like Dislike Male fo 20 fe 15 fo 10 fe 15 Total fo 30 Female fo 5 fe 10 fo 15 fe 10 Total fo 20 Total fo 25 Total fo 25 As above, first compute each fe (row total fo)(column total fo) fe N For example, for male–like: fe (30)(25)>50 15. 
Then 2 xobt a (fo fe)2 fe b (20 15)2 (10 15)2 15 15 (5 10)2 (15 10)2 10 10 2 xobt 8.334 df (Number of Rows 1)(Number of Columns 1) df (2 1)(2 1) 1 2 2 with a .05, xcrit 3.84, so xobt is significant: The frequency of liking/disliking statistics depends on whether participants are male or female. For Practice contingency coefficient (C ) The statistic that describes the strength of the relationship in a twoway chi square that is not a 2 2 design. 228 This says to first add N to the 2 xobt in the denominator. Then divide 2 that quantity into xobt , and then find the square root. Interpret C in the same way as f. Likewise, C 2 is analogous to f 2. 1. The two-way x 2 is used when counting the ______ with which participants fall into the ______ of two variables. 2. The H0 is that the categories of one variable are ______ of those of the other variable. (continued) Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 3. Below are the frequencies for people who are satisfied/dissatisfied with their job and who do/ don’t work overtime. What is the fe in each cell? Overtime N 50 Figure 13.2 Frequencies of (a) Left- and Right-Handed Geniuses and (b) Heart Attacks and Personality Type No Overtime Satisfied fo 11 fo 3 Dissatisfied fo 8 fo 12 2 4. Compute xobt . 40 30 f 20 10 2 5. The df ______ and xcrit ______. 0 6. What do you conclude about these variables? Left Right Handedness (a) > Answers 45 40 35 30 f 25 20 15 10 5 0 Heart attacks No heart attacks A B Personality type (b) 2 6. xobt is significant: The frequency of job satisfaction/ dissatisfaction depends on the frequency of overtime/no overtime. 5. 1; 3.84 (11 7.824)2 (3 6.176)2 2 4. xobt 7.824 6.176 (8 11.176)2 (12 8.824)2 4.968 11.176 8.824 3. For satisfied–overtime, fe 7.824; for satisfied–no overtime, fe 6.176; for dissatisfied–overtime, fe 11.176; for dissatisfied–no overtime, fe 8.824. 2. independent 1. frequency; categories 13-5 STATISTICS IN THE RESEARCH LITERATURE: REPORTING X 2 The chi square is reported like previous results, except that in addition to df, we also include the N. For example, in our one-way design involving geniuses and handedness, we tested an N of 50, df was 1, and 2 the significant xobt was 18. We report these results as x2(1, N 50) 18.00, p .05 We report a two-way x 2 using the same format. As usual, a graph is useful for summarizing the data. For a one-way design, label the Y axis with frequency and the X axis with the categories, and then plot the fo in each category. Because the X variable is nominal, we create a bar graph. The upper bar graph in Figure 13.2 shows the results of our handedness study. The lower graph in Figure 13.2 shows the bar graph for our heart attack study. To graph the data from a two-way design, place frequency on the Y axis and one of the nominal variables on the X axis. The levels of the other variable are indicated in the body of the graph. (This is similar to the way a two-way interaction was plotted in the previous chapter, except that here we create bar graphs.) 
13-6 A WORD ABOUT NONPARAMETRIC PROCEDURES FOR ORDINAL SCORES

Recall that we also have nonparametric procedures that are used with ordinal (rank-ordered) scores. The procedures are the same regardless of whether the original raw scores were ranks or were interval or ratio scores that you then transformed to ranks.
Although the computations for nonparametric procedures are different from those for parametric procedures, their logic and rules are the same. A relationship occurs here when the ordinal scores consistently change. For example, in an experiment we might see the scores change from predominantly low ranks in one condition (with many participants tied at, or near, 1st, 2nd, etc.) to predominantly higher ranks in another condition (with many participants at or near, say, 20th). The null hypothesis says our data show this pattern because of sampling error: We are poorly representing that no relationship occurs in the population, where each condition contains a mix of high and low ranks. The alternative hypothesis says our data reflect the relationship that would be found in the population. We test H0 by computing an obtained statistic that describes our data. By comparing it to a critical value, we determine whether the sample relationship is significant. If it is, then our data are so unlikely to occur when H0 is true that we reject that H0 was true for our study. Instead, we conclude that the predicted relationship exists in the population (in nature). If the data are not significant, we retain H0 and make no conclusion about the relationship, one way or the other.

Spearman correlation coefficient (rs) The coefficient that describes the linear relationship between pairs of ranked scores.
Mann–Whitney test The nonparametric version of the independent-samples t-test for ranked scores.
Wilcoxon test The nonparametric version of the related-samples t-test for ranked scores.
Kruskal–Wallis test The nonparametric version of the one-way between-subjects ANOVA for ranked scores.
Friedman test The nonparametric version of the one-way within-subjects ANOVA for ranked scores.

Perform nonparametric procedures for ranked data when the dependent variable is measured in, or transformed to, ordinal scores.

In the literature you will encounter a number of nonparametric procedures. The computations for each are found in more advanced textbooks (or you can use SPSS). We won't dwell on their computations because you are now experienced enough to compute and understand them if you encounter them. However, you should know when we use the most common procedures.

13-6a Common Nonparametric Procedures for Ranked Scores

1. The Spearman correlation coefficient is analogous to the Pearson correlation coefficient for ranked data. Its symbol is rs. It produces a number between ±1 that describes the strength and type of linear relationship that is present when data consist of pairs of X-Y scores that are both ordinal scores. If significant, rs estimates the corresponding population coefficient.
2. The Mann–Whitney test is analogous to the independent-samples t-test. It is performed when a study contains two independent samples of ordinal scores.
3. The Wilcoxon test is analogous to the related-samples t-test. It is performed when a study has two related samples of ordinal scores. Recall that related samples occur either through matching or through repeated measures.
4. The Kruskal–Wallis test is analogous to a one-way between-subjects ANOVA. It is performed when a study has one factor with at least three conditions, and each involves independent samples of ordinal scores.
5. The Friedman test is analogous to a one-way within-subjects ANOVA. It is performed when a study has one factor with at least three levels, and each involves related samples of ordinal scores.
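The text performs these procedures in SPSS; if you instead have Python with SciPy available, each of the five procedures has a counterpart in scipy.stats. A sketch, with small made-up rank data purely for illustration:

```python
from scipy import stats

ranks_x = [1, 2, 3, 4, 5, 6]    # pairs of ranked X-Y scores
ranks_y = [2, 1, 4, 3, 6, 5]
group1 = [1, 2, 4, 5]           # two independent samples of ranks
group2 = [3, 6, 7, 8]
before = [3, 1, 4, 2, 6]        # two related samples (e.g., repeated measures)
after = [5, 2, 6, 1, 8]
cond1, cond2, cond3 = [1, 2, 3], [4, 5, 6], [7, 8, 9]  # three conditions

rs, p = stats.spearmanr(ranks_x, ranks_y)            # Spearman rs
u, p = stats.mannwhitneyu(group1, group2)            # Mann-Whitney (independent samples)
w, p = stats.wilcoxon(before, after)                 # Wilcoxon (related samples)
h, p = stats.kruskal(cond1, cond2, cond3)            # Kruskal-Wallis (3+ independent samples)
f, p = stats.friedmanchisquare(cond1, cond2, cond3)  # Friedman (3+ related samples)
```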
USING SPSS
Check out Review Card 13.4 for instructions on using SPSS to perform the one-way or the two-way chi square procedure. The program computes χ²obt and provides the usual minimum α level. In the two-way design, the program also computes φ or C. Also, SPSS will perform the nonparametric procedures for ordinal scores discussed in this chapter. (Consult an advanced SPSS text.)

Need some extra practice? Be sure to complete all study problems at the end of each chapter. Tear out and use the Chapter Review Cards in the back of your book. Check out the additional study aids online in CourseMate at www.cengagebrain.com

STUDY PROBLEMS
(Answers for odd-numbered problems are in Appendix C.)
1. What do all nonparametric inferential procedures have in common with all parametric procedures?
2. (a) Which variable in an experiment determines whether to use parametric or nonparametric procedures? (b) Which two scales of measurement always require nonparametric procedures?
3. (a) What two things can be "wrong" with interval or ratio scores that lead us to use nonparametric procedures? (b) What must you do to the interval/ratio scores first?
4. (a) Why, if possible, should you design a study that meets the assumptions of a parametric test instead of a nonparametric test? (b) Explain the error this relates to. (c) Why shouldn't you use parametric procedures if data violate the assumptions? (d) Explain this error.
5. What do researchers do to create a design requiring a one-way chi square?
6. What do researchers do to create a design requiring a two-way chi square?
7. (a) What is the major difference between two studies if one uses the one-way ANOVA and the other uses the one-way chi square? (b) How is the purpose of the ANOVA and the chi square the same?
8. (a) What is the symbol for observed frequency and what does it refer to? (b) What is the symbol for expected frequency and what does it refer to?
9. (a) When calculating χ²obt, what makes it become a larger number? (b) Why does a larger χ²obt mean that H0 is less likely to be true? (c) What does the χ² sampling distribution show? (d) Why do we reject H0 when χ²obt is in the region of rejection and significant?
10. (a) Usually what is H0 in a one-way chi square? (b) How do we interpret a significant one-way chi square? (c) What is H0 in a two-way chi square? (d) How do we interpret a significant two-way chi square?
11. What are the two ways to go about computing the fe in a one-way χ², depending upon our experimental hypotheses?
12. (a) When is the phi coefficient computed, and when is the contingency coefficient computed? (b) What do both indicate?
13. A survey of 89 women finds that 34 prefer to go out with men much taller than themselves, and 55 prefer going out with men slightly taller than themselves. We ask whether there is really no preference in the population. (a) What procedure should we perform? (b) In words, what are H0 and Ha? (c) What must you compute before calculating χ²obt, and what are your answers? (d) Compute χ²obt. What do you conclude about the preferences of women in the population? (e) Describe how you would graph these results.
14. A report about an isolated community claims there are more newborn females than males, although we assume equal frequencies for each. Records indicate 628 boys and 718 girls born in the past month. (a) What are H0 and Ha? (b) Compute the appropriate statistic. (c) What do you conclude about birthrates in this community? (d) Report your results in the correct format.
15. The following data reflect the frequency with which people voted in the last election and were satisfied with the officials elected. We wonder if voting and satisfaction are correlated.

              Satisfied: Yes   Satisfied: No
Voted: Yes    48               35
Voted: No     33               52

(a) What procedure should we perform? (b) What are H0 and Ha? (c) What must you compute before calculating χ²obt, and what answers did you compute? (d) Compute χ²obt. (e) What do you conclude about these variables? (f) How consistent is this relationship?
16. As part of the above study, we also counted the frequency of the different political party affiliations for men and women to see if they are related. The following data were obtained:

          Affiliation
Gender    Republican   Democrat   Other
Men       18           43         14
Women     39           23         18

(a) What procedure should we perform? (b) What are H0 and Ha? (c) What do you compute next, and what are your answers? (d) Compute χ²obt. (e) What do you conclude about gender and party affiliation in the population? (f) How consistent is this relationship?
17. In the general population, political party affiliation is 30% Republican, 55% Democrat, and 15% Other. To determine whether these percentages also "fit" the elderly population, we ask a sample of 100 senior citizens and find 26 Republicans, 66 Democrats, and 8 Other. (a) What procedure should we perform? (b) What are H0 and Ha? (c) What must you compute next, and what are your answers? (d) Compute χ²obt. (e) What do you conclude about party affiliation among senior citizens?
18. After testing 40 participants, a significant χ²obt of 13.31 was obtained. With α = .05 and df = 2, how would this result be reported in a publication?
19. Foofy counts the students who like Professor Demented and those who like Professor Randomsampler. She then performs a one-way χ² to test for a significant difference between the frequencies of students liking each professor. (a) Why is this approach incorrect? (Hint: Check the assumptions of the one-way χ².) (b) How should she analyze the data?
20. (a) What is the name of the nonparametric correlation coefficient for ranked data, and what is its symbol? (b) What do you use it for?
21. What is the nonparametric version of each of the following? (a) a one-way between-subjects ANOVA; (b) an independent-samples t-test; (c) a related-samples t-test; (d) a one-way within-subjects ANOVA
22. A researcher performed the Mann–Whitney test and found a significant difference between psychologists and sociologists. Without knowing anything else, what does this tell you about the researcher's ultimate conclusions about sampling error versus a relationship in nature?
23. Select the nonparametric procedure to perform for the following: (a) We test the effect of a pain reliever on rankings of the emotional content of words describing pain. One group is tested before and after taking the drug. (b) We test the effect of four different colors of spaghetti sauce on its tastiness. A different sample tastes each color, and tastiness scores are ranked. (c) Last semester a teacher gave 25 As, 35 Bs, 20 Cs, and 10 Ds. According to college policy, each grade should occur 20% of the time. Is the teacher diverging from the college's model? (d) We examine the (skewed) reaction time scores after one group of participants consumes 1, 3, and then 5 alcoholic drinks. (e) We test whether two levels of family income produced a difference in the percentage of income spent on clothing last year. Percentages are then ranked.
24. (a) How do you recognize when you need to perform χ²? (b) How do you recognize whether to perform a one-way or a two-way χ²? (c) Summarize the steps when performing the χ² procedure.
25. Thinking back on this and previous chapters, what three aspects of the design of your independent variable(s) and one aspect of your dependent variable determine the specific inferential procedure to perform in a particular experiment?
appendix A
MATH REVIEW AND ADDITIONAL COMPUTING FORMULAS

Sections
A-1 Review of Basic Math
A-2 Computing Confidence Intervals for the Two-Sample t-Test
A-3 Computing the Linear Regression Equation
A-4 Computing the Two-Way Between-Subjects ANOVA
A-5 Computing the One-Way Within-Subjects ANOVA

A-1 REVIEW OF BASIC MATH

The following is a review of the math used in performing statistical procedures. There are accepted systems for identifying mathematical operations, for rounding answers, for computing a proportion and a percent, and for creating graphs.

A-1a Identifying Mathematical Operations

Here are the mathematical operations you'll use in statistics, and they are simple ones. Addition is indicated by the plus sign, so, for example, 4 + 2 is 6. (I said this was simple!) Subtraction is indicated by the minus sign. We read from left to right, so X − Y is read as "X minus Y." This order is important because 10 − 4, for example, is 6, but 4 − 10 is −6. With subtraction, pay attention to what is subtracted from what and whether the answer is positive or negative. Adding two negative numbers together gives a larger negative number, so −4 + −3 = −7. Adding a negative number to a positive number is the same as subtracting the negative number's amount, so 5 + (−2) = 5 − 2 = 3. When subtracting a negative number, a double negative produces a positive. Thus, in 4 − (−3), the minus 3 becomes +3, so we have 4 + 3 = 7.
We indicate division by forming a fraction, such as X/Y. The number above the dividing line is called the numerator, and the number below the line is called the denominator. Always express fractions as decimals, dividing the denominator into the numerator. (After all, 1/2 equals .5, not 2!)
Multiplication is indicated in one of two ways. We may place two components next to each other: XY means "X times Y." Or we may indicate multiplication using parentheses: 4(2) and (4)(2) both mean "4 times 2."
The symbol X² means square the score, so if X is 4, X² is 16. Conversely, √X means "Find the square root of X," so √4 is 2. (The √ symbol also means "Use your calculator.")

A-1b Determining the Order of Mathematical Operations

Statistical formulas often call for a series of mathematical steps. Sometimes the steps are set apart by parentheses. Parentheses mean "the quantity," so always find the quantity inside the parentheses first and then perform the operations outside of the parentheses on that quantity. For example, (2)(4 + 3) indicates to multiply 2 times "the quantity 4 plus 3." So first add, which gives (2)(7), and then multiply to get 14.
A square root sign also operates on "the quantity," so always compute the quantity inside the square root sign first. Thus, √(2 + 7) means find the square root of the quantity 2 plus 7; so √(2 + 7) becomes √9, which is 3.
Most formulas are giant fractions. Pay attention to how far the dividing line is drawn, because the length of a dividing line determines the quantity that is in the numerator and the denominator.
For example, you might see a formula that looks like this:

X = (6/3 + 14) / √64

The longer dividing line means you should divide the square root of 64 into the quantity in the numerator, so first work in the numerator. Before you can add 6/3 to 14, you must reduce 6/3 by dividing: 6/3 = 2. Then you have

X = (2 + 14) / √64

Now adding 2 + 14 gives 16, so

X = 16 / √64

Before we can divide, we must find the square root of 64, which is 8, so we have

X = 16/8 = 2

When working with complex formulas, perform one step at a time and then rewrite the formula. Trying to do several steps in your head is a good way to make mistakes.
If you become confused in reading a formula, remember that there is a rule for the order of mathematical operations. Often this is summarized with PEMDAS, or you may recall the phrase "Please Excuse My Dear Aunt Sally." Either way, the letters indicate that, unless otherwise indicated, first compute inside any Parentheses, then compute Exponents (squaring and square roots), then Multiply or Divide, and finally, Add or Subtract. Thus, for (2)(4) + 5, first multiply 2 times 4 and then add 5. For 2² + 3², first square each number, resulting in 4 + 9, which is then 13.
Finally, an important distinction is whether a squared sign is inside or outside of parentheses. Thus, in (2² + 3²) we square first, giving (4 + 9), so the answer is 13. But! For (2 + 3)² we add first, so 2 + 3 = 5, and then squaring gives (5)², so the answer is 25!

A-1c Working with Formulas

We use a formula to find an answer, and we have symbols that stand for that answer. For example, in the formula B = AX + K, the B stands for the answer we will obtain. The symbol for the unknown answer is always isolated on one side of the equal sign, but we will know the numbers to substitute for the symbols on the other side of the equal sign. For example, to find B, say that A = 4, X = 11, and K = 3. In working any formula, the first step is to copy the formula and then rewrite it, replacing the symbols with their known values. Thus, start with

B = AX + K

Filling in the numbers gives

B = 4(11) + 3

Rewrite the formula after performing each mathematical operation. Above, multiplication takes precedence over addition, so multiply and then rewrite the formula as

B = 44 + 3

After adding,

B = 47

Do not skip rewriting the formula after each step!
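These same precedence rules are what a programming language applies, so a computer is a handy way to check a hand calculation. A quick Python check of the two worked examples above (math.sqrt is Python's square-root function):

```python
import math

# X = (6/3 + 14) / sqrt(64): work the numerator first, then divide.
x = (6 / 3 + 14) / math.sqrt(64)
print(x)  # 2.0

# B = AX + K with A = 4, X = 11, K = 3: multiplication before addition.
a, x2, k = 4, 11, 3
print(a * x2 + k)  # 47

# Squaring inside versus outside parentheses:
print(2**2 + 3**2)   # 13 -- square first, then add
print((2 + 3) ** 2)  # 25 -- add first, then square
```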
A-1d Rounding Numbers

Close counts in statistics, so you must carry out calculations to the appropriate number of decimal places. Usually, you must "round off" your answer. The rule is this: Always carry out calculations so that your final answer after rounding has two more decimal places than the original scores. Usually, we have whole-number scores (e.g., 2 and 11), so the final answer contains two decimal places. But say the original scores contain one decimal place (e.g., 1.4 and 12.3). Here the final answer should contain three decimal places. So when beginning a problem, first decide the number of decimals that should be in your final answer.
However, if there are intermediate computing steps, do not round off to this number of decimals at each step. This will produce substantial error in your final answer. Instead, before rounding, carry out each intermediate step to more decimals than you'll ultimately need. Then the error introduced will be smaller. If the final answer is to contain two decimal places, round off your intermediate answers to at least three decimal places. Then, after you've completed all calculations, round off the final answer to two decimal places.

Round off your final answer to two more decimal places than are in the original scores.

To round off a calculation, use the following rules: If the number in the next decimal place is 5 or greater, round up. For example, to round to two decimal places, 2.366 is rounded to 2.370, which becomes 2.37. If the number in the next decimal place is less than 5, round down: 3.524 is rounded to 3.520, which becomes 3.52.
We add zeros to the right of the decimal point to indicate the level of precision we are using. For example, rounding 4.996 to two decimal places produces 5, but to show we used the precision of two decimal places, we report it as 5.00.

A-1e Computing Proportions and Percents

Sometimes we will transform an individual's original score into a proportion. A proportion is a decimal number between 0 and 1 that indicates a fraction of the total. To transform a number to a proportion, divide the number by the total. If 4 out of 10 people pass an exam, then the proportion of people passing the exam is 4/10, which equals .4. Or, if you score 6 correct on a test out of a possible 12, the proportion you have correct is 6/12, which is .5.
We can also work in the opposite direction, from a known proportion to find the number out of the total it represents. Here, multiply the proportion times the total. Thus, to find how many questions out of 12 you must answer correctly to get .5 correct, multiply .5 times 12, and voilà, the answer is 6.
We can also transform a proportion into a percent. A percent (or percentage) is a proportion multiplied by 100. Above, your proportion correct was .5, so you had (.5)(100), or 50%, correct. Altogether, to transform the original test score of 6 out of 12 to a percent, first divide the score by the total to find the proportion and then multiply by 100. Thus, (6/12)(100) equals 50%.
To transform a percent back into a proportion, divide the percent by 100 (above, 50/100 equals .5). Altogether, to find the test score that corresponds to a certain percent, transform the percent to a proportion and then multiply the proportion times the total number possible. Thus, to find the score that corresponds to 50% of 12, transform 50% to a proportion, which is .5, and then multiply .5 times 12. So, 50% of 12 is equal to (50/100)(12), which is 6.
Recognize that a percent is a whole unit: Think of 50% as 50 of those things called percents. On the other hand, a decimal in a percent is a proportion of one percent. Thus, .2% is .2, or two-tenths, of one percent, which is .002 of the total.
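A caution if you check rounding by computer: Python's built-in round() rounds halves to the nearest even digit, which does not match the "5 or greater rounds up" rule above. The decimal module can apply the rule exactly; a minimal sketch (the helper name round_off is ours):

```python
from decimal import Decimal, ROUND_HALF_UP

def round_off(value, places=2):
    # Apply the text's rule: if the next decimal place is 5 or greater, round up.
    quantum = Decimal("1").scaleb(-places)  # e.g., Decimal("0.01") for 2 places
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_off(2.366))  # 2.37
print(round_off(3.524))  # 3.52
print(round_off(4.996))  # 5.00 -- trailing zeros show the precision used
```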
A-1f Creating Graphs

Recall that the horizontal line across the bottom of a graph is the X axis, and the vertical line at the left-hand side is the Y axis. (Draw the Y axis so that it is about 60 to 75% of the length of the X axis.) Where the two axes intersect is always labeled as a score of zero on X and a score of zero on Y. On the X axis, scores become larger positive scores as you move to the right. On the Y axis, scores become larger positive scores as you move upward.
Say that we measured the height and weight of several people. We decide to place weight on the Y axis and height on the X axis. (How to decide this is discussed later.) We plot the scores as shown in Figure A.1.

Figure A.1: Plot of Height and Weight Scores

Person   Height   Weight
Jane     63       130
Bob      64       140
Mary     65       155
Tony     66       160
Sue      67       165
Mike     68       170

[The figure plots weight in pounds (130 to 170) on the Y axis against height in inches (63 to 68) on the X axis, with one data point per person.]

Notice that because the lowest height score is 63, the lowest label on the X axis is also 63. The symbol // in the axis indicates that we cut out the part between 0 and 63. We do this with either axis when there is a large gap between 0 and the lowest score we are plotting.
In the body of the graph, we plot the scores from the table above the graph. Jane is 63 inches tall and weighs 130 pounds, so we place a dot above the height of 63 and opposite the weight of 130. And so on. As mentioned in Chapter 2, each dot on a graph is called a data point.
Notice that you read the graph by using the scores on one axis and the data points. For example, to find the weight of the person who has a height of 67, travel vertically from 67 to the data point and then horizontally to the Y axis: 165 is the corresponding weight.
Always label the X and Y axes to indicate what the scores measure (not just X and Y), and always give your graph a title indicating what it describes.
Practice these concepts with the following practice problems.

For Practice
1. Round off the following numbers to two decimal places: (a) 13.7462 (b) 10.043 (c) 10.047 (d) .079 (e) 1.004
2. The intermediate answers in a formula based on whole-number scores are X = 4.3467892 and Y = 3.3333. What values of X and Y do we use when performing the next step in the calculations?
3. For Q = (X − Y)(X² + Y²), find Q when X = 8 and Y = 2.
4. Below, find D when X = 14 and Y = 3.
D = ((X − Y)/Y)(√X)
5. Using the formula in problem 4, find D when X = 9 and Y = −4.
6. (a) What proportion is 5 out of 15? (b) What proportion of 50 is 10? (c) One in a thousand equals what proportion?
7. Transform each answer in problem 6 to a percent.
8. Of the 40 students in a gym class, 35% played volleyball and 27.5% ran track. (a) What proportion of the class played volleyball? (b) How many students played volleyball? (c) How many ran track?
9. You can earn a total of 135 points in your statistics course. To pass you need 60% of these points. (a) How many points must you earn to pass the course? (b) You actually earned a total of 115 points. What percent of the total did you earn?
10. Create a graph showing the data points for the following scores:

X Score (Student's Age):        20  25  35  45  25  40  45
Y Score (Student's Test Score): 10  30  20  60  55  70  3
> Answers
1. (a) 13.75 (b) 10.04 (c) 10.05 (d) .08 (e) 1.00
2. Carry at least three places, so X = 4.347 and Y = 3.333.
3. Q = (8 − 2)(64 + 4) = (6)(68) = 408
4. D = (3.667)(3.742) = 13.72
5. D = (13/−4)(3) = (−3.250)(3) = −9.75
6. (a) 5/15 = .33 (b) 10/50 = .20 (c) 1/1000 = .001
7. (a) 33% (b) 20% (c) .1%
8. (a) 35%/100 = .35 (b) (.35)(40) = 14 (c) (27.5%/100)(40) = 11
9. (a) 60% of 135 is (60%/100)(135) = 81 (b) (115/135)(100) = 85%
10. [A graph titled "Plot of Students' Age and Test Scores," with test scores (0 to 80) on the Y axis plotted against age (20 to 45) on the X axis, one data point per pair of scores.]

A-2 COMPUTING CONFIDENCE INTERVALS FOR THE TWO-SAMPLE t-TEST

Two versions of a confidence interval can be used to describe the results from the two-sample t-test described in Chapter 9. For the independent-samples t-test we compute the confidence interval for the difference between two μs; for the related-samples t-test we compute the confidence interval for μD.

A-2a Confidence Interval for the Difference between Two μs

The confidence interval for the difference between two μs describes a range of differences between two μs, any one of which is likely to be represented by the difference between our two sample means. This procedure is appropriate when the sample means are from independent samples. For example, in Chapter 9 we discussed the experiment that compared recall scores under the conditions of hypnosis and no-hypnosis. We found a difference of 3 between the sample means (X̄1 − X̄2). If we could examine the corresponding μ1 and μ2, we'd expect that their difference (μ1 − μ2) would be around 3. We say "around" because we may have sampling error, so the actual difference between μ1 and μ2 might be 2 or 4. The confidence interval contains the highest and lowest values around 3 that the difference between our sample means is likely to represent.

THE FORMULA FOR THE CONFIDENCE INTERVAL FOR THE DIFFERENCE BETWEEN TWO μs IS

(sX̄1−X̄2)(−tcrit) + (X̄1 − X̄2) ≤ μ1 − μ2 ≤ (sX̄1−X̄2)(+tcrit) + (X̄1 − X̄2)

Here, μ1 − μ2 stands for the unknown difference we are estimating. The tcrit is the two-tailed value found for the appropriate α at df = (n1 − 1) + (n2 − 1). The values of sX̄1−X̄2 and (X̄1 − X̄2) are computed in the independent-samples t-test.

The confidence interval for the difference between two μs describes the difference between the population means represented by the difference between our sample means in the independent-samples t-test.

In the hypnosis study, the two-tailed tcrit for df = 30 and α = .05 is ±2.042, sX̄1−X̄2 is 1.023, and X̄1 − X̄2 is 3. Filling in the formula gives

(1.023)(−2.042) + (3) ≤ μ1 − μ2 ≤ (1.023)(+2.042) + (3)

Multiplying 1.023 times ±2.042 gives

−2.089 + (3) ≤ μ1 − μ2 ≤ +2.089 + (3)

So finally,

.911 ≤ μ1 − μ2 ≤ 5.089

Because α = .05, this is the 95% confidence interval: We are 95% confident that the interval between .911 and 5.089 contains the difference we'd find between the μs for no-hypnosis and hypnosis.
In essence, if someone asked how big the average difference is between when the population recalls under hypnosis and when it recalls under no-hypnosis, we'd be 95% confident the difference is between .91 and 5.09.

For Practice
1. In question 13 of the study problems in Chapter 9, what is the 95% confidence interval for the difference between the μs?
2. In question 21 of the study problems in Chapter 9, what is the 95% confidence interval for the difference between the μs?

> Answers
1. (1.78)(−2.048) + 4 ≤ μ1 − μ2 ≤ (1.78)(+2.048) + 4, so .35 ≤ μ1 − μ2 ≤ 7.65
2. (1.03)(−2.101) + (−2.6) ≤ μ1 − μ2 ≤ (1.03)(+2.101) + (−2.6), so −4.76 ≤ μ1 − μ2 ≤ −.44

A-2b Computing the Confidence Interval for μD

The other confidence interval is used with the related-samples t-test to describe the μ of the population of difference scores (μD) that is represented by our sample of difference scores (D̄). The confidence interval for μD describes a range of values of μD, one of which our sample mean is likely to represent. The interval contains the highest and lowest values of μD that are not significantly different from D̄.

THE FORMULA FOR THE CONFIDENCE INTERVAL FOR μD IS

(sD̄)(−tcrit) + D̄ ≤ μD ≤ (sD̄)(+tcrit) + D̄

The tcrit is the two-tailed value for df = N − 1, where N is the number of difference scores. The sD̄ is the standard error of the mean difference computed in the t-test, and D̄ is the mean of the difference scores. For example, in Chapter 9 we compared the fear scores of participants who had or had not received our phobia therapy. We found that the mean difference score in the sample was D̄ = 3.6, sD̄ = 1.25, and, with α = .05 and df = 4, tcrit is ±2.776. Filling in the formula gives

(1.25)(−2.776) + 3.6 ≤ μD ≤ (1.25)(+2.776) + 3.6

which becomes

(−3.47) + 3.6 ≤ μD ≤ (+3.47) + 3.6

and so

.13 ≤ μD ≤ 7.07

Thus, we are 95% confident that our sample mean of differences represents a population μD within this interval. In other words, if we performed this study on the entire population, we would expect the average difference in before- and after-therapy scores to be between .13 and 7.07.

For Practice
1. In question 15 of the study problems in Chapter 9, what is the 95% confidence interval for μD?
2. In question 17 of the study problems in Chapter 9, what is the 95% confidence interval for μD?

> Answers
1. (.75)(−2.365) + 2.63 ≤ μD ≤ (.75)(+2.365) + 2.63, so .86 ≤ μD ≤ 4.40
2. (.359)(−2.262) + 1.2 ≤ μD ≤ (.359)(+2.262) + 1.2, so .39 ≤ μD ≤ 2.01
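Both intervals have the same shape: a point estimate plus or minus the standard error times the two-tailed tcrit. If you want to check your answers by computer, here is a sketch assuming SciPy is available, with scipy.stats.t.ppf standing in for the t-tables (the function name confidence_interval is ours); the numbers are the two worked examples above:

```python
from scipy import stats

def confidence_interval(point_estimate, standard_error, df, alpha=0.05):
    # Two-tailed t critical value: put alpha/2 in each tail.
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    margin = standard_error * t_crit
    return point_estimate - margin, point_estimate + margin

# Difference between two mus (independent samples, hypnosis study):
low, high = confidence_interval(point_estimate=3.0, standard_error=1.023, df=30)
print(f"{low:.3f} <= mu1 - mu2 <= {high:.3f}")  # 0.911 <= mu1 - mu2 <= 5.089

# mu_D (related samples, phobia-therapy study):
low, high = confidence_interval(point_estimate=3.6, standard_error=1.25, df=4)
print(f"{low:.2f} <= mu_D <= {high:.2f}")       # 0.13 <= mu_D <= 7.07
```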
Notice that the numerator of the formula here is the same as the numerator of the formula for r, and that the denominator of the formula here is the left-hand quantity in the denominator of the formula for r. [An alternative formula is b (r)(SY /SX).] For example, in Chapter 10 we examined the relationship between daily juice consumption and yearly doctor visits and found r .95. In the data (in Table 10.1) we found X 17, (X)2 289, X2 45, Y 47, XY 52, and N 10. Filling in the above formula for b gives: b 10(52) (17)(47) 520 799 10(45) 289 450 289 279 1.733 161 Thus, in this negative relationship we have a negative slope of b 1.733. Next we compute the Y intercept, symbolized by a. This is the value of Y when the regression line crosses the Y axis. (Notice that a negative number is also possible here if the regression line crosses the Y axis at a point below the X axis.) THE FORMULA FOR THE Y INTERCEPT OF THE REGRESSION LINE IS a Y (b)(X ) Here we first multiply the mean of the X scores times the slope of the regression line. Then we subtract that quantity from the mean of the Y scores. For our example, X 1.70, Y 4.70, and b 1.733, so a 4.70 (1.733)(1.70) 4.70 (2.946) 240 Subtracting a negative number is the same as adding its positive value, so a 4.70 2.946 7.646 Thus, when we plot the regression line, it will cross the Y axis at the Y of 7.646. A-3a Applying the Linear Regression Equation We apply the slope and the Y intercept in the linear regression equation. THE LINEAR REGRESSION EQUATION IS Y bX a This says to obtain the Y for a particular X, multiply the X by b and then add a. For our example data, substituting our values of b and a, we have Y (1.733)X 7.646 This is the equation for the regression line that summarizes our juicedoctor visits data. To plot the regression line: We need at least two data points to plot a line. Therefore, choose a low value of X, insert it into the completed regression equation, and calculate the value of Y. Choose a higher value of X and calculate Y for that X. (Do not select values of X that are above or below those found in your original data.) Plot your values of X-Y and connect the data points with a straight line. To predict a Y score: To predict an individual’s Y score, enter his/her X score into the completed regression equation and compute the corresponding Y. This is the Y score we predict for anyone who scores at that X. A-3b The Standard Error of the Estimate We compute the standard error of the estimate to determine the amount of error we expect to have when we use a relationship to predict Y scores. Its symbol is SY. THE FORMULA FOR THE STANDARD ERROR OF THE ESTIMATE IS sY (SY )( 21r 2) This says to find the square root of the quantity 1 r2 and then multiply it times the standard deviation of all Y scores (SY). In our juicedoctor visits data in Table 10.1, Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. the Y 47, Y 2 275, and N 10. Thus, first we compute SY. 
For Practice
1. Compute the regression equation for each set of scores.
a.
X: 1 1 2 2 3 3
Y: 3 2 4 5 5 6
b.
X: 1 1 2 2 3 4
Y: 5 3 4 3 2 1
2. What will the standard error of the estimate indicate for each set of scores?

> Answers
1. a. b = (6(56) − (12)(25)) / (6(28) − (12)²) = 1.5; a = 4.167 − (1.5)(2) = 1.17; Y′ = 1.5X + 1.17
   b. b = (6(32) − (13)(18)) / (6(35) − (13)²) = −1.024; a = 3 − (−1.024)(2.167) = 5.219; Y′ = −1.024X + 5.219
2. It indicates the "average error" we expect between the actual Y scores and the predicted Y′ scores.

A-4 COMPUTING THE TWO-WAY BETWEEN-SUBJECTS ANOVA

As discussed in Chapter 12, the following presents the formulas for computing the two-way between-subjects ANOVA, the Tukey HSD test for main effects and interactions, and η².

A-4a Computing the ANOVA

Chapter 12 discusses a 3 × 2 design for the factors of volume of a message and participants' gender, and the dependent variable of persuasiveness. Organize the data as shown in Table A.1. The ANOVA involves five parts: computing (1) the sums and means, (2) the sums of squares, (3) the degrees of freedom, (4) the mean squares, and (5) the Fs.

COMPUTING THE SUMS AND MEANS
STEP 1: Compute ΣX and ΣX² in each cell. Note the n of the cell. For example, in the male–soft cell, ΣX = 4 + 9 + 11 = 24; ΣX² = 4² + 9² + 11² = 218; n = 3. Also, compute the mean in each cell (for the male–soft cell, X̄ = 8). These are the interaction means.
STEP 2: Compute ΣX vertically in each column of the study's diagram. Add the ΣXs from the cells in a column (e.g., for soft, ΣX = 24 + 12 = 36). Note the n in each column (here, n = 6) and compute the mean for each column (e.g., X̄soft = 6). These are the main effect means for factor A.
STEP 3: Compute ΣX horizontally in each row of the diagram. Add the ΣXs from the cells in a row (for males, ΣX = 24 + 33 + 50 = 107). Note the n in each row (here, n = 9). Compute the mean for each row (e.g., X̄male = 11.89). These are the main effect means for factor B.
STEP 4: Compute ΣXtot. Add the ΣXs from the levels (columns) of factor A, so ΣXtot = 36 + 69 + 68 = 173.
STEP 5: Compute ΣX²tot. Add the ΣX²s from all cells, so ΣX²tot = 218 + 377 + 838 + 56 + 470 + 116 = 2075. Note N = 18.
From Table A.1, Xtot 173, X2tot 2075, and N 18. Filling in the formula gives SStot 2075 c X 107 X 36 X 2 470 THE FORMULA FOR THE TOTAL SUM OF SQUARES IS N Xmale 11.89 X 12 X 2 56 STEP 1: Compute the total sum of squares. 2 SStot X tot c X 16.67 X 24 X 2 218 COMPUTING THE SUMS OF SQUARES (Xtot)2 X 11 X 66 n9 Note: (Xtot)2/N above is also used later and is called the correction (here, the correction equals 1662.72). STEP 2: Compute the sum of squares for factor A. Always have factor A form your columns. THE FORMULA FOR THE SUM OF SQUARES BETWEEN GROUPS FOR COLUMN FACTOR A IS SSA c c (X in the column)2 d n of scores in the column (Xtot)2 N d This says to square the Xs in each column of the study’s diagram, divide by the n in the column, add the answers together, and subtract the correction. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 362 182 b 1662.72 3 3 From Table A.1 the column sums are 36, 69, and 68, and n was 6, so SSA a 362 692 682 1732 ba b 6 6 6 18 (216 793.5 770.67) 1662.72 SSA 1780.17 1662.72 117.45 STEP 3: Compute the sum of squares between groups for factor B. Factor B should form the rows. THE FORMULA FOR THE SUM OF SQUARES BETWEEN GROUPS FOR ROW FACTOR B IS SSB c c (X in the row)2 d n of scores in the row (Xtot)2 N d This says to square the Xs for each row of the diagram and divide by the n in the level. Then add the answers and subtract the correction. In Table A.1, the row sums are 107 and 66, and n was 9, so SSB a 1072 662 b 1662.72 9 9 1756.11 1662.72 93.39 STEP 4: Compute the sum of squares between groups for the interaction. First, compute the overall sum of squares between groups, SSbn. THE FORMULA FOR THE OVERALL SUM OF SQUARES BETWEEN GROUPS IS (X in the cell)2 SSbn c d n of scores in the cell c (Xtot)2 N d Find (X)2 for each cell and divide by the n of the cell. Then add the answers together and subtract the correction. From Table A.1 SSbn a 242 332 502 122 3 3 3 3 SSbn 1976.33 1662.72 313.61 To find SSA B, subtract the sum of squares for both main effects (in Steps 2 and 3) from the overall SSbn. Thus, THE FORMULA FOR THE SUM OF SQUARES BETWEEN GROUPS FOR THE INTERACTION IS SSA B SSbn SSA SSB In our example, SSbn 313.61, SSA 117.45, and SSB 93.39, so SSA B 313.61 117.54 93.39 102.77 STEP 5: Compute the sum of squares within groups. Subtract the overall SSbn in Step 4 from the SStot in Step 1 to obtain the SSwn. THE FORMULA FOR THE SUM OF SQUARES WITHIN GROUPS IS SSwn SStot SSbn Above, SStot = 412.28 and SSbn = 313.61, so SSwn 412.28 313.61 98.67 COMPUTING THE DEGREES OF FREEDOM STEP 1: The degrees of freedom between groups for factor A is kA 1, where kA is the number of levels in factor A. (In our example, kA is the three levels of volume, so dfA 2.) STEP 2: The degrees of freedom between groups for factor B is kB 1, where kB is the number of levels in factor B. (In our example, kB is the two levels of gender, so dfB 1.) STEP 3: The degrees of freedom between groups for the interaction is the df for factor A multiplied by the df for factor B. (In our example, dfA 2 and dfB 1, so dfA B 2.) 
STEP 4: The degrees of freedom within groups equals N kA B, where N is the total N of the study and kA B is the number of cells in the study. (In our example, N is 18 and we have six cells, so dfwn 18 6 12.) Appendix A: Math Review and Additional Computing Formulas 243 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. STEP 5: The degrees of freedom total equals N 1. Use this to check your previous calculations, because the sum of the above dfs should equal dftot. (In our example, dftot 17.) Table A.2 Summary Table of Two-Way ANOVA with df and Sums of Squares Place each SS and df in the ANOVA summary table as shown in Table A.2. Perform the remainder of the computations using this table. COMPUTING THE MEAN SQUARES STEP 1: Compute the mean square between groups for factor A. Source Between Factor A (volume) Factor B (gender) Interaction (vol Within Total gen) Sum of Squares df 117.45 93.39 102.77 98.67 412.28 2 1 2 12 17 Mean Square F MSA MSB FA FB MSA B MSwn FA B THE FORMULA FOR THE MEAN SQUARE BETWEEN GROUPS FOR FACTOR A IS MSA SSA STEP 4: Compute the mean square within groups. dfA THE FORMULA FOR THE MEAN SQUARE WITHIN GROUPS IS From Table A.2, 117.45 MSA 58.73 2 STEP 2: Compute the mean square between groups for factor B. THE FORMULA FOR THE MEAN SQUARE BETWEEN GROUPS FOR FACTOR B IS MSB SSwn dfwn Thus, we have MSwn 98.67 8.22 12 SSB COMPUTING F dfB STEP 1: Compute the Fobt for factor A. In our example MSB MSwn 93.39 93.39 1 THE FORMULA FOR THE MAIN EFFECT OF FACTOR A IS FA STEP 3: Compute the mean square between groups for the interaction. MSA MSwn In our example, we have THE FORMULA FOR THE MEAN SQUARE BETWEEN GROUPS FOR THE INTERACTION IS MSA B SSA dfA B B FA 58.73 7.14 8.22 STEP 2: Compute the Fobt for factor B. THE FORMULA FOR THE MAIN EFFECT OF FACTOR B IS Thus, we have 102.77 MSA B 51.39 2 244 FB MSB MSwn Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Thus, FB three effects are significant: FA (7.14) is larger than its Fcrit (3.88), FB (11.36) is larger than its Fcrit (4.75), and FA B (6.25) is larger than its Fcrit (3.88). 93.39 11.36 8.22 A-4b Performing the Tukey STEP 3: Compute the Fobt for the interaction. HSD Test THE FORMULA FOR THE INTERACTION EFFECT IS FA B MSA Perform post hoc comparisons on any significant Fobt. If the ns in all levels are equal, perform Tukey’s HSD procedure. However, the procedure is computed differently for an interaction than for a main effect. B MSwn Thus, we have FA B PERFORMING TUKEY’S HSD TEST ON MAIN EFFECTS Perform the HSD on each main effect, using the procedure described in Chapter 11 for a one-way design. 51.39 6.25 8.22 And now the finished summary table is in Table A.3. 
THE FORMULA FOR THE HSD IS INTERPRETING EACH F Determine whether each Fobt HSD (qk)a is significant by comparing it to the appropriate Fcrit. To find each Fcrit in the F-table (Table 4 in Appendix B), use the dfbn and the dfwn used in computing the corresponding Fobt. 1. To find Fcrit for testing FA, use dfA as the df between groups and dfwn. In our example, dfA 2 and dfwn 12. So for a .05, the Fcrit is 3.88. 2. To find Fcrit for testing FB, use dfB as the df between groups and dfwn. In our example, dfB 1 and dfwn 12. So at a .05, the Fcrit is 4.75. 3. To find Fcrit for the interaction, use dfA B as the df between groups and dfwn. In our example, dfA B 2 and dfwn 12. Thus, at a .05, the Fcrit is 3.88. Interpret each Fobt as you have previously: If an Fobt is larger than its Fcrit, the corresponding main effect or interaction effect is significant. For the example, all Table A.3 Completed Summary Table of Two-Way ANOVA Source Between Factor A (volume) Factor B (gender) Interaction (vol Within Total gen) Sum of Squares df Mean Square 117.45 93.39 102.77 2 1 2 58.73 93.39 51.39 98.67 412.28 12 17 8.22 MSwn b B n The MSwn is from the two-way ANOVA, and qk is found in Table 5 of Appendix B for dfwn and k (where k is the number of levels in the factor). The n in the formula is the number of scores in a level. Be careful here: For each factor there may be a different value of n and of k. In the example, six scores went into each mean for a level of volume (each column), but nine scores went into each mean for a level of gender (each row). The n is the n in each group that you are presently comparing! Also, because qk depends on k, when factors have a different k, they have different values of qk. After computing the HSD for a factor, find the difference between each pair of its main effect means. Any difference that is larger than the HSD is significant. In the example, for volume, n 6; MSwn 8.22; and with a .05, k 3, and dfwn 12, the qk is 3.77. Thus, the HSD is 4.41. The main effect mean for soft (6) differs from the means for medium (11.5) and loud (11.33) by more than 4.41, so these are significant differences. The means for medium and F loud, however, differ by less than 4.41, so they do not differ significantly. No HSD is 7.14 needed for the gender factor. 11.36 6.25 PERFORMING TUKEY’S HSD TEST ON INTERACTION EFFECTS The post hoc comparisons for a significant interaction involve the cell means. However, as discussed in Chapter 12, we perform only Appendix A: Math Review and Additional Computing Formulas 245 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. unconfounded comparisons, in which two cells differ along only one factor. Therefore, we find the differences only between the cell means within the same column or within the same row. Then we compare the differences to the HSD. However, when computing the HSD for an interaction, we find qk using a slightly different procedure. Previously, we found qk in Table 5 using k, the number of means being compared. For an interaction, we first determine the adjusted k. 
This value “adjusts” for the actual number of unconfounded comparisons you will make. Obtain the adjusted k from Table A.4 (or at the beginning of Table 5 of Appendix B). In the left-hand column locate the design of your study. Do not be concerned about the order of the numbers. We called our persuasiveness study a 3 2 design, so look at the row labeled “2 3.” Reading across that row, confirm that the middle column contains the number of cell means in the interaction (we have 6). In the righthand column is the adjusted k (for our study it is 5). The adjusted k is the value of k to use to obtain qk from Table 5. Thus, for the persuasiveness study with a .05, dfwn 12, and in the column labeled k 5, the qk is 4.51. Now compute the HSD using the same formula used previously. In each cell are 3 scores, so HSD (qk)a 7.47 MSwn 8.22 b (4.51)a b B n A 3 The HSD for the interaction is 7.47. The differences between our cell means are shown in Table A.5. On the line connecting any two cells is the absolute difference between their means. Any difference between two means that is larger than the Table A.4 Table A.5 Table of the Interaction Cells Showing the Difference between Unconfounded Means Factor B: Gender Factor A: Volume A1: Soft A2: Medium A3: Loud 8.0 11.0 16.67 3.0 5.67 1.0 10.67 4.0 8.67 B1: Male 4.0 B2: Female 12.0 8.0 6.0 6.0 2.0 HSD 7.47 HSD is a significant difference. Only three differences are significant: (1) between the mean for females at the soft volume and the mean for females at the medium volume, (2) between the mean for males at the soft volume and the mean for males at the loud volume, and (3) between the mean for males at the loud volume and the mean for females at the loud volume. A-4c Computing H2 In the two-way ANOVA, we again compute eta squared (h2) to describe effect size—the proportion of variance in dependent scores that is accounted for by a relationship. Compute a separate h2 for each significant main and interaction effect. THE FORMULA FOR ETA SQUARED IS h2 SSbn SStot Values of Adjusted k Design of Study 2 2 246 Number of Cell Means in Study 4 Adjusted Value of k 3 2 2 3 4 6 8 5 6 3 3 9 7 3 4 4 4 12 16 8 10 4 5 20 12 Here, we divide the SStot into the sum of squares between groups for each significant effect, either SSA, SSB, or SSA B. For example, for our factor A (volume), SSA was 117.45 and SStot was 412.28. Therefore, the h2 is .28. Thus, the main effect of changing the volume of a message accounts for 28% of our differences in persuasiveness scores. For the gender factor, SSB is 93.39, so h2 is .23: The conditions of male or female account for an additional 23% of the variance in scores. Finally, for the interaction, SSA B is 102.77, so h2 .25: The particular combination of gender and volume we created in our cells accounts for an additional 25% of the differences in persuasiveness scores. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 2 1 > Answers 2 1 e. Compute the effect size where appropriate. 1 2 d. What do you conclude about the relationships that this study demonstrates? 1 2 c. 
For Practice
1. A study compared the performance of males and females tested by either a male or a female experimenter. Here are the data:

                              Factor A: Participants
                              Level A1: Males    Level A2: Females
Factor B:      Level B1:
Experimenter   Male           6, 11, 9, 10, 9    8, 14, 17, 16, 19
               Level B2:
               Female         8, 10, 9, 7, 10    4, 6, 5, 5, 7

a. Using α = .05, perform an ANOVA and complete the summary table.
b. Compute the main effect means and interaction means.
c. Perform the appropriate post hoc comparisons.
d. What do you conclude about the relationships that this study demonstrates?
e. Compute the effect size where appropriate.

> Answers
1. a.
Source            Sum of Squares   df   Mean Square   F
Between groups
  Factor A        7.20             1    7.20          1.19
  Factor B        115.20           1    115.20        19.04
  Interaction     105.80           1    105.80        17.49
Within groups     96.80            16   6.05
Total             325.00

For each factor, df = 1 and 16, so Fcrit = 4.49: Factor B and the interaction are significant, p < .05.
b. For factor A, X̄1 = 8.9, X̄2 = 10.1; for factor B, X̄1 = 11.9, X̄2 = 7.1; for the interaction, X̄A1B1 = 9.0, X̄A1B2 = 8.8, X̄A2B1 = 14.8, X̄A2B2 = 5.4.
c. Because factor A is not significant and factor B contains only two levels, such tests are unnecessary for them. For A×B, adjusted k = 3, so qk = 3.65, HSD = (3.65)(√(6.05/5)) = 4.02; the only significant differences are between males and females tested by a male, and between females tested by a male and females tested by a female.
d. Conclude that a relationship exists between gender and test scores when testing is done by a male, and that male versus female experimenters produce a relationship when testing females, p < .05.
e. For B, η² = 115.2/325 = .35; for A×B, η² = 105.8/325 = .33.

A-5 COMPUTING THE ONE-WAY WITHIN-SUBJECTS ANOVA

This section contains formulas for the one-way within-subjects ANOVA discussed in Chapter 11. (However, it also involves the concept of an interaction described in Chapter 12, which is briefly explained here.) This ANOVA is used when either the same participants are measured repeatedly or different participants are matched under all levels of one factor. (Statistical terminology still uses the old-fashioned term subjects instead of the more modern participants.) The other assumptions are (1) the dependent variable is a normally distributed ratio or interval variable and (2) the population variances are homogeneous.

A-5a Logic of the One-Way Within-Subjects ANOVA

As an example, say we're interested in whether one's form of dress influences how comfortable one feels in a social setting. On three consecutive days, we ask participants to "greet" people arriving for a different experiment. On day one, participants dress casually; on another day, they dress semiformally; on another day, they dress formally. Each day participants complete a questionnaire measuring the dependent variable of their comfort level. Our data are shown in Table A.6.

Table A.6: One-Way Repeated-Measures Study of the Factor of Type of Dress

             Factor A: Type of Dress
Subjects     A1: Casual   A2: Semiformal   A3: Formal   ΣXsub
1            4            9                1            14
2            6            12               3            21
3            8            4                4            16
4            2            8                5            15
5            10           7                2            19
Totals       ΣX = 30      ΣX = 40          ΣX = 15      ΣXtot = 85
             ΣX² = 220    ΣX² = 354        ΣX² = 55     ΣX²tot = 629
             n1 = 5       n2 = 5           n3 = 5       N = 15, k = 3
             X̄1 = 6       X̄2 = 8           X̄3 = 3

As usual, we test whether the means from the levels represent different μs.
X 30 X 40 X 15 Xtot 30 40 15 85 Therefore, H0: m1 2 2 2 X 220 X 354 X 55 X tot2 220 354 55 629 m2 m3, and Ha: Not all ms are equal. n1 5 n2 5 n3 5 N 15 Notice that this k3 X1 6 X2 8 X3 3 one-way ANOVA can be viewed as a two-way ANOVA: Factor A (the columns) is one factor, and the different participants or subjects (the rows) are a second facComputing the One-Way tor, here with five levels. That is, essentially we creWithin-Subjects ANOVA ated a “condition” when we combined Subject 1 and Casual dress, producing a score of 4. This situSTEP 1: Compute the X, the X, and the X2 for ation is different from when we combined Subject each level of factor A (each column). Then 2 with Casual dress, producing a score of 6. As discompute Xtot and X2tot. Also, compute cussed in Chapter 12, such a combined condition Xsub, which is the X for each participant’s is called a “cell,” and combining subjects with type scores (each row). of dress creates the “interaction” between these two Then follow these steps. variables. In Chapter 12, we computed F by dividing by STEP 2: Compute the total sum of squares. the mean square within groups (MSwn). This was an estimate of the variability in the population. We comTHE FORMULA FOR THE TOTAL SUMS OF puted MSwn using the differences between the scores SQUARES IS in condition or cell and their mean. However, in Table (Xtot)2 A.4, each cell contains only one score. Therefore, the 2 SS X a b tot tot mean of each cell is the score in the cell, and the difN ferences within a cell are always zero. So, we cannot compute MSwn in the usual way. From the example, we have Instead, the mean square for the interaction (85)2 between factor A and subjects (abbreviated MSA subs) SS 629 a b tot reflects the variability of scores. It is because of the 15 variability among people that the effect of type of SStot 629 481.67 147.33 dress will change as we change the “levels” of which Note that the quantity (Xtot)2 >N is the correction in participant we test. Therefore, MSA subs is our estithe following computations. (Here, the correction is mate of the variance in the scores, and it is used as the 481.67.) denominator of the F-ratio. 1 2 3 4 5 A-5b 248 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. STEP 3: Compute the sum of squares for the column factor, factor A. In the example, SSA THE FORMULA FOR THE SUM OF SQUARES BETWEEN GROUPS IS SSA c c 147.33 63.33 11.33 72.67 subs STEP 6: Determine the degrees of freedom. (Sum of scores in the column)2 d n of scores in the column THE DEGREES OF FREEDOM BETWEEN GROUPS FOR FACTOR A IS (Xtot)2 dfA kA 1 N d kA is the number of levels of factor A. In the example, kA 3, so dfA is 2. Find X in each level (column) of factor A, square the sum, and divide by the n of the level. After doing this for all levels, add the results together and subtract the correction. In the example, THE DEGREES OF FREEDOM FOR THE INTERACTION IS dfA 302 402 152 SSA a b 481.67 5 5 5 (kA1)(ksubs1) subs kA is the number of levels of factor A, and ksubs is the number of participants. 
In the example with three levels of factor A and five subjects, dfA subs (2)(4) 8. SSA 545 481.67 63.33 STEP 4: Find the sum of squares for the row factor of subjects. STEP 7: Find the mean squares for factor A and the interaction. THE FORMULA FOR THE SUM OF SQUARES FOR SUBJECTS IS SSsubs (Xsub1)2 (Xsub2)2 g (Xn)2 (Xtot)2 ka THE FORMULA FOR THE MEAN SQUARE FOR FACTOR A IS MSA N MSA 142 212 162 152 192 481.67 3 SSsubs 493 481.67 11.33 STEP 5: Find the sum of squares for the interaction by subtracting the sums of squares for the other factors from the total. THE FORMULA FOR THE INTERACTION OF FACTOR A BY SUBJECTS IS SSA SStot SSA SSsubs subs dfA In our example, Square the sum for each subject (Xsub). Then add the squared sums together. Next, divide by ka, the number of levels of factor A. Finally, subtract the correction. In the example, SSsubs SSA SSA dfA 63.33 31.67 2 THE FORMULA FOR THE MEAN SQUARE FOR THE INTERACTION OF FACTOR A BY SUBJECTS IS MSA subs SSA dfA subs subs In our example, MSA subs SSA dfA subs subs 72.67 9.08 8 Appendix A: Math Review and Additional Computing Formulas 249 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. STEP 8: Find Fobt. c. With a .05, what do you conclude about Fobt? THE FORMULA FOR THE WITHIN-SUBJECTS F-RATIO IS Fobt e. What is the effect size in this study? MSA MSA subs In the example, Fobt MSA MSA d. Perform the post hoc comparisons. subs 31.67 3.49 9.08 The finished summary table is Sum of Source Squares Subjects 11.33 Factor A (dress) 63.33 Interaction 72.67 (A subjects) Total 147.33 df 4 2 8 Mean Square 31.67 9.08 F 3.49 14 STEP 9: Find the critical value of F in Table 5 of Appendix B. Use dfA as the degrees of freedom between groups and dfA subs as the degrees of freedom within groups. In the example for a .05, dfA 2, and dfA subs 8, the Fcrit is 4.46. Interpret the above Fobt the same way you did in Chapter 11. Our Fobt is not larger than Fcrit, so it is not significant. Had Fobt been significant, then at least two of the means from the levels of type of dress differ significantly. Then, for post hoc comparisons, graphing, eta squared, and confidence intervals, follow the procedures discussed in Chapter 11. However, in any of those formulas, in place of the term MSwn use MSA subs. For Practice 1. We study the influence of practice on eye-hand coordination. We test people with no practice, 1 hour of practice, or 2 hours of practice. a. What are H0 and Ha? Subjects 1 2 3 4 5 6 7 8 Amount of Practice None 1 Hour 2 Hours 4 3 6 3 5 5 1 4 3 3 4 6 1 5 6 2 6 7 2 4 5 1 3 8 2. You measure 21 students’ degrees of positive attitude toward statistics at four equally spaced intervals during the semester. The mean score for each level is: time 1, 62.50; time 2, 64.68; time 3, 69.32; and time 4, 72.00. You obtain the following sums of squares: Source Subjects Factor A Sum of Squares 402.79 189.30 688.32 A subjects Total 1280.41 df Mean Square F a. What are H0 and Ha? b. Complete the ANOVA summary table c. With a .05, what do you conclude about Fobt? d. Perform the appropriate post hoc comparisons. e. What is the effect size in this study? f. 
What should you conclude about this relationship? b. Complete the ANOVA summary table. 250 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Appendix A: Math Review and Additional Computing Formulas 251 1. a. H0: m1 = m2 = m3; Ha: Not all ms are equal. b. SStot 477 392.04; SSA 445.125 392.04; 1205 b 392.04 3 and SSsubs a Source Subjects Factor A A subjects Total Sum of Squares df 9.63 7 53.08 2 22.25 14 84.96 c. With dfA 2 and dfA Fobt is significant. subs Mean Square F f. Attitudes during the second half of the semester are significantly higher than during the first half, but this is a small to moderate effect. 2. a. H0: m1 m2 m3 m4; Ha: Not all ms are equal. b. 26.54 1.59 Source Subjects Factor A 16.69 23 A subjects Total 14, the Fcrit is 3.74. The d. The qk 3.70 and HSD 1.65. The means for 0, 1, and 2 hours are 2.13, 4.25, and 5.75, respectively. Significant differences occurred between 0 and 1 hour and between 0 and 2 hours, but not between 1 and 2 hours. e. Eta squared (h2) 53.08 .62. 84.96 > Answers Sum of Squares df 402.79 20 189.30 3 688.32 60 1280.41 c. With dfA 3 and dfA Fobt is significant. subs Mean Square 63.10 11.47 F 5.50 83 60, the Fcrit is 2.76. The d. The qk 3.74 and HSD 2.76. The means at time 1 and time 2 differ from those at times 3 and 4, but time 1 and time 2 do not differ significantly, and neither do times 3 and 4. e. Eta squared (h2) 189.30>1280.41 .15. appendix B STATISTICAL TABLES Sections Table 1 Proportions of Area under the Standard Normal Curve: The z-Table Table 2 Critical Values of t: The t-Table Table 3 Critical Values of the Pearson Correlation Coefficient: The r-Table Table 4 Critical Values of F: The F-Table Table 5 Values of Studentized Range Statistic, qk Table 6 Critical Values of Chi Square: The X2-Table 252 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 1 Proportions of Area under the Standard Normal Curve: The z-Table Column (A) lists z-score values. Column (B) lists the proportion of the area between the mean and the z-score value. Column (C) lists the proportion of the area beyond the z-score in the tail of the distribution. 
(Note: Because the normal distribution is symmetrical, areas for negative z-scores are the same as those for positive z-scores.) A z B B B B B B C C ⫺z X ⫹z B Area between Mean and z C C ⫺z X ⫹z B Area between Mean and z C C ⫺z X ⫹z B Area between Mean and z C Area beyond z in Tail A z C Area beyond z in Tail A z C Area beyond z in Tail (continued) Appendix B: Statistical Tables 253 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 1 (cont.) Proportions of Area under the Standard Normal Curve: The z-Table A z 254 B B B B B B C C ⫺z X ⫹z B Area between Mean and z C C ⫺z X ⫹z B Area between Mean and z C C ⫺z X ⫹z B Area between Mean and z C Area beyond z in Tail A z C Area beyond z in Tail A z C Area beyond z in Tail Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 1 (cont.) Proportions of Area under the Standard Normal Curve: The z-Table A z B B B B B B C C ⫺z X ⫹z B Area between Mean and z C C ⫺z X ⫹z B Area between Mean and z C C ⫺z X ⫹z B Area between Mean and z C Area beyond z in Tail A z C Area beyond z in Tail A z C Area beyond z in Tail Appendix B: Statistical Tables 255 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 2 Critical Values of t : The t-Table (Note: Values of —tcrit ⫽ values of ⫹ tcrit.) Two-Tailed Test –tcrit 0 One-Tailed Test 0 +tcrit Alpha Level df A ⴝ .05 +tcrit Alpha Level A ⴝ .01 df A ⴝ .05 A ⴝ .01 From Table 12 of E. Pearson and H. Hartley, Biometrika Tables for Statisticians, Vol. 1, 3rd ed. Cambridge: Cambridge University Press, 1966. Reprinted with the permission of the Biometrika Trustees. 256 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 3 Critical Values of the Pearson Correlation Coefficient: The r-Table Two-Tailed Test –rcrit 0 One-Tailed Test 0 +rcrit Alpha Level df (no. of pairs ⴚ 2) A ⴝ .05 A ⴝ .01 +rcrit Alpha Level df (no. 
of pairs ⴚ 2) A ⴝ .05 A ⴝ .01 From R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, 6th ed. Copyright © 1963, R. A. Fisher and F. Yates. Reprinted by permission of Pearson Education Limited. Appendix B: Statistical Tables 257 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 4 Critical Values of F: The F-Table Critical values for a ⫽ .05 are in dark numbers. Critical values for a ⫽ .01 are in light numbers. Degrees of Freedom within Groups (degrees of freedom in denominator of F-ratio) 258 Fcrit 0 Degrees of Freedom between Groups (degrees of freedom in numerator of F-ratio) α 1 2 3 4 5 6 7 8 9 10 11 12 14 16 20 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 4 (cont.) Critical Values of F: The F-Table Degrees of Freedom within Groups (degrees of freedom in denominator of F-ratio) Degrees of Freedom between Groups (degrees of freedom in numerator of F-ratio) α 1 2 3 4 5 6 7 8 9 10 11 12 14 16 20 (continued) Appendix B: Statistical Tables 259 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 4 (cont.) Critical Values of F: The F-Table Degrees of Freedom within Groups (degrees of freedom in denominator of F-ratio) Degrees of Freedom between Groups (degrees of freedom in numerator of F-ratio) α 1 2 3 4 5 6 7 8 9 10 11 12 14 16 20 From G. Snedecor and W. Cochran, Statistical Methods, 8th edition. Copyright © 1989 by the Iowa State University Press. 260 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 5 Values of Studentized Range Statistic, qk For a one-way ANOVA, or a comparison of the means from a main effect, the value of k is the number of means in the factor. To compare the means from an interaction, find the appropriate design (or number of cell means) in the table below and obtain the adjusted value of k. 
Then use adjusted k as k to find the value of qk. Values of Adjusted k Design of Study Number of Cell Means in Study 2⫻2 2⫻3 2⫻4 3⫻3 3⫻4 4⫻4 4⫻5 Adjusted Value of k 4 6 8 9 12 16 20 3 5 6 7 8 10 12 Values of qk for a ⫽ .05 are dark numbers and for a ⫽ .01 are light numbers. Degrees of Freedom within Groups (degrees of freedom in denominator of F-ratio) k ⴝ Number of Means Being Compared α 2 3 4 5 6 7 8 9 10 11 12 (continued) Appendix B: Statistical Tables 261 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 5 (cont.) Values of Studentized Range Statistic, qk Degrees of Freedom within Groups (degrees of freedom in denominator of F-ratio) k ⴝ Number of Means Being Compared α 2 3 4 5 6 7 8 9 10 11 12 From B. J. Winer, Statistical Principles in Experimental Design, McGraw-Hill, Copyright © 1962. Reproduced by permission of the McGraw-Hill Companies, Inc. 262 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Table 6 Critical Values of Chi Square: The χ2-Table 2 χ crit 0 Alpha Level df A ⴝ .05 A ⴝ .01 From R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, 6th ed. Copyright © 1963, R. A. Fisher and F. Yates. Reprinted by permission of Pearson Education Limited. Appendix B: Statistical Tables 263 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. appendix C ANSWERS TO ODD-NUMBERED STUDY PROBLEMS Chapter 1 1. To understand the laws of nature pertaining to the behaviors of living organisms. 3. (a) It is the large group of individuals (or scores) to which we think a law of nature applies. (b) It is a subset of the population that is actually measured and that represents or stands in for the population. (c) Assuming the sample is representative, we use the scores and behaviors observed in the sample to infer the scores and behaviors that would be found in the population. (d) The behavior of everyone in a specified group in nature. 5. (a) A relationship exists when as the scores on one variable change, the scores on the other variable change in a consistent fashion. (b) No consistent pattern of change occurs with virtually the same batch of Y scores occurring at every X. 7. (a) It accurately reflects the scores and relationship in the population. 
(b) It is different from and inaccurately reflects the data found in the population. (c) The luck of the draw of the particular participants selected for the sample. 9. In an experiment, the researcher actively controls and manipulates one variable (the independent variable). In a correlational study, the researcher passively measures participants’ scores on two variables. 11. The independent variable is the overall variable the researcher is interested in; the conditions are the specific amounts or categories of the independent variable under which participants are tested. 13. (a) Statistics describe an aspect of a sample, and parameters describe an aspect of a population. (b) Statistics are symbolized by English letters, and parameters are symbolized by Greek letters. 15. (a) A continuous variable allows for fractional amounts. A discrete variable measures fixed 264 17. 19. 21. 23. amounts that cannot be broken into smaller amounts. (b) Nominal and ordinal scales are assumed to be discrete; interval and ratio scales are assumed to be continuous. Researcher A has an experiment because alcohol consumption is manipulated. Researcher B has a correlational study because both variables are simply measured. (a) The independent variable is volume of music. The conditions are whether the music is soft, loud, or absent. The dependent variable is the final exam score. (b) The independent variable is size of the college. The conditions are small, medium, and large. The dependent variable is the amount of fun had. (c) The independent variable is birth order. The conditions are being born first, second, or third. The dependent variable is level of intelligence. (d) The independent variable is length of exposure to the lamp. The conditions are 15 or 60 minutes. The dependent variable is amount of depression. (e) The independent variable is wall color. The conditions are blue, green, or red walls. The dependent variable is aggression. Sample A (as X increases, Y increases) and Sample D (as X increases, Y tends to decrease). We see a group of similar dependent scores in one condition, and a different group of similar scores in the next condition, and so on. 25. Variable Personality type Academic major Number of minutes before and after an event Continuous or Discrete Discrete Discrete Continuous Type of Measurement Scale Nominal Nominal Interval Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Variable Restaurant ratings (best, next best, etc.) Speed Dollars in your pocket Change in weight Savings account balance Reaction time Letter grades Clothing size Registered voter Therapeutic approach Schizophrenia type Work absences Words recalled Continuous or Discrete Discrete Type of Measurement Scale Ordinal Continuous Discrete Ratio Ratio 11. 13. 15. Continuous Discrete Interval Ratio Continuous Discrete Discrete Discrete Discrete Ratio Ordinal Ordinal Nominal Nominal 17. 19. Discrete Nominal Discrete Discrete Ratio Ratio Chapter 2 1. (a) N is the number of scores in a sample. (b) f is the frequency of a score or scores. 3. 
(a) In a bar graph adjacent bars do not touch; in a histogram they do. (b) Bar graphs are used with nominal or ordinal scores; histograms are used with interval or ratio scores. 5. (a) A histogram has a bar above each score; a polygon has data points above the scores that are connected by straight lines. (b) Histograms are used with a few different interval or ratio scores; polygons are used with many different interval/ ratio scores. 7. (a) Simple frequency is the number of times a score occurs; relative frequency is the proportion of time the score occurs. (b) Cumulative frequency is the number of scores at or below a particular score; percentile is usually defined as the percent of the scores below a particular score. 9. (a). A skewed distribution has one distinct tail; a normal distribution has two. (b) A bimodal distribution has two distinct humps above the two highest-frequency scores; a normal 21. distribution has one hump and one highestfrequency score. For relative frequency, we find the proportion of the total area under the curve at the specified scores. For percentile, we find the proportion of the total area under the curve that is to the left of a particular score. (a) Bar graph. (b) Polygon. (c) Bar graph. (d) Histogram. (a) The most frequent salaries tend to be in the middle to high range, with relatively few extremely low salaries. (b) Yours is one of the lowest, least common salaries. (a) 35% of the sample scored below you. (b) Your score occurred 40% of the time. (c) It is one of the highest and least frequent scores. (d) It is one of the lowest and least frequent scores. (e) 50 participants had either this score or a score below it. (f) 60% of the area under the curve is to the left of (below) your score. (a) 70, 72, 60, 85, 45. (b) Because .20 of the area under the curve is to the left of 60, it’s at the 20th percentile. (c) With .50 of the area under the curve to the left of 70, .50 of the sample is below 70. (d) With .50 of the area under the curve below 70, and .20 of the area under the curve below 60, then .50 .20 .30 of the area under the curve is between 60 and 70. (e) .20. (f) With .50 below 70 and .30 between 80 and 70, a total of .50 .30 .80 of the curve is below 80, so it is at the 80th percentile. Score 53 52 51 50 49 48 47 f 1 3 2 5 4 0 3 Relative Frequency .06 .17 .11 .28 .22 .00 .17 23. (a) Bar graph; for a nominal (categorical) variable. (b) Polygon; for many different ratio scores. (c) Histogram; for only 8 different ratio scores. (d) Bar graph; for an ordinal variable. 25. (a) These variables are assumed to be discrete, and the spaces between bars communicate a discrete variable. (b) These variables are assumed to be continuous, and the lines between data points communicate that the variable continues between the plotted X scores. Appendix C: Answers to Odd-Numbered Study Problems 265 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 1. (a) It indicates where on a variable most scores tend to be located. (b) The mode, median, and mean. 3. The mode is the most frequently occurring score; it is used with nominal scores. 5. 
The mean is the average score, the mathematical center of a distribution; it is used with normal distributions of interval or ratio scores. 7. The distribution is positively skewed, with the mean pulled toward the extreme high scores in the tail. 9. (a) It is the symbol for a score’s deviation from the mean. (b) It is the symbol for the sum of the deviations around the mean. (c) Compute all deviations in the sample and then find their sum. (d) The mean is the center, so the positive deviations cancel out the negative deviations, producing a total of zero. 11. (a)X 638, N 11, X 58. (b) The mode is 58. 13. (a) A 7, B 2, C 0, D 1, E 5. (b) 5, 1, 0, 2, 7. (c) 0, 1, 2, 5, 7. 15. (a) Mean. (b) Median (these ratio scores are skewed). (c) Mode (this is a nominal variable). (d) Median (this is an ordinal variable). 17. He is incorrect if the variable is something on which it is undesirable to have a high score (e.g., number of errors on a test). In that case, being below the mean with a negative deviation is better. 19. (a) The independent variable. (b) The dependent variable. (c) It is the variable manipulated by the researcher that supposedly influences a behavior. (d) It measures participants’ behavior that is expected to be influenced by the independent variable. 21. Mean errors do not change until there have been 5 hours of sleep deprivation. Mean errors then increase as sleep deprivation increases. 23. (a) Reading the graph left to right, the mean scores on the Grumpy Test decrease as sunlight increases. (b) Individual Grumpy raw scores tend to decrease as sunlight increases. (c) The populations of Emotionality scores and, therefore, the ms would tend to decrease as sunlight increases. (d) Yes, these data provide evidence of a relationship in nature. 25. (a) The means for conditions 1, 2, and 3 are 15, 12, and 9, respectively. 266 (b) Mean productivity Chapter 3 15 14 13 12 11 10 9 8 Low Medium Noise level High (c) µ for high noise f 7 8 9 µ for medium noise 10 13 11 12 Productivity scores µ for low noise 14 15 (d) You have evidence for a relationship where, as noise level decreases, the typical productivity score increases from around 9 to around 12 to around 15. Chapter 4 1. (a) There are larger and/or more frequent differences among the scores. (b) The behaviors are more inconsistent. (c) The distribution is wider, more spread out. 3. The shape of the distribution, and the statistics that indicate its central tendency and its variability. 5. (a) The range is the distance between the highest and lowest scores in a distribution. (b) It includes only the most extreme and often least-frequent scores. (c) With nominal or ordinal scores. 7. (a) They communicate how much the scores are spread out around the mean. (b) The standard deviation because it is closer to being the “average deviation,” and we can use it to find the middle 68% of the distribution. 9. (a) All are forms of the standard deviation, communicating the “average” amount scores differ from the mean. (b) SX is a sample’s standard deviation; sX is an estimate of the population’s standard deviation based on a sample; and sX is the population’s true standard deviation. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. 
Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 11. (a) Find the scores at 1 SX and at 1 SX from the mean. (b) Find the scores at 1 sX and at 1 sX from m. (c) Use X to estimate m, and then find the scores at 1 sX and at 1 sX from m. 13. (a) Range 8 0 8. (b) X 41, X2 231, N 10; so S2X (231 – 168.1)/10 6.29. (c) SX 26.29 2.51. (d) With X 4.1 and SX 2.51, the scores are 4.1 2.51 1.59 and 4.1 2.51 6.61. 15. (a) She made an error. Variance measures the distance that scores are from the mean, and distance cannot be a negative number. (b) She made another error. Her “average” deviation cannot be greater than the range between the lowest and highest scores. (c) It would incorrectly indicate that everyone has the same score. 17. (a) Compute the mean and sample standard deviation in each condition. (b) As we change the conditions from A to B to C, the dependent scores change from around a mean of 11.00 to 32.75 to 48.00, respectively. The SX for the three conditions are .71, 1.09, and .71, respectively. (c) The SX for the three conditions seem small, so participants scored consistently in each condition. 19. (a) Study A has a relatively narrow/skinny distribution, and Study B has a wider distribution. (b) In A, between 35 (40 5) and 45 (40 5); in B, between 30 (40 10) and 50 (40 10). 21. (a) Based on the sample means for conditions 1, 2, and 3, we’d expect a m of about 13.33, 8.33, and 5.67, respectively. (b) Somewhat inconsistently, because computing each sX indicates to expect a sX of 4.51, 2.52, and 3.06, respectively. 23. (a) Conditions are sunny versus rainy; compute X and SX using the ratio scores of length of laughter; create a bar graph for this discrete independent variable. (b) Conditions are years of alcoholism; for these skewed income scores, compute the median, not the X and SX; create a line graph for this continuous independent variable. (c) Conditions are hours slept; compute the X and SX for the ratio scores of number of ideas; create a line graph for this continuous independent variable. 3. 5. 7. 9. 11. 13. 15. 17. Chapter 5 1. (a) A z-score indicates the distance a score is above or below the mean when measured in standard deviation units. (b) z-scores are used to determine relative standing, compare scores 19. from different variables, and compute relative frequency and percentile. (a) It is the distribution that results from transforming a distribution of raw scores into z-scores. (b) No, only when the raw scores are normally distributed. (c) The mean is 0 and the standard deviation is 1. (a) It is our model of the perfect normal z-distribution. (b) It is used as a model of any normal distribution of raw scores after being transformed to z-scores. (c) When scores are a large group of normally distributed interval or ratio scores. (a) z (80 86)/12 .5. (b) z (98 86)/12 1. (c) X (1.5)(12) 86 68. (d) X (1)(12) 86 98. (a) z 1. (b) z 2.8. (c) z .70. (d) z 0. (a) Convert the raw score that marks the slice to z; find in column B or C the proportion in the slice, which is also the relative frequency of the scores in the slice. (b) Convert the raw score to z; the proportion of the curve to the left of z (in column C) becomes the percentile. (c) Convert the raw score to z; the proportion of the curve between the z and the mean (in column B) plus .50 becomes the percentile. (d) Find the specified proportion in column B or C, identify the z in column A, and transform the z to its corresponding raw score. 
(a) z (76 100)/16 1.5; from column B, the relative frequency is .4332. (b) (.4332) (500) 216.6 people. (c) From column C, it is .0668, or about the 7th percentile. (d) With .4332 between z and the mean, and .50 above the mean, .4332 .50 .9332; then (.9332)(500) 466.6 people. (a) That it is normally distributed, that its m equals the m of the underlying raw score population, and that its standard deviation is related to the standard deviation of the raw score population. (b) Because it describes the sampling distribution for any population without our having to actually measure all possible sample means. No. To compare the scores we need z-scores: For Emily, z (76 85)/10 .90; for Amber, z (60 50)/4 2.5. Relative to their respective classes, Amber did much better than Emily. (a) Small. This will give him a large positive z-score, placing him at the top of his class. (b) Large. Then he will have a small negative z and will be relatively close to the mean. Appendix C: Answers to Odd-Numbered Study Problems 267 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 21. From column C in the z-table, the 25th percentile is at approximately z .67. The cutoff score is X (.67)(10) 75 68.3. 23. To make the salaries comparable, compute z. For City A, z (43,000 45,000)/15,000 .13. For City B, z (46,000 50,000)/18,000 .22. City A is the better offer, because her income will be closer to (less below) the average cost of living in that city. 25. (a) z (60 56)/8 .50, so from column B, .1915 of the curve is between 60 and 56. Adding .50 of the curve below the mean gives a total of .6915, or 69.15% of the curve is expected to be below 60. (b) z (54 56)/8 .25, so from column C, .4013 of the curve is below 54. (c) The approximate upper .20% of the curve from column C is .2005, at z .84. The corresponding raw score is X (.84)(8) 56 62.72. 15. (a) z 17. 19. Chapter 6 1. (a) It is our expectation or confidence the event will occur. (b) The relative frequency of the event in the population. 3. (a) z (64 46)/8 2.25; from column C, p .0122. (b) z (40 46)/8 .75; z (50 46)/8 .50; using column B, p .2734 .1915 .4649. 5. The p of a hurricane is 160/200 .80. The uncle may be looking at an unrepresentative sample over the past 13 years. David uses the gambler’s fallacy, failing to realize that p is based on the long run, and so there may not be a hurricane soon. 7. It indicates that by chance, we’ve selected too many high or too many low scores, so a sample is unrepresentative of its population. Then X does not equal the m it represents. 9. (a) It indicates whether or not the sample’s z-score (and X) lies in the region of rejection. (b) We reject that the sample comes from or represents the underlying raw score population, because it is very unlikely to do so. 11. She had sampling error, obtaining an unrepresentative sample that contained a majority of Ramone’s supporters, but the majority in the population supported Darius. 13. Because it is a mean and a sample that is very unlikely to occur if we were representing that population. 268 21. 23. 25. 
1.96 µ 60 z 1.45 z 1.96 (b) No: The sample’s z-score is not beyond the critical value, so this is a frequent sample mean when representing this population so we should not reject that the sample represents the population with m 60. (a) 1.645. (b) Yes, because sX 2.19, so z (36.8 33)/2.19 1.74, which lies beyond 1.645. (c) Such a sample is unlikely to occur when representing this population. (d) Reject that the sample represents this population. sX 1.521, so z (34 28)/1.521 3.945. No, because the z is beyond the critical value of 1.96, so this sample is unlikely to represent this population. For Bubba’s, z (26 24)/1.10 1.82, which is not beyond 1.96. Retain that this sample represents the population of average bowlers. For Babette’s, z (18 24)/1.10 5.45, which is beyond 1.96. Reject that this sample represents the population of average bowlers. No: Having a boy now is no more likely than at any other time. (a) First compute the X, which is 35.67; sX 5> 29 1.67 ; z (35.67 30)/1.67 3.40. With a critical value of 1.96, conclude that your football players do not represent this population. (b) Football players, as represented by your sample, form a population different from non–football players, with a m of about 35.67. Chapter 7 1. By poorly representing a population, a sample may mislead us to incorrectly describe an existing relationship, or to describe a relationship that does not exist. 3. a stands for the criterion probability; it determines the size of the region of rejection and the theoretical probability of a Type I error. 5. Perhaps wearing the device causes people to exercise more, and the researcher’s result accurately reflects this. Or, perhaps wearing the device does Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 7. 9. 11. 13. 15. 17. nothing and, by chance, the researcher happened to select an unrepresentative sample who naturally tend to exercise more than in the general population. (a) The null hypothesis, indicating the sample represents a population where the predicted relationship does not exist. (b) The alternative hypothesis, indicating the sample represents a population where the predicted relationship does exist. (a) That the sample relationship was so unlikely to occur if there was not the predicted relationship in nature that we conclude there is this relationship in nature. (b) Because then we can believe that we’ve found a relationship and are not being misled by sampling error. (a), (b), and (c) are incorrect because by retaining H0, we still have both hypotheses, so we can make no conclusions about the relationship; in (d) the results are nonsignificant; (e) and (f) are correct. (a) That owning a hybrid causes different attitudes than not owning one. (b) Two-tailed. (c) H0: The m of hybrid owners equals that of nonhybrid owners, at 50; Ha: The m of hybrid owners is not equal to that of nonhybrid owners, so it is not equal to 50. (d) sX 24> 2100 2.40; zobt (76 65)/2.40 4.58. (e) zcrit 1.96. 
(f) The zobt of 4.58 is larger than the zcrit of 1.96, so the results are significant: Owners of hybrid cars have significantly more positive attitudes (with m around 77) than owners of nonhybrid cars (with m 65). (g) z 4.58, p .05. (a) The probability of a Type I error is p .05; this is concluding that type of car influences attitudes, when really it does not. (b) By rejecting H0, there is no chance of a Type II error, which would be retaining H0; this is not concluding that type of car influences attitudes, when really it does. (a) Changing the independent variable from not finals week to finals week increases the dependent variable of amount of pizza consumed; we will not demonstrate an increase. (b) Changing the independent variable from not performing breathing exercises to performing them changes the dependent scores of blood pressure; we will not demonstrate a change in scores. (c) Changing the independent variable by increasing hormone levels changes the dependent scores of pain sensitivity; we will not demonstrate a change in scores. (d) Changing the independent variable by increasing amount of light will decrease the 19. 21. 23. 25. dependent scores of frequency of daydreams; we will not demonstrate a decrease. (a) A two-tailed test. (b) H0: m 70; Ha: m 70. (c) sX 12> 249 1.71; zobt (74.36 70)/1.71 2.55. (d) zcrit 1.96. (e) Yes; because zobt is beyond zcrit, the results are significant: Changing from no music to music results in test scores changing from a m of 70 to a m around 74.36. (a) The X is 35.11; SX 10.629. (b) One-tailed test—she predicts that taking statistics lowers self-esteem. (c) H0: m 28; Ha: m 28. (d) sX 11.35> 29 3.78 ; zobt (35.11 28)/3.78 1.88. (e) zcrit 1.645. (f) Because the positive zobt is not beyond the negative zcrit, the results are not significant. She can make no claim about the relationship. (g) Perhaps statistics do nothing to self-esteem; perhaps they lower selfesteem and she poorly represented this; or perhaps they actually raise self-esteem as the X suggests. He is incorrect because the total size of the region of rejection (which is a) is the same for a one- or a two-tailed test. This is also the probability of making a Type I error, so it is equally likely with either type of test. Incorrect: Whether we make a Type I or Type II error is determined by whether the independent variable “works” in nature. Power applies when it does work, increasing the probability of rejecting H0 when it is false. When H0 is true, then a is the probability of making a Type I error. Chapter 8 1. Either (1) the predicted relationship does not occur in nature, but by chance, sampling error produced unrepresentative data that make it look like the relationship occurs or (2) the data accurately represent the predicted relationship, which does occur in nature. 3. (a) As a one-sample experiment. (b) Normally distributed interval or ratio scores. (c) Perform the t-test when sX of the underlying raw score population is not known; perform the z-test when sX is known. 5. (a) Because the t-distribution is not a perfectly normal distribution like the z-distribution is. (b) Different Ns produce differently shaped t-distributions, so a different tcrit is needed to demarcate a region of rejection equal to a. (c) The degrees of freedom (df). (d) df N 1. Appendix C: Answers to Odd-Numbered Study Problems 269 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. 
Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 7. To describe the relationship and interpret it in terms of our variables. 9. (a) One-tailed. (b) H0: m 88; Ha: m 88. (c) With df 30 and a .05, the one-tailed tcrit is 1.697. (d) Yes. (e) Yes. (f) The two-tailed tcrit is 2.042 (not the one-tailed 1.697). 11. (a) H0: m 68.5; Ha: m 68.5. (b) sX 2130.5>10 3.61 ; tobt (78.5 68.5)/3.61 2.77. (c) With df 9, tcrit 2.262. (d) Compared to other books, which produce m 68.5, this book produces a significant improvement in exam scores, with a m around 78.5. (e) t(9) 3.62, p .05. (f) (3.61) (2.262) 78.5 m (3.61)(2.262) 78.5 70.33 m 86.67. 13. (a) H0: m 50; Ha: m 50. (b) tobt (53.25 50)/8.44 .39. (c) For df 7, tcrit .2.365. (d) t(7) .39, p .05, so the results are not significant, so do not compute the confidence interval. (e) She can make no conclusion about whether the argument changes people’s attitudes. 15. (a) H0: m 12; Ha: m 12. (b) X 8.667; s2X 4.67; sX 24.67>6 .882; tobt (8.667 12)/.882 3.78. With df 5, tcrit 2.571. The results are significant: Not using the program significantly reduces grammatical errors. (c) (.882)(2.571) 8.667 m (.882) (2.571) 8.667 6.399 m 10.935. 17. (a) A margin of error. (b) If we could measure the population in your profession, we expect the mean to be between $42,000 and $50,000. 19. In the population, Smith’s true rating is between 32% and 38%, and Jones’ is between 34% and 40%. Because of the overlap, Smith might be ahead, or Jones might be ahead. Using these statistics there is no clear winner, so we conclude there is a tie. 21. (a) N 46; that adolescents and adults differ in perceptual skills; with p .01, the difference is significant; Type I error. (b) N 100; that each personality type produces a different population of emotionality scores; with p .05, the difference is not significant; Type II error. 23. (a) t(34) 2.019, p .05. (b) t(34) 2.47, p .05. 25. Create the experimental hypotheses and design a study to obtain the X of dependent scores under one condition to compare to a known m under another condition. Create the one- or 270 two-tailed H0 and Ha, and perform the t-test to determine if X differs significantly from m. If the results are significant, describe and interpret the relationship and compute the confidence interval for the m being represented. If the results are not significant, make no conclusion about the predicted relationship, one way or the other. Chapter 9 1. (a) In an experiment with two independent samples. (b) In an experiment with two related samples. 3. Either (1) changing the conditions does not produce the predicted relationship in nature, but by chance, sampling error produced unrepresentative sample data that produce different means, making it look like the relationship occurs or (2) the data accurately represent that changing the conditions produces the predicted relationship that does occur in nature. 5. Using matched pairs, where each participant in one condition is paired with a participant in the other condition, or using repeated measures, where the same sample of participants is tested under all conditions. 7. 
(a) It is a distribution showing all differences between two means that occur when two samples are drawn from the one population of raw scores described by H0. (b) It is the standard deviation of the sampling distribution of differences between means from independent samples. (c) That the difference between our means is unlikely to occur if we are representing the population described by H0, where there is not the predicted relationship. 9. (a) It is a distribution showing all values of D that occur when samples represent a population of difference scores where mD 0 and where the predicted relationship does not exist. (b) It is the standard deviation of the sampling distribution of D. (c) That our mean difference (D) is unlikely to occur if we are representing the population described by H0, where there is not the predicted relationship. 11. (a) To interpret the results in terms of the variables and underlying behaviors. (b) Because it indicates the size of the impact the independent variable has on the dependent variable and the behavior it reflects. Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 13. (a) H0: m1 m2 0; Ha: m1 m2 ⬆ 0. (b) s2pool 23.695; sX X 1.78; tobt (43 39)>1.78 2.25. (c) With df (15 1) (15 1) 28, tcrit 2.048. (d) The results are significant: In the population, Once (with m around 43) leads to more productivity than Often (with m around 39). (e) d (43 39)> 223.695 .82; r2pb (2.25)2 > 3 (2.25)2 28 4 .15. This is a moderate to large effect. (f) Label the X axis as e-mail checked; label the Y axis as mean productivity. Plot the data point for Once at Y 43; plot the data point for Often at Y 39. Create a bar graph: Frequency of checking is measured as an ordinal variable. 15. (a) H0: mD 0; Ha: mD ⬆ 0. (b) tobt (2.63 0)>.75 3.51. (c) With df 7, tcrit 2.365, so people exposed to much sunshine exhibit a significantly higher well-being score than when exposed to less sunshine. (d) The X of 15.5; the X of 18.13. (e) r2pb (3.51)2 > 3 (3.51)2 7 4 .64; 64% of the changes are due to changing sunlight. (f) By accounting for 64% of the variance, these results are very important. 17. (a) H0: mD 0; Ha: mD 0. (b) D 1.2, s2D 1.289, sD .359; tobt (1.2 0)>.359 3.34. (c) With df 9, tcrit 1.833. (d) The results are significant. In the population, children exhibit more aggressive acts after watching the show (with m about 3.9) than they do watching before the show (with m about 2.7). (e) d 1 .2/21 .289 1.14; a relatively large difference. 19. You cannot test the same people first when they’re males and then again when they’re females. 21. (a) Two-tailed. (b) H0: m1 m2 0; Ha: m1 m2 ⬆ 0. (c) X1 11.5, s21 4.72; X 2 14.1, s22 5.86, s2pool 5.29, sX X 1.03, tobt (11.5 14.1)>1.03 2.52. With df = 18, tcrit = 2.101, so tobt is significant. (d) Police who’ve taken this course are more successful at solving disputes than police who have not taken it. The m for the police 1 2 1 2 with the course is around 14.1, and the m for the police without the course is around 11.5. (e) d 1.13; r2ph .26; taking the course is important for effectively resolving disputes. 23. 
(a) Independent-samples design; independentsamples t-test; a significant result, so Type I error (with p . 01); changing from male to female increased scores from around 5.4 to 9.3, but this was an inconsistent change because it accounts for only 8% of the variance in scores. (b) Repeated-measures design; related-samples t-test; a significant result (with p .05), so Type I error; dieting decreased participants’ weights and accounts for 26% of the variance in scores, which is a consistent effect. 25. (a) When the dependent variable is measured using normally distributed interval or ratio scores that have homogeneity of variance. (b) z-test, used in a one-sample experiment when the sX of the underlying raw score population is known; one-sample t-test, used in a one-sample experiment when the sX of the underlying raw score population is not known; independent-samples t-test, used when two independent samples are tested under two conditions; related-samples t-test, used when participants are tested under two conditions, with either matching pairs of different participants or repeated measures of the same participants. Chapter 10 1. (a) In experiments we manipulate one variable and measure participants on another variable; in correlational studies we measure participants on two variables. (b) In experiments we compute the mean of the dependent (Y) scores for each condition of the independent variable (X); in correlational studies the correlation coefficient simultaneously examines the relationship formed by all X-Y pairs. 3. (a) It is a graph of the individual data points formed by a sample of X-Y pairs. (b) It is the summary straight line drawn through the center of the scatterplot. 5. (a) When you want to summarize the relationship between two normally distributed interval or ratio variables. (b) The type (direction) of the relationship and its strength. 7. Either (1) the r accurately reflects the predicted relationship that does occur in nature or (2) the Appendix C: Answers to Odd-Numbered Study Problems 271 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 9. 11. 13. 15. 17. 19. 21. predicted relationship does not occur in nature, but by chance, we obtained misleading sample data that produces the r so that it looks like the relationship occurs. Create the two- or the one-tailed H0 and Ha. Compute robt. Set up the sampling distribution and, using df N 2, find rcrit. If robt is larger than rcrit, the coefficient is significantly different from zero, so estimate the population coefficient (r) as being around the value of robt. Determine the proportion of variance accounted for using r2 and perform linear regression procedures. (a) It describes the proportion of all of the differences in Y scores that are associated with changing X scores. (b) By computing r2. (c) The larger the r2, the more accurately we can predict scores (and the underlying behavior), so the relationship is scientifically more useful and important. He is incorrect, inferring that this relationship shows that an increased population causes fewer bears. 
(a) No; she should square each r, giving r2 (.60)2 .36, and r2 (.30)2 .09. The relationship with hours studied is four times more consistent than that with classroom size. (b) Study time, because it accounts for 36% of the differences in test scores; classroom size accounts for only 9%. (a) First compute r. X 49, X2 371, (X)2 2401, Y 31, Y2 171, (Y)2 961, XY 188, and N = 7, so r (1316 1519)> 2(196)(236) .94. (b) This is a very strong relationship, with close to one Y score paired with each X. (c) H0: r 0; Ha: r 0. r2. (d) With df 5, the two-tailed rcrit .754. (e) The coefficient is significant, so we expect in the population that r is around .94. (f) r(5) .94, p .05. (g) r2 (.94)2 .88, so 88% of the differences in participants’ satisfaction scores are linked to their error scores. (a) X 45, X2 259, Y 89, Y2 887, XY 460, N 10; r (4600 4005)> 2(565)(949) .81. (b) Very strong: Close to one value of Y tends to be associated with one value of X. (c) H0: r 0; Ha: r 0. (d) With df 8, the one-tailed rcrit .669. (e) r(7) .81, p .05: This r is significant, so we expect r is around .81. (f) r2 .66, so predictions are 66% more accurate. (a) With N 78 there is no rcrit for df 76. The nearest bracketing values are .232 and .217 for a df of 70 and 80, respectively. The robt of 272 .38 is well beyond these critical values, so it is significant. (b) r(76) .31, p .05. (c) Compute r2 and the linear regression equation. 23. Implicitly we are asking, “For a given intelligence score, what creativity score occurs?”, so intelligence is X and creativity is Y. 25. (a) B. (b) B. (c) 4. (d) 16. (e) 4. Chapter 11 1. (a) Analysis of variance. (b) A study that contains one independent variable. (c) An independent variable. (d) A condition of the independent variable. (e) Another name for a level. (f) All samples are independent. (g) All samples are related, usually through a repeated-measures design. (h) The number of levels in a factor. 3. Either (1) the independent variable influences the behavior in nature, producing the different means in the different conditions so that there is a relationship or (2) the independent variable does not influence the behavior in nature, but by chance we obtain sample data that produce different means, making it appear the relationship exists. 5. (a) It is the probability of making a Type I error after comparing all possible pairs of means in an experiment. (b) Multiple t-tests result in an experiment-wise error rate that is larger than our alpha. (c) Performing ANOVA and then post hoc tests keeps the experiment-wise error rate equal to alpha. 7. Using either a between- or a within-subjects ANOVA, compute Fobt and compare it to Fcrit If Fobt is significant, perform Tukey’s HSD test if all ns are equal. Describe the effect size by computing h2. Graph the results and/or compute confidence intervals for each condition. Interpret the results “psychologically.” 9. The H0 is that the independent variable does not have an influence on behaviors or scores, so the means from the levels “really” represent the same population m. The Ha is that the independent variable has an influence on behaviors and scores, so the means from two or more levels represent different population ms. 11. (a) Because the treatment does not work, one population is present and then MSbn should equal MSwn, so the F-ratio is 1. (b) Because the treatment produces differences in scores among the conditions, producing an MSbn that is larger Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. 
May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 13. 15. 17. 19. than MSwn, so the F-ratio is greater than 1. (c) It indicates that two or more of the level means probably represent different values of m so the data represent a relationship in the population. (a) It shows all values of F that occur by chance when H0 is true (only one population is represented). (b) That the Fobt hardly ever occurs when H0 is true. (c) It indicates that our conditions are unlikely to represent only one population. (a) The MSbn is less than the MSwn and H0 is assumed to be true. (b) He made an error: Fobt cannot be a negative number. (a) Several independent samples of participants were created based on their particular salaries. Then their self-esteem was measured. (b) The salary factor produced a significant Fobt, indicating “believable” differences among the mean self-esteem scores for two or more conditions. (c) Perform a post hoc test to identify which salary levels differ. (a) Source Between Within Total Sum of Squares 147.32 862.99 1010.31 df 2 60 62 Mean Square 73.66 14.38 F 5.12 H0: m1 m2 m3; Ha: Not all ms are equal. For dfbn 2 and dfwn 60, Fcrit 3.15. The Fobt is significant: F(2, 60) 5.12, p .05. HSD (3.40) 1219.18>212 3.25. All three means differ from each other by more than 3.25, so all differ significantly. (f) We expect that changing the levels of the factor produces a change in scores from a m around 45.3 to a m around 16.9 to a m around 8.2. (g) h2 147.32>1010.31 .15, The factor determines 15% of the differences in scores, so it does not have a very large or important influence. 21. (a) H0: m1 m2 m3 m4; Ha: Not all ms are equal. (b) Sum of Squares 47.69 20.75 68.44 df 3 12 15 Mean Square 15.897 1.729 HSD (4.20)(21.73>42 2.76; significant differences are between negligible and moderate, negligible and severe, and minimal and severe. (f) As stress levels increased, illness rates significantly increased, although only differences between nonadjacent stress conditions were significant. (g) h2 47.69>68.44 .70; increasing the stress level accounted for 70% of the variance in illness rates. 23. (a) Yes. (b) A dfwn 51 is not in the F-table, but the bracketing dfs of 50 and 55 have critical values of 2.56 and 2.54, respectively. The Fobt is well beyond the Fcrit that would be at dfwn 51, so it is significant. 25. (a) One-way between-subjects ANOVA. (b) Independent samples t-test or betweensubjects ANOVA. (c) One-way within-subjects (repeated-measures) ANOVA. (d) Pearson r and linear regression. (e) With these matched pairs, the related-samples t-test or the within-subjects ANOVA. Chapter 12 (b) (c) (d) (e) Source Between Within Total (e) X1 2.0, X2 3.0, X3 5.75, X4 6.0; F 9.19 (c) For dfbn 3 and dfwn 12, Fcrit 3.49. (d) The Fobt is significant: F(3, 12) 9.19, p .05. 1. (a) When the experiment simultaneously examines two independent variables. (b) The effect of the interaction of the two variables. 3. (a) All levels of one factor are combined with all levels of the other factor. (b) The combination of a level of factor A with a level of factor B. (c) Collapsing a factor is averaging together all scores from all levels of the factor. 5. 
(a) The overall effect on the dependent scores of changing the levels of that factor. (b) By averaging together all scores from all levels of factor B so that we have only the mean for each level of A. (c) That at least two of the means are likely to represent different ms. 7. (a) The FA for the main effect of factor A, FB for the main effect of factor B, and FA×B for the interaction. (b) Post hoc comparisons must be performed. (c) To determine which specific levels or cells differ significantly. 9. (a) It forms a matrix with four columns for the levels of A and three rows for the levels of B, with n 10 per cell. (b) 4 × 3. (c) 30. (d) 40. (e) 10. 11. (a) Use k when finding qk for a main effect; use the adjusted k when finding qk for the interaction. (b) Find the difference among each pair Appendix C: Answers to Odd-Numbered Study Problems 273 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. Scores of means for a main effect; find only unconfounded differences among the cell means in the interaction. 13. (a) Yes: Because changing the levels of A produced one relationship for B1 (decreasing scores) and a different relationship for B2 (increasing scores). (b) For A1, X 9.5; for A2, X 10.5; for A3, X 12.5. Changing A increases scores from around 9.5 to around 10.5 to around 12.5. (c) For B1, X 8.33; for B2, X 13.33. Changing B increases scores from around 8.33 to around 13.33. 20 (d) 18 16 14 12 10 8 6 4 2 0 B1 B2 A1 A2 Levels of Factor A A3 15. (a) The way the scores change with increasing reward depends on the level of practice: For low practice, scores increase and then decrease; for medium practice, scores are level and then increase; for high practice, scores are unchanged. (b) As reward increases, performance does not increase under every level of practice, contradicting the main effect means. (c) Perform the HSD test on the cell means. (d) Subtract each mean from every other mean within each column and within each row. (e) For this 3 × 3 design, adjusted k 7 and dfwn 60, so qk 4.31. 17. Study 1: For A, means are 7 and 9; for B, means are 3 and 13. Apparently there are effects for A and B but not for A × B. Study 2: For A, means are 7.5 and 7.5; for B, means are 7.5 and 7.5. There is no effect for A or B but there is an effect for A × B. Study 3: For A, means are 8 and 8; for B, means are 11 and 5. There is no effect for A, but there are effects for B and A × B. 19. (a) No difference occurs between the means for low and high users. (b) The means are different for two or more of the income levels. (c) The difference between the means for high or low users depends on income level. (Or, differences among income levels depend on whether participants are in the high or the low usage group.) 21. The main effect for math versus logic problems, the main effect for difficulty level, and the interaction of math or logic and difficulty level are all significant. 274 23. Perform the ANOVA to compute FA for testing the differences among main effect means of factor A, FB for testing the differences among main effect means of factor B, and FA×B for testing the cell means for the interaction. 
Perform the Tukey HSD test for each significant F to determine which means differ significantly. Compute h2 to determine the size of each significant effect. 25. (a) Related-samples t-test or one-way withinsubjects ANOVA. (b) Two-way between-subjects ANOVA. (c) Independent-samples t-test or one-way between-subjects ANOVA. (d) Oneway within-subjects ANOVA. (e) One-way between-subjects ANOVA. (f) Pearson correlation coefficient. Chapter 13 1. Both types of procedures test whether, due to sampling error, the data poorly represent the absence of the predicted relationship in the population. 3. (a) They form non-normal distributions, or they do not have homogeneous variance. (b) Transform them to ranked scores. 5. They count the frequency of participants falling into each category of one variable. 7. (a) In the ANOVA, the researcher measures the amount of a behavior or attribute; in the chi square, the researcher counts the number of participants showing or not showing the behavior or attribute. (b) They both test whether the groups differ significantly, representing a relationship in nature. 9. (a) It becomes larger as the differences between each fo and fe become larger. (b) Because there is a larger difference between the frequencies we have obtained and the frequencies we should have obtained if H0 was true. (c) It shows the frequency of all values of x2obt that occur when H0 is true. (d) Because our x2obt is so unlikely to occur if H0 was true that we reject it is true. 11. If testing no difference between the groups, then each fe N/k. If testing the goodness of fit of some other model, each fe is based on the percentage given in the model. 13. (a) The one-way x2 (b) H0: In the population, the frequencies of women’s preferences for much or slightly taller men are equal; Ha: The frequencies of women’s preferences for much or slightly Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. taller men are not equal in the population. (c) Compute each fe: With N 89, fe 89/2 44.5 for each group. (d) The x2obt [(34 44.5)2/44.5] [(55 44.5)2/44.5] 4.96. With df 1, x2crit 3.84 so the results are significant. Conclude: In the population, about 55/89, or 62%, of women prefer slightly taller men, and about 38% prefer much taller men, p .05. (e) Label the Y axis with f. Label the X axis with each preference, and for each, draw a bar graph to the height of the f. 15. (a) The two-way x2. (b) H0: In the population, the frequency of voting or not is independent of the frequency of satisfaction or not; Ha: The frequency of voting or not is dependent on the frequency of satisfaction or not, and vice versa. (c) Compute each fe. For voters: Satisfied fe (83)(81)/168 40.02, dissatisfied fe (83)(87)/168 42.98; for nonvoters: Satisfied fe (81)(85)/168 40.98, dissatisfied fe (85)(87)/168 44.02. (d) x2obt [(48 40.02)2/40.02] [(35 42.98)2/42.98] [(33 40.98)2/40.98] [(52 44.02)2/44.02] 6.07. (e) For df 1, x2crit 3.84. The results are significant, so in the population, satisfaction with election results is correlated with— depends on—voting. 
(f) For the phi coefficient: w 26.07>168 .19, so the relationship is somewhat consistent. 17. (a) The one-way x2. (b) H0: The elderly population is 30% Republican, 55% Democrat, and 15% Other; Ha: The elderly population is not distributed this way. (c) Compute each fe: For Republicans, fe (.30)(100) 30; for Democrats, fe (.55)(100) 55; for Other, fe (.15) (100) 15. (d) x2obt .53 2.20 3.27 6.00. (e) For df 2, x2crit 5.99, so the results are significant: Party affiliation among senior citizens is different from affiliations in the general population. As in our samples, we expect 26% Republican, 66% Democrat, and 8% Other. 19. (a) The frequency of students disliking each professor must be included. (b) She should perform the two-way x2 to test whether liking or disliking one professor is correlated with liking or disliking the other professor. 21. (a) Kruskal–Wallis test. (b) Mann–Whitney test. (c) Wilcoxon test. (d) Friedman test. 23. (a) Wilcoxon test. (b) Kruskal–Wallis test. (c) One-way chi square. (d) Transform the scores to ranks, and then perform the Friedman test. (e) Mann–Whitney test. 25. For the independent variable: whether independent or related samples were tested, the number of independent variables, and if only one variable, the number of conditions. For the dependent variable: whether the scores meet the requirements of parametric or a nonparametric procedure. Appendix C: Answers to Odd-Numbered Study Problems 275 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. INDEX A alpha (α), 113, 128 alternative hypothesis (Ha), 109–110 creating, 111 defined, 110, 111 in independent-sample t-tests, 147–148 interaction effects and, 208 main effects and, 205–206 mean square and, 191 in one-tailed tests, 148, 154–156 in one-way ANOVA, 187–188 in one-way chi square, 221–223 Pearson r and, 175–177 population mean in, 110, 143, 188 proving, 115–117 rejecting, 115–117 in related-sample t-tests, 151–154 in t-distribution, 133–134 in two-way chi square, 226 in within-subjects ANOVA, 199 American Psychological Association (APA) symbols, 65 analysis of variance (ANOVA). See one-way analysis of variance (ANOVA); twoway analysis of variance (ANOVA) area under the curve, 30–31. See also proportion of the area under the curve association. See also correlational study degree of, 167 relationship and, 7 assumptions in independent-samples t-test, 142, 148 in one-sample t-test, 128 in one-way ANOVA, 197 in one-way chi square, 197, 221 in parametric statistics, 108, 112 in related samples t-test, 149, 150, 155 in two-way chi square, 224 in z-tests, 113, 117 average. See mean B bar graph, 23–24, 47–48, 229 behavioral research. See research bell-shaped curve, 26. See normal curve 276 between-subjects ANOVA, 185, 186 two-way, 203 between-subjects factor, 185, 186 biased estimators, 60–61 bimodal distribution, 27–28, 39 C cause and independent variables, 12 cell, 204 cell mean, 207 central limit theorem, 80–81 central tendency, 37–38. See also measure of central tendency chance, 89–92. 
See also probability (p) chi square distribution, 222–223 chi square procedures (χ2) defined, 220 nonparametric statistics and (See nonparametric statistics) one-way (See one-way chi square) reporting, 229 sampling distribution, 222–223 SPSS to perform, 231 two-way (See two-way chi square) Cohen’s d, 156–157 collapsing, 205 condition defined, 12 of independent variables, 12 confidence interval for μ, 137 computing, 136–137 defined, 135–137 to estimate population mean, 135–137 in independent-samples t-test, 147–148 in related-samples t-test, 154 reporting, 138 in Tukey HSD test, 197 confounded comparison, 212 consistency correlation coefficients and, 167–168, 169 vs. measures of variability, 53–55 in proportion of variance accounted for, 157–158, 180 of relationships, 8–9 contingency coefficient (C), 228 continuous variables, 16–17 defined, 16 vs. discrete variables, 16–17 correlation, 163–164 causality and, 164 defined, 163 negative, 177 perfect, 168–169 positive, 177 type of relationship described by, 167 zero, 170, 176 correlational design, 163 correlational study, 162–183 characteristics of, 164–165 defined, 14 vs. experiment, 164 linear relationships and, 165–166 mean score and, 45–46 nonlinear relationships and, 166–167 scatterplots and, 164–165 strength of relationship and, 167–171 type of relationship and, 165–167 correlation coefficient. See also correlational study; Pearson correlation coefficient (r) computing, 163 consistency and, 167–168, 169 contingency, 228 defined, 163 linear, 167 measures of variability and, 168–169 phi, 227–228 vs. regression line, 166 Spearman, 230 squared point-biserial, 158 criterion defined, 99 selecting, 101 symbol for, 113 criterion variable, 178 critical value in computing confidence interval, 136–137 defined, 99 determining, 114 identifying, 99–100 interpreting t-test and, 133 in one-tailed test, 119–120, 133–134 region of rejection and, 99–100, 101 representative samples, 99–101 t-distribution and, 130–132 t-table and, 132 z compared to, 99–100, 114–115 cumulative frequency, 32–33 defined, 33 percentile and, 32–33 curvilinear relationships. See nonlinear relationships D data, statistics and, 3 data point, 24–25 degree of association. See strength of relationship degrees of freedom (df ), 130–132 computing, 193 defined, 132 in independent-samples t-test, 147 in one-samples t-test, 132 in one-way ANOVA, 189, 191–195 in one-way chi square, 222 in Pearson r, 176 in related-samples t-test, 153 t-distribution and, 130–132 in two-way chi square, 227 dependent samples. See related samples dependent variables, 12–13, 46 descriptive statistics, 10 design correlational, 163 defined, 11 factorial, 204 matched-samples, 149 pretest/posttest, 150 repeated-measures, 149 two-way mixed-design ANOVA, 203 deviation defined, 44 sum of, around the mean, 44–45 difference scores, 150–153 discrete scales, 24 discrete variables, 16–17 vs. continuous variables, 16–17 defined, 16 dispersion, 53. See also measures of variability E effect size, 156–158 Cohen’s d and, 156–157 defined, 156 measure of, 156 one-way ANOVA and, 198–199 using proportion of variance accounted for, 157–158 empirical probability distribution, 90 error. See also standard error sampling, 94–96 sum of the deviations around the mean and, 45 Type I, 121–122 Type II, 122–123 estimated population standard deviation, 60–61 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. 
Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. computing, 64–65 defined, 61 defining formulas for unbiased estimators of, 60–61 estimated population variance, 60–61 computing, 64–65 defined, 61 defining formulas for unbiased estimators of, 60–61 estimated standard error of the mean, 128–130 estimating population μ by computing confidence interval, 135–137 estimators, biased and unbiased, 60–61 eta squared (η 2), 198–199, 211 expected frequency (fe) defined, 221 in one-way chi square, 221 in two-way chi square, 226–227 experimental hypotheses creating, 108–109 defined, 108 experiments components of, 13 vs. correlational study, 164 correlational study and, 14 defined, 12 dependent variables and, 12–13 drawing conclusions from, 13–14 independent variables and, 12 experiment-wise error rate, 287 F factorial design, 204 factors. See also independent variables between-subjects, 185, 186 collapsing, 205 in computing degrees of freedom, 193 defined, 185, 186 eta squared and, 198–199 levels in, 185, 186 main effect of, 204–206 mean square between groups and, 189 post hoc comparisons and, 188 Tukey HSD test and, 196 within-subjects, 185, 186 F distribution, 194–196 formulas. See specific procedures F-ratio, 190–191 frequency cumulative, 32–33 defined, 21 expected, 221, 226–227 polygons, 24–25 relative, 29–32 symbol for, 20 frequency distributions, 20–35 defined, 21 graphing, 23–25 labeling, 28 simple, 21, 22, 28 symbols and terminology used in, 21 in table, 22–23 types of, 25–28 frequency polygons, 24–25 Friedman test, 230 F statistic, 188 F-table, 195, 197 G gambler’s fallacy, 90 generalize, 5 goodness of fit test, 223–224. See also one-way chi square defined, 223 graphs bar graphs, 23–24, 47–48, 229 data points on, 24–25 of frequency distributions, 23–25 frequency polygons, 24–25 of grouped distributions, 25–26 histograms, 24 of interaction effect, 211–212 line graph, 47 in one-way ANOVA, 198 regression line, 178, 179 scatterplots, 164–165 in two-way ANOVA, 210 X axis, 23 Y axis, 23 grouped distributions defined, 25 graphs of, 25–26 sampling distribution for, 143–144 standard error of the difference and, 145–146 statistical hypothesis for, 143 summary of, 148–149 independent variables. See also factors cause and, 12 conditions of, 12 defined, 12 treatment effect and, 185 inferential statistics, 10–11, 107–112 experimental hypotheses and, 108–109 one-sample experiment and, 109 statistical hypotheses and, 109–112 defined, 107 population mean and, 107 in research, 94–96, 107–108 sampling error and, 95–96 interaction effect, two-way, 207–209, 211–212 interval estimation, 135 interval scale, 16 inverted U-shaped relationship, 166, 167 K H Kruskal–Wallis test, 230 histogram, 24 homogeneity of variance, 142 Honestly Significant Difference test. See HSD test HSD test, 196–197 hypotheses in research, 5 hypothesis testing, 106–139. 
See also specific tests alternative hypothesis and, 109–110 errors, 121–123 experimental hypotheses and, 108–109 inferential statistics and, 107–112 null hypothesis and, 110, 111–112 one-tailed test and, 118–120 results, reporting, 120–121 statistical, 109–112 statistical hypotheses and, 109–112 two-tailed test and, 113–114 L I independence, test of, 224–225 independent samples, 142 independent-samples t-test, 142–149 assumptions in, 142, 148 defined, 142 homogeneity of variance and, 142 interpreting, 147–148 one-tailed tests and, 148 performing, 144–149 pooled variance and, 145 labeling frequency distributions, 28 laws of nature, 5 levels (k), 185, 186 linear correlation coefficient, 167 linear regression, 178–180 criterion variable in, 178 defined, 178 predicted Y score in, 179–180 predictions and, 178–180 predictor variable in, 178 procedure, 178–179 linear regression equation, 179 linear regression line, 166. See also regression line linear relationships, 165–166 line graph, 47 M main effect, 204–206 defined, 204 examining, 210–211 of factor A, 204–205 of factor B, 205–206 main effect mean, 205 Mann–Whitney test, 230 margin of error, 135 matched-samples design, 149 mean, 41–48. See also sample mean applied to research, 45–48 as balance point of distribution, 41–42 cell, 207 defined, 41 deviations around, 44 estimated standard error of, 128–130 inferential procedures and, 43 location of, on normal distribution, 42 main effect, 205 vs. mode and median, 42–44 reporting, 65 sampling distribution of, 79–81 sampling distribution of differences between, 143–144 standard error of, 81–82 sum of deviations around, 44–45 mean difference defined, 151 sampling distribution of, 152 standard error of, 153 mean squares, 189–191 comparing, 190–191 computing, 193 formula, 189 F-ratio and, 190–191 between groups, 189 within groups, 189 measurement scales, 15–16 interval, 16 nominal, 15–16 ordinal, 16 ratio, 16 measure of central tendency, 37–49. See also mean; median; mode defined, 38 mean, 41–45 median, 40–41 mode, 39–40 measure of effect size, 156 measures of variability, 52–67 vs. consistency, 53–55 correlation coefficient and, 168–169 defined, 53 importance of, 53 normal distributions and, 54 population variance and standard deviation, 59–61 range, 55 reporting, 65 sample standard deviation and, 57–58 sample variance and, 55–57 in SPSS, 66 standard deviation and, 55, 57–59 median, 40–41 defined, 41 vs. mode and mean, 42–44 mixed-design ANOVA, two-way, 203 mode, 39–40 bimodal distribution and, 39 defined, 39 limitations of, 39–40 vs. median and mean, 42–44 unimodal distribution and, 39 N negative linear relationship, 166 negatively skewed distributions, 27 Index 277 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 95% confidence interval, 137 nominal scale, 15–16, 39, 48, 219 nonlinear relationships, 166–167 non-normal distributions, 27, 28 nonparametric statistics chi square procedure and, 220 defined, 108, 219 for ordinal scores, 229–230 vs. parametric statistics, 219 for ranked scores, 230 SPSS to perform, 231 nonsignificant defined, 116–117 results, 115, 116–117 normal curve. 
See also standard normal curve defined, 26 to find relative frequency, 30–32 importance of, 26–27 proportion of the area under, 31–32 tail of the distribution and, 26 normal distribution, 26–27. See also normal curve defined, 26 location of mean on, 42 measures of variability and, 54 null hypothesis population mean in, 188, 221 null hypothesis (H0), 110, 111–112 creating, 111 defined, 110, 111 f-distribution and, 194–195 in independent-sample t-tests, 143–148 interaction effects and, 208 main effects and, 205–206 mean square and, 189–191 in nonparametric procedures, 230 in one-sample t-tests, 127–128 in one-tailed tests, 118–120, 133–134, 148, 154–156 in one-way ANOVA, 187–188 in one-way chi square, 221–223 Pearson r and, 175–177 power and, 123 proving, 115–117 rejecting, 115–117 in related-sample t-tests, 151–154 in t-distribution, 130–131, 133–134 in two-way ANOVA, 209 in two-way chi square, 224, 226 in Type I errors, 121–122 in Type II errors, 122–123 in within-subjects ANOVA, 199 number of scores (N), 21, 24, 29 number of scores (n), 144 O observed frequency (fo), 220–221 one-sample experiment, 109 one-sample t-test, 126–139 assumptions in, 128 278 defined, 127 interpreting, 133–135 one-tailed tests and, 133–134 performing, 128–132 setting up, 127–128 statistical hypotheses and, 127 summary of, 134–135 one-tailed test, 101, 118–120 defined, 109 independent-samples t-test and, 148 in one-sample t-test, 133–134 of Pearson r, 177 related-samples t-test and, 154–155 scores decreasing, 119–120 scores increasing, 118–119 t-distribution and df and, 130–132 t-table and, 132 one-way analysis of variance (ANOVA), 185–201. See also factors assumptions in, 197 components of, 189–191 defined, 185, 186 degrees of freedom in, 189, 191, 193, 194, 195 diagramming, 186 effect size and, 198–199 eta squared and, 198–199 experiment-wise error rate and, 187 F statistic and, 188 key terms, 185, 186 performing, 191–196 post hoc comparisons and, 188 reasons for conducting, 186 reporting, 198 SPSS to perform, 199 statistical hypotheses in, 187–188 summary of, 197–198 summary table of, 192 Tukey HSD test in, 196–197 within-subjects, 199 one-way chi square, 220–224 assumptions in, 197, 221 computing, 221–222 defined, 220 goodness of fit test and, 223–224 interpreting, 222–223 observed frequency in, 220–221 SPSS to perform, 231 ordinal scale (scores), 16–17, 24, 229–230 P parameter defined, 11 vs. statistic, 11 parametric statistics, 108 assumptions in, 108, 112 defined, 108 vs. nonparametric statistics, 219 participants, 5–6 Pearson correlation coefficient (r), 171–178 computing, 172–174 defined, 171 drawing conclusions about, 176–177 one-tailed test of, 177 reporting, 178 restricted range in, 171–172 sampling distribution of, 175–176 significance testing of, 174–177 summary of, 177–178 percentile cumulative frequency and, 32–33 defined, 32 relative frequency and, 30 percent, 30 perfect correlation, 168–169 perfectly consistent relationship, 8 phi coefficient (φ), 227–228 point estimation, 135 polygon, 24–25 pooled variance, 145 population mean (μ) in alternative hypothesis, 110, 143, 188 in computing z-score, 71 confidence interval used to estimate, 135–137 describing, 49 inferential statistics and, 107 interval estimation used to estimate, 135 margin of error and, 135 in null hypothesis, 188, 221 in one-sample experiment, 109 point estimation used to estimate, 135 population standard deviation and, 59 population variance and, 59 vs. 
sample mean, 94, 96 populations defined, 5 inferential statistics and, 10–11 samples and, 5–6 population standard deviation defined, 59 estimated, 60–61 interpreting, 61 population mean and, 59 population variance defined, 59 estimated, 60–61 interpreting, 61 population mean and, 59 positive linear relationship, 166 positively skewed distributions, 27 post hoc comparisons. See also Tukey HSD test defined, 188 in one-way ANOVA, 188–189, 196 in one-way chi square, 223 in two-way ANOVA, 214 power, 123 predicted Y score (Y'), 179–180 predictions deviations around the mean and, 45 to interpret coefficient, 168–170 linear regression and, 178–180 one-tailed tests and, 133 predictor variable, 178 pretest/posttest design, 150 probability (p), 88–104 defined, 89 logic of, 91 probability distributions and, 90–91 random sampling and, 89, 94–96 representative samples and, 96–102 of sample means, 92–94, 96–97 sampling error and, 94–96 standard normal curve and, obtaining from, 92–94 of Type I errors, 121–122 of Type II errors, 122 probability distributions, 90–91 proportion of the area under the normal curve defined, 31 finding, 31–32 standard deviation and, 58–59 z-distribution to determine, 75–79 proportion of variance accounted for computing, 180–181 defined, 180 effect size using, 157–158, 198, 211 Q qualitative variables, 7 quantitative variables, 7 quasi-independent variables, 12 R random sampling, 94–96 defined, 89 sampling error and, 94–96 range, 55 ranked scores, 230 ratio scale, 16 raw scores. See also z-scores computing when z is known, 71–72 decimal places and, 37 defined, 21 means and, 41, 44 relative frequency and, 30 standard deviation and, 57, 58 in z-distribution, 72–75 region of rejection critical value and, 99–100, 101 defined, 98 sampling distribution and, 98–99, 101–102 regression line vs. correlation coefficient, 166 linear, 166 predicted Y score and, 179–180 scatterplot and, 169, 170, 175, 179 related samples, 149–150 related-samples t-test, 149–155 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. assumptions in, 149, 150, 155 defined, 149 interpreting, 153–154 logic of, 150–151 matched-samples design and, 149 mean difference in, 151 one-tailed tests and, 154–155 performing, 152–155 repeated-measures design and, 149 sampling distribution of mean differences and, 152 standard error of the mean difference and, 153 statistical hypotheses for, 151–152 summary of, 155 in two-sample experiment, 142 relationships. 
See also correlational study absence of, 9–10 association and, 7 consistency of, 8–9 defined, 7 strength of, 9, 167–171 types of, 165–167 weakness of, 9 relative frequency, 29–32 computing, 29–30 defined, 29 normal curve used to find, 30–32 of sample means, 83–84 standard normal curve and, 75–76 z-distribution to compute, 75–79 z-tables and, 76–78 relative standing, 69 defined, 69 z-distribution and, 73–74 repeated-measures design, 149 representative samples critical value and, 99–101 defined, 94 likelihood of representativeness and, 96–102 random sampling and, 94–96 sampling distribution and, setting up, 98–99, 101–102 sampling error and, 94–96 vs. unrepresentative samples, 6 research. See also statistics conducting, 4 confidence interval created in, 135–136 distributions in (See frequency distributions) inferential statistics in, 94–96, 107–108 linear correlation coefficient in, 167 logic of, 5–7 mean in, 41, 44, 45–48, 74 normal curve in, 26–27 Pearson correlation coefficient in, 171–174 populations and, 5–6 samples and, 5–6 sampling error in, 109 standard deviation in, 55, 76 t-test in (See one-sample t-test; two-sample t-test) variables in, 6–9 variance in, 55 research literature, statistics in reporting ANOVA, 198 reporting χ2, 229 reporting means and variability, 65 reporting Pearson r, 178 reporting significant/ nonsignificant results, 120–121 reporting t, 136 reporting two-sample study, 156 restricted range, 171–172 rho (ρ), 175 rounding, rule for, 37 S sample mean computing z-score for, 82–83 formula for, 41 vs. population mean, 94, 96 probability of, 92–94, 96–97 relative frequency of, 83–84 z-score to describe, 79–84 samples. See also representative samples defined, 5 independent, 142 populations and, 5–6 random sampling and, 89 related, 149–150 sample standard deviation, 57–58 sample variance, 55–57 computing, 63–64 defined, 56 formula for, 56 sampling distribution chi square, 222–223 of differences between means, 143–144 of F, 194–196 for independent-samples t-test, 143–144 of mean differences, 152 of means, 79–81 for one-tailed test, 101 of Pearson r, 175–176 probability of sample means and, 92–94, 96–97 region of rejection and, 98–99, 101–102 representativeness and, 100–101 setting up, 98–100, 101–102 for two-tailed test, 99, 101 sampling error defined, 95 inferential statistics and, 95–96 probability and, 94–96 scales of measurement. See measurement scales scatterplot, 164–165. See also correlation coefficient scores. characteristics of, 15–17 continuous vs. discrete variables and, 16–17 deviation of, 46 frequency of (See frequency distributions) as locations on continuum, 38 measurement scales and, 15–16 number of, 21 one-tailed test for decreasing, 119–120 one-tailed test for increasing, 118–119 pairs of, 13, 164, 172 populations of vs. samples of, 6 raw (See raw scores) sigma (s). See population standard deviation sigma (∑). See sum significance testing of Pearson r, 174–177 t-test used for, 127 significant, defined, 115 significant results, determining, 115–116 simple frequency distributions, 21, 22, 28 skewed distributions, 27 slope of the line, 179 Spearman correlation coefficient (rs), 230 spread of scores. 
See measures of variability; normal distribution; standard deviation SPSS chi square procedures and, 231 measures of central tendency and, 40, 50–51 nonparametric procedures and, 231 one-sample t-test and, 138 one-way ANOVA and, 199 Pearson r and, 181 percentiles and, 32–33 statistics and, 4–5 two-sample t-test and, 159 two-way ANOVA and, 215 two-way chi square and, 231 z-scores in, 85–86 squared point-biserial correlation coefficient, 158 squared sum of X, 62–63 squared sum of Y, 172 standard deviation. See also population standard deviation area under the curve, 58–59 computing, 63–64 formulas for, computing, 63–64 measures of variability and, 55, 57–59 summary of, 61–62 standard error of the difference, 145–146 of the estimate, 180 of the mean, 81–82 of the mean difference, 153 standard normal curve defined, 75 probability obtained from, 92 relative frequency of sample means and, 83 t-distribution and, 130 z-distribution and, 75–77 z-table and, 83–84 standard scores. See z-scores statistical hypotheses, 109–112 alternative hypothesis and, 109–110 creating, 109–112 defined, 109 for independent-samples t-test, 143 logic of, 112 null hypothesis and, 110, 111–112 one-sample t-test and, 127 in one-way ANOVA, 187–188 parametric statistics in, 112 for related-samples t-test, 151–152 statistical procedures. See statistics statistics. See also research defined, 11 descriptive, 10 inferential, 10–11 vs. parameter, 11 purpose of, 3–4 SPSS computer program and, 4–5 studying, 4 strength of relationship, 9, 167–171 defined, 167 intermediate strength, 169–170 perfect correlation, 168–169 zero correlation, 170 sum of cross products, 172 of the deviations around the mean, 44–45 of squares (SS), 189, 192–193 of X, 37, 62 of Y, 172 symmetrical distribution. See normal distribution T tables frequency distributions in, 22–23 F-tables, 195, 197 summary table of one-way ANOVA, 192 t-tables, 132 z-tables, 76–78, 83–84 tail of distribution, 26 t-distribution defined, 130 degrees of freedom and, 130–132 test of independence, 224–225. See also two-way chi square defined, 224–225 theoretical probability distribution, 91 treatment, 185, 186 treatment effect, 185, 186, 196 t-table, 132 t-tests. See one-sample t-test; two-sample t-test Tukey HSD test in one-way ANOVA, 196–197 in two-way ANOVA, 212–213 two-sample t-test, 140–161 effect size and, 156–158 experiment, understanding, 141–142 Index 279 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 
two-sample t-test (Continued) independent-samples t-test and, 142–149 related-samples t-test and, 142, 149–155 reporting, 156 two-tailed test, 99, 101 defined, 109 sampling distribution for, 113–114 two-way analysis of variance (ANOVA), 203–217 cells in, 204 completing, 209–213 defined, 203 interpreting, 214 main effects and, 204–206 reasons for conducting, 203–204 SPSS to perform, 215 Tukey HSD test and, 212–213 two-way between-subjects, 203 two-way interaction effect and, 207–209 two-way mixed design, 203 two-way within-subjects, 203 two-way between-subjects ANOVA, 203 two-way chi square, 224–229 assumptions in, 224 computing, 226–227 defined, 224, 225 logic of, 224–226 relationship in, 227–229 SPSS to perform, 231 test of independence and, 224–225 two-way interaction effect, 207–209 two-way mixed design ANOVA, 203 factorial design in, 204 280 two-way within-subjects ANOVA, 203 Type I errors, 121–122 defined, 121 experiment-wise error rate and, 187 probability of, 121–122 Type II errors, 122–123 defined, 122 power and, 123 probability of, 122 type of relationship, 165–167 coefficient used to describe, 167 linear, 165–166 nonlinear, 166–167 terminology to describe, 166 U unbiased estimators, 60–61 unconfounded comparison, 212 under the curve, 30–31. See also proportion of the area under the curve unimodal distribution, 39 unrepresentative samples, 6 U-shaped pattern, 166 V variability. See measures of variability variables continuous vs. discrete, 16–17 defined, 6 dependent, 12–13 independent, 12, 185 qualitative vs. quantitative, 7 quasi-independent, 12 relationships between, 7–10 in research, 6–9 understanding, 6–7 z-distribution to compare, 74–75 variance estimated population variance, 60–61 population, 59–61 sample, 55–57 W weakness of relationship, 9 Wilcoxon test, 230 within-subjects ANOVA, 199 two-way, 203 X X axis, 23 X variable. See independent variable; predictor variable Y Y axis, 23 Y intercept, 179 Y variable, See dependent variable; criterion variable Z z-distribution, 72–79 area under the curve and, 75–79 characteristics of, 73 to compare different variables, 74–75 to compute relative frequency, 75–79 defined, 72 to interpret scores, 72–74 raw scores in, 72–75 zero correlation, 170, 176 z-scores, 69–72, 79–84 computing, 114 computing for sample mean, 82–83 computing in sample or population, 70–71 computing raw score when z is known, 71–72 critical value compared to, 99–100, 114–115 critical value of, 99–100, 114–115 defined, 70 to describe sample means, 79–84 to determine relative frequency of raw scores, 75–79 raw scores transformed into, 69 relative location as, 69–70 relative standing of, 69 SPSS and, 85–86 z-tables, 76–78, 83–84 z-test, 113–115. See also onesample t-test assumptions in, 113, 117 comparing z to critical value and, 114–115 computing z and, 114 defined, 113 nonsignificant results, interpreting, 115, 116–117 sampling distribution for two-tailed test and, 113–114 significant results, interpreting, 115–116 summary of, 117–118 Behavioral Sciences STAT2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 
reviewcard CHAPTER 1 INTRODUCTION TO STATISTICS AND RESEARCH CHAPTER SUMMARY 1-1 KEY TERMS Learning about Statistics • Statistical procedures are used to make sense out of data obtained in behavioral research: They are used to organize, summarize, and communicate data and to interpret what the data indicate. 1-2 The Logic of Research • The population is the group of individuals to which a law of nature—and our conclusions—apply. • The subset of the population that is actually measured in a study is the sample. • The individuals measured in the sample are the participants. • The scores and behaviors of the sample are used to infer (estimate) the scores and behaviors found in the population. • A representative sample accurately describes the population. However, by chance, a sample may be unrepresentative. • A variable is anything about a situation or behavior that can produce two or more different scores. • Variables may be quantitative, and reflect an amount, or qualitative, and reflect a quality or category. 1-3 variable quantitative qualitative relationship Applying Descriptive and Inferential Statistics • Descriptive statistics are used to describe sample data. • Inferential statistics are used for deciding whether the sample data accurately represent the relationship found in the population. • A statistic is a number that describes a characteristic of a sample of scores, symbolized using a letter from the English alphabet. • A parameter is a number that describes a characteristic of a population of scores, symbolized using a letter from the Greek alphabet. 1-5 sample participants Understanding Relationships • A relationship occurs when, as the scores on one variable change, the scores on the other variable tend to change in a consistent fashion. • The consistency in a relationship is also referred to as its strength. • In a perfectly consistent relationship, only one value of Y is associated with one X, and a different Y is associated with a different X. • In a weaker relationship, one batch of Y scores is associated with one X, and a different batch of Y scores is associated with a different X. • When no relationship is present, virtually the same batch of Y scores is paired with every X. 1-4 population descriptive statistics inferential statistics statistic parameter Understanding Experiments and Correlational Studies • A study’s design is the way in which the study is laid out. • In an experiment, we manipulate the independent variable and then measure participants’ scores on the dependent variable. A specific amount or category of the independent variable that participants are tested under is called a condition. • An experiment shows a relationship if participants’ dependent scores tend to consistently change as we change the conditions of the independent variable. • In a correlational study, neither variable is actively manipulated. Scores on both variables are simply measured, and then the relationship between them is examined. design experiment independent variable dependent variable condition correlational study 1.1 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 
CHAPTER SUMMARY 1-6 KEY TERMS The Characteristics of Scores The four scales of measurement are: • A nominal scale, in which numbers name or identify a quality or characteristic. • An ordinal scale, in which numbers indicate rank order. • An interval scale, in which numbers measure a specific amount, but with no true zero, so negative numbers are allowed. • A ratio scale, in which numbers measure a specific amount and zero indicates truly zero amount. • A continuous variable can be measured in fractional amounts. • A discrete variable cannot be measured in fractional amounts. PROCEDURES AND FORMULAS Summary of Identifying an Experiment’s Components Researcher’s Activity Role of Variable Name of Variable Amounts of Variable Present Researcher manipulates variable → Variable influences a behavior → Independent variable → Conditions Researcher measures variable → Variable measures behavior that is influenced → Dependent variable → Scores Summary of Measurement Scales: Type of Measurement Scale Nominal Ordinal Interval Ratio What Does the Scale Indicate? Quality Relative quantity Quantity Quantity Is There an Equal Unit of Measurement? No No Yes Yes Is There a True Zero? No No No Yes How Might the Scale Be Used in Research? To identify males and females as 1 and 2 To judge who is 1st, 2nd, etc., in aggressiveness To convey the results of intelligence and personality tests To count the number of correct answers on a test Additional Examples Social Security numbers Elementary school grade Individual’s standing relative to class average Distance traveled Nominal and ordinal scales are assumed to be discrete, in which fractions are not possible. Interval and ratio scales are assumed to be continuous, in which fractions are possible. Summary of the Flow of Research 1. Based on a hypothesis about nature, we design either an experiment or a correlational study to observe the relationship between our variables in the sample. 2. Depending on the design and scale of measurement used, we select particular descriptive statistics to understand the scores and the relationship in the sample. 3. Depending on the design and scale used, we select particular inferential procedures to decide whether the sample accurately represents the scores and relationship found in the population. 4. By describing the scores and relationship in the population, we are describing the behavior of everyone in a particular situation, so we are describing an aspect of nature. 1.2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. reviewcard CHAPTER 1 INTRODUCTION TO STATISTICS AND RESEARCH PUTTING IT ALL TOGETHER 1. The purpose of statistical procedures is to make data meaningful by organizing, _____, communicating, and _____scores. 2. The large group of individuals or scores to which a law of nature applies is called the _____. 3. The small group of individuals we measure in a study is called the _____. 4. The individuals we measure in a study are called the _____. 5. To draw a research conclusion, we use the scores in the _____ to estimate the scores in the _____. 6. 
For inferences about a population to be accurate, the sample must be _____ of the population. 7. However, a sample may be unrepresentative because of _____ in determining which participants were selected. 8. Anything about a situation or behavior that when measured produces different scores is called a(n) _____. 9. If we are measuring actual amounts of a variable, it is a(n) _____variable. 10. If we are identifying qualities or categories, we have a(n) _____ variable. 11. When the scores on one variable change in a consistent manner as the scores on another variable change, a(n) _____ is present. 12. The clearer the pattern in a relationship, the more _____ the X and Y scores pair up. 13. How consistently the Y scores are associated with each X is also referred to as the _____ of a relationship. 14. The procedures used to describe a sample of data are called _____ procedures. 15. The procedures used to make inferences about the scores and relationship in the population are called _____ procedures. 16. A number describing an aspect of scores in the population is called a(n) _____. 17. A number describing an aspect of the sample data is called a(n) _____. 18. The layout of a study is called its _____. 19. The two general types of designs used to demonstrate a relationship are _____ and _____. 20. When we actively manipulate one variable to create a relationship with another variable, we are conducting a(n) _____. 21. When we passively measure scores from two variables to discover a relationship, we are conducting a(n) _____. 22. In an experiment, the variable manipulated by the experimenter is called the _____ variable. 23. In an experiment, the variable that measures participants’ behavior and produces our data is the _____variable. 24. The situation created by each amount or category of the independent variable is called a(n) _____. 25. Say that we measure the number of hours spent “social networking” by students who have spent 1, 2, 3, or 4 years in college. The hours spent networking is our _____variable, and year in college is our _____ variable. 26. In question 25, if we measure only freshmen and seniors, then we have two _____. 27. In question 25, we will demonstrate a relationship if the _____ scores for each group tend to be different. 28. The particular descriptive or inferential procedures we use in a study are determined by the _____ of the study and the _____ used to measure the variables. 1.3 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. SPSS INSTRUCTIONS 29. When we use numbers to identify qualitative or category differences, we have a(n) _____ scale. Entering Data and Naming Variables 30. When the scores indicate rank order, we have a(n) _____ scale. • The first step is to input the data. Open SPSS to the large grid labeled “Data Editor.” Across the top is a “Menu Bar” with buttons for dropdown menus such as File or Analyze. • Enter a set of scores from one variable in one column, one score per box. You may simply type in the data, but if you do so, the variable will be named “VAR00001” and so on. • It is better to give variables meaningful names. 
To name them, click on Variable View at the bottom of the editor. • In the left column under “Name,” click on the first rectangle and type a variable’s name. Press Enter on the keyboard. The information that appears is the SPSS defaults, including rounding to two decimal places. Click a rectangle to change the default. • For a second variable, click on the second rectangle in the “Name” column and enter the variable’s name. • Click on Data View and type in the scores under the corresponding variable. • When participants are measured on two variables, each row holds the scores from the same participant. • To save data, on the Menu Bar click File and then Save. • To retrieve a file to add more data or to analyze it, double-click on the saved file. This also opens SPSS. 31. When scores measure an amount, but there is no true zero, we have a(n) _____ scale. 32. When scores measure an amount and zero indicates truly zero amount, we have a(n) _____ variable. 33. A variable that can be measured in fractional amounts is called a(n) _____ variable. 34. A variable that cannot be measured in fractional amounts is called a(n) _____ variable. Answers to Putting It All Together 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 1. summarizing; interpreting population sample participants sample; population representative chance variable quantitative qualitative relationship consistently 20. 21. 22. 23. 24. 13. 14. 15. 16. 17. 18. 19. strength descriptive inferential parameter statistic design experiments; correlational studies experiment correlational study independent dependent condition 26. 27. 28. 29. 30. 31. 32. 33. 34. 25. dependent; independent conditions networking design; scale nominal ordinal interval ratio continuous discrete 1.4 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. reviewcard CHAPTER 2 CREATING AND USING FREQUENCY DISTRIBUTIONS CHAPTER SUMMARY 2-1 • • • • • Some New Symbols and Terminology The initial scores in a study are the raw scores. A score’s (simple) frequency is the number of times the score occurs. The symbol for simple frequency is f. A frequency distribution shows the frequency of each score in the data. The symbol for the total number of scores in the data is N. 2-2 data point polygon grouped distribution normal distribution normal curve tail negatively skewed distribution positively skewed distribution bimodal distribution Relative Frequency and the Normal Curve • The relative frequency of a score is the proportion of time that it occurred. • The total area under the normal curve represents 100% of the scores. A proportion of this area occupied by particular scores equals the combined relative frequency of those scores. 2-5 bar graph histogram Types of Frequency Distributions • In a normal distribution forming a normal curve, extreme high and low scores are relatively infrequent, scores closer to the middle score are more frequent, and the middle score occurs most frequently. • The low-frequency, extreme low and extreme high scores are in the tails of a normal distribution. • A negatively skewed distribution shows a pronounced tail only for low-frequency, extreme low scores. 
• A positively skewed distribution shows a pronounced tail only for low-frequency, extreme high scores, • A bimodal distribution shows two humps containing relatively high-frequency scores, with a score in each having the highest frequency. 2-4 raw score frequency f frequency distribution N Understanding Frequency Distributions • Create a bar graph (adjacent bars do not touch) with nominal or ordinal scores. • Create a histogram (adjacent bars touch) with a small range of interval or ratio scores. • A data point is a dot plotted on a graph. • Create a polygon (data points connected by straight lines) with a large range of interval or ratio scores. • In a grouped distribution, different X scores are grouped together and their combined frequency is reported. 2-3 KEY TERMS relative frequency area under the normal curve Understanding Percentile and Cumulative Frequency • Percentile is the percent of all scores below a given score. • Cumulative frequency is the number of scores at or below a particular score. • On the normal curve the percentile of a score is the percent of the area under the curve to the left of the score. percentile cumulative frequency 2.1 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. PROCEDURES AND FORMULAS When to create a bar graph, histogram, or polygon Consider the scale of measurement of scores on the X axis. Graph When Used? How Produced? Bar graph With nominal or ordinal X scores Adjacent bars do not touch Histogram With small range of interval/ratio Adjacent bars do touch X scores Polygon With large range of interval/ratio X scores Straight lines; add points above and below actual scores Computing Relative Frequency, Proportions, and Percents 1. To compute a score’s relative frequency, divide its frequency (f ) by the total number of scores (N ). 2. To transform relative frequency to simple frequency, multiply the relative frequency times N. 3. To transform relative frequency to a percent, multiply the proportion by 100. 4. To compute percent beginning with a raw score, perform steps 2 and 3 above. 5. Using the area under the normal curve: The combined relative frequency for a group of scores equals the proportion of the area in the slice above those scores. A score’s percentile equals the proportion of the area under the curve to the left of the score. Chapter Formulas Relative Frequency ⫽ f N 2.2 Copyright 2013 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. reviewcard CHAPTER 2 CREATING AND USING FREQUENCY DISTRIBUTIONS PUTTING IT ALL TOGETHER 1. The number of times a score occurs in the data is called the score’s __________and is symbolized as _________. 2. A(n) _________ organizes the data according to the number of times each score occurs. 3. 
reviewcard
CHAPTER 2: CREATING AND USING FREQUENCY DISTRIBUTIONS

PUTTING IT ALL TOGETHER

1. The number of times a score occurs in the data is called the score's __________ and is symbolized as _________.
2. A(n) _________ organizes the data according to the number of times each score occurs.
3. The total number of scores in a sample is symbolized by _________ and equals the sum of the _________ of the individual scores.
4. In a frequency table, scores are listed starting from the top with the __________ score.
5. A graph of a frequency distribution shows the scores on the _______ axis and their corresponding frequency on the _______ axis.
6. A frequency graph showing vertical bars that do not touch is called a(n) _________ graph and is used when the scale is either _________ or _________.
7. The height of each bar indicates the score's _________.
8. The gap between adjacent bars indicates the variable on the _________ axis is a _________ variable.
9. When adjacent bars on the graph touch, the graph is called a(n) _________.
10. The lack of a gap between adjacent bars indicates the X variable is __________, so this graph is produced when the scale is either _________ or _________.
11. When a frequency graph is created by plotting a dot above each score and connecting adjacent dots with straight lines, the graph is called a(n) ________.
12. The dots on a graph are called ________.
13. Polygons are created when the measurement scale is either _______ or _______.
14. The continuous line formed by the polygon indicates the scores were measured using a _________ scale.
15. Histograms are preferred when we have a _________ range of X scores, and polygons are preferred when we have a ________ range of X scores.
16. With too many scores to plot individually, we may combine scores into small groups and report the total frequency for each group. Such distributions are called _________ distributions.
17. A distribution forming a symmetrical, bell-shaped polygon is known as a ________ distribution.
18. In a normal distribution, scores near the ________ of the distribution occur frequently, while the highest and lowest scores occur ________.
19. The portions of the curve containing low-frequency, extreme high or low scores are called the ____________ of the distribution.
20. A nonsymmetrical distribution that has only one distinct tail is called a(n) _________ distribution.
21. When the only tail is at the extreme low scores, the distribution is _________ skewed; when the only tail is at the extreme high scores, the distribution is ___________ skewed.
22. A symmetrical distribution with two areas in which there are high-frequency scores is called a(n) __________ distribution.
23. Most behavioral research produces distributions that form a ________ distribution.
24. The proportion of time that a score occurs in the data is called the score's __________.
25. A score's relative frequency is computed by dividing the score's ________ by _______.
26. To convert a relative frequency back to simple frequency, _________ the relative frequency by N.
27. To convert a relative frequency to percent, _________ it by 100.
28. To convert a percent to relative frequency, _________ it by 100.
29. A "slice" of the normal curve that contains a particular proportion of the area under the curve also contains that proportion of all _________ in the data.
30. The proportion of the area under the normal curve in a slice at certain scores equals those scores' combined _________.
31. The percent of the scores that are lower than a particular score is that score's __________.
32. The number of scores that are at or below a particular score is the score's _________.
33. When using the normal curve, a score's percentile equals the proportion of the area under the normal curve that is located to the _________ of the score.
Answers to Putting It All Together
1. frequency; f  2. frequency distribution  3. N; frequencies  4. highest  5. X; Y  6. bar; nominal; ordinal  7. frequency  8. X; discrete  9. histogram  10. continuous; interval; ratio  11. polygon  12. data points  13. interval; ratio  14. continuous  15. small; large  16. grouped  17. normal  18. middle; infrequently  19. tails  20. skewed  21. negatively; positively  22. bimodal  23. normal  24. relative frequency  25. f; N  26. multiply  27. multiply  28. divide  29. scores  30. relative frequency  31. percentile  32. cumulative frequency  33. left

SPSS INSTRUCTIONS

Frequency Tables and Percentile
• Enter the data and name it as described in Chapter 1.
• On the Menu Bar, click Analyze, Descriptive Statistics, and Frequencies.
• Click the arrow to move the variable to "Variable(s)."
• Click OK. The frequency table appears. ("Cumulative Percent" is not the precise percentile.)
• To find the score at a particular percentile, click Analyze, Descriptive Statistics, Frequencies, and move the data to "Variable(s)."
• Click Statistics. Checkmark "Percentile(s)" and type the percentile you seek.
• Click Add. Add other percentiles, quartiles, or cut points.
• Click Continue and OK. The percentile(s) will be listed in the "Statistics" table.

Graphs
• To plot bar graphs and histograms, click Analyze, Descriptive Statistics, and Frequencies.
• Click Charts. Select "Chart Type." Click Continue. Right-click on the graph to export it.
• To plot polygons and more complex graphs, on the Menu Bar, click Graphs and then Chart Builder. If asked, click OK to define the chart.
• Under "Choose from," select the graph's style, and from the gallery, drag and drop the version you want.
• Drag and drop your variable name to the X axis. (These graphs can also be used for plotting other types of X–Y relationships.)
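For readers who want the same table outside SPSS, here is a small pandas sketch, an assumption of this edit rather than part of the text, with hypothetical scores. It mirrors the f, Percent, and Cumulative Percent columns of SPSS's Frequencies output:

    import pandas as pd

    scores = pd.Series([2, 3, 3, 4, 4, 4, 5])          # hypothetical data
    table = scores.value_counts().sort_index().to_frame("f")
    table["percent"] = table["f"] / len(scores) * 100  # relative frequency x 100
    table["cum_percent"] = table["percent"].cumsum()   # like SPSS "Cumulative Percent"
    print(table)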
reviewcard
CHAPTER 3: SUMMARIZING SCORES WITH MEASURES OF CENTRAL TENDENCY

CHAPTER SUMMARY

3-1 Some New Symbols and Procedures
KEY TERMS: sum of X
• The symbol $\Sigma X$ stands for the sum of X.
• Round a final answer to two more decimal places than were in the original raw scores.

3-2 What Is Central Tendency?
KEY TERMS: central tendency
• Measures of central tendency indicate a distribution's location on a variable, indicating where the center of the distribution tends to be.
• The three measures of central tendency are the mean, median, and mode.

3-3 Computing the Mean, Median, and Mode
KEY TERMS: mode, unimodal, bimodal, median (Mdn), mean, $\bar{X}$
• The mode is the most frequently occurring score or scores in a distribution and is used primarily to summarize nominal data.
• A distribution with only one mode is unimodal; a distribution with two modes is bimodal.
• The median (Mdn) is the score at the 50th percentile. It is used primarily with ordinal data and with skewed interval or ratio data.
• The mean is the average score, located at the mathematical center of a distribution. It is used with interval or ratio data that form a normal distribution.
• The symbol for a sample mean is $\bar{X}$.

3-4 Applying the Mean to Research
KEY TERMS: deviation, sum of the deviations around the mean, line graph, bar graph
• The amount a score differs from the mean is its deviation, computed as $X - \bar{X}$.
• The sum of the deviations around the mean, $\Sigma(X - \bar{X})$, equals zero. This makes the mean the best score to predict for any individual, because the total error across all predictions will equal zero.
• In an experiment, a relationship between the independent and dependent variables is present when the means from two or more conditions are different. No relationship is present when all means are the same.
• When graphing the results of an experiment, the independent variable is plotted on the X axis and the dependent variable on the Y axis.
• A line graph is created when the independent variable is measured using an interval or ratio scale. A bar graph is created when the independent variable is measured using a nominal or ordinal scale.
• On a graph, if the data form a pattern that is not horizontal, then the Y scores are changing as the X scores change, and a relationship is present. If the data form a horizontal line, then the Y scores are not changing as X changes, and a relationship is not present.

3-5 Describing the Population Mean
KEY TERMS: μ
• The symbol for a population mean is μ.
• The $\bar{X}$ in each condition of an experiment is the best estimate of (1) the score of any participant in that condition and (2) the mean that would be found if the population was tested under that condition.
• We conclude that a relationship in the population is present when we infer different values of μ, implying different distributions of dependent scores, for two or more conditions of the independent variable.

PROCEDURES AND FORMULAS

Selecting a Measure of Central Tendency
• Nominal scores: compute the mode (the most frequent score).
• Ordinal scores: compute the median (the 50th percentile).
• Skewed interval or ratio scores: compute the median (the 50th percentile).
• Normally distributed interval or ratio scores: compute the mean (the average score).

Steps in Summarizing an Experiment
1. Identify the independent and dependent variables.
2. Summarize the dependent scores. Depending on the characteristics of the dependent variable, compute the mean, median, or mode in each condition.
3. Graph the results. Depending on the characteristics of the independent variable, create a line graph (with interval or ratio variables) or a bar graph (with nominal or ordinal variables).
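To make the selection rules concrete, here is a brief Python sketch using the standard library's statistics module (my own illustration with hypothetical scores, not an example from the text). It also verifies the 3-4 claim that the deviations around the mean sum to zero:

    import statistics as stats

    scores = [3, 4, 4, 5, 6]                # hypothetical interval scores
    print(stats.mode(scores))               # mode: most frequent score   -> 4
    print(stats.median(scores))             # median: 50th percentile     -> 4
    print(stats.mean(scores))               # mean: mathematical center   -> 4.4

    mean = stats.mean(scores)
    print(sum(x - mean for x in scores))    # sum of deviations -> 0 (within float rounding)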
Chapter Formulas
1. Sample mean: $\bar{X} = \dfrac{\Sigma X}{N}$
2. Deviation: $(X - \bar{X})$
3. Sum of the deviations: $\Sigma(X - \bar{X})$

reviewcard
CHAPTER 3: SUMMARIZING SCORES WITH MEASURES OF CENTRAL TENDENCY

PUTTING IT ALL TOGETHER

1. Measures of central tendency describe the general _________ of a distribution on a variable.
2. The three measures of central tendency are the _________.
3. The most frequently occurring score in a distribution is called the _________ and is the preferred measure when the scores involve a(n) _________ scale of measurement.
4. The score at the 50th percentile is the measure of central tendency called the _________ and symbolized as _________.
5. The term "50th percentile" means that 50% of the scores are _________ this score.
6. The median is preferred when the data involve a(n) _________ scale.
7. The statistic located at the mathematical center of a distribution is the _________.
8. The mean is the preferred measure with a normal distribution of _________ or _________ scores.
9. The _________ is the preferred measure with skewed interval and ratio data.
10. The symbol for a sample mean is _________.
11. The formula for a sample mean is _________ divided by _________.
12. In words, "$\Sigma X$" is called the _________.
13. The difference between a score and the mean is called the score's _________.
14. The symbol for a deviation is _________.
15. The distance a score is from the mean is indicated by the _________ of the deviation, and the direction a score is from the mean is indicated by the _________ of the deviation.
16. A positive deviation indicates the score is _________ than the mean, and a negative deviation indicates the score is _________ than the mean.
17. A score equal to the mean has a deviation of _________.
18. When we add together all the deviations in a sample, we find the "_________."
19. We indicate this sum in symbols as _________.
20. The sum of the deviations around the mean always equals _________.
21. To predict any individual's score, we should predict the _________.
22. Then, in symbols, the difference between a participant's actual score and his or her predicted score is _________.
23. The total of all the prediction errors in symbols is _________, and always equals _________.
24. In an experiment, we measure participants' scores on the _________ variable under each condition of the _________ variable.
25. We decide which statistics to compute based on the characteristics of the _________ variable.
26. We decide which type of graph to create based on the characteristics of the _________ variable.
27. We usually summarize experiments by computing the _________ for each _________.
28. When a relationship is present, the _________ change as the conditions change.
29. To graph an experiment, place the _________ variable on X and the _________ variable on Y.
30. Create a _________ graph if the _________ variable is interval or ratio; create a _________ graph if the variable is nominal or ordinal.
31. A graph shows a relationship if the pattern formed by the data points is not _________.
32. The symbol for a population mean is _________.
33. We usually estimate the population mean by computing _________ in a sample.
34. The ultimate goal of an experiment is to conclude that changing our conditions would produce different _________ located at different values of _________.
SPSS INSTRUCTIONS

Computing Central Tendency
• Enter the data as in Chapter 1.
• On the Menu Bar, select Analyze, Descriptive Statistics, and Frequencies.
• Move each variable to "Variable(s)."
• Click Statistics. In the "Frequencies: Statistics" box, check each measure under Central Tendency that you seek.
• Click Continue and OK. Your answers will appear in the "Statistics" box.

Answers to Putting It All Together
1. location  2. mode, median, and mean  3. mode; nominal  4. median; Mdn  5. below  6. ordinal  7. mean  8. interval; ratio  9. median  10. $\bar{X}$  11. $\Sigma X$; N  12. sum of X  13. deviation  14. $X - \bar{X}$  15. size; sign  16. larger; smaller  17. zero  18. sum of the deviations around the mean  19. $\Sigma(X - \bar{X})$  20. zero  21. mean  22. $X - \bar{X}$  23. $\Sigma(X - \bar{X})$; zero  24. dependent; independent  25. dependent  26. independent  27. mean; condition  28. means and scores  29. independent; dependent  30. line; independent; bar  31. horizontal  32. μ  33. $\bar{X}$  34. populations; μ

reviewcard
CHAPTER 4: SUMMARIZING SCORES WITH MEASURES OF VARIABILITY

CHAPTER SUMMARY

4-1 Understanding Variability
KEY TERMS: measures of variability
• Measures of variability indicate how much the scores differ from each other, how accurately the mean represents the scores, and how much the distribution is spread out.

4-2 The Range
KEY TERMS: range
• The range is the difference between the highest and the lowest scores.
• It is used as the sole measure of variability with nominal or ordinal scores.

4-3 The Sample Variance and Standard Deviation
KEY TERMS: sample variance ($S_X^2$), sample standard deviation ($S_X$)
• The variance and standard deviation are used with the mean to describe a normal distribution of interval or ratio scores.
• The sample variance ($S_X^2$) is the average of the squared deviations of scores around the mean.
• The sample standard deviation ($S_X$) is the square root of the variance, but can be interpreted as like the "average" amount that scores deviate from the mean.
• On a normal distribution, 34% of the scores are between the mean and the score that is 1 standard deviation from the mean.

4-4 The Population Variance and Standard Deviation
KEY TERMS: population variance ($\sigma_X^2$), population standard deviation ($\sigma_X$), biased estimators, unbiased estimators, estimated population variance ($s_X^2$), estimated population standard deviation ($s_X$)
• The true population variance ($\sigma_X^2$) indicates the average squared deviation of scores around μ.
• The true population standard deviation ($\sigma_X$) can be interpreted as somewhat like the "average" amount that scores deviate from μ.
• The formulas for $S_X$ and $S_X^2$ are biased estimators of the population's variability because they use N in the denominator.
• The formulas for $s_X$ and $s_X^2$ are the unbiased estimators of the population's variability because they use N − 1 in the denominator.

4-5 Summary of the Variance and Standard Deviation
• For the variance: $S_X^2$ describes the sample, $\sigma_X^2$ describes the population, and $s_X^2$ is the estimated population variance based on a sample.
• For the standard deviation: $S_X$ describes the sample, $\sigma_X$ describes the population, and $s_X$ is the estimated population standard deviation based on a sample.

4-6 Computing Formulas for the Variance and Standard Deviation
KEY TERMS: squared sum of X [$(\Sigma X)^2$], sum of squared Xs ($\Sigma X^2$)
• The symbol $(\Sigma X)^2$ is the squared sum of X.
• The symbol $\Sigma X^2$ is the sum of squared Xs.

4-7 Statistics in the Research Literature: Reporting Means and Variability
• In research publications, the symbol for the sample mean is M, and the symbol for the estimated population standard deviation is SD.
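A compact way to see the N versus N − 1 distinction from 4-3 and 4-4 is Python's statistics module, whose "population" functions divide by N and whose plain functions divide by N − 1. This is my own hedged sketch with hypothetical scores; note the module's naming runs opposite to this chapter's, so pvariance matches the descriptive $S_X^2$ while variance matches the estimate $s_X^2$:

    import statistics as stats

    scores = [2, 4, 6]                  # hypothetical scores
    print(stats.pvariance(scores))      # divides by N:     S²x  -> ~2.67
    print(stats.pstdev(scores))         # its square root:  Sx   -> ~1.63
    print(stats.variance(scores))       # divides by N − 1: s²x  -> 4
    print(stats.stdev(scores))          # its square root:  sx   -> 2.0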
PROCEDURES AND FORMULAS

Organizing the Measures of Variability
Describing variability (the differences between scores):
• Descriptive measures describe a known sample or population; in their formulas, the final division uses N.
  - To describe the sample variance, compute $S_X^2$; taking its square root gives the sample standard deviation, $S_X$.
  - To describe the population variance, compute $\sigma_X^2$; taking its square root gives the population standard deviation, $\sigma_X$.
• Inferential measures estimate the population based on a sample; in their formulas, the final division uses N − 1.
  - To estimate the population variance, compute $s_X^2$; taking its square root gives the estimated population standard deviation, $s_X$.

Chapter Formulas
1. The formula for the range is: Range = highest score − lowest score
2. The computing formula for the sample variance is $S_X^2 = \dfrac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N}$
3. The computing formula for the sample standard deviation is $S_X = \sqrt{\dfrac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N}}$
4. The computing formula for estimating the population variance is $s_X^2 = \dfrac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N - 1}$
5. The computing formula for estimating the population standard deviation is $s_X = \sqrt{\dfrac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N - 1}}$
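A worked instance of formulas 2 and 4, using the same hypothetical scores 2, 4, and 6 as the Python check above (my own arithmetic, not an example from the text):

$\Sigma X = 12$, $\Sigma X^2 = 4 + 16 + 36 = 56$, and $N = 3$, so

$S_X^2 = \dfrac{56 - \dfrac{(12)^2}{3}}{3} = \dfrac{56 - 48}{3} \approx 2.67$ and $s_X^2 = \dfrac{56 - 48}{3 - 1} = 4$.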
reviewcard
CHAPTER 4: SUMMARIZING SCORES WITH MEASURES OF VARIABILITY

PUTTING IT ALL TOGETHER

1. In statistics, the extent to which scores differ from one another is called _______.
2. A distribution with greater variability is _______ spread out around the mean, so the mean is _______ accurate for describing the scores.
3. The opposite of variability is _______.
4. The measure of variability indicating how far the lowest score is from the highest score is the _______.
5. It is used primarily when variables involve _______ or _______ scales.
6. The range is computed by subtracting the _______ score from the _______ score.
7. When the mean is the appropriate measure of central tendency, the two measures of variability to compute are the _______ and _______.
8. The smaller these statistics are, the _______ the scores are spread out around the mean, and the _______ the scores differ from one another.
9. The variance is defined as the _______ deviation of the scores around the mean.
10. The symbol for the sample variance is _______.
11. The variance can be difficult to interpret because it measures the variable in _______ units.
12. The measure of variability more similar to the average deviation of the scores around the mean is called the _______.
13. Mathematically, the standard deviation equals the _______ of the variance.
14. The symbol for the sample standard deviation is _______.
15. The larger the standard deviation, the _______ the distribution is spread out.
16. To describe where most of the scores in a sample are located, we find the scores at _______ and _______ 1 standard deviation from the mean.
17. In a normal distribution, _______% of all scores fall between the mean and a score that is 1 standard deviation from the mean.
18. Between the scores at $+1S_X$ and $-1S_X$ from the mean are _______% of the scores.
19. The population variance and standard deviation indicate how spread out the scores are around _______.
20. The symbol for the true population variance is _______, and for the true population standard deviation, it is _______.
21. We expect 68% of the population to fall between the scores at _______ and _______.
22. We do not estimate the population variability using the formulas for $S_X$ and $S_X^2$ because they are called the _______.
23. They are called this because they tend to produce an answer that is too _______.
24. This occurs because they divide by N when only _______ of the scores in a sample reflect the variability in the population.
25. Instead, we estimate the variability in the population using the _______, which divide by _______.
26. The symbol for the estimated population standard deviation is ________, and the symbol for the estimated population variance is _______.
27. In symbols, for the variance we compute _______ to estimate _______; for the standard deviation we compute _______ to estimate _______.
28. In published research, the measure of variability researchers usually report is the _______.
29. Publications usually use the symbol _______ for the sample mean and the symbol ________ for the standard deviation.

SPSS INSTRUCTIONS

Computing Variability
• After entering the data, on the Menu Bar select Analyze, Descriptive Statistics, and Frequencies. Move each variable to "Variable(s)."
• Click Statistics. Check the measures of Central Tendency and the measures of Dispersion that you seek.
• Click Continue and OK.
• In the "Statistics" box, the standard (std.) deviation and variance given are the estimated population versions. (You'll encounter the "S.E. mean" in Chapter 5.)

Answers to Putting It All Together
1. variability  2. more; less  3. consistency  4. range  5. nominal; ordinal  6. lowest; highest  7. variance; standard deviation  8. less; less  9. average squared  10. $S_X^2$  11. squared  12. standard deviation  13. square root  14. $S_X$  15. more  16. plus; minus  17. 34  18. 68  19. μ  20. $\sigma_X^2$; $\sigma_X$  21. $+1\sigma_X$; $-1\sigma_X$  22. biased estimators  23. small  24. N − 1  25. unbiased estimators; N − 1  26. $s_X$; $s_X^2$  27. $s_X^2$; $\sigma_X^2$; $s_X$; $\sigma_X$  28. estimated population standard deviation  29. M; SD

reviewcard
CHAPTER 5: DESCRIBING DATA WITH z-SCORES AND THE NORMAL CURVE

CHAPTER SUMMARY

5-1 Understanding z-Scores
KEY TERMS: relative standing, z-score (z)
• The relative standing of a score reflects a systematic evaluation of it relative to a sample or a population.
• A z-score indicates the distance a score is from the mean when measured in standard deviations.
• z-scores are used to describe the relative standing of raw scores, to compare scores from different variables, and to determine their relative frequency.

5-2 Using the z-Distribution to Interpret Scores
KEY TERMS: z-distribution
• A z-distribution is produced by transforming all raw scores in a distribution into z-scores.
• The mean of a z-distribution is 0 and the standard deviation is 1.
• A positive z-score indicates that the raw score is above the mean, and a negative z-score indicates that the raw score is below the mean.
• The larger the absolute value of z, the farther the raw score is from the mean, so the less frequently the z-score and raw score occur.

5-3 Using the z-Distribution to Compare Different Variables
• z-scores equate scores from different variables by comparing them using relative standing.
• z-scores are also called "standard scores."
• A particular z-score is always at the same relative location on the z-distribution for any variable.

5-4 Using the z-Distribution to Compute Relative Frequency
KEY TERMS: standard normal curve
• The standard normal curve is a perfect normal z-distribution that is our model of the z-distribution. It is used with normally distributed, interval or ratio scores.
• Raw scores that produce the same z-score have the same relative frequency and percentile.
• Using the standard normal curve and z-table, we determine the proportion of the area under the curve that is above or below a particular z-score. This proportion is also the expected relative frequency of raw scores in this portion of the curve.

5-5 Using z-Scores to Describe Sample Means
KEY TERMS: sampling distribution of means, central limit theorem, standard error of the mean ($\sigma_{\bar{X}}$)
• The sampling distribution of means is the frequency distribution of all sample means that occur when an underlying raw score population is infinitely sampled using a particular N.
• The central limit theorem shows that a sampling distribution is approximately normal, has a μ equal to the μ of the underlying raw score population, and has variability related to the variability of the raw scores.
• The true standard error of the mean ($\sigma_{\bar{X}}$) is the standard deviation of the sampling distribution of means.
• A z-score for a sample mean indicates how far the mean is from the μ of the sampling distribution when measured in standard error units.
• Using the standard normal curve and z-table, we determine the proportion of the area under the curve that is above or below our mean's z-score. This proportion is the relative frequency of sample means above or below our mean that occur when sampling from the underlying raw score population.
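A small Python sketch of 5-1 and 5-2 (my own illustration; the exam scores are hypothetical). It computes z the way this chapter defines it, with the sample standard deviation $S_X$ dividing by N:

    scores = [70, 75, 80, 85, 90]                            # hypothetical raw scores
    N = len(scores)
    mean = sum(scores) / N                                   # X̄ = ΣX / N        -> 80
    Sx = (sum((x - mean) ** 2 for x in scores) / N) ** 0.5   # Sx divides by N    -> ~7.07

    z = (90 - mean) / Sx     # z = (X − X̄) / Sx
    print(z)                 # ~ +1.41: 90 lies about 1.4 standard deviations above the mean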
PROCEDURES AND FORMULAS

Summary of Steps When Using the z-Table
• If you seek the relative frequency of scores between X and $\bar{X}$: transform X to z, then find the area in column B.*
• If you seek the relative frequency of scores beyond X in the tail: transform X to z, then find the area in column C.*
• If you seek the X that marks a given relative frequency between X and $\bar{X}$: find the relative frequency in column B, then transform z to X.
• If you seek the X that marks a given relative frequency beyond X in the tail: find the relative frequency in column C, then transform z to X.
• If you seek the percentile of an X above $\bar{X}$: transform X to z, then find the area in column B and add .50.
• If you seek the percentile of an X below $\bar{X}$: transform X to z, then find the area in column C.
*To find the simple frequency of the scores, multiply the relative frequency times N.

Summary of Steps When Computing a z-Score for a Sample Mean
1. Create the sampling distribution of means, with its μ equal to the μ of the underlying raw score population.
2. Compute the sample mean's z-score:
   a. Compute the standard error of the mean, $\sigma_{\bar{X}}$.
   b. Compute z.
3. Use the z-table to determine the relative frequency of scores above or below this z-score. This is the relative frequency of sample means above or below our mean when sampling from the underlying raw score population.

Chapter Formulas
1. The formula for a z-score in a sample is $z = \dfrac{X - \bar{X}}{S_X}$
2. The formula for a z-score in a population is $z = \dfrac{X - \mu}{\sigma_X}$
3. The formula for transforming a z-score to its raw score in a sample is $X = (z)(S_X) + \bar{X}$
4. The formula for transforming a z-score to its raw score in a population is $X = (z)(\sigma_X) + \mu$
5. The formula for the standard error of the mean is $\sigma_{\bar{X}} = \dfrac{\sigma_X}{\sqrt{N}}$
6. The formula for computing a z-score for a sample mean is $z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}}$
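Formulas 5 and 6 in action, as a hedged Python sketch (the μ, σ, and sample mean are hypothetical values, and scipy.stats.norm stands in for the printed z-table):

    from math import sqrt
    from scipy.stats import norm

    mu, sigma = 100, 15            # hypothetical raw score population: μ and σx
    N, sample_mean = 25, 106       # hypothetical sample

    sem = sigma / sqrt(N)          # formula 5: σx̄ = σx / √N      -> 3.0
    z = (sample_mean - mu) / sem   # formula 6: z = (X̄ − μ) / σx̄  -> 2.0
    print(norm.sf(z))              # area beyond z: relative frequency of means above 106, ~.0228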
reviewcard
CHAPTER 5: DESCRIBING DATA WITH z-SCORES AND THE NORMAL CURVE

PUTTING IT ALL TOGETHER

1. When we evaluate a raw score relative to other scores in the data, we are describing the score's _____.
2. The best way to accomplish this evaluation is by computing the score's _____.
3. The definition of a z-score is that it indicates the _____ a raw score deviates from the mean when measured in _____ units.
4. The symbol for a z-score is _____.
5. A positive z-score indicates that the raw score is _____ than the mean and graphed to the _____ of it.
6. A negative z-score indicates that the raw score is _____ than the mean and graphed to the _____ of it.
7. The size of the z-score (ignoring the sign) indicates the _____ the score is from the mean.
8. The z-score for a raw score that equals the mean is _____.
9. The larger a positive z-score is, the _____ is the raw score, and the _____ frequently it occurs.
10. The larger a negative z-score is, the _____ is the raw score, and the _____ frequently it occurs.
11. Seldom are z-scores greater than ±_____.
12. Transforming all raw scores in a distribution to z-scores results in a(n) _____.
13. The mean of a z-distribution always equals _____ and the standard deviation always equals _____.
14. One reason to transform raw scores to z-scores is to make scores on different variables _____, because we compare participants' _____ in each sample.
15. Another reason to compute z-scores is to determine the _____ of raw scores.
16. Relative frequency tells us the _____ a score occurs.
17. The model of the z-distribution we employ is called the _____.
18. To use the standard normal curve, we first compute a(n) _____ to identify a slice of the normal curve.
19. Then, from the z-table, we determine the _____ in the slice.
20. Each proportion also equals the _____ of the corresponding z-scores in the slice.
21. The relative frequency of the z-scores is also the expected relative frequency of the corresponding _____.
22. A raw score's percentile equals the proportion of the curve that is to the _____ of its z-score.
23. When computing a percentile, we add .50 to the proportion obtained from the z-table when the z-score has a(n) _____ sign, but not when the z-score has a(n) _____ sign.
24. We also use z-scores to describe the _____ of a sample mean when compared to a distribution of all possible sample means that might occur.
25. This distribution is called the _____.
26. The statistical principle called the _____ defines the shape, the mean, and the standard deviation of a sampling distribution.
27. The mean of the sampling distribution always equals the mean of the _____ population.
28. The standard deviation of the sampling distribution is called the _____.
29. The symbol for the true standard error of the mean is _____.
30. A z-score for a sample mean indicates the amount the mean deviates from the _____ of the sampling distribution when measured in _____ units.