USING PROBABILITY TO MAKE
DECISIONS ABOUT DATA
Your goals in this chapter are to learn:
• What probability is
• How to compute the probability of raw scores and sample means using z-scores
• How random sampling should produce a representative sample
• How sampling error may produce an unrepresentative sample
• How to use a sampling distribution of means to decide whether a sample represents a particular population
Selecting a sample so that all events or individuals in the population have an equal chance of being selected is known as random sampling.
• The probability of an event is equal to the event’s relative frequency in the population of possible events that can occur
• The symbol for probability is p
Probability Distributions
A probability distribution indicates the probability of all possible events in a population
An empirical probability distribution is created by observing the relative frequency of every event in the population
A theoretical probability distribution is based on how we assume nature distributes events in the population
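As a minimal sketch of an empirical probability distribution, the Python below computes the relative frequency of every score in a small population. The ten scores and the helper name `relative_frequencies` are invented for illustration and do not come from the chapter.

```python
from collections import Counter
from fractions import Fraction


def relative_frequencies(events):
    """Return each distinct event's relative frequency, i.e. its probability p."""
    counts = Counter(events)
    total = len(events)
    return {event: Fraction(count, total) for event, count in counts.items()}


# Hypothetical population of 10 scores, made up for this example.
population = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
p = relative_frequencies(population)

print(p[3])             # 2/5, so p = .40 for a score of 3
print(sum(p.values()))  # 1, the probabilities of all events sum to 1
```

Exact fractions are used here so the probabilities sum to exactly 1; floats would work equally well for larger data sets.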
Obtaining Probability from the Standard
Normal Curve
The proportion of the total area under the
standard normal curve for particular scores equals the probability of those scores.
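The area under the standard normal curve can be looked up in a z-table or computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `p_between` and `p_beyond` are illustrative and not part of the chapter.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def p_between(z_low, z_high):
    """Proportion of the area (probability) between two z-scores."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


def p_beyond(z):
    """Proportion of the area in the single tail beyond |z|."""
    return 1.0 - std_normal.cdf(abs(z))


print(round(p_between(-1.0, 1.0), 4))  # 0.6827
print(round(p_beyond(1.96), 4))        # 0.025
```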
One type of theoretical probability distribution, known as the sampling distribution of means, is used to determine the probability of randomly obtaining any particular sample mean.
Sampling Distribution of SAT
Means When N = 25
• The probability of selecting a particular sample mean is the same as the probability of randomly selecting a sample of participants whose scores produce that sample mean
• The larger the absolute value of a sample mean’s z-score, the less likely the mean is to occur when samples are drawn from the underlying raw score population
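To illustrate how a sample mean's z-score relates to its probability, the sketch below locates three sample means on a sampling distribution with N = 25. The population values (mean 500, standard deviation 100) are assumptions chosen for illustration; the text above does not state them.

```python
from math import sqrt
from statistics import NormalDist

MU = 500.0     # assumed population mean (not given in the text)
SIGMA = 100.0  # assumed population standard deviation (not given in the text)
N = 25         # sample size from the slide title


def z_for_mean(sample_mean, mu=MU, sigma=SIGMA, n=N):
    """z-score of a sample mean on the sampling distribution of means."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu) / standard_error


def p_beyond_mean(sample_mean):
    """One-tail probability of a sample mean at least this far from mu."""
    return 1.0 - NormalDist().cdf(abs(z_for_mean(sample_mean)))


# Larger |z| goes with a smaller probability.
for sample_mean in (520, 540, 560):
    print(sample_mean, z_for_mean(sample_mean), round(p_beyond_mean(sample_mean), 4))
```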
Random Sampling and
Sampling Error
• A representative sample is one in which the characteristics of the individuals and scores in the sample accurately reflect the characteristics of the individuals and scores in the population
• Sampling error occurs when random chance produces a sample statistic (e.g., $s^2$) that is not equal to the population parameter it represents (e.g., $\sigma^2$)
• It is always possible to obtain a sample that is not representative
• Therefore, any sample might either poorly represent one population because of sampling error or accurately represent a different population
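Sampling error can be seen in a small simulation. The sketch below repeatedly draws random samples from one normally distributed population and records each sample mean. The population values (mean 500, standard deviation 100), the sample size, and the seed are all assumptions chosen for illustration.

```python
import random
from statistics import mean

# Hypothetical population parameters and sample size, chosen for illustration.
MU, SIGMA, N = 500.0, 100.0, 25

rng = random.Random(42)  # fixed seed so the run is repeatable


def draw_sample_mean():
    """Randomly select N scores from the population and return their mean."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    return mean(sample)


sample_means = [draw_sample_mean() for _ in range(1000)]

# Individual sample means miss MU by chance (sampling error) ...
print(min(sample_means), max(sample_means))
# ... but across many samples they center on the population mean.
print(mean(sample_means))
```

Every sample here comes from the same population, yet some means land well above or below 500. That is why a single unusual sample mean is ambiguous on its own.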
Deciding Whether a Sample
Represents a Population
Sample mean A is likely. Sample mean B is unlikely.
• At some point, a sample mean is so far above or below the population mean that it is unbelievable that chance produced such an unrepresentative sample
• The area beyond these points is called the region of rejection
The region of rejection is the part of a sampling distribution containing values so unlikely that we “reject” the idea that they represent the underlying raw score population.
A Sampling Distribution of Means
Showing the Region of Rejection
The criterion is the probability that defines samples as too unlikely to represent the raw score population.
• A critical value marks the inner edge of the region of rejection
• For a criterion of .05 in a two-tailed test, the area in each tail equals .025
• ± 1.96 is the critical value of z for a criterion of .05 in a two-tailed test
• When a sample’s z-score lies beyond the critical value, reject the idea the sample represents the underlying raw score population reflected by the sampling distribution
• When the z-score does not lie beyond the critical value, retain the idea the sample represents the underlying raw score population
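The critical value and the reject/retain rule above can be sketched in a few lines of Python. The function names `two_tailed_critical_z` and `decide` are illustrative, not from the chapter; `NormalDist.inv_cdf` is the standard-library inverse of the cumulative normal distribution.

```python
from statistics import NormalDist


def two_tailed_critical_z(criterion=0.05):
    """Positive critical z that leaves criterion / 2 of the area in each tail."""
    return NormalDist().inv_cdf(1.0 - criterion / 2.0)


def decide(z, criterion=0.05):
    """Reject if the sample's z lies beyond the critical value, else retain."""
    return "reject" if abs(z) > two_tailed_critical_z(criterion) else "retain"


print(round(two_tailed_critical_z(0.05), 2))  # 1.96
print(decide(2.20))                           # reject
print(decide(1.50))                           # retain
```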
1. Set up the sampling distribution
– Select the criterion (e.g., .05)
– Locate the region of rejection
– Determine the critical value (e.g., ± 1.96 in a two-tailed test with a criterion of .05)
2. Compute the sample mean and its z-score
– Compute the standard error of the mean, $\sigma_{\bar{X}}$
– Compute z using $\bar{X}$ and the $\mu$ of the sampling distribution
• Two-tailed tests—we reject the idea the sample mean is representative if it falls in either the negative tail or the positive tail of the distribution
• One-tailed tests
– If we are interested in positive z-scores, reject the idea the sample mean is representative only if it falls in the positive tail
– If we are interested in negative z-scores, reject the idea the sample mean is representative only if it falls in the negative tail
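The difference between the two kinds of test can be sketched as follows. The sketch assumes the usual convention that a one-tailed test places the entire criterion in a single tail, which for .05 gives a critical value of about 1.645. The function name and the `tail` labels are illustrative.

```python
from statistics import NormalDist


def in_region_of_rejection(z, criterion=0.05, tail="two"):
    """Return True if z falls in the region of rejection.

    tail="two":   criterion split across both tails (criterion / 2 each)
    tail="upper": whole criterion in the positive tail
    tail="lower": whole criterion in the negative tail
    """
    std_normal = NormalDist()
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1.0 - criterion / 2.0)
    if tail == "upper":
        return z > std_normal.inv_cdf(1.0 - criterion)
    if tail == "lower":
        return z < std_normal.inv_cdf(criterion)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


z = 1.80
print(in_region_of_rejection(z, tail="two"))    # False: 1.80 is inside +/-1.96
print(in_region_of_rejection(z, tail="upper"))  # True: 1.80 is beyond about 1.645
print(in_region_of_rejection(z, tail="lower"))  # False: wrong tail
```

The same z-score can therefore lead to different decisions depending on which tail or tails hold the region of rejection.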
A sample of 10 scores yields a sample mean of 305.
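Does the sample represent the population where $\mu = 325$ and $\sigma_X = 25$?

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{N}} = \frac{25}{\sqrt{10}} = 7.91$$

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{305 - 325}{7.91} = \frac{-20}{7.91} = -2.53$$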
• With a criterion of 0.05 and a region of rejection in two tails, the critical value is ±1.96.
• Since the sample z of –2.53 is beyond –1.96, it is in the region of rejection. The sample does not represent the population.
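The arithmetic in this example can be checked with a short script. This is a sketch using only the values stated above (sample mean 305, N = 10, population mean 325, population standard deviation 25, criterion .05); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the worked example.
sample_mean = 305.0
n = 10
mu = 325.0
sigma_x = 25.0
criterion = 0.05

standard_error = sigma_x / sqrt(n)           # standard error of the mean
z = (sample_mean - mu) / standard_error      # z-score of the sample mean
z_crit = NormalDist().inv_cdf(1.0 - criterion / 2.0)  # two-tailed critical value

print(round(standard_error, 2))  # 7.91
print(round(z, 2))               # -2.53
print(round(z_crit, 2))          # 1.96
print(abs(z) > z_crit)           # True: z is in the region of rejection
```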