Chapter 6
Probability

Introduction
We usually start a study by asking questions about a population, but we conduct the research using a sample. The role of inferential statistics is to use the sample data as the basis for answering questions about the population.

Introduction (cont.)
To accomplish this goal, inferential procedures are typically built around the concept of probability. Specifically, the relationships between samples and populations are usually defined in terms of probability. By knowing the makeup of a population, we can determine the probability of obtaining specific samples. In this way, probability gives us a connection between populations and samples, which will be the foundation for inferential statistics (later chapters).

The marble samples began with a population and ended with the sample obtained. Inferential statistics goes in the opposite direction: it begins with a sample and uses it to answer general questions about the population.
Two steps to reach the goal:
– Develop probability as a bridge from populations to samples.
– Then reverse the probability rules to allow us to move from samples to populations.

Figure 6.1 The role of probability in inferential statistics. Probability is used to predict what kind of samples are likely to be obtained from a population. Thus, probability establishes a connection between samples and populations. Inferential statistics rely on this connection when they use sample data as the basis for making conclusions about populations.
Copyright © 2002 Wadsworth Group. Wadsworth is an imprint of the Wadsworth Group, a division of Thomson Learning.

Probability Definition
In a situation where several different outcomes are possible, we define the probability for any particular outcome as a fraction or proportion.
If the possible outcomes are identified as A, B, C, D, and so on, then
probability of A = (number of outcomes classified as A) / (total number of possible outcomes)

Example
A deck of cards has 52 cards.
– The probability of choosing the king of hearts is p = 1/52.
– The probability of choosing an ace is p = 4/52 = 1/13 = .0769.
Use a notation system:
– p(hearts)
– p(aces)

Note: Probability is defined as a proportion, so any probability problem can be restated as a proportion problem.
– What is the probability of obtaining a king from a deck of cards? 4/52
– Out of the whole deck, what proportion are kings? 4/52
This equivalence is worth understanding now, because later probabilities will be read as proportions of a frequency distribution graph.

Example: p(tails) = 1/2 = .50 = 50%. Any of the three forms is acceptable.

Question:
– If you had a jar of all white marbles, what is the probability of choosing a black marble?
– What is the probability of choosing a white marble?

Random Sampling
For the definition of probability to be accurate, the outcomes must be obtained through random sampling. Random sampling must satisfy two requirements:
– Each individual in the population must have an equal chance of being selected. This assures no bias in the selection process.

Requirements for Random Sample (cont.)
– If more than one individual is to be selected for the sample, there must be constant probability for each and every selection.
If cards are drawn without being returned to the deck, the probabilities change from one draw to the next:
p(jack of diamonds) = 1/52 for the first draw
p(jack of diamonds) = 1/51 for the second draw if the first card drawn was not the jack of diamonds
p(jack of diamonds) = 0 for the second draw if the jack of diamonds was the first draw
This contradicts the second requirement, which states that the probability must stay constant.
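The card examples above can be checked with a short Python sketch. It is an illustration only: the helper name probability and the choice of the 2 of clubs as the "other" card removed on the first draw are assumptions made for this example, not part of the original slides.

```python
from fractions import Fraction


def probability(outcomes, is_target):
    """Return the proportion of outcomes for which is_target(outcome) is true."""
    outcomes = list(outcomes)
    matches = sum(1 for outcome in outcomes if is_target(outcome))
    return Fraction(matches, len(outcomes))


RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for suit in SUITS for rank in RANKS]  # 52 cards

JACK_OF_DIAMONDS = ("J", "diamonds")


def is_jack_of_diamonds(card):
    return card == JACK_OF_DIAMONDS


# Probability as a proportion of the whole deck.
p_king_of_hearts = probability(deck, lambda card: card == ("K", "hearts"))
p_ace = probability(deck, lambda card: card[0] == "A")

# First draw from the full deck.
p_first = probability(deck, is_jack_of_diamonds)

# Second draw WITHOUT replacement, after some other card was removed
# (the 2 of clubs is an arbitrary choice for this sketch).
deck_minus_other = [card for card in deck if card != ("2", "clubs")]
p_second_other_removed = probability(deck_minus_other, is_jack_of_diamonds)

# Second draw WITHOUT replacement, after the jack of diamonds was removed.
deck_minus_jack = [card for card in deck if card != JACK_OF_DIAMONDS]
p_second_jack_removed = probability(deck_minus_jack, is_jack_of_diamonds)

# Second draw WITH replacement: the deck is whole again, so nothing changes.
p_second_with_replacement = probability(deck, is_jack_of_diamonds)

print(p_king_of_hearts, p_ace, round(float(p_ace), 4))
print(p_first, p_second_other_removed, p_second_jack_removed,
      p_second_with_replacement)
```

Using Fraction keeps each probability in the same fraction form the slides use, so 4/52 reduces to 1/13 without rounding.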
Sampling with replacement
To keep the probabilities from changing from one selection to the next, it is necessary to return each selected individual to the population before you make the next selection. This is called sampling with replacement.

Types of random sampling
– Simple random sample
– Independent random sample
– Sampling with replacement
– Sampling without replacement
There are different sampling techniques used by researchers.

Probability and Frequency Distribution
In education, we are usually concerned with probability that will involve a population of scores that can be displayed in a frequency distribution graph. If the graph represents the entire population, then different portions of the graph represent different portions of the population.

Probability and Frequency Distribution (cont.)
Because probability and proportion are equivalent, a particular proportion of the graph corresponds to a particular probability in the population. Thus, whenever a population is presented in a frequency distribution graph, it will be possible to represent probabilities as proportions of the graph.

Example: N = 10 scores: 1, 1, 2, 3, 3, 4, 4, 4, 5, 6
If you take a random sample of n = 1 score from this population, what is the probability of obtaining a score greater than 4?
p(X > 4) = ?
Two scores (5 and 6) meet this criterion out of the total group of N = 10 scores, so p = 2/10.
We are now defining probability as a proportion of area in the frequency distribution graph, which is a very graphic and concrete way of representing probability.
What is the probability of selecting a score less than 5? p(X < 5) = ? What part of the graph is unshaded? p = 8/10

Figure 6.3 The normal distribution

Probability and the Normal Distribution
Note that the normal distribution is symmetrical, with:
– the highest frequency in the middle
– frequencies tapering off as you move toward the extremes
The normal shape can also be described by the proportions of area contained in each section of the distribution.

Probability and the Normal Distribution (cont.)
– Statisticians often identify sections of a normal distribution by using z-scores.
– Remember that z-scores measure positions in a distribution in terms of standard deviations from the mean.
– The graph shows the percentage of scores that fall in each of these sections.

Figure 6.4 The normal distribution following a z-score transformation

In this way it is possible to define a normal distribution in terms of its proportions. That is, a distribution is normal if and only if it has all the right proportions.
Note: Because the normal distribution is symmetrical, the sections on the left side of the distribution have exactly the same proportions as the corresponding sections on the right side.
Note: Because the locations in the distribution are identified by z-scores, the proportions shown in the figure apply to any normal distribution regardless of the values for the mean and the standard deviation. When any distribution is transformed into z-scores, the mean becomes zero and the standard deviation becomes one.

The process of answering probability questions about a normal distribution
Example 6.2
– Adult heights form a normal distribution with a mean of 68 inches and a standard deviation of 6 inches.
– Given this information about the population and the known proportions for a normal distribution, we can determine the probability associated with specific examples.

For example…
What is the probability of randomly selecting an individual from this population who is taller than 6 feet 8 inches (X = 80 inches)?
p(X > 80) = ?
1. The probability question is translated into a proportion question: Out of all possible adult heights, what proportion is greater than 80?
2. We know that "all possible adult heights" is simply the population distribution. The mean is μ = 68, so the score X = 80 is to the right of the mean. Because we are interested in all heights greater than 80, we shade in the area to the right of 80. This area represents the proportion we are trying to determine.
3. Identify the exact position of X = 80 by computing a z-score. For this example,
z = (X - μ) / σ = (80 - 68) / 6 = 12 / 6 = 2.00
A height of 80 is 2 standard deviations above the mean and corresponds to a z-score of +2.00.
4. The proportion we are trying to determine may now be expressed in terms of a z-score:
p(X > 80) = p(z > 2.00) = 2.28%
All normal distributions will have 2.28% of the scores in the tail beyond z = +2.00 (Figure 6.4).

Unit Normal Table
The graph of the normal distribution shows proportions for only a few selected z-score values. A more complete listing of z-scores and proportions is provided in the unit normal table. This table lists proportions of the normal distribution for a full range of possible z-score values.

Figure 6.6 A portion of the unit normal table. For z = 0.25, the body is 59.87% and the tail is 40.13%.

– Column A lists z-score values corresponding to different locations in a normal distribution.
– Columns B and C identify the proportion of the distribution in each of the two sections.
– Column B presents the proportion in the body (the larger portion).
– Column C presents the proportion in the tail.

When you use the unit normal table…
Keep in mind:
– The body corresponds to the larger part of the distribution (either right-hand or left-hand).
– The tail corresponds to the smaller part of the distribution.
– The proportions on the right-hand side are exactly the same as the corresponding proportions on the left-hand side.

When you use the unit normal table… (cont.)
– Proportions will always be positive (even if the z-score is negative).
– For any specific z-score value, the two proportions will always add up to 1.00 (the whole distribution).

Let's review…
– The unit normal table lists relationships between z-score locations and proportions in a normal distribution.
– For any z-score location, you can use the table to look up the corresponding proportions.
– If you know the proportions, you can use the table to look up the specific z-score location.
– Because we have defined probability as equivalent to proportion, you can also use the unit normal table to look up probabilities for normal distributions.

Example 6.3A
What proportion of the normal distribution corresponds to z-score values greater than z = 1.00?
– Shade the area you are trying to determine.
– Look up z = 1.00 in Column A.
– Read Column C for the proportion, which is 0.1587 or 15.87%.

Example 6.3B
For a normal distribution, what is the probability of selecting a z-score less than z = 1.50? This is the body (Column B): 93.32%

Example 6.3C
What is the proportion of the normal distribution that corresponds to the tail beyond z = -0.50? 30.85%

If you have the proportion, can you find the z-score?
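The table lookups in Example 6.2 and Examples 6.3A to 6.3C can be reproduced numerically. The sketch below uses NormalDist from Python's standard statistics module in place of the printed table; the helper names body_and_tail and z_score are assumptions made for this illustration, and the results are rounded to four decimals to match the table's precision.

```python
from statistics import NormalDist

# The unit normal distribution: mean 0, standard deviation 1.
UNIT_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mu, sigma):
    """Position of x in standard deviations from the mean."""
    return (x - mu) / sigma


def body_and_tail(z):
    """Return (body, tail) for a z-score, mirroring Columns B and C.

    The sign of z is ignored, because the proportions on the left side
    of the distribution equal those on the right side.
    """
    body = UNIT_NORMAL.cdf(abs(z))  # the larger section
    tail = 1.0 - body               # the smaller section
    return body, tail


# Figure 6.6: z = 0.25
body_025, tail_025 = body_and_tail(0.25)

# Example 6.3A: proportion greater than z = 1.00 (tail)
p_6_3a = body_and_tail(1.00)[1]

# Example 6.3B: proportion less than z = 1.50 (body)
p_6_3b = body_and_tail(1.50)[0]

# Example 6.3C: tail beyond z = -0.50
p_6_3c = body_and_tail(-0.50)[1]

# Example 6.2: heights with mean 68 and standard deviation 6, p(X > 80)
p_6_2 = body_and_tail(z_score(80, mu=68, sigma=6))[1]

for label, value in [("z=0.25 body", body_025), ("z=0.25 tail", tail_025),
                     ("6.3A", p_6_3a), ("6.3B", p_6_3b),
                     ("6.3C", p_6_3c), ("6.2", p_6_2)]:
    print(label, round(value, 4))
```

Because body and tail are computed from the same value, they always add up to 1.00, which is the property noted for Columns B and C.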
For a normal distribution, what z-score separates the top 10% from the remainder of the distribution?
– 10% = .1000: locate .1000 in Column C, or
– 90% = .9000: locate .9000 in Column B.
Choose the closest value in the table. For this case, it is 0.1003 in Column C, so z = +1.28. (Make sure to designate + or -.)

Example 6.4B
For a normal distribution, what z-score value forms the boundary between the top 60% and the bottom 40% of the scores?
– Column B (body): 0.6000
– Column C (tail): 0.4000
The boundary lies below the mean, so the z-score is negative: z = -0.25

Probabilities, Proportions, and Scores (X values)
In most situations, it will be necessary to find probabilities for specific X values:
– Transform the X values into z-scores.
– Use the unit normal table to look up the proportions corresponding to the z-score values.

Example 6.5
It is known that IQ scores form a normal distribution with μ = 100 and σ = 15. What is the probability of randomly selecting an individual with an IQ score greater than 130?
p(X > 130) = ?
We want to find the proportion of the IQ distribution that corresponds to scores greater than 130.
– Change the X value into a z-score. For X = 130:
z = (X - μ) / σ = (130 - 100) / 15 = 30 / 15 = 2.00
– Look up the z-score value in the unit normal table:
p(X > 130) = 0.0228 = 2.28%

Finding proportions/probabilities located between two scores
The next example demonstrates the process of finding the probability of selecting a score that is located between two specific values. We are now looking for a proportion defined by a slice from the middle of the normal distribution. The final answer does not correspond to either the body or the tail of the distribution, which means that you cannot read the answer directly from the table. Instead, you must use the information in the table to calculate the final answer.

Example 6.6
The distribution of SAT scores is normal with μ = 500 and σ = 100.
What is the probability of randomly selecting an individual with a score between X = 600 and X = 700? In other words, find p(600 < X < 700) = ?

Figure 6.10 The distribution for Example 6.6

Transform each X value into a z-score:
For X = 600: z = (X - μ) / σ = (600 - 500) / 100 = 100 / 100 = 1.00
For X = 700: z = (X - μ) / σ = (700 - 500) / 100 = 200 / 100 = 2.00
Now find the proportion of the normal distribution that is located between z = +1.00 and z = +2.00. We can approach the problem in one of two ways…

Approach 1
This method focuses on the proportions in the tail of the distribution.
– Use Column C to find the proportion in the tail beyond z = +1.00: 0.1587. This includes the shaded portion that we are trying to find, but it also includes an extra portion in the tail beyond z = +2.00.
– Use the table again to find the extra portion beyond z = +2.00: 0.0228.
– Now subtract the two: p(600 < X < 700) = 0.1587 - 0.0228 = 0.1359 = 13.59%

Approach 2
Find how much of the distribution is located outside the section we want to measure. We want the unshaded areas of the distribution:
– the tail beyond z = 2.00 (Column C): 0.0228
– the body below z = 1.00 (Column B): 0.8413
The total area that we do not want is 0.0228 + 0.8413 = 0.8641. Subtract this from 1.0000 (because the whole distribution is 1.0000): 1.0000 - 0.8641 = 0.1359 = 13.59%

Finding scores corresponding to specific proportions or probabilities
In the previous examples, the problem was to find the proportion or probability corresponding to specific X values. The two-step process is illustrated in Figure 6.11.

Figure 6.11 A map for probability problems

So far we have only described how to go clockwise in this process, from X values to z-scores to proportions. We can also go backwards to find the score that corresponds to a certain proportion.
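Both approaches to Example 6.6 can be verified with a few lines of Python. This is a sketch under the assumption that NormalDist from the standard statistics module stands in for the unit normal table; the variable names are chosen for this illustration only.

```python
from statistics import NormalDist

MU, SIGMA = 500, 100           # SAT distribution from Example 6.6
unit_normal = NormalDist()     # mean 0, standard deviation 1

# Step 1: transform each X value into a z-score.
z_low = (600 - MU) / SIGMA     # 1.00
z_high = (700 - MU) / SIGMA    # 2.00

# Approach 1: tail beyond z = 1.00 minus the extra tail beyond z = 2.00.
tail_beyond_low = 1.0 - unit_normal.cdf(z_low)
tail_beyond_high = 1.0 - unit_normal.cdf(z_high)
approach_1 = tail_beyond_low - tail_beyond_high

# Approach 2: remove everything outside the slice from the whole distribution.
body_below_low = unit_normal.cdf(z_low)
approach_2 = 1.0 - (tail_beyond_high + body_below_low)

# Cross-check directly on the SAT scale, without converting to z-scores.
sat = NormalDist(mu=MU, sigma=SIGMA)
direct = sat.cdf(700) - sat.cdf(600)

print(round(approach_1, 4), round(approach_2, 4), round(direct, 4))
```

All three values agree to four decimal places with the 0.1359 obtained from the table, which shows that the two approaches are the same subtraction arranged differently.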
Example 6.7
Scores on the SAT form a normal distribution with μ = 500 and σ = 100. What is the minimum score necessary to be in the top 15% of the SAT distribution?
Begin with 15% = .1500. We are looking for a score, so we go from proportion to X by way of z-scores.
1. Use the unit normal table to find the z-score that corresponds to a proportion of 0.15.
2. Look at the graphic. We will need to use Column C because the shaded area is the tail. The closest value in the table is 0.1492, and the z-score that corresponds to this proportion is z = 1.04.
3. Determine whether the z-score is positive or negative. The top 15% lies above the mean, so in this case z = +1.04.
4. Change the z-score into an X value using the z-score equation:
X = μ + zσ = 500 + 1.04(100) = 500 + 104 = 604
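The reverse route, from proportion to z-score to X, can be sketched in Python as well. NormalDist.inv_cdf from the standard statistics module plays the role of reading the table backwards; the helper names z_for_top_proportion and score_from_z are assumptions made for this illustration. The exact z-score is rounded to two decimals to imitate the table, which happens to reproduce the table values in these examples but is not guaranteed to match the "closest entry" rule in every case.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1


def z_for_top_proportion(proportion):
    """z-score that cuts off the given proportion in the upper part."""
    # The area BELOW the boundary is everything except the top proportion.
    return unit_normal.inv_cdf(1.0 - proportion)


def score_from_z(z, mu, sigma):
    """Convert a z-score back to an X value: X = mu + z * sigma."""
    return mu + z * sigma


# Example 6.7: minimum SAT score needed to be in the top 15%.
z_exact = z_for_top_proportion(0.15)
z_table = round(z_exact, 2)           # two decimals, like the table
x_min = score_from_z(z_table, mu=500, sigma=100)

# The two z-score lookups from the earlier examples.
z_top_10 = round(z_for_top_proportion(0.10), 2)   # top 10% boundary
z_top_60 = round(z_for_top_proportion(0.60), 2)   # top 60% / bottom 40%

print(z_table, round(x_min), z_top_10, z_top_60)
```

The sign of the z-score comes out of the calculation automatically: a boundary below the mean, as in the top 60% case, yields a negative value.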