Probability Distributions

What proportion of a group of kittens lies in any selected part of a pile of kittens?

Sometimes we want to know the chances that something will occur. For example:
1. What are the odds that I will win the lottery?
2. What are my chances of getting an A?
3. If a person is young, what are the chances that he or she will be in poverty?
4. What chances do poor people have of graduating from college?
To answer questions such as these, we turn to probability.

Probability: Out of all possible outcomes, the proportionate expectation of a given outcome. Values for statistical probability range from 0 (never) to 1 (always), or from a 0% chance to a 100% chance.

For example: 12 of 25 students in an engineering class are women. The probability that a randomly selected student in that engineering class will be a woman is 12/25 = .48, or 48%.

[Figure: bar chart of the engineering class by sex: 12 F, 13 M]

Grades in Statistics (32 students):
Grade   Students
F       3
D       5
C       12
B       7
A       5

What is the probability that a student will get a C in Statistics? 12/32 = .375
What about a C or higher? (12 + 7 + 5)/32 = 24/32 = .75
What is the probability that a person in the class got a grade? 32/32 = 1

Empirical probability distribution: All the outcomes in a distribution of research results and each of their probabilities (what actually happened). The probability distribution of a variable lists the possible outcomes together with their probabilities.

[Figure: the grade chart redrawn with proportions instead of counts, for example C = .375 and B = .219. The whole distribution has P = 1 (100% of cases); one highlighted portion has P = .25, or 25%.]

Empirical Rule

Many naturally occurring variables have bell-shaped distributions. That is, their histograms take a symmetrical and unimodal shape.
When this is true, the empirical rule will hold, at least approximately.

Empirical rule: If the histogram of data is approximately bell-shaped, then:
1. About 68% of the cases fall between Y-bar − 1 s.d. and Y-bar + 1 s.d.
2. About 95% of the cases fall between Y-bar − 2 s.d. and Y-bar + 2 s.d.
3. All or nearly all the cases fall between Y-bar − 3 s.d. and Y-bar + 3 s.d.

[Figure: bell curve (the "body pile") containing 100% of cases, with M = 100 and s.d. = 15. The axis runs 55, 70, 85, 100, 115, 130, 145, and brackets mark + or − 1 s.d., + or − 2 s.d., and + or − 3 s.d.]

The Normal Probability Distribution

A continuous probability distribution in which the horizontal axis represents all possible values of a variable and the vertical axis represents how likely those values are to occur (strictly, the probability density; probabilities are areas under the curve). Values are clustered around the mean in a symmetrical, unimodal pattern known as the bell-shaped curve or normal curve.

No matter what the actual s.d. (σ) value is, the proportion of cases under the curve that corresponds with the mean (μ) +/- 1 s.d. is the same (68%). The same is true of the mean +/- 2 s.d. (95%) and the mean +/- 3 s.d. (almost all cases).

Because all normal distributions are equivalent in this way, they are often described in terms of the standard normal curve, where the mean = 0 and the s.d. = 1. Values on this curve are called "z".

Z = # of standard deviations away from the mean

[Figure: standard normal curve with z running from -3 to 3; 68% of cases lie between z = -1 and z = 1]

Converting to z-scores

To compare different normal curves, it is helpful to know how to convert data values into z-scores. It is like having two rulers beneath each normal curve: one for data values, the second for z-scores.

IQ example (μ = 100, σ = 15):
IQ values:   55   70   85   100   115   130   145
Z-scores:    -3   -2   -1     0     1     2     3

Z = (Y − μ) / σ

Z = (100 − 100) / 15 = 0
Z = (145 − 100) / 15 = 45/15 = 3
Z = (70 − 100) / 15 = -30/15 = -2
Z = (105 − 100) / 15 = 5/15 = .33

Engagement Ring Example: The mean cost of an engagement ring is $500, and the standard deviation is $100.

Ring values:   200   300   400   500   600   700   800
Z-scores:       -3    -2    -1     0     1     2     3

Z = (500 − 500) / 100 = 0
Z = (200 − 500) / 100 = -300/100 = -3
Z = (550 − 500) / 100 = 50/100 = .5
Z = (600 − 500) / 100 = 100/100 = 1

Now, use the empirical rule. What percentage of people will be above or below my preferred ring price of $300?

A $300 ring is at z = -2. About 95% of cases fall within 2 s.d. of the mean, which leaves about 2.5% in each tail. So about 2.5% of rings cost less than $300 and about 97.5% cost more.

[Figure: ring-price curve with μ = $500 and σ = $100; 68% of cases between z = -1 and z = 1, and 2.5% in each tail beyond z = -2 and z = 2]

Comparing two distributions by Z-score

Imagine that your partner didn't get you a ring, but took you on a trip to express their love for you. You could convert the trip's price into a ring price using z-scores. Your trip cost $2,000. The average "love trip" costs $1,500 with a s.d. of $250. What is the equivalent ring price?
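The conversions above can be sketched in a few lines of Python. This is only an illustration: the function names `z_score`, `from_z`, and `share_below` are made up for this sketch and do not come from the slides. `share_below` uses the standard normal curve directly, so it gives a slightly more exact answer than the rounded empirical rule.

```python
import math

def z_score(y, mean, sd):
    """Number of standard deviations that y lies from the mean."""
    return (y - mean) / sd

def from_z(z, mean, sd):
    """Data value that sits z standard deviations from the mean."""
    return mean + z * sd

def share_below(z):
    """Proportion of a normal distribution that lies below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# IQ example from the slides: mean 100, s.d. 15
print(z_score(145, 100, 15))   # 3.0

# Trip versus ring: find the trip's z-score, then place it on the ring ruler
trip_z = z_score(2000, 1500, 250)           # 2.0
ring_equivalent = from_z(trip_z, 500, 100)  # 700.0
print(trip_z, ring_equivalent)

# Share of rings below the $300 ring (z = -2)
print(round(share_below(z_score(300, 500, 100)), 3))  # 0.023
```

The $2,000 trip sits 2 s.d. above the average trip, so it corresponds to a $700 ring. The exact share of rings below $300 is about 2.3%, which the empirical rule rounds to 2.5%.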
[Figure: ring and trip price curves lined up on a shared z-score ruler]
Z-scores:     -3     -2     -1      0      1      2      3
Rings ($):   200    300    400    500    600    700    800
Trips ($):   750   1000   1250   1500   1750   2000   2250

What percentage of people got a trip that cost less than yours?

What about ACT versus SAT scores?

Z-scores:   -3    -2    -1     0      1      2      3
ACT:        15    18    21    24     27     30     33
SAT:       400   600   800  1000   1200   1400   1600

NOTE: This is a helpful process, but it can be illogical at times. Remember that you are comparing scores on a "population base," that is, the percent of people above or below each score. Is it logical to compare an SAT score to self-esteem this way? No.

How to use a z-score table (I could use some z z z z's)

F-N&L-G Appendix B reports the areas under the normal curve. The table gives you the percent of values above, below, or between particular z-scores (# of s.d.s away from the mean).
1. The left column is z (out to two decimals).
2. The second column is the area (proportion of the distribution) from the mean to z.
3. The right column is the area (proportion of the distribution) from z to the end of the tail.
The table can also be worked in reverse to find z-scores. Other tables use different layouts, and online tools give answers automatically without a table.

Theoretical probability distribution: The proportion of times we would expect to get a particular outcome in a large number of trials (what would happen if we had the time to observe it).

Q: Why are these important?
A: Sociologists usually get only one chance to draw a sample from a population.
Therefore, if we know what kind of variation in measurement we would see if we repeatedly sampled (theoretically), we can judge the chance that the numbers produced by our one sample are accurate (this will make sense later).

For example: Let's say the mean GPA at SJSU is 2.5. Randomly take 100 SJSU students' GPAs. Record the sample's mean. Now, take 100 more SJSU students' GPAs. Record that mean. Now, repeat the above. Record again. Now, lather, rinse, repeat. Again. Again. And on and on. What might you see?

About 50% of samples would have a mean GPA greater than 2.5.

[Figure: plot of sample means on a GPA axis running from 1.3 to 3.9; each mark is one sample's mean, and the marks are centered on the overall mean of 2.5]
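The repeated-sampling thought experiment can be simulated. The sketch below rests on assumptions the slides do not state: GPAs are treated as normally distributed with an s.d. of 0.5, the 0 to 4 GPA bounds are ignored, and the experiment is repeated 1,000 times. Only the mean of 2.5 and the sample size of 100 come from the example.

```python
import random
import statistics

POP_MEAN = 2.5       # overall mean GPA, from the example
POP_SD = 0.5         # assumed spread of GPAs; not given in the slides
SAMPLE_SIZE = 100    # students per sample, from the example
NUM_SAMPLES = 1000   # assumed number of repetitions

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean():
    """Draw one random sample of GPAs and return its mean."""
    gpas = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    return statistics.mean(gpas)

# Lather, rinse, repeat: record the mean of each sample
sample_means = [sample_mean() for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.mean(sample_means)
share_above = sum(m > POP_MEAN for m in sample_means) / NUM_SAMPLES

print(f"Mean of the sample means: {mean_of_means:.3f}")
print(f"Share of samples with a mean above 2.5: {share_above:.2f}")
```

Under these assumptions the sample means pile up around 2.5, with roughly half above and half below, which is the pattern the example describes.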