Okun PSY 230
STUDY GUIDE NUMBER 5
Probability, Sampling, and Sampling Distributions

1. What is inferential statistics?
Inferential statistics involves testing hypotheses about a population parameter based upon a sample statistic.

I. Probability

2. What role does probability play in inferential statistics?
Statisticians compute the probability of obtaining whatever difference is observed between the hypothesized value of the population parameter and the sample statistic. If this probability is sufficiently low, then they will reject the hypothesis with which they started.

3. How can probability be defined and calculated?
The probability of a given event, P(event), is determined by dividing the number of outcomes in which the event occurs [NEO] by the total number of outcomes [NTO]:
P(event) = NEO/NTO
Probability represents a proportion, that is, the number of event outcomes divided by the total number of outcomes.

Deck of Playing Cards
Spades     Ace King Queen Jack 10 9 8 7 6 5 4 3 2
Hearts     Ace King Queen Jack 10 9 8 7 6 5 4 3 2
Clubs      Ace King Queen Jack 10 9 8 7 6 5 4 3 2
Diamonds   Ace King Queen Jack 10 9 8 7 6 5 4 3 2

4. How can the probability of either of two events be defined and calculated?
The probability of either of two events is
P(A or B) = P(A) + P(B) - P(A and B)
If Event A and Event B are mutually exclusive, i.e., they cannot occur simultaneously, then this formula simplifies to
P(A or B) = P(A) + P(B)

5. How is the absolute reduction in risk calculated?
Absolute reduction in risk is the difference in the proportion of cases in two groups. Typically, the smaller proportion is subtracted from the larger proportion.

6. How is relative risk calculated? What does it tell us?
Relative risk is calculated by dividing the proportion of cases in one group by the proportion of cases in the other group. Typically, the larger proportion is divided by the smaller proportion.
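As a minimal sketch of the calculations in questions 3 through 6, the Python below works through P(event) = NEO/NTO and the either-of-two-events rule for a standard 52-card deck, then computes an absolute reduction in risk and a relative risk. The group proportions (20 of 100 cases in the control group, 10 of 100 in the intervention group) are hypothetical numbers chosen for illustration, and the function name `probability` is not from the guide.

```python
from fractions import Fraction

def probability(num_event_outcomes, num_total_outcomes):
    """P(event) = NEO / NTO, kept as an exact fraction."""
    return Fraction(num_event_outcomes, num_total_outcomes)

# A standard deck: 52 cards, 13 per suit, 4 kings.
p_heart = probability(13, 52)
p_spade = probability(13, 52)
p_king = probability(4, 52)
p_king_of_hearts = probability(1, 52)   # the one card that is both

# Not mutually exclusive: subtract the overlap, P(A and B).
p_heart_or_king = p_heart + p_king - p_king_of_hearts

# Mutually exclusive: a card cannot be a heart and a spade at once.
p_heart_or_spade = p_heart + p_spade

# Hypothetical proportions of cases in each group (assumed for illustration).
control = Fraction(20, 100)
intervention = Fraction(10, 100)

# Smaller proportion subtracted from the larger one.
absolute_risk_reduction = control - intervention
# Larger proportion divided by the smaller one.
relative_risk = control / intervention

print("P(heart or king)  =", p_heart_or_king)
print("P(heart or spade) =", p_heart_or_spade)
print("Absolute reduction in risk =", float(absolute_risk_reduction))
print("Relative risk =", float(relative_risk))
```

With these assumed proportions, the absolute reduction in risk is .10 and the relative risk is 2.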
Relative risk tells us how much more likely a person in the control group is to have a condition than a person in the intervention group.

7. What does the base rate tell us?
The base rate is the proportion of people in the control group who have the condition.

II. Sampling

8. What role does sampling play in inferential statistics?
Sampling plays a key role in inferential statistics. Whether or not a sample provides a sound basis for making an inference about a population parameter from a sample statistic depends on the method by which the sample is selected and recruited.

9. What is a census? What is a sample? What is sampling?
When a researcher uses a census, every element in the population is included in the study. In practice, researchers rarely collect data from every element in the population. Rather, data are collected from a subset or portion of all elements, called a sample. Sampling is a process of selecting and recruiting a subset of all of the elements that constitute a population.

10. What is the definition of a random sample?
A sample is random if each element in the population has an equal chance of being selected.

11. How can a random sample be distinguished from a stratified (modified) random sample? What are the advantage and disadvantage of each type of sample?
Stratified (modified) random sampling involves sorting elements of a population according to a characteristic and making separate lists of the elements of the population. Then the same number of elements is drawn randomly from each list.

12. What is systematic sampling error?
Systematic errors can occur when the sampling is not done on a completely random basis. Flaws in sampling can be critical because nonrandom samples can result in statistics that provide biased estimates of population parameters.

III. Sampling Distributions

13. How is a sampling distribution created?
The sampling distribution of a statistic is created by drawing all possible samples of a given size from a population and each time computing the sample statistic.

14. What does sampling error refer to?
Sampling error is the discrepancy, or amount of error, between a sample statistic and its corresponding population parameter. With respect to a proportion, sampling error refers to the amount on average that we should expect a sample proportion to deviate by chance from the population proportion.

15. What role does a sampling distribution play in inferential statistics?
Sampling distributions enable us to determine the amount of sampling error that we should expect when we draw a single random sample of a given size and have specified a value of $\pi$.

16. What obstacle does a statistician have to overcome in order to use inferential statistics?

Distinguishing among a Population, a Sample, and a Distribution of Sample Proportions

A population consists of all elements that meet the criterion established by the researcher (say, all registered Democrats in the U.S.). There is only 1 value for the population proportion, $\pi$. In inferential statistics, we start with a hypothesized specific value of $\pi$, typically that $\pi$ = .50. For example, we might hypothesize that the proportion of Democrats in the population who intend to vote for Senator Joe Biden (versus Senator Christopher Dodd) in the Democratic Presidential primary = .50. We don’t know the actual value of $\pi$, but we want to determine whether it is plausible that the population proportion of Democrats who will vote for Biden = .50.

A sample consists of a subset of the elements of our population, say 300 registered Democratic voters in the U.S. There are many possible samples of 300 cases that can be drawn from the population of registered Democratic voters in the U.S.
In inferential statistics, we have only 1 sample of n cases and want to test whether our hypothesized value of $\pi$ is plausible given that our single sample proportion, $p_{BIDEN}$, equals, say, .55.

A distribution of sample proportions (also called the sampling distribution of the proportion) consists of the sample proportions (or ps) computed from all possible samples of a given size (say 300) of registered Democratic voters. The distribution of sample proportions is needed for us to estimate, assuming our hypothesized value of $\pi$ (say .50) is true, how likely or unlikely it is for us to obtain a sample proportion that deviates from $\pi$ by, say, 5 percentage points.

It is impractical for researchers to create the distribution of sample proportions for two reasons: (1) the researcher would need to collect data from the population to compute the sample proportion for all possible samples of a given size; (2) the computations involved would be very labor intensive. To engage in inferential statistics, statisticians had to overcome this obstacle by figuring out how to estimate the variability in the distribution of sample proportions (sampling error) when the researcher has data from a single sample.
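Because drawing all possible samples is impractical, a computer simulation can stand in for the full distribution of sample proportions. The sketch below is only an approximation under stated assumptions: it treats the population as so large that each sampled voter independently favors the candidate with probability $\pi$ = .50, and it draws 2,000 random samples of n = 300 rather than every possible sample. The function name `simulate_success_counts` and the seed are hypothetical choices for illustration.

```python
import random
import statistics

def simulate_success_counts(pi, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return the number of
    'successes' in each one. Assumes each case is an independent draw
    with probability pi (a very large population)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < pi)
        counts.append(successes)
    return counts

n = 300
counts = simulate_success_counts(pi=0.50, n=n, num_samples=2000)
proportions = [c / n for c in counts]

# Center and spread of the simulated distribution of sample proportions.
mean_p = statistics.mean(proportions)
sd_p = statistics.pstdev(proportions)

# Share of samples at least 5 percentage points away from .50.
# 5 points of 300 cases is 15 cases; comparing integer counts avoids
# floating-point rounding at the boundary.
share_far = sum(1 for c in counts if abs(c - 150) >= 15) / len(counts)

print(f"mean of sample proportions: {mean_p:.4f}")
print(f"standard deviation of sample proportions: {sd_p:.4f}")
print(f"share deviating by 5+ points: {share_far:.3f}")
```

The mean of the simulated proportions should land close to the hypothesized $\pi$, and `share_far` gives a rough sense of how unusual a sample proportion such as .55 would be if $\pi$ really were .50.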
Illustrating the Process of Creating a Sampling Distribution for a Proportion

Deck of Cards

SAMPLE   # of Red Cards Drawn   # in a Sample of 5 Cards   Proportion of Red Cards
1        ____________________   ________________________   _______________________
2        ____________________   ________________________   _______________________
3        ____________________   ________________________   _______________________
4        ____________________   ________________________   _______________________
5        ____________________   ________________________   _______________________
6        ____________________   ________________________   _______________________
7        ____________________   ________________________   _______________________
8        ____________________   ________________________   _______________________
9        ____________________   ________________________   _______________________
10       ____________________   ________________________   _______________________

17. What does the population standard deviation of a proportion for a variable with two categories equal?
$\sigma_x = \sqrt{\pi(1-\pi)}$
If $\pi$ is hypothesized to be .50, then $\sigma_x = \sqrt{(.50)(.50)} = .50$.
If $\pi$ is hypothesized to be .75, then $\sigma_x = \sqrt{(.75)(.25)} = .43$.

18. How can $\sigma_p$, the standard deviation of the distribution of sample proportions, be computed?
$\sigma_p = \sqrt{\pi(1-\pi)} / \sqrt{n} = \sqrt{\pi(1-\pi)/n}$
If $\pi$ is hypothesized to be .50, and n = 300, then
$\sigma_p = \sqrt{(.50)(.50)/300} = .0289$

19. How does $\sigma_x$ differ from $\sigma_p$?
For $\sigma_x$, observations consist of the categories that each individual in a population is assigned to (e.g., male or female). For $\sigma_p$, observations consist of all possible sample proportions for a designated value of $\pi$ and a particular sample size (n).

20. How can we interpret the meaning of $\sigma_p$?
$\sigma_p$ is also known as the standard error of the proportion. The standard error of the proportion represents an approximate measure of how far, on average, sample proportions are located from $\pi$. $\sigma_p$ is an index of the amount of sampling error.

21. What factors affect $\sigma_p$?
How does the size of $\sigma_p$ change as (a) the deviation of $\pi$ from .50 increases; and (b) sample size (n) increases?
As the deviation of $\pi$ from .50 increases and as n increases, $\sigma_p$ decreases.
If $\pi$ is hypothesized to be .75, and n = 300, then
$\sigma_p = \sqrt{(.75)(.25)/300} = .025$
If $\pi$ is hypothesized to be .50, and n = 500, then
$\sigma_p = \sqrt{(.50)(.50)/500} = .0224$
By increasing n, a researcher can decrease the size of $\sigma_p$, which is desirable.

22. How can the sampling distribution of the proportion be created given (a) a value of $\pi$; and (b) a sample size (n)?
[Figure: Sampling Distribution of p when $\pi$ = .50 and n = 300]
[Figure: Sampling Distribution of p when $\pi$ = .50 and n = 500]

23. What are the 3 rules for constructing the sampling distribution of proportions?
1. The mean of the distribution of sample proportions will equal the population proportion ($\pi$).
2. The standard deviation of the distribution of sample proportions ($\sigma_p$) will equal $\sqrt{\pi(1-\pi)/n}$.
3. If n is equal to or greater than 100, the shape of the distribution of sample proportions will be normal.
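The standard deviation values worked out in this guide can be checked with a short calculation. The sketch below simply evaluates the two formulas, the population standard deviation of a two-category variable and the standard error of the proportion; the function names are hypothetical labels chosen for illustration.

```python
import math

def population_sd(pi):
    """Population standard deviation of a two-category variable:
    sqrt(pi * (1 - pi))."""
    return math.sqrt(pi * (1 - pi))

def standard_error_of_proportion(pi, n):
    """Standard deviation of the distribution of sample proportions:
    sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

# Values used in the guide's worked examples.
print(f"{population_sd(0.50):.2f}")                      # 0.50
print(f"{population_sd(0.75):.2f}")                      # 0.43
print(f"{standard_error_of_proportion(0.50, 300):.4f}")  # 0.0289
print(f"{standard_error_of_proportion(0.75, 300):.4f}")  # 0.0250
print(f"{standard_error_of_proportion(0.50, 500):.4f}")  # 0.0224
```

Comparing the last three lines shows both effects described in question 21: moving the hypothesized proportion away from .50 and increasing n each make the standard error smaller.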