Statistics in Psychology: Midterm II Notes Sheet

Chapter 7: The Distribution of Sample Means

The location of a score in a sample or a population can be represented with a z-score. However, researchers typically want to study entire samples instead of single scores within said samples. This is because samples provide an estimate of the population. The problem with this is that samples often provide incomplete pictures of the population. This is known as sampling error – the natural discrepancy (or amount of error) between a sample statistic and its corresponding population parameter. Importantly, sampling error does not indicate that a mistake was made. Samples, by nature, vary – two samples are very rarely identical (due to variance between individuals and outside factors that are often impossible to fully account for). Two separate samples will likely be different even if they are taken from the same population.

Distribution of Sample Means: the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population.
● You can think of it as a normal distribution, only instead of using scores from one sample, it uses means from numerous different samples.

Characteristics of distributions of sample means:
● The sample means should pile up around the population mean.
● The distribution of sample means is approximately normal in shape.
● The larger the sample size, the closer the sample means should be to the population mean.

“All possible random samples for a given population” is, obviously, a very large number, to the point where it’s impossible to expect to collect data from all of them. However, it is possible to determine exactly what the distribution of sample means looks like without taking hundreds or thousands of samples.

Central Limit Theorem is used to specify the shape, central tendency, and variability of the distribution of sample means.
It has a few specific rules, or characteristics:

Rule 1: Distribution of sample means approaches a normal distribution as n approaches infinity.
● The distribution of sample means is almost perfectly normal in either of 2 conditions:
- The population from which the samples are selected is a normal distribution
- The number of scores in each sample is relatively large – at least 30

Rule 2: Distribution of sample means for samples of size n will have a mean of μM; and μM = μ.
- μM refers to the mean of the distribution of sample means, while μ refers to the mean of the original population.
- This means that the mean of the distribution of sample means is equal to the mean of the population.

Rule 3: Distribution of sample means for samples of size n will have a standard deviation (σM) of:
σM = σ/√n OR σM = √(σ²/n)

Standard Error of M
● Variability of a distribution of scores is measured by the standard deviation.
● Variability of a distribution of sample means is measured by the standard deviation of the sample means. This is called the standard error of M, and is represented by the symbol σM.
● Standard error of M is the standard deviation of the distribution of sample means – when it is large, the means are widely scattered.
○ It provides a measure of how much distance is expected on average between M and μ.

Law of Large Numbers: The larger the sample size (n), the more likely the sample mean is to be closer to the population mean.

[Figure with three panels illustrating this concept]
(a) is a distribution representing the original population of IQ scores. In it, you can see that μ = 100 and σ = 12.
(b) is a random sample of n = 16 scores selected from this original population. Averaging it all out, you’ll notice that the mean of this sample is 98.75. While close, it does not match the original population exactly.
(c) is the distribution of sample means for every possible random sample of n = 16.
By the law of large numbers, taking every possible sample averages out the sampling error, so the mean of the resulting distribution matches that of the original population: μM = μ = 100. With σ = 12 and n = 16, the standard error is σM = 12/√16 = 3.

z-Scores and Probability for Sample Means
The primary use for the distribution of sample means is to find the probability of selecting a sample with a specific mean. Proportions of the normal curve are used to represent probabilities. A z-score for the sample mean is computed. The sign of the z-score tells us where the sample mean is located relative to the population mean – a positive value is above the mean, a negative value is below. The number itself tells us the distance between the sample mean and the population mean in units of the standard error (the standard deviation of the distribution of sample means).
- To make it easier, you can consider the standard error as the unit by which we’re measuring the distance.

Z-SCORE FORMULA: z = (M - μ)/σM

Example: What z-score separates the top 10% from the remainder of the distribution?
To find this z-score, locate the row in the unit normal table that has 0.1000 in column C (proportion in the tail) or 0.9000 in column B (proportion in the body). The closest entry is z = 1.28, so z = +1.28 separates the top 10% from the rest of the distribution. By symmetry, z = -1.28 separates the bottom 10%, and together z = -1.28 and z = +1.28 mark off the middle 80% of the distribution.

NOTE: if anyone is confused about the unit normal table, leave a comment or text me so I can add an explanation and make it clearer

More about Standard Error:
● There will usually be a discrepancy between a sample mean and a true population mean, which is referred to as sampling error.
● The amount of sampling error varies across samples.
● The variability of sampling error is measured by the standard error of the mean.

Journals vary in how they refer to standard error. Often, however, they use:
- SE
- SEM
Standard error is often reported in a table along with n and M for the different groups in an experiment. It may also be added to a bar or line graph.

Exercises:
1. A population has μ = 60 with σ = 6; the mean of the distribution of sample means for samples of size n = 4 selected from this population would have an expected value of:
a. 5
b. 60
c. 3
d. 15
2. A population has μ = 60 with σ = 6; the standard deviation for the distribution of sample means for samples of size n = 4 selected from this population would have an expected value of:
a. 5
b. 60
c. 3
d. 15
3. The shape of a distribution of sample means is always normal: True/False
4. Which of the following is not a characteristic of the distribution of sample means?
a. The sample means should pile up around the population mean.
b. The distribution of sample means is approximately normal in shape.
c. The mean of the distribution of sample means is different from the mean of the population.
d. The larger the sample size, the closer the sample means should be to the population mean.
5. As sample size increases, the value of the standard error decreases: True/False
6. A random sample of n = 16 scores is obtained from a population with μ = 50 and σ = 8. If the sample mean is M = 58, the z-score corresponding to the sample mean is:
a. z = 1.00
b. z = 2.00
c. z = 4.00
d. Cannot be determined
7. A sample mean corresponding to z = 3.00 on the distribution of means is a fairly typical, representative sample: True/False
8. The mean of the sample is unlikely to be equal to the population mean: True/False

Chapter 8: Introduction to Hypothesis Testing

Definition of Hypothesis Testing: A statistical method that uses sample data to evaluate a hypothesis about a population.
→ Used to test predictions about characteristics of a population

From the point of view of the hypothesis test, the entire population receives the treatment and then a sample is selected from the treated population. In the actual research study, however, a sample is selected from the original population and the treatment is administered to the sample. From either perspective, the result is a treated sample which represents the treated population.

Four steps of a Hypothesis Test:
1. State the hypotheses
2. Set the criteria for a decision
3. Collect data & compute sample statistics
4. Make a decision

Step 1: State the hypotheses
There are 2 types of hypotheses:
● Null Hypothesis (H0): The treatment has no effect – there is no change, whether positive or negative, meaning there is no difference or relationship.
○ The independent variable (treatment) has no effect on the dependent variable.
● Alternative Hypothesis (H1): The treatment has an effect – there is a change, whether positive or negative, meaning there is a difference or relationship.
○ The independent variable (treatment) has an effect on the dependent variable.

Step 2: Set the criteria for a decision
Distribution of sample means is divided into 2 sections/categories:
- Those that are likely to be obtained if H0 is true.
- Those that are very unlikely to be obtained if H0 is true.
Alpha level (aka. Level of significance): A probability value used to define the concept of “very unlikely” in a hypothesis test.
Critical region: Consists of the extreme sample values that are “very unlikely” (as defined by the alpha level) to be obtained if the null hypothesis is true. Boundaries for the critical region(s) are determined by the probability set by the alpha level. A sample mean that falls in the critical region is very unlikely to occur if H0 is true, so it counts as evidence against H0.

Step 3: Collect data & compute sample statistics
Data is always collected after Steps 1 & 2:
● The hypotheses are stated
● The criteria for a decision have been established
This sequence ensures objectivity – an honest, objective evaluation of the data. If the data is collected before stating the hypotheses, the probability of rejecting a true null hypothesis becomes greater.
After collecting data, you compute the sample statistic (z-score) to show the exact position of the sample on the distribution of means.
z = (M - μ)/σM

Step 4: Make a decision
The obtained z-score value is used to make a decision. There are 2 possible outcomes:
● If the sample data is located in the critical region (i.e., the region of values that are very unlikely if the null hypothesis is true, as defined by the alpha level), then the correct decision is to reject the null hypothesis. This means the data provide evidence that the treatment has an effect.
● If the sample data is not located in the critical region, then the data does not provide strong evidence that the null hypothesis is wrong. Therefore, the correct decision is to fail to reject the null hypothesis, concluding that the treatment does not appear to have an effect. (Note that this does not prove the null hypothesis true.)

Hypothesis testing is an inferential process.
● Inferential process means it uses limited information from a sample to make a statistical decision, which then is used to make a general conclusion about the population. In other words, you start with a small sample and extrapolate the information taken from this sample onto the larger population from which it was drawn.

This process, however, is not infallible – errors are still possible. There are 2 types of errors:
● Type 1 Error: Rejecting a null hypothesis which is actually true – The researcher concludes that a treatment has an effect when in reality it has none. This typically happens when, by chance, the sample is extreme and unrepresentative of the population; extraneous variables that were not accounted for can also produce a misleading difference between the sample that received the treatment and the sample that did not. The alpha level is the probability of making a Type 1 error.
○ Example: A researcher is studying a new treatment for OCD. In this study, he concludes that the treatment is effective at treating OCD, when in actuality the treatment had no effect – the difference in the data between the treated and untreated samples was caused by other factors. When this researcher rejects H0 (which is true in this case), this is a Type 1 error.
● Type 2 Error: The researcher fails to reject a false null hypothesis – The researcher concludes that a treatment had no effect when in reality it did. This is more likely when the statistical power is low (read below for a more in-depth explanation of statistical power).
○ Example: A researcher is studying the effect of a new form of therapy for people with ADHD. In this study, he concludes that the therapy has no effect on people with ADHD, when in reality the therapy did have an effect. When the researcher then fails to reject H0 (which in this case is false), this is a Type 2 error.
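The four steps and the two error types above can be sketched in Python. This is only an illustrative sketch, not part of the course material: the function names `z_test` and `type1_error_rate` are made up for these notes, the first example reuses the numbers from Chapter 7 exercise 6 (μ = 50, σ = 8, n = 16, M = 58), and the simulation settings (2,000 trials, seed 1) are arbitrary assumptions.

```python
import random
from statistics import NormalDist, mean


def z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Two-tailed z test. Returns (z, critical z, reject H0?)."""
    standard_error = sigma / n ** 0.5              # sigma_M = sigma / sqrt(n)
    z = (sample_mean - mu) / standard_error        # z = (M - mu) / sigma_M
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # boundary of the critical region
    return z, z_crit, abs(z) > z_crit


# Numbers from Chapter 7, exercise 6 (assumed two-tailed, alpha = .05)
z, z_crit, reject = z_test(58, mu=50, sigma=8, n=16)
print(f"z = {z:.2f}, critical z = ±{z_crit:.2f}, reject H0: {reject}")


def type1_error_rate(trials=2000, mu=50, sigma=8, n=16, alpha=0.05, seed=1):
    """Draw samples from a population where H0 is TRUE and count wrong rejections."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        _, _, rejected = z_test(mean(sample), mu, sigma, n, alpha)
        rejections += rejected                     # True counts as 1
    return rejections / trials


print(f"Simulated Type 1 error rate: {type1_error_rate():.3f}")
```

Because every simulated sample comes from a population where H0 is true, each rejection in the simulation is a Type 1 error, so the printed rate should land near the alpha level of .05.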
Table of possible outcomes (for clarity):

Researcher’s Decision | H0 is true (treatment has no effect) | H0 is false (treatment has an effect)
Reject H0 | Type 1 error (rejected a true null hypothesis) | Correct decision
Fail to reject H0 | Correct decision | Type 2 error (failed to reject a false null hypothesis)

A result is statistically significant if it is very unlikely to occur when H0 is true.
To report a statistically significant result in APA format:
● Report that you found a significant effect
● Report the value of your test statistic
● Report the p-value (probability) of your test statistic

Factors that influence a hypothesis test:
● Size of the difference between the sample mean and the original population mean
○ Larger discrepancies lead to larger z-scores.
● Variability of the scores
○ More variability → larger standard error (note that the standard error is the standard deviation of the distribution of sample means, σM = σ/√n, so it grows as σ grows)
● Number of scores in the sample
○ Larger n → smaller standard error

Assumptions for hypothesis tests with z-scores:
● Random sampling (meaning the sample was picked at random with no bias)
● Independent observations (meaning the data points are not affected by each other)
● Value of σ is unchanged by the treatment
● Normal sampling distribution
○ Note that the unit normal table can only be used if the distribution of sample means is normal.

Directional (One-tailed) hypothesis testing:
● The standard hypothesis testing procedure is referred to as a two-tailed (nondirectional) test because the critical region is divided between the 2 tails (ends) of the distribution.
● However, sometimes the researcher has a specific prediction about the direction of the treatment effect.
● When a specific direction of the treatment effect can be predicted, it can be incorporated into the hypotheses.
● In a directional (one-tailed) hypothesis test, the researcher specifies either an increase or a decrease in the population mean as a result of the treatment.
○ e.g. Predicting that a treatment has an effect would be considered a two-tailed test, whereas predicting that it will have a positive (or negative) effect specifically would be considered a one-tailed test.
In a one-tailed test, the alternative hypothesis refers to the specific prediction made by the researcher, whereas the null hypothesis refers to everything that falls outside this specific prediction.
● EXAMPLE: Let’s say a researcher is testing the effect of a new therapy for people with ODD (Oppositional-Defiant Disorder). The researcher predicts that the therapy will reduce levels of defiance in people with ODD.
○ The statistic he is measuring is the mean level of defiance – his prediction is that the therapy will have a negative effect on (reduce) the mean level.
● In this case, H0 does not just mean the treatment had no effect – it means the treatment had an effect equal to or greater than 0 (no change or an increase in defiance).
○ Therefore, if the therapy is shown to have a significant negative effect on the mean level of defiance (meaning the patients showed less defiance after the therapy was administered), then the null hypothesis (that the treatment would have a positive effect (or no effect) on the mean) is rejected.
● In this case, H1 refers to the specific prediction made by the researcher. Since the prediction is that the treatment will have a negative effect on the mean, H1 means that the treatment had an effect less than 0 (DOES NOT include an effect equal to 0 – this would be part of the null hypothesis).
○ Therefore, if the treatment is shown to have a positive effect on the mean level of defiance (the patients showed more defiance after the therapy was administered), the data do not support the alternative hypothesis (that the treatment would have a negative effect on the mean), and the researcher fails to reject the null hypothesis.
● A one-tailed test allows for rejecting H0 when there is a relatively small difference in the specified direction, whereas a two-tailed test requires a relatively large difference regardless of the direction.
● Note that in general, two-tailed (nondirectional) tests should be used unless there is a strong justification for a directional prediction.

Effect Size: The absolute magnitude of a treatment effect, independent of sample size.
Cohen’s d measures the effect size in a simple, standardized way.
● Cohen’s d = mean difference/standard deviation = (μtreatment - μno treatment)/σ
Effect size is grouped into 3 categories:
● d = 0.20 → small effect
● d = 0.50 → medium effect
● d = 0.80 → large effect

Statistical Power: The probability of correctly rejecting a false null hypothesis.
● Power = 1 - β
○ β (beta) is the probability of a Type 2 error.
Power is usually estimated before starting a study, and requires several assumptions about factors that influence power.
Factors which increase power:
● Increased effect size
● Larger sample sizes
● One-tailed tests increase power relative to two-tailed tests
Factors which decrease power:
● Reducing the alpha level
● Using a two-tailed test (relative to a one-tailed test)

Exercises:
1. A sports coach is investigating the effect of a new training method. What would H0 be?
a. The new training program produces different results to the existing one.
b. The new training program produces results similar to the existing one.
c. The new training program produces results better than the existing one.
d. There is no way to predict the results of the new training program.
2. T/F: If the alpha level is decreased, the size of the critical region decreases
3. T/F: The critical region defines unlikely values if the null hypothesis is true
4. In what order are the steps of a hypothesis test done?
a. Set the criteria for a decision → Collect the data & compute sample statistics → State the hypotheses → Make a decision
b. Collect the data & compute sample statistics → State the hypotheses → Set the criteria for a decision → Make a decision
c. State the hypotheses → Set the criteria for a decision → Collect the data & compute sample statistics → Make a decision
d. Make a decision → Collect the data & compute sample statistics → State the hypotheses → Set the criteria for a decision
5. T/F: When the z-score is extreme, it shows the null hypothesis is false.
6. T/F: A decision to retain the null hypothesis means you proved it is true.
7. A researcher is studying the effect of a new treatment for ADHD. His null hypothesis states it has no effect, whereas his alternative hypothesis states it does have an effect. If he rejects the null hypothesis when it is actually true, this is:
a. Type 1 error
b. Type 2 error
8. A result is statistically significant if:
a. It is very unlikely to occur if H0 is false.
b. It is very likely to occur if H0 is true.
c. It falls outside the critical region.
d. It is very unlikely to occur if H0 is true.
9. A researcher is predicting that a treatment will decrease scores. If this treatment is evaluated using a directional hypothesis test, then the critical region for the test:
a. Would be entirely in the right-hand tail of the distribution.
b. Would be entirely in the left-hand tail of the distribution.
c. Would be divided equally between the two tails of the distribution.
d. Cannot be identified without knowing the alpha level.
10. The power of a statistical test is the probability of:
a. Rejecting a true null hypothesis
b. Supporting a true null hypothesis
c. Rejecting a false null hypothesis
d. Supporting a false null hypothesis
11. T/F: Cohen’s d is used because alone, a hypothesis test does not measure the size of the treatment effect.
12. T/F: Lowering the alpha level from .05 to .01 will increase the power of a statistical test.
13. A researcher is doing a one-tailed test on the effect of a treatment for depression on patients.
His null hypothesis states the treatment will either decrease or have no effect on the scores of the patients. In this case, the alternative hypothesis states:
a. The treatment will either increase or have no effect on the scores of the patients.
b. The treatment will decrease the scores of the patients.
c. The treatment will increase the scores of the patients.
d. The treatment will have no effect on the scores of the patients.
13.1. After collecting the data, the researcher measures the magnitude of the effect of the treatment on the patients. The mean of scores of the sample who did not get the treatment was 9. The mean of scores of the sample who did get the treatment was 6. The standard deviation is 4. The magnitude of the effect is:
a. -0.75
b. 3
c. 0.75
d. -3
13.2. The effect size is:
a. Small
b. Large
c. Medium
d. Cannot be determined
13.3. Based on the resulting effect size, as well as the prediction made by the researcher, which is the correct conclusion and decision?
a. The treatment increased the scores of the patients – reject the alternative hypothesis.
b. The treatment increased the scores of the patients – reject the null hypothesis.
c. The treatment decreased the scores of the patients – reject the alternative hypothesis.
d. The treatment decreased the scores of the patients – reject the null hypothesis.

Chapter 8 Answer Key:
1. A sports coach is investigating the effect of a new training method. What would H0 be?
a. The new training program produces different results to the existing one.
b. The new training program produces results similar to the existing one.
c. The new training program produces results better than the existing one.
d. There is no way to predict the results of the new training program.
H0 refers to the null hypothesis, which states that the treatment has no effect.
2. If the alpha level is decreased, the size of the critical region decreases: True/False
3. The critical region defines unlikely values if the null hypothesis is true: True/False
4. In what order are the steps of a hypothesis test done?
a. Set the criteria for a decision → Collect the data & compute sample statistics → State the hypotheses → Make a decision
b. Collect the data & compute sample statistics → State the hypotheses → Set the criteria for a decision → Make a decision
c. State the hypotheses → Set the criteria for a decision → Collect the data & compute sample statistics → Make a decision
d. Make a decision → Collect the data & compute sample statistics → State the hypotheses → Set the criteria for a decision
5. When the z-score is extreme, it shows the null hypothesis is false: True/False
6. A decision to retain the null hypothesis means you proved it is true: True/False
7. A researcher is studying the effect of a new treatment for ADHD. His null hypothesis states it has no effect, whereas his alternative hypothesis states it does have an effect. If he rejects the null hypothesis when it is actually true, this is:
a. Type 1 error
b. Type 2 error
8. A result is statistically significant if:
a. It is very unlikely to occur if H0 is false.
b. It is very likely to occur if H0 is true.
c. It falls outside the critical region.
d. It is very unlikely to occur if H0 is true.
9. A researcher is predicting that a treatment will decrease scores. If this treatment is evaluated using a directional hypothesis test, then the critical region for the test:
a. Would be entirely in the right-hand tail of the distribution.
b. Would be entirely in the left-hand tail of the distribution.
c. Would be divided equally between the two tails of the distribution.
d. Cannot be identified without knowing the alpha level.
10. The power of a statistical test is the probability of:
a. Rejecting a true null hypothesis
b. Supporting a true null hypothesis
c. Rejecting a false null hypothesis
d. Supporting a false null hypothesis
11. Cohen’s d is used because alone, a hypothesis test does not measure the size of the treatment effect: True/False
12. Lowering the alpha level from .05 to .01 will increase the power of a statistical test: True/False
13. A researcher is doing a one-tailed test on the effect of a treatment for depression on patients. His null hypothesis states the treatment will either decrease or have no effect on the scores of the patients. In this case, the alternative hypothesis states:
a. The treatment will either increase or have no effect on the scores of the patients.
b. The treatment will decrease the scores of the patients.
c. The treatment will increase the scores of the patients.
d. The treatment will have no effect on the scores of the patients.
For a one-tailed test, you can think of the alternative and null hypotheses as “opposites” – the null hypothesis is everything that isn’t in the alternative hypothesis and vice versa. The null hypothesis says the treatment will decrease or have no effect on scores. Therefore, the alternative hypothesis says the treatment will increase the scores.
13.1. After collecting the data, the researcher measures the magnitude of the effect of the treatment on the patients. The mean of scores of the sample who did not get the treatment was 9. The mean of scores of the sample who did get the treatment was 6. The standard deviation is 4. The magnitude of the effect is:
a. -0.75
b. 3
c. 0.75
d. -3
Cohen’s d = mean difference/standard deviation = (μtreatment - μno treatment)/σ
Cohen’s d = (6 - 9)/4 = -3/4 = -0.75
13.2. The effect size is:
a. Small
b. Large
c. Medium
d. Cannot be determined
Effect size is grouped into 3 categories:
● d = 0.20 → small effect
● d = 0.50 → medium effect
● d = 0.80 → large effect
The sign of d only shows the direction of the effect; compare its magnitude (0.75) with these benchmarks.
13.3. Based on the resulting effect size, as well as the prediction made by the researcher, which is the correct conclusion and decision?
a. The treatment increased the scores of the patients – reject the alternative hypothesis.
b. The treatment increased the scores of the patients – reject the null hypothesis.
c. The treatment decreased the scores of the patients – reject the alternative hypothesis.
d. The treatment decreased the scores of the patients – reject the null hypothesis.
The effect size shows how exactly the treatment affected the scores of the population. Since the effect size is negative, the treatment decreased the scores. The alternative hypothesis stated that the treatment would increase the scores of the population – therefore, the correct decision is to reject it (in formal terms, the researcher fails to reject the null hypothesis).

Chapter 9: Introduction to the t Statistic

The problem with the z-score is that it requires more information than researchers typically have available.
- To find the z-score, you need the standard deviation of the population (in order to compute the standard error)
Typically, however, researchers only have the sample data.

The t Statistic:
● An alternative to the z-score (can be considered an “approximate” z-score)
● The estimated standard error (SM) is used as an estimate of the actual standard error σM when the value of σ is unknown.
○ This value is computed from the sample variance or sample standard deviation – it provides an estimate of the standard distance between a sample mean M and the population mean μ.
The estimated standard error formula uses s² to estimate σ².
SM = s/√n OR SM = √(s²/n)
The t statistic uses the estimated standard error in place of σM:
t = (M - μ)/SM
Note that the t statistic relies on an estimate of the standard error, so it is not as precise as the z-score.

The t Distribution:
● Is a “family” of distributions, one for each possible number of degrees of freedom
● Approximates the shape of a normal z-score distribution (but is not exactly the same)
○ Flatter than the normal z-score distribution
○ More spread out than the normal z-score distribution
○ More variability (“fatter tails”) in the t distribution
● Due to these differences, we use the table of values of t instead of the unit normal table for hypothesis tests.
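As a rough illustration of the formulas above, the sketch below computes the estimated standard error and the t statistic for a small sample. The nine scores and the hypothesized population mean of 23 are invented for this example (they are not from the course material), the helper name `t_statistic` is hypothetical, and the code assumes the usual degrees of freedom for a single sample, df = n - 1.

```python
from statistics import mean, stdev


def t_statistic(sample, mu):
    """Return (M, s, S_M, t, df) for a one-sample t statistic."""
    n = len(sample)
    m = mean(sample)          # sample mean M
    s = stdev(sample)         # sample standard deviation (divides by n - 1)
    s_m = s / n ** 0.5        # estimated standard error: S_M = s / sqrt(n)
    t = (m - mu) / s_m        # t = (M - mu) / S_M
    return m, s, s_m, t, n - 1  # assumed: df = n - 1


# Made-up scores for illustration only
scores = [15, 17, 19, 19, 20, 21, 21, 23, 25]
m, s, s_m, t, df = t_statistic(scores, mu=23)
print(f"M = {m:.2f}, s = {s:.2f}, S_M = {s_m:.2f}, t({df}) = {t:.2f}")
```

For these made-up scores the sum of squared deviations is 72, so s² = 72/8 = 9, s = 3, SM = 3/√9 = 1, and t(8) = (20 - 23)/1 = -3.00.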
There are a few methods of hypothesis testing using the t-statistic:
● One-sample t test:
○ Comparing a sample’s mean with the population mean
● Independent samples t test (t test for 2 independent samples)
○ Comparing means of two samples
● The paired (dependent) samples t test (or t test for 2 related samples)
○ Comparing two different scores of the same sample

One-sample t test statistic:
t = (sample mean - population mean)/estimated standard error = (M - μ)/SM
If the null hypothesis is true, t is expected to be close to 0.

There are 4 steps to using the t statistic for hypothesis testing:
1. State the null and alternative hypotheses and select an alpha level
2. Locate the critical region using the t distribution table and the value for df
3. Calculate the t test statistic
4. Make a decision regarding H0

[Figure: example of the critical region in the t distribution, assuming α = .05 and df = 8]

Assumptions of the t Test:
● The values in the sample must consist of independent observations (i.e. the data points do not affect each other)
● The population sampled must be normal
○ When the sample size is relatively large, this assumption can be violated without affecting the validity of the hypothesis test
○ With sample sizes equal to or greater than 30, the shape of the distribution of means is approximately normal (regardless of the shape of the distribution of scores in the population)

Reporting the results of a t Test:
1. Report whether or not the test was significant
- Significant → H0 rejected
- Not significant → Fail to reject H0
2. Report the t statistic value (including df)
- E.g. t(8) = -2.67
3. Report significance level, either:
- p < alpha, e.g. p < .05 OR
- If known, exact probability (e.g. p = .029)

Directional hypotheses and one-tailed tests:
● Non-directional (two-tailed) tests are most commonly used.
○ It is assumed that the test is non-directional unless stated otherwise
● Directional tests may be used for particular research situations (e.g., exploratory investigations or pilot studies).
○ If used, directional hypotheses must be stated/reported.
● The same four steps of the hypothesis test are carried out; the main difference is that the critical region is located in just one tail of the t distribution.
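
To see how placing the critical region in one tail changes the decision, here is a small sketch. It assumes df = 8 and α = .05, with critical values copied from a standard t table (2.306 for a two-tailed test, 1.860 for a one-tailed test); the function name `reject_h0` and the example value t = -2.00 are hypothetical and not from the course material.

```python
# Critical values for df = 8 and alpha = .05, copied from a standard t table.
T_CRIT_TWO_TAILED = 2.306   # alpha split between both tails (.025 in each)
T_CRIT_ONE_TAILED = 1.860   # all of alpha placed in a single tail


def reject_h0(t, direction=None):
    """Decide whether t falls in the critical region.

    direction=None       -> two-tailed (nondirectional) test
    direction="decrease" -> one-tailed test, critical region in the left tail
    direction="increase" -> one-tailed test, critical region in the right tail
    """
    if direction is None:
        return abs(t) > T_CRIT_TWO_TAILED
    if direction == "decrease":
        return t < -T_CRIT_ONE_TAILED
    if direction == "increase":
        return t > T_CRIT_ONE_TAILED
    raise ValueError("direction must be None, 'decrease' or 'increase'")


t = -2.00  # hypothetical t statistic with df = 8
for label, direction in [("two-tailed", None),
                         ("one-tailed (decrease)", "decrease"),
                         ("one-tailed (increase)", "increase")]:
    decision = "reject H0" if reject_h0(t, direction) else "fail to reject H0"
    print(f"{label}: t(8) = {t:.2f} -> {decision}")
```

With t = -2.00, only the one-tailed test that predicted a decrease rejects H0: the same difference is not extreme enough for the two-tailed boundary of ±2.306, and it is in the wrong tail for a predicted increase.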