Intro to Hypothesis Tests – Stat 120, Spring 2023
Activity 1: Building intuition
Background: Evidence suggests that reward systems may operate in the opposite way from what is intended
(e.g., ranking systems may decrease productivity; grading systems may not stimulate learning). To investigate
this phenomenon, T. M. Amabile (1985) designed a study to explore whether motivation type (intrinsic or
extrinsic) affects creativity scores. In this study, 47 creative writers were randomly assigned to one of
two questionnaires on which they ranked their reasons for writing. One questionnaire listed intrinsic motivations and
the other listed extrinsic motivations. After completing the questionnaire, all subjects wrote a haiku about
laughter, which was graded for creativity by a panel of poets. The average rating (out of 40 points) for each
subject was recorded.
In this activity, you will explore whether there are discernible differences in creativity between the two groups.
GROUP QUESTIONS
1. Is this an experimental or observational study? How do you know?
2. What research question is being investigated? That is, what is the objective of this study?
3. Write down the competing claims (i.e., hypotheses) being investigated.
4. Identify the response and explanatory variables.
5. What type of plot could you use to compare the two treatment groups? Why did you choose this plot
type? How does this plot help you investigate the competing claims?
INDIVIDUAL QUESTIONS
Please do not discuss your answers with your group until you start question #8.
Below is a lineup of plots where one plot displays the observed data and the other 19 plots are “decoy” plots
where there is no difference between the treatment groups.
[Figure: lineup of 20 numbered plots, each comparing Score (vertical axis, roughly 10 to 30) across Treatment (Extrinsic vs. Intrinsic).]
6. Which plot do you think is the most different from the others?
7. What feature(s) of the plot led you to this choice?
GROUP QUESTIONS
8. Which plot does your group think is the most different from the others?
9. What feature(s) of the plot led you to this choice?
The observed data are in plot # ______.
10. Did your group choose the observed (data) plot?
11. Based on your answer to question 10, what does this suggest about your competing claims?
STOP HERE! We will have a large group discussion sharing the results and then the plot that displays the
observed data will be revealed.
Activity 1 Part 2: Permutation test
Rather than using a “lineup” of plots, we will construct a randomization distribution to represent the
behavior of the test statistic when the null hypothesis is in fact true.
1. Suppose we wrote each response on a separate card. How could we use these cards to
simulate a new sample under the null hypothesis?
2. Create a randomization distribution using the StatKey app:
• Open a web browser and go to http://lock5stat.com/statkey.
• Click on the appropriate link to conduct your simulation (Test for difference in means).
• Go to https://aloy.rbind.io/data/CreativeWriting.csv.
• In StatKey, click on the button that says Edit Data.
• Highlight all of the data in the window and press Delete.
• Copy the data and paste it into the Edit Data window. Click Ok.
3. What is the value of the test statistic? This is given in the Original Sample section in the top right.
4. Click “Generate 1000 samples” to create a randomization distribution with 1000 samples. Describe
what the plot of the randomization distribution represents. Note: If you hover over a dot, the lower
right section shows you the corresponding sample.
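The card-shuffling idea behind a randomization distribution can be sketched in a few lines of Python. This is a minimal sketch, not StatKey's actual code: it assumes the scores for each treatment group are already stored in two Python lists, and the function names `diff_in_means` and `randomization_distribution` are hypothetical. Each repetition pools all scores, shuffles them, deals them back into groups of the original sizes, and records the difference in means.

```python
import random
from statistics import mean


def diff_in_means(group1, group2):
    """Test statistic: mean of group1 minus mean of group2."""
    return mean(group1) - mean(group2)


def randomization_distribution(group1, group2, reps=1000, seed=None):
    """Simulate `reps` differences in means assuming H0 is true.

    Under H0 the treatment labels don't matter, so every score is a card
    in one pooled deck. Each repetition shuffles the deck and deals the
    first len(group1) cards to group 1 and the rest to group 2.
    """
    rng = random.Random(seed)          # seeded generator for reproducibility
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    stats = []
    for _ in range(reps):
        rng.shuffle(pooled)            # shuffle the cards in place
        stats.append(diff_in_means(pooled[:n1], pooled[n1:]))
    return stats
```

A dotplot of the returned list plays the role of StatKey's plot. Note that the sign of each statistic depends on which group is passed first, so the order should match the order StatKey uses for the observed statistic.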
5. Where is the plot centered? Why does this make sense?
6. Where does the observed sample statistic fall in the plot?
7. Does the observed test statistic appear to be rare? What does this indicate about the null hypothesis?
8. In StatKey, calculate the proportion of simulated statistics that are at least as far out in the tail as
your observed test statistic:
• Check the “Left Tail” checkbox (this direction is determined by Ha).
• Click on the box under the x-axis and enter the value of the observed test statistic.
• The proportion will display in the text box above the x-axis.
Because the simulation assumes the null hypothesis to be true, we sometimes refer to the distribution
you simulated as the null distribution. When you determine the proportion of the results in the null
distribution that are at least as extreme as the observed results, you are estimating a probability. We
will refer to this probability as the p-value. We use the p-value to evaluate the strength of evidence
against the null hypothesis.
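Estimating a p-value from a simulated null distribution comes down to counting. The helper below is a hypothetical sketch, assuming the simulated statistics are stored in a Python list; it returns the proportion of simulated statistics at least as far out in the chosen tail as the observed statistic.

```python
def estimate_p_value(null_stats, observed, tail="left"):
    """Proportion of simulated statistics at least as extreme as `observed`.

    tail="left":  count statistics <= observed (Ha: parameter < 0)
    tail="right": count statistics >= observed (Ha: parameter > 0)
    """
    if tail == "left":
        extreme = sum(1 for s in null_stats if s <= observed)
    elif tail == "right":
        extreme = sum(1 for s in null_stats if s >= observed)
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return extreme / len(null_stats)
```

For example, `estimate_p_value([-2, -1, 0, 1, 2], -1)` counts the two values -2 and -1 and returns 0.4. With 1000 simulated statistics, each one contributes 0.001 to the estimate.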
Activity 2:
Shargorodsky et al. (2010) compared the current hearing loss rate among teens (12-19 years) to previous
years to see whether teen hearing loss is increasing, possibly due to heavier use of ear buds. They collected
hearing loss data from 1771 teens in the NHANES 2005-6 study (333 with some level of hearing loss) and
2928 teens from NHANES III (1988-1994), with 480 showing some level of hearing loss. Is there a discernible
difference between these two groups?
1. Identify the two populations and the variable being considered in this study.
2. Create a two-way (contingency) table displaying the results from the NHANES studies. Let the columns
represent the time span and the rows represent the level of hearing loss (some vs. no hearing loss).
3. Write a sentence or two comparing the distributions of hearing loss between these two studies. Be sure
to report an appropriate statistic.
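The table and sample proportions follow directly from the counts given above. This short sketch uses only those reported counts (the variable names are illustrative); the "no hearing loss" row is each total minus the "some hearing loss" count.

```python
# Counts reported in the activity (Shargorodsky et al., 2010)
some_94, n_94 = 480, 2928   # NHANES III (1988-1994)
some_06, n_06 = 333, 1771   # NHANES 2005-2006

# Two-way table: rows are hearing-loss level, columns are time span
table = {
    "Some hearing loss": (some_94, some_06),
    "No hearing loss": (n_94 - some_94, n_06 - some_06),
    "Total": (n_94, n_06),
}
print(f"{'':<18}{'1988-1994':>10}{'2005-2006':>10}")
for row, (c94, c06) in table.items():
    print(f"{row:<18}{c94:>10}{c06:>10}")

# Sample proportions with some hearing loss, and their difference (94 minus 06)
phat_94 = some_94 / n_94
phat_06 = some_06 / n_06
print(f"phat_94 = {phat_94:.3f}, phat_06 = {phat_06:.3f}, "
      f"difference = {phat_94 - phat_06:.3f}")
```

Proportions, rather than raw counts, are the natural statistic here because the two samples have different sizes.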
4. Do these data convince you that there is a difference in the population proportions? If not, what could
be another explanation for the difference you see in these numerical and graphical summaries for these
two samples?
5. Let p94 represent the proportion of all American teenagers in 1994 with at least some hearing loss, and
let p06 represent the corresponding proportion in 2006. Define the parameter of interest to be p94 − p06, the
difference in the population proportions between these two years. State appropriate null and alternative
hypotheses about this parameter to reflect the researchers’ conjecture that hearing loss among teens is becoming more prevalent.
6. Create a randomization distribution using http://lock5stat.com/statkey:
• Click on the appropriate link to conduct your simulation (Test for difference in proportions).
• In StatKey, click on the button that says Edit Data.
• Enter the count and sample size for each group and press Ok.
Sketch a plot of the randomization distribution below.
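For a difference in proportions, the same card-shuffling idea applies with each teen as a card marked 1 (some hearing loss) or 0 (no hearing loss). This is a minimal sketch in the spirit of StatKey's simulation, not its actual code; the function name `randomization_diff_props` and the seed value are arbitrary choices for illustration.

```python
import random


def randomization_diff_props(some1, n1, some2, n2, reps=1000, seed=None):
    """Null distribution of phat1 - phat2 built by reshuffling group labels.

    Under H0 the time period doesn't matter, so all cards are pooled,
    shuffled, and dealt back into groups of sizes n1 and n2.
    """
    rng = random.Random(seed)
    cards = [1] * (some1 + some2) + [0] * (n1 + n2 - some1 - some2)
    stats = []
    for _ in range(reps):
        rng.shuffle(cards)
        phat1 = sum(cards[:n1]) / n1   # proportion of 1s dealt to group 1
        phat2 = sum(cards[n1:]) / n2   # proportion of 1s dealt to group 2
        stats.append(phat1 - phat2)
    return stats


# Group 1 = NHANES III (1988-1994), group 2 = NHANES 2005-2006
null_stats = randomization_diff_props(480, 2928, 333, 1771, reps=1000, seed=120)
observed = 480 / 2928 - 333 / 1771   # observed phat_94 - phat_06
```

Counting how many entries of `null_stats` lie at or beyond `observed` in the tail specified by Ha gives an estimated p-value, just as StatKey does.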
7. Describe what the plot of the randomization distribution represents.
8. Where is the plot centered? Why does this make sense?
9. Where does the observed sample statistic fall in the plot?
10. Based on your answer to the previous question, would you expect to see a small or large p-value?
11. Use StatKey to calculate the p-value and report it. What information does the p-value tell us?
12. Provide an answer to the research question in context.