What to Look for in Published Data

What hypotheses are being tested?
Are the hypotheses about a population mean, population proportion or
some other population characteristic?
Does the validity of the test depend on any assumptions about the
population from which the sample was selected? If so, are the
assumptions reasonable?
What is the P-value associated with the test? Was a significance level
selected for the test (as opposed to simply reporting the P-value)? Is the
chosen significance level reasonable?
Is the reported interval a confidence interval? If the reported interval is
not a confidence interval, you may want to construct a confidence
interval from the given information.
What confidence level is associated with the given interval? What does
the confidence level say about the long-run error rate of the method used
to construct the interval?
Is the reported interval relatively narrow or relatively wide? Has the
population characteristic been estimated precisely?
Are the variables of interest categorical or numerical?
Are the data in the article given in the form of a frequency table?
If a contingency table is involved, is the question of interest one of
homogeneity or one of independence?
What is the value of the test statistic? Should the null hypothesis be rejected?
Are the conclusions of the authors consistent with the results of the test?
Does the result have practical significance as well as statistical
significance?
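The question about the long-run error rate can be made concrete with a small simulation. The sketch below is illustrative only: the population mean, standard deviation, sample size, and the function name coverage_rate are hypothetical choices that do not come from any article, and it uses a z interval with a known standard deviation so that no external libraries are needed. It draws many random samples, builds a 95% interval from each, and counts how often the interval captures the true mean.

```python
import random
from statistics import NormalDist, mean


def coverage_rate(true_mean=5.0, sigma=2.0, n=40, trials=2000,
                  level=0.95, seed=1):
    """Fraction of simulated z intervals that capture the true mean.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # z critical value for the requested confidence level (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"Intervals capturing the true mean: {rate:.3f}")
print(f"Estimated long-run error rate:     {1 - rate:.3f}")
```

The proportion of intervals that miss the true mean should settle near 5%, which is what a 95% confidence level says about the method, not about any single interval.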
Example 1
Consider the following statement (based on information in the article
“Serum Transferrin Receptor for the Detection of Iron Deficiency in
Pregnancy,” American Journal of Clinical Nutrition [1991]: 1077 – 1081):
“In a total sample of 176 pregnant women, mean serum receptor
concentration did not differ significantly from 5.63, the mean for women
who are not pregnant (P>0.10).”
The statement does not indicate what test was performed or what the
value of the test statistic was. It appears that the hypotheses of interest
are $H_0: \mu = 5.63$ versus $H_a: \mu \neq 5.63$, where $\mu$ represents the true mean
serum receptor concentration for pregnant women. Because the sample
size is large, the one-sample t test would be appropriate if the sample
can be considered a random sample. With the large sample size, no
assumptions about the shape of the population distribution of serum
receptor concentration values are necessary. Because the reported
P-value is so large (P-value > 0.10), there is no reason to reject $H_0$. We
cannot conclude that the mean for pregnant women differs from the
known mean of 5.63 for women who are not pregnant.
Example 2
Consider the article “Increased Vital and Total Lung Capacity in Tibetan
Compared to Han Residents of Lhasa” (American Journal of Physical
Anthropology [1991]: 341 – 351) that compared various physical
characteristics of people who live at high altitudes with those of people
who live at sea level. The article includes the following statements: “We
studied 38 Tibetan and 43 Han residents….The Tibetan compared with
the Han subjects had a larger total lung capacity
6.80  0.19 (mean  SEM) vs. 6.24  0.18 liters .” SEM means standard error of
the mean not margin of error.
The report intervals are of the form estimate  standard error . We can use
this information to construct a confidence interval for the mean total
lung capacity for residents of each of the two locations. Because the
sample sizes are both large, we can use the t confidence interval formula
mean ± (t critical value)(standard error of the mean)
The Tibetan sample has df = 38 – 1 = 37, and
the Han sample has df = 43 – 1 = 42.
The 95% confidence intervals are
Tibetan residents: 6.80 ± 2.03(0.19) = (6.41, 7.19)
Han residents: 6.24 ± 2.02(0.18) = (5.88, 6.60)
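The arithmetic behind these two intervals can be checked with a few lines of Python. This is a minimal sketch: the helper name t_interval is hypothetical, and the t critical values 2.03 and 2.02 are simply the table values quoted above rather than values computed by the code.

```python
def t_interval(sample_mean, sem, t_crit):
    """Return (lower, upper) for mean ± t_crit * SEM."""
    half_width = t_crit * sem
    return sample_mean - half_width, sample_mean + half_width


# Means and SEMs as reported in the article; t critical values from a t table.
tibetan = t_interval(6.80, 0.19, 2.03)  # df = 37
han = t_interval(6.24, 0.18, 2.02)      # df = 42

for label, (lower, upper) in (("Tibetan", tibetan), ("Han", han)):
    print(f"{label} residents: ({lower:.2f}, {upper:.2f}) liters")
```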
Assuming that it is reasonable to view these samples as random
samples, we can interpret these intervals as follows: Based on the
information provided in the sample of Tibetan residents, we can be 95%
confident that the mean total lung capacity of Tibetan residents is
between 6.41 and 7.19 liters, and based on the information provided in the
sample of Han residents, we can be 95% confident that the mean total
lung capacity of Han residents is between 5.88 and 6.60 liters. These intervals
are not very narrow, indicating that the value of the population mean has
not been estimated as precisely as we might like in either case. This is
not surprising, given the reported sample sizes and the variability in each
sample. Note that the two intervals overlap. This may cause us to be
skeptical of the statement that Tibetans have a higher total lung capacity
than Han residents. Formal methods for directly comparing two groups
(2-sample t interval) should be used to further investigate this issue.
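As one way to carry out that follow-up, the sketch below builds a 2-sample t interval for the difference in means from the reported SEMs. It rests on assumptions that are not in the article: the two samples are treated as independent random samples, the degrees of freedom use the Welch approximation, and the critical value 1.99 is an assumed t-table value for roughly 78 df that is hard-coded rather than computed.

```python
from math import sqrt

# Reported values: mean and SEM of total lung capacity (liters), sample sizes.
mean_tibetan, sem_tibetan, n_tibetan = 6.80, 0.19, 38
mean_han, sem_han, n_han = 6.24, 0.18, 43

diff = mean_tibetan - mean_han

# Standard error of the difference between two independent sample means.
se_diff = sqrt(sem_tibetan ** 2 + sem_han ** 2)

# Welch approximation to the degrees of freedom.
df = (sem_tibetan ** 2 + sem_han ** 2) ** 2 / (
    sem_tibetan ** 4 / (n_tibetan - 1) + sem_han ** 4 / (n_han - 1)
)

T_CRIT = 1.99  # assumed 95% t-table value for about 78 df
lower = diff - T_CRIT * se_diff
upper = diff + T_CRIT * se_diff

print(f"difference = {diff:.2f}, SE = {se_diff:.3f}, df = {df:.1f}")
print(f"95% interval for the difference: ({lower:.2f}, {upper:.2f}) liters")
```

Under these assumptions the interval for the difference runs from roughly 0.04 to 1.08 liters and lies entirely above 0. Overlap between two separate intervals therefore does not, by itself, settle whether the two means differ.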
Example 3
Consider a study reported in the article “The Relationship Between
Distress and Delight in Males’ and Females’ Reaction to Frightening
Films” (Human Communication Research [1991]: 625 – 637). The
investigation measured emotional responses of 50 males and 60 females
after the subjects viewed a segment from a horror film. The article
included the following statement: “Females were much more likely to
express distress than were males. While males did express higher levels
of delight than females, the difference was not statistically significant.”
The following summary information was also contained in the article:
Gender     Distress Index         Delight Index
           Mean      SD           Mean      SD
Males      31.2      10.0         12.02     3.65
Females    40.4      9.1          9.09      5.55
           P-value < .001         Not significant (P-value > .05)
The P-values are the only evidence of the hypothesis tests that support
the given conclusions. The P-value<.001 for the distress index means
that the hypothesis $H_0: \mu_F - \mu_M = 0$ was rejected in favor of $H_a: \mu_F - \mu_M > 0$,
where $\mu_F$ and $\mu_M$ are the mean index values for females and males.
The nonsignificant P-value (P-value>.05) reported for the delight index
means that the hypothesis $H_0: \mu_F - \mu_M = 0$ could not be rejected. Chance
sample-to-sample variability is a plausible explanation for the observed
difference in sample means (12.02 – 9.09). Thus we would not want to
put much emphasis on the authors’ statement that males express higher
levels of delight than females, because it is based only on the fact that
12.02>9.09, which could plausibly be due entirely to chance.
The article describes the samples as consisting of undergraduates
selected from the student body of a large Midwestern university. The
authors extrapolate their results to American men and women in general.
If this type of generalization is considered unreasonable, we could be
more conservative and view the sampled populations as male and female
university students or even male and female students at this university.
The comparison of males and females was based on two independently
selected groups (not paired). Because the sample sizes were large, the
2-sample t test for means could reasonably have been used, and this would
have required no specific assumptions about the two underlying
populations.
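Because the article does not report its test statistics, one way to check the reported P-values is to compute 2-sample t statistics from the summary table. The sketch below assumes several things the article does not state: that the tabled SD values are sample standard deviations, that all 50 males and 60 females contributed to both indices, and that an unpooled 2-sample t statistic is the relevant one. The function name two_sample_t is hypothetical.

```python
from math import sqrt


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled 2-sample t statistic for mean1 - mean2."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


# Females minus males, using the summary values in the table.
t_distress = two_sample_t(40.4, 9.1, 60, 31.2, 10.0, 50)
t_delight = two_sample_t(9.09, 5.55, 60, 12.02, 3.65, 50)

print(f"distress index: t = {t_distress:.2f}")
print(f"delight index:  t = {t_delight:.2f}")
```

Under these assumptions the distress statistic is about 5.0, which agrees with the reported P-value < .001. The delight statistic computed the same way is about -3.3, larger in magnitude than a P-value > .05 would suggest. Because the article does not say which test was run, this is a reason to check the original table and method, not evidence that the authors made an error.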
Example 4
The authors of the article “Predicting Professional Sports Game Outcomes from
Intermediate Game Scores” (Chance [1992]: 18-22) used a chi-square test to determine
whether there was any merit to the idea that basketball games are not settled until the
last quarter, whereas baseball games are over by the seventh inning. They also
considered football and hockey. Data were collected for 189 basketball games, 92
baseball games, 80 hockey games, and 93 football games. The analyzed games were
sampled randomly from all games played during the 1990 season for baseball and
football and for the 1990 – 1991 season for basketball and hockey. For each game, the
late-game leader was determined, and then it was noted whether the late-game leader
actually ended up winning the game. The resulting data are summarized in the
following table.
Sport         Late-Game Leader Wins    Late-Game Leader Loses
Basketball    150                      39
Baseball      86                       6
Hockey        65                       15
Football      72                       21
The authors stated that the “late-game leader is defined as the team that is ahead after
three quarters in basketball and football, two periods in hockey, and seven innings in
baseball. The chi-square value (with three degrees of freedom) is 10.52 (P<.015).” They
also concluded that “the sport that is an anomaly is baseball. Only 6.5% of baseball
games resulted in late reversals…. [The chi-square test] is statistically significant due
almost entirely to baseball.”
In this particular analysis, the authors are comparing four populations (games from
each of the four sports) on the basis of a categorical variable with two categories
(late-game leader wins and late-game leader loses). The appropriate null hypothesis is then
$H_0$: The true proportion in each category (leader wins, leader loses) is the same for all four sports.
Based on the reported value of the chi-square statistic and the associated P-value, this
null hypothesis is rejected, leading to the conclusion that the category proportions are
not the same for all four sports.
The validity of the chi-square test requires that the sample sizes be large enough so that
no expected counts are less than 5. The smallest expected count is 14.27, so the
sample sizes are large enough to justify the use of the $X^2$ test. Note also that baseball
contributes a total of 8.042 to the value of the $X^2$ statistic of 10.518. This is due to the
large discrepancies between the observed and expected counts in the two baseball cells. This
is probably the basis for the authors’ conclusion that baseball is the only anomaly and
that the other sports were similar.
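The expected counts, the value of the test statistic, and each sport's contribution can be reproduced from the table of observed counts. The sketch below uses only the counts given in the article; the variable names are hypothetical, and the P-value is not computed because that would need a chi-square table or a statistics library.

```python
# Observed counts: (late-game leader wins, late-game leader loses).
observed = {
    "Basketball": (150, 39),
    "Baseball": (86, 6),
    "Hockey": (65, 15),
    "Football": (72, 21),
}

grand_total = sum(wins + losses for wins, losses in observed.values())
col_totals = (
    sum(wins for wins, _ in observed.values()),
    sum(losses for _, losses in observed.values()),
)

contributions = {}    # each sport's share of the test statistic
expected_counts = []  # kept to check the "no expected count below 5" condition
for sport, row in observed.items():
    row_total = sum(row)
    contribution = 0.0
    for obs, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        expected_counts.append(expected)
        contribution += (obs - expected) ** 2 / expected
    contributions[sport] = contribution

chi_square = sum(contributions.values())
df = (len(observed) - 1) * (len(col_totals) - 1)

print(f"smallest expected count: {min(expected_counts):.2f}")
for sport, value in contributions.items():
    print(f"{sport:<11}{value:.3f}")
print(f"test statistic = {chi_square:.3f} with df = {df}")
```

Listing the contributions by sport makes it easy to see how much of the total comes from baseball compared with the other three sports.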
AP Statistics Project (1)
Due: 4/23/10
Part 1
Find and attach full copies of two related journal articles that use
statistical inference to support their position or findings. The articles do
not have to come from the same publication. The articles should be on
the same or similar topic. Please have me approve your articles before
you proceed to Parts 2 and 3.
Part 2
Look for the statistical features in one of the articles. Attached is a list
(set of questions) of some of the features. Also attached are four
examples of how to analyze the articles.
Part 3
Report on one of the two articles:
Your report is oral and in the form of a Power Point presentation. Prepare
your Power Point presentation as per the directions below. Place the
presentation into my “STATS” folder on the student common drive by
3:00 pm on 4/23/10.
Power Point presentation should contain:
1. 2 – 3 slides, with an explanation of no more than 10 minutes.
2. A brief explanation about the nature and purpose of the research. Be
sure to include the author, title and source.
3. A comment about how the author generalizes to a larger population
based on his sample. Give details about the sample (W’s).
4. An explanation about one of the hypothesis tests performed and the
conclusions drawn. Do you agree? Why?
5. And be ready to answer general questions about the second article.
Name:
1. Two appropriate journal articles
2. Power Point presentation and articles submitted on time
3. Power Point presentation should contain:
A. 2 – 3 slides, with an explanation of no more than 10 minutes.
B. A brief explanation about the nature and purpose of the research. Be
sure to include the author, title and source.
C. A comment about how the author generalizes to a larger population
based on his sample. Give details about the sample (W’s).
D. An explanation about one of the hypothesis tests performed and the
conclusions drawn. Do you agree? Why?
E. And be ready to answer general questions about the second article.