BCIT Commons

MATH 2441
Probability and Statistics for Biological Sciences
Recognizing Statistical Language in Reports
Recall our general definition that statistics is the branch of mathematics which provides methods
for "describing, drawing conclusions about, or making predictions about populations", based on
information available for a random sample from that population.
What makes statistics such an important field of study for technologists, scientists, and other
professionals is that almost all experimental "research" has the goal of "describing,
drawing conclusions about, or making predictions about populations." Even when you read a
relatively non-technical report in a current periodical or newspaper, you need to be able to sort
out the statistical issues in order to understand correctly what the report really means.
Even at the most basic levels of understanding, there are certain statistical elements in any report
on research results. Six of them are:
1. populations: exactly what populations were considered, or which populations does
the report comment about?
2. samples: which samples were studied (including matters of sample size, method of
study, type of data collected)?
3. numbers: The report will quote numbers, sometimes as actual numbers and
sometimes as words (e.g. "half of all respondents …" implies the number
0.5). How can you classify each of these numbers as pertaining to a sample or a population,
and as being a sample statistic (that is, a numerical summary of some characteristic
of the sample), a population parameter (that is, a numerical summary of some
characteristic of the population), or simply a number that describes some defining
characteristic of the experiment? Sample statistic values are calculated from the data actually
measured, whereas the true values of population parameters are usually unknown, so any value
quoted for one is an estimate (typically based on a sample statistic).
4. statistical issues: What claims are made about the populations considered?
Ultimately, these claims must be expressible in terms of the values or comparisons of
values of population parameters.
5. technical issues: What questions are raised that can really only be answered by
reference to scientific or technical principles, as opposed to statistical studies? How
can you explain the observed statistical result by reference to scientific or technical
principles?
6. controversies: Which claims or conclusions that are reported are still considered
controversial, and what is it about the study that makes them controversial?
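To make the distinction in item 3 concrete, here is a minimal Python sketch. It builds a hypothetical population in which the true proportion having some characteristic is known (a population parameter), draws a simple random sample, and computes the sample proportion (a sample statistic) as an estimate of that parameter. The population size, the 40% figure, the sample size, the seed, and the function name `sample_proportion` are arbitrary illustrative assumptions, not values from any study.

```python
import random

def sample_proportion(sample):
    """Sample statistic: the fraction of sampled individuals (coded 1) having the characteristic."""
    return sum(sample) / len(sample)

# Hypothetical population: 100,000 individuals, 40,000 of whom have the
# characteristic (coded 1). We know the parameter only because we built it.
POPULATION_SIZE = 100_000
WITH_CHARACTERISTIC = 40_000
population = [1] * WITH_CHARACTERISTIC + [0] * (POPULATION_SIZE - WITH_CHARACTERISTIC)

# Population parameter: the true proportion (unknown in real research).
population_proportion = WITH_CHARACTERISTIC / POPULATION_SIZE

# Draw a simple random sample of 1,000 without replacement (fixed seed for reproducibility).
rng = random.Random(2441)
sample = rng.sample(population, 1_000)

# Sample statistic: computed from the "measured" data, used to estimate the parameter.
p_hat = sample_proportion(sample)

print(f"population parameter p     = {population_proportion:.3f}")
print(f"sample statistic     p_hat = {p_hat:.3f}")
```

Rerunning with a different seed gives a slightly different `p_hat` each time; that run-to-run variation is sampling error, which is what the controversy over small samples is ultimately about.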
If you sort out each of these items explicitly when you are reading a report, you will have a much
greater likelihood of understanding what the report is really saying, and you will be less likely to
adopt conclusions that aren't intended.
To illustrate how this sort of analysis might go, we consider a recent short article from the
magazine New Scientist (May 30, 1998 issue, page 5) entitled "Health heresy." The New
Scientist magazine is available in the BCIT library. We reproduce the text of the article here with
the permission of the publisher to illustrate the analysis process suggested above. The article
consists of nine paragraphs, numbered for easier reference:
David W. Sabo (1999)
Recognizing Statistical Language in Reports
Page 1 of 3
Health heresy
By Nell Boyce
(1)
Smoking cigarettes can prevent breast cancer in women carrying rare genes that predispose them to
the disease, suggests a highly controversial new study.
(2)
An international team, coordinated by Steven Narod of the Women's College Hospital in Toronto,
looked at the relationship between lifestyle and breast cancer in more than 300 women with inherited
mutations in genes called BRCA1 and BRCA2. Only one in 250 women carry these mutations, but
among those who do, 80 per cent develop breast cancer before they reach 70.
(3)
The average age of the women who took part in the study was 50. Half of them had already
developed breast tumours, and the most significant difference between these women and those who
were still free from cancer turned out to be that the latter were more likely to be smokers. Getting
through the equivalent of 20 cigarettes a day for four years seemed to cut the risk of developing
breast cancer by 53 per cent, Narod and his colleagues report in the latest Journal of the National
Cancer Institute (vol 90, p 761).
(4)
The researchers and the agencies that funded their work stress that the results apply only to women
carrying these mutations. "This study should not cause women to take up smoking," says Paul
Kleihues, director of the International Agency for Research on Cancer in Lyon. "The adverse health
effects are overwhelming."
(5)
But the findings, if confirmed, mean that women with BRCA1 and BRCA2 mutations might actually
gain a net health benefit from smoking--although the researchers involved refuse to make this
statement. "Obviously, it's not politically correct to advocate smoking," says Timothy Rebbeck of the
University of Pennsylvania in Philadelphia. "We felt a little afraid of the results," admits Jean-Sebastien Brunet of the University of Toronto.
(6)
Though nobody knows how smoking might protect against breast cancer, oestrogen can spur the
growth of cancers, and women who smoke seem to have lower levels of the female sex hormone.
Another possibility is that breast cells react to cigarette smoke by churning out detoxifying enzymes
that also work against the mutations. If the mechanism can be understood, it might be possible to
identify benign drugs with the same effect, the researchers speculate.
(7)
However, some epidemiologists aren't convinced by the new study, noting that it involved a relatively
small group of women. "If you're making a claim as remarkable as this, you've got to have more
convincing evidence," says Richard Peto of the University of Oxford. He believes the association
found by Narod's team is "pure chance".
(8)
Huge epidemiological studies, including one conducted by the American Cancer Society, completed
in 1994 and involving more than half a million women, have found no strong link between smoking
and breast cancer. But the new study is not the first to claim a connection between the two for
women carrying particular mutations. In 1996, researchers in the US claimed that women carrying
mutations in a gene called NAT2, which makes an enzyme that detoxifies carcinogens, were more
likely to develop breast cancer if they smoked (New Scientist, print edition, 9 November 1996, p 4).
(9)
Peto is equally sceptical about this study. "The small studies suggesting hazards and the small
studies suggesting benefits are wrong," he claims.
From New Scientist, 30 May 1998
In outline, examples of items in the six categories that can be found in this short article are (using
p1, p2, etc. to indicate the paragraph number):
Populations:
p1/p2: all women who carry inherited mutations in the BRCA1 or BRCA2 genes
p8: all women (in the population sampled for the American Cancer Society study)
p8: all women who carry NAT2 mutations
Samples:
p2: the more than 300 women studied by Narod et al. formed a sample of the population of all
women with the BRCA1 or BRCA2 genetic mutations
p8: the more than half a million women involved in the American Cancer Society study formed a
sample of the population of all women (no restrictions are mentioned in this article -- you would
have to check the original report to determine if the sampling was restricted to certain
national or geographic areas, certain age groups, etc.)
Numbers:
p2: 1 in 250 women carry the BRCA1 or BRCA2 mutations, estimate of a proportion of the
population of all women (a population parameter)
p2: 80% of women with these mutations develop breast cancer before age 70, estimate of a
proportion of the population of all women with these mutations (a population parameter)
p2: age 70, not really a statistic or an estimate of a population parameter, but a number that
defines the population being studied
p3: average age of women participating in Narod et al study was 50, sample mean
p3: half of them had already developed breast cancer, sample proportion
p3: 20 cigarettes a day for four years, not really a statistic or an estimate of a population
parameter, but numbers which define the population being studied
p3: cancer risk cut by 53% -- report about the relative sizes of two sample proportions: the
proportion of women getting breast cancer in the 20-cigarettes-a-day group was 47% of
the proportion of women getting breast cancer in the remainder of the sample of more than 300.
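The conversions in the Numbers list above are simple arithmetic, and this short sketch only spells them out: "1 in 250" becomes a proportion of 0.004, "half" becomes 0.5, and a risk "cut by 53 per cent" means the smokers' proportion is 1 - 0.53 = 0.47 of the comparison proportion. Exact fractions are used to avoid floating-point rounding; the helper name `relative_risk_after_cut` is a hypothetical choice for illustration.

```python
from fractions import Fraction

def relative_risk_after_cut(percent_cut):
    """Return the ratio (new risk / old risk) implied by 'risk cut by X per cent'."""
    return 1 - Fraction(percent_cut, 100)

carrier_proportion = Fraction(1, 250)   # p2: "one in 250 women"
half = Fraction(1, 2)                   # p3: "half of them"
ratio = relative_risk_after_cut(53)     # p3: "cut the risk ... by 53 per cent"

print(float(carrier_proportion))  # 0.004
print(float(half))                # 0.5
print(float(ratio))               # 0.47
```

Note that the 53% figure describes a ratio of two proportions, not a difference of 53 percentage points.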
Statistical Issues (Claims Made About Populations):
p1: for women with either of two rare genetic mutations (BRCA1 or BRCA2), the proportion of
smokers who get breast cancer is different from (less than) the proportion of nonsmokers who get breast cancer
p4: a rather complex statistical issue or collection of issues alluded to here -- does the increased
risk of other health hazards of smoking more than cancel out the benefit of the reduced
risk of breast cancer for women with the BRCA1 and BRCA2 mutations?
p8: the American Cancer Society study found no strong difference between the proportion of all
smoking women who got breast cancer and the proportion of all non-smoking women who got
breast cancer
p8: NAT2 study claimed that for the population of women with the NAT2 genetic mutation, the
proportion of smokers who got breast cancer was higher than the proportion of nonsmokers who got breast cancer
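Each claim above compares two population proportions (smokers vs. nonsmokers). One standard way to judge whether two sample proportions differ by more than sampling error alone would explain is the two-proportion z-test, sketched below with only the Python standard library. This is a general textbook technique, not necessarily the method the researchers used, and the counts in the usage loop are made-up placeholders, not data from any study in the article.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 using the pooled sample proportion.

    x1, x2: counts with the outcome (e.g. women who developed breast cancer)
    n1, n2: sizes of the two groups (e.g. smokers, nonsmokers)
    Returns (z statistic, two-sided p-value).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts, for illustration only: 30% vs 50% in both cases,
# but with very different group sizes.
for n in (10, 100):
    z, p = two_proportion_z_test(3 * n // 10, n, 5 * n // 10, n)
    print(f"n = {n} per group: z = {z:.2f}, p-value = {p:.4f}")
```

With identical sample proportions, the smaller groups give a much larger p-value: the same observed difference is far more plausibly explained by chance when samples are small. (The normal approximation is itself rough for groups as small as 10.)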
Technical Issues:
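p6: what is the mechanism by which smoking might protect these women from breast cancer (lower
oestrogen levels in smokers? detoxifying enzymes produced by breast cells?), and could benign
drugs be found that have the same effect?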
What Claims are Considered Controversial & Why?:
p7/p9: both the claim that smoking causes higher rates of breast cancer for some women (based on
the NAT2 study) and the claim that it reduces rates of breast cancer for some women (based on the
Narod et al. study) are viewed with suspicion because the samples are considered to be small. The
observed results could be due to coincidence (sampling error).
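Peto's objection (p7) is about sampling error: the smaller the sample, the more a sample proportion can differ from the population proportion purely by chance. One rough way to see the scale of this is the approximate 95% margin of error for a sample proportion, 1.96 * sqrt(p_hat * (1 - p_hat) / n). The sketch below compares it for a sample of about 300 (roughly the size of the Narod et al. sample) and about 500,000 (roughly the American Cancer Society study), using p_hat = 0.5 as an illustrative value. This is a standard large-sample approximation, not a calculation reported in either study.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Illustrative sample sizes only; p_hat = 0.5 gives the largest margin for a given n.
for n in (300, 500_000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:>7,}: +/- {moe:.4f}")
```

The margin is roughly +/- 0.057 for n = 300 but only about +/- 0.0014 for n = 500,000, which is why a striking result from a few hundred subjects is treated with more caution than a null result from half a million.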