MATH 2441 Probability and Statistics for Biological Sciences

Recognizing Statistical Language in Reports

Recall our general definition: statistics is the branch of mathematics that provides methods for "describing, drawing conclusions about, or making predictions about populations", based on information available from a random sample of that population. What makes statistics such an important field of study for technologists, scientists, and other professionals is that almost all experimental "research" work has the goal of "describing, drawing conclusions about, or making predictions about populations." Even when you read a relatively non-technical report in a current periodical or newspaper, you need to be able to sort out the statistical issues in order to understand correctly what the report really means.

Even at the most basic level of understanding, certain statistical elements appear in any report on research results. Six of them are:

1. populations: Exactly which populations were considered, or which populations does the report comment on?

2. samples: Which samples were studied (including matters of sample size, method of study, and type of data collected)?

3. numbers: The report will quote numbers, sometimes as actual numerals and sometimes in words (e.g. "half of all respondents …" implies the number 0.5). Can you classify each number as pertaining to a sample or to a population? Is it a sample statistic (that is, a numerical summary of some characteristic of the sample), a population parameter (that is, a numerical summary of some characteristic of the population), or simply a number that describes some defining characteristic of the experiment? Sample statistic values are measured experimentally, whereas the values quoted for population parameters are usually estimates, because the true value is rarely known.

4. statistical issues: What claims are made about the populations considered?
Ultimately, these claims must be expressible in terms of the values or comparisons of values of population parameters.

5. technical issues: What questions are raised that can really only be answered by reference to scientific or technical principles, as opposed to statistical studies? How can you explain the observed statistical result by reference to scientific or technical principles?

6. controversies: Which claims or conclusions that are reported are still considered controversial, and what is it about the study that makes them controversial?

If you sort out each of these items explicitly when you are reading a report, you will have a much greater likelihood of understanding what the report is really saying, and you will be less likely to adopt conclusions that aren't intended.

To illustrate how this sort of analysis might go, we consider a recent short article from the magazine, New Scientist (May 30, 1998 issue, page 5) entitled "Health heresy." The New Scientist magazine is available in the BCIT library. We duplicate the text of the article here with the permission of the publisher to illustrate the analysis process suggested above. The article consists of nine paragraphs, numbered as indicated for easier reference:

David W. Sabo (1999) Recognizing Statistical Language in Reports Page 1 of 3

Health heresy
By Nell Boyce

(1) Smoking cigarettes can prevent breast cancer in women carrying rare genes that predispose them to the disease, suggests a highly controversial new study.

(2) An international team, coordinated by Steven Narod of the Women's College Hospital in Toronto, looked at the relationship between lifestyle and breast cancer in more than 300 women with inherited mutations in genes called BRCA1 and BRCA2. Only one in 250 women carry these mutations, but among those who do, 80 per cent develop breast cancer before they reach 70.

(3) The average age of the women who took part in the study was 50.
Half of them had already developed breast tumours, and the most significant difference between these women and those who were still free from cancer turned out to be that the latter were more likely to be smokers. Getting through the equivalent of 20 cigarettes a day for four years seemed to cut the risk of developing breast cancer by 53 per cent, Narod and his colleagues report in the latest Journal of the National Cancer Institute (vol 90, p 761).

(4) The researchers and the agencies that funded their work stress that the results apply only to women carrying these mutations. "This study should not cause women to take up smoking," says Paul Kleihues, director of the International Agency for Research on Cancer in Lyon. "The adverse health effects are overwhelming."

(5) But the findings, if confirmed, mean that women with BRCA1 and BRCA2 mutations might actually gain a net health benefit from smoking--although the researchers involved refuse to make this statement. "Obviously, it's not politically correct to advocate smoking," says Timothy Rebbeck of the University of Pennsylvania in Philadelphia. "We felt a little afraid of the results," admits Jean-Sebastien Brunet of the University of Toronto.

(6) Though nobody knows how smoking might protect against breast cancer, oestrogen can spur the growth of cancers, and women who smoke seem to have lower levels of the female sex hormone. Another possibility is that breast cells react to cigarette smoke by churning out detoxifying enzymes that also work against the mutations. If the mechanism can be understood, it might be possible to identify benign drugs with the same effect, the researchers speculate.

(7) However, some epidemiologists aren't convinced by the new study, noting that it involved a relatively small group of women. "If you're making a claim as remarkable as this, you've got to have more convincing evidence," says Richard Peto of the University of Oxford.
He believes the association found by Narod's team is "pure chance".

(8) Huge epidemiological studies, including one conducted by the American Cancer Society, completed in 1994 and involving more than half a million women, have found no strong link between smoking and breast cancer. But the new study is not the first to claim a connection between the two for women carrying particular mutations. In 1996, researchers in the US claimed that women carrying mutations in a gene called NAT2, which makes an enzyme that detoxifies carcinogens, were more likely to develop breast cancer if they smoked (New Scientist, print edition, 9 November 1996, p 4).

(9) Peto is equally sceptical about this study. "The small studies suggesting hazards and the small studies suggesting benefits are wrong," he claims.

From New Scientist, 30 May 1998

In outline, examples of items in the six categories that can be found in this short article are as follows (using p1, p2, etc. to indicate the paragraph number):

Populations:
p1/p2: all women who inherit mutations in the genes BRCA1 and BRCA2
p8: all women (in the population sampled for the American Cancer Society study)
p8: all women who carry NAT2 mutations

Samples:
p2: the more than 300 women studied by Narod et al. formed a sample of the population of all women with the BRCA1 and BRCA2 genetic mutations
p8: the half million women involved in the American Cancer Society study formed a sample of the population of all women (no restrictions are mentioned in this article -- you would have to check the original report to determine whether the sampling was restricted to certain national or geographic areas, certain age groups, etc.)

Numbers:
p2: 1 in 250 women carry the BRCA1 and BRCA2 mutations: an estimate of a population proportion (the population being all women)
p2: 80% of women with these mutations develop breast cancer: an estimate of a population proportion (the population being all women with these mutations)
p2: age 70: not really a statistic or an estimate of a population parameter, but a number that defines the population being studied
p3: the average age of women participating in the Narod et al. study was 50: a sample mean
p3: half of them had already developed breast cancer: a sample proportion
p3: 20 cigarettes a day for four years: not really statistics or estimates of population parameters, but numbers that define the population being studied
p3: cancer risk cut by 53%: a statement about the relative sizes of two sample proportions. The proportion of women getting breast cancer in the 20-cigarettes-a-day part of the sample was 47% of the proportion of women getting breast cancer in the remainder of the sample.

Statistical Issues (Claims Made About Populations):
p1: for women with two rare genetic mutations, the proportion of smokers who get breast cancer is different from (less than) the proportion of nonsmokers who get breast cancer
p4: a rather complex statistical issue, or collection of issues, is alluded to here -- does the increased risk of the other health hazards of smoking more than cancel out the benefit of the reduced risk of breast cancer for women with the BRCA1 and BRCA2 mutations?
p8: the American Cancer Society study found no strong link, that is, no substantial difference between the proportion of all smoking women who got breast cancer and the proportion of all non-smoking women who got breast cancer
p8: the NAT2 study claimed that, for the population of women with the NAT2 genetic mutation, the proportion of smokers who got breast cancer was higher than the proportion of nonsmokers who got breast cancer

Technical Issues:
p6: what is the mechanism by which smoking might protect these women from breast cancer?

What Claims are Considered Controversial & Why?:
p7/p9: both the claim that smoking causes higher rates of breast cancer for some women (based on the NAT2 study) and the claim that it reduces rates of breast cancer for some women (based on the Narod et al. study) are regarded with suspicion because the samples are considered to be small. The observed results could be due to coincidence (sampling error).
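
The "risk cut by 53%" item under Numbers and the sampling-error concern under controversies can both be made concrete with a little arithmetic. The Python sketch below is illustrative only. The function names, the proportions 0.235 and 0.5, and the group sizes 10 and 500 are hypothetical choices made for this example and are not values taken from the Narod et al. study. The first function expresses one sample proportion as a fractional reduction relative to another. The second simulates two groups drawn from populations with the same true proportion and counts how often chance alone produces an apparent reduction at least as large as a chosen threshold.

```python
import random


def relative_risk_reduction(p_exposed, p_unexposed):
    """Return the fraction by which p_exposed falls below p_unexposed."""
    return 1.0 - p_exposed / p_unexposed


def chance_of_apparent_reduction(n_per_group, true_p, threshold, trials, seed=0):
    """Estimate how often sampling error alone gives an apparent reduction
    of at least `threshold`, when both groups really share the same
    population proportion `true_p`."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(trials):
        # Each group member independently is a "case" with probability true_p.
        cases_a = sum(rng.random() < true_p for _ in range(n_per_group))
        cases_b = sum(rng.random() < true_p for _ in range(n_per_group))
        if cases_b == 0:
            continue  # ratio undefined; treat as no apparent reduction
        reduction = relative_risk_reduction(cases_a / n_per_group,
                                            cases_b / n_per_group)
        if reduction >= threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical proportions: 0.235 is 47% of 0.5, i.e. a 53% reduction.
    print(round(relative_risk_reduction(0.235, 0.5), 2))  # prints 0.53

    # Hypothetical group sizes: one small, one large.
    for n in (10, 500):
        freq = chance_of_apparent_reduction(n, true_p=0.5,
                                            threshold=0.53, trials=2000)
        print(n, freq)
```

Under these assumed settings, the small groups show a "53% reduction or better" by chance in a noticeable fraction of the simulated studies, while the large groups essentially never do. This is the reasoning behind the objection that results from small samples may reflect sampling error rather than a real difference between populations.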