The problems of Statistics: sex, money and education

Preliminary remarks

I was both very grateful and very nervous to receive the invitation to give this talk to the Assistant Statisticians and Statistical Officers Conference. Very grateful because, as an applied probabilist, my experience of ‘statistics at the sharp end’ is pretty limited; and very nervous … for exactly the same reason. To be more precise about my experience: I have done some consulting for the water industry, I trained as an actuary before moving to Warwick University as a lecturer, and I have some expertise in Financial Mathematics. As you can see, a paltry set of credentials to wave under the nose of your average statistician.

In mitigation, I can honestly claim that I have maintained an interest in statistics at the two extremes throughout my career. What do I mean by the two extremes? I mean the very theoretical end and, since ‘applied’ has completely the wrong overtones, let us say the ‘How do we go about this?’ end. Some of you will now be objecting that these are the same end of the spectrum, but I hope that during this talk it will become clear both what I mean and that it is the other (or at least another) end of the spectrum.

To turn, then, to the subject of this talk: ‘the problems of statistics’. A pretentious and over-ambitious title? Well, yes! On the other hand, I think that if one stands back a long way one sees that, when it comes to the serious issues of practice, methodological disagreements between statisticians, and in particular the almost religious divide between frequentists and Bayesians, assume their proper relevance (or irrelevance).
It is regrettable, perhaps, that one sees (to quote Laurence Sterne, the master of the portrayal of emotionally charged intellectual argument) “one half of a learned profession tilting full butt against the other half of it, and then tumbling and rolling over one another like hogs”[1], but it is not very important when it comes to the significance of real statistics.

Turning to the subtitle of this talk: back in the days when there was such a thing as ‘polite society’, it was a firm rule that three things should not be discussed at a dinner party: money, religion and politics. To these we might add ‘sex’, on the grounds that, whilst it wouldn’t have been considered necessary to ban this topic in those days, it certainly might be now. The point, I take it, of banning them is that these subjects arouse the strongest of emotions in those who hold an opinion, and it is rare to find someone without an opinion. So intense argument and disgust are likely to ensue when such matters are discussed. We, however, all have strong stomachs and, besides, we’re not at dinner.

My overall point is that these topics, along with education (because we’ve all had one), are spheres of human activity in which everyone has an opinion, a corner to fight, an axe to grind. Picture, if you will, the noble statistician, like some character from the art of socialist realism, striding forth, hammer or sickle in hand, to do battle with the monsters of ignorance on some topic such as AIDS, home childbirth, the money supply, cancer screening or the quality of education. In my opinion, ignorance is the least of your problems. The real ‘monsters’ you face are special interests, the precise definition of terms and of ‘treatments’, political expediency and the ability of people, in the mass, to circumvent attempts to control them.

[1] Laurence Sterne: Tristram Shandy, p207.
Religion: I do not intend to discuss this, but would say with Lord Melbourne: “While I cannot be regarded as a pillar, I must be regarded as a buttress of the church, because I support it from the outside”. There is an excellent paper by Bartholomew[2] in JRSSA to which you should refer if you are interested in the interaction between statistics and religion. I would only point out two conflicts explored in that paper. The first is between what is often called Cromwell’s law, ‘I beseech you, in the bowels of Christ, think it possible you may be mistaken’[3], which is usually interpreted as ‘don’t rule possibilities out in your model’, and Occam’s razor (the principle of least hypothesis): William of Occam’s principle that the simplest hypothesis adequately explaining the facts is to be preferred. Popper sides with Cromwell by asserting that, a priori, more complicated models are more likely than simpler ones (since with more parameters it’s easier to fit a model). The second concerns whether life itself is evidence for God: ‘life is so unlikely that God must have set things up’ versus ‘God would never have set things up so that life was so unlikely, therefore life is an accident’.

Sex

Let me turn swiftly to the first of the topics in the subtitle. Of course, the term sex here is merely a way to suck you into a discussion of medicine and medical statistics. It might be said that the medical view of sex is that it has three outcomes: HIV/AIDS, other (more strictly defined) sexually transmitted diseases, and pregnancy. Indeed, to change the grim medical joke slightly, we might say that there are only two, since the third outcome is a subset of the second.

Two faulty views of medics persist today. The first views them as educated people striving only to save life or improve the quality of people’s health: people, in other words, who always remember Galen’s stricture, ‘first, do no harm’.
The second regards them as overanxious, arrogant know-alls, too keen to intervene in situations where things are best left well alone: in short, as the fools who do the rushing-in. I must admit that I tend to belong to the second camp and, like many people, could be regarded as a spiritual descendant of Laurence Sterne, who caricatures them as one side of a discussion about how the body arranges the quantity of blood, in particular with reference to someone who has lost both his legs in a battle:

“Nature accommodates herself to these emergencies, cried the opponents—else what do you say to the case of a whole stomach—a whole pair of lungs but half a man, when both his legs have been unfortunately shot off? — He dies of a plethora, said they—or must spit blood, and in a fortnight or three weeks go off in a consumption— It happens otherwise—replied the opponents. — It ought not, they said.”[4]

[2] Bartholomew, D J: Probability, statistics and theology. JRSSA, 151, 137-178, 1988.
[3] Letter to the General Assembly, Church of Scotland. 3 Aug. 1650.

So, let’s start with the thorny issue of home births: their advisability and desirability. First, a little history. Back in the 1930s approximately 95% of UK births were at home and 5% in hospital[5]. Crude rates for perinatal mortality were better for home births (actually the relevant study was by social class, but before the NHS this was, for obvious reasons, very closely correlated with place of birth[6]); however, the overall rate was about 60 per 1000. By 1966 the percentage of hospital births had risen to 75%[7], with mortality rates substantially better for home births. Around 1970[8] (and with mortality rates still substantially better for home births), in the face of a mortality rate of about 20 per 1000[9] (which was considered much too high), there was a more concerted move to hospital births, so that by 1990 the home birth rate was about 1%.
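The parenthetical about social class is the statistical crux of the home-birth story: a crude mortality rate mixes the effect of place of birth with the mix of mothers who give birth there. The minimal sketch below makes that concrete. The ‘low_risk’/‘high_risk’ grouping and every count in it are invented for illustration; they are not the 1930s figures, nor any other historical data.

```python
from typing import Dict, Tuple

# HYPOTHETICAL counts (births, deaths) by place of birth and risk group.
# Invented purely to illustrate confounding by case mix; not real data.
COUNTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "home":     {"low_risk": (900, 18), "high_risk": (100, 8)},
    "hospital": {"low_risk": (100, 2),  "high_risk": (900, 72)},
}


def rate_per_1000(births: int, deaths: int) -> float:
    """Perinatal mortality rate per 1000 births."""
    return 1000.0 * deaths / births


def crude_rate(place: str) -> float:
    """Rate pooled over risk groups, i.e. ignoring case mix."""
    births = sum(b for b, _ in COUNTS[place].values())
    deaths = sum(d for _, d in COUNTS[place].values())
    return rate_per_1000(births, deaths)


def stratum_rates(place: str) -> Dict[str, float]:
    """Rate within each risk group separately."""
    return {group: rate_per_1000(b, d) for group, (b, d) in COUNTS[place].items()}


if __name__ == "__main__":
    for place in COUNTS:
        print(place, "crude:", crude_rate(place), "by group:", stratum_rates(place))
```

With these made-up counts the crude rates differ almost threefold (26 versus 74 per 1000) while the rates within each group are identical, so the crude comparison says nothing about place of birth as such. The same check applies whenever the groups being compared differ in case mix.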
Opponents of the medicalisation of childbirth pointed to other countries (in particular the Netherlands, which did not follow the ‘hospital route’ and still has a home birth percentage of about 60%, with mortality similar to the UK’s), suggesting that for low-risk births home birth was as safe as, if not safer than, hospital birth.

Then, in 1996, the Northern Region Perinatal Mortality Survey[10] was published in the BMJ, comparing all home births in the region for the period 1986-1993 with all hospital births. In 3466 home births there were 134 deaths (an enormously high figure). At first sight this confirms the wisdom of hospitalisation. However, 131 of these deaths were in cases where the home birth was not planned or where there was no plan for delivery at all (i.e. the pregnancy had either been concealed or not diagnosed)! The remaining mortality was substantially better than the average.

An editorial in the same BMJ issue refers to the Cumberlege report[11], which ‘sees home birth as a real option’, and suggests home birth is an option for ‘women with low risk of obstetrical complications’. However, it points out that ‘some primary care practitioners may need to be persuaded to provide the option for their patients: the survey from Britain’s Northern region found that GPs and, to a lesser extent, midwives often had reservations about home birth and tended to discourage it.’

[4] Laurence Sterne: Tristram Shandy, p261.
[5] Maternal Services. The Bourn report. HMSO 1990.
[6] Johanson R, Newburn M and MacFarlane A: Has the medicalisation of childbirth gone too far? BMJ 324, 892-895, 13 Apr 2002.
[7] The Court Report on child health. 1976?
[8] Central Health Services Committee. Standing Maternity and Midwifery Committee. Report of the sub-committee on domiciliary midwifery and maternity bed needs. HMSO 1970.
[9] Births at home (see ref. 7):
    Year                                 1966  1969  1970  1971  1972  1973
    % live births which were at home     25.2  16.4  13.6  11.0   8.6   6.1
    % still births which were at home     9.2   7.0   5.2   5.3   4.3   3.9
[10] Collaborative survey of perinatal loss in planned and unplanned home births. Northern Region Perinatal Mortality Survey Coordinating Group. BMJ 313, 1306-1309, 23 Nov 1996.
[11] Changing childbirth. Dept. of Health Expert Maternity Group. HMSO 1993.

So, it’s all sorted then! Well, not quite: there are two problems. Firstly, there are dissenting voices to the removal of those 131 deaths from the 134 figure (see Drife[12] and letters in BMJ 320 (18 Mar 2000), p798), and secondly there is a dearth of midwives experienced in home birth. Actually, there’s a third caveat: ‘… the absence of randomised clinical trials’, which (in a glorious understatement) are ‘difficult to achieve’!

The NBTF enquiry[13] attempted (but failed) to form matched (by risk factors) pairs of low-risk (planned) hospital and home births: there were 5971 women in the home group and only 4724 in the hospital group. Probably the most important conclusion (apart from the lower rates of infant mortality and morbidity, and of caesareans and other interventions, in the home group) was that ‘home births will probably increase to 4 or 5% of all maternities in the UK over the next decade and this needs preparatory planning’!

To give a very partisan summary, then: over 20 years the NHS moves (at great cost and with no supporting evidence) from 25% home births to, essentially, all hospital births. After a decade of argument, it is then conceded that the statistics suggest the NHS possibly shouldn’t have done this, but so much expertise has been lost that we can’t move back quickly; indeed, it will take a decade to reverse one eighth of the change, and it will be done in the face of continued resistance from the providers of primary health care.

What is the problem for statisticians? To be included in the decision-making process, and to have their advice taken seriously.

Now that word ‘sex’ again. The OED gives the following definition:

Sex: 1.
Either of the two divisions of organic beings distinguished as males and females respectively …

Of course, this is the meaning for which we now commonly use the word ‘gender’ (as in Gender Studies), so let’s check that in the OED:

Gender: 1. Kind, sort. 2. Each of the two or three grammatical ‘kinds’, … , into which substantive nouns are discriminated …
Gender (v): 2. To copulate.

As far as I’m concerned, this is sufficient justification to include the topic of breast cancer screening (which, I am reliably informed, is a women’s issue) under the heading of sex.

We’ll start again with a brief history. After the publication in 1985 of the results from the ‘two counties’ randomised controlled trial in Sweden, the UK introduced a breast cancer screening programme (mammography) for women aged 50-69, with a target of a 25% reduction in breast cancer mortality in the target age group by 2000.

[12] Drife, J: Data on babies’ safety during hospital births are being ignored. BMJ 319, 1008, 9 Oct 1999.
[13] Chamberlain G, Wraight A, Crowley P: Birth at home. Practical Midwife, 2(7), 35-39, 1999.

My attention was first drawn to this subject by hearing a discussion on Woman’s Hour[14] between a Danish statistician and the Head of the NHS breast cancer screening programme. The statistician made the (I discover) fairly standard assertions that there was no evidence of improvement and that there was evidence of adverse effects on women’s health. The response from the Head of the screening programme was horror and outrage: ‘how could anyone suggest that screening was ineffective, let alone detrimental? This was an important issue for women and he should shut up’ is a not unfair summary. It took some time to track down the statistician: as it turns out, he is a member of the Cochrane Breast Cancer Group (part of the influential Cochrane Collaboration). The paper[15] reported a meta-analysis of eight randomised controlled trials (five from Sweden). Six (!)
were rejected for reasons of bias in randomisation, and the remaining two gave a relative mortality risk for the screened group of 1.06 (10% mortality), so that ‘for every 1000 women screened over twelve years, one breast cancer death is averted but the total number of deaths is increased by six.’ In addition, the mastectomy rate was increased by 25%, as was the radiotherapy rate.

Incidentally, the office of the NHS cancer screening programmes stated “It is difficult to evaluate these claims … based on … two studies classified as poor quality studies by Gøtzsche and Olsen.”[16] This is a mistake (whether deliberate or accidental is not for me to say): the authors classify these studies as ‘adequately randomised’ and ‘unbiased’; it was the excluded studies that they classified as ‘poor’. To quote a Lancet editorial[17]: ‘At present there is no evidence from large randomised trials to support screening mammography programmes’. Now see the Lancet commentary of 2002[18]: ‘The benefits appear real but modest’, but, despite trials with 247,010 participants, ‘the latest analysis does not tell us whether the massive effort… is worthwhile’!

The statistician’s problem: to measure a (possible) small improvement accurately enough to determine its cost-effectiveness, in the face of enormous political pressure supporting the measure.

AIDS

Turning briefly now to AIDS, I recently performed a little experiment. I asked 17 friends and acquaintances (aged over 40) what they remembered about the history of AIDS and the predictions back in the early to mid 1980s, and what they thought about it now. There was a surprisingly uniform and simple response: ‘You lot got it wrong’! On further exploration, the common theme was that by late 1986 statisticians had made ‘doomsday’ predictions about the likely incidence of AIDS in the UK, and that these had proved vast overestimates. So I checked.
The AIDS awareness campaign started in mid 1986 and various measures (in particular needle exchanges and a concerted safe-sex campaign) were fully in place by 1989, so I searched for papers appearing by the end of 1988. A literature search unsurprisingly turned up many papers, and in particular a special issue of JRSSA[19], coincidentally the same issue which contained the paper by Bartholomew which I mentioned earlier.

[14] Woman’s Hour. Radio 4, 6/7 April 2002.
[15] O. Olsen and P C Gøtzsche: Is screening for breast cancer with mammography justified? Lancet 355, 129-134, 8 Jan 2000.
[16] Mayor S: Row over breast cancer screening shows that scientists “bring some subjectivity to their work”. BMJ 323, 956, 7 Oct 2001.
[17] Horton R: Screening mammography—an overview revisited. Lancet 358, 1284-1285, 20 Oct 2001.
[18] Gelmon K A and Olivotto I: The mammography screening debate—time to move on. Lancet 359, 904-905, 16 Mar 2002.

I was at first surprised to discover no numerical predictions whatsoever. Of course, on reflection this is no surprise at all. Statisticians may be contentious but they aren’t (usually) suicidal. It was clear that the incubation period for AIDS was long, and the data with which to estimate parameters in any reasonable model simply weren’t there. Very definitely a case of the dog which didn’t bark in the night; and yet, on the basis of my survey, statisticians are carrying the can.

The statistician’s problem: sometimes the fact that you can’t do anything is not an acceptable excuse, and making ‘If…, then…’ statements won’t get you off the hook.

The Money Supply

The economists’ definition of money is: ‘a store of value, a medium of exchange and a unit of account’. It’s important to understand this economic definition of money since otherwise one would assume that the money supply consisted of notes and coins.
I don’t have time to give the standard economics lecture on how money is created by the banking system through fractional-reserve banking, or on the treatment of liquid assets as money, but let us just say for now that in 1994 one measure of money (M2) put the money supply at £401bn, whilst the value of notes and coins was £21bn[20]. The reason for this is basically that many assets (including deposits at banks) are usually regarded as money by their owners when it comes to making decisions about spending.

Why seek to control the money supply? After a moment’s thought it should be clear that the growth in money spent should be the real growth in production of goods and services plus inflation. Thus controlling the money supply should control inflation (since there is a limit to how fast a given stock of money can circulate). To quote a standard introductory university text: “In the 1970’s the authorities in many parts of the world’s governments became converted to monetarism, the belief that… important macro variables can be manipulated by manipulating the money supply”[21].

OK, so never mind the policy: there are good reasons why we might want to know the amount of money available. So the statistical problem is to measure how much money is available for transactions (we might also want to measure the velocity of circulation, but that’s a separate question).

[19] Issue 1, JRSSA 151, 1988.
[20] Lipsey R G and Chrystal K A: An introduction to positive economics (8th Edition), OUP, 1995, p746.
[21] Lipsey R G: An introduction to positive economics (6th Edition), Weidenfeld and Nicholson, 1983, p690.

[Table: Money supply definitions. Columns: M0, M1, M2, £M3, M3, PSL1, PSL2. Rows: notes and coins; bank deposits (sight deposits, checkable deposits, other interest bearing, time deposits < 1 month, time deposits 1 month to 2 years, time deposits over 2 years); sector (private, public); foreign currency; size < £100,000; size > £100,000; other money market instruments; savings deposits and securities.]

As to what happened, I can do little better than quote Lipsey again:

“There are so many highly substitutable monetary assets that control of any one group can often lead to disintermediation as decision makers slip into holding more of a similar but uncontrolled asset and less of the controlled one…”

“Many central banks started out controlling M1. Those who were successful, however, often found the simple statistical relation between M1 and those macro aggregates they sought to control breaking down. The public learned to do with assets not in M1 and the central banks then sought to control a wider monetary aggregate”[22]

[See also table 38.1 in Lipsey (8th Edition).]

[22] Lipsey R G: An introduction to positive economics (6th Edition), Weidenfeld and Nicholson, 1983, p691.

By the end of monetary targeting in this country in 1986, it’s clear that people were not just using the money under their granny’s mattress, but were allowing for future grannies they might acquire by remarriage and adding in their stock of Monopoly money together with their telephone number.

The statistician’s problem: sometimes quantum effects apply at the macroscopic level, in that the act of measuring affects (the desired effect of) the measurement. To spell it out: if you control a surrogate (for the thing you really want to control) you’ll influence behaviour, but quite possibly not in the way that you want.
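Lipsey’s point about disintermediation can be illustrated with a deliberately crude toy model. In the sketch below the asset classes, the cap and the behavioural rules are all hypothetical (nothing is calibrated to UK data or taken from Lipsey): the authority hits its target in every round, yet total liquidity, the thing it actually cares about, never moves.

```python
from typing import List, Tuple


def simulate_disintermediation(total: float = 100.0,
                               n_classes: int = 3,
                               cap: float = 40.0) -> List[Tuple[int, float, float]]:
    """Toy model: each round the authority caps a widening monetary aggregate.

    Hypothetical assumptions (illustrative only, not real data):
    - liquid wealth `total` is fixed and starts in the narrowest asset class;
    - in round k the target aggregate is asset classes 0..k, and the
      authority forces it down to `cap`;
    - holders move the excess into class k+1, a close substitute that is
      not (yet) counted in the target aggregate.
    Returns (classes in target, measured target, total liquidity) per round.
    """
    holdings = [total] + [0.0] * (n_classes - 1)
    history = []
    for k in range(n_classes - 1):
        excess = max(0.0, sum(holdings[:k + 1]) - cap)
        # Shift the excess out of the targeted classes, widest class first.
        for i in range(k, -1, -1):
            moved = min(holdings[i], excess)
            holdings[i] -= moved
            holdings[k + 1] += moved
            excess -= moved
        measured = sum(holdings[:k + 1])  # what the published aggregate shows
        history.append((k + 1, measured, sum(holdings)))
    return history


if __name__ == "__main__":
    for n_targeted, measured, liquidity in simulate_disintermediation():
        print(f"target covers {n_targeted} class(es): "
              f"measured = {measured}, total liquidity = {liquidity}")
```

Each time the measured aggregate is brought to target, the public simply holds more of the next, uncounted substitute; widening the aggregate only moves the problem one class along.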
Education: turning now to education, if we ignore the current, high-octane issue of university fees, you might say ‘what’s the problem?’ Literacy rates are at 95% (higher than they’ve ever been), ‘A’ level pass rates rise and rise, and soon 50% of young people will go to university. In short, it’s an unmitigated success story. My initial response is that it’s a problem of definition: the top three priorities of the current government in its first term were, in my opinion, ‘undefined term, undefined term, undefined term.’ Of course, back in 1988 only 6% of candidates achieved grade A in an A level[23], whereas the figure now is about 24%; but I would claim that we are not comparing like with like. To be more detailed, macroscopic measures of achievement can only be compared over time if there is some stability in syllabuses and in standards, and I would claim that this stability is totally absent. In particular, I invite you to compare an A level question in Maths from today with one from 1980.

So let me instead take an example from teaching methods: the so-called Phonics system. In essence, as it is generally practised in this country, this system attempts partially to disengage the learning of verbal symbols (phonemes) from that of written symbols (graphemes), on the grounds that conflating two tasks makes the job harder. Thus children are taught to pronounce letters rather than name them: as in L, M and N. Just try pronouncing these for a moment: remember not to say ‘le’ but ‘lll’, etcetera. To see the basic problem (descriptive if not procedural), try K. Now try P! To get technical for a second, plosives such as P and B cannot be pronounced without the addition of some vowel sound afterwards. The best you can do is ‘pĕ’!

Let’s have a quick look at the phonetic system enunciated in the OED.

[Overhead with reproduction of phonemes from OED]

As you will see, there are 97 distinct phonemes, 91 if you exclude the (FOREIGN) section.
More reasonable authorities will identify a mere 45 or 46 phonemes in English. And all these are apparently to be achieved in the Phonics system by learning the sounds of 26 letters! Some authorities[24] have ascribed the rapid increase in the diagnosis of dyslexia to the use of the Phonics system. I hesitate to be so condemnatory, but would merely say that this is certainly a case of mis-describing a ‘treatment’.

Now let us turn to the Phonics system as it is enunciated by the (American) Riggs Institute[25]. This is an apparently complicated system of intensive instruction (over 9 weeks) that teaches 71 ‘phonograms’ (letter combinations which have one or more ‘single sound’ pronunciations). The total number of phonogram/pronunciation combinations is 118. Letters are not named. In the first 3 weeks the students learn the first 55 of these phonograms and then start writing (by dictation), reading and combining these phonemes!

Should we care? I don’t know what system was used on me, but I do know that the last three generations of my family learnt to read at home (so I don’t have an axe to grind). However, most people learn to read at school and there is widespread agreement that the system of instruction matters.

[23] OFSTED Reviews of Research: Educating the Very Able (1998). OFSTED website.
[24] The Learning Curve. Radio 4. 2001.
[25] The Riggs Institute. What we teach. (2002). http://riggsinstitute.com

Personally, I think that the up-to-date version of the car sticker that says ‘If you can read this, thank a teacher!’ should be:

IF U CN RD THIS ITS THNX 2 N NGNR

About systems, HMI said (in 1996) ‘The wide gulf in pupils’ reading performance is serious and unacceptable… It is clear that it is what individual schools do that makes the difference…’[26]; and ‘only about one in ten [teachers] held the view that their training [in the teaching of reading] had been satisfactory’[27].
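HMI’s observation that ‘it is what individual schools do that makes the difference’ has a direct consequence for any randomised trial of reading methods: since pupils within a school share its teaching, the natural unit of randomisation is the whole school (a cluster), not the individual child. The minimal sketch below shows only the allocation step, assuming two arms; the school identifiers, the method labels and the seed are hypothetical placeholders.

```python
import random
from typing import Dict, List, Sequence


def allocate_schools(schools: Sequence[str],
                     methods: Sequence[str] = ("method_A", "method_B"),
                     seed: int = 2002) -> Dict[str, str]:
    """Randomly allocate whole schools (clusters) to teaching methods.

    Schools are shuffled with a fixed, recorded seed (so the allocation can
    be audited and reproduced) and then dealt out in turn, which keeps the
    arms within one school of each other in size.
    School IDs and method labels are hypothetical placeholders.
    """
    order: List[str] = list(schools)
    random.Random(seed).shuffle(order)
    return {school: methods[i % len(methods)] for i, school in enumerate(order)}


if __name__ == "__main__":
    schools = [f"school_{n:02d}" for n in range(1, 11)]  # hypothetical IDs
    allocation = allocate_schools(schools)
    for method in ("method_A", "method_B"):
        chosen = sorted(s for s, m in allocation.items() if m == method)
        print(method, chosen)
```

A real trial would typically also stratify the shuffle by school size or intake and analyse outcomes in a way that allows for clustering; the point here is only that randomising schools, with a recorded seed, is straightforward once the ‘treatments’ are well defined.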
In 1999, 20% of schoolchildren in England were classified as having Special Educational Needs (SEN)[28]. Whilst many of these will have substantial problems unrelated to literacy, it seems fairly clear that an illiterate or semi-literate 11-year-old will certainly have SEN: in 1995, 52% of English 11-year-olds did not achieve level 4 (the expected level of attainment for age 11) in English SATs, whilst 12% achieved level 2 (the expected level of attainment for age 7) or lower[29]. It is hard to estimate the budget for SEN, but a plausible figure[30] (in 2000) was £7.1bn out of a total schools’ budget of c. £20bn, i.e. roughly a third. It therefore seems well worth investigating 1) the link between illiteracy and SEN and 2) the efficacy of literacy teaching methods. Indeed, had someone started a mere 5 years ago, with well-defined treatments/methods, an enormous amount of information would now be available.

The statistician’s problem: to define treatments adequately and to conduct trials (randomised?) amidst a morass of political infighting, special interest groups and deep prejudice.

Concluding remarks: statisticians are, by and large, an honest and conscientious lot, though regrettably inclined to excessive disputation (collective nouns might be: a disagreement of statisticians, an argument of politicians). They approach modelling issues conscientiously but sometimes with a touching naivety. The big problems of statistics are: to allow for the audience (and those who mediate the message), to avoid being manipulated or working to someone else’s agenda, and to explore conscientiously the issue of what it is that you’re actually modelling. In short, to be politically aware and constantly to remember that by no means everyone wants to know the truth.

Saul Jacka, University of Warwick

[26] The teaching of reading in 45 Inner London Primary Schools: a report by Her Majesty’s Inspectors in collaboration with the LEAs of Islington, Southwark and Tower Hamlets. OFSTED, 1996.
[27] ibid.
[28] Marks J: What are special educational needs? Centre for Policy Studies. 2000.
[29] Marks J: Standards in English and Maths in primary schools for 1995. Social Market Foundation. 1996.
[30] See ref 28.