The problems of Statistics: sex, money and education

Preliminary remarks
I was both very grateful and very nervous to receive the invitation to give this talk to
the Assistant Statisticians and Statistical Officers Conference. Very grateful because,
as an applied probabilist, my experience of `statistics at the sharp end’ is pretty
limited; and very nervous … for exactly the same reason.
To be more precise about my experience: I have done some consulting for the water
industry, I trained as an actuary before moving to Warwick University as a lecturer,
and I have some expertise in Financial Mathematics. As you can see, a paltry set of
credentials to wave under the nose of your average statistician.
In mitigation I can honestly claim that I have maintained an interest in statistics at the
two extremes throughout my career. What do I mean by the two extremes? I mean the
very theoretical end and the - applied has completely the wrong overtones, so let us
say - the `How do we go about this?’ end. Some of you will now be objecting that
these are the same end of the spectrum but I hope that during this talk it will become
clear both what I mean and that it is the, or at least an-, other end of the spectrum.
To turn then to the subject of this talk: `the problems of statistics’. A pretentious and
over-ambitious title? Well, yes!
On the other hand, I think if one stands back a long way one sees that, when it comes
to the serious issues of practice, methodological disagreements between statisticians
and, in particular, the almost religious divide between frequentists and Bayesians
assume their proper relevance (or irrelevance). It is regrettable, perhaps, that one sees
(to quote Laurence Sterne (the master of portrayal of emotionally-charged intellectual
argument)) “one half of a learned profession tilting full butt against the other half of
it, and then tumbling and rolling over one another like hogs.”¹, but it is not very
important when it comes to the significance of real statistics.
Turning to the subtitle of this talk: back in the days when there was such a thing as
`polite society’ it was a firm rule that three things should not be discussed at a dinner
party: money, religion and politics. To these we might add `sex’, on the grounds that,
whilst it wouldn’t have been considered necessary to ban this topic in those days, it
certainly might be now. The point, I take it, of banning them is that these subjects
arouse the strongest of emotions in those who hold an opinion; and it is rare to find
someone without an opinion. So, intense argument and disgust is likely to ensue when
such matters are discussed. We, however, all have strong stomachs and besides, we’re
not at dinner. My overall point is that these topics, along with education (because
we’ve all had one), are subjects, spheres of human activity where everyone has an
opinion, a corner to fight, an axe to grind.
Picture, if you will, the noble statistician, like some character from the art of socialist
realism, striding forth, hammer or sickle in hand, to do battle with the monsters of
ignorance on some topic such as AIDS, home childbirth, the money supply, cancer
screening or the quality of education.
¹ Laurence Sterne: Tristram Shandy, p207.
In my opinion, ignorance is the least of your problems. The real `monsters’ you face
are special interests, the precise definition of terms and of `treatments’, political
expediency and the ability of people, in the mass, to circumvent attempts to control
them.
Religion: I do not intend to discuss this but would say with Lord Melbourne: “While I
cannot be regarded as a pillar, I must be regarded as a buttress of the church, because
I support it from the outside”.
There is an excellent paper by Bartholomew² in JRSSA to which you should refer if
you are interested in the interaction between statistics and religion. I would only point
out two conflicts explored in that paper: the first between what is often called
Cromwell’s law: `I beseech you, in the bowels of Christ, think it possible you may be
mistaken’³ which is usually interpreted as `don’t rule possibilities out in your model’,
and Occam’s razor [least hypothesis] – William of Occam’s principle that the simplest
hypothesis adequately explaining the facts is to be preferred. Popper sides with
Cromwell by asserting that, a priori, more complicated models are more likely than
simpler ones (since with more parameters it’s easier to fit a model). The second
concerns whether life itself is evidence for God: `life is so unlikely that God must
have set things up’ versus `God would never have set things up so that life was so
unlikely, therefore life is an accident’.
Sex: Let me turn swiftly to the first of the topics in the subtitle. Of course the
term sex here is merely a way to suck you in to a discussion of medicine and
medical statistics.
It might be said that the medical view of sex is that it has three outcomes: HIV/AIDS,
other (more strictly defined) sexually transmitted diseases and pregnancy. Indeed, to
slightly change the grim medical joke, we might say that there are only two, since the
third outcome is a subset of the second.
Two faulty views of medics persist today. The first views them as educated people
striving only to save life or improve the quality of people’s health. People, in other
words, who always remember Galen’s stricture: `first, do no harm’.
The second regards them as overanxious, arrogant know-alls; too keen to intervene in
situations where things are best left well alone. In short, as the fools who do the
rushing-in.
I must admit that I tend to belong to the second camp and, like many people, could be
regarded as a spiritual descendant of Laurence Sterne when he caricatures them as one
side of a discussion about how the body arranges the quantity of blood, in particular
with reference to someone who has lost both his legs in a battle:
“Nature accommodates herself to these emergencies, cried the opponents—else what
do you say to the case of a whole stomach—a whole pair of lungs but half a man,
when both his legs have been unfortunately shot off? —
He dies of a plethora, said they—or must spit blood, and in a fortnight or three weeks
go off in a consumption—
It happens otherwise—replied the opponents. —
It ought not, they said.”⁴
² Bartholomew, D J: Probability, statistics and theology. JRSSA, 151, 137-178, 1988
³ Letter to the General Assembly, Church of Scotland. 3 Aug. 1650.
So, let’s start with the thorny issue of home births: their advisability and desirability.
First a little history. Back in the 1930s approximately 95% of UK births were at
home and 5% in hospital⁵. Crude rates for perinatal mortality were better for home
births (actually the relevant study was by social class but this was very closely
correlated with place of birth for obvious reasons pre the NHS⁶); however, the rate was
about 60 per 1000.
By 1966 the percentage of hospital births had risen to 75%⁷, with mortality rates
substantially better for home births. Around 1970⁸ (and with mortality rates still
substantially better for home births), in the face of a mortality rate of about 20 per
1000⁹ (which was considered much too high), there was a [more] concerted move to
hospital births so that, by 1990, the home birth rate was about 1%.
Opponents of the medicalisation of childbirth pointed to other countries (in particular
the Netherlands, which did not follow the `hospital route’ and still has a home birth
percentage of about 60% and similar mortality to the UK), suggesting that for low-risk
births, home birth was as safe as, if not safer than, hospital birth.
Then, in 1996, the Northern Region Perinatal Mortality Survey¹⁰ was published in the
BMJ, comparing all home births (in the region) for the period 1986-1993 with all
hospital births. In 3466 home births, mortality was 134 (an enormously high figure).
At first sight this confirms the wisdom of hospitalisation. However, 131 of these
deaths were in cases where the home birth was not planned or where there was no
plan for delivery at all (i.e. the pregnancy had either been concealed or not
diagnosed)!
The remaining mortality was substantially better than the average.
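The arithmetic behind these figures can be made explicit with a minimal sketch. The three counts (3466, 134, 131) are the ones quoted above; the number of planned home births is not given in this talk, so `assumed_planned_births` is a purely hypothetical input and the function names are illustrative only.

```python
# Counts quoted in the text for home births in the Northern Region survey.
HOME_BIRTHS = 3466
HOME_DEATHS = 134
UNPLANNED_DEATHS = 131  # home birth not planned, or no delivery plan at all


def rate_per_1000(deaths: int, births: int) -> float:
    """Crude mortality rate per 1000 births."""
    return 1000.0 * deaths / births


crude_rate = rate_per_1000(HOME_DEATHS, HOME_BIRTHS)
planned_deaths = HOME_DEATHS - UNPLANNED_DEATHS
unplanned_share = UNPLANNED_DEATHS / HOME_DEATHS


def planned_rate(assumed_planned_births: int) -> float:
    """Rate among planned home births for a HYPOTHETICAL denominator."""
    return rate_per_1000(planned_deaths, assumed_planned_births)


print(f"crude rate, all home births: {crude_rate:.1f} per 1000")
print(f"share of deaths outside planned home births: {unplanned_share:.1%}")
for n in (1000, 2000, 3000):  # illustrative denominators, not survey data
    print(f"if {n} of the births were planned: {planned_rate(n):.1f} per 1000")
```

The point of the sketch is only that the crude rate mixes two very different groups: almost all the deaths sit in the unplanned group, so the pooled figure says little about planned home birth.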
An editorial in the same BMJ issue refers to the Cumberlege¹¹ report, which `sees
home birth as a real option’ and suggests home birth is an option for `women with low
risk of obstetrical complications’. However, it points out that `some primary care
practitioners may need to be persuaded to provide the option for their patients: the
survey from Britain’s Northern region found that GPs and, to a lesser extent,
midwives often had reservations about home birth and tended to discourage it.’
⁴ Laurence Sterne: Tristram Shandy, p261.
⁵ Maternal Services. The Bourn report. HMSO 1990
⁶ Johanson R, Newburn M and MacFarlane A: Has the medicalisation of childbirth gone too far? BMJ
324, 892-895, 13 Apr 2002
⁷ The Court Report on child health. 1976?
⁸ Central Health Services Committee. Standing Maternity and Midwifery Committee. Report of the
sub-committee on domiciliary midwifery and maternity bed needs. HMSO 1970
⁹ Year                                1966  1969  1970  1971  1972  1973
  % live births which were at home    25.2  16.4  13.6  11.0   8.6   6.1
  % still births which were at home    9.2   7.0   5.2   5.3   4.3   3.9
  (see ref. 7)
¹⁰ Collaborative survey of perinatal loss in planned and unplanned home births. Northern Region
Perinatal Mortality Survey Coordinating Group. BMJ 313, 1306-1309, 23 Nov. 1996.
¹¹ Changing childbirth. Dept. of Health Expert Maternity Group. HMSO 1993.
So, it’s all sorted then! Well, not quite: there are two problems. Firstly there are
dissenting voices to the removal of those 131 deaths from the 134 figure (see Drife¹²
and letters in the BMJ 320 (18 Mar 2000) p 798) and secondly there is a dearth of
midwives experienced at home birth. Actually there’s a third caveat: `… the absence
of randomised clinical trials’ which (in a glorious understatement) are `difficult to
achieve’!
The NBTF enquiry¹³ attempted (but failed) to form matched (by risk factors) pairs of
low-risk (planned) hospital and home births. There were 5971 women in the home
group and only 4724 in the hospital group. Probably the most important conclusion
(apart from the lower rates of infant mortality and morbidity and of caesareans and
other interventions in the home group) was that `home births will probably increase to
4 or 5% of all maternities in the UK over the next decade and this needs preparatory
planning’!
To give a very partisan summary then, the NHS moves (at great cost and with no
supporting evidence) from 25% home births to, essentially, all hospital births over 20
years. After a decade of argument, it is then conceded that statistics suggest that the
NHS possibly shouldn’t have done this, but we’ve lost so much expertise that we
can’t move back quickly, indeed it will take a decade to reverse one eighth of the
change, and it will be done in the face of continued resistance from the providers of
primary health care. What is the problem for statisticians? To be included in the
decision making process, and to have their advice taken seriously.
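For anyone wanting to check the `one eighth’, here is a minimal sketch of the sum, using only the percentages quoted above (25% home births, falling to about 1%, with a forecast of 4 or 5%). The variable and function names are mine.

```python
# Home-birth shares (percent of all births) quoted in the text.
START_SHARE = 25.0            # before the move to hospital births
LOW_SHARE = 1.0               # about 1990
FORECAST_SHARES = (4.0, 5.0)  # forecast for the following decade

# Percentage points lost over the twenty years of the change.
fall = START_SHARE - LOW_SHARE


def fraction_reversed(forecast_share: float) -> float:
    """Fraction of the original fall that a given forecast would undo."""
    return (forecast_share - LOW_SHARE) / fall


for share in FORECAST_SHARES:
    print(f"reaching {share:.0f}% reverses {fraction_reversed(share):.3f} of the change")
```

A rise to 4% undoes 3 of the 24 percentage points, which is the one eighth; a rise to 5% would undo one sixth.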
Now that word `sex’ again. The OED gives the following definition:
Sex: 1. Either of the two divisions of organic beings distinguished as males and
females respectively …
Of course this is the meaning for which we now commonly use the word `gender’ (as
in Gender Studies) so let’s check that in the OED:
Gender: 1. Kind, sort. 2. Each of the two or three grammatical `kinds’, … , into
which substantive nouns are discriminated …
Gender (v): 2. To copulate.
As far as I’m concerned this is sufficient justification to include the topic of breast
cancer screening (which, I am reliably informed, is a women’s issue) under the
heading of sex.
We’ll start again with a brief history.
After the publication in 1985 of the results from the `two counties’ randomised
controlled trial in Sweden, the UK introduced a breast cancer screening programme
(mammography) for women aged 50-69, with a target of a 25% reduction in breast
cancer mortality in the target age group by 2000.
¹² Drife, J: Data on babies’ safety during hospital births are being ignored. BMJ 319, 1008, 9 Oct 1999
¹³ Chamberlain G, Wraight A, Crowley P: Birth at home. Practical Midwife, 2(7), 35-39, 1999.
My attention was first drawn to this subject by hearing a discussion on Woman’s
Hour¹⁴ between a Danish statistician and the Head of the NHS breast cancer screening
programme.
The statistician made the, I discover, fairly standard assertions that there was no
evidence of improvement and that there was evidence of adverse effects on women’s
health. The response from the Head of the screening programme was horror and
outrage: `how could anyone suggest that screening was ineffective, let alone
detrimental. This was an important issue for women and he should shut up’ is a not
unfair summary.
It took some time to track down the statistician: as it turns out, he is a member of the
Cochrane Breast Cancer Group (a part of the influential Cochrane Collaboration). The
paper¹⁵ reported a meta-analysis of eight randomised controlled trials (5 from
Sweden). Six (!) were rejected for reasons of bias in randomisation and the remaining
two gave a relative mortality risk for the screened group of 1.06 (10% mortality) so
that `for every 1000 women screened over twelve years, one breast cancer death is
averted but the total number of deaths is increased by six.’ In addition, the
mastectomy rate was increased by 25%, as was the radiotherapy rate. Incidentally, the
office of the NHS cancer screening programmes stated “It is difficult to evaluate these
claims … based on … two studies classified as poor quality studies by Gøtzsche and
Olsen.”¹⁶ This is a mistake (whether deliberate or accidental is not for me to say); the
authors classify these studies as `adequately randomised’ and `unbiased’, it was the
excluded studies that they classified as `poor’.
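The `increased by six’ follows from the two quoted numbers, as a minimal sketch shows. It assumes that `10% mortality’ means roughly 10% of unscreened women died of any cause over the twelve years; that reading is my assumption, and the function name is illustrative.

```python
def excess_deaths_per_1000(baseline_mortality: float, relative_risk: float) -> float:
    """Extra deaths per 1000 people when baseline mortality is scaled by relative_risk."""
    return 1000.0 * baseline_mortality * (relative_risk - 1.0)


# ASSUMPTION: "10% mortality" read as all-cause mortality among the
# unscreened over the twelve-year period.
BASELINE_MORTALITY = 0.10
RELATIVE_RISK = 1.06  # screened versus unscreened, as quoted

extra = excess_deaths_per_1000(BASELINE_MORTALITY, RELATIVE_RISK)
print(f"extra deaths per 1000 screened: {extra:.1f}")
```

On this reading, 100 deaths per 1000 become 106, which matches the quoted increase of six in total deaths even though one breast cancer death is averted.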
To quote a Lancet editorial¹⁷: `At present there is no evidence from large randomised
trials to support screening mammography programmes’. Now see the Lancet
commentary of 2002¹⁸: `The benefits appear real but modest’, but, despite trials with
247010 participants `the latest analysis does not tell us whether the massive effort…
is worthwhile’!
The statistician’s problem: to measure a (possible) small improvement sufficiently
accurately so as to determine its cost-effectiveness in the face of enormous political
pressure supporting that measure.
AIDS: Turning briefly now to AIDS, I recently performed a little experiment. I asked
17 friends and acquaintances (aged over 40) what they remembered about the history
of AIDS and the predictions back in the early to mid 80s and what they thought about
it now. There was a surprisingly uniform and simple response: `You lot got it wrong’!
On further exploration the common theme was that by late 1986 statisticians had
made `doomsday’ predictions about the likely incidence of AIDS in the UK and that
these had proved vast overestimates. So I checked. The AIDS awareness campaign
started in mid 1986 and various measures (in particular needle exchanges and a
concerted safe sex campaign) were fully in place by 1989. So I searched for papers
appearing by the end of 1988. A literature search unsurprisingly turned up many
papers, and in particular a special issue of JRSSA¹⁹, coincidentally the same issue
which contained the paper by Bartholomew which I mentioned earlier. I was at first
surprised to discover no numerical predictions whatsoever. Of course, on reflection
this is no surprise at all. Statisticians may be contentious but they aren’t (usually)
suicidal. It was clear that the incubation period for AIDS was long and the data with
which to estimate parameters in any reasonable model simply weren’t there. Very
definitely a case of the dog which didn’t bark in the night, and yet, on the basis of my
survey, statisticians are carrying the can.
The statistician’s problem: sometimes the fact that you can’t do anything is not an
acceptable excuse, and making If…, then… statements won’t get you off the hook.
¹⁴ Woman’s Hour. Radio 4, 6/7 April 2002.
¹⁵ O. Olsen and P C Gøtzsche: Is screening for breast cancer with mammography justified? Lancet 355,
129-134, 8 Jan 2000
¹⁶ Mayor S: Row over breast cancer screening shows that scientists “bring some subjectivity to their
work”. BMJ 323, 956, 7 Oct 2001
¹⁷ Horton R: Screening mammography—an overview revisited. Lancet 358, 1284-1285, 20 Oct 2001
¹⁸ Gelmon K A and Olivotto I: The mammography screening debate—time to move on. Lancet 359,
904-905, 16 Mar 2002
The Money Supply: The economists’ definition of money is: `a store of value, a
medium of exchange and a unit of account’.
It’s important to understand this economic definition of money since otherwise one
would assume that the money supply consisted of notes and coins. I don’t have time
to give the standard economics lecture on how money is created by the banking
system through fractional reserve banking, and on the treatment of liquid assets as
money, but let us just say for now that in 1994 one measure of money (M2) put the
money supply at £401bn whilst the value of notes and coins was £21bn²⁰. The reason for this is
basically that many assets (including deposits at banks) are usually regarded as money
by their owners when it comes to making decisions about spending.
Why seek to control the money supply? After a moment’s thought, it should be clear
that the growth in money spent should be the real growth in production of goods and
services plus inflation. Thus controlling the money supply should control inflation
(since there is a limit to how fast a given stock of money can circulate).
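The `moment’s thought’ is the standard quantity equation, money stock times velocity equals price level times real output. A minimal sketch of what it implies for growth rates follows; the growth figures in the example are hypothetical and chosen only for illustration, while the £401bn and £21bn are the 1994 values quoted above.

```python
def implied_inflation(money_growth: float, velocity_growth: float, real_growth: float) -> float:
    """Solve (1 + m)(1 + v) = (1 + pi)(1 + g) for inflation pi.

    This is the quantity equation M * V = P * Y written in growth rates.
    """
    return (1.0 + money_growth) * (1.0 + velocity_growth) / (1.0 + real_growth) - 1.0


# Figures quoted in the text for 1994, in £bn.
M2 = 401.0
NOTES_AND_COINS = 21.0
m2_to_cash = M2 / NOTES_AND_COINS

# HYPOTHETICAL growth rates: 8% money growth, constant velocity, 3% real growth.
example = implied_inflation(0.08, 0.0, 0.03)

print(f"M2 is about {m2_to_cash:.0f} times notes and coins")
print(f"implied inflation in the example: {example:.1%}")
```

With velocity held constant, money growth in excess of real growth shows up as inflation, which is the whole case for targeting the money supply; the sketch says nothing about whether velocity does stay constant.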
To quote a standard introductory university text:
“In the 1970’s the authorities in many parts of the world’s governments became
converted to monetarism, the belief that… important macro variables can be
manipulated by manipulating the money supply”²¹.
OK, so never mind the policy, there are good reasons why we might want to know the
amount of money available. So, the statistical problem is to measure how much
money is available for transactions (we might also want to measure the velocity of
circulation, but that’s a separate question).
¹⁹ Issue 1, JRSSA 151, 1988
²⁰ Lipsey R G and Chrystal K A: An introduction to positive economics (8th edition), OUP, 1995, p746.
²¹ Lipsey R G: An introduction to positive economics (6th Edition), Weidenfeld and Nicolson, 1983,
p690.
MONEY SUPPLY DEFINITIONS
[Table: an X marks each component included in each aggregate]
Aggregates (columns): M0, M1, M2, £M3, M3, PSL1, PSL2
Components (rows):
  Notes and coins
  Bank deposits:
    Sight deposits
    Checkable deposits
    Other interest bearing
    Time deposits < 1 month
    Time deposits 1 month to 2 years
    Time deposits over 2 years
  Sector:
    Private
    Public
  Foreign currency:
    Size < £100,000
    Size > £100,000
  Other money market instruments
  Savings deposits and securities
As to what happened, I can do little better than quote Lipsey again:
“There are so many highly substitutable monetary assets that control of any one group
can often lead to disintermediation as decision makers slip into holding more of a
similar but uncontrolled asset and less of the controlled one…”
“Many central banks started out controlling M1. Those who were successful,
however, often found the simple statistical relation between M1 and those macro
aggregates they sought to control breaking down. The public learned to do with assets
not in M1 and the central banks then sought to control a wider monetary aggregate”²²
[See also table 38.1 in Lipsey (8th Edition)] By the end of monetary targeting in this
country in 1986 it’s clear that people were not just using the money under their
granny’s mattress, but were allowing for future grannies they might acquire by
remarriage and adding in their stock of Monopoly money together with their
telephone number.
The statistician’s problem: sometimes quantum effects apply at the macroscopic level
– the act of measuring affects the (desired effect of) the measurement. To spell it out:
if you control a surrogate (for the thing you really want to control) you’ll influence
behaviour, but quite possibly not in the way that you want.
²² Lipsey R G: An introduction to positive economics (6th Edition), Weidenfeld and Nicolson, 1983,
p691.
Education: turning now to education, if we ignore the current, high-octane, issue of
university fees, you might say `what’s the problem?’ Literacy rates are at 95%
(higher than they’ve ever been), `A’ level pass rates rise and rise and soon 50% of
young people will go to university. In short, it’s an unmitigated success story.
My initial response is that it’s a problem of definition: the top three priorities of the
current government in its first term were, in my opinion, `undefined term, undefined
term, undefined term.’ Of course, back in 1988 only 6% of candidates achieved grade
A in an A level²³ whereas the figure now is about 24%; but I would claim that we are
not comparing like with like.
To be more detailed, macroscopic measures of achievement can only be compared
over time if there is some stability in syllabuses and in standards, and I would claim
that this stability is totally absent. In particular, I invite you to compare an A level
question in Maths from today with one from 1980.
So, let me instead take an example from teaching methods: the so-called Phonics
system. In essence, as it is generally practised in this country, this system attempts to
partially disengage the learning of verbal symbols (phonemes) from that of written
symbols (graphemes), on the grounds that conflating two tasks makes the job harder.
Thus children are taught to pronounce letters rather than name them: as in L, M and
N. Just try pronouncing these for a moment: remember not to say `le’ but `lll’
etcetera. To see the basic problem (descriptive if not procedural), try K. Now try P!
To get technical for a second, the plosives such as P and B are unpronounceable
without the addition of some vowel sound afterwards. The best you can do is pĕ!
Let’s have a quick look at the phonetic system enunciated in the OED.
[Overhead with reproduction of phonemes from OED]
As you will see, there are 97 distinct phonemes, 91 if you exclude the (FOREIGN)
section. More reasonable authorities will identify a mere 45 or 46 phonemes in
English. And all these are apparently to be achieved in the Phonics system by
learning the sounds of 26 letters! Some authorities²⁴ have ascribed the rapid increase
in the diagnosis of dyslexia to the use of the Phonics system. I hesitate to be so
condemnatory but would merely say that this is certainly a case of mis-describing a
`treatment’. Now let us turn to the Phonics system as it is enunciated by the
(American) Riggs Institute²⁵. This is an apparently complicated system of intensive
instruction (over 9 weeks) that teaches 71 `phonograms’ (letter combinations which
have one or more `single sound’ pronunciations). The total number of
phonogram/pronunciation combinations is 118. Letters are not named. In the first 3
weeks the students learn the first 55 of these phonograms and then start writing (by
dictation), reading and combining these phonemes!
Should we care? I don’t know what system was used on me but I do know that the last
three generations of my family learnt to read at home (so I don’t have an axe to
grind). However, most people learn to read at school and there is widespread
agreement that the system of instruction matters. Personally, I think that the
up-to-date version of the car sticker that says `If you can read this, thank a teacher!’
should be:
IF U CN RD THIS ITS THNX 2 N NGNR
²³ OFSTED Reviews of Research: Educating the Very Able. (1998) OFSTED website
²⁴ The Learning Curve. Radio 4. 2001
²⁵ The Riggs Institute. What we teach. (2002). http://riggsinstitute.com
About systems, HMI said (in 1996) `The wide gulf in pupils’ reading performance is
serious and unacceptable… It is clear that it is what individual schools do that makes
the difference…’²⁶; and `only about one in ten [teachers] held the view that their
training [in the teaching of reading] had been satisfactory’²⁷.
In 1999, 20% of schoolchildren in England were classified as having Special
Educational Needs (SEN)²⁸. Whilst many of these will have substantial problems
unrelated to literacy, it seems fairly clear that an illiterate or semi-literate 11-year-old
will certainly have SEN: in 1995, 52% of English 11-year-olds did not achieve level 4
(expected level of attainment for age 11) in English SATs, whilst 12% achieved level
2 (expected level of attainment for age 7) or lower²⁹.
It is hard to estimate the budget for SEN, but a plausible figure³⁰ (in 2000) was £7.1bn
out of a total schools’ budget of c£20bn. It seems, therefore, well worth investigating
the link between
1) illiteracy and SEN
and
2) the efficacy of literacy teaching methods.
Indeed, had someone started a mere 5 years ago, with well-defined
treatments/methods, an enormous amount of information would now be available.
The statistician’s problem: adequately define treatments and conduct trials
(randomised?) amidst a morass of political infighting, special interest groups and
deep prejudice.
Concluding remarks: statisticians are, by and large, an honest and conscientious lot;
though regrettably inclined to excessive disputation (collective nouns might be: a
disagreement of statisticians, an argument of politicians). They approach modelling
issues conscientiously but sometimes with a touching naivety. The big problems of
statistics are: to allow for the audience (and those who mediate the message), to avoid
being manipulated or working to someone else’s agenda, and to explore
conscientiously the issue of what it is that you’re actually modelling. In short, to be
politically aware and constantly to remember that by no means everyone wants to
know the truth.
Saul Jacka,
University of Warwick
²⁶ The teaching of reading in 45 Inner London Primary Schools: A report by Her Majesty’s Inspectors
in collaboration with the LEAs of Islington, Southwark and Tower Hamlets. OFSTED, 1996
²⁷ ibid.
²⁸ Marks J: What are special educational needs? Centre for Policy Studies. 2000
²⁹ Marks J: Standards in English and Maths in primary schools for 1995. Social Market Foundation.
1996.
³⁰ See ref 28.