Talking about Science
A lecture in the 6th Century course
“Mankind in the Universe”
by Kees van Deemter,
Computing Science dept., University of Aberdeen
Objectivity
• a major theme in “Mankind in the Universe”
– Can people know the universe?
(e.g., the Big Bang, man-made global warming)
– Can people know objectively what’s right?
(e.g. stem-cell research)
• Philosophical positions include
– Realism
– Anti-realism
– Constructivism
• This lecture: the expression of scientific data and
theories in language
Plan of the lecture
1. Publishing scientific results
2. Using computers: from data to text
3. (Science in daily life and politics)
1. Publishing Scientific Results
Peer review: the main mechanism for deciding whether
a result is worth publishing (e.g., as a journal article)
(1) Authors submit an article
(2) Editors select expert reviewers ("peers")
(3) Reviewers assess the article
(4) Editors decide: accept / reject / revise
If "revise", the authors may go back to (1)
Conference-paper submissions lack the "revise" option
Peer review is no guarantee
against flaws
1. Human frailty:
– Maybe the experts lack expertise
– Peers may disagree with each other
– (Maybe they don't like the authors)
– (A dishonest peer may reject, then "steal" results)
Possible solutions
– Anonymity of reviewer and/or reviewee
– Declaring conflicts of interest
No silver bullet. Much depends on the editor.
Peer review is no guarantee
against flaws
2. Publication bias
• Reviewers and editors are keen on
“interesting” results.
• Interesting results are read eagerly,
are often quoted, and sell journals
So how about disappointing results?
• Research hypothesis: “activity x makes you
more likely to get cancer”
• 1000 patients tested. 500 do x, 500 don’t do x.
– x:     50 get cancer
– not x: 53 get cancer
• Your hypothesis is not confirmed (the trend
even goes in the opposite direction)
• Your journal submission may be rejected,
because it’s not interesting enough.
• Your “negative” findings may never get published
• Yet they tell us something of potential value:
– Maybe x is unrelated to cancer
– Maybe x makes you less likely to get cancer
• Note: Your experiment does not show
convincingly that x makes you less likely to get
cancer. (50/53 is too small a difference)
– Statisticians say: the result is not significant
• But others may have found similar negative
results …
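To see why statisticians call the 50-versus-53 result non-significant, we can run a chi-squared test on the 2×2 table in plain Python (the helper below is a minimal sketch; in practice one would use a statistics package):

```python
# A minimal sketch (plain Python, no stats library) of why 50 vs 53
# cancer cases out of 500 each is not a significant difference: the
# chi-squared statistic stays far below the 5% critical value (3.84).

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    observed = [a, b, c, d]
    # expected count for each cell = (row total * column total) / n
    expected = [rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# x group: 50 cancer / 450 not; not-x group: 53 cancer / 447 not
stat = chi_squared_2x2(50, 450, 53, 447)
print(round(stat, 3))   # about 0.097, far below 3.84
print(stat < 3.84)      # True: not significant at the 5% level
```

With a statistic of roughly 0.1 against a critical value of 3.84, the 50/53 split is exactly the kind of difference that arises by chance.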
Meta-analysis
• A stats analysis that tries to draw conclusions
from a set of experiments. (Meta: “about”)
• Championed, among others, by the Cochrane
Collaboration
• Instructive logo:
The Cochrane logo explained
• A landmark 1989 analysis of the use of steroids
on prematurely born babies:
– 2 studies had found a positive effect (significant),
– 5 studies had found no significant effect
– Doctors did not believe the effect until a meta-analysis
of all 7 studies together showed a positive effect
• Back to our imaginary study of cancer:
A meta-analysis might have shown that
x makes you less likely to get cancer …
But …
• those negative results will not be counted in
the meta-analysis, because they were never
published
• Omission of “disappointing” results could even
result in the erroneous conclusion that x
makes you more likely to get cancer
[Goldacre 2009] Bad Science. Harper Perennial
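A toy sketch makes the point concrete. All study counts below are invented for illustration (each pair is cancer cases with x and without x, with equal group sizes), and the pooling is deliberately naive; real meta-analyses weight each study by its precision:

```python
# A toy illustration (hypothetical numbers, not from any real study) of
# how publication bias can flip a meta-analysis: pooling only the
# "interesting" published studies gives the opposite conclusion to
# pooling all studies that were actually run.

def pooled_risk_ratio(studies):
    """Naive pooled ratio: total cases with x / total cases without x."""
    with_x = sum(a for a, b in studies)
    without_x = sum(b for a, b in studies)
    return with_x / without_x

published = [(60, 48), (55, 44)]              # "interesting" positive results
unpublished = [(50, 53), (47, 55), (45, 58)]  # negative results, never printed

print(pooled_risk_ratio(published) > 1)                # True: x looks harmful
print(pooled_risk_ratio(published + unpublished) > 1)  # False: pooled, it does not
```

The published studies alone suggest x causes cancer; adding the unpublished negative results reverses the verdict.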
3. Cheating
• Dishonesty about authorship: plagiarism
• Dishonesty about data and statistics
Plagiarism
“taking someone else’s work and passing it off
as one’s own”
• There is a grey area. I got my definition from the
Mac’s Dictionary application. Do I have to
acknowledge this?
• If you take someone else’s ideas then (try to) say
who had them first
• If you also take someone else’s words verbatim
(for more than just a few words) then
put quotes around the text as well
– A grey area: “just a few words”
Plagiarism and peer review
• Peer review contains important safeguards
against plagiarism.
– One of your reviewers may have read that earlier
article …
• But peer review is no guarantee.
– What if the article was published in Japanese?
• Still, offenders get caught. Moreover, if the
dishonesty only concerned the authorship,
the implications for science are limited
– A victimless crime?
Improper use of data
In science (as opposed to teaching),
this is a bigger problem than plagiarism
(1) Conscious cheating
(2) Unconscious cheating
Conscious cheating (?)
• Some notorious cases, where it appears that
data were intentionally faked or distorted
– Andrew Wakefield’s work linking the MMR vaccine
to autism
– Parts of the University of East Anglia’s work on
global warming
– Hwang Woo-suk’s work on stem-cell research and
human cloning
BBC News, 15 Dec 2005
(…) Stem cell success 'faked'
A South Korean cloning pioneer has admitted fabricating
results in key stem cell research, a colleague claims. At least
nine of 11 stem cell colonies used in a landmark research
paper by Dr Hwang Woo-suk were faked, said Roh Sung-il,
who collaborated on the paper. Dr Hwang wants the US
journal Science to withdraw his paper on stem cell cloning,
Mr Roh said. Dr Hwang, who is reported to be receiving
hospital treatment for stress, was not available for
comment. Science could not confirm whether it had
received a request to retract the paper. Dr Hwang's paper
had been hailed as a breakthrough, opening the possibility
of cures for degenerative diseases. (…)
Unconscious cheating: observer bias
One experiment: Some patients got a medicine against
multiple sclerosis, others got a placebo
• 50% of trained observers (A) knew who got the placebo
• 50% of trained observers (B) did not know
• Observers (A) observed an improvement in the condition of
patients who were given the medicine
• Observers (B) did not observe an improvement
Noseworthy et al. The impact of blinding on the results of a
randomized, placebo-controlled multiple sclerosis clinical trial.
Neurology 2001; 57: S31–S35.
Unconscious cheating
• Rosenthal effect. Participants were given
photographs of people and asked to say whether
the people shown were "successful in life".
– Some experimenters (A) were told that participants
judge most photographs as successful
– Other experimenters (B) were told that participants
judge most photographs as unsuccessful
• Participants supervised by A judged the photographs
much more positively than those supervised by B
• Supervisors could only read out a set speech!
Unconscious cheating
• Rosenthal effect (conclusion): by believing in a given
behaviour, you can make this result come about
Rosenthal R. Interpersonal expectations: effects of the
experimenter's hypothesis. In: Rosenthal & Rosnow (eds.)
Artifact in Behavioral Research. New York, NY: Academic
Press; 1969:181-277
• Rosenthal effect concerns experiments with people;
observations in physics can be hazardous as well
(e.g., when do you stop running an experiment?)
• Observer bias & Rosenthal effect are reasons for
making studies with human subjects double-blind
• Cheating is not something done by a few
criminals, but something we all need to
constantly be on guard against
– in science
– in daily life
• The science behind these phenomena is
interesting in itself
• [Goldacre 2009] Bad Science (again!)
Dubious uses of statistics
• “There are lies, damn lies, and statistics”
(author unknown)
• This is not an indictment of numbers or statistics
– Statistics is safe when performed competently,
but errors are easy to make
– These can be conscious or unconscious
One common abuse of statistics
• Failing to declare your research hypothesis in advance
• Recall the "disappointing" cancer study
– Your research hypothesis: x makes cancer more likely
– You found weak indications for the opposite:
x makes cancer less likely (50/53, not significant)
– Suppose you had found strong indications for this
(e.g., 40/63, significant)
– Reporting this as a confirmed hypothesis would be wrong!
– Stats is for testing a pre-existing suspicion
– Anything else is “data fishing”
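The danger of data fishing can be put in numbers. At a 5% significance level, each test of a true null hypothesis has a 5% chance of a false positive; if you test k independent hypotheses you never declared in advance, the chance that at least one looks "significant" by luck alone grows quickly (the function below is just this textbook formula):

```python
# Why undeclared hypotheses are dangerous: the probability that at
# least one of k independent true-null tests comes out "significant"
# purely by chance, at significance level alpha.

def prob_at_least_one_false_positive(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

print(round(prob_at_least_one_false_positive(1), 2))   # 0.05
print(round(prob_at_least_one_false_positive(20), 2))  # 0.64
```

Fish through twenty comparisons and the odds are about two in three that something "significant" turns up, even when nothing is going on.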
On to our next topic …
2. Computers as authors:
from data to text
• Measurement can give rise to a huge amount of
numerical information, e.g.,
– Monitoring patients in intensive care
– Climate predictions: 2 petabytes (2 × 10^15 bytes)
• People are bad at making sense of this, so
we use Natural Language Generation
to let computers produce readable text
• At Aberdeen: Reiter, Turner, Sripada, Davy.
Example: Turner’s pollen level forecasts demo:
http://www.csd.abdn.ac.uk/~rturner/cgi_bin/pollen.html
Neonatal ICU (Babytalk project)
Baby Monitoring (sensor channels):
• SpO2 (SO, HS)
• ECG (HR)
• Peripheral temperature (TP)
• Arterial line (blood pressure)
• Transcutaneous probe (CO, OX)
• Core temperature (TC)
Input: sensor data (45 minutes)
Some medical jargon …
• Bradycardia: when the heart rate is too slow
• Intubation: placing a tube in the windpipe (e.g.,
for oxygen or drugs)
• FiO2: fraction of inspired oxygen
• Sats: oxygen saturation levels
• ETT suction: “sucking” away contaminated
secretions (which might cause pneumonia)
• BP: Blood Pressure
• HR: Heart rate
Written by nurse
In preparation for re-intubation, a bolus of 50ug of morphine is given at
1039 when the FiO2 = 35%. There is a momentary bradycardia and then
the mean BP increases to 40. The sats go down to 79 and take 2 mins to
come back up. The toe/core temperature gap increases to 1.6 degrees.
At 1046 the baby is turned for re-intubation and re-intubation is complete
by 1100 the baby being bagged with 60% oxygen between tubes. During
the re-intubation there have been some significant bradycardias down to
60/min, but the sats have remained OK. The mean BP has varied
between 23 and 56, but has now settled at 30. The central temperature
has fallen to 36.1°C and the peripheral temperature to 33.7°C. The
baby has needed up to 80% oxygen to keep the sats up.
Over the next 10 mins the HR decreases to 140 and the mean BP = 30-40.
The sats fall with ETT suction so the FiO2 is increased to 80% but by 1112
the FiO2 is down to 49%.
Generated by Babytalk system
You saw the baby between 10:30 and 11:12. Heart Rate (HR) = 148. Core Temperature
(T1) = 37.5. Peripheral Temperature (T2) = 36.3. Mean Blood Pressure (mean BP) = 28.
Oxygen Saturation (SaO2) = 96.
The tcm sensor was re-sited.
By 10:40 SaO2 had decreased to 87. As a result, Fraction of Inspired Oxygen (FIO2)
was set to 36%. SaO2 increased to 93. There had been a bradycardia down to 90.
Previously 50.0 mics/min of morphine had been administered. Over the next 17
minutes mean BP gradually increased to 37.
By 11:00 the baby had been hand-bagged a number of times causing 2 successive
bradycardias. She was successfully re-intubated after 2 attempts. The baby was sucked
out twice.
At 11:02 FIO2 was raised to 79%.
By 11:06 the baby had been sucked out a number of times. Previously T2 had
increased to 34.3. Over the next 17 minutes HR decreased to 140.
FIO2 was lowered to 61%.
How the computer generates the text
(four stages, just a sketch …)
A kind of data mining: using computers
to analyse & summarise data
1. Signal analysis
2. Data abstraction
3. Content determination
4. Saying it in English
(alternative: graphs/diagrams)
1. Signal Analysis
Essentially a collection of mathematical tools
• Detect trends, patterns, events, etc. in the data
– (Blood oxygen levels) increasing
– Downward spike (in heart rate)
– Etc.
• Separate real data from artefacts
– Sensors can malfunction
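A tiny sketch of the kind of pattern detection this stage performs, assuming a simple data format (a list of heart-rate samples); the drop threshold is an illustrative choice, not a clinical rule:

```python
# Signal analysis sketch: find downward spikes in a heart-rate trace,
# i.e. points that sit well below both of their neighbours.
# The data format and the 30-bpm threshold are illustrative assumptions.

def downward_spikes(samples, drop=30):
    """Indices where the value falls at least `drop` below both neighbours."""
    return [i for i in range(1, len(samples) - 1)
            if samples[i - 1] - samples[i] >= drop
            and samples[i + 1] - samples[i] >= drop]

hr = [150, 148, 152, 90, 149, 151, 148]
print(downward_spikes(hr))   # [3]: the momentary drop to 90
```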
2. Data Abstraction (1)
• Detect higher-level events in the data
– Bradycardia
– Sensor flapping against skin
(inferred from shapes in data)
• Not just maths: medical knowledge required
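A sketch of how low-level samples might be abstracted into a higher-level event. The thresholds here (HR below 100 for at least 3 consecutive samples) are invented for illustration; choosing real ones is exactly where the medical knowledge comes in:

```python
# Data abstraction sketch: turn heart-rate samples into "bradycardia
# episode" events. Threshold and minimum duration are illustrative
# assumptions, not clinical definitions.

def bradycardia_episodes(samples, threshold=100, min_length=3):
    episodes, start = [], None
    for i, hr in enumerate(samples + [threshold]):  # sentinel closes a run
        if hr < threshold and start is None:
            start = i
        elif hr >= threshold and start is not None:
            if i - start >= min_length:
                episodes.append((start, i - 1))
            start = None
    return episodes

hr = [150, 95, 90, 88, 140, 92, 150]
print(bradycardia_episodes(hr))  # [(1, 3)]: one episode; the lone 92 is too brief
```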
2. Data Abstraction (2)
• Determine relative importance of events
• Link related events
– Blood O2 falls, therefore O2 level in incubator is
increased (reason for the action)
– HR up because baby is being handled (cause)
• Potentially a strong point of text summaries
– Graphs/diagrams seldom show such links
3. Content Determination
• Determine what’s important enough
to talk about.
• This
– depends on purpose & context of text
• How much space/time is available?
• Saying A may force you to say B as well
– uses importance rating
(from Data Abstraction (2))
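A minimal sketch of this stage: keep only the events whose importance rating fits the space available. The event structure and ratings are invented for illustration (in a real system they would come from the data-abstraction stage):

```python
# Content determination sketch: select the most important events that
# fit the available space, then restore chronological order for the text.

def select_content(events, max_events):
    """Pick the most important events, then put them back in time order."""
    chosen = sorted(events, key=lambda e: e["importance"],
                    reverse=True)[:max_events]
    return sorted(chosen, key=lambda e: e["time"])

events = [
    {"time": "10:40", "type": "desaturation", "importance": 8},
    {"time": "10:42", "type": "sensor re-sited", "importance": 2},
    {"time": "10:45", "type": "bradycardia", "importance": 9},
]
print([e["type"] for e in select_content(events, 2)])
# ['desaturation', 'bradycardia'] -- the sensor event is squeezed out
```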
4. Saying it in English
Lots of different issues. For instance,
• How to organise the text as a whole? (e.g.,
Chronologically? Organised in paragraphs?)
• What sentence patterns to use? (e.g.,
Active voice? One fact per sentence?)
– “… have varied between 23 and 56”
• How to refer? (e.g. refer to a time saying
“at 11:05”, or “after intubation”?)
• What words to use
(e.g., avoiding medical jargon?)
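The simplest way to address these issues is template-based realisation, sketched below. The templates and event format are illustrative; a system like Babytalk does far more (aggregation, reference, tense), but the idea starts here:

```python
# Realisation sketch: fill a sentence template chosen by the event's
# trend. Templates and the event format are illustrative assumptions.

def realise(event):
    templates = {
        "decrease": "By {time} {param} had decreased to {value}.",
        "increase": "By {time} {param} had increased to {value}.",
    }
    return templates[event["trend"]].format(**event)

e = {"trend": "decrease", "time": "10:40", "param": "SaO2", "value": 87}
print(realise(e))   # By 10:40 SaO2 had decreased to 87.
```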
Objectivity issues
• In signal analysis: What’s an event?
– Imagine three short downward spikes in HR
• Three events or one?
• In data abstraction:
– Concepts like "bradycardia" are theory-laden
• 20 years from now, a different definition?
– Causality is problematic
• Was HR increase caused by handling?
– Many thresholds are a bit arbitrary
Objectivity issues
In Content Determination
• Suppose 37.5C counts as a fever. Suppose this
lasts for only 10 minutes
– Is this worth saying? (Can it be relevant for clinical
decisions?)
• How long does your temperature need to be
above threshold to call it a fever?
• How long before we call something a
bradycardia?
• What makes a bradycardia "momentary", or
"significant"?
• How long can a fever last before it is worth
reporting?
Using vague words
• What does it take for SATs to be “OK”?
– As SATs decrease, medical complications
become more likely
– This is not a Yes/No thing, but something gradual
– Application of vague words can be a matter of judgment
• Should a patient’s age be taken into account?
• His/her medical condition? The nurse’s expectations?
• Computers struggle to use vague words ("significant",
"momentary", "OK") appropriately
– Often avoided altogether (see earlier example)
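A sketch of why this is hard: the obvious computational move is a crisp threshold, but any cutoff (here 94% saturation, an illustrative choice, not a clinical one) draws a sharp line through what is really a gradual scale:

```python
# A crisp threshold for a vague word: "OK" sats defined as >= 94%.
# The 94% cutoff is an illustrative assumption, not a clinical rule.

def sats_ok(sa_o2, threshold=94):
    return sa_o2 >= threshold

print(sats_ok(96))  # True
print(sats_ok(93))  # False -- yet 93 vs 94 is hardly a medical difference
```

The judgment a nurse exercises (patient's age, condition, expectations) has been frozen into one arbitrary number.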
Using vague or crisp words
• Science often replaces vague concepts by crisp
ones, e.g. “obese = BMI > 30”
– Such definitions make a value judgment about what’s
good or bad for one’s health (e.g. motivated by
statistical data about life expectancy)
– Hence, they are theory-laden
– These value judgments may not always match doctors’
assessment
• There is more to morbid obesity than BMI
• Not just in medical affairs!
• Consider weather forecasting:
Two weather forecasters
(Is the cup half full or half empty?)
1. “Sunny spells and mainly dry. Temperatures
up to 15C this afternoon and when the sun is
out it will feel pleasant enough in spite of a
moderate northerly breeze.”
2. “Cloudy at times with a slight chance of rain.
Temperatures only reaching 15C this
afternoon and with any rain around and a
moderate northerly breeze it will feel cooler.”
Reading material on
“Computers as Authors”
• [Reiter 2007] An architecture for Data-to-text
systems. In Proceedings of ENLG-07.
(Conference paper on the NLG challenges
involved in mapping data to text)
• [van Deemter 2010] Not Exactly: in praise of
vagueness. Oxford University Press.
(Informal book on the expression of
quantitative information; chapters 3 and 11)
In summary …
• Complete objectivity may not always be
achievable
• But we can keep trying!
3. Science in daily life and politics
• Too large a topic to squeeze into the
remaining time
• An entire 6th Century course on this topic:
Science and the Media
http://www.abdn.ac.uk/thedifference/media-science.php
Instead, let’s have a brief
wrap-up of the course