Talking about Science
A lecture in the 6th Century course “Mankind in the Universe”
by Kees van Deemter, Computing Science dept., University of Aberdeen

Objectivity
• A major theme in “Mankind in the Universe”
  – Can people know the universe? (e.g., the Big Bang, man-made global warming)
  – Can people know objectively what’s right? (e.g., stem-cell research)
• Philosophical positions include
  – Realism
  – Anti-realism
  – Constructivism
• This lecture: the expression of scientific data and theories in language

Plan of the lecture
1. Publishing scientific results
2. Using computers: from data to text
3. (Science in daily life and politics)

1. Publishing Scientific Results
Peer review: the main mechanism for deciding whether a result is worth publishing (e.g., as a journal article)
(1) Authors submit article
(2) Editors select expert reviewers (“peers”)
(3) Reviewers assess article
(4) Editors decide: accept/reject/revise
If revise, then authors may go back to (1)
Submissions as conference papers lack the “revise” option

Peer review is no guarantee against flaws
1. Human frailty:
  – Maybe the experts lack expertise
  – Peers may disagree with each other
  – (Maybe they don’t like the authors)
  – (A dishonest peer may reject, then “steal” results)
Possible solutions
  – Anonymity of reviewer and/or reviewee
  – Declaring conflicts of interest
No silver bullet. Much depends on the editor.

2. Publication bias
• Reviewers and editors are keen on “interesting” results.
• Interesting results are read eagerly, are often quoted, and sell journals

So how about disappointing results?
• Research hypothesis: “activity x makes you more likely to get cancer”
• 1000 patients tested. 500 do x, 500 don’t do x.
  – x: 50 get cancer
  – not x: 53 get cancer
• Your hypothesis is not confirmed (the trend even goes in the opposite direction)
• Your journal submission may be rejected, because it’s not interesting enough.
• Your “negative” findings may never get published
• Yet they tell us something of potential value:
  – Maybe x is unrelated to cancer
  – Maybe x makes you less likely to get cancer
• Note: Your experiment does not show convincingly that x makes you less likely to get cancer. (50 versus 53 is too small a difference)
  – Statisticians say: the result is not significant
• But others may have found similar negative results …

Meta-analysis
• A statistical analysis that tries to draw conclusions from a set of experiments. (Meta: “about”)
• Championed, among others, by the Cochrane Collaboration
• Instructive logo

The Cochrane logo explained
• A landmark 1989 analysis of the use of steroids on prematurely born babies:
  – 2 studies had found a positive effect (significant)
  – 5 studies had found no significant effect
  – Doctors did not believe in the effect until a meta-analysis of all 7 studies together showed a positive effect
• Back to our imaginary study of cancer: A meta-analysis might have shown that x makes you less likely to get cancer …

But …
• Those negative results will not be counted in the meta-analysis, because they were never published
• Omission of “disappointing” results could even result in the erroneous conclusion that x makes you more likely to get cancer
[Goldacre 2009] Bad Science. Harper Perennial

3. Cheating
• Dishonesty about authorship: plagiarism
• Dishonesty about data and statistics

Plagiarism
“taking someone else’s work and passing it off as one’s own”
• There is a grey area. I got my definition from the Mac’s Dictionary application. Do I have to acknowledge this?
• If you take someone else’s ideas then (try to) say who had them first
• If you also take someone else’s words verbatim (for more than just a few words) then put quotes around the text as well
  – A grey area: “just a few words”

Plagiarism and peer review
• Peer review contains important safeguards against plagiarism.
  – One of your reviewers may have read that earlier article …
• But peer review is no guarantee.
  – What if the article was published in Japanese?
• Still, offenders get caught. Moreover, if the dishonesty only concerned the authorship, the implications for science are limited
  – A victimless crime?

Improper use of data
In science (as opposed to teaching), this is a bigger problem than plagiarism
(1) Conscious cheating
(2) Unconscious cheating

Conscious cheating (?)
• Some notorious cases, where it appears that data were intentionally faked or distorted
  – Andrew Wakefield’s work linking the MMR vaccine to autism
  – Parts of the University of East Anglia’s work on global warming
  – Hwang Woo-suk’s work on stem-cell research and human cloning

BBC News, 15 Dec 2005
(…) Stem cell success 'faked'
A South Korean cloning pioneer has admitted fabricating results in key stem cell research, a colleague claims. At least nine of 11 stem cell colonies used in a landmark research paper by Dr Hwang Woo-suk were faked, said Roh Sung-il, who collaborated on the paper. Dr Hwang wants the US journal Science to withdraw his paper on stem cell cloning, Mr Roh said. Dr Hwang, who is reported to be receiving hospital treatment for stress, was not available for comment. Science could not confirm whether it had received a request to retract the paper. Dr Hwang's paper had been hailed as a breakthrough, opening the possibility of cures for degenerative diseases. (…)

Unconscious cheating: observer bias
One experiment: Some patients got a medicine against multiple sclerosis, others got a placebo
• 50% of trained observers (A) knew who got the placebo
• 50% of trained observers (B) did not know
• Observers (A) observed an improvement in the condition of patients who were given the medicine
• Observers (B) did not observe an improvement
Noseworthy et al. The impact of blinding on the results of a randomized, placebo-controlled multiple sclerosis clinical trial. Neurology. 2001;57:S31-S35.
Unconscious cheating
• Rosenthal effect. Participants were given photographs of people, and asked to say whether these were “successful in life”.
  – Some experimenters (A) were told that participants judge most photographs as successful
  – Other experimenters (B) were told that participants judge most photographs as unsuccessful
• Participants supervised by A judged photographs much more positively than those supervised by B
• Supervisors could only read out a set speech!
• Rosenthal effect (conclusion): by expecting a given behaviour, you can make this result come about
Rosenthal R. Interpersonal expectations: effects of the experimenter's hypothesis. In: Rosenthal & Rosnow (eds.) Artifact in Behavioral Research. New York, NY: Academic Press; 1969:181-277
• Rosenthal effect concerns experiments with people; observations in physics can be hazardous as well (e.g., when do you stop running an experiment?)
• Observer bias & Rosenthal effect are reasons for making studies with human subjects double-blind
• Cheating is not something done by a few criminals, but something we all need to constantly be on guard against
  – in science
  – in daily life
• The science behind these phenomena is interesting in itself
• [Ben Goldacre 2008] (again!)

Dubious uses of statistics
• “There are lies, damn lies, and statistics” (author unknown)
• This is not an indictment of numbers or statistics
  – Statistics is safe when performed competently, but errors are easy to make
  – These can be conscious or unconscious

One common abuse of statistics
• Failing to declare your research hypothesis in advance
• Recall the “disappointing” cancer study
  – Your research hypothesis: x makes cancer more likely
  – You found weak indications for the opposite: x makes cancer less likely (50 versus 53 cases, not significant)
  – Suppose you had found strong indications for this (e.g., 40 versus 63 cases, significant)
  – Reporting this as a confirmed hypothesis would be wrong!
  – Stats is for testing a pre-existing suspicion
  – Anything else is “data fishing”

On to our next topic …

2. Computers as authors: from data to text
• Measurement can give rise to a huge amount of numerical information, e.g.,
  – Monitoring patients in intensive care
  – Climate predictions: 2 petabytes (2 × 10^15 bytes)
• People are bad at making sense of this, so we use Natural Language Generation to let computers produce readable text
• At Aberdeen: Reiter, Turner, Sripada, Davy. Example: Turner’s pollen level forecasts demo: http://www.csd.abdn.ac.uk/~rturner/cgi_bin/pollen.html

Neonatal ICU (Babytalk project)
Baby monitoring:
• SpO2 (SO, HS)
• ECG (HR)
• Peripheral Temperature (TP)
• Arterial Line (Blood Pressure)
• Transcutaneous Probe (CO, OX)
• Core Temperature (TC)

Input: Sensor Data (45 mins)

Some medical jargon …
• Bradycardia: when the heart rate is too slow
• Intubation: placing a tube in the windpipe (e.g., for oxygen or drugs)
• FiO2: fraction of inspired oxygen, a measure of how much oxygen is supplied
• Sats: oxygen saturation levels
• ETT suction: “sucking” away contaminated secretions (which might cause pneumonia)
• BP: Blood pressure
• HR: Heart rate

Written by nurse
In preparation for re-intubation, a bolus of 50ug of morphine is given at 1039 when the FiO2 = 35%. There is a momentary bradycardia and then the mean BP increases to 40. The sats go down to 79 and take 2 mins to come back up. The toe/core temperature gap increases to 1.6 degrees. At 1046 the baby is turned for re-intubation and re-intubation is complete by 1100 the baby being bagged with 60% oxygen between tubes. During the re-intubation there have been some significant bradycardias down to 60/min, but the sats have remained OK. The mean BP has varied between 23 and 56, but has now settled at 30. The central temperature has fallen to 36.1°C and the peripheral temperature to 33.7°C. The baby has needed up to 80% oxygen to keep the sats up. Over the next 10 mins the HR decreases to 140 and the mean BP = 30-40.
The sats fall with ETT suction so the FiO2 is increased to 80% but by 1112 the FiO2 is down to 49%.

Generated by Babytalk system
You saw the baby between 10:30 and 11:12. Heart Rate (HR) = 148. Core Temperature (T1) = 37.5. Peripheral Temperature (T2) = 36.3. Mean Blood Pressure (mean BP) = 28. Oxygen Saturation (SaO2) = 96. The tcm sensor was re-sited. By 10:40 SaO2 had decreased to 87. As a result, Fraction of Inspired Oxygen (FIO2) was set to 36%. SaO2 increased to 93. There had been a bradycardia down to 90. Previously 50.0 mics/min of morphine had been administered. Over the next 17 minutes mean BP gradually increased to 37. By 11:00 the baby had been hand-bagged a number of times causing 2 successive bradycardias. She was successfully re-intubated after 2 attempts. The baby was sucked out twice. At 11:02 FIO2 was raised to 79%. By 11:06 the baby had been sucked out a number of times. Previously T2 had increased to 34.3. Over the next 17 minutes HR decreased to 140. FIO2 was lowered to 61%.

How the computer generates the text (four stages, just a sketch …)
A kind of data mining: using computers to analyse & summarise data
1. Signal analysis
2. Data abstraction
3. Content Determination
4. Saying it in English (alternative: graphs/diagrams)

1. Signal Analysis
Essentially a collection of mathematical tools
• Detect trends, patterns, events, etc. in the data
  – (Blood oxygen levels) increasing
  – Downward spike (in heart rate)
  – Etc.
• Separate real data from artefacts
  – Sensors can malfunction

2. Data Abstraction (1)
• Detect higher-level events in the data
  – Bradycardia
  – Sensor flapping against skin (inferred from shapes in data)
• Not just maths: medical knowledge required

2.
Data Abstraction (2)
• Determine relative importance of events
• Link related events
  – Blood O2 falls, therefore O2 level in incubator is increased (reason for the action)
  – HR up because baby is being handled (cause)
• Potentially a strong point of text summaries
  – Graphs/diagrams seldom show such links

3. Content Determination
• Determine what’s important enough to talk about.
• This
  – depends on purpose & context of text
    • How much space/time is available?
    • Saying A may force you to say B as well
  – uses importance rating (from Data Abstraction (2))

4. Saying it in English
Lots of different issues. For instance,
• How to organise the text as a whole? (e.g., Chronologically? Organised in paragraphs?)
• What sentence patterns to use? (e.g., Active voice? One fact per sentence?)
  – “… have varied between 23 and 56”
• How to refer? (e.g., refer to a time saying “at 11:05”, or “after intubation”?)
• What words to use? (e.g., avoiding medical jargon?)

Objectivity issues
• In signal analysis: What’s an event?
  – Imagine three short downward spikes in HR
    • Three events or one?
• In data abstraction:
  – Concepts like “bradycardia” are theory-laden
    • 20 years from now, a different definition?
  – Causality is problematic
    • Was HR increase caused by handling?
  – Many thresholds are a bit arbitrary
• In content determination:
  – Suppose 37.5°C counts as a fever. Suppose this lasts for only 10 minutes
    • Is this worth saying? (Can it be relevant for clinical decisions?)
  – How long does your temperature need to be above threshold to call it a fever?
  – How long before we call something a bradycardia?
  – What makes a momentary bradycardia, or significant bradycardias?
  – How long can a fever last before it is worth reporting?

Using vague words
• What does it take for SATs to be “OK”?
  – As SATs decrease, medical complications become more likely
  – This is not a Yes/No thing, but something gradual
  – Application of vague words can be a matter of judgment
    • Should a patient’s age be taken into account?
    • His/her medical condition? The nurse’s expectations?
• Computers struggle to use vague words (“significant”, “momentary”, “OK”) appropriately
  – Often avoided altogether (see earlier example)

Using vague or crisp words
• Science often replaces vague concepts by crisp ones, e.g. “obese = BMI > 30”
  – Such definitions make a value judgment about what’s good or bad for one’s health (e.g. motivated by statistical data about life expectancy)
  – Hence, they are theory-laden
  – These value judgments may not always match doctors’ assessment
    • There is more to morbid obesity than BMI
• Not just in medical affairs!
• Consider weather forecasting:

Two weather forecasters (Is the cup half full or half empty?)
1. “Sunny spells and mainly dry. Temperatures up to 15C this afternoon and when the sun is out it will feel pleasant enough in spite of a moderate northerly breeze.”
2. “Cloudy at times with a slight chance of rain. Temperatures only reaching 15C this afternoon and with any rain around and a moderate northerly breeze it will feel cooler.”

Reading material on “Computers as Authors”
• [Reiter 2007] An architecture for Data-to-text systems. In Proceedings of ENLG-07. (Conference paper on the NLG challenges involved in mapping data to text)
• [van Deemter 2010] Not Exactly: in praise of vagueness. Oxford University Press. (Informal book on the expression of quantitative information; chapters 3 and 11)

In summary …
• Complete objectivity may not always be achievable
• But we can keep trying!

3.
Science in daily life and politics
• Too large a topic to squeeze into the remaining time
• An entire 6th Century course on this topic: Science and the Media
  http://www.abdn.ac.uk/thedifference/media-science.php

Instead, let’s have a brief wrap-up of the course
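
Worked example for the wrap-up: the imaginary cancer study
The lecture states that 50 versus 53 cancer cases (out of 500 patients each) is "not significant", while 40 versus 63 would be. The sketch below shows one way to check such claims. It is an illustration only: the lecture does not say which test is meant, so the pooled two-proportion z-test, the function name two_proportion_z_test, and the conventional 0.05 threshold are all assumptions made here.

```python
from math import sqrt, erfc


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-proportion z-test (an assumed choice of test, not the lecture's).

    Returns (z, p_value), where p_value is two-sided.
    """
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    # Under the null hypothesis both groups share a single underlying rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


if __name__ == "__main__":
    ALPHA = 0.05  # conventional threshold, assumed for this illustration
    # (cases among those who do x, cases among those who don't), 500 each
    for cases_x, cases_not_x in [(50, 53), (40, 63)]:
        z, p = two_proportion_z_test(cases_x, 500, cases_not_x, 500)
        verdict = "significant" if p < ALPHA else "not significant"
        print(f"{cases_x} vs {cases_not_x}: z = {z:.2f}, p = {p:.3f} ({verdict})")
```

Under these assumptions the first comparison gives a p-value of about 0.76 and the second about 0.02, which agrees with the labels used in the lecture. A small p-value here still does not rescue the "data fishing" problem: the test only means something if the hypothesis was declared before looking at the data.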