Natural Language Generation of Scientific Argumentation for Healthcare Consumers
Nancy L. Green
University of North Carolina Greensboro, USA
April 1, 2011

Outline of Presentation
• Overview of GenIE Project
• Technical details
• Some future work

GenIE Project Overview
Long-term goal: automatic generation of biomedical arguments for healthcare consumers.
Example: Your mother and sister have breast cancer (BC). Are you at higher risk of BC? What are the pros and cons of genetic testing for BC? Of preventive medication or surgery?
Consumers need an in-depth understanding of such arguments to make informed decisions.
First step: "on-the-fly" generation of the first draft of a patient-tailored genetic counseling letter.

Clinical Genetics Domain
Explanatory arguments for or against:
• Diagnosis (e.g., a genetic condition responsible for a child's hearing loss)
• Inheritance of a condition in a family (autosomal dominant or recessive inheritance, mosaicism)
• Intervention (e.g., medication to prevent breast cancer)

Technical details
• Overview of GenIE
• Knowledge base
• Generation process

GenIE Patient Letter Generator
[Figure: architecture of the GenIE patient letter generator. Components: knowledge base (KB), discourse grammar, argument generator, English generator. Also shown: observations (symptoms, etc.) and conclusions (diagnosis, etc.); output: letter (1st draft).]

GenIE Components
• Knowledge base (KB): causal model of genetic disease, represented as a qualitative probabilistic network (similar to a Bayesian network). The genetic counselor adds facts about a patient's case to the KB.
• Discourse grammar: generates an "outline" of the patient letter using information in the KB; calls the argument generator.
• Argument generator: creates arguments for the diagnosis, etc., using evidence from the KB.
• English generator: transforms the completed outline (a propositional representation) into English text.
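To make the division of labor among the KB, argument generator and English generator concrete, here is a minimal Python sketch (the discourse grammar is omitted). It is a toy under stated assumptions, not GenIE's implementation: the names `KB`, `causal_chain`, `argue_effect_to_cause`, `realize` and `cap` are hypothetical, the KB is reduced to a set of S+ (positive influence) arcs plus observed findings, and the plain depth-first search is a stand-in chosen for illustration. Variable names follow the cystic fibrosis KB (GP, BP, BP2, PP, SP2); the arcs between them are assumed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def cap(s: str) -> str:
    """Uppercase only the first character (str.capitalize would lowercase 'CFTR')."""
    return s[:1].upper() + s[1:]


@dataclass
class KB:
    """Hypothetical toy KB: S+ arcs plus findings entered by the counselor."""
    s_plus: dict[str, list[str]] = field(default_factory=dict)  # cause -> effects
    observed: set[str] = field(default_factory=set)             # observed findings
    phrase: dict[str, str] = field(default_factory=dict)        # variable -> English phrase

    def causal_chain(self, start: str, goal: str) -> list[str] | None:
        """Depth-first search for a path of S+ arcs from start to goal."""
        stack = [[start]]
        while stack:
            path = stack.pop()
            if path[-1] == goal:
                return path
            for nxt in self.s_plus.get(path[-1], []):
                if nxt not in path:  # skip cycles
                    stack.append(path + [nxt])
        return None


def argue_effect_to_cause(kb: KB, hypothesis: str) -> list[dict]:
    """Toy argument generator: one argument per observed finding the hypothesis explains."""
    arguments = []
    for finding in sorted(kb.observed):
        chain = kb.causal_chain(hypothesis, finding)
        if chain and len(chain) > 1:
            arguments.append({
                "claim": hypothesis,                      # A = a
                "data": finding,                          # B = b
                "warrants": list(zip(chain, chain[1:])),  # S+ links from A to B
            })
    return arguments


def realize(kb: KB, arg: dict) -> str:
    """Toy English generator: one sentence per warrant, then the data and a hedged claim."""
    p = kb.phrase
    sentences = [f"{cap(p[a])} can cause {p[b]}." for a, b in arg["warrants"]]
    sentences.append(f"He had {p[arg['data']]}.")
    sentences.append(f"Thus he could have {p[arg['claim']]}.")
    return " ".join(sentences)


# Illustrative KB fragment; the arcs are assumptions modeled on the cystic fibrosis case.
kb = KB(
    s_plus={"GP": ["BP"], "BP": ["BP2"], "BP2": ["PP"], "PP": ["SP2"]},
    observed={"SP2"},
    phrase={
        "GP": "cystic fibrosis",
        "BP": "abnormal CFTR protein",
        "BP2": "an abnormal pancreas enzyme level",
        "PP": "malabsorption",
        "SP2": "growth failure",
    },
)

if __name__ == "__main__":
    for arg in argue_effect_to_cause(kb, "GP"):
        print(realize(kb, arg))
```

Run as a script, this prints one argument: "Cystic fibrosis can cause abnormal CFTR protein. Abnormal CFTR protein can cause an abnormal pancreas enzyme level. An abnormal pancreas enzyme level can cause malabsorption. Malabsorption can cause growth failure. He had growth failure. Thus he could have cystic fibrosis." Each link found by the search becomes a warrant sentence, the observed finding becomes the data, and the hypothesis becomes the hedged claim.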

Technical details: Knowledge base

Part of cystic fibrosis KB: evidence for patient X's diagnosis
[Figure: qualitative probabilistic network over the variables below, with S+ influences on the arcs and Y+ synergies.]

Variable  Node                             Concept type     Value
GP        CFTR genotype                    Genotype         2 mutations
BP        CFTR protein                     Biochemistry     Abnormal
BP2       Pancreas enzyme level            Biochemistry     Abnormal
PP        Malabsorption                    Physiology       True
SP2       Growth failure                   Symptom          True
PP2       Viscous lung secretion           Physiology       True
EP        Bacteria in lung secretion       Event            True
SP        Frequent respiratory infections  Symptom          True
TSP       "Sweat Test"                     Test (decision)  Done
TRP       "Sweat Test" Result              Result           Abnormal

Key for qualitative relations (see Wellman 1990; Druzdzel & Henrion 1993):
• S+: positive influence (implicit on arcs in networks)
• Y+: positive additive synergy (enablement)

Part of cystic fibrosis KB: evidence for patient X's inheritance of mutation
[Figure: qualitative probabilistic network over the variables below, with S+ influences and an X0 relation.]

Variable  Refers to  Node                             Concept type  Value
GP        patient    CFTR genotype                    Genotype      2 (mutated alleles)
GM        mother     CFTR genotype of mother          Genotype      1 (mutated allele)
SM        mother     Frequent respiratory infections  Symptom       False
HM        mother     N. European ancestry             History       True
GF        father     CFTR genotype of father          Genotype      1 (mutated allele)
SF        father     Frequent respiratory infections  Symptom       False
HF        father     N. European ancestry             History       True

Key for qualitative relations (see Wellman 1990; Druzdzel & Henrion 1993):
• S+: positive influence (implicit on arcs in networks)
• X0: joint responsibility

Technical details: Generation process

Some discourse grammar rules (simplified):

    letter(P, D, S, R) :-
        pretest(P), diagnosis(D), source(S), risk(R).

    pretest(narration(T1, T2, T3, T4)) :-
        referral(T1), pretest_diagnosis_act(T2), testing(T3), test_result(T4).

    referral(purpose(nuc(E1), sat(E2))) :-
        get_clinic_details(patient, E1),  % E1: patient X visited clinic on…
        get_symptoms(patient, E2),        % E2: to diagnose cause of
                                          % symptoms S1, …

Generated paragraph on cystic fibrosis diagnosis without arguments
[Figure: rhetorical structure tree with the relations Narration, Purpose, Attribution and Evidence. Leaves: "Patient was referred to clinic"; "To find cause of patient's respiratory infections and growth failure"; "Clinic suspected"; "Patient has cystic fibrosis"; "Patient's symptoms are due to cystic fibrosis"; "Patient was given sweat test"; "Patient's test result was abnormal".]
• Tree representing the rhetorical structure of a paragraph of the letter
• Content of the paragraph is stored in the leaves as logical formulas
• The argument will be grafted onto the claim node

Generated paragraph on cystic fibrosis diagnosis (in English) without arguments
The patient was referred to the clinic in order to diagnose the cause of his respiratory infections and growth failure. The clinic thought that respiratory infections and growth failure could be due to cystic fibrosis. The clinic performed a sweat test in order to determine if he could have cystic fibrosis. The sweat test showed an abnormal NaCl level.

How to build an argument generator
1. Analyze examples of different types of real-world biomedical arguments written by genetic counselors:
   • How is each argument related to domain knowledge in the GenIE KB?
   • What is the argument's structure?
2. Derive general (domain-independent) rules of argumentation.
3. Encode them as a set of argumentation schemes, which are abductive or predictive (not deductively valid).

Analysis of Real-World Example
(2) … delays in development and [birth defect] …
(4) … to test for Velocardiofacial syndrome (VCF).
(5) Individuals with VCF often have [birth defect] and learning problems.
• Claim, in (4): [patient] may have VCF
• Data, in (2): [patient] has [birth defect] and learning problems
• Warrant (implicit): VCF can cause [birth defect] and learning problems
• Backing for the warrant: (5)

Analysis of Use of Argumentation
The arguments in the letters were analyzed in terms of their:
• Claim (conclusion)
• Data: beliefs about the patient (and/or biological family members) supporting the claim
• Warrant: biomedical principle relating the data to the conclusion
• Backing: support for the warrant, e.g., epidemiological statistics, Mendel's theory
• Critical questions (CQ): exception conditions

Example of Argumentation Scheme
Scheme: Effect to Cause
• Claim: A = a (e.g., patient has VCF mutation)
• Data: B = b (e.g., patient has birth defect)
• Warrant: S+(A, B) (e.g., VCF can cause birth defect)
• CQ: X-({C, A}, B): C = c, i.e., unless there is some other explanation C for B
where A, B, C are random variables and X-, S+ are qualitative probabilistic relations.
[Figure: diagram of A, C and B with the S+ and X- relations.]

Example of Another Argumentation Scheme
Scheme: IncRisk
• Claim: B = b (e.g., patient will have CHD)
• Data: A = a (e.g., patient has high LDL)
• Warrant: S+(A, B) (e.g., high LDL can lead to CHD)
• CQ: Y-({C, A}, B): C = c, i.e., unless there is some C (e.g., exercise) that mitigates A's influence on B
where A, B, C are random variables and Y-, S+ are qualitative probabilistic relations.
[Figure: diagram of A, C and B with the S+ and Y- relations.]

Generated arguments for cystic fibrosis diagnosis
[Figure: rhetorical structure tree of the generated arguments: a List of two Evidence relations, each supporting the claim "Patient has cystic fibrosis". First, via Background: 2 mutated CFTR alleles, abnormal protein, viscous lung secretion; viscous lung secretion and exposure to bacteria, respiratory infection; data: patient had exposure to bacteria and respiratory infections. Second: 2 mutated CFTR alleles, abnormal protein, abnormal pancreas enzyme, malabsorption, growth failure; data: patient had growth failure.]
• Tree representing the rhetorical structure of the generated arguments
• Content of the arguments is stored in the leaves as logical formulas
• The argument tree will be grafted onto the tree for the rest of the paragraph

Generated letter on cystic fibrosis case: arguments for diagnosis added
(On the slide, the generated arguments are shown in red; they are the sentences absent from the paragraph without arguments.)
The patient was referred to the clinic in order to diagnose the cause of his respiratory infections and growth failure. The clinic thought that respiratory infections and growth failure could be due to cystic fibrosis. Cystic fibrosis can cause abnormal CFTR protein. Abnormal CFTR protein can cause a viscous lung secretion. Exposure to bacteria and a viscous lung secretion can lead to respiratory infections. He had respiratory infections. He had exposure to bacteria. Thus he could have cystic fibrosis. Also abnormal CFTR protein can cause an abnormal pancreas enzyme level. An abnormal pancreas enzyme level can cause malabsorption. Malabsorption can cause growth failure. He had growth failure. Thus he could have cystic fibrosis.
Therefore respiratory infections and growth failure could be due to cystic fibrosis. The clinic performed a sweat test in order to determine if he could have cystic fibrosis. The sweat test showed an abnormal NaCl level. Cystic fibrosis is a disease caused by having two changed copies of a gene called CFTR. Two changed copies of the CFTR gene can cause abnormal CFTR protein. Abnormal CFTR protein can lead to an abnormal NaCl level and a sweat test can detect an abnormal NaCl level. The patient has an abnormal NaCl level. Thus he has cystic fibrosis. Therefore his respiratory infections and growth failure are due to his two changed copies of the CFTR gene.

Full generated letter on cystic fibrosis case
The full letter opens with the diagnosis paragraph above and continues with this paragraph on inheritance:
North European ancestry increases the risk of having at least one changed copy of the CFTR gene. The patient's mother has north European ancestry. Thus she could have one changed copy of the CFTR gene. A child with two changed copies of a gene inherited one changed copy from the mother and one changed copy from the father. The patient has cystic fibrosis. Thus the patient's mother and the patient's father each have at least one changed copy of the CFTR gene. Two changed copies of the CFTR gene can cause cystic fibrosis symptoms. The patient's mother does not have cystic fibrosis symptoms. Thus she does not have two changed copies of the CFTR gene. Therefore she could have one changed copy of the CFTR gene. Similarly the patient's father has north European ancestry. He does not have cystic fibrosis symptoms. Thus he could have one changed copy of the CFTR gene. Suppose the patient's mother has one changed copy of the CFTR gene and the patient's father has one changed copy of the CFTR gene. A twenty-five percent chance exists that future siblings will inherit two changed copies of the CFTR gene and they will have cystic fibrosis symptoms. Alternatively a seventy-five percent chance exists that they will not inherit two changed copies of the CFTR gene and they will not have cystic fibrosis symptoms.
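The twenty-five and seventy-five percent figures at the end of the inheritance paragraph follow from Mendelian segregation under the letter's supposition that each parent has one changed copy of CFTR: each parent passes on either of its two copies with probability 1/2, independently of the other parent. A minimal Python sketch (the function name `offspring_distribution` is hypothetical and not part of GenIE) enumerates the four equally likely combinations. It models genotype only, not whether symptoms appear.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(mother: tuple[int, int], father: tuple[int, int]) -> dict[int, Fraction]:
    """P(child has k changed copies), assuming Mendelian segregation.

    Each genotype is a pair of copies: 1 = changed copy, 0 = unchanged copy.
    Each parent transmits one of its two copies with probability 1/2,
    independently of the other parent, so each of the 4 pairs has P = 1/4.
    """
    dist: dict[int, Fraction] = {}
    for m, f in product(mother, father):
        k = m + f  # number of changed copies the child receives
        dist[k] = dist.get(k, Fraction(0)) + Fraction(1, 4)
    return dist


carrier = (1, 0)  # one changed copy of CFTR, as the letter supposes for each parent
dist = offspring_distribution(carrier, carrier)
p_two = dist[2]

if __name__ == "__main__":
    print(f"Two changed copies: {p_two} ({float(p_two):.0%})")
    print(f"Not two changed copies: {1 - p_two} ({float(1 - p_two):.0%})")
```

Run as a script, this prints "Two changed copies: 1/4 (25%)" and "Not two changed copies: 3/4 (75%)". Exact fractions avoid floating-point rounding in the intermediate sums.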

English Generator: Natural Language Generation
Input to the English generator:
• Tree representing the rhetorical structure of each paragraph of the letter
• Content of each paragraph is stored in the leaves as logical formulas
[Figure: rhetorical structure tree of the diagnosis paragraph (Narration, Purpose, Attribution, Evidence), with the argument subtree grafted under the Evidence node for "Patient's symptoms are due to cystic fibrosis".]
Processing stages: sentence organization; selection of words and referring expressions; realization with the SimpleNLG tool; output: letter (1st draft).

Some future work
• Modify GenIE's English generator to use an existing biomedical ontology (e.g., UMLS)
• Risk communication for low-numeracy patients
• Intelligent learning environments for scientific argumentation, in which students will debate with GenIE on cases in genomic medicine

Acknowledgments
Graduate and undergraduate research assistants: Tami Britt, Karen Jirak, Darryl Keeter, Carmen Navarro-Luzón, Zach Todd, David Waizenegger, Xuegong Xin, Rachael Dwight, K. Navoraphan, Brian Stadler, Carolyn McCann.
Genetic counselors outside of UNCG, as well as students, faculty and staff of the UNCG Genetic Counseling MS Program and the UNCG Center for Biotechnology, Genomics and Health Research.
This material is based upon work supported by the National Science Foundation under CAREER Award No. 0132821.

Thank you!