Natural Language Generation of Scientific Argumentation for Healthcare Consumers

Nancy L. Green
University of North Carolina Greensboro, USA
April 1, 2011
Outline of Presentation

Overview of GenIE Project

Technical details

Some future work
GenIE Project Overview

Long-term goal: Automatic generation of biomedical arguments for healthcare consumers

Example: Your mother and sister have breast cancer (BC). Are you at higher risk of BC? What are the pros/cons of genetic testing for BC? Pros/cons of preventive medication or surgery?
Consumers need an in-depth understanding of the arguments to make informed decisions
First step: “On-the-fly” generation of the 1st draft of patient-tailored genetic counseling letters
Clinical Genetics Domain
Explanatory arguments for/against:

Diagnosis (e.g. a genetic condition
responsible for a child’s hearing loss)

Inheritance of condition in family (autosomal
dominant/recessive inheritance, mosaicism)

Intervention (e.g. medication to prevent
breast cancer)
Technical details

Overview of GenIE

Knowledge base

Generation Process
[Diagram: GenIE Patient Letter Generator. Input: observations (symptoms, etc.) & conclusions (diagnosis, etc.). Components: Discourse grammar, Argument generator, Knowledge base (KB), English generator. Output: letter (1st draft).]
GenIE Components

Knowledge Base (KB): causal model of genetic disease, represented as a qualitative probabilistic network (similar to a Bayesian network)

Genetic counselor adds facts about a patient’s case to the KB

Discourse grammar generates an “outline” of the patient letter using information in the KB; calls the argument generator

Argument generator creates arguments for the diagnosis, etc., using evidence from the KB

English generator: transforms the completed outline (a propositional representation) into English text
Part of cystic fibrosis KB – evidence for patient X’s diagnosis
Variable  Node                             Concept Type     Value
GP        CFTR genotype                    Genotype         2 mutations
BP        CFTR protein                     Biochemistry     Abnormal
BP2       Pancreas enzyme level            Biochemistry     Abnormal
PP        Malabsorption                    Physiology       True
SP2       Growth failure                   Symptom          True
PP2       Viscous lung secretion           Physiology       True
EP        Bacteria in lung secretion       Event            True
SP        Frequent respiratory infections  Symptom          True
TSP       “Sweat Test”                     Test (decision)  Done
TRP       “Sweat Test” Result              Result           Abnormal
[Diagram: network over these nodes, including two Y+ relations]
Key for qualitative relations (see Wellman 1990; Druzdzel & Henrion 1993)
S+ : positive influence (implicit on arcs in networks)
Y+ : positive additive synergy (enablement)
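The KB fragment above can be read as a small qualitative probabilistic network. The Python sketch below is a hypothetical encoding for illustration, not GenIE's actual representation. The node table is taken from the slide. The arcs and the placement of the two Y+ synergies are assumptions, reconstructed from the causal statements in the generated letter. `NODES`, `INFLUENCES`, `SYNERGIES`, `parents`, and `synergies_on` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str
    concept_type: str
    value: str


# Node table from the slide: variable -> (label, concept type, value for patient X).
NODES = {
    "GP": Node("CFTR genotype", "Genotype", "2 mutations"),
    "BP": Node("CFTR protein", "Biochemistry", "Abnormal"),
    "BP2": Node("Pancreas enzyme level", "Biochemistry", "Abnormal"),
    "PP": Node("Malabsorption", "Physiology", "True"),
    "SP2": Node("Growth failure", "Symptom", "True"),
    "PP2": Node("Viscous lung secretion", "Physiology", "True"),
    "EP": Node("Bacteria in lung secretion", "Event", "True"),
    "SP": Node("Frequent respiratory infections", "Symptom", "True"),
    "TSP": Node("Sweat test", "Test (decision)", "Done"),
    "TRP": Node("Sweat test result", "Result", "Abnormal"),
}

# ASSUMED arcs, each a positive qualitative influence S+(parent, child),
# reconstructed from the letter ("... can cause ...", "... can lead to ...").
INFLUENCES = [
    ("GP", "BP"), ("BP", "PP2"), ("PP2", "SP"), ("EP", "SP"),
    ("BP", "BP2"), ("BP2", "PP"), ("PP", "SP2"),
    ("BP", "TRP"), ("TSP", "TRP"),
]

# ASSUMED placement of the two Y+ (positive additive synergy) relations:
# (kind, (parent1, parent2), common child).
SYNERGIES = [
    ("Y+", ("PP2", "EP"), "SP"),
    ("Y+", ("BP", "TSP"), "TRP"),
]


def parents(var: str) -> list[str]:
    """Return the variables with an S+ arc into `var`, in KB order."""
    return [src for src, dst in INFLUENCES if dst == var]


def synergies_on(var: str) -> list[tuple[str, tuple[str, str]]]:
    """Return the synergy relations whose common child is `var`."""
    return [(kind, pair) for kind, pair, child in SYNERGIES if child == var]


for var in ("SP", "TRP"):
    print(var, NODES[var].label, "<-", parents(var), synergies_on(var))
```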
Part of cystic fibrosis KB – evidence for patient X’s inheritance of mutation
Variable  Refers to  Node                             Concept Type  Value
GP        patient    CFTR genotype                    Genotype      2 (mutated alleles)
GM        mother     CFTR genotype of mother          Genotype      1 (mutated allele)
SM        mother     Frequent respiratory infections  Symptom       False
HM        mother     N. European ancestry             History       True
GF        father     CFTR genotype of father          Genotype      1 (mutated allele)
SF        father     Frequent respiratory infections  Symptom       False
HF        father     N. European ancestry             History       True
Key for qualitative relations (see Wellman 1990; Druzdzel & Henrion 1993)
S+ : positive influence (implicit on arcs in networks)
X0 : joint responsibility
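The X0 (joint responsibility) relation says that the patient's genotype GP is jointly determined by the parents' genotypes GM and GF. The sketch below illustrates the kind of inference this supports, using hypothetical function names.

It assumes a simple Mendelian model in which each parent transmits exactly one CFTR allele. It also treats "2 mutated alleles implies symptoms" as a hard rule, which simplifies the defeasible reasoning in the generated letter.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of mutated CFTR alleles a person carries


def transmissible(genotype: int) -> set:
    """Mutated-allele counts (0 or 1) a parent with `genotype` can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def consistent_parent_genotypes(child: int, mother_symptoms: bool, father_symptoms: bool):
    """Enumerate (GM, GF) pairs compatible with the child's genotype and parents' symptoms."""
    pairs = []
    for gm, gf in product(GENOTYPES, repeat=2):
        # Simplifying assumption: symptoms are present exactly when 2 mutated alleles are carried.
        if (gm == 2) != mother_symptoms or (gf == 2) != father_symptoms:
            continue
        # Joint responsibility: the child's count is one allele from each parent.
        if any(m + f == child for m in transmissible(gm) for f in transmissible(gf)):
            pairs.append((gm, gf))
    return pairs


print(consistent_parent_genotypes(child=2, mother_symptoms=False, father_symptoms=False))
# [(1, 1)]
```

With the values in the table (patient has 2 mutated alleles, neither parent has symptoms), the only consistent assignment is one mutated allele per parent.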
Some discourse grammar rules (simplified)
letter(P, D, S, R) :- pretest(P), diagnosis(D), source(S), risk(R).
pretest(narration(T1, T2, T3, T4)) :-
    referral(T1), pretest_diagnosis_act(T2),
    testing(T3), test_result(T4).
referral(purpose(nuc(E1), sat(E2))) :-
    get_clinic_details(patient, E1),  % E1: patient X visited clinic on …
    get_symptoms(patient, E2).        % E2: to diagnose cause of symptoms S1, …
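Read top-down, each of the Prolog rules above names a rhetorical relation (narration, purpose) and fills its nucleus and satellite from the KB. As a rough illustration only, the same expansion can be mirrored in Python. The `KB` dictionary and the tuple-based tree format are hypothetical. Only `referral` is spelled out; the other `pretest` constituents are left as placeholder leaves.

```python
# Hypothetical stand-in for GenIE's KB: just the facts the referral rule needs.
KB = {
    "clinic_details": "patient X visited the clinic",
    "symptoms": ["respiratory infections", "growth failure"],
}


def get_clinic_details(kb: dict) -> str:
    return kb["clinic_details"]


def get_symptoms(kb: dict) -> str:
    return "to diagnose cause of " + " and ".join(kb["symptoms"])


def referral(kb: dict) -> tuple:
    # purpose(nuc(E1), sat(E2)): the visit is the nucleus, its purpose the satellite.
    return ("purpose", {"nuc": get_clinic_details(kb), "sat": get_symptoms(kb)})


def pretest(kb: dict) -> tuple:
    # narration(T1, T2, T3, T4): events in temporal order. Only T1 is expanded
    # here; T2-T4 are placeholder leaves standing in for the other rules.
    return ("narration", [
        referral(kb),
        ("leaf", "pretest_diagnosis_act"),
        ("leaf", "testing"),
        ("leaf", "test_result"),
    ])


outline = pretest(KB)
print(outline[0], outline[1][0])
```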
Generated paragraph on cystic fibrosis diagnosis without arguments
Narration
   Purpose
      To find cause of patient’s respiratory infections and growth failure
      Patient was referred to clinic
   Attribution
      Clinic suspected
      Evidence
         Patient has cystic fibrosis
         Patient’s symptoms are due to cystic fibrosis
   Purpose
      To find cause of patient’s respiratory infections and growth failure
      Patient was given sweat test
   Patient’s test result was abnormal
• Tree representing rhetorical structure of paragraph of letter
• Content of paragraph stored in leaves as logical formulas
• Argument will be grafted onto yellow node (claim)
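Grafting can be pictured as replacing the claim leaf of the paragraph tree with an argument subtree whose conclusion is that claim. Below is a minimal sketch. It assumes an immutable tree in which leaves hold plain strings instead of logical formulas. `Leaf`, `Tree`, and `graft` are illustrative names, not GenIE's.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    content: str  # GenIE stores a logical formula here; a string stands in for it


@dataclass(frozen=True)
class Tree:
    relation: str               # RST relation name, e.g. "Evidence"
    children: tuple[Node, ...]  # ordered subtrees


Node = Union[Leaf, Tree]


def graft(node: Node, claim: str, subtree: Node) -> Node:
    """Return a copy of `node` in which each leaf whose content equals `claim`
    is replaced by `subtree` (the inserted subtree itself is not searched)."""
    if isinstance(node, Leaf):
        return subtree if node.content == claim else node
    return Tree(node.relation, tuple(graft(c, claim, subtree) for c in node.children))


paragraph = Tree("Attribution", (
    Leaf("Clinic suspected"),
    Tree("Evidence", (
        Leaf("Patient has cystic fibrosis"),
        Leaf("Patient's symptoms are due to cystic fibrosis"),
    )),
))
argument = Tree("Evidence", (
    Leaf("Patient has cystic fibrosis"),
    Leaf("Patient had growth failure"),
))

grafted = graft(paragraph, "Patient has cystic fibrosis", argument)
print(grafted.children[1].children[0] == argument)  # True
```

Returning a new tree rather than mutating the original keeps the ungrafted paragraph available, for example to realize a version of the letter without arguments.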
Generated paragraph on cystic fibrosis diagnosis (in English) without arguments
The patient was referred to the clinic in order to diagnose the
cause of his respiratory infections and growth failure. The clinic
thought that respiratory infections and growth failure could be
due to cystic fibrosis. The clinic performed a sweat test in order
to determine if he could have cystic fibrosis. The sweat test
showed an abnormal NaCl level.
How to build an argument generator
1. Analyze examples of different types of real-world biomedical arguments written by genetic counselors
   • How is each argument related to domain knowledge in the GenIE KB?
   • What is the argument’s structure?
2. Derive general (domain-independent) rules of argumentation
3. Encode the rules as a set of argumentation schemes
   • Abductive or predictive (not deductively valid)
Analysis of Real-World Example
(2) … delays in development and [birth defect] …
(4) … to test for Velocardiofacial syndrome (VCF).
(5) Individuals with VCF often have [birth defect] and learning problems.
Claim in (4): [patient] may have VCF
Data in (2): [patient] has [birth defect] and learning problems
Warrant (implicit): VCF can cause [birth defect] and learning problems
Backing for warrant in (5)
Analysis of Use of Argumentation
Analyzed arguments in letters in terms of their:
• Claim (Conclusion)
• Data: beliefs about patient (and/or biological
family members) supporting claim
• Warrant: biomedical principle relating data to
conclusion
• Backing: support for warrant, e.g.,
epidemiological statistics, Mendel’s theory
• Critical questions (CQ): exception conditions
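These five elements map naturally onto a record type. The sketch below is a hypothetical encoding that stores the VCF analysis from the previous slide. The `Argument` class and its `render` method are illustrative. The critical question paraphrases the Effect to Cause exception condition and is not text from the letter.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Argument:
    claim: str
    data: list[str]                     # beliefs about the patient (and/or family)
    warrant: str                        # biomedical principle linking data to claim
    backing: Optional[str] = None       # support for the warrant
    critical_questions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """One labeled line per element, in the order used in the analysis."""
        lines = [f"Claim: {self.claim}", f"Data: {'; '.join(self.data)}", f"Warrant: {self.warrant}"]
        if self.backing:
            lines.append(f"Backing: {self.backing}")
        lines += [f"CQ: {q}" for q in self.critical_questions]
        return "\n".join(lines)


vcf = Argument(
    claim="[patient] may have VCF",
    data=["[patient] has [birth defect] and learning problems"],
    warrant="VCF can cause [birth defect] and learning problems",  # implicit in the letter
    backing="Individuals with VCF often have [birth defect] and learning problems.",
    critical_questions=["Is there some other explanation for the [birth defect] and learning problems?"],
)
print(vcf.render())
```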
Example of Argumentation Scheme
Scheme: Effect to Cause
Claim: A = a
e.g., Patient has VCF mutation
Data: B = b
e.g., Patient has birth defect
Warrant: S+(A, B)
e.g., VCF can cause birth defect
CQ: X-({C, A}, B): C = c, i.e., unless there is some other explanation C for B
where A, B, C are random variables; X-, S+ are qualitative probabilistic relations
[Diagram: A and C both influence B; X marks their synergy; “?” marks the claimed variable A]
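Read operationally, the scheme licenses the abductive claim A = a from observed data B = b when the KB contains S+(A, B), unless its critical question fires. Below is a minimal sketch, assuming:

* variables are binary;
* `observed` is the set of variables known to take their positive value;
* `other_cause` is a hypothetical alternative cause.

In a real QPN, an observed alternative cause C only weakens the inference. This sketch simply withholds the argument.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical qualitative KB: S+ arcs and X- (negative product synergy) triples.
S_PLUS = {("VCF", "birth_defect"), ("other_cause", "birth_defect")}
X_MINUS = {(frozenset({"VCF", "other_cause"}), "birth_defect")}


def effect_to_cause(a: str, b: str, observed: set[str]) -> Optional[str]:
    """Abductively argue that A is present, given observed B and warrant S+(A, B).

    Returns None if the data or warrant is missing, or if the critical question
    fires: some C with X-({C, A}, B) is observed, offering another explanation for B.
    """
    if b not in observed or (a, b) not in S_PLUS:
        return None
    for causes, effect in X_MINUS:
        if effect == b and a in causes and (causes - {a}) & observed:
            return None  # CQ: C explains B away
    return f"{a} can cause {b}; {b} was observed; so {a} may be present."


print(effect_to_cause("VCF", "birth_defect", {"birth_defect"}))
print(effect_to_cause("VCF", "birth_defect", {"birth_defect", "other_cause"}))  # None
```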
Example of Another Argumentation Scheme
Scheme: IncRisk
Claim: B = b
e.g., Patient will have CHD
Data: A = a
e.g., Patient has high LDL
Warrant: S+(A, B)
e.g., High LDL can lead to CHD
CQ: Y-({C, A}, B): C = c, i.e., unless there is some C (e.g., exercise) that mitigates A’s influence on B
where A, B, C are random variables; Y-, S+ are qualitative probabilistic relations
[Diagram: A and C both influence B; Y marks their synergy; “?” marks the claimed variable B]
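IncRisk runs in the predictive direction. From observed A and S+(A, B), it argues that B is more likely, unless some observed C mitigates A's influence on B via Y-. Below is a minimal sketch, assuming binary variables and a hypothetical toy KB with the names `high_LDL`, `CHD`, and `exercise`. As with the previous scheme, a real QPN would treat the mitigating factor as weakening the argument; this sketch simplifies that to withholding it.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical qualitative KB for the IncRisk example.
S_PLUS = {("high_LDL", "CHD")}
Y_MINUS = {(frozenset({"high_LDL", "exercise"}), "CHD")}  # exercise mitigates high LDL


def inc_risk(a: str, b: str, observed: set[str]) -> Optional[str]:
    """Predict increased risk of B from observed A and warrant S+(A, B),
    unless the critical question fires: some observed C with Y-({C, A}, B)."""
    if a not in observed or (a, b) not in S_PLUS:
        return None
    for causes, effect in Y_MINUS:
        if effect == b and a in causes and (causes - {a}) & observed:
            return None  # CQ: C mitigates A's influence on B
    return f"{a} can lead to {b}; {a} was observed; so {b} is more likely."


print(inc_risk("high_LDL", "CHD", {"high_LDL"}))
print(inc_risk("high_LDL", "CHD", {"high_LDL", "exercise"}))  # None
```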
Generated arguments for cystic fibrosis diagnosis
[Tree diagram; relations shown: List, Background, Evidence]
List
   Evidence
      2 mutated CFTR alleles → abnormal protein → viscous lung secretion
      Viscous lung secretion & exposure to bacteria → respiratory infection
      Patient had exposure to bacteria & respiratory infections
      Patient has cystic fibrosis
   Evidence
      2 mutated CFTR alleles → abnormal protein → abnormal pancreas enzyme → malabsorption → growth failure
      Patient had growth failure
      Patient has cystic fibrosis
• Tree representing rhetorical structure of generated arguments
• Content of arguments stored in leaves as logical formulas
• Argument tree will be grafted onto tree for rest of paragraph
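Each generated argument follows a chain of S+ links in the KB, from the hypothesized cause to an observed symptom. The sketch below finds such a chain by depth-first search. The arc list is an assumption reconstructed from the letter. It omits the co-cause "exposure to bacteria" (which joins viscous lung secretion in causing respiratory infections), so only single-parent links are followed. `ARCS`, `causal_chain`, and `chain_sentences` are hypothetical names.

```python
from __future__ import annotations

from typing import Optional

# ASSUMED S+ arcs, cause -> effects, paraphrasing the nodes of the CF KB.
ARCS: dict[str, list[str]] = {
    "2 mutated CFTR alleles": ["abnormal CFTR protein"],
    "abnormal CFTR protein": ["viscous lung secretion", "abnormal pancreas enzyme level"],
    "viscous lung secretion": ["respiratory infections"],
    "abnormal pancreas enzyme level": ["malabsorption"],
    "malabsorption": ["growth failure"],
}


def causal_chain(src: str, dst: str, arcs: dict[str, list[str]],
                 visited: frozenset[str] = frozenset()) -> Optional[list[str]]:
    """Depth-first search for a path of S+ links from src to dst (None if none)."""
    if src == dst:
        return [src]
    visited = visited | {src}
    for nxt in arcs.get(src, []):
        if nxt not in visited:
            rest = causal_chain(nxt, dst, arcs, visited)
            if rest is not None:
                return [src] + rest
    return None


def chain_sentences(chain: list[str]) -> list[str]:
    """Render each link A -> B as 'A can cause B.' with the first letter capitalized."""
    return [f"{a[0].upper()}{a[1:]} can cause {b}." for a, b in zip(chain, chain[1:])]


chain = causal_chain("2 mutated CFTR alleles", "growth failure", ARCS)
for sentence in chain_sentences(chain):
    print(sentence)
```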
Generated letter on cystic fibrosis case – arguments for diagnosis added
(Generated arguments are shown in red on the slide.)
The patient was referred to the clinic in order to diagnose the cause of his respiratory
infections and growth failure. The clinic thought that respiratory infections and growth
failure could be due to cystic fibrosis. Cystic fibrosis can cause abnormal CFTR protein.
Abnormal CFTR protein can cause a viscous lung secretion. Exposure to bacteria and a
viscous lung secretion can lead to respiratory infections. He had respiratory infections. He
had exposure to bacteria. Thus he could have cystic fibrosis. Also abnormal CFTR protein
can cause an abnormal pancreas enzyme level. An abnormal pancreas enzyme level can
cause malabsorption. Malabsorption can cause growth failure. He had growth failure. Thus
he could have cystic fibrosis. Therefore respiratory infections and growth failure could be
due to cystic fibrosis. The clinic performed a sweat test in order to determine if he could
have cystic fibrosis. The sweat test showed an abnormal NaCl level.
Cystic fibrosis is a disease caused by having two changed copies of a gene called CFTR.
Two changed copies of the CFTR gene can cause abnormal CFTR protein. Abnormal
CFTR protein can lead to an abnormal NaCl level and a sweat test can detect an abnormal
NaCl level. The patient has an abnormal NaCl level. Thus he has cystic fibrosis. Therefore
his respiratory infections and growth failure are due to his two changed copies of the CFTR
gene.
Full generated letter on cystic fibrosis case: the two diagnosis paragraphs above, followed by the inheritance paragraph
North European ancestry increases the risk of having at least one changed copy of the CFTR gene. The patient's
mother has north European ancestry. Thus she could have one changed copy of the CFTR gene. A child with two
changed copies of a gene inherited one changed copy from the mother and one changed copy from the father. The
patient has cystic fibrosis. Thus the patient's mother and the patient's father each have at least one changed copy
of the CFTR gene. Two changed copies of the CFTR gene can cause cystic fibrosis symptoms. The patient's
mother does not have cystic fibrosis symptoms. Thus she does not have two changed copies of the CFTR gene.
Therefore she could have one changed copy of the CFTR gene. Similarly the patient's father has north European
ancestry. He does not have cystic fibrosis symptoms. Thus he could have one changed copy of the CFTR gene.
Suppose the patient's mother has one changed copy of the CFTR gene and the patient's father has one changed
copy of the CFTR gene. A twenty-five percent chance exists that future siblings will inherit two changed copies of
the CFTR gene and they will have cystic fibrosis symptoms. Alternatively a seventy-five percent chance exists that
they will not inherit two changed copies of the CFTR gene and they will not have cystic fibrosis symptoms.
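The 25% and 75% figures in the paragraph above follow from Mendelian inheritance when both parents carry one changed copy: each parent passes on either of its two alleles with probability 1/2, independently. Below is a minimal sketch of that calculation (a Punnett square). It assumes alleles are labeled "m" (changed) and "n" (unchanged). `offspring_distribution` is a hypothetical helper, not part of GenIE.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(mother: tuple, father: tuple) -> dict:
    """Distribution of the number of changed alleles (0, 1, 2) a child inherits.

    Each parent genotype is a pair of alleles, "m" (changed) or "n" (unchanged);
    each parent passes on one of its two alleles with probability 1/2, independently.
    """
    combos = list(product(mother, father))  # the 4 equally likely Punnett-square cells
    dist: dict = {}
    for from_mother, from_father in combos:
        changed = (from_mother == "m") + (from_father == "m")
        dist[changed] = dist.get(changed, 0) + Fraction(1, len(combos))
    return dict(sorted(dist.items()))


carrier = ("m", "n")  # one changed copy of CFTR, as inferred for each parent
dist = offspring_distribution(carrier, carrier)
print(dist)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print("two changed copies:", dist[2], "| not two:", 1 - dist[2])
```

Exact fractions avoid rounding, so the 1/4 and 3/4 results can be compared directly with the percentages stated in the letter.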
English Generator: Natural Language Generation
Narration
   Purpose
      To find cause of patient’s respiratory infections and growth failure
      Patient was referred to clinic
   Attribution
      Clinic suspected
      Evidence
         (Grafted argument subtree here)
         Patient’s symptoms are due to cystic fibrosis
   Purpose
      To find cause of patient’s respiratory infections and growth failure
      Patient was given sweat test
   Patient’s test result was abnormal
Input to English Generator:
• Tree representing rhetorical structure of each paragraph of letter
• Content of paragraph stored in leaves as logical formulas
[Diagram: Sentence organization → Select words, referring expressions → SimpleNLG tool → letter (1st draft)]
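In GenIE, surface realization is done with the SimpleNLG tool (a Java library). The Python sketch below does not use SimpleNLG. It only illustrates template-based realization of a few hypothetical leaf-formula shapes: `("S+", A, B)`, `("had", agent, X)`, and `("could_have", agent, X)`. Following the order seen in the generated letter, it assumes an Evidence relation is realized with its supporting clauses before the claim.

```python
# Toy templates keyed by predicate name (hypothetical formula shapes, not GenIE's).
TEMPLATES = {
    "S+": "{0} can cause {1}",
    "had": "{0} had {1}",
    "could_have": "thus {0} could have {1}",
}


def realize(formula: tuple) -> str:
    """Turn a (predicate, arg1, arg2) tuple into one capitalized English sentence."""
    predicate, *args = formula
    if predicate not in TEMPLATES:
        raise ValueError(f"no template for {predicate!r}")
    text = TEMPLATES[predicate].format(*args)
    return text[0].upper() + text[1:] + "."


def realize_evidence(claim: tuple, evidence: list) -> str:
    """Realize an Evidence relation: supporting clauses first, then the claim."""
    return " ".join(realize(f) for f in [*evidence, claim])


paragraph = realize_evidence(
    ("could_have", "he", "cystic fibrosis"),
    [("S+", "malabsorption", "growth failure"), ("had", "he", "growth failure")],
)
print(paragraph)
# Malabsorption can cause growth failure. He had growth failure. Thus he could have cystic fibrosis.
```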
The end!
Some future work
• Modify GenIE’s English generator to use an existing biomedical ontology (e.g., UMLS)
• Risk communication for low-numeracy patients
• Intelligent Learning Environments for Scientific Argumentation
   • Students will debate with GenIE about cases in genomic medicine
Acknowledgments
Graduate and undergraduate research assistants: Tami
Britt, Karen Jirak, Darryl Keeter, Carmen Navarro-Luzón,
Zach Todd, David Waizenegger, Xuegong Xin, Rachael
Dwight, K. Navoraphan, Brian Stadler, Carolyn McCann.
Genetic counselors outside of UNCG, as well as students,
faculty and staff of the UNCG Genetic Counseling MS
Program and the UNCG Center for Biotechnology,
Genomics and Health Research
This material is based upon work supported by the National Science
Foundation under CAREER Award No. 0132821
Thank you!
Some Related Papers on GenIE
Green, N. 2005a. A Bayesian network coding scheme for annotating
biomedical information presented to genetic counseling clients.
Journal of Biomedical Informatics 38: 130-144.
Green, N. 2010. Representation of argumentation in text with
Rhetorical Structure Theory. Argumentation 24(2): 181-196.
Green, N. et al. 2011. Natural language generation of biomedical
argumentation for lay audiences. Argument and Computation 2(1):
23-50.