>> Lucy Van der Wende: Good morning everyone. My name is Lucy Van der Wende and I
would like to introduce Wendy Chapman who is going to be giving the talk this morning. Thank
you very much for coming to Seattle when it's living up to its reputation for being rainy. Wendy
is the chair of the Biomedical Informatics Department at the University of Utah and her work
lies at the intersection of clinical research, NLP, and human computer interface, or interaction,
HCI. She's really the leading figure in biomedical informatics. Wei Fung and
I met with her. We have known of Wendy for a long time, but we met at the NLP workshop
that was convened at the Veterans Administration Department where Wendy is an affiliate
faculty. As some of you may know, the Veterans Administration has one of the largest and
most comprehensive electronic medical records systems. It has pioneered the Million Veteran
Program, where it has sought consent for research to sequence the genomes and access the
records, the EMR records, for a million veterans, which is a unique resource, and Wendy knows
about this and hopefully will tell us more about data mining, information extraction, and how we
can use that information to improve the outcome for our patients. Thank you.
>> Wendy Chapman: It's such a pleasure to be here. I really appreciate the opportunity. So it
is rainy here and when I left Salt Lake City it was 36 degrees and sunny and when I came here
was 46 and cloudy and felt a lot colder, that humidity. But there's been a big snowstorm there,
so the university is all but closed down today. My husband biked to work on a fat bike, so it was
very difficult in the snow and he sent me pictures so I've received a lot of texts and phone calls
about getting into the office today.
>>: They got their skis open and ready now?
>> Wendy Chapman: If you had your skis, but it's all uphill there. Going down would be fine.
I'm really happy to be here and my focus in research is natural language processing, in
particular, information extraction. That's what my talk will be focused on. I'll give you a little
bit of background. I talked with Wei Fung. I lived in Hong Kong and learned Cantonese and so
that's really how I got this love of language. I went back to the University of Utah and I studied
linguistics and Mandarin Chinese for my bachelor's degree. In between, I went to the
University of Wisconsin to study Chinese literature. That's what I was going to get my PhD in
and I was admitted but didn't have funding. I waited another year, still no funding, and in the
meantime my husband, who Lucy met at a Johns Hopkins workshop, found medical informatics
and wanted to do that. He was moving from electrical engineering. And then I saw that they
do natural language processing and I thought that seems like a really nice way to apply my love
of linguistics to something practical. So I signed up, got let in on probation, and the rest is all
history. I fell in love with the field. I went to Pittsburgh to do a postdoc for three years and
then stayed on there as faculty, so I was in Pittsburgh for 10 years. I moved to UC San Diego in
my attempt to get back to the West and then this opportunity came open at the University of
Utah for the chair position. It was a big career switch for me. I still do research, but only about three
fourths of a day a week. I have a really great team that's moving forward well, but I'm not in
the details as much as I would like to be. I want to give a little bit of context about healthcare in
the United States. Here we have a graph. The x-axis is the amount of spending, the total
expenditure on healthcare per capita, and the y-axis is life expectancy. You can see
that we spend way more than anybody else and our life expectancy is nothing to brag about.
So the U.S. healthcare right now is in crisis. That means that the economy is in crisis because
our healthcare spending is such a huge part of our economy. Some people see crisis as an
opportunity. That's what I'm learning in all of my leadership classes. [laughter]. And it really
is an opportunity to transform healthcare, because the way healthcare
has been run is so awful. The incentives have really benefited hospitals and doctors, but they
haven't benefited patients. And I think in the next ten years we are going to see a whole
different world where it's really patient centered and so it really is an opportunity and the
pressure is from the finances. There is a big movement that we have a lot of data now. There's
a lot of digital data and so we need to be learning from that data. Every patient that comes in
needs to be learned from, so that we know what are the better treatments and what
treatments don't work, and that ties in with all the big data science and big data analytics. Here's an
article in JAMA about how academic health centers are really at risk right now because they're more
expensive. Why is someone going to pay more to go to an academic health center rather than
to a community health center? And really what they see is that the advantage of an academic
health center is the research, that if you can translate that research and if you can take all the
knowledge that you have and apply it to the data and learn from it and create better practices
and implement them, then you can really have an edge. And that's the only way that we're
going to really get to where we want to go. I would say then that healthcare transformation
needs natural language processing because a lot of the information in the electronic medical
record is in text. I'll give you a couple of examples. First of all, in clinical decision support there
is a system called the antibiotic assistant that's implemented at Intermountain Healthcare and
it monitors in the background a patient's temperature, white blood cell count and all these
different variables to determine whether or not the patient develops a new infection in the
hospital. If they do develop a new infection it will alert the physician and it will say we think
your patient has an infection. Here's why. Here's the evidence. Here's the dose and the type
of antibiotic we think they should take based on their insurance and allergies et cetera. It's a
very popular program and has saved a lot of money and a lot of lives. It needs information from
the chest x-ray report to be able to say does the patient have an infiltrate that is indicative of
pneumonia. Another area, on the clinical research front, is readmissions to the hospital. It costs
the Medicare program a lot of money. And a huge portion of that is potentially preventable.
We can prevent these readmissions. There are a lot of readmission models that are being
created and they are created from ICD codes, discharge diagnoses, lab values, these coded data
that are in databases. But they don't have very good predictive value. Some people are
hypothesizing if we can get information out of the text, which includes more detailed
symptoms, but also social risk factors that really make that patient at risk for not taking care of
themselves, we might be able to improve our prediction of readmission. And the social risk
factors are things like do they have a stable housing situation? Are they abusing substances?
What are their living conditions? Do they have social support at home? Can they bathe
themselves? These kinds of things really affect whether a patient is going to care for their
wound and take their medications. And these are all described in various ways in the text. So
healthcare transformation needs natural language processing and, indeed, about 70 percent of
the clinical data that we are interested in is locked inside this text. Jeff Hamerbacher who is the
founder and chief scientist of Cloudera and was at Facebook previously, said the best minds of
my generation are thinking about how to make people click ads. That sucks. So I'm here to say
there are some great minds here. We need some of that brainpower to transform healthcare.
My objectives are first to convince you, because we already have two people here who are
already at Microsoft and working in natural language processing in the medical domain. But
out of a lot of researchers at Microsoft, let's get more people working on this area. And then to
talk about what do we need to do to make it a product and make it effective and be used, and
so focusing on some informatics principles where we are not just trying to improve scores a
little bit at a time on little parts, but how do we really build something that is usable? I don't
have the answer. I can just point out some principles and some problems. People know about
natural language processing more now because of Jeopardy. And there was a big follow-up in
Computerworld. I just love this article. It could very well herald a whole new era in medicine.
Like no one had ever thought of applying computers to medicine until Jeopardy. And so now
there's more knowledge about what we can do. People have been working on natural language
processing applied to clinical reports in a variety of different domains. And in these focused
areas we can build systems that perform as well as people. But clinical NLP has been a research
focus since the 1960s, so why do we still not have an NLP system in every hospital? Why are we
not just annotating, automatically annotating all of the data that's coming through and storing
that in coded form? There are a few barriers, I think, that have put us way behind the other
NLP, the general NLP field. The main one is getting the data. Sharing clinical data is just so
difficult. We haven't had shared data sets for development and evaluation, and when we try to
adapt modules that are trained on general English, they just don't work as well. We haven't
had standard conventions for annotations and so everyone creates their own annotated
corpora and nobody can share, so it's one person at a time and a lot of repetition. In the
past there wasn't a lot of collaboration in NLP. There were a few people that were the main
NLP people but there wasn't a lot of collaboration. I would say over the past five years these
things have changed to large extents. We've developed resources that are shareable. We've
created common schemas that people use and there's a lot more collaboration going on. But
it's slow progress. But to me, the biggest barrier is that what we build as NLP researchers is just
so far upstream from what people need that there is just a huge gap. I would claim that if we
want to have impact we have to go beyond improving our accuracy of the individual tools to
creating things that can be applied to real world problems. I want to talk about three
informatics principles that I think we could apply to NLP that can help in this area. First of all, to
be application driven. Second, think about user centered design and third pay attention to
standards. I'll go through each one of those in a little bit in detail. When you run an NLP
system and an information extraction system on a sentence like no family history of colon
cancer, there is a pipeline with all different types of NLP tools and you break up the sentence
into its syntactic parts and you assign semantic values like maps to this vocabulary item and it's
negated. That's the output and it's very important and it's hard to get that output and get it to
be accurate. But what the users want is not show me all of the UMLS concepts in the text and
tell me if they are negated. They want to know how do I improve, was my colonoscopy exam
high-quality and if not, why? Find patients with cravisnosis [phonetic] so that I can see whether
medication works or surgery. They definitely want to know how do I get higher billing codes?
[laughter]. That's one area where industry has jumped in and really helped out, because there
is a business case, right? How do I spend less time documenting? And how do I find all the
information that people have already documented? I can't find what I'm looking for. There's
too much information. How do we help patients understand the reports? So these are the
types of applications that people want. So there is this big gap between the NLP output and
these applications. And NLP researchers might not be the ones driving these applications, but
they need to be involved and partnering on those. The difficult part though is how do you be
application driven and still develop general-purpose tools, because we can build an application
for one particular thing and then when we want to build it for another diagnosis we start from
scratch. And so it's really finding that balance between being application driven and being
general-purpose. To do that we really need this strong partnership with domain experts who
have the insight about what the data are needed for. An example of that would be if you're
going to create a knowledge base for cough… I started this work when I was a postdoc and I
went to the National Library of Medicine and I said I'm going to build something to find out if
there's respiratory findings. So first I'm just going to find all of the UMLS concepts that map to
the concepts that I care about. I just had no idea that there were 20 UMLS concepts
for cough. I sometimes wouldn't find them all either. So what do you mean by cough? We
have to explicitly model what we mean by cough. When we look at things like I want to find
patients with fever, it's not just looking for the words fever and febrile; there are
attribute-value pairs that have to be found. Those depend on the application that you are building.
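As a rough illustration of those attribute-value pairs, here is a hypothetical sketch (not the actual IE-Viz or Knowledge Author format) of a fever variable whose threshold is a parameter of the definition rather than something hardcoded in the extraction logic:

```python
# Hypothetical sketch: a "fever" variable defined as attribute-value pairs, with the
# temperature threshold supplied per application instead of being hardcoded.
FEVER_DEFINITION = {
    "concept": "fever",
    "lexical_variants": ["fever", "febrile", "pyrexia"],
    "attributes": {
        "temperature": {"unit": "C", "threshold": 38.0},  # another application might set 37.0
        "negation": "affirmed",      # only affirmed mentions count
        "experiencer": "patient",    # not family members
        "temporality": "current",    # not historical fevers
    },
}

def meets_definition(mention_text, measured_temp_c, definition=FEVER_DEFINITION):
    """Return True if an extracted mention satisfies this application's fever definition."""
    has_term = any(term in mention_text.lower() for term in definition["lexical_variants"])
    over_threshold = (measured_temp_c is not None and
                      measured_temp_c >= definition["attributes"]["temperature"]["threshold"])
    return has_term or over_threshold

print(meets_definition("patient remains febrile overnight", None))  # True (lexical match)
print(meets_definition("temperature this morning", 38.4))           # True (threshold match)
```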
So how do you represent your knowledge in a way so that for this application they have the
threshold at 38 and in another application they might have it at 37. How do you not just
hardcode everything you are doing for every single application? That is not scalable. The
second principle is to be user centered. We need to be able to support users and so we have to
think about the way their brains work and then we have to fit it into the workflow because it's
not an individual taking care of a patient; it's a team and there's this whole workflow. And how
does this information fit into the workflow? We first build accurate tools, but then beyond that
they have to be useful, and beyond that they have to be scalable and deployable. We have
spent a lot of time focusing on the accuracy, but not on the other parts. I think
that to really succeed in healthcare and a lot of other domains, but healthcare is complicated,
the technical is just one tiny part and there are all these spheres of the clinical and sociological,
political and commercial spheres that we have to be aware of to be able to really build the tools
that people need. And finally paying attention to standards. We need to better leverage
existing resources so that what we build is interoperable. There are vocabularies and we do
that in large part with vocabularies, but information modeling, we don't only want to know that
this maps to a certain vocabulary item. We need to know the context of it. We need to know
for blood pressure, we need to know what position was the patient in? What was used to take
the blood pressure? There is all of this metadata that goes with
that concept or that action that is important for interpretation. As NLP developers, we need to
be extracting all of that type of information and modeling it. And then how do we model it in a
way that we can use it in different EMRs and different settings? How does this all relate to NLP
researchers? I would say that a lot of NLP research problems that we work on are really far
upstream from the healthcare applications. But there are a lot of new interesting NLP research
problems that arise when you are working on user driven development types of applications.
So it's not like you're abandoning research and saying oh I'm just going to be applied. There are
many research problems that come up when you're trying to build things that people use. I
want to talk a little bit about the work that we're doing to try to bridge this gap in our lab and
it's one small part of the world. Our lab is the Biomedical Language Understanding Lab. That's
an old slide. It still says University of California San Diego. We hadn't moved our website, but
we did move it now so we need to replace that slide. So we are building a toolkit and this is
funded by the VA, called IE-Viz, information extraction and visualization, and it's a workbench
to help people, to help domain experts and NLP experts collaborate and build applications that
are useful while taking advantage of existing NLP tools. It has four parts to it. First you need to
create your knowledge base about what you are trying to represent and what you want to
extract. Next you need to create NLP tools. By create, there are a lot of NLP tools out there
and you apply things that already exist and compare them and use the Watson model of lots of
different evidence coming in to develop the best tool for each thing you are trying to extract.
Oftentimes extraction is part of the problem, but sometimes you only need a classifier.
Sometimes you need a classifier on top after the extraction, so how do we help people build
classifiers that integrate knowledge from the NLP that is beyond bag of words? And then, and
we haven't really gotten much to this part with the visualization, but they don't want an XML
file with a bunch of concepts marked. They want a graph or they want a timeline or something
like that. So how do we help them build that from the NLP output? The first step then is
knowledge authoring. We've developed two ontologies, a domain schema ontology and a
modifier ontology. The domain schema is a linguistic representation of the clinical
elements that can be described in text. The modifier ontology tells which modifiers are
allowable by each of those elements, so that when you extract the information from a sentence
like this, you know here is the disease. Who experienced it, was it negated or not? Was it
historical or not? And so it's an information model around the concept of cancer. So these
ontologies, and when I say the word ontology, they are not ontologies like the kind of
ontologies that represent reality. They're representing information in text, so they are lexical
resource ontologies. Here we have the different elements that you can describe in a
report. There are entities, like a person, and there are events. Most things that you describe
are events, like allergies; problems, which are diagnoses; findings, et cetera; vital signs. Those
are the types of things that you might see described in the clinical report that you're interested
in extracting. They can have relationships with each other. One finding can be evidence of a
diagnosis. Medication can treat a disease, and so on. We can model those relationships and
then those elements, the modifiers are very important for understanding what is going on
when you describe something in the text. You can see the word pneumonia in all three of these
sentences and if all you're looking for is pneumonia you're going to misinterpret it because
every one of those has a different interpretation. Knowing which modifiers are allowable for
each of those is very important, so that's what the modifier ontology is. What it is: it
started out as the NegEx knowledge base, stored in OWL format, and then ConText, which is an
algorithm that we developed, extended from there, and then we added more and more
modifiers, so it's kind of an extension from that. It has different types of modifiers. Like does
something exist, and it definitely exists or is there uncertainty about the existence? Is it talking
about the future? Is it talking about the past? Is it an indication for the exam? Et cetera. But it also is a
lexicon, and so it has linguistic expressions that we've seen that indicate each one. So for historical:
again noted, previous, changing; those are things that indicate that something happened in the
past. It has actions, because the scope of the modifier is important, and sometimes the scope,
most of the time the scope goes forward, like no tumor, but sometimes it goes backwards, like
tumor free. And so everything you need to run NegEx or ConText is encoded in here: the
direction that it goes, and then it's translated into some languages, Swedish, German and
French right now. And so a lot of people have written papers about applying the algorithm with
these terms to different languages and how well it transfers. The schema ontology
imports the modifier ontology. If you have a medication event then the modifier ontology says
you also need to know, you can know the type, the dose, the frequency and the route of the
medication. If you have a diagnosis event then it can have severity. It can have the history.
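To make the scope idea above concrete, here is a toy sketch of that NegEx/ConText-style behavior; it illustrates the approach only and is not the actual pyConText implementation:

```python
# Toy illustration of the NegEx/ConText idea: modifier triggers carry a scope
# direction, so "no tumor" (forward scope) and "tumor free" (backward scope)
# are both recognized as negated mentions of "tumor".
TRIGGERS = [
    {"literal": "no",   "category": "negated_existence", "direction": "forward"},
    {"literal": "free", "category": "negated_existence", "direction": "backward"},
]

def assertion_status(tokens, target):
    """Return 'negated' if a trigger's scope covers the target token, else 'affirmed'."""
    if target not in tokens:
        return "not mentioned"
    t = tokens.index(target)
    for i, tok in enumerate(tokens):
        for trig in TRIGGERS:
            if tok == trig["literal"]:
                covered = i < t if trig["direction"] == "forward" else i > t
                if covered:
                    return "negated"
    return "affirmed"

print(assertion_status("no tumor seen".split(), "tumor"))       # negated (forward scope)
print(assertion_status("tumor free margins".split(), "tumor"))  # negated (backward scope)
print(assertion_status("large tumor noted".split(), "tumor"))   # affirmed
```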
And we've developed this from models that have been built out of the SHARP project. Some
people here are familiar with the SHARP project. We've built on cTAKES and its common type
system, and that was the basis of our ontologies, and we've extended beyond there to map to
information models in the clinical world. They are mapped to FHIR now for people who are
interested in HL7 FHIR. What these allow the user to do is to create a domain ontology that's
used for natural language processing. A domain ontology would be an instance of the schema
ontology and it would represent the linguistic information that you want to know about clinical
elements in a particular domain. So if you are working on pneumonia, for instance, then
what are all the concepts you care about that indicate pneumonia? And then you will use that
as the knowledge base for the natural language processing system and the target output. I'll
give you an example. Here is a domain ontology for pneumonia and it's just the very beginning
of one and so we see under diagnosis that we have altered mental status, heart condition, from
condition and pneumonia. Those are four instances of the class diagnosis, but in there then we
have the whole lexicon. What are the synonyms for pneumonia? What are the misspellings?
What are the regular expressions? If there were numeric values, like if there is a fever, you would
have numeric values to go along with it. And so you can explicitly define what you are looking
for. And the goal behind this is to create these potentially open shareable knowledge
representation modules that people could borrow. And so when someone else is looking for
pneumonia they don't have to start from scratch. I mean how many of us have written about
pneumonia? Several of us in this room. But if someone else is going to work on pneumonia
with NLP they start from scratch and build up their lexical expressions and their synonyms.
Wouldn't it be nice if you could go to this library and you could say here are definitions of
pneumonia that people have had. You could borrow and tweak and kind of customize to start
with what somebody else has already done.
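As a hypothetical sketch of what such a borrowable module might hold (illustrative only; the CUI, terms, and field names are not taken from the actual system):

```python
# Hypothetical sketch of one shareable domain-ontology entry: an instance of the
# Diagnosis schema class carrying its own lexicon, so another group could borrow
# and tweak it instead of starting from scratch.
PNEUMONIA_MODULE = {
    "schema_class": "Diagnosis",
    "preferred_label": "pneumonia",
    "umls_cuis": ["C0032285"],                 # example CUI; verify against the UMLS before reuse
    "synonyms": ["pneumonia", "pneumonitis", "lung infection"],
    "misspellings": ["pnemonia", "pneumonai"], # collected from local corpora
    "regexes": [r"\bpna\b"],                   # common clinical shorthand
    "allowable_modifiers": ["negation", "uncertainty", "temporality",
                            "experiencer", "severity"],
}
```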
>>: This happens sometime in my [indiscernible]. Do you think there would be a use case
where it happens that the [indiscernible] and you actually want to capture the [inaudible].
>> Wendy Chapman: Like the regular expressions?
>>: Yeah. Like maybe a NegEx in one case and then [indiscernible] or something [inaudible].
>> Wendy Chapman: Part of addressing the ambiguity is specifying the modifiers. The word
sense disambiguation that might occur isn't addressed in this way. That's definitely something
that would have to be on top of this. We've built a front-end interface, because using Protégé is
not natural for everybody. It's called Knowledge Author, and on the back end you have the two
ontologies that the user doesn't have to know anything about. It's just what's driving the
questions you ask the user. And the output is a domain ontology and a schema for NLP
systems. For instance, you might want to create a variable called African-American adult. You
would create a person role, a person and you can define their age. You can define their gender,
their race, their death date, birthday, all the attributes and the modifiers that occur with a
person. You can create very specific variables that you're trying to extract. That's where the
difference between a general NLP system that's trying to output a concept like cough and a
specific… You know, in one study you might want productive cough. In another study you
might want severe productive cough. In a different study you might want mild productive
cough. In another study you might want that they don't have a cough. So all of those modifiers are
particular to applications that a single group or person is interested in and we want to be able
to model all of that and that's where the information modeling comes in. It gives you the
power to create exactly what you are trying to filter on. Maybe what you're looking for is
patients who are taking ibuprofen. And so when you type in ibuprofen it will map to the
UMLS and you can select the concepts that you want and when you select them, now all the
synonyms and acronyms that are stored in that knowledgebase become part of your knowledge
base. But you might not just be interested in the mention of ibuprofen. You want to know that
they are taking ibuprofen orally. With the modifier of form, you can say I only want things that
are oral and so you are building these very specific variables. No family history of colon cancer,
to a physician this is one variable. They have no family history of colon cancer, but in NLP, this
is a lot of different parts. It's cancer; that's the concept. It's the anatomical location of colon.
It's occurred in the past. It's talking about the past history for a family member and it didn't
occur. So that's a lot of things that the NLP system has to output and so what we did is split
them out into these kinds of linguistic variables like negation. Who experienced it and is it in
the past, current or future and those apply to all the different schema elements. But then for
cancer, which is a disease, then you can also look at severity and other things like that. We did
a lot of user studies to try to figure out how to model this in a way that we could get the
domain experts to understand it, because it seemed so simple from an NLP point of view: what
is the negation? But to them it's just one concept. And then we have also worked on how do
we suggest synonyms to them because you can only think of so many synonyms. And then
different algorithms for mining text, bringing forward synonyms and letting them select them.
There are a lot of different research questions that we have addressed in the area of knowledge
authoring. Which modifiers are important? The modifier ontology has way too many, and no one
would want to use all of those modifiers. It's every possible modifier that has been used in any
clinical modeling. How well do people agree when they annotate them? Because if you can't
get people to agree, it's very difficult to create a system that can do it. Some of them are very
difficult to get agreement on, like uncertainty. How can we learn the terms? And that's what
we have been talking with Wei Fung about, can we mine text and bootstrap and help learn
these terms and can we suggest other types of things like medications or treatments that might
indicate the patient has that disease when the text doesn't explicitly say it. So lots of fun
research questions there and many of them unanswered. Once you create a domain ontology,
now you have the opportunity, you have all of your knowledge explicitly defined. You can now
start running some NLP tools over your text and see what pops up. That's the NLP
customization part of the workbench. Our vision is that there would be a lot of different NLP
tools that you have access to. Why limit yourself? There are so many different tools available
and some might perform better on some variables and others on others. If you can set up kind
of a customization loop where you have the knowledge explicitly defined, now you can run
several different tools over the text using that knowledge, bring it back and show it to the user or have
some kind of gold standard. It's kind of an iterative thing where they are correcting it. They are
marking things that are wrong. It's learning and over time it is kind of selecting and optimizing
the best combination of tools for each different variable. In theory, there's no reason why you
can't have one tool for five of the variables that you are looking for and a different
tool for three of them and another one for another six, especially if you start simple.
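A hypothetical sketch of that per-variable selection, with made-up tool names and data shapes, could look like this:

```python
# Hypothetical sketch of the customization loop: score several candidate extractors
# per variable against reviewer-corrected examples and keep the best one for each.
def pick_best_tools(variables, candidate_tools, feedback):
    """feedback[var] is a list of (document, expected_value) pairs from user corrections."""
    best = {}
    for var in variables:
        scores = {}
        for name, extract in candidate_tools.items():
            hits = sum(1 for doc, expected in feedback[var] if extract(doc, var) == expected)
            scores[name] = hits / max(len(feedback[var]), 1)
        best[var] = max(scores, key=scores.get)
    return best

# e.g. simple keyword matching plus negation might win for "cough", while a
# trained classifier wins for something harder like "uncertainty".
```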
Sometimes keyword matching is good enough for some things when you add negation. Other
things, you might need a machine learning classifier. Other things you might need some text,
but why use the more sophisticated things on the things that you can get in an easier way? So
that's the way to help users interact with the system and develop those. But to do that you really
need some good tools for the user to interact with. We've spent quite a bit of time developing
some different tools and it still doesn't feel like there's one tool. It feels like different tools
have different strengths. This is the evaluation workbench, where you read in two annotated
sets. It might be that one's a gold standard and one's your system. It might be two different
systems. But you're able to compare them against each other now and if you consider one of
them right and one of them the gold standard, then you can look at the false positives and false
negatives and this allows you to drill down and see not only the extracted named entities but
also all of their attributes. You can look at the attributes and really do an error analysis to see
where your system screwed up compared to the gold standard. That's one of the tools that we
developed. This is another tool that we've been developing with colleagues at the University of
Pittsburgh. Lucy probably knows Jan Wiebe and Rebecca Hwa. This is a visualization tool
where the table here are the variables that you created in the domain ontology and they are
binary. They are true or false. Did they occur? And you give it a couple of training examples
and now it learns to annotate those. It's a classifier and now you can start to drill down and
give you feedback and highlight text that shows no. You are wrong on this and here is how I
know. And so it has things like this word tree. You're looking at the word biopsy and it will
show all of the words that occur with biopsy before and after biopsy and whether they were
true or false in the text and as you click on them it takes you to the text. So it is this interactive
way to really try to understand what the system is doing, mark the evidence that it's wrong and
retrain. Some questions that we are addressing in this area, how do you really use these
domain ontologies and different tools? So we are writing APIs for different types of tools like
cTAKES, our own tools, pyConText. Which methods work best for which types of concepts? How
do we incorporate the feedback? That's a big research question. We have done it on the
machine learning and had one or two publications on that, but what about rule-based systems?
How do we incorporate feedback from users on that? And how do we suggest changes? So
there is lots of research in that area. Sometimes you need a classifier and oftentimes you need
it after the NLP. So you get all of this NLP evidence and now you need to determine for the
report or for the patient, what is the value of the variable? Did the patient have pneumonia, for
instance? So we developed a tool called TextVect which is based on the idea that you already
have a training set at the document level, for instance. Now, you want to create a classifier and
typically you use Weka or Mallet, and there are all kinds of different tools that you can use, or you
write your own, and they use n-grams. But there is evidence from NLP systems that could
help improve the classification performance, but people building these classifiers typically
aren't familiar with the NLP literature and tools and they have to go read up and find out what
they all are and install them and get them going. The idea behind this is it is a UIMA
pipeline. It has a bunch of different NLP tools already in it, part of speech taggers, cTAKES,
negation et cetera, and now you can select what kinds of features you want to use in your
classifier. You can select the representation you want, whether it's binary, count, or tf-idf, and
then you run it and the output is a vector that now you can train your classifiers on. You can do
some of the training inside here. We evaluated that on the i2b2 training set and showed that it
can perform almost as well as the best systems just out-of-the-box by using these default tools
that we have installed. So still questions, what's the best representation of these complicated
NLP features? What features are more useful? And what type of NLP tools can really help in
classifier developments? One thing that people found is mapping to concepts can be helpful
because now words like shortness of breath become the concept shortness of breath and not
shortness of and breath and so it can reduce your feature space for one thing. But negation can
be very important, so differentiating between shortness of breath and no shortness of breath.
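A hypothetical sketch of that idea, using concept annotations with a negation flag as classifier features (illustrative only, not the actual TextVect feature set):

```python
# Hypothetical sketch: build classifier features from NLP output (concepts plus
# assertion status) instead of raw n-grams alone, so "no shortness of breath"
# and "shortness of breath" become distinct features.
def concept_features(annotations):
    """annotations: dicts like {"concept": "shortness_of_breath", "negated": True}."""
    features = {}
    for ann in annotations:
        key = ("NEG_" if ann.get("negated") else "") + ann["concept"]
        features[key] = features.get(key, 0) + 1   # count representation; could be binary or tf-idf
    return features

print(concept_features([{"concept": "shortness_of_breath", "negated": True},
                        {"concept": "cough", "negated": False}]))
# {'NEG_shortness_of_breath': 1, 'cough': 1}
```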
We've looked at history and family experience and those we haven't found as strong of a need
to mark them in your feature vector, but there's lots of… If you create these different
information models to represent no family history of colon cancer, how do you represent that
in your machine learning feature? Finally, the visualization. People, like I said, don't want an
XML file. The domain expert is working with you because they want to create, for instance,
they want to create a dashboard to find all of the patients who have some kidney infection and
they want to be able to see, oh, that patient has a kidney infection. I need to pay attention to
them or call them or whatever. Or they want to create a time line and see what's happened
over the patient's history. How do we help people create visualizations? And the vision behind
this is that just like Excel, if you have your data in certain form you should be able to render it
as a table or a bar chart or a pie chart the same way. If you have the text annotated you should
be able to render it in a lot of different visual ways depending on what you are interested in.
There are a lot of libraries out there like D3 and others to help build those types of
visualizations. Some of the visualizations that we found people are interested in with our
collaborators would be like a population view. I want to look for patients who have pneumonia
and there is such a big set of patients to look at, where do I start? If I could cluster them in
ways that are similar, now I can focus on a particular cluster of patients and only look at those
first. Or there's kind of an EMR-centric view where you are looking at the patient, and if you are
doing chart review on a patient then you want to find a patient that has these symptoms or
physical exam or labs that are indicative of pneumonia. Could you show in one glance the
positive, negative and uncertain evidence which is marked with colors there for the different
features in your domain ontology? And now as they look through that they can click on them
and it shows the text that it came from so that they can see the evidence and just more quickly
peruse the whole patient's record instead of having to search through one document at a time.
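A hypothetical sketch of grouping those extracted mentions by feature and assertion status, so a front end can color-code them and link back to the source text (illustrative only):

```python
# Hypothetical sketch of the patient-level evidence view: group extracted mentions
# by domain-ontology feature and by assertion status, keeping a pointer back to the
# source document so a click can show the original sentence.
from collections import defaultdict

def evidence_view(annotations):
    """annotations: dicts like {"feature": "fever", "status": "positive",
    "doc_id": "note_12", "sentence": "T 38.9 this morning"}."""
    view = defaultdict(lambda: {"positive": [], "negative": [], "uncertain": []})
    for ann in annotations:
        view[ann["feature"]][ann["status"]].append((ann["doc_id"], ann["sentence"]))
    return dict(view)

# A front end could color-code the three buckets and open the source sentence on click.
```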
Maybe you're looking for a particular case definition. Does this patient fit this case definition in
the chart review? This is often the case. Does this patient fit the CDC's definition of
pneumonia? Here's a diagram of the CDC's definition and you could link the evidence from the
text to the items in the diagram and they can just more quickly go through the diagram and
look at the text and make that conclusion. Timelines are another area that has always been
important but it is so difficult, so we did a little bit of research on this. How do you build a
timeline that is useful from text? Because consider the scenario where you take over a new
patient or you come to the hospital and you've been off for one or two days and now you are
taking over, and you want to say what's happened with this patient over the last two days? The way it
currently works is you take about three minutes and you look through the most recent reports and
things and that's all you get. But what if you could just summarize everything over time and the
things that were really of interest you could drill down on and look at the text? We started on
this and we built some really cool tools to help users drag over and create timelines. The hard
part for us is not the NLP so much, it's how you put information together. There are a lot of
annotations in a report from NLP system and you can't put all or 100 or 150 of them on this
timeline. Some of them need to go together, like they had a chest x-ray. The chest x-ray
showed this. Those things need to be clustered in one place. It doesn't make sense if they are
in different places on the timeline. And so the cognitive part of what information should be put
together is where we didn't get past that part. Yeah?
>>: [indiscernible] the timeline texts, the sequencing of the texts that represents the timeline,
that assumption must prove to be false. You would actually have to mine it.
>> Wendy Chapman: Say that a little louder.
>>: Presumably you can't read the timeline directly off the ordering of the information in the
text. That there would be backwards looking information [indiscernible] that he was admitted
to some other hospital three weeks earlier or something of that nature.
>> Wendy Chapman: You could, in the sense that this report came sooner, so it is less far away than
this other report, which was two weeks ago. But it's pretty close. But within the report they
are going to talk about things that happened years ago and they're going to talk about things
that are hypothetical or in the future. And so within the report you have to be able to
understand the relation of those items to the time the report was dictated. Okay. So some
things, like I said, we have just done preliminary work on these things, but what are the
visualizations that people want? And can we increase people's efficiency in looking at the chart
while not hurting their accuracy? Also in important areas like cognitive bias, if you are looking
at a patient to determine whether to treat them for pneumonia and so you start looking back
on the different evidence and you see some positive evidence for pneumonia, then as a human
you are going to have this cognitive bias to say they have pneumonia because they had a cough.
And now you are going to probably ignore negative evidence and not look for things that will go
against your own hypothesis. And you think about politics, we all do that. And we just feed
ourselves with things we already believe. So how do we guard against that? If we could
point out to them contradictory evidence and point out to them evidence that is ambiguous or
uncertain, then we might help them make better decisions. We've done a lot of work on, like
what are the things you might point out? What indicates uncertainty? How do they
linguistically express uncertainty and how does that affect what people see? We are in the
middle of analyzing some data on that. Yeah?
>>: [indiscernible] so you mentioned there are two ways that things could be uncertain. It
could be that the doctor saying [indiscernible] or saying uncertain. So the author could be
based on background knowledge this thing might indicate something other than [indiscernible].
The symptom also suggests, like if you consider multiple diseases together, then you could
potentially explain away some of the evidence.
>> Wendy Chapman: So you are saying that if there are multiple…
>>: If you consider that only any cause for symptoms of pneumonia remind me some other
symptom that could indicate another disease, so when you saw the symptom and you say oh,
this cannot be pneumonia. This must be something else.
>> Wendy Chapman: Yeah. When we do, when you think about pre-annotating text to help
people review it, we typically think of marking the positive evidence. We don't think of marking
the things that oppose that. But we really want them to look at that and so if there are other
diagnoses that they mention that are contradictory or findings that might compete with the
findings that indicate pneumonia then you would want to point those out. In our study we
focused on pneumonia first. We looked at, we had physicians mark, like not radiologists but
physicians mark radiology reports and the other clinical reports and say mark everything that
supports pneumonia, refutes pneumonia or causes uncertainty in your mind. We had seven
people and they were very different on what they thought caused uncertainty. If someone said
this supports and someone said this refutes or someone said this supports and someone said
this is uncertain, then we called that uncertain because it causes disagreement too. And then
we used that information to analyze it linguistically and about 20 percent of it is words like
could be or might be, but the other 80 percent is the particular finding. Like they mention
atelectasis. That's competing with pneumonia. Or they say this is an opacity consistent with
pneumonia, so they are linking a finding with a diagnosis, but the reason they're linking is there
is kind of uncertainty there and so… Yep?
>>: [indiscernible] without your visualizations, the way a doctor's writing the medical record
he's kind of assuming that a doctor is going to read the text that was written. So in a way it's
like a visualization. I'm kind of curious if the doctors write differently if it's going to
be read by another person; likely they add extra terms or order it in a certain way. Whereas if
they knew that it was then going to go to a system and be digitized, I wonder if they would
change the way they even write because they could give more detail or write in different order.
>> Wendy Chapman: That's a really good point. And I think sometimes they do think about the
reader and they're really trying to lay out for the reader. And other times they are just more
worried, they think really the purpose of it is just to protect their butts and for documentation
and billing. And so they are copying and pasting and they are putting these huge things in that
are so unreadable and duplicative and not right and they still are doing it because they feel like
no one is going to look at this and it's just a waste of my time. So their intent and what they
think it's going to be used for, I think, very much plays into how they organize it.
>>: [indiscernible] the system so they could just check some boxes and it shows a bunch of text
in there, or do you think they are actually just physically copying and pasting?
>> Wendy Chapman: They physically copy and paste. Oftentimes they'll use those reports,
they'll take a whole day to write the report and they will do a little bit at a time because they
are using it to help reason and make their own diagnosis. And then they are waiting for a lab
test. Now they get the results of the lab test. They go in and rather than type here's what the
lab test said, they just copy it and paste it in. Wouldn't it be nice if you had a pointer instead?
And there are all kinds of things and they have their own, depending on the hospital, their own
templates, or their own macros that they can put in that they have made up. Yeah?
>>: I'm a physician and I've been a physician for 35 years. I moved from doing things on paper
to doing things electronic, dictation. What I see happening in the vendor community right now
and the reason I applaud this work is that if anything the vendor community is trying to get
physicians totally off of free text and creating solutions that are an absolute nightmare of click
boxes and trying to codify everything into individual units. You mentioned copying and pasting.
If anything, the whole industry is coming down on that because of all of the errors it's
introducing in the medical record. Do you see, I see the research applications for this. My
concern is really what we're doing to the end-users in the clinical space. I am just seeing my
colleagues now just suffering. Things that used to take two seconds to do now take 10 minutes
to do and they are all complaining about it. So as you look at your work and what people in
your field are trying to do, what's the direction? Is it more towards, you know, reviewing these
large bodies of data from population health and sort of codifying it or how much of it is really
aimed at helping the end-user?
>> Wendy Chapman: Yeah. I think the end-user gets ignored. The people making the decisions
about what vendor systems to buy, et cetera, it's all based on they are going to improve their
billing. They are going to improve their population management which will cut down their
costs and things like that and the physicians and nurses they just get stuck. And so we have a
faculty member, Charlene Weir who is a cognitive psychologist and she is going around and she
is observing them as we installed this. She's observing what they are doing and making note of
what are the pain points, what's going well. And when we bring those to our CMIO he's like I
already know about those things. Don't tell me those things because there is nothing we can
do about them. And now you're just getting people's hopes up and they think we're going to
change and we are not because there is nothing we can do about it. So it's this very fatalistic
and they are just ah. So I think one of the big paradigm shifters that is coming is kind of the
smart on FHIR. Have people heard about this? This FHIR, this new interoperability standard,
FHIR, and it's, we could have a whole talk about that. But it opens up the opportunity that you
can say, for instance, in rheumatology, okay, dermatology, our dermatology chair is just so
angry about it because it has cut down the number of patients that they can see by almost a
third. So they have lost so much money and they just spend it all documenting. Because Epic
doesn't have a good dermatology module, they are not going to pay attention to dermatologists. It's a
very small market. So there are companies out there that create dermatology interfaces and
they can just go really fast and it's very intuitive and its visual, but it doesn't hook onto Epic. So
no way are we going to consider it. But with FHIR you can, if you can write to the same
specifications and if Epic supports it, now you can plug it into Epic, plug it into Cerner and you
can create these custom interfaces that really help the end-user. I see that as a hopeful area
in the future.
>>: Is your work really balanced sort of between both, the research context and the end-user experience?
>> Wendy Chapman: Yeah. Back to NLP. I think that NLP has to play a large role in that. I think
that there are some things that should be filled out in structured form, but there are a lot of
things that you just need the text. When you say the patient crawled to her mailbox to get her
mail, there is no checkbox for that and there is a lot of information that that tells you about, the
patient's motivation, their physical state. There are things, stories that need to be told in text
and the NLP needs to be there, and I see it as a really semi-automated thing.
>>: What we've done is we are forcing humans to change behaviors to suit the machine versus
the other way around.
>>: They are the most expensive piece of the whole system. Doctors have been practicing,
been in school for 12 years. Is there anything in here that is also speech, dictation with
automatic transcription, and then either feedback that says, especially when they are going into a
report area, it's much better if you get it at the beginning. You had called out I want to do a
[indiscernible]. We are not ready for doing it all on our own machines. They are not capable of
doing it. But have you done anything in terms of usability then how do you make it so that this
is a system technology, so they can be much more productive? Our new charter here at
Microsoft is to make everybody more productive blah blah blah. That didn't come out quite
right; did it?
>>: We've got you on record. [multiple speakers] [indiscernible]
>>: But I mean I think that a lot of advances can be made if we try to do a man-machine
collaborative thing.
>> Wendy Chapman: Creating this interactively: if you're in radiology or anything, if
you are doing speech at the same time, then you can be looking at completeness. You
mentioned this but you didn't mention this and we expect to see that so you can help with
completeness. You can ask questions. These kind of contradict. You can really help improve
the quality if people are willing and I don't know. I think speech is getting to that accuracy and
so I think there is more possibility for it.
>>: There's a whole lot of training you have to do to specialize in the space but yeah, in terms
of where we're going, it's getting much better than it was, but there is still the context and all
sorts of stuff has to go up.
>> Wendy Chapman: This is one reason why NLP and other medical type applications are so
slow. I mean, they have to understand a lot of domain knowledge and so that takes a lot of
customization, a lot of fine-tuning around a particular domain.
>>: There used to be a lot of dictation and transcriptions. Is that still going on?
>> Wendy Chapman: Yeah.
>>: So would step one for getting the domain knowledge be working with a transcriptionist to
make transcriptions more productive? I mean here's the transcription. Here's the dialogue.
Here's the everything going on, so doing a partial.
>> Wendy Chapman: It's not real-time. You lose that real-time opportunity.
>>: Agreed, but step one is…
>>: You have the opportunity to learn from the transcriptionist's corrections to the transcript.
>> Wendy Chapman: Right. That's a business model for M Modal. I don't know if you know
the M Modal company in Pittsburgh. They are a transcription service but they do it through
speech recognition. The transcriptionist changes it and then they run NLP and then the user
corrects it and they send back the coded document. It's a really nice model, but it's not real-time.
But it could get to real-time.
>>: You could get to real-time with advances in that. And you did say that it was M Modal?
>> Wendy Chapman: M Modal.
>>: But they are feeding back in those changes so they are building the learning system so that
hopefully you are getting more and more productive and the transcriptionist is faster and
better and less time is spent on each one with fewer errors.
>> Wendy Chapman: That's right. And that's another reason why industry and research really
need to work together on this, because the researchers have the idea about the bigger picture
about the workflow and the detailed kind of NLP things, but we are not going to build an
interactive speech recognition system for radiology that's really going to be deployable in the
next few years on our research grants. Maybe you guys can. Hopefully you guys can. Okay so
NLP is starting to show up in places, and the promise of electronic medical records, the promise of
natural language processing may be closer than ever. This was in JAMA which was amazing to
have an article about NLP and an editorial about NLP in JAMA. But it's around the corner and
that corner just doesn't feel like it's getting very much closer. But as we start to build
applications that will assist users, I think that we can get closer to that. And it's not throwing
out researchers. There are all kinds of research questions that have to be answered to really
get there so I say come join us in this quest. And I want to acknowledge my collaborators and
my lab and conclude with, do you guys know where this is? It's sand. That's White Sands
National Park in New Mexico. So thinking about context, you are probably thinking that I'm
from Utah. I just talked about a snowstorm. But it's not. It's sand in the summer. Thank you
everybody and I will take more questions. [applause].
>>: A question for you in medical devices, so you are now starting to advise a physician. The
transition from advising to practicing medicine and medical devices, at what point do you have
to start having the FDA and the whole approval process in your thinking?
>> Wendy Chapman: I think everyone is really worried about that. And I don't know the
answer. And it could change at any time, so I think right now it's okay. It's safe with the
support systems, but at some point it's going to flip. So I'm not sure. That's a good question.
>>: As soon as it gets to the point where the human, i.e. the clinician, is looking at the data to
render a diagnosis or making a decision. That's when the FDA starts getting a little excited.
>>: What I saw in the support was: we see this and we think this is the treatment plan. Here's the
antibiotic. That's practice to me and I would have got that one…
>>: That gets them a little excited.
>> Wendy Chapman: But there is also the thing that eventually it comes to the point where if
you are not giving them that advice then you're doing something wrong. That's malpractice
because that knowledge is available. Why are you not pulling up that information that is there
for them that they can't find because you have hidden it in so many places? Why are you not
giving it to them so they can make a better decision? And there are guidelines about here is
what you always do in this case, so I think it's complicated.
>>: There is lots of other mining. You talked briefly before, part of this is like if you wanted to
go back and look at machine learning over [indiscernible] and you want to actually improve care
and you want to move the state-of-the-art forward, we have to have certain things coming out
of the medical records, which is sort of all of the germane things that lead up to the disease.
You've got a diagnosis. You've got the germane things during treatment and then outcomes. Is
there any hope that we'll see this in a concise form, reasonable form, complete form? There's
never going to be a complete form in my mental model, but are we, where are we on the path
to actually being able to generate that in the example of 1 million veterans program or any
other?
>> Wendy Chapman: To be able to generate a kind of a, all of the data that you need to do the
genotype to phenotype and really do personalized treatment…
>>: Whether it's genotype phenotype or you are just doing best practices, so if clinic A in South
Dakota is doing something or other and they are doing something very different over in
Minneapolis, you would like to make sure that one, the cases are similar enough, if not
identical, which is never the case. But close enough that when they got the care in Minneapolis
working and now they want to help disseminate that information, how do you make sure that
the evidence is there supporting this is a best practice, and that it is there and that we are not
going to be doing clinical trials when we start getting down to the personalized medicine, so the
clinical trials? And of course some small number, you are going to be using statistics. You are
going to be using this, which is all learning which means we've got to be extracting a
computable form of the medical record.
>> Wendy Chapman: Yeah, I think there are a lot of steps in making that happen and we are on
that path. One important step is a system called OpenCDS, or
Open Clinical Decision Support. It's an open-source way of delivering that knowledge so that if
you author your knowledge with these certain standards in Minneapolis at the Mayo and it
works well, now someone else can port it to their own place. Whereas now, that's not possible,
and so being able to share that knowledge and deliver it in a different institution is one of the
major steps needed for that.
>>: Is there a legal framework in place to do this sharing?
>> Wendy Chapman: I think so. I think as long as it's not actual data about your patients,
people are fairly happy to share their guidelines. They publish them and putting them in
computable format, I think people are willing to do that in general. I could be wrong.
>>: That's not the statistical. That's here's my recommendations and so they have already
done whatever statistics they have.
>> Wendy Chapman: I'm sure they would be willing to share it for a price. I mean,
hospitals want to become more commercial, and if they create a predictive analytics system that
works well, they would like to market and share it in that way.
>>: [indiscernible] socialized medicine Canada, China.
>> Wendy Chapman: Yeah, I think there's a desire to share knowledge, at least in the academic
medical centers. And you're the leader. You showed that this works, and you are the leader, and
now people are adopting it; that shows your influence.
>>: Could you perhaps say something more about the MVP project and what is the promise of
it and what are the barriers to actually making it happen?
>> Wendy Chapman: Okay. I don't know very much about that project, really. I helped apply
for one grant that would use the data from it, so I know it's hard to be able to use the data.
And I think they have maybe 100,000 patients' genotypes so far and in the next year they will
have 200,000, so they have this kind of phased plan. The genotype information is now linked to
all of their patient records which are available to researchers. Beyond that, I don't really…
>>: So you mentioned you have this grant, and how long do they expect it will take to get the actual data?
>> Wendy Chapman: We applied for a grant to use the million veterans' data and we haven't
heard back whether we got the grant or not. But it's my impression that you can't just use it.
You have to apply for these opportunities. It's not just there for everyone to use. I'm not
positive about that, but I'm thinking that's true.
>>: [indiscernible] you could use it on your computers.
>> Wendy Chapman: Oh yeah. We were talking about that, and the access that we did get was very
difficult, because there are lots of hurdles to getting permission, and when you do get
permission it has to be on their servers, and those are very difficult to work with; they crash. You can get
to them, it's just painful. You can't run your system because it takes too much CPU or
whatever and they don't have the capacity, so there are lots of barriers.
>>: Do you see this as a problem compared to all of the other shareable data?
>> Wendy Chapman: Yeah, just buy a few more servers, put it on the cloud.
>>: We have spoken with people at the VA and they are by and large a Microsoft shop.
>>: I was there.
>>: But they do need help.
>> Wendy Chapman: They do. And they are not going to let you put that data somewhere else.
>>: [indiscernible]
>> Wendy Chapman: Yeah, that's the problem, the funding for it. They really don't like to
spend money on IT.
>>: [indiscernible] sequencing [indiscernible] genotype things instead of [indiscernible] even
[indiscernible] sequencing you don't have the funds. They have huge numbers of samples of
biodata. They don't have the funding authorization to go through all of it.
>> Wendy Chapman: To go through it?
>>: And they have a lot of people who said yes you can do it but they don't have the process to
actually get it done.
>> Wendy Chapman: That makes sense. Uh-huh.
>>: [indiscernible] barriers, because you were talking about open source and pick your
workbench. Are other institutions kind of taking that on, or what is the barrier to
getting that more widely adopted?
>> Wendy Chapman: The idea of a workbench to pull in the different NLP tools? Well, we
haven't really developed it out very well yet, so that's the first barrier. I don't know. I would be
interested in your thoughts about clinical NLP. It feels to me like 100 people working on
the same things. There aren't aligned incentives to really collaborate. You get your grant. You
work on your research, so it likely needs someone to fund development of the application, and then
you use the resources that people are developing and use them as consultants. But we as
researchers are not going to be able to build out that big thing and sustain it. That's what I
think. And the VA seems like the place that could potentially do that. They could hire someone
to really build the applied, pull-it-all-together kind of thing and maintain it, in theory.
>>: You have a slide of the African-American person. Do you imagine that as the end-user? That's
the clinician wanting to see the information?
>> Wendy Chapman: Yes. That is them defining the kind of variables they want to extract
from text. Uh-huh. A lot of us make our tools open source, and so they are available that way.
>>: But there is still no, I mean you have made your models available.
>>: Yeah, we make the models [indiscernible] we sometimes can't make models available. I
think the main bottleneck is the data sharing. I think it's not a good idea to have people trying
to be able to [indiscernible] because the description of that [indiscernible] may not be
addressing the needs. I think everybody is passionate that the data and the source code
should be available.
>> Wendy Chapman: Yeah. And then if you could build ways to quickly draw on all of those
different tools and evaluate them, because it's a lot of work to adopt someone else's tool and
map it to yours so that the input and the output line up. It's a huge amount of work, and
there are dozens of them.
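As an illustration of the integration work being described, here is a minimal sketch, with hypothetical tool names, of an adapter layer that wraps different clinical NLP tools behind one shared input and output shape, so that tools can be swapped and compared without re-mapping formats for every pairing.

# A minimal sketch of an adapter layer for clinical NLP tools: each tool is wrapped so
# that it takes raw note text and returns the same output shape, which makes it cheap to
# swap tools and compare them. Tool names and wrappers here are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Mention:
    concept: str   # normalized concept label, e.g. "pneumonia"
    start: int     # character offset where the mention begins
    end: int       # character offset where the mention ends
    negated: bool  # whether the mention is negated in context

# Each adapter hides a tool's own input and output formats behind the shared Mention type.
def tool_a_adapter(note_text: str) -> List[Mention]:
    # ... call hypothetical tool A here and convert its output to Mention objects ...
    return []

def tool_b_adapter(note_text: str) -> List[Mention]:
    # ... call hypothetical tool B here and convert its output to Mention objects ...
    return []

def compare(adapters: Dict[str, Callable[[str], List[Mention]]], note_text: str) -> None:
    """Run every wrapped tool on the same note and report how many mentions each finds."""
    for name, run in adapters.items():
        mentions = run(note_text)
        print(f"{name}: {len(mentions)} mentions")

compare({"tool_a": tool_a_adapter, "tool_b": tool_b_adapter},
        "No evidence of pneumonia. Patient denies chest pain.")

With a shared Mention type, adding a new tool means writing one adapter rather than reconciling its format against every other tool already in the workbench.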
>>: [indiscernible] models, and lots of people want to share data, from early Windows all
the way up, in all sorts of things. It's very hard making a model that people will adopt. What's the
value? Until you identify a significantly important piece of work that brings value that
people see, that they appreciate, that they share in common amongst a lot of people, it's going to be
very, very hard to pull any of this together. So getting to a step that is going to gain enough
momentum to say I have improved my productivity, or whatever the right
metric is, with a lot of people, especially researchers, who are by their nature independent
and want to be pushing the limits on different things, and care less about delivering value
than about working on exciting and interesting problems, is going to be really hard.
>> Wendy Chapman: Yeah, and at the same time, if you just rely on companies to create these
tools without researchers' input, you get EpicCare and things that are not really useful in
the real world.
>>: Yeah, where's the customer value? If you are looking at it from the IT and business side, as
you mentioned from the beginning, it's very, very different from looking at it from the patient
side. And where is the value to the patient? Am I getting better outcomes? No. I'm getting
better billing. I'm getting better charging. I'm tracking the business better, and for lack of a
better word, the bean counters are getting a little happier, but are we really pushing that life
expectancy up? Are we driving the costs down? No. We are not driving the costs down. The
bean counters are there because any cost that they take out of the system they are going to put
in their own pocket, so that's not a capitalist incentive.
>> Wendy Chapman: Yeah, so we might be getting to a time in the future where the value for
the patient and the value for the institution are getting aligned. And in that case I think
they will see more value in the research tools that come out.
>>: From our experience, [indiscernible] for the last six years now at the medical center, in
our experience the most exciting projects usually come from clinicians when they are super
excited about something. For example, we finished a project very recently involving
[indiscernible] speech, where with this app the doctor talks to the phone, and over
a secure channel it is transferred to a server and then mapped to a text file, where the clinician
himself [indiscernible] and the [indiscernible] does some sort of light parsing on the dictated text so
that it will be nicer in terms of [indiscernible] purposes. And now we are applying this
where the clinicians are measuring their time, like how much time they spend on input in
terms of cost, and also doing the kind of analysis of how many fewer errors they are making,
and they are loving it.
>>: If you can get the data to the server quickly and you can do the analysis, you can be
assistive: you have ambiguity in this diagnosis, you can resolve some of this ambiguity by
doing this test, and you're giving them constructive feedback to drive to a better one. But you have to
be able to do that quickly, and that's, you know, how do we spin that up? It involves
[indiscernible]. You can do a small project.
>>: Yeah, a small pilot project; that was kind of a nice way to measure the impact of an
automated system, like how receptive people are to using it, rather than seeing doctors and 10
patients [indiscernible]
>>: And I think that here you are seeing that the people who are going to be advocating
for it are not the NLP people. It's, as you said earlier, the end-user, and it's user-centered
design. How do you get the high-value people in the system saying this is adding value to me? I
am being more productive. I am being more accurate. I am being more whatever, because I am
getting this assistive technology.
>> Wendy Chapman: Yeah.
>>: How do you maintain freshness of the data? For example, if there is a treatment and a new
drug comes out, within three weeks everybody needs to be using it because this is the one that
has the right effect, but it's not going to show up in your data because it hasn't been processed
yet. How do you do that?
>> Wendy Chapman: That's a big problem, the lag between discovery and application. And people have
shown that; I think it's 17 years.
>>: I have heard 10 to 17 years.
>> Wendy Chapman: Between the time when everybody realizes and agrees that something is
the right way and when it's actually used regularly in practice. So if you are going to mine clinical
records to try to learn things, you are going to be way behind. I think that's right. And we have
to keep that in mind. It doesn't mean don't do it, but it means that just because this is how people
do it doesn't mean that's the right way to do it, necessarily. Yeah, because then you need the
literature combined with the clinical record.
>>: Compare the records and make sure they don't contradict each other.
>> Wendy Chapman: Yeah, that's always interesting.
>> Lucy Van der Wende: Thank you so much.
>> Wendy Chapman: Yeah, thank you so much. [applause]