>> Lucy Van der Wende: Good morning everyone. My name is Lucy Van der Wende and I would like to introduce Wendy Chapman who is going to be giving the talk this morning. Thank you very much for coming to Seattle when it's living up to its reputation for being rainy. Wendy is the chair of the Biomedical Informatics Department at the University of Utah and her work lies at the intersection of clinical research, NLP and human computer interface, or interaction, HCI. She's the leading figure, really, the leading figure in biomedical informatics. Wei Fung and I met with her. We have known of Wendy for a long time, but we met at the NLP workshop that was convened at the Veterans Administration Department where Wendy is an affiliate faculty. As some of you may know, the Veterans Administration has one of the largest and most comprehensive electronic medical records systems. It has pioneered the Million Veterans Program where it has sought consent for research to sequence the genomics and access the records, the EMR records for a million veterans, which is a unique resource and Wendy knows about this and hopefully will tell us more about data mining, information extraction and how we can use that information to improve the outcome for our patients. Thank you. >> Wendy Chapman: It's such a pleasure to be here. I really appreciate the opportunity. So it is rainy here and when I left Salt Lake City it was 36 degrees and sunny and when I came here it was 46 and cloudy and felt a lot colder, that humidity. But there's been a big snowstorm there, so the university is all but closed down today. My husband biked to work on a fat bike, so it was very difficult in the snow and he sent me pictures, so I've received a lot of texts and phone calls about getting into the office today. >>: They got their skis open and ready now? >> Wendy Chapman: If you had your skis, but it's all uphill there. Going down would be fine. I'm really happy to be here and my focus in research is natural language processing, in particular, information extraction. That's what my talk will be focused on. I'll give you a little bit of background. I talked with Wei Fung. I lived in Hong Kong and learned Cantonese and so that's really how I got this love of language. I went back to the University of Utah and I studied linguistics and Mandarin Chinese for my bachelor's degree. In between here I went to the University of Wisconsin to study Chinese literature. That's what I was going to get my PhD in and I was admitted but didn't have funding. I waited another year, still no funding, and in the meantime my husband, who Lucy met at a Johns Hopkins workshop, found medical informatics and wanted to do that. He was moving from electrical engineering. And then I saw that they do natural language processing and I thought that seems like a really nice way to apply my love of linguistics to something practical. So I signed up, was let in on probation, and the rest is all history. I fell in love with the field. I went to Pittsburgh to do a postdoc for three years and then stayed on there as faculty, so I was in Pittsburgh for 10 years. I moved to UC San Diego in my attempt to get back to the West and then this opportunity came open at the University of Utah for the chair position. It was a big career switch for me. I still do research, but about three-fourths of a day a week. I have a really great team that's moving forward well, but I'm not in the details as much as I would like to be. I want to give a little bit of context about healthcare in the United States. 
Here we have a graph. The x-axis is the amount of spending, the total expenditure of healthcare per capita. And the y-axis is the life expectancy. You can see that we spend way more than anybody else and our life expectancy is nothing to brag about. So U.S. healthcare right now is in crisis. That means that the economy is in crisis because our healthcare spending is such a huge part of our economy. Some people see crisis as an opportunity. That's what I'm learning in all of my leadership classes. [laughter]. And so there's this great, and it really is an opportunity to transform healthcare because the way healthcare has been run is so awful. The incentives have really benefited hospitals and doctors, but they haven't benefited patients. And I think in the next ten years we are going to see a whole different world where it's really patient centered, and so it really is an opportunity and the pressure is from the finances. There is a big movement that we have a lot of data now. There's a lot of digital data and so we need to be learning from that data. Every patient that comes in needs to be learned from, so that we know what are the better treatments and what treatments don't work, and then with all the big data science and big data analytics. Here's an article in JAMA about how academic health centers are really at risk right now because they're more expensive. Why is someone going to pay more to go to an academic health center rather than to a community health center? And really what they see is that the advantage of an academic health center is the research, that if you can translate that research and if you can take all the knowledge that you have and apply it to the data and learn from it and create better practices and implement them, then you can really have an edge. And that's the only way that we're going to really get to where we want to go. I would say then that healthcare transformation needs natural language processing because a lot of the information in the electronic medical record is in text. I'll give you a couple of examples. First of all, in clinical decision support there is a system called the antibiotic assistant that's implemented at Intermountain Healthcare and it monitors in the background a patient's temperature, white blood cell count and all these different variables to determine whether or not the patient develops a new infection in the hospital. If they do develop a new infection it will alert the physician and it will say we think your patient has an infection. Here's why. Here's the evidence. Here's the dose and the type of antibiotic we think they should take based on their insurance and allergies et cetera. It's a very popular program and has saved a lot of money and a lot of lives. It needs information from the chest x-ray report to be able to say does the patient have an infiltrate that is indicative of a pneumonia. Another area, on the clinical research front, is readmissions to the hospital. It costs the Medicare program a lot of money. And a huge portion of that is potentially preventable. We can prevent these readmissions. There are a lot of readmission models that are being created and they are created from ICD codes, discharge diagnoses, lab values, these coded data that are in databases. But they don't have very good predictive value. 
Some people are hypothesizing that if we can get information out of the text, which includes more detailed symptoms, but also social risk factors that really make that patient at risk for not taking care of themselves, we might be able to improve our prediction of readmission. And the social risk factors are things like do they have a stable housing situation? Are they abusing substances? What are their living conditions? Do they have social support at home? Can they bathe themselves? These kinds of things really affect whether a patient is going to care for their wound and take their medications. And these are all described in various ways in the text. So healthcare transformation needs natural language processing and, indeed, about 70 percent of the clinical data that we are interested in is locked inside this text. Jeff Hammerbacher, who is the founder and chief scientist of Cloudera and was at Facebook previously, said the best minds of my generation are thinking about how to make people click ads. That sucks. So I'm here to say there are some great minds here. We need some of that brainpower to transform healthcare. My objectives are first to convince you, because we already have two people here who are already at Microsoft and working in natural language processing in the medical domain. But out of all the researchers at Microsoft, let's get more people working in this area. And then to talk about what do we need to do to make it a product and make it effective and be used, and so focusing on some informatics principles where we are not just trying to improve scores a little bit at a time on little parts, but how do we really build something that is usable? I don't have the answer. I can just point out some principles and some problems. People know about natural language processing more now because of Jeopardy. And there was a big follow-up in Computerworld. I just love this article. It could very well herald a whole new era in medicine. Like no one had ever thought of applying computers to medicine until Jeopardy. And so now there's more knowledge about what we can do. People have been working on natural language processing applied to clinical reports in a variety of different domains. And in these focused areas we can build systems that perform as well as people. But clinical NLP has been a research focus since the 1960s, so why do we still not have an NLP system in every hospital? Why are we not just annotating, automatically annotating all of the data that's coming through and storing that in coded form? There are a few barriers, I think, that have put us way behind the other NLP, the general NLP field. The main one is getting the data. Sharing clinical data is just so difficult. We haven't had shared data sets for development and evaluation, and when we try to adapt modules that are trained on general English, they just don't work as well. We haven't had standard conventions for annotations and so everyone creates their own annotated corpora and nobody can share, and so it's one person at a time and a lot of repetition. In the past there wasn't a lot of collaboration in NLP. There were a few people that were the main NLP people but there wasn't a lot of collaboration. I would say over the past five years these things have changed to a large extent. We've developed resources that are shareable. We've created common schemas that people use and there's a lot more collaboration going on. But it's slow progress. 
But to me, the biggest barrier is that what we build as NLP researchers is just so far upstream from what people need that there is just a huge gap. I would claim that if we want to have impact we have to go beyond improving the accuracy of the individual tools to creating things that can be applied to real world problems. I want to talk about three informatics principles that I think we could apply to NLP that can help in this area. First of all, be application driven. Second, think about user centered design, and third, pay attention to standards. I'll go through each one of those in a little bit of detail. When you run an NLP system, an information extraction system, on a sentence like no family history of colon cancer, there is a pipeline with all different types of NLP tools and you break up the sentence into its syntactic parts and you assign semantic values, like maps to this vocabulary item and it's negated. That's the output and it's very important and it's hard to get that output and get it to be accurate. But what the users want is not show me all of the UMLS concepts in the text and tell me if they are negated. They want to know how do I improve, was my colonoscopy exam high-quality and if not, why? Find patients with cravisnosis [phonetic] so that I can see whether medication or surgery works better. They definitely want to know how do I get higher billing codes? [laughter]. That's one area where industry has jumped in and really helped out, because there is a business case, right? How do I spend less time documenting? And how do I find all the information that people have already documented? I can't find what I'm looking for. There's too much information. How do we help patients understand the reports? So these are the types of applications that people want. So there is this big gap between the NLP output and these applications. And NLP researchers might not be the ones driving these applications, but they need to be involved and partnering on those. The difficult part though is how do you be application driven and still develop general-purpose tools, because we can build an application for one particular thing and then when we want to build it for another diagnosis we start from scratch. And so it's really finding that balance between being application driven and being general-purpose. To do that we really need this strong partnership with domain experts who have the insight about what the data are needed for. An example of that would be if you're going to create a knowledge base for cough… I started this work when I was a postdoc and I went to the National Library of Medicine and I said I'm going to build something to find out if there are respiratory findings. So first I'm just going to find all of the UMLS concepts that map to the concepts that I care about. I just had no idea that there were 20 UMLS concepts for cough. I sometimes wouldn't find them all either. So what do you mean by cough? We have to explicitly model what we mean by cough. When we look at things like I want to find patients with fever, it's not just looking for words like fever and febrile, but there are attribute-value pairs that have to be found. Those depend on the application that you are building. So how do you represent your knowledge in a way so that for this application they have the threshold at 38 and in another application they might have it at 37? How do you not just hardcode everything you are doing for every single application? That is not scalable. 
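To make that last point concrete, here is a minimal sketch of keeping an application-specific threshold in a declarative knowledge entry instead of hardcoding it. The field names and values are illustrative assumptions, not the actual IE-Viz or Knowledge Author representation.

```python
# Two applications share one "fever" entry but swap in their own cutoff.
FEVER_STUDY_A = {
    "concept": "fever",
    "lexical_variants": ["fever", "febrile", "pyrexia"],
    "threshold_c": 38.0,   # one application's cutoff
}
FEVER_STUDY_B = dict(FEVER_STUDY_A, threshold_c=37.0)   # another application reuses the entry

def matches(entry: dict, mention: str, temperature_c: float | None = None) -> bool:
    """A mention counts if the text matches OR a recorded value crosses the cutoff."""
    lexical_hit = any(term in mention.lower() for term in entry["lexical_variants"])
    numeric_hit = temperature_c is not None and temperature_c >= entry["threshold_c"]
    return lexical_hit or numeric_hit

print(matches(FEVER_STUDY_A, "Temp 37.4, no complaints", 37.4))  # False at the 38.0 cutoff
print(matches(FEVER_STUDY_B, "Temp 37.4, no complaints", 37.4))  # True at the 37.0 cutoff
```

The point of the design is that the extraction code never changes between studies; only the declarative entry does.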
The second principle is to be user centered. We need to be able to support users, and so we have to think about the way their brains work and then we have to fit it into the workflow, because it's not an individual taking care of a patient; it's a team and there's this whole workflow. And how does this information fit into the workflow? We first build accurate tools, but then beyond that they have to be useful, and beyond that they have to be scalable and deployable. We have spent a lot of time focusing on the accuracy, but not on the other parts. I think that to really succeed in healthcare, and a lot of other domains, but healthcare is complicated, the technical is just one tiny part and there are all these other spheres, the clinical, sociological, political and commercial spheres, that we have to be aware of to be able to really build the tools that people need. And finally, paying attention to standards. We need to better leverage existing resources so that what we build is interoperable. There are vocabularies, and we do that in large part with vocabularies, but there's also information modeling: we don't only want to know that this maps to a certain vocabulary item. We need to know the context of it. For blood pressure, we need to know what position the patient was in. What was used to take the blood pressure? There is all of this metadata that goes with that concept or that action that is important for interpretation. As NLP developers, we need to be extracting all of that type of information and modeling it. And then how do we model it in a way that we can use it in different EMRs and different settings? How does this all relate to NLP researchers? I would say that a lot of the NLP research problems that we work on are really far upstream from the healthcare applications. But there are a lot of new interesting NLP research problems that arise when you are working on user driven development types of applications. So it's not like you're abandoning research and saying oh, I'm just going to be applied. There are many research problems that come up when you're trying to build things that people use. I want to talk a little bit about the work that we're doing to try to bridge this gap in our lab, and it's one small part of the world. Our lab is the Biomedical Language Understanding Lab. That's an old slide. It still says University of California San Diego. We hadn't moved our website, but we have now, so we need to replace that slide. So we are building a toolkit, and this is funded by the VA, called IE-Viz, information extraction and visualization, and it's a workbench to help people, to help domain experts and NLP experts collaborate and build applications that are useful while taking advantage of existing NLP tools. It has four parts to it. First you need to create your knowledge base about what you are trying to represent and what you want to extract. Next you need to create NLP tools. By create, I mean there are a lot of NLP tools out there and you apply things that already exist and compare them and use the Watson model of lots of different evidence coming in to develop the best tool for each thing you are trying to extract. Oftentimes extraction is part of the problem, but sometimes you only need a classifier. Sometimes you need a classifier on top after the extraction, so how do we help people build classifiers that integrate knowledge from the NLP that is beyond bag of words? 
And then, and we haven't really gotten much to this part with the visualization, but they don't want an XML file with a bunch of concepts marked. They want a graph or they want a timeline or something like that. So how do we help them build that from the NLP output? The first step then is knowledge authoring. We've developed two ontologies, a domain schema ontology and a modifier ontology. The domain schema is a linguistic representation of the clinical elements that can be described in text. The modifier ontology tells which modifiers are allowable for each of those elements, so that when you extract the information from a sentence like this, you know here is the disease. Who experienced it, was it negated or not? Was it historical or not? And so it's an information model around the concept of cancer. So these ontologies, and when I say the word ontology, they are not ontologies like the kind of ontologies that represent reality. They're representing information in text, so they are lexical resource ontologies. Here we have the different elements that you can describe in a report. There are entities, like a person, and there are events. Most things that you describe are events, like allergies, problems, which are diagnoses, findings, et cetera, and vital signs. Those are the types of things that you might see described in the clinical report that you're interested in extracting. They can have relationships with each other. One finding can be evidence of a diagnosis. A medication can treat a disease, and so on. We can model those relationships, and then for those elements, the modifiers are very important for understanding what is going on when you describe something in the text. You can see the word pneumonia in all three of these sentences, and if all you're looking for is pneumonia you're going to misinterpret it, because every one of those has a different interpretation. Knowing which modifiers are allowable for each of those is very important, so that's what the modifier ontology is. And what it is, is it started out as the NegEx knowledge base, stored in OWL format, and then ConText, which is an algorithm that we developed, extended from there, and then we added more and more modifiers, so it's kind of an extension from that. It has different types of modifiers. Like does something exist, and does it definitely exist or is there uncertainty about the existence? Is it talking about the future? Is it talking about the past? Is it an indication for the exam? Et cetera. But it also is a lexicon, and so it has linguistic expressions that we've seen that indicate that, so for historical: again noted, previous, changing, those are things that indicate that something happened in the past. It has actions, because the scope of the modifier is important, and most of the time the scope goes forward, like no tumor, but sometimes it goes backwards, like tumor free. And so everything you need to run NegEx or ConText is encoded in here, the direction that it goes, and then it's translated into some languages, Swedish, German and French right now. And so a lot of people have written papers about applying the algorithm with these terms to different languages and how well it transfers. The schema ontology imports the modifier ontology. If you have a medication event then the modifier ontology says you also need to know, you can know the type, the dose, the frequency and the route of the medication. If you have a diagnosis event then it can have severity. It can have the history. 
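A minimal sketch of the trigger-plus-scope-direction idea just described: each trigger term carries a modifier type and a direction, and a target concept falling inside that scope picks up the modifier. This is a toy illustration, not NegEx or ConText themselves; the real lexicons, regular expressions, and scope-termination rules are much richer.

```python
# Toy ConText-style trigger lexicon: phrase, modifier type, scope direction.
TRIGGERS = [
    {"phrase": "no ",        "type": "negated",    "direction": "forward"},
    {"phrase": " free",      "type": "negated",    "direction": "backward"},  # e.g. "tumor free"
    {"phrase": "previous",   "type": "historical", "direction": "forward"},
    {"phrase": "history of", "type": "historical", "direction": "forward"},
]

def apply_context(sentence: str, target: str) -> set:
    """Return the modifier types whose scope covers the target concept."""
    sent = sentence.lower()
    t_idx = sent.find(target.lower())
    modifiers = set()
    for trig in TRIGGERS:
        p_idx = sent.find(trig["phrase"])
        if p_idx == -1 or t_idx == -1:
            continue
        if trig["direction"] == "forward" and p_idx < t_idx:
            modifiers.add(trig["type"])
        elif trig["direction"] == "backward" and p_idx > t_idx:
            modifiers.add(trig["type"])
    return modifiers

print(apply_context("No tumor was identified", "tumor"))         # {'negated'}
print(apply_context("The patient is tumor free", "tumor"))       # {'negated'}
print(apply_context("Previous pneumonia in 2010", "pneumonia"))  # {'historical'}
```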
And we've developed this from models that have been built out of the SHARP project. Some people here are familiar with the SHARP project. Out of that came cTAKES and a common type system, and that was the basis of our ontologies, and we've extended beyond there to map to information models in the clinical world. They are mapped to FHIR now, for people who are interested in HL7 FHIR. What these allow the user to do is to create a domain ontology that's used for natural language processing. A domain ontology would be an instance of the schema ontology and it would represent the linguistic information that you want to know about clinical elements in a particular domain. So if you are working on pneumonia, for instance, then what are all the concepts you care about that indicate pneumonia? And then you will use that as the knowledge base for the natural language processing system and the target output. I'll give you an example. Here is a domain ontology for pneumonia, and it's just the very beginning of one, and so we see under diagnosis that we have altered mental status, heart condition, lung condition and pneumonia. Those are four instances of the class diagnosis, but in there then we have the whole lexicon. What are the synonyms for pneumonia? What are the misspellings? What are the regular expressions? If there were numeric values, like if there is a fever, you would have numeric values to go along with it. And so you can explicitly define what you are looking for. And the goal behind this is to create these potentially open, shareable knowledge representation modules that people could borrow. And so when someone else is looking for pneumonia they don't have to start from scratch. I mean how many of us have written about pneumonia? Several of us in this room. But if someone else is going to work on pneumonia with NLP they start from scratch and build up their lexical expressions and their synonyms. Wouldn't it be nice if you could go to this library and you could say here are definitions of pneumonia that people have had. You could borrow and tweak and kind of customize, to start with what somebody else has already done. >>: This happens sometime in my [indiscernible]. Do you think there would be a use case where it happens that the [indiscernible] and you actually want to capture the [inaudible]. >> Wendy Chapman: Like the regular expressions? >>: Yeah. Like maybe a NegEx in one case and then [indiscernible] or something [inaudible]. >> Wendy Chapman: Part of addressing the ambiguity is specifying the modifiers. The word sense disambiguation that might occur isn't addressed in this way. That's definitely something that would have to be on top of this. We've built a front-end interface because using Protégé is not natural for everybody. It's called Knowledge Author, and on the back end you have the two ontologies that the user doesn't have to know anything about. It's just what drives the questions you ask the user. And the output is a domain ontology and a schema for NLP systems. For instance, you might want to create a variable called African-American adult. You would create a person role, a person, and you can define their age. You can define their gender, their race, their death date, birthday, all the attributes and the modifiers that occur with a person. You can create very specific variables that you're trying to extract. That's where the difference comes in between a general NLP system that's trying to output a concept like cough and a specific one… You know, in one study you might want productive cough. 
In another study you might want severe productive cough. In a different study you might want mild productive cough. In another study you might want that they don't have a cough. So all of those modifiers are particular to applications that a single group or person is interested in, and we want to be able to model all of that, and that's where the information modeling comes in. It gives you the power to create exactly what you are trying to filter on. Maybe what you're looking for is patients who are taking ibuprofen. And so when you type in ibuprofen it will map to the UMLS and you can select the concepts that you want, and when you select them, now all the synonyms and acronyms that are stored in that knowledge base become part of your knowledge base. But you might not just be interested in the mention of ibuprofen. You want to know that they are taking ibuprofen orally. With the modifier of form, you can say I only want things that are oral, and so you are building these very specific variables. No family history of colon cancer, to a physician this is one variable. They have no family history of colon cancer, but in NLP, this is a lot of different parts. It's cancer; that's the concept. It's the anatomical location of colon. It occurred in the past. It's talking about the past history for a family member and it didn't occur. So that's a lot of things that the NLP system has to output, and so what we did is split them out into these kinds of linguistic variables like negation. Who experienced it and is it in the past, current or future, and those apply to all the different schema elements. But then for cancer, which is a disease, then you can also look at severity and other things like that. We did a lot of user studies to try to figure out how to model this in a way that we could get the domain experts to understand it, because it seemed so simple from an NLP point of view. What is that? Negation. But to them it's just one concept. And then we have also worked on how do we suggest synonyms to them, because you can only think of so many synonyms. And then different algorithms for mining text, bringing forward synonyms and letting them select them. There are a lot of different research questions that we have addressed in the area of knowledge authoring. Which modifiers are important? The modifier ontology has way too many and no one would want to use all of those modifiers. It's every possible modifier that has been used in any clinical modeling. How well do people agree when they annotate them? Because if you can't get people to agree, it's very difficult to create a system that can do it. Some of them are very difficult to get agreement on, like uncertainty. How can we learn the terms? And that's what we have been talking with Wei Fung about, can we mine text and bootstrap and help learn these terms, and can we suggest other types of things like medications or treatments that might indicate the patient has that disease when the text doesn't explicitly say it. So lots of fun research questions there and many of them unanswered. Once you create a domain ontology, now you have the opportunity, you have all of your knowledge explicitly defined. You can now start running some NLP tools over your text and see what pops up. That's the NLP customization part of the workbench. Our vision is that there would be a lot of different NLP tools that you have access to. Why limit yourself? There are so many different tools available and some might perform better on some variables and others on others. 
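As a hedged sketch of what one of those shareable domain-ontology entries might look like once serialized out of a tool like Knowledge Author, here is a toy pneumonia entry with the lexicon pieces described above (synonyms, misspellings, regular expressions). The field names and patterns are illustrative assumptions, not the actual IE-Viz format.

```python
import re

# One shareable entry that another group could borrow and tweak.
PNEUMONIA_ENTRY = {
    "class": "Diagnosis",                 # instance of the schema ontology class
    "preferred_term": "pneumonia",
    "synonyms": ["pneumonia", "bronchopneumonia", "lung infection"],
    "misspellings": ["pnuemonia", "pneumonai"],
    "regexes": [r"\bpna\b"],              # common shorthand in notes
    "allowed_modifiers": ["negation", "experiencer", "temporality", "severity"],
}

def find_mentions(entry: dict, text: str) -> list:
    """Return the surface strings in `text` that the entry licenses."""
    terms = entry["synonyms"] + entry["misspellings"]
    patterns = [re.escape(t) for t in terms] + entry["regexes"]
    hits = []
    for pattern in patterns:
        hits += re.findall(pattern, text, flags=re.IGNORECASE)
    return hits

print(find_mentions(PNEUMONIA_ENTRY, "RLL opacity, concerning for PNA vs early pnuemonia."))
# ['pnuemonia', 'PNA']
```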
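And here is a small sketch of the decomposition just described, with the single clinician-facing variable "no family history of colon cancer" expressed over an information model that the NLP system has to fill in. The field names mirror the modifiers discussed in the talk but are not the exact IE-Viz schema.

```python
from dataclasses import dataclass

@dataclass
class DiseaseMention:
    concept: str            # the concept, e.g. cancer
    anatomic_location: str  # where it occurs
    negated: bool           # did it occur or not?
    experiencer: str        # "patient" or "family_member"
    temporality: str        # "historical", "current", or "hypothetical"

# What the pipeline should emit for "No family history of colon cancer":
mention = DiseaseMention(
    concept="cancer",
    anatomic_location="colon",
    negated=True,
    experiencer="family_member",
    temporality="historical",
)

def matches_variable(m: DiseaseMention) -> bool:
    """The clinician's one variable: no family history of colon cancer."""
    return (m.concept == "cancer" and m.anatomic_location == "colon"
            and m.negated and m.experiencer == "family_member")

print(matches_variable(mention))  # True
```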
If you can set up kind of a customization loop where you have the knowledge explicitly defined, now you can run several different tools over the text using that knowledge, bring it back and show it to the user or have some kind of gold standard. It's kind of an iterative thing where they are correcting it. They are marking things that are wrong. It's learning, and over time it is kind of selecting and optimizing the best combination of tools for each different variable. In theory, there's no reason why you can't have one tool for five of the variables that you are looking for and a different tool for three of them and another one for another six, especially if you start simple. Sometimes keyword matching is good enough for some things when you add negation. Other things, you might need a machine learning classifier. Other things you might need some text, but why use the more sophisticated things on the things that you can get in an easier way? So the goal is to help the user interact with the system and develop those. But to do that you really need some good tools for the user to interact with. We've spent quite a bit of time developing some different tools and it still doesn't feel like there's one tool. It feels like different tools have different strengths. This is the evaluation workbench, where you read in two annotated sets. It might be that one's a gold standard and one's your system. It might be two different systems. But you're able to compare them against each other now, and if you consider one of them right and one of them the gold standard, then you can look at the false positives and false negatives, and this allows you to drill down and see not only the extracted named entities but also all of their attributes. You can look at the attributes and really do an error analysis to see where your system screwed up compared to the gold standard. That's one of the tools that we developed. This is another tool that we've been developing with colleagues at the University of Pittsburgh. Lucy probably knows Jan Wiebe and Rebecca Hwa. This is a visualization tool where the table here shows the variables that you created in the domain ontology and they are binary. They are true or false. Did they occur? And you give it a couple of training examples and now it learns to annotate those. It's a classifier, and now you can start to drill down and give it feedback and highlight text that says no, you are wrong on this and here is how I know. And so it has things like this word tree. You're looking at the word biopsy and it will show all of the words that occur with biopsy, before and after biopsy, and whether they were true or false in the text, and as you click on them it takes you to the text. So it is this interactive way to really try to understand what the system is doing, mark the evidence that it's wrong and retrain. Some questions that we are addressing in this area: how do you really use these domain ontologies and different tools? So we are writing APIs for different types of tools like cTAKES, our own tools, pyConText. Which methods work best for which types of concepts? How do we incorporate the feedback? That's a big research question. We have done it on the machine learning side and had one or two publications on that, but what about rule-based systems? How do we incorporate feedback from users on that? And how do we suggest changes? So there is lots of research in that area. Sometimes you need a classifier and oftentimes you need it after the NLP. 
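To make the evaluation-workbench comparison concrete, here is a toy version that treats annotations as (document, variable, value) triples and surfaces the false positives and false negatives for error analysis. The real workbench compares every attribute of a named entity, not just a single value, so this is only a sketch.

```python
def compare(gold: set, system: set, variable: str):
    """Precision/recall plus the disagreements for one variable."""
    g = {a for a in gold if a[1] == variable}
    s = {a for a in system if a[1] == variable}
    tp, fp, fn = g & s, s - g, g - s
    precision = len(tp) / len(s) if s else 0.0
    recall = len(tp) / len(g) if g else 0.0
    return precision, recall, fp, fn

gold   = {("doc1", "pneumonia", True), ("doc2", "pneumonia", False)}
system = {("doc1", "pneumonia", True), ("doc2", "pneumonia", True)}

p, r, fp, fn = compare(gold, system, "pneumonia")
print(p, r)   # 0.5 0.5
print(fp)     # {('doc2', 'pneumonia', True)}  -> drill into doc2 for the error analysis
print(fn)     # {('doc2', 'pneumonia', False)}
```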
So you get all of this NLP evidence and now you need to determine, for the report or for the patient, what is the value of the variable? Did the patient have pneumonia, for instance? So we developed a tool called TextVect, which is based on the idea that you already have a training set at the document level, for instance. Now, you want to create a classifier, and typically you use Weka or Mallet, or there are all kinds of different tools that you can use, or you write your own, and they use N-grams. But there is evidence from NLP systems that could help improve the classification performance, but people building these classifiers typically aren't familiar with the NLP literature and tools, and they have to go read up and find out what they all are and install them and get them going. The idea behind this is it is a UIMA pipeline. It has a bunch of different NLP tools already in it, part of speech taggers, cTAKES, negation et cetera, and now you can select what kinds of features you want to use in your classifier. You can select the representation you want, whether it's binary, count or tf-idf, and then you run it and the output is a vector that now you can train your classifiers on. You can do some of the training inside here. We evaluated that on the i2b2 training set and showed that it can perform almost as well as the best systems just out-of-the-box by using these default tools that we have installed. So there are still questions: what's the best representation of these complicated NLP features? What features are more useful? And what type of NLP tools can really help in classifier development? One thing that people found is mapping to concepts can be helpful, because now words like shortness of breath become the concept shortness of breath and not shortness of and breath, and so it can reduce your feature space for one thing. But negation can be very important, so differentiating between shortness of breath and no shortness of breath. We've looked at history and family experience and those we haven't found as strong of a need to mark in your feature vector, but there's lots of… If you create these different information models to represent no family history of colon cancer, how do you represent that in your machine learning features? Finally, you want to build visualizations. People, like I said, don't want an XML file. The domain expert is working with you because they want to create, for instance, they want to create a dashboard to find all of the patients who have some kidney infection and they want to be able to see, oh, that patient has a kidney infection. I need to pay attention to them or call them or whatever. Or they want to create a time line and see what's happened over the patient's history. How do we help people create visualizations? And the vision behind this is that, just like Excel, if you have your data in a certain form you should be able to render it as a table or a bar chart or a pie chart the same way. If you have the text annotated you should be able to render it in a lot of different visual ways depending on what you are interested in. There are a lot of libraries out there like D3 and others to help build those types of visualizations. Some of the visualizations that we found people are interested in with our collaborators would be like a population view. I want to look for patients who have pneumonia and there is such a big set of patients to look at, where do I start? 
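Going back to the TextVect step for a moment, here is a rough sketch of building classifier features from NLP output rather than raw n-grams, so that "shortness of breath" collapses to one concept feature and a negated mention gets a different feature. The toy concept dictionary and the crude negation window stand in for the real UIMA pipeline, cTAKES concept mapping, and ConText.

```python
CONCEPTS = {"shortness of breath": "C0013404", "cough": "C0010200"}   # toy lexicon
NEGATION_CUES = ["no", "denies", "without"]

def features(text: str) -> dict:
    """Return a sparse binary feature dict: concept, or concept_negated."""
    t = text.lower()
    feats = {}
    for phrase, cui in CONCEPTS.items():
        idx = t.find(phrase)
        if idx == -1:
            continue
        window = t[max(0, idx - 20):idx]                 # crude pre-mention window
        negated = any(cue in window.split() for cue in NEGATION_CUES)
        feats[f"{cui}_negated" if negated else cui] = 1
    return feats

print(features("Patient denies shortness of breath but reports cough."))
# {'C0013404_negated': 1, 'C0010200': 1}
```

A vector like this can then be fed to any standard classifier, which is the division of labor the talk describes: the NLP pipeline supplies the features, the classifier assigns the document- or patient-level value.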
If I could cluster them in ways that are similar, now I can focus on a particular cluster of patients and only look at those first. Then there's kind of an EMR-centric view where you are looking at the patient, and if you are doing chart review on a patient then you want to find whether the patient has these symptoms or physical exam or labs that are indicative of pneumonia. Could you show in one glance the positive, negative and uncertain evidence, which is marked with colors there, for the different features in your domain ontology? And now as they look through that they can click on them and it shows the text that it came from, so that they can see the evidence and just more quickly peruse the whole patient's record instead of having to search through one document at a time. Maybe you're looking for a particular case definition. Does this patient fit this case definition in the chart review? This is often the case. Does this patient fit the CDC's definition of pneumonia? Here's a diagram of the CDC's definition, and you could link the evidence from the text to the items in the diagram and they can just more quickly go through the diagram and look at the text and make that conclusion. Timelines are another area that has always been important but is so difficult, so we did a little bit of research on this. How do you build a timeline that is useful from text? Because consider the scenario where you take over a new patient, or you come to the hospital and you've been off for one or two days and now you are taking over, and you want to say what's happened with this patient in the last two days? The way it currently works is you take about three minutes and you look through the most recent reports and things and that's all you get. But what if you could just summarize everything over time, and the things that were really of interest you could drill down on and look at the text? We started on this and we built some really cool tools to help users drag over and create timelines. The hard part for us is not the NLP so much, it's how you put information together. There are a lot of annotations in a report from an NLP system and you can't put all 100 or 150 of them on this timeline. Some of them need to go together, like they had a chest x-ray. The chest x-ray showed this. Those things need to be clustered in one place. It doesn't make sense if they are in different places on the timeline. And so the cognitive part, what information should be put together, is where we didn't get past that part. Yeah? >>: [indiscernible] the timeline texts, the sequencing of the texts that represents the timeline, that assumption must prove to be false. You would actually have to mine it. >> Wendy Chapman: Say that a little louder. >>: Presumably you can't read the timeline directly off the ordering of the information in the text. There would be backwards looking information [indiscernible] that he was admitted to some other hospital three weeks earlier or something of that nature. >> Wendy Chapman: You could in the sense that this report that came sooner is less far away than this other report, which was two weeks ago. But it's pretty close. But within the report they are going to talk about things that happened years ago and they're going to talk about things that are hypothetical or in the future. And so within the report you have to be able to understand the relation of those items to the time the report was dictated. Okay. 
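For the grouping problem described in that timeline discussion, here is a small sketch under the simplifying assumption that annotations from the same report section form one timeline event anchored to the report date; deciding what really belongs together is exactly the cognitive part the lab did not get past, so the grouping key here is only a placeholder.

```python
from collections import defaultdict

annotations = [
    {"report": "r1", "date": "2015-11-01", "section": "imaging", "text": "chest x-ray"},
    {"report": "r1", "date": "2015-11-01", "section": "imaging", "text": "RLL infiltrate"},
    {"report": "r1", "date": "2015-11-01", "section": "plan",    "text": "start antibiotics"},
]

def timeline_entries(anns):
    """Bundle related annotations into single timeline entries instead of 150 points."""
    groups = defaultdict(list)
    for a in anns:
        groups[(a["date"], a["report"], a["section"])].append(a["text"])
    return [{"date": d, "event": " / ".join(texts)} for (d, _, _), texts in sorted(groups.items())]

for entry in timeline_entries(annotations):
    print(entry)
# {'date': '2015-11-01', 'event': 'chest x-ray / RLL infiltrate'}
# {'date': '2015-11-01', 'event': 'start antibiotics'}
```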
So some things, like I said, we have just done preliminary work on these things, but what are the visualizations that people want? And can we increase people's efficiency in looking at the chart while not hurting their accuracy? Also in important areas like cognitive bias: if you are looking at a patient to determine whether to treat them for pneumonia, and so you start looking back on the different evidence and you see some positive evidence for pneumonia, then as a human you are going to have this cognitive bias to say they have pneumonia because they had a cough. And now you are going to probably ignore negative evidence and not look for things that will go against your own hypothesis. And you think about politics, we all do that. We just feed ourselves with things we already believe. So how do we protect against that? If we could point out to them contradictory evidence and point out to them evidence that is ambiguous or uncertain, then we might help them make better decisions. We've done a lot of work on, like, what are the things you might point out? What indicates uncertainty? How do they linguistically express uncertainty and how does that affect what people see? We are in the middle of analyzing some data on that. Yeah? >>: [indiscernible] so you mentioned there are two ways that things could be uncertain. It could be that the doctor is saying [indiscernible] or saying uncertain. So the author could be, based on background knowledge, this thing might indicate something other than [indiscernible]. The symptom also suggests, like if you consider multiple diseases together, then you could potentially explain away some of the evidence. >> Wendy Chapman: So you are saying that if there are multiple… >>: If you consider not only the causes for symptoms of pneumonia but also some other symptom that could indicate another disease, then when you see the symptom you say oh, this cannot be pneumonia. This must be something else. >> Wendy Chapman: Yeah. When you think about pre-annotating text to help people review it, we typically think of marking the positive evidence. We don't think of marking the things that oppose that. But we really want them to look at that, and so if there are other diagnoses that they mention that are contradictory, or findings that might compete with the findings that indicate pneumonia, then you would want to point those out. In our study we focused on pneumonia first. We had physicians, like not radiologists but physicians, mark radiology reports and the other clinical reports and say mark everything that supports pneumonia, refutes pneumonia or causes uncertainty in your mind. We had seven people and they were very different on what they thought caused uncertainty. If someone said this supports and someone said this refutes, or someone said this supports and someone said this is uncertain, then we called that uncertain because it causes disagreement too. And then we used that information to analyze it linguistically, and about 20 percent of it is words like could be or might be, but the other 80 percent is the particular finding. Like they mention atelectasis. That's competing with pneumonia. Or they say this is an opacity consistent with pneumonia, so they are linking a finding with a diagnosis, but the reason they're linking it is there is kind of uncertainty there and so… Yep? 
>>: [indiscernible] without your visualizations, the way a doctor's writing the medical record, he's kind of assuming that a doctor is going to read the text that was written. So in a way it's like a visualization. I'm kind of curious if the doctors write in a certain way if it's going to be read by another person, like they add extra terms or order it in a certain way. Whereas if they knew that it was then going to go to a system and be digitized, I wonder if they would change the way they even write, because they could give more detail or write in a different order. >> Wendy Chapman: That's a really good point. And I think sometimes they do think about the reader and they're really trying to lay it out for the reader. And other times they are just more worried, they think really the purpose of it is just to protect their butts and for documentation and billing. And so they are copying and pasting and they are putting these huge things in that are so unreadable and duplicative and not right, and they still are doing it because they feel like no one is going to look at this and it's just a waste of my time. So their intent and what they think it's going to be used for, I think, very much plays into how they organize it. >>: [indiscernible] the system so they could just check some boxes and it shows a bunch of text in there, or do you think they are actually just physically copying and pasting? >> Wendy Chapman: They physically copy and paste. Oftentimes they'll use those reports, they'll take a whole day to write the report and they will do a little bit at a time because they are using it to help reason and make their own diagnosis. And then they are waiting for a lab test. Now they get the results of the lab test. They go in and rather than type here's what the lab test said, they just copy it and paste it in. Wouldn't it be nice if you had a pointer instead? And there are all kinds of things, and they have their own, depending on the hospital, their own templates, or their own macros, that they can put in that they have made up. Yeah? >>: I'm a physician and I've been a physician for 35 years. I moved from doing things on paper to doing things electronic, dictation. What I see happening in the vendor community right now, and the reason I applaud this work, is that if anything the vendor community is trying to get physicians totally off of free text and creating solutions that are an absolute nightmare of click boxes and trying to codify everything into individual units. You mentioned copying and pasting. If anything, the whole industry is coming down on that because of all of the errors it's introducing in the medical record. Do you see, I see the research applications for this. My concern is really what we're doing to the end-users in the clinical space. I am just seeing my colleagues now just suffering. Things that used to take two seconds to do now take 10 minutes to do and they are all complaining about it. So as you look at your work and what people in your field are trying to do, what's the direction? Is it more towards, you know, reviewing these large bodies of data for population health and sort of codifying it, or how much of it is really aimed at helping the end-user? >> Wendy Chapman: Yeah. I think the end-user gets ignored. The people making the decisions about what vendor systems to buy, et cetera, it's all based on they are going to improve their billing. 
They are going to improve their population management which will cut down their costs and things like that, and the physicians and nurses just get stuck. And so we have a faculty member, Charlene Weir, who is a cognitive psychologist, and she is going around and she is observing them as we install this. She's observing what they are doing and making note of what are the pain points, what's going well. And when we bring those to our CMIO he's like, I already know about those things. Don't tell me those things because there is nothing we can do about them. And now you're just getting people's hopes up and they think we're going to change and we are not, because there is nothing we can do about it. So it's very fatalistic and they are just, ah. So I think one of the big paradigm shifters that is coming is kind of SMART on FHIR. Have people heard about this? This FHIR, this new interoperability standard, FHIR, and it's, we could have a whole talk about that. But it opens up the opportunity that you can say, for instance, in rheumatology, okay, dermatology, our dermatology chair is just so angry about it because it has cut down the number of patients that they can see by almost a third. So they have lost so much money and they just spend it all documenting. Because Epic doesn't have a good dermatology module, they are not going to pay attention to dermatologists. It's a very small market. So there are companies out there that create dermatology interfaces and they can just go really fast and it's very intuitive and it's visual, but it doesn't hook onto Epic. So no way are we going to consider it. But with FHIR you can, if you can write to the same specifications and if Epic supports it, now you can plug it into Epic, plug it into Cerner, and you can create these custom interfaces that really help the end-user. I see that as a hopeful area in the future. >>: Is your work really balanced sort of toward both, the research context and the end-user experience? >> Wendy Chapman: Yeah. Back to NLP. I think that NLP has to play a large role in that. I think that there are some things that should be filled out in structured form, but there are a lot of things where you just need the text. When you say the patient crawled to her mailbox to get her mail, there is no checkbox for that, and there is a lot of information that that tells you about the patient's motivation, their physical state. There are things, stories that need to be told in text, and the NLP needs to be there, and I see it as a really semi-automated thing. >>: What we've done is we are forcing humans to change behaviors to suit the machine versus the other way around. >>: They are the most inexpensive piece of the whole system. Doctors have been practicing, been in school for 12 years. Is there anything in here that is also speech, dictation with automatic transcription, and then either feedback that says, especially when they are going into a report area, it's much better if you get it at the beginning. You had called out I want to do a [indiscernible]. We are not ready for doing it all on our own machines. They are not capable of doing it. But have you done anything in terms of usability, how do you make it so that this is assistive technology, so they can be much more productive? Our new charter here at Microsoft is to make everybody more productive blah blah blah. That didn't come out quite right; did it? >>: We've got you on record. 
[multiple speakers] [indiscernible] >>: But I mean I think that a lot of advances can be made if we try to do a man-machine collaborative thing. >> Wendy Chapman: Making this interactive, like in radiology or anything, if you are doing speech at the same time then you are looking at completeness. You mentioned this but you didn't mention this and we expect to see that, so you can help with completeness. You can ask questions. These kind of contradict. You can really help improve the quality if people are willing, and I don't know. I think speech is getting to that accuracy and so I think there is more possibility for it. >>: There's a whole lot of training you have to do to specialize in the space, but yeah, in terms of where we're going, it's getting much better than it was, but there is still the context and all sorts of stuff that has to go up. >> Wendy Chapman: This is one reason why NLP and other medical type applications are so slow. I mean, they have to understand a lot of domain knowledge and so that takes a lot of customization, a lot of fine-tuning around a particular domain. >>: There used to be a lot of dictation and transcriptions. Is that still going on? >> Wendy Chapman: Yeah. >>: So would step one for getting the domain knowledge be working with a transcriptionist to make transcriptions more productive? I mean here's the transcription. Here's the dialogue. Here's everything going on, so doing a partial. >> Wendy Chapman: It's not real-time. You lose that real-time opportunity. >>: Agreed, but step one is… >>: You have the opportunity to learn from the transcriptionist's corrections to the transcript. >> Wendy Chapman: Right. That's the business model for M Modal. I don't know if you know the M Modal company in Pittsburgh. They are a transcription service but they do it through speech recognition. The transcriptionist changes it and then they run NLP and then the user corrects it and they send back the coded document. It's a really nice model, but it's not real-time. But it could get to real-time. >>: You could get to real-time with advances in that. And you did say that it was M Modal? >> Wendy Chapman: M Modal. >>: But they are feeding back in those changes so they are building the learning system so that hopefully you are getting more and more productive and the transcriptionist is faster and better and less time is spent on each one with fewer errors. >> Wendy Chapman: That's right. And that's another reason why industry and research really need to work together on this, because the researchers have the idea about the bigger picture, about the workflow and the detailed kind of NLP things, but we are not going to build an interactive speech recognition system for radiology that's really going to be deployable in the next few years on our research grants. Maybe you guys can. Hopefully you guys can. Okay, so NLP is starting to show up places, and the promise of electronic medical records, the promise of natural language processing, may be closer than ever. This was in JAMA, which was amazing, to have an article about NLP and an editorial about NLP in JAMA. But it's around the corner and that corner just doesn't feel like it's getting very much closer. But as we start to build applications that will assist users, I think that we can get closer to that. And it's not throwing out researchers. There are all kinds of research questions that have to be answered to really get there, so I say come join us in this quest. 
And I want to acknowledge my collaborators and my lab and conclude with, do you guys know where this is? It's sand. That's White Sands National Park in New Mexico. So thinking about context, you are probably thinking that I'm from Utah. I just talked about a snowstorm. But it's not. It's sand in the summer. Thank you everybody and I will take more questions. [applause]. >>: A question for you on medical devices, so you are now starting to advise a physician. The transition from advising to practicing medicine, and medical devices, where is it that you have to start having the FDA and the whole approval process in your thinking? >> Wendy Chapman: I think everyone is really worried about that. And I don't know the answer. And it could change at any time, so I think right now it's okay. It's safe with decision support systems, but at some point it's going to flip. So I'm not sure. That's a good question. >>: As soon as it gets to the point where the human, i.e. the clinician, is looking at the data to render a diagnosis or making a decision. That's when the FDA starts getting a little excited. >>: What I saw in the support was, we see this and we think this is the treatment plan. Here's the antibiotic. That's practice to me and I would have got that one… >>: That gets them a little excited. >> Wendy Chapman: But there is also the thing that eventually it comes to the point where if you are not giving them that advice then you're doing something wrong. That's malpractice, because that knowledge is available. Why are you not pulling up that information that is there for them, that they can't find because you have hidden it in so many places? Why are you not giving it to them so they can make a better decision? And there are guidelines about here is what you always do in this case, so I think it's complicated. >>: There is lots of other mining. You talked briefly before, part of this is like if you wanted to go back and look at machine learning over [indiscernible] and you want to actually improve care and you want to move the state-of-the-art forward, we have to have certain things coming out of the medical records, which is sort of all of the germane things that lead up to the disease. You've got a diagnosis. You've got the germane things during treatment and then outcomes. Is there any hope that we'll see this in a concise form, reasonable form, complete form? There's never going to be a complete form in my mental model, but are we, where are we on the path to actually being able to generate that in the example of the Million Veterans Program or any other? >> Wendy Chapman: To be able to generate a kind of a, all of the data that you need to do the genotype-phenotype and really do personalized treatment… >>: Whether it's genotype phenotype or you are just doing best practices, so if clinic A in South Dakota is doing something or other and they are doing something very different over in Minneapolis, you would like to make sure that one, the cases are similar enough, if not identical, which is never the case. But close enough that when they got the care in Minneapolis working and now they want to help disseminate that information, how do you make sure that the evidence is there supporting this is a best practice, and that it is there and that we are not going to be doing clinical trials when we start getting down to personalized medicine? And of course with some small numbers, you are going to be using statistics. 
You are going to be using this, which is all learning, which means we've got to be extracting a computable form of the medical record. >> Wendy Chapman: Yeah, I think there are a lot of steps in making that happen and we are on that path. One important step is a system called OpenCDS, or Open Clinical Decision Support. It's an open-source way of delivering that knowledge, so that if you author your knowledge with these certain standards in Minneapolis at the Mayo and it works well, now someone else can port it to their own place. Whereas now, that's not possible, and so being able to share that knowledge and deliver it in a different institution is one of the major steps needed for that. >>: Is there a legal framework in place to do this sharing? >> Wendy Chapman: I think so. I think as long as it's not actual data about your patients, people are fairly happy to share their guidelines. They publish them, and as for putting them in computable format, I think people are willing to do that in general. I could be wrong. >>: That's not the statistical. That's here's my recommendations, and so they have already done whatever statistics they have. >> Wendy Chapman: I'm sure they would be willing to share it for a price. I mean people want, hospitals want to become more commercial, and if they create a predictive analytics system that works well, they would like to market it and share it in that way. >>: [indiscernible] socialized medicine Canada, China. >> Wendy Chapman: Yeah, I think there's a desire to share knowledge, at least in the academic medical centers. And you're the leader. You showed that this works and you are the leader and now people are adopting it; that shows your influence. >>: Could you perhaps say something more about the MVP project and what is the promise of it and what are the barriers to actually making it happen? >> Wendy Chapman: Okay. I don't know very much about that project, really. I helped apply for one grant that would use the data from it, so I know it's hard to be able to use the data. And I think they have maybe 100,000 patients' genotypes so far and in the next year they will have 200,000, so they have this kind of phased plan. The genotype information is now linked to all of their patient records, which are available to researchers. Beyond that, I don't really… >>: So you mentioned you have this grant, and how long do they expect it to take to get the actual data? >> Wendy Chapman: We applied for a grant to use the million veterans' data and we haven't heard back whether we got the grant or not. But it's my impression that you can't just use it. You have to apply for these opportunities. It's not just there for everyone to use. I'm not positive about that but I'm thinking that's true. >>: [indiscernible] you could use it on your computers. >> Wendy Chapman: Oh yeah. We were talking about that, and the access that we did have was very difficult because there are lots of hurdles to getting the permission, and when you do get the permission it has to be on their servers and they are very difficult to work with; they crash. You can get to them. It's just ahh. You can't run your system because it takes too much CPU or whatever and they don't have it, so there are lots of barriers. >>: Do you see this as a problem compared to all of the other shareable data? >> Wendy Chapman: Yeah, just buy a few more servers, put it on the cloud. >>: We have spoken with people at the VA and they are by and large a Microsoft shop. >>: I was there. >>: But they do need help. 
>> Wendy Chapman: They do. And they are not going to let you put that data somewhere else. >>: [indiscernible] >> Wendy Chapman: Yeah, that's the problem, the funding for it. They really don't like to spend money on IT. >>: [indiscernible] sequencing [indiscernible] genotype things instead of [indiscernible] even [indiscernible] sequencing, you don't have the funds. They have huge numbers of samples of biodata. They don't have the funding authorization to go through all of it. >> Wendy Chapman: To go through it? >>: And they have a lot of people who said yes, you can do it, but they don't have the process to actually get it done. >> Wendy Chapman: That makes sense. Uh-huh. >>: [indiscernible] barriers, because you were talking about open source and pick your workbench. Are other institutions kind of taking that on, or what is the barrier to getting that more widely adopted? >> Wendy Chapman: The idea of a workbench to pull in the different NLP tools? Well, we haven't really developed it out very well yet, so that's the first barrier. I don't know. I would be interested in your thoughts about clinical NLP. It feels to me like 100 people working on the same things. There aren't aligned incentives to really collaborate. You get your grant, you work on your research, so it's like we need someone to fund development of the application and then use the resources that people are developing, with those people as consultants. But we as researchers are not going to be able to build out that big thing and sustain it. That's what I think. And the VA seems like the place that could potentially do that. They could hire someone to really build the applied, pull-it-all-together kind of thing and maintain it, in theory. >>: You have a slide of the African-American person. You imagine that as the end user? That's the clinician wanting to see the information? >> Wendy Chapman: Yes. That is them defining the kind of variable they want extracted from text. Uh-huh. A lot of us make our things open source, so they are available that way. >>: But there is still no, I mean, you have made your models available. >>: Yeah, we make the models [indiscernible], we sometimes can't make models available. I think the main bottleneck is the data sharing. I think it's not a good idea to have people trying to be able to [indiscernible], because the description of that [indiscernible] may not be addressing the needs. I think everybody is passionate that the data and the source code should be available. >> Wendy Chapman: Yeah. And then if you could build ways to quickly draw on all of those different tools and evaluate them, because it's a lot of work to adopt someone else's tool and map it to yours so that the inputs and the outputs line up. It's a huge amount of work, and there are dozens of them. >>: [indiscernible] models, and lots of people want to share data, from early Windows all the way up, in all sorts of things. It's very hard making a model that people will use. What's the value? Until you identify a significantly important piece of work that brings value that people see, that they appreciate, that they share in common amongst a lot of people, it's going to be very, very hard to pull any of this together.
So getting to a step that is going to gain enough momentum to say, I have improved my productivity, or whatever the right metric is, and getting a lot of people on board, especially researchers, who are by nature independent, want to be pushing the limits on different things, and care less about delivering value than about working on exciting and interesting problems, is going to be really hard. >> Wendy Chapman: Yeah, and at the same time, if you just rely on companies to create these tools without researchers' input, you get Epic and things that are not really useful in the real world. >>: Yeah, where's the customer value? If you are looking at it from the IT and business side, as you mentioned at the beginning, it's very, very different than looking at it from the patient side. And where is the value to the patient? Am I getting better outcomes? No. I'm getting better billing, I'm getting better charging, I'm tracking the business better, and for lack of a better word the bean counters are getting a little happier, but are we really pushing that life expectancy up? Are we driving the costs down? No, we are not driving the costs down. The bean counters are there because any cost that they take out of the system they are going to put in their own pocket, so the costs don't come down; that's not a capitalist incentive. >> Wendy Chapman: Yeah, so we might be getting to a time in the future, we're getting to a time, where the value for the patient and the value for the institution are getting aligned. And in that case I think they will see more value in the research tools that come out. >>: From our experience with [indiscernible] for the last six years now at the medical center, the most exciting projects usually come from clinicians when they are super excited about something. For example, we finished a project very recently around [indiscernible] speech, where with this app the doctor dictates to the phone, it's transferred over a secure channel to a server, and then it's mapped to a text file that the clinician himself [indiscernible], and the [indiscernible] does some sort of light parsing on the dictated text so that it will be nicer in terms of [indiscernible] purposes. And now we are applying this where the clinicians are measuring their time, like how much time they spend on input in terms of cost, and also doing a how-many-fewer-errors kind of analysis, and they are loving it. >>: If you can get the data to the server quickly and you can do the analysis and be assistive, and there is ambiguity in this diagnosis, you can resolve some of that ambiguity by doing this test and giving them constructive feedback to drive to a better one. But you have to be able to do that quickly, and that's, you know, how do we spin that up? It involves [indiscernible]. You can do a small project. >>: Yeah, a small pilot project. That was kind of a nice way to measure the impact of an automated system, like how receptive people are to using it, rather than seeing doctors and 10 patients [indiscernible] >>: And I think that here you are seeing that the people who are going to be advocating for it are not the NLP people. It's, as you said earlier, the end user, and it's user-centered design. How do you get the high-value people in the system saying this is adding value to me, I am being more productive, I am being more accurate,
I am being more whatever, because I am getting this assistive technology. >> Wendy Chapman: Yeah. >>: How do you maintain freshness of the data? For example, if there is a treatment and a new drug comes out, within three weeks everybody needs to be using it because this is the one that has the right effect, but it's not going to show up in your data because it hasn't been processed yet. How do you do that? >> Wendy Chapman: That's a big problem, the lag between discovery and application. People have shown that it's, I think, 17 years. >>: I have heard 10 to 17 years. >> Wendy Chapman: Between the time when everybody realizes and agrees that something is the right way and the time it's actually used regularly in practice. So if you are going to mine clinical records to try to learn things, you are going to be way behind. I think that's right, and we have to keep that in mind. It doesn't mean don't do it, but it means that just because this is how people do it doesn't necessarily mean it's the right way to do it. Yeah, because then you need the literature combined with the clinical record. >>: Compare the two and make sure they don't contradict each other. >> Wendy Chapman: Yeah, that's always interesting. >> Lucy Van der Wende: Thank you so much. >> Wendy Chapman: Yeah, thank you so much. [applause]