>> Lee Dirks: Thank you, everyone. We're going to reconvene the afternoon
portion of the schedule. Thanks. Hope you guys had a nice lunch and a chat
with your colleagues.
For the next session, I'd like to introduce Rafael Sidi. He's the vice president of product management for ScienceDirect and an expert in research productivity. He's been with Elsevier since 2001, where he's been instrumental in developing Engineering Village and launching Illuminate.
He was also the publisher for the Compendex database. Before joining Elsevier,
Rafael was director of E-Commerce operations at Bolt, a teenage social
networking portal. He holds an MA from Brandeis University and a BS in
electrical engineering from Bosphorus University in Istanbul, Turkey.
He's also joined by his colleague, Dave Marques, who we're very proud to claim here in the Pacific Northwest. Although he works for Elsevier, he sits here in Seattle. We're very glad to have him here.
I'll hand it over to Rafael.
[applause].
>> Rafael Sidi: Thank you, Lee. First of all, we are doing this presentation together with David Marques. And one of the topics that I want to introduce and show to you is a new product that we launched. It's all about visualization.
And I was going to present this product, but David is the guru of this product. He lives here, so we are going to do this as a team.
I'm going to look at most of the product [inaudible], so we are talking about multimedia and visualization. For me, whenever we introduce any multimedia or visualization, one thing that I keep in mind is how this is going to help researchers and scientists in their daily work.
Le Journal des Sçavans started in 1665. This was the earliest scientific journal. Since that time, publishing has changed tremendously. And today, right now, if you go to different publishers, you see graphical abstracts. So we are expressing the journals now in a more visual way. You can see the graphical abstract and get a quick overview of the article.
Then we started playing with text and captions, so you can get insight from the articles.
Now, introducing multimedia is very important for us because we are trying to provide more insight to the researcher and scientist. We are trying to provide more intelligence from the article. So we don't want the articles to be static articles; we want the articles to be interactive articles.
And what we have been doing is asking the authors to submit videos and any supplementary data, so we can show them to our users. And the users who come to our websites can really understand what's happening; they can learn about an operation. Again, it's not something great to watch after lunch, but still, you know, you need to understand that this is how you're going to communicate with the scientific and research community.
We heard about the datasets, the importance of the datasets, this morning. So we want to also provide more context to our users when they are looking at an article. So we partnered with Pangaea, which is in Germany. What we are doing is introducing the datasets: we are connecting the article and then linking to the dataset that is available. This is our way of visualizing the datasets within the article.
Then this morning we heard about [inaudible]. So this is the way that you want the article to be interactive, so people can get the information easily. And we are pleased to have [inaudible] submissions coming with our articles from our authors.
And the beauty of this is that you are connecting the user from the article to the other external relevant sources where they need to go to do more in-depth search and discovery.
We partnered with EMBL. These are researchers in Germany. What they do is extract the protein structures and visualize them within the articles. So we are trying to link the users contextually to relevant information wherever they are in the article.
Another visualization or multimedia feature that we are using: we are asking the authors to submit keywords for the structures. So when we get the structure keywords, we are linking the structures and presenting them to our users so they can go to other databases and run their searches in other databases.
And we are always looking at the search results. Search results, most of the time, have been textual. For engineers and for medical researchers, being able to look at images and to search images is very critical. So instead of just providing search results in a textual format, we are also providing search results where you can do an image search and get the images in your search results.
Now, in 2005 I looked at how search results are presented. And the way that I see it, today's search results, let's say Google Scholar's, look much the same. Are these dumb search results? Where is the intelligence? How can people get insight from the search results?
I think this one is from Microsoft Research down in China. Again, if you compare the Google results with these results, there are more insights in these results. At least you can see some author names, prolific authors. So another way for us to think about visualization is, okay, how can we leverage facets, how can we leverage the content and the keywords that we have, so we can provide more insight to the end users using facets. Again, what Microsoft is doing is taking faceted search to the extreme. This is, in my mind, faceted search with insights. In a visual way, you can really get insights looking at this table. For me, you know, [inaudible] introduced faceted search to the search industry. But thinking about what Microsoft is doing here, they are creating the next version of faceted search, where you can see much, much better information. You can easily interact with the results.
And this, for me, is an excellent way for search and discovery, specifically in science and technical fields. It is a great way to see information.
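To make the faceted-search idea above concrete, here is a minimal Python sketch of how facet counts can be computed over a result set. This is an editor's illustration, not Elsevier's or Microsoft's implementation; the field names and sample records are hypothetical.

```python
from collections import Counter

# Hypothetical search results: each record carries metadata fields (facets).
results = [
    {"title": "Treemaps revisited", "author": "B. Shneiderman", "year": 2009, "subject": "HCI"},
    {"title": "Faceted metadata search", "author": "M. Hearst", "year": 2006, "subject": "IR"},
    {"title": "Network visualization", "author": "B. Shneiderman", "year": 2006, "subject": "HCI"},
]

def facet_counts(records, fields):
    """Count how often each value occurs for each facet field."""
    return {field: Counter(r[field] for r in records) for field in fields}

counts = facet_counts(results, ["author", "year", "subject"])
for field, counter in counts.items():
    print(field, counter.most_common())

# A user clicking a facet value simply filters the result set:
hci_only = [r for r in results if r["subject"] == "HCI"]
```

Clicking a facet value is then just a filter over the same records, which is what makes the interaction cheap and immediate.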
Again, these are great tools that we integrated into our journal articles, so we are trying to bring the data, bring the content, to life. This kind of visualization is helping our researchers, scientists, and students to understand the content much, much better.
Now, we are talking about video. And video is going to be very, very important in the future for scientific and technical research and retrieval. And what we have seen right now, regarding the importance of video, is the introduction of journals based on video. This is a completely different transition from the plain journal article. Now you have video journals.
And not only that, we also want to provide to our users the videos -- the protocols in video form. There are new startup companies who are leveraging this and presenting the protocols in video format. This is an excellent way for our researchers and scientists to learn new things.
Now, recently we launched another product. Here we took two subjects in health sciences and, leveraging the images and the video that we have, we created a brand new product. And we are leveraging here not just image search but also the facets.
And what you see here -- again, sorry, oops -- is a video where you can really learn how to do the search. So basically we are sharing this kind of information with the scientists, users, and researchers so they understand what's happening here.
And you can take this body and you can visualize and see all the figures or all the videos that you have in a book. So we are making the content more interactive, so you can really learn what's happening.
We don't want to just provide text; we want to provide multimedia and visuals so that you can improve what you do in your daily work.
Again, these are some additional examples of how we are presenting the
content.
We have another product where we are leveraging visualization, Geofacets. You can map all the vessels that you have, and then you can do a search in a specific area. What you are doing is getting the images that are in the document, and then you can do a more in-depth search. If you just want the images, you can get the images, or if you want to look at the document, you can access the document. And not only that, we are thinking about the user's workflow: how we can take that map image so you can put it in the other map workflow tools that you are using.
Again, there are great tools, great visuals that you can see. It looks great, it looks very cool, but what can you do with it? That's the important thing. You know, there are great visualizations, but okay, how am I going to use this? What am I going to do with it?
In this case, it's a workflow problem, because they have the map, they see the map in the journal, but they cannot take that map and put it in their workflow solution. Another way that we looked at information is: how can I provide more insight from all the content that we are putting together? How can we categorize it? How can we provide a simple visual image that shows what the trend is?
And how can I compare two companies' output just using visuals? If I want to look at what's happening in electric cars, if I want to compare General Motors and Chrysler, this is easy. I can create a visual diagram. I can take that, put it in my Excel, and present it to my managers in the presentations that I have to give.
You always need to think about multimedia and visualization and how it integrates with the scientists' and researchers' workflow.
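As a small illustration of the kind of comparison chart described above, here is a sketch that plots two companies' output side by side. The counts are invented purely for illustration, and the saved image is the sort of artifact that can be dropped into Excel or a slide deck.

```python
import matplotlib.pyplot as plt

# Hypothetical document counts on electric-car topics; the numbers are
# made up purely to illustrate the comparison described in the talk.
years = [2006, 2007, 2008, 2009, 2010]
gm = [12, 18, 25, 31, 40]
chrysler = [8, 9, 14, 17, 22]

plt.plot(years, gm, marker="o", label="General Motors")
plt.plot(years, chrysler, marker="s", label="Chrysler")
plt.xlabel("Year")
plt.ylabel("Documents on electric cars")
plt.title("Output comparison (illustrative data)")
plt.legend()
plt.savefig("electric_car_comparison.png")  # drop into a slide or workbook
```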
A key factor in the science field is how we measure authors. Again, you can visualize this, and you can see an author's publications by source. In this case, we are looking at Ben Shneiderman. I can see where he's publishing, and I can also see what kind of subjects he's publishing on.
So that gives me intelligence in a very, very quick manner.
Again, one of the best search engines that we have right now in the scholarly [inaudible] is what Microsoft is doing in China with academic search. This is a helpful tool for me to figure out the co-author network of any author. I can look at this and see that Ben Shneiderman is connected to all those people. But if I want to find out whether Ben Shneiderman is connected to Jim Hendler [phonetic], again, I can use this tool to figure out how they are connected and whom to go to to make that connection.
So again, with any visualization, we need to understand how it is helping them solve their problem.
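A minimal sketch of the co-author question above: finding how one author connects to another is a shortest-path search over the co-authorship graph. The graph below is a toy example with made-up links, not data from any real index.

```python
from collections import deque

# Toy co-authorship graph; the names and links are illustrative only.
coauthors = {
    "Ben Shneiderman": ["Catherine Plaisant", "Ben Bederson"],
    "Catherine Plaisant": ["Ben Shneiderman", "Jennifer Golbeck"],
    "Jennifer Golbeck": ["Catherine Plaisant", "Jim Hendler"],
    "Ben Bederson": ["Ben Shneiderman"],
    "Jim Hendler": ["Jennifer Golbeck"],
}

def connection_path(graph, start, goal):
    """Breadth-first search: the shortest chain of co-authors linking two people."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(connection_path(coauthors, "Ben Shneiderman", "Jim Hendler"))
# ['Ben Shneiderman', 'Catherine Plaisant', 'Jennifer Golbeck', 'Jim Hendler']
```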
Journal analyzer. We all try to analyze our journals, and one way that you can do that is to create an image of it and visualize all of the data.
Another important piece of research is at the University of Maryland, with PaperLens. If you look at this, it's giving you insight about what's happening in a topic and who the most prolific authors are, and you can drill down to find more information.
There are some interesting companies that are using visualization tools, and these kinds of visualization tools provide insights to their customers, to researchers. Here you can see the whole landscape in any topic, or you can find unanticipated patterns in any subject. This site, for me, is a visual way of looking at the data, looking at the content.
Or you can identify the early trends.
Another company is working on visualization where you can visualize the whole genome and drill down; you can get much, much more insight, much more intelligence from this kind of visualization. Those are helpful for specific niche areas.
If you want to see NIH funding, this is another tool where you can look at how the whole funding is distributed.
If you want to look at the whole content distribution in terms of keywords, you can look at it that way. Or if you want to see everything in a wheel, you can do that too. But, again, how useful is this stuff? I think there's still a question mark on this. How are we going to use it? That is always the question. How is the researcher going to use this? It's a great visualization, it's cool, it looks nice, but how is it going to be integrated into the workflow?
Again, this is another way to look at the cancer network in terms of content.
Now, there is a guy in Europe who really, really knows what he's doing. Moritz Stefaner -- he really understands information and visualization. The way that he presents it, you can easily understand and easily interact with any content. Here are some samples that he has done on different topics. If you can bring something like this to science and technology, it will really help the scientists and researchers to interact with the content and get more insights from it.
Again, this is the way that he did a citation network for journals. This is the journal citation network, and you can see which journal is citing which other journal, so you can get a quick snapshot of what's happening in an area.
Again, this is Eigenfactor, which shows the journals. You can see which journals are citing which journals and which journals are being cited by a given journal. That kind of visualization is important for administrators, for scientists, and for researchers.
Again, this is another similar tool that shows the whole network area. This is a beautiful image, but what am I going to do with this? How am I going to use this image? So again, from the product side, we always need to be very careful: okay, there's a coolness, but how is it connected to the outcome?
And I think the other beauty that I see lately in the market is that, with the Web, the creativity of the crowd is really coming up. When we look at Many Eyes, IBM's product, people are using Many Eyes and creating their own visualizations on scientific content.
For me, from the product side, as a publisher, this is very, very important. Scientists and researchers are using Many Eyes, they are using available content, and they are creating applications on top of that content using a tool that IBM is providing.
So what we are doing is, okay, we are also providing the tools to the scientists -- the content, the APIs -- so they can create their own applications. As a publisher, we are not going to come up with all the visualizations or all the multimedia solutions. But what we can do is provide the tools and the content to the scientific community, and they can build these tools. So what we have done is launch an application marketplace. In that application marketplace we are providing the APIs, and we are telling the scientists and researchers, hey, build whatever you want to build. You want to build a visualization? Go ahead, build it.
And then one of the applications that we got was from [inaudible] university, and they created an expert search. I didn't create this expert search; they created it. When you do a search on any topic -- in this case we did it for, I think, Semantic Web -- I can see who the most prolific authors are in my search results. So I'm trying to bring the contextual application into my search results.
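A rough sketch of the kind of contextual application described above: call a content API for a topic and aggregate the most prolific authors from the results. The endpoint, parameters, and response fields are placeholders, not the actual marketplace API.

```python
import requests
from collections import Counter

# Placeholder endpoint and parameters; a real content API (e.g. the marketplace
# APIs mentioned above) documents its own URL, auth, and response schema.
SEARCH_URL = "https://api.example-publisher.com/search"

def prolific_authors(query, api_key, top_n=10):
    """Search a content API and count which authors appear most often."""
    resp = requests.get(SEARCH_URL, params={"query": query, "apiKey": api_key, "count": 100})
    resp.raise_for_status()
    records = resp.json().get("results", [])
    counts = Counter(author for rec in records for author in rec.get("authors", []))
    return counts.most_common(top_n)

# Usage (hypothetical key):
# for name, n in prolific_authors("semantic web", api_key="YOUR_KEY"):
#     print(name, n)
```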
So that's the key. If you are going to bring in visualization or anything else, it should be contextual. And in this case, again, this is their application, not my application. We can view Jim Hendler's network, we can really go in depth into Jim Hendler's network, and we can play with it.
So this is not my application. The crowd created this application, and they did it much, much more quickly than I could have. Another example that we have with the applications is prolific author. We partnered with a company, and they created this visual showing the most prolific authors within our content. And we created this application that shows the author's network in the system.
I think what we are pushing is that whatever application, whatever visualization or multimedia you are going to build, it should be contextual and should provide insight to the end users.
And one of the things that we are doing is creating integration tools so that we can integrate with other products, so we can link to other products. One of the products that we are linking to: we created an application called Brain Link. It looks at the content that we have, extracts neurostructures, and then links to the Brain Navigator product that my colleague, David Marques, is going to present to you now. Thank you.
>> David Marques: First I have to put everybody to sleep. Video. That's what I wanted to do, right. Doc cam. I want to pick up on a couple of things that were mentioned this morning, and particularly one of the things that Rafael was just saying now about what people are going to do with the content. So what I'm going to talk about is what people want to do with the content and making that easier.
I'll have to compress a two-hour talk into 10 minutes. So how many people in the room are neuroscientists? That's a good sign. This is all about neuroscience. It's about people who study the brain and study what different parts of the brain do.
To simplify it way down, the key problem is: I have some task that somebody or some animal does, and I want to figure out what parts of the brain are involved in that and how they implement it. So, a connection between the structure of the brain and the function of what those structures, those pieces, do.
We publish, in book form, atlases of a variety of almost 20 species of animal: rat, mouse, monkey, human, and birds, fish, all kinds of things.
So what we've done is take those atlases, which are plate after plate of the histology -- that is, slices through the brain at certain locations -- along with outlines of every one of the named structures that neuroanatomists have called out. If you just look at an anatomy picture, it's sort of fuzzy; you can see some major landmarks, but you can't see the thousand different little structures in each of the brains.
So what we've done is take that atlas, put it online, and help people figure out -- make it easier for them -- to actually do their work with it. I'll talk about two tasks that they do. One is planning the research, that is, planning how to get to the spot in the brain they want to do something with; and the other is, after they've done something, interpreting what they've done: did I get to that spot, and what change did I make at that spot of the brain? Those two tasks are two of the things we'll talk about for the brain.
You saw in ScienceDirect that when you're reading an article and you read about a structure in the brain, you'll get a picture of that structure as it is in the brain of the species that you're looking at, and then you can click over to this page, which is a quick summary of the printed atlas pages that have that structure in them.
The first thing you notice is the structure: everything that's colored here is the structure. I picked out the striatum. Now, that's actually made up of about eight different substructures. So we color them immediately and we show them in a structure hierarchy that is defined by the neuroscientists. There are a lot of things you can go in and do here, but the major thing you can do -- I'm going to talk about interpretation first -- is interpret your research.
The thing on the right here is an image, a slice through a brain, that you, the researcher, might have made from your animal, and you can compare that to the drawing of the standardized brain that we've published in our atlases. Then you can do an overlay of those on top of each other and match up exactly each named structure with each different part of your brain.
And there are all kinds of different ways you can shrink it and do various morphing to get the match exactly right. So this is the interpretation part: I'm interpreting what has happened here.
A lot of researchers will inject some kind of substance, a radioactive substance, whatever, and they'll get little spots in a certain area, and they want to know exactly what nucleus that's in. So they'll do this overlay process to interpret where they are.
It's a very simple visualization but a very powerful tool. And then they can print this out and do all the usual things that you would expect.
The other side is planning research, if you're going to go in and do an intervention. When you're thinking of a small mouse brain, we're talking about 5 centimeters by 9 -- millimeters, sorry, 5 millimeters by 9 millimeters. So you've got to hit an exact spot in there.
So one of the things you do is go to the atlas. This is what the researcher will typically do: they will go to the atlas and -- let me go back to the comparison -- they will say, okay, I want to hit it right here. They have a spot, and it tells them the exact coordinates of that spot. We even have a calculator in there, too: if your animal is bigger or smaller than this, we'll do a calculation for you so you can adjust your instrumentation to hit that exact spot. And then -- I'm not going to do this -- you can save this coordinate as a note.
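The coordinate calculator just mentioned presumably scales standardized atlas coordinates to the individual animal. A minimal sketch of that idea follows; the linear scaling rule and the reference bregma-lambda distance are assumptions for illustration, not the product's actual formula.

```python
# Stereotaxic target in the standardized atlas, relative to bregma, in mm:
# (anterior-posterior, medial-lateral, dorsal-ventral).
atlas_target = (-1.8, 1.5, -3.6)

# Simple linear scaling by skull landmark distance (bregma-to-lambda), one common
# way such adjustments are made; the reference value below is illustrative.
ATLAS_BREGMA_LAMBDA_MM = 4.2

def adjust_coordinates(target, animal_bregma_lambda_mm):
    """Scale atlas coordinates to an animal that is larger or smaller than the atlas brain."""
    factor = animal_bregma_lambda_mm / ATLAS_BREGMA_LAMBDA_MM
    return tuple(round(axis * factor, 2) for axis in target)

print(adjust_coordinates(atlas_target, animal_bregma_lambda_mm=4.6))
# (-1.97, 1.64, -3.94) for a slightly larger animal
```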
And then you go into the three-dimensional viewer. These are those same highlighted structures in the 3D viewer. You've got all your usual 3D: look at it from the top, look at it from the side, look at it from the front, spin it all around, the usual stuff. This, by the way, is built with VTK, the visualization toolkit, in case you're wondering.
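Since the viewer is described as being built with VTK, here is a small, generic VTK sketch of loading one structure mesh and rendering it with adjustable opacity, which is how the transparency control mentioned next becomes useful. The file name is hypothetical, and this is an editor's illustration, not Brain Navigator's code.

```python
import vtk

# Load a surface mesh for one brain structure; the filename is hypothetical.
reader = vtk.vtkSTLReader()
reader.SetFileName("hippocampus.stl")

mapper = vtk.vtkPolyDataMapper()
mapper.SetInputConnection(reader.GetOutputPort())

actor = vtk.vtkActor()
actor.SetMapper(mapper)
actor.GetProperty().SetColor(0.8, 0.3, 0.3)
actor.GetProperty().SetOpacity(0.4)  # semi-transparent, so structures behind stay visible

renderer = vtk.vtkRenderer()
renderer.AddActor(actor)
renderer.SetBackground(0.1, 0.1, 0.1)

window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)

interactor = vtk.vtkRenderWindowInteractor()
interactor.SetRenderWindow(window)
interactor.Initialize()
window.Render()
interactor.Start()  # rotate, zoom, and inspect from any angle
```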
And, you know, you can play with the transparency and you'll see why that's
important coming up. So now you're in that spot. And one of the things, as I
said, you can do is you can plan your research. So let's say I wanted to inject
into there. I go in and I -- hard to read these plans. Oops, wrong one. So I can
say, okay, here are my saved injections. Here's my plan right there. Let's load
this one. We're going to load that one in. Did I not load it? Okay. Let's try that
again. Lower -- there we go. All right. There it is. So now it's loaded in. Let's
show the injection.
So now you see there's an injection cannula coming down. It's in that lower spot. Obviously we have all the usual things -- you can move it around, et cetera. One of the things that's very important is you can show what structures you're going through to get there. That might matter because you might want to not disrupt other parts of the brain. So let's say I want to show the rest of the brain -- this is the whole brain around it, done in much less detail, right? I can highlight my injection site. And if I want to avoid structures -- this list down here is the structures my probe is passing through -- I can change the angle of injection to go around and avoid certain ones of those structures.
This is a process that's routinely done, because they want to avoid certain cortical structures, because that's part of the behavior they're going to study. So they want to figure out what angle to come in at to hit that exact spot they were looking at. So you get the idea. Again, this is part of helping the researcher get the job done in planning their research.
And then, when they do their research, they often want to slice up their brain and compare it to the atlas. Now, what happens when they slice up their own brain is that usually they don't slice it in exactly the same plane that the atlas is in, because nobody does it exactly right. So we provide a slicing tool for them: let's now slice through the brain at a certain place, and now you see I've sliced through one plane. I have two planes; I can slice front and back. And I can now change the angle back and forth to match the angle that I've sliced in my histology, either intentionally or otherwise.
And then -- I'm not going to show it here, because it takes a while -- I can say, okay, now create a custom atlas. Between plane one and plane two, I will save off, at whatever frequency I want, drawings of the entire brain at each section on that plane, and now I've got a custom atlas that I can compare and overlay against my actual results. All right. So again, we're trying to make this as easy to use as possible.
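A small sketch of the custom-atlas idea just described: given two bounding planes and a section frequency, compute the positions at which drawings would be saved. The geometry is reduced to positions along one axis; this is an editor's simplification, not the product's algorithm.

```python
def custom_atlas_sections(plane_one_mm, plane_two_mm, step_mm):
    """Positions (along the slicing axis) of the sections in a custom atlas
    between two user-chosen planes, at the requested frequency."""
    start, end = sorted((plane_one_mm, plane_two_mm))
    positions, pos = [], start
    while pos <= end + 1e-9:          # include the far plane
        positions.append(round(pos, 3))
        pos += step_mm
    return positions

# Example: planes at -2.4 mm and -1.2 mm from bregma, one drawing every 0.2 mm.
print(custom_atlas_sections(-2.4, -1.2, 0.2))
# [-2.4, -2.2, -2.0, -1.8, -1.6, -1.4, -1.2]
```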
One final visualization I want to show. I should have called this out in the beginning, and my apologies for not doing that. The 3D modeling we built in collaboration with the Allen Institute for Brain Science down here in Seattle. They did a lot of the early work on building the toolkit; we redid it for our own purposes and added our own pieces. But one of the things they did was map the entire mouse brain for every one of 20,000 genes and where in the brain each gene is expressed.
That data is available through a public API. And so what we've done -- let me get rid of the slicing tool here -- is add a feature to search for any particular gene that you want to show. I've preloaded some of the genes in here; it only takes a few minutes. But now I can show the gene expression for that particular gene, which is the dopamine gene, and you notice it is very closely matched to these particular structures. You can load any of the 20,000 genes in here, and of course, as you saw this morning, you can set the threshold to get only the ones that are most strongly expressed, and so forth.
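A hedged sketch of pulling expression values for one gene from a public API and applying a threshold, as described above. The URL and response fields are placeholders; the Allen Institute's real API has its own documented query format and schema.

```python
import requests

# Placeholder endpoint; the Allen Institute publishes its own documented API,
# and the real query parameters and response fields will differ.
EXPRESSION_URL = "https://api.example-brain-atlas.org/expression"

def strongly_expressed(gene_symbol, threshold=0.75):
    """Fetch per-structure expression energy for one gene and keep only
    the entries above a user-chosen threshold."""
    resp = requests.get(EXPRESSION_URL, params={"gene": gene_symbol})
    resp.raise_for_status()
    records = resp.json()["records"]   # assumed shape: [{"structure": ..., "energy": ...}]
    return [r for r in records if r["energy"] >= threshold]

# Usage (hypothetical): show where a dopamine-related gene is strongly expressed.
# for rec in strongly_expressed("Drd1", threshold=0.75):
#     print(rec["structure"], rec["energy"])
```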
Again, now you're getting a real visualization of where, and exactly what, is going on here. You can do that in combination with the slicing plane and better isolate -- if you look at it from the front -- exactly where that is on this particular front and back plane, et cetera. All right? So you're really bringing together a good visualization of what you've done, where you've done it, and how that maps.
And finally, because I know I was only allowed a little bit of time here, one of the pieces that people talk about all the time is, okay, the structures are all right, and you can show all of the structures -- let's add in another structure. The hippocampus is a favorite one for learning and memory and Alzheimer's disease. That's the hippocampus, in case you haven't seen it before. There. Brought it up nice and bright. This is the hippocampus back here. You can do all the usual things of clicking on structures, highlighting them, finding out what they are, hovering over them, all that kind of stuff. Anything you can imagine doing with 3D, I think we've tried to throw in here.
But what people really care about is how these structures are connected together. Now, we don't have the actual point-to-point connectivity data loaded in yet, because it's not actually very complete.
What we have done is map out all of the fiber tracts themselves. So anatomically, where are these fibers going when they go from one spot to another? You actually won't find this anywhere else. There are about 100 different fiber pathways loaded up here. Let's bring in this one here; it wraps very nicely right around the hippocampus, because that's the main fiber tract leading out from the hippocampus. Again, you're starting to see all of the white matter. Let me bring them up to high resolution as well.
You see all the white matter as it wraps around the brain structures, and then the fluids, et cetera. And then -- there. People like to look at this as a nice sort of quick little view from the top of the brain, identifying all the cortical regions of the mouse brain.
And then of course we have mouse, rat, and monkey, and one of the things that we've done that nobody else has done is join together the nomenclature for all of those. So if you're looking at one structure in one species, you can hop to the other species and you'll get the same structure. We haven't done that for the human yet -- I mean we, the authors; I have to be careful here. I was a neuroscientist, but that was 30 years ago.
But our authors, who are neuroanatomists, are doing that, and they're joining up the human into that same nomenclature as well, which will be available later.
Finally, we're providing this information as part of an API, working with some of our partners who do MRI research. MRI research -- functional MRI, whatever -- has a lot less resolution, so you can't get down to identifying the thousand structures. You can maybe identify 100 or so, but not the smaller structures, the nucleus-level structures.
So what some of our partners are doing is taking their MRI data and mapping it in 3D to our models. We give them a whole set of models, they match up the major structures, and then you get linear distortions that go in each dimension, right?
So you map it up. Then we have an API: they send in the coordinates and we give them back information. Here's where you are; here's information about where that structure is in the brain. This one I don't happen to have -- I got a bad one here. I'll just randomly pick another one. There. So, randomly pick one.
And then it shows all of the atlas plates in our atlas that contain that structure, for reference, et cetera. Again, we're giving back information for them to use inside their software as well, opening up the data exchange and the overlay.
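A rough sketch of the coordinate-lookup exchange described above: the partner's software sends an atlas-space coordinate and gets back the containing structure plus the atlas plates that show it. The endpoint and JSON fields are placeholders, not the actual Brain Navigator API.

```python
import requests

# Placeholder endpoint; the real API, its authentication, and its response
# schema belong to the product and are not reproduced here.
LOOKUP_URL = "https://api.example-brainnavigator.com/structure-at"

def structure_at(species, x_mm, y_mm, z_mm):
    """Ask the atlas service which named structure contains a coordinate
    (already co-registered into the standardized atlas space)."""
    resp = requests.post(LOOKUP_URL, json={
        "species": species,
        "coordinate_mm": [x_mm, y_mm, z_mm],
    })
    resp.raise_for_status()
    return resp.json()  # assumed shape: {"structure": ..., "abbreviation": ..., "plates": [...]}

# Usage (hypothetical): a partner maps an fMRI voxel into atlas space, then asks:
# info = structure_at("rat", -1.8, 1.5, -3.6)
# print(info["structure"], "appears on atlas plates", info["plates"])
```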
Really quick whirlwind, but --
>> Lee Dirks: Perfect.
>> David Marques: Okay?
>> Lee Dirks: Well, thank you very much. And what we can do is open it up for
questions for Dave or for Rafael. All the way in the back.
>>: Yeah, [inaudible]. Just to clarify, is this primarily for research, what you showed, or is any of this in practice in terms of, specifically, things like linear accelerator, stereotactic radiosurgery, any of the Gamma Knife procedures where you need to take MRI data and have it mapped? Is any of this in practice?
>> David Marques: This project is only neuroanatomy, right? Because that's -- we are linking it back to the neuroanatomical structures. You know, whether the
same things can be done or not, we are working on a paper to publish out exactly
what we've done to create these models and how we map them and so forth.
But honestly, I think we're not the leaders in that kind -- that level of that
technology. As I said, our partners are doing the mapping between the MRI and
our data models. So, you know, again, the same thing can be done elsewhere,
but we're not going in other than basing it on our neuroanatomical stuff.
>>: You have great visualization. I had a question regarding some of the viewing of the gene expression data. As we know, a lot of this data has been accumulated over the last five or ten years. You specifically mentioned that there are 20,000 genes for which the expression can be visualized onto a specific part of the brain.
>> David Marques: So there are 20,000 genes that the Allen Institute for Brain
Science has mapped out specifically for the mouse.
>>: Right.
>> David Marques: Full stop. They are doing more for human now and will, you know -- so we have the API; we'll be able to pull those in as they come out. Many other places -- there are tons of researchers that are mapping out gene expression done in different ways, with different techniques, resolved differently, so they wind up in different structures and have different definitions. I mean, it's really very complex.
And so we're making it available so that we can reach out to those others and
pull those others in as well. It takes time to go through each one of those
collaborations. Most of them don't have the open API that Allen Institute has. So
we just started there.
But you're absolutely right. There's just reams of data, some old, some new,
and, you know, connecting it out and allowing the individual to say I want that
piece or that piece.
We're also going the other way around. We link out -- I didn't show you that, but for any structure you can link out to any of our partner sites, which are mostly university sites that have their own specific research on connections between one structure and another, for example. Computational modeling at USC: we link out to structures that are involved in a computational model of how the structure works in the monkey brain, et cetera.
>>: Yeah, that's fascinating. And you've pretty much preempted my question. But can a user who is using this sort of tool, if he or she has specific gene expression data, send it to one of the places that you mentioned? And how frequently will they --
>> David Marques: As soon as I can make that work. It's a matter of time and
resources, honestly. We're all dying to do that. I would say the thing that's going to come first, before the gene expression, is probably going to be MRI data, because we have people knocking at the door saying, can we send our MRI data into your service and have that service co-register -- that's the big tricky part, right -- co-register our individual animal against your standardized model? Then we can use the API, right?
So that service, I think, is probably where we will put resources sooner rather than later. The gene expression data has so far been a little bit more complex to work with those people -- I don't mean "those people" in a bad sense; I mean the people doing that research. And so it's coming slower. That's all I can say. But we're dying to do all of those things. It's merely a matter of how many bodies.
>> Lee Dirks: Any other questions? Dave, Rafael, thank you very much.
[applause].
>> Lee Dirks: All right. Well, we'll go ahead and move on. I'd like to introduce both Behrooz and Lorrie -- my colleague, Behrooz Chitsaz, from Microsoft Research. The two of them will be teaming up on this presentation around ScienceCinema: multimedia search and retrieval in the sciences.
Behrooz joined Microsoft in 1991. During [inaudible] he was involved in more than a dozen product ships, including the first three versions of Microsoft Exchange and Windows 2000. He joined Microsoft Research in 2002, where he led the program management team responsible for technology transfer from over 800 researchers worldwide to the product and services groups.
In his current role as director of IP strategy for Microsoft Research, Behrooz is responsible for developing and executing on strategies for bringing various Microsoft Research technologies to market.
And I'll also do a quick intro for Lorrie as well, so we can facilitate the transition. Lorrie Johnson is at the U.S. Department of Energy's Office of Scientific and Technical Information. She holds a master of science degree in information sciences from the University of Tennessee, and she completed dual bachelor of science degrees in biochemistry and zoology at North Carolina State University. And I didn't know that. I'm actually a big Carolina fan.
>> Lorrie Johnson: I won't hold that against you.
>> Lee Dirks: Okay. We won't talk any more college basketball, I promise.
I'll hand it over to Behrooz and Lorrie.
>> Lorrie Johnson: Thank you.
>> Behrooz Chitsaz: It's a pleasure to do a dual presentation with Lorrie. We've
been actually collaborating for the past [inaudible].
>> Lorrie Johnson: Two years.
>> Behrooz Chitsaz: [inaudible] specific project. So ScienceCinema -- it's not
on? Is it on? It's not on?
>>: [inaudible].
>> Behrooz Chitsaz: Okay. Is that better? Is that better? I'll just speak louder.
Okay.
So ScienceCinema is a site built in collaboration with Microsoft Research on a project that we call MAVIS internally, the Microsoft Audio Video Indexing Service. And the idea is to allow you to search inside the spoken document.
So essentially, treat spoken documents -- audio and video with speech -- just as you would textual documents. When you think about millions of hours of audio and video being generated on a daily basis, today the only way for us to get access to that is through the textual metadata that surrounds it. And a lot of the time that doesn't really give you the richness of what's inside the actual content.
I think the previous presentation mentioned something like hippo-something. It would be really nice for you to actually be able to search inside audio and video to capture that. Last year I was interested in finding out, for example, what we're doing around volcanoes, or whether we had any presentations about volcanoes, because of the volcano eruptions in Iceland. And I did the search on our videos and I found two talks that were given at Microsoft by people working on sensor networks on volcanoes in Iceland. Which was really interesting -- the kind of thing I wouldn't have found if there wasn't a capability to actually search inside the audio and video.
So, just to put this in context, multimedia is a very rich area in terms of research. Just so you know, GE isn't the only company that's doing face identification and tracking people and tracking objects. We're also doing work in that space. We're doing some really exciting stuff, which I mentioned yesterday in a meeting, around 3D medical imaging, segmentation, and the ability to find anomalies in 3D medical images, which I think is going to have a huge impact. And semantic extraction inside videos. A good analogy is around sports -- that's always a good one. Imagine the computer being able to automatically commentate your favorite sports program, whether it's basketball or soccer or football or whatever it is: for it to be able to understand what a foul is, know what a three-point shot is, know what a strike is. So, the capability to understand the semantics of what's actually happening inside video. Those are some of the things we're working on. Focusing on speech: today it's a huge area, with lots of applications around speech, and we're doing research in many different areas of speech.
Today, though, a lot of the work and applications in speech are around using it as an interface -- using it as an interface to directory services, for example. Using Bing or Google mobile search you can now get directory services, and it works very, very well.
So there's a lot of work done in that space and a lot of applications for accessing services on the back end. Many services today you can access through speech. Also, Windows has actually had speech to text for probably the past 10 years -- for a long time.
Now, what we want to do is take it further: take all the speech content and think of it essentially like textual documents. The ability to search inside it, extract metadata out of it, and, moving forward, create high quality closed captions. And in the future, the ability to create realtime speech-to-speech translation.
We actually demonstrated that at our event last year, where we had somebody speaking German and another person English and had it translate in realtime from one language to the other.
In order to do that, you need more than just being able to understand a single phrase. In these cases you have conversational speech, many different speakers, different accents, different domains they are speaking in. So there are a lot of interesting challenges in this particular space.
In terms of speech recognition, at a very high level, the first process is to analyze the actual audio. There are really two categories of speech recognition. One is phonetic based; the other is large-vocabulary based. In the phonetic case you don't convert the actual audio stream into words, so you're essentially searching inside the actual signals.
In the second case, you have a vocabulary, and that's higher quality -- you get more accuracy because you can apply grammar to it. So at a very high level, you have these acoustic models that take the audio stream, model it statistically, and compare it with phonemes. Then you create words out of those and apply grammar on top of that in order to recognize the words.
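To make that pipeline concrete, here is a toy, heavily simplified sketch: per-frame phoneme scores are combined through a pronunciation lexicon and reweighted by a crude language model to rank word hypotheses. The numbers are invented, and this is an editor's illustration, not MAVIS code.

```python
from math import log

# Toy per-frame acoustic scores: P(phoneme | audio frame). Real systems use
# far richer acoustic models; these numbers are invented for illustration.
frame_scores = [
    {"k": 0.7, "g": 0.3},
    {"r": 0.6, "l": 0.4},
    {"ay": 0.8, "iy": 0.2},
    {"m": 0.9, "n": 0.1},
]

# Pronunciation lexicon: word -> phoneme sequence (one phoneme per frame here).
lexicon = {"crime": ["k", "r", "ay", "m"], "grime": ["g", "r", "ay", "m"]}

# Toy unigram "grammar": how plausible each word is as language, independent of audio.
language_model = {"crime": 0.02, "grime": 0.001}

def word_score(word):
    """Acoustic log-likelihood of the phoneme path plus a language-model bonus."""
    acoustic = sum(log(frame_scores[i].get(ph, 1e-6))
                   for i, ph in enumerate(lexicon[word]))
    return acoustic + log(language_model[word])

hypotheses = sorted(lexicon, key=word_score, reverse=True)
for w in hypotheses:
    print(w, round(word_score(w), 2))  # best-scoring word hypothesis first
```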
Now, speech is very challenging. People have been working on it for 50 years, and it just gets better and better and better. But one of the things we wanted to do was focus on the search side of it. And once you focus on search, there are certain techniques you can apply to improve the search capability.
If you were to just take the speech recognition output, create essentially a transcript, and index that, you would not get a very good result. And the reason is that the accuracy you get out of creating the transcript is somewhere between 50 and 80 percent, depending on the speaker, depending on the domain, depending on a bunch of other things.
So there are a couple of techniques that we use. One is called automatic vocabulary adaptation, which means that you take some of the textual metadata -- the title, the speaker, the abstract, whatever you have -- you do some natural language processing on that, and you extract some keywords. You search on the Web and get some documents related to that particular content. You do more natural language processing on those and extract more keywords. You find out whether those keywords are actually in your vocabulary or not, and if not, you add them to your vocabulary before you actually run the recognition.
And you do that two or three times in order to improve the recognition accuracy.
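A skeletal sketch of that adaptation loop, with the keyword extractor and web search treated as pluggable black boxes; the helper functions are placeholders for whatever NLP and search services are actually used.

```python
def adapt_vocabulary(metadata_text, vocabulary, extract_keywords, web_search, rounds=3):
    """Grow the recognizer's vocabulary from metadata plus related web documents
    before running recognition. Assumes extract_keywords(text) -> set of terms
    and web_search(terms) -> list of document strings as injected services."""
    seed_text = metadata_text
    for _ in range(rounds):
        keywords = extract_keywords(seed_text)
        related_docs = web_search(keywords)
        more_keywords = set()
        for doc in related_docs:
            more_keywords |= extract_keywords(doc)
        new_terms = (keywords | more_keywords) - vocabulary
        vocabulary |= new_terms                 # add out-of-vocabulary terms
        seed_text = " ".join(related_docs)      # next round works from the richer text
    return vocabulary

# Usage sketch:
# vocabulary = adapt_vocabulary(title + " " + abstract, base_vocab,
#                               my_keyword_extractor, my_web_search)
```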
The other thing that we do is keep word alternatives. So if I say something like "Crimean War," the system might think I said "crime in a war," and that might have a higher confidence than "Crimean War."
So in order to improve search, we actually keep all these alternatives. Because the user knows the context of what they're searching for, there's a higher probability of their actually getting what they're looking for.
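A minimal sketch of why keeping alternatives helps search: the index stores every word hypothesis with its confidence, so a term the best-guess transcript got wrong is still findable. The lattice is flattened here to a simple list, which is a simplification of what a real word lattice stores.

```python
# Each recognized segment keeps every word alternative with its confidence,
# not just the single best guess. Times are seconds into the audio (made up).
audio_index = [
    (12.4, [("crime", 0.61), ("crimean", 0.39)]),
    (12.9, [("in", 0.55), ("war", 0.45)]),
    (13.1, [("a", 0.5), ("war", 0.5)]),
    (13.3, [("war", 0.9)]),
]

def search(term):
    """Return the times where the term appears among ANY alternative,
    so 'crimean' is still findable even though it was not the top guess."""
    term = term.lower()
    return [(t, conf) for t, alts in audio_index
            for word, conf in alts if word == term]

print(search("crimean"))  # [(12.4, 0.39)] -- a hit the best-path transcript would miss
```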
The other thing that we do is extraction of keywords, and it's not just extracting keywords out of the actual speech content; we also do this searching on the Web, finding more documents related to the actual speech content and using that to expand the keyword set.
Let's say the document mentions something about Microsoft. Searching on the Web, chances are you're going to get Bill Gates and a bunch of other things -- Silverlight, SharePoint, et cetera, et cetera.
So we can now get more information about that speech content by doing more analysis, by searching the Web and getting more things that are related to that particular speech content. So that's another one of the things that we do. And at the bottom I've got a link to our site.
One thing, though, with all these techniques -- doing the vocabulary adaptation, keeping the word alternatives, doing the actual signal processing -- is that all of this is very compute intensive. So one of the things we've done is integrate it into our Azure-based cloud service. All of that processing, that capability, is integrated in Azure, so the company or organization doesn't have to invest in the infrastructure to do it, and that makes it easier to deploy.
So their interface to that is just an RSS feed. The RSS feed contains links to the content to be processed as well as metadata like the title, abstract, et cetera. That information gets uploaded to Azure. Azure will download the content, do all the processing, and feed back what's called the audio index blob, which contains all the words, the confidence levels, the alternatives, and where they appear in the audio. That is then imported into SQL Server. The right-hand side is essentially what happens in the organization, and any database administrator is very familiar with what happens on the right-hand side, because it's just dealing with normal full-text search, essentially.
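A rough sketch of the kind of RSS item an organization might publish for the indexing service: a link to the media file plus the surrounding metadata. The element set follows ordinary RSS conventions, but the exact fields the service expects are not specified here, so treat this as illustrative.

```python
import xml.etree.ElementTree as ET

def rss_item(title, abstract, media_url, speaker):
    """Build one RSS <item> describing a video to be audio-indexed.
    The element set here is illustrative; the real feed contract is defined
    by the indexing service, not by this sketch."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "description").text = abstract
    ET.SubElement(item, "author").text = speaker
    ET.SubElement(item, "enclosure", url=media_url, type="video/mp4")
    return item

channel = ET.Element("channel")
channel.append(rss_item(
    title="Turning biomass into biofuels",
    abstract="Seminar on biofuel production challenges.",
    media_url="https://media.example.gov/biofuels-seminar.mp4",
    speaker="Example Speaker",
))
print(ET.tostring(channel, encoding="unicode"))
```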
So here I'm going to pass it to Lorrie, my partner here, and she will say more about the Department of Energy as well as give you a demo.
>> Lorrie Johnson: Yes. Can you hear me? Good afternoon. It's a pleasure to be here. Behrooz just described a little bit about the technology. I want to spend just a couple of minutes here talking about why this technology would be so important to a federal agency such as the Department of Energy.
I know most of you probably think of the Department of Energy when you buy a new appliance or a new car or you put a new roof or windows on your home. However, I wanted to point out that the Department of Energy is one of the largest research agencies within the federal government. Here in the US it invests over 10 billion dollars each year in basic science research, clean energy research, renewable energy, energy efficiency, as well as nuclear research, which is actually where we started years ago.
The immediate output from this investment of 10 billion dollars of taxpayer money every year is information, knowledge, R&D results.
So the mission of my organization, which is the Office of Scientific and Technical Information, or OSTI as we call it for short, is to accelerate scientific progress by accelerating access to this information.
We have been doing this at OSTI for a number of years, since the 1940s, for both DOE and its predecessor agencies. We started just after the Manhattan Project, collecting the information that people had produced at that point. Originally, of course, it was all on paper or microfiche, as some of you remember that format as well.
In the 1990s we started transitioning to electronic formats. And today we have a number of specialized websites and databases which are geared towards two goals. One is providing access to the Department of Energy's research information. The other is to enable DOE scientists and other US scientists to gain access to the information they need to do better research here in the US.
We have a few core products that I wanted to mention just briefly. Information Bridge is our full-text report database, and it has full text for over 250,000 documents going back to the early 1990s.
Science Accelerator is what we call a federated search product. It provides access both to the reports within Information Bridge as well as R&D accomplishments, project information, and some other things as well.
And then lastly on this slide we have science.gov, for which we act as the operating agent on behalf of the CENDI group. That one not only contains DOE information but information from 14 federal agencies.
And of course OSTI is also the operating agent for worldwidescience.org on behalf of the WorldWideScience Alliance. For those of you who have been in the meetings the past few days, you've heard about this one already. It provides access now to 400 million pages of scientific and technical information from almost 80 databases representing over 70 countries around the world.
Last June, in partnership with Microsoft Research, we launched a multilingual version of this, which provides translation capabilities for nine languages, giving users the option to enter a query in one of those nine languages, have it bring back results, and then translate the results back into their native language.
All of these products, however, are text based. And as we've been hearing all day today, we've got lots of emerging forms of scientific and technical information: certainly numeric data, multimedia data, social media. I'm not sure if anyone's mentioned Facebook yet today. But those are different forms of information that we all have to deal with. And we do see continued proliferation of multimedia in the sciences.
These forms of multimedia often present special challenges and opportunities, and I'll just briefly talk about a few of these. We have a lack of written transcripts for a lot of these things; there is no full text to search. So for those of us that are used to full-text databases, suddenly we have this multimedia video or whatever that doesn't have a corresponding transcript.
Metadata, if it's available, is often minimal. In many cases you might have the title, the name of the presenter, a date. But for those of us that were trained in library science, there are no thesauri, there are no subject categories. There's sometimes not even a brief abstract or a description.
Another challenge is the scientific, technical, and medical vocabulary that many of us are dealing with. Behrooz gave an excellent example of how the same words in different contexts can mean different things, and certainly we all have words like that in our fields, where depending on the context they have a completely different meaning.
Finally, in the case of videos, these things can be very long. Most of the ones that I'll be talking about here in a second are an hour or more long. For a scientist or a physician who is maybe interested in just one particular experiment or one surgical technique, it would be very difficult to sit there and watch an hour-long video when maybe they're only interested in two minutes of it. So that's yet another challenge that could be a substantial time burden.
So to overcome some of these barriers and challenges, as Behrooz mentioned, we have been partnering with Microsoft Research for about the past two years. We had heard about Behrooz and the MAVIS team through another contact at Microsoft, and this project originally began as an ICSTI Technical Activities Coordinating Committee endeavor.
What we have done specifically is collect video files from our DOE national laboratories and research facilities. We went out to their sites and collected anything that we thought met the criteria of being scientific and technical information. If it was something like a promotional video, we didn't include it.
We created RSS feeds with metadata, and the URLs were sent to Behrooz and the MAVIS team. I've had a couple of people ask me about that process. It was actually very easy; they just told us what we needed, and our folks were able to provide that.
At that point, Behrooz's team took over and performed the audio indexing via MAVIS. Then they sent the audio index blob back to us, and our IT folks were able to integrate it with our SQL servers. So at this point, as far as we're concerned, the product performs exactly the way it would on our other SQL Server-based databases.
So the end result is that the user can now search for a precise term within the video and be directed exactly to the point where that particular word was spoken.
So moving to the next slide. Lee, do you know if this is going to play the sound in a second?
>>: I think it will.
>> Lorrie Johnson: Okay. Okay. So today I'm happy to announce that we are launching the final product out of this two-year collaboration. This is now a public website that we're calling ScienceCinema. It's comprised of about a thousand hours of video from DOE. Of course it does use the MAVIS technology, and it officially launched today, as I mentioned. This does represent a groundbreaking capability among federal agencies in offering public access to audio-indexed video.
So as I mentioned earlier, there are some challenges with multimedia collections, namely that you usually don't have full text to search. But with this technology, as Behrooz mentioned earlier, you can actually conduct a search much as you would on a regular full-text database.
Okay. I'm going to move over here just a second so I can point some things out.
Behrooz helped a lot with this interface. He gave multiple suggestions over the
course of two years. So for a couple of folks that have seen it over this time
period, thank you. As you can see, there are 35 results for the search term
biofuels. You see a few familiar fields here, title, presenters, date.
This is actually a thumbnail of the entire video. So if someone did want to watch
the full thing, I think this one is about 55 minutes long maybe. And it's 342
megabytes for anybody who wants to know. So it is big.
These snippets here are the actual occurrences of the term biofuels. I'm going to
expand that, show more, so you can see all of them. So within this particular
video each time the word biofuels was spoken you have it identified and then
[inaudible] just a little bit.
And then the most recent thing that we've done to this interface was actually add
a timeline to the bottom. So each of these dots represents one of these snippets.
So in this case, you can see there's a cluster of the words -- the word biofuels
here at the beginning, a few more in the middle and then some at the end.
So in some cases, you know, you might see a cluster representing maybe five or
10 minutes of video and decide to just go there and watch the full thing. Let's try
a snippet. There we go.
>>: All of these challenges into turning biomass into biofuels. And right now
we're working both in the Energy Biosciences Institute --
>> Lorrie Johnson: So you heard him say the word biofuels.
>>: [inaudible].
>> Lorrie Johnson: I'm going to play a couple more here.
>>: With biofuels, you heard Jim talk about, we're going to have different crops depending on --
>>: There's a lot of different biofuels and, you know, I mean I hear you speaking, you keep on talking about this alcohol-like --
>> Lorrie Johnson: So the user could basically go through this whole video and
listen to each time the word biofuels is spoken. So that's a big improvement we
think over somebody having to watch a full hour's worth of video and listen that
closely.
We do have -- I do want to do a different search very quickly. Okay. This search
is on the words energy efficiency, so it will do phrases as well. I haven't tried like
any long sentences yet, but just to give you a quick example of a phrase, I'm
going to scroll to the second one here. There we go.
>>: I respect the importance of energy efficiency. [inaudible] because --
>> Lorrie Johnson: Okay. So you could search for, you know, any word, a phrase. I believe it will do Boolean ANDs and ORs as well. I won't do that just in the interest of time. Let me go back to the PowerPoint.
Okay. With the launch of ScienceCinema today, we're already looking towards the future as well. We do expect to receive additional content from our DOE researchers, both in the university and the research laboratory communities. We've actually modified our processes at OSTI to now accept multimedia forms of information. So we do hope that the ScienceCinema product will grow much beyond the thousand hours that we're currently offering.
Second, we do plan to integrate this website into the WorldWideScience project, along with, hopefully, a connection from CERN and a few others. And we'll be showing that at the ICSTI annual conference in Beijing this summer.
One thing that Behrooz and I have just touched upon and need to come back to is the creation of high quality automatic closed captioning. That is very important, especially within US government agencies, that you offer some kind of closed captioning for a lot of these videos.
And then finally, he mentioned the ability to do multilingual translation capabilities at some point on videos. We would hope that that would be a possibility in the future as well.
So finally I want to extend a personal thanks as well as one on behalf of the
Department of Energy to Behrooz, Lee, Tony, the MAVIS team and to Microsoft
Research for partnering with us in this endeavor. It's certainly been a pleasure to
work with Behrooz. And we look forward to continued collaboration.
>> Behrooz Chitsaz: Vice versa. Same here.
>> Lorrie Johnson: Thank you.
[applause].
>> Lee Dirks: Are there any questions? Yes, there is.
>>: Well, it's transformational, that's for sure.
>> Lorrie Johnson: Thank you.
>>: I noticed when you had the energy efficiency sometimes the snippets were
just the words energy efficiency, which aren't particularly helpful, and other times
it had some context around it. And it seems to me if I were looking at -- if I were
searching and using this, I would be using the transcript -- I don't know if I would
even want to listen to it if I'm going to just hear that word at that point. But if
there's enough context around that word, it's the transcript with just the most
valuable part. I almost don't need the -- I mean, at that point I might want to
listen to the whole thing. But if I know there's enough context around it, I'll know
oh, no, this is about something else, I don't want to go after that.
So what explains the differences sometimes?
>> Behrooz Chitsaz: So, yeah, there's a couple of things. One is around the
context. That is -- that's designed by the algorithm so you can actually specify I
want three seconds of context, go back, you know, four or five minutes or
whatever. You can certainly do that.
>>: [inaudible] specified.
>> Behrooz Chitsaz: Well, it's -- it can be specified for the corporation basically.
So we can configure that.
>>: [inaudible] the user.
>> Behrooz Chitsaz: No, not either user.
>> Lorrie Johnson: Right.
>> Behrooz Chitsaz: Yeah. So it can be configured for that particular -- it can
be, in fact, configured for the user as well if we wish to do that. So the
capability's there to do that is what I'm trying to say.
What is -- but there's a difference between an actual transcript and what you're
seeing here. Because what you're seeing here is essentially searching this
lattice which does have all the word alternatives. In order to generate a full
transcript, what you need to do is essentially walk the highest confidence path
which may not be the correct path to walk. Like the example I gave around
Crimean War versus crime in a war.
So my transcript would probably say crime in a war, while I actually said Crimean
War. So that is the difference between search and an actual transcript. But in
the future, there's ways of improving the actual accuracy, and those are things
[inaudible] doing, personalized vocabulary or personalized acoustic models for
the particular person.
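To illustrate the lattice-versus-transcript distinction described above, here is a toy sketch; the edge format, time stamps and confidence values are invented and are not the MAVIS data structures. The one-best path reads "crime in a war", yet a search over the lattice still finds the lower-confidence alternative "crimean war" spanning the same audio.

```python
# Sketch only: a word lattice kept as time-stamped edges with alternatives.
from collections import defaultdict

# (start_sec, end_sec, word, confidence) -- hypothetical values
EDGES = [
    (10.0, 10.5, "crime",   0.61),
    (10.5, 10.7, "in",      0.58),
    (10.7, 10.8, "a",       0.55),
    (10.0, 10.8, "crimean", 0.52),   # alternative hypothesis over the same span
    (10.8, 11.3, "war",     0.90),
]

def one_best_transcript(edges):
    """Greedy walk of the highest-confidence edges left to right (roughly how a transcript is made)."""
    words, t = [], 0.0
    for start, end, word, conf in sorted(edges, key=lambda e: (e[0], -e[3])):
        if start >= t:
            words.append(word)
            t = end
    return " ".join(words)

def lattice_search(edges, query):
    """Return start times where the query words appear as consecutive lattice edges."""
    terms = query.lower().split()
    by_start = defaultdict(list)
    for e in edges:
        by_start[e[0]].append(e)
    hits = []
    for start, end, word, conf in edges:
        if word != terms[0]:
            continue
        t, matched = end, True
        for term in terms[1:]:
            nxt = [e for e in by_start[t] if e[2] == term]
            if not nxt:
                matched = False
                break
            t = nxt[0][1]
        if matched:
            hits.append(start)
    return hits

print(one_best_transcript(EDGES))            # -> "crime in a war"
print(lattice_search(EDGES, "crimean war"))  # -> [10.0], found via the alternative
```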
There are a lot of techniques that can be used in order to actually improve that.
But speech recognition in the general case is still a very, very challenging job.
>>: So when I see just a couple words there, does that mean that it's spoken
slowly?
>> Behrooz Chitsaz: Yes.
>> Lorrie Johnson: Probably.
>> Behrooz Chitsaz: So that's essentially what it is.
>>: The timeframe --
>> Behrooz Chitsaz: Exactly.
>>: [inaudible] catch that.
>> Behrooz Chitsaz: That's exactly right. So it's more related to time is what
you're saying. Exactly.
>>: But I would presume that you could go backwards.
>> Behrooz Chitsaz: Yes.
>>: Grab the arrow backwards [inaudible] get your context --
>> Behrooz Chitsaz: Good point.
>> Lorrie Johnson: Uh-huh.
>> Behrooz Chitsaz: Absolutely.
>> Lorrie Johnson: Uh-huh.
>>: [inaudible].
>> Lorrie Johnson: Right.
>> Behrooz Chitsaz: Thank you for that. That's very important.
>>: What he's saying then is he doesn't want to have to listen to it [inaudible].
>> Behrooz Chitsaz: If he can read it [inaudible].
>>: Without slight wait [inaudible].
>> Behrooz Chitsaz: Yes. Yes. Absolutely. I just remembered now
what the previous presenter mentioned. It was hypothymia, right? Hypothymia?
Who was the previous presenter?
>>: Hippocampus.
>> Behrooz Chitsaz: Hippocampus. Okay.
>> Lorrie Johnson: Yes.
>> Behrooz Chitsaz: So that wasn't a word that was part of the title of the talk or
part of the transcript or part of the description of the talk. However, the
visualization that they showed in that presentation was a beautiful visualization.
So it would be great for me to be able to, later on -- if I was in the medical field
and I wanted to search and find out did anybody in Microsoft ever mention this --
get that particular presentation. And yet today I won't be able to do that
because -- well, actually in Microsoft you can, because we're indexing our
content. But in many other places you won't be able to do that, because you're
searching the text, and what's actually spoken typically contains much more
content than what's actually in the surrounding text.
>>: What about other languages? I would love to see those for let's say
German.
>> Behrooz Chitsaz: Yes. It's actually interesting you mention German, because
the lead researcher is, in fact, originally from Germany. So he's actually done
both languages, and the architecture supports multiple different languages.
In order to introduce a new language you need some training data, close to I
would say starting off about 300 hours to 500 hours of training data in order to
train the system, then add the vocabulary and things like that. So it is -- it is
doable in other languages. Right now we're sort of focused on the English
language, sort of understanding what it means to improve the accuracy, and then
moving to other languages.
>>: Is it part of the [inaudible].
>> Behrooz Chitsaz: Yes. It's definitely -- it's definitely a part of something that
we're thinking about doing for sure, creating -- even creating tools to make it
easier to introduce other languages, absolutely. And I mention -- it's interesting
you mention German. Because we actually did the translation, the realtime
translation was, in fact, between English and German.
>>: So if you wanted to perhaps do a project with the National German Science
Library, that's who you would want to talk to. [laughter].
>> Behrooz Chitsaz: Okay. Great.
>> Lorrie Johnson: I've got her card.
>>: Good advertising.
>>: So I'm curious. I may have missed this. But can you say a little about how
you handled disambiguation and homonyms and other types of -nyms when
people are using, you know, sort of keyword-level searching?
>> Behrooz Chitsaz: So that can -- that can actually be -- there is a level of sort
of the text search, so there's a lot of techniques in order to do that as part of text.
One of the things we can do is in fact integrate with the Bing logs and the search
alternatives and what people have done. So there's the speech recognition and
then there's this search. And we can sort of improve that. We haven't done that
yet. But we can certainly improve that by adding the same techniques that we do
on the Web. So it's essentially taking whatever Bing has done and sort of using
that in order to improve the search experience.
>> Lee Dirks: This will be the last question.
>> Behrooz Chitsaz: Yes.
>> Lorrie Johnson: Yes.
>>: I was curious with the ScienceCinema, when you have Department of
Energy researchers adding new video, is that something they can do dynamically
and automatically, or is that something where you send the video back to
Microsoft again and then add it to the website?
>> Lorrie Johnson: We have a process within our office at the Department of
Energy where the researchers actually submit files, whether they're text files
or, in this case now, metadata and video files. It would then go into our regular
system, which contains, you know, lots of formats. Then if it was identified as a
video, we would flag that as a candidate for ScienceCinema and then at that
point we would probably collect a number of videos before we would then send
them to Microsoft to be audio indexed. So it wouldn't be, you know, a real
instantaneous process. I mean that would be great at some point. But, you
know, it's easier I think to index a number of hours rather than one video at a
time.
>> Behrooz Chitsaz: The interface, just so you know what's happening on the
Microsoft side, what we do is we essentially take that RSS feed that has a link to
the content, we read that content in order to index it in Azure, and then it's
deleted. So there's no -- there's no trace of that in Microsoft. So it just, you
know, that's basically -- we just -- we need to read it in order to be able to index
it. So that's the only thing that we essentially do with the content.
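As a rough sketch of that ingest flow -- the feed layout and the index_audio callback standing in for the audio-indexing service are assumptions, not the actual OSTI or Microsoft interfaces -- the pattern is: read the feed, fetch each linked file just long enough to index it, then delete the local copy so no trace remains.

```python
# Sketch only: read an RSS feed, fetch each linked media file, index it, delete it.
import os
import tempfile
import urllib.request
import xml.etree.ElementTree as ET

def ingest(feed_url, index_audio):
    """index_audio(path, metadata) is a placeholder callback for the indexing service."""
    with urllib.request.urlopen(feed_url) as resp:
        feed = ET.parse(resp)
    for item in feed.iter("item"):
        link = item.findtext("link")
        if not link:
            continue
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            urllib.request.urlretrieve(link, path)                    # read the content...
            index_audio(path, metadata={"title": item.findtext("title")})
        finally:
            os.remove(path)                                           # ...then keep no copy of it
```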
>> Lee Dirks: Well, thank you very much.
>> Behrooz Chitsaz: Thank you.
>> Lorrie Johnson: Thank you.
>> Lee Dirks: Please join me in [inaudible].
[applause].
>> Lee Dirks: I'll let Sebastian get set up and I'll do his introduction.
Dr. Sebastian Stueker is the leader of the research group Multilingual Speech
Recognition at Karlsruhe Institute of Technology. His work currently focuses on
automatic speech recognition for simultaneous and offline speech translation
systems.
He received his doctoral degree from the University of Karlsruhe in 2009, on the
topic of acoustic modeling for under-resourced languages.
In 2003, he received his diploma degree in informatics from Universitat
Karlsruhe. His diploma thesis being on the topic of multilingual articulatory
features and their integration into speech recognition systems.
He has extensive experience in speech translation and has been working in the
field for more than 10 years. He is known for his work in multilingual speech
recognition, particularly on strategies for cases where data is insufficient or costly
to obtain, and on large-scale integrated speech translators.
And we will hand it over to you to tell us more about Speech Processing
Applications in Quaero.
>> Sebastian Stueker: Thank you very much for this kind introduction. Yeah, so
what I will do is probably complement a little bit the previous talk. And since I'm
coming from a university background I will also give you a little bit a glimpse into
the future what Microsoft might be doing in the next couple of years.
The title of the talk is Speech Processing Applications in Quaero, so the first
question I should be answering probably is what is the Quaero project? Well,
first of all it's not a project. Quaero is actually a program. And it's a French
program. It started out as a French-German program but due to some political
difficulties, in the end it turned out to be simply a French program with German
participation. That's the official language.
German participation being the Karlsruhe Institute of Technology, formerly known
as the Universitat Karlsruhe, and the RWTH Aachen University. And then our
French partners, and being funded by the French state.
And speech technologies are one of the very important parts in this Quaero
program. Quaero itself is all about addressing multimedia content, making
multimedia content searchable so that you can search through large databases
of video, audio, and text. And in order to do that, as pointed out in the earlier
talk, it is helpful to be able to deal with the speech that is happening in the
multimedia content.
Another important feature is to be able to deal with the vision part of the
multimedia content and other parts of the Quaero program are actually dealing
with it.
But in this talk, we will concentrate on the speech part of it, and especially I will
concentrate on a speech translation part of it. And when it comes to this
conference or workshop today, one of the questions we should answer is how
can technology actually advance science?
And when I thought about that, if you think about it, one of the important areas
where it can have an impact is actually academic lectures and talks. A lot of
scientific knowledge is disseminated in the form of lectures today, at these
workshops, obviously. And these lectures are now regularly recorded, as today at
this workshop or as in lecture halls at universities all over the world. And in order
to be able to access this scientific content, it is necessary to be able to search
through that.
And then of course besides being able to find the correct scientific content, we
also have to bridge a language barrier. English is not the only language in the
world, English is not the only language chosen to give academic talks in the
world, and most of us will only be able to speak a very limited number of languages. So
there's a lot of scientific content potentially out there that we cannot access
because of the language barrier.
So I will also hint and show how speech technology in the form of speech
translation technology can actually help to bridge this language barrier that we
have.
Another part, besides finding the content -- but I will not talk about that in much
detail -- is, as already said, this huge wave, this tsunami of information being
thrown at us in science.
So in the previous talk you could see how you could simply jump to the part in the
video where a certain keyword is mentioned, so you don't have to watch the whole
video. Another important or interesting part that we've worked on in the past --
but which is currently not my specialty -- is how we can summarize content
automatically, so that you can sort of condense the content of a lecture to the
most important parts and reduce the time that you need to take in that lecture.
So the Quaero program. It's a five year program. It started officially in 2008. It
had a sort of longer preparatory period where all the political issues were sorted
out. And it started out with France and Germany realizing that France and
Germany are not spending enough money when it comes to funding research.
The United States traditionally has been funding research from the government
side with large amounts over a very long time, and in Europe that sort of has
gotten out of fashion.
Japan, for example, suddenly realized that that was the same in Japan, and so
they started to hike up the public funding for research again. And then Europe
realized, hum, maybe we should do something about that as well. So Chancellor
Schroeder and Jacques Chirac, the French president back then, got together and
decided we need to do something about it and started this -- an agency that
was supposed to fund innovative projects. And one of the first programs to be
funded was Quaero. And it's a pretty large project. It has a budget of about
200 million Euros over those five years. And the French state is funding 100
million Euros for that budget.
And as I said, it's a program not a project. Program because it's made up of
multiple projects. And there are two types of projects in this program. One type is
application projects, where industry is heavily involved and where there are
certain projects -- certain applications -- that they want to research and develop
and where they get funding from. And in order to have really innovative new
projects, there are two technology projects.
And the role of the technology projects is to advance the state of the art in all
different kinds of technologies, such as speech recognition, machine translation,
image processing, retrieval technologies, you name it, whatever you need. It's a
real wide spectrum. I wouldn't be able to enumerate all the technologies.
But these technologies are forwarded in the technology projects and then
transferred into the application projects in order to advance those applications.
All the application projects deal with multimedia content in certain ways, and
they have this nice arrow where they sort of have a spectrum. On the one side
you have content providers and the other side you have the end user. So you
want to have tools for content providers in order to organize the media
databases.
Let's say you are a news station with a large archive of broadcast news that you
have produced over the years. You want to be able to search through that
database and to pull up clips from the past, et cetera, so you need tools for
that.
On the other side of the spectrum, you have a user with a mobile device who just
wants to watch a personalized video, his personalized TV program. So he has
sort of the other side of the multimedia content view. And the program wants to
provide technology that actually makes that possible.
So when it comes to speech applications and projects in Quaero, these are
basically the technologies being researched in the basic research project. At the
top there is automatic speech recognition. And the reason is that automatic
speech recognition is still a very challenging problem.
So projects for speech recognition have been out there for a very long time, and it
has been researched for decades and decades, and it's still far from being a
solved problem.
In the Quaero project, we're currently addressing seven languages, the core
languages being English, French and German obviously, since it is a German --
a French program with German participation, and English being one of the major
languages, especially when it comes to communicating with people with
different languages.
And there's a growing group of other languages that seem to be interesting,
especially when it comes to the properties of the languages: you want to select
languages that are somehow different from the main languages such as English.
So currently it's Russian, Spanish, Greek, and Polish. And there are going to be
two more languages over the course of the project.
Four partners are actually involved in doing the automatic speech recognition.
That's the KIT, then LIMSI, part of the CNRS in France, the RWTH in Aachen, and
Vecsys Research, now being called Vocapia. I think they just formed a new
company or renamed themselves. They also do speech recognition on a
commercial basis.
Then another technology that you need and that's being researched is speaker
diarization. Speaker diarization has two facets. One is you have a huge chunk
of audio and you just want to cluster the audio segments into groups that are
homogeneous, with one speaker each. So you have anonymous speaker IDs and
you want to know which speaker spoke when.
And then there is what is called political speaker tracking. Doesn't necessarily
have to be political, they just chose political as an application where you have
speakers that are known beforehand, you know you want to look for speech
coming from that known speaker, you know his name, for example, politicians,
and then you search the whole database and try to find all the clips where that
particular person has been speaking.
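As a rough illustration of those two facets, here is a simplified sketch; the segment embeddings, the cosine threshold and the greedy clustering are stand-ins for the real methods used in Quaero, not a description of them.

```python
# Sketch only: (1) cluster unlabeled segments into anonymous speakers,
#              (2) track a known speaker against an enrolled voiceprint.
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def diarize(segment_embeddings, threshold=0.8):
    """Facet 1: greedy clustering -- each segment joins the first existing speaker it resembles."""
    speakers, labels = [], []          # one reference embedding per anonymous speaker
    for emb in segment_embeddings:
        for spk_id, ref in enumerate(speakers):
            if cosine(emb, ref) >= threshold:
                labels.append(spk_id)
                break
        else:
            speakers.append(emb)
            labels.append(len(speakers) - 1)
    return labels                      # e.g. [0, 0, 1] -> "who spoke when"

def track_known_speaker(segment_embeddings, enrolled_embedding, threshold=0.8):
    """Facet 2: speaker tracking -- segments that match a speaker enrolled beforehand."""
    return [i for i, emb in enumerate(segment_embeddings)
            if cosine(emb, enrolled_embedding) >= threshold]

# toy usage with made-up 2-D "embeddings"
segs = [np.array([1.0, 0.1]), np.array([0.9, 0.2]), np.array([0.1, 1.0])]
print(diarize(segs))                                     # -> [0, 0, 1]
print(track_known_speaker(segs, np.array([0.0, 1.0])))   # -> [2]
```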
Then language recognition. So you try to decide, for an audio file, which language
it actually is in. It's more complicated than it first seems. The reason being
is humans are so good at it. So it's hard to imagine why a machine should
actually have trouble with it, but it does -- it needs some work.
Then emotion recognition, dialog and vocal interfaces have been part of the work
package, but actually have not been worked on that intensively because
they haven't been asked for by the application project providers.
So the question is why they didn't want it. And first everybody thought it might be
interesting but then it sort of turned out nobody really had an idea how to use it in
the application. So it faded out a little bit.
And then of course we have machine translation, and then when you combine
machine translation and automatic speech recognition you get this new field of
speech-to-speech translation. And in order to measure progress and in order to
really drive people to achieve progress, the project is using something which is
called coopetition. This is an artificial word consisting of cooperation and
competition. So what we do is we have yearly evaluations where you have a
common test site, you evaluate your technologies. We partner one [inaudible]
technology on that common test set and in the end you measure who was the
best, who had the best performance. And then you exchange what you did in
order to be so good. And what were the real tricks, which techniques gave you
improvements. And that is the cooperation part.
So you then later exchange, and next year everybody knows the tricks from
everybody else, and then you have to come up with new tricks in order to be the
best.
Okay. Automatic speech recognition. The speaker before already talked a little
bit about automatic speech recognition, gave a little bit of an overview of how it works. I
will not do that because actually it's too complicated in order to talk about it in two
minutes.
When it comes to applications for automatic speech recognition, we have done a lot
of work on interaction with machines. So if you want to direct humanoid robots,
you want to control computers, appliances, but also human-to-human
interaction. For example, speech translation needs automatic speech
recognition as a part if you want to have people with different -- speaking
different languages interact.
And we've been working in that field now for over 15 years, almost 20 years.
And also we've demonstrated live translation scenarios, among different
languages. And then a different field is if you want to observe the human user
and you want to predict his or her needs.
For example, you've seen this smart room with the -- from GE with the video.
We've had projects where we did something similar. We also had a video
observing the users in the smart room, their actions. But we were also listening
in to them. We were performing speech recognition in order to predict their
needs in addition to observing them visually.
Speech recognition is difficult. And why is it difficult? Well, I was actually once
questioned by a researcher. She asked me, oh, speech recognition seems similar.
I'm working on medical images. It's much harder, it's multidimensional. And
speech recognition is a one-dimensional signal, it should be easy.
Well, actually it's not. Now, the reason why it is not easy is speech is very
variable. When you look at it as a recognition problem and that is the way you
look at it in science, you have the problem that the pattern -- every time
somebody speaks the same thing, the pattern that you record with your
microphone looks different. Even if you have the same person saying the exact
same thing under the exact same conditions, the actual physical recording that
you will make looks different.
And dealing with this variability is very challenging, and that has been tackled using
statistical methods. In real life it is really difficult because now you have different
microphones, you have different recording distances. The environment is
different. You have all sorts of -- sources of noises. People are talking and are
usually cross-talking. So you have several people talking at the same time and
you have to sort of concentrate on one speaker only. And so on and so on.
Speakers are in different emotional states, have different accents, have different
kinds of voices, et cetera.
Humans are good at automatically adapting to that. For machines it's actually
difficult. And as you've heard, speech recognition systems today automatically
learn models from large amounts of annotated corpora. And you need to collect and
annotate these corpora, which is very expensive. So that makes it easier for big
companies because they have a lot of money they can spend on that. But that's
why we've been working also on methods and trying to sort of reduce the
dependency on these large amounts of data.
And then you have the field of machine translation. And machine translation has
also been researched for quite a long time. And if you talk to machine
translation researchers, there are sort of two different kinds of approaches.
The first approach that people used was to do a rule based approach. So what
you basically try to do is you took a sentence, you tried to extract the semantics
into a representation that is independent of the language and then you try to
generate a sentence in the new language that had exactly the same semantic
information.
Works pretty well if you have written the right rules. Because all the rules had to
be written manually. So it's a lot of work. And there are actually companies out
there that have been working in that field for 20, 30 years, and they've started
now to have finally enough rules in order to tackle a language in a very large -- in
a larger domain.
But then IBM pioneered a new technique in the field of statistical machine
translation, where instead of humans writing large amounts of rules by hand, which
is expensive in time and money, you just had machines learn
automatically from parallel corpora. So the machine was learning by itself by
simply looking at examples and constructing statistical models that are very
similar to the ones that you use for automatic speech recognition.
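To illustrate that statistical idea, here is a toy IBM Model 1 style sketch: word translation probabilities are learned purely from a few sentence-aligned example pairs with expectation maximization, with no hand-written rules. The tiny corpus and the number of iterations are illustrative only.

```python
# Sketch only: learn t(e | f) from parallel sentences via EM (IBM Model 1 style).
from collections import defaultdict

corpus = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]

src_vocab = {f for src, _ in corpus for f in src.split()}
tgt_vocab = {e for _, tgt in corpus for e in tgt.split()}
t = {(e, f): 1.0 / len(tgt_vocab) for e in tgt_vocab for f in src_vocab}  # uniform start

for _ in range(10):                                  # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for src, tgt in corpus:
        fs, es = src.split(), tgt.split()
        for e in es:
            norm = sum(t[(e, f)] for f in fs)        # E-step: expected alignment counts
            for f in fs:
                c = t[(e, f)] / norm
                count[(e, f)] += c
                total[f] += c
    for (e, f), c in count.items():                  # M-step: re-estimate t(e | f)
        t[(e, f)] = c / total[f]

print(round(t[("house", "haus")], 2))                # rises toward 1.0 over the iterations
print(round(t[("book", "buch")], 2))
```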
So machine translation is also difficult, but for other reasons. Not as much variability,
but things like word order. So different languages have different word orders. So
the machine has to be able to reorder the words, and that can be quite difficult.
For example, if you translate from English into German, the Germans have this
very unfortunate habit that they sometimes tear words apart. So it could
actually happen that what is one word in English needs to be torn
apart into two pieces in German, and they go in two opposite directions
in the sentence. And then it's really hard to tackle.
Also word fertilities. So one word in one language might be multiple words in
another language. Or multiple words in one language might be one word in the
other language.
And then you have ambiguities that you need to resolve. So here comes the
question from semantics that we had in the previous talk. So if you look at the
English word bank, it could be either the financial institution, it could be the edge
of a river.
In German the exact same word spelled exactly the same way could refer to a
financial institution or could actually be a bench. So these are things you have to
deal with. You have to -- in order to be able to translate correctly.
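As a toy illustration of that kind of ambiguity -- the sense inventories and context words below are invented for the example, and real systems use statistical context models rather than hand-made lists -- one can pick the reading whose typical context words overlap most with the sentence.

```python
# Sketch only: resolve the German "Bank" ambiguity by context-word overlap.
SENSES_DE_BANK = {
    "Bank as financial institution": {"geld", "konto", "kredit", "zinsen"},
    "Bank as bench":                 {"park", "sitzen", "holz", "garten"},
}

def disambiguate(sentence_words, senses):
    """Pick the sense whose typical context words overlap most with the sentence."""
    words = {w.lower() for w in sentence_words}
    return max(senses, key=lambda sense: len(senses[sense] & words))

print(disambiguate("ich sitze auf der bank im park".split(), SENSES_DE_BANK))
# -> "Bank as bench" (because of the context word "park")
```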
Okay. And that I will skip for now. Maybe I will get to that later.
So if you now combine automatic speech recognition and machine translation
you get speech translation.
It is very disappointing to see, but it's a fact nowadays speech translation is still a
simple concatenation of automatic speech recognition and machine translation.
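As a bare sketch of that concatenation, the following shows the control flow only; asr() and mt() are placeholders standing in for whatever recognizer and translator are plugged in, not any particular system's API. Note how any recognition error propagates straight into the translation.

```python
# Sketch only: speech translation as ASR followed by MT.

def asr(audio):
    """Placeholder recognizer: return the single best hypothesis for the audio."""
    return "wir arbeiten an der simultanen uebersetzung von vorlesungen"

def mt(text, src="de", tgt="en"):
    """Placeholder translator: return a translation of the recognized text."""
    return "we are working on the simultaneous translation of lectures"

def speech_translation(audio):
    # any error made by asr() is passed on unchanged to mt()
    return mt(asr(audio))

print(speech_translation(b"<audio bytes>"))
```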
In the previous talk you heard about how using these word lattices, which are
nothing else but a compact representation of different alternative recognized
sentences from the automatic speech recognition, instead of the single best
output can improve the search when you are looking for keywords.
People have been trying to do that for speech translation as well. Unfortunately,
they didn't have very much success. They didn't get as much
improvement as they were hoping for. When we talk about speech recognition
and speech translation, we usually distinguish between two different scenarios. I
call that the offline versus the online scenario.
The offline scenario is what you've seen in the last talk. We have a large
database of videos and you want to translate them or you want to search them.
So you can take your time. You're not in a real hurry to transcribe the video or to
translate the video. Also, you've got the whole material that you want to translate
in one big chunk. And that gives you, from a technical point of view, several
advantages. You can do very nifty things: you can go through the whole material
and take the whole available knowledge, the whole information that is in that
one recording, to segment the recording into good sentences, to identify different
speakers in the recordings, to adapt yourself to these different speakers in an
unsupervised manner by first doing a first recognition and then using that
recognition to adapt to the different speakers and then improve your
recognition.
So you can do several passes and you can take time. You can go burn a lot of
CPU time in order to achieve as good a result as humanly possible or as
machinely possible, I should say.
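As a sketch of that multi-pass control flow -- every function here is a dummy placeholder rather than a real recognizer component -- a first pass decodes everything, the hypotheses are grouped by speaker to adapt the models without supervision, and a second pass re-decodes with the adapted models.

```python
# Sketch only: offline two-pass recognition with unsupervised speaker adaptation.

def segment(recording):                 # dummy segmentation into sentence-like chunks
    return recording.split(". ")

def diarize(chunks):                    # dummy diarization: alternate two anonymous speakers
    return {i: i % 2 for i in range(len(chunks))}

def decode(chunk, model=None):          # dummy "recognition"; a real system uses the (adapted) model
    return chunk if model is None else f"{chunk}  [second pass, {model}]"

def adapt(first_pass_hypotheses, speaker):
    # dummy unsupervised adaptation: pretend we built a speaker-specific model
    return f"model adapted to speaker {speaker} on {len(first_pass_hypotheses)} hypotheses"

def offline_transcribe(recording):
    chunks = segment(recording)
    speaker_of = diarize(chunks)
    first_pass = {i: decode(c) for i, c in enumerate(chunks)}              # pass 1
    models = {spk: adapt([h for i, h in first_pass.items() if speaker_of[i] == spk], spk)
              for spk in set(speaker_of.values())}                         # adapt per speaker
    return [decode(c, model=models[speaker_of[i]])                         # pass 2
            for i, c in enumerate(chunks)]

for line in offline_transcribe("hello there. nice to meet you. let us begin"):
    print(line)
```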
So the only thing that actually limits the amount of time you are allowed to spend
is until when does it need to be done? For example some companies operate in
a way that they get the data and they provide the result overnight, or at what pace
is additional data added to your database? For example, if you look at YouTube,
there are insanely large amounts of data added every hour, every minute. So I've
got a timer. It says 19 minutes 11 seconds since I started speaking.
>> Lee Dirks: [inaudible] leave some time for questions. [laughter].
>> Sebastian Stueker: Okay. Okay. So the other case is the online,
simultaneous translation. So you have to keep up with the translation, you have
to keep up with the speech recognition which is very time consuming. You have
to go through a lot of tricks in order to be able to really process your data in a fast
way.
So here comes another science part. Academic lectures. Academic lectures
happen to be online and offline. They happen live in the lecture halls at the
university, and they're being recorded and put into large databases. Here are
some examples of large databases. For example, MIT started to record all their
lectures in the course of OpenCourseWare. Carnegie Mellon University
has the Open Learning Initiative, where they also collect all the data.
And now the latest thing, which seems to be very successful, is the iTunes
U, the university service, which has collected large amounts of data and also
videos of lectures from more than 12 countries. So that's where the translation
part now comes into things.
Besides teaching at universities, academic lectures at conferences, et cetera, are
also now routinely recorded. And one thing, even though we're short on time, I
want to show it. In the talk of the speaker from [inaudible] -- Tim, where are you?
Not here anymore? -- he talked about how he has different multimedia things
and how to visualize the research, et cetera, and how to make things available to
the public. There's something called the TED lectures: technology, entertainment,
design. They have sort of targeted the general public. And they talk about
technological things. And so what he forgot to mention, I think, is basic public
talks in order to bring the [inaudible]. So this is for example one of the TED
lectures, actually on the large [inaudible], trying in a sort of entertaining but still
scientifically sound way to introduce the collider to the general public.
So in order to be able to advance science we can use speech technology to work
with these lectures. We can try to find the relevant information by using
information retrieval. We can try to summarize the content. There are
techniques for that. They are not very advanced yet. It's still a research topic.
But it's something we should look into in the future.
And then of course we need to overcome the language barrier. Currently what
people are doing is they're using broken English at the conferences.
And usually English is -- everybody thinks everybody speaks English, especially
in academia. Actually that's not true. English is not as widely spoken as you
think; even in countries -- areas like the European Union, it's less than half of the
average people who are actually able to communicate fluently in English.
And you should never forget that we are often in a position where we don't have
as much difficulty learning English as other people, because our languages are
very close to English.
Imagine that some cultural revolution takes place and, due to demographics,
tomorrow Mandarin will be the lingua franca, and put yourself in the position of now
having to give a talk in Mandarin at an international conference. And then you
will probably feel what pain they are going through in learning English. So I won't
talk too much about how important it is. Believe me, it is important that we keep
up the language diversity in the world, that not everybody speaks English,
because speaking different languages brings along whole different ways
of thinking.
The language that you speak heavily influences the way that you think. If we get
rid of all languages and leave only one language, or have only one language left, we will
actually get rid of a large diversity of thinking. It will be a large loss to us and to
how science advances.
Okay. The last five minutes that I have now I would like to show you some
things. So Microsoft was hinting that in the future they want to
do multilingual translations. So let me give you an example of what that might
look like when Microsoft is going to do it.
So this is something that came out of a European project that was finished in
2007. So what the project did back then is they recorded or they used the
existing recordings of European parliament lectures and started to translate
them. So this is one of the examples that --
>>: Ladies and gentlemen, I'm delighted to be here --
>> Sebastian Stueker: So this is the then British secretary of state speaking at
the European parliament. And there we have our translation into Spanish. So we
did different translation directions. [inaudible] we also have a German translation
in there.
We've also been working with other data. So this is actually now an offline case
where you have a database [inaudible] processing. We also tried something:
what happens if we apply that through a [inaudible] to a different scenario. So
this is a system actually running in realtime. It's a recording, but there we
[inaudible] take this time of taking a long time translating it. So this is now
from English into -- from German into English, which is a very challenging speech
[inaudible].
Also, what we've done in the past: in 2005 we actually demonstrated, or we
actually did show, a first version of our simultaneous lecture translation system. So
currently we are doing -- personally I'm doing a lot of work on advancing the
simultaneous lecture translation system that we developed, which simultaneously
would translate, or can translate, lectures, in our case from English into Spanish as
shown here.
So this little video shows how that system works. You have a speaker that is
being recorded. It's transferred to a [inaudible] that works on a normal PC, and
then it translates, and then the translation is brought to the audience. And we
have some different kinds of modalities for how to bring that. We have these
[inaudible] devices. We've gotten some prototypes from [inaudible]. What they
do is they produce a very narrow beam of audio. So the idea is that you have
different areas in the audience, the Spanish speaking, the French speaking and
then you have different areas there.
The other things that we're working with are subtitles. So we have these
[inaudible] where you have subtitles displayed to you personally. We're working
with subtitles projected onto screens. In the background you can see some people
wearing these goggles, and we're also working with those [inaudible] in order to
[inaudible]. This is what the audio device sounds like. That's what it looks
like. So it produces this narrow beam of audio.
And then another thing that we've done -- skip that. Another thing that we've
done is -- you heard, you've seen -- it is very time consuming and very, very
intensive to do this speech recognition and translation. You need a lot of
computation power. So we decided to put it all on an iPhone. Actually back then
when we started the research, it wasn't an iPhone yet so it was first a compact
handheld. And most of the translation systems that you see nowadays do need
an Internet connection. They record the audio then send it to a server, have it
processed, translated. You send back the result.
It's sort of unfortunate if you are in a foreign country, the roaming costs will kill
you. [laughter]. It's good for the companies providing the service but for you as
a consumer it's better to have it all in one device. So we actually now have --
[inaudible] now has this spinoff company where he commercialized that
[inaudible]. So he's [inaudible] decided to have fun on vacation -- it's called
Jibbigo -- to have that in one of their commercials.
And it's basically a two-way translator, different languages [inaudible] an English-
Spanish version that all runs on a simple iPhone. And since it has to run on the
iPhone, it's for our tourist domains, so it can't do the whole lecture translation
thing. For that we need at least a laptop-size device. But for tourists travelling
abroad, something like an iPhone -- an iPad also works nowadays -- is already sufficient.
So if you want to play around with that, I have two versions with me, otherwise
you would have to pay I think 25 bucks on the [inaudible] site. [laughter].
The lecture translation system I didn't bring with me today, so I can't show
you that. But I can show you, if you want to look at what the Quaero people
have been doing in the field of information retrieval, a nice website
that you can play around with. It's the Voxalead -- the Voxalead news service,
from a French company, and they've used speech recognition technology in order to
index news videos, news clips. And you can then query the news clips, either
radio shows or TV shows. Or if you look for Microsoft you get, even from the BBC, its
latest one, or this one about Bill Gates at the economic forum in Davos. And it's
available in multiple languages. So currently you have French, English, Chinese,
Arabic, Spanish and Russian. More languages I guess to come, because I know
the company that is providing the speech recognition technology also has
multiple -- many more languages in their toolbox.
So that is just so that you see that there are actually multiple companies
nowadays providing this kind of information retrieval capability. This isn't
a scientific document scenario, but as I said, we're currently working on the
academic lecture scenario. We've been working a lot with the TED lectures. So
there's also work going on in that area.
And that is really something that is useful for science: if you are able to access
language -- if you are able to listen directly, across languages, to an ongoing
presentation, or if you are able to actually access, across languages, lectures that
have been prerecorded in a database. And that is, with respect to this
workshop, I think the most important point: that this is a really worthwhile thing to do,
and that it will advance science significantly if these products get out and the
core technology improves even further, to a point where it is applicable in multiple
languages and is as reliable as humanly possible.
Okay. So I guess my time is up now.
>> Lee Dirks: All right. Thank you. Thank you very much.
[applause]