>> Matthew J. Graham: Good afternoon, and welcome to the afternoon session, which is the practical astrosemantics workshop. This is actually the fifth gathering we will have had devoted to semantics and astronomy. I'm going to talk for about 25 minutes on just exactly what semantics and semantics in astronomy is, and then Norman is going to talk about some stuff and Sebastien is going to talk about some stuff and then [inaudible] will talk about some stuff and then we'll have a discussion about some stuff, which is what semantics really is, isn't it? Let me start by giving you a definition of astrosemantics: essentially astrosemantics is the branch of astronomy which deals with machine processable knowledge, and it's nothing more than that and nothing less than that, and it does this through the use of a set of technologies which you can broadly call semantic technologies. A lot of this stuff traditionally came out of the AI community and has been adopted by various bits of the web and things like that, but there has been precious little application specifically in astronomy, and what we are trying to do is promote the usage of these tools, because in the era of big data everyone focuses on big data and doesn't think about big information and big knowledge, which is what you necessarily get when you have big data. You quite often see this pyramid or diagram where you have data going into information going into knowledge going into wisdom, and the statement is that normally you'll spend about 40% of your time grubbing around in the data, maybe 30% of your time working with information, if you're lucky 20% up in the knowledge layer, and then the final 10% of your time gaining wisdom. However, by using semantic technologies you actually invert that, and you as a carbon-based system would be spending maybe 10% of your time down in the data layer and maybe 20% in the information layer, but you'll be spending the bulk of your time up here in the knowledge and wisdom part of it doing good thinking, good science, making discoveries, instead of wondering why your data is crap and what you can discover in it. The bulk of this stuff will be handled by semantic technologies or smart technologies. I think smart is the way to think about this sort of stuff: we're not talking about making intelligent computers which think. We are just talking about making knowledge machine processable, so that you can then apply first-order logic and the sorts of things that computers are really good at, so that they can handle that sort of stuff in the same way that they already handle basic zeros and ones. Why do we want to do this? What are some of the things that making knowledge machine processable would be good for in astronomy? Let me go to NED and type in an NGC number, NGC 7377. This is what NGC 7377 looks like, and it comes up with a whole load of annotations. One of the annotations that NGC 7377 has is "most unusual galaxy". This is the most unusual galaxy in NED; that is how it is tagged. The reason is that if you go back to the original source paper from the 1970s talking about this, someone said this is a most unusual galaxy, and someone has annotated that, in a fashion. But that means there has been a judgment call on this.
There's something about this object which makes it an unusual galaxy, so if you could take that as a tag you might want to go to the system and say, well, if this is such an unusual galaxy in the opinion of someone, I would like to find other objects like this, either by type, so I would go out and find more spirals or more ellipticals, hopefully ones that someone has tagged up. But what happens if, instead of using a tag such as E, someone has tagged it up as elliptical using the word elliptical? Somehow my system needs to know that E, the concept of E in this particular tagging system, and the concept of elliptical in another tagging system are equivalent; that there is a concept scheme being used here, that there is some piece of domain knowledge that says when I am tagging, some people might use this particular tag or that particular tag. That's a piece of smartness that you want to put into your application at a very low level. I might want to do it by properties. I might want to say that actually I'm interested in things which have spiral arms, and someone may have said that this is a spiral galaxy or that this is a galaxy of S type. Somewhere there will be a definition in your smart system that says these are the set of properties that, in my worldview, I will define a spiral galaxy as having. It will have spiral arms or it might have dust lanes. If I say it has dust lanes then I can do a search for objects which have been tagged as having dust lanes, and my system will be smart enough to know that that falls within my definition of what I consider to be a spiral galaxy. The word spiral doesn't appear anywhere in that tagging, conceivably; there could be an object that says this has got dust lanes and that's all it has. It doesn't mention spiral at all, but when I do my search to get the objects like that, it will come back and say this is a spiral galaxy, because it's defined to have dust lanes. I could say find me more objects which are at the particular distance of this object. You know, there might be objects out there that don't have the specific distance metric attached to them at the moment, but they may say that this is a spiral galaxy. In my knowledge system I might have something which says spiral galaxies have a distance measure, or can have a distance attached to them, through something called the Tully-Fisher relationship, and the Tully-Fisher relationship requires there to be an HI line width attached to it, so maybe I can go out and find galaxies or objects which have an HI line width attached to them, and somewhere there is an algorithm defined that says you can use that to guess a distance, an estimated distance, for that particular galaxy using this particular property. That's a broader set of what the domain knowledge would be, but I can express that in a machine processable fashion so that my smart system can go out and give me a data set that's usable, or present me with a set of data that I could then work with. This is not just tagging; it's also inference that I'm using: I've expressed my knowledge and my information in such a way that the system can be smart about it and do things on my behalf that I've already told it, or given it the knowledge, to do.
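A minimal sketch of the two kinds of smartness just described, tag equivalence via a concept scheme and property-based inference, using the Python rdflib library. All of the URIs, tags and objects here are invented for illustration; they are not any real tagging scheme.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/astro#")  # hypothetical namespace
g = Graph()

# Two tagging systems: one uses "E", the other the word "Elliptical".
# A concept scheme records that the two concepts are equivalent.
g.add((EX.E, SKOS.exactMatch, EX.Elliptical))

# Objects tagged up in different systems (invented examples).
g.add((EX.ngc0001, EX.hasType, EX.E))
g.add((EX.ngc0002, EX.hasType, EX.Elliptical))
g.add((EX.ngc0003, EX.hasFeature, EX.DustLanes))

# Find ellipticals regardless of which tag was used, by following the
# skos:exactMatch link in both directions.
concept = EX.Elliptical
equivalents = {concept} | set(g.objects(concept, SKOS.exactMatch)) \
                        | set(g.subjects(SKOS.exactMatch, concept))
ellipticals = {s for eq in equivalents for s in g.subjects(EX.hasType, eq)}

# "In my worldview, anything with spiral arms or dust lanes is a spiral":
# infer spirals from properties even though "spiral" is never tagged.
spiral_defining_features = {EX.SpiralArms, EX.DustLanes}
spirals = {s for f in spiral_defining_features
             for s in g.subjects(EX.hasFeature, f)}

print(sorted(str(o) for o in ellipticals))  # ngc0001 and ngc0002
print(sorted(str(o) for o in spirals))      # ngc0003
```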
On a completely different note, in Second Life there's a little thing called a semantic museum, and what people have done there, in this particular case, is build a display about classical Greece. They have created pictures of various objects from classical Greece, and they have put descriptions of those objects in such a fashion that there is knowledge about those objects and knowledge about their provenance and how it's all connected together: pieces of information, pieces of knowledge, that are connected together in a machine processable fashion. What happens when you go up to the object is that this description pops up about the object, saying that this is a prochous and it comes from the classical period of Greece. It sort of tells you how this was characterized, and it tells you something about the object and where it is. Each of those little pieces of information in there does not necessarily reside in a single block of text in Second Life. It is probably stored at the level of individual pieces of information, but you can connect the pieces of knowledge together, like we are familiar with in Wikipedia linking it all together, and you can retrieve it because it's all related and tagged together. Also, because you've encoded the information in such a way and said what the structure is, you can do clever things with multilingual presentations and all of this sort of thing, so the term that I use for this sort of approach is artificial docents. Now where this might be useful is if you are doing a sort of smart EPO type thing, so here we have a Curiosity picture showing stratification on Mars and, you know, there could be a piece of text saying "planetary scientists believe that the same geological processes that have shaped the Earth, volcanism, tectonism, water and ice and impacts, are at work on Mars," and there could be links then to descriptions about that and to further information, further knowledge about that, which would be useful if you are exploring this sort of thing. Think of it as a very smart Wikipedia, or it could link through to, let's say we are interested in impacts, it could then bring up the amateur picture of the thing that collided with Jupiter this week and say Jupiter is important in the context of impacts because, and someone has put a description in there. Now I know that Ray was claiming that he does a lot of Wikipedia entries, so stuff that he puts in could be drawn up and presented through a knowledge-based framework, and it would be reuse of the information that he's put in, not necessarily in the context that he originally put it in, but if the necessary metadata to make it useful, or to describe its use, is in place, it can be put together for doing this sort of smart docent EPO type approach. A third area in which this sort of knowledge can be useful: we as a species have this sort of innate need to classify things. This is L. Eyer's sort of conceptual taxonomy for essentially variable stars or variable objects. Wouldn't it be great if I was doing data mining or machine learning in some fashion and I could incorporate this knowledge, not as a preprocessing step or as a postprocessing step, but as something that's actually used directly as part of the machine learning or the data mining that I'm doing? Two particular techniques that I've been playing around with: there is a way that you can use self-organizing maps to include what's called an ontology, which is essentially just a representation of domain knowledge, so that you can get a metric between two different knowledge ideas.
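A toy illustration of getting a quantitative distance between concepts in a taxonomy, the ingredient such an ontological SOM needs. The little galaxy hierarchy here is invented for the example; a real application would load a proper concept scheme.

```python
# Parent links defining a tiny invented taxonomy (root has parent None).
parent = {
    "Galaxy": None, "Spiral": "Galaxy", "BarredSpiral": "Spiral",
    "Elliptical": "Galaxy", "Lenticular": "Galaxy",
}

def ancestors(concept):
    """Chain of concepts from this one up to the root, inclusive."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = parent[concept]
    return chain

def distance(a, b):
    """Edge-counting distance: hops from a and b to their deepest
    common ancestor. Zero means identical concepts."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(c for c in up_a if c in up_b)
    return up_a.index(common) + up_b.index(common)

print(distance("BarredSpiral", "Spiral"))      # 1
print(distance("Spiral", "Elliptical"))        # 2
print(distance("BarredSpiral", "Elliptical"))  # 3
```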
You can get a metric between things that are expressed in a taxonomy, and if you go to NED, NED has the NGC catalog as you've seen, and then it has this wonderful set of annotations, and you can express those in such a way that they are machine processable and get distances between a spiral galaxy and an elliptical galaxy in your concept scheme. Because you then have a quantitative measure, you can do traditional data mining clustering techniques like the ontological SOM or self-organizing map. So you take the NGCs, you figure out broadly what their classes are, you get a self-organizing map based on the distances of the concepts in how those objects are tagged up, and then you can see how those sort of relate. Another technique is a sort of superclass of Bayesian networks, which Gerald talked about and I think, are you talking about Bayesian networks as well Gerald, ah, Ashish [phonetic]?
>>: [inaudible].
>> Matthew J. Graham: Right, so Ashish will talk more about networks. There's a thing called Markov logic networks, which are a way of combining statistical reasoning and first-order logical reasoning. Essentially what you can do is express your domain knowledge in the form of rules, but you can associate weights with those rules, so the rules are expressing knowledge and the weights you attach express your believability in each particular statement. You could say that a supernova is associated with a nearby galaxy 80% of the time, and you could then use that as a classification system, and it will use first-order logic to infer things like what is the most likely statement I can make about a particular piece of information, based on the body of knowledge that I've associated with it, or Markov probabilities and stuff like that. Those are just three separate areas where having machine processable knowledge could potentially be very useful in astronomy. The question is how do we use these tools? Well, in machine processable knowledge there is a sort of lowest level; the quantum of knowledge is something that is very simple. It is what's called a triple. You simply have subject, predicate, object, and you reduce all of your statements to that basic level, that very fundamental unit, and then you do all sorts of other things with it: express it in your concept schemes, store it in databases, and infer over it, but that's the basic level that we work at. The World Wide Web has defined a whole set of standards related to knowledge representation, and the basic one is called the Resource Description Framework, RDF. If you read more about semantics and semantic technologies, you will hear more about what are called triples, or RDF triples, and essentially they are just a subject, predicate, object. Pluto is a planet. Actually, that's not true anymore. That's the other thing about knowledge. Knowledge evolves and our knowledge changes. If you work in bioinformatics, you have something called the gene ontology. The gene ontology, which expresses the current body of knowledge about gene structures, gets updated about 10 times a day, so they are very used to having techniques for dealing with this fluid body of knowledge, and we change these ideas. We reclassify stuff and whatever. That is sort of a bonus you get as well; it's inherent in that.
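Going back to the Markov logic idea for a moment: real Markov logic network inference involves weighted model counting over all groundings of the rules, which is far more involved than this. The toy below only conveys the flavour, weighted rules scored against evidence; every rule, weight and event here is invented.

```python
import math

# (weight, rule) pairs; a rule maps (evidence, candidate label) to
# True/False, or None if it says nothing about that label.
# "A supernova is associated with a nearby galaxy 80% of the time"
# becomes a rule with log-odds weight log(0.8/0.2).
rules = [
    (math.log(0.8 / 0.2),
     lambda ev, lbl: ev["near_galaxy"] if lbl == "supernova" else None),
    (math.log(0.6 / 0.4),
     lambda ev, lbl: ev["fast_decline"] if lbl == "supernova" else None),
    (math.log(0.7 / 0.3),
     lambda ev, lbl: ev["periodic"] if lbl == "variable_star" else None),
]

def score(evidence, label):
    """Add the weights of satisfied rules, subtract violated ones."""
    total = 0.0
    for weight, rule in rules:
        verdict = rule(evidence, label)
        if verdict is not None:
            total += weight if verdict else -weight
    return total

event = {"near_galaxy": True, "fast_decline": True, "periodic": False}
best = max(["supernova", "variable_star"], key=lambda l: score(event, l))
print(best)  # the most likely statement given the weighted rules
```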
How do we build up bodies of knowledge? Well, one way of doing it is extracting knowledge from the literature. You express it in angle brackets, and we've been very good at producing literature in astronomy for at least 2000 years. In time domain astronomy we send out event notices expressing our knowledge of what we have seen in the sky, essentially answering the questions of who, what, where, when, how and why, but there's a lot of information in the literature proper. There's a lot of knowledge in the literature that is worth exploring and trying to extract. This brings us into text mining and trying to create concept schemes from a body of text. Now there are two ways that you can create a concept scheme. I can go and ask George what he believes the correct hierarchy or correct classification scheme is for variable objects, and then I can go and ask Bob, and they will have, let's hope, probably 80% agreement, but they might disagree over certain things, and I may ask someone else. Everyone will have their own version of what a concept scheme is and how you would classify certain things or how things might be related, depending on what their domain knowledge is. That doesn't matter in semantics, because there is this whole machinery for resolving different people's ideas to form consensus opinions, so you can have multiple hierarchies and mappings between them. But going to the literature, where there is maybe a more established corpus of knowledge and a representation, is a good idea, so one of the things we've been playing around with is another way in which, in the time domain, you can send event notifications around: things called Astronomer's Telegrams. These are natural language, so instead of trying to put it into a forced structure like maybe we're trying to do with the angle bracket way, with the ATels you have someone who has written some sentences. The hope is that these are scientifically rich sentences, and there may be things in there that are very useful, that might allow us to gather snippets of knowledge together that are relevant for a particular class of astronomical object, so I've had students over the last two summers who've been working on this, trying to hierarchically cluster vector representations of text that they've extracted from ATels (a sketch of that approach follows below), and there is some interesting clustering going on that would maybe allow us to create a concept scheme for that particular subdomain. This is another thing: you don't need to create one huge overarching body of knowledge that says this is astronomy and that's what you've got to use. You can deal with the very small subdomain of which you are expert and just work with that, and maybe use someone else's representation of what they are experts in, and then join the two together quite easily. ADS Labs is a potentially very rich corpus of material to mine and create these sorts of concept schemes from, and I think that there is work that has been done, or is being done, in these sorts of areas.
>>: [inaudible].
>> Matthew J. Graham: Okay. David Hogg has been putting ideas up on his ideas blog last month, and one of them was about creating a paragraph-level index for arXiv. Wouldn't this be a great way of identifying astronomical knowledge and, as he says, also identifying who wrote which paragraph in a paper?
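A minimal sketch of that ATel clustering idea: turn short telegram-like texts into vectors and hierarchically cluster them. The four toy "ATels" are invented; real work would use the actual ATel corpus and far more careful preprocessing.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

atels = [
    "We report the discovery of a type Ia supernova near NGC 1234.",
    "Spectroscopy confirms a young supernova with broad Si II absorption.",
    "The cataclysmic variable shows a new dwarf nova outburst.",
    "Optical photometry of the dwarf nova reveals superhumps in outburst.",
]

# Vector representations of the text ...
X = TfidfVectorizer(stop_words="english").fit_transform(atels)

# ... hierarchically clustered (Ward linkage on the dense vectors).
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X.toarray())
print(labels)  # supernova texts in one cluster, dwarf-nova texts in the other
```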
And then there are things like the NED annotations. They are somewhat freeform, but you can create a concept scheme from those, as I sort of have done, and present that and say this is the worldview according to NED; use it if you want. Those would be concept schemes from bodies of knowledge, bodies of data, that we already have. How do you store this information? How do you store knowledge? Well, for a separate project, one thing that I did recently: we had a corpus of information, about 11,000 event notifications, and we wanted to see what the best way of representing the knowledge, the information stored in those, was, and of retrieving it. You see, there are these things called triple stores which, who talked about it? Mark Stoltza [phonetic] was mentioning them with the flash blades on Monday, and we have relational databases, and we have things like NoSQL non-relational databases, which are seen as one of the storage solutions for big data. With LSST we are not going to be getting 11,000 events; we're going to be getting about ten million to 100 million events a night. That knowledge, that information, needs to be stored somewhere in a very efficient way, so we did a little toy experiment to see what we could do to store those. We tried a traditional relational database with two different ways of organizing our information inside of it. We tried MongoDB, which is I think what Galaxy Zoo, or Zooniverse, uses under the hood; Stardog, which is a commercial triple store; and eXist, which is a native XML database. VOEvent is XML; it's an angle bracket representation, and so you just give it a very simple thing. In this case: find me all events which have, you know, a parameter called event flux and a value less than 40, so there is my knowledge statement, my knowledge query. Find me everything with that in it. It turns out that the triple store is by far the fastest, because the technology is very fast at retrieving these semantic triples; essentially you break everything down and then you're just doing matching. I was quite surprised by that. I fully expected MySQL, the relational database, to win out, but this particular product did. We had hoped to have the creator of this particular product here this afternoon, but he had to call off at the last minute. There are these specific technologies for storing very large amounts of knowledge data out there, which will scale very nicely I suspect. Finally, there are these different ways of codifying knowledge which I've alluded to. We start off with controlled vocabularies, and the semantics working group of the IVOA is very much into trying to collect these and provide advice on how to do them. The theory working group has produced one of these for marking up astronomical theories. There are taxonomies, like the General Catalog of Variable Stars defining a hierarchy of variable stars. There are thesauri, which Norman is going to be talking about in a few minutes, and then we have these things called full-blown ontologies. CDS uses one of these for a lot of its work, and I talked about the concept schemes that I've been working on for data mining, and there are a couple more out there.
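As an aside on that storage comparison: here is the same knowledge query, "find all events with an EventFlux parameter whose value is less than 40", written for three of the stores compared. The table, collection and predicate names are hypothetical, since they depend entirely on how the VOEvent packets were decomposed.

```python
# Relational (MySQL): params flattened into a name/value table.
sql = """
SELECT event_id FROM event_params
WHERE name = 'EventFlux' AND value < 40;
"""

# MongoDB: one document per event, with the params embedded.
mongo_filter = {"params": {"$elemMatch": {"name": "EventFlux",
                                          "value": {"$lt": 40}}}}

# Triple store (e.g. Stardog): events decomposed into RDF triples and
# queried with SPARQL; matching triples is what such stores are fast at.
sparql = """
PREFIX ev: <http://example.org/voevent#>
SELECT ?event WHERE {
  ?event ev:hasParam ?p .
  ?p ev:name "EventFlux" ; ev:value ?v .
  FILTER (?v < 40)
}
"""
```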
Something I finally came across yesterday, which I was very interested to see, is something which puts all of these together: the SKA information intensive framework, which has been developed by an IBM CTO and someone in New Zealand, and which provides semantic services to manage data right across the board: for ingesting, for classifying, for making inferences to do transient detection, and for data retrieval. This uses one of the ontologies that the IVOA has defined, so I was very interested to see it. I'm trying to find out more, because there seems to be very little information about it. There seems to have been a set of press announcements about it at the end of last year, but clearly SKA has got its ideas right about how to use knowledge and knowledge management systems at the heart of a big data system.
>>: [inaudible].
>> Matthew J. Graham: Yes.
>>: It is?
>> Matthew J. Graham: It is, yes. And the guy who proposed it is a chief technology officer at IBM who is also in charge of some aspect of SKA project management, so I hadn't heard about it before I happened to see a random news story on it yesterday, and I was quite surprised by it. Ray says he knows nothing about it.
>>: [inaudible].
>> Matthew J. Graham: Yeah, I know. I don't know. Joe might know more.
>>: [inaudible].
>> Matthew J. Graham: Joe is not in the room. I'll ask him later. That's where I'll end, so if there are any questions or comments… [applause].
>> Norman Gray: Hello again. As Matthew mentioned, I'm going to talk about both the history and the future of, broadly considered, vocabularies in the context of the VO, which in this context often means within the IVOA, but not exclusively. Oh, that wasn't supposed to happen. I'm going to skip these slides because they are implied by the discussion that we had in the last session, closed parenthesis. The core assertion here, the thing I want you to remember, is that there are multiple thesauri (and I'll get to definitions in a moment) already in astronomy. This is no longer arcane, even if it was originally, and it is all ready for deployment and for applications to build on. This needs saying because this whole area, the vocabularies, the thesauri, the semantic web, has appeared to be off-putting to people, and that's intelligible, because a lot of the technologies involved are sufficiently far from the comfort zones of most physical scientists; they were new to many computer scientists until 10 years ago, and they are sufficiently fiddly that it is a long route from reading or downloading something to producing something interesting and useful, so you needed to be sort of pre-convinced that it was worth the effort. Now I think there are enough applications, or potential applications, that are fairly obvious next steps that that is no longer the case, so that was the past and it is no longer true now. As Matthew mentioned, there are vocabularies, there are thesauri, there are ontologies, so some definitions are useful. This section of the half hour is pedagogical, and the pedagogy will be through stories. I don't expect you to read all of that; you can, and I encourage you to, read it in the PDF of the slides afterwards. The main point is that there is a story here about someone starting off reading a paper and going from there to a service to data to other things, including Wikipedia, and moving around through the possibilities of astronomical knowledge to make their work easier.
That works; that doesn't work yet, but getting from where we are now to there is just a matter of code. The bare bones of it are already present: things like the deep linking or the deep markup, the deep tagging that Matthew mentioned in the last session, would support this. One of the points, and I'll come back to it several times, is that what supports doing that is what makes it easy for humans: the way the web works. If you're reading a webpage you can follow a link. That link isn't restricted to going to another place on that page. It's not restricted to going to another page on that site. The link can go anywhere on the web, but only humans can read HTML pages. This depends on a machine-readable web, on machines that can go through the same sort of service to service, source to source linking that humans have always been able to do with the web. I don't think there's much more to see there. This is lightweight semantics, that word again, lightweight stuff. At the more elaborate end of the range of possibilities you can talk about ontologies, and I'll skip over this fairly swiftly. I just wish to illustrate that there is a heavyweight end of the spectrum. What does this mean? That is an RA and a Dec and an r magnitude by some particular observer on a particular date. What can you conclude from that? This is an optical measurement of a star. You know it's a star because it has an RA and a Dec. It's optical because there was an r band measurement there. Look, there is an astronomer, because she has taken an optical measurement. She was not in a radio observatory on that day. You can tell that because she was clearly at an optical observatory, and an optical observatory and a radio observatory are different things, and so you could ask who was at an optical observatory in March and you could then get an answer to that question. In other words, there are rules here. If it has an RA and a Dec it is a celestial object. If it has a measurement in the optical, it's visible in the optical, and so on. All of you are sitting here thinking that's not true, I can think of countless counterexamples to that. And you are right. This is not absolute truth. It is a programming language. If you write a simulation, what you are coding up is not absolute truth. It's enough of the truth to be usable, to get something done, to draw a conclusion. So this is the sort of more elaborate programming with logic that reflects enough of a discipline that the machine doesn't understand what's happening, but it can act a little bit as if it did, and that's a really modest goal, even at the heavyweight end of the spectrum.
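A hand-rolled sketch of that "programming with logic" idea: a few forward-chaining rules of the kind just described. A real system would express these in OWL or RDFS and run a reasoner; this only shows the shape of the thing, and the observation record is invented.

```python
# One observation record as a bag of assertions (all invented).
observation = {"hasRA", "hasDec", "hasRMag", "observer:Jane", "date:March"}

# (preconditions, conclusion) pairs: "if it has an RA and a Dec it is a
# celestial object", "if it has an r magnitude it is an optical
# measurement", "whoever took an optical measurement was at an optical
# observatory".
rules = [
    ({"hasRA", "hasDec"}, "isCelestialObject"),
    ({"hasRMag"}, "isOpticalMeasurement"),
    ({"isOpticalMeasurement"}, "observerAtOpticalObservatory"),
]

changed = True
while changed:  # keep applying rules until nothing new is concluded
    changed = False
    for pre, conclusion in rules:
        if pre <= observation and conclusion not in observation:
            observation.add(conclusion)
            changed = True

# Now "who was at an optical observatory in March?" is answerable.
print("observerAtOpticalObservatory" in observation and
      "date:March" in observation)  # True
```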
I mentioned the spectrum, and this is a very PowerPointed picture. You can imagine a spectrum of these things, from very lightweight knowledge representation things at one end to very heavyweight knowledge representation things at the other end. Way over here are lists of terms. You've seen things like that: if you submit a paper to one of the major astronomy journals, you have to add keywords to the paper. Those keywords have no structure, but you have to pick one or two of them; it's a controlled vocabulary. You can't just put in any keyword you like. You can do a little bit more than that. You can add structure to those keywords. You can say this is a narrower term than this one. The canonical example there is that if you have the concept of a car, a steering wheel is a narrower concept. There is less covered by that concept. It does not follow that a steering wheel is a type of car, so the relationship is about finding things. It is not a subclass relationship. Going to the other side of this red line you can talk about more formal "is a" relations. You have the class of all cars and the class of red cars, and if you say something is a red car then it is a deduction that it is a car. Nothing exotic there. It is a very lightweight bit of logical structure, but it is the very beginning of how you can imagine classification being based on that type of structure. Over on this side you have much more elaborate, intricate types of the same sort of thing, but there is a sort of boundary, with vocabularies and thesauri on this side and ontologies on that side. There is also a boundary of cost. Things on this side are easy and cheap. Things on that side are not easy and emphatically are not cheap, and all of them are [inaudible]. As Matthew said, this is not about having one grand ontology or thesaurus, so keep that in mind. I'm now going to go on to talk about various thesauri and vocabularies that already exist, just to show the range of things, and that are being developed further at present. One controlled vocabulary that many people know about is UCDs. [inaudible] everybody knows about UCDs. I think they have evolved over the course of years, not hugely, but there are developments. The UCDs are barely structured. There is some structure sort of there, but it doesn't play a heavy role. The reason why UCDs are so important is because everyone recognizes them, and they have their authority from the fact that they were extracted, they were deduced, they were mined from the headings of the databases in the holdings of the data centres, so these aren't just some words that some folks thought might be a nice idea. This is a collection of things that are actually measured in astronomy, so they are not intricate, but they are very well known. Things like that led on to work in 2009, where the IVOA declared, in a rather tentative declaration, that everyone should use SKOS. SKOS is a W3C standard for writing down simple knowledge organization systems, so it says these are the important features: the thesaurus idea of a narrower relationship, a broader relationship, a related relationship, and that's about it, so very lightweight. This document simply said that's a good plan, if you can produce thesauri in astronomy, do that, and as examples it showed SKOS versions of four existing thesauri, namely the journals' keyword list, so basically the list of keywords that are required for the journals with angle brackets around it; the UCDs; an outreach vocabulary called AVM; and the IAU's 1993 thesaurus of all astronomy, which was an intricately constructed thesaurus but hasn't been updated in 20 years, so it is of limited use. That again stresses the point that I come back to: these thesauri don't have to be complete and they don't have to be new. They can be refurbished versions of existing things, and there is value in that. What does one of these look like? The concepts are named by URIs. You can describe broader and narrower relationships. The concept is different from the labels that describe it. The concept of absolute magnitude is not bound to the string "absolute Helligkeit" in German or "absolute magnitude" in English, and so the labels are a separate layer on top of the notion, and this points to other related things, so nothing exotic or particularly arcane there.
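A sketch of what such a SKOS concept looks like, following the absolute magnitude example: the concept is a URI, distinct from its labels, which hang off it in several languages. The URIs here are made up; a real thesaurus would publish its own.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS

AST = Namespace("http://example.org/thesaurus#")  # hypothetical namespace
g = Graph()

g.add((AST.AbsoluteMagnitude, SKOS.prefLabel,
       Literal("absolute magnitude", lang="en")))
g.add((AST.AbsoluteMagnitude, SKOS.prefLabel,
       Literal("absolute Helligkeit", lang="de")))
g.add((AST.AbsoluteMagnitude, SKOS.broader, AST.Magnitude))
g.add((AST.AbsoluteMagnitude, SKOS.related, AST.Luminosity))

# The labels are a separate layer on top of the notion, so picking a
# presentation language is trivial.
for label in g.objects(AST.AbsoluteMagnitude, SKOS.prefLabel):
    if label.language == "de":
        print(label)  # absolute Helligkeit
```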
This is about lightweight semantics; that example is from the IAU's thesaurus. Another set of terms is Simbad's object types, again not hugely structured but well understood and applied to a very large number of objects. These are effectively a lightweight ontology of astronomical objects. Sebastien can talk more about that if there are any questions. Also from CDS, and this time from the heavyweight end of the spectrum, is the ontology of astronomical object types, which really needs a snappier name. This is the URL, by these people. It was virtually finished in 2008, with some tweaks later, and I put this up not because I have much more to say about it, but because I want to point to these notes describing it and the use cases for it, which I think are interesting. This has a couple of applications. Matthew mentioned one. It is used within CDS to do consistency checking of the attachment of labels to astronomical objects, to see if somehow something has been inconsistently labeled. That, I say, is at the heavyweight end. One thing that is sort of missing from this talk is snappy demos. I said at the beginning, and I will say again at the end, that these things are much more mature than they were, and are more ready-to-use than they were in the past, but they are not quite there yet, so I don't have snappy astronomy-specific demos, but all of this general technology does work at scale, and the BBC's sports webpages are, I understand, essentially a semantic web application. There's a big [inaudible] behind them; none of the pages are fixed. Of course that works. This is just being used as a content management system, and this is not the only part of the BBC that uses this content management system. But this content management system is firmly based on the technologies that I'm talking about here, and is obviously on a big scale. I picked this particular screenshot because, although it's only sports stuff that is covered by this application, it's flexible enough and has enough links across the BBC that this is basically a news story, a basically non-sports news story, that had enough of a sport angle that it was included as one of the sports headlines. As it happens, that's today. One of the things I'll be saying again and again is about linking between vocabularies; there are multiple vocabularies and you can link between them, and this has been systematized in the notion of the linked data cloud, and I do not expect you to read that. Part of the point is that it's too big to read. Each of these is some sort of data warehouse, and I think these ones are publications of various types, these ones are community content, these are the governmental ones, but the point is they link from one to another. Each of these gives out its information as RDF, as those angle brackets I showed you in the illustration of a thesaurus, and makes links from one to another. This is just like the ordinary human-readable web. The way links to links to links all go via Wikipedia is literally here in a computer-readable version of the web, in this case going via DBpedia, which is a machine-readable version of Wikipedia. So this is not a fringe technology anymore. That is the point that I keep going back to. Bigger projects: one is the grandly named Unified Astronomy Thesaurus. This has various people involved. I think the important thing here is the range of institutions involved in developing it. The history of it is implied by the slide after next.
One interesting thing is the use cases. These are some of the use cases for ADS. I'll let you read them, but the point is that having this structure, machine readable and shareable, allows applications to be built on that foundation. The publishers have similar, distinct but similar, use cases. There are different things they want to do with the same structure, and if they are doing things with the same structure, then it is natural for these applications to go back and forth between the various commercial publishers and the community-oriented ADS and beyond. The goals for this are fairly specific. The background is that both the American Institute of Physics, AIP, and the Institute of Physics, IOP, in the UK were generating large all-physics thesauri, and they have donated the astronomy portions of these to this project, so there are some incompatibilities. Both of those thesauri have in their lineage the IAU's 1993 thesaurus, but they have come by it by different routes. Productizing this is important, because this mustn't end up being just a research project. This must end up being a thing which is released and on which you can build long-term applications. The maintenance process is important because it is necessary to ensure it does not fall out of sync with astronomy, as the IAU's thesaurus did, and so a mechanism for involving astronomers, not information scientists, but astronomers, in the creation of it is important. As Matthew said in his last remark, this does not have to be complete. It is okay to have more than one thesaurus. As for the process, the AAS, the American Astronomical Society, will own this, and the IAU is expected to bless it. The process is not terribly exciting, but the point is that there is one, which has yet to be baked a bit and finalized. There is nothing more to say about that. Okay. Moving on, another cluster of thesauri are the VO theory vocabularies, which again Matthew previewed. These are the people involved, the Paris Observatory in Meudon, and I have been involved a little bit, although it is the Meudon people who have been doing most of the work. Now the motivation here is that the simulation data model within the IVOA is about finding simulation results. The plan is that you search for these by searching for the objects being simulated, by searching for the types of input parameters, by searching for the algorithms being used, and so you don't just want free text searching for all of this. You want it to be a little more structured, and so you need thesauri for these facets. That is the site, votheory.obspm.fr, and there are 10 vocabularies being created there. Some of these were pre-existing ones which were just tagged up and imported; some were given more or less substantial editing work after being imported; and some were created on this site. The back end of that is a commercial system called Poolparty, which is a rather expensive system which we have a free license to use, but because this is using SKOS, we are not in hock to that commercial supplier. If they disappeared or started excessive charging, then we would just dump out the SKOS files and go somewhere else. That is the virtue of standards. I'm going to talk now about one of these in particular, the thesaurus for chemical species, because I was involved in elaborating that a little bit. It builds on the VAMDC, the virtual atomic and molecular line database C; I'm not sure what the C stands for. That is quite a large thesaurus of chemical species of astronomical interest, and here is a term from it.
I put this up here just to illustrate what the thesaurus looks like. There is some URI that refers to this concept. This has a couple of labels. This is water, the concept of water. It has preferred names, names, and narrower concepts: the concepts of heavy water and tritiated water are narrower concepts than this. I'll make that point next. In this case, heavy water is a narrower concept than water. You find it in the same part of the shop, if you like, as where you find water, not very much of it typically, but it is also the case that heavy water is water. It's a type of water, and that isn't captured here. That implication isn't present in what's up there. The solution to that was to generate a parallel ontology (and that site did change, by the way) that does talk of subclassing, that heavy water is a type of water. Another thing is that this very naturally can link to other sources of information. This is clicking on a link, other sources about water. ChEBI is Chemical Entities of Biological Interest, which is another elaborately and, I think, extensively curated source of machine-readable knowledge about chemistry. It is natural for this fairly homegrown bit of work to link into that, and thus inherit the knowledge that other people are extensively creating in their ontology; on that linking, the machine-readable linking, I am like a broken record, going back to it again and again. So, just to make explicit the various lessons that I've been mentioning in passing through this last 20 minutes: a little bit of structure goes a long way. You don't have to have elaborate, logically intricate descriptions of the entire astronomical world in order to build something useful. A little bit of structure is enough. Most of the work involved in the various thesauri I have described here has been scripting work: just taking something that already exists, which possibly people believe in and use in one little part of the forest, putting angle brackets around it, and making it a standard and openly shareable and reusable thing. It's okay to have lots of them. There is no benefit really to having a monolithic thesaurus here. A monolithic thesaurus has the virtue of consistency and some internal integrity, but it doesn't have many other benefits. If you had a whole forest of these things, as long as they were linked together and not logically inconsistent with each other, everyone maintaining little bits of the tree can have their work be useful. Thesauri don't do everything. Sometimes adding a companion, a very lightweight ontology which just expresses a few extra relationships, is necessary, and as I've said several times, this is all about machines being able to do what we do on the web anyway, to move from source to source. I'll end where I began: there are already many thesauri developed and deployed. This is no longer arcane. This is ready for applications to be built on top of it, so what are you waiting for? And I will stop there. [applause]. We have plenty of time for questions.
>> Matthew J. Graham: Yes, are there any questions for Norman?
>>: So you showed the linked data stuff; is there any astronomical data in there already?
>> Norman Gray: No, none.
>> Matthew J. Graham: Well, that's not strictly true. Any astronomical data that is in Wikipedia and is sufficiently marked up will be.
>>: Real data from…
>> Norman Gray: Real data? [laughter].
>>: In that sense, would the main goal be to connect to the linked data network?
>> Norman Gray: Where--I know exactly how to put astronomical information on there. There is a lot of information in Simbad, for example, which is exactly the sort of information about astronomical objects which could go onto that, perhaps not tomorrow. It is only a matter of code and--no, that's not true. It's a matter of code and permission to use that data that way.
>>: [inaudible] something there, you have to create an RDF database, right?
>> Norman Gray: Not necessarily. You have to give the information out as RDF. But you don't have to store it as RDF.
>>: But in the case of the [inaudible] Observatory, would it be possible to connect the [inaudible] Observatory to the linked data web?
>> Norman Gray: The linked data web isn't really about bulk data. RDF tends to be slightly uneasy there; it's not natural to pump vast quantities of numbers out that way. It's about concepts, about the sort of knowledge that you have after reading a Wikipedia page.
>> Matthew J. Graham: So one thing that we did experiment with at a previous semantics workshop a couple of years ago was putting an interface around Simbad, for the information in there or a subset thereof, and there are ways that you can put layers on top of relational databases where you just define the mapping into RDF. You don't need to convert all of your data into it. And we sort of got somewhere with that, so that might be a way of doing something similar.
>> Norman Gray: So the proof-of-concept way that I had of showing a linked data version of Simbad was just a layer that took the linked data API on this site, made the right SQL queries, and rewrote the results.
>>: For example, I could easily imagine that it would be useful to browse through these relationships, stepping from one to the other, but also some of the relationships could tie into data, and some of it may already be being done at some level. For example, I know Space Telescope shares with ADS the relationships between data sets and publications, and so that's a very natural sort of stepping-off point in both directions. Maybe you discover some data sets and you want to look up some publications, and then you can go from there to other data sets and so on, and it's a very natural way to expand the scope of things that may be of interest to whatever you are browsing.
>> Norman Gray: And this provides a language for doing that. This is glue. This is a pot of glue sitting there ready to be used, and once you've done that for machines, it's a matter of code to do it for human-readable pages as well.
>> Matthew J. Graham: Last question.
>>: I assume that we will produce these triple stores for output.
>> Matthew J. Graham: [inaudible].
>>: I didn't quite follow the context in which you mentioned UCDs, that [inaudible].
>> Norman Gray: I was using those as an example of a quite lightweight controlled vocabulary, with some structure, but the structure isn't the important thing about it, which gains its authority from the fact that it is used by many people and was harvested from [inaudible] rather than being the creation of someone's beautiful mind.
>>: But they do have these [inaudible]?
>> Norman Gray: They are useful.
The point to be made there was that even something as lightweight as UCDs, the mere fact that there is an agreement to use this string, even that is enough to get lots of useful things done, and if you have more structure in the thesaurus you get more things done. It's about getting some function from any consensus.
>> Matthew J. Graham: Thanks Norm. Our next speaker is Sebastien Derriere, who is going to be talking about building a smart portal for astronomy. Thank you, Sebastien.
>> Sebastien Derriere: Thank you Norman. Thank you Matthew. [laughter]. I could have made the title "improving the CDS portal", but this is more ambitious and I hope…
>>: It's already smart.
>> Sebastien Derriere: [laughter]. It's already very smart. So I will start, because portal is a fuzzy word, with the official definition of the word portal in the Merriam-Webster dictionary. There are several meanings to portal, and the first one is a door or entrance, especially a grand or imposing one. So I thought, well, that sounds good. I hope we can build a portal to astronomy, or a unifying portal for CDS services. There are other meanings: one has to do with a church and one has to do with a tunnel, and we don't want to go that way; or an entry point for diseases or pathogens in the body, well, that's not good; and the last one is quite modern. I think it's an interesting definition. It's "a site serving as a guide or point of entry to the World Wide Web and usually including a search engine or a collection of links to other sites arranged especially by topic", and they might have had a hard time coming up with this kind of definition. It's still a bit fuzzy. You can go to the World Wide Web or do some searches or point to different things, and in fact when we do portals we have the same kind of problem, so my definition for an astronomy portal is a bit more restrictive. I say an astronomy portal is a web interface, or web application, because it's more and more integrated somehow. It's something that will allow us to do some data discovery or service discovery, and maybe then do some queries to those data sets or services, and then why not do some analysis and some workflows and so on, and you do more and more of this in your browser. If you want to do complicated things then you will need some kind of customizable interface where you can put modular widgets as you want, and these widgets will communicate with each other and maybe communicate with external applications, and we are starting to have the technologies which do all that. We have very nice JavaScript frameworks which allow you to do quite high-level programming for interfaces. You can do Ajax with XML or JSON to exchange packets of information. We've got HTML5, which has more and more support and allows you to make nice interfaces, and we've got all of these VO protocols with which you can query data. I will mainly focus on the data and service discovery aspect in this session, because that's where most of the semantics goes in. We've got two choices, in my opinion. We can go for simplicity or we can go for complexity. You can make a nice simple portal or you can make a quite complex portal. See all those widgets in the--well, it's not very customizable, but there is plenty of information in the [inaudible]. In practice, I looked at a few existing portals. This is the VAO portal. You can maybe notice that the input here is quite simple.
You can put in this search box an object name or coordinates and say what the size of the area you consider around this sky location is, and when you press submit, you will get a bunch of results corresponding to a list of resources around this location. The input is very simple, and then you can filter the results with facets and so on, so it's a good example of a VO portal. If you go to ADS, it's slightly different. You still have a single text input box, but you can start to put some scripting in this box. You can categorize the input terms that you are putting in there, so you can say I want to find papers that have cloud in the title and which are dealing with a thesaurus object, for example. You've got additional constraints which you can put in your query. You can choose how you wish to sort the results and so on, and you've got the possibility to log in, so you can categorize the inputs, you can choose somehow how the sorting will be done, and you can log in, so once you've logged in, the portal can have some memory of your behavior and customize things for you. Again, on the results page you've got this faceted search where you can restrict the results. A slightly different portal is what was demonstrated by Andrew Connolly, I think it was three years ago at the [inaudible], two or three years ago. He made a portal demonstration called ASCOT, and this is made of many individual widgets. Here you've got the name resolver. You put in the name of the object, you get the coordinates, and then you can broadcast these coordinates or use them in conjunction with a widget which is writing a SQL query to some dataset, and when you submit this you can make a plot of the data in another widget, and so these widgets are talking to each other, and this is more of a data analysis portal than a simple data discovery portal. In fact the data discovery is barely there, because there are predefined buttons for the catalogs which you can query in this case. And then I am coming to CDS resources. This is the current VizieR front page, or the VizieR full search. We've got a simple search where we've got two text boxes, but this is the complete VizieR interface; it's much more complex than what I showed before. You can put here some text, whatever you want. This can be an object name, keywords, names of catalogs or bibcodes, and the service will try to figure out what to make of this, or you can say yourself what kind of constraint you want to put on the search. In fact, this single interface can do two different things. It can search for catalogs if you don't know them, so you can say, I want a catalog with infrared photometry of pulsars, and it will try to find catalogs; or you can directly, if you know the name of a catalog, put an object here, or search for an object across all of the catalogs, so you've got quite different things that you can do with this same interface. The point is that it's complicated. I never show this to the students when I'm teaching databases or access to VO data, because it is too complex. So we came up a few years ago with the idea of having a single CDS portal where you can go and make a simple query, and this query will be broadcast to Simbad, Aladin and VizieR, and we try to aggregate the results together. Currently it's limited to only an object name and/or position, just like the VAO portal. The benefit of that is that the web interface is very simple. You've got a box where you put your target. It can be an object name.
It can be coordinates, and you press go, and then you get an aggregated result. You get Simbad results at the top, with a selection of links we made. We aggregate some information from Simbad and then we give you some pointers: find out more about this object in Simbad, or do you want to find all of the bibliography for this object, or do you want to find similar objects, which is something that Simbad itself does not do. In that case, we take the object, look at papers for this object and look at other objects cited in those papers, and we make statistics on the fly, and so we say oh, the papers that deal with this object also often deal with these other objects, so that is some information that you can retrieve. This is a list of images which cover this object in [inaudible], and this is a list of catalogs. In fact we do two things for the VizieR results. First, we search for catalogs in VizieR which deal only with this object. These catalogs might not even contain sky coordinates; they can have XY coordinates or just be a list of numbered elements or a table of values, atomic lines for the object. The second is that we make a full positional search for this object in VizieR. So these are the current capabilities of the portal, and that's all.
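A sketch of the "similar objects" trick just described: take the papers about an object, collect the other objects cited in those papers, and count. The paper-to-objects table here is invented; the real portal computes this from Simbad's bibliography on the fly.

```python
from collections import Counter

# Hypothetical mapping from bibcode to the objects that paper mentions.
objects_in_paper = {
    "2001A&A...1": {"M31", "M32", "M110"},
    "2005ApJ...2": {"M31", "M33"},
    "2009MNRAS.3": {"M31", "M32"},
}

def similar_objects(target):
    counts = Counter()
    for objs in objects_in_paper.values():
        if target in objs:                  # a paper about the target...
            counts.update(objs - {target})  # ...votes for its co-objects
    return counts.most_common()

print(similar_objects("M31"))  # [('M32', 2), ('M33', 1), ('M110', 1)]
```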
What we plan to do is make some improvements. For example, we would like to be able to put additional inputs into the query other than the simple object name and coordinates, make some smart interpretation of these inputs, and then broadcast a query to the various services that we have, and possibly others, and still at the same time keep it simple. That's not easy, and I tried to summarize a few use cases for access through such a portal. What is the redshift of 3C273? Find information on globular clusters in M31, around M31. What do we know about the proper motion of brown dwarfs? I am trying to make an SED for Vega. I just want the Veron catalog, or the Veron catalogs, because there are several versions. Quasars with redshift greater than five. I want to crossmatch SDSS and 2MASS. So I looked at existing portals; the most famous portal, which I think everyone is using quite often, is Google. For the user, Google is simple. It's amazing that it's remained that simple for so long. It's helping you by making suggestions, so in a way it feels smart, because as you type, you see suggestions coming, and if you are very lazy and it's fast enough, you can simply click on what is suggested and you don't have to type the whole thing. And the suggestions, as well as the results, are nicely sorted. Sometimes when discussing with colleagues, they have trouble sorting the output. They say oh, who am I to sort the output for the user? I just give everything. This you cannot do when you have a data avalanche; when you have many, many data sets you need someone to find the most relevant ones. You cannot simply throw everything at the user. Imagine if Google threw at you all of the webpages it has for one keyword. You would not use it. It would be useless. It means that on the server side you need a heck of a lot of indexing. It needs to be very fast, and it needs to be capable of fuzzy searches, to cope if you made a typo. You want the system to tell you if you made a typo and still find the right results and so on. A second example I had in mind for requirements for improving the portal was Wolfram Alpha. Wolfram Alpha is smart. It's really smart. If you type M51 redshift, it will say ah, I am assuming that M51 is an astronomical object, which is good and is what I want, and so it will interpret my input and just give me a result. That's nice. On the user side it's simple, just a single text box where you can put stuff. You can categorize it yourself, so there is a syntax where you can say city, colon, and put the name of a city, but if you don't, it will try to figure out, from the knowledge bases behind it, that maybe this string is a city name. On the server side there is a lot of interpretation of your inputs, and there are many, many--I should have put plurals here--there are many knowledge bases. They keep on harvesting data sets from databases and knowledge bases such as DBpedia and so on, and trying to annotate them with metadata so they can figure out what's in there and have many relations, so if you put two city names into Wolfram Alpha it will give you the distance between the cities, the time it takes to fly from one to the other, and so on. It's also about thinking about what you have in mind when you make a simple search.
>>: Can I make a comment? One of my complaints about Alpha is that it sometimes gives you complete garbage, and you can't say how quickly can you [inaudible] this 42, and you've got no way of telling, so there's no confidence in the…
>> Sebastien Derriere: I think there are some sources here; for each result, it says where it is coming from, and you can go and figure out the originals.
>>: [inaudible] it says [inaudible] meters, fine. You put average heights of anybody from anywhere in the world and it gives you the same answer, so that sort of thing. Clearly it's picked up a number somewhere and it's misinterpreting, so one important thing is that you need some measure of confidence.
>> Sebastien Derriere: Yeah, confidence.
>>: Whether that's like [inaudible] [laughter].
>> Sebastien Derriere: Yeah, so I tried to think, okay, what are the simple things you need to build a smart portal. First, the portal needs to understand what you mean when you give it some input. For this you need some kind of stemming, making sure that star and stars are interpreted in the same way; lemmatization, where you deal with synonyms, so whether you type quasar or QSO, it will be interpreted the same; and then categorizing the inputs, which is a problem of named entity recognition. Once you've done that, what can you do with the query? You must somehow match the annotated, entity-recognized search string to some search template in various contexts, which is what Wolfram Alpha is doing, and you must somehow translate the query in each context. In the case of the various CDS services, they don't have exactly the same metadata, for various reasons, but we have to deal with that. The first thing I did was to look at statistics: what do astronomers do when they search in VizieR? You've got this text box in VizieR where you can type whatever you want, so I just took the logs of VizieR and looked at the contents and what is used in there. If you do that you can make some nice plots, so this is for a few, I don't know, tens of thousands of queries. This is the number of queries for each of the search terms, ranked by order. This is the most frequently used term, which happens to be the word 2MASS; it has been searched more than 1000 times. Then the next most searched term, and so on.
The first thing I really did was to look at statistics: what do astronomers do when they search in VizieR? You've got this text box in VizieR where you can type whatever you want, so I took the logs of VizieR and looked at what is used in there. If you do that you can make some nice plots. This is, for a few tens of thousands of queries, the number of queries for each search term, ranked by order. This is the most frequently used term, which happens to be the word 2MASS; it has been searched more than 1000 times. Then the next most searched term, and so on, down to expressions which are searched only once. It's a very classical shape: it follows a Zipf law. In fact it has two regimes: you've got one slope here and another one here. It's called a Zipf law. You find this everywhere. Take a book, take every word of the book, compute how many times each word appears, rank them, plot the number versus the rank in log-log scale, and you will find a Zipf law. It's universal. >>: [inaudible]. >> Sebastien Derriere: Sorry? >>: [inaudible]? >> Sebastien Derriere: You will find, usually, it depends. Sometimes it's one slope and sometimes there are two slopes, but it's something that is well known. It means that while you can have many different searches, a few searches will appear very often, and the trend follows a Zipf law. I tried to look at the average length of the search string in number of characters. How many characters do people type in the search box before they expect to have a result? What is your guess: 10, 15, 20? >>: Four, eight. >>: Five. >> Sebastien Derriere: It's less. If you look at the total number of queries, it is one, two, three, four, five: five characters on average, the most likely length that people will type. If you look at the distinct search strings, then it goes to six, but most often it will be between three and ten characters; one character is not frequent. That's what people will type in the search box. It can be as long as, well, this is garbage [laughter], but a few are tens of characters long. You must deal with that. People will not give you a lot of information, most often. And from the list of terms that I extracted from the logs, I tried to categorize the search terms, and I made this very simple categorization. People will search in VizieR for catalogs, giving the name of a catalog. They can search for a mission or an instrument; sometimes the same word can be both: if I search for IRAS, IRAS is the name of a mission, and it is also the name of the IRAS catalog. They can search for the name of a person. They can look for measurements: proper motion, redshift. They can look for object types: spiral galaxies, quasars. They can look for object names or positions, and you can easily go from an object name to a position. Or a data product: I want a spectrum, I want something else. So I took my previous use cases and tried to categorize them with these categories, and this is what you end up with. These are already complex queries, given that people on average type six characters, but they can say: search for the proper motion of brown dwarfs, or globular clusters in M31, so what do I do if I have an object type and an object? Do you have a question? >>: Is five or six keystrokes what people are comfortable with for entry, or is it more what is needed to get results? >> Sebastien Derriere: I think this is a bit biased, first because the search box was quite small, maybe 20 characters, so you would not type a lot of text, and maybe this is sufficient to retrieve what they want. But if you want to make some more complex queries like this one, probably you would type a bit more information; hopefully, for me, you would give more information. >>: [inaudible]. >> Sebastien Derriere: Yes. >>: You said that 2MASS was the most searched term. That one has five strokes. >> Sebastien Derriere: Yeah.
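For readers who want to reproduce this kind of log analysis, here is a small sketch; it assumes a hypothetical plain-text log with one search string per line, and the file name is invented for the example.

```python
# Rank-frequency analysis of a query log, as described above: in log-log
# scale a Zipf law shows up as one or two straight slopes.
import math
from collections import Counter

with open("vizier_query_log.txt") as f:      # hypothetical log file
    queries = [line.strip() for line in f if line.strip()]

counts = Counter(queries)
for rank, (term, freq) in enumerate(counts.most_common(10), start=1):
    # Columns: rank, term, frequency, log10(rank), log10(frequency).
    print(rank, term, freq, round(math.log10(rank), 2), round(math.log10(freq), 2))

# Average search-string length over all queries (not distinct strings).
print("average length:", round(sum(len(q) for q in queries) / len(queries), 1))
```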
Building a smart portal, you have to deal with what the user wants. Maybe you know this cartoon: there is what the user wants, how he expresses it, that is, what he is typing in the search box; then you've got what the portal understands and what the portal returns, and you don't want to end up with this. Hopefully you want to be as close as possible to what the user desires in the output. In order to interpret and tag the user inputs, the simple way is to say to the user: okay, please tag it for me, use this syntax. You've got this syntax in ADS, where you can type author, colon, and a name; or in Google you can say I want results from this website, so I will put site, colon, in front of it. You let the user do it. The hard way is that you let the user type whatever he wants and you try to figure out what was meant. To solve this problem you can use vocabularies, but I will show just after that vocabularies are not enough, and what you really wish to do, in order for the portal to feel smart, or to be smart, is to do it on the fly, ideally, and provide suggestions. As the user is typing, you want to be able to make suggestions and say: oh, I know what you mean, I know what you mean, and I am giving you hints on what I am able to understand. So I made some prototypes where you can, for example, categorize your inputs with the categories that I defined, like a target and the quantity redshift, and give this as an input. The user can type it. You can provide suggestions as he types: if he types quantity, colon, and the letter a, what quantities start with an a? It can be absolute magnitude, abundance, age and so on. You pick these from the vocabulary. In order to suggest, you need to make some queries, sort them, filter them; you don't want to provide hundreds of results. This tagging and suggesting can in general be done with vocabularies and thesauri, but not only: you don't have thesauri for everything. If you take author names, where do you find a reference list of author names? You take ADS, with complete statistics on everything, and you make statistics on author names, but then how do you sort? Who are the most relevant authors in the output? That is a big problem. You can see some on the next slide. It is more difficult for object names. We've got more than 12 million identifiers in Simbad. We can load all of this in memory; it fits. But if you go to all of the possible object names specified by the dictionary of nomenclature, you've got nearly ten billion identifiers in VizieR for all of the big catalogs, take 2MASS and [inaudible] and so on. And what you want to do, ideally, is that as you type in the search box, you search the vocabularies and the object names at the same time, to find what the user is typing. There is a big issue of finding this and sorting. It is mandatory to sort things if you want to face the data avalanche. For this you want to make use of some statistics. You can make statistics, for example, on the most queried catalogs, or present first the papers with the most citations. I know people who are not happy with this analysis: it is biased, the winner takes all. If there is one catalog which is frequently queried, it will always come out at the top. Yes, but that's life. That's what everyone is doing. If you don't do that, it is most likely that people won't be happy with what you present. They will say: ah, I am looking for this one, the most used one, and it's not at the top. This is not logical.
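A minimal sketch of that popularity-based sorting, with invented usage counts standing in for real query-log statistics:

```python
# Popularity-sorted autocompletion: prefix matches, most-queried first
# ("the winner takes all"). Counts are hypothetical, not VizieR data.
POPULARITY = {"2MASS": 1200, "SDSS": 950, "M31": 700, "M51": 300, "IRAS": 250}

def suggest_sorted(prefix, max_results=5):
    """Return identifiers starting with the prefix, most popular first."""
    hits = [t for t in POPULARITY if t.lower().startswith(prefix.lower())]
    hits.sort(key=lambda t: POPULARITY[t], reverse=True)
    return hits[:max_results]

print(suggest_sorted("m"))   # -> ['M31', 'M51']
```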
It's not only the raw number in the usage statistics, it's also the trend in the statistics, so you need some short-term statistics. What about a paper that is recent? It doesn't yet have a big number of citations, but it's got a high citation rate, so it's popular, and you want this one to move to the top even if its total number of citations is low. The last point I want to make before the conclusion is this: you've got one query, and you interpret it correctly. You search for quasars with redshift greater than five. You've identified that quasar is an object type, redshift is a measurement, and you've got a numerical constraint on the measurement, and then you need to broadcast this query to different services. They talk different languages. They transmit different data. In Simbad, quasars are called QSO; redshift is called redshift. In VizieR you have to look for the astronomy keyword QSOs, with an s, and you find the redshift with a UCD. You need mappings between all of these, and this is probably the hardest part in the work of making these smart things. So let me just summarize a few ideas that I have presented to you. I want to build a smart portal which will enable data service discovery and new features. It's important: we released a crossmatch service which is able to crossmatch big catalogs and so on, and still people will query Simbad with a list of 20,000 objects, one by one. They make 20,000 queries. They don't know that there is a service that can do this in seconds, and they overload your server with thousands of queries. So if people query object by object from a list, you want to say: hey, there is a service that is able to interpret your query. Building a smart portal is extreme text mining: you've got very short input strings, which makes things more difficult; it is a different challenge from mining the full ADS corpus. You can learn from your logs: look at what people are searching for and you will probably serve them better. There is no such thing as too much metadata: if you want to interpret queries in various contexts, you will always reach a point where you say, ah, why don't we have this piece of metadata in the database? Yes, you can add it, but it will take days and days of work, so think about the metadata early. Sorting and suggesting are important. You can learn from the actual usage of the portal; if users are registered, you can even customize the output, remember what they have been searching for and provide it in the output. Things are changing: it was said earlier that vocabularies can change, there can be new data, there can be obsolete data sets or services, and you need to take this into account to give a dynamic answer. And the last point is: do not underestimate the carbon-based intelligence. You will never be able to build a portal which is smarter than the user, or at least I don't think so. But you must have confidence in the user, that he will be able to somehow rephrase his query to find what he wants, so don't try to envision all of the possible cases. Try to be smart enough with the portal and the user will be smart with you too. Thank you. [applause].
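As an illustration of that mapping problem, here is a minimal sketch; the per-service vocabulary tables and the output query strings are invented for the example and do not reproduce the actual Simbad or VizieR query syntax.

```python
# One parsed query, rephrased for each service: Simbad uses the object
# type QSO, while VizieR uses the keyword QSOs and a UCD for redshift.
PARSED = {"object_type": "quasar", "measurement": "redshift", "constraint": "> 5"}

SIMBAD_OTYPES = {"quasar": "QSO"}            # hypothetical mapping tables
VIZIER_KEYWORDS = {"quasar": "QSOs"}
VIZIER_UCDS = {"redshift": "src.redshift"}   # UCD naming the quantity

def to_simbad(q):
    """Build an illustrative Simbad-style criterion string."""
    return f"otype = {SIMBAD_OTYPES[q['object_type']]} & {q['measurement']} {q['constraint']}"

def to_vizier(q):
    """Build an illustrative VizieR-style search description."""
    return (f"keyword: {VIZIER_KEYWORDS[q['object_type']]}, "
            f"column with UCD {VIZIER_UCDS[q['measurement']]} {q['constraint']}")

print(to_simbad(PARSED))   # otype = QSO & redshift > 5
print(to_vizier(PARSED))   # keyword: QSOs, column with UCD src.redshift > 5
```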
>> Matthew J. Graham: We've probably got time for maybe one question. >>: The visuals that your portal generates, what would be the solution for [inaudible] one-dimensional data? What visuals do you… >> Sebastien Derriere: Oh. I am not planning to… >> Matthew J. Graham: Can you repeat the question? >> Sebastien Derriere: Sorry? >> Matthew J. Graham: Can you repeat the question? >> Sebastien Derriere: Yeah, the question is: what do we plan to do when the result is multidimensional data? At this point the portal will not be about visualizing the data and so on, but mostly about locating and interpreting the input and redirecting the user to specific links. If we can provide a numerical answer, like for the redshift, that's good. But if the user asks for globular clusters in M31, we will say: oh, I can search for catalogs containing this kind of object type, such as globular clusters, make the search in the area of M31, and tell the user this is the kind of query you can construct, and you can customize it. Or in Simbad you can filter on the object type stored in Simbad, same thing, look around M31, customize the output with a search radius and so on. So it would be more like providing the user with a list of directions in which to get different results than trying to integrate and visualize everything on the same page. This is for later, much later probably. >> Matthew J. Graham: Okay. We will take a 20 minute break if we may, and then we will reconvene at 3:07.