>> Lee Dirks: The second session here is the -- the theme of it is tools and technologies for communication and problem solving, and so that will be more the emphasis for this. And our first speaker in this session is Curtis Wong, and Curtis is a principal researcher in the eScience group at Microsoft Research. He joined Microsoft in 1998 to start the Next Media research group, focusing on interaction, media, and visualization technologies. He has been granted more than 25 patents in areas such as interactive television, media browsing, visualization, search, and gaming. He's the co-author of Microsoft's 5,000th patent in 2006 and its 10,000th patent in 2009. He's the co-creator of the Worldwide Telescope, featuring the largest collection of the highest resolution ground- and space-based imagery ever assembled into a single learning environment, inspiring millions to explore the universe. His most recent project is Project Tuva, featuring the 1964 Messenger Lectures of Nobel Prize-winning physicist Richard Feynman, whom many of us in the Department of Energy knew, presented in an interactive enhanced media player with rich interactive simulations and related content. His presentation this morning is called Telling Stories in the Cloud. So help me welcome Curtis Wong. [applause]

>> Curtis Wong: Thank you. Okay. Well, I'm really excited to be here with you guys. You're sort of my people. Multimedia, interactive, visualization. So how many people here have heard of the Worldwide Telescope? Can I see some hands? A few? Maybe almost half. How many people have played with it? About a quarter? Anybody authored with it yet? Oh, you have a real treat coming for you. Okay. So I'm going to spend a little bit of time talking about the Worldwide Telescope and take you into stuff that you probably haven't seen or noticed before, and then more of the recent work we've been doing with it is bringing in data. And I'll show you an example of that. And then there's also been some really interesting work, I think, around what we call guided tours, which are essentially like hyper-media publications. It's very similar to the last speaker. And I think you'll find that particularly interesting as well.

So let me just dive in here. I'm going to get set up -- running here, and I'm going to turn this down. Excuse me a second here. It's amazing. It just happens. Okay. So Worldwide Telescope will run at pretty much any resolution. Right now we're running it at a fairly low resolution so that it can be captured on video, but when we were running it on, like, a wall or something like that, we could pretty much run it at any resolution, which is pretty wonderful. So you're looking at the visible light view of the night sky. This is the Milky Way, of course. And this is a trillion-pixel image. It was put together with some folks here and based on the Palomar DSS survey, which is about 1700 plates, and those were mosaicked together and the stitching seams were eliminated, and, you know, so you can do this kind of thing, zoom in pretty much anywhere. But within this same environment we have the ability to compare other wavelengths. And you notice that they're all sort of registered. So if we take a look at, say, the X-ray view, you can zoom in and look at this heat signature from a supernova remnant, and we can make that the foreground view and then we'll go over and make the visible light the background view. And you can compare two different data sets.
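(Mechanically, that foreground/background comparison amounts to an opacity blend between two images that share the same sky registration. Here is a minimal sketch of the idea in Python; the file names and the single fixed pixel grid are assumptions for illustration only, since Worldwide Telescope itself works on tiled, multi-resolution imagery rather than flat files.)

```python
# A minimal sketch of the foreground/background cross-fade idea: two sky
# surveys already registered onto the same pixel grid are blended with a
# single opacity slider. File names below are hypothetical.
import numpy as np
from PIL import Image

def cross_fade(background: np.ndarray, foreground: np.ndarray, opacity: float) -> np.ndarray:
    """Blend two registered images; opacity=0 shows only the background,
    opacity=1 shows only the foreground."""
    if background.shape != foreground.shape:
        raise ValueError("images must be registered onto the same pixel grid")
    blended = (1.0 - opacity) * background + opacity * foreground
    return blended.astype(np.uint8)

# Hypothetical cutouts of the same patch of sky in two wavelengths.
visible = np.asarray(Image.open("snr_visible.png").convert("RGB"), dtype=np.float64)
xray = np.asarray(Image.open("snr_xray.png").convert("RGB"), dtype=np.float64)

Image.fromarray(cross_fade(visible, xray, opacity=0.5)).save("snr_blend.png")
```

Sweeping the opacity from 0 to 1 reproduces the cross-fade between, say, the visible debris cloud and the X-ray heat signature.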
So you can see the heat signature as opposed to the debris clouds from that particular supernova remnant, or if you wanted to see that in hydrogen, you can see the compression of hydrogen from that particular explosion. So having a base layer of imagery is particularly important. We have 50 of those, 50 different wavelengths. And then within collections we have collections from Spitzer, Hubble, and Chandra, the highest resolution imagery of all of them here, and these are overlaid on top of the visible light view that you have. Say, here's an example, M42. And notice down below we're showing you, as we slew over, this blue rectangle; it shows you the field of view as we're zooming in, to show you what portion of the sky you're looking at. The numbers here tell you the field of view in degrees. And down here it shows you what part of the celestial sphere you're looking at. So you're looking at a very high resolution image of M42. And you can sort of zoom down as far as you want. This happens to be a proplyd that has resolved into a disc of dust where you can see the star forming at the center here. There are many examples here in this particular area. Here's another one, an earlier-stage proplyd right there.

So let me show you something else. One of the most exciting features, I think, of Worldwide Telescope lies in the idea of what we're calling guided tours for the public, but essentially they're just paths within this very, very large data set of hundreds of terabytes of imagery that we have. And what's exciting about it is that it allows you, because we have built in an authoring environment, to create these paths that you can richly annotate with graphics, text, imagery, hyperlinks to other things as well. I'll show you an example of -- this is a tour that Alyssa Goodman did a couple of years ago. It's interesting that when you add up all the people that have seen this tour and then you compare it to all the people that have been students of hers, freshman undergraduates, all the people that have read her technical papers, all the people that have been at her public talks, it's about a factor of 10 more that have seen this tour. So in terms of public outreach, it's a terrific opportunity to -- let's see. Is that right? We have a small technical problem. It's not coming through because it's still coming through on my speakers, which is -- There we go. One of those things.

[Video playing. Voice of Alyssa Goodman] >> Narrator: -- because we're inside it. Here's a spiral galaxy not far from us, about 12 million light years away, called M81. If we look at it in optical light, we see billions of stars shining together in a spiral pattern. If we look at the heat from M81 rather than the light, it looks like the false color orangey image we see here. This Spitzer Space Telescope image uses long wavelength cameras that can see heat, just like the one that took the picture of this cat. Galaxies are filled with tiny particles called dust that absorb the light from stars -- [Video paused]

>> Curtis Wong: So it feels like video. Video is like the most powerful media form we have. It can really sort of engage somebody at a deep level. But the problem with video is that if it's something they already know about, it's not going fast enough. If it's something you don't know about, it's going too fast. So the opportunity really here is that, as I mentioned before, these things that look like video are just rendered in realtime.
And so because I've paused, I'm back in this environment of the Worldwide Telescope, so I can then look at a particular object from any other telescope, I can look at a different part of that particular object that I was looking at, I can look at other tours that intersect with this particular galaxy. So I can sort of semantically branch off. If I wanted to know more information about this particular galaxy, I could go and look up -- if I was a kid doing research, I could go look at Wikipedia and go get some information from Wikipedia. So that's one source. There's other sources. If I was an astronomer and wanted to know what's the latest papers that have been written that reference this particular object, we can go and it does a realtime query to the Smithsonian ADS and it will show you all the results from that, of which there are 2625, and the most recent one was from January of this year. And there are other information sources that are essentially on the web. But if I wanted to get an original source image, I could directly go get a DSS image from that source or somewhere else, or if I wanted to get an image from the Sloan Digital Sky Survey, it goes and does a query, pulls it out from the database, and I don't have to do anything other than just access it. And then there's a number of other things that you can do with it.

So far we have this rich visual environment. We have the ability to create these paths in this environment. And we also have this connection to data underneath it. And so that gets particularly interesting from an educational perspective as you start to think about, well, what kind of stories can I tell about these different environments that we have. And we have quite a lot of different environments in here. I mean, if you look at planets, you can create -- here's the planet Mars, and if I wanted to sort of explore down into a particular area like Valles Marineris, you could do it, and I could create a path down into the valley and I can annotate it based on interesting things that I was looking at or seeing. And that's one example. Of course, we have the entire Virtual Earth data set as well, whether it's -- I'll show you here -- whether it happens to be the straight Earth view or the map view or Earth at night or -- these are Blue Marble, winter and summer. And these can all sort of be used within that experience.

If I wanted to go out beyond the solar system -- so right now we're looking at the sun and planets, actually quite small, if you -- I can turn on the orbits, but when I show you the orbits you'll see that we have quite a lot of bodies here. Okay. See the little lint balls? Those are the moons around -- as you look at Jupiter here, you can go in and see all those. We can crank up the clock and you see them spin around like crazy. But let's go get some other context for kids. You can see why Pluto is not a planet [laughter]. It's sort of an acquired object. But let's take a look at Orion here, the little three stars in the belt. If you keep going further out you'll notice that the constellation starts to distort. Of course, that's because the stars that are in Orion are at different locations. But we can exit, you know, the Milky Way and continue going out into the million or so galaxies in the Sloan.
So if you look at some of these galaxies here, such as this one, the Coma Cluster, each one of these has a real galaxy behind it, and you can go and say I want to get information about that particular outlier galaxy. Here's the image of that galaxy. There's the spectrum, there's the redshift, you can download the data for it, you can take a look at the chemical composition. And this is sort of the information that's behind every one of these million galaxies that are rendered here.

So let's go back to the Earth and I'll show you a little bit about what we can do with data. Okay. So the first thing I'm going to do is I'm going to go find some data. And this one will be -- we'll say this one. So I have some data in here in Excel. A lot of people have data in Excel. So this one happens to be earthquakes from 2009 through 2010, about 18 months of data. It's about 40,000 rows of data. And it's difficult to sort of discern any real insight by looking at all of this data, but we've been working on connecting Worldwide Telescope with things like Excel. And so we can go and take that data and bring it into Worldwide Telescope. So let's see. Let's go in here. Let me copy this and bring it into here. All right. And here we have -- I'm going to minimize this. Let's take a look at -- so here's all of that data in Worldwide Telescope, but we can play with some of the properties of this. If we wanted to change the scale of some of the data that we were looking at, you could see it here. Oh, okay. Wait a minute. The challenge right here is that I didn't select a critical field, which is magnitude [laughter]. That's why it's not plotting anything. All right. Let's try that again. All right. I'm going to delete this guy out of here. All right. That's better. So it's pretty easy to sort of see the ring of fire. We can change the scale a little bit, which will allow you to look a little bit deeper at what we're seeing there. You can change the color of it as well and make it a little bit more dramatic.

But when you start looking at these earthquakes, you notice that there are some funny patterns of these quakes out in the middle of the ocean. And you kind of wonder what's going on out here. And so what we can do is we can switch to a different view of the underlying information about the ocean floor, and you can see that, of course, this is a little bit of a subduction zone, where you see the pattern of the earthquakes following that subduction zone there. You can go look down here. Here's the Haiti earthquake. And it was very close to the surface. But, you know, this is an interesting distribution right here, right above Puerto Rico; we're seeing some interesting structures here. We can take this same data and look at it over time. So we'll select that and select time series, and we can go into the clock and crank up the clock a little bit more and -- and pause there. So you can start to see some of these patterns here. Let's go in here and see them a little more closely. Some of them even look like lightning bolts. Let's speed this up a little bit. I think what will help us, too, is if we adjust the latency of it to give it a little bit more persistence so we can look at more temporal patterns. So we're seeing something here that is very difficult to see any other way. And one of the nice things we have about this environment of being able to author in this space -- I'm going to pause here for a second -- is that we can create an annotation about the data here. So we're going to create a little guided tour.
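(Before the tour authoring that follows, here is a rough sketch, in Python, of the kind of table the Excel-to-Worldwide Telescope step above implies: each row needs a latitude, a longitude, a magnitude column to drive marker size, and a timestamp for the time-series playback. The column names and the scaling rule are assumptions for illustration, not the actual add-in's API.)

```python
# Prepare an earthquake spreadsheet export as a point layer: position,
# magnitude-driven marker scale, and a timestamp for time-series playback.
import pandas as pd

quakes = pd.read_csv("earthquakes_2009_2010.csv",
                     parse_dates=["time"])          # ~40,000 rows (hypothetical file)

# Keep only the fields the plot needs; forgetting the magnitude column is
# exactly the slip in the demo above -- without it nothing gets sized or drawn.
layer = quakes[["latitude", "longitude", "depth_km", "magnitude", "time"]].dropna()

# Earthquake magnitude is logarithmic, so an exponential scale factor makes
# the large events stand out when mapped to marker size.
layer["marker_scale"] = 2.0 ** (layer["magnitude"] - layer["magnitude"].min())

layer.sort_values("time").to_csv("wwt_quake_layer.csv", index=False)
```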
And the way you create a tour is just by deciding on a point of view, which happens to be this one. We could pick a different one. And it creates a little snapshot of the point that we're looking at. And then maybe what we might want to do is look at it from a little different angle, and it will interpolate between those two views. And we also want to pick a segment of time that we're looking at. Say that segment. And then if we set an end camera there, then we can create another slide that will be a starting point, and then we'll move the time slider a little bit more, we might change the view a little bit more here, add an end camera position to that, and then we can take a look at what we have. So I just saw something interesting in here. Maybe what I want to do is call something out, and maybe I wanted to add something here of -- let's see. I want to make a little audio note to somebody to take a look at this thing. And let's sort of preview that.

[Audio clip played] >>: Carl, can you take a look at these [inaudible] there's something interesting going on. [Audio clip stopped]

>> Curtis Wong: I just recorded that with a little digital audio recorder. And then, you know, within this space you can add anything like text, images, pictures, and all of those objects -- if I added, say, even this thing, this ring, and I wanted to do something to it, if I wanted to hyperlink that thing to a website, I could go to the HSTI [phonetic] and get that URL, copy that, paste it into here. And I'm actually going to mute myself because I don't want to hear myself again. So that's going, and I could click that and it would link me to some other relevant, related piece of information about what we were looking at. And I could actually animate this particular object just by setting a different starting and ending point and it will interpolate in between those two. So that's sort of a simple example of something that we're doing around data.

I'm going to show you a guided tour that was created by some astronomers at the Harvard Center for Astrophysics, and this one is called John Huchra's Universe, as a dedication to John Huchra, who was one of the two astronomers that worked on the original work on the large-scale structure of the universe.

[Clip played] >>: About 14 billion years after the Big Bang. It's only during the last hundred of those 14 billion years that human beings have figured out that we live in an expanding universe. When John was born, no one really knew how fast the universe was expanding, how old it was, or where the galaxies were in that stretching space.

>> Curtis Wong: So this tour is 12 minutes long. So I inserted these little [inaudible] in there as little jump points so we could sort of jump through it and I could show you some different features about this. Now, what we're doing here in the background, this is actually a path within the Sloan Digital Sky Survey, which is sort of a testament to him. So this is an example about the analogy of --

>> Narrator: Here's a 3-dimensional view of the top of Mt. Hopkins in Arizona, the site of many of John Huchra's astronomical observations. Adding altitude measurements to a two-dimensional map of this part of Arizona gives a 3D view. In the same way, adding distances inferred from galaxy redshifts to a two-dimensional map of the sky can make a 3-dimensional map of the universe, like slide 33 seen here.
One study of Markarian galaxies got John into a heated discussion in the journal Nature about whether the distribution of these galaxies was lumpy or smooth, another harbinger of things to come. After his Ph.D., John moved east to the Harvard-Smithsonian Center for Astrophysics. He never left. John and a number of his young colleagues were deeply influenced by Jim Peebles' desire to understand the origins of large-scale structure in the universe. Big surveys of galaxies seemed the right way to proceed, and John was up for the challenge. If you measure a galaxy's redshift you can infer its distance. This lets you add a third dimension to maps of the sky. On the Earth, adding altitude lets you understand mountains and canyons. Would a 3D map of the universe similarly reveal such grand structures? Soon after his arrival in Cambridge, Massachusetts, in 1976 [inaudible] who wondered whether the universe was smooth or lumpy. At that time many people had opinions, but there were few facts. To build a 3D view of the way galaxies are arranged in space, you need a spectrum for hundreds, or, more likely, thousands of galaxies in order to determine each one's redshift and, thus, its distance, and no one had ever set out to measure -- and summarizes some big redshift [inaudible].

>> Curtis Wong: It's an interactive timeline.

>> Narrator: The CfA1 was the first systematic attempt to map out a large swath of the universe in 3D. In the late 1970s and early '80s, John and his colleagues spent hundreds of hours atop Mt. Hopkins measuring hundreds [inaudible] seemed to indicate a million cubic megaparsec hole in the galaxy distribution. But nobody knew if voids like this were rare or common. The results from CfA1, published in 1982, hinted that there might be interesting large-scale structure in the universe, but they were not conclusive. A larger survey with more redshifts was needed. After CfA1 --

>> Curtis Wong: So this is the actual CfA survey plotted against the sky. Each one of these squares is that survey. In fact, you can go in and look and find a galaxy behind each one of those little squares.

>> Narrator: -- structure in the universe, but they were not conclusive. A larger survey -- two redshift surveys should employ a different strategy than CfA1. She thought that more would be learned by sampling long, thin strips on the sky than by samples over a broader region, as had been done in CfA1. The terrestrial analogy is that more can be learned about the Earth's general topography from a long, thin strip of elevation measurements stretching from coast to coast than from sampling elevations over a random patch of Earth. After all, the CfA mappers reasoned, a long strip would encounter rivers and mountain ranges and oceans, while a sample patch could turn out to be all ocean or all Iowa. John, Margaret and their students and coworkers began the CfA2 redshift survey in 1984, using the long, thin strip strategy that Margaret had proposed. The first strip was 130 degrees long and just 6 degrees wide and contained the 1100 galaxies highlighted here. A new insight was their 3D map.

>> Curtis Wong: So you remember this.

>> Narrator: It looked at a view of their first slice as if they could look at the universe from above. Here is the famous first slice from CfA2, where velocity, a proxy for distance in the Hubble flow, increases radially outward. The range from east to west, stretching for 130 out of 360 degrees around the sky, is shown as the angular coordinate from right to left.
The thin range of 6 degrees north and south is scrunched down in this diagram, which shows galaxies of the slice projected to make a two-dimensional view, as if we viewed the universe from an orientation 90 degrees away from our usual sky view. It's easier to understand this diagram if we look at it in context. So let's put it there before we dissect its meaning. Shown here again are the galaxies of the first strip as projected on the sky in two dimensions. If we color code the same first-strip galaxies in a 3D view of the universe given to us by the modern Sloan Digital Sky Survey, we can see how a strip on the sky translates to a wedge in three dimensions. In John's modest description of the first slice he says -- total for CfA2 --

>> Curtis Wong: This is 18,000 galaxies plotted against the sky.

>> Narrator: Here we show all 18,000 galaxies on the sky, color coded by their redshift --

>> Curtis Wong: So I can pause and you can sort of see that segment of the strip that they're talking about, and you can drill into any one of these. And the color code tells you about the redshift. We can go in and pull up redshifts for any one of those.

>> Narrator: The total for CfA2 reached about 18,000 in the 1990s. Here we show all 18,000 galaxies on the sky, color coded by their redshift. Red markers are farthest, and blue nearest. The results from the CfA2 redshift survey are best understood in three dimensions. Here we see the 3-dimensional locations of the same 18,000 galaxies we saw just a moment ago on the sky. The bubbles and sheets are now obvious, as is the Great Wall. The CfA redshift survey inspired many larger surveys. The Sloan Digital Sky Survey is shown here. The Sloan survey measured redshifts for hundreds of thousands of galaxies. The synthetic 3D map of the Sloan galaxies shown here in Worldwide Telescope was created by placing two-dimensional images of galaxies at the right positions in 3-dimensional space. The CfA survey, where each of the 18,000 galaxies is marked by a colored dot, was much smaller than the Sloan. But the essential features of large-scale structure and network of filaments and --

>> Curtis Wong: Okay. I want to make sure we have time for questions. So this, like all the others, is a real 3D model. There's real data behind all of those points, and I think it alludes a little bit to what the last speaker said in terms of how you might connect both the visualization and the data underneath it. Worldwide Telescope connects to a number of different data sources transparently within this application. You don't know that, you know, some of the Mars data is coming from JPL and other data is coming from other sources, and you don't really have to care. You just have to say point me to the data and you can get it. And that was part of the original reason for wanting to build Worldwide Telescope, because much of the astronomical imagery, say, for Hubble was resident at Hubble in visible light and the X-ray stuff was at Chandra and the infrared stuff was at VizieR and all these different laboratories sort of had their own different data, and there was an effort about ten years ago to try and bring all of this imagery together under something called the International Virtual Observatory Alliance. And part of that was just to have a standard so everybody could publish their data on the web and we could sort of get to it.
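(The tour's central idea, redshift as the third dimension, can be made concrete with a few lines of arithmetic. For a nearby galaxy, Hubble's law gives distance as roughly d = cz/H0, and combining that with the sky position places the galaxy in 3D space, which is essentially how a strip on the sky becomes a wedge in three dimensions. A minimal sketch follows; the Hubble constant and the sample galaxy values are assumed for illustration.)

```python
# Redshift to approximate distance via Hubble's law (valid for small z),
# then to a 3D Cartesian position from the sky coordinates.
import math

C_KM_S = 299_792.458     # speed of light, km/s
H0 = 70.0                # Hubble constant, km/s per Mpc (assumed value)

def galaxy_position(ra_deg: float, dec_deg: float, redshift: float):
    """Return (x, y, z) in megaparsecs for a low-redshift galaxy."""
    distance_mpc = C_KM_S * redshift / H0
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    x = distance_mpc * math.cos(dec) * math.cos(ra)
    y = distance_mpc * math.cos(dec) * math.sin(ra)
    z = distance_mpc * math.sin(dec)
    return x, y, z

# e.g. a galaxy at RA 195 deg, Dec +28 deg (roughly the Coma region), z = 0.023
print(galaxy_position(195.0, 28.0, 0.023))   # about 100 Mpc away
```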
And so I pushed them about six or seven years ago, saying we should create this uber environment where everything is brought together seamlessly and then connect it with the ability to author experiences as well as connect them with information. And now the IVOA essentially uses Worldwide Telescope as their de facto visualization tool. So we have a few minutes. I want to make sure we have time for questions.

>>: Thank you. I think this is -- it's a great demonstration of that kind of data linkage and such. I have a question just in terms of how people were using the data. A number of the examples you showed were probably more educational.

>> Curtis Wong: Right.

>>: Which is probably good for a more lay audience like I think we have here. Are scientists using this as a way to get access to data sets to --

>> Curtis Wong: Yeah. I mean, I can show you an example -- let me bring up -- here's a data set. There's an astronomer that did that tour about dust, and her data set -- she's posted her data sets. And if I click here -- so this is her data set overlaid atop Worldwide Telescope. And I think that's the real benefit of it: they have their data sets and they can compare their data to any other multi-spectral data from a number of different sources. And we have an API so that they can just construct their own unique viewer on a web page that they can do their own things with. There's a connection for amateur astronomers. They can take a picture of the sky, whatever it is, upload it to Flickr, and on Flickr there's a group called Astrometry.net which solves for the position of the stars, and then there's a link to Worldwide Telescope and they can see their image overlaid in Worldwide Telescope, and you can sort of cross-fade between them to look for what you can see in their image versus what's commercially available. Another question?

>>: I'm just curious about the project size in terms of how many developers have been working on this, how long have you been working on it, those kinds of --

>> Curtis Wong: Well, I've been thinking about this project for about 30 years, and there really was not the technology to do it until Jim Gray and Alex Szalay started working with the Sloan about 10 years ago. I joined them -- I did a little bit of work with them in 2002, and that was when I said, you know, just having the data isn't enough, we should build this larger thing. And they were huge supporters of it. And because it was one of these -- I wasn't in the eScience part of the organization at that time. And so it wasn't really sort of core to my job, so it sort of was this background thing. And about 2006 I had a little window of time that I could start working on this, so I hired a really good developer named Jonathan Fay, and Jonathan and I worked on it for about a year and a half. We got an intern to help us with all the data, and that was the first launch we did in 2008. We had other people help us with the website and things like that. But the core team was very, very small just because, you know, it's kind of a labor of love. We wanted to create something for education and something that we hoped would have some benefits to science, and I think some of the data visualization work that we're doing now has a lot of interesting potential, both scientific and commercial.

>>: Hi. This is a really compelling demo, and actually some of the features here remind me somewhat of the work on collaborative visualization by Jeff Heer, who's at Stanford.
In particular, some of you may know about his sense.us project. And one of the nice things there is that the system kind of directly supports communication and discussions and collaborations based on the data set, kind of anchored with respect to the data set. And it seems like this system would definitely support this type of discussion as well. Are there features built in, like for commenting and for kind of having discussions around these different views of the data?

>> Curtis Wong: We haven't done that yet. But, I mean, as you saw, Alyssa has a data set that I was just showing you. She could easily make a tour with that data set which has pointers to the data, and that could be shared with someone else. She could annotate it and send it to somebody else, and it's a very small file because it's basically XML with pointers, and then somebody else could annotate it too or send it to a group or have it just be more widely available. So there's a mechanism for it, and, you know, it's another one of our sort of many to-do things.

>>: It seems like it just would be so cool to, like, have that video that's been viewed by so many people, for instance, and then having them be able to leave particular kinds of comments or notes kind of embedded as part of that guided experience.

>> Curtis Wong: And have it as a space where you could sort of view all this sort of discussion and collaboration sort of geographically, look at a heat map of the whole sky and say where's the activity and where isn't there activity, and should there be. And, you know, Harvard is working with ADS, which is the publication sort of home for all those papers that I was showing you, to generate a heat map of publications about the sky. Because wouldn't it be great to try and figure out, you know, where is all the scholarship going and where is it not and what are we missing, in an interactive way?

>>: Let me get this row in the back, Peter, and I'll come up to you in a second.

>>: So kind of related to that, this seems like a new way to -- or a different way than the traditional method for browsing and navigating data. So have you found that this has resulted in a different type of metadata structure or semantics in support of this way of kind of tying things together through geographic location?

>> Curtis Wong: A different kind of metadata structure? Well, we were trying to use sort of conventions that exist out there already, which allows us to bring in lots of other metadata without having to do anything special. So we can bring in shape files, we can bring in 3D Studio polygons, we can bring in a whole number of different things within this environment, and you can define each one as a separate layer which has its own rendering space, and you could then plot your data into that rendering space within the larger context of another environment.

>>: It seems like with that component where you're looking at, you know, the space and let's go look at -- you're kind of querying what's out there. I didn't know if maybe having that kind of link required anything -- is going to require anything different or anything that you found.

>> Curtis Wong: Well, for the sky and the Earth there are existing sort of conventions there. We're also working on, you know, totally abstract 3D or N-D spaces that you can do visualization in, and we're going to try and do the same thing there, where we use the same conventions, because we don't want to invent too many things that we don't have to.

>>: Two things.
First, I want to note for the audience that when the astronomer started talking, the clock at the back stopped [laughter].

>> Curtis Wong: I'm sorry, what? I didn't hear that.

>>: When you started talking, the clock at the back stopped.

>> Curtis Wong: Oh [laughter].

>>: But I'm interested -- I know it's tough for someone who's been so involved in this -- is the Worldwide Telescope at a disruptive technology stage for astronomers, and if it is, are they afraid of it or welcoming it?

>> Curtis Wong: It's really interesting, because I think at the -- in the early stages a number of astronomers said, wow, that's really cool, it's for education, right? You know how that argument is. And then some of the astronomers that were using it were showing some other astronomers and saying, well, you know, I could do that, but let me show you how I do it here, and it's like a fraction of the work. So what would you rather be doing, being a computer programmer or doing science? And they go, you're right. Some of the early advocates of Worldwide Telescope, to this day, are two astronomers at Space Telescope, and ironically those two astronomers are the ones that persuaded Google to do Google Sky. But now I think they're partly fans of what we've been doing because we've sort of taken it much further. I mean, we were very passionate about sort of the accuracy of rendering. So when you're doing the sky, you can't do the sky in a Mercator projection because anything within 15 degrees of the poles is distorted. So that's a problem. So we created our own projection method that essentially has no distortion wherever you look. And so that's particularly helpful for Earth data sets if you're trying to look at what's happening climate-wise on the Earth. I mean, you saw a little bit of Earth in one of the tours. You could essentially create a tour of any data set that's in this environment. So it's quite flexible. And I didn't show you the tour that was done by a six-year-old. There was also a great tour done by a 16-year-old girl about extrasolar planets, which is particularly relevant. And she really does a fantastic tour about that.

>> Lee Dirks: Help me thank Curtis, please. Thank you.

>> Curtis Wong: Thank you. [applause]

>> Lee Dirks: Thank you very much. So our second speaker for -- our second speaker for this session is Tim Smith from CERN. And, of course, I think everyone knows what CERN is. The English name behind that acronym is the European Organization for Nuclear Research. And Tim has been gracious enough to appear at [inaudible] venues before. He's active in this area as well. He holds a Ph.D. in physics and performed research at the CERN LEP accelerator for ten years before moving to IT. He leads a group in the CERN IT department that provides services for the 10,000-strong CERN user community, covering the domains of audio/visual, conferencing, document management, print shop and copy editing, as well as the IT help desk. Of course, these responsibilities cover the burgeoning area of multimedia, and Tim oversees CERN's multimedia archive, which contains 25 terabytes of open access photos and videos. In addition to the technology of disseminating multimedia, he will also give us a view of how scientists at CERN are capturing their work in the form of visual media. Help me welcome Tim Smith. Thank you. [applause]

>> Tim Smith: So thanks very much for inviting me to speak.
What I'm going to talk about is in fact the work of many other people, so I'd like to first of all acknowledge that it's the physicists around the world who work for CERN, at CERN, and with CERN whose work I'm going to present here. Multimedia means so many different things --

>>: [inaudible].

>> Tim Smith: Yeah. I can move it up if you can't hear. The scope of multimedia has expanded so much over recent years that it covers so many diverse things that instead of going into any one in particular, I'm going to skim across the top. So I hope it isn't too light for you. I'm going to start with talking about how multimedia is used in communications, communications to the public, which in general is slightly different from the way that we use multimedia for the scientists themselves. But when I get on to that I'll talk just a tiny little bit about the supporting technologies behind it.

So starting with the public, and starting with the weird and wonderful, physicists around the world have come up with different ways to engage the public and try and make what they're doing inside the buildings interesting to the public outside. So here is one idea from the Danish physicists, who used 96 LED projectors to project onto their physics institute, in realtime, events as they were happening in the collider. The tracks go across their institute just to try and attract the attention of people walking by and get them to come in and ask what on earth this is all about. That's called the Colliderscope. A group in London took the reconstructed events, the raw data, took the tracks and the identified particles and fed them through music composition software that generated music from the events. So this way you can not only hear about the Higgs particle when we find it, you can hear the Higgs particle.

>>: What's it going to sound like?

>> Tim Smith: Hmm [laughter]. And then how do you attract the younger generation? Well, gaming is the only way to attract them. So what we did was we made a little CERNland game. You can't actually see this very well. Can I dim? Here we go. So the idea here being to try and attract their attention by making it a game, but nonetheless, have all the correct elements, quarks and gluons and how to construct protons and neutrons out of their elements, so that they sort of absorb the physics and perhaps get fascinated by it just by playing around with it.

But once we've got the attention of the public, they obviously want more information. They want to explore what we're doing. In recent years we've been called the cathedrals of science, and you want to walk around a cathedral. But the problem is it's a hundred meters underground, and not only is it underground and in a big cavern, but the experiment itself is massive, and it completely fills the cavern. So even if you get to go down there, which is unfortunately an extremely rare chance, you can hardly get a perspective of the whole thing, and you can only go on a tiny little tour of it. So what we wanted to do is give them some way of exploring it without having to go down. And the solution was we painted it on the outside of the buildings. Actually, that's not what we did -- it is what we did, but it's not the solution. So we painted it to real scale so that we could actually give an impression for the people that come to visit and can't actually go down.
But what we did is we started to create a virtual tour, an interactive video, so you could actually explore it without actually going down. So what we did here is we made a navigation graph of the entire cavern, where you can walk on the [inaudible] and the walkways, where the viewing points might be useful, and then we got some experts to come in and take 360-degree panoramic photography at each of these intersection points on the graph. We coded all this up in Flash, and then, between all the connector points, we made videos so that we could put all this together into a virtual tour. So, again, it doesn't show very well. So I'll risk doing this online. So here is the result. This means that you can now, at any given point, start rotating around, zooming in, zooming out, the normal way of sort of a tour, and then if you go round to the edge and you start saying, oh, I want to walk along there, then it actually goes and walks you along the video, along the connectors, to the next point, and when you're at a decision point you can interact with the Flash application that puts it all together. So at this point here, I walked in to the left and I can choose which floor I want to go on and then I'll come out somewhere else. So this gives people who can't go down at least an impression of what it's all about.

And once they've got an impression, they want to know more about it. What are these experiments, what's going on in the apparatus. So what we try and do here is we make illustrative videos. In the top right-hand corner there is an illustration of the experimental apparatus itself. Perhaps I'll just leave the lights down because these are all rather dark. So what we have here, we've coded into an animation package the outer shells of the apparatus, and then we do a cut-away so that we can go in. So if only I'd known about the software that was described this morning, I'd have done this differently. But it cuts away, and then you can see what's actually happening in the heart of the experiment. And underneath you can also show by analogy what a magnetic trap looks like by portraying it as sort of a hill, in terms of a gravitational analogy. Other things that we can do with the illustrative videos: I'll explain how the accelerators themselves work, how we collect particles, how the electric and magnetic fields work. So these are very useful tools, but they have their limitations.

So when we get on to explaining the processes that are under study -- in this case here, this was an anti-hydrogen trap. And what happens is when the anti-hydrogen finally leaves the trap and hits the outer wall, it annihilates and you can see the decay products. So this is fine for a relatively simple experiment like this. You can recode it in animation software and get something useful for the public. But when it comes to the bigger experiments, when there are millions of detector elements and tens of thousands of tracks coming out, we don't want to go and code all that in just for the sake of illustration. So instead we have to take a different approach, and there we start from what the physicists themselves use. So the physicist himself wants an accurate representation so he can visualize what's happening in each of the events. So for any given event we have reconstruction software that takes the hundred million channels of readout and allows you to visualize it in various different ways.
You can actually -- the reconstruction program itself joins all the dots to make tracks and shows you where the energy depositions are, so you can actually see what's happening in the event in the center, and the physicist can then rotate it around in 3D, take slices through it or make projections, so energy projection plots in theta-phi or theta-[inaudible]. So this is what the physicist wants. And it's a very powerful tool. But nonetheless, this doesn't actually convey enough information to a non-expert. It's actually coded in -- the whole reconstruction is coded in C++, so we have converters, to make these event displays work, that actually extract the information into XML, which a Java visualizer can then work on. Now, that intermediate step, the XML, we can use to feed into other software, a visualization software, and make it useful or understandable to the public. So the usual stuff that goes on in something like 3D Studio Max: when you convert, you add texture, and then we can basically make a video replay of the event as it happened, slowed down, so we can explain to the public what's actually happening in the center there. Let me just get rid of the -- so here, pointing to our digital library, here is the video that comes out of this. So this is taken from real data, a real collision. We haven't had to code it back in. We've just converted the information into something that the renderer can actually work on. So you can see the collision and the particles coming out. Sorry, this is realtime from CERN over wireless, so I'm asking a bit much. And then you can see the energy depositions in the calorimeters from the outside, and then we just rotate it around. So this has the limitations of video that were just mentioned, but nonetheless, it allows us to explain our science a lot more easily than using one of the complex programs the physicists themselves use.

>>: [inaudible].

>> Tim Smith: So once we've explained what's going on in the apparatus, the next thing that we want to get them to understand is the theories that lie behind it. What are we trying to study and why. And it's a lot more challenging for us to try and explain some of these concepts that the theorists are coming up with: hidden dimensions, parallel universes, strings. I put anti-matter there because most people think it's a theory rather than something that we actually create and use in the lab. So what we happily rely on here is the creative genius of authors around the world like Philip Pullman and Dan Brown, and Hollywood, to explain exactly what these theories are all about. So the relevance here is that when we discussed this with Sony and [inaudible], they agreed that if they were allowed to film in our underground chambers, then we could put our multimedia material along with the video. So actually if you look at the Blu-ray disc of Angels and Demons, you get the whole multimedia pack from CERN explaining the science that is in fact behind Angels and Demons, and we created our own special ambigram for that. So that's great. We got the attention of the public. But, of course, these authors don't always portray it in the best light. So it creates a lot of fear as well. So people were worried whether or not we were actually creating earth-eating black holes, so we had to work out a way of actually allaying their fears and giving them sufficient information.
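(Returning for a moment to the event-display pipeline described above -- C++ reconstruction, an XML intermediate, then a Java visualizer or an offline renderer -- here is a rough sketch of what writing such an intermediate file might look like. The element and attribute names and the sample values are invented for illustration; CERN's actual converters define their own schema.)

```python
# Sketch: write reconstructed tracks and calorimeter deposits out as XML so
# a separate visualizer or renderer can consume them without touching the
# C++ reconstruction code. All names and numbers here are illustrative.
import xml.etree.ElementTree as ET

event = {
    "run": 152166, "event": 316199,
    "tracks": [
        {"pt_gev": 12.4, "charge": -1, "points": [(0.0, 0.0, 0.0), (0.3, 0.1, 1.2)]},
    ],
    "calo_deposits": [{"eta": 0.42, "phi": 1.57, "energy_gev": 35.2}],
}

root = ET.Element("event", run=str(event["run"]), number=str(event["event"]))
for trk in event["tracks"]:
    t = ET.SubElement(root, "track", pt=str(trk["pt_gev"]), charge=str(trk["charge"]))
    for x, y, z in trk["points"]:
        ET.SubElement(t, "point", x=str(x), y=str(y), z=str(z))
for dep in event["calo_deposits"]:
    ET.SubElement(root, "deposit", eta=str(dep["eta"]),
                  phi=str(dep["phi"]), energy=str(dep["energy_gev"]))

ET.ElementTree(root).write("event_316199.xml")
```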
So what we decided is we would -- the best way to do science is really to do it in the open, with everyone watching, at the time we do it. So when we were switching on we decided to open up the lab and have a webcast so people could really see what we were doing, and they could have links to all the supplementary material at the same time. So this worked extremely well. The public were interested, and we got a million watchers of the webcast, which seems to have been the biggest scientific webcast to date. And at that time, when they started hitting our network, they were asking for more and more multimedia material for these explanations and things like that. So that put a huge stress on the infrastructure underneath, because they want, from one place, to start navigating out to all the extra information -- exactly what we were talking about, these extended publications. They want to go through our entire digital library, especially the multimedia part of it. So this -- that was the webcast itself. This put enormous challenges on our digital library.

So just another side of multimedia is how you serve it to the outside world. So prior to the multimedia library we just had basically one instance of -- this is the CERN digital library, CDS -- which was a front end and a back end, but it was just one simple item. So when multimedia comes into the game, unfortunately, then you have to have streaming servers that can stream the content. You have to have transcoders that can translate it into all sorts of different formats, for the different browsers that different usage patterns want to use. You have to have servers that can serve the high resolution content securely, and servers that can orchestrate this whole thing. And not just a few, but many, many, many. So we turned from one server into a farm of 30 or 40 servers just to serve multimedia content. So the flip side of having interest and having content is you have to manage it in a completely different way. So given the location we're in, I just wanted to point out that we achieve all this by actually using virtualization, and these are all based on Hyper-V virtual machines, and we can just instantiate them as needed if there's more transcoding or more streaming needing to be done underneath there.

So what do the physicists want? Sort of going on from the public side, the physicists want a similar thing. They want to see streaming media, but they want to see it internally, on site. They want to see at any given time what's happening in all the other lecture theatres, and they don't want to be sitting in front of all of them. So we've set up this system whereby basically any lecture anywhere on site can be streamed through the webcast server. Just a comment here. We find that rather than just filming one part of it -- I think it was John who was showing this yesterday as well -- what our scientists want is in fact to see exactly what's being pointed at, so we capture the VGA, and they don't particularly want to see what the lecturer is waving his hands at. Especially when it happens to be in a darkened room and you can't see anything. So, hence, the relative sizes. But once they've watched it live they want to go back and either download it or jump to somewhere and look at something specific in there. So what they want from the same webcast is afterwards to have a web lecture which they can then use and jump around in.
So to be able to do this we capture the video, the audio, the VGA signal, even the whiteboards the theorists still use, and we try and put that together into a lecture object. We analyze it, again, like was described yesterday, so that we can detect all the slide transitions and we can make chapters. We can make it into an interesting object for the scientists to use. So this is what it then looks like: they can download this, they can use it, they can see it on the web. This is a big Flash object where you can jump forwards, next slide, back a slide, and you can see the slides and the speaker, or you can jump anywhere in it. Because we've done the slide transitions, you can jump along and you can just look at any -- the video syncs up and you can just look at what the lecturer is saying at any given point. So this is one type of multimedia -- the stuff that's more like what the public are after.

But the other side of it, which was discussed this morning I think by Michael, is they want all sorts of other information which is vaguely classed as multimedia information, which supports the science that's going on. So along with the publication, they want the data sets, they want the supplementary plots, anything that didn't actually make the publication but supports it. They want access to all of this stuff and, of course, in a linked way, also to the presentations and videos that I mentioned. So this means that in our digital library we're starting to get more and more tabs coming along here with the extra pieces of information that they want. And as was mentioned this morning, they want to be able to add their data set there such that someone else can analyze it. So you can click on the data set and it will bring up the analysis program that all the physicists use, load up the data and then just allow you to revalidate what was being discussed in the paper.

Now, one little aspect I'd like to bring up here that wasn't mentioned this morning is that this is all great as a new tool for new people that are thinking of new ways of using it. But to attract the people who have one way of doing things, that are used to things, we have to actually show them that it's useful. And there's this sort of chicken-and-egg problem that if they're the first one to put it in and they're not seeing anyone else using it, then they don't really want to put it in and be the first. So rather than looking at just the stuff that's being contributed, we decided to go a different way through it and see what we could mine out and put in there anyway, to show the value of having it there in the first place. So I'll just take a couple of slides on that. So the first thing that we tried to do is to mine out all of the plots. So from the zip files that are contributed, or the PDFs, or even the raw TeX files, we go through with analysis programs, start extracting all the information and putting them out as separate objects behind all of the publications, automatically. So this already was starting to be extremely useful, because then the physicists see that they don't have to measure things on paper anymore; they can get to the original objects. So then what we did was we started extracting the captions as well, which means that this is much more useful information, because then you can start doing searches on it, such as on the full text of a caption, where the plot discussed a certain particular particle. And you can really home in very quickly on the graphs that you're interested in. But this takes quite a lot of processing power.
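(A minimal sketch of the caption-mining step just described: pull the text of every \caption{...} out of a contributed TeX source so the captions can be indexed and searched alongside the extracted plots. The file name is hypothetical, and a real pipeline would need a proper TeX parser, since captions often contain nested braces that the simple pattern below does not handle.)

```python
# Extract figure captions from a contributed TeX source for indexing.
import re

def extract_captions(tex_source: str) -> list[str]:
    """Return the caption text of every figure with a brace-free caption."""
    return re.findall(r"\\caption\{([^{}]*)\}", tex_source)

with open("paper.tex", encoding="utf-8") as fh:
    captions = extract_captions(fh.read())

for i, caption in enumerate(captions, start=1):
    print(f"Figure {i}: {caption}")
```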
So the next logical step from that was: once you've got objects like this which are not textual, you want to do searching not just in the classic way, on the caption, but actually on the image itself. So there we started working with the University of Geneva on some of the tools that they were developing for medical imaging, and we started to see whether this could actually be useful in our domain, whether or not you can use these tools, which actually look for the color distributions and the different shapes and surfaces, and whether they could actually be put to use on our data. So this is from the lab. This is not in production yet. But it allows you just to click, you know, I want something similar to this or not like that, and then it would do a search through the entire multimedia database based on this, the tool having already indexed all these features.

So another thing I think I've shown you sometime in the past already, another way of visualizing things -- this is done already in many different ways, but it's very fun. Once we're starting to analyze all the papers, the TeX and the PDF and things, you can extract all sorts of different pieces of information out of them automatically. So here we've extracted all the references, then we do the citation graph, the citation network and the co-citation network, and then we can start plotting things so they visually speak all the quicker to people. So, for instance, a new person, a new researcher, can see that this was a very hot topic in the 1970s here and then died off. Some papers lose their interest completely. The subject is no longer interesting. This one is obviously a really seminal work because it stood the test of time. And they can look for just shapes in the plots here. Like this one, it's obviously something that's really hot, that's taking off, and it allows the young researcher to focus more on the type of articles they're interested in, because you can search for things that were interesting, are interesting, are taking off and things like that, just by the shapes that you can see through these visual patterns. I won't talk much on the tag clouds and other things that are normal now in the social tools that we're trying to bring back to the science community as well.

So all of this, of course, needs a lot of analysis power. So what we have to do, instead of just having a digital library, is the digital library now has to be able to dispatch all of these mining tasks. So we parallelize it, dispatch it, and then collect and reintegrate the results, merging the mined indexes with the classic text indexing. So we did this on the grid originally. We made connectors out to our grid that the physicists use for the analysis, and we're now actually doing it out in the cloud instead. I won't talk about all the other things because I probably haven't got time.

So just moving on, one more thing to mention is visualization. Something we tried ten years ago was to see what virtual reality could do for us to help with the assembly. So the LHC was constructed by institutes around the world, and they had to validate that what was being made in these different places would actually fit together, and it's a 27-kilometer-long accelerator, so it was quite a challenge. There were 142 different CAD systems being used around the world, and they had to integrate the information together in order to validate it. So what they did was they made a virtual reality lab.
This was ten years ago, so these were massive goggles, with your hands on balls, and you could navigate through and just see whether or not a cooling pipe would go through an electrical supply or something like that. So this was fun and useful, but you had to be at CERN in that virtual reality lab, so it didn't get much use. So when we came to describing or assembling the detectors later on, we tried another technique, something that anybody could view from anywhere, which is to take all the diagrams and to try and work through a sequence, a time sequence, of how we'd actually put them together, and then we could just run it in front of people and see if that fit what they thought they were going to be doing, what they were responsible for, and to make sure that it works. So each of these is an animation video taken from the CAD output, again with texturing and surface coloring and things like that, and then we just animate it with a timeline to make sure that things insert as they're expected to and there aren't any clashes. So this turned out to be quite a useful tool for the assembly, and it was as useful afterwards for the public to understand what makes up a detector and how it goes together.

So one last visualization thing, to do with the grid. The physicists distribute the data all around to the participating institutes, because we haven't got enough compute power or even data storage to have everything at CERN in multiple copies. So we distribute jobs around the world, dispatch them, and then bring the results back together across the grid. So we have a visualization tool that shows how these jobs and the data get distributed to the 350 centers around the world that are actually participating in this grid. Let's just see if this works. So then you can see, realtime, the distribution of jobs. It's just pulsating because -- to give a visual impact of how big each of the centers is. But you can see the data transfers and the job transfers, meant to be realtime, around the grid. So that's about it for the summary of the different techniques that we use. I just want to finish and ask if there are any questions.

>> Lee Dirks: Any questions for Tim?

>>: Tim, wearing a taxpayer's hat, as I understand it, the amount of running time for the LHC to prove the Higgs boson is a critical feature. How do you try and convey that you've not yet got statistical certainty on a particular particle? So if you combine the Worldwide Telescope type of approach, then you would have to have some slider bar for speeding up, showing the accumulation of data, and that would, you know, convey to the general public, you know, that the machine needs to keep running.

>> Tim Smith: The way we generally do it is -- because it's a statistical analysis and you have to show a signal over a massive background, we generally do it just by literally showing what we have accumulated to date in the distributions, and we just show that you can't possibly see anything yet with that statistical accuracy. And then with the modeling of the detector over 20 years, we can show how the error bars go down. The problem is all of this is statistical, and error bars and things like that -- it's hard to convey that sort of stuff to the public.

>> Lee Dirks: Other questions? All right. Thank you very much, Tim. [applause]