>> Brian Hitson: And thank you, many of you, for coming from far and wide to this one-day workshop hosted most hospitably by Microsoft, and also by the International Council for Scientific and Technical Information. This workshop is Multimedia and Visualization Innovation for Science. And I think it's relevant to the work of science, but in many cases to avenues beyond science as well. We have assembled a world-class lineup of speakers this morning and this afternoon, and we really appreciate the speakers' willingness to come and share their work with us. So I want to extend a special thanks to them. We have a couple of introductions for welcoming purposes and remarks from Microsoft and from ICSTI. And the first of those that I would like to introduce you to is Ms. Roberta Shaffer. She is the law librarian at the Library of Congress, and a lawyer herself, and she was most recently elected president of ICSTI. So Roberta will offer a few opening remarks and a welcome, and then we'll follow up with Tony Hey from Microsoft. Roberta. >> Roberta Shaffer: Thank you, Brian. And good morning to everyone. Good morning. I need to see all that oxygen flowing from head to toe. It's my pleasure to welcome you as the president of ICSTI and to also have this opportunity to extend our thanks to Microsoft, and in particular to Lee Dirks, who is finally getting breakfast. He's probably been up since 2 a.m. But he's in the back, and he has been a phenomenal source of all kinds of support in making this day, and the few days that some of us have spent here at Microsoft, possible. And so we thank you, Lee, from the bottoms of our hearts and from the tops of our heads. ICSTI is, we believe, a unique organization, because we bring together people who have an interest in the information society and knowledge economies and knowledge ecologies from a variety of nations, a variety of sectors, and a variety of disciplines.
And we believe that really makes us unique in the professional association space. And so for those of you who are not members, we are inviting you to join us today to see how this all works, with a showcase of a phenomenal workshop. And we believe that at the end of the day, those of you who are members will see that the money you are spending to travel with us and to be members is well worth it. And for those of you who are not, rather than pouting, we will give you the opportunity to join, because we welcome everybody. One of the benefits that ICSTI also offers is partnerships with a variety of other associations that share our interests. And I wanted to take this opportunity to let you know that NFAIS, one of our partners, is having a conference at the end of this month, February 27th to March 1st. For those of you who have forgotten, February is a short month, so it's not quite as long a conference as it appears to be. And it has a rather intimidating and exciting title relating to Taming the Information Tsunami. So if you see the tsunami coming on the horizon, then perhaps you'll want to be in Philadelphia at the end of February to find out how to protect yourself or how to exploit the tsunami. I also want to remind you that we will be having our annual conference in Beijing this year in early June, and it's never too early to plan to attend. So please begin to think about that. We have information about the program that will shortly be on our website. But we're happy to talk to you today about the program and about what we think the outcomes of it will be. So please, please keep that at the front of your mind. Before I turn the podium over to Tony Hey, who is co-hosting this day with us, I want to express my particular appreciation to Brian Hitson, who introduced me this morning and started the day. It's really Brian's portfolio and intellectual muscle that have enabled us to gather here today for what I think will be a phenomenal workshop.
So please join me in expressing appreciation to Brian for all that he has done. [applause]. >> Roberta Shaffer: And then without further ado, I will turn the podium back over to Brian so he can introduce our fabulous Tony Hey. Thank you. >> Brian Hitson: And so can you hear me with this mic? Oh, there, it just comes on. I especially appreciate that acknowledgment from Roberta, but we also had a lot of help from the program committee that put this together, which, in addition to Lee Dirks, consisted of the ICSTI executive director, Tony Llewellyn -- if you'd just wave, Tony -- and, from the French information organization Inist, Herbert Grudemeyer [phonetic]. And myself. I'm from the US Department of Energy's Office of Scientific and Technical Information. So we've enjoyed doing this. We had several other players who helped contribute to this. I was on extended travel myself for some time, and I had special support from one of my staff members, Lorrie Johnson, so special accolades to her for putting this program together. So thanks to everyone. And now for Dr. Tony Hey. He has been a tremendous supporter of the objectives of ICSTI for several years now, since Microsoft's membership in ICSTI began. And of course Lee Dirks is his point person who interacts with us most frequently. But it's Tony's fundamental support for these objectives that enables this kind of collaboration and our ability to do it in a venue like this. He's a tremendous supporter not only of ICSTI but also of WorldWideScience, which is really a sister, inseparable element of ICSTI's program. WorldWideScience.org is a federated search engine that searches across 70-plus databases around the world simultaneously, reaching lots of scientific literature that commercial search engines cannot reach. And with Tony's team that works on multilingual translation, we've added translation to this technology as well. So he's been instrumental in offering support to that.
Tony is the corporate vice president in Microsoft Research for the Connections team. And in that position, he has responsibility for university relations and other forms of external collaboration. And I would just like to welcome him to welcome us. Tony. [applause]. >> Tony Hey: Thank you very much, Brian. And thanks to Brian and Walter for introducing me to ICSTI. It's been a very exciting time, and I'm very pleased to be involved with ICSTI. Just a bit of background on my interest in scientific data. In the UK, from 2001 to 2005, I ran the UK equivalent of the Cyberinfrastructure Initiative in the US, which was all about, if you like, multidisciplinary science -- we called it e-science. And it became clear to me that data collections are of vital importance, and that's why I really welcome the great contribution of WorldWideScience and ICSTI's efforts in this space. So it's great to be here. You are seeing one of the first public showings of our new branding. We used to be called Microsoft External Research, as opposed to internal research. You're in Microsoft Research's building on the Redmond campus. Microsoft Research has about 800 PhD computer scientists around the world, and that's about one percent of Microsoft's total employees. Microsoft Research is here to give Microsoft agility -- to understand what's happening, to understand trends. And that's why it's really great for us to be participating in this workshop, because we also need to understand what's happening out there, and having a collaboration with ICSTI makes that much easier. So I should just warn the speakers that the clock at the back of the room is not working. Lee, will you fix that, please? You need to fix that. I don't know why it's wrong, actually. Anyway, it's great to be here. And it's great to see you all here. I hope you have a great time. And the weather today looks like typical Seattle, the sun and so on.
So I hope you'll enjoy it. And I hope you have a great meeting. We'll be around all day, Lee and I. So if you have any issues, come and find us, and we'll do our best. Thanks very much for coming, and I hope you have a great meeting. Thanks a lot. [applause]. >> Brian Hitson: Thank you, Tony. So as you can see on your program, we have this workshop organized into essentially four sessions, two in the morning and two in the afternoon. I'll be the chair for the morning sessions, and Lee Dirks will chair the afternoon sessions. And we'll try to give you a break between each of the two morning and afternoon sessions to stretch your legs and refresh. But we're going to have sort of military precision here in terms of trying to keep to the timetable, and because the clock is not working, if the speakers will occasionally gaze at me -- I'll be over in this area -- I may be giving you a sign as to how many minutes you might have. Typically we're going to try to keep the presentations themselves in the range of 25 minutes or so, and that would allow a few minutes for Q and A and then a transition to the next speaker and so forth. So that's what we'll generally try to stick to as far as the schedule goes. On the first session this morning -- and these are not hard-walled distinctions between the sessions, because, for example, in the first session the emphasis is on the interactive nature of the multimedia and visualizations, but that's not to say that subsequent speakers won't have interactive aspects to theirs as well -- in particular, in these first three presentations there'll be a heavy emphasis on interactivity. So with that, let me introduce our first speaker. He is Will Li, who is a scientist in the Creative Technologies Lab at Adobe Systems.
He earned his PhD in 2007 in computer science at the University of Washington, where his graduate work focused on new interactive visualization techniques that help users understand and explore complex 3D objects with many constituent parts -- for example, CAD models and anatomical datasets. I've seen a preview of his presentation, and it's an amazing depiction of 3D visualization of complex machinery, taking machines apart and showing how they work. It has high potential for training purposes and engineering purposes, and also in the area of human anatomy, so it has medical applications as well. So I think you'll be impressed and intrigued by the potential of his work to accelerate understanding and comprehension of science and technology. The title of his presentation is Explaining Complex 3D Objects With Interactive Illustrations. So I'd like to introduce you to Will Li. [applause]. >> Wilmot Li: I've got a clock here too, so I'm going to keep an eye on it. Great. Thank you for the introduction, Brian. And also thanks for inviting me here. I'm excited to be here, and I'm also very excited to be speaking first so I can fully enjoy the rest of the talks today. [laughter]. So as Brian mentioned, I did my PhD at the University of Washington. And, in fact, I spent many summers and extended internships here at Microsoft Research. So it was kind of familiar heading over here -- maybe a little too familiar, actually. I turned into the old Building 112/113 parking lot. That's why I was a little bit late. So I wanted to talk today about this problem of conveying important characteristics of complex 3D objects. Let me start by just clarifying what I actually mean by a complex object. In particular, I'm referring to 3D objects that have complex internal structure. Now, for many objects the structure is defined by an arrangement of many internal parts.
So, for example, mechanical assemblies like this turbine on the top left are typically composed of many individual components. And the same could also be said for human anatomy; in some sense we are just a system of many internal parts. Now, for other objects, this internal structure can be defined more by complex geometric features rather than separate parts. So, for example, on the right you see an image of a well-known mathematical surface called Boy's Surface. This surface twists and turns and folds back on itself, creating lots of internal self-intersections, and these are what define its complex structure. Now, as you might imagine, these types of objects arise in a wide variety of different domains. And two of the more important types of information that we might want or need to convey about these objects are spatial and functional. Spatial information refers to the relative positions and orientations of parts. So, for example, an airplane mechanic might need to understand where all the parts in that turbine fit together or how they relate to each other spatially. A medical student might need to understand how all the muscles in the neck are layered with respect to one another. So here, for spatial information, the critical thing to understand is the spatial relationships between different parts. Now, functional information is often a little bit more domain specific. But it often has to do with how parts interact with each other in order to achieve some kind of functional goal. For example, in a human, organs, muscles, [inaudible] often work together to perform some kind of function. And in mechanical assemblies the parts often move and interact in conjunction to achieve some kind of mechanical goal. So these are two important types of information that we often want to convey.
The problem is that actually communicating this information effectively for complex objects is often quite challenging. So just as an example, let's say we wanted to understand a bit more about the spatial relationships between parts inside of this turbine model. Well, a semi-transparent view like this helps somewhat. Here at least we can see some of the internal structure. But there are still lots of occlusions and partial occlusions that make it difficult to understand how all these parts fit together. As an alternative, I could just render all the parts separately. So here we can clearly see all these different parts, but what we've lost, or what is more difficult, is to understand how they actually fit back together in this object. Now, these two things are clearly not the most sophisticated visualization techniques we could use to try to convey this information. But I just wanted to point out that some of the simplest approaches we might think of are not always sufficient. And in particular there are three kind of high-level challenges that often arise when trying to convey spatial and functional information in complex 3D objects. The first is that, since many of these objects contain lots of parts, occlusions between these parts can end up hiding the parts that may be of most interest to the viewer. In addition, because there are lots of parts, visual clutter can sometimes make it difficult for the viewer to really focus in on and distinguish the parts that are more interesting from those that are less interesting. And finally, in the case of functional visualizations, the simultaneous motions and interactions between parts can often be difficult for viewers to understand. Now, luckily, illustrators in science and technology have come up with a number of really effective techniques for conveying spatial and functional information in ways that actually address these challenges. And today I'll focus on three such techniques.
Cutaway views, where portions of occluding geometry are partially removed in order to expose some of the underlying structure. Exploded views, where parts are separated in order to reduce or eliminate occlusions. And finally, what I'll call how-things-work illustrations. These illustrations use visual techniques such as motion arrows, as well as static sequences of frames, in order to convey how the parts in a mechanical assembly move and interact in conjunction to achieve some kind of mechanical goal. For the rest of the talk I'll describe a few interactive visualization techniques we've developed that are inspired by these different illustration techniques. And I'll start with cutaways. So this first project is work that we presented at SIGGRAPH a few years ago, and it focuses on interactive cutaways. And just to provide a little bit of context, here is the type of cutting interaction that is available in existing 3D tools. So here the user can slide a cutting plane interactively through the model in order to expose some of its internal structure. Now, this technique does technically expose the internals of this object. And if I was looking for a particular target part, let's say this off-axis gear, I can now see it. But a cross-section like this does not provide a lot of spatial context. I would argue that for many viewers it would be really difficult to tell from an image like this exactly how this gear is positioned and oriented with respect to its surrounding parts. So in contrast, here is a video just showing one of the ways in which users can interact with our system. The user here selects parts of interest from this list of parts on the bottom, and the system automatically generates a cutaway that exposes those parts. Now, I want to point out a couple things about this visualization. First of all, we're not just using a single cutting plane to expose the parts of interest.
Here the system determines the size, the shape, and the position of cuts in order to not only expose the parts of interest but also, hopefully, do so in a way that helps the viewer understand how those target parts relate to the surrounding parts. So I'll just continue letting this video play a little bit. And as you can see, when the user changes the target parts of interest, the system smoothly animates the cuts, and in some cases the viewpoint, in order to expose those parts. So let me say a little bit more about the design of that system. In order to determine how to generate effective cutaways, the first thing we did was to analyze a large corpus of hand-designed cutaway illustrations. And by doing so, we identified a number of conventions and design principles that could help inform our system. And probably the most important convention we came across has to do with the types of cuts that illustrators use. It sounds somewhat obvious, but for different types of objects, illustrators tend to use different types of cuts. So, for example, for rectilinear, kind of boxy objects, illustrators use object-aligned cutting planes to form what we call box cuts. And for thin shells like skin or the chassis of a car, illustrators use what we call window cuts, whose boundaries seem to often correspond to smooth geodesic paths on the surface. For radially symmetric parts, illustrators use wedge cuts that are centered around the axis of radial symmetry. And finally, for tubular objects, illustrators often use transverse cutting planes that remain perpendicular to the tube's main axis. Now, one possible rationale for this convention is that by using different shapes of cuts for different types of objects, it helps to emphasize the shape of the object being cut. And in so doing, the resulting cutaway, we believe, makes it easier for viewers to mentally reconstruct the missing geometry.
So how do we incorporate this convention into our system? Well, the main challenge in our approach was to figure out some way of interactively or automatically generating these four different types of cuts. And the approach we used was to specify a parameterization, one for each part in the model, that corresponds to these different types of cuts and lets us easily generate cutaway views. So, for example, for box cuts we use a three-dimensional parameterization, which is visualized here, and these three parameters correspond to the three object-aligned axes of the object. In this visualization and the ones that follow, the purple boxes represent the cutting volume, shown both in this parameter space on the left as well as in model space on the right, and these pink boxes correspond to the maximum possible extents of that cutting volume. Okay. So what does this parameterization help us with? Well, given these three parameters, we can now easily specify different types of box cuts just by setting these six parameters for the dimensions. Now, box cuts aren't particularly interesting; they're not that different from standard cutting planes. So let's take a look at some of the other types of cuts. For wedge cuts we also specify a three-dimensional parameterization that corresponds to the angle, the length, and the depth of the wedge. For tube cuts we specify a one-dimensional parameterization that simply defines the positions of the two transverse cutting planes along the tube. And finally, we also have two different types of window-cut parameterizations, but in the interest of time I'll just skip over these. So what I'm not going to get into today are the details of the algorithms we developed to automatically specify these parameterizations for individual objects.
I'll just say, at a high level, we use a variety of geometric analysis techniques to automatically assign these parameterizations to each part inside of an object, so that the user does not have to do it manually. Okay. So I just wanted to show a couple more examples from the system. Here is an example model of a disk brake. This dialog box on the right just lists all of the different parts inside of this object, and the user can double-click on one of them to specify it as a target part, and the system will generate this cutaway automatically. The user can also specify multiple target parts; that's not a problem. And in some cases the system will decide to pick a better viewpoint to better expose these parts. This next example is a human anatomy example, and here the user selects two different parts, the thyroid gland and a neck muscle. And the system automatically generates this cutaway showing these parts. Here again is the turbine example. And I'll just point out that in this case, as part of the authoring process, the user has grouped sets of parts into subassemblies, which are denoted here as part groups. So the user can just cycle through these part groups, and the system will automatically expose all of the parts within each group. So far I've just shown a fairly automatic way of using our system, which is to just select target parts of interest. But the user can also specify cuts in a more interactive manner. So, for instance, here the user draws a scribble to open up a wedge cut and then can change the size and position of this cut using constrained direct manipulation. Here is one final example of a more complicated model. And this is played back at two times speed. Okay.
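To make the parameterized-cut idea concrete, here is a minimal Python sketch. This is not the speaker's implementation; the class name, the normalized [0, 1] coordinates, and the example values are assumptions for illustration. The point is that a box cut reduces to six numbers -- (min, max) extents along the three object-aligned axes -- and deciding whether a piece of occluding geometry falls inside the cutting volume is a simple range check. Wedge cuts (angle, length, depth) and tube cuts (two positions along the tube axis) would be analogous small parameter vectors.

```python
from dataclasses import dataclass

# Hypothetical sketch of a parameterized box cut: (min, max) extents along
# the three object-aligned axes, normalized to [0, 1] within the maximum
# cutting volume (the "pink box" from the talk).

@dataclass
class BoxCut:
    x: tuple = (0.0, 1.0)  # extents along the first object-aligned axis
    y: tuple = (0.0, 1.0)
    z: tuple = (0.0, 1.0)

    def contains(self, p):
        """True if the normalized point p lies inside the cutting volume,
        i.e. this bit of occluding geometry would be cut away."""
        (x0, x1), (y0, y1), (z0, z1) = self.x, self.y, self.z
        px, py, pz = p
        return x0 <= px <= x1 and y0 <= py <= y1 and z0 <= pz <= z1

# Setting the six parameters specifies one concrete box cut:
cut = BoxCut(x=(0.5, 1.0), y=(0.0, 1.0), z=(0.0, 0.5))
print(cut.contains((0.75, 0.5, 0.25)))  # True: inside the cut volume
print(cut.contains((0.25, 0.5, 0.25)))  # False: outside, geometry is kept
```

Interactively dragging a cut open then just means animating these parameter values within their allowed extents.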
So just to summarize some of the contributions of this work: one of the really important parts of this project was just identifying and distilling this set of cutting conventions that allow us to create effective cutaway illustrations. From a technical perspective, the main contribution is the parameterized representation of cuts that I described. And finally, we presented authoring and viewing interfaces for working with our interactive cutaways. Bless you. So I've talked a little bit about cutaways here. And cutaways are especially good at exposing target parts, or parts of interest, in situ with respect to their spatial surroundings. But they're less good at showing the overall arrangement of parts within an object. And for that goal, illustrators tend to create exploded views. So let me talk a little bit about some work we've done in this area. This is a similar project to the cutaways project, which we also presented at SIGGRAPH a couple of years ago, but it focuses on interactive exploded views rather than cutaways. And just like with the cutaways -- excuse me -- it was really important for us to identify the key conventions or design principles for creating effective exploded views. And in particular, we noticed that illustrators tend to consider several different criteria when creating these diagrams. First of all, they arrange parts along these explosion axes in a way that respects their blocking relationships. By doing so, these diagrams emphasize the stacking relationships between these different parts when they're put back together. In addition, the parts are separated far enough apart that all the parts of interest are visible; this is a way of relieving the occlusions. But, on the other hand, they're not separated so far apart that it becomes difficult for the viewer to understand how the parts fit back together. So there's this notion of compactness as well in these exploded views.
Parts are typically exploded along just a small number of canonical explosion directions. And we believe this helps viewers mentally reconstruct the object: it's not that the parts can move in any possible direction; they tend to move along only a small number of directions, so the user can understand how they all fit back together. Finally, if the assembly has some kind of hierarchy of subassemblies, these subassemblies tend to be exploded separately. So let me just show a few results from this system. Much like the cutaway system, our exploded-view system generates not just static illustrations but dynamic ones. And again, the user can select target parts of interest, and the system will interactively and automatically generate an exploded view that exposes just those parts. The user can also interact more directly with the parts by dragging them, too. So I'm going to skip some of the details of the approach here and just show some more results. This one I've already shown. So here is a case where we're actually combining some of the cutaways work with these exploded views. Here the user selects a subassembly to expose, and the system first creates a cutaway and then creates an exploded view to expose the individual parts of that subassembly. Much like with the cutaway system, the user can also interact more directly with these exploded views. Here the user just clicks on individual components to explode them. The user can also directly manipulate parts along their explosion axes, and here the parts all respect their stacking relationships as the user drags. And we also implemented a kind of riffling interaction, where the user hovers over parts to get a quick sense for how they move with respect to one another. I'll show just one last example here. You typically don't see exploded views of anatomical models.
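The blocking and compactness criteria described above can be sketched in a few lines. This is an illustrative toy, not the speaker's explosion-graph algorithm; the part names and gap value are invented. If we record, for each part, which parts lie beneath it in the stack along one explosion axis, a topological sort yields an order that respects the stacking relationships, and spacing successive parts by a fixed minimal gap keeps the layout compact.

```python
from graphlib import TopologicalSorter

# beneath[p] = parts that lie beneath p in the stack along the explosion
# axis; p cannot be pushed out past them and must end up farther away.
beneath = {
    "housing": set(),
    "gasket": {"housing"},
    "cover": {"gasket"},
    "bolt": {"cover"},
}

GAP = 1.0  # hypothetical minimum separation needed to keep each part visible

# Topological order respects blocking; base part first, outermost part last.
order = list(TopologicalSorter(beneath).static_order())
# Minimal uniform spacing gives a compact but occlusion-free layout.
offsets = {part: i * GAP for i, part in enumerate(order)}
print(order)    # ['housing', 'gasket', 'cover', 'bolt']
print(offsets)  # {'housing': 0.0, 'gasket': 1.0, 'cover': 2.0, 'bolt': 3.0}
```

A real system would replace the uniform gap with per-part distances just large enough to expose each part, and would handle subassemblies as separately exploded subgraphs.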
But, you know, we had this anatomical model, so we figured we would just experiment with it. [laughter]. You know, the system actually worked reasonably well. It's just analyzing blocking relationships, analyzing geometric properties. And so I don't know if this is really the way you want to visualize these types of datasets, but it was kind of an interesting experiment. So just to quickly summarize the contributions of this work: once again, the conventions were critical in actually producing effective exploded views. We presented a method for automatically generating these. And, although I didn't talk about this, there's a kind of explosion graph data structure that encodes most of the information necessary for these exploded views. And finally, we also presented some interactive ways of exploring these kinds of models. Oh, and as I showed, we presented one approach for combining both explosions and cutaways. So, Brian, I have about five minutes until 9:40. Can I just take a couple more minutes to describe one more project? Okay. So I'm going to skip over this one. This is some work we did on creating exploded views of mathematical surfaces. I won't say any more about that, but it's clearly related to our mechanical assemblies work, focusing on a different domain. Okay. So the last thing I wanted to touch upon just briefly are these how-things-work illustrations. This is work that we also presented at SIGGRAPH just this past year. And here we were primarily motivated by the work of David Macaulay. Some of you might know him, especially those of you with kids. He makes really fantastic illustrations that show, well, you know, how things work. And we identified three main techniques that illustrators tend to use to create these types of illustrations. Motion arrows indicate the ways in which different parts move.
Frame sequences are often used to explain more complex motions by breaking them down into individual steps. And finally, in some cases animations can be used to help understand the dynamic behavior of these assemblies. So what our system does is it takes as input a geometric model, like the one that you see here. And here there's no motion specified; it's just a static model of the actual parts. We then use geometric analysis to understand how these parts interact and move in order to create static illustrations like this. We also create frame sequences that show the causal chain of interactions from the driving part to the rest of the parts in the system. And finally, we can also create animations like the one you saw earlier. Okay. So just to show you a couple of the results from this work. Here is one model that's loaded into our analysis interface. This is an automatic system, so the user just runs the part analysis, which computes plausible axes and motions for each of the individual parts. And then we also build an interaction graph, which is a way of encoding all of the different interactions and motions between parts. And once that's done, the model is now dynamic, so we can interact with it or we can just run an animation. And here we can compute the arrows based on this motion. So remember, the input here had no motion whatsoever; this is just a static model of geometry. So let me just show a couple more results. Here is a kind of planetary gear box, which is interesting because it has two different possible configurations, one where these outer rings actually move. Here the user says, okay, let's try to keep them fixed and see what happens when we drive the mechanism forward. And now the smaller inner gears rotate, or orbit, around the main axis.
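As a rough sketch of what such an interaction graph might encode -- the part names, graph shape, and ratios below are invented for illustration, and this is not the speaker's analysis code -- motion can be propagated breadth-first from the driving part, with each edge scaling the angular speed by a meshing ratio (for gears, derived from tooth counts, with a sign flip for reversed direction):

```python
from collections import deque

# Toy interaction graph: graph[part] = list of (neighbor, ratio), meaning
# the neighbor turns at (this part's speed * ratio). Edges are stored in
# both directions so propagation can start from any driving part.
graph = {
    "driver": [("idler", -2.0)],                     # 2:1 step-up, reversed
    "idler":  [("driver", -0.5), ("ring", -0.25)],   # meshes with both
    "ring":   [("idler", -4.0)],
}

def propagate(graph, driving_part, speed=1.0):
    """Breadth-first propagation of angular speed from the driving part,
    following the causal chain of part interactions."""
    speeds = {driving_part: speed}
    queue = deque([driving_part])
    while queue:
        part = queue.popleft()
        for neighbor, ratio in graph[part]:
            if neighbor not in speeds:          # first visit wins
                speeds[neighbor] = speeds[part] * ratio
                queue.append(neighbor)
    return speeds

print(propagate(graph, "driver"))
# {'driver': 1.0, 'idler': -2.0, 'ring': 0.5}
```

The breadth-first visit order is also exactly the causal chain the talk mentions stepping through, and the computed speeds are what motion arrows and animations would be generated from.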
So here, as the animation is running, the user can also step through the causal chain, and we just use simple highlighting to emphasize these parts. And we can also combine this with the exploded views work to create exploded how-things-work illustrations. Okay. So finally, the contributions of this work. Once again -- this is kind of a recurring theme -- identifying these design guidelines and conventions was a key contribution. We also introduced some motion and interaction analysis for automatically computing the motion of the parts. And finally, we presented some automated visualization algorithms for creating these types of how-things-work illustrations. So just to summarize, I've talked today about a number of different techniques for creating these types of illustrations. And I think one of the main take-aways that I hope you get from this is that it's possible, with a combination of geometric analysis and a real understanding of the relevant design guidelines, to create effective interactive visualizations without a lot of manual effort. For most of the systems I described, much of the process of creating these things was automated. And so I think I'm just going to end there so that we stay relatively on time. And if there are maybe a couple of questions, I'm happy to take them. Thank you. [applause]. >>: I think I missed the very beginning part -- what software are you using to create those? >> Wilmot Li: This is all software that I wrote, or that I wrote in conjunction with my collaborators. They're all research prototypes. So you can't get them right now, unless you ask me, I guess. But, yeah, we kind of just built these prototype visualization systems ourselves to explore some of these ideas. >> Brian Hitson: Any others? >>: I guess my one-word description would be wow, or cool, or something like that.
So I think my boss took advantage of this cutaway technology to open up my brain and see why it is that I disagree with him sometimes, and then he labeled that part of my brain poor judgment or something like that. So thank you very much, Will. I appreciate it. >> Wilmot Li: Great. Thank you. [applause]. >> Brian Hitson: Okay. Thank you. Now, moving on to our second speaker, this is Robert Hanson -- Robert M. Hanson -- who is professor of chemistry at St. Olaf College. And St. Olaf, for those of you who don't know, is in Northfield, Minnesota. He's the principal developer of the open source Jmol applet and project director for the Jmol molecular visualization project, resulting in the transformation of Jmol into a powerful Web based visualization and analysis tool used by a broad interdisciplinary community of scientists and educators, representing the full range of activity from K through 12 education to PhD level research. In collaboration with the nanobiotechnology center at Cornell, Dr. Hanson has designed a Jmol based exhibit at the Epcot Theme Park in Orlando called Touch a Molecule, which opened in February 2010 and is expected to have over three million interactive visitors. And so we're very fortunate to have Dr. Hanson here. He's a colleague of Brian McMahon and John Helliwell from the International Union of Crystallography. And there was a workshop in Paris last year on interactive publications. And so this is a nice progression on to multimedia and visualization topics from there. But it's nice to have this continuity of the Jmol topic as part of the program. His presentation is Communication in 3D: Challenges and Perspectives. So help me welcome Dr. Robert Hanson. Thank you. [applause]. >> Robert Hanson: Thank you, Brian. Is my mic on? Okay. Well, it's a pleasure to be here. And sorry I missed the last one. Sounds like that was good. Wilmot, you might be interested to know that we have Jmol in PDF files.
So we can -- we're interested in developing that some more. But that's one of the sorts of visualizations we can do. Okay. What I'd like to introduce to you is a tool that's out there for molecular visualization, Jmol. Just raise your hand if you have ever heard of that before. Hey, what do you know. Okay. How many of you have never heard of that before? Good. All right. [laughter]. My pleasure to do this. I am a professor. I teach undergraduates. St. Olaf College is an undergraduate only liberal arts institution in Northfield, Minnesota, about 45 minutes south of Minneapolis in the southern part of the state, although we do have two feet of snow on the ground right now. But I hear maybe Philadelphia does, too. So I think we're all getting the snow this year. As such, I design all my talks around 55 minutes. And I expect Brian to bring out the whip as soon as you're ready to tell me to go. Okay. So I have to tell you how my business in Jmol got started. It's actually with this book, which we published -- oh, gosh, it's 15 years ago now -- with University Science Books, called Molecular Origami: Precision Scale Models from Paper. This is a bizarre little project that I was working on as part of a grant that I had, and it interested a publisher. And my publisher really was kind of worried about this, because he said how can I publish a book that you either have to rip the pages out of or photocopy in order to use? And I assured him that it would be okay. And he compromised by having the pictures only on one side of the page, to maybe encourage the cutting out rather than the -- but the idea was to build a set of models, to allow people to build models. And in chemistry we do a lot with models. Anybody ever had organic chemistry? You remember the plastic models? We still actually use those some, but we've gone much more to virtual models. And you'll basically see a bunch of those in this talk.
But this was a real retro idea, that maybe we like handheld models, and maybe we have two hands so that we could hold one and hold another and compare. And I had a lot of fun with this. Students had a lot of fun with this. We build these models out of paper. To give you a sense of what we're talking about here: for example, this is a quartz model. That's the paper on the right. This is a really nice zircon model. These are actually markings that show distances and angles. They're precision scale models, generally 200 million to one. And my interest in Jmol actually derived from wanting to put this on the Web and having a more interactive version of these paper models. So for example here's the Jmol version of that particular model. Okay. So my actual introduction to Jmol -- okay. So Jmol is primarily an applet that interacts with Web pages. It's a project that was worked on many years before I got involved, in 2004, I believe. And I did so because of this wanting to get some renderings on the Web of interesting molecules. And so one of these projects was to have a database of structures that people could access that were of interest, so I selected about 1,000 compounds out of the Cambridge crystallographic database, and the idea here is that one could investigate these. And I needed something with which I could display the 3D structure. And so my first application of using Jmol, as just a user, was this little window into the molecular world that allowed us to do interesting things like measure distances and compare structures. Okay. So basically Jmol -- the J stands for Java. It's an applet and a stand-alone application. As an applet it plugs very easily into Web pages; it works on every browser we've ever tried it on, as long as Java is implemented on the hardware. And it's definitely chemistry focused -- molecular structure -- and I've been working for the last five years to develop a very rich internal scripting language.
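The 200-million-to-one scale quoted for the paper models makes for a quick sanity check: at that magnification, atomic distances land in the comfortable centimeter range. A short illustrative calculation (the 154 pm carbon-carbon bond length is a standard textbook value; the function name is made up for this sketch):

```python
SCALE = 200_000_000  # the 2e8 : 1 scale quoted for the paper models

def paper_length_cm(picometres):
    """Real interatomic distance (pm) → length on the paper model (cm)."""
    metres = picometres * 1e-12
    return metres * SCALE * 100  # metres → centimetres

# A typical C-C single bond is about 154 pm:
print(round(paper_length_cm(154), 2))  # → 3.08
```

So a bond becomes roughly a three-centimeter fold on the page, which is why precision markings for distances and angles are practical at all.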
So one can guide a user through a structure, not just present it statically, not just present it in a way that they can manipulate it, but provide them all sorts of controls. And I think the excitement of this for me is to have seen hundreds of applications of Jmol. I'm sure if you look on the Web, you'll see many, many applications where people have come up with completely different ideas of what to do with this applet. Because they can actually script it. And it interacts with JavaScript on a Web page, so it plays nicely with links and buttons and such. It's an open source project. There's absolutely no external funding for this. My wife said pass the hat and maybe somebody will have a spare hundred dollar bill that they could slip in there. We just do that because we love it. And I've incorporated it some into my professional development at St. Olaf. It's highly multidisciplinary. It started out very much in the sense of small molecule chemistry, and quickly developed into an applet that could display proteins and nucleic acids and biomolecular structures. More recently we've introduced the full collection of crystallographic techniques and properties into it, so it can read just about every crystallographic file that exists and process it. A group of mathematicians have found it; there's a project called Sage which allows people to interactively do mathematical creations. And the online version of Sage uses Jmol to deliver mathematical surfaces and structures. Just because molecules aren't that much different than everything else in mathematics: a bunch of nodes and connections and surfaces and such. Most recently I've been working with a group in Canterbury at the University of Kent in the solid state physics area. So part of my fun is I get to learn all of these different areas that I've never actually studied myself before. It's really great fun.
Here's a structure on the cover of an RNA journal from last year created with Jmol. So you can see we can develop rather complex structures. A number of journals are using Jmol for interactive display of figures. A figure can show up as just an image; the user clicks on the image, and a window pops up, or that particular place on the page turns into 3D. And I think this is actually a somewhat old list. There are probably more than these. One example is a paper that Brian McMahon and I wrote -- well, I wrote with Brian last year -- for the Journal of Applied Crystallography that developed -- I can see it here -- a method of showing figures. So, for example, here's my journal article figure collection. And these are just PNG images. But if you select one and click on it, it will show up in a 3D fashion as a popup window. So that's one mode that it's done in. The actual journal article online is this one, Jmol: a paradigm shift in crystallographic visualization. And you can see here that throughout the text what they did is they added the figures in this interactive fashion. And you can click on the figure and get the interactive view. One of the very nice aspects of Jmol that we built into it is the idea of a very simply defined state. So when I -- this is a PNG image, but that PNG image actually encodes the three-dimensional structure as well. And so for example I'm going to switch here for a second to the Jmol application, not the applet, and here is simply a Windows directory somewhere with a bunch of PNG images. But these PNG images were created with Jmol, so if I bring them back to Jmol just by dragging, that turns that two-dimensional image back into 3D, and we get to explore at will. And it should look exactly like it started. So that's been a fun innovation. Well, what I'd like to do is just spend a little time showing you some examples and hopefully leave a little bit of time for questions if people have them.
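The trick of a PNG image that also carries the 3D state works because the PNG format is a sequence of self-delimiting chunks ending in IEND; image viewers stop there, so extra data can ride after it. The sketch below illustrates that general mechanism only -- it is not Jmol's actual file layout, and the embedded "state" string is invented for the demo.

```python
import struct, zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def _chunk(kind, payload):
    """Serialise one PNG chunk: length, type, payload, CRC."""
    return (struct.pack(">I", len(payload)) + kind + payload +
            struct.pack(">I", zlib.crc32(kind + payload)))

def minimal_png():
    """A valid 1x1 greyscale PNG, built by hand for the demo."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one pixel
    return PNG_SIG + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")

def embed_state(png_bytes, state):
    """Append a state payload after IEND; viewers still render the image."""
    return png_bytes + state

def extract_state(png_bytes):
    """Return whatever follows the IEND chunk (type + 4-byte CRC)."""
    end = png_bytes.rindex(b"IEND") + 4 + 4
    return png_bytes[end:]

png = embed_state(minimal_png(), b"load model.cif; spacefill on;")
print(extract_state(png))  # → b'load model.cif; spacefill on;'
```

Dragging such a file back into the application would then let it replay the stored commands and rebuild the 3D scene.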
I just picked a very few examples just to give you a sense of the range of possibilities that Jmol can be involved in. So let's just start simply with chemistry. I'd like to show you a little work that was done by a friend of mine, Otis Rotenberger, at the University of South Florida, I believe. And this is the organic chemist's model kit, but now in virtual 3D, which allows you to play. And there are lots of ways of getting at this. So now, for example, we can draw a structure in 2D and then have that structure turned into 3D. And the reason I'm really showing you folks this is to let you know of a tremendous resource that we have been tapping into. There is a database at the NIH of molecules -- I don't remember how many millions there are now -- but they're all accessible and available to us very simply. And so for example this is my application again. Let me just pull up a console here. Name a drug. That I can spell. I think I can do that one. Did I do it right? Okay. So this Jmol has simply tapped into the NIH database, and, you know, everything from simple structures -- see if I can spell codeine. There we go. There's codeine. This is something that was just totally unheard of even a couple years ago, that you could simply say the name of a compound and instantly have its structure. And this is just a tremendous resource that the National Institutes of Health have developed for us. In the area of molecular biology we can do all sorts of things with protein structures. This for example is a little application, a Web application I wrote called the Jmol protein explorer. It was based on an earlier version that used an earlier application than Jmol called Chime. How many of you have ever heard of Chime? Okay. So you know that that was the way to go. It was a plugin some, what, 10 years ago, I think. And then we lost it to the world. Jmol is basically that replacement.
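The name-to-structure lookup demonstrated here goes through the NCI/CADD Chemical Identifier Resolver at the NIH, which exposes a simple REST pattern of the form base-URL / compound-name / representation. The sketch below only builds the lookup URL (no network call); the exact URL pattern is my reading of that service's documented interface and should be verified before relying on it.

```python
from urllib.parse import quote

# NCI/CADD Chemical Identifier Resolver (cactus.nci.nih.gov) -- maps a
# plain compound name to a structure file such as an SDF.
BASE = "https://cactus.nci.nih.gov/chemical/structure"

def resolver_url(name, representation="sdf"):
    """Build the lookup URL for a compound name (no network call here)."""
    return f"{BASE}/{quote(name)}/{representation}"

print(resolver_url("aspirin"))
# → https://cactus.nci.nih.gov/chemical/structure/aspirin/sdf
```

A viewer like Jmol fetches the returned structure file and renders it, which is why typing a single drug name can instantly produce a 3D model.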
Now, the one other thing I wanted to emphasize here, which is I think really, really cool, is that down here we can take whatever view we have and save it to our hard drive and then just drag it into Jmol, or drag it back to this page, and it will come back live in 3D. And in addition what you can do is e-mail it to yourself or e-mail it to a colleague, and it will come as an attachment that they download -- it's just a webpage -- they click on the link and it opens up and they see exactly what you saw right here. So it's a way of conveying information to others. Well, I've had a lot of fun learning about crystallography in the process. I'm actually an organic chemist, but I love mathematics. I guess you probably could guess that. And crystallography is just a beautiful application of mathematics within chemistry and physics. And many of you may have seen crystal structures like I was just showing you -- results of structures of proteins -- and it's easy to get caught up in the idea that these bonds are really there, the atoms are really little balls and there are sticks in between them and, you know, this model that we see over and over again. And I had a little project this last year to try to get behind that, and so I learned some about crystallography, and the idea here is really I think a cool, cool -- have I said that word too many times? -- a great idea. This gets to the idea of trying to inform people about where data comes from. And if you take a protein structure as the final form and that's all you have, you get no indication of uncertainty or anything about that structure. It's a long way from the actual data. So here's the idea. A crystal structure comes to us ultimately as a set of little points in space on a grid -- basically a whole bunch of numbers. And this little application is designed to reinforce that. And so here I have a challenge for you. Can you tell me what this structure is? >>: Salt. >> Robert Hanson: Salt? No.
Well, first of all let me ask you this. Can you see any structure in there at all, or does it just look like a nice grid of snow? Tell me if you see any structure. You see a little bit? Let's go to a black background just to see if you can see any structure in there. Isn't it just amazing that just changing the background can -- you see it now? Well, that's a crystal structure. Now, what crystallographers do, actually, is -- at each one of these points is a number. And all you have to do is say, well, just show me the big numbers, don't show me the little numbers. I'm going to give you a cutoff value. And I want to know just the points that are greater than -- and you tell me a number. So this is cutoff zero, meaning show me all the points. But watch what happens if we say, oh, maybe 0.2. I think I'll go to the black background. Now we're starting to see a little bit more structure maybe. If we go really high, we might lose the whole thing. I don't know. What x-rays are doing is diffracting off the electrons of atoms. And especially the cores of atoms have a lot of electrons in them generally. And so we're seeing basically the data representing the cores. Now, what you usually see is this. The typical sort of data that goes into producing a protein structure is usually represented as this sort of abstract mesh of data. And one of the things I find interesting is that there are no carbons or oxygens or nitrogens listed there. The diffractometer does not list the atoms. It just gives you this. And then it's left to the interpreter to put in what they perceive to be structure. I think my wire frame is not showing up for that particular part. But the oxygens and the nitrogens and the carbons are really interpretations of all that. So we've had some fun with that. Just a couple more examples. Mathematics. I mentioned the Sage project. Here's just a little quick application actually that my son wrote in the seventh grade, I think.
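The cutoff demonstration above reduces to a simple threshold over a grid of density values. Here is a toy sketch of that filtering step; the grid values and coordinates are invented for illustration, not real diffraction data.

```python
def above_cutoff(grid, cutoff):
    """Keep only grid points whose density exceeds the cutoff.

    grid: dict mapping (x, y, z) points to density values -- a toy
    stand-in for the numbers a diffraction experiment produces.
    """
    return {pt: v for pt, v in grid.items() if v > cutoff}

# Toy density map: two "atom cores" in a sea of low-level noise.
grid = {(0, 0, 0): 0.05, (1, 0, 0): 0.92, (2, 0, 0): 0.07,
        (3, 0, 0): 0.88, (4, 0, 0): 0.02}

print(sorted(above_cutoff(grid, 0.0)))  # every point survives: a grid of snow
print(sorted(above_cutoff(grid, 0.2)))  # only the two dense cores remain
```

Raising the cutoff is exactly what makes the atom cores pop out of the snow: the weak background points fall below the threshold first.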
The idea is something called Sierpinski's triangle. And the idea is to create a triangle. And this is just an example of the kind of scripting that you can do in Jmol to manipulate objects. And we were interested in that. That was in his book, and we were interested in whether that would work in 3D. And it turns out that, yeah, you can create that same kind of object in 3D. Okay. I think I have one more example. It's a little off the wall. I hope you don't mind. How many of you like Sudoku? A few of you. What's the next move in this puzzle? I'll give you 10 seconds. [laughter]. I bet somebody could find it. Well, I kind of went nuts on Sudoku a couple years ago. I don't know, with the way my brain works, I decided it was really a 3D puzzle, not a 2D puzzle -- that I was really looking at a stack of numbers and it might be interesting to see it in 3D. So of course why not just use Jmol, right? So my Sudoku co-assistant here -- oh, actually this isn't the one I want to look at. I think this one is more fun. Came up with this idea of looking at Sudoku in terms of a three-dimensional object rather than a two-dimensional object. So we're back to the idea of just visualizing things in new ways, I hope, here at the end. And probably no closer to the solution, right? But let me show you something. If we were to look at this from the side, so we have our different -- the numbers are from top to bottom: one, two, three, four, five, six, seven. The big balls are the ones that have already been determined. We don't need those anymore. Let's get rid of those. But these little lines in between them are the logical chains. Each simply says: if it's this, it's not this, but then it's this and it's not this. If it's this, it's not -- like that. You get the little individual logical connections. And they make these three-dimensional chains. And if I look at one in particular -- here's your next move. This is a really interesting construction in terms of Sudoku.
What you have is a logical chain that's going around in a pattern. And you see how it's connected and it's connected and it's connected all the way around that site. Well, if you were to connect that the rest of the way, you would have an impossible Sudoku. And if I were to simply remove this point right here -- if I were to say it's not this, which is the number four -- then this structure would become a complete unit and it would be an impossible Sudoku, because the puzzle would have two possible solutions, and Sudoku puzzles only have one possible solution. And you can see, when you go back to the standard view, what we're actually seeing here is this 5, 9, 5, 9, 5, 9, 5, 9. And how many of you could tell me that that then has to be the number four? You can't have 5, 9, 5, 9, 5, 9, 5, 9, 5, 9 all around, because then it could either be 5 or 9 in every cell in that loop. That's two solutions. So it's got to be four. I love this. And then it's just a very simple solution from there. [laughter]. Okay. So maybe my bottom line should be Jmol: bringing you solutions to random puzzles, or something. But basically that's my presentation. And I thank you. And I'm happy to answer questions. [applause]. >>: Beautiful work. I'm curious. Have people put in interactions or some kinetic-type structure, like how things would interact in terms of covalent or ionic bonds, or just force types of things, or protein folding? Different forces in addition to the stick representations or different geometric representations? >> Robert Hanson: You mean in terms of designing animations that then are driven by these principles, or do you mean visualizing the forces themselves? >>: Or a few interactions between molecules or within a molecule? >> Robert Hanson: Yeah. There's all sorts of different ideas that people have come up with, and various ways of coloring atoms based on parameters. So if you give me a parameter, we can color a surface or color an atom.
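The 5-9-5-9 argument can be made concrete: a closed loop of cells that may each only hold the same two candidates, with adjacent cells forced to differ, admits two alternating fillings whenever the loop has even length -- two solutions, which a proper Sudoku forbids, so the one cell with an extra candidate (the 4 in the talk) must take it. A minimal sketch of that counting argument (the function and its inputs are hypothetical, not a real solver):

```python
def two_candidate_loop_solutions(cycle_len, pair=(5, 9)):
    """For a closed loop of cells restricted to `pair`, with adjacent
    cells forced to differ, list every consistent assignment."""
    a, b = pair
    solutions = []
    for start in (a, b):
        other = b if start == a else a
        assign = [start if i % 2 == 0 else other for i in range(cycle_len)]
        # Closed loop: the last cell is also adjacent to the first.
        if assign[-1] != assign[0]:
            solutions.append(assign)
    return solutions

# An even loop of 5/9 cells admits two alternating solutions -- the
# ambiguity that forces the extra candidate to be the answer.
print(two_candidate_loop_solutions(8))
# → [[5, 9, 5, 9, 5, 9, 5, 9], [9, 5, 9, 5, 9, 5, 9, 5]]
```

An odd-length loop, by contrast, admits no consistent alternating filling at all, which is a different (also useful) deduction.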
So proximity to some other group, proximity to a positive charge. In chemical informatics, where they're trying to show binding, there are some really interesting visualizations you can do with that. Absolutely. >>: Is it possible to actually take two molecules -- has anybody done that -- figure out how they all interact? >> Robert Hanson: Is it possible to take two molecules and see how they'll interact? Yes. Now, I would say Jmol is primarily a visualization tool. It's not a high-end tool for that purpose. And so our sort of philosophy is: let other groups do better what they do best, and we'll fill in the gap with the actual visualization. So typically what people do with that is they will design some sort of an animated sequence that's all precalculated, and then they would use Jmol, for example, to produce a movie of it. Something was just coming to mind as you said that. Oh, let's see if I can do this really fast. We have something called a model kit mode in Jmol which allows you to add and rearrange atoms. And there's a wonderful trick I love here. It's called drag atom and minimize. And I probably shouldn't do it with such a complex structure. But it would be kind of interesting to see how the molecule responds if you were to take an atom and just move it someplace. And this is really addictive. Because it's just like whoa, whoa, whoa. So unlike the old model kits where you had to have things just the right way, just put it anywhere you want and see if we can move it to the other orientation. A little bit along those lines, there's a minimization function down there. >>: [inaudible] question -- with organic chemistry it's very hard to remember all the reactions. I'm just wondering if people use this for education to show interactivity between different organics. >> Robert Hanson: Oh, yeah. I could show you about a hundred examples of reactions in 3D happening, with the atoms coming in and atoms going out.
>>: [inaudible] reaction type program [inaudible]. >> Robert Hanson: UC Irvine. I'm not sure. >>: They've got a reaction viewing type program. >> Robert Hanson: Sir. Or Madam. >>: So you talked about [inaudible]. >> Robert Hanson: Why don't you wait for the microphone, because I think it's being recorded. >>: All right. You talked about an internal scripting language for doing some of the guides. Do you have -- not Internet, but do you have like a community, I guess, where you can share it and contribute the things that others have already done? >> Robert Hanson: Yes. We have a very vibrant user community, a user list, always sharing and asking for help -- you know, I can't quite get this page to work, how do you do it -- and somebody, within a couple hours, from somewhere on the planet, will respond to that. And there's full documentation on it, too. Lots of examples. Every time I introduce something I create a little example file that shows how it's used. >>: Just curious if you've heard about the executable paper grand challenge? >> Robert Hanson: No. What is that? >>: I think you might want to enter it. >> Robert Hanson: The executable program grand challenge? >>: Executable paper. >>: They're running a competition for a paper which is interactive. >>: And executable. And I think you might want to enter. >>: I don't know when the deadline is. >>: Some money associated with that maybe? >> Robert Hanson: That would be nice. >>: Next year. >> Robert Hanson: Next time. Yeah. >>: John [inaudible]. >> Robert Hanson: Make him run. Get his exercise. >>: Bob, that was great, and thanks for coming, by the way. On the uncertainties in these representations, this feeling that everything is equally as good as another is a real problem. >> Robert Hanson: Right. >>: And you alluded to it.
And I think you were hinting that if you raise the threshold of the contour level, the less [inaudible] ones disappear, and that gives you a feeling for precision, imprecision. We discussed this at the [inaudible] yesterday, and I went in with that point, and an even more basic point came out, which is how do we deal with people that think sulfurs are yellow, nitrogens are blue and oxygen is red? >> Robert Hanson: You mean they're not? >>: The real point is the [laughter] first point. But it is a true problem. >> Robert Hanson: You know, it is a problem, but it would be even more of a problem if we didn't have some kind of systematic way of representing the color of molecules, because the rest of us who know that that's not true would just go totally nuts with what's this color today? So it's probably the best of all evils to have at least those colors standard. Yeah, good point though. >>: Any other questions? >>: Hi. I'm a scientific illustrator with a scientific journal, and so we get diagrams in, but we often need to have the authors tweak these. And a couple of questions on that. My first question is, you mentioned the NIH data bank. And excuse me if I'm misinformed or uninformed on this. Can you import from -- there's the Protein Data Bank -- you can just import models into Jmol and it will, you know, automatically, as you showed us here -- >> Robert Hanson: Exactly. >>: It will render the molecule. Second part of the question is, can you choose between this, what we see here as ball-and-stick representation, versus what [inaudible] describe as a more clumpy representation of a molecule, versus a ribbon diagram? Because sometimes we need different representations to show the same molecule. >> Robert Hanson: Yeah. Absolutely. Jmol has a full representation of these various implementations. This just came from the PDB database. It's a little protein called 1CRN.
The default rendering is typically ball and stick, but you can get anything that you can do in any other program that I know of. So say you wanted to -- I'm just using the command language because it's easier for me, since I wrote it -- a very common thing to display is the molecular surface. And so getting back to the uncertainty business, a common thing to do would be, for example, to take this surface and actually color it based on the uncertainty in the file for that particular structure. And you can see that this -- I think there's a tyrosine out here maybe on this end. What is that? Yeah, it's tyrosine. It is the most floppy of the structural components there. So cartoons, the whole works. You know. >>: Okay. Thank you very much. >> Robert Hanson: Appreciate it. >>: Thank you very much. [applause]. >> Brian Hitson: Let me just say, before I introduce the next speaker, we have a really nice cross-section of speakers and talks on our agenda today. And ICSTI is thinking about trying to have a continuing series in this topic area of multimedia and visualization. And if you would like to see topics in future workshops that we're not covering today, please take the time to jot down some thoughts on that and give them to me or Lee Dirks, your session chairs, and we will take those into serious consideration for future workshops as well. So anyway, thank you, Bob, very much. It was a nice smooth segue really to our next speaker, because this carries on into even further uses of interactive tools and publications. And here to give us this presentation is Dr. Michael Ackerman, who is the National Library of Medicine's assistant director for high performance computing and communications, providing guidance for NLM's telemedicine, distance collaboratory, advanced networking and imaging interests.
He was a research physiologist in the Naval Medical Research Institute and later head of the institute's biomedical engineering and computing branch. At NLM his work has included applying technology to medical education, including probably most famously the Visible Human Project, and overseeing the library's non-bibliographic databases. The title of his presentation is Interactive Multimedia Scientific Publication. Help me welcome Michael Ackerman. [applause]. >> Michael Ackerman: Thank you very much. For some of you this will be a review or partly review. Elliott Siegel gave parts of this last year at the winter ICSTI meeting. And I've seen some of you also at CENDI. But the program committee asked that I review it and bring you up to date as to where we are on this particular project. Have we reached the end yet? We have not reached the end yet. We're almost there. One of the things that we learned, and I'll leave it unsaid as we go through this, is that perhaps we did too much too soon. And I'll let you be the judge of that, of what we actually wanted and what the public, or the learned public, is ready to chew on and use at this point. But I'll let you be the judge. The goals of our project -- the idea of the project was very simple. People were and are publishing by way of publishing PDFs. And to us, using the computer medium for a PDF is okay, but it's really a waste. Because the computer medium is capable of so much more than the representation of a piece of paper, which is what a PDF is. And if you will -- publishers, please excuse me, but you may agree -- publishing by way of PDF is just redistributing the cost of printing: instead of you doing it and us paying for it, we do it, and we pay for it anyway. Because I think it costs a lot more in toner and paper than what we pay publishers. But people seem to like that. So we said, if you're going to do things in PDF, there's got to be more in the computer medium, and perhaps the PDF, than is currently being used.
And from the library's point of view, we're not necessarily interested in the publishing business and what you all do, but rather: if there is something greater and better, is it worth doing? So our first goal was to evaluate the educational value in scholarly journals. Is it worth doing this extra stuff rather than just distributing by PDF? And if it is worth doing -- then, you know, the National Library of Medicine is the ultimate archiver in health. And what is this going to do to our problem of archiving literature, if not only do we archive the flat literature, the paper, but also all the additional value that comes with it? And how do we do that? Now, we got together with the Optical Society of America. They were also interested -- they had a similar vision. We'll talk about that. And we talked about: if we add databases to this collection, then how do you get to those databases? What is the peer review process, not only of the journal but of all the data and datasets that go with it? And also to give the viewers an independent way of looking at the data. And here's the first departure from where we are. When you publish and you go to the tenure committee, they ask you what have you published and how many people are using it? How valuable is it? What we're proposing is that if you have to publish in your PDF the datasets behind your article, they should be available independent of the publication. Which means that if I read the publication and I like the dataset and I see something in it that the author hadn't thought about, I could write a paper based on that dataset. And I could point to that dataset in my independent paper. So the second tenurable commodity is: how many datasets have you published and how many people are using them? This is something that's out there. We may have gone too far with that notion. But we still think it's a notion of the future, that data, especially data which came from the public coffers, should be public data.
And an author doesn't necessarily see everything that's in it; others see what's in it, and it would be nice if they could reanalyze it and come up with new things. Which also brings us to an unsaid goal of the project, which was held by our director, Dr. Donald Lindberg. The NLM, because we archive and we index the medical literature, also sees all of the papers that are withdrawn because of false things in them. That have been forced to be withdrawn. Dr. Lindberg's thesis is that if the author not only published the paper but also had to publish the data, the first or second or third person to read the paper and look at the data, if not the editor themselves, would realize that it was falsified, because the data itself would not stand up to the microscope -- which it somehow slips past in the evaluation committees. So he had this unsaid thing, that this would be a good way to screen the data, because it's Web 2.0. We would all screen the data, not just the couple of folks that get to read it for the journals. Why did we pick OSA? Luck. It turned out that those folks and our folks were at a meeting. We got to talk, and we both realized that we had the same vision of using the PDF as a way to do advanced publishing in imaging. Now, our vision is not only imaging; it's also in non-image things. When you publish and show a graph, wouldn't it be nice if you could click on the graph and get the data behind the graph so that you could reanalyze it? So that you could combine it with your own data and see how yours was the same as or different than the published data? The OSA, the Optical Society of America, is very interested in the visual things, and so we limit ourselves in this project to imaging. But we have another project at NLM that's dealing with data other than imaging -- the kinds of things that show up as charts and graphs. We were lucky. Four of the OSA journals appear in Medline. So they're a member of the club.
It's not like we're bringing in an outsider, which might be looked upon not so kindly by people within the health community. They're top ranked in ISI, and they have a long history. So it's a safe partner and, as I said, a member of the club. So the idea here is to publish special journal issues -- which they suggested would probably be in Optics Express -- on biomedical research topics. There would be an online version that incorporated the printed version, because Optics Express is printed as well, plus the source data, videos, and other media objects, so that you could visualize for yourself those pictures and the data behind them. And you would do that by downloading a free plugin or plugin-like software for the PDF reader. And obviously the downloads should be quick and easy. That's not a small thing to do. These are image datasets, and full image datasets by their nature are huge. Since the NLM is interested in whether this is worth doing from the educational point of view -- that's our goal here: if you're going to go through all this trouble, does the reader care? -- authors, reviewers, and readers would be asked for feedback at every stage of the way. What was the extra work? How are they doing it? Was it worth it from their point of view, especially the reader's? We would do a usability analysis. The articles that came out of this would be indexed in Medline. That was terribly important if you want to get people to write these articles. And the datasets would become open access, fully citable, and archived -- currently in OSA's InfoBase database, which is an open source database. Datasets would include the data and the metadata behind them. They would be discoverable. You could Google them. Or Bing them, in this building, I guess [laughter]. Lightning didn't strike. So you could get to them.
And as I said, they could be accessed through other publications, so in effect they would become what I will call a tenurable commodity. So in the first round we published three issues. The first one, in October 2008, was seven papers and 45 datasets, called the Interactive Science Publishing Introductory Issue. It was a plethora of things. The second issue was on Optical Coherence Tomography in Ophthalmology: 17 papers, 242 datasets. And the third issue, a year later in October 2009, was five papers and 43 datasets in Digital Holography. So that made up the first corpus. And while this was going on, evaluations were being done. And what we learned in that evaluation is that people generally liked it. But the reviewers had a sense of being overwhelmed by the job. Well, you can imagine being a reviewer -- you've all either done it or helped us do it. You receive a paper that's 10 or 15 pages, and you read it, you check the references and so on. And now this paper comes along with seven or eight or ten datasets. Well, what are you supposed to do? You kind of look at it and say, oh, that looks pretty good. You certainly look to see whether you see what the writer said was there and whether you agree with it. But do you go beyond that? You have the whole dataset. You know, there's a joke in this area: "see figure 3, typical x-ray of the chest." And you're thinking: see figure 3, the best x-ray of the chest I've ever seen. Because why would you publish the typical one? You only say it's the typical one. So we all have these euphemisms. This is what the author wanted me to see, because he got it at just the right angle -- but these are three-dimensional datasets. What if it were off axis a little bit? Would it still be there? If this is such an important finding, why can I only see it in one dimension? And so what should I do as a reviewer? This turned out to be part of the experiment, because we didn't give the reviewers directions. We said: review it. You're a reviewer.
Now, ultimately that came back and bit us. Because Optics Express requires at least two reviewers for every paper that gets in. Many of these papers had one reviewer, and that was because OSA did a lot of begging and called in a lot of favors. So it turns out it's a massive job for the reviewer. Readers liked it as long as they got past the learning curve, which is extremely steep. There were installation problems, there were navigation problems, and the 800 help number didn't answer. So there were help problems. Those folks who got past that and were able to do it liked it. But I have to tell you, the majority never got that far. And we learned that from telephone interviews, pop-up windows, and things like that. So we stopped at the third publication. And we rewrote the free download software. It's been rewritten several times, but the current version as of April 2010, when we froze it, was version 2.3. And then in July we put out the fourth issue, which is four papers and 45 datasets, entitled Imaging For Early Lung Cancer Detection. We are now doing an intensive evaluation effort: the usual pop-up questionnaires, but also we are discovering who many of the users are, and we have a company that is making arrangements to call these folks one-on-one to ask about their experience -- especially the question that can be summarized as: was it worth all the effort to do this? The almost-final assessment is that we solved a lot of the problems from issues one, two, and three, and that the new software is much better. But the user interface is not intuitive enough unless you're a radiologist. If you're a radiologist, it's like working at any radiological workstation. If you're not, you need help. There are now a lot more help functions in it, and so on. The first time I picked this thing up, I got it to load. That was not a problem.
Well, it was a problem, because I'm not a privileged user on my computer, so I had to call IT so they could unlock it so I could load a program -- because this is not JavaScript, this is an executable that loads with the PDF. And it comes in Mac, PC, and -- I'm blanking. But you all know the third. >>: Unix. >> Michael Ackerman: Unix. It comes in a Unix version. So it works on all of those. So I called -- I know people at OSA -- and I said, okay, it's there. How do I change the grayness and the contrast? And they said, oh, you move your mouse across it left to right to change the gray, up and down to change the contrast. And I said, how would I know that? And they said, well, everybody knows that. [laughter]. Well, everybody who is a radiologist knows that, because that's what they teach you in radiology school. But I didn't go there. I went to engineering school. They didn't teach us that. The new version now has ways that you could learn that. It's very much like handing a novice Photoshop, and I'll demonstrate: it's just overwhelming. But for those who know, it's very good. For those who don't, it's a bit of a struggle. Eighty percent of the people we've talked to now have said that it enhanced their experience, and 50 percent think that it increased their learning and understanding. Interestingly enough, a vast majority of them said it's really good, but we can do the same things with Matlab -- we've written our own little thingies to do these kinds of things using Matlab. Very interesting. These are the user recommendations. And when I demonstrate some of this to you, you'll see why these are the user recommendations. Eliminate the need to download and install the software. Make ISP -- that's what it's called, the Interactive Science Publishing software -- a Web based thin client instead of PDF based, so we don't have to download or call IT or do whatever.
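The mouse convention described here -- horizontal motion adjusts the gray level, vertical motion the contrast -- is the standard radiology window/level interaction. A minimal sketch of that mapping in Python, with hypothetical function names and parameter defaults (the actual ISP internals are not shown in the talk):

```python
import numpy as np

def apply_window_level(pixels, center, width):
    """Map raw scanner intensities to display grayscale [0, 255]
    with a linear window/level transform: values below the window
    go black, values above go white, values inside scale linearly."""
    lo = center - width / 2.0
    hi = center + width / 2.0
    clipped = np.clip(pixels, lo, hi)
    return ((clipped - lo) / (hi - lo) * 255.0).astype(np.uint8)

def on_mouse_drag(dx, dy, center, width, sensitivity=1.0):
    """Left/right drag (dx) shifts the window center (gray level);
    up/down drag (dy) widens or narrows the window (contrast)."""
    center = center + dx * sensitivity
    width = max(1.0, width + dy * sensitivity)
    return center, width
```

So a user dragging the mouse continuously re-renders the same dataset with a new center and width, which is why the interaction feels instantaneous to a radiologist and baffling to everyone else.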
And, by the way, that's one of the reasons -- I thought we were all going to use a common computer, and I specifically asked my host whether I could bring my own machine, because it's pre-downloaded and it works. I don't know if I could have worked the magic otherwise. There's something about that. Also, eliminate downloading the data itself, the dataset. Because remember, right now you're bringing the dataset onto your computer and then doing the analysis on your computer. How about making this Web based, so all you have to download is the results -- the pictures -- because the computer out in the cloud has done the work for you? Then I don't have to worry about how strong and how fast the machine I'm executing on is. Now, this is a laptop. It's marginal. And you'll see that in the speeds. I'll also show you how you'd know. So there's a whole push to make this entire publishing effort cloud based and Web based -- not to force it onto my machine and make me worry about software loading as well as hardware deficiencies. Well, if you go to www.osa.org and click on OpticsInfoBase, you'll end up at a thing called Interactive Publications, ISP. But if you want to go directly -- and I think the slides are going to be made public, so you can get this -- you go to OpticsInfoBase.org/ISP. And if you go there, you'll come here. And one of the things you should know about, which I've circled, is "get the ISP software." You click on that, and it goes to a page that explains how to download and get the software. As we said originally, it's all free. Those are the four issues that have been published. And I will click on the inaugural Interactive Science Publishing issue. If I click on that, it goes to the index page of that first issue. And on that index page are all the articles. And you can download -- you see it's circled -- you can download the PDF. And so you download the PDF.
And if you haven't downloaded the software and you've gotten this far and decide, well, maybe I've got to do that, you can click on "download the software." If you click on the ISP logo that's on each article -- which shows that it's ISP enabled -- it also explains what this is all about. That's an added help function. So I will click on the PDF. And when I do, I get your typical PDF. The first difference in that typical PDF is here: a Web reference. If you click on that, it will go to the database of all the datasets that are in that article. So if I click on it, I come to this page. And you'll notice here's the article, the abstract. You can get all the datasets that are in this article, which is 151 megabytes -- if you've got a fast line -- or you can get them one at a time. This is completely independent. This is a discoverable page, completely independent of ever reading or downloading or looking at the article. So it meets the criteria: oh, I got this other idea, I found some data. And please -- just like you're supposed to cite somebody's paper, you're supposed to cite when you use somebody's data as well. That raises another social issue that we're pushing here. These are all things where we think maybe we went a little bit too far too fast. But you've got to have serial number point nine, right? Somebody's got to try it. So now if we go back to the PDF and start reading, all of a sudden we come to a nice picture: patient chest CT depicting airway. And down here in the figure it says View 1. If you click on View 1 or View 2 or View 3, you will download the data that makes up these pictures, and it will launch the ISP software and open this page, so that you end up there. You'll notice they're identical. When you click on View 1, the magic happens and there it all is. Now, I'm going to allow the magic to happen. And you see it loads the software. Now, I think I know how to do this.
And there we are, except for the color, which takes a little bit longer. And this is live. So one of the things that we've added -- no, no, no, that's not what I wanted -- is something to allow you to find out -- there's the color -- I'm losing it. There's a bar which allows you to find out what's wrong with your hardware, or why it's taking so long. And there is one red light on this laptop. But if we go over here, these datasets are live. Let's go to zoom first, which is the little thingy there. I can zoom in and zoom out. I'll only pick a few of these. Move this over a little bit. And I can zoom in or zoom out so we can see what this is really part of. I can then go over here and change the contrast and the brightness, so we can see the bones a little better. There are presets -- radiologists use all kinds of preset contrast and leveling. There's a whole set of these, which makes sense to a radiologist but not so much to us. If I grab hold of this bar here, which is the cross-section -- this green line here is the cross-section that you're seeing here -- and I move it up and down, you'll see that the upper image changes as I move down the body and up the body. This color one I can rotate. And again, I can move in and out and slowly go through this to see. I go back here, and say I'm interested in how long something is. I can take my little distance measure and decide that this bone from here to here is 135.1 millimeters long. If for some reason I'm interested in the angle of the backbone, I can take my angle tool and go from here to here, and then from here to here, and find out that the backbone is at 146.3 degrees. Whoops. Let's just stop it. Okay. And if I'm interested in how big a space or a bone is -- let's find something here. Oh, this part of the backbone here. I can pick this up. Let's see. I've locked myself into something.
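The distance and angle tools demonstrated in this part of the demo are straightforward geometry once the viewer knows the scanner's voxel spacing, which travels with the dataset's metadata. A small illustrative sketch, assuming hypothetical function names and an example spacing (not taken from ISP):

```python
import math

def distance_mm(p1, p2, spacing=(0.7, 0.7, 1.25)):
    """Euclidean distance between two voxel coordinates, with each
    axis scaled by its voxel spacing so the result is in millimeters."""
    return math.sqrt(sum(((a - b) * s) ** 2
                         for a, b, s in zip(p1, p2, spacing)))

def angle_deg(vertex, p1, p2):
    """Angle at `vertex` formed by the rays toward p1 and p2, in degrees,
    computed from the dot product of the two direction vectors."""
    v1 = [a - b for a, b in zip(p1, vertex)]
    v2 = [a - b for a, b in zip(p2, vertex)]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(a * a for a in v2))
    return math.degrees(math.acos(dot / (n1 * n2)))
```

The point of the demo is that because the full 3D dataset ships with the paper, any reader can make these measurements themselves rather than trusting the one view the author chose to print.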
Like I told you, it's not as easy -- but you can circle this item here and actually see what the dimensions are of a particular piece. >>: [inaudible]. >> Michael Ackerman: Huh? >>: It's in the shading mode. >> Michael Ackerman: Yeah, I know it's in the shading mode. But my problem is that it's probably behind me, and I'm waiting for it to come out of that mode. Well, I can do it here. And you can see, if I wanted to know the size of this, it tells me the area is 662 square millimeters, the perimeter 116 millimeters, and so on and so forth. So one can take this and go on and on and analyze this three-dimensional dataset a million different ways. You can imagine what this means to a reviewer. And you can see, me not being a radiologist, what it means to try to use it. Radiologists fly through this. But it also means that the entire dataset is there, and if somebody is interested in something in their own research and wonders whether it also shows up in this other dataset, they could get it, run the analysis, and publish. So that's a quick view of what this is all about. You're all more than invited to take a look at it for yourselves and download the datasets, and if anybody would like, I'd be more than happy for you to contact me and tell me what you think. Thank you so much. [applause]. >> Brian Hitson: Thank you, Michael. Any questions? Yes, Will. >>: Thanks. Hi. This is really interesting. Since I wasn't here last year, this is the first time I've heard about it. You mentioned one of the possible problems being reviewers feeling kind of overwhelmed by all this data. I was wondering if you had thought about ways of letting authors, or maybe encouraging authors, to publish not just the dataset plus the paper but also some of the analytical tools that they used to come to the conclusions or reach the findings that are in the paper.
So this is not really my background, but in computer science and computer graphics and vision, sometimes people publish code, for instance, and say, you know, as a reviewer I can at least run this code; it gives me a starting point to say, are the claims valid? It helps me as a reviewer evaluate what's being said in the paper without necessarily having to look at an entire dataset. So I was just wondering if there were thoughts along those lines. >> Michael Ackerman: Usually in these areas, the images are captured by CTs, MRIs, and radiological scanners like that. The people who are doing the analysis know how to use the machine, but they don't necessarily know what's inside it. They're using a GE scanner or a Picker console or something, and they're pressing buttons and seeing things. Now, all of that is reported in the paper. But precisely what algorithm? Well, it's whatever algorithm Picker uses when I do a contrast enhancement. The carpenter knows what a good hammer is but hasn't the faintest idea exactly what metals were in it that made it a good hammer. It's that kind of problem. So although it would be nice, it's not in the nature of this kind of work to know precisely the algorithms being used. >>: Michael, great work, by the way. Quick question. When you hit the link on the handle that went to the information about the dataset, where is that dataset stored? Who's responsible for storing that data? >> Michael Ackerman: Right now that dataset is stored at the OSA, in an OSA database, just like the PDFs. One of the design criteria, since it's all linked, is that the links be made variable, not hard coded. And the reason for that is that ultimately, just like everything else, it's going to end up at the National Library of Medicine -- at NLM's PubMed Central or somewhere.
And therefore that's the ultimate place where it resides. We have to be able to store it, it has to be accessible from the outside, and those links have to be variable. Now, that poses a problem for us. Because currently, when you submit a paper that has data attached to it to the NLM, our databases are set up so that that's one entry: you point to, or you go to, the box that contains the paper and everything that came with it. That's your box. You can't get into that box from the outside other than the one way in, with the literature reference. And so the compromise that was made, for when we go ahead and do the next stage of how we're going to store this, is that the box at the NLM would hold the paper plus a reference to another box outside -- which could be on another NLM computer or elsewhere -- where the data would be individually stored and where that data would be discoverable. So, as I said at the beginning: is it worth doing, and then how do you do the archiving? That's a problem. And it's a problem for us because we want the data to be discoverable as individual datasets, not by forcing you to go to the paper. >> Brian Hitson: One more quick question before we go to break. >>: This is basically a nuance on the last question. Have you thought about discoverability of papers that use the same dataset long term, once people start to publish kind of subpapers that look at nuances in a dataset that the original author didn't see? >> Michael Ackerman: Well, that's exactly what we're hoping. What we're hoping is that somebody will look at the paper -- let's take the best of all worlds and not have them look at it and say, this is ridiculous, I'm going to write a paper with a complete retraction; we're assuming that 99.999 percent of the time it's legit. But somebody is going to see it because it's related to something they're working on.
And they look at it not for the purposes of the original paper but because of the related work they're doing, and they say, oh my goodness, it's in this one too. So I'm going to publish mine, and I'm going to enhance it by showing that this dataset, which had nothing to do with me, completely independent, also shows what you're saying. >>: In a graph between kind of the datasets and the papers -- >> Michael Ackerman: And then both datasets would appear in the new paper, suitably cited, or maybe even merged in the new paper, one on top of the other, whatever is appropriate. It would enhance things so much. If you think about this, you'll realize it: the genome projects are moving phenomenally fast clinically because you can do retrospective searching. They publish things, and then you can go to the genome database and ask, did anybody ever report so-and-so? And if not, is it in there? You can look at the genomes, the original data, and see if the thing appeared. When you index something, you can only index what you know is there. You can't index what you don't know, and therefore you can't find it. But you can do a retrospective search in genomics, because you can put in your new sequence, go there, and find out that it's there and nobody noticed it before. Aha -- 10 years of clinical trials took 10 minutes. And that's what we're hoping this would do. Somebody notices something and says, well, these are related; let's find all the chest films that have been put out there for the last five years and see if it's there. Aha -- we don't have to do the clinical trial. So that's an extremely important motivation on the research side for why you want to do this. >> Brian Hitson: Okay. One more hand for our excellent speakers this morning. Thank you. [applause]