>> Brian Hitson: And thank you, many of you, for coming from far and wide to
this one-day workshop hosted most hospitably by Microsoft, and also by the
International Council for Scientific and Technical Information, ICSTI.
This workshop is Multimedia and Visualization Innovation for Science. And so I
think it's relevant to the work of science, but in many cases to avenues
beyond science as well.
We have assembled a world-class lineup of speakers this morning and this
afternoon, and we really appreciate the speakers' willingness to come and share
their work with us. So I want to extend a special thanks to them.
We have a couple of introductions -- welcoming remarks from Microsoft and
from ICSTI.
And the first of those that I would like to introduce you to is Ms. Roberta Shaffer.
She is the Law Librarian of Congress, and a lawyer herself, and was most
recently elected president of ICSTI. So Roberta will offer a few opening
remarks and a welcome, and then we'll follow up with Tony Hey from Microsoft.
Roberta.
>> Roberta Shaffer: Thank you, Brian. And good morning to everyone. Good
morning. I need to see all that oxygen flowing from head to toe.
It's my pleasure to welcome you as the president of ICSTI and to also have this
opportunity to extend our thanks to Microsoft, and in particular to Lee Dirks, who
is finally getting breakfast. He's probably been up since 2 a.m. But he's in the
back, and he has been a phenomenal source of all kinds of support in making
this day, and the few days that some of us have been here at Microsoft,
possible. And so we thank you, Lee, from the bottoms of our hearts and from
the tops of our heads.
ICSTI is, we believe, a unique organization, because we bring together people
who have an interest in the information society and knowledge economies and
knowledge ecologies from a variety of nations, a variety of sectors, and a variety
of disciplines. And we believe that really makes us unique in the
professional association space.
And so for those of you who are not members, we are inviting you to join us
today to see how this all works with a showcase of a phenomenal workshop.
And we believe that at the end of the day, those of you who are members will see
that the money you are spending to travel with us and to be members is well
worth it.
And for those of you who are not, rather than pouting, we will give you the
opportunity to join, because we welcome everybody.
One of the benefits that ICSTI also offers is partnerships with a variety of other
associations that share our interests. And I wanted to take this opportunity to let
you know that NFAIS, one of our partners, is having a conference at the end of
this month, February 27th to March 1st.
For those of you who have forgotten, February is a short month, so it's not quite
as long a conference as it appears to be. And it has a rather intimidating and
exciting title which relates to Taming the Information Tsunami. So if you see the
tsunami coming on the horizon, then perhaps you'll want to be in Philadelphia at
the end of February to find out how to protect yourself or how to exploit the
tsunami.
I also want to remind you that we will be having our annual conference in Beijing
this year in early June, and it's never too early to plan to attend. So please begin
to think about that. We have information about the program that will shortly be
on our website. But we're happy to talk to you today about the program and
about what we think the outcomes of it will be. So please, please keep that at
the front of your mind.
Before I turn the podium over to Tony Hey, who is co-hosting this day with us, I
want to express my particular appreciation to Brian Hitson who introduced me
this morning and started the day. It's really Brian's portfolio and intellectual
muscle that is enabling us to be gathered here today for what I think will be a
phenomenal workshop. So please join me in expressing appreciation to Brian
for all that he has done.
[applause].
>> Roberta Shaffer: And then without further ado, I will turn the podium back
over to Brian so he can introduce our fabulous Tony Hey. Thank you.
>> Brian Hitson: And so can you hear me with this mic? Oh, there, it just comes
on.
I especially appreciate that acknowledgement from Roberta, but we also
had a lot of help from the program committee that put this together, which, in
addition to Lee Dirks, consisted of the ICSTI executive director, Tony Llewellyn
-- if you'd just wave, Tony -- Herbert Gruttemeier from the French information
organization INIST, and myself. And I'm from the US Department of
Energy's Office of Scientific and Technical Information.
So we've enjoyed doing this. We had several other players who helped
contribute to it. I was on extended travel myself for some time, and I
had special support from one of my staff members, Lorrie Johnson, so special
accolades to her for putting this program together. So thanks to everyone.
And now for Dr. Tony Hey. He has been a tremendous supporter of the
objectives of ICSTI for several years now, since Microsoft's membership in
ICSTI began. And of course Lee Dirks is his point person who interacts with us
most frequently. But it's Tony's fundamental support for these objectives that
enables this kind of collaboration and our ability to meet in a venue like this.
He's a tremendous supporter not only of ICSTI but also of WorldWideScience,
which is really a sister, inseparable element of ICSTI's program.
Worldwidescience.org is a federated search engine that searches across
70-plus databases around the world simultaneously, reaching lots of scientific
literature that commercial search engines cannot reach. And with Tony's team
that works on multilingual translation, we've added translation to this technology
as well. So he's been instrumental in offering support to that.
Tony is the corporate vice president in Microsoft Research for the Connections
team. And in that position, he has responsibility for university relations and
other forms of external collaboration. And I would just like to ask him to
welcome us. Tony.
[applause].
>> Tony Hey: Thank you very much, Brian. And thanks to Brian and Walter for
introducing me to ICSTI. It's been a very exciting time, and I'm very pleased
to be involved with ICSTI.
Just a bit of background on my interest in scientific data. From 2001 to 2005 I
ran the UK equivalent of the Cyberinfrastructure Initiative in the US, which
was all about, if you like, multidisciplinary science -- we called it e-Science. And
it became clear to me that data collections are of vital importance, and that's
why I really welcome the great contribution of WorldWideScience and ICSTI's
efforts in this space.
So it's great to be here. You are seeing one of the first public showings of our
new branding. We used to be called Microsoft External Research, as opposed
to internal research. You're in Microsoft Research's building on the Redmond
campus. Microsoft Research has about 800 PhD computer scientists around the
world, which is about one percent of Microsoft's total employees.
And Microsoft Research is here to give Microsoft agility -- to
understand what's happening, to understand trends. And that's why it's really
great for us to be participating in this workshop, because we also need to
understand what's happening out there, and having a collaboration with ICSTI
makes that much easier.
So I should just warn the speakers that the clock at the back of the room is not
working. Lee, will you fix that, please? You need to fix that. I don't
know why it's wrong, actually. Anyway, it's great to be here. And it's great to see
you all here. I hope you have a great time. And the weather today looks like
typical Seattle, the sun and so on. So I hope you'll enjoy it.
And I hope you have a great meeting. So we'll be around all day, Lee and I. So
if you have any issues, come and find us, and we'll do our best. Thanks very
much for coming, and I hope you have a great meeting. Thanks a lot.
[applause].
>> Brian Hitson: Thank you, Tony. So as you can see on your program, we
have this workshop organized into essentially four sessions, two in the morning
and two in the afternoon. I'll be the chair for the morning sessions and Lee
Dirks will chair the afternoon sessions. And we'll try to give you a break
between the two morning sessions and the two afternoon sessions to stretch
your legs and refresh.
But we're going to have sort of military precision here in terms of trying to
keep to the timetable, and because the clock is not working, if the speakers will
occasionally glance at me -- I'll be over in this area -- I may give you a sign as to
how many minutes you have. Typically we're going to try to keep the
presentations themselves in the range of 25 minutes or so, which allows
a few minutes for Q and A and then the transition to the next speaker, and so
forth. So that's what we'll generally try to stick to as far as the schedule goes.
In the first session this morning -- and these are not hard-walled distinctions
between the sessions, because, for example, in the first session the emphasis is
on the interactive nature of the multimedia and visualizations, but that's not to
say that subsequent speakers won't have interactive aspects to theirs as well --
in particular, in these first three presentations there will be a heavy emphasis on
interactivity.
So with that, let me introduce our first speaker. He is Will Li, a
scientist in the Creative Technologies Lab at Adobe Systems. He earned his
PhD in computer science in 2007 at the University of Washington, where his
graduate work focused on new interactive visualization techniques that
help users understand and explore complex 3D objects with many constituent
parts, for example CAD models and anatomical datasets. I've seen a preview of
his presentation, and it's an amazing depiction of 3D visualization of complex
machinery -- taking machines apart and showing how they work -- with high
potential for training and engineering purposes, and also in the area of
human anatomy. So it also has medical applications.
So I think you'll be impressed and intrigued by the potential of his work to
accelerate understanding and comprehension in science and technology.
And the title of his presentation is Explaining Complex 3D Objects With
Interactive Illustrations.
So I'd like to introduce you to Will Li.
[applause].
>> Wilmot Li: I've got a clock here too, so I'm going to keep an eye on it.
Great. Thank you for the introduction, Brian. And also thanks for inviting me
here. I'm excited to be here, and I'm also very excited to be speaking first so I
can fully enjoy the rest of the talks today. [laughter].
So as Brian mentioned, I did my PhD at the University of Washington. And, in
fact, I spent many summers and extended internships here at Microsoft
Research. So it was kind of familiar heading over here -- maybe a little too
familiar, actually. I turned into the old building 112/113 parking lot. That's why I
was a little bit late.
So I wanted to talk today about this problem of conveying important
characteristics of 3D objects, of complex 3D objects. So let me start by just
clarifying what I actually mean by a complex object.
In particular, I'm referring to 3D objects that have complex internal structure.
Now, for many objects the structure is defined by an arrangement of many
internal parts. So for example, mechanical assemblies like this turbine on the
top left are typically composed of many individual
components. And the same could also be said for human anatomy. In some
sense we are just a system of many internal parts.
Now, for other objects, this internal structure is defined more by complex
geometric features than by separate parts. So, for example, on
the right you see an image of a well-known mathematical surface called
Boy's surface. This surface twists and turns and folds back on itself,
creating lots of internal self-intersections. And these are what define its
complex structure.
Now, as you might imagine, these types of objects arise in a wide variety of
different domains. And two of the more important types of information
that we might want or need to convey about these objects are
spatial and functional. Spatial information refers to the relative positions and
orientations of parts.
So, for example, an airplane mechanic might need to understand how all the
parts in that turbine fit together, or how they relate to each other
spatially.
A medical student might need to understand how all the muscles in the neck are
layered with respect to one another.
So for spatial information, the critical thing to understand is the spatial
relationships between different parts.
Now, functional information is often a little more domain specific. But it
often has to do with how parts interact with each other in order to achieve
some kind of functional goal.
For example, in a human, organs, muscles, [inaudible] often work
together to perform some kind of function. And in mechanical assemblies, the
parts often move and interact in conjunction to achieve some kind of
mechanical goal.
So these are two important types of information that we often want to convey.
The problem is that actually communicating this information effectively for
complex objects is often quite challenging. So just as an example, let's say we
wanted to understand a bit more about the spatial relationships between the
parts inside this turbine model.
Well, a semi-transparent view like this helps somewhat. Here at least we can
see some of the internal structure. But there are still lots of occlusions and
partial occlusions that make it difficult to understand how all these parts fit
together.
As an alternative, I could just render all the parts separately. So here we can
clearly see all these different parts, but what we've lost -- what is now more
difficult -- is understanding how they actually fit back together in this object.
Now, these two approaches are clearly not the most sophisticated
visualization techniques we could use to try to convey this
information. But I just wanted to point out that some of the simplest
approaches we might think of are not always sufficient.
And in particular, there are three kind of high-level challenges that often arise
when trying to convey spatial and functional information for complex 3D objects.
The first is that, since many of these objects contain lots of parts, occlusions
between these parts can end up hiding the parts that may be of most interest to
the viewer. In addition, because there are lots of parts, visual clutter can
sometimes make it difficult for the viewer to really focus in on and distinguish
the parts that are more interesting from those that are less interesting.
And finally in the case of functional visualizations, these simultaneous motions
and interactions between parts can often be difficult for viewers to understand.
Now, luckily, illustrators in science and technology have come up with a number
of really effective techniques for conveying spatial and functional information in
ways that actually address these challenges. And today I'll focus on three such
techniques.
Cutaway views where portions of occluding geometry are partially removed in
order to expose some of the underlying structure.
Exploded views where parts are separated in order to reduce or eliminate
occlusions.
And finally, what I'll call how-things-work illustrations. These illustrations use
visual techniques such as motion arrows, as well as static sequences of frames,
in order to convey how the parts in a mechanical assembly move and interact in
conjunction to achieve some kind of mechanical goal.
In the rest of the talk, I'll describe a few interactive visualization techniques
we've developed that are inspired by these different illustration techniques.
And I'll start with cutaways.
So this first project is work that we presented at SIGGRAPH a few years ago,
and it focuses on interactive cutaways. And just to provide a little bit of context,
here is the type of cutting interaction that is available in existing 3D tools. Here
the user can slide a cutting plane interactively through the model in order
to expose some of its internal structure.
Now, this technique does technically expose the internals of this object. And if I
was looking for a particular target part, let's say this off-axis gear, I can now
see it. But a cross-section like this does not provide a lot of spatial context. I
would argue that for many viewers it would be really difficult to tell from an
image like this exactly how this gear is positioned and oriented with respect to
its surrounding parts.
So in contrast, here is a video showing one of the ways in which
users can interact with our system. The user selects parts of interest
from this list of parts on the bottom, and the system automatically generates a
cutaway that exposes those parts.
Now, I want to point out a couple of things about this visualization. First of all,
we're not just using a single cutting plane to expose the parts of interest. Here
the system determines the size, the shape, and the position of the cuts in order
to not only expose the parts of interest but hopefully do so in a way that helps
the viewer understand how those target parts relate to the surrounding parts.
So I'll just continue letting this video play a little bit.
And as you can see, when the user changes the target parts
of interest, the system smoothly animates the cuts, and in some cases the
viewpoint, in order to expose those parts.
So let me say a little bit more about the design of that system. In order to
determine how to generate effective cutaways, the first thing we did was
analyze a large corpus of hand-designed cutaway illustrations. By doing so,
we identified a number of conventions and design principles that could help
inform our system. And probably the most important convention we came
across has to do with the types of cuts that illustrators use.
It sounds somewhat obvious, but for different types of objects,
illustrators tend to use different types of cuts. So for example, for rectilinear,
kind of boxy objects, illustrators use object-aligned cutting planes to form what
we call box cuts.
And for thin shells like skin or the chassis of a car, illustrators use what we call
window cuts, whose boundaries often seem to correspond to smooth geodesic
paths on the surface.
For radially symmetric parts, illustrators use wedge cuts that are centered
around the axis of radial symmetry. And finally, for tubular objects, illustrators
often use transverse cutting planes that remain perpendicular to the tube's main
axis.
Now, one possible rationale for this convention is that using different
shapes of cuts for different types of objects helps to emphasize the shape of the
object being cut. And in so doing, we believe, the resulting cutaway makes it
easier for viewers to mentally reconstruct the missing geometry.
So how do we incorporate this convention into our system? Well, the main
challenge in our approach was to figure out some way of interactively or
automatically generating these four different types of cuts. And the approach we
used was to specify a parameterization, one for each part in the model, that
corresponds to these different types of cuts and lets us easily generate cutaway
views.
So for example, for box cuts we use a 3-dimensional parameterization, which is
visualized here, and these three parameters correspond to the three
object-aligned axes of the object. In this visualization and the ones that follow,
the purple boxes represent the cutting volume, shown both in this parameter
space on the left and in model space on the right, and the pink boxes
correspond to the maximum possible extents of that cutting volume.
Okay. So what does this parameterization help us with? Well, given these three
parameters, we can now easily specify different types of box cuts simply by
setting these six parameters for the dimensions.
Now, box cuts aren't particularly interesting; they're not that different from
standard cutting planes. So let's take a look at some of the other
types of cuts. For wedge cuts we also specify a three-dimensional
parameterization, which corresponds to the angle, the length, and the depth of
the wedge.
For tube cuts we specify a one dimensional parameterization that simply defines
the positions of the two transverse cutting planes along the tube.
And finally we also have two different types of window-cut parameterizations.
But in the interest of time, I'll just skip over these two.
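To make that concrete, here is a minimal sketch of what such per-part cut
parameterizations might look like in code. This is illustrative Python under
assumed names, not the research prototype itself:

```python
from dataclasses import dataclass

@dataclass
class BoxCut:
    # Normalized (0..1) extents of the cutting volume along the three
    # object-aligned axes -- six scalars in all.
    x: tuple = (0.0, 1.0)
    y: tuple = (0.0, 1.0)
    z: tuple = (0.0, 1.0)

@dataclass
class WedgeCut:
    angle: float = 1.57   # angular sweep around the axis of radial symmetry
    length: float = 1.0   # extent along that axis
    depth: float = 1.0    # radial depth toward the axis

@dataclass
class TubeCut:
    t0: float = 0.3       # positions of the two transverse cutting planes,
    t1: float = 0.7       # as fractions of arc length along the tube's axis

def clamp(interval, lo, hi):
    """Keep a 1D cut interval inside the part's maximum extents
    (the pink boxes in the visualization)."""
    return (max(lo, interval[0]), min(hi, interval[1]))
```

Because each cut type reduces to a handful of scalars, animating a cut or
snapping it open to expose a target part becomes simple interpolation in
parameter space.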
What I'm not going to get into today are the details of the algorithms we
developed to automatically compute these parameterizations for individual
objects. I'll just say that at a high level we use a variety of geometric analysis
techniques to automatically assign these parameterizations to each part inside
an object, so that the user does not have to do it manually.
Okay. So I just wanted to show a couple more examples from
the system. Here is an example model of a disc brake. This dialog box
on the right just lists all of the different parts inside this object, and the user
can double-click on one of them to specify it as a target part, and the system
will generate a cutaway automatically.
The user can also specify multiple target parts. That's not a problem. And in
some cases the system will decide to pick a better viewpoint to better expose
these parts.
This next example is a human anatomy example, and here the user selects two
different parts, the thyroid gland and a neck muscle. And the
system automatically generates this cutaway showing those parts.
Here again is the turbine example. And I'll just point out that in this case, as
part of the authoring process, the user has grouped sets of parts into
subassemblies, which are denoted here as part groups. So the user can
cycle through these part groups, and the system will automatically expose all
of the parts within each group.
So far I've shown a fairly automatic way of using our system, which is to just
select target parts of interest. But the user can also specify cuts in a more
interactive manner. So, for instance, here the user draws a scribble to open up
a wedge cut and then can change the size and position of this cut using
constrained direct manipulation.
Here is one final example of a more complicated model, played
back at two times speed. Okay. So just to summarize some of the contributions
of this work: one of the really important parts of this project was identifying
and distilling this set of cutting conventions that allow us to create effective
cutaway illustrations.
From a technical perspective, the main contribution is the parameterized
representation of cuts that I described. And finally, we presented authoring and
viewing interfaces for working with our interactive cutaways.
Bless you.
So I've talked a little bit about cutaways. Cutaways are especially good at
exposing target parts, or parts of interest, in situ with respect to their spatial
surroundings. But they're less good at showing the overall arrangement of parts
within an object. And for that goal, illustrators tend to create
exploded views.
So let me talk a little bit about some work we've done in this area. This is a
similar project to the cutaways project, which we also presented at
SIGGRAPH a couple of years ago, but it focuses on interactive exploded views
rather than cutaways.
And just like with the cutaways, it was really
important for us to identify the key conventions or design principles for creating
effective exploded views. In particular, we noticed that illustrators tend to
consider several different criteria when creating these diagrams. First of all,
they arrange parts along these explosion axes in a way that respects their
blocking relationships. By doing so, these diagrams
emphasize the stacking relationships between the different parts when they're
put back together.
In addition, the parts are separated far enough apart that all the parts of interest
are visible. So this is a way of removing the occlusions. But, on the other hand,
they're not separated so far apart that it becomes difficult for the viewer to
understand how the parts fit back together. So there's a notion of compactness
as well in these exploded views.
Parts are typically exploded along just a small number of canonical explosion
directions. And we believe this helps viewers mentally
reconstruct the object: the parts don't move in every possible direction;
they tend to move along only a small number of directions, so the
user can understand how they all fit back together.
Finally if the assembly has some kind of hierarchy of subassemblies, these
subassemblies tend to be exploded separately.
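As a rough illustration of how blocking and compactness interact, here is a
small Python sketch. It is not the paper's algorithm (which uses an explosion
graph), just the core recurrence: each part travels only far enough along the
explosion axis to clear whatever it stacks on. The part names, sizes, and gap
value are made up:

```python
# supports[p] = parts directly beneath p along the explosion axis
supports = {"rotor": [], "gasket": ["rotor"], "housing": ["gasket"]}
thickness = {"rotor": 0.5, "gasket": 0.1, "housing": 1.0}
GAP = 0.2  # visual separation between exploded parts

def offset(part, memo=None):
    """Distance a part moves along the axis: just past everything below it."""
    memo = {} if memo is None else memo
    if part not in memo:
        memo[part] = GAP + max(
            (offset(p, memo) + thickness[p] for p in supports[part]),
            default=0.0,
        )
    return memo[part]

for p in ("rotor", "gasket", "housing"):
    print(p, offset(p))  # offsets respect stacking order yet stay compact
```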
So let me just show a few results from this system. Much like the cutaway
system, our exploded view system
generates not just static illustrations but dynamic ones. Again, the user can
select target parts of interest, and the system will interactively and
automatically generate an exploded view that exposes just those parts. The
user can also interact more directly with the parts by dragging them.
So I'm going to skip some of the details of the approach here and just show
some more results. This one I've already shown. Here is a case where we're
actually combining some of the cutaways work with these exploded views.
Here the user selects a subassembly to expose, and the
system first creates a cutaway and then creates an exploded view to expose
the individual parts of that subassembly.
Much like with the cutaway system, the user can also interact more directly with
these exploded views. Here the user just clicks on individual components to
explode them. The user can also directly manipulate parts along their explosion
axes, and here the parts all respect their stacking relationships as the
user drags.
And we also implemented a kind of riffling interaction, where the user
hovers over parts to get a quick sense of how they move with respect to one
another.
I'll show just one last example here. You typically don't see exploded
views of anatomical models, but we had this anatomical model, so we
figured we would just experiment with it. [laughter]. The system
actually worked reasonably well -- it's just analyzing blocking relationships,
analyzing geometric properties. I don't know if this is really the way you
want to visualize these types of datasets, but it was kind of an interesting
experience.
So just to quickly summarize the contributions of this work: once again, the
conventions were critical in producing effective exploded views. We
presented a method for automatically generating them. I didn't talk about
this, but there's an explosion graph data structure that encodes most of
the information necessary for these exploded views. We also
presented some interactive ways of exploring these kinds of models. And, as
I showed, we presented one approach for combining both
explosions and cutaways.
So, Brian, I have about five minutes until 9:40. Can I just take a couple more
minutes to describe one more project? Okay. So I'm going to skip over this one.
This is some work we did on creating exploded views of mathematical
surfaces. I won't say any more about that, but it's clearly related to
our mechanical assemblies work, just focusing on a different domain.
Okay. So the last thing I wanted to touch upon just briefly is these
how-things-work illustrations. This is work that we also presented at
SIGGRAPH just this past year.
And here we were primarily motivated by the work of David Macaulay. Some of
you might know him, especially those of you with kids. He makes really
fantastic illustrations that show, well, how things work. And we identified three
main techniques that illustrators tend to use to create these types of
illustrations.
Motion arrows indicate the ways in which different parts move.
Frame sequences are often used to explain more complex motions by
breaking them down into individual steps.
And finally, in some cases animations can be used to help understand the
dynamic behavior of these assemblies.
So what our system does is take as input a geometric model, like the one that
you see here. There's no motion specified here; it's just a static model of the
parts. We then use geometric analysis to understand how these parts
interact and move, in order to create static illustrations like this. We also create
frame sequences that show the causal chain of interactions from
the driving part to the rest of the parts in the system.
And finally, we can also create animations like the one you saw earlier.
Okay. So just to show you a couple of the results from this work. Here is one
model loaded into our analysis interface. This is an automatic system, so the
user just says run the part analysis, which computes plausible axes and
motions for each of the individual parts.
Then we also build an interaction graph, which is a way of
encoding all of the different interactions and motions between parts. And once
that's done, the model is dynamic, so we can interact with it, or we can just
run an animation. And here we can compute the arrows based on this motion.
So remember, the input here had no motion whatsoever -- it was just a static
model of geometry.
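To give a flavor of what an interaction graph makes possible, here is a toy
Python sketch under assumed names: motion assigned to a driver gear
propagates across meshed pairs, scaling by tooth ratio and flipping direction at
each contact. This is a stand-in, not the system's actual analysis; note the
traversal order is also exactly the kind of causal chain used for the frame
sequences.

```python
teeth = {"driver": 12, "idler": 24, "output": 36}
meshes = [("driver", "idler"), ("idler", "output")]  # graph edges

def propagate(omega_driver):
    """Spread angular velocity outward from the driver across gear meshes."""
    omega = {"driver": omega_driver}
    frontier = ["driver"]
    while frontier:
        a = frontier.pop()
        for x, y in meshes:
            if a in (x, y):
                b = y if a == x else x
                if b not in omega:
                    # External mesh: speed scales inversely with tooth
                    # count, and the sense of rotation flips.
                    omega[b] = -omega[a] * teeth[a] / teeth[b]
                    frontier.append(b)
    return omega

print(propagate(1.0))  # driver 1.0, idler -0.5, output ~0.33
```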
So let me just show a couple more results. Here is a kind of planetary gearbox,
which is interesting because it has two different possible
configurations, one where these outer rings actually move. Here the user
says, okay, let's try to keep them fixed and see what happens when we drive
the mechanism forward. And now the smaller inner gears rotate, or orbit,
around the main axis.
So here, as the animation is running, the user can also step through the causal
chain, and we just use simple highlighting to emphasize these parts. And
we can also combine this with the exploded views work to create exploded
how-things-work illustrations.
Okay. So finally, the contributions of this work. Once again -- this is a
recurring theme -- identifying these design guidelines and conventions
was a key contribution. We also introduced motion and interaction
analysis for automatically computing the motion of the parts. And finally, we
presented some automated visualization algorithms for creating these types of
how-things-work illustrations.
So just to summarize: I've talked today about a number of different techniques
for creating these types of illustrations. And I think one of the
main take-aways that I hope you get from this is that it's possible, with a
combination of geometric analysis and a real understanding of the relevant
design guidelines, to create effective interactive visualizations without a lot of
manual effort. For most of the systems I described, much of the process of
creating these things was automated. And I think I'm just going to end there
so that we stay relatively on time.
And if there's maybe a couple of questions, I'm happy to take them. Thank you.
[applause].
>>: I think I missed the very beginning part of what software you're using to
create those.
>> Wilmot Li: This is all software that I wrote, or wrote in conjunction with my
collaborators. They're all research prototypes, so you can't get them right now
-- unless you ask me, I guess. But, yeah, we kind of just built these prototype
visualization systems ourselves to explore some of these ideas.
>> Brian Hitson: Any others?
>>: I guess my one word description would be wow or cool or something like
that. So I think my boss took advantage of this cutaway technology to open up
my brain and see why it is that I disagree with him sometimes and then he
labeled that part of my brain poor judgment or something like that. So thank you
very much, Will. I appreciate it.
>> Wilmot Li: Great. Thank you.
[applause].
>> Brian Hitson: Okay. Thank you. Now, moving on to our second speaker:
this is Robert M. Hanson, professor of chemistry at St. Olaf College. And St.
Olaf, for those of you who don't know, is in Northfield, Minnesota.
He's the principal developer of the open source Jmol applet and project
director for the Jmol molecular visualization project, which resulted in the
transformation of Jmol into a powerful Web-based visualization and analysis
tool used by a broad interdisciplinary community of scientists and educators,
representing the full range of activity from K-12 education to PhD-level
research. In collaboration with the Nanobiotechnology Center at Cornell, Dr.
Hanson has designed a Jmol-based exhibit at the Epcot theme park in Orlando
called Touch a Molecule, which opened in February 2010 and is expected to
have over three million interactive visitors.
And so we're very fortunate to have Dr. Hanson here. He's a colleague of Brian
McMahon and John Helliwell from the International Union of Crystallography.
There was a workshop in Paris last year on interactive publications, and so this
is kind of a nice graduation from there on to multimedia visualization topics.
It's nice to have this continuity of the Jmol topic as part of the program.
His presentation is Communication in 3D: Challenges and Perspectives. So
help me welcome Dr. Robert Hanson. Thank you.
[applause].
>> Robert Hanson: Thank you, Brian. Is my mic on? Okay. Well, it's a pleasure
to be here. And sorry I missed the last one -- sounds like that was good.
Wilmot, you might be interested to know that we have Jmol in PDF files. So
we're interested in developing that some more. But that's one of the sorts of
visualizations we can do.
Okay. What I'd like to introduce to you is a tool that's out there for molecular
visualization, Jmol. Just raise your hand if you have ever heard of that before.
Hey, what do you know. Okay. How many of you have never heard of it
before? Good. All right. [laughter].
My pleasure to do this. I am a professor; I teach undergraduates. St. Olaf
College is an undergraduate-only liberal arts institution in Northfield, Minnesota,
about 45 minutes south of Minneapolis, in the southern part of the state. We do
have two feet of snow on the ground right now, but I hear maybe Philadelphia
does, too. So I think we're all getting the snow this year.
As such, I design all my talks around 55 minutes. And I expect Brian to bring
out the whip as soon as he's ready to tell me to go.
Okay. So I have to tell you how my business in Jmol got started. It actually
started with this book, which we published -- oh, gosh, it's 15 years ago now --
with University Science Books, called Molecular Origami: Precision Scale
Models from Paper. This was a bizarre little project that I was working on as
part of a grant, and I interested a publisher in it. My publisher really was kind of
worried about this, because he said, how can I publish a book that you either
have to rip the pages out of or photocopy in order to use? And I assured him
that it would be okay. And he compromised by having the pictures only on one
side of the page, maybe to encourage the cutting out rather than the -- but the
idea was to build a set of models, to allow people to build models.
And in chemistry we do a lot with models. Anybody ever had organic
chemistry? You remember the plastic models? We still use those some, but
we've gone much more to virtual models. And you'll see a bunch of those in
this talk.
But this was a real retro idea: maybe we like handheld models, and maybe we
have two hands, so that we could hold one and hold another and compare.
And I had a lot of fun with this. Students had a lot of fun with this. We built
these models out of paper. This will give you a sense of what we're talking
about here.
For example, this is a quartz model -- that's the paper on the right. This is a
really nice zircon model. These are actually markings that show distances and
angles. They're precision scale models, generally 200 million to one. And my
interest in Jmol actually derived from wanting to put this on the Web and having
a more interactive version of these paper models. So for example, here's the
Jmol version of that particular model.
Okay. So, my actual introduction to Jmol. Jmol is primarily an applet that
interacts with Web pages. It's a project that was worked on for many years
before I got involved in 2004, I believe. And I did so because of this wish to get
some renderings of interesting molecules on the Web. One of these projects
was to have a database of structures of interest that people could access, so I
selected about 1,000 compounds out of the Cambridge crystallographic
database, and the idea was that one could investigate these. I needed
something with which I could display the 3D structures. And so my first
application of Jmol, just as a user, was this little window into the
molecular world that allowed us to do interesting things like measure distances
and compare structures.
Okay. So basically, in Jmol the J stands for Java. It's an applet and a
stand-alone application. As an applet it plugs very easily into Web pages; it
works on every browser we've ever tried it on, as long as Java is implemented
on the hardware. Its focus is definitely chemistry related -- molecular structure
-- and I've been working for the last five years to develop a very rich internal
scripting language. So one can guide a user through a structure: not just
present it statically, not just present it in a way that they can manipulate, but
provide them all sorts of controls. And I think the excitement of this for me has
been to see hundreds of applications of Jmol. If you look on the Web, you'll
see many, many applications where people have come up with completely
different ideas of what to do with this applet, because they can actually script it.
And it interacts with JavaScript on a Web page, so it plays nicely with links and
buttons and such.
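For a sense of what that integration typically looks like, here is a sketch of a
minimal page generated from Python. The jmolInitialize, jmolApplet, and
jmolButton helpers come from Jmol's standard Jmol.js wrapper, but treat the
file paths, the model file, and the scripts here as illustrative assumptions:

```python
# Write a bare-bones page embedding the Jmol applet with two script buttons.
page = """<html>
<head><script type="text/javascript" src="Jmol.js"></script></head>
<body>
<script type="text/javascript">
  jmolInitialize("./jmol");              // folder holding the applet jars
  jmolApplet(400, "load caffeine.mol");  // 400-px applet plus startup script
  // Buttons simply send Jmol script commands to the applet:
  jmolButton("spin on", "Spin");
  jmolButton("select all; spacefill 23%; wireframe 0.15", "Ball and stick");
</script>
</body></html>"""

with open("jmol_demo.html", "w") as f:
    f.write(page)
```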
It's an open source project. There's absolutely no external funding for this. My
wife said pass the hat and maybe somebody will have a spare hundred dollar bill
that they could slip in there. We just do that because we love it. And I've
incorporated it some into my professional development at St. Olaf.
It's highly multidisciplinary. It started out very much in the sense of
small-molecule chemistry, and quickly developed into an applet that could
display proteins and nucleic acids and biomolecular structures. More recently
we've introduced the full collection of crystallographic techniques and
properties into it, so it can read just about every crystallographic file that exists
and process it.
A group of mathematicians found it, and there's a project called Sage, which
allows people to do mathematics interactively. The online
version of Sage uses Jmol to display mathematical surfaces and
structures -- because molecules aren't that much different from everything
else in mathematics: a bunch of nodes and connections and surfaces and such.
Most recently I've been working with a group at the University of Kent in
Canterbury in the solid state physics area. So part of my fun is that I get to
learn all of these different areas that I've never actually studied myself.
It's really great fun.
Here's a structure on the cover of an RNA journal from last year, created with
Jmol. So you can see we can render rather complex structures. A number of
journals are using Jmol for interactive display of figures. A figure can show up
as just an image; the user clicks on the image, and a window pops up, or that
particular place on the page turns into 3D. And I think this is actually a
somewhat old list -- there are probably more than these.
One example is a paper that Brian McMahon and I wrote last year for the
Journal of Applied Crystallography, which developed -- I can see it here -- a
method of showing figures.
So, for example, here's my journal article figure collection. And these are just
PNG images. But if you select one and click on it, it will show up in 3D as a
popup window. So that's one mode in which it's done.
The actual journal article online is this one: Jmol -- a paradigm shift in
crystallographic visualization. And you can see that throughout the text
they added the figures in this interactive fashion. You can click on a
figure and get the interactive view.
One of the very nice aspects we built into Jmol is the idea of a very
simply defined state. So this is a PNG image, but that PNG image actually
encodes the three-dimensional structure as well. For example, I'm going
to switch here for a second to the Jmol application, not the applet, and here is
simply a Windows directory somewhere with a bunch of PNG images.
But these PNG images were created with Jmol, so if I bring them back to Jmol
just by dragging, that turns the two-dimensional image back into 3D, and we get
to explore at will. And it should look exactly like it started. So that's been a fun
innovation.
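The trick that makes this possible is that a PNG file remains valid when extra
data is appended after its final IEND chunk, so the saved state can ride along
inside the image. This Python sketch just checks for a ZIP-style payload after
IEND; the details of Jmol's actual container format are an assumption here:

```python
def trailing_payload(path):
    """Return any data appended after the PNG's IEND chunk, or None."""
    data = open(path, "rb").read()
    end = data.index(b"IEND") + 8  # skip the chunk type plus its 4-byte CRC
    tail = data[end:]
    # State bundles of this kind are ZIP archives, which start with PK\x03\x04.
    return tail if tail.startswith(b"PK\x03\x04") else None
```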
Well, what I'd like to do is just spend a little time showing you some examples
and hopefully leave a little bit of time for questions if people have them. I just
picked a very few examples just to give you a sense of the range of possibilities
that Jmol can be involved in. So let's just start simply with chemistry.
I'd like to show you a little work that was done by a friend of mine, Otis
Rothenberger [phonetic], at the University of South Florida, I believe. This is
the organic chemist's model kit, but now in virtual 3D, which allows you to play.
And there are lots of ways of getting at this. So now, for example, we can draw
a structure in 2D and then have that structure turned into 3D.
And the reason I'm really showing you folks this is to let you know of a
tremendous resource that we have been tapping into. There is a database at
the NIH of molecules -- I don't remember how many millions there are now --
but they're all accessible and available to us very simply. So for example, this
is my application again. Let me just pull up a console here. Name a drug --
one that I can spell. I think I can do that one. Did I do it right? Okay. So this
Jmol has simply tapped into the NIH database. And, you know, everything from
simple structures -- see if I can spell codeine. There we go. There's codeine.
This is something that was just totally unheard of even a couple of years ago,
that you could simply say the name of a compound and instantly have its
structure. This is just a tremendous resource that the National Institutes of
Health have developed for us.
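That name-to-structure service is the NIH/NCI Chemical Identifier Resolver,
and the lookup is easy to reproduce. A minimal Python sketch (the URL pattern
follows the resolver's public interface; whether Jmol hits this exact endpoint is
our reading of its behavior, and error handling is omitted):

```python
from urllib.request import urlopen

def fetch_sdf(name):
    """Resolve a compound name to an SDF structure file via NIH CACTUS."""
    url = f"https://cactus.nci.nih.gov/chemical/structure/{name}/sdf"
    return urlopen(url).read().decode()

mol = fetch_sdf("codeine")  # structure data, ready for any 3D viewer
print(mol.splitlines()[0])
```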
In the area of molecular biology we can do all sorts of things with protein
structures. This, for example, is a little Web application I wrote
called the Jmol protein explorer. It was based on an earlier version that used
an earlier tool than Jmol, called Chime. How many of you have ever heard of
Chime? Okay. So you know that that was the way to go. It was a plugin some,
what, 10 years ago, I think. And then we lost it to the world. Jmol is basically
its replacement.
Now, the one other thing I wanted to emphasize here, which I think is really,
really cool, is that down here we can take whatever view we have and save it to
our hard drive and then just drag it into Jmol, or drag it back to this page, and it
will come back as live 3D. And in addition, you can e-mail it to yourself or to a
colleague, and it will come as an attachment that they download -- it's just a
webpage -- and they click on the link and it opens up and they see exactly what
you saw right here. So it's a way of conveying information to others.
Well, I've had a lot of fun learning about crystallography in the process. I'm
actually an organic chemist, but I love mathematics -- I guess you probably
could guess that -- and crystallography is just a beautiful application of
mathematics within chemistry and physics. Many of you may have seen crystal
structures like the protein structures I was just showing you, and it's easy to get
caught up in the idea that these bonds are really there, that the atoms are really
little balls and there are sticks in between them -- this model that we see over
and over again. I had a little project this last year to try to get behind that, and
so I learned some crystallography. And the idea here is, I think, a cool, cool --
have I said that word too many times? -- a great idea. It gets at trying to inform
people about where data comes from. If you take a protein structure as the
final form, and that's all you have, you get no indication of uncertainty or
anything else about that structure. It's a long way from the actual data.
So here's the idea. A crystal structure comes to us ultimately as a set of little
points in space on a grid -- basically a whole bunch of numbers. And this little
application is designed to reinforce that. So here I have a challenge for you:
can you tell me what this structure is?
>>: Salt.
>> Robert Hanson: Salt? No. Well, first of all, let me ask you this. Can you see
any structure in there at all, or does it just look like a nice grid of snow? Tell me
if you see any structure. You see a little bit? Let's go to a black background
just to see if you can see any structure in there. Isn't it amazing what just
changing the background can do -- you see it now? Well, that's a crystal
structure.
Now, what crystallographers actually do is this: at each one of these points
there is a number. And all you have to do is say, well, just show me the big
numbers, don't show me the little numbers. I'm going to give you a cutoff value,
and I want to see just the points that are greater than it -- and you tell me the
number.
So this is cutoff zero, meaning show me all the points. But watch what happens
if we say, oh, maybe 0.2. I think I'll go to the black background. Now we're
starting to see a little bit more structure, maybe. If we go really high, we might
lose the whole thing. What x-rays are doing is diffracting off the electrons of
atoms, and the cores of atoms especially have a lot of electrons in them,
generally. And so we're seeing basically the data representing the cores.
Now, what you usually see is this. The typical sort of data that goes into
producing a protein structure is usually represented as this sort of abstract
mesh. And one of the things I find interesting is that there are no carbons or
oxygens or nitrogens listed there. The diffractometer does not list the atoms; it
just gives you this.
And then it's left to the interpreter to put in what they perceive to be the
structure. I think my wireframe is not showing up for that particular part. But
the oxygens and the nitrogens and the carbons are really interpretations of all
that. So we've had some fun with that.
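The cutoff idea itself is a one-liner once the map is an array of numbers. Here
is a Python sketch, with a random grid standing in for a real electron-density
map (which would normally be read from a crystallographic map file):

```python
import numpy as np

rho = np.random.rand(64, 64, 64)  # stand-in for measured density values

for cutoff in (0.0, 0.2, 0.8):
    shown = np.argwhere(rho > cutoff)  # grid points that remain visible
    print(f"cutoff {cutoff}: {shown.shape[0] / rho.size:.0%} of points shown")
```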
Just a couple more examples. Mathematics: I mentioned the Sage project.
Here's a quick little application that my son wrote in the seventh
grade, I think. The idea is something called Sierpinski's triangle -- the idea is
to create a triangle -- and this is just an example of the kind of scripting that
you can do in Jmol to manipulate objects. That was in his book, and we were
interested in whether it would work in 3D. And it turns out that, yeah, you can
create that same kind of object in 3D.
Okay. I think I have one more example. It's a little off the wall; I hope you don't
mind. How many of you like Sudoku? A few of you. What's the next move in
this puzzle? I'll give you 10 seconds. [laughter]. I bet somebody could find it.
Well, I kind of went nuts on Sudoku a couple of years ago. I don't know -- with
the way my brain works, I decided it was really a 3D puzzle, not a 2D puzzle,
that I was really looking at a stack of numbers, and it might be interesting to
see it in 3D.
So of course, why not just use Jmol, right? So my Sudoku co-assistant here --
oh, actually this isn't the one I want to look at; I think this one is more fun --
came up with this idea of looking at Sudoku as a three-dimensional object
rather than a two-dimensional object. So we're back to the idea of just
visualizing things in new ways, I hope, here at the end.
And probably no closer to the solution, right? But let me show you something.
If we look at this from the side, we have our different -- the numbers run from
top to bottom: one, two, three, four, five, six, seven. The big balls are the ones
that have already been determined. We don't need those anymore; let's get rid
of them.
But these little lines in between them are the logical chains. Each simply says:
if it's this, it's not this; but then it's this, and it's not this -- like that. Those are
the little individual logical connections, and they make these three-dimensional
chains. And if I look at one in particular, here's your next move. This is a really
interesting construction in terms of Sudoku. What you have is a logical chain
that's going around in a pattern. And you see how it's connected and
connected and connected all the way around that site.
Well, if you were to connect it the rest of the way, you would have an
impossible Sudoku. And if I were to simply remove this point right here -- if I
were to say it's not this, which is the number four -- then this structure would
become a complete loop, and it would be an impossible Sudoku, because the
puzzle would have two possible solutions, and Sudoku puzzles only have one
possible solution. And you can see, when you go back to the standard view,
that what we're actually seeing here is this 5, 9, 5, 9, 5, 9, 5, 9. And how many
of you could tell me that that cell then has to be the number four? You can't
have 5, 9, 5, 9, 5, 9 all the way around, because then it could be either 5 or 9 in
every cell in that loop. That's two solutions. So it's got to be four. I love this.
And then it's just a very simple solution from there. [laughter]. Okay. So
maybe my bottom line should be: Jmol, bringing you solutions to random
puzzles, or something. But basically that's my presentation. I thank you,
and I'm happy to answer questions.
[applause].
>>: Beautiful work. I'm curious: have people put in interactions or some
kinetic-type structure, like how things would interact in terms of covalent or
ionic bonds, or just force types of things, or protein folding? Different forces in
addition to the stick representations, or different geometric representations?
>> Robert Hanson: You mean in terms of designing animations that are then
driven by these principles, or do you mean visualizing the forces themselves?
>>: Or, say, interactions between molecules or within a molecule?
>> Robert Hanson: Yeah. There are all sorts of different ideas that people have
come up with, and various ways of coloring atoms based on parameters. So if
you give me a parameter, we can color a surface or color an atom -- proximity
to some other group, proximity to a positive charge. In chemical informatics,
where they're trying to show binding, there are some really interesting
visualizations you can do with that. Absolutely.
>>: Is it possible to actually take two molecules -- has anybody done that --
and figure out how they all interact?
>> Robert Hanson: Is it possible to take two molecules and see how they'll
interact? Yes. Now, I would say Jmol is primarily a visualization tool; it's not
a high-end tool for that purpose. Our sort of philosophy is to let other
groups do best what they do best, and we'll fill in the gap with the actual
visualization.
So typically what people do with that is they will design some sort of an animated
sequence that's all precalculated, and then they would use Jmol, for example, to
produce a movie of it.
Something just came to mind as you said that. Oh, let's see if I can do this
really fast. We have something called model kit mode in Jmol, which allows
you to add and rearrange atoms. And there's a wonderful click I love here --
it's called drag atom and minimize. I probably shouldn't do it with such a
complex structure, but it's kind of interesting to see how the molecule responds
if you take an atom and just move it someplace. And this is really addictive,
because it's just like, whoa, whoa, whoa.
So unlike the old model kits, where you had to have things just the right way,
you can put it anywhere you want and see if we can move it to the other
orientation -- a little bit along those lines. There's a minimization function down
there.
>>: [inaudible] question. With organic chemistry it's very hard to remember all
the reactions. I'm just wondering if people use this in education to show
interactivity between different organics.
>> Robert Hanson: Oh, yeah. I could show you about a hundred examples of
reactions happening in 3D, with atoms coming in and atoms going out.
>>: [inaudible] reaction type program [inaudible].
>> Robert Hanson: UC Irvine. I'm not sure.
>>: They've got a reaction viewing type program.
>> Robert Hanson: Sir. Or Madam.
>>: So you talked about [inaudible].
>> Robert Hanson: Why don't you wait for the microphone, because I think it's
being recorded.
>>: All right. You talked about an internal scripting language for doing some of
the guides. Do you have -- not Internet, but do you have like a community I
guess where you can like share it and contribute the things that others have
already done?
>> Robert Hanson: Yes. We have a very vibrant user community -- a user list
that's always sharing and asking for help. You know, I can't quite get this page
to work, how do you do it? And somebody, within a couple of hours, from
somewhere on the planet, will respond to that. And there's full documentation
on it, too, with lots of examples. Every time I introduce something, I create a
little example file that shows how it's used.
>>: Just curious if you've heard about the executable paper grand challenge?
>> Robert Hanson: No. What is that?
>>: I think you might want to enter it.
>> Robert Hanson: The executable program grand challenge?
>>: Executable paper.
>>: They're running a competition for a paper which is interactive.
>>: And executable. And I think you might want to enter.
>>: I don't know when the deadline is.
>>: Some money associated with that maybe?
>> Robert Hanson: That would be nice.
>>: Next year.
>> Robert Hanson: Next time. Yeah.
>>: John [inaudible].
>> Robert Hanson: Make him run. Get his exercise.
>>: Bob, that was great, and thanks for coming, by the way.
On the uncertainties in these representations: this feeling that everything is
equally good is a real problem.
>> Robert Hanson: Right.
>>: And you alluded to it. And I think you were hinting that if you raise the
threshold of the contour level, the less [inaudible] ones disappear, and that
gives you a feeling for precision and imprecision. We discussed this at the
[inaudible] yesterday, and I went in with that point, and an even more basic
point came out, which is: how do we deal with people who think sulfurs are
yellow, nitrogens are blue and oxygens are red?
>> Robert Hanson: You mean they're not?
>>: The real point is the [laughter] first point. But it is a true problem.
>> Robert Hanson: You know, it is a problem, but it would be even more of a
problem if we didn't have some kind of systematic way of representing the color
of molecules, because the rest of us, who know that that's not true, would just
go totally nuts wondering what's this color today? So it's probably the best of
all evils to have at least those colors standard. Yeah, good point, though.
>>: Any other questions?
>>: Hi. I'm a scientific illustrator with a scientific journal, and so we get
diagrams in, but we often tweak -- need to have the authors tweak these. And a
couple of questions on that. My first question is, you mentioned the NIH data
bank. And excuse me if I'm misinformed or uninformed on this. Can you import
from -- the Protein Data Bank -- can you just import models into Jmol and it will,
you know, automatically, as you showed us here --
>> Robert Hanson: Exactly.
>>: It will render the molecule. Second part of the question is, can you choose
between this, what we see here as a ball-and-stick representation, versus what
[inaudible] describe as a more clumpy representation of a molecule, versus a
ribbon diagram? Because sometimes we need different representations to show
the same molecule.
>> Robert Hanson: Yeah. Absolutely. Jmol has a full implementation of these
various representations. This just came from the PDB database. It's a little protein
called 1CRN. The default rendering is typically the ball and stick, but you can get
anything that you can do in any other program that I know of. So say you wanted
to -- I'm just using the command language because it's easier for me, since I
wrote it -- a very common thing to display is the molecular surface. And so, getting
back to the uncertainty business, a common thing to do would be, for example, to
take this surface and actually color it based on the uncertainty in the file for that
particular structure. And you can see that this -- I think there's a tyrosine out
here, maybe on this end. What is that? Yeah, it's tyrosine. It's the most floppy
of the structural components there. So cartoons, the whole works. You know.
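The steps just demonstrated correspond to a handful of console commands. The
following is an illustrative sketch in Jmol's scripting language, not a verbatim
record of what was typed during the demo, so check the exact forms against the
Jmol documentation:

    # load a small protein directly from the Protein Data Bank
    load =1crn
    # ball-and-stick, the typical default for small structures
    wireframe 0.15
    spacefill 23%
    # or switch to a ribbon/cartoon diagram instead
    cartoons only
    # molecular surface colored by the per-atom temperature factor (B-factor),
    # one common stand-in for the uncertainty stored in the file
    isosurface molecular map property temperature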
>>: Okay. Thank you very much.
>> Robert Hanson: Appreciate it.
>>: Thank you very much.
[applause].
>> Brian Hitson: Let me just say before I introduce the next speaker, we have a
really nice cross-section of speakers and talks on our agenda today. And ICSTI
is thinking about trying to have a continuing series in this broad area of
multimedia and visualization. And if you would like to see topics in future workshops
that we're not covering today, please take the time to jot down some thoughts on
that and give them to me or Lee Dirks, your session chairs, and we will take
those into serious consideration for future workshops as well.
So anyway, thank you, Bob, very much.
It was a nice smooth segue really to our next speaker, because this carries on
into even further uses of interactive tools and publications. And here to give us
this presentation is Dr. Michael Ackerman, who is the National Library of
Medicine's assistant director for high performance computing and communications,
providing guidance for NLM's telemedicine, distance collaboratory, advanced
networking, and imaging interests.
He was a research physiologist at the Naval Medical Research Institute and later
head of the institute's biomedical engineering and computing branch. At NLM his
work has included applying technology to medical education, including, probably
most famously, the Visible Human Project, and overseeing the library's
non-bibliographic databases.
The title of his presentation is Interactive Multimedia Scientific Publication. Help
me welcome Michael Ackerman.
[applause].
>> Michael Ackerman: Thank you very much. For some of you this will be a
review, or partly a review. Elliott Siegel gave parts of this last year at the winter
ICSTI meeting. And I've seen some of you also at CENDI. But the program
committee asked that I review it and bring you up to date as to where we are on
this particular project, and have we reached the end yet?
We have not reached the end yet. We're almost there. One of the things that we
learned -- and you'll see it as we go through this -- is that perhaps we did too much
too soon. And I'll let you be the judge of that: of what we actually wanted, and
what the public, or the learned public, is ready to chew on and use at this point.
But I'll let you be the judge.
The goals of our project -- the idea of the project was very simple. People were
and are publishing by way of PDFs. And to us, using the computer medium for a
PDF is okay, but it's really a waste. Because the computer medium is capable of
so much more than the representation of a piece of paper, which is what a PDF is.
And publishers, please excuse me, but you may agree: publishing by way of PDF
is just redistributing the cost of printing. Instead of you doing it and us paying
for it, we do it and we pay for it anyway. Because I think it costs a lot more in
toner and paper than what we pay publishers. But people seem to like that.
So we said, if you're going to do things in PDF, there's got to be more in the
computer medium, and perhaps the PDF, than is currently being used. And from
the library's point of view, we're not necessarily interested in the publishing
business and what you all do, but rather, if there is something greater and better,
is it worth doing? So our first goal was to evaluate the educational value in
scholarly journals. Is it worth doing this extra stuff rather than just distributing
by PDF?
And if it is worth doing, then, you know, the National Library of Medicine is the
ultimate archiver in health. And what is this going to do to our problem of
archiving literature, if we not only archive the flat literature, the paper, but also
all the additional value that comes with it? And how do we do that?
Now, we got together with the Optical Society of America. They're also
interested -- they had a similar vision. We'll talk about that. And we talked about
if we add databases to this collection, then how do you get to those databases?
What is the peer review process, not only of the journal but of all the data and
datasets that go with it?
And also to give the viewers an independent way of looking at the data. And
here's the first departure from where we are. When you publish and you go to
the tenure committee, they ask you, what have you published and how many
people are using it? How valuable is it? What we're proposing is that if you
publish, with your PDF, the datasets behind your article, they should be available
independent of the publication. Which means that if I read the publication and I
like the dataset and I see something in it that the author hadn't thought about, I
could write a paper based on that dataset.
And I could point to that dataset in my independent paper. So the second
tenurable commodity is, how many datasets have you published and how many
people are using them? This is something that's out there. We may have gone
too far with that notion. But we still think it's a notion of the future: that data,
especially data which came from the public coffers, should be public data. An
author doesn't necessarily see everything that's in it; others see what's in it,
and it would be nice if they could reanalyze it and come up with new things.
Which also brings us to an unsaid goal of the project, which was held by our
director, Dr. Donald Lindberg.
The NLM, because we archive and index the medical literature, also sees all
of the papers that are withdrawn because of false things in them -- that have
been forced to be withdrawn. Dr. Lindberg's thesis is that if the author not only
published the paper but also had to publish the data, the first or second or third
person to read the paper and look at the data, if not the editors themselves, would
realize that it was falsified, because the data itself would not stand up to the
microscope -- which is what somehow slips past the evaluation committees. So
he had this unsaid goal that this would be a good way to screen the data, because
it's Web 2.0: we would all screen the data, not just the couple of folks who get
to read it for the journals.
Why did we pick OSA? Luck. It turned out that those folks and our folks were at a
meeting. We got to talk, and we both realized that we had the same vision of
using the PDF as a way to do advanced publishing in imaging.
Now, our vision is not only imaging; it also includes non-image things. When you
publish and show a graph, wouldn't it be nice if you could click on the graph and
get the data behind the graph, so that you could reanalyze it? So that you could
combine it with your own data and see how yours was the same as or different
from the published data.
The OSA, the Optical Society of America, is very interested in the visual things,
and so we limited ourselves in this project to imaging. But we have another project
at NLM that's dealing with data other than imaging -- the kinds of things that show
up as charts and graphs.
We were lucky. Four of the OSA journals appear in Medline. So they're a
member of the club. It's not like we're bringing in an outsider, which might be
looked upon not so nicely by people within the health community. They're top
ranked in ISI, and they have a long history. So it's a safe partner and, as I said, a
member of the club.
So the idea here is to publish special journal issues -- which they suggested would
probably be in Optics Express -- on biomedical research topics. There would be an
online version that incorporated the printed version, because Optics Express as
well is printed, plus the source data, videos, and other media objects, so that you
could visualize for yourself those pictures and the data behind them. And you
would do that by downloading a free plugin, or plugin-like software, for the PDF
reader.
And obviously the downloads should be quick and easy. That's not a small thing
to do. These are image datasets, and by their nature full image datasets are huge.
So the NLM is interested in: is this worth doing from the educational point
of view? That's our goal here. If you're going to go through all this trouble, does
the reader care? Authors, reviewers, and readers would be asked for feedback at
every stage of the way. What was the extra work? How are they doing it? Was it
worth it from their point of view, and especially the reader's?
We would do a usability analysis. The articles that came out of this would be
indexed in Medline. That was terribly important if you want to get people to write
these articles. And the datasets would become open access, fully citable, and
archived -- currently in OSA's InfoBase database, which is an open source
database.
Datasets would include the data and the metadata behind them. They would be
discoverable. You could Google them. Or Bing them in this building, I guess
[laughter]. Lightning didn't strike. So you could get to them. And as I said, they
could be accessed through other publications, so they in effect would become
what I will call a tenurable commodity.
So in the first round we published three issues. The first one, in October 2008,
was seven papers and 45 datasets, called the Interactive Science Publishing
Introductory Issue. It was a plethora of things.
The second issue was on Optical Coherence Tomography in Ophthalmology: 17
papers, 242 datasets.
And the third issue, a year later, October 2009, was five papers and 43 datasets
in Digital Holography. So that made up the first corpus. And while this was going
on, evaluations were being done. And what we learned in that evaluation is that
people generally liked it. But the reviewers had a sense of being overwhelmed
by the job. Well, you can imagine being a reviewer and you've all either done it
or helped us do it. And you receive a paper that's 10 or 15 pages and you read
it, you check the references and so on and now this paper comes along with
seven or eight or 10 datasets.
Well, what are you supposed to do? You kind of look at it and say, oh, that looks
pretty good. You certainly look to see if you see what the writer said was there
and if you agree with it.
But do you go beyond that? You have the whole dataset. You know, it's a joke
in this area: it says, see figure 3, typical x-ray of the chest. And you're thinking,
see figure 3, the best x-ray of the chest I've ever seen. Because why would you
publish the typical one? You only say it's the typical one.
So we all have these euphemisms. What if this is what the author wanted me to
see because he got it at just the right angle? These are three-dimensional
datasets. What if it were off axis a little bit? Would it still be there? If this is such
an important finding, why can I only see it in one dimension?
And so what should I do as a reviewer? This turned out to be part of the
experiment because we didn't give the reviewers directions. We said review it.
You're a reviewer.
Now, ultimately that came back and bit us. Because Optics Express requires at
least two reviewers of every paper that gets in. Many of these papers had one
reviewer, and that was because OSA did a lot of begging and called in a lot of
favors. So it turns out it's a massive job for the reviewer. Readers liked it as long
as they got past the learning curve, which is extremely steep. There were
installation problems, there were navigation problems, and the 800 help number
didn't answer. So there were help problems.
So those folks that got past that and were able to do it liked it. But I have to tell
you the majority never got that far. And we learned that from telephone
interviews, pop-up windows and stuff like that.
So we stopped at the third paper -- the third publication. And we rewrote the free
download software. It's been rewritten several times, but the current version as of
April 2010, when we froze it, was version 2.3.
And then in July, we put out the fourth issue, which is four papers and 45 datasets,
entitled Imaging For Early Lung Cancer Detection. We are now doing an
intensive evaluation effort -- the usual pop-up questionnaires, but also we are
discovering who many of the users are, and we have a company that is making
arrangements to call these folks one-on-one to ask questions about what their
experience was, especially the part that can be summarized as: was it worth all
the effort to do this?
The almost-final assessment is that we solved a lot of the problems from
chapters one, two, and three, and that the new software is much better. But the
user interface is not intuitive enough unless you're a radiologist. If you're a
radiologist, it's like working at any radiological workstation. If you're not, you
need help. There are now a lot more help functions in it, and so on.
The first time I picked this thing up, I got it to load. That was not a problem.
Well, it was a problem, because I'm not a privileged user at my computer, so I
had to call IT so they could unlock it so I could load a program -- because this is
not JavaScript, this is an executable that loads with the PDF. And it comes in
Mac, PC, and -- I'm blanking. But you all know the third -- the third.
>>: Unix.
>> Michael Ackerman: Unix. It comes in the Unix version. So it works on all of
those.
So I called -- I know people at OSA -- and I said, okay, it's there. How do I
change the grayness and the contrast? And they said oh, you move your mouse
across it left to right to change the gray, up and down to change the contrast.
And I said how would I know that? And they said, well everybody knows that.
[laughter]. Well, everybody who is a radiologist knows that, because that's what
they teach you in radiology school. But I didn't go there. I went to engineering
school. They didn't teach us that. The new version now has ways that you could
know that.
It's very much like handing a novice Photoshop. And I'll demonstrate. It's just
overwhelming. But for those that know it, it's very good. For those that don't, it's
a bit of a struggle.
Eighty percent of the people we've talked to now have said that it's enhanced
their experience, and 50 percent think that it's increased their learning and
understanding.
Interestingly enough, a vast majority of them said it's really good, but we can
do the same things with Matlab -- we've written our own little thingies to do
these kinds of things using Matlab. Very interesting.
These are the user recommendations. And when I demonstrate some of this to
you, you'll see why these are the user recommendations. Eliminate the need to
download and install the software. Make ISP -- that's what it's called, the
Interactive Science Publishing software -- a Web based thin client instead of PDF
based, so we don't have to download or call IT or do whatever.
And, by the way, that's one of the reasons that -- I thought we were all going to
use a common computer, and I specifically asked my host, can I bring my machine?
Because the software is pre-downloaded, and it works on here. And I don't know
if I could have worked the magic otherwise. There's something about that.
Also, eliminate the download of the data itself, the dataset. Because remember,
you're bringing the dataset onto your computer and then you're doing the analysis
on your computer.
How about making this a Web based thing, so all you have to download is the
results, the pictures -- the Web based computer, the computer out in the cloud,
has done the work for you. And therefore I don't have to worry about how strong
and how fast the machine that I'm executing it on is. Now, this is a laptop. It's
marginal. And you'll see that in the speeds. And I'll also show you how you
know.
So there's a whole push to cloud based, Web based for this entire publishing
effort -- not to force it onto my machine and have me worry about software
loading as well as hardware deficiencies.
Well, if you go to www.osa.org and click on OpticsInfoBase, you'll end up at a
thing called Interactive Publications, ISP, but if you want to go directly, and I think
the slides are going to be made public so you can get this, you go to
OpticsInfoBase.org/ISP.
And if you go there, you'll come here. And one of the things you should know
about, which I've circled, is get the ISP software. You click on that, and it goes
to a page that explains how to download and get the software. It's all, as we
said originally, all free.
Those are the four issues that have been published. And I will click on the
inaugural Interactive Science Publishing issue. And if I click on that, it goes to the
index page of that first issue. And on that index page are all the articles. And you
can download -- you see it's circled -- you can download the PDF. And so you
download the PDF. And if you haven't downloaded the software, and now you've
gotten this far and decide, well, maybe I've got to do that, you can click on
download the software.
If you download -- if you click on the ISP logo that's on each one, that shows that
it's ISP enabled, and it also explains to you what this is all about. That's an
added help function.
So I will click on the PDF. And when I do, I get your typical PDF. The first
difference in that typical PDF is here. And what that is is a Web reference. And if
you click on that, it will go to the database of all the datasets that are in that
article. So if I click on it, I will come to this page. And you'll notice here's the
article, the abstract. You can get all the datasets that are in this article, which is
151 megabytes -- if you've got a fast line -- or you can get them one at a time.
This is a discoverable page. It's completely independent of ever reading or
downloading or looking at the article.
So it meets the criteria. Oh, I got this other idea, I found some data. Please --
just like you're supposed to cite when you use somebody's paper, you're supposed
to cite when you use somebody's data as well. That raises another social issue
that we're pushing here. These are all things where we think maybe we went a
little bit too far too fast. But you've got to have serial number point nine, right?
Somebody's got to try it.
So now if we go back to the PDF and we start reading, all of a sudden we come
to a nice picture, a patient chest CT depicting the airway. And down here in the
figure it says view 1. If you click on view 1 or view 2 or view 3, you will download
the data that makes up these pictures. And you will enable the ISP software. And
it will open this page such that you'll end up there. You'll notice they're identical.
When you click on view 1, the magic happens and there it all is. Now, I'm going
to allow the magic to happen. And you see it loads the software. Now I think I
know how to do this. And there we are, except for the color, which takes a little
bit longer. And this is live.
So one of the things that we've added is -- no, no, no, that's not what I wanted --
is something to allow you to find out -- there's the color -- I'm losing it. There's a
bar which allows you to find out what's wrong with your hardware, or why it's
taking so long. And there is one red light on this laptop.
But if we go over here, these datasets are live. And I go to contrast -- well, let's
go to zoom first, which is the little thingy there. I can zoom in and zoom out. I'll
only pick four of these. Move this over a little bit. And I can zoom in or zoom out
so we can see what this is really part of. I can then go over here and change the
contrast and the brightness. We can see the bones a little better.
There are presets -- radiologists use all kinds of preset contrast settings. There's
a whole set of these, which make sense to the radiologist but not so much to us.
If I grab a hold of this bar here, which is the cross-section -- so this green line here
is the cross-section that you're seeing here -- if I grab a hold of that line and I
move it up and down, you'll see that the upper view changes as I move down the
body and up the body.
The color one I can rotate. And again, I can move in and out and go slowly through
this to see. I go back here, and I'm interested in how long something is. I can
take my little distance measure and decide that this bone from here to here is
135.1 millimeters long.
If for some reason I'm interested in the angle of the backbone, I could take my
angle tool and go from here to here, and then from here to here, and find out that
that backbone is at 146.3 degrees. Whoops. Let's just stop it. Okay.
And if I'm interested in how big a space or a bone is -- let's find something here.
Oh, this part of the backbone here. I can pick this up. Let's see. I've locked
myself into something. Like I told you, it's not as easy -- but you can circle
this item here and actually see what the dimensions of a particular piece are.
>>: [inaudible].
>> Michael Ackerman: Huh?
>>: It's in the shading mode.
>> Michael Ackerman: Yeah, I know it's in the shading mode. But my problem is
that it's probably behind me, and I'm waiting for it to come out of that mode. And
-- well, I can do it here. And you can see, if I wanted to know the size of this --
so it tells me the area is 66 -- 662 square millimeters, a perimeter of 116
millimeters, and so on and so forth.
So one can take this and go on and on and on and analyze this
three-dimensional dataset a million different ways. You can imagine what this
means to a reviewer. And you can see, me not being a radiologist, what it means
to try to use it. Radiologists fly at this. But it also means that the entire dataset is
there, and if I am interested in something in the research that I'm doing, and
somebody else says, I wonder if it also shows up in this other dataset, they could
get it, run the analysis, and publish.
So that's a quick view of what this is all about. You're all more than invited to
take a look at it for yourselves, download the datasets and if anybody would like,
I'd be more than happy for you to contact me and tell me what you think. Thank
you so much.
[applause].
>> Brian Hitson: Thank you, Michael. Any questions? Yes, Will.
>>: Thanks. Hi. This is really interesting. And since I wasn't here last year, I
didn't hear about it last year, so this is the first time that I've heard about it. You
mentioned the -- one of the possible problems with reviewers feeling kind of
overwhelmed by all this data. I was wondering if you had thought about ways of
letting the authors or maybe encouraging authors to publish not just the dataset
plus the paper but also maybe some of the analytical tools that they use to come
to the conclusions or to reach the findings that are in the paper.
So this is not really my background but in computer science and computer
graphics and vision sometimes people publish code, for instance, and say, you
know, as a reviewer I can at least run this code, it gives me a starting point to say
are the claims valid? How can I -- you know, it helps me as a reviewer evaluate
what's being said in the paper without necessarily having to look at an entire
dataset. So I was just wondering if there were thoughts along those lines.
>> Michael Ackerman: Usually, in the areas where they did this work, these are
done -- the images are captured by CTs, MRIs, and things like that -- radiological
scanners. And so the people who are doing the analysis know how to use the
machine, but they don't necessarily know what's inside. So they're using a GE
scanner or a Picker console or something. And they're pushing buttons and
seeing things.
Now, all of that is revealed in the paper. But precisely what algorithm? Well, it's
whatever algorithm Picker uses when I do a contrast enhancement. It's that kind
of removed.
You know, the carpenter knows what a good hammer is but hasn't the faintest
idea what exactly the metals were in it that made it a good hammer. It's that
kind of problem. So although it would be nice, it's not the nature of this kind of
thing to know precisely the algorithms that are being used.
>>: Michael, great work, by the way. Quick question. When you hit the
link on the handle that went to the information about the dataset, where is that
dataset stored? Who's responsible for storing that data?
>> Michael Ackerman: Right now that dataset is stored at the OSA, in an OSA
database, just like the PDFs. One of the design criteria, since it's all linked, is that
the links be made variables, not hard coded. And the reason for that is that
ultimately, just like everything else, it's going to end up at the National Library of
Medicine. It's going to end up at NLM's PubMed Central or somewhere.
And therefore that's the ultimate place where it resides. We have to be able to
store it. It has to be gettable from the outside. And those links have to be variable.
Now, that poses a problem for us. Because currently, when you submit a paper
that has data attached to it to the NLM, our databases are set up so that that's
one entry, and you point to, or you go to, the box that contains the paper and
everything that came with it. That's your box. You can't get into that box from
the outside other than the one way in, with the literature reference.
And so the compromise that was made, when we go ahead and do the next stage
of how we're going to store this, is that in the box at the NLM would be the paper
and a reference to another box outside -- which could be at another NLM computer
or elsewhere -- where the data would individually be stored and where that data
would be discoverable.
So, as I said at the beginning: is it worth doing, and then how do you do the
archiving? That's a problem. And it's a problem for us because we want the data
to be discoverable as individual datasets, not by forcing you to go through the
paper.
>> Brian Hitson: One more quick question before we go to break.
>>: This is basically a nuance on the last question. Have you thought about
discoverability of papers that use the same dataset long term once people are
starting to publish kind of subpapers that look at nuance in a dataset that the
original author didn't see?
>> Michael Ackerman: Well, that's what we're hoping. That's exactly what
we're hoping. What we're hoping is that somebody will look at the paper -- let's
take the best of all worlds -- and not look at it and say, this is ridiculous, I'm going
to write a paper with a complete retraction. We're assuming that 99.999 percent
of the time it's legit, it's good and so on. But somebody is going to see it because
it's related to something they're working on. And they look at it not for the purpose
of the paper but because of the related thing that they're working on, and they say,
oh, my goodness, it's in this one too. So I'm going to publish mine, and I'm going
to enhance it by showing that this data, which had nothing to do with me --
completely independent -- also shows what you're saying.
>>: In a graph between kind of the datasets and the papers --
>> Michael Ackerman: And then both datasets would appear in the new paper,
suitably cited or maybe even merged in the new paper, one on top of the other,
whatever is appropriate. But it would enhance so much -- one of the things that --
if you think about this, you'll realize it. The genome projects are working
phenomenally fast clinically because you can do retrospective searching.
They publish things, and then you can go to the genome database and say, did
anybody ever report so-and-so, and if not, is it in there? And they can look at the
genomes, the original data, and see if this thing appeared.
When you index something, you can only index what you know is there. You
can't index what you don't know, and therefore you can't find it. So you can do a
retrospective search in genomics because you can put in your new sequence, go
there, and find out that it's there and nobody noticed it before. Aha -- 10 years of
clinical trials took 10 minutes. And that's what we're hoping this would do.
Somebody notices something, and they say, well, these are related; let's find all
the chest films out there that have been put up in the last five years and see if
it's there. Aha -- we don't have to do the clinical trial.
So that's an extremely important motivation on the research side about why you
want to do this.
>> Brian Hitson: Okay. One more hand for our excellent speakers this morning.
Thank you.
[applause]