>> Aaron Smith: Okay. So I think we can get started now. Sorry for the interruption or the delay. I'm Aaron Smith from the Extreme Computing Group, and it's my pleasure today to host Paul Navratil from TACC, the Texas Advanced Computing Center. Paul and I have known each other for over a decade -- I remember when he was using small displays and single cores. At TACC he manages the visualization software group, and he's visiting today because of SC, so I talked him into coming over and giving us a talk. He's going to tell us about the new display technology and the software they've been building. Thanks for coming. >> Paul Navratil: Thanks, Aaron. Thanks to all of you for being here and watching on the Web. I'm Paul Navratil, part of the visualization group at TACC, the Texas Advanced Computing Center. It's part of the Vice President for Research's portfolio at the University of Texas, and we have a mission to serve the advanced computing needs of the university, the UT system, and, primarily through NSF funding, the national cyberinfrastructure. So what I'm going to talk about today is some of the challenges we face as a visualization group at TACC and some of the ways we're solving those challenges, both in our remote computing platforms and in our local large-format displays. >>: Tell us how big you are and -- >> Paul Navratil: In terms of TACC? >>: Number of people and -- >> Paul Navratil: I don't have slides for that, but off the top of my head we're about 100 people, almost all at UT Austin at our research campus about ten miles north of the main campus. We do have a few folks in remote offices, at the Office of Naval Research and down at the UT medical branch in Houston, so we have just the beginnings of branching out. But otherwise we're divided into the visualization group, high performance computing, advanced systems that keep the big iron running, user support, and outreach. As part of our mission, our director, Jay Boisseau, really has an emphasis on giving back to the community, not only the scientific community but also the larger community, both to inform the public and to inspire the next generation of scientists that will take our places. Okay. So basically what I want to talk about is how we're using the cluster technology that predominates in high performance computing today and applying that technology to visualization. Because, as you'll see, once a simulation runs on a very large cluster -- 100,000 cores, for instance -- the data produced by that cluster really can't go anywhere else, so that cluster has to become the visualization and analysis instrument. I'll talk a little more later about what that looks like in terms of what we do. I'm also going to contrast the visualization workflow with traditional high performance computing. If you think about, say, a fluid solver in massive parallel, you can just divide up the domain evenly, or, say, if you're doing an ionization study of dark matter, you can take each of your stars that emits the ionization and duplicate the entire universe that you're studying across your cluster. But for visualization, the computational workflow looks different, and the results and the demands placed on those computations are different as well. We'll talk a little bit more about what that means. Then I'll also describe some of the solutions that we're pursuing at TACC. Of course, I'm biased towards the ones I'm most familiar with.
So you'll see a lot of the software and some of the hardware that I've helped design, and I'll talk a little bit at the end about motivation for the work on future clusters. Okay. So let's look at what a typical HPC workflow looks like. Say you have some parameters that you input into an equation. This could be, say, the initial conditions for a weather simulation: temperature, cloud density, and fluid in the atmosphere. You might have some initial conditions, starting points, and you ship those into your supercomputer, run a simulation, and then you get some results, typically in the form of some graphs or maybe even just a single number or set of numbers, and you might also get some time steps out that you either feed back as new initial conditions or do later analysis on. Most of the work happens at the machine, and it's in terms of inter- and intra-process communication. That might take the form of MPI, maybe pthreads, maybe some combination of both. Nowadays you're also talking about hybrid technology, so GPU computing; Intel is making big announcements about MIC, their Many Integrated Core project that came out of the Larrabee development, if you're familiar with that work. So the visualization workflow takes these time steps that were typically generated from some simulation -- it could also be from an instrument, say an MRI scan or an electron microscope -- and runs visualization algorithms on some high performance or advanced computing hardware. And then you get some geometry, for instance, that you're actually going to render into a picture. You then have to feed that geometry back into some sort of hardware to perform rendering, which creates the pixels of the images, and then you have to display those pixels somewhere. That can be an online process for interactive visualization, and offline if you're rendering frames to create a movie. And typically this process is iterative. So, say, you're doing a visualization and you don't like the technique you used -- you may have used isosurfacing and want to do volume rendering instead -- so you change the way you're rendering the geometry. Maybe you didn't like the color palette you used, so you have to create a different image. Maybe you have point data and you need to resample it onto a grid so that you can use more visualization algorithms on it, so you actually have to manipulate the data itself. Then you typically also have a process where you're creating rough visualizations just for your own edification, then you polish them up a little bit and show them to your colleagues, then you polish them even more and put them in your publications or into your talks. So there's another axis of iteration that comes out of the screen. So what does this "communication" pattern look like? And I put this in quotes because I mean inter-process and intra-process communication, similar to HPC, but I also mean the communication that a human gives to the simulation or to the process and that comes back to the human, because there's typically an interactive component to this. So there's the same inter- and intra-process communication, there's interactive algorithm manipulation -- think of changing the isovalue in an isosurface -- and there's also interactive display of the data. And all of this typically has to happen in, say, a 20th of a second for merely interactive use, or a 60th of a second if you want really high performance interaction.
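To make the visualization pipeline just described concrete, here is a minimal sketch using the VTK Python bindings (VTK comes up later in the talk). The file name, isovalue, and window size are placeholders for illustration, not values from the talk, and this is not TACC's production setup.

    import vtk

    reader = vtk.vtkStructuredPointsReader()            # read one simulation time step
    reader.SetFileName("timestep.vtk")                  # placeholder file name

    contour = vtk.vtkContourFilter()                    # visualization algorithm: isosurfacing
    contour.SetInputConnection(reader.GetOutputPort())
    contour.SetValue(0, 0.5)                            # the interactively adjustable isovalue

    mapper = vtk.vtkPolyDataMapper()                    # geometry produced by the algorithm
    mapper.SetInputConnection(contour.GetOutputPort())
    actor = vtk.vtkActor()
    actor.SetMapper(mapper)

    renderer = vtk.vtkRenderer()                        # rendering: geometry becomes pixels
    renderer.AddActor(actor)
    window = vtk.vtkRenderWindow()
    window.AddRenderer(renderer)
    window.SetSize(1024, 768)

    interactor = vtk.vtkRenderWindowInteractor()        # interactive display and manipulation
    interactor.SetRenderWindow(window)
    window.Render()
    interactor.Start()

Each stage maps onto a step of the workflow -- data in, algorithm, geometry, rendering, display -- and changing the isovalue or the color mapping and re-running is exactly the iteration loop described above.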
So the algorithms tend to be more demanding on the hardware than a typical HPC algorithm for a couple of reasons. First, the calculations tend to be more irregular. There are data-driven calculations -- think traversing a tree or searching for a particular isovalue in a dataset. You also generate new data. If you're doing that weather simulation, the region of the country you're simulating never changes, right? You may change the gridding of it, but once you've formed that grid it's constant through the calculation. In visualization algorithms you're returning new geometry or you're generating images that you'll return, so you have to buffer that data in memory. Also, you're interacting with someone controlling the algorithm and controlling the calculation, so if you're changing that isovalue, then your parameters change and you have to recalculate very quickly. And the users expect interactive response. They can tolerate down to, say, ten frames a second, but if you start getting into seconds per frame, where they have to go get coffee or check their e-mail between frames, you've lost them. And there's also interactive display of data: particularly for large simulations, as the pixel count grows it's more and more difficult to ship those pixels in a meaningful way to a display. Okay. So let me give you just a quick example if you're not familiar with visualization algorithms: some work that we've done in GPU-based isosurfacing. An isosurface means you're looking for a continuous value in the dataset -- think the temperature lines on a weather map extrapolated to 3-D. A classic way to do this is with marching cubes, where you find where the value crosses in a cube, and then there's a lookup table to determine how the geometry is placed. So say our value crossed the edges 4-7, 4-5, and 4-0; then we go back to the lookup table and place the triangle at the crossing points through trilinear interpolation. Okay. So the way this is done on the GPU is actually a three-step process. You classify the voxels -- find out which voxels contain your data value and which don't -- you do a scan to determine where those voxels are, and then you compact them into an active voxel list and generate the triangles. The challenge here is that you don't know how many triangles you're going to generate ahead of time. On a CPU-based algorithm, that's fine: you just create a new vector, maybe an STL vector, and push onto the end of it. On the GPU, that's harder, because until recently you couldn't allocate new memory on the device side. In OpenCL that's still true; CUDA now gives you the ability to do that. But either way you have to reserve some of your available memory for the data that's going to result. And so you can see that the execution time is really bad in this step because, one, of the multi-pass process and, two, because of the amount of global memory that you're accessing in that compaction step. And so one of the things we've done -- whoops, that went back, sorry -- this slide is just summarizing the issues that I mentioned previously: you don't know how many triangles you're creating initially, but also you're generating the triangles in parallel and you're writing them to a single buffer. So the massively parallel threads on a GPU have to synchronize down into that single buffer. Think of it as a reduction, but with more data.
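As a rough illustration of the classify / scan / compact pattern just described, here is a CPU-side NumPy sketch -- not the GPU implementation from the talk -- where the random field and the isovalue are stand-ins.

    import numpy as np

    values = np.random.rand(64, 64, 64)       # stand-in scalar field
    isovalue = 0.5
    nx, ny, nz = values.shape

    # 1) Classify: a cell is "active" if its corner values straddle the isovalue,
    #    i.e. the isosurface passes through it.
    corners = np.stack([values[i:nx - 1 + i, j:ny - 1 + j, k:nz - 1 + k]
                        for i in (0, 1) for j in (0, 1) for k in (0, 1)])
    cell_min, cell_max = corners.min(axis=0), corners.max(axis=0)
    active = ((cell_min <= isovalue) & (cell_max >= isovalue)).ravel().astype(np.int64)

    # 2) Scan: an exclusive prefix sum assigns each active cell an output slot,
    #    which is how the GPU version writes triangles without a device-side allocator.
    offsets = np.cumsum(active) - active
    total_active = int(active.sum())           # output buffer size is only known here

    # 3) Compact: gather the active cell indices into a dense list that the
    #    triangle-generation pass then iterates over.
    active_cells = np.flatnonzero(active)
    print(total_active, "active cells; first output slots:", offsets[active_cells[:3]])

The scan is what lets thousands of threads write into one shared buffer without stepping on each other, which is the synchronization problem the talk refers to.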
So then what we've done is take a classic approach -- you can think of it as an octree approach from graphics -- where instead of dealing with the individual cells, we create meta cells that help filter out regions of space, or regions of the data, that don't contain the isovalue. So a step that took about ten milliseconds we can now reduce to three, for about a two-and-a-third-times overall speedup. And also, because we're operating on fewer cells, the classification step becomes a little faster. I hope that motivates things just a little bit -- we can talk more about that work afterwards; I know I've given it quick treatment. But basically the take-aways are that we need more memory than HPC algorithms, and the amount of data we need is dynamic, or can be. And the large problems need significant computational resources that may be larger than what is evenly divisible on a single node. For instance, we have done some work with a dark matter simulation where each individual time step is 650 gigabytes, and that is the memory footprint for the HPC simulation itself. With the additional structures that the visualization algorithm needs, that expands to three terabytes. Some of that is because the visualization stack -- VTK -- is only now becoming concerned with memory efficiency, but some of that is just that you need the extra space to allow the algorithm to work. What we found when we first started working on this problem is that our largest visualization resource had only a terabyte of memory. So we had to go back to the HPC resource, but we were now allocating nodes for memory instead of for processing power. And because libraries like VTK aren't multi-threaded, we would allocate the 32 gigabytes per node on Ranger, one of our machines, and 15 cores would sit idle (a back-of-the-envelope sketch of that math appears after this passage). So there's definitely an opportunity, and this is some of what we're working on at TACC: to parallelize this so we can make use of those cores even when we're allocating for memory. Okay? Okay. So the original solution to all this was to move the data to a separate machine to do vis interactively. For instance, some datasets were small enough that you could move them to your own desktop or laptop. However, as we get larger datasets, moving them off the machine where they were generated is becoming untenable. Even with a ten gigabit connection: for a terabyte, go get a cup of coffee; for a petabyte, go on vacation, right? If you have to do this over wireless, forget it; it's just not going to happen. Today we even work on datasets where it's easier to put the data on physical drives and mail them than to try to do the transfer. Okay. So this is only going to get worse, because as the machines grow, the datasets grow, but the disk technology and the network infrastructure aren't there, and this suffers from a last-mile problem. At UT we have ten gigabit connecting our machine room to the main campus, but if I try to send that to UT El Paso, it's going to pass through a thin pipe, and that's only as fast as that transfer is going to go. Okay. So what we've done is we're moving the visualization resource to at least the same machine room and, ideally moving forward, onto the same machine. Our first system like this was built in 2008: Spur, which was an attachment to Ranger. Eight nodes, 32 GPUs, and a terabyte of aggregate RAM. And each single node had 128 gigabytes of RAM, because there are still some legacy shared-memory codes that need that much RAM.
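Here is that back-of-the-envelope check of the Ranger allocation described above, using the figures from the talk. The node count is my arithmetic, not a quoted number, and the 16 cores per node is implied by the "15 cores would sit idle" remark rather than stated.

    timestep_bytes = 650e9        # raw dark-matter time step (from the talk)
    vis_bytes      = 3e12         # with visualization data structures (from the talk)
    node_ram       = 32e9         # Ranger memory per node (from the talk)
    cores_per_node = 16           # implied by "15 cores would sit idle"

    nodes_needed = -(-int(vis_bytes) // int(node_ram))        # ceiling division
    idle_cores   = nodes_needed * (cores_per_node - 1)        # single-threaded VTK: one busy core per node

    print(f"{timestep_bytes / 1e9:.0f} GB time step grows to {vis_bytes / 1e12:.0f} TB for visualization")
    print(f"{nodes_needed} nodes allocated just for memory, {idle_cores} cores sitting idle")

Under these assumptions that works out to roughly 94 nodes allocated purely for their RAM, with over a thousand cores doing nothing -- which is the opportunity for parallelization the talk points to.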
Longhorn is a machine we just put out in 2010: 256 nodes, 512 GPUs, and 13 and a half terabytes of aggregate RAM. And because it's in the same machine room, we can put a very high bandwidth connection to the Ranger file system and operate on the data without moving it. And now our Lone Star system, which we also released in 2010, has right now eight nodes each with two GPUs, but we're going to expand that by the end of the year to 72 nodes with GPUs. So we have a smaller version of Longhorn sharing the same interconnect and disk as the larger nodes of Lone Star, and Lone Star is a 20,000-core machine. Our new machine that was just announced, Stampede, is a ten-petaflop resource to be released in 2013 -- and yes, everything has a Texas tie-in to the naming scheme. This will have 128 nodes each with probably a Kepler GPU from NVIDIA, and also each node across the entire system, which will be on the order of thousands of nodes, will have an Intel MIC. So that will be interesting to experiment with. And for problems that don't fit on the vis subsystem, we use software rendering and we move it back to the HPC cluster. Okay. So what about shared memory? There are some in the community who think shared memory is and always will be necessary for visualization. We've done the experiment ourselves. Spur was actually a replacement for Maverick, which was a 512-core -- I'm sorry, 1K-core -- machine with half a terabyte of shared memory. And what we found is that we were able to build Spur with more capability, cheaper, and easier to maintain, and the use of Spur was actually much higher than that of the shared-memory machine Maverick. And the nice thing is that if you go to a distributed memory model, you can get much more aggregate RAM than is possible on a single machine today -- you can get an order of magnitude more. And there are single nodes you can get now -- Lone Star and Stampede will both have them -- that have a terabyte of RAM. So you can still get a significant shared-memory resource even in a cluster environment. >>: Can you give us sort of the one-liner of why shared memory fell through? Because this is great evidence that it did, but... >> Paul Navratil: Sure. I think because the environment is harder to control: underneath you're either having NUMA accesses that you don't have control over, or you're slowing everything down to the least common denominator. And it's ultimately a shared resource. In our clusters we can give everybody exclusive access to their set of nodes, whereas on a shared machine everyone's playing in the same sandbox. >>: Okay. >> Paul Navratil: And also this is just following trends. Vis machines have always followed the path of HPC machines, and clusters have definitely won out. I think 480-some-odd of the top 500 are cluster machines, and that's just going to grow. And we're trying to bring the community into that fold. Okay. So there are some tools out there, toolkits like ParaView and VisIt, two large open source platforms based on VTK to do visualization. And what they've done is use a fat-client model where the geometry is generated on a server and then shipped to the client. And some of them -- VisIt in particular -- try to be smart about when they ship pixels versus geometry. But what we've seen is that the data traffic, especially doing managed communication within the software, can be too high for low-bandwidth conditions. And also the connection options are still lagging a bit behind.
Sometimes they just assume a large shared-memory system that you're connecting to, even remotely. What we've done instead is push everything server side, and now we just use a thin VNC client to interact with the server. That's been used successfully from TACC machines literally half a world away -- we have collaborators in the Gulf states working on our visualization systems remotely. And there's definitely latency in that model, but you're going to experience the latency either way, and this allows the computation to move forward while just moving keyboard and mouse events from the user and pushing pixels back. And with the new VNCs they're doing smart things like only updating the changed region of the window rather than pushing the entire window across. So this really minimizes the bandwidth, and if you want, we have full-featured GNOME or KDE on the back end, or I just tend to use twm, which is ancient and spartan but gets the job done and minimizes even the overhead of the windowing system. So let me show you just briefly -- I won't play this entire video -- our Web-based interface onto Longhorn. It's called EnVision. Someone through the Web can either get that VNC window I mentioned or go through an interview process to do their own visualization. This version, I think, is VTK-based, but it has tie-ins where we can use other rendering solutions. So you just have the idea that this all happens, again, in your browser. And this is a mummy MRI that they're playing with. >>: How much is happening [inaudible] and how much is a VTK window? >> Paul Navratil: This is actually all in the browser. So VTK is rendering server side and the pixels are being shipped. >>: Okay. >> Paul Navratil: So it's not even a VTK window. So let's talk about image display, because generating the images is only part of the solution. Your analysis of those images is limited by the pixel count on the display you're using. If you have an electron microscope that's producing an image that's 25,000 pixels by 25,000 pixels, you're either zooming and panning in something like a Bing Maps interface, where you zoom in to see what you want to see or pull back, but you have to trade off context for detail. There are other images: NASA Blue Marble is a 3.4 gigapixel image of the entire earth. The Google Art Project has these large scans -- each museum in the project has donated a piece of art to be scanned at high resolution. The Blue Marble is at half-kilometer resolution per pixel, and the amount of data in one of these Google Art Project scans is equivalent to the entire earth scanned at a kilometer of resolution per pixel. Very high resolution, and again, the electron microscopy is at half a gigapixel already. So it's nice to have both resolution and size: resolution to see the details, size to see the context. We have multiple display technologies in our visualization lab that allow you to choose the technology that's right for you. And this is a brief view of the vis lab from the Longhorn Network -- this is all of the Longhorn Network I've seen. I don't know if you're following that saga, but apparently there's a neighborhood in Dallas that has the Longhorn Network and Austin doesn't. That was our 307 megapixel display. That's a 12 megapixel touch display. We actually have that touch display in our booth at SC; if you have an opportunity to go to the exhibitors' fair, it's at Booth 223. So this is just a minute. You can see high resolution photography.
Actually, the lab has allowed us to expand beyond the traditional STEM fields -- science, technology, engineering, and mathematics. We've got fine arts, architecture, and humanities in here. We've had artists build pieces for the vis lab. And what we found is that by working with the artists, they have the vision of how it's supposed to look and they challenge our technology to reach it. Then we can take that new ability and bring it back to the scientists and engineers to expand what they can do. So we find it's a virtuous cycle to work with these folks. This is a 3-D display -- just an 82-inch commodity television driven by a Quadro graphics card. This whole space replaced a Barco projector solution: 2,900 square feet. In the space that that single projector display took up, we now have a meeting room and six high-tech displays. So we still have a projector, but we also have so much more that we built and maintain ourselves, and the power of the commoditization of this hardware has really come to the fore. Okay. So this was our first tiled display, Colt, three by three. These are 30-inch LCD monitors, and we built the frame ourselves. With any drafting program that gives you actual measurements, you can design it and send it to a company called 80/20 -- an industrial erector set. They'll cut it, send it back to you, anodize it any color you want, as long as it's black. This is the 307 megapixel display. That's showing Mars in the background, plus some visualizations on top of it. The nice thing about this is you can show either very large data, or multiple views of data or correlated data, and so you can have everything up at once rather than having to switch between windows on a smaller display. Our work here has also been starting to expand beyond our center. We run a lab in the College of Education at UT now that allows them to work on the information visualization problems we have a partnership on, such as the test scores for every student in California in 1995. Those are the types of problems they're trying to visualize. We've also consulted on tiled displays at UT San Antonio, UT El Paso, and I believe also in Colorado. So that's broadening out. Again, they'll all be maintained by TACC staff and use commodity equipment, so also at a fraction of the cost. This is a schematic of Stallion. You don't have to read it; the take-away is that it's 75 30-inch displays. There's a three-by-five hot spot, just by a feature of how the hardware worked out: outside that hot spot, each GPU drives two displays; inside the hot spot, each GPU drives one. So there's a notable rendering performance boost in that hot spot. There are 23 workstations, and at the time this was built, in mid-2008, there were no server-class machines that contained GPUs -- it was really at the birth of the GPGPU revolution -- so we just got gaming boxes. These are G80 GeForce machines driving it. If we did this today, we would use server rack-mounted machines that contain GPUs, and we'd use Quadros. We have SDR InfiniBand just to have high bandwidth among them, and that's it. We also have a five terabyte file system that's in the process of being expanded to about 50 terabytes. Okay. Displaying data. So that square is a scale representation of a 30-inch monitor compared to Stallion. And then there's that Blue Marble project I mentioned. This is to scale: 3.4 gigapixels versus 300 megapixels versus four megapixels on a single display.
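Rough pixel arithmetic behind that scale comparison: the 2560x1600 resolution for a 30-inch panel is my assumption, while the 75-panel, 307 megapixel, and 3.4 gigapixel figures come from the talk.

    panel_w, panel_h = 2560, 1600            # assumed 30-inch panel resolution
    single_display = panel_w * panel_h       # ~4.1 megapixels
    stallion = 75 * single_display           # 75 panels -> ~307 megapixels
    blue_marble = 3.4e9                      # NASA Blue Marble image

    print(f"single display: {single_display / 1e6:.1f} MP, Stallion: {stallion / 1e6:.0f} MP")
    print(f"Blue Marble is about {blue_marble / stallion:.0f}x Stallion "
          f"and {blue_marble / single_display:.0f}x a single display")

Even the 307 megapixel wall shows only about a tenth of the Blue Marble image at full resolution, which is the point of the comparison.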
So you can't see the entire image, but which one would you rather look at that image on, if you had the choice of either in your office? The point being, you can't see all the data, but you can see more of the data at resolution, so that you not only see the details but also get the context. You can almost see the Great Wall on that. It's pretty cool. Okay. So display software is not standardized at this scale. The Calit2 team has built CGLX -- those were really the first folks -- and the folks at Chicago have built Sage, but they're now closed source. That's a problem, because some of the configuration assumptions are baked in. When we first got Sage we had to modify it; it was open source then. We have to run CGLX in degraded mode because it doesn't accept our hardware configuration naturally. And full-featured window environments like Xdmx don't scale -- there are actually hard-coded limits in the code at 16 displays. And we have 75, so we just uncommented that to see what happens. Bad things. There's definitely a need for folks to do X right, because outside of a full-featured environment, your ability to host software on the large display is limited as well -- primarily to images and video streams, because that's the first thing people try and it's relatively easy. There's an API for third-party software, but if you want to run a closed-source proprietary piece of software or a large software base, you'd have to come in and modify it. You can use something like Chromium to just sniff the OpenGL and map it across, but Chromium stopped development in the mid-to-late nineties and only supports OpenGL 1.3 -- we're at OpenGL 4 or 5 now -- so there's a lot of stuff it doesn't support. And then to give you that pan-and-zoom feature for very large images, there's a separate application in Sage, and CGLX doesn't have support for that at all. So what we've done instead is we've now beta-released our own display cluster software, DisplayCluster, which will remain open source. It combines the features of Sage, CGLX, and Magic Carpet for large data and large images -- it has all those features -- plus it allows you to toggle between using network bandwidth versus disk bandwidth. For instance, we had a development event before a UT football game where we streamed 75 historic football games on the tiled display, which was really cool and amazing -- they gave us the copyright permission to do that -- but it really taxed the ability of the other software packages, so we had to run it in our own software. We are also exploring a touch interface -- you'll see a video of that in a moment -- and we also have TUIO Bluetooth connectivity so you can use smartphones and tablets to control the display. And we're also providing a Python scripting interface so that you can script demonstrations or do advanced interactions across the display. This was really motivated initially by our artists, who wanted particular images to come up on the display at a particular size at a particular point, and the other windowing environments really couldn't handle that. Okay. So let me show you a little clip of this. This is showing that pan and zoom that you would normally have to have a separate application open for, but now in DisplayCluster we can show multiple images and zoom into them simultaneously (a generic sketch of the tile-selection idea behind this appears after this passage). These are some of the Google Art Project pieces: Van Gogh's Starry Night and The Ambassadors from the 17th century. The nice thing about it is everything's public domain, so we can put this up. Here's the Kinect interface. We liked that.
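The tile-selection idea behind that pan-and-zoom demo, sketched generically: this is not DisplayCluster's actual code or API, and the tile size, level rule, and function name are illustrative choices.

    import math

    def tiles_for_view(view_x, view_y, view_w, view_h, zoom, tile=512):
        """Pick an image-pyramid level and the tile index ranges covering a viewport.

        view_* are in full-resolution image pixels; zoom is screen pixels per
        full-resolution image pixel (values below 1 mean zoomed out)."""
        # each coarser level halves the resolution; pick one close to 1:1 on screen
        level = int(math.floor(math.log2(1.0 / zoom))) if zoom < 1 else 0
        span = tile * (2 ** level)            # full-res pixels covered by one tile at this level
        x0, x1 = int(view_x // span), int((view_x + view_w - 1) // span)
        y0, y1 = int(view_y // span), int((view_y + view_h - 1) // span)
        return level, (x0, x1), (y0, y1)

    # e.g. a window over part of a gigapixel scan, viewed at full resolution
    print(tiles_for_view(10_000, 8_000, 5120, 3200, zoom=1.0))

Only the handful of tiles the viewport actually needs get read and decoded, which is what keeps a gigapixel scan interactive on the wall instead of saturating disk or network.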
[chuckling] So this is a piece of 17th century Russian art, and we were amazed that the face looks a lot like Robert Downey, Jr., particularly after a bender. It's a little green-tinted. And again, part of the motivation for this is that the College of Education wanted to use Macs to run their cluster, because they have a relationship there, and CGLX and Sage wouldn't run on the Mac. So that really motivated us. We had talked about it a long time, but once we had a tangible problem to address and fix, that's what got us to do it. We've already had a lot of interest from all the UT institutions that have tiled displays; folks at Stanford are interested, University of Central Florida, University of Michigan. So I think folks are using what's out there now because that's what's out there, and hopefully moving forward we'll have some nice experiences to share about how our stuff works. This is the touch screen display -- I think that's Bing Maps over the University of Texas. This was designed in house from six 46-inch Samsung LCDs and a PQ Labs 32-point IR frame, and we also have a pane of glass in front to give a smooth swipe experience over the bezels. We also have the Kinect to do a touchless interface. This is really working as a test bed to then take those interface designs to the larger displays, so there's a nice feedback loop. And it's driven from a single node with an AMD Eyefinity six-port GPU. That allowed us to keep the costs down -- I'll talk a little more about this later -- but the challenge there is that it has really reduced rendering capacity. It only has four gigabytes, and it's driving six displays at two megapixels each. If you're trying to stream a video on each of those displays, things break down pretty quickly. It also boots into Windows 7 and [inaudible]. I think this is the last video, but this gives you a sense of how the display works. This has been motivated by a project with the National Archives. This is showing their digital holdings in a treemap view, and the National Archives are really struggling because they don't understand half of what they have in digital form, much less how to respond to Freedom of Information Act requests. So this is operating to allow those folks from the School of Information and from the National Archives to interact with it in a more dynamic way. So let me get to some of the interaction. That's just one of the demos. There's multi-touch map navigation -- you can see the whole video on YouTube. So that's Bing there. That's one thing: you actually have to hold that and do the rotation. Google Maps lets you do a gesture like this to do that rotation. So, any Bing developers in the audience, that would be a nice feature to add. Okay. So let me talk a little bit now, in the last few minutes, about the types of things we're doing on these displays and some of the user successes that we've had. So this is really something that could only happen at Texas, I think: after the BP oil spill, we had scientists who have models of the patterns of ocean currents in the Gulf, and they then just modified that code to track where the oil particles are going. So here's -- I lied, that wasn't the last video. Whoops. Pause it. So here's a visualization of that simulation. These simulations were done in real time so we could give the responders an idea of where the oil might be going. So you can see -- this is the end of Louisiana.
This is the Mississippi, coming into the New Orleans area. The simulation was run at multiple resolutions so you could get the coastal effects and also track more broadly in the Gulf. >>: How much can you zoom? >> Paul Navratil: On this video itself, or just in general? You can zoom in to a really fine-grained level. Now, the simulation has granularity limits, because it's a Gulf-scale simulation, so the coastal elements are rather coarse. But the simulation itself was pretty accurate in terms of where the oil was going. And this is in the spirit of what we do with NOAA during hurricane season. We have several hundred thousand hours of compute time set aside so that when the plane flies through a hurricane, they can take those immediate readings and then run what's called an ensemble simulation, where they run about a dozen different hurricane simulations with slight parameter tweaks or slightly different implementations simultaneously, and that's where they get that cone of probability from -- they just average the simulation results together (a toy sketch of that averaging appears after this passage). Back when Ike came through in 2008, our supercomputer actually predicted at that moment that the storm was going to go over Austin, which caused the UT football game to be moved a month later, and then the storm took a right turn and missed us entirely. So it was our claim to fame that we got the game cancelled. We've also studied H1N1 -- not only the molecule itself and how the virus attacks cells, but we've also looked at the epidemiological effects, and this is a nice Web interface that Greg Johnson on my staff has developed. That part really isn't exciting. So here -- I think we're getting a little choppy -- you can see the counties of Texas, and once you zoom in to a particular point, you can look at the results of the simulation. I wonder if I broke things. Yep. So you can see the counties changing color by how much infection is occurring at a particular time, and you can see how it radiates out across the state from a particular input. So that was a disease breaking out in Travis County, where Austin is, and how it slowly follows the population around the state. This is something that was commissioned by the Texas Department of Health, and they actually want to use this to track Texas epidemics and to prepare for future ones. What you saw there was the graphic output of amounts of antivirals, numbers of folks who are susceptible to the disease, who have recovered, and the mortality rates. So this is a really powerful way for those folks to learn more about what's going on in terms of Texas health. Okay. We've also worked on high resolution mantle convection -- that work was featured on the cover of Science not too long ago -- and it allowed us to then, with the same team, do a simulation of the Tohoku earthquake. What you'll see here -- this is work that Greg Abram did -- is the seismic wave propagation that originates in Japan, in the corner here. So there's Japan. I assume this is right. Looks like the screen froze; I think the system's having a little problem with it. The video should be smoother. But what you'll see is that as the earthquake waves radiate out -- watch in the center here -- the waves actually hit the core of the earth and reflect back up, and you can see that in the simulation eventually. And so here it comes. There's the shadow of it. Right there.
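Returning to the ensemble forecasting mentioned above, here is a toy sketch of the averaging that produces a consensus track, with the spread across members playing the role of the widening cone. This is not the operational NOAA code, and all the tracks here are synthetic.

    import numpy as np

    rng = np.random.default_rng(0)
    n_members, n_hours = 12, 72                       # "about a dozen" members, 72-hour track

    # synthetic tracks: lat/lon drift per hour, each member nudged by accumulating noise
    base = np.cumsum(rng.normal([0.15, 0.10], 0.02, size=(n_hours, 2)), axis=0)
    tracks = base + rng.normal(0, 0.05, size=(n_members, n_hours, 2)).cumsum(axis=1)

    mean_track = tracks.mean(axis=0)                  # the consensus (averaged) track
    spread = tracks.std(axis=0).max(axis=1)           # grows with lead time -> cone width

    print("consensus 72-hour displacement (deg):", np.round(mean_track[-1], 2))
    print("position spread at 24/48/72 h (deg):", np.round(spread[[23, 47, 71]], 2))

The averaged track is the forecast centerline, and because the member trajectories diverge over time, the spread (and so the cone) widens with lead time.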
And so that's part of the reverberating effects of the original quake -- it really rings the earth like a bell. And you can see the waves propagating all the way out: through Alaska, down the western coast of the U.S., through Greenland, into Russia. Okay. And this is another piece of visualization from that NARA project with the National Archives. This is 3-D visualization of 3-D imagery that the National Park Service has, plus photographic tours, so it allows them to see where those particular files are located. So to summarize, the trend is that visualization systems track HPC systems, both at the large scale and at the small scale. The reason we could build a system like Stallion or the touch display is the power of individual workstations nowadays. And vis systems must be able to scale to HPC size, either in the raw hardware or in the software that has to run on the visualization systems -- so, for instance, a multi-threaded Mesa or other software rendering package would be excellent to develop. And advances in the GPUs themselves, not only in compute, where we can move the visualization algorithms onto the GPU, but just in terms of their display power, allow us to really increase the density. Although, as we increase the density, it limits the power behind each display. So for expanding a single window, like Bing Maps or Google Maps, that's fine, but for high-intensity video streaming that's either CPU- or GPU-intensive, a single node does show its limitations. But that power allows you to trade off: you can design your system either with many nodes, to make it a proper compute resource in addition to a display, or you can minimize the nodes to reduce the cost and still have a very large format display. For instance, if we were to redo Stallion today, we could span anywhere from seven to 38 nodes, depending on the amount of compute power we wanted behind the display. And so for future work, really as a community we need to address the algorithmic inefficiencies in the visualization stack. Memory efficiency: VTK still has an annoying habit of moving the dataset into an unstructured grid even when it doesn't have to, especially when there's a structured-grid algorithm that can be more efficient -- the raw VTK library may still do the dumb thing. Both the VisIt and ParaView teams have reimplemented a fair amount of the VTK stack to avoid situations like that and to use the fast case when possible. We're just starting to explore as a visualization community how to include accelerator support. We're already using the GPU for rendering; if we can move more of the algorithm down there and generate the geometry right where it will be rendered, so much the better. And, again, improved software rendering, particularly for HPC systems that don't have graphics support -- right now we really can only use the Mesa library, and that's single-threaded. And in terms of usability for large-scale displays, the windowing interfaces are still limited, but we at TACC are working to fix that. Distributed rendering support: Chromium is the latest effort I'm aware of, and that was at the end of the mid-90s, so that's 20 years old, or almost. And then better user interaction -- that's what we're working on with both touch and touchless interfaces, or using the interfaces that people carry around in their pockets, their smartphones, to actually let us drive the displays. And with that, I'll open it up to questions.
Thanks for your attention. [applause] >>: So I have a question. >> Paul Navratil: Go for it. >>: So I was interested in the utilization of the Kinect. What exactly can you use with that interface? Because on the video, the guy's kind of struggling to -- >> Paul Navratil: Yeah. So that's with the touch. With the Kinect, that allows us, especially on a display like Stallion, to have a centered control point and then reach more of the display. We're also investigating using multiple Kinects to do three-dimensional sensing -- not just in front but also getting a side sense -- and then doing a chain of Kinects to have control across the expanse of Stallion instead of just one in the center. So you could be at the side of the display and still have effective control. Part of the challenge is determining clicks. If you want to grab a window and move it, you have to have some sort of gesture, maybe closing a hand so the surface area is reduced. I think the gesture we're using right now is kind of a Pac-Man: I want to grab this and move it. That's part of the challenge too. There is work out there using gloves or using a color-coding system with cameras; we're trying to simplify the equipment needed. We don't want to have Michael Jackson gloves for everybody. Yeah? >>: When I look at your picture, I see bezels -- he's got bezels. >> Paul Navratil: He does. >>: How much pain do you find users experience having bezels across? >> Paul Navratil: If they notice them at all, they notice them the first time and grow increasingly resilient to them. We're used to looking out at things like windows: if you're looking out of a window and you want to see something behind the frame, you move your head. Here you just move the data. What it doesn't do well is PowerPoint -- if that text is behind the bezel, you're not going to read it. So we have a 4K projection system that gives us the slide show. I did my dissertation defense there on the four megapixel screen. We had another dissertation defense with the four megapixel slides and then the data all across Stallion -- videos, stills, and ten interactive video posters of his work -- and it was very compelling. The bezels also give us some other advantages. One is purchase cost, because these are commodity: we don't have to do any modification, we just stick them up there. Replacement cost: you may have noticed in the video some of those workstations -- those are the same monitors as in Stallion, so we have hot spares we can essentially just pop in there. The only caveat is you want the same manufacturer lot, so the fluorescent backlights have the same color temperature; otherwise that gets messy. And there's the construction and maintenance we do ourselves: if you ever come to Austin and get right up next to Stallion, it's not a precision fit. Those bezels buffer us a little bit so we can put the monitors together ourselves, without shimming them or doing any sort of fine adjustment, whereas with the Barco system we replaced, we had to pay a five-figure contract to have someone drive up from Houston every six months to readjust and realign to account for building shift, thermal expansion, things like that. Another thing is those Barco systems had to be kept at 60 degrees, so the students would come in in their winter jackets and wouldn't want to stay there. The entire walls were painted black to have this immersive experience. Here we've made it warmer. People want to be in there, and because it's all commodity parts, it's used to being in the same environment as humans.
It's made to be in an office. So we can keep the room at a balmy 68 or 70, and people can stay in there. >>: 75 monitors heat it up to 84. >> Paul Navratil: Exactly. There is a thermocline as you move forward. We adjusted the HVAC so it now has a row of registers that blow straight down on the display. But it's 2,900 square feet, so it's pretty reasonable -- it doesn't get too hot in there. >>: We did an 18-panel wall, and we found the amount of power we were sucking in there was substantial. I was wondering about your power consumption. >> Paul Navratil: So I know the stats in terms of amps, just because when we were doing the circuitry on the wall, we wanted to make sure we weren't going to blow anything. The draw per display is determined by the brightness of the monitors. We have it at probably about 1.5 amps, which is just the default turn-on. If you max the brightness, it goes to about 1.8 amps; if you minimize it, you can go down to about 0.8 amps. As part of the renovation we put glass doors in so that people could see the display as they go by in the hall, and we tend to keep the display on with interesting things -- it's great, you literally get these pauses and walk-backs from people wondering what's going on in there. The machines themselves are rated at 8.3 amps. Just a warm boot is, if I remember correctly, maybe 1.2 amps, and even running -- it wasn't LINPACK, but another stress test -- it never got above 2.5 amps. So that 8.3, I'm not sure where they get that, but that's extreme, crazy, doing-something territory. So in normal operation I'd say for the entire system we're probably under 100 amps, which is still a lot, but I think for the size of the display it's pretty reasonable, and compared to a datacenter it's not moving the needle. Good question, though. >>: So right now you have people coming in and using it. >> Paul Navratil: Yes. >>: Is there -- I can imagine to a certain extent that they can't do their typical things on here. They've got special apps and the signup, and is it mostly sort of a showpiece right now, or do they actually get some work done? >> Paul Navratil: There's a lot of outreach that happens here and education coming through. For instance, the electron microscopy lab came in, and they loved it. The PI was a 40-ish woman who had never touched a game controller, and it took 40 minutes to pry it out of her hands because she was zooming around and exploring. Great. The 3-D reconstruction technology, the science they're actually doing, is like a key-framer: they stack these slices together and recreate the neurons in 3-D. That software is single-threaded, runs on a single workstation, and doesn't utilize the 23 nodes of Stallion. And so while we could bring up those images at the flick of a button, which they loved, their actual software didn't run there. What we do as a center -- we do have funding to do advanced user support projects for porting that over. But there's also the last-mile problem of "this isn't in my office, this isn't in my building, I can't walk outside in the Texas summer." So there have to be some motivating factors for that. But we do have folks using it -- we're actually expecting to get more out of the fine arts side, because being able to zoom in to where a brush stroke fills an entire screen, especially of something like The Birth of Venus that's in Italy -- it's a fresco, so it doesn't travel -- doesn't replace interacting with the piece, but you can get a much different experience with it. And then the science follows.
In terms of analysis, rather than big data, I think the multiple-views capability gets more use. For instance, for hurricanes you can put up the path of the storm, the ensemble forecasting, and the storm surge, very large and all together. They can analyze it as a group, and bring in the news media, and it looks great there. [chuckling]. If you go to the Texas website, just the home page, the new campaign video for development was shot in the lab. So we have the university president, we have all the students, and probably about 60 percent of that footage was in the lab. And even though we're a ten-year-old center with top-ranked machines in the world, there are still folks on campus who don't know we exist. So in terms of being an ambassador and outreach piece, it's already worth it for what it can do, and the science we can get out of it is in some ways a bonus. >>: Do you have the [inaudible] any of that super high res video conferencing stuff they've been doing? >> Paul Navratil: Yeah, and so we can do that as well. That's one of the things we wanted to design into our software. One of the lead developers for Sage was on our staff for a time -- he's since gone to Schlumberger -- but in Portland we had a live stream to our display Colt from Australia. It comes back to the bandwidth: we had to set it up with National LambdaRail to get a dedicated 10-gig connection. A whole lot of hoops to jump through rather than just using Skype and calling Aaron on a Sunday. >>: Don't call me on a Sunday. >> Paul Navratil: Exactly. So in terms of high-stream data, the interesting take-away from that experiment is that the compressed video actually looked much better than the raw. We did compressed and raw HD streams, and just the amount of packet loss in between made the raw stream really noticeably janky -- yeah, that's a technical term -- where the compression actually recovered from that a bit. >> Aaron Smith: All right. No more questions. Let's thank our speaker one more time. >> Paul Navratil: Thank you very much. I appreciate it. [applause]