Remote Visualisation of Large Oceanic Datasets

L. J. West, J. Stark, P. D. Killworth, P. Challenor, J. Marotzke
Southampton Oceanography Centre, European Way, Southampton SO14 3ZH, UK

Abstract

The global high-resolution ocean model, OCCAM, has been run by D. Webb and colleagues at the Southampton Oceanography Centre (SOC) for many years. It was configured to resolve the energetic scales of oceanic motion, and its output is stored at the Manchester Supercomputer Centre. Although this community resource represents a treasure trove of potential new insights into the nature of the world ocean, it remains relatively unexploited for a number of reasons, not least of which is its sheer size. Computer visualisation is a powerful way of presenting vast amounts of information in a fashion accessible to the human mind, but the lack of readily available hardware and software tools amenable to the task means that, too often, it is simply not an option. This paper discusses a system being developed at SOC which makes the remote visualisation of very large volumes of data on modest hardware (e.g. a laptop with no special graphics capability) a present reality. The system is enabling our researchers to investigate the unresolved question of oceanic convection and its relationship to large-scale flows, a question which lies at the heart of many current climate change issues.

1. Introduction

The use of large scientific datasets continues to expose unforeseen bottlenecks, pitfalls and other surprises throughout the computer visualisation process (VP). For example, at the beginning of the process it is frequently necessary to store 3-dimensional data fields in a 'chunked' format, in order to spread the burden of reading slices evenly over the data array's axes. This technique (supported by the Hierarchical Data Format) vastly reduces the time taken to perform the average slicing operation on OCCAM data, and reduces network traffic too.

An example from the middle of the VP is the computation of isosurfaces, a staple for consumers of 3D visualisations. This is a computationally expensive task and, although highly parallelisable, it still cannot be executed in real time (i.e. in the blink of an eye), which can render some applications non-interactive.

Another limit is being approached at the end of the VP: the number of model elements (grid points, in the case of OCCAM) along one axis of a dataset is becoming comparable to the number of pixels across a high-resolution display device. Computer-screen resolutions are unlikely to increase significantly in the future, because of the limited resolving power of the human eye, but the relentless march of Moore's Law virtually guarantees that dataset sizes will continue to increase almost indefinitely. Rendering problems such as this are beginning to be addressed in a practical way by open-source visualisation libraries such as VTK from Kitware, which provides a facility for sub-sampling geometric objects.

Each of these problems is compounded when the components of a computer visualisation system exist in different geographical locations. For example, the raw data may reside on a GRID server in one location, the processing cluster used to generate the requested diagnostic fields may reside elsewhere, the computer graphics (CG) engine may be housed at a further site, and a scientific end user (armed with only a modest PC and network connection) could be working almost anywhere in the world.
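To illustrate the chunked-storage technique mentioned above, the short C/C++ sketch below creates a chunked 3-dimensional dataset using the HDF5 C API. It is a minimal sketch only, assuming HDF5 1.8 or later; the file name, dataset name, grid dimensions and chunk shape are illustrative placeholders and do not describe the actual OCCAM archive layout.

#include <hdf5.h>

int main(void)
{
    /* Illustrative dimensions: (depth, latitude, longitude). */
    hsize_t dims[3]  = {36, 1021, 1442};
    /* Chunks chosen so that a slice along any axis touches a
       similar number of chunks, balancing read costs over the axes. */
    hsize_t chunk[3] = {9, 64, 64};

    hid_t file  = H5Fcreate("occam_sample.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(3, dims, NULL);

    /* Dataset-creation property list carrying the chunk shape. */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 3, chunk);

    hid_t dset = H5Dcreate2(file, "temperature", H5T_NATIVE_FLOAT,
                            space, H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* ... write hyperslabs with H5Dwrite as data become available ... */

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}

Because the file is laid out in these blocks, reading a horizontal slice, a vertical section or a small 3D sub-volume all incur broadly comparable I/O costs, rather than one access pattern being catastrophically slower than the others.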
As part of the GODIVA project, a prototype system of this kind is being developed at SOC, enabling remote users to process and visualise vast amounts of OCCAM output hosted by the GADS server at ESSC, a hitherto impossible task. The intermediate stages (scientific processing and isosurface generation) are performed by SOC's 12-processor, 24 GB SGI Onyx300 graphics supercomputer, 'proteus'. The final rendering is performed locally or, optionally, by one of proteus's 1 GB graphics pipes if the client machine's graphics card is not up to the task. In the final stage, the generated isosurfaces are cached on proteus, courtesy of those 24 GB, allowing users to sweep through isosurface values in real time, subject to network constraints. In the following sections, the model data and software system are outlined, before preliminary results are presented, followed by a short discussion of further work and technical issues.

2. OCCAM Ocean Model

The Ocean Circulation and Climate Advanced Model (OCCAM) uses two curvilinear rectangular patches, connected at the Atlantic equator, to span the world ocean. The version used in the present work has 1/4° horizontal resolution and 36 vertical levels. Vertical contours of water-mass properties such as temperature can indicate convection, and vertical velocities, inferred from the horizontal divergence, may represent downwelling. The question of whether these two processes are co-located lies at the heart of many current climate change issues. The prognostic fields required to investigate this matter, temperature and horizontal velocity, are downloaded along with the configuration fields, topography and depth, before being processed into vertical velocities and passed on to the next stage.

The temperature field is processed into a sequence of isosurfaces (approximately 40) between two values of interest, chosen a priori. Isosurface generation is a CPU-intensive task, and the size of the resulting geometry objects may vary considerably, because of the connectivity information required to represent a completely general, possibly multiply-connected or highly convoluted region. Caching the sequence allows the user to flick through the isovalues and get a feel for the structure of the 3D field. Why, it could be asked, is this better, or even different, from scrolling through a movie-style visualisation? There are a number of reasons. Firstly, the cached object can be manipulated, interacted with and viewed from different angles, unlike a movie frame. Secondly, other objects, sheets of vertical w-velocities, for example, can be added to the scene. The total number of different scene configurations is then n^m, rather than the n x m it would be in the case of a movie, where n is the number of cached frames and m is the number of objects in the scene.

3. System Details

Much effort went into choosing an appropriate software platform for such a tool, although once all the requirements were gathered, the number of technologies fit for the task was small. The software had to be 64-bit (to cope with the sheer size of the data, and to access proteus's address space), multithreaded (for future parallelisation) and fast (for efficient processing). Other desirables were platform independence and open source, for portability, distribution and accessibility to the wider scientific community. C++ was adopted because of its speed, ubiquity and descriptive power (i.e. it takes fewer lines of code to do the job).
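As a concrete illustration of the isosurface-caching approach described in Section 2, the C++ sketch below generates a sequence of isosurfaces with Kitware's VTK and keeps deep copies in memory, so that a subsequent isovalue sweep requires no recomputation. This is a minimal sketch only, assuming VTK 6 or later; the input field, isovalue range and function name are illustrative and are not taken from the GODIVA code itself.

#include <vtkSmartPointer.h>
#include <vtkImageData.h>
#include <vtkContourFilter.h>
#include <vtkPolyData.h>
#include <vector>

// Build a cache of isosurfaces between two isovalues chosen a priori.
std::vector<vtkSmartPointer<vtkPolyData> >
CacheIsosurfaces(vtkImageData* field, double tMin, double tMax, int nSurfaces)
{
    std::vector<vtkSmartPointer<vtkPolyData> > cache;

    vtkSmartPointer<vtkContourFilter> contour =
        vtkSmartPointer<vtkContourFilter>::New();
    contour->SetInputData(field);          // e.g. a gridded temperature field

    for (int i = 0; i < nSurfaces; ++i)
    {
        double value = tMin + i * (tMax - tMin) / (nSurfaces - 1);
        contour->SetValue(0, value);       // one isovalue per pass
        contour->Update();                 // the CPU-intensive step

        // Deep-copy the geometry so the cached surface survives
        // the next execution of the filter.
        vtkSmartPointer<vtkPolyData> surface =
            vtkSmartPointer<vtkPolyData>::New();
        surface->DeepCopy(contour->GetOutput());
        cache.push_back(surface);
    }
    return cache;                          // roughly 40 surfaces in the present work
}

Once the cache is populated, sweeping through isovalues is simply a matter of swapping which cached vtkPolyData object is attached to the mapper, which is cheap compared with regenerating the surface.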
The GSOAP library provides access to web services, although it is only distributed in a 32-bit format at present, which inhibits the use of very large files; today's demonstration therefore uses a locally cached server file. 'Locally', in this sense, means local to the processing server, i.e. at Southampton, and definitely not local to the client machine here in Nottingham. The 32-bit problem is discussed in a later section. The Kitware VTK library supplies high-level visualisation functionality and supports a number of useful features, such as parallel processing of isosurfaces and sub-sampling of geometry objects through level-of-detail actors. The entire development environment is open source, has been built on both 32- and 64-bit platforms, and is suitable both for PC users and for users of large datasets (i.e. >2 GB). The Silicon Graphics Vizserver software allows post-rendered graphical output from proteus to be piped across the internet, and supports a number of compression modes up to a ratio of 32:1. This enables remote users to view and manipulate highly detailed and complex scenes by utilising the full graphics capabilities of proteus at Southampton.

4. Results

Consider figure 1. It shows the CPU and memory usage of an end-user 'client' machine without using Vizserver; in other words, the client machine is responsible for displaying the geometry information received across the internet. Naturally, the performance of the application depends very much on the power of the graphics card, and this is revealed in the figure. Here, the client machine has an NVIDIA GeForce4-4800 Titanium (384 MB) graphics card, dual AMD Athlon processors and 2 GB of main RAM, and is running Red Hat 9: hardly low-powered for a PC. The earlier CPU history (not shown) is steady, at levels approximately equal to those at the left of the CPU Usage and % Memory Usage graphs respectively. If the Vizserver software is used, very little changes in these graphs, except for a small amount of CPU usage incurred by manipulating the scene; unsurprisingly, the system-monitor profile then reflects the activity of proteus.

Clearly, there are spikes of activity in the CPU Usage History graph. These spikes reach halfway up the scale, suggesting that the client window is running on one processor (50% CPU on a dual-CPU machine); this is indeed the case. It also appears that their frequency is increasing slightly. Between the spikes, CPU usage levels are the same as before the application began to run, suggesting that the client box is doing nothing as it waits for more information from the main server. The earlier peaks are barely noticeable, whereas the later ones briefly overwhelm the single CPU. This increase in CPU usage is heralded by a gentle increase in memory usage. If the Vizserver software is used, these profiles remain flat for practical purposes, and so are not shown here; this indicates that the variations in CPU and memory usage are due entirely to the changing contents of the graphics window.

As has been stated, the generation of isosurfaces is CPU-intensive, and the process is executed on a single processor of the server, proteus. This bottleneck corresponds to the periods of inactivity on the client machine, i.e. the flat patches between the spikes. Once a surface has been generated, however, a call to render it sends the geometry object 'down the wire' to the client, which takes responsibility for rendering on receipt.
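The client-side rendering step just described can be sketched in a few lines of VTK C++. The level-of-detail actor used here is the VTK facility mentioned in Section 3 for sub-sampling geometry on modest hardware; as before, this is an illustrative sketch (assuming VTK 6 or later), not the GODIVA client itself, and the cloud-point count is an arbitrary example value.

#include <vtkSmartPointer.h>
#include <vtkPolyData.h>
#include <vtkPolyDataMapper.h>
#include <vtkLODActor.h>
#include <vtkRenderer.h>
#include <vtkRenderWindow.h>
#include <vtkRenderWindowInteractor.h>

// Display a geometry object received from the server.
void RenderSurface(vtkPolyData* surface)
{
    vtkSmartPointer<vtkPolyDataMapper> mapper =
        vtkSmartPointer<vtkPolyDataMapper>::New();
    mapper->SetInputData(surface);

    // A level-of-detail actor falls back to a sub-sampled point-cloud
    // representation when the full geometry cannot be drawn interactively.
    vtkSmartPointer<vtkLODActor> actor =
        vtkSmartPointer<vtkLODActor>::New();
    actor->SetMapper(mapper);
    actor->SetNumberOfCloudPoints(50000);

    vtkSmartPointer<vtkRenderer> renderer =
        vtkSmartPointer<vtkRenderer>::New();
    renderer->AddActor(actor);

    vtkSmartPointer<vtkRenderWindow> window =
        vtkSmartPointer<vtkRenderWindow>::New();
    window->AddRenderer(renderer);

    vtkSmartPointer<vtkRenderWindowInteractor> interactor =
        vtkSmartPointer<vtkRenderWindowInteractor>::New();
    interactor->SetRenderWindow(window);

    window->Render();
    interactor->Start();     // hand control to the user for interaction
}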
At first, the client machine is effectively empty, and the geometry object is simply passed straight through to the graphics card, whose activity is not recorded in the system CPU Usage History (which is one of the reasons for having a graphics card in the first place). Eventually, it becomes clear that even 384 MB of texture memory is not sufficient to cope with a scene of this complexity, and subsequent geometry objects must be stored in slower main memory, whose allocation is evidenced by an incline in the % Memory Usage graph. The CPU must now take part in the graphical process, as the graphics card begins to make demands on main memory at every turn. This is the reason for the increase in the size of the CPU Usage spikes.

Figure 1. System load under local rendering.

By the time all the surfaces have been loaded and rendered, and before the scene has even been interacted with, approximately 1.5 GB of graphics data have been transferred from proteus to the client. This is clearly impractical over a low-bandwidth connection (~56 kbps), or on a machine with a low-power graphics card, a small memory, or one that is simply slow. It is important to remember that, even if the user were operating proteus locally, sweeping through isosurfaces could not be performed in real time (i.e. interactively), even by utilising many processors; it is caching that achieves this, at the cost of a longer start-up time.

The increasing frequency of successive spikes is due to the nature of the temperature data on the interval of interest. Colder isosurfaces are generated before warmer ones, i.e. towards the left of the history graph. The colder surfaces tend to span the ocean in a stratified manner and so yield global (and therefore large) geometries, whereas the warmer ones outcrop at the surface, frequently consist of only a few isolated blobs, and therefore have a smaller memory footprint.

Figure 2 shows a temperature isosurface intersecting a 'hedgehog' of vertical w-velocity arrows in the North Atlantic. Large outliers in w can be seen as huge arrows emerging from the surface. These occur at the coast, where the water has nowhere to go but vertically, because w is inferred from the horizontal divergence. The w-field is notoriously difficult to compute accurately, owing to the cancellation of large and similar terms.

5. Further Work

Isosurface Parallelisation
Parallelisation of isosurface generation will improve results for users with higher-bandwidth connections and higher-performance graphics cards, as it will reduce the length of the flat spots in CPU usage for these users. VTK supports this kind of parallelisation very well.

GUI Toolkit
Trolltech's Qt widget library is popular and has proved successful with other GODIVA partners. Drawing on the group's expertise, Qt has been adopted at SOC and will be used in subsequent GUI development.

GSOAP 32-bit Issues
64-bit versions of libraries are increasingly common, but they are far from ubiquitous. It is unfortunate that the GSOAP library, necessary to access the ESSC GADS web server, is at present distributed in a 32-bit format only. There are two possible ways to circumvent this problem. The preferred solution would be to obtain a 64-bit version, either from the GSOAP community or through minimal in-house development. Alternatively, a separate 32-bit server component could be developed which reads the data in <2 GB chunks and streams them through a socket to a software tool derived from the current 64-bit application. A communications toolkit, ACE, appears to be ideal for this purpose; it is available in both 32- and 64-bit versions and is also used successfully by other GODIVA partners.
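To make the proposed workaround concrete, the sketch below shows one possible shape for such a 32-bit server component, written here with plain POSIX sockets for brevity (the real component would more likely be built on ACE, as noted above). The port number, chunk size and file name are illustrative assumptions only, and error handling and partial sends are omitted.

#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <cstdio>

// Illustrative 32-bit server: reads a large dataset in modest chunks
// and streams each chunk over a socket to the 64-bit visualisation tool.
int main()
{
    const int port = 9000;                     // assumed port
    const size_t chunkSize = 64 * 1024 * 1024; // 64 MB, well under the 2 GB limit

    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(port);
    bind(listener, (sockaddr*)&addr, sizeof(addr));
    listen(listener, 1);

    int client = accept(listener, 0, 0);

    std::FILE* f = std::fopen("occam_field.raw", "rb"); // placeholder file
    char* buffer = new char[chunkSize];
    size_t n;
    while ((n = std::fread(buffer, 1, chunkSize, f)) > 0)
    {
        // Each buffer is far smaller than 2 GB, so the 32-bit
        // address space of the server component is never exhausted.
        send(client, buffer, n, 0);
    }

    delete[] buffer;
    std::fclose(f);
    close(client);
    close(listener);
    return 0;
}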
Figure 2. z-plane of w-velocity cutting through an isotherm in the North Atlantic.