Meshing with Grids: Toward functional abstractions for grid-based visualization

Rita Borgo (University of Leeds), David Duke (University of Leeds), Malcolm Wallace (University of York), Colin Runciman (University of York)

Abstract

A challenge for grid computing is finding abstractions that separate concerns about what a grid application must achieve from the specific resources on which it might be deployed. One approach, taken by a range of toolkits and APIs, is to add further layers of abstraction above the raw machine and network, allowing explicit interaction with grid services. The developers' view, however, remains that of a stateful von Neumann machine, and the interleaving of grid services with domain computations adds complexity to systems. An alternative is to replace this tower of abstraction with a new kind of programming structure, one that abstracts away from complexities of control and location while at the same time adapting to local resources. This paper reports early results from a project exploring the use of functional language technologies for computationally demanding problems in data visualization over the grid. Advances in functional language technology mean that pure languages such as Haskell are now feasible for scientific applications, bringing opportunities and challenges. The opportunities are the powerful types of 'glue' that higher-order programming and lazy evaluation make possible. The challenge is that structures that work well within stateful computation are not necessarily appropriate in the functional world. This paper reports initial results on developing functional approaches to an archetypal visualization task, isosurfacing, that are a preliminary step towards implementing such algorithms on a polytypic grid.

1 Introduction

Distribution and parallelism are inherent properties of grid computing environments, and grid programming requires attitudes and skills that go beyond those of traditional sequential, or even parallel and distributed, programming. Programmers need to cope not only with the sharing of resources but also with computation in an environment characterised by heterogeneous and dynamic resources. Resources allocated to a program may vary between executions, and in some cases may change during execution. Managing this complexity ideally requires a combination of run-time support systems and high-level abstractions that allow the programmer to cleanly separate application concerns from grid interfaces. Building such layers of service abstraction is an approach that has served computing well in the past, giving developers reusable domain-independent blocks for building an application. For example, middleware and libraries such as Globus, OpenGL, and VTK are likewise abstractions that provide high-level access to lower-level services (HPC tasks, graphical rendering, and visualization, respectively).

Data visualization is an application domain that has a close affinity with grid computing for two reasons: large-scale datasets are a challenging computational problem, and visualization is frequently an important component of grid computing applications. A number of architectures have been proposed and developed for data visualization, including spreadsheets, relational databases, spray-rendering, scene graphs, and pipelines. They provide a layer of application-oriented services on which problem-specific visualization tools can be constructed. Although some of these architectures can be used in a grid environment
(e.g. Cactus [1]), this is by explicit use of low-level services. Of the approaches explored to date, the pipeline model has found the most widespread use. It underlies the implementation of well-known systems such as AVS, SCIRun, and VTK [16], and also serves as a conceptual model for visualization workflow. For the pipeline model, services provide the capability to organize visualization operations within a dataflow-like network. Some pipelined systems extend the basic model with demand-driven evaluation and streaming of dataset chunks, again frozen into the service layer. Streaming [12] is an enrichment of the basic model that allows a pipeline to pass datasets in chunks. For scientific data, such chunks are usually spatially contiguous subsets of the full extent. Some algorithms, for example Marching Cubes [14], can operate on individual chunks in isolation. Others, for example surface reconstruction, require access to the full dataset: the dataset may be passed as a sequence of chunks, with algorithms working downstream and upstream on different sections of the pipeline.

However, this layered approach fixes design decisions associated with the services, without regard for the operations that are implemented in terms of those services. Pipeline services provide a lazy, dataflow-like model, but client operations are defined as a separate layer of stateful computation. While pipeline capabilities have advanced, both the services and the algorithms that use those services continue to be implemented using imperative languages, usually C or C++. The underlying computational model is call-by-value parameter-passing, yet the way to access services from an application is conceptually call-by-need. In contrast, non-strict functional languages such as Haskell [7] use a call-by-need evaluation strategy in which function arguments are only evaluated to the extent they are demanded (if at all). Apart from closely matching the pipeline model, this strategy also provides a 'new kind of glue' [8] for assembling programs from components. Recent work on functional languages has produced new forms of generic programming, including polytypic functions (GH [3]) and type-generic frameworks (SYB [10]). In Generic Haskell (GH), a polytypic function captures a family of polymorphic functions in a single, inductive, and typed definition. Instances of the family for specific types can be generated automatically by the GH compiler. In a grid context, polytypic definitions may support single functions that capture patterns of computation over a family of data representations.

Figure 1: (a) Marching cubes applied to a 2D dataset (e.g. marching squares) and the extracted isocontour; (b) marching cubes applied to a 3D dataset and the extracted isosurface.

This paper sets out initial findings of work aimed at using functional technologies for data visualization over the grid. Implementations of 'lazy' functional languages have advanced significantly over the last decade. They now have well-developed interfaces to low-level services such as graphics and I/O. In this paper we take isosurfacing as an archetypal visualization task. We illustrate how pipelining and demand-driven evaluation become naturally integrated within the expression of an algorithm. The resulting implementations have a pattern of space utilization quite different to their imperative counterparts, occupying an intermediate point between purely in-core and out-of-core approaches.
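To make the call-by-need strategy mentioned above concrete before turning to the algorithm, the following toy fragment (ours, purely illustrative and not part of the visualization code) shows evaluation being driven by demand: the list of squares is conceptually unbounded, but only the three elements actually consumed are ever computed.

    -- Illustrative only: demand-driven evaluation in Haskell.
    squares :: [Integer]
    squares = map (^2) [1 ..]      -- an unbounded list of squares

    firstThree :: [Integer]
    firstThree = take 3 squares    -- [1,4,9]; no further squares are computed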
Section 2 revisits the basic marching cubes algorithm for surface extraction, using the lazy functional language Haskell [7]. Through a series of refinements, we show how pipelining and demand-driven evaluation allow the use of memoization to improve performance. Section 2.5, on evaluation, gives particular attention to the space performance of our implementation. The lazy streaming approach set out here features low memory residency, even with larger datasets. Section 3 discusses related research. The work reported here is a first step in a much larger programme of work, and in Section 4 we summarize the achievements to date.

2 Marching Cubes, Functionally

Without giving a full tutorial on Haskell, we need to introduce some key aspects of functional languages, for which we use the classic Marching Cubes algorithm as an exemplar. Marching Cubes is a technique widely used in the visualization field to represent 3D scalar fields in terms of approximating polygonal surfaces. Given a scalar field F and a constant value c, the locus of points x ∈ R³ that satisfy the equation F(x) = c is an isosurface of value c. Given a threshold value c, the marching cubes algorithm proceeds through the scalar field, testing the corners of each voxel (or cube) as being either above or below the threshold. If one or more corners of the cube have values less than the threshold, and one or more have values greater than this value, the voxel must contribute some component of the isosurface. By locating which edges of the cube are intersected by the isosurface, it is possible to determine the polygons that must be created to represent the part of the isosurface passing through the voxel. The result of the marching cubes algorithm is a surface that approximates the isosurface for the given threshold. Figure 1 illustrates the algorithm for the 2D and 3D cases respectively. Examples of 2D scalar fields are temperature, pressure and humidity in a meteorological context; examples of 3D scalar fields can be taken from the medical field, like CT scans of the human skull or other body parts.

We first implement the algorithm in the standard fashion, iterating through an array of sample values, then refine the implementation into a suite of lazily streaming variations. These illustrate two of the main benefits of laziness: on-demand processing (permitting fine-grained pipelining of input and output data), and automatic sharing of already-computed results. In the following sections we introduce some of the early results presented in more detail in [5]; we then explore how our approach can usefully fit within the grid context.

2.1 Ordinary, array-based algorithm

First, we explore a straightforward representation of the dataset as a three-dimensional array of sample values.

    type XYZ = (Int,Int,Int)
    type Num a => Dataset a = Array XYZ a

These type definitions declare synonyms for the actual array representation. Concrete type names are capitalised; for instance the Array index domain type is XYZ. The type variable (lower-case a) in the range of the array indicates that the type of the samples themselves is generic (polymorphic). The predicate Num a constrains the polymorphism: samples must have arithmetic operations defined over them. Thus, we can reuse the algorithm with bytes, signed words, floats, complex numbers, and so on, without change.
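As a small illustration of this polymorphism (ours, not the paper's; listArray is the standard constructor from Data.Array, and we gloss over the Num context on the synonym), the same Dataset shape can hold Float samples or Int samples without any change to the algorithm:

    import Data.Array (listArray)

    -- A 2x2x2 grid of Float samples ...
    tinyF :: Dataset Float   -- i.e. Array XYZ Float
    tinyF = listArray ((1,1,1),(2,2,2)) [0.0, 0.5, 0.5, 1.0, 0.5, 1.0, 1.0, 1.5]

    -- ... and the same grid shape holding Int samples.
    tinyI :: Dataset Int
    tinyI = listArray ((1,1,1),(2,2,2)) [0, 1, 1, 2, 1, 2, 2, 3]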
    isosurface :: Num a => a -> Dataset a -> [Triangle b]

This type declaration (signature) of the Marching Cubes isosurface function shows that it takes two arguments, a threshold value and the dataset, and computes from them a sequence of triangles approximating the surface. The triangles can be fed directly into e.g. OpenGL for rendering. The full visualization pipeline can be written:¹

    pipeline t = mapper view . normalize . isosurface t . reader

¹ Our Haskell implementation is actually built directly on the HOpenGL binding, so the mapping phase is implemented slightly differently, via a function that is invoked as the GL display callback.

Here the dot (.) operator means pipelined composition of functions. The last function in the chain is applied to some input (a filename), its results are fed to the previous function in the chain, whose results are fed on in turn, and so on. Since this operator is the essence of the pipeline model, let's look briefly at its definition:

    (.) :: (b->c) -> (a->b) -> a -> c
    (f . g) x = f (g x)

Dot takes two functions as arguments, with a third argument being the initial data. The result of applying the second function to the data is used as the argument to the first function. The type signature should help to make this clear: each type variable, a, b, and c, stands for an arbitrary (polymorphic) type, where every occurrence of a must be the same, but a and b may be different. Longer chains of these compositions can be built up, as we have already seen in the earlier definition of pipeline.

Dot is our first example of a higher-order function. From the very name "functional language" one can surely guess that functions are important. Indeed, passing functions as arguments, and receiving functions as results, comes entirely naturally. A function that receives or returns a function is called higher-order. Shortly, we will need another common higher-order function, map, which takes a function f and applies it to every element of a sequence:

    map :: (a->b) -> [a] -> [b]
    map f []     = []
    map f (x:xs) = f x : map f xs

This definition uses pattern-matching to distinguish the empty sequence [] from a non-empty sequence whose initial element is x, with the remainder of the sequence denoted by xs. Colon (:) is used both in pattern-matching and to construct a new list.

Now to the algorithm itself. We assume the classic case table, either hard-coded or generated by the Haskell compiler from some specification. Full details of these tables are not vital to the presentation and are omitted; see [14] for example. Marching Cubes iterates through the dataset from the origin. At every cell it considers whether each of the eight vertices is below or above the threshold, treating this 8-tuple of Booleans as a byte-index into the case table. Having selected from the table which edges have the surface passing through them, we then interpolate the position of the cut point on each edge, and group these points into threes as triangles, adding in the absolute position of the cell on the underlying grid.

    isosurface threshold sampleArray =
        concat [ mcube threshold lookup (i,j,k)
               | k <- [1 .. ksz-1]
               , j <- [1 .. jsz-1]
               , i <- [1 .. isz-1] ]
      where
        (isz,jsz,ksz)  = rangeSize sampleArray
        lookup (x,y,z) = eightFrom sampleArray (x,y,z)

In Haskell, application of a function to arguments is by juxtaposition, so in the definition of isosurface the arguments are threshold and sampleArray. The standard array function rangeSize extracts the maximum co-ordinates of the grid.
The larger expression in square brackets is a list comprehension², and denotes the sequence of all applications of the function mcube to some arguments, where the variables (i,j,k) range over (or are drawn from) the given enumerations. The enumerators are separated from the main expression by a vertical bar, and the evaluation order causes the final variable i to vary most rapidly. This detail is of interest mainly to ensure good cache behaviour, if the array is stored with the x-dimension first. The comprehension can be viewed as equivalent to nested loops in imperative languages. The result of computing mcube over any single cell is a sequence of triangles. These per-cube sequences are concatenated into a single global sequence by the standard function concat.

² It bears similarities to Zermelo-Fraenkel (ZF) set comprehensions in mathematics.

Now we look more closely at the data structure representing an individual cell. For a regular cubic grid, this is just an 8-tuple of values from the full array.

    type Cell a = (a,a,a,a,a,a,a,a)

    eightFrom :: Array XYZ a -> XYZ -> Cell a
    eightFrom arr (x,y,z) =
        ( arr!(x,y,z),       arr!(x+1,y,z)
        , arr!(x+1,y+1,z),   arr!(x,y+1,z)
        , arr!(x,y,z+1),     arr!(x+1,y,z+1)
        , arr!(x+1,y+1,z+1), arr!(x,y+1,z+1) )

Finally, to the definition of mcube:

    mcube :: a -> (XYZ->Cell a) -> XYZ -> [Triangle b]
    mcube thresh lookup (x,y,z) =
        group3 (map (interpolate thresh cell (x,y,z))
                    (mcCaseTable ! bools))
      where
        cell  = lookup (x,y,z)
        bools = toByte (map8 (>thresh) cell)

The cell of vertex sample values is found using the lookup function that has been passed in. We derive an 8-tuple of booleans by comparing each sample with the threshold (map8 is a higher-order function like map, only over a fixed-size tuple rather than an arbitrary sequence), then convert the 8 booleans to a byte (bools) to index into the classic case table (mcCaseTable). The result of indexing the table is the sequence of edges cut by the surface. Using map, we perform the interpolation calculation for every one of those edges, and finally group the interpolated points into triples as the vertices of triangles to be rendered. The linear interpolation is standard:

    interpolate :: Num a => a -> Cell a -> XYZ -> Edge -> TriangleVertex
    interpolate thresh cell (x,y,z) edge =
        case edge of
          0  -> (x+interp, y, z)
          1  -> (x+1, y+interp, z)
          ...
          11 -> (x, y+1, z+interp)
      where
        interp = (thresh - a) / (b - a)
        (a,b)  = selectEdgeVertices edge cell

Although interpolate takes four arguments, it was initially applied to only three in mcube. This illustrates another important higher-order technique: a function of n arguments can be partially applied to its first k arguments; the result is a specialised function of n − k arguments, with the already-supplied values 'frozen in'.
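As a toy illustration of partial application (ours, separate from the algorithm):

    plus :: Int -> Int -> Int
    plus x y = x + y

    increment :: Int -> Int
    increment = plus 1            -- plus partially applied to its first argument

    incrementAll :: [Int] -> [Int]
    incrementAll = map increment  -- incrementAll [1,2,3] == [2,3,4]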
2.2 Factoring Common Design Patterns

The implementation outlined so far is a naive and quite straightforward "functional translation" of the traditional marching cubes algorithm. As for the original C implementation, the places where improvements can be made are easy to spot: (1) the entire dataset is assumed to be loaded in memory; (2) common behaviours like threshold comparison and interpolant computation are not factored out and shared between adjoining cells; (3) ambiguous cell configurations are not considered. The ideal solution to the listed issues would intuitively be an implementation of the algorithm that takes care of all three kinds of problem simultaneously. However, ideal solutions are often tailored to specific computational resources, a policy that is not affordable in a multi-faceted environment like the grid. Experience shows that, rather than one holistic solution, it is often preferable to have an amenable suite of solutions capable of efficiently providing good trade-offs between results and available computational power: different implementations for different environments. However, the ability to swap between different versions of the same algorithm relies on identifying crucial patterns of behaviour within the algorithm itself. Exploiting the abstraction and expressive power of Haskell in the context of the marching cubes algorithm, we have outlined several different implementations that in turn take care of memory issues, when the entire dataset cannot be loaded into memory, and sharing of computation, when the same computation is carried out on the same data more than once.

2.2.1 Stream based algorithm

When dealing with large datasets, the monolithic array data structure presented so far is not feasible; it simply may not fit in core memory. A solution is to separate traversal and processing of the data. Owing to its structure, the marching cubes algorithm only ever needs, at any one moment, a small part of the whole dataset: a single point and 7 of its neighbours suffice, making up a unit cube of sample values. If we compare this with a typical array or file storage format for regular grids (essentially a linear sequence of samples), then the unit cube is entirely contained within a "window" of the file, corresponding to exactly one plane + line + sample of the volume. The ideal solution is to slide this window over the file, constructing one unit cube on each iteration, and dropping the previous unit cube. Figure 2 illustrates the idea.

Figure 2: Sliding a window over a grid.

The datatype representing the dataset is constructed from a lazy sequence of samples, stored along with the bounds of the grid:

    data Num a => Dataset a = D XYZ [a]

Following this idea, our first implementation provides a streamed version of the algorithm as follows:

    isosurfaceS :: (Ord a, Int a, Fract b) =>
                   a -> Dataset a -> [Triangle b]
    isosurfaceS thresh (D size samples) =
        concat (zipWith2 (mcubeS thresh)
                         (cellStream samples)
                         allXYZ)
      where
        cellStream = disContinuities size . mkStream size
        allXYZ = [ XYZ i j k | k <- [0 .. ksz-2]
                             , j <- [0 .. jsz-2]
                             , i <- [0 .. isz-2] ]
        (XYZ isz jsz ksz) = size

    mcubeS :: (Ord a, Int a, Fract b) =>
              a -> Cell a -> XYZ -> [Triangle b]
    mcubeS thresh cell xyz =
        group3 (map (interpolate thresh cell xyz)
                    (mcCaseTable ! byte))
      where
        byte = toByte (mapCell (>thresh) cell)

The sliding window of eight point values (the cell) is extracted from the lazy stream of samples as follows: 8 copies of the datastream are laid side-by-side, and one value from each of the 8 is then repeatedly sliced off and glued together into a cell. We refer to [5] for a more detailed description of the code. Table 2 shows the memory performance of our streamed implementation.
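The window construction itself (mkStream), together with the disContinuities filter that discards windows straddling row and plane boundaries, is described in more detail in [5]. The following is only a rough sketch of the idea under our own naming (we follow the XYZ constructor form used in the streaming code), not the paper's code: the eight cube corners sit at fixed offsets inside a window of one plane + one line + two samples of the row-major stream, and sliding the window forward by one sample yields the next cube.

    -- A sketch only (illustrative names); boundary-straddling windows,
    -- handled by disContinuities in the real code, are not filtered out here.
    mkStreamSketch :: XYZ -> [a] -> [Cell a]
    mkStreamSketch (XYZ isz jsz _) = go
      where
        line  = isz
        plane = isz * jsz
        wsize = plane + line + 2
        go xs
          | length window < wsize = []       -- too few samples left for a cube
          | otherwise =
              ( pick 0,     pick 1,         pick (line+1),       pick line
              , pick plane, pick (plane+1), pick (plane+line+1), pick (plane+line)
              ) : go (tail xs)
          where
            window = take wsize xs
            pick n = window !! n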
Haskell allows us to read data out of a file in this streamed fashion using lazy file I/O. The content of the file appears to the program as a sequence of bytes, extended on demand one byte at a time.³ As for dropping data after it has been consumed, Haskell is a garbage-collected language, so when a datum is no longer referenced by the computation, the memory storing it is recycled automatically.

³ For efficiency, the underlying system may choose to hold variable-size buffers for the file, but crucially, that buffering can be tuned to match available resources of memory, disc, and processor.
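As an illustration of this style of I/O, the sketch below (ours, under our own assumptions: one byte per sample, grid bounds supplied separately; the paper's reader is not necessarily written this way) uses hGetContents, which returns the file contents as a lazily produced list, so samples are only read as the surfacing code demands them.

    import System.IO (IOMode(ReadMode), hGetContents, openFile)
    import Data.Char (ord)

    -- Sketch: build a lazily populated Dataset from a raw 8-bit volume file.
    -- (A production reader would use binary or ByteString I/O instead.)
    readDatasetSketch :: XYZ -> FilePath -> IO (Dataset Int)
    readDatasetSketch size path = do
        h        <- openFile path ReadMode
        contents <- hGetContents h        -- lazy: bytes are read on demand
        return (D size (map ord contents))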
The advantage of call-by-need over call-by-name is that although the evaluation of an item might be delayed until it is needed, it is never repeated, no matter how often the value is used. If we want to share a computation between different parts of the program, we just arrange for the shared value to be constructed in one place, by one expression, rather than constructing it multiple times, which leads to multiple evaluations.

In the streaming version of marching cubes presented so far, we can see that the reading of sample values from file is shared and performed only once. However, by construction, comparison against the threshold value (in mcubeS) is performed eight times for every sample, because on each occasion the sample is at a different vertex position in the cell. Depending on the computational power available, it is sometimes worth allowing redundant computation rather than increasing the complexity of the algorithm or adding extra structure at the expense of memory. However, such a conclusion can often be reached only after a performance comparison of both solutions. As far as algorithmic complexity is concerned, in our implementation this is not an issue: to compute the comparison only once per sample, we just need to do the thresholding against the original byte stream, before it is tupled up into cells, rather than after.

    isosurfaceT :: (Ord a, Int a, Fract b) =>
                   a -> Dataset a -> [Triangle b]
    isosurfaceT thresh (D size samples) =
        concat (zipWith3 (mcubeT thresh)
                         (cellStream samples)
                         (idxStream samples)
                         allXYZ)
      where
        cellStream = disContinuities size . mkStream size
        idxStream  = map toByte . cellStream . map (>thresh)
        allXYZ = [ XYZ i j k | k <- [0 .. ksz-2]
                             , j <- [0 .. jsz-2]
                             , i <- [0 .. isz-2] ]
        (XYZ isz jsz ksz) = size

    mcubeT :: (Int a, Fract b) => a -> Cell a
              -> Byte -> XYZ -> [Triangle b]
    mcubeT thresh cell index xyz =
        group3 (map (interpolate thresh cell xyz)
                    (mcCaseTable ! index))

Taking the notion of sharing-by-construction one step further, we now memoize the interpolation of edges. Recall that, in the result of the mcCaseTable, the sequence of edges through which the isosurface passes may have repeats, because the same edge belongs to more than one triangle of the approximated surface. But in general, an edge that is incident on the isosurface is also common to four separate cells, and we would like to share the interpolation calculation with those cells too. So, just as the threshold calculation was performed at an outer level, on the original datastream, something similar can be done here, building a 12-tuple of possible edges, one entry for each cube edge, adding a per-edge description of how to compute the interpolation:

    type CellEdge a = (a,a,a,a,a,a,a,a,a,a,a,a)

    isosurfaceI :: (Ord a, Int a, Fract b) =>
                   a -> Dataset a -> [Triangle b]
    isosurfaceI thresh (D size samples) =
        concat (zipWith3 mcubeI
                         (edgeStream samples)
                         (idxStream samples)
                         allXYZ)
      where
        edgeStream = disContinuities size . mkCellEdges thresh size
        cellStream = disContinuities size . mkStream size
        idxStream  = map toByte . cellStream . map (>thresh)
        allXYZ = [ XYZ i j k | k <- [0 .. ksz-2]
                             , j <- [0 .. jsz-2]
                             , i <- [0 .. isz-2] ]
        (XYZ isz jsz ksz) = size

    mcubeI :: Num a => CellEdge a -> Byte -> XYZ -> [Triangle a]
    mcubeI edges index xyz =
        group3 (map (selectEdge edges xyz)
                    (mcCaseTable ! index))

This per-edge implementation guarantees that an interpolated vertex is computed only once, so no replicas of the same value are present. When dealing with slow graphics hardware, the possibility of reducing the amount of information sent for rendering (i.e. duplicated primitives like vertices, edges or faces) is worth pursuing.

Up to now we have built three different versions of the same algorithm, able to cope with the issues marked previously. Each version represents an independent, ready-to-execute program and at the same time has the flexibility to be merged with its siblings to generate a single optimized solution. One point worth noting remains: it is well known that in the original marching cubes ambiguous cases can occur, and several efforts have been made in the literature to enhance and generalize the original method to assure topological correctness of the result. Among the available solutions we adopted the approach proposed in [2, 13].

2.3 Functional Patterns

Occurrences of the same computational pattern are easily discernible from the signatures of the implemented functions. The creation of a gluing interface for the suite of algorithm versions is straightforward:

    isosurface :: a -> Dataset a -> [Triangle b]
    mcube :: a -> Cell a -> Format -> Dataset a -> [Triangle b]

We pushed the abstraction a step further, generalizing the interface to other surface fitting techniques, namely marching tetrahedra and contour tracking. Both algorithms fall within the aforementioned specification. The marching tetrahedra algorithm is closely related to marching cubes except that the fundamental sampling structure is a tetrahedron instead of a cube. Extending the marching cubes code presented so far to the marching tetrahedra technique leaves the isosurface signature unchanged, while marching tetrahedra itself looks as follows:

    mtetra :: (Num a, Int a, Fract b) =>
              a -> TetraGrid a -> Cell a -> XYZ -> [Triangle b]
    mtetra thresh g cell lookup =
        group3 (map (interpolate g)
                    (mtCaseTable ! caseIndex lookup cell thresh))

The interpolation function remains unchanged as well, while mcCaseTable is replaced by the mtCaseTable table, containing the possible configurations of the interpolant within a tetrahedral cell. Contour following defines a class of algorithms that, given a cell intersected by the contour (an isosurface in 3D), follow it by walking along its neighbours until the starting point (the surface border) is reached. In our Haskell implementation we have split the contour following algorithm into two main functions: (1) Traverse Seeds, which, given a seed set and a threshold value, searches the seed sets for all the cells that constitute a seed for the given value; (2) Grow Contour, which, given a seed, grows the contour by following the contour path through cells adjacent to the seed. The function signatures are expressed as follows:

    traverseSeeds :: Dataset a -> a -> [Seed a] -> [Triangle b]
    growContour   :: Dataset a -> a -> Seed a -> [Triangle b]

Our implementation of the contour tracking algorithm is built on top of the marching cubes implementation: first, the Seed datatype is introduced to define an arbitrary kind of cell (in the regular case either a cube or a tetrahedron), while growContour internally employs either mcube or mtetra to extract the complete isosurface according to the seed (cell) kind.
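The paper leaves the Seed representation abstract. Purely as an illustration of the shape such a definition could take (a hypothetical sketch of ours, not the paper's code), a seed can record where a cell lies and which kind of cell it is, and "being a seed for a given value" is the usual test that the surface passes through the cell:

    -- Hypothetical sketch only: names and representation are ours.
    data CellKind = Cube | Tetra

    data Seed a = Seed { seedKind :: CellKind
                       , seedPos  :: XYZ
                       , seedCell :: Cell a }

    -- A cell is a seed for a threshold when its corners lie on both sides of
    -- the threshold, i.e. the isosurface passes through the cell.
    isSeedFor :: Ord a => a -> Seed a -> Bool
    isSeedFor thresh s = any (> thresh) corners && any (<= thresh) corners
      where
        corners = cellToList (seedCell s)

    -- Flatten the 8-tuple of corner samples into a list.
    cellToList :: Cell a -> [a]
    cellToList (a,b,c,d,e,f,g,h) = [a,b,c,d,e,f,g,h]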
2.4 Observations

The approach presented so far has several interesting aspects. The availability of a suite of polymorphic versions of the same algorithm allows us to choose the most suitable one for the type of resources available. The figures in Tables 3 and 4 show how performance can change between platforms with similar computational power but different architectures. Moreover, through polymorphic types (Section 2.1), functions can be defined independently of the datatypes available on a specific architecture; type predicates allow developers to set out the minimum requirements that particular types must satisfy. Beyond the scope of this paper, polytypism [9], also known as structural polymorphism on types, has the capability to abstract over data organisation and traversal; e.g. a polytypic marching cubes could be applied to other kinds of dataset organization, like irregular grids. The use of a functional language like Haskell makes the construction of such a suite easy to achieve. Its abstraction power exposes common recursive patterns quite easily. If we consider the class of surface fitting techniques presented, the emerging pattern is made up of a set of Cells (the Dataset), a threshold value, and a fitting technique. The fitting technique itself (marching cubes, marching tetrahedra, trilinear interpolation) is strictly dependent on the Cell kind (cube or tetrahedron), up to a generic and comprehensive definition that culminates in the Seed definition. The generic pattern can then be specialised according to the kind of Dataset (Cell) and, within each implementation, further qualified with respect to specific computational issues. At the same time, the intrinsic property of the language whereby each function is independent of the others eases the process of merging pieces of code coming from different implementations but defined by logically equivalent signatures. In our testing phase, for example, we noticed that the best performance on both machines was achieved by a marching cubes implementation which included both streaming and sharing of the threshold computation.
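Because the variants share logically equivalent signatures, swapping between them amounts to choosing a function value. The following hypothetical fragment (ours; it also assumes the constraint abbreviations used above correspond to the standard Ord, Integral and Fractional classes) illustrates such a selection:

    -- Hypothetical: pick one of the streaming variants by name; either can be
    -- dropped into the surrounding pipeline unchanged because the types agree.
    data Variant = PlainStreaming | SharedThresholding

    chooseSurfacer :: (Ord a, Integral a, Fractional b)
                   => Variant -> (a -> Dataset a -> [Triangle b])
    chooseSurfacer PlainStreaming     = isosurfaceS
    chooseSurfacer SharedThresholding = isosurfaceT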
2.5 Time and Space Profiles

Performance numbers are given for all the presented versions of marching cubes written in Haskell, over a range of dataset sizes (all taken from volvis.org). The relevant characteristics of the datasets are summarised in Table 1, where the streaming window size is calculated as one plane+line+1. Tables 3 and 4 give the absolute time taken to compute an isosurface at threshold value 20 for every dataset, on two different platforms, a 3.0GHz Dell Xeon and a 2.3GHz Macintosh G5 respectively, compiled with the ghc compiler and -O2 optimization. Table 2 shows the peak live memory usage of each version of the algorithm, as determined by heap profiling.

Table 1: Dataset statistics.

    dataset     size           wndw     surface (b)
    neghip      64×64×64       4,16     131,634
    hAtom       128×128×128    16,51    134,952
    statueLeg   341×341×93     116,62   553,554
    aneurism    256×256×256    65,79    1,098,582
    skull       256×256×256    65,79    18,415,053
    stent8      512×512×174    262,65   8,082,312
    vertebra8   512×512×512    262,65   197,497,908

Table 2: Memory usage.

                memory (MB)
    dataset     array    stream.   VTK
    neghip      0.270    0.142     1.4
    hydrogen    2.10     0.550     3.0
    statueLeg   11.0     3.72      15.9
    aneurism    17.0     2.10      28.1
    skull       17.0     2.13      185.3
    stent8      46.0     8.35      119.1
    vertebra8   137.0    8.35      1,300.9

Table 3: Time performance - Intel.

                time (s)
    dataset     array    stream.   VTK
    neghip      1.09     0.44      0.06
    hydrogen    16.7     3.47      0.21
    statueLeg   275.0    17.9      1.09
    aneurism    619.7    28.1      1.73
    skull       626.4    30.1      28.6
    stent8      4530.0   79.7      13.1
    vertebra8   6530.0   277.8     269.2

Table 4: Time performance - PowerPC.

                time (s)
    dataset     array    stream.   VTK
    neghip      1.088    0.852     0.29
    hydrogen    8.638    6.694     0.51
    statueLeg   48.78    34.54     2.78
    aneurism    72.98    54.44     5.69
    skull       79.50    57.19     79.03
    stent8      287.5    154.9     33.17
    vertebra8   703.0    517.1     755.0

On the Intel platform the array-based version struggles to maintain acceptable performance as the size of the array gets larger. We suspect that the problem is memory management costs: (1) the data array itself is large: its plain unoptimised representation in Haskell uses pointers, and so it is already of the order of 5–9× larger than the original file alone, depending on machine architecture; (2) the garbage collection strategy used by the compiler's runtime system means that there is a switch from copying GC to compacting GC at larger sizes, which changes the cost model; (3) the compiler's memory allocator in general uses block sizes up to a maximum of a couple of megabytes, and a single array value that spans multiple blocks will impose an extra administrative burden.

In contrast, the time performance of the streaming version scales linearly with the size of the dataset, outperforming VTK on the largest surfaces (see Table 4). It can also be seen that the memory performance is exactly proportional to the size of the sliding window (plane+line+1). The streaming version has memory overheads too, mainly in storing and retrieving intermediate (possibly unevaluated) structures. However, the ability to stream datasets in a natural fashion makes this approach much more scalable to large problem sets.

Comparing the Haskell implementations with implementations in other languages, something interesting appears. If we consider the figures obtained by running the algorithm on the Mac platform, the streaming Haskell version is actually faster than VTK for the larger surfaces generated by skull and vertebra8. Moving to a different platform, the picture changes, apparently in favour of the VTK implementation. Surprisingly, however, the Haskell implementation still appears competitive. The aspect worth noting is the speed-up gained by VTK, which is much bigger than that of the Haskell implementation, in proportion to the increase in computational power due to a faster processor. We suspect that the difference resides in the compiler-generated code, which could easily be better optimized for an Intel architecture than for a PowerPC.

3 Related Work

The difficulties of working with large volumes of data have prompted a number of researchers to consider whether approaches based on lazy or eager evaluation strategies would be most appropriate. While call-by-need is implicit in lazy functional languages, several
efforts have explored more ad hoc provision of lazy evaluation in imperative implementations of visualization systems, e.g. [11, 4]. In [15] Moran et al. use a coarse-grained form of lazy evaluation for working with large time-series datasets. The fine-grained approach inherent in Haskell not only delays evaluation until it is needed, but also evaluates objects piecewise. This behaviour is of particular interest in graphics and computational geometry, where order of declaration and computation may differ. Properties of our fine-grained streaming approach also match the requirements for data streaming set out in [12]; we refer to [5] for a more detailed discussion of this topic.

Figure 3: Functionally surfaced dataset, coloured to show the age of triangles in the stream generated by the streaming marching cubes Haskell implementation over the neghip dataset.

4 Conclusion

Our purely functional reconstruction of the marching cubes algorithm makes two important contributions. First, it shows how functional abstractions and data representations can be used to factor algorithms in new ways, in this case by replacing monolithic arrays with a stream-based window, and composing the overall algorithm itself from a collection of (functional) pipelines. This is important in the context of grid computing, because a stream-based approach may be more suitable for distribution than one that relies on a monolithic structure. It is also important for visualization, as streaming occupies an important niche between fully in-core and fully out-of-core methods, and the functional approach is novel in that the flow of data is managed on a need-driven basis, without the programmer resorting to explicit control over buffering. Second, the functional reconstruction shows that elegance and abstraction need not be sacrificed to improve performance; the functional implementation is polymorphic in the kind of data defining the sample points, and several of the simple functions used to make the final application are eminently reusable in other visualization algorithms (for example mkStream). Our next step is to explore how type-based abstraction, e.g. polytypic programming, can be used to make the algorithm independent of the specific mesh organization; we would like the one expression of marching cubes to apply both to regular and irregular meshes.

Acknowledgement

The work reported in this paper was funded by the UK Engineering and Physical Sciences Research Council.

References

[1] G. Allen, T. Dramlitsch, I. Foster, N.T. Karonis, M. Ripeanu, E. Seidel, and B. Toonen. Supporting efficient execution in heterogeneous distributed computing environments with Cactus and Globus. In Supercomputing '01: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing, pages 52–52, 2001.

[2] E. Chernyaev. Marching cubes 33: Construction of topologically correct isosurfaces. Technical report, 1995.

[3] D. Clarke, R. Hinze, J. Jeuring, A. Löh, and J. de Wit. The Generic Haskell user's guide, 2001.

[4] M. Cox and D. Ellsworth. Application-controlled demand paging for out-of-core visualization. In Proceedings of Visualization '97, pages 235–ff. IEEE Computer Society Press, 1997.

[5] D. Duke, M. Wallace, R. Borgo, and C. Runciman. Fine-grained visualization pipelines and lazy functional languages. IEEE Transactions on Visualization and Computer Graphics, 12(5), 2006.

[6] R.B. Haber and D. McNabb. Visualization idioms: A conceptual model for scientific visualization systems. In Visualization in Scientific Computing. IEEE Computer Society Press, 1990.

[7] Haskell: A purely functional language. http://www.haskell.org, last visited 27-03-2006.

[8] J. Hughes. Why functional programming matters. Computer Journal, 32(2):98–107, 1989. See also http://www.cs.chalmers.se/~rjmh/Papers/whyfp.html.

[9] Johan Jeuring and Patrik Jansson. Polytypic programming. In J. Launchbury, E. Meijer, and T. Sheard, editors, Tutorial Text, 2nd Int.
School on Advanced Functional Programming, Olympia, WA, USA, 26–30 Aug 1996, volume 1129, pages 68–114. Springer-Verlag, 1996.

[10] R. Lämmel and S. Peyton Jones. Scrap your boilerplate: a practical design pattern for generic programming. ACM SIGPLAN Notices, 38(3):26–37, 2003. Proceedings of the ACM SIGPLAN Workshop on Types in Language Design and Implementation (TLDI 2003).

[11] D.A. Lane. UFAT: a particle tracer for time-dependent flow fields. In Proceedings of Visualization '94, pages 257–264. IEEE Computer Society Press, 1994.

[12] C.C. Law, W.J. Schroeder, K.M. Martin, and J. Temkin. A multi-threaded streaming pipeline architecture for large structured data sets. In Proceedings of Visualization '99, pages 225–232. IEEE Computer Society Press, 1999.

[13] T. Lewiner, H. Lopes, A.W. Vieira, and G. Tavares. Efficient implementation of marching cubes' cases with topological guarantees. Journal of Graphics Tools, 8(2):1–15, 2003.

[14] W.E. Lorensen and H.E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In Proceedings of SIGGRAPH '87, pages 163–169. ACM Press, 1987.

[15] P.J. Moran and C. Henze. Large field visualization with demand-driven calculation. In Proceedings of Visualization '99, pages 27–33. IEEE Computer Society Press, 1999.

[16] W. Schroeder, K. Martin, and B. Lorensen. The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics. Prentice Hall, second edition, 1998.