Data Visualization - A Very Rough Guide
Ken Brodlie
University of Leeds
SDMIV

What is This Thing Called Visualization?
Visualization
– “Use of computer-supported, interactive, visual representations of data to amplify cognition” (Card, Mackinlay, Shneiderman)
– Born as a discipline in 1987 with publication of NSF Report
– Now widely used in computational science and engineering
[Image: Vis5D]

Visualization – Twin Subjects
Scientific Visualization
– Visualization of physical data
[Image: Ozone layer around earth]
Information Visualization
– Visualization of abstract data
[Image: Automobile web site - visualizing links]

Scientific Visualization – Another Characterisation
Focus is on visualizing an entity measured in a multi-dimensional space
– 1D
– 2D
– 3D
– Occasionally nD
Underlying field is recreated from the sampled data
Relationship between variables well understood – some independent, some dependent
[Image: http://pacific.commerce.ubc.ca/xr/plot.html]
[Image from D. Bartz and M. Meissner]

Scientific Visualization Model
Visualization represented as pipeline:
– Read in data
– Build model of underlying entity
– Construct a visualization in terms of geometry
– Render geometry as image
[Diagram: data → model → visualize → render]
Realised as modular visualization environment
– IRIS Explorer
– IBM Open Visualization Data Explorer (DX)
– AVS

Extending the SciVis Model
The dataflow model has proved extremely flexible
Provides basis of collaborative visualization
– Implemented in IRIS Explorer as the COVISA toolkit
[Diagram labels: data, model, visualize, render, collaborative server, internet, render]
Extensible
– User code introduced as module in pipeline allows computational steering
[Diagram labels: control, simulate, visualize, render]

An e-Science Demonstrator
Emergency scenario: release of toxic chemical
– Simulation launched on Grid resource, steered from desktop using IRIS Explorer
– Collaborators linked in remotely using COVISA toolkit
Dispersion of pollutant studied under varying wind directions
A collaborator links
in over the network

Other Metaphors
Other user interface metaphors have been suggested
Spreadsheet interface becoming popular
– Allows audit trail of visualizations
Jankun-Kelly and Ma

Information Visualization
Focus is on visualizing a set of observations that are multi-variate
Example of iris data set
– 150 observations of 4 variables (length and width of petal and sepal)
– Techniques aim to display relationships between variables

Dataflow for Information Visualization
Again we can express this as a dataflow – but emphasis now is on the data itself rather than an underlying entity
First step is to form the data into a table of observations, each observation being a set of values of the variables
Then we apply a visualization technique as before
[Diagram: data → data table → visualize → render]
Data table (columns are variables, rows are observations):

     A    B    C
1    ..   ..   ..
2    ..   ..   ..

Multivariate Visualization
Techniques designed for any number of variables
– Glyph techniques
– Parallel co-ordinates
– Scatter plot matrices
– Pixel-based techniques
Acknowledgement: Many of the images in the following slides are taken from Ward’s work
Software:
– Xmdvtool (Matthew Ward)
– ...and also IRIS Explorer!
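The information visualization dataflow above (data → data table → visualize → render) can be sketched in a few lines of Python. This is only an illustrative sketch, not how IRIS Explorer or Xmdvtool work: the function names, the text-bar "rendering" and the three records are all assumptions, and the values are made up in the shape of the iris example rather than taken from the real data set.

```python
from typing import Dict, List, Sequence

# Hypothetical variable names and made-up records (NOT the real iris data).
VARIABLES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
RECORDS = [
    {"sepal_length": 5.0, "sepal_width": 3.5, "petal_length": 1.5, "petal_width": 0.2},
    {"sepal_length": 6.5, "sepal_width": 3.0, "petal_length": 4.5, "petal_width": 1.5},
    {"sepal_length": 7.0, "sepal_width": 3.0, "petal_length": 6.0, "petal_width": 2.5},
]


def build_table(records: Sequence[Dict[str, float]],
                variables: Sequence[str]) -> List[List[float]]:
    """'Data table' stage: one row per observation, one column per variable."""
    return [[float(rec[v]) for v in variables] for rec in records]


def visualize(table: List[List[float]]) -> List[List[float]]:
    """'Visualize' stage: scale each value to [0, 1] within its own variable."""
    ranges = [(min(col), max(col)) for col in zip(*table)]
    return [
        [0.5 if hi == lo else (x - lo) / (hi - lo)
         for x, (lo, hi) in zip(row, ranges)]
        for row in table
    ]


def render(scaled: List[List[float]], variables: Sequence[str],
           width: int = 20) -> str:
    """'Render' stage: a crude text bar per value, standing in for an image."""
    lines = []
    for i, row in enumerate(scaled, start=1):
        for name, s in zip(variables, row):
            lines.append(f"obs {i} {name:<12} " + "#" * round(s * width))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(visualize(build_table(RECORDS, VARIABLES)), VARIABLES))
```

Each stage only consumes the output of the previous one, which is the property that lets a modular environment swap in a different visualization technique without touching the table-building step.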

Glyph Techniques
Star plots
– Each observation represented as a ‘star’
– Each spike represents a variable
– Length of spike indicates the value
Variety of possible glyphs
– Chernoff faces
[Image: Crime in Detroit]

Parallel Co-ordinates
Each variate represented as a vertical axis
Axes laid out uniformly
Observation represented as a polyline traversing all M axes, crossing each axis at the observed value of the variate
[Image: Detroit homicide data (7 variables, 13 observations)]

Scatter Plot Matrices
Matrix of 2D scatter plots
– Each plot shows projection of data onto a 2D subspace of the variates
– Order M² plots

The Screen Space Problem
All techniques, sooner or later, run out of screen space
Parallel coordinates
– Usable for up to 150 variates
– Unworkable beyond 250 variates
[Image: Remote sensing (5 variates, 16,384 observations)]

Brushing as a Solution
Brushing selects a restricted range of one or more variables
Selection then highlighted

Clustering as a Solution
Success has been achieved through clustering of observations
Hierarchical parallel co-ordinates
– Cluster by similarity
– Display using translucency and proximity-based colour

Hierarchical Parallel Coordinates
[Image]

Reduction of Dimensionality of Variate Space
Reduce number of variables, preserve information
Principal Component Analysis
– Transform to new coordinate system
– Hard to interpret
Hierarchical reduction of variate space
– Cluster variables where distance between observations is typically small
– Choose representative for each cluster

Using a Dataflow System for Information Visualization
IRIS Explorer used to visualize data from BMW
– Five variables displayed using spatial arrangement for three, colour and object type for others
– Notice the clusters… More later..
Kraus & Ertl

Scientific Visualization – Information Visualization
Scientific Visualization
– Focus is on visualizing an entity measured in a multi-dimensional space
– Underlying field is recreated from the sampled data
– Relationship between variables well understood
Information Visualization
– Focus is on visualizing a set of observations that are multi-variate
– There is no underlying field – it is the data itself we want to visualize
– The relationship between variables is not well understood
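As a concrete instance of visualizing "the data itself", the parallel co-ordinates construction and brushing described earlier can be sketched as follows. This is a minimal sketch under stated assumptions, not Xmdvtool's implementation: the function names are hypothetical, the axes are placed at uniformly spaced x positions in [0, 1], each variate is scaled to its own observed range, and the small table is made-up data.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Made-up table: 4 observations (rows) of M = 3 variates (columns).
TABLE = [
    [1.0, 10.0, 100.0],
    [2.0, 30.0, 300.0],
    [3.0, 20.0, 200.0],
    [4.0, 40.0, 400.0],
]


def normalise_columns(table: List[List[float]]) -> List[List[float]]:
    """Scale every variate to [0, 1] so all axes share one vertical extent."""
    ranges = [(min(col), max(col)) for col in zip(*table)]
    return [
        [0.5 if hi == lo else (x - lo) / (hi - lo)
         for x, (lo, hi) in zip(row, ranges)]
        for row in table
    ]


def polylines(table: List[List[float]]) -> List[List[Point]]:
    """One polyline per observation, crossing axis j at its scaled value.

    Axis j sits at x = j / (M - 1), i.e. the axes are laid out uniformly.
    """
    m = len(table[0])
    xs = [j / (m - 1) if m > 1 else 0.0 for j in range(m)]
    return [list(zip(xs, row)) for row in normalise_columns(table)]


def brush(table: List[List[float]], variate: int,
          lo: float, hi: float) -> List[int]:
    """Indices of observations whose chosen variate lies in [lo, hi]."""
    return [i for i, row in enumerate(table) if lo <= row[variate] <= hi]


if __name__ == "__main__":
    selected = set(brush(TABLE, variate=1, lo=15.0, hi=30.0))
    for i, line in enumerate(polylines(TABLE)):
        tag = "highlight" if i in selected else "dim"
        print(tag, line)
```

A renderer would draw every polyline and then redraw the brushed ones in a highlight colour; the brush works on the original values, so the selection range is expressed in the variate's own units.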