Visualization of Multidimensional Multivariate Large Dataset
Presented by: Zhijian Pan
zpan@cs.umd.edu
University of Maryland

Description
Covered papers:
– Alfred Inselberg, Multidimensional Detective
– Ted Mihalisin, Visualizing Multivariate Functions, Data, and Distributions
The problem:
• Visualization and analysis of large datasets with multiple parameters or factors, and of the key relationships among them
• The MDMV problem

Key words explanation
Multidimensional:
– The dimensionality of the independent variables
Multivariate:
– The dimensionality of the dependent variables
Example:
– 3-D volume space + temperature + pressure produces 3D2V data
The data set could be larger than the number of pixels

Four Stages of Development
1st: Graphical representation of either one- or two-variate data, e.g. scatterplot, scatterplot matrix
2nd: Two-dimensional graphics, but encoding multiple parameters, e.g. color, size, shape coding
3rd: High-dimensional graphics, high-speed computation, single display, such as Parallel Coords
4th: Elaboration and assessment of various visualization techniques

MDMV Visualization Category
Broadly categorized into five groups:
– Brushing
– Panel Matrix
– Iconography
– Hierarchical Displays
– Non-Cartesian Displays

Group 1
Brushing
– Direct manipulation of an MDMV visualization display: labeling, enhanced linking
– E.g. brushing a scatterplot matrix

Group 2
Panel Matrix (pairwise 2-D plot, n-D box)
– E.g. Hyperbox: n*n lines, n*(n-1)/2 faces
– Elaboration of the scatterplot matrix
– Adds interactive data navigation (hyperbox cutting)

Group 3
Iconography:
Glyphs: graphical entities which encode MDMV data with shape, size, color, and position.
– E.g. face glyph: size and position of eyes, nose, mouth; curvature of mouth; angle of eyebrows

Group 4
Hierarchical Displays:
– Map subsets of variates onto different hierarchical displays
– Dynamic interactive analysis
– The Ted Mihalisin paper; more details follow

Group 4 (cont’d)
New term: speed = the hierarchical axes
E.g.
Three variables x, y, and z, each with values {0, 1, 2}
x is the fastest axis, z the slowest axis

Group 4 (cont’d)
Visualizing 3 variables:
– 2 independent variables: x, y:
• x = -2, -1, 0, 1, 2
• y = -2, -1, 0, 1, 2
– 1 dependent variable: z = x**2 + y**2
– So, a 2D1V problem
– x fastest, y slowest

Group 4 (cont’d)
3D1V: W = (x**2) * (e**-y) + z
• Top panel speed order: x, y, z
• Bottom panel speed order: z, y, x

Group 4 (cont’d)
What if the number of data points greatly exceeds the number of horizontal pixels assigned to the panel?
Example: 7 independent variables, each with 10 values = 10**7 = 10,000,000 points
Need:
– Hierarchical subspace zooming to reduce dimension

Group 4 (cont’d)
From 7D to 2D:

Group 4 (cont’d)
Example: experimental data visualization:
– Dependent: specific heat
– Independent:
• Fastest: temperature (white): Gaussian peak
• Then alloy concentration (blue): linear increase
• Then magnetic field (red): nonlinear decrease

Group 5
Parallel Coordinates
– So many class presentations have already been done!
– Everybody is already an expert at using it
– What are some basic ideas behind it?
– Cartesian vs. Parallel Coords

Group 5 (cont’d)
A Cartesian line:
– L: x2 = m*x1 + b
– A set of points sampled on this line
On Parallel Coords:
– Each point becomes a line
– The set of points becomes a set of intersecting lines

Group 5 (cont’d)
The intersection point:
The location of the intersection point is important!
– Between the two axes (slope m < 0): inversely proportional (x1 ∝ 1/x2)
– Outside the two axes (slope m > 0, m ≠ 1): directly proportional (x1 ∝ x2)

Group 5 (cont’d)
Application example:
– Aircraft collision checking
– Converting the problem into detecting a four-dimensional geometric intersection
– Collision at (2,2,2,1)

Group 5 (cont’d)
Application example:
– Economic model of a real country
– 8 variables:
• Agriculture
• Fishing
• Mining
• Manufacturing
• Construction
• Government
• Miscellaneous
• GNP

Group 5 (cont’d)
A Least Squares function defines the boundary region in 8-dimensional space
Any point (polygonal line) inside the boundary represents a feasible economic policy for the country

Group 5 (cont’d)
Discoveries:
– No policy would favor Agriculture without also favoring Fishing: (x1 ∝ x2)
– Inverse relationship between Fishing and Mining: resource competition: (x1 ∝ 1/x2)

Notes on the References
Inselberg’s paper:
– 11 citations found on ResearchIndex
– Applications in knowledge discovery, user interfaces, aircraft design, etc.
Mihalisin’s paper:
– Only one citation found

Contribution
Inselberg’s paper:
– Transforms MDMV hyperspace relations into a 2-D geometric pattern problem
– Empirical studies demonstrated its strength in trade-off analysis, discovering sensitivities, and optimization
Mihalisin’s paper:
– Hierarchical technique for visualizing data points greatly exceeding the number of pixels

Critique
Inselberg’s paper:
– No comparison with other MDMV techniques
– No examples supporting the claim that displayed objects can be recognized under projective transformations
Mihalisin’s paper:
– Limited number of values for each variable visualized in one display
– No discussion of potential information loss with a coarse-grained grid

Favorite Sentence
“You can’t be unlucky all the time!”
– Multiple techniques exist for the MDMV visualization problem
– Each has strengths and weaknesses
– Whichever you start with, you can’t be unlucky all the time!
– Integration and collaboration of existing tools remain active research topics.
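
Sketch: hierarchical axis ordering (Group 4)
The "speed" idea from the Group 4 slides can be illustrated with a few lines of Python. This is only a minimal sketch, not the implementation from Mihalisin's paper: the function name hierarchical_order and the dictionary layout are assumptions made for illustration. It lays out the 2D1V example (x fastest, y slowest, z = x**2 + y**2) in left-to-right order along one hierarchical axis.

```python
from itertools import product


def hierarchical_order(values_by_var, speed_order):
    """Return one dict per grid point, in left-to-right display order.

    values_by_var: maps a variable name to its list of sampled values.
    speed_order:   variable names listed fastest first (hypothetical
                   convention chosen for this sketch).
    """
    slow_to_fast = list(reversed(speed_order))
    grids = [values_by_var[name] for name in slow_to_fast]
    # itertools.product varies its LAST argument fastest, so passing the
    # variables slowest-first makes the fastest variable change every step.
    return [dict(zip(slow_to_fast, combo)) for combo in product(*grids)]


# The 2D1V example from the slides: x fastest, y slowest.
values = {"x": [-2, -1, 0, 1, 2], "y": [-2, -1, 0, 1, 2]}
points = hierarchical_order(values, speed_order=["x", "y"])

# Dependent variable plotted above each position of the hierarchical axis.
z_series = [p["x"] ** 2 + p["y"] ** 2 for p in points]
```

With this ordering, x runs through all five of its values before y advances once, so the 25 grid points fill 25 horizontal positions. The same product grows to 10**7 positions for 7 variables with 10 values each, which is why the slides call for subspace zooming.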
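
Sketch: point-line duality in Parallel Coords (Group 5)
The claim in the Group 5 slides that points sampled on a Cartesian line become lines meeting at one point can be checked numerically. The sketch below assumes the X1 axis sits at horizontal position 0 and the X2 axis at position d; the function names and the spacing parameter d are illustrative choices, not taken from Inselberg's paper. Under that assumption, the points of x2 = m*x1 + b map to lines that all pass through (d / (1 - m), b / (1 - m)) when m is not 1.

```python
def dual_point(m, b, d=1.0):
    """Intersection point of the parallel-coordinate lines for x2 = m*x1 + b.

    Assumes axis X1 at horizontal position 0 and axis X2 at position d.
    For m == 1 the lines are parallel and never meet.
    """
    if m == 1:
        raise ValueError("slope 1 gives parallel lines with no intersection")
    return (d / (1 - m), b / (1 - m))


def height_at(x1, x2, t, d=1.0):
    """Height at horizontal position t of the line through (0, x1) and (d, x2)."""
    return x1 + (t / d) * (x2 - x1)


def lines_meet_at_dual_point(m, b, samples, d=1.0, tol=1e-9):
    """Check that every sampled point's line passes through the dual point."""
    t, y = dual_point(m, b, d)
    return all(abs(height_at(x1, m * x1 + b, t, d) - y) < tol for x1 in samples)


# Negative slope: the intersection falls between the two axes (0 < t < d).
t_neg, _ = dual_point(-2.0, 3.0)
# Positive slope below 1: the intersection falls to the right of X2 (t > d).
t_pos, _ = dual_point(0.5, 1.0)
```

The horizontal coordinate d / (1 - m) depends only on the slope, which is why the position of the intersection relative to the axes tells the viewer whether the two variables move together or in opposition.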