Digital Data Visualization
May 1, 2001
Hwan-Seung Yong
Dept. of Computer Science & Eng
Ewha Womans Univ.
hsyong@ewha.ac.kr

Contents
• Background
• Visualization Example
  – OLAP
  – Data Mining
    • Multimedia Data Mining
    • Spatial Data Mining
    • Text Mining
• New Visual Approach
  – Visual ICON Language
  – Visual Language
• Future Trend

Database View of Data Visualization
• File Processing:
  – Record by record navigation
• Network/Hierarchical Data Model
  – Record based interface using Text
  – Records have network/hierarchical structure
• Conceptual Modeling, Database Design
  – Entity-Relationship Model
  – ER Diagram
• Relational Model
  – 2 dimensional Table
  – QBE User Interface: 2 dimensional

ERwin Database Designer

Access Query Interface: QBE

Definition of Visualization
• To form a mental vision, image, or picture of something not visible (an abstraction)
  – To make visible to the mind or imagination
  – [Oxford Dictionary, 1989]
• Visualization is a method of Computing
  – It transforms the symbolic into geometric, enabling researchers to observe their simulations and computation.
  – Enrich the process of scientific discovery
  – Foster profound and unexpected insights
  – In many fields, it is already revolutionizing the way scientists do science
  – [MCC89]

Scientific Visualization/Goals
• exploration/exploitation of data and information
• enhancing understanding of concepts and processes
• gaining new (unexpected, profound) insights
• making invisible visible
• effective presentation of significant features
• quality control of simulations, measurements
• increasing scientific productivity
• medium of communication/collaboration

Visualization and adjacent disciplines
• Computer Graphics: Efficiency of algorithms (CG) vs effectiveness of use (V).
• Computer Vision: Mapping from pictures to abstract description (CV) vs mapping from abstract description to pictures (V).
• Image Processing: Mapping from data domain to data domain (IP) vs mapping from data domain to picture domain (V).
• Art and Design: Aesthetics and style (AD) vs expressiveness and effectiveness (V).

Kinds of Digital Data
• Atomic Value (Numeric, String, Boolean)
• Multimedia Data
  – Sound & Audio, Video, Text
• Complex Data Structure
  – Tuple, Set, Array, Stack, Queue, Tree, Graph etc.
• Large Set of Data
  – Database
• What to visualize?
• Why?
• How?

New Data Processing Technique
• Object-Oriented/Relational Data Model
  – Complex Data: Graph style
• Multimedia:
  – Visual Interface is required
  – Time/Space/Sound and 3 dimension
• Data Warehousing and OLAP
  – Multi-dimensional Modeling and Cube Browser
• Data Mining
  – Visual Interface for Mining
  – Visual data mining
    • Data pattern analysis
    • Clustering

Why Visualization?
• Development of H/W and S/W
  – Computer graphics and visualization technology
• Interactive and Windows Age
• Visual programming Language
  – Visual Basic, Visual C++ etc.
• Visual ICON language
  – Emoticon
• Multimedia and Animation

Scientific Data Visualization

Boxplot Analysis
• Five-number summary of a distribution: Minimum, Q1, M, Q3, Maximum
• Boxplot
  – Data is represented with a box
  – The ends of the box are at the first and third quartiles, i.e., the height of the box is the IQR (interquartile range)
  – The median is marked by a line within the box
  – Whiskers: two lines outside the box extend to Minimum and Maximum

A Boxplot

Visualization of Data Dispersion: Boxplot Analysis

Data Visualization Systems
• AVS, IBM Visualization Data Explorer, SGI Explorer
• Khoros, SciAn, other PD vis packages
• NetMap
• S-Plus, SPSS, MatLab, Mathematica, MAPLE
• XmdvTool, Xgobi
• Xsauci

From Tables and Spreadsheets to Data Cubes
• A data warehouse is based on a multidimensional data model which views data in the form of a data cube
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
  – Dimension tables, such as item (item_name, brand, type), or time (day, week, month, quarter, year)
  – Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
• In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

Visualization of OLAP Model using Star Schema
• Sales Fact Table
  – Keys: time_key, item_key, branch_key, location_key
  – Measures: units_sold, dollars_sold, avg_sales
• Dimension tables
  – time: time_key, day, day_of_the_week, month, quarter, year
  – item: item_key, item_name, brand, type, supplier_type
  – branch: branch_key, branch_name, branch_type
  – location: location_key, street, city, province_or_state, country

A Concept Hierarchy: Dimension (location)
• Levels: all > region > country > city > office
• Example values
  – region: Europe, ..., North_America
  – country: Germany, ..., Spain (Europe); Canada, ..., Mexico (North_America)
  – city: Frankfurt, ... (Germany); Vancouver, ..., Toronto (Canada)
  – office: L. Chan, ..., M. Wind (Vancouver)

View of Warehouses and Hierarchies
Specification of hierarchies
• Schema hierarchy
  – day < {month < quarter; week} < year
• Set_grouping hierarchy
  – {1..10} < inexpensive

Multidimensional Data
• Sales volume as a function of product, month, and region
• Dimensions: Product, Location, Time
• Hierarchical summarization paths
  – Product: Industry > Category > Product
  – Location: Region > Country > City > Office
  – Time: Year > Quarter > Month or Week > Day

A Star-Net Query Model
[Figure: star-net with radial lines for Customer Orders, Shipping Method, Customer, Time, Product, Location, Organization, and Promotion; footprints shown include CONTRACTS, ORDER, AIR-EXPRESS, TRUCK, ANNUALLY, QTRLY, DAILY, PRODUCT LINE, PRODUCT ITEM, PRODUCT GROUP, CITY, COUNTRY, REGION, SALES PERSON, DISTRICT, DIVISION]
• Each circle is called a footprint

OLAP User Interface: Drilling Down
• Drilling Down to the lowest level of Customer Dimension

Examples: Discovery-Driven Data Cubes

Browsing a Data Cube
• Visualization
• OLAP capabilities
• Interactive manipulation

OLAP (Summarization) Display Using MS/Excel 2000

3D Cube Browser

Data Mining Result Visualization
• Presentation of the results or knowledge obtained from data mining in visual forms
• Examples
  – Scatter plots and boxplots (obtained from descriptive data mining)
  – Decision trees
  – Association rules
  – Clusters
  – Outliers
  – Generalized rules

Visualization of Association

Market-Basket-Analysis (Association): Ball graph

Display of Association Rules in Rule Plane Form

Display of Decision Tree (Classification Results)

Output: A Decision Tree for “buys_computer”
age?
  <=30: student?
    no: no
    yes: yes
  30..40: yes
  >40: credit rating?
    excellent: no
    fair: yes

Visualization of a decision tree in MineSet 3.0

Display of Clustering (Segmentation) Results

C-BIRD: Content-Based Image Retrieval from Digital libraries
Search
• by image colors
• by color percentage
• by color layout
• by texture density
• by texture layout
• by object model
• by illumination invariance
• by keywords

Multi-Dimensional Search in Multimedia Databases
[Figure: Color layout]

Multi-Dimensional Analysis in Multimedia Databases
[Figure: Color histogram; Texture layout]

Mining Multimedia Databases
Refining or combining searches
• Search for “blue sky” (top layout grid is blue)
• Search for “blue sky and green meadows” (top layout grid is blue and bottom is green)
• Search for “airplane in blue sky” (top layout grid is blue and keyword = “airplane”)

Multidimensional Analysis of Multimedia Data
• Multimedia data cube
  – Design and construction similar to that of traditional data cubes from relational data
  – Contains additional dimensions and measures for multimedia information, such as color, texture, and shape
• The database does not store images but their descriptors
  – Feature descriptor: a set of vectors for each visual characteristic
    • Color vector: contains the color histogram
    • MFC (Most Frequent Color) vector: five color centroids
    • MFO (Most Frequent Orientation) vector: five edge orientation centroids
  – Layout descriptor: contains a color layout vector and an edge layout vector

Mining Multimedia Databases in

Mining Multimedia Databases
The Data Cube and the Sub-Space Measurements
[Figure: data cube over Format (JPEG, GIF), Colour (RED, WHITE, BLUE), and Size, with Cross Tab, Group By, Sum, and Measurement views; sub-spaces By Size, By Format, By Colour, By Format & Size, By Colour & Size, By Format & Colour]
• Format of image
• Duration
• Colors
• Textures
• Keywords
• Size
• Width
• Height
• Internet domain of image
• Internet domain of parent pages
• Image popularity

Mining Multimedia Databases
Spatial Relationships from Layout
• property P1 on-top-of property P2
• property P1 next-to property P2
Different Resolution Hierarchy

Classification in MultiMediaMiner

Mining Associations in Multimedia Data
• Special features:
  – Need # of occurrences besides Boolean existence, e.g.,
    • “Two red squares and one blue circle” implies theme “air-show”
  – Need spatial relationships
    • Blue on top of white squared object is associated with brown bottom
  – Need multi-resolution and progressive refinement mining
    • It is expensive to explore detailed associations among objects at high resolution
    • It is crucial to ensure the completeness of search at multi-resolution space

Text Miner: Feature Extraction example from IBM Intelligent Miner

Visual Data Mining & Data Visualization
• Integration of visualization and data mining
  – data visualization
  – data mining result visualization
  – data mining process visualization
  – interactive visual data mining
• Visual Data Mining: the process of discovering implicit but useful knowledge from large data sets using visualization techniques
• Data visualization
  – Data in a database or data warehouse can be viewed
    • at different levels of granularity or abstraction
    • as different combinations of attributes or dimensions
  – Data can be presented in various visual forms

Boxplots from Statsoft: multiple variable combinations

Visualization of data mining results in SAS Enterprise Miner: scatter plots

Visualization of association rules in MineSet 3.0

Visualization of cluster groupings in
IBM Intelligent Miner

GeoMiner Visualization Example

Spatial Clustering

Spatial Association
• Association Rules
  – isa(X, "Golf Course") -> closeto(X, "Man-Made Channel") (61%, 61%).
  – isa(X, "Golf Course") & closeto(X, "Secondary road") -> closeto(X, "Open space") (64%, 78%).

Data Mining Process Visualization
• Presentation of the various processes of data mining in visual forms so that users can see
  – How the data are extracted
  – From which database or data warehouse they are extracted
  – How the selected data are cleaned, integrated, preprocessed, and mined
  – Which method is selected for data mining
  – Where the results are stored
  – How they may be viewed

Visualization of Data Mining Processes by Clementine

Interactive Visual Data Mining
• Using visualization tools in the data mining process to help users make smart data mining decisions
• Example
  – Display the data distribution in a set of attributes using colored sectors or columns (depending on whether the whole space is represented by a circle or a set of columns)
  – Use the display to decide which sector should first be selected for classification and where a good split point for this sector may be

Interactive Visual Mining by Perception-Based Classification (PBC)

Visual ICON Language
• Video Annotation Problem
  – In the past, video data was used only once
  – Experts annotated it for storage and retrieval
  – Today is the era of repeatedly reused video
    • How should video data be searched?
• Limitations of the keyword-based approach
  – Does not describe the temporal structure of video
  – Not a semantic representation
    • ‘dog’ and ‘German shepherd’
  – Does not describe relations between descriptions
    • Only ‘man’, ‘dog’, ‘bite’, not “dog bite man”
  – Does not scale: the set of new keywords keeps increasing

Language for representation of Video content
• ICON Annotation Language, why?
  – Quick recognition and browsing of annotation
  – Accurate and readable
  – Global, international use
• Example
  – ‘Arnold, an adult male, wears a jacket’
  – ‘scene is located inside a bar in United States of America’
  – Character action: full body actions, head actions, arm actions, and leg actions

Language for representation of Video content
• Number of objects
  – single object, two objects, or groups of objects
• Media Timeline Editor
  – Timeline annotation of Icon sentence
• Icon Space, icon palette
  – a utility for constructing and retrieving iconic sentences

Media Timeline Editor

Icon Space

ICONS used (sample)

MIT Visual Language Project

Some Words: Integration of Text and Visual Icon

Future Trend
• Animated Visualization vs static visualization
• 3D Visualization vs 2D Visualization
• 3D with Animated Visualization
• Cinematic Technique is becoming more and more important for User Interface
  – Lev Manovich, Professor of UCSD
  – The Language of New Media, 2000, MIT Press
• Find New metaphor
  – Spiral Curve etc.
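
Worked Example: Five-Number Summary and Boxplot (sketch)
The Boxplot Analysis slide describes the five-number summary (Minimum, Q1, M, Q3, Maximum) and a box spanning Q1 to Q3. The Python below is a minimal sketch of that idea and is not part of the original lecture. The function names five_number_summary and text_boxplot are hypothetical, and the "inclusive" quartile convention is an assumption, since statistical packages compute quartiles in slightly different ways.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sample of 2+ numbers."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other tools may use other conventions.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def text_boxplot(values, width=41):
    """Draw a one-line boxplot: whiskers '-', box '=', median '|'."""
    lo, q1, med, q3, hi = five_number_summary(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal

    def col(x):
        # Map a data value to a character column in [0, width - 1].
        return round((x - lo) / span * (width - 1))

    row = [" "] * width
    for i in range(col(lo), col(hi) + 1):  # whiskers reach Minimum and Maximum
        row[i] = "-"
    for i in range(col(q1), col(q3) + 1):  # box covers Q1..Q3 (the IQR)
        row[i] = "="
    row[col(med)] = "|"                    # median line inside the box
    return "".join(row)


if __name__ == "__main__":
    sample = [9, 1, 8, 2, 7, 3, 6, 4, 5]
    print(five_number_summary(sample))
    print(text_boxplot(sample, width=9))
```

For the values 1 through 9 this convention gives Q1 = 3, M = 5, and Q3 = 7, so the IQR (the length of the box) is 4.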