Data Visualization

Digital Data Visualization
May 1, 2001
Hwan-Seung Yong
Dept. of Computer Science & Eng
Ewha Womans Univ.
hsyong@ewha.ac.kr
Contents
• Background
• Visualization Example
– OLAP
– Data Mining
• Multimedia Data Mining
• Spatial Data Mining
• Text Mining
• New Visual Approach
– Visual ICON Language
– Visual Language
• Future Trend
Database View of Data Visualization
• File Processing:
– Record by record navigation,
• Network/Hierarchical Data Model
– Record based interface using Text
– Records have network/hierarchical structure
• Conceptual Modeling, Database Design
– Entity-Relationship Model
– ER Diagram
• Relational Model
– 2 dimensional Table
– QBE User Interface: 2 dimensional
ERwin Database Designer
Access Query Interface: QBE
Definition of Visualization
• To form a mental vision, image, or picture of something
not visible (an abstraction)
– To make visible to the mind or imagination
– [Oxford Dictionary, 1989]
• Visualization is a method of Computing
– It transforms the symbolic into geometric, enabling researchers to
observe their simulations and computation.
– Enrich the process of scientific discovery
– Foster profound and unexpected insights
– In many fields, it is already revolutionizing the way scientists do
science
– [MCC89]
Scientific Visualization/Goals
• exploration/exploitation of data and information
• enhancing understanding of concepts and processes
• gaining new (unexpected, profound) insights
• making invisible visible
• effective presentation of significant features
• quality control of simulations, measurements
• increasing scientific productivity
• medium of communication/collaboration
Visualization and adjacent disciplines
• Computer Graphics: Efficiency of algorithms (CG) vs
effectiveness of use (V).
• Computer Vision: Mapping from pictures to abstract
description (CV) vs mapping from abstract description to
pictures (V).
• Image Processing: Mapping from data domain to data
domain (IP) vs mapping from data domain to picture
domain (V).
• Art and Design: Aesthetics and style (AD) versus
expressiveness and effectiveness (V).
Kinds of Digital Data
• Atomic Value (Numeric, String, Boolean)
• Multimedia Data
– Sound & Audio, Video, Text
• Complex Data Structure
– Tuple, Set, Array, Stack, Queue, Tree, Graph etc
• Large Set of Data
– Database
• What to visualize?
• Why
• How
New Data Processing Technique
• Object-Oriented/Relational Data Model
– Complex Data: Graph style
• Multimedia:
– Visual Interface is required
– Time/Space/Sound and 3 dimension
• Data Warehousing, OLAP and
– Multi-dimensional Modeling and Cube Browser
• Data Mining
– Visual Interface for Mining
– Visual data mining
• Data pattern analysis
• Clustering
Why Visualization?
• Development of H/W and S/W
– Computer graphics and visualization technology
• Interactive and Windows Age
• Visual programming Language
– Visual Basic, Visual C++ etc.
• Visual ICON language
– Emoticon
• Multimedia and Animation
Scientific Data Visualization
Boxplot Analysis
• Five-number summary of a distribution:
Minimum, Q1, M, Q3, Maximum
• Boxplot
– Data is represented with a box
– The ends of the box are at the first and third quartiles, i.e., the
height of the box is the interquartile range (IQR = Q3 - Q1)
– The median is marked by a line within the box
– Whiskers: two lines outside the box extend to Minimum and
Maximum
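The five-number summary behind a boxplot can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular plotting package: the function name `five_number_summary` and the sample data are assumptions, and the "inclusive" quartile convention is only one of several in common use.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sample of at least two values."""
    data = sorted(values)
    # method="inclusive" treats the sample as the whole population;
    # other quartile conventions give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative data, not from the slides
minimum, q1, median, q3, maximum = five_number_summary(sample)
iqr = q3 - q1  # height of the box
print(minimum, q1, median, q3, maximum, iqr)
```

The box spans Q1 to Q3, the line inside it sits at the median, and the whiskers run out to the minimum and maximum.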
A Boxplot
Visualization of Data Dispersion:
Boxplot Analysis
Data Visualization Systems
• AVS, IBM Visualization Data Explorer, SGI Explorer
• Khoros, SciAn, other PD vis packages
• NetMap
• S-Plus, SPSS, MatLab, Mathematica, MAPLE
• XmdvTool, Xgobi
• Xsauci
From Tables and Spreadsheets to Data
Cubes
• A data warehouse is based on a multidimensional data model which
views data in the form of a data cube
• A data cube, such as sales, allows data to be modeled and viewed in
multiple dimensions
– Dimension tables, such as item (item_name, brand, type), or time(day,
week, month, quarter, year)
– Fact table contains measures (such as dollars_sold) and keys to each of the
related dimension tables
• In data warehousing literature, an n-D base cube is called a base cuboid.
The top most 0-D cuboid, which holds the highest-level of summarization,
is called the apex cuboid. The lattice of cuboids forms a data cube.
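The lattice of cuboids can be made concrete with a small sketch. The fact rows, the dimension names, and the helper `cuboid` below are hypothetical; the sketch also ignores concept hierarchies within a dimension, so it produces exactly one cuboid per subset of the n dimensions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical base facts: (item, quarter, city, dollars_sold).
DIMENSIONS = ("item", "quarter", "city")
FACTS = [
    ("TV", "Q1", "Vancouver", 100),
    ("TV", "Q2", "Vancouver", 150),
    ("PC", "Q1", "Toronto", 200),
    ("PC", "Q1", "Vancouver", 50),
]

def cuboid(group_by):
    """Sum dollars_sold over the dimensions named in group_by."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in FACTS:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)

# The lattice: one cuboid per subset of dimensions, 2**n in total.
lattice = {
    dims: cuboid(dims)
    for k in range(len(DIMENSIONS) + 1)
    for dims in combinations(DIMENSIONS, k)
}

print(len(lattice))         # number of cuboids
print(lattice[()])          # apex cuboid: a single grand total
print(lattice[("item",)])   # one of the 1-D cuboids
```

Here `lattice[DIMENSIONS]` plays the role of the base cuboid and `lattice[()]` the apex cuboid.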
Visualization of OLAP Model using Star
Schema
• time (dimension table)
– time_key, day, day_of_the_week, month, quarter, year
• item (dimension table)
– item_key, item_name, brand, type, supplier_type
• branch (dimension table)
– branch_key, branch_name, branch_type
• location (dimension table)
– location_key, street, city, province_or_street, country
• Sales Fact Table
– Keys: time_key, item_key, branch_key, location_key
– Measures: units_sold, dollars_sold, avg_sales
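A star schema of this shape can be tried out with Python's built-in `sqlite3` module. The sketch below is only an illustration under stated assumptions: it keeps two of the four dimension tables, trims their columns, and uses made-up rows and brand names; the table and column names follow the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
-- Fact table: foreign keys to the dimension tables plus the measures.
CREATE TABLE sales (
    time_key INTEGER REFERENCES time(time_key),
    item_key INTEGER REFERENCES item(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time VALUES (1, 'Q1', 2001), (2, 'Q2', 2001);
INSERT INTO item VALUES (1, 'TV', 'BrandA'), (2, 'PC', 'BrandB');
INSERT INTO sales VALUES (1, 1, 2, 800.0), (1, 2, 1, 1200.0), (2, 1, 3, 1200.0);
""")

# Join the fact table to its dimensions and summarize a measure.
rows = conn.execute("""
    SELECT t.quarter, i.brand, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN time AS t ON s.time_key = t.time_key
    JOIN item AS i ON s.item_key = i.item_key
    GROUP BY t.quarter, i.brand
    ORDER BY t.quarter, i.brand
""").fetchall()
print(rows)
```

Every analytical query follows the same pattern: join the fact table to whichever dimensions are needed, group by dimension attributes, and aggregate a measure.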
A Concept Hierarchy: Dimension (location)
• all: all
• region: Europe, ..., North_America
• country: Germany, ..., Spain (Europe); Canada, ..., Mexico (North_America)
• city: Frankfurt, ... (Germany); Vancouver, ..., Toronto (Canada)
• office: L. Chan, ..., M. Wind (Vancouver)
View of Warehouses and Hierarchies
Specification of hierarchies
• Schema hierarchy
day < {month < quarter; week} < year
• Set_grouping hierarchy
{1..10} < inexpensive
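Both kinds of hierarchy can be sketched as simple mappings. In the sketch below the parent links follow the location example, while the sales figures, the helper names `roll_up` and `price_group`, and the fallback label "other" are assumptions made for illustration.

```python
from collections import defaultdict

# Schema hierarchy for location: city < country < region.
CITY_TO_COUNTRY = {"Frankfurt": "Germany", "Vancouver": "Canada", "Toronto": "Canada"}
COUNTRY_TO_REGION = {
    "Germany": "Europe", "Spain": "Europe",
    "Canada": "North_America", "Mexico": "North_America",
}

def roll_up(totals, parent_of):
    """Aggregate totals one level up the hierarchy."""
    out = defaultdict(int)
    for member, value in totals.items():
        out[parent_of[member]] += value
    return dict(out)

def price_group(price):
    """Set-grouping hierarchy: {1..10} < inexpensive (other groups are not given)."""
    return "inexpensive" if 1 <= price <= 10 else "other"

sales_by_city = {"Frankfurt": 40, "Vancouver": 25, "Toronto": 35}  # made-up figures
sales_by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
sales_by_region = roll_up(sales_by_country, COUNTRY_TO_REGION)
print(sales_by_country, sales_by_region, price_group(7))
```

Drilling down is the reverse operation: it needs the finer-grained totals to be stored or recomputed, because a roll-up discards them.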
Multidimensional Data
• Sales volume as a function of product, month, and region
• Dimensions: Product, Location, Time
• Hierarchical summarization paths
– Product: Industry > Category > Product
– Location: Region > Country > City > Office
– Time: Year > Quarter > Month > Day, and Year > Week > Day
A Star-Net Query Model
• Dimensions and their footprints (each circle is called a footprint)
– Customer Orders: CONTRACTS, ORDER
– Shipping Method: AIR-EXPRESS, TRUCK
– Time: ANNUALLY, QTRLY, DAILY
– Product: PRODUCT LINE, PRODUCT ITEM, PRODUCT GROUP
– Location: CITY, COUNTRY, REGION
– Organization: SALES PERSON, DISTRICT, DIVISION
– Customer
– Promotion
OLAP User Interface: Drilling Down
• Drilling Down to the lowest level of Customer Dimension
Examples: Discovery-Driven Data Cubes
Browsing a Data Cube
• Visualization
• OLAP capabilities
• Interactive manipulation
OLAP (Summarization) Display Using MS/Excel 2000
3D Cube Browser
Data Mining Result Visualization
• Presentation of the results or knowledge obtained from data
mining in visual forms
• Examples
– Scatter plots and boxplots (obtained from descriptive data mining)
– Decision trees
– Association rules
– Clusters
– Outliers
– Generalized rules
Visualization of Association
Market-Basket-Analysis (Association)—Ball graph
Display of Association Rules in Rule Plane Form
Display of Decision Tree (Classification Results)
Output: A Decision Tree for “buys_computer”
• age?
– <=30: student?
• no: no
• yes: yes
– 30..40: yes
– >40: credit rating?
• excellent: no
• fair: yes
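Read as code, the tree is a chain of nested tests. The sketch below is one way to write it down; the function signature and the encoding of the attributes (a numeric age, a boolean student flag, and a rating string) are assumptions, not part of the original figure.

```python
def buys_computer(age, student, credit_rating):
    """Classify one customer by walking the decision tree from the root."""
    if age <= 30:
        # Youngest group: the decision depends on student status.
        return "yes" if student else "no"
    if age <= 40:
        # Middle group: a pure leaf.
        return "yes"
    # Oldest group: the decision depends on credit rating.
    return "yes" if credit_rating == "fair" else "no"

print(buys_computer(25, student=True, credit_rating="fair"))
print(buys_computer(35, student=False, credit_rating="excellent"))
print(buys_computer(50, student=False, credit_rating="excellent"))
```

Each root-to-leaf path corresponds to one classification rule, which is why a tree display doubles as a rule display.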
Visualization of a decision tree in MineSet 3.0
Display of Clustering (Segmentation) Results
C-BIRD: Content-Based Image Retrieval from Digital libraries
Search
• by image colors
• by color percentage
• by color layout
• by texture density
• by texture layout
• by object model
• by illumination invariance
• by keywords
Multi-Dimensional Search in Multimedia Databases
• Color layout
Multi-Dimensional Analysis in
Multimedia Databases
Color histogram
Texture layout
Mining Multimedia Databases
Refining or combining searches
• Search for “blue sky” (top layout grid is blue)
• Search for “blue sky and green meadows” (top layout grid is blue and bottom is green)
• Search for “airplane in blue sky” (top layout grid is blue and keyword = “airplane”)
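Combining searches amounts to taking the conjunction of simple predicates over stored image descriptors. The sketch below assumes a hypothetical `ImageDescriptor` with one dominant color per grid row; it is not C-BIRD's actual data model, and the file names are invented.

```python
from dataclasses import dataclass, field

@dataclass
class ImageDescriptor:
    name: str
    layout: list  # dominant color of each grid row, top to bottom
    keywords: set = field(default_factory=set)

def top_is(color):
    return lambda img: img.layout[0] == color

def bottom_is(color):
    return lambda img: img.layout[-1] == color

def has_keyword(word):
    return lambda img: word in img.keywords

def search(images, *predicates):
    """Return the names of images that satisfy every predicate."""
    return [img.name for img in images if all(p(img) for p in predicates)]

images = [
    ImageDescriptor("a.jpg", ["blue", "blue", "green"], {"meadow"}),
    ImageDescriptor("b.jpg", ["blue", "white", "blue"], {"airplane"}),
    ImageDescriptor("c.jpg", ["grey", "grey", "green"], {"airplane"}),
]

blue_sky = search(images, top_is("blue"))
sky_and_meadow = search(images, top_is("blue"), bottom_is("green"))
airplane_in_sky = search(images, top_is("blue"), has_keyword("airplane"))
print(blue_sky, sky_and_meadow, airplane_in_sky)
```

Refining a search adds a predicate to an existing query, so each refinement can only narrow the result set.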
Multidimensional Analysis of
Multimedia Data
• Multimedia data cube
– Design and construction similar to that of traditional data cubes from relational
data
– Contain additional dimensions and measures for multimedia information, such as
color, texture, and shape
• The database does not store images but their descriptors
– Feature descriptor: a set of vectors for each visual characteristic
• Color vector: contains the color histogram
• MFC (Most Frequent Color) vector: five color centroids
• MFO (Most Frequent Orientation) vector: five edge orientation centroids
– Layout descriptor: contains a color layout vector and an edge layout vector
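A color vector of this kind can be approximated with a coarse histogram. The sketch below is a simplification under stated assumptions: pixels are plain (r, g, b) tuples, each channel is quantized into four bins, and the most frequent bins stand in for the MFC centroids, which a real system would compute differently.

```python
from collections import Counter

def color_vector(pixels, levels=4):
    """Coarse color histogram: quantize each RGB channel into `levels` bins."""
    step = 256 // levels
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)

def most_frequent_colors(histogram, k=5):
    """Top-k histogram bins, used here as a stand-in for the MFC vector."""
    return [color_bin for color_bin, _ in histogram.most_common(k)]

# A tiny made-up "image": three blue pixels, two white, one green.
pixels = [(0, 0, 255)] * 3 + [(255, 255, 255)] * 2 + [(10, 200, 10)]
histogram = color_vector(pixels)
print(histogram)
print(most_frequent_colors(histogram, k=2))
```

Storing only such descriptors keeps the database small and lets colors be treated as ordinary dimensions of a data cube.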
Mining Multimedia Databases in
Mining Multimedia Databases
The Data Cube and the Sub-Space Measurements
• Figure: Group By (by Colour: RED, WHITE, BLUE, Sum), Cross Tab (Colour by Format: JPEG, GIF, with Sum), and the data cube
• Sub-spaces: By Size, By Format, By Colour, By Format & Size, By Colour & Size, By Format & Colour
• Measurement: Sum
• Format of image
• Duration
• Colors
• Textures
• Keywords
• Size
• Width
• Height
• Internet domain of image
• Internet domain of parent pages
• Image popularity
Mining Multimedia Databases
Spatial Relationships from Layout
property P1 on-top-of property P2
property P1 next-to property P2
Different Resolution Hierarchy
Classification in MultiMediaMiner
Mining Associations in Multimedia Data
• Special features:
– Need # of occurrences besides Boolean existence, e.g.,
• “Two red squares and one blue circle” implies theme “air-show”
– Need spatial relationships
• Blue on top of white squared object is associated with brown bottom
– Need multi-resolution and progressive refinement mining
• It is expensive to explore detailed associations among objects at high
resolution
• It is crucial to ensure the completeness of search at multi-resolution space
Text Miner: Feature Extraction example from
IBM Intelligent Miner
Visual Data Mining & Data Visualization
• Integration of visualization and data mining
– data visualization
– data mining result visualization
– data mining process visualization
– interactive visual data mining
• Visual Data Mining: the process of discovering
implicit but useful knowledge from large data sets
using visualization techniques
• Data visualization
– Data in a database or data warehouse can be viewed
• at different levels of granularity or abstraction
• as different combinations of attributes or dimensions
– Data can be presented in various visual forms
Boxplots from Statsoft: multiple variable
combinations
Visualization of data mining results in SAS
Enterprise Miner: scatter plots
Visualization of association rules in
MineSet 3.0
Visualization of cluster groupings in IBM
Intelligent Miner
GeoMiner Visualization Example
Spatial Clustering
Spatial Association
• Association Rules
– isa(X, "Golf Course") -> closeto(X, "Man-Made Channel") (61%, 61%)
– isa(X, "Golf Course") & closeto(X, "Secondary road") -> closeto(X, "Open space") (64%, 78%)
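The two percentages attached to each rule are most likely its support and confidence, although the slide does not say so and does not state which set of objects the support is measured against. Under that assumption, and with made-up spatial objects encoded as sets of predicate strings, the two measures can be sketched as follows; `rule_stats` is a hypothetical helper.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    records: list of sets of predicates, one set per spatial object.
    Support is measured here against all records, which is an assumption.
    """
    n = len(records)
    with_antecedent = sum(1 for r in records if antecedent <= r)
    with_both = sum(1 for r in records if antecedent <= r and consequent <= r)
    support = with_both / n
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence

# Made-up objects; the predicates mimic isa(...) and closeto(...).
records = [
    {"isa:Golf Course", "closeto:Man-Made Channel"},
    {"isa:Golf Course", "closeto:Man-Made Channel", "closeto:Open space"},
    {"isa:Golf Course"},
    {"isa:Park", "closeto:Open space"},
]

support, confidence = rule_stats(
    records, {"isa:Golf Course"}, {"closeto:Man-Made Channel"}
)
print(support, confidence)
```

Visual displays of association rules typically map these two numbers to height, color, or size so that strong rules stand out.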
Data Mining Process Visualization
• Presentation of the various processes of data mining in visual forms so
that users can see
– How the data are extracted
– From which database or data warehouse they are extracted
– How the selected data are cleaned, integrated, preprocessed, and mined
– Which method is selected at data mining
– Where the results are stored
– How they may be viewed
Visualization of Data Mining Processes
by Clementine
Interactive Visual Data Mining
• Using visualization tools in the data mining process to help
users make smart data mining decisions
• Example
– Display the data distribution in a set of attributes using colored sectors
or columns (depending on whether the whole space is represented by
either a circle or a set of columns)
– Use the display to decide which sector should first be selected for
classification and where a good split point for this sector may be
Interactive Visual Mining by Perception-Based Classification (PBC)
Visual ICON Language
• Video Annotation Problem
– In the past, video data was used only once
– Experts annotated it for storage and retrieval
– Today, video is reused again and again
• How should video data be retrieved?
• Limitations of the keyword-based approach
– Does not describe the temporal structure of video
– Is not a semantic representation
• ‘dog’ and ‘German shepherd’
– Does not describe relations between descriptions
• Only ‘man’, ‘dog’, ‘bite’, not “dog bite man”
– Does not scale: the set of new keywords keeps growing
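The loss of relations under keyword annotation is easy to demonstrate. In the sketch below, which is purely illustrative, an annotation is either an unordered set of keywords or an ordered (subject, action, object) triple; the triple form is an assumption used here to stand in for a structured annotation language.

```python
def as_keywords(triple):
    """Flatten a (subject, action, object) annotation into a bag of keywords."""
    return frozenset(triple)

dog_bites_man = ("dog", "bite", "man")
man_bites_dog = ("man", "bite", "dog")

# Keyword annotation cannot tell the two scenes apart ...
same_keywords = as_keywords(dog_bites_man) == as_keywords(man_bites_dog)
# ... while the structured form keeps who did what to whom.
same_structure = dog_bites_man == man_bites_dog
print(same_keywords, same_structure)
```

A query for “dog bite man” over keyword annotations would therefore also return scenes in which the man bites the dog.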
Language for representation of Video content
• ICON Annotation Language, why?
– Quick recognition and browsing of annotation
– Accurate and readable
– Global, international use
• Example
– 'Arnold, an adult male, wears a jacket'
– ‘scene is located inside a bar in United States of America’
– Character action: full body actions, head actions, arm actions, and
leg actions
Language for representation of Video content
• Number of objects
– single object, two objects, or groups of objects
• Media Timeline Editor
– Timeline annotation of Icon sentence
• Icon Space, icon palette
– a utility for constructing and retrieving iconic
sentences
Media Timeline Editor
Icon Space
ICONS used (sample)
MIT Visual Language Project
Some Words: Integration of Text and Visual
Icon
Future Trend
• Animated Visualization vs static visualization
• 3D Visualization vs 2D Visualization
• 3D with Animated Visualization
• Cinematic Technique is becoming more and more important for User Interface
– Lev Manovich, Professor at UCSD
– The Language of New Media, 2000, MIT Press
• Find New metaphor
– Spiral Curve etc.