Building Mashups By Example Rattapoom Tuchinda, Pedro Szekely

advertisement
City of Charlotte Data Warehousing
and Business Intelligence
and
Building Mashups By Example by
Rattapoom Tuchinda, Pedro
Szekely, and Craig A. Knoblock
by Doris Phillips
6010 Data Integration
UNC Charlotte
Business Decision Making
My
KBU
ORG’s
Which?
(Raper, J., Building a Data Warehouse in a Heterogeneous Tool Environment)
Overview
• Describe current data warehousing/business
intelligence methods at the City of Charlotte
• Present Building Mashups By Example by
Rattapoom Tuchinda, Pedro Szekely, and Craig
A. Knoblock
• Compare data warehousing techniques to
data integration methods for mashups
3
Focus on Processes Common to Data
Warehousing and Data Integration
•
•
•
•
Data Retrieval
Data Modeling
Data Cleaning
Data Integration
Common processes but not the same!
4
Data Warehousing at the City of
Charlotte
5
DISCLAIMER
The views and opinions presented in this paper are solely
those of the Author and do not necessarily reflect those of
Business Support Services Information Technology Division
or of the City of Charlotte. This material is provided for
informational purposes only. City of Charlotte assumes no
responsibility for accuracy of the information in this paper
or from damages caused by implemented the techniques or
methodologies presented herein.
City of Charlotte
Simplified KBU Organization Chart
Mayor
and
City Council
City Manager
Offices and
Commissions
Utilities
Budget
Roads, Streets,
Bridges,
Intersections
Neighborhood
Development
Bus and Rail
Fire
Airport
Police
Human
Resources
Business Support
Services
Finance
Solid Waste
Engineering and
Property
Management
(Raper, J., Building a Data Warehouse in a Heterogeneous Tool Environment)
Sample Data Sources
•
•
•
•
•
•
Accounts Payable
Accounts Receivable
Asset Center
Emerald
Faster
General Ledger
•
•
•
•
•
•
Kronos
Hansen
PeopleSoft
Remedy
Unisys Helpdesk
Utility Billing System
HUB and Spoke Design
SOURCES
TARGET DATA MARTS
DB2
DB
FIN
MS/SS
MART
ETL
MS/
SS
DB
ORA
DATA VAULT
ODS
DV
DTS
SWS
MS/SS
MART
OWB
DB
Oracle
OWB
UTL
ORA
MART
FILES
(Raper, J., Building a Data Warehouse in a Heterogeneous Tool Environment)
Data Vault Components
•
•
•
•
•
Hubs - Key Topics
Links – Relationships
Satellites – Details
Auxiliaries – Support
Implemented as
Relational Tables
• Diagramed Using
Distinctive Shapes
HUB_XXXX
Hub Record Shape
LNK_XXXX
Link Record Shape
Satellite Record Shape
SAT_XXXX_YYYYYY
AUX_XXXX
Auxiliary Record Shape
(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets
Oracle Warehouse Builder 10.2)
10
Hub Tables
• Hub records contain the logical business and
physical keys to the business data and its
context.
• Relatively Stable over Time
• Keys
– Primary Key – Surrogate ID
– Natural Key – Unique Logical Key Fields
(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets
11
Oracle Warehouse Builder 10.2)
Link Tables
• Link records provide information related to relationships
between Hubs and Links.
• The information in link records can and does change over
time.
• May have one or more Satellite Type records
• Keys
– Primary Key – Surrogate ID
– Natural Key – Unique Logical Key Fields
– Foreign Key – HUB or LNK SIDs
(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets
12
Oracle Warehouse Builder 10.2)
Satellite Tables
• Satellite records provide the structure to hold the
context or descriptive type information from
operational systems.
• Maintain these changes over time.
• Related Directly to Hub or Link Records
• Keys
– Primary Key – Surrogate ID
– Foreign Key – HUB or LNK SIDs
(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets
13
Oracle Warehouse Builder 10.2)
Auxiliary Tables
• Auxiliary records contain a variety of cross reference and lookup
descriptions tied to logical business keys.
• Standalone Support Tables Not Directly Linked to HUB, SAT, or LNK
tables.
• Types of Auxiliaries
– Lookups
– Cross References
– External Tables
– ETL Work Tables
• Keys
– Primary Key is NK – Unique Logical Key Fields
– May have FK
(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets
14
Oracle Warehouse Builder 10.2)
Generic Data Vault Schema
SAT
SAT
SAT
SAT
LNK
HUB
HUB
SAT
SAT
AUX
SAT
AUX
AUX
(Raper, J., Phillips, D., User-Managed Metadata: Oracle Application Express Meets
15
Oracle Warehouse Builder 10.2)
Oracle Warehouse Builder Mapping
New Code Capture
(Raper, J., Phillips, D., User-Managed Metadata…)
16
Oracle Warehouse Builder Mapping
New Code Capture Detail
(Raper, J., Phillips, D., User-Managed Metadata…)
17
Data Cleaning
•
•
•
•
•
Convert to UPPER case
Trim blank spaces
Replace NULL within UNK or UNKNOWN
Check for valid values using lookups
Remove duplicates
18
Metadata Capture - Cross Reference
Organization Codes APEX Application
(Raper, J., Phillips, D., User-Managed Metadata…)
19
Data Warehouse Approach
• Extracted data from multiple heterogeneous
sources
• Converted to Data Vault Architecture
• Cleaned data and transformed into desired
format
• Combined data from multiple sources
• Data provided to users for reporting and data
visualizations
20
Building Mashups By Example
by
Rattapoom Tuchinda, Pedro Szekely,
and Craig A. Knoblock
21
Overview
• Mashup: A web application that integrates
data from multiple web sources to provide a
unique service
• Goal: Create a mashup building framework
where an average Internet user with no
programming experience can build Mashups
easily
22
Current Solutions
• Widget Paradigm
– Current Solutions involve selecting, customizing,
and connecting widgets together
• Disadvantages
– As number of widgets gets large, locating the right
widget becomes confusing and time consuming
– Connecting widgets required understanding
programming concepts
23
Widgets – Yahoo Pipes
(Tuchinda, et. al, p. 140)
24
Microsoft Popfly
United States Information
Widget - EDIT
United States Information
Widget - RUN
25
Mashup Building Process
• Data Retrieval
– Extracting data from web pages into a structured source
(table or XML)
• Source Modeling
– Process of assigning the attribute name for each data
column
• Data Cleaning
– Required to fix misspellings and transform extracted data
into the appropriate format
• Data Integration
– Specifies how to combine two or more data sources
together
26
Karma Solution
“The left window is an embedded web browser. The top right window contains a
table that a user would interact with. The lower right window shows options that
the user can select to get into different modes of operation.” (Tuchinda, et. al, p.
140)
27
Data Retrieval – Karma Table
User Selects Value from Page,
List Automatically Populated
(Tuchinda, et. al, p. 140-141)
User Selects Address for Value,
List Automatically Populated
28
Source Modeling - Attributes
Karma Attributes
• Karma compares extracted
data with existing data in its
repository
• Automatically populates
some attributes
• User specifying the correct
attribute
• Users search existing
attributes in data repository
Example
29
Data Cleaning
• Users specify what
data to clean
• Karma tries to pick
up desired
transformations
and populate the
remaining
columns
30
Data Integration
Karma
• Analyzes attributes and
data to determine
possible join conditions
• Suggests existing data
sources in the
repository that can be
linked to the new data
in the table
31
Data Integration
Karma Problems
• Locating the related
sources from the
repository
• Figuring out the query
to combine the new
source and existing
valid sources
32
Data Integration
Karma Solution
• Uses table constraints
• Uses programming
methods and
procedures induced
from user interaction
33
Mashup Building Approach
• Combines most problem areas in Mashup
building into a unified interactive framework
that requires no widgets
• Allows users with no programming
background to easily create Mashups by
example
34
Conclusions
Data Warehouse
• Data from enterprise
applications and maintained
sources
• Historical data for trending
and analysis
• Extract, Transform, and
Load
• Often Strategic information
Data Integration / Mashup
• Data from web that may
not be maintained or may
contain errors
• Real time data for current
information
• Extract, Model, and Clean
• Often tactical information
35
Resources
• Linstedt, D. http://www.danlinstedt.com/AboutDV.php. Nov.
9, 2008.
• Linstedt, D., Garziano, K., Hultgren, H., May 2008. The New
Business Supermodel: The Business of Data Vault Modeling.
• Raper, Jim., Dec. 2004. Building a Data Warehouse in a
Heterogeneous Tool Environment.
• Raper, Jim., Dec. 2004. From Source to Loading Dock with
Oracle Warehouse Builder
• Raper, Jim., Phillips, Doris., 2006. User-Managed Metadata:
Oracle Application Express Meets Oracle Warehouse Builder
10.2
• Tuchinda, R., Szekely, Pl., Knoblock, C., 2008. ACM., Building
Mashups by Example.
36
Download