metadata - Computer Information Systems

METADATA
Data Warehouse success depends on metadata
Overview
• What is metadata?
• Why is it needed?
• Types of metadata
• Metadata life cycle
Better end user data access and analysis tools
can help users figure out how to get information
they need out of the warehouse, but only good,
easily accessible metadata can help them figure
out what is available in the data warehouse
and how to ask for it.
Data Warehouse Process
[Figure: data flows from Source OLTP Systems into the Data Warehouse and on to Data Marts; Meta Data and System Monitoring underlie every step.]
Data Characteristics
• Source OLTP Systems: Raw Detail; No/Minimal History
• Data Warehouse: Integrated; Scrubbed; History; Summaries
• Data Marts: Targeted; Specialized (OLAP)
Process Steps
• Design
• Mapping
• Extract
• Scrub
• Transform
• Load
• Index
• Aggregation
• Replication
• Data Set Distribution
• Access & Analysis
• Resource Scheduling & Distribution
Copyright © 1997, Enterprise Group, Ltd.
Meta Data Description
• Information about the data warehouse system
– Content
– Organizational
– Structural
– Management Information
– Scheduling Information
– Contact Information
– Technical Information
Why Do You Need Meta Data?
• Share resources
– Users
– Tools
• Document system
• Without metadata
– Not Sustainable
– Not able to fully utilize resource
Metadata Life Cycle
• Collection - Identify metadata and capture into
repository; automate
• Maintenance - Put in place processes to synchronize
metadata automatically with changing data
architecture; automate
• Deployment - Provide metadata to users in the right
form and with the right tools; match metadata
offered to specific needs of each audience
Metadata Collection
• Right metadata at the right time
• Variety of collection strategies
• Sources
– potential sources of data for DW
– external data
– data structures
• Data Models - enterprise data model start point
– import from CASE tool
– correlate enterprise and warehouse models
Metadata Collection
• Warehouse mappings
– map operational data into warehouse data structure
– Need record of logical connection used for mapping and
transformation
• Warehouse usage information
– After roll out
– What tables accessed, by whom and for what
– What queries written
– Capture nature of business problem or query
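The usage information listed above could be captured as simple log records and then summarized. The following is a minimal sketch, assuming a hypothetical record layout (`UsageEvent`) and hypothetical user and table names; it is not taken from any particular warehouse tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One logged query against the warehouse (hypothetical record layout)."""
    user: str
    table: str
    purpose: str  # nature of the business problem, as stated by the user
    query: str


def access_counts(events):
    """Count how often each (table, user) pair appears in the usage log."""
    return Counter((e.table, e.user) for e in events)


# Hypothetical log entries; the query text is elided on purpose.
events = [
    UsageEvent("alice", "sales_fact", "quarterly revenue review", "SELECT ..."),
    UsageEvent("bob", "sales_fact", "regional comparison", "SELECT ..."),
    UsageEvent("alice", "sales_fact", "quarterly revenue review", "SELECT ..."),
]
counts = access_counts(events)
```

Keeping the stated purpose next to the query text is one way to capture the nature of the business problem as well as the query itself.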
Maintaining Metadata
• Up to date with reality
• Capture incremental changes
Metadata Deployment
• Warehouse developers need:
– physical structure info for data sources
– enterprise data model
– warehouse data model
– concerned with accuracy, completeness and flexibility of metadata
– Need access to comprehensive impact analysis
capabilities
– Need to defend against accuracy & integrity questions
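Impact analysis can be illustrated with the warehouse mappings themselves: given recorded source-to-target pairs, follow them forward to find everything a change would touch. This is a minimal sketch under the assumption that mappings are stored as (source, target) pairs; all object names are hypothetical.

```python
from collections import defaultdict, deque


def build_lineage(mappings):
    """Index (source, target) pairs so each source lists its direct targets."""
    downstream = defaultdict(set)
    for source, target in mappings:
        downstream[source].add(target)
    return downstream


def impacted(downstream, changed):
    """Return every object reachable from `changed` by following mappings."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for target in downstream.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical mappings from operational data to warehouse and mart structures.
mappings = [
    ("oltp.orders.amt", "dw.sales_fact.amount"),
    ("dw.sales_fact.amount", "mart.sales_summary.total"),
    ("oltp.customers.name", "dw.customer_dim.name"),
]
lineage = build_lineage(mappings)
affected = impacted(lineage, "oltp.orders.amt")
```

Here a change to the hypothetical `oltp.orders.amt` column reaches both the warehouse fact column and the mart summary column, while the customer mapping is unaffected.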
Meta Data
• Types
– Technical
– Business / User
• Levels
– Core
– Basic
– Deluxe
Core Technical Meta Data
• Source
• Target
• Algorithm
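As an illustration, one core technical metadata entry could be held as a three-field record. This is a sketch only; the class name, column names, and the transformation text are hypothetical examples, not part of the original material.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CoreTechnicalMetadata:
    """The three core items for one warehouse field."""
    source: str     # where the data comes from
    target: str     # where it lands in the warehouse
    algorithm: str  # how the source value becomes the target value


record = CoreTechnicalMetadata(
    source="oltp.orders.order_amt",
    target="dw.sales_fact.amount",
    algorithm="convert cents to dollars: order_amt / 100",
)
row = asdict(record)  # plain dict, ready to store in a repository table
```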
Basic Technical Meta Data
• History of transformation changes
• Business rules
• Source program / system name
• Source program author / owner
• Extract program name & version
• Extract program author / owner
• Extract JCL / Script name
• Extract JCL / Script author / owner
• Load JCL / Script name
Basic Technical Meta Data (cont'd)
• Load JCL / Script author / owner
• Load frequency
• Extract dependencies
• Transformation dependencies
• Load dependencies
• Load completion date / time stamp
• Load completion record count
• Load status
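Several of the items above lend themselves to automatic capture. The sketch below assumes load dependencies are recorded as a mapping from each job to the jobs that must finish first, and that an expected record count is known for each load; the job names, the `LoadRun` layout, and the status values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Load dependencies: each job maps to the jobs that must finish first.
load_dependencies = {
    "load_sales_fact": {"load_customer_dim", "load_product_dim"},
    "load_sales_summary": {"load_sales_fact"},
}
# One valid run order that respects every dependency.
load_order = list(TopologicalSorter(load_dependencies).static_order())


@dataclass
class LoadRun:
    """Load completion metadata for one job run."""
    job: str
    completed_at: datetime
    record_count: int
    status: str


def finish_load(job, record_count, expected_count):
    """Stamp a finished load and flag a record-count mismatch."""
    status = "OK" if record_count == expected_count else "COUNT_MISMATCH"
    return LoadRun(job, datetime.now(timezone.utc), record_count, status)


run = finish_load("load_sales_fact", record_count=9_990, expected_count=10_000)
```

In this example the dimension loads are ordered before the fact load, and the run is flagged because it loaded fewer records than expected.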
Deluxe Technical Meta Data
• Source system platform
• Source system network address
• Source system support contact
• Source system support phone / beeper
• Target system platform
• Target system network address
• Target system support contact
• Target system support phone / beeper
• Etc.
Core Business Meta Data
• Field / object description
• Confidence level
• Frequency of update
Basic Business Meta Data
• Source system name
• Valid entries (e.g. “There are three valid codes: A, B, C”)
• Formats (e.g. Contract Date: 82/4/30)
• Business rules used to calculate or derive
the data
• Changes in business rules over time
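Valid entries and formats are the kind of business metadata that can double as validation rules. The following sketch assumes a hypothetical `FIELD_RULES` table keyed by field name and reads the slide's date example (82/4/30) as year/month/day; both are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical business metadata: valid codes and an expected date format.
FIELD_RULES = {
    "status_code": {"valid_entries": {"A", "B", "C"}},
    "contract_date": {"format": "%y/%m/%d"},  # assumed reading of 82/4/30
}


def is_valid(field, value):
    """Check a value against the valid entries or format recorded for a field."""
    rule = FIELD_RULES[field]
    if "valid_entries" in rule:
        return value in rule["valid_entries"]
    try:
        datetime.strptime(value, rule["format"])
    except ValueError:
        return False
    return True
```

For example, `is_valid("status_code", "B")` and `is_valid("contract_date", "82/4/30")` both pass, while a code of "D" or a string that is not a date is rejected.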
Deluxe Business Meta Data
• Data owner
• Data owner contact information
• Typical uses
• Level of summarization
• Related fields / objects
• Existing queries / reports using this field / object
• Estimated size (tables / objects)
Amount of Meta Data
• How much Meta Data do I need?
• As much as you can support!
The Meta Data Conundrum
• Meta Data is absolutely required for success
• Meta Data is 99% Manual
Cold, Hard Reality
5,000 data mart fields
7 manually populated and maintained meta data fields
35,000 total manual meta data fields
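The arithmetic behind these figures is a single multiplication: 5,000 data mart fields times 7 manual metadata fields each. The sketch below reproduces it and adds a simple capacity check; the 15,000-entry staffing capacity is a hypothetical number chosen only for illustration.

```python
# Figures from the slide.
data_mart_fields = 5_000
manual_metadata_fields_per_data_field = 7

total_manual_fields = data_mart_fields * manual_metadata_fields_per_data_field


def supportable_metadata_fields(maintainable_entries, data_fields):
    """Whole number of manual metadata fields per data field that staff can sustain."""
    return maintainable_entries // data_fields


# Hypothetical capacity: staff can keep 15,000 manual entries current.
supported = supportable_metadata_fields(15_000, data_mart_fields)
print(total_manual_fields, supported)  # prints "35000 3"
```

Under that assumed capacity, only three manual metadata fields per data field could be kept current, well short of the seven in the example.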
The Meta Data Conundrum
• Can you support 35,000 Meta Data fields?
• Calculate available ongoing resources
• Commit only to what you can maintain
• You MUST deliver core, probably some basic to be viable
Meta Data Functions - Technical
• Maintenance
• Troubleshooting
• Documentation
• Logging / Metrics
Meta Data Location
• DB Resident
– Almost always relational
– Predominantly client/server (C/S)
– Normalized design
– OODB is popular option for proprietary solutions
Repository
• Specialized databases designed to maintain
metadata, together with tools and interfaces
that allow a company to collect and
distribute its metadata
• Repository Requirements
– Logically Common
– Open
– Extensible
Multiple Repository
• Upside
– Local instance, quick response
– Local view
• Users don’t have to wade through others’ material
• Downside
– More challenging implementation
– Advanced replication
– Requires maintenance resources
– More susceptible to architecture modification to remote instances
Multiple Repository
[Figure: Data Marts, each with its own Data Mart Meta Data; a user asks “Where do I find all the information about sales?”]
• Requires multiple access points
• Requires more system resources
Common Repository
• Upside
– Optimum solution
– Avoids replication challenges
– Allows central management/access
• Downside
– Requires remote access for remote DM’s
– More network infrastructure
– May require gateways
Common Repository
[Figure: Data Marts sharing one common Data Mart Meta Data repository; a user asks “Where do I find all the information about sales?”]
• Single access point for all information resources
• Low system resources required
Meta Data Process
• Integrated with entire process and data flow
– Populated from beginning to end
– Begin population at design phase of project
– Dedicated resources throughout
• Build
• Maintain
[Figure: the Data Warehouse Process steps (Design, Mapping, Extract, Scrub, Transform, Load, Index, Aggregation, Replication, Data Set Distribution, Access & Analysis, Resource Scheduling & Distribution), with Meta Data and System Monitoring spanning all of them.]
Meta Data Vision vs. Reality
• Standards
– OMG standard (June 2000)
• Common Warehouse Metamodel (CWM)
• XML based
• Supported by Oracle
• Designed by Oracle, Unisys, IBM, NCR and Hyperion
• Industry initiatives just taking hold
• Proprietary solutions inadequate
• Who is missing?
Meta Data Challenges
• The Meta Data conundrum
• Thin tool support (pairing standards, MSFT
coming)
• Hidden resource trap
• Absolute requirement for success
Web Sites
• List of metadata tools
http://www.dwinfocenter.org/catalog.html
• Universal Metadata
http://www.eaijournal.com/DataIntegration/HolyGrail.asp
• Metadata Project
http://www.dis.state.ar.us/DIS_Proj/EDWR/Metadata/MD_Home.htm