Objectives

• Define the main roles of metadata in a warehousing environment.
• Identify the problem of integrating data.
• Identify how metadata might be created and by whom, and where metadata might be stored.
• Describe the contents of metadata.
Metadata Management: Overview
• Metadata is information about your enterprise data.
• Very important to the warehouse
• Contains no business or fact data
• Key to a successful data warehouse
• Evolving and changing component
• Important to control and manage
(Figure: external sources, operational data sources, the warehouse, and the metadata repository.)
• It is data about data that describes the data warehouse.
• It is data about the data stored in the warehouse and about its users.
• It helps users become self-sufficient. It also plays a key role in integrating data from multiple sources.
• It is a mechanism that lets information consumers know:
– what data is available,
– where it comes from, and
– how current it is.
Metadata Defined
• Metadata provides decision-support-oriented
pointers to warehouse data, and thus provides a
logical link between warehouse data and the
decision support application.
• The key to providing users and applications with a
roadmap to the information stored in the
warehouse is the metadata.
• It can define all data elements and their attributes,
data sources and timing, and the rules that govern
data use and data transformations.
The most important person in the library is that smiling person behind the information desk who may be devoid of all actual content but is full of metadata about how to find actual content.
Metadata Defined
• Metadata defines the contents and location of data (data model) in the warehouse:
– relationships between the operational databases and the data warehouse, and
– the business views of the warehouse data that are accessible by end-user tools.
• Metadata is searched by users to find data definitions or subject areas.
• Metadata needs to be collected as the warehouse is designed and built.
Metadata
• The content of the information directory is the metadata that helps technical and business users exploit the power of data warehousing.
• It is used for building, maintaining, managing, and using the data warehouse.
• Functionally, it is classified into:
• Technical metadata:
– defines the contents and location of data (data model) in the warehouse.
– defines the relationships between the operational data and the data warehouse.
• Business metadata:
– defines the business views of the warehouse that are accessible by end-user tools.
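As a minimal sketch of this classification, assuming each warehouse data element is described once from the technical viewpoint and once from the business viewpoint, the Python below keeps both descriptions in one repository entry and shows each audience only its own view. The class names, the `INVENTORY` source system, and the `PWGT_LB` source column are hypothetical illustrations, not part of any product.

```python
from dataclasses import dataclass


@dataclass
class TechnicalMetadata:
    """Used by warehouse designers and administrators."""
    source_system: str   # operational system the data comes from
    source_column: str   # column in that system
    target_table: str    # warehouse table that holds the data
    target_column: str   # warehouse column
    transformation: str  # rule applied on the way into the warehouse


@dataclass
class BusinessMetadata:
    """The business view exposed to end users through their tools."""
    business_name: str
    description: str
    subject_area: str


@dataclass
class MetadataEntry:
    """One warehouse data element described from both viewpoints."""
    technical: TechnicalMetadata
    business: BusinessMetadata


def describe_for(entry: MetadataEntry, audience: str) -> str:
    """Return the view of the entry that suits the given audience."""
    if audience == "administrator":
        t = entry.technical
        return (f"{t.target_table}.{t.target_column} <- "
                f"{t.source_system}.{t.source_column} ({t.transformation})")
    if audience == "end_user":
        b = entry.business
        return f"{b.business_name} [{b.subject_area}]: {b.description}"
    raise ValueError(f"unknown audience: {audience}")


# Hypothetical entry for the product weight column.
entry = MetadataEntry(
    technical=TechnicalMetadata("INVENTORY", "PWGT_LB", "PRODUCT", "WEIGHT",
                                "pounds converted to kilograms"),
    business=BusinessMetadata("Product weight",
                              "Packed shipping weight in kilograms", "Products"),
)
print(describe_for(entry, "administrator"))
# PRODUCT.WEIGHT <- INVENTORY.PWGT_LB (pounds converted to kilograms)
print(describe_for(entry, "end_user"))
# Product weight [Products]: Packed shipping weight in kilograms
```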
Technical Metadata
Contains information about warehouse data for use by warehouse designers and administrators when carrying out warehouse development and management tasks.
Technical metadata documents include:
• Information about data sources
• Transformation descriptions: the mapping method from operational databases into the warehouse, and the algorithms used to convert or transform data
• Warehouse object and data structure definitions for data targets
• The rules used to perform data cleanup and data enhancement
• Data mapping operations when capturing data from source systems and applying it to the target warehouse database
• Access authorization, backup history, archive history, information delivery history, data acquisition history, data access, etc.
Business Metadata
Contains information that gives users an easy-to-understand perspective of the information stored in the data warehouse.
• Subject areas and information object types, including queries, reports, images, video, and/or audio clips
• Internet home pages
• Other information to support all data warehousing components; for example, the information about the information delivery system may include:
– subscription information, scheduling information, details of delivery destinations, and the business query objects, such as predefined queries, reports, and analyses.
• Data warehouse operational information, e.g., data history, ownership, extract audit trail, usage data
Types of Metadata
• End user
– Key to a good warehouse
– Navigation aid
– Information provider
• ETT (extraction, transformation, and transportation)
– Maps structure
– Source and target information
– Transformations
– Context
• Operational
– Load, management, scheduling processes
– Performance
Metadata Characteristics
At a minimum, metadata contains:
• The location and description of warehouse system and data components (warehouse objects).
• Names, definitions, structure, and content of the data warehouse and end-user views.
• Identification of authoritative data sources (systems of record).
• Integration and transformation rules used to populate the data warehouse; these include the mapping method from operational databases into the warehouse, and algorithms used to convert, enhance, or transform data.
• Integration and transformation rules used to deliver data to end-user analytical tools.
• Subscription information for information delivery to the analysis subscribers.
• Data warehouse operational information, which includes a history of warehouse updates, refreshes, snapshots, versions, ownership, authorizations, and extract audit trail.
• Metrics used to analyze warehouse usage and performance (end-user usage patterns).
• Security authorizations, access control lists, etc.
Metadata Qualities
• Must be integrated
• Must reflect changes
• Must show history and context
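The qualities "must reflect changes" and "must show history and context" suggest a repository that never overwrites a definition. A minimal Python sketch, assuming each change is recorded with the date from which it applies; the element name `PRODUCT.WEIGHT` and its earlier pound-based definition are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Version:
    valid_from: date
    definition: str


class VersionedMetadata:
    """Keeps every definition of an element instead of overwriting it."""

    def __init__(self) -> None:
        self._history: dict[str, list[Version]] = {}

    def record(self, element: str, valid_from: date, definition: str) -> None:
        """Append a new definition; versions must arrive in date order."""
        versions = self._history.setdefault(element, [])
        if versions and valid_from <= versions[-1].valid_from:
            raise ValueError("versions must be recorded in date order")
        versions.append(Version(valid_from, definition))

    def as_of(self, element: str, when: date) -> str:
        """Return the definition that was in force on the given date."""
        current = None
        for version in self._history.get(element, []):
            if version.valid_from <= when:
                current = version.definition
            else:
                break
        if current is None:
            raise KeyError(f"{element} had no definition on {when}")
        return current

    def history(self, element: str) -> list[Version]:
        """Return a copy of the full change history for the element."""
        return list(self._history.get(element, []))


repo = VersionedMetadata()
repo.record("PRODUCT.WEIGHT", date(1996, 1, 1), "Shipping weight in pounds")
repo.record("PRODUCT.WEIGHT", date(1997, 1, 1), "Packed shipping weight in kilograms")
print(repo.as_of("PRODUCT.WEIGHT", date(1996, 6, 30)))  # Shipping weight in pounds
```

Keeping dated versions lets a user interpret a 1996 report with the definition that applied in 1996 rather than the current one.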
Metadata Life Cycle
1. Collection
• Identify metadata and capture it into a central repository.
• To ensure a high level of accuracy, collection should be automated.
2. Maintenance
• Put in place processes to synchronize metadata automatically with the changing data architecture.
• To ensure a high level of maintenance, it is important to automate as much of the metadata maintenance as possible.
3. Deployment
• Provide metadata to users in the right form and with the right tools.
• One key here is to correctly match the metadata offered to the specific needs of different audiences (developers, maintainers, end users).
Metadata Users
(Figure: IT staff and end users draw on a metadata repository holding ETT, operational, warehouse, and mapping metadata.)
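The life cycle asks for collection and maintenance to be automated. One common way to do that is to harvest definitions directly from the database catalog and compare successive snapshots. The Python sketch below assumes an SQLite source, whose `sqlite_master` table and `PRAGMA table_info` expose table and column definitions; the `product` table is a hypothetical example, and a real warehouse would read its own DBMS catalog instead.

```python
import sqlite3


def collect_schema(conn: sqlite3.Connection) -> dict[str, dict[str, str]]:
    """Collection: harvest table and column definitions from the catalog."""
    schema: dict[str, dict[str, str]] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = {name: col_type for _, name, col_type, *_ in columns}
    return schema


def diff_schema(old: dict[str, dict[str, str]],
                new: dict[str, dict[str, str]]) -> list[str]:
    """Maintenance: list the changes the repository must absorb."""
    changes = []
    for table in sorted(old.keys() | new.keys()):
        before, after = old.get(table, {}), new.get(table, {})
        for col in sorted(before.keys() | after.keys()):
            if col not in after:
                changes.append(f"dropped {table}.{col}")
            elif col not in before:
                changes.append(f"added {table}.{col} {after[col]}")
            elif before[col] != after[col]:
                changes.append(f"retyped {table}.{col} {before[col]} -> {after[col]}")
    return changes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prodid INTEGER PRIMARY KEY, weight REAL)")
snapshot_1 = collect_schema(conn)
conn.execute("ALTER TABLE product ADD COLUMN ware_loc INTEGER")
snapshot_2 = collect_schema(conn)
print(diff_schema(snapshot_1, snapshot_2))  # ['added product.ware_loc INTEGER']
```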
Metadata Management
• Metadata describes the information in the warehouse from multiple viewpoints:
– input, sources, transformation, access, etc.
• It is therefore important that the same metadata or its replicas be available to all tools selected for the warehouse implementation, to enforce the integrity and accuracy of the warehouse information.
• The metadata also has to be available to all warehouse users in order to guide them as they use the warehouse.
• A well-thought-through strategy for collecting, maintaining, and distributing metadata is needed for a successful data warehouse implementation.
Metadata
• Provides interactive access to help users understand data content and find data.
• Metadata is one of the most important aspects of data warehousing.
• Metadata infrastructure is also very important.
• This is the infrastructure that enables metadata's:
– storage,
– management, and
– integration with other components of a data warehouse.
• This infrastructure is known as the metadata repository.
• The metadata repository is the information directory.
Metadata Repository
(Figure: the metadata repository links external, internal, and operational data sources to the warehouse data model and is viewed by users through a browser.)
• This directory helps integrate, maintain, and view the contents of the data warehousing system.
• It provides a logical link between warehouse data and the decision support applications.
• It provides decision-support-oriented pointers to warehouse data.
• It maps source systems into the warehouse.
• It is searched by users to find data definitions or subject areas.
• The warehouse design should prevent any direct access to the warehouse data that does not use metadata definitions to gain access.
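One way to honor that last design rule is to route queries through a gate that consults the metadata first. A minimal Python sketch over an in-memory SQLite database, assuming a hypothetical registry `METADATA` in which only documented columns may be queried; the `cost` column and its value are made up to show a refusal.

```python
import sqlite3

# Hypothetical metadata registry: only columns defined here may be queried.
METADATA = {
    "product": {
        "prodid": "Unique identifier for the product",
        "weight": "Packed shipping weight in kilograms",
    },
}


def query_via_metadata(conn: sqlite3.Connection, table: str,
                       columns: list[str]) -> list[tuple]:
    """Run a SELECT only if the table and every column have metadata definitions."""
    defined = METADATA.get(table)
    if defined is None:
        raise PermissionError(f"no metadata definition for table {table!r}")
    undefined = [c for c in columns if c not in defined]
    if undefined:
        raise PermissionError(f"no metadata definition for columns {undefined}")
    # Every identifier has been checked against the registry before interpolation.
    col_list = ", ".join(f'"{c}"' for c in columns)
    return conn.execute(f'SELECT {col_list} FROM "{table}"').fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prodid INTEGER, weight REAL, cost REAL)")
conn.execute("INSERT INTO product VALUES (?, ?, ?)", (739516, 17.62, 9.5))
print(query_via_metadata(conn, "product", ["prodid", "weight"]))
try:
    query_via_metadata(conn, "product", ["cost"])  # undocumented column
except PermissionError as err:
    print("refused:", err)
```

Because identifiers are checked against the registry before they reach the SQL text, the gate also avoids passing free-form user input to the database.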
Metadata Repository
• Metadata repository management software can be
used to:
– map the source data to the target database
– generate code for data transformations
– integrate and transform the data, and
– control moving data to the warehouse.
• This software, which typically runs on a workstation,
enables users to specify how the data should be
transformed, such as data mapping, conversion, and
summarization.
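As an illustration of generating transformation code from mapping metadata, the Python sketch below turns a hypothetical mapping (source table `inv_item`, target table `product`, and a pounds-to-kilograms conversion) into an `INSERT ... SELECT` statement and runs it against an in-memory SQLite database. Real repository tools use their own mapping formats; this only shows the idea.

```python
import sqlite3

# Hypothetical mapping metadata: (target column, source expression) pairs.
MAPPING = {
    "source_table": "inv_item",
    "target_table": "product",
    "columns": [
        ("prodid", "item_no"),
        ("weight", "ROUND(weight_lb * 0.45359237, 2)"),  # pounds to kilograms
    ],
}


def generate_load_sql(mapping: dict) -> str:
    """Generate the INSERT ... SELECT statement described by the mapping."""
    targets = ", ".join(target for target, _ in mapping["columns"])
    sources = ", ".join(source for _, source in mapping["columns"])
    return (f"INSERT INTO {mapping['target_table']} ({targets}) "
            f"SELECT {sources} FROM {mapping['source_table']}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inv_item (item_no INTEGER, weight_lb REAL)")
conn.execute("CREATE TABLE product (prodid INTEGER, weight REAL)")
conn.execute("INSERT INTO inv_item VALUES (?, ?)", (739516, 38.84))

sql = generate_load_sql(MAPPING)
print(sql)
conn.execute(sql)
rows = conn.execute("SELECT prodid, weight FROM product").fetchall()
print(rows)  # weight is about 17.62 kg
```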
Metadata Repository Benefits
• A metadata repository implemented as part of the data warehouse framework provides the following benefits:
– It provides a comprehensive suite of tools for enterprise-wide metadata management.
– It reduces and eliminates information redundancy, inconsistency, and underutilization.
– It simplifies management and improves organization, control, and accounting of information assets.
– It increases identification, understanding, coordination, and utilization of enterprise-wide information assets.
Metadata Repository Benefits
• It provides effective data administration tools to better manage corporate information assets with a full-function data dictionary.
• It increases flexibility, control, and reliability of the application development process and accelerates internal application development.
• It leverages investment in legacy systems with the ability to inventory and utilize existing applications.
• It provides a universal relational model for heterogeneous RDBMSs to interact and share information.
• It enforces CASE development standards and eliminates redundancy with the ability to share and reuse metadata.
A Metadata Strategy
• Define a strategy covering:
– Targets, goals, types
– Source and location
– Maintenance and management
– Standards
– Access and tools
– Integration and evolution
• Ensure high-quality metadata
• Provide users with quality information
• Enable metalayer integration
Targets and Goals
• Intention
• Requirements
• Access: who and how
• Source identification
• Integration approach
• Evolution and change management
Types and Source of Metadata
• Who are the metadata users?
• What do they need?
• What should metadata contain?
• What tool shall I use to create metadata?
Techniques
• Data modeling tools
• Database schema definitions
• ETT tools
• End-user tools
• COBOL copybooks
• Middleware tools
Location of Metadata
• Usually the warehouse server
• Possibly on operational platforms
• Desktop tool with metalayer
• Maintained by metadata architect
• Managed by metadata manager
• Standards produced by metadata architect
Access and Tools
• Access for whom?
• When?
• To what?
• Tools for management
• Tools for query
• Tools for development
Integration and Change
• Metalayer integration issues
• Metadata exchangeability is desirable
• Manage changing metadata
• Consider refresh cycles
Extraction Metadata
(Figure: extraction from external sources and operational data sources into a staging file.)
• Business rules
• Space and storage requirements
• Source tables, fields, and key values
• Ownership
• Source location information
• Field conversions
• Diverse source data
• Access information
• Encoding and reference table
• Key value changes
• Default values
• Security
• Name changes
• Contacts
• Program names
• Frequency details
• Logic to handle multiple sources
• Failure procedures
• Algorithms
• Validity checking information
• Time stamp
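A minimal Python sketch of extraction metadata driving the extraction step, assuming that rules for name changes, default values, field conversions, and validity checking are stored per source feed. The `EXTRACTION_RULES` structure and the `ITEM`/`PWGT` source field names are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical extraction metadata for one source feed.
EXTRACTION_RULES = {
    "name_changes": {"ITEM": "item_no", "PWGT": "weight_lb"},
    "default_values": {"weight_lb": 0.0},
    "field_conversions": {"item_no": int, "weight_lb": float},
    "validity_checks": {"weight_lb": lambda v: v >= 0},
}


def extract_record(raw: dict, rules: dict, extracted_at: datetime) -> dict:
    """Apply name changes, defaults, conversions, and validity checks to one record."""
    # Name changes: rename source fields to warehouse names.
    record = {rules["name_changes"].get(k, k): v for k, v in raw.items()}
    # Default values: fill missing or empty fields.
    for column, default in rules["default_values"].items():
        if record.get(column) in (None, ""):
            record[column] = default
    # Field conversions: cast text from the source into proper types.
    for column, convert in rules["field_conversions"].items():
        if column in record:
            record[column] = convert(record[column])
    # Validity checking: reject records that break a rule.
    for column, is_valid in rules["validity_checks"].items():
        if column in record and not is_valid(record[column]):
            raise ValueError(f"{column}={record[column]!r} failed validity check")
    record["extracted_at"] = extracted_at.isoformat()  # time stamp
    return record


stamp = datetime(1997, 1, 31, tzinfo=timezone.utc)
rec = extract_record({"ITEM": "739516", "PWGT": ""}, EXTRACTION_RULES, stamp)
print(rec)
# {'item_no': 739516, 'weight_lb': 0.0, 'extracted_at': '1997-01-31T00:00:00+00:00'}
```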
Transformation Metadata
(Figure: data from external sources is transformed in a staging file.)
• Duplication routines
• Exception handling
• Key restructuring
• Grain conversions
• Program names
• Frequency
• Summarization
• Manual exercise
Transportation Metadata
(Figure: ETT transport and mapping from external and operational data sources through a staging file into the warehouse, recorded in the metadata repository.)
• Method of transfer
• Frequency
• Validation procedures
• Failure procedures
• Deployment rules
• Contact information
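Several transformation metadata items (key restructuring, duplication routines, grain conversions, summarization) can be sketched as small functions. The Python below is illustrative only, assuming daily rows of `(source key, day, amount)` and a hypothetical source-to-surrogate key map.

```python
from collections import defaultdict
from datetime import date


def restructure_keys(rows, key_map):
    """Key restructuring: swap each source key for its warehouse surrogate key."""
    return [(key_map[src_key], day, amount) for src_key, day, amount in rows]


def remove_duplicates(rows):
    """Duplication routine: drop exact repeats, keeping first-seen order."""
    return list(dict.fromkeys(rows))


def to_monthly(rows):
    """Grain conversion and summarization: daily amounts to monthly totals."""
    totals = defaultdict(float)
    for key, day, amount in rows:
        totals[(key, day.year, day.month)] += amount
    return dict(sorted(totals.items()))


daily = [
    ("A-17", date(1997, 1, 3), 100.0),
    ("A-17", date(1997, 1, 3), 100.0),  # same record delivered twice
    ("A-17", date(1997, 1, 20), 50.0),
    ("B-02", date(1997, 2, 1), 75.0),
]
key_map = {"A-17": 1, "B-02": 2}  # hypothetical surrogate keys
monthly = to_monthly(remove_duplicates(restructure_keys(daily, key_map)))
print(monthly)  # {(1, 1997, 1): 150.0, (2, 1997, 2): 75.0}
```

Dropping exact repeats is only safe when the source can deliver the same record twice by mistake; if two identical transactions can legitimately occur, the duplication routine needs a transaction identifier instead, and that choice is itself metadata worth recording.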
User Metadata
• Need to know the context of the table queried
• Associate the metadata description
• Analogous to Oracle Data Dictionary views
(Figure: an end user querying the warehouse gets back a bare row such as 739516 1816 666 15 17.62; the metadata repository supplies the meaning of each value.)
Table Name  Column Name  Data    Meaning
Product     Prodid       739516  Unique identifier for the product
Product     Valid_date   01/97   Last refresh date
Product     Ware_loc     1816    Warehouse location number
Product     Ware_bin     666     Warehouse bin number
Product     Code         15      The color of the product; please refer to table COL_REF for details
Product     Weight       17.62   Packed shipping weight in kilograms
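To show how metadata gives a bare query result its context, the Python sketch below pairs the row 739516 1816 666 15 17.62 with the column meanings from the PRODUCT example. The dictionary stands in for a metadata repository, and the function name `explain_row` is hypothetical.

```python
# Column meanings for the PRODUCT table, standing in for the metadata repository.
PRODUCT_METADATA = {
    "Prodid": "Unique identifier for the product",
    "Valid_date": "Last refresh date",
    "Ware_loc": "Warehouse location number",
    "Ware_bin": "Warehouse bin number",
    "Code": "The color of the product; please refer to table COL_REF for details",
    "Weight": "Packed shipping weight in kilograms",
}


def explain_row(columns, values, metadata):
    """Pair each value in a query result with its column's metadata description."""
    lines = []
    for column, value in zip(columns, values):
        meaning = metadata.get(column, "(no metadata description)")
        lines.append(f"{column} = {value}: {meaning}")
    return lines


row = (739516, 1816, 666, 15, 17.62)
lines = explain_row(["Prodid", "Ware_loc", "Ware_bin", "Code", "Weight"],
                    row, PRODUCT_METADATA)
for line in lines:
    print(line)
```

In an Oracle database, the same idea is supported by `COMMENT ON COLUMN` and the `USER_COL_COMMENTS` data dictionary view.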
User Metadata
• Supports change history
• Location of fact and dimensions
• Maintains the context of information
• Availability
• Description of contents
• Algorithms for derived and summary data
• Owners of data and telephone number
Context of Data
(Figure: the structure and content of operational data and the warehouse change over the years 92 to 96; the metadata repository preserves that context for end users.)
• Simple
– Data structures
– Naming conventions
– Metrics
• Complex
– Product definitions
– Markets
– Pricing
• External
– Economic
– Political
Additional Metadata Contents and Considerations
• Summarization algorithms
• Relationships
• Stewardship
• Permissions
• Pattern analysis
• Reference tables
Metadata Management Tools
• Carleton
• Evolutionary Technologies
• Hewlett Packard
• Informatica
• Information Advantage
• Oracle Designer/2000
• Platinum Technology
• Prism Solutions
• Sagent