Metadata

Objectives
• Define the main roles of metadata in a warehousing environment.
• Identify the problem of integrating data.
• Identify how metadata might be created and by whom, and where metadata might be stored.
• Describe the contents of metadata.

Overview
• Very important to the warehouse
• No business or fact data
• Key to a successful data warehouse
• Evolving and changing component
• Important to control and manage
[Figure: External Sources, Operational Data Sources, Metadata Repository, Warehouse]

Metadata Management
• This is information about your enterprise data.
• It is data about data that describes the data warehouse.
• It is data about the data stored in the warehouse and its users.
• It helps users become self-sufficient. It also plays a key role in integrating multiple data sources.
• It is a mechanism to let information consumers know:
  – what data is available,
  – where it comes from, and
  – how current it is.

Metadata Defined
• Metadata provides decision-support-oriented pointers to warehouse data, and thus provides a logical link between warehouse data and the decision support application.
• Metadata is the key to providing users and applications with a roadmap to the information stored in the warehouse.
• It can define all data elements and their attributes, data sources and timing, and the rules that govern data use and data transformations.

The most important person in the library is that smiling person behind the information desk, who may be devoid of all actual content but is full of metadata about how to find actual content.

Metadata Defined
• Metadata defines the contents and location of data (the data model) in the warehouse, along with:
  – the relationships between the operational databases and the data warehouse, and
  – the business views of the warehouse data that are accessible by end-user tools.
• Metadata is searched by users to find data definitions or subject areas.
• Metadata needs to be collected as the warehouse is designed and built.

Metadata
• The content of the information directory is the metadata that helps technical and business users exploit the power of data warehousing.
• It is used for building, maintaining, managing, and using the data warehouse.
• Functionally, it is classified into:
  – Technical metadata, which defines the contents and location of data (the data model) in the warehouse and the relationships between the operational data and the data warehouse.
  – Business metadata, which defines the business views of the warehouse that are accessible by end-user tools.

Technical Metadata
Contains information about warehouse data for use by warehouse designers and administrators when carrying out warehouse development and management tasks. Technical metadata documents include:
• Information about data sources
• Transformation descriptions: the mapping method from databases into the warehouse, and the algorithms used to convert or transform data
• Warehouse object and data structure definitions for data targets
• The rules used to perform data cleanup and data enhancement
• Data mapping operations when capturing data from source systems and applying it to the target warehouse database
• Access authorization, backup history, archive history, information delivery history, data acquisition history, data access, etc.

Business Metadata
Contains information that gives users an easy-to-understand perspective of the information stored in the data warehouse.
• Subject areas and information object types, including queries, reports, images, video, and/or audio clips
• Internet home pages
• Other information to support all data warehousing components
  – Information about the information delivery system may include subscription information, scheduling information, details of delivery destinations, and business query objects such as predefined queries, reports, and analyses.

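
To make the split between technical and business metadata concrete, here is a minimal sketch in Python of a single repository entry. The class names (`TechnicalMetadata`, `BusinessMetadata`, `RepositoryEntry`), the field names, and all sample values are assumptions made for illustration; the slides do not prescribe any storage format. The `describe` method answers the three questions an information consumer asks: what data is available, where it comes from, and how current it is.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TechnicalMetadata:
    """For designers and administrators (hypothetical fields)."""
    source_system: str   # information about the data source
    source_table: str
    target_table: str    # warehouse object the data is loaded into
    transformation: str  # description of the mapping or conversion rule
    last_loaded: date    # data acquisition history


@dataclass
class BusinessMetadata:
    """For end users (hypothetical fields)."""
    subject_area: str
    description: str


@dataclass
class RepositoryEntry:
    technical: TechnicalMetadata
    business: BusinessMetadata

    def describe(self) -> str:
        """Say what data is available, where it comes from, how current it is."""
        t, b = self.technical, self.business
        return (
            f"{t.target_table} ({b.subject_area}): {b.description}. "
            f"Source: {t.source_system}.{t.source_table}. "
            f"Last loaded: {t.last_loaded.isoformat()}."
        )


# Sample values are invented for the example.
entry = RepositoryEntry(
    technical=TechnicalMetadata(
        source_system="ORDERS_DB",
        source_table="ORD_HDR",
        target_table="SALES_FACT",
        transformation="sum order lines per day and product",
        last_loaded=date(1997, 1, 31),
    ),
    business=BusinessMetadata(
        subject_area="Sales",
        description="Daily sales totals by product",
    ),
)
print(entry.describe())
```

Keeping the two kinds of metadata in separate structures mirrors the two audiences: administrators would maintain the technical part, while end-user tools would mostly read the business part.
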

Business Metadata
• Data warehouse operational information, e.g., data history, ownership, extract audit trail, and usage data

Types of Metadata
• End user
  – Key to a good warehouse
  – Navigation aid
  – Information provider
• ETT (extraction, transformation, and transportation)
  – Maps structure
  – Source and target information
  – Transformations
  – Context
• Operational
  – Load, management, scheduling processes
  – Performance

Metadata Characteristics
At a minimum, metadata contains:
• The location and description of warehouse system and data components (warehouse objects).
• Names, definition, structure, and content of the data warehouse and end-user views.
• Identification of authoritative data sources (systems of record).
• Integration and transformation rules used to populate the data warehouse; these include the mapping method from operational databases into the warehouse, and the algorithms used to convert, enhance, or transform data.
• Integration and transformation rules used to deliver data to end-user analytical tools.
• Subscription information for information delivery to the analysis subscribers.
• Data warehouse operational information, which includes a history of warehouse updates, refreshments, snapshots, versions, ownership authorizations, and extract audit trail.
• Metrics used to analyze warehouse usage and performance (end-user usage patterns).
• Security authorizations, access control lists, etc.

Metadata Qualities
• Must be integrated
• Must reflect changes
• Must show history and context

Metadata Life Cycle
1. Collection
   • Identify metadata and capture it in a central repository.
   • To ensure a high level of accuracy, collection should be automated.
2. Maintenance
   • Put in place processes to synchronize metadata automatically with the changing data architecture.
   • To ensure a high level of maintenance, it is important to automate as much of the metadata maintenance as possible.
3. Deployment
   • Provide metadata to users in the right form and with the right tools.
   • One key here is to correctly match the metadata offered to the specific needs of different audiences (developers, maintainers, end users).

Metadata Users
[Figure: IT staff, ETT, End User, Operational, Metadata Repository, Warehouse, Mapping, Users]

Metadata Management
• Since metadata describes the information in the warehouse from multiple viewpoints (input, sources, transformation, access, etc.), it is important that the same metadata or its replicas be available to all tools selected for the warehouse implementation, to enforce the integrity and accuracy of the warehouse information.
• The metadata also has to be available to all warehouse users in order to guide them as they use the warehouse.
• A well-thought-through strategy for collecting, maintaining, and distributing metadata is needed for a successful data warehouse implementation.

Metadata
• Metadata provides interactive access that helps users understand data content and find data.
• Metadata is one of the most important aspects of data warehousing.
• The metadata infrastructure is also very important. It enables metadata's:
  – storage,
  – management, and
  – integration with other components of a data warehouse.
• This infrastructure is known as the metadata repository.
• The metadata repository is the information directory.

Metadata Repository
• Map from source systems into the warehouse
[Figure: External sources, Internal sources, Operational data sources, Data Warehouse Data Model, Metadata Repository, Warehouse]

Metadata Repository
• This directory helps integrate, maintain, and view the contents of the data warehousing system.
• It provides a logical link between warehouse data and the decision support applications.
• It provides decision-support-oriented pointers to warehouse data.
• It is searched by users to find data definitions or subject areas.
• The warehouse design should prevent any direct access to the warehouse data that does not use metadata definitions to gain access.

Metadata Repository
• Metadata repository management software can be used to:
  – map the source data to the target database,
  – generate code for data transformations,
  – integrate and transform the data, and
  – control moving data to the warehouse.
• This software, which typically runs on a workstation, enables users to specify how the data should be transformed, such as data mapping, conversion, and summarization.

Metadata Repository Benefits
• A metadata repository implemented as part of the data warehouse framework provides the following benefits:
  – It provides a comprehensive suite of tools for enterprisewide metadata management.
  – It reduces and eliminates information redundancy, inconsistency, and underutilization.
  – It simplifies management and improves organization, control, and accounting of information assets.
  – It increases identification, understanding, coordination, and utilization of enterprisewide information assets.
• It provides effective data administration tools to better manage corporate information assets with a full-function data dictionary.
• It increases flexibility, control, and reliability of the application development process and accelerates internal application development.

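
Two of the repository management tasks described above, mapping source data to the target database and generating code for data transformations, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: `ColumnMapping`, `generate_load_sql`, and the table and column names are hypothetical, and actual repository tools use their own formats and generators.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnMapping:
    """One source-to-target column mapping (hypothetical structure)."""
    source_expr: str    # SQL expression over the source table
    target_column: str  # column in the warehouse table


def generate_load_sql(source_table: str, target_table: str,
                      mappings: List[ColumnMapping]) -> str:
    """Generate an INSERT ... SELECT statement from mapping metadata.

    Identifiers are taken as-is, so the mappings are assumed to come
    from a trusted repository rather than from user input.
    """
    if not mappings:
        raise ValueError("at least one column mapping is required")
    targets = ", ".join(m.target_column for m in mappings)
    sources = ", ".join(m.source_expr for m in mappings)
    return (f"INSERT INTO {target_table} ({targets}) "
            f"SELECT {sources} FROM {source_table}")


# Invented example: one plain mapping and one conversion.
mappings = [
    ColumnMapping("cust_no", "customer_id"),
    ColumnMapping("UPPER(cust_name)", "customer_name"),
]
sql = generate_load_sql("crm_customers", "dim_customer", mappings)
print(sql)
```

Because the load statement is derived from the mappings rather than written by hand, a change to the mapping metadata changes the generated code with it, which is the point of driving transformations from the repository.
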

Metadata Repository Benefits
• It leverages investment in legacy systems with the ability to inventory and utilize existing applications.
• It provides a universal relational model for heterogeneous RDBMSs to interact and share information.
• It enforces CASE development standards and eliminates redundancy with the ability to share and reuse metadata.

A Metadata Strategy
• Define a strategy
• Ensure high quality metadata
• Provide users with quality information
• Enables metalayer integration
  – Targets, goals, types
  – Source and location
  – Maintenance and management
  – Standards
  – Access and tools
  – Integration and evolution

Targets and Goals
• Intention
• Requirements
• Access - who and how
• Source identification
• Integration approach
• Evolution and change management

Types and Source of Metadata
• Who are the metadata users?
• What do they need?
• What should metadata contain?
• What tool shall I use to create metadata?

Techniques
• Data modeling tools
• Database schema definitions
• ETT tools
• End user tools
• COBOL copybooks
• Middleware tools

Location of Metadata
• Usually the warehouse server
• Maybe on operational platforms
• Desktop tool with metalayer
• Maintained by metadata architect
• Managed by metadata manager
• Standards produced by metadata architect

Access and Tools
• Access to who?
• When?
• To what?
• Tools for management
• Tools for query
• Tools for development

Integration and Change
• Metalayer integration issues
• Metadata exchangeability desirable
• Manage changing metadata
• Consider refresh cycles

Extraction Metadata
• Business rules
• Source tables, fields and key values
• Ownership
• Field conversions
• Encoding and reference table
• Key value changes
• Default values
• Name changes
• Logic to handle multiple sources
• Algorithms
• Time stamp

Extraction Metadata
• Space and storage requirements
• Source location information
• Diverse source data
• Access information
• Security
• Contacts
• Program names
• Frequency details
• Failure procedures
• Validity checking information
[Figure: External Sources, Operational Data Sources, Extraction, Staging File]

Transformation Metadata
• Duplication routines
• Exception handling
• Key restructuring
• Grain conversions
• Program names
• Frequency
• Summarization
[Figure: External Sources, Operational Data Sources, Transform, Staging File]

Transportation Metadata
• Method of transfer
• Frequency
• Validation procedures
• Failure procedures
• Deployment rules
• Manual exercise
• Contact information
[Figure: External sources, Operational data sources, Staging file, Transport, Warehouse, Metadata repository, ETT, Mapping]

User Metadata
739516  1816  666  15  17.62
• Need to know the context of the table queried
• Associate the metadata description
• Analogous to Oracle Data Dictionary views
[Figure: Metadata repository, End User, Warehouse]

User Metadata
Table Name  Column Name  Data    Meaning
Product     Prodid       739516  Unique identifier for the product
Product     Valid_date   01/97   Last refresh date
Product     Ware_loc     1816    Warehouse location number
Product     Ware_bin     666     Warehouse bin number
Product     Code         15      The color of the product; please refer to table COL_REF for details
Product     Weight       17.62   Packed shipping weight in kilograms

User Metadata
• Location of fact and dimensions
• Availability
• Description of contents
• Algorithms for derived and summary data
• Owners of data and telephone number
[Figure: Operational, Warehouse, Metadata repository, End User]

Context of Data
• Supports change history
• Maintains the context of information
[Figure: Metadata repository, Warehouse; Structure and Content for 92, 93, 94, 95, 96]

Context of Data
• Simple
  – Data structures
  – Naming conventions
  – Metrics
• Complex
  – Product definitions
  – Markets
  – Pricing
• External
  – Economic
  – Political
[Figure: Warehouse; 92, 93, 94, 95, 96]

Additional Metadata Contents and Considerations
• Summarization algorithms
• Relationships
• Stewardship
• Permissions
• Pattern analysis
• Reference tables

Metadata Management Tools
• Carleton
• Evolutionary Technologies
• Hewlett Packard
• Informatica
• Information Advantage
• Oracle Designer/2000
• Platinum Technology
• Prism Solutions
• Sagent
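
Returning to the user metadata example for the Product table: a raw value such as 1816 means little until it is associated with its metadata description. The Python sketch below shows one way such a lookup might work. The dictionary layout and the `annotate` function are assumptions for illustration; only the column names, sample values, and meanings are taken from the Product example.

```python
from typing import Dict, Tuple

# (table name, column name) -> meaning, copied from the Product example
USER_METADATA: Dict[Tuple[str, str], str] = {
    ("Product", "Prodid"): "Unique identifier for the product",
    ("Product", "Valid_date"): "Last refresh date",
    ("Product", "Ware_loc"): "Warehouse location number",
    ("Product", "Ware_bin"): "Warehouse bin number",
    ("Product", "Code"): "The color of the product; "
                         "please refer to table COL_REF for details",
    ("Product", "Weight"): "Packed shipping weight in kilograms",
}


def annotate(table: str, row: Dict[str, str]) -> Dict[str, str]:
    """Attach the metadata description to each value in a queried row."""
    return {
        column: f"{value} "
                f"({USER_METADATA.get((table, column), 'no description')})"
        for column, value in row.items()
    }


row = {"Prodid": "739516", "Ware_loc": "1816", "Weight": "17.62"}
for column, text in annotate("Product", row).items():
    print(f"{column}: {text}")
```

A column with no entry in the dictionary is reported as having no description instead of raising an error, so that gaps in the metadata are visible to the user.
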