Transportation: Loading Warehouse Data
Chapter 12

Transporting Data into the Warehouse
- Loading moves the data into the warehouse.
- Loading can be time-consuming:
  - Consider the load window.
  - Schedule the task and automate all processes.
- The initial load moves large volumes of data.
- Subsequent refreshes move smaller volumes.
- The business determines the refresh cycle.

Extract Processing Environment
- After each time interval (T1, T2, T3), build a new database from the operational database.
- Run queries against the new database.

Warehouse Processing Environment
- Build a new database from the operational database.
- After each time interval (T1, T2, T3), add the changes to the database.
- Archive or purge the oldest data.
- Run queries.

First-Time Load
- Single event that populates the database with historical data
- Involves a large volume of data
- Employs distinct ETT tasks
- Involves large amounts of processing after the load

Refresh
- Performed according to a business cycle
- Simple task
- Less data to load than the first-time load
- Less-complex ETT
- Smaller amounts of postload processing

Building the Transportation Process
Specification:
- Techniques and tools
- File transfer methods
- The load window
- Time window for other tasks
- First-time and refresh cycle
- Connectivity bandwidth

Building the Transportation Process (continued)
- Test the proposed techniques.
- Document the proposed load.
- Gain agreement on the process.
- Monitor, review, and revise.

Granularity
- Important design and operational issue
- Space requirements:
  - Storage
  - Backup
  - Recovery
  - Load
- Low-level grain: expensive, high level of processing, more disk, detail
- High-level grain: cheaper, less processing, less disk, little detail

Transportation Techniques
- Tools
- Utilities and 3GL
- Gateways
- Customized copy programs
- Replication
- FTP
- Manual

Transportation Technique Considerations
- Tools are comprehensive but costly.
- Data-movement utilities are fast and powerful.
- Gateways are not always the fastest method, but they can:
  - Access other databases
  - Supply dependent data marts
  - Support a distributed environment
  - Provide real-time access if needed

Using SQL*Loader to Load Data
- SQL*Loader reads input files and a control file, and writes log files, bad files, and discard files.
- Direct path is the fastest load mechanism; it can run in parallel and in unrecoverable mode.
- Related direct mechanisms: direct-load INSERT (Oracle8) and the direct-path load API (Oracle8i).

Direct-Path Load API in Oracle8i
- Allows ETT and other tools to load Oracle databases efficiently
- Permits load behavior to be customized
- Gives direct-path load performance
- Provides complete access to all direct-load functionality through OCI

More Transportation Technique Considerations
- Use customized programs as a last resort.
- Replication is limited by data-transfer rates.

Postprocessing of Loaded Data
After the data is extracted, transformed, and transported, postprocessing of the loaded data includes:
- Creating indexes
- Generating keys
- Summarizing
- Filtering

Indexing Data
[Figure: data flow from operational databases through staging files to the warehouse database]
- Before load: fast index re-enablement
- During load: adds time to the load window
- After load: adds time to the load window

Unique Indexes
- Disable constraints.
- Load the data.
- Enable constraints to create the index.
- Catch errors.
- Reprocess errors.

Creating Artificial Keys
- Use generalized or derived keys.
- Maintain the uniqueness of a row.
- Use an administrative process to assign the key.
- Concatenate the operational key with a number:
  - Easy to maintain
  - Cumbersome keys
  - No clean value for retrieval

Creating Unique Keys for Records
- Assign a number from a list:
  - No semantic meaning
  - Extract operations must reference a table to assign numbers
- Update metadata

Creating Summary Tables
- Use CTAS (CREATE TABLE AS SELECT) or pCTAS (parallel CTAS) to build summary data in the warehouse and in data marts.

Verifying Data Integrity
- Load the data into an intermediate file.
- Compare the target flash totals (counts and amounts) with the totals before the load.
- If they differ, preserve, inspect, and fix the intermediate file, then load it into the warehouse.

Steps for Verifying Data Integrity
Standard quality assurance checks:
- Load status
- Completion of the process
- Completeness of the data
- Data reconciliation
- Violations
- Reprocessing
- Comparison of counts and amounts

Summary
This lesson discussed the following topics:
- First-time load considerations
- Techniques for transporting data
- Tasks involved in the postload processing stage
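The difference between the first-time load and a refresh in the Warehouse Processing Environment (build once, then add only the changes after each time interval) can be sketched as follows. This is a minimal illustration, not a vendor tool: it assumes a hypothetical operational record with a `last_modified` timestamp and uses a "high-water mark" to remember how far the previous load got. The names `SourceRow`, `first_time_load`, `refresh`, and `next_high_water_mark` are all invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SourceRow:
    # Hypothetical operational record: a business key, an amount,
    # and the time the row was last changed in the operational system.
    order_id: str
    amount: float
    last_modified: datetime


def first_time_load(source: list[SourceRow]) -> list[SourceRow]:
    """First-time load: move every historical row (large volume)."""
    return list(source)


def refresh(source: list[SourceRow], high_water_mark: datetime) -> list[SourceRow]:
    """Refresh: move only rows changed after the previous load (smaller volume)."""
    return [row for row in source if row.last_modified > high_water_mark]


def next_high_water_mark(loaded: list[SourceRow], previous: datetime) -> datetime:
    """Advance the mark to the newest change loaded; keep it if nothing was loaded."""
    return max((row.last_modified for row in loaded), default=previous)
```

For example, after an initial load of rows changed on 1 and 2 January, the mark becomes 2 January, and the next refresh picks up only rows modified after that date. A timestamp column is only one way to detect changes; log-based capture or snapshot comparison are common alternatives.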
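The Unique Indexes sequence (disable constraints, load, enable constraints to create the index, catch errors, reprocess) can be modeled in plain Python to show what "catch errors" means. In Oracle the catching step is typically done by re-enabling the constraint with an `EXCEPTIONS INTO` table; the sketch below only imitates that behavior in memory. Rows are assumed to be dictionaries, and the function names and the `fix` callback are hypothetical.

```python
from collections import Counter


def load_with_constraint_disabled(staged_rows):
    """Constraints disabled: rows go in without any uniqueness check."""
    return list(staged_rows)


def enable_unique_constraint(table, key):
    """Enable the constraint (building the unique index) and catch errors.

    Every row whose key value appears more than once is reported as an
    exception; the remaining rows form the unique index, keyed by `key`.
    """
    counts = Counter(row[key] for row in table)
    index = {row[key]: row for row in table if counts[row[key]] == 1}
    exceptions = [row for row in table if counts[row[key]] > 1]
    return index, exceptions


def reprocess(index, exceptions, key, fix):
    """Reprocess errors: correct each exception row, then retry the insert.

    Returns the rows that still collide with an existing key after fixing.
    """
    still_bad = []
    for row in exceptions:
        fixed = fix(row)
        if fixed[key] in index:
            still_bad.append(fixed)
        else:
            index[fixed[key]] = fixed
    return still_bad
```

Deferring the uniqueness check until after the load keeps the load itself fast, at the cost of a separate pass to find and repair violating rows.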
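The two key-generation approaches from Creating Artificial Keys and Creating Unique Keys for Records can be contrasted in a short sketch. Assumptions: the number appended in the concatenated key is some source or version number supplied by the caller, and the "list" of numbers is a simple counter backed by a lookup table. `concatenated_key` and `KeyAssigner` are hypothetical names, not part of any Oracle product.

```python
import itertools


def concatenated_key(operational_key: str, number: int) -> str:
    """Operational key plus a number: easy to maintain, but the result is
    cumbersome and is not a clean value for retrieval."""
    return f"{operational_key}-{number}"


class KeyAssigner:
    """Assign a surrogate number with no semantic meaning from a list.

    `key_map` plays the role of the table that extract operations must
    reference, so the same operational key always receives the same number.
    """

    def __init__(self, start: int = 1) -> None:
        self._numbers = itertools.count(start)
        self.key_map: dict[str, int] = {}

    def assign(self, operational_key: str) -> int:
        if operational_key not in self.key_map:
            self.key_map[operational_key] = next(self._numbers)
        return self.key_map[operational_key]
```

The lookup table is what makes the assigned numbers stable across refreshes; it is also the extra dependency the slide warns about, because every extract must consult it.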
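The CTAS technique from Creating Summary Tables can be tried locally with Python's built-in `sqlite3` module, which also supports `CREATE TABLE ... AS SELECT`. The `sales` and `sales_by_region` tables are hypothetical. SQLite has no parallel option, so this sketch shows plain CTAS only; in Oracle, pCTAS adds a `PARALLEL` clause to the same kind of statement.

```python
import sqlite3


def build_summary(conn: sqlite3.Connection) -> None:
    """Rebuild a summary table from detail rows with CREATE TABLE AS SELECT."""
    # Drop the previous summary so each refresh rebuilds it from scratch.
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        """
        CREATE TABLE sales_by_region AS
        SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM sales
        GROUP BY region
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 10.0), ("East", 5.0), ("West", 7.5)],
    )
    build_summary(conn)
    print(conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
```

Because the whole table is created from one query, CTAS avoids row-by-row inserts, which is why it suits building summary data after a load.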
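The flash-total check from Verifying Data Integrity compares counts and amounts taken before the load with the same totals computed on the intermediate file, and only loads the warehouse when they agree. A minimal sketch follows, assuming rows are dictionaries with an `amount` field and that the warehouse load is passed in as a callback; `FlashTotals`, `flash_totals`, and `verify_then_load` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class FlashTotals:
    row_count: int
    amount_total: Decimal


def flash_totals(rows, amount_field="amount"):
    """Compute the control totals (count and amount) for a batch of rows."""
    # Decimal avoids floating-point drift when summing monetary amounts.
    total = sum((Decimal(str(r[amount_field])) for r in rows), Decimal("0"))
    return FlashTotals(row_count=len(rows), amount_total=total)


def verify_then_load(source_totals, intermediate_rows, load_into_warehouse):
    """Compare target flash totals with the pre-load totals.

    Returns None when the totals match and the rows were loaded. Otherwise
    nothing is loaded and the mismatching totals are returned, so the
    intermediate rows can be preserved, inspected, and fixed first.
    """
    target_totals = flash_totals(intermediate_rows)
    if target_totals != source_totals:
        return target_totals
    load_into_warehouse(intermediate_rows)
    return None
```

Keeping the intermediate file untouched on a mismatch is what makes the "preserve, inspect, fix, then load" step possible.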