Data Warehouse: Design and Lifecycle
N. L. Sarda, Professor, IIT Bombay
nls@cse.iitb.ernet.in
NLS/IITB/DWH 1

Outline
• Introduction
• Warehouse structure
• A case study
• Lifecycle for development
• Dimensional analysis
• Technical architecture
• Conclusions

Introduction
• A DW is a single, complete and consistent store of data, obtained from different sources, for understanding and analyzing the business
• Contains historical data
• Typical decision support requires data to be correlated and aggregated interactively
• The warehouse facilitates browsing, navigating, aggregating and visualizing related data to understand performance, problems, customer preferences, trends, etc.
• Conventional MIS/reporting applications lacked interactivity and flexibility
• Warehouse data is organized by important business subjects (customer, product, etc.)

Warehouse Structure
• Organized to facilitate ease of access and aggregation
• The warehouse structure is decomposed into dimensions and facts
  – Dimensions are like 'independent variables'; they represent entities for analysis
  – A fact represents business data and relates to a set of dimensions
  – E.g., customer, time and type of account are dimensions, and balances are facts
• The complex network of business entities and relationships in an operational DB (modeled, say, with the ER model) is difficult to navigate and analyze
• Instead, a '2-level' structure called a 'star schema' is used: a fact table sits at the center and the dimensions form 'spokes'
• Data is not stored in 'normalized' form

Star Schema
• Contains a fact table and, for each dimension, one dimension table
[Figure: a central fact table (date, custno, prodno, cityname, ...) joined to the Time, Cust, Prod and City dimension tables]

Dimensions
• Stored as database tables
• Contain many descriptive attributes for analysis
• Small and slowly changing data
• Data is often groupable for analysis
  – Customers by age, occupation, income level
  – Time by weeks, months, years
  – Branches as rural, suburban, or by size
• Thus, dimension data is viewable as a hierarchy
• For analysis, dimension data is joined with facts
• Joins are very frequent, so efficient access to dimensions (through multiple indexes) and efficient join computation are required
• Heavily used in constraints and GROUP BY

Facts
• Contain business activity data
• May be at the detailed level or the status level, called transaction-oriented or snapshot-oriented respectively
• Deciding on granularity: every sale, or total sales for a day?
• Often contain numeric attributes for aggregation (additive, semi-additive, ...)
• Also contain the dimension table keys

Snowflake Schema
• Hierarchies are not captured explicitly in a star schema
• A snowflake schema represents a hierarchy directly by normalizing the dimension tables
• Saves on storage but requires more joins
[Figure: fact table (date, custno, prodno, cityname, ...) with Time, Cust and Prod dimensions; the City dimension links to a separate Region table]

Fact Constellation
• Multiple fact tables that share many dimension tables
• E.g., in the hotel industry, Booking and Checkout may share many dimension tables
[Figure: Booking and Checkout fact tables with dimensions Hotels, Travel Agents, Promotion, Room Type and Customer]

Data Mart
• A subset of the warehouse for use by individuals or departments
• Contents may be structured differently, may contain limited history, and may be coarser (aggregated)
• Lightens the load on the central warehouse
• Users primarily use marts with OLAP tools for analysis and decision support
• Refreshed periodically from the central warehouse

Aggregates
• An aggregate is a fact table representing a summarization of base-level fact table data
• Aggregates are pre-calculated summaries stored in the data warehouse to improve query performance
• Aggregates can speed up queries by a factor of 100 or even 1000
• The IS owners of a data warehouse should exhaust the potential of aggregates before investing in new hardware

Warehouse Architecture
• Building a single organization-wide WH that integrates all data from legacy systems is a very challenging task
• Data marts are subject/department-wise and easier to build
• Multiple data marts must be relatable and interoperable across departments or business areas
• Kimball proposes a DW with a 'bus architecture': an architecture phase followed by construction of data marts, independently and asynchronously
• As marts come on-line, they fit with each other properly
• This approach is natural in most cases, as extraction of data for WH building is often source-wise and needs to be done independently

Conformed Dimensions and Facts
• The goal is to produce a master suite of conformed dimensions and to standardize facts
• The resulting dimensions and facts form the 'bus'
• A conformed dimension means the same thing to every fact table it is used with (e.g., customer, time, geography)
• It may contain data brought together from many sources
• Without conformed dimensions, a WH cannot function as a whole
• Getting conformed dimensions right represents 80% of the up-front architecture effort
• The rest goes to conformed facts, which ensure the same terminology across data marts (e.g., price, profit) so that 'drill across' can be done
• This ensures the same units and meaning, and the same time durations and geographies, across marts
• Advantages of conformed dimensions
  – A single dimension table can be used against multiple fact tables in the same WH
  – User interfaces and data content are consistent wherever the dimension is used
  – Attributes and rollups are interpreted consistently across marts
  – A new data mart can be created so that it coexists with the others
• Use of conformed dimensions must be supported at the highest executive level

Financial Services: A Case Study
• A bank offers various products/services such as savings/checking accounts, mortgage loans, personal loans, TDs, credit cards, etc.
• Purpose: track various accounts, customer profiles, etc., for marketing and for offering new services
• Requirements:
  – Get an end-of-month summary of accounts for the last 5 years
  – Get a valid snapshot as of yesterday for the current month (with full details)
  – Group accounts in various ways and compare balances
  – Demographic behavior
• Each account type has some unique attributes (requiring customized dimensions and facts for each)
• Old data (accounts and customers) may be incomplete or even represented differently
• The warehouse data may come from multiple sources:
  – Loan processing system (customer, loan, dues, payment)
  – Fixed deposit system (customer, TD, ...)
  – Front-office system (customer, account, transaction, ...)
  – Credit-card system (customer, transactions, interest, ...)
• Must plan extraction, correlation, consistent representation, ...
• Let us consider a possible warehouse design for these requirements
• Core fact table: balance in each account and number of transactions; grain: month
• Dimensions: account, household, branch, product, status, time
• Account and household are separate dimensions: a family has many accounts, and household definitions change
• The product dimension permits a hierarchy and product-specific attributes; it is separate because it changes
• Status: active or not, closed, etc., with reasons
• The account dimension contains the customer's data; for historical reasons, the customer-to-account relationship is not well maintained

The household data warehouse
• Household Facts (fact table): account_key, household_key, branch_key, product_key, status_key, time_key, primary_balance, transaction_count
• Account: account_key, primary_name, secondary_name, account_address, account_city, account_state, account_zip, date_opened, primary_age, primary_sex, primary_marital
• Household: household_key, household_head_name, household_address, household_city, household_state, household_zip, household_income, household_type
• Branch: branch_key, branch_name, branch_address, branch_city, branch_state, branch_zip, branch_type
• Product: product_key, product_description, type, category
• Status: status_key, status_description, status_reason, new_account_flag, closed_account_flag
• Time: time_key, month, year, fiscal_quarter
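To make the household schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under stated assumptions: each dimension is trimmed to a couple of attributes, and the table names (account_dim, branch_dim, ...), the function names and all sample rows are hypothetical. The query shows the typical star join: constrain on dimension attributes (time and product category), then GROUP BY another dimension attribute (branch type).

```python
import sqlite3

# Trimmed, hypothetical version of the household star schema:
# only a few attributes per dimension are kept.
DDL = """
CREATE TABLE account_dim   (account_key INTEGER PRIMARY KEY, primary_name TEXT);
CREATE TABLE household_dim (household_key INTEGER PRIMARY KEY, household_type TEXT);
CREATE TABLE branch_dim    (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, product_description TEXT, category TEXT);
CREATE TABLE status_dim    (status_key INTEGER PRIMARY KEY, status_description TEXT);
CREATE TABLE time_dim      (time_key INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
-- Core fact table; grain = one row per account per month.
CREATE TABLE household_facts (
    account_key INTEGER, household_key INTEGER, branch_key INTEGER,
    product_key INTEGER, status_key INTEGER, time_key INTEGER,
    primary_balance REAL, transaction_count INTEGER,
    PRIMARY KEY (account_key, time_key)
);
"""


def build_mart() -> sqlite3.Connection:
    """Create the schema in memory and load made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO account_dim VALUES (?, ?)",
                     [(1, "Rao"), (2, "Shah"), (3, "Iyer"), (4, "Iyer")])
    conn.executemany("INSERT INTO household_dim VALUES (?, ?)",
                     [(10, "family"), (11, "single")])
    conn.executemany("INSERT INTO branch_dim VALUES (?, ?, ?)",
                     [(100, "Andheri", "suburban"), (101, "Khed", "rural")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(200, "Savings account", "deposit"), (201, "Personal loan", "loan")])
    conn.execute("INSERT INTO status_dim VALUES (300, 'active')")
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)", [(1, 1, 2024), (2, 2, 2024)])
    facts = [  # (account, household, branch, product, status, time, balance, txns)
        (1, 10, 100, 200, 300, 1, 5000.0, 4), (1, 10, 100, 200, 300, 2, 5500.0, 3),
        (2, 10, 100, 200, 300, 1, 3000.0, 2), (2, 10, 100, 200, 300, 2, 2500.0, 1),
        (3, 11, 101, 200, 300, 1, 7000.0, 5), (3, 11, 101, 200, 300, 2, 6000.0, 2),
        (4, 11, 101, 201, 300, 2, 20000.0, 1),  # a loan account in the same household
    ]
    conn.executemany("INSERT INTO household_facts VALUES (?, ?, ?, ?, ?, ?, ?, ?)", facts)
    return conn


def deposit_balances_by_branch_type(conn, year, month):
    """Star join: constrain on time and product, group by a branch attribute."""
    sql = """
        SELECT b.branch_type, SUM(f.primary_balance), SUM(f.transaction_count)
        FROM household_facts AS f
        JOIN branch_dim  AS b ON b.branch_key  = f.branch_key
        JOIN product_dim AS p ON p.product_key = f.product_key
        JOIN time_dim    AS t ON t.time_key    = f.time_key
        WHERE t.year = ? AND t.month = ? AND p.category = 'deposit'
        GROUP BY b.branch_type
        ORDER BY b.branch_type
    """
    return conn.execute(sql, (year, month)).fetchall()
```

The query fixes a single month on purpose: summing primary_balance across accounts within one month is meaningful, whereas summing one account's balance across months is not, since balance is a semi-additive fact.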
• Balance is semi-additive: it cannot be added across time
• Products are highly heterogeneous: different attributes characterize different accounts (balance, deposit options, interest rate, overdraft limit, ...)
• These cannot all be combined in one dimension, as many are not applicable to all products
• Solution: create many fact tables, one customized for each product, plus one core fact table with a product dimension holding the common attributes; this leads to 100% replication, but facilitates clarification, browsing, etc. and avoids joining customized and core facts
• When many facts are to be stored together, use periodic (e.g., monthly) snapshots
• Transaction-grained facts usually have a single fact (e.g., amount) that is directly involved in the transaction; we need a transaction dimension to represent these amounts
• In a transaction-grained fact table, we do not need customized fact tables per product; instead we create customized dimension tables

Data Warehouse Life Cycle
[Figure: Project Planning → Business Requirement Definition → three parallel tracks (Technical Architecture Design → Product Selection & Installation; Dimensional Modeling → Physical Design → Data Staging Design & Development; End-User Application Specification → End-User Application Development) → Deployment → Maintenance & Growth; Project Management spans the whole life cycle]

Life Cycle Phases
• Project planning
  – The life cycle begins with project planning, which addresses the scoping of the project
  – Focuses on resources and skill levels, staffing requirements, project task assignments, and duration
• Business requirements definition
  – The success of the project depends on a sound understanding of the business users and their requirements
  – Data warehouse designers must understand the key factors driving the business requirements and translate them into design considerations
• Dimensional modeling
  – Dimensional modeling combines data analysis with the earlier understanding of business requirements (represented as a matrix)
  – This step identifies the fact table grain, associated dimensions, attributes, hierarchical drill paths, and facts
• Physical design
  – The primary elements of this phase are defining naming standards and setting up the database environment
  – It focuses on defining the physical structures necessary to support the logical database design
• Data staging design and development
  – The data staging process has three major steps
  – Extraction
    • Exposes data quality issues within the operational system
  – Transformation
    • Consists of data restructuring and type conversions (e.g., from the EBCDIC character set to ASCII)
  – Load
    • Loads the prepared data into the target tables
• Technical architecture design
  – Specifies the tools and techniques needed to make the DW happen
• Product selection and installation
  – Architectural components such as hardware platforms, DBMS, and data staging tools
• End user application specification
  – The application specification describes the report templates, user-driven parameters, and required calculations
• End user application development
• Deployment
  – The convergence of technology, data, and end user applications, accessible from the business user's desktop
  – Business user education covering all aspects of this convergence must be developed and delivered
• Maintenance and growth
  – Data warehouse acceptance and performance metrics should be measured over time, and the maintenance plan should include a communication strategy
  – Prioritization processes must be established to deal with user demands for evolution and growth
• Project management
  – Ensures that the business dimensional life cycle activities remain on track and synchronized
  – These activities occur throughout the life cycle
  – Focuses on monitoring project status, issue tracking, and change control to preserve scope
  – Includes developing a comprehensive project communication plan that addresses both the business and the information systems organizations
• Use a good project management tool

Life Cycle: Summary
• Project planning
• Business requirements definition
• Data track
  – Dimensional modeling
  – Physical design
  – Data staging design and development
• Technology track
  – Technical architecture design
  – Product selection and installation
• Application track
  – End user application specification
  – End user application development
• Deployment
• Maintenance and growth
• Project management

Assess Your Readiness
• Strong business management sponsors
• Compelling business motivation
• IS/business partnership
• Current analytic culture
• Feasibility

Core Project Team
• Business system analyst
• Data modeler
• Data warehouse database administrator
• Data staging system designer
• End user application developers
• Data warehouse educator

Special Teams
• Technical/security architect
• Technical support specialists
• Data staging programmer
• Data administrator
• Data warehouse quality assurance analyst

Develop the Project Plan
• Integrated and detailed
• Resources
• Original estimated effort
• Start date
• Original estimated completion date
• Current estimated completion date
• Status
• Effort to complete
• Dependencies
• Late flags

Develop Communication Plan
• To manage expectations at all levels
• Within the project team: share scope, plans, status
• Face-to-face communications with sponsors
• Business user community: inform them of what is in it for them: capabilities,
limitations, timeframes
• Communication with other interested parties
  – Executive management
  – IS organization, to enable integration with existing and proposed systems
  – Organization at large

Collecting Requirements
[Figure: life cycle phases: Project Planning & Management, Business Requirements, Technical Architecture Design, Dimensional Modeling, Physical Design, Data Staging Design, End-User Application Specification, Deployment Planning, Maintenance and Growth]
• Interviews/write-ups
• Requirements findings document
  – Project overview
  – Review of business objectives
  – Analytic and information requirements
  – Preliminary source systems analysis
  – Preliminary success criteria
• Prepare and publish the requirements
• Agree on next steps after collecting requirements
• Facilitation for conforming and prioritization

Collecting Data about Existing Systems
• Understanding the candidate data sources
• Source data ownership
• Data providers
• Detailed criteria for selecting the data sources
  – Data accessibility
  – Longevity of the feed
  – Data accuracy
  – Project scheduling
• Customer matching and householding
• Browsing and data content
• Mapping data from source to target

Designing the Data Warehouse / Data Marts
• Identifying marts and dimensions
• Identify marts based on facts likely to be used together, as a mart is a kind of subject area or application (a divide-and-conquer strategy)
• Marts are often based on a single business process or a single source
• 10 to 30 marts are common for a large organization
• Build a matrix of marts versus dimensions

Designing a Fact
• Choose a data mart: start with single-source data marts
• Define the fact grain based on the basic business facts stored in legacy systems
• Choose dimensions and match them with the granularity of the facts
• Combine as many facts as possible within the context of the defined granularity

Detailed Design Tips
• Labels that name data marts,
dimensions and attributes should be chosen carefully to refer to the corresponding business entities
• An attribute (in a dimension) is not replicated, but a fact may be present in many fact tables
• If a dimension occurs multiple times (e.g., time), it is playing multiple roles; name each role uniquely
• A single field in the underlying source data can have one or more logical columns associated with it (e.g., a product having a code, a description, etc.)
• Every fact should have a default aggregation rule so that it is not aggregated wrongly

Data Modeling Tool
• The advantages of a data modeling tool are that it
  – Integrates the data warehouse model with other corporate data models
  – Helps assure consistency in naming
  – Creates good documentation
  – Generates the physical schema
  – Provides a reasonably intuitive user interface for entering comments about objects

Dimensional Modeling
• Strengths of dimensional modeling
  – It is a predictable, standard framework
  – It makes user interfaces more understandable and processing more efficient
  – Its predictable framework allows both database systems and end user query tools to make strong assumptions about the data that aid presentation and performance
  – It is gracefully extensible to accommodate unexpected new data elements and new design decisions
  – A number of standard approaches exist for handling common modeling situations in the business world

Dimension Attributes
• The quality of the data warehouse is measured by the quality of the dimension attributes
• User interface responses and final reports are restricted to the precise contents of the dimension table attributes
• Properties
  – Verbose, descriptive, complete
  – Quality assured, indexed
  – Equally available, documented

Time Dimension
• Every data warehouse fact table is a time series of some observations
• We always seem to have one or more time dimensions in our fact table designs
• Provides useful
hierarchies: week, month, quarter, year, etc.
• Represents the calendar with many useful attributes like day of week, day of month, week#, day#, quarter, weekday flag, last-day-of-month flag, holiday flag, etc.

Slowly Changing Dimensions
• The product key or customer key does not change, but the description of the product or customer does
• The data warehouse has three options for handling such changes
  – Overwrite the dimension record with the new values, thereby losing history
    • Used whenever the old value of the attribute has no significance
    • The correction of any error falls into this category
  – Create an additional dimension record using a new value of the surrogate key
    • The primary technique for accurately tracking a change in an attribute within a dimension
    • Requires the use of a surrogate key
    • Used when a true physical change to the dimension entity has taken place
  – Create an "old" field in the dimension record to store the immediately previous attribute value
    • Used when a change is tentative

Time Stamping the Changes
• A slowly changing dimension can be designed by adding begin and end time stamps and a transaction description to each instance of a dimension record
• This design allows very precise time slicing of the dimension by itself

Large Dimensions
• Data warehouses that store extremely granular data may require some extremely large dimensions
• To support large dimensions, we must choose indexing technologies and data design approaches that
  – Support rapid browsing of the unconstrained dimension, especially for low-cardinality attributes
  – Support efficient browsing of cross-constrained values in the dimension table
  – Find and suppress duplicate entries in the dimension

Foreign Key, Primary Key, Surrogate Key
• All dimension tables have single keys, which, by definition, are primary
keys
• All data warehouse keys must be meaningless surrogate keys; you must not use the original production keys
• A four-byte integer makes a good surrogate key
• Use surrogate date keys
• Avoid smart keys
• Avoid production keys

Heterogeneous Product Schemas
• Multiple fact tables are needed when a business has heterogeneous products
• The global view needs a single core fact table crossing all lines of business, whereas the local view focuses on a specific product
• Many attributes and facts apply only to a specific product, so a single fact table is not feasible
• Create a customized fact table and (product) dimension table for each product, and build a core fact table with attributes that make sense across all lines of business; this makes it possible to create a single portfolio (of products) for each customer

Transaction Schema
• Every data mart needs two separate models
  – Transaction version
  – Periodic snapshot version
• A 'rolling' snapshot contains averages across time
• Snapshots allow us to quickly measure the status of the enterprise
• The transaction schema
  – Low-level transactions in the organization make for a good dimensional framework
  – The fact record for an individual transaction frequently contains only a single value
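The two models can be related by deriving the periodic snapshot from the transaction grain. The sketch below assumes the snapshot is computed from transactions plus an opening balance per account; the Txn record, the function names and the sample data are hypothetical. It keeps, per account and month, the month-end balance (a semi-additive snapshot fact) and the transaction count (an additive fact).

```python
from datetime import date
from typing import Dict, List, NamedTuple, Tuple

Month = Tuple[int, int]  # (year, month)


class Txn(NamedTuple):
    """One transaction-grain fact record: a single amount per transaction."""
    account: str
    day: date
    amount: float  # positive = deposit, negative = withdrawal


def monthly_snapshot(txns: List[Txn], opening: Dict[str, float]
                     ) -> Dict[Tuple[str, Month], Tuple[float, int]]:
    """Roll transactions up to {(account, month): (month_end_balance, txn_count)}.

    Simplification: months with no activity get no row here; a real periodic
    snapshot would carry the previous balance forward into such months.
    """
    balance = dict(opening)
    snapshot: Dict[Tuple[str, Month], Tuple[float, int]] = {}
    # Process each account's transactions in date order.
    for t in sorted(txns, key=lambda x: (x.account, x.day)):
        balance[t.account] = balance.get(t.account, 0.0) + t.amount
        key = (t.account, (t.day.year, t.day.month))
        _, count = snapshot.get(key, (0.0, 0))
        # The latest balance in the month overwrites earlier ones (month-end value).
        snapshot[key] = (balance[t.account], count + 1)
    return snapshot


def average_month_end_balance(snapshot, account: str) -> float:
    """Balance is semi-additive: across time, average it rather than sum it."""
    values = [bal for (acct, _), (bal, _) in snapshot.items() if acct == account]
    return sum(values) / len(values)
```

For example, an account with an opening balance of 1000 that receives +500 and -200 in January and +100 in February yields month-end balances of 1300 and 1400; their average (1350) is meaningful, while their sum (2700) is not.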
• The transaction-based WH is commonly used for
  – Time-of-day analysis
  – Queue analysis
  – Fraud detection
  – Basket analysis
  – Current status

Factless Fact Tables
• Useful to describe events and their coverage
• An event fact table records the occurrence of an event; it has only a flag and dimension keys (e.g., student attendance)
• A coverage fact table is frequently needed when a primary fact table in a dimensional data warehouse is sparse; e.g., the primary fact table will not show items that were on promotion but did not sell; the coverage table, containing only dimension keys, lists all items on promotion

Facts of Different Granularity
• The dimensional model gains power as individual fact records become more and more atomic
• At the lowest level of individual transactions, the design is most powerful because
  – More of the descriptive attributes have single values
  – The design withstands surprises in the form of new facts, new dimensions, or new attributes within existing dimensions
  – There is more expressiveness at the lowest levels of granularity

Technical Architecture
[Figure: technical architecture, divided into the back room and the front room. Components shown: source (operational) systems, data staging services, data staging area, metadata catalog, presentation servers holding dimensional data marts with only aggregated data and dimensional data marts including atomic data, query services, standard reporting tools, desktop data access tools, and application models; the key distinguishes data elements from service elements]
• Describes the flow of data from the source systems to the decision makers
• Data staging services
  – Extract
  – Transformation
  – Load
  – Job control
• Query services
  – Warehouse browsing
  – Access and security
  – Query management
  – Standard reporting
  – Activity monitor

Metadata Catalog
• An integral part of the overall architecture
• Contains information that describes the warehouse and plays an active role in its creation, use, and maintenance
• Contains source system metadata (data and processes), data staging metadata (dimensions, transformations, aggregations), DBMS metadata (tables, indexes, stored procedures), and front room metadata (users, applications)

Technical Architecture Features
• Metadata driven
  – Metadata provides flexibility by buffering the various components of the system from each other
  – The metadata catalog provides parameters and information that allow applications to perform their tasks
• Flexible services layers
  – The data staging services and data query services add to the flexibility of the architecture

Back Room: Data Staging Area
• The construction site for the warehouse
• The central role of the staging area is to evolve the source system of record for all downstream DSS and reporting environments
• Data staging data models
  – The data models can be designed for performance and ease of development
  – Third normal form structures often appear in the data staging area because the source systems are duplicated there
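The staging steps listed in the life cycle (extraction, transformation such as EBCDIC-to-ASCII conversion, and load) can be sketched in Python as follows. Everything specific here is an assumption for illustration: the fixed-width record layout, the table account_stage and the function names are hypothetical, and Python's cp037 codec stands in for whichever EBCDIC code page the source system actually uses. The transform step also replaces the production account number with a meaningless integer surrogate key.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical fixed-width layout of a mainframe extract (byte offsets).
LAYOUT = {"acct_no": (0, 6), "name": (6, 26), "balance_paise": (26, 36)}
RECORD_LEN = 36


def extract(raw: bytes, record_len: int = RECORD_LEN) -> List[bytes]:
    """Split the extract into fixed-length records; flag quality problems early."""
    if len(raw) % record_len:
        raise ValueError("truncated record in source extract")
    return [raw[i:i + record_len] for i in range(0, len(raw), record_len)]


def transform(record: bytes, key_map: Dict[str, int]) -> Tuple[int, str, str, float]:
    """EBCDIC -> str, trim fields, convert types, assign a surrogate key."""
    text = record.decode("cp037")  # cp037 = one EBCDIC code page (assumed here)
    fields = {name: text[a:b].strip() for name, (a, b) in LAYOUT.items()}
    prod_key = fields["acct_no"]
    # Meaningless integer surrogate key, assigned in order of first appearance;
    # a production key seen again reuses its existing surrogate.
    surrogate = key_map.setdefault(prod_key, len(key_map) + 1)
    balance = int(fields["balance_paise"]) / 100  # paise -> rupees
    return surrogate, prod_key, fields["name"], balance


def load(conn: sqlite3.Connection, rows) -> None:
    """Load the prepared rows into the (hypothetical) target table."""
    conn.executemany("INSERT INTO account_stage VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw: bytes) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account_stage (account_key INTEGER PRIMARY KEY, "
                 "acct_no TEXT, name TEXT, balance REAL)")
    key_map: Dict[str, int] = {}
    load(conn, [transform(r, key_map) for r in extract(raw)])
    return conn
```

A real staging process would also cover job control, rejected-record logging and slowly changing dimension maintenance; those are omitted to keep the three steps visible.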
• Atomic data marts hold the lowest level of detail necessary to meet most of the high-value business requirements
  – Atomic data mart storage should be relational rather than OLAP because of the extreme level of detail, the number of dimensions, and the size
  – The atomic data mart's data model is built around the dimensional model, not an ER model

Transformation Services
• The process of transforming data from the source systems into something presentable to end users and valuable to the business
• Transformation services include:
  – Integration
  – Slowly changing dimension maintenance
  – Referential integrity checking
  – Data type conversion
  – Aggregation
  – Data content audit
  – Pre- and post-step exits

Front Room Architecture
• The public face of the warehouse: what business users see and work with day-to-day
• The presentation servers are machines on which the data warehouse data is organized for direct querying by end users and report writers
• The major types of activities here:
  – Warehouse or metadata browsing
  – Access and security
  – Activity monitoring
  – Query management
  – Standard reporting

Warehouse Browsing
• Using browsing tools to find and access the information the user needs
• The warehouse browser should be dynamically linked to the metadata catalog
• It should be able to pull the definitions and derivations of the various data elements and to show a set of standard reports
• Browsing tools
  – Visual Basic
  – Microsoft Access, etc.

Access and Security Services
• Access and security services facilitate a user's connection to the database
• They rely on authorization and authentication services, through which the user is identified and access rights are determined or access is refused
• The level of authentication depends on how sensitive the data is

Activity Monitoring Services
• Capture information about the use of the data warehouse
• The capabilities are:
  – Performance
  – User support
  – Marketing
  – Planning

Query Management Services
• Query management services are the capabilities that manage the execution of a query and the return of the result set to the desktop
• The major query management services are:
  – Query reformulation
  – Query retargeting and multi-pass SQL
  – Aggregate awareness
  – Query governing

Standard Reporting Services
• Provide the ability to create fixed-format reports with limited user interaction and regular execution schedules
• Requirements for standard reporting tools:
  – Report development environment
  – Report execution server
  – Time- and event-based scheduling of report execution
  – Iterative execution
  – Flexible report definition
  – Flexible report delivery
  – Report library with browsing capability

Back Room Infrastructure Factors
• Infrastructure for the data warehouse includes the hardware, the network, and lower-level functions such as security
• The database server is the biggest hardware platform decision for most data warehouse projects
• The major factors in determining requirements for the server platforms are:
  – Data size
    • Most data warehouse/data mart projects tend to start out with no more than 200 GB
    • Warehouses of less than 100 GB are considered small, those from 100 to 500 GB typical, and those over 500 GB large
  – Volatility
    • Measures the dynamic nature of the database: how often it will be updated and how much data is replaced each time
  – Number of users
    • How active the users are, how many are active concurrently, their geographical distribution, etc.
are important factors in selecting a platform
  – Number of business processes
    • Increases the complexity of the data warehouse
    • May call for separate hardware platforms for each business process
  – Nature of use
    • Covers the front-end tools and the types of queries, and their implications for platform selection

Technical Factors
• Platforms
  – NT servers for smaller to medium-sized warehouses
    • NT is a cost-effective platform for smaller warehouses or data marts
  – Open system servers
    • Open system (Unix) servers are the primary platform for most medium-sized or larger warehouses
    • If the data warehouse is based on a Unix environment, the warehouse team needs to know the administrative tools and basic Unix commands and utilities to develop and manage the warehouse
• Disks
  – Disk drives can have a major impact on the performance, flexibility, and scalability of the warehouse platform
• Memory
  – More memory is better for data warehousing
  – Transaction requests are small and typically don't need much memory, whereas decision support queries require more memory and involve large tables
  – If a table fits in memory, performance can improve 10 to 100 times
• Database platform
  – Data warehouses are implemented using mainframe-based database products
  – Some data warehouses are implemented using specialized multidimensional database products called MOLAP (multidimensional online analytical processing) engines
  – MOLAP engines came about in response to three main user requirements: simple data access, crosstab-style reports, and fast response time
  – The significant benefit of a MOLAP engine is end user query performance

Physical Design
• In physical design, the data warehouse team must estimate the warehouse's size
• In data warehouses, the size of the dimension tables is insignificant compared with the size of the fact tables and their indexes

Initial Sizing Estimates
• Preliminary sizing estimates include
  – Estimated row length
  – Estimated number of rows
  – Count and sizes of indexes
  – Temp space
  – Space for metadata tables
  – Considerable space for aggregate tables

Indexes and Query Strategies
• To develop an index plan, it is important to understand how the RDBMS's query optimizer and indexes work
  – The B-tree index
  – The bitmapped index
  – The hash index
  – Other index types
  – Star schema optimization
• Indexing the fact tables and dimension tables, and indexing for loads

End User Application
[Figure: end user application types by nature of use, from strategic to operational. Customer types: ad hoc power users, push-button knowledge workers, operational consumers. Interfaces: desktop tools for do-it-yourself queries, end user applications (reporting/analysis examples), and standard reports in the operational reporting environment, connected by migration paths. Value labels: flexible; low effort and current business view; assured reference points]

End User Application Template
• A template provides the layout and structure of a report that is driven by a set of parameters
• This approach allows users to generate a number of reports of similar structure from a single template
• Through the drill-down
capabilities, a user could produce reports on other attributes; this action changes the actual template structure
• Many data access tools provide this functionality transparently

Typical Analysis Cycle
• How's business?
• What are the trends?
• What's unusual?
• What is driving those exceptions?
• What if...?
• Make a business decision
• Implement the decision

The Desktop Installation Readiness
• The back room architecture and infrastructure will be established long before deployment, as they are needed for development activities
• The technology residing on the user's desktop is the last piece that must be put in place prior to deployment
• Checklist of activities that should occur well before deployment
  – Determine the client configuration requirements
  – Determine LAN addresses
  – Conduct a physical audit
  – Complete the contract and procurement process
  – Acquire user logons and security approval
  – Test installation procedures on a variety of machines
  – Schedule the installation
  – Install the desktop hardware and/or software
  – Complete installation testing

End User Education Strategy
• A robust education strategy for business end users is a prerequisite for data warehouse success
• Integrate and tailor education content
• Education for business users must address three key aspects of the data warehouse
  – Data content
  – End user application
  – The data access tool
• Data education content
  – Provide an overview of structures, hierarchies, business rules, and definitions
  – Before deployment, identify, document, and communicate these to the business users
  – Factors causing discrepancies between data from the warehouse and previously reported information are:
    • The data warehouse information is incorrect
    • The warehouse information has a different or new business definition or meaning
    • The previously
reported information was incorrect

An End User Support Strategy
• User support strategies vary by organization and culture, based largely on the expectations of senior business management
• Determine the support organization structure
  – A centralized team of support resources handles the more global data warehouse maintenance and responsibility
  – The team typically serves as a second line of defense and provides a pool of advanced application development resources
• Establish support communication and feedback
  – Communication with users should, at a minimum, consist of general information and status updates
  – Success stories can help motivate
• Provide support documentation
• Create a warehouse web site

Conclusion
• Building a corporate-wide data warehouse is a challenging task
• A systematic methodology is essential
• Plan the architecture globally but build it incrementally
• Keep user requirements at the core of all development activities