Enterprise Data Warehouse Fundamentals 101
KIDS Phase II Project

Mojo Nwokoma
Director, Enterprise Data Systems Architecture
Office of Assessment & Information Services
Oregon Department of Education
503-378-3600 x2242
Mojo.nwokoma@state.or.us

What Are Enterprise DW/BI Solutions?
• A Data Warehouse (DW) is a collection of integrated, subject-oriented, time-variant, and non-volatile data gathered from various sources into a single, consistent warehouse that supports reporting, analysis, and decision making within the enterprise.
– Integrates operational data through consistent naming conventions, measurements, physical attributes, and semantics.
• Business Intelligence (BI) solutions blend technologies, including relational and multi-dimensional databases, client/server architecture, and graphical user interfaces, to integrate disparate data sources into a single coherent framework for real-time reporting, drill-through analysis, and decision support.

KIDS Phase I Project Report: The Business Case for Change
1. NCLB & Federal Accountability:
• The reporting and performance requirements of NCLB demand a fundamental change in statewide data collection and reporting.
• The need to report individual student achievement and to aggregate data by subgroups.
2. Statewide Accountability & Efficiency:
• Major gaps and inefficiencies exist in information collection, reporting, and analysis at both district and state levels that need to be addressed.
• Funding formula compliance and equity, and the ability to evaluate the relative effectiveness or ineffectiveness of education programs.
3. Student & Community Service:
• Growing stakeholder demand for a significant improvement in student records availability, accessibility, portability, and accuracy.
4. Cost Efficiency Gains:
• Evidence of economies of scale resulting in cost reductions in the management of administrative systems at the larger school districts and ESD structure.
Source: KIDS Phase 1 Final report: ODE & IBM Confidential (10/20/05)

KIDS Phase II Project Planning: 5 key questions for successful project planning & implementation
[Diagram: What? Who? When? Why? How? arranged around SUCCESS]
The Essential Building Blocks for a Successful Enterprise Information Management Project

Enterprise Data Warehouse Architecture
[Diagram: H/W server platform spanning the Extraction, Transformation, and Load phases. Labels: HR Applications, SIS, Finance, Curriculum & Instruction, Transportation, Instruction, Nutrition; Staging (integration, data management, metadata, cleansed, profiled, biz rules); District Data Warehouse; data marts (including SIS and Finance).]

KIDS Phase II Project District/ESD Server Deployment & Data Warehouse Integration Architecture
[Diagram: Beaverton, Hillsboro, Portland, and Eugene District DWs and the ESDs DW, each shown with a Transaction System and an ODS, together with the ODE (State) DW (Physical & Virtual).]
LEGENDS: DW = Data Warehouse; ODS = Operational Data Store; Districts Record Exchange; KIDS Work = ODE

Project Planning Methodology – “The How?”
For the Project Team

Step 1: Define the Work Breakdown Structure
The first step is to create a comprehensive Work Breakdown Structure (WBS). The WBS lists all the phases, activities, and tasks required to undertake the project. Identify and describe each phase, activity, and task required to complete the project successfully. Depict the order in which the tasks must be undertaken and identify any key internal and external project dependencies. Also list the critical project milestones, such as the completion of key project deliverables.

Step 2: Identify the Required Resources
Having listed all of the tasks required to undertake the project, you now need to identify the generic resources required to complete each task. Examples of resource types include full-time and part-time staff, contractors, equipment, and materials.
For each resource type, identify the quantity required, the delivery dates, and the project tasks in the WBS that the resource will help complete.

Step 3: Construct a Project Schedule
To construct your schedule, you need to:
• List the phases, activities, and tasks
• Sequence the phases, activities, and tasks
• Add key internal and external dependencies
• Allocate relevant completion timeframes
• Add additional contingency to mitigate risk
• Assign resources required to complete tasks
• List critical delivery milestones
• Specify any assumptions and constraints

Current Data Environment
• Lack of granular, integrated, accurate, standardized, and timely data regarding student performance and achievement, both for individual students and for specific student subgroups.
• Lack of data and information integration between instructional management sources and assessment sources aligned to the state’s specific educational standards. Other factors affected include attendance records, discipline, teacher qualification, classroom size, and instructional hours.
• Lack of close collaboration between the districts and ODE in tracking students as they move through the educational system, both vertically and horizontally, in order to improve performance by identifying actionable indicators.
• Lack of specific multi-year student performance information to support longitudinal analysis, accessible at the State, District, and school levels with appropriate controls to assure confidentiality.
• Need for data and tool standardization across all reporting districts to ensure accurate, consistent, and useful analytical input for decision making.
• No single version of the truth exists for business rules and data definitions among the various data sources.
• Lack of easily validated financial information that accurately reports budgeted vs. actual expenditures by program and allows these expenditures to be correlated with student performance.
• Lack of appropriately controlled online access to information for all stakeholders regarding student progress and school quality.

Problems with Current Decision Support
• Data redundancy and process redundancy
• Data is not integrated and cannot be shared
• Data is not understood or misunderstood
• Inconsistent data definitions & business rules
• Data retrieval is difficult and time consuming
• Operational files may not contain history
• Reports are inconsistent in content & format
• Data is too dirty for business analysis
• Multiple versions of the truth
• Reports and associated BI tools are not standardized

Recommended Model for Enterprise Data Warehouse System
KIDS Phase II Project
[Diagram: functions listed as Educational Stakeholder Communication, Benchmarking/Decision Support, District & State Reporting, Day to Day Operations; system layers listed as E-Portal, Data Warehouse, Operational Data Store (ODS), Transactional Database.]
PK-12 Data Model for Information Management (ODE & IBM Confidential) 10/20/05

Standards
• Governance (setting priorities)
• Data naming, aliases, abbreviations list
• Meta data capture and maintenance
• Data quality and data management
• Testing standards
• Security standards
• Measuring results (benefits, costs, usage)
• Service level agreements
Rules and protocols to be followed by all users and developers for all applications

DW roles & responsibilities
• Business User (Client)
• Business User Support
• Data Administrator
• Data Analysts
• Meta Data Administrator
• Database Administrator
• Developers
– ETL
– BI
– Reports, Queries

DW roles & responsibilities (continued)
• Security Officer
• Auditor
• Data Warehouse Project Manager
• Technical Services
• DW Architect
• Technical Advisory Board
• Steering Committee

Information quality
• Data is accurate
• Data is consistent
• Data is timely
• Data is integrated
• Data is complete
• Data values follow the business rules
• Data corresponds to valid values
• Data is well understood

Data Warehouse/ODS & BI Layers
Data Integration
Layer & Operational Data Store (ODS):
• The ODS is a non-queryable, centralized staging area for storing extracted, cleansed, and transformed data and for gathering centralized metadata to implement an Enterprise Data Mart Architecture (EDMA), eliminating the need for another non-queryable staging area called a data warehouse.
• What is needed is a dimensionally modeled Data Warehouse for enterprise DSS, built to provide the best query response performance and to support the most advanced OLAP functionality.

Meta data components
» Data name (entity, attribute, table, column, field, etc.)
» Business description of data (Project Start Date)
» Source of data (file, field)
» Business Owner of data
» Business rules
» Transformation rules
» Domains (allowable values)
» Data relationships

Meta data components (continued)
» Data quality (measure of reliability)
» Timeliness (ex: current as of certain date)
» Historical information
» Aggregation rules
» Security (who has access)

Meta data management
Meta Data = [Descriptive] Data About Data [of the Business]
• Meta data administration
• Business meta data
• Technical meta data
• ETL reconciliation
• Data quality metrics
• Standardization
• Data ownership
• Enterprise integration
Information = Data within context
Context = Meta data
Information = Data + Meta data

KIDS Phase II Project 05-07 Workflow
1. Review and validate business case for Phase II project
• Identify key stakeholders.
• Validate requirements, deliverables, and expectations.
• Identify two or three key districts with viable data warehouse infrastructure as test sites.
2. Review enterprise architecture, and identify infrastructure components
• Transaction Level Applications
• Operating & Database systems
• Hardware Platforms
3.
Design Data Warehouse & Operational Data Store (ODS) Data Model
• Design Dimensional Modeling Schemas
• Configure Extraction, Transformation, Load (ETL)
• Metadata Capture (Repository or Data labels)
• Data Quality Profiling
• Vendor selection will be based on the results of a competitive “Bake-off” among three top vendors.
4. Data Reporting & Information Delivery Mechanism
• Leverage existing reporting and On-line Analytical Processing (OLAP) infrastructure
• Design and implement subject-area data marts for effective horizontal reporting integration
• Develop OLAP cubes for report aggregation and slice/dice querying investigations
• Deploy enterprise portal with built-in security and user access authentication.
5. End-user Training
• Design and schedule end-user training at all levels of data and reporting needs.
• Identify “Train-the-Trainer” candidates from each school district for more detailed training.

KIDS Phase II Project Key System Deliverables
1. Transactional Systems/District Data Warehouse
• Integrate systems that support instructional management, empowering teachers to combine student performance and instructional data to make informed classroom decisions.
• Districts will be better able to meet NCLB and state standards for having “highly qualified” teachers in the classroom.
2. Integrated & Interoperable Operational Data Store (ODS)
• Provide the ability to evaluate student performance within selected programs across various schools and districts in the State, and highlight ways to achieve AYP consistently over time.
• Allow for quick turnaround in transferring student records when students move between districts.
• Ease the reporting burden on districts, eliminate redundant and possibly inaccurate reporting of data, and provide a better foundation for integrated data analysis.
3. Data Warehouse & Decision Support System/Tools
• A repository for State and district reporting and analysis, down to student-level data.
• Allows legislators and policy makers to ask, and get answers to, more meaningful system-level questions.
• Greater system accessibility for all users with relevant security access privileges will mean greater acceptance and use, and ultimately better decisions.
4. System Wide Communication Portal
• Provide a focused location for access to information, analytical/reporting tools, and the necessary training and support, maximizing effective system-wide use of the state’s data warehouse.
• Rapid and wide dissemination of integrated and proven instructional and administrative practices.

KIDS Phase II Project High-level Project Work Plan, Time-line, & Resource Requirements
Columns: Project Phase; Time-line; Resource Requirements
1. Requirements Validation
Time-line: January 10, 2006
Resource Requirements: Mojo & Gary; scheduled trips to all Districts & ESDs
2. Inauguration of Governance & Project Team committee members
Time-line: January 30, 2006
Resource Requirements: Doug Kosty & Mojo Nwokoma
3. “Test Site” DW/ODS Modeling, integration, ETL, Data Quality, Meta Data Repository, and Vendor “Bake-off” contracting
Time-line: June 30, 2006
4.
OLAP & Portal Development, including Training, and vendor “Bake-off” Contracting
Time-line: October 30, 2006
Resource Requirements (listed for phases 3 and 4):
• Database Administrator
• Data Modeler/Analysts
• Data Quality Analysts
• Business User (Client)
• Meta Data Administrator
• ETL & BI developers
• Data Warehouse Project Manager
• End-user Business Analyst
• Web Developer
• Portal Dashboard developer
• Business User (Client)
• BI OLAP Report Developer
• “Train-the-trainer”
• Data Security Officer
• BI - OLAP
• Data Warehouse Architecture

User expectations
Expectations must be managed in terms of:
• Schedule
• Budget
• Scope
• Performance
• Availability
• Simplicity (ease of use)
• Tool functionality
• Data cleanliness
• Users’ roles and responsibilities

User responsibilities
• Be a full-time member of the Core Team
• Participate in all data modeling sessions
• Co-manage the DW project
• Make decisions and escalate disputes to the Steering Committee
• Provide meta data for business objects
• Identify data security requirements
• Participate in BI tool selection
• Participate in all review sessions
• Participate in all testing activities

IT Staffing
• DW roles & responsibilities
• Dedicated IT team
• New skill set – beyond tools, discipline
• Contractors & consultants
• Knowledge transfer
• Training
– Just in time
– Just enough
Knowledge transfer through collaboration

Risks to be mitigated
• Low management commitment
• Low user commitment
• Unrealistic schedule
• Unrealistic user expectations
• Budget too small
• Untrained or unavailable staff

Risks to be mitigated (continued)
• Unclear or changing requirements
• Poor project management
• Creeping scope
• Initial project too large
• Wrong project
• Changing priorities
• Data cleansing not addressed early
• Vendors out of control

Risks to be mitigated (continued)
• Not architected properly (wrong design)
• Inappropriate organization structure
• Lost or changed sponsor
• New technology not understood
• No procedure to resolve disputes
• Exceeding platform capabilities
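
Supplementary sketch: meta data and data quality profiling
The "Meta data components" and "Information quality" slides earlier in the deck can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not part of the KIDS design: the names MetaDataEntry and profile_column, the grade_level entry, its owner and source, and the staged sample values are all hypothetical. It shows how a repository entry that records a domain (allowable values) might be used to profile staged data for completeness and valid values before loading.

```python
from dataclasses import dataclass, field


@dataclass
class MetaDataEntry:
    """Hypothetical repository entry; fields mirror the meta data component slides."""
    data_name: str
    business_description: str
    source: str
    business_owner: str
    domain: frozenset = frozenset()  # allowable values; empty means unrestricted
    business_rules: list = field(default_factory=list)


def profile_column(entry, values):
    """Count missing, invalid, and valid values for one staged column."""
    total = len(values)
    missing = sum(1 for v in values if v is None or v == "")
    invalid = sum(
        1 for v in values
        if v not in (None, "") and entry.domain and v not in entry.domain
    )
    return {
        "total": total,
        "missing": missing,
        "invalid": invalid,
        "valid": total - missing - invalid,
    }


# Hypothetical entry: grade levels KG and 01 through 12 are the allowable values.
grade_level = MetaDataEntry(
    data_name="grade_level",
    business_description="Grade in which the student is enrolled",
    source="SIS enrollment file, GRADE field",  # assumed source, for illustration
    business_owner="District registrar",  # assumed owner, for illustration
    domain=frozenset(["KG"] + [f"{n:02d}" for n in range(1, 13)]),
)

staged = ["03", "KG", "", "13", "12", None]  # made-up staged values
report = profile_column(grade_level, staged)
print(report)  # {'total': 6, 'missing': 2, 'invalid': 1, 'valid': 3}
```

Keeping the domain in the meta data entry rather than in the check itself means the same profiling routine can be reused for any column whose allowable values are recorded in the repository.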