Managing Data Warehouse Projects: The Key Issues

Enterprise Data Warehouse Fundamentals 101
KIDS Phase II Project
Mojo Nwokoma
Director, Enterprise Data Systems Architecture
Office of Assessment & Information Services
Oregon Department of Education
503-378-3600 x2242
Mojo.nwokoma@state.or.us
What Are Enterprise DW/BI Solutions?
• A Data Warehouse (DW) is a collection of integrated, subject-oriented, time-variant, and non-volatile data, drawn from various sources into a single, consistent warehouse that supports reporting, analysis, and decision making within the enterprise.
– It integrates operational data through consistent naming conventions, measurements, physical attributes, and semantics.
• Business Intelligence (BI) solutions blend technologies, including relational and multidimensional databases, client/server architecture, and graphical user interfaces, to integrate disparate data sources into a single, coherent framework for real-time reporting, drill-through analysis, and decision support.
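To make "consistent naming conventions, measurements, physical attributes, and semantics" concrete, here is a minimal Python sketch. It assumes two hypothetical district extracts that name the same facts differently and store attendance in different units. The field names (`stu_id`, `StudentID`, `attendance_pct`, etc.) are illustrative only and do not come from any KIDS system.

```python
# Minimal sketch: map two hypothetical district extracts onto one consistent
# schema (shared field names, units, and types). All field names are assumptions.

# Per-source mapping: source field name -> canonical field name
FIELD_MAPS = {
    "district_a": {"stu_id": "student_id", "attend_frac": "attendance_pct"},
    "district_b": {"StudentID": "student_id", "AttendancePercent": "attendance_pct"},
}

# Per-source unit conversions, applied after renaming
UNIT_FIXES = {
    "district_a": {"attendance_pct": lambda v: float(v) * 100.0},  # fraction -> percent
    "district_b": {"attendance_pct": float},                       # already a percent
}


def integrate(source: str, record: dict) -> dict:
    """Rename fields, normalize units and types, and tag the record's source."""
    out = {canonical: record[name] for name, canonical in FIELD_MAPS[source].items()}
    for name, convert in UNIT_FIXES[source].items():
        out[name] = convert(out[name])
    out["student_id"] = str(out["student_id"]).strip()  # one identifier type everywhere
    out["source"] = source  # keep lineage for auditing
    return out


print(integrate("district_a", {"stu_id": 1001, "attend_frac": 0.5}))
print(integrate("district_b", {"StudentID": " 2002 ", "AttendancePercent": "88.5"}))
```

Keeping the mappings as data rather than code means a new source only needs a new entry. That is one way to enforce the naming standards listed later in this deck.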
KIDS Phase I Project Report:
The Business Case for Change
1. NCLB & Federal Accountability:
• The reporting and performance requirements of NCLB demand a fundamental change in statewide data collection and reporting.
• Individual student achievement must be reported, and data must be aggregated by subgroups.
2. Statewide Accountability & Efficiency:
• Major gaps and inefficiencies exist in information collection, reporting, and analysis at both the district and state levels, and they need to be addressed.
• Funding formula compliance and equity, and the ability to evaluate the relative effectiveness or ineffectiveness of education programs.
3. Student & Community Service:
• Growing stakeholder demand for a significant improvement in student records availability, accessibility, portability, and accuracy.
4. Cost Efficiency Gains:
• Evidence of economies of scale resulting in cost reductions in the management of administrative systems at the larger school districts and in the ESD structure.
Source: KIDS Phase 1 Final report: ODE & IBM Confidential (10/20/05)
KIDS Phase II Project Planning:
Five key questions for successful project planning & implementation
Figure: the five questions (What? Who? When? Why? How?) arranged around the word SUCCESS.
The Essential Building Blocks for a Successful Enterprise Information Management Project
Enterprise Data Warehouse Architecture
Figure: components shown include source systems and subject areas (HR applications, SIS, Finance, Curriculum & Instruction, Transportation, Nutrition); the Extraction, Transformation, and Load phases on an H/W server platform; a Staging area (integration); Data Management (metadata, cleansed and profiled data, business rules); the District Data Warehouse; and SIS, Finance, and Transportation data marts.
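The Extraction, Transformation, and Load phases in this architecture can be sketched in a few functions. This is a minimal illustration under stated assumptions, not a product design. The rows are plain dictionaries, and the "business rule" (every row needs an `id` and a `subject` area) is hypothetical. Each subject area becomes its own data mart.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    staging: list = field(default_factory=list)    # STAGING area (raw, integrated rows)
    marts: dict = field(default_factory=dict)      # subject area -> loaded rows
    load_log: list = field(default_factory=list)   # simple operational metadata


def extract(sources: dict) -> list:
    """Extraction phase: copy raw rows from each source system, tagging their origin."""
    return [dict(row, _source=name) for name, rows in sources.items() for row in rows]


def transform(rows: list) -> tuple:
    """Transformation phase: cleanse values, then apply a (hypothetical) business rule."""
    clean, rejected = [], []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if row.get("id") and row.get("subject"):  # rule: id and subject area required
            clean.append(row)
        else:
            rejected.append(row)
    return clean, rejected


def load(wh: Warehouse, rows: list) -> None:
    """Load phase: route each cleansed row to the data mart for its subject area."""
    for row in rows:
        wh.marts.setdefault(row["subject"], []).append(row)


def run_etl(wh: Warehouse, sources: dict) -> Warehouse:
    wh.staging = extract(sources)
    clean, rejected = transform(wh.staging)
    load(wh, clean)
    wh.load_log.append({"extracted": len(wh.staging), "loaded": len(clean),
                        "rejected": len(rejected)})
    return wh


wh = run_etl(Warehouse(), {
    "SIS": [{"id": "s1", "subject": "SIS"}],
    "FINANCE": [{"id": "f1", "subject": " Finance "}, {"id": "", "subject": "Finance"}],
})
print(wh.load_log[-1])     # row counts for this run
print(sorted(wh.marts))    # data marts that received rows
```

Rejected rows are counted rather than silently dropped. The load log is a small example of the operational metadata that the Data Management layer captures.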
KIDS Phase II Project
District/ESD Server Deployment & Data Warehouse Integration Architecture
Figure: district data warehouses (Beaverton, Hillsboro, Portland, Eugene) and an ESDs DW, each shown with transaction systems and operational data stores (ODS), linked to the ODE (State) DW (physical & virtual).
Legend:
DW = Data Warehouse
ODS = Operational Data Store
Districts Record Exchange
KIDS Work = ODE
Project Planning Methodology: “The How?” for the Project Team
Step 1: Define the Work Breakdown Structure
The first step is to create a comprehensive Work Breakdown Structure (WBS). The WBS lists all the phases, activities, and tasks required to undertake the project. Identify and describe each phase, activity, and task required to complete the project successfully. Depict the order in which the tasks must be undertaken, and identify any key internal and external project dependencies. Also list the critical project milestones, such as the completion of key project deliverables.
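A WBS is a tree: phases contain activities, and activities contain tasks. The minimal Python sketch below represents one with hypothetical task names. It lists the milestones and flags dependencies that do not name a task in the WBS, which usually point to a gap or to an external dependency. The structure and names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    depends_on: list = field(default_factory=list)  # names of prerequisite tasks
    milestone: bool = False                          # completes a key deliverable


# Hypothetical WBS: phase -> activity -> ordered list of tasks
WBS = {
    "Design": {
        "Data model": [Task("Design schemas"),
                       Task("Review schemas", ["Design schemas"], milestone=True)],
    },
    "Build": {
        "ETL": [Task("Configure ETL", ["Review schemas"])],
        "Reporting": [Task("Build OLAP cubes", ["Configure ETL"], milestone=True)],
    },
}


def flatten(wbs: dict) -> list:
    """List (phase, activity, task) triples in the order they appear in the WBS."""
    return [(phase, activity, task)
            for phase, activities in wbs.items()
            for activity, tasks in activities.items()
            for task in tasks]


def undefined_dependencies(wbs: dict) -> list:
    """Dependencies that name no task in the WBS (likely gaps or external dependencies)."""
    names = {task.name for _, _, task in flatten(wbs)}
    return [d for _, _, task in flatten(wbs) for d in task.depends_on if d not in names]


def milestones(wbs: dict) -> list:
    """Names of tasks flagged as milestones, in WBS order."""
    return [task.name for _, _, task in flatten(wbs) if task.milestone]


print(milestones(WBS))               # ['Review schemas', 'Build OLAP cubes']
print(undefined_dependencies(WBS))   # []
```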
Step 2: Identify the Required Resources
Having listed all of the tasks required to undertake the project, you now need to identify the generic resources required to complete each task. Examples of resource types include full-time and part-time staff, contractors, equipment, and materials. For each resource type, identify the quantity required, the delivery dates, and the project tasks in the WBS that the resource will help complete.
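As one way to record Step 2, this minimal Python sketch keeps one entry per resource type with its quantity, delivery date, and the WBS tasks it supports. It then flags tasks that no resource covers. The task names, resource kinds, and dates are hypothetical examples, not figures from the KIDS plan.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Resource:
    kind: str        # e.g. full-time staff, contractor, equipment
    quantity: int
    delivery: date   # date the resource must be available (hypothetical)
    tasks: tuple     # names of the WBS tasks this resource helps complete


# Hypothetical task list and resource plan
TASKS = ["Design schemas", "Configure ETL", "Build OLAP cubes"]
PLAN = [
    Resource("Full-time staff: data modeler", 1, date(2006, 2, 1), ("Design schemas",)),
    Resource("Contractor: ETL developer", 2, date(2006, 3, 1), ("Configure ETL",)),
    Resource("Equipment: database server", 1, date(2006, 2, 15),
             ("Configure ETL", "Build OLAP cubes")),
]


def unresourced(tasks: list, plan: list) -> list:
    """Tasks that no resource in the plan is assigned to."""
    covered = {t for r in plan for t in r.tasks}
    return [t for t in tasks if t not in covered]


def resources_for(task: str, plan: list) -> list:
    """Resource kinds assigned to one task."""
    return [r.kind for r in plan if task in r.tasks]


print(unresourced(TASKS + ["Train users"], PLAN))   # ['Train users']
print(resources_for("Configure ETL", PLAN))
```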
Step 3: Construct a Project Schedule
To construct your schedule, you need to:
• List the phases, activities, and tasks
• Sequence the phases, activities, and tasks
• Add key internal and external dependencies
• Allocate relevant completion timeframes
• Add contingency to mitigate risk
• Assign the resources required to complete tasks
• List critical delivery milestones
• Specify any assumptions and constraints
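Sequencing tasks, adding dependencies, allocating timeframes, and padding for contingency can be illustrated with a forward pass over a dependency graph, which is the core of the critical path method. This minimal Python sketch assumes hypothetical tasks and durations in working days. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to order tasks so that prerequisites come first.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: name -> (duration in working days, prerequisite task names)
TASKS = {
    "Validate requirements": (10, []),
    "Design data model": (15, ["Validate requirements"]),
    "Select vendor": (20, ["Validate requirements"]),
    "Configure ETL": (25, ["Design data model", "Select vendor"]),
    "Build reports": (15, ["Configure ETL"]),
}


def schedule(tasks: dict, contingency: float = 0.0) -> dict:
    """Return {task: (earliest_start, earliest_finish)} in days from project start.
    Every duration is padded by the `contingency` fraction to absorb risk."""
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    times = {}
    for name in TopologicalSorter(graph).static_order():  # prerequisites come first
        duration, deps = tasks[name]
        start = max((times[d][1] for d in deps), default=0)
        times[name] = (start, start + duration * (1 + contingency))
    return times


def critical_path(tasks: dict, times: dict) -> list:
    """Walk back from the last-finishing task through its latest-finishing prerequisite."""
    name = max(times, key=lambda n: times[n][1])
    path = [name]
    while tasks[name][1]:
        name = max(tasks[name][1], key=lambda d: times[d][1])
        path.append(name)
    return path[::-1]


plan = schedule(TASKS)
print(plan["Build reports"])        # (55.0, 70.0)
print(critical_path(TASKS, plan))
```

Tasks on the critical path have no slack. A delay in any of them moves the final milestone, which makes them the natural place to add contingency.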
Current Data Environment
• Lack of granular, integrated, accurate, standardized, and timely data on student performance and achievement, both for individual students and for specific student subgroups.
• Lack of data and information integration between instructional management sources and assessment sources aligned to the state’s educational standards. Other affected factors include attendance records, discipline, teacher qualifications, class size, and instructional hours.
• Lack of close collaboration between the districts and ODE in tracking students as they move through the educational system, both vertically and horizontally, in order to improve performance by identifying actionable indicators.
• Lack of specific multi-year student performance information to support longitudinal analysis, accessible at the state, district, and school levels with appropriate controls to assure confidentiality.
• Need for data and tool standardization across all reporting districts to ensure accurate, consistent, and useful analytical input for decision making.
• No single version of the truth exists for business rules and data definitions across the various data sources.
• Lack of easily validated financial information that accurately reports budgeted vs. actual expenditures by program and allows these expenditures to be correlated with student performance.
• Lack of appropriately controlled online access to information on student progress and school quality for all stakeholders.
Problems with Current Decision Support
• Data redundancy and process redundancy
• Data is not integrated and cannot be shared
• Data is not understood or is misunderstood
• Inconsistent data definitions & business rules
• Data retrieval is difficult and time consuming
• Operational files may not contain history
• Reports are inconsistent in content & format
• Data is too dirty for business analysis
• Multiple versions of the truth
• Reports and associated BI tools are not standardized
Recommended Model for Enterprise Data Warehouse System
KIDS Phase II Project
Figure: a layered model consisting of an E-Portal, a Data Warehouse, an Operational Data Store (ODS), and a Transactional Database, supporting educational stakeholder communication, benchmarking/decision support, district & state reporting, and day-to-day operations.
PK-12 Data Model for Information Management (ODE & IBM Confidential) 10/20/05
Standards
Rules and protocols to be followed by all users and developers for all applications:
• Governance (setting priorities)
• Data naming, aliases, abbreviations list
• Meta data capture and maintenance
• Data quality and data management
• Testing standards
• Security standards
• Measuring results (benefits, costs, usage)
• Service level agreements
DW roles & responsibilities
• Business User (Client)
• Business User Support
• Data Administrator
• Data Analysts
• Meta Data Administrator
• Database Administrator
• Developers
– ETL
– BI – Reports, Queries
DW roles & responsibilities (continued)
• Security Officer
• Auditor
• Data Warehouse Project Manager
• Technical Services
• DW Architect
• Technical Advisory Board
• Steering Committee
Information quality
• Data is accurate
• Data is consistent
• Data is timely
• Data is integrated
• Data is complete
• Data values follow the business rules
• Data corresponds to valid values
• Data is well understood
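Several of these quality criteria can be checked automatically, record by record. This minimal Python sketch covers completeness, valid values, a business rule, and timeliness. The required fields, the grade domain (K to 12), the "no future enrollment date" rule, and the 30-day freshness window are all hypothetical choices made for illustration.

```python
from datetime import date

VALID_GRADES = {"K"} | {str(g) for g in range(1, 13)}   # hypothetical valid-value domain
REQUIRED = ("student_id", "grade", "enroll_date")        # hypothetical required fields


def quality_issues(row: dict, as_of: date, max_age_days: int = 30) -> list:
    """Return a list of quality problems found in one student record."""
    issues = []
    # Complete: every required field is present and non-empty
    missing = [f for f in REQUIRED if not row.get(f)]
    if missing:
        issues.append("incomplete: " + ", ".join(missing))
    # Corresponds to valid values
    grade = row.get("grade")
    if grade and grade not in VALID_GRADES:
        issues.append(f"invalid grade: {grade}")
    # Follows the business rules (hypothetical rule: no future enrollment dates)
    enrolled = row.get("enroll_date")
    if enrolled and enrolled > as_of:
        issues.append("rule violated: enroll_date after as-of date")
    # Timely: refreshed within the allowed window
    updated = row.get("last_updated")
    if updated and (as_of - updated).days > max_age_days:
        issues.append("stale")
    return issues


as_of = date(2006, 1, 31)
print(quality_issues({"student_id": "1001", "grade": "5",
                      "enroll_date": date(2005, 9, 6),
                      "last_updated": date(2006, 1, 20)}, as_of))   # []
print(quality_issues({"student_id": "", "grade": "14",
                      "enroll_date": date(2006, 3, 1),
                      "last_updated": date(2005, 11, 1)}, as_of))
```

Criteria such as "accurate" and "well understood" usually need comparison against a trusted source or good metadata. A simple rule check like this one cannot establish them.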
Data Warehouse/ODS & BI Layers
Data Integration Layer & Operational Data Store (ODS):
• The ODS is a non-queryable, centralized staging area for storing extracted, cleansed, and transformed data, and for gathering centralized metadata to implement an Enterprise Data Mart Architecture (EDMA). This eliminates the need for a second non-queryable staging area called the data warehouse.
• What is needed is a dimensionally modeled data warehouse for enterprise decision support systems (DSS), designed to deliver the best query response performance and to support the most advanced OLAP functionality.
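A dimensionally modeled warehouse is commonly organized as a star schema. Narrow fact rows hold measures plus surrogate keys, and dimension tables hold the descriptive attributes used for grouping. This minimal Python sketch uses in-memory structures. The school and year dimensions, the assessment measures, and the district labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """A dimension table: assigns an integer surrogate key to each natural key."""
    rows: dict = field(default_factory=dict)  # natural key -> (surrogate key, attributes)

    def key_for(self, natural_key, **attrs) -> int:
        if natural_key not in self.rows:
            self.rows[natural_key] = (len(self.rows) + 1, attrs)
        return self.rows[natural_key][0]

    def attributes(self, surrogate_key: int) -> dict:
        for key, attrs in self.rows.values():
            if key == surrogate_key:
                return attrs
        raise KeyError(surrogate_key)


school_dim = Dimension()
year_dim = Dimension()
# Fact table: (school_key, year_key, students_tested, students_meeting_standard)
fact_assessment = []


def load_fact(school_id, district, year, tested, meeting):
    """Resolve natural keys to surrogate keys, then store only keys and measures."""
    fact_assessment.append((school_dim.key_for(school_id, district=district),
                            year_dim.key_for(year), tested, meeting))


def pct_meeting_by_district() -> dict:
    """Join facts to the school dimension and aggregate by a dimension attribute."""
    totals = {}
    for school_key, _year_key, tested, meeting in fact_assessment:
        district = school_dim.attributes(school_key)["district"]
        t, m = totals.get(district, (0, 0))
        totals[district] = (t + tested, m + meeting)
    return {d: round(100 * m / t, 1) for d, (t, m) in totals.items()}


load_fact("S1", "District A", 2005, 100, 80)
load_fact("S2", "District A", 2005, 50, 35)
load_fact("S3", "District B", 2005, 200, 150)
print(pct_meeting_by_district())   # {'District A': 76.7, 'District B': 75.0}
```

Summing the counts before dividing, rather than averaging per-school percentages, keeps the district figure correctly weighted by the number of students tested.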
Meta data components
» Data name (entity, attribute, table, column, field, etc.)
» Business description of data (e.g., Project Start Date)
» Source of data (file, field)
» Business Owner of data
» Business rules
» Transformation rules
» Domains (allowable values)
» Data relationships
Meta data components (continued)
» Data quality (measure of reliability)
» Timeliness (e.g., current as of a certain date)
» Historical information
» Aggregation rules
» Security (who has access)
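One way to hold the metadata components listed on the two preceding slides is a single repository entry per data element. This minimal Python sketch maps each component to one field. The example element, its source file, the owner, and the role names are hypothetical. The two helper methods show how a domain and an access list might be used at query time.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class MetadataEntry:
    data_name: str               # entity, attribute, table, column, field, etc.
    business_description: str
    source: str                  # source file and field
    business_owner: str
    business_rules: list = field(default_factory=list)
    transformation_rules: list = field(default_factory=list)
    domain: set = field(default_factory=set)         # allowable values (empty = unrestricted)
    relationships: list = field(default_factory=list)
    quality_score: float = 1.0                        # measure of reliability, 0..1
    current_as_of: Optional[date] = None              # timeliness
    keeps_history: bool = True                        # historical information
    aggregation_rule: str = "sum"                     # how the value rolls up
    access_roles: set = field(default_factory=set)   # security: who has access

    def allows(self, value) -> bool:
        """True if the value falls in the domain (or no domain is defined)."""
        return not self.domain or value in self.domain

    def can_read(self, role: str) -> bool:
        """True if the role is on this element's access list."""
        return role in self.access_roles


grade = MetadataEntry(
    data_name="STUDENT.GRADE_LEVEL",
    business_description="Grade level in which the student is enrolled",
    source="sis_extract.csv: grade",
    business_owner="Hypothetical student-records owner",
    domain={"K"} | {str(g) for g in range(1, 13)},
    aggregation_rule="count",
    access_roles={"Data Analyst", "Data Administrator"},
)
print(grade.allows("5"), grade.allows("14"), grade.can_read("Web Developer"))
```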
Meta data management
Meta Data = [Descriptive] Data About Data [of the Business]
• Meta data administration
• Business meta data
• Technical meta data
• ETL reconciliation
• Data quality metrics
• Standardization
• Data ownership
• Enterprise integration
Information = Data within context
Context = Meta data
Information = Data + Meta data
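ETL reconciliation, one of the metadata management tasks above, usually means proving that every extracted row was either loaded or rejected and that a control total balances. This minimal Python sketch assumes rows are dictionaries with a hypothetical integer field `amount_cents`. Integer amounts avoid floating-point rounding when totals are compared.

```python
def reconcile(source_rows, loaded_rows, rejected_rows, amount_field):
    """Check that every extracted row was either loaded or rejected and that the
    control total (sum of a numeric field) balances. Use integer amounts, such as
    cents, so the totals compare exactly."""
    def total(rows):
        return sum(r[amount_field] for r in rows)

    unaccounted = len(source_rows) - len(loaded_rows) - len(rejected_rows)
    diff = total(source_rows) - total(loaded_rows) - total(rejected_rows)
    return {
        "source_count": len(source_rows),
        "loaded_count": len(loaded_rows),
        "rejected_count": len(rejected_rows),
        "unaccounted_count": unaccounted,
        "control_total_diff": diff,
        "balanced": unaccounted == 0 and diff == 0,
    }


source = [{"amount_cents": 125000}, {"amount_cents": 4999}, {"amount_cents": 100}]
loaded = source[:2]
print(reconcile(source, loaded, source[2:], "amount_cents")["balanced"])  # True
print(reconcile(source, loaded, [], "amount_cents"))  # one row and 100 cents unaccounted
```

Storing these results per load run turns reconciliation into a data quality metric that can be tracked over time.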
KIDS Phase II Project
05-07 Workflow
1. Review and validate the business case for the Phase II project
• Identify key stakeholders.
• Validate requirements, deliverables, and expectations.
• Identify two or three key districts with viable data warehouse infrastructure as test sites.
2. Review the enterprise architecture and identify infrastructure components
• Transaction-level applications
• Operating and database systems
• Hardware platforms
3. Design the Data Warehouse & Operational Data Store (ODS) data model
• Design dimensional modeling schemas
• Configure Extraction, Transformation, Load (ETL)
• Metadata capture (repository or data labels)
• Data quality profiling
• Vendor selection will be based on the results of a competitive “Bake-off” among three top vendors.
4. Data Reporting & Information Delivery Mechanism
• Leverage the existing Online Analytical Processing (OLAP) reporting infrastructure.
• Design and implement subject-area data marts for effective horizontal reporting integration.
• Develop OLAP cubes for report aggregation and slice-and-dice querying.
• Deploy an enterprise portal with built-in security and user access authentication.
5. End-user Training
• Design and schedule end-user training at all levels of data and reporting needs.
• Identify “Train-the-Trainer” candidates from each school district for more detailed training.
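The OLAP cube operations named in step 4 (aggregation, slicing, dicing) can be illustrated over a handful of fact rows. This is a minimal Python sketch. The dimensions (district, year, grade), the measure `tested`, and all values are hypothetical. An OLAP server would precompute and index these aggregates rather than scan rows. The function is named `slice_` to avoid shadowing Python's built-in `slice`.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (district, year, grade) and one measure
FACTS = [
    {"district": "A", "year": 2005, "grade": 3, "tested": 120},
    {"district": "A", "year": 2005, "grade": 5, "tested": 110},
    {"district": "A", "year": 2006, "grade": 3, "tested": 125},
    {"district": "B", "year": 2005, "grade": 3, "tested": 60},
    {"district": "B", "year": 2006, "grade": 5, "tested": 55},
]


def rollup(facts, dims, measure):
    """Aggregate the measure over the chosen dimensions (a cube roll-up)."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in facts if r[dim] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose values fall in the allowed set for each named dimension."""
    return [r for r in facts if all(r[d] in vals for d, vals in allowed.items())]


print(rollup(FACTS, ["district"], "tested"))                          # totals by district
print(rollup(slice_(FACTS, "year", 2005), ["district"], "tested"))     # 2005 only
print(rollup(dice(FACTS, district={"A"}, grade={3}), ["year"], "tested"))
```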
KIDS Phase II Project
Key System Deliverables
1. Transactional Systems/District Data Warehouse
• Integrated systems that support instructional management, empowering teachers to combine student performance and instructional data to make informed classroom decisions.
• Districts will be better able to meet NCLB and state standards for having “highly qualified” teachers in the classroom.
2. Integrated & Interoperable Operational Data Store (ODS)
• Provides the ability to evaluate student performance within selected programs across schools and districts in the state, and highlights ways to achieve Adequate Yearly Progress (AYP) consistently over time.
• Allows quick turnaround in transferring student records when students move between districts.
• Eases the reporting burden on districts, eliminates redundant and possibly inaccurate data reporting, and provides a better foundation for integrated data analysis.
3. Data Warehouse & Decision Support System/Tools
• A repository for state and district reporting and analysis, down to student-level data.
• Allows legislators and policy makers to ask and answer more meaningful system-level questions.
• Greater system accessibility for all users with the relevant security privileges will mean greater acceptance and use, and ultimately better decisions.
4. System-Wide Communication Portal
• Provides a focused location for access to information, analytical/reporting tools, and the necessary training and support, and maximizes effective system-wide use of the state’s data warehouse.
• Rapid and wide dissemination of integrated and proven instructional and administrative practices.
KIDS Phase II Project
High-level Project Work Plan, Time-line, & Resource Requirements
1. Requirements Validation
   Time-line: January 10, 2006
   Resource requirements: Mojo & Gary; scheduled trips to all Districts & ESDs
2. Inauguration of Governance & Project Team committee members
   Time-line: January 30, 2006
   Resource requirements: Doug Kosty & Mojo Nwokoma
3. “Test Site” DW/ODS Modeling, integration, ETL, Data Quality, Meta Data Repository, and Vendor “Bake-off” contracting
   Time-line: June 30, 2006
   Resource requirements: Database Administrator; Data Modeler/Analysts; Data Quality Analysts; Business User (Client); Meta Data Administrator; ETL & BI developers; Data Warehouse Project Manager; End-user Business Analyst
4. OLAP & Portal Development, including Training, and vendor “Bake-off” Contracting
   Time-line: October 30, 2006
   Resource requirements: Web Developer; Portal Dashboard developer; Business User (Client); BI OLAP Report Developer; “Train-the-trainer”; Data Security Officer
BI - OLAP Data Warehouse Architecture
User expectations
Expectations must be managed in terms of:
• Schedule
• Budget
• Scope
• Performance
• Availability
• Simplicity (ease of use)
• Tool functionality
• Data cleanliness
• Users’ roles and responsibilities
User responsibilities
• Be a full-time member of the Core Team
• Participate in all data modeling sessions
• Co-manage the DW project
• Make decisions and escalate disputes to the Steering Committee
• Provide meta data for business objects
• Identify data security requirements
• Participate in BI tool selection
• Participate in all review sessions
• Participate in all testing activities
IT Staffing
• DW roles & responsibilities
• Dedicated IT team
• New skill set – beyond tools, discipline
• Contractors & consultants
• Knowledge transfer
• Training
– Just in time
– Just enough
Knowledge transfer through collaboration
Risks to be mitigated
• Low management commitment
• Low user commitment
• Unrealistic schedule
• Unrealistic user expectations
• Budget too small
• Untrained or unavailable staff
Risks to be mitigated (continued)
• Unclear or changing requirements
• Poor project management
• Creeping scope
• Initial project too large
• Wrong project
• Changing priorities
• Data cleansing not addressed early
• Vendors out of control
Risks to be mitigated (continued)
• Not architected properly (wrong design)
• Inappropriate organization structure
• Lost or changed sponsor
• New technology not understood
• No procedure to resolve disputes
• Exceeding platform capabilities