Data Mart/Warehouse: Progress and Vision Institutional Research and Planning University Information Systems 1/2/2007

advertisement
Data Mart/Warehouse: Progress and Vision
Institutional Research and Planning
University Information Systems
1/2/2007
What is data warehousing?
A data warehouse:
ƒ is a single place that contains complete,
accurate data from multiple sources, making
data analysis easier.
ƒ can streamline complex administrative
functions and support the decision-making
process.
ƒ is specifically designed for querying and
reporting.
ƒ is separate from operational/legacy systems
(e.g. SIS).
1/2/2007
How data warehousing works
ƒ Information is collected from a variety of
different sources and formats.
ƒ The data is “cleaned” – transformed into a
common format.
ƒ The cleaned data is added to the
warehouse, and made available through
data marts.
ƒ Data is frozen in time for comparative and
forecasting purposes.
ƒ Users can then write queries or make
reports using the data warehouse, instead
of linking separate databases
1/2/2007
IRP Data Mart/Warehouse
SIS
(completed)
HRS data
IRP
data selection
and design
Data
Warehouse
UIS
FRS data
Software/hardware
support and tools
Administration
Provost
Reports
Queries
President
Deans
Chairs
1/2/2007
Use of data warehousing
ƒ Reporting
Consistency and standardization of data. Users only
have to learn how to use one database to write queries and
reports and display data in graphical form.
ƒ Internal – Enrollment, retention & graduation
rates, etc.
ƒ External – IPEDS, US News & World Report,
Princeton Review
1/2/2007
Data Driven Decision Making
ƒ Monitor data
ƒ Data warehousing helps monitor important
university processes.
ƒ Rapid queries
ƒ Users can generate ad-hoc reports “on the fly”
without needing to spend time writing and rewriting the code needed to extract data from
multiple sources
ƒ Trend Analysis
ƒ Administrators can easily analyze admissions and
enrollment trends using historical data.
ƒ Daily Decision Making
ƒ Increase the efficiency of the daily functioning of
departments and services
1/2/2007
Benefits of Data Warehouse
ƒ Competitive advantage
ƒ Administrators can respond to changing
conditions based on data.
ƒ Consistent public image
ƒ Reputation for quality and excellence
ƒ Enhance public image
ƒ Facilitates accreditation
ƒ By using the data warehouse to “drill down” into
the college level and department level, more
detailed reports can be generated for
accreditation
1/2/2007
IRP’s Data Mart
Current data
Databases to be added
ƒ SIS
ƒ Enrollment
ƒ
ƒ Degrees
ƒ Admissions
ƒ Graduation
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ Retention
ƒ
ƒ
ƒ
Research expenditures
(2 years)
Course distribution
(2 semesters w/ ability to go back
further)
Faculty file to 1995
(in progress)
Staff file
(in process of being cleaned)
Budget file
(in conceptual design stage)
Athletics
(in progress)
Faculty Load (in development)
Student services (in the future)
Housing (in the future)
1/2/2007
Data source analysis diagram
NJIT
Student Information System
(SIS)
Enrollment &
Degree Data
Other Data
Commission of
Higher Education
(CHE)
Official Data
Student Information
Data Warehouse
(SIDW)
1/2/2007
Architecture model
Application (Distributed)
Source Systems
Security Service
- Log on Administration
NJIT-SIS
Focus Program
- Enrollment Files
- Degree Files
Connx Program
- Other Files
- Authentication
- Authorization
CHE
(Government)
ROLAP Service
- Query Management
- Data Analyze
- Aggregate Navigate
Source Models
Target Models
Source
Target
Mappings & Cleansing
Staging Files
- Dimension sets
- Facts
E-Mail
E-Mail
Data Staging Area
T3 connection
SURE File
Back Room
End User Application Client
Advanced Analysis Client
Front Room
Base Level
Presentation Repository
Metadata Catalog
Source Models
Report Library
Target Models
Code Definitions
Official Data Definitions
Data Staging Services
- Load
- Transformation
- Cleansing
- Export
- Metadata exchange
Data Staging
Administrator Client
Cleaned Data
Bulk
Loader
Data Warehouse
Backup /
Replicate
Database
Administrator Client
1/2/2007
Appendix
Current IRP Data Mart
Screens
1/2/2007
Screenshot: Predefined query
1/2/2007
Screenshot: Enrollment data query
1/2/2007
Screenshot: Enrollment cross-tab
1/2/2007
Screenshot: Admissions data query
1/2/2007
Screenshot: Admissions cross-tab
1/2/2007
Screenshot: Graduation rate query
1/2/2007
Screenshot: Graduation rate cross-tab
1/2/2007
Screenshot: Retention rate query
1/2/2007
Screenshot: Retention rate cross-tab
1/2/2007
Screenshot: Student life query
1/2/2007
Screenshot: Miscellaneous queries
1/2/2007
Components of Data Warehouse
1/2/2007
Extract, Transform and Load (ETL)
1/2/2007
DW Summary: Characteristics
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
ƒ
Subject-orientation
integrated
non-volatile (i.e. not updated)
time variant (kept for long periods, for
forecasting and trend analysis)
summarized
large volume
not normalized
metadata
data sources
1/2/2007
Data Warehouse Process
1/2/2007
Data Warehouse Tools
ƒ Intelligent Agents and Agencies - tools work
and think for user.
ƒ Query Facilities and Managed Query
environments.
ƒ Statistical Analysis - One of the biggest
surprises in the data warehousing
marketplace is the resurgence of interest in
traditional statistical analysis, and the
concomitant resurrection of the popularity to
products like SAS and SPSS.
1/2/2007
Data Warehouse Tools
ƒ Data Discovery ƒ A large class of tools formerly classified as
decision support, artificial intelligence and expert
systems. They now make use of neural networks,
fuzzy logic, decision trees, and other tools from
advanced mathematics to allow a user to “sift”
through massive amounts of raw data to
“discover” new, interesting, insightful, and in many
cases useful things about the organization, its
operations, and its markets.
ƒ There are many different data discovery
tools/products on the market.
1/2/2007
Data Warehouse Tools
ƒ OLAP - On-line Analytical Processing:
ƒ often uses multi-dimensional spreadsheet tools
allowing users to look at information from many
different angles.
ƒ Users are able to “slice and dice” reports and to
look at the same kinds of information at different
levels at the same time.
ƒ Typical OLAP application might allow a product
manager to view sales figures for a given product
at the national level, see them broken down by
division, drill down to see territories within a
division, check sales numbers for each store within
a territory, and then compare them against sales of
stores from another territory.
1/2/2007
Data Warehouse Tools
Data Mining
Provides for:
ƒ Knowledge discovery in databases
ƒ Knowledge extraction
ƒ Data archeology
ƒ Data exploration
ƒ Data pattern processing
ƒ Data dredging
ƒ Information harvesting
1/2/2007
Data Mart Timeline:
Month1
Month2
Month3
Month4
Month5
Month6
Project Planning/Management/Policy
Development Environment
Software Installation
Data Inventory & Quality Assessment
Database Design
Application Design
Data Scrubbing & Loading
Application Development
Testing
Documentation
Rollout Complete Project
1/2/2007
Download