Data Warehouse

Yong Shi
CSE DEPARTMENT
Strategic delivery of information
• The current Situation
The never-ending quest to access any
information, anywhere, anytime.
• The problem
Data is scattered in many types of
incompatible structures.
Analytical processing requirements
• Four levels of analytical processing:
1. Simple queries and reports
2. The ability to do “what if” processing
3. Step back and analyze what has previously
occurred to bring about the current state of the data
4. Analyze what has happened in the past and what
needs to be done in the future for a specific change
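As a rough illustration of the first two levels, the sketch below contrasts a simple query (level 1) with a "what if" calculation (level 2). The sales rows, field names, and the price-change scenario are assumptions made up for this example; they are not taken from the slides.

```python
# Hypothetical sales rows; regions, units, and prices are illustrative only.
SALES = [
    {"region": "East", "units": 120, "price": 10.0},
    {"region": "West", "units": 80, "price": 12.0},
    {"region": "East", "units": 50, "price": 10.0},
]


def revenue_by_region(rows):
    """Level 1: a simple report, total revenue per region."""
    totals = {}
    for row in rows:
        revenue = row["units"] * row["price"]
        totals[row["region"]] = totals.get(row["region"], 0.0) + revenue
    return totals


def what_if_price_change(rows, pct):
    """Level 2: re-run the same report under an assumed price change.

    pct is a fraction, e.g. 0.5 means prices rise by 50%.
    The original rows are left unchanged.
    """
    adjusted = [{**row, "price": row["price"] * (1 + pct)} for row in rows]
    return revenue_by_region(adjusted)


if __name__ == "__main__":
    print(revenue_by_region(SALES))
    print(what_if_price_change(SALES, 0.5))
```

Levels 3 and 4 would need historical data over time, which this small sketch does not attempt to model.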
Information data superstore (IDSS)
• Definition:
The architecture needed to support the far-ranging requirements of the four levels of analysis.
• Also called super data warehouse
• Data warehouses are not an end in
themselves but merely a step on the path to
the information data superstore
Why a separate environment
is needed
• The use of operational systems vs. the data
warehouse
• The data’s characteristics
• The type of access
A strategy for building a data
warehouse
• Need indicators
• Action steps
• Three-stage data warehousing process:
model (understand) → build (establish) → deploy (implement)
Organizational and cultural issues
• Cultural imperatives
• Success criteria
•Satisfy users’ requirements
•Make a significant contribution to the
success of the business
•The users accept and actively use it
•The benefits are not exceeded by the costs
•An adequate budget must be in place
Organizational and cultural issues
• Success criteria (continued)
•The implementation of the data warehouse
must not cause other problems that overshadow
the benefits
•A reasonable schedule must be established
Organizational and cultural issues
• End user (client)
• Strategic architecture
• User liaison
• End-user support
• Data analyst
• Security office
• Data administration
Organizational and cultural issues
• Database administration
• Choosing the initial data and department
• Establishing an infrastructure
• Training users
• Change in the power structure
End Users
• A crucial part of the project
• Gathering requirements and managing
expectations
• Cost justification process
• Design reviews
• User perspective
• User training
A technical architecture for DW
(Architecture diagram; components shown:)
• Source data and external data
• Data Acquisition Component
• Design Component
• Data Manager Component
• Data Delivery Component
• Information Directory Component
• Middleware Component
• Data Access Component
• Management Component
• Warehouse data
Data Quality
• Why is data quality important?
Data quality is a critical issue.
Poor data quality limits the ability of end users to make
informed decisions.
It has a profound effect on the image of the
enterprise.
It makes it difficult to make
major changes in an organization.
Data Quality
• What is data quality?
•The data is accurate
•The data is stored according to data type
•The data has integrity
•The data is consistent
•The databases are well designed
Data Quality
•The data is not redundant
•The data follows business rules
•The data corresponds to established domains
•The data is timely
•The data is well understood
Data Quality
•The data satisfies the needs of the business
•The user is satisfied with the quality of the data and
the information derived from that data
•There are no duplicate records
•Data anomalies
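Three of the criteria listed above (no duplicate records, values that correspond to established domains, and data stored according to its data type) can be checked mechanically. The following is a minimal sketch; the record layout, the field names, and the set of valid state codes are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical customer records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "state": "NY", "age": 34},
    {"id": 2, "state": "ZZ", "age": 51},     # "ZZ" is outside the assumed domain
    {"id": 3, "state": "CA", "age": "n/a"},  # age is not stored as an integer
    {"id": 1, "state": "NY", "age": 34},     # duplicate of id 1
]

VALID_STATES = {"NY", "CA", "TX"}  # assumed established domain for "state"


def find_quality_issues(records):
    """Return (record id, issue) pairs for three simple quality checks."""
    issues = []
    id_counts = Counter(r["id"] for r in records)
    reported_duplicates = set()
    for r in records:
        # Report each duplicated id once, at its first occurrence.
        if id_counts[r["id"]] > 1 and r["id"] not in reported_duplicates:
            issues.append((r["id"], "duplicate record"))
            reported_duplicates.add(r["id"])
        if r["state"] not in VALID_STATES:
            issues.append((r["id"], "value outside domain"))
        if not isinstance(r["age"], int):
            issues.append((r["id"], "wrong data type"))
    return issues


if __name__ == "__main__":
    for record_id, issue in find_quality_issues(RECORDS):
        print(record_id, issue)
```

Criteria such as accuracy, timeliness, and user satisfaction cannot be verified this way and need input from the business.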
Data Quality
• Assessment of existing data quality
•Programs that abnormally terminate with data
exceptions
•Clients who experience errors/anomalies
•Clients who do not know or are confused about what
the data actually means
•Data that cannot be shared due to lack of integration
Data Quality
• What data should be improved?
The energy should be spent on data where the
quality improvement will bring an important
benefit to the business.
We can ignore unimportant data and obsolete
data.
Another criterion: improve the data that can be
fixed and kept clean.
Data Quality
• Purification process
•Determine the importance of data quality to the organization
•Identify the enterprise’s most important data and evaluate the
quality.
•Determine users’ and owners’ perception of data quality.
•Prioritize which data to purify.
•Assemble and train a team to clean the data.
•Select tools to aid in the purification process, etc.
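The "prioritize which data to purify" step can be sketched as a simple ranking. The scoring rule used here (business importance multiplied by observed error rate, skipping data that cannot be fixed) is an assumption for illustration, not a method given in the slides; the dataset names and numbers are hypothetical.

```python
# Hypothetical inventory of datasets; all names and figures are illustrative.
DATASETS = [
    {"name": "customer", "importance": 5, "error_rate": 0.10, "fixable": True},
    {"name": "archive_1990", "importance": 1, "error_rate": 0.40, "fixable": True},
    {"name": "legacy_codes", "importance": 4, "error_rate": 0.30, "fixable": False},
    {"name": "orders", "importance": 4, "error_rate": 0.20, "fixable": True},
]


def prioritize(datasets):
    """Rank datasets for cleanup, highest expected benefit first.

    Assumed rule: drop data that cannot be fixed and kept clean, then
    sort the rest by importance * error_rate in descending order.
    """
    candidates = [d for d in datasets if d["fixable"]]
    return sorted(
        candidates,
        key=lambda d: d["importance"] * d["error_rate"],
        reverse=True,
    )


if __name__ == "__main__":
    for d in prioritize(DATASETS):
        print(d["name"])
```

In practice the importance scores would come from the users' and owners' perception of the data, as the earlier steps describe.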
Data Quality
• Data quality case
• Lesson 1: If those entering the data have a
stake in the data being incorrect, the data
will be incorrect.
• Lesson 2: Reports may show desired results,
but the reports may be highly inaccurate.
Directory/Catalog
• The challenge
Providing short-term benefit without
disabling broader long-term information
handling solutions.
Getting data into a warehouse is only
half of the process.
Security in the data warehouse
• Basic security concepts
• Physical security
• Stand-alone or shared security
• Remote access