File - Department of Information Technology-SRIT

DWM UNIT ii – 2 Marks
UNIT II
DATA WAREHOUSING
2 Marks
1) List out the functionalities of sourcing, transformation and clean up tools.
a. Removing unwanted data from operational databases
b. Converting to common data names and definitions
c. Calculating summaries and derived data
d. Establishing defaults for missing data
e. Accommodating source data definition changes
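The functions listed above can be illustrated with a minimal sketch. The field names, the list of unwanted fields and the default values below are all hypothetical, chosen only for illustration; a real clean-up tool would take them from its metadata.

```python
# Hypothetical mapping from source-specific column names to common warehouse names.
COMMON_NAMES = {"cust_nm": "customer_name", "qty": "quantity", "unit_prc": "unit_price"}
# Hypothetical operational-only fields that the warehouse does not need.
UNWANTED = {"lock_flag", "last_screen_id"}
# Hypothetical defaults applied when a value is missing.
DEFAULTS = {"quantity": 1, "unit_price": 0.0}

def clean_record(source):
    # Remove unwanted operational data and convert to common data names.
    row = {COMMON_NAMES.get(k, k): v for k, v in source.items() if k not in UNWANTED}
    # Establish defaults for missing data.
    for field, default in DEFAULTS.items():
        if row.get(field) is None:
            row[field] = default
    # Calculate derived data.
    row["line_total"] = row["quantity"] * row["unit_price"]
    return row
```

For example, clean_record({"cust_nm": "Asha", "qty": None, "unit_prc": 2.5, "lock_flag": "Y"}) drops lock_flag, renames the remaining fields, fills in a quantity of 1 and adds a line_total of 2.5.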
2) Draw the overall Data warehouse architecture.
Pg No 116, Fig No 6.1
3) Define metadata and list out its types.
Metadata is data about data that describes the data warehouse. It is used for building,
maintaining, managing and using the data warehouse. Metadata can be classified into
two types. They are
a. Technical Metadata
b. Business Metadata
4) What are the documents included in technical metadata?
a. Information about data sources
b. Transformation description
c. Warehouse object and data structure definitions for data targets
d. Rules used to perform data cleanup & data enhancement
e. Data mapping operations when capturing data from source systems &
applying it to the target warehouse database
f. Access authorization, backup history, archive history, information delivery
history, data acquisition history, data access, etc.
5) List out the issues of sourcing, transformation and clean up tools.
1) Database heterogeneity: DBMSs differ widely in data models, data access
language, data navigation, operations, concurrency, integrity, recovery, etc.
2) Data heterogeneity: differences in the way data is defined and used in different
models, such as homonyms, synonyms, unit incompatibility, different attributes for
the same entity, and different ways of modelling the same fact.
6) What are the documents included in business metadata?
a. Subject areas and information object types, including queries, reports, images,
video and/or audio clips
b. Internet home pages
c. Other information to support all data warehousing components
d. Data warehouse operational information.
7) List out the requirements of metadata.
i) Should be a gateway to the data warehouse environment
ii) Should support an easy distribution & replication of its content for high
performance & availability
iii) Should be searchable by business oriented key words
iv) Should support the sharing of information objects such as queries, reports,
data collection and subscription between users.
8) What are the tools in Data warehouse architecture?
1. Data query & reporting tools
2. Application development tools
3. Executive information system (EIS) tools
4. On-line analytical processing tools
5. Data mining tools
9) Define Metalayer.
A metalayer is software inserted between users and the database to shield end users
from the complexities of SQL. It provides subject-oriented views of a database and
supports point-and-click creation of SQL. It is designed for easy-to-use,
point-and-click, visual navigation.
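As a rough illustration of the idea, the sketch below maps business-friendly names to physical columns and generates the SQL for the user. The table name, column names and the build_sql helper are hypothetical and do not describe any particular vendor tool.

```python
# Hypothetical subject-oriented view: business names mapped to physical columns.
SUBJECT_VIEW = {
    "table": "sales_fact",
    "fields": {"Region": "region_cd", "Revenue": "SUM(sale_amt)"},
}

def build_sql(view, dimensions, measures):
    # Translate the user's selections (as if clicked in a GUI) into SQL text.
    dims = [view["fields"][d] for d in dimensions]
    meas = [f'{view["fields"][m]} AS "{m}"' for m in measures]
    sql = f'SELECT {", ".join(dims + meas)} FROM {view["table"]}'
    if dims:
        sql += f' GROUP BY {", ".join(dims)}'
    return sql
```

A user who picks "Region" and "Revenue" never sees region_cd or sale_amt; build_sql(SUBJECT_VIEW, ["Region"], ["Revenue"]) produces the grouped SELECT statement on their behalf.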
10) Why is data mining used in organizations?
Most organizations engage in data mining to
1) Discover knowledge
2) Visualize data
3) Correct data
11) Define Data Visualization.
It is a method of presenting the output of all previously mentioned tools in
such a way that the entire problem and/or the solution is clearly visible to domain
experts and even to casual observers.
12) List out the problems of Data Marts.
1) Scalability: in situations where an initial small data mart grows quickly in
multiple dimensions
2) Data integration
13) Define Data Marts.
It is a data store that is subsidiary to a data warehouse of integrated data. Data mart is
directed at a partition of data that is created for the use of a dedicated group of users.
There are two types of data marts
1) Dependent data marts
2) Independent data marts
14) How are Data Warehouse administration and management done?
1) Security & priority management
2) Monitoring updates from multiple sources
3) Data quality checks
4) Managing and updating metadata
5) Replicating, subsetting and distributing data
6) Backup and recovery
15) Why is a data warehouse used by business users?
a) Decisions need to be made quickly and correctly using all available data.
b) Users are business domain experts, not computer professionals
c) Competition is heating up in the areas of business intelligence and added
information value.
d) The amount of data is doubling every 18 months, which affects response time.
16) What are the technology reasons for using Data Warehouse?
One technology reason for the existence of data warehousing is that the data
warehouse is designed to address the incompatibility of informational and operational
transactional systems.
17) Define the two approaches for building Data Warehouse.
Top down approach: Organization has developed an enterprise data model, collected
enterprise wide business requirements and decided to build an enterprise data
warehouse with subset data marts.
Bottom up approach: The business priorities resulted in developing individual data
marts, which are then integrated into enterprise data warehouse.
18) Define Holistic approach.
This approach is to consider all data warehouse components as parts of a single
complex system and take into account all possible data sources and all known usage
requirements. Failing to do this will result in a data warehouse design that is skewed
toward a particular business requirement, data source or a selected access tool.
19) Why is building a data warehouse a difficult task?
1) Data must be consolidated from multiple, often heterogeneous, sources into a query
database.
2) Heterogeneity of data sources affects data conversion, quality, and
timeliness.
3) Historical data is used, which implies that the data may be old.
4) Databases tend to grow very large.
20) Why is Data Warehouse said to be Business driven?
A data warehouse is business driven: it requires continuous interaction with end users
and is never finished, since both requirements and data sources change.
21) Write notes on Mainframe systems.
Mainframes are based on proven technology, have large data and throughput capacity, are
reliable, available and serviceable, and support legacy databases. However, they are not
open or flexible, and are not optimized for ad hoc queries.
22) List out the logical steps to build a Data Warehouse.
1) Collect and analyze business requirements
2) Create a data model and physical design for data warehouse
3) Define data sources
4) Choose the database technology & platform for the warehouse
5) Choose database access and reporting tools
6) Choose database connectivity software
7) Choose data analysis and presentation software
8) Update the data warehouse.
23) What are the examples for access types?
a) Simple tabular form reporting
b) Ranking
c) Multivariable analysis
d) Time series analysis
e) Complex textual search
f) Ad hoc user specified queries
24) Define Data Replication.
Many companies use data replication servers to copy their most needed data to a
separate database where decision support applications can access it. Replication
technology creates copies of databases on a periodic basis, so that data entry and data
analysis can be performed separately.
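A minimal sketch of the idea, using Python's built-in sqlite3 module and its Connection.backup method in place of a real replication server. The orders table and the in-memory databases are assumptions made only for illustration.

```python
import sqlite3

def replicate(operational, replica):
    # Copy the whole operational database into the separate replica database.
    operational.backup(replica)

# Operational database used for data entry (in-memory for this sketch).
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
operational.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (25.5,)])
operational.commit()

# Separate database used by decision support applications.
replica = sqlite3.connect(":memory:")
replicate(operational, replica)  # in practice this would run on a periodic schedule

# Analysis queries run against the replica, not the operational database.
total = replica.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

Rows entered into the operational database after the copy do not appear in the replica until the next replication run, which is what keeps data entry and data analysis separate.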
25) What are the benefits of Data Warehouse?
a) Locating the right information
b) Presentation of information
c) Testing of hypothesis
d) Discovery of information
e) Sharing the analysis.
26) Give examples for tangible benefits of data warehouse.
1) Product inventory turnover is improved
2) Costs of product introduction are decreased with improved selection of target
markets
3) More cost effective decision making is enabled by separating query processing
from operational databases.
27) Give examples for intangible benefits of data warehouse.
1) Improved productivity by keeping all required data in a single location and
eliminating the rekeying of data.
2) Reduced redundant processing, support and software to support overlapping
decision support applications.
28) Define Interquery and Intraquery.
a) Interquery parallelism: different server threads (or processes) handle multiple
requests at the same time.
b) Intraquery parallelism: a serial SQL query is decomposed into lower-level operations
such as scan, join, sort and aggregation, which are then executed in parallel.
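The interquery case can be sketched with a thread pool in which each worker handles one complete request. The SALES rows and the total_for function are hypothetical stand-ins for a table and a query; this is an illustration of the structure, not of how any particular DBMS schedules its work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table of (region, amount) rows.
SALES = [("north", 120), ("south", 80), ("north", 40), ("east", 60)]

def total_for(region):
    # One complete "query", executed serially inside a single worker thread.
    return sum(amount for r, amount in SALES if r == region)

requests = ["north", "south", "east"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # Interquery parallelism: each request is handled by a different worker.
    results = dict(zip(requests, pool.map(total_for, requests)))
```

Each query still runs serially on its own; only the set of queries is processed concurrently. Intraquery parallelism would instead split the work inside total_for.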
29) Define Horizontal Parallelism.
The database is partitioned across multiple disks, and parallel processing occurs within
a specific task (e.g., a table scan) that is performed concurrently on different
processors against different sets of data.
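A small sketch of this idea: the same scan task runs over each partition at once and the partial results are combined. The three in-memory lists stand in for partitions on separate disks, and all names are hypothetical. In standard CPython, threads show the structure of the technique rather than a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table split into three partitions (standing in for three disks).
PARTITIONS = [
    [("north", 120), ("south", 80)],
    [("north", 40), ("east", 60)],
    [("south", 30)],
]

def scan(partition, region):
    # The same task (a filtered table scan) applied to one partition.
    return sum(amount for r, amount in partition if r == region)

def parallel_scan(region):
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        # One worker per partition, all running the same scan.
        partials = pool.map(scan, PARTITIONS, [region] * len(PARTITIONS))
        # Combine the partial results into the final answer.
        return sum(partials)
```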
30) Define Vertical Parallelism.
It occurs among different tasks: all component query operations (e.g., scan, join, sort)
are executed in parallel in a pipelined fashion. In other words, the output of one task
(e.g., scan) becomes the input to another task (e.g., join) as soon as records become
available.
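The pipelined flow can be sketched with Python generators, where each stage passes a record on as soon as it has one. Generators interleave the stages on a single thread rather than running them on separate processors, so this shows only the data flow, under that stated simplification. The ORDERS and CUSTOMERS data are hypothetical.

```python
def scan(rows):
    for row in rows:
        yield row  # emit each record as soon as it is read

def join(orders, customers):
    for cust_id, amount in orders:  # consumes scan output one record at a time
        yield customers[cust_id], amount

def aggregate(joined):
    totals = {}
    for name, amount in joined:  # consumes join output one record at a time
        totals[name] = totals.get(name, 0) + amount
    return totals

ORDERS = [(1, 10), (2, 5), (1, 7)]
CUSTOMERS = {1: "Asha", 2: "Ravi"}
totals = aggregate(join(scan(ORDERS), CUSTOMERS))
```

The join stage never waits for the scan to finish; it processes the first order before the second one has been read.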
31) What is data partitioning?
Data partitioning is a key requirement for effective parallel executions of database
operations. It spreads data from database tables across multiple disks so that I/O
operations such as read and write can be performed in parallel. There are two ways in
which data partitioning can be done:
1) Random partitioning
2) Intelligent partitioning
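The two styles can be contrasted in a short sketch. It assumes that round-robin placement is one form of random partitioning and that hashing on a key is one form of intelligent partitioning; the function names and row layout are hypothetical.

```python
import zlib

def round_robin_partition(rows, n_disks):
    # Random-style placement: rows are spread evenly with no regard to content.
    disks = [[] for _ in range(n_disks)]
    for i, row in enumerate(rows):
        disks[i % n_disks].append(row)
    return disks

def hash_partition(rows, n_disks, key):
    # Intelligent-style placement: the DBMS can compute where a key value lives.
    disks = [[] for _ in range(n_disks)]
    for row in rows:
        slot = zlib.crc32(str(row[key]).encode()) % n_disks
        disks[slot].append(row)
    return disks
```

With round-robin placement a lookup for one customer must scan every disk, whereas with hash placement all rows for the same key value sit on a single disk that can be located directly.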
32) Draw the tool layout and integration points of metadata.
Pg No 211 Fig No 11.3
33) List out the requirements for parallel DBMS.
1) Support for function shipping
2) Parallel join strategies
3) Support for data repartitioning
4) Query compilation
5) Support for database transactions
12 Marks
1) With neat sketch explain the data warehouse architecture.
2) Explain in detail about implementation considerations in data warehouse.
3) Explain in detail about design considerations in data warehouse.
4) Explain in detail about database architectures for parallel processing.
5) Write about implementation examples of metadata repository.
6) Write short notes on
a. Metadata Interchange Initiative
b. Metadata Defined
7) Write short notes on
a. Metadata Repository
b. Metadata Management
8) Write short notes on
a. Tool requirements
b. Vendor Approach
c. Access to legacy data