DWM UNIT II – 2 Marks
UNIT II DATA WAREHOUSING
2 Marks

1) List out the functionalities of sourcing, transformation and clean-up tools.
a. Removing unwanted data from operational databases
b. Converting to common data names and definitions
c. Calculating summaries and derived data
d. Establishing defaults for missing data
e. Accommodating source data definition changes

2) Draw the overall data warehouse architecture.
Pg No 116, Fig No 6.1

3) Define metadata and list out its types.
Metadata is data about data that describes the data warehouse. It is used for building, maintaining, managing and using the data warehouse. Metadata can be classified into two types:
a. Technical metadata
b. Business metadata

4) What are the documents included in technical metadata?
a. Information about data sources
b. Transformation descriptions
c. Warehouse object and data structure definitions for data targets
d. Rules used to perform data cleanup and data enhancement
e. Data mapping operations when capturing data from source systems and applying it to the target warehouse database
f. Access authorization, backup history, archive history, information delivery history, data acquisition history, data access, etc.

5) List out the issues of sourcing, transformation and clean-up tools.
1) Database heterogeneity: DBMSs differ greatly in data models, data access language, data navigation, operations, concurrency, integrity, recovery, etc.
2) Data heterogeneity: differences in the way data is defined and used in different models, such as homonyms, synonyms, unit incompatibility, different attributes for the same entity, and different ways of modelling the same fact.

6) What are the documents included in business metadata?
a. Subject areas and information object types, including queries, reports, images, video and/or audio clips
b. Internet home pages
c. Other information to support all data warehousing components
d. Data warehouse operational information
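The clean-up functions listed in question 1 (removing unwanted data, converting to common data names, establishing defaults for missing data, calculating summaries) can be illustrated with a minimal Python sketch. The source records, column names, mapping table and default values below are all hypothetical, chosen only for illustration; in practice such mappings and transformation rules would be recorded in the technical metadata described in question 4 rather than hard-coded.

```python
# Hypothetical mapping from source-specific column names to common warehouse names
COMMON_NAMES = {"cust_no": "customer_id", "CUSTID": "customer_id",
                "amt": "amount", "AMOUNT": "amount",
                "rgn": "region", "REGION": "region"}

# Hypothetical defaults used when a source record has no value for a field
DEFAULTS = {"region": "UNKNOWN", "amount": 0.0}

# Fields the warehouse keeps; everything else is treated as unwanted data
KEEP = {"customer_id", "amount", "region"}


def clean(record):
    """Rename to common names, drop unwanted fields, fill in defaults."""
    renamed = {COMMON_NAMES.get(k, k): v for k, v in record.items()}
    kept = {k: v for k, v in renamed.items() if k in KEEP}
    for field, default in DEFAULTS.items():
        if kept.get(field) is None:  # missing key or explicit None
            kept[field] = default
    return kept


def summarize(records):
    """Derived data: total amount per region."""
    totals = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


# Two hypothetical operational sources with different naming conventions
source_a = [{"cust_no": 1, "amt": 120.0, "rgn": "North", "op_flag": "U"}]
source_b = [{"CUSTID": 2, "AMOUNT": 80.0, "REGION": None},
            {"CUSTID": 3, "AMOUNT": 50.0, "REGION": "North"}]

cleaned = [clean(r) for r in source_a + source_b]
print(summarize(cleaned))  # {'North': 170.0, 'UNKNOWN': 80.0}
```

The operational-only field `op_flag` is dropped, the missing region becomes `UNKNOWN`, and the summary is computed only after the names from both sources agree.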
7) List out the requirements of metadata.
i) It should be a gateway to the data warehouse environment.
ii) It should support easy distribution and replication of its content for high performance and availability.
iii) It should be searchable by business-oriented keywords.
iv) It should support the sharing of information objects such as queries, reports, data collections and subscriptions between users.

8) What are the tools in data warehouse architecture?
1. Data query and reporting tools
2. Application development tools
3. Executive information system (EIS) tools
4. On-line analytical processing (OLAP) tools
5. Data mining tools

9) Define metalayer.
A metalayer is software inserted between users and the database to shield end users from the complexities of SQL. It provides subject-oriented views of a database and supports point-and-click creation of SQL. It is designed for ease of use, with point-and-click and visual navigation operations.

10) Why is data mining used in organizations?
Most organizations engage in data mining to:
1) Discover knowledge
2) Visualize data
3) Correct data

11) Define data visualization.
Data visualization is a method of presenting the output of all previously mentioned tools in such a way that the entire problem and/or the solution is clearly visible to domain experts and even to casual observers.

12) List out the problems of data marts.
1) Scalability: in situations where an initially small data mart grows quickly in multiple dimensions
2) Data integration

13) Define data marts.
A data mart is a data store that is subsidiary to a data warehouse of integrated data. It is directed at a partition of data that is created for the use of a dedicated group of users. There are two types of data marts:
1) Dependent data marts
2) Independent data marts

14) How are data warehouse administration and management done?
1) Security and priority management
2) Monitoring updates from multiple sources
3) Data quality checks
4) Managing and updating metadata
5) Replicating, subsetting and distributing data
6) Backup and recovery

15) Why is a data warehouse used by business users?
a) Decisions need to be made quickly and correctly using all available data.
b) Users are business domain experts, not computer professionals.
c) Competition is heating up in the areas of business intelligence and added information value.
d) The amount of data is doubling every 18 months, which affects response time.

16) What are the technology reasons for using a data warehouse?
The first technology reason for the existence of data warehousing is that the data warehouse is designed to address the incompatibility between informational systems and operational transactional systems.

17) Define the two approaches for building a data warehouse.
Top-down approach: The organization has developed an enterprise data model, collected enterprise-wide business requirements and decided to build an enterprise data warehouse with subset data marts.
Bottom-up approach: The business priorities resulted in developing individual data marts, which are then integrated into an enterprise data warehouse.

18) Define holistic approach.
This approach considers all data warehouse components as parts of a single complex system and takes into account all possible data sources and all known usage requirements. Failing to do this results in a data warehouse design that is skewed toward a particular business requirement, data source or selected access tool.

19) Why is building a data warehouse a difficult task?
1) Data from multiple, often heterogeneous, sources must be consolidated into a query database.
2) Heterogeneity of data sources affects data conversion, quality and timeliness.
3) Use of historical data, which implies that data may be old
4) Tendency of the database to grow very large

20) Why is a data warehouse said to be business driven?
The data warehouse is business driven: it requires continuous interaction with end users and is never finished, since both requirements and data sources change.

21) Write notes on mainframe systems.
Mainframes are based on proven technology, have large data and throughput capacity, are reliable, available and serviceable, and support legacy databases. However, they are not open or flexible and are not optimized for ad hoc queries.

22) List out the logical steps to build a data warehouse.
1) Collect and analyze business requirements
2) Create a data model and physical design for the data warehouse
3) Define data sources
4) Choose the database technology and platform for the warehouse
5) Choose database access and reporting tools
6) Choose database connectivity software
7) Choose data analysis and presentation software
8) Update the data warehouse

23) What are some examples of access types?
a) Simple tabular form reporting
b) Ranking
c) Multivariable analysis
d) Time series analysis
e) Complex textual search
f) Ad hoc user-specified queries

24) Define data replication.
Many companies use data replication servers to copy their most needed data to a separate database where decision support applications can access it. Replication technology creates copies of databases on a periodic basis, so that data entry and data analysis can be performed separately.

25) What are the benefits of a data warehouse?
a) Locating the right information
b) Presentation of information
c) Testing of hypotheses
d) Discovery of information
e) Sharing the analysis

26) Give examples of tangible benefits of a data warehouse.
1) Product inventory turnover is improved.
2) Costs of product introduction are decreased with improved selection of target markets.
3) More cost-effective decision making is enabled by separating query processing from operational databases.

27) Give examples of intangible benefits of a data warehouse.
1) Improved productivity, by keeping all required data in a single location and eliminating the rekeying of data
2) Reduced redundant processing, support and software to support overlapping decision support applications

28) Define interquery and intraquery parallelism.
a) Interquery parallelism: different server threads (or processes) handle multiple requests at the same time.
b) Intraquery parallelism: the serial SQL query is decomposed into lower-level operations such as scan, join, sort and aggregation. These lower-level operations are then executed concurrently, in parallel.

29) Define horizontal parallelism.
The database is partitioned across multiple disks, and parallel processing occurs within a specific task (e.g., a table scan) that is performed concurrently on different processors against different sets of data.

30) Define vertical parallelism.
Vertical parallelism occurs among different tasks: all component query operations (i.e., scan, join, sort) are executed in parallel in a pipelined fashion. In other words, the output from one task (e.g., scan) becomes the input into another task (e.g., join) as soon as records become available.

31) What is data partitioning?
Data partitioning is a key requirement for effective parallel execution of database operations. It spreads data from database tables across multiple disks so that I/O operations such as reads and writes can be performed in parallel. There are two ways in which data partitioning can be done:
a. Random partitioning
b. Intelligent partitioning

32) Draw the tool layout and integration points of metadata.
Pg No 211, Fig No 11.3

33) List out the requirements for a parallel DBMS.
1) Support for function shipping
2) Parallel join strategies
3) Support for data repartitioning
4) Query compilation
5) Support for database transactions

12 Marks
1) With a neat sketch, explain the data warehouse architecture.
2) Explain in detail the implementation considerations in a data warehouse.
3) Explain in detail the design considerations in a data warehouse.
4) Explain in detail the database architectures for parallel processing.
5) Write about implementation examples of a metadata repository.
6) Write short notes on:
a. Metadata Interchange Initiative
b. Metadata Defined
7) Write short notes on:
a. Metadata Repository
b. Metadata Management
8) Write short notes on:
a. Tool requirements
b. Vendor Approach
c. Access to legacy data
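Questions 29 and 31 describe horizontal parallelism and data partitioning. The Python sketch below is a toy model under stated assumptions: three in-memory lists stand in for three disks, round-robin placement stands in for random partitioning, hash-on-key placement stands in for intelligent partitioning, and a thread pool runs the same scan task against every partition at once. The table, column names and helper functions are hypothetical. In CPython, threads do not speed up pure-Python work like this because of the global interpreter lock, so the sketch shows only the structure of a parallel table scan, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_DISKS = 3  # assumed number of disks (partitions)


def round_robin_partition(rows, n):
    """Random-style partitioning: spread rows evenly, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Intelligent partitioning: a row's partition is decided by its key value."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def scan_partition(part, region):
    """One table-scan task: total sales for a region within a single partition."""
    return sum(r["sales"] for r in part if r["region"] == region)


def parallel_scan(parts, region):
    """Horizontal parallelism: the same scan runs concurrently on every partition."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return sum(pool.map(lambda p: scan_partition(p, region), parts))


# Hypothetical sales table with integer ids
rows = [{"id": i, "region": ("North", "South")[i % 2], "sales": i * 10}
        for i in range(10)]

rr = round_robin_partition(rows, NUM_DISKS)
hp = hash_partition(rows, NUM_DISKS, "id")
print(parallel_scan(rr, "North"), parallel_scan(hp, "North"))  # 200 200
```

Both schemes give the same answer; the difference is that hash partitioning places every row with a given key on a known partition, so a query that filters on that key can be sent to one partition instead of scanning all of them.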