UNIT 3: DATA WAREHOUSE ARCHITECTURE

Names of Sub-Units
Data Warehouse Architecture Types, Centralized Data Warehouse, Independent Data Marts, Federated Data Model, Hub and Spoke Data Model, and Data Mart Bus.

Overview
The centralized data warehouse architectural type takes into account the enterprise-level information requirements. An overall infrastructure is established. Atomic-level normalized data at the lowest level of granularity is stored in third normal form. Occasionally, some summarized data is included. Queries and applications access the normalized data in the central data warehouse. There are no separate data marts.

Learning Objectives
In this unit, you will learn to appraise and interpret the architectural design of a data warehouse.

Learning Outcomes
At the end of this unit, you will be able to describe the design of a data warehouse.

JGI JAIN DEEMED-TO-BE UNIVERSITY
UNIT 3: Data Warehouse Architecture

Pre-Unit Preparatory Material
https://archive.nptel.ac.in/courses/110/105/110105147/
http://nitttrc.edu.in/nptel/courses/video/106108102/L05.html

3.1 Data Warehouse Architecture Types
The independent data marts architectural type evolves in companies where the organizational units develop their own data marts for their own specific purposes. Although each data mart serves a particular organizational unit, these separate data marts do not provide “a single version of the truth.” The data marts are independent of one another. As a result, they are likely to have inconsistent data definitions and standards. Such variances hinder the analysis of data across data marts. For example, if there are two independent data marts, one for sales and the other for shipments, it is difficult to analyze sales and shipments data together even though the two subjects are related.

Fig. Data warehouse architectural types

There are three common data warehouse architecture types typically used for building a data warehouse:
- Single-Tier Architecture
- Two-Tier Architecture
- Three-Tier Architecture

Each type of data warehouse architecture has its own benefits and limitations. Let’s explore the characteristics of each one.

3.1.1 Single-Tier Architecture
The single-tier data warehouse architecture reduces the amount of data stored in a data warehouse by building a more compact data set. Its advantage is that it helps remove data redundancies and improves the quality of the data. However, it is not the ideal solution for organizations that hold large volumes of data and operate with multiple data streams, because it is inefficient in that setting.

The single-tier architecture has three layers:
- A source layer
- A data warehouse layer
- An analysis layer

In the single-tier architecture, only the source layer is physical. The data warehouse layer is virtual and provides data in a multidimensional view, created by an intermediate processing layer.

One drawback of the single-tier architecture is the lack of separation between analytical and transactional processing. That is why this type of data warehouse architecture is not used frequently.

Fig. Single Tier Architecture

3.1.2 Two-Tier Architecture
Unlike the single-tier architecture, the two-tier architecture contains a data staging area that ensures any data loaded into the warehouse is cleansed and in the right format. The staging area sits between the source layer and the data warehouse layer, as depicted in the figure below.

Fig. Two Tier Architecture

3.1.3 Three-Tier Architecture
The three-tier architecture is the one most organizations choose when building a data warehouse system.
It solves the connectivity problems that the two-tier architecture commonly faces. The three-tier architecture is made up of:
- A source layer
- A reconciled layer
- A data warehouse layer

The three-tier architecture is useful for extensive, enterprise-wide systems. Its disadvantage is the additional storage space consumed by the redundant reconciled layer.

The three-tier architecture also has three tiers:
- A bottom tier
- A middle tier
- A top tier

These three tiers are commonly called the layers of a data warehouse architecture.

Fig. Three Tier Architecture

3.2 Centralized Data Warehouse
A centralized data warehouse is a data warehousing implementation in which a single data warehouse serves the needs of several separate business units simultaneously, using a single data model that spans the needs of multiple business divisions.

Today, having information means having power. In many aspects of daily living, having relevant information makes everyday activities easier. The use of the internet makes this evident. Because of the information that can be obtained every day, the number of internet users keeps growing, and people spend more time online as web services become more sophisticated, with applications that can gather and aggregate billions of disparate data items into useful information.

Fig. Centralized Data Warehouse

3.3 Independent Data Marts
A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area), such as Sales, Finance, or Marketing. Data marts are often built and controlled by a single department within an organization. Given their single-subject focus, data marts usually draw data from only a few sources.
The sources could be internal operational systems, a central data warehouse, or external data.

An independent data mart is created without the use of a central data warehouse. This could be desirable for smaller groups within an organization.

Fig. Independent Data Marts

3.3.1 Dependent Data Mart
A dependent data mart draws its data from a central data warehouse, which allows you to unite your organization’s data in one place. This gives you the usual advantages of centralization.

Fig. Dependent Data Mart

3.4 Federated Data Model
Some companies get into data warehousing with an existing legacy of assorted decision-support structures in the form of operational systems, extracted datasets, primitive data marts, and so on. For such companies, it may not be prudent to discard that large investment and start from scratch. The practical solution is a federated architectural type, where data may be physically or logically integrated through shared key fields, overall global metadata, distributed queries, and other methods. In this architectural type, there is no single overall data warehouse.

3.5 Hub and Spoke Data Model
This is the Inmon Corporate Information Factory approach. As in the centralized data warehouse architecture, there is an overall enterprise-wide data warehouse. Atomic data in third normal form is stored in the centralized data warehouse. The major and useful difference is the presence of dependent data marts in this architectural type. Dependent data marts obtain data from the centralized data warehouse. The centralized data warehouse forms the hub that feeds data to the data marts on the spokes.
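The hub-and-spoke flow described above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database stands in for the central warehouse; the table names (product, sale, mart_sales_by_department) and the sample rows are hypothetical and chosen only for illustration. The hub holds atomic, normalized rows, and one dependent data mart is derived from the hub alone.

```python
import sqlite3


def build_hub(conn):
    """Create a tiny normalized hub (hypothetical schema) with atomic sales rows."""
    conn.executescript(
        """
        CREATE TABLE product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            department TEXT NOT NULL
        );
        CREATE TABLE sale (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES product(product_id),
            sale_date  TEXT NOT NULL,
            amount     REAL NOT NULL
        );
        """
    )
    # Illustrative sample data, not taken from any real system.
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [(1, "Laptop", "Electronics"),
         (2, "Phone", "Electronics"),
         (3, "Desk", "Furniture")],
    )
    conn.executemany(
        "INSERT INTO sale VALUES (?, ?, ?, ?)",
        [(1, 1, "2024-01-05", 1000.0),
         (2, 2, "2024-01-06", 500.0),
         (3, 3, "2024-01-06", 250.0),
         (4, 1, "2024-01-07", 1200.0)],
    )


def build_sales_mart(conn):
    """Derive a summarized, dependent data mart (a spoke) from hub tables only."""
    conn.execute(
        """
        CREATE TABLE mart_sales_by_department AS
        SELECT p.department AS department,
               SUM(s.amount) AS total_amount
        FROM sale AS s
        JOIN product AS p ON p.product_id = s.product_id
        GROUP BY p.department
        """
    )


def query_mart(conn):
    """Run a typical departmental query against the mart, not the hub."""
    return conn.execute(
        "SELECT department, total_amount "
        "FROM mart_sales_by_department ORDER BY department"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_hub(conn)
    build_sales_mart(conn)
    for department, total in query_mart(conn):
        print(department, total)
```

In this sketch the mart is a denormalized summary, which is only one of the structures a dependent data mart may take. A real warehouse would refresh its marts through scheduled loads rather than a one-off table creation.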
The dependent data marts may be developed for a variety of purposes: departmental analytical needs, specialized queries, data mining, and so on. Each dependent data mart may have normalized, denormalized, summarized, or dimensional data structures based on individual requirements. Most queries are directed to the dependent data marts, although the centralized data warehouse may itself be used for querying. This architectural type results from adopting a top-down approach to data warehouse development.

3.6 Data Mart Bus
This is the Kimball conformed supermarts approach. You begin by analyzing requirements for a specific business subject such as orders, shipments, billings, insurance claims, car rentals, and so on. You build the first data mart (supermart) using business dimensions and metrics. These business dimensions will be shared by future data marts. The principal notion is that by conforming dimensions among the various data marts, the result is a set of logically integrated supermarts that provide an enterprise view of the data. The data marts contain atomic data organized as a dimensional data model. This architectural type results from adopting an enhanced bottom-up approach to data warehouse development.

Summary
A data warehouse architecture is a set of interconnected databases that store, organize, and analyze data. A data warehouse is a collection of databases that stores and organizes data in a systematic way. A data warehouse architecture consists of three main components: a data warehouse, an analytical framework, and an integration layer. The data warehouse is the central repository for all the data. The analytical framework is the software that processes the data and organizes it into tables. The integration layer is the software that connects the databases together and makes them accessible to other applications.
A data warehouse architecture is an important part of any IT infrastructure because it helps to optimize the performance of the entire system.

Self-Practice
Mini Project: Fraud Detection using PaySim Financial Dataset
Reference link: https://www.projectpro.io/article/data-warehouse-project-ideas-for-practice/572

Self-Assessment Questions
a) Essay

Short
1. Define data warehouse architecture.
2. List the data warehouse architecture types.
3. Define data marts.
4. Define the data model.
5. Define centralized data warehouse.

Medium
1. Write the working principle of the single-tier data warehouse architecture.
2. Write the working principle of the two-tier data warehouse architecture.
3. Write the working principle of the three-tier data warehouse architecture.
4. Briefly describe the federated data model in data warehouse architecture.
5. Briefly describe the data mart bus.

Long
1. Explain the single-tier data warehouse architecture with a neat sketch.
2. Describe the two-tier data warehouse architecture with a neat sketch.
3. Explain the three-tier data warehouse architecture with a neat sketch.
4. Elaborate on the centralized data warehouse with a suitable sketch.
5. Explain the hub and spoke data model with a neat diagram.

3.7 POST-UNIT READING MATERIAL
https://www.youtube.com/watch?v=J326LIUrZM8&list=PL9ooVrP1hQOEDSc5QEbI8WYVV_EbWKJwX
https://www.youtube.com/watch?v=G4NYQox4n2g

3.8 TOPICS FOR DISCUSSION FORUMS