i. In which situation do we create a subtype and supertype relationship?

Answer: Entity subtypes are introduced into the ER model to reduce the total number of attributes of each entity. Each entity has its own set of attributes, but some attributes may be repeated across different entities. The ER model should therefore be designed so that the number of attributes repeated across entities is minimal or zero. Repeated attributes introduce redundancy, and the database grows unnecessarily large. Entity subtypes solve this problem: a supertype is defined for the whole set of related entities and holds the information common to all of them, while the details specific to each kind of entity are moved into separate, specialized subtypes. An entity supertype is thus an entity type that contains only the attributes common to the subtypes that use it.

Example. Suppose you need to develop a database of the employees of an educational institution. The institution has three entities, each representing a professional group of employees: Administration, Teacher, and Support staff. If each entity is described with its own full set of attributes, some attributes will be repeated in different entities. The following attributes are common to all three: Name; Surname; Identification number. Each entity also has attributes of its own: “Administration” has the administrative rate, the name of the position held, etc.; “Teacher” has the number of hours taught, the rate per hour, category, etc.; “Support staff” has a staff rate, a weekend or working day ratio (if the employee worked on weekends), etc.
To avoid this repetition, the ER model is changed as shown in Figure 1: the supertype “Employee” is introduced and holds the attributes common to all subtypes, and the subtypes “Administration”, “Teacher”, and “Support staff” are introduced, each with its own unique attributes.

ii. Each value of a foreign key must match a primary key value, or it may be null. In which situation may the foreign key be null? Give an example that shows this situation.

Answer: Yes, a foreign key can be NULL. A foreign key simply requires that the value in that field already exists in a different table (the parent table); that is all a foreign key is by definition. Null, by definition, is not a value: it means that we do not yet know what the value is.

Here is a real-life example. Suppose you have a database that stores sales proposals, and each proposal has exactly one salesperson and one client. The proposal table then has two foreign keys, one holding the client ID and one holding the sales rep ID. However, at the time the record is created, a sales rep is not always assigned (because no one is free to work on it yet), so the client ID is filled in but the sales rep ID may be null. In other words, you need a nullable FK when its value may not be known at the time the row is entered, while other values in the row are known and must be stored.

To allow nulls in an FK, generally all you have to do is allow nulls on the field that carries the FK. Nullability is separate from the idea of the field being an FK. Whether the FK is unique or not depends on whether the table has a one-to-one or a one-to-many relationship to the parent table.
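The sales proposal example above can be sketched as follows. This is a minimal illustration, assuming SQLite accessed through Python's standard sqlite3 module; the table names, column names, and sample rows are hypothetical and not part of the original answer.

```python
import sqlite3

# In-memory database; every table, column, and row below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE client    (client_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sales_rep (sales_rep_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE proposal (
    proposal_id  INTEGER PRIMARY KEY,
    client_id    INTEGER NOT NULL REFERENCES client(client_id),  -- always known
    sales_rep_id INTEGER REFERENCES sales_rep(sales_rep_id)      -- nullable FK
);
INSERT INTO client    VALUES (1, 'Acme Ltd');
INSERT INTO sales_rep VALUES (10, 'Sara');
""")

# A proposal is created before any sales rep is free: the FK is NULL and accepted.
conn.execute("INSERT INTO proposal VALUES (100, 1, NULL)")
unassigned = conn.execute(
    "SELECT sales_rep_id FROM proposal WHERE proposal_id = 100"
).fetchone()[0]

# A non-NULL FK value must match an existing parent row, otherwise it is rejected.
try:
    conn.execute("INSERT INTO proposal VALUES (101, 1, 99)")  # no sales rep 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Later a sales rep becomes available and is assigned to the proposal.
conn.execute("UPDATE proposal SET sales_rep_id = 10 WHERE proposal_id = 100")
assigned = conn.execute(
    "SELECT sales_rep_id FROM proposal WHERE proposal_id = 100"
).fetchone()[0]

print(unassigned, rejected, assigned)
```

Leaving out NOT NULL on sales_rep_id is what makes the foreign key optional; the REFERENCES clause still guards every non-null value.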
Now if you have a one-to-one relationship, the data could all be kept in one table. But if the table is getting too wide, or if the data covers a different topic (for instance, employee records and their insurance details), then you want separate tables linked by an FK. You would then make this FK either the PK as well (which guarantees uniqueness) or put a unique constraint on it.

iii. Consider a computer shop as a business. Identify at least three major possible entities with four attributes each and create a proper ERD for its database with these three entities.

Answer:
CUSTOMER(Customer_id, Customer_name, Customer_phone, Customer_address)
PRODUCT(Product_id, Product_name, Company, Model, Description)
INVOICE(Invoice_id, Customer_id, Date, Total_amount)
SALES(Sale_id, Product_id, Invoice_id, Unit_price, Quantity, Subtotal)

iv. Why do DBMSs maintain metadata for databases? Give an answer with an example of a table and its metadata.

Answer: Metadata is simply defined as data about data: a description of the data and its context. It helps to organize, find, and understand data. A real-world example: every time you take a photo with a modern camera, a set of metadata is gathered and saved with it, such as the file name, the size of the file, the date and time, the camera settings, etc.

Metadata in a relational database: Relational databases store and provide access not only to data but also to metadata, in a structure called the data dictionary or system catalog. It holds information about tables, columns, data types, table relationships, constraints, etc.

Data dictionary: A data dictionary is a collection of descriptions of the data objects or items in a data model, for the benefit of programmers and others who need to refer to them. A data dictionary contains a list of all files in the database, the number of records in each file, and the names and types of each field.
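As an illustration of a table and its metadata, the sketch below creates one small table and then reads back both its data and its description. It assumes SQLite through Python's standard sqlite3 module, because the answer names no particular DBMS; the student table and its row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hypothetical table used only for illustration.
conn.execute("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    gpa        REAL
)""")
conn.execute("INSERT INTO student VALUES (1, 'Ali', 3.4)")

# Data: the rows stored in the table.
rows = conn.execute("SELECT * FROM student").fetchall()

# Metadata, part 1: SQLite's catalog table sqlite_master lists the schema objects.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Metadata, part 2: PRAGMA table_info yields one row per column, in the form
# (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk
    in conn.execute("PRAGMA table_info(student)")
]

print(rows)
print(tables)
for column in columns:
    print(column)  # (column name, declared type, NOT NULL?, part of PK?)
```

The rows are the data; the table name, column names, declared types, and constraints are the metadata that the DBMS keeps so it can validate inserts and plan queries.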
Most database management systems keep the data dictionary hidden from users to prevent them from accidentally destroying its contents.

Accessing metadata in an RDBMS: An RDBMS provides access to its metadata through a set of tables or views, often called the system catalog or data dictionary. We can query those views using plain SQL statements. The exact view names vary by system; for example, in systems that implement the standard INFORMATION_SCHEMA (such as MySQL, PostgreSQL, and SQL Server): SELECT * FROM information_schema.tables;

v. Apply the generalization process to the following entities.

Answer: [Diagram: generalization with Vehicle as the supertype]

vi. Suppose an organization has three types of employees: Permanent, Project Base, and Daily Wages. Currently all employees’ records are in a single entity named EMPLOYEE. Apply the specialization process to the EMPLOYEE entity. Assign possible attributes to all subtype entities as well as to the supertype entity.

Answer:
Supertype EMPLOYEE(EmpId, EmpName, EmpPhone, EmpAddress)
Subtype PERMANENT EMPLOYEE(ContractNumber, BillingRate)
Subtype SALARIED EMPLOYEE(MonthlySalary, StockOption)
Subtype HOURLY EMPLOYEE(HourlyRate)

vii. Suppose we have a situation in which an EERD is required, but we ignore it. What possible problems will arise for us as database administrators?

Answer: If we do not use an EERD, the same data may be present in more than one table. This data redundancy leads to data inconsistency, and we may lose data integrity in the database.

viii. What are the major features of a data warehouse that do not exist in a common centralized database?

Answer: The key characteristics of a data warehouse are as follows:
Some data is denormalized for simplification and to improve performance.
Large amounts of historical data are used.
Queries often retrieve large amounts of data.
Both planned and ad hoc queries are common.
The data load is controlled.
In general, fast query performance with high data throughput is the key to a successful data warehouse.

ix. According to W. H. Inmon, “Data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management’s decisions”.
Explain in detail the statement of Inmon in your own words with examples.

Answer:
Subject-oriented: The information in the data warehouse revolves around particular subjects. It therefore does not contain all company data, only the subject matters of interest. For instance, data on your competitors need not appear in a data warehouse, but your own sales data will most certainly be there.
Integrated: Each source system, each team, or even each person has their own preferences when it comes to naming conventions. That is why common standards are developed, to make sure the data warehouse picks the best-quality data from everywhere. This relates to master data governance.
Time-variant: This relates to the fact that the data warehouse contains historical data too. The data warehouse is mainly used for analysis and reporting, which implies that we need to know what happened five or ten years ago.
Non-volatile: This implies that data only flows into the data warehouse as is; once there, it cannot be changed or deleted.

x. The following diagram shows a complete data warehouse system. Elaborate on the workflow of this diagram.

Answer: The source systems are the OLTP systems that contain the data you want to load into the data warehouse. Online Transaction Processing (OLTP) is a system whose main purpose is to capture and store business transactions. The source systems’ data is examined using a data profiler to understand the characteristics of the data. A data profiler is a tool that can analyze data, such as finding out how many rows are in each table, how many rows contain NULL values, and so on. The extract, transform, and load (ETL) system then brings data from the various source systems into a staging area. ETL is a system that can connect to the source systems, read the data, transform the data, and load it into a target system (the target system doesn’t have to be a data warehouse).
The ETL system then integrates, transforms, and loads the data into a dimensional data store (DDS). A DDS is a database that stores the data warehouse data in a different format than OLTP. There are two reasons for getting the data from the source system into the DDS and then querying the DDS instead of querying the source system directly. The first is that in a DDS the data is arranged in a dimensional format that is more suitable for analysis. The second is that a DDS contains integrated data from several source systems.

When the ETL system loads the data into the DDS, the data quality rules perform various data quality checks. Bad data is put into the data quality (DQ) database to be reported and then corrected in the source systems. Bad data can also be automatically corrected, or tolerated if it is within a certain limit.

The ETL system is managed and orchestrated by the control system, based on the sequence, rules, and logic stored in the metadata. The metadata is a database containing information about the data structure, the data meaning, the data usage, the data quality rules, and other information about the data. The audit system logs the system operations and usage into the metadata database. The audit system is the part of the ETL system that monitors the operational activities of the ETL processes and logs their operational statistics. It is used for understanding what happened during the ETL process.

Users use various front-end tools such as spreadsheets, pivot tables, reporting tools, and SQL query tools to retrieve and analyze the data in a DDS. Some applications operate on a multidimensional database format. For these applications, the data in the DDS is loaded into multidimensional databases (MDBs), which are also known as cubes. A multidimensional database is a form of database where the data is stored in cells and the position of each cell is defined by a number of variables called dimensions.
Each cell represents a business event, and the values of the dimensions show when and where this event happened. Tools such as analytics applications, data mining, scorecards, dashboards, multidimensional reporting tools, and other BI tools can retrieve data interactively from multidimensional databases. They retrieve the data to produce various features and results on the front-end screens that enable the users to get a deeper understanding about their businesses. An example of an analytic application is to analyze the sales by time, customer, and product. The users can analyze the revenue and cost for a certain month, region, and product type.
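The kind of analysis described above, revenue and cost by month, region, and product type, can be sketched against a small dimensional store. This is only an illustration, assuming SQLite through Python's standard sqlite3 module; the dimension tables, the fact table, and all rows are hypothetical toy data and do not come from the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table surrounded by three dimension tables.
conn.executescript("""
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, month        TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, region       TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_type TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    revenue      INTEGER,
    cost         INTEGER
);
INSERT INTO dim_date     VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_customer VALUES (1, 'North'),   (2, 'South');
INSERT INTO dim_product  VALUES (1, 'Laptop'),  (2, 'Printer');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 1000, 700),
    (1, 1, 1,  500, 350),
    (1, 2, 2,  200, 150),
    (2, 1, 2,  300, 220);
""")

# Each fact row is one business event; the dimension keys say when, for whom,
# and for which product it happened. Grouping by dimension attributes gives
# totals per month, region, and product type.
summary = conn.execute("""
SELECT d.month, c.region, p.product_type,
       SUM(f.revenue) AS revenue, SUM(f.cost) AS cost
FROM fact_sales f
JOIN dim_date     d ON d.date_key     = f.date_key
JOIN dim_customer c ON c.customer_key = f.customer_key
JOIN dim_product  p ON p.product_key  = f.product_key
GROUP BY d.month, c.region, p.product_type
ORDER BY d.month, c.region, p.product_type
""").fetchall()

for row in summary:
    print(row)
```

A multidimensional database would typically hold such totals precomputed in cells addressed by (month, region, product type), whereas this relational sketch computes them at query time.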