Uploaded by Sabawoon Khan

Large-scale database assignment

i. In which situations do we create a subtype and supertype relationship?
Answer:
Entity subtypes are introduced into the ER model in order to reduce the total number of attributes
of each entity.
Each entity has its own set of attributes, but some attributes can be repeated across different
entities. The ER model should therefore be designed so that the number of attributes repeated
across entities is minimal or zero. Repeated attributes introduce redundancy into the database
and make it unreasonably large. Entity subtypes are used to solve this problem.
The idea is that a supertype is defined for the whole set of related entities and holds the
information common to all of them. The details specific to each kind of entity are moved into
separate, specialized subtypes.
An entity supertype is an entity type that holds only the attributes common to the entity
subtypes that use it.
Example.
Suppose you need to develop a database of the employees of an educational institution. The
institution has three entities, each of which represents a professional group of employees:
• the entity Administration;
• the entity Teacher;
• the entity Support staff.
If each entity is described with its own set of attributes, some attributes will be repeated
across entities. The following attributes are common to all entities:
• Name;
• Surname;
• Identification number.
Each entity also has some unique attributes:
• the entity type “Administration” has the administrative rate, the name of the
position held, etc.;
• the entity type “Teacher” has the number of hours taught, the rate per hour, the
category, etc.;
• the entity type “Support staff” has a staff rate, a weekend or working day ratio
(if the employee worked on weekends), etc.
To avoid repeating data, the ER model is changed as shown in Figure 1:
• the entity supertype “Employee” is introduced. It contains the attributes common to
all entity subtypes;
• the entity subtypes “Administration”, “Teacher” and “Support staff” are introduced.
Each subtype has its own unique attributes.
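One possible way to carry this supertype/subtype design into tables is sketched below with Python's built-in sqlite3 module. The table names, column names and sample row are assumptions made for illustration; they are not taken from Figure 1. Each subtype table reuses the supertype's key as its own primary key, so the common attributes are stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
-- Supertype: attributes common to every employee
CREATE TABLE employee (
    id_number INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    surname   TEXT NOT NULL
);
-- Subtypes: each row extends exactly one employee row
CREATE TABLE administration (
    id_number           INTEGER PRIMARY KEY REFERENCES employee(id_number),
    administrative_rate REAL,
    position_name       TEXT
);
CREATE TABLE teacher (
    id_number    INTEGER PRIMARY KEY REFERENCES employee(id_number),
    hours_taught INTEGER,
    hourly_rate  REAL,
    category     TEXT
);
CREATE TABLE support_staff (
    id_number     INTEGER PRIMARY KEY REFERENCES employee(id_number),
    staff_rate    REAL,
    weekend_ratio REAL
);
""")

# Made-up sample data: one teacher, stored as supertype row plus subtype row
conn.execute("INSERT INTO employee VALUES (1, 'Amina', 'Rahimi')")
conn.execute("INSERT INTO teacher VALUES (1, 120, 15.0, 'senior')")

# Joining supertype and subtype rebuilds the full description of the teacher
row = conn.execute("""
    SELECT e.name, e.surname, t.hours_taught * t.hourly_rate
    FROM employee e JOIN teacher t ON t.id_number = e.id_number
""").fetchone()
print(row)
```

Other mappings exist (for example a single table with a type column); this one is shown only because it mirrors the supertype and subtype boxes most directly.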
ii. Each value of a foreign key must match a primary key value, or it may be null. In which
situation may the foreign key be null? Give an example that shows this situation.
Answer: Yes, a foreign key can be NULL.
A foreign key simply requires that any value in that field already exists in a different table
(the parent table). That is all a foreign key is by definition. Null, by definition, is not a
value; it means that we do not yet know what the value is, so a null foreign key does not
violate the constraint.
Here is a real-life example. Suppose you have a database that stores sales proposals, and each
proposal has exactly one client and at most one sales person assigned. The proposal table would
then have two foreign keys, one holding the client ID and one holding the sales rep ID. However,
at the time the record is created, a sales rep is not always assigned (because no one is free to
work on it yet), so the client ID is filled in but the sales rep ID might be null. In other
words, you need a nullable FK when you may not know its value at the time the data is entered,
but you do know other values in the row that need to be stored. To allow nulls in an FK,
generally all you have to do is allow nulls on the field that has the FK. Nullability is
separate from the idea of the field being an FK.
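The proposal example can be sketched as follows, again with Python's sqlite3 module. The table names, column names and IDs are assumptions for illustration. The sketch shows that a NULL sales rep is accepted, a non-null value that does not exist in the parent table is rejected, and the rep can be filled in later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite

conn.executescript("""
CREATE TABLE client    (client_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sales_rep (sales_rep_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE proposal (
    proposal_id  INTEGER PRIMARY KEY,
    client_id    INTEGER NOT NULL REFERENCES client(client_id),  -- always known
    sales_rep_id INTEGER REFERENCES sales_rep(sales_rep_id)      -- nullable FK
);
INSERT INTO client    VALUES (1, 'Example client');
INSERT INTO sales_rep VALUES (10, 'Example rep');
""")

# No rep is free yet: the proposal is stored with a NULL foreign key
conn.execute("INSERT INTO proposal VALUES (100, 1, NULL)")

# A non-null value must still exist in the parent table
try:
    conn.execute("INSERT INTO proposal VALUES (101, 1, 99)")  # rep 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Later, a rep is assigned and the NULL is replaced by a valid key
conn.execute("UPDATE proposal SET sales_rep_id = 10 WHERE proposal_id = 100")
assigned = conn.execute(
    "SELECT sales_rep_id FROM proposal WHERE proposal_id = 100"
).fetchone()[0]
print(rejected, assigned)
```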
Whether the foreign key is unique or not depends on whether the table has a one-to-one or a
one-to-many relationship to the parent table. In a one-to-one relationship, all the data could
live in one table, but if the table is getting too wide or the data covers a different topic
(employee insurance details kept apart from the employee record, for instance), you want
separate tables linked by an FK. You would then make this FK either the PK as well (which
guarantees uniqueness) or put a unique constraint on it.
iii. Consider a computer shop as a business. Identify at least three major possible entities with
four attributes each and create a proper ERD for its database with these entities.
Answer:
CUSTOMER(Customer_id, Customer_name, Customer_phone, Customer_address)
PRODUCT(Product_id, Product_name, Company, Model, Description)
INVOICE(Invoice_id, Customer_id, Date, Total_amount)
SALES(Sale_id, Product_id, Invoice_id, Unit_price, Quantity, Subtotal)
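As a rough sketch, the four relations above could be declared like this with Python's sqlite3 module. The data types, the NOT NULL choices and the column name invoice_date (used in place of Date) are assumptions; the relationships follow the shared key columns, so INVOICE references CUSTOMER and SALES references both PRODUCT and INVOICE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    customer_name    TEXT NOT NULL,
    customer_phone   TEXT,
    customer_address TEXT
);
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    company      TEXT,
    model        TEXT,
    description  TEXT
);
CREATE TABLE invoice (
    invoice_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    invoice_date TEXT,   -- "Date" in the relation above
    total_amount REAL
);
CREATE TABLE sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    invoice_id INTEGER NOT NULL REFERENCES invoice(invoice_id),
    unit_price REAL,
    quantity   INTEGER,
    subtotal   REAL
);
""")

# List the tables and the parent tables that SALES points to
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
sales_parents = {r[2] for r in conn.execute("PRAGMA foreign_key_list(sales)")}
print(tables, sales_parents)
```

In ERD terms, this reads as one customer to many invoices, one invoice to many sales lines, and one product to many sales lines.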
iv. Why do DBMSs maintain metadata for databases? Answer with an example of a table and its
metadata.
Answer:
Metadata is simply data about data: a description of the data and its context. It helps to
organize, find and understand data. Here is a real-world example of metadata:
Every time you take a photo with a modern camera, a set of metadata is gathered and saved
with it, such as:
• File name,
• Size of the file,
• Date and time,
• Camera settings etc.
Metadata in a relational database:
Relational databases store and provide access not only to data but also to metadata, in a
structure called the data dictionary or system catalog. It holds information about:
• tables,
• columns,
• data types,
• table relationships,
• constraints etc.
Data dictionary:
• A data dictionary is a collection of descriptions of the data objects or items in a
data model for the benefit of programmers and others who need to refer to them.
• A data dictionary contains a list of all files in the database, the number of records
in each file, and the names and types of each field. Most database management
systems keep the data dictionary hidden from users to prevent them from
accidentally destroying its contents.
Accessing metadata in an RDBMS:
An RDBMS provides access to its metadata through a set of tables or views often called the
system catalog or data dictionary. We can query those views with plain SQL statements, for
example in systems that implement the standard information schema:
SELECT * FROM information_schema.tables;
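A table and its metadata can be shown side by side in SQLite, which exposes its catalog through the sqlite_master table and the table_info pragma rather than through an information schema. The student table below is a hypothetical example, not one from the assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hypothetical table; its rows would be the data
conn.execute("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    gpa        REAL
)
""")

# Metadata 1: which tables exist (SQLite's catalog table)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Metadata 2: column name, declared type, NOT NULL flag and primary key flag
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
columns = [(r[1], r[2], bool(r[3]), bool(r[5]))
           for r in conn.execute("PRAGMA table_info(student)")]

print(tables)
for column in columns:
    print(column)
```

The DBMS needs exactly this information to validate inserts, plan queries and enforce constraints, which is why it maintains the metadata itself instead of leaving it to applications.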
v. Apply the generalization process to the following entities.
Answer: Generalization produces the supertype Vehicle.
vi. Suppose that in an organization there are three types of employees: Permanent, Project
Base and Daily wages. Currently all employee records are in a single entity named
EMPLOYEE. Apply the specialization process to the EMPLOYEE entity. Assign possible attributes
to all subtype entities as well as the supertype entity.
Answer:
Supertype: EMPLOYEE(EmpId, EmpName, EmpPhone, EmpAddress)
Subtype: PERMANENT EMPLOYEE(ContractNumber, BillingRate)
Subtype: SALARIED EMPLOYEE(MonthlySalary, StockOption)
Subtype: HOURLY EMPLOYEE(HourlyRate)
vii. Suppose we have a situation in which an EERD is required, but we ignore it. What possible
problems will arise for us as database administrators?
Answer: If we do not use an EERD, the same data may be present in more than one table. This
data redundancy leads to data inconsistency, and we may lose data integrity in the
database.
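The inconsistency described here can be reproduced in a few lines. In the sketch below (table names and values are made up), the same person is stored in two separate employee tables instead of once in a supertype, and an update that reaches only one copy leaves the database with two different phone numbers for the same employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- No supertype: the common attributes are repeated in every table
CREATE TABLE hourly_employee (
    emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_phone TEXT, hourly_rate REAL);
CREATE TABLE project_employee (
    emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_phone TEXT, contract_number TEXT);

-- The same person appears in both tables (redundancy)
INSERT INTO hourly_employee  VALUES (1, 'Example Person', '555-0100', 12.5);
INSERT INTO project_employee VALUES (1, 'Example Person', '555-0100', 'C-001');

-- The phone number is changed in only one place
UPDATE hourly_employee SET emp_phone = '555-0199' WHERE emp_id = 1;
""")

# Collect every phone number stored for employee 1
phones = {r[0] for r in conn.execute("""
    SELECT emp_phone FROM hourly_employee  WHERE emp_id = 1
    UNION
    SELECT emp_phone FROM project_employee WHERE emp_id = 1
""")}
print(phones)  # more than one value means the copies disagree
```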
viii. What are the major features of a data warehouse that do not exist in a common
centralized database?
Answer: The key characteristics of a data warehouse are as follows:
• Some data is denormalized for simplification and to improve performance
• Large amounts of historical data are used
• Queries often retrieve large amounts of data
• Both planned and ad hoc queries are common
• The data load is controlled
In general, fast query performance with high data throughput is the key to a successful data
warehouse.
ix. According to W. H. Inmon, “Data warehouse is a subject-oriented, integrated, nonvolatile,
and time-variant collection of data in support of management’s decisions”. Explain in detail
the statement of Inmon in your own words with examples.
Answer:
Subject-oriented: The information in the data warehouse revolves around some subject, so it
does not contain all company data ever, only the subject matter of interest. For instance,
data on your competitors need not appear in a data warehouse, but your own sales data will
most certainly be there.
Integrated: Each data source, each team, or even each person has their own preferences
when it comes to naming conventions. That is why common standards are developed to make
sure the data warehouse picks the best-quality data from everywhere. This relates to master
data governance, which is a topic for another time.
Time-variant: The data warehouse contains historical data too. We mainly use the data
warehouse for analysis and reporting, which implies we need to know what happened five or
ten years ago.
Non-volatile: Data only flows into the data warehouse as is; once there, it cannot be
changed or deleted.
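The time-variant and non-volatile properties can be illustrated with a small append-only table. This is only a sketch: the table, the triggers used to block changes and the sample figures are assumptions, and real warehouses usually enforce these rules through load procedures and permissions rather than triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Time-variant: every row carries the date it describes
CREATE TABLE sales_history (
    snapshot_date TEXT    NOT NULL,
    product       TEXT    NOT NULL,
    units_sold    INTEGER NOT NULL
);
-- Non-volatile: loaded rows may not be updated or deleted
CREATE TRIGGER sales_history_no_update BEFORE UPDATE ON sales_history
BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;
CREATE TRIGGER sales_history_no_delete BEFORE DELETE ON sales_history
BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;
""")

# Data only flows in: new snapshots are appended next to the old ones
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?)",
    [("2015-01-31", "laptop", 40), ("2020-01-31", "laptop", 65)],
)

# An attempt to remove history is refused by the trigger
try:
    conn.execute("DELETE FROM sales_history WHERE product = 'laptop'")
    delete_blocked = False
except sqlite3.DatabaseError:
    delete_blocked = True

# The old figures are still available for comparison over time
history = conn.execute(
    "SELECT snapshot_date, units_sold FROM sales_history ORDER BY snapshot_date"
).fetchall()
print(delete_blocked, history)
```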
x. The following diagram shows a complete data warehouse system. Elaborate the workflow
of this diagram.
Answer:
The source systems are the OLTP systems that contain the data you want to load into the data
warehouse. Online Transaction Processing (OLTP) is a system whose main purpose is to capture
and store the business transactions. The source systems’ data is examined using a data profiler
to understand the characteristics of the data. A data profiler is a tool that has the capability to
analyze data, such as finding out how many rows are in each table, how many rows contain
NULL values, and so on.
The extract, transform, and load (ETL) system then brings data from various source systems into
a staging area. ETL is a system that has the capability to connect to the source systems, read the
data, transform the data, and load it into a target system (the target system doesn’t have to be
a data warehouse). The ETL system then integrates, transforms, and loads the data into a
dimensional data store (DDS). A DDS is a database that stores the data warehouse data in a
different format than OLTP. The reason for getting the data from the source system into the
DDS and then querying the DDS instead of querying the source system directly is that in a DDS
the data is arranged in a dimensional format that is more suitable for analysis. The second
reason is because a DDS contains integrated data from several source systems.
When the ETL system loads the data into the DDS, the data quality rules do various data quality
checks. Bad data is put into the data quality (DQ) database to be reported and then corrected in
the source systems. Bad data can also be automatically corrected or tolerated if it is within a
certain limit. The ETL system is managed and orchestrated by the control system, based on the
sequence, rules, and logic stored in the metadata. The metadata is a database containing
information about the data structure, the data meaning, the data usage, the data quality rules,
and other information about the data.
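The staging, data quality and loading steps can be pictured with a toy pipeline. Everything in this sketch is hypothetical: the source rows, the single quality rule (the amount must be present and non-negative) and the function names are chosen for illustration, and plain Python lists stand in for the staging area, the DDS and the DQ database.

```python
# Rows as they might arrive from a source (OLTP) system
source_rows = [
    {"order_id": 1, "customer": " alice ", "amount": "120.50"},
    {"order_id": 2, "customer": "Bob", "amount": None},    # missing amount
    {"order_id": 3, "customer": "carol", "amount": "-5"},  # negative amount
]


def passes_quality_rules(row):
    """Data quality rule: the amount must be present and non-negative."""
    return row["amount"] is not None and float(row["amount"]) >= 0


def transform(row):
    """Clean up the customer name and convert the amount to a number."""
    return {
        "order_id": row["order_id"],
        "customer": row["customer"].strip().title(),
        "amount": float(row["amount"]),
    }


def run_etl(rows):
    staging = list(rows)        # extract: copy source rows into a staging area
    dds, dq_database = [], []
    for row in staging:
        if passes_quality_rules(row):
            dds.append(transform(row))   # transform and load good rows
        else:
            dq_database.append(row)      # keep bad rows for reporting and correction
    return dds, dq_database


dds, dq_database = run_etl(source_rows)
print(dds)
print([row["order_id"] for row in dq_database])
```

In a real system the rule itself would be read from the metadata database rather than hard-coded, which is what lets the control system change the checks without changing the ETL code.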
The audit system logs the system operations and usage into the metadata database. The audit
system is part of the ETL system that monitors the operational activities of the ETL processes
and logs their operational statistics. It is used for understanding what happened during the ETL
process.
Users use various front-end tools such as spreadsheets, pivot tables, reporting tools, and SQL
query tools to retrieve and analyze the data in a DDS. Some applications operate on a
multidimensional database format. For these applications, the data in the DDS is loaded into
multidimensional databases (MDBs), which are also known as cubes. A multidimensional
database is a form of database where the data is stored in cells and the position of each cell is
defined by a number of variables called dimensions. Each cell represents a business event, and
the values of the dimensions show when and where this event happened.
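A very small model of such a cube is a dictionary whose keys are tuples of dimension values and whose values are the cell contents. The dimensions (month, region, product), the revenue measure and the figures below are made up for illustration; real multidimensional databases use far more elaborate storage.

```python
from collections import defaultdict

# Each fact is one business event: (month, region, product, revenue)
facts = [
    ("2024-01", "North", "Laptop", 1200.0),
    ("2024-01", "North", "Printer", 300.0),
    ("2024-01", "South", "Laptop", 800.0),
    ("2024-02", "North", "Laptop", 1000.0),
]

# The cube: the position of a cell is given by its dimension values
cube = defaultdict(float)
for month, region, product, revenue in facts:
    cube[(month, region, product)] += revenue


def slice_total(cube, month=None, region=None, product=None):
    """Sum all cells that match the given dimension values (None = any value)."""
    wanted = (month, region, product)
    return sum(
        value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )


print(cube[("2024-01", "North", "Laptop")])  # a single cell
print(slice_total(cube, month="2024-01"))    # revenue for one month
print(slice_total(cube, product="Laptop"))   # revenue for one product
```

Fixing one dimension and summing over the others is the kind of question front-end tools ask of a cube, such as revenue for a certain month, region or product type.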
Tools such as analytics applications, data mining, scorecards, dashboards, multidimensional
reporting tools, and other BI tools can retrieve data interactively from multidimensional
databases. They retrieve the data to produce various features and results on the front-end
screens that enable the users to get a deeper understanding about their businesses. An
example of an analytic application is to analyze the sales by time, customer, and product. The
users can analyze the revenue and cost for a certain month, region, and product type.