Task 3

Task 1. Fill the gaps in the text using the words from the box
a. Techniques
b. Columns
c. Operations
d. Spreadsheet
e. Redundancy
Taking a Relational Approach to Data Modeling
Relational theory, which is used in relational database design and documented by a
relational model, centers on the mathematical term relation. A relation, in this context, refers
to two-dimensional tables that have single-valued entries, such as a 1.__________. All data
entries exist on a row in one or more 2.__________. Data existing in a column is the same data
element, which can be defined by a unique name, and has no positional or ordinal significance
whatsoever. You may think the term relational is in reference to the use of relationships to link
data sets (entities) and tables, but this isn't the case.
In spite of being referred to as a relational theory, it isn't a theory in the normal sense.
It doesn't have a succinct, well-defined list of premise-conclusion statements. It wasn't
originally written to explain data behavior in the context of data storage structures in place at
the time (prior to relational theory, no RDBMSs existed). Relational theory was developed to
propose rules for organizing data in a new way. The priority for this new data organization was
to reduce 3.__________ and data maintenance anomalies. The reason it was called a theory
was that these rules for organizing and storing data are based upon the mathematical concepts
of relational set theory, which began to be explored in depth by Georg Cantor in the 1800s.
This means that relational theory isn't just a database design standard, as you can use relational
set theory 4.__________ to document data characteristics without considering any physical
deployment of a database platform. In fact, the mathematical set theory upon which relational
databases are based predates modern electronic data storage. The concepts used by Dr. Edgar
F. Codd to apply relational set theory to data storage provide sound analysis and documentation
goals for describing the details of data element behavior in a business activity.
The relational model provides a data organization technique that allows for consistency
in storing and retrieving data by using recognized mathematical 5.__________. A good point
to remember is that scientific models reflect a sort of theoretical perfection that no one ever
achieves in reality (which results in the need to sometimes break the rules once you know when
it's a good idea to do so).
Relational DBMS Objectives
You saw, from your brief look at nonrelational DBMSs, some of the challenges of both
hierarchical and network database systems in terms of storing large amounts of data. Codd,
while seeking to improve these database models, outlined a series of objectives for a
"relational" DBMS in an invited paper to the 1974 International Federation for Information
Processing (IFIP) Congress.
These are the objectives for which he was striving:
• To provide a high degree of data independence
Information support for control systemsLesson 3 / Student
Page 1/7

• To provide a community view of the data, of Spartan simplicity, so that a wide
variety of users in an enterprise (ranging from the most computer naive to the
most computer sophisticated) can interact with a common model (while not
prohibiting user-specific views for specialized purposes)
• To simplify the potentially formidable job of the DBA
• To introduce a theoretical foundation (albeit modest) to the activity of database
management (a field sadly lacking in solid principles and guidelines)
• To merge the fact or record retrieval and file management fields in preparation
for the addition, at a later time, of inferential services in the commercial world
(in other words, finding a new way of retrieving data other than using the
single-level pointer structures)
• To lift data-based application programming to a new level, namely one in which
sets (and more specifically relations) are treated as operands instead of being
processed element by element
He goes on to list the four main components of his relational model. Codd said the
motive for the establishment of data collections isn't to store knowledge altruistically for its
own sake but to benefit the business by providing business clients with the data they need for
their activities. And since this is true, why not treat the client data needs as the drivers for
database development in the future? So, he therefore suggests that data should be organized to
do the following:
• To simplify, to the greatest practical extent, the types of data structures
employed in the principal schema (or community view)
• To introduce powerful operators to enable both programmers and
nonprogrammers to store and retrieve target data without having to "navigate"
to the target
• To introduce natural language (for example, English) methods with dialog box
support to permit effective interaction by casual (and possibly computer-naive)
users
• To express authorization and integrity constraints separately from the data
structure (because they're liable to change)
Although the first commercial RDBMS was the Multics Relational Data Store (MRDS)
launched in 1976 by Honeywell, it wasn't until the early 1980s that RDBMSs became readily
available and Codd's concepts began to be tested.
What Are Codd's Rules of an RDBMS?
Several years after his IFIP paper, Codd came up with 12 rules, which are still used
today as the measure of the relational nature of databases. It's important to note that many of
the DBMSs we consider to be relational today don't conform to all these rules. Although these
rules act primarily as a measure of the degree to which a database can be described as relational,
it's also possible to use them in highlighting the importance of some of the aspects of Physical
modeling.
Although most database designs (the physicalization of a Logical data model) almost
never follow all these rules religiously, it's good to understand the foundations from which
you're working. These rules provide an indication of what a theoretically perfect relational
database would be like and provide a rationale for organizing data relationally. However, as
you'll see in the course of this book, when physically implementing relational databases, we
often break some of these rules to tune the database's performance.
Rule 0: The Proviso
The proviso to these rules is a Rule 0: any system that claims to be a relational database
management system must be able to manage data entirely through its relational capabilities.
This means that the RDBMS must be self-contained as far as data management is
concerned. It must not require any hardware- or software-specific commands to be able to
access or manage the data. All data management activities must be command oriented and
accessible through the RDBMS's relational commands. In other words, although the RDBMS
software is loaded on a given piece of hardware and is under the control of an operating system,
the RDBMS doesn't directly reference any of the capabilities of the hardware or operating
system for data management. Although a front-end tool such as Enterprise Manager in SQL
Server 2000 may help create database objects, the actual management of these objects happens
within the database itself. This is accomplished using internal catalog tables to locate and
manipulate all the data structures within the database. The actual location of this information
on disk, tape, or in memory isn't relevant.
Rule 1: The Information Rule
Rule 1 states that all information in a relational database is represented explicitly at the
logical level in exactly one way — by values in a table.
This means that data elements (and data values) aren't kept in a code block or a screen
widget. All data elements must be stored and managed in tables. Keeping a restricted value set
in the front end, using things such as LOV functions, or in the back end, using restricted domain
sets such as the 88-code description level of IMS or SQL triggers, violates this rule. Again, all
data values and program constants have to be stored in a table.
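As a rough illustration of the Information Rule, the sketch below keeps a restricted value set in a table instead of in front-end code. It assumes SQLite through Python's standard sqlite3 module; the table order_status and the helper allowed_statuses are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lookup table: the allowed values live in the database.
    CREATE TABLE order_status (
        status_code TEXT NOT NULL PRIMARY KEY,
        description TEXT NOT NULL
    );
    INSERT INTO order_status VALUES ('NEW', 'Received'), ('SHP', 'Shipped');
""")

def allowed_statuses(connection):
    """Return the values a front end would offer, read from the table."""
    rows = connection.execute(
        "SELECT status_code FROM order_status ORDER BY status_code"
    )
    return [code for (code,) in rows]

statuses_before = allowed_statuses(conn)
# Adding a value is a data change, not a code change.
conn.execute("INSERT INTO order_status VALUES ('CAN', 'Cancelled')")
statuses_after = allowed_statuses(conn)
```

A screen widget would call allowed_statuses each time, so the list of values is never duplicated in the application.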
Rule 2: Guaranteed Access Rule
Rule 2 states that each and every datum (atomic value) in a relational database is
guaranteed to be logically accessible through referencing a table name, primary key value, and
column name.
This means that every value in every record in the database can be located by the table
name, column name, and unique identifier (as a key, not as a physical storage locator number)
of the record. It emphasizes the following two points:
• First, the importance of naming in modeling. Every table must have a unique name
(we hope across the enterprise but at least in the database), but meaningful names aren't
required — they're simply helpful to those accessing and maintaining the data. Some RDBMSs
allow duplicate table names as long as the creating owner is different, as in DB2.
• Second, the importance of choosing the data element(s) that will act as each table's
primary key.
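The Guaranteed Access Rule can be sketched as a query that names exactly three things: a table, a column, and a primary key value. The example assumes SQLite through Python's sqlite3 module, and the employee table and its contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER NOT NULL PRIMARY KEY,
        last_name   TEXT NOT NULL,
        office      TEXT
    );
    INSERT INTO employee VALUES (7, 'Ivanova', 'B-204'), (8, 'Petrov', 'A-101');
""")

# Table name = employee, column name = office, primary key value = 7.
# No row number or physical storage address is involved.
office = conn.execute(
    "SELECT office FROM employee WHERE employee_id = ?", (7,)
).fetchone()[0]
```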
Rule 3: Systematic Treatment of NULL Values
Rule 3 states that NULL values (distinct from an empty character string, a string of
blank characters, or a numeric zero value) are supported in the RDBMS as a systematic
representation of missing information, independent of the data type of the column containing
the NULL value.
This means that the database engine has to allow NULL values for any data type, as
distinct and different from zeros, spaces, and N/A. This emphasizes the importance of the
database supporting defined nullability (the ability to not have any value at all) and optionality
(the ability for optional relationships to other data sets).
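A small sketch of how NULL differs from zero and from an empty string, assuming SQLite through Python's sqlite3 module. The reading table is a hypothetical example, and other engines may differ in details (Oracle, for instance, treats an empty string as NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reading (
        reading_id INTEGER NOT NULL PRIMARY KEY,
        value      REAL,   -- nullable numeric column
        note       TEXT    -- nullable text column
    );
    INSERT INTO reading VALUES (1, 0, ''), (2, NULL, NULL), (3, 5.5, 'ok');
""")

def count(connection, condition):
    """Count the rows of reading that satisfy the given condition."""
    sql = "SELECT COUNT(*) FROM reading WHERE " + condition
    return connection.execute(sql).fetchone()[0]

missing_values = count(conn, "value IS NULL")  # only row 2
zero_values = count(conn, "value = 0")         # only row 1
empty_notes = count(conn, "note = ''")         # only row 1
equals_null = count(conn, "value = NULL")      # a comparison with NULL is never true
```

The same IS NULL test works for both the numeric and the text column, which is the "independent of the data type" part of the rule.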
Rule 4: Dynamic Online Catalog Based on the Relational Model
Rule 4 states that the description of the database structures is represented at the logical
level in the same way as ordinary data so that authorized users can apply the same relational
language to database structure interrogation as they apply to regular data.
Also, metadata about the actual data structures themselves should be able to be selected
from system tables, usually called system catalogs. For example, in Oracle these tables make
up the Oracle Data Dictionary. These catalogs or library tables contain the key pieces of data
about the Physical model in data element form. Some even store the definitions of the tables
and columns. This emphasizes that the data model and database structures are available for
public use.
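To make the catalog idea concrete, the sketch below reads table and column names with ordinary SELECT statements. It assumes SQLite, whose catalog is the sqlite_master table and the pragma_table_info table-valued function. Oracle's data dictionary views have different names, and the device and sensor tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE device (
        device_id INTEGER NOT NULL PRIMARY KEY,
        model     TEXT NOT NULL
    );
    CREATE TABLE sensor (
        sensor_id INTEGER NOT NULL PRIMARY KEY,
        device_id INTEGER NOT NULL
    );
""")

# The catalog is queried with the same language as regular data.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
device_columns = [name for (name,) in conn.execute(
    "SELECT name FROM pragma_table_info('device') ORDER BY cid"
)]
```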
Rule 5: Comprehensive Data Sublanguage Rule
Rule 5 states that a relational system may support several languages and various modes
of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one
language whose statements are expressible, by some well-defined syntax, as character strings
and whose ability to support all the following is comprehensive: data definition, view
definition, data manipulation (interactive and by program), integrity constraints, and
transaction boundaries (begin, commit, and rollback).
This means that a relational database must work with one or several programming
languages (SQL, T-SQL, and PL/SQL, for example) that are extensible enough to cover all the
functionality requirements of managing the environment. They must support any number of
changes to be treated by the DBMS as a single unit of work, which must succeed or fail
completely.
For a modeler, this means you need to be aware of the rules of the programming
languages being used in your world before you generate your physical database design. There
will be a list of restricted words you can't use for naming, for example.
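The "single unit of work" idea can be sketched with a transfer that either applies both updates or neither. This assumes SQLite through Python's sqlite3 module, where using the connection as a context manager commits on success and rolls back on an exception. The account table and the transfer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        account_id INTEGER NOT NULL PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO account VALUES (1, 100), (2, 50);
""")

def transfer(connection, source, target, amount):
    """Move amount between two accounts; return True if the work was committed."""
    try:
        with connection:  # commit if the block succeeds, roll back if it raises
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE account_id = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so both updates were undone

def balances(connection):
    """Return {account_id: balance} for all accounts."""
    return dict(connection.execute("SELECT account_id, balance FROM account"))
```

Calling transfer(conn, 1, 2, 500) credits account 2 first, then fails on the debit. The rollback removes the credit as well, so the balances are left unchanged.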
Rule 6: View Updating Rule
Rule 6 states that all views that can theoretically be updated can also be updated by the
system.
Views are temporary sets of data based on the results of a query. Rule 6 proves that
Codd was very forward thinking. This rule means that if a view can be changed by changing
the base values that it displays, then it should also be possible for the data represented to be
manipulated directly, and the changes should ripple through to the base values. It also means
that each view should support the same full range of data manipulation options that's available
for tables.
Up until recently, views were temporary arrays of data, accessible like a table but "read-only"
as the answer to a query. This meant that the data really lived elsewhere, and a view was
simply a report-like display. Updating data through views was impossible. You could update
only the base data to impact the view. Materialized views (available in Oracle 8), indexed views
(available in Microsoft SQL Server 2000), and some other new functionality (such as
INSTEAD OF triggers in SQL Server 2000, which can take control as soon as any data
manipulation commands are executed against the view) changed all that. Given that a view
should know where its data comes from, it can now push an update backward to the origin.
Of course, restrictions still exist. Basically, a view can be updated only if the Data
Manipulation Language (DML) command against the view can be unambiguously decomposed
into corresponding DML commands against rows and columns of the underlying base tables.
At the time of this writing, using a GROUP BY or UNION, and so on, will take away the ability
of your view to be updated, as there's no one-to-one correlation between rows in the view and
in the base table. Inserting rows through views is usually problematic, as there may well be
columns outside the view scope that are NOT NULL (but with no default value defined).
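The INSTEAD OF trigger mechanism mentioned above can be sketched in SQLite, which also supports such triggers on views (the text itself refers to SQL Server 2000, so this is a stand-in, not the same product). The employee table, the employee_directory view, and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER NOT NULL PRIMARY KEY,
        last_name   TEXT NOT NULL,
        salary      INTEGER NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Ivanova', 900), (2, 'Petrov', 1100);

    -- The view exposes only part of the base table.
    CREATE VIEW employee_directory AS
        SELECT employee_id, last_name FROM employee;

    -- The trigger decomposes an UPDATE on the view into an UPDATE on the base table.
    CREATE TRIGGER employee_directory_update
    INSTEAD OF UPDATE ON employee_directory
    BEGIN
        UPDATE employee
        SET last_name = NEW.last_name
        WHERE employee_id = OLD.employee_id;
    END;
""")

# The DML is issued against the view, not the table.
conn.execute(
    "UPDATE employee_directory SET last_name = 'Sidorova' WHERE employee_id = 1"
)
base_name = conn.execute(
    "SELECT last_name FROM employee WHERE employee_id = 1"
).fetchone()[0]
```

The update is unambiguous here because each view row maps to exactly one base row through employee_id.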
Rule 7: High-Level Insert, Update, and Delete
Rule 7 states that the capability of handling a base relation or a derived relation as a
single operand applies not only to the retrieval of data but also to the insertion, update, and
deletion of data.
This rule underlines the mathematics of set theory upon which the relational database
is built. It says that records have to be treated as sets for all functions. First the set of records
(a set of one or more) is identified, and then the set is modified as a group, without having to
step through single row processing. This rule states that data manipulation processes occur
independently of the order of retrieval or storage of records in a table. All records are
manipulated equally.
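A minimal sketch of set-at-a-time modification, assuming SQLite through Python's sqlite3 module and a hypothetical product table. One statement identifies the set of rows and changes them as a group, with no loop over individual records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER NOT NULL PRIMARY KEY,
        category   TEXT NOT NULL,
        price      INTEGER NOT NULL  -- price in cents
    );
    INSERT INTO product VALUES (1, 'cable', 500), (2, 'cable', 700), (3, 'relay', 1200);
""")

# The WHERE clause defines the set; the SET clause is applied to every member.
cursor = conn.execute(
    "UPDATE product SET price = price + 100 WHERE category = 'cable'"
)
rows_changed = cursor.rowcount
prices = [price for (price,) in conn.execute(
    "SELECT price FROM product ORDER BY product_id"
)]
```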
Rule 8: Physical Data Independence
Rule 8 states that application programs and terminal activities remain logically
unimpaired whenever any changes are made in either storage representation or access methods.
This means that the data customer is isolated from the physical method of storing and
retrieving data from the database. They don't need to worry about factors such as the physical
location of data on physical disks or the disk space management for each table. In other words,
the logical manner in which the user accesses the data must be independent from the underlying
architecture (storage, indexing, partitioning, and so on). Such independence ensures that the
data remains accessible to the user no matter what performance tuning of the physical
architecture occurs.
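Physical data independence can be illustrated by changing an access method while the query text stays the same. The sketch assumes SQLite through Python's sqlite3 module; the measurement table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurement (
        measurement_id INTEGER NOT NULL PRIMARY KEY,
        sensor         TEXT NOT NULL,
        value          INTEGER NOT NULL
    );
    INSERT INTO measurement VALUES (1, 'T1', 20), (2, 'T2', 35), (3, 'T1', 22);
""")

# The application's query never mentions storage or indexes.
QUERY = "SELECT value FROM measurement WHERE sensor = 'T1' ORDER BY measurement_id"

result_before = conn.execute(QUERY).fetchall()
# Performance tuning: a new access path is added at the physical level.
conn.execute("CREATE INDEX measurement_sensor_idx ON measurement (sensor)")
result_after = conn.execute(QUERY).fetchall()
```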
Rule 9: Logical Data Independence
Rule 9 states that application programs and terminal activities remain logically
unimpaired when changes of any kind that theoretically permit unimpairment are made to the
base tables.
This rule strongly suggests that the logical understanding of data organization and the
physical design choices of that data are completely independent. You should be able to change
the database-level design of data structures without a front end losing connectivity. This is
sometimes difficult to implement. We often buffer applications from database changes by
restricting access through views only, by setting up synonyms, or by renaming tables if they
need to change drastically, but applications depend on the names of physical structures. The
term unimpairment that Codd uses refers to changes that aren't destructive. For instance,
dropping a column is destructive and likely to cause impairment to the application whereas
changing a name isn't from a logical perspective (although if not buffered by a view, the name
change can cause havoc).
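The view-buffering technique can be sketched as follows: the application reads only through a view, the base table and its columns are renamed, and the view is redefined so the application query still works. This assumes SQLite through Python's sqlite3 module; staff, employee, and staff_v are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (
        staff_id INTEGER NOT NULL PRIMARY KEY,
        surname  TEXT NOT NULL
    );
    INSERT INTO staff VALUES (1, 'Ivanova');
    CREATE VIEW staff_v AS SELECT staff_id, surname FROM staff;
""")

# The only statement the application knows about.
APP_QUERY = "SELECT surname FROM staff_v WHERE staff_id = 1"
name_before = conn.execute(APP_QUERY).fetchone()[0]

# Nondestructive restructuring: new table and column names, same information.
conn.executescript("""
    DROP VIEW staff_v;
    CREATE TABLE employee (
        employee_id INTEGER NOT NULL PRIMARY KEY,
        last_name   TEXT NOT NULL
    );
    INSERT INTO employee SELECT staff_id, surname FROM staff;
    DROP TABLE staff;
    -- The view maps the new names back to the names the application expects.
    CREATE VIEW staff_v AS
        SELECT employee_id AS staff_id, last_name AS surname FROM employee;
""")
name_after = conn.execute(APP_QUERY).fetchone()[0]
```

Had the application queried staff directly, the same change would have broken it, which is the havoc the paragraph above warns about.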
Rule 10: Integrity Independence
Rule 10 states that integrity constraints specific to a particular relational database must
be definable in the relational data sublanguage and storable in the catalog, not in the application
programs.
A minimum of the following two integrity constraints must be supported:
• Data set integrity: No components of the identifying factor of the set are allowed
to have a NULL value (or a value representing a NULL, such as N/A).
• Referential integrity: For each distinct non-NULL foreign key value in a
relational database, a matching primary key value from the same domain must
exist. So, in other words, no parent record can be processed without all the
impacts to the children records being processed at the same time. Orphan
records, those not related to others in the database tables, aren't allowed.
These integrity constraints must be enforceable at the database level, not in the programming. So not
only must they be enforceable by the RDBMS, these constraints must also be enforced by the RDBMS, not by any
application program that uses this database.
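Both constraints can be declared in the data definition language so that the database, not the application, rejects bad data. The sketch assumes SQLite through Python's sqlite3 module, where foreign keys are enforced only after PRAGMA foreign_keys = ON. The department and employee tables and the accepted helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for enforcement
conn.executescript("""
    CREATE TABLE department (
        dept_code TEXT NOT NULL PRIMARY KEY,  -- no NULL in the identifier
        name      TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id INTEGER NOT NULL PRIMARY KEY,
        dept_code   TEXT REFERENCES department (dept_code)  -- optional parent
    );
    INSERT INTO department VALUES ('AUT', 'Automation');
""")

def accepted(connection, sql):
    """Run one statement; return True if the database accepted it."""
    try:
        with connection:
            connection.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

ok_child = accepted(conn, "INSERT INTO employee VALUES (1, 'AUT')")
orphan_child = accepted(conn, "INSERT INTO employee VALUES (2, 'XXX')")
no_parent_yet = accepted(conn, "INSERT INTO employee VALUES (3, NULL)")
null_key = accepted(conn, "INSERT INTO department VALUES (NULL, 'Unknown')")
delete_parent = accepted(conn, "DELETE FROM department WHERE dept_code = 'AUT'")
```

Only the first and third statements succeed. The orphan row, the NULL key, and the deletion of a parent that still has children are all refused by the engine itself.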
Rule 11: Distribution Independence
Rule 11 states that an RDBMS has distribution independence. Distribution
independence implies that users shouldn't have to be aware of whether a database is distributed.
This means that anyone using data should be totally unaware of whether the database
is distributed (in other words, whether parts of the database exist in multiple locations). Even
from the Physical model, it shouldn't make any difference where the DBA chooses to set up
the data storage, but it most certainly doesn't matter to the Logical model. This was very
forward thinking, as relational database vendors are only just now producing features that
support fully distributed databases.
Rule 12: Nonsubversion Rule
Rule 12 states that if an RDBMS has a low-level (single-record-at-a-time) language,
that low-level language can't be used to subvert or bypass the integrity rules or constraints
expressed in the higher-level (multiple-records-at-a-time) relational language.
All this rule is saying is that there should be no way around the integrity rules in the
database. The rules should be so intrinsic that you have no way to violate these rules without
deleting and re-creating the database object.
Advantages of Using the Relational Model



The following list came from Professional Java Data (Apress, 2001). It's a nice
overview of what relational modeling is trying to achieve.
• It describes data independent of the actual physical representation of the data.
• The model of data is simple and easy to understand.
• It provides high-level operations for querying the data.
• The model is conceptually simple, allowing application programmers to be able
to quickly grasp the important concepts they need to get started with their work.
• The model is based on a mathematical structure, which allows many operational
aspects to be proved, and the operations have well-defined properties.
• It's easy to modify and add to relational databases.
• The same database can be represented with less redundancy.
This is a pretty comprehensive list of the advantages of modeling your data relationally
before you commit yourself to a physical design. We have to add a few things here, though.
Codd was trying to say that understanding your data is paramount to designing a data
management solution. You need to divorce your analysis from the restrictions and
characteristics of any DBMS and concentrate on understanding the realities of the data you're
expecting to build and store. You need to document this in a way that you can communicate
your conclusions to all members of the development team no matter how technical they are,
so your analysis needs to be simple and yet comprehensive.
The details of the rules of normalization upon which relational modeling is based form
one part of the Logical modeler's task—understanding the atomic data organization needed to
document all the rules the business takes for granted. The communication and documentation
language of modeling, which is graphic in nature, comes in a variety of syntaxes, one of which
we'll cover in detail in Chapter 4 and use later in the book as we show how to build a variety
of relational models in the tutorial.
Task 2. Give an example of how a rule is implemented in the database
that you described in the previous lesson. Choose the rule by the last
digit of your student record-book number.