Task 1. Fill the gaps in the text using the words from the box.

a. Techniques
b. Columns
c. Operations
d. Spreadsheet
e. Redundancy

Taking a Relational Approach to Data Modeling

Relational theory, which is used in relational database design and documented by a relational model, centers on the mathematical term relation. A relation, in this context, refers to two-dimensional tables that have single-valued entries, such as a 1.__________. All data entries exist on a row in one or more 2.__________. Data existing in a column is the same data element, which can be defined by a unique name, and has no positional or ordinal significance whatsoever. You may think the term relational is in reference to the use of relationships to link data sets (entities) and tables, but this isn't the case.

In spite of being referred to as a relational theory, it isn't a theory in the normal sense. It doesn't have a succinct, well-defined list of premise-conclusion statements. It wasn't originally written to explain data behavior in the context of data storage structures in place at the time (prior to relational theory, no RDBMSs existed). Relational theory was developed to propose rules for organizing data in a new way. The priority for this new data organization was to reduce 3.__________ and data maintenance anomalies. The reason it was called a theory was that these rules for organizing and storing data are based upon the mathematical concepts of relational set theory, which began to be explored in depth by Georg Cantor in the 1800s. This means that relational theory isn't just a database design standard, as you can use relational set theory 4.__________ to document data characteristics without considering any physical deployment of a database platform. In fact, the mathematical set theory upon which relational databases are based predates modern electronic data storage. The concepts used by Dr.
Edgar F. Codd to apply relational set theory to data storage provide sound analysis and documentation goals for describing the details of data element behavior in a business activity. The relational model provides a data organization technique that allows for consistency in storing and retrieving data by using recognized mathematical 5.__________. A good point to remember is that scientific models reflect a sort of theoretical perfection that no one ever achieves in reality (which results in the need to sometimes break the rules once you know when it's a good idea to do so).

Information support for control systems Lesson 3 / Student Page 1/7

Relational DBMS Objectives

You saw, from your brief look at nonrelational DBMSs, some of the challenges of both hierarchical and network database systems in terms of storing large amounts of data. Codd, while seeking to improve these database models, outlined a series of objectives for a "relational" DBMS in an invited paper to the 1974 International Federation for Information Processing (IFIP) Congress. These are the objectives for which he was striving:

• To provide a high degree of data independence
• To provide a community view of the data, of Spartan simplicity, so that a wide variety of users in an enterprise (ranging from the most computer naive to the most computer sophisticated) can interact with a common model (while not prohibiting user-specific views for specialized purposes)
• To simplify the potentially formidable job of the DBA
• To introduce a theoretical foundation (albeit modest) to the activity of database management (a field sadly lacking in solid principles and guidelines)
• To merge the fact or record retrieval and file management fields in preparation for the addition, at a later time, of inferential services in the commercial world (in other words, finding a new way of retrieving data other than using the single-level pointer structures)
• To lift data-based application programming to a new level, namely one in which
sets (and more specifically relations) are treated as operands instead of being processed element by element

He goes on to list the four main components of his relational model. Codd said the motive for the establishment of data collections isn't to store knowledge altruistically for its own sake but to benefit the business by providing business clients with the data they need for their activities. And since this is true, why not treat the client data needs as the drivers for database development in the future? He therefore suggests that data should be organized to do the following:

• To simplify, to the greatest practical extent, the types of data structures employed in the principal schema (or community view)
• To introduce powerful operators to enable both programmers and nonprogrammers to store and retrieve target data without having to "navigate" to the target
• To introduce natural language (for example, English) methods with dialog box support to permit effective interaction by casual (and possibly computer-naive) users
• To express authorization and integrity constraints separately from the data structure (because they're liable to change)

Although the first commercial RDBMS was the Multics Relational Data Store (MRDS) launched in 1978 by Honeywell, it wasn't until the early 1980s that RDBMSs became readily available and Codd's concepts began to be tested.

What Are Codd's Rules of an RDBMS?

Several years after his IFIP paper, Codd came up with 12 rules, which are still used today as the measure of the relational nature of databases. It's important to note that many of the DBMSs we consider to be relational today don't conform to all these rules. Although these rules act primarily as a measure of the degree to which a database can be described as relational, it's also possible to use them in highlighting the importance of some of the aspects of Physical modeling.
Although database designs (the physicalization of a Logical data model) almost never follow all these rules religiously, it's good to understand the foundations from which you're working. These rules provide an indication of what a theoretically perfect relational database would be like and provide a rationale for organizing data relationally. However, as you'll see in the course of this book, when physically implementing relational databases, we often break some of these rules to tune the database's performance.

Rule 0: The Proviso

The proviso to these rules is a Rule 0: any system that claims to be a relational database management system must be able to manage data entirely through its relational capabilities.

This means that the RDBMS must be self-contained as far as data management is concerned. It must not require any hardware- or software-specific commands to be able to access or manage the data. All data management activities must be command oriented and accessible through the RDBMS's relational commands. In other words, although the RDBMS software is loaded on a given piece of hardware and is under the control of an operating system, the RDBMS doesn't directly reference any of the capabilities of the hardware or operating system for data management. Although a front-end tool such as Enterprise Manager in SQL Server 2000 may help create database objects, the actual management of these objects happens within the database itself. This is accomplished using internal catalog tables to locate and manipulate all the data structures within the database. The actual location of this information on disk, tape, or in memory isn't relevant.

Rule 1: The Information Rule

Rule 1 states that all information in a relational database is represented explicitly at the logical level in exactly one way: by values in a table.
This means that data elements (and data values) aren't kept in a code block or a screen widget. All data elements must be stored and managed in tables. Keeping a restricted value set in the front end, using things such as LOV functions, or in the back end, using restricted domain sets such as the 88-code description level of IMS or SQL triggers, violates this rule. Again, all data values and program constants have to be stored in a table.

Rule 2: Guaranteed Access Rule

Rule 2 states that each and every datum (atomic value) in a relational database is guaranteed to be logically accessible through referencing a table name, primary key value, and column name.

This means that every value in every record in the database can be located by the table name, column name, and unique identifier (as a key, not as a physical storage locator number) of the record. It emphasizes the following two points:

• First, the importance of naming in modeling. Every table must have a unique name (we hope across the enterprise but at least in the database), but meaningful names aren't required; they're simply helpful to those accessing and maintaining the data. Some RDBMSs allow duplicate table names as long as the creating owner is different, as in DB2.
• Second, the importance of choosing the data element(s) that will act as each table's primary key.

Rule 3: Systematic Treatment of NULL Values

Rule 3 states that NULL values (distinct from an empty character string, a string of blank characters, or a numeric zero value) are supported in the RDBMS as a systematic representation of missing information, independent of the data type of the column containing the NULL value.

This means that the database engine has to allow NULL values for any data type, as distinct and different from zeros, spaces, and N/A.
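
To make the distinction drawn by Rule 3 concrete, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for whatever RDBMS you use. The table name contact, its columns, and the ids helper are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: both columns are nullable, whatever their data type.
conn.execute(
    "CREATE TABLE contact (id INTEGER PRIMARY KEY, nickname TEXT, visits INTEGER)"
)
conn.executemany(
    "INSERT INTO contact (id, nickname, visits) VALUES (?, ?, ?)",
    [
        (1, "", 0),       # known values: an empty string and a zero
        (2, None, None),  # missing information: NULL in both columns
    ],
)


def ids(where):
    """Return the ids of the rows that satisfy the given WHERE clause."""
    query = f"SELECT id FROM contact WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(query)]


null_nick = ids("nickname IS NULL")   # only the row with missing data
empty_nick = ids("nickname = ''")     # only the row with an empty string
null_visits = ids("visits IS NULL")   # only the row with missing data
zero_visits = ids("visits = 0")       # only the row with a real zero
print(null_nick, empty_nick, null_visits, zero_visits)
```

In this sketch the NULL row is never returned by the equality tests, and the empty-string and zero row is never returned by the IS NULL tests, for both the text and the integer column.
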
This emphasizes the importance of the database supporting defined nullability (the ability to not have any value at all) and optionality (the ability for optional relationships to other data sets).

Rule 4: Dynamic Online Catalog Based on the Relational Model

Rule 4 states that the description of the database structures is represented at the logical level in the same way as ordinary data so that authorized users can apply the same relational language to database structure interrogation as they apply to regular data.

Also, metadata about the actual data structures themselves should be able to be selected from system tables, usually called system catalogs. For example, in Oracle these tables make up the Oracle Data Dictionary. These catalogs or library tables contain the key pieces of data about the Physical model in data element form. Some even store the definitions of the tables and columns. This emphasizes that the data model and database structures are available for public use.

Rule 5: Comprehensive Data Sublanguage Rule

Rule 5 states that a relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, by some well-defined syntax, as character strings and whose ability to support all the following is comprehensive: data definition, view definition, data manipulation (interactive and by program), integrity constraints, and transaction boundaries (begin, commit, and rollback).

This means that a relational database must work with one or several programming languages (SQL, T-SQL, and PL/SQL, for example) that are extensible enough to cover all the functionality requirements of managing the environment. They must support any number of changes to be treated by the DBMS as a single unit of work, which must succeed or fail completely.
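
The single unit of work described for Rule 5 can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the account table and the transfer helper are hypothetical, and the sketch relies on the sqlite3 behavior that a connection used as a context manager commits when the block succeeds and rolls back when an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one unit of work; return True on success."""
    try:
        with conn:  # commit if both updates succeed, roll back otherwise
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # both updates are applied
failed = transfer(conn, 1, 2, 500)  # the debit violates the CHECK constraint
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to account 2 is executed first and succeeds, yet it is undone when the debit fails, so the balances remain as they were after the first transfer.
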
For a modeler, this means you need to be aware of the rules of the programming languages being used in your world before you generate your physical database design. There will be a list of restricted words you can't use for naming, for example.

Rule 6: View Updating Rule

Rule 6 states that all views that can theoretically be updated can also be updated by the system.

Views are temporary sets of data based on the results of a query. Rule 6 proves that Codd was very forward thinking. This rule means that if a view can be changed by changing the base values that it displays, then it should also be possible for the data represented to be manipulated directly, and the changes should ripple through to the base values. It also means that each view should support the same full range of data manipulation options that's available for tables.

Up until recently, views were temporary arrays of data, accessible like a table but "read-only" as the answer to a query. This meant that the data really lived elsewhere, and a view was simply a report-like display. Updating data through views was impossible. You could update only the base data to impact the view. Materialized views (available in Oracle 8), indexed views (available in Microsoft SQL Server 2000), and some other new functionality (such as INSTEAD OF triggers in SQL Server 2000, which can take control as soon as any data manipulation commands are executed against the view) changed all that. Given that a view should know where its data comes from, it can now push an update backward to the origin. Of course, restrictions still exist. Basically, a view can be updated only if the Data Manipulation Language (DML) command against the view can be unambiguously decomposed into corresponding DML commands against rows and columns of the underlying base tables.
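
As a hedged illustration of that decomposition, the sketch below uses Python's built-in sqlite3 module. SQLite views are read-only on their own, so the sketch relies on an INSTEAD OF trigger (the same kind of mechanism mentioned above for SQL Server 2000) to translate an update on the view into an update on the base table. The employee table, the two views, and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    salary INTEGER NOT NULL
);
INSERT INTO employee VALUES (1, 'Ann', 1000), (2, 'Bob', 1200);

-- Each row of this view maps to exactly one row of the base table.
CREATE VIEW employee_names AS SELECT id, name FROM employee;

-- Decompose an UPDATE on the view into an UPDATE on the base table.
CREATE TRIGGER employee_names_update INSTEAD OF UPDATE ON employee_names
BEGIN
    UPDATE employee SET name = NEW.name WHERE id = OLD.id;
END;

-- An aggregate view: its single row has no one-to-one base row.
CREATE VIEW salary_total AS SELECT SUM(salary) AS total FROM employee;
""")

# The update on the view ripples through to the base table.
conn.execute("UPDATE employee_names SET name = 'Robert' WHERE id = 2")
base_rows = conn.execute(
    "SELECT id, name, salary FROM employee ORDER BY id"
).fetchall()

# No trigger is defined here, so SQLite refuses to modify the aggregate view.
try:
    conn.execute("UPDATE salary_total SET total = 0")
    aggregate_updated = True
except sqlite3.OperationalError:
    aggregate_updated = False

print(base_rows, aggregate_updated)
```

Other engines decide updatability differently, so treat this only as one way the rule can be approximated in practice.
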
At the time of this writing, using a GROUP BY or UNION, and so on, will take away the ability of your view to be updated, as there's no one-to-one correlation between rows in the view and in the base table. Inserting rows through views is usually problematic, as there may well be columns outside the view scope that are NOT NULL (but with no default value defined).

Rule 7: High-Level Insert, Update, and Delete

Rule 7 states that the capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update, and deletion of data.

This rule underlines the mathematics of set theory upon which the relational database is built. It says that records have to be treated as sets for all functions. First the set of records (a set of one or more) is identified, and then the set is modified as a group, without having to step through single row processing. This rule states that data manipulation processes occur independently of the order of retrieval or storage of records in a table. All records are manipulated equally.

Rule 8: Physical Data Independence

Rule 8 states that application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods.

This means that the data customer is isolated from the physical method of storing and retrieving data from the database. They don't need to worry about factors such as the physical location of data on physical disks or the disk space management for each table. In other words, the logical manner in which the user accesses the data must be independent from the underlying architecture (storage, indexing, partitioning, and so on). Such independence ensures that the data remains accessible to the user no matter what performance tuning of the physical architecture occurs.
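
A small sketch of Rule 8, using Python's built-in sqlite3 module as a stand-in RDBMS: the same query text returns the same rows before and after an index is added, even though the access path available to the engine has changed. The orders table and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("acme", 10), ("globex", 25), ("acme", 40)],
)

# The application only ever knows this logical request.
QUERY = "SELECT id, total FROM orders WHERE customer = ? ORDER BY id"

before = conn.execute(QUERY, ("acme",)).fetchall()

# A purely physical change: add an index, that is, a new access method.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

after = conn.execute(QUERY, ("acme",)).fetchall()
print(before == after)
```

The query string is never edited, which is the point: tuning happens below the level the application can see.
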

Rule 9: Logical Data Independence

Rule 9 states that application programs and terminal activities remain logically unimpaired when changes of any kind that theoretically permit unimpairment are made to the base tables.

This rule strongly suggests that the logical understanding of data organization and the physical design choices of that data are completely independent. You should be able to change the database-level design of data structures without a front end losing connectivity. This is sometimes difficult to implement. We often buffer applications from database changes by restricting access through views only, by setting up synonyms, or by renaming tables if they need to change drastically, but applications depend on the names of physical structures. The term unimpairment that Codd uses refers to changes that aren't destructive. For instance, dropping a column is destructive and likely to cause impairment to the application, whereas changing a name isn't from a logical perspective (although if not buffered by a view, the name change can cause havoc).

Rule 10: Integrity Independence

Rule 10 states that integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.

A minimum of the following two integrity constraints must be supported:

• Data set integrity: No components of the identifying factor of the set are allowed to have a NULL value (or a value representing a NULL, such as N/A).
• Referential integrity: For each distinct non-NULL foreign key value in a relational database, a matching primary key value from the same domain must exist. In other words, no parent record can be processed without all the impacts to the children records being processed at the same time. Orphan records, those not related to others in the database tables, aren't allowed.
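
A minimal sketch of both constraints, declared in the database rather than in application code, using Python's built-in sqlite3 module. The department and employee tables and the rejected helper are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection, and that other engines have their own defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE department (
    dept_code TEXT NOT NULL PRIMARY KEY,  -- data set integrity: no NULL key
    name TEXT NOT NULL
);
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    dept_code TEXT REFERENCES department (dept_code)  -- referential integrity
);
INSERT INTO department VALUES ('ENG', 'Engineering');
""")


def rejected(sql):
    """Return True if the database itself refuses the statement."""
    try:
        with conn:  # commit on success, roll back on failure
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


null_key_rejected = rejected("INSERT INTO department VALUES (NULL, 'Nameless')")
orphan_rejected = rejected("INSERT INTO employee (id, dept_code) VALUES (1, 'OPS')")
valid_rejected = rejected("INSERT INTO employee (id, dept_code) VALUES (2, 'ENG')")
parent_delete_rejected = rejected("DELETE FROM department WHERE dept_code = 'ENG'")
print(null_key_rejected, orphan_rejected, valid_rejected, parent_delete_rejected)
```

The application code contains no validation logic at all; every refusal comes from the constraints stored with the table definitions.
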
These integrity constraints must be enforceable at the database level, not in the programming. So not only must they be enforceable by the RDBMS, these constraints must also be enforced by the RDBMS, not by any application program that uses this database.

Rule 11: Distribution Independence

Rule 11 states that an RDBMS has distribution independence. Distribution independence implies that users shouldn't have to be aware of whether a database is distributed.

This means that anyone using data should be totally unaware of whether the database is distributed (in other words, whether parts of the database exist in multiple locations). Even from the Physical model, it shouldn't make any difference where the DBA chooses to set up the data storage, but it most certainly doesn't matter to the Logical model. This was very forward thinking, as relational database vendors are only just now producing features that support fully distributed databases.

Rule 12: Nonsubversion Rule

Rule 12 states that if an RDBMS has a low-level (single-record-at-a-time) language, that low-level language can't be used to subvert or bypass the integrity rules or constraints expressed in the higher-level (multiple-records-at-a-time) relational language.

All this rule is saying is that there should be no way around the integrity rules in the database. The rules should be so intrinsic that you have no way to violate these rules without deleting and re-creating the database object.

Advantages of Using the Relational Model

The following list came from Professional Java Data (Apress, 2001). It's a nice overview of what relational modeling is trying to achieve.

• It describes data independent of the actual physical representation of the data.
• The model of data is simple and easy to understand.
• It provides high-level operations for querying the data.
• The model is conceptually simple, allowing application programmers to be able to quickly grasp the important concepts they need to get started with their work.
• The model is based on a mathematical structure, which allows many operational aspects to be proved, and the operations have well-defined properties.
• It's easy to modify and add to relational databases.
• The same database can be represented with less redundancy.

This is a pretty comprehensive list of the advantages of modeling your data relationally before you commit yourself to a physical design. We have to add a few things here, though.

Codd was trying to say that understanding your data is paramount to designing a data management solution. You need to divorce your analysis from the restrictions and characteristics of any DBMS and concentrate on understanding the realities of the data you're expecting to build and store. You need to document this in a way that you can communicate your conclusions to all members of the development team no matter how technical they are, so your analysis needs to be simple and yet comprehensive.

The details of the rules of normalization upon which relational modeling is based form one part of the Logical modeler's task: understanding the atomic data organization needed to document all the rules the business takes for granted. The communication and documentation language of modeling, which is graphic in nature, comes in a variety of syntaxes, one of which we'll cover in detail in Chapter 4 and use later in the book as we show how to build a variety of relational models in the tutorial.

Task 2. Give an example of how one of Codd's rules is implemented in the database that you described in the previous lesson. Choose the rule by the last digit of your student record-book number.