Information Engineering
Dr B. Mills

INFORMATION MODELING:
1. ENTITY RELATION DIAGRAMS (ERD)
2. THE OSI LAYERING MODEL
3. NORMALIZING DATA

ERD – Entity Relation Diagramming

In a well-designed relational database, each table represents an entity. In the figure below there are 4 entities (tables): Customers, Line Items, Invoices, Products.

First, we must understand entities.

Entities

A database contains one or more related tables. Each table holds all of the information about an object, person or thing. Some examples of database tables might be:
- a customer table
- an appointments table
- an exam sessions table
- a teachers' names table
- a concert venue table

Tables are entities

Each table is about an object, person, or thing, for example: Customers, Appointments, Books, Students, Products.

Entities have attributes

Entity = Customers. Attributes:
- CustomerID
- FirstName
- LastName
- Date of Birth
- Address

Entity = Products. Attributes:
- ProductID
- ProductName
- Weight
- Manufacturer
- Warehouse

Example entities: Customers, Products, Orders

Entity Relationship Diagrams

The relationships between entities can be shown in the form of a diagram. This diagram is known as an 'entity relationship diagram', E-R diagram or ERD.

As part of your exam, you will have to draw or interpret an E-R diagram. Before you can do this, you need to be able to interpret the relationships between the entities.
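As a rough illustration of "tables are entities, columns are attributes", here is a minimal sketch using Python's built-in sqlite3 module. The table and attribute names follow the slides (Date of Birth is written DateOfBirth because column names are easier to work with without spaces); the column types are assumptions, since the slides list only attribute names.

```python
import sqlite3

# In-memory database for illustration; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Each entity becomes a table and each attribute becomes a column.
# Column types are assumptions; the slides list only attribute names.
conn.executescript("""
CREATE TABLE Customers (
    CustomerID  INTEGER PRIMARY KEY,
    FirstName   TEXT NOT NULL,
    LastName    TEXT NOT NULL,
    DateOfBirth TEXT,          -- stored as an ISO date string, e.g. '1990-05-17'
    Address     TEXT
);
CREATE TABLE Products (
    ProductID    INTEGER PRIMARY KEY,
    ProductName  TEXT NOT NULL,
    Weight       REAL,         -- unit not specified in the slides
    Manufacturer TEXT,
    Warehouse    TEXT
);
""")


def column_names(table: str) -> list[str]:
    """Return a table's column names (PRAGMA table_info puts the name at index 1).
    The table name is interpolated directly, so only pass trusted names."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


print(column_names("Customers"))
print(column_names("Products"))
```

Each printed list is one entity's set of attributes; the relationships between such tables are covered next.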
These relationships take the form of:
- one-to-one
- one-to-many
- many-to-many

One-to-One

A husband can only have one wife, and a wife can only have one husband. This would be known as a 'one-to-one relationship'. In a diagram, this relationship looks like this:

[Diagram: one-to-one relationship]

One-to-Many

A mother can have many children, but a child can have only one mother. This would be known as a 'one-to-many relationship'. The diagram looks like this:

[Diagram: one-to-many relationship]

Many-to-Many

Think about a library: a book can be read by many people, and people can read many books. This would be known as a 'many-to-many relationship'. This relationship looks like this:

[Diagram: many-to-many relationship]

Modeling Your Data

When designing a data model, keep in mind that:
- The 'many' side usually contains the foreign key.
- The 'one' side usually contains the primary key that the foreign key points to.

Before you design or set up a database, you should work out:
- the entities
- the attributes
- the entity relationships

This process is called 'data modelling'.

What the 7-Layer OSI Model Is

- Defines the necessary elements for data communication between devices.
- Defines a communication architecture for digital communication systems.
- Visually and conceptually separates communication, network, and software functions.

OSI Model Definition – 7 Layers

- Layer 1 – Physical
- Layer 2 – Data Link
- Layer 3 – Network
- Layer 4 – Transport
- Layer 5 – Session
- Layer 6 – Presentation
- Layer 7 – Application

Mnemonic (Layer 1 upward): Please Do Not Throw Sausage Pizza Away

[Figure: How data moves through the layers]

Layer 7 – Application

The Application layer provides services to the software through which the user requests network services.
Examples: Internet Explorer, Safari and other browsers; FTP; mail

Many applications that run on your computer are NOT part of the Application layer. The following, for example, are not part of Layer 7 because they do not request network services:
- Microsoft Word or Excel
- Adobe Photoshop

Layer 6 – Presentation

Manages data-format information for networked communications (the network's translator).
- For outgoing messages, it converts data into a generic format for network transmission; for incoming messages, it converts data from the generic network format into a format that the receiving application can understand.
- This layer is also responsible for certain protocol conversions, data encryption/decryption, and data compression/decompression.

Examples: MIDI; JPG, GIF, TIF; MPEG

Layer 5 – Session

The Session layer establishes, maintains, and manages the communication session between computers.
- Responsible for initiating, maintaining and terminating sessions
- Responsible for security and access control to session information (via session participant identification)
- Responsible for synchronization services and checkpoint services

Examples: NFS, SQL, RPC

Layer 4 – Transport

The functions defined in this layer provide for the reliable transmission of data segments, as well as the disassembly and assembly of the data before and after transmission.
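The disassembly and reassembly described above can be sketched in a few lines of Python. This is a toy model, not TCP: the 8-byte segment size and the function names segment and reassemble are hypothetical, and real transport protocols also handle acknowledgements, retransmission and flow control.

```python
import random

SEGMENT_SIZE = 8  # hypothetical maximum payload per segment, in bytes


def segment(data: bytes, size: int = SEGMENT_SIZE) -> list[tuple[int, bytes]]:
    """Disassemble a byte stream into numbered (sequence, chunk) segments."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Reassemble segments in sequence order, even if they arrived shuffled."""
    return b"".join(chunk for _, chunk in sorted(segments))


message = b"Please Do Not Throw Sausage Pizza Away"
segments = segment(message)
print(len(segments))          # 5 (38 bytes split into 8-byte chunks)

random.shuffle(segments)      # simulate segments arriving out of order
print(reassemble(segments) == message)  # True
```

The sequence number carried with each chunk is what lets the receiver restore the original order (packet sequencing).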
- Manages the transmission of data across a network.
- Manages the flow of data between parties (flow control).
- Segments long data streams into smaller chunks, based on the allowed "packet" size for a given transmission medium, and numbers them so they can be put back in order (packet sequencing).
- Provides acknowledgements of successful transmissions and requests retransmission of packets that arrive with errors (error detection and recovery).

Examples: TCP, UDP (TCP provides acknowledgements and retransmission; UDP does not)

Layer 3 – Network

The Network layer defines the processes used to route data across the network and the structure and use of logical addressing.
- Handles addressing messages for delivery, as well as translating logical network addresses and names into their physical counterparts. (Logical addresses are managed by local network admins.)
- Responsible for deciding how to route transmissions between computers.
- Handles the decisions needed to get data from one point to the next along a network path.
- Handles packet switching and network congestion control.

Examples: IP, network routers

Layer 2 – Data Link

Concerned with the linkages and mechanisms used to move data about the network, and with the ways in which data is reliably transmitted.
- Handles data frames passing between the Network layer and the Physical layer.
- At the sending end, this layer converts data into raw formats that can be handled by the Physical layer. At the receiving end, it packages raw data from the Physical layer into data frames for delivery to the Network layer.
- The Data Link layer is often conceptually divided into two sub-layers: logical link control (LLC) and media access control (MAC).

Examples: network bridges, Ethernet, Wi-Fi

Layer 1 – Physical

This layer defines the electrical and physical specifications for the networking media that carry the data bits across a network.
- Converts bits into electronic signals for outgoing messages, and electronic signals into bits for incoming messages.
- Manages the interface between the computer and the network medium (coax, twisted pair, etc.).
- Tells the driver software for the MAU (media attachment unit), e.g. network interface cards (NICs) and modems, what needs to be sent across the medium.

Examples: network hubs and repeaters; LAN and WAN topology

Advanced Topic: NORMALIZATION

Normalization

In the field of relational database design, normalization is a way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics that could lead to a loss of data integrity.

Data Integrity

Data integrity refers to the validity of data: the assurance that data is accurate, correct and valid.

What is Normalization?

Database normalization is the practice of optimizing table structures. Optimization is done through a thorough investigation of the various pieces of data that will be stored within the database.

An Introduction to Database Normalization – Preliminary Definitions

Terminology in Normalization:

Entity: The word 'entity', as it relates to databases, can simply be defined as the general name for the information that is to be stored within a single table. For example, if we are storing information about the school's students, then 'student' would be the entity. The student entity would likely be composed of several pieces of information, for example: student identification number, name, and email address. These pieces of information are better known as attributes.

Relationship: Understanding the relationships between the data items forming the various entities, and between the entities themselves, forms the foundation of database normalization.
Remember, there are three types of data relationships that you should be aware of:
- One-to-One
- One-to-Many
- Many-to-Many

Foreign Key and ERD

Foreign key: A foreign key forms the basis of a one-to-many relationship between two tables. The foreign key is found in the 'many' table and points to the primary key in the 'one' table.

Entity-relationship diagram (ERD): An ERD is a graphical representation of the database structure. An ERD can be created using sophisticated software or drawn on a piece of paper from your pocket.

We want to eliminate data redundancy

Redundancy happens when the same data values are stored more than once in a table, or when the same values are stored in more than one table. Normalization is done to prevent redundancy, making CRUD (create, read, update, delete) operations, including searching for information, more efficient and reliable.

One of the biggest disadvantages of data redundancy is that it increases the size of the database unnecessarily. Data redundancy can also cause the same result to be returned several times when searching the database, causing confusion and clutter in the results.

Avoiding Redundancy

Analysis: This table maps (points to) various students to the classes found within their schedule.

[Table: student-to-class mappings, including class name, class time and professor ID]

Issues: Assuming that the only intention of this table is to create student-class mappings, there is really no need to repeatedly store the class time and professor ID. If there are 30 students in a class, the class information would be repeated 30 times over.

Why avoid Redundancy?

Redundancy introduces the possibility of error. Consider the name of the class in the final row of the table (Matj 148). Given the name of the class in the first row, chances are that Matj 148 should actually be Math 148!
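A minimal sqlite3 sketch of the points above: a redundant table lets one class ID carry two spellings, while a foreign key in the 'many' table (Enrollment) pointing at the primary key of the 'one' table (Class) stores the name once and rejects rows that point at a class that does not exist. The table names StudentClass and Enrollment, and all IDs, are hypothetical; only the two class names come from the slides. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# 1) Redundant design: the class name is repeated on every student-class row.
#    IDs are hypothetical; the two spellings come from the slides.
conn.execute("CREATE TABLE StudentClass (studentID INTEGER, classID INTEGER, className TEXT)")
conn.executemany(
    "INSERT INTO StudentClass VALUES (?, ?, ?)",
    [(1, 148, "Math 148"), (2, 148, "Math 148"), (3, 148, "Math 148"), (4, 148, "Matj 148")],
)
names = [row[0] for row in conn.execute(
    "SELECT DISTINCT className FROM StudentClass WHERE classID = 148 ORDER BY className")]
print(names)  # ['Math 148', 'Matj 148']: one class ID, two spellings

# 2) Foreign-key design: the name lives once in the "one" table (Class);
#    the "many" table (Enrollment) holds a foreign key pointing at Class's primary key.
conn.executescript("""
CREATE TABLE Class (classID INTEGER PRIMARY KEY, className TEXT NOT NULL);
CREATE TABLE Enrollment (
    studentID INTEGER NOT NULL,
    classID   INTEGER NOT NULL REFERENCES Class(classID),
    PRIMARY KEY (studentID, classID)
);
INSERT INTO Class VALUES (148, 'Math 148');
INSERT INTO Enrollment VALUES (1, 148), (2, 148), (3, 148), (4, 148);
""")

try:
    conn.execute("INSERT INTO Enrollment VALUES (5, 999)")  # there is no class 999
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True

# A join recovers the class name for every enrollment from the single stored copy.
enrolled = conn.execute(
    "SELECT COUNT(*) FROM Enrollment JOIN Class USING (classID) WHERE className = 'Math 148'"
).fetchone()[0]
print(enrolled)  # 4
```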
While this error is easily identifiable when just four rows are present in the table, imagine finding it within the rows representing the 60,000 enrolled students.

Database Normalization – The Three Normal Forms

The process of database normalization progresses through a series of steps, typically known as normal forms.

First Normal Form (1NF)

Converting a database to first normal form is rather simple. The first rule calls for the elimination of repeating groups of data through the creation of separate tables of related data, breaking bigger tables down into several smaller tables.
- The first table contains solely student information (Student).
- The second table contains solely class information (Class).
- The third table contains solely professor information (Professor).

Second Normal Form (2NF)

Once you have separated the data into their respective tables, you can begin concentrating on the rule of Second Normal Form: the elimination of redundant data. Referring back to the Class table, typical data stored within might look like:

[Table: Class]

While this table structure is certainly improved over the original, notice that there is still room for improvement. In this case, the className attribute is being repeated. With 60,000 students stored in this table, performing an update to reflect a recent change in a course name could be a problem. Therefore, create a separate table that contains classID-to-className mappings (ClassIdentity):

[Table: ClassIdentity]

The updated Class table would then be simply:

[Table: updated Class]

Third Normal Form (3NF)

For complete normalization of the school system database, the next step in the process is to satisfy the rule of Third Normal Form. This rule seeks to eliminate all attributes from a table that are not directly dependent upon the primary key. In the case of the Student table, the college and college location attributes are less dependent upon the student ID than they are on the major attribute.
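One way to see the dependency described above is to check it against sample rows: a functional dependency X → Y holds when every value of X appears with exactly one value of Y. The sketch below uses invented rows and a hypothetical helper named holds. Sample data can only reveal that a dependency is violated; it cannot prove that one holds, so in practice such rules come from the meaning of the data.

```python
from collections import defaultdict


def holds(rows: list[dict], determinant: str, dependent: str) -> bool:
    """Check the functional dependency determinant -> dependent on sample rows:
    every determinant value must appear with exactly one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return all(len(values) == 1 for values in seen.values())


# Hypothetical unnormalized Student rows; all values are invented for illustration.
students = [
    {"studentID": 1, "major": "Math",    "college": "Science", "collegeLocation": "Building A"},
    {"studentID": 2, "major": "Math",    "college": "Science", "collegeLocation": "Building A"},
    {"studentID": 3, "major": "History", "college": "Arts",    "collegeLocation": "Building B"},
]

print(holds(students, "studentID", "major"))        # True: the key determines the major
print(holds(students, "major", "college"))          # True: the major determines the college
print(holds(students, "major", "collegeLocation"))  # True
print(holds(students, "college", "studentID"))      # False: Science has two students
```

Because studentID determines major and major determines college, college depends on the key only through major (a transitive dependency), which is the kind of dependency Third Normal Form removes.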
Therefore, we'll create a new table that relates the major, college and college location information:

[Table: major, college and college location]

The revised Student table would then look like:

[Table: revised Student]

Some other Database Terms…
- Data Mining
- Data Matching
- Distributed Databases
- Boolean Operators
- SQL Servers

Summary

Normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics (insertion, update, and deletion anomalies) that could lead to a loss of data integrity.
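Pulling the steps together, a possible normalized version of the school schema might look like the sqlite3 sketch below. Only the names mentioned in the text (Student, Class, ClassIdentity, studentID, classID, className, major, college, college location) come from the slides; the Major table name, the name and email columns, and all rows are assumptions. Renaming a course now changes a single ClassIdentity row, no matter how many students are enrolled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical 3NF schema. Names beyond those in the slides
# (the Major table, name, email, collegeLocation) are assumptions.
conn.executescript("""
CREATE TABLE Major (                 -- 3NF: college data depends on the major
    major           TEXT PRIMARY KEY,
    college         TEXT NOT NULL,
    collegeLocation TEXT NOT NULL
);
CREATE TABLE Student (
    studentID INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    email     TEXT,
    major     TEXT REFERENCES Major(major)
);
CREATE TABLE ClassIdentity (         -- 2NF: each class name stored exactly once
    classID   INTEGER PRIMARY KEY,
    className TEXT NOT NULL
);
CREATE TABLE Class (                 -- student-to-class mappings only
    studentID INTEGER REFERENCES Student(studentID),
    classID   INTEGER REFERENCES ClassIdentity(classID),
    PRIMARY KEY (studentID, classID)
);
-- Invented sample rows for illustration.
INSERT INTO Major VALUES ('Math', 'College of Science', 'Building A'),
                         ('History', 'College of Arts', 'Building B');
INSERT INTO Student VALUES (1, 'Student A', NULL, 'Math'),
                           (2, 'Student B', NULL, 'Math'),
                           (3, 'Student C', NULL, 'History');
INSERT INTO ClassIdentity VALUES (148, 'Math 148');
INSERT INTO Class VALUES (1, 148), (2, 148), (3, 148);
""")

# Renaming a course touches one row, however many students take it.
updated = conn.execute(
    "UPDATE ClassIdentity SET className = 'Mathematics 148' WHERE classID = 148"
).rowcount
print(updated)  # 1

# Joins rebuild the combined view from the normalized tables.
rows = conn.execute("""
    SELECT s.name, m.college, c.className
    FROM Student AS s
    JOIN Major AS m USING (major)
    JOIN Class AS e USING (studentID)
    JOIN ClassIdentity AS c USING (classID)
    ORDER BY s.studentID
""").fetchall()
for row in rows:
    print(row)
```

The trade-off is visible in the final query: the normalized design stores each fact once, but reading the combined view requires joins.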