School of Computer Sciences IS 342 Western Illinois University AMARAVADI FINAL REVIEW – OUTLINE OF TOPICS Database Preliminaries (Database introduction) - FYI Data vs Information: concepts of entities, attributes, primary key, files, file processing problems, usage of data/information. - Definition of DBMS vs database; file/table, records, fields, schema, primary and secondary keys. - Operational and advanced uses of databases. - Organizational importance of databases (engine for all IS applications, starting point in the transformation of data to information, decision support, strategic resource). - Database objectives: to make data available, to ensure data quality and integrity, centralized control of data, data independence. - DBMS activities: define schema/structure (data definition), enter data, modify data, query data (SQL), get reports – query-by example (QBE). - The database development cycle (planning, requirements definition, logical and physical design, implementation, maintenance). - The database concept/approach. - Entities, entity classes and attributes. - Ignore the student administration application. Database Fundamentals (Database evolution and environment) - Evolution of DBMS: intro of business computers, use in TP, problems, academic formulations, standardization of DBMS concepts. - The file processing era and problems caused: uncontrolled redundancy, program-data dependence, program maintenance, poor data quality, inability to get reports, application backlog. - The DBMS approach : conceptualization of data, organized design, centralized management, system controlled access, checks on data quality, query and reporting. - Rules for getting information from multiple tables - Database terminology: schema/structure/definition, file/table/relation, record, attribute/field. - Database architecture: the three-schema architecture (external, conceptual and internal) - Different classes of users in a DBMS (end users, administrators and developers). - DBMS components: Data definition, SQL, Programming language interface, Data dictionary, Screen/Report generation, Application generation, Export/Import. - Components of a database environment: Data dictionary, database client, SQL server, directory server Enterprise Applications programs, database. Database Planning - The database development cycle (planning, requirements definition, logical and physical design, implementation, maintenance). - About database planning: areas of organization, database size, # of analysts, HW/SW requirements. - Without adequate planning: org. changes impact development, schedule changes, rework, inaccurate estimation of resources (ignore). - Notion of a business process: a group of related activities - Functions, processes and activities - Activity data relationship: process/activity can create/update/Delete data. - The Enterprise Data Model (EDM): you need to be able to draw an EDM. - Pre-requisites of planning: management commitment, project team. - Planning metamodel: Study the organization from the top down, identify functions, processes and activities, identify which functions impact CSFs, study the applications and the eclasses they handle. - Steps in planning : identify background, information used, develop planning matrices, Enterprise model, define scope of project; planning matrices (function vs CSF, process vs eclass, application vs eclass) - Planning objectives: scope, overview of info requirements, critical processes & information classes, - resource requirements. Elements of a database plan (ignore). Outcomes of planning: planning matrices, preliminary data model, team assignments, application requirements, project priorities. Planning pitfalls: insufficient management support, strategic plan not available, direction not known and lack of co-ordination. Database Analysis - Information collected during RD: data structure and rule/constraint information. - Constraints: Integrity constraints and performance constraints. - Steps in RD (FYI): define scope, select methodology, identify user views, develop data model, crosscheck, specify constraints. - Top-down vs bottom-up approaches: Conceptual Data Model vs Enterprise Data Model; - ER Model (FYI) - Basic modeling concepts: Entity classes, attributes (multi-valued, derived) and relationships (degree, cardinality, class/subclass, recursive). - Basic symbols of ER: Entity class, relationship, primary-key, attribute, multi-valued attribute, Gerunds, relationships of different degrees (unary/recursive, binary, ternary), relationship cardinalities, classsubclass (sub-types, super-types), exclusive and exhaustive, non-exclusive and non-exhaustive; minimum-maximum cardinalities. - ER drawing guidelines: identify eclasses, identify relationships among eclasses, use rectangles for eclasses and diamonds for relationships, depict the cardinalities, use ovals for attributes, label attributes, underline the pkey. - ER additional guidelines (FYI): build model around central eclass, group attributes with entity classes, show only relevant attributes, do not show computed attributes or values, underline pkey, label all relationships, refine. - User views: reports, receipts, forms, memos and screen displays. - Constraints: domain, integrity, business and performance – you need to be able to define and give examples of each. Logical Database Design - I - Ill-structured, well-structured tables, anomalies - Design concepts: insertion, update/modification and deletion anomalies. - Design objectives: efficient form, no redundancies, queries/reporting facilitated, database implementable. - Design from ER using thumb rules, for 1:1, 1:M and M:N type relationships. - Use normalization theory if: complex data relationships, no planning/ER, maintenance - More design concepts: FD concept, FD diagrams, determinant, full F.D. rule - Functional dependency rules: reflexive, augmentation, union, decomposition, transitivity & substitution rules. - Concepts in design: Primary key, candidate key, composite key, non-key, foreign key/cross-reference key (when do we need these?) Logical Database Design -II - The normalization process: 1. Remove repeating groups 1st NF 2. Remove partial dependencies 2nd NF 3. Remove transitive dependencies 3rd NF 4. Remove multivalued dependencies 4th NF - More design concepts: repeating groups, partial dependency, transitive dependency and multi-valued functional dependency. - The normalization process with the FD approach: 1. Draw an FD chart 2. Group FDs together based on common determinants 3. Place each FD in a separate table 4. Add foreign keys if necessary. - The normal forms: 1st, 2nd, 3rd, 4th SQl: A Standard for Database Processing - Codd’s rules for RDBMS: information representation, guaranteed access, treatment of nulls, dynamic on-line catalog, comprehensive data sub-language, view updating, high-level insert, update and delete, physical, logical, integrity and distribution independence, non-subversion. - Types of SQL: DDL, DML, SQL/T, SQL/AU (DCL), SQL/I - DDL: Create/alter/drop table, Create/drop index, create/drop view. [need to be able to write DDL & DML statements] - DML: Insert, update, delete, select. - Types of Functions (and examples): Logical (=,>,<,<=,>=, <>,!=, Between, NOT, LIKE, %, IN) Arithmetic (ABS, ROUND, TRUNC, COUNT, AVG, SUM, MAX, MIN), String (Length, Substr, Lower, Upper) and Date (Add_months, Months_between, Next_Day, To_date). - Select statements using functions, expressions, logical operators, aggregates, group by and having. Physical Database Design (only till types of file organizations) - Physical Database Design: The process of mapping logical structures to internal structures. - Components of physical design (tables, storage and distribution, file organization, specification of integrity constraints). - Denormalization: de-normalization examples (1:1 and 1:M). - Storage strategies: volume and usage analysis, usage maps, estimating storage requirements - Data distribution strategies: centralized vs distributed (replicated, partitioned – vertical and horizontal), General principle for data distribution. - File organization criteria: retrieval time, access type, storage space, maintenance effort. - Types of file organizations: sequential, indexed-sequential, relative and hashed. - Relative and hashed organizations: IRG, IBG, buckets and slots, hashing algorithm, hash addresses, collisions/overflows (open overflow and chained overflow), load factor, search length. - Evaluation of hashing: hash sequence, extra space, low file activity ratio, real-time and OO. - Indexing: primary, secondary/non-key, clustering. - The indexed organization: cylinder, track, sequence set, overflow tracks; insertions and deletions - Indexing strategies: index on pkey, on foreign keys and on secondary keys depending on query frequency. - Advantages and disadvantages of ISAM: sequential access, # of indexes, uniform retrieval time, workhorse organization.