Info 275 Quiz #2 Review

Database Development

Software Realities
- An enormous number of information systems are conceived and implemented every year
- Often, these systems:
  o Are delivered late or over budget (80%)
  o Fail completely or are abandoned (40%)
  o Fail to address the needs of users (e.g., training) (40%)
  o Do not align with organizational goals (75%)
- Major reasons for the failure of software projects include:
  o Lack of a complete requirements specification
  o Lack of an appropriate development methodology
  o Poor decomposition of the design into manageable components
- We need a well-defined, logical approach to guide development
- A structured approach to development, called the Systems Development Lifecycle (SDLC), was proposed

Information Systems
- Resources that enable the collection, management, control, and dissemination of information throughout an organization
- The database is a fundamental component of an IS, and its development and usage should be viewed from the perspective of the wider requirements of the organization

Project
- A planned undertaking that has a specified beginning and end and that produces some definite result
- For our purposes, the result is a new or modified information system and its associated database
- Usually requires a team of experts (data analysts, DBAs)
- Database development is a critical component of information systems projects

System Definition
- Describes the scope and boundaries of the database system and the major user views
- A user view defines what is required of a database system from the perspective of:
  o A particular job role (such as manager or supervisor), or
  o An enterprise application area (such as marketing, personnel, or stock control)
- A database application may have one or more user views (applications, modules, subsystems)
- Identifying user views helps ensure that no major users of the database are forgotten when developing requirements for the new system
- User views also help in developing a complex database system by allowing requirements to be broken down into manageable pieces
[Figure: Representation of a Database System with Multiple User Views]

Example - Course Admin
- Assume it's many years ago, and X is considering building a new information system (one that ends up looking like Banner) for student registration in course sections.
Key Functions:
  o Create / Update Students
  o Create / Update Courses
  o Create / Update Sections
  o Create / Update Buildings and Classrooms
  o Create / Update Faculties and Departments
  o Create / Update Professors
  o Create / Update Timeblocks
  o Assign Sections to Timeblocks
  o Assign Sections to Classroom(s)
  o Assign Sections to Professors
  o Enroll Student in Section
  o Drop Student from Section
  o Provide Section Override to Student
  o Generate Timetable
  o Assign Student Grades

Requirements Collection and Analysis
- The process of collecting and analyzing information about the part of the organization to be supported by the database system, and using this information to identify users' requirements of the new system
- Requirements analysis is about understanding user requirements in detail, in terms of:
  • Functions and events (transactions)
  • Things (?)

Defining System Requirements
- Requirement: create a means to transport a single individual from home to place of work
- The same requirement can be interpreted differently:
  • Management interpretation
  • IT interpretation
  • User interpretation

Requirements Collection and Analysis
- Information is gathered for each major user view, including:
  • A description of the data used or generated
  • Details of how data is to be generated/used
  • Any additional requirements for the new database system
- Information is analyzed to identify the requirements to be included in the new database system; these are described in the requirements specification
- As a result of requirements analysis, we develop a series of data models (conceptual, logical, and physical)
- Another important activity is deciding how to manage the requirements for a database system with multiple user views
- Three main approaches:
  • Centralized approach
  • View integration approach
  • A combination of both approaches
Centralized approach
- Requirements for each user view are merged into a single set of requirements.
- A data model representing all user views is created during the database design stage.

View integration approach
- Requirements for each user view remain as separate lists.
- Data models representing each user view are created and then merged later during the database design stage.
- A data model representing a single user view (or a subset of all user views) is called a local data model.
- Local data models are then merged at a later stage of database design to produce a global data model.

Database Design
- The process of creating a design for a database that will support the enterprise's mission statement and mission objectives for the required database system
- Major deliverable: a data model
- The main purposes of data modeling include:
  • Assisting in understanding the meaning (semantics) of the data
  • Facilitating communication about the information requirements
- Building a data model requires answering questions about entities, relationships, and attributes
- A data model ensures we understand:
  - Each user's perspective of the data
  - The nature of the data itself, independent of its physical representations
  - The use of data across user views
- Three phases of database design:
  o Conceptual database design
  o Logical database design
  o Physical database design
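The view integration approach described above merges several local data models into one global data model. As a minimal sketch, assuming a deliberately simplified and hypothetical representation in which a data model is just a mapping from entity type names to attribute names, the merge step could look like this in Python. The entity and attribute names are invented for illustration and loosely follow the Course Admin example; real view integration also has to resolve naming conflicts and merge relationships, which this sketch does not attempt.

```python
from typing import Dict, Set

# Hypothetical, simplified representation of a data model:
# entity type name -> set of attribute names.
DataModel = Dict[str, Set[str]]


def merge_local_models(*local_models: DataModel) -> DataModel:
    """Merge local data models into one global data model.

    Entity types are matched by name only, and their attribute sets are
    unioned. Naming conflicts and relationships are not handled here.
    """
    global_model: DataModel = {}
    for model in local_models:
        for entity, attributes in model.items():
            # setdefault creates a fresh set the first time an entity is seen,
            # so the input models are never modified.
            global_model.setdefault(entity, set()).update(attributes)
    return global_model


# Two hypothetical local data models (user views) from the Course Admin example.
registrar_view: DataModel = {
    "Student": {"studentNo", "name", "program"},
    "Section": {"sectionNo", "courseNo", "timeblock"},
}
professor_view: DataModel = {
    "Section": {"sectionNo", "courseNo", "classroom"},
    "Enrollment": {"studentNo", "sectionNo", "grade"},
}

global_model = merge_local_models(registrar_view, professor_view)
for entity in sorted(global_model):
    print(entity, sorted(global_model[entity]))
```

Because entity types are matched purely by name, two views that call the same thing by different names (for example Professor vs. Instructor) would end up as separate entity types; spotting and resolving such synonyms is the part of view integration that still needs a human.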
Conceptual Database Design
- The process of constructing a model of the data used in an enterprise, independent of all physical considerations
- The data model is built using the information in the users' requirements specification
- The conceptual data model is the source of information for the logical design phase

Logical Database Design
- The process of constructing a model of the data used in an enterprise based on a specific data model (e.g., relational), but independent of a particular DBMS and other physical considerations
- The conceptual data model is refined and mapped onto a logical data model

Physical Database Design
- The process of producing a description of the implementation of the database on secondary storage
- Describes the base relations, file organizations, and indexes used to achieve efficient access to the data, as well as any associated integrity constraints and security measures
- Tailored to a specific DBMS

[Figure: Three-Level ANSI-SPARC Architecture and Phases of Database Design]

Application Design
- Design of the user interface and the application programs that use and process the database
- Database design and application design are parallel activities, carried out in tandem, often by the same team
- Includes two important activities:
  o User interface design
  o Transaction design

User Interface Design

Application Design - Transactions
- Transaction: an action, or series of actions, carried out by a single user or application program, which accesses or changes the content of the database
- Important characteristics of transactions:
  o Data to be used by the transaction
  o Functional characteristics of the transaction
  o Output of the transaction
  o Importance to the users
  o Expected rate of usage
- Three main types of transactions:
  o Retrieval
  o Update
  o Mixed

Implementation
- Physical realization of the database and application designs
  o Use DDL to create database schemas and empty database files
  o Use DDL to create any specified user views
  o Use a programming language to create the application programs.
    This will include the database transactions implemented using DML, possibly embedded in a host programming language.

Data Conversion and Loading
- Transferring any existing data into the new database and converting any existing applications to run on the new database
- Only required when a new database system is replacing an old system
  o The DBMS normally has a utility that loads existing files into the new database
- It may be possible to convert application programs from the old system for use by the new system

Testing
- The process of running the database system with the intent of finding errors
- Uses carefully planned test strategies and realistic data
- Demonstrates that the database and application programs appear to be working according to requirements

Operational Maintenance
- The process of monitoring and maintaining the database system following installation
- Monitoring the performance of the system
  o If performance falls, the database may require tuning or reorganization
- Maintaining and upgrading the database application (when required)
- Incorporating new requirements into the database application

CASE Tools
- Automated tools that assist with requirements, design, and implementation tasks
- Support provided by CASE tools includes:
  o A data dictionary to store information about the database system's data
  o Design tools to support data analysis
  o Tools to permit development of the corporate data model, and the conceptual and logical data models
  o Tools to enable prototyping of applications
- CASE tools provide the following benefits:
  o Standards/consistency
  o Automation/higher productivity
  o Higher-quality design/fewer defects

Concepts of the ER Model
Entity Type
  o A group of objects with the same properties, identified by the enterprise as having an independent existence (a table)
Entity Occurrence
  o A uniquely identifiable object of an entity type (a row)
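The Implementation steps and the first ER concepts above map fairly directly onto SQL. The following is a minimal sketch, assuming SQLite through Python's built-in sqlite3 module (any relational DBMS would do) and hypothetical table, view, and column names loosely based on the Course Admin example. DDL creates the tables (entity types) and a user view; DML implements an update transaction, a failing transaction that is rolled back as a unit, and a retrieval transaction whose result rows are Section occurrences.

```python
import sqlite3

# In-memory database; all table, view, and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# DDL: each table represents an entity type; the view is a "user view".
conn.executescript("""
CREATE TABLE Course (
    courseNo TEXT PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE Section (
    sectionNo TEXT PRIMARY KEY,
    courseNo  TEXT NOT NULL REFERENCES Course(courseNo),
    capacity  INTEGER NOT NULL CHECK (capacity > 0)
);
CREATE VIEW SectionTimetable AS
    SELECT s.sectionNo, c.title
    FROM Section AS s JOIN Course AS c ON c.courseNo = s.courseNo;
""")

# DML update transaction: "with conn" commits if the block succeeds
# and rolls back if it raises, so both inserts happen or neither does.
with conn:
    conn.execute("INSERT INTO Course VALUES (?, ?)", ("INFO275", "Database Development"))
    conn.execute("INSERT INTO Section VALUES (?, ?, ?)", ("INFO275-01", "INFO275", 40))

# A transaction that breaks a constraint is rolled back as a whole:
# the Course row inserted first is undone when the Section insert fails.
try:
    with conn:
        conn.execute("INSERT INTO Course VALUES (?, ?)", ("INFO300", "Data Warehousing"))
        conn.execute("INSERT INTO Section VALUES (?, ?, ?)", ("INFO300-01", "INFO300", 0))
except sqlite3.IntegrityError as exc:
    print("rolled back:", exc)

# DML retrieval transaction through the user view; each row returned
# corresponds to one Section occurrence.
rows = conn.execute("SELECT sectionNo, title FROM SectionTimetable").fetchall()
print(rows)  # [('INFO275-01', 'Database Development')]
print(conn.execute("SELECT COUNT(*) FROM Course").fetchone()[0])  # 1
```

Relying on the connection's context manager keeps transaction boundaries explicit in the application code, which matches the idea of a transaction as a single logical unit of work.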
Entity Type Examples:
  o Tangible things
  o Roles played by people
  o Organizational units
  o Sites/locations
  o Incidents, events, transactions
Relationship Type
  o A set of meaningful associations among entity types
Relationship Occurrence
  o A uniquely identifiable association, which includes one occurrence from each participating entity type
Ternary Relationship
  o A relationship of degree three, i.e. one in which three entity types participate. Ex: Staff registers Client at Branch.
Recursive Relationship
  o A relationship where the same entity type participates more than once in different roles
  o The relationship may be given role names to indicate the purpose that each participating entity type plays in the relationship
Attribute
  o A property of an entity or relationship type
Attribute Domain
  o The set of allowable values for one or more attributes
Simple Attribute
  o An attribute composed of a single component with an independent existence
Composite Attribute
  o An attribute composed of multiple components, each with an independent existence
Single-Valued Attribute
  o An attribute that holds a single value for each occurrence of an entity type
Multi-Valued Attribute
  o An attribute that holds multiple values for each occurrence of an entity type
  o We need special rules for dealing with these
Derived Attribute
  o An attribute that represents a value that is derivable from the value of a related attribute or set of attributes, not necessarily in the same entity type. Example: Age from Date of Birth.
Strong Entity
  o An entity type that is not existence-dependent on some other entity type
Weak Entity
  o An entity type that is existence-dependent on some other entity type
Strong/Weak Example: A CLIENT has a PREFERENCE. A PREFERENCE cannot exist without a CLIENT.
Structural Constraints
  o Multiplicity: the number (or range) of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship. It represents policies or business rules established by the user or company.
  o The most common degree for relationships is binary. Binary relationships are referred to as:
      One-to-one (1..1)
      One-to-many (1..*)
      Many-to-many (*..*)
  o Multiplicity is made up of two types of restrictions on relationships:
      Cardinality: describes the maximum number of possible relationship occurrences for an entity participating in a given relationship type.
      Participation: determines whether all or only some entity occurrences participate in a relationship.
Problems with ER Models
  o Fan Trap: where a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous.
  o Chasm Trap: where a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences.

Normalization
Purpose of Normalization
- A major aim of relational database design is to group attributes into relations so as to minimize data redundancy.
- Normalization is a technique for producing a set of suitable relations that support the data requirements of an enterprise.
- Characteristics of a suitable set of relations include:
  - The minimal number of attributes necessary to support the data requirements of the enterprise
  - Only attributes with a close logical relationship are found in the same relation (table)
  - Minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys
- The benefits of using a database that has a suitable set of relations are that the database will:
  - Be easier for the user to access and maintain data
  - Take up minimal storage space on the computer
  - Have fewer potential issues related to data integrity
  - Perform better for operations such as inserts
Functional Dependency
- Describes the relationship between attributes in a relation: if A and B are attributes of a relation, B is functionally dependent on A (written A → B) if each value of A is associated with exactly one value of B.
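The functional dependency definition above can be tested against sample rows. Below is a minimal Python sketch using a hypothetical helper, fd_holds, and a hypothetical StaffBranch relation in which branch details repeat for every staff member; the names and data are invented for illustration. Sample data can only disprove a dependency: since an FD must hold for all time, a True result only means the data is consistent with it. The last lines show how projecting branchNo and bAddress into a separate relation stores each branch address once, which is the kind of redundancy normalization aims to remove.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return False if the rows contradict lhs -> rhs, otherwise True.

    A dependency is violated when two rows agree on every lhs attribute
    but differ on some rhs attribute.
    """
    seen: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this determinant
        # and returns it, so any later mismatch is a violation.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical StaffBranch relation with repeated branch addresses.
staff_branch: List[Row] = [
    {"staffNo": "SL21", "name": "John White", "branchNo": "B005", "bAddress": "22 Deer Rd"},
    {"staffNo": "SG37", "name": "Ann Beech", "branchNo": "B003", "bAddress": "163 Main St"},
    {"staffNo": "SG14", "name": "David Ford", "branchNo": "B003", "bAddress": "163 Main St"},
]

print(fd_holds(staff_branch, ["staffNo"], ["name", "branchNo"]))  # True
print(fd_holds(staff_branch, ["branchNo"], ["bAddress"]))         # True
print(fd_holds(staff_branch, ["bAddress"], ["staffNo"]))          # False

# staffNo -> branchNo and branchNo -> bAddress, so bAddress depends on
# staffNo only via branchNo. Splitting it out stores each address once.
branch = sorted({(r["branchNo"], r["bAddress"]) for r in staff_branch})
staff = [{k: r[k] for k in ("staffNo", "name", "branchNo")} for r in staff_branch]
print(branch)  # [('B003', '163 Main St'), ('B005', '22 Deer Rd')]
```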
Goals of functional dependency analysis:
  o Ensure each relation contains information about a specific thing, and each attribute serves to describe that thing
  o Ensure that each relation contains only attributes that are fully functionally dependent on the primary key
Characteristics of functional dependency:
  o There is a 1:1 relationship between the attribute(s) on the left-hand side (the determinant) and those on the right-hand side
  o It holds for ALL time
  o The determinant has the MINIMUM number of attributes necessary to maintain the dependency with the attribute(s) on the right-hand side; this is called full functional dependency
Transitive dependency: a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

Process of Normalization
- A formal technique for analyzing a relation based on its PK and the functional dependencies between the attributes of that relation
- Executed as a series of steps:
  o UNF (Unnormalized Form)
      A table that contains one or more repeating groups (multiple values in a single column).
  o 1NF (First Normal Form)
      A relation in which the intersection of each row and column contains one and only one value.
      Attained by identifying repeating groups and either flattening the table (filling in empty cells) or creating a new table for the multi-valued attributes.
  o 2NF (Second Normal Form)
      Based on the concept of full functional dependency; only applies to relations with composite keys (a 1NF relation with a single-attribute PK is automatically in 2NF).
      A relation that is in 1NF and in which every non-PK attribute is fully functionally dependent on the PK.
      1NF → 2NF:
        o Identify the PK for the 1NF table
        o Identify the functional dependencies
        o If partial dependencies on the PK exist, remove them by placing them in a new table
  o 3NF (Third Normal Form)
      Based on the concept of transitive dependency.
      A relation that is in 1NF and 2NF and in which no non-PK attribute is transitively dependent on the PK.
      2NF → 3NF:
        o Identify the PK in the 2NF relation
        o Identify the functional dependencies
        o If transitive dependencies on the PK exist, remove them by placing them in a new table along with a copy of their determinant, which becomes the new table's PK

Conceptual Database Design
Design Methodology
- A structured approach that uses procedures, techniques, tools, and documentation aids to support and facilitate the process of design
- Three main phases:
  - Conceptual database design: design with no technology/implementation assumptions
  - Logical database design: design for a specific data model (e.g., relational)
  - Physical database design: design for a specific DBMS (e.g., Oracle vs. SQL Server)

Critical Success Factors in Database Design
• Work interactively with the users as much as possible.
• Follow a structured methodology throughout the data modelling process.
• Use diagrams to represent as much of the data models as possible.
• Build a data dictionary to supplement the data model diagrams.
• Be willing to repeat steps.

Conceptual Database Design Steps
• Step 1: Identify entity types
• Step 2: Identify relationship types
• Step 3: Identify and associate attributes with entity or relationship types
• Step 4: Determine attribute domains
• Step 5: Determine candidate, primary, and alternate key attributes
• Step 6: Consider use of enhanced modeling concepts (optional step)
• Step 7: Check model for redundancy
• Step 8: Validate conceptual model against user transactions
• Step 9: Review conceptual data model with user

Build Conceptual Data Model
• Goal: to build a conceptual data model of the data requirements of the enterprise.
• The model comprises entity types, relationship types, attributes and attribute domains, primary and alternate keys, and integrity constraints.
• Documented in the form of an Entity-Relationship (ER) model and associated documentation.
• ER models document:
  • Entities
  • Attributes
  • Relationships

Step 1: Identify Entity Types
• The goal is to identify all of the ‘things’ that users need in the computer system.
• DreamHome:
  • PropertyForRent, PrivateOwner, BusinessOwner, Client, Branch, Staff, Lease, Preference
• Document in a data dictionary: a document where we describe entities, attributes, and relationships textually, as well as on the ER diagram.

Step 2: Identify Relationship Types
• Relationships: naturally occurring associations between entities
• Multiplicity: the number (or range) of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship
• Document relationships in the data dictionary and on an Entity-Relationship Diagram

Step 3: Define Each Entity's Attributes
• Identify each individual piece of information associated with each entity.
• Only capture attributes that are required by the application we are building.
• There are multiple types:
  • Simple
  • Composite
  • Derived
• Should we store staff age?

Step 4: Identify Attribute Domains
• Basic domains:
  • Character
  • Numeric
  • Dates
• We can also specify specific ranges or even specific values
  • Province: NS, NF, NB ……

Step 5: Identify Keys
• The process of defining how each entity will be uniquely identified:
  • Candidate keys
  • Primary key
  • Alternate keys

Step 6: Apply Advanced Modeling Techniques
• Covers situations such as superclass/subclass:
  • Owner:
    • Private_owner
    • Business_owner
• Required when different subclasses exist and have substantially different attributes

Step 7: Check the Model for Redundancy
• This step ensures that there are no unnecessary, redundant, or duplicated relationships in the design:
  • Examine 1:1 relationships
  • Remove duplicates
  • Consider the time dimension

Step 8: Validate the Model vs. User Transactions
• This step ensures that the model meets the users' functional requirements for the database:
  • Correct entities?
  • Correct attributes?
  • Correct relationships?

Validate Model vs. Transactions (Requirements)
• The initial version of the model is developed based on requirements analysis.
• From application design, we get a list of required transactions.
• We then ‘map’ the transactions to our model to ensure we have the right entities, attributes, and relationships in the model.
• If we do not, we modify the model.
• Sample query transactions:
  a) Generate a list of staff supervised by each supervisor
  b) Generate a list of staff alphabetically
  c) List the properties and owners sorted by branch

Step 9: Review the Model with Users
• Walk through the model with users to ensure that we have reflected their requirements in the database.
• First step: train users on how to read an ERD!
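One way to carry out the mapping in Step 8 is to implement a draft of the model and try to write each sample query transaction against it; a transaction that cannot be expressed points to a missing entity, attribute, or relationship. The sketch below assumes SQLite via Python's built-in sqlite3 module, with hypothetical table and column names and made-up rows loosely following the DreamHome entities. It also shows where earlier steps could end up in SQL: a CHECK constraint for an attribute domain (Step 4), PRIMARY KEY and UNIQUE for primary and alternate keys (Step 5), and a self-referencing foreign key (the hypothetical supervisorNo column) for the recursive supervision relationship that transaction (a) needs.

```python
import sqlite3

# Hypothetical draft schema and data, for validation purposes only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Branch (
    branchNo TEXT PRIMARY KEY,                                    -- primary key
    city     TEXT NOT NULL,
    province TEXT NOT NULL CHECK (province IN ('NS', 'NB', 'NF')) -- attribute domain
);
CREATE TABLE Staff (
    staffNo      TEXT PRIMARY KEY,
    fName        TEXT NOT NULL,
    lName        TEXT NOT NULL,
    branchNo     TEXT NOT NULL REFERENCES Branch(branchNo),
    supervisorNo TEXT REFERENCES Staff(staffNo)  -- recursive relationship
);
CREATE TABLE PrivateOwner (
    ownerNo TEXT PRIMARY KEY,
    fName   TEXT NOT NULL,
    lName   TEXT NOT NULL,
    telNo   TEXT NOT NULL UNIQUE                 -- alternate key
);
CREATE TABLE PropertyForRent (
    propertyNo TEXT PRIMARY KEY,
    street     TEXT NOT NULL,
    ownerNo    TEXT NOT NULL REFERENCES PrivateOwner(ownerNo),
    branchNo   TEXT NOT NULL REFERENCES Branch(branchNo)
);
""")

with conn:
    conn.executemany("INSERT INTO Branch VALUES (?, ?, ?)",
                     [("B001", "Halifax", "NS"), ("B002", "Moncton", "NB")])
    conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?, ?)",
                     [("S1", "Ann", "Beech", "B001", None),
                      ("S2", "David", "Ford", "B001", "S1"),
                      ("S3", "Mary", "Howe", "B002", "S1")])
    conn.executemany("INSERT INTO PrivateOwner VALUES (?, ?, ?, ?)",
                     [("CO40", "Tina", "Murphy", "902-555-0101"),
                      ("CO87", "Carol", "Farrel", "506-555-0102")])
    conn.executemany("INSERT INTO PropertyForRent VALUES (?, ?, ?, ?)",
                     [("PA14", "16 Holhead", "CO40", "B001"),
                      ("PG4", "6 Lawrence St", "CO87", "B002"),
                      ("PG16", "5 Novar Dr", "CO40", "B002")])

# The attribute domain rejects values outside the allowed set.
try:
    with conn:
        conn.execute("INSERT INTO Branch VALUES ('B003', 'Toronto', 'ON')")
    domain_enforced = False
except sqlite3.IntegrityError:
    domain_enforced = True

# a) Staff supervised by each supervisor (self-join on the recursive FK).
staff_by_supervisor = conn.execute("""
    SELECT sup.lName, s.lName
    FROM Staff AS s JOIN Staff AS sup ON s.supervisorNo = sup.staffNo
    ORDER BY sup.lName, s.lName
""").fetchall()

# b) Staff listed alphabetically by last name, then first name.
staff_alpha = conn.execute(
    "SELECT lName, fName FROM Staff ORDER BY lName, fName").fetchall()

# c) Properties and their owners, sorted by branch.
props_by_branch = conn.execute("""
    SELECT p.branchNo, p.propertyNo, o.lName
    FROM PropertyForRent AS p JOIN PrivateOwner AS o ON o.ownerNo = p.ownerNo
    ORDER BY p.branchNo, p.propertyNo
""").fetchall()

print(staff_by_supervisor)  # [('Beech', 'Ford'), ('Beech', 'Howe')]
print(staff_alpha)          # [('Beech', 'Ann'), ('Ford', 'David'), ('Howe', 'Mary')]
print(props_by_branch)      # [('B001', 'PA14', 'Murphy'), ('B002', 'PG16', 'Murphy'), ('B002', 'PG4', 'Farrel')]
```

If transaction (a) could not be written, for example because Staff had no column linking a staff member to a supervisor, that would be the signal to add the recursive relationship to the conceptual model and revisit the ER diagram before moving on to Step 9.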