Week 1 notes

Data constitutes the building blocks of information
Information is produced by processing data
Information is used to reveal the meaning of data
Accurate, relevant, and timely information is the key to good decision making
Good decision making is the key to organisational survival in a global environment

Data management – a process that focuses on data collection, storage and retrieval. Common data management functions include addition, deletion, modification and listing.
Database – a shared, integrated computer structure that houses a collection of related data. A database contains two types of data: end-user data (raw data) and metadata.
Metadata – data about data; that is, data about data characteristics and relationships.
Database management system (DBMS) – a collection of programs that manages the database structure and controls access to the data stored in the database.
A database resembles a very well-organised electronic filing cabinet in which powerful software (the DBMS) helps manage the system.

The DBMS is the intermediary between the user and the database:
o Data are integrated and stored in the form of a database (DB)
o A DB keeps not only end-user data but also metadata
o End users cannot directly access data stored in a DB; they have to work through a DBMS to interact with the DB
o End users usually do not work with the DBMS directly either; they use application programs to request the information they need from the DB. These applications are typically written by programmers and act as the connection between the DBMS and end users.

A DBMS has the following advantages
o Improved data sharing
o Improved data security
o Better data integration
o Minimised data inconsistency
o Improved data access
o Improved decision making
o Increased end-user productivity

Data inconsistency – a condition in which different versions of the same data yield different (inconsistent) results
Query – a question or task asked by an end user of a database in the form of SQL code.
Also, a specific request for data manipulation issued by the end user or the application to the DBMS.

1-3b
Single-user database – a database that supports only one user at a time
Desktop database – a single-user database that runs on a personal computer
Multiuser database – a database that supports multiple concurrent users
Workgroup database – a multiuser database that usually supports fewer than 50 users or is used for a specific department in an organisation
Enterprise database – the overall company data representation, which provides support for present and expected future needs
Centralised database – a database that is located at a single site
Distributed database – a logically related database that is stored in two or more physically independent sites
Cloud database – a database that is created and maintained using cloud services, such as Microsoft Azure or Amazon AWS
General-purpose database – a database that contains a wide variety of data used in multiple disciplines
Discipline-specific database – a database that contains data focused on specific subject areas
Operational database – a database designed primarily to support a company’s day-to-day operations.
Also known as a transactional database, online transaction processing (OLTP) database, or production database.
Online transaction processing (OLTP) database – see operational database
Transactional database – see operational database
Production database – see operational database
Analytical database – a database focused primarily on storing historical data and business metrics used for tactical or strategic decision making
Data warehouse – a specialised database that stores historical and aggregated data in a format optimised for decision support
Online analytical processing (OLAP) – a set of tools that provide advanced data analysis for retrieving, processing and modelling data from the data warehouse
Business intelligence – a set of tools and processes used to capture, collect, integrate, store, and analyse data to support business decision making
Unstructured data – data that exists in its original, raw state; that is, in the format in which it was collected
Structured data – data that has been formatted to facilitate storage, use, and information generation
Semi-structured data – data that has already been processed to some extent
Extensible Markup Language (XML) – a metalanguage used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document’s data elements.
XML database – a database system that stores and manages semi-structured XML data
NoSQL – a new generation of DBMS that is not based on the traditional relational database model

1-7
Database system – an organisation of components that defines and regulates the collection, storage, management, and use of data in a database environment
Data dictionary – a DBMS component that stores metadata (data about data). The data dictionary contains data definitions as well as data characteristics and relationships. It may also include data that is external to the DBMS.
Performance tuning – activities that make a database perform more efficiently in terms of storage and access speed
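
Example (my own sketch, not from the textbook): the difference between end-user data and the metadata a data dictionary holds can be seen in SQLite, which keeps its catalogue in a table called sqlite_master and exposes column details through PRAGMA table_info. The customer table and its column names are made up for illustration.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table, invented for this example.
conn.execute(
    "CREATE TABLE customer (cus_id INTEGER PRIMARY KEY, cus_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer (cus_name) VALUES ('Ramas')")

# End-user data: the raw facts stored in the table.
end_user_rows = conn.execute("SELECT cus_id, cus_name FROM customer").fetchall()

# Metadata: SQLite's catalogue lists the objects that exist in the database.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# More metadata: each row of table_info is (cid, name, type, notnull, dflt_value, pk).
column_info = [
    (name, col_type, bool(pk))
    for _cid, name, col_type, _notnull, _default, pk in conn.execute(
        "PRAGMA table_info(customer)"
    )
]

print(end_user_rows)
print(table_names)
print(column_info)
```

Other DBMSs expose their data dictionary differently (for example through system catalogue views), so treat the SQLite names here as specific to SQLite.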
Disadvantages of database systems
o Increased costs
o Management complexity
o Maintaining currency
o Vendor dependence
o Frequent upgrade/replacement cycles

1-4 1-6
Structural dependence – a data characteristic in which a change in the database schema affects data access, thus requiring changes in all access programs
Structural independence – a data characteristic in which changes in the database schema do not affect data access
Data type – defines the kind of values that can be used or stored. Also used in programming languages and database systems to determine the operations that can be applied to such data.
Data dependence – a data condition in which data representation and manipulation are dependent on the physical data storage characteristics
Data independence – a condition in which data access is unaffected by changes in the physical data storage characteristics
Logical data format – the way a person views data within the context of a problem domain
Physical data format – the way a computer “sees” (stores) data
Islands of information – in the old file system environment, pools of independent, often duplicated, and inconsistent data created and managed by different departments
Data redundancy – exists when the same data is stored unnecessarily at different places
Data integrity – in a relational database, a condition in which the data in the database complies with all entity and referential integrity constraints
Critical problems within the file system are: structural dependence, data dependence, data redundancy, and data anomalies.

Evolution of the file system
o Hardware
o Software
   o Operating system software
   o DBMS software
   o Application programs and utility software
o People
o Procedures
o Data

Data anomaly – a data abnormality in which inconsistent changes have been made to a database. E.g., an employee moves but the address change is not corrected in all files in the database.
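
Example (my own sketch): the address anomaly above can be reproduced with two departmental "files" that each keep their own copy of the same employee. The file names, employee ID and addresses are invented; the helper function is hypothetical.

```python
# Two departments each store their own copy of the employee's address
# (data redundancy). All values here are made up for illustration.
payroll_file = {"E100": {"name": "Ann Lee", "address": "12 Oak St"}}
hr_file = {"E100": {"name": "Ann Lee", "address": "12 Oak St"}}


def find_address_anomalies(*files):
    """Return the employee IDs whose address differs between the given files."""
    anomalies = set()
    all_ids = set().union(*(f.keys() for f in files))
    for emp_id in all_ids:
        addresses = {f[emp_id]["address"] for f in files if emp_id in f}
        if len(addresses) > 1:
            anomalies.add(emp_id)
    return sorted(anomalies)


before = find_address_anomalies(payroll_file, hr_file)

# The employee moves, but only one department updates its copy.
payroll_file["E100"]["address"] = "9 Elm Ave"

after = find_address_anomalies(payroll_file, hr_file)

print(before)  # no disagreement yet
print(after)   # the two copies now disagree
```

Storing the address once and sharing it, which is what a database is meant to do, removes the chance of the copies drifting apart.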
Data modelling – the process of creating a specific data model for a determined problem domain
Data model – a representation, usually graphic, of a complex “real-world” data structure. Data models are used in the database design phase of the database life cycle.

2-1 2-2 2-3
Entity – a person, place, thing, concept, or event for which data can be stored
Attribute – a characteristic of an entity or object. An attribute has a name and a data type.
Relationship – an association between entities
One-to-many (1:M or 1..*) relationship – associations among two or more entities that are used by data models. In a 1:M relationship, one entity instance is associated with many instances of the related entity.
Many-to-many (M:N or *..*) relationship – association among two or more entities in which one occurrence of an entity is associated with many occurrences of a related entity and one occurrence of the related entity is associated with many occurrences of the first entity
One-to-one (1:1 or 1..1) relationship – associations among two or more entities that are used by data models. In a 1:1 relationship, one entity instance is associated with only one instance of the related entity.
Constraint – a restriction placed on data, usually expressed in the form of rules. E.g., a student’s GPA must be between 0 and 4.

2-5
Hierarchical model – an early database model whose basic concepts and characteristics formed the basis for subsequent database models
Segment – in the hierarchical data model, the equivalent of a file system’s record type
Network model – an early data model that represented data as a collection of record types in 1:M relationships
Schema – a logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other
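
Example (my own sketch): the GPA rule from the Constraint entry could be enforced with a CHECK constraint. This assumes SQLite through Python's sqlite3 module; the student table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK clause expresses the rule
# "a student's GPA must be between 0 and 4".
conn.execute(
    """
    CREATE TABLE student (
        stu_id  INTEGER PRIMARY KEY,
        stu_gpa REAL CHECK (stu_gpa BETWEEN 0 AND 4)
    )
    """
)

conn.execute("INSERT INTO student VALUES (1, 3.2)")  # satisfies the constraint

try:
    conn.execute("INSERT INTO student VALUES (2, 4.7)")  # violates the constraint
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored_ids = [row[0] for row in conn.execute("SELECT stu_id FROM student")]
print(rejected, stored_ids)
```

The point is that the rule lives in the database definition, so every application that inserts data is held to it.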
Subschema – the portion of the database that interacts with application programs
Data manipulation language (DML) – the set of commands that allows an end user to manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK
Data definition language (DDL) – the language that allows a database administrator to define the database structure, schema, and subschema
Relational model – a data model based on mathematical set theory that represents data as independent relations. Each relation (table) is conceptually represented as a two-dimensional structure of intersecting rows and columns. The relations are related to each other through the sharing of common entity characteristics (values in columns).
Table (relation) – a logical construct perceived to be a two-dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model
Tuple – in the relational model, a table row
Relational database management system (RDBMS) – a collection of programs that manages a relational database. The RDBMS software translates a user’s logical requests (queries) into commands that physically locate and retrieve the requested data.
Relational diagram – a graphical representation of a relational database’s entities, the attributes within those entities, and the relationships among the entities
Entity relationship model (ERM) – a data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams
Entity relationship diagram (ERD) – a diagram that depicts an entity relationship model’s entities, attributes, and relationships
Entity instance (entity occurrence) – a row in a relational table
Entity set – a collection of like entities
Connectivity – the type of relationship between entities. Classifications include 1:1, 1:M, and M:N.
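
Example (my own sketch): DDL versus DML, and two relations linked through a shared column value, using SQLite via Python's sqlite3 module. The agent and customer tables and their contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure. One agent serves many customers (1:M), and the
# two tables are related only through the shared agent_code values.
conn.execute("CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_name TEXT)")
conn.execute(
    """
    CREATE TABLE customer (
        cus_code   INTEGER PRIMARY KEY,
        cus_name   TEXT,
        agent_code INTEGER REFERENCES agent (agent_code)
    )
    """
)

# DML: INSERT then COMMIT makes the new rows permanent.
conn.execute("INSERT INTO agent VALUES (501, 'Alby')")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(10010, "Ramas", 501), (10011, "Dunne", 501)],
)
conn.commit()

# DML: an UPDATE followed by ROLLBACK is undone.
conn.execute("UPDATE customer SET cus_name = 'CHANGED'")
conn.rollback()

# DML: SELECT joins the two relations on the common attribute.
rows = conn.execute(
    """
    SELECT a.agent_name, c.cus_name
    FROM agent AS a
    JOIN customer AS c ON c.agent_code = a.agent_code
    ORDER BY c.cus_code
    """
).fetchall()
print(rows)
```

The query says what data is wanted; how the rows are physically located is left to the DBMS.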
Chen notation – a notation for ER diagrams; see entity relationship model (ERM)
Crow’s Foot notation – a representation of the entity relationship diagram that uses a three-pronged symbol to represent the “many” sides of the relationship
Class diagram notation – the set of symbols used in the creation of class diagrams

3-1
Predicate logic – used extensively in mathematics to provide a framework in which an assertion (statement of fact) can be verified as either true or false
Set theory – a part of mathematical science that deals with sets, or groups of things, and is used as the basis for data manipulation in the relational model
Tuple – in the relational model, a table row
Attribute domain – in data modelling, the construct used to organise and describe an attribute’s set of possible values
Primary key (PK) – in the relational model, an identifier composed of one or more attributes that uniquely identifies a row. Also, a candidate key selected as a unique entity identifier.
Key – one or more attributes that determine other attributes
Determination – the role of a key. In the context of a database table, the statement “A determines B” indicates that knowing the value of attribute A means that the value of attribute B can be looked up.
Functional dependence – within a relation R, an attribute B is functionally dependent on an attribute A if and only if a given value of attribute A determines exactly one value of attribute B.
The relationship “B is dependent on A” is equivalent to “A determines B” and is written as A → B.
Determinant – any attribute in a specific row whose value directly determines other values in that row
Dependent – an attribute whose value is determined by another attribute
Full functional dependence – a condition in which an attribute is functionally dependent on a composite key but not on any subset of the key
Composite key – a multiple-attribute key
Key attribute – an attribute that is part of a primary key
Superkey – an attribute or combination of attributes that uniquely identifies each entity in a table
Candidate key – a minimal superkey; that is, a key that does not contain a subset of attributes that is itself a superkey
Entity integrity – the property of a relational table that guarantees each entity has a unique value in a primary key and that the key has no null values
Null – the absence of an attribute value. Note that a null is not a blank.
Foreign key – an attribute or attributes in one table whose values must match the primary key in another table or whose values must be null
Referential integrity – a condition by which a dependent table’s foreign key must have either a null entry or a matching entry in the related table
Secondary key – a key used strictly for data retrieval purposes. E.g., customers are not likely to know their customer ID (primary key), but the combination of last name, first name, middle initial, and telephone number will probably match the appropriate table row.
Flags – special codes implemented by designers to trigger a required response, alert end users to specified conditions, or encode values. Flags may be used to prevent nulls by bringing attention to the absence of a value in a table.
Index – an ordered array of index key values and row ID values (pointers). Indexes are generally used to speed up and facilitate data retrieval. Also known as an index key.
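
Example (my own sketch): the definitions of functional dependence (A → B) and superkey can be tested against a sample of rows in plain Python. The helper names determines and is_superkey and the student rows are hypothetical. A sample can only disprove a dependency; passing the check on a few rows does not prove it holds in general.

```python
def determines(rows, a_attrs, b_attrs):
    """True if, in these rows, each value of A maps to exactly one value of B."""
    seen = {}
    for row in rows:
        a_value = tuple(row[attr] for attr in a_attrs)
        b_value = tuple(row[attr] for attr in b_attrs)
        # setdefault stores the first B value seen for this A value;
        # any later, different B value breaks the dependency.
        if seen.setdefault(a_value, b_value) != b_value:
            return False
    return True


def is_superkey(rows, attrs):
    """True if the attribute combination is unique for every row in the sample."""
    values = [tuple(row[attr] for attr in attrs) for row in rows]
    return len(set(values)) == len(values)


# Made-up sample: two students share a last name.
students = [
    {"STU_NUM": 1, "STU_LNAME": "Smith", "STU_GPA": 3.1},
    {"STU_NUM": 2, "STU_LNAME": "Smith", "STU_GPA": 2.8},
    {"STU_NUM": 3, "STU_LNAME": "Jones", "STU_GPA": 3.1},
]

print(determines(students, ["STU_NUM"], ["STU_LNAME"]))  # STU_NUM -> STU_LNAME
print(determines(students, ["STU_LNAME"], ["STU_NUM"]))  # STU_LNAME -> STU_NUM
print(is_superkey(students, ["STU_NUM"]))
print(is_superkey(students, ["STU_LNAME"]))
```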
Unique index – an index in which the index key can have only one associated pointer value (row)
Composite entity – an entity designed to transform an M:N relationship into two 1:M relationships. The composite entity’s primary key comprises at least the primary keys of the entities that it connects. Also known as a bridge entity or associative entity.
Linking table – in the relational model, a table that implements an M:N relationship
Domain – the possible set of values for a given attribute
Required attribute – in ER modelling, an attribute that must have a value. In other words, it cannot be left empty.
Optional attribute – in ER modelling, an attribute that does not require a value; therefore, it can be left empty
Identifier – one or more attributes that uniquely identify each entity instance
Relational schema – the organisation of a relational database as described by the database administrator
Composite identifier – in ER modelling, a key composed of more than one attribute
Composite attribute – an attribute that can be further subdivided to yield additional attributes. E.g., a phone number such as 615-898-2368 may be divided into an area code (615), an exchange number (898) and a four-digit code (2368).
Simple attribute – an attribute that cannot be subdivided into meaningful components
Single-valued attribute – an attribute that can have only one value
Multivalued attribute – an attribute that can have many values for a single entity occurrence. E.g., an EMP_DEGREE attribute might store the string “BBA, MBA, PHD” to indicate three different degrees held.
Derived attribute – an attribute that does not physically exist within the entity and is derived via an algorithm. E.g., the Age attribute might be derived by subtracting the birth date from the current date.
Participants – an ER term for entities that participate in a relationship. E.g., in the relationship “PROFESSOR teaches CLASS”, the teaches relationship is based on the participants PROFESSOR and CLASS.
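
Example (my own sketch): a composite (bridge) entity turning an M:N relationship into two 1:M relationships, using SQLite via Python's sqlite3 module. The STUDENT, CLASS and ENROLL tables and their rows are invented. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# STUDENT and CLASS are in an M:N relationship. ENROLL is the composite
# entity: its primary key is made of the primary keys of both parents.
conn.executescript(
    """
    CREATE TABLE student (stu_num INTEGER PRIMARY KEY, stu_lname TEXT NOT NULL);
    CREATE TABLE class (class_code TEXT PRIMARY KEY, class_title TEXT NOT NULL);
    CREATE TABLE enroll (
        stu_num      INTEGER NOT NULL REFERENCES student (stu_num),
        class_code   TEXT    NOT NULL REFERENCES class (class_code),
        enroll_grade TEXT,                       -- optional attribute, may be null
        PRIMARY KEY (stu_num, class_code)
    );
    INSERT INTO student VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO class VALUES ('DB101', 'Databases'), ('ST200', 'Statistics');
    INSERT INTO enroll VALUES (1, 'DB101', 'A'), (1, 'ST200', 'B'), (2, 'DB101', NULL);
    """
)

# One student, many classes ...
classes_for_student_1 = [
    row[0]
    for row in conn.execute(
        "SELECT class_code FROM enroll WHERE stu_num = 1 ORDER BY class_code"
    )
]
# ... and one class, many students.
students_in_db101 = [
    row[0]
    for row in conn.execute(
        "SELECT stu_num FROM enroll WHERE class_code = 'DB101' ORDER BY stu_num"
    )
]

# Referential integrity: an enrolment for a student that does not exist is refused.
try:
    conn.execute("INSERT INTO enroll VALUES (3, 'DB101', NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(classes_for_student_1, students_in_db101, orphan_rejected)
```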
Connectivity – the classification of the relationship between entities. Classifications include 1:1, 1:M, and M:N.
Cardinality – a property that assigns a specific value to connectivity and expresses the range of allowed entity occurrences associated with a single occurrence of the related entity
Existence-dependent – a property of an entity whose existence depends on one or more other entities. In such an environment, the existence-independent table must be created and loaded first because the existence-dependent key cannot reference a table that does not yet exist.
Existence-independent – a property of an entity that can exist apart from one or more related entities. Such a table must be created first when it is referenced by an existence-dependent table.
Strong entity – an entity that is existence-independent; that is, it can exist apart from all of its related entities
Weak (non-identifying) relationship – a relationship in which the primary key of the related entity does not contain a primary key component of the parent entity
Strong (identifying) relationship – a relationship that occurs when two entities are existence-dependent; from a database design perspective, this relationship exists whenever the primary key of the related entity contains the primary key of the parent entity
Weak entity – an entity that displays existence dependence and inherits the primary key of its parent entity. E.g., DEPENDENT requires the existence of an EMPLOYEE.
Optional participation – in ER modelling, a condition in which one entity occurrence does not require a corresponding entity occurrence in a particular relationship
Mandatory participation – a relationship in which one entity occurrence must have a corresponding occurrence in another entity. E.g., an EMPLOYEE works in a DIVISION. (A person cannot be an employee without being assigned to a division.)
Relationship degree – the number of entities or participants associated with a relationship.
A relationship degree can be unary, binary, ternary, or higher.
Unary relationship – an ER term used to describe an association within an entity. E.g., an EMPLOYEE might manage another EMPLOYEE.
Binary relationship – an ER term for an association (relationship) between two entities. E.g., PROFESSOR teaches CLASS.
Ternary relationship – an ER term used to describe an association (relationship) between three entities. E.g., DOCTOR prescribes a DRUG for a PATIENT.
Recursive relationship – a relationship found within a single entity type. E.g., an EMPLOYEE is married to an EMPLOYEE or a PART is a component of another PART.
Iterative process – a process based on repetition of steps and procedures
Extended entity relationship model (EERM) – sometimes referred to as the enhanced entity relationship model; the result of adding more semantic constructs, such as entity supertypes, entity subtypes and entity clustering, to the original entity relationship (ER) model
EER diagram (EERD) – the entity relationship diagram resulting from the application of extended entity relationship concepts that provide additional semantic content in the ER model
Entity supertype – in a generalisation or specialisation hierarchy, a generic entity type that contains the common characteristics of entity subtypes
Entity subtype – in a generalisation or specialisation hierarchy, a subset of an entity supertype. The entity supertype contains the common characteristics and the subtypes contain the unique characteristics of each entity.
Specialisation hierarchy – a hierarchy based on the top-down process of identifying lower-level, more specific entity subtypes from a higher-level entity supertype.
Specialisation is based on grouping the unique characteristics and relationships of the subtypes.
Inheritance – in the EERD, the property that enables an entity subtype to inherit the attributes and relationships of the entity supertype
Subtype discriminator – the attribute in the supertype entity that determines to which entity subtype each supertype occurrence is related
Disjoint subtypes – in a specialisation hierarchy, unique and nonoverlapping subtype entity sets
Overlapping subtypes – in a specialisation hierarchy, a condition in which each entity instance (row) of the supertype can appear in more than one subtype
Completeness constraint – a constraint that specifies whether each entity supertype occurrence must also be a member of at least one subtype. The completeness constraint can be partial or total:
o Partial completeness – in a generalisation or specialisation hierarchy, a condition in which some supertype occurrences might not be members of any subtype
o Total completeness – in a generalisation or specialisation hierarchy, a condition in which every supertype occurrence must be a member of at least one subtype
Specialisation – in a specialisation hierarchy, the grouping of unique attributes into a subtype entity
Generalisation – in a specialisation hierarchy, the grouping of common attributes into a supertype entity
Entity cluster – a “virtual” entity type used to represent multiple entities and relationships in the ERD. An entity cluster is formed by combining multiple interrelated entities into a single abstract entity object. An entity cluster is considered “virtual” or “abstract” because it is not actually an entity in the final ERD.
Natural key (natural identifier) – a generally accepted identifier for real-world objects. As its name implies, a natural key is familiar to end users and forms part of their day-to-day business vocabulary.
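
Example (my own sketch): one possible way to implement a supertype with disjoint subtypes in relational tables, using SQLite via Python's sqlite3 module. The EMPLOYEE, PILOT and MECHANIC tables, the emp_type discriminator codes and the rows are all invented. A null discriminator stands for an employee with no subtype, i.e. partial completeness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    -- Supertype: common attributes plus the subtype discriminator.
    CREATE TABLE employee (
        emp_num   INTEGER PRIMARY KEY,
        emp_lname TEXT NOT NULL,
        emp_type  TEXT CHECK (emp_type IN ('P', 'M'))  -- NULL means no subtype
    );
    -- Subtypes: unique attributes only; the PK is also an FK to the supertype.
    CREATE TABLE pilot (
        emp_num     INTEGER PRIMARY KEY REFERENCES employee (emp_num),
        pil_license TEXT NOT NULL
    );
    CREATE TABLE mechanic (
        emp_num      INTEGER PRIMARY KEY REFERENCES employee (emp_num),
        mec_cert_lvl TEXT NOT NULL
    );
    INSERT INTO employee VALUES (100, 'Kolmycz', 'P'), (101, 'Lewis', 'M'), (102, 'Vandam', NULL);
    INSERT INTO pilot VALUES (100, 'ATP');
    INSERT INTO mechanic VALUES (101, 'Level 2');
    """
)

# Inheritance: a pilot's common attributes come from the supertype row
# that shares its primary key value.
pilots = conn.execute(
    """
    SELECT e.emp_num, e.emp_lname, p.pil_license
    FROM employee AS e
    JOIN pilot AS p ON p.emp_num = e.emp_num
    """
).fetchall()

# Partial completeness: some supertype occurrences belong to no subtype.
no_subtype = [
    row[0]
    for row in conn.execute("SELECT emp_lname FROM employee WHERE emp_type IS NULL")
]

print(pilots, no_subtype)
```

Because emp_type holds a single value, each employee is marked as at most one subtype. The tables alone do not stop someone inserting employee 100 into mechanic as well, so full enforcement of disjointness would need extra rules; that part is left out of this sketch.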
Surrogate key – a system-assigned primary key, generally numeric and auto-incremented
Time-variant data – data whose values are a function of time. E.g., time-variant data can be seen at work when a company’s history of all administrative appointments is tracked.
Design trap – a problem that occurs when a relationship is improperly identified and therefore is represented in a way that is not consistent with the real world. The most common design trap is the fan trap.
Fan trap – a design trap that occurs when one entity is in two 1:M relationships with other entities, thus producing an association among the other entities that is not expressed in the model
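
Example (my own sketch): a surrogate key next to the natural identifier it stands in for, using SQLite via Python's sqlite3 module. The room_booking table and its values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# booking_id is the surrogate key: system-assigned, numeric, auto-incremented.
# The combination users would recognise (room + date) is kept unique as well.
conn.execute(
    """
    CREATE TABLE room_booking (
        booking_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        room_code    TEXT NOT NULL,
        booking_date TEXT NOT NULL,
        UNIQUE (room_code, booking_date)
    )
    """
)

# The application never supplies booking_id; the DBMS assigns it.
cur = conn.execute(
    "INSERT INTO room_booking (room_code, booking_date) VALUES ('A101', '2024-03-01')"
)
first_id = cur.lastrowid
cur = conn.execute(
    "INSERT INTO room_booking (room_code, booking_date) VALUES ('A101', '2024-03-02')"
)
second_id = cur.lastrowid

print(first_id, second_id)
```

The surrogate key carries no business meaning, which is why the UNIQUE rule on room and date is still needed to stop the same booking being entered twice.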