
INFS1603Notes

INFS1603 Notes - Ben Munns
Chapter 1: Database Systems
1.2 Data vs. Information
- Data - Raw facts concerning things such as people, places, events or concepts
- Raw bits and bytes that do not yet have meaning
- Must be properly formatted for storage, processing, and presentation
- Information - Data that has been processed and presented in a form suitable for human interpretation, often
with the purpose of revealing trends or patterns
- Refined, with context, makes sense to people
­ Knowledge​ ­ The body of information and facts about a specific subject
­ Data management​ ­ a discipline that focuses on the proper generation, storage, and retrieval of data
1.3 Introducing the database
­ Database​ ­ a ​collection of data​ that exists over a long period of time
­ A database includes:
­ End user data​ ­ Raw facts of interest to the end user. The data you want to store.
­ Metadata​ ­ Data that describes what type of data is in the DB and where its stored
­ Can be used to define structure/requirements of data
­ “Data about data”
­ Data dictionaries show metadata of DB
­ Purpose: help people keep track of things
­ Database Management System (DBMS)​ ­ Collection of programs that manages the database
structure and controls access to the data stored in the database
­ e.g. Oracle
- Data in context:
­ SAP: all business data
­ Google: Google searches go through Google DB, cached versions stored in DB
­ Amazon: DB keeps info on products (price, quantity, seller etc.), user accounts (name, credit
card info)
­ Facebook: Personal data, location info
Types of Databases
­ DBs can be classified according to the ​number of users​, the ​location​, ​extent of use, type of user​,
etc.
­ Number of users:
- Single-user database (e.g. Personal computer DB)
- Multi-user database (e.g. Workgroup DB (<50 workers), Enterprise DB (>50 workers))
­ Location:
­ Centralised database ​­ supports data located at a single site
­ Distributed database ​­ supports data distributed across several different sites
­ How they will be used and on time sensitivity of info gathered from them:
­ Operational DB​ ­ support a company’s day­to­day operations
­ Data warehouse​ ­ Storing data used to generate info required to make tactical/strategic
decisions
­ Degree to which data is structured:
- Unstructured data - Data that exist in their original (raw) state (the format in which they were collected)
­ Structured data​ ­ Result of taking unstructured data and formatting such data to facilitate
storage, use, and generation of info
­ Semistructured data​ ­ Data that have already been processed to some extent
­ XML database​ ­ supports storage and management of semistructured XML data
­ Example DBs include:
­ Internet, Intranet and Extranet DB
Basic Terminology
­ Character​ ­ most basic element of data
­ Field​ ­ contains data (composed of characters) (e.g. Name)
­ Record​ ­ set of related fields (e.g. first name, last name, etc. of one user)
­ Database​ ­ collects related (somewhat logical) records
­ DBMS ​­ manages the database
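The terminology above can be sketched with a small, illustrative example (table and values are invented, not from the notes): each column is a field, each row a record, and the catalog entries describing the table are metadata, the "data about data" a data dictionary records.

```python
import sqlite3

# Hypothetical STUDENT table, used only to illustrate the terminology.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

# End-user data: the raw facts of interest (one record, two fields).
conn.execute("INSERT INTO student VALUES (1, 'S. Law')")

# Metadata: the structure of that data, read from SQLite's catalog.
fields = [(name, col_type) for _, name, col_type, *_ in
          conn.execute("PRAGMA table_info(student)")]
```

Here `PRAGMA table_info` plays the role of a data dictionary entry: it describes what type of data is in the table and how it is structured, independently of the stored facts themselves.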
1.4 Why database design is important
­ Database design​ ­ the activities that focus on the design of the DB structure that will be used to store
and manage end­user data
- Requires the designer to identify precisely the DB's expected use (affects its focus)
­ Appropriate data repositories and relationships must be carefully considered and implemented
­ A well­designed DB facilitates data mgmt and generates accurate and valuable info
­ Poorly designed ­> errors and bad decisions
1.5 Evolution of File System Data Processing
History of handling data
­ Manual filing​ systems
- Computerised filing systems via data files
­ Database systems
Manual filing
­ Papers within systems organised in order to facilitate expected use of data
­ As orgs grew + reporting requirements became more complex ­> keeping track of data in a manual file
system more difficult
File System Data Management
- Data processing (DP) specialist hired to create a computer-based system that would track data and
produce required reports
- Initially computer files were similar to manual files
- When business users wanted data from the computerised file -> request for data to DP specialist
- DP specialist would create program to retrieve data, manipulate to user request, and present as a
printed report
- As more computerised files were developed -> lots of data files contained related, overlapping data with no
means of controlling or managing the data consistently across all files
Problems with File Systems
- 3rd generation programming language (3GL) skills required, which are expensive
­ Data is handled by programs (as in the model)
­ Skills to organise data are not standardised ­> programmers must be familiar with the file
system (standardised SQL makes it easier to transfer workers)
­ Lengthy development times ­> difficult to get quick answers
­ System ​administration​ is ​difficult​ as number of files expands (requires multiple file management
programs)
­ Structural dependency​ ­ access to the file is dependent on its structure
­ Data dependency​ ­ Changes in data types require changing all the programs that access the
file
­ Each file must have its own file management system.
­ Modifications are likely to produce bugs
­ Data ​redundancy, inconsistencies​ and ​anomalies (​ modification anomalies, insertion anomalies
and deletion anomalies)
­ Data redundancy​ ­ exists when the same data is stored unnecessarily at different
places
­ Data inconsistency​ ­ exists when different and conflicting versions of the same data
appear in different places
­ Data anomaly​ ­ develops when not all of the required changes in the redundant data are
made successfully
­ Modification anomalies​ ­ Updated data in one file not reflected in others
­ Insertion anomalies​ ­ New data in one file not inserted in others
­ Deletion anomalies​ ­ Deleted data in one file not deleted in others
- Data integrity - The condition in which all of the data in the DB are consistent with the
real-world events and conditions, i.e. data is accurate and verifiable
­ Lack​ of ​security​ ­ Not centralised ­> only as safe as security implemented
­ Limited data sharing
­ Can’t have several computers/programs accessing the same data
­ Update issues
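A minimal sketch of redundancy and a modification anomaly (the data is invented for illustration): when the same fact, here a customer's address, is stored with every order, as in separate data files, an update that reaches only one record leaves conflicting versions behind.

```python
# Redundant storage: Ann's address is repeated in every order record,
# as it would be across uncoordinated data files.
orders = [
    {"order_id": 1, "customer": "Ann", "address": "1 High St"},
    {"order_id": 2, "customer": "Ann", "address": "1 High St"},
]

# Modification anomaly: the update is applied to one record only.
orders[0]["address"] = "9 Low Rd"

# Data inconsistency: two conflicting versions of the same fact.
versions = {o["address"] for o in orders if o["customer"] == "Ann"}
```

Storing the address once, in a central repository keyed by customer, is exactly what the DBMS approach in the next section enables.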
Database Management System (DBMS)
- DBMS - A data storage and retrieval system which permits data to be stored non-redundantly while
making it appear to the user as if the data is well integrated
- Means all data can be stored once (central repository)
- Can restrict read/write access using user rights
- DBMS serves as the intermediary between the user and the database
- Hides much of the database's internal complexity from application programs and users
­ Advantages of DBMS
­ ↓​ ​data ​inconsistency/anomalies
­ Data inconsistency​ ­ when different
versions of the same data appear in
different places
­ ↓​ data ​redundancy​ and ​↑​ data sharing
­ ↑ ​end user ​productivity
­ ↓​ data/structural​ dependency​ problems
­ Data independence​ ­ possible to change data type without affecting application
program’s ability to access the data
­ Structural independence​ ­ Possible to make changes in the file structure without
affecting the application program’s ability to access the data
­ Easier ​to create access, modify and delete data
­ Access through ad hoc queries
­ Enforces ​standards
­ Central​ security
­ Standardised ​backup​ and ​recovery
­ Concurrency​ handling (several computers interacting)
­ ↑ ​decision making with higher quality data
­ ↑​ data sharing
­ Disadvantages of DBMS
­ Increased costs​ for hardware, software, personnel
­ Management complexity​ e.g. different interfaces, security
­ Maintaining currency​ (keeping DB current through updates, patches etc.)
­ Vendor dependence
­ Frequent upgrade/replacement cycles
1.7 Database Systems
- The DB system consists of logically related data stored in a single logical data repository
- Centralised DB -> eliminates most of the file system's data inconsistency, anomaly, data dependence and
structural dependence problems
Database Environment
­ Database System​ ­ An organisation of components that define and regulate the collection, storage,
mgmt, and use of data within a DB environment
- Must be tactically and strategically sound, and cost-effective
­ Can be created and managed at different levels of complexity with varying adherence to precise
standards
Five components:
­ Hardware​ ­ All of the system’s physical devices
­ e.g. computers, storage devices, printers, network devices, etc.
­ Software​ ­ Three types of software needed to make DB function:
­ OS​ ­ manages all hardware components (e.g. Windows, OS X, Linux)
- DBMS software - Manages the DB within the DB system (e.g. Oracle, MySQL)
­ Applications and utilities​ ­ Used to access and manipulate data in the DBMS and manage the
computer environment in which data access and manipulation take place
­ People​ ­ All users of the DB system. 5 types of users:
­ System Admins​ ­ Oversee the DB system’s general operations
­ DB admins​ ­ manage the DBMS and ensure that the DB is functioning properly
­ DB designers​ ­ design the DB structure
­ System analysts/programmers​ ­ Design and implement the app programs (e.g. data entry
screens, reports, etc.)
­ End users​ ­ People who use the application programs to run the orgs daily operations (e.g.
Managers)
­ Procedures​ ­ The instructions and rules that govern the design and use of the DB system. Enforce the
standards.
­ Data​ ­ the collection of facts stored in the DB
DBMS Functions
­ Data dictionary mgmt​ ­ stores definitions of data elements and their relationships (metadata) in a data
dictionary
­ DBMS provides data abstraction, and it removes structural and data dependence from the
system
­ Data storage mgmt​ ­ Provides storage not only for data but for related data entry forms or screen
definitions, report definitions, data validation rules, etc.
­ also important for ​performance tuning​ ­ Activities that make the DB perform more efficiently in
terms of storage/access speed
­ Data transformation/presentation​ DBMS formats the physically retrieved data to make it conform to
the user’s logical expectations
­ Security mgmt​ ­ DBMS creates a security system that enforces user security/privacy
­ User access and operation (read, add, delete, modify) rules
­ Multi­user access control​ ­ Multiple users can access the DB concurrently without compromising the
integrity of the DB
­ Backup and recovery mgmt​ ­ provides to ensure data safety/integrity
­ Data integrity mgmt​ ­ DBMS promotes/enforces integrity rules ­> minimising data redundancy +
maximising data consistency
­ DB access languages and application programming interfaces​ ­ DBMS provides data access
through a non­procedural query language (user specifies what is to be done, not how its done) e.g.
SQL)
­ DB communication interfaces​ ­ Accept end­user requests via different network environments
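The "what, not how" point about non-procedural query languages can be sketched as follows (table and values are illustrative assumptions, not from the notes): the query names the rows wanted; the DBMS decides the retrieval strategy.

```python
import sqlite3

# Hypothetical EMPLOYEE table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ann", "Sales"), (2, "Ben", "IT"), (3, "Cal", "Sales")])

# Declarative SQL: we specify WHAT we want (Sales staff), not HOW to
# scan files, follow pointers, or loop over records.
names = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE dept = 'Sales' ORDER BY name")]
```

Contrast this with the file-system model, where a DP specialist would write a procedural 3GL program spelling out every retrieval step.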
Managing the database system: A shift in focus
­ The role of the human components changed from emphasis on programming (in file system) to focus
on the broader aspects of managing the orgs data resources
Chapter 2: Data Models
2.1 Data Modeling and Data Models
­ Data Model​ ­ A relatively simple representation, usually graphical, of more complex real­world data
structures
­ Represents data structures and their characteristics, relations, constraints, transformations, and
other constructs with the purpose of supporting a specific problem domain
­ “An abstraction of the real world”
­ Data modelling​ ­ Simple representation of complex world structures
­ An iterative, progressive process
- Can be classified based on their degree of abstraction:
- External
- Conceptual
- Internal
- Physical
2.2 The Importance of Data Models
­ Data models are a communication tool (between designers, programmers, and end users)
­ Data is viewed in different ways by different people BUT when a good DB blueprint is available, it does
not matter if views are different
2.3 Data Model Basic Building Blocks
­ Entity​ ­ Anything (person, place, thing, event) about which data are to be collected/stored
­ Entity type​ ­ the general (e.g. Person)
­ Entity instance​ ­ a particular example (e.g. Daniel)
­ Attribute​ ­ A characteristic of an entity
­ Relationship​ ­ An association among entities. Can be 1:M, M:N, 1:1
­ Constraint​ ­ A restriction placed on the data. Help ensure data integrity. Normally expressed in the
form of rules
2.4 Business Rules
­ Business rule​ ­ A brief, precise, and unambiguous description of a policy, procedure, or principle
within a specific organisation
­ Help to create and enforce actions within that orgs environment
­ Used to define entities, attributes, relationships, and constraints
­ Must be easy to understand and widely disseminated
Discovering Business Rules
­ Main sources of business rules are company managers, policy makers, department managers, and
written documentation
­ Direct interviews with end users are quick but may be less reliable
­ It pays to verify end­user perceptions
­ The process of identifying and documenting business rules is essential because:
­ Helps standardise company’s view of data
­ Communications tool between users and designers
­ Allow designer to understand the nature, role, and scope of the data
­ “” understand business processes
­ “” develop appropriate relationship participation rules/constraint
Translating Business Rules into Data Model Components
­ As a general rule, a noun in a business rule ­> entity, a verb ­> relationship
­ To properly identify the type of relationship, you should consider that relationships are bidirectional
­ Ask two Q’s:
­ How many instances of B are related to one instance of A?
­ How many instances of A are related to one instance of B?
Naming Conventions
­ Make objects unique and distinguishable from other objects
­ Entity names ­ Descriptive of the objects in the business environment, use familiar terminology
­ Attribute names ­ Descriptive of the data represented (also good to prefix with name of the entity)
­ Proper naming convention ­> self documenting
2.5 The Evolution of Data Models
­ Implementation models
­ Hierarchical DB models (not covered)
­ Network DB models (not covered)
­ Object­oriented DB models
­ Relational DB models
­ Conceptual Models covered in this course
­ Entity­relationship (ER) model
­ Object­oriented (OO) model
Hierarchical and Network Models
­ Hierarchical Model​ ­ Developed to manage large amounts of data for complex manufacturing projects
- Basic logic represented by an upside-down tree, contains levels (segments)
- Within the hierarchy, a higher layer is perceived as the parent of the segment directly beneath it,
called a child
Advantages​:
- Data retrieval can be fast
­ 1:M promotes data ​integrity
­ High ​security
­ Efficiency with 1:M​ fixed relationships
Disadvantages
­ Cannot support M:N relationships​ (not all situations call for only 1:M relationships)
­ Data ​dependency
­ No data definition​ or ​manipulation language
Network Model - Created to represent complex data relationships more effectively
- Allows a record to have more than one parent
Advantages:
- Handles M:N relationships (better reflects real life)
- Owner/member relationship promotes database integrity
- Data access and flexibility better than in the hierarchical model
Disadvantages​:
­ Difficult to design
­ Difficult to change​ once implemented
­ Data requests ​require highly technical skills​ (Programmers might have those, but
managers?)
­ Overall ​expensive
The Relational Model
­ Introduced in 1970 by E.F. Codd
- The DB only requires entities and the relationships between those entities
- Info is stored regarding entities and how they relate
- Relational diagram - A representation of the relational DB's entities, the attributes within those entities,
and the relationships between those entities
­ Advantages:
­ Ability to ​simplify complex relationships
­ Data ​independent
- Relatively easy to design and re-design the database
- Sophisticated Structured Query Language (SQL) leads to ability to implement ad hoc queries
- Disadvantages:
- Need for specialised staff
- Development, installation, maintenance and security costs
The Entity Relationship model
­ ER Model​ ­ A detailed, logical representation of the data for an
org or for a business area
­ Expressed in terms of ​entities ​in the business environment, the
relationships​ or associations among those entities, and the
attributes ​of both the entities and their relationships
- Normally represented using an ER diagram, a graphical representation of the ER Model. Two
notations:
- Chen notation (used in this course) - favours conceptual modeling
- Crow's foot - favours a more implementation-oriented approach
Object­Oriented Data Model (OODM)
- Uses the data approach to program, develop classes etc. and how they interact
- Data and relationships exist in a single structure known as an object
- OODM is the basis for the object-oriented database management system (OODBMS)
- OODM is a semantic model
- Contains meaning on relationships between facts in an object as well as info about
relationships with other objects
- Specialised for certain problems
- OODM allows an object to contain all operations that can be performed on it
­ OO Terminology:
­ Object​ ­ Abstraction of a real­world entity
­ Attributes​ ­ Describe properties of an object
­ Classes​ ­ Objects of similar characteristics
- Unified Modeling Language (UML) - based on OO concepts; describes diagrams and symbols used to
graphically model a system
The Future of Data Models
­ Hybrid DBMSs​ ­ Retain adv. of relational model, provide object­oriented view of underlying data
­ SQL data services​ ­ Store data ​remotely​ without incurring expensive hardware, software, and
personnel costs
­ Companies operate on a “​ pay­as­you­go”​ system/​cloud​­based system
Data Models: A summary
­ Common characteristics of data models to be accepted:
­ Some degree of conceptual simplicity without compromising the semantic completeness
- Must represent the real world as closely as possible
- Behaviour must be in compliance with consistency/integrity characteristics of any model
2.6 Degree of Data Abstraction
­ Data abstraction​ ­ ​reduction of a particular body of data to a simplified representation of the whole
External Model
­ External Model​ ­ the end users’ view of the data environment
­ Subsets of database based on permissions
- A specific representation of an external view is known as an external schema
- Advantages of using external views:
- Easy to identify specific data required for each business unit
- Makes designer's job easy by providing feedback about model's adequacy
- Ensure security constraints in the DB design
- Makes application program development much simpler
Conceptual Model
­ Conceptual Model​ ­ a global view of the entire DB as viewed by the entire org (i.e. integrates all
external views)
­ Basis for identification and high­level description of the main data objects
­ Uses two techniques:
­ ER Modelling​ ­ Top­down approach. Begins by looking for the data groups in the system
­ Based off the real world
­ Normalisation​ ­ Bottom­up approach. Begins by looking at the smallest individual items of data
recorded by the system
­ Building on first approach, fine tuning
­ Advantages of conceptual model:
­ Provides a relatively easily understood bird’s­eye (macro level) view of the data environment
­ Logical design​ ­ Both ​software independent​ (model does not depend on DBMS) and
hardware independent​ (model does not depend on hardware used in implementation)
­ ∴​ changes can be made with no effect on database design
Internal Model
­ Internal Model​ ­ Representation of the database as “seen” by the DBMS
­ Used when database is implemented
­ Internal Schema​ depicts specific representation of an internal model, using the database constructs
supported by the chosen database
­ i.e. depends on specific database software
­ ∴ ​A change in DBMS software ­> internal model must change
­ Logical independence​ ­ You can change the internal model without affecting conceptual model
Physical Model
­ Physical Model​ ­ operates at the lowest level of abstraction, describing the way data are saved on
storage media such as disks or tapes
­ Definition of both physical storage devices and (physical) access methods required
- Precision required -> DB designers who work at this level have detailed knowledge of
hardware/software
­ Relational model logical ­> does not require physical­level details
­ Implementation​ of relational model may require physical­level fine­tuning for ​↑ ​performance
Chapter 4: Entity Relationship (ER) Modeling
4.1 The Entity Relationship Model (ERM)
Entities
­ Entities​ ­ an object about which the system requires to hold data
­ Entity type​ (class) ­ a collection of entities that share
common properties or characteristics (e.g. Person)
­ Entity instance​ ­ A single occurrence of an entity type
Attributes
­ Attributes​ ­ A property or characteristic of an entity that is of interest to the org
­ Each entity type has a set of general attributes associated with it
­ e.g. STUDENT has “Student ID”, “Student Name”, ...
­ Each entity instance has specific values of the attributes associated with it
­ e.g. S. LAW has “S221”, “Law, S.”, …
­ Can be ​required​ or ​optional
­ Attributes have ​domains​ (the attributes set of possible values)
­ Types of attributes:
­ Composite attribute​ ­ Super­set of sub­attributes (e.g. Address (= street, city, state and area
code))
- Composite key - Two attributes used together to identify an instance (a composite PK) (e.g. Flight_ID)
­ Simple attribute​ ­ cannot be subdivided (e.g. Student_ID)
­ Single­valued attribute​ ­ only has one value (simple or composite)
­ Multi­valued attribute​ ­ Can have many values (e.g. Skill)
­ To split ­ make new attributes for each instance OR make new entity
­ Represented using double lines
­ Derived attribute​ ­ Derived using an algorithm (not physically stored) (e.g. Years employed)
­ Represented using a dotted line
­ Can be as simple as adding two attribute values
- Key attribute - Unique, so as to identify the entity
­ e.g. zID, Telephone
Keys
­ Key​ ­ An attribute/set of attributes whose values uniquely identify one occurrence of that entity
­ Candidate Key​ ­ an attribute that ​uniquely identifies each instance​ of an entity type (potential key)
­ Primary Key (PK)​ ­ ​Candidate key that has been selected​ to be used as an identifier for an entity type
­ Key you actually use
­ Characteristics of a good PK:
­ Unique values​ ­ PK must uniquely identify each entity instance. Cannot contain NULLS
­ Nonintelligent​ ­ PK should not have embedded semantic meaning other than identifying
­ No change over time​ ­ PK should be permanent and unchangeable otherwise update
issues for FKs, etc.
­ Preferably single­attribute​ ­ Simpler for linking FKs
­ Preferably numeric​ ­ Can implement counter style auto­increments
­ Security compliant​ ­ Don’t use sensitive data (e.g. social security number) for ID
- Foreign Key (FK) - An attribute that contains a data item that is the PK of another entity
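The PK characteristics and the PK/FK link can be sketched in SQL (table and column names are illustrative assumptions): the FK column in ENROLMENT holds values found in STUDENT's PK column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# PK: unique, nonintelligent, numeric, single-attribute, no NULLs.
conn.execute("""CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL)""")

# FK: enrolment.student_id must hold a value found in student's PK.
conn.execute("""CREATE TABLE enrolment (
    course_id  TEXT,
    student_id INTEGER REFERENCES student(student_id))""")

conn.execute("INSERT INTO student VALUES (1, 'S. Law')")
conn.execute("INSERT INTO enrolment VALUES ('INFS1603', 1)")  # FK matches a PK
```

An enrolment pointing at a student_id that does not exist in STUDENT would be rejected by the DBMS, which is the referential-integrity behaviour discussed in Chapter 3.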
Relationships
­ Relationship​ ­ A link between two entities (participants) which is significant for the system
­ Relationships always operate in both directions
­ Degree of a relationship​ ­ the number of entity types that participate in that relationship
­ e.g. Unary, Binary, Ternary, Quaternary
­ Relationships can be:
­ One to one
­ One to many
­ Many to many
­ Recursive (in a unary relationship)
- Relationship strength - How the PK of a related entity is defined
- Weak (non-identifying) relationships - PK of the related entity does not contain a PK
component of the parent entity (i.e. entity is independent)
- Strong (identifying) relationships - PK of the related entity contains a PK component of the
parent entity (i.e. entity is dependent/weak)
Connectivity
­ Connectivity​ ­ Describes the relationship classification (e.g. 1:1, 1:M, M:N)
- Indicated on the ER diagram using numeric notation
Cardinality
- Cardinality - The specific number of entity occurrences associated with one occurrence of a related
entity
­ “For example, the cardinality (1,4) written next to the CLASS entity in the “PROFESSOR
teaches CLASS” relationship indicates that each professor teaches up to four classes, which
means that the PROFESSOR table’s primary key value occurs at least once and no more than
four times as foreign key values in the CLASS table.”
­ Indicated by placing appropriate numbers besides the entity using the format (x, y) where x = min and y
= max
­ DBMS cannot handle implementation of cardinalities at the table level ­ provided by the application
software or by triggers
­ Cardinality constraint​ ­ The number of instances of entity A that can be associated with each instance
of entity B
­ Derived from business rules
­ Minimum cardinality​ ­ Minimum number of instances of one entity that is associated with each
instance of another entity
­ Maximum cardinality​ ­ Maximum number of instances…
- Relationship participation - A participating entity in a relationship can be either optional or mandatory
- Determined by specific meaning of the terms used (depends on context, need to state
assumptions)
- If Entity A has an optional relationship with Entity B, it is represented with a circle on the ER diagram
Weak Entities
- Weak entity - An entity that relies on the existence of another entity. It has a PK that is partially or
totally derived from the parent entity
- Indicated on ER Diagram using a double-walled entity rectangle
- Implemented in the DBMS if an entity has a mandatory FK
- Meets two conditions:
- Existence-dependent - Cannot exist without the entity with which it has a relationship
- Has a PK that is partially or totally derived from the parent entity in the relationship
- DB Designer usually determines whether an entity can be weak based on business rules
- If it is existence-independent (exists apart from related entities) -> strong (or regular) entity
Composite Entity
­ Composite entity​ ­ An entity type that associates the instances of one or more entity types. Contains
attributes that are peculiar (singular) to the relationship between those entity instances
­ Turn a relationship into an entity for additional info on relationships
- M:N relationships should be avoided as relational databases can only handle 1:N relationships
- M:N relationships should be decomposed to 1:M relationships via a composite entity
- The composite entity:
- Builds a bridge between the original entities
- Composed of the PKs of the original entities
- Is existence-dependent on the original entities
- May contain additional attributes
- Makes it easier to add info (new rows rather than columns)
- Surrogate key - Not derived from data but artificially created for the composite entity - AVOID!
- Stops cascading delete as composite entity is no longer reliant on FKs
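The decomposition can be sketched as follows (entity and attribute names are illustrative assumptions): an M:N "STUDENT takes COURSE" relationship becomes two 1:M relationships through a composite ENROLMENT entity whose PK is composed of the original entities' PKs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE course  (course_id  TEXT PRIMARY KEY)")

# Composite (bridge) entity: 1:M from STUDENT, 1:M from COURSE.
conn.execute("""CREATE TABLE enrolment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  TEXT    REFERENCES course(course_id),
    grade      TEXT,                      -- attribute peculiar to the link
    PRIMARY KEY (student_id, course_id))  -- composed of the original PKs
""")

conn.execute("INSERT INTO student VALUES (1)")
conn.execute("INSERT INTO course VALUES ('INFS1603')")
conn.execute("INSERT INTO enrolment VALUES (1, 'INFS1603', 'HD')")
```

Each new student-course pairing is simply a new row in ENROLMENT, which is the "new rows rather than columns" point above.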
Supertype and Subtype
­ Supertype​ ­ A more ​generic​ entity type compared to its subtypes
­ Subtype​ ­ A more ​specific​ entity type compared to its supertype
­ Inherits ​all attributes of the supertype
­ Has additional, ​specific attributes
­ An instance of a subtype is also an instance of a supertype BUT
an instance of a supertype may or may not be an instance of one
or more subtypes
Generalisation and specialisation
­ Generalisation​ ­ The process of defining a general entity type from a set of specialised entity types
­ Bottom­up ​process from subtypes to supertypes
­ Specialisation​ Defining one or more subtypes of the supertype
­ Top­down​ ​process from supertypes to subtypes
Constraints
­ Completeness constraint​ ­ whether an instance of a supertype must also be an instance of at least
one subtype
­ Total specialisation rule​: Yes!
­ Partial specialisation rule​: No!
- Disjointness constraint - whether an instance of a supertype may simultaneously be a member of
two (or more) subtypes
- Disjoint constraint rule: No!
- Overlap constraint rule: Yes!
- Subtype discriminator(s) - the attribute(s) of the supertype that determine (code, note, identify) the
target subtype
- Disjoint constraint rule: One attribute
- Overlapping constraint rule: composite attribute/several attributes
4.2 Developing an ER Diagram
­ An ​iterative process​, thus, based on repetition of processes and procedures. Usually involves the
following activities
­ Create a detailed narrative of the orgs operations
­ Identify the business rules based on the description of operations
­ Identify the main entities and relationships from the business rules
­ Develop the initial ERD
­ Identify the attributes and PKs that adequately describe the entities
­ Revise and review the ERD
­ During review, likely to uncover new objects, attributes, relationships, etc ­> important
­ During design, DB designer can gain info from interviews BUT also examining business forms/reports
ER Modelling Guideline
­ Data items should be put into ​logical groups
­ For each data group/entity type, there should be a ​key​ that uniquely identifies indv. members of entity
type
­ There should be ​no redundant data​ in the model
­ Ask yourself the following Q’s:
­ What are the relevant entities here?
­ What are the relevant relationships here?
­ Can I generalise some entities?
­ Document your ​assumptions​ as you go
­ Leave ​cardinalities​ until the end
- There is no mechanical procedure; use rules of thumb and intuition. You will need many drafts!
4.3 Database Design Challenges: Conflicting Goals
­ DB designers often make design compromises triggered by conflicting goals such as:
­ Adherence to design standards​ ­ Design standards help guide you in developing logical
structures that minimise data redundancies
- Processing speed - Many orgs prioritise processing speed, i.e. minimal access time,
which may be achieved by minimising the number/complexity of logically desirable relationships
­ Information requirements​ ­ May prioritise info generation which may ­> data transformations
which may expand number of entities/attributes ­> sacrifice “clean” design and/or high speed
­ Design is important BUT must meet end user requirements such as performance, security, shared
access, data integrity, query/reporting needs etc.
­ Documentation is important to understand and modify designs, ensures data compatibility and
coherence
Chapter 3: The Relational Database Model
3.1 A Logical View of Data
Relational Model
- Relational Model - Represents data in a two-dimensional table called a relation. Includes:
- Relations - Two-dimensional tables
- Attributes - The column headers of a relation
- Tuples - The rows of a relation (records, connected)
- The name of a relation (table) and its set of attributes (column headers) are the schema for the relation
- Blueprint, no data
- Database schema (metadata) - the set of schemas for all relations in the design
- Data dictionary - Describes the DB schema
- Usually implemented in a RDBMS (relational database management system) such as Oracle
­ Relation​:
­ Every relation has a ​unique name
- Every attribute value is atomic (no multi-value records)
­ Every row is ​unique
­ Attributes​ in tables have ​unique names
­ Can be same name if in different tables but should refer to the same info
­ Order of the columns/rows​ is ​irrelevant
3.2 Keys
­ Candidate Key​ ­ Any set of one or more columns whose combined values are unique among all
occurrences (i.e. tuples or rows)
­ Primary Key (PK)​ ­ the PK is any candidate key of that table which the DB designer arbitrarily
designates as “primary”
­ Alternate Key​ ­ the AKs are any candidate keys not currently selected as the PK
­ Foreign Key (FK)​ ­ A set of one or more columns in any table which may hold the values found in the
PK column of another table
­ The key’s role is based on ​determination​ (i.e. “A determines B” means if you know A you can
determine the value of B)
­ Determination is used in the definition of ​functional dependence​ ­ ​“The attribute B is
functionally dependent on the attribute A if each value in column A determines one and only one
value in column B.”
3.3 Integrity Rules
­ Three basic types of ​database integrity constraints:
1. Entity integrity​ ­ Requires that each row in a table has a unique PK value (no NULLs)
­ NULLs should be avoided because their meaning isn’t clear; some designers use ​flags
to indicate the absence of some value (e.g. ­99 to show no value has been assigned)
2. Referential integrity​ ­ Requires, for any ​FK​ value, the existence of a corresponding PK in the
referenced table
­ Cascading updates/deletes preserve integrity when related records are changed or deleted
3. Domain integrity​ ­ Restricts the data in a column to its ​predefined data type
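All three constraint types can be declared in SQL. A minimal sketch using SQLite via Python’s sqlite3 module (the notes use Oracle; SQLite stands in here, and the vendor/product tables and column names are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

con.execute("""CREATE TABLE vendor (
    v_code INTEGER PRIMARY KEY,              -- entity integrity: unique, not NULL
    v_name TEXT NOT NULL)""")
con.execute("""CREATE TABLE product (
    p_code  INTEGER PRIMARY KEY,
    p_price REAL CHECK (p_price >= 0),          -- domain integrity via CHECK
    v_code  INTEGER REFERENCES vendor(v_code))  -- referential integrity
""")

con.execute("INSERT INTO vendor VALUES (1, 'Acme')")
con.execute("INSERT INTO product VALUES (10, 9.95, 1)")  # OK: FK matches a PK

try:
    con.execute("INSERT INTO product VALUES (11, 5.00, 99)")  # no vendor 99
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The bad insert is rejected by the DBMS itself, not by application code, which is the point of declaring the constraints in the schema.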
3.4 Relational Set Operators
­ Relational algebra​ ­ Defines the theoretical way of manipulating table contents using the right
relational operations:
­ SELECT ​­ yields values for all rows found in a table that satisfy a given condition (horizontal)
­ PROJECT ​­ yields all values for selected attributes (vertical)
­ UNION​ ­ combines all rows from two tables, excluding duplicate rows (must be ​union compatible
­ tables have same attribute characteristics)
­ INTERSECT​ ­ yields only the rows that appear in both tables
­ DIFFERENCE​ ­ yields all rows in one table that are not found in the other table
­ PRODUCT ​­ yields all possible pairs of rows from two tables (known as a Cartesian product)
­ JOIN ​­ allows info to be combined from two or more tables
­ Inner Join​ ­ only returns matched records from the tables that are being joined
­ Natural Join​ ­ Links tables by selecting only the rows with common values in
their common attributes
­ Equality Join​ ­ Links tables on the basis of an equality condition (=) that
compares specified columns of each table
­ Theta Join​ ­ Use of any other comparison operator (>, <, etc.) to link tables
­ Outer Join​ ­ Matched pairs retained, any unmatched values left null
­ Left Outer Join​ ­ yields all rows from table A, inc. those not matched in table B
­ Right Outer Join​ ­ yields all rows from table B, inc. those not matched in table A
­ Full Outer Join​ ­ yields all rows from table A and table B
­ DIVIDE ​­ Uses one single­column table as the divisor and one 2­column table as the dividend
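The set operators above map directly onto SQL. A small sketch in SQLite via Python’s sqlite3 (table contents are illustrative; note that SQL spells DIFFERENCE as EXCEPT, and the two tables must be union compatible):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE r (id INTEGER, name TEXT);
    CREATE TABLE s (id INTEGER, name TEXT);   -- union-compatible with r
    INSERT INTO r VALUES (1,'Ann'), (2,'Ben');
    INSERT INTO s VALUES (2,'Ben'), (3,'Cat');
""")

# UNION combines all rows and removes duplicates
union = con.execute(
    "SELECT * FROM r UNION SELECT * FROM s ORDER BY id").fetchall()
# INTERSECT keeps only rows appearing in both tables
intersect = con.execute(
    "SELECT * FROM r INTERSECT SELECT * FROM s").fetchall()
# EXCEPT (difference) keeps rows in r that are not in s
difference = con.execute(
    "SELECT * FROM r EXCEPT SELECT * FROM s").fetchall()

print(union)       # all three people, (2,'Ben') only once
print(intersect)   # only the shared row
print(difference)  # rows of r not in s
```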
3.5 The Data Dictionary and the System Catalog
­ Data dictionary​ ­ Provides a detailed description of all tables found within the user/designer­created
database (contains all attribute names/characteristics ­ metadata)
­ System Catalog​ ­ A ​detailed system data dictionary​ that describes all objects within the DB, inc. data
about table names, the table’s creator/creation date, no. of columns in each table, data type of each
column, index file names, index creators, authorised users, and access privileges
­ Automatically produces DB documentation
­ In general terms, ​homonyms​ (same attribute name for different attributes) and ​synonyms​ (different
names to describe the same attribute) must be avoided
3.6 Relationships within the relational database
Conceptual Model to Relational Model
­ In general, ​each entity will be converted to a relation​. The attributes of the entity become the
attributes of the relation
­ Eliminate ​composite and multi­valued attributes
­ Translate each ​entity ​into a r​ elation​ (table)
­ Translate appropriate ​relationships​ into a ​relation​ (others might just be a FK link)
Examples of mapping the ER Diagram to the Relational Model on the next page
3.7 Data redundancy revisited
­ The proper use of FKs does not eliminate data redundancies, but ​minimises​ them
­ Data redundancies can be damaging ­> proper use of FKs reduces this risk
­ Sometimes data redundancies are required, e.g. To preserve historical accuracy of data, make
searching easier
3.8 Indexes
­ Index​ ­ An orderly arrangement used to logically access rows in a table. Composed of an ​index key
(the index’s reference point) and a set of pointers (where the data is)
­ Purposes of indexes in DBMSs:
­ Retrieve data more efficiently
­ Retrieve data ordered by a specific attribute or attributes (e.g. can index customer’s last name
and order alphabetically)
­ Unique index​ ­ an index in which the index key can have only one pointer value (row) associated with
it (e.g. the PK)
­ A table can have many indexes, but each index is associated with only one table
­ Index key can have multiple attributes (composite index)
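A quick sketch of ordinary and unique indexes, again using SQLite through Python’s sqlite3 as a stand-in (customer table and index names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (cus_code INTEGER, cus_lname TEXT)")

# Ordinary index: speeds up retrieval/ordering by last name
con.execute("CREATE INDEX idx_lname ON customer (cus_lname)")

# Unique index: each index key may point to at most one row
con.execute("CREATE UNIQUE INDEX idx_code ON customer (cus_code)")

con.execute("INSERT INTO customer VALUES (1, 'Smith')")
try:
    con.execute("INSERT INTO customer VALUES (1, 'Jones')")  # duplicate key
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the unique index blocks the duplicate
```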
3.9 Codd’s Relational Database Rules
­ Published in 1985 by Dr. E. F. Codd to define a relational database as vendors were marketing
products as relational when they were not.
­ Note: even the dominant DB vendors do not fully support all 12 rules
Chapter 6: Normalisation of Database Tables
6.1 Database Tables and Normalisation
Logical Data Modelling
­ Conceptual Data Model​ ­ Represents the ​conceptual view​ of org data (e.g. ER Model)
­ Logical Data Model​ ­ Describes org data in a way that could be used for ​implementation​ in a DBMS
(e.g. Relational Model).
­ The logical model is still independent of any particular DBMS
Redundancy
­ DB designers aim to ​reduce redundancy​ (i.e. DB should not store same data several times) to save
space and prevent problems
­ Aim for the ​rule(s) of one:
­ One ​type of item/entity type​ = (only) one ​relation/table
­ One ​item/entity instance​ = (only) one ​tuple/row
­ One ​fact/attribute​ about an entity = (only) one ​attribute/column
­ Each attribute should describe (only) the entity type (relation/table) it belongs to
­ To achieve these aims, we use ​normalisation techniques
Normalisation
­ Normalisation​ ­ A process for converting complex data structures (relations) into simpler, more stable
data structures
­ “Don’t add columns, add rows”
­ Normalisation:
­ Is a ​process​ that is accomplished in ​stages
­ Is a technique that is used to ​define “goodness”​ (or “badness”) of a relation
­ Results in data structures that have some ​desirable​ (“good”) ​properties
­ Normal Form​ ­ a certain ​state​ of a ​relation​. Can be determined by applying ​rules regarding dependencies
­ Uses a concept known as ​functional dependency
Functional Dependency
­ Functional Dependency​ ­ a ​semantic restriction​. It expresses the fact that some values for a relation
are not possible, given the way the world works
­ FDs are…
­ relationships​ between ​attributes ​in a relation
­ semantics ​of the attributes in a relation
­ can be inferred in a systematic way ​by applying a set of ​inference rules
­ Inference Rule​ ­ Logic rule for determining FD
­ A→B​ is a functional dependency. Read: A determines B.
­ In a relation R: An attribute B is “functionally dependent” on an attribute A if the value of A
uniquely determines the value of B
­ Armstrong’s Inference Rules​ ­ a set of inference rules that can be used to ​infer all the FDs​ based on
a given set of FDs. Three rules (if x, y, z, w are attributes of a relation R) are:
1. Inclusion (Reflexive) Rule​ ­ ​if y ⊆ x then x → y
­ (⊆ = is a subset of)
­ e.g. IF State ⊆ Postcode, then Postcode→State.
­ 2052→NSW, 3000→VIC
2. Augmentation Rule​ ­ ​if x→y then wx→wy
­ e.g. if Postcode→State then Suburb,Postcode→Suburb,State
­ Randwick,2052→Randwick,NSW
3. Transitivity Rule​ ­ ​if x→y and y→z then x→z
­ e.g. if Postcode→Suburb and Suburb→State then Postcode→State
­ 2052→Randwick and Randwick→NSW then 2052→NSW
Armstrong’s rules can be used to determine ​extended inference rules
­ Additivity (Union) Rule​ ­ ​if x→y and x→z then x→yz
­ IF Postcode→State AND Postcode→Suburb
THEN Postcode→Suburb,State
­ Combines ​Transitivity​ and ​Augmentation
­ Decomposition (Projective) Rule​ ­​ if x→yz then x→y and x→z
­ IF Postcode→Suburb,State
THEN Postcode→Suburb AND Postcode→State
­ Reverse of ​additivity rule
­ Pseudotransitivity Rule​ ­ ​if x→y and wy→z then wx→z
­ IF Suburb→City AND Postcode,City→State
THEN Postcode,Suburb→State
­ Augmentation gives wx→wy; transitivity with wy→z then gives wx→z
­ Accumulation Rule​ ­ ​if x→yz and z→bw then x→yzbw
­ Decomposition​ gives x→z, ​transitivity​ gives x→bw, ​additivity​ then combines x→yz and
x→bw into x→yzbw
Sets of FDs:
­ F​ ­ a set of given FDs
­ F+​ ­ the set of all implied FDs (the full set). Called the ​closure of F
­ F​min​ ­ the minimal set (minimal cover) of FDs equivalent to F
­ No redundancies ­ does not lose info; F and F+ could be determined from F​min
­ Use Armstrong’s inference rules to change F into F+ or F​min
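One systematic way to apply the inference rules is the standard attribute-closure algorithm: starting from a set of attributes, repeatedly fire every FD whose left-hand side is already determined. A short sketch (the Postcode/Suburb/State FDs are taken from the examples above):

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is determined, add the right-hand side
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F: Postcode -> Suburb, Suburb -> State
fds = [({"Postcode"}, {"Suburb"}), ({"Suburb"}, {"State"})]

# Transitivity falls out automatically: Postcode also determines State,
# i.e. Postcode -> State is in F+ even though it is not in F.
print(closure({"Postcode"}, fds))
```

This gives a practical membership test for F+: an FD x→y is implied by F exactly when y is contained in the closure of x.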
6.2 The Need for Normalisation
Lossless Decomposition
­ Our aim is to ​decompose​ relations/tables so as to ​reduce size/redundancy
­ We use ​inference rules​ for this decomposition ​process
­ We need to be sure that the decomposed components (tables/relations) have the ​lossless​ join
property (i.e., decomposed components can be joined back together to the original table/relation)
Normalisation
­ Normalisation​ is a process for converting a relation to a ​standard (normal) form​. It is about being able to:
­ Decompose a relation/table into ​smaller components
­ In such a way that we could r​ ecapture the precise content​ of the original relation/table if we
would join (i.e. natural join) the decomposed components
­ Based on paper: Codd (1971)
­ Reasons for applying normalisation:
­ Minimise/​eliminate redundancy​ (duplicate data; one entity is recorded more than once in the DB)
­ Prevent data inconsistencies​ through update, deletion, and insertion ​anomalies
­ Addition/insertion anomaly​ ­ Failure to add new data in all places where it needs to
be added (conflicting data)
­ Deletion anomaly​ ­ Failure to remove data from all places where it needs to be removed
­ Update anomaly​ ­ Failure to update data in all places where it needs to be updated
­ To make ​database design consistent
6.3 The Normalisation Process
­ Two types of functional dependence:
­ Partial dependency​ ­ exists when there is a
functional dependence in which the determinant is
only part of the PK
­ For example, if (A, B) → (C,D), B → C, and
(A, B) is the primary key, then the functional
dependence B → C is a partial dependency
because only part of the primary key (B) is
needed to determine the value of C
­ Straightforward to identify
­ Transitive dependency​ ­ such that X → Y, Y → Z,
and X is the primary key. In that case, the
dependency X → Z is a transitive dependency
because X determines the value of Z via Y.
­ More difficult to identify BUT will occur only
when functional dependence exist among
nonprime attributes
1NF
­ Aim: Create a ​valid​ ​relation
­ A relation/table is in ​1NF​ if:
­ All ​attributes​ contain only ​atomic values​ (i.e., there are no multivalued attributes)
­ All ​PK attributes​ are ​defined and not NULL​ (i.e. there is at least one candidate key)
­ Actions to create/check 1NF:
­ Add ​appropriate entry​ in at least the ​PK column(s)
­ Avoid/​split multivalued attributes​ and avoid/​split repeating groups of data​ (i.e. transform
multivalued attributes to additional columns, or better, additional rows (via a new table))
2NF
­ Aim: remove ​partial dependencies​ (no repeating values in non­key fields)
­ A relation/table is in ​2NF ​if:
­ Each non­key field is functionally dependent on the entire PK (​no partial dependencies​)
­ The relation/table is in 1NF
­ Actions to create/check 2NF:
­ Draw ​FD​ and ​partial dependency​ diagrams
­ Remove ​partial dependencies​ (attributes not functionally dependent on the entire PK) by
separating the data items into a separate relation using appropriate PKs (may need a
bridge/junction table)
­ Hint: Look for values that occur multiple times in non­key fields. This tells you that you have too
many fields in a single table. In a well­designed DB, the only duplicated data is in key
fields used to connect tables
3NF
­ Aim: Remove ​transitive dependencies​ (non­key data that depends on other non­key attributes)
­ A relation/table is in ​3NF​ if:
­ It has ​no transitive dependencies​ (no non­key attributes determined by other
non­candidate­key attributes)
­ It is in 2NF
­ Action to create/check 3NF:
­ Identify and remove ​transitive dependencies
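A worked sketch of a 2NF decomposition and the lossless-join check, using SQLite via Python’s sqlite3 (the student/enrolment tables are invented for illustration): the PK of the first table is (stu_id, course), but stu_id alone determines stu_name, a partial dependency.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- PK is (stu_id, course); stu_id -> stu_name is a partial dependency,
    -- so each student's name is stored redundantly, once per enrolment.
    CREATE TABLE enrolment (stu_id INTEGER, course TEXT, stu_name TEXT);
    INSERT INTO enrolment VALUES
        (1,'INFS1603','Ann'), (1,'ACCT1501','Ann'), (2,'INFS1603','Ben');

    -- 2NF decomposition: move the partially dependent attribute out
    CREATE TABLE student AS SELECT DISTINCT stu_id, stu_name FROM enrolment;
    CREATE TABLE takes   AS SELECT stu_id, course FROM enrolment;
""")

# Lossless join: the natural join of the components recaptures the original rows
rejoined = con.execute("""
    SELECT stu_id, course, stu_name
    FROM takes NATURAL JOIN student
    ORDER BY stu_id, course""").fetchall()
print(rejoined)
```

Each student name is now stored once in `student`, and joining the components back together loses nothing, which is the lossless-join property the decomposition must preserve.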
6.4 Improving the design
Areas to consider:
­ Evaluate PK assignments
­ Evaluate naming conventions
­ Refine attribute atomicity (keep subdividing attributes until they can no longer be subdivided)
­ Identify new attributes
­ Identify new relationships
­ Refine PKs as required for data granularity
­ Granularity​ ­ The level of detail represented by the values stored in a table’s row
­ Using a surrogate PK provides lower granularity and yields greater flexibility
­ Maintain historical accuracy (may require redundant data to ensure values aren’t changed)
­ Evaluate using derived attributes
6.5 Surrogate key considerations
­ Surrogate key may be used when:
­ Composite PK is too cumbersome to use, difficult to write search routines
­ PK might have too much descriptive content to be usable
­ Other reasons (e.g. To maintain historical data)
­ Surrogate key usually system­defined, managed via DBMS, numeric, automatically incremented
­ Decision requires trade­offs and professional judgement
­ Limitations might be undesirable from a managerial point of view ­> surrogate keys
6.6 Higher­Level Normal Forms
Boyce­Codd Normal Form (BCNF)
­ Aim: Higher normal forms such as BCNF address some specific problems that can remain in 3NF
(nonetheless, 3NF is widely considered to be “sufficient” by DB designers)
­ A table is in BCNF when every determinant (left hand side of dependency) is a candidate key
­ ∴ ​BCNF can only be violated (in 3NF) if a table contains more than one candidate key
­ A relation/table is in ​BCNF​ if:
­ No non­key attribute​ determines ​part of the PK​ (e.g. if B is part of the PK but C→B holds,
the table is not in BCNF)
­ It is in 3NF
­ Based on the paper Boyce & Codd (1974)
­ Sometimes called ​3.5NF
­ 3NF is always achievable; BCNF is not always achievable (Beeri & Bernstein 1979)
4NF
­ Aim: Remove ​multivalued dependencies​ (one key determines multiple values of two other attributes,
and those attributes are independent of each other)
­ A relation/table is in ​4NF​ if:
­ No row contains two or more multivalued facts about an entity (no multivalued dependencies)
­ The table is in 3NF
­ Action to create/check 4NF:
­ Create new tables for the components of multivalued dependencies
­ Note: 4NF is largely academic; problems shouldn’t be encountered if proper design procedures are
used
6.7 Normalisation and Database Design
­ Normalisation should be part of the design process
­ You should be aware of good design principles and procedures as well as normalisation procedures:
­ ERD is created through iterative process
­ Normalisation focuses on the characteristics of specific entities (a micro view of the ERD), so it is
difficult to separate normalisation and ER modelling
6.8 Denormalisation
­ Normalisation is only ​one of many DB design goals
­ Normalised (decomposed) tables require ​additional processing​ ­> ​↓ ​processing speeds
­ Normalisation ​purity​ is often ​difficult t​ o sustain in the modern DB environment
­ Conflicts between design efficiency, info requirements, and processing speed solved through
compromises/tradeoffs​ inc. ​denormalisation
­ Denormalisation​ ­ Process of attempting to optimise the performance of a DB by (re­)adding
redundant data or by grouping data (reverse process of normalisation)
­ Advantage of higher processing speed must be carefully weighed against disadvantage of data
anomalies
­ Further, some anomalies are of only theoretical interest and are not practical to remove (e.g. a
separate ZIP table (ZIP_Code, City) split out of a customers table)
­ Use common sense
­ Defects of unnormalised tables:
­ Data anomalies
­ Less efficient data updates due to larger tables
­ More cumbersome indexing
­ No simple strategies for creating ‘views’ (virtual tables)
Summary
­ Normalisation is a ​table design technique​ aimed at minimising data redundancies
­ First 3 normal forms (1NF, 2NF, 3NF) are most commonly used
­ Normalisation is an important part ­ but ​only a part​ ­ of the design process
­ Best practice: Continue the iterative ER process until all entities and their attributes are defined and all
equivalent ​tables are in 3NF
­ In exam:
­ If 3NF isn’t necessary, explain why ­ looks good
­ Go through steps of normalisation
Lecture Notes ­ Try to find a place to put these
Argument
­ Argument​ ­ In logic, an argument is a ​set of statements​ of which some (the ​premises​) are
intended to support another statement (the ​conclusion​)
­ “Valid” ​argument =/= ​“True”​ argument
­ Valid means the argument is following a logical structure (“truth preserving”)
­ Valid does not mean the contents are true (premise must be right)
Deduction
Deduction/deductive argument​ ­ An argument in which the ​truth of the conclusion​ necessarily follows from
the ​truth of the premises
­ Makes an ​absolute ​argument
­ A DA is ​“valid”​ if it succeeds in providing logical support for its conclusion (if all premises are
true, then the conclusion must be true)
­ e.g. A>B and B>C then A>C (Daniel is human, humans are mortal ­> Daniel is mortal)
­ A DA is ​“invalid”​ if the truth of the premises does not guarantee that the conclusion is true. Not
sound
­ e.g. A>B and A>C then B>C (Daniel is a lecturer, Daniel is German ­> lecturers are
German)
­ The logical structure of a deductive argument is ​“truth preserving”​: the truth of the premises is
preserved onto the conclusion
­ A deductively valid argument with true premises is ​“sound”
Induction
­ Induction/Inductive Argument​ ­ An argument whose ​probabilistic support of the conclusion
stems from data/real­world observation
­ Claims the conclusion is ​likely true​, but not necessarily true (the best answer)
­ An argument is ​strong​ if it is backed up by significant support, and ​weak​ if it is without such
support
­ A good inductively strong argument with true premises is ​“cogent”
­ e.g. All dogs you see have fleas, Bruno is a dog ­> Bruno is likely to have fleas (likely but not
necessarily true)
Abduction
­ Abduction​ ­ “reverse implication”: inferring the most plausible explanation for an observed result
­ e.g. You have a white bean (the result), and you know that all beans in my bag are white (the
generalisation). Hence, this bean is probably from my bag, for if it were, it would have to be white
Inference
­ Inference ​­ the process or outcome of “inferring”: deriving by reasoning or concluding from premises or
evidence
­ The process of deriving the ​strict logical consequences​ of assumed premises (​deductive
inference​). An inference is a single step in a deductive chain
­ The process of arriving at some conclusion that, though it is not logically derivable from the
assumed premises, possesses some ​degree of probability​ relative to the premises (​inductive
inference​)
­ In logic, ​modus ponens​ and ​modus tollens​ are two forms for making valid inferences/valid arguments
­ Modus ponens
1. If p is true, then q is true (Daniel is reliable, so when it’s lecture time, Daniel is at UNSW)
2. p is true (it’s lecture time)
Therefore, q is true (Therefore, Daniel is at UNSW)
­ Modus tollens
1. If p is true, then q is true (Daniel is reliable…)
2. q is not true (Daniel is not at UNSW)
Therefore, p is not true (Therefore, it’s not lecture time)
­ Fallacy of modus tollens/Denying the antecedent
1. If p is true, then q is true (Daniel is reliable…)
2. p is not true (it’s not lecture time)
Therefore, q is not true?? (Therefore, Daniel is not at UNSW??)
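These three argument forms can be checked mechanically with a truth table: a form is valid exactly when no assignment of truth values makes all premises true and the conclusion false. A small sketch in Python:

```python
from itertools import product

def implies(p, q):
    """Material implication: 'if p then q' is false only when p and not q."""
    return (not p) or q

def valid(premises, conclusion):
    """Valid iff every row where all premises hold also makes the conclusion hold."""
    return all(conclusion(p, q)
               for p, q in product([True, False], repeat=2)
               if all(prem(p, q) for prem in premises))

modus_ponens = valid([lambda p, q: implies(p, q), lambda p, q: p],
                     lambda p, q: q)
modus_tollens = valid([lambda p, q: implies(p, q), lambda p, q: not q],
                      lambda p, q: not p)
deny_antecedent = valid([lambda p, q: implies(p, q), lambda p, q: not p],
                        lambda p, q: not q)

print(modus_ponens, modus_tollens, deny_antecedent)  # True True False
```

The counterexample the table finds for denying the antecedent is p false, q true: it is not lecture time, yet Daniel happens to be at UNSW anyway.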
Chapter 7: Introduction to Structured Query Language
7.1 Introduction to SQL
­ Relational DBMS’s query languages (e.g. SQL in Oracle) contain 3 components:
1. Data Definition Language (DDL)​ ­ Used to specify the database schema or modify an existing
one (Create table)
2. Data Manipulation Language (DML)​ ­ Used to manipulate the data (work with existing tables)
3. Data Control Language (DCL)​ ­ Used to control the DB, including saving of data (data access
rights to which user)
Data Definition Language
­ Data Definition Language (DDL)​ ­ DDL SQL statements define the ​structure of a database​, inc. rows,
columns, tables, indexes and DB specifics such as file locations
­ Closer to the specific DBMS ­> ​large differences between the SQL variations
­ DDL SQL commands inc. the following (in Oracle SQL):
­ CREATE​ to make a new DB, table, index or stored query
­ DROP​ to destroy an existing DB, table, index or view
­ DBCC ​(Database Console Commands) statements check the physical and logical consistency
of data
Data Manipulation Language
­ Data Manipulation Language (DML)​ ­ DML SQL statements used to ​retrieve and manipulate​ data
from the DB (i.e. this category encompasses the most fundamental commands inc. DELETE, INSERT,
SELECT, and UPDATE etc.)
­ Only ​minor differences between SQL variations
­ DML SQL commands inc. the following:
­ DELETE​ to remove rows
­ INSERT​ to add a row
­ SELECT​ to retrieve a row
­ UPDATE​ to change data in specified columns
­ Two types of DML:
1. Procedural, low­level DML​ ­ Specify exactly ​what​ data is needed and ​how​ this data is to be
created (e.g. programming language C, relational algebra)
­ What you do and how you do it (e.g. open file)
2. Non­procedural, high­level DML​ ­ Specify exactly ​what​ data is needed, but not how to create
this data (leaving the ​how​ to the internal implementation of a DBMS such as Oracle) (e.g. query
language SQL, relational calculus)
Data Control Language
­ Data Control Language (DCL)​ ­ DCL SQL statements control the ​security and permissions​ of the
objects or parts of the DB
­ Closer to the specific DBMS, hence ​large differences between the SQL variations
­ DCL SQL commands inc. the following (in Oracle SQL):
­ GRANT​ to allow specified users to perform specified tasks
­ DENY​ to disallow specified users from performing specified tasks
­ REVOKE ​to cancel previously granted or denied permissions
Relational Languages
­ Codd (1970, 1971)’s ​relational model​ is the conceptual and theoretical basis for relational DBs.
Includes ​two relational languages​:
1. Relational Algebra​ ­ ​procedural​, ​low­level language​ that provides a procedural (step­by­step)
way of specifying queries (relational algebra provides an ​order of steps​ to get to certain data)
2. Relational Calculus​ ­ ​non­procedural​, ​high­level language​ that provides a declarative way to
specify DB queries (“declares” a ​definition​ to get to certain data)
­ SQL is user­friendly relational calculus
­ For every expression in relational algebra there is an equivalent in relational calculus and vice versa
(​logically equivalent​)
­ Relational algebra/calculus are ​not very user friendly​. People almost always use ​SQL​ which is ​based
on relational calculus​, to work with RDBMS
Relational Algebra
­ Relational algebra has ​operations​. These fall into 3 main categories:
1. Union, Intersection​ and ​Difference​ ­ Boolean operations to define a new relation based on two
existing relations
2. Selection​ and ​Projection​ ­ Operations that remove parts of a relation
3. Cartesian Product​ and ​Join​ ­ Operations that combine the tuples of two relations
Union, Intersection and Difference
­ Union, Intersection and Difference are ​operations*​ on ​two relations (R and S)​, both relations should
have schemas with ​identical sets of attributes​ and ​identical order of the attributes
­ *​Other terms for “operations” are “​operators”​ and “​set operations​” (because they refer to
mathematical sets of distinct objects)
­ UNION: ​R ∪ S
­ The union of R and S is the set of all tuples that are in
R, S or both
­ In short: ​combine all tuples!
­ INTERSECT: ​R ∩ S
­ The intersection of R and S is the set of tuples that
appear in both tables
­ In short: ​find the common tuples!
­ DIFFERENCE: ​R ­ S
­ The difference of R and S is the set of tuples that are in
R but not in S
­ In short: ​subtract the tuples in S from the tuples in R!
Selection and Projection
­ Selection and projection operations are applied to a single relation (R)
­ SELECTION​ ­ Returns a relation that contains only those tuples from a specified relation (R) that
satisfy a specified condition (​horizontal subset of a table​)
­ Relational operator is σ: ​σ​ predicate​ (R)
­ PROJECTION​ ­ Returns a relation that contains the tuples for selected attributes from a specified
relation (R), eliminating duplicates (​vertical subset of a table​)
­ Relational operator is Π: ​Π​ attribute 1, …, attribute n​ (R)
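Both operators are easy to sketch directly in Python, treating a relation as a list of dicts (the city data is invented for illustration). Note how projection removes duplicates, mirroring set semantics:

```python
# A relation R as a list of tuples-as-dicts
R = [{"id": 1, "city": "Sydney"},
     {"id": 2, "city": "Perth"},
     {"id": 3, "city": "Sydney"}]

def select(predicate, relation):
    """Sigma: horizontal subset - keep tuples satisfying the condition."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """Pi: vertical subset - keep listed attributes, eliminating duplicates."""
    seen, out = set(), []
    for t in relation:
        row = tuple((a, t[a]) for a in attributes)
        if row not in seen:
            seen.add(row)
            out.append(dict(row))
    return out

print(select(lambda t: t["city"] == "Sydney", R))  # rows 1 and 3
print(project(["city"], R))                        # Sydney appears only once
```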
Cartesian Product and Join
­ Cartesian ​= “relating to René Descartes (1596­1650) and his ideas”. Descartes made major progress
in analytic geometry
­ Cross Join (Cartesian Product)​ ­ Selects all possible combinations of tuples in R with tuples in S
­ “R × S”, “all possible tuple combinations of two relations”, “everything joins everything”
­ In SQL:
­ Explicit ​cross join ­ SELECT * FROM R ​CROSS JOIN​ S
­ Implicit ​cross join ­ SELECT * FROM R, S
­ Inner Join​ ­ Returns combined tuples from two relations that have the same value for a defined
attribute (match on the attribute/fulfil a certain criterion). Default/most common join type
­ Explicit: SELECT * FROM R ​INNER JOIN​ S ON R.attribute = S.attribute
­ Implicit: SELECT * FROM R, S WHERE R.attribute = S.attribute
­ Equi Join​ ­ joins based on equivalence (=) (as in the example)
­ Theta Join​ ­ when other comparison operators are used (<=, >=, <, >)
­ Natural Join​ ­ joins tuples based on all attributes with identical names in the two relations (tuples
agree in value for whatever attributes are common to the schemas of R and S ­ attributes are not
explicitly specified)
­ Full Outer Join​ ­ Selects and joins tuples from two tables that match on a defined attribute. If there is no
match for a tuple, the tuple will still appear, with missing attributes shown as NULL
­ SELECT * FROM R ​FULL OUTER JOIN ​S ON R.attribute = S.attribute
­ Left Outer Join​ ­ Selects and joins tuples from the “left” table (R) with tuples from the “right” table (S) on
defined attributes. If there is no match, the attributes from the right side will contain NULL values
­ SELECT * FROM R ​LEFT OUTER JOIN​ S ON R.attribute = S.attribute
­ Right Outer Join​ ­ Selects and joins tuples from the “left” table (R) with tuples from the “right” table (S) on
defined attributes. If there is no match, the attributes from the left side will contain NULL values
­ SELECT * FROM R ​RIGHT OUTER JOIN​ S ON R.attribute = S.attribute
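A left outer join can be tried in SQLite via Python’s sqlite3 (tables are illustrative; older SQLite versions support only LEFT OUTER JOIN, which is enough to show how the unmatched side becomes NULL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE r (id INTEGER, name TEXT);
    CREATE TABLE s (id INTEGER, score INTEGER);
    INSERT INTO r VALUES (1,'Ann'), (2,'Ben');
    INSERT INTO s VALUES (1, 85);            -- no matching row for id 2
""")

rows = con.execute("""
    SELECT r.id, r.name, s.score
    FROM r LEFT OUTER JOIN s ON r.id = s.id
    ORDER BY r.id""").fetchall()

# Ben has no match in s, so his score comes back as NULL (None in Python)
print(rows)
```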
SQL
­ SQL = Structured Query Language (pronounced “sequel”)
­ SQL is the ​first standard database language
­ Originally ​developed by D. Chamberlin and R. Boyce at IBM
­ The most common SQL standard is ANSI/ISO SQL. The latest revision is ​SQL:2011
­ Microsoft, Oracle, and other vendors have introduced deviations from ANSI SQL
­ As a relational language, SQL has ​three main components​:
­ Data Definition Language (DDL)
­ Data Manipulation Language (DML)
­ Data Control Language (DCL)
SQL DDL
­ To create the database structure:
­ CREATE SCHEMA AUTHORIZATION creator
­ e.g. CREATE SCHEMA AUTHORIZATION Chris
­ CREATE DATABASE Database_Name
­ e.g. CREATE DATABASE Student
­ To create tables:
­ CREATE TABLE Table_Name​ (
column_name
data_type [NULL | NOT NULL],
…
);
­ Security considerations may require that certain data be hidden from users
­ A ​view​ is any relation that is made ​visible to the user
­ A view is a “​virtual relation​”
­ The SQL command is:
­ CREATE VIEW Viewname AS Statement
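The DDL statements above can be tried in SQLite via Python’s sqlite3 (a stand-in for Oracle; the booking table and view name are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define the table structure, with NULL/NOT NULL per column
con.execute("""CREATE TABLE booking (
    booking_id INTEGER NOT NULL,
    guest_name TEXT NULL,
    price      REAL NOT NULL)""")
con.execute("INSERT INTO booking VALUES (1,'Ann',200.0), (2,'Ben',350.0)")

# A view is a virtual relation: a stored query with no stored data of its own
con.execute("CREATE VIEW expensive AS SELECT * FROM booking WHERE price > 250")

print(con.execute("SELECT guest_name FROM expensive").fetchall())
```

Querying the view re-runs its defining SELECT, so it always reflects the current contents of `booking`.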
SQL DML
­ ANSI/ISO SQL standard use the terms ​“tables”, “columns”​ and
“rows”​ (not relations, attributes, and tuples)
­ The ​principal SQL DML statements​ are:
­ SELECT
­ INSERT
­ UPDATE
­ DELETE
­ Complete SQL statements consists of ​reserved words​ and ​user­defined​ words:
­ The ​reserved words​ are fixed ​part of the language
­ The ​user­defined words​ represent the meaning of the data to the user (e.g. “users”,
“bookings”)
Understanding SQL Query Structures
­ The ​SELECT​ statement is used to retrieve and display data from one or more tables
­ Relational algebra’s ​selection, projection and join statements​ can be performed with ​one single
SELECT statement
­ “SELECT FROM WHERE”
­ SELECT​ clause tells which attributes of the tuples matching the condition are produced as part
of the answer
­ FROM​ clause gives the names of relation(s)
­ WHERE​ clause is a condition that tuples must satisfy in order to match the query
SELECT​ ​[​DISTINCT​ ​| ​ALL​]​ ​{​* ​|​ ​[​column_expression ​[​AS new_name​]] [​, …​]​}
FROM​ table_name ​[​alias​]​ ​[​, …​]
[​WHERE​ ​condition​]
[​GROUP BY​ column_list​]
[​HAVING​ condition​]
[​ORDER BY​ column_list​]​;
­ []​ = optional elements
­ {}​ = required element (a choice of alternatives)
­ |​ = “or”
­ ; = end of the statement
SQL allows us to use the keyword ​ALL​ to specify that all tuples are to be selected
­ SELECT ALL * FROM ​PRODUCT
OR
SELECT * FROM​ PRODUCT
SQL supports elimination of duplicates using the keyword ​DISTINCT
­ SELECT DISTINCT​ Std_name
FROM​ STUDENTS
Mathematical Operators for SQL
­ Mathematical operators that can be used in the ​WHERE​ clause:
­ =   equal to
­ <   less than
­ <=  less than or equal to
­ >   greater than
­ >=  greater than or equal to
­ <>  not equal to
ASCII Codes in SQL
­ All characters/signs are assigned an ​ASCII​ (American Standard Code for Information Interchange)
code by the computer
­ Comparisons of strings are made from left to right ­> ​useful for names, problems for numbers
and dates​ (e.g. “2” is > “11”, and “01/01/2020” is sorted before “12/31/2015” because 0 < 1)
­ Recommendation: use the date/number format instead of strings
Logical (Boolean) Operators in SQL
­ Logical operators are:
­ OR
­ AND
­ NOT
­ Found in ​WHERE​ clause
Special Operators in SQL
­ BETWEEN​ ­ Used to define range limits
­ IS NULL​ ­ Used to check whether an attribute value is null
­ LIKE​ ­ Used to check for similar character strings
­ IN​ ­ Used to check whether an attribute value matches a value contained within a subset of listed values
­ EXISTS​ ­ Used to check whether a subquery returns any rows
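The first four special operators in action, using SQLite via Python’s sqlite3 (the product table is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product (p_code INTEGER, p_desc TEXT, p_price REAL);
    INSERT INTO product VALUES
        (1,'hammer',9.95), (2,'saw',24.50), (3,'screwdriver',NULL);
""")
q = lambda sql: con.execute(sql).fetchall()

# BETWEEN: range limits (inclusive)
between = q("SELECT p_code FROM product WHERE p_price BETWEEN 5 AND 20 "
            "ORDER BY p_code")
# LIKE: similar character strings ('%' matches any run of characters)
like = q("SELECT p_code FROM product WHERE p_desc LIKE 's%' ORDER BY p_code")
# IN: membership in a listed subset of values
within = q("SELECT p_code FROM product WHERE p_code IN (1, 3) ORDER BY p_code")
# IS NULL: the only correct way to test for a missing value (= NULL never matches)
missing = q("SELECT p_code FROM product WHERE p_price IS NULL")

print(between, like, within, missing)
```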
Ordering SQL Results
­ ORDER BY ​<columns> : produces a list in ascending order (also [ASC])
­ ORDER BY ​<columns> [​DESC​] : produces a list in descending order
SQL Numeric Functions (Aggregate Functions)
­ Numeric functions include:
­ COUNT​ : the ​number of rows​ containing a specified attribute
­ MAX​ : the ​maximum​ value encountered
­ MIN​ : the ​minimum​ value encountered
­ AVG​ : the arithmetic ​mean​ (average) for the specified attribute
­ SUM​ : the ​total value​ for the specified numeric attribute
­ Numeric functions yield only one single value
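A quick check that each aggregate collapses a whole column into one value, using SQLite via Python’s sqlite3 (prices are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product (p_price REAL);
    INSERT INTO product VALUES (10.0), (20.0), (30.0);
""")

# Three input rows, but each aggregate function yields exactly one value
row = con.execute(
    "SELECT COUNT(*), MIN(p_price), MAX(p_price), AVG(p_price), SUM(p_price) "
    "FROM product").fetchone()
print(row)  # (3, 10.0, 30.0, 20.0, 60.0)
```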
Unique vs. Distinct
­ SELECT ​DISTINCT​ XY is correct ANSI SQL syntax
­ SELECT ​UNIQUE​ XY is old Oracle SQL syntax (otherwise identical to DISTINCT)
­ Note: you still use UNIQUE to create tables and indexes
­ CREATE TABLE Test (Attribute Numeric NOT NULL ​UNIQUE​);
­ CREATE ​UNIQUE​ INDEX Unique_Index ON Table (Attribute) TABLESPACE Tablespace;
­ Note: Unique indexes guarantee that no two rows of a table have duplicate values in the key column(s).
Non­unique indexes do not impose this restriction
Grouping Data in SQL
­ GROUP BY ​<column>
­ A query that includes the ​GROUP BY​ clause groups the data from the SELECT table(s) and produces a
single summary row for each group
­ The SELECT clause may contain column names, aggregate functions or constants
­ All column names in the SELECT list must appear in the ​GROUP BY​ clause unless the name is used
only in an aggregate function
­ The ​GROUP BY​ clause is valid only when used in conjunction with one of the SQL aggregate functions
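These rules can be seen in a small grouping query, using SQLite via Python’s sqlite3 (the sales data is invented for illustration); HAVING then filters whole groups the way WHERE filters rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sale (region TEXT, amount REAL);
    INSERT INTO sale VALUES ('NSW',100), ('NSW',200), ('VIC',50);
""")

# region is in the SELECT list, so it must also appear in GROUP BY;
# COUNT and SUM are computed once per group, not per row
rows = con.execute("""
    SELECT region, COUNT(*), SUM(amount)
    FROM sale
    GROUP BY region
    HAVING SUM(amount) > 60
    ORDER BY region""").fetchall()

print(rows)  # VIC's group total (50) fails the HAVING condition
```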
Multiple Table Operations in SQL
­ “Multiple table operations” are “joining operations”! (see earlier)
­ SELECT clause identifies the attributes to be displayed
­ FROM clause identifies the tables from which attributes are selected
­ WHERE clause specifies the joining condition for common columns
Lecture Notes: Object Oriented Modelling
8.1 Benefits/Limitations of ​ER/RDB​ Design
­ Relational modelling of data is not the “perfect” solution
­ Relational modelling is not the only approach to data modelling
Benefits
­ ER modelling​ is a common and easy design technique
­ Models can be transformed, via ​normalisation​ techniques, to be implemented in standard
SQL­based DBs
­ Clear separation between applications (operations) and DB schema (data)​; data can be used in
different applications
Limitations
­ ER models cannot adequately support complex data ­ the more complex the system, the harder it is to model
­ Poor representation of “real­world” entities ­> many joins during query processing (why we denormalise)
­ Semantic overloading
­ Limited types of operations supported ­ the more complicated operations must be done in the application
­ Handling of recursive queries is difficult
­ Schema changes are difficult
8.2 Object Modelling Concepts
Objects and Classes
­ Object­oriented analysis and design (OOAD​) models the world in objects
­ Object ­ An entity that has a well­defined role in the application domain (our system). Has a state, behaviour and identity.
­ State​ ­ State of an object encompasses its properties (attributes and relationships) and the
values those properties have. (i.e. all values and relationships defined)
­ Behaviour​ ​­ Represents how an object acts and reacts (operations or ‘methods’)
­ Identity ­ Each object is uniquely distinguishable from every other object, even when their states are identical
­ Object class​ ­ a set of objects that share a common structure
(share attributes, operations and relationships) (i.e. not the
instance)
­ Class diagram​ ­ an object­oriented model showing:
­ The object classes relevant for a system
­ The internal structure of these object classes
­ The relationships between object classes
­ The overall structure of the system
­ Class diagram is similar to ER ​EXCEPT​ we show what objects
can do (behaviours)
­ Two categories of relationships​:
­ Associations​ ­ Horizontal relation between two object
classes
­ e.g. “Students” (object class 1) may “read”
(association) “books” (object class 2)
­ Subtype/supertype​ ­ Vertical relation between two object classes
­ e.g. “Nurses (object class 1) are “a specific kind of” (subtype) “people” (object class 2)
­ Class diagrams show details about each object class:
­ Attributes​ ­ The dimensions/characteristics of an object class
­ e.g. “Lecturers” (object class) have an “age” and a “faculty” (attributes)
­ Operations​ ­ The functions/services/behaviours/methods provided by an object class
­ e.g. “Lecturers” (object class) can “teach” and “research” (operations)
Derivation
­ Derived attribute​ ­ An attribute that can be derived from (is based on) other attributes
­ Derived association​ ­ An association that can be derived from other associations
­ In a class diagram, a forward slash (/) indicates derivation
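In code, a derived attribute maps naturally onto a computed accessor rather than a stored field. A small Python sketch (the Lecturer class and its attributes are invented for illustration):

```python
from datetime import date

class Lecturer:
    def __init__(self, name, birth_year):
        self.name = name
        self.birth_year = birth_year      # stored attribute

    @property
    def age(self):                        # derived attribute (/age on a class diagram)
        return date.today().year - self.birth_year

ben = Lecturer("Ben", 1990)
print(ben.age)  # computed from birth_year on demand, never stored
```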
Encapsulation
­ Encapsulation​ ­ an object hides details not relevant for their use from other objects
­ Core idea of OO
­ Objects can be changed only through the use of their interfaces (public operations)
­ Can’t be edited by other methods
­ Private operations are not visible, can only be executed by object
­ Benefits of encapsulation:
­ Control​ ­ if something odd is happening, you know exactly where to look (everything is
self­contained)
­ Flexibility​ ­ you can leave work on the internal parts of the object until late
­ Structure​ ­ Impose structure on data and system and system is just objects organised
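A minimal Python sketch of the idea (the Account class is a made-up example): the underscore-prefixed field is internal state, and the public methods are the interface through which alone it changes.

```python
class Account:
    def __init__(self, opening_balance):
        self._balance = opening_balance   # internal state, hidden by convention

    # Public interface: the only sanctioned way to change the state,
    # so any odd balance can be traced back to these two methods.
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def balance(self):
        return self._balance

acc = Account(100)
acc.deposit(50)
print(acc.balance())  # 150
```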
Inheritance
­ Inheritance​ ­ The ability of an object class to inherit the attributes and operations of its superclass(es)
­ e.g. Class of cats is a subclass of class of mammals. The class of mammals are a superclass of
the class of cats
­ Single inheritance​ ­ A class inherits only from ​one superclass
­ Multiple inheritance​ ­ A class inherits from ​several superclasses
Superclasses and Subclasses
­ Classes​ can be organised into a ​class hierarchy
­ A class can have ​multiple parent classes​ (several superclass­subclass relationships)
­ A ​generalisation path​ (specialisation path) is shown as a solid line from the subclass to superclass
with a hollow triangle at the end pointing toward the superclass
­ Disjointness constraint
­ Disjoint​ ­ A subclass has no overlapping attribute with another subclass
­ Overlapping​ ­ A subclass may have overlapping attributes with another subclass
­ Completeness constraint
­ Incomplete​ ­ There could be other subclasses than those shown on the class diagram
­ Complete​ ­ There cannot be other subclasses; all subclasses are shown on the class diagram
­ Concrete class ­ A class that has direct instances
­ Real world objects e.g. Research student/Coursework student
­ Abstract class ­ A class that has no direct instances, but its subclasses may have direct instances
­ A conceptual placeholder that exists to hold shared attributes (e.g. Postgrad student)
Overriding inheritance
­ Overriding​ ­ The process of ​replacing a method inherited from
a superclass​ by a more specific implementation of that method
in a subclass
­ Define new operation with same name ­> pick local method
over supertype method
­ Reasons for overriding:
­ Extensions​ add to the operation
­ Restrictions​ limit the operation
­ Optimisations​ improve the operation
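Inheritance and overriding together, in a short Python sketch (the Mammal/Cat classes echo the example above; Dog is an added illustration):

```python
class Mammal:                    # superclass
    def speak(self):
        return "generic mammal sound"

class Cat(Mammal):               # subclass inherits attributes and operations
    def speak(self):             # overriding: the local method wins over the inherited one
        return "meow"

class Dog(Mammal):
    pass                         # no override, so speak() is inherited unchanged

print(Cat().speak())  # meow
print(Dog().speak())  # generic mammal sound
```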
Containment (Aggregation and Composition)
­ Two forms of ​containment​ type parent­child relationships
­ Aggregation​ ­ Implies a relationship where the child object can exist independently of the
parent object
­ Composition​ ­ Implies a relationship where the child object cannot exist independently of the
parent object
­ Containment differs from subclass­superclass relationships: it expresses a part­of relationship between objects, not an is­a relationship between classes
­ e.g. a lecture hall contains chairs that can exist independently of it ­> aggregation (logical relation)
­ e.g. a house is made of rooms that cannot exist without it ­> composition (hierarchical relation)
­ Aggregation ­ Implies a relationship where the child object can exist independently of the parent object
­ Expresses a part­of relationship between a component object and an aggregate object
­ Is a kind of association in which a whole, the assembly, is composed of parts, the components
­ e.g. ​Course (parent) and Student (child). Delete the Course and the Students still exist
­ Represented with a ​hollow diamond​ at the aggregate end (parent)
­ Composition ­ Implies a relationship where the child object cannot exist independently of the parent object
­ A stronger form of aggregation
­ e.g. ​House (parent) and Room (child). Delete the House and the Rooms cease to exist as well
­ Composition is represented with a ​solid diamond​ at the composed end (parent)
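The two containment forms can be contrasted in Python, where the difference is ownership rather than syntax (the classes below are illustrative; Course/Student and House/Room follow the examples above):

```python
class Student:                       # child in an aggregation: exists on its own
    def __init__(self, name):
        self.name = name

class Course:                        # aggregation: Course only *references* Students
    def __init__(self, title, students):
        self.title = title
        self.students = students     # components supplied from outside

class Room:                          # child in a composition: owned by its House
    def __init__(self, number):
        self.number = number

class House:                         # composition: House *creates and owns* its Rooms
    def __init__(self, n_rooms):
        self.rooms = [Room(i) for i in range(n_rooms)]

s = Student("Ada")
c = Course("INFS1603", [s])
del c                                # the Course is gone, the Student survives
h = House(3)                         # Rooms exist only inside the House
print(s.name, len(h.rooms))          # Ada 3
```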
Polymorphism
­ Polymorphism​ ­ The ability of an operation to be applied to many classes
­ Polymorphism ­> operations with the same name work across different classes, each providing its own implementation
­ e.g. Class: Juggler, operation: Throw() vs. Class: Ball, operation: Throw()
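The Juggler/Ball example above, sketched in Python (the return strings are invented):

```python
class Juggler:
    def throw(self):
        return "juggler tosses a ball"

class Ball:
    def throw(self):
        return "ball flies through the air"

# The same operation name applies to both classes; each object
# supplies its own behaviour when throw() is called.
results = [obj.throw() for obj in (Juggler(), Ball())]
print(results)
```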
8.3 Benefits/Limitations of OO Design
Benefits
­ The OO design approach provides both the data identification (in the same construct, the object) and the procedures (data manipulation) to be performed
­ It supports complex data structures and provides a much better implementation of the real­world model
­ Can very easily use objects somebody has already created (if they make them available) ­ “Toolkit of Classes” (Daniel)
­ Not platform dependent (neither is RDB, but its application is)
­ Makes sense for low­level data
Limitations
­ It is hard to learn (conceptually different philosophy)
­ Code reusability is not easy to implement
­ Creation of class hierarchy and defining interrelationships is difficult
­ Queries may have to be written in 3GL (e.g. C++) ­ writing methods ­> programming, which requires professionals who are hard to find
­ Few tools ­ support (e.g. SQL) is not strong
­ Lack of support for views & security (don’t have DBMS)
­ Also expensive
8.4 Comparison between OOm and ERm
8.5 Summary of UML OO Modelling
Chapter 9: Database Development Process
9.1 The Information System
­ Information systems ​(IS)​ ​­ Systems that use IT to capture, transmit, store, retrieve, manipulate or
display ​information​ used in one or more business processes
­ Important issues when building IS:
­ The system must ​solve​ the right ​problem
­ The system must be ​built​ in the most ​effective​ way
­ The system must fit into the existing environment
­ The system must be ​easy ​to ​use
­ The performance of an IS depends on several factors:
­ Application design and implementation (programming)
­ DB design and implementation
­ Administrative procedures
­ Systems analysis​ ­ The process that establishes the need for and the extent of an IS (business rules)
­ Systems development​ ­ The process of creating an IS
­ Database Design​ ­ Takes place within the context and limits of an IS development process
­ Note: the plan and reality are likely to differ!
Evolution of Software Development Process Models
­ Code­and­Fix Model ­ Software shipped with the computer (e.g. a microwave); problems were fixed by the manufacturer as they arose
­ Stagewise Model​ ­ Organised things with a process
­ Waterfall Model​ ­ Organised into 4­5 stages which require signing off ­> stage to stage ­> can use
management principles (e.g. gating)
­ Key to making software development manageable
­ Used in large projects
­ Agile Software Development​ ­ Continually work on short iterations NOT big stages ­> reflects
changing requirements
­ Don’t really know cost/timeline etc.
­ Start­ups and end­user software
9.2 The Systems (Software?) Development Life Cycle (SDLC)
The 5 stages of the SDLC are:
­ Planning​ ­ general overview of the company and objectives
­ Initial assessment
­ Feasibility of a new system (feasibility study). Should address:
­ Technical aspects of hardware/software requirements
­ System cost
­ Operational cost
­ Analysis ​­ Problems defined during planning phase examined in greater detail
­ User requirements
­ Existing system evaluation (How do these requirements fit into overall system?)
­ Logical system design (Data Flow Diagrams, ER Diagrams, etc.)
­ Detailed System Design
­ Completion of design which includes screens, menus, reports etc. (back­end and front­end)
­ Training principles and methodologies also planned
­ Implementation ​­ hardware, DBMS software, application software installed, and DB design is
implemented
­ Cycle of coding, testing and debugging until it's ready for use
­ Installation, fine­tuning
­ Maintenance ​­ includes ​corrective maintenance​ (response to system errors), ​adaptive maintenance
(due to changes in business environment) and ​perfective maintenance​ (to enhance the system)
­ Evaluation
­ Maintenance
­ Enhancement
­ If maintenance cost is too high, the system’s value is suspect
9.3 The Database Life Cycle (DBLC)
­ Also called the Database Development Lifecycle (DDLC)
­ Part of/subset of/embedded in the SDLC (parallel)
­ Six phases:
1. Database Initial Study
2. Database Design
3. Implementation and Loading
4. Testing and Evaluation
5. Operation
6. Maintenance and Evolution
DBLC Phase 1: Database Initial Study
­ Analysing​ the ​organisation
­ Objectives
­ Operations
­ Structures
­ Defining ​problems​ and ​constraints
­ Function of existing system
­ Input of existing systems
­ Output of existing systems
­ Defining ​objectives
­ Initial objectives
­ Data sharing and interfaces with other
systems
­ Defining scope and boundaries
­ DB design for which part of the org?
­ What hardware will be used?
DBLC Phase 2: Database Design
­ Conceptual Design
­ Data analysis​ and ​data requirements​ ­ What are the end­user views needed? What are the
inputs and outputs needed? What info is needed and where does it come from? Is it necessary?
Feasible?
­ ER modelling ​and ​normalisation​ ­ What are the business rules? What are the entities,
attributes and relationships for ER diagram? What are the keys? Do we need to normalise?
­ Model verification
­ Logical Design
­ Translating/​mapping​ the conceptual design into internal model of a selected DBMS (e.g.
ORACLE, Access, MySQL, etc.)
­ Physical Design
­ Defining data access characteristics of the database (e.g. indexes)
­ Optimising ​performance​ (e.g. choice of storage medium, hardware level)
­ Resources for implementation
­ DBMS
­ Hardware
­ Note: Logical and physical design can be carried out in parallel activities, but requires high level of
understanding of software and hardware
DBLC Phase 3: Implementation and Loading
Core tasks:
­ Creating DB
­ Assigning permissions​ to a database ​administrator​ (DBA)
­ Creating tables​ within DB
­ Assigning permission​ to ​users
Further areas needing attention:
­ Performance​ ­ hardware, software, indexes, buffer size, etc.
­ Security​ ­ Physical security, password security, access rights, audit trails, data encryption, diskless
workstations
­ Backup​ and recovery
­ Data integrity
­ Company standards
­ Concurrency control​ ­ Allowing simultaneous access to DB while preserving data integrity
Database security and data privacy
­ Database security​ is to ensure that only authorised users can perform authorised activities at
authorised times
­ Auth​entication​ ­> user has the basic right to use the system (which user it is)
­ Auth​orization​ ­> user has the right to do specific activities on the system (particular rights)
­ Data Privacy​ (information privacy) ­ the relationship between collection and dissemination of data,
technology, the public expectation of privacy and the legal and political issues surrounding them
­ Privacy concerns exist wherever personally identifiable info is stored.
­ Legal/ethical use of DB
­ Data privacy issues can arise from a wide range of sources, e.g.:
­ Healthcare records
­ Financial institutions/transactions
­ Criminal justice investigations
DBLC Phase 4: Testing and Evaluation
­ Testing ​performance​/performance fine tuning (over normalised?)
­ Testing ​security constraints
­ Testing ​integrity
­ Testing ​concurrent access
DBLC Phase 5: Operation
­ DB​ (and application) ​completed​, “Going into production/operation”
­ Running ​IS
­ Users​ and applications start to insert, receive, update and delete data…
­ DBA​ starts to (ongoing) fine­tune performance, allocate storage space, control access, backup data
­ Becomes ‘go­to’ guy
DBLC Phase 6: Maintenance and Evolution
­ DBA has responsibility for ​routine maintenance​ activities within DB:
­ Preventative​ maintenance (backup)
­ Corrective​ maintenance (recovery)
­ Adaptive​ maintenance (enhancing performance, adding entities, adding attributes, etc.)
­ Access control, statistics, ​auditing​, periodic system­usage summaries, etc.
9.4 Conceptual Design
­ Conceptual Design​ ­ first stage of DB design process. Goal is to design a DB that is independent of
DB software and physical details
­ Output: conceptual data model that describes main data entities, attributes, relationships, and
constraints
­ 4 steps:
1. Data analysis and requirements
2. Entity relationship modeling and normalisation
3. Data model verification
4. Distributed DB design
9.5 DBMS Software Selection
­ Selection of DBMS critical to IS’s smooth operation
­ Most common factors affecting purchase decision are:
­ Cost​ ­ inc. original purchase price, maintenance, operation, license, installation, training,
conversion
­ DBMS features and tools​ ­ some DBMS inc. tools to facilitate application development task
(e.g. query by example, report generators, data dictionaries, etc.), some make DBA job easier
(security, concurrency control, 3rd­party support etc.)
­ Underlying model​ ­ can be hierarchical, network, relational, object/relational, OO
­ Portability​ ­ DBMS can be portable across platforms, systems, and languages
­ DBMS Hardware requirements​ ­ inc. processor(s), RAM, disk space, and so on
9.6 Logical design
­ Logical Design​ ­ 2nd stage in DB design process. Goal is to design an enterprise­wide DB based on a
specific data model but independent of physical level details
­ All objects in conceptual model mapped to specific constructs
­ 4 steps:
1. Map conceptual model to logical model components
2. Validate logical model using normalisation
3. Validate logical model integrity constraints
4. Validate logical model against user requirements
9.7 Physical Design
­ Physical design​ ­ Process of determining the data storage organisation and data access
characteristics in order to ensure its integrity, security, and performance
­ 3 steps:
1. Define data storage organisation
2. Define integrity and security measures
3. Determine performance measures
9.8 Database Design Strategies
Two classical approaches to DB design
­ Top­down design​ ­ starts by identifying data sets then defines data elements for each of those sets.
(i.e. identification of different entity types then definition of each entity’s attributes)
­ Better when overwhelming number, variety and complexity of entities, relations and transactions
­ Bottom­up design ­ first identifies data elements (items), then groups them together in data sets (i.e. first defines attributes, then groups them to form entities)
­ More productive for small DBs with few entities, attributes, relations, transactions
­ Selection often depends on:
­ Scope of the problem
­ Personal preference
­ Company’s structure (centralised or decentralised)
­ The two approaches are complementary rather than mutually exclusive
9.9 Centralised vs Decentralised Design
­ Centralised design​ ­ Productive when the data component is composed of a relatively small no. of
objects and procedures
Chapter 15: Database Administration and Security
15.3 Introduction of a Database: Special considerations
DBMS and Organisation Change
­ DBMS is a software package with computer programs that control creation, maintenance and use of
databases (e.g. Oracle)
­ The introduction of a DBMS to an org may affect the org in various ways
­ When a new DBMS is introduced, three important aspects have to be addressed:
­ Technological​ ­ DBMS hardware and software
­ Managerial​ ­ Administrative functions
­ Cultural​ ­ Corporate resistance to change
­ Not just being more effective BUT must look at power structure/org culture (may resist change)
15.4 The Evolution of the Database Administrator Function
­ Database Administrator (DBA)​ ­ A technical function that is responsible for physical DB design and
for dealing with technical issues such as security, enforcement, DB performance, backup and recovery
­ Data Administrator (DA)​ ­ A high­level function that is responsible for the overall mgmt of data
resources in an org, inc. maintaining corporate­wide definitions and standards
­ Responsible for control of the overall company data resource
­ Job description covers a larger area of operations than the DBA ­ a higher­level (corporate) role
­ DBA ­ Responsible for the control of databases
­ Role varies between companies
­ Location of the DBA varies (is up to company mgmt)
­ Larger corporations make a distinction between DBA and DA
DBA Tasks/Responsibilities
­ Installing​ and upgrading the ​DB server
­ Allocating​ system ​storage​ and planning future storage
­ Modifying​ the DB ​structure
­ Enrolling users​ and maintaining system security
­ Ensuring compliance​ with DB vendor license agreement
­ Controlling ​and monitoring ​user access
­ Monitoring​ and optimising the ​performance
­ Planning ​for ​backup​ and ​recovery
­ Maintaining archived data
­ Backing up ​and ​restoring DBs
­ Contacting DB vendor (e.g. for technical support)
­ Generating various reports
DBA Ethics
Source: DBA Code of Ethics
­ Responsibilities to Company​: (don’t be unethical)
­ Follow internal standards and regulations
­ Inform openly about issues, provide complete info, do not create knowledge silo
­ Ensure up­to­date security, have recovery plan in place
­ Responsibilities to Externals​ (stakeholders):
­ Follow external regulations ­ standards, laws
­ Protect externals from inappropriate data use ­ could be whistleblowing BUT mainly acting
ethical
­ Ensure privacy through authorisation and security ­ critical data should be secured
­ Responsibilities to Co­Workers​:
­ Be honest and open with co­workers
­ Protect co­workers from inappropriate data use
­ Share, teach and help grow collective knowledge base
­ Responsibilities to Oneself:
­ Stay up to date on industry and tech
­ Stay up to date on regulations
­ Learn new techniques, new tools and best practices