Database Management Systems - Health Information Technology

Database Management Systems
Pierce College
CIS260
• Course Description:
Concepts and theory of relational database management systems (RDBMS) including the analysis and design of relational database systems. This is a
project-based class. Entity Relationship modeling and advanced Microsoft Access techniques, in preparation for the Microsoft Office Specialist exam,
will be covered. Course discussion and hands-on case studies in the healthcare industry, with comparison to other industries as applicable, provide
practical knowledge and experience.
• Course Content Outline/Topics:
A. Database Management Systems
B. Database Development
C. Business rules and user requirements
D. Entity Relationship Diagrams
E. Normalization
F. Database Design Patterns
G. Validate and manage data
H. Data Queries
I. Database Forms, Views and Reports
J. Database Security and Administration
K. Software, Data and User Testing
4/19/2013 5:00 PM
Pierce College - CIS260 Database Management Systems
Learning Objectives
A1. Discuss database management systems and database administration.
B1. Analyze, design, and create relational databases to meet industry and customer needs using
current relational database management system software.
C1. Identify the business rules and customer requirements to be included in the data model.
D1. Differentiate and create conceptual data models, logical data models and physical data models.
E1. Normalize relationships in tables.
F1. Use database design patterns in data modeling.
F2. Use modeling/diagramming software to model data.
G1. Validate, import, convert, and export data from one application to another.
H1. Create data queries that sort, filter, manipulate and calculate data.
I1. Develop effective queries, forms, reports and custom user interfaces for databases.
J1. Discuss ethics and security issues and regulations surrounding data and databases.
K1. Test the integrity of the database design.
CAHIMS Requirements
• Case studies, critiques/peer reviews, diagrams/models, group or team
discussions/debates, individual projects/assignments, online
research, team projects, computer labs (CAHIMS 3.3, 5.2, 6.1, 6.2,
7.1, 7.3, 7.4, 9.4)
Module 1:
Database Management Systems
Lecture 1: Define and Discuss Databases and Database Management Systems (DBMS)
Database versus Database Management System (DBMS)
• Database
“a comprehensive collection of related data organized for convenient access, generally in a
computer.”
~Dictionary.com, “Database”
• Database System aka Database Management System (DBMS)
“a software system designed to allow the definition, creation, querying, update, and
administration of databases.”
~Wikipedia.com, “Database”
Variations of Database Management Systems
• Relational Database and Relational Database Management System (RDBMS)
• The most common type of DBMS.
• “A relational database is a collection of data items organized as a set of
formally described tables from which data can be accessed easily. A relational
database is created using the relational model. The software used in a
relational database is called a relational database management system
(RDBMS).”
• “First defined in June 1970 by Edgar Codd, of IBM's San Jose Research
Laboratory. Codd's view of what qualifies as an RDBMS is summarized in
Codd's 12 rules.”
Wikipedia.com, “Relational Database”
Variations of Database Management Systems
• Object Relational Database Management System (ORDBMS)
• “An object-relational database (ORD), or object-relational database management system (ORDBMS),
is … similar to a relational database, but with an object-oriented database model: objects, classes and
inheritance are directly supported in database schemas and in the query language.
• An object-relational database can be said to provide a middle ground between relational databases and
object-oriented databases (OODBMS). In object-relational databases, the approach is essentially that of
relational databases: the data resides in the database and is manipulated collectively with queries in a
query language;…”
• Object Oriented Database Management System (OODBMS)
• “…at the other extreme are OODBMSes in which the database is essentially a persistent object store for
software written in an object-oriented programming language, with a programming API for storing and
retrieving objects, and little or no specific support for querying.”
http://en.wikipedia.org/wiki/ORDBMS, 3/26/13
Common RDBMS
According to research company Gartner, the five leading commercial relational database vendors by revenue in
2011 were Oracle (48.8%), IBM (20.2%), Microsoft (17.0%), SAP including Sybase (4.6%), and Teradata (3.7%).
According to Gartner, in 2008, the percentages of database sites using a given technology were (a given site
may deploy multiple technologies):
• Oracle Database - 70%
• Microsoft SQL Server - 68%
• MySQL (Oracle Corporation) - 50%
• IBM DB2 - 39%
• IBM Informix - 18%
• Adaptive Server Enterprise (Sybase Corporation) - 15%
• Sybase IQ - 14%
• Teradata - 11%
• Amazon Relational Database Service is a database as a service offering MySQL and Oracle database
engines.
~ Source Wikipedia.com “RDBMS”, 3/22/2013
Data Warehouse
“A database designed to support decision making in an organization. Data from the
production databases are copied to the data warehouse so that queries can be performed
without disturbing the performance or the stability of the production systems.”
http://www.pcmag.com/encyclopedia/term/40866/data-warehouse, 4/18/13
https://upload.wikimedia.org/wikipedia/commons/4/46/Data_warehouse_overview.JPG, 4/18/2013
Data Marts, Cubes and Mining
Data Marts
“Data warehouses can become enormous with hundreds of gigabytes of transactions. As a result, subsets, known as "data marts," are
often created for just one department or product line.”
http://www.pcmag.com/encyclopedia/term/40866/data-warehouse, 4/18/13
Data Cubes
“An OLAP cube is an array of data understood in terms of its 0 or more dimensions…A cube can be considered a generalization of a
three-dimensional spreadsheet.”
http://en.wikipedia.org/wiki/OLAP_cube, 4/18/13
Data Mining
“Data mining …is the computational process of discovering patterns in large data sets involving methods at the intersection of
artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract
information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it
involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics,
complexity considerations, post-processing of discovered structures, visualization, and online updating.”
http://en.wikipedia.org/wiki/Data_mining, 4/18/2013
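As a rough illustration of the cube idea quoted above, the sketch below sums a measure along zero or more dimensions. The visit records and the dimension names (department, month, payer) are made up for illustration, and a real OLAP tool does far more than this.

```python
from collections import defaultdict

# Hypothetical fact records: (department, month, payer, visit_count).
VISITS = [
    ("Cardiology", "Jan", "Medicare", 12),
    ("Cardiology", "Jan", "Private", 8),
    ("Cardiology", "Feb", "Medicare", 10),
    ("Pediatrics", "Jan", "Private", 20),
    ("Pediatrics", "Feb", "Private", 15),
]

DIMENSIONS = ("department", "month", "payer")


def roll_up(facts, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for *coords, measure in facts:
        key = tuple(coords[p] for p in positions)
        totals[key] += measure
    return dict(totals)


by_department = roll_up(VISITS, ["department"])           # one dimension
by_dept_month = roll_up(VISITS, ["department", "month"])  # two dimensions
grand_total = roll_up(VISITS, [])                         # zero dimensions
```

Keeping fewer dimensions gives a coarser summary; keeping none collapses the cube to a single grand total.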
Business Intelligence/Business Analytics
“Business intelligence, or BI, is an umbrella term that refers to a variety of software
applications used to analyze an organization’s raw data. BI as a discipline is made up of
several related activities, including data mining, online analytical processing, querying and
reporting.
Companies use BI to improve decision making, cut costs and identify new business
opportunities. BI is more than just corporate reporting and more than a set of tools to coax
data out of enterprise systems. CIOs use BI to identify inefficient business processes that
are ripe for re-engineering.”
http://www.cio.com/article/40296/Business_Intelligence_Definition_and_Solutions, 4/18/13
“Thomas Davenport [an American academic and author specializing in analytics] argues
that business intelligence should be divided into querying, reporting, OLAP, an "alerts" tool,
and business analytics. In this definition, business analytics is the subset of BI based on
statistics, prediction, and optimization.”
http://en.wikipedia.org/wiki/Business_intelligence#Business_intelligence_and_data_warehousing, 4/18/13
Big Data
“Every minute, 48 hours of video are uploaded onto YouTube. 204
million e-mail messages are sent and 600 new websites generated.
600,000 pieces of content are shared on Facebook, and more than
100,000 tweets are sent. And that does not even begin to scratch the
surface of data generation, which spans to sensors, medical records,
corporate databases, and more.”
http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/, 4/18/13
“By 2015, the average hospital will have two-thirds of a petabyte (665
terabytes) of patient data, 80% of which will be unstructured data like
CT scans and X-rays.”
http://www.forbes.com/sites/netapp/2013/04/17/healthcare-big-data/, 4/18/13
Module 2:
Database Administration
Lecture 1: Database Administration
Database Administration
Wikipedia defines database administration as “ the function of managing and maintaining database
management systems (DBMS) software” and lists the following database administrator (DBA) responsibilities:
“DBA Responsibilities
• Installation, configuration and upgrading of Database server software and related products.
• Evaluate Database features and Database related products.
• Establish and maintain sound backup and recovery policies and procedures.
• Take care of the Database design and implementation.
• Implement and maintain database security (create and maintain users and roles, assign privileges).
• Database tuning and performance monitoring.
• Application tuning and performance monitoring.
• Setup and maintain documentation and standards.
• Plan growth and changes (capacity planning).
• Work as part of a team and provide 24x7 support when required.
• Do general technical troubleshooting and give cons.
• Database recovery.”
Source: Wikipedia, 3/18/13, http://en.wikipedia.org/wiki/Database_administration_and_automation
Module 2:
Database Administration
Lecture 2: Governance, Policies, and Procedures
Governance, Policies, and Procedures
• To perform DBA responsibilities, there needs to be an understanding of
expectations and consequences.
• Who sets those expectations, who enforces those expectations, and who
implements those expectations is the purpose behind governance, policies and
procedures. Who is responsible for those expectations? EVERY stakeholder.
• Even the DBA has specific roles for helping set these expectations as shown in the
list of responsibilities:
• “Establish and maintain sound backup and recovery policies and procedures.
• Setup and maintain documentation and standards.”
Source: Wikipedia, 3/18/13, http://en.wikipedia.org/wiki/Database_administration_and_automation
Example:
• Governance (Why?): Our college will comply with FERPA regulations and protect the privacy of student education records.
• Policies (What?): Grades will not be posted.
• Procedures (How?): Student grades will be mailed to the student three days after final exams.
Governance
• Governance addresses all levels of the organization and answers the question
“why are we setting these expectations?”
• “IT governance enables an organization to attain three vital objectives: regulatory
and legal compliance, operational excellence, and risk optimization.” [1]
• “An IT governance framework should not exist in isolation from either the overarching
corporate governance model or the ERM [enterprise risk management] model.”[2]
[1], [2] “IT Excellence Starts with Governance”, An Ernst & Young White Paper, by Nick Robinson, Manager, Technology & Security Risk Services,
http://www.technologyexecutivesclub.com/Articles/itgovernance/excellence.php
Policies and Procedures
“A set of policies are principles, rules, and guidelines formulated or adopted by
an organization to reach its long-term goals and typically published in a booklet
or other form that is widely accessible.
Policies and procedures are designed to influence and determine all major
decisions and actions, and all activities take place within the boundaries set by
them. Procedures are the specific methods employed to express policies in
action in day-to-day operations of the organization.
Together, policies and procedures ensure that a point of view held by the
governing body of an organization is translated into steps that result in an
outcome compatible with that view.”
Businessdictionary.com, http://www.businessdictionary.com/definition/policies-and-procedures.html, 3/18/13
Module 2:
Database Administration
Lecture 3: Database Career Paths and Certification
Healthcare Database
Career Paths and Certification
Career Paths
• database administrator
• database developer
• database analyst
• data analyst
• business analyst
• application analyst
Certifications
• HIMSS CAHIMS
• Microsoft MTA
• Microsoft MOS
• Microsoft MCSA
• CompTIA HIT
Module 2:
Database Administration
Lecture 4: CAHIMS 9.4 Staying Current with Technology and Industry
Module 3:
Data Management
Lecture 1: What is Data Management?
Data Management
“Data management (DM) is the business function of planning for,
controlling, and delivering data and information assets. This function
includes
• the disciplines of development, execution and supervision
• of plans, policies, programs, projects, processes, practices and procedures
• that control, protect, deliver and enhance
• the value of data and information assets.”
“DAMA Guide to the Data Management Body of Knowledge”, DAMA International, 2010, pg. 4
Data, Information, Knowledge
The DAMA-DMBOK Guide defines
• data as the “representation of facts as text, numbers, graphics, images,
sound or video”
• information as “data in context”. “This context includes
…definition…format…timeframe…relevance”
• knowledge as “information in perspective, integrated into a viewpoint
based on the recognition and interpretation of patterns, such as trends,
formed with other information and experience.” “We gain in knowledge
when we understand the significance of information.”
“DAMA Guide to the Data Management Body of Knowledge”, DAMA International, 2010, pg. 4
Adding Value
“The Business Value of Data
What's the value of your organization's data?
The ability of business and IT managers to answer that question directly correlates to the
success of their company's business continuity and data recovery efforts.”
“The Business Value of Data”, by Michael Croy, Forsythe Solutions Group, Inc., http://www.technologyexecutivesclub.com/Articles/management/artBusinessValueofData.php, 03/25/13
“Data has value only when it is actually used, or can be useful in the future. All data lifecycle stages
have associated costs and risks, but only the “use” stage adds business value.”
“DAMA Guide to the Data Management Body of Knowledge”, DAMA International, 2010, pg. 3
Big Data
“is a collection of data sets so large and complex that it becomes difficult to process
using on-hand database management tools or traditional data processing applications.”
“Examples include Big Science, web logs, RFID, sensor networks, social networks, social
data (due to the social data revolution), Internet text and documents, Internet search
indexing, call detail records, astronomy, atmospheric science, genomics,
biogeochemical, biological, and other complex and often interdisciplinary scientific
research, military surveillance, forecasting drive times for new home buyers, medical
records, photography archives, video archives, and large-scale e-commerce.”
http://en.wikipedia.org/wiki/Big_data, 3/26/13
“A recent McKinsey report found that value gained from data in the US health care
sector alone could be more than US $300 billion every year. But traditional tools aren’t
enough to manage these extremely large amounts of fast-changing information.”
http://www.net-security.org/secworld.php?id=14594
Module 3:
Data Management
Lecture 2: Data Governance
Data Governance
“Data Governance is "a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take
what actions with what information, and when, under what circumstances, using what
methods."”
Datagovernance.org, http://www.datagovernance.com/adg_data_governance_basics.html, 03/18/13
“Data governance is an approach that public and private entities can use to organize one or
more aspects of their data management efforts, including business intelligence (BI), data
security and privacy, master data management (MDM), and data quality (DQ)
management.”
Microsoft Corporation, http://www.microsoft.com/en-us/download/details.aspx?id=10985, 03/18/13
“A Governance Plan typically includes your objectives, administration and maintenance,
policies, support, stakeholder teams and other elements.”
“Governance”, http://www.webopedia.com/TERM/V/validation.html , 03/15/13
Parts of Data Governance
© DAMA International 2010
Module 3:
Data Management
Lecture 3: Data Validation
Data Validation
Validation (n.) Verification that something is correct or conforms to a
certain standard. In data collection or data entry, it is the process of
ensuring that the data that are entered fall within the accepted
boundaries of the application collecting the data.
“Validation”, http://www.webopedia.com/TERM/V/validation.html , 03/15/13
“…data validation is the process of ensuring that a program operates
on clean, correct and useful data.”
“Data validation”, http://en.wikipedia.org/wiki/Data_validation, 3/13/13
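To make the idea of "accepted boundaries" concrete, here is a minimal sketch of a data-entry check. The field names (LastName, DateOfBirth) and the age limit are assumptions chosen for illustration, not rules from any real system.

```python
from datetime import date

# Hypothetical rule for a patient intake form; the limit is illustrative only.
MAX_AGE_YEARS = 130


def validate_patient(record, today=None):
    """Return a list of validation errors; an empty list means the record passed."""
    today = today or date.today()
    errors = []

    # Required-field check: the name must be present and not blank.
    name = record.get("LastName", "")
    if not isinstance(name, str) or not name.strip():
        errors.append("LastName is required")

    # Type and range checks: the birth date must be a date within accepted bounds.
    dob = record.get("DateOfBirth")
    if not isinstance(dob, date):
        errors.append("DateOfBirth must be a date")
    elif dob > today:
        errors.append("DateOfBirth cannot be in the future")
    elif today.year - dob.year > MAX_AGE_YEARS:
        errors.append("DateOfBirth is outside the accepted range")

    return errors
```

Returning every error at once, rather than stopping at the first, lets a data-entry form show the user everything that needs fixing in one pass.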
Module 3:
Data Management
Lecture 4: The Data Dictionary
Data Dictionary
“Database about a database. A data dictionary defines the structure of the database itself (not
that of the data held in the database) and is used in control and maintenance of large
databases. Among other items of information, it records
(1) what data is stored,
(2) name, description, and characteristics of each data element,
(3) types of relationships between data elements,
(4) access rights and frequency of access. Also called system dictionary when used in the context of a system design.”
“Data dictionary”, http://www.businessdictionary.com/definition/data-dictionary.html#ixzz2NuokM0Dn , 3/18/13
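Many database systems can report part of this information about themselves. The sketch below uses Python's built-in sqlite3 module and SQLite's PRAGMA table_info to list the data elements of one table. The Patient table and its columns are hypothetical, and a full data dictionary would also carry descriptions, relationships, and access rights that this query does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Patient (
        PatientID   INTEGER PRIMARY KEY,
        LastName    TEXT NOT NULL,
        DateOfBirth TEXT
    )
    """
)


def describe_table(connection, table_name):
    """List each column's name, declared type, and basic constraints."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    # table_name is interpolated directly, so pass only trusted names.
    rows = connection.execute(f"PRAGMA table_info({table_name})").fetchall()
    return [
        {
            "element": name,
            "type": col_type,
            "required": bool(notnull),
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


patient_dictionary = describe_table(conn, "Patient")
```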
Reasons for a data dictionary
• “High-Availability and Disaster Recovery (HA/DR) or what is often called ‘Business
Continuity.’”
• “to determine authoritative sources”
• “to gather the disparate elements and cull them into a single reporting entity”
• “for systems optimization”
• “No one group knows where all of the data is, which of it is authoritative, and which you
should track.”
“Developing a Data Dictionary”, Apr 8, 2011, http://www.informit.com/guides/content.aspx?g=sqlserver&seqNum=382
Guidelines for Developing the Data Dictionary
Recommended guidelines from the e-HIM Workgroup of the American Health Information Management Association (AHIMA), 2006:
1. Design a plan: Preplan the development, implementation, and maintenance of the data dictionary.
2. Develop an enterprise data dictionary: Integrate common data elements across the entire institution to ensure consistency.
3. Ensure collaborative involvement: Make sure there is support from all key stakeholders.
4. Develop an approvals process: Ensure a documentation trail for all decisions, updates, and maintenance.
5. Identify and retain details of data versions: Version control is important.
6. Design for flexibility and growth.
7. Design room for expansion of field values.
8. Follow established ISO/International Electrotechnical Commission (IEC) 11179 guidelines for metadata registry: To promote interoperability, follow standards.
9. Adopt nationally recognized standards.
10. Beware of differing standards for the same concepts.
11. Use geographic codes and conform to the National Spatial Data Infrastructure and the Federal Geographic Data Committee.
12. Test the information system: Develop a test plan to ensure the system supports the data dictionary.
13. Provide ongoing education and training.
14. Assess the extent to which the data elements maintain consistency and avoid duplication.
Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2.
Key Points in the Guidelines: Involve
Stakeholders
• Involve all stakeholders in the discussion of creating the data dictionary.
• Stakeholders may include data creators, owners, and users, which will affect:
-Departments (represented across either facility or enterprise)
-Outside collaborating agencies/facilities
-Public health agencies
-Clinical providers including all specialties
-HIM administrative support services
-Reimbursement support services
-Legal support services
-IT support services
Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2
Key Points in Guidelines: Train Employees
• Ongoing education of long-term employees to assure compliance is
an important tactic for maintaining quality.
• New employees should also receive appropriate training to involve
them in assuring the quality of data being collected.
Source: Health IT Workforce Curriculum, Version 2.0, Spring 2011, Component 11/Unit 8-2
Data Dictionary Examples and Best Practices
California Department of Pesticide Regulation
“Data Dictionary”
http://www.cdpr.ca.gov/docs/enforce/residue/datadict.htm
Northwest Environmental Data Network (NED)
“Best practices for data dictionary definitions and usage”
http://www.pnamp.org/sites/default/files/best_practices_for_data_dictionary_definitions_and_usage_version_1.1_2006-11-14.pdf
Module 4:
Stakeholders & Requirements
Lecture 1: Business, Functional, and Technical Requirements
Define Stakeholders
• Sponsor(s) – person(s) with decision and funding authority
• Subject Matter Expert(s) – person(s) who know the content
• End User(s) – person(s), or representatives of user groups, who will
utilize or benefit from the new product or process
• Support Staff – all other persons needed to implement and/or
maintain the product or process, e.g., training, IT, HR
Documenting Requirements
BUSINESS, FUNCTIONAL, AND TECHNICAL REQUIREMENTS
• A variety of titles exists for these requirements documents, but the objectives are the same.
• Business Requirements
• Answers the question “how does this project fill an organizational/department need?”
• High level. Used for management approval and resource allocation.
• Functional Requirements
• Answers the question “what do the users need?”
• End User level. Used for capturing and verifying the users’ requirements.
• Technical Requirements
• Answers the question “what does the technology (hardware/software) need?”
• Highly detailed. Used by programmers, developers, networking staff, DBAs, testers, and all other needed IT staff.
• For small deliverables, all three may be described in one document; complex projects
may need many versions and levels.
Requirements Tips and Templates
• “Top 10 Writing good requirements tips”,
http://www.requirementone.com/Blog/2012/01/15/Top-10-Writing-good-requirements-tips, 3/18/13
• “Business Requirements Document: A High-level Review”,
http://www.isixsigma.com/implementation/project-selection-tracking/business-requirements-document-high-level-review/, 3/25/13
• Centers for Medicare & Medicaid Services, Requirements Document
(template)
http://www.hsd.state.nm.us/pdf/hcr/HIX/CMS%20Requirements%20Document.pdf, 3/25/13
Module 4:
Stakeholders & Requirements
Lecture 2: CAHIMS 3.3 Business, User and Technical Requirements
Module 4:
Stakeholders & Requirements
Lecture 3: CAHIMS 9.1 Business Communication and Ethics
Module 4:
Stakeholders & Requirements
Lecture 4: CAHIMS 9.3 Professionalism and Customer Service
Module 5:
Data Modeling
Lecture 1: Data Modeling
Why Model?
[Figure: images labeled Earth, House, and Molecule]
Data Modeling
“Communication and precision are the two key benefits that make a
data model so important.”
Data Modeling Made Simple: A Practical Guide for Business and IT Professionals, Technics Publications, LLC , 2nd Edition, Steve Hoberman, pg. 37
Transfer User Requirements into Data Models
Three types of data models:
Conceptual Model – simplest model; used to communicate and validate requirements with
stakeholders
• “A Conceptual data model is the most abstract form of data model. It is helpful for communicating ideas to a wide range of
stakeholders because of its simplicity. Therefore platform-specific information, such as data types, is omitted from a
Conceptual data model.”
http://www.sparxsystems.com/enterprise_architect_user_guide/9.3/domain_based_models/conceptual_data_model.html, 3/25/13
Logical Model – more complex model; follows formal database design rules
• “Logical data modeling is the process of documenting the comprehensive business information requirements in an accurate
and consistent format.”
“Data modeling”, http://pic.dhe.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.intro%2Fsrc%2Ftpc%2Fdb2z_datamodeling.htm, 3/25/13
Physical Model – the most complex model; performance-specific and vendor-specific model
• “The physical design of your database optimizes performance while ensuring data integrity by avoiding unnecessary data
redundancies. During physical design, you transform the entities into tables, the instances into rows, and the attributes into
columns.”
“Physical database design”, http://pic.dhe.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=%2Fcom.ibm.db2z10.doc.intro%2Fsrc%2Ftpc%2Fdb2z_datamodeling.htm, 3/25/13
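The mapping described in the quote (entities into tables, instances into rows, attributes into columns) can be sketched with Python's built-in sqlite3 module. The Physician entity, its attributes, and the sample rows are hypothetical; a physical design for a production DBMS would also involve vendor-specific data types, indexes, and storage choices not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The entity "Physician" becomes a table; its attributes become columns.
conn.execute(
    """
    CREATE TABLE Physician (
        PhysicianID INTEGER PRIMARY KEY,
        LastName    TEXT NOT NULL,
        Specialty   TEXT
    )
    """
)

# Each instance of the entity becomes one row.
conn.executemany(
    "INSERT INTO Physician (LastName, Specialty) VALUES (?, ?)",
    [("Nguyen", "Cardiology"), ("Okafor", "Pediatrics")],
)

physician_rows = conn.execute(
    "SELECT PhysicianID, LastName, Specialty FROM Physician ORDER BY PhysicianID"
).fetchall()
```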
“In Summary
The conceptual model is concerned with the real world view and
understanding of data; the logical model is a generalized formal
structure in the rules of information science; the physical model
specifies how this will be executed in a particular DBMS instance.”
http://www.aisintl.com/case/CDM-PDM.html, 3/25/13
Data Modeling Standards
• Several notation standards exist for data modeling
• Peter Chen’s notation
• Bachman notation
• Barker’s notation
• EXPRESS
• IDEF1X
• Martin notation
• (min, max)-notation of Jean-Raymond Abrial in 1974
• UML class diagrams
• Merise
• Object-Role Modeling
http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model, 3/13/2013
Module 5:
Data Modeling
Lecture 2: Entity Relationship Modeling
The Entity Relationship (ER) Model
• The ER model shows information to be collected in the database
(entity) and its relationship with other information collected.
• Peter Chen, 1976, originator of the entity relationship model
Identify the “Entities”
Identify the “entities” – the nouns – the title of the information being collected
(patient, appointment, prescription, physician, etc.).
1. Draw a box for each entity and label with the entity name.
2. Label using the singular spelling of the noun and capitalize the noun.*
Figure 1: The design of the "box" will depend on the software used to create it.
Identify the “Relationships”
Identify the “relationships” – the verbs; how one piece of data or information
interacts/relates with another piece of information. Draw a line between entities to
show the relationship.
1. Label the line with verbs that describe the relationship.
2. The first verb is for reading left to right; the second verb is for reading right to
left.
Figure 2:
The Patient has an Appointment. The Appointment is for a Patient.
Identify the “Cardinality”
Identify the “cardinality” – the number of entities allowed in the relationship.
1. A single line touching an entity means “ONE”.
2. A line ending with three small lines, referred to as “crow’s feet”, means “Zero
or more”. Once created, this can be set to a different minimum such as “One or
more” or “Three or more”.
Figure 3: A single point at the line end means "ONE"; crow’s feet mean "many" or "one or more".
Identify the “Optional/Optionality”
Identify “optional/optionality” – whether the relationship is required or not.
1. Microsoft Visio uses the “O” to show “optional” as seen by the entity Appointment.
2. The “||” means one required.
Figure 4: The Patient may have "one or more" Appointments; the Appointment must have one Patient.
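One way to think about cardinality and optionality together is as a (minimum, maximum) pair on each end of the relationship line. The sketch below is an illustration under that assumption; the class and variable names are hypothetical and do not come from Visio or any modeling tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelationshipEnd:
    """One end of a relationship line: how many related records are allowed."""
    minimum: int            # 0 means optional; 1 or more means required
    maximum: Optional[int]  # None means "many" (no upper limit)

    def allows(self, count: int) -> bool:
        """Check whether `count` related records satisfies this end's limits."""
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


# The Patient/Appointment relationship read as (min, max) pairs.
appointments_per_patient = RelationshipEnd(minimum=0, maximum=None)  # optional, many
patients_per_appointment = RelationshipEnd(minimum=1, maximum=1)     # required, exactly one
```

Under this reading, a Patient with no Appointments is acceptable, while an Appointment with no Patient (or with two) is not.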
Add the “Attributes”
Add the “attributes” – descriptors of the entity.
1. Label the attributes in the singular.
2. Don’t put spaces or symbols between words if more than one word is needed for
clarity.
3. Type or write as “camel case” – first letter of each word is upper case, all
other letters are lower case (for example, CourseTitle). This variant is also
called upper camel case or Pascal case.
Figure 5: We identify the Patient by his/her name; we identify the Appointment by the date, time and Physician.
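The naming rules above can be sketched in code. This is a minimal illustration in Python, not part of the course material: the helper name `to_attribute_name` is hypothetical, and it assumes that any character other than a letter or digit counts as a separator.

```python
import re


def to_attribute_name(label: str) -> str:
    """Build an attribute name: no spaces or symbols, each word capitalized."""
    # Split on anything that is not a letter or digit and drop empty pieces.
    words = [w for w in re.split(r"[^A-Za-z0-9]+", label) if w]
    # Upper-case the first letter of each word and lower-case the rest.
    return "".join(w[0].upper() + w[1:].lower() for w in words)


print(to_attribute_name("first name"))        # FirstName
print(to_attribute_name("appointment-date"))  # AppointmentDate
```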
Add “Primary Key (PK)” and “Foreign Key (FK)”
Add “primary key (PK)” – use an attribute of the entity, or create a new attribute,
that uniquely identifies the entity.
1. A new attribute, primary key ID, is usually created for most entities because
none of the attributes identified are guaranteed to always be “unique”.
2. Add “foreign key (FK)” – this is the primary key of the parent table in a
relationship. The PK of the parent (in this case “Patient”) is added to the child
table (in this case “Appointment”), thereby becoming the FK.
Figure 6:
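As a rough sketch of how the PK and FK behave in practice, the example below builds the Patient and Appointment tables in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is an assumption made for illustration (the course itself uses Microsoft Access), and the column names `PatientName`, `AppointmentID`, and `AppointmentDate` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Patient (
    PatientID   INTEGER PRIMARY KEY,  -- new attribute created to be unique
    PatientName TEXT NOT NULL
);
CREATE TABLE Appointment (
    AppointmentID   INTEGER PRIMARY KEY,
    AppointmentDate TEXT NOT NULL,
    -- The parent's PK is added to the child table and becomes the FK.
    PatientID       INTEGER NOT NULL REFERENCES Patient(PatientID)
);
""")

conn.execute("INSERT INTO Patient (PatientID, PatientName) VALUES (1, 'Sally Doe')")
conn.execute(
    "INSERT INTO Appointment (AppointmentDate, PatientID) VALUES ('2013-04-19', 1)"
)

# An Appointment that points at a Patient who does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO Appointment (AppointmentDate, PatientID) VALUES ('2013-04-20', 99)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

The failed insert shows the practical purpose of the FK: every Appointment must refer to a Patient that is actually in the parent table.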
Unique Identifiers (UID)
• A unique identifier/unique ID/UID is a number or combination of numbers
and letters that when used will only identify one entity or record.
• Examples of IDs we think uniquely identify us:
• Driver’s license
• Social Security Number
• Telephone number
• Why they might not be unique:
• http://www.idanalytics.com/news-and-events/news-releases/2010/8-11-2010.php
• CustomerID, or PatientID, or AccountNumber are examples of new
identifiers created for the purpose of keeping the information unique.
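A small sketch of why a newly created identifier is safer than an existing one. The sample records, the telephone number, and the helper `register_patient` are all hypothetical; the point is only that two people can share a telephone number while a generated PatientID stays unique.

```python
from itertools import count

# Hypothetical source of new identifiers: 1, 2, 3, ...
_next_patient_id = count(start=1)


def register_patient(name: str, telephone: str) -> dict:
    """Create a patient record with a newly generated PatientID."""
    return {"PatientID": next(_next_patient_id), "Name": name, "Telephone": telephone}


# Two members of one household who share a telephone number.
patients = [
    register_patient("Sally Doe", "555-0100"),
    register_patient("Jefferson Doe", "555-0100"),
]

distinct_telephones = {p["Telephone"] for p in patients}
distinct_ids = {p["PatientID"] for p in patients}
print(len(distinct_telephones), len(distinct_ids))  # 1 2
```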
Read the Data Models
Practice reading the diagrams.
1. Keep nouns singular when starting each sentence.
2. Read from left to right, then from right to left. The sentences must make sense
in both directions.
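Sentence pattern:
Noun A | may/must | have relationship(s) | with some number | of Noun B.
Reading left to right: A Bike | must | be sold with | one or more | Wheel(s).
Reading right to left: A Wheel | may | be sold with | a | Bike.
Diagram: Wheel and Bike joined by a line labeled "is sold with / is part of".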
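The sentence pattern can be expressed as a small function, which is handy for checking that a relationship reads sensibly in both directions. This is an illustrative sketch; the function name `read_relationship` and its parameters are hypothetical.

```python
def read_relationship(noun_a: str, optionality: str, verb: str,
                      number: str, noun_b: str) -> str:
    """Return one reading of a relationship line as a sentence."""
    if optionality not in ("may", "must"):
        raise ValueError("optionality must be 'may' or 'must'")
    # The starting noun stays singular, as the reading rules require.
    return f"A {noun_a} {optionality} {verb} {number} {noun_b}."


print(read_relationship("Bike", "must", "be sold with", "one or more", "Wheel(s)"))
print(read_relationship("Wheel", "may", "be sold with", "a", "Bike"))
```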
Module 5:
Data Modeling
Lecture 3: Introduction to Microsoft VISIO
Diagramming Software
There are many ER diagramming tools.
• Free software ER diagramming tools that can interpret and generate ER
models and SQL and do database analysis
• Proprietary ER diagramming tools
• Free software diagram tools that just draw the shapes, without any
knowledge of what they mean; they do not generate SQL.
http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#ER_diagramming_tools
• Microsoft VISIO is a proprietary diagramming tool that is free to
students in college programs covered under the Microsoft Academic
license.
Diagramming Software
• Get to know VISIO
http://office.microsoft.com/en-us/visio-help/get-to-know-visio-RZ001126777.aspx
• Visio 2007 training courses
http://office.microsoft.com/en-us/visio-help/visio-2007-training-courses-HA010214368.aspx?CTT=1
• Create a Database Model (also known as Entity Relationship
diagram)
http://office.microsoft.com/en-us/visio-help/create-a-database-model-also-known-as-entity-relationship-diagram-HA010115477.aspx
• Make the switch to VISIO 2010
http://office.microsoft.com/en-us/access-help/download-office-2010-training-HA101901726.aspx?CTT=1
• Make the switch to VISIO 2013
http://office.microsoft.com/en-us/support/make-the-switch-to-visio-2013-RZ102925050.aspx?CTT=5&origin=HA104032123
Module 5:
Data Modeling
Lecture 4: Normalization
Normalization
“Normalization is the process of efficiently organizing data in a
database. There are two goals of the normalization process: eliminating
redundant data (for example, storing the same data in more than one
table) and ensuring data dependencies make sense (only storing
related data in a table). Both of these are worthy goals as they reduce
the amount of space a database consumes and ensure that data is
logically stored.”
http://databases.about.com/od/specificproducts/a/normalization.htm
“Redundant data wastes disk space and creates maintenance
problems. If data that exists in more than one place must be changed,
the data must be changed in exactly the same way in all locations.”
http://support.microsoft.com/kb/283878
Normalization Rules
“There are a few rules for database normalization. Each rule is called a
"normal form." If the first rule is observed, the database is said to be in
"first normal form." If the first three rules are observed, the database is
considered to be in "third normal form." Although other levels of
normalization are possible, third normal form is considered the highest
level necessary for most applications.”
http://support.microsoft.com/kb/283878
First Normal Form (1NF)
“First Normal Form
• Eliminate repeating groups in individual tables.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.
Do not use multiple fields in a single table to store similar data.”
http://support.microsoft.com/kb/283878
1NF Example
Not 1NF
Student
Name          | CourseID
Sally Doe     | CIS260, CIS270, CIS280
Jefferson Doe | CIS260, ENG101
Multiple entries for one field is a violation of 1NF.
1NF
Student
StudentID | Student
SD12345   | Sally Doe
JD23456   | Jefferson Doe
Course
CourseID | CourseTitle | StudentID
CIS260   | Database    | SD12345
CIS270   | Programming | SD12345
CIS280   | Analysis    | SD12345
CIS260   | Database    | JD23456
ENG101   | English     | JD23456
Split into two tables using a student unique identifier to tie the information together.
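The 1NF split can be sketched in a few lines of Python. This is an illustration under assumptions: the column names of the unnormalized table (`Name`, `CourseID`) follow the slide as best it can be read, and the `student_ids` and `course_titles` dictionaries are hypothetical stand-ins for however identifiers and titles would really be looked up.

```python
# Unnormalized rows: one field holds several course IDs.
not_1nf = [
    {"Name": "Sally Doe", "CourseID": "CIS260, CIS270, CIS280"},
    {"Name": "Jefferson Doe", "CourseID": "CIS260, ENG101"},
]

# Hypothetical lookups used to build the new tables.
student_ids = {"Sally Doe": "SD12345", "Jefferson Doe": "JD23456"}
course_titles = {
    "CIS260": "Database",
    "CIS270": "Programming",
    "CIS280": "Analysis",
    "ENG101": "English",
}

# Student table: one row per student, identified by StudentID.
student = [
    {"StudentID": student_ids[row["Name"]], "Student": row["Name"]}
    for row in not_1nf
]

# Course table: one row per (course, student) pair, one value per field.
course = [
    {
        "CourseID": course_id.strip(),
        "CourseTitle": course_titles[course_id.strip()],
        "StudentID": student_ids[row["Name"]],
    }
    for row in not_1nf
    for course_id in row["CourseID"].split(",")
]

for row in course:
    print(row["CourseID"], row["CourseTitle"], row["StudentID"])
```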
Second Normal Form (2NF)
“Second Normal Form
• Create separate tables for sets of values that apply to multiple
records.
• Relate these tables with a foreign key.
Records should not depend on anything other than a table's primary
key (a compound key, if necessary).”
http://support.microsoft.com/kb/283878
2NF Example
Not 2NF
Student
StudentID | Student
SD12345   | Sally Doe
JD23456   | Jefferson Doe
Course
CourseID | CourseTitle | StudentID
CIS260   | Database    | SD12345
CIS270   | Programming | SD12345
CIS280   | Analysis    | SD12345
CIS260   | Database    | JD23456
ENG101   | English     | JD23456
Each Student is unique; expanded Course table violates 2NF.
2NF
Enrollment
StudentID | CourseID
SD12345   | CIS260
SD12345   | CIS270
SD12345   | CIS280
JD23456   | CIS260
JD23456   | ENG101
Course
CourseID | CourseTitle
CIS260   | Database
CIS270   | Programming
CIS280   | Analysis
ENG101   | English
Now each Enrollment is unique, and each Course is unique.
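A minimal sketch of the 2NF step, using the same sample rows: the expanded Course table is split into an Enrollment table (who takes what) and a Course table (one row per course). The variable names are hypothetical; the data is taken from the example.

```python
# 1NF Course table: CourseTitle repeats for every student in the course.
course_1nf = [
    ("CIS260", "Database", "SD12345"),
    ("CIS270", "Programming", "SD12345"),
    ("CIS280", "Analysis", "SD12345"),
    ("CIS260", "Database", "JD23456"),
    ("ENG101", "English", "JD23456"),
]

# Enrollment keeps only the pairing of student and course.
enrollment = [(student_id, course_id) for course_id, _title, student_id in course_1nf]

# Course keeps each (CourseID, CourseTitle) pair once; the set removes repeats.
course = sorted({(course_id, title) for course_id, title, _student in course_1nf})

print(len(enrollment), len(course))  # 5 4
```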
Third Normal Form (3NF)
“Third Normal Form
• Eliminate fields that do not depend on the key.
Values in a record that are not part of that record's key do not belong in the
table.”
http://support.microsoft.com/kb/283878
• “A memorable statement of Codd's definition of 3NF, paralleling the
traditional pledge to give true evidence in a court of law, was given by Bill
Kent: "[Every] non-key [attribute] must provide a fact about the key, the
whole key, and nothing but the key." A common variation supplements this
definition with the oath: "so help me Codd".”
http://en.wikipedia.org/wiki/Third_normal_form
3NF Example
Not 3NF
Enrollment
StudentID | CourseID | InstructorID | Instructor
SD12345   | CIS260   | FAC123       | Schmidt
SD12345   | CIS270   | FAC234       | Jones
SD12345   | CIS280   | FAC345       | Nguyen
JD23456   | CIS260   | FAC456       | Doe
JD23456   | ENG101   | FAC567       | Chen
The expanded Enrollment table is not in 3NF because Instructor depends on InstructorID, not CourseID, and the record does not depend on a unique record key.
3NF
Enrollment
EnrollmentID | StudentID | CourseID | InstructorID
2013Fall001  | SD12345   | CIS260   | FAC123
2013Fall002  | SD12345   | CIS270   | FAC234
2013Fall003  | SD12345   | CIS280   | FAC345
2013Fall004  | JD23456   | CIS260   | FAC456
2013Fall005  | JD23456   | ENG101   | FAC567
Instructor
InstructorID | Instructor
FAC123       | Schmidt
FAC234       | Jones
FAC345       | Nguyen
FAC456       | Doe
FAC567       | Chen
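The 3NF step can be sketched the same way. The instructor name is moved to its own table keyed by InstructorID, and each enrollment row receives a unique record key. The `2013Fall` prefix with a three-digit counter mirrors the sample keys in the example; how such keys are really generated is an assumption here.

```python
# Enrollment rows before 3NF: Instructor depends on InstructorID, not on the row.
enrollment_not_3nf = [
    ("SD12345", "CIS260", "FAC123", "Schmidt"),
    ("SD12345", "CIS270", "FAC234", "Jones"),
    ("SD12345", "CIS280", "FAC345", "Nguyen"),
    ("JD23456", "CIS260", "FAC456", "Doe"),
    ("JD23456", "ENG101", "FAC567", "Chen"),
]

# Instructor table: one name per InstructorID.
instructor = {}
for _student_id, _course_id, instructor_id, name in enrollment_not_3nf:
    # setdefault stores the first name seen; a different later name is an error.
    if instructor.setdefault(instructor_id, name) != name:
        raise ValueError(f"conflicting names for {instructor_id}")

# Enrollment table: add a unique EnrollmentID and drop the instructor name.
enrollment = [
    (f"2013Fall{n:03d}", student_id, course_id, instructor_id)
    for n, (student_id, course_id, instructor_id, _name)
    in enumerate(enrollment_not_3nf, start=1)
]

print(enrollment[0])
print(instructor["FAC234"])
```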
Finished Normalized Entities
Each entity now holds unique data.
Student
StudentID | Student
SD12345   | Sally Doe
JD23456   | Jefferson Doe
Course
CourseID | CourseTitle
CIS260   | Database
CIS270   | Programming
CIS280   | Analysis
ENG101   | English
Instructor
InstructorID | Instructor
FAC123       | Schmidt
FAC234       | Jones
FAC345       | Nguyen
FAC456       | Doe
FAC567       | Chen
Enrollment
EnrollmentID | StudentID | CourseID | InstructorID
2013Fall001  | SD12345   | CIS260   | FAC123
2013Fall002  | SD12345   | CIS270   | FAC234
2013Fall003  | SD12345   | CIS280   | FAC345
2013Fall004  | JD23456   | CIS260   | FAC456
2013Fall005  | JD23456   | ENG101   | FAC567
Matching ER Diagram
Student
PK     | StudentID
       | Student
Enrollment
PK     | EnrollmentID
PK,FK1 | StudentID
PK,FK3 | CourseID
FK2    | InstructorID
Instructor
PK     | InstructorID
       | Instructor
Course
PK     | CourseID
       | CourseTitle
Relationships (each labeled "has / is of"): Student to Enrollment, Instructor to Enrollment, Course to Enrollment.
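To see that nothing is lost by normalizing, the four entities can be created as tables and joined back together. The sketch below assumes SQLite through Python's standard `sqlite3` module, which is a stand-in for the course's Microsoft Access environment, and uses the sample data from the normalization example. The key markers follow the diagram as it can be read from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE Student    (StudentID TEXT PRIMARY KEY, Student TEXT NOT NULL);
CREATE TABLE Course     (CourseID TEXT PRIMARY KEY, CourseTitle TEXT NOT NULL);
CREATE TABLE Instructor (InstructorID TEXT PRIMARY KEY, Instructor TEXT NOT NULL);
CREATE TABLE Enrollment (
    EnrollmentID TEXT NOT NULL,
    StudentID    TEXT NOT NULL REFERENCES Student(StudentID),
    CourseID     TEXT NOT NULL REFERENCES Course(CourseID),
    InstructorID TEXT NOT NULL REFERENCES Instructor(InstructorID),
    PRIMARY KEY (EnrollmentID, StudentID, CourseID)
);
""")

conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [("SD12345", "Sally Doe"), ("JD23456", "Jefferson Doe")])
conn.executemany("INSERT INTO Course VALUES (?, ?)",
                 [("CIS260", "Database"), ("CIS270", "Programming"),
                  ("CIS280", "Analysis"), ("ENG101", "English")])
conn.executemany("INSERT INTO Instructor VALUES (?, ?)",
                 [("FAC123", "Schmidt"), ("FAC234", "Jones"), ("FAC345", "Nguyen"),
                  ("FAC456", "Doe"), ("FAC567", "Chen")])
conn.executemany("INSERT INTO Enrollment VALUES (?, ?, ?, ?)",
                 [("2013Fall001", "SD12345", "CIS260", "FAC123"),
                  ("2013Fall002", "SD12345", "CIS270", "FAC234"),
                  ("2013Fall003", "SD12345", "CIS280", "FAC345"),
                  ("2013Fall004", "JD23456", "CIS260", "FAC456"),
                  ("2013Fall005", "JD23456", "ENG101", "FAC567")])

# Follow each foreign key back to its parent to rebuild a readable listing.
rows = conn.execute("""
SELECT e.EnrollmentID, s.Student, c.CourseTitle, i.Instructor
FROM Enrollment AS e
JOIN Student    AS s ON s.StudentID = e.StudentID
JOIN Course     AS c ON c.CourseID = e.CourseID
JOIN Instructor AS i ON i.InstructorID = e.InstructorID
ORDER BY e.EnrollmentID
""").fetchall()

for row in rows:
    print(row)
```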
Normalization Exception
“From a relational model point of view, it is standard to have tables
that are in Third Normal Form. Normalized physical design provides
the greatest ease of maintenance, and databases in this form are
clearly understood by developers.
However, a fully normalized design may not always yield the best
performance. Sybase recommends that you design databases for
Third Normal Form, however, if performance issues arise, you may
have to denormalize to solve them.”
Copyright © 2003. Sybase Inc. All rights reserved. http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc20020_1251/html/databases/databases215.htm
Module 5:
Lecture 5: Other Modeling Concepts
Many-to-Many Relationships
“In a many-to-many relationship, a record in one table relates to
multiple records in a second table, and a record in the second table
relates to multiple records in the first table.
This type of relationship requires a third table, called a junction table.
The junction table contains the primary keys from the other two tables
as its foreign keys.”
http://office.microsoft.com/en-us/access-help/table-that-data-RZ006149432.aspx?section=26, 4/16/2013
Junction/Intersection Tables
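A junction table of the kind described in the quoted definition can be sketched with the Student and Course entities from the normalization example: a student takes many courses, and a course has many students. SQLite through Python's `sqlite3` module is assumed for illustration, and treating Enrollment as a two-column junction table with a compound key is a simplification of the earlier example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE Student (StudentID TEXT PRIMARY KEY, Student TEXT NOT NULL);
CREATE TABLE Course  (CourseID TEXT PRIMARY KEY, CourseTitle TEXT NOT NULL);
-- Junction table: the primary keys of both tables appear as foreign keys.
CREATE TABLE Enrollment (
    StudentID TEXT NOT NULL REFERENCES Student(StudentID),
    CourseID  TEXT NOT NULL REFERENCES Course(CourseID),
    PRIMARY KEY (StudentID, CourseID)
);
""")

conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [("SD12345", "Sally Doe"), ("JD23456", "Jefferson Doe")])
conn.executemany("INSERT INTO Course VALUES (?, ?)",
                 [("CIS260", "Database"), ("CIS270", "Programming"),
                  ("CIS280", "Analysis"), ("ENG101", "English")])
conn.executemany("INSERT INTO Enrollment VALUES (?, ?)",
                 [("SD12345", "CIS260"), ("SD12345", "CIS270"), ("SD12345", "CIS280"),
                  ("JD23456", "CIS260"), ("JD23456", "ENG101")])

# One student relates to many courses ...
courses_for_sally = [r[0] for r in conn.execute(
    "SELECT CourseID FROM Enrollment WHERE StudentID = ? ORDER BY CourseID",
    ("SD12345",))]

# ... and one course relates to many students.
students_in_cis260 = [r[0] for r in conn.execute(
    "SELECT StudentID FROM Enrollment WHERE CourseID = ? ORDER BY StudentID",
    ("CIS260",))]

print(courses_for_sally)
print(students_in_cis260)
```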
Lookup Tables
Lookup tables are generally tables that contain rarely changing data and are
considered the master or reference list.* An example could be a zip code
table with associated city, state, county, country, latitude/longitude
information. Many different tables and/or databases could use this same
table for lookup and for presenting various combinations of the fields on
different reports.
• Lookup tables generally cover a single topic. They are used when the data rarely
changes, when multiple applications could share the same information, to
minimize the likelihood of typos, and to minimize storage for other tables
that reference this table.
• The relationship between a lookup table and another entity table is strictly
a lookup relationship.
Lookup Table Examples
OrderStatusID | OrderStatus
1             | Ordered
2             | Back Ordered
3             | Shipped
4             | Cancelled
StateNamesID | StateNameShort | StateNameLong
1            | AL             | Alabama
2            | AK             | Alaska
3            | AZ             | Arizona
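A short sketch of how another table might use the OrderStatus lookup table. The `CustomerOrder` table and its columns are hypothetical, and SQLite through Python's `sqlite3` module is assumed for illustration. The other table stores only the small OrderStatusID, and a value that is not in the lookup table is rejected, which is how a lookup table limits typos.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE OrderStatus (
    OrderStatusID INTEGER PRIMARY KEY,
    OrderStatus   TEXT NOT NULL UNIQUE
);
-- Hypothetical table that looks its status up in OrderStatus.
CREATE TABLE CustomerOrder (
    CustomerOrderID INTEGER PRIMARY KEY,
    OrderStatusID   INTEGER NOT NULL REFERENCES OrderStatus(OrderStatusID)
);
""")

conn.executemany("INSERT INTO OrderStatus VALUES (?, ?)",
                 [(1, "Ordered"), (2, "Back Ordered"), (3, "Shipped"), (4, "Cancelled")])
conn.executemany("INSERT INTO CustomerOrder VALUES (?, ?)", [(101, 1), (102, 3)])

# A status that is not in the reference list cannot be stored.
try:
    conn.execute("INSERT INTO CustomerOrder VALUES (103, 9)")
    invalid_status_rejected = False
except sqlite3.IntegrityError:
    invalid_status_rejected = True

# Reports join back to the lookup table to show the readable text.
report = conn.execute("""
SELECT o.CustomerOrderID, s.OrderStatus
FROM CustomerOrder AS o
JOIN OrderStatus AS s ON s.OrderStatusID = o.OrderStatusID
ORDER BY o.CustomerOrderID
""").fetchall()

print(invalid_status_rejected)
print(report)
```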
Recursion
The Supervisor is also an Employee, so storing Supervisor information separately
means a lot of duplicated information.
By adding the EmployeeID of the Supervisor to the field SupervisorID,
the relationship is established with only one extra field needed.
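The recursive relationship can be sketched as a table whose SupervisorID refers back to the same table's EmployeeID. The sample employees and the column `EmployeeName` are hypothetical, and SQLite through Python's `sqlite3` module is assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE Employee (
    EmployeeID   INTEGER PRIMARY KEY,
    EmployeeName TEXT NOT NULL,
    -- Points at another row of this same table; NULL when there is no supervisor.
    SupervisorID INTEGER REFERENCES Employee(EmployeeID)
);
""")

conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(1, "Schmidt", None), (2, "Jones", 1), (3, "Nguyen", 1)])

# Join the table to itself: e is the employee, s is that employee's supervisor.
rows = conn.execute("""
SELECT e.EmployeeName, s.EmployeeName
FROM Employee AS e
LEFT JOIN Employee AS s ON s.EmployeeID = e.SupervisorID
ORDER BY e.EmployeeID
""").fetchall()

for employee, supervisor in rows:
    print(employee, "reports to", supervisor)
```

The LEFT JOIN keeps the employee who has no supervisor in the result, with the supervisor shown as None.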
Supertypes and Subtypes
PATIENT (supertype): *First Name, *Last Name, *Gender, *Birthdate
MALE (subtype)
FEMALE (subtype): *Number of Pregnancies
INFANT (subtype): *Head Circumference
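One way to picture supertypes and subtypes is class inheritance: the subtype carries everything the supertype has, plus its own attributes. The sketch below uses Python dataclasses as an analogy only; it is not how a database stores the entities. Class and field names are hypothetical, written in Python's usual lower-case style, and centimeters for head circumference is an assumed unit.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Patient:
    """Supertype: attributes shared by every patient."""
    first_name: str
    last_name: str
    gender: str
    birthdate: date


@dataclass
class MalePatient(Patient):
    """Subtype with no extra attributes in the example."""


@dataclass
class FemalePatient(Patient):
    """Subtype that adds an attribute only it needs."""
    number_of_pregnancies: int


@dataclass
class InfantPatient(Patient):
    """Subtype that adds an attribute only it needs."""
    head_circumference_cm: float  # unit is an assumption


# Sample values are made up for illustration.
mother = FemalePatient("Sally", "Doe", "F", date(1980, 1, 1), 2)
print(isinstance(mother, Patient), mother.number_of_pregnancies)
```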
Module 5:
Data Modeling
Lecture 6: Data Patterns
Patterns/Templates
Data Patterns
"Thus Universal Patterns for Data Modeling are reusable guides that
provide a data modeling template for very prevalent or "universal"
themes that occur in data modeling."
The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling, Len Silverston, Paul Agnew, Wiley; 1 edition (January 9, 2009) , page 5
Watch the YouTube video of Len Silverston,
author of The Data Model Resource Book,
http://www.youtube.com/watch?v=D5Rgpl6humE
Example Data Pattern –
Electronic Medical Records
Source: http://www.databaseanswers.org/data_models/electronic_medical_records/index.htm, 4/16/2013