CDMP Preparation Workshop
EDW April 2016
Presented by:
Chris Bradley and Katherine O’Keefe
Who We Are
Christopher Bradley
• President, DAMA-UK
• CDMP Fellow
• CDMP Author & Examiner
• DAMA Professional Achievement Award
• DMBoK 2 co-author
• 35 years Global Data Management experience
• Author, DMBoK education series
• Independent Consultant, Data Management Advisors
• Information Strategist, Author, Trainer
• chris.bradley@dmadvisors.co.uk
Christopher Bradley
Chris has 35 years of Information Management experience & is
a leading Independent Information Management strategy
advisor.
In the Information Management field, Chris works with
prominent organizations including HSBC, Celgene, GSK, Pfizer,
Icon, Quintiles, Total, Barclays, ANZ, Shell, BP, Statoil,
Riyad Bank & Aramco. He addresses challenges faced by large
organisations in the areas of Data Governance, Master Data
Management, Information Management Strategy, Data
Quality, Metadata Management and Business Intelligence.
He is a Director of DAMA-I, the inaugural CDMP Fellow,
an author & examiner for CDMP, a Fellow of the Chartered
Institute of Management Consulting (now IC), a member of
the MPO, and SME Director of the DM Board. He is also the
recipient of the DAMA lifetime professional achievement
award.
A recognised thought-leader in Information Management,
Chris is the author of numerous papers and books, including
sections of DMBoK 2.0, a columnist, a frequent contributor to
industry publications and a member of several IM standards
authorities.
He leads an experts channel on the influential BeyeNETWORK,
is a sought after speaker at major international conferences,
and is the co-author of “Data Modelling For The Business – A
Handbook for aligning the business with IT using high-level
data models”. He also blogs frequently on Information
Management (and motorsport).
Christopher Bradley
INFORMATION MANAGEMENT STRATEGIST
Chris.Bradley@DMAdvisors.co.uk
+44 7973 184475 (mobile) | +44 1225 923000 (office)
@inforacer
uk.linkedin.com/in/christophermichaelbradley/
infomanagementlifeandpetrol.blogspot.com
TRAINING | ADVISORY | CONSULTING | CERTIFICATION
Who We Are
Katherine O’Keefe, PhD
• Project Lead Consultant, CDMP exams design
• Data Governance and Data Privacy Consultant and Trainer with Castlebridge Associates
• Lecturer on Irish Law Society certification course for Data Protection
• Tutor and Lecturer in English and Irish Literature and Drama
Specialisms: Ethics in Information Management; Storytelling for Change Management; Data Privacy and the EU General Data Protection Regulation
Katherine@castlebridge.ie
Castlebridge Associates
CHANGING HOW PEOPLE THINK ABOUT INFORMATION
Katherine O’Keefe, PhD
Dr Katherine O'Keefe is a Data Governance and
Data Privacy consultant and trainer with
Castlebridge Associates, specializing in
“translating Data Geek to People Speak”.
Katherine has worked with clients in a variety of
sectors on consulting and training engagements
since starting with Castlebridge Associates. In
addition to her professional experience in Data
Governance and Privacy, Katherine holds a
Doctorate in Anglo-Irish Literature from
University College Dublin with an interdisciplinary
focus on Philosophy, and is a world-leading
expert on the Fairy Tales of Oscar Wilde.
She has ten years of experience teaching in diverse
learning environments. As an experienced
teacher of English as a foreign language, she
understands the challenges of translating
concepts across language and culture.
She is the author of “A Primer on Ethical
Principles in an Information Governance
Framework”, which sets out a structured, first-principles-based
framework for ethical decision
making in the processing of data.
Katherine O’Keefe
INFORMATION GOVERNANCE AND DATA PRIVACY CONSULTANT AND TRAINER
Katherine@Castlebridge.ie
+353 86 3699863
@okeefekat
https://ie.linkedin.com/in/okeefekat
www.castlebridge.ie
TRAINING | ADVISORY | STRATEGY | CONSULTING
CDMP Revamped 2015
Comparing the Levels
CDMP Exam Prices
Item | Member | Non-Member
Associate (DM Fundamentals) exam | $220 | $290*
Associate to Practitioner/Master DM Fundamentals exam conversion** | $150 | $220*
Practitioner/Master DM Advanced exam | $250 | $330*
Practitioner/Master Elective exams (per exam) | $250 | N/A
Master Case Study Elective exam*** | $280 | N/A
Exam re-take (Master & Practitioner levels only) | $230 | $300*
* Non-members receive 1 year’s Central Membership of DAMA-I with their first DM exam
** The Associate exam focuses on theory and concepts based on the DMBoK (V1 currently); Practitioner and Master focus on applying/implementing the theory and concepts. Marks gained at Associate Level do not convert to similar marks at Practitioner Level: Associate CDMPs must write DM Advanced to progress to the next level
*** Individuals aiming for Master must provide a case study related to one of their two elective topics as well as pass all 3 exams at 80% and above
An admin fee of $50 will be levied per exam for cancellations or date changes
Transfers of exams/ membership from one individual to another are not permitted
ALL EXAMS ARE TAKEN ONLINE ONLY
ONLINE OR CHAPTER-LED PROCTORING REQUIRES FULL PAYMENT UP FRONT
Please watch out at various international DAMA-I endorsed conferences for exam proctoring and preparation workshops
Taking the Exams: Associate
1 Exam: Data Management Fundamentals
 Data Management Fundamentals (Associate Level)
 100 questions
 90 minutes
 60% to pass
Taking the Exams: Practitioner
Data Management Fundamentals (Practitioner Level)
+ 2 Advanced Elective exams
 Data Management Fundamentals
(Practitioner Level)
 110 questions
 90 minutes
 70% to pass
 Elective Exams (each)
 100 questions
 90 minutes
 70% to pass
Taking the Exams: Master
Data Management Fundamentals (Practitioner Level)
+ 2 Advanced Elective exams
 Data Management Fundamentals
(Practitioner Level)
 110 questions
 90 minutes
 80% to pass
 Elective Exams (each)
 100 questions
 90 minutes
 80% to pass
Substitution Exams
Adjacent Knowledge Area
Certificate Recognition
CDMP Testing at EDW
Sunday, 4/17/2016
 10:30 AM - 02:00 PM | CDMP Preparation
 06:00 PM - 07:30 PM | CDMP Exam [Associate]
Tuesday, 4/19/2016
 04:30 PM - 06:00 PM | CDMP Exam [Associate or Practitioner or
Practitioner electives]
 06:00 PM - 07:30 PM | CDMP Exam [Associate or Practitioner or
Practitioner electives]
Wednesday, 4/20/2016
 02:00 PM - 03:30 PM | CDMP Exam [Associate or Practitioner or
Practitioner electives]
DMBOK Wheel
(Version 1)
Bloom’s Taxonomy of Learning:
Cognitive Domains
Create: Strategize, Design, Make, Plan, Produce
Evaluate: Reflect, Critique, Test, Judge, Monitor, Assess
Analyze: Integrate, Organize, Compare, Deconstruct
Apply: Implement, Use, Carry out, Execute
Understand: Classify, Compare, Summarize, Explain
Remember: Recall, Restate, Define, Identify, List, Name
Dimensions of Knowledge
Metacognitive | Conceptual | Procedural | Factual
Bloom’s Taxonomy Revised
The Anatomy of a
Multiple Choice Question Item
Stem (the question): How many economists does it take to change a lightbulb?
Alternatives (the answer choices) consist of the Key (the correct answer) plus the Distractors:
A. They can't tell you unless you give them a lightbulb approximation to work on.
B. They're projecting three for next year, but that's a conservative estimate.
C. Nine. One to change the bulb, and eight to hold a seminar on how Nietzsche would have done it.
D. One, but they'll spend three hours checking it for alignment and leaks.
E. How many did it take this time last year?
Direct Answer (only correct choice)
vs. Best Answer (most correct choice)
A. An example of a Direct Answer item:
The California State Capitol is located in which city?
A. Los Angeles
B. Monterey
C. Sacramento
D. San Jose
B. An example of a Best Answer item:
Why does the planet Mercury have a year of 88 Earth days?
a) Mercury’s year is shorter than Earth’s.
b) Mercury’s small size and elliptical orbit make it travel faster than Earth.
c) Mercury’s orbit is closer to the sun than is Earth’s.
Exam Questions: evaluating the same
information at different levels
A. Which of the following is characteristic of a good Data Steward? (According to DAMA-DMBOK version 1)
A. Quality A
B. Quality B
C. Quality C
D. Quality D
B. You need Data Stewards for your DG programme: which of these people would best fit the role?
a) Description of person A
b) Description of person B
c) Description of person C
d) Description of person D
Practitioner Level Knowledge:
Going Beyond the DMBOK
Data Management Functions
DATA GOVERNANCE: › Strategy › Organisation & Roles › Policies & Standards › Issues › Valuation
DATA ARCHITECTURE MANAGEMENT: › Enterprise Data Modelling › Value Chain Analysis › Related Data Architecture
DATA DEVELOPMENT: › Analysis › Data modelling › Database Design › Implementation
DATABASE OPERATIONS MANAGEMENT: › Acquisition › Recovery › Tuning › Retention › Purging
DATA SECURITY MANAGEMENT: › Standards › Classifications › Administration › Authentication › Auditing
REFERENCE & MASTER DATA MANAGEMENT: › External Codes › Internal Codes › Customer Data › Product Data › Dimension Management
DATA WAREHOUSE & BUSINESS INTELLIGENCE MANAGEMENT: › Architecture › Implementation › Training & Support › Monitoring & Tuning
DOCUMENT & CONTENT MANAGEMENT: › Acquisition & Storage › Backup & Recovery › Content Management › Retrieval › Retention
META DATA MANAGEMENT: › Architecture › Integration › Control › Delivery
DATA QUALITY MANAGEMENT: › Specification › Analysis › Measurement › Improvement
DMBoK Webinars to date
DMBoK Overview: 26th Feb 2015
Master & Ref Data: 30th March
Data Modelling: 2nd June
Data Quality: 18th August
DW & BI: 19th September
Data Risk & Security: 20th October
Metadata Management: 17th November
Data Lifecycle Management: 11th December
Data Governance: 12th January 2016
Data Operations: 26th February 2016
Document & Content Management: 15th March 2016
NEW FOR DMBoK 2 – Data Integration & Interoperability: 12th April 2016
https://goo.gl/MdQlgn
CDMP Certification & DMBoK Training
CDMP Preparation & Examinations: April 17-19, EDW 2016, San Diego, USA
Information Management Disciplines of the DMBoK: April 26-28, IRM Training, London, UK
CDMP Preparation & Examinations: May 16-18, IRM MDM/DG, London, UK
Data Quality Management: May 26-27, Rome, Italy
Information Management Disciplines of the DMBoK & CDMP Preparation & Exams: July 10-21, Dubai, UAE
CDMP Preparation & Examinations: November 7-9, IRM ED/BI, London, UK
More to come
Data Management Fundamentals
DM Fundamentals Contents
1. Data Management Process
2. Data Governance Function
3. Data Architecture Management Function
4. Data Development Function
5. Data Operations Management Function
6. Data Security Management Function
7. Reference & Master Data Management Function
8. Data Warehousing and Business Intelligence Management Function
9. Document and Content Management Function
10. Meta-data Management Function
11. Data Quality Management Function
Data Management Process
ITIL
 IT Infrastructure Library
Information Lifecycle & SDLC
THE INFORMATION LIFECYCLE: PLAN → SPECIFY → ENABLE → CREATE & ACQUIRE → MAINTAIN & USE → ARCHIVE & RETRIEVE → PURGE
SYSTEMS DEVELOPMENT LIFECYCLE (SDLC): PLAN → ANALYSE → DESIGN → BUILD → TEST → DEPLOY → MAINTAIN
(SOURCE DAMA)
The Information Lifecycle
 THE INFORMATION LIFECYCLE (DAMA)
PLAN: › IM strategy › Governance › Define policies and procedures for quality, retention, security etc.
SPECIFY: › Architecture › Conceptual, logical and physical modelling
ENABLE: › Install or provision servers, networks, storage, DBMSs › Access controls
CREATE & ACQUIRE: › Data created, acquired (external), extracted, imported, migrated, organised
MAINTAIN & USE: › Data validated, edited, cleansed, converted, reviewed, reported, analysed
ARCHIVE & RETRIEVE: › Data archived, retained and retrieved
PURGE: › Data deleted
(SOURCE DAMA)
Data Management Functions
DATA GOVERNANCE: › Strategy › Organisation & Roles › Policies & Standards › Issues › Valuation
DATA ARCHITECTURE MANAGEMENT: › Enterprise Data Modelling › Value Chain Analysis › Related Data Architecture
DATA DEVELOPMENT: › Analysis › Data modelling › Database Design › Implementation
DATABASE OPERATIONS MANAGEMENT: › Acquisition › Recovery › Tuning › Retention › Purging
DATA SECURITY MANAGEMENT: › Standards › Classifications › Administration › Authentication › Auditing
REFERENCE & MASTER DATA MANAGEMENT: › External Codes › Internal Codes › Customer Data › Product Data › Dimension Management
DATA WAREHOUSE & BUSINESS INTELLIGENCE MANAGEMENT: › Architecture › Implementation › Training & Support › Monitoring & Tuning
DOCUMENT & CONTENT MANAGEMENT: › Acquisition & Storage › Backup & Recovery › Content Management › Retrieval › Retention
META DATA MANAGEMENT: › Architecture › Integration › Control › Delivery
DATA QUALITY MANAGEMENT: › Specification › Analysis › Measurement › Improvement
Data Management Organisations
 DATA GOVERNANCE COUNCIL
The primary and highest authority organisation for data
governance. Includes senior managers serving as executive
data stewards, DM Leader and the CIO.
 DATA STEWARDSHIP STEERING COMMITTEE
One or more cross-functional groups of coordinating data
stewards responsible for support and oversight of a particular
data management initiative.
 DATA STEWARDSHIP TEAM
One or more business data stewards collaborating on an area
of data management, typically within an assigned subject area,
led by a Coordinating Data Steward.
 DATA GOVERNANCE OFFICE
Exists in larger organisations to support the above teams.
Data Stewards
 EXECUTIVE DATA STEWARD
Senior Managers who serve on a Data Governance
Council.
 COORDINATING DATA STEWARD
Leads and represents teams of business data stewards
in discussions across teams and with executive data
stewards. Coordinating data stewards are particularly
important in large organizations.
 BUSINESS DATA STEWARD
A knowledge worker and business leader recognized as
a subject matter expert who is assigned accountability
for the data specifications and data quality of
specifically assigned business entities, subject areas or
databases.
Data Governance Workflow (diagram)
• DQ & MDM Tool
What Is Data Governance?
THE EXERCISE OF AUTHORITY AND CONTROL, PLANNING, MONITORING, AND
ENFORCEMENT OVER THE MANAGEMENT OF DATA ASSETS. (DAMA INTERNATIONAL)
The Design & Execution Of Standards & Policies Covering …
 Design and operation of a management system to assure that data delivers value and is not a cost
 Who can do what to the organisation’s data and how
 Ensuring standards are set and met
 A strategic & high-level view across the whole organisation
To Ensure …
 Key principles/processes of effective Information Management are put into practice
 Continual improvement through the evolution of an Information Management strategy
Data Governance Is NOT …
 A “one off” Tactical management exercise
 The responsibility of the Technology and IT department alone
Why Is Data Governance Critical?
 Higher volumes of data generated by organisations (raw data, devices, CRM, ECM, IoT)
 Proliferation of data-centric systems
 New product development
 To make the management of information front and centre and part of the culture
 Greater demand for reliable information: gain deep insights through analytics
 Trust in information: “What do you mean by ….?”
 Tighter regulatory compliance
 Competitive advantage: improved decision making
 Business change is no longer optional – it’s inevitable: agility AND ability to respond to change
 Big Data explosion (and hype)
Drivers for Data Governance
1. Global operations are typically complex, disparate and often inefficient in their approaches to information management (IM).
2. Shared and/or critical information is siloed, and this siloed information impairs enterprise-level reporting, decision-making and performance optimization.
3. Aggregated information is required by certain business functions, but is not readily available.
4. Business and IT neither talk the same language, nor have a common understanding about information management, causing a considerable knowledge gap to exist with regards to critical data elements for the enterprise.
5. Information management budgets and program focuses are siloed, often inside individual projects with no enterprise scope.
6. Enterprise-wide information lacks semantic consistency (meaning & definition).
7. The information management needs of multiple “owners” across the enterprise must be rationalized.
8. Decentralized IT organizations operate independently within individual business units, adding complexity and challenge.
9. Business perceives IT as being insufficiently agile to meet ad hoc information needs.
10. If even discussed, Business and IT can’t agree who actually “owns” the data.
11. Data context is critical to consumers, but often lacking.
12. Operationalization of information management projects at the enterprise level is a difficult challenge.
13. Regulation & compliance make effective information management no longer optional.
14. Data quality must be operationalized across the entire organization to assure the usefulness of the information that business users consume.
15. Organisations need to become information-centric enterprises.
16. Successful transformation of an organization into an information-centric enterprise requires a designated champion from senior management to educate and guide the company in operationalizing strategic data plans.
17. Strategic thinking and decision-making is needed on the issue of whether data should be centralized or distributed.
Exercise
1. List the top 5 drivers for Data Governance / Information Management for your company.
2. For each of the drivers above, describe the issues faced / evidence and the implications of these.
Data Governance Activities
Guiding principles
 Data management is a shared responsibility
 Data Stewards have responsibilities in all 10 management functions
 Every data governance/data stewardship programme is unique
 The best data stewards are found, not made
 Shared decision making is the hallmark of data governance
 DG Councils/Data Stewards act as the legislative branch, while the DMSO acts as the executive
 Data Governance occurs at enterprise and local levels
 No substitute for visionary and active IT leadership
 Centralised organisation for DM professionals is essential
 Define a formal charter for the Data Governance Council
 Data Strategy should be driven by the Business Strategy
Ethical issues raised by IT
 Who should have access to data?
 To whom does the data belong?
 Who is responsible for maintaining accuracy and security?
 Does the ability to capture data imply a responsibility to
monitor its use?
 Should data patterns be analyzed to prevent risks to
employees / customers?
 How much information is necessary and relevant for
decision making?
 Should certain data "follow" individuals or corporations
throughout their lives?
 Does IT lead to job elimination, job repetition, or job
enhancement?
Preparing for an exam by
creating questions
 What is the Learning objective / Area of knowledge?
(Data Governance)
 Stem (construct a question):
 Key (the correct answer)
 Distractor 1
 Distractor 2
 Distractor 3
Data Architecture Management
Enterprise Architecture Types and
Structures
Enterprise Architecture
Enterprise architecture (EA) is the process of translating business vision and strategy into effective
enterprise change by creating, communicating and improving the key requirements, principles and
models that describe the enterprise's future state and enable its evolution.
Segment Architecture
Segment architecture is a detailed, formal description of areas within an enterprise, used at the
program or portfolio level to organize and align change activity.
Solution Architecture
Solution architecture is an architecture domain that aims to address specific problems and
requirements, usually through the design of specific information systems or applications.
Enterprise Architecture Types and
Structures
Level | Scope | Detail | Impact | Audience
Enterprise Architecture | Agency / Organization | Low | Strategic Outcomes | All Stakeholders
Segment Architecture | Line of Business | Medium | Business Outcomes | Business Owners
Solution Architecture | Function / Process | High | Operational Outcomes | Users and Developers
Enterprise Architecture
Frameworks
An enterprise architecture framework defines how to organize the structure and views associated with an enterprise architecture. Examples include:
TOGAF – The Open Group Architecture Framework, probably the most widely adopted framework; it contains an Architecture Development Method (ADM), a content meta-model and defined artefacts within the business, application, data and technology domains.
Zachman – the first enterprise architecture framework; it defines artifacts in a 6 x 6 matrix, with interrogatives (What, How, Where etc.) as columns and stakeholder perspectives (Executive, Business, Architect etc.) as rows. It is an ontology, not a methodology, for enterprise architecture.
FEA – The U.S. federal enterprise architecture (FEA) is an initiative of the U.S. Office of Management and Budget that aims to comply with the Clinger-Cohen Act and provide a common methodology for IT acquisition in the US federal government.
Enterprise Architecture Types and
Structures
Business Architecture
The Business Architecture defines the
business strategy, governance, organization,
and key business processes.
Application Architecture
The Application Architecture defines the
major kinds of application system necessary
to process the data and support the
business.
Data Architecture
The Data Architecture describes the
structure of an organization's logical and
physical data assets and data management
resources.
Technology (Infrastructure) Architecture
The Technology Architecture describes the
logical software and hardware capabilities
that are required to support the deployment
of business, data, and application services.
This includes IT infrastructure, middleware,
networks, communications, processing,
standards, etc.
Enterprise Architecture Domains
Enterprise Architecture Types and
Structures
Enterprise Data Model
Depicts the relationships between critical
data entities within the enterprise. This
diagram is developed to address the
concerns of business stakeholders.
Information Value Chain Matrix
A Value Chain diagram provides a high-level
orientation view of an enterprise and how it
interacts with the outside world.
Database Architecture
A data architecture describes the
architecture of the data structures used by a
business and/or its applications.
Data Integration Architecture
Data integration involves combining data
residing in different sources and providing
users with a unified view of these data e.g.
ETL or Virtualisation.
Document Content Architecture
The Document Content Architecture, or DCA
for short, was a document standard
supported by IBM in the early 1980s.
Meta-data Architecture
A model that describes how and with what
the architecture will be described in a
structured way.
Data Architecture Terms (diagram)
Enterprise Architecture Types & Structures: TOGAF Inputs & Outputs (diagram)
Enterprise Architecture Types & Structures: TOGAF Artifacts (diagram)
Enterprise Architecture Types and Structures: Federal Enterprise Architecture Framework (diagram)
Data Development
3. Data Development
Definition: Designing, implementing, and maintaining solutions to meet the data needs of the enterprise.
Goals:
1. Identify and define data requirements.
2. Design data structures and other solutions to these requirements.
3. Implement and maintain solution components that meet these requirements.
4. Ensure solution conformance to data architecture and standards as appropriate.
5. Ensure the integrity, security, usability, and maintainability of structured data assets.
Inputs:
• Business Goals and Strategies
• Data Needs and Strategies
• Data Standards
• Data Architecture
• Process Architecture
• Application Architecture
• Technical Architecture
Suppliers:
• Data Stewards
• Subject Matter Experts
• IT Steering Committee
• Data Governance Council
• Data Architects and Analysts
• Software Developers
• Data Producers
• Information Consumers
Participants:
• Data Stewards and SMEs
• Data Architects and Analysts
• Database Administrators
• Data Model Administrators
• Software Developers
• Project Managers
• DM Executives and Other IT
Management
Activities:
1. Data Modeling, Analysis and Solution Design (D)
  1. Analyze Information Requirements
  2. Develop and Maintain Conceptual Data Models
  3. Develop and Maintain Logical Data Models
  4. Develop and Maintain Physical Data Models
2. Detailed Data Design (D)
  1. Design Physical Databases
  2. Design Information Products
  3. Design Data Access Services
  4. Design Data Integration Services
3. Data Model and Design Quality Management
  1. Develop Data Modeling and Design Standards (P)
  2. Review Data Model and Database Design Quality (C)
  3. Manage Data Model Versioning and Integration (C)
4. Data Implementation (D)
  1. Implement Development / Test Database Changes
  2. Create and Maintain Test Data
  3. Migrate and Convert Data
  4. Build and Test Information Products
  5. Build and Test Data Access Services
  6. Validate Information Requirements
  7. Prepare for Data Deployment
Tools:
• Data Modeling Tools
• Database Management Systems
• Software Development Tools
• Testing Tools
• Data Profiling Tools
• Model Management Tools
• Configuration Management Tools
• Office Productivity Tools
Primary Deliverables:
• Data Requirements and Business Rules
• Conceptual Data Models
• Logical Data Models and Specifications
• Physical Data Models and Specifications
• Meta-data (Business and Technical)
• Data Modeling and DB Design Standards
• Data Model and DB Design Reviews
• Version Controlled Data Models
• Test Data
• Development and Test Databases
• Information Products
• Data Access Services
• Data Integration Services
• Migrated and Converted Data
Consumers:
• Data Producers
• Knowledge Workers
• Managers and Executives
• Customers
• Data Professionals
• Other IT Professionals
Activities: (P) – Planning (C) – Control (D) – Development (O) - Operational
What Is A Data Model?
A model is a representation of something in our environment, making use of standard symbols to enable improved understanding of the concept.
A data model describes the specification, definition and rules for data in a business area.
A data model is a diagram (with additional supporting metadata) that uses text and symbols to represent data, to give the reader a better understanding of the data.
A data model describes the inherent logical structure of the data within a given domain and, by implication, the underlying structure of that domain itself.
A Data Model Represents
Entities, the attributes of those entities, relationships among those entities, and (often implicit) relationships among those attributes.
Relationships form a concrete Business Assertion. For example, a relationship called "is the placer of" operates on entity classes CUSTOMER and ORDER and forms the following concrete assertion:
“Each CUSTOMER is the placer of zero, one or more ORDER(s)"
Relationships should be named in both directions, thus in the other direction we have:
"Each ORDER must be placed by one and only one CUSTOMER"
Is this true? Is this true… always?
What Is A Conceptual Data Model?
 A description of a Business (or an area of the
Business) in terms of the things it needs to
know about.
 The Data things are “entities” and the “facts
about things” are attributes & relationships.
 It’s a representation of the “real world”, not
a technical implementation of it
 Should be able to be understood by Business
users
Definition:
A Student is any person who has been admitted to a course, has paid, and has enrolled in one or more
modules within a course. Tutors and other staff members may also be Students
Business Assertions
 A Student enrolls for zero, one or more modules
 A Course can be taught through zero, one or more Modules
 A Room can be the location of zero, one or more modules
 A Tutor can be the teacher of zero, one or more modules
The Other Way?
 A Module is enrolled in by zero or many students
 A Module is an offering within zero or one course
 A Module is located in zero or one room
 A Module is taught by zero or one tutor
Really?
A Data Model Represents
Classes of entities (kinds of things) about which a company wishes to know or hold information:
WHO: Person, Employee, Vendor, Customer, Department, Organisation, …
WHAT: Product, Service, Raw Material, Training Course, Flight, Room, …
WHEN: Time, Day, Date, Calendar, Reporting Period, Fiscal Period, …
WHERE: Geographic location, Delivery address, Storage Depot, Airport, …
WHY: Order, Complaint, Inquiry, Transaction, …
HOW: Invoice, Policy, Contract, Agreement, Document, Account, …
What is an Entity?
Entity: A classification of the types of objects found in the real world --persons, places,
things, concepts and events – of interest to the enterprise.
DAMA Dictionary of Data Management
WHO?
WHAT?
WHEN?
WHERE?
WHY?
HOW?
Identifying Entities
A Rule Of Thumb: Is it an Entity?
 What is ONE of those things? Does this imply an instance of a SINGLE thing, not a group or collection?
 Are there MULTIPLE instances of these things?
 How do I identify ONE of those things?
 What are the facts I want to hold against ONE of those things?
 Do I even WANT to hold facts about these things?
 PROCESSES will act upon it, so does the “thing” make sense in a well-formed process phrase, i.e. a verb-noun pair?
Sample Entities
Location
Product
Customer
Region
Order
Raw Material
Building
Exercise
Identify Entities
Exercise: Entities
Which of these might / might not be valid entities?
Student | Building | Maths Department | Course Catalogue | Attendance Sheet | Enrolment Form | Professor Plumb | Prerequisite List | Module | Organisation Chart | Student Directory | Module Description | Qualification | Certification Body | Graduation
Data Model Levels
ENTERPRISE (Communication Focus) – the domain of an Enterprise data concept – is described in more detail by the
CONCEPTUAL model (within a subject area/domain), which is described in more detail by the
LOGICAL model, which generates the schema of the
PHYSICAL model, which is implemented in the
PHYSICAL IT SYSTEM (Implementation Focus).
Each lower level can also be reverse engineered back into the level above.
We All Use Models
1st Normal Form
 1NF DEFINITION:
 Every non-key attribute in an entity must depend on its primary key
A PRIMARY KEY MUST BE
› Unique – the primary key uniquely identifies each instance of the entity
› Mandatory – the primary key must be defined for every instance of the entity
› Unchanging – while not mandatory, it is desirable that the primary key does not change
TO PUT A MODEL INTO 1NF
1. Identify the primary key
2. Remodel repeating values
3. Remodel multi-valued attributes
2nd Normal Form
2NF DEFINITION:
EACH ENTITY MUST HAVE THE FEWEST POSSIBLE CORRECT PRIMARY KEY ATTRIBUTES
How do we do this? Take each non-key attribute (i.e. not a primary, foreign or alternate key). Test if it depends entirely on the primary key. If it doesn’t, move it out to a new entity.
3rd Normal Form
3NF DEFINITION:
EACH NON-KEY ELEMENT MUST BE DIRECTLY DEPENDENT UPON THE PRIMARY KEY AND NOT UPON ANY OTHER NON-KEY ATTRIBUTES
How do we do this? For each non-key attribute (i.e. not a primary, foreign or alternate key), test if it depends entirely on the primary key and nothing else. If it doesn’t, move it out to a new entity.
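The split below is a minimal sketch of these rules using Python's built-in sqlite3 module; the table and column names are illustrative, not taken from the DMBoK.

```python
# A minimal sketch of normalisation with Python's built-in sqlite3.
import sqlite3

con = sqlite3.connect(":memory:")

# Un-normalised: customer_name depends on customer_id, not on the order's
# primary key, so it violates 3NF (a transitive dependency).
con.execute("""CREATE TABLE order_flat (
    order_id      INTEGER PRIMARY KEY,
    customer_id   TEXT,
    customer_name TEXT,   -- depends only on customer_id
    product_code  TEXT,
    quantity      INTEGER)""")

# 3NF: move the customer facts out to their own entity and keep a
# foreign key in the order table.
con.execute("""CREATE TABLE customer (
    customer_id   TEXT PRIMARY KEY,
    customer_name TEXT)""")
con.execute("""CREATE TABLE order_3nf (
    order_id     INTEGER PRIMARY KEY,
    customer_id  TEXT REFERENCES customer(customer_id),
    product_code TEXT,
    quantity     INTEGER)""")
con.close()
```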
PRISM
DATABASE DESIGN PRINCIPLES
Performance and Ease of Use
Ensure quick and easy access to data
Reusability
Multiple applications can use the data
Integrity
The data should have valid business meaning and
value
Security
Data should only be available to authorised users
Maintainability
Ensure cost of maintenance does not exceed its
value to the organisation
Physical database design
best-practice
 Use normalised design for relational databases
supporting OLTP apps.
 Use views, functions and stored procedures to create non-normalised, application-specific, object-friendly, conceptual (virtual) views of data.
 Use standard naming conventions.
 Enforce data security and integrity at the database level, not
in the application.
 Keep database processing on the database server as much as
possible.
 Grant permissions on database objects only to application
groups or roles, not to individuals.
 Do not permit any direct, ad-hoc updating of the database.
Transforming from
a logical to physical data model
 Denormalisation
Selectively and justifiably violating normalisation rules to reduce retrieval time, potentially at the expense of additional space, insert/update time and reduced data quality.
 Surrogate keys
Substitute keys not visible to the business.
 Indexing
Create additional index files to optimise specific types of queries.
 Partitioning
Break a table or file horizontally or vertically.
 Views
Virtual tables used to simplify queries, control data access and rename columns.
 Dimensionality
Creation of fact tables with associated dimension tables, structured as star schemas and snowflake schemas for BI (see the sketch below).
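A sketch of two of the transformations above, again with sqlite3: a surrogate key (an INTEGER key invisible to the business) and a simple star schema with one fact table and two dimensions. All names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,  -- surrogate key
    calendar_date TEXT,
    fiscal_period TEXT);

CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_code TEXT,                  -- natural/business key kept as an attribute
    product_name TEXT);

-- Fact table: measures plus foreign keys to the dimensions (a star schema).
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL);
""")
con.close()
```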
Database index architecture

Non-clustered
The data is present in arbitrary order, but the logical ordering is specified
by the index. The non-clustered index tree contains the index keys in
sorted order, with the leaf level of the index containing the pointer to
the record.

Clustered
Clustering alters the data block into a distinct order to match the index,
resulting in the row data being stored in order. The primary feature of a
clustered index is the ordering of the physical data rows in accordance
with the index blocks that point to them.
 Cluster
Used when multiple tables are frequently joined. The records of the tables sharing the value of a cluster key are stored together in the same or nearby data blocks. This may improve joins of these tables on the cluster key, since the matching records are stored together and less I/O is required to locate them. A cluster can be keyed as a B-Tree index or hash table.
Types of indexes
Bitmap index
A bitmap index is a special kind of index that stores the bulk of its data as bit arrays. It works well for data such as gender (a small number of distinct values but many occurrences of those values).
Sparse index
A sparse index in databases is a file with
pairs of keys and pointers for every block
in the data file. Every key in this file is
associated with a particular pointer to the
block in the sorted data file.
Dense index
A file with keys and pointers for every
record in the data file. Every key in this file
is associated with a particular pointer to a
record in the sorted data file.
Reverse index
A reverse key index reverses the key value
before entering it in the index. E.g., the
value 24538 becomes 83542 in the index.
Reversing the key value is particularly
useful for indexing data such as sequence
numbers, where new key values
monotonically increase.
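A sketch of index creation with sqlite3. Note that SQLite offers B-tree indexes only; bitmap, sparse/dense and reverse-key indexes are vendor-specific features of other DBMSs. The table and index names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, surname TEXT, salary REAL)")
con.execute("CREATE INDEX ix_employee_surname ON employee (surname)")

# EXPLAIN QUERY PLAN shows whether the optimiser will use the index.
for row in con.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE surname = ?", ("Smith",)):
    print(row)  # ... SEARCH employee USING INDEX ix_employee_surname ...
con.close()
```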
Partitioning
Horizontal partitioning
Horizontal partitioning is the partitioning of a table into a number of smaller tables on the basis of rows. For example, in an employee table, employees with a salary of less than £25,000 will be partitioned into a different table.
Vertical partitioning
Vertical partitioning is dividing the table based on the
different columns. For example, in a customer table,
retrieving only the name and contact number of
customers into a different table.
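A sketch of both partitioning styles using plain tables (SQLite has no native partitioning; in Oracle or PostgreSQL this is dedicated DDL). The £25,000 salary split mirrors the example above; names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL);

-- Horizontal partitioning: same columns, rows split by a predicate.
CREATE TABLE employee_low  AS SELECT * FROM employee WHERE salary <  25000;
CREATE TABLE employee_high AS SELECT * FROM employee WHERE salary >= 25000;

-- Vertical partitioning: same rows, a narrower set of columns.
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT,
                       address TEXT, purchase_history TEXT);
CREATE TABLE customer_contact AS SELECT id, name, phone FROM customer;
""")
con.close()
```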
Hierarchical Data Models
A hierarchical database
model is a data model in
which the data is organized
into a tree-like structure.
The structure allows
representing information
using parent/child
relationships: each parent can
have many children, but each
child has only one parent.
Network Data Models
The network model is a
database model conceived as a
flexible way of representing
objects and their relationships.
Its distinguishing feature is that
the schema, viewed as a graph in
which object types are nodes
and relationship types are arcs, is
not restricted to being a
hierarchy or lattice.
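A sketch of the structural difference between the two models: a hierarchy permits one parent per child, while a network permits many relationships per node. Plain Python dicts; the data is illustrative.

```python
hierarchy = {          # child -> its single parent (a tree)
    "Sales Dept": "Company",
    "EMEA Sales": "Sales Dept",
}

network = {            # node -> many related nodes (a graph)
    "ORDER":   ["CUSTOMER", "PRODUCT"],
    "PRODUCT": ["SUPPLIER", "ORDER"],
}
```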
Prime, Class, Modifier, Qualifier Words
The following word classification types are used by various
data modelling tools and are defined below with examples.
Prime Word:
The prime word identifies the object or element
being defined. Typically, these objects represent a
person, place, thing, or event about which an
organization wishes to maintain information. Prime
words may act as primary search identifiers when
querying a database system and provide a basic list
of keywords for developing a general-to-specific
classification scheme based on business usages.
CUSTOMER in Customer Address is an example of a
prime word.
Modifier:
A modifier gives additional information about the
class word or prime word. Modifiers may be
adjectives or nouns. DELIVERY in Customer Delivery
Address is an example of a modifier. Other modifier
examples: ANNUAL, QUARTERLY, MOST, and LEAST.
Class Word:
A class word is the most important noun in a data
element name. Class words identify the use or
purpose of a data element. Class words designate
the type of information maintained about the object
(prime word) of the data element name. ADDRESS in
Customer Address is an example of a class word.
Qualifier:
A qualifier is a special kind of modifier that is used with a class word to further describe a characteristic of the class word within a domain of values, or to specify a type of information that can be attached to an object.
Examples: FEET, METERS, SECONDS, and WEEKS.
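A sketch of validating a data element name against the word classes above. The approved word lists and the naming rule (prime word first, class word last) are illustrative assumptions, not a DMBoK standard.

```python
PRIME_WORDS = {"CUSTOMER", "PRODUCT", "ORDER", "EMPLOYEE"}
CLASS_WORDS = {"ADDRESS", "CODE", "NAME", "DATE", "AMOUNT"}

def is_well_formed(element_name: str) -> bool:
    """True if the name starts with a prime word and ends with a class word."""
    words = element_name.upper().split()
    return bool(words) and words[0] in PRIME_WORDS and words[-1] in CLASS_WORDS

print(is_well_formed("Customer Delivery Address"))  # True (DELIVERY is a modifier)
print(is_well_formed("Delivery Customer"))          # False
```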
ACID Test For Transaction Processing
ATOMICITY, CONSISTENCY, ISOLATION, DURABILITY
ATOMICITY
Atomicity requires that database modifications
must follow an "all or nothing" rule. Each
transaction is said to be atomic. If one part of the
transaction fails, the entire transaction fails and
the database state is left unchanged.
To be compliant with the 'A', a system must
guarantee the atomicity in each and every
situation, including power failures / errors /
crashes.
This guarantees that 'an incomplete transaction'
cannot exist.
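A minimal sketch of atomicity using Python's built-in sqlite3; the account names, amounts and CHECK rule are illustrative. The transfer either commits in full or not at all.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))")
con.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
con.commit()

try:
    with con:  # one transaction: commits on success, rolls back on error
        con.execute("UPDATE account SET balance = balance - 150 WHERE name = 'A'")
        con.execute("UPDATE account SET balance = balance + 150 WHERE name = 'B'")
except sqlite3.IntegrityError:
    pass  # the CHECK failed, so the debit was rolled back too

print(con.execute("SELECT * FROM account").fetchall())  # [('A', 100.0), ('B', 0.0)]
```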
ACID Test For Transaction Processing
ATOMICITY, CONSISTENCY, ISOLATION, DURABILITY
CONSISTENCY
The consistency property ensures that any transaction the
database performs will take it from one consistent state to another.
Consistency states that only consistent (valid according to all the
rules defined) data will be written to the database.
Quite simply, whatever rows will be affected by the transaction will
remain consistent with each and every rule that is applied to them
(including but not only: constraints, cascades, triggers).
While this is extremely simple and clear, it's worth noting that this
consistency requirement applies to everything changed by the
transaction, without any limit (including triggers firing other triggers
launching cascades that eventually fire other triggers etc.) at all.
ACID Test For Transaction Processing
ATOMICITY, CONSISTENCY, ISOLATION, DURABILITY
ISOLATION
The requirement that no transaction should be able to interfere with another transaction at all.
In other words, it should not be possible for two transactions affecting the same rows to run concurrently, as the
outcome would be unpredictable and the system thus made unreliable.
This property of ACID is often relaxed (i.e. partly respected) because of the huge speed decrease this type of
concurrency management implies.
In effect the only strict way to respect the isolation property is to use a serial model
where no two transactions can occur on the same data at the same time and
where the result is predictable (i.e. transaction B will happen after
transaction A in every single possible case).
In reality, many alternatives are used due to speed concerns,
but none of them guarantee the same reliability.
ACID Test For Transaction Processing
ATOMICITY, CONSISTENCY, ISOLATION, DURABILITY
DURABILITY
Durability means that once a transaction has been
committed, it will remain so.
In other words, every committed transaction is
protected against power loss/crash/errors and cannot
be lost by the system and can thus be guaranteed to
be completed.
In a relational database, for instance, once a group of
SQL statements execute, the results need to be stored
permanently. If the database crashes right after a group
of SQL statements execute, it should be possible to
restore the database state to the point after the last
transaction committed.
BASE
These ACID qualities seem indispensable, and yet they are
incompatible with availability and performance in very large
systems.
For example, suppose you run an online book store and you
proudly display how many of each book you have in your
inventory.
Every time someone is in the process of buying a book, you
lock part of the database until they finish so that all visitors
around the world will see accurate inventory numbers.
That works well if you run The Shop Around the Corner but
not if you run Amazon.com.
BASE
Amazon might instead use cached data.
Users would not see the inventory count as of this second, but
what it was, say, an hour ago when the last snapshot was taken.
Also, Amazon might violate the “I” in ACID by tolerating a small
probability that simultaneous transactions could interfere with
each other.
For example, two customers might both believe that they just
purchased the last copy of a certain book. The company might
risk having to apologize to one of the two customers (and
maybe compensate them with a gift card) rather than slowing
down their site and irritating lots of other customers.
BASE
The CAP computer science theorem quantifies the inevitable trade-offs.
Eric Brewer’s CAP theorem: If you want consistency, availability, and
partition tolerance, you have to settle for two out of three. (For a
distributed system, partition tolerance means the system will continue
to work unless there is a total network failure. A few nodes can fail and
the system keeps going.)
An alternative to ACID is BASE:
BAsic Availability
Soft-state
Eventual consistency
Rather than requiring consistency after every transaction, it is enough
for the database to eventually be in a consistent state. (Accounting
systems do this all the time. It’s called “closing out the books.”) It’s OK
to use stale data, and it’s OK to give approximate answers.
Data Operations Management
DBA Responsibilities
 Ensuring the performance and reliability of the database, including performance tuning, monitoring and error reporting.
 Implementing appropriate backup and recovery mechanisms to guarantee the recoverability of the data in any circumstance.
 Implementing mechanisms for clustering and failover of the database, if continual data availability is a requirement.
 Implementing mechanisms for archiving data.
Factors affecting availability
 Manageability
The ability to create and maintain an effective
environment.
 Recoverability
The ability to re-establish service after interruption,
and correct errors caused by unforeseen events or
component failures.
 Reliability
The ability to deliver service at specified levels for a
stated period.
 Serviceability
The ability to determine the existence of problems,
diagnose their cause and repair/solve the problems.
Causes of poor database
performance
 Memory allocation (buffer/cache for data)
 Locking and blocking
 Failure to update database statistics
 Poor SQL coding
 Insufficient indexing
 Application activity
 Increase in the number, size or use of databases
 Database volatility
Data Technology Architecture
Data technologies to be included in the technology architecture include:
 DBMS software
 Relational database management utilities
 Data modelling and management software
 Business intelligence software for reporting and analysis
 Extract-Transform-Load (ETL) and other data integration tools
 Data quality analysis and data cleansing tools
 Meta-data management software, including meta-data repositories
Technology Architecture
Components - “Bricks”
Current
Products currently supported and used.
Preferred
Products preferred for use by most
applications.
Deployment Period
Products to be deployed for use in the next 1-2 years.
Containment
Products limited to use by certain applications.
Strategic Period
Products expected to be available for use in the next 2+ years.
Emerging
Products being researched and piloted for possible future deployment.
Retirement
Products the organisation has retired or intends to retire this year.
Data Security Management
Data Security Guiding Principles
 Be a responsible trustee of data about all parties
 Understand and comply with all pertinent regulations and guidelines
 Ensure the Data Security Policy is reviewed and approved by the governance council
 Identify detailed application security requirements on projects
 Classify all enterprise data and information products for confidentiality
 Set passwords following a set of password complexity guidelines
 Create role groups
 Use CRUD matrices to help map data access needs
 Formally request, track and approve all user and group authorisations
 Centrally manage user identity data and group membership data
 Use views to restrict access to sensitive columns or specific rows (see the sketch below)
 Strictly limit and consider every use of shared or service user accounts
 Monitor data access activity to understand trends
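A sketch of the "use views to restrict access" principle with sqlite3. In a server DBMS you would then GRANT on the view to a role group, not to individuals; SQLite has no GRANT, so that step is shown only as a comment. Names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee (
    id          INTEGER PRIMARY KEY,
    name        TEXT,
    salary      REAL,   -- sensitive
    national_id TEXT);  -- sensitive
CREATE VIEW employee_public AS SELECT id, name FROM employee;
-- e.g. GRANT SELECT ON employee_public TO role_hr_readonly;
""")
print([col[1] for col in con.execute("PRAGMA table_info(employee_public)")])
# ['id', 'name'] -- the sensitive columns are not exposed
con.close()
```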
Sources of Data Security Requirements
STAKEHOLDER CONCERNS
• Privacy and confidentiality of clients’ information
• Trade secrets
• Business partner activity
• Mergers & acquisitions
GOVERNMENT REGULATIONS
• Regulations may restrict access to information
• Acts to ensure openness and accountability
• Provision of subject access rights
• And more …
NECESSARY BUSINESS ACCESS NEEDS
• Data security must be appropriate
• Data security must not be too onerous to prevent users from doing their jobs
• Goldilocks principle
LEGITIMATE BUSINESS CONCERNS
• Trade secrets
• Research & other IP
• Knowledge of customer needs
• Business partner relationships and impending deals
Source: DMBoK
A4
AUTHENTICATION: Validate users are who they say they are.
AUTHORISATION: Identify the right individuals and grant them the right privileges to specific, appropriate views of data.
ACCESS: Enable individuals and their privileges in a timely manner.
AUDIT: Review security actions and user activity (to ensure compliance with regulations and conformance with policy and standards).
CIA
CONFIDENTIALITY
Preventing the disclosure of information to unauthorised
individuals or systems.
INTEGRITY
Preventing the undetectable modification of information.
AVAILABILITY
Ensuring that information is available where and when it is
needed.
4 issue types:
THREAT
An aspect (environmental or man-made) that has the potential to compromise the confidentiality, integrity or availability of an information asset.
VULNERABILITY
A weakness that could be exploited to compromise the confidentiality, integrity or availability of an information asset.
RISK
The likelihood that a threat will exploit a vulnerability to compromise the confidentiality, integrity or availability of an information asset.
IMPACT
A loss of confidentiality, integrity or availability which may result in more significant losses to competitive advantage, revenue, life, property or reputation.
Source: DMBoK
Exercise
A4 = ?
CIA = ?
4 Issue Types = ?
Network Security
Network Security Threats:
 Viruses, worms, and Trojan horses
 Spyware and adware
 Zero-day attacks, also called zero-hour attacks
 Hacker attacks
 Denial of service attacks
 Data interception and theft
 Identity theft
Network Security Components:
 Anti-virus and anti-spyware
 Firewall, to block unauthorized access to your network
 Intrusion prevention systems (IPS), to identify fast-spreading threats,
such as zero-day or zero-hour attacks
 Virtual Private Networks (VPNs), to provide secure remote access
Securing IT Infrastructure
Encryption
The process of transforming information
using an algorithm (called a cipher) to make
it unreadable to anyone except those
possessing special knowledge, usually
referred to as a key.
Network Encryption
A network security process that applies
crypto services at the network transfer layer
- above the data link level, but below the
application level.
Email Encryption
The application of crypto services to protect the content of email messages so that they can be read only by the intended recipients.
 S/MIME – a form of encryption that is included in several email clients by default (such as Outlook Express and Mozilla Thunderbird) and relies on the use of a Certificate Authority to issue a secure email certificate.
 PGP (the commercial version, where OpenPGP is a free, open source equivalent) takes a de-centralised approach to email encryption. It does not rely on trusting a Certificate Authority; rather, the users create encryption keys themselves.
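A sketch of symmetric encryption using the third-party "cryptography" package (pip install cryptography), rather than one of the email standards above. Fernet bundles AES encryption with an integrity check; secure key storage and distribution remain the hard part in practice.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # must itself be stored and shared securely
cipher = Fernet(key)

token = cipher.encrypt(b"customer record: Bob Smith, BA1 7LA")
print(token)                  # unreadable without the key
print(cipher.decrypt(token))  # b'customer record: Bob Smith, BA1 7LA'
```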
IT Security Threats
 Privilege Escalation
Software programs often have bugs that
can be exploited. These bugs can be
used to gain access to certain resources
with higher privileges that can bypass
security controls.
 Worm
A worm is a specific type of virus. Unlike a typical virus, its goal isn’t to alter system files, but to replicate so many times that it consumes hard disk space or memory.
 Virus
A virus is a computer program that, like
a medical virus, has the ability to
replicate and infect other computers.
 Spyware
Like Trojans, spyware can pilfer
sensitive information, but are often
used as advertising tools as well. The
intent is to gather a user’s information
by monitoring Internet activity and
transmitting that to an attacker.
 Trojan
They masquerade as normal, safe
applications, but their mission is to allow
a hacker remote access to your
computer. In turn, the infected
computer can be used as part of a denial
of service attack and data theft can
occur (e.g. keystroke logger).
 Spam
Spam is unsolicited junk mail. It comes in the form of an advertisement and, in addition to being a time waster, has the ability to consume precious network bandwidth.
IT Security Threats
 Botnets
Botnets are created with a Trojan and reside on IRC networks. The bot can launch an IRC client and join a chat room in order to spam and launch denial of service attacks.

Logic bomb
They are bits of code added to software that will set off a specific
function. Logic bombs are similar to viruses in that they can perform
malicious actions like deleting files and corrupting data.

Adware
Similar to spyware, adware observes a user’s Internet browsing habits.
But the purpose is to be able to better target the display of web
advertisements.
 Rootkits
Rootkits are some of the most difficult threats to detect. They are activated when your system boots up, before anti-virus software is started. Rootkits allow the installation of files and accounts, for the purposes of intercepting sensitive information.
Reference & Master Data Management
Reference and Master Data
Reference Data
Is used to classify or categorise other data. For example:
Code Value | Description
US | United States of America
GB | United Kingdom
Master Data
Is the authoritative, most accurate data available about key business entities, used to establish the context for transactional data. Master data values are considered ‘golden’.
What is Event / Transaction Data?
Event data example:
“Bob bought a Mars bar from Morrison's on Monday 3rd Jan at 4pm and paid using cash.”
WHO: Bob Smith | WHAT: Twix bar | WHERE: Morrison's, Bath | WHEN: 16:00 Monday 3rd January 2011 | HOW: Cash | QUANTITY: 1 | AMOUNT: £0.60
CUSTOMER CODE: BS005 | PRODUCT CODE: CONF101 | VENDOR CODE: WMBATH | DATE: 2011-01-03 16:00 | PAYMENT METHOD: CASH | QUANTITY: 1 | AMOUNT: £0.60
Terminology
FIELD (or attribute): column in a database
table
RECORD: row in a database table
About Event Data
 AKA Transaction data
 Describes an action (a verb), e.g. “buy”
 Includes information identifying the nouns that were involved in the event (the Who / What / Where / When / How and maybe even the Why):
› Bob Smith
› Twix bar
› Morrisons, Bath
› 16:00 Monday 3rd Jan 2011
› Cash
 May include measurements about the action:
› Quantity bought
› Amount paid
 Does not include information describing the nouns:
› Bob is female, aged 25 and works for British Airways
› Monday 3rd Jan 2011 is a bank holiday
› The address of Morrisons Bath is: York Place, London Road, Bath, BA1 6AE
› That Twix is a special offer 200g jumbo bar
What is…
MASTER DATA?
› Defines and describes the nouns (things) of the business, e.g. Field, Well, Rig, Product, Store, Therapeutic Area, Adverse Event, etc.
› Data about the “things” that will participate in events.
› Provides contextual information about events / transactions.
› Stored in many systems:
› Packaged Systems
› Line of Business Systems
› Spreadsheets
› SharePoint Lists
MASTER DATA MANAGEMENT (MDM)?
› The ongoing reconciliation and maintenance of master data.
› Comprises a set of processes and tools that consistently defines and manages the non-transactional data entities of an organisation. [Wikipedia]
› Control over master data values to enable consistent, shared, contextual use across systems, of the most accurate, timely, and relevant version of truth about essential business entities. [DAMA, the Data Management Association]
Master Data – What’s the problem?
No organisation has just one system (unless they are tiny). Details about the same noun are found in multiple systems, e.g. Customer, Product. The same customers may be defined in:
• Finance systems
• Marketing systems
• Line of business systems
Problems
 Data may need to be rekeyed in each system
 Systems may not be in synch (new records, updated records)
 Duplicate data: are “ABC Ltd” and “ABC Limited” the same thing?
 No single version of the truth
 Reporting / Analysis: difficult to combine data from multiple systems
SOLUTION: Master Data Management!
Standard “Hub” architectures
1. REPOSITORY
2. REGISTRY
3. HYBRID
4. VIRTUALISED
*A key difference is the
number of fields that are
stored centrally
Example: PERSON
Customer code: BS005 | First name: Bob | Last name: Smith | Date of birth: 1985-12-25 | Preferred delivery address line 1: Royal Crescent | Preferred delivery address post code: BA1 7LA | Credit rating: A | Occupation: Information Architect | Car: Audi R8
Fields stored centrally by each hub style (identifiers ⊂ core fields ⊂ all fields):
Repository: ALL FIELDS
Hybrid: CORE FIELDS
Registry: IDENTIFIERS
Virtualised: NONE
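A sketch of what each hub style stores centrally for PERSON, following the identifiers / core fields / all fields split above. Which fields count as "core" is an illustrative assumption.

```python
source_record = {
    "customer_code": "BS005", "first_name": "Bob", "last_name": "Smith",
    "date_of_birth": "1985-12-25", "credit_rating": "A", "car": "Audi R8",
}

IDENTIFIERS = {"customer_code"}
CORE_FIELDS = IDENTIFIERS | {"first_name", "last_name", "date_of_birth"}

registry    = {k: v for k, v in source_record.items() if k in IDENTIFIERS}
hybrid      = {k: v for k, v in source_record.items() if k in CORE_FIELDS}
repository  = dict(source_record)  # all fields held centrally
virtualised = {}                   # nothing persisted; fetched on demand
```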
Master Data Examples
 Party Master Data
Includes data about individuals, organizations and the roles they play in business relationships (e.g. customers, citizens, patients, vendors, suppliers, business partners, competitors, employees, students etc.).
 Financial Master Data
Includes data about business units, cost centers, profit centers, general ledger accounts, budgets, projections and projects.
 Product Master Data
Focuses on an organization's internal products or services. May include bill-of-materials, manuals, design documents, SOPs etc. (can be unstructured data).

Location Master Data
Includes data about business party addresses and geographic
positioning coordinates, such as latitude, longitude and altitude.
Master Data Match Rules
 Rules around the matching, merging and linking of data from multiple systems about the same person, group, place or thing.
Three primary scenarios:
1. Duplicate identification match rules
Focus on a specific set of fields that uniquely identify an entity and identify merge opportunities without taking automatic action. Business data stewards can review these occurrences and decide to take action on a case-by-case basis.
2. Match-merge rules
Match records and merge the data from these records into a single, unified, reconciled and comprehensive record. If the rules apply across data sources, create a single unique and comprehensive record in each database.
3. Match-link rules
Identify and cross-reference records that appear to relate to a master record without updating the content of the cross-referenced record. Match-link rules are easier to implement and much easier to reverse.
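A sketch of a duplicate-identification match rule: normalise the name, score similarity with the stdlib difflib, and flag candidates for a data steward to review (no automatic merge). The normalisation rules and the 0.9 threshold are illustrative assumptions.

```python
import difflib
import re

def normalise(name: str) -> str:
    name = name.lower().replace("limited", "ltd")
    return re.sub(r"[^a-z0-9 ]", "", name).strip()

def match_score(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()

for a, b in [("ABC Ltd", "ABC Limited"), ("ABC Ltd", "XYZ plc")]:
    if match_score(a, b) > 0.9:
        print(f"possible duplicate for steward review: {a!r} / {b!r}")
```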
Guiding Principles
 Shared reference and master
data belong to the
organisation, not to a
particular application or
department.
 Reference and master data
management is an on-going
data quality improvement
program; its goals cannot be
achieved by one project alone.
 Golden data values represent the
organisation’s best efforts at
determining the most accurate,
current and relevant data values for
contextual use. New data may prove
earlier assumptions to be false.
Therefore apply matching rules with
caution and ensure that any changes
that are made are reversible.
 Replicate master data values only
from the database of record.
 Business data stewards are the authorities accountable for controlling reference data values. Business data stewards work with data professionals to improve the quality of reference and master data.
 Request, communicate, and, in some cases, approve changes to reference data values before implementation.
DW & BI Management
Why Use A Data Warehouse?
Legacy Applications + Databases = Chaos
[Diagram: siloed legacy systems – Production Control, MRP, Inventory Control, Parts Management, Logistics, Shipping, Raw Goods, Order Control, Purchasing]
Enterprise Data Warehouse = Order
[Diagram: a single Enterprise Data Warehouse providing a single version of the truth and management reporting to Finance, Marketing, Sales, Accounting, Engineering, Actuarial and Human Resources – delivering Continuity, Consolidation, Control, Compliance and Collaboration]
Every question = decision
Two purposes of a data warehouse:
1) Save time building reports
2) Report & analyse in ways you could not do before
Simplified Business Intelligence Stack
REPORTING & ANALYSIS TOOLS – standard/ad-hoc reports, analytics, data mining, dashboards, scorecards…
DATA WAREHOUSE – dimensional data model (star schema) or virtual data warehouse
DATA INTEGRATION LAYER – e.g. Extract, Transform & Load (ETL) or Enterprise Information Integration (EII)
DATA SOURCES – operational systems, legacy databases, ERP/CRM, text files, spreadsheets…
What is Data Warehousing? (DMBoK)
Data Warehousing is the term used to describe the processes that maintain the data contained within a data warehouse, namely:
 Extract processes
 Cleansing processes
 Transformation processes
 Load processes
 Associated Control processes
 The use of Meta-data
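A toy end-to-end pass over those processes might look like the following Python sketch; the CSV source, the cleansing rule and the in-memory “warehouse” are invented for illustration.

```python
# A minimal, illustrative Extract-Transform-Load pass. The CSV source, the
# cleansing rule and the in-memory "warehouse" are invented for illustration.
import csv
import io

source = io.StringIO("customer,amount\nABC Ltd,100\nABC Limited,250\n")

def extract(file_handle):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(file_handle))

def transform(rows):
    """Cleanse and transform: standardise names, cast types."""
    for row in rows:
        yield {
            "customer": row["customer"].strip().lower().replace("limited", "ltd"),
            "amount": float(row["amount"]),
        }

def load(rows, warehouse):
    """Load: append the conformed rows to the target store."""
    warehouse.extend(rows)

warehouse = []
load(transform(extract(source)), warehouse)
print(warehouse)
# [{'customer': 'abc ltd', 'amount': 100.0}, {'customer': 'abc ltd', 'amount': 250.0}]
```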
What is a Data Warehouse? (2)
An Integrated Decision Support Database, and related software programs:
• CDC – Change Data Capture
• ETL – Extract, Transform & Load
• DQ – Data Quality
• DV – Data Virtualisation
[Diagram: in the BI stack, the DAMA definition covers the Data Warehouse and Data Integration Layer, between the data sources below and the Reporting & Analysis Tools above]
What is Business Intelligence? (DMBoK)
Business Intelligence (BI) is a set of business capabilities.
BI can mean any of the following:
 Query, analysis and reporting by knowledge workers
 Query, analysis and reporting processes and procedures
 A synonym for the business intelligence environment
 The market segment for business intelligence tools
 Strategic and operational analytics and reporting on corporate operational data to support business decisions, risk management and compliance
 A synonym for Decision Support Systems (DSS)
NARROWER DEFINITION:
› Analysis, Query and Reporting – the Reporting & Analysis Tools layer at the top of the BI stack
BROAD DEFINITION:
› “Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making.” [Forrester Research]
What is Data Warehousing and Business
Intelligence Management (DW-BIM)? (DMBoK)
Data Warehousing and Business Intelligence Management (DW-BIM)
is the collection, integration and presentation of data to knowledge
workers for the purpose of business analysis and decision-making.
 DW-BIM is composed of activities supporting all phases of the decision support lifecycle that provides context, moves and transforms data from sources to a common
target data store, and then provides knowledge workers various means of access,
manipulation and reporting of the integrated target data.
Objectives of DW-BIM include…
Integrated data
 From disparate sources
 Historical and current
Ensuring credible, accurate, timely data is used in reports and BI
applications
Ensuring high-performance data access for reports and BI
applications
Making best use of the outputs of the Reference and Master
Data Management, Data Governance, Data Quality and Metadata disciplines
A Dimensional Model
Dimension tables
 Examples: Location, Product, Time, Promotion,
Organisation etc.
 Records in the dimension tables correspond to nouns.
 The data in the dimension tables changes slowly – the
number of new records created each day is typically
low.
 Fact tables
 Contains measures (e.g. Sales Value GBP)
and dimension columns
 Records in the fact tables correspond to
events, transactions, or measurements.
 The number of new records created each day
is typically high.
Dimension tables
 A dimension table is one of a set of companion tables to a fact table,
forming a vertex of the “star”
 Each dimension table represents a particular business entity – records
represent nouns within the business
Products, Customers, Times, Locations etc.
 Each dimension table contains a single field that serves as its primary key
 Each dimension table also contains a number of fields providing details of
the entity – each of these fields is known as an attribute (or dimension)
Dimension tables and Hierarchies
Hierarchies for the dimensions are stored in
the dimensional table itself.
 E.g. Product dimension has the hierarchies
from Manufacturer, Brand and Product Type
to Product.
There is no need for the individual
hierarchical lookup tables like Manufacturer
lookup, Brand lookup, Product Type lookup
to be shown in the model.
Dimension tables (summary)
1. Records in dimension tables correspond to nouns
 Tables are “short” – 10s to 1,000s of records
2. Data changes slowly
3. Rich set of attributes
 Tables are “wide” – many columns
4. Denormalised
 No need to join to further lookup tables
 Lots of redundancy
Fact tables
 Facts are used to store numerical measurements captured in a ‘measurement event’ caused by a business process.
 A fact table is the primary table in each dimensional model, forming the centre of the “star”.
 Each fact table represents a many-to-many relationship between dimensions.
 Each fact table has a compound primary key consisting of two or more foreign keys to dimension tables.
 A fact table may additionally contain fields that are used to record the value of a business measure, e.g. Sales Value in GBP – each of these fields is known as a measure (or fact).
 The most useful measures are numeric and additive.
‘Additive’ means that it is meaningful to sum the values over multiple records. Cost and Revenue are examples of additive facts.
Fact tables (summary)
 Records in fact tables correspond to events, transactions, or measurements.
 Data is added regularly
› Tables are “long” – often millions of records
 Minimal set of attributes
› Tables are “narrow” – few columns
 Low redundancy
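Putting the two table types together, here is an illustrative star schema in miniature (plain Python, with invented names and values): short, wide, denormalised dimensions keyed by surrogate keys, and a long, narrow fact table of foreign keys plus additive measures.

```python
# An illustrative star schema in miniature; every name and value is invented.
# Dimension tables: "short and wide", one surrogate key each, denormalised
# hierarchy (brand / product type held on the product row itself).
dim_product = {
    1: {"name": "Widget", "brand": "Acme", "product_type": "Gadget"},
    2: {"name": "Sprocket", "brand": "Acme", "product_type": "Part"},
}
dim_date = {
    20160401: {"day": 1, "month": 4, "year": 2016},
}

# Fact table: "long and narrow" - a compound key of dimension foreign keys
# plus additive measures.
fact_sales = [
    {"product_key": 1, "date_key": 20160401, "sales_value_gbp": 9.99},
    {"product_key": 2, "date_key": 20160401, "sales_value_gbp": 4.99},
]

# Because the measure is additive, it can be summed across fact records and
# sliced by any dimension attribute (here: brand).
total_by_brand = {}
for fact in fact_sales:
    brand = dim_product[fact["product_key"]]["brand"]
    total_by_brand[brand] = total_by_brand.get(brand, 0.0) + fact["sales_value_gbp"]
print(total_by_brand)  # {'Acme': 14.98}
```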
What are slowly changing
dimensions?
Dimensions whose values change
infrequently as a result of UPDATE
operations in the source system
 For example
› A product may be renamed
› A product may be reclassified (i.e. the “product type”
may change)
› A supplier may change address
› A person might change their name
› Etc., etc.
 In fact most dimensions will
change slowly over time!
Why do slowly changing dimensions present problems?
 The Data Warehouse will need to be updated to reflect the changes made in the source system.
› …so there’s some ETL work to be done.
 If we just overwrite the details with the new details, we’ll effectively change the history stored in the Data Warehouse.
› When we re-run reports against historical data, they’ll no longer return the same results as before.
How can we handle slowly changing dimensions?
There are standard techniques for handling slowly changing dimensions.
1. Type 1 (overwrite)
2. Type 2 (add new row)
3. Type 3 (add new attribute)
4. Type 4 (add history table)
5. Type 6 (hybrid)
6. Others – see the internet!
 We may need to employ different techniques for different fields.
Type 1 – Overwrite
Overwrite the dimension record with the new values, thereby losing history.
› Used when correcting an error, for instance.
Type 2 – Create new record
Create a new additional dimension record using a new value of the surrogate key (NOTE: a surrogate key is required!)
› Used when a true change has occurred and it is appropriate to partition history.
› Historic FACT records can continue to point to the “old” dimension record while new FACT records will point to the “new” dimension record.
Type 3 – Use an “old” field
Create an “old” field in the dimension record to store the immediate previous value of the attribute.
› Used when the change is “soft” or tentative, or when we wish to track history based on the old value as well as the new (e.g. change of sales boundaries).
› Supports analysis by either of two versions.
› Works best when there is only one soft change at a time.
Slowly Changing Dimensions – Summary
Three most common techniques:
1. Type 1 – Overwrite
2. Type 2 – Keep all old versions in separate records
3. Type 3 – Keep the latest old version in an “old” field
Different techniques may be used for different fields.
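For example, a Type 2 change might be applied as in the following Python sketch; the column names, validity flag and surrogate-key handling are illustrative assumptions, not a prescribed design.

```python
# A minimal Type 2 sketch. The column names, validity flag and surrogate-key
# handling are illustrative assumptions, not a prescribed design.
from datetime import date

dim_customer = [
    {"sk": 101, "customer_code": "BS005", "occupation": "Information Architect",
     "is_current": True, "valid_from": date(2010, 1, 1)},
]
next_sk = 102

def scd_type2_update(dim, natural_key, changes, effective):
    """Expire the current row and insert a new version, partitioning history."""
    global next_sk
    for row in dim:
        if row["customer_code"] == natural_key and row["is_current"]:
            row["is_current"] = False     # historic facts keep pointing at sk 101
            new_row = {**row, **changes, "sk": next_sk,
                       "is_current": True, "valid_from": effective}
            dim.append(new_row)           # new facts will point at sk 102
            next_sk += 1
            return new_row

scd_type2_update(dim_customer, "BS005",
                 {"occupation": "Racing Driver"}, date(2016, 4, 1))
for row in dim_customer:
    print(row["sk"], row["occupation"], row["is_current"])
# 101 Information Architect False
# 102 Racing Driver True
```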
Document & Content Management
8. Document & Content Management
Definition: Planning, implementation, and control activities to store, protect, and access data found
within electronic files and physical records (including text, graphics, images, audio, and video).
Goals:
1. To safeguard and ensure the availability of data assets stored in less structured formats.
2. To enable effective and efficient retrieval and use of data and information in unstructured formats.
3. To comply with legal obligations and customer expectations.
4. To ensure business continuity through retention, recovery, and conversion.
5. To control document storage operating costs.
Inputs:
• Text Documents
• Reports
• Spreadsheets
• Email
• Instant Messages
• Faxes
• Voicemail
• Images
• Video recordings
• Audio recordings
• Printed paper files
• Microfiche
• Graphics
Suppliers:
• Employees
• External parties
Activities:
1. Document / Records Management
 1. Plan for Managing Documents / Records (P)
 2. Implement Document / Records Management Systems for Acquisition, Storage, Access, and Security Controls (O, C)
 3. Backup and Recover Documents / Records (O)
 4. Retain and Dispose of Documents / Records (O)
 5. Audit Document / Records Management (C)
2. Content Management
 1. Define and Maintain Enterprise Taxonomies (P)
 2. Document / Index Information Content Meta-data (O)
 3. Provide Content Access and Retrieval (O)
 4. Govern for Quality Content (C)
Participants:
• All Employees
• Data Stewards
• DM Professionals
• Records Management Staff
• Other IT Professionals
• Data Management Executive
• Other IT Managers
• Chief Information Officer
• Chief Knowledge Officer
Tools:
• Stored Documents
• Office Productivity Tools
• Image and Workflow
Management Tools
• Records Management Tools
• XML Development Tools
• Collaboration Tools
• Internet
• Email Systems
Primary Deliverables:
• Managed records in many
media formats
• E-discovery records
• Outgoing letters and emails
• Contracts and financial
documents
• Policies and procedures
• Audit trails and logs
• Meeting minutes
• Formal reports
• Significant memoranda
Consumers:
• Business and IT users
• Government regulatory agencies
• Senior management
• External customers
Metrics:
• Return on investment
• Key Performance Indicators
• Balanced Scorecards
Activities: (P) – Planning (C) – Control (D) – Development (O) - Operational
Terms
 Document Management
The storage, inventory and control of electronic and
paper documents.
 Content Management
The organisation, categorisation, and structure of data
/ resources so that they can be stored, published and
reused in multiple ways.
 Taxonomy
The science or technique of classification.
 Ontology
A type of model that represents a set of concepts and
their relationships within a domain.
Main Activities
Document & Records Management
• Document / Record Management is the lifecycle management of the designated significant documents of the organization.
• Not all documents are significant as evidence of the organization’s business activities and regulatory compliance.
• Records management manages paper and microfiche / film records from their creation or receipt through processing, distribution, organization, and retrieval, to their ultimate disposition.
Content Management
• Content management is the organization, categorization, and structure of data / resources to be stored, published, and reused in multiple ways.
• Content includes data / information that exists in many forms and in multiple stages of completion within its lifecycle. Content may be found on electronic, paper or other media.
• The lifecycle of content can be active, with daily changes through controlled processes for creation, modification, and collaboration of content before dissemination.
Document/Record Management Lifecycle
1. Identification
2. Creation, approval and enforcement of policies
3. Classification of documents / records
4. Storage
5. Retrieval and circulation
6. Preservation and disposal
Taxonomies
 Grouped into four types:
1. Flat Taxonomy – no relationship among the controlled set of categories (example: list of countries).
2. Facet Taxonomy – for example meta-data, where each attribute (creator, title, keywords etc.) is a facet of a content object.
3. Hierarchical Taxonomy – for example geography, from continent down to address.
4. Network Taxonomy – for example a recommender engine (if you liked that, you may also like this…).
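As a small illustration of the hierarchical type, the Python sketch below models a geography taxonomy as a nested dictionary and finds the classification path for a term; the categories are invented.

```python
# A minimal hierarchical taxonomy sketch (type 3 above), assuming a simple
# nested-dict tree for geography; the categories are invented for illustration.
geo_taxonomy = {
    "Europe": {
        "United Kingdom": {
            "Bath": {"Royal Crescent": {}},
        },
        "Ireland": {
            "Dublin": {},
        },
    },
}

def classify_path(tree, term, path=()):
    """Return the path from the root down to a category, if present."""
    for node, children in tree.items():
        if node == term:
            return path + (node,)
        found = classify_path(children, term, path + (node,))
        if found:
            return found
    return None

print(classify_path(geo_taxonomy, "Bath"))
# ('Europe', 'United Kingdom', 'Bath')
```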
Meta-data Management
9. Meta-data Management
Definition: Planning, implementation, and control activities to enable easy access to high quality, integrated meta-data.
Goals:
1. Provide organizational understanding of terms and usage
2. Integrate meta-data from diverse sources
3. Provide easy, integrated access to meta-data
4. Ensure meta-data quality and security
Inputs:
• Meta-data
Requirements
• Meta-data Issues
• Data Architecture
• Business Meta-data
• Technical Meta-data
• Process Meta-data
• Operational Meta-data
• Data Stewardship
Meta-data
Suppliers:
• Data Stewards
• Data Architects
• Data Modelers
• Database
Administrators
• Other Data
Professionals
• Data Brokers
• Government and
Industry Regulators
Activities:
1. Understand Meta-data Requirements (P)
2. Define the Meta-data Architecture (P)
3. Develop and Maintain Meta-data Standards (P)
4. Implement a Managed Meta-data Environment (D)
5. Create and Maintain Meta-data (O)
6. Integrate Meta-data (C)
7. Manage Meta-data Repositories (C)
8. Distribute and Deliver Meta-data (C)
9. Query, Report, and Analyze Meta-data (O)
Participants:
• Meta-data Specialist
• Data Integration
Architects
• Data Stewards
• Data Architects and
Modelers
• Database Administrators
• Other DM Professionals
• Other IT Professionals
• DM Executive
• Business Users
Tools:
• Meta-data Repositories
• Data Modeling Tools
• Database Management
Systems
• Data Integration Tools
• Business Intelligence Tools
• System Management Tools
• Object Modeling Tools
• Process Modeling Tools
• Report Generating Tools
• Data Quality Tools
• Data Development and
Administration Tools
• Reference and Master Data
Management Tools
Primary Deliverables:
• Meta-data Repositories
• Quality Meta-data
• Meta-data Models and
Architecture
• Meta-data Management
Operational Analysis
• Meta-data Analysis
• Data Lineage
• Change Impact Analysis
• Meta-data Control Procedures
Consumers:
• Data Stewards
• Data Professionals
• Other IT Professionals
• Knowledge Workers
• Managers and Executives
• Customers and Collaborators
• Business Users
Metrics:
• Meta Data Quality
• Master Data Service Data
Compliance
• Meta-data Repository Contribution
• Meta-data Documentation Quality
• Steward Representation /
Coverage
• Meta-data Usage / Reference
• Meta-data Management Maturity
• Meta-data Repository Availability
Activities: (P) – Planning (C) – Control (D) – Development (O) - Operational
Where do you encounter metadata every day?
[Slide images: everyday items with their DATA and METADATA parts labelled]
Where else do you use metadata every day?
Exercise: Where do YOU encounter metadata every day?
Types of Meta-data
 Business meta-data
Relates the business perspective to the meta-data user (e.g. business data definitions, regulatory or contractual constraints, data quality statements).
 Technical and Operational meta-data
Targeted at IT operations users’ needs (e.g. data archiving and retention rules, audit rules, recovery and backup rules).
 Process meta-data
Other system elements (e.g. data stores involved, process name, roles and responsibilities).
 Data Stewardship meta-data
Data about stewards and stewardship processes (e.g. Data Owners, Data Subject Areas, Data Users, Data Stewards).
Meta-data Architecture
 Centralised Meta-data Architecture
Centralised architecture consists of a single meta-data
repository that contains copies of live meta-data from various
sources
 Distributed Meta-data Architecture
A single access point. The meta-data retrieval engine responds
to user requests by retrieving data from source systems in real
time; there is no persistent repository.
 Hybrid Meta-data Architecture
A combined alternative. Meta-data still moves directly from
the source systems into the repository, however, repository
design only accounts for the user-added meta-data, the critical
standardised items and the additions from manual sources.
Industry Meta-data Standards
 OMG (Common Warehouse Metamodel (CWM), Information Management Metamodel (IMM), MDC Open Information Model (OIM), XML, UML, SQL)
 World Wide Web Consortium (W3C): RDF (Resource Description Framework)
 Dublin Core: Dublin Core Meta-data Initiative (DCMI)
 Distributed Management Task Force (DMTF): Web-based Enterprise Management (WBEM)
 Meta-data standards for unstructured data
Data Quality Management
Data Quality Management Cycle
 The Data Management Body of Knowledge identifies 4 key activities necessary for operationalising DQM, arranged as a Deming cycle (continuous improvement):
• Plan – planning for the assessment of the current state and identification of key metrics for measuring data quality
• Do – deploying processes for measuring and improving the quality of data
• Check – monitoring and measuring the levels in relation to the defined business expectations
• Act – acting to resolve any identified issues to improve data quality and better meet business expectations
What is Data Quality Management?
› Poor Data Quality Management does not equate to
poor data quality
› But when you don’t have good Data Quality
Management…
» The current level of data quality will be unknown
» Maintaining a sufficient level of data quality will be a result
of ‘winging it’ and the sheer persistence of talent
» The risk to the business will increase
› It is infinitely more sensible to ensure good data
quality by having good
management through
a coherent set of
policies, standards,
processes and
supporting technology
“Ultimately, poor data quality is
like dirt on the windshield. You
may be able to drive for a long
time with slowly degrading vision,
but at some point you either have
to stop and clear the windshield
or risk everything”
Ken Orr, The Cutter Consortium
“Data errors can cost a company
millions of dollars, alienate
customers, suppliers and business
partners, and make implementing
new strategies difficult or even
impossible.
The very existence of an
organisation can be threatened by
poor data”
Joe Peppard – European School of
Management and Technology
So How Good Does Data Quality Need To Be?
Answer: It depends…
In February 2011, the UK government launched a crime-mapping website for England and Wales (www.police.uk). Unfortunately, for a number of reasons, the postcode allocated to a specific police incident didn’t always correspond to the precise location of the crime. The net result was that poor accuracy in the recording of geographical information led many quiet residential streets to be incorrectly identified as crime hotspots.
Data fit for purpose: in the context of creating aggregated statistics to assess relative crime rates between counties, the data quality is perfectly acceptable.
Data not fit for purpose: if the same data is used by an insurance company, there is an issue for the homeowners who receive inflated home insurance premiums.
Data quality can only be considered within the context of the intended use of the data. Data needs to be “fit for purpose”, and data quality needs to be assessed on that basis.
Benefit and Impact
Good data quality – benefit:
• Adherence to corporate & regulatory acts
• Improved confidence in data
• Reduced “busy work” in data archaeology
• Enriched customer satisfaction
• Better decision making
• Effective marketing and advertising
• Cost efficiencies
• Improved operational efficiency & streamlining
Poor data quality – impact:
• Ineffectual advertising & marketing
• Reputational damage
• Diminished regulatory compliance
• Decrease in customer satisfaction
• Uneconomical business processes
• Compromised health, safety & security
• Erratic business intelligence
• Amplified corporate risk
• Impaired business agility
What can & can’t be achieved with DQ?
Can:
• Make order from chaos
• Drive business accountability for enterprise data
• Keep track of data assets: where they’re stored, who’s got access, and how often they are cleansed and checked
• Ensure data quality processes are established
Can’t:
• Be solely responsible for managing data
• Perform miracles to create “data perfection”
• Magically fix all historic data quality issues
Dimensions of Data Quality
Six dimensions: Completeness, Uniqueness, Timeliness, Validity, Accuracy, Consistency.
› Completeness – the proportion of stored data against the potential of “100% complete”. Business rules define what “100% complete” represents.
› Uniqueness – no thing will be recorded more than once based upon how that thing is identified. The data item is measured against itself or its counterpart in another data set or database.
› Timeliness – the degree to which data represent reality from the required point in time, i.e. the time the real-world event being recorded occurred.
› Validity – data are valid if they conform to the syntax (format, type, range) of their definition: database, metadata or documentation rules as to the allowable types (string, integer, floating point etc.), the format (length, number of digits etc.) and range (minimum, maximum or contained within a set of allowable values).
› Accuracy – the degree to which data correctly describes the “real world” object or event being described.
› Consistency – the absence of difference when comparing two or more representations of a thing against a definition.
Source: DAMA UK
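As an illustration, the following Python sketch measures three of the six dimensions over a toy record set; the field names, records and the rough UK postcode syntax rule are assumptions for demonstration only.

```python
# An illustrative check for three of the six dimensions (completeness,
# uniqueness, validity) over a toy record set; the rules and field names
# are invented for demonstration, not DAMA UK's.
import re

records = [
    {"customer_code": "BS005", "post_code": "BA1 7LA"},
    {"customer_code": "BS005", "post_code": "BA1 7LA"},    # duplicate
    {"customer_code": "JD001", "post_code": None},          # incomplete
    {"customer_code": "XX999", "post_code": "NOT A CODE"},  # invalid
]

POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")  # rough UK format

total = len(records)
complete = sum(1 for r in records if r["post_code"])            # completeness
unique = len({r["customer_code"] for r in records})             # uniqueness
valid = sum(1 for r in records
            if r["post_code"] and POSTCODE.match(r["post_code"]))

print(f"Completeness: {complete}/{total}")   # 3/4 postcodes present
print(f"Uniqueness:   {unique}/{total}")     # 3/4 distinct customer codes
print(f"Validity:     {valid}/{total}")      # 2/4 postcodes match the syntax rule
```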
Data Profiling, Analysis & Assessment
1. Identify a data set for review
2. Catalogue the business uses of that data set
3. Subject the data set to empirical analysis using data
profiling tools
4. List all potential anomalies
5. For each anomaly:
› Review with SME to determine if it represents a true
data flaw
› Evaluate potential business impacts
6. Prioritise criticality of important anomalies in
preparation for defining data metrics
Typical Outputs of Data Quality Profiling
COLUMN PROFILING
• Record count, unique count, null count, blank count, pattern count
• Minimum, maximum, mean, mode, median, standard deviation, standard error
• Completeness (% of non-null records)
• Data type (defined v actual)
• Primary key candidates
FREQUENCY ANALYSIS
• Count/percentage of each distinct value
• Count/percentage of each distinct character pattern
PRIMARY/FOREIGN KEY ANALYSIS
• Candidate primary/foreign key relationships
• Referential integrity checks between tables
DUPLICATE ANALYSIS
• Identification of potential duplicate records (with variable sensitivity)
BUSINESS RULES CONFORMANCE
• Using a preliminary set of business rules
OUTLIER ANALYSIS
• Identification of possible out-of-range values or anomalous records
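A hand-rolled version of the column-profiling outputs might look like the sketch below (plain Python over an invented column); real profiling tools compute these statistics per column across entire tables.

```python
# A small column-profiling pass of the kind listed above, over an invented
# column; real profiling tools compute this per column across whole tables.
import statistics
from collections import Counter

column = [42, 17, None, 42, "", 99, 17]

values = [v for v in column if v not in (None, "")]
numeric = [v for v in values if isinstance(v, (int, float))]

profile = {
    "record_count": len(column),
    "null_count": sum(1 for v in column if v is None),
    "blank_count": sum(1 for v in column if v == ""),
    "unique_count": len(set(values)),
    "completeness_pct": 100 * len(values) / len(column),
    "min": min(numeric),
    "max": max(numeric),
    "mean": statistics.mean(numeric),
    "median": statistics.median(numeric),
    "mode": statistics.mode(numeric),
    "frequency": Counter(values),   # count of each distinct value
}
for name, stat in profile.items():
    print(f"{name}: {stat}")
```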
Data Quality Business Rules
 Value domain membership
 Definitional Conformance
 Range conformance
 Format compliance
 Mapping conformance
 Value presence and record completeness
 Consistency rules
 Accuracy verification
 Uniqueness verification
 Timeliness validation
Christopher Bradley
INFORMATION MANAGEMENT STRATEGIST
Chris.Bradley@DMAdvisors.co.uk
+44 7973 184475 (mobile) +44 1225 923000 (office)
@inforacer
TRAINING
ADVISORY
CONSULTING
CERTIFICATION
uk.linkedin.com/in/christophermichaelbradley/
infomanagementlifeandpetrol.blogspot.com