Data Architecture

Enterprise Data Architecture
October 28, 2004
Revision: 1.0
Status: Draft
Prepared by
Ralph C. Alderson
Senior Consultant
Third Coast Software Foundry
Status: Working indicates the document may not be complete or reliable
Draft indicates the author considers the document accurate and complete, still under coordination
Recommended indicates the author and core team consider the document ready for approval
Published indicates the document is reviewed, approved and ready for use
Enterprise Data Architecture
What is Data Architecture?
Data architecture is where the rubber meets the sky.
– Neil Snodgrass, Data Architecture Consultant, Answerthink
Even among IT practitioners, there is a general misunderstanding (or perhaps more accurately, a
lack of understanding) of what Data Architecture is, and what it provides. In general, Data
Architecture is a master plan of the enterprise data locations, data flows, and data availability.
It is a conceptual infrastructure to support data quality, data stewardship, data integration, data
migration, and system collaboration. This infrastructure embodies a set of guidelines and
standards which ensure that the data assets are managed appropriately, and that they
conform to sanctioned principles for stewardship and quality.
Data Architecture is the discipline of designing, creating, and maintaining this infrastructure. It
must accommodate the data and information needs of the company and do so in a manner
which promotes high reliability and easy data integration among applications and data
repositories. The most visible and tangible product of effective Data Architecture is a reporting
environment that 1) provides a single version of the corporate “truth” 2) allows business analysts
to discover new insights, and 3) allows business executives and corporate decision makers to
derive corporate strategies and actionable tactics from their data. Such a reporting
environment usually entails one or more data warehouses, and one or more departmental or
“competency” data marts.1
The architecture describes how data flows from corporate transactions, through the various
layers of transformation and integration, through operational data stores, all the way to the
decision-support applications. It is an infrastructure that, when properly implemented (i.e., when it
follows the architecture and conforms to the corporation’s suite of “best practices”), guarantees
the three benefits of the reporting environment described above.
As the humorous quote at the beginning of this paper indicates, Data Architecture often seems
somewhat nebulous as there is no physical manifestation (like an executable program manifests
programming code, or like a relational database manifests an entity relationship data model).
Data Architecture has no programmatic instantiation and exists only as standards, policies, and
corporate “best practices.” It resides only in the artifacts (text documents and pictorial
diagrams) which describe it, and in the “tribal knowledge” of the enterprise. The artifacts which
describe it are the blueprint of the architecture, and serve a similar function for building reliable
systems as a building architect’s blueprint serves for building a house.
1 A data warehouse can be built without Enterprise Data Architecture, but it is highly inadvisable. Likewise, a data architecture can
exist for an enterprise that is not doing any data warehousing, but provides the optimal benefit to the corporation when it establishes
the blueprint for integrating enterprise data into a data warehouse.
A corporation’s Data Architecture is a mirror of the data and information generated and
captured by the enterprise in order to do its business. It describes the business rules and the
concepts which are critical for the enterprise to operate efficiently. It offers a “seal of approval”
on the reliability of the data, and guarantees that corporate decision makers can make well-informed, fact-based decisions on policies and strategies. It provides a sanctioned plan for
stewardship of the data assets of the corporation, and determines the rules on how data gets
created, how it moves through the enterprise, and how it gets consumed.
Indeed, Data Architecture influences everything in the enterprise which “touches” the data. It
motivates data policies, influences corporate goals, enables strategies for achieving those goals,
and validates the tactics which implement those strategies. It encompasses all systems and
programs in which data originates, in which data is transformed and/or cleansed, and in which
data is migrated to, or integrated with, other systems.
By standardizing data definitions, data formats, and the acceptable storage, integration, and
usage of the data, the architecture prepares the environment for data management, and it is
by invigorating these standards that the powerful benefits of the Data Architecture (high data
quality and unquestionable data reliability) are enabled. Also, by dictating how data gets
integrated, migrated, cleansed, and transformed, Data Architecture provides a plug-and-play
framework for data warehousing.
Figure 1. A Typical Data Architecture Environment
What are the artifacts and deliverables of Data Architecture?
Since Data Architecture is a conceptual and abstract discipline, it has no simple representation that
one can point to and say, “That’s Data Architecture.” It encompasses everything a company
captures and maintains in the realm of data and information (see Figure 1). Having such a broad
scope and impact, and such a high level of abstraction, it requires some imagination to conceive
and understand what it is all about.
The one artifact that comes closest to capturing the essence of Data Architecture is a high-level
data-flow diagram (Figure 2). But data flow is only one aspect of a complete architecture. There
must be rules about how data flows or migrates through the information systems, and there must be
a crystal clear understanding throughout the IT realm of which subject areas and concepts are
important to the company’s business model. In addition there must be an enterprise-wide
agreement as to the semantics of those concepts in all possible contexts (within the business model).
Figure 2. Data Flow Diagram
A fundamental goal of the architecture is to have absolutely unquestionable data quality and
reliability. Semantic clarity is the first step, but disciplined stewardship of the data, the concepts, and
the business rules is the only way to move forward, past that first step, to achieve a robust and
effective architecture.
In order to complete the picture, and implement the type of data environment which an ideal Data
Architecture provides, there must be:
• Inspired analysis and design of the overall architecture
• Corporate sanction of the architecture’s goals
• Enforced compliance with the architecture’s rules
The following deliverables and artifacts of the Data Architecture are designed to ensure that these
three principles are delivered to the information systems which are destined to utilize the
architecture. This is not a mandatory or an all-inclusive list. It is simply a recommended
methodology, and does not preempt a different approach utilizing other documents and principles
to achieve the desired environment.
Business Concept Definitions
Having corporate-sanctioned definitions for the concepts which animate a company’s
business model is the single most important element of Data Architecture. None of the
major benefits of the architecture will accrue without them. Yet business concept
definitions are often overlooked (or worse, purposely ignored) because, to many IT
practitioners, they seem painfully like “documentation for documentation’s sake.” Nothing
within the realm of enterprise data could be further from the truth.
Semantic clarity is mandatory for getting the full utility and all of the collateral benefits of
enterprise Data Architecture. Unless all systems and programs agree on a single definition
for each and every critical business concept, there cannot be any reliable data
migration, data integration, data cleansing, or data warehousing. Analysts and
executives who query the data warehouse(s) would have little or no reason for confidence
in the accuracy of the information which is presented to them.
Data Stewardship Agreements
Stewardship is a vital element of any Data Architecture. Data stewards ensure the quality,
accessibility, and protection of the data, and define the data standards (data definitions,
concept definitions, data formats, and data domains). They are the guardians and
maintainers of the Data Architecture. They ensure that there is a single data store of
record (DSOR) for the vertical stripe of data which they are stewarding, and they prevent
non-conforming data silos from participating in the architecture.
Stewardship agreements are corporate documents that grant stewardship responsibilities
to a person, initiative, or department, and need the advice and consent of the CIO or a
CIO designate.
Stewards are typically positioned at a high level of corporate
responsibility, e.g. V.P. or Director.
Data Sharing Agreements
Data sharing agreements are corporate documents that describe the data, where it is
located, who protects it, and who can access it. Most data should be freely available
throughout the enterprise. But some sensitive data needs to be restricted. The data
sharing agreement, signed by all interested parties, describes who can access the
restricted data, when it is available, and how the access is accomplished.
Even data that is not sensitive needs to be certified as “sharable.” Entities within the
enterprise that want access to the DSOR for a concept need to be certified as conforming
to the standards maintained for that concept (see Data Standards, below).
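As an illustration only, the access rule described in this section can be sketched in a few lines of Python. The class name, field names, and sample parties below are hypothetical and are not part of any sanctioned template. The sketch assumes that every party must be certified against the concept's standards, and that restricted data additionally requires the party to be named in the agreement.

```python
from dataclasses import dataclass, field


@dataclass
class SharingAgreement:
    """Hypothetical record of one data sharing agreement."""
    concept: str                 # the business concept being shared
    steward: str                 # who protects the data
    location: str                # where the data store of record lives
    restricted: bool = False     # True for sensitive data
    authorized: set = field(default_factory=set)  # parties named in the agreement
    certified: set = field(default_factory=set)   # parties certified as conforming

    def may_access(self, party: str) -> bool:
        # Even non-sensitive data is shared only with certified parties.
        if party not in self.certified:
            return False
        # Restricted data also requires the party to be named in the agreement.
        return not self.restricted or party in self.authorized


# Illustrative (invented) agreements
salary = SharingAgreement(
    concept="Employee Salary",
    steward="V.P. Human Resources",
    location="HR system (DSOR)",
    restricted=True,
    authorized={"Payroll"},
    certified={"Payroll", "Marketing"},
)
catalog = SharingAgreement(
    concept="Product",
    steward="Director of Merchandising",
    location="Product system (DSOR)",
    certified={"Marketing"},
)
```

Under these assumptions, `salary.may_access("Marketing")` is refused even though Marketing is certified, because the data is restricted and Marketing is not a signatory, while `catalog.may_access("Marketing")` is allowed.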
Data Usage Models (Stewardship Matrix)
Anyone who has been in Information Systems very long has heard of, and probably used,
a diagram known as a CRUD matrix. CRUD stands for (C)reate, (R)ead, (U)pdate, and
(D)elete, and details the data usage for an application, a system, or an initiative. The
Data Usage Model (sometimes called a Stewardship Matrix) extends the old-fashioned
CRUD matrix so that one can, at a glance, see not only how each application interacts
with a given concept, but also which application data store is the data store of record (DSOR)
for each concept. The system which has the DSOR for a concept inherits the stewardship
responsibilities for that business concept, and is obliged to:
1. Get enterprise-wide agreement on a definition for that concept
2. Document all of the business rules that pertain to the concept
3. Determine who (which systems and employee types) can see and use that data
   (via Data Sharing agreements discussed above), and
4. Maintain the integrity of the concept (by setting enterprise-wide data definitions,
   data formats, and data domains for the concept).
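A Data Usage Model of the kind described above can be sketched as a small data structure. The following Python is a minimal illustration, not a prescribed format: the application names, concept names, and the `Usage`, `dsor_for`, and `render` names are all assumptions made for the example. Each cell records the CRUD operations an application performs on a concept, and a flag marks the one application that holds the DSOR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """How one application touches one business concept."""
    operations: str        # subset of "CRUD", e.g. "CRUD" or "R"
    is_dsor: bool = False  # True if this application holds the data store of record


# Hypothetical matrix: concept -> application -> usage
MATRIX = {
    "Customer": {
        "CRM": Usage("CRUD", is_dsor=True),
        "Billing": Usage("R"),
        "Warehouse": Usage("R"),
    },
    "Invoice": {
        "Billing": Usage("CRUD", is_dsor=True),
        "CRM": Usage("R"),
        "Warehouse": Usage("R"),
    },
}


def dsor_for(matrix, concept):
    """Return the single application holding the DSOR for a concept."""
    holders = [app for app, usage in matrix[concept].items() if usage.is_dsor]
    if len(holders) != 1:
        raise ValueError(f"{concept}: expected exactly one DSOR, found {holders}")
    return holders[0]


def render(matrix):
    """Render the matrix as text; an asterisk marks the DSOR."""
    apps = sorted({app for usages in matrix.values() for app in usages})
    lines = ["Concept".ljust(10) + "".join(app.ljust(11) for app in apps)]
    for concept, usages in matrix.items():
        cells = []
        for app in apps:
            usage = usages.get(app)
            cell = "-" if usage is None else usage.operations + ("*" if usage.is_dsor else "")
            cells.append(cell.ljust(11))
        lines.append(concept.ljust(10) + "".join(cells))
    return "\n".join(lines)


print(render(MATRIX))
```

Raising an error when a concept has zero or several DSOR holders reflects the single-DSOR rule stated under Data Stewardship Agreements; a real matrix would more likely live in a modeling tool or spreadsheet than in code.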
Data Standards (Definition, Format, and Domain)
Data definitions are often captured in modeling tools like Erwin, and then propagated to
the physical database in the form of comments on tables, columns, and relationships.
They quite frequently can come directly from the Business Concept Definition document
(see above). The DSOR for a concept contains the sanctioned definitions which relate to
the concept and its attributes. Similarly, the DSOR should be considered the sanctioned
source for the format of a concept’s data attributes and for the valid domain values for that
concept.
An important criterion in data sharing is that all parties which want to use the
data define that data in exactly the same way – in entity and attribute definitions, in
format, and in domain values. This is crucial to having certifiably correct reports, and a
high level of data quality.
Where definitions, formats, or domains are different, it is hard to rationalize that both sides
of the data sharing are, indeed, talking about the same concept, and before a sharing
agreement can be executed and sanctioned by the enterprise (with signatures of
appropriate parties) one side or the other must change and conform to the other (or both
sides can change and use a negotiated settlement to remediate the differences).
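The comparison of definition, format, and domain between the DSOR and a prospective data sharer could be automated along the following lines. This is a rough sketch under stated assumptions: the `AttributeStandard` structure, the use of a regular expression to express a format, and the sample "customer status" attribute are all invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeStandard:
    """Hypothetical standard for one attribute of a business concept."""
    definition: str
    format_regex: str                # pattern a value must fully match
    domain: frozenset = frozenset()  # allowed values; empty means no enumerated domain


def differences(dsor, candidate):
    """List the aspects on which a candidate system departs from the DSOR."""
    diffs = []
    if dsor.definition.strip().lower() != candidate.definition.strip().lower():
        diffs.append("definition")
    if dsor.format_regex != candidate.format_regex:
        diffs.append("format")
    if dsor.domain != candidate.domain:
        diffs.append("domain")
    return diffs


def value_conforms(standard, value):
    """Check a single value against the format and domain of a standard."""
    if re.fullmatch(standard.format_regex, value) is None:
        return False
    return not standard.domain or value in standard.domain


# Illustrative (invented) standards for a customer status code
dsor_status = AttributeStandard(
    definition="Current standing of the customer account",
    format_regex=r"[A-Z]",
    domain=frozenset({"A", "I", "S"}),
)
billing_status = AttributeStandard(
    definition="Current standing of the customer account",
    format_regex=r"[A-Z]",
    domain=frozenset({"A", "I"}),
)
```

Here `differences(dsor_status, billing_status)` reports a domain mismatch, which under the rule above would have to be remediated before a sharing agreement could be signed. Comparing definitions as text is a crude proxy; agreement on meaning is ultimately a matter for the stewards.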
Data Warehouse Artifacts
Data warehouses have many artifacts and deliverables. All of the artifacts and
deliverables mentioned here for Data Architecture will be utilized in building a data
warehouse.
Data Flow Diagrams
Many in Information Systems think of data flow diagrams (DFD) as being equivalent to
Data Architecture – as being The Architecture. DFDs are a vital tool for conveying the
scope and boundaries of the architecture, but (as we hope we have demonstrated in this
white paper) they are only a tool, and only one of many.
DFDs describe how data flows throughout the enterprise – from creation of the data,
through various layers of refinement, cleansing, and transformation, to the consumption of
the data on reports, executive dashboards, or display screens. They are a key to
documenting the overall architecture, and are a very useful starting place for the data
mapping used by cleansing initiatives or for ETLs which load the data warehouse.
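To show how a DFD can seed data mapping, the diagram can be treated as a directed graph and walked backward from any consumer to find every store that feeds it. The sketch below is illustrative only: the store names in `FLOWS` are invented, and the `upstream` function is a hypothetical helper, not part of any tool mentioned in this paper.

```python
# Hypothetical data flows: each store maps to the stores it feeds.
FLOWS = {
    "order_entry": ["staging"],
    "crm": ["staging"],
    "staging": ["ods"],
    "ods": ["warehouse"],
    "warehouse": ["sales_mart"],
    "sales_mart": ["executive_dashboard"],
}


def upstream(flows, target):
    """Return every store that feeds `target`, directly or indirectly."""
    # Invert the flow edges so each store knows its immediate sources.
    parents = {}
    for source, destinations in flows.items():
        for destination in destinations:
            parents.setdefault(destination, set()).add(source)

    # Walk backward from the target, collecting each source once.
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(upstream(FLOWS, "ods")))
```

The set returned for a report or dashboard is, in effect, the list of sources an ETL designer would need to map; the transformations applied along each edge would still have to be documented separately.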
Conceptual Models
Conceptual models are diagrams that summarize all of the critical and interesting
concepts which are inherent in the business, and the relationships among them. A very
high-level conceptual model diagrammatically details only the subject areas (e.g.
Finance, Human Resources, Products, etc.) of interest, and the relationships between
subject areas and concepts. This type of model is called, naturally, a Subject Area Model.
The next lower level of detail is captured by a concept model (sometimes called a data
planning model) which depicts each interesting concept and the relationships among the
concepts. One method of portraying this model is with an un-attributed entity relationship
(ER) model. Indeed, most (if not all) of the business concepts will end up being fully attributed entities in one or more logical models which support one or more transactional
systems. The relationships between concepts in this type of model conform naturally
enough to the concept of relationships in ER modeling.
Another very effective technique for conceptual modeling is a formal modeling notation
known as Object-Role Modeling (ORM). Object-Role modeling was designed for this
purpose, and allows useful insights into the concepts and relationships which might be
overlooked using the traditional ER modeling notation. ORM is sometimes eschewed as
being too tedious, but this is due mostly to a lack of good graphical tools designed to
support the technique.
Logical Models
If you have undertaken the discipline of creating conceptual models, you will find that the
logical models evolve from the conceptual ones quite naturally. The major concepts
become entities, and many of the minor ones become attributes for those entities.
Physical Models
Physical models are dependent on the choice of DBMS used, and are in the domain of the
DBAs. Whereas the physical representation is definitely an artifact of the architecture, its
main purpose is to document where the concepts and entities reside (what DBMS, what
database) and how they had to be modified (if at all) in order to become columns in tables. The
physical residence of business concepts is an important piece of information for Data
Sharing Agreements.
Metadata Standards and Maintenance
Metadata is the sum of all of the corporate knowledge about the corporation’s business
processes and the data that qualifies and quantifies them. There are two types of metadata:
technical and business.
Technical metadata is used by Information Technology practitioners to standardize,
categorize, and define the data structures used to capture information in databases.
Technical metadata describes the physical properties of the data, how it relates to other
data, and mappings between sources and destinations of data that is moving through the
system(s). It is invaluable for standardizing the data formats, definitions and domains across
systems.
Business metadata is used to guide the system users (data consumers) through the data
and the problems they are trying to solve with it. It provides, on a fundamental level, basic
description information for the data fields. At a more robust level, it provides the
foundation for understanding the content and source of the information. The business
metadata provides a conceptual context for the technical metadata, and is often left
undocumented, surviving only as “tribal knowledge.” Accurately capturing and
standardizing business metadata is always an important challenge for Data Architecture.
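One way to make the gap between the two kinds of metadata visible is to record both and report the columns that have a technical description but no business description. The Python below is a minimal sketch; the record layouts, the `undocumented` helper, and the sample columns are assumptions made for illustration, not a metadata repository design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TechnicalMetadata:
    """Physical properties of one column, plus where it is loaded from."""
    table: str
    column: str
    data_type: str
    source: Optional[str] = None  # hypothetical "system.table.column" mapping


@dataclass(frozen=True)
class BusinessMetadata:
    """What the column means to the people who consume it."""
    description: str
    steward: str


def undocumented(technical, business):
    """Return (table, column) pairs that lack business metadata."""
    return [
        (t.table, t.column)
        for t in technical
        if (t.table, t.column) not in business
    ]


# Illustrative (invented) metadata
technical = [
    TechnicalMetadata("customer", "status_cd", "CHAR(1)", source="crm.cust.stat"),
    TechnicalMetadata("customer", "seg_cd", "CHAR(2)", source="crm.cust.seg"),
]
business = {
    ("customer", "status_cd"): BusinessMetadata(
        description="Current standing of the customer account",
        steward="V.P. Sales",
    ),
}

print(undocumented(technical, business))
```

In this example the segment code is reported as undocumented, which is exactly the kind of knowledge that otherwise remains tribal.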
Does my company need Data Architecture?
It’s hard to imagine a company that wouldn’t benefit from a well-designed and robust Data
Architecture, but for some companies it is absolutely critical. Here are typical circumstances
under which a formal Data Architecture is mandated for a company:
• It is building (or anticipates building) a data warehouse (or data marts) – mandated to
  remediate potential data quality and data reliability issues.
• It is building (or anticipates building) an operational data store (ODS) – potential data
  integration and data quality issues.
• It is pursuing a Six Sigma strategy – data quality and reliability issues.
• It is pursuing ISO certification – data quality and reliability issues.
• Enterprise data is used to analyze operations:
  o To discover marketplace opportunities – data quality and reliability issues.
  o To create marketing strategies – data quality and reliability issues.
  o To fine tune and optimize operations – data quality and reliability issues.
• Enterprise data is used as the basis for high-level decision-making – data quality and
  reliability issues.
• It believes that enterprise data is a corporate asset that needs to be leveraged and
  protected – data stewardship issues.
• It wants to get “the best and the most from its enterprise data” © – data quality, reliability,
  integration, and stewardship issues.
What are the benefits that Data Architecture provides?
At the very least, Data Architecture provides a high-level map of the data topology for an
enterprise. It describes how the data originates, where it resides, where it migrates, what
transformations are applied to it to cleanse and standardize it, and what it means (the
semantics). At its best, it goes way beyond this simple documentation, and becomes an active
principle that lives within the data, energizing and leveraging it in a multitude of ways. The data
becomes an organic corporate asset that invigorates the enterprise and provides a clear path
to the realization of the corporate vision, goals, and strategies.
To someone who has never experienced a robust and inspired Data Architecture in action, this
may sound a little like poetic license or hyperbole. But it truly is not. Metaphors aside, corporate
personnel who discover the synergistic benefits of Data Architecture for the first time are often
amazed that they ever functioned without it.
Data that once was suspect or needed “tweaking” in order to balance the books becomes as
reliable as “Old Faithful.” Analysts who once complained that the unreliability of the data made
their analysis contrived and incomplete become ardent converts, clamoring for more
bandwidth to allow their heuristics to discover all of the exciting possibilities that are contained in
their newly invigorated data warehouse. Data warehouse developers who previously spent
many hours of overtime trying to shoe-horn data from legacy systems into the warehouse
happily discover that ETLs and data maps become self-revealing, and the data warehouse is
found to be the software equivalent of “plug-and-play.” Executives who had struggled to find
meaning in their daily, weekly and monthly reports now discover nuggets of information which
inspire new visions, and blaze new trails to outsmart and outmaneuver the competition.
Because of guaranteed data reliability and the framework which enables death-defying data
transformations, Data Architecture can have a positive impact on virtually every operational
function, every department, and every profit center. The artifacts describe how this should
happen: Data Stewards enable semantic clarity and enforce the standards. Data analysts and
planners set the policies and discover the vision. Program and project managers instantiate the
ideals. Data integrators become empowered to fold all data into a single vocabulary, whether
they are dealing with existing disparate systems, new system development, or third-party
packaged systems. And everyone throughout the enterprise finds a new appreciation and
respect for the data that pulses through the architecture’s veins.
OK, it sounds like we could benefit from formal Data Architecture. How do we proceed?
An experienced data architect can analyze and document the current data environment,
determine which aspects need refinement, enhancement, or extension, and devise a road map
for achieving a best-practice Data Architecture which will allow the enterprise to get the best
and most from its data ©. If no such architect exists in-house, there are many qualified
consultants who can provide the experience, acumen, and level of expertise to design a plan
for achieving the desired benefits. A data architect may be utilized to analyze the current
environment only, or provide a complete architecture implementation – from capturing
metadata, to defining concepts, to implementing a data warehouse.
Optimal Data Architectures are flexible and can be implemented in stages. The key is to have a
high-level plan which accounts for the goals and aspirations of the enterprise. Once that is in
place, the benefits of Data Architecture can be prioritized and implemented in a seamless,
phased-in approach that accommodates the specific needs of any organization.
Ralph C. Alderson is a Senior Consultant with Third Coast Software Foundry, Austin, Texas, who specializes in Data Architecture
and data-related issues.