ET_MDD_v1.0_2014-03

Project:
Title:
Version: 1.0
Metadata Management
Metadata Definitions
Date: 12th March 2014
Working Group:
Emerging Technologies
PhUse
Emerging Technology Working Group
Metadata definitions
Document1
Page 1 of 44
Table of Contents
1 INTRODUCTION: PURPOSE OF THIS DOCUMENT
2 SCOPE
3 DEFINITIONS
3.1 METADATA MANAGEMENT
3.1.1 Metadata
3.1.2 Structural metadata
3.1.3 Descriptive metadata
3.1.4 Study Instance Metadata
3.1.5 Metadata repository
3.1.6 Metadata registry
3.1.7 Data element
3.1.8 Attribute
3.1.9 Class
3.1.10 Data type
3.1.11 Value level metadata (VLM)
3.2 CONTROLLED TERMINOLOGY, CODE SYSTEMS & VALUE SETS
3.2.1 Controlled Terminology/controlled vocabulary
3.2.2 Code system
3.2.3 Dictionary
3.2.4 Concept
3.2.5 Code
3.2.6 Code list
3.2.7 Value set
3.3 MASTER DATA MANAGEMENT
3.3.1 Master Data
3.3.2 (Master) Reference Data
3.3.3 Master Data Management
3.4 INTEROPERABILITY
Categorization of Interoperability (by HL7)
3.4.1 Technical interoperability (“machine interoperability”)
3.4.2 Semantic interoperability
3.4.3 Process Interoperability
3.5 DATA AGGREGATION, INTEGRATION, POOLING
3.5.1 Data pooling
3.5.2 Data integration
3.5.3 Data aggregation
4 APPENDICES
4.1 CDISC GLOSSARY
4.2 RELATED DOCUMENT
4.3 WORKING GROUP MEMBERS
5 PARKING LOG OF IMPLEMENTATION
6 GENERAL COMMENTS (TO BE TAKEN OUT FOR FINAL DOCUMENT)
1
INTRODUCTION: purpose of this document
This document provides definitions agreed within the PhUse CSS working group for metadata
management and related aspects across the industry. It is expected that these definitions will be re-used
in the FDA guidelines as cross-industry definitions.
To be of operational value, the document contains not only definitions but also a short description and
an example of use. Whenever possible, the definitions are built from existing definitions in FDA
guidance documents, the CDISC glossary, and cross-industry definitions (e.g. Gartner). A reference to the
source definition is provided either directly with the definition or in the reference section.
This document does not intend to be exhaustive. It is intended to clarify the
most commonly used (and misused!) definitions in our industry around metadata and master data
management.
The CDISC glossary [CDISC1] (and document in attachment) is used as a reference in this document. It is
expected that the reader of this document is familiar with the abbreviations and synonyms contained in
the CDISC glossary; these are not repeated here.
2
SCOPE
The following topic areas are in scope of this document:
• Metadata management
• Master data management
• Controlled terminology, code system, value set
• Data pooling, data integration, data aggregation
• Interoperability, semantic interoperability
Definitions are provided per topic area to ease reading and to structure this document.
3
DEFINITIONS
3.1
Metadata management
[Figure: metadata types. Organization level: Metadata, comprising Structural Metadata and Descriptive
Metadata (Semantic Descriptive Metadata and Process Descriptive Metadata). Study level: Study Metadata,
comprising Study Structural Metadata and Study Descriptive Metadata.]
3.1.1
Metadata
Synonym
Definition & source
• Wikipedia. The term metadata refers to "data about data". The term is ambiguous,
as it is used for two fundamentally different concepts (types).
o Structural metadata is about the design and specification of data
structures and is more properly called "data about the containers of data";
o Descriptive metadata, on the other hand, is about individual instances of
application data, the data content. In this case, a useful description
would be "data about data content" or "content about content".
• ISO 11179. “Descriptive data about an object [ISO/IEC 20944-1]”. Thus, metadata
is a kind of data.
• Adrienne Tannenbaum, Metadata Solutions:
o "Metadata: the detailed description of the instance data; the format and
characteristics of populated instance data; instances and values depending
on the role of the metadata recipient." and "Instance data: That which is
input into a receiving tool, application, database, or simple processing
engine".
o Meta metadata: “The descriptive details of metadata; metadata qualities
and locations that allow tool-based processing and access; the basic
attributes of metadata solutions:”
Description
Metadata describe instance data.
• Instance data are data stored in a computer as the result of data entry by a
person or data processing by an application.
• Metadata can itself become instance data, described in turn by level 2 metadata
(or meta metadata).
o Each CDISC standard, or defined instance of a standard, could be
considered an object. That object has properties that describe
the operations that can be performed on it and by whom. For example, global
SDTM objects (standard template definitions for SDTM standard
domains for each version of the standard) can be copied and a few
properties adjusted (instantiated at a compound level or study level to
force the inclusion of PERM variables and to define some of them, or some
EXP variables, as Mandatory). The available "Copy" operation, the
available "properties that can be changed" and the associated "values
permitted to change (from x to y)" are metadata elements to be used
by the corresponding MDR processing tool to instantiate that object.
o The relationships among standards can be considered meta-metadata,
so that "conversion" or "visualization" tools can relate data elements
as they move from one instance of the data to another (mapping).
There are two types of metadata (see below for more detailed descriptions and examples):
• Structural metadata
• Descriptive metadata
Example
See structural metadata and descriptive metadata
Recommended definition
Descriptive data about an object
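The "Copy" operation, the "properties that can be changed" and the "values permitted to change (from x to y)" described above can be sketched in Python. This is only an illustration under stated assumptions: the template layout, the variable name `VAR1`, the property name `core` and the function `instantiate` are hypothetical and are not taken from any CDISC standard or MDR tool.

```python
import copy

# Hypothetical global template for one variable of a standard domain.
GLOBAL_TEMPLATE = {"name": "VAR1", "core": "PERM"}

# Meta-metadata (assumed layout): for each property that may be changed
# on a copy, the set of permitted (from, to) value transitions.
PERMITTED_CHANGES = {
    "core": {("PERM", "MANDATORY"), ("EXP", "MANDATORY")},
}


def instantiate(template: dict, changes: dict) -> dict:
    """Copy a template and apply only changes that the meta-metadata permits."""
    instance = copy.deepcopy(template)  # the "Copy" operation
    for prop, new_value in changes.items():
        old_value = instance.get(prop)
        if (old_value, new_value) not in PERMITTED_CHANGES.get(prop, set()):
            raise ValueError(
                f"changing {prop!r} from {old_value!r} to {new_value!r} is not permitted"
            )
        instance[prop] = new_value
    return instance


# Study-level instance in which the permissible variable is forced to mandatory.
study_variable = instantiate(GLOBAL_TEMPLATE, {"core": "MANDATORY"})
```

The global template is left untouched by the copy, and a change that the meta-metadata does not list (for example renaming the variable) is rejected.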
3.1.2
Structural metadata
Synonym
Standard metadata
Definition & source
• http://en.wikipedia.org/wiki/Metadata
The design and specification of data structures (e.g. format, semantics, ...) cannot
be “data about data”, because at design time the application contains no data. In
this case the correct description would be "data/information about the containers
of data".
• [FDA1]
Structural metadata is structured information that describes, explains, or
otherwise makes it easier to retrieve, use, or manage data.
Description
Structural metadata is what most people mean by metadata. Structural metadata is
said to “give meaning to data” or to put data “in context”.
Standard metadata is only a subset of structural metadata, since legacy data, created
without standards, also have structural metadata.
Key components of structural metadata include data domains, data elements,
terminology, data mappings and transformations, and data derivations.
The successful use of structural metadata requires data standards governance, which
should include:
• workflows to address the creation and/or revision of structural metadata
• version control of structural metadata and study instance metadata (see definition
below)
• access control, by user role
Standards metadata (a subset of structural metadata) is the source from which the
study instance metadata (see below) is built.
Example
The number 120 itself is meaningless without structural metadata such as:
• The name of the variable (e.g. Systolic Blood Pressure) with its definition
• The unit related to this physical quantity (e.g. Systolic Blood Pressure Unit =
mmHg)
CDISC SDTM is the data standard approved across the industry for clinical data to be
transferred to the FDA.
• For instance, the variable “Sex” is described by a set of structural metadata such as
the label, data type (char), associated value sets (male, female, ...) and role in
SDTM.
• The metadata for the AE (Adverse Event) SDTM domain that is compliant with the
CDISC SDTM Implementation Guide (Version 3.1.3) consists of attributes such as
Variable Name, Variable Label, Type, Controlled Terms, Role, etc.
A data model, describing the classes, attributes, relationships and hierarchies,
constitutes the structural metadata of the underlying database.
Recommended definition
In pharmaceutical research, structural metadata describes the instance data that are
collected and derived during clinical research across different processes and systems.
As such, it facilitates clinical software re-use and thus business process efficiency.
Structural metadata is defined, maintained, and governed at the level of an
organisation (pharma company, CRO, CDISC, ...) across all projects; at the study level, it
is the study instance metadata, extracted from the structural metadata, that
applies.
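The point that a bare value such as 120 only becomes meaningful once structural metadata is attached can be illustrated with a small Python sketch. The class `VariableMetadata`, its fields and the value set spellings are assumptions made for illustration; they are a simplification and not the SDTM or Define-XML structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMetadata:
    """Simplified, hypothetical structural metadata for one variable."""
    label: str
    data_type: str
    unit: str = ""
    value_set: tuple = ()  # empty tuple means "not enumerated"


SYSTOLIC_BP = VariableMetadata(
    label="Systolic Blood Pressure", data_type="integer", unit="mmHg"
)
SEX = VariableMetadata(label="Sex", data_type="char", value_set=("male", "female"))


def render(meta: VariableMetadata, value) -> str:
    """Put an instance value in context using its structural metadata."""
    unit = f" {meta.unit}" if meta.unit else ""
    return f"{meta.label} = {value}{unit}"


def is_permitted(meta: VariableMetadata, value) -> bool:
    """Check a value against the value set, when one is defined."""
    return not meta.value_set or value in meta.value_set
```

With these definitions, `render(SYSTOLIC_BP, 120)` yields a labelled value with its unit, and `is_permitted(SEX, "unknown")` is false because the illustrative value set does not list it.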
3.1.3
Descriptive metadata
Synonym
Process metadata (subset of descriptive metadata)
Semantic metadata (subset of descriptive metadata)
Definition & source
• http://en.wikipedia.org/wiki/Metadata
The individual instances of application data, the data content. In this case, a useful
description would be "data about data content" or "content about content".
• Ralph Kimball: "Process metadata describes the results of various operations in a
data warehouse."
Description
Descriptive metadata is used in different contexts:
• Data operations and statistical analysis (semantic metadata): additional content
on the data that supports further analysis of the data. For instance, the patient
population in the context of a clinical trial is descriptive metadata.
• Software implementation (process metadata): describes the results of various
operations happening in an application, be it a data warehouse or any other
application. This includes:
o processes used to reformat (convert) or transcode content
o all information needed to support data lineage and traceability
o details of origin and usage (including start and end times for creation,
updates and access)
Descriptive metadata is often a key enabler in deriving business value from data
through both direct and indirect relationships between instance data. In
effect, it creates the “how”, “where”, “who”, and “when” for the instance data.
Example
• “How” - how the instance data is used within the information flow
• “Where” - source of the instance data
• “Who” - who created, modified and approved the instance data
• “When” - versioning information of the instance data
• Data operations and statistical analysis (semantic metadata): patient population,
indication, therapeutic area
• Software implementation (process metadata):
o metadata needed for the effective management of version control for
structural metadata: UserID who executed the last modification, date of
the last modification, UserID who approved the last modification
o metadata needed for the effective management of instance data:
  o what is the source of the data, and in which system(s) it is authored
  o which transformations happened to the data, how, when, and by whom
o metadata needed for managing access control: the different roles for
accessing information and which actions they can perform (create,
read, update, delete)
o audit trail: who accessed which information, and when
Recommended definition
In pharmaceutical research, descriptive metadata describes process or domain-specific
information about instance data collected and derived during clinical research. It
provides conceptual, contextual, and processing information for instance data and, as
such, is a key enabler in deriving business value from instance
data. It can also provide greater depth and more insight about the "container" of the
data, whether it is a file, document, or representation.
Descriptive metadata is itself defined by structural metadata; it is generated by
systems or people.
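The “how”, “where”, “who” and “when” of process metadata can be pictured as a small record that travels with the instance data. The following Python sketch is illustrative only; the class `ProcessMetadata`, its field names and the system name `EDC` are assumptions, not part of any system referenced in this document.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProcessMetadata:
    """Hypothetical lineage record for one piece of instance data."""
    source_system: str      # "where": system in which the data was authored
    created_by: str         # "who"
    created_at: datetime    # "when"
    transformations: list = field(default_factory=list)  # "how"

    def record(self, description: str, user: str, when: datetime) -> None:
        """Append one transformation step to the audit trail."""
        self.transformations.append({"what": description, "by": user, "at": when})

    def lineage(self) -> list:
        """Return the source followed by every transformation, in order."""
        return [self.source_system] + [t["what"] for t in self.transformations]


created = datetime(2014, 3, 12, 9, 0, tzinfo=timezone.utc)
trail = ProcessMetadata(source_system="EDC", created_by="user1", created_at=created)
trail.record("unit conversion", user="user2",
             when=datetime(2014, 3, 13, 9, 0, tzinfo=timezone.utc))
```

Calling `trail.lineage()` then lists the source system followed by each recorded transformation, which is the kind of traceability information described above.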
3.1.4
Study Instance Metadata
Synonym
Study Data Standards or Study Specific Structural metadata (subset of Study Instance
metadata)
Definition & source
(no source found)
Description
• Study Instance Metadata is a defined grouping of metadata that serves as the most
complete representation of the metadata that defines an individual study.
• It is commonly thought of as the set of metadata that is actually consumed by the
clinical technology platform to facilitate processes that are more automated and
consistent.
Study Instance Metadata consists of Structural metadata and some Descriptive
metadata to support the management of the Study Instance Metadata:
• Example of Study Instance Structural metadata: the subset of SDTM data domains and
variables needed to collect and derive instance data for a specific study.
• Example of Study Instance Descriptive metadata: for a Statistical Computing
Environment (SCE) that leverages metadata to automate the production of
TLFs, the Study Instance Descriptive metadata could include study-specific
selections that help the SCE process the metadata, such as the selection of BY
variables to determine appropriate breaks for a table in that particular study.
The Study Instance Structural Metadata is extracted from the Structural metadata
maintained at the enterprise/organisation level; it is therefore a subset of the enterprise
Structural metadata.
The Study Instance Metadata is exported to and consumed by the clinical data
platform to ensure maximal automation and consistency of the processes for trial
design, execution, storage, analysis, and submission.
Example
See above
Recommended definition
Add a definition here
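The statement that Study Instance Structural Metadata is extracted from, and is therefore a subset of, the organisation-level structural metadata can be sketched as follows. The dictionary layout and the function `extract_study_metadata` are assumptions for illustration, and the domain content is heavily simplified.

```python
# Hypothetical, simplified organisation-level structural metadata:
# each domain maps to the variables defined for it.
ORG_STRUCTURAL_METADATA = {
    "DM": ["USUBJID", "SEX", "BRTHDTC"],
    "AE": ["USUBJID", "AETERM", "AESTDTC"],
    "VS": ["USUBJID", "VSTESTCD", "VSORRES"],
}


def extract_study_metadata(org_metadata: dict, selection: dict) -> dict:
    """Return the study-level subset of the organisation-level metadata.

    `selection` maps each domain needed by the study to the variables needed.
    Anything not defined at organisation level is rejected, so the result is
    always a subset of the organisation-level metadata.
    """
    study = {}
    for domain, variables in selection.items():
        if domain not in org_metadata:
            raise KeyError(f"domain {domain!r} is not defined at organisation level")
        unknown = [v for v in variables if v not in org_metadata[domain]]
        if unknown:
            raise KeyError(f"variables {unknown} are not defined for domain {domain!r}")
        study[domain] = list(variables)
    return study


study_metadata = extract_study_metadata(
    ORG_STRUCTURAL_METADATA,
    {"DM": ["USUBJID", "SEX"], "VS": ["USUBJID", "VSTESTCD", "VSORRES"]},
)
```

Rejecting unknown domains and variables is a design choice of this sketch: it keeps the study instance metadata traceable to what is governed at organisation level.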
3.1.5
Metadata repository
Synonym
Metadata registry
Definition & source
http://datadictionary.blogspot.com/2008/03/metadata-repositories-vs-metadata.html
Definition from the Dr. Data Dictionary site: a repository is a place, room, or container where
something is deposited or stored. Note that there is nothing in this definition about the quality of
the things being stored or the process to check whether new incoming items are
duplicates of things already in the repository. If I have 100 users, they could each
define "Customer" as they see fit and put their own definition into the metadata
repository. No problems.
http://en.wikipedia.org/wiki/Metadata_repository
“A Metadata repository is a database created to gather, store, and distribute
contextual information about business data, when documented it is known as
metadata. This contextual information of business data include meaning and content,
policies that govern, technical attributes, specifications that transform, and programs
that manipulate.
The metadata repository is responsible for physically storing and cataloging metadata.
The metadata that is stored should be generic, integrated, current, and historical.
Generic for a metadata repository means that the meta model should store the
metadata by generic terms instead of storing it by an applications-specific defined
way, so that if your data base standard changes from one product to another the
physical meta model of the metadata repository would not need to change.
Integration of the metadata repository allows all entities of the enterprise business to
view all metadata subject areas. The metadata repository should also be designed so
that current and historical metadata both can be accessed. Metadata repositories used
to be referred to as a data dictionary.”
http://en.wikipedia.org/wiki/Data_dictionary
A data dictionary, or metadata
repository, as defined in the IBM Dictionary of Computing, is a "centralized repository
of information about data such as meaning, relationships to other data, origin, usage,
and format." The term may have one of several closely related meanings pertaining to
databases and database management systems (DBMS):
• a document describing a database or collection of databases
• an integral component of a DBMS that is required to determine its structure
• a piece of middleware that extends or supplants the native data dictionary of a DBMS
http://www.springerreference.com/docs/html/chapterdbid/63927.html
http://www.uspto.gov/web/patents/patog/week13/OG/html/1388-4/US08407194
20130326.html
http://www.bls.gov/ore/pdf/st000010.pdf
Description
• Data store for structural metadata, defined within an organization
• Study Instance Metadata are derived from the structural metadata defined in a
metadata repository, but are generally not stored in the MDR as they are study
specific
• Descriptive metadata are not stored in an MDR either
Example
CDISC SHARE
NCI caDSR
Recommended definition
A metadata repository (MDR) is a centralized repository of structural metadata, with
information about instance data such as semantics (meaning), relationships to other
data, origin, usage, and format.
When the emphasis is put on control of new metadata, through a specific registration
process with a well-identified administration/registration authority, the metadata
repository is often called a metadata registry.
The recommendation is to use the terms:
• Metadata registry when the software has a strong registration process
• Metadata repository when the software is more of a library with less emphasis on
registration
3.1.6
Metadata registry
Synonym
Metadata repository
Definition & source
http://en.wikipedia.org/wiki/Metadata_registry
A metadata registry is a central
location in an organization where metadata definitions are stored and maintained in a
controlled method.
A metadata registry typically has the following characteristics:
• Protected environment where only authorized individuals may make changes
• Stores data elements that include both semantics and representations
• Semantic areas of a metadata registry contain the meaning of a data element with
precise definitions
• Representational areas of a metadata registry define how the data is represented
in a specific format, such as in a database or a structured file format (e.g., XML)
http://datadictionary.blogspot.com/2008/03/metadata-repositories-vs-metadata.html
Definition from the Dr. Data Dictionary site: a registry has the connotation of more than
just a shared dumping ground. Registries have the additional capability to create
workflow processes to check that new metadata is not a duplicate (for a given
namespace). One of the definitions from Webster is "an official record book". Note the
word "official".
ISO/IEC 11179-3, Third edition, 2013-02-15
3.2.113
registry: information system for registration (3.2.108)
3.2.78
metadata registry (MDR): information system for registering metadata (3.2.74)
• The structure of a metadata registry is specified in the form of a conceptual data
model. The metadata registry is used to keep information about data elements
and associated concepts, such as “data element concepts”, “conceptual domains”
and “value domains”.
Description
See above
Example
See above
Recommended definition
See above
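The difference in emphasis between a repository (a store with no registration control) and a registry (a store with a registration process) can be sketched in Python. Both classes and their method names are hypothetical and only mirror the two behaviours quoted above: 100 users may each deposit their own definition of "Customer" in a repository, whereas a registry restricts changes to authorized individuals and checks for duplicates.

```python
class MetadataRepository:
    """Library-style store: anything can be deposited, duplicates included."""

    def __init__(self):
        self.entries = []

    def add(self, name: str, definition: str, author: str) -> None:
        self.entries.append((name, definition, author))

    def definitions_of(self, name: str) -> list:
        return [d for n, d, _ in self.entries if n == name]


class MetadataRegistry:
    """Controlled store: registration checks the author and rejects duplicates."""

    def __init__(self, authorized_users):
        self.authorized_users = set(authorized_users)
        self.entries = {}

    def register(self, name: str, definition: str, author: str) -> None:
        if author not in self.authorized_users:
            raise PermissionError(f"{author!r} is not authorized to register metadata")
        if name in self.entries:
            raise ValueError(f"{name!r} is already registered")
        self.entries[name] = definition


repository = MetadataRepository()
repository.add("Customer", "a person who buys", "user1")
repository.add("Customer", "an organisation with a contract", "user2")

registry = MetadataRegistry(authorized_users=["steward"])
registry.register("Customer", "a person who buys", "steward")
```

After these calls the repository holds two competing definitions of "Customer", while the registry holds exactly one and would raise an error on a second registration or on a registration by an unauthorized user.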
3.1.7
Data element
Synonym
Variable
(Note: the term “attribute” is also used interchangeably for DE when “attribute” is a
synonym of a variable or the property of a class)
Definition
[FDA1]
A data element is the smallest (or atomic) piece of information that is useful for
analysis (e.g., a systolic blood pressure measurement, a lab test result, a response
to a question on a questionnaire).
A data element is an atomic unit of data that has precise meaning or precise semantics.
[CDISC1]
1. For XML, an item of data provided in a mark-up mode to allow machine
processing. [FDA - GL/IEEE]
2. Smallest unit of information in a transaction. [Center for Advancement of Clinical
Research]
3. A structured item characterized by a stem and response options together with a
history of usage that can be standardized for research purposes across studies
conducted by and for NIH. [NCI, caBIG]
NOTE: The mark-up or tagging facilitates document indexing, search and retrieval,
and provides standard conventions for insertion of codes.
[ISO/IEC 11179-4:2004, 3.4]
Unit of data for which the definition, identification, representation and permissible
values are specified by means of a set of attributes.
Description
The data element is a foundational concept in an ISO/IEC 11179 metadata registry. The
purpose of the registry is to maintain a semantically precise structure of data
elements.
Each Data Element in an ISO/IEC 11179 metadata registry:
• should be registered according to the Registration guidelines (11179-6)
• will be uniquely identified within the register (11179-5)
• should be named according to Naming and Identification Principles (11179-5)
• should be defined by the Formulation of Data Definitions rules (11179-4)
• may be classified in a Classification Scheme (11179-2)
A Data Element is the most elementary unit of data that cannot be further subdivided
from a semantic point of view, as it is linked with a precise meaning.
A Data Element has different properties:
• An identification such as a data element name
• A clear definition / semantic description
• A data type
• Optional enumerated permissible values (value sets)
• One or more representation terms (synonyms)
• An author and registration authority who takes responsibility for the definition of
the data element
Example
Birth Date is a Data Element.
It is described by a set of properties:
• DE name: Birthdate
• Definition/description: date and time at which the subject was born
• Data type: date (mm/dd/yyyy – hh/mm/ss – time zone)
• Value sets: not applicable
• Synonyms: BRTHDTC in CDISC SDTM, birthdate in BRIDG
If Variable in SDTM is provided as a synonym of Data Element, then Data Element
would have a similar association to ItemDef as Variable has to ItemDef in Define-XML.
Recommended definition
A Data Element is the most elementary unit of data that cannot be further subdivided
from a semantic point of view, as it is linked with a precise meaning. The definition,
identification, representation and permissible values of a data element are specified
by means of a set of properties.
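The Birth Date example can be written down as a small Python record carrying the properties listed for a Data Element. This is a sketch only: the class `DataElement`, its field names and the helper `find` are hypothetical, and the layout is not the ISO/IEC 11179 metamodel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """Hypothetical record holding the properties of one data element."""
    name: str
    definition: str
    data_type: str
    permissible_values: Optional[frozenset] = None  # None: not enumerated
    synonyms: tuple = ()

    def accepts(self, value) -> bool:
        """Check a value against the value set, when one is defined."""
        return self.permissible_values is None or value in self.permissible_values


BIRTH_DATE = DataElement(
    name="Birthdate",
    definition="date and time at which the subject was born",
    data_type="date",
    synonyms=("BRTHDTC", "birthdate"),  # names used in CDISC SDTM and BRIDG
)


def find(elements, term: str) -> Optional[DataElement]:
    """Look up a data element by its name or by one of its synonyms."""
    for element in elements:
        if term == element.name or term in element.synonyms:
            return element
    return None
```

Looking up `"BRTHDTC"` with `find([BIRTH_DATE], "BRTHDTC")` returns the Birthdate element, which shows how synonyms let the same data element be recognised under the names used by different standards.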
3.1.8
Attribute
Synonym
Property
(Note: the term “Data element” is also used interchangeably for attribute, but it is a
different concept)
Definition & source
http://en.wikipedia.org/wiki/Attribute_(computing)
In computing, an attribute is a specification that defines a property of an object,
element, or file. An attribute of an object usually consists of a name and a value; of an
element, a type or class name; of a file, a name and extension.
[Source: Understanding HL7 version 3: Andrew Hinchley]
Attributes are abstractions of the data captured about classes.
[Source: ISO 1087]
Attribute is short for attribute type and attribute value. Attribute type: category of
attribute values used as a criterion for the establishment of a concept system
[Source: “Medical Data Management”, Florian Leiner et al.]
Attribute value: Value of an attribute type as observed for a particular object.
[Source: ISO 21090]
Characteristic of an object that is assigned a name and a type
NOTE The value of an attribute can change during the lifetime of the object.
Description
A prerequisite for correct and proper use and interpretation of data is that both users
and owners of data have a common understanding of the meaning and representation
of the data. To facilitate this common understanding, a number of attributes of the
data have to be defined. Such attributes include the element’s name, data type,
caption presented to users, detailed description, and basic validation information such
as range checks.
An attribute describes a characteristic of an object/class in a logical model. If the attribute
represents the most elementary unit of data that cannot be further subdivided from a
semantic point of view, it can be considered a Data Element.
Attribute is an overloaded term. It is sometimes used as a synonym of Data Element or as
a synonym of a property of a Data Element. While the first usage may be correct in many
cases [1], we suggest avoiding the second practice and using the term “property”
instead.
Example
In BRIDG:
• raceCode is an attribute of class Person (i.e. Person.raceCode)
• value is an attribute of DefinedObservationResult
Recommended definition
Properties of an object or class in a conceptual or logical data model.
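The distinction between an attribute and a Data Element can be illustrated with a short Python sketch. It follows this document's note that an attribute whose data type is itself a class (such as “ADDRESS”) does not qualify as a Data Element. The class `Attribute`, the set `CLASS_VALUED_TYPES`, the data type label `code` and the attribute `postalAddress` are hypothetical and are not taken from BRIDG.

```python
from dataclasses import dataclass

# Assumption for this sketch: data types that are themselves classes.
CLASS_VALUED_TYPES = {"ADDRESS"}


@dataclass(frozen=True)
class Attribute:
    """A named property of a class in a logical model (simplified)."""
    owner: str       # class the attribute belongs to
    name: str
    data_type: str   # illustrative label, not the real model's data type

    def qualified_name(self) -> str:
        return f"{self.owner}.{self.name}"

    def is_data_element_candidate(self) -> bool:
        """An attribute typed by a class can be subdivided further,
        so it is not treated as a Data Element in this sketch."""
        return self.data_type not in CLASS_VALUED_TYPES


race_code = Attribute(owner="Person", name="raceCode", data_type="code")
postal_address = Attribute(owner="Person", name="postalAddress", data_type="ADDRESS")
```

Here `race_code.qualified_name()` gives the dotted form used in the example above, and only `race_code` is a Data Element candidate.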
3.1.9
Class
Synonym
Object
Definition & source
http://en.wikipedia.org/wiki/Class_(computer_programming)
In object-oriented programming, a class is a construct that is used to define a distinct
type. The class is instantiated into instances of itself, referred to as class instances,
class objects, instance objects or simply objects. ... A class usually represents a noun,
such as a person, place or thing, or something nominalized. For example, a "Banana"
class would represent the properties and functionality of bananas in general. A single,
particular banana would be an instance of the "Banana" class, an object of the type
"Banana".
[Source: ISO 21090]
class: descriptor for a set of objects with similar structure, behaviour and relationships
Description
Description of a set of objects that share the same attributes, operations, methods,
relationships, and semantics
Example
• StudySite class in the BRIDG model
• ManufacturedMaterial class in HL7 RIM: an Entity or combination of
Entities transformed for a particular purpose by a manufacturing process
Recommended definition
Description of a set of objects that share the same attributes, operations, methods,
relationships, and semantics.
A class has:
• An identifier such as a class name
• A clear object definition / semantic description
• One or more representation terms/words
• A list of Data Elements (also known as attributes)
• A list of related classes and a description of the relationship type(s)
• Any description, in addition to Data Elements, that allows the object to be mapped
within an application
[1] In an information model like BRIDG, an attribute may have a data type like “ADDRESS”, which is a
class. Such an attribute does not qualify as a Data Element.
3.1.10 Data type
Synonym
Definition
source
Storage format
&
[Source: ISO 11404]
A data type is a classification identifying one of various types of data, such as realvalue, integer or Boolean, that determines the possible values for that type; the
operations that can be done on values of that type; the meaning of the data; and the
way values of that type can be stored.
[Source: ISO 21090]
set of distinct values, characterized by properties of those values, and by operations
on those values
[Source: http://msdn.microsoft.com/]
Objects that contain data have an associated data type that defines the kind of data,
for example, character, integer, or binary, the object can contain. The following objects
have data types:
• Columns in tables and views.
• Parameters in stored procedures.
• Variables.
• Transact-SQL functions that return one or more data values of a specific data type.
• Stored procedures that have a return code, which always has an integer data type.
Description
Storage format in a database, not the display format in the user interface.
Data types define the kind of data, or the format, that can be included in a field (Data Element, attribute or variable). There are two categories of data type:
• simple / primitive data types such as Boolean, Integer, Character, defined in ISO 11404,
• abstract data types, defined in ISO 21090, which define basic concepts that are commonly encountered in healthcare in support of information exchange. Abstract data types use the terminology, notations and data types defined in ISO/IEC 11404, thus extending the set of data types defined in that standard.
Example
• Primitive data types (ISO 11404): boolean, enumerated, character, time, integer, real, …
• Abstract data types (ISO 21090): Address, PQ (for Physical Quantity) or II (for Instance Identifier), CD (Concept Descriptor), Range (low, high), Period (start, end)
Recommended definition
Data types define the format that can be included in a specific Data Element (or variable or attribute). There are two categories of data type:
• simple / primitive types such as Boolean, Integer, Character, defined in ISO 11404,
• abstract data types such as Address, PQ (Physical Quantity), defined in ISO 21090 and using the terminology, notations and data types defined in ISO/IEC 11404
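The two categories can be contrasted with a short Python sketch. It is a simplification under assumptions: the PQ and Range classes below carry only the fields named in this section (value and unit, low and high), not the full ISO 21090 attribute sets, and the blood pressure limits are invented sample values.

```python
from dataclasses import dataclass
from decimal import Decimal

# Primitive data types map directly onto built-in Python types.
is_fasting: bool = True
visit_number: int = 3


@dataclass(frozen=True)
class PQ:
    """Physical Quantity: a value together with its unit (simplified)."""
    value: Decimal
    unit: str


@dataclass(frozen=True)
class Range:
    """A low/high pair of physical quantities (simplified)."""
    low: PQ
    high: PQ

    def contains(self, q: PQ) -> bool:
        # Only quantities in the same unit are compared; no unit conversion is attempted.
        if not (q.unit == self.low.unit == self.high.unit):
            raise ValueError("units differ")
        return self.low.value <= q.value <= self.high.value


# Sample limits for illustration only.
sample_range = Range(PQ(Decimal("90"), "mm[Hg]"), PQ(Decimal("120"), "mm[Hg]"))
```

The abstract types are built from primitive ones (a decimal value and a character string), which is the sense in which they extend the primitive set.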
3.1.11 Value level metadata (VLM)
Synonym
Definition & source
[Source: CDISC Define-XML Specification Version 2.0 – http://www.cdisc.org/define-xml]
Value Level Metadata is metadata defined based on the value of other variable(s) to
support data review and analysis in cases where variable metadata is not sufficient.
The normalized data structure used by datasets based on the SDTM, SEND and
ADaM models (generally one record per subject per topic variable (test code or
parameter code) per visit or observation) provides an efficient method for
transmitting information. However, there are cases where the dataset variable
metadata does not provide sufficient detail to support data review and analysis. In
these cases Value Level Metadata should be provided in the Define-XML document.
Value Level Metadata enables the specification of the metadata of a variable under
conditions involving one or more other dataset variables. The definition of a variable
for a specific condition is known as Value Level Metadata.
Description
Note: The Define-XML team is working on creating an Implementation Guide on the
different use cases of VLM and associated requirements. It is expected that the
Define-XML Implementation Guide will gradually be made available to the public.
Variable level metadata = structural metadata on a variable
o E.g. variable VSORRESU is a coded concept with the structural metadata
type = text, length = 30, and a value set further specified through VLM
Value-level metadata is a specific term used in the CDISC Define-XML standard because
CDISC standards define datasets in a generic way (vertical structure), which does not
allow explicit semantic dependencies between variables to be captured. For instance,
in VS we have the following variables.
Variable | Label | Type | Length | EX1 | EX2
VSTESTCD | Vital Signs Test Short Name | text | 20 | SYSBP | HGTH
VSTEST | Vital Signs Test Name | text | 24 | Systolic BP | Height
VSORRES | Result or Finding in Original Units | text | 30 | Integer | Float
VSORRESU | Original Units | text | 20 | CMHG, MMHG | INCH, CM, M
It is clear that if VSTEST = Systolic Blood Pressure, the result in VSORRES and the
unit code in VSORRESU will be different than if VSTEST = Height. This is
further specified through VLM as displayed below.
Example(s)
Variable | Where | Type | Length | Controlled Terms or Format
VSORRES | VSTESTCD EQ HEIGHT (Height) | float | 5.1 |
VSORRES | VSTESTCD EQ SYSBP (Systolic Blood Pressure) | integer | 3 |
VSORRESU | VSTESTCD EQ HEIGHT (Height) AND COUNTRY IN ("CAN", "MEX") | text | 5 | ["cm" = "Centimeter"] <Unit (UH_MC)>
VSORRESU | VSTESTCD EQ HEIGHT (Height) AND COUNTRY EQ USA | text | 5 | ["IN" = "Inch"] <Unit (UH_NMC)>
Example 1.
Data values are often stored in variables that are dedicated to a single
kind of measurement; for example, height values are stored in a variable
named “height” and weight values are stored in a variable named “weight”.
But sometimes data values for different measurements are stored in a
single shared variable. Height and weight values can all be stored in a
variable named “result_value”. So, how can you know which values are
heights and which are weights?
A second variable could name the measurement whose values are stored
in “result_value”. This second variable could be named “result_name”,
so that a data set contains the variable “result_name” with values like
“height” and “weight”, and the variable “result_value” with values like
“185” and “75”. This data design is good for software, which benefits
from consistency in the data variable names. But the metadata describing
attributes of the shared variables must be able to describe these
attributes separately for each value of result_name.
Value Level Metadata is the metadata design that enables metadata
descriptions of “result_value” for each “result_name”. VLM assigns a
different set of variable characteristics to result_value for each value of
result_name, stating that the attributes of values in result_value are
different when result_name is “height” as compared to when
result_name is “weight”.
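The lookup described in Example 1 can be sketched in Python. This is a minimal illustration, not the Define-XML data model: ValueLevelMetadata, VLM_RULES and resolve are hypothetical names, and the two rules are transcribed from the VSORRES rows of the value-level table earlier in this section.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValueLevelMetadata:
    variable: str
    where: Callable[[dict], bool]   # condition on other variables of the same record
    data_type: str
    length: str


# One rule per "Where" condition; the first matching rule wins.
VLM_RULES = [
    ValueLevelMetadata("VSORRES", lambda r: r["VSTESTCD"] == "HEIGHT", "float", "5.1"),
    ValueLevelMetadata("VSORRES", lambda r: r["VSTESTCD"] == "SYSBP", "integer", "3"),
]


def resolve(variable: str, record: dict) -> Optional[ValueLevelMetadata]:
    """Return the value-level metadata that applies to `variable` in this record."""
    for rule in VLM_RULES:
        if rule.variable == variable and rule.where(record):
            return rule
    return None  # no rule matched: fall back to the variable-level metadata


record = {"VSTESTCD": "SYSBP", "VSORRES": "120"}
meta = resolve("VSORRES", record)
print(meta.data_type, meta.length)
```

Returning None when no condition matches reflects the idea that VLM only refines variable metadata where the latter is not sufficient.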
Example 2.
This example of value level metadata is related to the example provided in section 3.2.7
“Value set” with Family Pet
Variable | Where | Type | Length | Controlled Terms or Format
FamilyPet | - | Text | 20 | Animals
Breed | FamilyPet EQ “Dog” | Text | 20 | Breed of Dogs
Breed | FamilyPet different from “Dog” | Text | 20 |
A set of data about a group of families that contains a variable “Family pet” may also
contain a separate variable “Breed” (considered a variable qualifier of “Family Pet”)
that is conditioned upon the value of the data element “Family pet”:
• the variable “Family pet” is bound to the value set “Animals”
• and the data element “Breed” is bound to the value set “Breeds of Dog” when “Family pet” = “Dog”.
Recommended definition
VLM is the mechanism (implementation approach) used in the Define-XML standard to
express semantic dependencies between variables defined independently within the
CDISC standards. For instance, a vital sign test is defined by a test code, a value and a
unit. These are independent variables in the CDISC standards. Value level metadata
allows the dependencies to be expressed, i.e. the value and the unit will be different
depending on the test code.
3.2 Controlled Terminology, code systems & value sets
In this section we limit the definitions to the terms most often used in clinical research operations,
to clear up the confusion between terms like “code list”, “controlled terminology” and “dictionary”
(as used for MedDRA).
[Figure: The components of controlled vocabularies, and how controlled vocabularies are described and used, with an example from CDISC Terminology (inspired from Julie James, BlueWave Informatics). Components shown: concepts, concept identifiers, concept representation (codes and designations), code systems, code system versioning, value sets, value set definition, value set versioning, and the ISO 21090 CD (Concept Descriptor) data type. CDISC example: code system CDISC CT in NCI EVS; value set CUI for SEX: C66731; concept “Women”, Female CUI: C16576, concept representation C16576 + F, with F (primary), “Female” and “female”. Controlled terminology in define.xml is not machine processable; other, machine-processable identifiers: OID, URI.]
3.2.1 Controlled Terminology/controlled vocabulary
Synonym
Controlled vocabulary
Definition & source
[CDISC]
CDISC Controlled Terminology is a set of standard value lists that are used throughout
the clinical research process from data collection through analysis and submission
History of alignment of CDISC terminology:
• NCI EVS (Enterprise Vocabulary Services) original terminology applicable to SDTMIG (2005)
• HL7 EHR Clinical research functional profile linking HL7 standards with CDISC CDASH (data collection standards)
• HITSP (replaced by HITSC)
• ISO: in progress
• JIC: future intention to align with JIC?
http://en.wikipedia.org/wiki/Controlled_vocabulary
Controlled vocabularies provide a way to organize knowledge for subsequent retrieval.
Controlled vocabulary schemes mandate the use of predefined, authorised terms that
have been preselected by the designer of the vocabulary
[Source: Mapping from a Clinical Terminology to a Classification: AHIMA]
Controlled means that the content of the terminology is validated with careful quality
assurance procedures in place to ensure that the terminology is structurally sound,
biomedically accurate and consistent with current practice.
Controlled terminology in the context of Controlled Vocabulary:
[Amy Warner, A Taxonomy Primer].
Controlled vocabularies … are organized lists of words and phrases, or notation
systems, that are used to initially tag content, and then to find it through
navigation or search.

Description
[Source: ISO Standard 1087] and [Medical Informatics: Computer Applications in
Healthcare and Biomedicine]
• The terms terminology, vocabulary and nomenclature are often used interchangeably by creators of coding systems and by authors discussing the subjects. ISO Standard 1087 (Terminology – Vocabulary) lists the various definitions for these terms.
o Terminology: Set of terms representing the system of concepts of a particular subject field
o Nomenclature: System of terms that is elaborated according to pre-established naming rules
o Dictionary: Structured collection of lexical units, with linguistic information about each of them
o Vocabulary: Dictionary containing the terminology of a subject field
A Controlled Terminology is a synonym of Controlled Vocabulary.
It is a set of standardized words and phrases (designations) used to refer to concepts.
• It has a defined scope or describes a specific domain
• It may support categorization, indexing, and retrieval of information (optional).
• A good terminology typically includes preferred terms and synonyms while promoting consistency in preferred terms and in the assignment of the same terms to similar content.
A controlled terminology, or code system, can be used for coding, i.e. the assignment of
a code alongside a verbatim term.
Example
ICD-9 CM, SNOMED CT, LOINC, MedDRA are all controlled terminologies AND code
systems
CDISC CT is a controlled terminology but not a true code system because:
• there is no OID to represent all of CDISC CT as a unique, well-identified set,
• governance: the organisation that publishes/manages it (NCI), with OID and designation, is not the same as the one responsible for it (CDISC),
• it can be extended by the sponsor
Recommended definition
Controlled terminology is a set of standardized words and phrases (designations) used
to refer to concepts.
• It has a defined scope or describes a specific domain
• It may support categorization, indexing, and retrieval of information (optional).
• A good terminology typically includes preferred terms and synonyms while promoting consistency in preferred terms and in the assignment of the same terms to similar content.
3.2.2 Code system
Synonym
Controlled Terminologies, Controlled Vocabularies, Coding schemes, [Dictionary is
sometimes used incorrectly] (and sometimes also code lists, e.g. ISO country code)
Definition & source
[Source: ISO 21090]
managed collection of concept identifiers, usually codes, but sometimes more complex
sets of rules and references
NOTE They are often described as collections of uniquely identifiable concepts with
associated representations, designations, associations and meanings.
EXAMPLES ICD-9, LOINC and SNOMED-CT
Description
A Code System is a more strictly “regulated” controlled terminology
• A Code system may be described as “a collection of uniquely identifiable concepts with associated representations, designations, associations, and meanings” (B for Blue, Y for Yellow), while a controlled terminology could be just a list of words (Blue, Yellow, ..)
• A Concept should be unique in a given Code System and should have a unique identifier (e.g. CUI, concept unique identifier), following the governance rules of the Code System
• A Code system should have:
o an identifier (e.g. OID) that uniquely identifies the Code System
o a description consisting of prose that describes the Code System, and may include the Code System uses, maintenance strategy, intent and other information of interest
o administrative information proper to the Code System, such as ownership, source URL, and copyright information
o a code system version, as the code system could evolve over time (sometimes with changes in the underlying concepts)
A controlled terminology, or code system, can be used for coding, i.e. the assignment of
a code alongside a verbatim term.
Example
ICD-9 CM, SNOMED CT, LOINC, MedDRA, NCIT (NCI Thesaurus), and ISO 3166 for
country codes
Note: CDISC CT is not a code system as it does not have strict version control and
governance (see above).
Recommended definition
A Code system, as a controlled terminology, is described as “a collection of uniquely
identifiable concepts with associated representations, designations, associations, and
meanings”. Each concept in a code system is unique. A code system has strict
governance rules to manage its content (this is the main difference from a
controlled terminology, where there is no governance).
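The properties a code system should have can be sketched as a small Python class. The field names and the add_concept method are assumptions made for illustration, and the OID and version values are placeholders rather than registered identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class CodeSystem:
    oid: str            # identifier that uniquely identifies the code system
    description: str    # prose description
    owner: str          # administrative information
    version: str        # code systems evolve over time
    concepts: dict[str, str] = field(default_factory=dict)  # code -> designation

    def add_concept(self, code: str, designation: str) -> None:
        # Governance rule from the text: each concept is unique within the code system.
        if code in self.concepts:
            raise ValueError(f"code {code!r} already exists in {self.oid}")
        self.concepts[code] = designation


# Placeholder identifiers, using the Blue/Yellow example from the description.
colours = CodeSystem(oid="1.2.3.4.5", description="Colours", owner="Example Org", version="2014-03")
colours.add_concept("B", "Blue")
colours.add_concept("Y", "Yellow")
```

Rejecting a duplicate code is the simplest form of the governance that, according to the text, separates a code system from a plain list of words.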
3.2.3 Dictionary
Synonym
Controlled Terminology/Controlled vocabulary
Definition & source
-
Description
Often used in clinical data management for MedDRA, this is an overloaded term
with different meanings in different contexts. We therefore suggest avoiding it
and using the proper wording, i.e. controlled terminology or code system
Example
MedDRA, WHODRUG
Recommended definition
Do not use!
3.2.4 Concept
Synonym
Definition & source
[Source: ISO 21090]
unitary mental representation of a real or abstract thing; an atomic unit of thought
NOTE 1 It should be unique in a given code system.
NOTE 2 A concept can have synonyms in terms of representation and it can be a
primitive or compositional term.
Description
• A Concept is a unitary mental representation of a real or abstract thing – an atomic unit of thought – within a specific context
• The purpose of defining the concept is to share meaning in information exchange
• Concepts constitute the smallest semantic entities with which models are built. The authors and the readers of a model use concepts and their relationships to build and understand the models; these are what matter to the human user of models.
• A concept can be labelled with a code (machine readable) and/or a designation (human readable); a collection of codes constitutes a code system
• Concepts and real world objects are defined at different levels (an object is an actual thing that exists, while a concept is a mental thing)
Example
real “unit of thought”: apple, pomme (when a more refined definition is needed, such as
green or red apple, the concept can be refined)
abstract “unit of thought”: love
Recommended definition
A Concept is a unitary mental representation of a real or abstract thing – an atomic
unit of thought; a concept can be labelled with a code and/or a designation
3.2.5 Code
Synonym
Permissible value
Definition & source
[Source: ISO 21090]
concept representation published by the author of a code system as part of the code
system, being an entity of that code system
Description
• A Code is a machine processable Concept Representation published by the author
of a Code System as part of the Code System
• It is the preferred unique identifier (unambiguous) for that concept in that Code
System for the purpose of communication (preferred machine-readable identifier),
and is used in the 'code' property of an ISO 21090 CD data type
• Codes are sometimes meaningless identifiers, and sometimes they are mnemonics
that imply the represented concept to a human reader.
Note:
• a concept representation has a code and one or more designations. If there is more than one designation of the same concept, these are synonyms of each other.
o In a code system that has synonyms, it is useful to have a “primary designation” assigned by the code system provider.
o This is helpful in maintenance, because if a change is needed then this can be done without needing to retire and re-author the whole concept; whereas if there is no primary designation, it is difficult to decide whether making a change to “one of the synonyms” means retiring and re-authoring the whole concept.
• a decode is generally used as the (primary) designation of a concept
Example
• MedDRA code – a meaningless identifier – “10040589” (Shoplifting)
• ISO (2 letter) country codes – mnemonic – GB = Great Britain
• In CDISC Vocab
o C16576 is the code for Female in CDISC Vocab CT
o F is the designation for Female
o Female might be another designation (it is a synonym of F, and should ideally be the primary designation as it is more human readable)
Recommended definition
Meaningless identifiers of a concept, which should ideally be linked with a designation
(or decode) which is human readable/meaningful
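The relationship between a code, its designations and the decode can be sketched as follows. The class and function names are hypothetical; the code C16576 and the designations come from the CDISC example above, and treating “Female” as the primary designation follows the suggestion in the text rather than the published terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptRepresentation:
    code: str                        # machine-processable identifier
    primary_designation: str         # preferred human-readable label
    synonyms: tuple[str, ...] = ()   # other designations of the same concept

    def designations(self) -> tuple[str, ...]:
        return (self.primary_designation,) + self.synonyms


female = ConceptRepresentation(code="C16576", primary_designation="Female", synonyms=("F",))


def decode(code: str, concepts: list[ConceptRepresentation]) -> str:
    """Return the primary designation (the decode) for a code."""
    for concept in concepts:
        if concept.code == code:
            return concept.primary_designation
    raise KeyError(code)
```

Because the primary designation is stored separately from the synonyms, a synonym can be changed without touching the code or the primary label, which is the maintenance benefit described in the note above.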
3.2.6 Code list
Synonym
Value set, Code system (e.g. ISO country code)
Definition & source
Description
Code lists within a database are implementations of a CT. The coded value is
operational and not necessarily part of the CT. For example, a code list 1=Male,
2=Female is the sponsor application of the CDISC terminology for SEX containing the
value list (Male, Female).
Example
Recommended definition
Do not use – not precise enough – use either code system or value set as appropriate
3.2.7 Value set
Synonym
Code list
Definition & source
[Source: ISO 21090]
that which represents a uniquely identifiable set of valid concept representations,
where any concept representation can be tested to determine whether or not it is a
member of the value set
Description
• A Value Set represents a uniquely identifiable set of valid concepts in context, i.e. bound to a specific data element.
• A value set draws from one or more code systems (see examples below).
The figure below shows the relationship between value set and code system.
[Figure: relationship between value set and code system. Elements shown, linked with cardinalities: class, dataElement, valueSetDEBinding, valueSet, valueSetDefinition, valueSet expansion, codedConcept, codeSystem.]
• Not all dataElements are coded with a value set; therefore the cardinality between a dataElement and a valueSetDEBinding is 0,1
• A valueSetDEBinding will usually only be to a single value set, but a valueSet can be bound to more than one dataElement if necessary; therefore the cardinality between valueSetDEBinding and valueSet is 1,n.
• A value set must always have a definition, so the cardinality between valueSet and valueSetDefinition is 1,1.
• A valueSet may relate to one or more codeSystems, so that cardinality should be 1,n.
• A valueSet contains 1,n coded concepts when you expand it from its definition, which can be sourced from one or more codeSystems. But each coded concept comes from only one codeSystem, so that cardinality should be 1,1.
Examples
• Example 1. A value set is needed to instantiate the data element “family pets”.
o codeSystem 1 = “Animals” (including “guinea pig”, “rabbit”, “hamster” etc.)
o codeSystem 2 = “Breeds of Dog” (“Poodle”, “Alsatian”, “Jack Russell” etc.)
o valueSet = “family pets” draws concepts from two code systems: “Animals” and “Breeds of Dog”.
• Example 2. In SDTM, LBTESTCD is a value set that can be extended. There are a number of LabTest concepts defined using the NCI Thesaurus code system. But if there is a lab test that you need that is not in NCI, you can add it using any other lab related terminology concept such as LOINC or SNOMED, so here again the value set is drawn from more than one code system.
• Example 3. Most SDTM value sets can be extended with sponsor-defined concepts (which need to be defined as part of the sponsor code system)
• Example 4. In SDTM, AESEV cannot be extended
Notes
• The Unique Meaning rule is important when a value set contains concepts from more than one code system. Its aim is to ensure that the value set does not contain identical concepts from two different code systems and that every concept has a single globally unique identifier. So a value set should not contain both the concept “C103812 CD19 Cell to Lymphocyte Ratio Measurement” from the NCI and the concept “8117-4 Cells.CD19/100 Cells” from LOINC, as they both represent the same real world thing.
• Inclusion of concepts in a value set must be properly governed, i.e. the added concepts must be defined and managed in a code system. Any organisation, and certainly a pharmaceutical company, needs to have properly governed code systems
o An organisation can build a code system by taking CDISC CT and potentially adding new concepts through a well-documented process
o However, by adding new concepts, an organisation diverges from the industry standards. It is therefore recommended
- to ask the SDO at the source of the code system whether the concept exists,
- if not, to request that the code be added, or
- if it exists but is not clear, to request an update
o to implement a well-documented governance process if the organisation wants to add its own concepts
• Extensibility: if a value set includes “others”, does it allow for extensibility? “Others” should therefore NOT be accepted as a concept in a code system/value set.
Example
• The value set for countries is the complete ISO 3166. The value set for LATAM countries is the subset of ISO 3166 for the South American countries
• The value set for the variable SEX in CDISC is identified by C66731 and is composed of F (for Female), M (for Male), U (for Unknown), UN (for undifferentiated)
Recommended definition
• A Value Set represents a uniquely identifiable set of valid concepts in context, i.e. bound to a specific data element.
• It is not recommended to extend a value set; if there is a real need, this should be done under a well-defined governance process.
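The membership test in the ISO 21090 definition, and the fact that one value set may draw on several code systems, can be sketched in Python. Everything named here is an assumption for illustration: the class names, the value set identifier and the codes A1, A2 and D1 are invented; only the designations come from the “family pets” example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodedConcept:
    code_system: str   # each coded concept comes from exactly one code system
    code: str
    designation: str


@dataclass
class ValueSet:
    identifier: str
    definition: str    # a value set always has a definition
    concepts: set[CodedConcept] = field(default_factory=set)

    def __contains__(self, item: tuple[str, str]) -> bool:
        # Membership is tested on the pair (code system, code).
        return any((c.code_system, c.code) == item for c in self.concepts)

    def code_systems(self) -> set[str]:
        return {c.code_system for c in self.concepts}


family_pets = ValueSet("VS-FAMILY-PETS", "Pets kept by a family", {
    CodedConcept("Animals", "A1", "rabbit"),
    CodedConcept("Animals", "A2", "hamster"),
    CodedConcept("Breeds of Dog", "D1", "Poodle"),
})
```

Testing membership on the pair of code system and code, rather than on the code alone, keeps concepts from different code systems distinguishable inside one value set.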
3.3 Master data management
[Figure: Structural metadata (definition) defines both master data and reference data (instances). Master data: content is transactional data; source is a transactional application within an enterprise. Reference data: content is a code system; source is a Standard Development Organization outside an enterprise.]
3.3.1 Master Data
Synonym
Master Reference Data
Definition & source
http://en.wikipedia.org/wiki/Master_data
• Master Data is a single source of basic business data used across multiple
systems, applications, and/or processes.
Master data is information that is key to the operation of a business. … can include
reference data. This key business information may include data about customers,
products, employees, materials, suppliers, and the like. ... Because master data may
not be stored and referenced centrally, but is often used by several functional groups
and stored in different data systems across an organization, master data may be
duplicated and inconsistent (and if so, inaccurate).
Thus Master Data is that persistent, non-transactional data that defines a business
entity for which there is, or should be, an agreed-upon view across the organization.
[Gartner – Magic Quadrant for Master Data Management of Customer Data Solution]
http://www.gartner.com/technology/reprints.do?id=1-1CK9UDO&ct=121019&st=sb
Master data is the consistent and uniform set of identifiers and extended attributes
that describes the core entities of the enterprise, such as customers, prospects,
citizens, suppliers, sites, hierarchies and chart of accounts.
Description
• Master Data are objects that must be manipulated across different systems and therefore need to have a consistent meaning and definition to ensure they can be uniquely identified across these systems
o Master data is produced within a transactional system (the “master system”) as part of a transaction and is used for reference and validation in transactions within other systems.
o Master data are defined by a set of attributes (see example below) that support unique identification of the object and/or additional information for use across different systems
• Master Reference Data = Master Data + Reference Data (consumed in the same way); see below for the definition of reference data
• Master Data, like any other data, are defined with structural metadata
• Master Data are categorized in dimensions referred to with a unique identifier.
o In marketing, a typical master data dimension is customer
o In clinical research, the following are considered master data dimensions: drug product, device, study, site, investigator, staff, sponsor.
o Visit and Subject are not master data dimensions because they would not be persisted in an independent repository, but there should be an agreement on how to uniquely identify them within a specific trial
• Master Data are persisted in a specific repository, either centralized or virtual, integrating data from different systems. Master data repositories are generally implemented per dimension, so there would be a study master data repository, an investigator DB, a product registry, …
Dimension (with key identifier in SDTM) | Identifying attributes (recommendation – not normative)
Drug Product (Investigational and Comparator) | ID (IMP_ID ISO11615 or MPD_ID); Product name; (set of) Active Ingredient; Dose Form; Strength; Administration device
Device | Device name; Unique Device Identifier (in CDISC: UDEVID); Type; Manufacturer; Model; Batch Identifier; Lot Identifier; Serial
Study (STUDYID) | Sponsor; Study Name; Protocol ID; StudyID if different than ProtocolID; Protocol Title; Product (DrugProductID or DeviceID); Registered trial id (CT.gov or EUDRACT); (Protocol Short Title)
Site (SITEID) | Name (centre); SiteID; Phone, fax; Complete Postal Address (country, zip code, town, street.. see ISO 21090 address); Site Type (hospital, clinic, pharmacy, …)
Investigator (INVID) (member of a clinical organisation which is a key staff member and treated separately and stored in an Investigator DB) | InvestigatorID; Name; Phone, fax; Email; Complete Postal Address (ISO 21090 address)
Staff (internal to a sponsor or a CRO or a hospital) | Name; DateOfBirth; Phone, fax; Email; Initials/username; Complete Postal Address (ISO 21090 address)
Sponsor (sponsorID) | SponsorID (like Dun&Bradstreet unique number or OID); Name; Phone, fax; Postal Address
Example
• Site identification information such as: Site ID, Site Name, Site Address, …
• Investigator identification attributes
• The picture below gives an example of investigator master data in different Health Care systems: how they differ and how they need to be integrated within a centralized repository (line at the bottom) to ensure that the SAME investigator described DIFFERENTLY in different systems is referred to and used in the same way across systems.
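The integration described in the last example can be sketched as a cross-reference from system-specific identifiers to one master record. This is a hypothetical illustration: the system names, identifiers and investigator details are invented, and a real master data repository would involve matching and governance steps not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvestigatorMaster:
    investigator_id: str   # master identifier shared by all systems
    name: str
    email: str


# Local identifiers per source system, mapped to the master identifier.
CROSS_REFERENCE = {
    ("CTMS", "INV-0042"): "M-001",
    ("EDC", "jsmith"): "M-001",
}

MASTER = {"M-001": InvestigatorMaster("M-001", "J. Smith", "j.smith@example.org")}


def lookup(system: str, local_id: str) -> InvestigatorMaster:
    """Resolve a system-specific identifier to the single master record."""
    return MASTER[CROSS_REFERENCE[(system, local_id)]]
```

Both source systems resolve to the same object, so the same investigator is referred to in the same way regardless of how each system describes them.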
Recommended definition
Master Data is a single source of basic business data used across multiple systems,
applications, and/or processes.
It is an object including several attributes supporting unique identification and use
across multiple systems.
Master Data are categorized by dimensions and persisted in a specific repository.
3.3.2 (Master) Reference Data
Synonym
Code System (for frequently used concepts such as country code)
Reference Data
Definition & source
http://en.wikipedia.org/wiki/Master_data
Reference Data is the set of permissible values to be used by other (master or
transaction) data fields. Reference data normally changes slowly, reflecting changes in
the modes of operation of the business, rather than changing in the normal course of
business.
http://en.wikipedia.org/wiki/Reference_data
Reference data are data from outside the organization (often from standards
organizations) which is, apart from occasional revisions, static. This non-dynamic data
is sometimes also known as "standing data".[1] Examples would be currency codes,
Countries (in this case covered by a global standard ISO 3166-1) etc. Reference data
should be distinguished [2] from "Master Data" which is also relatively static data but
originating from within the organization e.g. products, departments, even customers.
http://www.information-management.com/issues/20060401/1051002-1.html#Login
Reference data is any kind of data that is used solely to categorize other data found in
a database, or solely for relating data in a database to information beyond the
boundaries of the enterprise. Specific differences between reference and master data.
Identification is a major difference between reference and master data.
• In master data, the same entity instance, such as a product or customer, can be
known by different names or IDs. For example, a product typically follows a
lifecycle from a concept to a laboratory project to a prototype to a production
run. In each of these phases, the name of the product may change, and so may its
product identifier. Beyond products, we are all aware that customers can change
their names, or have identical names, ...
• By contrast, reference data typically has much less of a problem with
identification. This is partly because reference data changes more slowly. Existing
issues tend to revolve around the use of acronyms as codes. Reference data, such
as product line, gender, country or customer type, often consists of a code, a
description and little else. The code is usually an acronym, which is actually very
useful, because acronyms can be used in system outputs, even views of data, and
still be recognizable to users.
Description
• Master Reference Data is a code system (see definition above) that is widely used
across many different systems and needs to be used consistently to ensure data
integration. For instance, the COUNTRY code is used across many applications; it is
Master Reference Data.
• Master Reference Data are managed like any other code system, with strict
governance. They therefore change less often than Master Data, which are
generated by transactional systems.
• The term "reference data" is widely used in different contexts, and it is therefore
suggested to use it in its full form, i.e. "Master Reference Data".
Other uses of the term "reference data" include:
- Clinical reference data: in SDTM this means data that is not subject-level
specific, for instance Trial Summary domain data
- Reference range for laboratory values
Example
Country Code
Recommended definition
It is recommended to use "Master Reference Data" and not "Reference Data".
In the context of Master Data Management, Master Reference Data is the set of
codes, from a widely accepted and used code system, to be used within data fields
ACROSS different applications.
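As an illustration of the recommended definition, the sketch below shows two applications validating the same data field against one shared set of codes. It is a minimal sketch, not part of the original document: the `COUNTRY_CODES` excerpt, the `validate_country` helper and the two application records are hypothetical names introduced only for this example.

```python
# Hypothetical shared code system: a tiny excerpt of ISO 3166-1 alpha-2 codes.
COUNTRY_CODES = {
    "BE": "Belgium",
    "FR": "France",
    "US": "United States of America",
}


def validate_country(code: str) -> str:
    """Return the normalized code if it belongs to the shared code system."""
    normalized = code.strip().upper()
    if normalized not in COUNTRY_CODES:
        raise ValueError(f"{code!r} is not a permitted country code")
    return normalized


# Two hypothetical applications store the same field and validate it the same way,
# so the stored values agree even though the raw inputs differ.
edc_record = {"site": "S001", "country": validate_country("be")}
ctms_record = {"site": "S001", "country": validate_country(" BE ")}
print(edc_record["country"] == ctms_record["country"])  # True
```

Because both applications draw on the same set of permissible values, a value that is valid in one is valid, and spelled identically, in the other.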
3.3.3
Master Data Management
Synonym
Reference Data Management; MDM
Definition & source
[Gartner – Magic Quadrant for Master Data Management of Customer Data Solution]
http://www.gartner.com/technology/reprints.do?id=1-1CK9UDO&ct=121019&st=sb
MDM is a technology-enabled discipline in which business and IT work together to
ensure the uniformity, accuracy, stewardship, semantic consistency and accountability
of the enterprise's official, shared master data assets.
[Source: Master Data Management]
Master Data Management (MDM) is the collective application of governance, business
processes, policies, standards and tools that facilitate consistency in data definition.
The idea of Master Data focuses on providing unobstructed access to a consistent
representation of shared information [Source: SAS White Paper on Supporting Your
Information Strategy with a Phased Approach to Master Data Management]
Description
Master Data Management (MDM) comprises a set of processes and tools that
consistently define and manage the master data and master reference data of an
enterprise, which are fundamental to the company's business operations.
MDM has the objective of providing processes and tools for collecting, aggregating,
matching, consolidating, quality-assuring, persisting and distributing such data
throughout an organization to ensure consistency and control in the ongoing
maintenance and application use of this information.
Example
There are different models for master data management; the two main extremes are:
• Centralized model: all data are managed within a central data store
and pushed to the different applications within an organization.
• Decentralized model (registry): the master data are managed within
each application but then reconciled through a registry system that federates them.
Specific products are available from vendors such as INFORMATICA, IBM, Software AG, …
Recommended definition
Set of processes and tools needed for the deployment of master data and master
reference data within an organization.
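The decentralized (registry) model mentioned in the example can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `Registry` class, the system names and all identifiers are hypothetical and do not describe any vendor product. Each application keeps its own record and identifier, and the registry only stores the cross-reference to one master identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical registry: maps (system, local_id) pairs to one master ID."""

    links: dict = field(default_factory=dict)

    def register(self, system: str, local_id: str, master_id: str) -> None:
        # The applications keep their own records; only the link is stored here.
        self.links[(system, local_id)] = master_id

    def master_id(self, system: str, local_id: str) -> str:
        return self.links[(system, local_id)]

    def same_entity(self, a: tuple, b: tuple) -> bool:
        # Two local records refer to the same entity if they share a master ID.
        return self.links[a] == self.links[b]


registry = Registry()
# The same investigator is known under different local IDs in two systems.
registry.register("CTMS", "INV-0042", "MD-1")
registry.register("EDC", "dr.j.smith", "MD-1")
registry.register("EDC", "dr.a.jones", "MD-2")

print(registry.same_entity(("CTMS", "INV-0042"), ("EDC", "dr.j.smith")))  # True
```

In a centralized model, by contrast, the master record itself would live in the central store and be pushed to the applications, rather than being reconciled after the fact.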
3.4
Interoperability
Categorization of Interoperability (by HL7)
Synonym
Interworking; to be interoperable; interoperate
Definition & source
• ISO 11179: interoperability concerning the creation, meaning, computation, use,
transfer, and exchange of data [ISO/IEC 20944-1]
• ISO 11179: capability to communicate, execute programs, or transfer data among
various functional units in a manner that requires the user to have little or no
knowledge of the unique characteristics of those units [ISO/IEC 2382-1]
• IEEE: ability of two or more systems or components to exchange information and to
use the information that has been exchanged.
(Source: http://www.ieee.org/education_careers/education/standards/standards_glossary.html)
• Interoperability describes the extent to which systems and devices can exchange
data, and interpret that shared data. For two systems to be interoperable, they
must be able to exchange data and subsequently present that data such that it can
be understood by a user.
(Source: http://www.himss.org/library/interoperability-standards/what-is)
Description
Interoperability provides means to share information between disparate information
systems in such a way that the information can be used in a meaningful way
Example
Interoperability between healthcare and clinical research
Recommended definition
Ability of two or more functional units, systems or components to exchange
information (technical interoperability) and to use the information that has been
exchanged (semantic interoperability).
3.4.1
Technical interoperability (“machine interoperability”)
Synonym
Machine Interoperability; Syntactic Interoperability; Functional Interoperability
Definition & source
Technical Interoperability: The focus of technical interoperability is on the conveyance
of data, not on its meaning. Technical interoperability encompasses the transmission
and reception of information that can be used by a person but which cannot be further
processed into semantic equivalents by software. Note that mathematical operations
can be -- and frequently are -- performed at the level of technical interoperability. A
good example is the use of a “check digit” to determine the integrity of a specific unit
of transmitted or keyed-in data. The same mathematical formula is performed at each
end of a transaction and the results compared to assure that the data was successfully
transmitted.
Technical interoperability moves data from system A to system B.
(Source: Coming to Terms: Scoping Interoperability for Health Care, HL7 EHR
Interoperability WG)
Description
Technical Interoperability is usually associated with hardware/software
components, systems and platforms that enable machine-to-machine communication
to take place. This kind of interoperability is often centered on (communication)
protocols and the infrastructure needed for those protocols to operate.
Technical/syntactical interoperability is usually associated with data formats. Certainly,
the messages transferred by communication protocols need to have a well-defined
syntax and encoding, even if it is only in the form of bit-tables.
Example
TCP/IP, XML, HTTPS, S/MIME, Web services
Recommended definition
Technical interoperability is about exchanging information between systems without
explicit guarantee of shared meaning.
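The "check digit" mentioned in the quoted definition can be illustrated with a short sketch. The source does not name a particular formula; the Luhn algorithm is used here purely as an assumed, well-known example, and the function names are hypothetical. The same computation is run by the sender and by the receiver, and the results are compared, without either side knowing what the digits mean.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk the payload from the right; double every second digit,
    # starting with the rightmost payload digit.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def send(payload: str) -> str:
    """Sender side: append the check digit to the payload."""
    return payload + str(luhn_check_digit(payload))


def received_intact(message: str) -> bool:
    """Receiver side: recompute the check digit and compare it with the one sent."""
    payload, sent_digit = message[:-1], int(message[-1])
    return luhn_check_digit(payload) == sent_digit


message = send("7992739871")
print(message)                         # 79927398713
print(received_intact(message))        # True
print(received_intact("79927398723"))  # False: one digit was altered in transit
```

The check confirms that the data arrived unchanged, which is a matter of conveyance only; it says nothing about whether the receiving system interprets the number as the sender intended.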
3.4.2
Semantic interoperability
Synonym
Definition & source
Semantic Interoperability: To maximize the usefulness of shared information and to
apply applications like intelligent decision support systems, a higher level of
interoperability is required. This is called semantic interoperability which has been
defined as the ability of information shared by systems to be understood… so that
non-numeric data can be processed by the receiving system. Semantic interoperability
is a multi-level concept with the degree of semantic interoperability dependent on the
level of agreement on data content terminology and the content of archetypes and
templates used by the sending and receiving systems.
Semantic Interoperability ensures that system A and system B understand the data in
the same way.
(Source: Coming to Terms: Scoping Interoperability for Health Care, HL7 EHR
Interoperability WG)
Description
Semantic Interoperability is associated with the meaning of content and machine
interpretation of it. Thus, interoperability on this level means that there is a common
understanding between people and machine of the meaning of the content
(information) being exchanged.
To achieve semantic interoperability across computer systems, we need proper
definition of data through metadata, controlled terminologies and master data.
Example
1. CDISC SDTM, with full compliance with future SDTM implementation guidelines,
would support the goal of semantic interoperability (SI)
2. CDISC SHARE is used to improve the definitions within CDISC SDTM
3. Physical implementation of BRIDG (ex: Janus CTR)
4. CDISC ADaM datasets are well defined and approved, but currently
there is no officially defined SI (ex: when linking ADaM datasets with output, SI is
achieved by creating layers that are sponsor-defined)
5. Note: add a counter example of lack of SI: CDISC ADaM (For Tim to add an
example)
Recommended definition
Semantic Interoperability is about ensuring that information exchanged between
systems is understood and appropriately used.
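A minimal sketch of the idea, assuming two hypothetical systems that encode the same concept with different local codes: each system maps its local values onto an agreed controlled terminology before exchange, so the receiver can process the value rather than merely display it. The terminology excerpt, the local code lists and the `to_shared` helper are illustrative assumptions, not content of a published standard.

```python
# Hypothetical excerpt of an agreed controlled terminology for a SEX data element.
SHARED_TERMS = {"M": "Male", "F": "Female"}

# Hypothetical local encodings used by two different systems.
SYSTEM_A_TO_SHARED = {"1": "M", "2": "F"}
SYSTEM_B_TO_SHARED = {"male": "M", "female": "F"}


def to_shared(local_value: str, mapping: dict) -> str:
    """Translate a local code into the shared terminology, or fail loudly."""
    shared = mapping.get(local_value)
    if shared not in SHARED_TERMS:
        raise ValueError(f"no shared term for local value {local_value!r}")
    return shared


# Sending the raw value "1" from system A to system B is technically possible,
# but system B cannot interpret it: "1" is not one of its own local codes.
print("1" in SYSTEM_B_TO_SHARED)  # False

# After mapping, both systems agree on the meaning of the exchanged value.
from_a = to_shared("1", SYSTEM_A_TO_SHARED)
from_b = to_shared("male", SYSTEM_B_TO_SHARED)
print(from_a == from_b, SHARED_TERMS[from_a])  # True Male
```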
3.4.3
Process Interoperability
Synonym
Organizational Interoperability
Definition & source
Process Interoperability: Process interoperability is an emerging concept that has
been identified as a requirement for successful system implementation into actual
work settings. [1]
Process interoperability coordinates work processes, enabling the business processes
at the organizations that house system A and system B to work together. Process
interoperability is achieved when human beings share a common understanding, so
that business systems interoperate and work processes are coordinated. [2]
Organizational Interoperability: the ability of organizations to effectively communicate
and transfer (meaningful) data (information) even though they may be using a variety
of different information systems over widely different infrastructures, possibly across
different geographic regions and cultures. [3]
Sources:
1. Coming to Terms: Scoping Interoperability for Health Care, HL7 EHR Interoperability
WG
2. Principles of Health Interoperability HL7 and SNOMED (Health Information
Technology Standards), author: Tim Benson, April 2012
3. EU Interoperability Framework (EIF)
Description
Process interoperability deals primarily with methods for the optimal integration of
computer systems into actual work settings and includes the following:
• Explicit user role specification
• Useful, friendly, and efficient human-machine interface
• Data presentation/flow supports work setting
• Engineered work design
• Proven effectiveness in actual use
Example
Getting married would alter taxation data. The process of adjusting the marital
status triggers a process of adjusting the required taxation items via technical and
semantic interoperability.
Healthcare providers must standardize business rules to ensure that health
information is recorded in a uniform and timely manner such that the transfer of
information between systems is consistent and complete.
ICH Good Clinical Practice (GCP), an ethical and scientific quality standard for
designing, conducting, recording, and reporting trials that involve the participation of
human subjects.
Maintaining/conveying information such as user roles between systems.
Recommended definition
The mechanisms by which the integrity of workflow processes can be maintained
between systems.
3.5
Data aggregation, integration, pooling
3.5.1
Data pooling
Synonym
data integration (= data pooling + transformation)
Definition & source
http://english.stackexchange.com/questions/44643/meaning-of-data-pooling
Data Pooling:
In the more general case, we pool our resources so that collectively we make better
use of them. In the computing sense, data pool can be slightly misleading, because it
often just means a centralised database. Strictly speaking, it ought to mean an
arrangement whereby multiple distributed data servers store "their own" data
locally but provide access to that data across the entire network. In practice, it's a
buzzword that's often used loosely.
http://en.wikipedia.org/wiki/Data_pool
A data pool is a centralized database, where all necessary information to perform
business transactions between trading partners is stored in a standardized way.
Description
To use data from different sources we need 3 things:
• Pooling, i.e. pulling together different kinds of data from different sources to give a
holistic representation of what was observed. Different data sources are
combined into one central (virtual or physical) location in the format they were
originally collected. This is data pooling.
• Transformations, i.e. mappings to restructure the data format into a common
standardized format, but leave the data itself unchanged. This is often needed since
the format in which the data is collected differs across systems.
- This is data integration.
- Pooling and integration/transformation may happen together (i.e. data are
transformed while they are combined), which leads to confusion.
• Derivations, i.e. use of mathematical or logical algorithms to change or to create
new data values or flags. Derivations also include imputations for missing data to
facilitate statistical analysis and inference. This is data aggregation.
Example
• Data pooling from different data collection instruments (EDC, LAB, ECG, MI, ...)
before generation of the SDTM data sets
• Pooling clinical trial data in order to identify rare and uncommon safety signals
• Pooling (for Integrated Safety reports) consists of adding the numbers of events
observed in a given treatment group across the trials and dividing the result by
the total number of patients included in this group
Recommended definition
Data pooling is pulling together data from different sources and combining them into
one central (virtual or physical) location without transformation.
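The sketch below illustrates two points from this section under stated assumptions: pooling keeps every record in the format in which it was collected, and the pooled rate used for integrated safety reporting is the sum of events divided by the sum of patients across trials. All record layouts, field names and numbers are hypothetical.

```python
def pool_sources(sources: dict) -> list:
    """Combine records from several sources into one list, leaving each record
    in its original format and only tagging it with the source it came from."""
    return [
        {"source": name, "record": record}
        for name, records in sources.items()
        for record in records
    ]


def pooled_event_rate(trials: list) -> float:
    """Events summed across trials, divided by patients summed across trials."""
    total_events = sum(trial["events"] for trial in trials)
    total_patients = sum(trial["patients"] for trial in trials)
    return total_events / total_patients


# Hypothetical records from two collection instruments, with different layouts.
pool = pool_sources({
    "EDC": [{"SUBJ": "001", "VISIT": "1"}],
    "LAB": [{"patient": "001", "test": "ALT", "value": 32}],
})
print(len(pool))  # 2

# Hypothetical counts for one treatment group in two trials.
trials = [{"events": 3, "patients": 100}, {"events": 5, "patients": 300}]
print(pooled_event_rate(trials))  # 0.02
```

Note that the pooled records still use different field names for the same subject identifier; resolving that is the transformation step, not pooling.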
3.5.2
Data integration
Synonym
Definition & source
http://en.wikipedia.org/wiki/Data_integration
Data integration involves combining data residing in different sources and providing
users with a unified view of these data.
IBM: http://www-01.ibm.com/software/data/integration/
Data integration is the combination of technical and business processes used to
combine data from disparate sources into meaningful and valuable information. A
complete data integration solution encompasses discovery, cleansing, monitoring,
transforming and delivery of data from a variety of sources.
Data integration involves combining data residing in different sources and providing
users with a unified view of these data. (Source: Data Integration: A Theoretical
Perspective)
Others
Condition of an information system in which each data item needs to be recorded,
changed, deleted, or otherwise edited just once, even if it is used in several application
systems. (source: Medical Data Management, A practical Guide)
Data integration means anything from two systems passing data back and forth
(loosely coupled) to a shared data environment in which all data elements are unique
and non-redundant and are reused by multiple applications (tightly coupled)
(Source: Data Strategy)
Description
Data integration is the act of transforming data, i.e. mapping it toward a common
standardized format. It allows users to have a unified view of data that are coming
from different applications.
Data integration can be the result of:
• Data pooling and then transformation
• Data transformation and then pooling
Example
1. The SDTM data sets for a clinical trial are an "integrated data set" resulting from the
pooling and transformation of data from different source systems
2. A Web application integrating data from various sources
3. Integration of clinical research data and metadata
Recommended definition
Data integration is the result of transforming data into a common format within a
central (virtual or physical) location, maintaining integrity and non-redundancy.
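As a rough illustration of the transformation step, the sketch below maps records from two hypothetical sources onto one common set of field names while leaving the values untouched. The `FIELD_MAPS` table, the source names and the target field names are assumptions made for this example; it does not address de-duplication, which a real integration would also need in order to maintain non-redundancy.

```python
# Hypothetical mapping from each source's field names to a common format.
FIELD_MAPS = {
    "EDC": {"SUBJ": "subject_id", "VISIT": "visit"},
    "LAB": {"patient": "subject_id", "test": "test_code", "value": "result"},
}


def integrate(source: str, record: dict) -> dict:
    """Rename the fields of one record to the common format; values are unchanged."""
    mapping = FIELD_MAPS[source]
    return {mapping[name]: value for name, value in record.items()}


# Hypothetical pooled records, still in their original formats.
pooled = [
    ("EDC", {"SUBJ": "001", "VISIT": "1"}),
    ("LAB", {"patient": "001", "test": "ALT", "value": 32}),
]

integrated = [integrate(source, record) for source, record in pooled]
for row in integrated:
    print(row)
# {'subject_id': '001', 'visit': '1'}
# {'subject_id': '001', 'test_code': 'ALT', 'result': 32}
```

After the mapping, both records expose the subject identifier under the same name, which is what gives users a unified view across sources.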
3.5.3
Data aggregation
Synonym
Definition & source
http://en.wikipedia.org/wiki/Aggregate_data
In statistics, aggregate data describes data combined from several measurements.
When data are aggregated, groups of observations are replaced with summary
statistics based on those observations.[1]
http://searchsqlserver.techtarget.com/definition/data-aggregation
Data aggregation is any process in which information is gathered and expressed in a
summary form, for purposes such as statistical analysis. A common aggregation
purpose is to get more information about particular groups based on specific variables
such as age, profession, or income. The information about such groups can then be
used for Web site personalization to choose content and advertising likely to appeal
to an individual belonging to one or more groups for which data has been collected.
For example, a site that sells music CDs might advertise certain CDs based on the age
of the user and the data aggregate for their age group. Online analytic processing
(OLAP) is a simple type of data aggregation in which the marketer uses an online
reporting mechanism to process the information.
Description
Differences between data integration and aggregation:
• Data integration: transforming data into a common format within a central
(virtual or physical) location, maintaining integrity and non-redundancy.
• Data aggregation: a summary of the integrated data (ex: by age group, race, …)
Example
• ADaM is an aggregated data set
• A Clinical Trial Management System that aggregates all the clinical
trial's information
• NIH Biomedical Translational Research Information System (BTRIS): a
clinical research data repository for aggregation and re-use of data
collected at the NIH
Recommended definition
Data aggregation is any process in which information is gathered and expressed in a
summary form, for purposes such as statistical analysis.
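A minimal sketch of aggregation as defined here, assuming a small set of already integrated records: observations are grouped by age group and each group is replaced by summary statistics. The records, the age cut-off of 65 and the field names are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical integrated records (one row per subject).
records = [
    {"subject_id": "001", "age": 34, "alt": 30.0},
    {"subject_id": "002", "age": 41, "alt": 50.0},
    {"subject_id": "003", "age": 67, "alt": 40.0},
    {"subject_id": "004", "age": 72, "alt": 60.0},
]


def age_group(age: int) -> str:
    """Assign a subject to one of two illustrative age groups."""
    return "<65" if age < 65 else ">=65"


def aggregate(rows: list) -> dict:
    """Replace each group of observations with a count and a mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[age_group(row["age"])].append(row["alt"])
    return {
        group: {"n": len(values), "mean_alt": mean(values)}
        for group, values in groups.items()
    }


summary = aggregate(records)
print(summary["<65"])   # {'n': 2, 'mean_alt': 40.0}
print(summary[">=65"])  # {'n': 2, 'mean_alt': 50.0}
```

The individual subject values are no longer visible in the result, which is what distinguishes aggregation from pooling and integration.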
4
Appendices
4.1
CDISC glossary
cdisc_glossaryterms_version7.1_final_2008.doc
4.2
Related Documents
Reference No. | Document Name | Filename
[FDA1] | Guidance for Industry. Providing Regulatory Submissions in Electronic Format - Standardized Study Data - DRAFT GUIDANCE. February 2012 | http://www.fda.gov/downloads/Drugs/Guidances/UCM292334.pdf
[CDISC1] | CDISC Glossary - 2009 | http://www.cdisc.org/stuff/contentmgr/files/0/08a36984bc61034baed3b019f3a87139/misc/act1211_011_043_gr_glossary.pdf
[ISO1] | ISO/IEC 11179 Metadata Registry (MDR) standard | Accessible on ISO site
[ISO2] | ISO 21090 Healthcare Data Type Standard | Accessible on ISO site (draft version available on Internet)
4.3
Working group members
Name | email
Isabelle de Zegher (co-chair) | Isabelle.dezegher@parexel.com
Mitra Rocca (co-chair) | Mitra.rocca@fda.hhs.gov
Marcelina Hungria (co-chair) | mhungria@dicoregroup.com
Yun Oldshue | yun.oldshue@takeda.com
Kenneth Stoltzfus | kenneth.m.stoltzfus@accenture.com
Julie James | julie_james@bluewaveinformatics.co.uk
Tim Church | tim.church@torch.uk.net
Gregory Steffens | Gregory.steffens@novartis.com
Praveen Garg | Praveen.Garg@iconplc.com
John Leveille | jleveille@d-wise.com
Aimee Basile | abasile@celgene.com
Sam Hume | shume@cdisc.org
5
Parking log of implementation
Maintenance/governance of code system – including the need for primary designation
6
General comments (to be taken out for final document)
Tim Church
Although the purpose of the document is to provide description and examples as well as agreed definitions, I find at
times the separation of 'Definition & source' and 'Description' in each section is a little confusing. While reading
through the document I found it most useful to start with the 'Recommended definition', then read the detailed
'Description' with 'Examples'. The 'Definition & source' is interesting but distracting and I wonder if we really need to
repeat direct quotes from the sources. Surely, if relevant these can be worked into the 'Description' with suitable
reference links.
3.1.1 Perhaps Recommended definition should explicitly explain that metadata is a generalised term.
3.1.2 Would it be clearer to remove basic description out of the Synonym section, and ensure it is
covered in the Description. The first reference in the Definition & source has already been listed in
3.1.1. This is another reason why I think it would be clearer to add references in the Description without
quoting in Definition & source. In the Description the first paragraph has quoted text but no reference
as to origin.
3.1.4 As there is no source in the Definition and source section then the information there could easily
be amalgamated with the Description. The Description contains Examples that should be moved to the
Examples section. We need to add something to the Recommended definition as this is the main
purpose of the document.
3.1.5 and 3.1.6 are indicated as synonyms of each other. As the Recommended definition in 3.1.5 covers
the use of both the terms then I wonder if we should use whatever is useful from the Definition &
source in 3.1.6 with the Definition of 3.1.5 and confine the references to the reference section. i.e. I'm
not sure that 3.1.6 section is justified. In line with section 3.2.1 the header could be Metadata
Repository/metadata registry.
3.1.11 VLM is not a synonym but an acronym. Move examples into correct section.
3.2.1 It might be an idea to repeat the Description text in the Recommended definition section. That
would make the document more friendly as a quick reference tool.
3.2.2 Would it be worth adding in the Synonym section that Dictionary is sometimes used, but
incorrectly? (similarly to 3.1.8)