Project: Metadata Management
Title: Metadata Definitions
Version: 0.1
Working Group: Emerging Technologies
Date: 22 April 2013

PhUse Emerging Technology Working Group
Metadata definitions

Table of Contents
1 INTRODUCTION
2 SCOPE
3 DEFINITIONS
  3.1 METADATA MANAGEMENT
    3.1.1 Metadata
    3.1.1 Structural metadata
    3.1.2 Operational metadata
    3.1.3 Data element
    3.1.4 Attribute
    3.1.5 Class
    3.1.6 Data type
    3.1.7 Metadata management
    3.1.8 Metadata repository
  3.2 MASTER DATA MANAGEMENT
    3.2.1 Master Data
    3.2.2 Master Data Management
    3.2.3 Master Reference Data
    3.2.4 Master Data Source System
    3.2.5 Reference Data
    3.2.6 Reference Data Management
  3.3 CONTROLLED TERMINOLOGY, CODE SYSTEMS & VALUE SETS
    3.3.1 Concept
    3.3.2 Code
    3.3.3 Code system
    3.3.4 Concept definition
    3.3.5 Concept designation
    3.3.6 Concept domain
    3.3.7 Concept identifier
    3.3.8 Concept representation
    3.3.9 Value set
  3.4 INTEROPERABILITY
    3.4.1 Interoperability
    3.4.2 Technical interoperability
    3.4.3 Semantic interoperability
  3.5 DATA AGGREGATION, INTEGRATION
    3.5.1 Data pooling
    3.5.2 Data aggregation
    3.5.3 Data integration
4 INPUT (DRAFT MATERIAL THAT CAN BE USED – TO BE DELETED IN FINAL DOCUMENT)
  4.1 METADATA MANAGEMENT
  4.2 MASTER DATA MANAGEMENT
  4.3 CONTROLLED TERMINOLOGY
  4.4 INTEROPERABILITY
  4.5 DATA AGGREGATION
5 REFERENCES & RELATED DOCUMENTS
6 APPENDICES
  6.1 CDISC GLOSSARY

1 INTRODUCTION: purpose of this document

This document provides agreed definitions for metadata management and related topics, for use across the industry. It is expected that these definitions will be re-used in FDA guidelines as agreed cross-industry definitions.

To be of operational value, the document contains not only definitions but also a short description and an example of use for each term.

Whenever possible, the definitions are built from existing definitions in FDA guidances, the CDISC glossary, and cross-industry sources (e.g. Gartner). A reference to the source definition is provided.

This document is not intended to be exhaustive. It is intended to clarify the most commonly used (and misused!)
definitions in our industry around metadata and master data management.

The CDISC glossary [CDISC1] (also attached as an appendix) is heavily used as a reference in this document. The reader is expected to be familiar with the abbreviations and acronyms contained in the CDISC glossary; these are not repeated here.

(1) Metadata, (2) Meta data, or (3) Meta-data

2 SCOPE

The following topic areas are in scope of this document:
• Metadata management: metadata (structural & operational), data elements, attributes, classes, ...
• Master data management: master data, reference data, master reference data
• Controlled terminology, code systems, value sets, permissible values
• Data pooling, data integration, data aggregation
• Interoperability, semantic interoperability

Definitions are grouped by topic area to make the document easier to read and navigate.

3 DEFINITIONS

3.1 Metadata management

3.1.1 Metadata

Acronym:
Definition & source: Metadata is often described as data about data. The term metadata, "data about data", is ambiguous and is used for two fundamentally different concepts (http://en.wikipedia.org/wiki/Metadata).
Structural metadata, the design and specification of data structures (e.g. format, semantics, ...), cannot be "data about data", because at design time the application contains no data. In this case the correct description would be "data/information about the containers of data".
Descriptive metadata, on the other hand, is about individual instances of application data, the data content (e.g. patient population for a specific study, audit trail). In this case, a useful description would be "data about data content" or "content about content".
Description: See structural metadata and operational (or descriptive) metadata.
Example: See structural metadata and operational metadata.
Normative definition: (do we want to do this?
suggest yes)

3.1.1 Structural metadata

Acronym:
Definition & source: [http://en.wikipedia.org/wiki/Metadata] The design and specification of data structures (e.g. format, semantics, ...) cannot be "data about data", because at design time the application contains no data. In this case the correct description would be "data/information about the containers of data".
[FDA1] Structural metadata is structured information that describes, explains, or otherwise makes it easier to retrieve, use, or manage data.
Description: Structural metadata is what most people mean by metadata. Structural metadata is said to "give meaning to data" or to put data "in context". Data about a library book, such as the author, the type of book, and the Library of Congress number, are structural metadata and were once maintained on index cards. SAS labels and formats are a rudimentary form of structural metadata, although they have not historically been referred to as metadata.
Example: The number 120 itself is meaningless without structural metadata such as
- The name of the variable (e.g. Systolic Blood Pressure) with its definition
- The unit related to this physical quantity (e.g. Systolic Blood Pressure Unit = mmHg)
The CDISC SDTM data standard is the structural metadata of all the data collected with that standard. For instance, the variable "Sex" is described by a set of structural metadata such as the label, the data type (char), the associated value sets (male, female, ...), the role in SDTM, ...
A data model, describing the classes, attributes, relationships and hierarchies, constitutes the structural metadata of the underlying database.
Normative definition: (do we want to do this?
suggest yes)

3.1.2 Operational metadata

Acronym:
Definition & source: [http://en.wikipedia.org/wiki/Metadata] The individual instances of application data, the data content. In this case, a useful description would be "data about data content" or "content about content".
Description: Descriptive metadata are also called operational metadata. The term is used in different contexts:
• Data operations and statistical analysis: additional content on the data that supports further analysis of the data. For instance, the patient population in the context of a clinical trial is operational metadata.
• Software implementation: all information needed to support data lineage and traceability, including data on origin, usage, ...
Example:
Study-related metadata: patient population, indication, therapeutic area
Software-related metadata:
o What is the source of the data, and in which system is it authored
o Who can use a piece of information, with which roles for access and action: who can edit it in which system, who has read access to it
o Which transformations happen to the data, how and when
o Audit trail: who accessed which information, and when
Normative definition: (do we want to do this? suggest yes)

3.1.3 Data element

Acronym: DE
Definition: [FDA1] A data element is the smallest (or atomic) piece of information that is useful for analysis (e.g., a systolic blood pressure measurement, a lab test result, a response to a question on a questionnaire).
[CDISC1]
1. For XML, an item of data provided in a mark-up mode to allow machine processing. [FDA - GL/IEEE]
2. Smallest unit of information in a transaction. [Center for Advancement of Clinical Research]
3. A structured item characterized by a stem and response options together with a history of usage that can be standardized for research purposes across studies conducted by and for NIH.
[NCI, caBIG]
NOTE: The mark-up or tagging facilitates document indexing, search and retrieval, and provides standard conventions for insertion of codes.
[ISO1] Unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes.
Description: A Data Element is the most elementary unit of data that cannot be further subdivided from a semantic point of view, as it is linked with a precise meaning. A data element has:
• An identification such as a data element name
• A clear definition / semantic description
• A data type
• Optional enumerated values (value sets)
• One or more representation terms (synonyms)
Synonyms: In the context of SDTM, a variable is equivalent to a Data Element. In the context of BRIDG, an attribute is equivalent to a Data Element.
Example: Birth Date is a Data Element
  DE name: BirthDate
  Definition: date and time at which the subject was born
  Data type: date (mm/dd/yyyy – hh/mm/ss – time zone)
  Value sets: not applicable
  Synonyms: BRTHDTC in CDISC SDTM, birthdate in BRIDG
Normative definition: (do we want to do this? suggest yes)

3.1.4 Attribute

3.1.5 Class

3.1.6 Data type

3.1.7 Metadata management

3.1.8 Metadata repository

3.2 Master data management

3.2.1 Master Data

Acronym:
Definition & source: [Gartner – Magic Quadrant for Master Data Management of Customer Data Solution] http://www.gartner.com/technology/reprints.do?id=1-1CK9UDO&ct=121019&st=sb
Master data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise, such as customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts.
Description:
Example:
Normative definition:

3.2.2 Master Data Management

Acronym:
Definition & source: [Gartner – Magic Quadrant for Master Data Management of Customer Data Solution] http://www.gartner.com/technology/reprints.do?id=1-1CK9UDO&ct=121019&st=sb
MDM is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official, shared master data assets.
Description:
Example:
Normative definition:

3.2.3 Master Reference Data

3.2.4 Master Data Source System

3.2.5 Reference Data

3.2.6 Reference Data Management

3.3 Controlled Terminology, code systems & value sets

3.3.1 Concept

3.3.2 Code

3.3.3 Code system

3.3.4 Concept definition

3.3.5 Concept designation

3.3.6 Concept domain

3.3.7 Concept identifier

3.3.8 Concept representation

3.3.9 Value set

3.4 Interoperability

3.4.1 Interoperability

3.4.2 Technical interoperability ("machine interoperability")

3.4.3 Semantic interoperability

3.5 Data aggregation, integration

3.5.1 Data pooling

3.5.2 Data aggregation

3.5.3 Data integration

4 INPUT (draft material that can be used – to be deleted in final document)

4.1 Metadata management

Term: attribute
Acronym:
Definition: Description of a property of an object. An attribute may be further described as a data element stored in a metadata repository and, in implementation, becomes one or more variables. For example: in BRIDG, raceCode is an attribute of class Person (i.e. Person.raceCode), and value is an attribute of DefinedObservationResult.
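
As an illustration only, the data element structure listed in section 3.1.3 (identification, definition, data type, optional value set, synonyms) and the attribute-to-variable relationship described above could be sketched in Python as follows. The class, field and method names (`DataElement`, `value_set`, `synonyms`, `variable_in`, `permits`) are hypothetical and are not taken from any standard; the Birth Date and Sex values come from the examples in this document, and the definition text given for Sex is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataElement:
    """Hypothetical record for one data element in a metadata repository."""
    name: str                                # identification
    definition: str                          # semantic description
    data_type: str                           # e.g. "date", "char"
    value_set: Optional[List[str]] = None    # enumerated values, if any
    # Synonyms: implementation context -> name used in that context
    synonyms: Dict[str, str] = field(default_factory=dict)

    def variable_in(self, context: str) -> Optional[str]:
        """Name this data element takes in a given context, or None if unknown."""
        return self.synonyms.get(context)

    def permits(self, value: str) -> bool:
        """True if the value is allowed; any value is allowed without a value set."""
        return self.value_set is None or value in self.value_set


# Birth Date example from section 3.1.3
birth_date = DataElement(
    name="BirthDate",
    definition="Date and time at which the subject was born",
    data_type="date",
    synonyms={"CDISC SDTM": "BRTHDTC", "BRIDG": "birthdate"},
)

# Sex example from the structural metadata section (definition text is assumed)
sex = DataElement(
    name="Sex",
    definition="Sex of the subject",
    data_type="char",
    value_set=["male", "female"],
)

print(birth_date.variable_in("CDISC SDTM"))
print(sex.permits("male"), sex.permits("120"))
```

The point of the sketch is that one data element definition can be shared while each implementation (SDTM variable, BRIDG attribute) only contributes its own name for it.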

Term: class
Acronym:
Definition: Set of Data Elements describing a logical "thing". A class has:
• An identifier such as a class name
• A clear object definition / semantic description
• One or more representation terms
• A list of DEs (also known as attributes)
• A list of related classes and a description of the relationship type(s)
• Any description, in addition to the DEs, that allows the object to be mapped to an application vertical

Term: Data Type
Acronym:
Definition: A data type is a classification identifying one of various types of data, such as real-valued, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.

Term: Metadata Management
Acronym: MEM
Definition: Metadata Management is a worldwide infrastructure composed of policies, procedures, standards, models, skills, tools and training needed to promote the shareability of data throughout the enterprise and to our customers.

Term: Meta Data Repository
Acronym: MDR
Definition: Repository composed of Descriptive Meta Data. Within the clinical research world, there are around 30,000 to 50,000 different data elements covering all potential data that can be collected for a patient.

4.2 Master data management

Term: Master Data
Acronym:
Definition: Master Data is business data that has a consistent meaning and definition to be shared across systems; this applies particularly to data such as site identification, investigator identification, and study identification. It is produced in a "master system" as part of a transaction and is used for reference and validation in transactions within other systems.
Master Data, like any other data, are defined with structural metadata.

Term: Master Data Management
Acronym: MDM
Definition: Master Data Management comprises a set of processes and tools that consistently define and manage the non-transactional data entities of an enterprise which are fundamental to the company's business operations (may include reference data). Master Data Management has the objective of providing processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such data throughout the enterprise to ensure consistency and control in the ongoing maintenance and application use of this data. This is sometimes known as Reference Data Management.

Term: Master Reference Data
Acronym:
Definition: A combination of Master Data and Reference Data. The governance of these two components is, however, quite different:
• Reference data are often defined by external organizations and are defined at design time; they are generally managed within a terminology server (or a metadata repository) as part of all the code systems.
• Master data are created during application run time through a transaction and are stored in the source system considered as the source of truth.

Term: Master Data Source System
Acronym:
Definition: Master Data Source System is the application that houses a master data "dimension" (or type of master data, such as site or investigator) for Perceptive Informatics. The system is available to all applications (operational and information provisioning, including the Data Warehouse) across the enterprise.
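
The governance difference between reference data and master data described in the Master Reference Data and Master Data Source System entries above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any real system: the names `COUNTRY_CODES`, `MasterDataSourceSystem`, `register_site` and `is_known_site`, as well as the sample codes and site identifiers, are all hypothetical.

```python
from types import MappingProxyType
from typing import Dict

# Reference data: fixed at design time, exposed here as a read-only mapping
# (hypothetical sample of an externally defined code system).
COUNTRY_CODES = MappingProxyType({"BE": "Belgium", "US": "United States"})


class MasterDataSourceSystem:
    """Hypothetical source of truth for one master data dimension (sites)."""

    def __init__(self) -> None:
        self._sites: Dict[str, Dict[str, str]] = {}

    def register_site(self, site_id: str, name: str, country_code: str) -> None:
        """Create a master data record at run time, as part of a transaction."""
        if country_code not in COUNTRY_CODES:
            raise ValueError(f"unknown country code: {country_code}")
        if site_id in self._sites:
            raise ValueError(f"site already registered: {site_id}")
        self._sites[site_id] = {"name": name, "country": country_code}

    def is_known_site(self, site_id: str) -> bool:
        """Let other systems validate a site identifier against the master."""
        return site_id in self._sites


sites = MasterDataSourceSystem()
sites.register_site("SITE-001", "Example Hospital", "BE")
print(sites.is_known_site("SITE-001"), sites.is_known_site("SITE-999"))
```

In this sketch the reference data never changes while the application runs, whereas master data records only come into existence through `register_site`, and other systems consult the source system rather than keeping their own copies.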

Term: Reference Data
Acronym:
Definition: In the context of Master Reference Data Management, this corresponds to the set of code systems that are commonly used across many different systems and attributes.

Term: Reference Data Management
Acronym:
Definition: Management of Reference Data.

4.3 Controlled terminology

Term: Concept
Acronym:
Definition: A concept is a "unit of thought" within a particular domain: a unitary or atomic mental representation of a real or abstract thing. Concepts, as abstract, language- and context-independent representations of meaning, are important for the design and interpretation of static information models. They constitute the smallest semantic entities [1] with which models are built. The authors and the readers of an information model use concepts and their relationships to build and understand the models.

Term: Code
Acronym:
Definition: A "code" is the machine-processable part of a concept representation, published by the author of a code system as part of the code system. It is the preferred unique machine-readable identifier for that concept in that code system and is used in the 'code' property of an ISO 21090 CD data type. Codes are sometimes meaningless identifiers, and sometimes they are mnemonics that imply the represented concept to a human reader; meaningless identifiers are advised, particularly in larger vocabulary systems.

Term: Code system
Acronym:
Definition: A Code System is a managed collection of concept representations, including codes and/or designations (human-readable text, or decode), but sometimes with more complex sets of rules, references (definitions), and relationships. Although such collections may variously be referred to as terminologies, vocabularies, coding schemes, or even classifications, the ISO 21090 CD data type considers all such collections "code systems".
A code system is typically created for a particular purpose. Code systems may consist of finite collections, such as concepts that represent individual countries, colours, or states, or they may represent broad and complex collections of concepts across a particular domain, e.g., SNOMED-CT, ICD, LOINC, and CPT. A code system should be uniquely identifiable; for ISO 21090-conformant uses, this identifier shall take the form of an ISO OID.

[1] As models are layered and developed, the size and description of the smallest semantic entity may change, to best meet the use case(s) and requirements, and to show different views on reality.

Term: Concept definition
Acronym:
Definition: A concept definition is the explanation of the meaning of the concept. The concept definition may be provided wholly by the concept designation, with or without additional text etc. (see concept representation), but particularly in large code systems that employ description logic or similar ontological functionality, the full definition of the concept may require knowledge of its relationship to other concepts within the code system.

Term: Concept designation
Acronym:
Definition: A concept designation is a language symbol for a concept that is intended to convey the concept meaning to a human being. A concept designation may also be known as an appellation, symbol, or term, the last being the most common synonym. A concept designation is typically used to populate the 'displayName' property of an ISO 21090 CD data type.

Term: Concept domain
Acronym:
Definition: A concept domain is a sentence or paragraph that defines the semantic space (the totality of meaning that can be expressed by the concepts that can be used) for the "thing" that a coded attribute in an information model is to encompass, plus examples of these "things".
For example: an information model class is "car" and the coded attribute is "manufacturer"; the concept domain is "The company that makes/markets the car to the general public; examples include General Motors, Ford Motor Company and Mercedes-Benz".

Term: Concept identifier
Acronym:
Definition: A concept identifier is a vocabulary object that unambiguously and globally uniquely represents a concept within the context of a code system in a machine-readable way. A concept identifier consists of: the OID for Code System + Code (+ Designation/Display name). To make a Concept Identifier human readable, the "display name" (the designation) is added thus: the OID for Code System + Code (+ Designation/Display name). The designation (display name) is not mandatory in the ISO 21090 concept identifier, but it is considered good terminology practice to always have the designation for safety reasons (data unscrambling etc.) [2].

[2] Debate as to whether the display name should be carried in a concept identifier continues. There is a significant group who feel that the display name should not be carried.

Term: Concept representation
Acronym:
Definition: A concept representation is a vocabulary object that enables the description and manipulation of a concept in systems and applications (such as information models, XML schema). A concept representation is minimally formed by putting together a code and a designation. However, a concept representation in a code system may also be augmented with additional text, annotations, references and other resources that serve to further identify and clarify what the concept is.

Term: Value set
Acronym:
Definition: A value set is a uniquely identifiable set of valid concept identifiers that instantiate a concept domain in use (in an application, an XML instance etc.)
where any concept identifier used can be tested to determine whether it is a member of the value set at a specific point in time. Value sets exist to instantiate the permissible content of a concept domain for a particular use: in an information model vocabulary binding, in analysis, in UI data collection (in a pick list or drop-down box), etc. A value set is useful only in the context of instantiation of an attribute in an information model, not as a stand-alone object (this is in contrast to a code system, which exists in its own right).

4.4 Interoperability

Term: Semantic Interoperability
Acronym:
Definition (FDA guidance): "Interoperability" means the ability to communicate and exchange data accurately, effectively, securely, and consistently with different information technology systems, software applications, and networks in various settings, and exchange data such that clinical or operational purpose and meaning of the data are preserved and unaltered.
Technical interoperability describes the lowest level of interoperability whereby two different systems or organizations exchange data so that the data are useful. There is nothing that defines how useful. The focus of technical interoperability is on the conveyance of data, not on its meaning. Technical interoperability supports the exchange of information that can be used by a person but not necessarily processed further. When applied to study data, a simple exchange of nonstandardized data using an agreed-upon file format for data exchange (e.g., SAS transport file) is an example of technical interoperability.
Semantic interoperability describes the ability of information shared by systems to be understood, so that nonnumeric data can be processed by the receiving system.
Semantic interoperability is a multi-level concept, with the degree of semantic interoperability dependent on the level of agreement on data content terminology and other factors. With greater degrees of semantic interoperability, less human manual processing is required, thereby decreasing errors and inefficiencies in data analysis. The use of controlled terminologies and consistently defined metadata supports semantic interoperability.
Process interoperability is an emerging concept that has been identified as a requirement for successful system implementation into actual work settings. Simply put, it involves the ability of a system to provide the right data to the right entity at the right point in a business process.

4.5 Data aggregation

5 REFERENCES & RELATED DOCUMENTS

Related Documents (Reference No., Document Name, Filename)

[FDA1] Guidance for Industry. Providing Regulatory Submissions in Electronic Format - Standardized Study Data - DRAFT GUIDANCE. February 2012. http://www.fda.gov/downloads/Drugs/Guidances/UCM292334.pdf
[CDISC1] CDISC Glossary - 2009. http://www.cdisc.org/stuff/contentmgr/files/0/08a36984bc61034baed3b019f3a87139/misc/act1211_011_043_gr_glossary.pdf
[ISO1] ISO/IEC 11179 Metadata Registry (MDR) standard. Accessible on ISO site.
[ISO2] ISO 21090 Healthcare Data Type Standard. Accessible on ISO site (draft version available on Internet).

Status    Name    Company    Date    Signature
Author
Author
Author
Author

6 Appendices

6.1 CDISC glossary

cdisc_glossaryterms_version7.1_final_2008.doc