CDISC Study Data Tabulation Model: Conformance Rules Development Guide Version 1.0 Prepared by the CDISC SDS Sub-team for SDTM Conformance Rules 3-16-2015 SDTM Conformance Rules Development Guide v1.0 Contents Introduction ......................................................................................................................... 1 Purpose................................................................................................................................ 1 Specification for the Rules Metadata Model ...................................................................... 2 Assumptions for the Rules Metadata Model ....................................................................... 4 Examples for the Rules Metadata Model ............................................................................ 5 Change History: .................................................................................................................. 6 SDS SDTM Conformance Rules Sub-team Members ........................................................ 8 Introduction The CDISC Study Data Tabulation Model (SDTM) guides the organization, structure, and format of standard clinical trial tabulation datasets submitted to a regulatory authority. This document recommends standard practices for use in formulating and documenting compliance and conformance rules as a supplement to the STDM and SDTM Implementation Guide (SDTMIG). Purpose The purpose of this guide is to provide instructions and a standard, concise structure for identifying and classifying SDTM and SDTMIG text that may constitute a rule definition. The structure for the rules, the Rules Metadata Model, and the conventions for its content are described in detail. Users of this guide are aided to both understand rules in approved rules catalogs that implement this guidance and to author their own rules if the need arises. 1 SDTM Conformance Rules Development Guide v1.0 Specification for the Rules Metadata Model The following table provides a description of the columns in the SDTM Rules metadata: Metadata Rule ID Class Description Identifier used to uniquely identify a rule. The ID is nondescriptive text string comprised of a simple prefix ‘CG’ with a 4 digit numeric suffix. The prefix is an abbreviation for ‘CDISC Guidance’ and is intended to denote the origin of the rule as being authored by a body sanctioned by the CDISC organization. The numeric suffix is a unique sequential counter starting from ‘0001’ that does imply any rule precedence or ranking based on a position in this sequential series. Observation Class scope for rule using 3 character abbreviation: Domain Variable Req Yes Yes ALL – All observation classes SPC – Special Purpose Class FND– Findings Class EVT – Events Class INT – Interventions Class FNA – Findings About TDM – Trial Design Domains When a rule applies to multiple classes each class is listed separated by commas. If the list of included Classes is extensive then the use of the NOT() token (where the list of excluded Classes is listed inside the parentheses e.g. NOT(FND)) is preferred. If ALL is used in the Rule ID then ALL should be the only value in this column. Domain scope for rule using standard CDISC domain abbreviations, exceptions ALL, NOT(). When a rule applies to multiple domains each domain is listed separated by commas. If the list of included domains is extensive then the use of the NOT() token (where the list of excluded domains is listed inside the parentheses e.g. NOT(SC,EG,PE)) is preferred. If ALL is used in the Rule ID then ALL should be the only value in this column. Variable scope for rule. The variable name should be listed here except when the rule applies to multiple domains/observation classes. In such cases a stubbed version of the variable name with dashes in place of the domain abbreviation should be used e.g. -STDY, --ENDY. If the rule applies generally across many or all variables then the keyword “GEN” should be the only value in this column. Yes Yes 2 SDTM Conformance Rules Development Guide v1.0 Rule A concise and unambiguous statement of the conformance or compliance principle to be applied to the rule object. Only one principle should be stated per defined rule. If the rule only applies during a particular circumstance or condition then the condition should be stated separately in the ‘Condition’ column. Yes Condition Whenever a rule is applied only in a conditional set of circumstances the condition or conditions should be defined in this column. Some guidelines for composing the condition statements follow: Do not prefix the condition with ‘If’ or ‘Where’. This is implicit in the fact that it is a conditional statement. Multiple conditions should be separated by standard logical operators e.g. ‘and’, ‘or’, ‘and not’, etc. When referencing variables in the same domain it is not necessary to use a domain prefix. However references to variables in other domains should always use the domain prefix e.g. EG.VISIT, AE.AESER, etc. When referencing a character value then enclosing quotes are required to make these distinct from variable references e.g. “= ‘Y’” vs. “= ACTARMCD”. Wildcards are permitted in conditions e.g. ‘--ENDY’ and are appropriate when defining rules with multiple class and/or domain application. No IG Section Within a domain model, values may be: Description/Overview, Specification, Assumption, Example. Where applicable, e.g., General Assumptions for All Domains, include specific section number as reference If itemized, include specific reference number/letter IG text that states or implies the rule. It is recommended to include all directive statements for completeness (i.e. must, required, should, could, may statements) and to copy/paste text from IG directly as needed. For Section 4, General Assumptions for All Domains, suggest representing once (vs. repeating per domain). Exceptions to be noted by domain. SDTM IG versions for which the rule is in effect. Use standard numeric portion of SDTM IG version e.g. 3.2, 3.3. If rule is in effect for multiple versions each SDTM IG version number is listed delimited by commas. Identifier for the version or release of the rule. Used to track additions and updates to the rules catalog. No Item # Text IG Versions Batch ID No No Yes Yes 3 SDTM Conformance Rules Development Guide v1.0 Assumptions for the Rules Metadata Model 1. General a) All rules should be stated uniquely at the highest applicable level of the rule hierarchy possible. For example: A rule stating that “ARM must not be null” (because its core variable status is Required) should be implemented instead as a general rule for Required variables for all classes, all domains, and all variables, with a condition e.g. Rule: “^= null”, Condition: ”Core = REQ”. b) There should be only one object variable for a rule. If the rule has an interdependent element e.g. ARM must equal ACTARM when ARM = “Screen Failure” then there should be two related rules where the object of the first is ARM and the second ACTARM and the Condition field used to define when these rules are in effect. See Examples. 2. Rule Syntax a) Observation classes, domains and variable names are in uppercase. b) Operators and keywords are in lowercase. c) Any reference to a variable in a domain other than object variable’s domain must be in the form “Domain.Variable” e.g. TA.ARM, DM.SEX. d) If a variable’s values should be members of a discrete list described in guidance but not in CDISC Controlled Terminology then the syntax should be “Variable in (‘value1’, ‘value2’)”. Quotes around individual text values in the list are required. e) Logical operators should be used in place of phrases such as ‘less than or equal to’, ‘not greater than’, ‘should be’, ‘must equal’, etc. This is to enforce a concise and unambiguous rendering of the rule. The following are examples of logical operator substitutions: Use: Instead of: = Should equal, must equal, equals <= Less than or equal to < Less than > Greater than >= Greater than or equal to In Must be defined in, must include ^ Not – general negation operator 3. Rule Terminology a) Controlled or conventional terminology should be used to describe conditions or requirements in a standard way. The following table lists terms and definitions that should be consistently applied when defining rules: Term Definition first The first instance e.g. date (implies sorting) last The last instance e.g. date (implies sorting) unique Only occurrence of value within defined scope present Object is present, e.g. dataset, variable, does not imply that object is populated with data 4 SDTM Conformance Rules Development Guide v1.0 null Object does not contain data one-to-one Value pair is unique, isomorphic relation b) If a variable’s values should be members of a discrete list described in CDISC Controlled Terminology then the syntax should be “Variable in {CT List Name}”. Note the use of braces instead of parentheses. c) When a variable should not contain a value use the keyword “null” e.g. “AGETXT = null” rather than phrases like “is missing”, “equals blank”, “should not be populated”. d) When describing a rule that has an expectation or a restriction to a certain type of data the following standard terminology should be used. Data type Definition numeric Numbers, format chars (.+-) only alphabetic Letters of alphabet only alphanumeric Alphabetic, Numeric, and punctuation/extended format characters boolean True or False datetime ISO8601 Format Date Examples for the Rules Metadata Model Example Rules for use of –PRESP and –OCCUR The examples below illustrate a number of rule metadata conventions and general principles in defining the conformant use of the –PRESP and –OCCUR variables. As these variables are permitted in only certain domains in the Events and Interventions observations classes the rule metadata fields Class and Domain are used define the scope accordingly. The Variable, Rule, and Condition field used the ‘—‘ stub prefix to specify that the rules apply to the same root variables in multiple domains. Rule CG0085 – Uses a delimited list of allowable values and the ‘null’ keyword in the rule. This rule does not have a Condition as it applies to every scenario where –PRESP can be used. Rule CG0086 – Illustrates use of a compound condition with the ‘and’ operator. Rule CG0088 – Shows the use of ‘not exists’ which specifies that the --OCCUR variable should not be present in the dataset if the –PRESP variable is also not present. This ‘null’ keyword would not be used in this case as it would allow the presence of the variable with null values. ID Class Domain Variable Rule CG0085 EVT, INT --PRESP --PRESP in ('Y', null) CG0086 EVT, INT AE, MH, CE, HO, CM, EC, PR, SU MH, CE, HO, CM, EC, PR, SU Condition --OCCUR --OCCUR ^= null --PRESP = 'Y' and --STAT ^= 'NOT DONE' CG0087 EVT, INT MH, CE, HO, CM, EC, PR, SU --OCCUR --OCCUR = null --PRESP ^= 'Y' CG0088 EVT, INT MH, CE, HO, CM, EC, PR, SU --OCCUR --OCCUR not present --PRESP not present CG0089 EVT, INT MH, CE, HO, CM, EC, PR, SU --PRESP --PRESP = 'Y' --OCCUR ^= null 5 SDTM Conformance Rules Development Guide v1.0 Change History: Version Date 0.1 09-Mar-2014 0.2 15-May-2014 Author Stetson Line Stetson Line 0.3 0.4 14-Sep-2014 05-Jan-2015 Stetson Line Stetson Line 1.0 16-Mar-2015 Stetson Line Description of Changes Initial Draft Added ‘MLT’ keyword and guidance to Rule ID, Class, Domain, and Variable sections. Changed ‘GEN’ to ‘ALL’ in Variable guidance as standard keyword for Observation Classes and Domains. Added ‘FNA’ for Findings About to list of Observation Class tokens. Added Data Types to Rule guidance. Revisions to support refinements in rules scope and to implement SHARE team rule harmonization recommendations. 1. Added ‘TDM’ to list of Observation Class tokens 2. Introduced Not() syntax to selectively exclude Observation Classes and Domains from rule scope 3. Substituted ‘^’ as preferred token for Not logical operator. Not is still permitted in compound condition expressions e.g. ‘AND NOT’. 4. Changed guidance to require quotes around text values to distinguish such literals from variable names. Revisions to rules format and rules guide to support ease of review, use, and assimilation with rules developed by other teams and agencies. 1. Creation of non-descriptive rule identifier, deprecation of previous Rule ID format and guidance. 2. Promotion of Class, Domain, and Variable columns to near top of rule metadata order. Taken together these replace the intended utility of the deprecated Rule ID. 6 SDTM Conformance Rules Development Guide v1.0 3. Addition of the IG Version field to rule metadata along with explanation of its use. 4. Addition of the Batch Version field to rule metadata along with explanation of its use. 5. Addition of an Assumptions section. Existing assumptions in previous guide version and new content are consolidated in this section. 6. Addition of an Examples section. Existing examples in previous guide version and new content are consolidated in this section. 7. Insertion of a date column in the Change History table. 8. Insertion of a table of contents, revisions to text and text styles to make more compatible with current SDTM IG publication format 9. Insertion of Cover Page 10. Insertion of sub-team membership table. 7 SDTM Conformance Rules Development Guide v1.0 SDS SDTM Conformance Rules Sub-team Members Team Member Stetson Line Carlo Radovsky Tom Guinter Anne Russotto Sue Sullivan Gary Walker Daniel DiPrimeo Naveed Khaja Abhishek Dabral Manuel Anido Lynn Anderson Max Kanevsky Helena Sviglin Janet Reich Sergiy Sirichenko Organization Independent etera solutions Independent Celgene D-Wise Technologies Quintiles Puma Biotechnology Independent Forest Laboratories Forest Laboratories etera solutions Pinnacle 21 FDA, CBER Amgen Pinnacle 21 8 SDTM Conformance Rules Development Guide Nate Freimark v1.0 Theorem Clinical Research 9