CDISC MDR Business Requirements Specification

3.1.3 Analysis Dataset Creation (single study)

High level requirement

Problem to solve (+ root cause analysis)

The presumption is that a data collection specification already exists that identifies all the concepts and variables to be included in the study.

Analysis dataset creation spans aspects of the protocol (the analysis section), the study analysis plan and the submission analysis plan.

The deliverable is an analysis dataset specification that meets all the protocol, study and submission requirements and is sufficient for writing the programs required to create these datasets.

The aim is to write programs that produce ADaM compliant datasets and traceability (results back to ADaM datasets, back to SDTM datasets, back to CRF/eCRF) as efficiently as possible.

Process

- No process to enforce collection of metadata when collecting data
- No process to ensure consistency in data collection across studies
- No tools to store relationships between variables at a conceptual level (e.g. SYSBP may be collected with site and position)
- Data collection tools do not support collection of metadata ???

High level storyboard

Detailed storyboard

People

- Additional burden on the data collection team with benefits to other people
- Protocol team focuses on ONE protocol and overlooks the need for data integration for submission (ISSE and ISE) and further data mining

- CDISC SDTM does not manage different groupings in different contexts (e.g. SYSBP with/without qualifiers)
- CDISC SDTM limited to safety
- No consistency in the way variables are used across studies
- No information on how variables were linked together in data collection
- No agreement on terminology/code lists in clinical standards
- BRIDG is the conceptual model linking variables

Technology

External data standards

1. Generate SDTM datasets in a metadata driven way
2. Define analysis datasets and variables (including algorithms)
3. Create ADaM datasets and variables using the CMDR variables now provided in SDTM datasets (via step 1)
4. Create define.xml with full traceability using the CMDR metadata and the metadata created and stored in steps 1-3 above

SDTM

The starting point for creation of analysis datasets is a set of datasets containing CMDR compliant variables populated with the data collected in the study. If the datasets are SDTM compliant, no pre-processing of the data is required before commencing analysis dataset creation. If that is not the case, then the first step is to create SDTM compliant datasets.

Note that some SDTM domains (primarily safety domains) are well specified in the SDTM Implementation Guide (SDTM IG). Many domains not covered by the SDTM IG are likely to remain company specific due to different reporting needs and different toolsets.

For consistent and efficient generation of SDTM datasets, a metadata driven approach is employed:

1. CMDR concepts for which named SDTM domains exist in the SDTM IG will already have that link established, as a default, within the concept (e.g. the AE concept would be identified as matching the SDTM AE domain)
2. A study-based link between CMDR concepts and SDTM classes and domains will be established in a study/compound/company-based metadata registry for those CMDR concepts for which named SDTM domains do not exist in the SDTM IG (and SDTM domains were not agreed during the population of the CMDR)
3. Using the link (stored within the concept) between the BRIDG model and the CMDR variables, generic programs would “map” CMDR variables to the relevant SDTM variables within the appropriate SDTM domain (identified in steps 1 and 2 above)
4. Any CMDR variables which do not fit into the relevant findings, interventions or events domain would be auto-populated into domain-level SUPPQUAL datasets to ensure consistent handling of variables

CMDR

Concept definition.
There is a complex concept defined for each SDTM domain defined in the SDTM IG. This complex concept contains all the simple concepts related to SDTM variables; this allows any relevant variable to be mapped to SDTM variables.

Note: SDTM variable names should be used as the gold standard variable names as much as possible.

Concept definition. It is possible to define new complex concepts to cover domains not yet covered in SDTM. The CMDR would then be the definitive source for named SDTM domains. The SDTM IG would provide key training and guidance for implementation, but the IG would no longer be maintained as the authoritative source for the SDTM specification.

Note: a company-specific MDR should also be able to manage simple and complex concepts, e.g. in the case of new indications or proprietary information. This should however be minimized, and the variables included in these company-specific concepts should be based on CMDR simple concepts/variables as much as possible.

Analysis Datasets

Through review of the protocol analysis section and study/submission analysis plans (which identify the analyses and displays that are required):

- Identify new analysis datasets required
- Identify new analysis variables required
- Identify algorithms required for analysis variables
  - may need to identify alternative algorithms required to demonstrate robustness of analysis to different algorithms
  - some algorithms may be data driven, e.g. where a pre-defined algorithm doesn’t handle all situations that are present in the data
  - establish rules about handling of incomplete data
  - identify any variables required for statistical procedures, e.g. dummy variables for logistic regression

ADaM

Create ADaM datasets using the metadata defined and stored in step 2 above and by conducting the following:

- Identify protocol violators (using the CDISC inclusion/exclusion standard; may need new analysis variables for this, e.g. to determine compliance or to derive from the medical history whether the subject has had “cancer in past N years?”)
- Set population flags (often set programmatically, but sometimes entered manually)

CMDR (for Analysis Datasets and ADaM creation)

Concept definition. It is possible to define a complex concept for each analysis dataset. This practice should however be limited to “standard datasets” that are expected to be created regularly. Very specific analysis datasets should be created either within the company-specific MDR or in the study metadata registry.

Variable definition. It is possible to specify new variables (linked to a simple concept). When defining a new variable, it is possible to provide an algorithm with it if the variable is derived.

Note: in the first phase of CMDR there is no further specification of algorithms, and it will be hard to get agreement on definitions for many analysis variables. In a second phase we may want to standardize derivation algorithms, as they also define semantics.

Note: Analysis Datasets and ADaM creation would use CMDR variable and concept information in a basic lookup manner but would not provide algorithms/coding information.

Summary of needs

Define.xml

Generate define.xml that provides full traceability using CMDR metadata, metadata derived and stored as part of the SDTM generation, and metadata derived during the ADaM dataset creation.

CMDR

Variable search.
Possibility to retrieve variables and their full definitions for inclusion in a define.xml file.

Define.xml could link to CMDR content rather than carrying duplicated/downloaded information (but as this is more of a define.xml requirement, it needs to be coordinated with CDISC TLC).

CMDR

- CMDR concepts for which named SDTM domains exist will already have that link established, as a default, within the concept (the AE concept would be identified as matching the SDTM AE domain)
- Using the link (stored within the concept) between the BRIDG model and the CMDR variables, generic programs would “map” CMDR variables to the relevant SDTM variables

Study/compound/company-based metadata registry

- A study-based link between CMDR concepts and SDTM classes and domains will be established in a study/compound/company-based metadata registry for those CMDR concepts for which named SDTM domains do not exist
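
The metadata-driven mapping summarized above (a default concept-to-domain link held in the CMDR, a study-level link held in the study/compound/company registry, generic mapping of CMDR variables to SDTM variables, and SUPPQUAL for variables that do not fit the parent domain) could be sketched roughly as follows. This is a minimal illustration in Python and not a CMDR interface. The dictionaries `CMDR_DEFAULT_DOMAIN`, `STUDY_DOMAIN_LINKS` and `VARIABLE_MAP`, the functions `resolve_domain` and `map_record`, the concept and variable names (`ADVERSE_EVENT`, `AE_TERM`, `AE_SEVERITY`, `AE_NOTE`, `TUMOUR_BIOPSY`, `BIOPSY_RESULT`) and the custom domain `XB` are all hypothetical. `VARIABLE_MAP` simply stands in for the BRIDG-based link stored within the concept.

```python
from typing import Any

# Hypothetical default links stored within CMDR concepts
# (concepts for which a named SDTM IG domain exists).
CMDR_DEFAULT_DOMAIN = {"ADVERSE_EVENT": "AE"}

# Hypothetical study/compound/company registry links
# (concepts with no named SDTM IG domain).
STUDY_DOMAIN_LINKS = {"TUMOUR_BIOPSY": "XB"}

# Hypothetical CMDR variable -> SDTM variable map per domain;
# stands in for the BRIDG-based link held within the concept.
VARIABLE_MAP = {
    "AE": {"AE_TERM": "AETERM", "AE_SEVERITY": "AESEV"},
    "XB": {"BIOPSY_RESULT": "XBORRES"},
}


def resolve_domain(concept: str) -> str:
    """Look up the SDTM domain: CMDR default first, then the study registry."""
    if concept in CMDR_DEFAULT_DOMAIN:
        return CMDR_DEFAULT_DOMAIN[concept]
    if concept in STUDY_DOMAIN_LINKS:
        return STUDY_DOMAIN_LINKS[concept]
    raise KeyError(f"No SDTM domain linked to concept {concept!r}")


def map_record(
    concept: str, usubjid: str, seq: int, values: dict[str, Any]
) -> tuple[dict[str, Any], list[dict[str, str]]]:
    """Split one collected record into a parent-domain row and SUPPQUAL rows."""
    domain = resolve_domain(concept)
    var_map = VARIABLE_MAP.get(domain, {})
    seq_var = f"{domain}SEQ"
    parent: dict[str, Any] = {"DOMAIN": domain, "USUBJID": usubjid, seq_var: seq}
    supp: list[dict[str, str]] = []
    for cmdr_var, value in values.items():
        if cmdr_var in var_map:
            # Mapped variable: goes into the parent SDTM domain.
            parent[var_map[cmdr_var]] = value
        else:
            # Unmapped variable: goes to the domain-level SUPPQUAL dataset.
            # Assumption: the CMDR name is carried over unchanged as QNAM;
            # a real implementation would need a compliant QNAM.
            supp.append({
                "RDOMAIN": domain,
                "USUBJID": usubjid,
                "IDVAR": seq_var,
                "IDVARVAL": str(seq),
                "QNAM": cmdr_var,
                "QVAL": str(value),
            })
    return parent, supp


if __name__ == "__main__":
    row, supp_rows = map_record(
        "ADVERSE_EVENT", "STUDY1-001", 1,
        {"AE_TERM": "HEADACHE", "AE_SEVERITY": "MILD", "AE_NOTE": "reported by phone"},
    )
    print(row)
    print(supp_rows)
```

Keeping the default links and the study-level links in separate lookups mirrors the split between CMDR content and the study/compound/company registry: a study can add links for concepts that have no named SDTM domain without altering the CMDR defaults.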