Experience and process for collaborating with an outsourcing company to create the define file

Ganesh Sankaran
TAKE Solutions

Agenda
• Typical workflow when sponsors create the SDTM / ADaM in-house and collaborate with vendors for the Define files
• Define.xml Sections
• Define.xml Process - How do we go about extracting the information from the data & documents provided?
• Validating Define.xml & the typical Checks
• Common Issues
• Conclusion – How soon should the sponsor start?

Typical workflow collaborating with a Vendor for creating Define files
Sponsor:
• Provides the documents & draft data
• Reviews the findings and updates the specification / dataset / annotation
• Sends the updated Annotations / Specification / XPTs back to the vendor for a final delivery (Pass II)
Vendor:
• Runs the compliance / structure checks on the data
• Generates the draft Define.xml & runs the compliance checks
• Summarizes the issues / findings and delivers the draft define for review
• Runs the compliance checks and re-generates the final version of the Define (Pass II)

Inputs that are provided
• Annotated Case Report Form
• Mapping Specification documents
• SAS Datasets / XPTs
• Sponsor Controlled Terminology Documents, if applicable
• Protocol, if a Trial Design Domain is to be produced
• Data Guide / Supplemental Document

Define.XML Section
• TOC – Metadata of Datasets
• blankcrf (Annotated)
• Variable Level Metadata
• Value Level Metadata
• Controlled Terminology
• Computational Algorithms
• Supplemental Data Definition Document

Define.XML Section (Not visible through the Style Sheet)
• xmlns - Identifies the default namespace for this document.
• ODMVersion - Identifies the ODM version that underlies the schema for the Define.XML.
• FileOID - Unique identifier for this file.
• CreationDateTime - When the specific version of the define.xml file was created.
• StudyName, StudyDescription, ProtocolName – Study level information

Define.XML Components and how do we generate them
• MetaData Generation:
  • DOMAIN Level
  • VARIABLE Level
  • VALUE Level
  • ORIGIN, CODELIST, Comments and Computational Algorithm
• blankcrf, Data Guide / Supplemental Docs
• Generate Define.xml
• Validate Define files

Define.XML process

Input Sheet for Define.XML Generation
• DOMAIN Level Input – A SAS-based macro utility creates the inputs for this sheet from the datasets provided.
• VARIABLE METADATA – The variable level metadata input sheet is populated by reading the metadata of the SAS datasets provided.

Input Sheet for Define.XML Generation
• ORIGIN information is extracted from the Annotations & Mapping Specification provided.
• OIDs are assigned here for the variables that need a CODELIST, COMPUTATION ALGORITHM or VALUELIST.
• Based on the OIDs assigned in the VARIABLE LEVEL sheet, the VALUE LEVEL input sheet and the CODELIST input sheet are generated by reading the data and the associated codelist files.

Input Sheet for Define.XML Generation
• Value Level Input
• Codelist / Computation Methods Input

External Documents – blankcrf & Data Guide
• The Annotated Case Report Form and Supplemental Documents like the Data Guide will be linked to the define.xml.
• The ORIGIN page number presented as part of the variable level metadata must be hyperlinked to the corresponding CRF pages attached to the Define file.

Input Sheet for Define.XML Generation
• Once the Domain Level, Variable Level, Value Level and Codelist sheets are created, the external documents are linked, the ORIGIN, COMPUTATIONAL ALGORITHM & External Dictionary information is updated and the inputs are reviewed, the DEFINE.XML can be generated.

Validation Checks
• Structural Checks: Type of Checks on the Metadata
1. Domain Label mismatch
2. Variable Label mismatch
3. Data type mismatch
4. Missing Expected & Required variables
5. Required / Expected Variables with NULL values for all records
6. Non-standard SDTM variables
7. Variable Names in lower case
8. Variable Order mismatch
9. Variables with Formats
10. Permissible variables present with NULL values for all records

Validate Define.XML
• A valid Define.xml should be well formed & conform to the XML schemas. It should reference the correct versions of the CDISC standards.

Sample Validation Checks
1. XML is well formed
2. All Required Elements are included and not empty
3. The OID attribute must be unique within a single MetaDataVersion – no duplicate def:leaf, def:ComputationMethod or def:ValueListDef elements
4. No duplicates in ItemGroupDef, ItemDef, ItemRef, Study, CodeList elements etc.
5. Invalid Data type value for CODELIST elements
6. CodeValue must be unique within a single CodeList
7. Invalid Codelist for variable, non-extensible CT
8. Invalid Data type value for ItemDef elements
9. Invalid ‘Filetype’, ‘MedDRA’ values
10. Invalid ‘Repeating’, ‘Mandatory’ values

Common Issues
• ORIGIN is ‘CRF’ but the variable is not annotated; ORIGIN is ‘Derived’ but the variable is annotated in the CRF.
• Key variables not properly defined.
• When presenting custom domains, the domain assumptions should be followed. Custom domains are sometimes derived without a TOPIC variable.
• Subjects collected as part of external data (LB / EG) but not populated in the DM domain. All subjects must be present in the DM domain.
• One-to-one relationship missing across some of the paired variables, such as TEST / TESTCD, PARAM / PARAMCD, VISIT / VISITNUM, AVISIT / AVISITN, TPT / TPTNUM, TPT & TPTREF.
• Common variables across different domains having different ORIGIN derivations. If the derivation is the same across domains, “Copied from ADSL.XX” can be used.

Common Issues (contd)
• Generally, XPTs up to 1 GB in size are fine. If the XPT file size exceeds 1 GB, it must be split into smaller datasets not exceeding 1 GB. (Study Data Specifications)
• Split files should have the same metadata structure so that concatenation / merging of the split datasets is feasible.
Both the smaller split files & the larger (non-split) file should be included.
• Split datasets and the method applied should be documented in the data guide.
• If not following a linear approach, consistency between the ADaM and SDTM sources must be ensured.

Common Issues (contd)
• ADaM, when derived in a parallel stream, might require extra effort to ensure traceability & data lineage.

Conclusion
• Finalize the scope of the work being outsourced / to be performed by the vendor.
• Explain the process being followed and agree on a common form for the exchange of documents that could expedite the generation of the Define files.
• When working across a family of similar studies within the same indication, look to achieve better efficiency after a couple of iterations / studies.
• Identify the vendor(s) at least three months before you expect the first Define.XML to be published. If possible, do a pilot or demo define.

Thank You
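
Backup: sketch of two Define.XML validation checks

Two of the Sample Validation Checks (well-formed XML, no duplicate OIDs) are simple enough to illustrate in a few lines. The following is a minimal sketch in Python, not the utility used in the process described here, which is SAS based. The function names, the sample OIDs and the tiny sample document are hypothetical, and treating uniqueness as "per element type" is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def is_well_formed(xml_text):
    """Return True when the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def find_duplicate_oids(define_xml_text):
    """Return sorted (tag, OID) pairs that occur more than once.

    Assumption for this sketch: an OID only has to be unique among
    elements of the same type, so the element tag is part of the key.
    """
    root = ET.fromstring(define_xml_text)
    counts = Counter(
        (element.tag, element.get("OID"))
        for element in root.iter()
        if element.get("OID") is not None
    )
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical, heavily reduced document; a real define.xml carries
# many more required elements and attributes.
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.2" FileOID="DEMO.define">
  <Study OID="STUDY1">
    <MetaDataVersion OID="MDV1" Name="Demo">
      <ItemDef OID="DM.USUBJID" Name="USUBJID" DataType="text"/>
      <ItemDef OID="DM.USUBJID" Name="USUBJID" DataType="text"/>
      <ItemDef OID="DM.AGE" Name="AGE" DataType="integer"/>
    </MetaDataVersion>
  </Study>
</ODM>"""

if __name__ == "__main__":
    print("well formed:", is_well_formed(SAMPLE))
    for tag, oid in find_duplicate_oids(SAMPLE):
        print("duplicate OID:", oid, "on element", tag)
```

Schema conformance and controlled terminology checks need the CDISC schemas and terminology files and are outside the scope of this sketch.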