Virology Therapeutic Area Data Standards User Guide (VR-UG)
Prepared by the CDISC Virology Team

Notes to Readers
This provisional user guide is based upon the forthcoming Version 1.4 of the CDISC Study Data Tabulation Model and the CDISC Pharmacogenomic/Genetics Study Data Tabulation Model Implementation Guide (SDTMIG-PGx), currently under development.
See Appendix D for Representations and Warranties, Limitations of Liability, and Disclaimers.

Revision History
Date | Version | Summary of Changes
September 6, 2012 | 1.0 Draft | Released version for public comment.
December 6, 2012 | 1.0 Provisional | Released version reflecting all changes and corrections identified during the comment period.

© 2012 Clinical Data Interchange Standards Consortium, Inc. All rights reserved. Provisional. CDISC Virology User Guide (Version 1.0)

TABLE OF CONTENTS
1 INTRODUCTION
  1.1 PURPOSE
  1.2 CDER GUIDANCE ON ANTIVIRAL PRODUCT DEVELOPMENT
  1.3 RELATIONSHIP TO PRIOR DOCUMENTS
  1.4 ORGANIZATION OF THIS DOCUMENT
  1.5 DESIGN CONSIDERATIONS AND APPROACH
2 RELATIONSHIPS BETWEEN THE PHARMACOGENOMICS/GENETICS (PGX) AND BIOSPECIMEN DOMAINS
  2.1 RELATIONSHIPS BETWEEN MOLECULAR CONCEPTS
3 VIRAL RESISTANCE FINDINGS (VR)
  3.1 ASSUMPTIONS FOR VIRAL RESISTANCE TEST FINDINGS (VR) DOMAIN MODEL
  3.2 EXAMPLES FOR VIRAL RESISTANCE TEST FINDINGS (VR) DOMAIN MODEL
4 PHARMACOGENOMICS FINDINGS (PF)
  4.1 ASSUMPTIONS FOR PHARMACOGENOMICS TEST FINDINGS (PF) DOMAIN MODEL
  4.2 GENETIC VARIATION ASSUMPTIONS
  4.3 EXPLANATORY NOTES ON SLC DATABASE GENETIC CODES
  4.4 EXAMPLES FOR VIRAL GENETICS FINDINGS
5 PHARMACOGENOMICS/GENETICS METHODS AND SUPPORTING INFORMATION (PG)
  5.1 ASSUMPTIONS FOR PHARMACOGENOMICS (PG) DOMAIN MODEL
  5.2 LIST OF IDENTIFIED COMMON SUPPQUALS
  5.3 EXAMPLES OF TESTCDS FOR REFERENCING PUBLIC DATABASES
  5.4 PG EXAMPLES
6 PGX BIOLOGICAL STATE (PB)
  6.1 ASSUMPTIONS FOR THE PGX BIOLOGICAL STATE (PB) DOMAIN MODEL
  6.2 EXAMPLES FOR PGX BIOLOGICAL STATE (PB) DOMAIN MODEL
7 SUBJECT BIOLOGICAL STATE (SB)
  7.1 ASSUMPTIONS FOR THE SUBJECT BIOLOGICAL STATE MARKER (SB) DOMAIN MODEL
  7.2 EXAMPLES FOR SUBJECT BIOLOGICAL STATE MARKER (SB) DOMAIN MODEL
APPENDIX A - NEW AND DELETED DOMAINS AND VARIABLES
APPENDIX B - VIROLOGY CONCEPT MAPS
  B.1 VIROLOGY RESISTANCE TESTING MAPS
  B.2 GENETIC TESTING
  B.3 BUILDING KNOWLEDGE OF VIRAL RESISTANCE MUTATION
  B.4 INFERRING VIRAL RESISTANCE FROM GENETIC MUTATION RESULTS
APPENDIX C - PARTICIPATING INDIVIDUALS AND ORGANIZATIONS
APPENDIX D - REPRESENTATIONS AND WARRANTIES, LIMITATIONS OF LIABILITY, AND DISCLAIMERS

LIST OF TABLES
VR.xpt, Pharmacogenomics Findings - one record per viral load observation per specimen collected, per test, per date of test, per subject, Tabulation
PF.xpt, Pharmacogenomics Findings - one record per method/setup observation per specimen collected, per date of test, per subject, Tabulation
PG.xpt, Pharmacogenomics - Findings. One record per method/setup observation per specimen collected, per date of test, per subject, Tabulation
PB.xpt, Pharmacogenomics Biological State - Special-Purpose Domain. One record per biomarker used in the study, Tabulation
SB.xpt, Subject Biological State - Special-Purpose Domain. One record per subject per observed biological state in the study, Tabulation

LIST OF FIGURES
Figure 1: Biologic Specimen Natural Hierarchy
Figure 2: Relationships Between Molecular Concepts

1 INTRODUCTION

1.1 PURPOSE
The purpose of this provisional Virology Therapeutic Area Data Standards User Guide (VR-UG) is to provide guidance on the implementation of the Study Data Tabulation Model (SDTM) data standards for virology data.
This provisional virology user guide is designed to be used in concert with the SDTM, the SDTMIG-PGx (currently under development), and the SDTMIG. See paragraph four below and Section 1.3 for more information.

This is the first attempt by CDISC to develop submission standards for virology-focused clinical trials, so it is expected that there will be areas for further development. This user guide is also dependent upon the publication of Version 1.4 of the SDTM. For these reasons, the CDISC Virology Team is publishing this supplement as provisional, to allow time for completion of the new version of the SDTM and to collect input from implementers.

Measuring viral concentration is central to virology studies. Viral concentration in specimens taken from subjects (i.e., viral load, a measure of disease burden) is handled via the existing SDTM LB domain. This guide provides guidance on handling measurements of viral concentration from in vitro resistance testing. Virology studies may also record viral genetic variations and relate these to changes in antiviral drug resistance and susceptibility. To this end, this VR-UG includes the following draft domains:
1. Viral Resistance (VR) - This new Findings domain is for data on viral resistance obtained by growing a virus in culture in the presence of a drug and then quantifying the viral response to the drug.
2. Pharmacogenomics/Genetics Methods and Supporting Information (PG) - This updated Findings domain describes new SDTM variables and stores information about the test methodology, including the setup and quality control of the test. This contributes to the understanding of the test results contained in the Pharmacogenomics Findings (PF) domain.
3. Pharmacogenomics Findings (PF) - This updated Findings domain includes new SDTM variables and is for the submission of results of genetic variation and gene expression tests.
4. PGx Biological State (PB) - This new Special-Purpose domain is a reference dataset that relates a set of genetic variations to an inference about the medical meaning of that set.
5. Subject Biological State (SB) - This new Special-Purpose domain holds the medical statement from the PB domain for individual subjects. Together, the PB and SB domains provide a mechanism for staying aligned with current medical knowledge.

A Pharmacogenomic/Genetics Study Data Tabulation Model Implementation Guide (SDTMIG-PGx) is currently under development. That document will describe how to represent genetic data collected on samples of DNA and RNA in an SDTM-based format. The SDTMIG-PGx is envisioned to describe how to accommodate genetic information, including genetic variation and gene expression, from humans as well as from viruses, bacteria, and other microorganisms. Members of the user community are encouraged to participate in the vetting of the SDTMIG-PGx standards.

1.2 CDER GUIDANCE ON ANTIVIRAL PRODUCT DEVELOPMENT
Implementers who intend to submit data to FDA are strongly encouraged to review current CDER guidance documents related to the submission of antiviral drug resistance data, such as the CDER Draft Guidance on Antiviral Product Development - Conducting and Submitting Virology Studies to the Agency: Guidance for Submitting HCV Resistance Data (January 2012) and Guidance for Submitting HIV Resistance Data (June 2012).

1.3 RELATIONSHIP TO PRIOR DOCUMENTS
This document does not replace any of the standards defined in the current Study Data Tabulation Model Implementation Guide for Human Clinical Trials (SDTMIG) or other implementation guides to the SDTM.
When used for clinical trials data, this SDTM supplement should be implemented together with the current version of the SDTMIG (available at http://www.cdisc.org/standards). The SDTMIG is based on the general SDTM conceptual model for representing clinical study data that is submitted to regulatory authorities, and it should be read before the VR-UG v.1.0. An understanding of both the SDTM and the SDTMIG is needed before attempting to understand this virology addendum.

1.4 ORGANIZATION OF THIS DOCUMENT
This document contains information on how to format tabulation data for the purpose of submission. While the document is self-contained with respect to virology-specific information, the domains were designed to work in concert with existing SDTM constructs. This document is organized into the following sections:
Section 1: Introduction - Provides an orientation to this document.
Section 2: Relationships Between the Pharmacogenomics/Genetics (PGx) and Biospecimen Domains (BE, BS) - Provides an overview of the new domains and their relationship to each other as well as to existing domains described in the SDTMIG.
Section 3: Viral Resistance Findings (VR) - Describes the domain, assumptions, and examples for viral resistance findings.
Section 4: Pharmacogenomics Findings (PF) - Describes the PF domain and includes domain models, assumptions, and examples.
Section 5: Pharmacogenomics/Genetics Methods and Supporting Information (PG) - Describes proposed new virology and updated PGx domains and assumptions for inclusion in a future SDTM-based implementation guide.
Section 6: PGx Biological State (PB) - Describes the domain, assumptions, and examples for a reference dataset of biomarkers.
Section 7: Subject Biological State (SB) - Describes the domain, assumptions, and examples for subject-level biomarker data.
Appendices - Contain a table of new and deleted SDTM variables, a list of participating organizations, and legal notices.
1.5 DESIGN CONSIDERATIONS AND APPROACH
The purpose of this section is to review the design approach and the lessons learned from mapping complex PGx and virology data. This section also documents issues encountered during the development process and their respective resolutions.
1. The initial approach was to represent genetic variation data by using the HUGO Nomenclature (HGNC). An example HUGO representation of a codon-level mutation is c.28CTC>ATC, which means that at position 28 of the DNA sequence, where the expected nucleotide sequence is "CTC", the sequence "ATC" was observed. During an early review, FDA reviewer participants asked that these data be parsed out into expected nucleotide (CTC), position (28), and observed nucleotide (ATC).
2. An earlier design handled the position, expected nucleotide, observed nucleotide, and a number of other test characteristics in separate rows. This design required variables to group rows together and also resulted in very large files.
3. Given the challenges with the multi-row design, a different structure was proposed, and new variables were added, so that the multiple results obtained for what was really a single test could be represented in a single row.
4. Codon changes and amino acid changes were considered to be separate results and should be submitted as separate rows.
5. SPECIES and STRAIN were added to the domains to allow genetic and genomic data from pathogens, such as the viruses that are the subject of this user guide, to be separated from genetic data on their human hosts (whose species and strain, if not human, would be submitted in the Demographics domain).
6. It was suggested that SUBSTRAIN and CLADE be added to the domains. However, because their definitions are ambiguous and the hierarchy used seems to differ, these potential additions were deferred until a future version.
7. Representing viral resistance data in an SDTM-based domain model is a challenge. An initial attempt was made to model these data in the Microbiology domains, but this approach was abandoned because the current MB/MS domain structures are limited to resistance based on only one result. Virology data, on the other hand, include multiple results and a net assessment that summarizes these results. The use of the LB domain, which already includes examples of viral test data, was considered next, but this approach was felt to create too high a burden for creating test codes, which would have had to include the virus as part of the test name. After considering these alternatives, the team chose to create a Viral Resistance (VR) domain that includes the species and strain variables, eliminating the need to maintain pathogen-specific test names.
8. A draft SDTMIG-PGx document underwent public review in 2010. The need for new examples and domains was identified to better document the PGx Biological State (PB) and Subject Biological State (SB). These domains are included in draft form in this VR-UG and will be included in the SDTMIG-PGx that is currently under development.

2 RELATIONSHIPS BETWEEN THE PHARMACOGENOMICS/GENETICS (PGX) AND BIOSPECIMEN DOMAINS
This section explains the concepts and relationships between the two PGx domains (PG and PF) in this guide and the three biospecimen domains (BE, BS, ES) that will be included in the forthcoming SDTMIG-PGx. These domains support specimen re-sectioning (when a portion of a specimen is tested) and specimen extraction (when a genetic sample such as RNA or DNA is extracted for genetic/genomic testing).
1. In the top left corner of Figure 1 is the Biospecimen Events (BE) domain, which is used to capture the date/time of important steps within the specimen handling process. Examples include the following:
   - Date/time specimen was sent to a lab
   - Date/time specimen was received by a lab
   - Date/time and duration for flash freezing and/or thawing of the specimen
2. Next in line after BE is the Biospecimen Handling (BS) domain, which contains the details of biospecimen handling. Examples include specimen volume, flash-frozen temperature, preservative type, preservative volume, stabilizing reagent, and stabilizing reagent volume.
3. The Extracted Sample (ES) domain stores information about materials extracted from a sample, such as RNA or DNA from a blood or tissue sample. It may also contain information about resections obtained from a biospecimen (e.g., RNA/DNA quantity extracted, genetic material extract, and PGx specimen condition).
4. The results of PGx tests tend to be sensitive to the degree of adherence to the specimen handling and test-setup processes specified in the test protocol. Therefore, additional quality control (QC) observations are captured to document compliance with proper procedures. Knowledge of the setup processes also contributes to improved understanding of the test results. For these reasons, a two-domain structure was developed in which setup and QC observations are stored separately from test results. The former are stored in the Pharmacogenomics/Genetics Methods and Supporting Information (PG) domain (also described as the PG Setup and QC domain for brevity), while results are stored in the Pharmacogenomics Findings (PF) domain. This separation allows PG setup and QC observations to appear once and be related to multiple pharmacogenomic findings. Examples of setup and QC observations include the following:
   - For Gene Expression: Normalization Technique, RNA Integrity Number, A260/A230 ratio, and A260/A280 ratio.
   - For Genetic Variation (Genotype / SNP Probe): exons sequenced, sequence start, and sequence length.
5. The PGx Findings (PF) domain contains the results of genetic variation and gene expression tests. For genetic variation tests, test results may include portions of the genetic sequence and comparisons with the reference gene sequence.
6. Linking between these five domains is accomplished by means of SDTM identifiers (STUDYID, LNKID, and REFID).
7. The PGx Biological State (PB) (Reported Medical Condition Associations) domain is a special-purpose domain that documents known associations between observed variations and medical conclusions (e.g., disease diagnosis, resistance of a virus to a particular drug).
8. The Subject Biological State (SB) (Subject Medical Condition Associations) domain applies associations documented in PB to observed subject variation and mutation data in PF to document medical conclusions for individual subjects.

Many genetic tests involve the comparison of subject data to a published database. Under certain circumstances, a test may be re-evaluated against different versions of the published database. In such cases, additional linking is needed. To accomplish this, records in the PF domain should use the LNKID to connect to the record in the PG domain that documents the reference database used. Examples of this linking will be included in the forthcoming SDTMIG-PGx.

The diagram below describes the high-level hierarchy that links these domains, starting with the biologic specimen.

Figure 1: Biologic Specimen Natural Hierarchy
2.1 RELATIONSHIPS BETWEEN MOLECULAR CONCEPTS
Figure 2 below shows an example of how the relationships from the collected specimen to the results can be represented.
1. The Biospecimen Handling domain (BS), which is described in the forthcoming SDTMIG-PGx, contains information about the collected specimen. For example, this could be tissue collected from a normal or cancerous section of an organ. ABC-004 is the specimen identifier held in BSREFID.
2. There were no Biospecimen Events of interest in the example, so the BE domain is not included.
3. The Extracted Specimen domain (ES) (also described in the SDTMIG-PGx) shows the identifier assigned to the genetic sample, such as DNA, once it is extracted. This domain would contain identifiers for both the collected tissue sample and the extracted genetic sample.
4. Tests reported in the Pharmacogenomics Findings domain (PF) are linked to the specimen on which they were run by means of the extracted specimen identifier (e.g., ABC-004-01), which is stored in PFREFID and matches the value in ESSPID.
5. The gene (in PFGENROI) is then associated with the amino acids that have been detected and identified in rows containing a PFTESTCD value of AAOBS for the observed amino acid and GENLOC for the position. The amino acid can then be associated with the actual variation or mutation, represented either as a codon (with three nucleotides) or as individual nucleotides with their respective positions. PFRESCAT qualifies the result in PFORRES and PFSTRESC (e.g., point mutation).
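The identifier chain described in Section 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the standard: the records are plain dictionaries, the identifier values echo the example (ABC-004, ABC-004-01), and the names ESREFID (parent specimen identifier in ES) and PFGENLOC are hypothetical, since the guide states only that ES holds both identifiers and that PFREFID matches ESSPID.

```python
from collections import defaultdict

# Illustrative records only. ESREFID and PFGENLOC are assumed variable names;
# the guide confirms BSREFID, ESSPID, PFREFID, PFTESTCD, and PFORRES.
bs_records = [{"BSREFID": "ABC-004"}]
es_records = [{"ESREFID": "ABC-004", "ESSPID": "ABC-004-01"}]
pf_records = [
    {"PFREFID": "ABC-004-01", "PFTESTCD": "CDN", "PFGENLOC": 213, "PFORRES": "ATT"},
    {"PFREFID": "ABC-004-01", "PFTESTCD": "NUC", "PFGENLOC": 213, "PFORRES": "A"},
]


def findings_by_specimen(bs, es, pf):
    """Group PF findings under the collected specimen they trace back to."""
    known_specimens = {row["BSREFID"] for row in bs}
    # Map each extracted-sample identifier to its parent collected specimen.
    parent_of = {
        row["ESSPID"]: row["ESREFID"]
        for row in es
        if row["ESREFID"] in known_specimens
    }
    grouped = defaultdict(list)
    for row in pf:
        parent = parent_of.get(row["PFREFID"])
        if parent is not None:  # skip findings with no traceable specimen
            grouped[parent].append(row)
    return dict(grouped)


linked = findings_by_specimen(bs_records, es_records, pf_records)
```

Findings whose PFREFID has no matching ESSPID are dropped in this sketch; a real implementation would more likely report them as linking errors.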
Figure 2: Relationships Between Molecular Concepts
Relationships Among Molecular Concepts in Virology Examples. The hierarchy is mostly one-to-many (1:M), except between amino acids and codons (1:1). Each level below relates to the next level as marked:
BS.REFID = ABC-004 [Specimen] (1:M)
ES.SPID = ABC-004-01 [Genetic Material Such as DNA] (1:M)
PF.REFID = ABC-004-01 [DNA] (1:M)
PF GENTYP = GENE, GENROI = GENEID [Gene] (1:M)
PF TESTCD = AA [Amino Acid]: AA = I, GENLOC = 71 (1:1)
PF TESTCD = CDN [Codon]: GENLOC = 213, PFORRES = ATT, PFREFRES = GTT (1:M)
PF TESTCD = NUC [Nucleotide]: PFSPID; GENLOC = 213, 214, 215; ORRES = A, A, T; RESCAT = "Point Mutation"

3 VIRAL RESISTANCE FINDINGS (VR)
VR.xpt, Pharmacogenomics Findings - one record per viral load observation per specimen collected, per test, per date of test, per subject, Tabulation

Variable Name | Variable Label | Type | Controlled Terms or Format | Role | CDISC Notes | Core
STUDYID | Study Identifier | Char | | Identifier | Definition: Unique identifier for a study within the submission. | Req
DOMAIN | Domain Abbreviation | Char | **VR | Identifier | Definition: Two-character abbreviation for the domain most relevant to the observation. | Req
USUBJID | Unique Subject Identifier | Char | | Identifier | Definition: Unique subject identifier within the submission. | Req
VRSEQ | Sequence Number | Num | | Identifier | Definition: Sequence number given to ensure uniqueness within a dataset for a subject. Can be used to join related records. | Req
VRGRPID | Group ID | Char | | Identifier | Definition: Used to tie together a block of related records in a single domain to support relationships within the domain and between domains. | Perm
VRREFID | Specimen ID | Char | | Identifier | Definition: The identifier of the viral specimen being tested. | Perm
VRLNKID | Link ID | Char | | Identifier | Definition: Supports linking information across different domains. | Perm
VRASYID | Assay ID | Char | | Identifier | Definition: A unique identifier for a test as maintained by a lab. | Perm
VRTESTCD | Genomics Test Code | Char | * | Topic | Definition: Short name for the test. Examples: IC50T, IC50R | Req
VRTEST | Pharmacogenomics Test Description | Char | * | Synonym Qualifier | Definition: The verbatim name used to obtain the measurement or finding. Examples: IC50 result on treatment, IC50 fold change from baseline. | Req
VRTSTRCD | Test Reference Terminology Code | Char | | Result Qualifier | Definition: The code of the result. For example, R is the code for Arginine and C49488 is the code for Y. | Perm
VRTSTRNM | Test Reference Terminology Name | Char | | Result Qualifier | Definition: The name of the Reference Terminology for the result. For example, CDISC, SNOMED, LOINC. | Perm
VRTSTRVR | Test Reference Terminology Version | Char | | Result Qualifier | Definition: The version number of the Reference Terminology, if required. | Perm
VRGENTYP | Gene Type | Char | * | Result Qualifier | Definition: Identifies the type of genetic region of interest, for example, GENENAME, SECTOR, PROTEIN. | Exp
VRGENROI | Gene Region of Interest | Char | | Record Qualifier | Area within the DNA sequences. Example: Protease (in the case of HIV), NS3/4A, NS5B (in the case of HCV). | Perm
VRSPCIES | Biological Classification | Char | * | Grouping Qualifier | Definition: Biological classifications for an organism capable of breeding and producing offspring. May also be used to designate organisms. Example: HOMO SAPIENS, RAT, MOUSE, BACTERIUM, HCV, HIV | Perm
VRSTRAIN | Type of Strain | Char | * | Grouping Qualifier | Definition: A genetic variant or subtype of a micro-organism. Examples: 1a, 1b. | Exp
VRCAT | Category for Pharmacogenomics Lab Test | Char | * | Grouping Qualifier | Definition: Used to categorize types of viral resistance tests. | Exp
VRSCAT | Subcategory for Pharmacogenomics Lab Test | Char | * | Grouping Qualifier | Definition: A further categorization of the various test types based on particular characteristics of a test. | Perm
VRDRUG | Drug Name | Char | * | Record Qualifier | Definition: The name of the drug for which resistance is based on genetic biological markers. Examples: Saquinavir, Indinavir | Exp
VRORRES | Result or Finding in Original Units | Char | | Result Qualifier | Definition: Result of the measurement or finding as originally received or collected. Example: For this domain the results are generally numeric/char values as provided by the laboratory. | Exp
VRORRESU | Original Units | Char | (UNIT) | Variable Qualifier | Definition: Represents the unit of measure used by VRORRES, if applicable. Example: copies/5uL, LOG10 IU/mL | Exp
VRSTRESC | Character Result/Finding in Std Format | Char | | Result Qualifier | Definition: Provides information such as the gene being tested for genotyping tests as well as interpretations and other supporting information such as insertions and deletions or intensity and P-Value for Array tests. Example for CDNOBS: AGC | Exp
VRSTRESN | Numeric Result/Finding in Standard Units | Num | | Result Qualifier | Definition: Used for continuous or numeric results or findings in standard format; copied in numeric format from VRSTRESC. VRSTRESN should store all numeric test results or findings. Example for p-Value: 0.5391 | Exp
VRSTRESU | Standard Units | Char | * | Variable Qualifier | Definition: Represents the unit of measure used by VRSTRESN. | Exp
VRSTAT | Pharmacogenomics Status | Char | (ND) | Record Qualifier | Definition: Used to indicate exam not done. Should be null if a result exists in VRSTRESC. | Perm
VRREASND | Reason Test Not Done | Char | | Record Qualifier | Definition: Describes why a measurement or test was not performed, such as BROKEN EQUIPMENT, SUBJECT REFUSED, or SPECIMEN LOST. Used in conjunction with VRSTAT when value is NOT DONE. | Perm
VRXFN | Raw Data File or LSID | Char | | Record Qualifier | Definition: Direct reference identifier for Microarray or Genotypic data contained in a separate file in its native format. Life Sciences Identifier (LSID) | Perm
VRNAM | Vendor Name | Char | | Record Qualifier | Definition: Name or identifier of the laboratory or biotech firm that provided the test results. | Perm
VRSPEC | Specimen Type | Char | | Record Qualifier | Definition: Defines the type of specimen used for a measurement. Examples: TISSUE, SERUM, PLASMA, TUMOR, DNA, RNA | Perm
VRSPCCND | Specimen Condition | Char | | Record Qualifier | Definition: Free or standardized text describing the condition of the specimen. Example: HEMOLYZED, ICTERIC, LIPEMIC, FRESH, FROZEN, PARAFFIN-EMBEDDED, etc. | Perm
VRMETHOD | Method Code for Test | Char | * | Record Qualifier | Definition: Special instructions for the execution of genomics or genetic testing. Examples: PhenoSense GT | Req
VRBLFL | Baseline Flag | Char | (NY) | Record Qualifier | Definition: Indicator used to identify a baseline value. | Perm
VRDRVFL | Derived Flag | Char | (NY) | Record Qualifier | Definition: Used to indicate a derived record. | Perm
VISITNUM | Visit Number | Num | | Timing | Definition: 1. Clinical encounter number. 2. Numeric version of VISIT, used for sorting. | Exp
VISIT | Visit Name | Char | | Timing | Definition: 1. Protocol-defined description of clinical encounter. 2. May be used in addition to VISITNUM and VISITDY. | Perm
VISITDY | Planned Study Day of Visit | Num | | Timing | Definition: Planned study day of the visit based upon RFSTDTC in Demographics. | Perm
VRDTC | Date/Time of Test | Char | ISO 8601 | Timing | Definition: Date/time of specimen collection. | Exp
VRDY | Study Day of Test | Num | | Timing | Definition: 1. Study day of specimen collection, measured as integer days. 2. Algorithm for calculations must be relative to the sponsor-defined RFSTDTC variable in Demographics. This formula should be consistent across the submission. | Perm
VRTPT | Planned Time Point Name | Char | | Timing | Definition: 1. Text description of time when specimen should be taken. 2. This may be represented as an elapsed time relative to a fixed reference point, such as time of last dose. See VRTPTNUM and VRTPTREF. Examples: Start, 5 min post. | Perm
VRTPTNUM | Planned Time Point Number | Num | | Timing | Numerical version of VRTPT to aid in sorting. | Perm
VRELTM | Elapsed Time from Reference Point | Char | ISO 8601 | Timing | Definition: Elapsed time (in ISO 8601) relative to a planned fixed reference (VRTPTREF). This variable is useful where there are repetitive measures. Not a clock time or a date time variable. Examples: '-P15M' to represent the period of 15 minutes prior to the reference point indicated by VRTPTREF, or 'P8H' to represent the period of 8 hours after the reference point indicated by VRTPTREF. | Perm
VRTPTREF | Time Point Reference | Char | | Timing | Definition: Name of the fixed reference point referred to by VRELTM, VRTPTNUM, and VRTPT. Examples: PREVIOUS DOSE, PREVIOUS MEAL. | Perm
VRRFTDTC | Date/Time of Reference Time Point | Char | ISO 8601 | Timing | Date/time of the reference time point, VRTPTREF. | Perm

3.1 ASSUMPTIONS FOR VIRAL RESISTANCE TEST FINDINGS (VR) DOMAIN MODEL
1. This domain is for data on viral resistance obtained by growing virus in culture in the presence of a drug and then quantifying the virus (e.g., measuring "viral load"). This is distinct from "viral load" measured on samples taken directly from a study subject to assess the status of the virus within the subject, which would be submitted in the LB domain. It is also distinct from genetic testing performed to detect viral variations and infer viral resistance from those variations; such data are stored in PF (see Example 4 in Section 4.6 of this supplement).
2. This domain is for clinical and pre-clinical use.
3. Viral resistance is determined by exposing the amplified virus in isolation (in vitro) to an antiviral drug at a series of concentrations and then deriving, from the raw viral load values at each concentration, the inhibitory concentrations (ICs) for various proportions of virus. For instance, the IC50 is the concentration that limits growth to 50% of what is seen for virus grown without drug. The inhibitory concentrations for a sample taken from a study subject may be compared with the inhibitory concentrations for a control strain of virus, usually a "wild type" susceptible to the drug in question. The ratio of the inhibitory concentration for the study subject's virus to that for the control virus is called a "fold increase." All these measures may be considered in reaching an overall assessment of the virus's resistance to the drug.

3.2 EXAMPLES FOR VIRAL RESISTANCE TEST FINDINGS (VR) DOMAIN MODEL
Example 1: This HIV example shows that viral concentrations are measured after exposure to specified drug concentrations to determine levels of susceptibility.
This example compares culture measurements from the subject’s specimen with those of a control sample. A similar comparison could be made to a baseline measurement for the subject. In this example, Rows 1-7 pertain to resistance to Drug A and Rows 8-14 pertain to resistance to Drug B.

Rows 1 and 8 show the response of the virus extracted from the subject based on drug concentrations expected to produce 50% inhibition of the standard virus growth.
Rows 2 and 9 show a control viral sample response based on drug concentrations expected to produce 50% inhibition of the standard virus growth.
Rows 3 and 10 show the fold change of the response of the virus extracted from the subject relative to the control viral sample response, based on drug concentrations expected to produce 50% inhibition of the standard virus growth. This is the on-treatment result divided by the reference result.
Rows 4 and 11 show the response of the virus extracted from the subject based on drug concentrations expected to produce 95% inhibition of the standard virus growth.
Rows 5 and 12 show a control viral sample response based on drug concentrations expected to produce 95% inhibition of the standard virus growth.
Rows 6 and 13 show the fold change of the response of the virus extracted from the subject relative to the control viral sample response, based on drug concentrations expected to produce 95% inhibition of the standard virus growth. This is the on-treatment result divided by the reference result.
Rows 7 and 14 show the net assessment (Reduced or Increased Susceptibility) based on the measurements.
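The fold-change rows are described above as the on-treatment result divided by the reference result. The following is a minimal sketch of that arithmetic in Python; it is not part of the VR domain specification. The function name `fold_change` and the dictionary layout are assumptions made for illustration, and the input values are the Drug A results shown in Example 1.

```python
# Illustrative sketch: derive fold-change results from paired VR results.
# fold_change() and the record layout are assumptions made for this example.

def fold_change(on_treatment: float, reference: float) -> float:
    """Return the on-treatment result divided by the reference (control) result."""
    if reference <= 0:
        raise ValueError("reference result must be positive")
    return on_treatment / reference

# Drug A results from Example 1, keyed by VRTESTCD.
drug_a = {"IC50T": 13.11, "IC50R": 2.99603, "IC95T": 136.77, "IC95R": 37.6061}

ic50_fcr = fold_change(drug_a["IC50T"], drug_a["IC50R"])
ic95_fcr = fold_change(drug_a["IC95T"], drug_a["IC95R"])

print(f"IC50FCR = {ic50_fcr:.2f}")
print(f"IC95FCR = {ic95_fcr:.2f}")
```

The computed ratios agree with the tabulated fold changes only approximately; small differences may reflect rounding of the displayed inputs. A sponsor would normally submit the fold change as reported by the laboratory rather than recompute it.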
| Row | STUDYID | DOMAIN | USUBJID | VRSEQ | VRGRPID | VRREFID | VRGENTYP | VRGENROI | VRTESTCD | VRTEST |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC-123 | VR | R12345 | 1 | 1 | 16248 | SECTOR | Nucleoside Reverse Transcriptase | IC50T | IC50 Result on Treatment |
| 2 | ABC-123 | VR | R12345 | 2 | 1 | 16248 | SECTOR | Nucleoside Reverse Transcriptase | IC50R | IC50 Reference Control Result |
| 3 | ABC-123 | VR | R12345 | 3 | 1 | 16248 | SECTOR | Nucleoside Reverse Transcriptase | IC50FCR | IC50 Fold Change from Reference |
| 4 | ABC-123 | VR | R12345 | 4 | 1 | 16248 | SECTOR | Nucleoside Reverse Transcriptase | IC95T | IC95 Result on Treatment |
| 5 | ABC-123 | VR | R12345 | 5 | 1 | 16248 | SECTOR | Nucleoside Reverse Transcriptase | IC95R | IC95 Reference Control Result |
| 6 | ABC-123 | VR | R12345 | 6 | 1 | 16248 | SECTOR | Nucleoside Reverse Transcriptase | IC95FCR | IC95 Fold Change from Reference |
| 7 | ABC-123 | VR | R12345 | 7 | 1 | 16248 | SECTOR | Nucleoside Reverse Transcriptase | NETASSMT | Net Assessment |
| 8 | ABC-123 | VR | R12345 | 8 | 2 | 16248 | SECTOR | Protease | IC50T | IC50 Result on Treatment |
| 9 | ABC-123 | VR | R12345 | 9 | 2 | 16248 | SECTOR | Protease | IC50R | IC50 Reference Control Result |
| 10 | ABC-123 | VR | R12345 | 10 | 2 | 16248 | SECTOR | Protease | IC50FCR | IC50 Fold Change from Reference |
| 11 | ABC-123 | VR | R12345 | 11 | 2 | 16248 | SECTOR | Protease | IC95T | IC95 Result on Treatment |
| 12 | ABC-123 | VR | R12345 | 12 | 2 | 16248 | SECTOR | Protease | IC95R | IC95 Reference Control Result |
| 13 | ABC-123 | VR | R12345 | 13 | 2 | 16248 | SECTOR | Protease | IC95FCR | IC95 Fold Change from Reference |
| 14 | ABC-123 | VR | R12345 | 14 | 2 | 16248 | SECTOR | Protease | NETASSMT | Net Assessment |

| Row (cont) | VRSPCIES | VRSTRAIN | VRDRUG | VRORRES | VRORRESU | VRSTRESC | VRSTRESN | VRSTRESU |
|---|---|---|---|---|---|---|---|---|
| 1 | HIV | 1 | Drug A | 13.11 | umol | 13.11 | 13.11 | umol |
| 2 | HIV | 1 | Drug A | 2.99603 | umol | 2.99603 | 2.99603 | umol |
| 3 | HIV | 1 | Drug A | 4.37471 | | 4.37471 | 4.37471 | |
| 4 | HIV | 1 | Drug A | 136.77 | umol | 136.77 | 136.77 | umol |
| 5 | HIV | 1 | Drug A | 37.6061 | umol | 37.6061 | 37.6061 | umol |
| 6 | HIV | 1 | Drug A | 3.6 | | 3.6 | 3.6 | |
| 7 | HIV | 1 | Drug A | Reduced Susceptibility | | Reduced Susceptibility | | |
| 8 | HIV | 1 | Drug B | 7.97 | umol | 7.97 | 7.97 | umol |
| 9 | HIV | 1 | Drug B | 1.71997 | umol | 1.71997 | 1.71997 | umol |
| 10 | HIV | 1 | Drug B | 4.63569 | | 4.63569 | 4.63569 | |
| 11 | HIV | 1 | Drug B | 28.54 | umol | 28.54 | 28.54 | umol |
| 12 | HIV | 1 | Drug B | 11.9079 | umol | 11.9079 | 11.9079 | umol |
| 13 | HIV | 1 | Drug B | 2.4 | | 2.4 | 2.4 | |
| 14 | HIV | 1 | Drug B | Reduced Susceptibility | | Reduced Susceptibility | | |

4 PHARMACOGENOMICS FINDINGS (PF)

PF.xpt, Pharmacogenomics Findings - one record per method/setup observation per specimen collected, per date of test, per subject, Tabulation

Variable Name Variable Label STUDYID DOMAIN Study Identifier Domain Abbreviation USUBJID PFSEQ Controlled Terms or Format Type CDISC Notes Core Identifier Identifier Definition: Unique identifier for a study within the submission. Definition: Two-character abbreviation for the domain most relevant to the observation. Req Req Unique Subject Identifier Char Sequence Number Num Identifier Identifier Definition: Unique subject identifier within the submission. Definition: Sequence number given to ensure uniqueness within a dataset for a subject. Can be used to join related records.
Req Req PFGRPID Group ID Char Identifier Perm PFREFID PFLNKID Specimen ID Link ID Char Char Identifier Identifier Definition: Used to tie together a block of related records in a single domain to support relationships within the domain and between domains. Definition: The identifier of the genetic specimen being tested. Definition: Supports linking information across different domains. PFASYID PFRLOCID Assay ID Reference Result Location Genomics Test Code Char Char Identifier Identifier PFTESTCD Char Char Role **PF Char * PFTEST Pharmacogenomics Test Char Description * PFTSTRCD * PFGENLOC Test Reference Terminology Code Test Reference Terminology Name Test Reference Terminology Version Genetic Region of Interest Type Genetic Region of Interest Genetic Location PFSPCIES Biological classification Char PFTSTRNM PFTSTRVR PFGENTYP PFGENROI Char Char Char Char Char Char * Perm Perm Definition: A unique identifier for a test as maintained by a lab. Definition: Provides an external database identifier that can be used to locate the documented reference sequence. Examples: dbSNP RS Number. Topic Definition: Short name for the test. Examples: AA, CHGTYP, CDNPOS, CDNOBS, PATHTYP, POLYTYP, NINT1VAL, NINT2VAL, PVAL, FOLDCHG, LOTINT, LOGERROR Synonym Definition: The verbatim name used to obtain the measurement or finding. Qualifier Examples: Amino Acid, Genetic Change Type, Codon Position, Observed Codon, Pathological Type, Normalized Intensity Value 1, Normalized Intensity Value 2, P Value, Fold Change, Log Intensity Type, Log Error. Result Qualifier Definition: The code of the result. For example: LOINC code 48005-3 for amino acid change. Result Qualifier Definition: The name of the Reference Terminology for the result. Examples: CDISC, SNOMED, LOINC. Result Qualifier Definition: The version number of the Reference Terminology, if required. Perm Perm Result Qualifier Definition: Identifies the type of genetic region of interest, for example, GENENAME, SECTOR, PROTEIN. 
Result Qualifier Definition: Area within the DNA sequences. Example: Protease (in the case of HIV), NS3/4A, NS5B (in the case of HCV). Result Qualifier Definition: Specifies a location within a sequence pertaining to the observed results contained in PFORRES, PFSTRESC and PFSTRESN. Grouping Definition: Biological classifications for an organism capable of breeding and Qualifier producing offspring. May also be used to designate organisms. Example: HOMO SAPIENS, RAT, MOUSE, BACTERIUM, HCV, HIV Exp © 2012 Clinical Data Interchange Standards Consortium, Inc. All rights reserved Provisional Req Perm Perm Perm Perm Exp Perm Perm Page 14 December 6, 2012 CDISC Virology User Guide (Version 1.0) Variable Name Variable Label Controlled Terms or Format Type PFSTRAIN Type of Strain Char * PFCAT Category for Char Pharmacogenomics Lab Test Subcategory for Char Pharmacogenomics Lab Test * PFMUTYP Mutation Type Char PFORRES Result or Finding in Original Units Char PFORRESU Original Units Char PFSTRESC Character Result/Finding Char in Std Format PFSTRESN Numeric Result/Finding Num in Standard Units PFSTRESU Standard Units Char * PFRESRCD Result Reference Terminology Code Char * PFRESRNM Result Reference Terminology Name PFRESRVR Result Reference Terminology Version PFSCAT Role CDISC Notes Core Grouping Qualifier Grouping Qualifier Definition: A genetic variant or subtype of a micro-organism. Examples: 1a, 1b. Definition: Used to categorize types of genetic/genomic tests. Examples: MICRO ARRAY, EGFR MUTATION ANALYSIS. * Grouping Qualifier * Grouping Qualifier Definition: A further categorization of the various test types based on particular Perm characteristics of a test. Examples: OBSERVED VALUE, INTERPRETATION, PHENOTYPIC EXPRESSION Definition: Indicates whether a mutation is inheritable or not. Perm Examples: GERMLINE (UNIT) Perm Exp Result Qualifier Definition: Result of the measurement or finding as originally received or collected. Example: Observed Nucleotide value: T. 
Variable Definition: Represents the unit of measure used by PFORRES if applicable. Qualifier Example: copies/5uL, LOG10 IU/ml Exp Perm Result Qualifier Definition: Provides information such as the gene being tested for genotyping Exp tests as well as interpretations and other supporting information such as insertions and deletions or intensity and P-Value for Array tests. Example: Nucleotide change from reference sequence: A>T. Result Qualifier Definition: Used for continuous or numeric results or findings in standard Perm format; copied in numeric format from PFSTRESC. PFSTRESN should store all numeric test results or findings. Example for P-Value: 0.5391 Variable Definition: Represents the unit of measure used by PFSTRESN. Perm Qualifier Result Qualifier Definition: The code of the result. For example: R is the code for Arginine and C49488 is the code for Y. Perm Char Result Qualifier Definition: The name of the Reference Terminology for the result. For example: CDISC, SNOMED. LOINC Perm Char Result Qualifier Definition: This is the code of the result. For example; R is the code for Arginine Perm and C49488 is the code for Y. PFREFRES Reference Result Value Char Result Qualifier Definition: Reference result used to determine variations based on the reference Perm sequence. PFRESCAT Result Category Char Result Qualifier Definition: Identifies the type of result being reported. Example: RESISTANCE VARIANT Perm PFSTAT Test Status Char Record Qualifier Perm (ND) © 2012 Clinical Data Interchange Standards Consortium, Inc. All rights reserved Provisional Definition: Used to indicate exam not done. Should be null if a result exists in PFSTRESC. 
| Variable Name | Variable Label | Type | Controlled Terms or Format | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| PFREASND | Reason Test Not Done | Char | | Record Qualifier | Definition: Describes why a measurement or test was not performed, such as BROKEN EQUIPMENT, SUBJECT REFUSED, or SPECIMEN LOST. Used in conjunction with PFSTAT when value is NOT DONE. | Perm |
| PFXFN | Raw Data File or Life Science Identifier | Char | | Record Qualifier | Definition: Direct reference identifier for Microarray or Genotypic data contained in a separate file in its native format. | Perm |
| PFNAM | Vendor Name | Char | | Record Qualifier | Definition: Name or identifier of the laboratory or biotech firm that provides the test results. | Perm |
| PFSPEC | Specimen Type | Char | | Record Qualifier | Definition: Defines the type of specimen used for a measurement. Examples: DNA, RNA. | Perm |
| PFSPCCND | Specimen Condition | Char | | Record Qualifier | Definition: Free or standardized text describing the condition of the specimen. Example: CONTAMINATED. | Perm |
| PFMETHOD | Method Code for Test | Char | * | Record Qualifier | Definition: Special instructions for the execution of genomics or genetic testing. Examples: SNP PROBE, CLIP SEQUENCING, PYROSEQUENCING, BICHROME GENE EXPRESSION CHIP. | Req |
| PFBLFL | Baseline Flag | Char | (NY) | Record Qualifier | Definition: Indicator used to identify a baseline value. | Perm |
| PFDRVFL | Derived Flag | Char | (NY) | Record Qualifier | Definition: Used to indicate a derived record. | Perm |
| VISITNUM | Visit Number | Num | | Timing | Definition: 1. Clinical encounter number. 2. Numeric version of VISIT, used for sorting. | Exp |
| VISIT | Visit Name | Char | | Timing | Definition: 1. Protocol-defined description of clinical encounter. 2. May be used in addition to VISITNUM and/or VISITDY. | Perm |
| VISITDY | Planned Study Day of Visit | Num | | Timing | Definition: Planned study day of the visit based upon RFSTDTC in Demographics. | Perm |
| PFDTC | Date/Time of Specimen Collection | Char | ISO 8601 | Timing | Definition: Date/time of specimen collection. | Exp |
| PFDY | Study Day of Specimen Collection | Num | | Timing | Definition: 1. Study day of specimen collection, measured as integer days. 2. Algorithm for calculations must be relative to the sponsor-defined RFSTDTC variable in Demographics. This formula should be consistent across the submission. | Perm |
| PFTPT | Planned Time Point Name | Char | | Timing | Definition: 1. Text description of time when specimen should be taken. 2. This may be represented as an elapsed time relative to a fixed reference point, such as time of last dose. See PFTPTNUM and PFTPTREF. Examples: Start, 5 min post. | Perm |
| PFTPTNUM | Planned Time Point Number | Num | | Timing | Definition: Numerical version of PFTPT to aid in sorting. | Perm |
| PFELTM | Elapsed Time from Reference Point | Char | ISO 8601 | Timing | Definition: Elapsed time (in ISO 8601) relative to a planned fixed reference (PFTPTREF). This variable is useful where there are repetitive measures. Not a clock time or a date time variable. Examples: '-PT15M' to represent the period of 15 minutes prior to the reference point indicated by PFTPTREF, or 'PT8H' to represent the period of 8 hours after the reference point indicated by PFTPTREF. | Perm |
| PFTPTREF | Time Point Reference | Char | | Timing | Definition: Name of the fixed reference point referred to by PFELTM, PFTPTNUM, and PFTPT. Examples: PREVIOUS DOSE, PREVIOUS MEAL. | Perm |
| PFRFTDTC | Date/Time of Reference Time Point | Char | ISO 8601 | Timing | Definition: Date/time of the reference time point, PFTPTREF. | Perm |

4.1 ASSUMPTIONS FOR PHARMACOGENOMICS TEST FINDINGS (PF) DOMAIN MODEL

1. PF captures results for genetic variation and gene expression.
2. This domain is for clinical and pre-clinical use, and for tests on a study subject or an infectious microbe.
3. PFASYID is used to distinguish between records for the same genetic test performed using different assays. The combination of PFNAM, PFASYID, and PFREFID will be needed to obtain the full set of genomic data produced and sent by the lab for a specific test.
4. PFMETHOD lists techniques for the execution of genomics or genetic testing.
5. Only the p-value calculation performed by the lab and sent to the sponsor should be included in PF.
6. External terminology variables (e.g., PFRESRCD, PFRESRNM, PFRESRVR) will have examples in the forthcoming SDTMIG-PGx.
7. For viral findings, mutation type (PFMUTYP) should always be set to “GERMLINE”.
8. PFCAT is used to designate the technology used (e.g., GENETIC VARIATION, GENE EXPRESSION).

4.2 GENETIC VARIATION ASSUMPTIONS

1. PFTESTCD generally specifies what the test assessed, such as nucleic acid, amino acid, or codon.
2. PFSCAT is used to categorize the tests, for example, AMINO ACID, MUTATION, or IDENTIFIER.
3. PFASYID provides a mechanism to identify results as belonging to a common set. When a genetic test is performed on an individual subject using multiple assays, the combination of vendor name and PFASYID will support linking between the PF domains and the full set of genomic data produced and sent by the lab. This can facilitate delivering additional information to regulatory agencies, if needed.
4. PFORRES and PFSTRESC are used to store genetic and amino acid variants as well as interpretations and other supporting information such as insertions and deletions or intensity and p-values for array tests. If no standardization is being done, both variables will have identical content.
5. When results indicate a mixture of genetic results, as when two strains of a virus such as HIV are present in a sample, all the results present should be concatenated using slashes. For example, “C/T” indicates that at that nucleotide position, the virus has both cytosine and thymine present, indicating a multi-strain infection.

4.3 EXPLANATORY NOTES ON SLC DATABASE GENETIC CODES

The following information is provided for those not familiar with genetic nomenclature. Codons are made up of three nucleotides. A nucleotide may have one of the following values: A (adenosine), G (guanosine), T (thymidine), or C (cytidine). Amino acids are encoded by the nucleotides. It is the preferred convention to use a single-letter code (SLC) to identify an amino acid. This link, http://www.cbs.dtu.dk/courses/27619/codon.html, provides the mapping between the single-letter amino acid code and its full text name, which correlates to the codon values.

4.4 EXAMPLES FOR VIRAL GENETICS FINDINGS

Example 1: Only amino acid observations are being reported. In this example, the change type is a substitution.

Row 1: The DNA came from a sample taken from the study subject at Visit 1. The test assesses the observed amino acid in the genetic region shown in PFGENROI at the location given by PFGENLOC, performed by the vendor (PFNAM) using a particular method (PFMETHOD). The result is an amino acid, represented by the standard one-letter code. The record also shows a reference result (the amino acid at the same location in the reference sequence) in PFREFRES, and provides a classification of the result, based on the comparison of the observed result to the reference result, in PFRESCAT.

| Row | STUDYID | DOMAIN | USUBJID | PFSEQ | PFREFID | PFGENTYP | PFGENROI | PFTESTCD | PFTEST | PFSPCIES | PFSTRAIN | PFCAT | PFSCAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P70815101 | PF | P7081-510101201 | 1 | ABC-001 | PROTEIN | NS5B | AA | Amino Acid | HCV | 1a | GENETIC VARIATION | AMINO ACID |

| Row (cont) | PFORRES | PFGENLOC | PFREFRES | PFRESCAT | PFNAM | PFSPEC | PFMETHOD | PFBLFL | VISITNUM | VISIT | VISITDY | PFDTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | R | 65 | Q | Point Mutation | Acme Genetics | DNA | CLIP SEQUENCING | Y | 1 | Baseline | 1 | 2003-03-27 |

Example 2: This example contains both an amino acid observation and the underlying nucleic acid sequence (codon). The variant identified in these records is a deletion.

Row 1 shows the amino acid observed at a particular location in the genetic region NS5B (PFGENLOC 71) and its reference result. This record is marked as derived, since the amino acid is derived from the observed codon via the standard look-up table.
Row 2 reports the nucleic acid sequence for the associated codon and its reference result and classifies the comparison between the result and the reference result as a DELETION.

| Row | STUDYID | DOMAIN | USUBJID | PFSEQ | PFREFID | PFGENTYP | PFGENROI | PFTESTCD | PFTEST | PFSPCIES | PFSTRAIN | PFCAT | PFSCAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P70815101 | PF | P7081510106891 | 1 | ABC-003 | PROTEIN | NS5B | AA | Amino Acid | HCV | 1b | GENETIC VARIATION | AMINO ACID |
| 2 | P70815101 | PF | P7081510106891 | 2 | ABC-003 | PROTEIN | NS5B | CDN | Codon | HCV | 1b | GENETIC VARIATION | NUCLEOTIDE |

| Row (cont) | PFORRES | PFSTRESC | PFGENLOC | PFREFRES | PFRESCAT | PFNAM | PFSPEC | PFMETHOD | PFBLFL | VISITNUM | VISIT | VISITDY | PFDTC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | I | I | 71 | V | | Acme Genetics | DNA | CLIP SEQUENCING | Y | 1 | Baseline | 1 | 2003-03-27 |
| 2 | ATT | ATT | 213 | GTT | DELETION | Acme Genetics | DNA | CLIP SEQUENCING | Y | 1 | Baseline | 1 | 2003-03-27 |

Example 3: The example below focuses on how variations would be reported at the nucleotide level. Note that the change type record is not shown, but would be recorded just as shown in previous examples. There is one record for which the observed nucleotide is different from the reference result. The nucleic acid at this position is missing, so the change type (PFRESCAT) is "DELETION". Nucleotide-level reporting is suggested only for special circumstances such as frame shifts, since it tends to greatly increase the size of the data files. Codon-level reporting (as in the previous two examples) saves roughly 66% of the space, because one codon record replaces three nucleotide records.
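The relationship between codon records and amino acid records used in Examples 1 and 2 can be sketched in Python as follows. This is an illustration only, not a method defined by this guide: the function names `translate_codon` and `codon_number` are hypothetical, the look-up uses the standard genetic code, and the position mapping assumes 1-based, in-frame numbering from the start of the region of interest.

```python
# Illustrative sketch: derive an amino acid (single-letter code) from a codon
# and map a nucleotide position to the codon that contains it.
# Function names and the 1-based, in-frame numbering are assumptions.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate_codon(codon: str) -> str:
    """Return the single-letter amino acid code for a DNA codon ('*' = stop)."""
    return CODON_TABLE[codon.upper()]

def codon_number(nucleotide_position: int) -> int:
    """Return the 1-based codon number containing a 1-based nucleotide position."""
    if nucleotide_position < 1:
        raise ValueError("positions are 1-based")
    return (nucleotide_position - 1) // 3 + 1

# Codon record from Example 2: observed ATT, reference GTT, location 213.
observed, reference = "ATT", "GTT"
print(translate_codon(observed), translate_codon(reference))
print(codon_number(213))
```

Under these assumptions the sketch yields amino acid I against reference V at codon 71, which corresponds to the amino acid record in Example 2. Each codon covers three nucleotide positions, which is why codon-level reporting needs about one third as many records as nucleotide-level reporting.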
Row 1 shows the deletion of a nucleotide at a particular position. The absence of a nucleotide at this position is represented as the result “NONE”.
Rows 2-9 show adjacent nucleotide positions, which are unchanged.

| Row | STUDYID | DOMAIN | USUBJID | PFSEQ | PFREFID | PFGENTYP | PFGENROI | PFTESTCD | PFTEST | PFASYID | PFSPCIES | PFSTRAIN | PFCAT | PFSCAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | P70815101 | PF | P341510106345 | 1 | ABC-004 | PROTEIN | NS5B | NUC | Nucleotide | D391395001 | HCV | 1a | GENETIC VARIATION | NUCLEOTIDE |
| 2 | P70815101 | PF | P341510106345 | 2 | ABC-004 | PROTEIN | NS5B | NUC | Nucleotide | D391395001 | HCV | 1a | GENETIC VARIATION | NUCLEOTIDE |
| 3 | P70815101 | PF | P341510106345 | 3 | ABC-004 | PROTEIN | NS5B | NUC | Nucleotide | D391395001 | HCV | 1a | GENETIC VARIATION | NUCLEOTIDE |
| 4 | P70815101 | PF | P341510106345 | 4 | ABC-004 | PROTEIN | NS5B | NUC | Nucleotide | D391395001 | HCV | 1a | GENETIC VARIATION | NUCLEOTIDE |
| 5 | P70815101 | PF | P341510106345 | 5 | ABC-004 | PROTEIN | NS5B | NUC | Nucleotide | D391395001 | HCV | 1a | GENETIC VARIATION | NUCLEOTIDE |
| 6 | P70815101 | PF | P341510106345 | 6 | ABC-004 | PROTEIN | NS5B | NUC | Nucleotide | D391395001 | HCV | 1a | GENETIC VARIATION | NUCLEOTIDE |
| 7 | P70815101 | PF | P341510106345 | 7 | ABC-004 | PROTEIN | NS5B | NUC | Nucleotide | D391395001 | HCV | 1a | GENETIC VARIATION | NUCLEOTIDE |
| 8 | P70815101 | PF | P341510106345 | 8 | ABC-004 | PROTEIN | NS5B | NUC | Nucleotide | D391395001 | HCV | 1a | GENETIC VARIATION | NUCLEOTIDE |
| 9 | P70815101 | PF | P341510106345 | 9 | ABC-004 | PROTEIN | NS5B | NUC | Nucleotide | D391395001 | HCV | 1a | GENETIC VARIATION | NUCLEOTIDE |

| Row (cont) | PFORRES | PFSTRESC | PFSTRESN | PFGENLOC | PFREFRES | PFRESCAT | PFNAM | PFSPEC | PFMETHOD | PFBLFL |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | NONE | NONE | | 213 | A | DELETION | Acme Genetics | DNA | DIRECT SEQUENCING | Y |
| 2 | T | T | | 214 | | | Acme Genetics | DNA | DIRECT SEQUENCING | Y |
| 3 | C | C | | 215 | | | Acme Genetics | DNA | DIRECT SEQUENCING | Y |
| 4 | A | A | | 216 | | | Acme Genetics | DNA | DIRECT SEQUENCING | Y |
| 5 | A | A | | 217 | | | Acme Genetics | DNA | DIRECT SEQUENCING | Y |
| 6 | G | G | | 218 | | | Acme Genetics | DNA | DIRECT SEQUENCING | Y |
| 7 | A | A | | 219 | | | Acme Genetics | DNA | DIRECT SEQUENCING | Y |
| 8 | G | G | | 220 | | | Acme Genetics | DNA | DIRECT SEQUENCING | Y |
| 9 | T | T | | 221 | | | Acme Genetics | DNA | DIRECT SEQUENCING | Y |

Example 4: This is an example of viral genetic testing undertaken to determine drug resistance. Records come in pairs, one record for the observed codon and one for the amino acid coded by the observed codon. This distinction is made in PFSCAT. All records are for the same sample of RNA from a strain 1a of HIV.

Rows 1 and 2: These results show a variation in the Protease region of the virus. The change in the codon shown in Row 2 is classified as a point mutation. The change in amino acid is classified as a resistance mutation (PFRESCAT).
Rows 3-26: These rows illustrate the representation of other variants in a similar manner.

Row 1 2 3 STUDYID STDY505357 STDY505357 DOMAIN PF USUBJID 521298 PFSEQ 1 PFGRPID 1 PFREFID D391395 PFGENTYP SECTOR PFGENROI Protease PFTESTCD AA PF 521298 2 1 D391395 SECTOR Protease CDN STDY505357 PF 521298 3 2 D391395 SECTOR Protease AA © 2012 Clinical Data Interchange Standards Consortium, Inc.
All rights reserved Provisional PFTEST Amino Acid Codon PFSPCIES HIV PFSTRAIN 1a HIV 1a Amino Acid HIV 1a PFCAT GENETIC VARIATION GENETIC VARIATION GENETIC VARIATION PFSCAT AMINO ACID NUCLEOTIDE AMINO ACID Page 21 December 6, 2012 CDISC Virology User Guide (Version 1.0) Row 4 STUDYID STDY505357 DOMAIN PF USUBJID 521298 PFSEQ 4 PFGRPID 2 PFREFID D391395 PFGENTYP SECTOR PFGENROI Protease PFTESTCD CDN PFTEST Codon PFSPCIES HIV PFSTRAIN 1a PFCAT GENETIC VARIATION PFSCAT NUCLEOTIDE 5 STDY505357 STDY505357 PF 521298 5 3 D391395 SECTOR Protease AA HIV 1a 521298 6 3 D391395 SECTOR Protease CDN HIV 1a GENETIC VARIATION GENETIC VARIATION AMINO ACID PF Amino Acid Codon STDY505357 STDY505357 PF 521298 7 4 D391395 SECTOR Protease AA HIV 1a PF 521298 8 4 D391395 SECTOR Protease CDN Amino Acid Codon HIV 1a 9 STDY505357 PF 521298 9 5 D391395 SECTOR Reverse Transcriptase AA Amino Acid HIV 1a GENETIC VARIATION AMINO ACID 10 STDY505357 PF 521298 10 5 D391395 SECTOR Reverse Transcriptase CDN Codon HIV 1a GENETIC VARIATION NUCLEOTIDE 11 STDY505357 PF 521298 11 6 D391395 SECTOR Reverse Transcriptase AA Amino Acid HIV 1a GENETIC VARIATION AMINO ACID 12 STDY505357 PF 521298 12 6 D391395 SECTOR Reverse Transcriptase CDN Codon HIV 1a GENETIC VARIATION NUCLEOTIDE 13 STDY505357 PF 521298 13 7 D391395 SECTOR Reverse Transcriptase AA Amino Acid HIV 1a GENETIC VARIATION AMINO ACID 14 STDY505357 PF 521298 14 7 D391395 SECTOR Reverse Transcriptase CDN Codon HIV 1a GENETIC VARIATION NUCLEOTIDE 15 STDY505357 PF 521298 15 8 D391395 SECTOR Reverse Transcriptase AA Amino Acid HIV 1a GENETIC VARIATION AMINO ACID 16 STDY505357 PF 521298 16 8 D391395 SECTOR Reverse Transcriptase CDN Codon HIV 1a GENETIC VARIATION NUCLEOTIDE 17 STDY505357 STDY505357 PF 521298 17 9 D391395 SECTOR Protease AA HIV 1a 521298 18 9 D391395 SECTOR Protease CDN HIV 1a GENETIC VARIATION GENETIC VARIATION AMINO ACID PF Amino Acid Codon STDY505357 PF 521298 19 10 D391395 SECTOR Protease AA Amino Acid HIV 1a 6 7 8 18 19 © 2012 
Clinical Data Interchange Standards Consortium, Inc. All rights reserved Provisional GENETIC VARIATION GENETIC VARIATION GENETIC VARIATION NUCLEOTIDE AMINO ACID NUCLEOTIDE NUCLEOTIDE AMINO ACID Page 22 December 6, 2012 CDISC Virology User Guide (Version 1.0) Row 20 STUDYID STDY505357 DOMAIN PF USUBJID 521298 PFSEQ 20 PFGRPID 10 PFREFID D391395 PFGENTYP SECTOR PFGENROI Protease PFTESTCD CDN PFTEST Codon PFSPCIES HIV PFSTRAIN 1a PFCAT GENETIC VARIATION PFSCAT NUCLEOTIDE 21 STDY505357 STDY505357 PF 521298 21 11 D391395 SECTOR Protease AA HIV 1a 521298 22 11 D391395 SECTOR Protease CDN HIV 1a GENETIC VARIATION GENETIC VARIATION AMINO ACID PF Amino Acid Codon STDY505357 STDY505357 PF 521298 23 12 D391395 SECTOR Protease AA HIV 1a PF 521298 24 12 D391395 SECTOR Protease CDN Amino Acid Codon HIV 1a STDY505357 STDY505357 PF 521298 25 13 D391395 SECTOR Protease AA HIV 1a PF 521298 26 13 D391395 SECTOR Protease CDN Amino Acid Codon HIV 1a 22 23 24 25 26 GENETIC VARIATION GENETIC VARIATION GENETIC VARIATION GENETIC VARIATION Row (cont) 1 PFORRES PFSTRESC PFGENLOC PFREFRES PFRESCAT PFNAM PFSPEC PFMETHOD PFBLFL I I 10 L Resistance Mutation DNA ATC ATC 28 CTC Point Mutation CLIP SEQUENCING CLIP SEQUENCING Y 2 Acme Genetics Acme Genetics 3 G G 17 G Silent Mutation DNA GGG GGG 49 GGR Duplication CLIP SEQUENCING CLIP SEQUENCING Y 4 Acme Genetics Acme Genetics 5 I I 13 V Polymorphism DNA ATA ATA 37 GTA Point Variation CLIP SEQUENCING CLIP SEQUENCING Y 6 Acme Genetics Acme Genetics 7 L L 33 I Unexpected Mutation DNA TTA TTA 97 ATA Point Mutation CLIP SEQUENCING CLIP SEQUENCING Y 8 Acme Genetics Acme Genetics © 2012 Clinical Data Interchange Standards Consortium, Inc. 
All rights reserved Provisional DNA DNA DNA DNA NUCLEOTIDE AMINO ACID NUCLEOTIDE AMINO ACID NUCLEOTIDE PFDRVFL Y Y Y Y Page 23 December 6, 2012 CDISC Virology User Guide (Version 1.0) Row (cont) 9 PFORRES PFSTRESC PFGENLOC PFREFRES PFRESCAT PFNAM PFSPEC PFMETHOD PFBLFL M M 41 L Resistance Mutation Acme Genetics DNA CLIP SEQUENCING Y 10 ATG ATG 121 TTA Point Mutation Acme Genetics DNA CLIP SEQUENCING Y 11 V V 90 V Silent Mutation DNA GTT GTT 268 GTY Deletion CLIP SEQUENCING CLIP SEQUENCING Y 12 Acme Genetics Acme Genetics 13 I I 135 V Polymorphism Acme Genetics DNA CLIP SEQUENCING Y 14 ATA ATA 103 GTA Point Variation Acme Genetics DNA CLIP SEQUENCING Y 15 K K 70 E Unexpected Mutation Acme Genetics DNA CLIP SEQUENCING Y 16 AAA AAA 208 AGA Point Mutation Acme Genetics DNA CLIP SEQUENCING Y 17 G G 48 V Unexpected Mutation DNA GGG GGG 142 GTG Point Mutation CLIP SEQUENCING CLIP SEQUENCING Y 18 Acme Genetics Acme Genetics 19 K K 20 K/R Unexpected Mutation DNA AAG AAG 58 ARG Point Mutation CLIP SEQUENCING CLIP SEQUENCING Y 20 Acme Genetics Acme Genetics 21 M M 36 I Unexpected Mutation DNA 106 106 106 ATA Point Mutation CLIP SEQUENCING CLIP SEQUENCING Y 22 Acme Genetics Acme Genetics 23 A A 71 V Unexpected Mutation DNA GCT GCT 211 GTT Point Mutation CLIP SEQUENCING CLIP Y 24 Acme Genetics Acme © 2012 Clinical Data Interchange Standards Consortium, Inc. All rights reserved Provisional DNA DNA DNA DNA DNA PFDRVFL Y Y Y Y Y Page 24 December 6, 2012 CDISC Virology User Guide (Version 1.0) Row (cont) PFORRES PFSTRESC PFGENLOC PFREFRES PFRESCAT PFNAM PFSPEC Genetics 25 V V 82 T/S Unexpected Mutation 26 GCT GCT 244 GTT Point Mutation PFMETHOD PFBLFL PFDRVFL SEQUENCING Acme Genetics Acme Genetics DNA CLIP SEQUENCING CLIP SEQUENCING DNA Y Y The PB domain that would accompany this findings example can be found in Example 1 of the PB domain. Example 5: The example shows typical results for genetic variation tests. 
The Findings data structure was extended to accommodate genetic concepts. It now supports genetic region of interest (PFGENROI), its type (PFGENTYP), the reference result (PFREFRES), and genetic location (PFGENLOC). Rows 1 and 2 both show the use of the HUGO nomenclature in the PFSTRESC variable. The reference sequence can be represented in the PG domain as a row with a test code of GENBNKID. The result variable in PG reports the NIH Genetic Sequence Database (GenBank) accession number associated with the reference sequence being used. The XFN variable may contain a pointer to a file containing the entire reference sequence. This example also shows the use of the new terminology variables (PFTSTRCD, PFTSTRNM, PFTSTRVR) to link the tests to an external dictionary such as LOINC.

Rows 1-2 show that at nucleotide position 2155 a genetic change occurred in which a G was changed to an A. They also show that a change occurred from the amino acid Glycine (Gly) at location 719 to Serine. Since the amino acid change is not directly observed but derived from the variation, the derived flag is set to "Y" if the sponsor is performing the interpretation.

| Row | STUDYID | DOMAIN | USUBJID | PFSEQ | PFREFID | PFASYID | PFTESTCD | PFTEST | PFTSTRCD | PFTSTRNM | PFTSTRVR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ABC-01234 | PF | 17C0154 | 1 | 8250863 | X421395001 | AA | Amino Acid | 48005-3 | LOINC | 2.40 |
| 2 | ABC-01234 | PF | 17C0154 | 2 | 8250863 | X421395001 | CDN | Codon | 48004-6 | LOINC | 2.40 |

| Row (cont) | PFGENTYP | PFGENROI | PFGENLOC | PFSPCIES | PFCAT | PFORRES | PFSTRESC | PFREFRES | PFRESCAT | PFNAM | PFSPEC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SECTOR | Protease | 13 | HCV | GENETIC VARIATION | S | C.13G>S | G | Disease Mutation | Biotech ABC | DNA |
| 2 | SECTOR | Protease | 2155 | HCV | GENETIC VARIATION | AGC | C.2155GGC>AGC | GGC | Point Mutation | Biotech ABC | DNA |

5 PHARMACOGENOMICS/GENETICS METHODS AND SUPPORTING INFORMATION (PG)

PG.xpt, Pharmacogenomics - Findings.
One record per method/setup observation per specimen collected, per date of test, per subject, Tabulation

| Variable Name | Variable Label | Type | Controlled Terms or Format | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | Char | | Identifier | Definition: Unique identifier for a study within a submission. | Req |
| DOMAIN | Domain Abbreviation | Char | **PG | Identifier | Definition: Two-character abbreviation for the domain most relevant to the observation. | Req |
| USUBJID | Unique Subject Identifier | Char | | Identifier | Definition: Unique subject identifier within a submission. | Req |
| PGSEQ | Sequence Number | Num | | Identifier | Definition: Sequence number given to ensure uniqueness of records for a subject within a dataset. Can be used to join related records. | Req |
| PGGRPID | Group ID | Char | | Identifier | Definition: Used to tie together a block of related records in a single domain to support relationships within the domain and between domains. Example: For PGx, a simple numbering convention works quite well (e.g., 1, 2, 3). | Perm |
| PGREFID | Specimen ID | Char | | Identifier | Definition: The identifier of the genetic specimen being tested. Example: Specimen ID. | Perm |
| PGSPID | Sponsor ID | Char | | Identifier | Definition: Optional sponsor-defined reference number. | Perm |
| PGLNKID | Link ID | Char | | Identifier | Definition: Supports linking information across different domains. | Perm |
| PGASYID | Assay ID | Char | | Identifier | Definition: A unique identifier for a test as maintained by a lab. | Exp |
| PGTESTCD | Pharmacogenomics Test Code | Char | * | Topic | Definition: Short name for the test or measurement described in PGTEST. Examples: QTYEXT for DNA or RNA Quantity Extracted; ACTSEQ for Active Sequence; NORMMETH for Normalization Technique; DIAG for Diagnosis; DNAPUR for DNA Purity; LBLCMPND for Label Compound; EXON for Exon with Change; EXONSEQ for Exons Sequenced. | Req |

© 2012 Clinical Data Interchange Standards Consortium, Inc.
All rights reserved Provisional Topic Page 26 December 6, 2012 CDISC Virology User Guide (Version 1.0) Variable Name Variable Label Controlled Terms or Format Type PGTEST Pharmacogenomics Test Description Char * PGTSTRCD Char * Char * PGGENROI Test Reference Terminology Code Test Reference Terminology Name Test Reference Terminology Version Genetic Region of Interest Type Genetic Region of Interest PGSPCIES Biological classification Char * PGSTRAIN Strain Char * PGCAT Category for Pharmacogenomics Test Char * PGSCAT Reference Subcategory for Char Pharmacogenomics Test * PGORRES Result or Finding in Original Units Char PGORRESU Original Units Char PGSTRESC Character Result/Finding Char in Std Format PGSTRESN Numeric Result/Finding in Num Standard Units PGSTRESU Standard Units PGTSTRNM PGTSTRVR PGGENTYP Char Char CDISC Notes Core Synonym Qualifier Definition: Verbatim name of the test or examination used to obtain the measurement or finding. Example: Quantity Extracted Result Qualifier Definition: The code of the test. Example: 48019-4 is the code for Genetic Change Type using LOINC. Result Qualifier Definition: The name of the Reference Terminology for the test. Example: CDISC, SNOMED, LOINC. Result Qualifier Definition: The version number of the Reference Terminology, if required. Req Definition: Identifies the type of genetic region of interest, for example, GENENAME, SECTOR, PROTEIN. Definition: Area within the DNA sequences. Result Qualifier Example: Protease (in the case of HIV), NS3/4A, NS5B (in the case of HCV). Grouping Definition: Biological classifications for an organism capable of breeding Qualifier and producing offspring. Examples: HOMO SAPIENS, RAT, MOUSE, STAPHYLOCCCUS AUREUS, HCV Grouping Definition: A genetic variant or subtype of a micro-organism. Qualifier Example: 1a for HCV Grouping Definition: Used to categorize types of genetic/genomic tests. Qualifier Examples: GENETIC VARIATION, GENE EXPRESSION. 
Exp Result Qualifier Char Char Role (UNIT) * © 2012 Clinical Data Interchange Standards Consortium, Inc. All rights reserved Provisional Grouping Qualifier Definition: A further categorization of the various test types based on particular characteristics of a test. Examples: INTERPRETATION, SETUP, and QC. Result Qualifier Definition: Result of the measurement or finding as originally received or collected. Example for Exons Sequenced: 13-21 Variable Definition: Represents the unit of measure used by PGORRESU if Qualifier applicable. Example: copies/5ul, LOG10 IU/ml Result Qualifier Definition: An expression of the genetic change recorded in PGORRES in standard nomenclature such HUGO. Result Qualifier Definition: Used for continuous or numeric results or findings in standard format; copied in numeric format from PGSTRESC. PGSTRESN should store all numeric test results or findings. Example: Exon that is exhibiting the variant: 18 Variable Definition: Represents the unit of measure used by STRESN if applicable. Qualifier Example: copies/5ul Perm Exp Exp Exp Perm Perm Exp Perm Exp Perm a Exp Perm Perm Page 27 December 6, 2012 CDISC Virology User Guide (Version 1.0) Variable Name Variable Label Controlled Terms or Format Type PGRESRCD Result Reference Terminology Code Char PGRESRNM Result Reference Terminology Name PGRESRVR * Role CDISC Notes Core Result Qualifier Definition: The code of the result. For example: R is the code for Arginine and C49488 is the code for Y. Exp Char Result Qualifier Definition: The name of the Reference Terminology for the result. For example; CDISC, SNOMED, LOINC Exp Result Reference Terminology Version Char Result Qualifier Definition: The version number of the Reference Terminology, if applicable. Example: LOINC 2.38. Exp PGSTAT Completion Status Char PGREASND Reason Test Not Done Char PGXFN Raw Data File or LSID Char Record Qualifier Definition: Direct reference identifier for a raw Microarray or Genotypic data Perm file. 
PGNAM Vendor Name Char Record Qualifier Definition: Name or identifier of the laboratory or biotech firm who provides Perm the test results. PGSPEC Specimen Type Char Specimen Condition Char PGMETHOD Method of Test or Examination Char VISITNUM Visit Number Num Timing VISIT Visit Name Char Timing Definition: Defines the type of specimen used for a measurement. Examples: TISSUE, SERUM, PLASMA, TUMOR, DNA, RNA Definition: Free or standardized text describing the condition of the specimen. Example: HEMOLYZED, ICTERIC, LIPEMIC, FRESH, FROZEN, PARAFFIN-EMBEDDED. Definition: Special instructions for the execution of genomics or genetic testing. Examples: SNP PROBE, LASER MICRODISSECTION, POPULATION, CLIP SEQUENCING, DIRECT SEQUENCING, PYROSEQUENCING, REAGENT, GENE chip such as AGILENT or AFFYMETRIX. Definition: 1. Clinical encounter number. 2. Numeric version of VISIT, used for sorting. Definition: 1. Protocol-defined description of clinical encounter 2. May be used in addition to VISITNUM and/or VISITDY Perm PGSPCCND Record Qualifier Record Qualifier VISITDY Planned Study Day of Visit Num Timing Definition: Planned study day of the visit based upon RFSTDTC in Demographics. Perm **ND * * © 2012 Clinical Data Interchange Standards Consortium, Inc. All rights reserved Provisional Result Qualifier Definition: Used to indicate exam not done. Should be null if a result exists Perm in PGORRES. Record Definition: Describes why a measurement or test was not performed. Perm Qualifier Examples: BROKEN EQUIPMENT, SUBJECT REFUSED, SPECIMEN LOST AND AMPLIFYING PROBLEM. Record Qualifier Perm Req Exp Perm Page 28 December 6, 2012 CDISC Virology User Guide (Version 1.0) Variable Name PGDTC Variable Label Char Controlled Terms or Format ISO 8601 Role CDISC Notes Core Timing Definition: Date and time of specimen collection. Exp Num Timing Perm PGTPT Planned Time Point Name Char Timing PGTPTNUM Planned Time Point Number Num Timing Definition: 1. 
Study day of specimen collection, measured as integer days. 2. Algorithm for calculations must be relative to the sponsor-defined RFSTDTC variable in Demographics. This formula should be consistent across the submission. Definition: 1. Text Description of time when specimen should be taken. 2. This may be represented as an elapsed time relative to a fixed reference point, such as time of last dose. See PGTPTNUM and PGTPTREF. Examples: Start, 5 min post. Definition: Numerical version of PGTPT to aid in sorting. PGELTM Elapsed Time from Reference Point Char PGTPTREF Time Point Reference Char PGRFTDTC Date/Time of Reference Time Point Char PGDY Date/Time of Specimen Collection Study Day of Specimen Collection Type ISO 8601 Timing Timing ISO 8601 Timing Perm Perm Definition: Elapsed time (in ISO 8601) relative to a planned fixed reference Perm (PGTPTREF). This variable is useful where there are repetitive measures. Not a clock time or a date time variable. Examples: '-P15M' to represent the period of 15 minutes prior to the reference point indicated by PGTPTREF, or 'P8H' to represent the period of 8 hours after the reference point indicated by PGTPTREF. Definition: Name of the fixed reference point referred to by PGELTM, Perm PGTPTNUM, and PGTPT. Examples: PREVIOUS DOSE, PREVIOUS MEAL. Date/time of the reference time point, PGTPTREF. Perm * Indicates variable may be subject to controlled terminology © 2012 Clinical Data Interchange Standards Consortium, Inc. All rights reserved Provisional Page 29 December 6, 2012 CDISC Virology User Guide (Version 1.0) 5.1 ASSUMPTIONS FOR PHARMACOGENOMICS (PG) DOMAIN MODEL 1. 2. 3. 4. 5. 6. 7. 8. 5.2 PG will capture information about the test methodology that contributes to the understanding of the test results contained in the PF domain. These are methods that are used for test setup or quality control. This domain is for both clinical and pre-clinical use and for tests on both study subjects and infectious microbes. 
3. This domain contains the following new variables: PGTSTRCD, PGTSTRNM, PGTSTRVR, PGRESRCD, PGRESRNM, and PGRESRVR. These are needed to link to external terminology such as LOINC. Examples for use of these variables will be included in the forthcoming SDTMIG-PGx document.
4. Additional data elements that are specific to Pharmacogenomics Findings will be supported via the use of Supplemental Qualifiers. Examples of typical data that might be submitted via a SUPPPG dataset include those listed in the table in Section 5.2.
5. PGREFID should contain an identifier for the DNA or RNA extraction sample.
6. PGREASND is used in conjunction with PGSTAT when the PGSTAT value is NOT DONE.
7. PGTESTCD and PGTEST should not include gene codes. Whether collecting the complete detail for a variation or mutation or only a subset, the test code GENEID will be used to collect the gene of interest in the results variable. CDISC plans to use test codes that correspond to LOINC codes.
8. When using the pharmacogenomics domains for viral test reporting, identification of the virus requires that the virus name be placed in the PGSPCIES field and that the strain, type, or subtype, if available, be placed in the PGSTRAIN field.

5.2 LIST OF IDENTIFIED COMMON SUPPQUALS

| QNAM | QLABEL | COMMENTS |
|---|---|---|
| GNANLDTC | Gene Analysis Date and Time | Can be used to indicate when genetic/genomic data was re-evaluated against the public database(s). |
| RPANLDTC | Reported Gene Analysis Date and Time | Used to reference the date/time results were reported back to the sponsor. |
| ACCESNO | ACCESSION NUMBER | The accession number is obtained from the lab for a lab test. Accession numbers can be useful in cases where the FDA requests more information for a particular test. The accession number helps locate information in the public databases. Note: DO NOT confuse this with Assay ID, which is an identifier assigned to the work order at the lab. |
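Several of the PG assumptions and variable notes can be read as record-level consistency rules. The sketch below is illustrative only and is not part of the standard: it assumes each PG record is held as a Python dict keyed by variable name, and the function name `check_pg_record` and the message wording are hypothetical. The strain-without-species check is an inference from the viral reporting assumption rather than a rule stated in the guide.

```python
def check_pg_record(rec):
    """Return a list of problem messages for one PG record (dict of variable -> value)."""
    problems = []
    stat = (rec.get("PGSTAT") or "").strip()
    reasnd = (rec.get("PGREASND") or "").strip()
    orres = (rec.get("PGORRES") or "").strip()
    species = (rec.get("PGSPCIES") or "").strip()
    strain = (rec.get("PGSTRAIN") or "").strip()

    # PGREASND is used in conjunction with PGSTAT when PGSTAT is NOT DONE.
    if reasnd and stat != "NOT DONE":
        problems.append("PGREASND is populated but PGSTAT is not NOT DONE")

    # PGSTAT should be null if a result exists in PGORRES.
    if stat and orres:
        problems.append("PGSTAT should be null when PGORRES holds a result")

    # Assumed check: a strain or subtype only makes sense once the virus
    # itself has been named in PGSPCIES.
    if strain and not species:
        problems.append("PGSTRAIN is populated but PGSPCIES is empty")

    return problems


# Usage: a completed test, and a test that was not performed.
done = {"PGTESTCD": "GENEID", "PGORRES": "EGFR"}
not_done = {"PGTESTCD": "EXONSEQ", "PGSTAT": "NOT DONE", "PGREASND": "SPECIMEN LOST"}
print(check_pg_record(done))      # no problems expected
print(check_pg_record(not_done))  # no problems expected
```

Returning messages instead of raising an exception keeps the sketch usable for batch review of a whole dataset, where a reviewer usually wants every finding listed at once.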
5.3 EXAMPLES OF TESTCDS FOR REFERENCING PUBLIC DATABASES

The following examples show how reference databases can be identified using the PGTEST and PGTESTCD fields.

| PGTESTCD | PGTEST | COMMENTS |
|---|---|---|
| XREFDB | Reference Database | Identifies the public database used to identify variants and mutations. |
| HGNCID | HUGO ID | HUGO Gene Nomenclature Committee Database Reference. For example, the HGNCID for the EGFR gene is HGNC: 3326 |
| PROTDB | HUMAN PROTEIN DATABASE | Database Reference |
| GENBNKID | Gen Bank ID of Reference | GenBank Identification of Reference |

5.4 PG EXAMPLES

Example 1: This example shows genotypic data collected about the exons that were sequenced, the associated gene name, the sequence position, and length.

Row 1 specifies the gene name related to the results in the PF domain.
Row 2 identifies the GenBank identifier for the Reference Sequence.
Row 3 identifies the exons that were sequenced during the test.
Row 4 points to additional documentation related to the conduct of the genetic tests whose results will be reported in the PF domain.
Rows 5-6 show the quality control information for the starting position and length of the gene sequence used in the test (SEQSTART and SEQLTH).

| Row | STUDYID | DOMAIN | USUBJID | PGSEQ | PGGRPID | PGREFID | PGTESTCD |
|---|---|---|---|---|---|---|---|
| 1 | ABC01234 | PG | 17C0154 | 1 | 1 | 5493283 | GENEID |
| 2 | ABC01234 | PG | 17C0154 | 2 | 1 | 5493283 | GENBNKID |
| 3 | ABC01234 | PG | 17C0154 | 3 | 1 | 5493283 | EXONSEQ |
| 4 | ABC01234 | PG | 17C0154 | 4 | 1 | 5493283 | REFERENC |
| 5 | ABC01234 | PG | 17C0154 | 5 | 1 | 5493283 | SEQSTART |
| 6 | ABC01234 | PG | 17C0154 | 6 | 1 | 5493283 | SEQLTH |

| Row (cont) | PGTEST | PGASYID | PGGENTYP | PGGENROI | PGCAT | PGNAM |
|---|---|---|---|---|---|---|
| 1 | Gene Identifier | 1 | GENE | EGFR | IDENTIFIER | Biotech ABC |
| 2 | Gen Bank ID of Reference | 1 | GENE | EGFR | IDENTIFIER | Biotech ABC |
| 3 | Exons Sequenced | 1 | GENE | EGFR | VARIANT | Biotech ABC |
| 4 | Submission Reference Document | 1 | GENE | EGFR | VARIANT | Biotech ABC |
| 5 | Sequence Start | 1 | GENE | EGFR | VARIANT | Biotech ABC |
| 6 | Sequence Length | 1 | GENE | EGFR | VARIANT | Biotech ABC |

| Row (cont) | PGORRES | PGSTRESC | PGSTRESN | PGSPEC | PGMETHOD | PGDTC |
|---|---|---|---|---|---|---|
| 1 | EGFR | EGFR | | Tumor | MASSIVELY PARALLEL SEQUENCING | 2012-10-23T10:06 |
| 2 | NM_005228.3 | NM_005228.3 | | Tumor | MASSIVELY PARALLEL SEQUENCING | 2012-10-23T10:06 |
| 3 | 13-21 | 13-21 | | DNA | MASSIVELY PARALLEL SEQUENCING | 2012-10-23T10:06 |
| 4 | 5.23.445.1.4.165008.1.8:86175 | 5.23.445.1.4.165008.1.8:86175 | | | MASSIVELY PARALLEL SEQUENCING | 2012-10-23T10:06 |
| 5 | 1 | 1 | 1 | DNA | MASSIVELY PARALLEL SEQUENCING | 2012-10-23T10:06 |
| 6 | 5616 | 5616 | 5616 | DNA | MASSIVELY PARALLEL SEQUENCING | 2012-10-23T10:06 |

6 PGX BIOLOGICAL STATE (PB)

PB.xpt, Pharmacogenomics Biological State - Special Purpose Domain. One record per biomarker used in the study, tabulation.

| Variable Name | Variable Label | BRIDG Mapping | Type | ISO 21090 Datatype | Controlled Terms or Format | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | | Char | | | Identifier | Definition: Unique identifier for a study. | Req |
| DOMAIN | Domain Abbreviation | | Char | | **PB | Identifier | Definition: Two-character abbreviation for the domain. | Req |
| PBSEQ | Sequence Number | | Num | | | Identifier | Definition: Sequence number given to ensure uniqueness within a dataset for a subject. Can be used to join related records. | Req |
| PBMRKRID | Biological State Identifier | | Char | | | Identifier | Definition: Uniquely identifies an individual or group of biological markers that have medical significance in the study. | Req |
| PBGENTYP | Genetic Region of Interest Type | | Char | | | Record Qualifier | Definition: Identifies the type of genetic region of interest, for example, GENENAME, SECTOR, PROTEIN. | Exp |
| PBGENROI | Genetic Region of Interest | | Char | | | Record Qualifier | Definition: Area within the DNA sequences. Example: Protease (in the case of HIV), NS3/4A, NS5B (in the case of HCV). | Perm |
| PBSPCIES | Biological classification | | Char | | * | Grouping Qualifier | Definition: Biological classifications for an organism capable of breeding and producing offspring. May also be used to designate organisms. Examples: HOMO SAPIENS, RAT, MOUSE, BACTERIUM, HCV, HIV | Perm |
| PBSTRAIN | Type of Strain | | Char | | * | Grouping Qualifier | Definition: A genetic variant or subtype of a micro-organism. Examples: 1a, 1b. | Perm |
| PBDRUG | Drug Name | | Char | | | Record Qualifier | Definition: The name of the drug for which resistance is based on genetic biological markers. Examples: Saquinavir, Indinavir | Exp |
| PBDIAG | Diagnosis | | Char | | | Record Qualifier | Definition: Disease diagnosis based on detected genetic biological markers. Example: Adenocarcinoma. | Exp |
| PBMRKR | Biological Marker | | Char | | | Record Qualifier | Definition: Identifies a biological marker that is part of the group identified by PBMRKRID. Examples: G48V, L10I | Exp |
| PBSTMT | Medical Statement | | Char | | * | Record Qualifier | Definition: Represents a medical conclusion such as confirmation of a diagnosis or resistance to a particular medication based on genetic biological markers. Example: RESISTANCE (when PBDRUG is populated), POSITIVE (when PBDIAG is populated) | Exp |

6.1 ASSUMPTIONS FOR THE PGX BIOLOGICAL STATE (PB) DOMAIN MODEL
1. The Pharmacogenomic Biological State domain is a special-purpose reference dataset (i.e., independent of data about study subjects) that relates a set of genetic variations to an inference about the medical meaning of that set of genetic variations (i.e., a Medical Statement). The medical statement in PBSTMT may document the implications of these variations for use of a drug (in PBDRUG) or for the diagnosis of a medical condition (in PBDIAG).
2. In some cases, a medical statement may be inferred from the presence of a single genetic variant, but more often all genetic variations in a set must be present for an inference to be drawn. The PB domain is structured with one record for each variant contained in the set.
3. PBMRKR identifies the genetic variation. It is recommended that standard nomenclature be used to identify the genetic variations.
4. PBMRKRID is used to group genetic variation records which belong to a set and which form the basis for medical statement inference. It is recommended that the value in PBMRKRID be formed from the short names of the genetic variations that make up the set, separated with a plus (+) symbol. This uniquely identifies each set of genetic variations and is also intelligible to reviewers.

6.2 EXAMPLES FOR PGX BIOLOGICAL STATE (PB) DOMAIN MODEL

Example 1: This example shows two sets of genetic variations with associated inferences about a drug.

Row 1 shows a single genetic variant (a set of one) whose presence implies resistance to Saquinavir.
Rows 2-6 show the individual genetic variations in a group of genetic variations which, if all are present, imply resistance to Indinavir.
| ROW | STUDYID | DOMAIN | PBSEQ | PBMRKRID | PBGENROI | PBGENTYP | PBSPCIES | PBSTRAIN | PBDRUG | PBMRKR | PBSTMT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | STDY-505357 | PB | 1 | G48V | Protease | SECTOR | HIV | 1a | Saquinavir | G48V | RESISTANCE |
| 2 | STDY-505357 | PB | 2 | L10I+K20R+M36I+A71V+V82T | Protease | SECTOR | HIV | 1a | Indinavir | L10I | RESISTANCE |
| 3 | STDY-505357 | PB | 3 | L10I+K20R+M36I+A71V+V82T | Protease | SECTOR | HIV | 1a | Indinavir | K20R | RESISTANCE |
| 4 | STDY-505357 | PB | 4 | L10I+K20R+M36I+A71V+V82T | Protease | SECTOR | HIV | 1a | Indinavir | M36I | RESISTANCE |
| 5 | STDY-505357 | PB | 5 | L10I+K20R+M36I+A71V+V82T | Protease | SECTOR | HIV | 1a | Indinavir | A71V | RESISTANCE |
| 6 | STDY-505357 | PB | 6 | L10I+K20R+M36I+A71V+V82T | Protease | SECTOR | HIV | 1a | Indinavir | V82T | RESISTANCE |

7 SUBJECT BIOLOGICAL STATE (SB)

SB.xpt, Subject Biological State – Special-Purpose Domain. One record per subject per observed biological state in the study, tabulation.

| Variable Name | Variable Label | Type | Controlled Terms or Format | Role | CDISC Notes | Core |
|---|---|---|---|---|---|---|
| STUDYID | Study Identifier | Char | | Identifier | Definition: Unique identifier for a study. | Req |
| DOMAIN | Domain Abbreviation | Char | SB | Identifier | Definition: Two-character abbreviation for the domain. | Req |
| USUBJID | Unique Subject Identifier | Char | | Identifier | Definition: Unique subject identifier within the submission. | Req |
| SBSEQ | Sequence Number | Num | | Identifier | Definition: Sequence number given to ensure uniqueness within a dataset for a subject. Can be used to join related records. | Req |
| SBGRPID | Group ID | Char | | Identifier | Definition: Used to tie together a block of related records in a single domain for a subject. | Perm |
| SBREFID | Specimen ID | Char | | Identifier | Definition: The identifier of the genetic or viral specimen being tested. | Perm |
| SBMRKRID | Biological Marker Identifier | Char | | Identifier | Definition: Uniquely identifies an individual or a group of biological markers that has medical significance in the study. | Req |
| SBGENTYP | Genetic Region of Interest Type | Char | | Record Qualifier | Definition: Identifies the type of genetic region of interest. Examples: GENENAME, SECTOR, PROTEIN. | Exp |
| SBGENROI | Genetic Region of Interest | Char | | Record Qualifier | Definition: Area within the DNA sequences. Example: A genotype or subtype of a microorganism. | Perm |
| SBSPCIES | Biological Classification | Char | * | Grouping Qualifier | Definition: Biological classifications for an organism capable of breeding and producing offspring. May also be used to designate organisms. Examples: HOMO SAPIENS, RAT, MOUSE, BACTERIUM, HCV, HIV | Perm |
| SBSTRAIN | Type of Strain | Char | * | Grouping Qualifier | Definition: A genetic variant or subtype of a microorganism. Examples: 1a, 1b. | Perm |
| SBNAM | Vendor Name | Char | | Record Qualifier | Definition: Name or identifier of the laboratory or biotech firm who provides the test results. | Perm |
| VISITNUM | Visit Number | Num | | Timing | Definition: 1. Clinical encounter number. 2. Numeric version of VISIT, used for sorting. | Exp |
| VISIT | Visit Name | Char | | Timing | Definition: 1. Protocol-defined description of clinical encounter. 2. May be used in addition to VISITNUM and VISITDY. | Perm |
| VISITDY | Planned Study Day of Visit | Num | | Timing | Definition: Planned study day of the visit based upon RFSTDTC in Demographics. | Perm |
| SBDTC | Date/Time of Test | Char | ISO 8601 | Timing | Definition: Date/time of specimen collection | Exp |

7.1 ASSUMPTIONS FOR THE SUBJECT BIOLOGICAL STATE MARKER (SB) DOMAIN MODEL

1. The SB domain provides the linkage between a subject's observed genomic or virological findings and medical statements defined within the PB domain. Thus, based on one or more subject findings, a statement is made about a subject's clinical state or about the response of a subject's virus to a treatment.
2. SBMRKRID should match a value of PBMRKRID in the PB domain.
3. To access the medical statement (PBSTMT) pertaining to a subject, link the SB and PB domains via --MRKRID.
4. The PBMRKR variable is used to identify the members (biological markers) that belong to the group identified by PBMRKRID in the PB domain. These are then linked via SBMRKRID-PBMRKRID to the SB domain.
5. The medical statement is about either a drug (PBDRUG) or a medical condition (PBDIAG) as designated in the PB domain.

7.2 EXAMPLES FOR SUBJECT BIOLOGICAL STATE MARKER (SB) DOMAIN MODEL

Example 1: This example shows how to associate markers with a subject in order to communicate the presence of a particular biological state.

Row 1 shows an example of one biological marker (G48V) being in the group and related back to a subject.
Row 2 shows an example of multiple biological markers (L10I, K20R, M36I, A71V, and V82T) that must be present before a virus can be said to have resistance to Indinavir.

| Row | STUDYID | DOMAIN | USUBJID | SBSEQ | SBGRPID | VISITNUM | VISIT | VISITDY | SBDTC |
|---|---|---|---|---|---|---|---|---|---|
| 1 | STDY505357 | SB | 521298 | 1 | 1 | 1 | BASELINE | -1 | 3/1/2010 |
| 2 | STDY505357 | SB | 521298 | 2 | 1 | 1 | BASELINE | -1 | 3/1/2010 |

| Row (cont) | SBREFID | SBMRKRID | SBGENTYP | SBGENROI | SBSPCIES | SBSTRAIN | SBNAM |
|---|---|---|---|---|---|---|---|
| 1 | D391395 | G48V | Sector | Protease | HIV | 1a | BIOTECHA |
| 2 | D391395 | L10I+K20R+M36I+A71V+V82T | Sector | Protease | HIV | 1a | BIOTECHA |

APPENDICES

Appendix A – New and Deleted Domains and Variables

In order to develop this virology supplement to the SDTM, new SDTM domains and variables were created to complement the SDTM and SDTMIG.
| Classification | Type | Domain(s) | Description |
|---|---|---|---|
| Major | Deletion | PG/PF | The LOINC variable will be deleted in the next version of the SDTM. New terminology variables that provide more functionality will take its place. |
| Major | Addition | BS, BE, ES | New domains for specimen collection and handling. |
| Major | Addition | PG, PF, PB, SB | New domains for OMICS data (e.g., gene expression, genetic variation). Future releases will include cytogenetics. |
| Major | Addition | VR | New domain to store viral resistance data collected during viral load testing. |
| Major | Addition to Findings | PG, PF | --TSTRCD, --TSTRNM, --TSTRVR: terminology variables that support the content in TEST and TESTCD. |
| Major | Addition to Identifiers | PG, PF | --ASYID for Assay ID: a unique identifier for a test maintained by the lab. |
| Major | Addition to Findings | PG, PF, SB, PB | --SPCIES and --STRAIN: used to describe the organism whose DNA/RNA is undergoing testing. |
| Major | Addition to Findings | PG, PF | --RESRCD, --RESRNM, --RESRVR: terminology variables that support the encoding of the test results. |
| Major | Addition to Identifiers | PG, PF | --LNKID: used to support the linking of results between domains. |
| Major | Addition to Findings | PB | --DRUG, --INDIC, --MRKR, --STMT, --DIAG: support documenting the PGx biological markers being used. |
| Major | Addition to Findings | PG, PF | --GENROI, --GENTYP, --GENLOC |
| Major | Addition to Findings | PB, SB | --MRKRID is a biomarker identifier. |
| Major | Addition to Findings | PF | --RLOCID is a public database entry identifier. |
| Major | Addition to Findings | PF | --MUTYP can be used to designate somatic or germline. |

Appendix B – Virology Concept Maps

This section includes initial concept maps for viral drug resistance testing, viral genetic variation testing, and inference of drug resistance from genetic variation results.
The VR concept map shows viral resistance testing to produce inhibitory concentration results (IC50, IC95) and results derived from inhibitory concentration results. The genetic testing diagram shows the mapping of data into the PG and PF domains. The combination of viral resistance and genetic testing to develop knowledge of genetic variations that cause viral resistance is mapped to the PB domain. Inferred viral resistance from genetic testing results is shown mapped to the SB domain. Note that specimen collection includes extraction of part of a specimen to create a different kind of specimen. Figure 1 below shows the color codes used in the maps that follow.

Figure 1 - Color Codes

B.1 Virology Resistance Testing Maps

Details of specimen handling may be stored in biospecimen domains being developed as part of the SDTMIG-PGx document. Figure 2 shows inhibitory concentration results (IC50, IC95) stored in the VR domain.

Figure 2 – Inhibitory Concentration Results

Figure 3 shows derivation of "fold increase" results (comparisons between two inhibitory concentration results). It also shows the net assessment based on all the "fold increase" results. Note: Comparisons to baseline are not always part of the net assessment.

Figure 3 – From Inhibitory Concentrations to New Assessment of Resistance

B.2 Genetic Testing

Figure 4 for genetic testing shows original specimen collection and processing of a specimen to produce a viral DNA specimen not shown. The map also includes set-up parameters and QC testing results (at the far left in the diagram) stored in PG. Genetic results are mapped to the PF domain.
Figure 4: Genetic Testing of Virus

B.3 Building Knowledge of Viral Resistance Mutation

Viral samples undergo both viral resistance and genetic mutation testing. Results of these tests are compiled, and from them knowledge of what genetic variations confer resistance to what drugs can be determined. These sets of variations that confer resistance to a particular drug are stored in the PB domain.

Figure 5: Combining Viral Resistance and Viral Mutation Knowledge

B.4 Inferring Viral Resistance from Genetic Mutation Results

Once it is known which sets of genetic variations confer resistance to a drug, a sample need not be tested for viral resistance directly. Genetic testing shows what genetic variations a virus has. A virus's genetic variations can be compared to reference data in the PB domain to infer its resistance to drugs. Inferred resistance is stored in the SB domain.

Figure 6: Viral Drug Resistance Inferred from Variations
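The comparison described in B.4 can be sketched as a subset test: a medical statement applies when every marker in a PB set is among the variations observed for the virus. The code below is a minimal illustration under stated assumptions, not a CDISC algorithm. PB rows are held as Python dicts, the function names `group_marker_sets` and `infer_states` are hypothetical, and the sample rows are modeled on the PB example in Section 6.2 with the marker spellings L10I, K20R, M36I, A71V, and V82T assumed.

```python
from collections import defaultdict

# Illustrative PB reference rows, one record per marker in each set.
INDINAVIR_SET = "L10I+K20R+M36I+A71V+V82T"
PB_ROWS = [
    {"PBMRKRID": "G48V", "PBMRKR": "G48V",
     "PBDRUG": "Saquinavir", "PBSTMT": "RESISTANCE"},
] + [
    {"PBMRKRID": INDINAVIR_SET, "PBMRKR": marker,
     "PBDRUG": "Indinavir", "PBSTMT": "RESISTANCE"}
    for marker in INDINAVIR_SET.split("+")
]


def group_marker_sets(pb_rows):
    """Collect the PBMRKR values belonging to each PBMRKRID."""
    sets = defaultdict(lambda: {"markers": set(), "drug": None, "stmt": None})
    for row in pb_rows:
        entry = sets[row["PBMRKRID"]]
        entry["markers"].add(row["PBMRKR"])
        entry["drug"] = row["PBDRUG"]
        entry["stmt"] = row["PBSTMT"]
    return dict(sets)


def infer_states(observed_markers, pb_rows):
    """Return (PBMRKRID, PBDRUG, PBSTMT) for every set fully present in the sample."""
    observed = set(observed_markers)
    hits = []
    for mrkrid, entry in sorted(group_marker_sets(pb_rows).items()):
        # All markers of the set must be observed for the inference to hold.
        if entry["markers"] <= observed:
            hits.append((mrkrid, entry["drug"], entry["stmt"]))
    return hits


# Usage: a virus carrying G48V plus only part of the Indinavir set.
print(infer_states(["G48V", "L10I", "K20R"], PB_ROWS))
```

Each returned PBMRKRID is the value that would populate SBMRKRID for the subject, so the drug and statement remain in PB and are reached through the SBMRKRID-PBMRKRID link rather than being copied into SB.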
Appendix C – Participating Individuals and Organizations

| Leadership Team | Organization Affiliation |
|---|---|
| Carol Vaughn | Sanofi-Aventis |
| Chuck Cooper | FDA, CDER Data Standards Program |
| Diane Wold | GSK |
| Doris Li | IMCLONE |
| Fatima Elnigoumi | FDA, CDER Data Standards Program |
| Fred Wood | Octagon Research Solutions |
| Helena Sviglin | FDA, Center for Drug Evaluation and Research (CDER) |
| James Sullivan | Vertex |
| Jenise Gillespie-Pederson | FDA, CDER Data Standards Program |
| Joy Li | FDA, Center for Drug Evaluation and Research (CDER) |
| Joyce Hernandez | Merck (Project Leader) |
| Patricia Wesolowski | Vertex |
| Patrick Harrington | FDA, Center for Drug Evaluation and Research (CDER) |
| Phil Pochon | Covance (Project Co-Leader) |
| Rhonda Facile | CDISC (Project Co-Leader) |
| Rich Nagel | Liaison Technologies |
| Sharon Wang | Genentech |
| Ward Lemaire | J&J |
| Wayne Kubick | CDISC |

Appendix D – Representations and Warranties, Limitations of Liability, and Disclaimers

CDISC Patent Disclaimers

It is possible that implementation of and compliance with this standard may require use of subject matter covered by patent rights. By publication of this standard, no position is taken with respect to the existence or validity of any claim or of any patent rights in connection therewith. CDISC, including the CDISC Board of Directors, shall not be responsible for identifying patent claims for which a license may be required in order to implement this standard or for conducting inquiries into the legal validity or scope of those patents or patent claims that are brought to its attention.
Representations and Warranties

Each Participant in the development of this standard shall be deemed to represent, warrant, and covenant, at the time of a Contribution by such Participant (or by its Representative), that to the best of its knowledge and ability: (a) it holds or has the right to grant all relevant licenses to any of its Contributions in all jurisdictions or territories in which it holds relevant intellectual property rights; (b) there are no limits to the Participant's ability to make the grants, acknowledgments, and agreements herein; and (c) the Contribution does not subject any Contribution, Draft Standard, Final Standard, or implementations thereof, in whole or in part, to licensing obligations with additional restrictions or requirements inconsistent with those set forth in this Policy, or that would require any such Contribution, Final Standard, or implementation, in whole or in part, to be either: (i) disclosed or distributed in source code form; (ii) licensed for the purpose of making derivative works (other than as set forth in Section 4.2 of the CDISC Intellectual Property Policy ("the Policy")); or (iii) distributed at no charge, except as set forth in Sections 3, 5.1, and 4.2 of the Policy. If a Participant has knowledge that a Contribution made by any Participant or any other party may subject any Contribution, Draft Standard, Final Standard, or implementation, in whole or in part, to one or more of the licensing obligations listed in Section 9.3, such Participant shall give prompt notice of the same to the CDISC President who shall promptly notify all Participants.

No Other Warranties/Disclaimers.
ALL PARTICIPANTS ACKNOWLEDGE THAT, EXCEPT AS PROVIDED UNDER SECTION 9.3 OF THE CDISC INTELLECTUAL PROPERTY POLICY, ALL DRAFT STANDARDS AND FINAL STANDARDS, AND ALL CONTRIBUTIONS TO FINAL STANDARDS AND DRAFT STANDARDS, ARE PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, WHETHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, AND THE PARTICIPANTS, REPRESENTATIVES, THE CDISC PRESIDENT, THE CDISC BOARD OF DIRECTORS, AND CDISC EXPRESSLY DISCLAIM ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY PARTICULAR OR INTENDED PURPOSE, OR ANY OTHER WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, FINAL STANDARDS OR DRAFT STANDARDS, OR CONTRIBUTION.

Limitation of Liability

IN NO EVENT WILL CDISC OR ANY OF ITS CONSTITUENT PARTS (INCLUDING, BUT NOT LIMITED TO, THE CDISC BOARD OF DIRECTORS, THE CDISC PRESIDENT, CDISC STAFF, AND CDISC MEMBERS) BE LIABLE TO ANY OTHER PERSON OR ENTITY FOR ANY LOSS OF PROFITS, LOSS OF USE, DIRECT, INDIRECT, INCIDENTAL, CONSEQUENTIAL, OR SPECIAL DAMAGES, WHETHER UNDER CONTRACT, TORT, WARRANTY, OR OTHERWISE, ARISING IN ANY WAY OUT OF THIS POLICY OR ANY RELATED AGREEMENT, WHETHER OR NOT SUCH PARTY HAD ADVANCE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES.

Note: The CDISC Intellectual Property Policy can be found at: http://www.cdisc.org/about/bylaws_pdfs/CDISCIPPolicy-FINAL.pdf