CDISC Pharmacogenomics Standards
Joyce Hernandez (Joyce Hernandez Consulting, LLC)
© CDISC 2014

Agenda
• Project background
• Domains, Relationships & Molecular Concepts
• Variables
• Specimen Genealogy
• Specimen Hierarchy
• Pharmacogenomics (PGx) Examples:
  • Biospecimen events and findings
  • Genetic Variation
  • Gene Expression
• Next steps and team

Background
• Initial data focus for Version 1.0:
  • Specimen collection and handling
  • Specimen hierarchy
  • Genetic variation utilizing well-known standards (HGVS)
  • Genotyping data (common formats currently used)
  • Viral genetics (includes some viral classification variables)
• Special sections to enhance understanding:
  • Glossary of genetic and genomic terms
  • Nomenclatures (HGVS, HLA)
  • CMAPs (concept maps) to document common processes

New Domains to Support PGx

Domain Relationships within a USUBJID
[Diagram: BE (Specimen Event), BS (Specimen), PG (Setup and QC), PF (Findings), SB (Subject Biological State) and PB (PGx Biological State), linked through STUDYID, USUBJID, the reference IDs BEREFID, BSREFID, PGREFID, SBREFID and PBREFID, and the marker IDs SBMRKRID and PBMRKRID.]

Molecular Concepts Represented in the Domains
[Figure: molecular concepts numbered 1 through 5, with concept 4 subdivided into 4a, 4b and 4c.]

PGx-Specific Variables – Specimen (RELSPEC)
--REFID (Reference ID): Specimen identifier.
--PARENT (Specimen Parent): When the specimen in question has been obtained from another specimen (e.g., via resectioning or aliquoting), --PARENT holds the --REFID of the "parent specimen", that is, the specimen from which the current specimen was obtained.
--SPCLVL (Specimen Level): Any specimen obtained directly from the subject has a specimen level of 1. Specimens obtained from a level 1 specimen have a specimen level of 2; those obtained from a level 2 specimen have a specimen level of 3; and so on. A level 4 specimen, therefore, is a specimen (4) obtained from a specimen (3) obtained from a specimen (2) obtained from a specimen (1) obtained from the subject.
--DTC (Date/Time Collected): Date/time of specimen collection. For specimens with a specimen level greater than 1, --DTC refers specifically to the date/time of collection of the originating specimen, i.e., the specimen obtained directly from the subject.

A specimen is a sample of the subject that undergoes a test in place of the subject when the test cannot be performed on the subject directly, with the understanding that any results so obtained may be treated as pertaining to the subject. However, once the specimen has been separated from the subject, changes in the subject's state are no longer reflected in the specimen. Therefore, when a test is performed on a specimen, the results cannot be guaranteed to describe the subject as they are at the time of the test, only as they were at the time of specimen collection.

List of IG Use Cases
• BE/BS – Biospecimen Domains
  • Specimen handling such as freeze/thaw cycles and transportation
  • Steps in obtaining cell-free RNA from blood plasma
  • Types of quality evaluation
• RELSPEC – Related Specimens
  • Specimen genealogy and hierarchy
• PG – PGx Methods and Supporting Information
  • Run parameters for PCR
  • Details of SNP probe assays
• PF – Pharmacogenomics Findings
  • Protein variation in viral genetics
  • Protein and nucleic acid variation in viral genetics
  • Frame shifts, both viral and subject
  • Nucleotide reads
  • Zygosity
  • Single-nucleotide polymorphism (SNP) reads
  • HLA allelic records
  • Observed somatic vs. germline variations
  • Observed levels of somatic variations in a biopsy sample
  • Gene expression measured via qRT-PCR
  • Gene expression measured via microarray
• PB/SB – PGx Marker Domains
  • Simple and complex genetic markers for drug resistance
• Relating PGx Domains
  • A somatic variation and its related medical diagnosis
  • Germline variations and related inherited risk of cancer
  • Genetic variations relating to drug metabolism
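The --PARENT, --SPCLVL and --DTC rules fit together: a specimen's level is one more than the number of --PARENT links between it and a specimen taken directly from the subject, and that level-1 ancestor supplies the --DTC value. The following is a minimal Python sketch, assuming the genealogy is held in plain dictionaries; the names PARENTS, COLLECTED, trace and collection_dtc, and the collection timestamps, are hypothetical illustrations, not anything defined by the standard. The parent links reuse the Specimen Genealogy example for subject 001-01.

```python
from typing import Dict, Optional, Tuple

# Hypothetical RELSPEC genealogy for subject 001-01: REFID -> PARENT,
# where None means the specimen was obtained directly from the subject.
PARENTS: Dict[str, Optional[str]] = {
    "SPC-001": None,
    "SPC-001-A": "SPC-001",
    "SPC-001-B": "SPC-001",
    "SPC-001-B-1": "SPC-001-B",
    "SPC-003": None,
    "SPC-003-A": "SPC-003",
}

# Hypothetical collection date/times, recorded only for level-1 specimens.
COLLECTED: Dict[str, str] = {
    "SPC-001": "2014-01-15T09:30",
    "SPC-003": "2014-01-15T10:45",
}


def trace(refid: str, parents: Dict[str, Optional[str]]) -> Tuple[str, int]:
    """Return (originating REFID, --SPCLVL) by following --PARENT links."""
    level = 1
    seen = {refid}
    parent = parents[refid]
    while parent is not None:
        if parent in seen:
            raise ValueError(f"cycle in specimen genealogy at {parent}")
        seen.add(parent)
        refid, level = parent, level + 1
        parent = parents[refid]
    return refid, level


def collection_dtc(refid: str, parents: Dict[str, Optional[str]],
                   collected: Dict[str, str]) -> str:
    """--DTC of any specimen is the collection date/time of its level-1 ancestor."""
    origin, _ = trace(refid, parents)
    return collected[origin]


if __name__ == "__main__":
    for ref in PARENTS:
        origin, level = trace(ref, PARENTS)
        print(ref, level, origin, collection_dtc(ref, PARENTS, COLLECTED))
```

Run over the six example specimens, the loop yields levels 1, 2, 2, 3, 1, 2, matching the SPCLVL column of the Specimen Genealogy example; the cycle check simply guards against a malformed --PARENT chain.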
Specimen Genealogy (RELSPEC)
Row | STUDYID | USUBJID | REFID | SPEC | PARENT | SPCLVL
1 | ABC-123 | 001-01 | SPC-001 | TISSUE | | 1
2 | ABC-123 | 001-01 | SPC-001-A | TISSUE | SPC-001 | 2
3 | ABC-123 | 001-01 | SPC-001-B | TISSUE | SPC-001 | 2
4 | ABC-123 | 001-01 | SPC-001-B-1 | DNA | SPC-001-B | 3
5 | ABC-123 | 001-01 | SPC-003 | BRAIN | | 1
6 | ABC-123 | 001-01 | SPC-003-A | RNA | SPC-003 | 2

Biospecimen Events and Findings
BE: on all five records, STUDYID = ABC134, DOMAIN = BE, USUBJID = 43871, SPDEVID = 1148.267, BELOC = BRAIN, BEBODSYS = Nervous System [A08], VISITNUM = 1, VISIT = BASELINE and BEDTC = 2005-03-20.
Row | BETERM | BEDECOD | BECAT | BESCAT | BESTDTC | BEENDTC
1 | Excision | EXCISION | COLLECTION | SOFT TISSUE | 2005-03-20T15:07 |
2 | Flash Frozen | FLASH FROZEN | PREP | | 2005-03-20T15:07 | 2005-03-20T13:22
3 | Stored in Freezer | STORED | STORING | | 2005-03-20T13:22 | 2005-03-21T10:29
4 | Thaw | THAW | PREP | | 2005-03-21T10:29 | 2005-03-21T10:36
5 | Shipped | SHIPPED | TRANSPORT | | 2005-03-21T11:00 | 2005-03-21T15:00
Other BE values shown: BEREFID TS409871, 309827 and LN43871; BEPARTY ABC LAB; BEPRTYID 01.

BS: on both records, STUDYID = ABC134, DOMAIN = BS, USUBJID = 43871, BSSPEC = BRAIN and BSANTREG = CEREBRAL AQUEDUCT.
Row | BSSEQ | BSTESTCD | BSTEST | BSCAT | BSORRES | BSORRESU | BSSTRESC | BSSTRESN | BSSTRESU | BSBLFL | VISITNUM | BSDTC
1 | 1 | VOLUME | Volume | SPECIMEN MEASUREMENT | 2 | cm3 | 2 | 2 | cm3 | Y | 1 | 2005-03-20
2 | 2 | FFRZTMP | Flash Frozen Temp | SPECIMEN HANDLING | -80 | C | -80 | -80 | C | | |

Variables – Pathogens
--MSPCES*** (Microorganism Species): In findings domains, --MSPCES holds the species of the pathogen to which the subject is host when the pathogen is the focus of the test.
In instances when both the subject and the pathogen are tested, records for the pathogen are distinguished from records for the subject by the use of the --MSPCES variable. Not to be confused with DMSPCIES, which holds the species of the subject.
--MSTRN (Microorganism Strain): Analogous to --MSPCES, --MSTRN holds the strain of the pathogen to which the subject is host when the pathogen is the focus of the test.
*** The SDTMIG omits a species variable because all subjects in most human clinical trials are Homo sapiens, so the nature of the study makes this information unnecessary in SDTM datasets. The exception is virology, where a viral species must be identified.

Variables – Genetics/Genomics (Test-Related)
--TEST (Test Name): For genetic variation, usually the level of granularity and/or the molecular component of interest. Examples: Nucleotide, Amino Acid, Allele.
--REFSEQ (Reference Sequence)
--GENTYP (Type of Genetic Region of Interest): The type of the portion of the genome serving as a locus for the experiment/test. Examples: GENE, SECTOR, PROTEIN.
--GENRI (Genetic Region of Interest): The portion of the genome serving as a locus for the experiment/test; often the name of a gene. Examples: EGFR, KRAS, CYP2D6.
--GENLI (Genetic Location of Interest): The numeric position within the sequence for the targeted read. Compare with --GENLOC. --GENLI and --GENTGT should be used only when the test specifies a single genetic read to the exclusion of all other possibilities and the result is a matter of occurrence, expressed either as a percentage or as a boolean observation.
--GENTGT (Genetic Target): The genetic read targeted by the probe at the position specified by --GENLI.
--ALLELC (Allele): Humans are diploid: they have two homologous copies of each chromosome. However, the two copies are not necessarily identical, since one chromosome is inherited from each parent.
Therefore, in tests that compare chromosomes, or parts of chromosomes (alleles), the --ALLELC variable is used to denote results for one or the other of the two alleles (chromosomes).
Note on --REFSEQ: Depending on the type of test method, the reference sequence is most likely to be either the rsID from dbSNP (for targeted tests) or a GenBank accession number (for non-targeted tests).

Variables – Genetics/Genomics (Result-Related)
--GENSR (Genetic Sub-Region): The sub-region within the genetic region of interest in which the observed variation at the position given in --GENLOC is located, if relevant. Because exon numbering can vary and is not regulated, caution should be exercised when populating this variable.
--GENLOC (Genetic Location): One of the three variables used to define a genetic read. --GENLOC holds the numeric position within the sequence for the observed result.
--ORRES (Result or Finding in Original Units)
--ORREF (Reference Result): One of the three variables used to define a genetic read. --ORREF holds the expected result at the position specified by --GENLOC according to the reference sequence specified by --REFSEQ.
--STRESC (Result or Finding in Standard Format): When --GENLOC is populated, --STRESC holds the observed variation in HGVS nomenclature. When --GENLI is populated and --ORRES = Y, --STRESC holds the targeted variation in HGVS nomenclature. Otherwise, --STRESC is copied or derived from --ORRES.
--RSNUM (Reference SNP): Reference identifier for previously identified instances of the variation, such as the rs# in dbSNP.
--MUTYP (Mutation Type): The type of mutation, usually either GERMLINE (inherited) or SOMATIC (arising only in some of the individual's cells, as in cancer).
--ANMETH (Analysis Method): The method of secondary processing applied to a complex observation (e.g., an image or a genetic sequence) to obtain a summarized result.
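The --STRESC rule can be made concrete for the simplest case, a single-nucleotide substitution: --GENLOC, --ORREF and --ORRES together give the position, reference base and observed base, which HGVS writes as c.<position><reference>><observed>. The sketch below is a minimal Python illustration under that assumption only; it does not cover deletions, insertions, frame shifts, protein-level (p.) notation, or the targeted --GENLI/--GENTGT case, and the function names hgvs_substitution and derive_stresc are hypothetical.

```python
import re
from typing import Optional

# One unambiguous DNA base; IUPAC ambiguity codes are deliberately not handled.
_BASE = re.compile(r"[ACGT]")


def hgvs_substitution(genloc: int, orref: str, orres: str) -> str:
    """Format a coding-DNA single-nucleotide substitution, e.g. c.2156G>C.

    genloc: --GENLOC, position within the --REFSEQ coding sequence
    orref:  --ORREF, expected base at that position
    orres:  --ORRES, observed base at that position
    """
    for base in (orref, orres):
        if not _BASE.fullmatch(base):
            raise ValueError(f"expected a single base A/C/G/T, got {base!r}")
    if orref == orres:
        raise ValueError("observed base equals reference base: no variation")
    return f"c.{genloc}{orref}>{orres}"


def derive_stresc(orres: str, genloc: Optional[int] = None,
                  orref: Optional[str] = None) -> str:
    """--STRESC: HGVS when the genetic read is defined, otherwise a copy of --ORRES."""
    if genloc is not None and orref is not None:
        return hgvs_substitution(genloc, orref, orres)
    return orres


if __name__ == "__main__":
    # (--GENLOC, --ORREF, --ORRES) triples taken from the Genetic Variation Example.
    for loc, ref, obs in [(2156, "G", "C"), (2369, "C", "T"), (2073, "A", "T")]:
        print(derive_stresc(obs, genloc=loc, orref=ref))
```

Applied to the three PF records of the Genetic Variation Example, this reproduces their PFSTRESC values (c.2156G>C, c.2369C>T, c.2073A>T); a value such as the PG result 13-21, with no genetic read defined, is passed through unchanged.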
Note on --ORRES: --ORRES is one of the three variables used to define a genetic read; it holds the observed result at the position specified by --GENLOC. When --GENLI is populated, --ORRES follows the standard rules.

Genetic Variation Example
Row | STUDYID | DOMAIN | USUBJID | PGSEQ | PGTESTCD | PGTEST | PGGENTYP | PGGENRI | PGCAT | PGORRES | PGSTRESC
1 | ABC-01234 | PG | 17C0154 | 1 | EXON | Exons Sequenced | GENE | EGFR | GENETIC VARIATION | 13-21 | 13-21
2 | ABC-01234 | PG | 17C0154 | 2 | SEQSTART | Sequence Start | GENE | EGFR | GENETIC VARIATION | 1499 | 1499
3 | ABC-01234 | PG | 17C0154 | 3 | SEQLONG | Sequence Length | GENE | EGFR | GENETIC VARIATION | 1127 | 1127

Row | STUDYID | DOMAIN | USUBJID | PFSEQ | PFREFID | PFTESTCD | PFTEST | PFGENRI | PFREFSEQ | PFGENSR | PFCAT
1 | ABX-01256 | PF | XX7-154 | 1 | 5493283 | NUC | Nucleotide | EGFR | NM_005228.3 | Exon 18 | GENETIC VARIATION
2 | ABX-01256 | PF | XX7-212 | 1 | 8970343 | NUC | Nucleotide | EGFR | NM_005228.3 | Exon 20 | GENETIC VARIATION
3 | ABX-01256 | PF | XX7-220 | 1 | 7629230 | NUC | Nucleotide | EGFR | NM_005228.3 | Exon 16 | GENETIC VARIATION

Row | PFORRES | PFORREF | PFGENLOC | PFSTRESC | PFXFN | PFNAM | PFMETHOD | PFRUNID | VISITNUM | PFDTC
1 (cont) | C | G | 2156 | c.2156G>C | 5.23.445.1.4.1650 08.1.8:86175 | Biotech ABC | Massively Parallel Sequencing | 8970723 | 1 | 2012-10-23T10:06
2 (cont) | T | C | 2369 | c.2369C>T | 5.23.445.1.4.1650 08.1.8:87952 | Biotech ABC | Massively Parallel Sequencing | 8925000 | 1 | 2012-10-23T12:50
3 (cont) | T | A | 2073 | c.2073A>T | 5.23.445.1.4.1650 08.1.8:87970 | Biotech ABC | Massively Parallel Sequencing | 8925018 | 1 | 2012-10-23T13:03

Row | STUDYID | DOMAIN | PBSEQ | PBMRKRID | PBGENTYP | PBGENRI | PBDRUG | PBDIAG | PBMRKR | PBSTMT
1 | ABC-01234 | PB | 1 | 2073A>T | GENE | EGFR | | Astrocytoma | 2073A>T | Decreased risk of diffusely infiltrating astrocytoma
2 | ABC-01234 | PB | 2 | G719A | GENE | EGFR | EGFR TKIs | | G719A | Increased sensitivity
3 | ABC-01234 | PB | 3 | T790M | GENE | EGFR | EGFR TKIs | | T790M | Decreased sensitivity

Row | STUDYID | DOMAIN | USUBJID | SBSEQ | SBREFID | SBMRKRID | SBGENTYP | SBGENRI | SBNAM | VISITNUM | SBDTC
1 | ABC-01234 | SB | 17C0154 | 1 | 5493283 | G719A | GENE | EGFR | Biotech ABC | 1 | 2012-10-23T10:06
2 | ABC-01234 | SB | 17C0212 | 1 | 8970343 | T790M | GENE | EGFR | Biotech ABC | 1 | 2012-10-23T10:06
3 | ABC-01234 | SB | 17C0220 | 1 | 7629230 | 2073A>T | GENE | EGFR | Biotech ABC | 1 | 2012-10-23T10:06

Gene Expression Example – Arrays
Row | STUDYID | DOMAIN | USUBJID | SPDEVID | PFSEQ | PFGRPID | PFREFID | PFTESTCD | PFTEST | PFCAT | PFORRES | VISITNUM | PFDTC
1 | A12345 | PF | 43871 | AGSG4900DA | 2 | 1 | 2287.09443 | NINT1VAL | Normalized Intensity 1 Value | Analytic | 1.16279 | 2 | 2005-03-21T11:28:17
2 | A12345 | PF | 43871 | AGSG4900DA | 3 | 1 | 2287.09443 | NINT2VAL | Normalized Intensity 2 Value | Analytic | 0.96469 | 2 | 2005-03-21T11:28:17
3 | A12345 | PF | 43871 | MANAN03 | 4 | 1 | 2287.09443 | PVAL | P Value | Post-Analytic | 0.05391 | 2 | 2005-03-21T11:28:17
4 | A12345 | PF | 43871 | MANAN03 | 5 | 1 | 2287.09443 | FOLDCHG | Fold Change | Post-Analytic | 1.8 | 2 | 2005-03-21T11:28:17

Row | PFSTRESC | PFSTRESN | PFXFN | PFNAM | PFSPEC | PFMETHOD | PFRUNID | PFANMETH
1 (cont) | 1.16279 | 1.16279 | 2.16.090.1.1357 64.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 | LOWESS
2 (cont) | 0.96469 | 0.96469 | 2.16.090.1.1357 64.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 | LOWESS
3 (cont) | 0.05391 | 0.05391 | 2.16.090.1.1357 64.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 |
4 (cont) | 1.8 | 1.8 | 2.16.090.1.1357 64.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 |

Row | STUDYID | DOMAIN | SPDEVID | DISEQ | DIPARMCD | DIPARM | DIVAL
1 | A12345 | DI | AGM-G4851B | 1 | TYPE | Device Type | Microarray Kit
2 | A12345 | DI | AGM-G4851B | 2 | MANUF | Manufacturer | Agilent
3 | A12345 | DI | AGM-G4851B | 3 | MODEL | Model | G4851B
4 | A12345 | DI | AGS-G4900DA | 1 | TYPE | Device Type | Microarray Scanner
5 | A12345 | DI | AGS-G4900DA | 2 | MANUF | Manufacturer | Agilent
6 | A12345 | DI | AGS-G4900DA | 3 | MODEL | Model | G4900DA
7 | A12345 | DI | MANAN03 | 1 | TYPE | Device Type | Workstation

Next Steps
• Currently under CDISC internal review
• Public review posting – 2nd quarter
• Final posting – 3rd quarter
• Next project – 4th quarter: Cytogenetics

Contact Information and Team
Anyone who wishes to join the team, please contact Joyce: Joyce.hernandez_0029@yahoo.com
Name | Company
Joyce Hernandez, Team Leader | Joyce Hernandez Consulting
Mohtaram Bahmanian | ImClone
Sally Cassals | Independent Consultant
Rhonda Facile | CDISC
Doris Li | ImClone
Cliona Molony | Merck
Mona Oakes | ImClone
Phil Pochon | Covance
Janet Reich | Amgen
Ellen Schatz | Eli Lilly
James Sullivan | Vertex
Richard Tyhach | Eli Lilly
Patricia Wesolowski | Vertex
Diane Wold | GSK
Darcy Wold | Independent Consultant
Fred Wood | Accenture
Helena Sviglin | FDA Liaison
Patrick Harrington | FDA
Joy Li | FDA