CDISC PGxIG Presentation 2014-Apr-17

CDISC Pharmacogenomics Standards
Joyce Hernandez (Joyce Hernandez Consulting, LLC)
© CDISC 2014
Agenda
• Project background
• Domains, Relationships & Molecular Concepts
• Variables
• Specimen Genealogy
• Specimen Hierarchy
• Pharmacogenomics (PGx) Examples:
  - Biospecimen events and findings
  - Genetic Variation
  - Gene Expression
• Next steps and team
Background
• Initial Data Focus for Version 1.0
  - Specimen Collection and Handling
  - Specimen Hierarchy
  - Genetic Variation utilizing well-known standards (HGVS)
  - Genotyping data (common formats currently used)
  - Viral Genetics (includes some viral classification variables)
• Special sections to enhance understanding
  - Glossary of genetic and genomic terms
  - Nomenclatures (HGVS, HLA)
  - CMAPS to document common processes
New Domains to support PGx
Domain Relationships within a STUDYID
• BE - Specimen Event: USUBJID, BEREFID
• BS - Specimen: USUBJID, BSREFID
• PG - Setup and QC: USUBJID, PGREFID
• PF - Findings: USUBJID, PGREFID
• SB - Subject Biological State: USUBJID, SBREFID, SBMRKRID
• PB - PGx Biological State: PBREFID, PBMRKRID
Molecular concepts represented in the domains
(Figure not recovered; callout labels: 1, 2, 3, 4, 4a, 4b, 4c, 5)
PGx Specific Variables - Specimen
RELSPEC
Name | Label | Notes
--REFID | Reference ID | Specimen identifier.
--PARENT | Specimen Parent | When the specimen in question has been obtained from another specimen (e.g., via resectioning, aliquoting), --PARENT holds the --REFID of the “parent specimen”; that is, the specimen from which the current specimen has been obtained.
--SPCLVL | Specimen Level | Any specimen obtained directly from the subject has a specimen level of 1. Specimens obtained from a level 1 specimen have a specimen level of 2; from a level 2 specimen have a specimen level of 3; etc. A level 4 specimen, therefore, would be a specimen (4) obtained from a specimen (3) obtained from a specimen (2) obtained from a specimen (1) obtained from the subject.
--DTC | Date/Time Collected | Date/time of specimen collection. For specimens with a specimen level greater than 1, --DTC refers specifically to the date/time of collection for the originating specimen, i.e., the specimen obtained directly from the subject.

A specimen is a sample of the subject which undergoes a test in place of the subject when the test cannot be performed on the subject directly, with the understanding that any results obtained thereby may be treated as pertaining to the subject. However, once the specimen has been separated from the subject, any changes in the subject’s state will not be reflected by the specimen. Therefore, when a test is performed on a specimen, the results cannot be guaranteed to pertain to the subject as they are at the time of the test, only to the subject as they were at the time of specimen collection.
List of IG Use Cases
• BE/BS – Biospecimen Domains
  - Specimen handling such as freeze/thaw cycles and transportation.
  - Steps in obtaining cell-free RNA from blood plasma.
  - Types of quality evaluation.
• RELSPEC – Related Specimens
  - Specimen genealogy and hierarchy.
• PG – PGx Methods and Supporting Information
  - Run parameters for PCR.
  - Details of SNP probe assays.
• PF – Pharmacogenomics Findings
  - Protein variation in viral genetics.
  - Protein and nucleic variation in viral genetics.
  - Frame shifts, both viral and subject.
  - Nucleotide reads.
  - Zygosity.
  - Single-nucleotide polymorphism (SNP) reads.
  - HLA allelic records.
  - Observed somatic vs. germline variations.
  - Observed levels of somatic variations in a biopsy sample.
  - Gene expression measured via qRT-PCR.
  - Gene expression measured via microarray.
• PB/SB – PGx Marker Domains
  - Simple and complex genetic markers for drug resistance.
• Relating PGx Domains
  - A somatic variation and its related medical diagnosis.
  - Germline variations and related inherited risk of cancer.
  - Genetic variations relating to drug metabolism.
Specimen Genealogy
RELSPEC
Row | STUDYID | USUBJID | REFID | SPEC | PARENT | SPCLVL
1 | ABC-123 | 001-01 | SPC-001 | TISSUE | | 1
2 | ABC-123 | 001-01 | SPC-001-A | TISSUE | SPC-001 | 2
3 | ABC-123 | 001-01 | SPC-001-B | TISSUE | SPC-001 | 2
4 | ABC-123 | 001-01 | SPC-001-B-1 | DNA | SPC-001-B | 3
5 | ABC-123 | 001-01 | SPC-003 | BRAIN | | 1
6 | ABC-123 | 001-01 | SPC-003-A | RNA | SPC-003 | 2
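The relationship between PARENT and SPCLVL in the genealogy table can be checked mechanically. The following is a minimal sketch in Python, not part of the PGx IG itself: the function names and the dictionary layout are assumptions made for illustration, and only the REFID and PARENT values come from the slide.

```python
from typing import Dict, Optional

# REFID -> PARENT, transcribed from the RELSPEC example.
# None means the specimen was obtained directly from the subject.
parents: Dict[str, Optional[str]] = {
    "SPC-001": None,
    "SPC-001-A": "SPC-001",
    "SPC-001-B": "SPC-001",
    "SPC-001-B-1": "SPC-001-B",
    "SPC-003": None,
    "SPC-003-A": "SPC-003",
}


def specimen_level(refid: str, parents: Dict[str, Optional[str]]) -> int:
    """Hypothetical helper: level 1 for a specimen taken from the subject,
    plus one for every parent on the way back to that specimen."""
    level = 1
    seen = {refid}
    parent = parents[refid]
    while parent is not None:
        if parent in seen:
            raise ValueError(f"cycle in specimen genealogy at {parent}")
        seen.add(parent)
        level += 1
        parent = parents[parent]
    return level


def originating_specimen(refid: str, parents: Dict[str, Optional[str]]) -> str:
    """Hypothetical helper: the level 1 specimen whose collection date/time
    --DTC refers to for any derived specimen."""
    while parents[refid] is not None:
        refid = parents[refid]
    return refid


levels = {refid: specimen_level(refid, parents) for refid in parents}
```

With the example data, the derived levels agree with the SPCLVL column, and `originating_specimen("SPC-001-B-1", parents)` walks back to `SPC-001`, the specimen whose collection time --DTC would carry.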
Biospecimen Events and Findings
Row
STUDYID
DOMAIN
USUBJID
1
ABC134
BE
43871
2
ABC134
BE
43871
3
ABC134
BE
43871
4
ABC134
BE
43871
5
ABC134
BE
43871
Row
1 (cont)
2 (cont)
3 (cont)
4 (cont)
5 (cont)
Row
BEBODSYS
Nervous System
[A08]
Nervous System
[A08]
Nervous System
[A08]
Nervous System
[A08]
Nervous System
[A08]
STUDYID
DOMAIN
SPDEVID
BESEQ
BEREFID
BETERM
BEDECOD
1
1148.267
Excision
2
1148.267
Flash Frozen
TS409871
309827
LN43871
3
1148.267
4
1148.267
Stored in
Freezer
Thaw
5
1148.267
Shipped
BELOC
VISITNUM
VISIT
BRAIN
1
BASELINE
BRAIN
1
BASELINE
BRAIN
1
BASELINE
BRAIN
1
BASELINE
BRAIN
1
BASELINE
USUBJID
BSSEQ
BSREFID
BECAT
BESCAT
EXCISION
COLLECTION
SOFT
TISSUE
FLASH
FROZEN
PREP
STORED
STORING
2005-03-20
2005-03-20
2005-03-20
2005-03-20
ABC134
BS
43871
1
2
ABC134
BS
43871
2
BSSTRESU
BSPEC
1 (cont)
cm3
BRAIN
2 (cont)
C
BRAIN
BSANTREG
CEREBRAL AQUEDUCT
CEREBRAL AQUEDUCT
01
BESTDTC
2005-03-20T15:07
2005-03-20T15:07
2005-03-20T13:22
2005-03-21T10:29
2005-03-21T11:00
TRANSPORT
BEENDTC
2005-03-20T13:22
2005-03-21T10:29
2005-03-21T10:36
2005-03-21T15:00
BSCAT
BSORRES
VOLUME
Volume
SPECIMEN MEASUREMENT
2
cm3
2
FFRZTMP
Flash
Frozen
Temp
SPECIMEN
HANDLING
-80
C
-80
1148.267
Row
PREP
ABC LAB
BSTEST
1148.267
1
BEPRTYID
THAW
SHIPPED
BEDTC
2005-03-20
BSTESTCD
BEPARTY
BSORRESU BSSTRESC
BSBLFL
VISITNUM
BSDTC
Y
1
2005-03-20
BSSTRESN
2
Variables – (Pathogens)
Name | Label | Notes
--MSPCES*** | Microorganism Species | In findings domains, --SPCIES holds the species of the pathogen to which the subject is a host when the pathogen is the focus of the test. In instances when both the subject and the pathogen are tested, records for the pathogen are distinguished from records for the subject by the use of the --SPCIES variable. Not to be confused with DMSPCIES, which holds the species of the subject.
--MSTRN | Microorganism Strain | As --SPCIES. --STRAIN holds the strain of the pathogen to which the subject is a host when the pathogen is the focus of the test.

*** SDTMIG omits --SPCIES because all subjects in most human clinical trials must be Homo sapiens; the nature of the study obviates the need for this information to be included in SDTM datasets. The exception is virology, where a viral species must be identified.
Variables – (Genetics/Genomics Test related)
Name | Label | Notes
--TEST | Test Name | For genetic variation, usually the level of granularity and/or molecular component of interest. Examples: Nucleotide, Amino Acid, Allele
--REFSEQ | Reference Sequence | Depending on the type of test method, the reference sequence is most likely to be either the rsID from dbSNP (for targeted tests) or a GenBank accession number (for non-targeted tests).
--GENTYP | Type of Genetic Region of Interest | The type of the portion of the genome serving as a locus for the experiment/test. Examples: GENE, SECTOR, PROTEIN
--GENRI | Genetic Region of Interest | The portion of the genome serving as a locus for the experiment/test. Often the name of a gene. Examples: EGFR, KRAS, CYP2D6
--GENLI | Genetic Location of Interest | The numeric position within the sequence for the targeted read. Compare vs. --GENLOC. --GENLI and --GENTGT are variables that should be used only when the test specifies a single genetic read to the exclusion of all other possibilities, and the result is a matter of occurrence, either as a percentage or as a boolean observation.
--GENTGT | Genetic Target | The genetic read targeted by the probe at the position specified by --GENLI.
--ALLELC | Allele | Humans are diploid: they have two homologous copies of each chromosome. However, the two copies are not necessarily identical, since one chromosome is inherited from each parent. Therefore, in tests that compare chromosomes, or parts of chromosomes (alleles), the --ALLELE variable is used to denote results for one or the other of the two alleles (chromosomes).
Variables – (Genetics/Genomics Result related)
Name | Label | Notes
--GENSR | Genetic Sub-Region | The sub-region within the genetic region of interest in which the observed variation at the position given in --GENLOC is located, if relevant. Because exon numbers can be variable and are not regulated, caution should be exercised when populating this variable.
--GENLOC | Genetic Location | One of the three variables used to define a genetic read. --GENLOC holds the numeric position within the sequence for the observed result.
--ORRES | Result or Finding in Original Units | One of the three variables used to define a genetic read. --ORRES holds the observed result at the position specified by --GENLOC. When --GENLI is populated, --ORRES follows the standard rules.
--ORREF | Reference Result | One of the three variables used to define a genetic read. --ORREF holds the expected result at the position specified by --GENLOC according to the reference sequence specified by --REFSEQ.
--STRESC | Result or Finding in Standard Format | When --GENLOC is populated, --STRESC holds the observed variation, given in HGVS nomenclature. When --GENLI is populated and --ORRES=Y, --STRESC holds the observed variation as targeted, given in HGVS nomenclature. Otherwise, --STRESC is copied or derived from --ORRES.
--RSNUM | Reference SNP | Reference identifier for previously identified instances of the variation, such as the rs# in dbSNP.
--MUTYP | Mutation Type | The type of mutation, usually either GERMLINE (inherited) or SOMATIC (arising only in parts of the individual, as in cancer).
--ANMETH | Analysis Method | Analysis method applied to obtain a summarized result. Analysis method describes the method of secondary processing applied to a complex observation result (e.g. an image or a genetic sequence).
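The three variables that define a genetic read (--GENLOC, --ORREF, --ORRES) carry the information needed to write a simple substitution in HGVS nomenclature for --STRESC. The sketch below is illustrative only and covers single-nucleotide substitutions; the function name is hypothetical, and the `c.` prefix assumes the reference sequence is a coding DNA sequence.

```python
NUCLEOTIDES = {"A", "C", "G", "T"}


def hgvs_substitution(genloc: int, orref: str, orres: str, prefix: str = "c.") -> str:
    """Hypothetical helper: format a single-nucleotide substitution as
    <prefix><position><reference>><observed>, e.g. c.2156G>C.

    genloc -- position within the reference sequence (--GENLOC)
    orref  -- expected nucleotide at that position (--ORREF)
    orres  -- observed nucleotide at that position (--ORRES)
    """
    if orref not in NUCLEOTIDES or orres not in NUCLEOTIDES:
        raise ValueError("this sketch only handles single nucleotides A, C, G, T")
    if orref == orres:
        raise ValueError("observed result matches the reference; no substitution")
    return f"{prefix}{genloc}{orref}>{orres}"


# Position 2156, reference G, observed C.
variant = hgvs_substitution(2156, "G", "C")
```

Insertions, deletions, and protein-level descriptions follow other HGVS patterns and are deliberately out of scope for this sketch.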
Genetic Variation Example

PG
Row | STUDYID | DOMAIN | USUBJID | PGSEQ | PGTESTCD | PGTEST | PGGENTYP | PGGENRI | PGCAT | PGORRES | PGSTRESC
1 | ABC-01234 | PG | 17C0154 | 1 | EXON | Exons Sequenced | GENE | EGFR | GENETIC VARIATION | 13-21 | 13-21
2 | ABC-01234 | PG | 17C0154 | 2 | SEQSTART | Sequence Start | GENE | EGFR | GENETIC VARIATION | 1499 | 1499
3 | ABC-01234 | PG | 17C0154 | 3 | SEQLONG | Sequence Length | GENE | EGFR | GENETIC VARIATION | 1127 | 1127

PF
Row | STUDYID | DOMAIN | USUBJID | PFSEQ | PFREFID | PFTESTCD | PFTEST | PFGENRI | PFREFSEQ | PFCAT | PFORRES | PFORREF | PFGENLOC
1 | ABX-01256 | PF | XX7-154 | 1 | 5493283 | NUC | Nucleotide | EGFR | NM_005228.3 | GENETIC VARIATION | C | G | 2156
2 | ABX-01256 | PF | XX7-212 | 1 | 8970343 | NUC | Nucleotide | EGFR | NM_005228.3 | GENETIC VARIATION | T | C | 2369
3 | ABX-01256 | PF | XX7-220 | 1 | 7629230 | NUC | Nucleotide | EGFR | NM_005228.3 | GENETIC VARIATION | T | A | 2073

Row | PFGENSR | PFSTRESC | PFXNAM | PFNAM | PFMETHOD | PFRUNID | VISITNUM | PFDTC
1 (cont) | Exon 18 | c.2156G>C | 5.23.445.1.4.1650 08.1.8:86175 | Biotech ABC | Massively Parallel Sequencing | 8970723 | 1 | 2012-10-23T10:06
2 (cont) | Exon 20 | c.2369C>T | 5.23.445.1.4.1650 08.1.8:87952 | Biotech ABC | Massively Parallel Sequencing | 8925000 | 1 | 2012-10-23T12:50
3 (cont) | Exon 16 | c.2073A>T | 5.23.445.1.4.1650 08.1.8:87970 | Biotech ABC | Massively Parallel Sequencing | 8925018 | 1 | 2012-10-23T13:03

PB
Row | STUDYID | DOMAIN | PBSEQ | PBMRKRID | PBGENTYP | PBGENRI | PBDRUG | PGDIAG | PBMRKR | PBSTMT
1 | ABC-01234 | PB | 1 | 2073A>T | GENE | EGFR | | Astrocytoma | 2073A>T | Decreased risk of diffusely infiltrating astrocytoma
2 | ABC-01234 | PB | 2 | G719A | GENE | EGFR | EGFR TKIs | | G719A | Increased sensitivity
3 | ABC-01234 | PB | 3 | T790M | GENE | EGFR | EGFR TKIs | | T790M | Decreased sensitivity

SB
Row | STUDYID | DOMAIN | USUBJID | SBSEQ | SBREFID | SBMRKRID | SBGENTYP | SBGENRI | SBNAM | VISITNUM | SBDTC
1 | ABC-01234 | SB | 17C0154 | 1 | 5493283 | G719A | GENE | EGFR | Biotech ABC | 1 | 2012-10-23T10:06
2 | ABC-01234 | SB | 17C0212 | 1 | 8970343 | T790M | GENE | EGFR | Biotech ABC | 1 | 2012-10-23T10:06
3 | ABC-01234 | SB | 17C0220 | 1 | 7629230 | 2073A>T | GENE | EGFR | Biotech ABC | 1 | 2012-10-23T10:06
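In the example, each SB record carries a marker identifier (SBMRKRID) that also appears as PBMRKRID in the study-level PB records. Assuming that matching these two identifiers is how the domains are meant to be related (the slide shows the shared key but does not spell out the rule), a lookup could be sketched as follows. The function name and the reduced column set are illustrative.

```python
# Rows transcribed from the SB and PB example tables; only the columns used here.
sb_rows = [
    {"USUBJID": "17C0154", "SBREFID": "5493283", "SBMRKRID": "G719A"},
    {"USUBJID": "17C0212", "SBREFID": "8970343", "SBMRKRID": "T790M"},
    {"USUBJID": "17C0220", "SBREFID": "7629230", "SBMRKRID": "2073A>T"},
]
pb_rows = [
    {"PBMRKRID": "2073A>T", "PBDRUG": "",
     "PBSTMT": "Decreased risk of diffusely infiltrating astrocytoma"},
    {"PBMRKRID": "G719A", "PBDRUG": "EGFR TKIs", "PBSTMT": "Increased sensitivity"},
    {"PBMRKRID": "T790M", "PBDRUG": "EGFR TKIs", "PBSTMT": "Decreased sensitivity"},
]


def marker_statements(sb_rows, pb_rows):
    """Hypothetical helper: attach the PB statement to each SB record by
    matching SBMRKRID to PBMRKRID (the join key is an assumption)."""
    by_marker = {row["PBMRKRID"]: row for row in pb_rows}
    matched = []
    for sb in sb_rows:
        pb = by_marker.get(sb["SBMRKRID"])
        if pb is not None:  # SB records without a PB entry are skipped
            matched.append((sb["SBREFID"], sb["SBMRKRID"], pb["PBDRUG"], pb["PBSTMT"]))
    return matched


statements = marker_statements(sb_rows, pb_rows)
```

Keeping the statement text in PB and only the identifier in SB means the interpretation of a marker is recorded once per study rather than once per subject.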
Gene Expression Example – Arrays

PF
Row | STUDYID | DOMAIN | USUBJID | SPDEVID | PFSEQ | PFGRPID | PFREFID | PFTESTCD | PFTEST | PFCAT | PFORRES
1 | A12345 | PF | 43871 | AGS-G4900DA | 2 | 1 | 2287.09443 | NINT1VAL | Normalized Intensity 1 Value | Analytic | 1.16279
2 | A12345 | PF | 43871 | AGS-G4900DA | 3 | 1 | 2287.09443 | NINT2VAL | Normalized Intensity 2 Value | Analytic | 0.96469
3 | A12345 | PF | 43871 | MANAN03 | 4 | 1 | 2287.09443 | PVAL | P Value | Post-Analytic | 0.05391
4 | A12345 | PF | 43871 | MANAN03 | 5 | 1 | 2287.09443 | FOLDCHG | Fold Change | Post-Analytic | 1.8

Row | PFSTRESC | PFSTRESN | PFXFN | PFNAM | PFSPEC | PFMETHOD | PFRUNID | PFANMETH | PFBLFL | VISITNUM | PFDTC
1 (cont) | 1.16279 | 1.16279 | 2.16.090.1.1357 64.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 | LOWESS | | 2 | 2005-03-21T11:28:17
2 (cont) | 0.96469 | 0.96469 | 2.16.090.1.1357 64.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 | LOWESS | | 2 | 2005-03-21T11:28:17
3 (cont) | 0.05391 | 0.05391 | 2.16.090.1.1357 64.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 | | | 2 | 2005-03-21T11:28:17
4 (cont) | 1.8 | 1.8 | 2.16.090.1.1357 64.3.4:7280912 | Deluxe Central Labs | RNA | Microarray | 1000450001 | | | 2 | 2005-03-21T11:28:17

DI
Row | STUDYID | DOMAIN | SPDEVID | DISEQ | DIPARMCD | DIPARM | DIVAL
1 | A12345 | DI | AGM-G4851B | 1 | TYPE | Device Type | Microarray Kit
2 | A12345 | DI | AGM-G4851B | 2 | MANUF | Manufacturer | Agilent
3 | A12345 | DI | AGM-G4851B | 3 | MODEL | Model | G4851B
4 | A12345 | DI | AGS-G4900DA | 1 | TYPE | Device Type | Microarray Scanner
5 | A12345 | DI | AGS-G4900DA | 2 | MANUF | Manufacturer | Agilent
6 | A12345 | DI | AGS-G4900DA | 3 | MODEL | Model | G4900DA
7 | A12345 | DI | MANAN03 | 1 | TYPE | Device Type | Workstation
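The DI records describe each device as one row per parameter, keyed by SPDEVID, and the PF records refer to a device through the same SPDEVID. A minimal sketch of collecting the DI parameters per device so a finding can be traced to its instrument is shown below. The function name is hypothetical, and the scanner identifier is written as `AGS-G4900DA` on the assumption that the line break in the slide split a single hyphenated value.

```python
from collections import defaultdict

# (SPDEVID, DIPARMCD, DIVAL) triples transcribed from the DI example.
di_rows = [
    ("AGM-G4851B", "TYPE", "Microarray Kit"),
    ("AGM-G4851B", "MANUF", "Agilent"),
    ("AGM-G4851B", "MODEL", "G4851B"),
    ("AGS-G4900DA", "TYPE", "Microarray Scanner"),  # identifier spelling assumed
    ("AGS-G4900DA", "MANUF", "Agilent"),
    ("AGS-G4900DA", "MODEL", "G4900DA"),
    ("MANAN03", "TYPE", "Workstation"),
]


def devices_by_id(rows):
    """Hypothetical helper: pivot one-row-per-parameter DI records into
    one dictionary of parameters per device identifier."""
    devices = defaultdict(dict)
    for spdevid, parmcd, value in rows:
        devices[spdevid][parmcd] = value
    return dict(devices)


devices = devices_by_id(di_rows)

# A PF record with SPDEVID "MANAN03" (the P value and fold change rows)
# can then be traced to the device that produced it.
post_analytic_device = devices["MANAN03"]["TYPE"]
```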
Next Steps
• Currently under CDISC internal review
• Public Review Posting – 2nd Quarter
• Final Posting – 3rd Quarter
• Next Project – 4th Quarter - Cytogenetics
Contact Information and Team
Anyone who wishes to join the team, please contact Joyce: Joyce.hernandez_0029@yahoo.com
Name | Company
Joyce Hernandez, Team Leader | Joyce Hernandez Consulting
Mohtaram Bahmanian | ImClone
Sally Cassals | Independent Consultant
Rhonda Facile | CDISC
Doris Li | ImClone
Cliona Molony | Merck
Mona Oakes | ImClone
Phil Pochon | Covance
Janet Reich | Amgen
Ellen Schatz | Eli Lilly
James Sullivan | Vertex
Richard Tyhach | Eli Lilly
Patricia Wesolowski | Vertex
Diane Wold | GSK
Darcy Wold | Independent Consultant
Fred Wood | Accenture
Helena Sviglin | FDA Liaison
Patrick Harrington | FDA
Joy Li | FDA