4 Hypothesis and Specific aims

TABLE OF CONTENTS
1 ABBREVIATIONS USED IN THE PROTOCOL
2 ABSTRACT
3 BACKGROUND AND SIGNIFICANCE
4 HYPOTHESIS AND SPECIFIC AIMS
4.1 Hypothesis
4.2 Specific Aim 1
5 PRELIMINARY DATA
6 STUDY DESIGN
6.1 Description
6.2 Study Population
6.2.1 Approximate Number of Subjects
6.2.2 Inclusion Criteria
6.2.3 Exclusion Criteria
6.3 Recruitment
6.4 Study Duration
6.4.1 Approximate Duration of Subject Participation
6.4.2 Approximate Duration of Study
6.5 Procedures
6.5.1 Submission of variant data into ClinVar
6.5.2 Expert clinical level curation of human genomic variants
6.5.3 Study Time and Events Table
6.5.4 Study Flow Diagram
6.6 Primary Endpoints
6.7 Statistics
6.7.1 Statistical Analysis Plan
6.7.2 Statistical Power and Sample Size Considerations
6.8 Data Management
6.8.1 Data Collection and Storage
6.8.2 Records Retention
7 SAFETY MONITORING
7.1 Adverse Event Reporting
8 SAMPLE COLLECTION AND RETENTION
8.1 Collection
8.1.1 Total Volume of Blood Collected
8.2 Retention
9 PROTECTION OF HUMAN SUBJECTS
9.1 Informed Consent
9.2 Protection of Human Subjects Against Risks
9.3 Data Monitoring Plan
10 REFERENCES
11 ATTACHMENTS
11.1 Attachment 1: ClinGen Structural Variation Data Submission Template
11.2 Attachment 2: ClinGen Sequence Variation Data Elements Dictionary
11.3 Attachment 3: ClinGen Sequence Variation Data Submission Template
11.4 Attachment 4: ClinGen Phenotype Data Collection Sheet
11.5 Attachment 5: Proposed Opt-out Language
Version Date October 2013
1 ABBREVIATIONS USED IN THE PROTOCOL
Abbreviation   Term
AE             Adverse event
CETT           Collaboration, Education, and Test Translation
CLIA           Clinical Laboratory Improvement Amendments
CMA            Cytogenomic microarray
dbGaP          Database of Genotypes and Phenotypes
dbSNP          Single Nucleotide Polymorphism Database
dbVar          Database of Structural Variation
DGV            Database of Genomic Variation
FTP            File Transfer Protocol
HGVS           Human Genome Variation Society
HIPAA          Health Insurance Portability and Accountability Act
HPO            Human Phenotype Ontology
ICCG           International Collaboration for Clinical Genomics
ICD9           International Classification of Diseases, 9th Revision
IRB            Institutional Review Board
ISCA           International Standards for Cytogenomic Arrays
LSDB           Locus Specific Database
NCBI           National Center for Biotechnology Information
NIH            National Institutes of Health
OHSR           Office of Human Subjects Research
PCPGM          Partners Healthcare Center for Personalized Genetic Medicine
PHI            Protected Health Information
RFR            Reason For Referral
SAE            Serious adverse event
SNOMED CT      Systematized Nomenclature of Medicine--Clinical Terms
WES            Whole Exome Sequencing
WGS            Whole Genome Sequencing
2 ABSTRACT
Technological advances are quickly making variant detection across the whole genome
commonplace in the medical care environment, though the interpretation of such variation
remains a challenge. Determination of the biological and medical significance of structural (copy
number) and sequence-level variation in the human genome requires extensive investigation of
both normal and disease populations; obtaining large numbers of patients amongst disease
populations has historically been challenging. Now that whole genome structural and sequence
analyses are rapidly being integrated into routine clinical care, clinical genetics laboratories have
been provided a unique and timely opportunity for capturing genomic variation from thousands
of individuals during the course of clinical genetic testing. An international consortium of
cytogenomic, molecular and clinical geneticists based in diagnostic laboratories, clinical
facilities, and research laboratories, the Clinical Genome Resource (ClinGen), has been formed
to harness such clinical genomic variation and phenotypic information into a unified, curated,
publicly available database. Deidentified clinical data collected, submitted, and curated through
this project will be housed within ClinVar, a database within the National Center for
Biotechnology Information (NCBI). This effort will be critical to develop the open resource for
understanding genomic variation needed to allow for more accurate interpretation of the
enormous levels of variation present in human genomes. Such a resource would also benefit
professional organizations seeking to define guidelines on how to use genetic information in
clinically useful ways. LAB NAME HERE would participate in this observational study by
contributing deidentified genotype and phenotype data generated during the course of routine
clinical genetic testing to the ClinVar database, and participate in ongoing curation, quality
assurance, and genotype-phenotype correlation projects related to this effort.
3 BACKGROUND AND SIGNIFICANCE
The current landscape of human genomic variation databases includes those focused on
variation as it relates to normal populations, association with common diseases (e.g.,
variation from genome-wide association studies), pharmacologic responses, and rare
variation. Each of these types of databases provides vital information when trying to decipher
the functional significance of human genomic variation. The study of variation as it relates
to specific disease populations, however, has the greatest potential to provide information on
genotype-phenotype correlations amongst rare but highly penetrant traits, information that
could also be applied to more common diseases. Effective studies of this type of variation
require large datasets from both normal (control) and disease populations.
Several centralized, public databases of human variation in the normal population already
exist, including the Database of Genomic Variation (DGV)1, the 1000 Genomes Project 2, the
Exome Sequencing Project 3 and the Single Nucleotide Polymorphism Database (dbSNP) 4.
In contrast, there is no single, comprehensive (including both sequence and structural
variation), freely accessible database of variation from disease populations. Though freely
accessible and curated databases of structural variation exist, including the International
Standards for Cytogenomic Arrays (ISCA) Consortium database (which has now been
incorporated into ClinGen), most publicly available sequence-level genomic variation from
disease populations is either buried in publications or contained within diverse Locus
Specific Databases (LSDBs), of which over 1,500 exist
(http://www.hgvs.org/dblist/glsdb.html#). These LSDBs have several drawbacks that limit
their utility:
1) Size: most disease or gene-centric databases are very small, and many genes/genomic
regions are not yet represented;
2) Fragmentation: the data may be located in multiple, independent databases, and
submission is often confined to a limited number of research laboratories;
3) Lack of standardization: data submission and content is not standardized to allow
comparison of data submitted across laboratories;
4) Lack of curation: some data is deposited without any, or with an incorrect,
interpretation of the functional or clinical significance5; and
5) Access: the data is not easily accessible to the research and clinical communities,
particularly for use in high-throughput genomic analyses.
An additional resource for human genetic variation in disease populations is the Human Gene
Mutation Database (www.hgmd.org). This resource, however, is not freely available, and
only represents published data, which is a much smaller proportion of the actual amount of
variation being observed in clinical laboratories. Further, approximately 30% of mutations
within these databases annotated from the literature are in non-standardized formats, and/or
are labeled with incorrect clinical assertions5. One reason for this high error rate is that
incorrect initial variant interpretations are only revised when an erratum to the original
manuscript is published.
Given the current state of publicly available sequence-level variant information, as well as
the fact that emerging technologies will soon allow for the detection of both structural and
sequence-level variant data on a single platform, we have joined the Clinical Genome
Resource (ClinGen), a consortium of clinical genetic testing laboratories vested in sharing
variant data in a single, open access database6. Such an approach would harness the wide
phenotypic and genotypic variant spectra observed in clinical laboratories, and take
advantage of data already being generated during the course of routine clinical care.
Historically, it has been difficult to identify and collect large groups of individuals with
specific phenotypes, particularly with Mendelian genetic traits, from an unselected
population. Clinical genetic testing laboratories, however, are uniquely poised to provide
large volumes of variant data on affected populations, as these individuals constitute their
primary sample base. In addition, as phenotypic information is routinely captured by many
clinical laboratories in order to properly process and interpret test results, basic phenotype
information is already available for many clinical testing cases.
ClinGen will leverage the hundreds of thousands of tests performed every year within
clinical laboratories around the world on affected patients with demonstrable phenotypes to
build a database of clinical grade genomic variation. As a member of this organization, LAB
NAME HERE will be one of the many entities contributing deidentified genotype and
phenotype data to this effort. Initial efforts will focus on rare disorders; the majority of
clinical genetic testing performed today is in this category, and this information has the
highest immediate clinical utility in genomic medicine. We will expand our assessment of
variation to include the entire human genome as whole exome and genome sequencing
rapidly make their way into routine clinical use as primary diagnostic tests. Unlike other
database efforts, the goals of our effort are to support the submission of all variation observed
in affected individuals during clinical testing into a publicly available database, and to
make this information freely and easily accessible to the community. We intend to ensure the
quality of the information through ongoing curation efforts.
4 HYPOTHESIS AND SPECIFIC AIMS
4.1 Hypothesis
This is a data repository study. We will contribute to a database of primarily clinical grade
variation data across the genome that concentrates the knowledge and curation capabilities of
our clinical laboratory community into a single environment. Capturing the large number of
clinical genetic tests being performed on patients with disease phenotypes presents a unique
opportunity to contribute to our understanding of the functional significance of human
genomic variation, which will benefit the growing community of medical genomics
clinicians and researchers. The specific aim to achieve this goal is:
4.2 Specific Aim 1
Submit de-identified variant and phenotypic data into ClinVar, a unified database at NCBI.
A large consortium of highly experienced clinical laboratories has committed to submitting
de-identified genomic data (from clinical cytogenomic microarray, single gene, gene panel,
and, later, whole exome/genome sequencing testing) and basic phenotypic data into ClinVar,
a database developed and housed within the National Center for Biotechnology Information
(NCBI). LAB NAME HERE will use existing and newly developed software bridges to
current analysis programs to ensure simple, automated mechanisms to de-identify and
reformat datasets for submission.
5 PRELIMINARY DATA
ClinGen and the ClinVar database represent the expansion of previous efforts known as the
International Standards for Cytogenomic Arrays (ISCA) Consortium
(https://www.iscaconsortium.org) and the International Collaboration for Clinical Genomics
(ICCG). The ISCA Consortium, founded in 2007, was a group of laboratories, clinicians, and
researchers dedicated to raising the standard of patient care by improving the quality of
cytogenomic microarray (CMA) testing for structural variation. One of the group’s main goals
was the creation and maintenance of a publicly available database of human structural variation.
This project has resulted in the deposition of over 40,000 cases into the Database of Structural
Variation (dbVar) at NCBI (http://www.ncbi.nlm.nih.gov/dbvar/studies/nstd37/). The ISCA
Consortium has also focused on other issues related to CMA testing, including clinical utility7,8
and evidence-based array design9,10 and result interpretation10,11. Realizing that these types of
issues were not unique to structural variation, the ISCA Consortium chose to expand its efforts to
include sequence-level variation, and officially became the International Collaboration for
Clinical Genomics (ICCG) in 2012.6 In 2013, ICCG became a part of the Clinical Genome
Resource, combining its efforts with those of others focused on data curation and machine
learning initiatives.
6 STUDY DESIGN
6.1 Description
Geisinger Health System will be serving as the co-coordinating center, along with the Partners
Healthcare Center for Personalized Genetic Medicine (PCPGM), for this project, which aims to
establish a data repository of curated, clinical-grade genomic variation data. This project has
been approved by the IRBs at both institutions (Geisinger IRB #2013-0367, Partners IRB
#2013P001099/BWH). Data will be stored within the ClinVar database at the NCBI and NOT at
Geisinger or PCPGM. Data submission procedures are discussed in section 6.5, “Procedures,” as
well as section 6.8.1, “Data Collection and Storage.”
6.2 Study Population
6.2.1 Approximate Number of Subjects
The goal of this project is to place information from at least 100,000 participants worldwide in
the database. Approximately # individuals from LAB NAME HERE will participate per year;
the study will continue indefinitely.
6.2.2 Inclusion Criteria
All patients undergoing clinical cytogenomic or molecular-based testing at a participating
clinical genetics laboratory will be eligible for inclusion in this study. There are no restrictions
on age, gender, ethnicity, etc. Patients will not be undergoing any type of intervention or
procedure as a result of participating in this study. No biological specimens will be collected,
utilized, transferred, or stored solely for the purposes of this study; any interactions with
biological specimens will be solely for the purpose of the clinical genetic testing ordered on
behalf of the patient, and will conform to standard operating procedures for a clinical genetics
laboratory.
6.2.3 Exclusion Criteria
Patients undergoing ONLY biochemical genetic testing at participating clinical genetics
laboratories will be excluded from this study.
6.3 Recruitment
Subjects are not being actively recruited for this project; data from ALL patients referred for
cytogenomic and/or molecular-based testing at a participating laboratory (as described in section
6.2.2, “Inclusion Criteria”) will be deidentified and submitted to ClinVar, unless the patient opts
out of participation. Subjects may request not to be included (opt-out).
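The inclusion, exclusion, and opt-out rules in sections 6.2.2, 6.2.3, and 6.3 can be summarized as a single eligibility check. The following is a minimal sketch only: the `ClinicalOrder` record, its field names, and the test-type labels are hypothetical and do not describe any laboratory's actual information system.

```python
from dataclasses import dataclass

# Hypothetical test-type labels; a real laboratory system would use its own.
ELIGIBLE_TEST_TYPES = frozenset({"cytogenomic", "molecular"})


@dataclass(frozen=True)
class ClinicalOrder:
    """Hypothetical summary of the clinical testing ordered for one patient."""
    test_types: frozenset
    opted_out: bool = False


def is_eligible_for_submission(order: ClinicalOrder) -> bool:
    """Apply the inclusion (6.2.2), exclusion (6.2.3), and opt-out (6.3) rules."""
    if order.opted_out:
        return False
    # A patient with ONLY biochemical testing has no eligible test type.
    return bool(order.test_types & ELIGIBLE_TEST_TYPES)
```

Under these assumptions, a patient with both biochemical and cytogenomic testing remains eligible, because the exclusion applies only when biochemical testing is the sole test ordered.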
6.4 Study Duration
6.4.1 Approximate Duration of Subject Participation
As this is a data repository, duration of subject participation (i.e., amount of time data will
remain in the repository) is indefinite.
6.4.2 Approximate Duration of Study
As this is a data repository, duration of the study (i.e., amount of time this repository will remain
active) is indefinite.
6.5 Procedures
6.5.1 Submission of variant data into ClinVar
This project will support the submission of genotypic and phenotypic data into a centralized
location - the ClinVar database housed at NCBI. ClinVar has been developed by NCBI to
“provide a freely accessible, public archive of reports of the relationships among human
variations and phenotypes along with supporting evidence”
(http://www.ncbi.nlm.nih.gov/clinvar/intro). ClinVar will support the submission of variant
observations and basic phenotype information by laboratories, either as summarized variant data
or individual case data. The system will also enable the submission of assertions made regarding
the clinical significance of variants (e.g., “pathogenic,” “benign,” etc.) as well as the evidence to
support those assertions.
The types of data to be collected include both genotype and phenotype data. For structural
variation, the project will focus on genome-wide data generated from CMA testing, from
individuals with a broad spectrum of clinical phenotypes, including (but not limited to)
neurodevelopmental disabilities and/or congenital anomalies. For sequence variation, the
project will focus on sequence-level variants identified through both single-gene and multi-gene
panel assays from individuals referred for diagnostic, carrier, and pre-symptomatic testing for
inherited disorders. In the future, we will also include variants identified through whole-exome
(WES) and whole-genome sequencing (WGS) analysis. We will submit a separate amendment to
this protocol to discuss the procedures by which this will occur at a later date. Until that time,
we will NOT be submitting variants or phenotypic information obtained during the course of
WES or WGS.
All data submitted will be stripped of all HIPAA identifiers by the submitting laboratory. The
submitting laboratory will then assign the data a unique code NOT derived from any protected
health information (PHI) (such as their lab accession number). This code is known only to the
submitting laboratory. The data is then submitted to NCBI, where the submitting laboratory’s
unique code is removed and yet another unique NCBI identifier is assigned. NCBI provides the
submitting laboratory with the link between their unique laboratory-assigned codes and the
NCBI identifier. NCBI NEVER has access to PHI, and PHI is NEVER placed in the public
database. The submitting laboratory is the only entity that can link a particular case to PHI; they
need to be able to do this in order to know which cases they have already submitted to the database
(to avoid data duplication), to amend clinical reports if necessary following project curation
activities (as described below), etc. Further, because the data was originally generated as part of
routine clinical care, the submitting laboratories must keep it associated with PHI within the
confines of the laboratory, in order to comply with Clinical Laboratory Improvement
Amendments (CLIA) guidelines.
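The coding step described above can be illustrated with a short sketch. It assumes a case is held as a simple dictionary; the field names in `FIELDS_TO_STRIP`, the `internal_case_id` key, and the `lab_code` key are hypothetical, and a real laboratory system would have to enumerate every HIPAA identifier it actually stores. The code is random rather than derived from any case field, and the link table never leaves the laboratory.

```python
import secrets

# Hypothetical field names standing in for identifiers held by the laboratory.
FIELDS_TO_STRIP = {"internal_case_id", "name", "date_of_birth", "medical_record_number"}


def new_submission_code() -> str:
    """Return a random code that is not derived from any PHI."""
    return secrets.token_hex(16)


def deidentify(case: dict, link_table: dict) -> dict:
    """Strip identifying fields and attach a random laboratory-assigned code.

    `link_table` stays inside the laboratory; it maps the random code back
    to the internal case so the laboratory alone can re-link the record.
    """
    code = new_submission_code()
    link_table[code] = case["internal_case_id"]
    cleaned = {k: v for k, v in case.items() if k not in FIELDS_TO_STRIP}
    cleaned["lab_code"] = code
    return cleaned
```

Before calling `deidentify`, a laboratory would presumably consult the link table to confirm that the case has not already been submitted; that check is omitted here for brevity.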
Initially, most submitted data will be derived from single gene mutation analyses, gene panel
testing, or CMA testing. Data consisting of only variant information (including clinical
interpretation) and basic phenotype information (age at time of testing, gender, phenotype
provided at the time of testing utilizing standardized vocabularies) will be considered
non-identifiable, and will therefore not require full consent or submission through an opt-out process
(see section 9.1, “Informed Consent,” for a detailed description of this process). Examples of
this type of data include individual variants identified on single gene testing, compound
heterozygous variants identified on single gene testing, or small sets of variants identified either
through CMA (the set of independent calls from an assay) or multi-gene panel tests. This deidentified data can be deposited for public access within ClinVar.
Full genome-wide datasets, such as the raw data files generated from CMA analyses, could be
considered potentially identifiable, though this possibility is considered remote. As an extra
mechanism of protection, this type of data would only be submitted by laboratories with an
opt-out mechanism (described below in Section 9.1) in place. This type of data will also be initially
processed through the Database of Genotypes and Phenotypes (dbGaP), before de-identified
variant information is released publicly through ClinVar. Potentially identifiable files will
remain within dbGaP under controlled access, which requires approval by NIH before an
investigator can gain access to the data.
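The two data tiers described in this section amount to a simple routing decision. The sketch below is one possible reading of that decision, not a description of NCBI's submission software; the function and enum names are hypothetical, and patient-level opt-out is assumed to have been applied before this step.

```python
from enum import Enum


class Destination(Enum):
    CLINVAR_PUBLIC = "ClinVar (public access)"
    DBGAP_CONTROLLED = "dbGaP (controlled access)"
    NOT_SUBMITTED = "not submitted"


def route_dataset(is_genome_wide_raw_data: bool, lab_has_opt_out: bool) -> Destination:
    """Choose a destination following the two data tiers in section 6.5.1."""
    if not is_genome_wide_raw_data:
        # Individual variants or small variant sets with basic phenotype
        # information are treated as non-identifiable.
        return Destination.CLINVAR_PUBLIC
    if lab_has_opt_out:
        # Potentially identifiable files remain under controlled access.
        return Destination.DBGAP_CONTROLLED
    # Without an opt-out mechanism, genome-wide raw data is not submitted.
    return Destination.NOT_SUBMITTED
```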
6.5.2 Expert clinical level curation of human genomic variants
In conjunction with the submission of data into a single accessible environment,
substantial effort will be devoted to developing and demonstrating methods to curate the data,
including improving the classification of variants with respect to their role in human health and
disease. The curation process is integral to the success of this and any effort attempting to
determine the functional significance of genomic variation.
ClinGen has developed, in partnership with NCBI, several algorithm-based curation tools and
integrated them into the data submission process for structural variation through its previous
project, the ISCA Consortium. The goal of this effort is to improve the quality of data entering
the database while providing immediate value to the submitter through validation and
consistency checks. Data curation is carried out at multiple levels: 1) quality checks, such as
checks for appropriate genome position and nomenclature, are performed, 2) regions where the
same laboratory has reported different clinical calls for similar observations are identified and
flagged, and 3) the submitted calls are compared against an expert curated list of regions with
established clinical significance (http://www.ncbi.nlm.nih.gov/dbvar/studies/nstd45/). The results
are presented to the submitter via an authenticated web page that includes a genome overview, a
genome browser and a tabular report of the regions to be reviewed.
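The three curation levels listed above can be sketched as follows. This is an illustration under stated assumptions, not the ClinGen or NCBI implementation: the `Call` record is hypothetical, "similar observations" is approximated here as overlapping calls of the same copy-number direction, and the chromosome lengths and curated regions are supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical structural variant call (also used for curated regions)."""
    chrom: str
    start: int
    end: int
    copy_change: str      # "gain" or "loss"
    classification: str   # e.g. "pathogenic", "benign"


def overlaps(a: Call, b: Call) -> bool:
    """True when two calls share chromosome, direction, and at least one base."""
    return (a.chrom == b.chrom and a.copy_change == b.copy_change
            and a.start <= b.end and b.start <= a.end)


def curate(calls: list, curated_regions: list, chrom_lengths: dict) -> list:
    """Return (level, message) flags for a single laboratory's submission."""
    flags = []
    for i, call in enumerate(calls):
        # Level 1: quality check on genome position.
        length = chrom_lengths.get(call.chrom)
        if length is None or not (1 <= call.start <= call.end <= length):
            flags.append((1, f"call {i}: invalid genome position"))
        # Level 2: same laboratory, overlapping calls, different classification.
        for j in range(i + 1, len(calls)):
            other = calls[j]
            if overlaps(call, other) and call.classification != other.classification:
                flags.append((2, f"calls {i} and {j}: conflicting classifications"))
        # Level 3: disagreement with an expert-curated region.
        for region in curated_regions:
            if overlaps(call, region) and call.classification != region.classification:
                flags.append((3, f"call {i}: differs from curated region"))
    return flags
```

Each flag corresponds to one row of the tabular report that the submitter would review; resolving or overriding a flag is left to the submitter, as described in the following paragraph.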
Each identified "conflict" region is presented to the submitter who has the option to click on the
variant and change the call, or choose to retain the original call and ignore the conflict warning.
For these cases, the submitter is asked to provide a reason for their decision. Upon completion of
the review, the submitter finalizes the submission, which will incorporate a history of any
changes applied. The editing history of each event is retained in a central database to allow for
full review at a later date. This curation tool provides a user-friendly system by which the
information submitted to the database can be reviewed prior to integration with the full dataset.
This type of curation process has already demonstrated significantly enhanced quality control for
clinical reporting.
Similar laboratory-level curation tools will be developed for sequence-level data. Those tools
will be available within ClinVar as described above for structural variant submission.
In addition to being curated against previous submissions from our laboratory, variants will also
be curated against the submissions of other laboratories. Individual laboratories may come to
different conclusions regarding the clinical consequences of the same variant, and the
expert-level review process can provide a method of resolution12. Furthermore, as evidence available
regarding particular variants changes over time, clinical classifications made at the time of data
submission may require re-evaluation13. Expert level curation will allow for “real time”
integrated analysis of data from multiple submitting laboratories with currently available
evidence for clinical assertion.
Should any variants submitted by LAB NAME HERE receive updated clinical classifications as
a result of these curation efforts, the original clinical reports will be amended to reflect this
information, and redistributed to the original ordering clinician for discussion with the patient,
per standard clinical laboratory protocol.
6.5.3 Study Time and Events Table
Not applicable.
6.5.4 Study Flow Diagram
6.6 Primary Endpoints
Not applicable. The study focuses on the creation of a data repository with no specified
endpoint.
6.7 Statistics
6.7.1 Statistical Analysis Plan
The goal of this study is to create a repository of de-identified genotype/phenotype information
on individuals referred for clinical genetic testing. Because of the nature of this project, statistical
analyses are not required. This repository is to serve as a resource for other researchers, who will
determine their own analysis procedures based upon their individual research questions. Deidentified data (i.e., individual structural or sequence-level variants) within the repository will be
publicly available through the ClinVar database; researchers will not need an IRB approved
protocol to access this information. Potentially identifiable data (e.g. raw data files from CMA
testing, whole exome datasets, etc.) will be released through dbGaP to qualified researchers at
the discretion of a data access committee at NCBI. Researchers wishing to access this type of
data must have an IRB approved protocol.
6.7.2 Statistical Power and Sample Size Considerations
Not applicable.
6.8 Data Management
6.8.1 Data Collection and Storage
A data submission template has previously been developed for structural variation by the ISCA
Consortium (see Attachment 1); this template will continue to be used for this expanded project.
A similar approach has begun for sequence-level variants, and a preliminary data element
dictionary and data submission template have been established (see Attachments 2 and 3,
respectively).
All variants detected in an individual during the course of routine clinical genetic testing are
stripped of identifying information and submitted to NCBI. Each variant is submitted with a
minimum amount of information necessary to define the variant (e.g., genomic location, etc.),
the laboratory method used to identify it (see Attachments 1 and 3), as well as the initial
clinical assertion (e.g. pathogenic, uncertain, benign, etc.) of the submitting laboratory. As
discussed above, all clinical assertions are subject to further evaluation by an expert
consensus process, and could be reclassified following consensus curation activities 14. For
any given variant, the submitting laboratory also has the option of submitting more detailed
information, such as phenotypic information, data on variant frequency in cases versus controls,
variants found in cis/trans, publications, segregation data, functional data, analytical method,
source of data, etc.
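The minimum and optional information described above could be represented as a record with a small validation step. This is a sketch only: the class and field names are hypothetical and are not taken from the ClinGen submission templates in Attachments 1 and 3, which define the authoritative data elements.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VariantSubmission:
    # Minimum information described in the text.
    genomic_location: str
    detection_method: str
    clinical_assertion: str   # e.g. "pathogenic", "uncertain", "benign"
    # Optional, more detailed information.
    phenotype_terms: list = field(default_factory=list)
    publications: list = field(default_factory=list)
    segregation_data: Optional[str] = None
    functional_data: Optional[str] = None

    def validate(self) -> list:
        """Return a list of problems; an empty list means the minimum is met."""
        problems = []
        for name in ("genomic_location", "detection_method", "clinical_assertion"):
            if not getattr(self, name).strip():
                problems.append(f"missing {name}")
        return problems
```

The check deliberately tests only that the required fields are present; it does not restrict the clinical assertion to a fixed list, since the text gives its assertion categories as examples rather than as a closed vocabulary.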
To facilitate uniform phenotypic data collection, phenotypes, diseases and clinical terms will be
described using standard ontologies. An ontology is a computational representation of a domain
of knowledge based upon a controlled, standardized vocabulary for describing entities and the
semantic relationships between them. The terminology for phenotype ontology already built into
ClinVar comes preferentially from two sources: the Human Phenotype Ontology (HPO) 15 and
SNOMED CT (Systematized Nomenclature of Medicine--Clinical Terms;
http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html). All phenotype information
entered into the ClinVar database will be coded using some type of ontological system – no free
text phenotype information will be publicly displayed in an effort to maintain patient privacy.
For phenotypic data, the standard format will consist of general variables such as age at time of
testing, gender, race/ethnicity (if provided), type of testing (symptomatic vs. asymptomatic), and
reason for referral for clinical testing. We will use several different resources to collect
phenotype data. ClinGen has developed a one-page phenotype “checklist” utilizing HPO terms
organized by body system (see Attachment 4). These forms have been designed for use as part
of a standard clinical test requisition form, or to be used when ordering testing electronically.
ClinGen has also worked with the software company Cartagenia (www.cartagenia.com) to
develop a data submission tool incorporating these forms. Additionally, Cartagenia has
developed algorithms to automatically detect HPO classifiers from free text, such as Reason for
Referral (RFR) text. This Cartagenia algorithm includes “ICD9 expansion”, a feature that allows
frequently used ICD9 (International Classification of Diseases, 9th Revision) codes to
be mapped automatically to HPO classification codes. We will utilize both of these tools, as well
as any other tools developed by ClinGen to meet our expanding needs, such as disease-specific
phenotype forms (utilizing standardized vocabulary), in order to facilitate collection of the
detailed phenotype information necessary for certain single-gene or gene-panel tests.
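As a rough illustration of the “ICD9 expansion” idea described above, the following Python sketch maps ICD9 codes from a requisition to HPO terms through a lookup table. It is not Cartagenia's algorithm. The table contents, the function name `expand_icd9`, and the choice of codes are assumptions made for illustration only, and any mapping used in practice would need to be curated and verified against current ICD9 and HPO releases.

```python
# Hypothetical lookup table: ICD9 code -> (HPO ID, HPO label).
# Entries are illustrative only and must be verified before any real use.
ICD9_TO_HPO = {
    "315.9": ("HP:0001263", "Global developmental delay"),
    "299.00": ("HP:0000717", "Autism"),
    "345.90": ("HP:0001250", "Seizure"),
}


def expand_icd9(codes):
    """Return (mapped HPO terms, unmapped ICD9 codes) for a list of ICD9 codes."""
    mapped, unmapped = [], []
    for code in codes:
        key = code.strip()  # tolerate stray whitespace from form entry
        if key in ICD9_TO_HPO:
            mapped.append(ICD9_TO_HPO[key])
        else:
            # Unknown codes are set aside for manual review rather than guessed.
            unmapped.append(key)
    return mapped, unmapped
```

Returning the unmapped codes separately keeps the automatic step conservative: anything the table does not cover is left for a person to review.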
Edit this section as appropriate for your laboratory, discussing any software/vendors you
use or plan to use:
ClinGen has also worked with different software vendors to develop a variety of tools in order to
facilitate the data submission process. These relationships will continue as the efforts of the
ISCA Consortium and ICCG are expanded into those of ClinGen. For example, Cartagenia has
worked to provide a more integrated source for laboratories to use to facilitate the
de-identification and transfer of data to the NCBI database. Cartagenia utilizes a web-based
database and software platform for securely managing genotype and phenotype data for patients
and study subjects, which has been adapted to the needs of the ISCA Consortium through the
creation of the ISCA BENCH platform. ISCA BENCH includes features such as security
measures of accountability and HIPAA (Health Insurance Portability and Accountability Act)
compliance, de-identification of patient identifiers, and transparent linkage of genotype and
phenotype information. The Cartagenia software has been installed in multiple laboratories and
automatically performs the de-identification and transfer of the data from the clinical laboratory
to the current ISCA database at NCBI.
In addition, multiple other array software vendors (such as Affymetrix, Agilent, BioDiscovery,
BlueGnome, Oxford Gene Technology, and Roche Nimblegen) have also incorporated the
standardized data form into their software such that genotype data is in the proper format for
direct submission to NCBI. We will continue to use these groups and others to facilitate our data
submission, and will work with others in the future to ensure that data is transferred as efficiently
as possible.
All systems used for data submission will enable the creation of unique IDs (i.e., NOT the
original clinical laboratory ID) for case submission, with links to submitter IDs that are
maintained solely by the submitting laboratory. This functionality has already been built within
Cartagenia and will be extended to other tools supported by the project. All data is transmitted to
NCBI via a secure file transfer protocol (FTP) site.
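The unique-ID requirement can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Cartagenia implementation: the class name `SubmissionKey`, the field names `lab_id` and `submission_id`, and the use of random UUIDs are all hypothetical choices. The sketch only swaps the laboratory ID for a submission ID; removing other identifiers from a record is outside its scope.

```python
import uuid


class SubmissionKey:
    """Laboratory-side key linking internal IDs to submission IDs.

    The mapping stays with the submitting laboratory and is never transmitted.
    """

    def __init__(self):
        self._lab_to_submission = {}

    def submission_id_for(self, lab_id):
        # Reuse an existing ID so later updates to the same case stay linked.
        if lab_id not in self._lab_to_submission:
            # uuid4 is random, so the new ID is not derived from the laboratory ID.
            self._lab_to_submission[lab_id] = uuid.uuid4().hex
        return self._lab_to_submission[lab_id]

    def replace_lab_id(self, record, lab_id_field="lab_id"):
        """Return a copy of the record with the laboratory ID swapped out."""
        out = {k: v for k, v in record.items() if k != lab_id_field}
        out["submission_id"] = self.submission_id_for(record[lab_id_field])
        return out
```

Because the identifier is random rather than a hash of the laboratory ID, it cannot be reversed without the key table held by the laboratory.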
ClinVar quality control and versioning
The data elements of ClinVar are stored using reliable, standard technologies, including a
relational database in order to ensure transactional integrity, backup, transaction history and
rollback, and custodianship. Structural and sequence variants submitted to ClinVar are mapped
to reference sequences and annotated according to Human Genome Variation Society (HGVS)
standards. ClinVar assigns a version number to all data submissions, allowing submitters to
update their records and retaining the previous version for review. Updates in content will
happen on a daily basis with semi-annual or annual changes in data definitions and backward
compatibility in data reporting. The system also ensures that a minimum set of data is included in
each submission. The level of confidence in the accuracy of assertions of clinical significance
depends in large part on the supporting evidence, so this information, when available, is
collected and visible to users.
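The versioning behavior described above (each submission receives a version number, and earlier versions are retained for review) can be illustrated with a small Python sketch. The class `VersionedRecord` and its methods are hypothetical and say nothing about how ClinVar stores data internally.

```python
class VersionedRecord:
    """Keeps every submitted version of a record; versions are numbered from 1."""

    def __init__(self, content):
        self._versions = [dict(content)]

    @property
    def version(self):
        """Current (latest) version number."""
        return len(self._versions)

    def update(self, content):
        """Store a new version and return its number; older versions are kept."""
        self._versions.append(dict(content))
        return self.version

    def get(self, version=None):
        """Return a copy of the requested version (latest if omitted)."""
        if version is None:
            version = self.version
        if not 1 <= version <= self.version:
            raise ValueError(f"no such version: {version}")
        return dict(self._versions[version - 1])
```

A submitter who revises a record calls `update`, while a reviewer can still retrieve the earlier content with `get(1)`.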
Several quality control checks have been built into the system: a check that the variant is defined
within valid regions of genomic sequence; a check that the variant description conforms to
HGVS nomenclature; identification of conflicts among submitters and the published literature or
prior submitted data; and validation of the phenotype relative to controlled vocabularies and
current understanding of gene to phenotype relationships. The quality control checks will be
implemented in a manner to minimize barriers to data submission yet ensure high quality data.
As such, certain checks will prevent data submission, whereas others will allow data submission
but render a warning output to the submitter. For example, variant descriptions that are not
resolvable on a reference sequence will not be allowed, whereas discrepancies in variant
classification will be provided in a discrepancy report. Both of these quality control checks of the
variant classification system have been in place for several years in support of the ISCA project
and have been highly successful in improving the quality of structural variant interpretations.
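The distinction between blocking checks and warning-only checks can be sketched as below. This is an illustrative Python sketch, not the ClinVar validation code: the function `check_submission`, the record fields, and the two lookup tables passed in as arguments are assumptions. The coordinate check stands in for "resolvable on a reference sequence", and a real system would also validate HGVS syntax and phenotype vocabulary.

```python
def check_submission(submission, reference_lengths, prior_classifications):
    """Run one blocking check and one warning-only check on a submission.

    reference_lengths: hypothetical map of sequence name -> length in bases.
    prior_classifications: hypothetical map of variant ID -> earlier classification.
    """
    errors, warnings = [], []

    # Blocking check: the variant must lie within a known reference sequence.
    length = reference_lengths.get(submission["chrom"])
    start, stop = submission["start"], submission["stop"]
    if length is None or not (1 <= start <= stop <= length):
        errors.append("variant is not resolvable on a reference sequence")

    # Warning-only check: report a classification discrepancy but still accept.
    prior = prior_classifications.get(submission["variant_id"])
    if prior is not None and prior != submission["classification"]:
        warnings.append(f"classification differs from prior submission: {prior}")

    return {"accepted": not errors, "errors": errors, "warnings": warnings}
```

Keeping errors and warnings in separate lists lets a submission tool reject only what is unresolvable while passing discrepancies back to the submitter as a report.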
6.8.2 Records Retention
Submitted data will remain in the database indefinitely.
7 SAFETY MONITORING
7.1 Adverse Event Reporting
The procedures used during this study (de-identification and submission of existing data to a
publicly available database) are extremely unlikely to result in any adverse events (AEs). In the
event that a clinical AE occurs, it will be reported to the institutional review
board (IRB) regardless of whether it is considered study related. In the event of a serious AE
(SAE), this will be reported to the IRB according to the IRB guidelines. All other AEs will be
summarized and submitted to the IRB during continuing review.
8 SAMPLE COLLECTION AND RETENTION
8.1 Collection
No samples are being collected solely for the purposes of this research study. This study strictly
involves submission of existing data (results of clinical genetic testing performed on samples
acquired for the purpose of such testing) into a data repository. No biological specimens are
being stored, transferred, or analyzed for the purposes of this study.
8.1.1 Total Volume of Blood Collected
Not applicable.
8.2 Retention
No biological samples are being stored for the purposes of this study. Data submitted to the data
repository will be stored indefinitely. Individual data can be removed from the repository at any
time at the request of the subject.
9 PROTECTION OF HUMAN SUBJECTS
9.1 Informed Consent
For this study, we are requesting a waiver of full informed consent. In place of full informed
consent, we will implement an “opt-out” participant awareness process. Currently, the ISCA
Consortium (the precursor to ClinGen) successfully uses an opt-out process for the submission of
data from CMA testing to dbGaP and dbVar within NCBI. This method will continue to be
used to collect genomic data from CMA, single-gene, and gene-panel assessments
through ClinGen. The opt-out model was developed by the Collaboration, Education and Test
Translation (CETT) Program of the National Institutes of Health (NIH) Office of Rare Disease
Research 16. It was reviewed and approved by the NIH Office of Human Subjects Research
(OHSR) and has previously been approved by several academic IRBs, including Emory
University, Mayo Clinic, and Geisinger Health System for use in the ISCA Consortium.
The opt-out process is considered appropriate for these purposes for several reasons: 1) Re-contacting
patients to obtain formal consent is not possible for most clinical laboratories and not
feasible for the scope of this project. Often, patient contact information is not provided to clinical
testing laboratories, making it impossible to contact patients directly. Further, managing the full
consent process for thousands of potential participants around the world would be cost prohibitive
and hinder the collection of a robust dataset. 2) There is little risk to participants. Though the raw
data files generated during CMA testing are theoretically identifiable, identification would be highly
unlikely in practice. The raw data files are kept under controlled access, and only researchers
with IRB-approved protocols may apply for and be granted access to this data. 3) There are
ample opportunities for patients to learn of the project and opt out of participation. Descriptions
of the project and the opt-out process in lay-friendly terms are on test requisition forms, clinical
result reports, and on participating laboratory websites (see Attachment 5). Patients have the
opportunity to opt out of participation by simply checking a box on either the requisition or test
result and returning it to the laboratory, calling a toll-free number, or submitting a short form
through the laboratory website. Patients may opt out at any time. Should they choose this
option, their data will either not be sent to NCBI, or, if it has already been sent, it will be
removed from the database.
The opt-out process also provides a process by which researchers can re-contact participants with
multiple layers of protection to reduce possible identification. Interested researchers may contact
the submitting laboratory to discuss potential research opportunities; the laboratory then contacts
the referring clinician, who contacts the patient directly. If the patient is interested in the
opportunity, he/she can then contact the researcher directly and consent to their study.
Keep or remove as appropriate for your laboratory:
As stated above, we plan to include WES and WGS datasets in the ClinVar database in the
future. We will submit an amendment to this protocol describing the process of data collection
and consent surrounding this type of data submission once we are ready to accept this type of
data.
9.2 Protection of Human Subjects Against Risks
There is minimal risk associated with this research study. While there is a risk of loss of privacy
and loss of confidentiality, multiple measures are in place to protect the privacy of individuals
and their health information. Data submitted to ClinVar will be free of identifiers linking
individuals to the data. The data will be submitted using a unique identifier generated by the
submitting laboratory and unrelated to any existing patient identifier. NCBI will never be
provided with the keys between these laboratory identifiers and actual patient protected health
information (PHI). NCBI will then assign a separate, unique ClinVar ID; this ID will be used for
public data display. NCBI will provide the submitting laboratory with the keys between the ClinVar
ID and the unique laboratory identifier. The submitting laboratory will keep this information on
password-protected computers and/or in locked file cabinets. No personal health identifiers for the tested
individuals or information regarding the ordering physicians will be included in the data
submission.
9.3 Data Monitoring Plan
Not applicable. Study does not include any intervention.
10 REFERENCES
1. Iafrate AJ, Feuk L, Rivera MN, et al. Detection of large-scale variation in the human
genome. Nat Genet. Sep 2004;36(9):949-951.
2. 1000 Genomes Project Consortium. A map of human genome variation from population-scale
sequencing. Nature. Oct 28 2010;467(7319):1061-1073.
3. Tennessen JA, Bigham AW, O'Connor TD, et al. Evolution and functional impact of
rare coding variation from deep sequencing of human exomes. Science. May 21 2012.
4. Sayers EW, Barrett T, Benson DA, et al. Database resources of the National Center for
Biotechnology Information. Nucleic Acids Res. Jan 2011;39(Database issue):D38-51.
5. Bell CJ, Dinwiddie DL, Miller NA, et al. Carrier testing for severe childhood recessive
diseases by next-generation sequencing. Sci Transl Med. Jan 12 2011;3(65):65ra64.
6. Riggs ER, Wain KE, Riethmaier D, et al. Towards a universal clinical genomics
database: the 2012 International Standards for Cytogenomic Arrays Consortium meeting.
Hum Mutat. Jun 2013;34(6):915-919.
7. Miller DT, Adam MP, Aradhya S, et al. Consensus statement: chromosomal microarray
is a first-tier clinical diagnostic test for individuals with developmental disabilities or
congenital anomalies. Am J Hum Genet. May 14 2010;86(5):749-764.
8. Riggs E, Wain K, Riethmaier D, et al. Chromosomal microarray impacts clinical
management. Clin Genet. Jan 25 2013.
9. Baldwin EL, Lee JY, Blake DM, et al. Enhanced detection of clinically relevant genomic
imbalances using a targeted plus whole genome oligonucleotide microarray. Genet Med.
Jun 2008;10(6):415-429.
10. Riggs ER, Church DM, Hanson K, et al. Towards an evidence-based process for the
clinical interpretation of copy number variation. Clin Genet. May 2012;81(5):403-412.
11. Kaminsky EB, Kaul V, Paschall J, et al. An evidence-based approach to establish the
functional and clinical significance of copy number variants in intellectual and
developmental disabilities. Genet Med. Sep 2011;13(9):777-784.
12. Cotton RG, Auerbach AD, Beckmann JS, et al. Recommendations for locus-specific
databases and their curation. Hum Mutat. Jan 2008;29(1):2-5.
13. Giardine B, Borg J, Higgs DR, et al. Systematic documentation and analysis of human
genetic variation in hemoglobinopathies using the microattribution approach. Nat Genet.
Apr 2011;43(4):295-301.
14. Kearney HM, Thorland EC, Brown KK, Quintero-Rivera F, South ST. American College
of Medical Genetics standards and guidelines for interpretation and reporting of postnatal
constitutional copy number variants. Genet Med. Jul 2011;13(7):680-685.
15. Robinson PN, Kohler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype
Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum
Genet. Nov 2008;83(5):610-615.
16. Faucett WA, Hart S, Pagon RA, Neall LF, Spinella G. A model program to increase
translation of rare disease genetic tests: collaboration, education, and test translation
program. Genet Med. May 2008;10(5):343-348.
11 ATTACHMENTS
11.1 Attachment 1: ClinGen Structural Variation Data Submission Template
11.2 Attachment 2: ClinGen Sequence Variation Data Elements Dictionary
11.3 Attachment 3: ClinGen Sequence Variation Data Submission Template
11.4 Attachment 4: ClinGen Phenotype Data Collection Sheet
11.5 Attachment 5: Proposed Opt-out Language