TABLE OF CONTENTS

1 ABBREVIATIONS USED IN THE PROTOCOL
2 ABSTRACT
3 BACKGROUND AND SIGNIFICANCE
4 HYPOTHESIS AND SPECIFIC AIMS
  4.1 Hypothesis
  4.2 Specific Aim 1
5 PRELIMINARY DATA
6 STUDY DESIGN
  6.1 Description
  6.2 Study Population
    6.2.1 Approximate Number of Subjects
    6.2.2 Inclusion Criteria
    6.2.3 Exclusion Criteria
  6.3 Recruitment
  6.4 Study Duration
    6.4.1 Approximate Duration of Subject Participation
    6.4.2 Approximate Duration of Study
  6.5 Procedures
    6.5.1 Submission of variant data into ClinVar
    6.5.2 Expert clinical level curation of human genomic variants
    6.5.3 Study Time and Events Table
    6.5.4 Study Flow Diagram
  6.6 Primary Endpoints
  6.7 Statistics
    6.7.1 Statistical Analysis Plan
    6.7.2 Statistical Power and Sample Size Considerations
  6.8 Data Management
    6.8.1 Data Collection and Storage
    6.8.2 Records Retention
7 SAFETY MONITORING
  7.1 Adverse Event Reporting
8 SAMPLE COLLECTION AND RETENTION
  8.1 Collection
    8.1.1 Total Volume of Blood Collected
  8.2 Retention
9 PROTECTION OF HUMAN SUBJECTS
  9.1 Informed Consent
  9.2 Protection of Human Subjects Against Risks
  9.3 Data Monitoring Plan
10 REFERENCES
11 ATTACHMENTS
  11.1 Attachment 1: ClinGen Structural Variation Data Submission Template
  11.2 Attachment 2: ClinGen Sequence Variation Data Elements Dictionary
  11.3 Attachment 3: ClinGen Sequence Variation Data Submission Template
  11.4 Attachment 4: ClinGen Phenotype Data Collection Sheet
  11.5 Attachment 5: Proposed Opt-out Language

Version Date October 2013

1 ABBREVIATIONS USED IN THE PROTOCOL

Abbreviation   Term
AE             Adverse event
CETT           Collaboration, Education, and Test Translation
CLIA           Clinical Laboratory Improvement Amendments
CMA            Cytogenomic microarray
dbGaP          Database of Genotypes and Phenotypes
dbSNP          Single Nucleotide Polymorphism Database
dbVar          Database of Structural Variation
DGV            Database of Genomic Variants
FTP            File Transfer Protocol
HGVS           Human Genome Variation Society
HIPAA          Health Insurance Portability and Accountability Act
HPO            Human Phenotype Ontology
ICCG           International Collaboration for Clinical Genomics
ICD9           International Classification of Diseases, 9th Revision
IRB            Institutional Review Board
ISCA           International Standards for Cytogenomic Arrays
LSDB           Locus Specific Database
NCBI           National Center for Biotechnology Information
NIH            National Institutes of Health
OHSR           Office of Human Subjects Research
PCPGM          Partners Healthcare Center for Personalized Genetic Medicine
PHI            Protected Health Information
RFR            Reason For Referral
SAE            Serious adverse event
SNOMED CT      Systematized Nomenclature of Medicine--Clinical Terms
WES            Whole Exome Sequencing
WGS            Whole Genome Sequencing

2 ABSTRACT

Technological advances are quickly making variant detection across the whole genome commonplace in the medical care environment, though the interpretation of such variation remains a challenge. Determination of the biological and medical significance of structural (copy number) and sequence-level variation in the human genome requires extensive investigation of both normal and disease populations; obtaining large numbers of patients amongst disease populations has historically been challenging. Now that whole genome structural and sequence analyses are rapidly being integrated into routine clinical care, clinical genetics laboratories have a unique and timely opportunity to capture genomic variation from thousands of individuals during the course of clinical genetic testing. An international consortium of cytogenomic, molecular and clinical geneticists based in diagnostic laboratories, clinical facilities, and research laboratories, the Clinical Genome Resource (ClinGen), has been formed to harness such clinical genomic variation and phenotypic information into a unified, curated, publicly available database. Deidentified clinical data collected, submitted, and curated through this project will be housed within ClinVar, a database within the National Center for Biotechnology Information (NCBI).
This effort is critical to developing the open resource for understanding genomic variation that is needed for more accurate interpretation of the enormous levels of variation present in human genomes. Such a resource would also benefit professional organizations seeking to define guidelines on how to use genetic information in clinically useful ways. LAB NAME HERE would participate in this observational study by contributing deidentified genotype and phenotype data generated during the course of routine clinical genetic testing to the ClinVar database, and by participating in ongoing curation, quality assurance, and genotype-phenotype correlation projects related to this effort.

3 BACKGROUND AND SIGNIFICANCE

The current landscape of human genomic variation databases includes those focused on variation as it relates to normal populations, association with common diseases (e.g., variation from genome-wide association studies), pharmacologic responses, and rare variation. Each of these types of databases provides vital information when trying to decipher the functional significance of human genomic variation. The study of variation as it relates to specific disease populations, however, has the greatest potential to provide information on genotype-phenotype correlations amongst rare but highly penetrant traits, information that could also be applied to more common diseases. Effective studies of this type of variation require large datasets from both normal (control) and disease populations. Several centralized, public databases of human variation in the normal population already exist, including the Database of Genomic Variants (DGV) [1], the 1000 Genomes Project [2], the Exome Sequencing Project [3], and the Single Nucleotide Polymorphism Database (dbSNP) [4]. In contrast, there is no single, comprehensive (including both sequence and structural variation), freely accessible database of variation from disease populations.
Though freely accessible and curated databases of structural variation exist, including the International Standards for Cytogenomic Arrays (ISCA) Consortium database (which has now been incorporated into ClinGen), most publicly available sequence-level genomic variation from disease populations is either buried in publications or contained within diverse Locus Specific Databases (LSDBs), of which over 1,500 exist (http://www.hgvs.org/dblist/glsdb.html#). These LSDBs have several drawbacks that limit their utility: 1) Size: most disease or gene-centric databases are very small, and many genes/genomic regions are not yet represented; 2) Fragmentation: the data may be located in multiple, independent databases, and submission is often confined to a limited number of research laboratories; 3) Lack of standardization: data submission and content are not standardized to allow comparison of data submitted across laboratories; 4) Lack of curation: some data is deposited without any, or with an incorrect, interpretation of the functional or clinical significance [5]; and 5) Access: the data is not easily accessible to the research and clinical communities, particularly for use in high-throughput genomic analyses.

An additional resource for human genetic variation in disease populations is the Human Gene Mutation Database (www.hgmd.org). This resource, however, is not freely available, and only represents published data, which is a much smaller proportion of the actual amount of variation being observed in clinical laboratories. Further, approximately 30% of mutations within these databases annotated from the literature are in non-standardized formats, and/or are labeled with incorrect clinical assertions [5]. One reason for this high error rate is that incorrect initial variant interpretations are only revised when an erratum to the original manuscript is published.
Given the current state of publicly available sequence-level variant information, as well as the fact that emerging technologies will soon allow for the detection of both structural and sequence-level variant data on a single platform, we have joined the Clinical Genome Resource (ClinGen), a consortium of clinical genetic testing laboratories committed to sharing variant data in a single, open access database [6]. Such an approach would harness the wide phenotypic and genotypic variant spectra observed in clinical laboratories, and take advantage of data already being generated during the course of routine clinical care. Historically, it has been difficult to identify and collect large groups of individuals with specific phenotypes, particularly with Mendelian genetic traits, from an unselected population. Clinical genetic testing laboratories, however, are uniquely poised to provide large volumes of variant data on affected populations, as these individuals constitute their primary sample base. In addition, as phenotypic information is routinely captured by many clinical laboratories in order to properly process and interpret test results, basic phenotype information is already available for many clinical testing cases. ClinGen will leverage the hundreds of thousands of tests performed every year within clinical laboratories around the world on affected patients with demonstrable phenotypes to build a database of clinical grade genomic variation. As a member of this organization, LAB NAME HERE will be one of the many entities contributing deidentified genotype and phenotype data to this effort. Initial efforts will focus on rare disorders; the majority of clinical genetic testing performed today is in this category, and this information has the highest immediate clinical utility in genomic medicine.
We will expand our assessment of variation to include the entire human genome as whole exome and genome sequencing rapidly make their way into routine clinical use as primary diagnostic tests. Unlike other database efforts, the goals of our effort are to support the submission of all variation observed in affected individuals during clinical testing into a publicly available database, and to make this information freely and easily accessible to the community. We intend to ensure the quality of the information through ongoing curation efforts.

4 HYPOTHESIS AND SPECIFIC AIMS

4.1 Hypothesis

This is a data repository study. We will contribute to a database of primarily clinical grade variation data across the genome that concentrates the knowledge and curation capabilities of our clinical laboratory community into a single environment. Capturing the large number of clinical genetic tests being performed on patients with disease phenotypes presents a unique opportunity to contribute to our understanding of the functional significance of human genomic variation, which will benefit the growing community of medical genomics clinicians and researchers. The specific aims to achieve this goal are:

4.2 Specific Aim 1

Submit de-identified variant and phenotypic data into ClinVar, a unified database at NCBI. A large consortium of highly experienced clinical laboratories has committed to submitting de-identified genomic data (from clinical cytogenomic microarray, single gene, gene panel, and, later, whole exome/genome sequencing testing) and basic phenotypic data into ClinVar, a database developed and housed within the National Center for Biotechnology Information (NCBI). LAB NAME HERE will use existing and newly developed software bridges to current analysis programs to ensure simple, automated mechanisms to de-identify and reformat datasets for submission.
5 PRELIMINARY DATA

ClinGen and the ClinVar database represent the expansion of previous efforts known as the International Standards for Cytogenomic Arrays (ISCA) Consortium (https://www.iscaconsortium.org) and the International Collaboration for Clinical Genomics (ICCG). The ISCA Consortium, founded in 2007, was a group of laboratories, clinicians, and researchers dedicated to raising the standard of patient care by improving the quality of cytogenomic microarray (CMA) testing for structural variation. One of the group’s main goals was the creation and maintenance of a publicly available database of human structural variation. This project has resulted in the deposition of over 40,000 cases into the Database of Structural Variation (dbVar) at NCBI (http://www.ncbi.nlm.nih.gov/dbvar/studies/nstd37/). The ISCA Consortium has also focused on other issues related to CMA testing, including clinical utility [7,8], evidence-based array design [9,10], and result interpretation [10,11]. Realizing that these types of issues were not unique to structural variation, the ISCA Consortium chose to expand its efforts to include sequence-level variation, and officially became the International Collaboration for Clinical Genomics (ICCG) in 2012 [6]. In 2013, ICCG became a part of the Clinical Genome Resource, combining its efforts with those of others focused on data curation and machine learning initiatives.

6 STUDY DESIGN

6.1 Description

Geisinger Health System will be serving as the co-coordinating center, along with the Partners Healthcare Center for Personalized Genetic Medicine (PCPGM), for this project, which aims to establish a data repository of curated, clinical-grade genomic variation data. This project has been approved by the IRBs at both institutions (Geisinger IRB# 2013-0367, Partners IRB #2013P001099/BWH). Data will be stored within the ClinVar database at the NCBI and NOT at Geisinger or PCPGM.
Data submission procedures are discussed in section 6.5, “Procedures,” as well as section 6.8.1, “Data Collection and Storage.”

6.2 Study Population

6.2.1 Approximate Number of Subjects

The goal of this project is to place information from at least 100,000 participants worldwide in the database. Approximately # individuals from LAB NAME HERE will participate per year; this study will continue indefinitely.

6.2.2 Inclusion Criteria

All patients undergoing clinical cytogenomic or molecular-based testing at a participating clinical genetics laboratory will be eligible for inclusion in this study. There are no restrictions on age, gender, ethnicity, etc. Patients will not be undergoing any type of intervention or procedure as a result of participating in this study. No biological specimens will be collected, utilized, transferred, or stored solely for the purposes of this study; any interactions with biological specimens will be solely for the purpose of the clinical genetic testing ordered on behalf of the patient, and will conform to standard operating procedures for a clinical genetics laboratory.

6.2.3 Exclusion Criteria

Patients undergoing ONLY biochemical genetic testing at participating clinical genetics laboratories will be excluded from this study.

6.3 Recruitment

Subjects are not being actively recruited for this project; data from ALL patients referred for cytogenomic and/or molecular-based testing at a participating laboratory (as described in section 6.2.2, “Inclusion Criteria”) will be deidentified and submitted to ClinVar, unless the patient opts out of participation. Subjects may request not to be included (opt-out).

6.4 Study Duration

6.4.1 Approximate Duration of Subject Participation

As this is a data repository, duration of subject participation (i.e., amount of time data will remain in the repository) is indefinite.
6.4.2 Approximate Duration of Study

As this is a data repository, duration of the study (i.e., amount of time this repository will remain active) is indefinite.

6.5 Procedures

6.5.1 Submission of variant data into ClinVar

This project will support the submission of genotypic and phenotypic data into a centralized location: the ClinVar database housed at NCBI. ClinVar has been developed by NCBI to “provide a freely accessible, public archive of reports of the relationships among human variations and phenotypes along with supporting evidence” (http://www.ncbi.nlm.nih.gov/clinvar/intro). ClinVar will support the submission of variant observations and basic phenotype information by laboratories, either as summarized variant data or individual case data. The system will also enable the submission of assertions made regarding the clinical significance of variants (e.g., “pathogenic,” “benign,” etc.) as well as the evidence to support those assertions.

The types of data to be collected include both genotype and phenotype data. For structural variation, the project will focus on genome-wide data generated from CMA testing, from individuals with a broad spectrum of clinical phenotypes, including (but not limited to) neurodevelopmental disabilities and/or congenital anomalies. For sequence variation, the project will focus on sequence-level variants identified through both single-gene and multi-gene panel assays from individuals referred for diagnostic, carrier, and pre-symptomatic testing for inherited disorders. In the future, we will also include variants identified through whole-exome (WES) and whole-genome sequencing (WGS) analysis. We will submit a separate amendment to this protocol to discuss the procedures by which this will occur at a later date. Until that time, we will NOT be submitting variants or phenotypic information obtained during the course of WES or WGS.
All data submitted will be stripped of all HIPAA identifiers by the submitting laboratory. The submitting laboratory will then assign the data a unique code that is NOT derived from any protected health information (PHI) and is not, for example, the lab accession number. This code is known only to the submitting laboratory. The data is then submitted to NCBI, where the submitting laboratory’s unique code is removed and a separate, unique NCBI identifier is assigned. NCBI provides the submitting laboratory with the link between its unique laboratory-assigned codes and the NCBI identifiers. NCBI NEVER has access to PHI, and PHI is NEVER placed in the public database. The submitting laboratory is the only entity that can link a particular case to PHI; it needs to be able to do this in order to know which cases it has already submitted to the database (to avoid data duplication), to amend clinical reports if necessary following project curation activities (as described below), etc. Further, because the data was originally generated as part of routine clinical care, the submitting laboratories must keep it associated with PHI within the confines of the laboratory, in order to comply with Clinical Laboratory Improvement Amendments (CLIA) guidelines.

Initially, most submitted data will be derived from single gene mutation analyses, gene panel testing, or CMA testing. Data consisting of only variant information (including clinical interpretation) and basic phenotype information (age at time of testing, gender, phenotype provided at the time of testing utilizing standardized vocabularies) will be considered nonidentifiable, and will therefore not require full consent or submission through an opt-out process (see section 9.1, “Informed Consent,” for a detailed description of this process).
Examples of this type of data include individual variants identified on single gene testing, compound heterozygous variants identified on single gene testing, or small sets of variants identified either through CMA (the set of independent calls from an assay) or multi-gene panel tests. This deidentified data can be deposited for public access within ClinVar.

Full genome-wide datasets, such as the raw data files generated from CMA analyses, could be considered potentially identifiable, though this possibility is considered remote. As an extra mechanism of protection, this type of data would only be submitted by laboratories with an opt-out mechanism (described below in Section 9.1) in place. This type of data will also be initially processed through the Database of Genotypes and Phenotypes (dbGaP), before de-identified variant information is released publicly through ClinVar. Potentially identifiable files will remain within dbGaP under controlled access, which requires approval by NIH before an investigator can gain access to the data.

6.5.2 Expert clinical level curation of human genomic variants

In conjunction with the submission of data into a single accessible environment, tremendous effort will be focused on developing and demonstrating approaches to curate the data, including improving the classification of variants with respect to their role in human health and disease. The curation process is integral to the success of this and any effort attempting to determine the functional significance of genomic variation. ClinGen has developed, in partnership with NCBI, several algorithm-based curation tools and integrated them into the data submission process for structural variation through its previous project, the ISCA Consortium. The goal of this effort is to improve the quality of data entering the database while providing immediate value to the submitter through validation and consistency checks.
Data curation is carried out at multiple levels: 1) quality checks, such as checks for appropriate genome position and nomenclature, are performed; 2) regions where the same laboratory has reported different clinical calls for similar observations are identified and flagged; and 3) the submitted calls are compared against an expert curated list of regions with established clinical significance (http://www.ncbi.nlm.nih.gov/dbvar/studies/nstd45/). The results are presented to the submitter via an authenticated web page that includes a genome overview, a genome browser, and a tabular report of the regions to be reviewed. Each identified “conflict” region is presented to the submitter, who has the option to click on the variant and change the call, or to retain the original call and ignore the conflict warning. In the latter case, the submitter is asked to provide a reason for the decision. Upon completion of the review, the submitter finalizes the submission, which will incorporate a history of any changes applied. The editing history of each event is retained in a central database to allow for full review at a later date. This curation tool provides a user-friendly system by which the information submitted to the database can be reviewed prior to integration with the full dataset. This type of curation process has already demonstrated significantly enhanced quality control for clinical reporting.

Similar laboratory-level curation tools will be developed for sequence-level data. Those tools will be available within ClinVar as described above for structural variant submission.

In addition to being curated against previous submissions from our laboratory, variants will also be curated against the submissions of other laboratories. Individual laboratories may come to different conclusions regarding the clinical consequences of the same variant, and the expert-level review process can provide a method of resolution [12].
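As an illustration only, the discordance check described above could be sketched in Python as follows. The record fields (`lab`, `variant`, `call`) and the placeholder variant names are assumptions made for this sketch and are not ClinVar or ClinGen data elements. The sketch also matches variants by an exact key, whereas the structural variation tools described above compare overlapping genomic regions.

```python
from collections import defaultdict


def find_conflicts(submissions):
    """Return variants that carry more than one distinct clinical call.

    `submissions` is a list of dicts with the hypothetical keys
    "lab", "variant", and "call".
    """
    calls_by_variant = defaultdict(set)
    for sub in submissions:
        calls_by_variant[sub["variant"]].add((sub["lab"], sub["call"]))

    conflicts = {}
    for variant, lab_calls in calls_by_variant.items():
        distinct_calls = {call for _, call in lab_calls}
        if len(distinct_calls) > 1:
            # Keep every (lab, call) pair so a reviewer can see who asserted what.
            conflicts[variant] = sorted(lab_calls)
    return conflicts


if __name__ == "__main__":
    example = [
        {"lab": "Lab 1", "variant": "variant_A", "call": "pathogenic"},
        {"lab": "Lab 2", "variant": "variant_A", "call": "uncertain"},
        {"lab": "Lab 1", "variant": "variant_B", "call": "benign"},
        {"lab": "Lab 2", "variant": "variant_B", "call": "benign"},
    ]
    for variant, lab_calls in find_conflicts(example).items():
        print(variant, lab_calls)
```

The same function covers both the within-laboratory and the between-laboratory case: passing submissions from a single laboratory flags its internal inconsistencies, while passing pooled submissions flags disagreements between laboratories.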
Furthermore, as evidence available regarding particular variants changes over time, clinical classifications made at the time of data submission may require re-evaluation [13]. Expert level curation will allow for “real time” integrated analysis of data from multiple submitting laboratories with currently available evidence for clinical assertion. Should any variants submitted by LAB NAME HERE receive updated clinical classifications as a result of these curation efforts, the original clinical reports will be amended to reflect this information, and redistributed to the original ordering clinician for discussion with the patient, per standard clinical laboratory protocol.

6.5.3 Study Time and Events Table

Not applicable.

6.5.4 Study Flow Diagram

6.6 Primary Endpoints

Not applicable. The study focuses on the creation of a data repository with no specified endpoint.

6.7 Statistics

6.7.1 Statistical Analysis Plan

The goal of this study is to create a repository of de-identified genotype/phenotype information on individuals referred for clinical genetic testing. Because of the nature of this project, statistical analyses are not required. This repository is to serve as a resource for other researchers, who will determine their own analysis procedures based upon their individual research questions. Deidentified data (i.e., individual structural or sequence-level variants) within the repository will be publicly available through the ClinVar database; researchers will not need an IRB approved protocol to access this information. Potentially identifiable data (e.g., raw data files from CMA testing, whole exome datasets, etc.) will be released through dbGaP to qualified researchers at the discretion of a data access committee at NCBI. Researchers wishing to access this type of data must have an IRB approved protocol.

6.7.2 Statistical Power and Sample Size Considerations

Not applicable.
6.8 Data Management

6.8.1 Data Collection and Storage

A data submission template has previously been developed for structural variation by the ISCA Consortium (see Attachment 1); this template will continue to be used for this expanded project. A similar approach has begun for sequence-level variants, and a preliminary data element dictionary and data submission template have been established (see Attachments 2 and 3, respectively). All variants detected in an individual during the course of routine clinical genetic testing are stripped of identifying information and submitted to NCBI. Each variant is submitted with a minimum amount of information necessary to define the variant (e.g., genomic location, etc.), the laboratory method used to identify it (see Attachments 1 and 3), as well as the initial clinical assertion (e.g., pathogenic, uncertain, benign, etc.) of the submitting laboratory. As discussed above, all clinical assertions are subject to further evaluation by an expert consensus process, and could be reclassified following consensus curation activities [14]. For any given variant, the submitting laboratory also has the option of submitting more detailed information, such as phenotypic information, data on variant frequency in cases versus controls, variants found in cis/trans, publications, segregation data, functional data, analytical method, source of data, etc.

To facilitate uniform phenotypic data collection, phenotypes, diseases and clinical terms will be described using standard ontologies. An ontology is a computational representation of a domain of knowledge based upon a controlled, standardized vocabulary for describing entities and the semantic relationships between them.
The terminology for phenotype ontology already built into ClinVar comes preferentially from two sources: the Human Phenotype Ontology (HPO) [15] and SNOMED CT (Systematized Nomenclature of Medicine--Clinical Terms; http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html). All phenotype information entered into the ClinVar database will be coded using an ontological system; to maintain patient privacy, no free text phenotype information will be publicly displayed. For phenotypic data, the standard format will consist of general variables such as age at time of testing, gender, race/ethnicity (if provided), type of testing (symptomatic vs. asymptomatic), and reason for referral for clinical testing.

We will use several different resources to collect phenotype data. ClinGen has developed a one-page phenotype “checklist” utilizing HPO terms organized by body system (see Attachment 4). These forms have been designed for use as part of a standard clinical test requisition form, or to be used when ordering testing electronically. ClinGen has also worked with the software company Cartagenia (www.cartagenia.com) to develop a data submission tool incorporating these forms. Additionally, Cartagenia has developed algorithms to automatically detect HPO classifiers from free text, such as Reason for Referral (RFR) text. This Cartagenia algorithm includes “ICD9 expansion,” a feature that allows frequently used ICD9 (International Classification of Diseases, 9th Revision) codes to automatically be mapped to HPO classification codes. We will utilize both of these tools, as well as any other tools developed by ClinGen to meet our expanding needs, such as disease-specific phenotype forms (utilizing standardized vocabulary), etc., in order to facilitate the collection of the detailed phenotype information necessary for certain single gene or gene-panel tests.
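To make the idea of mapping ICD9 codes to HPO terms concrete, a minimal Python sketch is given below. It is not the Cartagenia algorithm, whose internals are not described in this protocol. The lookup table, the function name `code_phenotypes`, and the particular code pairings are assumptions chosen for illustration; a laboratory would substitute a mapping it has validated.

```python
# Illustrative lookup only: these pairings are assumptions for this sketch,
# not a mapping published by ClinGen, Cartagenia, or the HPO project.
ICD9_TO_HPO = {
    "780.39": ("HP:0001250", "Seizure"),
    "315.9": ("HP:0001263", "Global developmental delay"),
    "299.00": ("HP:0000717", "Autism"),
}


def code_phenotypes(icd9_codes):
    """Split referral codes into HPO term IDs and codes with no known mapping.

    Unmapped codes are returned separately so they can be reviewed by hand
    instead of being passed along as uncoded text.
    """
    hpo_terms = []
    unmapped = []
    for code in icd9_codes:
        if code in ICD9_TO_HPO:
            hpo_id, _label = ICD9_TO_HPO[code]
            hpo_terms.append(hpo_id)
        else:
            unmapped.append(code)
    return hpo_terms, unmapped


if __name__ == "__main__":
    terms, needs_review = code_phenotypes(["780.39", "000.00", "315.9"])
    print("HPO terms:", terms)
    print("Needs manual review:", needs_review)
```

Returning the unmapped codes separately, rather than discarding them, keeps the submission limited to ontology terms while leaving a worklist for manual coding.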
Edit this section as appropriate for your laboratory, discussing any software/vendors you use or plan to use:

ClinGen has also worked with different software vendors to develop a variety of tools to facilitate the data submission process. These relationships will continue as the efforts of the ISCA Consortium and ICCG are expanded into those of ClinGen. For example, Cartagenia has worked to provide a more integrated source for laboratories to use to facilitate the de-identification and transfer of data to the NCBI database. Cartagenia utilizes a web-based database and software platform for securely managing genotype and phenotype data for patients and study subjects, which has been adapted to the needs of the ISCA Consortium through the creation of the ISCA BENCH platform. ISCA BENCH includes features such as security measures of accountability and HIPAA (Health Insurance Portability and Accountability Act) compliance, de-identification of patient identifiers, and transparent linkage of genotype and phenotype information. The Cartagenia software has been installed in multiple laboratories and automatically performs the de-identification and transfer of the data from the clinical laboratory to the current ISCA database at NCBI. In addition, multiple other array software vendors (such as Affymetrix, Agilent, BioDiscovery, BlueGnome, Oxford Gene Technology, and Roche Nimblegen) have incorporated the standardized data form into their software so that genotype data is in the proper format for direct submission to NCBI. We will continue to use these groups to facilitate our data submission, and will work with others in the future to ensure that data is transferred as efficiently as possible. All systems used for data submission will enable the creation of unique IDs (i.e., NOT the original clinical laboratory ID) for case submission, with links to submitter IDs that are maintained solely by the submitting laboratory.
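The unique-ID requirement can be pictured with a minimal sketch, assuming a simple in-memory key table held by the submitting laboratory. The class name `SubmitterKey`, the field names, and the use of random UUIDs are hypothetical choices for illustration; they do not describe how Cartagenia or any other vendor implements this.

```python
import uuid

# Hypothetical field names; a real submission template defines its own.
IDENTIFYING_FIELDS = {"lab_id", "patient_name", "date_of_birth", "ordering_physician"}


class SubmitterKey:
    """Laboratory-held link between internal lab IDs and submission case IDs.

    The table stays with the submitting laboratory and is never transmitted.
    """

    def __init__(self):
        self._lab_to_case = {}
        self._case_to_lab = {}

    def case_id_for(self, lab_id: str) -> str:
        """Return a stable random case ID that is not derived from lab_id."""
        if lab_id not in self._lab_to_case:
            case_id = uuid.uuid4().hex  # random, so unrelated to the lab ID
            self._lab_to_case[lab_id] = case_id
            self._case_to_lab[case_id] = lab_id
        return self._lab_to_case[lab_id]

    def lab_id_for(self, case_id: str) -> str:
        """Resolve a case ID back to the lab ID (laboratory use only)."""
        return self._case_to_lab[case_id]

    def deidentify(self, record: dict) -> dict:
        """Copy a record without identifying fields and attach its case ID."""
        public = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
        public["case_id"] = self.case_id_for(record["lab_id"])
        return public
```

Because the case ID is random rather than a hash of the lab ID, it cannot be recomputed by anyone who lacks the laboratory's key table; the same table is what allows a record to be located and withdrawn later.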
This functionality has already been built within Cartagenia and will be extended to other tools supported by the project. All data is transmitted to NCBI via a secure file transfer protocol (FTP) site.

ClinVar quality control and versioning

The data elements of ClinVar are stored using reliable, standard technologies, including a relational database, in order to ensure transactional integrity, backup, transaction history and rollback, and custodianship. Structural and sequence variants submitted to ClinVar are mapped to reference sequences and annotated according to Human Genome Variation Society (HGVS) standards. ClinVar assigns a version number to all data submissions, allowing submitters to update their records while retaining the previous version for review. Updates in content will happen on a daily basis, with semi-annual or annual changes in data definitions and backward compatibility in data reporting. The system also ensures that a minimum set of data is included in each submission. The level of confidence in the accuracy of assertions of clinical significance depends in large part on the supporting evidence, so this information, when available, is collected and visible to users.

Several quality control checks have been built into the system: a check that the variant is defined within valid regions of genomic sequence; a check that the variant description conforms to HGVS nomenclature; identification of conflicts among submitters and the published literature or prior submitted data; and validation of the phenotype relative to controlled vocabularies and current understanding of gene-to-phenotype relationships. The quality control checks will be implemented so as to minimize barriers to data submission while still ensuring high-quality data. As such, certain checks will prevent data submission, whereas others will allow data submission but return a warning to the submitter.
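The distinction between checks that prevent submission and checks that only warn can be sketched as follows. This is a simplified illustration, not ClinVar's implementation: the function `run_qc`, the field names, and the choice to treat a missing minimum field as blocking are assumptions, and real HGVS and reference-sequence validation is omitted entirely.

```python
from dataclasses import dataclass, field

# Hypothetical minimum data set; the real one is defined by the templates.
REQUIRED_FIELDS = ("variant", "method", "assertion")


@dataclass
class QCResult:
    errors: list = field(default_factory=list)    # blocking problems
    warnings: list = field(default_factory=list)  # reported, not blocking

    @property
    def accepted(self) -> bool:
        """A submission is accepted when no blocking error was recorded."""
        return not self.errors


def run_qc(submission: dict, prior_assertions: dict) -> QCResult:
    """Run one blocking check and one warning-only check on a submission.

    prior_assertions maps a variant description to the clinical assertion
    already on record for it (assumed structure for this sketch).
    """
    result = QCResult()

    # Blocking check: every minimum field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not submission.get(name):
            result.errors.append(f"missing required field: {name}")

    # Warning-only check: a classification that conflicts with a prior
    # submission is accepted but flagged for the discrepancy report.
    prior = prior_assertions.get(submission.get("variant"))
    current = submission.get("assertion")
    if prior is not None and current and prior != current:
        result.warnings.append(
            f"assertion '{current}' conflicts with prior assertion '{prior}'"
        )
    return result
```

Keeping errors and warnings in separate lists lets a caller reject a submission on `accepted` alone while still passing every warning back to the submitter.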
For example, variant descriptions that are not resolvable on a reference sequence will not be allowed, whereas discrepancies in variant classification will be provided in a discrepancy report. Both of these quality control checks of the variant classification system have been in place for several years in support of the ISCA project and have been highly successful in improving the quality of structural variant interpretations.

6.8.2 Records Retention

Submitted data will remain in the database indefinitely.

7 SAFETY MONITORING

7.1 Adverse Event Reporting

The procedures used during this study (de-identification and submission of existing data to a publicly available database) are extremely unlikely to result in any adverse events (AEs). In the event that a clinical AE occurs, it will be reported to the institutional review board (IRB) regardless of whether it is considered study related. A serious AE (SAE) will be reported to the IRB according to the IRB guidelines. All other AEs will be summarized and submitted to the IRB during continuing review.

8 SAMPLE COLLECTION AND RETENTION

8.1 Collection

No samples are being collected solely for the purposes of this research study. This study strictly involves submission of existing data (results of clinical genetic testing performed on samples acquired for the purpose of such testing) into a data repository. No biological specimens are being stored, transferred, or analyzed for the purposes of this study.

8.1.1 Total Volume of Blood Collected

Not applicable.

8.2 Retention

No biological samples are being stored for the purposes of this study. Data submitted to the data repository will be stored indefinitely. Individual data can be removed from the repository at any time at the request of the subject.

9 PROTECTION OF HUMAN SUBJECTS

9.1 Informed Consent

For this study, we are requesting a waiver of full informed consent.
In place of full informed consent, we will implement an "opt-out" participant awareness process. Currently, the ISCA Consortium (the precursor to ClinGen) successfully uses an opt-out process for the submission of data from CMA testing to dbGaP and dbVar within NCBI. This method will continue to be utilized for collection of genomic data from CMA, single-gene, and gene-panel assessments through ClinGen. The opt-out model was developed by the Collaboration, Education and Test Translation (CETT) Program of the National Institutes of Health (NIH) Office of Rare Disease Research [16]. It was reviewed and approved by the NIH Office of Human Subjects Research (OHSR) and has previously been approved by several academic IRBs, including Emory University, Mayo Clinic, and Geisinger Health System, for use in the ISCA Consortium. The opt-out process is considered appropriate for these purposes for several reasons:

1) Recontacting patients to obtain formal consent is not possible for most clinical laboratories and not feasible for the scope of this project. Often, patient contact information is not provided to clinical testing laboratories, making it impossible to contact patients directly. Further, managing the full consent process for thousands of potential participants around the world would be cost prohibitive and would hinder the collection of a robust dataset.

2) There is little risk to participants. Though individuals are theoretically identifiable from the raw data files generated during CMA testing, identification would be highly unlikely in practice. The raw data files are kept under controlled access, and only researchers with IRB-approved protocols may apply for and be granted access to this data.

3) There are ample opportunities for patients to learn of the project and opt out of participation. Descriptions of the project and the opt-out process in lay-friendly terms are on test requisition forms, clinical result reports, and participating laboratory websites (see Attachment 5).
Patients have the opportunity to opt out of participation by simply checking a box on either the requisition or the test result and returning it to the laboratory, calling a toll-free number, or submitting a short form through the laboratory website. Patients may opt out at any time. Should they choose this option, their data will either not be sent to NCBI or, if it has already been sent, be removed from the database.

The opt-out process also provides a mechanism by which researchers can re-contact participants, with multiple layers of protection to reduce possible identification. Interested researchers may contact the submitting laboratory to discuss potential research opportunities; the laboratory then contacts the referring clinician, who contacts the patient directly. If the patient is interested in the opportunity, he/she can then contact the researcher directly and consent to their study.

Keep or remove as appropriate for your laboratory:

As stated above, we plan to include WES and WGS datasets in the ClinVar database in the future. We will submit an amendment to this protocol describing the process of data collection and consent surrounding this type of data submission once we are ready to accept this type of data.

9.2 Protection of Human Subjects Against Risks

There is minimal risk associated with this research study. While there is a risk of loss of privacy and loss of confidentiality, multiple measures are in place to protect the privacy of individuals and their health information. Data submitted to ClinVar will be free of identifiers linking individuals to the data. The data will be submitted using a unique identifier generated by the submitting laboratory and unrelated to any existing patient identifier. NCBI will never be provided with the keys between these laboratory identifiers and actual patient PHI. NCBI will then assign a separate, unique ClinVar ID; this ID will be used for public data display.
NCBI will provide the submitting laboratory with the keys between the ClinVar ID and the unique laboratory identifier. The submitting laboratory will keep this information on password-protected computers and/or in locked file cabinets. No personal health identifiers for the tested individuals or information regarding the ordering physicians will be included in the data submission.

9.3 Data Monitoring Plan

Not applicable. Study does not include any intervention.

10 REFERENCES

1. Iafrate AJ, Feuk L, Rivera MN, et al. Detection of large-scale variation in the human genome. Nat Genet. Sep 2004;36(9):949-951.
2. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. Oct 28 2010;467(7319):1061-1073.
3. Tennessen JA, Bigham AW, O'Connor TD, et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. May 21 2012.
4. Sayers EW, Barrett T, Benson DA, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. Jan 2011;39(Database issue):D38-51.
5. Bell CJ, Dinwiddie DL, Miller NA, et al. Carrier testing for severe childhood recessive diseases by next-generation sequencing. Sci Transl Med. Jan 12 2011;3(65):65ra64.
6. Riggs ER, Wain KE, Riethmaier D, et al. Towards a universal clinical genomics database: the 2012 international standards for cytogenomic arrays consortium meeting. Hum Mutat. Jun 2013;34(6):915-919.
7. Miller DT, Adam MP, Aradhya S, et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. May 14 2010;86(5):749-764.
8. Riggs E, Wain K, Riethmaier D, et al. Chromosomal microarray impacts clinical management. Clin Genet. Jan 25 2013.
9. Baldwin EL, Lee JY, Blake DM, et al. Enhanced detection of clinically relevant genomic imbalances using a targeted plus whole genome oligonucleotide microarray. Genet Med. Jun 2008;10(6):415-429.
10. Riggs ER, Church DM, Hanson K, et al. Towards an evidence-based process for the clinical interpretation of copy number variation. Clin Genet. May 2012;81(5):403-412.
11. Kaminsky EB, Kaul V, Paschall J, et al. An evidence-based approach to establish the functional and clinical significance of copy number variants in intellectual and developmental disabilities. Genet Med. Sep 2011;13(9):777-784.
12. Cotton RG, Auerbach AD, Beckmann JS, et al. Recommendations for locus-specific databases and their curation. Hum Mutat. Jan 2008;29(1):2-5.
13. Giardine B, Borg J, Higgs DR, et al. Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nat Genet. Apr 2011;43(4):295-301.
14. Kearney HM, Thorland EC, Brown KK, Quintero-Rivera F, South ST. American College of Medical Genetics standards and guidelines for interpretation and reporting of postnatal constitutional copy number variants. Genet Med. Jul 2011;13(7):680-685.
15. Robinson PN, Kohler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet. Nov 2008;83(5):610-615.
16. Faucett WA, Hart S, Pagon RA, Neall LF, Spinella G. A model program to increase translation of rare disease genetic tests: collaboration, education, and test translation program. Genet Med. May 2008;10(5):343-348.
11 ATTACHMENTS

11.1 Attachment 1: ClinGen Structural Variation Data Submission Template
11.2 Attachment 2: ClinGen Sequence Variation Data Elements Dictionary
11.3 Attachment 3: ClinGen Sequence Variation Data Submission Template
11.4 Attachment 4: ClinGen Phenotype Data Collection Sheet
11.5 Attachment 5: Proposed Opt-out Language

Version Date October 2013